RE: [PATCH] whpx: Added support for saving/restoring VM state

2022-05-18 Thread Ivan Shcherbakov
Hi Paolo,

Thanks for pointing out the relevant fields of the XSAVE state - it explains 
what is going on.
I did some quick experiments, modifying the buffer manually:

* Hyper-V doesn't care about the value of MXCSR_MASK. Both 0x and 0x 
work.
* Setting XCOMP_BV to 0 does trigger the error.

My best guess is that Hyper-V emulates XSAVE/XRSTOR programmatically and only 
implements the compacted format.

Do you think it would be a reasonable workaround to handle it like this:

1. Pass the XSAVE data received from Hyper-V to x86_cpu_xrstor_all_areas().
2. Also save it into the snapshot area like we do now.
3. When restoring, first try to pass the data from x86_cpu_xsave_all_areas() to 
Hyper-V.
4. If it rejects it, pass the original block saved in step 2.

If either Microsoft adds support for the regular format, or someone on the QEMU 
side implements the compacted one, this logic will start properly synchronizing 
the QEMU-level XSAVE structure with Hyper-V. Until then (or if it breaks again) 
it will fall back to saving/restoring the data "as is", which will be 
sufficient for snapshots.
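
To make the control flow concrete, here is a rough sketch of steps 3-4;
whpx_set_xsave_buffer() and the snapshot fields are made-up placeholders,
not real WHPX or QEMU APIs:

    static int whpx_restore_xsave(CPUState *cpu, WhpxSnapshot *snap)
    {
        uint8_t buf[4096];   /* illustrative size */

        /* Step 3: prefer the buffer rebuilt from QEMU's own CPU state. */
        x86_cpu_xsave_all_areas(X86_CPU(cpu), buf, sizeof(buf));
        if (whpx_set_xsave_buffer(cpu, buf, sizeof(buf)) == 0) {
            return 0;
        }

        /* Step 4: Hyper-V rejected it, fall back to the block from step 2. */
        return whpx_set_xsave_buffer(cpu, snap->raw_xsave, snap->raw_xsave_len);
    }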

Best,
Ivan

-Original Message-
From: Qemu-devel  On Behalf Of 
Paolo Bonzini
Sent: Tuesday, May 17, 2022 7:12 AM
To: Ivan Shcherbakov ; qemu-devel@nongnu.org
Subject: Re: [PATCH] whpx: Added support for saving/restoring VM state

On 5/16/22 20:44, Ivan Shcherbakov wrote:
> Passing it to x86_cpu_xrstor_all_areas()/x86_cpu_xsave_all_areas() changed 
> the following values:
> 
> 0x001C: ff ff -> 00 00
> 0x0208: 07 -> 00
> 0x020F: 80 -> 00

0x1C-0x1F is MXCSR_MASK.  There's already a field in the x86 CPUState, but it 
was forgotten in x86_cpu_xsave_all_areas()/x86_cpu_xrstor_all_areas().  The 
field should also be initialized to 0x in the CPU reset function.

0x208...0x20F is XCOMP_BV and bit 63 in there is indeed signaling compacted 
format.  First of all I'd start with your patch and hack it to check whether 
Hyper-V accepts zero at 0x208...0x20F; in this specific case, where 
0x208...0x20F has all the low consecutive bits set plus bit 63, it's fine to do 
just that.  If so, 
x86_cpu_xrstor_all_areas() needs no support for compacted format.  I would be 
somewhat surprised if Hyper-V needs support in XRSTOR too.

For XSAVE, the algorithm to compute the offset (instead of just using 
x->offset) is given in the Intel manual:

If XCOMP_BV[i] = 0, state component i is not in the XSAVE area at all.

If XCOMP_BV[i] = 1, state component i is located at a byte offset from the 
base address of the XSAVE area that is determined by the following
steps:

- If i is the first bit set in bits 62:2 of the XCOMP_BV, state component i 
starts at offset 576

- Otherwise, take CPUID[EAX=0DH,ECX=i].ECX[1]:

   - If it is 0, state component i starts right after the preceding state
 component whose bit is set in XCOMP_BV (where the size of component
 j is enumerated in CPUID[EAX=0DH,ECX=j].EAX).

   - If it is 1, state component i starts after the preceding state
 component whose bit is set in XCOMP_BV, but on a 64-byte aligned
 offset relative to the beginning of the XSAVE area.
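
For reference, a literal (untested) translation of that rule into C could
look like the following; cpuid_xsave_size() and cpuid_xsave_align64() stand
in for the CPUID[EAX=0DH,ECX=i].EAX and .ECX[1] reads and are not existing
QEMU helpers:

    static uint32_t xsave_compacted_offset(uint64_t xcomp_bv, int target)
    {
        uint32_t offset = 576;             /* first component after the header */
        int i;

        for (i = 2; i <= target; i++) {
            if (!(xcomp_bv & (1ULL << i))) {
                continue;                  /* component i not in the area */
            }
            if (cpuid_xsave_align64(i)) {  /* CPUID[0xD,i].ECX[1] */
                offset = ROUND_UP(offset, 64);
            }
            if (i == target) {
                return offset;
            }
            offset += cpuid_xsave_size(i); /* CPUID[0xD,i].EAX */
        }
        return 0;                          /* target not set in XCOMP_BV */
    }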

Paolo




[PATCH v24 8/8] tests: Add dirty page rate limit test

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Add a dirty page rate limit test that runs if the kernel supports the dirty ring.

The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 tests/qtest/migration-helpers.c |  22 
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c| 255 
 3 files changed, 279 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index a6aa59e..4849cba 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -76,6 +76,28 @@ QDict *wait_command(QTestState *who, const char *command, 
...)
 }
 
 /*
+ * Execute the qmp command only
+ */
+QDict *qmp_command(QTestState *who, const char *command, ...)
+{
+va_list ap;
+QDict *resp, *ret;
+
+va_start(ap, command);
+resp = qtest_vqmp(who, command, ap);
+va_end(ap);
+
+g_assert(!qdict_haskey(resp, "error"));
+g_assert(qdict_haskey(resp, "return"));
+
+ret = qdict_get_qdict(resp, "return");
+qobject_ref(ret);
+qobject_unref(resp);
+
+return ret;
+}
+
+/*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
  * qobject_from_jsonf_nofail()) with "uri": @uri spliced in.
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 78587c2..5956189 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -23,6 +23,8 @@ QDict *wait_command_fd(QTestState *who, int fd, const char 
*command, ...);
 G_GNUC_PRINTF(2, 3)
 QDict *wait_command(QTestState *who, const char *command, ...);
 
+QDict *qmp_command(QTestState *who, const char *command, ...);
+
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d33e806..f59d31b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -24,6 +24,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
 #include "crypto/tlscredspsk.h"
+#include "qapi/qmp/qlist.h"
 
 #include "migration-helpers.h"
 #include "tests/migration/migration-test.h"
@@ -58,6 +59,11 @@ static bool uffd_feature_thread_id;
 #include 
 #include 
 #include 
+/*
+ * The dirty limit stops adjusting once the dirty page rate error
+ * value is less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
 static bool ufd_version_check(void)
 {
@@ -2070,6 +2076,253 @@ static void test_multifd_tcp_cancel(void)
 test_migrate_end(from, to2, true);
 }
 
+static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'calc-dirty-rate',"
+  "'arguments': { "
+  "'calc-time': %ld,"
+  "'mode': 'dirty-ring' }}",
+  calc_time));
+}
+
+static QDict *query_dirty_rate(QTestState *who)
+{
+return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+}
+
+static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'set-vcpu-dirty-limit',"
+  "'arguments': { "
+  "'dirty-rate': %ld } }",
+  dirtyrate));
+}
+
+static void cancel_vcpu_dirty_limit(QTestState *who)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+}
+
+static QDict *query_vcpu_dirty_limit(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'query-vcpu-dirty-limit' }");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static bool calc_dirtyrate_ready(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+
+return g_strcmp0(status, "measuring");
+}
+
+static void wait_for_calc_dirtyrate_complete(QTestState *who,
+ int64_t time_s)
+{
+int max_try_count = 10000;
+usleep(time_s * 1000000);
+
+while (!calc_dirtyrate_ready(who) && max_try_count--) {
+usleep(1000);
+}
+
+/*
+ * Set the timeout to 10 s (max_try_count * 1000 us); if the
+ * dirty rate measurement has not completed by then, fail the test.
+ */
+g_assert_cmpint(max_try_count, !=, 0);
+}
+
+static int64_t get_dirty_rate(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+

[PATCH v24 7/8] softmmu/dirtylimit: Implement dirty page rate limit

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Implement periodic dirty page rate calculation based on the dirty
ring, and throttle the virtual CPU until it reaches the quota dirty
page rate given by the user.

Introduce the QMP commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit"
to enable, disable and query the dirty page limit for virtual CPUs.

Meanwhile, introduce the corresponding HMP commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit" and
"info vcpu_dirty_limit" so the feature is easier to use.

"query-vcpu-dirty-limit" only succeeds when the dirty page rate limit
is enabled, so add it to the list of skipped commands to ensure
qmp-cmd-test keeps passing.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
---
 hmp-commands-info.hx   |  13 +++
 hmp-commands.hx|  32 
 include/monitor/hmp.h  |   3 +
 qapi/migration.json|  80 +++
 softmmu/dirtylimit.c   | 194 +
 tests/qtest/qmp-cmd-test.c |   2 +
 6 files changed, 324 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index adfa085..016717d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -865,6 +865,19 @@ SRST
 Display the vcpu dirty rate information.
 ERST
 
+{
+.name   = "vcpu_dirty_limit",
+.args_type  = "",
+.params = "",
+.help   = "show dirty page limit information of all vCPU",
+.cmd= hmp_info_vcpu_dirty_limit,
+},
+
+SRST
+  ``info vcpu_dirty_limit``
+Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
 {
 .name   = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 564f1de..cb688a6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1754,3 +1754,35 @@ ERST
   "\n\t\t\t -b to specify dirty bitmap as method of 
calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+SRST
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "set_vcpu_dirty_limit",
+.args_type  = "dirty_rate:l,cpu_index:l?",
+.params = "dirty_rate [cpu_index]",
+.help   = "set dirty page rate limit, use cpu_index to set limit"
+  "\n\t\t\t\t\t on a specified virtual cpu",
+.cmd= hmp_set_vcpu_dirty_limit,
+},
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "cancel_vcpu_dirty_limit",
+.args_type  = "cpu_index:l?",
+.params = "[cpu_index]",
+.help   = "cancel dirty page rate limit, use cpu_index to cancel"
+  "\n\t\t\t\t\t limit on a specified virtual cpu",
+.cmd= hmp_cancel_vcpu_dirty_limit,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 96d0148..478820e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,9 @@ void hmp_replay_delete_break(Monitor *mon, const QDict 
*qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error 
**));
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 6130cd9..9c57cc0 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1884,6 +1884,86 @@
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
 ##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate (MB/s) for a virtual
+#  CPU, 0 means unlimited.
+#
+# @current-rate: current dirty page rate (MB/s) for a virtual CPU.
+#
+# Since: 7.1
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+'limit-rate': 'uint64',
+'current-rate': 'uint64' } }
+
+##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPUs.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @cpu-index: index of a virtual CPU, default is all.
+#
+# @dirty-rate: upper limit of dirty page rate (MB/s) for virtual CPUs.
+#
+# Since: 7.1
+#
+# Example:

[PATCH v24 6/8] softmmu/dirtylimit: Implement virtual CPU throttle

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Set up a negative-feedback system for the vCPU thread when it handles
a KVM_EXIT_DIRTY_RING_FULL exit by introducing a throttle_us_per_full
field in struct CPUState. Sleep for throttle_us_per_full microseconds
to throttle the vCPU if the dirty limit is in service.
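
Stripped to its essence, the feedback amounts to something like the sketch
below (simplified; the complete logic lives in this patch's
softmmu/dirtylimit.c):

    void dirtylimit_vcpu_execute(CPUState *cpu)
    {
        /* Sleep for the quota worked out by the throttle logic, if any. */
        if (dirtylimit_in_service() && cpu->throttle_us_per_full) {
            usleep(cpu->throttle_us_per_full);
        }
    }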

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c |  20 ++-
 include/hw/core/cpu.h   |   6 +
 include/sysemu/dirtylimit.h |  15 +++
 softmmu/dirtylimit.c| 291 
 softmmu/trace-events|   7 ++
 5 files changed, 338 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0c6846b..187c337 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 
 #include "hw/boards.h"
 
@@ -476,6 +477,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -1469,6 +1471,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* keep sleeping so that the dirty limit is not disturbed by the reaper */
+if (dirtylimit_in_service()) {
+continue;
+}
+
 trace_kvm_dirty_ring_reaper("wakeup");
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
@@ -2965,8 +2972,19 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state, NULL);
+/*
+ * We throttle the vCPU by making it sleep once it exits from the
+ * kernel due to a full dirty ring. In the dirtylimit scenario,
+ * reaping all vCPUs after a single vCPU's dirty ring fills up would
+ * skip that sleep, so only reap the vCPU whose ring is full.
+ */
+if (dirtylimit_in_service()) {
+kvm_dirty_ring_reap(kvm_state, cpu);
+} else {
+kvm_dirty_ring_reap(kvm_state, NULL);
+}
 qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 996f940..500503d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,12 @@ struct CPUState {
  */
 bool throttle_thread_scheduled;
 
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
 bool ignore_memory_transaction_failures;
 
 /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f0..8d2c1f3 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,19 @@ void vcpu_dirty_rate_stat_start(void);
 void vcpu_dirty_rate_stat_stop(void);
 void vcpu_dirty_rate_stat_initialize(void);
 void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_process(void);
+void dirtylimit_change(bool start);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
 #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 6102e8c..76d0b44 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
 #include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * The dirty limit stops adjusting once the dirty page rate error
+ * value is less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+/*
+ * Increase or decrease the vCPU sleep time linearly if the dirty
+ * page rate error percentage is over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, adjust the sleep time by a fixed amount.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
 
 struct {
 VcpuStat stat;
@@ -25,6 +45,30 @@ struct {
 QemuThread thread;
 } *vcpu_dirty_rate_stat;
 
+typedef struct VcpuDirtyLimitState {
+int cpu_index;
+bool enabled;
+/*
+ * Quota dirty page rate, unit is MB/s
+ * zero if not 

[PATCH v24 5/8] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Introduce the kvm_dirty_ring_size() utility function to help calculate
the dirty ring full time.
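
As a rough illustration of what "dirty ring full time" means here (not the
exact code in this series): with a per-vCPU ring of kvm_dirty_ring_size()
entries, each covering one TARGET_PAGE_SIZE page, a vCPU dirtying memory at
rate_mb_s MB/s fills its ring in roughly:

    static int64_t dirty_ring_full_time_us(uint64_t rate_mb_s)
    {
        /* Ring capacity expressed in MB of guest memory. */
        uint64_t ring_mb = ((uint64_t)kvm_dirty_ring_size() *
                            TARGET_PAGE_SIZE) >> 20;

        return rate_mb_s ? ring_mb * 1000000 / rate_mb_s : INT64_MAX;
    }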

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 accel/kvm/kvm-all.c| 5 +
 accel/stubs/kvm-stub.c | 6 ++
 include/sysemu/kvm.h   | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b13cd27..0c6846b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2313,6 +2313,11 @@ bool kvm_dirty_ring_enabled(void)
 return kvm_state->kvm_dirty_ring_size ? true : false;
 }
 
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 3345882..c5aafaa 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -148,3 +148,9 @@ bool kvm_dirty_ring_enabled(void)
 {
 return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
+#endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c78..efd6dee 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
-- 
1.8.3.1




[PATCH v24 4/8] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Introduce GLOBAL_DIRTY_LIMIT, a third purpose of dirty tracking, used
to calculate the dirty rate periodically for the dirty page rate limit.

Add dirtylimit.c to implement the periodic dirty rate calculation,
which will be used for the dirty page rate limit.

Add dirtylimit.h to export utility functions for the dirty page rate
limit implementation.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  22 +
 softmmu/dirtylimit.c| 116 
 softmmu/meson.build |   1 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 softmmu/dirtylimit.c

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f1c1945..9e4e966 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 000..da459f0
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000/* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 000..6102e8c
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,116 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+VcpuStat stat;
+bool running;
+QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+VcpuStat stat;
+int i = 0;
+
+/* calculate vcpu dirtyrate */
+vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+for (i = 0; i < stat.nvcpu; i++) {
+vcpu_dirty_rate_stat->stat.rates[i].id = i;
+vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+stat.rates[i].dirty_rate;
+}
+
+free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+rcu_register_thread();
+
+/* start log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+vcpu_dirty_rate_stat_collect();
+}
+
+/* stop log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+rcu_unregister_thread();
+return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+return;
+}
+
+qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+   "dirtyrate-stat",
+   vcpu_dirty_rate_stat_thread,
+   NULL,
+   QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+int max_cpus = ms->smp.max_cpus;
+
+vcpu_dirty_rate_stat =
+g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+vcpu_dirty_rate_stat->stat.nvcpu = max_cpus;
+vcpu_dirty_rate_stat->stat.rates =
+g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+vcpu_dirty_rate_stat->running = false;
+}
+
+void 

[PATCH v24 3/8] migration/dirtyrate: Refactor dirty page rate calculation

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Abstract out the dirty log change logic into the function
global_dirty_log_change().

Abstract out the dirty page rate calculation logic via the
dirty ring into the function vcpu_calculate_dirtyrate().

Abstract out the mathematical dirty page rate calculation into
do_calculate_dirtyrate(), decoupling it from DirtyStat.

Rename set_sample_page_period() to dirty_stat_wait(), which is easier
to understand and will be reused in dirtylimit.

Handle the CPU hotplug/unplug scenario during dirty page rate
measurement.

Export the utility functions outside of migration.
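
A hypothetical caller of the exported helper, measuring each vCPU's dirty
rate over one second via the dirty ring in one-shot mode, might look like:

    VcpuStat stat;
    int64_t duration;

    duration = vcpu_calculate_dirtyrate(1000, &stat,
                                        GLOBAL_DIRTY_DIRTY_RATE, true);
    /* stat.rates[i].dirty_rate now holds vCPU i's rate in MB/s. */
    g_free(stat.rates);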

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/sysemu/dirtyrate.h |  28 ++
 migration/dirtyrate.c  | 227 -
 migration/dirtyrate.h  |   7 +-
 3 files changed, 174 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/dirtyrate.h

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 000..4d3b9a4
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,28 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index aace12a..795fab5 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
 {
 int64_t current_time;
 
@@ -60,6 +60,132 @@ static int64_t set_sample_page_period(int64_t msec, int64_t 
initial_time)
 return msec;
 }
 
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+  int64_t calc_time_ms)
+{
+uint64_t memory_size_MB;
+uint64_t increased_dirty_pages =
+dirty_pages.end_pages - dirty_pages.start_pages;
+
+memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+qemu_mutex_lock_iothread();
+if (start) {
+memory_global_dirty_log_start(flag);
+} else {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+if (one_shot) {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+CPUState *cpu;
+DirtyPageRecord *records;
+int nvcpu = 0;
+
+CPU_FOREACH(cpu) {
+nvcpu++;
+}
+
+stat->nvcpu = nvcpu;
+stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+DirtyPageRecord *records,
+bool start)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+record_dirtypages(records, cpu, start);
+}
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t init_time_ms;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+cpu_list_lock();
+gen_id = cpu_list_generation_id_get();
+records = vcpu_dirty_stat_alloc(stat);
+vcpu_dirty_stat_collect(stat, records, true);
+

[PATCH v24 2/8] cpus: Introduce cpu_list_generation_id

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Introduce cpu_list_generation_id to track CPU list generation so
that CPU hotplug/unplug can be detected during measurement of the
dirty page rate.

cpu_list_generation_id can be used to detect changes of the CPU
list, in preparation for dirty page rate measurement.
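
A sketch of the intended use (the real loop is in migration/dirtyrate.c):
record the generation id before a measurement and start over if it changed,
since a hotplug/unplug invalidates the per-vCPU arrays.

    unsigned int gen_id;

retry:
    cpu_list_lock();
    gen_id = cpu_list_generation_id_get();
    /* ... allocate per-vCPU records and take the "start" sample ... */
    cpu_list_unlock();

    /* ... wait for the measurement period ... */

    cpu_list_lock();
    if (gen_id != cpu_list_generation_id_get()) {
        cpu_list_unlock();
        goto retry;     /* vCPU set changed under us, start over */
    }
    /* ... take the "end" sample and compute the rates ... */
    cpu_list_unlock();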

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 cpus-common.c | 8 
 include/exec/cpu-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index db459b4..793364d 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
 }
 
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
 
 void cpu_list_add(CPUState *cpu)
 {
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
 assert(!cpu_index_auto_assigned);
 }
QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
 
QTAILQ_REMOVE_RCU(&cpus, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
 }
 
 CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 5968551..2281be4 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -35,6 +35,7 @@ extern intptr_t qemu_host_page_mask;
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
 
 void tcg_flush_softmmu_tlb(CPUState *cs);
 
-- 
1.8.3.1




[PATCH v24 1/8] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Add an optional 'CPUState' argument to kvm_dirty_ring_reap() so that
it can cover the single-vCPU dirty-ring-reaping scenario.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 32e177b..b13cd27 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -756,17 +756,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, 
CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
 int ret;
-CPUState *cpu;
 uint64_t total = 0;
 int64_t stamp;
 
 stamp = get_clock();
 
-CPU_FOREACH(cpu) {
-total += kvm_dirty_ring_reap_one(s, cpu);
+if (cpu) {
+total = kvm_dirty_ring_reap_one(s, cpu);
+} else {
+CPU_FOREACH(cpu) {
+total += kvm_dirty_ring_reap_one(s, cpu);
+}
 }
 
 if (total) {
@@ -787,7 +790,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
 uint64_t total;
 
@@ -807,7 +810,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
  * reset below.
  */
 kvm_slots_lock();
-total = kvm_dirty_ring_reap_locked(s);
+total = kvm_dirty_ring_reap_locked(s, cpu);
 kvm_slots_unlock();
 
 return total;
@@ -854,7 +857,7 @@ static void kvm_dirty_ring_flush(void)
  * vcpus out in a synchronous way.
  */
 kvm_cpu_synchronize_kick_all();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 trace_kvm_dirty_ring_flush(1);
 }
 
@@ -1398,7 +1401,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  * Not easy.  Let's cross the fingers until it's fixed.
  */
 if (kvm_state->kvm_dirty_ring_size) {
-kvm_dirty_ring_reap_locked(kvm_state);
+kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1470,7 +1473,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(s);
+kvm_dirty_ring_reap(s, NULL);
 qemu_mutex_unlock_iothread();
 
 r->reaper_iteration++;
@@ -2957,7 +2960,7 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 qemu_mutex_unlock_iothread();
 ret = 0;
 break;
-- 
1.8.3.1




[PATCH v24 0/8] support dirty restraint on vCPU

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

v24:
- add "Acked-by: Peter Xu " tag in (PATCH [8/8])

v23:

This is v23 of the dirtylimit series. Since v22 was posted about one month ago,
I made some modifications to make sure it's ready to be queued:

- rebased the master and changed the qapi version tag from 7.0 to 7.1
- do not set an error when query_vcpu_dirty_limit finds the dirty limit not
  in service; returning NULL is sufficient. (PATCH v22 [7/8]).

The following is the history of the patchset; since v22 differs quite a bit from
the original version, I made abstracts of the changelog:

RFC and v1: 
https://lore.kernel.org/qemu-devel/cover.1637214721.git.huang...@chinatelecom.cn/
v2: 
https://lore.kernel.org/qemu-devel/cover.1637256224.git.huang...@chinatelecom.cn/
v1->v2 changelog: 
- rename some function and variables. refactor the original algo of dirtylimit. 
Thanks for
  the comments given by Juan Quintela.
v3: 
https://lore.kernel.org/qemu-devel/cover.1637403404.git.huang...@chinatelecom.cn/
v4: 
https://lore.kernel.org/qemu-devel/cover.1637653303.git.huang...@chinatelecom.cn/
v5: 
https://lore.kernel.org/qemu-devel/cover.1637759139.git.huang...@chinatelecom.cn/
v6: 
https://lore.kernel.org/qemu-devel/cover.1637856472.git.huang...@chinatelecom.cn/
v7: 
https://lore.kernel.org/qemu-devel/cover.1638202004.git.huang...@chinatelecom.cn/
v2->v7 changelog:
- refactor the docs, annotation and fix bugs of the original algo of dirtylimit.
  Thanks for the review given by Markus Armbruster. 
v8: 
https://lore.kernel.org/qemu-devel/cover.1638463260.git.huang...@chinatelecom.cn/
v9: 
https://lore.kernel.org/qemu-devel/cover.1638495274.git.huang...@chinatelecom.cn/
v10: 
https://lore.kernel.org/qemu-devel/cover.1639479557.git.huang...@chinatelecom.cn/
v7->v10 changelog:
- introduce a simpler but more efficient algo of dirtylimit inspired by Peter 
Xu.
- keep polishing the annotation suggested by Markus Armbruster.
v11: 
https://lore.kernel.org/qemu-devel/cover.1641315745.git.huang...@chinatelecom.cn/
v12: 
https://lore.kernel.org/qemu-devel/cover.1642774952.git.huang...@chinatelecom.cn/
v13: 
https://lore.kernel.org/qemu-devel/cover.1644506963.git.huang...@chinatelecom.cn/
v10->v13 changelog:
- handle the hotplug/unplug scenario.
- refactor the new algo, split the commit and make the code more clean.
v14: 
https://lore.kernel.org/qemu-devel/cover.1644509582.git.huang...@chinatelecom.cn/
v13->v14 changelog:
- sent by accident.
v15: 
https://lore.kernel.org/qemu-devel/cover.1644976045.git.huang...@chinatelecom.cn/
 
v16: 
https://lore.kernel.org/qemu-devel/cover.1645067452.git.huang...@chinatelecom.cn/
 
v17: 
https://lore.kernel.org/qemu-devel/cover.1646243252.git.huang...@chinatelecom.cn/
v14->v17 changelog: 
- do some code clean and fix test bug reported by Dr. David Alan Gilbert.
v18: 
https://lore.kernel.org/qemu-devel/cover.1646247968.git.huang...@chinatelecom.cn/
v19: 
https://lore.kernel.org/qemu-devel/cover.1647390160.git.huang...@chinatelecom.cn/
v20: 
https://lore.kernel.org/qemu-devel/cover.1647396907.git.huang...@chinatelecom.cn/
v21: 
https://lore.kernel.org/qemu-devel/cover.1647435820.git.huang...@chinatelecom.cn/
v17->v21 changelog:
- add qtest, fix bug and do code clean. 
v21->v22 changelog:
- move the vcpu dirty limit test into migration-test and do some modification 
suggested
  by Peter.

Please review.

Yong.

Abstract


This patchset introduces a mechanism to impose a dirty restraint on
vCPUs, aiming to keep each vCPU running within a certain dirty rate
given by the user. A per-vCPU dirty restraint may be an alternative
way to implement the convergence logic for live migration, which in
theory could improve guest memory performance during migration
compared with the traditional method.

In the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle percentage increases step by step, which struggles
  to find the optimal throttle percentage when the dirty rate is
  high
- it is hard to predict the remaining migration time once the
  throttling percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these
effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we calculate the vCPU
dirty rate via the Dirty Ring mechanism periodically, as commit
0e21bf246 "implement dirty-ring dirtyrate calculation" does. For a
vCPU on which a dirty restraint has been imposed, we throttle it
periodically as auto-converge does; after each throttling step we
compare the quota dirty rate with the current dirty rate, and if the
current dirty rate is not under the quota, we increase the throttling
percentage until it is (rough sketch below).
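
Stripped to its core, the control loop is roughly the following pseudo-C;
dirtylimit_vcpu_quota(), increase_sleep_time() and decrease_sleep_time()
are illustrative placeholders, not functions from this series:

    while (dirtylimit_in_service()) {
        uint64_t current = vcpu_dirty_rate_get(cpu_index);
        uint64_t quota = dirtylimit_vcpu_quota(cpu_index);

        if (current > quota + DIRTYLIMIT_TOLERANCE_RANGE) {
            increase_sleep_time(cpu_index);       /* throttle harder */
        } else if (quota > current + DIRTYLIMIT_TOLERANCE_RANGE) {
            decrease_sleep_time(cpu_index);       /* back off */
        }
        /* otherwise the error is within tolerance: leave throttle alone */
    }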

This patchset is the basis for implementing a new auto-converge method
for live migration. We introduce two QMP commands to impose/cancel
the dirty restraint on a specified vCPU, so it can also serve as an
independent API to supply the upper app 

[PATCH v23 7/8] softmmu/dirtylimit: Implement dirty page rate limit

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Implement periodic dirty page rate calculation based on the dirty
ring, and throttle the virtual CPU until it reaches the quota dirty
page rate given by the user.

Introduce the QMP commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit"
to enable, disable and query the dirty page limit for virtual CPUs.

Meanwhile, introduce the corresponding HMP commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit" and
"info vcpu_dirty_limit" so the feature is easier to use.

"query-vcpu-dirty-limit" only succeeds when the dirty page rate limit
is enabled, so add it to the list of skipped commands to ensure
qmp-cmd-test keeps passing.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
---
 hmp-commands-info.hx   |  13 +++
 hmp-commands.hx|  32 
 include/monitor/hmp.h  |   3 +
 qapi/migration.json|  80 +++
 softmmu/dirtylimit.c   | 194 +
 tests/qtest/qmp-cmd-test.c |   2 +
 6 files changed, 324 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index adfa085..016717d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -865,6 +865,19 @@ SRST
 Display the vcpu dirty rate information.
 ERST
 
+{
+.name   = "vcpu_dirty_limit",
+.args_type  = "",
+.params = "",
+.help   = "show dirty page limit information of all vCPU",
+.cmd= hmp_info_vcpu_dirty_limit,
+},
+
+SRST
+  ``info vcpu_dirty_limit``
+Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
 {
 .name   = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 564f1de..cb688a6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1754,3 +1754,35 @@ ERST
   "\n\t\t\t -b to specify dirty bitmap as method of 
calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+SRST
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "set_vcpu_dirty_limit",
+.args_type  = "dirty_rate:l,cpu_index:l?",
+.params = "dirty_rate [cpu_index]",
+.help   = "set dirty page rate limit, use cpu_index to set limit"
+  "\n\t\t\t\t\t on a specified virtual cpu",
+.cmd= hmp_set_vcpu_dirty_limit,
+},
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "cancel_vcpu_dirty_limit",
+.args_type  = "cpu_index:l?",
+.params = "[cpu_index]",
+.help   = "cancel dirty page rate limit, use cpu_index to cancel"
+  "\n\t\t\t\t\t limit on a specified virtual cpu",
+.cmd= hmp_cancel_vcpu_dirty_limit,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 96d0148..478820e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,9 @@ void hmp_replay_delete_break(Monitor *mon, const QDict 
*qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error 
**));
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 6130cd9..9c57cc0 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1884,6 +1884,86 @@
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
 ##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate (MB/s) for a virtual
+#  CPU, 0 means unlimited.
+#
+# @current-rate: current dirty page rate (MB/s) for a virtual CPU.
+#
+# Since: 7.1
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+'limit-rate': 'uint64',
+'current-rate': 'uint64' } }
+
+##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPUs.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @cpu-index: index of a virtual CPU, default is all.
+#
+# @dirty-rate: upper limit of dirty page rate (MB/s) for virtual CPUs.
+#
+# Since: 7.1
+#
+# Example:

[PATCH v23 3/8] migration/dirtyrate: Refactor dirty page rate calculation

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Abstract out the dirty log change logic into the function
global_dirty_log_change().

Abstract out the dirty page rate calculation logic via the
dirty ring into the function vcpu_calculate_dirtyrate().

Abstract out the mathematical dirty page rate calculation into
do_calculate_dirtyrate(), decoupling it from DirtyStat.

Rename set_sample_page_period() to dirty_stat_wait(), which is easier
to understand and will be reused in dirtylimit.

Handle the CPU hotplug/unplug scenario during dirty page rate
measurement.

Export the utility functions outside of migration.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/sysemu/dirtyrate.h |  28 ++
 migration/dirtyrate.c  | 227 -
 migration/dirtyrate.h  |   7 +-
 3 files changed, 174 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/dirtyrate.h

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 000..4d3b9a4
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,28 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index aace12a..795fab5 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
 {
 int64_t current_time;
 
@@ -60,6 +60,132 @@ static int64_t set_sample_page_period(int64_t msec, int64_t 
initial_time)
 return msec;
 }
 
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+  int64_t calc_time_ms)
+{
+uint64_t memory_size_MB;
+uint64_t increased_dirty_pages =
+dirty_pages.end_pages - dirty_pages.start_pages;
+
+memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+qemu_mutex_lock_iothread();
+if (start) {
+memory_global_dirty_log_start(flag);
+} else {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+if (one_shot) {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+CPUState *cpu;
+DirtyPageRecord *records;
+int nvcpu = 0;
+
+CPU_FOREACH(cpu) {
+nvcpu++;
+}
+
+stat->nvcpu = nvcpu;
+stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+DirtyPageRecord *records,
+bool start)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+record_dirtypages(records, cpu, start);
+}
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t init_time_ms;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+cpu_list_lock();
+gen_id = cpu_list_generation_id_get();
+records = vcpu_dirty_stat_alloc(stat);
+vcpu_dirty_stat_collect(stat, records, true);
+

Re: [PATCH v23 0/8] support dirty restraint on vCPU

2022-05-18 Thread Hyman Huang

Sent by accident, please ignore; I'll resend when the series is ready.

On 2022/5/19 11:25, huang...@chinatelecom.cn wrote:

From: Hyman Huang(黄勇) 

This is v23 of dirtylimit series. Since v22 posted abount 1 month ago,
i did some modifications to make sure it's ready to be queued:

- rebased the master and changed the qapi version tag from 7.0 to 7.1
- do not set error if when query_vcpu_dirty_limit find dirtylimit not
   in service, returning NULL is sufficient. (PATCH v22 [7/8]).

The following is the history of the patchset, since v22 kind of different from
the original version, i made abstracts of changelog:

RFC and v1: 
https://lore.kernel.org/qemu-devel/cover.1637214721.git.huang...@chinatelecom.cn/
v2: 
https://lore.kernel.org/qemu-devel/cover.1637256224.git.huang...@chinatelecom.cn/
v1->v2 changelog:
- rename some function and variables. refactor the original algo of dirtylimit. 
Thanks for
   the comments given by Juan Quintela.
v3: 
https://lore.kernel.org/qemu-devel/cover.1637403404.git.huang...@chinatelecom.cn/
v4: 
https://lore.kernel.org/qemu-devel/cover.1637653303.git.huang...@chinatelecom.cn/
v5: 
https://lore.kernel.org/qemu-devel/cover.1637759139.git.huang...@chinatelecom.cn/
v6: 
https://lore.kernel.org/qemu-devel/cover.1637856472.git.huang...@chinatelecom.cn/
v7: 
https://lore.kernel.org/qemu-devel/cover.1638202004.git.huang...@chinatelecom.cn/
v2->v7 changelog:
- refactor the docs, annotation and fix bugs of the original algo of dirtylimit.
   Thanks for the review given by Markus Armbruster.
v8: 
https://lore.kernel.org/qemu-devel/cover.1638463260.git.huang...@chinatelecom.cn/
v9: 
https://lore.kernel.org/qemu-devel/cover.1638495274.git.huang...@chinatelecom.cn/
v10: 
https://lore.kernel.org/qemu-devel/cover.1639479557.git.huang...@chinatelecom.cn/
v7->v10 changelog:
- introduce a simpler but more efficient algo of dirtylimit inspired by Peter 
Xu.
- keep polishing the annotation suggested by Markus Armbruster.
v11: 
https://lore.kernel.org/qemu-devel/cover.1641315745.git.huang...@chinatelecom.cn/
v12: 
https://lore.kernel.org/qemu-devel/cover.1642774952.git.huang...@chinatelecom.cn/
v13: 
https://lore.kernel.org/qemu-devel/cover.1644506963.git.huang...@chinatelecom.cn/
v10->v13 changelog:
- handle the hotplug/unplug scenario.
- refactor the new algo, split the commit and make the code more clean.
v14: 
https://lore.kernel.org/qemu-devel/cover.1644509582.git.huang...@chinatelecom.cn/
v13->v14 changelog:
- sent by accident.
v15: 
https://lore.kernel.org/qemu-devel/cover.1644976045.git.huang...@chinatelecom.cn/
v16: 
https://lore.kernel.org/qemu-devel/cover.1645067452.git.huang...@chinatelecom.cn/
v17: 
https://lore.kernel.org/qemu-devel/cover.1646243252.git.huang...@chinatelecom.cn/
v14->v17 changelog:
- do some code clean and fix test bug reported by Dr. David Alan Gilbert.
v18: 
https://lore.kernel.org/qemu-devel/cover.1646247968.git.huang...@chinatelecom.cn/
v19: 
https://lore.kernel.org/qemu-devel/cover.1647390160.git.huang...@chinatelecom.cn/
v20: 
https://lore.kernel.org/qemu-devel/cover.1647396907.git.huang...@chinatelecom.cn/
v21: 
https://lore.kernel.org/qemu-devel/cover.1647435820.git.huang...@chinatelecom.cn/
v17->v21 changelog:
- add qtest, fix bug and do code clean.
v21->v22 changelog:
- move the vcpu dirty limit test into migration-test and do some modification 
suggested
   by Peter.

Please review.

Yong.

Abstract


This patchset introduce a mechanism to impose dirty restraint
on vCPU, aiming to keep the vCPU running in a certain dirtyrate
given by user. dirty restraint on vCPU maybe an alternative
method to implement convergence logic for live migration,
which could improve guest memory performance during migration
compared with traditional method in theory.

For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects.
-'read processes' on vCPU will be unnecessarily penalized
- throttle increase percentage step by step, which seems
   struggling to find the optimal throttle percentage when
   dirtyrate is high.
- hard to predict the remaining time of migration if the
   throttling percentage reachs 99%

to a certain extent, the dirty restraint machnism can fix these
effects by throttling at vCPU granularity during migration.

the implementation is rather straightforward, we calculate
vCPU dirtyrate via the Dirty Ring mechanism periodically
as the commit 0e21bf246 "implement dirty-ring dirtyrate calculation"
does, for vCPU that be specified to impose dirty restraint,
we throttle it periodically as the auto-converge does, once after
throttling, we compare the quota dirtyrate with current dirtyrate,
if current dirtyrate is not under the quota, increase the throttling
percentage until current dirtyrate is under the quota.

this patchset is the basis of implmenting a new auto-converge method
for live migration, we introduce two qmp commands for impose/cancel
the dirty restraint on specified vCPU, so it 

[PATCH v23 2/8] cpus: Introduce cpu_list_generation_id

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.

cpu_list_generation_id could be used to detect changes of cpu
list, which is prepared for dirty page rate measurement.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 cpus-common.c | 8 
 include/exec/cpu-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index db459b4..793364d 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
 }
 
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
 
 void cpu_list_add(CPUState *cpu)
 {
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
 assert(!cpu_index_auto_assigned);
 }
QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
 
QTAILQ_REMOVE_RCU(&cpus, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
 }
 
 CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 5968551..2281be4 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -35,6 +35,7 @@ extern intptr_t qemu_host_page_mask;
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
 
 void tcg_flush_softmmu_tlb(CPUState *cs);
 
-- 
1.8.3.1




[PATCH v23 5/8] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Introduce the kvm_dirty_ring_size() utility function to help calculate
the dirty ring full time.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 accel/kvm/kvm-all.c| 5 +
 accel/stubs/kvm-stub.c | 6 ++
 include/sysemu/kvm.h   | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b13cd27..0c6846b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2313,6 +2313,11 @@ bool kvm_dirty_ring_enabled(void)
 return kvm_state->kvm_dirty_ring_size ? true : false;
 }
 
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 3345882..c5aafaa 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -148,3 +148,9 @@ bool kvm_dirty_ring_enabled(void)
 {
 return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
+#endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c78..efd6dee 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
-- 
1.8.3.1




[PATCH v23 8/8] tests: Add dirty page rate limit test

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Add a dirty page rate limit test that runs if the kernel supports the
dirty ring; create a standalone file to implement the test case.

The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/qtest/migration-helpers.c |  22 
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c| 255 
 3 files changed, 279 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index a6aa59e..4849cba 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -76,6 +76,28 @@ QDict *wait_command(QTestState *who, const char *command, 
...)
 }
 
 /*
+ * Execute the qmp command only
+ */
+QDict *qmp_command(QTestState *who, const char *command, ...)
+{
+va_list ap;
+QDict *resp, *ret;
+
+va_start(ap, command);
+resp = qtest_vqmp(who, command, ap);
+va_end(ap);
+
+g_assert(!qdict_haskey(resp, "error"));
+g_assert(qdict_haskey(resp, "return"));
+
+ret = qdict_get_qdict(resp, "return");
+qobject_ref(ret);
+qobject_unref(resp);
+
+return ret;
+}
+
+/*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
  * qobject_from_jsonf_nofail()) with "uri": @uri spliced in.
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 78587c2..5956189 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -23,6 +23,8 @@ QDict *wait_command_fd(QTestState *who, int fd, const char 
*command, ...);
 G_GNUC_PRINTF(2, 3)
 QDict *wait_command(QTestState *who, const char *command, ...);
 
+QDict *qmp_command(QTestState *who, const char *command, ...);
+
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d33e806..f59d31b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -24,6 +24,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
 #include "crypto/tlscredspsk.h"
+#include "qapi/qmp/qlist.h"
 
 #include "migration-helpers.h"
 #include "tests/migration/migration-test.h"
@@ -58,6 +59,11 @@ static bool uffd_feature_thread_id;
 #include 
 #include 
 #include 
+/*
+ * The dirty limit stops adjusting once the dirty page rate error
+ * value is less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
 static bool ufd_version_check(void)
 {
@@ -2070,6 +2076,253 @@ static void test_multifd_tcp_cancel(void)
 test_migrate_end(from, to2, true);
 }
 
+static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'calc-dirty-rate',"
+  "'arguments': { "
+  "'calc-time': %ld,"
+  "'mode': 'dirty-ring' }}",
+  calc_time));
+}
+
+static QDict *query_dirty_rate(QTestState *who)
+{
+return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+}
+
+static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'set-vcpu-dirty-limit',"
+  "'arguments': { "
+  "'dirty-rate': %ld } }",
+  dirtyrate));
+}
+
+static void cancel_vcpu_dirty_limit(QTestState *who)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+}
+
+static QDict *query_vcpu_dirty_limit(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'query-vcpu-dirty-limit' }");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static bool calc_dirtyrate_ready(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+
+return g_strcmp0(status, "measuring");
+}
+
+static void wait_for_calc_dirtyrate_complete(QTestState *who,
+ int64_t time_s)
+{
+int max_try_count = 10000;
+usleep(time_s * 1000000);
+
+while (!calc_dirtyrate_ready(who) && max_try_count--) {
+usleep(1000);
+}
+
+/*
+ * Set the timeout with 10 s(max_try_count * 1000us),
+ * if dirtyrate measurement not complete, fail test.
+ */
+g_assert_cmpint(max_try_count, !=, 0);
+}
+
+static int64_t get_dirty_rate(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = 

[PATCH v23 1/8] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Add an optional 'CPUState' argument to kvm_dirty_ring_reap so
that it can cover the single-vCPU dirty-ring-reaping scenario.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 32e177b..b13cd27 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -756,17 +756,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
 int ret;
-CPUState *cpu;
 uint64_t total = 0;
 int64_t stamp;
 
 stamp = get_clock();
 
-CPU_FOREACH(cpu) {
-total += kvm_dirty_ring_reap_one(s, cpu);
+if (cpu) {
+total = kvm_dirty_ring_reap_one(s, cpu);
+} else {
+CPU_FOREACH(cpu) {
+total += kvm_dirty_ring_reap_one(s, cpu);
+}
 }
 
 if (total) {
@@ -787,7 +790,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
 uint64_t total;
 
@@ -807,7 +810,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
  * reset below.
  */
 kvm_slots_lock();
-total = kvm_dirty_ring_reap_locked(s);
+total = kvm_dirty_ring_reap_locked(s, cpu);
 kvm_slots_unlock();
 
 return total;
@@ -854,7 +857,7 @@ static void kvm_dirty_ring_flush(void)
  * vcpus out in a synchronous way.
  */
 kvm_cpu_synchronize_kick_all();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 trace_kvm_dirty_ring_flush(1);
 }
 
@@ -1398,7 +1401,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  * Not easy.  Let's cross the fingers until it's fixed.
  */
 if (kvm_state->kvm_dirty_ring_size) {
-kvm_dirty_ring_reap_locked(kvm_state);
+kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1470,7 +1473,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(s);
+kvm_dirty_ring_reap(s, NULL);
 qemu_mutex_unlock_iothread();
 
 r->reaper_iteration++;
@@ -2957,7 +2960,7 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 qemu_mutex_unlock_iothread();
 ret = 0;
 break;
-- 
1.8.3.1




[PATCH v23 6/8] softmmu/dirtylimit: Implement virtual CPU throttle

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Set up a negative feedback system for the vCPU thread's
handling of the KVM_EXIT_DIRTY_RING_FULL exit by introducing
a throttle_us_per_full field in struct CPUState. The vCPU
sleeps throttle_us_per_full microseconds to throttle itself
while dirtylimit is in service.
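In other words, the vCPU-side hook is essentially a conditional sleep.
A minimal sketch of the idea (the full function in the patch below is
truncated in this archive and may do extra bookkeeping such as per-vCPU
enable checks and tracing):

    void dirtylimit_vcpu_execute(CPUState *cpu)
    {
        /* only throttle while dirtylimit is in service and a sleep time is set */
        if (dirtylimit_in_service() && cpu->throttle_us_per_full) {
            usleep(cpu->throttle_us_per_full);
        }
    }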

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 accel/kvm/kvm-all.c |  20 ++-
 include/hw/core/cpu.h   |   6 +
 include/sysemu/dirtylimit.h |  15 +++
 softmmu/dirtylimit.c| 291 
 softmmu/trace-events|   7 ++
 5 files changed, 338 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0c6846b..187c337 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 
 #include "hw/boards.h"
 
@@ -476,6 +477,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -1469,6 +1471,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* keep sleeping so that dirtylimit not be interfered by reaper */
+if (dirtylimit_in_service()) {
+continue;
+}
+
 trace_kvm_dirty_ring_reaper("wakeup");
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
@@ -2965,8 +2972,19 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state, NULL);
+/*
+ * We throttle vCPU by making it sleep once it exit from kernel
+ * due to dirty ring full. In the dirtylimit scenario, reaping
+ * all vCPUs after a single vCPU dirty ring get full result in
+ * the miss of sleep, so just reap the ring-fulled vCPU.
+ */
+if (dirtylimit_in_service()) {
+kvm_dirty_ring_reap(kvm_state, cpu);
+} else {
+kvm_dirty_ring_reap(kvm_state, NULL);
+}
 qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 996f940..500503d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,12 @@ struct CPUState {
  */
 bool throttle_thread_scheduled;
 
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
 bool ignore_memory_transaction_failures;
 
 /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f0..8d2c1f3 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,19 @@ void vcpu_dirty_rate_stat_start(void);
 void vcpu_dirty_rate_stat_stop(void);
 void vcpu_dirty_rate_stat_initialize(void);
 void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_process(void);
+void dirtylimit_change(bool start);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
 #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 6102e8c..76d0b44 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
 #include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+/*
+ * Plus or minus vcpu sleep time linearly if dirty
+ * page rate error value percentage over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, plus or minus a fixed vcpu sleep time.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
 
 struct {
 VcpuStat stat;
@@ -25,6 +45,30 @@ struct {
 QemuThread thread;
 } *vcpu_dirty_rate_stat;
 
+typedef struct VcpuDirtyLimitState {
+int cpu_index;
+bool enabled;
+/*
+ * Quota dirty page rate, unit is MB/s
+ * zero if not 

[PATCH v23 4/8] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

Introduce a third dirty tracking method, GLOBAL_DIRTY_LIMIT,
used to calculate the dirty rate periodically for the dirty
page rate limit.

Add dirtylimit.c to implement the periodic dirty rate
calculation, which will be used for the dirty page rate limit.

Add dirtylimit.h to export utility functions for the dirty
page rate limit implementation.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  22 +
 softmmu/dirtylimit.c| 116 
 softmmu/meson.build |   1 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 softmmu/dirtylimit.c

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f1c1945..9e4e966 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 000..da459f0
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000/* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 000..6102e8c
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,116 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+VcpuStat stat;
+bool running;
+QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+VcpuStat stat;
+int i = 0;
+
+/* calculate vcpu dirtyrate */
+vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+for (i = 0; i < stat.nvcpu; i++) {
+vcpu_dirty_rate_stat->stat.rates[i].id = i;
+vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+stat.rates[i].dirty_rate;
+}
+
+free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+rcu_register_thread();
+
+/* start log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+vcpu_dirty_rate_stat_collect();
+}
+
+/* stop log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+rcu_unregister_thread();
+return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+return;
+}
+
+qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+   "dirtyrate-stat",
+   vcpu_dirty_rate_stat_thread,
+   NULL,
+   QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+int max_cpus = ms->smp.max_cpus;
+
+vcpu_dirty_rate_stat =
+g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+vcpu_dirty_rate_stat->stat.nvcpu = max_cpus;
+vcpu_dirty_rate_stat->stat.rates =
+g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+vcpu_dirty_rate_stat->running = false;
+}
+
+void 

[PATCH v23 0/8] support dirty restraint on vCPU

2022-05-18 Thread huangy81
From: Hyman Huang(黄勇) 

This is v23 of the dirtylimit series. Since v22 was posted about 1 month ago,
I made some modifications to make sure it's ready to be queued:

- rebased the master and changed the qapi version tag from 7.0 to 7.1
- do not set an error when query_vcpu_dirty_limit finds dirtylimit not
  in service; returning NULL is sufficient. (PATCH v22 [7/8]).

The following is the history of the patchset; since v22 is rather different from
the original version, I made a summary of the changelog:

RFC and v1: 
https://lore.kernel.org/qemu-devel/cover.1637214721.git.huang...@chinatelecom.cn/
v2: 
https://lore.kernel.org/qemu-devel/cover.1637256224.git.huang...@chinatelecom.cn/
v1->v2 changelog: 
- rename some function and variables. refactor the original algo of dirtylimit. 
Thanks for
  the comments given by Juan Quintela.
v3: 
https://lore.kernel.org/qemu-devel/cover.1637403404.git.huang...@chinatelecom.cn/
v4: 
https://lore.kernel.org/qemu-devel/cover.1637653303.git.huang...@chinatelecom.cn/
v5: 
https://lore.kernel.org/qemu-devel/cover.1637759139.git.huang...@chinatelecom.cn/
v6: 
https://lore.kernel.org/qemu-devel/cover.1637856472.git.huang...@chinatelecom.cn/
v7: 
https://lore.kernel.org/qemu-devel/cover.1638202004.git.huang...@chinatelecom.cn/
v2->v7 changelog:
- refactor the docs, annotation and fix bugs of the original algo of dirtylimit.
  Thanks for the review given by Markus Armbruster. 
v8: 
https://lore.kernel.org/qemu-devel/cover.1638463260.git.huang...@chinatelecom.cn/
v9: 
https://lore.kernel.org/qemu-devel/cover.1638495274.git.huang...@chinatelecom.cn/
v10: 
https://lore.kernel.org/qemu-devel/cover.1639479557.git.huang...@chinatelecom.cn/
v7->v10 changelog:
- introduce a simpler but more efficient algo of dirtylimit inspired by Peter 
Xu.
- keep polishing the annotation suggested by Markus Armbruster.
v11: 
https://lore.kernel.org/qemu-devel/cover.1641315745.git.huang...@chinatelecom.cn/
v12: 
https://lore.kernel.org/qemu-devel/cover.1642774952.git.huang...@chinatelecom.cn/
v13: 
https://lore.kernel.org/qemu-devel/cover.1644506963.git.huang...@chinatelecom.cn/
v10->v13 changelog:
- handle the hotplug/unplug scenario.
- refactor the new algo, split the commit and make the code more clean.
v14: 
https://lore.kernel.org/qemu-devel/cover.1644509582.git.huang...@chinatelecom.cn/
v13->v14 changelog:
- sent by accident.
v15: 
https://lore.kernel.org/qemu-devel/cover.1644976045.git.huang...@chinatelecom.cn/
 
v16: 
https://lore.kernel.org/qemu-devel/cover.1645067452.git.huang...@chinatelecom.cn/
 
v17: 
https://lore.kernel.org/qemu-devel/cover.1646243252.git.huang...@chinatelecom.cn/
v14->v17 changelog: 
- do some code clean and fix test bug reported by Dr. David Alan Gilbert.
v18: 
https://lore.kernel.org/qemu-devel/cover.1646247968.git.huang...@chinatelecom.cn/
v19: 
https://lore.kernel.org/qemu-devel/cover.1647390160.git.huang...@chinatelecom.cn/
v20: 
https://lore.kernel.org/qemu-devel/cover.1647396907.git.huang...@chinatelecom.cn/
v21: 
https://lore.kernel.org/qemu-devel/cover.1647435820.git.huang...@chinatelecom.cn/
v17->v21 changelog:
- add qtest, fix bug and do code clean. 
v21->v22 changelog:
- move the vcpu dirty limit test into migration-test and do some modification 
suggested
  by Peter.

Please review.

Yong.

Abstract


This patchset introduces a mechanism to impose a dirty restraint
on vCPUs, aiming to keep each vCPU running within a certain dirty
rate given by the user. Dirty restraint on vCPUs may be an
alternative method to implement the convergence logic for live
migration, which in theory could improve guest memory performance
during migration compared with the traditional method.

For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle increases its percentage step by step, and seems
  to struggle to find the optimal throttle percentage when the
  dirty rate is high
- it is hard to predict the remaining migration time if the
  throttling percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these
effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we calculate the
vCPU dirty rate periodically via the Dirty Ring mechanism, as
commit 0e21bf246 "implement dirty-ring dirtyrate calculation"
does. For a vCPU that has been told to impose a dirty restraint,
we throttle it periodically as auto-converge does; after each
throttling step we compare the quota dirty rate with the current
dirty rate, and if the current dirty rate is not yet under the
quota, we increase the throttling percentage until it is.
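In pseudo-C, the per-vCPU feedback step sketched above amounts to
something like this (an illustration of the idea only, not the exact
code of the series; adjust_sleep() is a hypothetical helper standing
in for the linear/fixed adjustment described in patch 6/8):

    /* run once per calculation period for each restrained vCPU */
    current = vcpu_dirty_rate_get(cpu_index);  /* measured via the dirty ring */
    if (current > quota) {
        /* dirtying too fast: sleep longer after each dirty-ring-full exit */
        cpu->throttle_us_per_full += adjust_sleep(current, quota);
    } else if (current < quota) {
        /* back under the quota: relax the sleep time again */
        cpu->throttle_us_per_full -= adjust_sleep(current, quota);
    }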

This patchset is the basis for implementing a new auto-converge method
for live migration. We introduce two QMP commands to impose/cancel
the dirty restraint on a specified vCPU, so it can also serve as an
independent API for upper-layer applications such as libvirt, which can
use it to implement the 

Re: [RFC 1/6] qapi/migration: Introduce vcpu-dirtylimit-period parameters

2022-05-18 Thread Hyman Huang




On 2022/5/18 23:05, Eric Blake wrote:

On Tue, May 17, 2022 at 02:35:01PM +0800, huang...@chinatelecom.cn wrote:

From: Hyman Huang(黄勇) 

Introduce "vcpu-dirtylimit-period" migration parameters,
which is used to makes dirtyrate calculation period


make


configurable.

To implement that, refactor vcpu_dirty_rate_stat_collect
so that period can be configured instead of hardcode.


hardcoded



Meanwhile, introduce migrate_dirtylimit function to help
check if dirtylimit enabled during live migration, set
it false by default.

Signed-off-by: Hyman Huang(黄勇) 
---


Focusing just on UI...


+++ b/qapi/migration.json
@@ -760,6 +760,9 @@
  #block device name if there is one, and to their node 
name
  #otherwise. (Since 5.2)
  #
+# @vcpu-dirtylimit-period: Periodic time (ms) of dirtylimit during live migration.
+#  Defaults to 500ms. (Since 7.0)


The next release is 7.1.  You'll need to fix this and all other references.

OK, I'll fix that in v1.


Do we want 'dirty-limit' instead of 'dirtylimit'?  There was a recent
thread on how to translate QAPI to other languages that are a bit more
insistent on MixedCase, where properly separating English words makes
it easier to translate to the expected case.


Changing the parameter name sounds OK to me; I'm not insistent on that.

  ##
  # @migrate-set-parameters:
@@ -1125,6 +1132,9 @@
  #block device name if there is one, and to their node 
name
  #otherwise. (Since 5.2)
  #
+# @vcpu-dirtylimit-period: Periodic time (ms) of dirtylimit during live migration.
+#  Defaults to 500ms. (Since 7.0)
+#
  # Features:
  # @unstable: Member @x-checkpoint-delay is experimental.


Is this feature ready for prime time, or do we want to initially name
it x-vcpu-dirty[-]limit-period to mark it experimental?
Indeed, for this fresh new feature, finding the factors affecting migration
needs more practice, so marking it experimental would be much safer; I'll do
that in v1.
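For illustration only, setting the period would then look something like
this (assuming the parameter keeps the name vcpu-dirtylimit-period; per
the discussion above it may be renamed or given an x- prefix, and the
1000 ms value is just an example):

    -> { "execute": "migrate-set-parameters",
         "arguments": { "vcpu-dirtylimit-period": 1000 } }
    <- { "return": {} }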




--
Best regard

Hyman Huang(黄勇)



Re: [PATCH v4 34/43] hw/intc: Add LoongArch extioi interrupt controller(EIOINTC)

2022-05-18 Thread yangxiaojuan

Hi Richard

On 2022/5/19 2:04 AM, Richard Henderson wrote:


+    uint64_t sw_isr[LOONGARCH_MAX_VCPUS][LS3A_INTC_IP][EXTIOI_IRQS / 64];


This has not been declared with DECLARE_BITMAP, therefore you will see 
a compile-time error when building on an ILP32 (i686) or P64 (win64) 
host.


I pointed this out to you as recently as v2 of this series.
I am really disappointed to see this regress in just one month.

You can test this yourself with

  IMAGES='fedora-i386-cross fedora-win32-cross fedora-win64-cross' \
  make docker-test-build

Please do so before your next submission. 

Thank you for your patient guidance, we will carefully correct them.

Thanks.
Xiaojuan



Re: [PATCH V2 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH

2022-05-18 Thread Jason Wang
On Wed, May 18, 2022 at 9:09 PM Dr. David Alan Gilbert
 wrote:
>
> * Jason Wang (jasow...@redhat.com) wrote:
> > On Sat, May 7, 2022 at 10:03 AM Zhang, Chen  wrote:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Zhang, Chen
> > > > Sent: Wednesday, April 27, 2022 5:26 PM
> > > > To: Jason Wang ; Paolo Bonzini
> > > > 
> > > > Cc: Li Zhijian ; qemu-dev  > > > de...@nongnu.org>; Like Xu 
> > > > Subject: RE: [PATCH V2 1/4] softmmu/runstate.c: add RunStateTransition
> > > > support form COLO to PRELAUNCH
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Jason Wang 
> > > > > Sent: Wednesday, April 27, 2022 4:57 PM
> > > > > To: Zhang, Chen 
> > > > > Cc: Li Zhijian ; qemu-dev  > > > > de...@nongnu.org>; Like Xu 
> > > > > Subject: Re: [PATCH V2 1/4] softmmu/runstate.c: add RunStateTransition
> > > > > support form COLO to PRELAUNCH
> > > > >
> > > > > On Fri, Apr 1, 2022 at 11:59 AM Zhang Chen  
> > > > > wrote:
> > > > > >
> > > > > > If the checkpoint occurs when the guest finishes restarting but has
> > > > > > not started running, the runstate_set() may reject the transition
> > > > > > from COLO to PRELAUNCH with the crash log:
> > > > > >
> > > > > > {"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
> > > > > > "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
> > > > > > qemu-system-x86_64: invalid runstate transition: 'colo' -> 
> > > > > > 'prelaunch'
> > > > > >
> > > > > > Long-term testing says that it's pretty safe.
> > > > > >
> > > > > > Signed-off-by: Like Xu 
> > > > > > Signed-off-by: Zhang Chen 
> > > > >
> > > > > I'd expect this to get ack from the relevant maintainers.
> > > > >
> > > >
> > > > The scripts/get_maintainer.pl can't find relevant maintainers for this 
> > > > patch.
> > > > Maybe Paolo have time to cover this simple patch related to runstate?
> > >
> > > No news for a while, any comments for unmaintained files changes ?
> > > Ping...
> >
> > Adding David and Juan.
>
> This looks OK to me;
>
> Acked-by: Dr. David Alan Gilbert 

Great.

>
> it should be fine to merge it along with the pull that takes the other
> patches.

Yes, I've queued this series.

Thanks

>
> Dave
>
> > Thanks
> >
> > >
> > > Thanks
> > > Chen
> > >
> > > >
> > > > Thanks
> > > > Chen
> > > >
> > > > > Thanks
> > > > >
> > > > > > ---
> > > > > >  softmmu/runstate.c | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/softmmu/runstate.c b/softmmu/runstate.c index
> > > > > > e0d869b21a..c021c56338 100644
> > > > > > --- a/softmmu/runstate.c
> > > > > > +++ b/softmmu/runstate.c
> > > > > > @@ -127,6 +127,7 @@ static const RunStateTransition
> > > > > runstate_transitions_def[] = {
> > > > > >  { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
> > > > > >
> > > > > >  { RUN_STATE_COLO, RUN_STATE_RUNNING },
> > > > > > +{ RUN_STATE_COLO, RUN_STATE_PRELAUNCH },
> > > > > >  { RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
> > > > > >
> > > > > >  { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
> > > > > > --
> > > > > > 2.25.1
> > > > > >
> > >
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>




Re: [PULL 07/11] sysemu: tpm: Add a stub function for TPM_IS_CRB

2022-05-18 Thread Stefan Berger




On 5/18/22 21:17, Alex Williamson wrote:

On Wed, 18 May 2022 20:46:02 -0400
Stefan Berger  wrote:


On 5/6/22 16:48, Alex Williamson wrote:

From: Eric Auger 

In a subsequent patch, VFIO will need to recognize if
a memory region owner is a TPM CRB device. Hence VFIO
needs to use TPM_IS_CRB() even if CONFIG_TPM is unset. So
let's add a stub function.

Signed-off-by: Eric Auger 
Suggested-by: Cornelia Huck 
Reviewed-by: Stefan Berger 


Oops, a typo here: linnux -> linux


Yup, sorry, didn't notice until it was too late, but that was as
provided by you:


I know...



https://lore.kernel.org/all/96940f79-a6e0-d14f-5d74-abe280846...@linux.ibm.com/

Thanks,

Alex





Re: [PULL 07/11] sysemu: tpm: Add a stub function for TPM_IS_CRB

2022-05-18 Thread Alex Williamson
On Wed, 18 May 2022 20:46:02 -0400
Stefan Berger  wrote:

> On 5/6/22 16:48, Alex Williamson wrote:
> > From: Eric Auger 
> > 
> > In a subsequent patch, VFIO will need to recognize if
> > a memory region owner is a TPM CRB device. Hence VFIO
> > needs to use TPM_IS_CRB() even if CONFIG_TPM is unset. So
> > let's add a stub function.
> > 
> > Signed-off-by: Eric Auger 
> > Suggested-by: Cornelia Huck 
> > Reviewed-by: Stefan Berger   
> 
> Oops, a typo here: linnux -> linux

Yup, sorry, didn't notice until it was too late, but that was as
provided by you:

https://lore.kernel.org/all/96940f79-a6e0-d14f-5d74-abe280846...@linux.ibm.com/

Thanks,

Alex




Re: [PULL 07/11] sysemu: tpm: Add a stub function for TPM_IS_CRB

2022-05-18 Thread Stefan Berger




On 5/6/22 16:48, Alex Williamson wrote:

From: Eric Auger 

In a subsequent patch, VFIO will need to recognize if
a memory region owner is a TPM CRB device. Hence VFIO
needs to use TPM_IS_CRB() even if CONFIG_TPM is unset. So
let's add a stub function.

Signed-off-by: Eric Auger 
Suggested-by: Cornelia Huck 
Reviewed-by: Stefan Berger 


Oops, a typo here: linnux -> linux



RE: [PATCH] target/arm: Make number of counters in PMCR follow the CPU

2022-05-18 Thread ishii.shuuic...@fujitsu.com
> Thanks for looking up the a64fx register value. You don't need to
> do anything more -- I'll fix up the TODO comment and put the right
> value into this patch, either when I post a v2 of it or else when
> I apply it to target-arm.next.

I understand.
Thank you in advance.

Shuuichirou.

> -Original Message-
> From: Peter Maydell 
> Sent: Wednesday, May 18, 2022 7:31 PM
> To: Ishii, Shuuichirou/石井 周一郎 
> Cc: Alex Bennée ; Itaru Kitayama
> ; qemu-...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH] target/arm: Make number of counters in PMCR follow the
> CPU
> 
> On Wed, 18 May 2022 at 00:24, ishii.shuuic...@fujitsu.com
>  wrote:
> >
> > Hi, Peter.
> >
> > > Shuuichirou, Itaru: this is another patch where we need to know
> > > an A64FX register value...
> >
> > Sorry for the late reply.
> >
> > The initial value of the pmcr_el0 register in A64FX is 0x46014040.
> >
> > After applying this Peter's patch, should we submit a new patch as a64fx 
> > patch
> from us?
> > or do you want to fix your own modifications to the patch that peter has 
> > posted?
> > Which is the best procedure?
> 
> Thanks for looking up the a64fx register value. You don't need to
> do anything more -- I'll fix up the TODO comment and put the right
> value into this patch, either when I post a v2 of it or else when
> I apply it to target-arm.next.
> 
> -- PMM


Re: [PATCH 1/2] target/m68k: Clear mach in m68k_cpu_disas_set_info

2022-05-18 Thread Laurent Vivier

Le 30/04/2022 à 19:02, Richard Henderson a écrit :

Zero selects all cpu features in disas/m68k.c,
which is really what we want -- not limited to 68040.

Signed-off-by: Richard Henderson 
---
  target/m68k/cpu.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index c7aeb7da9c..5671067923 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -75,12 +75,8 @@ static void m68k_cpu_reset(DeviceState *dev)
  
  static void m68k_cpu_disas_set_info(CPUState *s, disassemble_info *info)
  {
-M68kCPU *cpu = M68K_CPU(s);
-CPUM68KState *env = &cpu->env;
  info->print_insn = print_insn_m68k;
-if (m68k_feature(env, M68K_FEATURE_M68000)) {
-info->mach = bfd_mach_m68040;
-}
+info->mach = 0;
  }
  
  /* CPU models */


Reviewed-by: Laurent Vivier 



Re: [PULL 0/7] Artist cursor fix final patches

2022-05-18 Thread Richard Henderson

On 5/18/22 09:17, Helge Deller wrote:

The following changes since commit 178bacb66d98d9ee7a702b9f2a4dfcd88b72a9ab:

   Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into 
staging (2022-05-09 11:07:04 -0700)

are available in the Git repository at:

   https://github.com/hdeller/qemu-hppa.git 
tags/artist-cursor-fix-final-pull-request

for you to fetch changes up to 9ef2c6b4ab9a430ee230ee305b6994d7eae86fe7:

   artist: Fix X cursor position calculation in X11 (2022-05-16 16:46:25 +0200)


hppa: Artist graphics driver fixes for HP-UX and keyboard fix in firmware boot 
console

This series updates the SeaBIOS-hppa firmware to version 5, in which additional
HP fonts were added to the firmware and the firmware boot console was fixed to
accept input from the emulated PS/2 keyboard when running in graphical mode
(serial console was working before already). To test use the "-boot menu=on"
qemu option.

The artist graphics card driver got various fixes when running the X11-Windows
on HP-UX:
- fixes the horizontal and vertical positioning of the X11 cursor with HP-UX
- allows X11 to blank the screen (e.g. screensaver)
- allows the X11 driver to turn the X11 cursor on/off


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~




Re: [PATCH v4 39/43] hw/loongarch: Add LoongArch load elf function.

2022-05-18 Thread Richard Henderson

On 5/17/22 04:30, Xiaojuan Yang wrote:

@@ -26,6 +31,10 @@ struct LoongArchMachineState {
  MemoryRegion lowmem;
  MemoryRegion highmem;
  MemoryRegion isa_io;
+MemoryRegion bios;


This is unused.

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH v4 38/43] hw/loongarch: Add LoongArch ls7a rtc device support

2022-05-18 Thread Richard Henderson

On 5/17/22 04:30, Xiaojuan Yang wrote:

This patch add ls7a rtc device support.

Signed-off-by: Xiaojuan Yang 
Signed-off-by: Song Gao 
---
  MAINTAINERS|   1 +
  hw/loongarch/Kconfig   |   1 +
  hw/loongarch/loongson3.c   |   4 +
  hw/rtc/Kconfig |   3 +
  hw/rtc/ls7a_rtc.c  | 526 +
  hw/rtc/meson.build |   1 +
  include/hw/pci-host/ls7a.h |   4 +
  7 files changed, 540 insertions(+)
  create mode 100644 hw/rtc/ls7a_rtc.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8c3cac8d20..6e03a8bca8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1137,6 +1137,7 @@ F: include/hw/loongarch/virt.h
  F: include/hw/intc/loongarch_*.h
  F: hw/intc/loongarch_*.c
  F: include/hw/pci-host/ls7a.h
+F: hw/rtc/ls7a_rtc.c
  
  M68K Machines
  -------------
diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
index 8552ff4bee..35b6680772 100644
--- a/hw/loongarch/Kconfig
+++ b/hw/loongarch/Kconfig
@@ -13,3 +13,4 @@ config LOONGARCH_VIRT
  select LOONGARCH_PCH_PIC
  select LOONGARCH_PCH_MSI
  select LOONGARCH_EXTIOI
+select LS7A_RTC
diff --git a/hw/loongarch/loongson3.c b/hw/loongarch/loongson3.c
index 7bc17113dc..2c04ddeadd 100644
--- a/hw/loongarch/loongson3.c
+++ b/hw/loongarch/loongson3.c
@@ -97,6 +97,10 @@ static void loongarch_devices_init(DeviceState *pch_pic)
   * Create some unimplemented devices to emulate this.
   */
  create_unimplemented_device("pci-dma-cfg", 0x1001041c, 0x4);
+
+sysbus_create_simple("ls7a_rtc", LS7A_RTC_REG_BASE,
+ qdev_get_gpio_in(pch_pic,
+ LS7A_RTC_IRQ - PCH_PIC_IRQ_OFFSET));
  }
  
  static void loongarch_irq_init(LoongArchMachineState *lams)

diff --git a/hw/rtc/Kconfig b/hw/rtc/Kconfig
index 730c272bc5..d0d8dda084 100644
--- a/hw/rtc/Kconfig
+++ b/hw/rtc/Kconfig
@@ -27,3 +27,6 @@ config SUN4V_RTC
  
  config GOLDFISH_RTC

  bool
+
+config LS7A_RTC
+bool
diff --git a/hw/rtc/ls7a_rtc.c b/hw/rtc/ls7a_rtc.c
new file mode 100644
index 00..398afdc8b0
--- /dev/null
+++ b/hw/rtc/ls7a_rtc.c
@@ -0,0 +1,526 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Loongarch LS7A Real Time Clock emulation
+ *
+ * Copyright (C) 2021 Loongson Technology Corporation Limited
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/irq.h"
+#include "include/hw/register.h"
+#include "qemu/timer.h"
+#include "sysemu/sysemu.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+#include "migration/vmstate.h"
+#include "hw/misc/unimp.h"
+#include "sysemu/rtc.h"
+#include "hw/registerfields.h"
+
+#define SYS_TOYTRIM    0x20
+#define SYS_TOYWRITE0  0x24
+#define SYS_TOYWRITE1  0x28
+#define SYS_TOYREAD0   0x2C
+#define SYS_TOYREAD1   0x30
+#define SYS_TOYMATCH0  0x34
+#define SYS_TOYMATCH1  0x38
+#define SYS_TOYMATCH2  0x3C
+#define SYS_RTCCTRL    0x40
+#define SYS_RTCTRIM    0x60
+#define SYS_RTCWRTIE0  0x64
+#define SYS_RTCREAD0   0x68
+#define SYS_RTCMATCH0  0x6C
+#define SYS_RTCMATCH1  0x70
+#define SYS_RTCMATCH2  0x74
+
+#define LS7A_RTC_FREQ 32768
+#define TIMER_NUMS    3
+/*
+ * Shift bits and filed mask
+ */
+
+FIELD(TOY, MON, 26, 6)
+FIELD(TOY, DAY, 21, 5)
+FIELD(TOY, HOUR, 16, 5)
+FIELD(TOY, MIN, 10, 6)
+FIELD(TOY, SEC, 4, 6)
+FIELD(TOY, MSEC, 0, 4)
+
+FIELD(TOY_MATCH, YEAR, 26, 6)
+FIELD(TOY_MATCH, MON, 22, 4)
+FIELD(TOY_MATCH, DAY, 17, 5)
+FIELD(TOY_MATCH, HOUR, 12, 5)
+FIELD(TOY_MATCH, MIN, 6, 6)
+FIELD(TOY_MATCH, SEC, 0, 6)
+
+FIELD(RTC_CTRL, RTCEN, 13, 1)
+FIELD(RTC_CTRL, TOYEN, 11, 1)
+FIELD(RTC_CTRL, EO, 8, 1)
+
+#define TYPE_LS7A_RTC "ls7a_rtc"
+OBJECT_DECLARE_SIMPLE_TYPE(LS7ARtcState, LS7A_RTC)
+
+typedef struct LS7ARtcTimer {
+QEMUTimer *timer;
+int64_t save_offset;
+int64_t enable_offset;
+int32_t flag;
+LS7ARtcState *d;
+} LS7ARtcTimer;
+
+struct LS7ARtcState {
+SysBusDevice parent_obj;
+
+MemoryRegion iomem;
+/*
+ * Needed to preserve the tick_count across migration, even if the
+ * absolute value of the rtc_clock is different on the source and
+ * destination.
+ */
+int64_t offset_toy;
+int64_t offset_rtc;
+int64_t data;
+int tidx;
+uint32_t toymatch[3];
+uint32_t toytrim;
+uint32_t cntrctl;
+uint32_t rtctrim;
+uint32_t rtccount;
+uint32_t rtcmatch[3];
+LS7ARtcTimer toy_timer[TIMER_NUMS];
+LS7ARtcTimer rtc_timer[TIMER_NUMS];
+qemu_irq irq;
+};
+
+static int64_t ls7a_rtc_ticks(void)
+{
+return qemu_clock_get_ms(rtc_clock) * LS7A_RTC_FREQ / 1000;
+}
+
+static uint64_t ls7a_rtc_read(void *opaque, hwaddr addr, unsigned size)
+{
+LS7ARtcState *s = LS7A_RTC(opaque);
+struct tm tm;
+int val = 0;
+
+switch (addr) {
+case SYS_TOYREAD0:
+qemu_get_timedate(&tm, s->offset_toy);
+val = FIELD_DP32(val, TOY, MON, tm.tm_mon + 1);
+val = FIELD_DP32(val, TOY, DAY, tm.tm_mday);
+val = FIELD_DP32(val, 

Re: The fate of iotest 297

2022-05-18 Thread John Snow
On Wed, May 18, 2022, 12:37 PM Kevin Wolf  wrote:

> Am 18.05.2022 um 01:28 hat John Snow geschrieben:
> > Hi Kevin,
> >
> > I remember that you wanted some minimum Niceness threshold in order to
> > agree to me removing iotest 297.
> >
> > I've already moved it onto GitLab CI in the form of the
> > check-python-pipenv job, but I recall you wanted to be able to run it
> > locally as well before agreeing to axe 297. I remember that you didn't
> > think that running "make check-pipenv" from the python directory was
> > sufficiently Nice enough.
> >
> > Do you need it to be part of "make check", or are you OK with
> > something like "make check-python" from the build directory?
> >
> > I have a bit more work to do if you want it to be part of 'make check'
> > (if you happen to have the right packages installed), but it's pretty
> > easy right now to give you a 'make check-python' (where I just
> > forcibly install those packages to the testing venv.)
>
> Hm, what is the reason for 'make check-python' not being part of 'make
> check'?
>

Oh, it just needs more logic so that it performs correctly in RPM building
environments. As a manual test, I'm free to just grab stuff from PyPI and
build a venv to some precise specification and automate it. This is how
"check-pipenv" and "check-tox" work. The RPM environment can't dial out to
PyPI, so it shouldn't try any venv-based tests by default.

To wire it up to "make check" by *default*, I believe I need to expand the
configure script to poll for certain requisites and then create some
wrapper script of some kind that only engages the python tests if the
requisites were met ... and I lose some control over the mypy/pylint
versioning windows. I have to tolerate a wider versioning, or it'll never
get run in practice.

I have some reluctance to doing this, because pylint and mypy change so
frequently that I don't want "make check" to fail spuriously in the future.

(In practice, these failures occur 100% of the time when I am on vacation.)

The gitlab ci job check-python-tox pulls whatever the latest and greatest
are, and these jobs fail so constantly we had to mark the job as optional.
The check-pipenv job by contrast is extremely stable (its still must-pass)
because it can concoct its own lil' universe.

So, I can add something to make check by default but it needs some
scaffolding to skip the test based on environment, and I have some
reliability concerns.

Ultimately, I don't believe tolerating a wide matrix for mypy/pylint really
adds any value to 297; it only really matters if a specific environment
comes up green, and that a developer like you or I can replicate that test
locally and quickly.

That said ... maybe I can add a controlled venv version of "check-python"
and just have a --disable-check-python or something that spec files can opt
into. Maybe that will work well enough?

i.e. maybe configure can check for the presence of pip, the python venv
module (debian doesn't ship it by default...), and PyPI connectivity, and if
so, enable the test. Otherwise, we skip it.

Something like that.



> I'm currently running two things locally, 'make check' (which is the
> generic one that everyone should run) and iotests (for which it is
> reasonable enough that I need to run it separately because it's the
> special thing for my own subsystem).
>

Pretty much exactly what I do. (Except I run the python tests these days,
too.)


> Now adding a third one 'make check-python' certainly isn't the end of
> the world, but it's not really something that is tied to my subsystem
> any more. Having to run test cases separately for other subsystems
> doesn't really scale for me, so I would prefer not to start doing that.
> I can usually get away with not running the more special tests of other
> subsystems before the pull request because I'm unlikely to break things
> in other subsystems, but Python style warnings are easy to get.
>

Reasonable. I already forget to run things like avocado and vm tests, and I
am sympathetic to not wanting to expand the list of manually run tests.

(What avocado and vm tests have in common is that they need to fetch stuff
from the internet, which I am learning makes them unsuitable for make
check, which must work without internet. """Coincidentally""", tests that
require internet seem to break an awful lot more often because they are
getting run a lot less and in fewer places.)


> If we're going to have 'make check-python' separate, but CI checks it,
> we'll get pull requests that don't pass it and would then only be caught
> by CI after a long test run, requiring a v2 pull request. I feel for
> something that checks things like style (and will fail frequently on
> code where nobody ran the check before), that's a bit too unfriendly.
>
> Kevin
>

Got it. I'll see what I can come up with that checks the boxes for
everyone, thanks for clarifying yours.

I want to make everything "just work" but I'm also afraid of writing too
much magic crap that could 

Re: [PATCH v4 34/43] hw/intc: Add LoongArch extioi interrupt controller(EIOINTC)

2022-05-18 Thread Richard Henderson

On 5/17/22 04:30, Xiaojuan Yang wrote:

+static void extioi_update_irq(LoongArchExtIOI *s, int irq_num, int level)
+{
+int ipnum, cpu, found, irq_index, irq_mask;
+
+ipnum = get_ipmap(s, irq_num);
+cpu = get_coremap(s, irq_num);
+irq_index = irq_num / 32;
+irq_mask = 1 << (irq_num & 0x1f);
+
+if (level) {
+/* if not enable return false */
+if (((s->enable[irq_index]) & irq_mask) == 0) {
+s->sw_pending[irq_index] |= irq_mask;
+return;
+}
+s->coreisr[cpu][irq_index] |= irq_mask;
+found = find_first_bit(s->sw_isr[cpu][ipnum], EXTIOI_IRQS);


AGAIN!  You CANNOT use only part of the bitmap.h interface.


+uint64_t sw_isr[LOONGARCH_MAX_VCPUS][LS3A_INTC_IP][EXTIOI_IRQS / 64];


This has not been declared with DECLARE_BITMAP, therefore you will see a compile-time 
error when building on an ILP32 (i686) or P64 (win64) host.
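For reference, one way to make the field portable is to let DECLARE_BITMAP
size the innermost dimension (a sketch of the shape of the fix, not
necessarily the exact declaration that will land; DECLARE_BITMAP expands to
an unsigned long array of BITS_TO_LONGS(nbits) elements):

    /* in LoongArchExtIOI: one bitmap per (vcpu, priority) pair */
    DECLARE_BITMAP(sw_isr[LOONGARCH_MAX_VCPUS][LS3A_INTC_IP], EXTIOI_IRQS);

With that declaration, calls such as
find_first_bit(s->sw_isr[cpu][ipnum], EXTIOI_IRQS) keep working on both
32-bit and 64-bit hosts.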


I pointed this out to you as recently as v2 of this series.
I am really disappointed to see this regress in just one month.

You can test this yourself with

  IMAGES='fedora-i386-cross fedora-win32-cross fedora-win64-cross' \
  make docker-test-build

Please do so before your next submission.


+static void extioi_writew(void *opaque, hwaddr addr,
+  uint64_t val, unsigned size)
+{
+LoongArchExtIOI *s = LOONGARCH_EXTIOI(opaque);
+int i, cpu, index, old_data, data_offset;
+int old_ip, new_ip, old_core, new_core, irq_mask, irq_num;
+uint32_t offset;
+int old_ipnum[128], old_cpu[4];
+trace_loongarch_extioi_writew(size, (uint32_t)addr, val);
+
+offset = addr & 0xffff;
+
+switch (offset) {
+case EXTIOI_NODETYPE_START ... EXTIOI_NODETYPE_END - 1:
+index = (offset - EXTIOI_NODETYPE_START) >> 2;
+s->nodetype[index] = val;
+break;
+case EXTIOI_IPMAP_START ... EXTIOI_IPMAP_END - 1:
+/* get irq number */
+offset -= EXTIOI_IPMAP_START;
+index = offset >> 2;
+/*
+ * 4 bytes writing, set 4 irq groups one time,
+ * and one group is 32 irqs, so set 128 irqs mapping
+ */
+for (i = 0; i < 128; i++) {
+old_ipnum[i] = get_ipmap(s, offset);
+offset += 1;
+}


You increment offset in the first loop,


+s->ipmap[index] = val;
+offset -= 128;
+/* if core isr is set, need to update irq */
+for (i = 0; i < 128; i++) {
+old_ip = old_ipnum[i];
+new_ip = get_ipmap(s, offset);
+cpu = get_coremap(s, offset);
+if (old_ip != new_ip) {
+if (s->coreisr[cpu][offset / 32] & (1 << (offset & 0x1f))) {
+extioi_update_irq(s, offset, 1);
+}
+}
+}


but not the second.


+case EXTIOI_ENABLE_START ... EXTIOI_ENABLE_END - 1:
+index = (offset - EXTIOI_ENABLE_START) >> 2;
+old_data = s->enable[index];
+if (old_data != (int)val) {
+s->enable[index] = val;
+/* get data diff */
+old_data ^= val;
+/* value change from 0 to 1 */
+old_data &= val;
+data_offset = ctz32(old_data);
+while (data_offset != 32) {
+/*
+ * enable bit change from 0 to 1,
+ * need to update irq by pending bits
+ */
+irq_num = data_offset + index * 32;
+irq_mask = 1 << data_offset;
+if (s->sw_pending[index] & irq_mask) {
+extioi_update_irq(s, irq_num, 1);
+s->sw_pending[index] &= ~irq_mask;
+}
+old_data &= ~irq_mask;
+data_offset = ctz32(old_data);
+}


I'm still not convinced.  Why would unmasking (enabling) an irq call update_irq, but 
masking (disabling) the same irq not also call update_irq?


Even if that is correct, the testing of sw_pending could be done in parallel, 
like

old_data &= s->sw_pending[index];

before the loop, instead of testing each bit one at a time within the loop.

The loop itself should be written

while (old_data) {
data_offset = ctz32(old_data);
...
old_data &= old_data - 1;
}

so that you don't bother computing ctz for zero values, and with the adjustment to 
old_data before the loop, you don't need irq_mask within the loop.
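Putting those two suggestions together, the enable path could look roughly
like this (a sketch only, keeping the variable names of the patch and
assuming, as the original code does, that only 0 -> 1 transitions with a
pending soft interrupt need an update):

    /* old_data already holds the bits that changed from 0 to 1 */
    old_data &= s->sw_pending[index];
    while (old_data) {
        data_offset = ctz32(old_data);
        irq_num = data_offset + index * 32;
        extioi_update_irq(s, irq_num, 1);
        s->sw_pending[index] &= ~(1U << data_offset);
        old_data &= old_data - 1;   /* clear the lowest set bit */
    }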


Likewise with the updates to COREISR and COREMAP.


r~



Re: [PATCH v4 25/43] target/loongarch: Add LoongArch CSR instruction

2022-05-18 Thread Richard Henderson

On 5/17/22 04:30, Xiaojuan Yang wrote:

This includes:
- CSRRD
- CSRWR
- CSRXCHG

Signed-off-by: Xiaojuan Yang
Signed-off-by: Song Gao
---
  target/loongarch/csr_helper.c |  87 ++
  target/loongarch/disas.c  | 101 +++
  target/loongarch/helper.h |   8 +
  .../insn_trans/trans_privileged.c.inc | 264 ++
  target/loongarch/insns.decode |  13 +
  target/loongarch/meson.build  |   1 +
  target/loongarch/translate.c  |  11 +-
  7 files changed, 484 insertions(+), 1 deletion(-)
  create mode 100644 target/loongarch/csr_helper.c
  create mode 100644 target/loongarch/insn_trans/trans_privileged.c.inc


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 4/9] vfio/migration: Skip pre-copy if dirty page tracking is not supported

2022-05-18 Thread Jason Gunthorpe
On Tue, May 17, 2022 at 09:46:56PM -0600, Alex Williamson wrote:

> The current solution is obviously non-optimal, it was mainly
> meant for backwards compatibility, but this seems like a fail faster
> solution, with less useless work, but also providing less indication
> how to configure the migration to succeed.

I don't think this is a production configuration. We should not be
expecting that the user is going to carefully fine tune some SLA
parameter. It may be the "SLA check" we are missing is simply if a SLA
exists or not.

> > It it not intended to be a useful configuration, this is just covering
> > off backwards compat issues - so I'm not keen to do a bunch of
> > management work to support it.
> 
> Are we then deemphasizing the vfio compatibility support in iommufd?

I'm viewing iommufd compatability with VFIO type 1 as only extending
to implemented features.. We are deleting NESTED for instance. As
there is no in-kernel implementation of the type 1 tracker I would
expect to eventually delete it as well once we have consensus this is
what we plan to do.

We discussed this in Joao's series and that was the consensus.

> If we really don't want to put effort towards backwards compatibility,
> should migration support only be enabled with native iommufd
> support?

I'm expecting that the dirty tracking will require native iommufd only
for the system iommu tracker, but the mlx5 on-device tracker will be
fine with either iommu back end.

It is still useful currently for testing the VFIO device part as it
will be some time until the other parts are ready, so I'd rather not
block it completely in qemu.

The goal is for qemu to do something sensible so when we delete kernel
support for type 1 dirty tracking so there is no back compat concern
for existing non-experimental qemu.

Jason



Re: [PATCH] hw/tpm/tpm_tis_common.c: Assert that locty is in range

2022-05-18 Thread Peter Maydell
On Wed, 18 May 2022 at 14:46, Stefan Berger  wrote:
>
>
>
> On 5/13/22 12:38, Peter Maydell wrote:
> > In tpm_tis_mmio_read(), tpm_tis_mmio_write() and
> > tpm_tis_dump_state(), we calculate a locality index with
> > tpm_tis_locality_from_addr() and then use it as an index into the
> > s->loc[] array.  In all these cases, the array index can't overflow
> > because the MemoryRegion is sized to be TPM_TIS_NUM_LOCALITIES <<
> > TPM_TIS_LOCALITY_SHIFT bytes.  However, Coverity can't see that, and
> > it complains (CID 1487138, 1487180, 1487188, 1487198, 1487240).
>

> All 3 of your fixes below are after the 3 existing calls to
> tpm_tis_locality_from_addr(). Would Coverity be happy if we were to move
> the asserts into that one function? I am fine with this patch, though.

Yes, I think Coverity would be happy either way. There's not
a lot in it, but I picked this way round because in theory one
might want, in a hypothetical future situation, to have a different
kind of error checking for a call site that did an address-to-locality
lookup: the lookup itself is not inherently incapable of failing.
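Concretely, the call-site checks the patch adds boil down to something
like this (a sketch; the macro spelling in the tree may differ, the point
is simply that locty stays below TPM_TIS_NUM_LOCALITIES):

    uint8_t locty = tpm_tis_locality_from_addr(addr);
    /* the MemoryRegion sizing already guarantees this; make it explicit for Coverity */
    assert(TPM_TIS_IS_VALID_LOCTY(locty));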

thanks
-- PMM



Re: The fate of iotest 297

2022-05-18 Thread Kevin Wolf
Am 18.05.2022 um 01:28 hat John Snow geschrieben:
> Hi Kevin,
> 
> I remember that you wanted some minimum Niceness threshold in order to
> agree to me removing iotest 297.
> 
> I've already moved it onto GitLab CI in the form of the
> check-python-pipenv job, but I recall you wanted to be able to run it
> locally as well before agreeing to axe 297. I remember that you didn't
> think that running "make check-pipenv" from the python directory was
> sufficiently Nice enough.
> 
> Do you need it to be part of "make check", or are you OK with
> something like "make check-python" from the build directory?
> 
> I have a bit more work to do if you want it to be part of 'make check'
> (if you happen to have the right packages installed), but it's pretty
> easy right now to give you a 'make check-python' (where I just
> forcibly install those packages to the testing venv.)

Hm, what is the reason for 'make check-python' not being part of 'make
check'?

I'm currently running two things locally, 'make check' (which is the
generic one that everyone should run) and iotests (for which it is
reasonable enough that I need to run it separately because it's the
special thing for my own subsystem).

Now adding a third one 'make check-python' certainly isn't the end of
the world, but it's not really something that is tied to my subsystem
any more. Having to run test cases separately for other subsystems
doesn't really scale for me, so I would prefer not to start doing that.
I can usually get away with not running the more special tests of other
subsystems before the pull request because I'm unlikely to break things
in other subsystems, but Python style warnings are easy to get.

If we're going to have 'make check-python' separate, but CI checks it,
we'll get pull requests that don't pass it and would then only be caught
by CI after a long test run, requiring a v2 pull request. I feel for
something that checks things like style (and will fail frequently on
code where nobody ran the check before), that's a bit too unfriendly.

Kevin




Re: [PATCH 28/35] acpi: pvpanic-isa: use AcpiDevAmlIfClass:build_dev_aml to provide device's AML

2022-05-18 Thread Michael S. Tsirkin
On Tue, May 17, 2022 at 10:13:51AM +0200, Gerd Hoffmann wrote:
> That problem isn't new and we already have a bunch of aml_* stubs
> because of that.  I expect it'll work just fine, at worst we'll
> have to add a stub or two in case some calls are not covered yet.

Right but adding these stubs is a bother, we keep missing some.
If possible I'd like the solution to be cleaner than the status quo.
Is adding a wrapper instead of setting a method directly such
a big problem really?

-- 
MST




[PULL 4/7] artist: Fix vertical X11 cursor position in HP-UX

2022-05-18 Thread Helge Deller
Drop the hard-coded value of 1146 lines which seems to work with HP-UX
11, but not with HP-UX 10. Instead encode the screen height in byte 0 of
active_lines_low and byte 3 of misc_video as it's expected by the Xorg
X11 graphics driver.

This potentially allows for higher vertical screen resolutions than
1280x1024 with X11.

Signed-off-by: Helge Deller 
Acked-by: Mark Cave-Ayland 
---
 hw/display/artist.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index c8b261a52e..780cb15026 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -337,10 +337,11 @@ static void artist_get_cursor_pos(ARTISTState *s, int *x, int *y)
 }
 *x = (lx - offset) / 2;

-*y = 1146 - artist_get_y(s->cursor_pos);
-
 /* subtract cursor offset from cursor control register */
 *x -= (s->cursor_cntrl & 0xf0) >> 4;
+
+/* height minus nOffscreenScanlines is stored in cursor control register */
+*y = s->height - artist_get_y(s->cursor_pos);
 *y -= (s->cursor_cntrl & 0x0f);

 if (*x > s->width) {
@@ -1158,14 +1159,17 @@ static uint64_t artist_reg_read(void *opaque, hwaddr addr, unsigned size)
 case ACTIVE_LINES_LOW:
 val = s->active_lines_low;
 /* activeLinesLo for cursor is in reg20.b.b0 */
-val |= ((s->height - 1) & 0xff);
+val &= ~(0xff << 24);
+val |= (s->height & 0xff) << 24;
 break;

 case MISC_VIDEO:
 /* emulate V-blank */
-val = s->misc_video ^ 0x0004;
+s->misc_video ^= 0x0004;
 /* activeLinesHi for cursor is in reg21.b.b2 */
-val |= ((s->height - 1) & 0xff00);
+val = s->misc_video;
+val &= ~0xff00UL;
+val |= (s->height & 0xff00);
 break;

 case MISC_CTRL:
--
2.35.3




[PULL 0/7] Artist cursor fix final patches

2022-05-18 Thread Helge Deller
The following changes since commit 178bacb66d98d9ee7a702b9f2a4dfcd88b72a9ab:

  Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into 
staging (2022-05-09 11:07:04 -0700)

are available in the Git repository at:

  https://github.com/hdeller/qemu-hppa.git 
tags/artist-cursor-fix-final-pull-request

for you to fetch changes up to 9ef2c6b4ab9a430ee230ee305b6994d7eae86fe7:

  artist: Fix X cursor position calculation in X11 (2022-05-16 16:46:25 +0200)


hppa: Artist graphics driver fixes for HP-UX and keyboard fix in firmware boot 
console

This series updates the SeaBIOS-hppa firmware to version 5, in which additional
HP fonts were added to the firmware and the firmware boot console was fixed to
accept input from the emulated PS/2 keyboard when running in graphical mode
(serial console was working before already). To test use the "-boot menu=on"
qemu option.

The artist graphics card driver got various fixes when running the X11-Windows
on HP-UX:
- fixes the horizontal and vertical positioning of the X11 cursor with HP-UX
- allows X11 to blank the screen (e.g. screensaver)
- allows the X11 driver to turn the X11 cursor on/off

Signed-off-by: Helge Deller 

--
Changes compared to version 2 of this series:
- Fixed style issues in the X-cursor positioning patch (noticed by Mark 
Cave-Ayland)

Changes compared to version 1 of this series:
- Added some Acked-by's from Mark Cave-Ayland 
- SeaBIOS-hppa v5 instead of v4 (PS/2 keyboard now works in boot console)
- integrated artist X11 X-cursor positioning fix (which was sent separately
before)

--
This series should apply cleanly on git head and can be pulled for testing
from: https://github.com/hdeller/qemu-hppa.git   artist-cursor-fix-final



Helge Deller (7):
  seabios-hppa: Update SeaBIOS-hppa to VERSION 5
  artist: Introduce constant for max cursor size
  artist: Use human-readable variable names instead of reg_xxx
  artist: Fix vertical X11 cursor position in HP-UX
  artist: Allow to turn cursor on or off
  artist: Emulate screen blanking
  artist: Fix X cursor position calculation in X11

 hw/display/artist.c   | 168 --
 pc-bios/hppa-firmware.img | Bin 701964 -> 719040 bytes
 roms/seabios-hppa |   2 +-
 3 files changed, 126 insertions(+), 44 deletions(-)

--
2.35.3




[PULL 2/7] artist: Introduce constant for max cursor size

2022-05-18 Thread Helge Deller
Add the constant NGLE_MAX_SPRITE_SIZE which defines the maximum
cursor size currently supported.

Signed-off-by: Helge Deller 
Acked-by: Mark Cave-Ayland 
---
 hw/display/artist.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 39fc0c4ca5..6333ee41db 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -98,6 +98,9 @@ struct ARTISTState {
 int draw_line_pattern;
 };

+/* hardware allows up to 64x64, but we emulate 32x32 only. */
+#define NGLE_MAX_SPRITE_SIZE    32
+
 typedef enum {
 ARTIST_BUFFER_AP = 1,
 ARTIST_BUFFER_OVERLAY = 2,
@@ -1325,11 +1328,10 @@ static void artist_realizefn(DeviceState *dev, Error **errp)
 framebuffer_update_memory_section(>fbsection, >mr, 0,
   buf->width, buf->height);
 /*
- * no idea whether the cursor is fixed size or not, so assume 32x32 which
- * seems sufficient for HP-UX X11.
+ * Artist cursor max size
  */
-s->cursor_height = 32;
-s->cursor_width = 32;
+s->cursor_height = NGLE_MAX_SPRITE_SIZE;
+s->cursor_width = NGLE_MAX_SPRITE_SIZE;

 /*
  * These two registers are not initialized by seabios's STI implementation.
--
2.35.3




[PULL 7/7] artist: Fix X cursor position calculation in X11

2022-05-18 Thread Helge Deller
The X cursor position can be calculated based on the backporch and
interleave values.  In the emulation we ignore the HP-UX settings for
backporch and instead use twice the size of the emulated cursor.  With
those changes the X-position of the graphics cursor is now finally
working correctly on HP-UX 10 and HP-UX 11.

Based on coding in Xorg X11R6.6

Signed-off-by: Helge Deller 
---
 hw/display/artist.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 49dad2b824..eadaef0d46 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -1,7 +1,8 @@
 /*
  * QEMU HP Artist Emulation
  *
- * Copyright (c) 2019 Sven Schnelle 
+ * Copyright (c) 2019-2022 Sven Schnelle 
+ * Copyright (c) 2022 Helge Deller 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  */
@@ -313,19 +314,15 @@ static void artist_rop8(ARTISTState *s, struct 
vram_buffer *buf,
 static void artist_get_cursor_pos(ARTISTState *s, int *x, int *y)
 {
 /*
- * Don't know whether these magic offset values are configurable via
- * some register. They seem to be the same for all resolutions.
- * The cursor values provided in the registers are:
- * X-value: -295 (for HP-UX 11) and 338 (for HP-UX 10.20) up to 2265
- * Y-value: 1146 down to 0
  * The emulated Artist graphic is like a CRX graphic, and as such
  * it's usually fixed at 1280x1024 pixels.
- * Because of the maximum Y-value of 1146 you can not choose a higher
- * vertical resolution on HP-UX (unless you disable the mouse).
+ * Other resolutions may work, but no guarantee.
  */

-static int offset = 338;
-int lx;
+unsigned int hbp_times_vi, horizBackPorch;
+int16_t xHi, xLo;
+const int videoInterleave = 4;
+const int pipelineDelay = 4;

 /* ignore if uninitialized */
 if (s->cursor_pos == 0) {
@@ -333,16 +330,23 @@ static void artist_get_cursor_pos(ARTISTState *s, int *x, 
int *y)
 return;
 }

-lx = artist_get_x(s->cursor_pos);
-if (lx < offset) {
-offset = lx;
-}
-*x = (lx - offset) / 2;
+/*
+ * Calculate X position based on backporch and interleave values.
+ * Based on code from Xorg X11R6.6
+ */
+horizBackPorch = ((s->horiz_backporch & 0xff0000) >> 16) +
+ ((s->horiz_backporch & 0xff00) >> 8) + 2;
+hbp_times_vi = horizBackPorch * videoInterleave;
+xHi = s->cursor_pos >> 19;
+*x = ((xHi + pipelineDelay) * videoInterleave) - hbp_times_vi;
+
+xLo = (s->cursor_pos >> 16) & 0x07;
+*x += ((xLo - hbp_times_vi) & (videoInterleave - 1)) + 8 - 1;

 /* subtract cursor offset from cursor control register */
 *x -= (s->cursor_cntrl & 0xf0) >> 4;

-/* height minus nOffscreenScanlines is stored in cursor control register */
+/* Calculate Y position */
 *y = s->height - artist_get_y(s->cursor_pos);
 *y -= (s->cursor_cntrl & 0x0f);

@@ -1056,6 +1060,8 @@ static void artist_reg_write(void *opaque, hwaddr addr, 
uint64_t val,
 break;

 case HORIZ_BACKPORCH:
+/* overwrite HP-UX settings to fix X cursor position. */
+val = (NGLE_MAX_SPRITE_SIZE << 16) + (NGLE_MAX_SPRITE_SIZE << 8);
 combine_write_reg(addr, val, size, &s->horiz_backporch);
 break;

--
2.35.3




[PULL 6/7] artist: Emulate screen blanking

2022-05-18 Thread Helge Deller
The misc_video and misc_ctrl registers control the visibility of the
screen. Start with the screen turned on, and hide or show the screen
based on the control registers.

Signed-off-by: Helge Deller 
Acked-by: Mark Cave-Ayland 
---
 hw/display/artist.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index b8930b7c5a..49dad2b824 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -201,6 +201,8 @@ static const char *artist_reg_name(uint64_t addr)
 }
 #undef REG_NAME

+static void artist_invalidate(void *opaque);
+
 /* artist has a fixed line length of 2048 bytes. */
 #define ADDR_TO_Y(addr) extract32(addr, 11, 11)
 #define ADDR_TO_X(addr) extract32(addr, 0, 11)
@@ -903,6 +905,7 @@ static void artist_reg_write(void *opaque, hwaddr addr, 
uint64_t val,
 {
 ARTISTState *s = opaque;
 int width, height;
+uint64_t oldval;

 trace_artist_reg_write(size, addr, artist_reg_name(addr & ~3ULL), val);

@@ -1061,7 +1064,18 @@ static void artist_reg_write(void *opaque, hwaddr addr, 
uint64_t val,
 break;

 case MISC_VIDEO:
+oldval = s->misc_video;
+combine_write_reg(addr, val, size, &s->misc_video);
+/* Invalidate and hide screen if graphics signal is turned off. */
+if (((oldval & 0x0A00) == 0x0A00) &&
+((val & 0x0A00) != 0x0A00)) {
+artist_invalidate(s);
+}
+/* Invalidate and redraw screen if graphics signal is turned back on. 
*/
+if (((oldval & 0x0A00) != 0x0A00) &&
+((val & 0x0A00) == 0x0A00)) {
+artist_invalidate(s);
+}
 break;

 case MISC_CTRL:
@@ -1263,6 +1277,12 @@ static void artist_draw_cursor(ARTISTState *s)
 }
 }

+static bool artist_screen_enabled(ARTISTState *s)
+{
+/*  We could check for (s->misc_ctrl & 0x0080) too... */
+return ((s->misc_video & 0x0A00) == 0x0A00);
+}
+
 static void artist_draw_line(void *opaque, uint8_t *d, const uint8_t *src,
  int width, int pitch)
 {
@@ -1270,6 +1290,12 @@ static void artist_draw_line(void *opaque, uint8_t *d, 
const uint8_t *src,
 uint32_t *cmap, *data = (uint32_t *)d;
 int x;

+if (!artist_screen_enabled(s)) {
+/* clear screen */
+memset(data, 0, s->width * sizeof(uint32_t));
+return;
+}
+
 cmap = (uint32_t *)(s->vram_buffer[ARTIST_BUFFER_CMAP].data + 0x400);

 for (x = 0; x < s->width; x++) {
@@ -1384,6 +1410,10 @@ static void artist_realizefn(DeviceState *dev, Error 
**errp)
 s->image_bitmap_op = 0x23000300;
 s->plane_mask = 0xff;

+/* enable screen */
+s->misc_video |= 0x0A00;
+s->misc_ctrl  |= 0x0080;
+
 s->con = graphic_console_init(dev, 0, &artist_ops, s);
 qemu_console_resize(s->con, s->width, s->height);
 }
--
2.35.3




[PULL 3/7] artist: Use human-readable variable names instead of reg_xxx

2022-05-18 Thread Helge Deller
Convert the variable names of some registers to human-readable and
understandable names.

Signed-off-by: Helge Deller 
Acked-by: Mark Cave-Ayland 
---
 hw/display/artist.c | 72 ++---
 1 file changed, 49 insertions(+), 23 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 6333ee41db..c8b261a52e 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -81,9 +81,10 @@ struct ARTISTState {
 uint32_t plane_mask;

 uint32_t reg_100080;
-uint32_t reg_300200;
-uint32_t reg_300208;
-uint32_t reg_300218;
+uint32_t horiz_backporch;
+uint32_t active_lines_low;
+uint32_t misc_video;
+uint32_t misc_ctrl;

 uint32_t dst_bm_access;
 uint32_t src_bm_access;
@@ -138,8 +139,14 @@ typedef enum {
 BG_COLOR = 0x118014,
 PLANE_MASK = 0x118018,
 IMAGE_BITMAP_OP = 0x11801c,
-CURSOR_POS = 0x300100,
-CURSOR_CTRL = 0x300104,
+CURSOR_POS = 0x300100,  /* reg17 */
+CURSOR_CTRL = 0x300104, /* reg18 */
+MISC_VIDEO = 0x300218,  /* reg21 */
+MISC_CTRL = 0x300308,   /* reg27 */
+HORIZ_BACKPORCH = 0x300200, /* reg19 */
+ACTIVE_LINES_LOW = 0x300208,/* reg20 */
+FIFO1 = 0x38,   /* reg34 */
+FIFO2 = 0x380008,
 } artist_reg_t;

 typedef enum {
@@ -177,12 +184,18 @@ static const char *artist_reg_name(uint64_t addr)
 REG_NAME(SRC_BM_ACCESS);
 REG_NAME(CURSOR_POS);
 REG_NAME(CURSOR_CTRL);
+REG_NAME(HORIZ_BACKPORCH);
+REG_NAME(ACTIVE_LINES_LOW);
+REG_NAME(MISC_VIDEO);
+REG_NAME(MISC_CTRL);
 REG_NAME(LINE_XY);
 REG_NAME(PATTERN_LINE_START);
 REG_NAME(LINE_SIZE);
 REG_NAME(LINE_END);
 REG_NAME(FONT_WRITE_INCR_Y);
 REG_NAME(FONT_WRITE_START);
+REG_NAME(FIFO1);
+REG_NAME(FIFO2);
 }
 return "";
 }
@@ -1028,16 +1041,20 @@ static void artist_reg_write(void *opaque, hwaddr addr, 
uint64_t val,
 combine_write_reg(addr, val, size, &s->transfer_data);
 break;

-case 0x300200:
-combine_write_reg(addr, val, size, &s->reg_300200);
+case HORIZ_BACKPORCH:
+combine_write_reg(addr, val, size, &s->horiz_backporch);
 break;

-case 0x300208:
-combine_write_reg(addr, val, size, &s->reg_300208);
+case ACTIVE_LINES_LOW:
+combine_write_reg(addr, val, size, &s->active_lines_low);
 break;

-case 0x300218:
-combine_write_reg(addr, val, size, &s->reg_300218);
+case MISC_VIDEO:
+combine_write_reg(addr, val, size, &s->misc_video);
+break;
+
+case MISC_CTRL:
+combine_write_reg(addr, val, size, &s->misc_ctrl);
 break;

 case CURSOR_POS:
@@ -1122,12 +1139,11 @@ static uint64_t artist_reg_read(void *opaque, hwaddr 
addr, unsigned size)
 case 0x10:
 case 0x30:
 case 0x34:
-case 0x300308:
 case 0x38:
 break;

-case 0x38:
-case 0x380008:
+case FIFO1:
+case FIFO2:
 /*
  * FIFO ready flag. we're not emulating the FIFOs
  * so we're always ready
@@ -1135,16 +1151,25 @@ static uint64_t artist_reg_read(void *opaque, hwaddr 
addr, unsigned size)
 val = 0x10;
 break;

-case 0x300200:
-val = s->reg_300200;
+case HORIZ_BACKPORCH:
+val = s->horiz_backporch;
+break;
+
+case ACTIVE_LINES_LOW:
+val = s->active_lines_low;
+/* activeLinesLo for cursor is in reg20.b.b0 */
+val |= ((s->height - 1) & 0xff);
 break;

-case 0x300208:
-val = s->reg_300208;
+case MISC_VIDEO:
+/* emulate V-blank */
+val = s->misc_video ^ 0x0004;
+/* activeLinesHi for cursor is in reg21.b.b2 */
+val |= ((s->height - 1) & 0xff00);
 break;

-case 0x300218:
-val = s->reg_300218;
+case MISC_CTRL:
+val = s->misc_ctrl;
 break;

 case 0x30023c:
@@ -1379,9 +1404,10 @@ static const VMStateDescription vmstate_artist = {
 VMSTATE_UINT32(cursor_width, ARTISTState),
 VMSTATE_UINT32(plane_mask, ARTISTState),
 VMSTATE_UINT32(reg_100080, ARTISTState),
-VMSTATE_UINT32(reg_300200, ARTISTState),
-VMSTATE_UINT32(reg_300208, ARTISTState),
-VMSTATE_UINT32(reg_300218, ARTISTState),
+VMSTATE_UINT32(horiz_backporch, ARTISTState),
+VMSTATE_UINT32(active_lines_low, ARTISTState),
+VMSTATE_UINT32(misc_video, ARTISTState),
+VMSTATE_UINT32(misc_ctrl, ARTISTState),
 VMSTATE_UINT32(dst_bm_access, ARTISTState),
 VMSTATE_UINT32(src_bm_access, ARTISTState),
 VMSTATE_UINT32(control_plane, ARTISTState),
--
2.35.3




Re: [PATCH 5/9] migration/qemu-file: Add qemu_file_get_to_fd()

2022-05-18 Thread Jason Gunthorpe
On Wed, May 18, 2022 at 05:00:26PM +0100, Daniel P. Berrangé wrote:
> On Wed, May 18, 2022 at 12:42:37PM -0300, Jason Gunthorpe wrote:
> > On Wed, May 18, 2022 at 01:54:34PM +0200, Juan Quintela wrote:
> > 
> > > >> Is there a really performance difference to just use:
> > > >>
> > > >> uint8_t buffer[size];
> > > >>
> > > >> qemu_get_buffer(f, buffer, size);
> > > >> write(fd, buffer, size);
> > > >>
> > > >> Or telling it otherwise, what sizes are we talking here?
> > > >
> > > > It depends on the device, but It can range from a few MBs to several
> > > > GBs AFAIK.
> > > 
> > > a few MB is ok.
> > > 
> > > Several GB on the main migration channel without a single
> > > header/whatever?
> > 
> > IIRC it iterates in multi-megabyte chunks each which gets a header.
> > 
> > The chunking size is set by the size of the buffer mmap
> > 
> > The overall point is that memcpying GB's is going to be taxing so we
> > want to eliminate copies on this path, especially copies that result
> > in more system calls.
> > 
> > We are expecting to look into further optimization down the road here
> > because even this is still too slow.
> 
> Considering the possibility of future optimization, IMHO adding this
> kind of API at the QEMUFile level is too high. We'd be better pushing
> the impl down into the QIOChannel API level.
> 
>int64_t qio_channel_copy_range(QIOCHannel *srcioc,
>   QIOChannel *tgtioc,
> size_t len);
> 
> The QIOChannel impl can do pretty much what you showed in the general
> case, but in special cases it could have the option to offload to the
> kernel copy_range() syscall to avoid the context switches.

This is probably something to do down the road when we figure out
exactly what is best.

Currently we don't have kernel support for optimized copy_file_range()
(ie fops splice_read) inside the VFIO drivers so copy_file_range()
will just fail.

I didn't look closely again but IIRC the challenge is that the
QIOChannel doesn't have a ready large temporary buffer to use for a
non-splice fallback path.

Jason
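
A rough sketch of the generic, non-splice fallback being discussed is below.
It is illustrative only: the bounce-buffer size and the Error **errp parameter
are assumptions on top of the proposed prototype, and a real implementation
could add a copy_file_range()/splice fast path for channels that can expose
file descriptors.

    #include "qemu/osdep.h"
    #include "io/channel.h"

    int64_t qio_channel_copy_range(QIOChannel *srcioc, QIOChannel *dstioc,
                                   size_t len, Error **errp)
    {
        char buf[64 * 1024];    /* bounce buffer for the generic path */
        int64_t total = 0;

        while (len > 0) {
            size_t chunk = MIN(len, sizeof(buf));
            ssize_t nread = qio_channel_read(srcioc, buf, chunk, errp);

            if (nread < 0) {
                return -1;
            }
            if (nread == 0) {
                break;          /* premature EOF on the source channel */
            }
            if (qio_channel_write_all(dstioc, buf, nread, errp) < 0) {
                return -1;
            }
            total += nread;
            len -= nread;
        }
        return total;
    }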



[PULL 5/7] artist: Allow to turn cursor on or off

2022-05-18 Thread Helge Deller
Bit 0x80 in the cursor_cntrl register specifies if the cursor
should be visible. Prevent rendering the cursor if it's invisible.

Signed-off-by: Helge Deller 
Acked-by: Mark Cave-Ayland 
---
 hw/display/artist.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 780cb15026..b8930b7c5a 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -353,10 +353,20 @@ static void artist_get_cursor_pos(ARTISTState *s, int *x, 
int *y)
 }
 }

+static inline bool cursor_visible(ARTISTState *s)
+{
+/* cursor is visible if bit 0x80 is set in cursor_cntrl */
+return s->cursor_cntrl & 0x80;
+}
+
 static void artist_invalidate_cursor(ARTISTState *s)
 {
 int x, y;

+if (!cursor_visible(s)) {
+return;
+}
+
 artist_get_cursor_pos(s, &x, &y);
 artist_invalidate_lines(&s->vram_buffer[ARTIST_BUFFER_AP],
 y, s->cursor_height);
@@ -1218,6 +1228,10 @@ static void artist_draw_cursor(ARTISTState *s)
 struct vram_buffer *cursor0, *cursor1 , *buf;
 int cx, cy, cursor_pos_x, cursor_pos_y;

+if (!cursor_visible(s)) {
+return;
+}
+
 cursor0 = &s->vram_buffer[ARTIST_BUFFER_CURSOR1];
 cursor1 = &s->vram_buffer[ARTIST_BUFFER_CURSOR2];
 buf = &s->vram_buffer[ARTIST_BUFFER_AP];
--
2.35.3




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Kevin Wolf
Am 18.05.2022 um 14:43 hat Paolo Bonzini geschrieben:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > For example, all callers of bdrv_open() always take the AioContext lock.
> > Often it is taken very high in the call stack, but it's always taken.
> 
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:
> 
> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
> 
> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread.  Attempts to do otherwise results in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, that cannot
> happen without releasing the aiocontext lock)
> 
> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)
> 
> > We might suppose that many callbacks are called under drain and in
> > GLOBAL_STATE, which should be enough, but from our experimentation in
> > the previous series we saw that currently not everything is under drain,
> > leaving some operations unprotected (remember assert_graph_writable
> > temporarily disabled, since drain coverage for bdrv_replace_child_noperm
> > was not 100%?).
> > Therefore we need to add more drains. But isn't drain what we decided to
> > drop at the beginning? Why isn't drain good?
> 
> To sum up the patch ordering deadlock that we have right now:
> 
> * in some cases, graph manipulations are protected by the AioContext lock
> 
> * eliminating the AioContext lock is needed to move callbacks to coroutine
> contexts (see above for the deadlock scenario)
> 
> * moving callbacks to coroutine context is needed by the graph rwlock
> implementation
> 
> On one hand, we cannot protect the graph across manipulations with a graph
> rwlock without removing the AioContext lock; on the other hand, the
> AioContext lock is what _right now_ protects the graph.
> 
> So I'd rather go back to Emanuele's draining approach.  It may not be
> beautiful, but it allows progress.  Once that is in place, we can remove the
> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> now) and reevaluate our next steps.

If we want to use drain for locking, we need to make sure that drain
actually does the job correctly. I see two major problems with it:

The first one is that drain only covers I/O paths, but we need to
protect against _anything_ touching block nodes. This might mean a
massive audit and making sure that everything in QEMU that could
possibly touch a block node is integrated with drain.

I think Emanuele has argued before that because writes to the graph only
happen in the main thread and we believe that currently only I/O
requests are processed in iothreads, this is safe and we don't actually
need to audit everything.

This is true as long as the assumption holds true (how do we ensure that
nobody ever introduces non-I/O code touching a block node in an
iothread?) and as long as the graph writer never yields or polls. I
think the latter condition is violated today, a random example is that
adjusting drain counts in bdrv_replace_child_noperm() does poll. Without
cooperation from all relevant places, the effectively locked code
section ends right there, even if the drained section continues. Even if
we can fix this, verifying that the conditions are met everywhere seems
not trivial.

And that's exactly my second major concern: Even if we manage to
correctly implement things with drain, I don't see a way to meaningfully
review it. I just don't know how to verify with some confidence that
it's actually correct and covering everything that needs to be covered.

Drain is not really a lock. But if you use it as one, the best it can
provide is advisory locking (callers, inside and outside the block
layer, need to explicitly support drain instead of having the lock
applied somewhere in the block layer functions). And even if all
relevant pieces actually make use of it, it still has an awkward
interface for locking:

/* Similar to rdlock(), but doesn't wait for writers to finish. It is
 * the caller's responsibility to make sure that there are no writers. */
bdrv_inc_in_flight()

/* Similar to wrlock(). Waits for readers to finish. New readers are not
 * prevented from starting after it returns. Third parties are politely
 * asked not to touch the block node while it is drained. */
bdrv_drained_begin()

(I think the unlock counterparts actually behave as expected from a real
lock.)

Having an actual rdlock() (that waits for writers), and using static
analysis to verify that all relevant places use it (so that wrlock()
doesn't rely on politely asking third parties to leave the node alone)
gave me some confidence that we could verify the result.

I'm not sure at all how to achieve the 

Re: [PULL 0/8] Misc patches (Gitlab-CI, qtest, Capstone, ...)

2022-05-18 Thread Richard Henderson

On 5/18/22 02:04, Thomas Huth wrote:

  Hi Richard!

The following changes since commit eec398119fc6911d99412c37af06a6bc27871f85:

   Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu 
into staging (2022-05-16 16:31:01 -0700)

are available in the Git repository at:

   https://gitlab.com/thuth/qemu.git tags/pull-request-2022-05-18

for you to fetch changes up to 83602083b4ada6ceb86bfb327e83556ebab120fc:

   capstone: Remove the capstone submodule (2022-05-18 08:54:22 +0200)


* Remove Ubuntu 18.04 containers (not supported anymore)
* Improve the cleanup of the QEMU binary in case of failing qtests
* Update the Windows support statement
* Remove the capstone submodule (and rely on Capstone of the distros instead)


Fails centos-stream-8-x86_64 test,

Run-time dependency capstone found: NO (tried pkgconfig)
../meson.build:2539:2: ERROR: Dependency "capstone" not found, tried pkgconfig

https://gitlab.com/qemu-project/qemu/-/jobs/2473935684


r~






Daniel P. Berrangé (2):
   tests/qtest: fix registration of ABRT handler for QEMU cleanup
   tests/qtest: use prctl(PR_SET_PDEATHSIG) as fallback to kill QEMU

Thomas Huth (6):
   gitlab-ci: Switch the container of the 'check-patch' & 'check-dco' jobs
   Remove Ubuntu 18.04 container support from the repository
   docs/about: Update the support statement for Windows
   tests/vm: Add capstone to the NetBSD and OpenBSD VMs
   capstone: Allow version 3.0.5 again
   capstone: Remove the capstone submodule

  docs/about/build-platforms.rst |  14 +-
  configure  |  23 +---
  meson.build| 115 +---
  include/qemu/osdep.h   |   2 +-
  tests/qtest/libqtest.c |  21 ++-
  .gitlab-ci.d/buildtest.yml |   3 +-
  .gitlab-ci.d/containers.yml|   5 -
  .../custom-runners/ubuntu-20.04-aarch32.yml|   2 +-
  .../custom-runners/ubuntu-20.04-aarch64.yml|   2 +-
  .gitlab-ci.d/static_checks.yml |  14 +-
  .gitlab-ci.d/windows.yml   |   5 +-
  .gitmodules|   3 -
  capstone   |   1 -
  meson_options.txt  |   3 +-
  scripts/ci/setup/build-environment.yml |  14 +-
  scripts/meson-buildoptions.sh  |   5 +-
  tests/docker/dockerfiles/ubuntu1804.docker | 144 -
  tests/lcitool/refresh  |   7 -
  tests/vm/netbsd|   3 +-
  tests/vm/openbsd   |   3 +-
  20 files changed, 59 insertions(+), 330 deletions(-)
  delete mode 16 capstone
  delete mode 100644 tests/docker/dockerfiles/ubuntu1804.docker






Re: [PATCH 5/9] migration/qemu-file: Add qemu_file_get_to_fd()

2022-05-18 Thread Daniel P . Berrangé
On Wed, May 18, 2022 at 12:42:37PM -0300, Jason Gunthorpe wrote:
> On Wed, May 18, 2022 at 01:54:34PM +0200, Juan Quintela wrote:
> 
> > >> Is there a really performance difference to just use:
> > >>
> > >> uint8_t buffer[size];
> > >>
> > >> qemu_get_buffer(f, buffer, size);
> > >> write(fd, buffer, size);
> > >>
> > >> Or telling it otherwise, what sizes are we talking here?
> > >
> > > It depends on the device, but It can range from a few MBs to several
> > > GBs AFAIK.
> > 
> > a few MB is ok.
> > 
> > Several GB on the main migration channel without a single
> > header/whatever?
> 
> IIRC it iterates in multi-megabyte chunks each which gets a header.
> 
> The chunking size is set by the size of the buffer mmap
> 
> The overall point is that memcpying GB's is going to be taxing so we
> want to eliminate copies on this path, especially copies that result
> in more system calls.
> 
> We are expecting to look into further optimization down the road here
> because even this is still too slow.

Considering the possibility of future optimization, IMHO adding this
kind of API at the QEMUFile level is too high. We'd be better pushing
the impl down into the QIOChannel API level.

   int64_t qio_channel_copy_range(QIOCHannel *srcioc,
  QIOChannel *tgtioc,
  size_t len);

The QIOChannel impl can do pretty much what you showed in the general
case, but in special cases it could have the option to offload to the
kernel copy_range() syscall to avoid the context switches.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 4/9] vfio/migration: Skip pre-copy if dirty page tracking is not supported

2022-05-18 Thread Jason Gunthorpe
On Wed, May 18, 2022 at 01:39:31PM +0200, Juan Quintela wrote:

> > That does seem like a defect in this patch, any SLA constraints should
> > still all be checked under the assumption all ram is dirty.
> 
> And how are we going to:
> - detect the network link speed
> - to be sure that we are inside downtime limit
> 
> I think that it is not possible, so basically we are skipping the precopy
> stage and praying that the other bits are going to be ok.

Like I keep saying, this is not a real use case, we expect dirty
tracking to be available in any real configuration. This is just
trying to make qemu work in some reasonable way if dirty tracking is
not available but a VFIO migration device is plugged in.

Just pick something simple that makes sense. Like if any SLA is set
then just refuse to even start. If no SLA then go directly to
STOP_COPY.

> >> It seems like a better solution would be to expose to management
> >> tools that the VM contains a device that does not support the
> >> pre-copy phase so that downtime expectations can be adjusted.
> >
> > I don't expect this to be a real use case though..
> >
> > Remember, you asked for this patch when you wanted qemu to have good
> > behavior when kernel support for legacy dirty tracking is removed
> > before we merge v2 support.
> 
> I am an ignorant on the subject.  Can I ask how the dirty memory is
> tracked on this v2?

These two RFCs are the current proposal to enable it for the system
iommu:

https://lore.kernel.org/kvm/20220428210933.3583-1-joao.m.mart...@oracle.com

And for device internal trackers:

https://lore.kernel.org/kvm/20220501123301.127279-1-yish...@nvidia.com/ 

Regards,
Jason



Re: [PATCH 5/9] migration/qemu-file: Add qemu_file_get_to_fd()

2022-05-18 Thread Jason Gunthorpe
On Wed, May 18, 2022 at 01:54:34PM +0200, Juan Quintela wrote:

> >> Is there a really performance difference to just use:
> >>
> >> uint8_t buffer[size];
> >>
> >> qemu_get_buffer(f, buffer, size);
> >> write(fd, buffer, size);
> >>
> >> Or telling it otherwise, what sizes are we talking here?
> >
> > It depends on the device, but It can range from a few MBs to several
> > GBs AFAIK.
> 
> a few MB is ok.
> 
> Several GB on the main migration channel without a single
> header/whatever?

IIRC it iterates in multi-megabyte chunks each which gets a header.

The chunking size is set by the size of the buffer mmap

The overall point is that memcpying GB's is going to be taxing so we
want to eliminate copies on this path, especially copies that result
in more system calls.

We are expecting to look into further optimization down the road here
because even this is still too slow.

Jason



Re: [PATCH 0/3] hw/acpi/aml-build: Fix {socket, cluster, core} IDs in PPTT

2022-05-18 Thread Andrew Jones
On Wed, May 18, 2022 at 05:21:38PM +0800, Gavin Shan wrote:
> The {socket, cluster, core} IDs detected by the Linux guest don't
> match what has been provided in PPTT. The 'ACPI Processor ID valid'
> flag is missing for the {socket, cluster, core} nodes. In this case,
> the Linux guest takes the offset between the node and the PPTT header
> as the corresponding IDs, as the following logs show.
> 
> 
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64\
>   -accel kvm -machine virt,gic-version=host -cpu host   \
>   -smp 8,sockets=2,clusters=2,cores=2,threads=1
> :
> 
>   # cd /sys/devices/system/cpu
>   # for i in `seq 0 15`; do cat cpu$i/topology/physical_package_id; done
> 36  36  36  36  36  36  36  36
> 336 336 336 336 336 336 336 336
>   # for i in `seq 0 15`; do cat cpu$i/topology/cluster_id; done
> 56  56  56  56  196 196 196 196
> 356 356 356 356 496 496 496 496
>   # for i in `seq 0 15`; do cat cpu$i/topology/core_id; done
> 76  76  136 136 216 216 276 276
> 376 376 436 436 516 516 576 576
> 
> This fixes the issue by setting the 'ACPI Processor ID valid' flag for
> the {socket, cluster, core} nodes. With this applied, the IDs are exactly
> what has been provided in PPTT. I also checked the PPTT table on my
> host, where the 'ACPI Processor ID valid' flag is set for cluster/core
> nodes, but missing from socket nodes.
> 
>   host# pwd
>   /sys/devices/system/cpu
>   host# cat cpu0/topology/physical_package_id; \
> cat cpu0/topology/cluster_id;  \
> cat cpu0/topology/core_id
>   36 0 0
> 
> Gavin Shan (3):
>   tests/acpi/virt: Allow PPTT ACPI table changes
>   hw/acpi/aml-build: Fix {socket, cluster, core} IDs in PPTT
>   tests/acpi/virt: Update PPTT ACPI table
> 
>  hw/acpi/aml-build.c   |   9 ++---
>  tests/data/acpi/virt/PPTT | Bin 96 -> 96 bytes
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> -- 
> 2.23.0
>

For the series

Reviewed-by: Andrew Jones 




Re: [PATCH v2 2/3] ui: Switch "-display sdl" to use the QAPI parser

2022-05-18 Thread Eric Blake
On Wed, May 18, 2022 at 03:44:45PM +0200, Thomas Huth wrote:
> The "-display sdl" option still uses a hand-crafted parser for its
> parameters since we didn't want to drag an interface we considered
> somewhat flawed into the QAPI schema. Since the flaws are gone now,
> it's time to QAPIfy.
> 
> This introduces the new "DisplaySDL" QAPI struct that is used to hold
> the parameters that are unique to the SDL display. The only specific
> parameter is currently "grab-mod" that is used to specify the required
> modifier keys to escape from the mouse grabbing mode.
> 
> Signed-off-by: Thomas Huth 
> ---

> +++ b/qapi/ui.json
> @@ -1295,6 +1295,30 @@
>'*swap-opt-cmd': 'bool'
>} }
>  
> +##
> +# @GrabMod:
> +#
> +# Set of modifier keys that need to be hold for shortcut key actions.

s/hold/held/

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 0/3] block/dirty-bitmaps: fix and improve bitmap merge

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 02:12:03PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> v4:
> 01,03: add Kevin's r-b
> 02: add hbitmap_free() on success patch if local_backup is not needed
> 
> Vladimir Sementsov-Ogievskiy (3):
>   block: block_dirty_bitmap_merge(): fix error path
>   block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap
>   block: simplify handling of try to merge different sized bitmaps

I plan to queue this through my block bitmaps tree, if no other block
maintainer beats me to it (I may do a combined pull request for that
and NBD patches).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 2/3] block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 02:12:05PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> We don't need extra bitmap. All we need is to backup the original
> bitmap when we do first merge. So, drop extra temporary bitmap and work
> directly with target and backup.
> 
> Still to keep old semantics, that on failure target is unchanged and
> user don't need to restore, we need a local_backup variable and do
> restore ourselves on failure path.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/monitor/bitmap-qmp-cmds.c | 41 +
>  1 file changed, 21 insertions(+), 20 deletions(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
> index bd10468596..282363606f 100644
> --- a/block/monitor/bitmap-qmp-cmds.c
> +++ b/block/monitor/bitmap-qmp-cmds.c
> @@ -261,8 +261,9 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char 
> *node, const char *target,
>HBitmap **backup, Error **errp)
>  {
>  BlockDriverState *bs;
> -BdrvDirtyBitmap *dst, *src, *anon;
> +BdrvDirtyBitmap *dst, *src;
>  BlockDirtyBitmapOrStrList *lst;
> +HBitmap *local_backup = NULL;

Basically, instead of tracking the backup via a full-featured
BdrvDirtyBitmap, we are instead tracking it via a lightweight HBitmap.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [RFC 4/6] migration: Introduce dirtylimit capability

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 02:35:04PM +0800, huang...@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) 
> 
> Introduce migration dirtylimit capability, which can
> be turned on before live migration and limit dirty
> page rate durty live migration.

s/durty/during/ ?

> 
> Dirtylimit dirtylimit capability is kind of like

Doubled word

> auto-converge but using dirtylimit instead of traditional
> cpu-throttle to throttle guest down.
> 
> To enable this feature, turn on the dirtylimit capability
> before live migration using migratioin-set-capabilities,
> and set dirtylimit-related parameters "vcpu-dirtylimit",
> "vcpu-dirtylimit-period" suitably to speed up convergence.
> 
> Signed-off-by: Hyman Huang(黄勇) 
> ---

> +++ b/qapi/migration.json
> @@ -463,6 +463,9 @@
>  #   procedure starts. The VM RAM is saved with running 
> VM.
>  #   (since 6.0)
>  #
> +# @dirtylimit: Use dirtylimit to throttle down guest if enabled.
> +#  (since 7.0)

7.1

same question about naming it 'dirty-limit'

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] hw/gpio/aspeed_gpio: Fix QOM pin property

2022-05-18 Thread Peter Delevoryas


> On May 18, 2022, at 12:31 AM, Rashmica Gupta  wrote:
> 
> On Mon, 2022-05-02 at 01:08 -0700, Peter Delevoryas wrote:
>> I was setting gpioV4-7 to "1110" using the QOM pin property handler
>> and
>> noticed that lowering gpioV7 was inadvertently lowering gpioV4-6 too.
>> 
>> (qemu) qom-set /machine/soc/gpio gpioV4 true
>> (qemu) qom-set /machine/soc/gpio gpioV5 true
>> (qemu) qom-set /machine/soc/gpio gpioV6 true
>> (qemu) qom-get /machine/soc/gpio gpioV4
>> true
>> (qemu) qom-set /machine/soc/gpio gpioV7 false
>> (qemu) qom-get /machine/soc/gpio gpioV4
>> false
>> 
>> An expression in aspeed_gpio_set_pin_level was using a logical NOT
>> operator instead of a bitwise NOT operator:
>> 
>> value &= !pin_mask;
>> 
>> The original author probably intended to make a bitwise NOT
>> expression
>> "~", but mistakenly used a logical NOT operator "!" instead. Some
>> programming languages like Rust use "!" for both purposes.
>> 
>> Fixes: 4b7f956862dc ("hw/gpio: Add basic Aspeed GPIO model for
>> AST2400 and
>> AST2500")
>> Signed-off-by: Peter Delevoryas 
> 
> 
> Oops! Thanks for catching this. The tests look good.

Thanks!

> 
> 
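
For readers unfamiliar with the distinction, a minimal standalone snippet
(not the device model itself) shows why the logical NOT cleared every pin:

    #include <stdio.h>

    int main(void)
    {
        unsigned value = 0xf0;     /* gpioV4-7 currently high */
        unsigned pin_mask = 0x80;  /* mask selecting gpioV7 */

        /* Logical NOT: !0x80 evaluates to 0, so every pin is cleared. */
        printf("value & !pin_mask = 0x%02x\n", value & !pin_mask);  /* 0x00 */

        /* Bitwise NOT: ~0x80 clears only gpioV7 and keeps the rest. */
        printf("value & ~pin_mask = 0x%02x\n", value & ~pin_mask);  /* 0x70 */

        return 0;
    }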


Re: [PATCH v2 2/3] ui: Switch "-display sdl" to use the QAPI parser

2022-05-18 Thread Markus Armbruster
Thomas Huth  writes:

> The "-display sdl" option still uses a hand-crafted parser for its
> parameters since we didn't want to drag an interface we considered
> somewhat flawed into the QAPI schema. Since the flaws are gone now,
> it's time to QAPIfy.
>
> This introduces the new "DisplaySDL" QAPI struct that is used to hold
> the parameters that are unique to the SDL display. The only specific
> parameter is currently "grab-mod" that is used to specify the required
> modifier keys to escape from the mouse grabbing mode.
>
> Signed-off-by: Thomas Huth 
> ---
>  qapi/ui.json| 27 +++-
>  include/sysemu/sysemu.h |  2 --
>  softmmu/globals.c   |  2 --
>  softmmu/vl.c| 70 +
>  ui/sdl2.c   | 10 ++
>  5 files changed, 37 insertions(+), 74 deletions(-)
>
> diff --git a/qapi/ui.json b/qapi/ui.json
> index 11a827d10f..a244e26e0f 100644
> --- a/qapi/ui.json
> +++ b/qapi/ui.json
> @@ -1295,6 +1295,30 @@
>'*swap-opt-cmd': 'bool'
>} }
>  
> +##
> +# @GrabMod:
> +#
> +# Set of modifier keys that need to be hold for shortcut key actions.
> +#
> +# Since: 7.1
> +##
> +{ 'enum'  : 'GrabMod',
> +  'data'  : [ 'lctrl-lalt', 'lshift-lctrl-lalt', 'rctrl' ] }

This is fine now.  If we ever generalize to "arbitrary set of modifier
keys", it'll become somewhat awkward.  No objection from me.

> +
> +##
> +# @DisplaySDL:
> +#
> +# SDL2 display options.
> +#
> +# @grab-mod:  String with modifier keys that should be pressed together with

s/String with modifier keys/Modifier keys/

> +# the "G" key to release the mouse grab. Only "lshift-lctrl-lalt"
> +# and "rctrl" are currently supported.

Why do we define lctrl-lalt if it's not supported?

> +#
> +# Since: 7.1
> +##
> +{ 'struct'  : 'DisplaySDL',
> +  'data': { '*grab-mod'   : 'GrabMod' } }
> +
>  ##
>  # @DisplayType:
>  #
> @@ -1374,7 +1398,8 @@
>'curses': { 'type': 'DisplayCurses', 'if': 'CONFIG_CURSES' },
>'egl-headless': { 'type': 'DisplayEGLHeadless',
>  'if': { 'all': ['CONFIG_OPENGL', 'CONFIG_GBM'] } },
> -  'dbus': { 'type': 'DisplayDBus', 'if': 'CONFIG_DBUS_DISPLAY' }
> +  'dbus': { 'type': 'DisplayDBus', 'if': 'CONFIG_DBUS_DISPLAY' },
> +  'sdl': { 'type': 'DisplaySDL', 'if': 'CONFIG_SDL' }
>}
>  }

[...]
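
For reference, with this series applied the new member is expected to be
usable from the command line as, for example, "-display sdl,grab-mod=rctrl"
(illustrative only; the exact spelling follows the schema quoted above).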




Re: [RFC 2/6] qapi/migration: Introduce vcpu-dirtylimit parameters

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 02:35:02PM +0800, huang...@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) 
> 
> Introduce "vcpu-dirtylimit" migration parameter used
> to limit dirty page rate during live migration.
> 
> "vcpu-dirtylimit" and "vcpu-dirtylimit-period" are
> two dirtylimit-related migration parameters, which can
> be set before and during live migration by qmp
> migrate-set-parameters.
> 
> This two parameters are used to help implement the dirty
> page rate limit algo of migration.
> 
> Signed-off-by: Hyman Huang(黄勇) 
> ---
> +++ b/qapi/migration.json
> @@ -763,6 +763,9 @@
>  # @vcpu-dirtylimit-period: Periodic time (ms) of dirtylimit during live 
> migration.
>  #  Defaults to 500ms. (Since 7.0)
>  #
> +# @vcpu-dirtylimit: Dirtyrate limit (MB/s) during live migration.
> +#   Defaults to 1. (Since 7.0)
> +#

Same comments as in patch 1 (since 7.1, question on whether
dirty-limit makes more sense).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [RFC 1/6] qapi/migration: Introduce vcpu-dirtylimit-period parameters

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 02:35:01PM +0800, huang...@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) 
> 
> Introduce "vcpu-dirtylimit-period" migration parameters,
> which is used to makes dirtyrate calculation period

make

> configurable.
> 
> To implement that, refactor vcpu_dirty_rate_stat_collect
> so that period can be configured instead of hardcode.

hardcoded

> 
> Meanwhile, introduce migrate_dirtylimit function to help
> check if dirtylimit enabled during live migration, set
> it false by default.
> 
> Signed-off-by: Hyman Huang(黄勇) 
> ---

Focusing just on UI...

> +++ b/qapi/migration.json
> @@ -760,6 +760,9 @@
>  #block device name if there is one, and to their 
> node name
>  #otherwise. (Since 5.2)
>  #
> +# @vcpu-dirtylimit-period: Periodic time (ms) of dirtylimit during live 
> migration.
> +#  Defaults to 500ms. (Since 7.0)

The next release is 7.1.  You'll need to fix this and all other references.

Do we want 'dirty-limit' instead of 'dirtylimit'?  There was a recent
thread on how to translate QAPI to other languages that are a bit more
insistent on MixedCase, where properly separating English words makes
it easier to translate to the expected case.

>  ##
>  # @migrate-set-parameters:
> @@ -1125,6 +1132,9 @@
>  #block device name if there is one, and to their 
> node name
>  #otherwise. (Since 5.2)
>  #
> +# @vcpu-dirtylimit-period: Periodic time (ms) of dirtylimit during live 
> migration.
> +#  Defaults to 500ms. (Since 7.0)
> +#
>  # Features:
>  # @unstable: Member @x-checkpoint-delay is experimental.

Is this feature ready for prime time, or do we want to initially name
it x-vcpu-dirty[-]limit-period to mark it experimental?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 1/3] ui: Remove deprecated parameters of the "-display sdl" option

2022-05-18 Thread Markus Armbruster
Thomas Huth  writes:

> These parameters are in the way for further refactoring (since they
> use an underscore in the name which is forbidden in QAPI), so let's
> remove these now that their deprecation period is over.

Forbidden, but there's an exception mechanism, so this reason isn't
compelling.  I believe the actual reason is they are "too ugly and
inflexible to drag them along into the QAPI world" (your words).

Suggest:

  Dropping these deprecated parameters now simplifies further
  refactoring.

> Reviewed-by: Daniel P. Berrangé 
> Signed-off-by: Thomas Huth 

Patch looks good to me, so
Reviewed-by: Markus Armbruster 




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 18, 2022 at 02:43:50PM +0200, Paolo Bonzini wrote:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > For example, all callers of bdrv_open() always take the AioContext lock.
> > Often it is taken very high in the call stack, but it's always taken.
> 
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:
> 
> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
> 
> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread.  Attempts to do otherwise results in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, that cannot
> happen without releasing the aiocontext lock)
>
> 
> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)
> 
> > We might suppose that many callbacks are called under drain and in
> > GLOBAL_STATE, which should be enough, but from our experimentation in
> > the previous series we saw that currently not everything is under drain,
> > leaving some operations unprotected (remember assert_graph_writable
> > temporarily disabled, since drain coverage for bdrv_replace_child_noperm
> > was not 100%?).
> > Therefore we need to add more drains. But isn't drain what we decided to
> > drop at the beginning? Why isn't drain good?
> 
> To sum up the patch ordering deadlock that we have right now:
> 
> * in some cases, graph manipulations are protected by the AioContext lock
> 
> * eliminating the AioContext lock is needed to move callbacks to coroutine
> contexts (see above for the deadlock scenario)
> 
> * moving callbacks to coroutine context is needed by the graph rwlock
> implementation
> 
> On one hand, we cannot protect the graph across manipulations with a graph
> rwlock without removing the AioContext lock; on the other hand, the
> AioContext lock is what _right now_ protects the graph.
> 
> So I'd rather go back to Emanuele's draining approach.  It may not be
> beautiful, but it allows progress.  Once that is in place, we can remove the
> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> now) and reevaluate our next steps.

Me too, I don't think the rwlock was particularly nice either.

Stefan




Re: [PATCH v2 10/26] nbd: add missing coroutine_fn annotations

2022-05-18 Thread Eric Blake
On Mon, May 09, 2022 at 12:30:03PM +0200, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 6085ab1d2c..fe913a6db4 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -983,11 +983,11 @@ static void nbd_iter_request_error(NBDReplyChunkIter 
> *iter, int ret)
>   * nbd_reply_chunk_iter_receive
>   * The pointer stored in @payload requires g_free() to free it.
>   */
> -static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
> - NBDReplyChunkIter *iter,
> - uint64_t handle,
> - QEMUIOVector *qiov, NBDReply *reply,
> - void **payload)
> +static bool coroutine_fn nbd_reply_chunk_iter_receive(BDRVNBDState *s,
> +  NBDReplyChunkIter 
> *iter,
> +  uint64_t handle,
> +  QEMUIOVector *qiov, 
> NBDReply *reply,

Might be worth wrapping the now-longer line.

> +  void **payload)

Used only by the macro NBD_FOREACH_REPLY_CHUNK(), which in turn is
used by nbd_co_receive_return_code, nbd_co_receive_cmdread_reply,
nbd_co_receive_blockstatus_reply, which all got recently marked in
0c43c6fc.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 09/26] iscsi: add missing coroutine_fn annotations

2022-05-18 Thread Eric Blake
On Mon, May 09, 2022 at 12:30:02PM +0200, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini 
> ---
>  block/iscsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index d707d0b354..b33eeec794 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -290,7 +290,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
> status,
>  }
>  }
>  
> -static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask 
> *iTask)
> +static void coroutine_fn iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct 
> IscsiTask *iTask)

Might be worth wrapping the long line.

All callers are already coroutine_fn (iscsi_co_writev,
iscsi_co_block_status, iscsi_co_readv, iscsi_co_flush,
iscsi_co_pdiscard, iscsi_co_pwrite_zeroes, iscsi_co_copy_range_to), so
safe.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 18, 2022 at 02:28:41PM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 17/05/2022 um 12:59 schrieb Stefan Hajnoczi:
> > On Wed, May 04, 2022 at 02:39:05PM +0100, Stefan Hajnoczi wrote:
> >> On Tue, Apr 26, 2022 at 04:51:06AM -0400, Emanuele Giuseppe Esposito wrote:
> >>> This is a new attempt to replace the need to take the AioContext lock to
> >>> protect graph modifications. In particular, we aim to remove
> >>> (or better, substitute) the AioContext around bdrv_replace_child_noperm,
> >>> since this function changes BlockDriverState's ->parents and ->children
> >>> lists.
> >>>
> >>> In the previous version, we decided to discard using subtree_drains to
> >>> protect the nodes, for different reasons: for those unfamiliar with it,
> >>> please see 
> >>> https://patchew.org/QEMU/20220301142113.163174-1-eespo...@redhat.com/
> >>
> >> I reread the thread and it's unclear to me why drain is the wrong
> >> mechanism for protecting graph modifications. We theorized a lot but
> >> ultimately is this new mechanism sufficiently different from
> >> bdrv_drained_begin()/end() to make it worth implementing?
> >>
> >> Instead of invoking .drained_begin() callbacks to stop further I/O,
> >> we're now queuing coroutines (without backpressure information that
> >> whoever is spawning I/O needs so they can stop). The writer still waits
> >> for in-flight I/O to finish, including I/O not associated with the bdrv
> >> graph we wish to modify (because rdlock is per-AioContext and unrelated
> >> to a specific graph). Is this really more lightweight than drain?
> >>
> >> If I understand correctly, the original goal was to avoid the need to
> >> hold the AioContext lock across bdrv_replace_child_noperm(). I would
> >> focus on that and use drain for now.
> >>
> >> Maybe I've missed an important point about why the new mechanism is
> >> needed?
> > 
> > Ping?
> 
> label: // read till the end to see why I wrote this here
> 
> I was hoping someone from the "No" party would answer to your question,
> because as you know we reached this same conclusion together.
> 
> We thought about dropping the drain for various reasons, the main one
> (at least as far as I understood) is that we are not sure if something
> can still happen in between drain_begin/end, and it is a little bit
> confusing to use the same mechanism to block I/O and protect the graph.

We had discussions about what could go wrong and there was a feeling
that maybe a separate mechanism is appropriate for graph modifications,
but there was no concrete reason why draining around graph modification
won't work.

If no one has a concrete reason then drain still seems like the most
promising approach to protecting graph modifications. The rwlock patch
wasn't sufficiently different from drain to have significant advantages
in my opinion.

> We then thought about implementing a rwlock. A rdlock would clarify what
> we are protecting and who is using the lock. I had a rwlock draft
> implementation sent in this thread, but this also lead to additional
> problems.
> Main problem was that this new lock would introduce nested event loops,
> that together with such locking would just create deadlocks.
> If readers are in coroutines and writers are not (because graph
> operations are not running in coroutines), we have a lot of deadlocks.
> If a writer has to take the lock, it must wait all other readers to
> finish. And it does it by internally calling AIO_WAIT_WHILE, creating
> nested event loop. We don't know what could execute when polling for
> events, and for example another writer could be resumed.
> Ideally, we want writers in coroutines too.

What is the deadlock? Do the readers depend on the writer somehow?

> Additionally, many readers are running in what we call "mixed"
> functions: usually implemented automatically with generated_co_wrapper
> tag, they let a function (usually bdrv callback) run always in a
> coroutine, creating one if necessary. For example, bdrv_flush() makes
> sure that bs->bdrv_co_flush() is always run in a coroutine.
> Such mixed functions are used in other callbacks too, making it really
> difficult to understand if we are in a coroutine or not, and most
> importantly making rwlock usage very difficult.
> 
> Which lead us to stepping back once more and try to convert all
> BlockDriverState callbacks in coroutines. This would greatly simplify
> rwlock usage, because we could make the rwlock coroutine-friendly
> (without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
> just yielding and queuing itself in coroutine queues).
> 
> First step was then to convert all callbacks in coroutines, using
> generated_coroutine_wrapper (g_c_w).
> A typical g_c_w is implemented in this way:
>   if (qemu_in_coroutine()) {
>   callback();
>   } else { // much simplified
>   co = qemu_coroutine_create(callback);
>   bdrv_coroutine_enter(bs, co);
>   BDRV_POLL_WHILE(bs, 

Re: [PATCH v2 00/11] scsi: add quirks and features to support m68k Macs

2022-05-18 Thread Mark Cave-Ayland

On 24/04/2022 17:49, Mark Cave-Ayland wrote:


Here are the next set of patches from my ongoing work to allow the q800
machine to boot MacOS related to SCSI devices.

The first patch implements a dummy FORMAT UNIT command which is used by
the Apple HD SC Setup program when preparing an empty disk to install
MacOS.

Patch 2 adds a new quirks bitmap to SCSIDiskState to allow buggy and/or
legacy features to be enabled on an individual device basis. Once the quirks
bitmap has been added, patch 3 uses the quirks feature to implement an
Apple-specific mode page which is required to allow the disk to be recognised
and used by Apple HD SC Setup.

Patch 4 adds compat_props to the q800 machine which enable the new
MODE_PAGE_APPLE_VENDOR quirk for all scsi-hd devices attached to the machine.

Patch 5 adds a new quirk to force SCSI CDROMs to always return the block
descriptor for a MODE SENSE command which is expected by A/UX, whilst patch 6
enables the quirk for all scsi-cd devices on the q800 machine.

Patch 7 adds support for truncated MODE SELECT requests, which are sent by
A/UX (and also by MacOS in some circumstances) when enumerating a SCSI CDROM
device, and which are shown to be accepted on real hardware as documented in [1].

Patch 8 allows the MODE_PAGE_R_W_ERROR AWRE bit to be changeable since the A/UX
MODE SELECT request sets this bit to 0 rather than the QEMU default which is 1.

Patch 9 adds support for setting the CDROM block size via a MODE SELECT request
which is supported by older CDROMs to allow the block size to be changed from
the default of 2048 bytes to 512 bytes for compatibility purposes. This is used
by A/UX which otherwise fails with SCSI errors if the block size is not set to
512 bytes when accessing CDROMs.

Finally patches 10 and 11 augment the compat_props to set the default vendor,
product and version information for all scsi-hd and scsi-cd devices attached
to the q800 machine, taken from real drives. This is because MacOS will only
allow a known set of SCSI devices to be recognised during the installation
process.

Signed-off-by: Mark Cave-Ayland 

[1] 
https://68kmla.org/bb/index.php?threads/scsi2sd-project-anyone-interested.29040/page-7#post-316444
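
Several of the patches above (4, 6, 10 and 11) rely on the machine-level
compat_props mechanism. A minimal sketch of that pattern is shown below; the
property names are taken from the patch subjects, so the exact spelling and
placement are assumptions rather than a copy of the patches themselves:

    /* Sketch only: machine-wide defaults for scsi-hd/scsi-cd devices. */
    static GlobalProperty q800_compat[] = {
        { "scsi-hd", "quirk_mode_page_apple_vendor", "on" },
        { "scsi-cd", "quirk_mode_sense_rom_force_dbd", "on" },
    };

    static void q800_machine_class_init(ObjectClass *oc, void *data)
    {
        MachineClass *mc = MACHINE_CLASS(oc);

        /* ... existing q800 machine class initialisation ... */

        compat_props_add(mc->compat_props, q800_compat,
                         G_N_ELEMENTS(q800_compat));
    }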


v2:
- Change patchset title from "scsi: add support for FORMAT UNIT command and 
quirks"
   to "scsi: add quirks and features to support m68k Macs"
- Fix missing shift in patch 2 as pointed out by Fam
- Rename MODE_PAGE_APPLE to MODE_PAGE_APPLE_VENDOR
- Add SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD quirk
- Add support for truncated MODE SELECT requests
- Allow MODE_PAGE_R_W_ERROR AWRE bit to be changeable for CDROM devices
- Allow the MODE SELECT block descriptor to set the CDROM block size


Mark Cave-Ayland (11):
   scsi-disk: add FORMAT UNIT command
   scsi-disk: add new quirks bitmap to SCSIDiskState
   scsi-disk: add MODE_PAGE_APPLE_VENDOR quirk for Macintosh
   q800: implement compat_props to enable quirk_mode_page_apple_vendor
 for scsi-hd devices
   scsi-disk: add SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD quirk for
 Macintosh
   q800: implement compat_props to enable quirk_mode_sense_rom_force_dbd
 for scsi-cd devices
   scsi-disk: allow truncated MODE SELECT requests
   scsi-disk: allow the MODE_PAGE_R_W_ERROR AWRE bit to be changeable for
 CDROM drives
   scsi-disk: allow MODE SELECT block descriptor to set the ROM device
 block size
   q800: add default vendor and product information for scsi-hd devices
   q800: add default vendor and product information for scsi-cd devices

  hw/m68k/q800.c   | 13 ++
  hw/scsi/scsi-disk.c  | 53 +++-
  hw/scsi/trace-events |  3 +++
  include/hw/scsi/scsi.h   |  4 +++
  include/scsi/constants.h |  1 +
  5 files changed, 68 insertions(+), 6 deletions(-)


Ping? Anyone have any further thoughts on this?


ATB,

Mark.



Re: [PATCH] block: get rid of blk->guest_block_size

2022-05-18 Thread Durrant, Paul

On 18/05/2022 14:09, Stefan Hajnoczi wrote:

Commit 1b7fd729559c ("block: rename buffer_alignment to
guest_block_size") noted:

   At this point, the field is set by the device emulation, but completely
   ignored by the block layer.

The last time the value of buffer_alignment/guest_block_size was
actually used was before commit 339064d50639 ("block: Don't use guest
sector size for qemu_blockalign()").

This value has not been used since 2013. Get rid of it.

Cc: Xie Yongji 
Signed-off-by: Stefan Hajnoczi 


Xen part...

Reviewed-by: Paul Durrant 



Re: [PULL 0/8] Net patches

2022-05-18 Thread Richard Henderson

On 5/17/22 20:12, Jason Wang wrote:

The following changes since commit eec398119fc6911d99412c37af06a6bc27871f85:

   Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu 
into staging (2022-05-16 16:31:01 -0700)

are available in the git repository at:

   https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 052c2579b89b0d87debe8b05594b5180f0fde87d:

   tulip: Assign default MAC address if not specified (2022-05-17 16:48:23 
+0800)




Helge Deller (1):
   tulip: Assign default MAC address if not specified

Vladislav Yaroshchuk (7):
   net/vmnet: add vmnet dependency and customizable option
   net/vmnet: add vmnet backends to qapi/net
   net/vmnet: implement shared mode (vmnet-shared)
   net/vmnet: implement host mode (vmnet-host)
   net/vmnet: implement bridged mode (vmnet-bridged)
   net/vmnet: update qemu-options.hx
   net/vmnet: update hmp-commands.hx


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~





  hmp-commands.hx   |   6 +-
  hw/net/tulip.c|   4 +-
  meson.build   |  16 +-
  meson_options.txt |   2 +
  net/clients.h |  11 ++
  net/meson.build   |   7 +
  net/net.c |  10 ++
  net/vmnet-bridged.m   | 152 +
  net/vmnet-common.m| 378 ++
  net/vmnet-host.c  | 128 ++
  net/vmnet-shared.c| 114 +
  net/vmnet_int.h   |  63 +++
  qapi/net.json | 133 ++-
  qemu-options.hx   |  25 +++
  scripts/meson-buildoptions.sh |   1 +
  15 files changed, 1044 insertions(+), 6 deletions(-)
  create mode 100644 net/vmnet-bridged.m
  create mode 100644 net/vmnet-common.m
  create mode 100644 net/vmnet-host.c
  create mode 100644 net/vmnet-shared.c
  create mode 100644 net/vmnet_int.h









Re: QEMU cpu socket allocation

2022-05-18 Thread Peter Zay
Hi Rajesh,

>So, is it a bug that Virt Manager uses more Sockets by default, when i
choose “Copy host CPU Configuration” ?

Bug, or feature.

With a win10 client in a VM, Windows PC Health Check will complain that you
do not have enough CPUs to upgrade to win11 if you have
(Sockets,Cores,Threads) = (16,1,1).

If you have (Sockets,Cores,Threads) = (1,16,1), Windows will allow the
upgrade.

Looks like a bug.
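
If the goal is simply a flat 16-vCPU guest, the topology can also be pinned
explicitly on the QEMU command line, for example (illustrative only):

    -smp 16,sockets=1,cores=16,threads=1

which avoids relying on the front-end's default of one socket per vCPU.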


On Wed, May 18, 2022 at 2:24 AM Rajesh A 
wrote:

> Hi Peter
>
>
>
> Thanks. Yes, I believe (Sockets,Cores,Threads) = (1,16,1) should be the
> best performance, as the VM does not need to access the memory of another
> NUMA node.
>
>
>
> So, is it a bug that Virt Manager uses more Sockets by default, when i
> choose “Copy host CPU Configuration” ?
>
>
>
> regards,
>
>
>
> *Rajesh A*
> Engineering Media Movement
> Media Services
>
>
>
>
> *From:* Peter Zay 
> *Sent:* 18 May 2022 01:33 AM
> *To:* Rajesh A 
> *Cc:* qemu-disc...@nongnu.org; qemu-devel@nongnu.org; Ramya R (GIMEC) <
> ramy...@tatacommunications.com>; V G ASHWINI . <
> ashwini...@contractor.tatacommunications.com>
> *Subject:* Re: QEMU cpu socket allocation
>
>
>
> *CAUTION*: This email originated from outside of the organization. Do not
> click links or open attachments unless you recognize the sender and know
> the content is safe.
>
> Hi Rajesh,
>
>
>
> Would the more intuitive CPU setting of (Sockets,Cores,Threads) = (1,16,1)
> be ok?
>
>
>
> Thanks.
>
>
>
>
>
> On Tue, May 17, 2022 at 10:03 AM Rajesh A 
> wrote:
>
> Hi QEMU dev
>
>
>
> Virt Manager is able to configure a QEMU VM with more CPU sockets than the
> physical host has.
>
> For example, in the below VM, when I request 16 vCPU cores,  by default it
> takes as 16 Sockets with 1 core each. The host itself has only 2 Sockets.
>
>1. How does QEMU allow this and how the VM works?
>2. What is the recommended configuration of Sockets/Cores/Threads for
>best VM performance of a 16 core VM running on a 2 sockets host ?
>
>
>
>
>
> regards,
>
>
>
> *Rajesh A*
> Engineering Media Movement
> Media Services
>
>
>
> Direct +91 44 66674575   |  Mobile +91 9043011101   |  IP 564575
> rajes...@tatacommunications.com
>
>
>
> 
>
> www.tatacommunications.com
> 
>
>
> 
>
>  @tata_comm
> 
>
>
>
>
>
>
>
>
> --
>
> Peter
>


-- 
Peter


Re: [PATCH] hostmem: default the amount of prealloc-threads to smp-cpus

2022-05-18 Thread Paolo Bonzini

On 5/18/22 15:31, Daniel P. Berrangé wrote:

When picking defaults there is never a perfect answer, it
is more a matter of the least-worst option.

It is pretty clear that nthreads=1 is terrible for any
large VMs. Defaulting it to nvcpus made conceptual sense
as the user has implicit said that they expect the VM to
be able to consume nvcpus worth of CPU time on the host,
so we might as well consume that allotted resource.


I agree.  Yes, one could argue that the regression was on the libvirt 
side, but it's easier to fix it in QEMU.


If we later add the ability to create a memory backend before machine 
creation (for example with a QMP-only binary), then it's of course okay 
for those backends to use only one thread and require a manual choice 
for the # of preallocation threads.


Paolo



Re: [PATCH v6 09/13] migration: Enable TLS for preempt channel

2022-05-18 Thread Daniel P . Berrangé
On Tue, May 17, 2022 at 03:57:26PM -0400, Peter Xu wrote:
> This patch is based on the async preempt channel creation.  It continues
> wiring up the new channel with TLS handshake to destionation when enabled.
> 
> Note that only the src QEMU needs such operation; the dest QEMU does not
> need any change for TLS support due to the fact that all channels are
> established synchronously there, so all the TLS magic is already properly
> handled by migration_tls_channel_process_incoming().
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/postcopy-ram.c | 57 ++--
>  migration/trace-events   |  1 +
>  2 files changed, 50 insertions(+), 8 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 03/18] block: Change blk_{pread,pwrite}() param order

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 12:37:31PM +0100, Alberto Faria wrote:
> Swap 'buf' and 'bytes' around for consistency with
> blk_co_{pread,pwrite}(), and in preparation to implement these functions
> using generated_co_wrapper.
> 
> Callers were updated using this Coccinelle script:
> 
> @@ expression blk, offset, buf, bytes, flags; @@
> - blk_pread(blk, offset, buf, bytes, flags)
> + blk_pread(blk, offset, bytes, buf, flags)
> 
> @@ expression blk, offset, buf, bytes, flags; @@
> - blk_pwrite(blk, offset, buf, bytes, flags)
> + blk_pwrite(blk, offset, bytes, buf, flags)
> 
> It had no effect on hw/block/nand.c, presumably due to the #if, so that
> file was updated manually.
> 
> Overly-long lines were then fixed by hand.
> 
> Signed-off-by: Alberto Faria 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:51PM +0800, Xie Yongji wrote:
> @@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
> BlockExportOptions *opts,
>  return -ENOMEM;
>  }
>  
> +vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
> +   g_get_tmp_dir(), exp->id);

g_get_tmp_dir() returns the $TMPDIR environment variable. This means
exp->id must be unique across the host. Please document this.

Thanks,
Stefan


signature.asc
Description: PGP signature


Re: [PATCH v6 08/13] migration: Export tls-[creds|hostname|authz] params to cmdline too

2022-05-18 Thread Daniel P . Berrangé
On Tue, May 17, 2022 at 03:57:25PM -0400, Peter Xu wrote:
> It's useful for specifying tls credentials all in the cmdline (along with
> the -object tls-creds-*), especially for debugging purpose.
> 
> The trick here is we must remember to not free these fields again in the
> finalize() function of migration object, otherwise it'll cause double-free.
> 
> The thing is when destroying an object, we'll first destroy the properties
> that bound to the object, then the object itself.  To be explicit, when
> destroy the object in object_finalize() we have such sequence of
> operations:
> 
> object_property_del_all(obj);
> object_deinit(obj, ti);
> 
> So after this change the two fields are properly released already even
> before reaching the finalize() function but in object_property_del_all(),
> hence we don't need to free them anymore in finalize() or it's double-free.
> 
> This also fixes a trivial memory leak for tls-authz as we forgot to free it
> before this patch.
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 06/18] block: Implement blk_{pread, pwrite}() using generated_co_wrapper

2022-05-18 Thread Paolo Bonzini

On 5/18/22 15:34, Emanuele Giuseppe Esposito wrote:

+/* For blk_bs() in generated block/block-gen.c */
+#include "sysemu/block-backend.h"
+#include "block/block_int.h"
  #include "block/coroutines.h"
  #include "block/block-gen.h"
  #include "block/block_int.h"\


This second include of block_int.h is unnecessary.

Paolo



Re: [RFC 00/18] vfio: Adopt iommufd

2022-05-18 Thread Yi Liu

On 2022/5/18 15:22, zhangfei@foxmail.com wrote:



On 2022/5/17 4:55 PM, Yi Liu wrote:

Hi Zhangfei,

On 2022/5/12 17:01, zhangfei@foxmail.com wrote:


Hi, Yi

On 2022/5/11 10:17 PM, zhangfei@foxmail.com wrote:



On 2022/5/10 10:08 PM, Yi Liu wrote:

On 2022/5/10 20:45, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:

Thanks Yi and Eric,
Then will wait for the updated iommufd kernel for the PCI MMIO region.

Another question,
How to get the iommu_domain in the ioctl.


The ID of the iommu_domain (called the hwpt) it should be returned by
the vfio attach ioctl.


yes, hwpt_id is returned by the vfio attach ioctl and recorded in
qemu. You can query page table related capabilities with this id.

https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/


Thanks Yi,

Do we use iommufd_hw_pagetable_from_id in kernel?

The qemu send hwpt_id via ioctl.
Currently VFIOIOMMUFDContainer has hwpt_list,
Which member is good to save hwpt_id, IOMMUTLBEntry?


Can VFIOIOMMUFDContainer  have multi hwpt?


yes, it is possible
Then how to get hwpt_id in map/unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry 
*iotlb)


in map/unmap, should use ioas_id instead of hwpt_id






Since VFIOIOMMUFDContainer has hwpt_list now.
If so, how to get specific hwpt from map/unmap_notify in hw/vfio/as.c, 
where no vbasedev can be used for compare.


I am testing with a workaround, adding VFIOIOASHwpt *hwpt in 
VFIOIOMMUFDContainer.

And save hwpt when vfio_device_attach_container.



In kernel ioctl: iommufd_vfio_ioctl
@dev: Device to get an iommu_domain for
iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id, 
struct device *dev)

But iommufd_vfio_ioctl seems no para dev?


We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not need dev.
iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)


this is not good. dev is passed in to this function to allocate the domain
and also to check sw_msi things. If you pass in NULL, it may even be unable
to get a domain for the hwpt. It won't work, I guess.


The iommufd_hw_pagetable_from_id can be used for
1, allocate domain, which need para dev
case IOMMUFD_OBJ_IOAS
hwpt = iommufd_hw_pagetable_auto_get(ictx, ioas, dev);


this is used when attaching ioas.


2. Just return allocated domain via hwpt_id, which does not need dev.
case IOMMUFD_OBJ_HW_PAGETABLE:
return container_of(obj, struct iommufd_hw_pagetable, obj);


yes, this would be the usage in nesting. you may check my below
branch. It's for nesting integration.

https://github.com/luxis1999/iommufd/tree/iommufd-v5.18-rc4-nesting


By the way, any plan of the nested mode?

I'm working with Eric, Nic on it. Currently, I've got the above kernel
branch, QEMU side is also WIP.


Thanks


--
Regards,
Yi Liu



Re: [PATCH v5 7/8] vduse-blk: Add vduse-blk resize support

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:50PM +0800, Xie Yongji wrote:
> To support block resize, this uses vduse_dev_update_config()
> to update the capacity field in configuration space and inject
> config interrupt on the block resize callback.
> 
> Signed-off-by: Xie Yongji 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  block/export/vduse-blk.c | 20 
>  1 file changed, 20 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: TCG performance on PPC64

2022-05-18 Thread Daniel Henrique Barboza

I'm adding qemu-devel for extra visibility since you've also measure TCG
performance of x86 and s390x guests in x86/M1 hosts as well.


This is very interesting. Nice work collecting all this data.

As for ppc64 performance I'll say that I am surprised that, in the end,
the ppc64 TCG backend isn't that bad in comparison with x86. There's a good
chance that the pseries guest executes a lot of instructions that benefit x86
more than ppc64. Otherwise we wouldn't see other guests perform better on
ppc64.

The aarch64 guest is booting too slowly IMHO. I'd guess that some command-line
tuning could turn off default features that the ARM guest enables and speed up
the boot.

I'll also mention that I had no idea that the Apple M1 was that fast. Apple
really meant business when developing this chip.


Thanks,


Daniel


On 5/18/22 10:16, Matheus K. Ferst wrote:

Hi,

Since we started working with QEMU on PPC, we've noticed that
emulating PPC64 VMs is faster on x86_64 than on PPC64 itself, even when compared
with x86 machines that are slower in other workloads (like building QEMU or the
Linux kernel).

We thought it would be related to the TCG backend, which would be better 
optimized on x86. As a first approach to better understand the problem, I ran 
some boot tests with Fedora Cloud Base 35-1.2[1] on both platforms. Using the 
command line

./qemu-system-ppc64 -name Fedora-Cloud-Base-35-1.2.ppc64le -smp 2 -m 2G -vga 
none -nographic -serial pipe:Fedora-Cloud-Base-35-1.2.ppc64le -monitor 
unix:Fedora-Cloud-Base-35-1.2.ppc64le.mon,server,nowait -device 
virtio-net,netdev=vmnic -netdev user,id=vmnic -cdrom fedora-cloud-init.iso -cpu 
POWER10 -accel tcg -device virtio-scsi-pci -drive 
file=Fedora-Cloud-Base-35-1.2.ppc64le.temp.qcow2,if=none,format=qcow2,id=hd0 
-device scsi-hd,drive=hd0 -boot c

on a POWER9 DD2.2 and an Intel Xeon E5-2687W, a simple bash script reads the
".out" pipe until the "fedora login:" string is found and then issues a
"system_powerdown" through the QEMU monitor. The ".temp.qcow2" file is backed by
the original Fedora image and deleted at the end of the test, so every boot is
fresh. Running the test 10 times gave us 235.26 ± 6.27 s on PPC64 and
192.92 ± 4.53 s on x86_64, i.e., TCG is ~20% slower on the POWER9.

As a second step, I wondered if this gap would be the same when emulating other 
architectures on PPC64, so I used the same version of Fedora Cloud for 
aarch64[2] and s390x[3], using the following command lines:

./qemu-system-aarch64 -name Fedora-Cloud-Base-35-1.2.aarch64 -smp 2 -m 2G -vga 
none -nographic -serial pipe:Fedora-Cloud-Base-35-1.2.aarch64 -monitor 
unix:Fedora-Cloud-Base-35-1.2.aarch64.mon,server,nowait -device 
virtio-net,netdev=vmnic -netdev user,id=vmnic -cdrom fedora-cloud-init.iso 
-machine virt -cpu max -accel tcg -device virtio-scsi-pci -drive 
file=Fedora-Cloud-Base-35-1.2.aarch64.temp.qcow2,if=none,format=qcow2,id=hd0 
-device scsi-hd,drive=hd0 -boot c -bios ./pc-bios/edk2-aarch64-code.fd

and

./qemu-system-s390x -name Fedora-Cloud-Base-35-1.2.s390x -smp 2 -m 2G -vga none 
-nographic -serial pipe:Fedora-Cloud-Base-35-1.2.s390x -monitor 
unix:Fedora-Cloud-Base-35-1.2.s390x.mon,server,nowait -device 
virtio-net,netdev=vmnic -netdev user,id=vmnic -cdrom fedora-cloud-init.iso 
-machine s390-ccw-virtio -cpu max -accel tcg -hda 
Fedora-Cloud-Base-35-1.2.s390x.temp.qcow2 -boot c

With 50 runs, we got:

+---------+----------------+----------------+
|         |              Host               |
|  Guest  +----------------+----------------+
|         |     PPC64      |     x86_64     |
+---------+----------------+----------------+
| PPC64   |  194.72 ± 7.28 |  162.75 ± 8.75 |
| aarch64 |  501.89 ± 9.98 | 586.08 ± 10.55 |
| s390x   | 294.10 ± 21.62 | 223.71 ± 85.30 |
+---------+----------------+----------------+

The difference with an s390x guest is around 30%, with a greater variability
on x86_64 whose source I couldn't find. However, the POWER9 emulates aarch64
faster than this Xeon.

The particular workload of the guest could distort this result since in the 
first boot Cloud-Init will create user accounts, generate SSH keys, etc. If the 
aarch64 guest uses many vector instructions for this initial setup, that might 
explain why an older Xeon would be slower here.

As a final test, I changed the images to have a normal user account already 
created and unlocked, disabled Cloud-Init, downloaded bc-1.07 sources[4][5], 
installed its build dependencies[6], and changed the test script to login, 
extract, configure, build, and shutdown the guest. I also added an aarch64 
compatible machine (Apple M1 w/ 10 cores) to our test setup. Running 100 
iterations gave us the following results:

+---------+-------------+-------------+-------------+
|         |                  Host                   |
|  Guest  +-------------+-------------+-------------+
|         |    PPC64    |   x86_64    |   aarch64   |

Re: [PATCH v5 6/8] vduse-blk: Implement vduse-blk export

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:49PM +0800, Xie Yongji wrote:
> This implements a VDUSE block backends based on
> the libvduse library. We can use it to export the BDSs
> for both VM and container (host) usage.
> 
> The new command-line syntax is:
> 
> $ qemu-storage-daemon \
> --blockdev file,node-name=drive0,filename=test.img \
> --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on
> 
> After the qemu-storage-daemon started, we need to use
> the "vdpa" command to attach the device to vDPA bus:
> 
> $ vdpa dev add name vduse-export0 mgmtdev vduse
> 
> Also the device must be removed via the "vdpa" command
> before we stop the qemu-storage-daemon.
> 
> Signed-off-by: Xie Yongji 
> ---
>  MAINTAINERS   |   4 +-
>  block/export/export.c |   6 +
>  block/export/meson.build  |   5 +
>  block/export/vduse-blk.c  | 312 ++
>  block/export/vduse-blk.h  |  20 +++
>  meson.build   |  13 ++
>  meson_options.txt |   2 +
>  qapi/block-export.json|  25 ++-
>  scripts/meson-buildoptions.sh |   4 +
>  9 files changed, 388 insertions(+), 3 deletions(-)
>  create mode 100644 block/export/vduse-blk.c
>  create mode 100644 block/export/vduse-blk.h

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH] hw/tpm/tpm_tis_common.c: Assert that locty is in range

2022-05-18 Thread Stefan Berger




On 5/13/22 12:38, Peter Maydell wrote:

In tpm_tis_mmio_read(), tpm_tis_mmio_write() and
tpm_tis_dump_state(), we calculate a locality index with
tpm_tis_locality_from_addr() and then use it as an index into the
s->loc[] array.  In all these cases, the array index can't overflow
because the MemoryRegion is sized to be TPM_TIS_NUM_LOCALITIES <<
TPM_TIS_LOCALITY_SHIFT bytes.  However, Coverity can't see that, and
it complains (CID 1487138, 1487180, 1487188, 1487198, 1487240).




Add assertions that the calculated locality index is valid, which
will help Coverity and also catch any potential future bug where
the MemoryRegion isn't sized exactly.

Signed-off-by: Peter Maydell 


I trust that the 3 fixes resolve the 5 CIDs.

Reviewed-by: Stefan Berger 


---
Tested with 'make check' only...

  hw/tpm/tpm_tis_common.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/hw/tpm/tpm_tis_common.c b/hw/tpm/tpm_tis_common.c
index e700d821816..81edae410c8 100644
--- a/hw/tpm/tpm_tis_common.c
+++ b/hw/tpm/tpm_tis_common.c
@@ -295,6 +295,8 @@ static void tpm_tis_dump_state(TPMState *s, hwaddr addr)
  uint8_t locty = tpm_tis_locality_from_addr(addr);
  hwaddr base = addr & ~0xfff;

+assert(TPM_TIS_IS_VALID_LOCTY(locty));
+
  printf("tpm_tis: active locality  : %d\n"
 "tpm_tis: state of locality %d : %d\n"
 "tpm_tis: register dump:\n",
@@ -336,6 +338,8 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
  uint32_t avail;
  uint8_t v;

+assert(TPM_TIS_IS_VALID_LOCTY(locty));
+
  if (tpm_backend_had_startup_error(s->be_driver)) {
  return 0;
  }
@@ -458,6 +462,8 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
  uint16_t len;
  uint32_t mask = (size == 1) ? 0xff : ((size == 2) ? 0x : ~0);

+assert(TPM_TIS_IS_VALID_LOCTY(locty));
+
  trace_tpm_tis_mmio_write(size, addr, val);

  if (locty == 4) {


All 3 of your fixes below are after the 3 existing calls to 
tpm_tis_locality_from_addr(). Would Coverity be happy if we were to move 
the asserts into that one function? I am fine with this patch, though.
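
If Coverity can follow the value through the helper, one way to do that (just a
sketch, reusing the existing TPM_TIS_LOCALITY_SHIFT and TPM_TIS_IS_VALID_LOCTY()
definitions rather than the function's actual body) would be:

  static uint8_t tpm_tis_locality_from_addr(hwaddr addr)
  {
      uint8_t locty = (addr >> TPM_TIS_LOCALITY_SHIFT) & 0x7;

      /* every caller indexes s->loc[] with this value, so check it once here */
      assert(TPM_TIS_IS_VALID_LOCTY(locty));
      return locty;
  }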




Re: [PATCH v5 5/8] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:48PM +0800, Xie Yongji wrote:
> +static int vduse_queue_update_vring(VduseVirtq *vq, uint64_t desc_addr,
> +uint64_t avail_addr, uint64_t used_addr)
> +{
> +struct VduseDev *dev = vq->dev;
> +uint64_t len;
> +
> +len = sizeof(struct vring_desc);
> +vq->vring.desc = iova_to_va(dev, &len, desc_addr);
> +assert(len == sizeof(struct vring_desc));
> +
> +len = sizeof(struct vring_avail);
> +vq->vring.avail = iova_to_va(dev, &len, avail_addr);
> +assert(len == sizeof(struct vring_avail));
> +
> +len = sizeof(struct vring_used);
> +vq->vring.used = iova_to_va(dev, &len, used_addr);
> +assert(len == sizeof(struct vring_used));

Can these assertions be triggered by a malicious virtio driver? For
example, if a guest is accessing this device, will the vDPA/VDUSE kernel
code forward the address to QEMU without validation?

If yes, then it's necessary to return an error here instead of aborting.
A qemu-storage-daemon process might contain multiple VDUSE exports and
an error in one export shouldn't kill the entire process.

I haven't audited the code, but this is a general issue: if vDPA/VDUSE
kernel code forwards untrusted inputs to us then we cannot abort or
crash. Usually the kernel is trusted by userspace but maybe not in this
case since it may just forward inputs from a malicious VIRTIO driver?
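
A rough sketch of the non-aborting variant (assuming vduse_queue_update_vring()
keeps its current int return value and callers propagate the error; names are
taken from the quoted patch):

  len = sizeof(struct vring_desc);
  vq->vring.desc = iova_to_va(dev, &len, desc_addr);
  if (!vq->vring.desc || len != sizeof(struct vring_desc)) {
      return -EINVAL; /* reject the malformed address instead of aborting */
  }
  /* ... and the same check for the avail and used rings ... */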


signature.asc
Description: PGP signature


[PATCH v2 3/3] ui: Remove deprecated options "-sdl" and "-curses"

2022-05-18 Thread Thomas Huth
We have "-sdl" and "-curses", but no "-gtk" and no "-cocoa" ...
these old-style options are more confusing than helpful nowadays.
Now that the deprecation period is over, let's remove them, so we
get a cleaner interface (where "-display" is the only way to select
the user interface).

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst   | 10 --
 docs/about/removed-features.rst | 10 ++
 softmmu/vl.c| 19 ---
 qemu-options.hx | 24 ++--
 4 files changed, 12 insertions(+), 51 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 562a133f18..e19bcba242 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -81,16 +81,6 @@ the process listing. This is replaced by the new 
``password-secret``
 option which lets the password be securely provided on the command
 line using a ``secret`` object instance.
 
-``-sdl`` (since 6.2)
-
-
-Use ``-display sdl`` instead.
-
-``-curses`` (since 6.2)
-'''
-
-Use ``-display curses`` instead.
-
 ``-watchdog`` (since 6.2)
 '
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 4c9e001c35..c7b9dadd5d 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -386,6 +386,16 @@ Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
 
 Use ``-display sdl,grab-mod=rctrl`` instead.
 
+``-sdl`` (removed in 7.1)
+'
+
+Use ``-display sdl`` instead.
+
+``-curses`` (removed in 7.1)
+
+
+Use ``-display curses`` instead.
+
 
 QEMU Machine Protocol (QMP) commands
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 484e9d9921..4c1e94b00e 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2800,16 +2800,6 @@ void qemu_init(int argc, char **argv, char **envp)
 nographic = true;
 dpy.type = DISPLAY_TYPE_NONE;
 break;
-case QEMU_OPTION_curses:
-warn_report("-curses is deprecated, "
-"use -display curses instead.");
-#ifdef CONFIG_CURSES
-dpy.type = DISPLAY_TYPE_CURSES;
-#else
-error_report("curses or iconv support is disabled");
-exit(1);
-#endif
-break;
 case QEMU_OPTION_portrait:
 graphic_rotate = 90;
 break;
@@ -3176,15 +3166,6 @@ void qemu_init(int argc, char **argv, char **envp)
 dpy.has_full_screen = true;
 dpy.full_screen = true;
 break;
-case QEMU_OPTION_sdl:
-warn_report("-sdl is deprecated, use -display sdl instead.");
-#ifdef CONFIG_SDL
-dpy.type = DISPLAY_TYPE_SDL;
-break;
-#else
-error_report("SDL support is disabled");
-exit(1);
-#endif
 case QEMU_OPTION_pidfile:
 pid_file = optarg;
 break;
diff --git a/qemu-options.hx b/qemu-options.hx
index 011caad530..3efca0e7f9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1981,9 +1981,8 @@ DEF("display", HAS_ARG, QEMU_OPTION_display,
 , QEMU_ARCH_ALL)
 SRST
 ``-display type``
-Select type of display to use. This option is a replacement for the
-old style -sdl/-curses/... options. Use ``-display help`` to list
-the available display types. Valid values for type are
+Select type of display to use. Use ``-display help`` to list the available
+display types. Valid values for type are
 
 ``spice-app[,gl=on|off]``
 Start QEMU as a Spice server and launch the default Spice client
@@ -2085,25 +2084,6 @@ SRST
 Use C-a h for help on switching between the console and monitor.
 ERST
 
-DEF("curses", 0, QEMU_OPTION_curses,
-"-curses shorthand for -display curses\n",
-QEMU_ARCH_ALL)
-SRST
-``-curses``
-Normally, if QEMU is compiled with graphical window support, it
-displays output such as guest graphics, guest console, and the QEMU
-monitor in a window. With this option, QEMU can display the VGA
-output when in text mode using a curses/ncurses interface. Nothing
-is displayed in graphical mode.
-ERST
-
-DEF("sdl", 0, QEMU_OPTION_sdl,
-"-sdlshorthand for -display sdl\n", QEMU_ARCH_ALL)
-SRST
-``-sdl``
-Enable SDL.
-ERST
-
 #ifdef CONFIG_SPICE
 DEF("spice", HAS_ARG, QEMU_OPTION_spice,
 "-spice [port=port][,tls-port=secured-port][,x509-dir=]\n"
-- 
2.27.0




[PATCH v2 1/3] ui: Remove deprecated parameters of the "-display sdl" option

2022-05-18 Thread Thomas Huth
These parameters are in the way for further refactoring (since they
use an underscore in the name which is forbidden in QAPI), so let's
remove these now that their deprecation period is over.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst   | 16 -
 docs/about/removed-features.rst | 17 ++
 softmmu/vl.c| 41 +
 qemu-options.hx | 32 ++---
 4 files changed, 20 insertions(+), 86 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index a92ae0f162..562a133f18 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -81,22 +81,6 @@ the process listing. This is replaced by the new 
``password-secret``
 option which lets the password be securely provided on the command
 line using a ``secret`` object instance.
 
-``-display sdl,window_close=...`` (since 6.1)
-'
-
-Use ``-display sdl,window-close=...`` instead (i.e. with a minus instead of
-an underscore between "window" and "close").
-
-``-alt-grab`` and ``-display sdl,alt_grab=on`` (since 6.2)
-''
-
-Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
-
-``-ctrl-grab`` and ``-display sdl,ctrl_grab=on`` (since 6.2)
-
-
-Use ``-display sdl,grab-mod=rctrl`` instead.
-
 ``-sdl`` (since 6.2)
 
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index eb76974347..4c9e001c35 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -370,6 +370,23 @@ The ``opened=on`` option in the command line or QMP 
``object-add`` either had
 no effect (if ``opened`` was the last option) or caused errors.  The property
 is therefore useless and should simply be removed.
 
+``-display sdl,window_close=...`` (removed in 7.1)
+''
+
+Use ``-display sdl,window-close=...`` instead (i.e. with a minus instead of
+an underscore between "window" and "close").
+
+``-alt-grab`` and ``-display sdl,alt_grab=on`` (removed in 7.1)
+'''
+
+Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
+
+``-ctrl-grab`` and ``-display sdl,ctrl_grab=on`` (removed in 7.1)
+'
+
+Use ``-display sdl,grab-mod=rctrl`` instead.
+
+
 QEMU Machine Protocol (QMP) commands
 
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 84a31eba76..57ab9d5322 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1079,32 +1079,7 @@ static void parse_display(const char *p)
 } else {
 goto invalid_sdl_args;
 }
-} else if (strstart(opts, ",alt_grab=", )) {
-opts = nextopt;
-if (strstart(opts, "on", )) {
-alt_grab = 1;
-} else if (strstart(opts, "off", )) {
-alt_grab = 0;
-} else {
-goto invalid_sdl_args;
-}
-warn_report("alt_grab is deprecated, use grab-mod instead.");
-} else if (strstart(opts, ",ctrl_grab=", )) {
-opts = nextopt;
-if (strstart(opts, "on", )) {
-ctrl_grab = 1;
-} else if (strstart(opts, "off", )) {
-ctrl_grab = 0;
-} else {
-goto invalid_sdl_args;
-}
-warn_report("ctrl_grab is deprecated, use grab-mod instead.");
-} else if (strstart(opts, ",window_close=", ) ||
-   strstart(opts, ",window-close=", )) {
-if (strstart(opts, ",window_close=", NULL)) {
-warn_report("window_close with an underscore is 
deprecated,"
-" please use window-close instead.");
-}
+} else if (strstart(opts, ",window-close=", )) {
 opts = nextopt;
 dpy.has_window_close = true;
 if (strstart(opts, "on", )) {
@@ -1962,10 +1937,6 @@ static void qemu_create_early_backends(void)
 const bool use_gtk = false;
 #endif
 
-if ((alt_grab || ctrl_grab) && !use_sdl) {
-error_report("-alt-grab and -ctrl-grab are only valid "
- "for SDL, ignoring option");
-}
 if (dpy.has_window_close && !use_gtk && !use_sdl) {
 error_report("window-close is only valid for GTK and SDL, "
  "ignoring option");
@@ -3273,16 +3244,6 @@ void qemu_init(int argc, char **argv, char **envp)
 dpy.has_full_screen = true;
 dpy.full_screen = true;
 break;
-case 

[PATCH v2 0/3] ui: Remove deprecated sdl parameters and switch to QAPI parser

2022-05-18 Thread Thomas Huth
The "-display sdl" option still uses a hand-crafted parser for its
parameters since some of them used underscores which is forbidden
in QAPI. Now that they've been deprecated and the deprecation period
is over, we can remove the problematic parameters and switch to use
the QAPI parser instead.

While we're at it, also remove the deprecated "-sdl" and "-curses" options.

v2:
 - Rebase to current master branch to resolve conflicts in docs/about/*.rst
 - Use an enum for the grab-mod parameter instead of a unconstrained string

Thomas Huth (3):
  ui: Remove deprecated parameters of the "-display sdl" option
  ui: Switch "-display sdl" to use the QAPI parser
  ui: Remove deprecated options "-sdl" and "-curses"

 docs/about/deprecated.rst   |  26 ---
 docs/about/removed-features.rst |  27 +++
 qapi/ui.json|  27 ++-
 include/sysemu/sysemu.h |   2 -
 softmmu/globals.c   |   2 -
 softmmu/vl.c| 128 +---
 ui/sdl2.c   |  10 +++
 qemu-options.hx |  56 +-
 8 files changed, 68 insertions(+), 210 deletions(-)

-- 
2.27.0




[PATCH v2 2/3] ui: Switch "-display sdl" to use the QAPI parser

2022-05-18 Thread Thomas Huth
The "-display sdl" option still uses a hand-crafted parser for its
parameters since we didn't want to drag an interface we considered
somewhat flawed into the QAPI schema. Since the flaws are gone now,
it's time to QAPIfy.

This introduces the new "DisplaySDL" QAPI struct that is used to hold
the parameters that are unique to the SDL display. The only specific
parameter is currently "grab-mod" that is used to specify the required
modifier keys to escape from the mouse grabbing mode.

Signed-off-by: Thomas Huth 
---
 qapi/ui.json| 27 +++-
 include/sysemu/sysemu.h |  2 --
 softmmu/globals.c   |  2 --
 softmmu/vl.c| 70 +
 ui/sdl2.c   | 10 ++
 5 files changed, 37 insertions(+), 74 deletions(-)

diff --git a/qapi/ui.json b/qapi/ui.json
index 11a827d10f..a244e26e0f 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1295,6 +1295,30 @@
   '*swap-opt-cmd': 'bool'
   } }
 
+##
+# @GrabMod:
+#
+# Set of modifier keys that need to be hold for shortcut key actions.
+#
+# Since: 7.1
+##
+{ 'enum'  : 'GrabMod',
+  'data'  : [ 'lctrl-lalt', 'lshift-lctrl-lalt', 'rctrl' ] }
+
+##
+# @DisplaySDL:
+#
+# SDL2 display options.
+#
+# @grab-mod:  String with modifier keys that should be pressed together with
+# the "G" key to release the mouse grab. Only "lshift-lctrl-lalt"
+# and "rctrl" are currently supported.
+#
+# Since: 7.1
+##
+{ 'struct'  : 'DisplaySDL',
+  'data': { '*grab-mod'   : 'GrabMod' } }
+
 ##
 # @DisplayType:
 #
@@ -1374,7 +1398,8 @@
   'curses': { 'type': 'DisplayCurses', 'if': 'CONFIG_CURSES' },
   'egl-headless': { 'type': 'DisplayEGLHeadless',
 'if': { 'all': ['CONFIG_OPENGL', 'CONFIG_GBM'] } },
-  'dbus': { 'type': 'DisplayDBus', 'if': 'CONFIG_DBUS_DISPLAY' }
+  'dbus': { 'type': 'DisplayDBus', 'if': 'CONFIG_DBUS_DISPLAY' },
+  'sdl': { 'type': 'DisplaySDL', 'if': 'CONFIG_SDL' }
   }
 }
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b4030acd74..812f66a31a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -42,8 +42,6 @@ extern int graphic_depth;
 extern int display_opengl;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
-extern int alt_grab;
-extern int ctrl_grab;
 extern int graphic_rotate;
 extern int old_param;
 extern uint8_t *boot_splash_filedata;
diff --git a/softmmu/globals.c b/softmmu/globals.c
index 916bc12e2b..527edbefdd 100644
--- a/softmmu/globals.c
+++ b/softmmu/globals.c
@@ -50,8 +50,6 @@ QEMUOptionRom option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
 int old_param;
 const char *qemu_name;
-int alt_grab;
-int ctrl_grab;
 unsigned int nb_prom_envs;
 const char *prom_envs[MAX_PROM_ENVS];
 uint8_t *boot_splash_filedata;
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 57ab9d5322..484e9d9921 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1056,75 +1056,7 @@ static void parse_display(const char *p)
 exit(0);
 }
 
-if (strstart(p, "sdl", )) {
-/*
- * sdl DisplayType needs hand-crafted parser instead of
- * parse_display_qapi() due to some options not in
- * DisplayOptions, specifically:
- *   - ctrl_grab + alt_grab
- * They can't be moved into the QAPI since they use underscores,
- * thus they will get replaced by "grab-mod" in the long term
- */
-#if defined(CONFIG_SDL)
-dpy.type = DISPLAY_TYPE_SDL;
-while (*opts) {
-const char *nextopt;
-
-if (strstart(opts, ",grab-mod=", )) {
-opts = nextopt;
-if (strstart(opts, "lshift-lctrl-lalt", )) {
-alt_grab = 1;
-} else if (strstart(opts, "rctrl", )) {
-ctrl_grab = 1;
-} else {
-goto invalid_sdl_args;
-}
-} else if (strstart(opts, ",window-close=", )) {
-opts = nextopt;
-dpy.has_window_close = true;
-if (strstart(opts, "on", )) {
-dpy.window_close = true;
-} else if (strstart(opts, "off", )) {
-dpy.window_close = false;
-} else {
-goto invalid_sdl_args;
-}
-} else if (strstart(opts, ",show-cursor=", )) {
-opts = nextopt;
-dpy.has_show_cursor = true;
-if (strstart(opts, "on", )) {
-dpy.show_cursor = true;
-} else if (strstart(opts, "off", )) {
-dpy.show_cursor = false;
-} else {
-goto invalid_sdl_args;
-}
-} else if (strstart(opts, ",gl=", )) {
-opts = nextopt;
-dpy.has_gl = true;
-if (strstart(opts, "on", )) {
-dpy.gl = DISPLAYGL_MODE_ON;
-   

Re: [PATCH v22 3/7] net/vmnet: implement shared mode (vmnet-shared)

2022-05-18 Thread osy
On Thu, Mar 17, 2022 at 10:28 AM Vladislav Yaroshchuk
 wrote:
>
> Interaction with vmnet.framework in different modes
> differs only on configuration stage, so we can create
> common `send`, `receive`, etc. procedures and reuse them.
>
> Signed-off-by: Phillip Tennen 
> Signed-off-by: Vladislav Yaroshchuk 
> Reviewed-by: Akihiko Odaki 
> Tested-by: Akihiko Odaki 

Hi Vladislav,

Thanks for the patches. We ran into an issue integrating it into UTM
when we discovered that by targeting a lower macOS version (we like to
have one binary for all macOS versions), newer features were compiled
out. This commit resolves that and we think it would be beneficial to
others as well. Since it does not appear the submission has made it to
mainline yet, it could be integrated into your patchset.

https://github.com/utmapp/qemu/commit/6626058f225c9c6a402f9ac6f90aa0b7e94d175c

Thank you!



Re: [PATCH 06/18] block: Implement blk_{pread, pwrite}() using generated_co_wrapper

2022-05-18 Thread Emanuele Giuseppe Esposito



Am 17/05/2022 um 16:22 schrieb Paolo Bonzini:
> On 5/17/22 13:38, Alberto Faria wrote:
>> We need to add include/sysemu/block-backend-io.h to the inputs of the
>> block-gen.c target defined in block/meson.build.
>>
>> Signed-off-by: Alberto Faria 
> 
> Emanuele is looking it cleaning this up, so you two need to coordinate.
> 
> Emanuele, can you separate/post the initial patches to clean up the
> includes of block/coroutines.h and block/block-gen.c?
> 
> Apart from that, fewer block/coroutines.h declarations is only good stuff.
> 

This is the main patch: it just moves headers from block/coroutines.h to
the script (block_gen.c).

This allows you to keep generated_co_wrapper functions in the header
where they originally were (no need to move the function in coroutines.h
and include it everywhere).

Btw @Alberto can you also cc me in the next series? So that I can follow
too.

Thank you,
Emanuele

From 84fcea52c09024adcfe24bb0d6d2ec6842c6826b Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito 
Date: Tue, 17 May 2022 13:35:54 -0400
Subject: [PATCH] block-coroutine-wrapper: remove includes from coroutines.h

These includes in coroutines.h are not needed. Instead, they can
be moved in block-gen.c since they are needed by the generated
functions.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/coroutines.h | 5 -
 scripts/block-coroutine-wrapper.py | 3 +++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index b293e943c8..45ecbfcc6a 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -25,11 +25,6 @@
 #ifndef BLOCK_COROUTINES_INT_H
 #define BLOCK_COROUTINES_INT_H

-#include "block/block_int.h"
-
-/* For blk_bs() in generated block/block-gen.c */
-#include "sysemu/block-backend.h"
-
 /*
  * I/O API functions. These functions are thread-safe.
  *
diff --git a/scripts/block-coroutine-wrapper.py
b/scripts/block-coroutine-wrapper.py
index 625b03e3ab..39fb1e8a4f 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -40,6 +40,9 @@ def gen_header():
  */

 #include "qemu/osdep.h"
+/* For blk_bs() in generated block/block-gen.c */
+#include "sysemu/block-backend.h"
+#include "block/block_int.h"
 #include "block/coroutines.h"
 #include "block/block-gen.h"
 #include "block/block_int.h"\
-- 
2.31.1

>> ---
>>   block/block-backend.c | 23 ---
>>   block/coroutines.h    |  4 
>>   block/meson.build |  1 +
>>   include/sysemu/block-backend-io.h | 10 ++
>>   4 files changed, 7 insertions(+), 31 deletions(-)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 5b9706c443..c2a4c44a99 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -1567,29 +1567,6 @@ BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend
>> *blk, int64_t offset,
>>   flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
>>   }
>>   -int blk_pread(BlockBackend *blk, int64_t offset, int64_t bytes,
>> void *buf,
>> -  BdrvRequestFlags flags)
>> -{
>> -    int ret;
>> -    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
>> -    IO_OR_GS_CODE();
>> -
>> -    blk_inc_in_flight(blk);
>> -    ret = blk_do_preadv(blk, offset, bytes, &qiov, flags);
>> -    blk_dec_in_flight(blk);
>> -
>> -    return ret;
>> -}
>> -
>> -int blk_pwrite(BlockBackend *blk, int64_t offset, int64_t bytes,
>> -   const void *buf, BdrvRequestFlags flags)
>> -{
>> -    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
>> -    IO_OR_GS_CODE();
>> -
>> -    return blk_pwritev_part(blk, offset, bytes, &qiov, 0, flags);
>> -}
>> -
>>   int64_t blk_getlength(BlockBackend *blk)
>>   {
>>   IO_CODE();
>> diff --git a/block/coroutines.h b/block/coroutines.h
>> index 3f41238b33..443ef2f2e6 100644
>> --- a/block/coroutines.h
>> +++ b/block/coroutines.h
>> @@ -112,10 +112,6 @@ bdrv_common_block_status_above(BlockDriverState *bs,
>>   int generated_co_wrapper
>>   nbd_do_establish_connection(BlockDriverState *bs, bool blocking,
>> Error **errp);
>>   -int generated_co_wrapper
>> -blk_do_preadv(BlockBackend *blk, int64_t offset, int64_t bytes,
>> -  QEMUIOVector *qiov, BdrvRequestFlags flags);
>> -
>>   int generated_co_wrapper
>>   blk_do_pwritev_part(BlockBackend *blk, int64_t offset, int64_t bytes,
>>   QEMUIOVector *qiov, size_t qiov_offset,
>> diff --git a/block/meson.build b/block/meson.build
>> index 0b2a60c99b..60bc305597 100644
>> --- a/block/meson.build
>> +++ b/block/meson.build
>> @@ -136,6 +136,7 @@ block_gen_c = custom_target('block-gen.c',
>>   input: files(
>>     '../include/block/block-io.h',
>>    
>> '../include/block/block-global-state.h',
>> + 
>> '../include/sysemu/block-backend-io.h',
>>     

Re: [PATCH] hostmem: default the amount of prealloc-threads to smp-cpus

2022-05-18 Thread Daniel P . Berrangé
On Wed, May 18, 2022 at 03:02:48PM +0200, Dario Faggioli wrote:
> On Wed, 2022-05-18 at 12:17 +0200, Igor Mammedov wrote:
> > On Tue, 17 May 2022 20:46:50 +0200
> > Paolo Bonzini  wrote:
> > > > diff --git a/backends/hostmem.c b/backends/hostmem.c
> > > > index a7bae3d713..624bb7ecd3 100644
> > > > --- a/backends/hostmem.c
> > > > +++ b/backends/hostmem.c
> > > > @@ -274,7 +274,7 @@ static void host_memory_backend_init(Object
> > > > *obj)
> > > >   backend->merge = machine_mem_merge(machine);
> > > >   backend->dump = machine_dump_guest_core(machine);
> > > >   backend->reserve = true;
> > > > -    backend->prealloc_threads = 1;
> > > > +    backend->prealloc_threads = machine->smp.cpus;
> > > >   }
> > > >   
> > > >   static void host_memory_backend_post_init(Object *obj)  
> > > 
> > > Queued, thanks.
> > 
> > PS:
> > There is no good default in this case (whatever number is picked
> > it could be good or bad depending on usecase).
> > 
> That is fair enough. What we observed, however, is that, with QEMU 5.2,
> starting a 1024G VM takes ~34s.
> 
> Then you just update QEMU to > 5.2 (and don't do/changing anything
> else) and the same VM now takes ~4m30s to start.
> 
> If users are managing QEMU via Libvirt *and* have _at_least_ Libvirt
> 8.2, they can indeed set, e.g., <allocation threads='NNN'/> (provided they can understand where the problem is, and
> figure out that this is the solution).

I think you got the QEMU version numbers a bit mixed up, based on
what I see in the git history.

Originally mem preallocation was single-threaded and slow.

In v2.8.1 it switched to multi threaded with nthreads==vcpus

  commit 1e356fc14beaa3ece6c0e961bd479af58be3198b
  Author: Jitendra Kolhe 
  Date:   Fri Feb 24 09:01:43 2017 +0530

mem-prealloc: reduce large guest start-up and migration time.

This applied to --mem-prealloc and --object memory-backend*,prealloc=on


In v5.0.0 the prealloc-threads property was introduced with

  commit ffac16fab33bb42f17e47624985220c1fd864e9d
  Author: Igor Mammedov 
  Date:   Wed Feb 19 11:09:50 2020 -0500

hostmem: introduce "prealloc-threads" property

This changed it so that --mem-prealloc still uses nthreads=vcpus
but --object memory-backend,prealloc=on regressed to nthreads=1

When picking defaults there is never a perfect answer, it
is more a matter of the least-worst option.

It is pretty clear that nthreads=1 is terrible for any
large VMs. Defaulting it to nvcpus made conceptual sense
as the user has implicit said that they expect the VM to
be able to consume nvcpus worth of CPU time on the host,
so we might as well consume that allotted resource.

I struggle to come up with a compelling reason why it is
better to only use 1 single thread for preallocation. There
might be some niches where its useful but I can't see it
being the common case desirable behaviour.

Having different defaults based on how you configure it
is also an especially unpleasant experience.
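
For illustration only (this is not QEMU code), the reason more threads help is
that preallocation is just faulting pages in, which parallelises almost
perfectly across the guest RAM area:

  /* Toy sketch: each worker touches one byte per page in its own slice, so
   * wall-clock preallocation time shrinks roughly with the thread count. */
  typedef struct {
      char *base;       /* start of this thread's slice */
      size_t pages;     /* number of pages in the slice */
      size_t page_size;
  } PreallocSlice;

  static void *touch_slice(void *opaque)
  {
      PreallocSlice *s = opaque;
      for (size_t i = 0; i < s->pages; i++) {
          s->base[i * s->page_size] = 0;  /* force the page to be allocated */
      }
      return NULL;
  }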

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v5 4/8] linux-headers: Add vduse.h

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:47PM +0800, Xie Yongji wrote:
> This adds vduse header to linux headers so that the
> relevant VDUSE API can be used in subsequent patches.
> 
> Signed-off-by: Xie Yongji 
> ---
>  linux-headers/linux/vduse.h | 306 
>  scripts/update-linux-headers.sh |   2 +-
>  2 files changed, 307 insertions(+), 1 deletion(-)
>  create mode 100644 linux-headers/linux/vduse.h

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v5 1/8] block: Support passing NULL ops to blk_set_dev_ops()

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:44PM +0800, Xie Yongji wrote:
> This supports passing NULL ops to blk_set_dev_ops()
> so that we can remove stale ops in some cases.
> 
> Signed-off-by: Xie Yongji 
> ---
>  block/block-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v5 2/8] block-backend: Introduce blk_get_guest_block_size()

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:45PM +0800, Xie Yongji wrote:
> Support getting the guest block size for the block backend.
> It's needed for the following commit.
> 
> Signed-off-by: Xie Yongji 
> ---
>  block/block-backend.c | 6 ++
>  include/sysemu/block-backend-io.h | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 35457a6a1d..1582ff81c9 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2106,6 +2106,12 @@ void blk_set_guest_block_size(BlockBackend *blk, int 
> align)
>  blk->guest_block_size = align;
>  }
>  
> +int blk_get_guest_block_size(BlockBackend *blk)
> +{
> +IO_CODE();
> +return blk->guest_block_size;

I have sent a patch to remove blk->guest_block_size because this field
is currently unused.

I think there is a cleaner way for this patch series to store the guest
logical_block_size (see next patch). Stashing it in BlockBackend was
attractive because virtio-blk-handler.c lacks a struct to store its
parameters (writable, serial, logical_block_size), but if such a struct
is introduced then there's no need to stash it in BlockBackend.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v5 3/8] block/export: Abstract out the logic of virtio-blk I/O process

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:46PM +0800, Xie Yongji wrote:
> -static void vu_blk_req_complete(VuBlkReq *req)
> +static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
>  {
> -VuDev *vu_dev = &req->server->vu_dev;
>  
> -/* IO size with 1 extra status byte */
> -vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
> +vu_queue_push(vu_dev, req->vq, &req->elem, in_len);

I think this silently fixes a bug: now the correct len value is
calculated. Before, the I/O buffer wasn't counted in read requests.
Please mention this in the commit description.
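
For reference, the len pushed to the used ring should count only the bytes the
device wrote into the driver-readable (in) buffers; a sketch, where is_read and
data_bytes are placeholders rather than names from the patch:

  /* read: payload written back plus the 1-byte status; others: status only */
  size_t in_len = is_read ? data_bytes + 1 : 1;
  vu_queue_push(vu_dev, req->vq, &req->elem, in_len);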

> +static bool virtio_blk_sect_range_ok(BlockBackend *blk,
> + uint64_t sector, size_t size)
> +{
> +uint64_t nb_sectors;
> +uint64_t total_sectors;
> +
> +if (size % VIRTIO_BLK_SECTOR_SIZE) {
> +return false;
> +}
> +
> +nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
> +
> +QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
> +if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
> +return false;
> +}
> +if ((sector << VIRTIO_BLK_SECTOR_BITS) % blk_get_guest_block_size(blk)) {

Please use VirtioBlkHandler->logical_block_size instead (see below).

> +int coroutine_fn virtio_blk_process_req(BlockBackend *blk, bool writable,
> +const char *serial,

I suggest defining a struct instead of passing individual arguments:

  typedef struct {
  BlockBackend *blk;
  const char *serial;
  uint32_t logical_block_size;
  bool writable;
  } VirtioBlkHandler;
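
With such a struct, the alignment check quoted above can read the value from
the handler instead of the BlockBackend; a hypothetical helper, as a sketch:

  /* not part of the series, just how the check would consume the struct */
  static bool virtio_blk_sector_is_aligned(const VirtioBlkHandler *handler,
                                           uint64_t sector)
  {
      return (sector << VIRTIO_BLK_SECTOR_BITS) %
             handler->logical_block_size == 0;
  }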


signature.asc
Description: PGP signature


[PATCH] block: get rid of blk->guest_block_size

2022-05-18 Thread Stefan Hajnoczi
Commit 1b7fd729559c ("block: rename buffer_alignment to
guest_block_size") noted:

  At this point, the field is set by the device emulation, but completely
  ignored by the block layer.

The last time the value of buffer_alignment/guest_block_size was
actually used was before commit 339064d50639 ("block: Don't use guest
sector size for qemu_blockalign()").

This value has not been used since 2013. Get rid of it.

Cc: Xie Yongji 
Signed-off-by: Stefan Hajnoczi 
---
 include/sysemu/block-backend-io.h|  1 -
 block/block-backend.c| 10 --
 block/export/vhost-user-blk-server.c |  1 -
 hw/block/virtio-blk.c|  1 -
 hw/block/xen-block.c |  1 -
 hw/ide/core.c|  1 -
 hw/scsi/scsi-disk.c  |  1 -
 hw/scsi/scsi-generic.c   |  1 -
 8 files changed, 17 deletions(-)

diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index 6517c39295..ccef514023 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -72,7 +72,6 @@ void blk_error_action(BlockBackend *blk, BlockErrorAction 
action,
 void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
-void blk_set_guest_block_size(BlockBackend *blk, int align);
 
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..d4abdf8faa 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -56,9 +56,6 @@ struct BlockBackend {
 const BlockDevOps *dev_ops;
 void *dev_opaque;
 
-/* the block size for which the guest device expects atomicity */
-int guest_block_size;
-
 /* If the BDS tree is removed, some of its options are stored here (which
  * can be used to restore those options in the new BDS on insert) */
 BlockBackendRootState root_state;
@@ -998,7 +995,6 @@ void blk_detach_dev(BlockBackend *blk, DeviceState *dev)
 blk->dev = NULL;
 blk->dev_ops = NULL;
 blk->dev_opaque = NULL;
-blk->guest_block_size = 512;
 blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
 blk_unref(blk);
 }
@@ -2100,12 +2096,6 @@ int blk_get_max_iov(BlockBackend *blk)
 return blk->root->bs->bl.max_iov;
 }
 
-void blk_set_guest_block_size(BlockBackend *blk, int align)
-{
-IO_CODE();
-blk->guest_block_size = align;
-}
-
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index a129204c44..b2e458ade3 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -495,7 +495,6 @@ static int vu_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 return -EINVAL;
 }
 vexp->blk_size = logical_block_size;
-blk_set_guest_block_size(exp->blk, logical_block_size);
 
 if (vu_opts->has_num_queues) {
 num_queues = vu_opts->num_queues;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cd804795c6..e9ba752f6b 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1228,7 +1228,6 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 
 s->change = qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s);
 blk_set_dev_ops(s->blk, &virtio_block_ops, s);
-blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size);
 
 blk_iostatus_enable(s->blk);
 
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 674953f1ad..345b284d70 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -243,7 +243,6 @@ static void xen_block_realize(XenDevice *xendev, Error 
**errp)
 }
 
 blk_set_dev_ops(blk, &xen_block_dev_ops, blockdev);
-blk_set_guest_block_size(blk, conf->logical_block_size);
 
 if (conf->discard_granularity == -1) {
 conf->discard_granularity = conf->physical_block_size;
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 3a5afff5d7..f7ec68513f 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2544,7 +2544,6 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, 
IDEDriveKind kind,
 s->smart_selftest_count = 0;
 if (kind == IDE_CD) {
 blk_set_dev_ops(blk, &ide_cd_block_ops, s);
-blk_set_guest_block_size(blk, 2048);
 } else {
 if (!blk_is_inserted(s->blk)) {
 error_setg(errp, "Device needs media, but drive is empty");
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 072686ed58..91acb5c0ce 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2419,7 +2419,6 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
 } else {
 blk_set_dev_ops(s->qdev.conf.blk, &scsi_disk_block_ops, s);
 }
-blk_set_guest_block_size(s->qdev.conf.blk, s->qdev.blocksize);
 
 blk_iostatus_enable(s->qdev.conf.blk);
 
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c

Re: [PATCH V2 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH

2022-05-18 Thread Dr. David Alan Gilbert
* Jason Wang (jasow...@redhat.com) wrote:
> On Sat, May 7, 2022 at 10:03 AM Zhang, Chen  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Zhang, Chen
> > > Sent: Wednesday, April 27, 2022 5:26 PM
> > > To: Jason Wang ; Paolo Bonzini
> > > 
> > > Cc: Li Zhijian ; qemu-dev  > > de...@nongnu.org>; Like Xu 
> > > Subject: RE: [PATCH V2 1/4] softmmu/runstate.c: add RunStateTransition
> > > support form COLO to PRELAUNCH
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Jason Wang 
> > > > Sent: Wednesday, April 27, 2022 4:57 PM
> > > > To: Zhang, Chen 
> > > > Cc: Li Zhijian ; qemu-dev  > > > de...@nongnu.org>; Like Xu 
> > > > Subject: Re: [PATCH V2 1/4] softmmu/runstate.c: add RunStateTransition
> > > > support form COLO to PRELAUNCH
> > > >
> > > > On Fri, Apr 1, 2022 at 11:59 AM Zhang Chen  wrote:
> > > > >
> > > > > If the checkpoint occurs when the guest finishes restarting but has
> > > > > not started running, the runstate_set() may reject the transition
> > > > > from COLO to PRELAUNCH with the crash log:
> > > > >
> > > > > {"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
> > > > > "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
> > > > > qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'
> > > > >
> > > > > Long-term testing says that it's pretty safe.
> > > > >
> > > > > Signed-off-by: Like Xu 
> > > > > Signed-off-by: Zhang Chen 
> > > >
> > > > I'd expect this to get ack from the relevant maintainers.
> > > >
> > >
> > > The scripts/get_maintainer.pl can't find relevant maintainers for this 
> > > patch.
> > > Maybe Paolo have time to cover this simple patch related to runstate?
> >
> > No news for a while, any comments for unmaintained files changes ?
> > Ping...
> 
> Adding David and Juan.

This looks OK to me;

Acked-by: Dr. David Alan Gilbert 

it should be fine to merge it along with the pull that takes the other
patches.

Dave

> Thanks
> 
> >
> > Thanks
> > Chen
> >
> > >
> > > Thanks
> > > Chen
> > >
> > > > Thanks
> > > >
> > > > > ---
> > > > >  softmmu/runstate.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/softmmu/runstate.c b/softmmu/runstate.c index
> > > > > e0d869b21a..c021c56338 100644
> > > > > --- a/softmmu/runstate.c
> > > > > +++ b/softmmu/runstate.c
> > > > > @@ -127,6 +127,7 @@ static const RunStateTransition
> > > > runstate_transitions_def[] = {
> > > > >  { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
> > > > >
> > > > >  { RUN_STATE_COLO, RUN_STATE_RUNNING },
> > > > > +{ RUN_STATE_COLO, RUN_STATE_PRELAUNCH },
> > > > >  { RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
> > > > >
> > > > >  { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
> > > > > --
> > > > > 2.25.1
> > > > >
> >
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v6 07/13] migration: Add helpers to detect TLS capability

2022-05-18 Thread Peter Xu
On Wed, May 18, 2022 at 09:57:12AM +0100, Daniel P. Berrangé wrote:
> > @@ -37,4 +37,8 @@ void migration_tls_channel_connect(MigrationState *s,
> > QIOChannel *ioc,
> > const char *hostname,
> > Error **errp);
> > +
> > +/* Whether the QIO channel requires further TLS handshake? */
> > +bool migrate_channel_requires_tls(QIOChannel *ioc);
> 
> I find this name somewhat confusing, as 'requires tls' and
> 'uses tls' are just synonyms for the same thing IMHO.
> 
> What this method is actually checking is whether we still need
> to upgrade the channel from plain text to TLS, by completing a
> TLS handshake. So can we call this:
> 
>   migrate_channel_requires_tls_upgrade

Sounds good.  I'll wait for more comments on other patches.  Thanks,

-- 
Peter Xu




Re: [PATCH] hostmem: default the amount of prealloc-threads to smp-cpus

2022-05-18 Thread Dario Faggioli
On Wed, 2022-05-18 at 12:17 +0200, Igor Mammedov wrote:
> On Tue, 17 May 2022 20:46:50 +0200
> Paolo Bonzini  wrote:
> > > diff --git a/backends/hostmem.c b/backends/hostmem.c
> > > index a7bae3d713..624bb7ecd3 100644
> > > --- a/backends/hostmem.c
> > > +++ b/backends/hostmem.c
> > > @@ -274,7 +274,7 @@ static void host_memory_backend_init(Object
> > > *obj)
> > >   backend->merge = machine_mem_merge(machine);
> > >   backend->dump = machine_dump_guest_core(machine);
> > >   backend->reserve = true;
> > > -    backend->prealloc_threads = 1;
> > > +    backend->prealloc_threads = machine->smp.cpus;
> > >   }
> > >   
> > >   static void host_memory_backend_post_init(Object *obj)  
> > 
> > Queued, thanks.
> 
> PS:
> There is no good default in this case (whatever number is picked
> it could be good or bad depending on usecase).
> 
That is fair enough. What we observed, however, is that, with QEMU 5.2,
starting a 1024G VM takes ~34s.

Then you just update QEMU to > 5.2 (and don't do/changing anything
else) and the same VM now takes ~4m30s to start.

If users are managing QEMU via Libvirt *and* have _at_least_ Libvirt
8.2, they can indeed set, e.g., <allocation threads='NNN'/> (provided they can understand where the problem is, and
figure out that this is the solution).

If they have Libvirt < 8.2 (e.g., people/distros that have, say, QEMU
6.2 and Libvirt 8.0.0, or something like that), there's basically
nothing they can do... Except perhaps command line passthrough [1], but
that's really rather tricky!

So, I personally don't know where any default should be set and how,
but the above situation is not nice for users to have to handle.

[1] https://libvirt.org/kbase/qemu-passthrough-security.html

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part


Re: [PATCH 01/18] block: Make blk_{pread,pwrite}() return 0 on success

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 12:35:07PM +0100, Alberto Faria wrote:
> They currently return the value of their 'bytes' parameter on success.
> 
> Make them return 0 instead, for consistency with other I/O functions and
> in preparation to implement them using generated_co_wrapper. This also
> makes it clear that short reads/writes are not possible.
> 
> Signed-off-by: Alberto Faria 
> ---

> +++ b/qemu-img.c
> @@ -5120,30 +5120,27 @@ static int img_dd(int argc, char **argv)
>  in.buf = g_new(uint8_t, in.bsz);
>  
>  for (out_pos = 0; in_pos < size; block_count++) {

in_pos, out_pos, and size are int64_t...

> -int in_ret, out_ret;
> +int bytes, in_ret, out_ret;
>  
> -if (in_pos + in.bsz > size) {
> -in_ret = blk_pread(blk1, in_pos, in.buf, size - in_pos);
> -} else {
> -in_ret = blk_pread(blk1, in_pos, in.buf, in.bsz);
> -}
> +bytes = (in_pos + in.bsz > size) ? size - in_pos : in.bsz;

...but in.bsz is int, so declaring 'int bytes' appears safe.
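A standalone restatement of that bound, as an illustration only (clamp_bytes
is a made-up name, not qemu-img code; it assumes in_pos < size, as in the
img_dd loop):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Positions and size are int64_t, the block size is an int; the clamped
     * byte count is therefore always representable as an int. */
    static int clamp_bytes(int64_t in_pos, int64_t size, int bsz)
    {
        int64_t bytes = (in_pos + bsz > size) ? size - in_pos : bsz;
        assert(bytes >= 0 && bytes <= bsz);
        return (int)bytes;
    }

    int main(void)
    {
        printf("%d\n", clamp_bytes(0, 10, 4));  /* 4: a full block fits */
        printf("%d\n", clamp_bytes(8, 10, 4));  /* 2: clamped at end of input */
        return 0;
    }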

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




RE: About restoring the state in vhost-vdpa device

2022-05-18 Thread Parav Pandit

> From: Eugenio Perez Martin 
> Sent: Tuesday, May 17, 2022 4:12 AM
 
> > 2. Enabling each VQ one at a time requires a constant steering update
> > for the VQ, while this information is already known. Trying to reuse
> > the per-VQ enable callback results in this inefficiency.
> > So it is better to start with more reusable APIs that fit the LM flow.
> 
> We can change to that model later. Since the model proposed by us does not
> add any burden, we can discard it down the road if something better arises.
> The proposed behavior should already work for all
> devices: It comes for free regarding kernel / vdpa code.
It is not for free.
It comes with higher LM downtime.
And that makes it unusable as the queues scale.

> 
> I think that doing at vhost/vDPA level is going to cause the same problem as
> VRING_SET_BASE: We will need to maintain two ways of performing the
> same, and the code will need to synchronize them. I'm not *against* adding
> it by itself, I'm just considering it an optimization that needs to be 
> balanced
> against what already enables the device to perform state restoring.

We only need to change the sequencing of how we restore, and abstract out how
to restore in the vdpa layer.
CVQ or something else is a choice internal to the vdpa vendor driver.


Re: [PATCH v2 06/10] block: Make 'bytes' param of bdrv_co_{pread, pwrite, preadv, pwritev}() an int64_t

2022-05-18 Thread Eric Blake
On Tue, May 17, 2022 at 03:48:59PM +0100, Alberto Faria wrote:
> On Tue, May 17, 2022 at 3:33 PM Eric Blake  wrote:
> > ...now end up calling QEMU_IOVEC_INIT_BUF() which tries to do
> > .local_iov.iov_len = bytes, which can silently overflow on 32-bit
> > platforms where iov_len is size_t.  We need to add a code guard that
> > callers do not pass in too large of a buffer.
> 
> I see. blk_co_pread() and blk_co_pwrite() use assert(bytes <=
> SIZE_MAX). Would that be an appropriate safeguard here? Or should we
> return an error?

I'd be okay with the assert.
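A minimal standalone illustration (not QEMU code) of the truncation hazard
that guard is protecting against:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* On 32-bit hosts size_t is 32 bits wide, so blindly storing a 64-bit
     * byte count into an iov_len-style field can silently truncate it.
     * The assert mirrors the assert(bytes <= SIZE_MAX) safeguard above. */
    struct iov { void *base; size_t len; };

    static void iov_init(struct iov *iov, void *buf, int64_t bytes)
    {
        assert(bytes >= 0 && (uint64_t)bytes <= SIZE_MAX);
        iov->base = buf;
        iov->len = (size_t)bytes;
    }

    int main(void)
    {
        char buf[64];
        struct iov iov;
        iov_init(&iov, buf, (int64_t)sizeof(buf));
        printf("iov_len = %zu\n", iov.len);
        return 0;
    }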

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




RE: About restoring the state in vhost-vdpa device

2022-05-18 Thread Parav Pandit

> From: Jason Wang 
> Sent: Monday, May 16, 2022 11:05 PM
> >> Although it's a longer route, I'd very much prefer an in-band virtio
> >> way to perform it rather than a linux/vdpa specific. It's one of the
> >> reasons I prefer the CVQ behavior over a vdpa specific ioctl.
> >>
> > What is the in-band method to set last_avail_idx?
> > In-band virtio method doesn't exist.
> 
> 
> Right, but it's part of the vhost API which was there for more than 10 years.
> This should be supported by all the vDPA vendors.
Sure. My point to Eugenio was that vdpa doesn't have to be limited by the
virtio spec.
Plumbing exists to make vdpa work without the virtio spec.
Hence, an additional ioctl can be OK.

> >> layers of the stack need to maintain more state.
> > Mostly not. A complete virtio device state arrived from source vdpa device
> can be given to destination vdpa device without anyone else looking in the
> middle. If this format is known/well defined.
> 
> 
> That's fine, and it seems the virtio spec is a better place for this,
> then we won't duplicate efforts?
> 
Yes. For the vDPA kernel, setting parameters doesn't need a virtio spec update.
It is similar to setting the avail index.

> 
> >
> >>  From the guest point of view, to enable all the queues with
> >> VHOST_VDPA_SET_VRING_ENABLE and don't send DRIVER_OK is the
> same
> >> as send DRIVER_OK and not to enable any data queue with
> >> VHOST_VDPA_SET_VRING_ENABLE.
> > Enabling SET_VRING_ENABLE after DRIVER_OK has two basic things
> broken.
> 
> 
> It looks to me the spec:
> 
> 1) For PCI it doesn't forbid the driver to set queue_enable to 1 after
> DRIVER_OK.
The device init sequence sort of hints that VQ setup should be done before
driver_ok, per the snippet below.

"Perform device-specific setup, including discovery of virtqueues for the 
device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and 
population of virtqueues."

Even if we assume for a moment that a queue can be enabled after driver_ok,
packets end up going to the incorrect queue, because the queue where they
are supposed to go is not enabled and its RSS is not set up.

So in the restore flow it is desirable to set the needed config before doing driver_ok.

> 2) For MMIO, it even allows the driver to disable a queue after DRIVER_OK
> 
> 
> > 1. supplied RSS config and VQ config is not honored for several tens of
> hundreds of milliseconds
> > It will be purely dependent on how/when this ioctl are made.
> > Due to this behavior packet supposed to arrive in X VQ, arrives in Y VQ.
> 
> 
> I don't get why we end up with this situation.
> 
> 1) enable cvq
> 2) set driver_ok
> 3) set RSS
> 4) enable TX/RX
> 
> vs
> 
> 1) set RSS
> 2) enable cvq
> 3) enable TX/RX
> 4) set driver_ok
> 
> Is the latter faster?
> 
Yes, because the latter sequence has the ability to set up the steering config
once, whereas the first sequence needs to incrementally update the RSS setting
on every new queue added in step #4.

> 
> >
> > 2. Enabling each VQ one at a time requires a constant steering update
> > for the VQ, while this information is already known. Trying to reuse
> > the per-VQ enable callback results in this inefficiency.
> > So it is better to start with more reusable APIs that fit the LM flow.
> 
> 
> I agree, but the method proposed in the mail seems to be the only way
> that can work with the all the major vDPA vendors.
> 
> E.g the new API requires the device has the ability to receive device
> state other than the control virtqueue which might not be supported the
> hardware. (The device might expects a trap and emulate model rather than
> save and restore).
> 
How a given vendor returns the values is up to the vendor-specific vdpa driver,
just like avail_index, which does not come through the CVQ.

>  From qemu point of view, it might need to support both models.
> 
> If the device can't do save and restore:
> 
> 1.1) enable cvq
> 1.2) set driver_ok
> 1.3) set device state (MQ, RSS) via control vq
> 1.4) enable TX/RX
> 
> If the device can do save and restore:
> 
> 2.1) set device state (new API for setting MQ,RSS)
> 2.2) enable cvq
> 2.3) enable TX?RX
> 2.4) set driver_ok
> 
> We can start from 1 since it works for all device and then adding
> support for 2?
> 

How about:
3.1) create cvq for the supported device
The CVQ is not exposed to user space; it stays in the kernel. The vdpa driver creates it.

3.2) set device state (MQ, RSS) via a user->kernel ioctl()
The vdpa driver internally decides whether to use the CVQ or something else
(like the avail index).

3.3) enable tx/rx
3.4) set driver_ok
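To make the ordering concrete, here is a rough user-space sketch of steps
3.2-3.4. Assumptions: the /dev/vhost-vdpa fd is already open and features
are already negotiated; the "set device state" ioctl is only the *proposed*
interface and therefore appears as a placeholder comment, while the other
ioctls (VHOST_SET_VRING_BASE, VHOST_VDPA_SET_VRING_ENABLE,
VHOST_VDPA_SET_STATUS) exist in the uapi today:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>
    #include <linux/virtio_config.h>

    /* Restore flow: configure everything first, flip DRIVER_OK last. */
    static int restore_before_driver_ok(int vdpa_fd, int nvqs,
                                        const uint16_t *avail_idx)
    {
        /* 3.2) set device state (MQ, RSS, ...) via the proposed
         *      user->kernel ioctl; the vendor vdpa driver decides
         *      internally whether to apply it through the CVQ or some
         *      other vendor-specific mechanism. */

        /* 3.3) restore per-queue indexes and enable the data queues. */
        for (int i = 0; i < nvqs; i++) {
            struct vhost_vring_state s = { .index = i, .num = avail_idx[i] };
            if (ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &s) < 0) {
                return -1;
            }
            s.num = 1; /* enable the queue */
            if (ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &s) < 0) {
                return -1;
            }
        }

        /* 3.4) only now announce DRIVER_OK, so the device starts with the
         *      whole configuration (queues, MQ, RSS) already in place. */
        uint8_t status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
                         VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
        return ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);
    }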

