Re: [PATCH v1 20/23] xen platform: unplug ahci object

2023-06-21 Thread Bernhard Beschow



On 20 June 2023 17:24:54 UTC, Joel Upham wrote:
>This will unplug the ahci device when the Xen driver calls for an unplug.
>This has been tested to work in Linux and Windows guests.
>When q35 is detected, we will remove the ahci controller
>with the hard disks.  In the libxl config, cdrom devices
>are put on a separate ahci controller. This allows for 6 cdrom
>devices to be added, and 6 qemu hard disks.

Does this also work with KVM Xen emulation? If so, the QEMU manual should be 
updated accordingly in this patch since it explicitly rules out Q35 due to 
missing AHCI unplug: 
https://gitlab.com/qemu-project/qemu/-/blob/stable-8.0/docs/system/i386/xen.rst?plain=1_type=heads#L51

Best regards,
Bernhard

>
>
>Signed-off-by: Joel Upham 
>---
> hw/i386/xen/xen_platform.c | 19 ++-
> hw/pci/pci.c   | 17 +
> include/hw/pci/pci.h   |  3 +++
> 3 files changed, 38 insertions(+), 1 deletion(-)
>
>diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
>index 57f1d742c1..0375337222 100644
>--- a/hw/i386/xen/xen_platform.c
>+++ b/hw/i386/xen/xen_platform.c
>@@ -34,6 +34,7 @@
> #include "sysemu/block-backend.h"
> #include "qemu/error-report.h"
> #include "qemu/module.h"
>+#include "include/hw/i386/pc.h"
> #include "qom/object.h"
> 
> #ifdef CONFIG_XEN
>@@ -223,6 +224,12 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
>*opaque)
> if (flags & UNPLUG_NVME_DISKS) {
> object_unparent(OBJECT(d));
> }
>+break;
>+
>+case PCI_CLASS_STORAGE_SATA:
>+  if (!aux) {
>+object_unparent(OBJECT(d));
>+}
> 
> default:
> break;
>@@ -231,7 +238,17 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
>*opaque)
> 
> static void pci_unplug_disks(PCIBus *bus, uint32_t flags)
> {
>-pci_for_each_device(bus, 0, unplug_disks, &flags);
>+PCIBus *q35 = find_q35();
>+if (q35) {
>+/* When q35 is detected, we will remove the ahci controller
>+   * with the hard disks.  In the libxl config, cdrom devices
>+   * are put on a separate ahci controller. This allows for 6 cdrom
>+   * devices to be added, and 6 qemu hard disks.
>+   */
>+pci_function_for_one_bus(bus, unplug_disks, &flags);
>+} else {
>+pci_for_each_device(bus, 0, unplug_disks, &flags);
>+}
> }
> 
> static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, 
> uint32_t val)
>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>index 1cc7c89036..8eac3d751a 100644
>--- a/hw/pci/pci.c
>+++ b/hw/pci/pci.c
>@@ -1815,6 +1815,23 @@ void pci_for_each_device_reverse(PCIBus *bus, int 
>bus_num,
> }
> }
> 
>+void pci_function_for_one_bus(PCIBus *bus,
>+  void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
>+  void *opaque)
>+{
>+bus = pci_find_bus_nr(bus, 0);
>+
>+if (bus) {
>+PCIDevice *d;
>+
>+d = bus->devices[PCI_DEVFN(4,0)];
>+if (d) {
>+fn(bus, d, opaque);
>+return;
>+}
>+}
>+}
>+
> void pci_for_each_device_under_bus(PCIBus *bus,
>pci_bus_dev_fn fn, void *opaque)
> {
>diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>index e6d0574a29..c53e21082a 100644
>--- a/include/hw/pci/pci.h
>+++ b/include/hw/pci/pci.h
>@@ -343,6 +343,9 @@ void pci_for_each_device_under_bus(PCIBus *bus,
> void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>pci_bus_dev_fn fn,
>void *opaque);
>+void pci_function_for_one_bus(PCIBus *bus,
>+ void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
>+ void *opaque);
> void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>   pci_bus_fn end, void *parent_state);
> PCIDevice *pci_get_function_0(PCIDevice *pci_dev);



Re: [PULL 00/30] Next patches

2023-06-21 Thread Richard Henderson

On 6/22/23 04:12, Juan Quintela wrote:

The following changes since commit 67fe6ae41da64368bc4936b196fee2bf61f8c720:

   Merge tag 'pull-tricore-20230621-1' of https://github.com/bkoppelmann/qemu
into staging (2023-06-21 20:08:48 +0200)

are available in the Git repository at:

   https://gitlab.com/juan.quintela/qemu.git  tags/next-pull-request

for you to fetch changes up to c53dc569d0a0fb76eaa83f353253a897914948f9:

   migration/rdma: Split qemu_fopen_rdma() into input/output functions 
(2023-06-22 02:45:30 +0200)


Migration Pull request (20230621)

In this pull request:

- fix for multifd thread creation (fabiano)
- dirtylimit (hyman)
   * migration-test will go on next PULL request, as it has failures.
- Improve error description (tejus)
- improve -incoming and set parameters before calling incoming (wei)
- migration atomic counters reviewed patches (quintela)
- migration-test refactoring reviewed (quintela)

Please apply.


You really need to test at least one 32-bit host regularly.
It should be trivial for you to do an i686 build somewhere.

https://gitlab.com/qemu-project/qemu/-/jobs/4518975360#L4817
https://gitlab.com/qemu-project/qemu/-/jobs/4518975263#L3486
https://gitlab.com/qemu-project/qemu/-/jobs/4518975261#L3145
https://gitlab.com/qemu-project/qemu/-/jobs/4518975298#L3372
https://gitlab.com/qemu-project/qemu/-/jobs/4518975301#L3221

../softmmu/dirtylimit.c:558:58: error: format specifies type 'long' but the argument has 
type 'int64_t' (aka 'long long') [-Werror,-Wformat]

error_setg(, "invalid dirty page limit %ld", dirty_rate);
   ~~~   ^~
   %lld


r~



Re: [PATCH] ui/gtk: making dmabuf NULL when it's released.

2023-06-21 Thread Richard Henderson

On 6/22/23 00:11, Dongwon Kim wrote:

  static void gd_gl_release_dmabuf(DisplayChangeListener *dcl,
   QemuDmaBuf *dmabuf)
  {
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
  #ifdef CONFIG_GBM
  egl_dmabuf_release_texture(dmabuf);
+if (vc->gfx.guest_fb.dmabuf == dmabuf) {
+vc->gfx.guest_fb.dmabuf = NULL;
+}
  #endif
  }


Conditionally unused variable outside the ifdef.
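
One possible shape for the fix, sketched with stand-in types: declare the
variable inside the #ifdef so that it only exists when it is used. Everything
named below (HAVE_GBM, Console, Listener, DmaBuf, release_dmabuf) is
hypothetical and only mirrors the structure of the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; these are not the QEMU definitions. */
typedef struct DmaBuf { int texture; } DmaBuf;
typedef struct Listener { int id; } Listener;
typedef struct Console {
    DmaBuf *guest_fb_dmabuf;
    Listener dcl;
} Console;

/* Hypothetical stand-in for CONFIG_GBM. */
#define HAVE_GBM 1

#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void release_dmabuf(Listener *dcl, DmaBuf *dmabuf)
{
#ifdef HAVE_GBM
    /* Declared inside the #ifdef, so it cannot end up unused without GBM. */
    Console *con = CONTAINER_OF(dcl, Console, dcl);

    assert(dmabuf != NULL);
    dmabuf->texture = 0;              /* stands in for the texture release */
    if (con->guest_fb_dmabuf == dmabuf) {
        con->guest_fb_dmabuf = NULL;  /* drop the stale pointer */
    }
#else
    /* Nothing to do without GBM; mark the parameters as used. */
    (void)dcl;
    (void)dmabuf;
#endif
}
```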

r~



Re: [PATCH v3 00/37] crypto: Provide aes-round.h and host accel

2023-06-21 Thread Richard Henderson

On 6/20/23 13:07, Richard Henderson wrote:

Patches missing r-b:
   08-target-arm-Use-aesenc_SB_SR_AK.patch
   10-target-riscv-Use-aesenc_SB_SR_AK.patch
   13-target-arm-Use-aesdec_ISB_ISR_AK.patch
   15-target-riscv-Use-aesdec_ISB_ISR_AK.patch
   17-target-arm-Use-aesenc_MC.patch
   19-target-i386-Use-aesdec_IMC.patch
   20-target-arm-Use-aesdec_IMC.patch
   21-target-riscv-Use-aesdec_IMC.patch
   23-target-i386-Use-aesenc_SB_SR_MC_AK.patch
   25-target-riscv-Use-aesenc_SB_SR_MC_AK.patch
   27-target-i386-Use-aesdec_ISB_ISR_IMC_AK.patch
   28-target-riscv-Use-aesdec_ISB_ISR_IMC_AK.patch
   35-host-include-i386-Implement-aes-round.h.patch
   36-host-include-aarch64-Implement-aes-round.h.patch


The crypto/ portion of the patch set has now been reviewed (thanks Daniel P.B.), as well 
as the target/ppc/ portions (thanks Daniel H.B.).


What's left are the x86 and aa64 host accel, target/{arm,i386,riscv}.

Would it make anything easier if I re-sorted and separated the unreviewed 
patches per target?


r~



[PATCH RFC 1/3] vdpa: Restore MAC address filtering state

2023-06-21 Thread Hawkins Jiawei
This patch refactors vhost_vdpa_net_load_mac() to
restore the MAC address filtering state at device's startup.

Signed-off-by: Hawkins Jiawei 
---
 net/vhost-vdpa.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ecfa8852b5..10264d3e96 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -651,8 +651,45 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, 
const VirtIONet *n)
 if (unlikely(dev_written < 0)) {
 return dev_written;
 }
+if (*s->status != VIRTIO_NET_OK) {
+return -EINVAL;
+}
+}
+
+if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX)) {
+/* Load the MAC address filtering */
+uint32_t uni_entries = n->mac_table.first_multi,
+ uni_macs_size = uni_entries * ETH_ALEN,
+ uni_size = sizeof(struct virtio_net_ctrl_mac) + uni_macs_size,
+ mul_entries = n->mac_table.in_use - uni_entries,
+ mul_macs_size = mul_entries * ETH_ALEN,
+ mul_size = sizeof(struct virtio_net_ctrl_mac) + mul_macs_size,
+ data_size = uni_size + mul_size;
+void *data = g_malloc(data_size);
+struct virtio_net_ctrl_mac *ctrl_mac;
+
+/* Pack the non-multicast(unicast) MAC addresses */
+ctrl_mac = data;
+ctrl_mac->entries = cpu_to_le32(uni_entries);
+memcpy(ctrl_mac->macs, n->mac_table.macs, uni_macs_size);
+
+/* Pack the multicast MAC addresses */
+ctrl_mac = data + uni_size;
+ctrl_mac->entries = cpu_to_le32(mul_entries);
+memcpy(ctrl_mac->macs, &n->mac_table.macs[uni_macs_size],
+   mul_macs_size);
+
+ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
+  VIRTIO_NET_CTRL_MAC_TABLE_SET,
+  data, data_size);
+g_free(data);
 
-return *s->status != VIRTIO_NET_OK;
+if (unlikely(dev_written < 0)) {
+return dev_written;
+}
+if (*s->status != VIRTIO_NET_OK) {
+return -EINVAL;
+}
 }
 
 return 0;
-- 
2.25.1
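
The payload layout built by the patch above (a unicast table followed by a
multicast table, each an entry count plus packed 6-byte addresses) can be
sketched in isolation. This is an illustration under assumptions, not the
patch's code: pack_mac_table() and pack_mac_filter() are hypothetical names,
and the count is written byte by byte as little-endian instead of using
cpu_to_le32().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6  /* same value as ETH_ALEN */

/*
 * Write a 32-bit little-endian entry count followed by the MAC bytes,
 * mirroring the layout of struct virtio_net_ctrl_mac.
 * Returns the number of bytes written.
 */
static size_t pack_mac_table(uint8_t *dst, const uint8_t *macs,
                             uint32_t entries)
{
    dst[0] = entries & 0xff;
    dst[1] = (entries >> 8) & 0xff;
    dst[2] = (entries >> 16) & 0xff;
    dst[3] = (entries >> 24) & 0xff;
    memcpy(dst + 4, macs, (size_t)entries * MAC_LEN);
    return 4 + (size_t)entries * MAC_LEN;
}

/*
 * Build a MAC_TABLE_SET style payload: unicast table first, then the
 * multicast table. `macs` holds `in_use` addresses, of which the first
 * `first_multi` are unicast. The caller frees the returned buffer.
 */
static uint8_t *pack_mac_filter(const uint8_t *macs, uint32_t in_use,
                                uint32_t first_multi, size_t *out_len)
{
    assert(first_multi <= in_use);

    uint32_t mul_entries = in_use - first_multi;
    size_t len = 2 * 4 + (size_t)in_use * MAC_LEN;
    uint8_t *buf = malloc(len);

    if (buf == NULL) {
        return NULL;
    }
    size_t off = pack_mac_table(buf, macs, first_multi);
    off += pack_mac_table(buf + off, macs + (size_t)first_multi * MAC_LEN,
                          mul_entries);
    assert(off == len);
    *out_len = len;
    return buf;
}
```

With two unicast and one multicast address the buffer is 26 bytes long: a
16-byte unicast table followed by a 10-byte multicast table.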




[PATCH RFC 3/3] vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ

2023-06-21 Thread Hawkins Jiawei
Enable SVQ with VIRTIO_NET_F_CTRL_RX feature.

Signed-off-by: Hawkins Jiawei 
---
 net/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 355a6aef15..ca800f97e2 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -99,6 +99,7 @@ static const uint64_t vdpa_svq_device_features =
 BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
 BIT_ULL(VIRTIO_NET_F_STATUS) |
 BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
+BIT_ULL(VIRTIO_NET_F_CTRL_RX) |
 BIT_ULL(VIRTIO_NET_F_MQ) |
 BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
 BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
-- 
2.25.1




[PATCH RFC 0/3] Vhost-vdpa Shadow Virtqueue _F_CTRL_RX commands support

2023-06-21 Thread Hawkins Jiawei
This series enables the shadowed CVQ to intercept rx commands related to
the VIRTIO_NET_F_CTRL_RX feature, updates the virtio NIC device model so
that QEMU sends the rx state in a migration, and restores that rx state
on the destination.

Note that this series should be based on [1] and conflicts with [2], neither
of which has been merged. I will submit the v2 patch after they are merged.

[1]. https://lore.kernel.org/all/cover.1685704856.git.yin31...@gmail.com/
[2]. https://lore.kernel.org/all/cover.1686746406.git.yin31...@gmail.com/

Hawkins Jiawei (3):
  vdpa: Restore MAC address filtering state
  vdpa: Restore packet receive filtering state relative with _F_CTRL_RX
feature
  vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ

 net/vhost-vdpa.c | 114 ++-
 1 file changed, 113 insertions(+), 1 deletion(-)

-- 
2.25.1




[PATCH RFC 2/3] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature

2023-06-21 Thread Hawkins Jiawei
This patch introduces vhost_vdpa_net_load_rx_mode()
and vhost_vdpa_net_load_rx() to restore the packet
receive filtering state in relation to
VIRTIO_NET_F_CTRL_RX feature at device's startup.

Signed-off-by: Hawkins Jiawei 
---
 net/vhost-vdpa.c | 74 
 1 file changed, 74 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 10264d3e96..355a6aef15 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -754,6 +754,76 @@ static int vhost_vdpa_net_load_offloads(VhostVDPAState *s,
 return *s->status != VIRTIO_NET_OK;
 }
 
+static int vhost_vdpa_net_load_rx_mode(VhostVDPAState *s,
+   uint8_t cmd,
+   uint8_t on)
+{
+ssize_t dev_written;
+dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_RX,
+  cmd, &on, sizeof(on));
+if (unlikely(dev_written < 0)) {
+return dev_written;
+}
+if (*s->status != VIRTIO_NET_OK) {
+return -EINVAL;
+}
+
+return 0;
+}
+
+static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
+  const VirtIONet *n)
+{
+uint8_t on;
+int r;
+
+if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX)) {
+/* Load the promiscuous mode */
+if (n->mac_table.uni_overflow) {
+/*
+ * According to VirtIO standard, "Since there are no guarantees,
+ * it can use a hash filter or silently switch to
+ * allmulti or promiscuous mode if it is given too many addresses."
+ *
+ * QEMU ignores non-multicast(unicast) MAC addresses and
+ * marks `uni_overflow` for the device internal state
+ * if guest sets too many non-multicast(unicast) MAC addresses.
+ * Therefore, we should turn promiscuous mode on in this case.
+ */
+on = 1;
+} else {
+on = n->promisc;
+}
+r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, on);
+if (r < 0) {
+return r;
+}
+
+/* Load the all-multicast mode */
+if (n->mac_table.multi_overflow) {
+/*
+ * According to VirtIO standard, "Since there are no guarantees,
+ * it can use a hash filter or silently switch to
+ * allmulti or promiscuous mode if it is given too many addresses."
+ *
+ * QEMU ignores multicast MAC addresses and
+ * marks `multi_overflow` for the device internal state
+ * if guest sets too many multicast MAC addresses.
+ * Therefore, we should turn all-multicast mode on in this case.
+ */
+on = 1;
+} else {
+on = n->allmulti;
+}
+r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_ALLMULTI, on);
+if (r < 0) {
+return r;
+}
+}
+
+return 0;
+}
+
 static int vhost_vdpa_net_load(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -780,6 +850,10 @@ static int vhost_vdpa_net_load(NetClientState *nc)
 if (unlikely(r)) {
 return r;
 }
+r = vhost_vdpa_net_load_rx(s, n);
+if (unlikely(r)) {
+return r;
+}
 
 return 0;
 }
-- 
2.25.1
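
The decision made by vhost_vdpa_net_load_rx() in the patch above reduces to
two small predicates: an overflowed table forces the matching mode on,
otherwise the guest's own setting is restored. The sketch below isolates
that logic; RxFilterState and both function names are hypothetical stand-ins
for the relevant VirtIONet fields, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the device state consulted by the patch. */
typedef struct RxFilterState {
    bool promisc;
    bool allmulti;
    bool uni_overflow;
    bool multi_overflow;
} RxFilterState;

/* Promiscuous mode is forced on when the unicast table overflowed. */
static uint8_t rx_promisc_on(const RxFilterState *st)
{
    assert(st != NULL);
    return (st->uni_overflow || st->promisc) ? 1 : 0;
}

/* All-multicast mode is forced on when the multicast table overflowed. */
static uint8_t rx_allmulti_on(const RxFilterState *st)
{
    assert(st != NULL);
    return (st->multi_overflow || st->allmulti) ? 1 : 0;
}
```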




Re: [PATCH] target/riscv/cpu.c: fix veyron-v1 CPU properties

2023-06-21 Thread Alistair Francis
On Wed, Jun 21, 2023 at 1:25 AM Daniel Henrique Barboza
 wrote:
>
> Commit 7f0bdfb5bfc2 ("target/riscv/cpu.c: remove cfg setup from
> riscv_cpu_init()") removed code that was enabling mmu, pmp, ext_ifencei
> and ext_icsr from riscv_cpu_init(), the init() function of
> TYPE_RISCV_CPU, parent type of all RISC-V CPUs. This was done to force
> CPUs to explicitly enable all extensions and features they require,
> without any 'magic values' that were inherited from the parent type.
>
> This commit failed to make appropriate changes in the 'veyron-v1' CPU,
> added earlier by commit e1d084a8524a. The result is that the veyron-v1
> CPU has ext_ifencei, ext_icsr and pmp set to 'false', which is not the
> case.
>
> The reason why it took this long to notice (thanks LIU Zhiwei for
> reporting it) is that Linux doesn't mind 'ifencei' and 'icsr' being
> absent in the 'riscv,isa' DT, implying that they're both present if the
> 'i' extension is enabled. OpenSBI also doesn't error out or warn about
> the lack of 'pmp', it'll just not protect memory pages.
>
> Fix it by setting them to 'true' in rv64_veyron_v1_cpu_init() like
> 7f0bdfb5bfc2 already did with other CPUs.
>
> Reported-by: LIU Zhiwei 
> Fixes: 7f0bdfb5bfc2 ("target/riscv/cpu.c: remove cfg setup from 
> riscv_cpu_init()")
> Signed-off-by: Daniel Henrique Barboza 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/cpu.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 881bddf393..707f62b592 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -444,6 +444,9 @@ static void rv64_veyron_v1_cpu_init(Object *obj)
>
>  /* Enable ISA extensions */
>  cpu->cfg.mmu = true;
> +cpu->cfg.ext_ifencei = true;
> +cpu->cfg.ext_icsr = true;
> +cpu->cfg.pmp = true;
>  cpu->cfg.ext_icbom = true;
>  cpu->cfg.cbom_blocksize = 64;
>  cpu->cfg.cboz_blocksize = 64;
> --
> 2.41.0
>
>



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-21 Thread Juan Quintela
Thomas Huth  wrote:
> On 12/06/2023 21.33, Juan Quintela wrote:
>> Only "defer" is recommended.  After setting all migration parameters,
>> start incoming migration with "migrate-incoming uri" command.
>> Signed-off-by: Juan Quintela 
>> ---
>>   docs/about/deprecated.rst | 7 +++
>>   softmmu/vl.c  | 2 ++
>>   2 files changed, 9 insertions(+)
>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> index 47e98dc95e..518672722d 100644
>> --- a/docs/about/deprecated.rst
>> +++ b/docs/about/deprecated.rst
>> @@ -447,3 +447,10 @@ The new way to modify migration is using migration 
>> parameters.
>>   ``blk`` functionality can be achieved using
>>   ``migrate_set_parameter block-incremental true``.
>>   +``-incoming uri`` (since 8.1)
>> +'''''''''''''''''''''''''''''
>> +
>> +Everything except ``-incoming defer`` are deprecated.  This allows to
>> +setup parameters before launching the proper migration with
>> +``migrate-incoming uri``.
>> +
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index b0b96f67fa..7fe865ab59 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -2651,6 +2651,8 @@ void qmp_x_exit_preconfig(Error **errp)
>>   if (incoming) {
>>   Error *local_err = NULL;
>>   if (strcmp(incoming, "defer") != 0) {
>> +warn_report("-incoming %s is deprecated, use -incoming defer 
>> and "
>> +" set the uri with migrate-incoming.", incoming);
>>   qmp_migrate_incoming(incoming, &local_err);
>>   if (local_err) {
>>   error_reportf_err(local_err, "-incoming %s: ", incoming);
>
> Could we maybe keep at least the smallest set of necessary parameters
> around? I'm often doing a "-incoming tcp:0:1234" for doing quick
> sanity checks with migration, not caring about other migration
> parameters, so if that could continue to work, that would be very
> appreciated.

I will try to explain myself here.

I think that everything except tcp works.
But when we have tcp, we have two cases where this is a trap:
- multifd channels:
  * if we default to a big number, we underuse resources in a normal
case
  * if we default to a small number, we have the problem that if the
user later sets multifd-channels to a bigger number, things can
break.
- postcopy+preempt:
  this case is also problematic, but easily fixable.  Put a default
  of 2 instead of 1.

The only other solution that I can think of is to just fail if multifd is
set without -incoming defer.  But sooner rather than later we are going
to have to default to multifd, so ...

Later, Juan.




[PULL 29/30] qemu-file: Make qemu_file_get_error_obj() static

2023-06-21 Thread Juan Quintela
It was not used outside of qemu_file.c anyways.

Reviewed-by: Peter Xu 
Message-ID: <20230530183941.7223-21-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.h | 1 -
 migration/qemu-file.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a081ef6c3f..8b8b7d27fe 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -128,7 +128,6 @@ void qemu_file_skip(QEMUFile *f, int size);
  * accounting information tracks the total migration traffic.
  */
 void qemu_file_credit_transfer(QEMUFile *f, size_t size);
-int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
 int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
 void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
 void qemu_file_set_error(QEMUFile *f, int ret);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4c577bdff8..d30bf3c377 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -158,7 +158,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks 
*hooks)
  * is not 0.
  *
  */
-int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
+static int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
 if (errp) {
 *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
-- 
2.40.1




[PULL 27/30] qemu_file: Make qemu_file_is_writable() static

2023-06-21 Thread Juan Quintela
It is not used outside of qemu_file, and it shouldn't be.

Signed-off-by: Juan Quintela 
Message-ID: <20230530183941.7223-19-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.h | 1 -
 migration/qemu-file.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index aa6eee66da..a081ef6c3f 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -103,7 +103,6 @@ uint64_t qemu_file_transferred_noflush(QEMUFile *f);
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size,
bool may_free);
 bool qemu_file_mode_is_not_valid(const char *mode);
-bool qemu_file_is_writable(QEMUFile *f);
 
 #include "migration/qemu-file-types.h"
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index fdf115b5da..9a89e17924 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -228,7 +228,7 @@ void qemu_file_set_error(QEMUFile *f, int ret)
 qemu_file_set_error_obj(f, ret, NULL);
 }
 
-bool qemu_file_is_writable(QEMUFile *f)
+static bool qemu_file_is_writable(QEMUFile *f)
 {
 return f->is_writable;
 }
-- 
2.40.1




[PULL 06/30] migration: Introduce dirty-limit capability

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

Introduce migration dirty-limit capability, which can
be turned on before live migration and limit dirty
page rate during live migration.

Introduce migrate_dirty_limit function to help check
if dirty-limit capability is enabled during live migration.

Meanwhile, refactor vcpu_dirty_rate_stat_collect
so that period can be configured instead of hardcoded.

dirty-limit capability is kind of like auto-converge
but using dirty limit instead of traditional cpu-throttle
to throttle guest down. To enable this feature, turn on
the dirty-limit capability before live migration using
migrate-set-capabilities, and set the parameters
"x-vcpu-dirty-limit-period", "vcpu-dirty-limit" suitably
to speed up convergence.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Reviewed-by: Juan Quintela 
Message-Id: <168618975839.6361.1740763387474768865...@git.sr.ht>
Signed-off-by: Juan Quintela 
---
 qapi/migration.json  | 12 +++-
 migration/options.h  |  1 +
 migration/options.c  | 23 +++
 softmmu/dirtylimit.c | 18 ++
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index e7243c0c0d..621e6604c6 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -487,6 +487,16 @@
 # and should not affect the correctness of postcopy migration.
 # (since 7.1)
 #
+# @dirty-limit: If enabled, migration will use the dirty-limit algo to
+#   throttle down guest instead of auto-converge algo.
+#   Throttle algo only works when vCPU's dirtyrate greater
+#   than 'vcpu-dirty-limit', read processes in guest os
+#   aren't penalized any more, so this algo can improve
+#   performance of vCPU during live migration. This is an
+#   optional performance feature and should not affect the
+#   correctness of the existing auto-converge algo.
+#   (since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -502,7 +512,7 @@
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
-   'zero-copy-send', 'postcopy-preempt'] }
+   'zero-copy-send', 'postcopy-preempt', 'dirty-limit'] }
 
 ##
 # @MigrationCapabilityStatus:
diff --git a/migration/options.h b/migration/options.h
index 45991af3c2..51964eff29 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -29,6 +29,7 @@ bool migrate_block(void);
 bool migrate_colo(void);
 bool migrate_compress(void);
 bool migrate_dirty_bitmaps(void);
+bool migrate_dirty_limit(void);
 bool migrate_events(void);
 bool migrate_ignore_shared(void);
 bool migrate_late_block_activate(void);
diff --git a/migration/options.c b/migration/options.c
index 8acf5f1d2c..ba1010e08b 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -27,6 +27,7 @@
 #include "qemu-file.h"
 #include "ram.h"
 #include "options.h"
+#include "sysemu/kvm.h"
 
 /* Maximum migrate downtime set to 2000 seconds */
 #define MAX_MIGRATE_DOWNTIME_SECONDS 2000
@@ -194,6 +195,7 @@ Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-zero-copy-send",
 MIGRATION_CAPABILITY_ZERO_COPY_SEND),
 #endif
+DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
 
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -240,6 +242,13 @@ bool migrate_dirty_bitmaps(void)
 return s->capabilities[MIGRATION_CAPABILITY_DIRTY_BITMAPS];
 }
 
+bool migrate_dirty_limit(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->capabilities[MIGRATION_CAPABILITY_DIRTY_LIMIT];
+}
+
 bool migrate_events(void)
 {
 MigrationState *s = migrate_get_current();
@@ -556,6 +565,20 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 }
 }
 
+if (new_caps[MIGRATION_CAPABILITY_DIRTY_LIMIT]) {
+if (new_caps[MIGRATION_CAPABILITY_AUTO_CONVERGE]) {
+error_setg(errp, "dirty-limit conflicts with auto-converge"
+   " either of then available currently");
+return false;
+}
+
+if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+error_setg(errp, "dirty-limit requires KVM with accelerator"
+   " property 'dirty-ring-size' set");
+return false;
+}
+}
+
 return true;
 }
 
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 5c12d26d49..3f1103b04b 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -24,6 +24,9 @@
 #include "hw/boards.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
+#include "migration/misc.h"
+#include "migration/migration.h"
+#include "migration/options.h"
 
 /*
  * Dirtylimit stop working if dirty page rate error
@@ -75,14 +78,21 @@ static bool dirtylimit_quit;
 
 static void vcpu_dirty_rate_stat_collect(void)
 {
+MigrationState 

[PULL 23/30] qtest/migration-tests.c: use "-incoming defer" for postcopy tests

2023-06-21 Thread Juan Quintela
From: Wei Wang 

The Postcopy preempt capability is expected to be set before incoming
starts, so change the postcopy tests to start with deferred incoming and
call migrate-incoming after the cap has been set.

Why didn't the existing tests (without this patch) fail?
There could be two reasons:
1) "backlog" specifies the number of pending connections. As long as the
   server accepts the connections faster than the client side connects,
   the connections will succeed. The preempt test uses only 2 channels,
   so it is very unlikely to have pending connections.
2) per my tests (on kernel 6.2), the number of pending connections allowed
   is actually "backlog + 1", which is 2 in this case.
That said, the implementation of socket_start_incoming_migration_internal
expects "migrate defer" to be used, and for safety, change the test to
work with the expected usage.

Signed-off-by: Wei Wang 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Message-ID: <20230606101910.20456-3-wei.w.w...@intel.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e3e7d54216..c694685923 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1161,10 +1161,10 @@ static int migrate_postcopy_prepare(QTestState 
**from_ptr,
 QTestState **to_ptr,
 MigrateCommon *args)
 {
-g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+g_autofree char *uri = NULL;
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, &args->start)) {
+if (test_migrate_start(&from, &to, "defer", &args->start)) {
 return -1;
 }
 
@@ -1183,9 +1183,13 @@ static int migrate_postcopy_prepare(QTestState 
**from_ptr,
 
 migrate_ensure_non_converge(from);
 
+qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
+ "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
 
+uri = migrate_get_socket_address(to, "socket-address");
 migrate_qmp(from, uri, "{}");
 
 wait_for_migration_pass(from);
-- 
2.40.1




[PULL 20/30] migration: Update error description whenever migration fails

2023-06-21 Thread Juan Quintela
From: Tejus GK 

There are places in migration.c where the migration is marked failed with
MIGRATION_STATUS_FAILED, but the failure reason is never updated. Hence
libvirt doesn't know why the migration failed when it queries for it.

Reviewed-by: Daniel P. Berrangé 
Acked-by: Peter Xu 
Signed-off-by: Tejus GK 
Message-ID: <20230621130940.178659-2-tejus...@nutanix.com>
Signed-off-by: Juan Quintela 
---
 migration/migration.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 719f91573f..e6bff2e848 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1679,7 +1679,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (!(has_resume && resume)) {
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
+error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri",
"a valid migration protocol");
 migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_FAILED);
@@ -2066,7 +2066,7 @@ migration_wait_main_channel(MigrationState *ms)
  * Switch from normal iteration to postcopy
  * Returns non-0 on error
  */
-static int postcopy_start(MigrationState *ms)
+static int postcopy_start(MigrationState *ms, Error **errp)
 {
 int ret;
 QIOChannelBuffer *bioc;
@@ -2176,7 +2176,7 @@ static int postcopy_start(MigrationState *ms)
  */
 ret = qemu_file_get_error(ms->to_dst_file);
 if (ret) {
-error_report("postcopy_start: Migration stream errored (pre package)");
+error_setg(errp, "postcopy_start: Migration stream errored (pre package)");
 goto fail_closefb;
 }
 
@@ -2213,7 +2213,7 @@ static int postcopy_start(MigrationState *ms)
 
 ret = qemu_file_get_error(ms->to_dst_file);
 if (ret) {
-error_report("postcopy_start: Migration stream errored");
+error_setg(errp, "postcopy_start: Migration stream errored");
 migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
   MIGRATION_STATUS_FAILED);
 }
@@ -2720,6 +2720,7 @@ typedef enum {
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
 uint64_t must_precopy, can_postcopy;
+Error *local_err = NULL;
 bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
 qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
@@ -2742,8 +2743,9 @@ static MigIterateState 
migration_iteration_run(MigrationState *s)
 /* Still a significant amount to transfer */
 if (!in_postcopy && must_precopy <= s->threshold_size &&
 qatomic_read(&s->start_postcopy)) {
-if (postcopy_start(s)) {
-error_report("%s: postcopy failed to start", __func__);
+if (postcopy_start(s, &local_err)) {
+migrate_set_error(s, local_err);
+error_report_err(local_err);
 }
 return MIG_ITERATE_SKIP;
 }
@@ -3234,8 +3236,10 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
  */
 if (migrate_postcopy_ram() || migrate_return_path()) {
 if (open_return_path_on_source(s, !resume)) {
-error_report("Unable to open return-path for postcopy");
+error_setg(&local_err, "Unable to open return-path for postcopy");
 migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+migrate_set_error(s, local_err);
+error_report_err(local_err);
 migrate_fd_cleanup(s);
 return;
 }
@@ -3259,6 +3263,7 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 }
 
 if (multifd_save_setup(&local_err) != 0) {
+migrate_set_error(s, local_err);
 error_report_err(local_err);
 migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_FAILED);
-- 
2.40.1




[PULL 30/30] migration/rdma: Split qemu_fopen_rdma() into input/output functions

2023-06-21 Thread Juan Quintela
This is how everything else in QEMUFile is structured.
As a bonus, there are three fewer lines of code.

Reviewed-by: Peter Xu 
Message-ID: <20230530183941.7223-17-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.h |  1 -
 migration/qemu-file.c | 12 
 migration/rdma.c  | 39 +++
 3 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 8b8b7d27fe..47015f5201 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -102,7 +102,6 @@ uint64_t qemu_file_transferred_noflush(QEMUFile *f);
  */
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size,
bool may_free);
-bool qemu_file_mode_is_not_valid(const char *mode);
 
 #include "migration/qemu-file-types.h"
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d30bf3c377..19c33c9985 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -100,18 +100,6 @@ int qemu_file_shutdown(QEMUFile *f)
 return 0;
 }
 
-bool qemu_file_mode_is_not_valid(const char *mode)
-{
-if (mode == NULL ||
-(mode[0] != 'r' && mode[0] != 'w') ||
-mode[1] != 'b' || mode[2] != 0) {
-fprintf(stderr, "qemu_fopen: Argument validity check failed\n");
-return true;
-}
-
-return false;
-}
-
 static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
 {
 QEMUFile *f;
diff --git a/migration/rdma.c b/migration/rdma.c
index dd1c039e6c..ca430d319d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -4053,27 +4053,26 @@ static void qio_channel_rdma_register_types(void)
 
 type_init(qio_channel_rdma_register_types);
 
-static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
+static QEMUFile *rdma_new_input(RDMAContext *rdma)
 {
-QIOChannelRDMA *rioc;
+QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
 
-if (qemu_file_mode_is_not_valid(mode)) {
-return NULL;
-}
+rioc->file = qemu_file_new_input(QIO_CHANNEL(rioc));
+rioc->rdmain = rdma;
+rioc->rdmaout = rdma->return_path;
+qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
 
-rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
+return rioc->file;
+}
 
-if (mode[0] == 'w') {
-rioc->file = qemu_file_new_output(QIO_CHANNEL(rioc));
-rioc->rdmaout = rdma;
-rioc->rdmain = rdma->return_path;
-qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
-} else {
-rioc->file = qemu_file_new_input(QIO_CHANNEL(rioc));
-rioc->rdmain = rdma;
-rioc->rdmaout = rdma->return_path;
-qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
-}
+static QEMUFile *rdma_new_output(RDMAContext *rdma)
+{
+QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
+
+rioc->file = qemu_file_new_output(QIO_CHANNEL(rioc));
+rioc->rdmaout = rdma;
+rioc->rdmain = rdma->return_path;
+qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
 
 return rioc->file;
 }
@@ -4099,9 +4098,9 @@ static void rdma_accept_incoming_migration(void *opaque)
 return;
 }
 
-f = qemu_fopen_rdma(rdma, "rb");
+f = rdma_new_input(rdma);
 if (f == NULL) {
-fprintf(stderr, "RDMA ERROR: could not qemu_fopen_rdma\n");
+fprintf(stderr, "RDMA ERROR: could not open RDMA for input\n");
 qemu_rdma_cleanup(rdma);
 return;
 }
@@ -4224,7 +4223,7 @@ void rdma_start_outgoing_migration(void *opaque,
 
 trace_rdma_start_outgoing_migration_after_rdma_connect();
 
-s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
+s->to_dst_file = rdma_new_output(rdma);
 migrate_fd_connect(s, NULL);
 return;
 return_path_err:
-- 
2.40.1
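
The split in the patch above can be pictured with a minimal sketch, assuming toy types: ToyContext and ToyChannel are hypothetical stand-ins, not QEMU's RDMAContext or QIOChannelRDMA. Each constructor fixes its own direction, so there is no mode string left to validate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
typedef struct ToyContext {
    struct ToyContext *return_path;
} ToyContext;

typedef struct ToyChannel {
    ToyContext *in;    /* side we read from */
    ToyContext *out;   /* side we write to */
    bool writable;
} ToyChannel;

/* Input channel: read from ctx, reply over its return path. */
static ToyChannel toy_new_input(ToyContext *ctx)
{
    assert(ctx != NULL);
    ToyChannel ch = { .in = ctx, .out = ctx->return_path, .writable = false };
    return ch;
}

/* Output channel: write to ctx, read replies from its return path. */
static ToyChannel toy_new_output(ToyContext *ctx)
{
    assert(ctx != NULL);
    ToyChannel ch = { .in = ctx->return_path, .out = ctx, .writable = true };
    return ch;
}
```

With one function per direction, a call site such as toy_new_input(&ctx) cannot pass an invalid mode, which is why the validity helper can go away.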




[PULL 25/30] migration: Change qemu_file_transferred to noflush

2023-06-21 Thread Juan Quintela
We call qemu_fclose() right after this, and it also does a qemu_fflush(),
so drop the redundant qemu_fflush().

Reviewed-by: Philippe Mathieu-Daudé 
Message-ID: <20230530183941.7223-3-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 migration/savevm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f26b455764..b2199d1039 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2952,7 +2952,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 goto the_end;
 }
 ret = qemu_savevm_state(f, errp);
-vm_state_size = qemu_file_transferred(f);
+vm_state_size = qemu_file_transferred_noflush(f);
 ret2 = qemu_fclose(f);
 if (ret < 0) {
 goto the_end;
-- 
2.40.1




[PULL 22/30] migration: enforce multifd and postcopy preempt to be set before incoming

2023-06-21 Thread Juan Quintela
From: Wei Wang 

qemu_start_incoming_migration needs to check the number of multifd
channels or postcopy ram channels to configure the backlog parameter (i.e.
the maximum length to which the queue of pending connections for sockfd
may grow) of listen(). So enforce the usage of postcopy-preempt and
multifd as below:
- need to use "-incoming defer" on the destination; and
- set_capability and set_parameter need to be done before migrate_incoming

Otherwise, disable the use of the features and report error messages to
remind users to adjust the commands.

Signed-off-by: Wei Wang 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Message-ID: <20230606101910.20456-2-wei.w.w...@intel.com>
Signed-off-by: Juan Quintela 
Acked-by: Juan Quintela 
---
 migration/options.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index ba1010e08b..c072c2fab7 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -433,6 +433,11 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
 MIGRATION_CAPABILITY_VALIDATE_UUID,
 MIGRATION_CAPABILITY_ZERO_COPY_SEND);
 
+static bool migrate_incoming_started(void)
+{
+return !!migration_incoming_get_current()->transport_data;
+}
+
 /**
  * @migration_caps_check - check capability compatibility
  *
@@ -556,6 +561,12 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 error_setg(errp, "Postcopy preempt not compatible with compress");
 return false;
 }
+
+if (migrate_incoming_started()) {
+error_setg(errp,
+   "Postcopy preempt must be set before incoming starts");
+return false;
+}
 }
 
 if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
@@ -563,6 +574,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 error_setg(errp, "Multifd is not compatible with compress");
 return false;
 }
+if (migrate_incoming_started()) {
+error_setg(errp, "Multifd must be set before incoming starts");
+return false;
+}
 }
 
 if (new_caps[MIGRATION_CAPABILITY_DIRTY_LIMIT]) {
-- 
2.40.1
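
A minimal sketch of the ordering rule enforced by the patch above, assuming a toy model of the destination: ToyIncoming, toy_set_multifd() and the backlog rule are hypothetical and much simpler than the real code. The point is only that the backlog is chosen when incoming starts, so a capability enabled afterwards would come too late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the destination side; not QEMU's data structures. */
typedef struct ToyIncoming {
    bool started;          /* set once the listening socket exists */
    bool multifd;
    int multifd_channels;
} ToyIncoming;

/* Refuse to enable multifd once incoming has already started. */
static bool toy_set_multifd(ToyIncoming *inc, bool on, const char **errp)
{
    assert(inc != NULL && errp != NULL);
    if (on && inc->started) {
        *errp = "Multifd must be set before incoming starts";
        return false;
    }
    inc->multifd = on;
    return true;
}

/* Assumed backlog rule: one pending connection per channel. */
static int toy_listen_backlog(const ToyIncoming *inc)
{
    return inc->multifd ? inc->multifd_channels : 1;
}

/* Models migrate_incoming: pick the backlog, then mark incoming started. */
static int toy_start_incoming(ToyIncoming *inc)
{
    int backlog = toy_listen_backlog(inc);
    inc->started = true;
    return backlog;
}
```

This mirrors the "-incoming defer" workflow from the commit message: set capabilities first, then start incoming.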




[PULL 26/30] migration: Use qemu_file_transferred_noflush() for block migration.

2023-06-21 Thread Juan Quintela
We only care about the number of bytes transferred.  Flushing is done
by the system elsewhere.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20230530183941.7223-4-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 migration/block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index b9580a6c7e..b29e80bdc4 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -748,7 +748,7 @@ static int block_save_setup(QEMUFile *f, void *opaque)
 static int block_save_iterate(QEMUFile *f, void *opaque)
 {
 int ret;
-uint64_t last_bytes = qemu_file_transferred(f);
+uint64_t last_bytes = qemu_file_transferred_noflush(f);
 
 trace_migration_block_save("iterate", block_mig_state.submitted,
block_mig_state.transferred);
@@ -800,7 +800,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
 }
 
 qemu_put_be64(f, BLK_MIG_FLAG_EOS);
-uint64_t delta_bytes = qemu_file_transferred(f) - last_bytes;
+uint64_t delta_bytes = qemu_file_transferred_noflush(f) - last_bytes;
 return (delta_bytes > 0);
 }
 
-- 
2.40.1




[PULL 13/30] migration-test: Create arch_opts

2023-06-21 Thread Juan Quintela
This will contain the options needed for both source and target.

Reviewed-by: Peter Xu 
Message-ID: <20230608224943.3877-6-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 79157d600b..4d8542f5c7 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -600,6 +600,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 {
 g_autofree gchar *arch_source = NULL;
 g_autofree gchar *arch_target = NULL;
+/* options for source and target */
+g_autofree gchar *arch_opts = NULL;
 g_autofree gchar *cmd_source = NULL;
 g_autofree gchar *cmd_target = NULL;
 const gchar *ignore_stderr;
@@ -625,15 +627,13 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 assert(sizeof(x86_bootsect) == 512);
 init_bootfile(bootpath, x86_bootsect, sizeof(x86_bootsect));
 memory_size = "150M";
-arch_source = g_strdup_printf("-drive file=%s,format=raw", bootpath);
-arch_target = g_strdup(arch_source);
+arch_opts = g_strdup_printf("-drive file=%s,format=raw", bootpath);
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
 init_bootfile(bootpath, s390x_elf, sizeof(s390x_elf));
 memory_size = "128M";
-arch_source = g_strdup_printf("-bios %s", bootpath);
-arch_target = g_strdup(arch_source);
+arch_opts = g_strdup_printf("-bios %s", bootpath);
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
@@ -641,20 +641,16 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 memory_size = "256M";
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
-arch_source = g_strdup_printf("-nodefaults "
-  "-prom-env 'use-nvramrc?=true' -prom-env 
"
+arch_source = g_strdup_printf("-prom-env 'use-nvramrc?=true' -prom-env 
"
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
   "until'", end_address, start_address);
-arch_target = g_strdup("-nodefaults");
+arch_opts = g_strdup("-nodefaults");
 } else if (strcmp(arch, "aarch64") == 0) {
 init_bootfile(bootpath, aarch64_kernel, sizeof(aarch64_kernel));
 machine_opts = "-machine virt,gic-version=max";
 memory_size = "150M";
-arch_source = g_strdup_printf("-cpu max "
-  "-kernel %s",
-  bootpath);
-arch_target = g_strdup(arch_source);
+arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
 
@@ -693,12 +689,14 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
  "-name source,debug-threads=on "
  "-m %s "
  "-serial file:%s/src_serial "
- "%s %s %s %s",
+ "%s %s %s %s %s",
  args->use_dirty_ring ?
  ",dirty-ring-size=4096" : "",
  machine_opts ? machine_opts : "",
  memory_size, tmpfs,
- arch_source, shmem_opts,
+ arch_opts ? arch_opts : "",
+ arch_source ? arch_source : "",
+ shmem_opts,
  args->opts_source ? args->opts_source : "",
  ignore_stderr);
 if (!args->only_target) {
@@ -713,12 +711,14 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
  "-m %s "
  "-serial file:%s/dest_serial "
  "-incoming %s "
- "%s %s %s %s",
+ "%s %s %s %s %s",
  args->use_dirty_ring ?
  ",dirty-ring-size=4096" : "",
  machine_opts ? machine_opts : "",
  memory_size, tmpfs, uri,
- arch_target, shmem_opts,
+ arch_opts ? arch_opts : "",
+ arch_target ? arch_target : "",
+ shmem_opts,
   

[PULL 07/30] migration: Refactor auto-converge capability logic

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

Check whether block migration is running before throttling
the guest down via auto-converge.

Note that this is mostly a code cleanup: block migration
does not depend on the auto-converge capability, so the
order of the checks can be adjusted.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Reviewed-by: Juan Quintela 
Message-Id: <168618975839.6361.1740763387474768865...@git.sr.ht>
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5283a75f02..78746849b5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -995,7 +995,11 @@ static void migration_trigger_throttle(RAMState *rs)
 /* During block migration the auto-converge logic incorrectly detects
  * that ram migration makes no progress. Avoid this by disabling the
  * throttling logic during the bulk phase of block migration. */
-if (migrate_auto_converge() && !blk_mig_bulk_active()) {
+if (blk_mig_bulk_active()) {
+return;
+}
+
+if (migrate_auto_converge()) {
 /* The following detection logic can be refined later. For now:
Check to see if the ratio between dirtied bytes and the approx.
amount of bytes that just got transferred since the last time
-- 
2.40.1
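
The reordering in the patch above does not change behaviour. A minimal sketch, assuming the two conditions are plain booleans (the function names here are hypothetical), compares the old combined test with the new early-return form over every input combination:

```c
#include <assert.h>
#include <stdbool.h>

/* Old shape: both conditions in a single test. */
static bool should_check_throttle_old(bool auto_converge, bool blk_bulk_active)
{
    return auto_converge && !blk_bulk_active;
}

/* New shape: bail out first while bulk block migration is active. */
static bool should_check_throttle_new(bool auto_converge, bool blk_bulk_active)
{
    if (blk_bulk_active) {
        return false;
    }
    return auto_converge;
}

/* Exhaustively check that both shapes agree on all four inputs. */
static void check_throttle_equivalence(void)
{
    for (int a = 0; a < 2; a++) {
        for (int b = 0; b < 2; b++) {
            assert(should_check_throttle_old(a, b) ==
                   should_check_throttle_new(a, b));
        }
    }
}
```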




[PULL 14/30] migration-test: machine_opts is really arch specific

2023-06-21 Thread Juan Quintela
It is needed on both source and target, so put it in arch_opts.

Reviewed-by: Peter Xu 
Message-ID: <20230608224943.3877-7-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4d8542f5c7..fc3337b7bb 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -609,7 +609,6 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 g_autofree char *shmem_opts = NULL;
 g_autofree char *shmem_path = NULL;
 const char *arch = qtest_get_arch();
-const char *machine_opts = NULL;
 const char *memory_size;
 
 if (args->use_shmem) {
@@ -637,7 +636,6 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
-machine_opts = "-machine vsmt=8";
 memory_size = "256M";
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
@@ -645,12 +643,12 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
   "until'", end_address, start_address);
-arch_opts = g_strdup("-nodefaults");
+arch_opts = g_strdup("-nodefaults -machine vsmt=8");
 } else if (strcmp(arch, "aarch64") == 0) {
 init_bootfile(bootpath, aarch64_kernel, sizeof(aarch64_kernel));
-machine_opts = "-machine virt,gic-version=max";
 memory_size = "150M";
-arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
+arch_opts = g_strdup_printf("-machine virt,gic-version=max -cpu max "
+"-kernel %s", bootpath);
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
 
@@ -685,14 +683,13 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 shmem_opts = g_strdup("");
 }
 
-cmd_source = g_strdup_printf("-accel kvm%s -accel tcg %s "
+cmd_source = g_strdup_printf("-accel kvm%s -accel tcg "
  "-name source,debug-threads=on "
  "-m %s "
  "-serial file:%s/src_serial "
  "%s %s %s %s %s",
  args->use_dirty_ring ?
  ",dirty-ring-size=4096" : "",
- machine_opts ? machine_opts : "",
  memory_size, tmpfs,
  arch_opts ? arch_opts : "",
  arch_source ? arch_source : "",
@@ -706,7 +703,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  &got_src_stop);
 }
 
-cmd_target = g_strdup_printf("-accel kvm%s -accel tcg %s "
+cmd_target = g_strdup_printf("-accel kvm%s -accel tcg "
  "-name target,debug-threads=on "
  "-m %s "
  "-serial file:%s/dest_serial "
@@ -714,7 +711,6 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  "%s %s %s %s %s",
  args->use_dirty_ring ?
  ",dirty-ring-size=4096" : "",
- machine_opts ? machine_opts : "",
  memory_size, tmpfs, uri,
  arch_opts ? arch_opts : "",
  arch_target ? arch_target : "",
-- 
2.40.1




[PULL 15/30] migration-test: Create kvm_opts

2023-06-21 Thread Juan Quintela
So the use_dirty_ring option becomes an option like the others.

Reviewed-by: Peter Xu 
Message-ID: <20230608224943.3877-8-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index fc3337b7bb..40967fdffc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -608,6 +608,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 g_autofree char *bootpath = NULL;
 g_autofree char *shmem_opts = NULL;
 g_autofree char *shmem_path = NULL;
+const char *kvm_opts = NULL;
 const char *arch = qtest_get_arch();
 const char *memory_size;
 
@@ -683,13 +684,16 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 shmem_opts = g_strdup("");
 }
 
+if (args->use_dirty_ring) {
+kvm_opts = ",dirty-ring-size=4096";
+}
+
 cmd_source = g_strdup_printf("-accel kvm%s -accel tcg "
  "-name source,debug-threads=on "
  "-m %s "
  "-serial file:%s/src_serial "
  "%s %s %s %s %s",
- args->use_dirty_ring ?
- ",dirty-ring-size=4096" : "",
+ kvm_opts ? kvm_opts : "",
  memory_size, tmpfs,
  arch_opts ? arch_opts : "",
  arch_source ? arch_source : "",
@@ -709,8 +713,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  "-serial file:%s/dest_serial "
  "-incoming %s "
  "%s %s %s %s %s",
- args->use_dirty_ring ?
- ",dirty-ring-size=4096" : "",
+ kvm_opts ? kvm_opts : "",
  memory_size, tmpfs, uri,
  arch_opts ? arch_opts : "",
  arch_target ? arch_target : "",
-- 
2.40.1




[PULL 28/30] qemu-file: Simplify qemu_file_shutdown()

2023-06-21 Thread Juan Quintela
Reviewed-by: Peter Xu 
Message-ID: <20230530183941.7223-20-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 9a89e17924..4c577bdff8 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -65,8 +65,6 @@ struct QEMUFile {
  */
 int qemu_file_shutdown(QEMUFile *f)
 {
-int ret = 0;
-
 /*
  * We must set qemufile error before the real shutdown(), otherwise
  * there can be a race window where we thought IO all went though
@@ -96,10 +94,10 @@ int qemu_file_shutdown(QEMUFile *f)
 }
 
 if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL) < 0) {
-ret = -EIO;
+return -EIO;
 }
 
-return ret;
+return 0;
 }
 
 bool qemu_file_mode_is_not_valid(const char *mode)
-- 
2.40.1




[PULL 21/30] migration: Refactor repeated call of yank_unregister_instance

2023-06-21 Thread Juan Quintela
From: Tejus GK 

In qmp_migrate(), yank_unregister_instance() is called from two places,
which isn't required. Refactor so that it is only called during the
local_err cleanup.

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Acked-by: Peter Xu 
Signed-off-by: Tejus GK 
Message-ID: <20230621130940.178659-3-tejus...@nutanix.com>
Signed-off-by: Juan Quintela 
---
 migration/migration.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e6bff2e848..7a4ba2e846 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1676,15 +1676,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
blk,
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_outgoing_migration(s, p, &local_err);
 } else {
-if (!(has_resume && resume)) {
-yank_unregister_instance(MIGRATION_YANK_INSTANCE);
-}
 error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri",
"a valid migration protocol");
 migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_FAILED);
 block_cleanup_parameters();
-return;
 }
 
 if (local_err) {
-- 
2.40.1
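
A minimal sketch of the control flow after the patch above, assuming toy helpers: toy_migrate() and toy_yank_unregister() are hypothetical and only model the shape of qmp_migrate(). The invalid-URI branch just records the error and falls through, and the single error path does the cleanup once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static int unregister_calls;   /* counts calls to the toy cleanup hook */

static void toy_yank_unregister(void)
{
    unregister_calls++;
}

/* Returns true on success, false if the URI is not recognised. */
static bool toy_migrate(const char *uri, bool resume)
{
    const char *err = NULL;

    assert(uri != NULL);
    if (strncmp(uri, "tcp:", 4) == 0) {
        /* pretend the outgoing migration started fine */
    } else {
        /* only record the error: no cleanup and no early return here */
        err = "a valid migration protocol is required";
    }

    if (err) {
        /* the one place where the instance is unregistered */
        if (!resume) {
            toy_yank_unregister();
        }
        return false;
    }
    return true;
}
```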




[PULL 16/30] migration-test: bootpath is the same for all tests and for all archs

2023-06-21 Thread Juan Quintela
So just make it a global variable.

Reviewed-by: Peter Xu 
Message-ID: <20230608224943.3877-9-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 40967fdffc..0f80dbfe80 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -102,6 +102,7 @@ static bool ufd_version_check(void)
 #endif
 
 static char *tmpfs;
+static char *bootpath;
 
 /* The boot file modifies memory area in [start_address, end_address)
  * repeatedly. It outputs a 'B' at a fixed rate while it's still running.
@@ -110,7 +111,7 @@ static char *tmpfs;
 #include "tests/migration/aarch64/a-b-kernel.h"
 #include "tests/migration/s390x/a-b-bios.h"
 
-static void init_bootfile(const char *bootpath, void *content, size_t len)
+static void init_bootfile(void *content, size_t len)
 {
 FILE *bootfile = fopen(bootpath, "wb");
 
@@ -605,7 +606,6 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 g_autofree gchar *cmd_source = NULL;
 g_autofree gchar *cmd_target = NULL;
 const gchar *ignore_stderr;
-g_autofree char *bootpath = NULL;
 g_autofree char *shmem_opts = NULL;
 g_autofree char *shmem_path = NULL;
 const char *kvm_opts = NULL;
@@ -621,17 +621,16 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 
 got_src_stop = false;
 got_dst_resume = false;
-bootpath = g_strdup_printf("%s/bootsect", tmpfs);
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 /* the assembled x86 boot sector should be exactly one sector large */
 assert(sizeof(x86_bootsect) == 512);
-init_bootfile(bootpath, x86_bootsect, sizeof(x86_bootsect));
+init_bootfile(x86_bootsect, sizeof(x86_bootsect));
 memory_size = "150M";
 arch_opts = g_strdup_printf("-drive file=%s,format=raw", bootpath);
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
-init_bootfile(bootpath, s390x_elf, sizeof(s390x_elf));
+init_bootfile(s390x_elf, sizeof(s390x_elf));
 memory_size = "128M";
 arch_opts = g_strdup_printf("-bios %s", bootpath);
 start_address = S390_TEST_MEM_START;
@@ -646,7 +645,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
   "until'", end_address, start_address);
 arch_opts = g_strdup("-nodefaults -machine vsmt=8");
 } else if (strcmp(arch, "aarch64") == 0) {
-init_bootfile(bootpath, aarch64_kernel, sizeof(aarch64_kernel));
+init_bootfile(aarch64_kernel, sizeof(aarch64_kernel));
 memory_size = "150M";
 arch_opts = g_strdup_printf("-machine virt,gic-version=max -cpu max "
 "-kernel %s", bootpath);
@@ -764,7 +763,6 @@ static void test_migrate_end(QTestState *from, QTestState 
*to, bool test_dest)
 
 qtest_quit(to);
 
-cleanup("bootsect");
 cleanup("migsocket");
 cleanup("src_serial");
 cleanup("dest_serial");
@@ -2493,12 +2491,10 @@ static QTestState *dirtylimit_start_vm(void)
 QTestState *vm = NULL;
 g_autofree gchar *cmd = NULL;
 const char *arch = qtest_get_arch();
-g_autofree char *bootpath = NULL;
 
 assert((strcmp(arch, "x86_64") == 0));
-bootpath = g_strdup_printf("%s/bootsect", tmpfs);
 assert(sizeof(x86_bootsect) == 512);
-init_bootfile(bootpath, x86_bootsect, sizeof(x86_bootsect));
+init_bootfile(x86_bootsect, sizeof(x86_bootsect));
 
 cmd = g_strdup_printf("-accel kvm,dirty-ring-size=4096 "
   "-name dirtylimit-test,debug-threads=on "
@@ -2514,7 +2510,6 @@ static QTestState *dirtylimit_start_vm(void)
 static void dirtylimit_stop_vm(QTestState *vm)
 {
 qtest_quit(vm);
-cleanup("bootsect");
 cleanup("vm_serial");
 }
 
@@ -2676,6 +2671,7 @@ int main(int argc, char **argv)
g_get_tmp_dir(), err->message);
 }
 g_assert(tmpfs);
+bootpath = g_strdup_printf("%s/bootsect", tmpfs);
 
 module_call_init(MODULE_INIT_QOM);
 
@@ -2819,6 +2815,8 @@ int main(int argc, char **argv)
 
 g_assert_cmpint(ret, ==, 0);
 
+cleanup("bootsect");
+g_free(bootpath);
 ret = rmdir(tmpfs);
 if (ret != 0) {
 g_test_message("unable to rmdir: path (%s): %s",
-- 
2.40.1




[PULL 17/30] migration-test: Add bootfile_create/delete() functions

2023-06-21 Thread Juan Quintela
The bootsector code is only read by the guest, never written (otherwise
we would have problems with it being read from both source and
destination).

Create a single copy for all the tests.

Reviewed-by: Peter Xu 
Message-ID: <20230608224943.3877-10-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 50 ++--
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0f80dbfe80..eb6a11e758 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -111,14 +111,47 @@ static char *bootpath;
 #include "tests/migration/aarch64/a-b-kernel.h"
 #include "tests/migration/s390x/a-b-bios.h"
 
-static void init_bootfile(void *content, size_t len)
+static void bootfile_create(char *dir)
 {
+const char *arch = qtest_get_arch();
+unsigned char *content;
+size_t len;
+
+bootpath = g_strdup_printf("%s/bootsect", dir);
+if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
+/* the assembled x86 boot sector should be exactly one sector large */
+g_assert(sizeof(x86_bootsect) == 512);
+content = x86_bootsect;
+len = sizeof(x86_bootsect);
+} else if (g_str_equal(arch, "s390x")) {
+content = s390x_elf;
+len = sizeof(s390x_elf);
+} else if (strcmp(arch, "ppc64") == 0) {
+/*
+ * sane architectures can be programmed at the boot prompt
+ */
+return;
+} else if (strcmp(arch, "aarch64") == 0) {
+content = aarch64_kernel;
+len = sizeof(aarch64_kernel);
+g_assert(sizeof(aarch64_kernel) <= ARM_TEST_MAX_KERNEL_SIZE);
+} else {
+g_assert_not_reached();
+}
+
 FILE *bootfile = fopen(bootpath, "wb");
 
 g_assert_cmpint(fwrite(content, len, 1, bootfile), ==, 1);
 fclose(bootfile);
 }
 
+static void bootfile_delete(void)
+{
+unlink(bootpath);
+g_free(bootpath);
+bootpath = NULL;
+}
+
 /*
  * Wait for some output in the serial output file,
  * we get an 'A' followed by an endless string of 'B's
@@ -622,15 +655,11 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 got_src_stop = false;
 got_dst_resume = false;
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
-/* the assembled x86 boot sector should be exactly one sector large */
-assert(sizeof(x86_bootsect) == 512);
-init_bootfile(x86_bootsect, sizeof(x86_bootsect));
 memory_size = "150M";
 arch_opts = g_strdup_printf("-drive file=%s,format=raw", bootpath);
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
-init_bootfile(s390x_elf, sizeof(s390x_elf));
 memory_size = "128M";
 arch_opts = g_strdup_printf("-bios %s", bootpath);
 start_address = S390_TEST_MEM_START;
@@ -645,14 +674,11 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
   "until'", end_address, start_address);
 arch_opts = g_strdup("-nodefaults -machine vsmt=8");
 } else if (strcmp(arch, "aarch64") == 0) {
-init_bootfile(aarch64_kernel, sizeof(aarch64_kernel));
 memory_size = "150M";
 arch_opts = g_strdup_printf("-machine virt,gic-version=max -cpu max "
 "-kernel %s", bootpath);
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
-
-g_assert(sizeof(aarch64_kernel) <= ARM_TEST_MAX_KERNEL_SIZE);
 } else {
 g_assert_not_reached();
 }
@@ -2493,9 +2519,6 @@ static QTestState *dirtylimit_start_vm(void)
 const char *arch = qtest_get_arch();
 
 assert((strcmp(arch, "x86_64") == 0));
-assert(sizeof(x86_bootsect) == 512);
-init_bootfile(x86_bootsect, sizeof(x86_bootsect));
-
 cmd = g_strdup_printf("-accel kvm,dirty-ring-size=4096 "
   "-name dirtylimit-test,debug-threads=on "
   "-m 150M -smp 1 "
@@ -2671,7 +2694,7 @@ int main(int argc, char **argv)
g_get_tmp_dir(), err->message);
 }
 g_assert(tmpfs);
-bootpath = g_strdup_printf("%s/bootsect", tmpfs);
+bootfile_create(tmpfs);
 
 module_call_init(MODULE_INIT_QOM);
 
@@ -2815,8 +2838,7 @@ int main(int argc, char **argv)
 
 g_assert_cmpint(ret, ==, 0);
 
-cleanup("bootsect");
-g_free(bootpath);
+bootfile_delete();
 ret = rmdir(tmpfs);
 if (ret != 0) {
 g_test_message("unable to rmdir: path (%s): %s",
-- 
2.40.1
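
The create-once/delete-once lifecycle from the patch above, as a minimal sketch assuming POSIX and plain libc instead of GLib. toy_bootfile_create(), toy_bootfile_delete() and toy_bootpath are hypothetical names, and the per-architecture content selection is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char *toy_bootpath;   /* shared by all tests, set once at startup */

/* Write the boot image to <dir>/bootsect and remember its path. */
static void toy_bootfile_create(const char *dir, const void *content,
                                size_t len)
{
    size_t n = strlen(dir) + sizeof("/bootsect");   /* includes the NUL */
    size_t written;
    FILE *f;

    toy_bootpath = malloc(n);
    assert(toy_bootpath != NULL);
    snprintf(toy_bootpath, n, "%s/bootsect", dir);

    f = fopen(toy_bootpath, "wb");
    assert(f != NULL);
    written = fwrite(content, len, 1, f);
    assert(written == 1);
    fclose(f);
}

/* Remove the file and forget the path; called once at the very end. */
static void toy_bootfile_delete(void)
{
    unlink(toy_bootpath);
    free(toy_bootpath);
    toy_bootpath = NULL;
}
```

Because the guest never writes to the image, one copy created in main() before the tests run can serve every test, and it is removed after the last one.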




[PULL 24/30] qemu-file: Rename qemu_file_transferred_ fast -> noflush

2023-06-21 Thread Juan Quintela
"Fast" doesn't say much.  "Noflush" indicates more clearly that it is like
qemu_file_transferred() but without the flush.

Reviewed-by: Philippe Mathieu-Daudé 
Message-ID: <20230530183941.7223-2-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.h | 11 +--
 migration/qemu-file.c |  2 +-
 migration/savevm.c|  4 ++--
 migration/vmstate.c   |  4 ++--
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index e649718492..aa6eee66da 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -86,16 +86,15 @@ int qemu_fclose(QEMUFile *f);
 uint64_t qemu_file_transferred(QEMUFile *f);
 
 /*
- * qemu_file_transferred_fast:
+ * qemu_file_transferred_noflush:
  *
- * As qemu_file_transferred except for writable
- * files, where no flush is performed and the reported
- * amount will include the size of any queued buffers,
- * on top of the amount actually transferred.
+ * As qemu_file_transferred except for writable files, where no flush
+ * is performed and the reported amount will include the size of any
+ * queued buffers, on top of the amount actually transferred.
  *
  * Returns: the total bytes transferred and queued
  */
-uint64_t qemu_file_transferred_fast(QEMUFile *f);
+uint64_t qemu_file_transferred_noflush(QEMUFile *f);
 
 /*
  * put_buffer without copying the buffer.
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index acc282654a..fdf115b5da 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -694,7 +694,7 @@ int coroutine_mixed_fn qemu_get_byte(QEMUFile *f)
 return result;
 }
 
-uint64_t qemu_file_transferred_fast(QEMUFile *f)
+uint64_t qemu_file_transferred_noflush(QEMUFile *f)
 {
 uint64_t ret = f->total_transferred;
 int i;
diff --git a/migration/savevm.c b/migration/savevm.c
index bc284087f9..f26b455764 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -927,9 +927,9 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se)
 static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
JSONWriter *vmdesc)
 {
-uint64_t old_offset = qemu_file_transferred_fast(f);
+uint64_t old_offset = qemu_file_transferred_noflush(f);
 se->ops->save_state(f, se->opaque);
-uint64_t size = qemu_file_transferred_fast(f) - old_offset;
+uint64_t size = qemu_file_transferred_noflush(f) - old_offset;
 
 if (vmdesc) {
 json_writer_int64(vmdesc, "size", size);
diff --git a/migration/vmstate.c b/migration/vmstate.c
index af01d54b6f..31842c3afb 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -361,7 +361,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
VMStateDescription *vmsd,
 void *curr_elem = first_elem + size * i;
 
 vmsd_desc_field_start(vmsd, vmdesc_loop, field, i, n_elems);
-old_offset = qemu_file_transferred_fast(f);
+old_offset = qemu_file_transferred_noflush(f);
 if (field->flags & VMS_ARRAY_OF_POINTER) {
 assert(curr_elem);
 curr_elem = *(void **)curr_elem;
@@ -391,7 +391,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
VMStateDescription *vmsd,
 return ret;
 }
 
-written_bytes = qemu_file_transferred_fast(f) - old_offset;
+written_bytes = qemu_file_transferred_noflush(f) - old_offset;
 vmsd_desc_field_end(vmsd, vmdesc_loop, field, written_bytes, 
i);
 
 /* Compressed arrays only care about the first element */
-- 
2.40.1
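
The difference between the two accessors described in the header comment above, as a minimal sketch assuming a toy file with a single queued-bytes counter. ToyFile and the toy_* functions are hypothetical; the real QEMUFile tracks queued buffers differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bytes already sent plus bytes still queued. */
typedef struct ToyFile {
    uint64_t total_transferred;
    uint64_t queued;
    unsigned flushes;       /* how many times we had to flush */
} ToyFile;

/* Queue len bytes for sending without touching the wire. */
static void toy_put(ToyFile *f, uint64_t len)
{
    assert(f != NULL);
    f->queued += len;
}

/* Push everything queued onto the wire. */
static void toy_flush(ToyFile *f)
{
    f->total_transferred += f->queued;
    f->queued = 0;
    f->flushes++;
}

/* Flushes first, then reports what was actually sent. */
static uint64_t toy_transferred(ToyFile *f)
{
    toy_flush(f);
    return f->total_transferred;
}

/* No flush: reports sent bytes plus whatever is still queued. */
static uint64_t toy_transferred_noflush(const ToyFile *f)
{
    return f->total_transferred + f->queued;
}
```

Both report the same total in this model; the noflush variant simply avoids the side effect, which is all a caller computing a size delta needs.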




[PULL 19/30] migration-test: simplify shmem_opts handling

2023-06-21 Thread Juan Quintela
Reviewed-by: Peter Xu 
Message-ID: <20230608224943.3877-4-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index fbe9db23cf..e3e7d54216 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -704,9 +704,6 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 "-object memory-backend-file,id=mem0,size=%s"
 ",mem-path=%s,share=on -numa node,memdev=mem0",
 memory_size, shmem_path);
-} else {
-shmem_path = NULL;
-shmem_opts = g_strdup("");
 }
 
 if (args->use_dirty_ring) {
@@ -722,7 +719,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  memory_size, tmpfs,
  arch_opts ? arch_opts : "",
  arch_source ? arch_source : "",
- shmem_opts,
+ shmem_opts ? shmem_opts : "",
  args->opts_source ? args->opts_source : "",
  ignore_stderr);
 if (!args->only_target) {
@@ -742,7 +739,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  memory_size, tmpfs, uri,
  arch_opts ? arch_opts : "",
  arch_target ? arch_target : "",
- shmem_opts,
+ shmem_opts ? shmem_opts : "",
  args->opts_target ? args->opts_target : "",
  ignore_stderr);
 *to = qtest_init(cmd_target);
-- 
2.40.1
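
The `opt ? opt : ""` idiom used in the patch above exists because passing NULL for a `%s` argument is not portable. A minimal sketch, assuming plain snprintf() instead of g_strdup_printf() and a hypothetical, much shorter command line:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map an unset option to the empty string so "%s" never sees NULL. */
static const char *opt_or_empty(const char *opt)
{
    return opt ? opt : "";
}

/* Build a toy command line; any of the three option strings may be NULL. */
static int toy_build_cmd(char *buf, size_t len, const char *arch_opts,
                         const char *arch_side, const char *shmem_opts)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "-m 150M %s %s %s",
                    opt_or_empty(arch_opts),
                    opt_or_empty(arch_side),
                    opt_or_empty(shmem_opts));
}
```

Leaving unset options as NULL and substituting at the call site removes the need for an else branch that exists only to allocate an empty string.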




[PULL 11/30] migration-test: Be consistent for ppc

2023-06-21 Thread Juan Quintela
It makes no sense that we don't have the same configuration on both sides.

Reviewed-by: Laurent Vivier 
Message-ID: <20230608224943.3877-2-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..c5e0c69c6b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -646,7 +646,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
   "until'", end_address, start_address);
-arch_target = g_strdup("");
+arch_target = g_strdup("-nodefaults");
 } else if (strcmp(arch, "aarch64") == 0) {
 init_bootfile(bootpath, aarch64_kernel, sizeof(aarch64_kernel));
 machine_opts = "virt,gic-version=max";
-- 
2.40.1




[PULL 05/30] qapi/migration: Introduce vcpu-dirty-limit parameters

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

Introduce "vcpu-dirty-limit" migration parameter used
to limit dirty page rate during live migration.

"vcpu-dirty-limit" and "x-vcpu-dirty-limit-period" are
two dirty-limit-related migration parameters, which can
be set before and during live migration by qmp
migrate-set-parameters.

These two parameters help implement the dirty page rate limit
algorithm for migration.
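
As a rough illustration of setting the new parameter over QMP, the sketch below formats a migrate-set-parameters command carrying "vcpu-dirty-limit". The helper name format_set_dirty_limit() and the value 50 are assumptions made for this example; only the command and parameter names come from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a QMP migrate-set-parameters command that
 * sets vcpu-dirty-limit (in MB/s). Returns the number of characters
 * written, excluding the terminating NUL, or -1 if the buffer is too small.
 */
static int format_set_dirty_limit(char *buf, size_t len, uint64_t mb_per_sec)
{
    int n = snprintf(buf, len,
                     "{\"execute\": \"migrate-set-parameters\", "
                     "\"arguments\": {\"vcpu-dirty-limit\": %" PRIu64 "}}",
                     mb_per_sec);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string could be sent on a QMP monitor socket either before or during migration, as the commit message describes.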

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Reviewed-by: Juan Quintela 
Message-Id: <168618975839.6361.1740763387474768865...@git.sr.ht>
Signed-off-by: Juan Quintela 
---
 qapi/migration.json            | 18 +++---
 migration/migration-hmp-cmds.c |  8
 migration/options.c            | 21 +
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 67c26d9dea..e7243c0c0d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -783,6 +783,9 @@
 # live migration. Should be in the range 1 to 1000ms,
 # defaults to 1000ms. (Since 8.1)
 #
+# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
+#Defaults to 1. (Since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
@@ -806,7 +809,8 @@
'max-cpu-throttle', 'multifd-compression',
'multifd-zlib-level', 'multifd-zstd-level',
'block-bitmap-mapping',
-   { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] } ] }
+   { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
+   'vcpu-dirty-limit'] }
 
 ##
 # @MigrateSetParameters:
@@ -945,6 +949,9 @@
 # live migration. Should be in the range 1 to 1000ms,
 # defaults to 1000ms. (Since 8.1)
 #
+# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
+#Defaults to 1. (Since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
@@ -985,7 +992,8 @@
 '*multifd-zstd-level': 'uint8',
 '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
 '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
-'features': [ 'unstable' ] } } }
+'features': [ 'unstable' ] },
+'*vcpu-dirty-limit': 'uint64'} }
 
 ##
 # @migrate-set-parameters:
@@ -1144,6 +1152,9 @@
 # live migration. Should be in the range 1 to 1000ms,
 # defaults to 1000ms. (Since 8.1)
 #
+# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
+#Defaults to 1. (Since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
@@ -1181,7 +1192,8 @@
 '*multifd-zstd-level': 'uint8',
 '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
 '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
-'features': [ 'unstable' ] } } }
+'features': [ 'unstable' ] },
+'*vcpu-dirty-limit': 'uint64'} }
 
 ##
 # @query-migrate-parameters:
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 352e9ec716..35e8020bbf 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -368,6 +368,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "%s: %" PRIu64 " ms\n",
 MigrationParameter_str(MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD),
 params->x_vcpu_dirty_limit_period);
+
+monitor_printf(mon, "%s: %" PRIu64 " MB/s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT),
+params->vcpu_dirty_limit);
 }
 
 qapi_free_MigrationParameters(params);
@@ -628,6 +632,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 p->has_x_vcpu_dirty_limit_period = true;
 visit_type_size(v, param, >x_vcpu_dirty_limit_period, );
 break;
+case MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT:
+p->has_vcpu_dirty_limit = true;
+visit_type_size(v, param, >vcpu_dirty_limit, );
+break;
 default:
 assert(0);
 }
diff --git a/migration/options.c b/migration/options.c
index 9743dea3ab..8acf5f1d2c 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -81,6 +81,7 @@
 DEFINE_PROP_BOOL(name, MigrationState, capabilities[x], false)
 
 #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD 1000/* milliseconds */
+#define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT1   /* MB/s */
 
 Property migration_properties[] = {
 DEFINE_PROP_BOOL("store-global-state", MigrationState,
@@ -168,6 +169,9 @@ Property migration_properties[] = {
 

[PULL 18/30] migration-test: dirtylimit checks for x86_64 arch before

2023-06-21 Thread Juan Quintela
So no need to assert we are in x86_64.
Once there, refactor the function to remove useless variables.

Reviewed-by: Peter Xu 
Message-ID: <20230608224943.3877-11-quint...@redhat.com>
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index eb6a11e758..fbe9db23cf 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2515,10 +2515,7 @@ static int64_t get_limit_rate(QTestState *who)
 static QTestState *dirtylimit_start_vm(void)
 {
 QTestState *vm = NULL;
-g_autofree gchar *cmd = NULL;
-const char *arch = qtest_get_arch();
-
-assert((strcmp(arch, "x86_64") == 0));
+g_autofree gchar *
 cmd = g_strdup_printf("-accel kvm,dirty-ring-size=4096 "
   "-name dirtylimit-test,debug-threads=on "
   "-m 150M -smp 1 "
-- 
2.40.1




[PULL 09/30] migration: Implement dirty-limit convergence algo

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

Implement the dirty-limit convergence algorithm for live migration.
It is similar to the auto-converge algorithm, but uses dirty-limit
instead of CPU throttling to make migration converge.

Enable the dirty page limit once dirty_rate_high_cnt reaches 2 while
the dirty-limit capability is enabled; disable dirty-limit if the
migration is canceled.

Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit"
commands are not allowed during dirty-limit live migration.
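
The quota handling in this patch can be pictured with a small standalone sketch: the limit is (re)applied only when no limit is active yet or when the vcpu-dirty-limit parameter has changed. The struct limiter type and dirty_limit_guest() below are hypothetical stand-ins for illustration, not QEMU's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the limiter state; not a QEMU type. */
struct limiter {
    bool in_service;    /* is a dirty limit currently applied? */
    int64_t applied;    /* quota last applied, in MB/s */
    int apply_calls;    /* how many times the quota was (re)applied */
};

/* Apply the quota only if it is not active yet or the parameter changed. */
static void dirty_limit_guest(struct limiter *l, int64_t param_quota)
{
    if (l->in_service && l->applied == param_quota) {
        return;         /* already throttling at the requested rate */
    }

    l->applied = param_quota;
    l->in_service = true;
    l->apply_calls++;
}
```

Calling dirty_limit_guest() repeatedly with the same quota is a no-op after the first call, which mirrors the early return in migration_dirty_limit_guest().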

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Markus Armbruster 
Message-ID: <168733225273.5845.1587182678887974167...@git.sr.ht>
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.c  |  3 +++
 migration/ram.c        | 36
 softmmu/dirtylimit.c   | 29 +
 migration/trace-events |  1 +
 4 files changed, 69 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 3a001dd042..c101784dfa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -165,6 +165,9 @@ void migration_cancel(const Error *error)
 if (error) {
 migrate_set_error(current_migration, error);
 }
+if (migrate_dirty_limit()) {
+qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
+}
 migrate_fd_cancel(current_migration);
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index b6559f9312..8a86363216 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -46,6 +46,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-types-migration.h"
 #include "qapi/qapi-events-migration.h"
+#include "qapi/qapi-commands-migration.h"
 #include "qapi/qmp/qerror.h"
 #include "trace.h"
 #include "exec/ram_addr.h"
@@ -59,6 +60,8 @@
 #include "multifd.h"
 #include "sysemu/runstate.h"
 #include "options.h"
+#include "sysemu/dirtylimit.h"
+#include "sysemu/kvm.h"
 
 #include "hw/boards.h" /* for machine_dump_guest_core() */
 
@@ -984,6 +987,37 @@ static void migration_update_rates(RAMState *rs, int64_t end_time)
 }
 }
 
+/*
+ * Enable dirty-limit to throttle down the guest
+ */
+static void migration_dirty_limit_guest(void)
+{
+/*
+ * dirty page rate quota for all vCPUs fetched from
+ * migration parameter 'vcpu_dirty_limit'
+ */
+static int64_t quota_dirtyrate;
+MigrationState *s = migrate_get_current();
+
+/*
+ * If dirty limit already enabled and migration parameter
+ * vcpu-dirty-limit untouched.
+ */
+if (dirtylimit_in_service() &&
+quota_dirtyrate == s->parameters.vcpu_dirty_limit) {
+return;
+}
+
+quota_dirtyrate = s->parameters.vcpu_dirty_limit;
+
+/*
+ * Set all vCPU a quota dirtyrate, note that the second
+ * parameter will be ignored if setting all vCPU for the vm
+ */
+qmp_set_vcpu_dirty_limit(false, -1, quota_dirtyrate, NULL);
+trace_migration_dirty_limit_guest(quota_dirtyrate);
+}
+
 static void migration_trigger_throttle(RAMState *rs)
 {
 uint64_t threshold = migrate_throttle_trigger_threshold();
@@ -1013,6 +1047,8 @@ static void migration_trigger_throttle(RAMState *rs)
 trace_migration_throttle();
 mig_throttle_guest_down(bytes_dirty_period,
 bytes_dirty_threshold);
+} else if (migrate_dirty_limit()) {
+migration_dirty_limit_guest();
 }
 }
 }
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 3f1103b04b..6e218bb249 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -436,6 +436,23 @@ static void dirtylimit_cleanup(void)
 dirtylimit_state_finalize();
 }
 
+/*
+ * dirty page rate limit is not allowed to set if migration
+ * is running with dirty-limit capability enabled.
+ */
+static bool dirtylimit_is_allowed(void)
+{
+MigrationState *ms = migrate_get_current();
+
+if (migration_is_running(ms->state) &&
+(!qemu_thread_is_self(&ms->thread)) &&
+migrate_dirty_limit() &&
+dirtylimit_in_service()) {
+return false;
+}
+return true;
+}
+
 void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
  int64_t cpu_index,
  Error **errp)
@@ -449,6 +466,12 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
 return;
 }
 
+if (!dirtylimit_is_allowed()) {
+error_setg(errp, "can't cancel dirty page rate limit while"
+   " migration is running");
+return;
+}
+
 if (!dirtylimit_in_service()) {
 return;
 }
@@ -499,6 +522,12 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
 return;
 }
 
+if (!dirtylimit_is_allowed()) {
+error_setg(errp, "can't set dirty page rate limit while"
+   " migration is running");
+return;
+}
+
 if (!dirty_rate) {
 qmp_cancel_vcpu_dirty_limit(has_cpu_index, cpu_index, errp);
 return;
diff --git a/migration/trace-events b/migration/trace-events
index cdaef7a1ea..c5cb280d95 

[PULL 12/30] migration-test: Make machine_opts regular with other options

2023-06-21 Thread Juan Quintela
Reviewed-by: Peter Xu 
Signed-off-by: Juan Quintela 
Message-ID: <20230608224943.3877-5-quint...@redhat.com>
---
 tests/qtest/migration-test.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index c5e0c69c6b..79157d600b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -637,7 +637,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
-machine_opts = "vsmt=8";
+machine_opts = "-machine vsmt=8";
 memory_size = "256M";
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
@@ -649,7 +649,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 arch_target = g_strdup("-nodefaults");
 } else if (strcmp(arch, "aarch64") == 0) {
 init_bootfile(bootpath, aarch64_kernel, sizeof(aarch64_kernel));
-machine_opts = "virt,gic-version=max";
+machine_opts = "-machine virt,gic-version=max";
 memory_size = "150M";
 arch_source = g_strdup_printf("-cpu max "
   "-kernel %s",
@@ -689,14 +689,13 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 shmem_opts = g_strdup("");
 }
 
-cmd_source = g_strdup_printf("-accel kvm%s -accel tcg%s%s "
+cmd_source = g_strdup_printf("-accel kvm%s -accel tcg %s "
  "-name source,debug-threads=on "
  "-m %s "
  "-serial file:%s/src_serial "
  "%s %s %s %s",
  args->use_dirty_ring ?
  ",dirty-ring-size=4096" : "",
- machine_opts ? " -machine " : "",
  machine_opts ? machine_opts : "",
  memory_size, tmpfs,
  arch_source, shmem_opts,
@@ -709,7 +708,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
  _src_stop);
 }
 
-cmd_target = g_strdup_printf("-accel kvm%s -accel tcg%s%s "
+cmd_target = g_strdup_printf("-accel kvm%s -accel tcg %s "
  "-name target,debug-threads=on "
  "-m %s "
  "-serial file:%s/dest_serial "
@@ -717,7 +716,6 @@ static int test_migrate_start(QTestState **from, QTestState **to,
  "%s %s %s %s",
  args->use_dirty_ring ?
  ",dirty-ring-size=4096" : "",
- machine_opts ? " -machine " : "",
  machine_opts ? machine_opts : "",
  memory_size, tmpfs, uri,
  arch_target, shmem_opts,
-- 
2.40.1




[PULL 10/30] migration: Extend query-migrate to provide dirty page limit info

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

Extend query-migrate to provide the throttle time and the estimated
ring full time when the dirty-limit capability is enabled, so that
we can observe whether the dirty limit takes effect during live migration.
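
The QAPI documentation added by this patch describes the ring full time as the dirty ring memory size divided by the average dirty page rate, with zero meaning the guest is not dirtying memory. A minimal sketch of that arithmetic follows; the function name and the choice of bytes and bytes per second as input units are assumptions for illustration and may not match what QEMU does internally.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate how long (in microseconds) it takes to
 * fill a dirty ring covering ring_bytes of memory at the given dirty rate.
 * Returns 0 when the rate is 0, i.e. the guest does not dirty memory.
 */
static uint64_t ring_full_time_us(uint64_t ring_bytes,
                                  uint64_t rate_bytes_per_sec)
{
    if (rate_bytes_per_sec == 0) {
        return 0;
    }

    /* time = size / rate, scaled from seconds to microseconds */
    return ring_bytes * UINT64_C(1000000) / rate_bytes_per_sec;
}
```

For example, a ring covering 16 MiB dirtied at 16 MiB/s fills in about one second, so the helper returns 1000000.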

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Markus Armbruster 
Reviewed-by: Juan Quintela 
Message-ID: <168733225273.5845.1587182678887974167...@git.sr.ht>
Signed-off-by: Juan Quintela 
---
 qapi/migration.json            | 16 +-
 include/sysemu/dirtylimit.h    |  2 ++
 migration/migration-hmp-cmds.c | 10 +
 migration/migration.c          | 10 +
 softmmu/dirtylimit.c           | 39 ++
 5 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 621e6604c6..e9b24fc410 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -250,6 +250,18 @@
 # blocked.  Present and non-empty when migration is blocked.
 # (since 6.0)
 #
+# @dirty-limit-throttle-time-per-round: Maximum throttle time (in microseconds) of virtual
+#   CPUs each dirty ring full round, which shows how
+#   MigrationCapability dirty-limit affects the guest
+#   during live migration. (since 8.1)
+#
+# @dirty-limit-ring-full-time: Estimated average dirty ring full time (in microseconds)
+#  each dirty ring full round, note that the value equals
+#  dirty ring memory size divided by average dirty page rate
+#  of virtual CPU, which can be used to observe the average
+#  memory load of virtual CPU indirectly. Note that zero
+#  means guest doesn't dirty memory (since 8.1)
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationInfo',
@@ -267,7 +279,9 @@
'*postcopy-blocktime' : 'uint32',
'*postcopy-vcpu-blocktime': ['uint32'],
'*compression': 'CompressionStats',
-   '*socket-address': ['SocketAddress'] } }
+   '*socket-address': ['SocketAddress'],
+   '*dirty-limit-throttle-time-per-round': 'uint64',
+   '*dirty-limit-ring-full-time': 'uint64'} }
 
 ##
 # @query-migrate:
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index 8d2c1f3a6b..d11edb 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -34,4 +34,6 @@ void dirtylimit_set_vcpu(int cpu_index,
 void dirtylimit_set_all(uint64_t quota,
 bool enable);
 void dirtylimit_vcpu_execute(CPUState *cpu);
+uint64_t dirtylimit_throttle_time_per_round(void);
+uint64_t dirtylimit_ring_full_time(void);
 #endif
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 35e8020bbf..c115ef2d23 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -190,6 +190,16 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->cpu_throttle_percentage);
 }
 
+if (info->has_dirty_limit_throttle_time_per_round) {
+monitor_printf(mon, "dirty-limit throttle time: %" PRIu64 " us\n",
+   info->dirty_limit_throttle_time_per_round);
+}
+
+if (info->has_dirty_limit_ring_full_time) {
+monitor_printf(mon, "dirty-limit ring full time: %" PRIu64 " us\n",
+   info->dirty_limit_ring_full_time);
+}
+
 if (info->has_postcopy_blocktime) {
 monitor_printf(mon, "postcopy blocktime: %u\n",
info->postcopy_blocktime);
diff --git a/migration/migration.c b/migration/migration.c
index c101784dfa..719f91573f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -64,6 +64,7 @@
 #include "yank_functions.h"
 #include "sysemu/qtest.h"
 #include "options.h"
+#include "sysemu/dirtylimit.h"
 
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -968,6 +969,15 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->dirty_pages_rate =
stat64_get(_stats.dirty_pages_rate);
 }
+
+if (migrate_dirty_limit() && dirtylimit_in_service()) {
+info->has_dirty_limit_throttle_time_per_round = true;
+info->dirty_limit_throttle_time_per_round =
+dirtylimit_throttle_time_per_round();
+
+info->has_dirty_limit_ring_full_time = true;
+info->dirty_limit_ring_full_time = dirtylimit_ring_full_time();
+}
 }
 
 static void populate_disk_info(MigrationInfo *info)
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 6e218bb249..af27f0d022 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -565,6 +565,45 @@ out:
 hmp_handle_error(mon, err);
 }
 
+/* Return the max throttle time of each virtual CPU */
+uint64_t dirtylimit_throttle_time_per_round(void)
+{
+

[PULL 02/30] migration/multifd: Protect accesses to migration_threads

2023-06-21 Thread Juan Quintela
From: Fabiano Rosas 

This doubly linked list is common for all the multifd and migration
threads so we need to avoid concurrent access.

Add a mutex to protect the data from concurrent access. This fixes a
crash when removing two MigrationThread objects from the list at the
same time during cleanup of multifd threads.
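
The idea of the fix, serializing every access to a shared list behind one mutex, can be sketched outside QEMU with POSIX threads. This is only an illustration under stated assumptions: it uses a singly linked list and pthread_mutex_t instead of QLIST and QemuMutex, and all names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct node {
    int id;
    struct node *next;
};

static struct node *head;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert a new node at the head of the list while holding the lock. */
static struct node *list_add(int id)
{
    struct node *n = calloc(1, sizeof(*n));

    if (!n) {
        return NULL;
    }
    n->id = id;

    pthread_mutex_lock(&list_lock);
    n->next = head;
    head = n;
    pthread_mutex_unlock(&list_lock);
    return n;
}

/* Unlink and free a node; the unlink step runs under the lock. */
static void list_remove(struct node *n)
{
    if (!n) {
        return;
    }

    pthread_mutex_lock(&list_lock);
    for (struct node **pp = &head; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
    free(n);
}

/* Count nodes; readers take the same lock as writers. */
static int list_count(void)
{
    int count = 0;

    pthread_mutex_lock(&list_lock);
    for (struct node *n = head; n; n = n->next) {
        count++;
    }
    pthread_mutex_unlock(&list_lock);
    return count;
}
```

Without the lock, two threads removing neighbouring nodes at the same time could each rewrite the other's links, which is the kind of crash the commit message describes.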

Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Message-Id: <20230607161306.31425-3-faro...@suse.de>
Signed-off-by: Juan Quintela 
---
 migration/threadinfo.h |  2 --
 migration/threadinfo.c | 15 ++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 8aa6999d58..2f356ff312 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -10,8 +10,6 @@
  *  See the COPYING file in the top-level directory.
  */
 
-#include "qemu/queue.h"
-#include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
 
diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index 3dd9b14ae6..262990dd75 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -10,23 +10,35 @@
  *  See the COPYING file in the top-level directory.
  */
 
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qemu/lockable.h"
 #include "threadinfo.h"
 
+QemuMutex migration_threads_lock;
 static QLIST_HEAD(, MigrationThread) migration_threads;
 
+static void __attribute__((constructor)) migration_threads_init(void)
+{
+qemu_mutex_init(&migration_threads_lock);
+}
+
 MigrationThread *migration_threads_add(const char *name, int thread_id)
 {
 MigrationThread *thread =  g_new0(MigrationThread, 1);
 thread->name = name;
 thread->thread_id = thread_id;
 
-QLIST_INSERT_HEAD(&migration_threads, thread, node);
+WITH_QEMU_LOCK_GUARD(&migration_threads_lock) {
+QLIST_INSERT_HEAD(&migration_threads, thread, node);
+}
 
 return thread;
 }
 
 void migration_threads_remove(MigrationThread *thread)
 {
+QEMU_LOCK_GUARD(&migration_threads_lock);
 if (thread) {
 QLIST_REMOVE(thread, node);
 g_free(thread);
@@ -39,6 +51,7 @@ MigrationThreadInfoList *qmp_query_migrationthreads(Error **errp)
 MigrationThreadInfoList **tail = 
 MigrationThread *thread = NULL;
 
+QEMU_LOCK_GUARD(&migration_threads_lock);
 QLIST_FOREACH(thread, &migration_threads, node) {
 MigrationThreadInfo *info = g_new0(MigrationThreadInfo, 1);
 info->name = g_strdup(thread->name);
-- 
2.40.1




[PULL 08/30] migration: Put the detection logic before auto-converge checking

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

This commit prepares for the implementation of the dirty-limit
convergence algorithm.

The detection logic for the throttling condition applies to both
the auto-converge and dirty-limit algorithms, so move it before
the check for the auto-converge feature.
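
The restructuring can be sketched as a standalone function: the "dirty rate too high twice" detection runs first, and only then is the convergence mechanism chosen. The names below are hypothetical, and the dirty-limit branch reflects what a later patch in this series adds rather than this commit itself.

```c
#include <stdbool.h>
#include <stdint.h>

enum throttle_action { ACT_NONE, ACT_AUTO_CONVERGE, ACT_DIRTY_LIMIT };

struct detect_state {
    int high_cnt;   /* consecutive-ish periods with a high dirty rate */
};

/*
 * Shared detection: fire when the dirtied bytes exceed the threshold for
 * the second time, reset the counter, then pick the enabled mechanism.
 */
static enum throttle_action check_throttle(struct detect_state *s,
                                           uint64_t dirty_bytes,
                                           uint64_t threshold,
                                           bool auto_converge,
                                           bool dirty_limit)
{
    if (dirty_bytes > threshold && ++s->high_cnt >= 2) {
        s->high_cnt = 0;
        if (auto_converge) {
            return ACT_AUTO_CONVERGE;
        }
        if (dirty_limit) {
            return ACT_DIRTY_LIMIT;
        }
    }
    return ACT_NONE;
}
```

Note that, as in the patch, the counter is reset whenever the condition fires, regardless of which mechanism (if any) is enabled.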

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Juan Quintela 
Message-ID: <168733225273.5845.1587182678887974167...@git.sr.ht>
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 78746849b5..b6559f9312 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -999,17 +999,18 @@ static void migration_trigger_throttle(RAMState *rs)
 return;
 }
 
-if (migrate_auto_converge()) {
-/* The following detection logic can be refined later. For now:
-   Check to see if the ratio between dirtied bytes and the approx.
-   amount of bytes that just got transferred since the last time
-   we were in this routine reaches the threshold. If that happens
-   twice, start or increase throttling. */
-
-if ((bytes_dirty_period > bytes_dirty_threshold) &&
-(++rs->dirty_rate_high_cnt >= 2)) {
+/*
+ * The following detection logic can be refined later. For now:
+ * Check to see if the ratio between dirtied bytes and the approx.
+ * amount of bytes that just got transferred since the last time
+ * we were in this routine reaches the threshold. If that happens
+ * twice, start or increase throttling.
+ */
+if ((bytes_dirty_period > bytes_dirty_threshold) &&
+(++rs->dirty_rate_high_cnt >= 2)) {
+rs->dirty_rate_high_cnt = 0;
+if (migrate_auto_converge()) {
 trace_migration_throttle();
-rs->dirty_rate_high_cnt = 0;
 mig_throttle_guest_down(bytes_dirty_period,
 bytes_dirty_threshold);
 }
-- 
2.40.1




[PULL 04/30] qapi/migration: Introduce x-vcpu-dirty-limit-period parameter

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

Introduce the experimental migration parameter
"x-vcpu-dirty-limit-period", which ranges from 1 to 1000 ms and
makes the dirty rate calculation period configurable.

Currently, the total live migration time changes as
"x-vcpu-dirty-limit-period" varies; test results show that the
optimal value lies between 500 ms and 1000 ms.
"x-vcpu-dirty-limit-period" should be made stable once it is
shown that the best value cannot be determined by the
developers' experiments alone.
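
A range check for such a parameter might look like the sketch below. The macro and function names are hypothetical and chosen for this example; only the 1 to 1000 ms range and the 1000 ms default come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds and default taken from the parameter documentation (milliseconds). */
#define DIRTY_LIMIT_PERIOD_MIN_MS     1
#define DIRTY_LIMIT_PERIOD_MAX_MS     1000
#define DIRTY_LIMIT_PERIOD_DEFAULT_MS 1000

/* Hypothetical validator: accept only periods within the documented range. */
static bool dirty_limit_period_is_valid(uint64_t period_ms)
{
    return period_ms >= DIRTY_LIMIT_PERIOD_MIN_MS &&
           period_ms <= DIRTY_LIMIT_PERIOD_MAX_MS;
}
```

A caller would reject a migrate-set-parameters request whose period falls outside this range before storing the value.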

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Markus Armbruster 
Reviewed-by: Juan Quintela 
Message-Id: <168618975839.6361.1740763387474768865...@git.sr.ht>
Signed-off-by: Juan Quintela 
---
 qapi/migration.json            | 34 +++---
 migration/migration-hmp-cmds.c |  8
 migration/options.c            | 28
 3 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 5bb5ab82a0..67c26d9dea 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -779,9 +779,14 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
+# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit during
+# live migration. Should be in the range 1 to 1000ms,
+# defaults to 1000ms. (Since 8.1)
+#
 # Features:
 #
-# @unstable: Member @x-checkpoint-delay is experimental.
+# @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
+#are experimental.
 #
 # Since: 2.4
 ##
@@ -799,8 +804,9 @@
'multifd-channels',
'xbzrle-cache-size', 'max-postcopy-bandwidth',
'max-cpu-throttle', 'multifd-compression',
-   'multifd-zlib-level' ,'multifd-zstd-level',
-   'block-bitmap-mapping' ] }
+   'multifd-zlib-level', 'multifd-zstd-level',
+   'block-bitmap-mapping',
+   { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] } ] }
 
 ##
 # @MigrateSetParameters:
@@ -935,9 +941,14 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
+# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit during
+# live migration. Should be in the range 1 to 1000ms,
+# defaults to 1000ms. (Since 8.1)
+#
 # Features:
 #
-# @unstable: Member @x-checkpoint-delay is experimental.
+# @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
+#are experimental.
 #
 # TODO: either fuse back into MigrationParameters, or make
 # MigrationParameters members mandatory
@@ -972,7 +983,9 @@
 '*multifd-compression': 'MultiFDCompression',
 '*multifd-zlib-level': 'uint8',
 '*multifd-zstd-level': 'uint8',
-'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+'*x-vcpu-dirty-limit-period': { 'type': 'uint64',
+'features': [ 'unstable' ] } } }
 
 ##
 # @migrate-set-parameters:
@@ -1127,9 +1140,14 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
+# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit during
+# live migration. Should be in the range 1 to 1000ms,
+# defaults to 1000ms. (Since 8.1)
+#
 # Features:
 #
-# @unstable: Member @x-checkpoint-delay is experimental.
+# @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
+#are experimental.
 #
 # Since: 2.4
 ##
@@ -1161,7 +1179,9 @@
 '*multifd-compression': 'MultiFDCompression',
 '*multifd-zlib-level': 'uint8',
 '*multifd-zstd-level': 'uint8',
-'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+'*x-vcpu-dirty-limit-period': { 'type': 'uint64',
+'features': [ 'unstable' ] } } }
 
 ##
 # @query-migrate-parameters:
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 9885d7c9f7..352e9ec716 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -364,6 +364,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
 }
 }
 }
+
+monitor_printf(mon, "%s: %" PRIu64 " ms\n",
+MigrationParameter_str(MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD),
+params->x_vcpu_dirty_limit_period);
 }
 
 qapi_free_MigrationParameters(params);
@@ -620,6 +624,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const 

[PULL 01/30] migration/multifd: Rename threadinfo.c functions

2023-06-21 Thread Juan Quintela
From: Fabiano Rosas 

We're about to add more functions to this file so make it use the same
coding style as the rest of the code.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Message-Id: <20230607161306.31425-2-faro...@suse.de>
Signed-off-by: Juan Quintela 
---
 migration/threadinfo.h | 5 ++---
 migration/migration.c  | 4 ++--
 migration/multifd.c    | 4 ++--
 migration/threadinfo.c | 4 ++--
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 4d69423c0a..8aa6999d58 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -23,6 +23,5 @@ struct MigrationThread {
 QLIST_ENTRY(MigrationThread) node;
 };
 
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id);
-
-void MigrationThreadDel(MigrationThread *info);
+MigrationThread *migration_threads_add(const char *name, int thread_id);
+void migration_threads_remove(MigrationThread *info);
diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f6ea..3a001dd042 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2922,7 +2922,7 @@ static void *migration_thread(void *opaque)
 MigThrError thr_error;
 bool urgent = false;
 
-thread = MigrationThreadAdd("live_migration", qemu_get_thread_id());
+thread = migration_threads_add("live_migration", qemu_get_thread_id());
 
 rcu_register_thread();
 
@@ -3000,7 +3000,7 @@ static void *migration_thread(void *opaque)
 migration_iteration_finish(s);
 object_unref(OBJECT(s));
 rcu_unregister_thread();
-MigrationThreadDel(thread);
+migration_threads_remove(thread);
 return NULL;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 3387d8277f..4c6cee6547 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -651,7 +651,7 @@ static void *multifd_send_thread(void *opaque)
 int ret = 0;
 bool use_zero_copy_send = migrate_zero_copy_send();
 
-thread = MigrationThreadAdd(p->name, qemu_get_thread_id());
+thread = migration_threads_add(p->name, qemu_get_thread_id());
 
 trace_multifd_send_thread_start(p->id);
 rcu_register_thread();
@@ -767,7 +767,7 @@ out:
 qemu_mutex_unlock(&p->mutex);
 
 rcu_unregister_thread();
-MigrationThreadDel(thread);
+migration_threads_remove(thread);
 trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
 
 return NULL;
diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index 1de8b31855..3dd9b14ae6 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -14,7 +14,7 @@
 
 static QLIST_HEAD(, MigrationThread) migration_threads;
 
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
+MigrationThread *migration_threads_add(const char *name, int thread_id)
 {
 MigrationThread *thread =  g_new0(MigrationThread, 1);
 thread->name = name;
@@ -25,7 +25,7 @@ MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
 return thread;
 }
 
-void MigrationThreadDel(MigrationThread *thread)
+void migration_threads_remove(MigrationThread *thread)
 {
 if (thread) {
 QLIST_REMOVE(thread, node);
-- 
2.40.1




[PULL 00/30] Next patches

2023-06-21 Thread Juan Quintela
The following changes since commit 67fe6ae41da64368bc4936b196fee2bf61f8c720:

  Merge tag 'pull-tricore-20230621-1' of https://github.com/bkoppelmann/qemu into staging (2023-06-21 20:08:48 +0200)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request

for you to fetch changes up to c53dc569d0a0fb76eaa83f353253a897914948f9:

  migration/rdma: Split qemu_fopen_rdma() into input/output functions (2023-06-22 02:45:30 +0200)


Migration Pull request (20230621)

In this pull request:

- fix for multifd thread creation (fabiano)
- dirtylimit (hyman)
  * migration-test will go on next PULL request, as it has failures.
- Improve error description (tejus)
- improve -incoming and set parameters before calling incoming (wei)
- migration atomic counters reviewed patches (quintela)
- migration-test refactoring reviewed (quintela)

Please apply.



Fabiano Rosas (2):
  migration/multifd: Rename threadinfo.c functions
  migration/multifd: Protect accesses to migration_threads

Hyman Huang(黄勇) (8):
  softmmu/dirtylimit: Add parameter check for hmp "set_vcpu_dirty_limit"
  qapi/migration: Introduce x-vcpu-dirty-limit-period parameter
  qapi/migration: Introduce vcpu-dirty-limit parameters
  migration: Introduce dirty-limit capability
  migration: Refactor auto-converge capability logic
  migration: Put the detection logic before auto-converge checking
  migration: Implement dirty-limit convergence algo
  migration: Extend query-migrate to provide dirty page limit info

Juan Quintela (16):
  migration-test: Be consistent for ppc
  migration-test: Make machine_opts regular with other options
  migration-test: Create arch_opts
  migration-test: machine_opts is really arch specific
  migration-test: Create kvm_opts
  migration-test: bootpath is the same for all tests and for all archs
  migration-test: Add bootfile_create/delete() functions
  migration-test: dirtylimit checks for x86_64 arch before
  migration-test: simplify shmem_opts handling
  qemu-file: Rename qemu_file_transferred_ fast -> noflush
  migration: Change qemu_file_transferred to noflush
  migration: Use qemu_file_transferred_noflush() for block migration.
  qemu_file: Make qemu_file_is_writable() static
  qemu-file: Simplify qemu_file_shutdown()
  qemu-file: Make qemu_file_get_error_obj() static
  migration/rdma: Split qemu_fopen_rdma() into input/output functions

Tejus GK (2):
  migration: Update error description whenever migration fails
  migration: Refactor repeated call of yank_unregister_instance

Wei Wang (2):
  migration: enforce multifd and postcopy preempt to be set before
incoming
  qtest/migration-tests.c: use "-incoming defer" for postcopy tests

 qapi/migration.json            |  74 +---
 include/sysemu/dirtylimit.h    |   2 +
 migration/options.h            |   1 +
 migration/qemu-file.h          |  14 ++--
 migration/threadinfo.h         |   7 +-
 migration/block.c              |   4 +-
 migration/migration-hmp-cmds.c |  26 +++
 migration/migration.c          |  40 +++
 migration/multifd.c            |   4 +-
 migration/options.c            |  87 +++
 migration/qemu-file.c          |  24 ++-
 migration/ram.c                |  59 +---
 migration/rdma.c               |  39 +--
 migration/savevm.c             |   6 +-
 migration/threadinfo.c         |  19 -
 migration/vmstate.c            |   4 +-
 softmmu/dirtylimit.c           |  97 +++---
 tests/qtest/migration-test.c   | 123 ++---
 migration/trace-events         |   1 +
 19 files changed, 472 insertions(+), 159 deletions(-)


base-commit: 5f9dd6a8ce3961db4ce47411ed2097ad88bdf5fc
prerequisite-patch-id: 99c8bffa9428838925e330eb2881bab476122579
prerequisite-patch-id: 77ba427fd916aeb395e95aa0e7190f84e98e96ab
prerequisite-patch-id: 9983d46fa438d7075a37be883529e37ae41e4228
prerequisite-patch-id: 207f7529924b12dcb57f6557d6db6f79ceb2d682
prerequisite-patch-id: 5ad1799a13845dbf893a28a202b51a6b50d95d90
prerequisite-patch-id: c51959aacd6d65ee84fcd4f1b2aed3dd6f6af879
prerequisite-patch-id: da9dbb6799b2da002c0896574334920097e4c50a
prerequisite-patch-id: c1110ffafbaf5465fb277a20db809372291f7846
prerequisite-patch-id: 8307c92bedd07446214b35b40206eb6793a7384d
prerequisite-patch-id: 0a6106cd4a508d5e700a7ff6c25edfdd03c8ca3d
prerequisite-patch-id: 83205051de22382e75bf4acdf69e59315801fa0d
prerequisite-patch-id: 8c9b3cba89d555c071a410041e6da41806106a7e
prerequisite-patch-id: 0ff62a33b9a242226ccc1f5424a516de803c9fe5
prerequisite-patch-id: 25b8ae1ebe09ace14457c454cfcb23077c37346c
prerequisite-patch-id: 466ea91d5be41fe345dacd4d17bbbe5ce13118c2
prerequisite-patch-id: d1045858f9729ac62eccf2e83ebf95cfebae2cb5
prerequisite-patch-id: 0276ec02073bda5426de39e2f2e81eef080b4f5

[PULL 03/30] softmmu/dirtylimit: Add parameter check for hmp "set_vcpu_dirty_limit"

2023-06-21 Thread Juan Quintela
From: Hyman Huang(黄勇) 

The dirty_rate parameter of the HMP command "set_vcpu_dirty_limit" is
invalid if it is less than 0, so add a parameter check for it.

Note that this patch also deletes the unsolicited help message and
cleans up the code.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Message-Id: <168618975839.6361.1740763387474768865...@git.sr.ht>
Signed-off-by: Juan Quintela 
---
 softmmu/dirtylimit.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 015a9038d1..5c12d26d49 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -515,14 +515,15 @@ void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
 int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1);
 Error *err = NULL;
 
+if (dirty_rate < 0) {
+error_setg(&err, "invalid dirty page limit %ld", dirty_rate);
+goto out;
+}
+
 qmp_set_vcpu_dirty_limit(!!(cpu_index != -1), cpu_index, dirty_rate, &err);
-if (err) {
-hmp_handle_error(mon, err);
-return;
-}
 
-monitor_printf(mon, "[Please use 'info vcpu_dirty_limit' to query "
-   "dirty limit for virtual CPU]\n");
+out:
+hmp_handle_error(mon, err);
 }
 
 static struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index)
-- 
2.40.1
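
The control flow this patch introduces (reject a negative rate first, then
route both the error and the success case through one exit label) can be
sketched in plain C as below. This is only an illustration under stated
assumptions: SketchError, sketch_apply_limit() and sketch_set_dirty_limit()
are hypothetical stand-ins, not QEMU's Error API or the real HMP handler, and
the sketch formats the value with PRId64 rather than %ld.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an error object: a flag plus a message. */
typedef struct {
    int set;
    char msg[64];
} SketchError;

/* Records the last rate that passed validation (-1 means "none yet"). */
static int64_t sketch_last_applied = -1;

/* Hypothetical stand-in for the call that actually applies the limit. */
static void sketch_apply_limit(int64_t dirty_rate)
{
    sketch_last_applied = dirty_rate;
}

/* Returns 0 on success, -1 if an error was recorded in *err. */
static int sketch_set_dirty_limit(int64_t dirty_rate, SketchError *err)
{
    assert(err != NULL);

    /* Validate before doing any work. */
    if (dirty_rate < 0) {
        snprintf(err->msg, sizeof(err->msg),
                 "invalid dirty page limit %" PRId64, dirty_rate);
        err->set = 1;
        goto out;
    }

    sketch_apply_limit(dirty_rate);

out:
    /* Single exit: report whatever error state was accumulated. */
    return err->set ? -1 : 0;
}
```

The caller is expected to pass a zero-initialized SketchError, much as the
patch starts from "Error *err = NULL".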




Re: [PATCH v5 0/3] Implement the watchdog timer of HiFive 1 rev b.

2023-06-21 Thread Alistair Francis
On Fri, Jun 9, 2023 at 2:46 AM Tommy Wu  wrote:
>
> The HiFive 1 rev b includes a watchdog module based on a 32-bit
> counter. The watchdog timer is in the always-on domain device of
> HiFive 1 rev b, so this patch added the AON device to the sifive_e
> machine. This patch only implemented the functionality of the
> watchdog timer, not all the functionality of the AON device.
>
> You can test the patchset by the QTest 
> tests/qtest/sifive-e-aon-watchdog-test.c
>
> Changes since v1 ( Thank Alistair for the feedback ):
> - Use the register field macro.
> - Delete the public create function. The board creates the aon device itself.
> - Keep all variable declarations at the top of the code block.
>
> Changes since v2 ( Thank Alistair for the feedback ):
> - Delete the declaration and definition of the create function.
>
> Changes since v3 ( Thank Alistair and Thomas for the feedback ):
> - Use `device_class_set_props()` for the properties in sifive_e_aon device.
> - Add SPDX identifier in QTEST.
> - Use libqtest.h in QTEST.
> - Keep statements on one line as long as they still fit into 80 columns.
>
> Changes since v4 ( Thank Phil for the feedback ):
> - Improve code style consistency.
> - Move the timer create function to the sifive_e_aon_init.
> - Allocate the sifive_e_aon device state in the SoC.
>
> Tommy Wu (3):
>   hw/misc: sifive_e_aon: Support the watchdog timer of HiFive 1 rev b.
>   hw/riscv: sifive_e: Support the watchdog timer of HiFive 1 rev b.
>   tests/qtest: sifive-e-aon-watchdog-test.c: Add QTest of watchdog of
> sifive_e

Do you mind rebasing this on
https://github.com/alistair23/qemu/tree/riscv-to-apply.next ? Then I
will apply it

Alistair

>
>  hw/misc/Kconfig  |   3 +
>  hw/misc/meson.build  |   1 +
>  hw/misc/sifive_e_aon.c   | 319 
>  hw/riscv/Kconfig |   1 +
>  hw/riscv/sifive_e.c  |  17 +-
>  include/hw/misc/sifive_e_aon.h   |  60 +++
>  include/hw/riscv/sifive_e.h  |   9 +-
>  tests/qtest/meson.build  |   3 +
>  tests/qtest/sifive-e-aon-watchdog-test.c | 450 +++
>  9 files changed, 858 insertions(+), 5 deletions(-)
>  create mode 100644 hw/misc/sifive_e_aon.c
>  create mode 100644 include/hw/misc/sifive_e_aon.h
>  create mode 100644 tests/qtest/sifive-e-aon-watchdog-test.c
>
> --
> 2.27.0
>
>



Re: [PATCH v5 3/3] tests/qtest: sifive-e-aon-watchdog-test.c: Add QTest of watchdog of sifive_e

2023-06-21 Thread Alistair Francis
On Fri, Jun 9, 2023 at 2:46 AM Tommy Wu  wrote:
>
> Add some simple tests of the watchdog timer in the always-on domain device
> of HiFive 1 rev b.
>
> Signed-off-by: Tommy Wu 
> Reviewed-by: Frank Chang 
> Acked-by: Thomas Huth 

Acked-by: Alistair Francis 

Alistair

> ---
>  tests/qtest/meson.build  |   3 +
>  tests/qtest/sifive-e-aon-watchdog-test.c | 450 +++
>  2 files changed, 453 insertions(+)
>  create mode 100644 tests/qtest/sifive-e-aon-watchdog-test.c
>
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 5fa6833ad7..eb8d153a65 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -234,6 +234,9 @@ qtests_s390x = \
> 'cpu-plug-test',
> 'migration-test']
>
> +qtests_riscv32 = \
> +  (config_all_devices.has_key('CONFIG_SIFIVE_E_AON') ? 
> ['sifive-e-aon-watchdog-test'] : [])
> +
>  qos_test_ss = ss.source_set()
>  qos_test_ss.add(
>'ac97-test.c',
> diff --git a/tests/qtest/sifive-e-aon-watchdog-test.c 
> b/tests/qtest/sifive-e-aon-watchdog-test.c
> new file mode 100644
> index 00..1f313d16ad
> --- /dev/null
> +++ b/tests/qtest/sifive-e-aon-watchdog-test.c
> @@ -0,0 +1,450 @@
> +/*
> + * QTest testcase for the watchdog timer of HiFive 1 rev b.
> + *
> + * Copyright (c) 2023 SiFive, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/timer.h"
> +#include "qemu/bitops.h"
> +#include "libqtest.h"
> +#include "hw/registerfields.h"
> +#include "hw/misc/sifive_e_aon.h"
> +
> +FIELD(AON_WDT_WDOGCFG, SCALE, 0, 4)
> +FIELD(AON_WDT_WDOGCFG, RSVD0, 4, 4)
> +FIELD(AON_WDT_WDOGCFG, RSTEN, 8, 1)
> +FIELD(AON_WDT_WDOGCFG, ZEROCMP, 9, 1)
> +FIELD(AON_WDT_WDOGCFG, RSVD1, 10, 2)
> +FIELD(AON_WDT_WDOGCFG, EN_ALWAYS, 12, 1)
> +FIELD(AON_WDT_WDOGCFG, EN_CORE_AWAKE, 13, 1)
> +FIELD(AON_WDT_WDOGCFG, RSVD2, 14, 14)
> +FIELD(AON_WDT_WDOGCFG, IP0, 28, 1)
> +FIELD(AON_WDT_WDOGCFG, RSVD3, 29, 3)
> +
> +#define WDOG_BASE (0x1000)
> +#define WDOGCFG (0x0)
> +#define WDOGCOUNT (0x8)
> +#define WDOGS (0x10)
> +#define WDOGFEED (0x18)
> +#define WDOGKEY (0x1c)
> +#define WDOGCMP0 (0x20)
> +
> +#define SIFIVE_E_AON_WDOGKEY (0x51F15E)
> +#define SIFIVE_E_AON_WDOGFEED (0xD09F00D)
> +#define SIFIVE_E_LFCLK_DEFAULT_FREQ (32768)
> +
> +static void test_init(QTestState *qts)
> +{
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGCOUNT, 0);
> +
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGCFG, 0);
> +
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGCMP0, 0xBEEF);
> +}
> +
> +static void test_wdogcount(void)
> +{
> +uint64_t tmp;
> +QTestState *qts = qtest_init("-machine sifive_e");
> +
> +test_init(qts);
> +
> +tmp = qtest_readl(qts, WDOG_BASE + WDOGCOUNT);
> +qtest_writel(qts, WDOG_BASE + WDOGCOUNT, 0xBEEF);
> +g_assert(qtest_readl(qts, WDOG_BASE + WDOGCOUNT) == tmp);
> +
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGCOUNT, 0xBEEF);
> +g_assert(0xBEEF == qtest_readl(qts, WDOG_BASE + WDOGCOUNT));
> +
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGCOUNT, 0x);
> +g_assert(0x2AAA == qtest_readl(qts, WDOG_BASE + WDOGCOUNT));
> +
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGFEED, 0x);
> +g_assert(0x2AAA == qtest_readl(qts, WDOG_BASE + WDOGCOUNT));
> +
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGFEED, SIFIVE_E_AON_WDOGFEED);
> +g_assert(0 == qtest_readl(qts, WDOG_BASE + WDOGCOUNT));
> +
> +qtest_quit(qts);
> +}
> +
> +static void test_wdogcfg(void)
> +{
> +uint32_t tmp_cfg;
> +QTestState *qts = qtest_init("-machine sifive_e");
> +
> +test_init(qts);
> +
> +tmp_cfg = qtest_readl(qts, WDOG_BASE + WDOGCFG);
> +qtest_writel(qts, WDOG_BASE + WDOGCFG, 0x);
> +g_assert(qtest_readl(qts, WDOG_BASE + WDOGCFG) == tmp_cfg);
> +
> +qtest_writel(qts, WDOG_BASE + WDOGKEY, SIFIVE_E_AON_WDOGKEY);
> +qtest_writel(qts, WDOG_BASE + WDOGCFG, 0x);
> +g_assert(0x 
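
The quoted test exercises a key-protected register interface: a write to
WDOGCOUNT is ignored unless WDOGKEY was written first, and WDOGFEED clears the
counter only when it receives the feed value. A minimal model of that
behaviour might look like the sketch below. It assumes that every register
write re-locks the device (the test rewrites the key before each access,
which is consistent with this but does not prove it) and that the counter is
31 bits wide, as the VALUE field in patch 1/3 suggests. The SketchWdt type
and the sketch_* helpers are hypothetical, not the device model from the
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WDOGKEY   0x51F15Eu    /* unlock value used by the test */
#define SKETCH_WDOGFEED  0xD09F00Du   /* feed value used by the test */
#define SKETCH_COUNT_MASK 0x7FFFFFFFu /* assumed 31-bit counter */

/* Hypothetical, heavily reduced watchdog state. */
typedef struct {
    bool unlocked;
    uint32_t wdogcount;
} SketchWdt;

/* Writing the key unlocks exactly one following register write. */
static void sketch_write_key(SketchWdt *w, uint32_t value)
{
    assert(w != NULL);
    w->unlocked = (value == SKETCH_WDOGKEY);
}

/* A count write is dropped while locked; the lock re-engages afterwards. */
static void sketch_write_count(SketchWdt *w, uint32_t value)
{
    assert(w != NULL);
    if (w->unlocked) {
        w->wdogcount = value & SKETCH_COUNT_MASK;
    }
    w->unlocked = false;
}

/* Feeding clears the counter only when unlocked and given the feed value. */
static void sketch_write_feed(SketchWdt *w, uint32_t value)
{
    assert(w != NULL);
    if (w->unlocked && value == SKETCH_WDOGFEED) {
        w->wdogcount = 0;
    }
    w->unlocked = false;
}
```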

Re: [PATCH v5 2/3] hw/riscv: sifive_e: Support the watchdog timer of HiFive 1 rev b.

2023-06-21 Thread Alistair Francis
On Fri, Jun 9, 2023 at 2:46 AM Tommy Wu  wrote:
>
> Create the AON device when we realize the sifive_e machine.
> This patch only implemented the functionality of the watchdog timer,
> not all the functionality of the AON device.
>
> Signed-off-by: Tommy Wu 
> Reviewed-by: Frank Chang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/Kconfig|  1 +
>  hw/riscv/sifive_e.c | 17 +++--
>  include/hw/riscv/sifive_e.h |  9 ++---
>  3 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
> index 6528ebfa3a..b6a5eb4452 100644
> --- a/hw/riscv/Kconfig
> +++ b/hw/riscv/Kconfig
> @@ -60,6 +60,7 @@ config SIFIVE_E
>  select SIFIVE_PLIC
>  select SIFIVE_UART
>  select SIFIVE_E_PRCI
> +select SIFIVE_E_AON
>  select UNIMP
>
>  config SIFIVE_U
> diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
> index 04939b60c3..0d37adc542 100644
> --- a/hw/riscv/sifive_e.c
> +++ b/hw/riscv/sifive_e.c
> @@ -45,6 +45,7 @@
>  #include "hw/intc/riscv_aclint.h"
>  #include "hw/intc/sifive_plic.h"
>  #include "hw/misc/sifive_e_prci.h"
> +#include "hw/misc/sifive_e_aon.h"
>  #include "chardev/char.h"
>  #include "sysemu/sysemu.h"
>
> @@ -185,6 +186,8 @@ static void sifive_e_soc_init(Object *obj)
>  object_property_set_int(OBJECT(&s->cpus), "resetvec", 0x1004,
> &error_abort);
>  object_initialize_child(obj, "riscv.sifive.e.gpio0", &s->gpio,
>  TYPE_SIFIVE_GPIO);
> +object_initialize_child(obj, "riscv.sifive.e.aon", &s->aon,
> +TYPE_SIFIVE_E_AON);
>  }
>
>  static void sifive_e_soc_realize(DeviceState *dev, Error **errp)
> @@ -223,10 +226,17 @@ static void sifive_e_soc_realize(DeviceState *dev, 
> Error **errp)
>  RISCV_ACLINT_DEFAULT_MTIMER_SIZE, 0, ms->smp.cpus,
>  RISCV_ACLINT_DEFAULT_MTIMECMP, RISCV_ACLINT_DEFAULT_MTIME,
>  RISCV_ACLINT_DEFAULT_TIMEBASE_FREQ, false);
> -create_unimplemented_device("riscv.sifive.e.aon",
> -memmap[SIFIVE_E_DEV_AON].base, memmap[SIFIVE_E_DEV_AON].size);
>  sifive_e_prci_create(memmap[SIFIVE_E_DEV_PRCI].base);
>
> +/* AON */
> +
> +if (!sysbus_realize(SYS_BUS_DEVICE(&s->aon), errp)) {
> +return;
> +}
> +
> +/* Map AON registers */
> +sysbus_mmio_map(SYS_BUS_DEVICE(&s->aon), 0, memmap[SIFIVE_E_DEV_AON].base);
> +
>  /* GPIO */
>
>  if (!sysbus_realize(SYS_BUS_DEVICE(&s->gpio), errp)) {
> @@ -245,6 +255,9 @@ static void sifive_e_soc_realize(DeviceState *dev, Error 
> **errp)
> qdev_get_gpio_in(DEVICE(s->plic),
>  SIFIVE_E_GPIO0_IRQ0 + i));
>  }
> +sysbus_connect_irq(SYS_BUS_DEVICE(&s->aon), 0,
> +   qdev_get_gpio_in(DEVICE(s->plic),
> +SIFIVE_E_AON_WDT_IRQ));
>
>  sifive_uart_create(sys_mem, memmap[SIFIVE_E_DEV_UART0].base,
>  serial_hd(0), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_E_UART0_IRQ));
> diff --git a/include/hw/riscv/sifive_e.h b/include/hw/riscv/sifive_e.h
> index b824a79e2d..31180a680e 100644
> --- a/include/hw/riscv/sifive_e.h
> +++ b/include/hw/riscv/sifive_e.h
> @@ -22,6 +22,7 @@
>  #include "hw/riscv/riscv_hart.h"
>  #include "hw/riscv/sifive_cpu.h"
>  #include "hw/gpio/sifive_gpio.h"
> +#include "hw/misc/sifive_e_aon.h"
>  #include "hw/boards.h"
>
>  #define TYPE_RISCV_E_SOC "riscv.sifive.e.soc"
> @@ -35,6 +36,7 @@ typedef struct SiFiveESoCState {
>  /*< public >*/
>  RISCVHartArrayState cpus;
>  DeviceState *plic;
> +SiFiveEAONState aon;
>  SIFIVEGPIOState gpio;
>  MemoryRegion xip_mem;
>  MemoryRegion mask_rom;
> @@ -76,9 +78,10 @@ enum {
>  };
>
>  enum {
> -SIFIVE_E_UART0_IRQ  = 3,
> -SIFIVE_E_UART1_IRQ  = 4,
> -SIFIVE_E_GPIO0_IRQ0 = 8
> +SIFIVE_E_AON_WDT_IRQ  = 1,
> +SIFIVE_E_UART0_IRQ= 3,
> +SIFIVE_E_UART1_IRQ= 4,
> +SIFIVE_E_GPIO0_IRQ0   = 8
>  };
>
>  #define SIFIVE_E_PLIC_HART_CONFIG "M"
> --
> 2.27.0
>
>



Re: [PATCH v5 1/3] hw/misc: sifive_e_aon: Support the watchdog timer of HiFive 1 rev b.

2023-06-21 Thread Alistair Francis
On Fri, Jun 9, 2023 at 2:47 AM Tommy Wu  wrote:
>
> The watchdog timer is in the always-on domain device of HiFive 1 rev b,
> so this patch added the AON device to the sifive_e machine. This patch
> only implemented the functionality of the watchdog timer.
>
> Signed-off-by: Tommy Wu 
> Reviewed-by: Frank Chang 

Acked-by: Alistair Francis 

Alistair

> ---
>  hw/misc/Kconfig|   3 +
>  hw/misc/meson.build|   1 +
>  hw/misc/sifive_e_aon.c | 319 +
>  include/hw/misc/sifive_e_aon.h |  60 +++
>  4 files changed, 383 insertions(+)
>  create mode 100644 hw/misc/sifive_e_aon.c
>  create mode 100644 include/hw/misc/sifive_e_aon.h
>
> diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
> index e4c2149175..6996d265e4 100644
> --- a/hw/misc/Kconfig
> +++ b/hw/misc/Kconfig
> @@ -158,6 +158,9 @@ config SIFIVE_TEST
>  config SIFIVE_E_PRCI
>  bool
>
> +config SIFIVE_E_AON
> +bool
> +
>  config SIFIVE_U_OTP
>  bool
>
> diff --git a/hw/misc/meson.build b/hw/misc/meson.build
> index 78ca857c9d..6ac62e6751 100644
> --- a/hw/misc/meson.build
> +++ b/hw/misc/meson.build
> @@ -30,6 +30,7 @@ softmmu_ss.add(when: 'CONFIG_MCHP_PFSOC_IOSCB', if_true: 
> files('mchp_pfsoc_ioscb
>  softmmu_ss.add(when: 'CONFIG_MCHP_PFSOC_SYSREG', if_true: 
> files('mchp_pfsoc_sysreg.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_TEST', if_true: files('sifive_test.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_E_PRCI', if_true: 
> files('sifive_e_prci.c'))
> +softmmu_ss.add(when: 'CONFIG_SIFIVE_E_AON', if_true: files('sifive_e_aon.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_U_OTP', if_true: files('sifive_u_otp.c'))
>  softmmu_ss.add(when: 'CONFIG_SIFIVE_U_PRCI', if_true: 
> files('sifive_u_prci.c'))
>
> diff --git a/hw/misc/sifive_e_aon.c b/hw/misc/sifive_e_aon.c
> new file mode 100644
> index 00..4656457d0b
> --- /dev/null
> +++ b/hw/misc/sifive_e_aon.c
> @@ -0,0 +1,319 @@
> +/*
> + * SiFive HiFive1 AON (Always On Domain) for QEMU.
> + *
> + * Copyright (c) 2022 SiFive, Inc. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/timer.h"
> +#include "qemu/log.h"
> +#include "hw/irq.h"
> +#include "hw/registerfields.h"
> +#include "hw/misc/sifive_e_aon.h"
> +#include "qapi/visitor.h"
> +#include "qapi/error.h"
> +#include "sysemu/watchdog.h"
> +#include "hw/qdev-properties.h"
> +
> +REG32(AON_WDT_WDOGCFG, 0x0)
> +FIELD(AON_WDT_WDOGCFG, SCALE, 0, 4)
> +FIELD(AON_WDT_WDOGCFG, RSVD0, 4, 4)
> +FIELD(AON_WDT_WDOGCFG, RSTEN, 8, 1)
> +FIELD(AON_WDT_WDOGCFG, ZEROCMP, 9, 1)
> +FIELD(AON_WDT_WDOGCFG, RSVD1, 10, 2)
> +FIELD(AON_WDT_WDOGCFG, EN_ALWAYS, 12, 1)
> +FIELD(AON_WDT_WDOGCFG, EN_CORE_AWAKE, 13, 1)
> +FIELD(AON_WDT_WDOGCFG, RSVD2, 14, 14)
> +FIELD(AON_WDT_WDOGCFG, IP0, 28, 1)
> +FIELD(AON_WDT_WDOGCFG, RSVD3, 29, 3)
> +REG32(AON_WDT_WDOGCOUNT, 0x8)
> +FIELD(AON_WDT_WDOGCOUNT, VALUE, 0, 31)
> +REG32(AON_WDT_WDOGS, 0x10)
> +REG32(AON_WDT_WDOGFEED, 0x18)
> +REG32(AON_WDT_WDOGKEY, 0x1c)
> +REG32(AON_WDT_WDOGCMP0, 0x20)
> +
> +static void sifive_e_aon_wdt_update_wdogcount(SiFiveEAONState *r)
> +{
> +int64_t now;
> +if (FIELD_EX32(r->wdogcfg, AON_WDT_WDOGCFG, EN_ALWAYS) == 0 &&
> +FIELD_EX32(r->wdogcfg, AON_WDT_WDOGCFG, EN_CORE_AWAKE) == 0) {
> +return;
> +}
> +
> +now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +r->wdogcount += muldiv64(now - r->wdog_restart_time,
> + r->wdogclk_freq, NANOSECONDS_PER_SECOND);
> +
> +/* Clean the most significant bit. */
> +r->wdogcount &= R_AON_WDT_WDOGCOUNT_VALUE_MASK;
> +r->wdog_restart_time = now;
> +}
> +
> +static void sifive_e_aon_wdt_update_state(SiFiveEAONState *r)
> +{
> +uint16_t wdogs;
> +bool cmp_signal = false;
> +sifive_e_aon_wdt_update_wdogcount(r);
> +wdogs = (uint16_t)(r->wdogcount >>
> +   FIELD_EX32(r->wdogcfg, AON_WDT_WDOGCFG, SCALE));
> +
> +if (wdogs >= r->wdogcmp0) {
> +cmp_signal = true;
> +if (FIELD_EX32(r->wdogcfg, AON_WDT_WDOGCFG, ZEROCMP) == 1) {
> +r->wdogcount = 0;
> +wdogs = 0;
> +}
> +}
> +
> +if (cmp_signal) {
> +if (FIELD_EX32(r->wdogcfg, AON_WDT_WDOGCFG, RSTEN) == 1) {
> +watchdog_perform_action();
> +}
> +
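
The counter arithmetic in the quoted hunk has three parts: converting elapsed
nanoseconds to clock ticks, shifting the counter right by the SCALE field to
get the 16-bit compare value, and comparing that against wdogcmp0, optionally
zeroing the counter. The standalone sketch below walks through them. It
assumes a 4-bit SCALE field and uses an explicit split division in place of
QEMU's muldiv64(); all sketch_* names are hypothetical and the interrupt and
reset side effects are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NS_PER_SEC 1000000000ull

/*
 * Convert elapsed nanoseconds to ticks of a freq_hz clock, i.e.
 * floor(elapsed_ns * freq_hz / 1e9), split into whole seconds and a
 * remainder so the intermediate product cannot overflow 64 bits.
 */
static uint64_t sketch_ticks(uint64_t elapsed_ns, uint32_t freq_hz)
{
    uint64_t secs = elapsed_ns / SKETCH_NS_PER_SEC;
    uint64_t rem_ns = elapsed_ns % SKETCH_NS_PER_SEC;

    return secs * freq_hz + rem_ns * freq_hz / SKETCH_NS_PER_SEC;
}

/* Scaled counter: shift by SCALE and keep the low 16 bits. */
static uint16_t sketch_wdogs(uint32_t wdogcount, unsigned scale)
{
    return (uint16_t)(wdogcount >> (scale & 0xFu));
}

/*
 * Returns true when the scaled counter has reached the compare value.
 * With zerocmp set, a match also resets the counter to zero.
 */
static bool sketch_wdt_expired(uint32_t *wdogcount, unsigned scale,
                               uint16_t wdogcmp0, bool zerocmp)
{
    bool hit;

    assert(wdogcount != NULL);
    hit = sketch_wdogs(*wdogcount, scale) >= wdogcmp0;
    if (hit && zerocmp) {
        *wdogcount = 0;
    }
    return hit;
}
```

With the default 32768 Hz low-frequency clock used by the test, one second of
virtual time advances the counter by 32768 ticks.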

Re: [PATCH v2 0/3] target/riscv: Fix mstatus related problems

2023-06-21 Thread Alistair Francis
On Sat, Jun 3, 2023 at 11:43 PM Weiwei Li  wrote:
>
> This patchset tries to fix some problems in the fields of mstatus, such as
> making MPV only work when MPP != PRV_M.
>
> The port is available here:
> https://github.com/plctlab/plct-qemu/tree/plct-mpv-upstream-v2
>
> v2:
> * Drop patch 3 (remove check on mode M for MPRV)
> * rebase on apply-to-riscv.next
>
> Weiwei Li (3):
>   target/riscv: Make MPV only work when MPP != PRV_M
>   target/riscv: Support MSTATUS.MPV/GVA only when RVH is enabled
>   target/riscv: Remove redundant assignment to SXL

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  target/riscv/cpu_helper.c |  3 ++-
>  target/riscv/csr.c| 14 --
>  target/riscv/op_helper.c  |  3 ++-
>  3 files changed, 8 insertions(+), 12 deletions(-)
>
> --
> 2.25.1
>
>



Re: [PATCH] hw/intc: If mmsiaddrcfgh.L == 1, smsiaddrcfg and smsiaddrcfgh are read-only.

2023-06-21 Thread Alistair Francis
On Mon, Jun 19, 2023 at 7:24 PM Peter Maydell  wrote:
>
> On Mon, 12 Jun 2023 at 05:12, Alistair Francis  wrote:
> >
> > On Fri, Jun 9, 2023 at 4:01 PM Tommy Wu  wrote:
> > >
> > > According to the `The RISC-V Advanced Interrupt Architecture`
> > > document, if register `mmsiaddrcfgh` of the domain has bit L set
> > > to one, then `smsiaddrcfg` and `smsiaddrcfgh` are locked as
> > > read-only alongside `mmsiaddrcfg` and `mmsiaddrcfgh`.
> > >
> > > Signed-off-by: Tommy Wu 
> > > Reviewed-by: Frank Chang 
> >
> > Thanks!
> >
> > Applied to riscv-to-apply.next
>
> If it hasn't gone in already, would you mind tweaking the
> subject line so that it says which interrupt controller
> the change is for ? (ie "hw/intc/riscv_aplic", not just "hw/intc".)

Sorry Peter, it's already in. I'll try to keep a closer eye on the
commit titles in future

Alistair

>
> thanks
> -- PMM



Re: [PATCH v2 10/18] target/riscv/kvm.c: init 'misa_ext_mask' with scratch CPU

2023-06-21 Thread Alistair Francis
On Wed, Jun 14, 2023 at 7:00 AM Daniel Henrique Barboza
 wrote:
>
> At this moment we're retrieving env->misa_ext during
> kvm_arch_init_vcpu(), leaving env->misa_ext_mask behind.
>
> We want to set env->misa_ext_mask, and we want to set it as early as
> possible. The reason is that we're going to use it in the validation
> process of the KVM MISA properties we're going to add next. Setting it
> during kvm_arch_init_vcpu() is too late for user validation.
>
> Move the code to a new helper that is going to be called during init()
> time, via kvm_riscv_init_user_properties(), like we're already doing for
> the machine ID properties. Set both misa_ext and misa_ext_mask to the
> same value retrieved by the 'isa' config reg.
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Acked-by: Alistair Francis 

Alistair

> ---
>  target/riscv/kvm.c | 34 +++---
>  1 file changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index 602727cdfd..4d0808cb9a 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -396,6 +396,28 @@ static void kvm_riscv_init_machine_ids(RISCVCPU *cpu, 
> KVMScratchCPU *kvmcpu)
>  }
>  }
>
> +static void kvm_riscv_init_misa_ext_mask(RISCVCPU *cpu,
> + KVMScratchCPU *kvmcpu)
> +{
> +CPURISCVState *env = &cpu->env;
> +struct kvm_one_reg reg;
> +int ret;
> +
> +reg.id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
> +  KVM_REG_RISCV_CONFIG_REG(isa));
> +reg.addr = (uint64_t)&env->misa_ext_mask;
> +ret = ioctl(kvmcpu->cpufd, KVM_GET_ONE_REG, &reg);
> +
> +if (ret) {
> +error_report("Unable to fetch ISA register from KVM, "
> + "error %d", ret);
> +kvm_riscv_destroy_scratch_vcpu(kvmcpu);
> +exit(EXIT_FAILURE);
> +}
> +
> +env->misa_ext = env->misa_ext_mask;
> +}
> +
>  void kvm_riscv_init_user_properties(Object *cpu_obj)
>  {
>  RISCVCPU *cpu = RISCV_CPU(cpu_obj);
> @@ -406,6 +428,7 @@ void kvm_riscv_init_user_properties(Object *cpu_obj)
>  }
>
>  kvm_riscv_init_machine_ids(cpu, );
> +kvm_riscv_init_misa_ext_mask(cpu, );
>
>  kvm_riscv_destroy_scratch_vcpu();
>  }
> @@ -525,21 +548,10 @@ static int kvm_vcpu_set_machine_ids(RISCVCPU *cpu, 
> CPUState *cs)
>  int kvm_arch_init_vcpu(CPUState *cs)
>  {
>  int ret = 0;
> -target_ulong isa;
>  RISCVCPU *cpu = RISCV_CPU(cs);
> -CPURISCVState *env = &cpu->env;
> -uint64_t id;
>
>  qemu_add_vm_change_state_handler(kvm_riscv_vm_state_change, cs);
>
> -id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
> -  KVM_REG_RISCV_CONFIG_REG(isa));
> -ret = kvm_get_one_reg(cs, id, &isa);
> -if (ret) {
> -return ret;
> -}
> -env->misa_ext = isa;
> -
>  if (!object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
>  ret = kvm_vcpu_set_machine_ids(cpu, cs);
>  }
> --
> 2.40.1
>
>



Re: [PATCH v2 09/18] linux-headers: Update to v6.4-rc1

2023-06-21 Thread Alistair Francis
On Wed, Jun 14, 2023 at 7:02 AM Daniel Henrique Barboza
 wrote:
>
> Update to commit ac9a78681b92 ("Linux 6.4-rc1").
>
> Signed-off-by: Daniel Henrique Barboza 

Acked-by: Alistair Francis 

Alistair

> ---
>  include/standard-headers/linux/const.h|  2 +-
>  include/standard-headers/linux/virtio_blk.h   | 18 +++
>  .../standard-headers/linux/virtio_config.h|  6 +++
>  include/standard-headers/linux/virtio_net.h   |  1 +
>  linux-headers/asm-arm64/kvm.h | 33 
>  linux-headers/asm-riscv/kvm.h | 53 ++-
>  linux-headers/asm-riscv/unistd.h  |  9 
>  linux-headers/asm-s390/unistd_32.h|  1 +
>  linux-headers/asm-s390/unistd_64.h|  1 +
>  linux-headers/asm-x86/kvm.h   |  3 ++
>  linux-headers/linux/const.h   |  2 +-
>  linux-headers/linux/kvm.h | 12 +++--
>  linux-headers/linux/psp-sev.h |  7 +++
>  linux-headers/linux/userfaultfd.h | 17 +-
>  14 files changed, 149 insertions(+), 16 deletions(-)
>
> diff --git a/include/standard-headers/linux/const.h 
> b/include/standard-headers/linux/const.h
> index 5e48987251..1eb84b5087 100644
> --- a/include/standard-headers/linux/const.h
> +++ b/include/standard-headers/linux/const.h
> @@ -28,7 +28,7 @@
>  #define _BITUL(x)  (_UL(1) << (x))
>  #define _BITULL(x) (_ULL(1) << (x))
>
> -#define __ALIGN_KERNEL(x, a)   __ALIGN_KERNEL_MASK(x, (typeof(x))(a) 
> - 1)
> +#define __ALIGN_KERNEL(x, a)   __ALIGN_KERNEL_MASK(x, 
> (__typeof__(x))(a) - 1)
>  #define __ALIGN_KERNEL_MASK(x, mask)   (((x) + (mask)) & ~(mask))
>
>  #define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
> diff --git a/include/standard-headers/linux/virtio_blk.h 
> b/include/standard-headers/linux/virtio_blk.h
> index 7155b1a470..d7be3cf5e4 100644
> --- a/include/standard-headers/linux/virtio_blk.h
> +++ b/include/standard-headers/linux/virtio_blk.h
> @@ -138,11 +138,11 @@ struct virtio_blk_config {
>
> /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
> struct virtio_blk_zoned_characteristics {
> -   uint32_t zone_sectors;
> -   uint32_t max_open_zones;
> -   uint32_t max_active_zones;
> -   uint32_t max_append_sectors;
> -   uint32_t write_granularity;
> +   __virtio32 zone_sectors;
> +   __virtio32 max_open_zones;
> +   __virtio32 max_active_zones;
> +   __virtio32 max_append_sectors;
> +   __virtio32 write_granularity;
> uint8_t model;
> uint8_t unused2[3];
> } zoned;
> @@ -239,11 +239,11 @@ struct virtio_blk_outhdr {
>   */
>  struct virtio_blk_zone_descriptor {
> /* Zone capacity */
> -   uint64_t z_cap;
> +   __virtio64 z_cap;
> /* The starting sector of the zone */
> -   uint64_t z_start;
> +   __virtio64 z_start;
> /* Zone write pointer position in sectors */
> -   uint64_t z_wp;
> +   __virtio64 z_wp;
> /* Zone type */
> uint8_t z_type;
> /* Zone state */
> @@ -252,7 +252,7 @@ struct virtio_blk_zone_descriptor {
>  };
>
>  struct virtio_blk_zone_report {
> -   uint64_t nr_zones;
> +   __virtio64 nr_zones;
> uint8_t reserved[56];
> struct virtio_blk_zone_descriptor zones[];
>  };
> diff --git a/include/standard-headers/linux/virtio_config.h 
> b/include/standard-headers/linux/virtio_config.h
> index 965ee6ae23..8a7d0dc8b0 100644
> --- a/include/standard-headers/linux/virtio_config.h
> +++ b/include/standard-headers/linux/virtio_config.h
> @@ -97,6 +97,12 @@
>   */
>  #define VIRTIO_F_SR_IOV37
>
> +/*
> + * This feature indicates that the driver passes extra data (besides
> + * identifying the virtqueue) in its device notifications.
> + */
> +#define VIRTIO_F_NOTIFICATION_DATA 38
> +
>  /*
>   * This feature indicates that the driver can reset a queue individually.
>   */
> diff --git a/include/standard-headers/linux/virtio_net.h 
> b/include/standard-headers/linux/virtio_net.h
> index c0e797067a..2325485f2c 100644
> --- a/include/standard-headers/linux/virtio_net.h
> +++ b/include/standard-headers/linux/virtio_net.h
> @@ -61,6 +61,7 @@
>  #define VIRTIO_NET_F_GUEST_USO655  /* Guest can handle USOv6 in. 
> */
>  #define VIRTIO_NET_F_HOST_USO  56  /* Host can handle USO in. */
>  #define VIRTIO_NET_F_HASH_REPORT  57   /* Supports hash report */
> +#define VIRTIO_NET_F_GUEST_HDRLEN  59  /* Guest provides the exact hdr_len 
> value. */
>  #define VIRTIO_NET_F_RSS 60/* Supports RSS RX steering */
>  #define VIRTIO_NET_F_RSC_EXT 61/* extended coalescing info */
>  #define VIRTIO_NET_F_STANDBY 62/* Act as standby for another device
> diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
> index 
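
The const.h hunk above only swaps typeof for __typeof__ inside
__ALIGN_KERNEL. As far as I know the double-underscore spelling is accepted
by GCC and Clang even in strict ISO modes, where the plain keyword is not
available. The macro itself rounds x up to a multiple of a power-of-two
alignment using the usual add-then-mask trick. The sketch below reproduces
that arithmetic under hypothetical SKETCH_* names and wraps it in a helper
that checks the power-of-two precondition, which the kernel macro simply
assumes.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the header's macros, renamed to avoid clashing with them. */
#define SKETCH_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define SKETCH_ALIGN(x, a) SKETCH_ALIGN_MASK(x, (__typeof__(x))(a) - 1)

/*
 * Round x up to the next multiple of a. The mask trick is only valid
 * when a is a non-zero power of two, so that is asserted here.
 */
static uint64_t sketch_align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return SKETCH_ALIGN(x, a);
}
```

For example, aligning 5 to a 4-byte boundary adds the mask 3 to get 8 and
then clears the two low bits, leaving 8; a value that is already aligned is
returned unchanged.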

Re: [PATCH v2 08/18] target/riscv: handle mvendorid/marchid/mimpid for KVM CPUs

2023-06-21 Thread Alistair Francis
On Wed, Jun 14, 2023 at 7:00 AM Daniel Henrique Barboza
 wrote:
>
> After changing user validation for mvendorid/marchid/mimpid to guarantee
> that the value is validated at user input time, coupled with the work in
> fetching KVM default values for them by using a scratch CPU, we're
> certain that the values in cpu->cfg.(mvendorid|marchid|mimpid) are
> already good to be written back to KVM.
>
> There's no need to write the values back for 'host' type CPUs since the
> values can't be changed, so let's do that just for generic CPUs.
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Acked-by: Alistair Francis 

Alistair

> ---
>  target/riscv/kvm.c | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index cd2974c663..602727cdfd 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -495,6 +495,33 @@ void kvm_arch_init_irq_routing(KVMState *s)
>  {
>  }
>
> +static int kvm_vcpu_set_machine_ids(RISCVCPU *cpu, CPUState *cs)
> +{
> +CPURISCVState *env = &cpu->env;
> +uint64_t id;
> +int ret;
> +
> +id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
> +  KVM_REG_RISCV_CONFIG_REG(mvendorid));
> +ret = kvm_set_one_reg(cs, id, &cpu->cfg.mvendorid);
> +if (ret != 0) {
> +return ret;
> +}
> +
> +id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
> +  KVM_REG_RISCV_CONFIG_REG(marchid));
> +ret = kvm_set_one_reg(cs, id, &cpu->cfg.marchid);
> +if (ret != 0) {
> +return ret;
> +}
> +
> +id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG,
> +  KVM_REG_RISCV_CONFIG_REG(mimpid));
> +ret = kvm_set_one_reg(cs, id, &cpu->cfg.mimpid);
> +
> +return ret;
> +}
> +
>  int kvm_arch_init_vcpu(CPUState *cs)
>  {
>  int ret = 0;
> @@ -513,6 +540,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  }
>  env->misa_ext = isa;
>
> +if (!object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
> +ret = kvm_vcpu_set_machine_ids(cpu, cs);
> +}
> +
>  return ret;
>  }
>
> --
> 2.40.1
>
>



Re: [PATCH v2 05/18] target/riscv/cpu.c: restrict 'marchid' value

2023-06-21 Thread Alistair Francis
On Wed, Jun 14, 2023 at 7:03 AM Daniel Henrique Barboza
 wrote:
>
> 'marchid' shouldn't be set to a value different from the one previously
> set for named CPUs.
>
> For all other CPUs it shouldn't be freely set either - the spec requires
> that 'marchid' can't have the MSB (most significant bit) set and every
> other bit set to zero, i.e. 0x8000 is an invalid 'marchid' value for
> 32 bit CPUs.
>
> As with 'mimpid', setting a default value based on the current QEMU
> version is not a good idea because it implies that the CPU
> implementation changes from one QEMU version to the other. Named CPUs
> should set 'marchid' to a meaningful value instead, and generic CPUs can
> set to any valid value.
>
> For the 'veyron-v1' CPU this is the error thrown if 'marchid' is set to
> a different val:
>
> $ ./build/qemu-system-riscv64 -M virt -nographic -cpu 
> veyron-v1,marchid=0x8000
> qemu-system-riscv64: can't apply global 
> veyron-v1-riscv-cpu.marchid=0x8000:
> Unable to change veyron-v1-riscv-cpu marchid (0x8001)
>
> And, for generic CPUs, this is the error when trying to set it to an
> invalid val:
>
> $ ./build/qemu-system-riscv64 -M virt -nographic -cpu 
> rv64,marchid=0x8000
> qemu-system-riscv64: can't apply global 
> rv64-riscv-cpu.marchid=0x8000:
> Unable to set marchid with MSB (64) bit set and the remaining bits zero
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Reviewed-by: Alistair Francis 

Alistair
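
For reference, the rule the commit message describes for generic CPUs (reject
a 'marchid' whose most significant bit is set while all other bits are zero)
comes down to one comparison against 1 << (MXLEN - 1). The helper below is a
standalone sketch of that check, not the patch's cpu_set_marchid(): the name
sketch_marchid_is_valid() is hypothetical, and it uses an unsigned literal so
that shifting into bit 63 stays well defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns false only for the single disallowed encoding: MSB set and
 * every other bit zero, where the MSB position depends on MXLEN.
 */
static bool sketch_marchid_is_valid(uint64_t value, unsigned mxlen)
{
    uint64_t invalid_val;

    /* The quoted patch maps RV32 to 32 and RV64/RV128 to 64. */
    assert(mxlen == 32 || mxlen == 64);

    invalid_val = 1ULL << (mxlen - 1);
    return value != invalid_val;
}
```

Note that a value with the MSB set and any other bit set as well is
accepted; only the exact pattern is refused.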

> ---
>  target/riscv/cpu.c | 60 --
>  1 file changed, 53 insertions(+), 7 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 39c550682a..2eb793188c 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -38,11 +38,6 @@
>  #include "tcg/tcg.h"
>
>  /* RISC-V CPU definitions */
> -
> -#define RISCV_CPU_MARCHID   ((QEMU_VERSION_MAJOR << 16) | \
> - (QEMU_VERSION_MINOR << 8)  | \
> - (QEMU_VERSION_MICRO))
> -
>  static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
>
>  struct isa_ext_data {
> @@ -1733,8 +1728,6 @@ static void riscv_cpu_add_user_properties(Object *obj)
>  static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true),
>
> -DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID),
> -
>  #ifndef CONFIG_USER_ONLY
>  DEFINE_PROP_UINT64("resetvec", RISCVCPU, env.resetvec, DEFAULT_RSTVEC),
>  #endif
> @@ -1881,6 +1874,56 @@ static void cpu_get_mimpid(Object *obj, Visitor *v, 
> const char *name,
>  visit_type_bool(v, name, &value, errp);
>  }
>
> +static void cpu_set_marchid(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +bool dynamic_cpu = riscv_cpu_is_dynamic(obj);
> +RISCVCPU *cpu = RISCV_CPU(obj);
> +uint64_t prev_val = cpu->cfg.marchid;
> +uint64_t value, invalid_val;
> +uint32_t mxlen = 0;
> +
> +if (!visit_type_uint64(v, name, &value, errp)) {
> +return;
> +}
> +
> +if (!dynamic_cpu && prev_val != value) {
> +error_setg(errp, "Unable to change %s marchid (0x%lx)",
> +   object_get_typename(obj), prev_val);
> +return;
> +}
> +
> +switch (riscv_cpu_mxl(&cpu->env)) {
> +case MXL_RV32:
> +mxlen = 32;
> +break;
> +case MXL_RV64:
> +case MXL_RV128:
> +mxlen = 64;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +
> +invalid_val = 1LL << (mxlen - 1);
> +
> +if (value == invalid_val) {
> +error_setg(errp, "Unable to set marchid with MSB (%u) bit set "
> + "and the remaining bits zero", mxlen);
> +return;
> +}
> +
> +cpu->cfg.marchid = value;
> +}
> +
> +static void cpu_get_marchid(Object *obj, Visitor *v, const char *name,
> +   void *opaque, Error **errp)
> +{
> +bool value = RISCV_CPU(obj)->cfg.marchid;
> +
> +visit_type_bool(v, name, &value, errp);
> +}
> +
>  static void riscv_cpu_class_init(ObjectClass *c, void *data)
>  {
>  RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
> @@ -1918,6 +1961,9 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
> *data)
>  object_class_property_add(c, "mimpid", "uint64", cpu_get_mimpid,
>cpu_set_mimpid, NULL, NULL);
>
> +object_class_property_add(c, "marchid", "uint64", cpu_get_marchid,
> +  cpu_set_marchid, NULL, NULL);
> +
>  device_class_set_props(dc, riscv_cpu_properties);
>  }
>
> --
> 2.40.1
>
>
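The MSB rule from the commit message above can be sketched in isolation.
This is a minimal standalone sketch and not the patch's code:
marchid_value_is_valid() is a hypothetical helper, and 'mxlen' is assumed
to be 32 or 64 as in the switch statement of the patch. The sketch uses an
unsigned constant for the shift, since shifting a signed 1LL into the sign
bit is not well defined in ISO C (compilers commonly accept it anyway).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true when 'value'
 * is acceptable as marchid for a CPU whose machine XLEN is 'mxlen' bits.
 * The only rejected encoding is "MSB set, all other bits zero".
 */
static bool marchid_value_is_valid(uint64_t value, unsigned mxlen)
{
    assert(mxlen == 32 || mxlen == 64);

    /* Unsigned constant, so the shift by 63 is well defined. */
    uint64_t invalid_val = UINT64_C(1) << (mxlen - 1);

    return value != invalid_val;
}
```

Under these assumptions, 0x80000000 is rejected for a 32 bit CPU but
accepted for a 64 bit one, where only the value with bit 63 alone set is
rejected.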



Re: [PATCH v2 04/18] target/riscv/cpu.c: restrict 'mimpid' value

2023-06-21 Thread Alistair Francis
On Wed, Jun 14, 2023 at 7:01 AM Daniel Henrique Barboza
 wrote:
>
> Following the same logic used with 'mvendorid' let's also restrict
> 'mimpid' for named CPUs. Generic CPUs keep setting the value freely.
>
> Note that we're getting rid of the default RISCV_CPU_MARCHID value. The
> reason is that this is not a good default since it's dynamic, changing
> with every QEMU version, regardless of whether the actual
> implementation of the CPU changed from one QEMU version to the other.
> Named CPUs should set it to a meaningful value instead and generic CPUs
> can set whatever they want.
>
> This is the error thrown for an invalid 'mimpid' value for the veyron-v1
> CPU:
>
> $ ./qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,mimpid=2
> qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.mimpid=2:
> Unable to change veyron-v1-riscv-cpu mimpid (0x111)
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 34 --
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 6a9a6d34eb..39c550682a 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -42,7 +42,6 @@
>  #define RISCV_CPU_MARCHID   ((QEMU_VERSION_MAJOR << 16) | \
>   (QEMU_VERSION_MINOR << 8)  | \
>   (QEMU_VERSION_MICRO))
> -#define RISCV_CPU_MIMPIDRISCV_CPU_MARCHID
>
>  static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
>
> @@ -1735,7 +1734,6 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true),
>
>  DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID),
> -DEFINE_PROP_UINT64("mimpid", RISCVCPU, cfg.mimpid, RISCV_CPU_MIMPID),
>
>  #ifndef CONFIG_USER_ONLY
>  DEFINE_PROP_UINT64("resetvec", RISCVCPU, env.resetvec, DEFAULT_RSTVEC),
> @@ -1854,6 +1852,35 @@ static void cpu_get_mvendorid(Object *obj, Visitor *v, 
> const char *name,
>  visit_type_bool(v, name, &value, errp);
>  }
>
> +static void cpu_set_mimpid(Object *obj, Visitor *v, const char *name,
> +   void *opaque, Error **errp)
> +{
> +bool dynamic_cpu = riscv_cpu_is_dynamic(obj);
> +RISCVCPU *cpu = RISCV_CPU(obj);
> +uint64_t prev_val = cpu->cfg.mimpid;
> +uint64_t value;
> +
> +if (!visit_type_uint64(v, name, &value, errp)) {
> +return;
> +}
> +
> +if (!dynamic_cpu && prev_val != value) {
> +error_setg(errp, "Unable to change %s mimpid (0x%lx)",
> +   object_get_typename(obj), prev_val);
> +return;
> +}
> +
> +cpu->cfg.mimpid = value;
> +}
> +
> +static void cpu_get_mimpid(Object *obj, Visitor *v, const char *name,
> +   void *opaque, Error **errp)
> +{
> +bool value = RISCV_CPU(obj)->cfg.mimpid;
> +
> +visit_type_bool(v, name, &value, errp);
> +}
> +
>  static void riscv_cpu_class_init(ObjectClass *c, void *data)
>  {
>  RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
> @@ -1888,6 +1915,9 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
> *data)
>  object_class_property_add(c, "mvendorid", "uint32", cpu_get_mvendorid,
>cpu_set_mvendorid, NULL, NULL);
>
> +object_class_property_add(c, "mimpid", "uint64", cpu_get_mimpid,
> +  cpu_set_mimpid, NULL, NULL);
> +
>  device_class_set_props(dc, riscv_cpu_properties);
>  }
>
> --
> 2.40.1
>
>



Re: [PATCH v2 03/18] target/riscv/cpu.c: restrict 'mvendorid' value

2023-06-21 Thread Alistair Francis
On Wed, Jun 14, 2023 at 7:00 AM Daniel Henrique Barboza
 wrote:
>
> We're going to change the handling of mvendorid/marchid/mimpid by the
> KVM driver. Since these are always present in all CPUs let's put the
> same validation for everyone.
>
> It doesn't make sense to allow 'mvendorid' to differ from the value that
> is already set in named (vendor) CPUs. Generic (dynamic) CPUs can have
> any 'mvendorid' they want.
>
> Change 'mvendorid' to be a class property created via
> 'object_class_property_add', instead of using the DEFINE_PROP_UINT32()
> macro. This allows us to define a custom setter for it that will verify,
> for named CPUs, if mvendorid differs from the value already set by the
> CPU. This is the error thrown for the 'veyron-v1' CPU if 'mvendorid' is
> set to an invalid value:
>
> $ qemu-system-riscv64 -M virt -nographic -cpu veyron-v1,mvendorid=2
> qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.mvendorid=2:
> Unable to change veyron-v1-riscv-cpu mvendorid (0x61f)
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Andrew Jones 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 38 +-
>  1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index e904018644..6a9a6d34eb 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1734,7 +1734,6 @@ static void riscv_cpu_add_user_properties(Object *obj)
>  static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true),
>
> -DEFINE_PROP_UINT32("mvendorid", RISCVCPU, cfg.mvendorid, 0),
>  DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID),
>  DEFINE_PROP_UINT64("mimpid", RISCVCPU, cfg.mimpid, RISCV_CPU_MIMPID),
>
> @@ -1821,6 +1820,40 @@ static const struct TCGCPUOps riscv_tcg_ops = {
>  #endif /* !CONFIG_USER_ONLY */
>  };
>
> +static bool riscv_cpu_is_dynamic(Object *cpu_obj)
> +{
> +return object_dynamic_cast(cpu_obj, TYPE_RISCV_DYNAMIC_CPU) != NULL;
> +}
> +
> +static void cpu_set_mvendorid(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)
> +{
> +bool dynamic_cpu = riscv_cpu_is_dynamic(obj);
> +RISCVCPU *cpu = RISCV_CPU(obj);
> +uint32_t prev_val = cpu->cfg.mvendorid;
> +uint32_t value;
> +
> +if (!visit_type_uint32(v, name, &value, errp)) {
> +return;
> +}
> +
> +if (!dynamic_cpu && prev_val != value) {
> +error_setg(errp, "Unable to change %s mvendorid (0x%x)",
> +   object_get_typename(obj), prev_val);
> +return;
> +}
> +
> +cpu->cfg.mvendorid = value;
> +}
> +
> +static void cpu_get_mvendorid(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)
> +{
> +bool value = RISCV_CPU(obj)->cfg.mvendorid;
> +
> +visit_type_bool(v, name, &value, errp);
> +}
> +
>  static void riscv_cpu_class_init(ObjectClass *c, void *data)
>  {
>  RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
> @@ -1852,6 +1885,9 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
> *data)
>  cc->gdb_get_dynamic_xml = riscv_gdb_get_dynamic_xml;
>  cc->tcg_ops = &riscv_tcg_ops;
>
> +object_class_property_add(c, "mvendorid", "uint32", cpu_get_mvendorid,
> +  cpu_set_mvendorid, NULL, NULL);
> +
>  device_class_set_props(dc, riscv_cpu_properties);
>  }
>
> --
> 2.40.1
>
>
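The setter logic described in this patch (named CPUs keep their value,
generic CPUs accept anything) can be illustrated without QOM. The sketch
below is only an illustration under stated assumptions: struct demo_cpu
and demo_set_mvendorid() are hypothetical names, and a plain 'dynamic'
flag stands in for the object_dynamic_cast() check that the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant bits of RISCVCPU. */
struct demo_cpu {
    const char *type_name;
    bool dynamic;          /* true for generic CPUs such as rv64 */
    uint32_t mvendorid;
};

/*
 * Returns 0 on success, -1 if a named CPU would get a different value.
 * On failure a message is written to 'err' (capacity 'errlen').
 */
static int demo_set_mvendorid(struct demo_cpu *cpu, uint32_t value,
                              char *err, size_t errlen)
{
    assert(cpu != NULL && err != NULL);

    if (!cpu->dynamic && cpu->mvendorid != value) {
        snprintf(err, errlen, "Unable to change %s mvendorid (0x%x)",
                 cpu->type_name, (unsigned)cpu->mvendorid);
        return -1;
    }

    cpu->mvendorid = value;
    return 0;
}
```

Setting a named CPU to the value it already has succeeds in this sketch,
which mirrors the "prev_val != value" comparison in the patch.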



[PATCH] vdpa: Increase out buffer size for CVQ commands

2023-06-21 Thread Hawkins Jiawei
According to the VirtIO standard, "Since there are no guarantees,
it can use a hash filter or silently switch to
allmulti or promiscuous mode if it is given too many addresses."
To achieve this, QEMU ignores MAC addresses and marks `mac_table.x_overflow`
in the device internal state in virtio_net_handle_mac()
if the guest sets more than `MAC_TABLE_ENTRIES` MAC addresses
for the filter table.

However, the problem is that QEMU never marks the `mac_table.x_overflow`
for the vdpa device internal state when the guest sets more than
`MAC_TABLE_ENTRIES` MAC addresses.

To be more specific, currently QEMU offers a buffer size of
vhost_vdpa_net_cvq_cmd_len() for CVQ commands, which represents the size of
VIRTIO_NET_CTRL_MAC_TABLE_SET command with a maximum `MAC_TABLE_ENTRIES`
MAC addresses.

Consequently, if the guest sets more than `MAC_TABLE_ENTRIES` MAC addresses,
QEMU truncates the CVQ command data and copies this incomplete command
into the out buffer. In this situation, virtio_net_handle_mac() fails the
integrity check and returns VIRTIO_NET_ERR instead of marking
`mac_table.x_overflow` and returning VIRTIO_NET_OK, since the copied
CVQ command in the buffer is incomplete and flawed.

This patch solves this problem by increasing the buffer size to
vhost_vdpa_net_cvq_cmd_page_len(), which represents the size of the buffer
that is allocated and mmaped. Therefore, everything should work correctly
as long as the guest sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() -
sizeof(struct virtio_net_ctrl_hdr)
- 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.

Considering the highly unlikely scenario for the guest setting more than
that number of MAC addresses for the filter table, this patch should
work fine for the majority of cases. If there is a need for more than those
entries, we can increase the value for vhost_vdpa_net_cvq_cmd_page_len()
in the future, mapping more than one page for command output.

Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
Signed-off-by: Hawkins Jiawei 
---
 net/vhost-vdpa.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 5a72204899..ecfa8852b5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -784,9 +784,18 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 };
 ssize_t dev_written = -EINVAL;
 
+/*
+ * This code truncates the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
+ * and prevents QEMU from marking `mac_table.x_overflow` in the device
+ * internal state in virtio_net_handle_mac() if the guest sets more than
+ * `(vhost_vdpa_net_cvq_cmd_page_len() - sizeof(struct virtio_net_ctrl_hdr)
+ * - 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses for
+ * filter table.
+ * However, this situation is considered rare, so it is acceptable.
+ */
 out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
  s->cvq_cmd_out_buffer,
- vhost_vdpa_net_cvq_cmd_len());
+ vhost_vdpa_net_cvq_cmd_page_len());
 if (*(uint8_t *)s->cvq_cmd_out_buffer == VIRTIO_NET_CTRL_ANNOUNCE) {
 /*
  * Guest announce capability is emulated by qemu, so don't forward to
-- 
2.25.1
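
The capacity formula from the commit message can be worked through with a
small standalone sketch. The names below are hypothetical and the struct
layouts are simplified assumptions: a 2-byte control header and a 4-byte
entry count in front of each MAC table. The buffer sizes used in the note
after the code (4096 bytes for one page, 394 bytes for the old limit with
64 table entries) are also assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6

/* Simplified mirrors of the virtio-net control structures (assumed layout). */
struct demo_ctrl_hdr {
    uint8_t class_;
    uint8_t cmd;
};

struct demo_ctrl_mac {
    uint32_t entries;   /* followed by entries * DEMO_ETH_ALEN address bytes */
};

/*
 * Largest number of MAC addresses whose MAC_TABLE_SET command still fits
 * in an out buffer of 'buf_len' bytes: one header, two tables (unicast
 * and multicast), and the addresses themselves.
 */
static size_t demo_max_mac_entries(size_t buf_len)
{
    size_t fixed = sizeof(struct demo_ctrl_hdr) +
                   2 * sizeof(struct demo_ctrl_mac);

    assert(buf_len >= fixed);
    return (buf_len - fixed) / DEMO_ETH_ALEN;
}
```

With these assumptions, a one-page buffer of 4096 bytes has room for 681
addresses, while a 394-byte buffer has room for 64.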




Re: [PATCH v2 0/8] disas/riscv: Add vendor extension support

2023-06-21 Thread Alistair Francis
On Mon, Jun 12, 2023 at 9:11 PM Christoph Muellner
 wrote:
>
> From: Christoph Müllner 
>
> This series adds vendor extension support to the QEMU disassembler
> for RISC-V. The following vendor extensions are covered:
> * XThead{Ba,Bb,Bs,Cmo,CondMov,FMemIdx,Fmv,Mac,MemIdx,MemPair,Sync}
> * XVentanaCondOps
>
> So far, there have been two attempts to add vendor extension support
> to the QEMU disassembler. The first one [1] was posted in August 2022
> by LIU Zhiwei and attempts to separate vendor extension specifics
> from standard extension code in combination with a patch that introduced
> support for XVentanaCondOps. The second one [2] was posted in March 2023
> by me and added XThead* support without separating the vendor extensions
> from the standard code.
>
> This patchset represents the third attempt to add vendor extension
> support to the QEMU disassembler. It adds all features of the previous
> attempts and integrates them into a patchset that uses the same
> mechanism for testing the extension availability like translate.c
> (using the booleans RISCVCPUConfig::ext_*).
> To achieve that, a couple of patches were needed to restructure
> the existing code.
>
> Note, that this patchset allows an instruction encoder function for each
> vendor extension, but operand decoding and instruction printing remains
> common code. This is irrelevant for XVentanaCondOps, but the patch for
> the XThead* extensions includes changes in riscv.c and riscv.h.
> This could be changed to force more separation with the cost of
> duplication.
>
> The first patch of this series is cherry-picked from LIU Zhiwei's series.
> It was reviewed by Alistair Francis and Richard Henderson, but never
> made it on master. I've added "Reviewed-by" tags to the commit.
>
> Changes for v2:
> * Rebase on Alistair's riscv-to-apply.next branch
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg03662.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04566.html
>
> Christoph Müllner (7):
>   target/riscv: Factor out extension tests to cpu_cfg.h
>   disas/riscv: Move types/constants to new header file
>   disas/riscv: Make rv_op_illegal a shared enum value
>   disas/riscv: Encapsulate opcode_data into decode
>   disas/riscv: Provide infrastructure for vendor extensions
>   disas/riscv: Add support for XVentanaCondOps
>   disas/riscv: Add support for XThead* instructions
>
> LIU Zhiwei (1):
>   target/riscv: Use xl instead of mxl for disassemble

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  disas/meson.build|   6 +-
>  disas/riscv-xthead.c | 707 +++
>  disas/riscv-xthead.h |  28 ++
>  disas/riscv-xventana.c   |  41 +++
>  disas/riscv-xventana.h   |  18 +
>  disas/riscv.c| 378 ++---
>  disas/riscv.h| 299 +
>  target/riscv/cpu.c   |   3 +-
>  target/riscv/cpu_cfg.h   |  37 ++
>  target/riscv/translate.c |  27 +-
>  10 files changed, 1246 insertions(+), 298 deletions(-)
>  create mode 100644 disas/riscv-xthead.c
>  create mode 100644 disas/riscv-xthead.h
>  create mode 100644 disas/riscv-xventana.c
>  create mode 100644 disas/riscv-xventana.h
>  create mode 100644 disas/riscv.h
>
> --
> 2.40.1
>
>
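The mechanism described in the cover letter (one decoder function per
vendor extension, each gated by a configuration boolean) can be sketched
as a small dispatch table. Everything below is hypothetical and only
illustrates the idea: the struct and function names, the placeholder
decoders, and the convention that op == 0 means "illegal" are assumptions
and are not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a CPU configuration with per-extension booleans. */
struct demo_cfg {
    bool ext_xtheadba;
    bool ext_xventanacondops;
};

/* Hypothetical decoded-instruction record; op == 0 means "illegal". */
struct demo_decode {
    uint32_t inst;
    int op;
};

static bool demo_has_xtheadba(const struct demo_cfg *cfg)
{
    return cfg->ext_xtheadba;
}

static bool demo_has_xventanacondops(const struct demo_cfg *cfg)
{
    return cfg->ext_xventanacondops;
}

/* Placeholder decoders: real ones would match opcode bit patterns. */
static void demo_decode_xthead(struct demo_decode *dec)
{
    dec->op = 1;
}

static void demo_decode_xventana(struct demo_decode *dec)
{
    dec->op = 2;
}

static const struct {
    bool (*guard)(const struct demo_cfg *cfg);
    void (*decode)(struct demo_decode *dec);
} demo_decoders[] = {
    { demo_has_xtheadba, demo_decode_xthead },
    { demo_has_xventanacondops, demo_decode_xventana },
};

/* Try each enabled decoder in order and stop at the first that matches. */
static int demo_disas(const struct demo_cfg *cfg, uint32_t inst)
{
    const size_t count = sizeof(demo_decoders) / sizeof(demo_decoders[0]);
    struct demo_decode dec = { .inst = inst, .op = 0 };

    assert(cfg != NULL);
    for (size_t i = 0; i < count; i++) {
        if (demo_decoders[i].guard(cfg)) {
            demo_decoders[i].decode(&dec);
            if (dec.op != 0) {
                break;
            }
        }
    }
    return dec.op;
}
```

A table like this keeps the per-vendor decoding separate while the loop,
and whatever follows it, stays common code.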



[Bug 1617385] Re: No snapshot possible with virtio-gpu activated

2023-06-21 Thread Buo-ren, Lin
I also have this problem with 3D acceleration enabled. It would be
great if the situation improved, as the VM is often unstable after the
host resumes from sleep.

** Changed in: qemu
   Status: Expired => Confirmed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1617385

Title:
  No snapshot possible with virtio-gpu activated

Status in QEMU:
  Confirmed

Bug description:
  I'm using "Qemu" and "Virtual Machine Manager" on Debian-8-Stretch -
  both newest versions out of the Debian-testing-repository (state
  26.08.2016).

  If I try to save a virtual machine, it fails and I'll get the
  following error:

  libvirtError: internal error: unable to execute QEMU command
  'migrate': State blocked by non-migratable device
  ':00:02.0/virtio-gpu'

  This only happens, if I chose "Virtio" as graphics-driver (no matter
  if I use "Spice" or "Vnc" as Server by the way). If I switch to any
  other driver (Cirrus, Qxl, Vga, VMvga...) there is no problem to take
  a snapshot and save the virtual machine.

  Unfortunately "virtio-gpu" (together with "Spice-Server") is the only
  driver that provides proper working/running my virtual machines on my
  PC.

  feuerkogel1

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1617385/+subscriptions




Re: [PATCH v2 06/20] qemu_file: total_transferred is not used anymore

2023-06-21 Thread Juan Quintela
Peter Xu  wrote:
> On Tue, May 30, 2023 at 08:39:27PM +0200, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela 
>> ---
>>  migration/qemu-file.c | 4 
>>  1 file changed, 4 deletions(-)
>> 
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index eb0497e532..6b6deea19b 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -41,9 +41,6 @@ struct QEMUFile {
>>  QIOChannel *ioc;
>>  bool is_writable;
>>  
>> -/* The sum of bytes transferred on the wire */
>> -uint64_t total_transferred;
>> -
>>  int buf_index;
>>  int buf_size; /* 0 when writing */
>>  uint8_t buf[IO_BUF_SIZE];
>> @@ -287,7 +284,6 @@ void qemu_fflush(QEMUFile *f)
>>  qemu_file_set_error_obj(f, -EIO, local_error);
>>  } else {
>>  uint64_t size = iov_size(f->iov, f->iovcnt);
>> -f->total_transferred += size;
>
> I think this patch is another example why I think sometimes the way patch
> is split are pretty much adding more complexity on review...

It depends on taste.

You are doing one thing in way1.
Then you find a better way to do it, let's call it way2.

Now we have two options to show how we arrived there.

a- You add any declarations/definitions/initializations for way2
b- You write way2 alongside way1
c- You test both ways, and you see that they
   give the same result.
d- You remove way1.

Or you squash the four patches into a single patch.  But then the reviewer
loses the place where one can see why it is the same as the old one.

Sometimes the longer way is better, sometimes the shorter one.

Clearly we don't agree about what is the best way in this case.

> Here we removed a variable operation but it seems all fine if it's not used
> anywhere.  But it also means current code base (before this patch applied)
> doesn't make sense already because it contains this useless addition.  So
> IMHO it means some previous patch does it just wrong.

No.  It is how it was developed.  And it is about being respectful to the
reviewer: giving them enough information to do a proper review.

During the development of this series, there were lots of:

if (old_counter != new_counter)
   printf("");

and the traces were several thousand lines long.  If I had to review
that change, I would love any help that the writer could give me.  That is
why it is done this way.

> I think it means it's caused by a wrong split of patches, then each patch
> stops to make much sense as a standalone one.

It stops making sense if you want each feature to be a single patch.
Before the patch no feature.  After the patch full feature.  That brings
us to very long patches.

What is easier to review (to do the same)

a - 1 x 1000 lines patch
b - 10 x 100 lines patch

I will go with b any time.  Except if the split is arbitrary.

> I can go back and try to find whatever patch on the list that will explain
> this.  But it'll also go into git log.  Anyone reads this later will be
> confused once again.  Even harder for them to figure out what
> happened.

As said before, I completely disagree here.  And what is worse: if
something goes wrong, with your approach git bisect will not help as much
as with my approach.

> Do you think we could reorganize the patches so each of a single patch
> explains itself?

No.  See above.  We go from very spaghetti code to much less
spaghetti code.

> The other thing is about priority of patches - I still have ~80 patches
> pending reviews on migration only.. Would you think it makes sense we pick
> up important ones first and merge them with higher priority?

OK, let's make this clear.
This whole atomic migration counters work started because the zero_page
detection in multifd had the counters so wrong that measuring speed
became impossible.

I haven't yet sent the multifd zero pages series.  And why was it so
complicated?  Just off the top of my head:

- How much data had we transferred?  Historically we stored that
  information in qemu-file.  But qemu-file can only be read/written from
  the migration thread.  So we jumped through hoops to be able to update
  those values.

  Current upstream code for compressed multifd assumes that it transfers
  as much data as the non-compressed one.  Why?  Because we don't have an
  easy way to get that value back.  Contortions that we were trying to
  do:

  https://lore.kernel.org/all/20220802063907.18882-5-quint...@redhat.com/

  To summarize, the way that we had to do it was something like:

  - we send a bunch of pages to the multifd thread
  - the multifd thread sends the data and returns in the buffer what it
    has written
  - the migration thread, when it reuses a buffer, adds what was written
    the previous time that the struct was used.

  This was not just problematic for multifd zero pages detection.
  * compression was lying about it
  * zero_copy is doing it wrong (accounting at the time that it does the
    write, not when it knows that it was written).

- rdma: this is even funnier
  * It accounted for 

[PATCH] virtio-gpu-udmabuf: create udmabuf for blob even when iov_cnt == 1

2023-06-21 Thread Dongwon Kim
A scanout blob often has just 1 entry that is linked to many pages.
So checking only whether iov_cnt is 1 is not enough to screen out small,
non-scanout blobs. Therefore add an iov_len check as well, so that a
udmabuf is created only for a scanout blob, which is at least one page
in size.

Cc: Gerd Hoffmann 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 hw/display/virtio-gpu-udmabuf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index 69e2cf0bd6..ef1a740de5 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -132,7 +132,8 @@ void virtio_gpu_init_udmabuf(struct 
virtio_gpu_simple_resource *res)
 void *pdata = NULL;
 
 res->dmabuf_fd = -1;
-if (res->iov_cnt == 1) {
+if (res->iov_cnt == 1 &&
+res->iov[0].iov_len < 4096) {
 pdata = res->iov[0].iov_base;
 } else {
 virtio_gpu_create_udmabuf(res);
-- 
2.34.1
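
The condition introduced by this patch can be read as a small predicate.
The sketch below is an illustration only: demo_blob_is_small() and
DEMO_PAGE_SIZE are hypothetical names, and the page size is assumed to be
4096 because that is the literal the patch compares against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Assumed page size; the patch compares against the literal 4096. */
#define DEMO_PAGE_SIZE 4096

/*
 * Hypothetical predicate: true when the blob is small enough to be used
 * directly through its single iov entry, false when a udmabuf should be
 * created for it.
 */
static bool demo_blob_is_small(const struct iovec *iov, unsigned int iov_cnt)
{
    assert(iov != NULL);
    return iov_cnt == 1 && iov[0].iov_len < DEMO_PAGE_SIZE;
}
```

Note that with a strict "<" comparison, a single-entry blob of exactly
one page is not treated as small in this sketch.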




[PATCH] ui/gtk: making dmabuf NULL when it's released.

2023-06-21 Thread Dongwon Kim
Set vc->gfx.guest_fb.dmabuf to NULL to prevent any further access
to it after the dmabuf is released.

Cc: Gerd Hoffmann 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/ui/gtk.c b/ui/gtk.c
index e50f950f2b..0b8bf8ea8a 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -587,8 +587,12 @@ static bool gd_has_dmabuf(DisplayChangeListener *dcl)
 static void gd_gl_release_dmabuf(DisplayChangeListener *dcl,
  QemuDmaBuf *dmabuf)
 {
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 #ifdef CONFIG_GBM
 egl_dmabuf_release_texture(dmabuf);
+if (vc->gfx.guest_fb.dmabuf == dmabuf) {
+vc->gfx.guest_fb.dmabuf = NULL;
+}
 #endif
 }
 
-- 
2.34.1
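
The pattern used by this patch, forgetting a cached pointer at the moment
the object behind it is released, can be shown with plain structs. The
sketch below is a hedged illustration: struct demo_dmabuf, struct
demo_console, and demo_release_dmabuf() are hypothetical stand-ins and not
the GTK UI types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QemuDmaBuf and the console's guest_fb state. */
struct demo_dmabuf {
    int texture;
};

struct demo_console {
    struct demo_dmabuf *current;   /* buffer currently used for scanout */
};

/* Release 'buf' and forget it if the console still points at it. */
static void demo_release_dmabuf(struct demo_console *con,
                                struct demo_dmabuf *buf)
{
    assert(con != NULL && buf != NULL);

    buf->texture = 0;              /* stands in for releasing the texture */
    if (con->current == buf) {
        con->current = NULL;       /* avoid keeping a stale reference */
    }
}
```

Comparing before clearing matters: releasing some other buffer must not
drop the reference to the one that is still in use.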




Re: [PATCH v2 04/20] qemu-file: We only call qemu_file_transferred_* on the sending side

2023-06-21 Thread Juan Quintela
Peter Xu  wrote:
> On Tue, Jun 13, 2023 at 06:02:05PM +0200, Juan Quintela wrote:
>> Peter Xu  wrote:
>> > On Tue, May 30, 2023 at 08:39:25PM +0200, Juan Quintela wrote:
>> >> Remove the increase in qemu_file_fill_buffer() and add asserts to
>> >> qemu_file_transferred* functions.
>> >> 
>> >> Signed-off-by: Juan Quintela 
>> >
>> > The read side accounting does look a bit weird and never caught my notice..
>> >
>> > Maybe worth also touching the document of QEMUFile::total_transferred to
>> > clarify what it accounts?
>> >
>> > Reviewed-by: Peter Xu 
>> >
>> > Though when I'm looking at the counters (didn't follow every single recent
>> > patch on this..), I found that now reading transferred value is actually
>> > more expensive - qemu_file_transferred() needs flushing, even if for the
>> > fast version, qemu_file_transferred_fast() loops over all possible iovs,
>> > which can be as large as MAX_IOV_SIZE==64.
>> >
>> > To be explicit, I _think_ for each guest page we now need to flush...
>> >
>> >   ram_save_iterate
>> > migration_rate_exceeded
>> >   migration_transferred_bytes
>> > qemu_file_transferred
>> >
>> > I hope I'm wrong..
>> 
>> See patch 7:
>> 
>> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
>> index 79eea8d865..1696185694 100644
>> --- a/migration/migration-stats.c
>> +++ b/migration/migration-stats.c
>> @@ -62,7 +62,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
>>  {
>>  uint64_t multifd = stat64_get(&mig_stats.multifd_bytes);
>>  uint64_t rdma = stat64_get(&mig_stats.rdma_bytes);
>> -uint64_t qemu_file = qemu_file_transferred(f);
>> +uint64_t qemu_file = stat64_get(&mig_stats.qemu_file_transferred);
>>  
>>  trace_migration_transferred_bytes(qemu_file, multifd, rdma);
>>  return qemu_file + multifd + rdma;
>
> If this is a known regression, should we make a first patchset fix it and
> make it higher priority to merge?

This is the simplest way that I have found to get from A to B.
The reason why I (and everybody before me) didn't want to get into the
atomic vars was that there were so many moving parts.

And here we are: moving to single counters instead of several of them
took something like 200 patches.

> It seems this is even not mentioned in the cover letter.. while IMHO this
> is the most important bit to have in it..

My fault.

I cc'd Fiona on v1, and she confirmed that this fixed her problem.

This is the commit that introduces the slowdown.

commit 813cd61669e45ee6d5db09a83d03df8f0c6eb5d2
Author: Juan Quintela 
Date:   Mon May 15 21:57:01 2023 +0200

migration: Use migration_transferred_bytes() to calculate rate_limit

Signed-off-by: Juan Quintela 
Reviewed-by: Cédric Le Goater 
Message-Id: <20230515195709.63843-9-quint...@redhat.com>

Important bits:

-uint64_t rate_limit_used = stat64_get(&mig_stats.rate_limit_used);
+uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
+uint64_t rate_limit_current = migration_transferred_bytes(f);
+uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
 uint64_t rate_limit_max = stat64_get(&mig_stats.rate_limit_max);

We moved from reading an atomic to calling qemu_file_transferred(), which
does the iovec dance.

This commit (on this series):

commit 524072cb5f5ce5605f1171f86ba0879405e4b9b3
Author: Juan Quintela 
Date:   Mon May 8 12:16:47 2023 +0200

migration: Use the number of transferred bytes directly

We only use migration_transferred_bytes() to calculate the rate_limit,
for that we don't need to flush whatever is on the qemu_file buffer.
Remember that the buffer is really small (normal case is 32K if we use
iov's can be 64 * TARGET_PAGE_SIZE), so this is not relevant to
calculations.

Signed-off-by: Juan Quintela 

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 79eea8d865..1696185694 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -62,7 +62,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
 {
 uint64_t multifd = stat64_get(&mig_stats.multifd_bytes);
 uint64_t rdma = stat64_get(&mig_stats.rdma_bytes);
-uint64_t qemu_file = qemu_file_transferred(f);
+uint64_t qemu_file = stat64_get(&mig_stats.qemu_file_transferred);
 
 trace_migration_transferred_bytes(qemu_file, multifd, rdma);
 return qemu_file + multifd + rdma;

Undoes the damage.

And yes, before you ask, I got this wrong lots of times, and had to rebase
and change the order of patches several times O:-)
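
The cost difference discussed above, reading one atomic counter versus
walking the pending iovecs, can be sketched with C11 atomics. This is a
standalone illustration under stated assumptions: the demo_* names are
hypothetical, and the sketch does not use QEMU's stat64 API or QEMUFile.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical global counter, updated whenever bytes are handed off. */
static _Atomic uint64_t demo_transferred;

/* Called by the writer after data has been sent. */
static void demo_account(uint64_t bytes)
{
    atomic_fetch_add_explicit(&demo_transferred, bytes,
                              memory_order_relaxed);
}

/* Cheap path: a single relaxed atomic load. */
static uint64_t demo_transferred_fast(void)
{
    return atomic_load_explicit(&demo_transferred, memory_order_relaxed);
}

/* Expensive path: also add the bytes still queued in the iovec array. */
static uint64_t demo_transferred_with_pending(const struct iovec *iov,
                                              int iovcnt)
{
    uint64_t total = demo_transferred_fast();

    assert(iov != NULL || iovcnt == 0);
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    return total;
}
```

For a rate-limit check that runs very often, the first reader is the kind
of call one wants on the hot path; the second gives a slightly larger
number at the price of a loop on every call.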

>> 
>> > Does it mean that perhaps we simply need "sent and put into send buffer"
>> > more than "what really got transferred"?  So I start to wonder what's the
>> > original purpose of this change, and which one is better..
>> 
>> That is basically what patch 5 and 6 do O:-)
>> 
>> Problem is arriving to something that is bisectable (for correctness)
>> and is easy to review.
>> 
>> And yes, my choices can be different from the ones that you do.
>> 
>> The other reason 

Re: [PATCH 12/42] migration-test: Enable back ignore-shared test

2023-06-21 Thread Juan Quintela
Peter Xu  wrote:
> On Wed, Jun 21, 2023 at 09:38:08PM +0200, Juan Quintela wrote:
>> Peter Xu  wrote:
>> > On Fri, Jun 09, 2023 at 12:49:13AM +0200, Juan Quintela wrote:
>> >> It failed on aarch64 tcg, lets see if that is still the case.
>> >> 
>> >> Signed-off-by: Juan Quintela 
>> >
>> > According to the history:
>> >
>> > https://lore.kernel.org/all/20190305180635.GA3803@work-vm/
>> >
>> > It's never enabled, and not sure whether Yury followed it up.  Juan: have
>> > you tried it out on aarch64 before enabling it again?  I assume we rely on
>> > the previous patch but that doesn't even sound like aarch64 specific.  I
>> > worry it'll just keep failing on aarch64.
>> 
>> Hi
>> 
>> I am resending this series.
>> 
>> I tested hard this time.  x86_64 host.
>> Two build directories:
>> - x86_64 (I just build qemu-system-x86_64, kvm)
>> - aarch64 (I just build qemu-system-aarch64, tcg)
>> 
>> Everything is run as:
>> 
>> while true; do $command || break; done
>> 
>> And run this:
>> - x86_64:
>>   * make check (nit: you can't run two make checks on the same
>> directory)
>>   * 4 ./test/qtest/migration-test
>>   * 2 ./test/qtest/migration-test -p ./tests/qtest/migration-test -p 
>> /x86_64/migration/multifd/tcp/plain/cancel
>>   * 2 ./test/qtest/migration-test -p ./tests/qtest/migration-test -p 
>> /x86_64/migration/ignore_shared
>> 
>> - aarch64:
>>   The same with s/x86_64/aarch64/
>> 
>> And left it running for 6 hours.  No errors.
>> Machine has enough RAM for running this (128GB) and 18 cores (intel
>> i9900K).
>> Load of the machine while running these tests is around 50 (I really hope
>> that our CI hosts have less load).
>> 
>> I ran master with the same configuration.  In less than 10 minutes I get
>> the dreaded:
>> 
>> # starting QEMU: exec ./qemu-system-aarch64 -qtest 
>> unix:/tmp/qtest-3264370.sock -qtest-log /dev/null -chardev 
>> socket,path=/tmp/qtest-3264370.qmp,id=char0 -mon chardev=char0,mode=control 
>> -display none -accel kvm -accel tcg -machine virt,gic-version=max -name 
>> target,debug-threads=on -m 150M -serial 
>> file:/tmp/migration-test-1A1461/dest_serial -incoming defer -cpu max -kernel 
>> /tmp/migration-test-1A1461/bootsect-accel qtest
>> Broken pipe
>> ../../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:195: kill_qemu() 
>> detected QEMU death from signal 6 (Aborted) (core dumped)
>> Aborted (core dumped)
>> $
>> 
>> On multifd+cancel.
>> 
>> I have never been able to get ignore_shared to fail on my machine.
>> But I haven't tested aarch64 TCG this hard in the past, and on x86_64 it
>> has always worked for me.
>
> Thanks a lot, Juan.
>
> Do you mean master is broken with QEMU_TEST_FLAKY_TESTS=1?

Yeap.  I mean multifd+cancel.  That is the reason why we put the FLAKY
part.

> And after the
> whole series applied we cannot trigger issue in the few hours test even
> with it?

Yeap.

> Shall we wait for another 1-2 days to see whether Yury would comment
> (before you repost)?  Otherwise I agree if it survives your few-hours test
> we should give it a try - at least according to Dave's comment before it
> was failing easily, but it is not now on the test bed.

From the v2 series that I am about to post:

migration-test: Re-enable multifd_cancel test

Why?
- migration/multifd: Protect accesses to migration_threads
  This patch fixed the memory corruption problem.
- migration-test: Move serial to GuestState
  Now we use the guest name as the serial file name.
  In the past there was a conflict between vm "to" and "to2", which used
  the same file name.
- migration-test: Wait for first target to finish
  Now we wait for vm "to" to finish before launching "to2".  So we
  avoid similar problems in the future.

Signed-off-by: Juan Quintela 


> Maybe it's still just hidden, but in that case I also agree enabling it in
> the repo is the simplest way to reproduce the failure again, if we still
> ever want to enable it one day..

We do.  If it still fails, we want to know why and fix it.

Later, Juan.




[PATCH] ui/gtk: set the area of the scanout texture correctly

2023-06-21 Thread Dongwon Kim
The x and y offsets and the width and height of the scanout texture
are not configured correctly when the guest scanout frame is a
dmabuf.

Cc: Gerd Hoffmann 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk-egl.c | 3 ++-
 ui/gtk-gl-area.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 19130041bc..e99e3b0d8c 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -257,7 +257,8 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
 
 gd_egl_scanout_texture(dcl, dmabuf->texture,
dmabuf->y0_top, dmabuf->width, dmabuf->height,
-   0, 0, dmabuf->width, dmabuf->height);
+   dmabuf->x, dmabuf->y, dmabuf->scanout_width,
+   dmabuf->scanout_height);
 
 if (dmabuf->allow_fences) {
 vc->gfx.guest_fb.dmabuf = dmabuf;
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index c384a1516b..1605818bd1 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -299,7 +299,8 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
 
 gd_gl_area_scanout_texture(dcl, dmabuf->texture,
dmabuf->y0_top, dmabuf->width, dmabuf->height,
-   0, 0, dmabuf->width, dmabuf->height);
+   dmabuf->x, dmabuf->y, dmabuf->scanout_width,
+   dmabuf->scanout_height);
 
 if (dmabuf->allow_fences) {
 vc->gfx.guest_fb.dmabuf = dmabuf;
-- 
2.34.1




[PATCH] virtio-gpu: OUT_OF_MEMORY if failing to create udmabuf

2023-06-21 Thread Dongwon Kim
Respond with VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY if creating a udmabuf
for the blob resource fails.

Cc: Gerd Hoffmann 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 hw/display/virtio-gpu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 66cddd94d9..efe66ca7a3 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -635,9 +635,11 @@ static void virtio_gpu_do_set_scanout(VirtIOGPU *g,
 if (!virtio_gpu_update_dmabuf(g, scanout_id, res, fb, r)) {
 virtio_gpu_update_scanout(g, scanout_id, res, r);
 return;
+} else {
+*error = VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY;
+return;
 }
 }
-
 data = res->blob;
 } else {
 data = (uint8_t *)pixman_image_get_data(res->image);
-- 
2.34.1




Re: [PATCH v1 1/1] Q35 Support

2023-06-21 Thread Bernhard Beschow



Hi Joel,

Nice! I've been working on making the PIIX south bridge Xen agnostic, partly to
show what Xen enablement in Q35 could look like. Not that I have any use case
for it, but it is great to see that you've actually done that!

I know you didn't intend to send this patch but I'll give you some early 
comments anyway.

Am 20. Juni 2023 17:24:35 UTC schrieb Joel Upham :
>---
> hw/acpi/ich9.c|   22 +-
> hw/acpi/pcihp.c   |6 +-
> hw/core/machine.c |   19 +
> hw/i386/pc_piix.c |3 +-
> hw/i386/pc_q35.c  |   39 +-
> hw/i386/xen/xen-hvm.c |7 +-
> hw/i386/xen/xen_platform.c|   19 +-
> hw/isa/lpc_ich9.c |   53 +-
> hw/isa/piix3.c|2 +-
> hw/pci-host/q35.c |   28 +-
> hw/pci/pci.c  |   17 +
> hw/xen/xen-host-pci-device.c  |  106 +++-
> hw/xen/xen-host-pci-device.h  |6 +-
> hw/xen/xen_pt.c   |   49 +-
> hw/xen/xen_pt.h   |   19 +-
> hw/xen/xen_pt_config_init.c   | 1103 ++---
> include/hw/acpi/ich9.h|1 +
> include/hw/acpi/pcihp.h   |2 +
> include/hw/boards.h   |1 +
> include/hw/i386/pc.h  |3 +
> include/hw/pci-host/q35.h |4 +-
> include/hw/pci/pci.h  |3 +
> include/hw/southbridge/ich9.h |1 +
> include/hw/xen/xen.h  |4 +-
> qemu-options.hx   |1 +
> softmmu/datadir.c |1 -
> softmmu/qdev-monitor.c|3 +-
> stubs/xen-hw-stub.c   |4 +-
> 28 files changed, 1395 insertions(+), 131 deletions(-)
>
>diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
>index 25e2c7243e..234706a191 100644
>--- a/hw/acpi/ich9.c
>+++ b/hw/acpi/ich9.c
>@@ -39,6 +39,8 @@
> #include "hw/southbridge/ich9.h"
> #include "hw/mem/pc-dimm.h"
> #include "hw/mem/nvdimm.h"
>+#include "hw/xen/xen.h"
>+#include "sysemu/xen.h"
> 
> //#define DEBUG
> 
>@@ -67,6 +69,10 @@ static void ich9_gpe_writeb(void *opaque, hwaddr addr, 
>uint64_t val,
> ICH9LPCPMRegs *pm = opaque;
> acpi_gpe_ioport_writeb(>acpi_regs, addr, val);
> acpi_update_sci(>acpi_regs, pm->irq);
>+
>+if (xen_enabled()) {
>+acpi_pcihp_reset(>acpi_pci_hotplug);
>+}
> }
> 
> static const MemoryRegionOps ich9_gpe_ops = {
>@@ -137,7 +143,8 @@ static int ich9_pm_post_load(void *opaque, int version_id)
> {
> ICH9LPCPMRegs *pm = opaque;
> uint32_t pm_io_base = pm->pm_io_base;
>-pm->pm_io_base = 0;
>+if (!xen_enabled())
>+pm->pm_io_base = 0;
> ich9_pm_iospace_update(pm, pm_io_base);
> return 0;
> }
>@@ -268,7 +275,10 @@ static void pm_reset(void *opaque)
> acpi_pm1_evt_reset(>acpi_regs);
> acpi_pm1_cnt_reset(>acpi_regs);
> acpi_pm_tmr_reset(>acpi_regs);
>-acpi_gpe_reset(>acpi_regs);
>+/* Noticed guest freezing in xen when this was reset after S3. */
>+if (!xen_enabled()) {
>+acpi_gpe_reset(>acpi_regs);
>+}

I wonder why this seems to work with PIIX?

I'd rather try to keep the Xen impact on the device model as low as possible, 
ideally as low as in the PIIX4 ACPI device model.

> 
> pm->smi_en = 0;
> if (!pm->smm_enabled) {
>@@ -316,7 +326,7 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
>qemu_irq sci_irq)
> acpi_pm_tco_init(>tco_regs, >io);
> }
> 
>-if (pm->acpi_pci_hotplug.use_acpi_hotplug_bridge) {
>+if (pm->acpi_pci_hotplug.use_acpi_hotplug_bridge || xen_enabled()) {
> acpi_pcihp_init(OBJECT(lpc_pci),
> >acpi_pci_hotplug,
> pci_get_bus(lpc_pci),
>@@ -332,10 +342,14 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
>qemu_irq sci_irq)
> pm->powerdown_notifier.notify = pm_powerdown_req;
> qemu_register_powerdown_notifier(>powerdown_notifier);
> 
>+if (xen_enabled()) {
>+acpi_set_pci_info(true);
>+}
>+
> legacy_acpi_cpu_hotplug_init(pci_address_space_io(lpc_pci),
> OBJECT(lpc_pci), >gpe_cpu, ICH9_CPU_HOTPLUG_IO_BASE);
> 
>-if (pm->acpi_memory_hotplug.is_enabled) {
>+if (pm->acpi_memory_hotplug.is_enabled || xen_enabled()) {
> acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), 
> OBJECT(lpc_pci),
>  >acpi_memory_hotplug,
>  ACPI_MEMORY_HOTPLUG_BASE);
>diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
>index cdd6f775a1..5b065d670c 100644
>--- a/hw/acpi/pcihp.c
>+++ b/hw/acpi/pcihp.c
>@@ -40,6 +40,7 @@
> #include "qapi/error.h"
> #include "qom/qom-qobject.h"
> #include "trace.h"
>+#include "sysemu/xen.h"
> 
> #define ACPI_PCIHP_SIZE 0x0018
> #define PCI_UP_BASE 0x
>@@ -84,7 +85,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
> bool is_bridge = IS_PCI_BRIDGE(br);
> 
> /* hotplugged bridges can't be described in ACPI ignore them */
>-if (qbus_is_hotpluggable(BUS(bus))) {
>+/* Xen requires hotplugging to the root device, even on the Q35 

Re: [PULL 00/20] tricore queue

2023-06-21 Thread Richard Henderson

On 6/21/23 18:14, Bastian Koppelmann wrote:

The following changes since commit c5ffd16ba4c8fd3601742cc9d2b3cff03995dd5d:

   Revert "cputlb: Restrict SavedIOTLB to system emulation" (2023-06-21 
07:19:46 +0200)

are available in the Git repository at:

   https://github.com/bkoppelmann/qemu.git  tags/pull-tricore-20230621-1

for you to fetch changes up to a9c37abdff65a07d0191123a21d318c4d8cc7f33:

   target/tricore: Fix ICR.IE offset in RESTORE insn (2023-06-21 18:09:54 +0200)


- Implement privilege levels for TriCore
- Fix missing REG_PAIR() for insns using two 32 regs
- Fix erroneously saving PSW.CDC on CALL insns
- Added some missing v1.6.2 insns


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH 12/42] migration-test: Enable back ignore-shared test

2023-06-21 Thread Juan Quintela
Peter Xu  wrote:
> On Wed, Jun 21, 2023 at 09:38:08PM +0200, Juan Quintela wrote:
>> Peter Xu  wrote:
>> > On Fri, Jun 09, 2023 at 12:49:13AM +0200, Juan Quintela wrote:
>> >> It failed on aarch64 tcg, lets see if that is still the case.
>> >> 
>> >> Signed-off-by: Juan Quintela 
>> >
>> > According to the history:
>> >
>> > https://lore.kernel.org/all/20190305180635.GA3803@work-vm/
>> >
>> > It's never enabled, and not sure whether Yury followed it up.  Juan: have
>> > you tried it out on aarch64 before enabling it again?  I assume we rely on
>> > the previous patch but that doesn't even sound like aarch64 specific.  I
>> > worry it'll just keep failing on aarch64.

>> On multifd+cancel.
>> 
>> I have no been able to ever get ignore_shared to fail on my machine.
>> But I didn't tested aarch64 TCG in the past so hard, and in x86_64 it
>> has always worked for me.
>
> Thanks a lot, Juan.
>
> Do you mean master is broken with QEMU_TEST_FLAKY_TESTS=1?  And after the
> whole series applied we cannot trigger issue in the few hours test even
> with it?
>
> Shall we wait for another 1-2 days to see whether Yury would comment
> (before you repost)?  Otherwise I agree if it survives your few-hours test
> we should give it a try - at least according to Dave's comment before it
> was failing easily, but it is not now on the test bed.
>
> Maybe it's still just hidden, but in that case I also agree enabling it in
> the repo is the simplest way to reproduce the failure again, if we still
> ever want to enable it one day..

Sending v2.

In that series, the re-enablement of multifd+cancel and of ignore_shared
are the last two patches.

So I am going to wait until all the other patches are in and the people
who complained the most (Peter Maydell, Daniel and Thomas) have tested
their cases and reported back.  Hint, hint.

Later, Juan.




[PATCH v4 5/8] gdbstub: Report the actual qemu-user pid

2023-06-21 Thread Ilya Leoshkevich
Currently qemu-user reports pid 1 to GDB. Resolve the TODO and report
the actual PID. Using getpid() relies on the assumption that there is
only one GDBProcess. Add an assertion to make sure that future changes
don't break it.
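
For illustration only, the PID choice described above can be modelled in a
few lines of Python. This is a simplified sketch of the logic and not the
actual C code; next_default_pid() and its parameters are hypothetical names
introduced for this example.

```python
import os

UINT32_MAX = 0xFFFFFFFF


def next_default_pid(existing_pids, user_only):
    """Pick the PID for a newly created default GDB process.

    existing_pids: PIDs of already registered processes, in creation order.
    user_only: True models qemu-user, False models system emulation.
    """
    if user_only:
        # qemu-user: exactly one process is expected, so report the real PID.
        assert len(existing_pids) == 0
        return os.getpid()
    # System emulation: continue after the PID of the last registered process.
    pid = existing_pids[-1] if existing_pids else 0
    assert pid < UINT32_MAX  # a free PID slot must remain
    return pid + 1
```

The assertion in the user-only branch mirrors the assumption stated above:
getpid() is only meaningful while there is a single GDBProcess.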

Reviewed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 
---
 gdbstub/gdbstub.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 9139fec92a..c7e3ee71f2 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -202,13 +202,16 @@ void gdb_memtox(GString *buf, const char *mem, int len)
 
 static uint32_t gdb_get_cpu_pid(CPUState *cpu)
 {
-/* TODO: In user mode, we should use the task state PID */
+#ifdef CONFIG_USER_ONLY
+return getpid();
+#else
 if (cpu->cluster_index == UNASSIGNED_CLUSTER_INDEX) {
 /* Return the default process' PID */
 int index = gdbserver_state.process_num - 1;
 return gdbserver_state.processes[index].pid;
 }
 return cpu->cluster_index + 1;
+#endif
 }
 
 GDBProcess *gdb_get_process(uint32_t pid)
@@ -2146,19 +2149,25 @@ void gdb_read_byte(uint8_t ch)
 void gdb_create_default_process(GDBState *s)
 {
 GDBProcess *process;
-int max_pid = 0;
+int pid;
 
+#ifdef CONFIG_USER_ONLY
+assert(gdbserver_state.process_num == 0);
+pid = getpid();
+#else
 if (gdbserver_state.process_num) {
-max_pid = s->processes[s->process_num - 1].pid;
+pid = s->processes[s->process_num - 1].pid;
+} else {
+pid = 0;
 }
+/* We need an available PID slot for this process */
+assert(pid < UINT32_MAX);
+pid++;
+#endif
 
 s->processes = g_renew(GDBProcess, s->processes, ++s->process_num);
 process = &s->processes[s->process_num - 1];
-
-/* We need an available PID slot for this process */
-assert(max_pid < UINT32_MAX);
-
-process->pid = max_pid + 1;
+process->pid = pid;
 process->attached = false;
 process->target_xml[0] = '\0';
 }
-- 
2.40.1




[PATCH v4 8/8] tests/tcg: Add a test for info proc mappings

2023-06-21 Thread Ilya Leoshkevich
Add a small test to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/multiarch/Makefile.target   |  9 ++-
 .../multiarch/gdbstub/test-proc-mappings.py   | 65 +++
 2 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/multiarch/gdbstub/test-proc-mappings.py

diff --git a/tests/tcg/multiarch/Makefile.target 
b/tests/tcg/multiarch/Makefile.target
index 373db69648..43bddeaf21 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -81,6 +81,13 @@ run-gdbstub-qxfer-auxv-read: sha1
--bin $< --test 
$(MULTIARCH_SRC)/gdbstub/test-qxfer-auxv-read.py, \
basic gdbstub qXfer:auxv:read support)
 
+run-gdbstub-proc-mappings: sha1
+   $(call run-test, $@, $(GDB_SCRIPT) \
+   --gdb $(HAVE_GDB_BIN) \
+   --qemu $(QEMU) --qargs "$(QEMU_OPTS)" \
+   --bin $< --test $(MULTIARCH_SRC)/gdbstub/test-proc-mappings.py, 
\
+   proc mappings support)
+
 run-gdbstub-thread-breakpoint: testthread
$(call run-test, $@, $(GDB_SCRIPT) \
--gdb $(HAVE_GDB_BIN) \
@@ -97,7 +104,7 @@ run-gdbstub-%:
$(call skip-test, "gdbstub test $*", "need working gdb")
 endif
 EXTRA_RUNS += run-gdbstub-sha1 run-gdbstub-qxfer-auxv-read \
- run-gdbstub-thread-breakpoint
+ run-gdbstub-proc-mappings run-gdbstub-thread-breakpoint
 
 # ARM Compatible Semi Hosting Tests
 #
diff --git a/tests/tcg/multiarch/gdbstub/test-proc-mappings.py 
b/tests/tcg/multiarch/gdbstub/test-proc-mappings.py
new file mode 100644
index 00..a23bbcee7f
--- /dev/null
+++ b/tests/tcg/multiarch/gdbstub/test-proc-mappings.py
@@ -0,0 +1,65 @@
+"""Test that gdbstub has access to proc mappings.
+
+This runs as a sourced script (via -x, via run-test.py)."""
+from __future__ import print_function
+import gdb
+import sys
+
+
+n_failures = 0
+
+
+def report(cond, msg):
+    """Report success/fail of a test"""
+    if cond:
+        print("PASS: {}".format(msg))
+    else:
+        print("FAIL: {}".format(msg))
+        global n_failures
+        n_failures += 1
+
+
+def run_test():
+    """Run through the tests one by one"""
+    try:
+        mappings = gdb.execute("info proc mappings", False, True)
+    except gdb.error as exc:
+        exc_str = str(exc)
+        if "Not supported on this target." in exc_str:
+            # Detect failures due to an outstanding issue with how GDB handles
+            # the x86_64 QEMU's target.xml, which does not contain the
+            # definition of orig_rax. Skip the test in this case.
+            print("SKIP: {}".format(exc_str))
+            return
+        raise
+    report(isinstance(mappings, str), "Fetched the mappings from the inferior")
+    report("/sha1" in mappings, "Found the test binary name in the mappings")
+
+
+def main():
+    """Prepare the environment and run through the tests"""
+    try:
+        inferior = gdb.selected_inferior()
+        print("ATTACHED: {}".format(inferior.architecture().name()))
+    except (gdb.error, AttributeError):
+        print("SKIPPING (not connected)")
+        exit(0)
+
+    if gdb.parse_and_eval('$pc') == 0:
+        print("SKIP: PC not set")
+        exit(0)
+
+    try:
+        # These are not very useful in scripts
+        gdb.execute("set pagination off")
+        gdb.execute("set confirm off")
+
+        # Run the actual tests
+        run_test()
+    except gdb.error:
+        report(False, "GDB Exception: {}".format(sys.exc_info()[0]))
+    print("All tests complete: %d failures" % n_failures)
+    exit(n_failures)
+
+
+main()
-- 
2.40.1




[PATCH v4 0/8] gdbstub: Add support for info proc mappings

2023-06-21 Thread Ilya Leoshkevich
v3: https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg01311.html
v3 -> v4: Fix the 32-bit build (Alex).
  Enable the test on all architectures and ignore certain
  expected failures (Alex). I tried this with the latest
  gdb-multiarch and it works. The only skip is on x86_64,
  as expected.

v2: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg06837.html
v2 -> v3: Use openat() instead of safe_openat() (new patch: 2/8).
  Add /proc/self/smaps emulation (new patch: 3/8).
  With these 2 changes, the minor issues previously mentioned in
  the patch 6/8 are gone.

v1: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg02614.html
v1 -> v2: Reword the 5/6 commit message (Dominik).
  Add R-bs.
  Patches that need review:
  4/6 gdbstub: Add support for info proc mappings
  6/6 tests/tcg: Add a test for info proc mappings

Hi,

this series partially implements the Host I/O feature of the GDB Remote
Serial Protocol in order to make generate-core-file work with qemu-user.
It borrows heavily from the abandoned patch by Dominik [1], hence 6/8
carries the respective Co-developed-by: tag. I also peeked at
gdbserver/hostio.cc quite a few times.

The changes compared to Dominik's patch are:

- Implement readlink.
- Move the main functionality to user-target.c.
- Allocate buffers on heap.
- Add a test.
- Update gdb.rst.
- Split refactorings to the existing code into separate patches.
- Rename do_openat() to do_guest_openat().
- Do not retry pread(), since GDB is capable of doing it itself.
- Add an extra sanity check to gdb_handle_query_xfer_exec_file().
- Replace citations of the spec by a single link.

Best regards,
Ilya

Ilya Leoshkevich (8):
  linux-user: Expose do_guest_openat() and do_guest_readlink()
  linux-user: Add "safe" parameter to do_guest_openat()
  linux-user: Emulate /proc/self/smaps
  gdbstub: Expose gdb_get_process() and gdb_get_first_cpu_in_process()
  gdbstub: Report the actual qemu-user pid
  gdbstub: Add support for info proc mappings
  docs: Document security implications of debugging
  tests/tcg: Add a test for info proc mappings

 docs/system/gdb.rst   |  15 ++
 gdbstub/gdbstub.c |  86 ---
 gdbstub/internals.h   |   7 +
 gdbstub/user-target.c | 139 ++
 linux-user/qemu.h |   3 +
 linux-user/syscall.c  | 128 +---
 tests/tcg/multiarch/Makefile.target   |   9 +-
 .../multiarch/gdbstub/test-proc-mappings.py   |  65 
 8 files changed, 409 insertions(+), 43 deletions(-)
 create mode 100644 tests/tcg/multiarch/gdbstub/test-proc-mappings.py

-- 
2.40.1




[PATCH v4 7/8] docs: Document security implications of debugging

2023-06-21 Thread Ilya Leoshkevich
Now that the GDB stub explicitly implements reading host files (note
that it was already possible by changing the emulated code to open and
read those files), concerns may arise that it undermines security.

Document the status quo, which is that the users are already
responsible for securing the GDB connection themselves.
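
As a rough illustration of what "securing the connection" can mean for a
unix socket, the sketch below checks that a socket file grants no access to
group or others. The helper name socket_is_private() is hypothetical and
the check is deliberately minimal. On Linux, connecting requires write
permission on the socket file, but some systems ignore socket file
permissions, so restricting the containing directory is the more portable
approach.

```python
import os
import stat


def socket_is_private(path):
    """Return True if `path` is a unix socket accessible only by its owner."""
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        return False
    # No permission bits may be set for group or others.
    return (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```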

Reviewed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 
---
 docs/system/gdb.rst | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
index 7d3718deef..9906991b84 100644
--- a/docs/system/gdb.rst
+++ b/docs/system/gdb.rst
@@ -214,3 +214,18 @@ The memory mode can be checked by sending the following 
command:
 
 ``maintenance packet Qqemu.PhyMemMode:0``
 This will change it back to normal memory mode.
+
+Security considerations
+===
+
+Connecting to the GDB socket allows running arbitrary code inside the guest;
+in case of the TCG emulation, which is not considered a security boundary, this
+also means running arbitrary code on the host. Additionally, when debugging
+qemu-user, it allows directly downloading any file readable by QEMU from the
+host.
+
+The GDB socket is not protected by authentication, authorization or encryption.
+It is therefore a responsibility of the user to make sure that only authorized
+clients can connect to it, e.g., by using a unix socket with proper
+permissions, or by opening a TCP socket only on interfaces that are not
+reachable by potential attackers.
-- 
2.40.1




[PATCH v4 6/8] gdbstub: Add support for info proc mappings

2023-06-21 Thread Ilya Leoshkevich
Currently GDB's generate-core-file command doesn't work well with
qemu-user: the resulting dumps are huge [1] and at the same time
incomplete (argv and envp are missing). The reason is that GDB has no
access to proc mappings and therefore has to fall back to using
heuristics for discovering them. This is, in turn, because qemu-user
does not implement the Host I/O feature of the GDB Remote Serial
Protocol.

Implement vFile:{open,close,pread,readlink} and also
qXfer:exec-file:read+. With that, generate-core-file begins to work on
aarch64 and s390x.
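
To make the wire format concrete, here is a small Python sketch of how a
client could build a vFile:open request. It assumes the usual Remote Serial
Protocol framing ($payload#checksum, with the checksum being the modulo-256
sum of the payload bytes as two hex digits) and the Host I/O convention of a
hex-encoded filename followed by hex flags and mode. The helper names are
made up for this example and are not part of the patch.

```python
def rsp_frame(payload):
    """Wrap a payload in RSP framing: $payload#checksum."""
    checksum = sum(payload.encode("ascii")) % 256
    return "${}#{:02x}".format(payload, checksum)


def vfile_open(filename, flags, mode):
    """Build a vFile:open request for `filename` with numeric flags and mode."""
    # The filename travels hex-encoded; flags and mode are hex integers.
    hex_name = filename.encode("utf-8").hex()
    return rsp_frame("vFile:open:{},{:x},{:x}".format(hex_name, flags, mode))
```

The three comma-separated fields correspond to the "s,L,L0" schema used for
the File:open: entry in the command table.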

[1] https://sourceware.org/pipermail/gdb-patches/2023-May/199432.html

Co-developed-by: Dominik 'Disconnect3d' Czarnota 
Signed-off-by: Ilya Leoshkevich 
---
 gdbstub/gdbstub.c |  45 +-
 gdbstub/internals.h   |   5 ++
 gdbstub/user-target.c | 139 ++
 3 files changed, 187 insertions(+), 2 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index c7e3ee71f2..d2efefd352 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -1337,6 +1337,36 @@ static const GdbCmdParseEntry gdb_v_commands_table[] = {
 .cmd = "Kill;",
 .cmd_startswith = 1
 },
+#ifdef CONFIG_USER_ONLY
+/*
+ * Host I/O Packets. See [1] for details.
+ * [1] https://sourceware.org/gdb/onlinedocs/gdb/Host-I_002fO-Packets.html
+ */
+{
+.handler = gdb_handle_v_file_open,
+.cmd = "File:open:",
+.cmd_startswith = 1,
+.schema = "s,L,L0"
+},
+{
+.handler = gdb_handle_v_file_close,
+.cmd = "File:close:",
+.cmd_startswith = 1,
+.schema = "l0"
+},
+{
+.handler = gdb_handle_v_file_pread,
+.cmd = "File:pread:",
+.cmd_startswith = 1,
+.schema = "l,L,L0"
+},
+{
+.handler = gdb_handle_v_file_readlink,
+.cmd = "File:readlink:",
+.cmd_startswith = 1,
+.schema = "s0"
+},
+#endif
 };
 
 static void handle_v_commands(GArray *params, void *user_ctx)
@@ -1482,11 +1512,14 @@ static void handle_query_supported(GArray *params, void 
*user_ctx)
 ";ReverseStep+;ReverseContinue+");
 }
 
-#if defined(CONFIG_USER_ONLY) && defined(CONFIG_LINUX)
+#if defined(CONFIG_USER_ONLY)
+#if defined(CONFIG_LINUX)
 if (gdbserver_state.c_cpu->opaque) {
 g_string_append(gdbserver_state.str_buf, ";qXfer:auxv:read+");
 }
 #endif
+g_string_append(gdbserver_state.str_buf, ";qXfer:exec-file:read+");
+#endif
 
 if (params->len &&
 strstr(get_param(params, 0)->data, "multiprocess+")) {
@@ -1625,13 +1658,21 @@ static const GdbCmdParseEntry gdb_gen_query_table[] = {
 .cmd_startswith = 1,
 .schema = "s:l,l0"
 },
-#if defined(CONFIG_USER_ONLY) && defined(CONFIG_LINUX)
+#if defined(CONFIG_USER_ONLY)
+#if defined(CONFIG_LINUX)
 {
 .handler = gdb_handle_query_xfer_auxv,
 .cmd = "Xfer:auxv:read::",
 .cmd_startswith = 1,
 .schema = "l,l0"
 },
+#endif
+{
+.handler = gdb_handle_query_xfer_exec_file,
+.cmd = "Xfer:exec-file:read:",
+.cmd_startswith = 1,
+.schema = "l:l,l0"
+},
 #endif
 {
 .handler = gdb_handle_query_attached,
diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 25e4d5eeaa..f2b46cce41 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -189,6 +189,11 @@ typedef union GdbCmdVariant {
 void gdb_handle_query_rcmd(GArray *params, void *user_ctx); /* softmmu */
 void gdb_handle_query_offsets(GArray *params, void *user_ctx); /* user */
 void gdb_handle_query_xfer_auxv(GArray *params, void *user_ctx); /*user */
+void gdb_handle_v_file_open(GArray *params, void *user_ctx); /* user */
+void gdb_handle_v_file_close(GArray *params, void *user_ctx); /* user */
+void gdb_handle_v_file_pread(GArray *params, void *user_ctx); /* user */
+void gdb_handle_v_file_readlink(GArray *params, void *user_ctx); /* user */
+void gdb_handle_query_xfer_exec_file(GArray *params, void *user_ctx); /* user 
*/
 
 void gdb_handle_query_attached(GArray *params, void *user_ctx); /* both */
 
diff --git a/gdbstub/user-target.c b/gdbstub/user-target.c
index fa0e59ec9a..5f0098c806 100644
--- a/gdbstub/user-target.c
+++ b/gdbstub/user-target.c
@@ -11,6 +11,10 @@
 #include "exec/gdbstub.h"
 #include "qemu.h"
 #include "internals.h"
+#ifdef CONFIG_LINUX
+#include "linux-user/loader.h"
+#include "linux-user/qemu.h"
+#endif
 
 /*
  * Map target signal numbers to GDB protocol signal numbers and vice
@@ -281,3 +285,138 @@ void gdb_handle_query_xfer_auxv(GArray *params, void 
*user_ctx)
   gdbserver_state.str_buf->len, true);
 }
 #endif
+
+static const char *get_filename_param(GArray *params, int i)
+{
+const char *hex_filename = get_param(params, i)->data;
+gdb_hextomem(gdbserver_state.mem_buf, hex_filename,
+ strlen(hex_filename) / 2);
+g_byte_array_append(gdbserver_state.mem_buf, (const guint8 *)"", 1);
+   

[PATCH v4 4/8] gdbstub: Expose gdb_get_process() and gdb_get_first_cpu_in_process()

2023-06-21 Thread Ilya Leoshkevich
These functions will be needed by user-target.c in order to retrieve
the name of the executable.

Reviewed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 
---
 gdbstub/gdbstub.c   | 16 
 gdbstub/internals.h |  2 ++
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index be18568d0a..9139fec92a 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -211,7 +211,7 @@ static uint32_t gdb_get_cpu_pid(CPUState *cpu)
 return cpu->cluster_index + 1;
 }
 
-static GDBProcess *gdb_get_process(uint32_t pid)
+GDBProcess *gdb_get_process(uint32_t pid)
 {
 int i;
 
@@ -247,7 +247,7 @@ static CPUState *find_cpu(uint32_t thread_id)
 return NULL;
 }
 
-static CPUState *get_first_cpu_in_process(GDBProcess *process)
+CPUState *gdb_get_first_cpu_in_process(GDBProcess *process)
 {
 CPUState *cpu;
 
@@ -325,7 +325,7 @@ static CPUState *gdb_get_cpu(uint32_t pid, uint32_t tid)
 return NULL;
 }
 
-return get_first_cpu_in_process(process);
+return gdb_get_first_cpu_in_process(process);
 } else {
 /* a specific thread */
 cpu = find_cpu(tid);
@@ -354,7 +354,7 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 size_t len;
 int i;
 const char *name;
-CPUState *cpu = get_first_cpu_in_process(process);
+CPUState *cpu = gdb_get_first_cpu_in_process(process);
 CPUClass *cc = CPU_GET_CLASS(cpu);
 
 len = 0;
@@ -490,7 +490,7 @@ void gdb_register_coprocessor(CPUState *cpu,
 
 static void gdb_process_breakpoint_remove_all(GDBProcess *p)
 {
-CPUState *cpu = get_first_cpu_in_process(p);
+CPUState *cpu = gdb_get_first_cpu_in_process(p);
 
 while (cpu) {
 gdb_breakpoint_remove_all(cpu);
@@ -653,7 +653,7 @@ static int gdb_handle_vcont(const char *p)
 goto out;
 }
 
-cpu = get_first_cpu_in_process(process);
+cpu = gdb_get_first_cpu_in_process(process);
 while (cpu) {
 if (newstates[cpu->cpu_index] == 1) {
 newstates[cpu->cpu_index] = cur_action;
@@ -1280,7 +1280,7 @@ static void handle_v_attach(GArray *params, void 
*user_ctx)
 goto cleanup;
 }
 
-cpu = get_first_cpu_in_process(process);
+cpu = gdb_get_first_cpu_in_process(process);
 if (!cpu) {
 goto cleanup;
 }
@@ -1403,7 +1403,7 @@ static void handle_query_curr_tid(GArray *params, void 
*user_ctx)
  * first thread).
  */
 process = gdb_get_cpu_process(gdbserver_state.g_cpu);
-cpu = get_first_cpu_in_process(process);
+cpu = gdb_get_first_cpu_in_process(process);
 g_string_assign(gdbserver_state.str_buf, "QC");
 gdb_append_thread_id(cpu, gdbserver_state.str_buf);
 gdb_put_strbuf();
diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 33d21d6488..25e4d5eeaa 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -129,6 +129,8 @@ void gdb_read_byte(uint8_t ch);
  */
 bool gdb_got_immediate_ack(void);
 /* utility helpers */
+GDBProcess *gdb_get_process(uint32_t pid);
+CPUState *gdb_get_first_cpu_in_process(GDBProcess *process);
 CPUState *gdb_first_attached_cpu(void);
 void gdb_append_thread_id(CPUState *cpu, GString *buf);
 int gdb_get_cpu_index(CPUState *cpu);
-- 
2.40.1




[PATCH v4 1/8] linux-user: Expose do_guest_openat() and do_guest_readlink()

2023-06-21 Thread Ilya Leoshkevich
These functions will be required by the GDB stub in order to provide
the guest view of /proc to GDB.
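
The behaviour of the readlink helper can be sketched in Python as follows.
This is a simplified model under stated assumptions: only the literal
/proc/self/exe spelling is special-cased here, and guest_readlink() with its
exec_path parameter is a hypothetical name for this example.

```python
import errno
import os


def guest_readlink(pathname, bufsiz, exec_path):
    """Return the link target as bytes, truncated to bufsiz like readlink(2)."""
    if bufsiz <= 0:
        raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))
    if pathname == "/proc/self/exe":
        # Report the guest executable instead of the QEMU binary.
        target = os.fsencode(exec_path)
    else:
        target = os.fsencode(os.readlink(pathname))
    # readlink(2) truncates silently and never appends a NUL byte.
    return target[:bufsiz]
```

The caller therefore has to use the returned length and cannot treat the
buffer as a C string.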

Reviewed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Richard Henderson 
---
 linux-user/qemu.h|  3 +++
 linux-user/syscall.c | 54 
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 92f9f5af41..a5830ec239 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -165,6 +165,9 @@ typedef struct TaskState {
 } TaskState;
 
 abi_long do_brk(abi_ulong new_brk);
+int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
+int flags, mode_t mode);
+ssize_t do_guest_readlink(const char *pathname, char *buf, size_t bufsiz);
 
 /* user access */
 
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f2cb101d83..fa83737192 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8448,7 +8448,8 @@ static int open_hardware(CPUArchState *cpu_env, int fd)
 }
 #endif
 
-static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, 
int flags, mode_t mode)
+int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
+int flags, mode_t mode)
 {
 struct fake_open {
 const char *filename;
@@ -8520,6 +8521,36 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, 
const char *pathname, int
 return safe_openat(dirfd, path(pathname), flags, mode);
 }
 
+ssize_t do_guest_readlink(const char *pathname, char *buf, size_t bufsiz)
+{
+ssize_t ret;
+
+if (!pathname || !buf) {
+errno = EFAULT;
+return -1;
+}
+
+if (!bufsiz) {
+/* Short circuit this for the magic exe check. */
+errno = EINVAL;
+return -1;
+}
+
+if (is_proc_myself((const char *)pathname, "exe")) {
+/*
+ * Don't worry about sign mismatch as earlier mapping
+ * logic would have thrown a bad address error.
+ */
+ret = MIN(strlen(exec_path), bufsiz);
+/* We cannot NUL terminate the string. */
+memcpy(buf, exec_path, ret);
+} else {
+ret = readlink(path(pathname), buf, bufsiz);
+}
+
+return ret;
+}
+
 static int do_execveat(CPUArchState *cpu_env, int dirfd,
abi_long pathname, abi_long guest_argp,
abi_long guest_envp, int flags)
@@ -8994,7 +9025,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 case TARGET_NR_open:
 if (!(p = lock_user_string(arg1)))
 return -TARGET_EFAULT;
-ret = get_errno(do_openat(cpu_env, AT_FDCWD, p,
+ret = get_errno(do_guest_openat(cpu_env, AT_FDCWD, p,
   target_to_host_bitmask(arg2, 
fcntl_flags_tbl),
   arg3));
 fd_trans_unregister(ret);
@@ -9004,7 +9035,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 case TARGET_NR_openat:
 if (!(p = lock_user_string(arg2)))
 return -TARGET_EFAULT;
-ret = get_errno(do_openat(cpu_env, arg1, p,
+ret = get_errno(do_guest_openat(cpu_env, arg1, p,
   target_to_host_bitmask(arg3, 
fcntl_flags_tbl),
   arg4));
 fd_trans_unregister(ret);
@@ -10229,22 +10260,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 void *p2;
 p = lock_user_string(arg1);
 p2 = lock_user(VERIFY_WRITE, arg2, arg3, 0);
-if (!p || !p2) {
-ret = -TARGET_EFAULT;
-} else if (!arg3) {
-/* Short circuit this for the magic exe check. */
-ret = -TARGET_EINVAL;
-} else if (is_proc_myself((const char *)p, "exe")) {
-/*
- * Don't worry about sign mismatch as earlier mapping
- * logic would have thrown a bad address error.
- */
-ret = MIN(strlen(exec_path), arg3);
-/* We cannot NUL terminate the string. */
-memcpy(p2, exec_path, ret);
-} else {
-ret = get_errno(readlink(path(p), p2, arg3));
-}
+ret = get_errno(do_guest_readlink(p, p2, arg3));
 unlock_user(p2, arg2, ret);
 unlock_user(p, arg1, 0);
 }
-- 
2.40.1




[PATCH v4 2/8] linux-user: Add "safe" parameter to do_guest_openat()

2023-06-21 Thread Ilya Leoshkevich
gdbstub cannot meaningfully handle QEMU_ERESTARTSYS, and it doesn't
need to. Add a parameter to do_guest_openat() that makes it use
openat() instead of safe_openat(), so that it becomes usable from
gdbstub.

Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Richard Henderson 
---
 linux-user/qemu.h|  2 +-
 linux-user/syscall.c | 18 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index a5830ec239..9b8e0860d7 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -166,7 +166,7 @@ typedef struct TaskState {
 
 abi_long do_brk(abi_ulong new_brk);
 int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
-int flags, mode_t mode);
+int flags, mode_t mode, bool safe);
 ssize_t do_guest_readlink(const char *pathname, char *buf, size_t bufsiz);
 
 /* user access */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index fa83737192..ecd9f5e23d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8449,7 +8449,7 @@ static int open_hardware(CPUArchState *cpu_env, int fd)
 #endif
 
 int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *pathname,
-int flags, mode_t mode)
+int flags, mode_t mode, bool safe)
 {
 struct fake_open {
 const char *filename;
@@ -8476,7 +8476,11 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
const char *pathname,
 };
 
 if (is_proc_myself(pathname, "exe")) {
-return safe_openat(dirfd, exec_path, flags, mode);
+if (safe) {
+return safe_openat(dirfd, exec_path, flags, mode);
+} else {
+return openat(dirfd, exec_path, flags, mode);
+}
 }
 
 for (fake_open = fakes; fake_open->filename; fake_open++) {
@@ -8518,7 +8522,11 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
const char *pathname,
 return fd;
 }
 
-return safe_openat(dirfd, path(pathname), flags, mode);
+if (safe) {
+return safe_openat(dirfd, path(pathname), flags, mode);
+} else {
+return openat(dirfd, path(pathname), flags, mode);
+}
 }
 
 ssize_t do_guest_readlink(const char *pathname, char *buf, size_t bufsiz)
@@ -9027,7 +9035,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 return -TARGET_EFAULT;
 ret = get_errno(do_guest_openat(cpu_env, AT_FDCWD, p,
   target_to_host_bitmask(arg2, 
fcntl_flags_tbl),
-  arg3));
+  arg3, true));
 fd_trans_unregister(ret);
 unlock_user(p, arg1, 0);
 return ret;
@@ -9037,7 +9045,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
 return -TARGET_EFAULT;
 ret = get_errno(do_guest_openat(cpu_env, arg1, p,
   target_to_host_bitmask(arg3, 
fcntl_flags_tbl),
-  arg4));
+  arg4, true));
 fd_trans_unregister(ret);
 unlock_user(p, arg2, 0);
 return ret;
-- 
2.40.1




[PATCH v4 3/8] linux-user: Emulate /proc/self/smaps

2023-06-21 Thread Ilya Leoshkevich
/proc/self/smaps is an extension of /proc/self/maps: it provides the
same lines, plus additional information about each range.

GDB uses /proc/self/smaps when available, which means that
generate-core-file tries it first before falling back to
/proc/self/maps. This, in turn, causes it to dump the host mappings,
since /proc/self/smaps is not emulated and is just passed through.

Fix by emulating /proc/self/smaps. Provide true values only for
Size, KernelPageSize, MMUPageSize and VmFlags. Leave all other values
at 0, which is a valid conservative estimate.

Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Richard Henderson 
---
 linux-user/syscall.c | 58 +++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ecd9f5e23d..08162cc966 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8042,7 +8042,36 @@ static int open_self_cmdline(CPUArchState *cpu_env, int 
fd)
 return 0;
 }
 
-static int open_self_maps(CPUArchState *cpu_env, int fd)
+static void show_smaps(int fd, unsigned long size)
+{
+unsigned long page_size_kb = TARGET_PAGE_SIZE >> 10;
+unsigned long size_kb = size >> 10;
+
+dprintf(fd, "Size:  %lu kB\n"
+"KernelPageSize:%lu kB\n"
+"MMUPageSize:   %lu kB\n"
+"Rss:   0 kB\n"
+"Pss:   0 kB\n"
+"Pss_Dirty: 0 kB\n"
+"Shared_Clean:  0 kB\n"
+"Shared_Dirty:  0 kB\n"
+"Private_Clean: 0 kB\n"
+"Private_Dirty: 0 kB\n"
+"Referenced:0 kB\n"
+"Anonymous: 0 kB\n"
+"LazyFree:  0 kB\n"
+"AnonHugePages: 0 kB\n"
+"ShmemPmdMapped:0 kB\n"
+"FilePmdMapped: 0 kB\n"
+"Shared_Hugetlb:0 kB\n"
+"Private_Hugetlb:   0 kB\n"
+"Swap:  0 kB\n"
+"SwapPss:   0 kB\n"
+"Locked:0 kB\n"
+"THPeligible:0\n", size_kb, page_size_kb, page_size_kb);
+}
+
+static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
 {
 CPUState *cpu = env_cpu(cpu_env);
 TaskState *ts = cpu->opaque;
@@ -8089,6 +8118,18 @@ static int open_self_maps(CPUArchState *cpu_env, int fd)
 } else {
 dprintf(fd, "\n");
 }
+if (smaps) {
+show_smaps(fd, max - min);
+dprintf(fd, "VmFlags:%s%s%s%s%s%s%s%s\n",
+(flags & PAGE_READ) ? " rd" : "",
+(flags & PAGE_WRITE_ORG) ? " wr" : "",
+(flags & PAGE_EXEC) ? " ex" : "",
+e->is_priv ? "" : " sh",
+(flags & PAGE_READ) ? " mr" : "",
+(flags & PAGE_WRITE_ORG) ? " mw" : "",
+(flags & PAGE_EXEC) ? " me" : "",
+e->is_priv ? "" : " ms");
+}
 }
 }
 
@@ -8103,11 +8144,25 @@ static int open_self_maps(CPUArchState *cpu_env, int fd)
 " --xp  00:00 0",
 TARGET_VSYSCALL_PAGE, TARGET_VSYSCALL_PAGE + 
TARGET_PAGE_SIZE);
 dprintf(fd, "%*s%s\n", 73 - count, "",  "[vsyscall]");
+if (smaps) {
+show_smaps(fd, TARGET_PAGE_SIZE);
+dprintf(fd, "VmFlags: ex\n");
+}
 #endif
 
 return 0;
 }
 
+static int open_self_maps(CPUArchState *cpu_env, int fd)
+{
+return open_self_maps_1(cpu_env, fd, false);
+}
+
+static int open_self_smaps(CPUArchState *cpu_env, int fd)
+{
+return open_self_maps_1(cpu_env, fd, true);
+}
+
 static int open_self_stat(CPUArchState *cpu_env, int fd)
 {
 CPUState *cpu = env_cpu(cpu_env);
@@ -8459,6 +8514,7 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
const char *pathname,
 const struct fake_open *fake_open;
 static const struct fake_open fakes[] = {
 { "maps", open_self_maps, is_proc_myself },
+{ "smaps", open_self_smaps, is_proc_myself },
 { "stat", open_self_stat, is_proc_myself },
 { "auxv", open_self_auxv, is_proc_myself },
 { "cmdline", open_self_cmdline, is_proc_myself },
-- 
2.40.1
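
To make the output format of the smaps patch above easier to follow, here is a small standalone sketch of how the Size value and the VmFlags line are derived. It mirrors the format string from the patch, but the SKETCH_PAGE_* bits and both function names are invented for illustration and do not match QEMU's PAGE_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits; QEMU's PAGE_* values are not reproduced here. */
enum {
    SKETCH_PAGE_READ  = 1 << 0,
    SKETCH_PAGE_WRITE = 1 << 1,
    SKETCH_PAGE_EXEC  = 1 << 2,
};

/* Size in kB of the mapping [min, max), as printed on the "Size:" line. */
static unsigned long sketch_size_kb(unsigned long min, unsigned long max)
{
    assert(max >= min);
    return (max - min) >> 10;
}

/*
 * Build the VmFlags line: each permission yields its own flag plus the
 * matching "may" flag (rd/mr, wr/mw, ex/me); shared mappings add sh/ms.
 * Returns the number of characters written, as snprintf() does.
 */
static int sketch_format_vmflags(char *out, size_t outsz, int flags,
                                 bool is_priv)
{
    assert(out != NULL && outsz > 0);
    return snprintf(out, outsz, "VmFlags:%s%s%s%s%s%s%s%s\n",
                    (flags & SKETCH_PAGE_READ) ? " rd" : "",
                    (flags & SKETCH_PAGE_WRITE) ? " wr" : "",
                    (flags & SKETCH_PAGE_EXEC) ? " ex" : "",
                    is_priv ? "" : " sh",
                    (flags & SKETCH_PAGE_READ) ? " mr" : "",
                    (flags & SKETCH_PAGE_WRITE) ? " mw" : "",
                    (flags & SKETCH_PAGE_EXEC) ? " me" : "",
                    is_priv ? "" : " ms");
}
```

Note that the flags are grouped by kind rather than by permission, so a private read+exec mapping comes out as "rd ex mr me".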




Re: [PATCH v2 1/2] migration: enfocre multifd and postcopy preempt to be set before incoming

2023-06-21 Thread Juan Quintela
Wei Wang  wrote:
> qemu_start_incoming_migration needs to check the number of multifd
> channels or postcopy ram channels to configure the backlog parameter (i.e.
> the maximum length to which the queue of pending connections for sockfd
> may grow) of listen(). So enforce the usage of postcopy-preempt and
> multifd as below:
> - need to use "-incoming defer" on the destination; and
> - set_capability and set_parameter need to be done before migrate_incoming
>
> Otherwise, disable the use of the features and report error messages to
> remind users to adjust the commands.
>
> Signed-off-by: Wei Wang 
> Reviewed-by: Peter Xu 
> ---
>  migration/options.c | 36 +++-
>  1 file changed, 31 insertions(+), 5 deletions(-)
>

This bit is wrong

> @@ -998,11 +1013,22 @@ bool migrate_params_check(MigrationParameters *params, 
> Error **errp)
>  
>  /* x_checkpoint_delay is now always positive */
>  
> -if (params->has_multifd_channels && (params->multifd_channels < 1)) {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> -   "multifd_channels",
> -   "a value between 1 and 255");
> -return false;
> +if (params->has_multifd_channels) {
> +if (params->multifd_channels < 1) {
> +error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +   "multifd_channels",
> +   "a value between 1 and 255");
> +return false;
> +}
> +if (migrate_incoming_started()) {
> +MigrationState *ms = migrate_get_current();
> +
> +ms->capabilities[MIGRATION_CAPABILITY_MULTIFD] = false;
> +error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +   "multifd_channels",
> +   "must be set before incoming starts");
> +return false;
> +}
>  }
>  
>  if (params->has_multifd_zlib_level &&

# Start of tls tests
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-3655124.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-3655124.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
source,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-5QEX61/src_serial -drive 
file=/tmp/migration-test-5QEX61/bootsect,format=raw2>/dev/null -accel qtest
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-3655124.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-3655124.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
target,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-5QEX61/dest_serial -incoming 
unix:/tmp/migration-test-5QEX61/migsocket -drive 
file=/tmp/migration-test-5QEX61/bootsect,format=raw2>/dev/null -accel qtest
# {
# "error": {
# "class": "GenericError",
# "desc": "Parameter 'multifd_channels' expects must be set before 
incoming starts"
# }
# }
**
ERROR:../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:1259:qtest_vqmp_assert_success_ref:
 assertion failed: (qdict_haskey(response, "return"))
not ok /x86_64/migration/postcopy/recovery/tls/psk - 
ERROR:../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:1259:qtest_vqmp_assert_success_ref:
 assertion failed: (qdict_haskey(response, "return"))
Bail out!
Aborted (core dumped)

This is the test that fails.

qtest_add_func("/migration/postcopy/preempt/plain", 
test_postcopy_preempt);

I am dropping that change and keeping the others, which are right.

I think what we should do is change that check to:


static bool migration_has_started(void)
{
MigrationIncomingState *mis = migration_incoming_get_current();

if (mis->state != MIGRATION_STATUS_NONE) {
   return true;
}
MigrationState *ms = migrate_get_current();
if (ms->state != MIGRATION_STATUS_NONE) {
   return true;
}
return false;
}

And for all the parameters that can't be changed after migration has
started do:

if (params->has_multifd_channels) {
if (params->multifd_channels < 1) {
error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
   "multifd_channels",
   "a value between 1 and 255");
return false;
}
if (migration_has_started()) {
error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
   "multifd_channels",
   "must be set before migration starts");
return false;
}
 }
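
The same idea as a self-contained sketch that compiles outside QEMU, in case it helps the discussion. Everything prefixed with Sketch/sketch_ is invented here: the reduced status enum, the struct that holds both sides' state, and the convention of returning an error string instead of filling an Error **. The upper bound of 255 is checked explicitly only because the sketch uses a plain int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced status enum; only NONE matters for the check. */
typedef enum {
    SKETCH_STATUS_NONE,
    SKETCH_STATUS_SETUP,
    SKETCH_STATUS_ACTIVE,
} SketchStatus;

typedef struct {
    SketchStatus incoming_state;  /* destination side */
    SketchStatus outgoing_state;  /* source side */
} SketchMigration;

/* Migration counts as started once either side has left NONE. */
static bool sketch_migration_has_started(const SketchMigration *m)
{
    assert(m != NULL);
    return m->incoming_state != SKETCH_STATUS_NONE ||
           m->outgoing_state != SKETCH_STATUS_NONE;
}

/* Returns NULL if the value is acceptable, else a static error string. */
static const char *sketch_check_multifd_channels(const SketchMigration *m,
                                                 int channels)
{
    if (channels < 1 || channels > 255) {
        return "a value between 1 and 255";
    }
    if (sketch_migration_has_started(m)) {
        return "must be set before migration starts";
    }
    return NULL;
}
```

The range check runs first, so an out-of-range value is reported as such even when migration is already running.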

For all parameters, can they be changed after migration has started?

compress_level: NO
compress_threads: NO
compress_wait_thread: NO
decompress_threads: NO
throttle_trigger_threshold: MAYBE
cpu_throttle_initial: NO
cpu_throttle_increment: NO?
cpu_throttle_tailslow: NO?
tls_creds: NO
tls_hostname: NO
max_bandwidth: YES
downtime_limit: YES
x_checkpoint_delay: NO?
block_incremental: NO
multifd_channels: NO
multifd_compression: NO
multifd_zlib_level: 

Re: [PATCH V1 2/3] migration: fix suspended runstate

2023-06-21 Thread Peter Xu
On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
> On 6/20/2023 5:46 PM, Peter Xu wrote:
> > On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> >> Migration of a guest in the suspended state is broken.  The incoming
> >> migration code automatically tries to wake the guest, which IMO is
> >> wrong -- the guest should end migration in the same state it started.
> >> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> >> bypasses vm_start().  The guest appears to be in the running state, but
> >> it is not.
> >>
> >> To fix, leave the guest in the suspended state, but call
> >> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> >> later, when the client sends a system_wakeup command.
> >>
> >> Signed-off-by: Steve Sistare 
> >> ---
> >>  migration/migration.c | 11 ---
> >>  softmmu/runstate.c|  1 +
> >>  2 files changed, 5 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 17b4b47..851fe6d 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void 
> >> *opaque)
> >>  vm_start();
> >>  } else {
> >>  runstate_set(global_state_get_runstate());
> >> +if (runstate_check(RUN_STATE_SUSPENDED)) {
> >> +/* Force vm_start to be called later. */
> >> +qemu_system_start_on_wakeup_request();
> >> +}
> > 
> > Is this really needed, along with patch 1?
> > 
> > I have a very limited knowledge on suspension, so I'm prone to making
> > mistakes..
> > 
> > But from what I read this, qemu_system_wakeup_request() (existing one, not
> > after patch 1 applied) will setup wakeup_reason and kick the main thread
> > using qemu_notify_event().  Then IIUC the e.g. vcpu wakeups will be done in
> > the main thread later on after qemu_wakeup_requested() returns true.
> 
> Correct, here:
> 
> if (qemu_wakeup_requested()) {
> pause_all_vcpus();
> qemu_system_wakeup();
> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
> resume_all_vcpus();
> qapi_event_send_wakeup();
> }
> 
> However, that is not sufficient, because vm_start() was never called on the 
> incoming
> side.  vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among 
> other things.
> 
> 
> Without my fixes, it "works" because the outgoing migration automatically 
> wakes a suspended
> guest, which sets the state to running, which is saved in global state:
> 
> void migration_completion(MigrationState *s)
> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> global_state_store()
> 
> Then the incoming migration calls vm_start here:
> 
> migration/migration.c
> if (!global_state_received() ||
> global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
> 
> vm_start must be called for correctness.

I see.  Though I have a feeling that this is still not the right way to do
it, or at least not the cleanest.

One question is, would above work for postcopy when VM is suspended during
the switchover?

I think I see your point that vm_start() (mostly vm_prepare_start())
contains a bunch of operations that maybe we must have before starting the
VM, but then.. should we just make that vm_start() unconditional when
loading the VM completes?  I just don't see anything that won't need it (besides
-S), even COLO.

So I'm wondering about something like this:

===8<===
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
 
 dirty_bitmap_mig_before_vm_start();
 
-if (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING) {
-if (autostart) {
-vm_start();
-} else {
-runstate_set(RUN_STATE_PAUSED);
-}
-} else if (migration_incoming_colo_enabled()) {
+if (migration_incoming_colo_enabled()) {
 migration_incoming_disable_colo();
+/* COLO should always have autostart=1 or we can enforce it here */
+}
+
+if (autostart) {
+RunState run_state = global_state_get_runstate();
 vm_start();
+switch (run_state) {
+case RUN_STATE_RUNNING:
+break;
+case RUN_STATE_SUSPENDED:
+qemu_system_suspend();
+break;
+default:
+runstate_set(run_state);
+break;
+}
 } else {
-runstate_set(global_state_get_runstate());
+runstate_set(RUN_STATE_PAUSED);
 }
===8<===

IIUC this can drop qemu_system_start_on_wakeup_request() along with the
other global var.  Would something like it work for us?
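
To spell out the decision table the diff above encodes, here is a standalone sketch written as a pure function. It is only a model of the proposal: the SketchRunState enum is a cut-down, invented version of RunState, SketchPlan and sketch_incoming_plan() do not exist in QEMU, and final_state records the intended state rather than anything QEMU reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down run states. */
typedef enum {
    SKETCH_RUN_RUNNING,
    SKETCH_RUN_SUSPENDED,
    SKETCH_RUN_PAUSED,
    SKETCH_RUN_OTHER,
} SketchRunState;

/* What the incoming side would do once loading completes. */
typedef struct {
    bool call_vm_start;          /* vm_start() is invoked */
    bool call_suspend;           /* qemu_system_suspend() follows */
    SketchRunState final_state;  /* intended state afterwards */
} SketchPlan;

static SketchPlan sketch_incoming_plan(bool autostart, SketchRunState saved)
{
    /* Without autostart (-S) nothing is started: stay paused. */
    SketchPlan plan = { false, false, SKETCH_RUN_PAUSED };

    assert(saved <= SKETCH_RUN_OTHER);
    if (!autostart) {
        return plan;
    }

    /* With autostart, vm_start() always runs first... */
    plan.call_vm_start = true;
    switch (saved) {
    case SKETCH_RUN_RUNNING:
        plan.final_state = SKETCH_RUN_RUNNING;
        break;
    case SKETCH_RUN_SUSPENDED:
        /* ...and a suspended guest is put back to sleep afterwards. */
        plan.call_suspend = true;
        plan.final_state = SKETCH_RUN_SUSPENDED;
        break;
    default:
        /* Any other saved state is simply restored. */
        plan.final_state = saved;
        break;
    }
    return plan;
}
```

The point of the shape is that vm_start() stops being conditional on the saved state; only the follow-up step differs.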

-- 
Peter Xu




Re: [PATCH v3 0/4] Virtio shared dma-buf

2023-06-21 Thread Michael S. Tsirkin
On Wed, Jun 21, 2023 at 10:20:25AM +0200, Albert Esteve wrote:
> Hi!
> 
> It has been a month since I sent this patch, so I'll give it a bump to get 
> some
> attention back.
> 
> @mst and @Fam any comments? What would be the next steps to take to move this
> forward?
> 
> BR,
> Albert
> 
> 

I'd really want help from Gerd here. Don't know enough about dmabuf.
Gerd, any comments?  Otherwise I'll just merge.


-- 
MST




Re: [PATCH V1 3/3] tests/qtest: live migration suspended state

2023-06-21 Thread Peter Xu
On Wed, Jun 21, 2023 at 03:39:44PM -0400, Steven Sistare wrote:
> >> -jmp mainloop
> >> +# should this test suspend?
> >> +mov (suspend_me),%eax
> >> +cmp $0,%eax
> >> +je mainloop
> >> +
> >> +# are we waking after suspend?  do not suspend again.
> >> +mov $suspended,%eax
> > 
> > So IIUC then it'll use 4 bytes over 100MB range which means we need at
> > least 100MB+4bytes.. not obvious for a HIGH_ADDR definition to me..
> > 
> > Could we just define a variable inside the section like suspend_me?
> 
> No, because modifications to this memory backing the boot block are not
> copied to the destination.  The dest reads a clean copy of the boot block
> from disk, as specified by the qemu command line arguments.

Oh okay, can we use HIGH_ADDR-4, then?  I just still think it'll be nice if
we can keep HIGH_ADDR the high bar of the whole range.
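
A quick sanity check of the arithmetic behind that suggestion, as a standalone sketch. The 1MB and 100MB bounds and the 4096-byte stride are taken from the original boot block loop; the SKETCH_* names and both helper functions are made up. The sketch checks two things: that a 4-byte word at HIGH_ADDR-4 stays inside the range, and that the dirtying loop (which increments the first byte of every page) never touches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LOW_ADDR   (1u * 1024 * 1024)    /* 1MB, start of the loop */
#define SKETCH_HIGH_ADDR  (100u * 1024 * 1024)  /* 100MB, exclusive bound */
#define SKETCH_PAGE       4096u                 /* loop stride */

/* Proposed flag location: the last 4 bytes below the upper bound. */
#define SKETCH_SUSPENDED_ADDR (SKETCH_HIGH_ADDR - 4)

static_assert((SKETCH_HIGH_ADDR - SKETCH_LOW_ADDR) % SKETCH_PAGE == 0,
              "range must be a whole number of pages");

/* True if the 4-byte word at addr lies entirely in [LOW_ADDR, HIGH_ADDR). */
static bool sketch_word_in_range(uint32_t addr)
{
    return addr >= SKETCH_LOW_ADDR && addr + 4 <= SKETCH_HIGH_ADDR;
}

/*
 * True if the dirtying loop, which increments one byte at LOW_ADDR,
 * LOW_ADDR + 4096, ... below HIGH_ADDR, modifies any byte of the
 * 4-byte word at addr.
 */
static bool sketch_word_hit_by_loop(uint32_t addr)
{
    for (uint32_t i = 0; i < 4; i++) {
        uint32_t byte = addr + i;

        if (byte >= SKETCH_LOW_ADDR && byte < SKETCH_HIGH_ADDR &&
            (byte - SKETCH_LOW_ADDR) % SKETCH_PAGE == 0) {
            return true;
        }
    }
    return false;
}
```

With these bounds the word at HIGH_ADDR itself falls outside the range, while the word at HIGH_ADDR-4 is inside it and sits at offset 4092 of the last page, so the loop leaves it alone.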

Thanks,

-- 
Peter Xu




Re: [PATCH 12/42] migration-test: Enable back ignore-shared test

2023-06-21 Thread Peter Xu
On Wed, Jun 21, 2023 at 09:38:08PM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > On Fri, Jun 09, 2023 at 12:49:13AM +0200, Juan Quintela wrote:
> >> It failed on aarch64 tcg, lets see if that is still the case.
> >> 
> >> Signed-off-by: Juan Quintela 
> >
> > According to the history:
> >
> > https://lore.kernel.org/all/20190305180635.GA3803@work-vm/
> >
> > It's never enabled, and not sure whether Yury followed it up.  Juan: have
> > you tried it out on aarch64 before enabling it again?  I assume we rely on
> > the previous patch but that doesn't even sound like aarch64 specific.  I
> > worry it'll just keep failing on aarch64.
> 
> Hi
> 
> I am resending this series.
> 
> I hard tested this time.  x86_64 host.
> Two build directories:
> - x86_64 (I just build qemu-system-x86_64, kvm)
> - aarch64 (I just build qemu-system-aarch64, tcg)
> 
> Everything is run as:
> 
> while true; do $command || break; done
> 
> And run this:
> - x86_64:
>   * make check (nit: you can't run two make checks on the same
> directory)
>   * 4 ./test/qtest/migration-test
>   * 2 ./test/qtest/migration-test -p ./tests/qtest/migration-test -p 
> /x86_64/migration/multifd/tcp/plain/cancel
>   * 2 ./test/qtest/migration-test -p ./tests/qtest/migration-test -p 
> /x86_64/migration/ignore_shared
> 
> - aarch64:
>   The same with s/x86_64/aarch64/
> 
> And left it running for 6 hours.  No errors.
> Machine has enough RAM for running this (128GB) and 18 cores (intel
> i9900K).
> Load of the machine while running this tests is around 50 (I really hope
> that our CI hosts have less load).
> 
> A run master with the same configuration.  In less than 10 minutes I get
> the dreaded:
> 
> # starting QEMU: exec ./qemu-system-aarch64 -qtest 
> unix:/tmp/qtest-3264370.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-3264370.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -accel kvm -accel tcg -machine virt,gic-version=max -name 
> target,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-1A1461/dest_serial -incoming defer -cpu max -kernel 
> /tmp/migration-test-1A1461/bootsect-accel qtest
> Broken pipe
> ../../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:195: kill_qemu() 
> detected QEMU death from signal 6 (Aborted) (core dumped)
> Aborted (core dumped)
> $
> 
> On multifd+cancel.
> 
> I have no been able to ever get ignore_shared to fail on my machine.
> But I didn't tested aarch64 TCG in the past so hard, and in x86_64 it
> has always worked for me.

Thanks a lot, Juan.

Do you mean master is broken with QEMU_TEST_FLAKY_TESTS=1?  And after the
whole series applied we cannot trigger issue in the few hours test even
with it?

Shall we wait for another 1-2 days to see whether Yury would comment
(before you repost)?  Otherwise I agree if it survives your few-hours test
we should give it a try - at least according to Dave's comment before it
was failing easily, but it is not now on the test bed.

Maybe it's still just hidden, but in that case I also agree enabling it in
the repo is the simplest way to reproduce the failure again, if we still
ever want to enable it one day..

-- 
Peter Xu




Re: [PATCH v2 04/11] multifd: Count the number of bytes sent correctly

2023-06-21 Thread Juan Quintela
Chuang Xu  wrote:
> Hi,Juan,
>
> On 2023/1/30 下午4:09, Juan Quintela wrote:
>> Current code asumes that all pages are whole.  That is not true for
>> example for compression already.  Fix it for creating a new field
>> ->sent_bytes that includes it.
>>
>> All ram_counters are used only from the migration thread, so we have
>> two options:
>> - put a mutex and fill everything when we sent it (not only
>>ram_counters, also qemu_file->xfer_bytes).
>> - Create a local variable that implements how much has been sent
>>through each channel.  And when we push another packet, we "add" the
>>previous stats.
>>
>> I choose two due to less changes overall.  On the previous code we
>> increase transferred and then we sent.  Current code goes the other
>> way around.  It sents the data, and after the fact, it updates the
>> counters.  Notice that each channel can have a maximum of half a
>> megabyte of data without counting, so it is not very important.
>>
>> Signed-off-by: Juan Quintela 
>> ---
>>   migration/multifd.h | 2 ++
>>   migration/multifd.c | 6 --
>>   2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index e2802a9ce2..36f899c56f 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -102,6 +102,8 @@ typedef struct {
>>   uint32_t flags;
>>   /* global number of generated multifd packets */
>>   uint64_t packet_num;
>> +/* How many bytes have we sent on the last packet */
>> +uint64_t sent_bytes;
>>   /* thread has work to do */
>>   int pending_job;
>>   /* array of pages to sent.
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 61cafe4c76..cd26b2fda9 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -394,7 +394,6 @@ static int multifd_send_pages(QEMUFile *f)
>>   static int next_channel;
>>   MultiFDSendParams *p = NULL; /* make happy gcc */
>>   MultiFDPages_t *pages = multifd_send_state->pages;
>> -uint64_t transferred;
>>
>>   if (qatomic_read(&multifd_send_state->exiting)) {
>>   return -1;
>> @@ -429,7 +428,8 @@ static int multifd_send_pages(QEMUFile *f)
>>   p->packet_num = multifd_send_state->packet_num++;
>>   multifd_send_state->pages = p->pages;
>>   p->pages = pages;
>> -transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
>> +uint64_t transferred = p->sent_bytes;
>> +p->sent_bytes = 0;
>>   qemu_file_acct_rate_limit(f, transferred);
>>   qemu_mutex_unlock(&p->mutex);
>>   stat64_add(&ram_atomic_counters.multifd_bytes, transferred);
>> @@ -719,6 +719,8 @@ static void *multifd_send_thread(void *opaque)
>>   }
>>
>>   qemu_mutex_lock(&p->mutex);
>> +p->sent_bytes += p->packet_len;
>> +p->sent_bytes += p->next_packet_size;
>
> Consider a scenario where some normal pages are transmitted in the first 
> round,
> followed by several consecutive rounds of zero pages. When zero pages
> are transmitted,
> next_packet_size of first round is still incorrectly added to
> sent_bytes. If we set a rate
> limiting for dirty page transmission, the transmission performance of
> multi zero check
> will degrade.
>
> Maybe we should set next_packet_size to 0 in multifd_send_pages()?

See my series of migration atomic counters O:-)

You are right with your comments, that is the reason why it took me so
many patches to fix it properly.

After the last series on the list, that sent_bytes variable doesn't exist
anymore and I just do (with atomic operations):

multifd_bytes += size_of_write_just_done;

And no more shenanigans.
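
Roughly, in portable C11 terms, that accounting looks like the sketch below. This is not the QEMU code: QEMU has its own atomic and stat64 helpers, and the names sketch_multifd_bytes, sketch_account_write() and sketch_total_bytes() are made up here. Relaxed ordering is an assumption that is reasonable for a pure statistics counter.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter shared by all channel threads. */
static _Atomic uint64_t sketch_multifd_bytes;

/*
 * Called by a channel thread right after a write completes, with the
 * number of bytes that write actually put on the wire.
 */
static void sketch_account_write(uint64_t size_of_write_just_done)
{
    atomic_fetch_add_explicit(&sketch_multifd_bytes,
                              size_of_write_just_done,
                              memory_order_relaxed);
}

/* Read by the migration thread when it needs the running total. */
static uint64_t sketch_total_bytes(void)
{
    return atomic_load_explicit(&sketch_multifd_bytes,
                                memory_order_relaxed);
}
```

Because each thread adds exactly what it wrote, there is no per-channel "pending" value left to go stale between packets.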

Thanks, Juan.




Re: [PATCH v2 1/2] migration: enfocre multifd and postcopy preempt to be set before incoming

2023-06-21 Thread Juan Quintela
Wei Wang  wrote:
> qemu_start_incoming_migration needs to check the number of multifd
> channels or postcopy ram channels to configure the backlog parameter (i.e.
> the maximum length to which the queue of pending connections for sockfd
> may grow) of listen(). So enforce the usage of postcopy-preempt and
> multifd as below:
> - need to use "-incoming defer" on the destination; and
> - set_capability and set_parameter need to be done before migrate_incoming
>
> Otherwise, disable the use of the features and report error messages to
> remind users to adjust the commands.
>
> Signed-off-by: Wei Wang 
> Reviewed-by: Peter Xu 

Reviewed-by: Juan Quintela 

queued.




Re: [PATCH V1 3/3] tests/qtest: live migration suspended state

2023-06-21 Thread Steven Sistare
On 6/21/2023 12:45 PM, Peter Xu wrote:
> On Thu, Jun 15, 2023 at 01:26:40PM -0700, Steve Sistare wrote:
>> Add a test case to verify that the suspended state is handled correctly in
>> live migration.  The test suspends the src, migrates, then wakes the dest.
>>
>> Add an option to suspend the src in a-b-bootblock.S, which puts the guest
>> in S3 state after one round of writing to memory.  The option is enabled by
>> poking a 1 into the suspend_me word in the boot block prior to starting the
>> src vm.  Generate symbol offsets in a-b-bootblock.h so that the suspend_me
>> offset is known.
>>
>> Signed-off-by: Steve Sistare 
> 
> Thanks for the test case, mostly good to me, a few trivial comments /
> questions below.
> 
>> ---
>>  tests/migration/i386/Makefile|  5 ++--
>>  tests/migration/i386/a-b-bootblock.S | 49 
>> +---
>>  tests/migration/i386/a-b-bootblock.h | 22 ++--
>>  tests/qtest/migration-helpers.c  |  2 +-
>>  tests/qtest/migration-test.c | 31 +--
>>  5 files changed, 92 insertions(+), 17 deletions(-)
>>
>> diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
>> index 5c03241..37a72ae 100644
>> --- a/tests/migration/i386/Makefile
>> +++ b/tests/migration/i386/Makefile
>> @@ -4,9 +4,10 @@
>>  .PHONY: all clean
>>  all: a-b-bootblock.h
>>  
>> -a-b-bootblock.h: x86.bootsect
>> +a-b-bootblock.h: x86.bootsect x86.o
>>  echo "$$__note" > header.tmp
>>  xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
>> +nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
>>  mv header.tmp $@
>>  
>>  x86.bootsect: x86.boot
>> @@ -16,7 +17,7 @@ x86.boot: x86.o
>>  $(CROSS_PREFIX)objcopy -O binary $< $@
>>  
>>  x86.o: a-b-bootblock.S
>> -$(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
>> +$(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
>>  
>>  clean:
>>  @rm -rf *.boot *.o *.bootsect
>> diff --git a/tests/migration/i386/a-b-bootblock.S 
>> b/tests/migration/i386/a-b-bootblock.S
>> index 3d464c7..63d446f 100644
>> --- a/tests/migration/i386/a-b-bootblock.S
>> +++ b/tests/migration/i386/a-b-bootblock.S
>> @@ -9,6 +9,21 @@
>>  #
>>  # Author: dgilb...@redhat.com
>>  
>> +#include "migration-test.h"
>> +
>> +#define ACPI_ENABLE 0xf1
>> +#define ACPI_PORT_SMI_CMD   0xb2
>> +#define ACPI_PM_BASE0x600
>> +#define PM1A_CNT_OFFSET 4
>> +
>> +#define ACPI_SCI_ENABLE 0x0001
>> +#define ACPI_SLEEP_TYPE 0x0400
>> +#define ACPI_SLEEP_ENABLE   0x2000
>> +#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
>> +
>> +#define LOW_ADDRX86_TEST_MEM_START
>> +#define HIGH_ADDR   X86_TEST_MEM_END
>> +#define suspended   HIGH_ADDR
>>  
>>  .code16
>>  .org 0x7c00
>> @@ -41,12 +56,11 @@ start: # at 0x7c00 ?
>>  # bl keeps a counter so we limit the output speed
>>  mov $0, %bl
>>  mainloop:
>> -# Start from 1MB
>> -mov $(1024*1024),%eax
>> +mov $LOW_ADDR,%eax
>>  innerloop:
>>  incb (%eax)
>>  add $4096,%eax
>> -cmp $(100*1024*1024),%eax
>> +cmp $HIGH_ADDR,%eax
>>  jl innerloop
>>  
>>  inc %bl
>> @@ -57,7 +71,30 @@ innerloop:
>>  mov $0x3f8,%dx
>>  outb %al,%dx
>>  
>> -jmp mainloop
>> +# should this test suspend?
>> +mov (suspend_me),%eax
>> +cmp $0,%eax
>> +je mainloop
>> +
>> +# are we waking after suspend?  do not suspend again.
>> +mov $suspended,%eax
> 
> So IIUC then it'll use 4 bytes over 100MB range which means we need at
> least 100MB+4bytes.. not obvious for a HIGH_ADDR definition to me..
> 
> Could we just define a variable inside the section like suspend_me?

No, because modifications to this memory backing the boot block are not
copied to the destination.  The dest reads a clean copy of the boot block
from disk, as specified by the qemu command line arguments.

>> +mov (%eax),%eax
>> +cmp $1,%eax
>> +je mainloop
>> +
>> +# enable acpi
>> +mov $ACPI_ENABLE,%al
>> +outb %al,$ACPI_PORT_SMI_CMD
>> +
>> +# suspend to ram
>> +mov $suspended,%eax
>> +movl $1,(%eax)
>> +mov $SLEEP,%ax
>> +mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
>> +outw %ax,%dx
>> +# not reached.  The wakeup causes reset and restart at 0x7c00, and 
>> we
>> +# do not save and restore registers as a real kernel would do.
>> +
>>  
>>  # GDT magic from old (GPLv2)  Grub startup.S
>>  .p2align2   /* force 4-byte alignment */
>> @@ -83,6 +120,10 @@ gdtdesc:
>>  .word   0x27/* limit */
>>  .long   gdt /* addr */
>>  
>> +/* test launcher can poke a 1 here to exercise suspend */
>> +suspend_me:
>> +.int  0
>> +
>>  /* I'm a bootable disk */
>>  .org 0x7dfe
>>  

Re: [PATCH 12/42] migration-test: Enable back ignore-shared test

2023-06-21 Thread Juan Quintela
Peter Xu  wrote:
> On Fri, Jun 09, 2023 at 12:49:13AM +0200, Juan Quintela wrote:
>> It failed on aarch64 tcg, lets see if that is still the case.
>> 
>> Signed-off-by: Juan Quintela 
>
> According to the history:
>
> https://lore.kernel.org/all/20190305180635.GA3803@work-vm/
>
> It's never enabled, and not sure whether Yury followed it up.  Juan: have
> you tried it out on aarch64 before enabling it again?  I assume we rely on
> the previous patch but that doesn't even sound like aarch64 specific.  I
> worry it'll just keep failing on aarch64.

Hi

I am resending this series.

I tested this hard this time, on an x86_64 host.
Two build directories:
- x86_64 (I just build qemu-system-x86_64, kvm)
- aarch64 (I just build qemu-system-aarch64, tcg)

Everything is run as:

while true; do $command || break; done

And run this:
- x86_64:
  * make check (nit: you can't run two make checks on the same
directory)
  * 4 ./test/qtest/migration-test
  * 2 ./test/qtest/migration-test -p ./tests/qtest/migration-test -p 
/x86_64/migration/multifd/tcp/plain/cancel
  * 2 ./test/qtest/migration-test -p ./tests/qtest/migration-test -p 
/x86_64/migration/ignore_shared

- aarch64:
  The same with s/x86_64/aarch64/

And left it running for 6 hours.  No errors.
Machine has enough RAM for running this (128GB) and 18 cores (intel
i9900K).
Load of the machine while running these tests is around 50 (I really hope
that our CI hosts have less load).

I ran master with the same configuration.  In less than 10 minutes I get
the dreaded:

# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-3264370.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-3264370.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -accel kvm -accel tcg -machine 
virt,gic-version=max -name target,debug-threads=on -m 150M -serial 
file:/tmp/migration-test-1A1461/dest_serial -incoming defer -cpu max -kernel 
/tmp/migration-test-1A1461/bootsect-accel qtest
Broken pipe
../../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:195: kill_qemu() 
detected QEMU death from signal 6 (Aborted) (core dumped)
Aborted (core dumped)
$

On multifd+cancel.

I have not been able to ever get ignore_shared to fail on my machine.
But I didn't test aarch64 TCG this hard in the past, and on x86_64 it
has always worked for me.

Later, Juan.






Re: [PATCH 2/5] include/migration: mark vmstate_register() as a legacy function

2023-06-21 Thread Juan Quintela
Alex Bennée  wrote:
> Mention that QOM-ified devices already have support for registering
> the description.
>
> Signed-off-by: Alex Bennée 

Reviewed-by: Juan Quintela 

I actually remove that function in a future series (well, I substitute it
with vmstate_register_id() and vmstate_register_any(), but the comment
applies to the new versions also).

Later, Juan.




Re: [PATCH v6 0/8] migration: Add switchover ack capability and VFIO precopy support

2023-06-21 Thread Alex Williamson
On Wed, 21 Jun 2023 14:11:53 +0300
Avihai Horon  wrote:

> Hello everyone,
> 
> The latest changes to migration qtest made the tests run non-live by
> default. I am posting this v6 to change back the switchover-ack
> migration test to run live as it used to (because the source VM needs to
> be running to consider the switchover ACK when deciding to do the
> switchover or not).
> 
> Changes from v5 [7]:
> * Rebased on latest master branch.
> * Made switchover-ack migration test run live again (I kept the R-bs as
>   this was the original behavior when they were given).
> * Dropped patch #8 (x-allow-pre-copy property). (Alex)
> * Adjusted patch #9 commit message according to drop of patch #8.
> * Added R-b to patch #9 and Tested-by tags to the series.

I think Cédric is going to handle the pull request for this, so...

Acked-by: Alex Williamson 




Re: [PATCH v5 0/2] Update error description whenever migration fails

2023-06-21 Thread Juan Quintela
Tejus GK  wrote:
> Hi everyone,
>
> This is the v5 patchset which has been rebased on the current 
> master. Requesting this to be queued for merge as this has already been
> reviewed. 

queued.

thanks.

>
> Regards,
> Tejus
>
> Tejus GK (2):
>   migration: Update error description whenever migration fails
>   migration: Refactor repeated call of yank_unregister_instance
>
>  migration/migration.c | 23 ---
>  1 file changed, 12 insertions(+), 11 deletions(-)




Re: 'make check-tcg' fails with an assert in qemu_plugin_vcpu_init_hook

2023-06-21 Thread Peter Maydell
On Wed, 21 Jun 2023 at 11:00, Alex Bennée  wrote:
>
>
> Peter Maydell  writes:
>
> > On Wed, 21 Jun 2023 at 09:05, Alex Bennée  wrote:
> >>   - I suspect the plugin core stuff could be built once (or maybe twice,
> >> system and user)
> >
> > It is already build-once, that's why it goes wrong...
>
> I thought it was the other way around:
>
>   specific_ss.add(when: 'CONFIG_PLUGIN', if_true: [files(
> 'loader.c',
> 'core.c',
> 'api.c',
>   ), declare_dependency(link_args: plugin_ldflags)])
>
> but if we built it for linux-user and softmmu this could be fixed (until
> the next breakage anyway). cpus-common.c is the common code that sets
> this once.

Oh, right, I got it the wrong way around.

> >>   - we need to have some guard rails somehow to make sure things don't
> >> go out of sync
> >
> > We do, this is the poison.h stuff. CONFIG_USER_ONLY is a
> > special case which we don't poison because there would be
> > too much refactoring required...
>
> I guess a great big honking comment at the top of CPUState telling
> people not to do that or pushing softmmu and user specific bits of
> CPUState into their own de-referenced structures.

It's not specific to CPUState, though. The thing you must not
do is use CONFIG_USER_ONLY (or CONFIG_SOFTMMU, now) to ifdef
out any struct field anywhere in any struct that's visible to
compiled-once code, or otherwise do something that changes
the ABI of a global or of a type passed around between functions.
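
To make the ABI point concrete, here is a minimal sketch in Python using
ctypes.  It is not QEMU code, and the struct and field names are made up
for illustration.  It builds the same struct twice, once with a field
compiled out the way an '#ifdef CONFIG_USER_ONLY' would do it, and shows
that a later field lands at a different offset.  Compiled-once code that
shares the type would then read the wrong bytes.

```python
import ctypes


def make_cpu_state(user_only):
    """Build a struct whose layout depends on a build-time switch.

    Hypothetical stand-in for a struct with an #ifdef'ed-out field.
    """
    fields = [("cpu_index", ctypes.c_int)]
    if not user_only:
        # Field that only exists in the "system" flavour of the build.
        fields.append(("system_only_field", ctypes.c_uint64))
    # Field that compiled-once code wants to access in both flavours.
    fields.append(("plugin_mask", ctypes.c_uint32))

    class CPUStateSketch(ctypes.Structure):
        _fields_ = fields

    return CPUStateSketch


UserCPU = make_cpu_state(user_only=True)
SystemCPU = make_cpu_state(user_only=False)

user_off = UserCPU.plugin_mask.offset
system_off = SystemCPU.plugin_mask.offset
print("plugin_mask offset (user build):  ", user_off)
print("plugin_mask offset (system build):", system_off)
print("sizeof:", ctypes.sizeof(UserCPU), "vs", ctypes.sizeof(SystemCPU))
```

The exact numbers depend on the platform's alignment rules, but the two
offsets never agree, which is the problem described above.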

-- PMM



Re: [PATCH 6/6] iotests: add test 314 for "qemu-img rebase" with compression

2023-06-21 Thread Denis V. Lunev

On 6/1/23 21:28, Andrey Drobyshev wrote:

The test cases considered so far:

1. Check that compression mode isn't compatible with "-f raw" (raw
format doesn't support compression).
2. Check that rebasing an image onto no backing file preserves the data
and writes the copied clusters actually compressed.
3. Same as 2, but with a raw backing file (i.e. the clusters copied from the
backing are originally uncompressed -- we check they end up compressed
after being merged).
4. Remove a single delta from a backing chain, perform the same checks
as in 2.
5. Check that even when backing and overlay are initially uncompressed,
copied clusters end up compressed when rebase with compression is
performed.

Signed-off-by: Andrey Drobyshev 
---
  tests/qemu-iotests/314 | 165 +
  tests/qemu-iotests/314.out |  75 +
  2 files changed, 240 insertions(+)
  create mode 100755 tests/qemu-iotests/314
  create mode 100644 tests/qemu-iotests/314.out

diff --git a/tests/qemu-iotests/314 b/tests/qemu-iotests/314
new file mode 100755
index 00..96d7b4d258
--- /dev/null
+++ b/tests/qemu-iotests/314
@@ -0,0 +1,165 @@
+#!/usr/bin/env bash
+# group: rw backing auto quick
+#
+# Test qemu-img rebase with compression
+#
+# Copyright (c) 2023 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=andrey.drobys...@virtuozzo.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+_rm_test_img "$TEST_IMG.base"
+_rm_test_img "$TEST_IMG.itmd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# Want the size divisible by 2 and 3
+size=$(( 48 * 1024 * 1024 ))
+half_size=$(( size / 2 ))
+third_size=$(( size / 3 ))
+
+# 1. "qemu-img rebase -c" should refuse working with any format which doesn't
+# support compression.  We only check "-f raw" here.
+echo
+echo "=== Testing compressed rebase format compatibility ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG" "$size" | _filter_img_create
+$QEMU_IMG rebase -c -f raw -b "" "$TEST_IMG"
+
+# 2. Write the 1st half of $size to backing file (compressed), 2nd half -- to
+# the top image (also compressed).  Rebase the top image onto no backing file,
+# with compression (i.e. "qemu-img -c -b ''").  Check that the resulting image
+# has the written data preserved, and "qemu-img check" reports 100% clusters
+# as compressed.
+echo
+echo "=== Testing rebase with compression onto no backing file ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+_make_test_img -b "$TEST_IMG.base" -F $IMGFMT $size
+
+$QEMU_IO -c "write -c -P 0xaa 0 $half_size" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" "$TEST_IMG" \
+| _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 3. Same as the previous one, but with raw backing file (hence write to
+# the backing is uncompressed).
+echo
+echo "=== Testing rebase with compression with raw backing file ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$half_size" | _filter_img_create
+_make_test_img -b "$TEST_IMG.base" -F raw $size
+
+$QEMU_IO -f raw -c "write -P 0xaa 0 $half_size" "$TEST_IMG.base" \
+| _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" \
+"$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 4. Create a backing chain base<--itmd<--img, filling 1st, 2nd and 3rd
+# thirds of them, respectively (with compression).  Rebase img onto base,
+# effectively deleting itmd from the chain, and check that written data is
+# preserved in the resulting image.  Also check that "qemu-img check" reports
+# 100% clusters as compressed.
+echo
+echo "=== Testing compressed rebase 

Re: [PATCH V1 2/3] migration: fix suspended runstate

2023-06-21 Thread Steven Sistare
On 6/20/2023 5:46 PM, Peter Xu wrote:
> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>> Migration of a guest in the suspended state is broken.  The incoming
>> migration code automatically tries to wake the guest, which IMO is
>> wrong -- the guest should end migration in the same state it started.
>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>> bypasses vm_start().  The guest appears to be in the running state, but
>> it is not.
>>
>> To fix, leave the guest in the suspended state, but call
>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>> later, when the client sends a system_wakeup command.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  migration/migration.c | 11 ---
>>  softmmu/runstate.c|  1 +
>>  2 files changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 17b4b47..851fe6d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>  vm_start();
>>  } else {
>>  runstate_set(global_state_get_runstate());
>> +if (runstate_check(RUN_STATE_SUSPENDED)) {
>> +/* Force vm_start to be called later. */
>> +qemu_system_start_on_wakeup_request();
>> +}
> 
> Is this really needed, along with patch 1?
> 
> I have a very limited knowledge on suspension, so I'm prone to making
> mistakes..
> 
> But from what I read, qemu_system_wakeup_request() (existing one, not
> after patch 1 applied) will set up wakeup_reason and kick the main thread
> using qemu_notify_event().  Then IIUC the e.g. vcpu wakeups will be done in
> the main thread later on after qemu_wakeup_requested() returns true.

Correct, here:

    if (qemu_wakeup_requested()) {
        pause_all_vcpus();
        qemu_system_wakeup();
        notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
        wakeup_reason = QEMU_WAKEUP_REASON_NONE;
        resume_all_vcpus();
        qapi_event_send_wakeup();
    }

However, that is not sufficient, because vm_start() was never called on the
incoming side.  vm_start calls the vm state notifiers for RUN_STATE_RUNNING,
among other things.

Without my fixes, it "works" because the outgoing migration automatically
wakes a suspended guest, which sets the state to running, which is saved in
global state:

void migration_completion(MigrationState *s)
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
global_state_store()

Then the incoming migration calls vm_start here:

migration/migration.c
if (!global_state_received() ||
global_state_get_runstate() == RUN_STATE_RUNNING) {
if (autostart) {
vm_start();

vm_start must be called for correctness.
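
As a rough illustration of the difference described above, here is a toy
model in Python.  It is only a sketch under my reading of this thread: the
class, its methods and the handler list are invented for the example and
are not QEMU APIs.  It shows that a wakeup which only flips the runstate
leaves the state-change handlers uninformed, while a vm_start()-like path
notifies them.

```python
class VMSketch:
    """Toy guest model; all names here are illustrative, not QEMU APIs."""

    def __init__(self):
        self.runstate = "suspended"
        self.vcpus_running = False
        self.state_handlers = []  # stand-in for VM change state handlers
        self.seen = []            # states the handlers were told about

    def add_state_handler(self, fn):
        self.state_handlers.append(fn)

    def wakeup_only(self):
        # Models the wakeup path: the runstate flips and the vCPUs resume,
        # but nobody tells the state-change handlers.
        self.runstate = "running"
        self.vcpus_running = True

    def vm_start(self):
        # Models a vm_start()-like path: same end state, plus notification.
        self.runstate = "running"
        for fn in self.state_handlers:
            fn(self.runstate)
        self.vcpus_running = True


def run(path):
    vm = VMSketch()
    vm.add_state_handler(vm.seen.append)
    getattr(vm, path)()
    return vm


woken = run("wakeup_only")
started = run("vm_start")
print("wakeup only:", woken.runstate, woken.seen)
print("vm_start:   ", started.runstate, started.seen)
```

Both guests report the same runstate, but only the second one has had its
handlers run, which matches "appears to be in the running state, but it is
not".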

- Steve

>>  }
>>  /*
>>   * This must happen after any state changes since as soon as an external
>> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
>>  qemu_mutex_lock_iothread();
>>  trace_postcopy_start_set_run();
>>  
>> -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>  global_state_store();
>>  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>  if (ret < 0) {
>> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
>>  if (s->state == MIGRATION_STATUS_ACTIVE) {
>>  qemu_mutex_lock_iothread();
>>  s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>  
>>  s->vm_old_state = runstate_get();
>>  global_state_store();
>> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
>>  
>>  qemu_mutex_lock_iothread();
>>  
>> -/*
>> - * If VM is currently in suspended state, then, to make a valid runstate
>> - * transition in vm_stop_force_state() we need to wakeup it up.
>> - */
>> -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> 
> Removal of these three places seems reasonable to me, or we won't persist
> the SUSPEND state.
> 
> Above comment was the major reason I used to think it was needed
> (again, based on zero knowledge around this..), but perhaps it was just
> wrong?  I would assume vm_stop_force_state() will still just work with
> suspended, am I right?
> 
>>  s->vm_old_state = runstate_get();
>>  
>>  global_state_store();
>> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
>> index e127b21..771896c 100644
>> --- a/softmmu/runstate.c
>> +++ b/softmmu/runstate.c
>> @@ -159,6 +159,7 @@ static const RunStateTransition 
>> runstate_transitions_def[] = {
>>  { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>>  { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>>  { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>> +{ RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
>>  { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
>>

Re: [PATCH 13/42] migration-test: Check for shared memory like for everything else

2023-06-21 Thread Juan Quintela
Peter Xu  wrote:
> On Wed, Jun 21, 2023 at 12:07:20PM +0200, Juan Quintela wrote:
>> Peter Xu  wrote:
>> > On Fri, Jun 09, 2023 at 12:49:14AM +0200, Juan Quintela wrote:
>> >> Makes things easier and cleaner.
>> >> 
>> >> Signed-off-by: Juan Quintela 
>> >> ---
>> >>  tests/qtest/migration-test.c | 20 
>> >>  1 file changed, 12 insertions(+), 8 deletions(-)
>> >> 
>> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> >> index daaf5cd71a..5837060138 100644
>> >> --- a/tests/qtest/migration-test.c
>> >> +++ b/tests/qtest/migration-test.c
>> >> @@ -645,13 +645,6 @@ static int test_migrate_start(QTestState **from, 
>> >> QTestState **to,
>> >>  const char *arch = qtest_get_arch();
>> >>  const char *memory_size;
>> >>  
>> >> -if (args->use_shmem) {
>> >> -if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
>> >> -g_test_skip("/dev/shm is not supported");
>> >> -return -1;
>> >> -}
>> >> -}
>> >
>> > Maybe assert on: "!args->use_shmem || shm_supported()" here?
>> 
>> Nope.
>> 
>> We are being extra defensive in some tests.
>
> This will protect a new test passing in use_shmem=true without checking
> shm_supported().  It'll then fail at starting the VM I think otherwise.

Hi


As it should.
Test is wrong and it aborts.
It is not that it has found an error, it is that it is badly written.

And anyways, args->use_shmem disappears later in the series O:-)

Later, Juan.




Re: [PATCH 4/6] qemu-img: rebase: avoid unnecessary COW operations

2023-06-21 Thread Denis V. Lunev

On 6/1/23 21:28, Andrey Drobyshev wrote:

When rebasing an image from one backing file to another, we need to
compare data from old and new backings.  If the diff between that data
happens to be unaligned to the target cluster size, we might end up
doing partial writes, which would lead to copy-on-write and additional IO.

Consider the following simple case (virtual_size == cluster_size == 64K):

base <-- inc1 <-- inc2

qemu-io -c "write -P 0xaa 0 32K" base.qcow2
qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2

While doing the rebase, we'll write half of the cluster to inc2, and the block
layer will have to read the 2nd half of the same cluster from the base image
inc1 while doing this write operation, although the whole cluster was already
read earlier to perform the data comparison.

In order to avoid these unnecessary IO cycles, let's make sure every
write request is aligned to the overlay cluster size.
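
The alignment idea can be sketched as follows.  This is a hedged Python
illustration, not the code from the patch: align_write_range() is a
hypothetical helper that widens a differing byte range to whole overlay
clusters and clamps the result to the image size.

```python
def align_down(value, alignment):
    """Round value down to a multiple of alignment."""
    return value - (value % alignment)


def align_up(value, alignment):
    """Round value up to a multiple of alignment."""
    return align_down(value + alignment - 1, alignment)


def align_write_range(offset, length, cluster_size, image_size):
    """Widen [offset, offset + length) to cluster boundaries.

    Hypothetical helper: the end is clamped to image_size, so only the
    last cluster of an image may end up shorter than cluster_size.
    """
    start = align_down(offset, cluster_size)
    end = min(align_up(offset + length, cluster_size), image_size)
    return start, end - start


KiB = 1024
# The case from the commit message: only the first 32K of a 64K cluster
# differs, yet the write is issued for the whole cluster.
print(align_write_range(0, 32 * KiB, 64 * KiB, 64 * KiB))
```

With a full-cluster write the block layer has no untouched remainder to
fetch from the backing file, so no copy-on-write read is needed.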

Signed-off-by: Andrey Drobyshev 
---
  qemu-img.c | 72 +++---
  1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 60f4c06487..9a469cd609 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3513,6 +3513,7 @@ static int img_rebase(int argc, char **argv)
  uint8_t *buf_new = NULL;
  BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
  BlockDriverState *unfiltered_bs;
+BlockDriverInfo bdi = {0};
  char *filename;
  const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
  int c, flags, src_flags, ret;
@@ -3646,6 +3647,15 @@ static int img_rebase(int argc, char **argv)
  }
  }
  
+/* We need overlay cluster size to make sure write requests are aligned */

+ret = bdrv_get_info(unfiltered_bs, &bdi);
+if (ret < 0) {
+error_report("could not get block driver info");
+goto out;
+} else if (bdi.cluster_size == 0) {
+bdi.cluster_size = 1;
+}
+
  /* For safe rebasing we need to compare old and new backing file */
  if (!unsafe) {
  QDict *options = NULL;
@@ -3744,6 +3754,7 @@ static int img_rebase(int argc, char **argv)
  int64_t new_backing_size = 0;
  uint64_t offset;
  int64_t n;
+int64_t n_old = 0, n_new = 0;
  float local_progress = 0;
  
  buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);

@@ -3784,7 +3795,7 @@ static int img_rebase(int argc, char **argv)
  }
  
  for (offset = 0; offset < size; offset += n) {

-bool buf_old_is_zero = false;
+bool old_backing_eof = false;
  
  /* How many bytes can we handle with the next read? */

  n = MIN(IO_BUF_SIZE, size - offset);
@@ -3829,33 +3840,38 @@ static int img_rebase(int argc, char **argv)
  }
  }
  
+/* At this point n must be aligned to the target cluster size. */

Minor: except last non-aligned cluster as stated by 'if' :)

+if (offset + n < size) {
+assert(n % bdi.cluster_size == 0);
+}
+
+/*
+ * Much like the with the target image, we'll try to read as much
+ * of the old and new backings as we can.
+ */
+n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
+if (blk_new_backing) {
+n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
+}
+
  /*
   * Read old and new backing file and take into consideration that
   * backing files may be smaller than the COW image.
   */
-if (offset >= old_backing_size) {
-memset(buf_old, 0, n);
-buf_old_is_zero = true;
+memset(buf_old + n_old, 0, n - n_old);
+if (!n_old) {
+old_backing_eof = true;
  } else {
-if (offset + n > old_backing_size) {
-n = old_backing_size - offset;
-}
-
-ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
+ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
  if (ret < 0) {
  error_report("error while reading from old backing file");
  goto out;
  }
  }
  
-if (offset >= new_backing_size || !blk_new_backing) {

-memset(buf_new, 0, n);
-} else {
-if (offset + n > new_backing_size) {
-n = new_backing_size - offset;
-}
-
-ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
+memset(buf_new + n_new, 0, n - n_new);
+if (blk_new_backing && n_new) {
+ret = blk_pread(blk_new_backing, offset, n_new, 

Re: [PATCH 2/6] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size

2023-06-21 Thread Denis V. Lunev

On 6/1/23 21:28, Andrey Drobyshev wrote:

Before previous commit, rebase was getting infinitely stuck in case of
rebasing within the same backing chain and when overlay_size > backing_size.
Let's add this case to the rebasing test 024 to make sure it doesn't
break again.

Signed-off-by: Andrey Drobyshev 
---
  tests/qemu-iotests/024 | 57 ++
  tests/qemu-iotests/024.out | 30 
  2 files changed, 87 insertions(+)

diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 25a564a150..98a7c8fd65 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -199,6 +199,63 @@ echo
  # $BASE_OLD and $BASE_NEW)
  $QEMU_IMG map "$OVERLAY" | _filter_qemu_img_map
  
+# Check that rebase within the chain is working when

+# overlay_size > old_backing_size
+#
+# base_new <-- base_old <-- overlay
+#
+# Backing (new): 11 11 11 11 11
+# Backing (old): 22 22 22 22
+# Overlay:   -- -- -- -- --
+#
+# As a result, overlay should contain data identical to base_old, with the
+# last cluster remaining unallocated.
+
+echo
+echo "=== Test rebase within one backing chain ==="
+echo
+
+echo "Creating backing chain"
+echo
+
+TEST_IMG=$BASE_NEW _make_test_img $(( CLUSTER_SIZE * 5 ))
+TEST_IMG=$BASE_OLD _make_test_img -b "$BASE_NEW" -F $IMGFMT \
+$(( CLUSTER_SIZE * 4 ))
+TEST_IMG=$OVERLAY _make_test_img -b "$BASE_OLD" -F $IMGFMT \
+$(( CLUSTER_SIZE * 5 ))
+
+echo
+echo "Fill backing files with data"
+echo
+
+$QEMU_IO "$BASE_NEW" -c "write -P 0x11 0 $(( CLUSTER_SIZE * 5 ))" \
+| _filter_qemu_io
+$QEMU_IO "$BASE_OLD" -c "write -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+| _filter_qemu_io
+
+echo
+echo "Check the last cluster is zeroed in overlay before the rebase"
+echo
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+| _filter_qemu_io
+
+echo
+echo "Rebase onto another image in the same chain"
+echo
+
+$QEMU_IMG rebase -b "$BASE_NEW" -F $IMGFMT "$OVERLAY"
+
+echo "Verify that data is read the same before and after rebase"
+echo
+
+# Verify the first 4 clusters are still read the same as in the old base
+$QEMU_IO "$OVERLAY" -c "read -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+| _filter_qemu_io
+# Verify the last cluster still reads as zeroes
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+| _filter_qemu_io
+
+echo
  
  # success, all done

  echo "*** done"
diff --git a/tests/qemu-iotests/024.out b/tests/qemu-iotests/024.out
index 973a5a3711..245fe8b1d1 100644
--- a/tests/qemu-iotests/024.out
+++ b/tests/qemu-iotests/024.out
@@ -171,4 +171,34 @@ read 65536/65536 bytes at offset 196608
  Offset  Length  File
  0   0x3 TEST_DIR/subdir/t.IMGFMT
  0x3 0x1 TEST_DIR/subdir/t.IMGFMT.base_new
+
+=== Test rebase within one backing chain ===
+
+Creating backing chain
+
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_new', fmt=IMGFMT size=327680
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_old', fmt=IMGFMT size=262144 
backing_file=TEST_DIR/subdir/t.IMGFMT.base_new backing_fmt=IMGFMT
+Formatting 'TEST_DIR/subdir/t.IMGFMT', fmt=IMGFMT size=327680 
backing_file=TEST_DIR/subdir/t.IMGFMT.base_old backing_fmt=IMGFMT
+
+Fill backing files with data
+
+wrote 327680/327680 bytes at offset 0
+320 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Check the last cluster is zeroed in overlay before the rebase
+
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Rebase onto another image in the same chain
+
+Verify that data is read the same before and after rebase
+
+read 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
  *** done

For clarity:
Reviewed-by: Denis V. Lunev 



Re: [PATCH 1/6] qemu-img: rebase: stop when reaching EOF of old backing file

2023-06-21 Thread Denis V. Lunev

On 6/1/23 21:28, Andrey Drobyshev wrote:

In case when we're rebasing within one backing chain, and when target image
is larger than old backing file, bdrv_is_allocated_above() ends up setting
*pnum = 0.  As a result, target offset isn't getting incremented, and we
get stuck in an infinite for loop.  Let's detect this case and proceed
further down the loop body, as the offsets beyond the old backing size need
to be explicitly zeroed.
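
The stuck loop and the fix can be modelled with a short Python sketch.
This is a simplified stand-in, not the code in qemu-img.c: rebase_loop()
and its parameters are invented for the example, the allocation query is
reduced to "nothing changed, and the byte count drops to 0 past the old
backing's EOF", and an iteration cap replaces the real infinite loop.

```python
def rebase_loop(size, old_backing_size, chunk, apply_fix, max_iters=100):
    """Return the (offset, length) ranges that reach the rest of the loop body.

    Hypothetical model of the offset loop in the rebase code path.
    """
    processed = []
    offset = 0
    for _ in range(max_iters):
        if offset >= size:
            return processed
        n = min(chunk, size - offset)
        requested = n
        # Stand-in for the allocation query: no cluster is reported as
        # changed, and past the old backing's EOF it reports 0 bytes.
        changed = False
        n = max(0, min(n, old_backing_size - offset))
        if not changed and (n or not apply_fix):
            offset += n  # the C 'continue' still runs 'offset += n'
            continue
        if n == 0:
            # EOF of the old backing: restore the requested length so the
            # range is handled explicitly instead of being skipped.
            n = requested
        processed.append((offset, n))
        offset += n
    raise RuntimeError("no progress: offset stuck at %d" % offset)


CLUSTER = 64 * 1024
print(rebase_loop(5 * CLUSTER, 4 * CLUSTER, CLUSTER, apply_fix=True))
```

Calling the same function with apply_fix=False never advances past the end
of the old backing file and hits the iteration cap, which corresponds to
the hang described in the commit message.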

Signed-off-by: Andrey Drobyshev 
---
  qemu-img.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 27f48051b0..78433f3746 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3801,6 +3801,8 @@ static int img_rebase(int argc, char **argv)
  }
  
  if (prefix_chain_bs) {

+uint64_t bytes = n;
+
  /*
   * If cluster wasn't changed since prefix_chain, we don't need
   * to take action
@@ -3813,9 +3815,18 @@ static int img_rebase(int argc, char **argv)
   strerror(-ret));
  goto out;
  }
-if (!ret) {
+if (!ret && n) {
  continue;
  }
+if (!n) {
+/*
+ * If we've reached EOF of the old backing, it means that
+ * offsets beyond the old backing size were read as zeroes.
+ * Now we will need to explicitly zero the cluster in
+ * order to preserve that state after the rebase.
+ */
+n = bytes;
+}
  }
  
  /*

For clarity:
Reviewed-by: Denis V. Lunev 



Re: [PATCH 5/6] qemu-img: add compression option to rebase subcommand

2023-06-21 Thread Denis V. Lunev

On 6/1/23 21:28, Andrey Drobyshev wrote:

If we rebase an image whose backing file has compressed clusters, we
might end up wasting disk space since the copied clusters are now
uncompressed.  In order to have better control over this, let's add
"--compress" option to the "qemu-img rebase" command.

Note that this option affects only the clusters which are actually being
copied from the original backing file.  The clusters which were
uncompressed in the target image will remain so.

Signed-off-by: Andrey Drobyshev 
---
  docs/tools/qemu-img.rst |  6 --
  qemu-img-cmds.hx|  4 ++--
  qemu-img.c  | 19 +--
  3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 15aeddc6d8..973a912dec 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -663,7 +663,7 @@ Command description:
  
List, apply, create or delete snapshots in image *FILENAME*.
  
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME

+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
  
Changes the backing file of an image. Only the formats ``qcow2`` and

``qed`` support changing the backing file.
@@ -690,7 +690,9 @@ Command description:
  
  In order to achieve this, any clusters that differ between

  *BACKING_FILE* and the old backing file of *FILENAME* are merged
-into *FILENAME* before actually changing the backing file.
+into *FILENAME* before actually changing the backing file. With ``-c``
+option specified, the clusters which are being merged (but not the
+entire *FILENAME* image) are written in the compressed mode.
  
  Note that the safe mode is an expensive operation, comparable to

  converting an image. It only works if the old backing file still
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..068692d13e 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -88,9 +88,9 @@ SRST
  ERST
  
  DEF("rebase", img_rebase,

-"rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] [-T 
src_cache] [-p] [-u] -b backing_file [-F backing_fmt] filename")
+"rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] [-T 
src_cache] [-p] [-u] [-c] -b backing_file [-F backing_fmt] filename")
  SRST
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME
+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
  ERST
  
  DEF("resize", img_resize,

diff --git a/qemu-img.c b/qemu-img.c
index 9a469cd609..108da27b23 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3517,11 +3517,13 @@ static int img_rebase(int argc, char **argv)
  char *filename;
  const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
  int c, flags, src_flags, ret;
+BdrvRequestFlags write_flags = 0;
  bool writethrough, src_writethrough;
  int unsafe = 0;
  bool force_share = false;
  int progress = 0;
  bool quiet = false;
+bool compress = false;
  Error *local_err = NULL;
  bool image_opts = false;
  
@@ -3537,9 +3539,10 @@ static int img_rebase(int argc, char **argv)

  {"object", required_argument, 0, OPTION_OBJECT},
  {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
  {"force-share", no_argument, 0, 'U'},
+{"compress", no_argument, 0, 'c'},
  {0, 0, 0, 0}
  };
-c = getopt_long(argc, argv, ":hf:F:b:upt:T:qU",
+c = getopt_long(argc, argv, ":hf:F:b:upt:T:qUc",
  long_options, NULL);
  if (c == -1) {
  break;
@@ -3587,6 +3590,9 @@ static int img_rebase(int argc, char **argv)
  case 'U':
  force_share = true;
  break;
+case 'c':
+compress = true;
+break;
  }
  }
  
@@ -3639,6 +3645,14 @@ static int img_rebase(int argc, char **argv)
  
  unfiltered_bs = bdrv_skip_filters(bs);
  
+if (compress && !block_driver_can_compress(unfiltered_bs->drv)) {

+error_report("Compression not supported for this file format");
+ret = -1;
+goto out;
+} else if (compress) {
+write_flags |= BDRV_REQ_WRITE_COMPRESSED;
+}
+


Minor nit-picking: should we use a single outer check instead?
if (compress) {
    if (!block_driver_can_compress(unfiltered_bs->drv)) {
        error_report("Compression not supported for this file format");
        ret = -1;
        goto out;
    }
    write_flags |= BDRV_REQ_WRITE_COMPRESSED;
}


  if (out_basefmt != NULL) {
  if (bdrv_find_format(out_basefmt) == NULL) {
  error_report("Invalid format name: '%s'", out_basefmt);
@@ -3903,7 

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-21 Thread Maciej S. Szmigiero

On 21.06.2023 12:32, David Hildenbrand wrote:

On 20.06.23 22:13, Maciej S. Szmigiero wrote:

On 19.06.2023 17:58, David Hildenbrand wrote:

[...]

Sorry for the late reply!

Still trying to make up my mind what the right way forward with this is.



This usage is still problematic I suspect (well, and a layer violation 
regarding the machine). The machine hotplug handler is supposed to call the 
pre_plug/plug/unplug hooks as response to pre_plug/plug/unplug notifications 
from the core. See how we handle virtio-mem/virtio-pmem/nvdimms as an example.

We assume that when memory_device_pre_plug() gets called, that the device is 
not realized yet, but once it gets plugged, that it already is realized, and 
that the device will actually vanish (get unrealized) when unplugging the 
device.
Otherwise memory device logic like in get_plugged_memory_size() stops working.


get_plugged_memory_size() just calls get_plugged_size() method on every
realized TYPE_MEMORY_DEVICE.

While this now always returns the whole backing memory size (once the
backend gets plugged) I don't see a reason why this method could not be
overridden in hv-balloon to return just the currently hot-added size.

By the way, this function seems to be used just for reporting stats via QMP.


memory_device_build_list() is another example, used for 
memory_device_get_free_addr().


I don't see it calling get_plugged_size() method, I can see it only using
(indirectly) get_addr() method.


It similarly contains the TYPE_MEMORY_DEVICE -> dev->realized logic.


All right, I thought at first you meant just that the get_plugged_memory_size()
function reports misleading values.






You'd be blocking memory address ranges with an unplugged-but-realized memory
device.
Memory device code expects that realized memory devices are plugged and vice
versa.

Which QEMU code do you mean specifically? Maybe it just needs a trivial
change.

Before the driver hot-adds the first chunk of memory it does not use any
part of the address space.

After that, it has to reserve address space for the whole backing memory
device, so no other devices will claim parts of it and because a
TYPE_MEMORY_DEVICE (currently) can have just a single range.

This address space is released when the VM is restarted.



As I said, memory device code currently expects that you don't have realized
TYPE_MEMORY_DEVICE that are not plugged, and currently that holds true for all
memory devices.

We could modify memory device code to change that, but IMHO it's the wrong way
around: the machine (hotplug) is responsible for (un)plugging memory devices
as they get realized.

Doing a qdev_get_machine()/current_machine from device code and then
modifying the state of the machine (here: calling plug/unplug handlers)
is usually a warning sign that there is a layer violation.

That's why I'm thinking about a cleaner way to handle that.


Okay, now I think I understand what you think is questionable:
calling memory_device_pre_plug(), memory_device_plug() and friends from
the driver when hot-adding the first memory chunk, even though no actual
device is getting plugged in at that time.

I'm open to other approaches here (besides the virtual DIMMs one that we
already tried in the past).


[...]


Is it to support the !memdev case or why is this plugging/unplugging in
our_range_plugged_new()/our_range_plugged_free() required?


At least for three (four) reasons:
1a) At the hv-balloon plug time the device doesn't yet know the guest
alignment requirements - or whether the guest supports memory hot add at
all - that's what the device will learn only once the guest connects
to the protocol.


Understood, so you want to at least expose the memory dynamically to the VM 
(map the MR on demand).

That could be done using a memory region container like virtio-mem is planning 
[1] on using fairly easily.

[1] https://lkml.kernel.org/r/20230616092654.175518-14-da...@redhat.com


Thanks for the pointer to your series - I've looked at it and it seems
to me that while it allows multiple memory subregions, each backed by
a separate memslot, it still needs a single big main region for
the particular TYPE_MEMORY_DEVICE. Am I right?


Yes.




1b) For the same reason the memory region has to be unplugged at the VM
reset time - the new guest might have stricter alignment requirements


Alignment is certainly interesting, but is it a real problem?

By default (no other memory devices) you get an address that's aligned to 1 
GiB. And, in fact, you can simply always request a 1 GiB alignment for the 
device, independent of the guest requirement.

Would the guest requirement be even stricter than that (e.g., 2 GiB)?


The protocol allows up to 32 GiB alignment so we cannot simply
hardcode the alignment to 1 GiB, especially since this is Windows
we're talking about (so this parameter is subject to unpredictable
changes).
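
To make the alignment arithmetic in this exchange concrete, here is a minimal
sketch in C. The names align_up(), is_aligned() and GiB are hypothetical
helpers written for this illustration, not code from the hv-balloon series or
from QEMU. It only shows that an address that satisfies a 1 GiB alignment does
not necessarily satisfy a 32 GiB one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* True if addr already satisfies the given power-of-two alignment. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```

With these helpers, an address of 5 GiB passes the 1 GiB check, but it would
have to move up to 32 GiB before a guest asking for 32 GiB alignment could
accept it.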


Note that anything bigger than 1 GiB is not really guaranteed to work in
QEMU on x86-64. See the 

Re: [PATCH 3/6] qemu-img: rebase: use backing files' BlockBackend for buffer alignment

2023-06-21 Thread Denis V. Lunev

On 6/1/23 21:28, Andrey Drobyshev wrote:

Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
the data read from the old and new backing files are aligned using
BlockDriverState (or BlockBackend later on) referring to the target image.
However, this isn't quite right, because target image is only being
written to and has nothing to do with those buffers.  Let's fix that.

Signed-off-by: Andrey Drobyshev 
---
  qemu-img.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 78433f3746..60f4c06487 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3746,8 +3746,8 @@ static int img_rebase(int argc, char **argv)
  int64_t n;
  float local_progress = 0;
  
-buf_old = blk_blockalign(blk, IO_BUF_SIZE);
-buf_new = blk_blockalign(blk, IO_BUF_SIZE);
+buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
+buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
  
  size = blk_getlength(blk);

  if (size < 0) {

Reviewed-by: Denis V. Lunev 



Re: [PATCH v7] Emulate dip switch language layout settings on SUN keyboard

2023-06-21 Thread Henrik Carlqvist
On Wed, 21 Jun 2023 08:09:12 +0100
Daniel P. Berrangé  wrote:
> If you're using one of the common Linux distros, you'll find a list of
> the full set of packages you need to enable QEMU features in the
> dockerfiles at tests/docker/dockerfiles/. Those all have enough to
> enable the docs build.

Thanks for your support! I am using Slackware 15.0 as my build system which
wasn't among those container configurations, but studying those files and the
output from "./configure" made me realize that I needed to install Sphinx with
all its dependencies and a rather recent version of sphinx-rtd-theme.

Now I am able to run "make html" and, as expected with my broken files, I get

-8<---
Warning, treated as error:
/tmp/qemu/docs/system/keyboard.rst:document isn't included in any toctree
ninja: build stopped: subcommand failed.
-8<---

It will be a lot easier to work from here when I get feedback on any typos in
the .rst files and then also get to read the content in a formatted way.

I hope to be able to produce an updated patch the next weekend.

Best regards Henrik



Re: [PATCH 2/3] qemu-img: map: report compressed data blocks

2023-06-21 Thread Denis V. Lunev

On 6/7/23 17:26, Andrey Drobyshev wrote:

Right now "qemu-img map" reports compressed blocks as containing data
but having no host offset.  This is not very informative.  Instead,
let's add another boolean field named "compressed" in case JSON output
mode is specified.  This is achieved by utilizing new allocation status
flag BDRV_BLOCK_COMPRESSED for bdrv_block_status().

Signed-off-by: Andrey Drobyshev 
---
  qapi/block-core.json |  7 +--
  qemu-img.c   | 16 +---
  2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5dd5f7e4b0..bc6653e5d6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -409,6 +409,9 @@
  #
  # @zero: whether the virtual blocks read as zeroes
  #
+# @compressed: true indicates that data is stored compressed (the target
+# format must support compression)
+#
  # @depth: number of layers (0 = top image, 1 = top image's backing
  # file, ..., n - 1 = bottom image (where n is the number of images
  # in the chain)) before reaching one for which the range is
@@ -426,8 +429,8 @@
  ##
  { 'struct': 'MapEntry',
'data': {'start': 'int', 'length': 'int', 'data': 'bool',
-   'zero': 'bool', 'depth': 'int', 'present': 'bool',
-   '*offset': 'int', '*filename': 'str' } }
+   'zero': 'bool', 'compressed': 'bool', 'depth': 'int',
+   'present': 'bool', '*offset': 'int', '*filename': 'str' } }

After some thought, I would say that for compatibility reasons it
would be beneficial to make the compressed field optional.
  
  ##

  # @BlockdevCacheInfo:
diff --git a/qemu-img.c b/qemu-img.c
index 27f48051b0..9bb69f58f6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3083,7 +3083,7 @@ static int img_info(int argc, char **argv)
  }
  
  static int dump_map_entry(OutputFormat output_format, MapEntry *e,

-  MapEntry *next)
+  MapEntry *next, bool can_compress)
  {
  switch (output_format) {
  case OFORMAT_HUMAN:
@@ -3112,6 +3112,9 @@ static int dump_map_entry(OutputFormat output_format, 
MapEntry *e,
 e->present ? "true" : "false",
 e->zero ? "true" : "false",
 e->data ? "true" : "false");
+if (can_compress) {
+printf(", \"compressed\": %s", e->compressed ? "true" : "false");

If the compressed field is optional, then it would be reasonable to skip
filling this field for non-compressed clusters. In that case we
would not need the 'can_compress' parameter of the call.

Ha! More importantly: the field (according to the metadata) is
mandatory while it is reported conditionally, i.e. the field is
optional in reality. There is a problem either way.


+}
  if (e->has_offset) {
  printf(", \"offset\": %"PRId64"", e->offset);
  }
@@ -3172,6 +3175,7 @@ static int get_block_status(BlockDriverState *bs, int64_t 
offset,
  .length = bytes,
  .data = !!(ret & BDRV_BLOCK_DATA),
  .zero = !!(ret & BDRV_BLOCK_ZERO),
+.compressed = !!(ret & BDRV_BLOCK_COMPRESSED),
  .offset = map,
  .has_offset = has_offset,
  .depth = depth,
@@ -3189,6 +3193,7 @@ static inline bool entry_mergeable(const MapEntry *curr, 
const MapEntry *next)
  }
  if (curr->zero != next->zero ||
  curr->data != next->data ||
+curr->compressed != next->compressed ||
  curr->depth != next->depth ||
  curr->present != next->present ||
  !curr->filename != !next->filename ||
@@ -3218,6 +3223,7 @@ static int img_map(int argc, char **argv)
  bool force_share = false;
  int64_t start_offset = 0;
  int64_t max_length = -1;
+bool can_compress = false;
  
  fmt = NULL;

  output = NULL;
@@ -3313,6 +3319,10 @@ static int img_map(int argc, char **argv)
  length = MIN(start_offset + max_length, length);
  }
  
+if (output_format == OFORMAT_JSON) {

+can_compress = block_driver_can_compress(bs->drv);
+}
+
  curr.start = start_offset;
  while (curr.start + curr.length < length) {
  int64_t offset = curr.start + curr.length;
@@ -3330,7 +3340,7 @@ static int img_map(int argc, char **argv)
  }
  
  if (curr.length > 0) {

-ret = dump_map_entry(output_format, &curr, &next);
+ret = dump_map_entry(output_format, &curr, &next, can_compress);
  if (ret < 0) {
  goto out;
  }
@@ -3338,7 +3348,7 @@ static int img_map(int argc, char **argv)
  curr = next;
  }
  
-ret = dump_map_entry(output_format, &curr, NULL);
+ret = dump_map_entry(output_format, &curr, NULL, can_compress);
  if (output_format == OFORMAT_JSON) {
  puts("]");
  }
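
The review comments above suggest making "compressed" optional and emitting it
only for compressed ranges. A rough sketch of that idea in C follows.
MapEntryLite and format_entry_json() are hypothetical stand-ins invented for
this illustration; they are not QEMU's generated MapEntry type or its
dump_map_entry() function, and the exact output layout is an assumption.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for QEMU's MapEntry. */
typedef struct {
    int64_t start;
    int64_t length;
    bool data;
    bool zero;
    bool compressed;
} MapEntryLite;

/*
 * Format one entry as JSON into buf. The "compressed" key is emitted only
 * for compressed ranges, so the output for all other ranges is the same as
 * before the field existed. Returns the number of characters written, or -1
 * if buf is too small.
 */
static int format_entry_json(char *buf, size_t size, const MapEntryLite *e)
{
    int n = snprintf(buf, size,
                     "{ \"start\": %" PRId64 ", \"length\": %" PRId64
                     ", \"zero\": %s, \"data\": %s",
                     e->start, e->length,
                     e->zero ? "true" : "false",
                     e->data ? "true" : "false");
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }

    /* Optional member: present only when the range is compressed. */
    int m = snprintf(buf + n, size - (size_t)n, "%s }",
                     e->compressed ? ", \"compressed\": true" : "");
    if (m < 0 || (size_t)m >= size - (size_t)n) {
        return -1;
    }
    return n + m;
}
```

With this shape no can_compress parameter is needed: whether the key appears
depends only on the entry itself, which matches a schema where the member is
declared optional.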





Re: [PATCH 1/3] block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()

2023-06-21 Thread Andrey Drobyshev
On 6/21/23 20:46, Denis V. Lunev wrote:
> On 6/21/23 19:08, Denis V. Lunev wrote:
>> On 6/7/23 17:26, Andrey Drobyshev wrote:
>>> Functions qcow2_get_host_offset(), get_cluster_offset() explicitly
>>> report compressed cluster types when data is compressed. However, this
>>> information is never passed further.  Let's make use of it by adding new
>>> BDRV_BLOCK_COMPRESSED flag for bdrv_block_status(), so that caller may
>>> know that the data range is compressed.  In particular, we're going to
>>> use this flag to tweak "qemu-img map" output.
>>>
>>> This new flag is only being utilized by qcow and qcow2 formats, as only
>>> these two support compression.
>>>
>>> Signed-off-by: Andrey Drobyshev 
>>> ---
>>>   block/qcow.c | 5 -
>>>   block/qcow2.c    | 3 +++
>>>   include/block/block-common.h | 3 +++
>>>   3 files changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/qcow.c b/block/qcow.c
>>> index 3644bbf5cb..8416bcc2c3 100644
>>> --- a/block/qcow.c
>>> +++ b/block/qcow.c
>>> @@ -549,7 +549,10 @@ qcow_co_block_status(BlockDriverState *bs, bool
>>> want_zero,
>>>   if (!cluster_offset) {
>>>   return 0;
>>>   }
>>> -    if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->crypto) {
>>> +    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
>>> +    return BDRV_BLOCK_DATA | BDRV_BLOCK_COMPRESSED;
>>> +    }
>>> +    if (s->crypto) {
>>>   return BDRV_BLOCK_DATA;
>>>   }
>>>   *map = cluster_offset | index_in_cluster;
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index e23edd48c2..8e01adc610 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -2162,6 +2162,9 @@ qcow2_co_block_status(BlockDriverState *bs,
>>> bool want_zero, int64_t offset,
>>>   {
>>>   status |= BDRV_BLOCK_RECURSE;
>>>   }
>>> +    if (type == QCOW2_SUBCLUSTER_COMPRESSED) {
>>> +    status |= BDRV_BLOCK_COMPRESSED;
>>> +    }
>>>   return status;
>>>   }
>>>   diff --git a/include/block/block-common.h
>>> b/include/block/block-common.h
>>> index e15395f2cb..f7a4e7d4db 100644
>>> --- a/include/block/block-common.h
>>> +++ b/include/block/block-common.h
>>> @@ -282,6 +282,8 @@ typedef enum {
>>>    *   layer rather than any backing, set by
>>> block layer
>>>    * BDRV_BLOCK_EOF: the returned pnum covers through end of file for
>>> this
>>>    * layer, set by block layer
>>> + * BDRV_BLOCK_COMPRESSED: the underlying data is compressed; only
>>> valid for
>>> + *    the formats supporting compression: qcow,
>>> qcow2
>>>    *
>>>    * Internal flags:
>>>    * BDRV_BLOCK_RAW: for use by passthrough drivers, such as raw, to
>>> request
>>> @@ -317,6 +319,7 @@ typedef enum {
>>>   #define BDRV_BLOCK_ALLOCATED    0x10
>>>   #define BDRV_BLOCK_EOF  0x20
>>>   #define BDRV_BLOCK_RECURSE  0x40
>>> +#define BDRV_BLOCK_COMPRESSED   0x80
>>>     typedef QTAILQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry)
>>> BlockReopenQueue;
>> Reviewed-by: Denis V. Lunev 
> Looking into the second patch, I have found that I was too fast here :)
> 
> The comment is misleading and the patch is incomplete.
> 
> static inline bool TSA_NO_TSA block_driver_can_compress(BlockDriver *drv)
> {
>     return drv->bdrv_co_pwritev_compressed ||
>    drv->bdrv_co_pwritev_compressed_part;
> }
> 
> which means that
> 
>    1    257  block/copy-on-read.c <>
>  .bdrv_co_pwritev_compressed = cor_co_pwritev_compressed,
>    2   1199  block/qcow.c <>
>  .bdrv_co_pwritev_compressed = qcow_co_pwritev_compressed,
>    3    255  block/throttle.c <>
>  .bdrv_co_pwritev_compressed = throttle_co_pwritev_compressed,
>    4   3108  block/vmdk.c <>
>  .bdrv_co_pwritev_compressed = vmdk_co_pwritev_compressed,
>   1   6121  block/qcow2.c <>
> .bdrv_co_pwritev_compressed_part...cow2_co_pwritev_compressed_part,
> 
> We have missed at least VMDK images.

Thanks, my bad: I didn't check that more thoroughly. I was mainly looking
at the docs when drawing this conclusion.

man qemu-img:
> Only the formats qcow and qcow2 support compression. The compression
> is read-only. It means that if a compressed sector is rewritten, then
> it is rewritten as uncompressed data.

Apparently man pages got a bit outdated on this.
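
The point about the capability check can be illustrated with a small
self-contained sketch in C. MockBlockDriver, mock_driver_can_compress() and
the driver table below are hypothetical models written for this example, not
QEMU's real BlockDriver definitions. The table only mirrors the grep output
quoted above, and "raw" is included as an assumed example of a format that
does not show up in that output.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type; the real QEMU callbacks take more arguments. */
typedef int (*compressed_write_fn)(void);

/* Minimal stand-in for BlockDriver: only the two callbacks that the
 * quoted block_driver_can_compress() helper inspects are modelled. */
typedef struct {
    const char *format_name;
    compressed_write_fn bdrv_co_pwritev_compressed;
    compressed_write_fn bdrv_co_pwritev_compressed_part;
} MockBlockDriver;

static int dummy_compressed_write(void)
{
    return 0;
}

/* Same test as the quoted helper: either callback being set counts. */
static bool mock_driver_can_compress(const MockBlockDriver *drv)
{
    return drv->bdrv_co_pwritev_compressed ||
           drv->bdrv_co_pwritev_compressed_part;
}

static const MockBlockDriver mock_drivers[] = {
    { "qcow",  dummy_compressed_write, NULL },
    { "qcow2", NULL, dummy_compressed_write },
    { "vmdk",  dummy_compressed_write, NULL },
    { "raw",   NULL, NULL },   /* assumed: not in the grep output above */
};
```

Under this model vmdk passes the capability check just like qcow and qcow2,
which is why a comment restricting the flag to "qcow, qcow2" would be
misleading.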


