Re: [PATCH v2 2/3] ide/via: don't attempt to set default BAR addresses

2023-10-24 Thread BALATON Zoltan

On Tue, 24 Oct 2023, Mark Cave-Ayland wrote:

The via-ide device currently attempts to set the default BAR addresses to the
values shown in the datasheet, but this doesn't work for 2 reasons: firstly
BARS 1-4 do not set the bottom 2 bits to PCI_BASE_ADDRESS_SPACE_IO, and
secondly the initial PCI bus reset clears the values of all PCI device BARs
after the device itself has been reset.

Remove the setting of the default BAR addresses from via_ide_reset() to ensure
there is no doubt that these values are never exposed to the guest.


Suggested-by: BALATON Zoltan 

this was taken from my original patch so I could also add R-b but being in 
two tags already is enough so let others listed if they want :-)


Regards,
BALATON Zoltan


Signed-off-by: Mark Cave-Ayland 
Tested-by: BALATON Zoltan 
Tested-by: Bernhard Beschow 
---
hw/ide/via.c | 5 -
1 file changed, 5 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index fff23803a6..87b134083a 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -132,11 +132,6 @@ static void via_ide_reset(DeviceState *dev)
pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_FAST_BACK |
 PCI_STATUS_DEVSEL_MEDIUM);

-pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, 0x01f0);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_1, 0x03f4);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_2, 0x0170);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_3, 0x0374);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_4, 0xcc01); /* BMIBA: 20-23h 
*/
pci_set_long(pci_conf + PCI_INTERRUPT_LINE, 0x010e);

/* IDE chip enable, IDE configuration 1/2, IDE FIFO Configuration*/





[PATCH v2 3/3] hw/ide/via: implement legacy/native mode switching

2023-10-24 Thread Mark Cave-Ayland
Allow the VIA IDE controller to switch between both legacy and native modes by
calling pci_ide_update_mode() to reconfigure the device whenever PCI_CLASS_PROG
is updated.

This patch moves the initial setting of PCI_CLASS_PROG from via_ide_realize() to
via_ide_reset(), and removes the direct setting of PCI_INTERRUPT_PIN during PCI
bus reset since this is now managed by pci_ide_update_mode(). This ensures that
the device configuration is always consistent with respect to the currently
selected mode.

Signed-off-by: Mark Cave-Ayland 
Tested-by: BALATON Zoltan 
Tested-by: Bernhard Beschow 
---
 hw/ide/via.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 87b134083a..0cb65c8a3d 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -28,6 +28,7 @@
 #include "hw/pci/pci.h"
 #include "migration/vmstate.h"
 #include "qemu/module.h"
+#include "qemu/range.h"
 #include "sysemu/dma.h"
 #include "hw/isa/vt82c686.h"
 #include "hw/ide/pci.h"
@@ -128,11 +129,14 @@ static void via_ide_reset(DeviceState *dev)
 ide_bus_reset(>bus[i]);
 }
 
+pci_config_set_prog_interface(pci_conf, 0x8a); /* legacy mode */
+pci_ide_update_mode(d);
+
 pci_set_word(pci_conf + PCI_COMMAND, PCI_COMMAND_IO | PCI_COMMAND_WAIT);
 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_FAST_BACK |
  PCI_STATUS_DEVSEL_MEDIUM);
 
-pci_set_long(pci_conf + PCI_INTERRUPT_LINE, 0x010e);
+pci_set_byte(pci_conf + PCI_INTERRUPT_LINE, 0xe);
 
 /* IDE chip enable, IDE configuration 1/2, IDE FIFO Configuration*/
 pci_set_long(pci_conf + 0x40, 0x0a090600);
@@ -154,6 +158,18 @@ static void via_ide_reset(DeviceState *dev)
 pci_set_long(pci_conf + 0xc0, 0x00020001);
 }
 
+static void via_ide_cfg_write(PCIDevice *pd, uint32_t addr,
+  uint32_t val, int len)
+{
+PCIIDEState *d = PCI_IDE(pd);
+
+pci_default_write_config(pd, addr, val, len);
+
+if (range_covers_byte(addr, len, PCI_CLASS_PROG)) {
+pci_ide_update_mode(d);
+}
+}
+
 static void via_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
@@ -161,7 +177,6 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
 uint8_t *pci_conf = dev->config;
 int i;
 
-pci_config_set_prog_interface(pci_conf, 0x8a); /* legacy mode */
 pci_set_long(pci_conf + PCI_CAPABILITY_LIST, 0x00c0);
 dev->wmask[PCI_INTERRUPT_LINE] = 0;
 dev->wmask[PCI_CLASS_PROG] = 5;
@@ -216,6 +231,7 @@ static void via_ide_class_init(ObjectClass *klass, void 
*data)
 /* Reason: only works as function of VIA southbridge */
 dc->user_creatable = false;
 
+k->config_write = via_ide_cfg_write;
 k->realize = via_ide_realize;
 k->exit = via_ide_exitfn;
 k->vendor_id = PCI_VENDOR_ID_VIA;
-- 
2.39.2




[PATCH v2 2/3] ide/via: don't attempt to set default BAR addresses

2023-10-24 Thread Mark Cave-Ayland
The via-ide device currently attempts to set the default BAR addresses to the
values shown in the datasheet, but this doesn't work for 2 reasons: firstly
BARS 1-4 do not set the bottom 2 bits to PCI_BASE_ADDRESS_SPACE_IO, and
secondly the initial PCI bus reset clears the values of all PCI device BARs
after the device itself has been reset.

Remove the setting of the default BAR addresses from via_ide_reset() to ensure
there is no doubt that these values are never exposed to the guest.

Signed-off-by: Mark Cave-Ayland 
Tested-by: BALATON Zoltan 
Tested-by: Bernhard Beschow 
---
 hw/ide/via.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index fff23803a6..87b134083a 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -132,11 +132,6 @@ static void via_ide_reset(DeviceState *dev)
 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_FAST_BACK |
  PCI_STATUS_DEVSEL_MEDIUM);
 
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, 0x01f0);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_1, 0x03f4);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_2, 0x0170);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_3, 0x0374);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_4, 0xcc01); /* BMIBA: 20-23h 
*/
 pci_set_long(pci_conf + PCI_INTERRUPT_LINE, 0x010e);
 
 /* IDE chip enable, IDE configuration 1/2, IDE FIFO Configuration*/
-- 
2.39.2




[PATCH v2 0/3] ide: implement simple legacy/native mode switching for PCI IDE controllers

2023-10-24 Thread Mark Cave-Ayland
This series adds a simple implementation of legacy/native mode switching for PCI
IDE controllers and updates the via-ide device to use it.

The approach I take here is to add a new pci_ide_update_mode() function which 
handles
management of the PCI BARs and legacy IDE ioports for each mode to avoid 
exposing
details of the internal logic to individual PCI IDE controllers.

As noted in [1] this is extracted from a local WIP branch I have which contains
further work in this area. However for the moment I've kept it simple (and
restricted it to the via-ide device) which is good enough for Zoltan's PPC
images whilst paving the way for future improvements after 8.2.

Signed-off-by: Mark Cave-Ayland 

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg05403.html

v2:
- Rebase onto master
- Mask the bottom 4 bits of PCI_CLASS_PROG in pci_ide_update_mode() in patch 1
- Add patch 2 to remove the default BAR addresses to avoid confusion
- Don't set PCI_INTERRUPT_PIN directly in via_ide_reset() as it is already set
  by pci_ide_update_mode() in patch 3, and reword the commit message accordingly
- Add Tested-By tags from Zoltan and Bernhard


Mark Cave-Ayland (3):
  ide/pci.c: introduce pci_ide_update_mode() function
  ide/via: don't attempt to set default BAR addresses
  hw/ide/via: implement legacy/native mode switching

 hw/ide/pci.c | 90 
 hw/ide/via.c | 25 
 include/hw/ide/pci.h |  1 +
 3 files changed, 109 insertions(+), 7 deletions(-)

-- 
2.39.2




[PATCH v2 1/3] ide/pci.c: introduce pci_ide_update_mode() function

2023-10-24 Thread Mark Cave-Ayland
This function reads the value of the PCI_CLASS_PROG register for PCI IDE
controllers and configures the PCI BARs and/or IDE ioports accordingly.

In the case where we switch to legacy mode, the PCI BARs are set to return zero
(as suggested in the "PCI IDE Controller" specification), the legacy IDE ioports
are enabled, and the PCI interrupt pin cleared to indicate legacy IRQ routing.

Conversely when we switch to native mode, the legacy IDE ioports are disabled
and the PCI interrupt pin set to indicate native IRQ routing. The contents of
the PCI BARs are unspecified, but this is not an issue since if a PCI IDE
controller has been switched to native mode then its BARs will need to be
programmed.

Signed-off-by: Mark Cave-Ayland 
Tested-by: BALATON Zoltan 
Tested-by: Bernhard Beschow 
---
 hw/ide/pci.c | 90 
 include/hw/ide/pci.h |  1 +
 2 files changed, 91 insertions(+)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index a25b352537..5be643b460 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -104,6 +104,96 @@ const MemoryRegionOps pci_ide_data_le_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static const MemoryRegionPortio ide_portio_list[] = {
+{ 0, 8, 1, .read = ide_ioport_read, .write = ide_ioport_write },
+{ 0, 1, 2, .read = ide_data_readw, .write = ide_data_writew },
+{ 0, 1, 4, .read = ide_data_readl, .write = ide_data_writel },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionPortio ide_portio2_list[] = {
+{ 0, 1, 1, .read = ide_status_read, .write = ide_ctrl_write },
+PORTIO_END_OF_LIST(),
+};
+
+void pci_ide_update_mode(PCIIDEState *s)
+{
+PCIDevice *d = PCI_DEVICE(s);
+uint8_t mode = d->config[PCI_CLASS_PROG];
+
+switch (mode & 0xf) {
+case 0xa:
+/* Both channels legacy mode */
+
+/* Zero BARs */
+pci_set_long(d->config + PCI_BASE_ADDRESS_0, 0x0);
+pci_set_long(d->config + PCI_BASE_ADDRESS_1, 0x0);
+pci_set_long(d->config + PCI_BASE_ADDRESS_2, 0x0);
+pci_set_long(d->config + PCI_BASE_ADDRESS_3, 0x0);
+
+/* Clear interrupt pin */
+pci_config_set_interrupt_pin(d->config, 0);
+
+/* Add legacy IDE ports */
+if (!s->bus[0].portio_list.owner) {
+portio_list_init(>bus[0].portio_list, OBJECT(d),
+ ide_portio_list, >bus[0], "ide");
+portio_list_add(>bus[0].portio_list,
+pci_address_space_io(d), 0x1f0);
+}
+
+if (!s->bus[0].portio2_list.owner) {
+portio_list_init(>bus[0].portio2_list, OBJECT(d),
+ ide_portio2_list, >bus[0], "ide");
+portio_list_add(>bus[0].portio2_list,
+pci_address_space_io(d), 0x3f6);
+}
+
+if (!s->bus[1].portio_list.owner) {
+portio_list_init(>bus[1].portio_list, OBJECT(d),
+ide_portio_list, >bus[1], "ide");
+portio_list_add(>bus[1].portio_list,
+pci_address_space_io(d), 0x170);
+}
+
+if (!s->bus[1].portio2_list.owner) {
+portio_list_init(>bus[1].portio2_list, OBJECT(d),
+ ide_portio2_list, >bus[1], "ide");
+portio_list_add(>bus[1].portio2_list,
+pci_address_space_io(d), 0x376);
+}
+break;
+
+case 0xf:
+/* Both channels native mode */
+
+/* Set interrupt pin */
+pci_config_set_interrupt_pin(d->config, 1);
+
+/* Remove legacy IDE ports */
+if (s->bus[0].portio_list.owner) {
+portio_list_del(>bus[0].portio_list);
+portio_list_destroy(>bus[0].portio_list);
+}
+
+if (s->bus[0].portio2_list.owner) {
+portio_list_del(>bus[0].portio2_list);
+portio_list_destroy(>bus[0].portio2_list);
+}
+
+if (s->bus[1].portio_list.owner) {
+portio_list_del(>bus[1].portio_list);
+portio_list_destroy(>bus[1].portio_list);
+}
+
+if (s->bus[1].portio2_list.owner) {
+portio_list_del(>bus[1].portio2_list);
+portio_list_destroy(>bus[1].portio2_list);
+}
+break;
+}
+}
+
 static IDEState *bmdma_active_if(BMDMAState *bmdma)
 {
 assert(bmdma->bus->retry_unit != (uint8_t)-1);
diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 1ff469de87..a814a0a7c3 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -61,6 +61,7 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
 void bmdma_status_writeb(BMDMAState *bm, uint32_t val);
 extern MemoryRegionOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev);
+void pci_ide_update_mode(PCIIDEState *s);
 
 extern const VMStateDescription vmstate_ide_pci;
 extern const MemoryRegionOps pci_ide_cmd_le_ops;
-- 
2.39.2




Re: [PATCH 1/2] ide/pci.c: introduce pci_ide_update_mode() function

2023-10-24 Thread Mark Cave-Ayland

On 24/10/2023 08:08, Bernhard Beschow wrote:


Am 23. Oktober 2023 21:06:11 UTC schrieb Mark Cave-Ayland 
:

On 23/10/2023 18:19, Bernhard Beschow wrote:


Am 22. Oktober 2023 22:06:30 UTC schrieb Bernhard Beschow :



Am 19. Oktober 2023 13:04:51 UTC schrieb Mark Cave-Ayland 
:

This function reads the value of the PCI_CLASS_PROG register for PCI IDE
controllers and configures the PCI BARs and/or IDE ioports accordingly.

In the case where we switch to legacy mode, the PCI BARs are set to return zero
(as suggested in the "PCI IDE Controller" specification), the legacy IDE ioports
are enabled, and the PCI interrupt pin cleared to indicate legacy IRQ routing.

Conversely when we switch to native mode, the legacy IDE ioports are disabled
and the PCI interrupt pin set to indicate native IRQ routing. The contents of
the PCI BARs are unspecified, but this is not an issue since if a PCI IDE
controller has been switched to native mode then its BARs will need to be
programmed.

Signed-off-by: Mark Cave-Ayland 
---
hw/ide/pci.c | 90 
include/hw/ide/pci.h |  1 +
2 files changed, 91 insertions(+)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index a25b352537..9eb30af632 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -104,6 +104,96 @@ const MemoryRegionOps pci_ide_data_le_ops = {
  .endianness = DEVICE_LITTLE_ENDIAN,
};

+static const MemoryRegionPortio ide_portio_list[] = {
+{ 0, 8, 1, .read = ide_ioport_read, .write = ide_ioport_write },
+{ 0, 1, 2, .read = ide_data_readw, .write = ide_data_writew },
+{ 0, 1, 4, .read = ide_data_readl, .write = ide_data_writel },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionPortio ide_portio2_list[] = {


Although the naming seems familiar: What about renaming both lists to something 
like ide_portio_primary_list resp. ide_portio_secondary_list? Having one list 
carrying a number in its name while omitting it for the other I find rather 
confusing.


The two different portio_lists don't represent the primary and secondary 
interfaces though: they represent the command and data ports for a single 
interface.


Ah, right.


I've left the naming as-is (at least for now) so that all of the IDEBus fields, 
ISA IDE ioports and PCI IDE ioports all share the same naming convention.


Okay. At some point we should really harmonize the names to avoid above 
confusion. The PCI IDE BAR code does a much better job at naming and could 
serve as a template. Then all IDE code would clearly communicate that these are 
all the same concepts. I could send a patch for it once this series is in.


I agree that would be useful: and also same for the memory region names which could 
be updated so it's possible to tell the difference between the command and data ports 
(or indeed base/control as indicated in other online references).





+{ 0, 1, 1, .read = ide_status_read, .write = ide_ctrl_write },
+PORTIO_END_OF_LIST(),
+};
+
+void pci_ide_update_mode(PCIIDEState *s)
+{
+PCIDevice *d = PCI_DEVICE(s);
+uint8_t mode = d->config[PCI_CLASS_PROG];
+
+switch (mode) {


Maybe

   switch (mode & 0xf) {

here such that only the bits relevant to the PCI IDE controller specification 
are analyzed?


Due to the above conversation I realize that s->bus[] could be iterated over 
such that only the two bits of each bus could be switch()ed over. This would avoid 
some duplicate code, model the specification closer and allow for catching illegal 
states. Illegal states could be logged as guest errors. But it would also 
complicate dealing with the interrupt pin. So this might be a future extension.


Agreed. That particular logic will also change as a result of adding support for 
switching individual buses on the PCI IDE controller: not that I've seen anything do 
this to date, but it makes sense to implement what the documentation says where possible.


I didn't manage to send out the v2 yesterday evening, but I'll do so now.


Best regards,
Bernhard


Then we can omit the high '8' nibble in the case labels which indicate bus 
master capability which is obviously out of scope of the switch statement 
(since you're not touching the BM DMA BAR).


+case 0x8a:


Perhaps we could add a

   case 0x0:

in front of the above label for compatibility with PIIX-IDE? That way, this 
function could be reused in the future for resetting PIIX-IDE.


+/* Both channels legacy mode */
+
+/* Zero BARs */
+pci_set_long(d->config + PCI_BASE_ADDRESS_0, 0x0);
+pci_set_long(d->config + PCI_BASE_ADDRESS_1, 0x0);
+pci_set_long(d->config + PCI_BASE_ADDRESS_2, 0x0);
+pci_set_long(d->config + PCI_BASE_ADDRESS_3, 0x0);
+
+/* Clear interrupt pin */
+pci_config_set_interrupt_pin(d->config, 0);


Do we really need to toggle the interrupt pin in this function? Or is this 
VIA-specific? This byte isn't even defined for PIIX-IDE.

Best regards,
Bernhard


+
+/* Add legacy IDE 

Re: [PATCH v2 5/9] migration: check required subsections are loaded, once

2023-10-24 Thread Peter Xu
On Tue, Oct 24, 2023 at 12:41:56PM +0200, Juan Quintela wrote:
> > @@ -509,6 +538,13 @@ static int vmstate_subsection_load(QEMUFile *f, const 
> > VMStateDescription *vmsd,
> >  }
> >  }
> >  
> > +for (i = 0; i < n; i++) {
> > +if (!visited[i] && vmstate_section_needed(vmsd->subsections[i], 
> > opaque)) {
> > +trace_vmstate_subsection_load_bad(vmsd->name, 
> > vmsd->subsections[i]->name, "(not visited)");
> > +return -ENOENT;
> > +}
> > +}
> > +
> >  trace_vmstate_subsection_load_good(vmsd->name);
> >  return 0;
> >  }
> 
> This part is the only one where I can see there could be some
> discussion.  So I wil wait to see what other people think.

Previous email:

https://lore.kernel.org/qemu-devel/ZR2P1RbxCfBdYBaQ@x1n/

I still think it is safer to not fail unless justified that we won't hit
surprises in the ->needed().  There are a lot so I assume it's non-trivial
to justify.

We can switch the tracepoint into error_report() in that case, though, as
long as it won't fail the migration.

Thanks,

-- 
Peter Xu




Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 17:22 +0100, Paul Durrant wrote:
> On 24/10/2023 16:48, David Woodhouse wrote:
> > On Tue, 2023-10-24 at 16:44 +0100, Paul Durrant wrote:
> > > On 19/10/2023 16:40, David Woodhouse wrote:
> > > > From: David Woodhouse 
> > > > 
> > > > On soft reset, the prinary console event channel needs to be rebound to
> > > > the backend port (in the xen-console driver). We could put that into the
> > > > xen-console driver itself, but it's slightly less ugly to keep it within
> > > > the KVM/Xen code, by stashing the backend port# on event channel reset
> > > > and then rebinding in the primary console reset when it has to recreate
> > > > the guest port anyway.
> > > 
> > > Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to
> > > me. I go check.
> > 
> > I spent an unhapp hour trying to work out how Xen actually does any of
> > this :)
> > 
> > In the short term I'm more interested in having soft reset work, than
> > an explicit EVTCHNOP_reset. And I can't work out *how*, but we do seem
> > to have console again after a kexec in real Xen.
> 
> *Soft* reset may do it, but not the EVTCHNOP_reset hypercall itself, 
> because there's a bunch of impenetrable toolstack magic involved the 
> former. Perhaps you could just push the re-bind code up a layer into
> kvm_xen_soft_reset().

Actually, all we're doing here is *storing* the be_port so that the
*console* reset code can later connect to it. So the actual reconnect 
is already called separately from a layer up in kvm_xen_soft_reset().

If the guest just calls EVTCHNOP_reset then it'll just get the event
channels reset and the console *won't* reconnect.

Actually.. if the guest just calls EVTCHNOP_reset I think they'll just
hit the assert(qemu_mutex_iothread_locked()) at line 1109. We need a  
QEMU_IOTHREAD_LOCK_GUARD() in the penultimate line of
xen_evtchn_reset_op(). I'll fix that separately.


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 02/12] qemu_file: Use a stat64 for qemu_file_transferred

2023-10-24 Thread Eric Blake
On Tue, Oct 24, 2023 at 05:10:32PM +0200, Juan Quintela wrote:
> This way we can read it from any thread.
> I checked that it gives the same value than the current one.  We never

s/than/as/

> use to qemu_files at the same time.

s/to/two/

> 
> Signed-off-by: Juan Quintela 
> ---
>  migration/migration-stats.h | 4 
>  migration/qemu-file.c   | 5 +++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 12/12] qemu-file: Make qemu_fflush() return errors

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> This let us simplify code of this shape.
>
>qemu_fflush(f);
>int ret = qemu_file_get_error(f);
>if (ret) {
>   return ret;
>}
>
> into:
>
>int ret = qemu_fflush(f);
>if (ret) {
>   return ret;
>}
>
> I updated all callers where there is any error check.
> qemu_fclose() don't need to check for f->last_error because
> qemu_fflush() returns it at the beggining of the function.
>
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Juan Quintela 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 11/12] migration: Remove transferred atomic counter

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> After last commit, it is a write only variable.
>
> Signed-off-by: Juan Quintela 
> ---
>  migration/migration-stats.h | 4 
>  migration/multifd.c | 3 ---
>  migration/ram.c | 1 -
>  3 files changed, 8 deletions(-)
>
> diff --git a/migration/migration-stats.h b/migration/migration-stats.h
> index 68f3939188..05290ade76 100644
> --- a/migration/migration-stats.h
> +++ b/migration/migration-stats.h
> @@ -97,10 +97,6 @@ typedef struct {
>   * Number of bytes sent through RDMA.
>   */
>  Stat64 rdma_bytes;
> -/*
> - * Total number of bytes transferred.
> - */
> -Stat64 transferred;
>  /*
>   * Number of pages transferred that were full of zeros.
>   */
> diff --git a/migration/multifd.c b/migration/multifd.c
> index e2a45c667a..ec58c58082 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -188,7 +188,6 @@ static int multifd_send_initial_packet(MultiFDSendParams 
> *p, Error **errp)
>  return -1;
>  }
>  stat64_add(_stats.multifd_bytes, size);
> -stat64_add(_stats.transferred, size);
>  return 0;
>  }
>  
> @@ -733,8 +732,6 @@ static void *multifd_send_thread(void *opaque)
>  
>  stat64_add(_stats.multifd_bytes,
> p->next_packet_size + p->packet_len);
> -stat64_add(_stats.transferred,
> -   p->next_packet_size + p->packet_len);
>  p->next_packet_size = 0;
>  qemu_mutex_lock(>mutex);
>  p->pending_job--;
> diff --git a/migration/ram.c b/migration/ram.c
> index 5ccf70333a..6d2bf50614 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -455,7 +455,6 @@ void ram_transferred_add(uint64_t bytes)
>  } else {
>  stat64_add(_stats.downtime_bytes, bytes);
>  }
> -stat64_add(_stats.transferred, bytes);
>  }
>  
>  struct MigrationOps {

Reviewed-by: Fabiano Rosas 



Re: [PATCH 10/12] migration: Use migration_transferred_bytes()

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> There are only two differnces with the old value:
>
> - the amount of QEMUFile that hasn't yet been flushed.  It can be
>   discussed what is more exact, the new or the old one.
> - the amount of transferred bytes that we forgot to account for (the
>   newer is better, i.e. exact).
>
> Notice that this two values are used to:
> a - present to the user
> b - calculate the rate_limit
>
> So a few KB here and there is not going to make a difference.
>
> Signed-off-by: Juan Quintela 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 06/12] qemu-file: Remove _noflush from qemu_file_transferred_noflush()

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> qemu_file_transferred() don't exist anymore, so we can reuse the name.
>
> Signed-off-by: Juan Quintela 
>
> ---
>
> v2: Update the documentation (thanks fabiano)
> ---
>  migration/qemu-file.h | 9 -
>  migration/block.c | 4 ++--
>  migration/qemu-file.c | 2 +-
>  migration/savevm.c| 6 +++---
>  migration/vmstate.c   | 4 ++--
>  5 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index 8b71152754..1b2f6b8d8f 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -34,15 +34,14 @@ QEMUFile *qemu_file_new_output(QIOChannel *ioc);
>  int qemu_fclose(QEMUFile *f);
>  
>  /*
> - * qemu_file_transferred_noflush:
> + * qemu_file_transferred:
>   *
> - * As qemu_file_transferred except for writable files, where no flush
> - * is performed and the reported amount will include the size of any
> - * queued buffers, on top of the amount actually transferred.
> + * No flush is performed and the reported amount will include the size
> + * of any queued buffers, on top of the amount actually transferred.
>   *
>   * Returns: the total bytes transferred and queued
>   */
> -uint64_t qemu_file_transferred_noflush(QEMUFile *f);
> +uint64_t qemu_file_transferred(QEMUFile *f);
>  
>  /*
>   * put_buffer without copying the buffer.
> diff --git a/migration/block.c b/migration/block.c
> index b60698d6e2..47f11d0e4f 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -752,7 +752,7 @@ static int block_save_setup(QEMUFile *f, void *opaque)
>  static int block_save_iterate(QEMUFile *f, void *opaque)
>  {
>  int ret;
> -uint64_t last_bytes = qemu_file_transferred_noflush(f);
> +uint64_t last_bytes = qemu_file_transferred(f);
>  
>  trace_migration_block_save("iterate", block_mig_state.submitted,
> block_mig_state.transferred);
> @@ -804,7 +804,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
>  }
>  
>  qemu_put_be64(f, BLK_MIG_FLAG_EOS);
> -uint64_t delta_bytes = qemu_file_transferred_noflush(f) - last_bytes;
> +uint64_t delta_bytes = qemu_file_transferred(f) - last_bytes;
>  return (delta_bytes > 0);
>  }
>  
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index efa5f11033..0158db2a54 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -618,7 +618,7 @@ int coroutine_mixed_fn qemu_get_byte(QEMUFile *f)
>  return result;
>  }
>  
> -uint64_t qemu_file_transferred_noflush(QEMUFile *f)
> +uint64_t qemu_file_transferred(QEMUFile *f)
>  {
>  uint64_t ret = stat64_get(_stats.qemu_file_transferred);
>  int i;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 8622f229e5..9c90499609 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -927,9 +927,9 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se)
>  static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
> JSONWriter *vmdesc)
>  {
> -uint64_t old_offset = qemu_file_transferred_noflush(f);
> +uint64_t old_offset = qemu_file_transferred(f);
>  se->ops->save_state(f, se->opaque);
> -uint64_t size = qemu_file_transferred_noflush(f) - old_offset;
> +uint64_t size = qemu_file_transferred(f) - old_offset;
>  
>  if (vmdesc) {
>  json_writer_int64(vmdesc, "size", size);
> @@ -3053,7 +3053,7 @@ bool save_snapshot(const char *name, bool overwrite, 
> const char *vmstate,
>  goto the_end;
>  }
>  ret = qemu_savevm_state(f, errp);
> -vm_state_size = qemu_file_transferred_noflush(f);
> +vm_state_size = qemu_file_transferred(f);
>  ret2 = qemu_fclose(f);
>  if (ret < 0) {
>  goto the_end;
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index 1cf9e45b85..16420fa9a3 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -386,7 +386,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
> VMStateDescription *vmsd,
>  void *curr_elem = first_elem + size * i;
>  
>  vmsd_desc_field_start(vmsd, vmdesc_loop, field, i, n_elems);
> -old_offset = qemu_file_transferred_noflush(f);
> +old_offset = qemu_file_transferred(f);
>  if (field->flags & VMS_ARRAY_OF_POINTER) {
>  assert(curr_elem);
>  curr_elem = *(void **)curr_elem;
> @@ -416,7 +416,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
> VMStateDescription *vmsd,
>  return ret;
>  }
>  
> -written_bytes = qemu_file_transferred_noflush(f) - 
> old_offset;
> +written_bytes = qemu_file_transferred(f) - old_offset;
>  vmsd_desc_field_end(vmsd, vmdesc_loop, field, written_bytes, 
> i);
>  
>  /* Compressed arrays only care about the first element */

Reviewed-by: Fabiano Rosas 



Re: [PATCH 05/12] qemu_file: Remove unused qemu_file_transferred()

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> Signed-off-by: Juan Quintela 
> ---
>  migration/qemu-file.h | 18 --
>  migration/qemu-file.c |  7 ---
>  2 files changed, 25 deletions(-)
>
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index a29c37b0d0..8b71152754 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -33,24 +33,6 @@ QEMUFile *qemu_file_new_input(QIOChannel *ioc);
>  QEMUFile *qemu_file_new_output(QIOChannel *ioc);
>  int qemu_fclose(QEMUFile *f);
>  
> -/*
> - * qemu_file_transferred:
> - *
> - * Report the total number of bytes transferred with
> - * this file.
> - *
> - * For writable files, any pending buffers will be
> - * flushed, so the reported value will be equal to
> - * the number of bytes transferred on the wire.
> - *
> - * For readable files, the reported value will be
> - * equal to the number of bytes transferred on the
> - * wire.
> - *
> - * Returns: the total bytes transferred
> - */
> -uint64_t qemu_file_transferred(QEMUFile *f);
> -
>  /*
>   * qemu_file_transferred_noflush:
>   *
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 641ab703cc..efa5f11033 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -632,13 +632,6 @@ uint64_t qemu_file_transferred_noflush(QEMUFile *f)
>  return ret;
>  }
>  
> -uint64_t qemu_file_transferred(QEMUFile *f)
> -{
> -g_assert(qemu_file_is_writable(f));
> -qemu_fflush(f);
> -return stat64_get(_stats.qemu_file_transferred);
> -}
> -
>  void qemu_put_be16(QEMUFile *f, unsigned int v)
>  {
>  qemu_put_byte(f, v >> 8);

Reviewed-by: Fabiano Rosas 



Re: [PATCH 04/12] migration: Use the number of transferred bytes directly

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> We only use migration_transferred_bytes() to calculate the rate_limit,
> for that we don't need to flush whatever is on the qemu_file buffer.
> Remember that the buffer is really small (normal case is 32K if we use
> iov's can be 64 * TARGET_PAGE_SIZE), so this is not relevant to
> calculations.
>
> Signed-off-by: Juan Quintela 
> ---
>  migration/migration-stats.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
> index 4cc989d975..1d9197b4c3 100644
> --- a/migration/migration-stats.c
> +++ b/migration/migration-stats.c
> @@ -63,7 +63,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
>  {
>  uint64_t multifd = stat64_get(_stats.multifd_bytes);
>  uint64_t rdma = stat64_get(_stats.rdma_bytes);
> -uint64_t qemu_file = qemu_file_transferred(f);
> +uint64_t qemu_file = stat64_get(_stats.qemu_file_transferred);
>  
>  trace_migration_transferred_bytes(qemu_file, multifd, rdma);
>  return qemu_file + multifd + rdma;

Reviewed-by: Fabiano Rosas 



Re: [PATCH 03/12] qemu_file: total_transferred is not used anymore

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> Signed-off-by: Juan Quintela 
> ---
>  migration/qemu-file.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 384985f534..641ab703cc 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -41,9 +41,6 @@ struct QEMUFile {
>  QIOChannel *ioc;
>  bool is_writable;
>  
> -/* The sum of bytes transferred on the wire */
> -uint64_t total_transferred;
> -
>  int buf_index;
>  int buf_size; /* 0 when writing */
>  uint8_t buf[IO_BUF_SIZE];
> @@ -282,7 +279,6 @@ void qemu_fflush(QEMUFile *f)
>  qemu_file_set_error_obj(f, -EIO, local_error);
>  } else {
>  uint64_t size = iov_size(f->iov, f->iovcnt);
> -f->total_transferred += size;
>  stat64_add(_stats.qemu_file_transferred, size);
>  }

Reviewed-by: Fabiano Rosas 



Re: [PATCH 02/12] qemu_file: Use a stat64 for qemu_file_transferred

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> This way we can read it from any thread.
> I checked that it gives the same value than the current one.  We never
> use to qemu_files at the same time.
>
> Signed-off-by: Juan Quintela 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 01/12] qemu-file: We only call qemu_file_transferred_* on the sending side

2023-10-24 Thread Fabiano Rosas
Juan Quintela  writes:

> Remove the increase in qemu_file_fill_buffer() and add asserts to
> qemu_file_transferred* functions.

Patch looks ok, but I would rewrite the whole commit message like this:

Don't increment qemu_file_transferred at qemu_file_fill_buffer

We only call qemu_file_transferred_* on the sending side. Remove the
increment at qemu_file_fill_buffer() and add asserts to
qemu_file_transferred* functions.




Re: [PATCH v5 0/6] virtio: cleanup vhost-user-generic and reduce c

2023-10-24 Thread Michael S. Tsirkin
On Tue, Oct 24, 2023 at 06:14:36PM +0100, Alex Bennée wrote:
> 
> Alex Bennée  writes:
> 
> > A lot of our vhost-user stubs are large chunks of boilerplate that do
> > (mostly) the same thing. This series continues the cleanups by
> > splitting the vhost-user-base and vhost-user-generic implementations.
> > After adding a new vq_size property the rng, gpio and i2c vhost-user
> > devices become simple specialisations of the common base defining the
> > ID, number of queues and potentially the config handling.
> >
> > I've also added Manos' vhost-user-sound while I was at it.
> 
> Ping MST. Any more comments or happy to take as is?

It's tagged already.

> -- 
> Alex Bennée
> Virtualisation Tech Lead @ Linaro




Re: [PATCH v5 0/6] virtio: cleanup vhost-user-generic and reduce c

2023-10-24 Thread Alex Bennée


Alex Bennée  writes:

> A lot of our vhost-user stubs are large chunks of boilerplate that do
> (mostly) the same thing. This series continues the cleanups by
> splitting the vhost-user-base and vhost-user-generic implementations.
> After adding a new vq_size property the rng, gpio and i2c vhost-user
> devices become simple specialisations of the common base defining the
> ID, number of queues and potentially the config handling.
>
> I've also added Manos' vhost-user-sound while I was at it.

Ping MST. Any more comments or happy to take as is?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 17:25 +0100, Paul Durrant wrote:
> On 24/10/2023 16:49, David Woodhouse wrote:
> > On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote:
> > > On 24/10/2023 16:37, David Woodhouse wrote:
> > > > On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:
> > > > > On 16/10/2023 16:19, David Woodhouse wrote:
> > > > > > From: David Woodhouse 
> > > > > > 
> > > > > > The primary console is special because the toolstack maps a page at 
> > > > > > a
> > > > > > fixed GFN and also allocates the guest-side event channel. Add 
> > > > > > support
> > > > > > for that in emulated mode, so that we can have a primary console.
> > > > > > 
> > > > > > Add a *very* rudimentary stub of foriegnmem ops for emulated mode, 
> > > > > > which
> > > > > > supports literally nothing except a single-page mapping of the 
> > > > > > console
> > > > > > page. This might as well have been a hack in the xen_console 
> > > > > > driver, but
> > > > > > this way at least the special-casing is kept within the Xen 
> > > > > > emulation
> > > > > > code, and it gives us a hook for a more complete implementation 
> > > > > > if/when
> > > > > > we ever do need one.
> > > > > > 
> > > > > Why can't you map the console page via the grant table like the 
> > > > > xenstore
> > > > > page?
> > > > 
> > > > I suppose we could, but I didn't really want the generic xen-console
> > > > device code having any more of a special case for 'Xen emulation' than
> > > > it does already by having to call xen_primary_console_create().
> > > > 
> > > 
> > > But doesn't is save you the whole foreignmem thing? You can use the
> > > grant table for primary and secondary consoles.
> > 
> > Yes. And I could leave the existing foreignmem thing just for the case
> > of primary console under true Xen. It's probably not that awful a
> > special case, in the end.
> > 
> > Then again, I was surprised I didn't *already* have a foreignmem ops
> > for the emulated case, and we're probably going to want to continue
> > fleshing it out later, so I don't really mind adding it.
> > 
> 
> True. We'll need it for some of the other more fun protocols like vkbd 
> or fb. Still, I think it'd be nicer to align the xenstore and primary
> console code to look similar and punt the work until then :-)

I don't think it ends up looking like xenstore either way, does it?
Xenstore is special because it gets to use the original pointer to its
own page.

I don't think I want to hack the xen_console code to explicitly call a
xen_console_give_me_your_page() function. If not foreignmem, I think
you were suggesting that we actually call the grant mapping code to get
a pointer to the underlying page, right? 

I could kind of live with that... except that Xen has this ugly
convention that the "ring-ref" frontend node for the primary console
actually has the *MFN* not a grant ref. Which I don't understand since
the toolstack *does* populate the grant table for it (just as it does
for the xenstore page). But we'd have to add a special case exception
to that special case, so that in the emu case it's an actual grant ref
again. I think I prefer just having a stub of foreignmem, TBH.

(I didn't yet manage to get Xen to actually create a primary console of
type iomem, FWIW)




smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread Paul Durrant

On 24/10/2023 16:49, David Woodhouse wrote:

On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote:

On 24/10/2023 16:37, David Woodhouse wrote:

On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.


Why can't you map the console page via the grant table like the xenstore
page?


I suppose we could, but I didn't really want the generic xen-console
device code having any more of a special case for 'Xen emulation' than
it does already by having to call xen_primary_console_create().



But doesn't is save you the whole foreignmem thing? You can use the
grant table for primary and secondary consoles.


Yes. And I could leave the existing foreignmem thing just for the case
of primary console under true Xen. It's probably not that awful a
special case, in the end.

Then again, I was surprised I didn't *already* have a foreignmem ops
for the emulated case, and we're probably going to want to continue
fleshing it out later, so I don't really mind adding it.



True. We'll need it for some of the other more fun protocols like vkbd 
or fb. Still, I think it'd be nicer to align the xenstore and primary 
console code to look similar and punt the work until then :-)


  Paul



Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console

2023-10-24 Thread Paul Durrant

On 24/10/2023 16:48, David Woodhouse wrote:

On Tue, 2023-10-24 at 16:44 +0100, Paul Durrant wrote:

On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

On soft reset, the prinary console event channel needs to be rebound to
the backend port (in the xen-console driver). We could put that into the
xen-console driver itself, but it's slightly less ugly to keep it within
the KVM/Xen code, by stashing the backend port# on event channel reset
and then rebinding in the primary console reset when it has to recreate
the guest port anyway.


Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to
me. I go check.


I spent an unhapp hour trying to work out how Xen actually does any of
this :)

In the short term I'm more interested in having soft reset work, than
an explicit EVTCHNOP_reset. And I can't work out *how*, but we do seem
to have console again after a kexec in real Xen.


*Soft* reset may do it, but not the EVTCHNOP_reset hypercall itself, 
because there's a bunch of impenetrable toolstack magic involved the 
former. Perhaps you could just push the re-bind code up a layer into

kvm_xen_soft_reset().

  Paul



Re: [PATCH 0/12] Get Xen PV shim running in qemu

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 16:24 +0100, Alex Bennée wrote:
> 
> David Woodhouse  writes:
> 
> > I hadn't got round to getting the PV shim running yet; I thought it would
> > need work on the multiboot loader. Turns out it doesn't. I *did* need to
> > fix a couple of brown-paper-bag bugs in the per-vCPU upcall vector support,
> > and implement Xen console support though. Now I can test PV guests:
> > 
> >  $ qemu-system-x86_64 --accel kvm,xen-version=0x40011,kernel-irqchip=split \
> >    -chardev stdio,mux=on,id=char0 -device xen-console,chardev=char0 \
> >    -drive file=${GUEST_IMAGE},if=xen -display none -m 1G \
> >    -kernel ~/git/xen/xen/xen -initrd ~/git/linux/arch/x86/boot/bzImage
> >  \
> 

(Reordering your questions so the answers flow better)

> Would it be possible to have some sort of overview document in our
> manual for how Xen guests are supported under KVM?

https://qemu-project.gitlab.io/qemu/system/i386/xen.html covers running
Xen HVM guests under Qemu/KVM.

What I'm adding here is the facility to support Xen PV guests. There is
a corresponding update to the documentation in my working tree at 
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv

https://git.infradead.org/users/dwmw2/qemu.git/commitdiff/af693bf51141

PV mode is the old mode which predates proper virtualization support in
the CPUs, where a guest kernel *knows* it doesn't have access to real
(or indeed virtualized) hardware. It runs in ring 1 (or ring 3 on
x86_64) and makes hypercalls to Xen to ask it to do all the MMU
management.

When Spectre/Meltdown happened, running actual PV guests directly under
Xen became kind of insane, so we hacked a version of Xen to work as a
"shim", running inside a proper HVM guest, and just providing those MMU
management services to its guest. Its *only* guest. This shim doesn't
even do any of the PV disk/block stuff; that's passed through directly
to the real hypervisor.

So you have a real Xen hypervisor, then a "PV shim" Xen running inside
that hardware virtual machine, and a guest kernel hosted by that PV
shim.

Now, since Qemu/KVM can now pretend to be Xen and host guests that
think they're running as Xen HVM guests... Qemu/KVM can host that PV
shim too. As noted, I just had to realise that we could use '-initrd'
to trick Qemu's multiboot loader into doing it... and fix a few brown
paper bag bugs.

> So this is a KVM guest running the Xen hypervisor (via -kernel) and a
> Dom0 Linux guest (via -initrd)?

Fairly much. It's a KVM guest running that "shim" version of the Xen
hypervisor via -kernel, and a Linux guest via -initrd.

Although I wouldn't call that a "Dom0 Linux guest" because we tend to
use "Dom0" to mean the control domain, which can launch other (DomU)
guests... and that isn't what's happening here. It's more of a "Dom1".
The one and only unprivileged guest.

In particular, there's no nested virtualization here. Because in that
sense, what "Xen" does to host a PV guest isn't really virtualization.

> Should this work for any Xen architecture or is this x86 specific? Does
> the -M machine model matter?

It's currently x86-specific and KVM-specific. You can use the pc or q35
models as you see fit, although see the doc linked above for discussion
of the IDE 'unplug' mechanism. And recent patches on the list to fix it
for q35.

It would be interesting to make it work on other platforms, and even
with TCG. I've tried to keep it as portable as possible up to a point,
but without too much gratuitous overengineering to chase that goal.

Making it work with TCG would require dealing with all the struct
layouts where alignment/padding differs on different host
architectures, so we probably wouldn't be able to use the Xen public
header files directly. And we would need to implement some of the basic
event channel delivery and shared info page handling that we rely on
KVM to do for us. The latter probably isn't that hard; the former is
what stopped me even bothering.

Making it work for e.g. Arm would require porting some of the KVM
support to Arm in the kernel (it's currently x86-specific). And/or
making it work for TCG but the parts that *are* accelerated in the
kernel (timers, IPIs, etc) are there for a reason though. If we do make
it work for TCG by implementing those in userspace, I wouldn't
necessarily want a *KVM* guest to have to rely on those in userspace.


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote:
> On 24/10/2023 16:37, David Woodhouse wrote:
> > On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:
> > > On 16/10/2023 16:19, David Woodhouse wrote:
> > > > From: David Woodhouse 
> > > > 
> > > > The primary console is special because the toolstack maps a page at a
> > > > fixed GFN and also allocates the guest-side event channel. Add support
> > > > for that in emulated mode, so that we can have a primary console.
> > > > 
> > > > Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
> > > > supports literally nothing except a single-page mapping of the console
> > > > page. This might as well have been a hack in the xen_console driver, but
> > > > this way at least the special-casing is kept within the Xen emulation
> > > > code, and it gives us a hook for a more complete implementation if/when
> > > > we ever do need one.
> > > > 
> > > Why can't you map the console page via the grant table like the xenstore
> > > page?
> > 
> > I suppose we could, but I didn't really want the generic xen-console
> > device code having any more of a special case for 'Xen emulation' than
> > it does already by having to call xen_primary_console_create().
> > 
> 
> But doesn't is save you the whole foreignmem thing? You can use the 
> grant table for primary and secondary consoles.

Yes. And I could leave the existing foreignmem thing just for the case
of primary console under true Xen. It's probably not that awful a
special case, in the end.

Then again, I was surprised I didn't *already* have a foreignmem ops
for the emulated case, and we're probably going to want to continue
fleshing it out later, so I don't really mind adding it.



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 16:44 +0100, Paul Durrant wrote:
> On 19/10/2023 16:40, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > On soft reset, the prinary console event channel needs to be rebound to
> > the backend port (in the xen-console driver). We could put that into the
> > xen-console driver itself, but it's slightly less ugly to keep it within
> > the KVM/Xen code, by stashing the backend port# on event channel reset
> > and then rebinding in the primary console reset when it has to recreate
> > the guest port anyway.
> 
> Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to 
> me. I go check.

I spent an unhapp hour trying to work out how Xen actually does any of
this :)

In the short term I'm more interested in having soft reset work, than
an explicit EVTCHNOP_reset. And I can't work out *how*, but we do seem
to have console again after a kexec in real Xen.


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console

2023-10-24 Thread Paul Durrant

On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

On soft reset, the prinary console event channel needs to be rebound to
the backend port (in the xen-console driver). We could put that into the
xen-console driver itself, but it's slightly less ugly to keep it within
the KVM/Xen code, by stashing the backend port# on event channel reset
and then rebinding in the primary console reset when it has to recreate
the guest port anyway.


Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to 
me. I go check.


  Paul



Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_evtchn.c  |  9 +
  hw/i386/kvm/xen_primary_console.c | 29 -
  hw/i386/kvm/xen_primary_console.h |  1 +
  3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index d72dca6591..ce4da6d37a 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -40,6 +40,7 @@
  #include "xen_evtchn.h"
  #include "xen_overlay.h"
  #include "xen_xenstore.h"
+#include "xen_primary_console.h"
  
  #include "sysemu/kvm.h"

  #include "sysemu/kvm_xen.h"
@@ -1098,6 +1099,7 @@ int xen_evtchn_soft_reset(void)
  {
  XenEvtchnState *s = xen_evtchn_singleton;
  bool flush_kvm_routes;
+uint16_t con_port = xen_primary_console_get_port();
  int i;
  
  if (!s) {

@@ -1108,6 +1110,13 @@ int xen_evtchn_soft_reset(void)
  
  qemu_mutex_lock(>port_lock);
  
+if (con_port) {

+XenEvtchnPort *p = >port_table[con_port];
+if (p->type == EVTCHNSTAT_interdomain) {
+xen_primary_console_set_be_port(p->u.interdomain.port);
+}
+}
+
  for (i = 0; i < s->nr_ports; i++) {
  close_port(s, i, _kvm_routes);
  }
diff --git a/hw/i386/kvm/xen_primary_console.c 
b/hw/i386/kvm/xen_primary_console.c
index 0aa1c16ad6..5e6e085ac7 100644
--- a/hw/i386/kvm/xen_primary_console.c
+++ b/hw/i386/kvm/xen_primary_console.c
@@ -112,6 +112,15 @@ uint16_t xen_primary_console_get_port(void)
  return s->guest_port;
  }
  
+void xen_primary_console_set_be_port(uint16_t port)

+{
+XenPrimaryConsoleState *s = xen_primary_console_singleton;
+if (s) {
+printf("be port set to %d\n", port);
+s->be_port = port;
+}
+}
+
  uint64_t xen_primary_console_get_pfn(void)
  {
  XenPrimaryConsoleState *s = xen_primary_console_singleton;
@@ -142,6 +151,20 @@ static void alloc_guest_port(XenPrimaryConsoleState *s)
  }
  }
  
+static void rebind_guest_port(XenPrimaryConsoleState *s)

+{
+struct evtchn_bind_interdomain inter = {
+.remote_dom = DOMID_QEMU,
+.remote_port = s->be_port,
+};
+
+if (!xen_evtchn_bind_interdomain_op()) {
+s->guest_port = inter.local_port;
+}
+
+s->be_port = 0;
+}
+
  int xen_primary_console_reset(void)
  {
  XenPrimaryConsoleState *s = xen_primary_console_singleton;
@@ -154,7 +177,11 @@ int xen_primary_console_reset(void)
  xen_overlay_do_map_page(>console_page, gpa);
  }
  
-alloc_guest_port(s);

+if (s->be_port) {
+rebind_guest_port(s);
+} else {
+alloc_guest_port(s);
+}
  
  trace_xen_primary_console_reset(s->guest_port);
  
diff --git a/hw/i386/kvm/xen_primary_console.h b/hw/i386/kvm/xen_primary_console.h

index dd4922f3f4..7e2989ea0d 100644
--- a/hw/i386/kvm/xen_primary_console.h
+++ b/hw/i386/kvm/xen_primary_console.h
@@ -16,6 +16,7 @@ void xen_primary_console_create(void);
  int xen_primary_console_reset(void);
  
  uint16_t xen_primary_console_get_port(void);

+void xen_primary_console_set_be_port(uint16_t port);
  uint64_t xen_primary_console_get_pfn(void);
  void *xen_primary_console_get_map(void);
  





Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread Paul Durrant

On 24/10/2023 16:37, David Woodhouse wrote:

On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.


Why can't you map the console page via the grant table like the xenstore
page?


I suppose we could, but I didn't really want the generic xen-console
device code having any more of a special case for 'Xen emulation' than
it does already by having to call xen_primary_console_create().



But doesn't is save you the whole foreignmem thing? You can use the 
grant table for primary and secondary consoles.


  Paul




Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:
> On 16/10/2023 16:19, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > The primary console is special because the toolstack maps a page at a
> > fixed GFN and also allocates the guest-side event channel. Add support
> > for that in emulated mode, so that we can have a primary console.
> > 
> > Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
> > supports literally nothing except a single-page mapping of the console
> > page. This might as well have been a hack in the xen_console driver, but
> > this way at least the special-casing is kept within the Xen emulation
> > code, and it gives us a hook for a more complete implementation if/when
> > we ever do need one.
> > 
> Why can't you map the console page via the grant table like the xenstore 
> page?

I suppose we could, but I didn't really want the generic xen-console
device code having any more of a special case for 'Xen emulation' than
it does already by having to call xen_primary_console_create().




smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2 10/24] hw/xen: populate store frontend nodes with XenStore PFN/port

2023-10-24 Thread Paul Durrant

On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

This is kind of redundant since without being able to get these through
some other method (HVMOP_get_param) the guest wouldn't be able to access
XenStore in order to find them. But Xen populates them, and it does
allow guests to *rebind* to the event channel port after a reset.

Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 11 +++
  1 file changed, 11 insertions(+)



Reviewed-by: Paul Durrant 




Re: [PATCH 0/12] Get Xen PV shim running in qemu

2023-10-24 Thread Alex Bennée


David Woodhouse  writes:

> I hadn't got round to getting the PV shim running yet; I thought it would
> need work on the multiboot loader. Turns out it doesn't. I *did* need to
> fix a couple of brown-paper-bag bugs in the per-vCPU upcall vector support,
> and implement Xen console support though. Now I can test PV guests:
>
>  $ qemu-system-x86_64 --accel kvm,xen-version=0x40011,kernel-irqchip=split \
>-chardev stdio,mux=on,id=char0 -device xen-console,chardev=char0 \
>-drive file=${GUEST_IMAGE},if=xen -display none -m 1G \
>-kernel ~/git/xen/xen/xen -initrd ~/git/linux/arch/x86/boot/bzImage
>  \

So this is a KVM guest running the Xen hypervisor (via -kernel) and a
Dom0 Linux guest (via -initrd)?

Should this work for any Xen architecture or is this x86 specific? Does
the -M machine model matter?

Would it be possible to have some sort of overview document in our
manual for how Xen guests are supported under KVM?

>-append "loglvl=all -- console=hvc0 root=/dev/xvda1"
>


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v2 05/24] hw/xen: fix XenStore watch delivery to guest

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 16:19 +0100, Paul Durrant wrote:
> On 19/10/2023 16:40, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > When fire_watch_cb() found the response buffer empty, it would call
> > deliver_watch() to generate the XS_WATCH_EVENT message in the response
> > buffer and send an event channel notification to the guest… without
> > actually *copying* the response buffer into the ring. So there was
> > nothing for the guest to see. The pending response didn't actually get
> > processed into the ring until the guest next triggered some activity
> > from its side.
> > 
> > Add the missing call to put_rsp().
> > 
> > It might have been slightly nicer to call xen_xenstore_event() here,
> > which would *almost* have worked. Except for the fact that it calls
> > xen_be_evtchn_pending() to check that it really does have an event
> > pending (and clear the eventfd for next time). And under Xen it's
> > defined that setting that fd to O_NONBLOCK isn't guaranteed to work,
> > so the emu implementation follows suit.
> > 
> > This fixes Xen device hot-unplug.
> > 
> > Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and 
> > implementation stubs")
> > Signed-off-by: David Woodhouse 
> > ---
> >   hw/i386/kvm/xen_xenstore.c | 8 +---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
> > index 660d0b72f9..82a215058a 100644
> > --- a/hw/i386/kvm/xen_xenstore.c
> > +++ b/hw/i386/kvm/xen_xenstore.c
> > @@ -1357,10 +1357,12 @@ static void fire_watch_cb(void *opaque, const char 
> > *path, const char *token)
> >   } else {
> >   deliver_watch(s, path, token);
> >   /*
> > - * If the message was queued because there was already ring 
> > activity,
> > - * no need to wake the guest. But if not, we need to send the 
> > evtchn.
> > + * Attempt to queue the message into the actual ring, and send
> > + * the event channel notification if any bytes are copied.
> >    */
> > -    xen_be_evtchn_notify(s->eh, s->be_port);
> > +    if (put_rsp(s) > 0) {
> > +    xen_be_evtchn_notify(s->eh, s->be_port);
> > +    }
> 
> There's actually the potential for an assertion failure there; if the
> guest has specified an oversize token then deliver_watch() will not set 
> rsp_pending... then put_rsp() will fail its assertion that the flag is true.
> 


Meh, I *already* had a whole paragraph in the commit message about how
it would have been nicer to just call xen_xenstore_event() :)

Thanks for spotting that; will fix it to check s->rsp_pending.




smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2 05/24] hw/xen: fix XenStore watch delivery to guest

2023-10-24 Thread Paul Durrant

On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

When fire_watch_cb() found the response buffer empty, it would call
deliver_watch() to generate the XS_WATCH_EVENT message in the response
buffer and send an event channel notification to the guest… without
actually *copying* the response buffer into the ring. So there was
nothing for the guest to see. The pending response didn't actually get
processed into the ring until the guest next triggered some activity
from its side.

Add the missing call to put_rsp().

It might have been slightly nicer to call xen_xenstore_event() here,
which would *almost* have worked. Except for the fact that it calls
xen_be_evtchn_pending() to check that it really does have an event
pending (and clear the eventfd for next time). And under Xen it's
defined that setting that fd to O_NONBLOCK isn't guaranteed to work,
so the emu implementation follows suit.

This fixes Xen device hot-unplug.

Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation 
stubs")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 660d0b72f9..82a215058a 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -1357,10 +1357,12 @@ static void fire_watch_cb(void *opaque, const char 
*path, const char *token)
  } else {
  deliver_watch(s, path, token);
  /*
- * If the message was queued because there was already ring activity,
- * no need to wake the guest. But if not, we need to send the evtchn.
+ * Attempt to queue the message into the actual ring, and send
+ * the event channel notification if any bytes are copied.
   */
-xen_be_evtchn_notify(s->eh, s->be_port);
+if (put_rsp(s) > 0) {
+xen_be_evtchn_notify(s->eh, s->be_port);
+}


There's actually the potential for an assertion failure there; if the 
guest has specified an oversize token then deliver_watch() will not set 
rsp_pending... then put_rsp() will fail its assertion that the flag is true.


  Paul


  }
  }
  





[PATCH 02/12] qemu_file: Use a stat64 for qemu_file_transferred

2023-10-24 Thread Juan Quintela
This way we can read it from any thread.
I checked that it gives the same value than the current one.  We never
use to qemu_files at the same time.

Signed-off-by: Juan Quintela 
---
 migration/migration-stats.h | 4 
 migration/qemu-file.c   | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 2358caad63..b7795e7914 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -81,6 +81,10 @@ typedef struct {
  * Number of bytes sent during precopy stage.
  */
 Stat64 precopy_bytes;
+/*
+ * Number of bytes transferred with QEMUFile.
+ */
+Stat64 qemu_file_transferred;
 /*
  * Amount of transferred data at the start of current cycle.
  */
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 6814c562e6..384985f534 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -283,6 +283,7 @@ void qemu_fflush(QEMUFile *f)
 } else {
 uint64_t size = iov_size(f->iov, f->iovcnt);
 f->total_transferred += size;
+stat64_add(_stats.qemu_file_transferred, size);
 }
 
 qemu_iovec_release_ram(f);
@@ -623,7 +624,7 @@ int coroutine_mixed_fn qemu_get_byte(QEMUFile *f)
 
 uint64_t qemu_file_transferred_noflush(QEMUFile *f)
 {
-uint64_t ret = f->total_transferred;
+uint64_t ret = stat64_get(_stats.qemu_file_transferred);
 int i;
 
 g_assert(qemu_file_is_writable(f));
@@ -639,7 +640,7 @@ uint64_t qemu_file_transferred(QEMUFile *f)
 {
 g_assert(qemu_file_is_writable(f));
 qemu_fflush(f);
-return f->total_transferred;
+return stat64_get(_stats.qemu_file_transferred);
 }
 
 void qemu_put_be16(QEMUFile *f, unsigned int v)
-- 
2.41.0




[PATCH 03/12] qemu_file: total_transferred is not used anymore

2023-10-24 Thread Juan Quintela
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 384985f534..641ab703cc 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -41,9 +41,6 @@ struct QEMUFile {
 QIOChannel *ioc;
 bool is_writable;
 
-/* The sum of bytes transferred on the wire */
-uint64_t total_transferred;
-
 int buf_index;
 int buf_size; /* 0 when writing */
 uint8_t buf[IO_BUF_SIZE];
@@ -282,7 +279,6 @@ void qemu_fflush(QEMUFile *f)
 qemu_file_set_error_obj(f, -EIO, local_error);
 } else {
 uint64_t size = iov_size(f->iov, f->iovcnt);
-f->total_transferred += size;
 stat64_add(_stats.qemu_file_transferred, size);
 }
 
-- 
2.41.0




[PATCH 09/12] qemu-file: Simplify qemu_file_get_error()

2023-10-24 Thread Juan Quintela
If we pass a NULL error is the same that returning dirrectly the value.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 0158db2a54..7e738743ce 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -204,7 +204,7 @@ void qemu_file_set_error_obj(QEMUFile *f, int ret, Error 
*err)
  */
 int qemu_file_get_error(QEMUFile *f)
 {
-return qemu_file_get_error_obj(f, NULL);
+return f->last_error;
 }
 
 /*
-- 
2.41.0




[PATCH 04/12] migration: Use the number of transferred bytes directly

2023-10-24 Thread Juan Quintela
We only use migration_transferred_bytes() to calculate the rate_limit,
for that we don't need to flush whatever is on the qemu_file buffer.
Remember that the buffer is really small (normal case is 32K if we use
iov's can be 64 * TARGET_PAGE_SIZE), so this is not relevant to
calculations.

Signed-off-by: Juan Quintela 
---
 migration/migration-stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 4cc989d975..1d9197b4c3 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -63,7 +63,7 @@ uint64_t migration_transferred_bytes(QEMUFile *f)
 {
 uint64_t multifd = stat64_get(_stats.multifd_bytes);
 uint64_t rdma = stat64_get(_stats.rdma_bytes);
-uint64_t qemu_file = qemu_file_transferred(f);
+uint64_t qemu_file = stat64_get(_stats.qemu_file_transferred);
 
 trace_migration_transferred_bytes(qemu_file, multifd, rdma);
 return qemu_file + multifd + rdma;
-- 
2.41.0




[PATCH 08/12] migration: migration_rate_limit_reset() don't need the QEMUFile

2023-10-24 Thread Juan Quintela
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Juan Quintela 
---
 migration/migration-stats.h | 4 +---
 migration/migration-stats.c | 2 +-
 migration/migration.c   | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index e3863bf9bb..68f3939188 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -120,10 +120,8 @@ uint64_t migration_rate_get(void);
  * migration_rate_reset: Reset the rate limit counter.
  *
  * This is called when we know we start a new transfer cycle.
- *
- * @f: QEMUFile used for main migration channel
  */
-void migration_rate_reset(QEMUFile *f);
+void migration_rate_reset(void);
 
 /**
  * migration_rate_set: Set the maximum amount that can be transferred.
diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 4ae8c0c722..f690b98a03 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -54,7 +54,7 @@ void migration_rate_set(uint64_t limit)
 stat64_set(_stats.rate_limit_max, limit / XFER_LIMIT_RATIO);
 }
 
-void migration_rate_reset(QEMUFile *f)
+void migration_rate_reset(void)
 {
 stat64_set(_stats.rate_limit_start, migration_transferred_bytes());
 }
diff --git a/migration/migration.c b/migration/migration.c
index e199d2f50d..bb62244288 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2775,7 +2775,7 @@ static void migration_update_counters(MigrationState *s,
 stat64_get(_stats.dirty_bytes_last_sync) / expected_bw_per_ms;
 }
 
-migration_rate_reset(s->to_dst_file);
+migration_rate_reset();
 
 update_iteration_initial_status(s);
 
-- 
2.41.0




[PATCH 11/12] migration: Remove transferred atomic counter

2023-10-24 Thread Juan Quintela
After last commit, it is a write only variable.

Signed-off-by: Juan Quintela 
---
 migration/migration-stats.h | 4 
 migration/multifd.c | 3 ---
 migration/ram.c | 1 -
 3 files changed, 8 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 68f3939188..05290ade76 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -97,10 +97,6 @@ typedef struct {
  * Number of bytes sent through RDMA.
  */
 Stat64 rdma_bytes;
-/*
- * Total number of bytes transferred.
- */
-Stat64 transferred;
 /*
  * Number of pages transferred that were full of zeros.
  */
diff --git a/migration/multifd.c b/migration/multifd.c
index e2a45c667a..ec58c58082 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -188,7 +188,6 @@ static int multifd_send_initial_packet(MultiFDSendParams 
*p, Error **errp)
 return -1;
 }
 stat64_add(_stats.multifd_bytes, size);
-stat64_add(_stats.transferred, size);
 return 0;
 }
 
@@ -733,8 +732,6 @@ static void *multifd_send_thread(void *opaque)
 
 stat64_add(_stats.multifd_bytes,
p->next_packet_size + p->packet_len);
-stat64_add(_stats.transferred,
-   p->next_packet_size + p->packet_len);
 p->next_packet_size = 0;
 qemu_mutex_lock(>mutex);
 p->pending_job--;
diff --git a/migration/ram.c b/migration/ram.c
index 5ccf70333a..6d2bf50614 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -455,7 +455,6 @@ void ram_transferred_add(uint64_t bytes)
 } else {
 stat64_add(_stats.downtime_bytes, bytes);
 }
-stat64_add(_stats.transferred, bytes);
 }
 
 struct MigrationOps {
-- 
2.41.0




[PATCH 12/12] qemu-file: Make qemu_fflush() return errors

2023-10-24 Thread Juan Quintela
This let us simplify code of this shape.

   qemu_fflush(f);
   int ret = qemu_file_get_error(f);
   if (ret) {
  return ret;
   }

into:

   int ret = qemu_fflush(f);
   if (ret) {
  return ret;
   }

I updated all callers where there is any error check.
qemu_fclose() don't need to check for f->last_error because
qemu_fflush() returns it at the beggining of the function.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Juan Quintela 

---

In v2: Now that we call always qemu_fflush() for all files, we can
simplify qemu_fclose()
---
 migration/qemu-file.h |  2 +-
 migration/colo.c  | 11 +++
 migration/migration.c |  7 +--
 migration/qemu-file.c | 23 +++
 migration/ram.c   | 22 +++---
 migration/rdma.c  |  4 +---
 migration/savevm.c|  3 +--
 7 files changed, 21 insertions(+), 51 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 1b2f6b8d8f..1774116f79 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -71,7 +71,7 @@ void qemu_file_set_error_obj(QEMUFile *f, int ret, Error 
*err);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
-void qemu_fflush(QEMUFile *f);
+int qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
diff --git a/migration/colo.c b/migration/colo.c
index 72f4f7b37e..4447e34914 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -314,9 +314,7 @@ static void colo_send_message(QEMUFile *f, COLOMessage msg,
 return;
 }
 qemu_put_be32(f, msg);
-qemu_fflush(f);
-
-ret = qemu_file_get_error(f);
+ret = qemu_fflush(f);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Can't send COLO message");
 }
@@ -335,9 +333,7 @@ static void colo_send_message_value(QEMUFile *f, 
COLOMessage msg,
 return;
 }
 qemu_put_be64(f, value);
-qemu_fflush(f);
-
-ret = qemu_file_get_error(f);
+ret = qemu_fflush(f);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Failed to send value for message:%s",
  COLOMessage_str(msg));
@@ -483,8 +479,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 }
 
 qemu_put_buffer(s->to_dst_file, bioc->data, bioc->usage);
-qemu_fflush(s->to_dst_file);
-ret = qemu_file_get_error(s->to_dst_file);
+ret = qemu_fflush(s->to_dst_file);
 if (ret < 0) {
 goto out;
 }
diff --git a/migration/migration.c b/migration/migration.c
index a6cde985a2..ce12fca520 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -305,12 +305,7 @@ static int migrate_send_rp_message(MigrationIncomingState 
*mis,
 qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
 qemu_put_be16(mis->to_src_file, len);
 qemu_put_buffer(mis->to_src_file, data, len);
-qemu_fflush(mis->to_src_file);
-
-/* It's possible that qemu file got error during sending */
-ret = qemu_file_get_error(mis->to_src_file);
-
-return ret;
+return qemu_fflush(mis->to_src_file);
 }
 
 /* Request one page from the source VM at the given start address.
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 7e738743ce..d64500310d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -262,14 +262,14 @@ static void qemu_iovec_release_ram(QEMUFile *f)
  * This will flush all pending data. If data was only partially flushed, it
  * will set an error state.
  */
-void qemu_fflush(QEMUFile *f)
+int qemu_fflush(QEMUFile *f)
 {
 if (!qemu_file_is_writable(f)) {
-return;
+return f->last_error;
 }
 
-if (qemu_file_get_error(f)) {
-return;
+if (f->last_error) {
+return f->last_error;
 }
 if (f->iovcnt > 0) {
 Error *local_error = NULL;
@@ -287,6 +287,7 @@ void qemu_fflush(QEMUFile *f)
 
 f->buf_index = 0;
 f->iovcnt = 0;
+return f->last_error;
 }
 
 /*
@@ -353,22 +354,12 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  */
 int qemu_fclose(QEMUFile *f)
 {
-int ret, ret2;
-qemu_fflush(f);
-ret = qemu_file_get_error(f);
-
-ret2 = qio_channel_close(f->ioc, NULL);
+int ret = qemu_fflush(f);
+int ret2 = qio_channel_close(f->ioc, NULL);
 if (ret >= 0) {
 ret = ret2;
 }
 g_clear_pointer(>ioc, object_unref);
-
-/* If any error was spotted before closing, we should report it
- * instead of the close() return value.
- */
-if (f->last_error) {
-ret = f->last_error;
-}
 error_free(f->last_error_obj);
 g_free(f);
 trace_qemu_file_fclose();
diff --git a/migration/ram.c b/migration/ram.c
index 6d2bf50614..3a1d9882ce 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -305,17 +305,15 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
 
 qemu_put_be64(file, size);
 

[PATCH 06/12] qemu-file: Remove _noflush from qemu_file_transferred_noflush()

2023-10-24 Thread Juan Quintela
qemu_file_transferred() don't exist anymore, so we can reuse the name.

Signed-off-by: Juan Quintela 

---

v2: Update the documentation (thanks fabiano)
---
 migration/qemu-file.h | 9 -
 migration/block.c | 4 ++--
 migration/qemu-file.c | 2 +-
 migration/savevm.c| 6 +++---
 migration/vmstate.c   | 4 ++--
 5 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 8b71152754..1b2f6b8d8f 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -34,15 +34,14 @@ QEMUFile *qemu_file_new_output(QIOChannel *ioc);
 int qemu_fclose(QEMUFile *f);
 
 /*
- * qemu_file_transferred_noflush:
+ * qemu_file_transferred:
  *
- * As qemu_file_transferred except for writable files, where no flush
- * is performed and the reported amount will include the size of any
- * queued buffers, on top of the amount actually transferred.
+ * No flush is performed and the reported amount will include the size
+ * of any queued buffers, on top of the amount actually transferred.
  *
  * Returns: the total bytes transferred and queued
  */
-uint64_t qemu_file_transferred_noflush(QEMUFile *f);
+uint64_t qemu_file_transferred(QEMUFile *f);
 
 /*
  * put_buffer without copying the buffer.
diff --git a/migration/block.c b/migration/block.c
index b60698d6e2..47f11d0e4f 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -752,7 +752,7 @@ static int block_save_setup(QEMUFile *f, void *opaque)
 static int block_save_iterate(QEMUFile *f, void *opaque)
 {
 int ret;
-uint64_t last_bytes = qemu_file_transferred_noflush(f);
+uint64_t last_bytes = qemu_file_transferred(f);
 
 trace_migration_block_save("iterate", block_mig_state.submitted,
block_mig_state.transferred);
@@ -804,7 +804,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
 }
 
 qemu_put_be64(f, BLK_MIG_FLAG_EOS);
-uint64_t delta_bytes = qemu_file_transferred_noflush(f) - last_bytes;
+uint64_t delta_bytes = qemu_file_transferred(f) - last_bytes;
 return (delta_bytes > 0);
 }
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index efa5f11033..0158db2a54 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -618,7 +618,7 @@ int coroutine_mixed_fn qemu_get_byte(QEMUFile *f)
 return result;
 }
 
-uint64_t qemu_file_transferred_noflush(QEMUFile *f)
+uint64_t qemu_file_transferred(QEMUFile *f)
 {
 uint64_t ret = stat64_get(_stats.qemu_file_transferred);
 int i;
diff --git a/migration/savevm.c b/migration/savevm.c
index 8622f229e5..9c90499609 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -927,9 +927,9 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se)
 static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
JSONWriter *vmdesc)
 {
-uint64_t old_offset = qemu_file_transferred_noflush(f);
+uint64_t old_offset = qemu_file_transferred(f);
 se->ops->save_state(f, se->opaque);
-uint64_t size = qemu_file_transferred_noflush(f) - old_offset;
+uint64_t size = qemu_file_transferred(f) - old_offset;
 
 if (vmdesc) {
 json_writer_int64(vmdesc, "size", size);
@@ -3053,7 +3053,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 goto the_end;
 }
 ret = qemu_savevm_state(f, errp);
-vm_state_size = qemu_file_transferred_noflush(f);
+vm_state_size = qemu_file_transferred(f);
 ret2 = qemu_fclose(f);
 if (ret < 0) {
 goto the_end;
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 1cf9e45b85..16420fa9a3 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -386,7 +386,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
VMStateDescription *vmsd,
 void *curr_elem = first_elem + size * i;
 
 vmsd_desc_field_start(vmsd, vmdesc_loop, field, i, n_elems);
-old_offset = qemu_file_transferred_noflush(f);
+old_offset = qemu_file_transferred(f);
 if (field->flags & VMS_ARRAY_OF_POINTER) {
 assert(curr_elem);
 curr_elem = *(void **)curr_elem;
@@ -416,7 +416,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
VMStateDescription *vmsd,
 return ret;
 }
 
-written_bytes = qemu_file_transferred_noflush(f) - old_offset;
+written_bytes = qemu_file_transferred(f) - old_offset;
 vmsd_desc_field_end(vmsd, vmdesc_loop, field, written_bytes, 
i);
 
 /* Compressed arrays only care about the first element */
-- 
2.41.0




[PATCH 10/12] migration: Use migration_transferred_bytes()

2023-10-24 Thread Juan Quintela
There are only two differnces with the old value:

- the amount of QEMUFile that hasn't yet been flushed.  It can be
  discussed what is more exact, the new or the old one.
- the amount of transferred bytes that we forgot to account for (the
  newer is better, i.e. exact).

Notice that this two values are used to:
a - present to the user
b - calculate the rate_limit

So a few KB here and there is not going to make a difference.

Signed-off-by: Juan Quintela 
---
 migration/migration.c | 2 +-
 migration/ram.c   | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index bb62244288..a6cde985a2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -942,7 +942,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 size_t page_size = qemu_target_page_size();
 
 info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = stat64_get(_stats.transferred);
+info->ram->transferred = migration_transferred_bytes();
 info->ram->total = ram_bytes_total();
 info->ram->duplicate = stat64_get(_stats.zero_pages);
 /* legacy value.  It is not used anymore */
diff --git a/migration/ram.c b/migration/ram.c
index 92769902bb..5ccf70333a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -564,7 +564,7 @@ void mig_throttle_counter_reset(void)
 
 rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 rs->num_dirty_pages_period = 0;
-rs->bytes_xfer_prev = stat64_get(_stats.transferred);
+rs->bytes_xfer_prev = migration_transferred_bytes();
 }
 
 /**
@@ -1030,7 +1030,7 @@ static void migration_trigger_throttle(RAMState *rs)
 {
 uint64_t threshold = migrate_throttle_trigger_threshold();
 uint64_t bytes_xfer_period =
-stat64_get(_stats.transferred) - rs->bytes_xfer_prev;
+migration_transferred_bytes() - rs->bytes_xfer_prev;
 uint64_t bytes_dirty_period = rs->num_dirty_pages_period * 
TARGET_PAGE_SIZE;
 uint64_t bytes_dirty_threshold = bytes_xfer_period * threshold / 100;
 
@@ -1100,7 +1100,7 @@ static void migration_bitmap_sync(RAMState *rs, bool 
last_stage)
 /* reset period counters */
 rs->time_last_bitmap_sync = end_time;
 rs->num_dirty_pages_period = 0;
-rs->bytes_xfer_prev = stat64_get(_stats.transferred);
+rs->bytes_xfer_prev = migration_transferred_bytes();
 }
 if (migrate_events()) {
 uint64_t generation = stat64_get(_stats.dirty_sync_count);
-- 
2.41.0




[PATCH 05/12] qemu_file: Remove unused qemu_file_transferred()

2023-10-24 Thread Juan Quintela
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.h | 18 --
 migration/qemu-file.c |  7 ---
 2 files changed, 25 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a29c37b0d0..8b71152754 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -33,24 +33,6 @@ QEMUFile *qemu_file_new_input(QIOChannel *ioc);
 QEMUFile *qemu_file_new_output(QIOChannel *ioc);
 int qemu_fclose(QEMUFile *f);
 
-/*
- * qemu_file_transferred:
- *
- * Report the total number of bytes transferred with
- * this file.
- *
- * For writable files, any pending buffers will be
- * flushed, so the reported value will be equal to
- * the number of bytes transferred on the wire.
- *
- * For readable files, the reported value will be
- * equal to the number of bytes transferred on the
- * wire.
- *
- * Returns: the total bytes transferred
- */
-uint64_t qemu_file_transferred(QEMUFile *f);
-
 /*
  * qemu_file_transferred_noflush:
  *
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 641ab703cc..efa5f11033 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -632,13 +632,6 @@ uint64_t qemu_file_transferred_noflush(QEMUFile *f)
 return ret;
 }
 
-uint64_t qemu_file_transferred(QEMUFile *f)
-{
-g_assert(qemu_file_is_writable(f));
-qemu_fflush(f);
-return stat64_get(_stats.qemu_file_transferred);
-}
-
 void qemu_put_be16(QEMUFile *f, unsigned int v)
 {
 qemu_put_byte(f, v >> 8);
-- 
2.41.0




[PATCH 07/12] migration: migration_transferred_bytes() don't need the QEMUFile

2023-10-24 Thread Juan Quintela
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Juan Quintela 
---
 migration/migration-stats.h | 4 +---
 migration/migration-stats.c | 6 +++---
 migration/migration.c   | 6 +++---
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index b7795e7914..e3863bf9bb 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -137,11 +137,9 @@ void migration_rate_set(uint64_t new_rate);
 /**
  * migration_transferred_bytes: Return number of bytes transferred
  *
- * @f: QEMUFile used for main migration channel
- *
  * Returns how many bytes have we transferred since the beginning of
  * the migration.  It accounts for bytes sent through any migration
  * channel, multifd, qemu_file, rdma, 
  */
-uint64_t migration_transferred_bytes(QEMUFile *f);
+uint64_t migration_transferred_bytes(void);
 #endif
diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 1d9197b4c3..4ae8c0c722 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -30,7 +30,7 @@ bool migration_rate_exceeded(QEMUFile *f)
 }
 
 uint64_t rate_limit_start = stat64_get(_stats.rate_limit_start);
-uint64_t rate_limit_current = migration_transferred_bytes(f);
+uint64_t rate_limit_current = migration_transferred_bytes();
 uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
 
 if (rate_limit_max > 0 && rate_limit_used > rate_limit_max) {
@@ -56,10 +56,10 @@ void migration_rate_set(uint64_t limit)
 
 void migration_rate_reset(QEMUFile *f)
 {
-stat64_set(_stats.rate_limit_start, migration_transferred_bytes(f));
+stat64_set(_stats.rate_limit_start, migration_transferred_bytes());
 }
 
-uint64_t migration_transferred_bytes(QEMUFile *f)
+uint64_t migration_transferred_bytes(void)
 {
 uint64_t multifd = stat64_get(_stats.multifd_bytes);
 uint64_t rdma = stat64_get(_stats.rdma_bytes);
diff --git a/migration/migration.c b/migration/migration.c
index 67547eb6a1..e199d2f50d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2694,7 +2694,7 @@ static MigThrError migration_detect_error(MigrationState 
*s)
 
 static void migration_calculate_complete(MigrationState *s)
 {
-uint64_t bytes = migration_transferred_bytes(s->to_dst_file);
+uint64_t bytes = migration_transferred_bytes();
 int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 int64_t transfer_time;
 
@@ -2720,7 +2720,7 @@ static void 
update_iteration_initial_status(MigrationState *s)
  * wrong speed calculation.
  */
 s->iteration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-s->iteration_initial_bytes = migration_transferred_bytes(s->to_dst_file);
+s->iteration_initial_bytes = migration_transferred_bytes();
 s->iteration_initial_pages = ram_get_total_transferred_pages();
 }
 
@@ -2739,7 +2739,7 @@ static void migration_update_counters(MigrationState *s,
 }
 
 switchover_bw = migrate_avail_switchover_bandwidth();
-current_bytes = migration_transferred_bytes(s->to_dst_file);
+current_bytes = migration_transferred_bytes();
 transferred = current_bytes - s->iteration_initial_bytes;
 time_spent = current_time - s->iteration_start_time;
 bandwidth = (double)transferred / time_spent;
-- 
2.41.0




[PATCH 00/12] migration: Yet another round of atomic counters

2023-10-24 Thread Juan Quintela
Hi

Goal of the whole series was to be able to move rate_limit logic to
not use qemu_file.  Goal achieved.

Removal of trasnferred atomic counter.

After this series, we have three atomic counters:
- multifd_bytes
- rdma_bytes
- qemu_file_trasferred

And we only need to setup one (and only one) of these each time that
we sent anything.

Please review.

Later, Juan.

Juan Quintela (12):
  qemu-file: We only call qemu_file_transferred_* on the sending side
  qemu_file: Use a stat64 for qemu_file_transferred
  qemu_file: total_transferred is not used anymore
  migration: Use the number of transferred bytes directly
  qemu_file: Remove unused qemu_file_transferred()
  qemu-file: Remove _noflush from qemu_file_transferred_noflush()
  migration: migration_transferred_bytes() don't need the QEMUFile
  migration: migration_rate_limit_reset() don't need the QEMUFile
  qemu-file: Simplify qemu_file_get_error()
  migration: Use migration_transferred_bytes()
  migration: Remove transferred atomic counter
  qemu-file: Make qemu_fflush() return errors

 migration/migration-stats.h | 16 ++
 migration/qemu-file.h   | 27 ---
 migration/block.c   |  4 ++--
 migration/colo.c| 11 +++---
 migration/migration-stats.c | 10 -
 migration/migration.c   | 17 ++-
 migration/multifd.c |  3 ---
 migration/qemu-file.c   | 43 +++--
 migration/ram.c | 29 +
 migration/rdma.c|  4 +---
 migration/savevm.c  |  9 
 migration/vmstate.c |  4 ++--
 12 files changed, 56 insertions(+), 121 deletions(-)

-- 
2.41.0




[PATCH 01/12] qemu-file: We only call qemu_file_transferred_* on the sending side

2023-10-24 Thread Juan Quintela
Remove the increase in qemu_file_fill_buffer() and add asserts to
qemu_file_transferred* functions.

Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 3fb25148d1..6814c562e6 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -337,7 +337,6 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile 
*f)
 
 if (len > 0) {
 f->buf_size += len;
-f->total_transferred += len;
 } else if (len == 0) {
 qemu_file_set_error_obj(f, -EIO, local_error);
 } else {
@@ -627,6 +626,8 @@ uint64_t qemu_file_transferred_noflush(QEMUFile *f)
 uint64_t ret = f->total_transferred;
 int i;
 
+g_assert(qemu_file_is_writable(f));
+
 for (i = 0; i < f->iovcnt; i++) {
 ret += f->iov[i].iov_len;
 }
@@ -636,6 +637,7 @@ uint64_t qemu_file_transferred_noflush(QEMUFile *f)
 
 uint64_t qemu_file_transferred(QEMUFile *f)
 {
+g_assert(qemu_file_is_writable(f));
 qemu_fflush(f);
 return f->total_transferred;
 }
-- 
2.41.0




Re: [PATCH v2 04/24] hw/xen: don't clear map_track[] in xen_gnttab_reset()

2023-10-24 Thread Paul Durrant

On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

The refcounts actually correspond to 'active_ref' structures stored in a
GHashTable per "user" on the backend side (mostly, per XenDevice).

If we zero map_track[] on reset, then when the backend drivers get torn
down and release their mapping we hit the assert(s->map_track[ref] != 0)
in gnt_unref().

So leave them in place. Each backend driver will disconnect and reconnect
as the guest comes back up again and reconnects, and it all works out OK
in the end as the old refs get dropped.

Fixes: de26b2619789 ("hw/xen: Implement soft reset for emulated gnttab")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_gnttab.c | 2 --
  1 file changed, 2 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v2 03/24] hw/xen: select kernel mode for per-vCPU event channel upcall vector

2023-10-24 Thread Paul Durrant

On 19/10/2023 16:39, David Woodhouse wrote:

From: David Woodhouse 

A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.

For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:

/* Trick toolstack to think we are enlightened. */
if (!cpu)
rc = xen_set_callback_via(1);

That's explicitly setting the delivery to GSI#1, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in Qemu
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.

Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.

Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)

Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_evtchn.c  | 6 ++
  include/sysemu/kvm_xen.h  | 1 +
  target/i386/kvm/xen-emu.c | 7 +++
  3 files changed, 14 insertions(+)



Reviewed-by: Paul Durrant 




Re: [PATCH 09/12] hw/xen: prevent duplicate device registrations

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 15:10 +0100, Paul Durrant wrote:
> On 16/10/2023 16:19, David Woodhouse wrote:
> > --- a/hw/char/xen_console.c
> > +++ b/hw/char/xen_console.c
> > @@ -468,7 +468,7 @@ static void 
> > xen_console_device_create(XenBackendInstance *backend,
> >   Chardev *cd = NULL;
> >   struct qemu_xs_handle *xsh = xenbus->xsh;
> >   
> > -    if (qemu_strtoul(name, NULL, 10, )) {
> > +    if (qemu_strtoul(name, NULL, 10, ) || number >= INT_MAX) {
> >   error_setg(errp, "failed to parse name '%s'", name);
> >   goto fail;
> >   }
> I don't think this hunk belongs here, does it? Seems like it should be 
> in patch 7.

Well, console#4294967295 *did* actually work before this patch started
using -1 to mean something different. But yes, I've already moved that
into the previous patch.

In fact I've just completely dropped this patch now, as the
dedeuplication needs to happen on the *frontend* nodes, since a given
frontend can be powered by a backend of different types, or in
different driver domains.



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread Paul Durrant

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.

Why can't you map the console page via the grant table like the xenstore 
page?


  Paul




Re: [PATCH 09/12] hw/xen: prevent duplicate device registrations

2023-10-24 Thread Paul Durrant

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

Ensure that we have a XenBackendInstance for every device regardless
of whether it was "discovered" in XenStore or created directly in QEMU.

This allows the backend_list to be a source of truth about whether a
given backend exists, and allows us to reject duplicates.

This also cleans up the fact that backend drivers were calling
xen_backend_set_device() with a XenDevice immediately after calling
qdev_realize_and_unref() on it, when it wasn't theirs to play with any
more.

Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c |  1 -
  hw/char/xen_console.c|  2 +-
  hw/xen/xen-backend.c | 78 ++--
  hw/xen/xen-bus.c |  8 
  include/hw/xen/xen-backend.h |  3 ++
  5 files changed, 69 insertions(+), 23 deletions(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a07cd7eb5d..9262338535 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -975,7 +975,6 @@ static void xen_block_device_create(XenBackendInstance 
*backend,
  goto fail;
  }
  
-xen_backend_set_device(backend, xendev);

  return;
  
  fail:

diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index bd20be116c..2825b8c511 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -468,7 +468,7 @@ static void xen_console_device_create(XenBackendInstance 
*backend,
  Chardev *cd = NULL;
  struct qemu_xs_handle *xsh = xenbus->xsh;
  
-if (qemu_strtoul(name, NULL, 10, )) {

+if (qemu_strtoul(name, NULL, 10, ) || number >= INT_MAX) {
  error_setg(errp, "failed to parse name '%s'", name);
  goto fail;
  }
I don't think this hunk belongs here, does it? Seems like it should be 
in patch 7.



diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c
index b9bf70a9f5..dcb4329258 100644
--- a/hw/xen/xen-backend.c
+++ b/hw/xen/xen-backend.c
@@ -101,22 +101,28 @@ static XenBackendInstance 
*xen_backend_list_find(XenDevice *xendev)
  return NULL;
  }
  
-bool xen_backend_exists(const char *type, const char *name)

+static XenBackendInstance *xen_backend_lookup(const XenBackendImpl *impl, 
const char *name)


This name is a little close to xen_backend_table_lookup()... perhaps 
that one should be renamed xen_backend_impl_lookup() for clarity.



  {
-const XenBackendImpl *impl = xen_backend_table_lookup(type);
  XenBackendInstance *backend;
  
-if (!impl) {

-return false;
-}
-
  QLIST_FOREACH(backend, _list, entry) {
  if (backend->impl == impl && !strcmp(backend->name, name)) {
-return true;
+return backend;
  }
  }
  
-return false;

+return NULL;
+}
+
+bool xen_backend_exists(const char *type, const char *name)
+{
+const XenBackendImpl *impl = xen_backend_table_lookup(type);
+
+if (!impl) {
+return false;
+}
+
+return !!xen_backend_lookup(impl, name);
  }
  
  static void xen_backend_list_remove(XenBackendInstance *backend)

@@ -138,11 +144,10 @@ void xen_backend_device_create(XenBus *xenbus, const char 
*type,
  backend = g_new0(XenBackendInstance, 1);
  backend->xenbus = xenbus;
  backend->name = g_strdup(name);
-
-impl->create(backend, opts, errp);
-
  backend->impl = impl;
  xen_backend_list_add(backend);
+
+impl->create(backend, opts, errp);
  }
  
  XenBus *xen_backend_get_bus(XenBackendInstance *backend)

@@ -155,13 +160,6 @@ const char *xen_backend_get_name(XenBackendInstance 
*backend)
  return backend->name;
  }
  
-void xen_backend_set_device(XenBackendInstance *backend,

-XenDevice *xendev)
-{
-g_assert(!backend->xendev);
-backend->xendev = xendev;
-}
-


The declaration also needs removing from xen-backend.h presumably.


  XenDevice *xen_backend_get_device(XenBackendInstance *backend)
  {
  return backend->xendev;
@@ -178,9 +176,7 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, 
Error **errp)
  }
  
  impl = backend->impl;

-if (backend->xendev) {
-impl->destroy(backend, errp);
-}
+impl->destroy(backend, errp);
  
  xen_backend_list_remove(backend);

  g_free(backend->name);
@@ -188,3 +184,43 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, 
Error **errp)
  
  return true;

  }
+
+bool xen_backend_device_realized(XenDevice *xendev, Error **errp)
+{
+XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
+const char *type = xendev_class->backend ? : 
object_get_typename(OBJECT(xendev));
+const XenBackendImpl *impl = xen_backend_table_lookup(type);
+XenBackendInstance *backend;
+
+if (!impl) {
+return false;
+}
+
+backend = xen_backend_lookup(impl, xendev->name);
+if (backend) {
+if (backend->xendev && backend->xendev != xendev) {
+error_setg(errp, "device %s/%s already exists", type, 
xendev->name);
+  

Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-24 Thread Paul Durrant

On 24/10/2023 14:29, David Woodhouse wrote:

On Tue, 2023-10-24 at 13:59 +0100, Paul Durrant wrote:

On 24/10/2023 13:56, David Woodhouse wrote:

On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote:



--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, 
Error **errp)
     {
     ERRP_GUARD();
     XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+    XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
 
-    xendev->frontend_path = xen_device_get_frontend_path(xendev);

+    if (xendev_class->get_frontend_path) {
+    xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp);
+    if (!xendev->frontend_path) {
+    return;


I think you need to update errp here to note that you are failing to
create the frontend.


If xendev_class->get_frontend_path returned NULL it will have filled in errp.



Ok, but a prepend to say that a lack of path there means we skip
frontend creation seems reasonable?


No, it *is* returning an error. Perhaps I can make it



I understand it is returning an error. I thought the point of the 
cascading error handling was to prepend text at each (meaningful) layer 
such that the eventual message conveyed what failed and also what the 
consequences of that failure were.


  Paul


 if (!xendev->frontend_path) {
 /*
  * If the ->get_frontend_path() method returned NULL, it will
  * already have set *errp accordingly. Return the error.
  */
 return /* false */;
 }



As a general rule (I'll be doing a bombing run on xen-bus once I get my
patch queue down into single digits) we should never check 'if (*errp)'
to check if a function had an error. It should *also* return a success
or failure indication, and we should cope with errp being NULL.



I'm pretty sure someone told me the exact opposite a few years back.


Then they were wrong :)





[PULL 24/39] migration: set file error on subsection loading

2023-10-24 Thread Juan Quintela
From: Marc-André Lureau 

commit 13cde50889237 ("vmstate: Return error in case of error") sets
QemuFile error to stop reading from it and report to the caller (checked
by unit tests). We should do the same on subsection loading error.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231024084043.2926316-8-marcandre.lur...@redhat.com>
---
 migration/vmstate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 16e33a5d34..9c36803c8a 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -179,6 +179,7 @@ int vmstate_load_state(QEMUFile *f, const 
VMStateDescription *vmsd,
 assert(field->flags == VMS_END);
 ret = vmstate_subsection_load(f, vmsd, opaque);
 if (ret != 0) {
+qemu_file_set_error(f, ret);
 return ret;
 }
 if (vmsd->post_load) {
-- 
2.41.0




[PULL 36/39] migration: migrate 'inc' command option is deprecated.

2023-10-24 Thread Juan Quintela
Use blockdev-mirror with NBD instead.

Reviewed-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Juan Quintela 
Message-ID: <20231018115513.2163-3-quint...@redhat.com>
---
 docs/about/deprecated.rst  | 8 
 qapi/migration.json| 8 +++-
 migration/migration-hmp-cmds.c | 5 +
 migration/migration.c  | 5 +
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 2febd2d12f..fc6adf1dea 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -461,3 +461,11 @@ Migration
 ``skipped`` field in Migration stats has been deprecated.  It hasn't
 been used for more than 10 years.
 
+``inc`` migrate command option (since 8.2)
+''
+
+Use blockdev-mirror with NBD instead.
+
+As an intermediate step the ``inc`` functionality can be achieved by
+setting the ``block-incremental`` migration parameter to ``true``.
+But this parameter is also deprecated.
diff --git a/qapi/migration.json b/qapi/migration.json
index db3df12d6c..fa7f4f2575 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1524,6 +1524,11 @@
 #
 # @resume: resume one paused migration, default "off". (since 3.0)
 #
+# Features:
+#
+# @deprecated: Member @inc is deprecated.  Use blockdev-mirror with
+# NBD instead.
+#
 # Returns: nothing on success
 #
 # Since: 0.14
@@ -1545,7 +1550,8 @@
 # <- { "return": {} }
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
+  'data': {'uri': 'str', '*blk': 'bool',
+   '*inc': { 'type': 'bool', 'features': [ 'deprecated' ] },
'*detach': 'bool', '*resume': 'bool' } }
 
 ##
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index a82597f18e..83176f5bae 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -745,6 +745,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 const char *uri = qdict_get_str(qdict, "uri");
 Error *err = NULL;
 
+if (inc) {
+warn_report("option '-i' is deprecated;"
+" use blockdev-mirror with NBD instead");
+}
+
 qmp_migrate(uri, !!blk, blk, !!inc, inc,
 false, false, true, resume, );
 if (hmp_handle_error(mon, err)) {
diff --git a/migration/migration.c b/migration/migration.c
index 67547eb6a1..06a706ad90 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1620,6 +1620,11 @@ static bool migrate_prepare(MigrationState *s, bool blk, 
bool blk_inc,
 {
 Error *local_err = NULL;
 
+if (blk_inc) {
+warn_report("parameter 'inc' is deprecated;"
+" use blockdev-mirror with NBD instead");
+}
+
 if (resume) {
 if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
 error_setg(errp, "Cannot resume if there is no "
-- 
2.41.0




[PULL 18/39] hw/ipmi: Don't call vmstate_register() from instance_init() functions

2023-10-24 Thread Juan Quintela
From: Thomas Huth 

instance_init() can be called multiple times, e.g. during introspection
of the device. We should not install the vmstate handlers here. Do it
in the realize() function instead.

Signed-off-by: Thomas Huth 
Reviewed-by: Juan Quintela 
Acked-by: Corey Minyard 
Signed-off-by: Juan Quintela 
Message-ID: <20231020145554.662751-1-th...@redhat.com>
---
 hw/ipmi/ipmi_bmc_extern.c | 29 ---
 hw/ipmi/isa_ipmi_bt.c | 34 +-
 hw/ipmi/isa_ipmi_kcs.c| 50 +++
 3 files changed, 57 insertions(+), 56 deletions(-)

diff --git a/hw/ipmi/ipmi_bmc_extern.c b/hw/ipmi/ipmi_bmc_extern.c
index e232d35ba2..2117dad35a 100644
--- a/hw/ipmi/ipmi_bmc_extern.c
+++ b/hw/ipmi/ipmi_bmc_extern.c
@@ -453,19 +453,6 @@ static void ipmi_bmc_extern_handle_reset(IPMIBmc *b)
 continue_send(ibe);
 }
 
-static void ipmi_bmc_extern_realize(DeviceState *dev, Error **errp)
-{
-IPMIBmcExtern *ibe = IPMI_BMC_EXTERN(dev);
-
-if (!qemu_chr_fe_backend_connected(>chr)) {
-error_setg(errp, "IPMI external bmc requires chardev attribute");
-return;
-}
-
-qemu_chr_fe_set_handlers(>chr, can_receive, receive,
- chr_event, NULL, ibe, NULL, true);
-}
-
 static int ipmi_bmc_extern_post_migrate(void *opaque, int version_id)
 {
 IPMIBmcExtern *ibe = opaque;
@@ -499,12 +486,26 @@ static const VMStateDescription vmstate_ipmi_bmc_extern = 
{
 }
 };
 
+static void ipmi_bmc_extern_realize(DeviceState *dev, Error **errp)
+{
+IPMIBmcExtern *ibe = IPMI_BMC_EXTERN(dev);
+
+if (!qemu_chr_fe_backend_connected(>chr)) {
+error_setg(errp, "IPMI external bmc requires chardev attribute");
+return;
+}
+
+qemu_chr_fe_set_handlers(>chr, can_receive, receive,
+ chr_event, NULL, ibe, NULL, true);
+
+vmstate_register(NULL, 0, _ipmi_bmc_extern, ibe);
+}
+
 static void ipmi_bmc_extern_init(Object *obj)
 {
 IPMIBmcExtern *ibe = IPMI_BMC_EXTERN(obj);
 
 ibe->extern_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, extern_timeout, ibe);
-vmstate_register(NULL, 0, _ipmi_bmc_extern, ibe);
 }
 
 static void ipmi_bmc_extern_finalize(Object *obj)
diff --git a/hw/ipmi/isa_ipmi_bt.c b/hw/ipmi/isa_ipmi_bt.c
index a83e7243d6..aec064d3cd 100644
--- a/hw/ipmi/isa_ipmi_bt.c
+++ b/hw/ipmi/isa_ipmi_bt.c
@@ -68,6 +68,21 @@ static void isa_ipmi_bt_lower_irq(IPMIBT *ib)
 qemu_irq_lower(iib->irq);
 }
 
+static const VMStateDescription vmstate_ISAIPMIBTDevice = {
+.name = TYPE_IPMI_INTERFACE_PREFIX "isa-bt",
+.version_id = 2,
+.minimum_version_id = 2,
+/*
+ * Version 1 had messed up the array transfer, it's not even usable
+ * because it used VMSTATE_VBUFFER_UINT32, but it did not transfer
+ * the buffer length, so random things would happen.
+ */
+.fields  = (VMStateField[]) {
+VMSTATE_STRUCT(bt, ISAIPMIBTDevice, 1, vmstate_IPMIBT, IPMIBT),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static void isa_ipmi_bt_realize(DeviceState *dev, Error **errp)
 {
 Error *err = NULL;
@@ -102,30 +117,15 @@ static void isa_ipmi_bt_realize(DeviceState *dev, Error 
**errp)
 qdev_set_legacy_instance_id(dev, iib->bt.io_base, iib->bt.io_length);
 
 isa_register_ioport(isadev, >bt.io, iib->bt.io_base);
-}
 
-static const VMStateDescription vmstate_ISAIPMIBTDevice = {
-.name = TYPE_IPMI_INTERFACE_PREFIX "isa-bt",
-.version_id = 2,
-.minimum_version_id = 2,
-/*
- * Version 1 had messed up the array transfer, it's not even usable
- * because it used VMSTATE_VBUFFER_UINT32, but it did not transfer
- * the buffer length, so random things would happen.
- */
-.fields  = (VMStateField[]) {
-VMSTATE_STRUCT(bt, ISAIPMIBTDevice, 1, vmstate_IPMIBT, IPMIBT),
-VMSTATE_END_OF_LIST()
-}
-};
+vmstate_register(NULL, 0, _ISAIPMIBTDevice, dev);
+}
 
 static void isa_ipmi_bt_init(Object *obj)
 {
 ISAIPMIBTDevice *iib = ISA_IPMI_BT(obj);
 
 ipmi_bmc_find_and_link(obj, (Object **) >bt.bmc);
-
-vmstate_register(NULL, 0, _ISAIPMIBTDevice, iib);
 }
 
 static void *isa_ipmi_bt_get_backend_data(IPMIInterface *ii)
diff --git a/hw/ipmi/isa_ipmi_kcs.c b/hw/ipmi/isa_ipmi_kcs.c
index b2ed70b9da..b5dcb64616 100644
--- a/hw/ipmi/isa_ipmi_kcs.c
+++ b/hw/ipmi/isa_ipmi_kcs.c
@@ -67,6 +67,24 @@ static void isa_ipmi_kcs_lower_irq(IPMIKCS *ik)
 qemu_irq_lower(iik->irq);
 }
 
+static bool vmstate_kcs_before_version2(void *opaque, int version)
+{
+return version <= 1;
+}
+
+static const VMStateDescription vmstate_ISAIPMIKCSDevice = {
+.name = TYPE_IPMI_INTERFACE,
+.version_id = 2,
+.minimum_version_id = 1,
+.fields  = (VMStateField[]) {
+VMSTATE_VSTRUCT_TEST(kcs, ISAIPMIKCSDevice, 
vmstate_kcs_before_version2,
+ 0, vmstate_IPMIKCS, IPMIKCS, 1),
+VMSTATE_VSTRUCT_V(kcs, ISAIPMIKCSDevice, 2, vmstate_IPMIKCS,
+   

Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 13:59 +0100, Paul Durrant wrote:
> On 24/10/2023 13:56, David Woodhouse wrote:
> > On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote:
> > > 
> > > > --- a/hw/xen/xen-bus.c
> > > > +++ b/hw/xen/xen-bus.c
> > > > @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice 
> > > > *xendev, Error **errp)
> > > >     {
> > > >     ERRP_GUARD();
> > > >     XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
> > > > +    XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
> > > >     
> > > > -    xendev->frontend_path = xen_device_get_frontend_path(xendev);
> > > > +    if (xendev_class->get_frontend_path) {
> > > > +    xendev->frontend_path = 
> > > > xendev_class->get_frontend_path(xendev, errp);
> > > > +    if (!xendev->frontend_path) {
> > > > +    return;
> > > 
> > > I think you need to update errp here to note that you are failing to
> > > create the frontend.
> > 
> > If xendev_class->get_frontend_path returned NULL it will have filled in 
> > errp.
> > 
> 
> Ok, but a prepend to say that a lack of path there means we skip 
> frontend creation seems reasonable?

No, it *is* returning an error. Perhaps I can make it

if (!xendev->frontend_path) {
/*
 * If the ->get_frontend_path() method returned NULL, it will
 * already have set *errp accordingly. Return the error.
 */
return /* false */;
}


> > As a general rule (I'll be doing a bombing run on xen-bus once I get my
> > patch queue down into single digits) we should never check 'if (*errp)'
> > to check if a function had an error. It should *also* return a success
> > or failure indication, and we should cope with errp being NULL.
> > 
> 
> I'm pretty sure someone told me the exact opposite a few years back.

Then they were wrong :)


smime.p7s
Description: S/MIME cryptographic signature


[PULL 27/39] migration: Use vmstate_register_any() for isa-ide

2023-10-24 Thread Juan Quintela
Otherwise qom-test fails.

ok 4 /i386/qom/x-remote
qemu-system-i386: savevm_state_handler_insert: Detected duplicate 
SaveStateEntry: id=isa-ide, instance_id=0x0
Broken pipe
../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:195: kill_qemu() tried 
to terminate QEMU process but encountered exit status 1 (expected 0)
Aborted (core dumped)
$

Reviewed-by: Stefan Berger 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-4-quint...@redhat.com>
---
 hw/ide/isa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 95053e026f..ea60c08116 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -73,7 +73,7 @@ static void isa_ide_realizefn(DeviceState *dev, Error **errp)
 ide_bus_init(>bus, sizeof(s->bus), dev, 0, 2);
 ide_init_ioport(>bus, isadev, s->iobase, s->iobase2);
 ide_bus_init_output_irq(>bus, isa_get_irq(isadev, s->irqnum));
-vmstate_register(VMSTATE_IF(dev), 0, _ide_isa, s);
+vmstate_register_any(VMSTATE_IF(dev), _ide_isa, s);
 ide_bus_register_restart_cb(>bus);
 }
 
-- 
2.41.0




Re: [PATCH 02/12] hw/xen: select kernel mode for per-vCPU event channel upcall vector

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 13:29 +0100, Paul Durrant wrote:
> 
> > +    /* If the guest has set a per-vCPU callback vector, prefer that. */
> > +    if (gsi && kvm_xen_has_vcpu_callback_vector()) {
> > +    in_kernel = kvm_xen_has_cap(EVTCHN_SEND);
> > +    gsi = 0;
> > +    }
> > +
> 
> So this deals with setting the callback via after setting the upcall 
> vector. What happens if the guest then disables the upcall vector (by
> setting it to zero)? Xen checks 'v->arch.hvm.evtchn_upcall_vector != 0' 
> for every event delivery.

Qemu and KVM check before each delivery too. For a vCPU other than
vCPU0, if it isn't the per-vCPU lapic upcall vector, and it isn't the
system-wide vector, the interrupt isn't supposed to be delivered (the
GSI is tied to vCPU0).

However, we don't support dynamically disabling the kernel acceleration
(telling it to forget about the VIRQs, IPIs so that we can handle them
in userspace from now on). Not except for soft reset, when they're all
torn down anyway.

I don't really *want* to support that either. Better to make the kernel
mode unconditional, having it signal userspace via an eventfd when
vCPU0 has an upcall pending and it's GSI or INTx. But that's a *pair*
of eventfds, one for the resampler... and first I have to fix Qemu's
interrupt controller models to do the resampling properly on ack (qv).

Right now this works for all guests in practice and I have other yaks
to shave :)




smime.p7s
Description: S/MIME cryptographic signature


[PULL 31/39] migration: Improve example and documentation of vmstate_register()

2023-10-24 Thread Juan Quintela
Reviewed-by: Stefan Berger 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-11-quint...@redhat.com>
---
 docs/devel/migration.rst | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index be913630c3..240eb16d90 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -167,13 +167,17 @@ An example (from hw/input/pckbd.c)
   }
   };
 
-We are declaring the state with name "pckbd".
-The ``version_id`` is 3, and the fields are 4 uint8_t in a KBDState structure.
-We registered this with:
+We are declaring the state with name "pckbd".  The ``version_id`` is
+3, and there are 4 uint8_t fields in the KBDState structure.  We
+registered this ``VMSTATEDescription`` with one of the following
+functions.  The first one will generate a device ``instance_id``
+different for each registration.  Use the second one if you already
+have an id that is different for each instance of the device:
 
 .. code:: c
 
-vmstate_register(NULL, 0, _kbd, s);
+vmstate_register_any(NULL, _kbd, s);
+vmstate_register(NULL, instance_id, _kbd, s);
 
 For devices that are ``qdev`` based, we can register the device in the class
 init function:
-- 
2.41.0




[PULL 39/39] migration: Deprecate old compression method

2023-10-24 Thread Juan Quintela
Acked-by: Stefan Hajnoczi 
Acked-by: Peter Xu 
Reviewed-by: Markus Armbruster 
Signed-off-by: Juan Quintela 
Message-ID: <20231018115513.2163-6-quint...@redhat.com>
---
 docs/about/deprecated.rst |  8 +
 qapi/migration.json   | 63 ++-
 migration/options.c   | 13 
 3 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 7ae872162d..e7f17827d3 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -488,3 +488,11 @@ devices or none.
 Please see "QMP invocation for live storage migration with
 ``blockdev-mirror`` + NBD" in docs/interop/live-block-operations.rst
 for a detailed explanation.
+
+old compression method (since 8.2)
+''
+
+Compression method fails too much.  Too many races.  We are going to
+remove it if nobody fixes it.  For starters, migration-test
+compression tests are disabled becase they fail randomly.  If you need
+compression, use multifd compression methods.
diff --git a/qapi/migration.json b/qapi/migration.json
index e3b00a215b..e6610af428 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -272,6 +272,10 @@
 # Features:
 #
 # @deprecated: Member @disk is deprecated because block migration is.
+# Member @compression is deprecated because it is unreliable and
+# untested.  It is recommended to use multifd migration, which
+# offers an alternative compression implementation that is
+# reliable and tested.
 #
 # Since: 0.14
 ##
@@ -289,7 +293,7 @@
'*blocked-reasons': ['str'],
'*postcopy-blocktime': 'uint32',
'*postcopy-vcpu-blocktime': ['uint32'],
-   '*compression': 'CompressionStats',
+   '*compression': { 'type': 'CompressionStats', 'features': [ 
'deprecated' ] },
'*socket-address': ['SocketAddress'],
'*dirty-limit-throttle-time-per-round': 'uint64',
'*dirty-limit-ring-full-time': 'uint64'} }
@@ -530,7 +534,10 @@
 # Features:
 #
 # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
-# NBD instead.
+# NBD instead.  Member @compression is deprecated because it is
+# unreliable and untested.  It is recommended to use multifd
+# migration, which offers an alternative compression
+# implementation that is reliable and tested.
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -538,7 +545,8 @@
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-   'compress', 'events', 'postcopy-ram',
+   { 'name': 'compress', 'features': [ 'deprecated' ] },
+   'events', 'postcopy-ram',
{ 'name': 'x-colo', 'features': [ 'unstable' ] },
'release-ram',
{ 'name': 'block', 'features': [ 'deprecated' ] },
@@ -844,7 +852,9 @@
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
-# blockdev-mirror with NBD instead.
+# blockdev-mirror with NBD instead.  Members @compress-level,
+# @compress-threads, @decompress-threads and @compress-wait-thread
+# are deprecated because @compression is deprecated.
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
 # are experimental.
@@ -854,8 +864,11 @@
 { 'enum': 'MigrationParameter',
   'data': ['announce-initial', 'announce-max',
'announce-rounds', 'announce-step',
-   'compress-level', 'compress-threads', 'decompress-threads',
-   'compress-wait-thread', 'throttle-trigger-threshold',
+   { 'name': 'compress-level', 'features': [ 'deprecated' ] },
+   { 'name': 'compress-threads', 'features': [ 'deprecated' ] },
+   { 'name': 'decompress-threads', 'features': [ 'deprecated' ] },
+   { 'name': 'compress-wait-thread', 'features': [ 'deprecated' ] },
+   'throttle-trigger-threshold',
'cpu-throttle-initial', 'cpu-throttle-increment',
'cpu-throttle-tailslow',
'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
@@ -1023,7 +1036,9 @@
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
-# blockdev-mirror with NBD instead.
+# blockdev-mirror with NBD instead.  Members @compress-level,
+# @compress-threads, @decompress-threads and @compress-wait-thread
+# are deprecated because @compression is deprecated.
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
 # are experimental.
@@ -1038,10 +1053,14 @@
 '*announce-max': 'size',
 '*announce-rounds': 'size',
 '*announce-step': 'size',
-'*compress-level': 'uint8',
-'*compress-threads': 'uint8',
-'*compress-wait-thread': 'bool',
-'*decompress-threads': 'uint8',
+'*compress-level': { 'type': 'uint8',
+ 

Re: [PATCH 08/12] hw/xen: do not repeatedly try to create a failing backend device

2023-10-24 Thread Paul Durrant

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

If xen_backend_device_create() fails to instantiate a device, the XenBus
code will just keep trying over and over again each time the bus is
re-enumerated, as long as the backend appears online and in
XenbusStateInitialising.

The only thing which prevents the XenBus code from recreating duplicates
of devices which already exist, is the fact that xen_device_realize()
sets the backend state to XenbusStateInitWait. If the attempt to create
the device doesn't get *that* far, that's when it will keep getting
retried.

My first thought was to handle errors by setting the backend state to
XenbusStateClosed, but that doesn't work for XenConsole which wants to
*ignore* any device of type != "ioemu" completely.

So, make xen_backend_device_create() *keep* the XenBackendInstance for a
failed device, and provide a new xen_backend_exists() function to allow
xen_bus_type_enumerate() to check whether one already exists before
creating a new one.

Signed-off-by: David Woodhouse 
---
  hw/xen/xen-backend.c | 27 +--
  hw/xen/xen-bus.c |  3 ++-
  include/hw/xen/xen-backend.h |  1 +
  3 files changed, 24 insertions(+), 7 deletions(-)



Reviewed-by: Paul Durrant 




[PULL 37/39] migration: migrate 'blk' command option is deprecated.

2023-10-24 Thread Juan Quintela
Use blocked-mirror with NBD instead.

Acked-by: Stefan Hajnoczi 
Reviewed-by: Thomas Huth 
Reviewed-by: Markus Armbruster 
Signed-off-by: Juan Quintela 
Message-ID: <20231018115513.2163-4-quint...@redhat.com>
---
 docs/about/deprecated.rst  | 9 +
 qapi/migration.json| 7 ---
 migration/migration-hmp-cmds.c | 5 +
 migration/migration.c  | 5 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index fc6adf1dea..98b0f14e69 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -469,3 +469,12 @@ Use blockdev-mirror with NBD instead.
 As an intermediate step the ``inc`` functionality can be achieved by
 setting the ``block-incremental`` migration parameter to ``true``.
 But this parameter is also deprecated.
+
+``blk`` migrate command option (since 8.2)
+''
+
+Use blockdev-mirror with NBD instead.
+
+As an intermediate step the ``blk`` functionality can be achieved by
+setting the ``block`` migration capability to ``true``.  But this
+capability is also deprecated.
diff --git a/qapi/migration.json b/qapi/migration.json
index fa7f4f2575..3765c2b662 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1526,8 +1526,8 @@
 #
 # Features:
 #
-# @deprecated: Member @inc is deprecated.  Use blockdev-mirror with
-# NBD instead.
+# @deprecated: Members @inc and @blk are deprecated.  Use
+# blockdev-mirror with NBD instead.
 #
 # Returns: nothing on success
 #
@@ -1550,7 +1550,8 @@
 # <- { "return": {} }
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool',
+  'data': {'uri': 'str',
+   '*blk': { 'type': 'bool', 'features': [ 'deprecated' ] },
'*inc': { 'type': 'bool', 'features': [ 'deprecated' ] },
'*detach': 'bool', '*resume': 'bool' } }
 
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 83176f5bae..dfe98da355 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -750,6 +750,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 " use blockdev-mirror with NBD instead");
 }
 
+if (blk) {
+warn_report("option '-b' is deprecated;"
+" use blockdev-mirror with NBD instead");
+}
+
 qmp_migrate(uri, !!blk, blk, !!inc, inc,
 false, false, true, resume, );
 if (hmp_handle_error(mon, err)) {
diff --git a/migration/migration.c b/migration/migration.c
index 06a706ad90..cb2d7161b5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1625,6 +1625,11 @@ static bool migrate_prepare(MigrationState *s, bool blk, 
bool blk_inc,
 " use blockdev-mirror with NBD instead");
 }
 
+if (blk) {
+warn_report("parameter 'blk' is deprecated;"
+" use blockdev-mirror with NBD instead");
+}
+
 if (resume) {
 if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
 error_setg(errp, "Cannot resume if there is no "
-- 
2.41.0




[PULL 32/39] migration: Use vmstate_register_any() for audio

2023-10-24 Thread Juan Quintela
We can have more than one audio backend.

void audio_init_audiodevs(void)
{
AudiodevListEntry *e;

QSIMPLEQ_FOREACH(e, , next) {
audio_init(e->dev, _fatal);
}
}

Reviewed-by: Stefan Berger 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-12-quint...@redhat.com>
---
 audio/audio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/audio/audio.c b/audio/audio.c
index e9815d6812..f91e05b72c 100644
--- a/audio/audio.c
+++ b/audio/audio.c
@@ -1781,7 +1781,7 @@ static AudioState *audio_init(Audiodev *dev, Error **errp)
 
 QTAILQ_INSERT_TAIL(_states, s, list);
 QLIST_INIT (>card_head);
-vmstate_register (NULL, 0, _audio, s);
+vmstate_register_any(NULL, _audio, s);
 return s;
 
 out:
-- 
2.41.0




[PULL 34/39] migration: Use vmstate_register_any() for vmware_vga

2023-10-24 Thread Juan Quintela
I have no idea if we can have more than one vmware_vga device, so play
it safe.

Reviewed-by: Stefan Berger 
Reviewed-by: Thomas Huth 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-14-quint...@redhat.com>
---
 hw/display/vmware_vga.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index 09591fbd39..7490d43881 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -1264,7 +1264,7 @@ static void vmsvga_init(DeviceState *dev, struct 
vmsvga_state_s *s,
 
 vga_common_init(>vga, OBJECT(dev), _fatal);
 vga_init(>vga, OBJECT(dev), address_space, io, true);
-vmstate_register(NULL, 0, _vga_common, >vga);
+vmstate_register_any(NULL, _vga_common, >vga);
 s->new_depth = 32;
 }
 
-- 
2.41.0




[PULL 38/39] migration: Deprecate block migration

2023-10-24 Thread Juan Quintela
It is obsolete.  It is better to use driver-mirror with NBD instead.

CC: Kevin Wolf 
CC: Eric Blake 
CC: Stefan Hajnoczi 
CC: Hanna Czenczek 

Acked-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Juan Quintela 
Message-ID: <20231018115513.2163-5-quint...@redhat.com>
---
 docs/about/deprecated.rst | 10 ++
 qapi/migration.json   | 29 -
 migration/block.c |  3 +++
 migration/options.c   |  9 -
 4 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 98b0f14e69..7ae872162d 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -478,3 +478,13 @@ Use blockdev-mirror with NBD instead.
 As an intermediate step the ``blk`` functionality can be achieved by
 setting the ``block`` migration capability to ``true``.  But this
 capability is also deprecated.
+
+block migration (since 8.2)
+'''
+
+Block migration is too inflexible.  It needs to migrate all block
+devices or none.
+
+Please see "QMP invocation for live storage migration with
+``blockdev-mirror`` + NBD" in docs/interop/live-block-operations.rst
+for a detailed explanation.
diff --git a/qapi/migration.json b/qapi/migration.json
index 3765c2b662..e3b00a215b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -269,11 +269,15 @@
 # average memory load of the virtual CPU indirectly.  Note that
 # zero means guest doesn't dirty memory.  (Since 8.1)
 #
+# Features:
+#
+# @deprecated: Member @disk is deprecated because block migration is.
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationInfo',
   'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats',
-   '*disk': 'MigrationStats',
+   '*disk': { 'type': 'MigrationStats', 'features': [ 'deprecated' ] },
'*vfio': 'VfioStats',
'*xbzrle-cache': 'XBZRLECacheStats',
'*total-time': 'int',
@@ -525,6 +529,9 @@
 #
 # Features:
 #
+# @deprecated: Member @block is deprecated.  Use blockdev-mirror with
+# NBD instead.
+#
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
 # Since: 1.2
@@ -534,7 +541,8 @@
'compress', 'events', 'postcopy-ram',
{ 'name': 'x-colo', 'features': [ 'unstable' ] },
'release-ram',
-   'block', 'return-path', 'pause-before-switchover', 'multifd',
+   { 'name': 'block', 'features': [ 'deprecated' ] },
+   'return-path', 'pause-before-switchover', 'multifd',
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
@@ -835,6 +843,9 @@
 #
 # Features:
 #
+# @deprecated: Member @block-incremental is deprecated.  Use
+# blockdev-mirror with NBD instead.
+#
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
 # are experimental.
 #
@@ -850,7 +861,7 @@
'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
'avail-switchover-bandwidth', 'downtime-limit',
{ 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
-   'block-incremental',
+   { 'name': 'block-incremental', 'features': [ 'deprecated' ] },
'multifd-channels',
'xbzrle-cache-size', 'max-postcopy-bandwidth',
'max-cpu-throttle', 'multifd-compression',
@@ -1011,6 +1022,9 @@
 #
 # Features:
 #
+# @deprecated: Member @block-incremental is deprecated.  Use
+# blockdev-mirror with NBD instead.
+#
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
 # are experimental.
 #
@@ -1040,7 +1054,8 @@
 '*downtime-limit': 'uint64',
 '*x-checkpoint-delay': { 'type': 'uint32',
  'features': [ 'unstable' ] },
-'*block-incremental': 'bool',
+'*block-incremental': { 'type': 'bool',
+'features': [ 'deprecated' ] },
 '*multifd-channels': 'uint8',
 '*xbzrle-cache-size': 'size',
 '*max-postcopy-bandwidth': 'size',
@@ -1225,6 +1240,9 @@
 #
 # Features:
 #
+# @deprecated: Member @block-incremental is deprecated.  Use
+# blockdev-mirror with NBD instead.
+#
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
 # are experimental.
 #
@@ -1251,7 +1269,8 @@
 '*downtime-limit': 'uint64',
 '*x-checkpoint-delay': { 'type': 'uint32',
  'features': [ 'unstable' ] },
-'*block-incremental': 'bool',
+'*block-incremental': { 'type': 'bool',
+'features': [ 'deprecated' ] },
 '*multifd-channels': 'uint8',
 '*xbzrle-cache-size': 'size',
 '*max-postcopy-bandwidth': 'size',
diff --git a/migration/block.c b/migration/block.c

[PULL 28/39] migration: Use VMSTATE_INSTANCE_ID_ANY for slirp

2023-10-24 Thread Juan Quintela
Each user network conection create a new slirp instance.  We register
more than one slirp instance for number 0.

qemu-system-x86_64: -netdev user,id=hs1: savevm_state_handler_insert: Detected 
duplicate SaveStateEntry: id=slirp, instance_id=0x0
Broken pipe
../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:195: kill_qemu() tried 
to terminate QEMU process but encountered exit status 1 (expected 0)
Aborted (core dumped)

Reviewed-by: Stefan Berger 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-6-quint...@redhat.com>
---
 net/slirp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/slirp.c b/net/slirp.c
index c33b3e02e7..25b49c4526 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -46,6 +46,7 @@
 #include "qapi/qmp/qdict.h"
 #include "util.h"
 #include "migration/register.h"
+#include "migration/vmstate.h"
 #include "migration/qemu-file-types.h"
 
 static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
@@ -659,8 +660,8 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
  * specific version?
  */
 g_assert(slirp_state_version() == 4);
-register_savevm_live("slirp", 0, slirp_state_version(),
- _slirp_state, s->slirp);
+register_savevm_live("slirp", VMSTATE_INSTANCE_ID_ANY,
+ slirp_state_version(), _slirp_state, s->slirp);
 
 s->poll_notifier.notify = net_slirp_poll_notify;
 main_loop_poll_add_notifier(>poll_notifier);
-- 
2.41.0




[PULL 17/39] migration: Rename ram_compressed_pages() to compress_ram_pages()

2023-10-24 Thread Juan Quintela
We are moving to have all functions exported from ram-compress.c to
start with compress_.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-12-quint...@redhat.com>
---
 migration/ram-compress.h | 2 +-
 migration/ram-compress.c | 2 +-
 migration/ram.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/ram-compress.h b/migration/ram-compress.h
index e222887fb7..0d89a2f55e 100644
--- a/migration/ram-compress.h
+++ b/migration/ram-compress.h
@@ -68,7 +68,7 @@ int compress_threads_load_setup(QEMUFile *f);
 void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len);
 
 void populate_compress(MigrationInfo *info);
-uint64_t ram_compressed_pages(void);
+uint64_t compress_ram_pages(void);
 void update_compress_thread_counts(const CompressParam *param, int bytes_xmit);
 void compress_update_rates(uint64_t page_count);
 int compress_send_queued_data(CompressParam *param);
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 036e44085b..fa4388f6a6 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -516,7 +516,7 @@ void populate_compress(MigrationInfo *info)
 info->compression->compression_rate = 
compression_counters.compression_rate;
 }
 
-uint64_t ram_compressed_pages(void)
+uint64_t compress_ram_pages(void)
 {
 return compression_counters.pages;
 }
diff --git a/migration/ram.c b/migration/ram.c
index 849752ef29..6335564035 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -932,7 +932,7 @@ uint64_t ram_get_total_transferred_pages(void)
 {
 return stat64_get(_stats.normal_pages) +
 stat64_get(_stats.zero_pages) +
-ram_compressed_pages() + xbzrle_counters.pages;
+compress_ram_pages() + xbzrle_counters.pages;
 }
 
 static void migration_update_rates(RAMState *rs, int64_t end_time)
-- 
2.41.0




[PULL 35/39] qemu-iotests: Filter warnings about block migration being deprecated

2023-10-24 Thread Juan Quintela
Create a new filter that removes the two warnings for test 183.

Reviewed-by: Hanna Czenczek 
Signed-off-by: Juan Quintela 
Message-ID: <20231018115513.2163-2-quint...@redhat.com>
---
 tests/qemu-iotests/183   | 2 +-
 tests/qemu-iotests/common.filter | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/183 b/tests/qemu-iotests/183
index ee62939e72..b85770458e 100755
--- a/tests/qemu-iotests/183
+++ b/tests/qemu-iotests/183
@@ -90,7 +90,7 @@ echo
 reply="$(_send_qemu_cmd $src \
 "{ 'execute': 'migrate',
'arguments': { 'uri': 'unix:${MIG_SOCKET}', 'blk': true } }" \
-'return\|error')"
+'return\|error' | _filter_migration_block_deprecated)"
 echo "$reply"
 if echo "$reply" | grep "compiled without old-style" > /dev/null; then
 _notrun "migrate -b support not compiled in"
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index fc3c64bcb8..2846c83808 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -359,5 +359,12 @@ _filter_qcow2_compression_type_bit()
 -e 's/\(incompatible_features.*\), 3\(,.*\)/\1\2/'
 }
 
+# filter warnings caused for block migration deprecation
+_filter_migration_block_deprecated()
+{
+gsed -e '/warning: parameter .blk. is deprecated; use blockdev-mirror with 
NBD instead/d' \
+ -e '/warning: block migration is deprecated; use blockdev-mirror with 
NBD instead/d'
+}
+
 # make sure this script returns success
 true
-- 
2.41.0




[PULL 26/39] migration: Use vmstate_register_any()

2023-10-24 Thread Juan Quintela
This are the easiest cases, where we were already using
VMSTATE_INSTANCE_ID_ANY.

Reviewed-by: Stefan Berger 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-3-quint...@redhat.com>
---
 backends/dbus-vmstate.c | 3 +--
 backends/tpm/tpm_emulator.c | 3 +--
 hw/i2c/core.c   | 2 +-
 hw/input/adb.c  | 2 +-
 hw/input/ads7846.c  | 2 +-
 hw/input/stellaris_input.c  | 3 +--
 hw/net/eepro100.c   | 3 +--
 hw/pci/pci.c| 2 +-
 hw/ppc/spapr_nvdimm.c   | 3 +--
 hw/timer/arm_timer.c| 2 +-
 hw/virtio/virtio-mem.c  | 4 ++--
 11 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/backends/dbus-vmstate.c b/backends/dbus-vmstate.c
index 57369ec0f2..a9d8cb0acd 100644
--- a/backends/dbus-vmstate.c
+++ b/backends/dbus-vmstate.c
@@ -426,8 +426,7 @@ dbus_vmstate_complete(UserCreatable *uc, Error **errp)
 return;
 }
 
-if (vmstate_register(VMSTATE_IF(self), VMSTATE_INSTANCE_ID_ANY,
- _vmstate, self) < 0) {
+if (vmstate_register_any(VMSTATE_IF(self), _vmstate, self) < 0) {
 error_setg(errp, "Failed to register vmstate");
 }
 }
diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index bf1a90f5d7..f7f1b4ad7a 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -975,8 +975,7 @@ static void tpm_emulator_inst_init(Object *obj)
 qemu_add_vm_change_state_handler(tpm_emulator_vm_state_change,
  tpm_emu);
 
-vmstate_register(NULL, VMSTATE_INSTANCE_ID_ANY,
- _tpm_emulator, obj);
+vmstate_register_any(NULL, _tpm_emulator, obj);
 }
 
 /*
diff --git a/hw/i2c/core.c b/hw/i2c/core.c
index bed594fe59..879a1d45cb 100644
--- a/hw/i2c/core.c
+++ b/hw/i2c/core.c
@@ -64,7 +64,7 @@ I2CBus *i2c_init_bus(DeviceState *parent, const char *name)
 bus = I2C_BUS(qbus_new(TYPE_I2C_BUS, parent, name));
 QLIST_INIT(>current_devs);
 QSIMPLEQ_INIT(>pending_masters);
-vmstate_register(NULL, VMSTATE_INSTANCE_ID_ANY, _i2c_bus, bus);
+vmstate_register_any(NULL, _i2c_bus, bus);
 return bus;
 }
 
diff --git a/hw/input/adb.c b/hw/input/adb.c
index 214ae6f42b..8aed0da2cd 100644
--- a/hw/input/adb.c
+++ b/hw/input/adb.c
@@ -247,7 +247,7 @@ static void adb_bus_realize(BusState *qbus, Error **errp)
 adb_bus->autopoll_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, adb_autopoll,
adb_bus);
 
-vmstate_register(NULL, -1, _adb_bus, adb_bus);
+vmstate_register_any(NULL, _adb_bus, adb_bus);
 }
 
 static void adb_bus_unrealize(BusState *qbus)
diff --git a/hw/input/ads7846.c b/hw/input/ads7846.c
index dc0998ac79..91116c6bdb 100644
--- a/hw/input/ads7846.c
+++ b/hw/input/ads7846.c
@@ -158,7 +158,7 @@ static void ads7846_realize(SSIPeripheral *d, Error **errp)
 
 ads7846_int_update(s);
 
-vmstate_register(NULL, VMSTATE_INSTANCE_ID_ANY, _ads7846, s);
+vmstate_register_any(NULL, _ads7846, s);
 }
 
 static void ads7846_class_init(ObjectClass *klass, void *data)
diff --git a/hw/input/stellaris_input.c b/hw/input/stellaris_input.c
index e6ee5e11f1..a58721c8cd 100644
--- a/hw/input/stellaris_input.c
+++ b/hw/input/stellaris_input.c
@@ -88,6 +88,5 @@ void stellaris_gamepad_init(int n, qemu_irq *irq, const int 
*keycode)
 }
 s->num_buttons = n;
 qemu_add_kbd_event_handler(stellaris_gamepad_put_key, s);
-vmstate_register(NULL, VMSTATE_INSTANCE_ID_ANY,
- _stellaris_gamepad, s);
+vmstate_register_any(NULL, _stellaris_gamepad, s);
 }
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index dc07984ae9..94ce9e18ff 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -1883,8 +1883,7 @@ static void e100_nic_realize(PCIDevice *pci_dev, Error 
**errp)
 
 s->vmstate = g_memdup(_eepro100, sizeof(vmstate_eepro100));
 s->vmstate->name = qemu_get_queue(s->nic)->model;
-vmstate_register(VMSTATE_IF(_dev->qdev), VMSTATE_INSTANCE_ID_ANY,
- s->vmstate, s);
+vmstate_register_any(VMSTATE_IF(_dev->qdev), s->vmstate, s);
 }
 
 static void eepro100_instance_init(Object *obj)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 7d09e1a39d..885c04b6f5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -147,7 +147,7 @@ static void pci_bus_realize(BusState *qbus, Error **errp)
 bus->machine_done.notify = pcibus_machine_done;
 qemu_add_machine_init_done_notifier(>machine_done);
 
-vmstate_register(NULL, VMSTATE_INSTANCE_ID_ANY, _pcibus, bus);
+vmstate_register_any(NULL, _pcibus, bus);
 }
 
 static void pcie_bus_realize(BusState *qbus, Error **errp)
diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index b2f009c816..ad7afe7544 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -876,8 +876,7 @@ static void spapr_nvdimm_realize(NVDIMMDevice *dimm, Error 
**errp)
 s_nvdimm->hcall_flush_required = true;
 }
 
-vmstate_register(NULL, 

[PULL 21/39] hw/s390x/s390-stattrib: Don't call register_savevm_live() during instance_init()

2023-10-24 Thread Juan Quintela
From: Thomas Huth 

We must not call register_savevm_live() from an instance_init() function
(since this could be called multiple times during device introspection).
Move this to the realize() function instead.

Acked-by: David Hildenbrand 
Reviewed-by: Eric Farman 
Signed-off-by: Juan Quintela 
Signed-off-by: Thomas Huth 
Signed-off-by: Juan Quintela 
Message-ID: <20231020150554.664422-4-th...@redhat.com>
---
 hw/s390x/s390-stattrib.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 52f9fc036e..54a138011c 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -331,6 +331,17 @@ static const TypeInfo qemu_s390_stattrib_info = {
 
 /* Generic abstract object: */
 
+static SaveVMHandlers savevm_s390_stattrib_handlers = {
+.save_setup = cmma_save_setup,
+.save_live_iterate = cmma_save_iterate,
+.save_live_complete_precopy = cmma_save_complete,
+.state_pending_exact = cmma_state_pending,
+.state_pending_estimate = cmma_state_pending,
+.save_cleanup = cmma_save_cleanup,
+.load_state = cmma_load,
+.is_active = cmma_active,
+};
+
 static void s390_stattrib_realize(DeviceState *dev, Error **errp)
 {
 bool ambiguous = false;
@@ -338,7 +349,11 @@ static void s390_stattrib_realize(DeviceState *dev, Error 
**errp)
 object_resolve_path_type("", TYPE_S390_STATTRIB, );
 if (ambiguous) {
 error_setg(errp, "storage_attributes device already exists");
+return;
 }
+
+register_savevm_live(TYPE_S390_STATTRIB, 0, 0,
+ _s390_stattrib_handlers, dev);
 }
 
 static Property s390_stattrib_props[] = {
@@ -355,24 +370,10 @@ static void s390_stattrib_class_init(ObjectClass *oc, 
void *data)
 device_class_set_props(dc, s390_stattrib_props);
 }
 
-static SaveVMHandlers savevm_s390_stattrib_handlers = {
-.save_setup = cmma_save_setup,
-.save_live_iterate = cmma_save_iterate,
-.save_live_complete_precopy = cmma_save_complete,
-.state_pending_exact = cmma_state_pending,
-.state_pending_estimate = cmma_state_pending,
-.save_cleanup = cmma_save_cleanup,
-.load_state = cmma_load,
-.is_active = cmma_active,
-};
-
 static void s390_stattrib_instance_init(Object *obj)
 {
 S390StAttribState *sas = S390_STATTRIB(obj);
 
-register_savevm_live(TYPE_S390_STATTRIB, 0, 0,
- _s390_stattrib_handlers, sas);
-
 sas->migration_cur_gfn = 0;
 }
 
-- 
2.41.0




[PULL 29/39] migration: Hack to maintain backwards compatibility for ppc

2023-10-24 Thread Juan Quintela
Current code does:
- register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
  dependinfg on cpu number
- for newer machines, it register vmstate_icp with "icp/server" name
  and instance 0
- now it unregisters "icp/server" for the 1st instance.

This is wrong at many levels:
- we shouldn't have two VMSTATEDescriptions with the same name
- In case this is the only solution that we can came with, it needs to
  be:
  * register pre_2_10_vmstate_dummy_icp
  * unregister pre_2_10_vmstate_dummy_icp
  * register real vmstate_icp

Created vmstate_replace_hack_for_ppc() with warnings left and right
that it is a hack.

CC: Cedric Le Goater 
CC: Daniel Henrique Barboza 
CC: David Gibson 
CC: Greg Kurz 

Reviewed-by: Nicholas Piggin 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-8-quint...@redhat.com>
---
 include/migration/vmstate.h | 11 +++
 hw/intc/xics.c  | 18 --
 hw/ppc/spapr.c  | 25 +++--
 migration/savevm.c  | 18 ++
 4 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1ea97ccf2d..9821918631 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -1230,6 +1230,17 @@ static inline int vmstate_register(VMStateIf *obj, int 
instance_id,
   opaque, -1, 0, NULL);
 }
 
+/**
+ * vmstate_replace_hack_for_ppc() - ppc used to abuse vmstate_register
+ *
+ * Don't even think about using this function in new code.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+int vmstate_replace_hack_for_ppc(VMStateIf *obj, int instance_id,
+ const VMStateDescription *vmsd,
+ void *opaque);
+
 /**
  * vmstate_register_any() - legacy function to register state
  * serialisation description and let the function choose the id
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index c7f8abd71e..c77e986136 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -335,8 +335,22 @@ static void icp_realize(DeviceState *dev, Error **errp)
 return;
 }
 }
-
-vmstate_register(NULL, icp->cs->cpu_index, _icp_server, icp);
+/*
+ * The way that pre_2_10_icp is handling is really, really hacky.
+ * We used to have here this call:
+ *
+ * vmstate_register(NULL, icp->cs->cpu_index, _icp_server, icp);
+ *
+ * But we were doing:
+ * pre_2_10_vmstate_register_dummy_icp()
+ * this vmstate_register()
+ * pre_2_10_vmstate_unregister_dummy_icp()
+ *
+ * So for a short amount of time we had to vmstate entries with
+ * the same name.  This fixes it.
+ */
+vmstate_replace_hack_for_ppc(NULL, icp->cs->cpu_index,
+ _icp_server, icp);
 }
 
 static void icp_unrealize(DeviceState *dev)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b25093be28..df09aa9d6a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -143,6 +143,11 @@ static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque)
 }
 
 static const VMStateDescription pre_2_10_vmstate_dummy_icp = {
+/*
+ * Hack ahead.  We can't have two devices with the same name and
+ * instance id.  So I rename this to pass make check.
+ * Real help from people who knows the hardware is needed.
+ */
 .name = "icp/server",
 .version_id = 1,
 .minimum_version_id = 1,
@@ -155,16 +160,32 @@ static const VMStateDescription 
pre_2_10_vmstate_dummy_icp = {
 },
 };
 
+/*
+ * See comment in hw/intc/xics.c:icp_realize()
+ *
+ * You have to remove vmstate_replace_hack_for_ppc() when you remove
+ * the machine types that need the following function.
+ */
 static void pre_2_10_vmstate_register_dummy_icp(int i)
 {
 vmstate_register(NULL, i, _2_10_vmstate_dummy_icp,
  (void *)(uintptr_t) i);
 }
 
+/*
+ * See comment in hw/intc/xics.c:icp_realize()
+ *
+ * You have to remove vmstate_replace_hack_for_ppc() when you remove
+ * the machine types that need the following function.
+ */
 static void pre_2_10_vmstate_unregister_dummy_icp(int i)
 {
-vmstate_unregister(NULL, _2_10_vmstate_dummy_icp,
-   (void *)(uintptr_t) i);
+/*
+ * This used to be:
+ *
+ *vmstate_unregister(NULL, _2_10_vmstate_dummy_icp,
+ *  (void *)(uintptr_t) i);
+ */
 }
 
 int spapr_max_server_number(SpaprMachineState *spapr)
diff --git a/migration/savevm.c b/migration/savevm.c
index ca5c7cebe0..1d1639c4b6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -846,6 +846,24 @@ static void vmstate_check(const VMStateDescription *vmsd)
 }
 }
 
+/*
+ * See comment in hw/intc/xics.c:icp_realize()
+ *
+ * This function can be removed when
+ * pre_2_10_vmstate_register_dummy_icp() is removed.
+ */
+int vmstate_replace_hack_for_ppc(VMStateIf *obj, int instance_id,
+ const 

[PULL 12/39] migration: Move busy++ to migrate_with_multithread

2023-10-24 Thread Juan Quintela
And now we can simplify save_compress_page().

Reviewed-by: Lukas Straub 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-7-quint...@redhat.com>
---
 migration/ram-compress.c | 1 +
 migration/ram.c  | 8 ++--
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index a991b15b7a..f56e1f8e69 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -291,6 +291,7 @@ bool compress_page_with_multi_thread(RAMBlock *block, 
ram_addr_t offset,
 }
 if (!wait) {
 qemu_mutex_unlock(_done_lock);
+compression_counters.busy++;
 return false;
 }
 /*
diff --git a/migration/ram.c b/migration/ram.c
index 63a575ae90..46209388ec 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2067,12 +2067,8 @@ static bool save_compress_page(RAMState *rs, 
PageSearchStatus *pss,
 return false;
 }
 
-if (compress_page_with_multi_thread(pss->block, offset, send_queued_data)) 
{
-return true;
-}
-
-compression_counters.busy++;
-return false;
+return compress_page_with_multi_thread(pss->block, offset,
+   send_queued_data);
 }
 
 /**
-- 
2.41.0




[PULL 22/39] migration/ram: Fix compilation with -Wshadow=local

2023-10-24 Thread Juan Quintela
From: Thomas Huth 

Rename the variable here to avoid that it shadows a variable from
the beginning of the function scope. With this change the code now
successfully compiles with -Wshadow=local.

Signed-off-by: Thomas Huth 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231024092220.55305-1-th...@redhat.com>
---
 migration/ram.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 6335564035..024dedb6b1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3147,6 +3147,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 rs->last_stage = !migration_in_colo_state();
 
 WITH_RCU_READ_LOCK_GUARD() {
+int rdma_reg_ret;
+
 if (!migration_in_postcopy()) {
 migration_bitmap_sync_precopy(rs, true);
 }
@@ -3177,9 +3179,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 compress_flush_data();
 
-int ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
-if (ret < 0) {
-qemu_file_set_error(f, ret);
+rdma_reg_ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
+if (rdma_reg_ret < 0) {
+qemu_file_set_error(f, rdma_reg_ret);
 }
 }
 
-- 
2.41.0




[PULL 33/39] migration: Use vmstate_register_any() for eeprom93xx

2023-10-24 Thread Juan Quintela
We can have more than one eeprom93xx.
For instance:

e100_nic_realize() -> eeprom93xx_new()

Reviewed-by: Stefan Berger 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-13-quint...@redhat.com>
---
 hw/nvram/eeprom93xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvram/eeprom93xx.c b/hw/nvram/eeprom93xx.c
index 1081e2cc0d..57d63638d7 100644
--- a/hw/nvram/eeprom93xx.c
+++ b/hw/nvram/eeprom93xx.c
@@ -321,7 +321,7 @@ eeprom_t *eeprom93xx_new(DeviceState *dev, uint16_t nwords)
 /* Output DO is tristate, read results in 1. */
 eeprom->eedo = 1;
 logout("eeprom = 0x%p, nwords = %u\n", eeprom, nwords);
-vmstate_register(VMSTATE_IF(dev), 0, _eeprom, eeprom);
+vmstate_register_any(VMSTATE_IF(dev), _eeprom, eeprom);
 return eeprom;
 }
 
-- 
2.41.0




[PULL 09/39] migration: Remove save_page_use_compression()

2023-10-24 Thread Juan Quintela
After previous patch, we disable the posiblity that we use compression
together with xbzrle.  So we can use directly migrate_compress().

Once there, now we don't need the rs parameter, so remove it.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-4-quint...@redhat.com>
---
 migration/ram.c | 34 +++---
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index e0ad732ee8..8246663f64 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1291,8 +1291,6 @@ static int ram_save_multifd_page(QEMUFile *file, RAMBlock 
*block,
 return 1;
 }
 
-static bool save_page_use_compression(RAMState *rs);
-
 static int send_queued_data(CompressParam *param)
 {
 PageSearchStatus *pss = _state->pss[RAM_CHANNEL_PRECOPY];
@@ -1329,9 +1327,9 @@ static int send_queued_data(CompressParam *param)
 return len;
 }
 
-static void ram_flush_compressed_data(RAMState *rs)
+static void ram_flush_compressed_data(void)
 {
-if (!save_page_use_compression(rs)) {
+if (!migrate_compress()) {
 return;
 }
 
@@ -1393,7 +1391,7 @@ static int find_dirty_block(RAMState *rs, 
PageSearchStatus *pss)
  * Also If xbzrle is on, stop using the data compression at this
  * point. In theory, xbzrle can do better than compression.
  */
-ram_flush_compressed_data(rs);
+ram_flush_compressed_data();
 
 /* Hit the end of the list */
 pss->block = QLIST_FIRST_RCU(_list.blocks);
@@ -2042,24 +2040,6 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t 
start, ram_addr_t len)
 return 0;
 }
 
-static bool save_page_use_compression(RAMState *rs)
-{
-if (!migrate_compress()) {
-return false;
-}
-
-/*
- * If xbzrle is enabled (e.g., after first round of migration), stop
- * using the data compression. In theory, xbzrle can do better than
- * compression.
- */
-if (rs->xbzrle_started) {
-return false;
-}
-
-return true;
-}
-
 /*
  * try to compress the page before posting it out, return true if the page
  * has been properly handled by compression, otherwise needs other
@@ -2068,7 +2048,7 @@ static bool save_page_use_compression(RAMState *rs)
 static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
ram_addr_t offset)
 {
-if (!save_page_use_compression(rs)) {
+if (!migrate_compress()) {
 return false;
 }
 
@@ -2083,7 +2063,7 @@ static bool save_compress_page(RAMState *rs, 
PageSearchStatus *pss,
  * much CPU resource.
  */
 if (pss->block != pss->last_sent_block) {
-ram_flush_compressed_data(rs);
+ram_flush_compressed_data();
 return false;
 }
 
@@ -3135,7 +3115,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  * page is sent in one chunk.
  */
 if (migrate_postcopy_ram()) {
-ram_flush_compressed_data(rs);
+ram_flush_compressed_data();
 }
 
 /*
@@ -3236,7 +3216,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 qemu_mutex_unlock(>bitmap_mutex);
 
-ram_flush_compressed_data(rs);
+ram_flush_compressed_data();
 
 int ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
 if (ret < 0) {
-- 
2.41.0




[PULL 25/39] migration: Create vmstate_register_any()

2023-10-24 Thread Juan Quintela
We have lots of cases where we are using an instance_id==0 when we
should be using VMSTATE_INSTANCE_ID_ANY (-1).  Basically everything
that can have more than one needs to have a proper instance_id or -1
and the system will take one for it.

vmstate_register_any(): We register with -1.

Reviewed-by: Stefan Berger 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-2-quint...@redhat.com>
---
 include/migration/vmstate.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1af181877c..1ea97ccf2d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -1230,6 +1230,23 @@ static inline int vmstate_register(VMStateIf *obj, int 
instance_id,
   opaque, -1, 0, NULL);
 }
 
+/**
+ * vmstate_register_any() - legacy function to register state
+ * serialisation description and let the function choose the id
+ *
+ * New code shouldn't be using this function as QOM-ified devices have
+ * dc->vmsd to store the serialisation description.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static inline int vmstate_register_any(VMStateIf *obj,
+   const VMStateDescription *vmsd,
+   void *opaque)
+{
+return vmstate_register_with_alias_id(obj, VMSTATE_INSTANCE_ID_ANY, vmsd,
+  opaque, -1, 0, NULL);
+}
+
 void vmstate_unregister(VMStateIf *obj, const VMStateDescription *vmsd,
 void *opaque);
 
-- 
2.41.0




[PULL 23/39] migration: rename vmstate_save_needed->vmstate_section_needed

2023-10-24 Thread Juan Quintela
From: Marc-André Lureau 

The function is used on save at this point. The following commits will
use it on load.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231024084043.2926316-5-marcandre.lur...@redhat.com>
---
 include/migration/vmstate.h | 2 +-
 migration/savevm.c  | 2 +-
 migration/vmstate.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1a31fb7293..1af181877c 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -1202,7 +1202,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
VMStateDescription *vmsd,
  void *opaque, JSONWriter *vmdesc,
  int version_id, Error **errp);
 
-bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque);
+bool vmstate_section_needed(const VMStateDescription *vmsd, void *opaque);
 
 #define  VMSTATE_INSTANCE_ID_ANY  -1
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 8622f229e5..ca5c7cebe0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -985,7 +985,7 @@ static int vmstate_save(QEMUFile *f, SaveStateEntry *se, 
JSONWriter *vmdesc)
 if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
 return 0;
 }
-if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
+if (se->vmsd && !vmstate_section_needed(se->vmsd, se->opaque)) {
 trace_savevm_section_skip(se->idstr, se->section_id);
 return 0;
 }
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 1cf9e45b85..16e33a5d34 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -324,7 +324,7 @@ static void vmsd_desc_field_end(const VMStateDescription 
*vmsd,
 }
 
 
-bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque)
+bool vmstate_section_needed(const VMStateDescription *vmsd, void *opaque)
 {
 if (vmsd->needed && !vmsd->needed(opaque)) {
 /* optional section not needed */
@@ -522,7 +522,7 @@ static int vmstate_subsection_save(QEMUFile *f, const 
VMStateDescription *vmsd,
 
 trace_vmstate_subsection_save_top(vmsd->name);
 while (sub && *sub) {
-if (vmstate_save_needed(*sub, opaque)) {
+if (vmstate_section_needed(*sub, opaque)) {
 const VMStateDescription *vmsdsub = *sub;
 uint8_t len;
 
-- 
2.41.0




[PULL 07/39] migration: Give one error if trying to set MULTIFD and XBZRLE

2023-10-24 Thread Juan Quintela
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-2-quint...@redhat.com>
---
 migration/options.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index 42fb818956..b8c3c3218d 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -618,6 +618,13 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 }
 }
 
+if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
+if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
+error_setg(errp, "Multifd is not compatible with xbzrle");
+return false;
+}
+}
+
 return true;
 }
 
-- 
2.41.0




[PULL 14/39] migration: Export send_queued_data()

2023-10-24 Thread Juan Quintela
This function is only used for compression.  So we rename it as
compress_send_queued_data().  We put it on ram-compress.h because we
are moving it later to ram-compress.c.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-9-quint...@redhat.com>
---
 migration/ram-compress.h | 1 +
 migration/ram.c  | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/ram-compress.h b/migration/ram-compress.h
index 76dacd3ec7..636281ed97 100644
--- a/migration/ram-compress.h
+++ b/migration/ram-compress.h
@@ -72,5 +72,6 @@ void populate_compress(MigrationInfo *info);
 uint64_t ram_compressed_pages(void);
 void update_compress_thread_counts(const CompressParam *param, int bytes_xmit);
 void compress_update_rates(uint64_t page_count);
+int compress_send_queued_data(CompressParam *param);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index f7daf2226e..b6d485358e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1264,7 +1264,7 @@ static int ram_save_multifd_page(QEMUFile *file, RAMBlock 
*block,
 return 1;
 }
 
-static int send_queued_data(CompressParam *param)
+int compress_send_queued_data(CompressParam *param)
 {
 PageSearchStatus *pss = _state->pss[RAM_CHANNEL_PRECOPY];
 MigrationState *ms = migrate_get_current();
@@ -1306,7 +1306,7 @@ static void ram_flush_compressed_data(void)
 return;
 }
 
-flush_compressed_data(send_queued_data);
+flush_compressed_data(compress_send_queued_data);
 }
 
 #define PAGE_ALL_CLEAN 0
@@ -2041,7 +2041,7 @@ static bool save_compress_page(RAMState *rs, 
PageSearchStatus *pss,
 }
 
 return compress_page_with_multi_thread(pss->block, offset,
-   send_queued_data);
+   compress_send_queued_data);
 }
 
 /**
-- 
2.41.0




[PULL 30/39] migration: Check in savevm_state_handler_insert for dups

2023-10-24 Thread Juan Quintela
From: Peter Xu 

Before finally register one SaveStateEntry, we detect for duplicated
entries.  This could be helpful to notify us asap instead of get
silent migration failures which could be hard to diagnose.

For example, this patch will generate a message like this (if without
previous fixes on x2apic) as long as we wants to boot a VM instance
with "-smp 200,maxcpus=288,sockets=2,cores=72,threads=2" and QEMU will
bail out even before VM starts:

savevm_state_handler_insert: Detected duplicate SaveStateEntry: id=apic, 
instance_id=0x0

Suggested-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Juan Quintela 
Message-ID: <20231020090731.28701-10-quint...@redhat.com>
---
 migration/savevm.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 1d1639c4b6..11ece3a91a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -237,6 +237,8 @@ static SaveState savevm_state = {
 .global_section_id = 0,
 };
 
+static SaveStateEntry *find_se(const char *idstr, uint32_t instance_id);
+
 static bool should_validate_capability(int capability)
 {
 assert(capability >= 0 && capability < MIGRATION_CAPABILITY__MAX);
@@ -716,6 +718,18 @@ static void savevm_state_handler_insert(SaveStateEntry 
*nse)
 
 assert(priority <= MIG_PRI_MAX);
 
+/*
+ * This should never happen otherwise migration will probably fail
+ * silently somewhere because we can be wrongly applying one
+ * object properties upon another one.  Bail out ASAP.
+ */
+if (find_se(nse->idstr, nse->instance_id)) {
+error_report("%s: Detected duplicate SaveStateEntry: "
+ "id=%s, instance_id=0x%"PRIx32, __func__,
+ nse->idstr, nse->instance_id);
+exit(EXIT_FAILURE);
+}
+
 for (i = priority - 1; i >= 0; i--) {
 se = savevm_state.handler_pri_head[i];
 if (se != NULL) {
-- 
2.41.0




[PULL 19/39] hw/s390x/s390-skeys: Don't call register_savevm_live() during instance_init()

2023-10-24 Thread Juan Quintela
From: Thomas Huth 

Since the instance_init() function immediately tries to set the
property to "true", the s390_skeys_set_migration_enabled() tries
to register a savevm handler during instance_init(). However,
instance_init() functions can be called multiple times, e.g. for
introspection of devices. That means multiple instances of devices
can be created during runtime (which is fine as long as they all
don't get realized, too), so the "Prevent double registration of
savevm handler" check in the s390_skeys_set_migration_enabled()
function does not work at all as expected (since there could be
more than one instance).

Thus we must not call register_savevm_live() from an instance_init()
function at all. Move this to the realize() function instead. This
way we can also get rid of the property getter and setter functions
completely, simplifying the code along the way quite a bit.

Acked-by: David Hildenbrand 
Reviewed-by: Eric Farman 
Acked-by: Juan Quintela 
Signed-off-by: Thomas Huth 
Signed-off-by: Juan Quintela 
Message-ID: <20231020150554.664422-2-th...@redhat.com>
---
 hw/s390x/s390-skeys.c | 35 ---
 1 file changed, 8 insertions(+), 27 deletions(-)

diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
index 5024faf411..8e9d9e41e8 100644
--- a/hw/s390x/s390-skeys.c
+++ b/hw/s390x/s390-skeys.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "qemu/units.h"
 #include "hw/boards.h"
+#include "hw/qdev-properties.h"
 #include "hw/s390x/storage-keys.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc-target.h"
@@ -432,58 +433,38 @@ static int s390_storage_keys_load(QEMUFile *f, void 
*opaque, int version_id)
 return ret;
 }
 
-static inline bool s390_skeys_get_migration_enabled(Object *obj, Error **errp)
-{
-S390SKeysState *ss = S390_SKEYS(obj);
-
-return ss->migration_enabled;
-}
-
 static SaveVMHandlers savevm_s390_storage_keys = {
 .save_state = s390_storage_keys_save,
 .load_state = s390_storage_keys_load,
 };
 
-static inline void s390_skeys_set_migration_enabled(Object *obj, bool value,
-Error **errp)
+static void s390_skeys_realize(DeviceState *dev, Error **errp)
 {
-S390SKeysState *ss = S390_SKEYS(obj);
-
-/* Prevent double registration of savevm handler */
-if (ss->migration_enabled == value) {
-return;
-}
-
-ss->migration_enabled = value;
+S390SKeysState *ss = S390_SKEYS(dev);
 
 if (ss->migration_enabled) {
 register_savevm_live(TYPE_S390_SKEYS, 0, 1,
  _s390_storage_keys, ss);
-} else {
-unregister_savevm(VMSTATE_IF(ss), TYPE_S390_SKEYS, ss);
 }
 }
 
-static void s390_skeys_instance_init(Object *obj)
-{
-object_property_add_bool(obj, "migration-enabled",
- s390_skeys_get_migration_enabled,
- s390_skeys_set_migration_enabled);
-object_property_set_bool(obj, "migration-enabled", true, NULL);
-}
+static Property s390_skeys_props[] = {
+DEFINE_PROP_BOOL("migration-enabled", S390SKeysState, migration_enabled, 
true),
+};
 
 static void s390_skeys_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->hotpluggable = false;
+dc->realize = s390_skeys_realize;
+device_class_set_props(dc, s390_skeys_props);
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 }
 
 static const TypeInfo s390_skeys_info = {
 .name  = TYPE_S390_SKEYS,
 .parent= TYPE_DEVICE,
-.instance_init = s390_skeys_instance_init,
 .instance_size = sizeof(S390SKeysState),
 .class_init= s390_skeys_class_init,
 .class_size= sizeof(S390SKeysClass),
-- 
2.41.0




[PULL 16/39] migration: Merge flush_compressed_data() and compress_flush_data()

2023-10-24 Thread Juan Quintela
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-11-quint...@redhat.com>
---
 migration/ram-compress.h |  1 -
 migration/ram-compress.c | 17 ++---
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/migration/ram-compress.h b/migration/ram-compress.h
index 7ba01e2882..e222887fb7 100644
--- a/migration/ram-compress.h
+++ b/migration/ram-compress.h
@@ -59,7 +59,6 @@ typedef struct CompressParam CompressParam;
 void compress_threads_save_cleanup(void);
 int compress_threads_save_setup(void);
 
-void flush_compressed_data(int (send_queued_data(CompressParam *)));
 bool compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
   int (send_queued_data(CompressParam *)));
 
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 1443a1cb45..036e44085b 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -241,10 +241,14 @@ static inline void compress_reset_result(CompressParam 
*param)
 param->offset = 0;
 }
 
-void flush_compressed_data(int (send_queued_data(CompressParam *)))
+void compress_flush_data(void)
 {
 int thread_count = migrate_compress_threads();
 
+if (!migrate_compress()) {
+return;
+}
+
 qemu_mutex_lock(_done_lock);
 for (int i = 0; i < thread_count; i++) {
 while (!comp_param[i].done) {
@@ -257,7 +261,7 @@ void flush_compressed_data(int 
(send_queued_data(CompressParam *)))
 qemu_mutex_lock(_param[i].mutex);
 if (!comp_param[i].quit) {
 CompressParam *param = _param[i];
-send_queued_data(param);
+compress_send_queued_data(param);
 assert(qemu_file_buffer_empty(param->file));
 compress_reset_result(param);
 }
@@ -558,12 +562,3 @@ void compress_update_rates(uint64_t page_count)
 compression_counters.compressed_size;
 }
 }
-
-void compress_flush_data(void)
-{
-if (!migrate_compress()) {
-return;
-}
-
-flush_compressed_data(compress_send_queued_data);
-}
-- 
2.41.0




[PULL 15/39] migration: Move ram_flush_compressed_data() to ram-compress.c

2023-10-24 Thread Juan Quintela
As we export it, rename it compress_flush_data().

Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-10-quint...@redhat.com>
---
 migration/ram-compress.h |  1 +
 migration/ram-compress.c |  9 +
 migration/ram.c  | 17 -
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/migration/ram-compress.h b/migration/ram-compress.h
index 636281ed97..7ba01e2882 100644
--- a/migration/ram-compress.h
+++ b/migration/ram-compress.h
@@ -73,5 +73,6 @@ uint64_t ram_compressed_pages(void);
 void update_compress_thread_counts(const CompressParam *param, int bytes_xmit);
 void compress_update_rates(uint64_t page_count);
 int compress_send_queued_data(CompressParam *param);
+void compress_flush_data(void);
 
 #endif
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index af42cab0fe..1443a1cb45 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -558,3 +558,12 @@ void compress_update_rates(uint64_t page_count)
 compression_counters.compressed_size;
 }
 }
+
+void compress_flush_data(void)
+{
+if (!migrate_compress()) {
+return;
+}
+
+flush_compressed_data(compress_send_queued_data);
+}
diff --git a/migration/ram.c b/migration/ram.c
index b6d485358e..849752ef29 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1300,15 +1300,6 @@ int compress_send_queued_data(CompressParam *param)
 return len;
 }
 
-static void ram_flush_compressed_data(void)
-{
-if (!migrate_compress()) {
-return;
-}
-
-flush_compressed_data(compress_send_queued_data);
-}
-
 #define PAGE_ALL_CLEAN 0
 #define PAGE_TRY_AGAIN 1
 #define PAGE_DIRTY_FOUND 2
@@ -1364,7 +1355,7 @@ static int find_dirty_block(RAMState *rs, 
PageSearchStatus *pss)
  * Also If xbzrle is on, stop using the data compression at this
  * point. In theory, xbzrle can do better than compression.
  */
-ram_flush_compressed_data();
+compress_flush_data();
 
 /* Hit the end of the list */
 pss->block = QLIST_FIRST_RCU(_list.blocks);
@@ -2036,7 +2027,7 @@ static bool save_compress_page(RAMState *rs, 
PageSearchStatus *pss,
  * much CPU resource.
  */
 if (pss->block != pss->last_sent_block) {
-ram_flush_compressed_data();
+compress_flush_data();
 return false;
 }
 
@@ -3083,7 +3074,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  * page is sent in one chunk.
  */
 if (migrate_postcopy_ram()) {
-ram_flush_compressed_data();
+compress_flush_data();
 }
 
 /*
@@ -3184,7 +3175,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 qemu_mutex_unlock(>bitmap_mutex);
 
-ram_flush_compressed_data();
+compress_flush_data();
 
 int ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
 if (ret < 0) {
-- 
2.41.0




[PULL 05/39] migration: Receiving a zero page non zero is an error

2023-10-24 Thread Juan Quintela
We don't allow non zero compressed pages since:

commit 3edcd7e6ebae3ef0ac178eed5f4225803159562d
Author: Peter Lieven 
Date:   Tue Mar 26 10:58:35 2013 +0100

migration: search for zero instead of dup pages

RDMA case is a bit more complicated, but they don't handle it since:

commit a1febc4950f2c6232c002f401d7cd409f6fa6a88
Author: Richard Henderson 
Date:   Mon Aug 29 11:46:14 2016 -0700

cutils: Export only buffer_is_zero

Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Juan Quintela 
Message-ID: <20231019085259.13307-2-quint...@redhat.com>
---
 migration/ram.c  | 15 +++
 migration/rdma.c |  6 +-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 92769902bb..4bfb20c94a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3715,16 +3715,18 @@ int ram_load_postcopy(QEMUFile *f, int channel)
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_ZERO:
 ch = qemu_get_byte(f);
+if (ch != 0) {
+error_report("Found a zero page with value %d", ch);
+ret = -EINVAL;
+break;
+}
 /*
  * Can skip to set page_buffer when
  * this is a zero page and (block->page_size == TARGET_PAGE_SIZE).
  */
-if (ch || !matches_target_page_size) {
+if (!matches_target_page_size) {
 memset(page_buffer, ch, TARGET_PAGE_SIZE);
 }
-if (ch) {
-tmp_page->all_zero = false;
-}
 break;
 
 case RAM_SAVE_FLAG_PAGE:
@@ -4030,6 +4032,11 @@ static int ram_load_precopy(QEMUFile *f)
 
 case RAM_SAVE_FLAG_ZERO:
 ch = qemu_get_byte(f);
+if (ch != 0) {
+error_report("Found a zero page with value %d", ch);
+ret = -EINVAL;
+break;
+}
 ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
 break;
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 2a1852ec7f..2d963fd147 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3592,7 +3592,11 @@ int rdma_registration_handle(QEMUFile *f)
 
 host_addr = block->local_host_addr +
 (comp->offset - block->offset);
-
+if (comp->value) {
+error_report("rdma: Zero page with non-zero (%d) value",
+ comp->value);
+goto err;
+}
 ram_handle_compressed(host_addr, comp->value, comp->length);
 break;
 
-- 
2.41.0




[PULL 20/39] hw/s390x/s390-stattrib: Simplify handling of the "migration-enabled" property

2023-10-24 Thread Juan Quintela
From: Thomas Huth 

There's no need for dedicated handlers here if they don't do anything
special.

Acked-by: David Hildenbrand 
Reviewed-by: Eric Farman 
Acked-by: Juan Quintela 
Signed-off-by: Thomas Huth 
Signed-off-by: Juan Quintela 
Message-ID: <20231020150554.664422-3-th...@redhat.com>
---
 hw/s390x/s390-stattrib.c | 26 ++
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 220e845d12..52f9fc036e 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -13,6 +13,7 @@
 #include "qemu/units.h"
 #include "migration/qemu-file.h"
 #include "migration/register.h"
+#include "hw/qdev-properties.h"
 #include "hw/s390x/storage-attributes.h"
 #include "qemu/error-report.h"
 #include "exec/ram_addr.h"
@@ -340,6 +341,10 @@ static void s390_stattrib_realize(DeviceState *dev, Error 
**errp)
 }
 }
 
+static Property s390_stattrib_props[] = {
+DEFINE_PROP_BOOL("migration-enabled", S390StAttribState, 
migration_enabled, true),
+};
+
 static void s390_stattrib_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
@@ -347,22 +352,7 @@ static void s390_stattrib_class_init(ObjectClass *oc, void 
*data)
 dc->hotpluggable = false;
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 dc->realize = s390_stattrib_realize;
-}
-
-static inline bool s390_stattrib_get_migration_enabled(Object *obj,
-   Error **errp)
-{
-S390StAttribState *s = S390_STATTRIB(obj);
-
-return s->migration_enabled;
-}
-
-static inline void s390_stattrib_set_migration_enabled(Object *obj, bool value,
-Error **errp)
-{
-S390StAttribState *s = S390_STATTRIB(obj);
-
-s->migration_enabled = value;
+device_class_set_props(dc, s390_stattrib_props);
 }
 
 static SaveVMHandlers savevm_s390_stattrib_handlers = {
@@ -383,10 +373,6 @@ static void s390_stattrib_instance_init(Object *obj)
 register_savevm_live(TYPE_S390_STATTRIB, 0, 0,
  _s390_stattrib_handlers, sas);
 
-object_property_add_bool(obj, "migration-enabled",
- s390_stattrib_get_migration_enabled,
- s390_stattrib_set_migration_enabled);
-object_property_set_bool(obj, "migration-enabled", true, NULL);
 sas->migration_cur_gfn = 0;
 }
 
-- 
2.41.0




[PULL 01/39] migration/doc: Add contents

2023-10-24 Thread Juan Quintela
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231018112827.1325-2-quint...@redhat.com>
---
 docs/devel/migration.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index c3e1400c0c..4d6a98ae58 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -28,6 +28,8 @@ the guest to be stopped.  Typically the time that the guest is
 unresponsive during live migration is the low hundred of milliseconds
 (notice that this depends on a lot of things).
 
+.. contents::
+
 Transports
 ==
 
-- 
2.41.0




[PULL 11/39] migration: Simplify compress_page_with_multithread()

2023-10-24 Thread Juan Quintela
Move the goto to a while true.

Reviewed-by: Lukas Straub 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-6-quint...@redhat.com>
---
 migration/ram-compress.c | 48 
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index ef03d60a6d..a991b15b7a 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -271,35 +271,35 @@ bool compress_page_with_multi_thread(RAMBlock *block, 
ram_addr_t offset,
 
 thread_count = migrate_compress_threads();
 qemu_mutex_lock(_done_lock);
-retry:
-for (int i = 0; i < thread_count; i++) {
-if (comp_param[i].done) {
-CompressParam *param = _param[i];
-qemu_mutex_lock(>mutex);
-param->done = false;
-send_queued_data(param);
-assert(qemu_file_buffer_empty(param->file));
-compress_reset_result(param);
-set_compress_params(param, block, offset);
 
-qemu_cond_signal(>cond);
-qemu_mutex_unlock(>mutex);
+while (true) {
+for (int i = 0; i < thread_count; i++) {
+if (comp_param[i].done) {
+CompressParam *param = _param[i];
+qemu_mutex_lock(>mutex);
+param->done = false;
+send_queued_data(param);
+assert(qemu_file_buffer_empty(param->file));
+compress_reset_result(param);
+set_compress_params(param, block, offset);
+
+qemu_cond_signal(>cond);
+qemu_mutex_unlock(>mutex);
+qemu_mutex_unlock(_done_lock);
+return true;
+}
+}
+if (!wait) {
 qemu_mutex_unlock(_done_lock);
-return true;
+return false;
 }
-}
-
-/*
- * wait for the free thread if the user specifies 'compress-wait-thread',
- * otherwise we will post the page out in the main thread as normal page.
- */
-if (wait) {
+/*
+ * wait for a free thread if the user specifies
+ * 'compress-wait-thread', otherwise we will post the page out
+ * in the main thread as normal page.
+ */
 qemu_cond_wait(_done_cond, _done_lock);
-goto retry;
 }
-qemu_mutex_unlock(_done_lock);
-
-return false;
 }
 
 /* return the size after decompression, or negative value on error */
-- 
2.41.0




[PULL 00/39] Migration 20231024 patches

2023-10-24 Thread Juan Quintela
The following changes since commit a95260486aa7e78d7c7194eba65cf03311ad94ad:

  Merge tag 'pull-tcg-20231023' of https://gitlab.com/rth7680/qemu into staging 
(2023-10-23 14:45:46 -0700)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/migration-20231024-pull-request

for you to fetch changes up to 088f7f03da3f5b3487091302b795c22b1bfe56fb:

  migration: Deprecate old compression method (2023-10-24 13:48:24 +0200)


Migration Pull request (20231024)

Hi

In this PULL:
- vmstate registration fixes (thomas, juan)
- start merging vmstate_section_needed changes (marc)
- migration depreactions (juan)
- migration documentation for backwards compatibility (juan)

Please apply.



Juan Quintela (31):
  migration/doc: Add contents
  migration/doc: Add documentation for backwards compatiblity
  migration/doc: How to migrate when hosts have different features
  migration/doc: We broke backwards compatibility
  migration: Receiving a zero page non zero is an error
  migration: Rename ram_handle_compressed() to ram_handle_zero()
  migration: Give one error if trying to set MULTIFD and XBZRLE
  migration: Give one error if trying to set COMPRESSION and XBZRLE
  migration: Remove save_page_use_compression()
  migration: Make compress_data_with_multithreads return bool
  migration: Simplify compress_page_with_multithread()
  migration: Move busy++ to migrate_with_multithread
  migration: Create compress_update_rates()
  migration: Export send_queued_data()
  migration: Move ram_flush_compressed_data() to ram-compress.c
  migration: Merge flush_compressed_data() and compress_flush_data()
  migration: Rename ram_compressed_pages() to compress_ram_pages()
  migration: Create vmstate_register_any()
  migration: Use vmstate_register_any()
  migration: Use vmstate_register_any() for isa-ide
  migration: Use VMSTATE_INSTANCE_ID_ANY for slirp
  migration: Hack to maintain backwards compatibility for ppc
  migration: Improve example and documentation of vmstate_register()
  migration: Use vmstate_register_any() for audio
  migration: Use vmstate_register_any() for eeprom93xx
  migration: Use vmstate_register_any() for vmware_vga
  qemu-iotests: Filter warnings about block migration being deprecated
  migration: migrate 'inc' command option is deprecated.
  migration: migrate 'blk' command option is deprecated.
  migration: Deprecate block migration
  migration: Deprecate old compression method

Marc-André Lureau (2):
  migration: rename vmstate_save_needed->vmstate_section_needed
  migration: set file error on subsection loading

Peter Xu (1):
  migration: Check in savevm_state_handler_insert for dups

Thomas Huth (5):
  hw/ipmi: Don't call vmstate_register() from instance_init() functions
  hw/s390x/s390-skeys: Don't call register_savevm_live() during
instance_init()
  hw/s390x/s390-stattrib: Simplify handling of the "migration-enabled"
property
  hw/s390x/s390-stattrib: Don't call register_savevm_live() during
instance_init()
  migration/ram: Fix compilation with -Wshadow=local

 docs/about/deprecated.rst|  35 ++
 docs/devel/migration.rst | 532 ++-
 qapi/migration.json  |  93 --
 include/migration/vmstate.h  |  30 +-
 migration/ram-compress.h |  10 +-
 migration/ram.h  |   3 +-
 audio/audio.c|   2 +-
 backends/dbus-vmstate.c  |   3 +-
 backends/tpm/tpm_emulator.c  |   3 +-
 hw/display/vmware_vga.c  |   2 +-
 hw/i2c/core.c|   2 +-
 hw/ide/isa.c |   2 +-
 hw/input/adb.c   |   2 +-
 hw/input/ads7846.c   |   2 +-
 hw/input/stellaris_input.c   |   3 +-
 hw/intc/xics.c   |  18 +-
 hw/ipmi/ipmi_bmc_extern.c|  29 +-
 hw/ipmi/isa_ipmi_bt.c|  34 +-
 hw/ipmi/isa_ipmi_kcs.c   |  50 +--
 hw/net/eepro100.c|   3 +-
 hw/nvram/eeprom93xx.c|   2 +-
 hw/pci/pci.c |   2 +-
 hw/ppc/spapr.c   |  25 +-
 hw/ppc/spapr_nvdimm.c|   3 +-
 hw/s390x/s390-skeys.c|  35 +-
 hw/s390x/s390-stattrib.c |  71 ++---
 hw/timer/arm_timer.c |   2 +-
 hw/virtio/virtio-mem.c   |   4 +-
 migration/block.c|   3 +
 migration/migration-hmp-cmds.c   |  10 +
 migration/migration.c|  10 +
 migration/options.c  |  36 ++-
 migration/ram-compress.c | 112 +--
 migration/ram.c  | 114 ++-
 migration/rdma.c |   8 +-
 migration/savevm.c   |  34 +-
 migration/vmstate.c  |   5 +-
 net/slirp.c  |   5 +-
 tests/qemu-iotests/183   |   2 +-
 tests/qemu-iotests/common.filter |   7 +
 40 files changed, 1041 inser

[PULL 06/39] migration: Rename ram_handle_compressed() to ram_handle_zero()

2023-10-24 Thread Juan Quintela
Now that we know it only handles zero, we can remove the ch parameter.

Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Signed-off-by: Juan Quintela 
Message-ID: <20231019085259.13307-3-quint...@redhat.com>
---
 migration/ram.h  |  2 +-
 migration/ram.c  | 10 +-
 migration/rdma.c |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index 145c915ca7..3f724b2f02 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -60,7 +60,7 @@ int ram_discard_range(const char *block_name, uint64_t start, 
size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 int ram_load_postcopy(QEMUFile *f, int channel);
 
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_handle_zero(void *host, uint64_t size);
 
 void ram_transferred_add(uint64_t bytes);
 void ram_release_page(const char *rbname, uint64_t offset);
diff --git a/migration/ram.c b/migration/ram.c
index 4bfb20c94a..e0ad732ee8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3446,7 +3446,7 @@ static inline void *colo_cache_from_block_offset(RAMBlock 
*block,
 }
 
 /**
- * ram_handle_compressed: handle the zero page case
+ * ram_handle_zero: handle the zero page case
  *
  * If a page (or a whole RDMA chunk) has been
  * determined to be zero, then zap it.
@@ -3455,10 +3455,10 @@ static inline void 
*colo_cache_from_block_offset(RAMBlock *block,
  * @ch: what the page is filled from.  We only support zero
  * @size: size of the zero page
  */
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
+void ram_handle_zero(void *host, uint64_t size)
 {
-if (ch != 0 || !buffer_is_zero(host, size)) {
-memset(host, ch, size);
+if (!buffer_is_zero(host, size)) {
+memset(host, 0, size);
 }
 }
 
@@ -4037,7 +4037,7 @@ static int ram_load_precopy(QEMUFile *f)
 ret = -EINVAL;
 break;
 }
-ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+ram_handle_zero(host, TARGET_PAGE_SIZE);
 break;
 
 case RAM_SAVE_FLAG_PAGE:
diff --git a/migration/rdma.c b/migration/rdma.c
index 2d963fd147..e3493e3b3e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3597,7 +3597,7 @@ int rdma_registration_handle(QEMUFile *f)
  comp->value);
 goto err;
 }
-ram_handle_compressed(host_addr, comp->value, comp->length);
+ram_handle_zero(host_addr, comp->length);
 break;
 
 case RDMA_CONTROL_REGISTER_FINISHED:
-- 
2.41.0




[PULL 10/39] migration: Make compress_data_with_multithreads return bool

2023-10-24 Thread Juan Quintela
Reviewed-by: Lukas Straub 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-5-quint...@redhat.com>
---
 migration/ram-compress.h |  4 ++--
 migration/ram-compress.c | 17 ++---
 migration/ram.c  |  3 +--
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/migration/ram-compress.h b/migration/ram-compress.h
index e55d3b50bd..b228640092 100644
--- a/migration/ram-compress.h
+++ b/migration/ram-compress.h
@@ -60,8 +60,8 @@ void compress_threads_save_cleanup(void);
 int compress_threads_save_setup(void);
 
 void flush_compressed_data(int (send_queued_data(CompressParam *)));
-int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
-int (send_queued_data(CompressParam *)));
+bool compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
+  int (send_queued_data(CompressParam *)));
 
 int wait_for_decompress_done(void);
 void compress_threads_load_cleanup(void);
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index d037dfe6cf..ef03d60a6d 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -260,10 +260,13 @@ static inline void set_compress_params(CompressParam 
*param, RAMBlock *block,
 param->trigger = true;
 }
 
-int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
-int (send_queued_data(CompressParam *)))
+/*
+ * Return true when it compress a page
+ */
+bool compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
+ int (send_queued_data(CompressParam *)))
 {
-int  thread_count, pages = -1;
+int thread_count;
 bool wait = migrate_compress_wait_thread();
 
 thread_count = migrate_compress_threads();
@@ -281,8 +284,8 @@ retry:
 
 qemu_cond_signal(>cond);
 qemu_mutex_unlock(>mutex);
-pages = 1;
-break;
+qemu_mutex_unlock(_done_lock);
+return true;
 }
 }
 
@@ -290,13 +293,13 @@ retry:
  * wait for the free thread if the user specifies 'compress-wait-thread',
  * otherwise we will post the page out in the main thread as normal page.
  */
-if (pages < 0 && wait) {
+if (wait) {
 qemu_cond_wait(_done_cond, _done_lock);
 goto retry;
 }
 qemu_mutex_unlock(_done_lock);
 
-return pages;
+return false;
 }
 
 /* return the size after decompression, or negative value on error */
diff --git a/migration/ram.c b/migration/ram.c
index 8246663f64..63a575ae90 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2067,8 +2067,7 @@ static bool save_compress_page(RAMState *rs, 
PageSearchStatus *pss,
 return false;
 }
 
-if (compress_page_with_multi_thread(pss->block, offset,
-send_queued_data) > 0) {
+if (compress_page_with_multi_thread(pss->block, offset, send_queued_data)) 
{
 return true;
 }
 
-- 
2.41.0




[PULL 03/39] migration/doc: How to migrate when hosts have different features

2023-10-24 Thread Juan Quintela
Sometimes devices have different features depending of things outside
of qemu.  For instance the kernel.  Document how to handle that cases.

Acked-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231018112827.1325-4-quint...@redhat.com>
---
 docs/devel/migration.rst | 97 
 1 file changed, 97 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 6fe275b1ec..974505e4a7 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -1138,3 +1138,100 @@ machine types to have the right value::
 +{ "virtio-blk-device", "num-queues", "1"},
  ...
  };
+
+A device with diferent features on both sides
+-
+
+Let's assume that we are using the same QEMU binary on both sides,
+just to make the things easier.  But we have a device that has
+different features on both sides of the migration.  That can be
+because the devices are different, because the kernel driver of both
+devices have different features, whatever.
+
+How can we get this to work with migration.  The way to do that is
+"theoretically" easy.  You have to get the features that the device
+has in the source of the migration.  The features that the device has
+on the target of the migration, you get the intersection of the
+features of both sides, and that is the way that you should launch
+QEMU.
+
+Notice that this is not completely related to QEMU.  The most
+important thing here is that this should be handled by the managing
+application that launches QEMU.  If QEMU is configured correctly, the
+migration will succeed.
+
+That said, actually doing it is complicated.  Almost all devices are
+bad at being able to be launched with only some features enabled.
+With one big exception: cpus.
+
+You can read the documentation for QEMU x86 cpu models here:
+
+https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html
+
+See when they talk about migration they recommend that one chooses the
+newest cpu model that is supported for all cpus.
+
+Let's say that we have:
+
+Host A:
+
+Device X has the feature Y
+
+Host B:
+
+Device X has not the feature Y
+
+If we try to migrate without any care from host A to host B, it will
+fail because when migration tries to load the feature Y on
+destination, it will find that the hardware is not there.
+
+Doing this would be the equivalent of doing with cpus:
+
+Host A:
+
+$ qemu-system-x86_64 -cpu host
+
+Host B:
+
+$ qemu-system-x86_64 -cpu host
+
+When both hosts have different cpu features this is guaranteed to
+fail.  Especially if Host B has less features than host A.  If host A
+has less features than host B, sometimes it works.  Important word of
+last sentence is "sometimes".
+
+So, forgetting about cpu models and continuing with the -cpu host
+example, let's see that the differences of the cpus is that Host A and
+B have the following features:
+
+Features:   'pcid'  'stibp' 'taa-no'
+Host A:X   X
+Host B:X
+
+And we want to migrate between them, the way configure both QEMU cpu
+will be:
+
+Host A:
+
+$ qemu-system-x86_64 -cpu host,pcid=off,stibp=off
+
+Host B:
+
+$ qemu-system-x86_64 -cpu host,taa-no=off
+
+And you would be able to migrate between them.  It is responsability
+of the management application or of the user to make sure that the
+configuration is correct.  QEMU doesn't know how to look at this kind
+of features in general.
+
+Notice that we don't recomend to use -cpu host for migration.  It is
+used in this example because it makes the example simpler.
+
+Other devices have worse control about individual features.  If they
+want to be able to migrate between hosts that show different features,
+the device needs a way to configure which ones it is going to use.
+
+In this section we have considered that we are using the same QEMU
+binary in both sides of the migration.  If we use different QEMU
+versions process, then we need to have into account all other
+differences and the examples become even more complicated.
-- 
2.41.0




[PULL 13/39] migration: Create compress_update_rates()

2023-10-24 Thread Juan Quintela
So we can move more compression_counters stuff to ram-compress.c.
Create compression_counters struct to add the stuff that was on
MigrationState.

Reviewed-by: Lukas Straub 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-8-quint...@redhat.com>
---
 migration/ram-compress.h |  1 +
 migration/ram.h  |  1 -
 migration/ram-compress.c | 42 +++-
 migration/ram.c  | 29 +--
 4 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/migration/ram-compress.h b/migration/ram-compress.h
index b228640092..76dacd3ec7 100644
--- a/migration/ram-compress.h
+++ b/migration/ram-compress.h
@@ -71,5 +71,6 @@ void decompress_data_with_multi_threads(QEMUFile *f, void 
*host, int len);
 void populate_compress(MigrationInfo *info);
 uint64_t ram_compressed_pages(void);
 void update_compress_thread_counts(const CompressParam *param, int bytes_xmit);
+void compress_update_rates(uint64_t page_count);
 
 #endif
diff --git a/migration/ram.h b/migration/ram.h
index 3f724b2f02..9f3ad1ee81 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -34,7 +34,6 @@
 #include "io/channel.h"
 
 extern XBZRLECacheStats xbzrle_counters;
-extern CompressionStats compression_counters;
 
 /* Should be holding either ram_list.mutex, or the RCU lock. */
 #define RAMBLOCK_FOREACH_NOT_IGNORED(block)\
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index f56e1f8e69..af42cab0fe 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -41,7 +41,20 @@
 #include "ram.h"
 #include "migration-stats.h"
 
-CompressionStats compression_counters;
+static struct {
+int64_t pages;
+int64_t busy;
+double busy_rate;
+int64_t compressed_size;
+double compression_rate;
+/* compression statistics since the beginning of the period */
+/* amount of count that no free thread to compress data */
+uint64_t compress_thread_busy_prev;
+/* amount bytes after compression */
+uint64_t compressed_size_prev;
+/* amount of compressed pages */
+uint64_t compress_pages_prev;
+} compression_counters;
 
 static CompressParam *comp_param;
 static QemuThread *compress_threads;
@@ -518,3 +531,30 @@ void update_compress_thread_counts(const CompressParam 
*param, int bytes_xmit)
 compression_counters.pages++;
 }
 
+void compress_update_rates(uint64_t page_count)
+{
+if (!migrate_compress()) {
+return;
+}
+compression_counters.busy_rate = (double)(compression_counters.busy -
+compression_counters.compress_thread_busy_prev) / page_count;
+compression_counters.compress_thread_busy_prev =
+compression_counters.busy;
+
+double compressed_size = compression_counters.compressed_size -
+compression_counters.compressed_size_prev;
+if (compressed_size) {
+double uncompressed_size = (compression_counters.pages -
+compression_counters.compress_pages_prev) *
+qemu_target_page_size();
+
+/* Compression-Ratio = Uncompressed-size / Compressed-size */
+compression_counters.compression_rate =
+uncompressed_size / compressed_size;
+
+compression_counters.compress_pages_prev =
+compression_counters.pages;
+compression_counters.compressed_size_prev =
+compression_counters.compressed_size;
+}
+}
diff --git a/migration/ram.c b/migration/ram.c
index 46209388ec..f7daf2226e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -369,13 +369,6 @@ struct RAMState {
 bool xbzrle_started;
 /* Are we on the last stage of migration */
 bool last_stage;
-/* compression statistics since the beginning of the period */
-/* amount of count that no free thread to compress data */
-uint64_t compress_thread_busy_prev;
-/* amount bytes after compression */
-uint64_t compressed_size_prev;
-/* amount of compressed pages */
-uint64_t compress_pages_prev;
 
 /* total handled target pages at the beginning of period */
 uint64_t target_page_count_prev;
@@ -945,7 +938,6 @@ uint64_t ram_get_total_transferred_pages(void)
 static void migration_update_rates(RAMState *rs, int64_t end_time)
 {
 uint64_t page_count = rs->target_page_count - rs->target_page_count_prev;
-double compressed_size;
 
 /* calculate period counters */
 stat64_set(_stats.dirty_pages_rate,
@@ -973,26 +965,7 @@ static void migration_update_rates(RAMState *rs, int64_t 
end_time)
 rs->xbzrle_pages_prev = xbzrle_counters.pages;
 rs->xbzrle_bytes_prev = xbzrle_counters.bytes;
 }
-
-if (migrate_compress()) {
-compression_counters.busy_rate = (double)(compression_counters.busy -
-rs->compress_thread_busy_prev) / page_count;
-rs->compress_thread_busy_prev = compression_counters.busy;
-
-compressed_size = compression_counters.compressed_size -
-  

[PULL 08/39] migration: Give one error if trying to set COMPRESSION and XBZRLE

2023-10-24 Thread Juan Quintela
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231019110724.15324-3-quint...@redhat.com>
---
 migration/options.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index b8c3c3218d..37fa1cfe74 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -625,6 +625,13 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 }
 }
 
+if (new_caps[MIGRATION_CAPABILITY_COMPRESS]) {
+if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
+error_setg(errp, "Compression is not compatible with xbzrle");
+return false;
+}
+}
+
 return true;
 }
 
-- 
2.41.0




[PULL 04/39] migration/doc: We broke backwards compatibility

2023-10-24 Thread Juan Quintela
When we detect that we have broken backwards compatibility in a
released version, we can't do anything for that version.  But once we
fix that bug on the next released version, we can "mitigate" that
problem when migrating to new versions to give a way out of that
machine until it does a hard reboot.

Acked-by: Peter Xu 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231018112827.1325-5-quint...@redhat.com>
---
 docs/devel/migration.rst | 202 +++
 1 file changed, 202 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 974505e4a7..be913630c3 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -1235,3 +1235,205 @@ In this section we have considered that we are using 
the same QEMU
 binary in both sides of the migration.  If we use different QEMU
 versions process, then we need to have into account all other
 differences and the examples become even more complicated.
+
+How to mitigate when we have a backward compatibility error
+---
+
+We broke migration for old machine types continuously during
+development.  But as soon as we find that there is a problem, we fix
+it.  The problem is what happens when we detect after we have done a
+release that something has gone wrong.
+
+Let see how it worked with one example.
+
+After the release of qemu-8.0 we found a problem when doing migration
+of the machine type pc-7.2.
+
+- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
+
+  This migration works
+
+- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
+
+  This migration works
+
+- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
+
+  This migration fails
+
+- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
+
+  This migration fails
+
+So clearly something fails when migration between qemu-7.2 and
+qemu-8.0 with machine type pc-7.2.  The error messages, and git bisect
+pointed to this commit.
+
+In qemu-8.0 we got this commit::
+
+commit 010746ae1db7f52700cb2e2c46eb94f299cfa0d2
+Author: Jonathan Cameron 
+Date:   Thu Mar 2 13:37:02 2023 +
+
+hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
+
+
+The relevant bits of the commit for our example are this ones::
+
+--- a/hw/pci/pcie_aer.c
++++ b/hw/pci/pcie_aer.c
+@@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev,
+
+ pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
+  PCI_ERR_UNC_SUPPORTED);
++pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
++ PCI_ERR_UNC_MASK_DEFAULT);
++pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
++ PCI_ERR_UNC_SUPPORTED);
+
+ pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
+ PCI_ERR_UNC_SEVERITY_DEFAULT);
+
+The patch changes how we configure PCI space for AER.  But QEMU fails
+when the PCI space configuration is different between source and
+destination.
+
+The following commit shows how this got fixed::
+
+commit 5ed3dabe57dd9f4c007404345e5f5bf0e347317f
+Author: Leonardo Bras 
+Date:   Tue May 2 21:27:02 2023 -0300
+
+hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0
+
+[...]
+
+The relevant parts of the fix in QEMU are as follow:
+
+First, we create a new property for the device to be able to configure
+the old behaviour or the new behaviour::
+
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+index 8a87ccc8b0..5153ad63d6 100644
+--- a/hw/pci/pci.c
++++ b/hw/pci/pci.c
+@@ -79,6 +79,8 @@ static Property pci_props[] = {
+ DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
+failover_pair_id),
+ DEFINE_PROP_UINT32("acpi-index",  PCIDevice, acpi_index, 0),
++DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present,
++QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
+ DEFINE_PROP_END_OF_LIST()
+ };
+
+Notice that we enable the feature for new machine types.
+
+Now we see how the fix is done.  This is going to depend on what kind
+of breakage happens, but in this case it is quite simple::
+
+diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
+index 103667c368..374d593ead 100644
+--- a/hw/pci/pcie_aer.c
++++ b/hw/pci/pcie_aer.c
+@@ -112,10 +112,13 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver,
+uint16_t offset,
+
+ pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
+  PCI_ERR_UNC_SUPPORTED);
+-pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
+- PCI_ERR_UNC_MASK_DEFAULT);
+-pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+- PCI_ERR_UNC_SUPPORTED);
++
++if (dev->cap_present & QEMU_PCIE_ERR_UNC_MASK) {
++pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
++ PCI_ERR_UNC_MASK_DEFAULT);
++

[PULL 02/39] migration/doc: Add documentation for backwards compatiblity

2023-10-24 Thread Juan Quintela
State what are the requeriments to get migration working between qemu
versions.  And once there explain how one is supposed to implement a
new feature/default value and not break migration.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Acked-by: Peter Xu 
Signed-off-by: Juan Quintela 
Message-ID: <20231018112827.1325-3-quint...@redhat.com>
---
 docs/devel/migration.rst | 219 +++
 1 file changed, 219 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 4d6a98ae58..6fe275b1ec 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -919,3 +919,222 @@ versioned machine types to cut down on the combinations 
that will need
 support.  This is also useful when newer versions of firmware outgrow
 the padding.
 
+
+Backwards compatibility
+===
+
+How backwards compatibility works
+-
+
+When we do migration, we have two QEMU processes: the source and the
+target.  There are two cases, they are the same version or they are
+different versions.  The easy case is when they are the same version.
+The difficult one is when they are different versions.
+
+There are two things that are different, but they have very similar
+names and sometimes get confused:
+
+- QEMU version
+- machine type version
+
+Let's start with a practical example, we start with:
+
+- qemu-system-x86_64 (v5.2), from now on qemu-5.2.
+- qemu-system-x86_64 (v5.1), from now on qemu-5.1.
+
+Related to this are the "latest" machine types defined on each of
+them:
+
+- pc-q35-5.2 (newer one in qemu-5.2) from now on pc-5.2
+- pc-q35-5.1 (newer one in qemu-5.1) from now on pc-5.1
+
+First of all, migration is only supposed to work if you use the same
+machine type in both source and destination. The QEMU hardware
+configuration needs to be the same also on source and destination.
+Most aspects of the backend configuration can be changed at will,
+except for a few cases where the backend features influence frontend
+device feature exposure.  But that is not relevant for this section.
+
+I am going to list the number of combinations that we can have.  Let's
+start with the trivial ones, QEMU is the same on source and
+destination:
+
+1 - qemu-5.2 -M pc-5.2  -> migrates to -> qemu-5.2 -M pc-5.2
+
+  This is the latest QEMU with the latest machine type.
+  This have to work, and if it doesn't work it is a bug.
+
+2 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1
+
+  Exactly the same case than the previous one, but for 5.1.
+  Nothing to see here either.
+
+This are the easiest ones, we will not talk more about them in this
+section.
+
+Now we start with the more interesting cases.  Consider the case where
+we have the same QEMU version in both sides (qemu-5.2) but we are using
+the latest machine type for that version (pc-5.2) but one of an older
+QEMU version, in this case pc-5.1.
+
+3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1
+
+  It needs to use the definition of pc-5.1 and the devices as they
+  were configured on 5.1, but this should be easy in the sense that
+  both sides are the same QEMU and both sides have exactly the same
+  idea of what the pc-5.1 machine is.
+
+4 - qemu-5.1 -M pc-5.2  -> migrates to -> qemu-5.1 -M pc-5.2
+
+  This combination is not possible as the qemu-5.1 doen't understand
+  pc-5.2 machine type.  So nothing to worry here.
+
+Now it comes the interesting ones, when both QEMU processes are
+different.  Notice also that the machine type needs to be pc-5.1,
+because we have the limitation than qemu-5.1 doesn't know pc-5.2.  So
+the possible cases are:
+
+5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1
+
+  This migration is known as newer to older.  We need to make sure
+  when we are developing 5.2 we need to take care about not to break
+  migration to qemu-5.1.  Notice that we can't make updates to
+  qemu-5.1 to understand whatever qemu-5.2 decides to change, so it is
+  in qemu-5.2 side to make the relevant changes.
+
+6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1
+
+  This migration is known as older to newer.  We need to make sure
+  than we are able to receive migrations from qemu-5.1. The problem is
+  similar to the previous one.
+
+If qemu-5.1 and qemu-5.2 were the same, there will not be any
+compatibility problems.  But the reason that we create qemu-5.2 is to
+get new features, devices, defaults, etc.
+
+If we get a device that has a new feature, or change a default value,
+we have a problem when we try to migrate between different QEMU
+versions.
+
+So we need a way to tell qemu-5.2 that when we are using machine type
+pc-5.1, it needs to **not** use the feature, to be able to migrate to
+real qemu-5.1.
+
+And the equivalent part when migrating from qemu-5.1 to qemu-5.2.
+qemu-5.2 has to expect that it is not going to get data for the new
+feature, because qemu-5.1 doesn't know about it.
+
+How do we tell QEMU about these device 

Re: [PATCH 07/12] hw/xen: update Xen console to XenDevice model

2023-10-24 Thread Paul Durrant

On 16/10/2023 16:19, David Woodhouse wrote:

This allows (non-primary) console devices to be created on the command
line.

Signed-off-by: David Woodhouse 
---
  hw/char/trace-events|   8 +
  hw/char/xen_console.c   | 502 +++-
  hw/xen/xen-legacy-backend.c |   1 -
  3 files changed, 381 insertions(+), 130 deletions(-)

diff --git a/hw/char/trace-events b/hw/char/trace-events
index babf4d35ea..7a398c82a5 100644
--- a/hw/char/trace-events
+++ b/hw/char/trace-events
@@ -105,3 +105,11 @@ cadence_uart_baudrate(unsigned baudrate) "baudrate %u"
  # sh_serial.c
  sh_serial_read(char *id, unsigned size, uint64_t offs, uint64_t val) " %s size %d offs 
0x%02" PRIx64 " -> 0x%02" PRIx64
  sh_serial_write(char *id, unsigned size, uint64_t offs, uint64_t val) "%s size %d offs 
0x%02" PRIx64 " <- 0x%02" PRIx64
+
+# xen_console.c
+xen_console_connect(unsigned int idx, unsigned int ring_ref, unsigned int port, unsigned 
int limit) "idx %u ring_ref %u port %u limit %u"
+xen_console_disconnect(unsigned int idx) "idx %u"
+xen_console_unrealize(unsigned int idx) "idx %u"
+xen_console_realize(unsigned int idx, const char *chrdev) "idx %u chrdev %s"
+xen_console_device_create(unsigned int idx) "idx %u"
+xen_console_device_destroy(unsigned int idx) "idx %u"
diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index 810dae3f44..bd20be116c 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -20,15 +20,19 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/cutils.h"
  #include 
  #include 
  
  #include "qapi/error.h"

  #include "sysemu/sysemu.h"
  #include "chardev/char-fe.h"
-#include "hw/xen/xen-legacy-backend.h"
-
+#include "hw/xen/xen-backend.h"
+#include "hw/xen/xen-bus-helper.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
  #include "hw/xen/interface/io/console.h"
+#include "trace.h"
  
  struct buffer {

  uint8_t *data;
@@ -39,16 +43,22 @@ struct buffer {
  };
  
  struct XenConsole {

-struct XenLegacyDevice  xendev;  /* must be first */
+struct XenDevice  xendev;  /* must be first */
+XenEventChannel   *event_channel;
+int   dev;
  struct buffer buffer;
-char  console[XEN_BUFSIZE];
-int   ring_ref;
+char  *fe_path;
+unsigned int  ring_ref;
  void  *sring;
  CharBackend   chr;
  int   backlog;
  };
+typedef struct XenConsole XenConsole;
+
+#define TYPE_XEN_CONSOLE_DEVICE "xen-console"
+OBJECT_DECLARE_SIMPLE_TYPE(XenConsole, XEN_CONSOLE_DEVICE)
  
-static void buffer_append(struct XenConsole *con)

+static bool buffer_append(XenConsole *con)
  {
  struct buffer *buffer = >buffer;
  XENCONS_RING_IDX cons, prod, size;
@@ -60,7 +70,7 @@ static void buffer_append(struct XenConsole *con)
  
  size = prod - cons;

  if ((size == 0) || (size > sizeof(intf->out)))
-return;
+return false;
  
  if ((buffer->capacity - buffer->size) < size) {

  buffer->capacity += (size + 1024);
@@ -73,7 +83,7 @@ static void buffer_append(struct XenConsole *con)
  
  xen_mb();

  intf->out_cons = cons;
-xen_pv_send_notify(>xendev);
+xen_device_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL);
  
  if (buffer->max_capacity &&

  buffer->size > buffer->max_capacity) {
@@ -89,6 +99,7 @@ static void buffer_append(struct XenConsole *con)
  if (buffer->consumed > buffer->max_capacity - over)
  buffer->consumed = buffer->max_capacity - over;
  }
+return true;
  }
  
  static void buffer_advance(struct buffer *buffer, size_t len)

@@ -100,7 +111,7 @@ static void buffer_advance(struct buffer *buffer, size_t 
len)
  }
  }
  
-static int ring_free_bytes(struct XenConsole *con)

+static int ring_free_bytes(XenConsole *con)
  {
  struct xencons_interface *intf = con->sring;
  XENCONS_RING_IDX cons, prod, space;
@@ -118,13 +129,13 @@ static int ring_free_bytes(struct XenConsole *con)
  
  static int xencons_can_receive(void *opaque)

  {
-struct XenConsole *con = opaque;
+XenConsole *con = opaque;
  return ring_free_bytes(con);
  }
  
  static void xencons_receive(void *opaque, const uint8_t *buf, int len)

  {
-struct XenConsole *con = opaque;
+XenConsole *con = opaque;
  struct xencons_interface *intf = con->sring;
  XENCONS_RING_IDX prod;
  int i, max;
@@ -141,10 +152,10 @@ static void xencons_receive(void *opaque, const uint8_t 
*buf, int len)
  }
  xen_wmb();
  intf->in_prod = prod;
-xen_pv_send_notify(>xendev);
+xen_device_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL);
  }
  
-static void xencons_send(struct XenConsole *con)

+static bool xencons_send(XenConsole *con)
  {
  ssize_t len, size;
  
@@ -159,174 +170,407 @@ static void xencons_send(struct XenConsole *con)

  if (len < 1) {
  if (!con->backlog) {
  

Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-24 Thread Paul Durrant

On 24/10/2023 13:56, David Woodhouse wrote:

On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote:



--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, 
Error **errp)
    {
    ERRP_GUARD();
    XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+    XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);

-    xendev->frontend_path = xen_device_get_frontend_path(xendev);

+    if (xendev_class->get_frontend_path) {
+    xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp);
+    if (!xendev->frontend_path) {
+    return;


I think you need to update errp here to note that you are failing to
create the frontend.


If xendev_class->get_frontend_path returned NULL it will have filled in errp.



Ok, but a prepend to say that a lack of path there means we skip 
frontend creation seems reasonable?



As a general rule (I'll be doing a bombing run on xen-bus once I get my
patch queue down into single digits) we should never check 'if (*errp)'
to check if a function had an error. It should *also* return a success
or failure indication, and we should cope with errp being NULL.



I'm pretty sure someone told me the exact opposite a few years back.

  Paul



Re: [PATCH 01/12] i386/xen: fix per-vCPU upcall vector for Xen emulation

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 13:16 +0100, Paul Durrant wrote:
> On 16/10/2023 16:18, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > The per-vCPU upcall vector support had two problems. Firstly it was
> > using the wrong hypercall argument and would always return -EFAULT.
> > And secondly it was using the wrong ioctl() to pass the vector to
> > the kernel and thus the *kernel* would always return -EINVAL.
> > 
> > Linux doesn't (yet) use this mode so it went without decent testing
> > for a while.
> > 
> > Fixes: 105b47fdf2d0 ("i386/xen: implement
> > HVMOP_set_evtchn_upcall_vector")
> > Signed-off-by: David Woodhouse 
> > ---
> >   target/i386/kvm/xen-emu.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> Reviewed-by: Paul Durrant 

FWIW this patch gained a third "brown paper bag" fix this morning, when
I finally worked it out:

@@ -440,7 +440,8 @@ void kvm_xen_inject_vcpu_callback_vector(uint32_t
vcpu_id, int type)
  * deliver it as an MSI.
  */
 MSIMessage msg = {
-.address = APIC_DEFAULT_ADDRESS | X86_CPU(cs)->apic_id,
+.address = APIC_DEFAULT_ADDRESS |
+   (X86_CPU(cs)->apic_id << MSI_ADDR_DEST_ID_SHIFT),
 .data = vector | (1UL << MSI_DATA_LEVEL_SHIFT),
 };
 kvm_irqchip_send_msi(kvm_state, msg);



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote:
> 
> > --- a/hw/xen/xen-bus.c
> > +++ b/hw/xen/xen-bus.c
> > @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice 
> > *xendev, Error **errp)
> >    {
> >    ERRP_GUARD();
> >    XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
> > +    XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
> >    
> > -    xendev->frontend_path = xen_device_get_frontend_path(xendev);
> > +    if (xendev_class->get_frontend_path) {
> > +    xendev->frontend_path = xendev_class->get_frontend_path(xendev, 
> > errp);
> > +    if (!xendev->frontend_path) {
> > +    return;
> 
> I think you need to update errp here to note that you are failing to 
> create the frontend.

If xendev_class->get_frontend_path returned NULL it will have filled in errp.

As a general rule (I'll be doing a bombing run on xen-bus once I get my
patch queue down into single digits) we should never check 'if (*errp)'
to check if a function had an error. It should *also* return a success
or failure indication, and we should cope with errp being NULL.




smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 05/12] hw/xen: populate store frontend nodes with XenStore PFN/port

2023-10-24 Thread David Woodhouse
On Tue, 2023-10-24 at 13:35 +0100, Paul Durrant wrote:
> On 16/10/2023 16:19, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > This is kind of redundant since without being able to get these
> > through
> > ome other method (HVMOP_get_param) the guest wouldn't be able to
> > access
> 
> ^ typo
> 
> > XenStore in order to find them. But Xen populates them, and it does
> > allow guests to *rebind* to the event channel port after a reset.
> > 
> 
> ... although this can also be done by querying the remote end of the 
> port before reset.
> 
I think you had to do that anyway; I don't think I was supposed to be
putting s->be_port in there anyway; it was supposed to be the
*frontend* port, and I've changed that in my tree.

I'll drop this whole sentence (and fix the typo).

> > Signed-off-by: David Woodhouse 
> > ---
> >   hw/i386/kvm/xen_xenstore.c | 10 ++
> >   1 file changed, 10 insertions(+)
> > 
> 
> Reviewed-by: Paul Durrant 
> 



smime.p7s
Description: S/MIME cryptographic signature


  1   2   >