Re: [PATCH v2 3/3] docs/about: Add the missing release record in the subject
On 23/08/2021 05.00, Yanan Wang wrote: Commit 29e0447551 (docs/about/removed-features: Document removed CLI options from QEMU v3.1) has recorded some CLI options as replaced/removed from QEMU v3.1, but one of the subjects has missed the release record. Let's fix it. Reported-by: Cornelia Huck Signed-off-by: Yanan Wang --- docs/about/removed-features.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst index 1c926a8bc1..8feeead449 100644 --- a/docs/about/removed-features.rst +++ b/docs/about/removed-features.rst @@ -140,8 +140,8 @@ Use ``-rtc driftfix=slew`` instead. Replaced by ``-rtc base=date``. -``-vnc ...,tls=...``, ``-vnc ...,x509=...`` & ``-vnc ...,x509verify=...`` -' +``-vnc ...,tls=...``, ``-vnc ...,x509=...`` & ``-vnc ...,x509verify=...`` (removed in 3.1) +'' Reviewed-by: Thomas Huth
Re: [PATCH v2 2/3] docs/about: Unify the subject format
On 23/08/2021 05.00, Yanan Wang wrote: There is a mixture of "since/removed in X.Y" vs "since/removed in X.Y.Z" in the subjects in deprecated.rst/removed-features.rst. It will be better to use an unified format. It seems unlikely that we will ever deprecate something in a stable release, and even more unlikely that we'll remove something in one, so the short versions look like the thing we want to standardize on. So here we unify the subject format in deprecated.rst to "since X.Y", and unify the subject format in removed-features.rst to "removed in X.Y". Signed-off-by: Yanan Wang Reviewed-by: Cornelia Huck --- docs/about/deprecated.rst | 56 - docs/about/removed-features.rst | 28 - 2 files changed, 42 insertions(+), 42 deletions(-) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index 6d438f1c8d..8d4fd384a5 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -107,8 +107,8 @@ the process listing. This is replaced by the new ``password-secret`` option which lets the password be securely provided on the command line using a ``secret`` object instance. -``opened`` property of ``rng-*`` objects (since 6.0.0) -'' +``opened`` property of ``rng-*`` objects (since 6.0) + The only effect of specifying ``opened=on`` in the command line or QMP ``object-add`` is that the device is opened immediately, possibly before all @@ -116,8 +116,8 @@ other options have been processed. This will either have no effect (if ``opened`` was the last option) or cause errors. The property is therefore useless and should not be specified. -``loaded`` property of ``secret`` and ``secret_keyring`` objects (since 6.0.0) -'' +``loaded`` property of ``secret`` and ``secret_keyring`` objects (since 6.0) + The only effect of specifying ``loaded=on`` in the command line or QMP ``object-add`` is that the secret is loaded immediately, possibly before all @@ -142,33 +142,33 @@ should be used instead. QEMU Machine Protocol (QMP) commands -``blockdev-open-tray``, ``blockdev-close-tray`` argument ``device`` (since 2.8.0) -' +``blockdev-open-tray``, ``blockdev-close-tray`` argument ``device`` (since 2.8) +''' Use argument ``id`` instead. -``eject`` argument ``device`` (since 2.8.0) -''' +``eject`` argument ``device`` (since 2.8) +' Use argument ``id`` instead. -``blockdev-change-medium`` argument ``device`` (since 2.8.0) - +``blockdev-change-medium`` argument ``device`` (since 2.8) +'' Use argument ``id`` instead. -``block_set_io_throttle`` argument ``device`` (since 2.8.0) -''' +``block_set_io_throttle`` argument ``device`` (since 2.8) +' Use argument ``id`` instead. -``blockdev-add`` empty string argument ``backing`` (since 2.10.0) -' +``blockdev-add`` empty string argument ``backing`` (since 2.10) +''' Use argument value ``null`` instead. -``block-commit`` arguments ``base`` and ``top`` (since 3.1.0) -' +``block-commit`` arguments ``base`` and ``top`` (since 3.1) +''' Use arguments ``base-node`` and ``top-node`` instead. @@ -191,8 +191,8 @@ from Linux upstream kernel, declare it deprecated. System emulator CPUS -``Icelake-Client`` CPU Model (since 5.2.0) -'' +``Icelake-Client`` CPU Model (since 5.2) + ``Icelake-Client`` CPU Models are deprecated. Use ``Icelake-Server`` CPU Models instead. @@ -245,8 +245,8 @@ Device options Emulated device options ''' -``-device virtio-blk,scsi=on|off`` (since 5.0.0) - +``-device virtio-blk,scsi=on|off`` (since 5.0) +^^ The virtio-blk SCSI passthrough feature is a legacy VIRTIO feature. VIRTIO 1.0 and later do not support it because the virtio-scsi device was introduced for @@ -258,14 +258,14 @@
Re: [PATCH v2 1/3] docs/about: Remove the duplicated doc
On 23/08/2021 05.00, Yanan Wang wrote: There are two places describing the same thing about deprecation of invalid topologies of -smp CLI, so remove the duplicated one. Signed-off-by: Yanan Wang Reviewed-by: Cornelia Huck --- docs/about/removed-features.rst | 13 - 1 file changed, 13 deletions(-) diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst index cbfa1a8e31..6a9c5bb484 100644 --- a/docs/about/removed-features.rst +++ b/docs/about/removed-features.rst @@ -647,19 +647,6 @@ as ignored. Currently, users are responsible for making sure the backing storage specified with ``-mem-path`` can actually provide the guest RAM configured with ``-m`` and QEMU fails to start up if RAM allocation is unsuccessful. -``-smp`` (invalid topologies) (removed 5.2) -''' - -CPU topology properties should describe whole machine topology including -possible CPUs. - -However, historically it was possible to start QEMU with an incorrect topology -where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, -which could lead to an incorrect topology enumeration by the guest. -Support for invalid topologies is removed, the user must ensure -topologies described with -smp include all possible cpus, i.e. -*sockets* * *cores* * *threads* = *maxcpus*. - ``-machine enforce-config-section=on|off`` (removed 5.2) Reviewed-by: Thomas Huth
[PATCH] hw/arm/smmuv3: Simplify range invalidation
Simplify range invalidation which can avoid to iterate over all iotlb entries multi-times. For instance invalidations patterns like "invalidate 32 4kB pages starting from 0xffacd000" need to iterate over all iotlb entries 6 times (num_pages: 1, 2, 16, 8, 4, 1). It only needs to iterate over all iotlb entries once with new implementation. Signed-off-by: Renwei Liu --- hw/arm/smmu-common.c | 6 +++--- hw/arm/smmu-internal.h | 2 +- hw/arm/smmuv3.c| 22 -- 3 files changed, 8 insertions(+), 22 deletions(-) mode change 100644 => 100755 hw/arm/smmu-common.c mode change 100644 => 100755 hw/arm/smmu-internal.h mode change 100644 => 100755 hw/arm/smmuv3.c diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c old mode 100644 new mode 100755 index 0459850a93..ccb085f83c --- a/hw/arm/smmu-common.c +++ b/hw/arm/smmu-common.c @@ -142,8 +142,8 @@ static gboolean smmu_hash_remove_by_asid_iova(gpointer key, gpointer value, if (info->asid >= 0 && info->asid != SMMU_IOTLB_ASID(iotlb_key)) { return false; } -return ((info->iova & ~entry->addr_mask) == entry->iova) || - ((entry->iova & ~info->mask) == info->iova); +return (entry->iova >= info->iova) && + ((entry->iova + entry->addr_mask) < (info->iova + info->range)); } inline void @@ -167,7 +167,7 @@ smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova, SMMUIOTLBPageInvInfo info = { .asid = asid, .iova = iova, -.mask = (num_pages * 1 << granule) - 1}; +.range = num_pages * 1 << granule}; g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid_iova, diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h old mode 100644 new mode 100755 index 2d75b31953..f0e3a777af --- a/hw/arm/smmu-internal.h +++ b/hw/arm/smmu-internal.h @@ -101,7 +101,7 @@ uint64_t iova_level_offset(uint64_t iova, int inputsize, typedef struct SMMUIOTLBPageInvInfo { int asid; uint64_t iova; -uint64_t mask; +uint64_t range; } SMMUIOTLBPageInvInfo; typedef struct SMMUSIDRange { diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c old mode 100644 new mode 100755 index 01b60bee49..0b009107d1 --- a/hw/arm/smmuv3.c +++ b/hw/arm/smmuv3.c @@ -857,7 +857,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova, static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd) { -dma_addr_t end, addr = CMD_ADDR(cmd); +dma_addr_t addr = CMD_ADDR(cmd); uint8_t type = CMD_TYPE(cmd); uint16_t vmid = CMD_VMID(cmd); uint8_t scale = CMD_SCALE(cmd); @@ -866,7 +866,6 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd) bool leaf = CMD_LEAF(cmd); uint8_t tg = CMD_TG(cmd); uint64_t num_pages; -uint8_t granule; int asid = -1; if (type == SMMU_CMD_TLBI_NH_VA) { @@ -880,23 +879,10 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd) return; } -/* RIL in use */ - num_pages = (num + 1) * BIT_ULL(scale); -granule = tg * 2 + 10; - -/* Split invalidations into ^2 range invalidations */ -end = addr + (num_pages << granule) - 1; - -while (addr != end + 1) { -uint64_t mask = dma_aligned_pow2_mask(addr, end, 64); - -num_pages = (mask + 1) >> granule; -trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf); -smmuv3_inv_notifiers_iova(s, asid, addr, tg, num_pages); -smmu_iotlb_inv_iova(s, asid, addr, tg, num_pages, ttl); -addr += mask + 1; -} +trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf); +smmuv3_inv_notifiers_iova(s, asid, addr, tg, num_pages); +smmu_iotlb_inv_iova(s, asid, addr, tg, num_pages, ttl); } static gboolean -- 2.32.0
Re: [PATCH v4 15/21] target/riscv: Reorg csr instructions
On Sat, Aug 21, 2021 at 1:43 AM Richard Henderson wrote: > > Introduce csrr and csrw helpers, for read-only and write-only insns. > > Note that we do not properly implement this in riscv_csrrw, in that > we cannot distinguish true read-only (rs1 == 0) from any other zero > write_mask another source register -- this should still raise an > exception for read-only registers. > > Only issue gen_io_start for CF_USE_ICOUNT. > Use ctx->zero for csrrc. > Use get_gpr and dest_gpr. > > Reviewed-by: Bin Meng > Signed-off-by: Richard Henderson > --- > target/riscv/helper.h | 6 +- > target/riscv/op_helper.c| 18 +-- > target/riscv/insn_trans/trans_rvi.c.inc | 172 +--- > 3 files changed, 131 insertions(+), 65 deletions(-) > When testing Linux kernel boot, there was a segment fault in the helper_csrw() path where ret_value pointer is now NULL, and some CSR write op does not test ret_value. Regards, Bin
Re: [PATCH 2/3] hw/char: cadence_uart: Disable transmit when input clock is disabled
On Mon, Aug 23, 2021 at 12:43 PM Alistair Francis wrote: > > On Mon, Aug 23, 2021 at 12:11 PM Bin Meng wrote: > > > > At present when input clock is disabled, any character transmitted > > to tx fifo can still show on the serial line, which is wrong. > > > > Fixes: b636db306e06 ("hw/char/cadence_uart: add clock support") > > Signed-off-by: Bin Meng > > --- > > > > hw/char/cadence_uart.c | 5 + > > 1 file changed, 5 insertions(+) > > > > diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c > > index b4b5e8a3ee..154be34992 100644 > > --- a/hw/char/cadence_uart.c > > +++ b/hw/char/cadence_uart.c > > @@ -327,6 +327,11 @@ static gboolean cadence_uart_xmit(void *do_not_use, > > GIOCondition cond, > > static void uart_write_tx_fifo(CadenceUARTState *s, const uint8_t *buf, > > int size) > > { > > +/* ignore characters when unclocked or in reset */ > > +if (!clock_is_enabled(s->refclk) || device_is_in_reset(DEVICE(s))) { > > +return; > > +} > > Should we log a guest error here? > Not sure. Based on my past experience of many hardware, if the input clock is disabled, accessing the whole register block might cause a bus fault. But I believe such bus fault is not modeled in QEMU. This change just mirrors the same check on the Rx side. Regards, Bin
Re: [PATCH 2/3] hw/char: cadence_uart: Disable transmit when input clock is disabled
On Mon, Aug 23, 2021 at 12:11 PM Bin Meng wrote: > > At present when input clock is disabled, any character transmitted > to tx fifo can still show on the serial line, which is wrong. > > Fixes: b636db306e06 ("hw/char/cadence_uart: add clock support") > Signed-off-by: Bin Meng > --- > > hw/char/cadence_uart.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c > index b4b5e8a3ee..154be34992 100644 > --- a/hw/char/cadence_uart.c > +++ b/hw/char/cadence_uart.c > @@ -327,6 +327,11 @@ static gboolean cadence_uart_xmit(void *do_not_use, > GIOCondition cond, > static void uart_write_tx_fifo(CadenceUARTState *s, const uint8_t *buf, > int size) > { > +/* ignore characters when unclocked or in reset */ > +if (!clock_is_enabled(s->refclk) || device_is_in_reset(DEVICE(s))) { > +return; > +} Should we log a guest error here? Alistair > + > if ((s->r[R_CR] & UART_CR_TX_DIS) || !(s->r[R_CR] & UART_CR_TX_EN)) { > return; > } > -- > 2.25.1 > >
Re: [PATCH 1/3] hw/misc: zynq_slcr: Correctly compute output clocks in the reset exit phase
On Mon, Aug 23, 2021 at 12:09 PM Bin Meng wrote: > > As of today, when booting upstream U-Boot for Xilinx Zynq, the UART > does not receive anything. Debugging shows that the UART input clock > frequency is zero which prevents the UART from receiving anything as > per the logic in uart_receive(). > > From zynq_slcr_reset_exit() comment, it intends to compute output > clocks according to ps_clk and registers. zynq_slcr_compute_clocks() > is called to accomplish the task, inside which device_is_in_reset() > is called to actually make the attempt in vain. > > Rework reset_hold() and reset_exit() so that in the reset exit phase, > the logic can really compute output clocks in reset_exit(). > > With this change, upstream U-Boot boots properly again with: > > $ qemu-system-arm -M xilinx-zynq-a9 -m 1G -display none -serial null -serial > stdio \ > -device loader,file=u-boot-dtb.bin,addr=0x400,cpu-num=0 > > Fixes: 38867cb7ec90 ("hw/misc/zynq_slcr: add clock generation for uarts") > Signed-off-by: Bin Meng Acked-by: Alistair Francis Alistair > --- > > hw/misc/zynq_slcr.c | 31 ++- > 1 file changed, 18 insertions(+), 13 deletions(-) > > diff --git a/hw/misc/zynq_slcr.c b/hw/misc/zynq_slcr.c > index 5086e6b7ed..8b70285961 100644 > --- a/hw/misc/zynq_slcr.c > +++ b/hw/misc/zynq_slcr.c > @@ -269,6 +269,21 @@ static uint64_t zynq_slcr_compute_clock(const uint64_t > periods[], > zynq_slcr_compute_clock((plls), (state)->regs[reg], \ > reg ## _ ## enable_field ## _SHIFT) > > +static void zynq_slcr_compute_clocks_internal(ZynqSLCRState *s, uint64_t > ps_clk) > +{ > +uint64_t io_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_IO_PLL_CTRL]); > +uint64_t arm_pll = zynq_slcr_compute_pll(ps_clk, > s->regs[R_ARM_PLL_CTRL]); > +uint64_t ddr_pll = zynq_slcr_compute_pll(ps_clk, > s->regs[R_DDR_PLL_CTRL]); > + > +uint64_t uart_mux[4] = {io_pll, io_pll, arm_pll, ddr_pll}; > + > +/* compute uartX reference clocks */ > +clock_set(s->uart0_ref_clk, > + ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT0)); > +clock_set(s->uart1_ref_clk, > + ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT1)); > +} > + > /** > * Compute and set the ouputs clocks periods. > * But do not propagate them further. Connected clocks > @@ -283,17 +298,7 @@ static void zynq_slcr_compute_clocks(ZynqSLCRState *s) > ps_clk = 0; > } > > -uint64_t io_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_IO_PLL_CTRL]); > -uint64_t arm_pll = zynq_slcr_compute_pll(ps_clk, > s->regs[R_ARM_PLL_CTRL]); > -uint64_t ddr_pll = zynq_slcr_compute_pll(ps_clk, > s->regs[R_DDR_PLL_CTRL]); > - > -uint64_t uart_mux[4] = {io_pll, io_pll, arm_pll, ddr_pll}; > - > -/* compute uartX reference clocks */ > -clock_set(s->uart0_ref_clk, > - ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT0)); > -clock_set(s->uart1_ref_clk, > - ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT1)); > +zynq_slcr_compute_clocks_internal(s, ps_clk); > } > > /** > @@ -416,7 +421,7 @@ static void zynq_slcr_reset_hold(Object *obj) > ZynqSLCRState *s = ZYNQ_SLCR(obj); > > /* will disable all output clocks */ > -zynq_slcr_compute_clocks(s); > +zynq_slcr_compute_clocks_internal(s, 0); > zynq_slcr_propagate_clocks(s); > } > > @@ -425,7 +430,7 @@ static void zynq_slcr_reset_exit(Object *obj) > ZynqSLCRState *s = ZYNQ_SLCR(obj); > > /* will compute output clocks according to ps_clk and registers */ > -zynq_slcr_compute_clocks(s); > +zynq_slcr_compute_clocks_internal(s, clock_get(s->ps_clk)); > zynq_slcr_propagate_clocks(s); > } > > -- > 2.25.1 > >
Re: [PATCH v4 03/21] target/riscv: Clean up division helpers
On Sat, Aug 21, 2021 at 1:46 AM Richard Henderson wrote: > > Utilize the condition in the movcond more; this allows some of > the setcond that were feeding into movcond to be removed. > Do not write into source1 and source2. Re-name "condN" to "tempN" > and use the temporaries for more than holding conditions. > > Signed-off-by: Richard Henderson > --- > target/riscv/translate.c | 160 --- > 1 file changed, 84 insertions(+), 76 deletions(-) > Reviewed-by: Bin Meng Tested-by: Bin Meng
Re: [PATCH v2 1/2] migration: allow multifd for socket protocol only
kindly ping On 31/07/2021 22:05, Li Zhijian wrote: > multifd with unsupported protocol will cause a segment fault. > (gdb) bt > #0 0x563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at > ../util/qemu-sockets.c:1190 > #1 0x563b4a797a03 in qio_channel_socket_connect_sync > (ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at > ../io/channel-socket.c:145 > #2 0x563b4a797abf in qio_channel_socket_connect_worker > (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168 > #3 0x563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at > ../io/task.c:124 > #4 0x563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at > ../util/qemu-thread-posix.c:541 > #5 0x7f7fe9b5b3f9 in ?? () > #6 0x in ?? () > > It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and > multifd setup() though there are so many other places using > migrate_use_multifd(). > > Signed-off-by: Li Zhijian > --- > migration/migration.c | 4 > migration/multifd.c | 24 ++-- > migration/multifd.h | 2 ++ > 3 files changed, 28 insertions(+), 2 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 2d306582ebf..212314541f1 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -456,10 +456,12 @@ static void qemu_start_incoming_migration(const char > *uri, Error **errp) > { > const char *p = NULL; > > +migrate_protocol_allow_multifd(false); /* reset it anyway */ > qapi_event_send_migration(MIGRATION_STATUS_SETUP); > if (strstart(uri, "tcp:", ) || > strstart(uri, "unix:", NULL) || > strstart(uri, "vsock:", NULL)) { > +migrate_protocol_allow_multifd(true); > socket_start_incoming_migration(p ? p : uri, errp); > #ifdef CONFIG_RDMA > } else if (strstart(uri, "rdma:", )) { > @@ -2289,9 +2291,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool > blk, > } > } > > +migrate_protocol_allow_multifd(false); > if (strstart(uri, "tcp:", ) || > strstart(uri, "unix:", NULL) || > strstart(uri, "vsock:", NULL)) { > +migrate_protocol_allow_multifd(true); > socket_start_outgoing_migration(s, p ? p : uri, _err); > #ifdef CONFIG_RDMA > } else if (strstart(uri, "rdma:", )) { > diff --git a/migration/multifd.c b/migration/multifd.c > index ab41590e714..4a4d16d3888 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -531,7 +531,7 @@ void multifd_save_cleanup(void) > { > int i; > > -if (!migrate_use_multifd()) { > +if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) { > return; > } > multifd_send_terminate_threads(NULL); > @@ -864,6 +864,17 @@ cleanup: > multifd_new_send_channel_cleanup(p, sioc, local_err); > } > > +static bool migrate_allow_multifd; > +void migrate_protocol_allow_multifd(bool allow) > +{ > +migrate_allow_multifd = allow; > +} > + > +bool migrate_multifd_is_allowed(void) > +{ > +return migrate_allow_multifd; > +} > + > int multifd_save_setup(Error **errp) > { > int thread_count; > @@ -874,6 +885,11 @@ int multifd_save_setup(Error **errp) > if (!migrate_use_multifd()) { > return 0; > } > +if (!migrate_multifd_is_allowed()) { > +error_setg(errp, "multifd is not supported by current protocol"); > +return -1; > +} > + > s = migrate_get_current(); > thread_count = migrate_multifd_channels(); > multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); > @@ -967,7 +983,7 @@ int multifd_load_cleanup(Error **errp) > { > int i; > > -if (!migrate_use_multifd()) { > +if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) { > return 0; > } > multifd_recv_terminate_threads(NULL); > @@ -1123,6 +1139,10 @@ int multifd_load_setup(Error **errp) > if (!migrate_use_multifd()) { > return 0; > } > +if (!migrate_multifd_is_allowed()) { > +error_setg(errp, "multifd is not supported by current protocol"); > +return -1; > +} > thread_count = migrate_multifd_channels(); > multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state)); > multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count); > diff --git a/migration/multifd.h b/migration/multifd.h > index 8d6751f5ed8..f62a1becd0b 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -13,6 +13,8 @@ > #ifndef QEMU_MIGRATION_MULTIFD_H > #define QEMU_MIGRATION_MULTIFD_H > > +bool migrate_multifd_is_allowed(void); > +void migrate_protocol_allow_multifd(bool allow); > int multifd_save_setup(Error **errp); > void multifd_save_cleanup(void); > int multifd_load_setup(Error **errp);
[PATCH v2 1/2] migration/rdma: Try to register On-Demand Paging memory region
Previously, for the fsdax mem-backend-file, it will register failed with Operation not supported. In this case, we can try to register it with On-Demand Paging[1] like what rpma_mr_reg() does on rpma[2]. [1]: https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x [2]: http://pmem.io/rpma/manpages/v0.9.0/rpma_mr_reg.3 CC: Marcel Apfelbaum Signed-off-by: Li Zhijian --- V2: add ODP sanity check and remove goto --- migration/rdma.c | 73 ++ migration/trace-events | 1 + 2 files changed, 54 insertions(+), 20 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 5c2d113aa94..eb80431aae2 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -1117,19 +1117,47 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma) return 0; } +/* Check whether On-Demand Paging is supported by RDAM device */ +static bool rdma_support_odp(struct ibv_context *dev) +{ +struct ibv_device_attr_ex attr = {0}; +int ret = ibv_query_device_ex(dev, NULL, ); +if (ret) { +return false; +} + +if (attr.odp_caps.general_caps & IBV_ODP_SUPPORT) { +return true; +} + +return false; +} + static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma) { int i; RDMALocalBlocks *local = >local_ram_blocks; for (i = 0; i < local->nb_blocks; i++) { +int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE; + local->block[i].mr = ibv_reg_mr(rdma->pd, local->block[i].local_host_addr, -local->block[i].length, -IBV_ACCESS_LOCAL_WRITE | -IBV_ACCESS_REMOTE_WRITE +local->block[i].length, access ); + +if (!local->block[i].mr && +errno == ENOTSUP && rdma_support_odp(rdma->verbs)) { +access |= IBV_ACCESS_ON_DEMAND; +/* register ODP mr */ +local->block[i].mr = +ibv_reg_mr(rdma->pd, + local->block[i].local_host_addr, + local->block[i].length, access); +trace_qemu_rdma_register_odp_mr(local->block[i].block_name); +} + if (!local->block[i].mr) { perror("Failed to register local dest ram block!"); break; @@ -1215,28 +1243,33 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma, */ if (!block->pmr[chunk]) { uint64_t len = chunk_end - chunk_start; +int access = rkey ? IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE : + 0; trace_qemu_rdma_register_and_get_keys(len, chunk_start); -block->pmr[chunk] = ibv_reg_mr(rdma->pd, -chunk_start, len, -(rkey ? (IBV_ACCESS_LOCAL_WRITE | -IBV_ACCESS_REMOTE_WRITE) : 0)); - -if (!block->pmr[chunk]) { -perror("Failed to register chunk!"); -fprintf(stderr, "Chunk details: block: %d chunk index %d" -" start %" PRIuPTR " end %" PRIuPTR -" host %" PRIuPTR -" local %" PRIuPTR " registrations: %d\n", -block->index, chunk, (uintptr_t)chunk_start, -(uintptr_t)chunk_end, host_addr, -(uintptr_t)block->local_host_addr, -rdma->total_registrations); -return -1; +block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access); +if (!block->pmr[chunk] && +errno == ENOTSUP && rdma_support_odp(rdma->verbs)) { +access |= IBV_ACCESS_ON_DEMAND; +/* register ODP mr */ +block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access); +trace_qemu_rdma_register_odp_mr(block->block_name); } -rdma->total_registrations++; } +if (!block->pmr[chunk]) { +perror("Failed to register chunk!"); +fprintf(stderr, "Chunk details: block: %d chunk index %d" +" start %" PRIuPTR " end %" PRIuPTR +" host %" PRIuPTR +" local %" PRIuPTR " registrations: %d\n", +block->index, chunk, (uintptr_t)chunk_start, +(uintptr_t)chunk_end, host_addr, +(uintptr_t)block->local_host_addr, +rdma->total_registrations); +return -1; +} +rdma->total_registrations++; if (lkey) { *lkey = block->pmr[chunk]->lkey; diff --git a/migration/trace-events b/migration/trace-events index a1c0f034ab8..5f6aa580def 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -212,6 +212,7 @@ qemu_rdma_poll_write(const char *compstr, int64_t comp, int left, uint64_t block qemu_rdma_poll_other(const char
[PATCH v2 2/2] migration/rdma: advise prefetch write for ODP region
The responder mr registering with ODP will sent RNR NAK back to the requester in the face of the page fault. - ibv_poll_cq wc.status=13 RNR retry counter exceeded! ibv_poll_cq wrid=WRITE RDMA! - ibv_advise_mr(3) helps to make pages present before the actual IO is conducted so that the responder does page fault as little as possible. Signed-off-by: Li Zhijian Reviewed-by: Marcel Apfelbaum --- V2: use IBV_ADVISE_MR_FLAG_FLUSH instead of IB_UVERBS_ADVISE_MR_FLAG_FLUSH and add Reviewed-by tag. # Marcel --- migration/rdma.c | 40 migration/trace-events | 1 + 2 files changed, 41 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index eb80431aae2..6c2cc3f617c 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -1133,6 +1133,30 @@ static bool rdma_support_odp(struct ibv_context *dev) return false; } +/* + * ibv_advise_mr to avoid RNR NAK error as far as possible. + * The responder mr registering with ODP will sent RNR NAK back to + * the requester in the face of the page fault. + */ +static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr, + uint32_t len, uint32_t lkey, + const char *name, bool wr) +{ +int ret; +int advice = wr ? IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE : + IBV_ADVISE_MR_ADVICE_PREFETCH; +struct ibv_sge sg_list = {.lkey = lkey, .addr = addr, .length = len}; + +ret = ibv_advise_mr(pd, advice, +IBV_ADVISE_MR_FLAG_FLUSH, _list, 1); +/* ignore the error */ +if (ret) { +trace_qemu_rdma_advise_mr(name, len, addr, strerror(errno)); +} else { +trace_qemu_rdma_advise_mr(name, len, addr, "successed"); +} +} + static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma) { int i; @@ -1156,6 +1180,15 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma) local->block[i].local_host_addr, local->block[i].length, access); trace_qemu_rdma_register_odp_mr(local->block[i].block_name); + +if (local->block[i].mr) { +qemu_rdma_advise_prefetch_mr(rdma->pd, +(uintptr_t)local->block[i].local_host_addr, +local->block[i].length, +local->block[i].mr->lkey, +local->block[i].block_name, +true); +} } if (!local->block[i].mr) { @@ -1255,6 +1288,13 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma, /* register ODP mr */ block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access); trace_qemu_rdma_register_odp_mr(block->block_name); + +if (block->pmr[chunk]) { +qemu_rdma_advise_prefetch_mr(rdma->pd, (uintptr_t)chunk_start, +len, block->pmr[chunk]->lkey, +block->block_name, rkey); + +} } } if (!block->pmr[chunk]) { diff --git a/migration/trace-events b/migration/trace-events index 5f6aa580def..a8ae163707c 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -213,6 +213,7 @@ qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other complet qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.." qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p" qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s" +qemu_rdma_advise_mr(const char *name, uint32_t len, uint64_t addr, const char *res) "Try to advise block %s prefetch at %" PRIu32 "@0x%" PRIx64 ": %s" qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64 qemu_rdma_registration_handle_finished(void) "" qemu_rdma_registration_handle_ram_blocks(void) "" -- 2.31.1
[PATCH v2 0/2] enable fsdax rdma migration
Previous qemu are facing 2 problems when migrating a fsdax memory backend with RDMA protocol. (1) ibv_reg_mr failed with Operation not supported (2) requester(source) side could receive RNR NAK. For the (1), we can try to register memory region with ODP feature which has already been implemented in some modern HCA hardware/drivers. For the (2), IB provides advise API to prefetch pages in specific memory region. It can help driver reduce the page fault on responder(destination) side during RDMA_WRITE. CC: marcel.apfelb...@gmail.com Li Zhijian (2): migration/rdma: Try to register On-Demand Paging memory region migration/rdma: advise prefetch write for ODP region migration/rdma.c | 117 + migration/trace-events | 2 + 2 files changed, 98 insertions(+), 21 deletions(-) -- 2.31.1
Re: [PATCH v4 02/21] tests/tcg/riscv64: Add test for division
On Sat, Aug 21, 2021 at 1:43 AM Richard Henderson wrote: > > Signed-off-by: Richard Henderson > --- > tests/tcg/riscv64/test-div.c | 58 +++ > tests/tcg/riscv64/Makefile.target | 5 +++ > 2 files changed, 63 insertions(+) > create mode 100644 tests/tcg/riscv64/test-div.c > create mode 100644 tests/tcg/riscv64/Makefile.target > Reviewed-by: Bin Meng Tested-by: Bin Meng
[PATCH v2 1/3] docs/about: Remove the duplicated doc
There are two places describing the same thing about deprecation of invalid topologies of -smp CLI, so remove the duplicated one. Signed-off-by: Yanan Wang Reviewed-by: Cornelia Huck --- docs/about/removed-features.rst | 13 - 1 file changed, 13 deletions(-) diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst index cbfa1a8e31..6a9c5bb484 100644 --- a/docs/about/removed-features.rst +++ b/docs/about/removed-features.rst @@ -647,19 +647,6 @@ as ignored. Currently, users are responsible for making sure the backing storage specified with ``-mem-path`` can actually provide the guest RAM configured with ``-m`` and QEMU fails to start up if RAM allocation is unsuccessful. -``-smp`` (invalid topologies) (removed 5.2) -''' - -CPU topology properties should describe whole machine topology including -possible CPUs. - -However, historically it was possible to start QEMU with an incorrect topology -where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, -which could lead to an incorrect topology enumeration by the guest. -Support for invalid topologies is removed, the user must ensure -topologies described with -smp include all possible cpus, i.e. -*sockets* * *cores* * *threads* = *maxcpus*. - ``-machine enforce-config-section=on|off`` (removed 5.2) -- 2.19.1
[PATCH v2 2/3] docs/about: Unify the subject format
There is a mixture of "since/removed in X.Y" vs "since/removed in X.Y.Z" in the subjects in deprecated.rst/removed-features.rst. It will be better to use an unified format. It seems unlikely that we will ever deprecate something in a stable release, and even more unlikely that we'll remove something in one, so the short versions look like the thing we want to standardize on. So here we unify the subject format in deprecated.rst to "since X.Y", and unify the subject format in removed-features.rst to "removed in X.Y". Signed-off-by: Yanan Wang Reviewed-by: Cornelia Huck --- docs/about/deprecated.rst | 56 - docs/about/removed-features.rst | 28 - 2 files changed, 42 insertions(+), 42 deletions(-) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index 6d438f1c8d..8d4fd384a5 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -107,8 +107,8 @@ the process listing. This is replaced by the new ``password-secret`` option which lets the password be securely provided on the command line using a ``secret`` object instance. -``opened`` property of ``rng-*`` objects (since 6.0.0) -'' +``opened`` property of ``rng-*`` objects (since 6.0) + The only effect of specifying ``opened=on`` in the command line or QMP ``object-add`` is that the device is opened immediately, possibly before all @@ -116,8 +116,8 @@ other options have been processed. This will either have no effect (if ``opened`` was the last option) or cause errors. The property is therefore useless and should not be specified. -``loaded`` property of ``secret`` and ``secret_keyring`` objects (since 6.0.0) -'' +``loaded`` property of ``secret`` and ``secret_keyring`` objects (since 6.0) + The only effect of specifying ``loaded=on`` in the command line or QMP ``object-add`` is that the secret is loaded immediately, possibly before all @@ -142,33 +142,33 @@ should be used instead. QEMU Machine Protocol (QMP) commands -``blockdev-open-tray``, ``blockdev-close-tray`` argument ``device`` (since 2.8.0) -' +``blockdev-open-tray``, ``blockdev-close-tray`` argument ``device`` (since 2.8) +''' Use argument ``id`` instead. -``eject`` argument ``device`` (since 2.8.0) -''' +``eject`` argument ``device`` (since 2.8) +' Use argument ``id`` instead. -``blockdev-change-medium`` argument ``device`` (since 2.8.0) - +``blockdev-change-medium`` argument ``device`` (since 2.8) +'' Use argument ``id`` instead. -``block_set_io_throttle`` argument ``device`` (since 2.8.0) -''' +``block_set_io_throttle`` argument ``device`` (since 2.8) +' Use argument ``id`` instead. -``blockdev-add`` empty string argument ``backing`` (since 2.10.0) -' +``blockdev-add`` empty string argument ``backing`` (since 2.10) +''' Use argument value ``null`` instead. -``block-commit`` arguments ``base`` and ``top`` (since 3.1.0) -' +``block-commit`` arguments ``base`` and ``top`` (since 3.1) +''' Use arguments ``base-node`` and ``top-node`` instead. @@ -191,8 +191,8 @@ from Linux upstream kernel, declare it deprecated. System emulator CPUS -``Icelake-Client`` CPU Model (since 5.2.0) -'' +``Icelake-Client`` CPU Model (since 5.2) + ``Icelake-Client`` CPU Models are deprecated. Use ``Icelake-Server`` CPU Models instead. @@ -245,8 +245,8 @@ Device options Emulated device options ''' -``-device virtio-blk,scsi=on|off`` (since 5.0.0) - +``-device virtio-blk,scsi=on|off`` (since 5.0) +^^ The virtio-blk SCSI passthrough feature is a legacy VIRTIO feature. VIRTIO 1.0 and later do not support it because the virtio-scsi device was introduced for @@ -258,14 +258,14 @@ alias. Block device options -``"backing": ""`` (since 2.12.0)
[PATCH v2 3/3] docs/about: Add the missing release record in the subject
Commit 29e0447551 (docs/about/removed-features: Document removed CLI options from QEMU v3.1) has recorded some CLI options as replaced/removed from QEMU v3.1, but one of the subjects has missed the release record. Let's fix it. Reported-by: Cornelia Huck Signed-off-by: Yanan Wang --- docs/about/removed-features.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst index 1c926a8bc1..8feeead449 100644 --- a/docs/about/removed-features.rst +++ b/docs/about/removed-features.rst @@ -140,8 +140,8 @@ Use ``-rtc driftfix=slew`` instead. Replaced by ``-rtc base=date``. -``-vnc ...,tls=...``, ``-vnc ...,x509=...`` & ``-vnc ...,x509verify=...`` -' +``-vnc ...,tls=...``, ``-vnc ...,x509=...`` & ``-vnc ...,x509verify=...`` (removed in 3.1) +'' The "tls-creds" option should be used instead to point to a "tls-creds-x509" object created using "-object". -- 2.19.1
[PATCH v2 0/3] docs/about: some documentation clean-up/fix
This series makes some clean-up and fix for docs/about, including removing the duplicated section, unifying the subject format, and adding the missed release record in the subject. v1->v2: - update the commit message, combined with Cornelia's comment - add the missing release record in the subject (patch #3) - v1: https://lore.kernel.org/qemu-devel/20210820015628.173532-1-wangyana...@huawei.com/ Yanan Wang (3): docs/about: Remove the duplicated doc docs/about: Unify the subject format docs/about: Add the missing release record in the subject docs/about/deprecated.rst | 56 - docs/about/removed-features.rst | 45 ++ 2 files changed, 44 insertions(+), 57 deletions(-) -- 2.19.1
[PATCH 2/3] hw/char: cadence_uart: Disable transmit when input clock is disabled
At present when input clock is disabled, any character transmitted to tx fifo can still show on the serial line, which is wrong. Fixes: b636db306e06 ("hw/char/cadence_uart: add clock support") Signed-off-by: Bin Meng --- hw/char/cadence_uart.c | 5 + 1 file changed, 5 insertions(+) diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c index b4b5e8a3ee..154be34992 100644 --- a/hw/char/cadence_uart.c +++ b/hw/char/cadence_uart.c @@ -327,6 +327,11 @@ static gboolean cadence_uart_xmit(void *do_not_use, GIOCondition cond, static void uart_write_tx_fifo(CadenceUARTState *s, const uint8_t *buf, int size) { +/* ignore characters when unclocked or in reset */ +if (!clock_is_enabled(s->refclk) || device_is_in_reset(DEVICE(s))) { +return; +} + if ((s->r[R_CR] & UART_CR_TX_DIS) || !(s->r[R_CR] & UART_CR_TX_EN)) { return; } -- 2.25.1
[PATCH 3/3] hw/char: cadence_uart: Move clock/reset check to uart_can_receive()
Currently the clock/reset check is done in uart_receive(), but we can move the check to uart_can_receive() which is earlier. Signed-off-by: Bin Meng --- hw/char/cadence_uart.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c index 154be34992..7326445385 100644 --- a/hw/char/cadence_uart.c +++ b/hw/char/cadence_uart.c @@ -235,6 +235,12 @@ static void uart_parameters_setup(CadenceUARTState *s) static int uart_can_receive(void *opaque) { CadenceUARTState *s = opaque; + +/* ignore characters when unclocked or in reset */ +if (!clock_is_enabled(s->refclk) || device_is_in_reset(DEVICE(s))) { +return 0; +} + int ret = MAX(CADENCE_UART_RX_FIFO_SIZE, CADENCE_UART_TX_FIFO_SIZE); uint32_t ch_mode = s->r[R_MR] & UART_MR_CHMODE; @@ -358,11 +364,6 @@ static void uart_receive(void *opaque, const uint8_t *buf, int size) CadenceUARTState *s = opaque; uint32_t ch_mode = s->r[R_MR] & UART_MR_CHMODE; -/* ignore characters when unclocked or in reset */ -if (!clock_is_enabled(s->refclk) || device_is_in_reset(DEVICE(s))) { -return; -} - if (ch_mode == NORMAL_MODE || ch_mode == ECHO_MODE) { uart_write_rx_fifo(opaque, buf, size); } -- 2.25.1
[PATCH 1/3] hw/misc: zynq_slcr: Correctly compute output clocks in the reset exit phase
As of today, when booting upstream U-Boot for Xilinx Zynq, the UART does not receive anything. Debugging shows that the UART input clock frequency is zero which prevents the UART from receiving anything as per the logic in uart_receive(). >From zynq_slcr_reset_exit() comment, it intends to compute output clocks according to ps_clk and registers. zynq_slcr_compute_clocks() is called to accomplish the task, inside which device_is_in_reset() is called to actually make the attempt in vain. Rework reset_hold() and reset_exit() so that in the reset exit phase, the logic can really compute output clocks in reset_exit(). With this change, upstream U-Boot boots properly again with: $ qemu-system-arm -M xilinx-zynq-a9 -m 1G -display none -serial null -serial stdio \ -device loader,file=u-boot-dtb.bin,addr=0x400,cpu-num=0 Fixes: 38867cb7ec90 ("hw/misc/zynq_slcr: add clock generation for uarts") Signed-off-by: Bin Meng --- hw/misc/zynq_slcr.c | 31 ++- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/hw/misc/zynq_slcr.c b/hw/misc/zynq_slcr.c index 5086e6b7ed..8b70285961 100644 --- a/hw/misc/zynq_slcr.c +++ b/hw/misc/zynq_slcr.c @@ -269,6 +269,21 @@ static uint64_t zynq_slcr_compute_clock(const uint64_t periods[], zynq_slcr_compute_clock((plls), (state)->regs[reg], \ reg ## _ ## enable_field ## _SHIFT) +static void zynq_slcr_compute_clocks_internal(ZynqSLCRState *s, uint64_t ps_clk) +{ +uint64_t io_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_IO_PLL_CTRL]); +uint64_t arm_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_ARM_PLL_CTRL]); +uint64_t ddr_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_DDR_PLL_CTRL]); + +uint64_t uart_mux[4] = {io_pll, io_pll, arm_pll, ddr_pll}; + +/* compute uartX reference clocks */ +clock_set(s->uart0_ref_clk, + ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT0)); +clock_set(s->uart1_ref_clk, + ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT1)); +} + /** * Compute and set the ouputs clocks periods. * But do not propagate them further. Connected clocks @@ -283,17 +298,7 @@ static void zynq_slcr_compute_clocks(ZynqSLCRState *s) ps_clk = 0; } -uint64_t io_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_IO_PLL_CTRL]); -uint64_t arm_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_ARM_PLL_CTRL]); -uint64_t ddr_pll = zynq_slcr_compute_pll(ps_clk, s->regs[R_DDR_PLL_CTRL]); - -uint64_t uart_mux[4] = {io_pll, io_pll, arm_pll, ddr_pll}; - -/* compute uartX reference clocks */ -clock_set(s->uart0_ref_clk, - ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT0)); -clock_set(s->uart1_ref_clk, - ZYNQ_COMPUTE_CLK(s, uart_mux, R_UART_CLK_CTRL, CLKACT1)); +zynq_slcr_compute_clocks_internal(s, ps_clk); } /** @@ -416,7 +421,7 @@ static void zynq_slcr_reset_hold(Object *obj) ZynqSLCRState *s = ZYNQ_SLCR(obj); /* will disable all output clocks */ -zynq_slcr_compute_clocks(s); +zynq_slcr_compute_clocks_internal(s, 0); zynq_slcr_propagate_clocks(s); } @@ -425,7 +430,7 @@ static void zynq_slcr_reset_exit(Object *obj) ZynqSLCRState *s = ZYNQ_SLCR(obj); /* will compute output clocks according to ps_clk and registers */ -zynq_slcr_compute_clocks(s); +zynq_slcr_compute_clocks_internal(s, clock_get(s->ps_clk)); zynq_slcr_propagate_clocks(s); } -- 2.25.1
[PATCH 0/3] hw/arm: xilinx_zynq: Fix upstream U-Boot boot failure
As of today, when booting upstream U-Boot for Xilinx Zynq, the UART does not receive anything. Debugging shows that the UART input clock frequency is zero which prevents the UART from receiving anything as. per the logic in uart_receive(). Note the U-Boot can still output data to the UART tx fifo, which should not happen, as the design seems to prevent the data transmission when clock is not enabled but somehow it only applies to the Rx side. For anyone who is interested to give a try, here is the U-Boot defconfig: $ make xilinx_zynq_virt_defconfig and QEMU commands to test U-Boot: $ qemu-system-arm -M xilinx-zynq-a9 -m 1G -display none -serial null -serial stdio \ -device loader,file=u-boot-dtb.bin,addr=0x400,cpu-num=0 Note U-Boot used to boot properly in QEMU 4.2.0 which is the QEMU version used in current U-Boot's CI testing. The UART clock changes were introduced by the following 3 commits: 38867cb7ec90 ("hw/misc/zynq_slcr: add clock generation for uarts") b636db306e06 ("hw/char/cadence_uart: add clock support") 5b49a34c6800 ("hw/arm/xilinx_zynq: connect uart clocks to slcr") Bin Meng (3): hw/misc: zynq_slcr: Correctly compute output clocks in the reset exit phase hw/char: cadence_uart: Disable transmit when input clock is disabled hw/char: cadence_uart: Move clock/reset check to uart_can_receive() hw/char/cadence_uart.c | 16 +++- hw/misc/zynq_slcr.c| 31 ++- 2 files changed, 29 insertions(+), 18 deletions(-) -- 2.25.1
Re: [PATCH 1/2] migration/rdma: Try to register On-Demand Paging memory region
On 22/08/2021 16:53, Marcel Apfelbaum wrote: > Hi > > On Sat, Jul 31, 2021 at 5:00 PM Li Zhijian wrote: >> Previously, for the fsdax mem-backend-file, it will register failed with >> Operation not supported. In this case, we can try to register it with >> On-Demand Paging[1] like what rpma_mr_reg() does on rpma[2]. >> >> [1]: >> https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x >> [2]: http://pmem.io/rpma/manpages/v0.9.0/rpma_mr_reg.3 >> Signed-off-by: Li Zhijian >> --- >> migration/rdma.c | 27 ++- >> migration/trace-events | 1 + >> 2 files changed, 19 insertions(+), 9 deletions(-) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index 5c2d113aa94..8784b5f22a6 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -1123,15 +1123,21 @@ static int >> qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma) >> RDMALocalBlocks *local = >local_ram_blocks; >> >> for (i = 0; i < local->nb_blocks; i++) { >> +int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE; >> + >> +on_demand: >> local->block[i].mr = >> ibv_reg_mr(rdma->pd, >> local->block[i].local_host_addr, >> -local->block[i].length, >> -IBV_ACCESS_LOCAL_WRITE | >> -IBV_ACCESS_REMOTE_WRITE >> +local->block[i].length, access >> ); >> if (!local->block[i].mr) { >> -perror("Failed to register local dest ram block!"); >> +if (!(access & IBV_ACCESS_ON_DEMAND) && errno == ENOTSUP) { >> +access |= IBV_ACCESS_ON_DEMAND; >> +trace_qemu_rdma_register_odp_mr(local->block[i].block_name); >> +goto on_demand; > Wouldn't it be better to check first if the device supports ODP ? > Something like: > ret = ibv_exp_query_device(context, ); > if (dattr.exp_device_cap_flags & IBV_EXP_DEVICE_ODP)... Good idea ! > > Also, I am not (personally) too fond of the "on_demand" label usage here, > however I will let the maintainer/others decide. Indeed, how just repeating the ibv_reg_mr() instead of a 'go to' Thanks Zhijian > > Thanks, > Marcel > >> +} >> +perror("Failed to register local dest ram block!\n"); >> break; >> } >> rdma->total_registrations++; >> @@ -1215,15 +1221,18 @@ static int >> qemu_rdma_register_and_get_keys(RDMAContext *rdma, >>*/ >> if (!block->pmr[chunk]) { >> uint64_t len = chunk_end - chunk_start; >> +int access = rkey ? IBV_ACCESS_LOCAL_WRITE | >> IBV_ACCESS_REMOTE_WRITE : 0; >> >> trace_qemu_rdma_register_and_get_keys(len, chunk_start); >> >> -block->pmr[chunk] = ibv_reg_mr(rdma->pd, >> -chunk_start, len, >> -(rkey ? (IBV_ACCESS_LOCAL_WRITE | >> -IBV_ACCESS_REMOTE_WRITE) : 0)); >> - >> +on_demand: >> +block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access); >> if (!block->pmr[chunk]) { >> +if (!(access & IBV_ACCESS_ON_DEMAND) && errno == ENOTSUP) { >> +access |= IBV_ACCESS_ON_DEMAND; >> +trace_qemu_rdma_register_odp_mr(block->block_name); >> +goto on_demand; >> +} >> perror("Failed to register chunk!"); >> fprintf(stderr, "Chunk details: block: %d chunk index %d" >> " start %" PRIuPTR " end %" PRIuPTR >> diff --git a/migration/trace-events b/migration/trace-events >> index a1c0f034ab8..5f6aa580def 100644 >> --- a/migration/trace-events >> +++ b/migration/trace-events >> @@ -212,6 +212,7 @@ qemu_rdma_poll_write(const char *compstr, int64_t comp, >> int left, uint64_t block >> qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other >> completion %s (%" PRId64 ") received left %d" >> qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.." >> qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" >> PRIu64 " bytes @ %p" >> +qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand >> Paging memory region: %s" >> qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t >> offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64 >> qemu_rdma_registration_handle_finished(void) "" >> qemu_rdma_registration_handle_ram_blocks(void) "" >> -- >> 2.31.1 >> >> >> >> >
Re: [PATCH 2/2] docs/about: Unify the subject format
On 2021/8/20 18:18, Cornelia Huck wrote: On Fri, Aug 20 2021, Yanan Wang wrote: Unify the subject format in deprecated.rst to "since X.Y". Unify the subject format in removed-features.rst to "removed in X.Y". It seems unlikely that we will ever deprecate something in a stable release, and even more unlikely that we'll remove something in one, so the short versions look like the thing we want to standardize on. Signed-off-by: Yanan Wang --- docs/about/deprecated.rst | 56 - docs/about/removed-features.rst | 28 - 2 files changed, 42 insertions(+), 42 deletions(-) Unrelated to your patch, line 143 in removed-features.rst reads ``-vnc ...,tls=...``, ``-vnc ...,x509=...`` & ``-vnc ...,x509verify=...`` and is missing the release it was removed in (presumably 3.1?) Yes, this part of section was introduced by commit 29e0447551 (docs/about/removed-features: Document removed CLI options from QEMU v3.1), so I think a record of "removed in 3.1" is needed. I rechecked both deprecated.rst and removed-features.rst, and this is the only place where we are missing a release. I can fix this in a third patch. Anyway, Reviewed-by: Cornelia Huck . Thanks, Yanan
[PATCH v3] hw/i386/acpi-build: Get NUMA information from struct NumaState
Since commits aa57020774b ("numa: move numa global variable nb_numa_nodes into MachineState") and 7e721e7b10e ("numa: move numa global variable numa_info into MachineState"), we can get NUMA information completely from MachineState::numa_state. Remove PCMachineState::numa_nodes and PCMachineState::node_mem, since they are just copied from MachineState::numa_state. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Jingqi Liu --- hw/i386/acpi-build.c | 12 +++- hw/i386/pc.c | 9 - include/hw/i386/pc.h | 4 3 files changed, 7 insertions(+), 18 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 17836149fe..e3c9ad011e 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -1902,6 +1902,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) X86MachineState *x86ms = X86_MACHINE(machine); const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine); PCMachineState *pcms = PC_MACHINE(machine); +int nb_numa_nodes = machine->numa_state->num_nodes; +NodeInfo *numa_info = machine->numa_state->nodes; ram_addr_t hotplugabble_address_space_size = object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE, NULL); @@ -1945,9 +1947,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) next_base = 0; numa_start = table_data->len; -for (i = 1; i < pcms->numa_nodes + 1; ++i) { +for (i = 1; i < nb_numa_nodes + 1; ++i) { mem_base = next_base; -mem_len = pcms->node_mem[i - 1]; +mem_len = numa_info[i - 1].node_mem; next_base = mem_base + mem_len; /* Cut out the 640K hole */ @@ -1995,7 +1997,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) } slots = (table_data->len - numa_start) / sizeof *numamem; -for (; slots < pcms->numa_nodes + 2; slots++) { +for (; slots < nb_numa_nodes + 2; slots++) { numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS); } @@ -2011,7 +2013,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) if (hotplugabble_address_space_size) { numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, machine->device_memory->base, - hotplugabble_address_space_size, pcms->numa_nodes - 1, + hotplugabble_address_space_size, nb_numa_nodes - 1, MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED); } @@ -2513,7 +2515,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine) } } #endif -if (pcms->numa_nodes) { +if (machine->numa_state->num_nodes) { acpi_add_table(table_offsets, tables_blob); build_srat(tables_blob, tables->linker, machine); if (machine->numa_state->have_numa_distance) { diff --git a/hw/i386/pc.c b/hw/i386/pc.c index c2b9d62a35..adbc348488 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -800,18 +800,9 @@ void pc_machine_done(Notifier *notifier, void *data) void pc_guest_info_init(PCMachineState *pcms) { -int i; -MachineState *ms = MACHINE(pcms); X86MachineState *x86ms = X86_MACHINE(pcms); x86ms->apic_xrupt_override = true; -pcms->numa_nodes = ms->numa_state->num_nodes; -pcms->node_mem = g_malloc0(pcms->numa_nodes * -sizeof *pcms->node_mem); -for (i = 0; i < ms->numa_state->num_nodes; i++) { -pcms->node_mem[i] = ms->numa_state->nodes[i].node_mem; -} - pcms->machine_done.notify = pc_machine_done; qemu_add_machine_init_done_notifier(>machine_done); } diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 88dffe7517..31b334e0a4 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -47,10 +47,6 @@ typedef struct PCMachineState { bool default_bus_bypass_iommu; uint64_t max_fw_size; -/* NUMA information: */ -uint64_t numa_nodes; -uint64_t *node_mem; - /* ACPI Memory hotplug IO base address */ hwaddr memhp_io_base; } PCMachineState; -- 2.21.3
Re: [PATCH 2/2] migration/rdma: advise prefetch write for ODP region
Hi Marcel On 22/08/2021 16:39, Marcel Apfelbaum wrote: > Hi, > > On Sat, Jul 31, 2021 at 5:03 PM Li Zhijian wrote: >> The responder mr registering with ODP will sent RNR NAK back to >> the requester in the face of the page fault. >> - >> ibv_poll_cq wc.status=13 RNR retry counter exceeded! >> ibv_poll_cq wrid=WRITE RDMA! >> - >> ibv_advise_mr(3) helps to make pages present before the actual IO is >> conducted so that the responder does page fault as little as possible. >> >> Signed-off-by: Li Zhijian >> --- >> migration/rdma.c | 40 >> migration/trace-events | 1 + >> 2 files changed, 41 insertions(+) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index 8784b5f22a6..a2ad00d665f 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -1117,6 +1117,30 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma) >> return 0; >> } >> >> +/* >> + * ibv_advise_mr to avoid RNR NAK error as far as possible. >> + * The responder mr registering with ODP will sent RNR NAK back to >> + * the requester in the face of the page fault. >> + */ >> +static void qemu_rdma_advise_prefetch_write_mr(struct ibv_pd *pd, uint64_t >> addr, >> + uint32_t len, uint32_t lkey, >> + const char *name, bool wr) >> +{ >> +int ret; >> +int advice = wr ? IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE : >> + IBV_ADVISE_MR_ADVICE_PREFETCH; >> +struct ibv_sge sg_list = {.lkey = lkey, .addr = addr, .length = len}; >> + >> +ret = ibv_advise_mr(pd, advice, >> +IB_UVERBS_ADVISE_MR_FLAG_FLUSH, _list, 1); >> +/* ignore the error */ > Following > https://github.com/linux-rdma/rdma-core/blob/master/libibverbs/man/ibv_advise_mr.3.md > it looks like it is a best-effort optimization, > I don't see any down-sides to it. > However it seems like it is recommended to use > IBV_ADVISE_MR_FLAG_FLUSH in order to > increase the optimization chances. Good catch, i will update it soon. Thanks > > Anyway > > Reviewed-by: Marcel Apfelbaum > > Thanks, > Marcel > >
Re: [PATCH V2] block/rbd: implement bdrv_co_block_status
On Tue, Aug 10, 2021 at 3:41 PM Peter Lieven wrote: > > the qemu rbd driver currently lacks support for bdrv_co_block_status. > This results mainly in incorrect progress during block operations (e.g. > qemu-img convert with an rbd image as source). > > This patch utilizes the rbd_diff_iterate2 call from librbd to detect > allocated and unallocated (all zero areas). > > To avoid querying the ceph OSDs for the answer this is only done if > the image has the fast-diff features which depends on the object-map Hi Peter, Nit: "has the fast-diff feature which depends on the object-map and exclusive-lock features" > and exclusive-lock. In this case it is guaranteed that the information > is present in memory in the librbd client and thus very fast. > > If fast-diff is not available all areas are reported to be allocated > which is the current behaviour if bdrv_co_block_status is not implemented. > > Signed-off-by: Peter Lieven > --- > V1->V2: > - add commit comment [Stefano] > - use failed_post_open [Stefano] > - remove redundant assert [Stefano] > - add macro+comment for the magic -9000 value [Stefano] > - always set *file if its non NULL [Stefano] > > block/rbd.c | 125 > 1 file changed, 125 insertions(+) > > diff --git a/block/rbd.c b/block/rbd.c > index dcf82b15b8..8692e76f40 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -88,6 +88,7 @@ typedef struct BDRVRBDState { > char *namespace; > uint64_t image_size; > uint64_t object_size; > +uint64_t features; > } BDRVRBDState; > > typedef struct RBDTask { > @@ -983,6 +984,13 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict > *options, int flags, > s->image_size = info.size; > s->object_size = info.obj_size; > > +r = rbd_get_features(s->image, >features); > +if (r < 0) { > +error_setg_errno(errp, -r, "error getting image features from %s", > + s->image_name); > +goto failed_post_open; > +} The object-map and fast-diff features can be enabled/disabled while the image is open so this should probably go to qemu_rbd_co_block_status(). > + > /* If we are using an rbd snapshot, we must be r/o, otherwise > * leave as-is */ > if (s->snap != NULL) { > @@ -1259,6 +1267,122 @@ static ImageInfoSpecific > *qemu_rbd_get_specific_info(BlockDriverState *bs, > return spec_info; > } > > +typedef struct rbd_diff_req { > +uint64_t offs; > +uint64_t bytes; > +int exists; > +} rbd_diff_req; > + > +/* > + * rbd_diff_iterate2 allows to interrupt the exection by returning a negative > + * value in the callback routine. Choose a value that does not conflict with > + * an existing exitcode and return it if we want to prematurely stop the > + * execution because we detected a change in the allocation status. > + */ > +#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000 > + > +static int qemu_rbd_co_block_status_cb(uint64_t offs, size_t len, > + int exists, void *opaque) > +{ > +struct rbd_diff_req *req = opaque; > + > +assert(req->offs + req->bytes <= offs); > + > +if (req->exists && offs > req->offs + req->bytes) { > +/* > + * we started in an allocated area and jumped over an unallocated > area, > + * req->bytes contains the length of the allocated area before the > + * unallocated area. stop further processing. > + */ > +return QEMU_RBD_EXIT_DIFF_ITERATE2; > +} > +if (req->exists && !exists) { > +/* > + * we started in an allocated area and reached a hole. req->bytes > + * contains the length of the allocated area before the hole. > + * stop further processing. > + */ > +return QEMU_RBD_EXIT_DIFF_ITERATE2; > +} > +if (!req->exists && exists && offs > req->offs) { > +/* > + * we started in an unallocated area and hit the first allocated > + * block. req->bytes must be set to the length of the unallocated > area > + * before the allocated area. stop further processing. > + */ > +req->bytes = offs - req->offs; > +return QEMU_RBD_EXIT_DIFF_ITERATE2; > +} > + > +/* > + * assert that we catched all cases above and allocation state has not catched -> caught > + * changed during callbacks. > + */ > +assert(exists == req->exists || !req->bytes); > +req->exists = exists; > + > +/* > + * assert that we either return an unallocated block or have got > callbacks > + * for all allocated blocks present. > + */ > +assert(!req->exists || offs == req->offs + req->bytes); > +req->bytes = offs + len - req->offs; > + > +return 0; > +} > + > +static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, > + bool want_zero, int64_t > offset, > + int64_t
Re: [PATCH v3 63/66] tcg/tci: Support raising sigbus for user-only
On 8/22/21 5:32 AM, Peter Maydell wrote: On Sun, 22 Aug 2021 at 08:59, Richard Henderson wrote: On 8/20/21 3:14 AM, Peter Maydell wrote: @@ -377,11 +381,11 @@ static uint64_t tci_qemu_ld(CPUArchState *env, target_ulong taddr, static void tci_qemu_st(CPUArchState *env, target_ulong taddr, uint64_t val, MemOpIdx oi, const void *tb_ptr) { -MemOp mop = get_memop(oi) & (MO_BSWAP | MO_SSIZE); +MemOp mop = get_memop(oi); uintptr_t ra = (uintptr_t)tb_ptr; Don't you need this bit in tci_qemu_st() as well ? Which bit isn't present in st as well? There's missing hunks in your reply, but afaics they're the same. https://patchew.org/QEMU/20210818191920.390759-1-richard.hender...@linaro.org/20210818191920.390759-64-richard.hender...@linaro.org/ I had the function name wrong, but only the tci_qemu_st() change has this bit: -MemOp mop = get_memop(oi) & (MO_BSWAP | MO_SSIZE); +MemOp mop = get_memop(oi); Ah yes, thanks. r~
Re: Testing a microcontroller emulation by loading the binary on incomplete Flash emulation
On Sun, 22 Aug 2021 at 15:37, Gautam Bhat wrote: > > Hi, > > I am to implement a very simple microcontroller for my understanding > of Qemu development. This microcontroller runs its code from > programmable flash which is bit-, byte- and word addressable. To do > some initial tests of my nascent microcontroller implementation I > would like to load a very simple program binary. Is there a way to > load this binary and start execution without emulating Flash > controller and memory? Just create a plain old RAM memory region, and then load the guest binary into it with the 'generic loader' (which can take an ELF file or a raw binary). -- PMM
Re: [For 6.1 PATCH] hw/arm: xilinx_zynq: Disconnect the UART clocks temporarily
On Sun, Aug 22, 2021 at 2:14 AM Peter Maydell wrote: > > On Sat, 21 Aug 2021 at 16:45, Bin Meng wrote: > > > > As of today, when booting upstream U-Boot for Xilinx Zynq, the UART > > does not receive anything. Initial debugging shows that the UART clock > > frequency is 0 somehow which prevents the UART from receiving anything. > > Note the U-Boot can still output data to the UART tx fifo, which should > > not happen, as the design seems to prevent the data transmission when > > clock is not enabled but somehow it only applies to the Rx side. > > > > For anyone who is interested to give a try, here is the U-Boot defconfig: > > $ make xilinx_zynq_virt_defconfig > > > > and QEMU commands to test U-Boot: > > $ qemu-system-arm -M xilinx-zynq-a9 -m 1G -display none -serial null > > -serial stdio \ > > -device loader,file=u-boot-dtb.bin,addr=0x400,cpu-num=0 > > > > Note U-Boot used to boot properly in QEMU 4.2.0 which is the QEMU > > version used in current U-Boot's CI testing. The UART clock changes > > were introduced by the following 3 commits: > > > > 38867cb7ec90 ("hw/misc/zynq_slcr: add clock generation for uarts") > > b636db306e06 ("hw/char/cadence_uart: add clock support") > > 5b49a34c6800 ("hw/arm/xilinx_zynq: connect uart clocks to slcr") > > > > Looks like we don't have enough time to figure out a proper fix before > > 6.1.0 release date, let's disconnect the UART clocks temporarily. > > This is too late for 6.1 regardless, I'm afraid. That's too bad :( I figured out a proper fix, and will send it out soon. Regards, Bin
vmgenid no maintainers
Hi : All of you have contributed to vmgenid device at some point. Unfortunately, this code currently has no maintainers. I had looked into this feature as a part of another project for a previous company, hence noticed it. Will any of you would want to be the maintainer of this codebase in Qemu? I do not know all the parts of this feature and the spec well enough to be confident to be a maintainer but I can be a reviewer if you want. Thanks ani
Re: [PATCH 0/3] target/arm: Reduced-IPA space and highmem=off fixes
On 2021-08-22 15:44, Marc Zyngier wrote: With the availability of a fruity range of arm64 systems, it becomes obvious that QEMU doesn't deal very well with limited IPA ranges when used as a front-end for KVM. This short series aims at making usable on such systems: - the first patch makes the creation of a scratch VM IPA-limit aware - the second one actually removes the highmem devices from the computed IPA range when highmem=off - the last one addresses an imprecision in the documentation for the highmem option This has been tested on an M1-based Mac-mini running Linux v5.14-rc6. I realise I haven't been very clear in my description of the above. With this series, using 'highmem=off' results in a usable VM, while sticking to the default 'highmem=on' still generates an error. M. -- Jazz is not dead. It just smells funny...
[PATCH 0/3] target/arm: Reduced-IPA space and highmem=off fixes
With the availability of a fruity range of arm64 systems, it becomes obvious that QEMU doesn't deal very well with limited IPA ranges when used as a front-end for KVM. This short series aims at making usable on such systems: - the first patch makes the creation of a scratch VM IPA-limit aware - the second one actually removes the highmem devices from the computed IPA range when highmem=off - the last one addresses an imprecision in the documentation for the highmem option This has been tested on an M1-based Mac-mini running Linux v5.14-rc6. Marc Zyngier (3): hw/arm/virt: KVM: Probe for KVM_CAP_ARM_VM_IPA_SIZE when creating scratch VM hw/arm/virt: Honor highmem setting when computing highest_gpa docs/system/arm/virt: Fix documentation for the 'highmem' option docs/system/arm/virt.rst | 6 +++--- hw/arm/virt.c| 10 +++--- target/arm/kvm.c | 7 ++- 3 files changed, 16 insertions(+), 7 deletions(-) -- 2.30.2
[PATCH 1/3] hw/arm/virt: KVM: Probe for KVM_CAP_ARM_VM_IPA_SIZE when creating scratch VM
Although we probe for the IPA limits imposed by KVM (and the hardware) when computing the memory map, we still use the old style '0' when creating a scratch VM in kvm_arm_create_scratch_host_vcpu(). On systems that are severely IPA challenged (such as the Apple M1), this results in a failure as KVM cannot use the default 40bit that '0' represents. Instead, probe for the extension and use the reported IPA limit if available. Cc: Andrew Jones Cc: Eric Auger Cc: Peter Maydell Signed-off-by: Marc Zyngier --- target/arm/kvm.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/target/arm/kvm.c b/target/arm/kvm.c index d8381ba224..cc3371a99b 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -70,12 +70,17 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try, struct kvm_vcpu_init *init) { int ret = 0, kvmfd = -1, vmfd = -1, cpufd = -1; +int max_vm_pa_size; kvmfd = qemu_open_old("/dev/kvm", O_RDWR); if (kvmfd < 0) { goto err; } -vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); +max_vm_pa_size = ioctl(kvmfd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE); +if (max_vm_pa_size < 0) { +max_vm_pa_size = 0; +} +vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size); if (vmfd < 0) { goto err; } -- 2.30.2
[PATCH 3/3] docs/system/arm/virt: Fix documentation for the 'highmem' option
The documentation for the 'highmem' option indicates that it controls the placement of both devices and RAM. The actual behaviour of QEMU seems to be that RAM is allowed to go beyond the 4GiB limit, and that only devices are constraint by this option. Align the documentation with the actual behaviour. Cc: Andrew Jones Cc: Eric Auger Cc: Peter Maydell Signed-off-by: Marc Zyngier --- docs/system/arm/virt.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 59acf0eeaf..e206e7565d 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -86,9 +86,9 @@ mte Arm Memory Tagging Extensions. The default is ``off``. highmem - Set ``on``/``off`` to enable/disable placing devices and RAM in physical - address space above 32 bits. The default is ``on`` for machine types - later than ``virt-2.12``. + Set ``on``/``off`` to enable/disable placing devices in physical address + space above 32 bits. RAM in excess of 3GiB will always be placed above + 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. -- 2.30.2
[PATCH 2/3] hw/arm/virt: Honor highmem setting when computing highest_gpa
Even when the VM is configured with highmem=off, the highest_gpa field includes devices that are above the 4GiB limit, which is what highmem=off is supposed to enforce. This leads to failures in virt_kvm_type() on systems that have a crippled IPA range, as the reported IPA space is larger than what it should be. Instead, honor the user-specified limit to only use the devices at the lowest end of the spectrum. Note that this doesn't affect memory, which is still allowed to go beyond 4GiB with highmem=on configurations. Cc: Andrew Jones Cc: Eric Auger Cc: Peter Maydell Signed-off-by: Marc Zyngier --- hw/arm/virt.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 81eda46b0b..bc189e30b8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1598,7 +1598,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_memmap(VirtMachineState *vms) { MachineState *ms = MACHINE(vms); -hwaddr base, device_memory_base, device_memory_size; +hwaddr base, device_memory_base, device_memory_size, ceiling; int i; vms->memmap = extended_memmap; @@ -1625,7 +1625,7 @@ static void virt_set_memmap(VirtMachineState *vms) device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB; /* Base address of the high IO region */ -base = device_memory_base + ROUND_UP(device_memory_size, GiB); +ceiling = base = device_memory_base + ROUND_UP(device_memory_size, GiB); if (base < device_memory_base) { error_report("maxmem/slots too huge"); exit(EXIT_FAILURE); @@ -1642,7 +1642,11 @@ static void virt_set_memmap(VirtMachineState *vms) vms->memmap[i].size = size; base += size; } -vms->highest_gpa = base - 1; +if (vms->highmem) { + /* If we have highmem, move the IPA limit to the top */ + ceiling = base; +} +vms->highest_gpa = ceiling - 1; if (device_memory_size > 0) { ms->device_memory = g_malloc0(sizeof(*ms->device_memory)); ms->device_memory->base = device_memory_base; -- 2.30.2
Testing a microcontroller emulation by loading the binary on incomplete Flash emulation
Hi, I am to implement a very simple microcontroller for my understanding of Qemu development. This microcontroller runs its code from programmable flash which is bit-, byte- and word addressable. To do some initial tests of my nascent microcontroller implementation I would like to load a very simple program binary. Is there a way to load this binary and start execution without emulating Flash controller and memory? Thanks.
[PATCH] virtio-vga, stubs: Fix qemu failing on virio-vga with blobs on when qemu compiled with modules enabled
>From f972dab518c6581a83f397c35b3d09b2ef6de674 Mon Sep 17 00:00:00 2001 From: max Date: Sun, 22 Aug 2021 14:02:48 +0300 Subject: [PATCH] virtio-vga, stubs: Fix qemu failing on virio-vga with blobs on when qemu compiled with modules enabled Resolves: https://gitlab.com/qemu-project/qemu/-/issues/553 Signed-off-by: max --- stubs/meson.build | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/stubs/meson.build b/stubs/meson.build index d3fa864..beb737f 100644 --- a/stubs/meson.build +++ b/stubs/meson.build @@ -53,7 +53,7 @@ if have_system stub_ss.add(files('semihost.c')) stub_ss.add(files('usb-dev-stub.c')) stub_ss.add(files('xen-hw-stub.c')) - stub_ss.add(files('virtio-gpu-udmabuf.c')) + stub_ss.add(when: 'CONFIG_LINUX', if_false: files('virtio-gpu-udmabuf.c')) else stub_ss.add(files('qdev.c')) endif -- 2.32.0
[PATCH v2 2/5] qemu/qarray.h: check scalar type in QARRAY_CREATE()
Make sure at compile time that the scalar type of the array requested to be created via QARRAY_CREATE() matches the scalar type of the passed auto reference variable (unique pointer). Suggested-by: Richard Henderson Signed-off-by: Christian Schoenebeck --- include/qemu/qarray.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/include/qemu/qarray.h b/include/qemu/qarray.h index 9885e5e9ed..643c8877c5 100644 --- a/include/qemu/qarray.h +++ b/include/qemu/qarray.h @@ -27,6 +27,8 @@ #ifndef QEMU_QARRAY_H #define QEMU_QARRAY_H +#include "qemu/compiler.h" + /** * QArray provides a mechanism to access arrays in common C-style (e.g. by * square bracket [] operator) in conjunction with reference variables that @@ -145,6 +147,10 @@ * @param len - amount of array elements to be allocated immediately */ #define QARRAY_CREATE(scalar_type, auto_var, len) \ +QEMU_BUILD_BUG_MSG( \ +!__builtin_types_compatible_p(scalar_type, typeof(*auto_var)), \ +"QArray scalar type mismatch" \ +); \ qarray_create_##scalar_type((_var), len) #endif /* QEMU_QARRAY_H */ -- 2.20.1
[PATCH v2 4/5] 9pfs: make V9fsPath usable via QArray API
Signed-off-by: Christian Schoenebeck --- fsdev/file-op-9p.h | 2 ++ hw/9pfs/9p.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 42f677cf38..7630f0e538 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -18,6 +18,7 @@ #include #include #include "qemu-fsdev-throttle.h" +#include "qemu/qarray.h" #define SM_LOCAL_MODE_BITS0600 #define SM_LOCAL_DIR_MODE_BITS0700 @@ -105,6 +106,7 @@ struct V9fsPath { uint16_t size; char *data; }; +DECLARE_QARRAY_TYPE(V9fsPath); typedef union V9fsFidOpenState V9fsFidOpenState; diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index c857b31321..b59572fa79 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -50,6 +50,8 @@ enum { Oappend = 0x80, }; +DEFINE_QARRAY_TYPE(V9fsPath, v9fs_path_free); + static ssize_t pdu_marshal(V9fsPDU *pdu, size_t offset, const char *fmt, ...) { ssize_t ret; -- 2.20.1
[PATCH v2 5/5] 9pfs: use QArray in v9fs_walk()
Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 17 + 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index b59572fa79..91062ee4d6 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -1707,13 +1707,14 @@ static void coroutine_fn v9fs_walk(void *opaque) int name_idx; g_autofree V9fsQID *qids = NULL; int i, err = 0; -V9fsPath dpath, path, *pathes = NULL; +V9fsPath dpath, path; +QArrayRef(V9fsPath) pathes = NULL; uint16_t nwnames; struct stat stbuf, fidst; g_autofree struct stat *stbufs = NULL; size_t offset = 7; int32_t fid, newfid; -V9fsString *wnames = NULL; +QArrayRef(V9fsString) wnames = NULL; V9fsFidState *fidp; V9fsFidState *newfidp = NULL; V9fsPDU *pdu = opaque; @@ -1734,10 +1735,10 @@ static void coroutine_fn v9fs_walk(void *opaque) goto out_nofid; } if (nwnames) { -wnames = g_new0(V9fsString, nwnames); +QARRAY_CREATE(V9fsString, wnames, nwnames); qids = g_new0(V9fsQID, nwnames); stbufs = g_new0(struct stat, nwnames); -pathes = g_new0(V9fsPath, nwnames); +QARRAY_CREATE(V9fsPath, pathes, nwnames); for (i = 0; i < nwnames; i++) { err = pdu_unmarshal(pdu, offset, "s", [i]); if (err < 0) { @@ -1869,14 +1870,6 @@ out: v9fs_path_free(); out_nofid: pdu_complete(pdu, err); -if (nwnames && nwnames <= P9_MAXWELEM) { -for (name_idx = 0; name_idx < nwnames; name_idx++) { -v9fs_string_free([name_idx]); -v9fs_path_free([name_idx]); -} -g_free(wnames); -g_free(pathes); -} } static int32_t coroutine_fn get_iounit(V9fsPDU *pdu, V9fsPath *path) -- 2.20.1
[PATCH v2 3/5] 9pfs: make V9fsString usable via QArray API
Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.c | 2 ++ fsdev/9p-marshal.h | 3 +++ 2 files changed, 5 insertions(+) diff --git a/fsdev/9p-marshal.c b/fsdev/9p-marshal.c index a01bba6908..fbfc2a62cd 100644 --- a/fsdev/9p-marshal.c +++ b/fsdev/9p-marshal.c @@ -18,6 +18,8 @@ #include "9p-marshal.h" +DEFINE_QARRAY_TYPE(V9fsString, v9fs_string_free); + void v9fs_string_free(V9fsString *str) { g_free(str->data); diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index ceaf2f521e..7229e4e617 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -1,10 +1,13 @@ #ifndef QEMU_9P_MARSHAL_H #define QEMU_9P_MARSHAL_H +#include "qemu/qarray.h" + typedef struct V9fsString { uint16_t size; char *data; } V9fsString; +DECLARE_QARRAY_TYPE(V9fsString); typedef struct V9fsQID { uint8_t type; -- 2.20.1
[PATCH v2 1/5] qemu/qarray.h: introduce QArray
Implements deep auto free of arrays while retaining common C-style squared bracket access. Signed-off-by: Christian Schoenebeck --- include/qemu/qarray.h | 150 ++ 1 file changed, 150 insertions(+) create mode 100644 include/qemu/qarray.h diff --git a/include/qemu/qarray.h b/include/qemu/qarray.h new file mode 100644 index 00..9885e5e9ed --- /dev/null +++ b/include/qemu/qarray.h @@ -0,0 +1,150 @@ +/* + * QArray - deep auto free C-array + * + * Copyright (c) 2021 Crudebyte + * + * Authors: + * Christian Schoenebeck + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ +#ifndef QEMU_QARRAY_H +#define QEMU_QARRAY_H + +/** + * QArray provides a mechanism to access arrays in common C-style (e.g. by + * square bracket [] operator) in conjunction with reference variables that + * perform deep auto free of the array when leaving the scope of the auto + * reference variable. That means not only is the array itself automatically + * freed, but also memory dynamically allocated by the individual array + * elements. + * + * Example: + * + * Consider the following user struct @c Foo which shall be used as scalar + * (element) type of an array: + * @code + * typedef struct Foo { + * int i; + * char *s; + * } Foo; + * @endcode + * and assume it has the following function to free memory allocated by @c Foo + * instances: + * @code + * void free_foo(Foo *foo) { + * free(foo->s); + * } + * @endcode + * Add the following to a shared header file: + * @code + * DECLARE_QARRAY_TYPE(Foo); + * @endcode + * and the following to a C unit file: + * @code + * DEFINE_QARRAY_TYPE(Foo, free_foo); + * @endcode + * Finally the array may then be used like this: + * @code + * void doSomething(int n) { + * QArrayRef(Foo) foos = NULL; + * QARRAY_CREATE(Foo, foos, n); + * for (size_t i = 0; i < n; ++i) { + * foos[i].i = i; + * foos[i].s = calloc(4096, 1); + * snprintf(foos[i].s, 4096, "foo %d", i); + * } + * } + * @endcode + */ + +/** + * Declares an array type for the passed @a scalar_type. + * + * This is typically used from a shared header file. + * + * @param scalar_type - type of the individual array elements + */ +#define DECLARE_QARRAY_TYPE(scalar_type) \ +typedef struct QArray##scalar_type { \ +size_t len; \ +scalar_type first[]; \ +} QArray##scalar_type; \ +\ +void qarray_create_##scalar_type(scalar_type **auto_var, size_t len); \ +void qarray_auto_free_##scalar_type(scalar_type **auto_var); \ + +/** + * Defines an array type for the passed @a scalar_type and appropriate + * @a scalar_cleanup_func. + * + * This is typically used from a C unit file. + * + * @param scalar_type - type of the individual array elements + * @param scalar_cleanup_func - appropriate function to free memory dynamically + * allocated by individual array elements before + */ +#define DEFINE_QARRAY_TYPE(scalar_type, scalar_cleanup_func) \ +void qarray_create_##scalar_type(scalar_type **auto_var, size_t len) \ +{ \ +qarray_auto_free_##scalar_type(auto_var); \ +QArray##scalar_type *arr = g_malloc0(sizeof(QArray##scalar_type) + \ +len * sizeof(scalar_type)); \ +arr->len = len; \ +*auto_var = >first[0]; \ +} \ +\ +void qarray_auto_free_##scalar_type(scalar_type **auto_var) \ +{ \ +scalar_type *first = (*auto_var); \ +if (!first) { \ +return; \ +} \ +QArray##scalar_type *arr = (QArray##scalar_type *) ( \ +((char *)first) - offsetof(QArray##scalar_type, first) \ +); \ +for (size_t i = 0; i < arr->len; ++i) { \ +scalar_cleanup_func(>first[i]); \ +} \ +g_free(arr); \ +} \ + +/** + * Used to declare a reference variable (unique pointer) for an array. After + * leaving the scope of the reference variable,
[PATCH v2 0/5] introduce QArray
Patches 1 and 2 introduce include/qemu/qarray.h which implements a deep auto free mechanism for arrays. Unlike GArray it does not require special macros, function calls or member dereferencing to access the individual array elements. So existing C-style array code can be retained with only very little changes. In this initial version QArray only supports the concept of unique pointers, i.e. it does not support reference counting. The array (and all dynamically allocated memory of individual array elements) is auto freed once execution leaves the scope of the reference variable (unique pointer) associated with the array. Patches 3..5 are provided (e.g. as example) for 9p being the first user of this new QArray API. These particular patches 3..5 are rebased on my current 9p queue: https://github.com/cschoenebeck/qemu/commits/9p.next which are basically just the following two queued patches: https://github.com/cschoenebeck/qemu/commit/7772715d43908235940f5b7dec68d0458b1ccdf4 https://github.com/cschoenebeck/qemu/commit/838b55e392ea7d52e714fdba1db777f658aee2cc v1 -> v2: * Minor API comment changes [patch 1]. * Perform strong type check by using __builtin_types_compatible_p() instead of a weak check using sizeof() [patch 2]. Christian Schoenebeck (5): qemu/qarray.h: introduce QArray qemu/qarray.h: check scalar type in QARRAY_CREATE() 9pfs: make V9fsString usable via QArray API 9pfs: make V9fsPath usable via QArray API 9pfs: use QArray in v9fs_walk() fsdev/9p-marshal.c| 2 + fsdev/9p-marshal.h| 3 + fsdev/file-op-9p.h| 2 + hw/9pfs/9p.c | 19 ++--- include/qemu/qarray.h | 156 ++ 5 files changed, 170 insertions(+), 12 deletions(-) create mode 100644 include/qemu/qarray.h -- 2.20.1
Re: [qemu-web PATCH] Add a blog post about FUSE block exports
On 20/08/2021 09.56, Hanna Reitz wrote: On 19.08.21 18:23, Stefan Hajnoczi wrote: On Thu, Aug 19, 2021 at 12:25:01PM +0200, Hanna Reitz wrote: This post explains when FUSE block exports are useful, how they work, and that it is fun to export an image file on its own path so it looks like your image file (in whatever format it was) is a raw image now. [...] This is a long blog post. One idea is to show a quickstart qemu-storage-daemon FUSE export command-line in the beginning before explaining all the details. That way people who just want to see what this is about can get an idea without learning all the background first. Sounds good, will do. Thanks! Oops, sorry, I just already pushed your patch to the repository since I did not notice this conversation in time. Next time, please make sure to put myself (and Paolo) on CC: when sending updates for qemu-web, otherwise I might not notice the updates :-/ Thomas
Re: [PATCH 1/5] qemu/qarray.h: introduce QArray
On Samstag, 21. August 2021 22:18:18 CEST Christian Schoenebeck wrote: > Implements deep auto free of arrays while retaining common C-style > squared bracket access. > > Signed-off-by: Christian Schoenebeck > --- As I'm going to send a v2 anyway, I'll also just do some minor API comment changes in this patch, specifically ... > include/qemu/qarray.h | 148 ++ > 1 file changed, 148 insertions(+) > create mode 100644 include/qemu/qarray.h > > diff --git a/include/qemu/qarray.h b/include/qemu/qarray.h > new file mode 100644 > index 00..230a556e81 > --- /dev/null > +++ b/include/qemu/qarray.h > @@ -0,0 +1,148 @@ > +/* > + * QArray - deep auto free C-array > + * > + * Copyright (c) 2021 Crudebyte > + * > + * Authors: > + * Christian Schoenebeck > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > copy + * of this software and associated documentation files (the > "Software"), to deal + * in the Software without restriction, including > without limitation the rights + * to use, copy, modify, merge, publish, > distribute, sublicense, and/or sell + * copies of the Software, and to > permit persons to whom the Software is + * furnished to do so, subject to > the following conditions: > + * > + * The above copyright notice and this permission notice shall be included > in + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS > OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. > IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY > CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, > TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE > SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. > + */ > +#ifndef QEMU_QARRAY_H > +#define QEMU_QARRAY_H > + > +/** > + * QArray provides a mechanism to access arrays in common C-style (e.g. by > + * square bracket [] operator) in conjunction with reference variables that > + * perform deep auto free of the array when leaving the scope of the auto > + * reference variable. That means not only is the array itself > automatically + * freed, but also memory dynamically allocated by the > individual array + * elements. > + * > + * Example: > + * > + * Consider the following user struct @c Foo which shall be used as scalar > + * (element) type of an array: > + * @code > + * typedef struct Foo { > + * int i; > + * char *s; > + * } Foo; > + * @endcode > + * and assume it has the following function to free memory allocated by @c > Foo + * instances: > + * @code > + * void free_foo(Foo *foo) { > + * free(foo->s); > + * } > + * @endcode > + * Add the following to a shared header file: > + * @code > + * DECLARE_QARRAY_TYPE(Foo); > + * @endcode > + * and the following to a C unit file: > + * @code > + * DEFINE_QARRAY_TYPE(Foo, free_foo); > + * @endcode > + * Finally the array may then be used like this: > + * @code > + * void doSomething(int n) { > + * QArrayRef(Foo) foos = NULL; > + * QARRAY_CREATE(Foo, foos, n); > + * for (size_t i = 0; i < n; ++i) { > + * foos[i].i = i; > + * foos[i].s = calloc(4096, 1); > + * snprintf(foos[i].s, 4096, "foo %d", i); > + * } > + * } > + * @endcode > + */ > + > +/** > + * Declares an array for the passed @a scalar_type. "Declares an array *type* ...". To not confuse it with declaring an array instance/variable. > + * > + * This is typically used from a shared header file. > + * > + * @param scalar_type - type of the individual array elements > + */ > +#define DECLARE_QARRAY_TYPE(scalar_type) \ > +typedef struct QArray##scalar_type { \ > +size_t len; \ > +scalar_type first[]; \ > +} QArray##scalar_type; \ > +\ > +void qarray_create_##scalar_type(scalar_type **auto_var, size_t len); \ > +void qarray_auto_free_##scalar_type(scalar_type **auto_var); \ + > +/** > + * Defines an array for the passed @a scalar_type and appropriate > + * @a scalar_cleanup_func. Likewise "Defines an array *type* ...". Plus missing: "This is typically used from a C unit file." > + * > + * @param scalar_type - type of the individual array elements > + * @param scalar_cleanup_func - appropriate function to free memory > dynamically + * allocated by individual array > elements before + */ > +#define DEFINE_QARRAY_TYPE(scalar_type, scalar_cleanup_func) \ > +void qarray_create_##scalar_type(scalar_type **auto_var, size_t len) \ > +{ \ > +qarray_auto_free_##scalar_type(auto_var); \ > +QArray##scalar_type *arr = g_malloc0(sizeof(QArray##scalar_type) + > \ +len * sizeof(scalar_type)); \ > +arr->len = len; \ > +*auto_var = >first[0]; \ > +} \ > +\ > +void
Re: [PATCH v3 63/66] tcg/tci: Support raising sigbus for user-only
On Sun, 22 Aug 2021 at 08:59, Richard Henderson wrote: > > On 8/20/21 3:14 AM, Peter Maydell wrote: > >> @@ -377,11 +381,11 @@ static uint64_t tci_qemu_ld(CPUArchState *env, > >> target_ulong taddr, > >> static void tci_qemu_st(CPUArchState *env, target_ulong taddr, uint64_t > >> val, > >> MemOpIdx oi, const void *tb_ptr) > >> { > >> -MemOp mop = get_memop(oi) & (MO_BSWAP | MO_SSIZE); > >> +MemOp mop = get_memop(oi); > >> uintptr_t ra = (uintptr_t)tb_ptr; > > > > Don't you need this bit in tci_qemu_st() as well ? > > Which bit isn't present in st as well? > There's missing hunks in your reply, but afaics they're the same. https://patchew.org/QEMU/20210818191920.390759-1-richard.hender...@linaro.org/20210818191920.390759-64-richard.hender...@linaro.org/ I had the function name wrong, but only the tci_qemu_st() change has this bit: -MemOp mop = get_memop(oi) & (MO_BSWAP | MO_SSIZE); +MemOp mop = get_memop(oi); -- PMM
Re: [PATCH 2/5] qemu/qarray.h: weak scalar type check in QARRAY_CREATE()
On Sonntag, 22. August 2021 06:11:58 CEST Richard Henderson wrote: > On 8/21/21 1:30 PM, Christian Schoenebeck wrote: > > Unfortunately something like > > > >_Static_assert(typeof(a) == typeof(b), "type mismatch"); > > > > is currently not suported by C. So for the time being at least > > check that the size of the scalar types match at compile time. > > Did you try > _Static_assert(__builtin_types_compatible_p(X, Y), "type mismatch"); > > > r~ Ah, you are right. I was trying it, but now as you pointed me at it again, I realized I was just missing something. The specific use case here is like: struct Foo { ... } Foo; Foo *var; _Static_assert(__builtin_types_compatible_p(Foo, typeof(*var)), "type mismatch"); So I was missing the typeof() keyword to deduce the scalar type from the passed variable. I'll send a v2. Thanks! Best regards, Christian Schoenebeck
Re: [PATCH 1/2] migration/rdma: Try to register On-Demand Paging memory region
Hi On Sat, Jul 31, 2021 at 5:00 PM Li Zhijian wrote: > > Previously, for the fsdax mem-backend-file, it will register failed with > Operation not supported. In this case, we can try to register it with > On-Demand Paging[1] like what rpma_mr_reg() does on rpma[2]. > > [1]: > https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x > [2]: http://pmem.io/rpma/manpages/v0.9.0/rpma_mr_reg.3 > Signed-off-by: Li Zhijian > --- > migration/rdma.c | 27 ++- > migration/trace-events | 1 + > 2 files changed, 19 insertions(+), 9 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index 5c2d113aa94..8784b5f22a6 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -1123,15 +1123,21 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext > *rdma) > RDMALocalBlocks *local = >local_ram_blocks; > > for (i = 0; i < local->nb_blocks; i++) { > +int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE; > + > +on_demand: > local->block[i].mr = > ibv_reg_mr(rdma->pd, > local->block[i].local_host_addr, > -local->block[i].length, > -IBV_ACCESS_LOCAL_WRITE | > -IBV_ACCESS_REMOTE_WRITE > +local->block[i].length, access > ); > if (!local->block[i].mr) { > -perror("Failed to register local dest ram block!"); > +if (!(access & IBV_ACCESS_ON_DEMAND) && errno == ENOTSUP) { > +access |= IBV_ACCESS_ON_DEMAND; > +trace_qemu_rdma_register_odp_mr(local->block[i].block_name); > +goto on_demand; Wouldn't it be better to check first if the device supports ODP ? Something like: ret = ibv_exp_query_device(context, ); if (dattr.exp_device_cap_flags & IBV_EXP_DEVICE_ODP)... Also, I am not (personally) too fond of the "on_demand" label usage here, however I will let the maintainer/others decide. Thanks, Marcel > +} > +perror("Failed to register local dest ram block!\n"); > break; > } > rdma->total_registrations++; > @@ -1215,15 +1221,18 @@ static int > qemu_rdma_register_and_get_keys(RDMAContext *rdma, > */ > if (!block->pmr[chunk]) { > uint64_t len = chunk_end - chunk_start; > +int access = rkey ? IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE > : 0; > > trace_qemu_rdma_register_and_get_keys(len, chunk_start); > > -block->pmr[chunk] = ibv_reg_mr(rdma->pd, > -chunk_start, len, > -(rkey ? (IBV_ACCESS_LOCAL_WRITE | > -IBV_ACCESS_REMOTE_WRITE) : 0)); > - > +on_demand: > +block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access); > if (!block->pmr[chunk]) { > +if (!(access & IBV_ACCESS_ON_DEMAND) && errno == ENOTSUP) { > +access |= IBV_ACCESS_ON_DEMAND; > +trace_qemu_rdma_register_odp_mr(block->block_name); > +goto on_demand; > +} > perror("Failed to register chunk!"); > fprintf(stderr, "Chunk details: block: %d chunk index %d" > " start %" PRIuPTR " end %" PRIuPTR > diff --git a/migration/trace-events b/migration/trace-events > index a1c0f034ab8..5f6aa580def 100644 > --- a/migration/trace-events > +++ b/migration/trace-events > @@ -212,6 +212,7 @@ qemu_rdma_poll_write(const char *compstr, int64_t comp, > int left, uint64_t block > qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other > completion %s (%" PRId64 ") received left %d" > qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.." > qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" > PRIu64 " bytes @ %p" > +qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand > Paging memory region: %s" > qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t > offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64 > qemu_rdma_registration_handle_finished(void) "" > qemu_rdma_registration_handle_ram_blocks(void) "" > -- > 2.31.1 > > > >
Re: [PATCH 2/2] migration/rdma: advise prefetch write for ODP region
Hi, On Sat, Jul 31, 2021 at 5:03 PM Li Zhijian wrote: > > The responder mr registering with ODP will sent RNR NAK back to > the requester in the face of the page fault. > - > ibv_poll_cq wc.status=13 RNR retry counter exceeded! > ibv_poll_cq wrid=WRITE RDMA! > - > ibv_advise_mr(3) helps to make pages present before the actual IO is > conducted so that the responder does page fault as little as possible. > > Signed-off-by: Li Zhijian > --- > migration/rdma.c | 40 > migration/trace-events | 1 + > 2 files changed, 41 insertions(+) > > diff --git a/migration/rdma.c b/migration/rdma.c > index 8784b5f22a6..a2ad00d665f 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -1117,6 +1117,30 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma) > return 0; > } > > +/* > + * ibv_advise_mr to avoid RNR NAK error as far as possible. > + * The responder mr registering with ODP will sent RNR NAK back to > + * the requester in the face of the page fault. > + */ > +static void qemu_rdma_advise_prefetch_write_mr(struct ibv_pd *pd, uint64_t > addr, > + uint32_t len, uint32_t lkey, > + const char *name, bool wr) > +{ > +int ret; > +int advice = wr ? IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE : > + IBV_ADVISE_MR_ADVICE_PREFETCH; > +struct ibv_sge sg_list = {.lkey = lkey, .addr = addr, .length = len}; > + > +ret = ibv_advise_mr(pd, advice, > +IB_UVERBS_ADVISE_MR_FLAG_FLUSH, _list, 1); > +/* ignore the error */ > +if (ret) { > +trace_qemu_rdma_advise_mr(name, len, addr, strerror(errno)); > +} else { > +trace_qemu_rdma_advise_mr(name, len, addr, "successed"); > +} > +} > + > static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma) > { > int i; > @@ -1140,6 +1164,17 @@ on_demand: > perror("Failed to register local dest ram block!\n"); > break; > } > + > +if (access & IBV_ACCESS_ON_DEMAND) { > +qemu_rdma_advise_prefetch_write_mr(rdma->pd, > + (uintptr_t) > + > local->block[i].local_host_addr, > + local->block[i].length, > + local->block[i].mr->lkey, > + local->block[i].block_name, > + true); > +} > + > rdma->total_registrations++; > } > > @@ -1244,6 +1279,11 @@ on_demand: > rdma->total_registrations); > return -1; > } > +if (access & IBV_ACCESS_ON_DEMAND) { > +qemu_rdma_advise_prefetch_write_mr(rdma->pd, > (uintptr_t)chunk_start, > + len, block->pmr[chunk]->lkey, > + block->block_name, rkey); > +} > rdma->total_registrations++; > } > > diff --git a/migration/trace-events b/migration/trace-events > index 5f6aa580def..901c1d54c12 100644 > --- a/migration/trace-events > +++ b/migration/trace-events > @@ -213,6 +213,7 @@ qemu_rdma_poll_other(const char *compstr, int64_t comp, > int left) "other complet > qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.." > qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" > PRIu64 " bytes @ %p" > qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand > Paging memory region: %s" > +qemu_rdma_advise_mr(const char *name, uint32_t len, uint64_t addr, const > char *res) "Try to advise block %s prefetch write at %" PRIu32 "@0x%" PRIx64 > ": %s" > qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t > offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64 > qemu_rdma_registration_handle_finished(void) "" > qemu_rdma_registration_handle_ram_blocks(void) "" > -- > 2.31.1 > Following https://github.com/linux-rdma/rdma-core/blob/master/libibverbs/man/ibv_advise_mr.3.md it looks like it is a best-effort optimization, I don't see any down-sides to it. However it seems like it is recommended to use IBV_ADVISE_MR_FLAG_FLUSH in order to increase the optimization chances. Anyway Reviewed-by: Marcel Apfelbaum Thanks, Marcel
Re: [PATCH v3 63/66] tcg/tci: Support raising sigbus for user-only
On 8/20/21 3:14 AM, Peter Maydell wrote: @@ -296,7 +296,7 @@ static uint64_t tci_qemu_ld(CPUArchState *env, target_ulong taddr, uintptr_t ra = (uintptr_t)tb_ptr; #ifdef CONFIG_SOFTMMU -switch (mop) { +switch (mop & (MO_BSWAP | MO_SSIZE)) { case MO_UB: return helper_ret_ldub_mmu(env, taddr, oi, ra); case MO_SB: @@ -326,10 +326,14 @@ static uint64_t tci_qemu_ld(CPUArchState *env, target_ulong taddr, } #else void *haddr = g2h(env_cpu(env), taddr); +unsigned a_mask = (1u << get_alignment_bits(mop)) - 1; uint64_t ret; set_helper_retaddr(ra); -switch (mop) { +if (taddr & a_mask) { +helper_unaligned_ld(env, taddr); +} +switch (mop & (MO_BSWAP | MO_SSIZE)) { case MO_UB: ret = ldub_p(haddr); break; @@ -377,11 +381,11 @@ static uint64_t tci_qemu_ld(CPUArchState *env, target_ulong taddr, static void tci_qemu_st(CPUArchState *env, target_ulong taddr, uint64_t val, MemOpIdx oi, const void *tb_ptr) { -MemOp mop = get_memop(oi) & (MO_BSWAP | MO_SSIZE); +MemOp mop = get_memop(oi); uintptr_t ra = (uintptr_t)tb_ptr; Don't you need this bit in tci_qemu_st() as well ? Which bit isn't present in st as well? There's missing hunks in your reply, but afaics they're the same. r~
Re: [PATCH v3 59/66] accel/tcg: Handle SIGBUS in handle_cpu_signal
On 8/20/21 2:34 AM, Peter Maydell wrote: On Wed, 18 Aug 2021 at 21:13, Richard Henderson wrote: We've been registering host SIGBUS, but then treating it exactly like SIGSEGV. Handle BUS_ADRALN via cpu_unaligned_access, but allow other SIGBUS si_codes to continue into the host-to-guest signal coversion code in host_signal_handler. Unwind the guest state so that we report the correct guest PC for the fault. You can't rely on alignment faults being marked by BUS_ADRALN: eg MIPS doesn't give you that si_code. How much does that matter for our use of it here ? Aside from whatever tcg/foo relies upon, it doesn't matter. But: silly mips. I was going to rely on it there -- oh well, I'll undo that section of the follow-on tcg/mips patch set. And it's got a different syscall for disabling the unaligned fixup. r~
Re: [PATCH v4 3/7] python/aqmp-tui: Add AQMP TUI draft
On Thu, Aug 19, 2021 at 1:39 PM G S Niteesh Babu wrote: > Added a draft of AQMP TUI. > > Implements the follwing basic features: > 1) Command transmission/reception. > 2) Shows events asynchronously. > 3) Shows server status in the bottom status bar. > > Also added type annotations and necessary pylint, > mypy configurations > > Signed-off-by: G S Niteesh Babu > --- > python/qemu/aqmp/aqmp_tui.py | 566 +++ > python/setup.cfg | 15 +- > 2 files changed, 579 insertions(+), 2 deletions(-) > create mode 100644 python/qemu/aqmp/aqmp_tui.py > > diff --git a/python/qemu/aqmp/aqmp_tui.py b/python/qemu/aqmp/aqmp_tui.py > new file mode 100644 > index 00..12c9c4162a > --- /dev/null > +++ b/python/qemu/aqmp/aqmp_tui.py > @@ -0,0 +1,566 @@ > +# Copyright (c) 2021 > +# > +# Authors: > +# Niteesh Babu G S > +# > +# This work is licensed under the terms of the GNU GPL, version 2 or > +# later. See the COPYING file in the top-level directory. > +""" > +AQMP TUI > + > +AQMP TUI is an asynchronous interface built on top the of the AQMP > library. > +It is the successor of QMP-shell and is bought-in as a replacement for it. > + > +Example Usage: aqmp-tui > +Full Usage: aqmp-tui --help > +""" > + > +import argparse > +import asyncio > +import logging > +from logging import Handler, LogRecord > +import signal > +from typing import ( > +List, > +Optional, > +Tuple, > +Type, > +Union, > +cast, > +) > + > +import urwid > +import urwid_readline > + > +from ..qmp import QEMUMonitorProtocol, QMPBadPortError > +from .message import DeserializationError, Message, UnexpectedTypeError > +from .protocol import ConnectError > +from .qmp_client import ExecInterruptedError, QMPClient > +from .util import create_task, pretty_traceback > + > + > +# The name of the signal that is used to update the history list > +UPDATE_MSG: str = 'UPDATE_MSG' > + > + > +def format_json(msg: str) -> str: > +""" > +Formats given multi-line JSON message into a single-line message. > +Converting into single line is more asthetically pleasing when looking > +along with error messages. > + > +Eg: > +Input: > + [ 1, > +true, > +3 ] > +The above input is not a valid QMP message and produces the following > error > +"QMP message is not a JSON object." > +When displaying this in TUI in multiline mode we get > + > +[ 1, > + true, > + 3 ]: QMP message is not a JSON object. > + > +whereas in singleline mode we get the following > + > +[1, true, 3]: QMP message is not a JSON object. > + > +The single line mode is more asthetically pleasing. > + > +:param msg: > +The message to formatted into single line. > + > +:return: Formatted singleline message. > + > +NOTE: We cannot use the JSON module here because it is only capable of > +format valid JSON messages. But here the goal is to also format > invalid > +JSON messages. > +""" > +msg = msg.replace('\n', '') > +words = msg.split(' ') > +words = [word for word in words if word != ''] > +return ' '.join(words) > + > + > +def has_tui_handler(logger: logging.Logger, > +handler_type: Type[Handler]) -> bool: > +""" > +The Logger class has no interface to check if a certain type of > handler is > +installed or not. So we provide an interface to do so. > + > +:param logger: > +Logger object > +:param handler_type: > +The type of the handler to be checked. > + > +:return: returns True if handler of type `handler_type` is installed > else > + False. > +""" > +handlers = logger.handlers > +for handler in handlers: > +if isinstance(handler, handler_type): > +return True > +return False > + > + > +class App(QMPClient): > +""" > +Implements the AQMP TUI. > + > +Initializes the widgets and starts the urwid event loop. > +""" > +def __init__(self, address: Union[str, Tuple[str, int]]) -> None: > +""" > +Initializes the TUI. > + > +:param address: > +Address of the server to connect to. > +""" > +urwid.register_signal(type(self), UPDATE_MSG) > +self.window = Window(self) > +self.address = address > +self.aloop: Optional[asyncio.AbstractEventLoop] = None > +super().__init__() > + > +def add_to_history(self, msg: str, level: Optional[str] = None) -> > None: > +""" > +Appends the msg to the history list. > + > +:param msg: > +The raw message to be appended in string type. > +""" > +urwid.emit_signal(self, UPDATE_MSG, msg, level) > + > +def _cb_outbound(self, msg: Message) -> Message: > +""" > +Callback: outbound message hook. > + > +Appends the outgoing messages to the history box. > + > +:param msg: raw