Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs
On 04/04/2024 15:08, Ross Lagerwall wrote: A malicious or buggy guest may generated buffered ioreqs faster than QEMU can process them in handle_buffered_iopage(). The result is a livelock - QEMU continuously processes ioreqs on the main thread without iterating through the main loop which prevents handling other events, processing timers, etc. Without QEMU handling other events, it often results in the guest becoming unsable and makes it difficult to stop the source of buffered ioreqs. To avoid this, if we process a full page of buffered ioreqs, stop and reschedule an immediate timer to continue processing them. This lets QEMU go back to the main loop and catch up. Signed-off-by: Ross Lagerwall --- hw/xen/xen-hvm-common.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs
On 08/04/2024 14:00, Ross Lagerwall wrote: On Sat, Apr 6, 2024 at 11:58 AM Durrant, Paul wrote: On 04/04/2024 15:08, Ross Lagerwall wrote: A malicious or buggy guest may generated buffered ioreqs faster than QEMU can process them in handle_buffered_iopage(). The result is a livelock - QEMU continuously processes ioreqs on the main thread without iterating through the main loop which prevents handling other events, processing timers, etc. Without QEMU handling other events, it often results in the guest becoming unsable and makes it difficult to stop the source of buffered ioreqs. To avoid this, if we process a full page of buffered ioreqs, stop and reschedule an immediate timer to continue processing them. This lets QEMU go back to the main loop and catch up. Do PV backends potentially cause the same scheduling issue (if not using io threads)? From what I can tell: xen-block: It reads req_prod / req_cons once before entering the loop so it should be fine, I think. xen_console: Same as xen-block xen_nic: It reads req_prod / req_cons once before entering the loop. However, once the loop ends it checks for more requests and if there are more requests it restarts from the beginning. It seems like this could be susceptible to the same issue. (These PV backends generally aren't used by XenServer's system QEMU so I didn't spend too much time looking into it.) Thanks, Ok. Thanks for checking. Paul
Re: Xen NIC driver have page_pool memory leaks
On 25/03/2024 12:21, Jesper Dangaard Brouer wrote: Hi Arthur, (Answer inlined below, which is custom on this mailing list) On 23/03/2024 14.23, Arthur Borsboom wrote: Hi Jesper, After a recent kernel upgrade 6.7.6 > 6.8.1 all my Xen guests on Arch Linux are dumping kernel traces. It seems to be indirectly caused by the page pool memory leak mechanism, which is probably a good thing. I have created a bug report, but there is no response. https://bugzilla.kernel.org/show_bug.cgi?id=218618 I am uncertain where and to whom I need to report this page leak. Can you help me get this issue fixed? I'm the page_pool maintainer, but as you say yourself in comment 2 then since dba1b8a7ab68 ("mm/page_pool: catch page_pool memory leaks") this indicated there is a problem in the xen_netfront driver, which was previously not visible. Cc'ing the "XEN NETWORK BACKEND DRIVER" maintainers, as this is a driver bug. What confuses me it that I cannot find any modules named "xen_netfront" in the upstream tree. You should have tried '-' rather than '_' :-) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/xen-netfront.c
[PATCH] MAINTAINERS: Remove myself from several subsystems
From: Paul Durrant I am not as actively involved with the Xen project as I once was so I need to relinquish some of my maintainer responsibilities: IOMMU and I/O emulation. NOTE: Since I am sole maintainer for I/O EMULATION (IOREQ) and X86 I/O EMULATION, these sections are removed. Signed-off-by: Paul Durrant --- Cc: Andrew Cooper Cc: George Dunlap Cc: Jan Beulich Cc: Julien Grall Cc: Stefano Stabellini Cc: Wei Liu --- MAINTAINERS | 20 1 file changed, 20 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 56a6932037fe..b2cc3cc3921b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -358,7 +358,6 @@ F: xen/drivers/passthrough/vtd/ IOMMU VENDOR INDEPENDENT CODE M: Jan Beulich -M: Paul Durrant R: Roger Pau Monné S: Supported F: xen/drivers/passthrough/ @@ -368,13 +367,6 @@ X: xen/drivers/passthrough/vtd/ X: xen/drivers/passthrough/device_tree.c F: xen/include/xen/iommu.h -I/O EMULATION (IOREQ) -M: Paul Durrant -S: Supported -F: xen/common/ioreq.c -F: xen/include/xen/ioreq.h -F: xen/include/public/hvm/ioreq.h - KCONFIG M: Doug Goldstein S: Supported @@ -608,18 +600,6 @@ F: tools/misc/xen-cpuid.c F: tools/tests/cpu-policy/ F: tools/tests/x86_emulator/ -X86 I/O EMULATION -M: Paul Durrant -S: Supported -F: xen/arch/x86/hvm/emulate.c -F: xen/arch/x86/hvm/intercept.c -F: xen/arch/x86/hvm/io.c -F: xen/arch/x86/hvm/ioreq.c -F: xen/arch/x86/include/asm/hvm/emulate.h -F: xen/arch/x86/include/asm/hvm/io.h -F: xen/arch/x86/include/asm/hvm/ioreq.h -F: xen/arch/x86/include/asm/ioreq.h - X86 MEMORY MANAGEMENT M: Jan Beulich M: Andrew Cooper -- 2.39.2
Re: [PATCH net 1/7] net: fill in MODULE_DESCRIPTION()s for xen-netback
On 13/02/2024 11:21, Breno Leitao wrote: W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Xen backend network module. Signed-off-by: Breno Leitao --- drivers/net/xen-netback/netback.c | 1 + 1 file changed, 1 insertion(+) Acked-by: Paul Durrant diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index fab361a250d6..ef76850d9bcd 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1778,5 +1778,6 @@ static void __exit netback_fini(void) } module_exit(netback_fini); +MODULE_DESCRIPTION("Xen backend network device module"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_ALIAS("xen-backend:vif");
Re: [PATCH] x86/HVM: tidy state on hvmemul_map_linear_addr()'s error path
On 08/02/2024 15:59, Jan Beulich wrote: On 06.02.2024 13:06, Jan Beulich wrote: While in the vast majority of cases failure of the function will not be followed by re-invocation with the same emulation context, a few very specific insns - involving multiple independent writes, e.g. ENTER and PUSHA - exist where this can happen. Since failure of the function only signals to the caller that it ought to try an MMIO write instead, such failure also cannot be assumed to result in wholesale failure of emulation of the current insn. Instead we have to maintain internal state such that another invocation of the function with the same emulation context remains possible. To achieve that we need to reset MFN slots after putting page references on the error path. Note that all of this affects debugging code only, in causing an assertion to trigger (higher up in the function). There's otherwise no misbehavior - such a "leftover" slot would simply be overwritten by new contents in a release build. Also extend the related unmap() assertion, to further check for MFN 0. Fixes: 8cbd4fb0b7ea ("x86/hvm: implement hvmemul_write() using real mappings") Reported.by: Manuel Andreas Signed-off-by: Jan Beulich Just noticed that I forgot to Cc Paul. Jan --- While probably I could be convinced to omit the #ifndef, I'm really considering to extend the one in hvmemul_unmap_linear_addr(), to eliminate the zapping from release builds: Leaving MFN 0 in place is not much better than leaving a (presently) guest-owned one there. And we can't really put/leave INVALID_MFN there, as that would conflict with other debug checking. Would it be worth defining a sentinel value for this purpose rather than hardcoding _mfn(0)? (_mfn(0) seems like a reasonable sentinel... it's just a question of having a #define for it). Either way... Acked-by: Paul Durrant --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -696,7 +696,12 @@ static void *hvmemul_map_linear_addr( out: /* Drop all held references. */ while ( mfn-- > hvmemul_ctxt->mfn ) +{ put_page(mfn_to_page(*mfn)); +#ifndef NDEBUG /* Clean slot for a subsequent map()'s error checking. */ +*mfn = _mfn(0); +#endif +} return err; } @@ -718,7 +723,7 @@ static void hvmemul_unmap_linear_addr( for ( i = 0; i < nr_frames; i++ ) { -ASSERT(mfn_valid(*mfn)); +ASSERT(mfn_x(*mfn) && mfn_valid(*mfn)); paging_mark_dirty(currd, *mfn); put_page(mfn_to_page(*mfn));
Re: Ping: [PATCH] x86/guest: finish conversion to altcall
On 02/02/2024 07:08, Jan Beulich wrote: On 17.01.2024 10:31, Jan Beulich wrote: While .setup() and .e820_fixup() don't need fiddling with for being run only very early, both .ap_setup() and .resume() want converting too: This way both pre-filled struct hypervisor_ops instances can become __initconst_cf_clobber, thus allowing to eliminate up to 5 more ENDBR (configuration dependent) during the 2nd phase of alternatives patching. While fiddling with section annotations here, also move "ops" itself to .data.ro_after_init. Signed-off-by: Jan Beulich May I ask for an ack (or otherwise)? Thanks, Jan Yes, no objections. Acked-by: Paul Durrant --- a/xen/arch/x86/guest/hyperv/hyperv.c +++ b/xen/arch/x86/guest/hyperv/hyperv.c @@ -207,7 +207,7 @@ static int cf_check flush_tlb( return hyperv_flush_tlb(mask, va, flags); } -static const struct hypervisor_ops __initconstrel ops = { +static const struct hypervisor_ops __initconst_cf_clobber ops = { .name = "Hyper-V", .setup = setup, .ap_setup = ap_setup, --- a/xen/arch/x86/guest/hypervisor.c +++ b/xen/arch/x86/guest/hypervisor.c @@ -13,7 +13,7 @@ #include #include -static struct hypervisor_ops __read_mostly ops; +static struct hypervisor_ops __ro_after_init ops; const char *__init hypervisor_probe(void) { @@ -49,7 +49,7 @@ void __init hypervisor_setup(void) int hypervisor_ap_setup(void) { if ( ops.ap_setup ) -return ops.ap_setup(); +return alternative_call(ops.ap_setup); return 0; } @@ -57,7 +57,7 @@ int hypervisor_ap_setup(void) void hypervisor_resume(void) { if ( ops.resume ) -ops.resume(); +alternative_vcall(ops.resume); } void __init hypervisor_e820_fixup(void) --- a/xen/arch/x86/guest/xen/xen.c +++ b/xen/arch/x86/guest/xen/xen.c @@ -318,7 +318,7 @@ static int cf_check flush_tlb( return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL); } -static const struct hypervisor_ops __initconstrel ops = { +static const struct hypervisor_ops __initconst_cf_clobber ops = { .name = "Xen", .setup = setup, .ap_setup = ap_setup,
Re: [PATCH net] xen-netback: properly sync TX responses
On 29/01/2024 13:03, Jan Beulich wrote: Invoking the make_tx_response() / push_tx_responses() pair with no lock held would be acceptable only if all such invocations happened from the same context (NAPI instance or dealloc thread). Since this isn't the case, and since the interface "spec" also doesn't demand that multicast operations may only be performed with no in-flight transmits, MCAST_{ADD,DEL} processing also needs to acquire the response lock around the invocations. To prevent similar mistakes going forward, "downgrade" the present functions to private helpers of just the two remaining ones using them directly, with no forward declarations anymore. This involves renaming what so far was make_tx_response(), for the new function of that name to serve the new (wrapper) purpose. While there, - constify the txp parameters, - correct xenvif_idx_release()'s status parameter's type, - rename {,_}make_tx_response()'s status parameters for consistency with xenvif_idx_release()'s. Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control") Cc: sta...@vger.kernel.org Signed-off-by: Jan Beulich --- Of course this could be split into two or even more separate changes, but I think these transformations are best done all in one go. It remains questionable whether push_tx_responses() really needs invoking after every single _make_tx_response(). MCAST_{ADD,DEL} are odd also from another perspective: They're supposed to come with "dummy requests", with the comment in the public header leaving open what that means. IIRC the only reference I had at the time was Solaris code... I don't really there being any documentation of the feature. I think that https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/xen/io/xnb.c is probably similar to the code I looked at to determine what the Solaris frontend expected. So I think 'dummy' means 'ignored'. Netback doesn't check dummy-ness (e.g. size being zero). Furthermore the description in the public header doesn't really make clear that there's a restriction of one such "extra" per dummy request. Yet the way xenvif_get_extras() works precludes multiple ADDs or multiple DELs in a single dummy request (only the last one would be honored afaict). While the way xenvif_tx_build_gops() works precludes an ADD and a DEL coming together in a single dummy request (the DEL would be ignored). It appears the Solaris backend never coped with multiple extra_info so what the 'correct' semantic would be is unclear. Anyway... Reviewed-by: Paul Durrant --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -104,13 +104,12 @@ bool provides_xdp_headroom = true; module_param(provides_xdp_headroom, bool, 0644); static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx, - u8 status); + s8 status); static void make_tx_response(struct xenvif_queue *queue, -struct xen_netif_tx_request *txp, +const struct xen_netif_tx_request *txp, unsigned int extra_count, -s8 st); -static void push_tx_responses(struct xenvif_queue *queue); +s8 status); static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); @@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_ unsigned int extra_count, RING_IDX end) { RING_IDX cons = queue->tx.req_cons; - unsigned long flags; do { - spin_lock_irqsave(>response_lock, flags); make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR); - push_tx_responses(queue); - spin_unlock_irqrestore(>response_lock, flags); if (cons == end) break; RING_COPY_REQUEST(>tx, cons++, txp); @@ -465,12 +460,7 @@ static void xenvif_get_requests(struct x for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < MAX_SKB_FRAGS; nr_slots--) { if (unlikely(!txp->size)) { - unsigned long flags; - - spin_lock_irqsave(>response_lock, flags); make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY); - push_tx_responses(queue); - spin_unlock_irqrestore(>response_lock, flags); ++txp; continue; } @@ -496,14 +486,8 @@ static void xenvif_get_requests(struct x for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) { if (unlikely(!txp->size)) { - unsigned long flags; - - s
Re: [PATCH net] xen-netback: properly sync TX responses
On 01/02/2024 00:23, Jakub Kicinski wrote: On Mon, 29 Jan 2024 14:03:08 +0100 Jan Beulich wrote: Invoking the make_tx_response() / push_tx_responses() pair with no lock held would be acceptable only if all such invocations happened from the same context (NAPI instance or dealloc thread). Since this isn't the case, and since the interface "spec" also doesn't demand that multicast operations may only be performed with no in-flight transmits, MCAST_{ADD,DEL} processing also needs to acquire the response lock around the invocations. To prevent similar mistakes going forward, "downgrade" the present functions to private helpers of just the two remaining ones using them directly, with no forward declarations anymore. This involves renaming what so far was make_tx_response(), for the new function of that name to serve the new (wrapper) purpose. While there, - constify the txp parameters, - correct xenvif_idx_release()'s status parameter's type, - rename {,_}make_tx_response()'s status parameters for consistency with xenvif_idx_release()'s. Hi Paul, is this one on your TODO list to review or should we do our best? :) Sorry for the delay. I'll take a look at this now. Paul
Re: [PATCH v5 3/3] x86/iommu: cleanup unused functions
On 24/01/2024 17:29, Roger Pau Monne wrote: Remove xen_in_range() and vpci_is_mmcfg_address() now that hey are unused. Adjust comments to point to the new functions that replace the existing ones. No functional change. Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich --- Changes since v2: - Do remove vpci_is_mmcfg_address(). --- Can be squashed with the previous patch if desired, split as a separate patch for clarity. --- xen/arch/x86/hvm/io.c | 5 --- xen/arch/x86/include/asm/hvm/io.h | 3 -- xen/arch/x86/include/asm/setup.h | 1 - xen/arch/x86/setup.c | 53 ++- xen/arch/x86/tboot.c | 2 +- 5 files changed, 3 insertions(+), 61 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v5 2/3] x86/iommu: switch hwdom IOMMU to use a rangeset
On 24/01/2024 17:29, Roger Pau Monne wrote: The current loop that iterates from 0 to the maximum RAM address in order to setup the IOMMU mappings is highly inefficient, and it will get worse as the amount of RAM increases. It's also not accounting for any reserved regions past the last RAM address. Instead of iterating over memory addresses, iterate over the memory map regions and use a rangeset in order to keep track of which ranges need to be identity mapped in the hardware domain physical address space. On an AMD EPYC 7452 with 512GiB of RAM, the time to execute arch_iommu_hwdom_init() in nanoseconds is: x old + new N Min MaxMedian AvgStddev x 5 2.2364154e+10 2.338244e+10 2.2474685e+10 2.2622409e+10 4.2949869e+08 + 5 1025012 1033036 1026188 1028276.2 3623.1194 Difference at 95.0% confidence -2.26214e+10 +/- 4.42931e+08 -99.9955% +/- 9.05152e-05% (Student's t, pooled s = 3.03701e+08) Execution time of arch_iommu_hwdom_init() goes down from ~22s to ~0.001s. Note there's a change for HVM domains (ie: PVH dom0) that get switched to create the p2m mappings using map_mmio_regions() instead of p2m_add_identity_entry(), so that ranges can be mapped with a single function call if possible. Note that the interface of map_mmio_regions() doesn't allow creating read-only mappings, but so far there are no such mappings created for PVH dom0 in arch_iommu_hwdom_init(). No change intended in the resulting mappings that a hardware domain gets. Signed-off-by: Roger Pau Monné --- Changes since v4: - Add default case to handle ACPI and NVS regions (which are not mapped) unless in the low 4GB and when inclusive mode is set. - Add changelog entry. - Dropped Jans RB. Changes since v3: - Remove unnecessary line wraps. Changes since v2: - Simplify a bit the logic related to inclusive option, at the cost of making some no-op calls on some cases. Changes since v1: - Split from bigger patch. - Remove unneeded default case. x86/iommu: add CHANGELOG entry for hwdom setup time improvements Signed-off-by: Roger Pau Monné Reviewed-by: Paul Durrant
Re: [PATCH v6 1/3] x86/iommu: remove regions not to be mapped
On 25/01/2024 13:26, Roger Pau Monne wrote: Introduce the code to remove regions not to be mapped from the rangeset that will be used to setup the IOMMU page tables for the hardware domain. This change also introduces two new functions: remove_xen_ranges() and vpci_subtract_mmcfg() that copy the logic in xen_in_range() and vpci_is_mmcfg_address() respectively and remove the ranges that would otherwise be intercepted by the original functions. Note that the rangeset is still not populated. Signed-off-by: Roger Pau Monné --- Changes since v6: - Fix subtraction to be done against the address, not the mfn. Changes since v4: - Fix off-by-one when removing the Xen used ranges, as the rangesets are inclusive. Changes since v3: - Remove unnecessary line wrapping. Changes since v1: - Split from bigger patch. --- xen/arch/x86/hvm/io.c | 16 xen/arch/x86/include/asm/hvm/io.h | 3 ++ xen/arch/x86/include/asm/setup.h| 1 + xen/arch/x86/setup.c| 48 +++ xen/drivers/passthrough/x86/iommu.c | 61 + 5 files changed, 129 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH v4 3/6] xen: decouple generic xen code from legacy backends codebase
On 02/12/2023 01:41, Volodymyr Babchuk wrote: In xen-all.c there are unneeded dependencies on xen-legacy-backend.c: - xen_init() uses xen_pv_printf() to report errors, but it does not provide a pointer to the struct XenLegacyDevice, so it is kind of useless, we can use standard error_report() instead. - xen-all.c has function xenstore_record_dm_state() which uses global variable "xenstore" defined and initialized in xen-legacy-backend.c It is used exactly once, so we can just open a new connection to the xenstore, update DM state and close connection back. Those two changes allows us to remove xen-legacy-backend.c at all, what should be done in the future anyways. But right now this patch moves us one step close to have QEMU build without legacy Xen backends. Signed-off-by: Volodymyr Babchuk --- In v4: - New in v4, previous was part of "xen: add option to disable legacy backends" - Do not move xenstore global variable from xen-legacy-backend.c, instead use a local variable. --- accel/xen/xen-all.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 05/12] block: remove AioContext locking
On 29/11/2023 19:55, Stefan Hajnoczi wrote: This is the big patch that removes aio_context_acquire()/aio_context_release() from the block layer and affected block layer users. There isn't a clean way to split this patch and the reviewers are likely the same group of people, so I decided to do it in one patch. Signed-off-by: Stefan Hajnoczi --- include/block/block-global-state.h | 9 +- include/block/block-io.h | 3 +- include/block/snapshot.h | 2 - block.c| 234 +- block/block-backend.c | 14 -- block/copy-before-write.c | 22 +-- block/export/export.c | 22 +-- block/io.c | 45 + block/mirror.c | 19 -- block/monitor/bitmap-qmp-cmds.c| 20 +- block/monitor/block-hmp-cmds.c | 29 --- block/qapi-sysemu.c| 27 +-- block/qapi.c | 18 +- block/raw-format.c | 5 - block/replication.c| 58 +- block/snapshot.c | 22 +-- block/write-threshold.c| 6 - blockdev.c | 307 + blockjob.c | 18 -- hw/block/dataplane/virtio-blk.c| 10 - hw/block/dataplane/xen-block.c | 17 +- hw/block/virtio-blk.c | 45 + hw/core/qdev-properties-system.c | 9 - job.c | 16 -- migration/block.c | 33 +--- migration/migration-hmp-cmds.c | 3 - migration/savevm.c | 22 --- net/colo-compare.c | 2 - qemu-img.c | 4 - qemu-io.c | 10 +- qemu-nbd.c | 2 - replay/replay-debugging.c | 4 - tests/unit/test-bdrv-drain.c | 51 + tests/unit/test-bdrv-graph-mod.c | 6 - tests/unit/test-block-iothread.c | 31 --- tests/unit/test-blockjob.c | 137 - tests/unit/test-replication.c | 11 -- util/async.c | 4 - util/vhost-user-server.c | 3 - scripts/block-coroutine-wrapper.py | 3 - tests/tsan/suppressions.tsan | 1 - 41 files changed, 102 insertions(+), 1202 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD
On 29/11/2023 21:26, Stefan Hajnoczi wrote: The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL) instead, it is already widely used and unambiguous. Signed-off-by: Stefan Hajnoczi --- include/qemu/main-loop.h | 20 ++-- hw/i386/kvm/xen_evtchn.c | 14 +++--- hw/i386/kvm/xen_gnttab.c | 2 +- hw/mips/mips_int.c| 2 +- hw/ppc/ppc.c | 2 +- target/i386/kvm/xen-emu.c | 2 +- target/ppc/excp_helper.c | 2 +- target/ppc/helper_regs.c | 2 +- target/riscv/cpu_helper.c | 4 ++-- 9 files changed, 25 insertions(+), 25 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()
| 4 +- target/s390x/tcg/misc_helper.c | 118 +-- target/sparc/int32_helper.c | 2 +- target/sparc/int64_helper.c | 6 +- target/sparc/win_helper.c| 20 ++--- target/xtensa/exc_helper.c | 8 +- ui/spice-core.c | 4 +- util/async.c | 2 +- util/main-loop.c | 8 +- util/rcu.c | 14 ++-- audio/coreaudio.m| 4 +- memory_ldst.c.inc| 18 ++-- target/i386/hvf/README.md| 2 +- ui/cocoa.m | 50 ++-- 94 files changed, 502 insertions(+), 502 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory
On 23/11/2023 12:27, David Woodhouse wrote: On 23 November 2023 12:17:57 GMT, Volodymyr Babchuk wrote: Hi David, David Woodhouse writes: Which PV backends do you care about? We already have net, block and console converted. Well, this is all what we need, actually. Even console only will be sufficient, as we are using QEMU to provide virtio-pci backends, so both storage and networking should be provided by virtio. Are you proposing to just drop this patch at all? I believe we can live without it, yes. Yeah, I think you can drop anything that's only needed for the legacy backends. I'm tempted to make a separate config option to compile those out. I think that would be a good idea. The other legacy bacckend that we may need to care about is xenfb... not so much the framebuffer itself, but the mouse and keyboard aspects. The XENVKBD and XENHID drivers expose PV mouse and keyboard to Windows instances so it's be nice if we can avoid the backend withering away. Paul
Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it
On 22/11/2023 23:04, David Woodhouse wrote: On Wed, 2023-11-22 at 22:56 +, Volodymyr Babchuk wrote: Paul Durrant writes: On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: David Woodhouse This allows a XenDevice implementation to know whether it was created by QEMU, or merely discovered in XenStore after the toolstack created it. This will allow us to create frontend/backend nodes only when we should, rather than unconditionally attempting to overwrite them from a driver domain which doesn't have privileges to do so. As an added benefit, it also means we no longer have to call the xen_backend_set_device() function from the device models immediately after calling qdev_realize_and_unref(). Even though we could make the argument that it's safe to do so, and the pointer to the unreffed device *will* actually still be valid, it still made my skin itch to look at it. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 3 +-- hw/char/xen_console.c | 2 +- hw/net/xen_nic.c | 2 +- hw/xen/xen-bus.c | 4 include/hw/xen/xen-backend.h | 2 -- include/hw/xen/xen-bus.h | 2 ++ 6 files changed, 9 insertions(+), 6 deletions(-) Actually, I think you should probably update xen_backend_try_device_destroy() in this patch too. It currently uses xen_backend_list_find() to check whether the specified XenDevice has an associated XenBackendInstance. Sure. Looks like it is the only user of xen_backend_list_find(), so we can get rid of it as well. I'll drop your R-b tag, because of those additional changes in the new version. I think it's fine to keep. He told me to do it :) I confirm that :-)
Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory
On 23/11/2023 00:07, Volodymyr Babchuk wrote: Hi, Volodymyr Babchuk writes: Hi Stefano, Stefano Stabellini writes: On Wed, 22 Nov 2023, David Woodhouse wrote: On Wed, 2023-11-22 at 15:09 -0800, Stefano Stabellini wrote: On Wed, 22 Nov 2023, David Woodhouse wrote: On Wed, 2023-11-22 at 14:29 -0800, Stefano Stabellini wrote: On Wed, 22 Nov 2023, Paul Durrant wrote: On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko Instead of forcing the owner to domid 0, use XS_PRESERVE_OWNER to inherit the owner of the directory. Ah... so that's why the previous patch is there. This is not the right way to fix it. The QEMU Xen support is *assuming* that QEMU is either running in, or emulating, dom0. In the emulation case this is probably fine, but the 'real Xen' case it should be using the correct domid for node creation. I guess this could either be supplied on the command line or discerned by reading the local domain 'domid' node. yes, it should be passed as command line option to QEMU I'm not sure I like the idea of a command line option for something which QEMU could discover for itself. That's fine too. I meant to say "yes, as far as I know the toolstack passes the domid to QEMU as a command line option today". The -xen-domid argument on the QEMU command line today is the *guest* domain ID, not the domain ID in which QEMU itself is running. Or were you thinking of something different? Ops, you are right and I understand your comment better now. The backend domid is not on the command line but it should be discoverable (on xenstore if I remember right). Yes, it is just "~/domid". I'll add a function that reads it. Just a quick question to QEMU folks: is it better to add a global variable where we will store own Domain ID or it will be okay to read domid from Xenstore every time we need it? If global variable variant is better, what is proffered place to define this variable? system/globals.c ? Actually... is it possible for QEMU just to use a relative path for the backend nodes? That way it won't need to know it's own domid, will it? Paul
Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory
On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko Instead of forcing the owner to domid 0, use XS_PRESERVE_OWNER to inherit the owner of the directory. Ah... so that's why the previous patch is there. This is not the right way to fix it. The QEMU Xen support is *assuming* that QEMU is either running in, or emulating, dom0. In the emulation case this is probably fine, but the 'real Xen' case it should be using the correct domid for node creation. I guess this could either be supplied on the command line or discerned by reading the local domain 'domid' node. Note that for other than Dom0 domain (non toolstack domain) the "driver_domain" property should be set in domain config file for the toolstack to create required directories in advance. Signed-off-by: Oleksandr Tyshchenko Signed-off-by: Volodymyr Babchuk --- hw/xen/xen_pvdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen_pvdev.c b/hw/xen/xen_pvdev.c index c5ad71e8dc..42bdd4f6c8 100644 --- a/hw/xen/xen_pvdev.c +++ b/hw/xen/xen_pvdev.c @@ -60,7 +60,8 @@ void xen_config_cleanup(void) int xenstore_mkdir(char *path, int p) { -if (!qemu_xen_xs_create(xenstore, 0, 0, xen_domid, p, path)) { +if (!qemu_xen_xs_create(xenstore, 0, XS_PRESERVE_OWNER, +xen_domid, p, path)) { xen_pv_printf(NULL, 0, "xs_mkdir %s: failed\n", path); return -1; }
Re: [PATCH v2 3/6] xen: xenstore: add possibility to preserve owner
On 21/11/2023 22:10, Volodymyr Babchuk wrote: Add option to preserve owner when creating an entry in Xen Store. This may be needed in cases when Qemu is working as device model in a *may* be needed? I don't undertstand why this patch is necessary and the commit comment isn't really helping me. domain that is Domain-0, e.g. in driver domain. "owner" parameter for qemu_xen_xs_create() function can have special value XS_PRESERVE_OWNER, which will make specific implementation to get original owner of an entry and pass it back to set_permissions() call. Please note, that XenStore inherits permissions, so even if entry is newly created by, it already has the owner set to match owner of entry at previous level. Signed-off-by: Volodymyr Babchuk
Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it
On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: David Woodhouse This allows a XenDevice implementation to know whether it was created by QEMU, or merely discovered in XenStore after the toolstack created it. This will allow us to create frontend/backend nodes only when we should, rather than unconditionally attempting to overwrite them from a driver domain which doesn't have privileges to do so. As an added benefit, it also means we no longer have to call the xen_backend_set_device() function from the device models immediately after calling qdev_realize_and_unref(). Even though we could make the argument that it's safe to do so, and the pointer to the unreffed device *will* actually still be valid, it still made my skin itch to look at it. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 3 +-- hw/char/xen_console.c| 2 +- hw/net/xen_nic.c | 2 +- hw/xen/xen-bus.c | 4 include/hw/xen/xen-backend.h | 2 -- include/hw/xen/xen-bus.h | 2 ++ 6 files changed, 9 insertions(+), 6 deletions(-) Actually, I think you should probably update xen_backend_try_device_destroy() in this patch too. It currently uses xen_backend_list_find() to check whether the specified XenDevice has an associated XenBackendInstance. Paul
Re: [PATCH v2 2/6] xen: backends: touch some XenStore nodes only if device...
On 21/11/2023 22:10, Volodymyr Babchuk wrote: was created by QEMU Xen PV devices in QEMU can be created in two ways: either by QEMU itself, if they were passed via command line, or by Xen toolstack. In the latter case, QEMU scans XenStore entries and configures devices accordingly. In the second case we don't want QEMU to write/delete front-end entries for two reasons: it might have no access to those entries if it is running in un-privileged domain and it is just incorrect to overwrite entries already provided by Xen toolstack, because toolstack manages those nodes. For example, it might read backend- or frontend- state to be sure that they are both disconnected and it is safe to destroy a domain. This patch checks presence of xendev->backend to check if Xen PV device is acting as a backend (i.e. it was configured by Xen Technally *all* XenDevice objects are backends. toolstack) to decide if it should touch frontend entries in XenStore. Also, when we need to remove XenStore entries during device teardown only if they weren't created by Xen toolstack. If they were created by toolstack, then it is toolstack's job to do proper clean-up.
Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it
On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: David Woodhouse This allows a XenDevice implementation to know whether it was created by QEMU, or merely discovered in XenStore after the toolstack created it. This will allow us to create frontend/backend nodes only when we should, rather than unconditionally attempting to overwrite them from a driver domain which doesn't have privileges to do so. As an added benefit, it also means we no longer have to call the xen_backend_set_device() function from the device models immediately after calling qdev_realize_and_unref(). Even though we could make the argument that it's safe to do so, and the pointer to the unreffed device *will* actually still be valid, it still made my skin itch to look at it. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 3 +-- hw/char/xen_console.c| 2 +- hw/net/xen_nic.c | 2 +- hw/xen/xen-bus.c | 4 include/hw/xen/xen-backend.h | 2 -- include/hw/xen/xen-bus.h | 2 ++ 6 files changed, 9 insertions(+), 6 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 2/3] vl: disable default serial when xen-console is enabled
On 15/11/2023 17:24, David Woodhouse wrote: From: David Woodhouse If a Xen console is configured on the command line, do not add a default serial port. Signed-off-by: David Woodhouse --- system/vl.c | 1 + 1 file changed, 1 insertion(+) Reviewed-by: Paul Durrant
Re: [PATCH 3/3] hw/xen: clean up xen_block_find_free_vdev() to avoid Coverity false positive
On 15/11/2023 12:24, David Woodhouse wrote: From: David Woodhouse Coverity couldn't see that nr_existing was always going to be zero when qemu_xen_xs_directory() returned NULL in the ENOENT case (CID 1523906). Perhaps more to the point, neither could Peter at first glance. Improve the code to hopefully make it clearer to Coverity and human reviewers alike. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v1 3/7] xen: xenstore: add possibility to preserve owner
On 10/11/2023 15:42, Volodymyr Babchuk wrote: Add option to preserve owner when creating an entry in Xen Store. This may be needed in cases when Qemu is working as device model in a domain that is Domain-0, e.g. in driver domain. "owner" parameter for qemu_xen_xs_create() function can have special value XS_PRESERVE_OWNER, which will make specific implementation to get original owner of an entry and pass it back to set_permissions() call. If QEMU is running in a driver domain then it should know whether the domid of the domain it is running in and use that. Yes, it is hardcoded to 0 at the moment but surely a backend domid (which defaults to 0) could be passed on the command line? Paul Signed-off-by: Volodymyr Babchuk --- hw/i386/kvm/xen_xenstore.c | 18 ++ hw/xen/xen-operations.c | 12 include/hw/xen/xen_backend_ops.h | 2 ++ 3 files changed, 32 insertions(+) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 660d0b72f9..7b894a9884 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -1572,6 +1572,24 @@ static bool xs_be_create(struct qemu_xs_handle *h, xs_transaction_t t, return false; } +if (owner == XS_PRESERVE_OWNER) { +GList *perms; +char letter; + +err = xs_impl_get_perms(h->impl, 0, t, path, ); +if (err) { +errno = err; +return false; +} + +if (sscanf(perms->data, "%c%u", , ) != 2) { +errno = EFAULT; +g_list_free_full(perms, g_free); +return false; +} +g_list_free_full(perms, g_free); +} + perms_list = g_list_append(perms_list, xs_perm_as_string(XS_PERM_NONE, owner)); perms_list = g_list_append(perms_list, diff --git a/hw/xen/xen-operations.c b/hw/xen/xen-operations.c index e00983ec44..1df59b3c08 100644 --- a/hw/xen/xen-operations.c +++ b/hw/xen/xen-operations.c @@ -300,6 +300,18 @@ static bool libxenstore_create(struct qemu_xs_handle *h, xs_transaction_t t, return false; } +if (owner == XS_PRESERVE_OWNER) { +struct xs_permissions *tmp; +unsigned int num; + +tmp = xs_get_permissions(h->xsh, 0, path, ); +if (tmp == NULL) { +return false; +} +perms_list[0].id = tmp[0].id; +free(tmp); +} + return xs_set_permissions(h->xsh, t, path, perms_list, ARRAY_SIZE(perms_list)); } diff --git a/include/hw/xen/xen_backend_ops.h b/include/hw/xen/xen_backend_ops.h index 90cca85f52..273e414559 100644 --- a/include/hw/xen/xen_backend_ops.h +++ b/include/hw/xen/xen_backend_ops.h @@ -266,6 +266,8 @@ typedef uint32_t xs_transaction_t; #define XS_PERM_READ 0x01 #define XS_PERM_WRITE 0x02 +#define XS_PRESERVE_OWNER0xFFFE + struct xenstore_backend_ops { struct qemu_xs_handle *(*open)(void); void (*close)(struct qemu_xs_handle *h);
Re: [PATCH v1 5/7] xen-bus: Set offline if backend's state is XenbusStateClosed
On 10/11/2023 15:42, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko Both state (XenbusStateClosed) and online (0) are expected by toolstack/xl devd to completely destroy the device. But "offline" is never being set by the backend resulting in timeout during domain destruction, garbage in Xestore and still running Qemu instance. Signed-off-by: Oleksandr Tyshchenko Signed-off-by: Volodymyr Babchuk --- hw/xen/xen-bus.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index 75474d4b43..6e7ec3af64 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -519,6 +519,10 @@ static void xen_device_backend_changed(void *opaque, const char *path) xen_device_backend_set_state(xendev, XenbusStateClosed); } +if (xen_device_backend_get_state(xendev) == XenbusStateClosed) { +xen_device_backend_set_online(xendev, false); +} + /* * If a backend is still 'online' then we should leave it alone but, * if a backend is not 'online', then the device is a candidate I don't understand what you're trying to do here. Just a few lines up from this hunk there is: 506if (xen_device_backend_scanf(xendev, "online", "%u", ) != 1) { 507online = 0; 508} 509 510xen_device_backend_set_online(xendev, !!online); Why is this not sufficient? What happens if the frontend decides to stop and start (e.g. for a driver update)? I'm guessing the backend will be destroyed... which is not very friendly. Paul
Re: [PATCH v1 1/7] xen-block: Do not write frontend nodes
On 10/11/2023 15:42, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko The PV backend running in other than Dom0 domain (non toolstack domain) is not allowed to write frontend nodes. The more, the backend does not need to do that at all, this is purely toolstack/xl devd business. I do not know for what reason the backend does that here, this is not really needed, probably it is just a leftover and all xen_device_frontend_printf() instances should go away completely. It is not a leftover and it is needed in the case that QEMU is instantiating the backend unilaterally... i.e. without the involvement of any Xen toolstack. Agreed that, if QEMU, is running in a deprivileged context, that is not an option. The correct way to determined this though is whether the device is being created via the QEMU command line or whether is being created because XenStore nodes written by a toolstack were discovered. In the latter case there should be a XenBackendInstance that corresponds to the XenDevice whereas in the former case there should not be. For example, see xen_backend_try_device_destroy() Paul Signed-off-by: Oleksandr Tyshchenko Signed-off-by: Volodymyr Babchuk --- hw/block/xen-block.c | 11 +++ hw/xen/xen-bus.c | 2 +- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index a07cd7eb5d..dc4d477c22 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -221,6 +221,7 @@ static void xen_block_realize(XenDevice *xendev, Error **errp) XenBlockVdev *vdev = >props.vdev; BlockConf *conf = >props.conf; BlockBackend *blk = conf->blk; +XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); if (vdev->type == XEN_BLOCK_VDEV_TYPE_INVALID) { error_setg(errp, "vdev property not set"); @@ -280,10 +281,12 @@ static void xen_block_realize(XenDevice *xendev, Error **errp) xen_device_backend_printf(xendev, "info", "%u", blockdev->info); -xen_device_frontend_printf(xendev, "virtual-device", "%lu", - vdev->number); -xen_device_frontend_printf(xendev, "device-type", "%s", - blockdev->device_type); +if (xenbus->backend_id == 0) { +xen_device_frontend_printf(xendev, "virtual-device", "%lu", + vdev->number); +xen_device_frontend_printf(xendev, "device-type", "%s", + blockdev->device_type); +} xen_device_backend_printf(xendev, "sector-size", "%u", conf->logical_block_size); diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index ece8ec40cd..06d5192aca 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -1048,7 +1048,7 @@ static void xen_device_realize(DeviceState *dev, Error **errp) xen_device_backend_set_online(xendev, true); xen_device_backend_set_state(xendev, XenbusStateInitWait); -if (!xen_device_frontend_exists(xendev)) { +if (!xen_device_frontend_exists(xendev) && xenbus->backend_id == 0) { xen_device_frontend_printf(xendev, "backend", "%s", xendev->backend_path); xen_device_frontend_printf(xendev, "backend-id", "%u",
Re: [PATCH] hw/xen: clean up xen_block_find_free_vdev() to avoid Coverity false positive
On 09/11/2023 15:30, David Woodhouse wrote: From: David Woodhouse Coverity couldn't see that nr_existing was always going to be zero when qemu_xen_xs_directory() returned NULL in the ENOENT case (CID 1523906). Perhaps more to the point, neither could Peter at first glance. Improve the code to hopefully make it clearer to Coverity and human reviewers alike. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 17/17] docs: update Xen-on-KVM documentation
On 06/11/2023 14:35, David Woodhouse wrote: From: David Woodhouse Add notes about console and network support, and how to launch PV guests. Clean up the disk configuration examples now that that's simpler, and remove the comment about IDE unplug on q35/AHCI now that it's fixed. Update the -initrd option documentation to explain how to quote commas in module command lines, and reference it when documenting PV guests. Also update stale avocado test filename in MAINTAINERS. Signed-off-by: David Woodhouse --- MAINTAINERS | 2 +- docs/system/i386/xen.rst | 107 +-- qemu-options.hx | 14 +++-- 3 files changed, 91 insertions(+), 32 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 16/17] doc/sphinx/hxtool.py: add optional label argument to SRST directive
On 06/11/2023 14:35, David Woodhouse wrote: From: David Woodhouse We can't just embed labels directly into files like qemu-options.hx which are included from multiple top-level RST files, because Sphinx sees the labels as duplicate: https://github.com/sphinx-doc/sphinx/issues/9707 So add an 'emitrefs' option to the Sphinx hxtool-doc directive, which is set only in invocation.rst and not from the HTML rendition of the man page. Along with an argument to the SRST directive which causes a label of the form '.. _LABEL-reference-label:' to be emitted when the emitrefs option is set. Signed-off-by: David Woodhouse --- docs/sphinx/hxtool.py | 18 +- docs/system/invocation.rst | 1 + 2 files changed, 18 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 13/17] hw/i386/pc: support '-nic' for xen-net-device
On 06/11/2023 14:35, David Woodhouse wrote: From: David Woodhouse The default NIC creation seems a bit hackish to me. I don't understand why each platform has to call pci_nic_init_nofail() from a point in the code where it actually has a pointer to the PCI bus, and then we have the special cases for things like ne2k_isa. If qmp_device_add() can *find* the appropriate bus and instantiate the device on it, why can't we just do that from generic code for creating the default NICs too? But that isn't a yak I want to shave today. Add a xenbus field to the PCMachineState so that it can make its way from pc_basic_device_init() to pc_nic_init() and be handled as a special case like ne2k_isa is. Now we can launch emulated Xen guests with '-nic user'. Signed-off-by: David Woodhouse --- hw/i386/pc.c | 11 --- hw/i386/pc_piix.c| 2 +- hw/i386/pc_q35.c | 2 +- hw/xen/xen-bus.c | 4 +++- include/hw/i386/pc.h | 4 +++- include/hw/xen/xen-bus.h | 2 +- 6 files changed, 17 insertions(+), 8 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 06/17] hw/xen: automatically assign device index to block devices
On 06/11/2023 14:34, David Woodhouse wrote: From: David Woodhouse There's no need to force the user to assign a vdev. We can automatically assign one, starting at xvda and searching until we find the first disk name that's unused. This means we can now allow '-drive if=xen,file=xxx' to work without an explicit separate -driver argument, just like if=virtio. Rip out the legacy handling from the xenpv machine, which was scribbling over any disks configured by the toolstack, and didn't work with anything but raw images. Signed-off-by: David Woodhouse Acked-by: Kevin Wolf --- blockdev.c | 15 +++- hw/block/xen-block.c| 118 ++-- hw/xen/xen_devconfig.c | 28 --- hw/xenpv/xen_machine_pv.c | 9 --- include/hw/xen/xen-legacy-backend.h | 1 - 5 files changed, 125 insertions(+), 46 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] xen/pt: Emulate multifunction bit in header type
On 03/11/2023 17:26, Ross Lagerwall wrote: The intention of the code appears to have been to unconditionally set the multifunction bit but since the emulation mask is 0x00 it has no effect. Instead, emulate the bit and set it based on the multifunction property of the PCIDevice (which can be set using QAPI). This allows making passthrough devices appear as functions in a Xen guest. Signed-off-by: Ross Lagerwall Reviewed-by: Paul Durrant
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 25/10/2023 10:00, David Woodhouse wrote: On Wed, 2023-10-25 at 09:31 +0100, Paul Durrant wrote: On 24/10/2023 17:34, David Woodhouse wrote: On Tue, 2023-10-24 at 17:25 +0100, Paul Durrant wrote: On 24/10/2023 16:49, David Woodhouse wrote: On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote: On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Yes. And I could leave the existing foreignmem thing just for the case of primary console under true Xen. It's probably not that awful a special case, in the end. Then again, I was surprised I didn't *already* have a foreignmem ops for the emulated case, and we're probably going to want to continue fleshing it out later, so I don't really mind adding it. True. We'll need it for some of the other more fun protocols like vkbd or fb. Still, I think it'd be nicer to align the xenstore and primary console code to look similar and punt the work until then :-) I don't think it ends up looking like xenstore either way, does it? Xenstore is special because it gets to use the original pointer to its own page. Not sure what you mean there? A guest can query the PFN for either xenstore or console using HVM params, or it can find them in its own grant table entries 0 or 1. The code in our xen_xenstore.c uses its *own* pointer (s->xs) to the MemoryRegion that it created (s->xenstore_page). It is its own backend, as well as doing the "magic" to create the guest-side mapping and event channel. The difference for the console code is that we actually have a *separation* between the standard backend code in xen_console.c, and the magic frontend parts for the emulated mode. I don't think I want to hack the xen_console code to explicitly call a xen_console_give_me_your_page() function. If not foreignmem, I think you were suggesting that we actually call the grant mapping code to get a pointer to the underlying page, right? I'm suggesting that the page be mapped in the same way that the xenstore backend does: 1462 /* 1463 * We don't actually access the guest's page through the grant, because 1464 * this isn't real Xen, and we can just use the page we gave it in the 1465 * first place. Map the grant anyway, mostly for cosmetic purposes so 1466 * it *looks* like it's in use in the guest-visible grant table. 1467 */ 1468 s->gt = qemu_xen_gnttab_open(); 1469 uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE; 1470 s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, _gntref, 1471 PROT_READ | PROT_WRITE); It already *is*. But as with xen_xenstore.c, nothing ever *uses* the s->granted_xs pointer. It's just cosmetic to make the grant table look right. But that doesn't help the *backend* code. The backend doesn't even know the grant ref#, because the convention we inherited from Xen is that the `ring-ref` in XenStore for the primary console is actually the MFN, to be mapped as foreignmem. Of course, we *do* know the grant-ref for the primary console, as it's always GNTTAB_RESERVED_CONSOLE. So I suppose we could put a hack into the xen_console backend to map *that* in the case of primary console under emu? In fact that would probably do the right thing even under Xen if we could persuade Xen to make an ioemu primary console? That's exactly what I am getting at :-) I don't think we need care about the ring-ref in xenstore for the primary console. Paul I could kind of live with that... except that Xen has this ugly convention that the "ring-ref" frontend node for the primary console actually has the *MFN* not a grant ref. Which I don't understand since the toolstack *does* populate the grant table for it (just as it does for the xenstore page). But we'd have to add a special case exception to that special case, so that in the emu case it'
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 24/10/2023 17:34, David Woodhouse wrote: On Tue, 2023-10-24 at 17:25 +0100, Paul Durrant wrote: On 24/10/2023 16:49, David Woodhouse wrote: On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote: On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Yes. And I could leave the existing foreignmem thing just for the case of primary console under true Xen. It's probably not that awful a special case, in the end. Then again, I was surprised I didn't *already* have a foreignmem ops for the emulated case, and we're probably going to want to continue fleshing it out later, so I don't really mind adding it. True. We'll need it for some of the other more fun protocols like vkbd or fb. Still, I think it'd be nicer to align the xenstore and primary console code to look similar and punt the work until then :-) I don't think it ends up looking like xenstore either way, does it? Xenstore is special because it gets to use the original pointer to its own page. Not sure what you mean there? A guest can query the PFN for either xenstore or console using HVM params, or it can find them in its own grant table entries 0 or 1. I don't think I want to hack the xen_console code to explicitly call a xen_console_give_me_your_page() function. If not foreignmem, I think you were suggesting that we actually call the grant mapping code to get a pointer to the underlying page, right? I'm suggesting that the page be mapped in the same way that the xenstore backend does: 1462/* 1463 * We don't actually access the guest's page through the grant, because 1464 * this isn't real Xen, and we can just use the page we gave it in the 1465 * first place. Map the grant anyway, mostly for cosmetic purposes so 1466 * it *looks* like it's in use in the guest-visible grant table. 1467 */ 1468s->gt = qemu_xen_gnttab_open(); 1469uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE; 1470s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, _gntref, 1471 PROT_READ | PROT_WRITE); I could kind of live with that... except that Xen has this ugly convention that the "ring-ref" frontend node for the primary console actually has the *MFN* not a grant ref. Which I don't understand since the toolstack *does* populate the grant table for it (just as it does for the xenstore page). But we'd have to add a special case exception to that special case, so that in the emu case it's an actual grant ref again. I think I prefer just having a stub of foreignmem, TBH. You're worried about the guest changing the page it uses for the primary console and putting a new one in xenstore? I'd be amazed if that even works on Xen unless the guest is careful to write it into GNTTAB_RESERVED_CONSOLE. (I didn't yet manage to get Xen to actually create a primary console of type iomem, FWIW) No, that doesn't entirely surprise me. Paul
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 24/10/2023 16:49, David Woodhouse wrote: On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote: On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Yes. And I could leave the existing foreignmem thing just for the case of primary console under true Xen. It's probably not that awful a special case, in the end. Then again, I was surprised I didn't *already* have a foreignmem ops for the emulated case, and we're probably going to want to continue fleshing it out later, so I don't really mind adding it. True. We'll need it for some of the other more fun protocols like vkbd or fb. Still, I think it'd be nicer to align the xenstore and primary console code to look similar and punt the work until then :-) Paul
Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console
On 24/10/2023 16:48, David Woodhouse wrote: On Tue, 2023-10-24 at 16:44 +0100, Paul Durrant wrote: On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse On soft reset, the prinary console event channel needs to be rebound to the backend port (in the xen-console driver). We could put that into the xen-console driver itself, but it's slightly less ugly to keep it within the KVM/Xen code, by stashing the backend port# on event channel reset and then rebinding in the primary console reset when it has to recreate the guest port anyway. Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to me. I go check. I spent an unhapp hour trying to work out how Xen actually does any of this :) In the short term I'm more interested in having soft reset work, than an explicit EVTCHNOP_reset. And I can't work out *how*, but we do seem to have console again after a kexec in real Xen. *Soft* reset may do it, but not the EVTCHNOP_reset hypercall itself, because there's a bunch of impenetrable toolstack magic involved the former. Perhaps you could just push the re-bind code up a layer into kvm_xen_soft_reset(). Paul
Re: [PATCH 2/4] hw/xen: update Xen PV NIC to XenDevice model
On 24/10/2023 16:17, David Woodhouse wrote: [snip] I don't think that's really a valid state for a network frontend. Linux netback just ignores it. Must we? I was thinking of making the ->frontend_changed() methods optional and allowing backends to just provide ->connect() and ->disconnect() methods instead if they wanted to. Because we have three identical ->frontend_changed() methods now... Now maybe... Not sure things will look so common when other backends are converted. I'd prefer to maintain fidelity with Linux xen-netback as it is generally considered to be the canonical implementation. Paul
Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse On soft reset, the prinary console event channel needs to be rebound to the backend port (in the xen-console driver). We could put that into the xen-console driver itself, but it's slightly less ugly to keep it within the KVM/Xen code, by stashing the backend port# on event channel reset and then rebinding in the primary console reset when it has to recreate the guest port anyway. Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to me. I go check. Paul Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 9 + hw/i386/kvm/xen_primary_console.c | 29 - hw/i386/kvm/xen_primary_console.h | 1 + 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c index d72dca6591..ce4da6d37a 100644 --- a/hw/i386/kvm/xen_evtchn.c +++ b/hw/i386/kvm/xen_evtchn.c @@ -40,6 +40,7 @@ #include "xen_evtchn.h" #include "xen_overlay.h" #include "xen_xenstore.h" +#include "xen_primary_console.h" #include "sysemu/kvm.h" #include "sysemu/kvm_xen.h" @@ -1098,6 +1099,7 @@ int xen_evtchn_soft_reset(void) { XenEvtchnState *s = xen_evtchn_singleton; bool flush_kvm_routes; +uint16_t con_port = xen_primary_console_get_port(); int i; if (!s) { @@ -1108,6 +1110,13 @@ int xen_evtchn_soft_reset(void) qemu_mutex_lock(>port_lock); +if (con_port) { +XenEvtchnPort *p = >port_table[con_port]; +if (p->type == EVTCHNSTAT_interdomain) { +xen_primary_console_set_be_port(p->u.interdomain.port); +} +} + for (i = 0; i < s->nr_ports; i++) { close_port(s, i, _kvm_routes); } diff --git a/hw/i386/kvm/xen_primary_console.c b/hw/i386/kvm/xen_primary_console.c index 0aa1c16ad6..5e6e085ac7 100644 --- a/hw/i386/kvm/xen_primary_console.c +++ b/hw/i386/kvm/xen_primary_console.c @@ -112,6 +112,15 @@ uint16_t xen_primary_console_get_port(void) return s->guest_port; } +void xen_primary_console_set_be_port(uint16_t port) +{ +XenPrimaryConsoleState *s = xen_primary_console_singleton; +if (s) { +printf("be port set to %d\n", port); +s->be_port = port; +} +} + uint64_t xen_primary_console_get_pfn(void) { XenPrimaryConsoleState *s = xen_primary_console_singleton; @@ -142,6 +151,20 @@ static void alloc_guest_port(XenPrimaryConsoleState *s) } } +static void rebind_guest_port(XenPrimaryConsoleState *s) +{ +struct evtchn_bind_interdomain inter = { +.remote_dom = DOMID_QEMU, +.remote_port = s->be_port, +}; + +if (!xen_evtchn_bind_interdomain_op()) { +s->guest_port = inter.local_port; +} + +s->be_port = 0; +} + int xen_primary_console_reset(void) { XenPrimaryConsoleState *s = xen_primary_console_singleton; @@ -154,7 +177,11 @@ int xen_primary_console_reset(void) xen_overlay_do_map_page(>console_page, gpa); } -alloc_guest_port(s); +if (s->be_port) { +rebind_guest_port(s); +} else { +alloc_guest_port(s); +} trace_xen_primary_console_reset(s->guest_port); diff --git a/hw/i386/kvm/xen_primary_console.h b/hw/i386/kvm/xen_primary_console.h index dd4922f3f4..7e2989ea0d 100644 --- a/hw/i386/kvm/xen_primary_console.h +++ b/hw/i386/kvm/xen_primary_console.h @@ -16,6 +16,7 @@ void xen_primary_console_create(void); int xen_primary_console_reset(void); uint16_t xen_primary_console_get_port(void); +void xen_primary_console_set_be_port(uint16_t port); uint64_t xen_primary_console_get_pfn(void); void *xen_primary_console_get_map(void);
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Paul
Re: [PATCH v2 10/24] hw/xen: populate store frontend nodes with XenStore PFN/port
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse This is kind of redundant since without being able to get these through some other method (HVMOP_get_param) the guest wouldn't be able to access XenStore in order to find them. But Xen populates them, and it does allow guests to *rebind* to the event channel port after a reset. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 11 +++ 1 file changed, 11 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH v2 05/24] hw/xen: fix XenStore watch delivery to guest
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse When fire_watch_cb() found the response buffer empty, it would call deliver_watch() to generate the XS_WATCH_EVENT message in the response buffer and send an event channel notification to the guest… without actually *copying* the response buffer into the ring. So there was nothing for the guest to see. The pending response didn't actually get processed into the ring until the guest next triggered some activity from its side. Add the missing call to put_rsp(). It might have been slightly nicer to call xen_xenstore_event() here, which would *almost* have worked. Except for the fact that it calls xen_be_evtchn_pending() to check that it really does have an event pending (and clear the eventfd for next time). And under Xen it's defined that setting that fd to O_NONBLOCK isn't guaranteed to work, so the emu implementation follows suit. This fixes Xen device hot-unplug. Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation stubs") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 660d0b72f9..82a215058a 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -1357,10 +1357,12 @@ static void fire_watch_cb(void *opaque, const char *path, const char *token) } else { deliver_watch(s, path, token); /* - * If the message was queued because there was already ring activity, - * no need to wake the guest. But if not, we need to send the evtchn. + * Attempt to queue the message into the actual ring, and send + * the event channel notification if any bytes are copied. */ -xen_be_evtchn_notify(s->eh, s->be_port); +if (put_rsp(s) > 0) { +xen_be_evtchn_notify(s->eh, s->be_port); +} There's actually the potential for an assertion failure there; if the guest has specified an oversize token then deliver_watch() will not set rsp_pending... then put_rsp() will fail its assertion that the flag is true. Paul } }
Re: [PATCH v2 04/24] hw/xen: don't clear map_track[] in xen_gnttab_reset()
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse The refcounts actually correspond to 'active_ref' structures stored in a GHashTable per "user" on the backend side (mostly, per XenDevice). If we zero map_track[] on reset, then when the backend drivers get torn down and release their mapping we hit the assert(s->map_track[ref] != 0) in gnt_unref(). So leave them in place. Each backend driver will disconnect and reconnect as the guest comes back up again and reconnects, and it all works out OK in the end as the old refs get dropped. Fixes: de26b2619789 ("hw/xen: Implement soft reset for emulated gnttab") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 2 -- 1 file changed, 2 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 03/24] hw/xen: select kernel mode for per-vCPU event channel upcall vector
On 19/10/2023 16:39, David Woodhouse wrote: From: David Woodhouse A guest which has configured the per-vCPU upcall vector may set the HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero. For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support for HVMOP_set_evtchn_upcall_vector") will just do this after setting the vector: /* Trick toolstack to think we are enlightened. */ if (!cpu) rc = xen_set_callback_via(1); That's explicitly setting the delivery to GSI#1, but it's supposed to be overridden by the per-vCPU vector setting. This mostly works in Qemu *except* for the logic to enable the in-kernel handling of event channels, which falsely determines that the kernel cannot accelerate GSI delivery in this case. Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has the vector set, and use that in xen_evtchn_set_callback_param() to enable the kernel acceleration features even when the param *appears* to be set to target a GSI. Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to *zero* the event channel delivery is disabled completely. (Which is what that bizarre guest behaviour is working round in the first place.) Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 6 ++ include/sysemu/kvm_xen.h | 1 + target/i386/kvm/xen-emu.c | 7 +++ 3 files changed, 14 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH 2/4] hw/xen: update Xen PV NIC to XenDevice model
On 17/10/2023 19:25, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- hw/net/meson.build| 2 +- hw/net/trace-events | 9 + hw/net/xen_nic.c | 426 +- hw/xenpv/xen_machine_pv.c | 1 - 4 files changed, 342 insertions(+), 96 deletions(-) diff --git a/hw/net/meson.build b/hw/net/meson.build index 2632634df3..f64651c467 100644 --- a/hw/net/meson.build +++ b/hw/net/meson.build @@ -1,5 +1,5 @@ system_ss.add(when: 'CONFIG_DP8393X', if_true: files('dp8393x.c')) -system_ss.add(when: 'CONFIG_XEN', if_true: files('xen_nic.c')) +system_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen_nic.c')) system_ss.add(when: 'CONFIG_NE2000_COMMON', if_true: files('ne2000.c')) # PCI network cards diff --git a/hw/net/trace-events b/hw/net/trace-events index 3abfd65e5b..ee56acfbce 100644 --- a/hw/net/trace-events +++ b/hw/net/trace-events @@ -482,3 +482,12 @@ dp8393x_receive_oversize(int size) "oversize packet, pkt_size is %d" dp8393x_receive_not_netcard(void) "packet not for netcard" dp8393x_receive_packet(int crba) "Receive packet at 0x%"PRIx32 dp8393x_receive_write_status(int crba) "Write status at 0x%"PRIx32 + +# xen_nic.c +xen_netdev_realize(int idx, const char *info) "idx %u info '%s'" +xen_netdev_unrealize(int idx) "idx %u" +xen_netdev_create(int idx) "idx %u" +xen_netdev_destroy(int idx) "idx %u" +xen_netdev_disconnect(int idx) "idx %u" +xen_netdev_connect(int idx, unsigned int tx, unsigned int rx, int port) "idx %u tx %u rx %u port %u" +xen_netdev_frontend_changed(const char *idx, int state) "idx %s state %d" diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c index 9bbf6599fc..84914c329c 100644 --- a/hw/net/xen_nic.c +++ b/hw/net/xen_nic.c @@ -20,6 +20,11 @@ */ #include "qemu/osdep.h" +#include "qemu/cutils.h" +#include "qemu/qemu-print.h" +#include "qapi/qmp/qdict.h" +#include "qapi/error.h" + #include #include #include @@ -27,18 +32,26 @@ #include "net/net.h" #include "net/checksum.h" #include "net/util.h" -#include "hw/xen/xen-legacy-backend.h" + +#include "hw/xen/xen-backend.h" +#include "hw/xen/xen-bus-helper.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-properties-system.h" #include "hw/xen/interface/io/netif.h" +#include "hw/xen/interface/io/xs_wire.h" + +#include "trace.h" /* - */ struct XenNetDev { -struct XenLegacyDevice xendev; /* must be first */ -char *mac; +struct XenDevice xendev; /* must be first */ +XenEventChannel *event_channel; +int dev; int tx_work; -int tx_ring_ref; -int rx_ring_ref; +unsigned int tx_ring_ref; +unsigned int rx_ring_ref; struct netif_tx_sring *txs; struct netif_rx_sring *rxs; netif_tx_back_ring_t tx_ring; @@ -47,6 +60,13 @@ struct XenNetDev { NICState *nic; }; +typedef struct XenNetDev XenNetDev; + +#define TYPE_XEN_NET_DEVICE "xen-net-device" +OBJECT_DECLARE_SIMPLE_TYPE(XenNetDev, XEN_NET_DEVICE) + +#define xen_pv_printf(a, n, ...) qemu_printf(__VA_ARGS__) Why define this... + /* - */ static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, int8_t st) @@ -68,7 +88,8 @@ static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, i netdev->tx_ring.rsp_prod_pvt = ++i; RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(>tx_ring, notify); if (notify) { -xen_pv_send_notify(>xendev); +xen_device_notify_event_channel(XEN_DEVICE(netdev), +netdev->event_channel, NULL); } if (i == netdev->tx_ring.req_cons) { @@ -104,8 +125,9 @@ static void net_tx_error(struct XenNetDev *netdev, netif_tx_request_t *txp, RING #endif } -static void net_tx_packets(struct XenNetDev *netdev) +static bool net_tx_packets(struct XenNetDev *netdev) { +bool done_something = false; netif_tx_request_t txreq; RING_IDX rc, rp; void *page; @@ -122,6 +144,7 @@ static void net_tx_packets(struct XenNetDev *netdev) } memcpy(, RING_GET_REQUEST(>tx_ring, rc), sizeof(txreq)); netdev->tx_ring.req_cons = ++rc; +done_something = true; #if 1 /* should not happen in theory, we don't announce the * @@ -151,7 +174,7 @@ static void net_tx_packets(struct XenNetDev *netdev) continue; } -xen_pv_printf(>xendev, 3, +if (0) qemu_printf(//>xendev, 3, ... and then not use it here? Perhaps the 'if (0)' ugliness can go in the macro too. "tx packet ref %d, off %d, len %d, flags 0x%x%s%s%s%s\n",
Re: [PATCH 1/4] hw/xen: only remove peers of PCI NICs on unplug
On 17/10/2023 19:25, David Woodhouse wrote: From: David Woodhouse When the Xen guest asks to unplug *emulated* NICs, it's kind of unhelpful also to unplug the peer of the *Xen* PV NIC. Signed-off-by: David Woodhouse --- hw/i386/xen/xen_platform.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c index 17457ff3de..e2dd1b536a 100644 --- a/hw/i386/xen/xen_platform.c +++ b/hw/i386/xen/xen_platform.c @@ -140,9 +140,14 @@ static void unplug_nic(PCIBus *b, PCIDevice *d, void *o) /* Remove the peer of the NIC device. Normally, this would be a tap device. */ static void del_nic_peer(NICState *nic, void *opaque) { -NetClientState *nc; +NetClientState *nc = qemu_get_queue(nic); +ObjectClass *klass = module_object_class_by_name(nc->model); + +/* Only delete peers of PCI NICs that we're about to delete */ +if (!klass || !object_class_dynamic_cast(klass, TYPE_PCI_DEVICE)) { Would it not be better to test for object_class_dynamic_cast(klass, TYPE_XEN_DEVICE)? Paul +return; +} -nc = qemu_get_queue(nic); if (nc->peer) qemu_del_net_client(nc->peer); }
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? Paul
Re: [PATCH 09/12] hw/xen: prevent duplicate device registrations
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse Ensure that we have a XenBackendInstance for every device regardless of whether it was "discovered" in XenStore or created directly in QEMU. This allows the backend_list to be a source of truth about whether a given backend exists, and allows us to reject duplicates. This also cleans up the fact that backend drivers were calling xen_backend_set_device() with a XenDevice immediately after calling qdev_realize_and_unref() on it, when it wasn't theirs to play with any more. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 1 - hw/char/xen_console.c| 2 +- hw/xen/xen-backend.c | 78 ++-- hw/xen/xen-bus.c | 8 include/hw/xen/xen-backend.h | 3 ++ 5 files changed, 69 insertions(+), 23 deletions(-) diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index a07cd7eb5d..9262338535 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -975,7 +975,6 @@ static void xen_block_device_create(XenBackendInstance *backend, goto fail; } -xen_backend_set_device(backend, xendev); return; fail: diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index bd20be116c..2825b8c511 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -468,7 +468,7 @@ static void xen_console_device_create(XenBackendInstance *backend, Chardev *cd = NULL; struct qemu_xs_handle *xsh = xenbus->xsh; -if (qemu_strtoul(name, NULL, 10, )) { +if (qemu_strtoul(name, NULL, 10, ) || number >= INT_MAX) { error_setg(errp, "failed to parse name '%s'", name); goto fail; } I don't think this hunk belongs here, does it? Seems like it should be in patch 7. diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c index b9bf70a9f5..dcb4329258 100644 --- a/hw/xen/xen-backend.c +++ b/hw/xen/xen-backend.c @@ -101,22 +101,28 @@ static XenBackendInstance *xen_backend_list_find(XenDevice *xendev) return NULL; } -bool xen_backend_exists(const char *type, const char *name) +static XenBackendInstance *xen_backend_lookup(const XenBackendImpl *impl, const char *name) This name is a little close to xen_backend_table_lookup()... perhaps that one should be renamed xen_backend_impl_lookup() for clarity. { -const XenBackendImpl *impl = xen_backend_table_lookup(type); XenBackendInstance *backend; -if (!impl) { -return false; -} - QLIST_FOREACH(backend, _list, entry) { if (backend->impl == impl && !strcmp(backend->name, name)) { -return true; +return backend; } } -return false; +return NULL; +} + +bool xen_backend_exists(const char *type, const char *name) +{ +const XenBackendImpl *impl = xen_backend_table_lookup(type); + +if (!impl) { +return false; +} + +return !!xen_backend_lookup(impl, name); } static void xen_backend_list_remove(XenBackendInstance *backend) @@ -138,11 +144,10 @@ void xen_backend_device_create(XenBus *xenbus, const char *type, backend = g_new0(XenBackendInstance, 1); backend->xenbus = xenbus; backend->name = g_strdup(name); - -impl->create(backend, opts, errp); - backend->impl = impl; xen_backend_list_add(backend); + +impl->create(backend, opts, errp); } XenBus *xen_backend_get_bus(XenBackendInstance *backend) @@ -155,13 +160,6 @@ const char *xen_backend_get_name(XenBackendInstance *backend) return backend->name; } -void xen_backend_set_device(XenBackendInstance *backend, -XenDevice *xendev) -{ -g_assert(!backend->xendev); -backend->xendev = xendev; -} - The declaration also needs removing from xen-backend.h presumably. XenDevice *xen_backend_get_device(XenBackendInstance *backend) { return backend->xendev; @@ -178,9 +176,7 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp) } impl = backend->impl; -if (backend->xendev) { -impl->destroy(backend, errp); -} +impl->destroy(backend, errp); xen_backend_list_remove(backend); g_free(backend->name); @@ -188,3 +184,43 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp) return true; } + +bool xen_backend_device_realized(XenDevice *xendev, Error **errp) +{ +XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); +const char *type = xendev_class->backend ? : object_get_typename(OBJECT(xendev)); +const XenBackendImpl *impl = xen_backend_table_lookup(type); +XenBackendInstance *backend; + +if (!impl) { +return false; +} + +backend = xen_backend_lookup(impl, xendev->name); +if (backend) { +if (backend->xendev && backend->xendev != xendev) { +error_setg(errp, "device %s/%s already exists", type, xendev->name); +
Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass
On 24/10/2023 14:29, David Woodhouse wrote: On Tue, 2023-10-24 at 13:59 +0100, Paul Durrant wrote: On 24/10/2023 13:56, David Woodhouse wrote: On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote: --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, Error **errp) { ERRP_GUARD(); XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); + XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); - xendev->frontend_path = xen_device_get_frontend_path(xendev); + if (xendev_class->get_frontend_path) { + xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp); + if (!xendev->frontend_path) { + return; I think you need to update errp here to note that you are failing to create the frontend. If xendev_class->get_frontend_path returned NULL it will have filled in errp. Ok, but a prepend to say that a lack of path there means we skip frontend creation seems reasonable? No, it *is* returning an error. Perhaps I can make it I understand it is returning an error. I thought the point of the cascading error handling was to prepend text at each (meaningful) layer such that the eventual message conveyed what failed and also what the consequences of that failure were. Paul if (!xendev->frontend_path) { /* * If the ->get_frontend_path() method returned NULL, it will * already have set *errp accordingly. Return the error. */ return /* false */; } As a general rule (I'll be doing a bombing run on xen-bus once I get my patch queue down into single digits) we should never check 'if (*errp)' to check if a function had an error. It should *also* return a success or failure indication, and we should cope with errp being NULL. I'm pretty sure someone told me the exact opposite a few years back. Then they were wrong :)
Re: [PATCH 08/12] hw/xen: do not repeatedly try to create a failing backend device
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse If xen_backend_device_create() fails to instantiate a device, the XenBus code will just keep trying over and over again each time the bus is re-enumerated, as long as the backend appears online and in XenbusStateInitialising. The only thing which prevents the XenBus code from recreating duplicates of devices which already exist, is the fact that xen_device_realize() sets the backend state to XenbusStateInitWait. If the attempt to create the device doesn't get *that* far, that's when it will keep getting retried. My first thought was to handle errors by setting the backend state to XenbusStateClosed, but that doesn't work for XenConsole which wants to *ignore* any device of type != "ioemu" completely. So, make xen_backend_device_create() *keep* the XenBackendInstance for a failed device, and provide a new xen_backend_exists() function to allow xen_bus_type_enumerate() to check whether one already exists before creating a new one. Signed-off-by: David Woodhouse --- hw/xen/xen-backend.c | 27 +-- hw/xen/xen-bus.c | 3 ++- include/hw/xen/xen-backend.h | 1 + 3 files changed, 24 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 07/12] hw/xen: update Xen console to XenDevice model
ce_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL); } -static void xencons_send(struct XenConsole *con) +static bool xencons_send(XenConsole *con) { ssize_t len, size; @@ -159,174 +170,407 @@ static void xencons_send(struct XenConsole *con) if (len < 1) { if (!con->backlog) { con->backlog = 1; -xen_pv_printf(>xendev, 1, - "backlog piling up, nobody listening?\n"); } } else { buffer_advance(>buffer, len); if (con->backlog && len == size) { con->backlog = 0; -xen_pv_printf(>xendev, 1, "backlog is gone\n"); } } +return len > 0; } /* */ -static int store_con_info(struct XenConsole *con) +static bool con_event(void *_xendev) { -Chardev *cs = qemu_chr_fe_get_driver(>chr); -char *pts = NULL; -char *dom_path; -g_autoptr(GString) path = NULL; +XenConsole *con = XEN_CONSOLE_DEVICE(_xendev); +bool done_something; -/* Only continue if we're talking to a pty. */ -if (!CHARDEV_IS_PTY(cs)) { -return 0; -} -pts = cs->filename + 4; +done_something = buffer_append(con); -dom_path = qemu_xen_xs_get_domain_path(xenstore, xen_domid); -if (!dom_path) { -return 0; +if (con->buffer.size - con->buffer.consumed) { +done_something |= xencons_send(con); } +return done_something; +} -path = g_string_new(dom_path); -free(dom_path); +/* */ -if (con->xendev.dev) { -g_string_append_printf(path, "/device/console/%d", con->xendev.dev); -} else { -g_string_append(path, "/console"); +static void xen_console_disconnect(XenDevice *xendev, Error **errp) +{ +XenConsole *con = XEN_CONSOLE_DEVICE(xendev); + +qemu_chr_fe_set_handlers(>chr, NULL, NULL, NULL, NULL, + con, NULL, true); + nit: extraneous blank line by the looks of it. With that fixed... Reviewed-by: Paul Durrant
Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass
On 24/10/2023 13:56, David Woodhouse wrote: On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote: --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, Error **errp) { ERRP_GUARD(); XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); + XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); - xendev->frontend_path = xen_device_get_frontend_path(xendev); + if (xendev_class->get_frontend_path) { + xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp); + if (!xendev->frontend_path) { + return; I think you need to update errp here to note that you are failing to create the frontend. If xendev_class->get_frontend_path returned NULL it will have filled in errp. Ok, but a prepend to say that a lack of path there means we skip frontend creation seems reasonable? As a general rule (I'll be doing a bombing run on xen-bus once I get my patch queue down into single digits) we should never check 'if (*errp)' to check if a function had an error. It should *also* return a success or failure indication, and we should cope with errp being NULL. I'm pretty sure someone told me the exact opposite a few years back. Paul
Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary Xen console is special. The guest's side is set up for it by the toolstack automatically and not by the standard PV init sequence. Accordingly, its *frontend* doesn't appear in …/device/console/0 either; instead it appears under …/console in the guest's XenStore node. To allow the Xen console driver to override the frontend path for the primary console, add a method to the XenDeviceClass which can be used instead of the standard xen_device_get_frontend_path() Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c | 10 +- include/hw/xen/xen-bus.h | 2 ++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index ece8ec40cd..cc524ed92c 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, Error **errp) { ERRP_GUARD(); XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); +XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); -xendev->frontend_path = xen_device_get_frontend_path(xendev); +if (xendev_class->get_frontend_path) { +xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp); +if (!xendev->frontend_path) { +return; I think you need to update errp here to note that you are failing to create the frontend. Paul
Re: [PATCH 05/12] hw/xen: populate store frontend nodes with XenStore PFN/port
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse This is kind of redundant since without being able to get these through ome other method (HVMOP_get_param) the guest wouldn't be able to access ^ typo XenStore in order to find them. But Xen populates them, and it does allow guests to *rebind* to the event channel port after a reset. ... although this can also be done by querying the remote end of the port before reset. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 10 ++ 1 file changed, 10 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH 04/12] i386/xen: advertise XEN_HVM_CPUID_UPCALL_VECTOR in CPUID
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse This will allow Linux guests (since v6.0) to use the per-vCPU upcall vector delivered as MSI through the local APIC. Signed-off-by: David Woodhouse --- target/i386/kvm/kvm.c | 4 1 file changed, 4 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH 03/12] include: update Xen public headers to Xen 4.17.2 release
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse ... in order to advertise the XEN_HVM_CPUID_UPCALL_VECTOR feature, which will come in a subsequent commit. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c| 2 +- include/hw/xen/interface/arch-arm.h | 37 +++--- include/hw/xen/interface/arch-x86/cpuid.h | 31 +--- .../hw/xen/interface/arch-x86/xen-x86_32.h| 19 +-- .../hw/xen/interface/arch-x86/xen-x86_64.h| 19 +-- include/hw/xen/interface/arch-x86/xen.h | 26 ++ include/hw/xen/interface/event_channel.h | 19 +-- include/hw/xen/interface/features.h | 19 +-- include/hw/xen/interface/grant_table.h| 19 +-- include/hw/xen/interface/hvm/hvm_op.h | 19 +-- include/hw/xen/interface/hvm/params.h | 19 +-- include/hw/xen/interface/io/blkif.h | 27 -- include/hw/xen/interface/io/console.h | 19 +-- include/hw/xen/interface/io/fbif.h| 19 +-- include/hw/xen/interface/io/kbdif.h | 19 +-- include/hw/xen/interface/io/netif.h | 25 +++--- include/hw/xen/interface/io/protocols.h | 19 +-- include/hw/xen/interface/io/ring.h| 49 ++- include/hw/xen/interface/io/usbif.h | 19 +-- include/hw/xen/interface/io/xenbus.h | 19 +-- include/hw/xen/interface/io/xs_wire.h | 36 ++ include/hw/xen/interface/memory.h | 30 +--- include/hw/xen/interface/physdev.h| 23 ++--- include/hw/xen/interface/sched.h | 19 +-- include/hw/xen/interface/trace.h | 19 +-- include/hw/xen/interface/vcpu.h | 19 +-- include/hw/xen/interface/version.h| 19 +-- include/hw/xen/interface/xen-compat.h | 19 +-- include/hw/xen/interface/xen.h| 19 +-- 29 files changed, 124 insertions(+), 523 deletions(-) Acked-by: Paul Durrant
Re: [PATCH 02/12] hw/xen: select kernel mode for per-vCPU event channel upcall vector
On 16/10/2023 16:18, David Woodhouse wrote: From: David Woodhouse A guest which has configured the per-vCPU upcall vector may set the HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero. For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support for HVMOP_set_evtchn_upcall_vector") will just do this after setting the vector: /* Trick toolstack to think we are enlightened. */ if (!cpu) rc = xen_set_callback_via(1); That's explicitly setting the delivery to GSI#, but it's supposed to be overridden by the per-vCPU vector setting. This mostly works in QEMU *except* for the logic to enable the in-kernel handling of event channels, which falsely determines that the kernel cannot accelerate GSI delivery in this case. Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has the vector set, and use that in xen_evtchn_set_callback_param() to enable the kernel acceleration features even when the param *appears* to be set to target a GSI. Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to *zero* the event channel delivery is disabled completely. (Which is what that bizarre guest behaviour is working round in the first place.) Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 6 ++ include/sysemu/kvm_xen.h | 1 + target/i386/kvm/xen-emu.c | 7 +++ 3 files changed, 14 insertions(+) diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c index 4df973022c..d72dca6591 100644 --- a/hw/i386/kvm/xen_evtchn.c +++ b/hw/i386/kvm/xen_evtchn.c @@ -490,6 +490,12 @@ int xen_evtchn_set_callback_param(uint64_t param) break; } +/* If the guest has set a per-vCPU callback vector, prefer that. */ +if (gsi && kvm_xen_has_vcpu_callback_vector()) { +in_kernel = kvm_xen_has_cap(EVTCHN_SEND); +gsi = 0; +} + So this deals with setting the callback via after setting the upcall vector. What happens if the guest then disables the upcall vector (by setting it to zero)? Xen checks 'v->arch.hvm.evtchn_upcall_vector != 0' for every event delivery. Paul
Re: [PATCH 01/12] i386/xen: fix per-vCPU upcall vector for Xen emulation
On 16/10/2023 16:18, David Woodhouse wrote: From: David Woodhouse The per-vCPU upcall vector support had two problems. Firstly it was using the wrong hypercall argument and would always return -EFAULT. And secondly it was using the wrong ioctl() to pass the vector to the kernel and thus the *kernel* would always return -EINVAL. Linux doesn't (yet) use this mode so it went without decent testing for a while. Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector") Signed-off-by: David Woodhouse --- target/i386/kvm/xen-emu.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] net/xen-netback: Break build if netback slots > max_skbs + 1
On 27/09/2023 09:29, David Kahurani wrote: If XEN_NETBK_LEGACY_SLOTS_MAX and MAX_SKB_FRAGS have a difference of more than 1, with MAX_SKB_FRAGS being the lesser value, it opens up a path for null-dereference. It was also noted that some distributions were modifying upstream behaviour in that direction which necessitates this patch. Signed-off-by: David Kahurani Acked-by: Paul Durrant
Re: [XEN PATCH 08/13] xen/hvm: address violations of MISRA C:2012 Rule 7.3
On 03/08/2023 11:22, Simone Ballarin wrote: From: Gianluca Luparini The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline states: "The lowercase character 'l' shall not be used in a literal suffix". Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity. If the "u" suffix is used near "L", use the "U" suffix instead, for consistency. The changes in this patch are mechanical. Signed-off-by: Gianluca Luparini Signed-off-by: Simone Ballarin --- xen/arch/x86/hvm/emulate.c | 2 +- xen/arch/x86/hvm/io.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Acked-by: Paul Durrant
Re: [XEN PATCH 05/13] xen/ioreq: address violations of MISRA C:2012 Rule 7.3
On 03/08/2023 11:22, Simone Ballarin wrote: From: Gianluca Luparini The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline states: "The lowercase character 'l' shall not be used in a literal suffix". Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity. If the "u" suffix is used near "L", use the "U" suffix instead, for consistency. The changes in this patch are mechanical. Signed-off-by: Gianluca Luparini Signed-off-by: Simone Ballarin --- xen/common/ioreq.c | 2 +- xen/include/xen/ioreq.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Acked-by: Paul Durrant
Re: [XEN PATCH 10/13] x86/viridian: address violations of MISRA C:2012 Rule 7.3
On 03/08/2023 11:22, Simone Ballarin wrote: From: Gianluca Luparini The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline states: "The lowercase character 'l' shall not be used in a literal suffix". Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity. If the "u" suffix is used near "L", use the "U" suffix instead, for consistency. The changes in this patch are mechanical. Signed-off-by: Gianluca Luparini Signed-off-by: Simone Ballarin --- xen/arch/x86/hvm/viridian/synic.c | 2 +- xen/arch/x86/hvm/viridian/time.c | 10 +- 2 files changed, 6 insertions(+), 6 deletions(-) Reviewed-by: Paul Durrant
Re: [XEN PATCH v5 3/4] x86/viridian: address violations of MISRA C:2012 Rule 7.2
On 28/08/2023 12:03, Simone Ballarin wrote: From: Gianluca Luparini The xen sources contains violations of MISRA C:2012 Rule 7.2 whose headline states: "A 'u' or 'U' suffix shall be applied to all integer constants that are represented in an unsigned type". Add the 'U' suffix to integers literals with unsigned type and also to other literals used in the same contexts or near violations, when their positive nature is immediately clear. The latter changes are done for the sake of uniformity. Signed-off-by: Gianluca Luparini Signed-off-by: Simone Ballarin Reviewed-by: Stefano Stabellini --- Changes in v4: - change commit headline - add Reviewed-by Changes in v3: - create this commit for 'viridian.c' and 'hyperv-tlfs.h' --- xen/arch/x86/hvm/viridian/viridian.c | 2 +- xen/arch/x86/include/asm/guest/hyperv-tlfs.h | 28 ++-- 2 files changed, 15 insertions(+), 15 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 1/4] block: rename blk_io_plug_call() API to defer_call()
On 17/08/2023 16:58, Stefan Hajnoczi wrote: Prepare to move the blk_io_plug_call() API out of the block layer so that other subsystems call use this deferred call mechanism. Rename it to defer_call() but leave the code in block/plug.c. The next commit will move the code out of the block layer. Suggested-by: Ilya Maximets Signed-off-by: Stefan Hajnoczi --- include/sysemu/block-backend-io.h | 6 +- block/blkio.c | 8 +-- block/io_uring.c | 4 +- block/linux-aio.c | 4 +- block/nvme.c | 4 +- block/plug.c | 109 +++--- hw/block/dataplane/xen-block.c| 10 +-- hw/block/virtio-blk.c | 4 +- hw/scsi/virtio-scsi.c | 6 +- 9 files changed, 76 insertions(+), 79 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v1] xen-platform: do full PCI reset during unplug of IDE devices
On 20/07/2023 08:29, Olaf Hering wrote: The IDE unplug function needs to reset the entire PCI device, to make sure all state is initialized to defaults. This is done by calling pci_device_reset, which resets not only the chip specific registers, but also all PCI state. This fixes "unplug" in a Xen HVM domU with the modular legacy xenlinux PV drivers. Commit ee358e919e38 ("hw/ide/piix: Convert reset handler to DeviceReset") changed the way how the the disks are unplugged. Prior this commit the PCI device remained unchanged. After this change, piix_ide_reset is exercised after the "unplug" command, which was not the case prior that commit. This function resets the command register. As a result the ata_piix driver inside the domU will see a disabled PCI device. The generic PCI code will reenable the PCI device. On the qemu side, this runs pci_default_write_config/pci_update_mappings. Here a changed address is returned by pci_bar_address, this is the address which was truncated in piix_ide_reset. In case of a Xen HVM domU, the address changes from 0xc120 to 0xc100. This truncation was a bug in piix_ide_reset, which was fixed in commit 230dfd9257 ("hw/ide/piix: properly initialize the BMIBA register"). If pci_xen_ide_unplug had used pci_device_reset, the PCI registers would have been properly reset, and commit ee358e919e38 would have not introduced a regression for this specific domU environment. While the unplug is supposed to hide the IDE disks, the changed BMIBA address broke the UHCI device. In case the domU has an USB tablet configured, to recive absolute pointer coordinates for the GUI, it will cause a hang during device discovery of the partly discovered USB hid device. Reading the USBSTS word size register will fail. The access ends up in the QEMU piix-bmdma device, instead of the expected uhci device. Here a byte size request is expected, and a value of ~0 is returned. As a result the UCHI driver sees an error state in the register, and turns off the UHCI controller. Signed-off-by: Olaf Hering --- hw/i386/xen/xen_platform.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] hw/xen: Clarify (lack of) error handling in transaction_commit()
On 20/06/2023 18:58, David Woodhouse wrote: From: David Woodhouse Coverity was unhappy (CID 1508359) because we didn't check the return of init_walk_op() in transaction_commit(), despite doing so at every other call site. Strictly speaking, this is a false positive since it can never fail. It only fails for invalid user input (transaction ID or path), and both of those are hard-coded to known sane values in this invocation. But Coverity doesn't know that, and neither does the casual reader of the code. Returning an error here would be weird, since the transaction *is* committed by this point; all the walk_op is doing is firing watches on the newly-committed changed nodes. So make it a g_assert(!ret), since it really should never happen. Signed-off-by: David Woodhouse --- hw/i386/kvm/xenstore_impl.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [XEN PATCH] x86: I/O emulation: fix violations of MISRA C:2012 Rules 8.2 and 8.3
On 19/07/2023 09:24, Federico Serafini wrote: Give a name to unnamed parameters thus fixing violations of MISRA C:2012 Rule 8.2 ("Function types shall be in prototype form with named parameters"). Keep consistency between parameter names used in function declarations and names used in the corresponding function definitions thus fixing violations of MISRA C:2012 Rule 8.3 ("All declarations of an object or function shall use the same names and type qualifiers"). Signed-off-by: Federico Serafini --- xen/arch/x86/include/asm/hvm/emulate.h | 8 xen/arch/x86/include/asm/hvm/io.h | 14 +++--- 2 files changed, 11 insertions(+), 11 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] xen-block: Avoid leaks on new error path
On 04/07/2023 18:18, Anthony PERARD wrote: From: Anthony PERARD Commit 189829399070 ("xen-block: Use specific blockdev driver") introduced a new error path, without taking care of allocated resources. So only allocate the qdicts after the error check, and free both `filename` and `driver` when we are about to return and thus taking care of both success and error path. Coverity only spotted the leak of qdicts (*_layer variables). Reported-by: Peter Maydell Fixes: Coverity CID 1508722, 1398649 Fixes: 189829399070 ("xen-block: Use specific blockdev driver") Signed-off-by: Anthony PERARD --- hw/block/xen-block.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v1] x86/hvm/ioreq: remove empty line after function declaration
On 25/05/2023 16:25, Olaf Hering wrote: Introduced in commit 6ddfaabceeec3c31bc97a7208c46f581de55f71d ("x86/hvm/ioreq: simplify code and use consistent naming"). Signed-off-by: Olaf Hering --- xen/arch/x86/hvm/ioreq.c | 1 - 1 file changed, 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect
On 24/04/2023 14:17, Tim Smith wrote: On Mon, Apr 24, 2023 at 1:08 PM Mark Syms wrote: Copying in Tim who did the final phase of the changes. On Mon, 24 Apr 2023 at 11:32, Paul Durrant wrote: On 20/04/2023 12:02, mark.s...@citrix.com wrote: From: Mark Syms Ensure the PV ring is drained on disconnect. Also ensure all pending AIO is complete, otherwise AIO tries to complete into a mapping of the ring which has been torn down. Signed-off-by: Mark Syms --- CC: Stefano Stabellini CC: Anthony Perard CC: Paul Durrant CC: xen-devel@lists.xenproject.org v2: * Ensure all inflight requests are completed before teardown * RESEND to fix formatting --- hw/block/dataplane/xen-block.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index 734da42ea7..d9da4090bf 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane) dataplane->more_work = 0; +if (dataplane->sring == 0) { +return done_something; +} + I think you could just return false here... Nothing is ever going to be done if there's no ring :-) rc = dataplane->rings.common.req_cons; rp = dataplane->rings.common.sring->req_prod; xen_rmb(); /* Ensure we see queued requests up to 'rp'. */ @@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane > void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) { XenDevice *xendev; +XenBlockRequest *request, *next; if (!dataplane) { return; } +/* We're about to drain the ring. We can cancel the scheduling of any + * bottom half now */ +qemu_bh_cancel(dataplane->bh); + +/* Ensure we have drained the ring */ +aio_context_acquire(dataplane->ctx); +do { +xen_block_handle_requests(dataplane); +} while (dataplane->more_work); +aio_context_release(dataplane->ctx); + I don't think we want to be taking new requests, do we? If we're in this situation and the guest has put something on the ring, I think we should do our best with it. We cannot just rely on the guest to be well-behaved, because they're not :-( We're about to throw the ring away, so whatever is there would otherwise be lost. We only throw away our mapping. The memory belongs to the guest and it should ensure it does not submit requests after the state has left 'connected' This bit is here to try to handle guests which are less than diligent about their shutdown. We *should* always be past this fast enough when the disconnect()/connect() of XenbusStateConnected happens that all remains well (if not, we were in a worse situation before). What about a malicious guest that is piling requests into the ring. It could keep us in the loop forever, couldn't it? +/* Now ensure that all inflight requests are complete */ +while (!QLIST_EMPTY(>inflight)) { +QLIST_FOREACH_SAFE(request, >inflight, list, next) { +blk_aio_flush(request->dataplane->blk, xen_block_complete_aio, +request); +} +} + I think this could possibly be simplified by doing the drain after the call to blk_set_aio_context(), as long as we set dataplane->ctx to qemu_get_aio_context(). Alos, as long as more_work is not set then it should still be safe to cancel the bh before the drain AFAICT. I'm not sure what you mean by simpler? Possibly I'm not getting something. Sorry, I was referring to the need to do aio_context_acquire() calls but they are only around the disputed xen_block_handle_requests() call anyway, so there's no simplification in this bit. We have to make sure that any "aio_bh_schedule_oneshot_full()" which happens as a result of "blk_aio_flush()" has finished before any change of AIO context, because it tries to use the one which was current at the time of being called (I have the SEGVs to prove it :-)). Ok, I had assumed that the issue was the context being picked up inside the xen_block_complete_aio() call. Whether that happens before or after "blk_set_aio_context(qemu_get_aio_context())" doesn't seem to be a change in complexity to me. Motivation was to get as much as possible to happen in the way it "normally" would, so that future changes are less likely to regress, but as mentioned maybe I'm missing something. The BH needs to be prevented from firing ASAP, otherwise the disconnect()/connect() which happens when XenbusStateConnected can have the bh fire from what the guest does next right in the middle of juggling contexts for the disconnect() (I have the SEGVs from that too...). So if you drop the ring drain then this patch should still stop the SEGVs, right? Paul
Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect
On 20/04/2023 12:02, mark.s...@citrix.com wrote: From: Mark Syms Ensure the PV ring is drained on disconnect. Also ensure all pending AIO is complete, otherwise AIO tries to complete into a mapping of the ring which has been torn down. Signed-off-by: Mark Syms --- CC: Stefano Stabellini CC: Anthony Perard CC: Paul Durrant CC: xen-devel@lists.xenproject.org v2: * Ensure all inflight requests are completed before teardown * RESEND to fix formatting --- hw/block/dataplane/xen-block.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index 734da42ea7..d9da4090bf 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane) dataplane->more_work = 0; +if (dataplane->sring == 0) { +return done_something; +} + I think you could just return false here... Nothing is ever going to be done if there's no ring :-) rc = dataplane->rings.common.req_cons; rp = dataplane->rings.common.sring->req_prod; xen_rmb(); /* Ensure we see queued requests up to 'rp'. */ @@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane > void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) { XenDevice *xendev; +XenBlockRequest *request, *next; if (!dataplane) { return; } +/* We're about to drain the ring. We can cancel the scheduling of any + * bottom half now */ +qemu_bh_cancel(dataplane->bh); + +/* Ensure we have drained the ring */ +aio_context_acquire(dataplane->ctx); +do { +xen_block_handle_requests(dataplane); +} while (dataplane->more_work); +aio_context_release(dataplane->ctx); + I don't think we want to be taking new requests, do we? +/* Now ensure that all inflight requests are complete */ +while (!QLIST_EMPTY(>inflight)) { +QLIST_FOREACH_SAFE(request, >inflight, list, next) { +blk_aio_flush(request->dataplane->blk, xen_block_complete_aio, +request); +} +} + I think this could possibly be simplified by doing the drain after the call to blk_set_aio_context(), as long as we set dataplane->ctx to qemu_get_aio_context(). Alos, as long as more_work is not set then it should still be safe to cancel the bh before the drain AFAICT. Paul xendev = dataplane->xendev; aio_context_acquire(dataplane->ctx); + if (dataplane->event_channel) { /* Only reason for failure is a NULL channel */ xen_device_set_event_channel_context(xendev, dataplane->event_channel, @@ -684,12 +709,6 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), _abort); aio_context_release(dataplane->ctx); -/* - * Now that the context has been moved onto the main thread, cancel - * further processing. - */ -qemu_bh_cancel(dataplane->bh); - if (dataplane->event_channel) { Error *local_err = NULL;
Re: [PATCH v2 08/16] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore
On 19/04/2023 18:28, Stefan Hajnoczi wrote: There is no need to suspend activity between aio_disable_external() and aio_enable_external(), which is mainly used for the block layer's drain operation. This is part of ongoing work to remove the aio_disable_external() API. Reviewed-by: David Woodhouse Signed-off-by: Stefan Hajnoczi --- hw/i386/kvm/xen_xenstore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH 5/5] hw/xen: Fix broken check for invalid state in xs_be_open()
On 12/04/2023 19:51, David Woodhouse wrote: From: David Woodhouse Coverity points out that if (!s && !s->impl) isn't really what we intended to do here. CID 1508131. Fixes: 032475127225 ("hw/xen: Add emulated implementation of XenStore operations") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH 4/5] hw/xen: Fix double-free in xen_console store_con_info()
On 12/04/2023 19:51, David Woodhouse wrote: From: David Woodhouse Coverity spotted a double-free (CID 1508254); we g_string_free(path) and then for some reason immediately call free(path) too. We should just use g_autoptr() for it anyway, which simplifies the code a bit. Fixes: 7a8a749da7d3 ("hw/xen: Move xenstore_store_pv_console_info to xen_console.c") Signed-off-by: David Woodhouse --- hw/char/xen_console.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 3/5] xen: Drop support for Xen versions below 4.7.1
On 12/04/2023 19:51, David Woodhouse wrote: From: David Woodhouse In restructuring to allow for internal emulation of Xen functionality, I broke compatibility for Xen 4.6 and earlier. Fix this by explicitly removing support for anything older than 4.7.1, which is also ancient but it does still build, and the compatibility support for it is fairly unintrusive. Fixes: 15e283c5b684 ("hw/xen: Add foreignmem operations to allow redirection to internal emulation") Signed-off-by: David Woodhouse --- hw/xen/xen-operations.c | 57 +-- include/hw/xen/xen_native.h | 107 +--- meson.build | 5 +- scripts/xen-detect.c| 60 4 files changed, 3 insertions(+), 226 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 2/5] hw/xen: Fix memory leak in libxenstore_open() for Xen
On 12/04/2023 19:50, David Woodhouse wrote: From: David Woodhouse There was a superfluous allocation of the XS handle, leading to it being leaked on both the error path and the success path (where it gets allocated again). Spotted by Coverity (CID 1508098). Fixes: ba2a92db1ff6 ("hw/xen: Add xenstore operations to allow redirection to internal emulation") Suggested-by: Peter Maydell Signed-off-by: David Woodhouse Reviewed-by: Peter Maydell Reviewed-by: Paul Durrant
Re: [PATCH 2/2] xen/netback: remove not needed test in xenvif_tx_build_gops()
On 27/03/2023 09:36, Juergen Gross wrote: The tests for the number of grant mapping or copy operations reaching the array size of the operations buffer at the end of the main loop in xenvif_tx_build_gops() isn't needed. The loop can handle at maximum MAX_PENDING_REQS transfer requests, as XEN_RING_NR_UNCONSUMED_REQUESTS() is taking unsent responses into consideration, too. Remove the tests. Suggested-by: Jan Beulich Signed-off-by: Juergen Gross --- drivers/net/xen-netback/netback.c | 4 1 file changed, 4 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary
On 27/03/2023 09:36, Juergen Gross wrote: Fix xenvif_get_requests() not to do grant copy operations across local page boundaries. This requires to double the maximum number of copy operations per queue, as each copy could now be split into 2. Make sure that struct xenvif_tx_cb doesn't grow too large. Cc: sta...@vger.kernel.org Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area") Signed-off-by: Juergen Gross --- drivers/net/xen-netback/common.h | 2 +- drivers/net/xen-netback/netback.c | 25 +++-- 2 files changed, 24 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] hw/xenpv: Initialize Xen backend operations
On 23/03/2023 10:57, David Woodhouse wrote: From: David Woodhouse As the Xen backend operations were abstracted out into a function table to allow for internally emulated Xen support, we missed the xen_init_pv() code path which also needs to install the operations for the true Xen libraries. Add the missing call to setup_xen_backend_ops(). Fixes: b6cacfea0b38 ("hw/xen: Add evtchn operations to allow redirection to internal emulation") Reported-by: Anthony PERARD Signed-off-by: David Woodhouse --- hw/xenpv/xen_machine_pv.c | 2 ++ 1 file changed, 2 insertions(+) Reviewed-by: Paul Durrant diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c index 2e759d0619..17cda5ec13 100644 --- a/hw/xenpv/xen_machine_pv.c +++ b/hw/xenpv/xen_machine_pv.c @@ -35,6 +35,8 @@ static void xen_init_pv(MachineState *machine) DriveInfo *dinfo; int i; +setup_xen_backend_ops(); + /* Initialize backend core & drivers */ xen_be_init();
Re: [PATCH] Fix PCI hotplug AML
On 20/03/2023 10:34, Jan Beulich wrote: On 20.03.2023 10:04, Paul Durrant wrote: On 17/03/2023 10:32, David Woodhouse wrote: From: David Woodhouse The emulated PIIX3 uses a nybble for the status of each PCI function, so the status for e.g. slot 0 functions 0 and 1 respectively can be read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04). The AML that Xen gives to a guest gets the operand order for the odd- numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00) instead. As far as I can tell, this was the wrong way round in Xen from the moment that PCI hotplug was first introduced in commit 83d82e6f35a8: +ShiftRight (0x4, \_GPE.PH00, Local1) +Return (Local1) /* IN status as the _STA */ Or maybe there's bizarre AML operand ordering going on there, like Intel's wrong-way-round assembler, and it only broke later when it was changed to being generated? Either way, it's definitely wrong now, and instrumenting a Linux guest shows that it correctly sees _STA being 0x00 in function 0 of an empty slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to look at function 1 and sees that _STA evaluates to 0x04. Thus reporting an adapter is present in every slot in /sys/bus/pci/slots/* Quite why Linux wants to look for function 1 being physically present when function 0 isn't... I don't want to think about right now. Signed-off-by: David Woodhouse Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug") --- Utterly untested in Xen. Tested the same change in a different environment which is using precisely the *same* AML for guest compatibility. This AML only relates to the hotplug controller for qemu-trad so it's unlikely anyone particularly cares any more. In fact I'm kind of surprised the generation code still exists. Why would it not exist anymore? Use of qemu-trad is deprecated and advised against, but it's still possible to use it. Otherwise quite a bit of cleanup in libxl could also happen, for example. Right. I'm just surprised that is not done already... seems like a while since trad was deprecated; I'd have thought it could be removed in the next release. Paul
Re: [PATCH] Fix PCI hotplug AML
On 17/03/2023 10:32, David Woodhouse wrote: From: David Woodhouse The emulated PIIX3 uses a nybble for the status of each PCI function, so the status for e.g. slot 0 functions 0 and 1 respectively can be read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04). The AML that Xen gives to a guest gets the operand order for the odd- numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00) instead. As far as I can tell, this was the wrong way round in Xen from the moment that PCI hotplug was first introduced in commit 83d82e6f35a8: +ShiftRight (0x4, \_GPE.PH00, Local1) +Return (Local1) /* IN status as the _STA */ Or maybe there's bizarre AML operand ordering going on there, like Intel's wrong-way-round assembler, and it only broke later when it was changed to being generated? Either way, it's definitely wrong now, and instrumenting a Linux guest shows that it correctly sees _STA being 0x00 in function 0 of an empty slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to look at function 1 and sees that _STA evaluates to 0x04. Thus reporting an adapter is present in every slot in /sys/bus/pci/slots/* Quite why Linux wants to look for function 1 being physically present when function 0 isn't... I don't want to think about right now. Signed-off-by: David Woodhouse Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug") --- Utterly untested in Xen. Tested the same change in a different environment which is using precisely the *same* AML for guest compatibility. This AML only relates to the hotplug controller for qemu-trad so it's unlikely anyone particularly cares any more. In fact I'm kind of surprised the generation code still exists. Paul diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c index 1176da80ef..1d27809116 100644 --- a/tools/libacpi/mk_dsdt.c +++ b/tools/libacpi/mk_dsdt.c @@ -431,7 +431,7 @@ int main(int argc, char **argv) stmt("Store", "0x89, \\_GPE.DPT2"); } if ( slot & 1 ) -stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1); +stmt("ShiftRight", "\\_GPE.PH%02X, 0x04, Local1", slot & ~1); else stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1); stmt("Return", "Local1"); /* IN status as the _STA */
Re: [PATCH] accel/xen: Fix DM state change notification in dm_restrict mode
On 14/03/2023 08:35, David Woodhouse wrote: From: David Woodhouse When dm_restrict is set, QEMU isn't permitted to update the XenStore node to indicate its running status. Previously, the xs_write() call would fail but the failure was ignored. However, in refactoring to allow for emulated XenStore operations, a new call to xs_open() was added. That one didn't fail gracefully, causing a fatal error when running in dm_restrict mode. Partially revert the offending patch, removing the additional call to xs_open() because the global 'xenstore' variable is still available; it just needs to be used with qemu_xen_xs_write() now instead of directly with the xs_write() libxenstore function. Also make the whole thing conditional on !xen_domid_restrict. There's no point even registering the state change handler to attempt to update the XenStore node when we know it's destined to fail. Fixes: ba2a92db1ff6 ("hw/xen: Add xenstore operations to allow redirection to internal emulation") Reported-by: Jason Andryuk Co-developed-by: Jason Andryuk Not-Signed-off-by: Jason Andryuk Signed-off-by: David Woodhouse Will-be-Tested-by: Jason Andryuk --- accel/xen/xen-all.c | 27 ++- 1 file changed, 10 insertions(+), 17 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 00/27] Enable PV backends with Xen/KVM emulation
On 07/03/2023 17:17, David Woodhouse wrote: Following on from the basic platform support which has already been merged, here's phase 2 which wires up the XenBus and PV back ends. It starts with a basic single-tenant internal implementation of a XenStore, with a copy-on-write tree, watches, transactions, quotas. Then we introduce operations tables for the grant table, event channel, foreignmen and xenstore operations so that in addition to using the Xen libraries for those, QEMU can use its internal emulated versions. A little bit of cleaning up of header files, and we can enable the build of xen-bus in the CONFIG_XEN_EMU build, and run a Xen guest with an actual PV disk... qemu-system-x86_64 -serial mon:stdio -M q35 -display none -m 1G -smp 2 \ -accel kvm,xen-version=0x4000e,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1 selinux=0" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda The main thing that isn't working here is migration. I've implemented it for the internal xenstore and the unit tests exercise it, but the existing PV back ends don't support it, perhaps partly because support for guest transparent live migration support isn't upstream in Xen yet. So the disk doesn't come back correctly after migration. I'm content with that for 8.0 though, and we just mark the emulated XenStore device as unmigratable to prevent users from trying. The other pre-existing constraint is that only the block back end has yet been ported to the "new" XenBus infrastructure, and is actually capable of creating its own backend nodes. Again, I can live with that for 8.0. Maybe this will motivate us to finally get round to converting the rest off XenLegacyBackend and killing it. We also don't have a simple way to perform grant mapping of multiple guest pages to contiguous addresses, as we can under real Xen. So we don't advertise max-ring-page-order for xen-disk in the emulated mode. Fixing that — if we actually want to — would probably require mapping RAM from an actual backing store object, so that it can be mapped again at a different location for the PV back end to see. v2: https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-2 • Full set of reviewed-by tags from Paul (and associated minor fixes). • Disable migration for emulated XenStore device. • Update docs and add MAINTAINERS entry. v1: https://lore.kernel.org/qemu-devel/20230302153435.1170111-1-dw...@infradead.org/ https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-1 David Woodhouse (23): hw/xen: Add xenstore wire implementation and implementation stubs hw/xen: Add basic XenStore tree walk and write/read/directory support hw/xen: Implement XenStore watches hw/xen: Implement XenStore transactions hw/xen: Watches on XenStore transactions hw/xen: Implement core serialize/deserialize methods for xenstore_impl hw/xen: Add evtchn operations to allow redirection to internal emulation hw/xen: Add gnttab operations to allow redirection to internal emulation hw/xen: Pass grant ref to gnttab unmap operation hw/xen: Add foreignmem operations to allow redirection to internal emulation hw/xen: Move xenstore_store_pv_console_info to xen_console.c hw/xen: Use XEN_PAGE_SIZE in PV backend drivers hw/xen: Rename xen_common.h to xen_native.h hw/xen: Build PV backend drivers for CONFIG_XEN_BUS hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it hw/xen: Hook up emulated implementation for event channel operations hw/xen: Add emulated implementation of grant table operations hw/xen: Add emulated implementation of XenStore operations hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore hw/xen: Implement soft reset for emulated gnttab i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation MAINTAINERS: Add entry for Xen on KVM emulation docs: Update Xen-on-KVM documentation for PV disk support Paul Durrant (4): hw/xen: Implement XenStore permissions hw/xen: Create initial XenStore nodes hw/xen: Add xenstore operations to allow redirection to internal emulation hw/xen: Avoid crash when backend watch fires too early MAINTAINERS |9 + accel/xen/xen-all.c | 69 +- docs/system/i386/xen.rst | 30 +- hw/9pfs/meson.build |2 +- hw/9pfs/xen-9p-backend.c | 32 +- hw/block/dataplane/meson.build|2 +- hw/block/dataplane/xen-block.c| 12 +- hw/block/meson.build |2 +- hw/block/xen-block.c | 12 +-
Re: [PATCH v2 25/27] i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation
On 07/03/2023 17:17, David Woodhouse wrote: From: David Woodhouse Now that all the work is done to enable the PV backends to work without actual Xen, instantiate the bus from pc_basic_device_init() for emulated mode. This allows us finally to launch an emulated Xen guest with PV disk. qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \ -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda If we use -M pc instead of q35, we can even add an IDE disk and boot a guest image normally through grub. But q35 gives us AHCI and that isn't unplugged by the Xen magic, so the guests ends up seeing "both" disks. Signed-off-by: David Woodhouse Reviewed-by: Paul Durrant --- hw/i386/pc.c | 7 +++ 1 file changed, 7 insertions(+) Also... Tested-by: Paul Durrant ... on real Xen (master branch, 4.18) with a Debian guest.
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:59, Paul Durrant wrote: On 07/03/2023 16:52, David Woodhouse wrote: On Tue, 2023-03-07 at 16:39 +, Paul Durrant wrote: On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Yeah, so let's just do this (as part of this patch #7) for now: --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -235,6 +235,7 @@ static int xen_xenstore_post_load(void *opaque, int ver) static const VMStateDescription xen_xenstore_vmstate = { .name = "xen_xenstore", + .unmigratable = 1, /* The PV back ends don't migrate yet */ .version_id = 1, .minimum_version_id = 1, .needed = xen_xenstore_is_needed, It means we can't migrate guests even if they're only using fully emulated devices... but I think that's a reasonable limitation until we implement it fully. Ok. With that added... Revieweed-by: Paul Durrant Typoed, sorry... Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:52, David Woodhouse wrote: On Tue, 2023-03-07 at 16:39 +, Paul Durrant wrote: On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Yeah, so let's just do this (as part of this patch #7) for now: --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -235,6 +235,7 @@ static int xen_xenstore_post_load(void *opaque, int ver) static const VMStateDescription xen_xenstore_vmstate = { .name = "xen_xenstore", +.unmigratable = 1, /* The PV back ends don't migrate yet */ .version_id = 1, .minimum_version_id = 1, .needed = xen_xenstore_is_needed, It means we can't migrate guests even if they're only using fully emulated devices... but I think that's a reasonable limitation until we implement it fully. Ok. With that added... Revieweed-by: Paul Durrant
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:39, Paul Durrant wrote: On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Perhaps we can also set TYPE_XEN_DEVICE or TYPE_XEN_BLOCK_DEVICE to be unmigratable? Ideally I think we want TYPE_XEN_DEVICE to be unmigratable by default *unless* the specific device class (including net and other as we port them from XenLegacyDevice) says otherwise. Yes, that sounds like a good idea. Is there a way to do that? Not something I've looked into. I'll go look now. Maybe calling migrate_add_blocker() in the realize method is the right way to achieve this? Paul
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Perhaps we can also set TYPE_XEN_DEVICE or TYPE_XEN_BLOCK_DEVICE to be unmigratable? Ideally I think we want TYPE_XEN_DEVICE to be unmigratable by default *unless* the specific device class (including net and other as we port them from XenLegacyDevice) says otherwise. Yes, that sounds like a good idea. Is there a way to do that? Not something I've looked into. I'll go look now. Paul
Re: [RFC PATCH v1 27/25] docs: Update Xen-on-KVM documentation for PV disk support
On 07/03/2023 16:22, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- docs/system/i386/xen.rst | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 26/25] MAINTAINERS: Add entry for Xen on KVM emulation
On 07/03/2023 16:21, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- MAINTAINERS | 9 + 1 file changed, 9 insertions(+) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 25/25] i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Now that all the work is done to enable the PV backends to work without actual Xen, instantiate the bus from pc_basic_device_init() for emulated mode. This allows us finally to launch an emulated Xen guest with PV disk. qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \ -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda If we use -M pc instead of q35, we can even add an IDE disk and boot a guest image normally through grub. But q35 gives us AHCI and that isn't unplugged by the Xen magic, so the guests ends up seeing "both" disks. Signed-off-by: David Woodhouse --- hw/i386/pc.c | 7 +++ 1 file changed, 7 insertions(+) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 24/25] hw/xen: Implement soft reset for emulated gnttab
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse This is only part of it; we will also need to get the PV back end drivers to tear down their own mappings (or do it for them, but they kind of need to stop using the pointers too). Some more work on the actual PV back ends and xen-bus code is going to be needed to really make soft reset and migration fully functional, and this part is the basis for that. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 26 -- hw/i386/kvm/xen_gnttab.h | 1 + target/i386/kvm/xen-emu.c | 5 + 3 files changed, 30 insertions(+), 2 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 23/25] hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 16 1 file changed, 16 insertions(+) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 028f80499e..f9b7387024 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -21,6 +21,7 @@ #include "hw/sysbus.h" #include "hw/xen/xen.h" +#include "hw/xen/xen_backend_ops.h" #include "xen_overlay.h" #include "xen_evtchn.h" #include "xen_xenstore.h" @@ -34,6 +35,7 @@ #include "hw/xen/interface/io/xs_wire.h" #include "hw/xen/interface/event_channel.h" +#include "hw/xen/interface/grant_table.h" #define TYPE_XEN_XENSTORE "xen-xenstore" OBJECT_DECLARE_SIMPLE_TYPE(XenXenstoreState, XEN_XENSTORE) @@ -66,6 +68,9 @@ struct XenXenstoreState { uint8_t *impl_state; uint32_t impl_state_size; + +struct xengntdev_handle *gt; +void *granted_xs; }; struct XenXenstoreState *xen_xenstore_singleton; @@ -1452,6 +1457,17 @@ int xen_xenstore_reset(void) } s->be_port = err; +/* + * We don't actually access the guest's page through the grant, because + * this isn't real Xen, and we can just use the page we gave it in the + * first place. Map the grant anyway, mostly for cosmetic purposes so + * it *looks* like it's in use in the guest-visible grant table. Might be useful to stick this text in the commit comment too. Reviewed-by: Paul Durrant + */ +s->gt = qemu_xen_gnttab_open(); +uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE; +s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, _gntref, + PROT_READ | PROT_WRITE); + return 0; }
Re: [RFC PATCH v1 22/25] hw/xen: Add emulated implementation of XenStore operations
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Now that we have an internal implementation of XenStore, we can populate the xenstore_backend_ops to allow PV backends to talk to it. Watches can't be processed with immediate callbacks because that would call back into XenBus code recursively. Defer them to a QEMUBH to be run as appropriate from the main loop. We use a QEMUBH per XS handle, and it walks all the watches (there shouldn't be many per handle) to fire any which have pending events. We *could* have done it differently but this allows us to use the same struct watch_event as we have for the guest side, and keeps things relatively simple. Yes, it's more consistent with watch events on real Xen this way. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 273 - 1 file changed, 269 insertions(+), 4 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 21/25] hw/xen: Add emulated implementation of grant table operations
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse This is limited to mapping a single grant at a time, because under Xen the pages are mapped *contiguously* into qemu's address space, and that's very hard to do when those pages actually come from anonymous mappings in qemu in the first place. Eventually perhaps we can look at using shared mappings of actual objects for system RAM, and then we can make new mappings of the same backing store (be it deleted files, shmem, whatever). But for now let's stick to a page at a time. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 299 ++- 1 file changed, 296 insertions(+), 3 deletions(-) [snip] +static uint64_t gnt_ref(XenGnttabState *s, grant_ref_t ref, int prot) +{ +uint16_t mask = GTF_type_mask | GTF_sub_page; +grant_entry_v1_t gnt, *gnt_p; +int retries = 0; + +if (ref >= s->max_frames * ENTRIES_PER_FRAME_V1 || +s->map_track[ref] == UINT8_MAX) { +return INVALID_GPA; +} + +if (prot & PROT_WRITE) { +mask |= GTF_readonly; +} + +gnt_p = >entries.v1[ref]; + +/* + * The guest can legitimately be changing the GTF_readonly flag. Allow I'd call a guest playing with the ref after setting GTF_permit_access a buggy guest and not bother with the loop. + * that, but don't let a malicious guest cause a livelock. + */ +for (retries = 0; retries < 5; retries++) { +uint16_t new_flags; + +/* Read the entry before an atomic operation on its flags */ +gnt = *(volatile grant_entry_v1_t *)gnt_p; + +if ((gnt.flags & mask) != GTF_permit_access || +gnt.domid != DOMID_QEMU) { +return INVALID_GPA; +} + +new_flags = gnt.flags | GTF_reading; +if (prot & PROT_WRITE) { +new_flags |= GTF_writing; +} + +if (qatomic_cmpxchg(_p->flags, gnt.flags, new_flags) == gnt.flags) { Xen actually does a cmpxchg on both the flags and the domid. We probably ought to fail to set the flags if the guest is playing with the domid but since we're single-tenant it doesn't *really* matter... just a nice-to-have. So... Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 20/25] hw/xen: Hook up emulated implementation for event channel operations
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse We provided the backend-facing evtchn functions very early on as part of the core Xen platform support, since things like timers and xenstore need to use them. By what may or may not be an astonishing coincidence, those functions just *happen* all to have exactly the right function prototypes to slot into the evtchn_backend_ops table and be called by the PV backends. It is indeed an amazing coincidence :-) Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 15 +++ 1 file changed, 15 insertions(+) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 19/25] hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Whem emulating Xen, multi-page grants are distinctly non-trivial and we have elected not to support them for the time being. Don't advertise them to the guest. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 18/25] hw/xen: Avoid crash when backend watch fires too early
On 02/03/2023 15:34, David Woodhouse wrote: From: Paul Durrant The xen-block code ends up calling aio_poll() through blkconf_geometry(), which means we see watch events during the indirect call to xendev_class->realize() in xen_device_realize(). Unfortunately this call is made before populating the initial frontend and backend device nodes in xenstore and hence xen_block_frontend_changed() (which is called from a watch event) fails to read the frontend's 'state' node, and hence believes the device is being torn down. This in-turn sets the backend state to XenbusStateClosed and causes the device to be deleted before it is fully set up, leading to the crash. By simply moving the call to xendev_class->realize() after the initial xenstore nodes are populated, this sorry state of affairs is avoided. Reported-by: David Woodhouse Signed-off-by: Paul Durrant Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant