from:"Paul Durrant"

Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs

2024-04-08 Thread Paul Durrant


On 04/04/2024 15:08, Ross Lagerwall wrote:

A malicious or buggy guest may generated buffered ioreqs faster than
QEMU can process them in handle_buffered_iopage(). The result is a
livelock - QEMU continuously processes ioreqs on the main thread without
iterating through the main loop which prevents handling other events,
processing timers, etc. Without QEMU handling other events, it often
results in the guest becoming unsable and makes it difficult to stop the
source of buffered ioreqs.

To avoid this, if we process a full page of buffered ioreqs, stop and
reschedule an immediate timer to continue processing them. This lets
QEMU go back to the main loop and catch up.

Signed-off-by: Ross Lagerwall 
---
  hw/xen/xen-hvm-common.c | 26 +-
  1 file changed, 17 insertions(+), 9 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs

2024-04-08 Thread Paul Durrant


On 08/04/2024 14:00, Ross Lagerwall wrote:

On Sat, Apr 6, 2024 at 11:58 AM Durrant, Paul  wrote:


On 04/04/2024 15:08, Ross Lagerwall wrote:

A malicious or buggy guest may generated buffered ioreqs faster than
QEMU can process them in handle_buffered_iopage(). The result is a
livelock - QEMU continuously processes ioreqs on the main thread without
iterating through the main loop which prevents handling other events,
processing timers, etc. Without QEMU handling other events, it often
results in the guest becoming unsable and makes it difficult to stop the
source of buffered ioreqs.

To avoid this, if we process a full page of buffered ioreqs, stop and
reschedule an immediate timer to continue processing them. This lets
QEMU go back to the main loop and catch up.



Do PV backends potentially cause the same scheduling issue (if not using
io threads)?



 From what I can tell:

xen-block: It reads req_prod / req_cons once before entering the loop
so it should be fine, I think.

xen_console: Same as xen-block

xen_nic: It reads req_prod / req_cons once before entering the loop.
However, once the loop ends it checks for more requests and if there
are more requests it restarts from the beginning. It seems like this
could be susceptible to the same issue.

(These PV backends generally aren't used by XenServer's system QEMU
so I didn't spend too much time looking into it.)

Thanks,


Ok. Thanks for checking.

  Paul

Re: Xen NIC driver have page_pool memory leaks

2024-03-25 Thread Paul Durrant


On 25/03/2024 12:21, Jesper Dangaard Brouer wrote:

Hi Arthur,

(Answer inlined below, which is custom on this mailing list)

On 23/03/2024 14.23, Arthur Borsboom wrote:

Hi Jesper,

After a recent kernel upgrade 6.7.6 > 6.8.1 all my Xen guests on Arch
Linux are dumping kernel traces.
It seems to be indirectly caused by the page pool memory leak
mechanism, which is probably a good thing.

I have created a bug report, but there is no response.

https://bugzilla.kernel.org/show_bug.cgi?id=218618

I am uncertain where and to whom I need to report this page leak.
Can you help me get this issue fixed?


I'm the page_pool maintainer, but as you say yourself in comment 2 then
since dba1b8a7ab68 ("mm/page_pool: catch page_pool memory leaks") this
indicated there is a problem in the xen_netfront driver, which was
previously not visible.

Cc'ing the "XEN NETWORK BACKEND DRIVER" maintainers, as this is a driver
bug.  What confuses me it that I cannot find any modules named
"xen_netfront" in the upstream tree.



You should have tried '-' rather than '_' :-)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/xen-netfront.c

[PATCH] MAINTAINERS: Remove myself from several subsystems

2024-03-13 Thread Paul Durrant

From: Paul Durrant 

I am not as actively involved with the Xen project as I once was so I need
to relinquish some of my maintainer responsibilities: IOMMU and I/O
emulation.

NOTE: Since I am sole maintainer for I/O EMULATION (IOREQ) and
  X86 I/O EMULATION, these sections are removed.

Signed-off-by: Paul Durrant 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Stefano Stabellini 
Cc: Wei Liu 
---
 MAINTAINERS | 20 
 1 file changed, 20 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 56a6932037fe..b2cc3cc3921b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -358,7 +358,6 @@ F:  xen/drivers/passthrough/vtd/
 
 IOMMU VENDOR INDEPENDENT CODE
 M: Jan Beulich 
-M: Paul Durrant 
 R: Roger Pau Monné 
 S: Supported
 F: xen/drivers/passthrough/
@@ -368,13 +367,6 @@ X: xen/drivers/passthrough/vtd/
 X: xen/drivers/passthrough/device_tree.c
 F: xen/include/xen/iommu.h
 
-I/O EMULATION (IOREQ)
-M: Paul Durrant 
-S: Supported
-F: xen/common/ioreq.c
-F: xen/include/xen/ioreq.h
-F: xen/include/public/hvm/ioreq.h
-
 KCONFIG
 M: Doug Goldstein 
 S: Supported
@@ -608,18 +600,6 @@ F: tools/misc/xen-cpuid.c
 F: tools/tests/cpu-policy/
 F: tools/tests/x86_emulator/
 
-X86 I/O EMULATION
-M: Paul Durrant 
-S: Supported
-F: xen/arch/x86/hvm/emulate.c
-F: xen/arch/x86/hvm/intercept.c
-F: xen/arch/x86/hvm/io.c
-F: xen/arch/x86/hvm/ioreq.c
-F: xen/arch/x86/include/asm/hvm/emulate.h
-F: xen/arch/x86/include/asm/hvm/io.h
-F: xen/arch/x86/include/asm/hvm/ioreq.h
-F: xen/arch/x86/include/asm/ioreq.h
-
 X86 MEMORY MANAGEMENT
 M: Jan Beulich 
 M: Andrew Cooper 
-- 
2.39.2

Re: [PATCH net 1/7] net: fill in MODULE_DESCRIPTION()s for xen-netback

2024-02-13 Thread Paul Durrant


On 13/02/2024 11:21, Breno Leitao wrote:

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Xen backend network module.

Signed-off-by: Breno Leitao 
---
  drivers/net/xen-netback/netback.c | 1 +
  1 file changed, 1 insertion(+)



Acked-by: Paul Durrant 


diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index fab361a250d6..ef76850d9bcd 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1778,5 +1778,6 @@ static void __exit netback_fini(void)
  }
  module_exit(netback_fini);
  
+MODULE_DESCRIPTION("Xen backend network device module");

  MODULE_LICENSE("Dual BSD/GPL");
  MODULE_ALIAS("xen-backend:vif");

Re: [PATCH] x86/HVM: tidy state on hvmemul_map_linear_addr()'s error path

2024-02-08 Thread Paul Durrant


On 08/02/2024 15:59, Jan Beulich wrote:

On 06.02.2024 13:06, Jan Beulich wrote:

While in the vast majority of cases failure of the function will not
be followed by re-invocation with the same emulation context, a few
very specific insns - involving multiple independent writes, e.g. ENTER
and PUSHA - exist where this can happen. Since failure of the function
only signals to the caller that it ought to try an MMIO write instead,
such failure also cannot be assumed to result in wholesale failure of
emulation of the current insn. Instead we have to maintain internal
state such that another invocation of the function with the same
emulation context remains possible. To achieve that we need to reset MFN
slots after putting page references on the error path.

Note that all of this affects debugging code only, in causing an
assertion to trigger (higher up in the function). There's otherwise no
misbehavior - such a "leftover" slot would simply be overwritten by new
contents in a release build.

Also extend the related unmap() assertion, to further check for MFN 0.

Fixes: 8cbd4fb0b7ea ("x86/hvm: implement hvmemul_write() using real mappings")
Reported.by: Manuel Andreas 
Signed-off-by: Jan Beulich 


Just noticed that I forgot to Cc Paul.

Jan


---
While probably I could be convinced to omit the #ifndef, I'm really
considering to extend the one in hvmemul_unmap_linear_addr(), to
eliminate the zapping from release builds: Leaving MFN 0 in place is not
much better than leaving a (presently) guest-owned one there. And we
can't really put/leave INVALID_MFN there, as that would conflict with
other debug checking.


Would it be worth defining a sentinel value for this purpose rather than 
hardcoding _mfn(0)? (_mfn(0) seems like a reasonable sentinel... it's 
just a question of having a #define for it).


Either way...

Acked-by: Paul Durrant 



--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -696,7 +696,12 @@ static void *hvmemul_map_linear_addr(
   out:
  /* Drop all held references. */
  while ( mfn-- > hvmemul_ctxt->mfn )
+{
  put_page(mfn_to_page(*mfn));
+#ifndef NDEBUG /* Clean slot for a subsequent map()'s error checking. */
+*mfn = _mfn(0);
+#endif
+}
  
  return err;

  }
@@ -718,7 +723,7 @@ static void hvmemul_unmap_linear_addr(
  
  for ( i = 0; i < nr_frames; i++ )

  {
-ASSERT(mfn_valid(*mfn));
+ASSERT(mfn_x(*mfn) && mfn_valid(*mfn));
  paging_mark_dirty(currd, *mfn);
  put_page(mfn_to_page(*mfn));

Re: Ping: [PATCH] x86/guest: finish conversion to altcall

2024-02-02 Thread Paul Durrant


On 02/02/2024 07:08, Jan Beulich wrote:

On 17.01.2024 10:31, Jan Beulich wrote:

While .setup() and .e820_fixup() don't need fiddling with for being run
only very early, both .ap_setup() and .resume() want converting too:
This way both pre-filled struct hypervisor_ops instances can become
__initconst_cf_clobber, thus allowing to eliminate up to 5 more ENDBR
(configuration dependent) during the 2nd phase of alternatives patching.

While fiddling with section annotations here, also move "ops" itself to
.data.ro_after_init.

Signed-off-by: Jan Beulich 


May I ask for an ack (or otherwise)?

Thanks, Jan



Yes, no objections.

Acked-by: Paul Durrant 


--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -207,7 +207,7 @@ static int cf_check flush_tlb(
  return hyperv_flush_tlb(mask, va, flags);
  }
  
-static const struct hypervisor_ops __initconstrel ops = {

+static const struct hypervisor_ops __initconst_cf_clobber ops = {
  .name = "Hyper-V",
  .setup = setup,
  .ap_setup = ap_setup,
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -13,7 +13,7 @@
  #include 
  #include 
  
-static struct hypervisor_ops __read_mostly ops;

+static struct hypervisor_ops __ro_after_init ops;
  
  const char *__init hypervisor_probe(void)

  {
@@ -49,7 +49,7 @@ void __init hypervisor_setup(void)
  int hypervisor_ap_setup(void)
  {
  if ( ops.ap_setup )
-return ops.ap_setup();
+return alternative_call(ops.ap_setup);
  
  return 0;

  }
@@ -57,7 +57,7 @@ int hypervisor_ap_setup(void)
  void hypervisor_resume(void)
  {
  if ( ops.resume )
-ops.resume();
+alternative_vcall(ops.resume);
  }
  
  void __init hypervisor_e820_fixup(void)

--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -318,7 +318,7 @@ static int cf_check flush_tlb(
  return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
  }
  
-static const struct hypervisor_ops __initconstrel ops = {

+static const struct hypervisor_ops __initconst_cf_clobber ops = {
  .name = "Xen",
  .setup = setup,
  .ap_setup = ap_setup,

Re: [PATCH net] xen-netback: properly sync TX responses

2024-02-01 Thread Paul Durrant


On 29/01/2024 13:03, Jan Beulich wrote:

Invoking the make_tx_response() / push_tx_responses() pair with no lock
held would be acceptable only if all such invocations happened from the
same context (NAPI instance or dealloc thread). Since this isn't the
case, and since the interface "spec" also doesn't demand that multicast
operations may only be performed with no in-flight transmits,
MCAST_{ADD,DEL} processing also needs to acquire the response lock
around the invocations.

To prevent similar mistakes going forward, "downgrade" the present
functions to private helpers of just the two remaining ones using them
directly, with no forward declarations anymore. This involves renaming
what so far was make_tx_response(), for the new function of that name
to serve the new (wrapper) purpose.

While there,
- constify the txp parameters,
- correct xenvif_idx_release()'s status parameter's type,
- rename {,_}make_tx_response()'s status parameters for consistency with
   xenvif_idx_release()'s.

Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control")
Cc: sta...@vger.kernel.org
Signed-off-by: Jan Beulich 
---
Of course this could be split into two or even more separate changes,
but I think these transformations are best done all in one go.

It remains questionable whether push_tx_responses() really needs
invoking after every single _make_tx_response().

MCAST_{ADD,DEL} are odd also from another perspective: They're supposed
to come with "dummy requests", with the comment in the public header
leaving open what that means.


IIRC the only reference I had at the time was Solaris code... I don't 
really there being any documentation of the feature. I think that 
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/xen/io/xnb.c 
is probably similar to the code I looked at to determine what the 
Solaris frontend expected. So I think 'dummy' means 'ignored'.



Netback doesn't check dummy-ness (e.g.
size being zero). Furthermore the description in the public header
doesn't really make clear that there's a restriction of one such "extra"
per dummy request. Yet the way xenvif_get_extras() works precludes
multiple ADDs or multiple DELs in a single dummy request (only the last
one would be honored afaict). While the way xenvif_tx_build_gops() works
precludes an ADD and a DEL coming together in a single dummy request
(the DEL would be ignored).


It appears the Solaris backend never coped with multiple extra_info so 
what the 'correct' semantic would be is unclear.


Anyway...

Reviewed-by: Paul Durrant 



--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -104,13 +104,12 @@ bool provides_xdp_headroom = true;
  module_param(provides_xdp_headroom, bool, 0644);
  
  static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,

-  u8 status);
+  s8 status);
  
  static void make_tx_response(struct xenvif_queue *queue,

-struct xen_netif_tx_request *txp,
+const struct xen_netif_tx_request *txp,
 unsigned int extra_count,
-s8   st);
-static void push_tx_responses(struct xenvif_queue *queue);
+s8 status);
  
  static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
  
@@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_

  unsigned int extra_count, RING_IDX end)
  {
RING_IDX cons = queue->tx.req_cons;
-   unsigned long flags;
  
  	do {

-   spin_lock_irqsave(>response_lock, flags);
make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR);
-   push_tx_responses(queue);
-   spin_unlock_irqrestore(>response_lock, flags);
if (cons == end)
break;
RING_COPY_REQUEST(>tx, cons++, txp);
@@ -465,12 +460,7 @@ static void xenvif_get_requests(struct x
for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < 
MAX_SKB_FRAGS;
 nr_slots--) {
if (unlikely(!txp->size)) {
-   unsigned long flags;
-
-   spin_lock_irqsave(>response_lock, flags);
make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY);
-   push_tx_responses(queue);
-   spin_unlock_irqrestore(>response_lock, flags);
++txp;
continue;
}
@@ -496,14 +486,8 @@ static void xenvif_get_requests(struct x
  
  		for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) {

if (unlikely(!txp->size)) {
-   unsigned long flags;
-
-   s

Re: [PATCH net] xen-netback: properly sync TX responses

2024-02-01 Thread Paul Durrant


On 01/02/2024 00:23, Jakub Kicinski wrote:

On Mon, 29 Jan 2024 14:03:08 +0100 Jan Beulich wrote:

Invoking the make_tx_response() / push_tx_responses() pair with no lock
held would be acceptable only if all such invocations happened from the
same context (NAPI instance or dealloc thread). Since this isn't the
case, and since the interface "spec" also doesn't demand that multicast
operations may only be performed with no in-flight transmits,
MCAST_{ADD,DEL} processing also needs to acquire the response lock
around the invocations.

To prevent similar mistakes going forward, "downgrade" the present
functions to private helpers of just the two remaining ones using them
directly, with no forward declarations anymore. This involves renaming
what so far was make_tx_response(), for the new function of that name
to serve the new (wrapper) purpose.

While there,
- constify the txp parameters,
- correct xenvif_idx_release()'s status parameter's type,
- rename {,_}make_tx_response()'s status parameters for consistency with
   xenvif_idx_release()'s.


Hi Paul, is this one on your TODO list to review or should
we do our best? :)


Sorry for the delay. I'll take a look at this now.

  Paul

Re: [PATCH v5 3/3] x86/iommu: cleanup unused functions

2024-01-29 Thread Paul Durrant


On 24/01/2024 17:29, Roger Pau Monne wrote:

Remove xen_in_range() and vpci_is_mmcfg_address() now that hey are unused.

Adjust comments to point to the new functions that replace the existing ones.

No functional change.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Changes since v2:
  - Do remove vpci_is_mmcfg_address().
---
Can be squashed with the previous patch if desired, split as a separate patch
for clarity.
---
  xen/arch/x86/hvm/io.c |  5 ---
  xen/arch/x86/include/asm/hvm/io.h |  3 --
  xen/arch/x86/include/asm/setup.h  |  1 -
  xen/arch/x86/setup.c  | 53 ++-
  xen/arch/x86/tboot.c  |  2 +-
  5 files changed, 3 insertions(+), 61 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v5 2/3] x86/iommu: switch hwdom IOMMU to use a rangeset

2024-01-29 Thread Paul Durrant


On 24/01/2024 17:29, Roger Pau Monne wrote:

The current loop that iterates from 0 to the maximum RAM address in order to
setup the IOMMU mappings is highly inefficient, and it will get worse as the
amount of RAM increases.  It's also not accounting for any reserved regions
past the last RAM address.

Instead of iterating over memory addresses, iterate over the memory map regions
and use a rangeset in order to keep track of which ranges need to be identity
mapped in the hardware domain physical address space.

On an AMD EPYC 7452 with 512GiB of RAM, the time to execute
arch_iommu_hwdom_init() in nanoseconds is:

x old
+ new
 N   Min   MaxMedian   AvgStddev
x   5 2.2364154e+10  2.338244e+10 2.2474685e+10 2.2622409e+10 4.2949869e+08
+   5   1025012   1033036   1026188 1028276.2 3623.1194
Difference at 95.0% confidence
 -2.26214e+10 +/- 4.42931e+08
 -99.9955% +/- 9.05152e-05%
 (Student's t, pooled s = 3.03701e+08)

Execution time of arch_iommu_hwdom_init() goes down from ~22s to ~0.001s.

Note there's a change for HVM domains (ie: PVH dom0) that get switched to
create the p2m mappings using map_mmio_regions() instead of
p2m_add_identity_entry(), so that ranges can be mapped with a single function
call if possible.  Note that the interface of map_mmio_regions() doesn't
allow creating read-only mappings, but so far there are no such mappings
created for PVH dom0 in arch_iommu_hwdom_init().

No change intended in the resulting mappings that a hardware domain gets.

Signed-off-by: Roger Pau Monné 
---
Changes since v4:
  - Add default case to handle ACPI and NVS regions (which are not mapped)
unless in the low 4GB and when inclusive mode is set.
  - Add changelog entry.
  - Dropped Jans RB.

Changes since v3:
  - Remove unnecessary line wraps.

Changes since v2:
  - Simplify a bit the logic related to inclusive option, at the cost of making
some no-op calls on some cases.

Changes since v1:
  - Split from bigger patch.
  - Remove unneeded default case.

x86/iommu: add CHANGELOG entry for hwdom setup time improvements

Signed-off-by: Roger Pau Monné 


Reviewed-by: Paul Durrant

Re: [PATCH v6 1/3] x86/iommu: remove regions not to be mapped

2024-01-29 Thread Paul Durrant


On 25/01/2024 13:26, Roger Pau Monne wrote:

Introduce the code to remove regions not to be mapped from the rangeset
that will be used to setup the IOMMU page tables for the hardware domain.

This change also introduces two new functions: remove_xen_ranges() and
vpci_subtract_mmcfg() that copy the logic in xen_in_range() and
vpci_is_mmcfg_address() respectively and remove the ranges that would otherwise
be intercepted by the original functions.

Note that the rangeset is still not populated.

Signed-off-by: Roger Pau Monné 
---
Changes since v6:
  - Fix subtraction to be done against the address, not the mfn.

Changes since v4:
  - Fix off-by-one when removing the Xen used ranges, as the rangesets are
inclusive.

Changes since v3:
  - Remove unnecessary line wrapping.

Changes since v1:
  - Split from bigger patch.
---
  xen/arch/x86/hvm/io.c   | 16 
  xen/arch/x86/include/asm/hvm/io.h   |  3 ++
  xen/arch/x86/include/asm/setup.h|  1 +
  xen/arch/x86/setup.c| 48 +++
  xen/drivers/passthrough/x86/iommu.c | 61 +
  5 files changed, 129 insertions(+)



Reviewed-by: Paul Durrant

Re: [PATCH v4 3/6] xen: decouple generic xen code from legacy backends codebase

2023-12-04 Thread Paul Durrant


On 02/12/2023 01:41, Volodymyr Babchuk wrote:

In xen-all.c there are unneeded dependencies on xen-legacy-backend.c:

  - xen_init() uses xen_pv_printf() to report errors, but it does not
  provide a pointer to the struct XenLegacyDevice, so it is kind of
  useless, we can use standard error_report() instead.

  - xen-all.c has function xenstore_record_dm_state() which uses global
  variable "xenstore" defined and initialized in xen-legacy-backend.c
  It is used exactly once, so we can just open a new connection to the
  xenstore, update DM state and close connection back.

Those two changes allows us to remove xen-legacy-backend.c at all,
what should be done in the future anyways. But right now this patch
moves us one step close to have QEMU build without legacy Xen
backends.

Signed-off-by: Volodymyr Babchuk 

---

In v4:

  - New in v4, previous was part of "xen: add option to disable legacy
  backends"
  - Do not move xenstore global variable from xen-legacy-backend.c,
instead use a local variable.
---
  accel/xen/xen-all.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 05/12] block: remove AioContext locking

2023-11-30 Thread Paul Durrant


On 29/11/2023 19:55, Stefan Hajnoczi wrote:

This is the big patch that removes
aio_context_acquire()/aio_context_release() from the block layer and
affected block layer users.

There isn't a clean way to split this patch and the reviewers are likely
the same group of people, so I decided to do it in one patch.

Signed-off-by: Stefan Hajnoczi 
---
  include/block/block-global-state.h |   9 +-
  include/block/block-io.h   |   3 +-
  include/block/snapshot.h   |   2 -
  block.c| 234 +-
  block/block-backend.c  |  14 --
  block/copy-before-write.c  |  22 +--
  block/export/export.c  |  22 +--
  block/io.c |  45 +
  block/mirror.c |  19 --
  block/monitor/bitmap-qmp-cmds.c|  20 +-
  block/monitor/block-hmp-cmds.c |  29 ---
  block/qapi-sysemu.c|  27 +--
  block/qapi.c   |  18 +-
  block/raw-format.c |   5 -
  block/replication.c|  58 +-
  block/snapshot.c   |  22 +--
  block/write-threshold.c|   6 -
  blockdev.c | 307 +
  blockjob.c |  18 --
  hw/block/dataplane/virtio-blk.c|  10 -
  hw/block/dataplane/xen-block.c |  17 +-
  hw/block/virtio-blk.c  |  45 +
  hw/core/qdev-properties-system.c   |   9 -
  job.c  |  16 --
  migration/block.c  |  33 +---
  migration/migration-hmp-cmds.c |   3 -
  migration/savevm.c |  22 ---
  net/colo-compare.c |   2 -
  qemu-img.c |   4 -
  qemu-io.c  |  10 +-
  qemu-nbd.c |   2 -
  replay/replay-debugging.c  |   4 -
  tests/unit/test-bdrv-drain.c   |  51 +
  tests/unit/test-bdrv-graph-mod.c   |   6 -
  tests/unit/test-block-iothread.c   |  31 ---
  tests/unit/test-blockjob.c | 137 -
  tests/unit/test-replication.c  |  11 --
  util/async.c   |   4 -
  util/vhost-user-server.c   |   3 -
  scripts/block-coroutine-wrapper.py |   3 -
  tests/tsan/suppressions.tsan   |   1 -
  41 files changed, 102 insertions(+), 1202 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD

2023-11-30 Thread Paul Durrant


On 29/11/2023 21:26, Stefan Hajnoczi wrote:

The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead, it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
---
  include/qemu/main-loop.h  | 20 ++--
  hw/i386/kvm/xen_evtchn.c  | 14 +++---
  hw/i386/kvm/xen_gnttab.c  |  2 +-
  hw/mips/mips_int.c|  2 +-
  hw/ppc/ppc.c  |  2 +-
  target/i386/kvm/xen-emu.c |  2 +-
  target/ppc/excp_helper.c  |  2 +-
  target/ppc/helper_regs.c  |  2 +-
  target/riscv/cpu_helper.c |  4 ++--
  9 files changed, 25 insertions(+), 25 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Paul Durrant

 |   4 +-
  target/s390x/tcg/misc_helper.c   | 118 +--
  target/sparc/int32_helper.c  |   2 +-
  target/sparc/int64_helper.c  |   6 +-
  target/sparc/win_helper.c|  20 ++---
  target/xtensa/exc_helper.c   |   8 +-
  ui/spice-core.c  |   4 +-
  util/async.c |   2 +-
  util/main-loop.c |   8 +-
  util/rcu.c   |  14 ++--
  audio/coreaudio.m|   4 +-
  memory_ldst.c.inc|  18 ++--
  target/i386/hvf/README.md|   2 +-
  ui/cocoa.m   |  50 ++--
  94 files changed, 502 insertions(+), 502 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory

2023-11-23 Thread Paul Durrant


On 23/11/2023 12:27, David Woodhouse wrote:

On 23 November 2023 12:17:57 GMT, Volodymyr Babchuk 
 wrote:


Hi David,

David Woodhouse  writes:

Which PV backends do you care about? We already have net, block and console 
converted.


Well, this is all what we need, actually. Even console only will be
sufficient, as we are using QEMU to provide virtio-pci backends, so both
storage and networking should be provided by virtio. Are you proposing
to just drop this patch at all? I believe we can live without it, yes.


Yeah, I think you can drop anything that's only needed for the legacy backends. 
I'm tempted to make a separate config option to compile those out.



I think that would be a good idea. The other legacy bacckend that we may 
need to care about is xenfb... not so much the framebuffer itself, but 
the mouse and keyboard aspects. The XENVKBD and XENHID drivers expose PV 
mouse and keyboard to Windows instances so it's be nice if we can avoid 
the backend withering away.


  Paul

Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it

2023-11-23 Thread Paul Durrant


On 22/11/2023 23:04, David Woodhouse wrote:

On Wed, 2023-11-22 at 22:56 +, Volodymyr Babchuk wrote:



Paul Durrant  writes:


On 21/11/2023 22:10, Volodymyr Babchuk wrote:

From: David Woodhouse 
This allows a XenDevice implementation to know whether it was
created
by QEMU, or merely discovered in XenStore after the toolstack created
it. This will allow us to create frontend/backend nodes only when we
should, rather than unconditionally attempting to overwrite them from
a driver domain which doesn't have privileges to do so.
As an added benefit, it also means we no longer have to call the
xen_backend_set_device() function from the device models immediately
after calling qdev_realize_and_unref(). Even though we could make
the argument that it's safe to do so, and the pointer to the unreffed
device *will* actually still be valid, it still made my skin itch to
look at it.
Signed-off-by: David Woodhouse 
---
   hw/block/xen-block.c | 3 +--
   hw/char/xen_console.c    | 2 +-
   hw/net/xen_nic.c | 2 +-
   hw/xen/xen-bus.c | 4 
   include/hw/xen/xen-backend.h | 2 --
   include/hw/xen/xen-bus.h | 2 ++
   6 files changed, 9 insertions(+), 6 deletions(-)



Actually, I think you should probably update
xen_backend_try_device_destroy() in this patch too. It currently uses
xen_backend_list_find() to check whether the specified XenDevice has
an associated XenBackendInstance.


Sure. Looks like it is the only user of xen_backend_list_find(), so we
can get rid of it as well.

I'll drop your R-b tag, because of those additional changes in the new
version.


I think it's fine to keep. He told me to do it :)


I confirm that :-)

Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory

2023-11-23 Thread Paul Durrant


On 23/11/2023 00:07, Volodymyr Babchuk wrote:


Hi,

Volodymyr Babchuk  writes:


Hi Stefano,

Stefano Stabellini  writes:


On Wed, 22 Nov 2023, David Woodhouse wrote:

On Wed, 2023-11-22 at 15:09 -0800, Stefano Stabellini wrote:

On Wed, 22 Nov 2023, David Woodhouse wrote:

On Wed, 2023-11-22 at 14:29 -0800, Stefano Stabellini wrote:

On Wed, 22 Nov 2023, Paul Durrant wrote:

On 21/11/2023 22:10, Volodymyr Babchuk wrote:

From: Oleksandr Tyshchenko 

Instead of forcing the owner to domid 0, use XS_PRESERVE_OWNER to
inherit the owner of the directory.


Ah... so that's why the previous patch is there.

This is not the right way to fix it. The QEMU Xen support is *assuming* that
QEMU is either running in, or emulating, dom0. In the emulation case this is
probably fine, but the 'real Xen' case it should be using the correct domid
for node creation. I guess this could either be supplied on the command line
or discerned by reading the local domain 'domid' node.


yes, it should be passed as command line option to QEMU


I'm not sure I like the idea of a command line option for something
which QEMU could discover for itself.


That's fine too. I meant to say "yes, as far as I know the toolstack
passes the domid to QEMU as a command line option today".


The -xen-domid argument on the QEMU command line today is the *guest*
domain ID, not the domain ID in which QEMU itself is running.

Or were you thinking of something different?


Ops, you are right and I understand your comment better now. The backend
domid is not on the command line but it should be discoverable (on
xenstore if I remember right).


Yes, it is just "~/domid". I'll add a function that reads it.


Just a quick question to QEMU folks: is it better to add a global
variable where we will store own Domain ID or it will be okay to read
domid from Xenstore every time we need it?

If global variable variant is better, what is proffered place to define
this variable? system/globals.c ?



Actually... is it possible for QEMU just to use a relative path for the 
backend nodes? That way it won't need to know it's own domid, will it?


  Paul

Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory

2023-11-22 Thread Paul Durrant


On 21/11/2023 22:10, Volodymyr Babchuk wrote:

From: Oleksandr Tyshchenko 

Instead of forcing the owner to domid 0, use XS_PRESERVE_OWNER to
inherit the owner of the directory.


Ah... so that's why the previous patch is there.

This is not the right way to fix it. The QEMU Xen support is *assuming* 
that QEMU is either running in, or emulating, dom0. In the emulation 
case this is probably fine, but the 'real Xen' case it should be using 
the correct domid for node creation. I guess this could either be 
supplied on the command line or discerned by reading the local domain 
'domid' node.




Note that for other than Dom0 domain (non toolstack domain) the
"driver_domain" property should be set in domain config file for the
toolstack to create required directories in advance.

Signed-off-by: Oleksandr Tyshchenko 
Signed-off-by: Volodymyr Babchuk 
---
  hw/xen/xen_pvdev.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen_pvdev.c b/hw/xen/xen_pvdev.c
index c5ad71e8dc..42bdd4f6c8 100644
--- a/hw/xen/xen_pvdev.c
+++ b/hw/xen/xen_pvdev.c
@@ -60,7 +60,8 @@ void xen_config_cleanup(void)
  
  int xenstore_mkdir(char *path, int p)

  {
-if (!qemu_xen_xs_create(xenstore, 0, 0, xen_domid, p, path)) {
+if (!qemu_xen_xs_create(xenstore, 0, XS_PRESERVE_OWNER,
+xen_domid, p, path)) {
  xen_pv_printf(NULL, 0, "xs_mkdir %s: failed\n", path);
  return -1;
  }

Re: [PATCH v2 3/6] xen: xenstore: add possibility to preserve owner

2023-11-22 Thread Paul Durrant


On 21/11/2023 22:10, Volodymyr Babchuk wrote:

Add option to preserve owner when creating an entry in Xen Store. This
may be needed in cases when Qemu is working as device model in a


*may* be needed?

I don't undertstand why this patch is necessary and the commit comment 
isn't really helping me.



domain that is Domain-0, e.g. in driver domain.

"owner" parameter for qemu_xen_xs_create() function can have special
value XS_PRESERVE_OWNER, which will make specific implementation to
get original owner of an entry and pass it back to
set_permissions() call.

Please note, that XenStore inherits permissions, so even if entry is
newly created by, it already has the owner set to match owner of entry
at previous level.

Signed-off-by: Volodymyr Babchuk

Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it

2023-11-22 Thread Paul Durrant


On 21/11/2023 22:10, Volodymyr Babchuk wrote:

From: David Woodhouse 

This allows a XenDevice implementation to know whether it was created
by QEMU, or merely discovered in XenStore after the toolstack created
it. This will allow us to create frontend/backend nodes only when we
should, rather than unconditionally attempting to overwrite them from
a driver domain which doesn't have privileges to do so.

As an added benefit, it also means we no longer have to call the
xen_backend_set_device() function from the device models immediately
after calling qdev_realize_and_unref(). Even though we could make
the argument that it's safe to do so, and the pointer to the unreffed
device *will* actually still be valid, it still made my skin itch to
look at it.

Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c | 3 +--
  hw/char/xen_console.c| 2 +-
  hw/net/xen_nic.c | 2 +-
  hw/xen/xen-bus.c | 4 
  include/hw/xen/xen-backend.h | 2 --
  include/hw/xen/xen-bus.h | 2 ++
  6 files changed, 9 insertions(+), 6 deletions(-)



Actually, I think you should probably update 
xen_backend_try_device_destroy() in this patch too. It currently uses 
xen_backend_list_find() to check whether the specified XenDevice has an 
associated XenBackendInstance.


  Paul

Re: [PATCH v2 2/6] xen: backends: touch some XenStore nodes only if device...

2023-11-22 Thread Paul Durrant


On 21/11/2023 22:10, Volodymyr Babchuk wrote:

was created by QEMU

Xen PV devices in QEMU can be created in two ways: either by QEMU
itself, if they were passed via command line, or by Xen toolstack. In
the latter case, QEMU scans XenStore entries and configures devices
accordingly.

In the second case we don't want QEMU to write/delete front-end
entries for two reasons: it might have no access to those entries if
it is running in un-privileged domain and it is just incorrect to
overwrite entries already provided by Xen toolstack, because toolstack
manages those nodes. For example, it might read backend- or frontend-
state to be sure that they are both disconnected and it is safe to
destroy a domain.

This patch checks presence of xendev->backend to check if Xen PV
device is acting as a backend (i.e. it was configured by Xen


Technally *all* XenDevice objects are backends.


toolstack) to decide if it should touch frontend entries in XenStore.
Also, when we need to remove XenStore entries during device teardown
only if they weren't created by Xen toolstack. If they were created by
toolstack, then it is toolstack's job to do proper clean-up.

Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it

2023-11-22 Thread Paul Durrant


On 21/11/2023 22:10, Volodymyr Babchuk wrote:

From: David Woodhouse 

This allows a XenDevice implementation to know whether it was created
by QEMU, or merely discovered in XenStore after the toolstack created
it. This will allow us to create frontend/backend nodes only when we
should, rather than unconditionally attempting to overwrite them from
a driver domain which doesn't have privileges to do so.

As an added benefit, it also means we no longer have to call the
xen_backend_set_device() function from the device models immediately
after calling qdev_realize_and_unref(). Even though we could make
the argument that it's safe to do so, and the pointer to the unreffed
device *will* actually still be valid, it still made my skin itch to
look at it.

Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c | 3 +--
  hw/char/xen_console.c| 2 +-
  hw/net/xen_nic.c | 2 +-
  hw/xen/xen-bus.c | 4 
  include/hw/xen/xen-backend.h | 2 --
  include/hw/xen/xen-bus.h | 2 ++
  6 files changed, 9 insertions(+), 6 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 2/3] vl: disable default serial when xen-console is enabled

2023-11-21 Thread Paul Durrant


On 15/11/2023 17:24, David Woodhouse wrote:

From: David Woodhouse 

If a Xen console is configured on the command line, do not add a default
serial port.

Signed-off-by: David Woodhouse 
---
  system/vl.c | 1 +
  1 file changed, 1 insertion(+)



Reviewed-by: Paul Durrant

Re: [PATCH 3/3] hw/xen: clean up xen_block_find_free_vdev() to avoid Coverity false positive

2023-11-15 Thread Paul Durrant


On 15/11/2023 12:24, David Woodhouse wrote:

From: David Woodhouse 

Coverity couldn't see that nr_existing was always going to be zero when
qemu_xen_xs_directory() returned NULL in the ENOENT case (CID 1523906).

Perhaps more to the point, neither could Peter at first glance. Improve
the code to hopefully make it clearer to Coverity and human reviewers
alike.

Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c | 24 +---
  1 file changed, 21 insertions(+), 3 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v1 3/7] xen: xenstore: add possibility to preserve owner

2023-11-12 Thread Paul Durrant


On 10/11/2023 15:42, Volodymyr Babchuk wrote:

Add option to preserve owner when creating an entry in Xen Store. This
may be needed in cases when Qemu is working as device model in a
domain that is Domain-0, e.g. in driver domain.

"owner" parameter for qemu_xen_xs_create() function can have special
value XS_PRESERVE_OWNER, which will make specific implementation to
get original owner of an entry and pass it back to
set_permissions() call.



If QEMU is running in a driver domain then it should know whether the 
domid of the domain it is running in and use that. Yes, it is hardcoded 
to 0 at the moment but surely a backend domid (which defaults to 0) 
could be passed on the command line?


  Paul


Signed-off-by: Volodymyr Babchuk 
---
  hw/i386/kvm/xen_xenstore.c   | 18 ++
  hw/xen/xen-operations.c  | 12 
  include/hw/xen/xen_backend_ops.h |  2 ++
  3 files changed, 32 insertions(+)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 660d0b72f9..7b894a9884 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -1572,6 +1572,24 @@ static bool xs_be_create(struct qemu_xs_handle *h, 
xs_transaction_t t,
  return false;
  }
  
+if (owner == XS_PRESERVE_OWNER) {

+GList *perms;
+char letter;
+
+err = xs_impl_get_perms(h->impl, 0, t, path, );
+if (err) {
+errno = err;
+return false;
+}
+
+if (sscanf(perms->data, "%c%u", , ) != 2) {
+errno = EFAULT;
+g_list_free_full(perms, g_free);
+return false;
+}
+g_list_free_full(perms, g_free);
+}
+
  perms_list = g_list_append(perms_list,
 xs_perm_as_string(XS_PERM_NONE, owner));
  perms_list = g_list_append(perms_list,
diff --git a/hw/xen/xen-operations.c b/hw/xen/xen-operations.c
index e00983ec44..1df59b3c08 100644
--- a/hw/xen/xen-operations.c
+++ b/hw/xen/xen-operations.c
@@ -300,6 +300,18 @@ static bool libxenstore_create(struct qemu_xs_handle *h, 
xs_transaction_t t,
  return false;
  }
  
+if (owner == XS_PRESERVE_OWNER) {

+struct xs_permissions *tmp;
+unsigned int num;
+
+tmp = xs_get_permissions(h->xsh, 0, path, );
+if (tmp == NULL) {
+return false;
+}
+perms_list[0].id = tmp[0].id;
+free(tmp);
+}
+
  return xs_set_permissions(h->xsh, t, path, perms_list,
ARRAY_SIZE(perms_list));
  }
diff --git a/include/hw/xen/xen_backend_ops.h b/include/hw/xen/xen_backend_ops.h
index 90cca85f52..273e414559 100644
--- a/include/hw/xen/xen_backend_ops.h
+++ b/include/hw/xen/xen_backend_ops.h
@@ -266,6 +266,8 @@ typedef uint32_t xs_transaction_t;
  #define XS_PERM_READ  0x01
  #define XS_PERM_WRITE 0x02
  
+#define XS_PRESERVE_OWNER0xFFFE

+
  struct xenstore_backend_ops {
  struct qemu_xs_handle *(*open)(void);
  void (*close)(struct qemu_xs_handle *h);

Re: [PATCH v1 5/7] xen-bus: Set offline if backend's state is XenbusStateClosed

2023-11-12 Thread Paul Durrant


On 10/11/2023 15:42, Volodymyr Babchuk wrote:

From: Oleksandr Tyshchenko 

Both state (XenbusStateClosed) and online (0) are expected by
toolstack/xl devd to completely destroy the device. But "offline"
is never being set by the backend resulting in timeout during
domain destruction, garbage in Xestore and still running Qemu
instance.

Signed-off-by: Oleksandr Tyshchenko 
Signed-off-by: Volodymyr Babchuk 
---
  hw/xen/xen-bus.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 75474d4b43..6e7ec3af64 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -519,6 +519,10 @@ static void xen_device_backend_changed(void *opaque, const 
char *path)
  xen_device_backend_set_state(xendev, XenbusStateClosed);
  }
  
+if (xen_device_backend_get_state(xendev) == XenbusStateClosed) {

+xen_device_backend_set_online(xendev, false);
+}
+
  /*
   * If a backend is still 'online' then we should leave it alone but,
   * if a backend is not 'online', then the device is a candidate


I don't understand what you're trying to do here. Just a few lines up 
from this hunk there is:


 506if (xen_device_backend_scanf(xendev, "online", "%u", ) 
!= 1) {

 507online = 0;
 508}
 509
 510xen_device_backend_set_online(xendev, !!online);

Why is this not sufficient? What happens if the frontend decides to stop 
and start (e.g. for a driver update)? I'm guessing the backend will be 
destroyed... which is not very friendly.


  Paul

Re: [PATCH v1 1/7] xen-block: Do not write frontend nodes

2023-11-12 Thread Paul Durrant


On 10/11/2023 15:42, Volodymyr Babchuk wrote:

From: Oleksandr Tyshchenko 

The PV backend running in other than Dom0 domain (non toolstack domain)
is not allowed to write frontend nodes. The more, the backend does not
need to do that at all, this is purely toolstack/xl devd business.

I do not know for what reason the backend does that here, this is not really
needed, probably it is just a leftover and all xen_device_frontend_printf()
instances should go away completely.



It is not a leftover and it is needed in the case that QEMU is 
instantiating the backend unilaterally... i.e. without the involvement 
of any Xen toolstack.
Agreed that, if QEMU, is running in a deprivileged context, that is not 
an option. The correct way to determined this though is whether the 
device is being created via the QEMU command line or whether is being 
created because XenStore nodes written by a toolstack were discovered.
In the latter case there should be a XenBackendInstance that corresponds 
to the XenDevice whereas in the former case there should not be. For 
example, see xen_backend_try_device_destroy()


  Paul



Signed-off-by: Oleksandr Tyshchenko 
Signed-off-by: Volodymyr Babchuk 
---
  hw/block/xen-block.c | 11 +++
  hw/xen/xen-bus.c |  2 +-
  2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a07cd7eb5d..dc4d477c22 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -221,6 +221,7 @@ static void xen_block_realize(XenDevice *xendev, Error 
**errp)
  XenBlockVdev *vdev = >props.vdev;
  BlockConf *conf = >props.conf;
  BlockBackend *blk = conf->blk;
+XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
  
  if (vdev->type == XEN_BLOCK_VDEV_TYPE_INVALID) {

  error_setg(errp, "vdev property not set");
@@ -280,10 +281,12 @@ static void xen_block_realize(XenDevice *xendev, Error 
**errp)
  
  xen_device_backend_printf(xendev, "info", "%u", blockdev->info);
  
-xen_device_frontend_printf(xendev, "virtual-device", "%lu",

-   vdev->number);
-xen_device_frontend_printf(xendev, "device-type", "%s",
-   blockdev->device_type);
+if (xenbus->backend_id == 0) {
+xen_device_frontend_printf(xendev, "virtual-device", "%lu",
+   vdev->number);
+xen_device_frontend_printf(xendev, "device-type", "%s",
+   blockdev->device_type);
+}
  
  xen_device_backend_printf(xendev, "sector-size", "%u",

conf->logical_block_size);
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index ece8ec40cd..06d5192aca 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -1048,7 +1048,7 @@ static void xen_device_realize(DeviceState *dev, Error 
**errp)
  xen_device_backend_set_online(xendev, true);
  xen_device_backend_set_state(xendev, XenbusStateInitWait);
  
-if (!xen_device_frontend_exists(xendev)) {

+if (!xen_device_frontend_exists(xendev) && xenbus->backend_id == 0) {
  xen_device_frontend_printf(xendev, "backend", "%s",
 xendev->backend_path);
  xen_device_frontend_printf(xendev, "backend-id", "%u",

Re: [PATCH] hw/xen: clean up xen_block_find_free_vdev() to avoid Coverity false positive

2023-11-09 Thread Paul Durrant


On 09/11/2023 15:30, David Woodhouse wrote:

From: David Woodhouse 

Coverity couldn't see that nr_existing was always going to be zero when
qemu_xen_xs_directory() returned NULL in the ENOENT case (CID 1523906).

Perhaps more to the point, neither could Peter at first glance. Improve
the code to hopefully make it clearer to Coverity and human reviewers
alike.

Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c | 24 +---
  1 file changed, 21 insertions(+), 3 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v4 17/17] docs: update Xen-on-KVM documentation

2023-11-06 Thread Paul Durrant


On 06/11/2023 14:35, David Woodhouse wrote:

From: David Woodhouse 

Add notes about console and network support, and how to launch PV guests.
Clean up the disk configuration examples now that that's simpler, and
remove the comment about IDE unplug on q35/AHCI now that it's fixed.

Update the -initrd option documentation to explain how to quote commas
in module command lines, and reference it when documenting PV guests.

Also update stale avocado test filename in MAINTAINERS.

Signed-off-by: David Woodhouse 
---
  MAINTAINERS  |   2 +-
  docs/system/i386/xen.rst | 107 +--
  qemu-options.hx  |  14 +++--
  3 files changed, 91 insertions(+), 32 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v4 16/17] doc/sphinx/hxtool.py: add optional label argument to SRST directive

2023-11-06 Thread Paul Durrant


On 06/11/2023 14:35, David Woodhouse wrote:

From: David Woodhouse 

We can't just embed labels directly into files like qemu-options.hx which
are included from multiple top-level RST files, because Sphinx sees the
labels as duplicate: https://github.com/sphinx-doc/sphinx/issues/9707

So add an 'emitrefs' option to the Sphinx hxtool-doc directive, which is
set only in invocation.rst and not from the HTML rendition of the man
page. Along with an argument to the SRST directive which causes a label
of the form '.. _LABEL-reference-label:' to be emitted when the emitrefs
option is set.

Signed-off-by: David Woodhouse 
---
  docs/sphinx/hxtool.py  | 18 +-
  docs/system/invocation.rst |  1 +
  2 files changed, 18 insertions(+), 1 deletion(-)



Reviewed-by: Paul Durrant

Re: [PATCH v4 13/17] hw/i386/pc: support '-nic' for xen-net-device

2023-11-06 Thread Paul Durrant


On 06/11/2023 14:35, David Woodhouse wrote:

From: David Woodhouse 

The default NIC creation seems a bit hackish to me. I don't understand
why each platform has to call pci_nic_init_nofail() from a point in the
code where it actually has a pointer to the PCI bus, and then we have
the special cases for things like ne2k_isa.

If qmp_device_add() can *find* the appropriate bus and instantiate
the device on it, why can't we just do that from generic code for
creating the default NICs too?

But that isn't a yak I want to shave today. Add a xenbus field to the
PCMachineState so that it can make its way from pc_basic_device_init()
to pc_nic_init() and be handled as a special case like ne2k_isa is.

Now we can launch emulated Xen guests with '-nic user'.

Signed-off-by: David Woodhouse 
---
  hw/i386/pc.c | 11 ---
  hw/i386/pc_piix.c|  2 +-
  hw/i386/pc_q35.c |  2 +-
  hw/xen/xen-bus.c |  4 +++-
  include/hw/i386/pc.h |  4 +++-
  include/hw/xen/xen-bus.h |  2 +-
  6 files changed, 17 insertions(+), 8 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v4 06/17] hw/xen: automatically assign device index to block devices

2023-11-06 Thread Paul Durrant


On 06/11/2023 14:34, David Woodhouse wrote:

From: David Woodhouse 

There's no need to force the user to assign a vdev. We can automatically
assign one, starting at xvda and searching until we find the first disk
name that's unused.

This means we can now allow '-drive if=xen,file=xxx' to work without an
explicit separate -driver argument, just like if=virtio.

Rip out the legacy handling from the xenpv machine, which was scribbling
over any disks configured by the toolstack, and didn't work with anything
but raw images.

Signed-off-by: David Woodhouse 
Acked-by: Kevin Wolf 
---
  blockdev.c  |  15 +++-
  hw/block/xen-block.c| 118 ++--
  hw/xen/xen_devconfig.c  |  28 ---
  hw/xenpv/xen_machine_pv.c   |   9 ---
  include/hw/xen/xen-legacy-backend.h |   1 -
  5 files changed, 125 insertions(+), 46 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH] xen/pt: Emulate multifunction bit in header type

2023-11-06 Thread Paul Durrant


On 03/11/2023 17:26, Ross Lagerwall wrote:

The intention of the code appears to have been to unconditionally set
the multifunction bit but since the emulation mask is 0x00 it has no
effect. Instead, emulate the bit and set it based on the multifunction
property of the PCIDevice (which can be set using QAPI).

This allows making passthrough devices appear as functions in a Xen
guest.

Signed-off-by: Ross Lagerwall 


Reviewed-by: Paul Durrant

Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-25 Thread Paul Durrant


On 25/10/2023 10:00, David Woodhouse wrote:

On Wed, 2023-10-25 at 09:31 +0100, Paul Durrant wrote:

On 24/10/2023 17:34, David Woodhouse wrote:

On Tue, 2023-10-24 at 17:25 +0100, Paul Durrant wrote:

On 24/10/2023 16:49, David Woodhouse wrote:

On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote:

On 24/10/2023 16:37, David Woodhouse wrote:

On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.


Why can't you map the console page via the grant table like the xenstore
page?


I suppose we could, but I didn't really want the generic xen-console
device code having any more of a special case for 'Xen emulation' than
it does already by having to call xen_primary_console_create().



But doesn't is save you the whole foreignmem thing? You can use the
grant table for primary and secondary consoles.


Yes. And I could leave the existing foreignmem thing just for the case
of primary console under true Xen. It's probably not that awful a
special case, in the end.

Then again, I was surprised I didn't *already* have a foreignmem ops
for the emulated case, and we're probably going to want to continue
fleshing it out later, so I don't really mind adding it.



True. We'll need it for some of the other more fun protocols like vkbd
or fb. Still, I think it'd be nicer to align the xenstore and primary
console code to look similar and punt the work until then :-)


I don't think it ends up looking like xenstore either way, does it?
Xenstore is special because it gets to use the original pointer to its
own page.



Not sure what you mean there? A guest can query the PFN for either
xenstore or console using HVM params, or it can find them in its own
grant table entries 0 or 1.


The code in our xen_xenstore.c uses its *own* pointer (s->xs) to the
MemoryRegion that it created (s->xenstore_page). It is its own backend,
as well as doing the "magic" to create the guest-side mapping and event
channel.

The difference for the console code is that we actually have a
*separation* between the standard backend code in xen_console.c, and
the magic frontend parts for the emulated mode.





I don't think I want to hack the xen_console code to explicitly call a
xen_console_give_me_your_page() function. If not foreignmem, I think
you were suggesting that we actually call the grant mapping code to get
a pointer to the underlying page, right?


I'm suggesting that the page be mapped in the same way that the xenstore
backend does:

1462    /*

1463 * We don't actually access the guest's page through the grant, because
1464 * this isn't real Xen, and we can just use the page we gave it in the
1465 * first place. Map the grant anyway, mostly for cosmetic purposes so
1466 * it *looks* like it's in use in the guest-visible grant table.
1467 */
1468    s->gt = qemu_xen_gnttab_open();
1469    uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE;
1470    s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, 
_gntref,
1471 PROT_READ | PROT_WRITE);


It already *is*. But as with xen_xenstore.c, nothing ever *uses* the
s->granted_xs pointer. It's just cosmetic to make the grant table look
right.

But that doesn't help the *backend* code. The backend doesn't even know
the grant ref#, because the convention we inherited from Xen is that
the `ring-ref` in XenStore for the primary console is actually the MFN,
to be mapped as foreignmem.

Of course, we *do* know the grant-ref for the primary console, as it's
always GNTTAB_RESERVED_CONSOLE. So I suppose we could put a hack into
the xen_console backend to map *that* in the case of primary console
under emu? In fact that would probably do the right thing even under
Xen if we could persuade Xen to make an ioemu primary console?



That's exactly what I am getting at :-) I don't think we need care about 
the ring-ref in xenstore for the primary console.


  Paul








I could kind of live with that... except that Xen has this ugly
convention that the "ring-ref" frontend node for the primary console
actually has the *MFN* not a grant ref. Which I don't understand since
the toolstack *does* populate the grant table for it (just as it does
for the xenstore page). But we'd have to add a special case exception
to that special case, so that in the emu case it'

Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-25 Thread Paul Durrant


On 24/10/2023 17:34, David Woodhouse wrote:

On Tue, 2023-10-24 at 17:25 +0100, Paul Durrant wrote:

On 24/10/2023 16:49, David Woodhouse wrote:

On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote:

On 24/10/2023 16:37, David Woodhouse wrote:

On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.


Why can't you map the console page via the grant table like the xenstore
page?


I suppose we could, but I didn't really want the generic xen-console
device code having any more of a special case for 'Xen emulation' than
it does already by having to call xen_primary_console_create().



But doesn't is save you the whole foreignmem thing? You can use the
grant table for primary and secondary consoles.


Yes. And I could leave the existing foreignmem thing just for the case
of primary console under true Xen. It's probably not that awful a
special case, in the end.

Then again, I was surprised I didn't *already* have a foreignmem ops
for the emulated case, and we're probably going to want to continue
fleshing it out later, so I don't really mind adding it.



True. We'll need it for some of the other more fun protocols like vkbd
or fb. Still, I think it'd be nicer to align the xenstore and primary
console code to look similar and punt the work until then :-)


I don't think it ends up looking like xenstore either way, does it?
Xenstore is special because it gets to use the original pointer to its
own page.



Not sure what you mean there? A guest can query the PFN for either 
xenstore or console using HVM params, or it can find them in its own 
grant table entries 0 or 1.



I don't think I want to hack the xen_console code to explicitly call a
xen_console_give_me_your_page() function. If not foreignmem, I think
you were suggesting that we actually call the grant mapping code to get
a pointer to the underlying page, right?


I'm suggesting that the page be mapped in the same way that the xenstore 
backend does:


1462/* 

1463 * We don't actually access the guest's page through the grant, 
because
1464 * this isn't real Xen, and we can just use the page we gave it 
in the
1465 * first place. Map the grant anyway, mostly for cosmetic 
purposes so
1466 * it *looks* like it's in use in the guest-visible grant table. 


1467 */
1468s->gt = qemu_xen_gnttab_open();
1469uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE;
1470s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, 
_gntref,

1471 PROT_READ | PROT_WRITE);




I could kind of live with that... except that Xen has this ugly
convention that the "ring-ref" frontend node for the primary console
actually has the *MFN* not a grant ref. Which I don't understand since
the toolstack *does* populate the grant table for it (just as it does
for the xenstore page). But we'd have to add a special case exception
to that special case, so that in the emu case it's an actual grant ref
again. I think I prefer just having a stub of foreignmem, TBH.



You're worried about the guest changing the page it uses for the primary 
console and putting a new one in xenstore? I'd be amazed if that even 
works on Xen unless the guest is careful to write it into 
GNTTAB_RESERVED_CONSOLE.



(I didn't yet manage to get Xen to actually create a primary console of
type iomem, FWIW)



No, that doesn't entirely surprise me.

  Paul

Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread Paul Durrant


On 24/10/2023 16:49, David Woodhouse wrote:

On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote:

On 24/10/2023 16:37, David Woodhouse wrote:

On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.


Why can't you map the console page via the grant table like the xenstore
page?


I suppose we could, but I didn't really want the generic xen-console
device code having any more of a special case for 'Xen emulation' than
it does already by having to call xen_primary_console_create().



But doesn't is save you the whole foreignmem thing? You can use the
grant table for primary and secondary consoles.


Yes. And I could leave the existing foreignmem thing just for the case
of primary console under true Xen. It's probably not that awful a
special case, in the end.

Then again, I was surprised I didn't *already* have a foreignmem ops
for the emulated case, and we're probably going to want to continue
fleshing it out later, so I don't really mind adding it.



True. We'll need it for some of the other more fun protocols like vkbd 
or fb. Still, I think it'd be nicer to align the xenstore and primary 
console code to look similar and punt the work until then :-)


  Paul

Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console

2023-10-24 Thread Paul Durrant


On 24/10/2023 16:48, David Woodhouse wrote:

On Tue, 2023-10-24 at 16:44 +0100, Paul Durrant wrote:

On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

On soft reset, the prinary console event channel needs to be rebound to
the backend port (in the xen-console driver). We could put that into the
xen-console driver itself, but it's slightly less ugly to keep it within
the KVM/Xen code, by stashing the backend port# on event channel reset
and then rebinding in the primary console reset when it has to recreate
the guest port anyway.


Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to
me. I go check.


I spent an unhapp hour trying to work out how Xen actually does any of
this :)

In the short term I'm more interested in having soft reset work, than
an explicit EVTCHNOP_reset. And I can't work out *how*, but we do seem
to have console again after a kexec in real Xen.


*Soft* reset may do it, but not the EVTCHNOP_reset hypercall itself, 
because there's a bunch of impenetrable toolstack magic involved the 
former. Perhaps you could just push the re-bind code up a layer into

kvm_xen_soft_reset().

  Paul

Re: [PATCH 2/4] hw/xen: update Xen PV NIC to XenDevice model

2023-10-24 Thread Paul Durrant


On 24/10/2023 16:17, David Woodhouse wrote:
[snip]


I don't think that's really a valid state for a network frontend. Linux
netback just ignores it.


Must we? I was thinking of making the ->frontend_changed() methods
optional and allowing backends to just provide ->connect() and
->disconnect() methods instead if they wanted to. Because we have three
identical ->frontend_changed() methods now...



Now maybe... Not sure things will look so common when other backends are 
converted. I'd prefer to maintain fidelity with Linux xen-netback as it 
is generally considered to be the canonical implementation.


  Paul

Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console

2023-10-24 Thread Paul Durrant


On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

On soft reset, the prinary console event channel needs to be rebound to
the backend port (in the xen-console driver). We could put that into the
xen-console driver itself, but it's slightly less ugly to keep it within
the KVM/Xen code, by stashing the backend port# on event channel reset
and then rebinding in the primary console reset when it has to recreate
the guest port anyway.


Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to 
me. I go check.


  Paul



Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_evtchn.c  |  9 +
  hw/i386/kvm/xen_primary_console.c | 29 -
  hw/i386/kvm/xen_primary_console.h |  1 +
  3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index d72dca6591..ce4da6d37a 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -40,6 +40,7 @@
  #include "xen_evtchn.h"
  #include "xen_overlay.h"
  #include "xen_xenstore.h"
+#include "xen_primary_console.h"
  
  #include "sysemu/kvm.h"

  #include "sysemu/kvm_xen.h"
@@ -1098,6 +1099,7 @@ int xen_evtchn_soft_reset(void)
  {
  XenEvtchnState *s = xen_evtchn_singleton;
  bool flush_kvm_routes;
+uint16_t con_port = xen_primary_console_get_port();
  int i;
  
  if (!s) {

@@ -1108,6 +1110,13 @@ int xen_evtchn_soft_reset(void)
  
  qemu_mutex_lock(>port_lock);
  
+if (con_port) {

+XenEvtchnPort *p = >port_table[con_port];
+if (p->type == EVTCHNSTAT_interdomain) {
+xen_primary_console_set_be_port(p->u.interdomain.port);
+}
+}
+
  for (i = 0; i < s->nr_ports; i++) {
  close_port(s, i, _kvm_routes);
  }
diff --git a/hw/i386/kvm/xen_primary_console.c 
b/hw/i386/kvm/xen_primary_console.c
index 0aa1c16ad6..5e6e085ac7 100644
--- a/hw/i386/kvm/xen_primary_console.c
+++ b/hw/i386/kvm/xen_primary_console.c
@@ -112,6 +112,15 @@ uint16_t xen_primary_console_get_port(void)
  return s->guest_port;
  }
  
+void xen_primary_console_set_be_port(uint16_t port)

+{
+XenPrimaryConsoleState *s = xen_primary_console_singleton;
+if (s) {
+printf("be port set to %d\n", port);
+s->be_port = port;
+}
+}
+
  uint64_t xen_primary_console_get_pfn(void)
  {
  XenPrimaryConsoleState *s = xen_primary_console_singleton;
@@ -142,6 +151,20 @@ static void alloc_guest_port(XenPrimaryConsoleState *s)
  }
  }
  
+static void rebind_guest_port(XenPrimaryConsoleState *s)

+{
+struct evtchn_bind_interdomain inter = {
+.remote_dom = DOMID_QEMU,
+.remote_port = s->be_port,
+};
+
+if (!xen_evtchn_bind_interdomain_op()) {
+s->guest_port = inter.local_port;
+}
+
+s->be_port = 0;
+}
+
  int xen_primary_console_reset(void)
  {
  XenPrimaryConsoleState *s = xen_primary_console_singleton;
@@ -154,7 +177,11 @@ int xen_primary_console_reset(void)
  xen_overlay_do_map_page(>console_page, gpa);
  }
  
-alloc_guest_port(s);

+if (s->be_port) {
+rebind_guest_port(s);
+} else {
+alloc_guest_port(s);
+}
  
  trace_xen_primary_console_reset(s->guest_port);
  
diff --git a/hw/i386/kvm/xen_primary_console.h b/hw/i386/kvm/xen_primary_console.h

index dd4922f3f4..7e2989ea0d 100644
--- a/hw/i386/kvm/xen_primary_console.h
+++ b/hw/i386/kvm/xen_primary_console.h
@@ -16,6 +16,7 @@ void xen_primary_console_create(void);
  int xen_primary_console_reset(void);
  
  uint16_t xen_primary_console_get_port(void);

+void xen_primary_console_set_be_port(uint16_t port);
  uint64_t xen_primary_console_get_pfn(void);
  void *xen_primary_console_get_map(void);

Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread Paul Durrant


On 24/10/2023 16:37, David Woodhouse wrote:

On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote:

On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.


Why can't you map the console page via the grant table like the xenstore
page?


I suppose we could, but I didn't really want the generic xen-console
device code having any more of a special case for 'Xen emulation' than
it does already by having to call xen_primary_console_create().



But doesn't is save you the whole foreignmem thing? You can use the 
grant table for primary and secondary consoles.


  Paul

Re: [PATCH v2 10/24] hw/xen: populate store frontend nodes with XenStore PFN/port

2023-10-24 Thread Paul Durrant


On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

This is kind of redundant since without being able to get these through
some other method (HVMOP_get_param) the guest wouldn't be able to access
XenStore in order to find them. But Xen populates them, and it does
allow guests to *rebind* to the event channel port after a reset.

Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 11 +++
  1 file changed, 11 insertions(+)



Reviewed-by: Paul Durrant

Re: [PATCH v2 05/24] hw/xen: fix XenStore watch delivery to guest

2023-10-24 Thread Paul Durrant


On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

When fire_watch_cb() found the response buffer empty, it would call
deliver_watch() to generate the XS_WATCH_EVENT message in the response
buffer and send an event channel notification to the guest… without
actually *copying* the response buffer into the ring. So there was
nothing for the guest to see. The pending response didn't actually get
processed into the ring until the guest next triggered some activity
from its side.

Add the missing call to put_rsp().

It might have been slightly nicer to call xen_xenstore_event() here,
which would *almost* have worked. Except for the fact that it calls
xen_be_evtchn_pending() to check that it really does have an event
pending (and clear the eventfd for next time). And under Xen it's
defined that setting that fd to O_NONBLOCK isn't guaranteed to work,
so the emu implementation follows suit.

This fixes Xen device hot-unplug.

Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation 
stubs")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 660d0b72f9..82a215058a 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -1357,10 +1357,12 @@ static void fire_watch_cb(void *opaque, const char 
*path, const char *token)
  } else {
  deliver_watch(s, path, token);
  /*
- * If the message was queued because there was already ring activity,
- * no need to wake the guest. But if not, we need to send the evtchn.
+ * Attempt to queue the message into the actual ring, and send
+ * the event channel notification if any bytes are copied.
   */
-xen_be_evtchn_notify(s->eh, s->be_port);
+if (put_rsp(s) > 0) {
+xen_be_evtchn_notify(s->eh, s->be_port);
+}


There's actually the potential for an assertion failure there; if the 
guest has specified an oversize token then deliver_watch() will not set 
rsp_pending... then put_rsp() will fail its assertion that the flag is true.


  Paul


  }
  }

Re: [PATCH v2 04/24] hw/xen: don't clear map_track[] in xen_gnttab_reset()

2023-10-24 Thread Paul Durrant


On 19/10/2023 16:40, David Woodhouse wrote:

From: David Woodhouse 

The refcounts actually correspond to 'active_ref' structures stored in a
GHashTable per "user" on the backend side (mostly, per XenDevice).

If we zero map_track[] on reset, then when the backend drivers get torn
down and release their mapping we hit the assert(s->map_track[ref] != 0)
in gnt_unref().

So leave them in place. Each backend driver will disconnect and reconnect
as the guest comes back up again and reconnects, and it all works out OK
in the end as the old refs get dropped.

Fixes: de26b2619789 ("hw/xen: Implement soft reset for emulated gnttab")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_gnttab.c | 2 --
  1 file changed, 2 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v2 03/24] hw/xen: select kernel mode for per-vCPU event channel upcall vector

2023-10-24 Thread Paul Durrant


On 19/10/2023 16:39, David Woodhouse wrote:

From: David Woodhouse 

A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.

For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:

/* Trick toolstack to think we are enlightened. */
if (!cpu)
rc = xen_set_callback_via(1);

That's explicitly setting the delivery to GSI#1, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in Qemu
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.

Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.

Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)

Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_evtchn.c  | 6 ++
  include/sysemu/kvm_xen.h  | 1 +
  target/i386/kvm/xen-emu.c | 7 +++
  3 files changed, 14 insertions(+)



Reviewed-by: Paul Durrant

Re: [PATCH 2/4] hw/xen: update Xen PV NIC to XenDevice model

2023-10-24 Thread Paul Durrant


On 17/10/2023 19:25, David Woodhouse wrote:

From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
  hw/net/meson.build|   2 +-
  hw/net/trace-events   |   9 +
  hw/net/xen_nic.c  | 426 +-
  hw/xenpv/xen_machine_pv.c |   1 -
  4 files changed, 342 insertions(+), 96 deletions(-)

diff --git a/hw/net/meson.build b/hw/net/meson.build
index 2632634df3..f64651c467 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -1,5 +1,5 @@
  system_ss.add(when: 'CONFIG_DP8393X', if_true: files('dp8393x.c'))
-system_ss.add(when: 'CONFIG_XEN', if_true: files('xen_nic.c'))
+system_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen_nic.c'))
  system_ss.add(when: 'CONFIG_NE2000_COMMON', if_true: files('ne2000.c'))
  
  # PCI network cards

diff --git a/hw/net/trace-events b/hw/net/trace-events
index 3abfd65e5b..ee56acfbce 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -482,3 +482,12 @@ dp8393x_receive_oversize(int size) "oversize packet, pkt_size 
is %d"
  dp8393x_receive_not_netcard(void) "packet not for netcard"
  dp8393x_receive_packet(int crba) "Receive packet at 0x%"PRIx32
  dp8393x_receive_write_status(int crba) "Write status at 0x%"PRIx32
+
+# xen_nic.c
+xen_netdev_realize(int idx, const char *info) "idx %u info '%s'"
+xen_netdev_unrealize(int idx) "idx %u"
+xen_netdev_create(int idx) "idx %u"
+xen_netdev_destroy(int idx) "idx %u"
+xen_netdev_disconnect(int idx) "idx %u"
+xen_netdev_connect(int idx, unsigned int tx, unsigned int rx, int port) "idx %u tx 
%u rx %u port %u"
+xen_netdev_frontend_changed(const char *idx, int state) "idx %s state %d"
diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 9bbf6599fc..84914c329c 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -20,6 +20,11 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/cutils.h"
+#include "qemu/qemu-print.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+
  #include 
  #include 
  #include 
@@ -27,18 +32,26 @@
  #include "net/net.h"
  #include "net/checksum.h"
  #include "net/util.h"
-#include "hw/xen/xen-legacy-backend.h"
+
+#include "hw/xen/xen-backend.h"
+#include "hw/xen/xen-bus-helper.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
  
  #include "hw/xen/interface/io/netif.h"

+#include "hw/xen/interface/io/xs_wire.h"
+
+#include "trace.h"
  
  /* - */
  
  struct XenNetDev {

-struct XenLegacyDevice  xendev;  /* must be first */
-char  *mac;
+struct XenDevice  xendev;  /* must be first */
+XenEventChannel   *event_channel;
+int   dev;
  int   tx_work;
-int   tx_ring_ref;
-int   rx_ring_ref;
+unsigned int  tx_ring_ref;
+unsigned int  rx_ring_ref;
  struct netif_tx_sring *txs;
  struct netif_rx_sring *rxs;
  netif_tx_back_ring_t  tx_ring;
@@ -47,6 +60,13 @@ struct XenNetDev {
  NICState  *nic;
  };
  
+typedef struct XenNetDev XenNetDev;

+
+#define TYPE_XEN_NET_DEVICE "xen-net-device"
+OBJECT_DECLARE_SIMPLE_TYPE(XenNetDev, XEN_NET_DEVICE)
+
+#define xen_pv_printf(a, n, ...) qemu_printf(__VA_ARGS__)


Why define this...


+
  /* - */
  
  static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, int8_t st)

@@ -68,7 +88,8 @@ static void net_tx_response(struct XenNetDev *netdev, 
netif_tx_request_t *txp, i
  netdev->tx_ring.rsp_prod_pvt = ++i;
  RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(>tx_ring, notify);
  if (notify) {
-xen_pv_send_notify(>xendev);
+xen_device_notify_event_channel(XEN_DEVICE(netdev),
+netdev->event_channel, NULL);
  }
  
  if (i == netdev->tx_ring.req_cons) {

@@ -104,8 +125,9 @@ static void net_tx_error(struct XenNetDev *netdev, 
netif_tx_request_t *txp, RING
  #endif
  }
  
-static void net_tx_packets(struct XenNetDev *netdev)

+static bool net_tx_packets(struct XenNetDev *netdev)
  {
+bool done_something = false;
  netif_tx_request_t txreq;
  RING_IDX rc, rp;
  void *page;
@@ -122,6 +144,7 @@ static void net_tx_packets(struct XenNetDev *netdev)
  }
  memcpy(, RING_GET_REQUEST(>tx_ring, rc), 
sizeof(txreq));
  netdev->tx_ring.req_cons = ++rc;
+done_something = true;
  
  #if 1

  /* should not happen in theory, we don't announce the *
@@ -151,7 +174,7 @@ static void net_tx_packets(struct XenNetDev *netdev)
  continue;
  }
  
-xen_pv_printf(>xendev, 3,

+if (0) qemu_printf(//>xendev, 3,


... and then not use it here? Perhaps the 'if (0)' ugliness can go in 
the macro too.



"tx packet ref %d, off %d, len %d, flags 
0x%x%s%s%s%s\n",

Re: [PATCH 1/4] hw/xen: only remove peers of PCI NICs on unplug

2023-10-24 Thread Paul Durrant


On 17/10/2023 19:25, David Woodhouse wrote:

From: David Woodhouse 

When the Xen guest asks to unplug *emulated* NICs, it's kind of unhelpful
also to unplug the peer of the *Xen* PV NIC.

Signed-off-by: David Woodhouse 
---
  hw/i386/xen/xen_platform.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 17457ff3de..e2dd1b536a 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -140,9 +140,14 @@ static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
  /* Remove the peer of the NIC device. Normally, this would be a tap device. */
  static void del_nic_peer(NICState *nic, void *opaque)
  {
-NetClientState *nc;
+NetClientState *nc = qemu_get_queue(nic);
+ObjectClass *klass = module_object_class_by_name(nc->model);
+
+/* Only delete peers of PCI NICs that we're about to delete */
+if (!klass || !object_class_dynamic_cast(klass, TYPE_PCI_DEVICE)) {


Would it not be better to test for object_class_dynamic_cast(klass, 
TYPE_XEN_DEVICE)?


  Paul


+return;
+}
  
-nc = qemu_get_queue(nic);

  if (nc->peer)
  qemu_del_net_client(nc->peer);
  }

Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.

Why can't you map the console page via the grant table like the xenstore 
page?


  Paul

Re: [PATCH 09/12] hw/xen: prevent duplicate device registrations

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

Ensure that we have a XenBackendInstance for every device regardless
of whether it was "discovered" in XenStore or created directly in QEMU.

This allows the backend_list to be a source of truth about whether a
given backend exists, and allows us to reject duplicates.

This also cleans up the fact that backend drivers were calling
xen_backend_set_device() with a XenDevice immediately after calling
qdev_realize_and_unref() on it, when it wasn't theirs to play with any
more.

Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c |  1 -
  hw/char/xen_console.c|  2 +-
  hw/xen/xen-backend.c | 78 ++--
  hw/xen/xen-bus.c |  8 
  include/hw/xen/xen-backend.h |  3 ++
  5 files changed, 69 insertions(+), 23 deletions(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a07cd7eb5d..9262338535 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -975,7 +975,6 @@ static void xen_block_device_create(XenBackendInstance 
*backend,
  goto fail;
  }
  
-xen_backend_set_device(backend, xendev);

  return;
  
  fail:

diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index bd20be116c..2825b8c511 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -468,7 +468,7 @@ static void xen_console_device_create(XenBackendInstance 
*backend,
  Chardev *cd = NULL;
  struct qemu_xs_handle *xsh = xenbus->xsh;
  
-if (qemu_strtoul(name, NULL, 10, )) {

+if (qemu_strtoul(name, NULL, 10, ) || number >= INT_MAX) {
  error_setg(errp, "failed to parse name '%s'", name);
  goto fail;
  }
I don't think this hunk belongs here, does it? Seems like it should be 
in patch 7.



diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c
index b9bf70a9f5..dcb4329258 100644
--- a/hw/xen/xen-backend.c
+++ b/hw/xen/xen-backend.c
@@ -101,22 +101,28 @@ static XenBackendInstance 
*xen_backend_list_find(XenDevice *xendev)
  return NULL;
  }
  
-bool xen_backend_exists(const char *type, const char *name)

+static XenBackendInstance *xen_backend_lookup(const XenBackendImpl *impl, 
const char *name)


This name is a little close to xen_backend_table_lookup()... perhaps 
that one should be renamed xen_backend_impl_lookup() for clarity.



  {
-const XenBackendImpl *impl = xen_backend_table_lookup(type);
  XenBackendInstance *backend;
  
-if (!impl) {

-return false;
-}
-
  QLIST_FOREACH(backend, _list, entry) {
  if (backend->impl == impl && !strcmp(backend->name, name)) {
-return true;
+return backend;
  }
  }
  
-return false;

+return NULL;
+}
+
+bool xen_backend_exists(const char *type, const char *name)
+{
+const XenBackendImpl *impl = xen_backend_table_lookup(type);
+
+if (!impl) {
+return false;
+}
+
+return !!xen_backend_lookup(impl, name);
  }
  
  static void xen_backend_list_remove(XenBackendInstance *backend)

@@ -138,11 +144,10 @@ void xen_backend_device_create(XenBus *xenbus, const char 
*type,
  backend = g_new0(XenBackendInstance, 1);
  backend->xenbus = xenbus;
  backend->name = g_strdup(name);
-
-impl->create(backend, opts, errp);
-
  backend->impl = impl;
  xen_backend_list_add(backend);
+
+impl->create(backend, opts, errp);
  }
  
  XenBus *xen_backend_get_bus(XenBackendInstance *backend)

@@ -155,13 +160,6 @@ const char *xen_backend_get_name(XenBackendInstance 
*backend)
  return backend->name;
  }
  
-void xen_backend_set_device(XenBackendInstance *backend,

-XenDevice *xendev)
-{
-g_assert(!backend->xendev);
-backend->xendev = xendev;
-}
-


The declaration also needs removing from xen-backend.h presumably.


  XenDevice *xen_backend_get_device(XenBackendInstance *backend)
  {
  return backend->xendev;
@@ -178,9 +176,7 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, 
Error **errp)
  }
  
  impl = backend->impl;

-if (backend->xendev) {
-impl->destroy(backend, errp);
-}
+impl->destroy(backend, errp);
  
  xen_backend_list_remove(backend);

  g_free(backend->name);
@@ -188,3 +184,43 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, 
Error **errp)
  
  return true;

  }
+
+bool xen_backend_device_realized(XenDevice *xendev, Error **errp)
+{
+XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
+const char *type = xendev_class->backend ? : 
object_get_typename(OBJECT(xendev));
+const XenBackendImpl *impl = xen_backend_table_lookup(type);
+XenBackendInstance *backend;
+
+if (!impl) {
+return false;
+}
+
+backend = xen_backend_lookup(impl, xendev->name);
+if (backend) {
+if (backend->xendev && backend->xendev != xendev) {
+error_setg(errp, "device %s/%s already exists", type, 
xendev->name);
+

Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-24 Thread Paul Durrant


On 24/10/2023 14:29, David Woodhouse wrote:

On Tue, 2023-10-24 at 13:59 +0100, Paul Durrant wrote:

On 24/10/2023 13:56, David Woodhouse wrote:

On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote:



--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, 
Error **errp)
     {
     ERRP_GUARD();
     XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+    XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
 
-    xendev->frontend_path = xen_device_get_frontend_path(xendev);

+    if (xendev_class->get_frontend_path) {
+    xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp);
+    if (!xendev->frontend_path) {
+    return;


I think you need to update errp here to note that you are failing to
create the frontend.


If xendev_class->get_frontend_path returned NULL it will have filled in errp.



Ok, but a prepend to say that a lack of path there means we skip
frontend creation seems reasonable?


No, it *is* returning an error. Perhaps I can make it



I understand it is returning an error. I thought the point of the 
cascading error handling was to prepend text at each (meaningful) layer 
such that the eventual message conveyed what failed and also what the 
consequences of that failure were.


  Paul


 if (!xendev->frontend_path) {
 /*
  * If the ->get_frontend_path() method returned NULL, it will
  * already have set *errp accordingly. Return the error.
  */
 return /* false */;
 }



As a general rule (I'll be doing a bombing run on xen-bus once I get my
patch queue down into single digits) we should never check 'if (*errp)'
to check if a function had an error. It should *also* return a success
or failure indication, and we should cope with errp being NULL.



I'm pretty sure someone told me the exact opposite a few years back.


Then they were wrong :)

Re: [PATCH 08/12] hw/xen: do not repeatedly try to create a failing backend device

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

If xen_backend_device_create() fails to instantiate a device, the XenBus
code will just keep trying over and over again each time the bus is
re-enumerated, as long as the backend appears online and in
XenbusStateInitialising.

The only thing which prevents the XenBus code from recreating duplicates
of devices which already exist, is the fact that xen_device_realize()
sets the backend state to XenbusStateInitWait. If the attempt to create
the device doesn't get *that* far, that's when it will keep getting
retried.

My first thought was to handle errors by setting the backend state to
XenbusStateClosed, but that doesn't work for XenConsole which wants to
*ignore* any device of type != "ioemu" completely.

So, make xen_backend_device_create() *keep* the XenBackendInstance for a
failed device, and provide a new xen_backend_exists() function to allow
xen_bus_type_enumerate() to check whether one already exists before
creating a new one.

Signed-off-by: David Woodhouse 
---
  hw/xen/xen-backend.c | 27 +--
  hw/xen/xen-bus.c |  3 ++-
  include/hw/xen/xen-backend.h |  1 +
  3 files changed, 24 insertions(+), 7 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 07/12] hw/xen: update Xen console to XenDevice model

2023-10-24 Thread Paul Durrant

ce_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL);
  }
  
-static void xencons_send(struct XenConsole *con)

+static bool xencons_send(XenConsole *con)
  {
  ssize_t len, size;
  
@@ -159,174 +170,407 @@ static void xencons_send(struct XenConsole *con)

  if (len < 1) {
  if (!con->backlog) {
  con->backlog = 1;
-xen_pv_printf(>xendev, 1,
-  "backlog piling up, nobody listening?\n");
  }
  } else {
  buffer_advance(>buffer, len);
  if (con->backlog && len == size) {
  con->backlog = 0;
-xen_pv_printf(>xendev, 1, "backlog is gone\n");
  }
  }
+return len > 0;
  }
  
  /*  */
  
-static int store_con_info(struct XenConsole *con)

+static bool con_event(void *_xendev)
  {
-Chardev *cs = qemu_chr_fe_get_driver(>chr);
-char *pts = NULL;
-char *dom_path;
-g_autoptr(GString) path = NULL;
+XenConsole *con = XEN_CONSOLE_DEVICE(_xendev);
+bool done_something;
  
-/* Only continue if we're talking to a pty. */

-if (!CHARDEV_IS_PTY(cs)) {
-return 0;
-}
-pts = cs->filename + 4;
+done_something = buffer_append(con);
  
-dom_path = qemu_xen_xs_get_domain_path(xenstore, xen_domid);

-if (!dom_path) {
-return 0;
+if (con->buffer.size - con->buffer.consumed) {
+done_something |= xencons_send(con);
  }
+return done_something;
+}
  
-path = g_string_new(dom_path);

-free(dom_path);
+/*  */
  
-if (con->xendev.dev) {

-g_string_append_printf(path, "/device/console/%d", con->xendev.dev);
-} else {
-g_string_append(path, "/console");
+static void xen_console_disconnect(XenDevice *xendev, Error **errp)
+{
+XenConsole *con = XEN_CONSOLE_DEVICE(xendev);
+
+qemu_chr_fe_set_handlers(>chr, NULL, NULL, NULL, NULL,
+ con, NULL, true);
+


nit: extraneous blank line by the looks of it.

With that fixed...

Reviewed-by: Paul Durrant

Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-24 Thread Paul Durrant


On 24/10/2023 13:56, David Woodhouse wrote:

On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote:



--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, 
Error **errp)
    {
    ERRP_GUARD();
    XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+    XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);

-    xendev->frontend_path = xen_device_get_frontend_path(xendev);

+    if (xendev_class->get_frontend_path) {
+    xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp);
+    if (!xendev->frontend_path) {
+    return;


I think you need to update errp here to note that you are failing to
create the frontend.


If xendev_class->get_frontend_path returned NULL it will have filled in errp.



Ok, but a prepend to say that a lack of path there means we skip 
frontend creation seems reasonable?



As a general rule (I'll be doing a bombing run on xen-bus once I get my
patch queue down into single digits) we should never check 'if (*errp)'
to check if a function had an error. It should *also* return a success
or failure indication, and we should cope with errp being NULL.



I'm pretty sure someone told me the exact opposite a few years back.

  Paul

Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

The primary Xen console is special. The guest's side is set up for it by
the toolstack automatically and not by the standard PV init sequence.

Accordingly, its *frontend* doesn't appear in …/device/console/0 either;
instead it appears under …/console in the guest's XenStore node.

To allow the Xen console driver to override the frontend path for the
primary console, add a method to the XenDeviceClass which can be used
instead of the standard xen_device_get_frontend_path()

Signed-off-by: David Woodhouse 
---
  hw/xen/xen-bus.c | 10 +-
  include/hw/xen/xen-bus.h |  2 ++
  2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index ece8ec40cd..cc524ed92c 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, 
Error **errp)
  {
  ERRP_GUARD();
  XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
  
-xendev->frontend_path = xen_device_get_frontend_path(xendev);

+if (xendev_class->get_frontend_path) {
+xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp);
+if (!xendev->frontend_path) {
+return;


I think you need to update errp here to note that you are failing to 
create the frontend.


  Paul

Re: [PATCH 05/12] hw/xen: populate store frontend nodes with XenStore PFN/port

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

This is kind of redundant since without being able to get these through
ome other method (HVMOP_get_param) the guest wouldn't be able to access


^ typo


XenStore in order to find them. But Xen populates them, and it does
allow guests to *rebind* to the event channel port after a reset.



... although this can also be done by querying the remote end of the 
port before reset.



Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 10 ++
  1 file changed, 10 insertions(+)



Reviewed-by: Paul Durrant

Re: [PATCH 04/12] i386/xen: advertise XEN_HVM_CPUID_UPCALL_VECTOR in CPUID

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

This will allow Linux guests (since v6.0) to use the per-vCPU upcall
vector delivered as MSI through the local APIC.

Signed-off-by: David Woodhouse 
---
  target/i386/kvm/kvm.c | 4 
  1 file changed, 4 insertions(+)



Reviewed-by: Paul Durrant

Re: [PATCH 03/12] include: update Xen public headers to Xen 4.17.2 release

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:19, David Woodhouse wrote:

From: David Woodhouse 

... in order to advertise the XEN_HVM_CPUID_UPCALL_VECTOR feature,
which will come in a subsequent commit.

Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c|  2 +-
  include/hw/xen/interface/arch-arm.h   | 37 +++---
  include/hw/xen/interface/arch-x86/cpuid.h | 31 +---
  .../hw/xen/interface/arch-x86/xen-x86_32.h| 19 +--
  .../hw/xen/interface/arch-x86/xen-x86_64.h| 19 +--
  include/hw/xen/interface/arch-x86/xen.h   | 26 ++
  include/hw/xen/interface/event_channel.h  | 19 +--
  include/hw/xen/interface/features.h   | 19 +--
  include/hw/xen/interface/grant_table.h| 19 +--
  include/hw/xen/interface/hvm/hvm_op.h | 19 +--
  include/hw/xen/interface/hvm/params.h | 19 +--
  include/hw/xen/interface/io/blkif.h   | 27 --
  include/hw/xen/interface/io/console.h | 19 +--
  include/hw/xen/interface/io/fbif.h| 19 +--
  include/hw/xen/interface/io/kbdif.h   | 19 +--
  include/hw/xen/interface/io/netif.h   | 25 +++---
  include/hw/xen/interface/io/protocols.h   | 19 +--
  include/hw/xen/interface/io/ring.h| 49 ++-
  include/hw/xen/interface/io/usbif.h   | 19 +--
  include/hw/xen/interface/io/xenbus.h  | 19 +--
  include/hw/xen/interface/io/xs_wire.h | 36 ++
  include/hw/xen/interface/memory.h | 30 +---
  include/hw/xen/interface/physdev.h| 23 ++---
  include/hw/xen/interface/sched.h  | 19 +--
  include/hw/xen/interface/trace.h  | 19 +--
  include/hw/xen/interface/vcpu.h   | 19 +--
  include/hw/xen/interface/version.h| 19 +--
  include/hw/xen/interface/xen-compat.h | 19 +--
  include/hw/xen/interface/xen.h| 19 +--
  29 files changed, 124 insertions(+), 523 deletions(-)



Acked-by: Paul Durrant

Re: [PATCH 02/12] hw/xen: select kernel mode for per-vCPU event channel upcall vector

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:18, David Woodhouse wrote:

From: David Woodhouse 

A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.

For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:

/* Trick toolstack to think we are enlightened. */
if (!cpu)
rc = xen_set_callback_via(1);

That's explicitly setting the delivery to GSI#, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in QEMU
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.

Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.

Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)

Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_evtchn.c  | 6 ++
  include/sysemu/kvm_xen.h  | 1 +
  target/i386/kvm/xen-emu.c | 7 +++
  3 files changed, 14 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 4df973022c..d72dca6591 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -490,6 +490,12 @@ int xen_evtchn_set_callback_param(uint64_t param)
  break;
  }
  
+/* If the guest has set a per-vCPU callback vector, prefer that. */

+if (gsi && kvm_xen_has_vcpu_callback_vector()) {
+in_kernel = kvm_xen_has_cap(EVTCHN_SEND);
+gsi = 0;
+}
+


So this deals with setting the callback via after setting the upcall 
vector. What happens if the guest then disables the upcall vector (by 
setting it to zero)? Xen checks 'v->arch.hvm.evtchn_upcall_vector != 0' 
for every event delivery.


  Paul

Re: [PATCH 01/12] i386/xen: fix per-vCPU upcall vector for Xen emulation

2023-10-24 Thread Paul Durrant


On 16/10/2023 16:18, David Woodhouse wrote:

From: David Woodhouse 

The per-vCPU upcall vector support had two problems. Firstly it was
using the wrong hypercall argument and would always return -EFAULT.
And secondly it was using the wrong ioctl() to pass the vector to
the kernel and thus the *kernel* would always return -EINVAL.

Linux doesn't (yet) use this mode so it went without decent testing
for a while.

Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector")
Signed-off-by: David Woodhouse 
---
  target/i386/kvm/xen-emu.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)


Reviewed-by: Paul Durrant

Re: [PATCH] net/xen-netback: Break build if netback slots > max_skbs + 1

2023-09-27 Thread Paul Durrant


On 27/09/2023 09:29, David Kahurani wrote:

If XEN_NETBK_LEGACY_SLOTS_MAX and MAX_SKB_FRAGS have a difference of
more than 1, with MAX_SKB_FRAGS being the lesser value, it opens up a
path for null-dereference. It was also noted that some distributions
were modifying upstream behaviour in that direction which necessitates
this patch.

Signed-off-by: David Kahurani 


Acked-by: Paul Durrant

Re: [XEN PATCH 08/13] xen/hvm: address violations of MISRA C:2012 Rule 7.3

2023-09-13 Thread Paul Durrant


On 03/08/2023 11:22, Simone Ballarin wrote:

From: Gianluca Luparini 

The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".

Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.

The changes in this patch are mechanical.

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
---
  xen/arch/x86/hvm/emulate.c | 2 +-
  xen/arch/x86/hvm/io.c  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)



Acked-by: Paul Durrant

Re: [XEN PATCH 05/13] xen/ioreq: address violations of MISRA C:2012 Rule 7.3

2023-09-13 Thread Paul Durrant


On 03/08/2023 11:22, Simone Ballarin wrote:

From: Gianluca Luparini 

The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".

Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.

The changes in this patch are mechanical.

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
---
  xen/common/ioreq.c  | 2 +-
  xen/include/xen/ioreq.h | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)



Acked-by: Paul Durrant

Re: [XEN PATCH 10/13] x86/viridian: address violations of MISRA C:2012 Rule 7.3

2023-09-12 Thread Paul Durrant


On 03/08/2023 11:22, Simone Ballarin wrote:

From: Gianluca Luparini 

The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".

Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.

The changes in this patch are mechanical.

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
---
  xen/arch/x86/hvm/viridian/synic.c |  2 +-
  xen/arch/x86/hvm/viridian/time.c  | 10 +-
  2 files changed, 6 insertions(+), 6 deletions(-)



Reviewed-by: Paul Durrant

Re: [XEN PATCH v5 3/4] x86/viridian: address violations of MISRA C:2012 Rule 7.2

2023-09-12 Thread Paul Durrant


On 28/08/2023 12:03, Simone Ballarin wrote:

From: Gianluca Luparini 

The xen sources contains violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".

Add the 'U' suffix to integers literals with unsigned type and also to other
literals used in the same contexts or near violations, when their positive
nature is immediately clear. The latter changes are done for the sake of
uniformity.

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
Reviewed-by: Stefano Stabellini 
---
Changes in v4:
- change commit headline
- add Reviewed-by

Changes in v3:
- create this commit for 'viridian.c' and 'hyperv-tlfs.h'
---
  xen/arch/x86/hvm/viridian/viridian.c |  2 +-
  xen/arch/x86/include/asm/guest/hyperv-tlfs.h | 28 ++--
  2 files changed, 15 insertions(+), 15 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v2 1/4] block: rename blk_io_plug_call() API to defer_call()

2023-08-18 Thread Paul Durrant


On 17/08/2023 16:58, Stefan Hajnoczi wrote:

Prepare to move the blk_io_plug_call() API out of the block layer so
that other subsystems call use this deferred call mechanism. Rename it
to defer_call() but leave the code in block/plug.c.

The next commit will move the code out of the block layer.

Suggested-by: Ilya Maximets 
Signed-off-by: Stefan Hajnoczi 
---
  include/sysemu/block-backend-io.h |   6 +-
  block/blkio.c |   8 +--
  block/io_uring.c  |   4 +-
  block/linux-aio.c |   4 +-
  block/nvme.c  |   4 +-
  block/plug.c  | 109 +++---
  hw/block/dataplane/xen-block.c|  10 +--
  hw/block/virtio-blk.c |   4 +-
  hw/scsi/virtio-scsi.c |   6 +-
  9 files changed, 76 insertions(+), 79 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v1] xen-platform: do full PCI reset during unplug of IDE devices

2023-07-27 Thread Paul Durrant


On 20/07/2023 08:29, Olaf Hering wrote:

The IDE unplug function needs to reset the entire PCI device, to make
sure all state is initialized to defaults. This is done by calling
pci_device_reset, which resets not only the chip specific registers, but
also all PCI state. This fixes "unplug" in a Xen HVM domU with the
modular legacy xenlinux PV drivers.

Commit ee358e919e38 ("hw/ide/piix: Convert reset handler to
DeviceReset") changed the way how the the disks are unplugged. Prior
this commit the PCI device remained unchanged. After this change,
piix_ide_reset is exercised after the "unplug" command, which was not
the case prior that commit. This function resets the command register.
As a result the ata_piix driver inside the domU will see a disabled PCI
device. The generic PCI code will reenable the PCI device. On the qemu
side, this runs pci_default_write_config/pci_update_mappings. Here a
changed address is returned by pci_bar_address, this is the address
which was truncated in piix_ide_reset. In case of a Xen HVM domU, the
address changes from 0xc120 to 0xc100. This truncation was a bug in
piix_ide_reset, which was fixed in commit 230dfd9257 ("hw/ide/piix:
properly initialize the BMIBA register"). If pci_xen_ide_unplug had used
pci_device_reset, the PCI registers would have been properly reset, and
commit ee358e919e38 would have not introduced a regression for this
specific domU environment.

While the unplug is supposed to hide the IDE disks, the changed BMIBA
address broke the UHCI device. In case the domU has an USB tablet
configured, to recive absolute pointer coordinates for the GUI, it will
cause a hang during device discovery of the partly discovered USB hid
device. Reading the USBSTS word size register will fail. The access ends
up in the QEMU piix-bmdma device, instead of the expected uhci device.
Here a byte size request is expected, and a value of ~0 is returned. As
a result the UCHI driver sees an error state in the register, and turns
off the UHCI controller.

Signed-off-by: Olaf Hering 
---
  hw/i386/xen/xen_platform.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH] hw/xen: Clarify (lack of) error handling in transaction_commit()

2023-07-26 Thread Paul Durrant


On 20/06/2023 18:58, David Woodhouse wrote:

From: David Woodhouse 

Coverity was unhappy (CID 1508359) because we didn't check the return of
init_walk_op() in transaction_commit(), despite doing so at every other
call site.

Strictly speaking, this is a false positive since it can never fail. It
only fails for invalid user input (transaction ID or path), and both of
those are hard-coded to known sane values in this invocation.

But Coverity doesn't know that, and neither does the casual reader of the
code.

Returning an error here would be weird, since the transaction *is*
committed by this point; all the walk_op is doing is firing watches on
the newly-committed changed nodes. So make it a g_assert(!ret), since
it really should never happen.

Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xenstore_impl.c | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)



Reviewed-by: Paul Durrant

Re: [XEN PATCH] x86: I/O emulation: fix violations of MISRA C:2012 Rules 8.2 and 8.3

2023-07-19 Thread Paul Durrant


On 19/07/2023 09:24, Federico Serafini wrote:

Give a name to unnamed parameters thus fixing violations of
MISRA C:2012 Rule 8.2 ("Function types shall be in prototype form with
named parameters").
Keep consistency between parameter names used in function declarations
and names used in the corresponding function definitions thus fixing
violations of MISRA C:2012 Rule 8.3 ("All declarations of an object or
function shall use the same names and type qualifiers").

Signed-off-by: Federico Serafini 
---
  xen/arch/x86/include/asm/hvm/emulate.h |  8 
  xen/arch/x86/include/asm/hvm/io.h  | 14 +++---
  2 files changed, 11 insertions(+), 11 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH] xen-block: Avoid leaks on new error path

2023-07-06 Thread Paul Durrant


On 04/07/2023 18:18, Anthony PERARD wrote:

From: Anthony PERARD 

Commit 189829399070 ("xen-block: Use specific blockdev driver")
introduced a new error path, without taking care of allocated
resources.

So only allocate the qdicts after the error check, and free both
`filename` and `driver` when we are about to return and thus taking
care of both success and error path.

Coverity only spotted the leak of qdicts (*_layer variables).

Reported-by: Peter Maydell 
Fixes: Coverity CID 1508722, 1398649
Fixes: 189829399070 ("xen-block: Use specific blockdev driver")
Signed-off-by: Anthony PERARD 
---
  hw/block/xen-block.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v1] x86/hvm/ioreq: remove empty line after function declaration

2023-05-25 Thread Paul Durrant


On 25/05/2023 16:25, Olaf Hering wrote:

Introduced in commit 6ddfaabceeec3c31bc97a7208c46f581de55f71d
("x86/hvm/ioreq: simplify code and use consistent naming").

Signed-off-by: Olaf Hering 
---
  xen/arch/x86/hvm/ioreq.c | 1 -
  1 file changed, 1 deletion(-)



Reviewed-by: Paul Durrant

Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect

2023-04-24 Thread Paul Durrant


On 24/04/2023 14:17, Tim Smith wrote:

On Mon, Apr 24, 2023 at 1:08 PM Mark Syms  wrote:


Copying in Tim who did the final phase of the changes.

On Mon, 24 Apr 2023 at 11:32, Paul Durrant  wrote:


On 20/04/2023 12:02, mark.s...@citrix.com wrote:

From: Mark Syms 

Ensure the PV ring is drained on disconnect. Also ensure all pending
AIO is complete, otherwise AIO tries to complete into a mapping of the
ring which has been torn down.

Signed-off-by: Mark Syms 
---
CC: Stefano Stabellini 
CC: Anthony Perard 
CC: Paul Durrant 
CC: xen-devel@lists.xenproject.org

v2:
   * Ensure all inflight requests are completed before teardown
   * RESEND to fix formatting
---
   hw/block/dataplane/xen-block.c | 31 +--
   1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 734da42ea7..d9da4090bf 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane 
*dataplane)

   dataplane->more_work = 0;

+if (dataplane->sring == 0) {
+return done_something;
+}
+


I think you could just return false here... Nothing is ever going to be
done if there's no ring :-)


   rc = dataplane->rings.common.req_cons;
   rp = dataplane->rings.common.sring->req_prod;
   xen_rmb(); /* Ensure we see queued requests up to 'rp'. */
@@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane 
*dataplane >   void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
   {
   XenDevice *xendev;
+XenBlockRequest *request, *next;

   if (!dataplane) {
   return;
   }

+/* We're about to drain the ring. We can cancel the scheduling of any
+ * bottom half now */
+qemu_bh_cancel(dataplane->bh);
+
+/* Ensure we have drained the ring */
+aio_context_acquire(dataplane->ctx);
+do {
+xen_block_handle_requests(dataplane);
+} while (dataplane->more_work);
+aio_context_release(dataplane->ctx);
+


I don't think we want to be taking new requests, do we?



If we're in this situation and the guest has put something on the
ring, I think we should do our best with it.
We cannot just rely on the guest to be well-behaved, because they're
not :-( We're about to throw the
ring away, so whatever is there would otherwise be lost.


We only throw away our mapping. The memory belongs to the guest and it
should ensure it does not submit requests after the state has left
'connected'


This bit is
here to try to handle guests which are
less than diligent about their shutdown. We *should* always be past
this fast enough when the disconnect()/connect()
of XenbusStateConnected happens that all remains well (if not, we were
in a worse situation before).



What about a malicious guest that is piling requests into the ring. It 
could keep us in the loop forever, couldn't it?



+/* Now ensure that all inflight requests are complete */
+while (!QLIST_EMPTY(>inflight)) {
+QLIST_FOREACH_SAFE(request, >inflight, list, next) {
+blk_aio_flush(request->dataplane->blk, xen_block_complete_aio,
+request);
+}
+}
+


I think this could possibly be simplified by doing the drain after the
call to blk_set_aio_context(), as long as we set dataplane->ctx to
qemu_get_aio_context(). Alos, as long as more_work is not set then it
should still be safe to cancel the bh before the drain AFAICT.


I'm not sure what you mean by simpler? Possibly I'm not getting something.



Sorry, I was referring to the need to do aio_context_acquire() calls but 
they are only around the disputed xen_block_handle_requests() call 
anyway, so there's no simplification in this bit.



We have to make sure that any "aio_bh_schedule_oneshot_full()" which
happens as a result of
"blk_aio_flush()" has finished before any change of AIO context,
because it tries to use the one which
was current at the time of being called (I have the SEGVs to prove it
:-)).


Ok, I had assumed that the issue was the context being picked up inside 
the xen_block_complete_aio() call.



Whether that happens before or after
"blk_set_aio_context(qemu_get_aio_context())" doesn't seem to be a
change in complexity to me.

Motivation was to get as much as possible to happen in the way it
"normally" would, so that future changes
are less likely to regress, but as mentioned maybe I'm missing something.

The BH needs to be prevented from firing ASAP, otherwise the
disconnect()/connect() which happens when
XenbusStateConnected can have the bh fire from what the guest does
next right in the middle of juggling
contexts for the disconnect() (I have the SEGVs from that too...).



So if you drop the ring drain then this patch should still stop the 
SEGVs, right?


  Paul

Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect

2023-04-24 Thread Paul Durrant


On 20/04/2023 12:02, mark.s...@citrix.com wrote:

From: Mark Syms 

Ensure the PV ring is drained on disconnect. Also ensure all pending
AIO is complete, otherwise AIO tries to complete into a mapping of the
ring which has been torn down.

Signed-off-by: Mark Syms 
---
CC: Stefano Stabellini 
CC: Anthony Perard 
CC: Paul Durrant 
CC: xen-devel@lists.xenproject.org

v2:
  * Ensure all inflight requests are completed before teardown
  * RESEND to fix formatting
---
  hw/block/dataplane/xen-block.c | 31 +--
  1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 734da42ea7..d9da4090bf 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane 
*dataplane)
  
  dataplane->more_work = 0;
  
+if (dataplane->sring == 0) {

+return done_something;
+}
+


I think you could just return false here... Nothing is ever going to be 
done if there's no ring :-)



  rc = dataplane->rings.common.req_cons;
  rp = dataplane->rings.common.sring->req_prod;
  xen_rmb(); /* Ensure we see queued requests up to 'rp'. */
@@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane 
*dataplane >   void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
  {
  XenDevice *xendev;
+XenBlockRequest *request, *next;
  
  if (!dataplane) {

  return;
  }
  
+/* We're about to drain the ring. We can cancel the scheduling of any

+ * bottom half now */
+qemu_bh_cancel(dataplane->bh);
+
+/* Ensure we have drained the ring */
+aio_context_acquire(dataplane->ctx);
+do {
+xen_block_handle_requests(dataplane);
+} while (dataplane->more_work);
+aio_context_release(dataplane->ctx);
+


I don't think we want to be taking new requests, do we?


+/* Now ensure that all inflight requests are complete */
+while (!QLIST_EMPTY(>inflight)) {
+QLIST_FOREACH_SAFE(request, >inflight, list, next) {
+blk_aio_flush(request->dataplane->blk, xen_block_complete_aio,
+request);
+}
+}
+


I think this could possibly be simplified by doing the drain after the 
call to blk_set_aio_context(), as long as we set dataplane->ctx to 
qemu_get_aio_context(). Alos, as long as more_work is not set then it 
should still be safe to cancel the bh before the drain AFAICT.


  Paul


  xendev = dataplane->xendev;
  
  aio_context_acquire(dataplane->ctx);

+
  if (dataplane->event_channel) {
  /* Only reason for failure is a NULL channel */
  xen_device_set_event_channel_context(xendev, dataplane->event_channel,
@@ -684,12 +709,6 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
  blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), _abort);
  aio_context_release(dataplane->ctx);
  
-/*

- * Now that the context has been moved onto the main thread, cancel
- * further processing.
- */
-qemu_bh_cancel(dataplane->bh);
-
  if (dataplane->event_channel) {
  Error *local_err = NULL;

Re: [PATCH v2 08/16] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore

2023-04-20 Thread Paul Durrant


On 19/04/2023 18:28, Stefan Hajnoczi wrote:

There is no need to suspend activity between aio_disable_external() and
aio_enable_external(), which is mainly used for the block layer's drain
operation.

This is part of ongoing work to remove the aio_disable_external() API.

Reviewed-by: David Woodhouse 
Signed-off-by: Stefan Hajnoczi 
---
  hw/i386/kvm/xen_xenstore.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Paul Durrant

Re: [PATCH 5/5] hw/xen: Fix broken check for invalid state in xs_be_open()

2023-04-17 Thread Paul Durrant


On 12/04/2023 19:51, David Woodhouse wrote:

From: David Woodhouse 

Coverity points out that if (!s && !s->impl) isn't really what we intended
to do here. CID 1508131.

Fixes: 032475127225 ("hw/xen: Add emulated implementation of XenStore 
operations")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Paul Durrant

Re: [PATCH 4/5] hw/xen: Fix double-free in xen_console store_con_info()

2023-04-17 Thread Paul Durrant


On 12/04/2023 19:51, David Woodhouse wrote:

From: David Woodhouse 

Coverity spotted a double-free (CID 1508254); we g_string_free(path) and
then for some reason immediately call free(path) too.

We should just use g_autoptr() for it anyway, which simplifies the code
a bit.

Fixes: 7a8a749da7d3 ("hw/xen: Move xenstore_store_pv_console_info to 
xen_console.c")
Signed-off-by: David Woodhouse 
---
  hw/char/xen_console.c | 13 +++--
  1 file changed, 3 insertions(+), 10 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 3/5] xen: Drop support for Xen versions below 4.7.1

2023-04-17 Thread Paul Durrant


On 12/04/2023 19:51, David Woodhouse wrote:

From: David Woodhouse 

In restructuring to allow for internal emulation of Xen functionality,
I broke compatibility for Xen 4.6 and earlier. Fix this by explicitly
removing support for anything older than 4.7.1, which is also ancient
but it does still build, and the compatibility support for it is fairly
unintrusive.

Fixes: 15e283c5b684 ("hw/xen: Add foreignmem operations to allow redirection to 
internal emulation")
Signed-off-by: David Woodhouse 
---
  hw/xen/xen-operations.c |  57 +--
  include/hw/xen/xen_native.h | 107 +---
  meson.build |   5 +-
  scripts/xen-detect.c|  60 
  4 files changed, 3 insertions(+), 226 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 2/5] hw/xen: Fix memory leak in libxenstore_open() for Xen

2023-04-17 Thread Paul Durrant


On 12/04/2023 19:50, David Woodhouse wrote:

From: David Woodhouse 

There was a superfluous allocation of the XS handle, leading to it
being leaked on both the error path and the success path (where it gets
allocated again).

Spotted by Coverity (CID 1508098).

Fixes: ba2a92db1ff6 ("hw/xen: Add xenstore operations to allow redirection to 
internal emulation")
Suggested-by: Peter Maydell 
Signed-off-by: David Woodhouse 
Reviewed-by: Peter Maydell 


Reviewed-by: Paul Durrant

Re: [PATCH 2/2] xen/netback: remove not needed test in xenvif_tx_build_gops()

2023-03-27 Thread Paul Durrant


On 27/03/2023 09:36, Juergen Gross wrote:

The tests for the number of grant mapping or copy operations reaching
the array size of the operations buffer at the end of the main loop in
xenvif_tx_build_gops() isn't needed.

The loop can handle at maximum MAX_PENDING_REQS transfer requests, as
XEN_RING_NR_UNCONSUMED_REQUESTS() is taking unsent responses into
consideration, too.

Remove the tests.

Suggested-by: Jan Beulich 
Signed-off-by: Juergen Gross 
---
  drivers/net/xen-netback/netback.c | 4 
  1 file changed, 4 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary

2023-03-27 Thread Paul Durrant


On 27/03/2023 09:36, Juergen Gross wrote:

Fix xenvif_get_requests() not to do grant copy operations across local
page boundaries. This requires to double the maximum number of copy
operations per queue, as each copy could now be split into 2.

Make sure that struct xenvif_tx_cb doesn't grow too large.

Cc: sta...@vger.kernel.org
Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the 
non-linear area")
Signed-off-by: Juergen Gross 
---
  drivers/net/xen-netback/common.h  |  2 +-
  drivers/net/xen-netback/netback.c | 25 +++--
  2 files changed, 24 insertions(+), 3 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH] hw/xenpv: Initialize Xen backend operations

2023-03-23 Thread Paul Durrant


On 23/03/2023 10:57, David Woodhouse wrote:

From: David Woodhouse 

As the Xen backend operations were abstracted out into a function table to
allow for internally emulated Xen support, we missed the xen_init_pv()
code path which also needs to install the operations for the true Xen
libraries. Add the missing call to setup_xen_backend_ops().

Fixes: b6cacfea0b38 ("hw/xen: Add evtchn operations to allow redirection to internal 
emulation")
Reported-by: Anthony PERARD 
Signed-off-by: David Woodhouse 
---
  hw/xenpv/xen_machine_pv.c | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Paul Durrant 



diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
index 2e759d0619..17cda5ec13 100644
--- a/hw/xenpv/xen_machine_pv.c
+++ b/hw/xenpv/xen_machine_pv.c
@@ -35,6 +35,8 @@ static void xen_init_pv(MachineState *machine)
  DriveInfo *dinfo;
  int i;
  
+setup_xen_backend_ops();

+
  /* Initialize backend core & drivers */
  xen_be_init();

Re: [PATCH] Fix PCI hotplug AML

2023-03-20 Thread Paul Durrant


On 20/03/2023 10:34, Jan Beulich wrote:

On 20.03.2023 10:04, Paul Durrant wrote:

On 17/03/2023 10:32, David Woodhouse wrote:

From: David Woodhouse 

The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).

The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.

As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:

+ShiftRight (0x4, \_GPE.PH00, Local1)
+Return (Local1) /* IN status as the _STA */

Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?

Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*

Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.

Signed-off-by: David Woodhouse 
Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
---
Utterly untested in Xen. Tested the same change in a different
environment which is using precisely the *same* AML for guest
compatibility.



This AML only relates to the hotplug controller for qemu-trad so it's
unlikely anyone particularly cares any more. In fact I'm kind of
surprised the generation code still exists.


Why would it not exist anymore? Use of qemu-trad is deprecated and
advised against, but it's still possible to use it. Otherwise quite a
bit of cleanup in libxl could also happen, for example.



Right. I'm just surprised that is not done already... seems like a while 
since trad was deprecated; I'd have thought it could be removed in the 
next release.


  Paul

Re: [PATCH] Fix PCI hotplug AML

2023-03-20 Thread Paul Durrant


On 17/03/2023 10:32, David Woodhouse wrote:

From: David Woodhouse 

The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).

The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.

As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:

+ShiftRight (0x4, \_GPE.PH00, Local1)
+Return (Local1) /* IN status as the _STA */

Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?

Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*

Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.

Signed-off-by: David Woodhouse 
Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
---
Utterly untested in Xen. Tested the same change in a different
environment which is using precisely the *same* AML for guest
compatibility.



This AML only relates to the hotplug controller for qemu-trad so it's 
unlikely anyone particularly cares any more. In fact I'm kind of 
surprised the generation code still exists.


  Paul


diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 1176da80ef..1d27809116 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -431,7 +431,7 @@ int main(int argc, char **argv)
  stmt("Store", "0x89, \\_GPE.DPT2");
  }
  if ( slot & 1 )
-stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
+stmt("ShiftRight", "\\_GPE.PH%02X, 0x04, Local1", slot & ~1);
  else
  stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
  stmt("Return", "Local1"); /* IN status as the _STA */

Re: [PATCH] accel/xen: Fix DM state change notification in dm_restrict mode

2023-03-14 Thread Paul Durrant


On 14/03/2023 08:35, David Woodhouse wrote:

From: David Woodhouse 

When dm_restrict is set, QEMU isn't permitted to update the XenStore node
to indicate its running status. Previously, the xs_write() call would fail
but the failure was ignored.

However, in refactoring to allow for emulated XenStore operations, a new
call to xs_open() was added. That one didn't fail gracefully, causing a
fatal error when running in dm_restrict mode.

Partially revert the offending patch, removing the additional call to
xs_open() because the global 'xenstore' variable is still available; it
just needs to be used with qemu_xen_xs_write() now instead of directly
with the xs_write() libxenstore function.

Also make the whole thing conditional on !xen_domid_restrict. There's no
point even registering the state change handler to attempt to update the
XenStore node when we know it's destined to fail.

Fixes: ba2a92db1ff6 ("hw/xen: Add xenstore operations to allow redirection to 
internal emulation")
Reported-by: Jason Andryuk 
Co-developed-by: Jason Andryuk 
Not-Signed-off-by: Jason Andryuk 
Signed-off-by: David Woodhouse 
Will-be-Tested-by: Jason Andryuk 
---
  accel/xen/xen-all.c | 27 ++-
  1 file changed, 10 insertions(+), 17 deletions(-)



Reviewed-by: Paul Durrant

Re: [PATCH v2 00/27] Enable PV backends with Xen/KVM emulation

2023-03-07 Thread Paul Durrant


On 07/03/2023 17:17, David Woodhouse wrote:

Following on from the basic platform support which has already been
merged, here's phase 2 which wires up the XenBus and PV back ends.

It starts with a basic single-tenant internal implementation of a
XenStore, with a copy-on-write tree, watches, transactions, quotas.

Then we introduce operations tables for the grant table, event channel,
foreignmen and xenstore operations so that in addition to using the Xen
libraries for those, QEMU can use its internal emulated versions.

A little bit of cleaning up of header files, and we can enable the build
of xen-bus in the CONFIG_XEN_EMU build, and run a Xen guest with an
actual PV disk...

qemu-system-x86_64 -serial mon:stdio -M q35 -display none -m 1G -smp 2 \
   -accel kvm,xen-version=0x4000e,kernel-irqchip=split \
   -kernel bzImage -append "console=ttyS0 root=/dev/xvda1 selinux=0" \
   -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \
   -device xen-disk,drive=disk,vdev=xvda

The main thing that isn't working here is migration. I've implemented it
for the internal xenstore and the unit tests exercise it, but the
existing PV back ends don't support it, perhaps partly because support
for guest transparent live migration support isn't upstream in Xen yet.
So the disk doesn't come back correctly after migration. I'm content
with that for 8.0 though, and we just mark the emulated XenStore device
as unmigratable to prevent users from trying.

The other pre-existing constraint is that only the block back end has
yet been ported to the "new" XenBus infrastructure, and is actually
capable of creating its own backend nodes. Again, I can live with
that for 8.0. Maybe this will motivate us to finally get round to
converting the rest off XenLegacyBackend and killing it.

We also don't have a simple way to perform grant mapping of multiple
guest pages to contiguous addresses, as we can under real Xen. So we
don't advertise max-ring-page-order for xen-disk in the emulated mode.
Fixing that — if we actually want to — would probably require mapping
RAM from an actual backing store object, so that it can be mapped again
at a different location for the PV back end to see.

v2: https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-2

  • Full set of reviewed-by tags from Paul (and associated minor fixes).

  • Disable migration for emulated XenStore device.

  • Update docs and add MAINTAINERS entry.

v1: 
https://lore.kernel.org/qemu-devel/20230302153435.1170111-1-dw...@infradead.org/
 https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-1

David Woodhouse (23):
   hw/xen: Add xenstore wire implementation and implementation stubs
   hw/xen: Add basic XenStore tree walk and write/read/directory support
   hw/xen: Implement XenStore watches
   hw/xen: Implement XenStore transactions
   hw/xen: Watches on XenStore transactions
   hw/xen: Implement core serialize/deserialize methods for xenstore_impl
   hw/xen: Add evtchn operations to allow redirection to internal emulation
   hw/xen: Add gnttab operations to allow redirection to internal emulation
   hw/xen: Pass grant ref to gnttab unmap operation
   hw/xen: Add foreignmem operations to allow redirection to internal 
emulation
   hw/xen: Move xenstore_store_pv_console_info to xen_console.c
   hw/xen: Use XEN_PAGE_SIZE in PV backend drivers
   hw/xen: Rename xen_common.h to xen_native.h
   hw/xen: Build PV backend drivers for CONFIG_XEN_BUS
   hw/xen: Only advertise ring-page-order for xen-block if gnttab supports 
it
   hw/xen: Hook up emulated implementation for event channel operations
   hw/xen: Add emulated implementation of grant table operations
   hw/xen: Add emulated implementation of XenStore operations
   hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore
   hw/xen: Implement soft reset for emulated gnttab
   i386/xen: Initialize Xen backends from pc_basic_device_init() for 
emulation
   MAINTAINERS: Add entry for Xen on KVM emulation
   docs: Update Xen-on-KVM documentation for PV disk support

Paul Durrant (4):
   hw/xen: Implement XenStore permissions
   hw/xen: Create initial XenStore nodes
   hw/xen: Add xenstore operations to allow redirection to internal 
emulation
   hw/xen: Avoid crash when backend watch fires too early

  MAINTAINERS   |9 +
  accel/xen/xen-all.c   |   69 +-
  docs/system/i386/xen.rst  |   30 +-
  hw/9pfs/meson.build   |2 +-
  hw/9pfs/xen-9p-backend.c  |   32 +-
  hw/block/dataplane/meson.build|2 +-
  hw/block/dataplane/xen-block.c|   12 +-
  hw/block/meson.build  |2 +-
  hw/block/xen-block.c  |   12 +-

Re: [PATCH v2 25/27] i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation

2023-03-07 Thread Paul Durrant


On 07/03/2023 17:17, David Woodhouse wrote:

From: David Woodhouse 

Now that all the work is done to enable the PV backends to work without
actual Xen, instantiate the bus from pc_basic_device_init() for emulated
mode.

This allows us finally to launch an emulated Xen guest with PV disk.

qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \
  -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \
  -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \
  -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \
  -device xen-disk,drive=disk,vdev=xvda

If we use -M pc instead of q35, we can even add an IDE disk and boot a
guest image normally through grub. But q35 gives us AHCI and that isn't
unplugged by the Xen magic, so the guests ends up seeing "both" disks.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
  hw/i386/pc.c | 7 +++
  1 file changed, 7 insertions(+)



Also...

Tested-by: Paul Durrant 

... on real Xen (master branch, 4.18) with a Debian guest.

Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl

2023-03-07 Thread Paul Durrant


On 07/03/2023 16:59, Paul Durrant wrote:

On 07/03/2023 16:52, David Woodhouse wrote:

On Tue, 2023-03-07 at 16:39 +, Paul Durrant wrote:

On 07/03/2023 16:33, David Woodhouse wrote:

On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote:

From: David Woodhouse 

In fact I think we want to only serialize the contents of the domain's
path in /local/domain/${domid} and leave the rest to be recreated? 
Will

defer to Paul for that.

Signed-off-by: David Woodhouse 


Paul, your Reviewed-by: on this one is conspicuous in its absence. I
mentioned migration in the cover letter — this much is working fine,
but it's the PV back ends that don't yet work.

I'd quite like to merge the basic serialization/deserialization of
XenStore itself, with the unit tests.


The patch is basically ok, I just think the serialization should be
limited to the guest nodes... filtering out those not owned by xen_domid
would probably work for that.


Yeah, so let's just do this (as part of this patch #7) for now:

--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -235,6 +235,7 @@ static int xen_xenstore_post_load(void *opaque, int
ver)
  static const VMStateDescription xen_xenstore_vmstate = {
  .name = "xen_xenstore",
+    .unmigratable = 1, /* The PV back ends don't migrate yet */
  .version_id = 1,
  .minimum_version_id = 1,
  .needed = xen_xenstore_is_needed,


It means we can't migrate guests even if they're only using fully
emulated devices... but I think that's a reasonable limitation until we
implement it fully.



Ok. With that added...

Revieweed-by: Paul Durrant 


Typoed, sorry...

Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl

2023-03-07 Thread Paul Durrant


On 07/03/2023 16:52, David Woodhouse wrote:

On Tue, 2023-03-07 at 16:39 +, Paul Durrant wrote:

On 07/03/2023 16:33, David Woodhouse wrote:

On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote:

From: David Woodhouse 

In fact I think we want to only serialize the contents of the domain's
path in /local/domain/${domid} and leave the rest to be recreated? Will
defer to Paul for that.

Signed-off-by: David Woodhouse 


Paul, your Reviewed-by: on this one is conspicuous in its absence. I
mentioned migration in the cover letter — this much is working fine,
but it's the PV back ends that don't yet work.

I'd quite like to merge the basic serialization/deserialization of
XenStore itself, with the unit tests.


The patch is basically ok, I just think the serialization should be
limited to the guest nodes... filtering out those not owned by xen_domid
would probably work for that.


Yeah, so let's just do this (as part of this patch #7) for now:

--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -235,6 +235,7 @@ static int xen_xenstore_post_load(void *opaque, int
ver)
  
  static const VMStateDescription xen_xenstore_vmstate = {

  .name = "xen_xenstore",
+.unmigratable = 1, /* The PV back ends don't migrate yet */
  .version_id = 1,
  .minimum_version_id = 1,
  .needed = xen_xenstore_is_needed,


It means we can't migrate guests even if they're only using fully
emulated devices... but I think that's a reasonable limitation until we
implement it fully.



Ok. With that added...

Revieweed-by: Paul Durrant

Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl

2023-03-07 Thread Paul Durrant


On 07/03/2023 16:39, Paul Durrant wrote:

On 07/03/2023 16:33, David Woodhouse wrote:

On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote:

From: David Woodhouse 

In fact I think we want to only serialize the contents of the domain's
path in /local/domain/${domid} and leave the rest to be recreated? Will
defer to Paul for that.

Signed-off-by: David Woodhouse 


Paul, your Reviewed-by: on this one is conspicuous in its absence. I
mentioned migration in the cover letter — this much is working fine,
but it's the PV back ends that don't yet work.

I'd quite like to merge the basic serialization/deserialization of
XenStore itself, with the unit tests.


The patch is basically ok, I just think the serialization should be 
limited to the guest nodes... filtering out those not owned by xen_domid 
would probably work for that.




Perhaps we can also set TYPE_XEN_DEVICE or TYPE_XEN_BLOCK_DEVICE to be
unmigratable? Ideally I think we want TYPE_XEN_DEVICE to be
unmigratable by default *unless* the specific device class (including
net and other as we port them from XenLegacyDevice) says otherwise.



Yes, that sounds like a good idea.


Is there a way to do that?


Not something I've looked into. I'll go look now.



Maybe calling migrate_add_blocker() in the realize method is the right 
way to achieve this?


  Paul

Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl

2023-03-07 Thread Paul Durrant


On 07/03/2023 16:33, David Woodhouse wrote:

On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote:

From: David Woodhouse 

In fact I think we want to only serialize the contents of the domain's
path in /local/domain/${domid} and leave the rest to be recreated? Will
defer to Paul for that.

Signed-off-by: David Woodhouse 


Paul, your Reviewed-by: on this one is conspicuous in its absence. I
mentioned migration in the cover letter — this much is working fine,
but it's the PV back ends that don't yet work.

I'd quite like to merge the basic serialization/deserialization of
XenStore itself, with the unit tests.


The patch is basically ok, I just think the serialization should be 
limited to the guest nodes... filtering out those not owned by xen_domid 
would probably work for that.




Perhaps we can also set TYPE_XEN_DEVICE or TYPE_XEN_BLOCK_DEVICE to be
unmigratable? Ideally I think we want TYPE_XEN_DEVICE to be
unmigratable by default *unless* the specific device class (including
net and other as we port them from XenLegacyDevice) says otherwise.



Yes, that sounds like a good idea.


Is there a way to do that?


Not something I've looked into. I'll go look now.

  Paul

Re: [RFC PATCH v1 27/25] docs: Update Xen-on-KVM documentation for PV disk support

2023-03-07 Thread Paul Durrant


On 07/03/2023 16:22, David Woodhouse wrote:

From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
  docs/system/i386/xen.rst | 30 +++---
  1 file changed, 23 insertions(+), 7 deletions(-)



Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 26/25] MAINTAINERS: Add entry for Xen on KVM emulation

2023-03-07 Thread Paul Durrant


On 07/03/2023 16:21, David Woodhouse wrote:

From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
  MAINTAINERS | 9 +
  1 file changed, 9 insertions(+)



Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 25/25] i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: David Woodhouse 

Now that all the work is done to enable the PV backends to work without
actual Xen, instantiate the bus from pc_basic_device_init() for emulated
mode.

This allows us finally to launch an emulated Xen guest with PV disk.

qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \
  -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \
  -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \
  -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \
  -device xen-disk,drive=disk,vdev=xvda

If we use -M pc instead of q35, we can even add an IDE disk and boot a
guest image normally through grub. But q35 gives us AHCI and that isn't
unplugged by the Xen magic, so the guests ends up seeing "both" disks.

Signed-off-by: David Woodhouse 
---
  hw/i386/pc.c | 7 +++
  1 file changed, 7 insertions(+)



Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 24/25] hw/xen: Implement soft reset for emulated gnttab

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: David Woodhouse 

This is only part of it; we will also need to get the PV back end drivers
to tear down their own mappings (or do it for them, but they kind of need
to stop using the pointers too).

Some more work on the actual PV back ends and xen-bus code is going to be
needed to really make soft reset and migration fully functional, and this
part is the basis for that.

Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_gnttab.c  | 26 --
  hw/i386/kvm/xen_gnttab.h  |  1 +
  target/i386/kvm/xen-emu.c |  5 +
  3 files changed, 30 insertions(+), 2 deletions(-)



Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 23/25] hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 16 
  1 file changed, 16 insertions(+)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 028f80499e..f9b7387024 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -21,6 +21,7 @@
  
  #include "hw/sysbus.h"

  #include "hw/xen/xen.h"
+#include "hw/xen/xen_backend_ops.h"
  #include "xen_overlay.h"
  #include "xen_evtchn.h"
  #include "xen_xenstore.h"
@@ -34,6 +35,7 @@
  
  #include "hw/xen/interface/io/xs_wire.h"

  #include "hw/xen/interface/event_channel.h"
+#include "hw/xen/interface/grant_table.h"
  
  #define TYPE_XEN_XENSTORE "xen-xenstore"

  OBJECT_DECLARE_SIMPLE_TYPE(XenXenstoreState, XEN_XENSTORE)
@@ -66,6 +68,9 @@ struct XenXenstoreState {
  
  uint8_t *impl_state;

  uint32_t impl_state_size;
+
+struct xengntdev_handle *gt;
+void *granted_xs;
  };
  
  struct XenXenstoreState *xen_xenstore_singleton;

@@ -1452,6 +1457,17 @@ int xen_xenstore_reset(void)
  }
  s->be_port = err;
  
+/*

+ * We don't actually access the guest's page through the grant, because
+ * this isn't real Xen, and we can just use the page we gave it in the
+ * first place. Map the grant anyway, mostly for cosmetic purposes so
+ * it *looks* like it's in use in the guest-visible grant table.


Might be useful to stick this text in the commit comment too.

Reviewed-by: Paul Durrant 


+ */
+s->gt = qemu_xen_gnttab_open();
+uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE;
+s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, _gntref,
+ PROT_READ | PROT_WRITE);
+
  return 0;
  }

Re: [RFC PATCH v1 22/25] hw/xen: Add emulated implementation of XenStore operations

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: David Woodhouse 

Now that we have an internal implementation of XenStore, we can populate
the xenstore_backend_ops to allow PV backends to talk to it.

Watches can't be processed with immediate callbacks because that would
call back into XenBus code recursively. Defer them to a QEMUBH to be run
as appropriate from the main loop. We use a QEMUBH per XS handle, and it
walks all the watches (there shouldn't be many per handle) to fire any
which have pending events. We *could* have done it differently but this
allows us to use the same struct watch_event as we have for the guest
side, and keeps things relatively simple.


Yes, it's more consistent with watch events on real Xen this way.



Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 273 -
  1 file changed, 269 insertions(+), 4 deletions(-)



Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 21/25] hw/xen: Add emulated implementation of grant table operations

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: David Woodhouse 

This is limited to mapping a single grant at a time, because under Xen the
pages are mapped *contiguously* into qemu's address space, and that's very
hard to do when those pages actually come from anonymous mappings in qemu
in the first place.

Eventually perhaps we can look at using shared mappings of actual objects
for system RAM, and then we can make new mappings of the same backing
store (be it deleted files, shmem, whatever). But for now let's stick to
a page at a time.

Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_gnttab.c | 299 ++-
  1 file changed, 296 insertions(+), 3 deletions(-)


[snip]

+static uint64_t gnt_ref(XenGnttabState *s, grant_ref_t ref, int prot)
+{
+uint16_t mask = GTF_type_mask | GTF_sub_page;
+grant_entry_v1_t gnt, *gnt_p;
+int retries = 0;
+
+if (ref >= s->max_frames * ENTRIES_PER_FRAME_V1 ||
+s->map_track[ref] == UINT8_MAX) {
+return INVALID_GPA;
+}
+
+if (prot & PROT_WRITE) {
+mask |= GTF_readonly;
+}
+
+gnt_p = >entries.v1[ref];
+
+/*
+ * The guest can legitimately be changing the GTF_readonly flag. Allow


I'd call a guest playing with the ref after setting GTF_permit_access a 
buggy guest and not bother with the loop.



+ * that, but don't let a malicious guest cause a livelock.
+ */
+for (retries = 0; retries < 5; retries++) {
+uint16_t new_flags;
+
+/* Read the entry before an atomic operation on its flags */
+gnt = *(volatile grant_entry_v1_t *)gnt_p;
+
+if ((gnt.flags & mask) != GTF_permit_access ||
+gnt.domid != DOMID_QEMU) {
+return INVALID_GPA;
+}
+
+new_flags = gnt.flags | GTF_reading;
+if (prot & PROT_WRITE) {
+new_flags |= GTF_writing;
+}
+
+if (qatomic_cmpxchg(_p->flags, gnt.flags, new_flags) == gnt.flags) 
{


Xen actually does a cmpxchg on both the flags and the domid. We probably 
ought to fail to set the flags if the guest is playing with the domid 
but since we're single-tenant it doesn't *really* matter... just a 
nice-to-have. So...


Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 20/25] hw/xen: Hook up emulated implementation for event channel operations

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: David Woodhouse 

We provided the backend-facing evtchn functions very early on as part of
the core Xen platform support, since things like timers and xenstore need
to use them.

By what may or may not be an astonishing coincidence, those functions
just *happen* all to have exactly the right function prototypes to slot
into the evtchn_backend_ops table and be called by the PV backends.


It is indeed an amazing coincidence :-)



Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_evtchn.c | 15 +++
  1 file changed, 15 insertions(+)



Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 19/25] hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: David Woodhouse 

Whem emulating Xen, multi-page grants are distinctly non-trivial and we
have elected not to support them for the time being. Don't advertise
them to the guest.

Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)



Reviewed-by: Paul Durrant

Re: [RFC PATCH v1 18/25] hw/xen: Avoid crash when backend watch fires too early

2023-03-07 Thread Paul Durrant


On 02/03/2023 15:34, David Woodhouse wrote:

From: Paul Durrant 

The xen-block code ends up calling aio_poll() through blkconf_geometry(),
which means we see watch events during the indirect call to
xendev_class->realize() in xen_device_realize(). Unfortunately this call
is made before populating the initial frontend and backend device nodes
in xenstore and hence xen_block_frontend_changed() (which is called from
a watch event) fails to read the frontend's 'state' node, and hence
believes the device is being torn down. This in-turn sets the backend
state to XenbusStateClosed and causes the device to be deleted before it
is fully set up, leading to the crash.
By simply moving the call to xendev_class->realize() after the initial
xenstore nodes are populated, this sorry state of affairs is avoided.

Reported-by: David Woodhouse 
Signed-off-by: Paul Durrant 
Signed-off-by: David Woodhouse 
---
  hw/xen/xen-bus.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)



Reviewed-by: Paul Durrant

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3550 matches

Mail list logo