Re: [PULL 0/2] Net patches

2022-11-08 Thread Laurent Vivier

On 11/8/22 17:32, Stefan Hajnoczi wrote:

On Mon, 7 Nov 2022 at 23:20, Jason Wang  wrote:


The following changes since commit 524fc737431d240f9d9f10aaf381003092868bac:

   util/log: Ignore per-thread flag if global file already there (2022-11-07 
16:00:02 -0500)

are available in the git repository at:

   https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to fd2c87c7b0c97be2ac8d334885419f51f5963b51:

   tests/qtest: netdev: test stream and dgram backends (2022-11-08 12:10:26 
+0800)




Laurent Vivier (1):
   tests/qtest: netdev: test stream and dgram backends


This test does not pass in CI:
https://gitlab.com/qemu-project/qemu/-/jobs/3290964536
https://gitlab.com/qemu-project/qemu/-/jobs/3290964524
https://gitlab.com/qemu-project/qemu/-/jobs/3290964471
https://gitlab.com/qemu-project/qemu/-/jobs/3290964475


These four fail because of "No machine specified, and there is no default"


https://gitlab.com/qemu-project/qemu/-/jobs/3290964569


This one fails because of an unexpected "info network" result:

st0: index=0,type=stream,
xlnx.xps-ethernetlite.0: 
index=0,type=nic,model=xlnx.xps-ethernetlite,macaddr=52:54:00:12:34:56
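
For illustration, a minimal sketch of a qtest that avoids both failure modes: it pins a machine type explicitly instead of relying on a per-target default, and it only greps for its own backend in the "info network" output. This is a hypothetical test body (qtest_initf()/qtest_hmp() are the usual libqtest helpers; the exact -netdev stream options are assumed), not the actual fix:

#include "qemu/osdep.h"
#include "libqtest.h"

static void test_stream_inet(void)
{
    /* Pin a machine type so targets without a default machine still work. */
    QTestState *qts = qtest_initf("-nodefaults -M none "
                                  "-netdev stream,id=st0,server=true,"
                                  "addr.type=inet,addr.host=127.0.0.1,"
                                  "addr.port=0");
    char *out = qtest_hmp(qts, "info network");

    /* Only require that our own backend shows up; boards may add built-in
     * NICs such as the xlnx.xps-ethernetlite entry seen above. */
    g_assert(strstr(out, "st0: index=0,type=stream,") != NULL);

    g_free(out);
    qtest_quit(qts);
}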




We're in soft freeze now. Please hold off on new tests unless they
verify regressions/blockers.


Sorry about that, I will fix it and wait for 7.3...

Thanks,
Laurent




Re: [PATCH trivial for 7.2] hw/ssi/sifive_spi.c: spelling: reigster

2022-11-08 Thread Philippe Mathieu-Daudé

On 8/11/22 23:11, Palmer Dabbelt wrote:

On Sat, 05 Nov 2022 04:53:29 PDT (-0700), m...@tls.msk.ru wrote:

Fixes: 0694dabe9763847f3010b54ab3ec7d367d2f0ff0


Not sure if I missed something in QEMU land, but those are usually 
listed more like


Fixes: 0694dabe97 ("hw/ssi: Add SiFive SPI controller support")


MST once suggested trying to restrict the 'Fixes:' tag to bugs /
regressions, as it might help downstream distributions filter
commits to cherry-pick.

Since it might be useful to have the offending commit sha1 in the
description, when it is simply an omission or improvement I use
an inline form instead of a tag:

  Fixes the typo introduced in commit 0694dabe97 ("hw/ssi: Add SiFive
  SPI controller support").

Although in this particular use-case it is not really useful ;)

Another example:

  When adding <function>() in commit <sha1> ("<subject>"), we forgot
  to fill the API prototype description. Do it now.

Regards,

Phil.



Re: [PULL 0/3] Memory/SDHCI/ParallelFlash patches for v7.2.0-rc0

2022-11-08 Thread Philippe Mathieu-Daudé

On 8/11/22 21:57, Stefan Hajnoczi wrote:

I've dropped the SDHCI CVE fix due to the CI failure.

The rest of the commits are still in the staging tree and I plan to
include them in v7.2.0-rc0.


Thank you Stefan, sorry for not catching that failure sooner.



Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Xuan Zhuo
On Wed, 9 Nov 2022 02:04:17 -0500, "Michael S. Tsirkin"  wrote:
> On Wed, Nov 09, 2022 at 02:56:01PM +0800, Xuan Zhuo wrote:
> > On Wed, 9 Nov 2022 14:55:03 +0800, Jason Wang  wrote:
> > >
> > > 在 2022/11/9 14:51, Michael S. Tsirkin 写道:
> > > > On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> > > >> On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  
> > > >> wrote:
> > > >>> On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  
> > > >>> wrote:
> > >  From: Kangjie Xu 
> > > 
> > >  Introduce the interface queue_enable() in VirtioDeviceClass and the
> > >  fucntion virtio_queue_enable() in virtio, it can be called when
> > >  VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > >  started. It only supports the devices of virtio 1 or later. The
> > >  not-supported devices can only start the virtqueue when DRIVER_OK.
> > > 
> > >  Signed-off-by: Kangjie Xu 
> > >  Signed-off-by: Xuan Zhuo 
> > >  Acked-by: Jason Wang 
> > >  Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > >  Reviewed-by: Michael S. Tsirkin 
> > >  Signed-off-by: Michael S. Tsirkin 
> > >  ---
> > >    include/hw/virtio/virtio.h |  2 ++
> > >    hw/virtio/virtio.c | 14 ++
> > >    2 files changed, 16 insertions(+)
> > > 
> > >  diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > >  index 74d76c1dbc..b00b3fcf31 100644
> > >  --- a/include/hw/virtio/virtio.h
> > >  +++ b/include/hw/virtio/virtio.h
> > >  @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> > >    void (*reset)(VirtIODevice *vdev);
> > >    void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > >    void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > >  +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> > >    /* For transitional devices, this is a bitmap of features
> > > * that are only exposed on the legacy interface but not
> > > * the modern one.
> > >  @@ -288,6 +289,7 @@ int 
> > >  virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
> > >    int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> > >    void virtio_reset(void *opaque);
> > >    void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > >  +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> > >    void virtio_update_irq(VirtIODevice *vdev);
> > >    int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > > 
> > >  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > >  index cf5f9ca387..9683b2e158 100644
> > >  --- a/hw/virtio/virtio.c
> > >  +++ b/hw/virtio/virtio.c
> > >  @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> > >  uint32_t queue_index)
> > >    __virtio_queue_reset(vdev, queue_index);
> > >    }
> > > 
> > >  +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > >  +{
> > >  +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > >  +
> > >  +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > >  +error_report("queue_enable is only suppported in devices of 
> > >  virtio "
> > >  + "1.0 or later.");
> > > >>> Why is this triggering here? Maybe virtio_queue_enable() is called too
> > > >>> early. I have verified that the Linux guest driver sets VERSION_1. I
> > > >>> didn't check what SeaBIOS does.
> > > >> Looks like a bug, we should check device features here at least and it
> > > >> should be guest errors instead of error_report() here.
> > > >>
> > > >> Thanks
> > > > But it's weird, queue enable is written before guest features?
> > > > How come?
> > >
> > >
> > > Or queue_enable is written when the driver doesn't negotiate VERSION_1?
> >
> > Is this a bug?
> >
> > Or is it allowed in some cases?
> >
> > I feel weird too.
> >
> > Thanks.
>
> Weren't you able to reproduce?
> I suggest
>   - write a bios patch to make it spec compliant
>   - check UEFI to make sure it's spec compliant
>   - ask bios/uefi maintainers whether they can include the patch for this 
> release

It looks very interesting, I am happy to study it.

>   - add a patch to drop the warning - we don't really need it

I sent this patch first.

Thanks.

>
>
> > >
> > > Thanks
> > >
> > >
> > > >
> > > >>> $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> > > >>> file,node-name=drive0,filename=test.img -device
> > > >>> virtio-blk-pci,drive=drive0
> > > >>> qemu: queue_enable is only suppported in devices of virtio 1.0 or 
> > > >>> later.
> > > >>>
> > > >>> Stefan
> > > >>>
> > >
>



Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Xuan Zhuo
On Wed, 9 Nov 2022 02:01:38 -0500, "Michael S. Tsirkin"  wrote:
> On Wed, Nov 09, 2022 at 02:48:29PM +0800, Xuan Zhuo wrote:
> > On Wed, 9 Nov 2022 01:39:32 -0500, "Michael S. Tsirkin"  
> > wrote:
> > > On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> > > > On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  
> > > > wrote:
> > > > >
> > > > > On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > From: Kangjie Xu 
> > > > > >
> > > > > > Introduce the interface queue_enable() in VirtioDeviceClass and the
> > > > > > fucntion virtio_queue_enable() in virtio, it can be called when
> > > > > > VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > > > > > started. It only supports the devices of virtio 1 or later. The
> > > > > > not-supported devices can only start the virtqueue when DRIVER_OK.
> > > > > >
> > > > > > Signed-off-by: Kangjie Xu 
> > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > Acked-by: Jason Wang 
> > > > > > Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > > > > > Reviewed-by: Michael S. Tsirkin 
> > > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > > ---
> > > > > >  include/hw/virtio/virtio.h |  2 ++
> > > > > >  hw/virtio/virtio.c | 14 ++
> > > > > >  2 files changed, 16 insertions(+)
> > > > > >
> > > > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > > > index 74d76c1dbc..b00b3fcf31 100644
> > > > > > --- a/include/hw/virtio/virtio.h
> > > > > > +++ b/include/hw/virtio/virtio.h
> > > > > > @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> > > > > >  void (*reset)(VirtIODevice *vdev);
> > > > > >  void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > > > > >  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > > > > > +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> > > > > >  /* For transitional devices, this is a bitmap of features
> > > > > >   * that are only exposed on the legacy interface but not
> > > > > >   * the modern one.
> > > > > > @@ -288,6 +289,7 @@ int 
> > > > > > virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
> > > > > >  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> > > > > >  void virtio_reset(void *opaque);
> > > > > >  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > > > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> > > > > >  void virtio_update_irq(VirtIODevice *vdev);
> > > > > >  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > > > > >
> > > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > > > index cf5f9ca387..9683b2e158 100644
> > > > > > --- a/hw/virtio/virtio.c
> > > > > > +++ b/hw/virtio/virtio.c
> > > > > > @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> > > > > > uint32_t queue_index)
> > > > > >  __virtio_queue_reset(vdev, queue_index);
> > > > > >  }
> > > > > >
> > > > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > > > > > +{
> > > > > > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > > > > > +
> > > > > > +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > > > > +error_report("queue_enable is only suppported in devices 
> > > > > > of virtio "
> > > > > > + "1.0 or later.");
> > > > >
> > > > > Why is this triggering here? Maybe virtio_queue_enable() is called too
> > > > > early. I have verified that the Linux guest driver sets VERSION_1. I
> > > > > didn't check what SeaBIOS does.
> > > >
> > > > Looks like a bug, we should check device features here at least and it
> > > > should be guest errors instead of error_report() here.
> > > >
> > > > Thanks
> > > >
> > >
> > > I suspect we should just drop this print. Kangjie?
> >
> >
> > I think it is.
> >
> > At that time, this inspection was only added at hand, and theoretically it
> > should not be performed.
> >
> > I am responsible for this patch set now.
> >
> > hi, Michael,
> >
> > What should I do, do I send a new version again?
> >
> > Thanks.
>
> I debugged it and replied separately. Can you check EFI drivers too?

OK.

Thanks.


>
> > >
> > >
> > > > > $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> > > > > file,node-name=drive0,filename=test.img -device
> > > > > virtio-blk-pci,drive=drive0
> > > > > qemu: queue_enable is only suppported in devices of virtio 1.0 or 
> > > > > later.
> > > > >
> > > > > Stefan
> > > > >
> > >
>



Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Michael S. Tsirkin
On Wed, Nov 09, 2022 at 02:56:01PM +0800, Xuan Zhuo wrote:
> On Wed, 9 Nov 2022 14:55:03 +0800, Jason Wang  wrote:
> >
> > 在 2022/11/9 14:51, Michael S. Tsirkin 写道:
> > > On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> > >> On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  
> > >> wrote:
> > >>> On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
> >  From: Kangjie Xu 
> > 
> >  Introduce the interface queue_enable() in VirtioDeviceClass and the
> >  fucntion virtio_queue_enable() in virtio, it can be called when
> >  VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> >  started. It only supports the devices of virtio 1 or later. The
> >  not-supported devices can only start the virtqueue when DRIVER_OK.
> > 
> >  Signed-off-by: Kangjie Xu 
> >  Signed-off-by: Xuan Zhuo 
> >  Acked-by: Jason Wang 
> >  Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> >  Reviewed-by: Michael S. Tsirkin 
> >  Signed-off-by: Michael S. Tsirkin 
> >  ---
> >    include/hw/virtio/virtio.h |  2 ++
> >    hw/virtio/virtio.c | 14 ++
> >    2 files changed, 16 insertions(+)
> > 
> >  diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >  index 74d76c1dbc..b00b3fcf31 100644
> >  --- a/include/hw/virtio/virtio.h
> >  +++ b/include/hw/virtio/virtio.h
> >  @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> >    void (*reset)(VirtIODevice *vdev);
> >    void (*set_status)(VirtIODevice *vdev, uint8_t val);
> >    void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> >  +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> >    /* For transitional devices, this is a bitmap of features
> > * that are only exposed on the legacy interface but not
> > * the modern one.
> >  @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice 
> >  *vdev, int n,
> >    int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> >    void virtio_reset(void *opaque);
> >    void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> >  +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> >    void virtio_update_irq(VirtIODevice *vdev);
> >    int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > 
> >  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >  index cf5f9ca387..9683b2e158 100644
> >  --- a/hw/virtio/virtio.c
> >  +++ b/hw/virtio/virtio.c
> >  @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> >  uint32_t queue_index)
> >    __virtio_queue_reset(vdev, queue_index);
> >    }
> > 
> >  +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> >  +{
> >  +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> >  +
> >  +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> >  +error_report("queue_enable is only suppported in devices of 
> >  virtio "
> >  + "1.0 or later.");
> > >>> Why is this triggering here? Maybe virtio_queue_enable() is called too
> > >>> early. I have verified that the Linux guest driver sets VERSION_1. I
> > >>> didn't check what SeaBIOS does.
> > >> Looks like a bug, we should check device features here at least and it
> > >> should be guest errors instead of error_report() here.
> > >>
> > >> Thanks
> > > But it's weird, queue enable is written before guest features?
> > > How come?
> >
> >
> > Or queue_enable is written when the driver doesn't negotiate VERSION_1?
> 
> Is this a bug?
> 
> Or is it allowed in some cases?
> 
> I feel weird too.
> 
> Thanks.

Weren't you able to reproduce?
I suggest
- write a bios patch to make it spec compliant
- check UEFI to make sure it's spec compliant
- ask bios/uefi maintainers whether they can include the patch for this 
release
- add a patch to drop the warning - we don't really need it


> >
> > Thanks
> >
> >
> > >
> > >>> $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> > >>> file,node-name=drive0,filename=test.img -device
> > >>> virtio-blk-pci,drive=drive0
> > >>> qemu: queue_enable is only suppported in devices of virtio 1.0 or later.
> > >>>
> > >>> Stefan
> > >>>
> >




[PULL 1/2] tcg: Move TCG_TARGET_HAS_direct_jump init to tb_gen_code

2022-11-08 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 10 --
 tcg/tcg.c | 12 
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 921944a5ab..9ee21f7f52 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -821,16 +821,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 trace_translate_block(tb, pc, tb->tc.ptr);
 
 /* generate machine code */
-tb->jmp_reset_offset[0] = TB_JMP_RESET_OFFSET_INVALID;
-tb->jmp_reset_offset[1] = TB_JMP_RESET_OFFSET_INVALID;
-tcg_ctx->tb_jmp_reset_offset = tb->jmp_reset_offset;
-if (TCG_TARGET_HAS_direct_jump) {
-tcg_ctx->tb_jmp_insn_offset = tb->jmp_target_arg;
-tcg_ctx->tb_jmp_target_addr = NULL;
-} else {
-tcg_ctx->tb_jmp_insn_offset = NULL;
-tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
-}
 
 #ifdef CONFIG_PROFILER
 qatomic_set(&prof->tb_count, prof->tb_count + 1);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index b43b6a7981..436fcf6ebd 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4228,6 +4228,18 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, 
target_ulong pc_start)
 }
 #endif
 
+/* Initialize goto_tb jump offsets. */
+tb->jmp_reset_offset[0] = TB_JMP_RESET_OFFSET_INVALID;
+tb->jmp_reset_offset[1] = TB_JMP_RESET_OFFSET_INVALID;
+tcg_ctx->tb_jmp_reset_offset = tb->jmp_reset_offset;
+if (TCG_TARGET_HAS_direct_jump) {
+tcg_ctx->tb_jmp_insn_offset = tb->jmp_target_arg;
+tcg_ctx->tb_jmp_target_addr = NULL;
+} else {
+tcg_ctx->tb_jmp_insn_offset = NULL;
+tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
+}
+
 tcg_reg_alloc_start(s);
 
 /*
-- 
2.34.1




[PULL for-7.2 0/2] tcg patch queue

2022-11-08 Thread Richard Henderson
The following changes since commit 60ab36907ded2918d33683f2b66f603b7400d8f3:

  Update VERSION for v7.2.0-rc0 (2022-11-08 15:53:41 -0500)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20221109

for you to fetch changes up to 344b63b380541a63c02ef7a8a6ae66cb0b6f0273:

  accel/tcg: Split out setjmp_gen_code (2022-11-09 12:29:03 +1100)


Fix -Werror=clobbered issue with tb_gen_code


Richard Henderson (2):
  tcg: Move TCG_TARGET_HAS_direct_jump init to tb_gen_code
  accel/tcg: Split out setjmp_gen_code

 accel/tcg/translate-all.c | 68 +++
 tcg/tcg.c | 12 +
 2 files changed, 45 insertions(+), 35 deletions(-)



[PULL 2/2] accel/tcg: Split out setjmp_gen_code

2022-11-08 Thread Richard Henderson
Isolate the code protected by setjmp.  Fixes:

translate-all.c: In function ‘tb_gen_code’:
translate-all.c:748:51: error: argument ‘cflags’ might be clobbered by 
‘longjmp’ or ‘vfork’ [-Werror=clobbered]

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 58 ++-
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 9ee21f7f52..ac3ee3740c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -742,6 +742,37 @@ void page_collection_unlock(struct page_collection *set)
 
 #endif /* !CONFIG_USER_ONLY */
 
+/*
+ * Isolate the portion of code gen which can setjmp/longjmp.
+ * Return the size of the generated code, or negative on error.
+ */
+static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
+   target_ulong pc, void *host_pc,
+   int *max_insns, int64_t *ti)
+{
+int ret = sigsetjmp(tcg_ctx->jmp_trans, 0);
+if (unlikely(ret != 0)) {
+return ret;
+}
+
+tcg_func_start(tcg_ctx);
+
+tcg_ctx->cpu = env_cpu(env);
+gen_intermediate_code(env_cpu(env), tb, *max_insns, pc, host_pc);
+assert(tb->size != 0);
+tcg_ctx->cpu = NULL;
+*max_insns = tb->icount;
+
+#ifdef CONFIG_PROFILER
+qatomic_set(&tcg_ctx->prof.tb_count, tcg_ctx->prof.tb_count + 1);
+qatomic_set(&tcg_ctx->prof.interm_time,
+tcg_ctx->prof.interm_time + profile_getclock() - *ti);
+*ti = profile_getclock();
+#endif
+
+return tcg_gen_code(tcg_ctx, tb, pc);
+}
+
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
   target_ulong pc, target_ulong cs_base,
@@ -754,8 +785,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 int gen_code_size, search_size, max_insns;
 #ifdef CONFIG_PROFILER
 TCGProfile *prof = &tcg_ctx->prof;
-int64_t ti;
 #endif
+int64_t ti;
 void *host_pc;
 
 assert_memory_lock();
@@ -805,33 +836,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 ti = profile_getclock();
 #endif
 
-gen_code_size = sigsetjmp(tcg_ctx->jmp_trans, 0);
-if (unlikely(gen_code_size != 0)) {
-goto error_return;
-}
-
-tcg_func_start(tcg_ctx);
-
-tcg_ctx->cpu = env_cpu(env);
-gen_intermediate_code(cpu, tb, max_insns, pc, host_pc);
-assert(tb->size != 0);
-tcg_ctx->cpu = NULL;
-max_insns = tb->icount;
-
 trace_translate_block(tb, pc, tb->tc.ptr);
 
-/* generate machine code */
-
-#ifdef CONFIG_PROFILER
-qatomic_set(&prof->tb_count, prof->tb_count + 1);
-qatomic_set(&prof->interm_time,
-prof->interm_time + profile_getclock() - ti);
-ti = profile_getclock();
-#endif
-
-gen_code_size = tcg_gen_code(tcg_ctx, tb, pc);
+gen_code_size = setjmp_gen_code(env, tb, pc, host_pc, &max_insns, &ti);
 if (unlikely(gen_code_size < 0)) {
- error_return:
 switch (gen_code_size) {
 case -1:
 /*
-- 
2.34.1
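
As an aside, the warning fixed here is easy to reproduce outside QEMU. A tiny standalone example (illustration only, not QEMU code) of the pattern -Werror=clobbered complains about, and why moving the sigsetjmp region into its own helper, as setjmp_gen_code() does, avoids it:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static int translate(int cflags)
{
    if (setjmp(env) != 0) {
        /* Error path: 'cflags' was modified after setjmp(), so its value
         * here is indeterminate, which is exactly what -Wclobbered warns
         * about. */
        return cflags;
    }
    cflags |= 0x100;   /* modified between setjmp() and longjmp() */
    longjmp(env, 1);   /* unwinds back to the setjmp() above */
}

int main(void)
{
    /* Keeping the setjmp region (and the locals it can clobber) inside a
     * separate helper, as setjmp_gen_code() now does, sidesteps this. */
    printf("%#x\n", translate(1));
    return 0;
}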




Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Michael S. Tsirkin
On Wed, Nov 09, 2022 at 02:48:29PM +0800, Xuan Zhuo wrote:
> On Wed, 9 Nov 2022 01:39:32 -0500, "Michael S. Tsirkin"  
> wrote:
> > On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> > > On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:
> > > >
> > > > On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
> > > > >
> > > > > From: Kangjie Xu 
> > > > >
> > > > > Introduce the interface queue_enable() in VirtioDeviceClass and the
> > > > > fucntion virtio_queue_enable() in virtio, it can be called when
> > > > > VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > > > > started. It only supports the devices of virtio 1 or later. The
> > > > > not-supported devices can only start the virtqueue when DRIVER_OK.
> > > > >
> > > > > Signed-off-by: Kangjie Xu 
> > > > > Signed-off-by: Xuan Zhuo 
> > > > > Acked-by: Jason Wang 
> > > > > Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > > > > Reviewed-by: Michael S. Tsirkin 
> > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > ---
> > > > >  include/hw/virtio/virtio.h |  2 ++
> > > > >  hw/virtio/virtio.c | 14 ++
> > > > >  2 files changed, 16 insertions(+)
> > > > >
> > > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > > index 74d76c1dbc..b00b3fcf31 100644
> > > > > --- a/include/hw/virtio/virtio.h
> > > > > +++ b/include/hw/virtio/virtio.h
> > > > > @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> > > > >  void (*reset)(VirtIODevice *vdev);
> > > > >  void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > > > >  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > > > > +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> > > > >  /* For transitional devices, this is a bitmap of features
> > > > >   * that are only exposed on the legacy interface but not
> > > > >   * the modern one.
> > > > > @@ -288,6 +289,7 @@ int 
> > > > > virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
> > > > >  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> > > > >  void virtio_reset(void *opaque);
> > > > >  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> > > > >  void virtio_update_irq(VirtIODevice *vdev);
> > > > >  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > > > >
> > > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > > index cf5f9ca387..9683b2e158 100644
> > > > > --- a/hw/virtio/virtio.c
> > > > > +++ b/hw/virtio/virtio.c
> > > > > @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> > > > > uint32_t queue_index)
> > > > >  __virtio_queue_reset(vdev, queue_index);
> > > > >  }
> > > > >
> > > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > > > > +{
> > > > > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > > > > +
> > > > > +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > > > +error_report("queue_enable is only suppported in devices of 
> > > > > virtio "
> > > > > + "1.0 or later.");
> > > >
> > > > Why is this triggering here? Maybe virtio_queue_enable() is called too
> > > > early. I have verified that the Linux guest driver sets VERSION_1. I
> > > > didn't check what SeaBIOS does.
> > >
> > > Looks like a bug, we should check device features here at least and it
> > > should be guest errors instead of error_report() here.
> > >
> > > Thanks
> > >
> >
> > I suspect we should just drop this print. Kangjie?
> 
> 
> I think it is.
> 
> At that time, this inspection was only added at hand, and theoretically it
> should not be performed.
> 
> I am responsible for this patch set now.
> 
> hi, Michael,
> 
> What should I do, do I send a new version again?
> 
> Thanks.

I debugged it and replied separately. Can you check EFI drivers too?

> >
> >
> > > > $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> > > > file,node-name=drive0,filename=test.img -device
> > > > virtio-blk-pci,drive=drive0
> > > > qemu: queue_enable is only suppported in devices of virtio 1.0 or later.
> > > >
> > > > Stefan
> > > >
> >




Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Michael S. Tsirkin
On Wed, Nov 09, 2022 at 01:52:01AM -0500, Michael S. Tsirkin wrote:
> On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> > On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:
> > >
> > > On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
> > > >
> > > > From: Kangjie Xu 
> > > >
> > > > Introduce the interface queue_enable() in VirtioDeviceClass and the
> > > > fucntion virtio_queue_enable() in virtio, it can be called when
> > > > VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > > > started. It only supports the devices of virtio 1 or later. The
> > > > not-supported devices can only start the virtqueue when DRIVER_OK.
> > > >
> > > > Signed-off-by: Kangjie Xu 
> > > > Signed-off-by: Xuan Zhuo 
> > > > Acked-by: Jason Wang 
> > > > Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > > > Reviewed-by: Michael S. Tsirkin 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  include/hw/virtio/virtio.h |  2 ++
> > > >  hw/virtio/virtio.c | 14 ++
> > > >  2 files changed, 16 insertions(+)
> > > >
> > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > index 74d76c1dbc..b00b3fcf31 100644
> > > > --- a/include/hw/virtio/virtio.h
> > > > +++ b/include/hw/virtio/virtio.h
> > > > @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> > > >  void (*reset)(VirtIODevice *vdev);
> > > >  void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > > >  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > > > +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> > > >  /* For transitional devices, this is a bitmap of features
> > > >   * that are only exposed on the legacy interface but not
> > > >   * the modern one.
> > > > @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice 
> > > > *vdev, int n,
> > > >  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> > > >  void virtio_reset(void *opaque);
> > > >  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> > > >  void virtio_update_irq(VirtIODevice *vdev);
> > > >  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > > >
> > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > index cf5f9ca387..9683b2e158 100644
> > > > --- a/hw/virtio/virtio.c
> > > > +++ b/hw/virtio/virtio.c
> > > > @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> > > > uint32_t queue_index)
> > > >  __virtio_queue_reset(vdev, queue_index);
> > > >  }
> > > >
> > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > > > +{
> > > > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > > > +
> > > > +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > > +error_report("queue_enable is only suppported in devices of 
> > > > virtio "
> > > > + "1.0 or later.");
> > >
> > > Why is this triggering here? Maybe virtio_queue_enable() is called too
> > > early. I have verified that the Linux guest driver sets VERSION_1. I
> > > didn't check what SeaBIOS does.
> > 
> > Looks like a bug, we should check device features here at least and it
> > should be guest errors instead of error_report() here.
> > 
> > Thanks
> 
> But it's weird, queue enable is written before guest features?
> How come?

It's a bios bug:



    vp_init_simple(&vdrive->vp, pci);
    if (vp_find_vq(&vdrive->vp, 0, &vdrive->vq) < 0 ) {
        dprintf(1, "fail to find vq for virtio-blk %pP\n", pci);
        goto fail;
    }

    if (vdrive->vp.use_modern) {
        struct vp_device *vp = &vdrive->vp;
        u64 features = vp_get_features(vp);
        u64 version1 = 1ull << VIRTIO_F_VERSION_1;
        u64 iommu_platform = 1ull << VIRTIO_F_IOMMU_PLATFORM;
        u64 blk_size = 1ull << VIRTIO_BLK_F_BLK_SIZE;
        if (!(features & version1)) {
            dprintf(1, "modern device without virtio_1 feature bit: %pP\n",
                    pci);
            goto fail;
        }

        features = features & (version1 | iommu_platform | blk_size);
        vp_set_features(vp, features);
        status |= VIRTIO_CONFIG_S_FEATURES_OK;
        vp_set_status(vp, status);



Not good - does not match the spec. Here's what the spec says:


The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by 
the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the 
device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits 
after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise,
the device does not support our subset of features and the device is unusable.
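
Concretely, a rough sketch of how the quoted SeaBIOS fragment would need to be reordered to follow that sequence; this only reuses the helper names and variables from the snippet above (vp_*, vdrive, status) for illustration and is not the actual SeaBIOS patch:

    /* 1.-3.: reset / ACKNOWLEDGE / DRIVER (assumed to be handled in or
     * right after vp_init_simple(), as in the snippet above). */
    vp_init_simple(&vdrive->vp, pci);

    if (vdrive->vp.use_modern) {
        struct vp_device *vp = &vdrive->vp;
        u64 features = vp_get_features(vp);
        u64 version1 = 1ull << VIRTIO_F_VERSION_1;
        u64 iommu_platform = 1ull << VIRTIO_F_IOMMU_PLATFORM;
        u64 blk_size = 1ull << VIRTIO_BLK_F_BLK_SIZE;

        if (!(features & version1)) {
            dprintf(1, "modern device without virtio_1 feature bit: %pP\n",
                    pci);
            goto fail;
        }

        /* 4.-6.: negotiate features and latch FEATURES_OK *before* any
         * virtqueue setup, so queue_enable is never written beforehand. */
        vp_set_features(vp, features & (version1 | iommu_platform | blk_size));
        status |= VIRTIO_CONFIG_S_FEATURES_OK;
        vp_set_status(vp, status);
    }

    /* 7.: only now discover and enable the virtqueue. */
    if (vp_find_vq(&vdrive->vp, 0, &vdrive->vq) < 0) {
        dprintf(1, "fail to find vq for virtio-blk %pP\n", pci);
        goto fail;
    }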

Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Xuan Zhuo
On Wed, 9 Nov 2022 14:55:03 +0800, Jason Wang  wrote:
>
> 在 2022/11/9 14:51, Michael S. Tsirkin 写道:
> > On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> >> On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:
> >>> On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
>  From: Kangjie Xu 
> 
>  Introduce the interface queue_enable() in VirtioDeviceClass and the
>  fucntion virtio_queue_enable() in virtio, it can be called when
>  VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
>  started. It only supports the devices of virtio 1 or later. The
>  not-supported devices can only start the virtqueue when DRIVER_OK.
> 
>  Signed-off-by: Kangjie Xu 
>  Signed-off-by: Xuan Zhuo 
>  Acked-by: Jason Wang 
>  Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
>  Reviewed-by: Michael S. Tsirkin 
>  Signed-off-by: Michael S. Tsirkin 
>  ---
>    include/hw/virtio/virtio.h |  2 ++
>    hw/virtio/virtio.c | 14 ++
>    2 files changed, 16 insertions(+)
> 
>  diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>  index 74d76c1dbc..b00b3fcf31 100644
>  --- a/include/hw/virtio/virtio.h
>  +++ b/include/hw/virtio/virtio.h
>  @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
>    void (*reset)(VirtIODevice *vdev);
>    void (*set_status)(VirtIODevice *vdev, uint8_t val);
>    void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
>  +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
>    /* For transitional devices, this is a bitmap of features
> * that are only exposed on the legacy interface but not
> * the modern one.
>  @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice 
>  *vdev, int n,
>    int virtio_set_status(VirtIODevice *vdev, uint8_t val);
>    void virtio_reset(void *opaque);
>    void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
>  +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
>    void virtio_update_irq(VirtIODevice *vdev);
>    int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> 
>  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>  index cf5f9ca387..9683b2e158 100644
>  --- a/hw/virtio/virtio.c
>  +++ b/hw/virtio/virtio.c
>  @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
>  uint32_t queue_index)
>    __virtio_queue_reset(vdev, queue_index);
>    }
> 
>  +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
>  +{
>  +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>  +
>  +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>  +error_report("queue_enable is only suppported in devices of 
>  virtio "
>  + "1.0 or later.");
> >>> Why is this triggering here? Maybe virtio_queue_enable() is called too
> >>> early. I have verified that the Linux guest driver sets VERSION_1. I
> >>> didn't check what SeaBIOS does.
> >> Looks like a bug, we should check device features here at least and it
> >> should be guest errors instead of error_report() here.
> >>
> >> Thanks
> > But it's weird, queue enable is written before guest features?
> > How come?
>
>
> Or queue_enable is written when the driver doesn't negotiate VERSION_1?

Is this a bug?

Or is it allowed in some cases?

I feel weird too.

Thanks.

>
> Thanks
>
>
> >
> >>> $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> >>> file,node-name=drive0,filename=test.img -device
> >>> virtio-blk-pci,drive=drive0
> >>> qemu: queue_enable is only suppported in devices of virtio 1.0 or later.
> >>>
> >>> Stefan
> >>>
>



Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Jason Wang



在 2022/11/9 14:51, Michael S. Tsirkin 写道:

On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:

On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:

On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:

From: Kangjie Xu 

Introduce the interface queue_enable() in VirtioDeviceClass and the
fucntion virtio_queue_enable() in virtio, it can be called when
VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
started. It only supports the devices of virtio 1 or later. The
not-supported devices can only start the virtqueue when DRIVER_OK.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
  include/hw/virtio/virtio.h |  2 ++
  hw/virtio/virtio.c | 14 ++
  2 files changed, 16 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 74d76c1dbc..b00b3fcf31 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -149,6 +149,7 @@ struct VirtioDeviceClass {
  void (*reset)(VirtIODevice *vdev);
  void (*set_status)(VirtIODevice *vdev, uint8_t val);
  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
+void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
  /* For transitional devices, this is a bitmap of features
   * that are only exposed on the legacy interface but not
   * the modern one.
@@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, 
int n,
  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
  void virtio_reset(void *opaque);
  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
+void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
  void virtio_update_irq(VirtIODevice *vdev);
  int virtio_set_features(VirtIODevice *vdev, uint64_t val);

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index cf5f9ca387..9683b2e158 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
queue_index)
  __virtio_queue_reset(vdev, queue_index);
  }

+void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
+{
+VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
+
+if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+error_report("queue_enable is only suppported in devices of virtio "
+ "1.0 or later.");

Why is this triggering here? Maybe virtio_queue_enable() is called too
early. I have verified that the Linux guest driver sets VERSION_1. I
didn't check what SeaBIOS does.

Looks like a bug, we should check device features here at least and it
should be guest errors instead of error_report() here.

Thanks

But it's weird, queue enable is written before guest features?
How come?



Or queue_enable is written when the driver doesn't negotiate VERSION_1?

Thanks





$ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
file,node-name=drive0,filename=test.img -device
virtio-blk-pci,drive=drive0
qemu: queue_enable is only suppported in devices of virtio 1.0 or later.

Stefan






Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Michael S. Tsirkin
On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:
> >
> > On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
> > >
> > > From: Kangjie Xu 
> > >
> > > Introduce the interface queue_enable() in VirtioDeviceClass and the
> > > fucntion virtio_queue_enable() in virtio, it can be called when
> > > VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > > started. It only supports the devices of virtio 1 or later. The
> > > not-supported devices can only start the virtqueue when DRIVER_OK.
> > >
> > > Signed-off-by: Kangjie Xu 
> > > Signed-off-by: Xuan Zhuo 
> > > Acked-by: Jason Wang 
> > > Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > > Reviewed-by: Michael S. Tsirkin 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  include/hw/virtio/virtio.h |  2 ++
> > >  hw/virtio/virtio.c | 14 ++
> > >  2 files changed, 16 insertions(+)
> > >
> > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > index 74d76c1dbc..b00b3fcf31 100644
> > > --- a/include/hw/virtio/virtio.h
> > > +++ b/include/hw/virtio/virtio.h
> > > @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> > >  void (*reset)(VirtIODevice *vdev);
> > >  void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > >  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > > +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> > >  /* For transitional devices, this is a bitmap of features
> > >   * that are only exposed on the legacy interface but not
> > >   * the modern one.
> > > @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice 
> > > *vdev, int n,
> > >  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> > >  void virtio_reset(void *opaque);
> > >  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> > >  void virtio_update_irq(VirtIODevice *vdev);
> > >  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > >
> > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > index cf5f9ca387..9683b2e158 100644
> > > --- a/hw/virtio/virtio.c
> > > +++ b/hw/virtio/virtio.c
> > > @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> > > uint32_t queue_index)
> > >  __virtio_queue_reset(vdev, queue_index);
> > >  }
> > >
> > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > > +{
> > > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > > +
> > > +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > +error_report("queue_enable is only suppported in devices of 
> > > virtio "
> > > + "1.0 or later.");
> >
> > Why is this triggering here? Maybe virtio_queue_enable() is called too
> > early. I have verified that the Linux guest driver sets VERSION_1. I
> > didn't check what SeaBIOS does.
> 
> Looks like a bug, we should check device features here at least and it
> should be guest errors instead of error_report() here.
> 
> Thanks

But it's weird, queue enable is written before guest features?
How come?

> >
> > $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> > file,node-name=drive0,filename=test.img -device
> > virtio-blk-pci,drive=drive0
> > qemu: queue_enable is only suppported in devices of virtio 1.0 or later.
> >
> > Stefan
> >




Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Xuan Zhuo
On Wed, 9 Nov 2022 01:39:32 -0500, "Michael S. Tsirkin"  wrote:
> On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> > On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:
> > >
> > > On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
> > > >
> > > > From: Kangjie Xu 
> > > >
> > > > Introduce the interface queue_enable() in VirtioDeviceClass and the
> > > > fucntion virtio_queue_enable() in virtio, it can be called when
> > > > VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > > > started. It only supports the devices of virtio 1 or later. The
> > > > not-supported devices can only start the virtqueue when DRIVER_OK.
> > > >
> > > > Signed-off-by: Kangjie Xu 
> > > > Signed-off-by: Xuan Zhuo 
> > > > Acked-by: Jason Wang 
> > > > Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > > > Reviewed-by: Michael S. Tsirkin 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  include/hw/virtio/virtio.h |  2 ++
> > > >  hw/virtio/virtio.c | 14 ++
> > > >  2 files changed, 16 insertions(+)
> > > >
> > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > index 74d76c1dbc..b00b3fcf31 100644
> > > > --- a/include/hw/virtio/virtio.h
> > > > +++ b/include/hw/virtio/virtio.h
> > > > @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> > > >  void (*reset)(VirtIODevice *vdev);
> > > >  void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > > >  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > > > +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> > > >  /* For transitional devices, this is a bitmap of features
> > > >   * that are only exposed on the legacy interface but not
> > > >   * the modern one.
> > > > @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice 
> > > > *vdev, int n,
> > > >  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> > > >  void virtio_reset(void *opaque);
> > > >  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> > > >  void virtio_update_irq(VirtIODevice *vdev);
> > > >  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > > >
> > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > index cf5f9ca387..9683b2e158 100644
> > > > --- a/hw/virtio/virtio.c
> > > > +++ b/hw/virtio/virtio.c
> > > > @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> > > > uint32_t queue_index)
> > > >  __virtio_queue_reset(vdev, queue_index);
> > > >  }
> > > >
> > > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > > > +{
> > > > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > > > +
> > > > +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > > +error_report("queue_enable is only suppported in devices of 
> > > > virtio "
> > > > + "1.0 or later.");
> > >
> > > Why is this triggering here? Maybe virtio_queue_enable() is called too
> > > early. I have verified that the Linux guest driver sets VERSION_1. I
> > > didn't check what SeaBIOS does.
> >
> > Looks like a bug, we should check device features here at least and it
> > should be guest errors instead of error_report() here.
> >
> > Thanks
> >
>
> I suspect we should just drop this print. Kangjie?


I think it is.

At the time, this check was only added in passing; in theory it
should never trigger.

I am responsible for this patch set now.

hi, Michael,

What should I do? Should I send a new version?

Thanks.

>
>
> > > $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> > > file,node-name=drive0,filename=test.img -device
> > > virtio-blk-pci,drive=drive0
> > > qemu: queue_enable is only suppported in devices of virtio 1.0 or later.
> > >
> > > Stefan
> > >
>



Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Michael S. Tsirkin
On Wed, Nov 09, 2022 at 11:31:23AM +0800, Jason Wang wrote:
> On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:
> >
> > On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
> > >
> > > From: Kangjie Xu 
> > >
> > > Introduce the interface queue_enable() in VirtioDeviceClass and the
> > > fucntion virtio_queue_enable() in virtio, it can be called when
> > > VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > > started. It only supports the devices of virtio 1 or later. The
> > > not-supported devices can only start the virtqueue when DRIVER_OK.
> > >
> > > Signed-off-by: Kangjie Xu 
> > > Signed-off-by: Xuan Zhuo 
> > > Acked-by: Jason Wang 
> > > Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > > Reviewed-by: Michael S. Tsirkin 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  include/hw/virtio/virtio.h |  2 ++
> > >  hw/virtio/virtio.c | 14 ++
> > >  2 files changed, 16 insertions(+)
> > >
> > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > index 74d76c1dbc..b00b3fcf31 100644
> > > --- a/include/hw/virtio/virtio.h
> > > +++ b/include/hw/virtio/virtio.h
> > > @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> > >  void (*reset)(VirtIODevice *vdev);
> > >  void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > >  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > > +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> > >  /* For transitional devices, this is a bitmap of features
> > >   * that are only exposed on the legacy interface but not
> > >   * the modern one.
> > > @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice 
> > > *vdev, int n,
> > >  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> > >  void virtio_reset(void *opaque);
> > >  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> > >  void virtio_update_irq(VirtIODevice *vdev);
> > >  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> > >
> > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > index cf5f9ca387..9683b2e158 100644
> > > --- a/hw/virtio/virtio.c
> > > +++ b/hw/virtio/virtio.c
> > > @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, 
> > > uint32_t queue_index)
> > >  __virtio_queue_reset(vdev, queue_index);
> > >  }
> > >
> > > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > > +{
> > > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > > +
> > > +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > > +error_report("queue_enable is only suppported in devices of 
> > > virtio "
> > > + "1.0 or later.");
> >
> > Why is this triggering here? Maybe virtio_queue_enable() is called too
> > early. I have verified that the Linux guest driver sets VERSION_1. I
> > didn't check what SeaBIOS does.
> 
> Looks like a bug, we should check device features here at least and it
> should be guest errors instead of error_report() here.
> 
> Thanks
> 

I suspect we should just drop this print. Kangjie?


> > $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> > file,node-name=drive0,filename=test.img -device
> > virtio-blk-pci,drive=drive0
> > qemu: queue_enable is only suppported in devices of virtio 1.0 or later.
> >
> > Stefan
> >
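
For illustration, a minimal sketch of the direction suggested above (treat it as a guest error at most, or drop the message entirely); this is a hypothetical adjustment assuming the usual qemu_log_mask()/LOG_GUEST_ERROR helpers, not the follow-up patch that was actually merged:

void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
{
    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);

    if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
        /* Guest-triggerable condition: at most a LOG_GUEST_ERROR, never
         * error_report(), and arguably no message at all. */
        qemu_log_mask(LOG_GUEST_ERROR,
                      "%s: queue_enable is only supported in devices of "
                      "virtio 1.0 or later\n", __func__);
        return;
    }

    if (k->queue_enable) {
        k->queue_enable(vdev, queue_index);
    }
}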




Re: [PATCH] KVM: Add system call KVM_VERIFY_MSI to verify MSI vector

2022-11-08 Thread chenxiang (M)

Hi Marc,


在 2022/11/8 20:47, Marc Zyngier 写道:

On Tue, 08 Nov 2022 08:08:57 +,
chenxiang  wrote:

From: Xiang Chen 

Currently the number of MSI vectors comes from register PCI_MSI_FLAGS,
which should be a power of 2, but in some scenarios it is not the same as
the number the driver requires in the guest. For example, a PCI driver
wants to allocate 6 MSI vectors in the guest, but because of that limitation
it will allocate 8 MSI vectors. So it requires 8 MSI vectors in QEMU while
the driver in the guest only wants to allocate 6 MSI vectors.

When GICv4.1 is enabled, we can see an exception printed as follows for the
above scenario:
vfio-pci :3a:00.1: irq bypass producer (token 8f08224d) 
registration fails:66311

In order to verify whether an MSI vector is valid, add KVM_VERIFY_MSI to do
that. If there is a mapping, return 0; otherwise return a negative value.

This is the kernel part of adding system call KVM_VERIFY_MSI.

Exposing something that is an internal implementation detail to
userspace feels like the absolute wrong way to solve this issue.

Can you please characterise the issue you're having? Is it that vfio
tries to enable an interrupt for which there is no virtual ITS
mapping? Shouldn't we instead try and manage this in the kernel?


Before I reported the issue to the community, you gave a suggestion about
the issue, but I am not sure whether I misunderstood your meaning.

You can refer to the link for more details about the issue.
https://lkml.kernel.org/lkml/87cze9lcut.wl-...@kernel.org/T/

Best regards,
Xiang
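
A side note on the arithmetic above: the MSI capability's Multiple Message fields only encode power-of-two vector counts, which is why a request for 6 vectors ends up as 8. A trivial standalone illustration (not kernel code):

#include <stdio.h>

/* Round a requested vector count up to the next power of two, mimicking
 * what the MSI capability encoding forces (1, 2, 4, 8, 16 or 32). */
static unsigned int msi_vectors_granted(unsigned int requested)
{
    unsigned int n = 1;
    while (n < requested) {
        n <<= 1;
    }
    return n;
}

int main(void)
{
    printf("requested 6 -> granted %u\n", msi_vectors_granted(6)); /* 8 */
    return 0;
}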



[PATCH v1 1/1] migration: Fix yank on postcopy multifd crashing guest after migration

2022-11-08 Thread Leonardo Bras
When multifd and postcopy-ram capabilities are enabled, if a
migrate-start-postcopy is attempted, the migration will finish sending the
memory pages and then crash with the following error:

qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion
`QLIST_EMPTY(&entry->yankfns)' failed.

This happens because even though all multifd channels could
yank_register_function(), none of them could unregister it before
unregistering the MIGRATION_YANK_INSTANCE, causing the assert to fail.

Fix that by calling multifd_load_cleanup() on postcopy_ram_listen_thread()
before MIGRATION_YANK_INSTANCE is unregistered.

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui 
Signed-off-by: Leonardo Bras 
---
 migration/migration.h |  1 +
 migration/migration.c | 18 +-
 migration/savevm.c|  2 ++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaa..240f64efb0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -473,6 +473,7 @@ void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
 void migration_cancel(const Error *error);
+bool migration_load_cleanup(void);
 
 void populate_vfio_info(MigrationInfo *info);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
diff --git a/migration/migration.c b/migration/migration.c
index 739bb683f3..4f363b2a95 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -486,6 +486,17 @@ void migrate_add_address(SocketAddress *address)
   QAPI_CLONE(SocketAddress, address));
 }
 
+bool migration_load_cleanup(void)
+{
+Error *local_err = NULL;
+
+if (multifd_load_cleanup(&local_err)) {
+error_report_err(local_err);
+return true;
+}
+return false;
+}
+
 static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
 const char *p = NULL;
@@ -540,8 +551,7 @@ static void process_incoming_migration_bh(void *opaque)
  */
 qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
-if (multifd_load_cleanup(&local_err) != 0) {
-error_report_err(local_err);
+if (migration_load_cleanup()) {
 autostart = false;
 }
 /* If global state section was not received or we are in running
@@ -646,9 +656,7 @@ fail:
 migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_FAILED);
 qemu_fclose(mis->from_src_file);
-if (multifd_load_cleanup(&local_err) != 0) {
-error_report_err(local_err);
-}
+migration_load_cleanup();
 exit(EXIT_FAILURE);
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..250caff7f4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1889,6 +1889,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
 exit(EXIT_FAILURE);
 }
 
+migration_load_cleanup();
+
 migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_COMPLETED);
 /*
-- 
2.38.1




Re: [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions

2022-11-08 Thread Yuan Yao
On Tue, Nov 08, 2022 at 05:41:41PM +0800, Chao Peng wrote:
> On Tue, Nov 08, 2022 at 09:35:06AM +0800, Yuan Yao wrote:
> > On Tue, Oct 25, 2022 at 11:13:41PM +0800, Chao Peng wrote:
> > > Introduce generic private memory register/unregister by reusing existing
> > > SEV ioctls KVM_MEMORY_ENCRYPT_{UN,}REG_REGION. It differs from SEV case
> > > by treating address in the region as gpa instead of hva. Which cases
> > > should these ioctls go is determined by the kvm_arch_has_private_mem().
> > > Architecture which supports KVM_PRIVATE_MEM should override this function.
> > >
> > > KVM internally defaults all guest memory as private memory and maintain
> > > the shared memory in 'mem_attr_array'. The above ioctls operate on this
> > > field and unmap existing mappings if any.
> > >
> > > Signed-off-by: Chao Peng 
> > > ---
> > >  Documentation/virt/kvm/api.rst |  17 ++-
> > >  arch/x86/kvm/Kconfig   |   1 +
> > >  include/linux/kvm_host.h   |  10 +-
> > >  virt/kvm/Kconfig   |   4 +
> > >  virt/kvm/kvm_main.c| 227 +
> > >  5 files changed, 198 insertions(+), 61 deletions(-)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst 
> > > b/Documentation/virt/kvm/api.rst
> > > index 975688912b8c..08253cf498d1 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -4717,10 +4717,19 @@ 
> > > Documentation/virt/kvm/x86/amd-memory-encryption.rst.
> > >  This ioctl can be used to register a guest memory region which may
> > >  contain encrypted data (e.g. guest RAM, SMRAM etc).
> > >
> > > -It is used in the SEV-enabled guest. When encryption is enabled, a guest
> > > -memory region may contain encrypted data. The SEV memory encryption
> > > -engine uses a tweak such that two identical plaintext pages, each at
> > > -different locations will have differing ciphertexts. So swapping or
> > > +Currently this ioctl supports registering memory regions for two usages:
> > > +private memory and SEV-encrypted memory.
> > > +
> > > +When private memory is enabled, this ioctl is used to register guest 
> > > private
> > > +memory region and the addr/size of kvm_enc_region represents guest 
> > > physical
> > > +address (GPA). In this usage, this ioctl zaps the existing guest memory
> > > +mappings in KVM that fallen into the region.
> > > +
> > > +When SEV-encrypted memory is enabled, this ioctl is used to register 
> > > guest
> > > +memory region which may contain encrypted data for a SEV-enabled guest. 
> > > The
> > > +addr/size of kvm_enc_region represents userspace address (HVA). The SEV
> > > +memory encryption engine uses a tweak such that two identical plaintext 
> > > pages,
> > > +each at different locations will have differing ciphertexts. So swapping 
> > > or
> > >  moving ciphertext of those pages will not result in plaintext being
> > >  swapped. So relocating (or migrating) physical backing pages for the SEV
> > >  guest will require some additional steps.
> > > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > > index 8d2bd455c0cd..73fdfa429b20 100644
> > > --- a/arch/x86/kvm/Kconfig
> > > +++ b/arch/x86/kvm/Kconfig
> > > @@ -51,6 +51,7 @@ config KVM
> > >   select HAVE_KVM_PM_NOTIFIER if PM
> > >   select HAVE_KVM_RESTRICTED_MEM if X86_64
> > >   select RESTRICTEDMEM if HAVE_KVM_RESTRICTED_MEM
> > > + select KVM_GENERIC_PRIVATE_MEM if HAVE_KVM_RESTRICTED_MEM
> > >   help
> > > Support hosting fully virtualized guest machines using hardware
> > > virtualization extensions.  You will need a fairly recent
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 79e5cbc35fcf..4ce98fa0153c 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -245,7 +245,8 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t 
> > > cr2_or_gpa,
> > >  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
> > >  #endif
> > >
> > > -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> > > +
> > > +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || 
> > > defined(CONFIG_KVM_GENERIC_PRIVATE_MEM)
> > >  struct kvm_gfn_range {
> > >   struct kvm_memory_slot *slot;
> > >   gfn_t start;
> > > @@ -254,6 +255,9 @@ struct kvm_gfn_range {
> > >   bool may_block;
> > >  };
> > >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> > > +#endif
> > > +
> > > +#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> > >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> > >  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> > >  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> > > @@ -794,6 +798,9 @@ struct kvm {
> > >   struct notifier_block pm_notifier;
> > >  #endif
> > >   char stats_id[KVM_STATS_NAME_SIZE];
> > > +#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
> > > + struct xarray mem_attr_array;
> > > +#endif
> > >  };
> > >
> > >  #define kvm_err(fmt, ...) \
> > > @@ -1453,6 +1460,7 @@ bool 
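
To make the GPA-based usage described in the commit message concrete, here is a
rough, hypothetical userspace sketch (not taken from the series; the assumption
that unregistering converts a default-private range to shared follows the
"defaults to private" model described above):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* hypothetical helper, for illustration only */
static int guest_range_set_shared(int vm_fd, __u64 gpa, __u64 size)
{
    struct kvm_enc_region region = {
        .addr = gpa,    /* interpreted as a GPA once private memory is enabled */
        .size = size,
    };

    /* assumption: unregistering the region converts it from private to shared */
    return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_UNREG_REGION, &region);
}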

Re: [PATCH v3 4/4] hw/nvme: add polling support

2022-11-08 Thread Jinhao Fan
at 10:11 PM, Klaus Jensen  wrote:

> On Nov  8 12:39, John Levon wrote:
>> On Fri, Nov 04, 2022 at 07:32:12AM +0100, Klaus Jensen wrote:
>> 
>>> On Nov  3 21:19, Jinhao Fan wrote:
 On 11/3/2022 8:10 PM, Klaus Jensen wrote:
> I agree that the spec is a little unclear on this point. In any case, in
> Linux, when the driver has decided that the sq tail must be updated,
> it will use this check:
> 
>   (new_idx - event_idx - 1) < (new_idx - old)
 
 When eventidx is already behind, it's like:
 
 0
 1 <- event_idx
 2 <- old
 3 <- new_idx
 4
 .
 .
 .
 
 In this case, (new_idx - event_idx - 1) = 3-1-1 = 1 >= (new_idx - old) =
 3-2=1, so the host won't update sq tail. Where am I wrong in this example?
>>> 
>>> That becomes 1 >= 1, i.e. "true". So this will result in the driver
>>> doing an mmio doorbell write.
>> 
>> The code is:
>> 
>> static inline int nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old) 
>> 
>> {
>> 
>>return (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old); 
>>
>> }
>> 
>> 
>> which per the above is "return 1 < 1;", or false. So the above case does 
>> *not*
>> do an mmio write. No?
> 
> Whelp.
> 
> Looks like I'm in the wrong here, apologies!

So disabling eventidx updates during polling has the potential to reduce
doorbell writes. But as Klaus observed, removing this function does not make
a measurable performance difference, so I guess only one command is processed
during each polling iteration.
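
For reference, a small standalone program that reproduces the arithmetic above
with the values from this example (event_idx = 1, old = 2, new_idx = 3); it
prints 0, i.e. no doorbell write:

#include <stdio.h>

/* mirrors the nvme_dbbuf_need_event() helper quoted above */
static int nvme_dbbuf_need_event(unsigned short event_idx,
                                 unsigned short new_idx,
                                 unsigned short old)
{
    return (unsigned short)(new_idx - event_idx - 1) <
           (unsigned short)(new_idx - old);
}

int main(void)
{
    /* (3 - 1 - 1) = 1 < (3 - 2) = 1 is false, so no MMIO doorbell write */
    printf("%d\n", nvme_dbbuf_need_event(1, 3, 2));
    return 0;
}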



Re: [PATCH v9 6/8] KVM: Update lpage info when private/shared memory are mixed

2022-11-08 Thread Chao Peng
On Tue, Nov 08, 2022 at 08:08:05PM +0800, Yuan Yao wrote:
> On Tue, Oct 25, 2022 at 11:13:42PM +0800, Chao Peng wrote:
> > When private and shared memory are mixed in a large page, the lpage_info
> > may not be accurate and should be updated with this mixed info. A large
> > page that has mixed pages can't really be mapped as a large page, since its
> > private/shared pages come from different physical memory.
> >
> > Update lpage_info when the private/shared memory attribute is changed. If
> > both private and shared pages are within a large page region, it can't
> > be mapped as a large page. It's a bit of a challenge to track the mixed
> > info in a 'count'-like variable, so this patch instead reserves a bit in
> > 'disallow_lpage' to indicate that a large page has mixed private/shared pages.
> >
> > Signed-off-by: Chao Peng 
> > ---
> >  arch/x86/include/asm/kvm_host.h |   8 +++
> >  arch/x86/kvm/mmu/mmu.c  | 112 +++-
> >  arch/x86/kvm/x86.c  |   2 +
> >  include/linux/kvm_host.h|  19 ++
> >  virt/kvm/kvm_main.c |  16 +++--
> >  5 files changed, 152 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h 
> > b/arch/x86/include/asm/kvm_host.h
> > index 7551b6f9c31c..db811a54e3fd 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -37,6 +37,7 @@
> >  #include 
> >
> >  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
> > +#define __KVM_HAVE_ARCH_UPDATE_MEM_ATTR
> >
> >  #define KVM_MAX_VCPUS 1024
> >
> > @@ -952,6 +953,13 @@ struct kvm_vcpu_arch {
> >  #endif
> >  };
> >
> > +/*
> > + * Use a bit in disallow_lpage to indicate private/shared pages mixed at 
> > the
> > + * level. The remaining bits are used as a reference count.
> > + */
> > +#define KVM_LPAGE_PRIVATE_SHARED_MIXED (1U << 31)
> > +#define KVM_LPAGE_COUNT_MAX((1U << 31) - 1)
> > +
> >  struct kvm_lpage_info {
> > int disallow_lpage;
> >  };
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 33b1aec44fb8..67a9823a8c35 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -762,11 +762,16 @@ static void update_gfn_disallow_lpage_count(const 
> > struct kvm_memory_slot *slot,
> >  {
> > struct kvm_lpage_info *linfo;
> > int i;
> > +   int disallow_count;
> >
> > for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
> > linfo = lpage_info_slot(gfn, slot, i);
> > +
> > +   disallow_count = linfo->disallow_lpage & KVM_LPAGE_COUNT_MAX;
> > +   WARN_ON(disallow_count + count < 0 ||
> > +   disallow_count > KVM_LPAGE_COUNT_MAX - count);
> > +
> > linfo->disallow_lpage += count;
> > -   WARN_ON(linfo->disallow_lpage < 0);
> > }
> >  }
> >
> > @@ -6910,3 +6915,108 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
> > if (kvm->arch.nx_lpage_recovery_thread)
> > kthread_stop(kvm->arch.nx_lpage_recovery_thread);
> >  }
> > +
> > +static inline bool linfo_is_mixed(struct kvm_lpage_info *linfo)
> > +{
> > +   return linfo->disallow_lpage & KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > +}
> > +
> > +static inline void linfo_update_mixed(struct kvm_lpage_info *linfo, bool 
> > mixed)
> > +{
> > +   if (mixed)
> > +   linfo->disallow_lpage |= KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > +   else
> > +   linfo->disallow_lpage &= ~KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > +}
> > +
> > +static bool mem_attr_is_mixed_2m(struct kvm *kvm, unsigned int attr,
> > +gfn_t start, gfn_t end)
> > +{
> > > +   XA_STATE(xas, &kvm->mem_attr_array, start);
> > +   gfn_t gfn = start;
> > +   void *entry;
> > +   bool shared = attr == KVM_MEM_ATTR_SHARED;
> > +   bool mixed = false;
> > +
> > +   rcu_read_lock();
> > > +   entry = xas_load(&xas);
> > +   while (gfn < end) {
> > > +   if (xas_retry(&xas, entry))
> > +   continue;
> > +
> > +   KVM_BUG_ON(gfn != xas.xa_index, kvm);
> > +
> > +   if ((entry && !shared) || (!entry && shared)) {
> > +   mixed = true;
> > +   goto out;
> > +   }
> > +
> > > +   entry = xas_next(&xas);
> > +   gfn++;
> > +   }
> > +out:
> > +   rcu_read_unlock();
> > +   return mixed;
> > +}
> > +
> > +static bool mem_attr_is_mixed(struct kvm *kvm, struct kvm_memory_slot 
> > *slot,
> > + int level, unsigned int attr,
> > + gfn_t start, gfn_t end)
> > +{
> > +   unsigned long gfn;
> > +   void *entry;
> > +
> > +   if (level == PG_LEVEL_2M)
> > +   return mem_attr_is_mixed_2m(kvm, attr, start, end);
> > +
> > > +   entry = xa_load(&kvm->mem_attr_array, start);
> > +   for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
> > +   if (linfo_is_mixed(lpage_info_slot(gfn, slot, level - 1)))
> > +   return true;
> > > +   if (xa_load(&kvm->mem_attr_array, gfn) != entry)
> > +   

Re: [PULL 0/2] Net patches

2022-11-08 Thread Jason Wang
On Wed, Nov 9, 2022 at 12:33 AM Stefan Hajnoczi  wrote:
>
> On Mon, 7 Nov 2022 at 23:20, Jason Wang  wrote:
> >
> > The following changes since commit 524fc737431d240f9d9f10aaf381003092868bac:
> >
> >   util/log: Ignore per-thread flag if global file already there (2022-11-07 
> > 16:00:02 -0500)
> >
> > are available in the git repository at:
> >
> >   https://github.com/jasowang/qemu.git tags/net-pull-request
> >
> > for you to fetch changes up to fd2c87c7b0c97be2ac8d334885419f51f5963b51:
> >
> >   tests/qtest: netdev: test stream and dgram backends (2022-11-08 12:10:26 
> > +0800)
> >
> > 
> >
> > 
> > Laurent Vivier (1):
> >   tests/qtest: netdev: test stream and dgram backends
>
> This test does not pass in CI:
> https://gitlab.com/qemu-project/qemu/-/jobs/3290964536
> https://gitlab.com/qemu-project/qemu/-/jobs/3290964524
> https://gitlab.com/qemu-project/qemu/-/jobs/3290964471
> https://gitlab.com/qemu-project/qemu/-/jobs/3290964475
> https://gitlab.com/qemu-project/qemu/-/jobs/3290964569
>
> We're in soft freeze now. Please hold off on new tests unless they
> verify regressions/blockers.

Ok, so I think the netdev socket test could go for 7.3.

Thanks

>
> Thanks,
> Stefan
>




Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Jason Wang
On Wed, Nov 9, 2022 at 3:43 AM Stefan Hajnoczi  wrote:
>
> On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
> >
> > From: Kangjie Xu 
> >
> > Introduce the interface queue_enable() in VirtioDeviceClass and the
> > fucntion virtio_queue_enable() in virtio, it can be called when
> > VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> > started. It only supports the devices of virtio 1 or later. The
> > not-supported devices can only start the virtqueue when DRIVER_OK.
> >
> > Signed-off-by: Kangjie Xu 
> > Signed-off-by: Xuan Zhuo 
> > Acked-by: Jason Wang 
> > Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> > Reviewed-by: Michael S. Tsirkin 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  include/hw/virtio/virtio.h |  2 ++
> >  hw/virtio/virtio.c | 14 ++
> >  2 files changed, 16 insertions(+)
> >
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 74d76c1dbc..b00b3fcf31 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
> >  void (*reset)(VirtIODevice *vdev);
> >  void (*set_status)(VirtIODevice *vdev, uint8_t val);
> >  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> > +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
> >  /* For transitional devices, this is a bitmap of features
> >   * that are only exposed on the legacy interface but not
> >   * the modern one.
> > @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice 
> > *vdev, int n,
> >  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
> >  void virtio_reset(void *opaque);
> >  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
> >  void virtio_update_irq(VirtIODevice *vdev);
> >  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index cf5f9ca387..9683b2e158 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
> > queue_index)
> >  __virtio_queue_reset(vdev, queue_index);
> >  }
> >
> > +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> > +{
> > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> > +
> > +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > +error_report("queue_enable is only suppported in devices of virtio 
> > "
> > + "1.0 or later.");
>
> Why is this triggering here? Maybe virtio_queue_enable() is called too
> early. I have verified that the Linux guest driver sets VERSION_1. I
> didn't check what SeaBIOS does.

Looks like a bug; we should at least check the device features here, and this
should be reported as a guest error instead of via error_report().

Thanks

>
> $ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
> file,node-name=drive0,filename=test.img -device
> virtio-blk-pci,drive=drive0
> qemu: queue_enable is only suppported in devices of virtio 1.0 or later.
>
> Stefan
>
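
A minimal sketch of the direction suggested above, assuming the existing
virtio_host_has_feature() helper and LOG_GUEST_ERROR logging; this is only an
illustration, not the committed fix:

/* needs "qemu/log.h" for qemu_log_mask() */
void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
{
    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);

    if (!virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1)) {
        qemu_log_mask(LOG_GUEST_ERROR,
                      "%s: queue_enable is only supported in devices of "
                      "virtio 1.0 or later\n", __func__);
        return;
    }

    if (k->queue_enable) {
        k->queue_enable(vdev, queue_index);
    }
}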




[PATCH] vhost: set mem table before device start

2022-11-08 Thread Yajun Wu
Today, guest memory information (sent through VHOST_USER_SET_MEM_TABLE)
is only sent out during vhost device start. Due to this delay, memory
pinning is delayed as well. For 4G of guest memory, a VFIO driver usually
takes around 400+ msec to pin the memory.

This time is accounted towards the VM downtime: when live migrating
a VM, vhost device start occurs in the vmstate load stage.

Moving the set mem table step to just after VM bootup, before device start,
gives the backend enough time to pin the guest memory before the device is
started. This improvement reduces VM downtime by 400+ msec.

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
---
 hw/virtio/vhost.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..73e473cd84 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -539,6 +539,14 @@ static void vhost_commit(MemoryListener *listener)
 }
 
 if (!dev->started) {
+/* Backend can pin memory before device start, reduce LM downtime */
+if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER &&
+dev->n_mem_sections) {
+r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
+if (r < 0) {
+VHOST_OPS_DEBUG(r, "vhost_set_mem_table failed");
+}
+}
 goto out;
 }
 
-- 
2.27.0




[PATCH] target/i386: Add SGX aex-notify and EDECCSSA support

2022-11-08 Thread Kai Huang
The new SGX Asynchronous Exit (AEX) notification mechanism (AEX-notify)
allows an enclave to receive a notification in the ERESUME after an
enclave exit due to an AEX.  EDECCSSA is a new SGX user leaf function
(ENCLU[EDECCSSA]) to facilitate the AEX notification handling.

Whether the hardware supports creating enclaves with AEX-notify support
is enumerated via CPUID.(EAX=0x12,ECX=0x1):EAX[10].  The new EDECCSSA
user leaf function is enumerated via CPUID.(EAX=0x12,ECX=0x0):EAX[11].

Add support to expose the new SGX AEX-notify feature and the new
EDECCSSA user leaf function to the KVM guest.

Link: 
https://lore.kernel.org/lkml/166760360549.4906.809756297092548496.tip-bot2@tip-bot2/
Link: 
https://lore.kernel.org/lkml/166760360934.4906.2427175408052308969.tip-bot2@tip-bot2/
Reviewed-by: Yang Zhong 
Signed-off-by: Kai Huang 
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 22b681ca37dd..51f212cef50d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1233,7 +1233,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .feat_names = {
 "sgx1", "sgx2", NULL, NULL,
 NULL, NULL, NULL, NULL,
-NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, "sgx-edeccssa",
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -1273,7 +1273,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .feat_names = {
 NULL, "sgx-debug", "sgx-mode64", NULL,
 "sgx-provisionkey", "sgx-tokenkey", NULL, "sgx-kss",
-NULL, NULL, NULL, NULL,
+NULL, NULL, "sgx-aex-notify", NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,

base-commit: 466e81ff12013d026e2d0154266fce82bce2ee9b
-- 
2.38.1




Re: [PATCH v2] target/loongarch: Fix loongarch fdt addr confict

2022-11-08 Thread Richard Henderson

On 11/9/22 13:04, Song Gao wrote:

Fix LoongArch check-tcg error:
TESThello on loongarch64
qemu-system-loongarch64: Some ROM regions are overlapping
These ROM regions might have been loaded by direct user request or by default.
They could be BIOS/firmware images, a guest kernel, initrd or some other file 
loaded into guest memory.
Check whether you intended to load all this guest code, and whether it has been 
built to load to the correct addresses.

The following two regions overlap (in the memory address space):
hello ELF program header segment 0 (addresses 0x0020 - 
0x00242000)
fdt (addresses 0x0020 - 0x0030)
make[1]: *** [Makefile:177: run-hello] Error 1

Reported-by: Richard Henderson 
Signed-off-by: Song Gao 
---
  hw/loongarch/virt.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH v3 2/2] util/log: Always send errors to logfile when daemonized

2022-11-08 Thread Richard Henderson

On 11/9/22 01:00, Greg Kurz wrote:

When QEMU is started with `-daemonize`, all stdio descriptors get
redirected to `/dev/null`. This basically means that anything
printed with error_report() and friends is lost.

The current logging code allows redirecting to a file with `-D`, but
this requires enabling some logging item with `-d` as well to
be functional.

Relax the check on the log flags when QEMU is daemonized, so that
other users of stderr can benefit from the redirection, without the
need to enable unwanted debug logs. Previous behaviour is retained
for the non-daemonized case. The logic is unrolled as an `if` for
better readability. The qemu_log_level and log_per_thread globals
reflect the state we want to transition to at this point: use
them instead of the intermediary locals for correctness.

qemu_set_log_internal() is adapted to open a per-thread log file
when '-d tid' is passed. This is done by hijacking qemu_try_lock()
which seems simpler than refactoring the code.

Signed-off-by: Greg Kurz
---
  util/log.c | 72 --
  1 file changed, 53 insertions(+), 19 deletions(-)


Reviewed-by: Richard Henderson 

r~



[PATCH v2] target/loongarch: Fix loongarch fdt addr confict

2022-11-08 Thread Song Gao
Fix LoongArch check-tcg error:
   TESThello on loongarch64
qemu-system-loongarch64: Some ROM regions are overlapping
These ROM regions might have been loaded by direct user request or by default.
They could be BIOS/firmware images, a guest kernel, initrd or some other file 
loaded into guest memory.
Check whether you intended to load all this guest code, and whether it has been 
built to load to the correct addresses.

The following two regions overlap (in the memory address space):
   hello ELF program header segment 0 (addresses 0x0020 - 
0x00242000)
   fdt (addresses 0x0020 - 0x0030)
make[1]: *** [Makefile:177: run-hello] Error 1

Reported-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 5e4c2790bf..5136940b0b 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -793,13 +793,13 @@ static void loongarch_init(MachineState *machine)
 qemu_add_machine_init_done_notifier(>machine_done);
 fdt_add_pcie_node(lams);
 /*
- * Since lowmem region starts from 0, FDT base address is located
- * at 2 MiB to avoid NULL pointer access.
- *
+ * Since lowmem region starts from 0 and Linux kernel legacy start address
+ * at 2 MiB, FDT base address is located at 1 MiB to avoid NULL pointer
+ * access. FDT size limit with 1 MiB.
  * Put the FDT into the memory map as a ROM image: this will ensure
  * the FDT is copied again upon reset, even if addr points into RAM.
  */
-fdt_base = 2 * MiB;
+fdt_base = 1 * MiB;
 qemu_fdt_dumpdtb(machine->fdt, lams->fdt_size);
 rom_add_blob_fixed("fdt", machine->fdt, lams->fdt_size, fdt_base);
 }
-- 
2.31.1




Re: [PATCH v5 03/11] Hexagon (target/hexagon) Add overrides for S2_asr_r_r_sat/S2_asl_r_r_sat

2022-11-08 Thread Richard Henderson

On 11/9/22 03:28, Taylor Simpson wrote:

These instructions will not be generated by idef-parser, so we override
them manually.

Test cases added to tests/tcg/hexagon/usr.c

Co-authored-by: Matheus Tavares Bernardino
Signed-off-by: Matheus Tavares Bernardino
Signed-off-by: Taylor Simpson
---
  target/hexagon/gen_tcg.h |  10 +++-
  target/hexagon/genptr.c  | 104 +++
  tests/tcg/hexagon/usr.c  |  34 ++---
  3 files changed, 141 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 

r~



Questions about QEMU exception

2022-11-08 Thread Li, Kevin
Hi qemu community,

We are working on an open source project which uses qemu on mac, and we have
a signing process to sign qemu-system-x86_64.
If qemu-system-x86_64 is not signed, we don’t see any problem, but after
signing it, we get the following error:

qemu-system-x86_64 -M none -netdev help]: stdout=\"Accelerators supported in 
QEMU binary:\\ntcg\\nhax\\nhvf\\n\", stderr=\"qemu-system-x86_64: allocate 
1073741824 bytes for jit buffer: Invalid argument

Does anyone have a clue about what change may result in this failure?

Thanks,
Kevin


Re: [PATCH v4 10/11] Hexagon (target/hexagon) Use direct block chaining for direct jump/branch

2022-11-08 Thread Richard Henderson

On 11/9/22 02:41, Taylor Simpson wrote:




-Original Message-
From: Richard Henderson 
Sent: Tuesday, November 8, 2022 1:24 AM
To: Taylor Simpson ; qemu-devel@nongnu.org
Cc: phi...@linaro.org; a...@rev.ng; a...@rev.ng; Brian Cain
; Matheus Bernardino (QUIC)

Subject: Re: [PATCH v4 10/11] Hexagon (target/hexagon) Use direct block
chaining for direct jump/branch

On 11/8/22 15:05, Taylor Simpson wrote:

   static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)
   {
+DisasContext *ctx = container_of(db, DisasContext, base);
+ctx->branch_cond = TCG_COND_NEVER;
   }


Typically this would go in hexagon_tr_init_disas_context as well, but I don't
suppose it really matters.


AFAICT, these are always called back to back.  So, it's not clear to me what 
the distinction should be.


ops->tb_start is called after gen_tb_start, so you can emit code that comes after the 
interrupt/icount check, but before the first guest instruction.  Rarely needed, should 
probably be allowed to be NULL.



r~



[PATCH v1 05/24] vfio-user: add device IO ops vector

2022-11-08 Thread John Johnson
Used for communication with VFIO driver
(prep work for vfio-user, which will communicate over a socket)

Signed-off-by: John G Johnson 
---
 hw/vfio/ap.c  |   1 +
 hw/vfio/ccw.c |   1 +
 hw/vfio/common.c  | 107 +++-
 hw/vfio/pci.c | 140 ++
 hw/vfio/platform.c|   1 +
 include/hw/vfio/vfio-common.h |  27 
 6 files changed, 209 insertions(+), 68 deletions(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index e0dd561..7ef42f1 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -102,6 +102,7 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
 mdevid = basename(vapdev->vdev.sysfsdev);
 vapdev->vdev.name = g_strdup_printf("%s", mdevid);
 vapdev->vdev.dev = dev;
+vapdev->vdev.io_ops = &vfio_dev_io_ioctl;
 
 /*
  * vfio-ap devices operate in a way compatible with discarding of
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 06b588c..cbd1c25 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -614,6 +614,7 @@ static void vfio_ccw_get_device(VFIOGroup *group, 
VFIOCCWDevice *vcdev,
 vcdev->vdev.type = VFIO_DEVICE_TYPE_CCW;
 vcdev->vdev.name = name;
 vcdev->vdev.dev = &vcdev->cdev.parent_obj.parent_obj;
+vcdev->vdev.io_ops = &vfio_dev_io_ioctl;
 
 return;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index dd9104f..c7bf0aa 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -71,7 +71,7 @@ void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
 .count = 0,
 };
 
-ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+VDEV_SET_IRQS(vbasedev, &irq_set);
 }
 
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
@@ -84,7 +84,7 @@ void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int 
index)
 .count = 1,
 };
 
-ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+VDEV_SET_IRQS(vbasedev, &irq_set);
 }
 
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
@@ -97,7 +97,7 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, int 
index)
 .count = 1,
 };
 
-ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+VDEV_SET_IRQS(vbasedev, &irq_set);
 }
 
 static inline const char *action_to_str(int action)
@@ -178,9 +178,7 @@ int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, 
int subindex,
 pfd = (int32_t *)&irq_set->data;
 *pfd = fd;
 
-if (ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
-ret = -errno;
-}
+ret = VDEV_SET_IRQS(vbasedev, irq_set);
 g_free(irq_set);
 
 if (!ret) {
@@ -215,6 +213,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
 uint32_t dword;
 uint64_t qword;
 } buf;
+int ret;
 
 switch (size) {
 case 1:
@@ -234,13 +233,15 @@ void vfio_region_write(void *opaque, hwaddr addr,
 break;
 }
 
-if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+ret = VDEV_REGION_WRITE(vbasedev, region->nr, addr, size, &buf);
+if (ret != size) {
+const char *err = ret < 0 ? strerror(-ret) : "short write";
+
 error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
- ",%d) failed: %m",
+ ",%d) failed: %s",
  __func__, vbasedev->name, region->nr,
- addr, data, size);
+ addr, data, size, err);
 }
-
 trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
 
 /*
@@ -266,13 +267,18 @@ uint64_t vfio_region_read(void *opaque,
 uint64_t qword;
 } buf;
 uint64_t data = 0;
+int ret;
+
+ret = VDEV_REGION_READ(vbasedev, region->nr, addr, size, &buf);
+if (ret != size) {
+const char *err = ret < 0 ? strerror(-ret) : "short read";
 
-if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s",
  __func__, vbasedev->name, region->nr,
- addr, size);
+ addr, size, err);
 return (uint64_t)-1;
 }
+
 switch (size) {
 case 1:
 data = buf.byte;
@@ -2450,6 +2456,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
  struct vfio_region_info **info)
 {
 size_t argsz = sizeof(struct vfio_region_info);
+int ret;
 
 /* create region cache */
 if (vbasedev->regions == NULL) {
@@ -2468,10 +2475,11 @@ int vfio_get_region_info(VFIODevice *vbasedev, int 
index,
 retry:
 (*info)->argsz = argsz;
 
-if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
+ret = VDEV_GET_REGION_INFO(vbasedev, *info);
+if (ret != 0) {
 g_free(*info);
 *info = NULL;
-return -errno;
+return ret;
 }
 
 if ((*info)->argsz > argsz) {
@@ -2632,6 +2640,75 @@ int vfio_eeh_as_op(AddressSpace *as, 

[PATCH v1 10/24] vfio-user: get device info

2022-11-08 Thread John Johnson
Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c   | 15 ++
 hw/vfio/user-protocol.h | 13 
 hw/vfio/user.c  | 55 +
 hw/vfio/user.h  |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b2534b3..2e0e41d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3465,6 +3465,8 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 VFIODevice *vbasedev = >vbasedev;
 SocketAddress addr;
 VFIOProxy *proxy;
+struct vfio_device_info info;
+int ret;
 Error *err = NULL;
 
 /*
@@ -3503,6 +3505,19 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 vbasedev->ops = _user_pci_ops;
 vbasedev->type = VFIO_DEVICE_TYPE_PCI;
 vbasedev->dev = DEVICE(vdev);
+vbasedev->io_ops = &vfio_dev_io_sock;
+
+ret = VDEV_GET_INFO(vbasedev, &info);
+if (ret) {
+error_setg_errno(errp, -ret, "get info failure");
+goto error;
+}
+
+vfio_populate_device(vdev, );
+if (err) {
+error_propagate(errp, err);
+goto error;
+}
 
 return;
 
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 5de5b20..43912a5 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -113,4 +113,17 @@ typedef struct {
  */
 #define VFIO_USER_DEF_MAX_BITMAP (256 * 1024 * 1024)
 
+/*
+ * VFIO_USER_DEVICE_GET_INFO
+ * imported from struct_device_info
+ */
+typedef struct {
+VFIOUserHdr hdr;
+uint32_t argsz;
+uint32_t flags;
+uint32_t num_regions;
+uint32_t num_irqs;
+uint32_t cap_offset;
+} VFIOUserDeviceInfo;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 31bcc93..7873534 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -31,6 +31,14 @@
 #include "qapi/qmp/qbool.h"
 #include "user.h"
 
+
+/*
+ * These are to defend against a malign server trying
+ * to force us to run out of memory.
+ */
+#define VFIO_USER_MAX_REGIONS   100
+#define VFIO_USER_MAX_IRQS  50
+
 static int wait_time = 5000;   /* wait up to 5 sec for busy servers */
 static IOThread *vfio_user_iothread;
 
@@ -1075,3 +1083,50 @@ int vfio_user_validate_version(VFIOProxy *proxy, Error 
**errp)
 
 return 0;
 }
+
+static int vfio_user_get_info(VFIOProxy *proxy, struct vfio_device_info *info)
+{
+VFIOUserDeviceInfo msg;
+
+memset(&msg, 0, sizeof(msg));
+vfio_user_request_msg(&msg.hdr, VFIO_USER_DEVICE_GET_INFO, sizeof(msg), 0);
+msg.argsz = sizeof(struct vfio_device_info);
+
+vfio_user_send_wait(proxy, &msg.hdr, NULL, 0, false);
+if (msg.hdr.flags & VFIO_USER_ERROR) {
+return -msg.hdr.error_reply;
+}
+
+memcpy(info, &msg.argsz, sizeof(*info));
+return 0;
+}
+
+
+/*
+ * Socket-based io_ops
+ */
+
+static int vfio_user_io_get_info(VFIODevice *vbasedev,
+ struct vfio_device_info *info)
+{
+int ret;
+
+ret = vfio_user_get_info(vbasedev->proxy, info);
+if (ret) {
+return ret;
+}
+
+/* defend against a malicious server */
+if (info->num_regions > VFIO_USER_MAX_REGIONS ||
+info->num_irqs > VFIO_USER_MAX_IRQS) {
+error_printf("vfio_user_get_info: invalid reply\n");
+return -EINVAL;
+}
+
+return 0;
+}
+
+VFIODevIO vfio_dev_io_sock = {
+.get_info = vfio_user_io_get_info,
+};
+
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 8ce3cd9..2547cf6 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -92,4 +92,6 @@ void vfio_user_set_handler(VFIODevice *vbasedev,
void *reqarg);
 int vfio_user_validate_version(VFIOProxy *proxy, Error **errp);
 
+extern VFIODevIO vfio_dev_io_sock;
+
 #endif /* VFIO_USER_H */
-- 
1.8.3.1




[PATCH v1 11/24] vfio-user: get region info

2022-11-08 Thread John Johnson
Add per-region FD to support mmap() of remote device regions

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/common.c  | 32 ---
 hw/vfio/user-protocol.h   | 14 ++
 hw/vfio/user.c| 59 +++
 include/hw/vfio/vfio-common.h |  8 +++---
 4 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c589bd9..87400b3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -41,6 +41,7 @@
 #include "qapi/error.h"
 #include "migration/migration.h"
 #include "sysemu/tpm.h"
+#include "hw/vfio/user.h"
 
 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -1586,6 +1587,11 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, 
VFIORegion *region,
 region->size = info->size;
 region->fd_offset = info->offset;
 region->nr = index;
+if (vbasedev->regfds != NULL) {
+region->fd = vbasedev->regfds[index];
+} else {
+region->fd = vbasedev->fd;
+}
 
 if (region->size) {
 region->mem = g_new0(MemoryRegion, 1);
@@ -1637,7 +1643,7 @@ int vfio_region_mmap(VFIORegion *region)
 
 for (i = 0; i < region->nr_mmaps; i++) {
 region->mmaps[i].mmap = mmap(NULL, region->mmaps[i].size, prot,
- MAP_SHARED, region->vbasedev->fd,
+ MAP_SHARED, region->fd,
  region->fd_offset +
  region->mmaps[i].offset);
 if (region->mmaps[i].mmap == MAP_FAILED) {
@@ -2442,10 +2448,17 @@ void vfio_put_base_device(VFIODevice *vbasedev)
 int i;
 
 for (i = 0; i < vbasedev->num_regions; i++) {
+if (vbasedev->regfds != NULL && vbasedev->regfds[i] != -1) {
+close(vbasedev->regfds[i]);
+}
 g_free(vbasedev->regions[i]);
 }
 g_free(vbasedev->regions);
 vbasedev->regions = NULL;
+if (vbasedev->regfds != NULL) {
+g_free(vbasedev->regfds);
+vbasedev->regfds = NULL;
+}
 }
 
 if (!vbasedev->group) {
@@ -2461,12 +2474,16 @@ int vfio_get_region_info(VFIODevice *vbasedev, int 
index,
  struct vfio_region_info **info)
 {
 size_t argsz = sizeof(struct vfio_region_info);
+int fd = -1;
 int ret;
 
 /* create region cache */
 if (vbasedev->regions == NULL) {
 vbasedev->regions = g_new0(struct vfio_region_info *,
vbasedev->num_regions);
+if (vbasedev->proxy != NULL) {
+vbasedev->regfds = g_new0(int, vbasedev->num_regions);
+}
 }
 /* check cache */
 if (vbasedev->regions[index] != NULL) {
@@ -2480,7 +2497,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
 retry:
 (*info)->argsz = argsz;
 
-ret = VDEV_GET_REGION_INFO(vbasedev, *info);
+ret = VDEV_GET_REGION_INFO(vbasedev, *info, &fd);
 if (ret != 0) {
 g_free(*info);
 *info = NULL;
@@ -2490,12 +2507,19 @@ retry:
 if ((*info)->argsz > argsz) {
 argsz = (*info)->argsz;
 *info = g_realloc(*info, argsz);
+if (fd != -1) {
+close(fd);
+fd = -1;
+}
 
 goto retry;
 }
 
 /* fill cache */
 vbasedev->regions[index] = *info;
+if (vbasedev->regfds != NULL) {
+vbasedev->regfds[index] = fd;
+}
 
 return 0;
 }
@@ -2655,10 +2679,12 @@ static int vfio_io_get_info(VFIODevice *vbasedev, 
struct vfio_device_info *info)
 }
 
 static int vfio_io_get_region_info(VFIODevice *vbasedev,
-   struct vfio_region_info *info)
+   struct vfio_region_info *info,
+   int *fd)
 {
 int ret;
 
+*fd = -1;
 ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
 
 return ret < 0 ? -errno : ret;
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 43912a5..a1b64fe 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -126,4 +126,18 @@ typedef struct {
 uint32_t cap_offset;
 } VFIOUserDeviceInfo;
 
+/*
+ * VFIO_USER_DEVICE_GET_REGION_INFO
+ * imported from struct_vfio_region_info
+ */
+typedef struct {
+VFIOUserHdr hdr;
+uint32_t argsz;
+uint32_t flags;
+uint32_t index;
+uint32_t cap_offset;
+uint64_t size;
+uint64_t offset;
+} VFIOUserRegionInfo;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 7873534..69b0fed 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1101,6 +1101,40 @@ static int vfio_user_get_info(VFIOProxy *proxy, struct 
vfio_device_info *info)
 return 0;
 }
 
+static int vfio_user_get_region_info(VFIOProxy *proxy,
+ struct vfio_region_info *info,
+   

[PATCH v1 20/24] vfio-user: dma read/write operations

2022-11-08 Thread John Johnson
Messages from server to client that perform device DMA.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c   | 110 
 hw/vfio/user-protocol.h |  11 +
 hw/vfio/user.c  |  57 +
 hw/vfio/user.h  |   3 ++
 4 files changed, 181 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ce6776b..559b20d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3550,6 +3550,95 @@ static void vfio_user_msix_teardown(VFIOPCIDevice *vdev)
 vdev->msix->pba_region = NULL;
 }
 
+static void vfio_user_dma_read(VFIOPCIDevice *vdev, VFIOUserDMARW *msg)
+{
+PCIDevice *pdev = >pdev;
+VFIOProxy *proxy = vdev->vbasedev.proxy;
+VFIOUserDMARW *res;
+MemTxResult r;
+size_t size;
+
+if (msg->hdr.size < sizeof(*msg)) {
+vfio_user_send_error(proxy, >hdr, EINVAL);
+return;
+}
+if (msg->count > proxy->max_xfer_size) {
+vfio_user_send_error(proxy, >hdr, E2BIG);
+return;
+}
+
+/* switch to our own message buffer */
+size = msg->count + sizeof(VFIOUserDMARW);
+res = g_malloc0(size);
+memcpy(res, msg, sizeof(*res));
+g_free(msg);
+
+r = pci_dma_read(pdev, res->offset, >data, res->count);
+
+switch (r) {
+case MEMTX_OK:
+if (res->hdr.flags & VFIO_USER_NO_REPLY) {
+g_free(res);
+return;
+}
+vfio_user_send_reply(proxy, >hdr, size);
+break;
+case MEMTX_ERROR:
+vfio_user_send_error(proxy, >hdr, EFAULT);
+break;
+case MEMTX_DECODE_ERROR:
+vfio_user_send_error(proxy, >hdr, ENODEV);
+break;
+case MEMTX_ACCESS_ERROR:
+vfio_user_send_error(proxy, >hdr, EPERM);
+break;
+default:
+error_printf("vfio_user_dma_read unknown error %d\n", r);
+vfio_user_send_error(vdev->vbasedev.proxy, >hdr, EINVAL);
+}
+}
+
+static void vfio_user_dma_write(VFIOPCIDevice *vdev, VFIOUserDMARW *msg)
+{
+PCIDevice *pdev = >pdev;
+VFIOProxy *proxy = vdev->vbasedev.proxy;
+MemTxResult r;
+
+if (msg->hdr.size < sizeof(*msg)) {
+vfio_user_send_error(proxy, >hdr, EINVAL);
+return;
+}
+/* make sure transfer count isn't larger than the message data */
+if (msg->count > msg->hdr.size - sizeof(*msg)) {
+vfio_user_send_error(proxy, >hdr, E2BIG);
+return;
+}
+
+r = pci_dma_write(pdev, msg->offset, >data, msg->count);
+
+switch (r) {
+case MEMTX_OK:
+if ((msg->hdr.flags & VFIO_USER_NO_REPLY) == 0) {
+vfio_user_send_reply(proxy, >hdr, sizeof(msg->hdr));
+} else {
+g_free(msg);
+}
+break;
+case MEMTX_ERROR:
+vfio_user_send_error(proxy, >hdr, EFAULT);
+break;
+case MEMTX_DECODE_ERROR:
+vfio_user_send_error(proxy, >hdr, ENODEV);
+break;
+case MEMTX_ACCESS_ERROR:
+vfio_user_send_error(proxy, >hdr, EPERM);
+break;
+default:
+error_printf("vfio_user_dma_write unknown error %d\n", r);
+vfio_user_send_error(vdev->vbasedev.proxy, >hdr, EINVAL);
+}
+}
+
 /*
  * Incoming request message callback.
  *
@@ -3557,9 +3646,30 @@ static void vfio_user_msix_teardown(VFIOPCIDevice *vdev)
  */
 static void vfio_user_pci_process_req(void *opaque, VFIOUserMsg *msg)
 {
+VFIOPCIDevice *vdev = opaque;
+VFIOUserHdr *hdr = msg->hdr;
+
+/* no incoming PCI requests pass FDs */
+if (msg->fds != NULL) {
+vfio_user_send_error(vdev->vbasedev.proxy, hdr, EINVAL);
+vfio_user_putfds(msg);
+return;
+}
 
+switch (hdr->command) {
+case VFIO_USER_DMA_READ:
+vfio_user_dma_read(vdev, (VFIOUserDMARW *)hdr);
+break;
+case VFIO_USER_DMA_WRITE:
+vfio_user_dma_write(vdev, (VFIOUserDMARW *)hdr);
+break;
+default:
+error_printf("vfio_user_process_req unknown cmd %d\n", hdr->command);
+vfio_user_send_error(vdev->vbasedev.proxy, hdr, ENOSYS);
+}
 }
 
+
 /*
  * Emulated devices don't use host hot reset
  */
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index e9fcf64..6afd090 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -202,6 +202,17 @@ typedef struct {
 char data[];
 } VFIOUserRegionRW;
 
+/*
+ * VFIO_USER_DMA_READ
+ * VFIO_USER_DMA_WRITE
+ */
+typedef struct {
+VFIOUserHdr hdr;
+uint64_t offset;
+uint32_t count;
+char data[];
+} VFIOUserDMARW;
+
 /*imported from struct vfio_bitmap */
 typedef struct {
 uint64_t pgsize;
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 0c5493e..56b3616 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -379,6 +379,10 @@ static int vfio_user_recv_one(VFIOProxy *proxy)
 *msg->hdr = hdr;
 data = (char *)msg->hdr + sizeof(hdr);
 } else {
+if (hdr.size > proxy->max_xfer_size + 

[PATCH v1 21/24] vfio-user: pci reset

2022-11-08 Thread John Johnson
Message to tell the server to reset the device.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c  | 15 +++
 hw/vfio/user.c | 12 
 hw/vfio/user.h |  1 +
 3 files changed, 28 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 559b20d..005fcf8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3829,6 +3829,20 @@ static void vfio_user_instance_finalize(Object *obj)
 }
 }
 
+static void vfio_user_pci_reset(DeviceState *dev)
+{
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
+VFIODevice *vbasedev = >vbasedev;
+
+vfio_pci_pre_reset(vdev);
+
+if (vbasedev->reset_works) {
+vfio_user_reset(vbasedev->proxy);
+}
+
+vfio_pci_post_reset(vdev);
+}
+
 static Property vfio_user_pci_dev_properties[] = {
 DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
 DEFINE_PROP_BOOL("secure-dma", VFIOUserPCIDevice, secure_dma, false),
@@ -3842,6 +3856,7 @@ static void vfio_user_pci_dev_class_init(ObjectClass 
*klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
 
+dc->reset = vfio_user_pci_reset;
 device_class_set_props(dc, vfio_user_pci_dev_properties);
 dc->desc = "VFIO over socket PCI device assignment";
 pdc->realize = vfio_user_pci_realize;
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 56b3616..ddf9e13 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1580,6 +1580,18 @@ static int vfio_user_region_write(VFIOProxy *proxy, 
uint8_t index, off_t offset,
 return ret;
 }
 
+void vfio_user_reset(VFIOProxy *proxy)
+{
+VFIOUserHdr msg;
+
+vfio_user_request_msg(&msg, VFIO_USER_DEVICE_RESET, sizeof(msg), 0);
+
+vfio_user_send_wait(proxy, &msg, NULL, 0, false);
+if (msg.flags & VFIO_USER_ERROR) {
+error_printf("reset reply error %d\n", msg.error_reply);
+}
+}
+
 
 /*
  * Socket-based io_ops
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 18e3112..d88ffe5 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -96,6 +96,7 @@ int vfio_user_validate_version(VFIOProxy *proxy, Error 
**errp);
 void vfio_user_send_reply(VFIOProxy *proxy, VFIOUserHdr *hdr, int size);
 void vfio_user_send_error(VFIOProxy *proxy, VFIOUserHdr *hdr, int error);
 void vfio_user_putfds(VFIOUserMsg *msg);
+void vfio_user_reset(VFIOProxy *proxy);
 
 extern VFIODevIO vfio_dev_io_sock;
 extern VFIOContIO vfio_cont_io_sock;
-- 
1.8.3.1




[PATCH v1 15/24] vfio-user: forward msix BAR accesses to server

2022-11-08 Thread John Johnson
Server holds the device's current pending state
Use irq masking commands in the socket case

Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/ccw.c |  1 +
 hw/vfio/common.c  | 26 +
 hw/vfio/pci.c | 87 ++-
 hw/vfio/pci.h |  1 +
 hw/vfio/platform.c|  1 +
 include/hw/vfio/vfio-common.h |  3 ++
 6 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index cbd1c25..830ca53 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -615,6 +615,7 @@ static void vfio_ccw_get_device(VFIOGroup *group, 
VFIOCCWDevice *vcdev,
 vcdev->vdev.name = name;
 vcdev->vdev.dev = >cdev.parent_obj.parent_obj;
 vcdev->vdev.io_ops = _dev_io_ioctl;
+vcdev->vdev.irq_mask_works = false;
 
 return;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 87cd1d1..b540195 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -101,6 +101,32 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, int 
index)
 VDEV_SET_IRQS(vbasedev, &irq_set);
 }
 
+void vfio_mask_single_irq(VFIODevice *vbasedev, int index, int irq)
+{
+struct vfio_irq_set irq_set = {
+.argsz = sizeof(irq_set),
+.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+.index = index,
+.start = irq,
+.count = 1,
+};
+
+VDEV_SET_IRQS(vbasedev, &irq_set);
+}
+
+void vfio_unmask_single_irq(VFIODevice *vbasedev, int index, int irq)
+{
+struct vfio_irq_set irq_set = {
+.argsz = sizeof(irq_set),
+.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+.index = index,
+.start = irq,
+.count = 1,
+};
+
+VDEV_SET_IRQS(vbasedev, &irq_set);
+}
+
 static inline const char *action_to_str(int action)
 {
 switch (action) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index be39a4e..a1ae3fb 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -479,6 +479,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 {
 VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 VFIOMSIVector *vector;
+bool new_vec = false;
 int ret;
 
 trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
@@ -492,6 +493,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 error_report("vfio: Error: event_notifier_init failed");
 }
 vector->use = true;
+new_vec = true;
 msix_vector_use(pdev, nr);
 }
 
@@ -518,6 +520,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 kvm_irqchip_commit_route_changes(_route_change);
 vfio_connect_kvm_msi_virq(vector);
 }
+new_vec = true;
 }
 }
 
@@ -525,6 +528,8 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
  * We don't want to have the host allocate all possible MSI vectors
  * for a device if they're not in use, so we shutdown and incrementally
  * increase them as needed.
+ * Otherwise, unmask the vector if the vector is already setup (and we can
+ * do so) or send the fd if not.
  */
 if (vdev->nr_vectors < nr + 1) {
 vdev->nr_vectors = nr + 1;
@@ -535,6 +540,8 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 error_report("vfio: failed to enable vectors, %d", ret);
 }
 }
+} else if (vdev->vbasedev.irq_mask_works && !new_vec) {
+vfio_unmask_single_irq(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX, nr);
 } else {
 Error *err = NULL;
 int32_t fd;
@@ -576,6 +583,12 @@ static void vfio_msix_vector_release(PCIDevice *pdev, 
unsigned int nr)
 
 trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
 
+/* just mask vector if peer supports it */
+if (vdev->vbasedev.irq_mask_works) {
+vfio_mask_single_irq(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX, nr);
+return;
+}
+
 /*
  * There are still old guests that mask and unmask vectors on every
  * interrupt.  If we're using QEMU bypass with a KVM irqfd, leave all of
@@ -646,7 +659,7 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
 if (ret) {
 error_report("vfio: failed to enable vectors, %d", ret);
 }
-} else {
+} else if (!vdev->vbasedev.irq_mask_works) {
 /*
  * Some communication channels between VF & PF or PF & fw rely on the
  * physical state of the device and expect that enabling MSI-X from the
@@ -662,6 +675,13 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
  */
 vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
 vfio_msix_vector_release(&vdev->pdev, 0);
+} else {
+/*
+ * If we can use irq masking, send an invalid fd on vector 0
+ * to enable MSI-X without any vectors enabled.
+ */
+

[PATCH v1 17/24] vfio-user: dma map/unmap operations

2022-11-08 Thread John Johnson
Add ability to do async operations during memory transactions

Signed-off-by: Jagannathan Raman 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
---
 hw/vfio/common.c  |  63 +---
 hw/vfio/user-protocol.h   |  32 ++
 hw/vfio/user.c| 220 ++
 include/hw/vfio/vfio-common.h |   9 +-
 4 files changed, 308 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e73a772..fe6eddd 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -507,7 +507,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return CONT_DMA_UNMAP(container, , NULL);
 }
 
-static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+static int vfio_dma_map(VFIOContainer *container, MemoryRegion *mr, hwaddr 
iova,
 ram_addr_t size, void *vaddr, bool readonly)
 {
 struct vfio_iommu_type1_dma_map map = {
@@ -523,7 +523,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
 }
 
-ret = CONT_DMA_MAP(container, );
+ret = CONT_DMA_MAP(container, mr, );
 
 if (ret < 0) {
 error_report("VFIO_MAP_DMA failed: %s", strerror(-ret));
@@ -586,7 +586,8 @@ static bool 
vfio_listener_skipped_section(MemoryRegionSection *section)
 
 /* Called with rcu_read_lock held.  */
 static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
-   ram_addr_t *ram_addr, bool *read_only)
+   ram_addr_t *ram_addr, bool *read_only,
+   MemoryRegion **mrp)
 {
 MemoryRegion *mr;
 hwaddr xlat;
@@ -667,6 +668,10 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void 
**vaddr,
 *read_only = !writable || mr->readonly;
 }
 
+if (mrp != NULL) {
+*mrp = mr;
+}
+
 return true;
 }
 
@@ -674,6 +679,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
 VFIOContainer *container = giommu->container;
+MemoryRegion *mr;
 hwaddr iova = iotlb->iova + giommu->iommu_offset;
 void *vaddr;
 int ret;
@@ -692,7 +698,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
 bool read_only;
 
-if (!vfio_get_xlat_addr(iotlb, , NULL, _only)) {
+if (!vfio_get_xlat_addr(iotlb, , NULL, _only, )) {
 goto out;
 }
 /*
@@ -702,14 +708,14 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  * of vaddr will always be there, even if the memory object is
  * destroyed and its backing memory munmap-ed.
  */
-ret = vfio_dma_map(container, iova,
+ret = vfio_dma_map(container, mr, iova,
iotlb->addr_mask + 1, vaddr,
read_only);
 if (ret) {
 error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ "0x%"HWADDR_PRIx", %p)",
  container, iova,
- iotlb->addr_mask + 1, vaddr, ret);
+ iotlb->addr_mask + 1, vaddr);
 }
 } else {
 ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
@@ -764,7 +770,7 @@ static int 
vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
section->offset_within_address_space;
 vaddr = memory_region_get_ram_ptr(section->mr) + start;
 
-ret = vfio_dma_map(vrdl->container, iova, next - start,
+ret = vfio_dma_map(vrdl->container, section->mr, iova, next - start,
vaddr, section->readonly);
 if (ret) {
 /* Rollback */
@@ -888,6 +894,29 @@ static bool 
vfio_known_safe_misalignment(MemoryRegionSection *section)
 return true;
 }
 
+static void vfio_listener_begin(MemoryListener *listener)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+/*
+ * When DMA space is the physical address space,
+ * the region add/del listeners will fire during
+ * memory update transactions.  These depend on BQL
+ * being held, so do any resulting map/demap ops async
+ * while keeping BQL.
+ */
+container->async_ops = true;
+}
+
+static void vfio_listener_commit(MemoryListener *listener)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+/* wait here for any async requests sent during the transaction */
+CONT_WAIT_COMMIT(container);
+container->async_ops = false;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -1095,12 +1124,12 @@ static void vfio_listener_region_add(MemoryListener 

[PATCH v1 14/24] vfio-user: get and set IRQs

2022-11-08 Thread John Johnson
Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c   |   7 ++-
 hw/vfio/user-protocol.h |  25 +
 hw/vfio/user.c  | 135 
 3 files changed, 166 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7abe44e..be39a4e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -713,7 +713,8 @@ retry:
 ret = vfio_enable_vectors(vdev, false);
 if (ret) {
 if (ret < 0) {
-error_report("vfio: Error: Failed to setup MSI fds: %m");
+error_report("vfio: Error: Failed to setup MSI fds: %s",
+ strerror(-ret));
 } else {
 error_report("vfio: Error: Failed to enable %d "
  "MSI vectors, retry with %d", vdev->nr_vectors, ret);
@@ -2712,6 +2713,7 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
Error **errp)
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
 ret = VDEV_GET_IRQ_INFO(vbasedev, _info);
+
 if (ret) {
 /* This can fail for an old kernel or legacy PCI dev */
 trace_vfio_populate_device_get_irq_info_failure(strerror(errno));
@@ -3593,6 +3595,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 goto out_teardown;
 }
 
+vfio_register_err_notifier(vdev);
+vfio_register_req_notifier(vdev);
+
 return;
 
 out_teardown:
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 124340c..31704cf 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -141,6 +141,31 @@ typedef struct {
 } VFIOUserRegionInfo;
 
 /*
+ * VFIO_USER_DEVICE_GET_IRQ_INFO
+ * imported from struct vfio_irq_info
+ */
+typedef struct {
+VFIOUserHdr hdr;
+uint32_t argsz;
+uint32_t flags;
+uint32_t index;
+uint32_t count;
+} VFIOUserIRQInfo;
+
+/*
+ * VFIO_USER_DEVICE_SET_IRQS
+ * imported from struct vfio_irq_set
+ */
+typedef struct {
+VFIOUserHdr hdr;
+uint32_t argsz;
+uint32_t flags;
+uint32_t index;
+uint32_t start;
+uint32_t count;
+} VFIOUserIRQSet;
+
+/*
  * VFIO_USER_REGION_READ
  * VFIO_USER_REGION_WRITE
  */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 1453bb5..815385b 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1164,6 +1164,117 @@ static int vfio_user_get_region_info(VFIOProxy *proxy,
 return 0;
 }
 
+static int vfio_user_get_irq_info(VFIOProxy *proxy,
+  struct vfio_irq_info *info)
+{
+VFIOUserIRQInfo msg;
+
+memset(, 0, sizeof(msg));
+vfio_user_request_msg(, VFIO_USER_DEVICE_GET_IRQ_INFO,
+  sizeof(msg), 0);
+msg.argsz = info->argsz;
+msg.index = info->index;
+
+vfio_user_send_wait(proxy, , NULL, 0, false);
+if (msg.hdr.flags & VFIO_USER_ERROR) {
+return -msg.hdr.error_reply;
+}
+
+memcpy(info, , sizeof(*info));
+return 0;
+}
+
+static int irq_howmany(int *fdp, uint32_t cur, uint32_t max)
+{
+int n = 0;
+
+if (fdp[cur] != -1) {
+do {
+n++;
+} while (n < max && fdp[cur + n] != -1);
+} else {
+do {
+n++;
+} while (n < max && fdp[cur + n] == -1);
+}
+
+return n;
+}
+
+static int vfio_user_set_irqs(VFIOProxy *proxy, struct vfio_irq_set *irq)
+{
+g_autofree VFIOUserIRQSet *msgp = NULL;
+uint32_t size, nfds, send_fds, sent_fds, max;
+
+if (irq->argsz < sizeof(*irq)) {
+error_printf("vfio_user_set_irqs argsz too small\n");
+return -EINVAL;
+}
+
+/*
+ * Handle simple case
+ */
+if ((irq->flags & VFIO_IRQ_SET_DATA_EVENTFD) == 0) {
+size = sizeof(VFIOUserHdr) + irq->argsz;
+msgp = g_malloc0(size);
+
+vfio_user_request_msg(>hdr, VFIO_USER_DEVICE_SET_IRQS, size, 0);
+msgp->argsz = irq->argsz;
+msgp->flags = irq->flags;
+msgp->index = irq->index;
+msgp->start = irq->start;
+msgp->count = irq->count;
+
+vfio_user_send_wait(proxy, >hdr, NULL, 0, false);
+if (msgp->hdr.flags & VFIO_USER_ERROR) {
+return -msgp->hdr.error_reply;
+}
+
+return 0;
+}
+
+/*
+ * Calculate the number of FDs to send
+ * and adjust argsz
+ */
+nfds = (irq->argsz - sizeof(*irq)) / sizeof(int);
+irq->argsz = sizeof(*irq);
+msgp = g_malloc0(sizeof(*msgp));
+/*
+ * Send in chunks if over max_send_fds
+ */
+for (sent_fds = 0; nfds > sent_fds; sent_fds += send_fds) {
+VFIOUserFDs *arg_fds, loop_fds;
+
+/* must send all valid FDs or all invalid FDs in single msg */
+max = nfds - sent_fds;
+if (max > proxy->max_send_fds) {
+max = proxy->max_send_fds;
+}
+send_fds = irq_howmany((int *)irq->data, sent_fds, max);
+
+vfio_user_request_msg(>hdr, VFIO_USER_DEVICE_SET_IRQS,
+  sizeof(*msgp), 0);
+

[PATCH v1 13/24] vfio-user: pci_user_realize PCI setup

2022-11-08 Thread John Johnson
PCI BARs read from remote device
PCI config reads/writes sent to remote server

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c | 277 --
 1 file changed, 174 insertions(+), 103 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 027f9d5..7abe44e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2874,6 +2874,133 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
 vdev->req_enabled = false;
 }
 
+static void vfio_pci_config_setup(VFIOPCIDevice *vdev, Error **errp)
+{
+PCIDevice *pdev = >pdev;
+VFIODevice *vbasedev = >vbasedev;
+Error *err = NULL;
+
+/* vfio emulates a lot for us, but some bits need extra love */
+vdev->emulated_config_bits = g_malloc0(vdev->config_size);
+
+/* QEMU can choose to expose the ROM or not */
+memset(vdev->emulated_config_bits + PCI_ROM_ADDRESS, 0xff, 4);
+/* QEMU can also add or extend BARs */
+memset(vdev->emulated_config_bits + PCI_BASE_ADDRESS_0, 0xff, 6 * 4);
+
+/*
+ * The PCI spec reserves vendor ID 0x as an invalid value.  The
+ * device ID is managed by the vendor and need only be a 16-bit value.
+ * Allow any 16-bit value for subsystem so they can be hidden or changed.
+ */
+if (vdev->vendor_id != PCI_ANY_ID) {
+if (vdev->vendor_id >= 0x) {
+error_setg(errp, "invalid PCI vendor ID provided");
+return;
+}
+vfio_add_emulated_word(vdev, PCI_VENDOR_ID, vdev->vendor_id, ~0);
+trace_vfio_pci_emulated_vendor_id(vdev->vbasedev.name, 
vdev->vendor_id);
+} else {
+vdev->vendor_id = pci_get_word(pdev->config + PCI_VENDOR_ID);
+}
+
+if (vdev->device_id != PCI_ANY_ID) {
+if (vdev->device_id > 0x) {
+error_setg(errp, "invalid PCI device ID provided");
+return;
+}
+vfio_add_emulated_word(vdev, PCI_DEVICE_ID, vdev->device_id, ~0);
+trace_vfio_pci_emulated_device_id(vbasedev->name, vdev->device_id);
+} else {
+vdev->device_id = pci_get_word(pdev->config + PCI_DEVICE_ID);
+}
+
+if (vdev->sub_vendor_id != PCI_ANY_ID) {
+if (vdev->sub_vendor_id > 0x) {
+error_setg(errp, "invalid PCI subsystem vendor ID provided");
+return;
+}
+vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_VENDOR_ID,
+   vdev->sub_vendor_id, ~0);
+trace_vfio_pci_emulated_sub_vendor_id(vbasedev->name,
+  vdev->sub_vendor_id);
+}
+
+if (vdev->sub_device_id != PCI_ANY_ID) {
+if (vdev->sub_device_id > 0x) {
+error_setg(errp, "invalid PCI subsystem device ID provided");
+return;
+}
+vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_ID, vdev->sub_device_id, 
~0);
+trace_vfio_pci_emulated_sub_device_id(vbasedev->name,
+  vdev->sub_device_id);
+}
+
+/* QEMU can change multi-function devices to single function, or reverse */
+vdev->emulated_config_bits[PCI_HEADER_TYPE] =
+  PCI_HEADER_TYPE_MULTI_FUNCTION;
+
+/* Restore or clear multifunction, this is always controlled by QEMU */
+if (vdev->pdev.cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+vdev->pdev.config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
+} else {
+vdev->pdev.config[PCI_HEADER_TYPE] &= ~PCI_HEADER_TYPE_MULTI_FUNCTION;
+}
+
+/*
+ * Clear host resource mapping info.  If we choose not to register a
+ * BAR, such as might be the case with the option ROM, we can get
+ * confusing, unwritable, residual addresses from the host here.
+ */
+memset(>pdev.config[PCI_BASE_ADDRESS_0], 0, 24);
+memset(>pdev.config[PCI_ROM_ADDRESS], 0, 4);
+
+vfio_pci_size_rom(vdev);
+
+vfio_bars_prepare(vdev);
+
+vfio_msix_early_setup(vdev, );
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+vfio_bars_register(vdev);
+}
+
+static int vfio_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
+{
+PCIDevice *pdev = >pdev;
+int ret;
+
+/* QEMU emulates all of MSI & MSIX */
+if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
+memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
+   MSIX_CAP_LENGTH);
+}
+
+if (pdev->cap_present & QEMU_PCI_CAP_MSI) {
+memset(vdev->emulated_config_bits + pdev->msi_cap, 0xff,
+   vdev->msi_cap_size);
+}
+
+if (vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1)) {
+vdev->intx.mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+  vfio_intx_mmap_enable, vdev);
+pci_device_set_intx_routing_notifier(&vdev->pdev,
+ vfio_intx_routing_notifier);
+

[PATCH v1 24/24] vfio-user: add trace points

2022-11-08 Thread John Johnson
Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/trace-events | 15 +++
 hw/vfio/user.c   | 26 ++
 2 files changed, 41 insertions(+)

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73dffe9..c27cec7 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -166,3 +166,18 @@ vfio_load_state_device_data(const char *name, uint64_t 
data_offset, uint64_t dat
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t 
bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 
0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu 
dirty @ 0x%"PRIx64" - 0x%"PRIx64
+
+# user.c
+vfio_user_recv_hdr(const char *name, uint16_t id, uint16_t cmd, uint32_t size, 
uint32_t flags) " (%s) id 0x%x cmd 0x%x size 0x%x flags 0x%x"
+vfio_user_recv_read(uint16_t id, int read) " id 0x%x read 0x%x"
+vfio_user_recv_request(uint16_t cmd) " command 0x%x"
+vfio_user_send_write(uint16_t id, int wrote) " id 0x%x wrote 0x%x"
+vfio_user_version(uint16_t major, uint16_t minor, const char *caps) " major %d 
minor %d caps: %s"
+vfio_user_dma_map(uint64_t iova, uint64_t size, uint64_t off, uint32_t flags, 
bool will_commit) " iova 0x%"PRIx64" size 0x%"PRIx64" off 0x%"PRIx64" flags 
0x%x will_commit %d"
+vfio_user_dma_unmap(uint64_t iova, uint64_t size, uint32_t flags, bool dirty, 
bool will_commit) " iova 0x%"PRIx64" size 0x%"PRIx64" flags 0x%x dirty %d 
will_commit %d"
+vfio_user_get_info(uint32_t nregions, uint32_t nirqs) " #regions %d #irqs %d"
+vfio_user_get_region_info(uint32_t index, uint32_t flags, uint64_t size) " 
index %d flags 0x%x size 0x%"PRIx64
+vfio_user_get_irq_info(uint32_t index, uint32_t flags, uint32_t count) " index 
%d flags 0x%x count %d"
+vfio_user_set_irqs(uint32_t index, uint32_t start, uint32_t count, uint32_t 
flags) " index %d start %d count %d flags 0x%x"
+vfio_user_region_rw(uint32_t region, uint64_t off, uint32_t count) " region %d 
offset 0x%"PRIx64" count %d"
+vfio_user_wrmulti(const char *s, uint64_t wr_cnt) " %s count 0x%"PRIx64
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 4ed305b..74e1714 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -30,6 +30,8 @@
 #include "qapi/qmp/qnum.h"
 #include "qapi/qmp/qbool.h"
 #include "user.h"
+#include "trace.h"
+
 
 
 /*
@@ -108,6 +110,8 @@ static int vfio_user_send_qio(VFIOProxy *proxy, VFIOUserMsg 
*msg)
 vfio_user_shutdown(proxy);
 error_report_err(local_err);
 }
+trace_vfio_user_send_write(msg->hdr->id, ret);
+
 return ret;
 }
 
@@ -225,6 +229,7 @@ static int vfio_user_complete(VFIOProxy *proxy, Error 
**errp)
 }
 return ret;
 }
+trace_vfio_user_recv_read(msg->hdr->id, ret);
 
 msgleft -= ret;
 data += ret;
@@ -332,6 +337,8 @@ static int vfio_user_recv_one(VFIOProxy *proxy)
 error_setg(_err, "unknown message type");
 goto fatal;
 }
+trace_vfio_user_recv_hdr(proxy->sockname, hdr.id, hdr.command, hdr.size,
+ hdr.flags);
 
 /*
  * For replies, find the matching pending request.
@@ -408,6 +415,7 @@ static int vfio_user_recv_one(VFIOProxy *proxy)
 if (ret <= 0) {
 goto fatal;
 }
+trace_vfio_user_recv_read(hdr.id, ret);
 
 msgleft -= ret;
 data += ret;
@@ -546,6 +554,7 @@ static void vfio_user_request(void *opaque)
 QTAILQ_INIT();
 QTAILQ_FOREACH_SAFE(msg, , next, m1) {
 QTAILQ_REMOVE(, msg, next);
+trace_vfio_user_recv_request(msg->hdr->command);
 proxy->request(proxy->req_arg, msg);
 QTAILQ_INSERT_HEAD(, msg, next);
 }
@@ -1265,6 +1274,7 @@ int vfio_user_validate_version(VFIOProxy *proxy, Error 
**errp)
 msgp->minor = VFIO_USER_MINOR_VER;
+memcpy(&msgp->capabilities, caps->str, caplen);
 g_string_free(caps, true);
+trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities);
 
 vfio_user_send_wait(proxy, &msgp->hdr, NULL, 0, false);
 if (msgp->hdr.flags & VFIO_USER_ERROR) {
@@ -1288,6 +1298,7 @@ int vfio_user_validate_version(VFIOProxy *proxy, Error 
**errp)
 return -1;
 }
 
+trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities);
 return 0;
 }
 
@@ -1305,6 +1316,8 @@ static int vfio_user_dma_map(VFIOProxy *proxy,
 msgp->offset = map->vaddr;
 msgp->iova = map->iova;
 msgp->size = map->size;
+trace_vfio_user_dma_map(msgp->iova, msgp->size, msgp->offset, msgp->flags,
+will_commit);
 
 /*
  * The will_commit case sends without blocking or dropping BQL.
@@ -1371,6 +1384,8 @@ static int vfio_user_dma_unmap(VFIOProxy *proxy,
 msgp->msg.flags = unmap->flags;
 msgp->msg.iova = unmap->iova;
 msgp->msg.size = unmap->size;
+

[PATCH v1 03/24] vfio-user: add container IO ops vector

2022-11-08 Thread John Johnson
Used for communication with VFIO driver
(prep work for vfio-user, which will communicate over a socket)
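
The idea is that common.c no longer calls ioctl() directly: it dispatches DMA map/unmap
through a per-container ops vector, so the kernel ioctl() backend and the later socket
backend share the same call sites.  A minimal sketch of the pattern, with illustrative
names and signatures (not the patch's exact VFIOContIO definition):

/*
 * Minimal sketch of the ops-vector idea; names and signatures here are
 * illustrative, not the exact VFIOContIO layout from this patch.
 */
#include <stddef.h>

struct container;

typedef struct container_io_ops {
    int (*dma_map)(struct container *c, unsigned long iova,
                   unsigned long size, void *vaddr, int readonly);
    int (*dma_unmap)(struct container *c, unsigned long iova,
                     unsigned long size);
} container_io_ops;

struct container {
    int fd;                              /* used only by the ioctl() backend */
    const container_io_ops *io_ops;      /* chosen when the container is set up */
};

/* common code always dispatches through the vector */
static inline int container_dma_map(struct container *c, unsigned long iova,
                                    unsigned long size, void *vaddr, int ro)
{
    return c->io_ops->dma_map(c, iova, size, vaddr, ro);
}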

Signed-off-by: John G Johnson 
---
 hw/vfio/common.c  | 126 --
 include/hw/vfio/vfio-common.h |  33 +++
 2 files changed, 117 insertions(+), 42 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562..83d69b9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -432,12 +432,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
 goto unmap_exit;
 }
 
-ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
+ret = CONT_DMA_UNMAP(container, unmap, bitmap);
 if (!ret) {
 cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
 iotlb->translated_addr, pages);
 } else {
-error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
+error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %s", strerror(-ret));
 }
 
 g_free(bitmap->data);
@@ -465,30 +465,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
 }
 
-while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-/*
- * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
- * v4.15) where an overflow in its wrap-around check prevents us from
- * unmapping the last page of the address space.  Test for the error
- * condition and re-try the unmap excluding the last page.  The
- * expectation is that we've never mapped the last page anyway and this
- * unmap request comes via vIOMMU support which also makes it unlikely
- * that this page is used.  This bug was introduced well after type1 v2
- * support was introduced, so we shouldn't need to test for v1.  A fix
- * is queued for kernel v5.0 so this workaround can be removed once
- * affected kernels are sufficiently deprecated.
- */
-if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
-container->iommu_type == VFIO_TYPE1v2_IOMMU) {
-trace_vfio_dma_unmap_overflow_workaround();
-unmap.size -= 1ULL << ctz64(container->pgsizes);
-continue;
-}
-error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
-return -errno;
-}
-
-return 0;
+return CONT_DMA_UNMAP(container, &unmap, NULL);
 }
 
 static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
@@ -501,24 +478,18 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 .iova = iova,
 .size = size,
 };
+int ret;
 
 if (!readonly) {
 map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
 }
 
-/*
- * Try the mapping, if it fails with EBUSY, unmap the region and try
- * again.  This shouldn't be necessary, but we sometimes see it in
- * the VGA ROM space.
- */
-if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-(errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
- ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
-return 0;
-}
+ret = CONT_DMA_MAP(container, &map);
 
-error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
-return -errno;
+if (ret < 0) {
+error_report("VFIO_MAP_DMA failed: %s", strerror(-ret));
+}
+return ret;
 }
 
 static void vfio_host_win_add(VFIOContainer *container,
@@ -1263,10 +1234,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer 
*container, bool start)
 dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
 }
 
-ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+ret = CONT_DIRTY_BITMAP(container, &dirty, NULL);
 if (ret) {
 error_report("Failed to set dirty tracking flag 0x%x errno: %d",
- dirty.flags, errno);
+ dirty.flags, -ret);
 }
 }
 
@@ -1316,11 +1287,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer 
*container, uint64_t iova,
 goto err_out;
 }
 
-ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
+ret = CONT_DIRTY_BITMAP(container, dbitmap, range);
 if (ret) {
 error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
 " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
-(uint64_t)range->size, errno);
+(uint64_t)range->size, -ret);
 goto err_out;
 }
 
@@ -2090,6 +2061,7 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 container->error = NULL;
 container->dirty_pages_supported = false;
 container->dma_max_mappings = 0;
+container->io_ops = &vfio_cont_io_ioctl;
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 QLIST_INIT(&container->vrdl_list);
@@ -2626,3 +2598,73 @@ int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
 }
 return vfio_eeh_container_op(container, op);
 }
+
+/*
+ * Traditional ioctl() 

[PATCH v1 01/24] vfio-user: introduce vfio-user protocol specification

2022-11-08 Thread John Johnson
From: Thanos Makatos 

This patch introduces the vfio-user protocol specification (formerly
known as VFIO-over-socket), which is designed to allow devices to be
emulated outside QEMU, in a separate process. vfio-user reuses the
existing VFIO defines, structs and concepts.

It has been earlier discussed as an RFC in:
"RFC: use VFIO over a UNIX domain socket to implement device offloading"

Signed-off-by: John G Johnson 
Signed-off-by: Thanos Makatos 
Signed-off-by: John Levon 
---
 MAINTAINERS|6 +
 docs/devel/index-internals.rst |1 +
 docs/devel/vfio-user.rst   | 1522 
 3 files changed, 1529 insertions(+)
 create mode 100644 docs/devel/vfio-user.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 738c4eb..999340d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1984,6 +1984,12 @@ F: hw/vfio/ap.c
 F: docs/system/s390x/vfio-ap.rst
 L: qemu-s3...@nongnu.org
 
+vfio-user
+M: John G Johnson 
+M: Thanos Makatos 
+S: Supported
+F: docs/devel/vfio-user.rst
+
 vhost
 M: Michael S. Tsirkin 
 S: Supported
diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index e1a93df..0ecb5c6 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -17,5 +17,6 @@ Details about QEMU's various subsystems including how to add 
features to them.
s390-dasd-ipl
tracing
vfio-migration
+   vfio-user
writing-monitor-commands
virtio-backends
diff --git a/docs/devel/vfio-user.rst b/docs/devel/vfio-user.rst
new file mode 100644
index 000..0d96477
--- /dev/null
+++ b/docs/devel/vfio-user.rst
@@ -0,0 +1,1522 @@
+.. include:: 
+
+vfio-user Protocol Specification
+
+
+--
+Version_ 0.9.1
+--
+
+.. contents:: Table of Contents
+
+Introduction
+
+vfio-user is a protocol that allows a device to be emulated in a separate
+process outside of a Virtual Machine Monitor (VMM). vfio-user devices consist
+of a generic VFIO device type, living inside the VMM, which we call the client,
+and the core device implementation, living outside the VMM, which we call the
+server.
+
+The vfio-user specification is partly based on the
+`Linux VFIO ioctl interface 
`_.
+
+VFIO is a mature and stable API, backed by an extensively used framework. The
+existing VFIO client implementation in QEMU (``qemu/hw/vfio/``) can be largely
+re-used, though there is nothing in this specification that requires that
+particular implementation. None of the VFIO kernel modules are required for
+supporting the protocol, on either the client or server side. Some source
+definitions in VFIO are re-used for vfio-user.
+
+The main idea is to allow a virtual device to function in a separate process in
+the same host over a UNIX domain socket. A UNIX domain socket (``AF_UNIX``) is
+chosen because file descriptors can be trivially sent over it, which in turn
+allows:
+
+* Sharing of client memory for DMA with the server.
+* Sharing of server memory with the client for fast MMIO.
+* Efficient sharing of eventfd's for triggering interrupts.
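+
As a minimal sketch (not part of this specification), the following shows how a
file descriptor can accompany a message on an AF_UNIX socket using SCM_RIGHTS
ancillary data; this is the mechanism the points above rely on:

/* Minimal sketch, not part of the specification: sending one fd alongside a
 * message over an AF_UNIX socket using SCM_RIGHTS ancillary data. */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_msg_with_fd(int sock, const void *buf, size_t len, int fd)
{
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr mh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };
    struct cmsghdr *cm;

    memset(ctrl, 0, sizeof(ctrl));
    cm = CMSG_FIRSTHDR(&mh);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;              /* fd passing */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int)); /* the fd travels as ancillary data */

    return sendmsg(sock, &mh, 0) < 0 ? -1 : 0;
}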
+
+Other socket types could be used which allow the server to run in a separate
+guest in the same host (``AF_VSOCK``) or remotely (``AF_INET``). Theoretically
+the underlying transport does not necessarily have to be a socket, however we 
do
+not examine such alternatives. In this protocol version we focus on using a 
UNIX
+domain socket and introduce basic support for the other two types of sockets
+without considering performance implications.
+
+While passing of file descriptors is desirable for performance reasons, support
+is not necessary for either the client or the server in order to implement the
+protocol. There is always an in-band, message-passing fall back mechanism.
+
+Overview
+
+
+VFIO is a framework that allows a physical device to be securely passed through
+to a user space process; the device-specific kernel driver does not drive the
+device at all.  Typically, the user space process is a VMM and the device is
+passed through to it in order to achieve high performance. VFIO provides an API
+and the required functionality in the kernel. QEMU has adopted VFIO to allow a
+guest to directly access physical devices, instead of emulating them in
+software.
+
+vfio-user reuses the core VFIO concepts defined in its API, but implements them
+as messages to be sent over a socket. It does not change the kernel-based VFIO
+in any way, in fact none of the VFIO kernel modules need to be loaded to use
+vfio-user. It is also possible for the client to concurrently use the current
+kernel-based VFIO for one device, and vfio-user for another device.
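+
For illustration only (the exact layout is defined later in this document and in
this series), an ioctl such as VFIO_DEVICE_GET_INFO becomes a fixed message header
followed by a command-specific payload:

/* Illustrative framing of "ioctl as message".  Header fields follow the
 * header defined later in this series; the GET_INFO payload mirrors
 * struct vfio_device_info.  Names here are illustrative only. */
#include <stdint.h>

typedef struct {
    uint16_t id;          /* pairs a reply with its request */
    uint16_t command;     /* e.g. VFIO_USER_DEVICE_GET_INFO */
    uint32_t size;        /* header plus payload, in bytes */
    uint32_t flags;       /* request/reply, no-reply, error */
    uint32_t error_reply; /* errno value when the error flag is set */
} msg_hdr;

typedef struct {
    msg_hdr  hdr;
    uint32_t argsz;       /* payload mirrors struct vfio_device_info */
    uint32_t dev_flags;
    uint32_t num_regions;
    uint32_t num_irqs;
} get_info_msg;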
+
+VFIO Device Model
+-
+
+A device under VFIO presents a standard interface to the user process. Many of
+the VFIO operations in the existing interface use the ``ioctl()`` system call, 
and

[PATCH v1 18/24] vfio-user: add dma_unmap_all

2022-11-08 Thread John Johnson
Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/common.c  | 45 ++-
 hw/vfio/user.c| 24 +++
 include/hw/vfio/vfio-common.h |  3 +++
 3 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fe6eddd..cd64ec2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -507,6 +507,14 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return CONT_DMA_UNMAP(container, &unmap, NULL);
 }
 
+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_dma_unmap_all(VFIOContainer *container)
+{
+return CONT_DMA_UNMAP_ALL(container, VFIO_DMA_UNMAP_FLAG_ALL);
+}
+
 static int vfio_dma_map(VFIOContainer *container, MemoryRegion *mr, hwaddr 
iova,
 ram_addr_t size, void *vaddr, bool readonly)
 {
@@ -1256,17 +1264,10 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 
 if (try_unmap) {
 if (int128_eq(llsize, int128_2_64())) {
-/* The unmap ioctl doesn't accept a full 64-bit span. */
-llsize = int128_rshift(llsize, 1);
+ret = vfio_dma_unmap_all(container);
+} else {
 ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
-if (ret) {
-error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx") = %d (%m)",
- container, iova, int128_get64(llsize), ret);
-}
-iova += int128_get64(llsize);
 }
-ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
@@ -3057,6 +3058,31 @@ static int vfio_io_dma_unmap(VFIOContainer *container,
 return 0;
 }
 
+static int vfio_io_dma_unmap_all(VFIOContainer *container, uint32_t flags)
+{
+struct vfio_iommu_type1_dma_unmap unmap = {
+.argsz = sizeof(unmap),
+.flags = 0,
+.size = 0x8000000000000000,
+};
+int ret;
+
+/* The unmap ioctl doesn't accept a full 64-bit span. */
+unmap.iova = 0;
+ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
+if (ret) {
+return -errno;
+}
+
+unmap.iova += unmap.size;
+ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
+if (ret) {
+return -errno;
+}
+
+return 0;
+}
+
 static int vfio_io_dirty_bitmap(VFIOContainer *container,
 struct vfio_iommu_type1_dirty_bitmap *bitmap,
 struct vfio_iommu_type1_dirty_bitmap_get 
*range)
@@ -3076,6 +3102,7 @@ static void vfio_io_wait_commit(VFIOContainer *container)
 VFIOContIO vfio_cont_io_ioctl = {
 .dma_map = vfio_io_dma_map,
 .dma_unmap = vfio_io_dma_unmap,
+.dma_unmap_all = vfio_io_dma_unmap_all,
 .dirty_bitmap = vfio_io_dirty_bitmap,
 .wait_commit = vfio_io_wait_commit,
 };
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 1fd37cc..d62fe05 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1646,6 +1646,28 @@ static int vfio_user_io_dma_unmap(VFIOContainer 
*container,
container->async_ops);
 }
 
+static int vfio_user_io_dma_unmap_all(VFIOContainer *container, uint32_t flags)
+{
+struct vfio_iommu_type1_dma_unmap unmap = {
+.argsz = sizeof(unmap),
+.flags = flags | VFIO_DMA_UNMAP_FLAG_ALL,
+.iova = 0,
+.size = 0,
+};
+
+return vfio_user_dma_unmap(container->proxy, &unmap, NULL,
+   container->async_ops);
+}
+
+static int vfio_user_io_dirty_bitmap(VFIOContainer *container,
+struct vfio_iommu_type1_dirty_bitmap *bitmap,
+struct vfio_iommu_type1_dirty_bitmap_get *range)
+{
+
+/* vfio-user doesn't support migration */
+return -EINVAL;
+}
+
 static void vfio_user_io_wait_commit(VFIOContainer *container)
 {
 vfio_user_wait_reqs(container->proxy);
@@ -1654,5 +1676,7 @@ static void vfio_user_io_wait_commit(VFIOContainer 
*container)
 VFIOContIO vfio_cont_io_sock = {
 .dma_map = vfio_user_io_dma_map,
 .dma_unmap = vfio_user_io_dma_unmap,
+.dma_unmap_all = vfio_user_io_dma_unmap_all,
+.dirty_bitmap = vfio_user_io_dirty_bitmap,
 .wait_commit = vfio_user_io_wait_commit,
 };
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 413dafc..d7e3f51 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -206,6 +206,7 @@ struct VFIOContIO {
 int (*dma_unmap)(VFIOContainer *container,
  struct vfio_iommu_type1_dma_unmap *unmap,
  struct vfio_bitmap *bitmap);
+int (*dma_unmap_all)(VFIOContainer *container, uint32_t flags);
 int 

[PATCH v1 07/24] vfio-user: connect vfio proxy to remote server

2022-11-08 Thread John Johnson
add user.c & user.h files for vfio-user code
add proxy struct to handle comms with remote server
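
Roughly, the proxy is the per-device endpoint of the socket connection.  The sketch
below collects the fields that later patches in this series use; the types, layout and
assumed QEMU headers (qemu/thread.h, qemu/queue.h, io/channel.h, block/aio.h) are
illustration only, not the real VFIOProxy definition in user.h:

/* Illustrative sketch of the per-device proxy; field names are taken from
 * their uses in later patches, everything else is an assumption. */
typedef struct VFIOUserMsg VFIOUserMsg;

typedef struct VFIOProxySketch {
    char *sockname;                       /* UNIX socket path of the server */
    QIOChannel *ioc;                      /* channel wrapping the connected socket */
    AioContext *ctx;                      /* iothread context driving socket I/O */
    QemuMutex lock;                       /* protects the queues below */
    QemuCond close_cv;                    /* signalled when the proxy is closed */
    int state;                            /* connected / error / closed */
    int flags;                            /* negotiated options (queued sends, ...) */
    QTAILQ_HEAD(, VFIOUserMsg) free;      /* recycled message records */
    QTAILQ_HEAD(, VFIOUserMsg) outgoing;  /* sends waiting for the socket */
    QTAILQ_HEAD(, VFIOUserMsg) pending;   /* requests waiting for a reply */
    void (*request)(void *opaque, VFIOUserMsg *msg); /* incoming request handler */
    void *req_arg;
} VFIOProxySketch;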

Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 MAINTAINERS   |   4 +
 hw/vfio/meson.build   |   1 +
 hw/vfio/pci.c |  19 +
 hw/vfio/user.c| 170 ++
 hw/vfio/user.h|  78 +++
 include/hw/vfio/vfio-common.h |   2 +
 6 files changed, 274 insertions(+)
 create mode 100644 hw/vfio/user.c
 create mode 100644 hw/vfio/user.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 999340d..5d64d02 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1987,8 +1987,12 @@ L: qemu-s3...@nongnu.org
 vfio-user
 M: John G Johnson 
 M: Thanos Makatos 
+M: Elena Ufimtseva 
+M: Jagannathan Raman 
 S: Supported
 F: docs/devel/vfio-user.rst
+F: hw/vfio/user.c
+F: hw/vfio/user.h
 
 vhost
 M: Michael S. Tsirkin 
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index da9af29..2f86f72 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -9,6 +9,7 @@ vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'pci-quirks.c',
   'pci.c',
 ))
+vfio_ss.add(when: 'CONFIG_VFIO_USER', if_true: files('user.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_CCW', if_true: files('ccw.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_PLATFORM', if_true: files('platform.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_XGMAC', if_true: files('calxeda-xgmac.c'))
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dc19869..e5f2413 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -43,6 +43,7 @@
 #include "qapi/error.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
+#include "hw/vfio/user.h"
 
 /* convenience macros for PCI config space */
 #define VDEV_CONFIG_READ(vbasedev, off, size, data) \
@@ -3452,6 +3453,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 VFIOUserPCIDevice *udev = VFIO_USER_PCI(pdev);
 VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 VFIODevice *vbasedev = >vbasedev;
+SocketAddress addr;
+VFIOProxy *proxy;
+Error *err = NULL;
 
 /*
  * TODO: make option parser understand SocketAddress
@@ -3464,6 +3468,16 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 return;
 }
 
+memset(&addr, 0, sizeof(addr));
+addr.type = SOCKET_ADDRESS_TYPE_UNIX;
+addr.u.q_unix.path = udev->sock_name;
+proxy = vfio_user_connect_dev(&addr, &err);
+if (!proxy) {
+error_setg(errp, "Remote proxy not found");
+return;
+}
+vbasedev->proxy = proxy;
+
 vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
 vbasedev->ops = _user_pci_ops;
 vbasedev->type = VFIO_DEVICE_TYPE_PCI;
@@ -3474,8 +3488,13 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 static void vfio_user_instance_finalize(Object *obj)
 {
 VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
+VFIODevice *vbasedev = &vdev->vbasedev;
 
 vfio_put_device(vdev);
+
+if (vbasedev->proxy != NULL) {
+vfio_user_disconnect(vbasedev->proxy);
+}
 }
 
 static Property vfio_user_pci_dev_properties[] = {
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
new file mode 100644
index 000..4f09060
--- /dev/null
+++ b/hw/vfio/user.c
@@ -0,0 +1,170 @@
+/*
+ * vfio protocol over a UNIX socket.
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "hw/hw.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio.h"
+#include "qemu/sockets.h"
+#include "io/channel.h"
+#include "io/channel-socket.h"
+#include "io/channel-util.h"
+#include "sysemu/iothread.h"
+#include "user.h"
+
+static IOThread *vfio_user_iothread;
+
+static void vfio_user_shutdown(VFIOProxy *proxy);
+
+
+/*
+ * Functions called by main, CPU, or iothread threads
+ */
+
+static void vfio_user_shutdown(VFIOProxy *proxy)
+{
+qio_channel_shutdown(proxy->ioc, QIO_CHANNEL_SHUTDOWN_READ, NULL);
+qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx, NULL, NULL, NULL);
+}
+
+/*
+ * Functions only called by iothread
+ */
+
+static void vfio_user_cb(void *opaque)
+{
+VFIOProxy *proxy = opaque;
+
+QEMU_LOCK_GUARD(&proxy->lock);
+
+proxy->state = VFIO_PROXY_CLOSED;
+qemu_cond_signal(&proxy->close_cv);
+}
+
+
+/*
+ * Functions called by main or CPU threads
+ */
+
+static QLIST_HEAD(, VFIOProxy) vfio_user_sockets =
+QLIST_HEAD_INITIALIZER(vfio_user_sockets);
+
+VFIOProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp)
+{
+VFIOProxy *proxy;
+QIOChannelSocket *sioc;
+QIOChannel *ioc;
+char *sockname;
+
+if (addr->type != SOCKET_ADDRESS_TYPE_UNIX) {
+error_setg(errp, "vfio_user_connect - bad address family");
+

[PATCH v1 16/24] vfio-user: proxy container connect/disconnect

2022-11-08 Thread John Johnson
Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/common.c  | 207 +-
 hw/vfio/pci.c |  18 +++-
 hw/vfio/user.c|   3 +
 hw/vfio/user.h|   1 +
 include/hw/vfio/vfio-common.h |   6 ++
 5 files changed, 231 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b540195..e73a772 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include CONFIG_DEVICES
 #include 
 #ifdef CONFIG_KVM
 #include 
@@ -2267,6 +2268,208 @@ put_space_exit:
 return ret;
 }
 
+
+#ifdef CONFIG_VFIO_USER
+
+static int vfio_connect_proxy(VFIOProxy *proxy, VFIOGroup *group,
+  AddressSpace *as, Error **errp)
+{
+VFIOAddressSpace *space;
+VFIOContainer *container;
+int ret;
+
+/*
+ * try to mirror vfio_connect_container()
+ * as much as possible
+ */
+
+space = vfio_get_address_space(as);
+
+container = g_malloc0(sizeof(*container));
+container->space = space;
+container->fd = -1;
+container->error = NULL;
+container->io_ops = &vfio_cont_io_sock;
+QLIST_INIT(&container->giommu_list);
+QLIST_INIT(&container->hostwin_list);
+QLIST_INIT(&container->vrdl_list);
+container->proxy = proxy;
+
+/*
+ * The proxy uses a SW IOMMU in lieu of the HW one
+ * used in the ioctl() version.  Masquerade as TYPE1
+ * for maximum compatibility
+ */
+container->iommu_type = VFIO_TYPE1_IOMMU;
+
+/*
+ * VFIO user allows the device server to map guest
+ * memory so it has the same issue with discards as
+ * a local IOMMU has.
+ */
+ret = vfio_ram_block_discard_disable(container, true);
+if (ret) {
+error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
+goto free_container_exit;
+}
+
+vfio_host_win_add(container, 0, (hwaddr)-1, proxy->dma_pgsizes);
+container->pgsizes = proxy->dma_pgsizes;
+container->dma_max_mappings = proxy->max_dma;
+
+/* setup bitmask now, but migration support won't be ready until v2 */
+container->dirty_pages_supported = true;
+container->max_dirty_bitmap_size = proxy->max_bitmap;
+container->dirty_pgsizes = proxy->migr_pgsize;
+
+QLIST_INIT(>group_list);
+QLIST_INSERT_HEAD(&space->containers, container, next);
+
+group->container = container;
+QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
+container->listener = vfio_memory_listener;
+memory_listener_register(&container->listener, container->space->as);
+
+if (container->error) {
+ret = -1;
+error_propagate_prepend(errp, container->error,
+"memory listener initialization failed: ");
+goto listener_release_exit;
+}
+
+container->initialized = true;
+
+return 0;
+
+listener_release_exit:
+QLIST_REMOVE(group, container_next);
+QLIST_REMOVE(container, next);
+vfio_listener_release(container);
+vfio_ram_block_discard_disable(container, false);
+
+free_container_exit:
+g_free(container);
+
+vfio_put_address_space(space);
+
+return ret;
+}
+
+static void vfio_disconnect_proxy(VFIOGroup *group)
+{
+VFIOContainer *container = group->container;
+VFIOAddressSpace *space = container->space;
+VFIOGuestIOMMU *giommu, *tmp;
+VFIOHostDMAWindow *hostwin, *next;
+
+/*
+ * try to mirror vfio_disconnect_container()
+ * as much as possible, knowing each device
+ * is in one group and one container
+ */
+
+QLIST_REMOVE(group, container_next);
+group->container = NULL;
+
+/*
+ * Explicitly release the listener first before unset container,
+ * since unset may destroy the backend container if it's the last
+ * group.
+ */
+memory_listener_unregister(&container->listener);
+
+QLIST_REMOVE(container, next);
+
+QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
+memory_region_unregister_iommu_notifier(
+MEMORY_REGION(giommu->iommu_mr), &giommu->n);
+QLIST_REMOVE(giommu, giommu_next);
+g_free(giommu);
+}
+
+QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
+   next) {
+QLIST_REMOVE(hostwin, hostwin_next);
+g_free(hostwin);
+}
+
+g_free(container);
+vfio_put_address_space(space);
+}
+
+int vfio_user_get_device(VFIOGroup *group, VFIODevice *vbasedev, Error **errp)
+{
+struct vfio_device_info info = { .argsz = sizeof(info) };
+int ret;
+
+ret = VDEV_GET_INFO(vbasedev, &info);
+if (ret) {
+error_setg_errno(errp, -ret, "get info failure");
+return ret;
+}
+
+vbasedev->fd = -1;
+vbasedev->group = group;
+QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+vbasedev->num_irqs = info.num_irqs;
+vbasedev->num_regions = info.num_regions;
+vbasedev->flags = info.flags;
+
+vfio_get_all_regions(vbasedev);
+   

[PATCH v1 08/24] vfio-user: define socket receive functions

2022-11-08 Thread John Johnson
Add infrastructure needed to receive incoming messages

Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 MAINTAINERS |   1 +
 hw/vfio/pci.c   |  11 ++
 hw/vfio/user-protocol.h |  54 +++
 hw/vfio/user.c  | 399 
 hw/vfio/user.h  |   8 +
 5 files changed, 473 insertions(+)
 create mode 100644 hw/vfio/user-protocol.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d64d02..b6c186b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1993,6 +1993,7 @@ S: Supported
 F: docs/devel/vfio-user.rst
 F: hw/vfio/user.c
 F: hw/vfio/user.h
+F: hw/vfio/user-protocol.h
 
 vhost
 M: Michael S. Tsirkin 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e5f2413..f086235 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3432,6 +3432,16 @@ type_init(register_vfio_pci_dev_type)
  */
 
 /*
+ * Incoming request message callback.
+ *
+ * Runs off main loop, so BQL held.
+ */
+static void vfio_user_pci_process_req(void *opaque, VFIOUserMsg *msg)
+{
+
+}
+
+/*
  * Emulated devices don't use host hot reset
  */
 static void vfio_user_compute_needs_reset(VFIODevice *vbasedev)
@@ -3477,6 +3487,7 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 return;
 }
 vbasedev->proxy = proxy;
+vfio_user_set_handler(vbasedev, vfio_user_pci_process_req, vdev);
 
 vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
 vbasedev->ops = _user_pci_ops;
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
new file mode 100644
index 000..d23877c
--- /dev/null
+++ b/hw/vfio/user-protocol.h
@@ -0,0 +1,54 @@
+#ifndef VFIO_USER_PROTOCOL_H
+#define VFIO_USER_PROTOCOL_H
+
+/*
+ * vfio protocol over a UNIX socket.
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Each message has a standard header that describes the command
+ * being sent, which is almost always a VFIO ioctl().
+ *
+ * The header may be followed by command-specific data, such as the
+ * region and offset info for read and write commands.
+ */
+
+typedef struct {
+uint16_t id;
+uint16_t command;
+uint32_t size;
+uint32_t flags;
+uint32_t error_reply;
+} VFIOUserHdr;
+
+/* VFIOUserHdr commands */
+enum vfio_user_command {
+VFIO_USER_VERSION   = 1,
+VFIO_USER_DMA_MAP   = 2,
+VFIO_USER_DMA_UNMAP = 3,
+VFIO_USER_DEVICE_GET_INFO   = 4,
+VFIO_USER_DEVICE_GET_REGION_INFO= 5,
+VFIO_USER_DEVICE_GET_REGION_IO_FDS  = 6,
+VFIO_USER_DEVICE_GET_IRQ_INFO   = 7,
+VFIO_USER_DEVICE_SET_IRQS   = 8,
+VFIO_USER_REGION_READ   = 9,
+VFIO_USER_REGION_WRITE  = 10,
+VFIO_USER_DMA_READ  = 11,
+VFIO_USER_DMA_WRITE = 12,
+VFIO_USER_DEVICE_RESET  = 13,
+VFIO_USER_DIRTY_PAGES   = 14,
+VFIO_USER_MAX,
+};
+
+/* VFIOUserHdr flags */
+#define VFIO_USER_REQUEST   0x0
+#define VFIO_USER_REPLY 0x1
+#define VFIO_USER_TYPE  0xF
+
+#define VFIO_USER_NO_REPLY  0x10
+#define VFIO_USER_ERROR 0x20
+
+#endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 4f09060..ffd69b9 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -28,7 +28,22 @@
 static IOThread *vfio_user_iothread;
 
 static void vfio_user_shutdown(VFIOProxy *proxy);
+static VFIOUserMsg *vfio_user_getmsg(VFIOProxy *proxy, VFIOUserHdr *hdr,
+ VFIOUserFDs *fds);
+static VFIOUserFDs *vfio_user_getfds(int numfds);
+static void vfio_user_recycle(VFIOProxy *proxy, VFIOUserMsg *msg);
 
+static void vfio_user_recv(void *opaque);
+static int vfio_user_recv_one(VFIOProxy *proxy);
+static void vfio_user_cb(void *opaque);
+
+static void vfio_user_request(void *opaque);
+
+static inline void vfio_user_set_error(VFIOUserHdr *hdr, uint32_t err)
+{
+hdr->flags |= VFIO_USER_ERROR;
+hdr->error_reply = err;
+}
 
 /*
  * Functions called by main, CPU, or iothread threads
@@ -40,10 +55,334 @@ static void vfio_user_shutdown(VFIOProxy *proxy)
 qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx, NULL, NULL, NULL);
 }
 
+static VFIOUserMsg *vfio_user_getmsg(VFIOProxy *proxy, VFIOUserHdr *hdr,
+ VFIOUserFDs *fds)
+{
+VFIOUserMsg *msg;
+
+msg = QTAILQ_FIRST(&proxy->free);
+if (msg != NULL) {
+QTAILQ_REMOVE(&proxy->free, msg, next);
+} else {
+msg = g_malloc0(sizeof(*msg));
+qemu_cond_init(&msg->cv);
+}
+
+msg->hdr = hdr;
+msg->fds = fds;
+return msg;
+}
+
+/*
+ * Recycle a message list entry to the free list.
+ */
+static void vfio_user_recycle(VFIOProxy *proxy, VFIOUserMsg *msg)
+{
+if (msg->type == VFIO_MSG_NONE) {
+

[PATCH v1 23/24] vfio-user: add coalesced posted writes

2022-11-08 Thread John Johnson
Add new message to send multiple writes to server
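
The point of the new message is to coalesce small posted region writes into one
VFIO_USER_REGION_WRITE_MULTI instead of one round trip per write.  A rough sketch of
the accumulate-and-flush logic, loosely following the VFIOUserWRMulti layout in this
patch (names and flow are illustrative, not the patch's exact code):

/* Illustrative write-coalescing sketch: small posted writes are appended to
 * a pending buffer and flushed when it fills or a non-posted message must go
 * out first. */
#include <stdint.h>
#include <string.h>

#define MULTI_MAX  200     /* mirrors VFIO_USER_MULTI_MAX */
#define DATA_MAX   8       /* mirrors VFIO_USER_MULTI_DATA */

struct wr_one { uint64_t offset; uint32_t region; uint32_t count; uint8_t data[DATA_MAX]; };
struct wr_multi { uint64_t wr_cnt; struct wr_one wrs[MULTI_MAX]; };

static struct wr_multi pending;

static void flush_multi(void)            /* stand-in for vfio_user_flush_multi() */
{
    /* ...send the accumulated WRITE_MULTI message to the server... */
    pending.wr_cnt = 0;
}

static void post_write(uint32_t region, uint64_t off, const void *buf, uint32_t count)
{
    struct wr_one *w;

    if (pending.wr_cnt == MULTI_MAX) {
        flush_multi();                   /* buffer full: send what we have */
    }
    w = &pending.wrs[pending.wr_cnt++];
    w->region = region;
    w->offset = off;
    w->count = count;
    memcpy(w->data, buf, count);         /* count is at most DATA_MAX for coalesced writes */
}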

Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/user-protocol.h |  21 
 hw/vfio/user.c  | 128 +++-
 hw/vfio/user.h  |   7 +++
 3 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 6afd090..ffa222b 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -40,6 +40,7 @@ enum vfio_user_command {
 VFIO_USER_DMA_WRITE = 12,
 VFIO_USER_DEVICE_RESET  = 13,
 VFIO_USER_DIRTY_PAGES   = 14,
+VFIO_USER_REGION_WRITE_MULTI= 15,
 VFIO_USER_MAX,
 };
 
@@ -73,6 +74,7 @@ typedef struct {
 #define VFIO_USER_CAP_PGSIZES   "pgsizes"
 #define VFIO_USER_CAP_MAP_MAX   "max_dma_maps"
 #define VFIO_USER_CAP_MIGR  "migration"
+#define VFIO_USER_CAP_MULTI "write_multiple"
 
 /* "migration" members */
 #define VFIO_USER_CAP_PGSIZE"pgsize"
@@ -220,4 +222,23 @@ typedef struct {
 char data[];
 } VFIOUserBitmap;
 
+/*
+ * VFIO_USER_REGION_WRITE_MULTI
+ */
+#define VFIO_USER_MULTI_DATA  8
+#define VFIO_USER_MULTI_MAX   200
+
+typedef struct {
+uint64_t offset;
+uint32_t region;
+uint32_t count;
+char data[VFIO_USER_MULTI_DATA];
+} VFIOUserWROne;
+
+typedef struct {
+VFIOUserHdr hdr;
+uint64_t wr_cnt;
+VFIOUserWROne wrs[VFIO_USER_MULTI_MAX];
+} VFIOUserWRMulti;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index a9e6cf5..4ed305b 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -66,6 +66,7 @@ static void vfio_user_send_wait(VFIOProxy *proxy, VFIOUserHdr 
*hdr,
 static void vfio_user_wait_reqs(VFIOProxy *proxy);
 static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
   uint32_t size, uint32_t flags);
+static void vfio_user_flush_multi(VFIOProxy *proxy);
 
 static inline void vfio_user_set_error(VFIOUserHdr *hdr, uint32_t err)
 {
@@ -461,6 +462,11 @@ static void vfio_user_send(void *opaque)
 }
 qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
vfio_user_recv, NULL, proxy);
+
+/* queue empty - send any pending multi write msgs */
+if (proxy->wr_multi != NULL) {
+vfio_user_flush_multi(proxy);
+}
 }
 }
 
@@ -481,6 +487,7 @@ static int vfio_user_send_one(VFIOProxy *proxy)
 }
 
 QTAILQ_REMOVE(>outgoing, msg, next);
+proxy->num_outgoing--;
 if (msg->type == VFIO_MSG_ASYNC) {
 vfio_user_recycle(proxy, msg);
 } else {
@@ -587,11 +594,18 @@ static int vfio_user_send_queued(VFIOProxy *proxy, 
VFIOUserMsg *msg)
 {
 int ret;
 
+/* older coalesced writes go first */
+if (proxy->wr_multi != NULL &&
+((msg->hdr->flags & VFIO_USER_TYPE) == VFIO_USER_REQUEST)) {
+vfio_user_flush_multi(proxy);
+}
+
 /*
  * Unsent outgoing msgs - add to tail
  */
 if (!QTAILQ_EMPTY(&proxy->outgoing)) {
 QTAILQ_INSERT_TAIL(&proxy->outgoing, msg, next);
+proxy->num_outgoing++;
 return 0;
 }
 
@@ -605,6 +619,7 @@ static int vfio_user_send_queued(VFIOProxy *proxy, 
VFIOUserMsg *msg)
 }
 if (ret == QIO_CHANNEL_ERR_BLOCK) {
 QTAILQ_INSERT_HEAD(&proxy->outgoing, msg, next);
+proxy->num_outgoing = 1;
 qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
vfio_user_recv, vfio_user_send,
proxy);
@@ -1145,12 +1160,27 @@ static int check_migr(VFIOProxy *proxy, QObject *qobj, 
Error **errp)
 return caps_parse(proxy, qdict, caps_migr, errp);
 }
 
+static int check_multi(VFIOProxy *proxy, QObject *qobj, Error **errp)
+{
+QBool *qb = qobject_to(QBool, qobj);
+
+if (qb == NULL) {
+error_setg(errp, "malformed %s", VFIO_USER_CAP_MULTI);
+return -1;
+}
+if (qbool_get_bool(qb)) {
+proxy->flags |= VFIO_PROXY_USE_MULTI;
+}
+return 0;
+}
+
 static struct cap_entry caps_cap[] = {
 { VFIO_USER_CAP_MAX_FDS, check_max_fds },
 { VFIO_USER_CAP_MAX_XFER, check_max_xfer },
 { VFIO_USER_CAP_PGSIZES, check_pgsizes },
 { VFIO_USER_CAP_MAP_MAX, check_max_dma },
 { VFIO_USER_CAP_MIGR, check_migr },
+{ VFIO_USER_CAP_MULTI, check_multi },
 { NULL }
 };
 
@@ -1209,6 +1239,7 @@ static GString *caps_json(void)
 qdict_put_int(capdict, VFIO_USER_CAP_MAX_XFER, VFIO_USER_DEF_MAX_XFER);
 qdict_put_int(capdict, VFIO_USER_CAP_PGSIZES, VFIO_USER_DEF_PGSIZE);
 qdict_put_int(capdict, VFIO_USER_CAP_MAP_MAX, VFIO_USER_DEF_MAP_MAX);
+qdict_put_bool(capdict, VFIO_USER_CAP_MULTI, true);
 
 qdict_put_obj(dict, VFIO_USER_CAP, QOBJECT(capdict));
 
@@ -1547,18 +1578,111 @@ static int vfio_user_region_read(VFIOProxy *proxy, 
uint8_t index, off_t offset,
 return msgp->count;
 }
 
+static 

[PATCH v1 04/24] vfio-user: add region cache

2022-11-08 Thread John Johnson
cache VFIO_DEVICE_GET_REGION_INFO results to reduce
memory alloc/free cycles and as prep work for vfio-user

Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/ccw.c |  5 -
 hw/vfio/common.c  | 41 +++--
 hw/vfio/igd.c | 23 +--
 hw/vfio/migration.c   |  2 --
 hw/vfio/pci-quirks.c  | 19 +--
 hw/vfio/pci.c |  8 
 include/hw/vfio/vfio-common.h |  2 ++
 7 files changed, 51 insertions(+), 49 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 0354737..06b588c 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -517,7 +517,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error 
**errp)
 
 vcdev->io_region_offset = info->offset;
 vcdev->io_region = g_malloc0(info->size);
-g_free(info);
 
 /* check for the optional async command region */
 ret = vfio_get_dev_region_info(vdev, VFIO_REGION_TYPE_CCW,
@@ -530,7 +529,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error 
**errp)
 }
 vcdev->async_cmd_region_offset = info->offset;
 vcdev->async_cmd_region = g_malloc0(info->size);
-g_free(info);
 }
 
 ret = vfio_get_dev_region_info(vdev, VFIO_REGION_TYPE_CCW,
@@ -543,7 +541,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error 
**errp)
 }
 vcdev->schib_region_offset = info->offset;
 vcdev->schib_region = g_malloc(info->size);
-g_free(info);
 }
 
 ret = vfio_get_dev_region_info(vdev, VFIO_REGION_TYPE_CCW,
@@ -557,7 +554,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error 
**errp)
 }
 vcdev->crw_region_offset = info->offset;
 vcdev->crw_region = g_malloc(info->size);
-g_free(info);
 }
 
 return;
@@ -567,7 +563,6 @@ out_err:
 g_free(vcdev->schib_region);
 g_free(vcdev->async_cmd_region);
 g_free(vcdev->io_region);
-g_free(info);
 return;
 }
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 83d69b9..dd9104f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1600,8 +1600,6 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, 
VFIORegion *region,
 }
 }
 
-g_free(info);
-
 trace_vfio_region_setup(vbasedev->name, index, name,
 region->flags, region->fd_offset, region->size);
 return 0;
@@ -2357,6 +2355,16 @@ void vfio_put_group(VFIOGroup *group)
 }
 }
 
+void vfio_get_all_regions(VFIODevice *vbasedev)
+{
+struct vfio_region_info *info;
+int i;
+
+for (i = 0; i < vbasedev->num_regions; i++) {
+vfio_get_region_info(vbasedev, i, &info);
+}
+}
+
 int vfio_get_device(VFIOGroup *group, const char *name,
 VFIODevice *vbasedev, Error **errp)
 {
@@ -2412,12 +2420,23 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
   dev_info.num_irqs);
 
+vfio_get_all_regions(vbasedev);
 vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
 return 0;
 }
 
 void vfio_put_base_device(VFIODevice *vbasedev)
 {
+if (vbasedev->regions != NULL) {
+int i;
+
+for (i = 0; i < vbasedev->num_regions; i++) {
+g_free(vbasedev->regions[i]);
+}
+g_free(vbasedev->regions);
+vbasedev->regions = NULL;
+}
+
 if (!vbasedev->group) {
 return;
 }
@@ -2432,6 +2451,17 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
 {
 size_t argsz = sizeof(struct vfio_region_info);
 
+/* create region cache */
+if (vbasedev->regions == NULL) {
+vbasedev->regions = g_new0(struct vfio_region_info *,
+   vbasedev->num_regions);
+}
+/* check cache */
+if (vbasedev->regions[index] != NULL) {
+*info = vbasedev->regions[index];
+return 0;
+}
+
 *info = g_malloc0(argsz);
 
 (*info)->index = index;
@@ -2451,6 +2481,9 @@ retry:
 goto retry;
 }
 
+/* fill cache */
+vbasedev->regions[index] = *info;
+
 return 0;
 }
 
@@ -2469,7 +2502,6 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, 
uint32_t type,
 
 hdr = vfio_get_region_info_cap(*info, VFIO_REGION_INFO_CAP_TYPE);
 if (!hdr) {
-g_free(*info);
 continue;
 }
 
@@ -2481,8 +2513,6 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, 
uint32_t type,
 if (cap_type->type == type && cap_type->subtype == subtype) {
 return 0;
 }
-
-g_free(*info);
 }
 
 *info = NULL;
@@ -2498,7 +2528,6 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int 
region, uint16_t cap_type)
 if (vfio_get_region_info_cap(info, cap_type)) {
 ret = true;
 }
-g_free(info);
 }
 
 

[PATCH v1 06/24] vfio-user: Define type vfio_user_pci_dev_info

2022-11-08 Thread John Johnson
New class for vfio-user with its class and instance
constructors and destructors, and its pci ops.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/Kconfig  | 10 +++
 hw/vfio/common.c |  5 
 hw/vfio/pci.c| 89 
 hw/vfio/pci.h|  8 +
 4 files changed, 112 insertions(+)

diff --git a/hw/vfio/Kconfig b/hw/vfio/Kconfig
index 7cdba05..301894e 100644
--- a/hw/vfio/Kconfig
+++ b/hw/vfio/Kconfig
@@ -2,6 +2,10 @@ config VFIO
 bool
 depends on LINUX
 
+config VFIO_USER
+bool
+depends on VFIO
+
 config VFIO_PCI
 bool
 default y
@@ -9,6 +13,12 @@ config VFIO_PCI
 select EDID
 depends on LINUX && PCI
 
+config VFIO_USER_PCI
+bool
+default y
+select VFIO_USER
+depends on VFIO_PCI
+
 config VFIO_CCW
 bool
 default y
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c7bf0aa..c589bd9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1774,6 +1774,11 @@ void vfio_reset_handler(void *opaque)
 QLIST_FOREACH(group, &vfio_group_list, next) {
 QLIST_FOREACH(vbasedev, &group->device_list, next) {
 if (vbasedev->dev->realized && vbasedev->needs_reset) {
+if (vbasedev->ops->vfio_hot_reset_multi == NULL) {
+error_printf("%s: No hot reset handler specified\n",
+ vbasedev->name);
+continue;
+}
 vbasedev->ops->vfio_hot_reset_multi(vbasedev);
 }
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 80b03a2..dc19869 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include CONFIG_DEVICES
 #include 
 #include 
 
@@ -3421,3 +3422,91 @@ static void register_vfio_pci_dev_type(void)
 }
 
 type_init(register_vfio_pci_dev_type)
+
+
+#ifdef CONFIG_VFIO_USER_PCI
+
+/*
+ * vfio-user routines.
+ */
+
+/*
+ * Emulated devices don't use host hot reset
+ */
+static void vfio_user_compute_needs_reset(VFIODevice *vbasedev)
+{
+vbasedev->needs_reset = false;
+}
+
+static VFIODeviceOps vfio_user_pci_ops = {
+.vfio_compute_needs_reset = vfio_user_compute_needs_reset,
+.vfio_eoi = vfio_intx_eoi,
+.vfio_get_object = vfio_pci_get_object,
+.vfio_save_config = vfio_pci_save_config,
+.vfio_load_config = vfio_pci_load_config,
+};
+
+static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
+{
+ERRP_GUARD();
+VFIOUserPCIDevice *udev = VFIO_USER_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
+VFIODevice *vbasedev = &vdev->vbasedev;
+
+/*
+ * TODO: make option parser understand SocketAddress
+ * and use that instead of having scalar options
+ * for each socket type.
+ */
+if (!udev->sock_name) {
+error_setg(errp, "No socket specified");
+error_append_hint(errp, "Use -device vfio-user-pci,socket=\n");
+return;
+}
+
+vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
+vbasedev->ops = _user_pci_ops;
+vbasedev->type = VFIO_DEVICE_TYPE_PCI;
+vbasedev->dev = DEVICE(vdev);
+
+}
+
+static void vfio_user_instance_finalize(Object *obj)
+{
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
+
+vfio_put_device(vdev);
+}
+
+static Property vfio_user_pci_dev_properties[] = {
+DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vfio_user_pci_dev_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
+
+device_class_set_props(dc, vfio_user_pci_dev_properties);
+dc->desc = "VFIO over socket PCI device assignment";
+pdc->realize = vfio_user_pci_realize;
+}
+
+static const TypeInfo vfio_user_pci_dev_info = {
+.name = TYPE_VFIO_USER_PCI,
+.parent = TYPE_VFIO_PCI_BASE,
+.instance_size = sizeof(VFIOUserPCIDevice),
+.class_init = vfio_user_pci_dev_class_init,
+.instance_init = vfio_instance_init,
+.instance_finalize = vfio_user_instance_finalize,
+};
+
+static void register_vfio_user_dev_type(void)
+{
+type_register_static(&vfio_user_pci_dev_info);
+}
+
+type_init(register_vfio_user_dev_type)
+
+#endif /* VFIO_USER_PCI */
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 7c5c8ec..27db931 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -189,6 +189,14 @@ struct VFIOKernPCIDevice {
 VFIOPCIDevice device;
 };
 
+#define TYPE_VFIO_USER_PCI "vfio-user-pci"
+OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
+
+struct VFIOUserPCIDevice {
+VFIOPCIDevice device;
+char *sock_name;
+};
+
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
 static inline bool vfio_pci_is(VFIOPCIDevice *vdev, uint32_t vendor, uint32_t 
device)
 {
-- 
1.8.3.1




[PATCH v1 09/24] vfio-user: define socket send functions

2022-11-08 Thread John Johnson
Also negotiate protocol version with remote server
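
The version negotiation carries a JSON capabilities string inside the VFIO_USER_VERSION
message.  With the defaults defined in user-protocol.h in this patch, the client side of
that exchange would look roughly as follows (illustrative only; the exact string is built
at runtime from a QDict):

/* Roughly the capabilities JSON the client advertises with the default
 * values (VFIO_USER_DEF_MAX_FDS, VFIO_USER_DEF_MAX_XFER, VFIO_USER_DEF_PGSIZE,
 * VFIO_USER_DEF_MAP_MAX); shown only to illustrate the negotiation. */
static const char caps_example[] =
    "{\"capabilities\":{"
        "\"max_msg_fds\":8,"
        "\"max_data_xfer_size\":1048576,"
        "\"pgsizes\":4096,"
        "\"max_dma_maps\":65535"
    "}}";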

Signed-off-by: Jagannathan Raman 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
---
 hw/vfio/pci.c   |  15 ++
 hw/vfio/pci.h   |   1 +
 hw/vfio/user-protocol.h |  62 ++
 hw/vfio/user.c  | 508 
 hw/vfio/user.h  |   9 +
 5 files changed, 595 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f086235..b2534b3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3489,11 +3489,25 @@ static void vfio_user_pci_realize(PCIDevice *pdev, 
Error **errp)
 vbasedev->proxy = proxy;
 vfio_user_set_handler(vbasedev, vfio_user_pci_process_req, vdev);
 
+if (udev->send_queued) {
+proxy->flags |= VFIO_PROXY_FORCE_QUEUED;
+}
+
+vfio_user_validate_version(proxy, &err);
+if (err != NULL) {
+error_propagate(errp, err);
+goto error;
+}
+
 vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
 vbasedev->ops = _user_pci_ops;
 vbasedev->type = VFIO_DEVICE_TYPE_PCI;
 vbasedev->dev = DEVICE(vdev);
 
+return;
+
+error:
+error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
 }
 
 static void vfio_user_instance_finalize(Object *obj)
@@ -3510,6 +3524,7 @@ static void vfio_user_instance_finalize(Object *obj)
 
 static Property vfio_user_pci_dev_properties[] = {
 DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
+DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 27db931..c47d2f8 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -195,6 +195,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
 struct VFIOUserPCIDevice {
 VFIOPCIDevice device;
 char *sock_name;
+bool send_queued;   /* all sends are queued */
 };
 
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index d23877c..5de5b20 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -51,4 +51,66 @@ enum vfio_user_command {
 #define VFIO_USER_NO_REPLY  0x10
 #define VFIO_USER_ERROR 0x20
 
+
+/*
+ * VFIO_USER_VERSION
+ */
+typedef struct {
+VFIOUserHdr hdr;
+uint16_t major;
+uint16_t minor;
+char capabilities[];
+} VFIOUserVersion;
+
+#define VFIO_USER_MAJOR_VER 0
+#define VFIO_USER_MINOR_VER 0
+
+#define VFIO_USER_CAP   "capabilities"
+
+/* "capabilities" members */
+#define VFIO_USER_CAP_MAX_FDS   "max_msg_fds"
+#define VFIO_USER_CAP_MAX_XFER  "max_data_xfer_size"
+#define VFIO_USER_CAP_PGSIZES   "pgsizes"
+#define VFIO_USER_CAP_MAP_MAX   "max_dma_maps"
+#define VFIO_USER_CAP_MIGR  "migration"
+
+/* "migration" members */
+#define VFIO_USER_CAP_PGSIZE"pgsize"
+#define VFIO_USER_CAP_MAX_BITMAP"max_bitmap_size"
+
+/*
+ * Max FDs mainly comes into play when a device supports multiple interrupts
+ * where each one uses an eventfd to inject it into the guest.
+ * It is clamped by the number of FDs the qio channel supports in a
+ * single message.
+ */
+#define VFIO_USER_DEF_MAX_FDS   8
+#define VFIO_USER_MAX_MAX_FDS   16
+
+/*
+ * Max transfer limits the amount of data in region and DMA messages.
+ * Region R/W will be very small (limited by how much a single instruction
+ * can process) so just use a reasonable limit here.
+ */
+#define VFIO_USER_DEF_MAX_XFER  (1024 * 1024)
+#define VFIO_USER_MAX_MAX_XFER  (64 * 1024 * 1024)
+
+/*
+ * Default pagesizes supported is 4k.
+ */
+#define VFIO_USER_DEF_PGSIZE4096
+
+/*
+ * Default max number of DMA mappings is stolen from the
+ * linux kernel "dma_entry_limit"
+ */
+#define VFIO_USER_DEF_MAP_MAX   65535
+
+/*
+ * Default max bitmap size is also taken from the linux kernel,
+ * where usage of signed ints limits the VA range to 2^31 bytes.
+ * Dividing that by the number of bits per byte yields 256MB
+ */
+#define VFIO_USER_DEF_MAX_BITMAP (256 * 1024 * 1024)
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index ffd69b9..31bcc93 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -23,11 +23,19 @@
 #include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "sysemu/iothread.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qnull.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qnum.h"
+#include "qapi/qmp/qbool.h"
 #include "user.h"
 
+static int wait_time = 5000;   /* wait up to 5 sec for busy servers */
 static IOThread *vfio_user_iothread;
 
 static void vfio_user_shutdown(VFIOProxy *proxy);
+static int vfio_user_send_qio(VFIOProxy *proxy, VFIOUserMsg *msg);
 static VFIOUserMsg *vfio_user_getmsg(VFIOProxy *proxy, VFIOUserHdr *hdr,
  VFIOUserFDs *fds);
 static VFIOUserFDs *vfio_user_getfds(int numfds);
@@ -35,9 +43,16 @@ static void 

[PATCH v1 00/24] vfio-user client

2022-11-08 Thread John Johnson


Hello,

This is the 6th revision of the vfio-user client implementation.
It is the first patch series (the previous revisions were RFCs)

First of all, thank you for your time reviewing the RFC versions.

The vfio-user framework consists of 3 parts:
 1) The VFIO user protocol specification.
 2) A client - the VFIO device in QEMU that encapsulates VFIO messages
and sends them to the server.
 3) A server - a remote process that emulates a device.

This patchset implements parts 1 and 2.

The libvfio-user project (https://github.com/nutanix/libvfio-user)
can be used by a remote process to handle the protocol to implement the third 
part.
We also have upstreamed a patch series that implement a server using QEMU.


Contributors:

John G Johnson 
John Levon 
Thanos Makatos 
Elena Ufimtseva 
Jagannathan Raman 

John Johnson (23):
  vfio-user: add VFIO base abstract class
  vfio-user: add container IO ops vector
  vfio-user: add region cache
  vfio-user: add device IO ops vector
  vfio-user: Define type vfio_user_pci_dev_info
  vfio-user: connect vfio proxy to remote server
  vfio-user: define socket receive functions
  vfio-user: define socket send functions
  vfio-user: get device info
  vfio-user: get region info
  vfio-user: region read/write
  vfio-user: pci_user_realize PCI setup
  vfio-user: get and set IRQs
  vfio-user: forward msix BAR accesses to server
  vfio-user: proxy container connect/disconnect
  vfio-user: dma map/unmap operations
  vfio-user: add dma_unmap_all
  vfio-user: secure DMA support
  vfio-user: dma read/write operations
  vfio-user: pci reset
  vfio-user: add 'x-msg-timeout' option that specifies msg wait times
  vfio-user: add coalesced posted writes
  vfio-user: add trace points

Thanos Makatos (1):
  vfio-user: introduce vfio-user protocol specification

 MAINTAINERS|   11 +
 docs/devel/index-internals.rst |1 +
 docs/devel/vfio-user.rst   | 1522 
 hw/vfio/Kconfig|   10 +
 hw/vfio/ap.c   |1 +
 hw/vfio/ccw.c  |7 +-
 hw/vfio/common.c   |  648 --
 hw/vfio/igd.c  |   23 +-
 hw/vfio/meson.build|1 +
 hw/vfio/migration.c|2 -
 hw/vfio/pci-quirks.c   |   19 +-
 hw/vfio/pci.c  |  926 ++-
 hw/vfio/pci.h  |   29 +-
 hw/vfio/platform.c |2 +
 hw/vfio/trace-events   |   15 +
 hw/vfio/user-protocol.h|  244 +
 hw/vfio/user.c | 1904 
 hw/vfio/user.h |  112 +++
 include/hw/vfio/vfio-common.h  |   82 ++
 19 files changed, 5230 insertions(+), 329 deletions(-)
 create mode 100644 docs/devel/vfio-user.rst
 create mode 100644 hw/vfio/user-protocol.h
 create mode 100644 hw/vfio/user.c
 create mode 100644 hw/vfio/user.h

-- 
1.8.3.1




[PATCH v1 02/24] vfio-user: add VFIO base abstract class

2022-11-08 Thread John Johnson
Add an abstract base class both the kernel driver
and user socket implementations can use to share code.
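
In QOM terms this means registering an abstract parent type that carries the shared
VFIOPCIDevice state, with the kernel-vfio and vfio-user devices as concrete children.
A sketch of the arrangement (the TYPE_VFIO_PCI_BASE and vfio-user-pci names come from
this series; the includes and the exact TypeInfo contents here are assumptions):

/* Illustrative sketch only, assuming QEMU's QOM and PCI headers. */
#include "qemu/osdep.h"
#include "qom/object.h"
#include "hw/pci/pci.h"
#include "pci.h"            /* VFIOPCIDevice from this series' hw/vfio/pci.h */

static const TypeInfo vfio_pci_base_info = {
    .name          = "vfio-pci-base",    /* TYPE_VFIO_PCI_BASE */
    .parent        = TYPE_PCI_DEVICE,
    .instance_size = sizeof(VFIOPCIDevice),
    .abstract      = true,               /* never instantiated directly */
};

static const TypeInfo vfio_pci_info = {
    .name   = "vfio-pci",                /* kernel-driver backend */
    .parent = "vfio-pci-base",
};

static const TypeInfo vfio_user_pci_info = {
    .name   = "vfio-user-pci",           /* socket backend, added later in the series */
    .parent = "vfio-pci-base",
};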

Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c | 106 +++---
 hw/vfio/pci.h |  16 +++--
 2 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 939dcc3..60acde5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -235,7 +235,7 @@ static void vfio_intx_update(VFIOPCIDevice *vdev, 
PCIINTxRoute *route)
 
 static void vfio_intx_routing_notifier(PCIDevice *pdev)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 PCIINTxRoute route;
 
 if (vdev->interrupt != VFIO_INT_INTx) {
@@ -467,7 +467,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, 
MSIMessage msg,
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
MSIMessage *msg, IOHandler *handler)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 VFIOMSIVector *vector;
 int ret;
 
@@ -561,7 +561,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 
 static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
+VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
 trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
@@ -1109,7 +1109,7 @@ static const MemoryRegionOps vfio_vga_ops = {
  */
 static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
+VFIORegion *region = &vdev->bars[bar].region;
 MemoryRegion *mmap_mr, *region_mr, *base_mr;
 PCIIORegion *r;
@@ -1155,7 +1155,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice 
*pdev, int bar)
  */
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
 
 memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -1188,7 +1188,7 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t 
addr, int len)
 void vfio_pci_write_config(PCIDevice *pdev,
uint32_t addr, uint32_t val, int len)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 uint32_t val_le = cpu_to_le32(val);
 
 trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
@@ -2845,7 +2845,7 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
 
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 VFIODevice *vbasedev = >vbasedev;
 VFIODevice *vbasedev_iter;
 VFIOGroup *group;
@@ -3169,7 +3169,7 @@ error:
 
 static void vfio_instance_finalize(Object *obj)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(obj);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
 VFIOGroup *group = vdev->vbasedev.group;
 
 vfio_display_finalize(vdev);
@@ -3189,7 +3189,7 @@ static void vfio_instance_finalize(Object *obj)
 
 static void vfio_exitfn(PCIDevice *pdev)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
@@ -3208,7 +3208,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 
 static void vfio_pci_reset(DeviceState *dev)
 {
-VFIOPCIDevice *vdev = VFIO_PCI(dev);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
 
 trace_vfio_pci_reset(vdev->vbasedev.name);
 
@@ -3248,7 +3248,7 @@ post_reset:
 static void vfio_instance_init(Object *obj)
 {
 PCIDevice *pci_dev = PCI_DEVICE(obj);
-VFIOPCIDevice *vdev = VFIO_PCI(obj);
+VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
 
 device_add_bootindex_property(obj, &vdev->bootindex,
   "bootindex", NULL,
@@ -3265,24 +3265,12 @@ static void vfio_instance_init(Object *obj)
 pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
-static Property vfio_pci_dev_properties[] = {
-DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
-DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
+static Property vfio_pci_base_dev_properties[] = {
 DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
 vbasedev.pre_copy_dirty_page_tracking,
 ON_OFF_AUTO_ON),
-DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
-display, ON_OFF_AUTO_OFF),
-DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
-DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),
 DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
  

[PATCH v1 12/24] vfio-user: region read/write

2022-11-08 Thread John Johnson
Add support for posted writes on remote devices
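
A rough usage sketch (the device name and socket path below are illustrative
assumptions, not taken from this patch): with this change memory BARs default
to posted writes, and the new "x-no-posted-writes" property can force every
region write to wait for the server's reply:

    -device vfio-user-pci,socket=/tmp/vfio-user.sock,x-no-posted-writes=on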

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/common.c  |  10 +++-
 hw/vfio/pci.c |   9 +++-
 hw/vfio/pci.h |   1 +
 hw/vfio/user-protocol.h   |  12 +
 hw/vfio/user.c| 109 ++
 hw/vfio/user.h|   1 +
 include/hw/vfio/vfio-common.h |   7 +--
 7 files changed, 143 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 87400b3..87cd1d1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -214,6 +214,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
 uint32_t dword;
 uint64_t qword;
 } buf;
+bool post = region->post_wr;
 int ret;
 
 switch (size) {
@@ -234,7 +235,11 @@ void vfio_region_write(void *opaque, hwaddr addr,
 break;
 }
 
-ret = VDEV_REGION_WRITE(vbasedev, region->nr, addr, size, &buf);
+/* read-after-write hazard if guest can directly access region */
+if (region->nr_mmaps) {
+post = false;
+}
+ret = VDEV_REGION_WRITE(vbasedev, region->nr, addr, size, &buf, post);
 if (ret != size) {
 const char *err = ret < 0 ? strerror(-ret) : "short write";
 
@@ -1587,6 +1592,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, 
VFIORegion *region,
 region->size = info->size;
 region->fd_offset = info->offset;
 region->nr = index;
+region->post_wr = false;
 if (vbasedev->regfds != NULL) {
 region->fd = vbasedev->regfds[index];
 } else {
@@ -2721,7 +2727,7 @@ static int vfio_io_region_read(VFIODevice *vbasedev, 
uint8_t index, off_t off,
 }
 
 static int vfio_io_region_write(VFIODevice *vbasedev, uint8_t index, off_t off,
-uint32_t size, void *data)
+uint32_t size, void *data, bool post)
 {
 struct vfio_region_info *info = vbasedev->regions[index];
 int ret;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2e0e41d..027f9d5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -51,7 +51,7 @@
  (size), (data))
 #define VDEV_CONFIG_WRITE(vbasedev, off, size, data) \
 VDEV_REGION_WRITE((vbasedev), VFIO_PCI_CONFIG_REGION_INDEX, (off), \
-  (size), (data))
+  (size), (data), false)
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -1704,6 +1704,9 @@ static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
 bar->type = pci_bar & (bar->ioport ? ~PCI_BASE_ADDRESS_IO_MASK :
  ~PCI_BASE_ADDRESS_MEM_MASK);
 bar->size = bar->region.size;
+
+/* IO regions are sync, memory can be async */
+bar->region.post_wr = (bar->ioport == 0);
 }
 
 static void vfio_bars_prepare(VFIOPCIDevice *vdev)
@@ -3494,6 +3497,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 if (udev->send_queued) {
 proxy->flags |= VFIO_PROXY_FORCE_QUEUED;
 }
+if (udev->no_post) {
+proxy->flags |= VFIO_PROXY_NO_POST;
+}
 
    vfio_user_validate_version(proxy, &err);
 if (err != NULL) {
@@ -3540,6 +3546,7 @@ static void vfio_user_instance_finalize(Object *obj)
 static Property vfio_user_pci_dev_properties[] = {
 DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
 DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, false),
+DEFINE_PROP_BOOL("x-no-posted-writes", VFIOUserPCIDevice, no_post, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index c47d2f8..ec17f2e 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -196,6 +196,7 @@ struct VFIOUserPCIDevice {
 VFIOPCIDevice device;
 char *sock_name;
 bool send_queued;   /* all sends are queued */
+bool no_post;   /* all regions write are sync */
 };
 
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index a1b64fe..124340c 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -140,4 +140,16 @@ typedef struct {
 uint64_t offset;
 } VFIOUserRegionInfo;
 
+/*
+ * VFIO_USER_REGION_READ
+ * VFIO_USER_REGION_WRITE
+ */
+typedef struct {
+VFIOUserHdr hdr;
+uint64_t offset;
+uint32_t region;
+uint32_t count;
+char data[];
+} VFIOUserRegionRW;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 69b0fed..1453bb5 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -57,6 +57,8 @@ static void vfio_user_cb(void *opaque);
 
 static void vfio_user_request(void *opaque);
 static int vfio_user_send_queued(VFIOProxy *proxy, VFIOUserMsg *msg);
+static void vfio_user_send_async(VFIOProxy *proxy, VFIOUserHdr *hdr,
+ VFIOUserFDs *fds);
 static void vfio_user_send_wait(VFIOProxy *proxy, VFIOUserHdr *hdr,
 

[PATCH v1 19/24] vfio-user: secure DMA support

2022-11-08 Thread John Johnson
Secure DMA forces the remote process to use DMA r/w messages
instead of directly mapping guest memory.
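
As a usage illustration only (device name and socket path are assumptions,
not taken from this patch), the new "secure-dma" property would be enabled on
the command line like this, trading some DMA performance for not sharing
guest RAM with the remote process:

    -device vfio-user-pci,socket=/tmp/vfio-user.sock,secure-dma=on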

Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c  | 4 
 hw/vfio/pci.h  | 1 +
 hw/vfio/user.c | 2 +-
 hw/vfio/user.h | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 53e3bb8..ce6776b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3610,6 +3610,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 vbasedev->proxy = proxy;
 vfio_user_set_handler(vbasedev, vfio_user_pci_process_req, vdev);
 
+if (udev->secure_dma) {
+proxy->flags |= VFIO_PROXY_SECURE;
+}
 if (udev->send_queued) {
 proxy->flags |= VFIO_PROXY_FORCE_QUEUED;
 }
@@ -3718,6 +3721,7 @@ static void vfio_user_instance_finalize(Object *obj)
 
 static Property vfio_user_pci_dev_properties[] = {
 DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
+DEFINE_PROP_BOOL("secure-dma", VFIOUserPCIDevice, secure_dma, false),
 DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, false),
 DEFINE_PROP_BOOL("x-no-posted-writes", VFIOUserPCIDevice, no_post, false),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index c04fa58..c4b8e5c 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -196,6 +196,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
 struct VFIOUserPCIDevice {
 VFIOPCIDevice device;
 char *sock_name;
+bool secure_dma;/* disable shared mem for DMA */
 bool send_queued;   /* all sends are queued */
 bool no_post;   /* all regions write are sync */
 };
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index d62fe05..0c5493e 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1627,7 +1627,7 @@ static int vfio_user_io_dma_map(VFIOContainer *container, 
MemoryRegion *mr,
  * map->vaddr enters as a QEMU process address
  * make it either a file offset for mapped areas or 0
  */
-if (fd != -1) {
+if (fd != -1 && (container->proxy->flags & VFIO_PROXY_SECURE) == 0) {
 void *addr = (void *)(uintptr_t)map->vaddr;
 
 map->vaddr = qemu_ram_block_host_offset(mr->ram_block, addr);
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 19b8a29..6bd9fd3 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -83,6 +83,7 @@ typedef struct VFIOProxy {
 
 /* VFIOProxy flags */
 #define VFIO_PROXY_CLIENT0x1
+#define VFIO_PROXY_SECURE0x2
 #define VFIO_PROXY_FORCE_QUEUED  0x4
 #define VFIO_PROXY_NO_POST   0x8
 
-- 
1.8.3.1




[PATCH v1 22/24] vfio-user: add 'x-msg-timeout' option that specifies msg wait times

2022-11-08 Thread John Johnson
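A hedged usage sketch (device name, socket path and the millisecond unit are
illustrative assumptions, not stated in this patch): the new "x-msg-timeout"
property overrides the default reply timeout passed to qemu_cond_timedwait()
by the proxy:

    -device vfio-user-pci,socket=/tmp/vfio-user.sock,x-msg-timeout=5000
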
Signed-off-by: John G Johnson 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Jagannathan Raman 
---
 hw/vfio/pci.c  | 4 
 hw/vfio/pci.h  | 1 +
 hw/vfio/user.c | 7 +--
 hw/vfio/user.h | 1 +
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 005fcf8..3ae3a13 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3729,6 +3729,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error 
**errp)
 if (udev->no_post) {
 proxy->flags |= VFIO_PROXY_NO_POST;
 }
+if (udev->wait_time) {
+proxy->wait_time = udev->wait_time;
+}
 
    vfio_user_validate_version(proxy, &err);
 if (err != NULL) {
@@ -3848,6 +3851,7 @@ static Property vfio_user_pci_dev_properties[] = {
 DEFINE_PROP_BOOL("secure-dma", VFIOUserPCIDevice, secure_dma, false),
 DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, false),
 DEFINE_PROP_BOOL("x-no-posted-writes", VFIOUserPCIDevice, no_post, false),
+DEFINE_PROP_UINT32("x-msg-timeout", VFIOUserPCIDevice, wait_time, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index c4b8e5c..48b19ee 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -199,6 +199,7 @@ struct VFIOUserPCIDevice {
 bool secure_dma;/* disable shared mem for DMA */
 bool send_queued;   /* all sends are queued */
 bool no_post;   /* all regions write are sync */
+uint32_t wait_time; /* timeout for message replies */
 };
 
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index ddf9e13..a9e6cf5 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -717,7 +717,8 @@ static void vfio_user_send_wait(VFIOProxy *proxy, 
VFIOUserHdr *hdr,
 
 if (ret == 0) {
 while (!msg->complete) {
-if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, wait_time)) {
+if (!qemu_cond_timedwait(&msg->cv, &proxy->lock,
+ proxy->wait_time)) {
 VFIOUserMsgQ *list;
 
  list = msg->pending ? &proxy->pending : &proxy->outgoing;
@@ -759,7 +760,8 @@ static void vfio_user_wait_reqs(VFIOProxy *proxy)
 msg = proxy->last_nowait;
 msg->type = VFIO_MSG_WAIT;
 while (!msg->complete) {
-if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, wait_time)) {
+if (!qemu_cond_timedwait(&msg->cv, &proxy->lock,
+ proxy->wait_time)) {
 VFIOUserMsgQ *list;
 
  list = msg->pending ? &proxy->pending : &proxy->outgoing;
@@ -881,6 +883,7 @@ VFIOProxy *vfio_user_connect_dev(SocketAddress *addr, Error 
**errp)
 
 proxy->flags = VFIO_PROXY_CLIENT;
 proxy->state = VFIO_PROXY_CONNECTED;
+proxy->wait_time = wait_time;
 
  qemu_mutex_init(&proxy->lock);
  qemu_cond_init(&proxy->close_cv);
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index d88ffe5..f711861 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -62,6 +62,7 @@ typedef struct VFIOProxy {
 uint64_t max_bitmap;
 uint64_t migr_pgsize;
 int flags;
+uint32_t wait_time;
 QemuCond close_cv;
 AioContext *ctx;
 QEMUBH *req_bh;
-- 
1.8.3.1




Re: [PATCH trivial for 7.2] hw/ssi/sifive_spi.c: spelling: reigster

2022-11-08 Thread Palmer Dabbelt

On Sat, 05 Nov 2022 04:53:29 PDT (-0700), m...@tls.msk.ru wrote:

Fixes: 0694dabe9763847f3010b54ab3ec7d367d2f0ff0


Not sure if I missed something in QEMU land, but those are usually 
listed more like


Fixes: 0694dabe97 ("hw/ssi: Add SiFive SPI controller support")

Checkpatch isn't failing, though.  Either way

Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

Thanks!


Signed-off-by: Michael Tokarev 
---
 hw/ssi/sifive_spi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ssi/sifive_spi.c b/hw/ssi/sifive_spi.c
index 03540cf5ca..1b4a401ca1 100644
--- a/hw/ssi/sifive_spi.c
+++ b/hw/ssi/sifive_spi.c
@@ -267,7 +267,7 @@ static void sifive_spi_write(void *opaque, hwaddr addr,
 case R_RXDATA:
 case R_IP:
 qemu_log_mask(LOG_GUEST_ERROR,
-  "%s: invalid write to read-only reigster 0x%"
+  "%s: invalid write to read-only register 0x%"
   HWADDR_PRIx " with 0x%x\n", __func__, addr << 2, value);
 break;




Re: [PULL v3 49/81] acpi: pc: vga: use AcpiDevAmlIf interface to build VGA device descriptors

2022-11-08 Thread B



On 7 November 2022 22:28:31 UTC, "Michael S. Tsirkin" wrote:
>On Mon, Nov 07, 2022 at 10:07:52PM +0000, Bernhard Beschow wrote:
>> On 7 November 2022 13:00:36 UTC, "Michael S. Tsirkin"
>> wrote:
>> >On Mon, Nov 07, 2022 at 06:16:25PM +0530, Ani Sinha wrote:
>> >> On Mon, Nov 7, 2022 at 6:03 PM Michael S. Tsirkin  wrote:
>> >> >
>> >> > On Sun, Nov 06, 2022 at 10:16:41PM +0100, Bernhard Beschow wrote:
>> >> > >
>> >> > >
>> >> > > On Sat, Nov 5, 2022 at 6:45 PM Michael S. Tsirkin  
>> >> > > wrote:
>> >> > >
>> >> > > From: Igor Mammedov 
>> >> > >
>> >> > > Signed-off-by: Igor Mammedov 
>> >> > > Message-Id: <20221017102146.2254096-2-imamm...@redhat.com>
>> >> > > Reviewed-by: Michael S. Tsirkin 
>> >> > > Signed-off-by: Michael S. Tsirkin 
>> >> > > NB: we do not expect any functional change in
>> >> > > any ACPI tables with this change. It's only a refactoring.
>> >> > >
>> >> > > Reviewed-by: Ani Sinha 
>> >> > > ---
>> >> > >  hw/display/vga_int.h   |  2 ++
>> >> > >  hw/display/acpi-vga-stub.c |  7 +++
>> >> > >  hw/display/acpi-vga.c  | 26 ++
>> >> > >  hw/display/vga-pci.c   |  4 
>> >> > >  hw/i386/acpi-build.c   | 26 +-
>> >> > >  hw/display/meson.build | 17 +
>> >> > >  6 files changed, 57 insertions(+), 25 deletions(-)
>> >> > >  create mode 100644 hw/display/acpi-vga-stub.c
>> >> > >  create mode 100644 hw/display/acpi-vga.c
>> >> > >
>> >> > >
>> >> > > With this "qemu:qtest+qtest-hppa / qtest-hppa/display-vga-test" fails 
>> >> > > due to
>> >> > > the symbol "aml_return" being undefined:
>> >> > >
>> >> > > # starting QEMU: exec ./qemu-system-hppa -qtest 
>> >> > > unix:/tmp/qtest-515650.sock
>> >> > > -qtest-log /dev/null -chardev 
>> >> > > socket,path=/tmp/qtest-515650.qmp,id=char0 -mon
>> >> > > chardev=char0,mode=control -display none -vga none -device virtio-vga 
>> >> > > -accel
>> >> > > qtest
>> >> > > --- stderr 
>> >> > > ---
>> >> > > Failed to open module: qemu/build/qemu-bundle/usr/lib/qemu/
>> >> > > hw-display-virtio-vga.so: undefined symbol: aml_return
>> >> > > qemu-system-hppa: -device virtio-vga: 'virtio-vga' is not a valid 
>> >> > > device model
>> >> > > name
>> >> > > Broken pipe
>> >> > > ../src/tests/qtest/libqtest.c:179: kill_qemu() tried to terminate 
>> >> > > QEMU process
>> >> > > but encountered exit status 1 (expected 0)
>> >> > >
>> >> > > (test program exited with status code -6)
>> >> > >
>> >> > > Best regards,
>> >> > > Bernhard
>> >> >
>> >> > It's unfortunate that it doesn't reproduce for me :(
>> >> 
>> >> To reproduce:
>> >> 
>> >> - docker pull registry.gitlab.com/qemu-project/qemu/qemu/centos8:latest
>> >> - configure line:
>> >> 
>> >> ../configure --enable-werror --disable-docs --disable-nettle
>> >> --enable-gcrypt --enable-fdt=system --enable-modules
>> >> --enable-trace-backends=dtrace --enable-docs --enable-vfio-user-server
>> >> --target-list="ppc64-softmmu or1k-softmmu s390x-softmmu x86_64-softmmu
>> >> rx-softmmu sh4-softmmu nios2-softmmu"
>> >
>> >
>> >I suspect --enable-modules is the difference.
>> 
>> Indeed, I'm building with --enable-modules and got the undefined symbol.
>> 
>> Perhaps we could use the -Wl,--no-undefined linker option when building with 
>> modules to catch these errors at link time already?
>> 
>> Best regards,
>> Bernhard
>
>I don't see how it can work but sure, send a patch :)

Sounds like circular dependencies. Don't count on me any time soon then ;)

Best regards,
Bernhard
>
>> >
>> >> - # make
>> >> - # QTEST_QEMU_BINARY=./qemu-system-or1k  ./tests/qtest/qos-test
>> >> failed to open module:
>> >> /build/qemu/qemu/build/qemu-bundle/usr/local/lib64/qemu/hw-display-virtio-vga.so:
>> >> undefined symbol: aml_return
>> >> qemu-system-or1k: ../util/error.c:59: error_setv: Assertion `*errp ==
>> >> NULL' failed.
>> >> Broken pipe
>> >> ../tests/qtest/libqtest.c:188: kill_qemu() detected QEMU death from
>> >> signal 6 (Aborted) (core dumped)
>> >> Aborted (core dumped)
>> >
>



[PATCH 6/8] virtio-blk: remove unnecessary AioContext lock from function already safe

2022-11-08 Thread Stefan Hajnoczi
From: Emanuele Giuseppe Esposito 

AioContext lock was introduced in b9e413dd375 and in this instance
it is used to protect these 3 functions:
- virtio_blk_handle_rw_error
- virtio_blk_req_complete
- block_acct_done

Now that all three of the above functions are protected with
their own locks, we can get rid of the AioContext lock.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20220609143727.1151816-9-eespo...@redhat.com>
---
 hw/block/virtio-blk.c | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f8fcf25292..faea045178 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -108,7 +108,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 
 IO_CODE();
 
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 while (next) {
 VirtIOBlockReq *req = next;
 next = req->mr_next;
@@ -141,7 +140,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
  block_acct_done(blk_get_stats(s->blk), &req->acct);
 virtio_blk_free_request(req);
 }
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 static void virtio_blk_flush_complete(void *opaque, int ret)
@@ -150,20 +148,16 @@ static void virtio_blk_flush_complete(void *opaque, int 
ret)
 VirtIOBlock *s = req->dev;
 
 IO_CODE();
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 
 if (ret) {
 if (virtio_blk_handle_rw_error(req, -ret, 0, true)) {
-goto out;
+return;
 }
 }
 
 virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
  block_acct_done(blk_get_stats(s->blk), &req->acct);
 virtio_blk_free_request(req);
-
-out:
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
@@ -174,11 +168,10 @@ static void virtio_blk_discard_write_zeroes_complete(void 
*opaque, int ret)
 ~VIRTIO_BLK_T_BARRIER) == 
VIRTIO_BLK_T_WRITE_ZEROES;
 
 IO_CODE();
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 
 if (ret) {
 if (virtio_blk_handle_rw_error(req, -ret, false, is_write_zeroes)) {
-goto out;
+return;
 }
 }
 
@@ -187,9 +180,6 @@ static void virtio_blk_discard_write_zeroes_complete(void 
*opaque, int ret)
  block_acct_done(blk_get_stats(s->blk), &req->acct);
 }
 virtio_blk_free_request(req);
-
-out:
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 #ifdef __linux__
@@ -238,10 +228,8 @@ static void virtio_blk_ioctl_complete(void *opaque, int 
status)
  virtio_stl_p(vdev, &scsi->data_len, hdr->dxfer_len);
 
 out:
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 virtio_blk_req_complete(req, status);
 virtio_blk_free_request(req);
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 g_free(ioctl_req);
 }
 
@@ -852,7 +840,6 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 
 s->rq = NULL;
 
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 while (req) {
 VirtIOBlockReq *next = req->next;
  if (virtio_blk_handle_request(req, &mrb)) {
@@ -876,8 +863,6 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 
 /* Paired with inc in virtio_blk_dma_restart_cb() */
 blk_dec_in_flight(s->conf.conf.blk);
-
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 /*
-- 
2.38.1




[PATCH 4/8] virtio-blk: mark GLOBAL_STATE_CODE functions

2022-11-08 Thread Stefan Hajnoczi
From: Emanuele Giuseppe Esposito 

Just as done in the block API, mark functions in virtio-blk
that are always called in the main loop with BQL held.

We know such functions are GS because they all are callbacks
from virtio.c API that has already classified them as GS.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20220609143727.1151816-6-eespo...@redhat.com>
---
 hw/block/dataplane/virtio-blk.c |  4 
 hw/block/virtio-blk.c   | 27 +++
 2 files changed, 31 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 975f5ca8c4..728c9cd86c 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -89,6 +89,8 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 
+GLOBAL_STATE_CODE();
+
 *dataplane = NULL;
 
 if (conf->iothread) {
@@ -140,6 +142,8 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 {
 VirtIOBlock *vblk;
 
+GLOBAL_STATE_CODE();
+
 if (!s) {
 return;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 96bc11d2fe..02b213a140 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -845,11 +845,17 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
+/*
+ * Only called when VM is started or stopped in cpus.c.
+ * No iothread runs in parallel
+ */
 static void virtio_blk_dma_restart_cb(void *opaque, bool running,
   RunState state)
 {
 VirtIOBlock *s = opaque;
 
+GLOBAL_STATE_CODE();
+
 if (!running) {
 return;
 }
@@ -867,8 +873,14 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 AioContext *ctx;
 VirtIOBlockReq *req;
 
+GLOBAL_STATE_CODE();
+
 ctx = blk_get_aio_context(s->blk);
 aio_context_acquire(ctx);
+/*
+ * This drain together with ->stop_ioeventfd() in virtio_pci_reset()
+ * stops all Iothreads.
+ */
 blk_drain(s->blk);
 
 /* We drop queued requests after blk_drain() because blk_drain() itself can
@@ -1037,11 +1049,17 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
 }
 }
 
+/*
+ * VM is stopped while doing migration, so iothread has
+ * no requests to process.
+ */
 static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 VirtIOBlockReq *req = s->rq;
 
+GLOBAL_STATE_CODE();
+
 while (req) {
 qemu_put_sbyte(f, 1);
 
@@ -1055,11 +1073,17 @@ static void virtio_blk_save_device(VirtIODevice *vdev, 
QEMUFile *f)
 qemu_put_sbyte(f, 0);
 }
 
+/*
+ * VM is stopped while doing migration, so iothread has
+ * no requests to process.
+ */
 static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
   int version_id)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
+GLOBAL_STATE_CODE();
+
 while (qemu_get_sbyte(f)) {
 unsigned nvqs = s->conf.num_queues;
 unsigned vq_idx = 0;
@@ -1108,6 +1132,7 @@ static const BlockDevOps virtio_block_ops = {
 .resize_cb = virtio_blk_resize,
 };
 
+/* Iothread is not yet created */
 static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1116,6 +1141,8 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 Error *err = NULL;
 unsigned i;
 
+GLOBAL_STATE_CODE();
+
 if (!conf->conf.blk) {
 error_setg(errp, "drive property not set");
 return;
-- 
2.38.1




[PATCH 3/8] virtio: categorize callbacks in GS

2022-11-08 Thread Stefan Hajnoczi
From: Emanuele Giuseppe Esposito 

All the callbacks below are always running in the main loop.

The callbacks are the following:
- start/stop_ioeventfd: these are the callbacks where
  blk_set_aio_context(iothread) is done, so they are called in the main
  loop.

- save and load: called during migration, when VM is stopped from the
  main loop.

- reset: before calling this callback, stop_ioeventfd is invoked, so
  it can only run in the main loop.

- set_status: going through all the callers we can see it is called
  from a MemoryRegionOps callback, which always run in a vcpu thread and
  hold the BQL.

- realize: iothread is not even created yet.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20220609143727.1151816-5-eespo...@redhat.com>
---
 hw/block/virtio-blk.c  | 2 ++
 hw/virtio/virtio-bus.c | 5 +
 hw/virtio/virtio-pci.c | 2 ++
 hw/virtio/virtio.c | 8 
 4 files changed, 17 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 96d00103a4..96bc11d2fe 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1005,6 +1005,8 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
+GLOBAL_STATE_CODE();
+
 if (!(status & (VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK))) {
 assert(!s->dataplane_started);
 }
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 896feb37a1..74cdf4bd27 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
@@ -224,6 +225,8 @@ int virtio_bus_start_ioeventfd(VirtioBusState *bus)
 VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
 int r;
 
+GLOBAL_STATE_CODE();
+
 if (!k->ioeventfd_assign || !k->ioeventfd_enabled(proxy)) {
 return -ENOSYS;
 }
@@ -248,6 +251,8 @@ void virtio_bus_stop_ioeventfd(VirtioBusState *bus)
 VirtIODevice *vdev;
 VirtioDeviceClass *vdc;
 
+GLOBAL_STATE_CODE();
+
 if (!bus->ioeventfd_started) {
 return;
 }
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index a1c9dfa7bb..4f9a94f61b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -313,6 +313,8 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 uint16_t vector;
 hwaddr pa;
 
+GLOBAL_STATE_CODE();
+
 switch (addr) {
 case VIRTIO_PCI_GUEST_FEATURES:
 /* Guest does not negotiate properly?  We have to assume nothing. */
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9683b2e158..468e8f5ad0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2422,6 +2422,8 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
 VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
 trace_virtio_set_status(vdev, val);
 
+GLOBAL_STATE_CODE();
+
 if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
 if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
 val & VIRTIO_CONFIG_S_FEATURES_OK) {
@@ -2515,6 +2517,8 @@ void virtio_reset(void *opaque)
 VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
 int i;
 
+GLOBAL_STATE_CODE();
+
 virtio_set_status(vdev, 0);
 if (current_cpu) {
 /* Guest initiated reset */
@@ -3357,6 +3361,8 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
  uint32_t guest_features_lo = (vdev->guest_features & 0xffffffff);
 int i;
 
+GLOBAL_STATE_CODE();
+
 if (k->save_config) {
 k->save_config(qbus->parent, f);
 }
@@ -3508,6 +3514,8 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
version_id)
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
 
+GLOBAL_STATE_CODE();
+
 /*
  * We poison the endianness to ensure it does not get used before
  * subsections have been loaded.
-- 
2.38.1




[PATCH 8/8] virtio-blk: minimize virtio_blk_reset() AioContext lock region

2022-11-08 Thread Stefan Hajnoczi
blk_drain() needs the lock because it calls AIO_WAIT_WHILE().

The s->rq loop doesn't need the lock because dataplane has been stopped
when virtio_blk_reset() is called.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 771d87cfbe..0b411b3065 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -898,6 +898,7 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 ctx = blk_get_aio_context(s->blk);
 aio_context_acquire(ctx);
 blk_drain(s->blk);
+aio_context_release(ctx);
 
 /* We drop queued requests after blk_drain() because blk_drain() itself can
  * produce them. */
@@ -908,8 +909,6 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 virtio_blk_free_request(req);
 }
 
-aio_context_release(ctx);
-
 assert(!s->dataplane_started);
 blk_set_enable_write_cache(s->blk, s->original_wce);
 }
-- 
2.38.1




[PATCH 5/8] virtio-blk: mark IO_CODE functions

2022-11-08 Thread Stefan Hajnoczi
From: Emanuele Giuseppe Esposito 

Just as done in the block API, mark functions in virtio-blk
that are called also from iothread(s).

We know such functions are IO because many are blk_* callbacks,
running always in the device iothread, and remaining are propagated
from the leaf IO functions (if a function calls a IO_CODE function,
itself is categorized as IO_CODE too).

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20220609143727.1151816-7-eespo...@redhat.com>
---
 hw/block/dataplane/virtio-blk.c |  4 +++
 hw/block/virtio-blk.c   | 45 -
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 728c9cd86c..3593ac0e7b 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -63,6 +63,8 @@ static void notify_guest_bh(void *opaque)
 unsigned long bitmap[BITS_TO_LONGS(nvqs)];
 unsigned j;
 
+IO_CODE();
+
 memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
 memset(s->batch_notify_vqs, 0, sizeof(bitmap));
 
@@ -288,6 +290,8 @@ static void virtio_blk_data_plane_stop_bh(void *opaque)
 VirtIOBlockDataPlane *s = opaque;
 unsigned i;
 
+IO_CODE();
+
 for (i = 0; i < s->conf->num_queues; i++) {
 VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 02b213a140..f8fcf25292 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -39,6 +39,8 @@
 static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
 VirtIOBlockReq *req)
 {
+IO_CODE();
+
 req->dev = s;
 req->vq = vq;
 req->qiov.size = 0;
@@ -57,6 +59,8 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, 
unsigned char status)
 VirtIOBlock *s = req->dev;
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
+IO_CODE();
+
 trace_virtio_blk_req_complete(vdev, req, status);
 
  stb_p(&req->in->status, status);
@@ -76,6 +80,8 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, 
int error,
 VirtIOBlock *s = req->dev;
 BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);
 
+IO_CODE();
+
 if (action == BLOCK_ERROR_ACTION_STOP) {
 /* Break the link as the next request is going to be parsed from the
  * ring again. Otherwise we may end up doing a double completion! */
@@ -143,7 +149,9 @@ static void virtio_blk_flush_complete(void *opaque, int ret)
 VirtIOBlockReq *req = opaque;
 VirtIOBlock *s = req->dev;
 
+IO_CODE();
 aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
+
 if (ret) {
 if (virtio_blk_handle_rw_error(req, -ret, 0, true)) {
 goto out;
@@ -165,7 +173,9 @@ static void virtio_blk_discard_write_zeroes_complete(void 
*opaque, int ret)
 bool is_write_zeroes = (virtio_ldl_p(VIRTIO_DEVICE(s), >out.type) &
 ~VIRTIO_BLK_T_BARRIER) == 
VIRTIO_BLK_T_WRITE_ZEROES;
 
+IO_CODE();
 aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
+
 if (ret) {
 if (virtio_blk_handle_rw_error(req, -ret, false, is_write_zeroes)) {
 goto out;
@@ -198,6 +208,8 @@ static void virtio_blk_ioctl_complete(void *opaque, int 
status)
 struct virtio_scsi_inhdr *scsi;
 struct sg_io_hdr *hdr;
 
+IO_CODE();
+
 scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
 
 if (status) {
@@ -239,6 +251,8 @@ static VirtIOBlockReq *virtio_blk_get_request(VirtIOBlock 
*s, VirtQueue *vq)
 {
 VirtIOBlockReq *req = virtqueue_pop(vq, sizeof(VirtIOBlockReq));
 
+IO_CODE();
+
 if (req) {
 virtio_blk_init_request(s, vq, req);
 }
@@ -259,6 +273,8 @@ static int virtio_blk_handle_scsi_req(VirtIOBlockReq *req)
 BlockAIOCB *acb;
 #endif
 
+IO_CODE();
+
 /*
  * We require at least one output segment each for the virtio_blk_outhdr
  * and the SCSI command block.
@@ -357,6 +373,7 @@ fail:
 static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 {
 int status;
+IO_CODE();
 
 status = virtio_blk_handle_scsi_req(req);
 if (status != -EINPROGRESS) {
@@ -374,6 +391,8 @@ static inline void submit_requests(VirtIOBlock *s, 
MultiReqBuffer *mrb,
 bool is_write = mrb->is_write;
 BdrvRequestFlags flags = 0;
 
+IO_CODE();
+
 if (num_reqs > 1) {
 int i;
 struct iovec *tmp_iov = qiov->iov;
@@ -423,6 +442,8 @@ static int multireq_compare(const void *a, const void *b)
 const VirtIOBlockReq *req1 = *(VirtIOBlockReq **)a,
  *req2 = *(VirtIOBlockReq **)b;
 
+IO_CODE();
+
 /*
  * Note that we can't simply subtract sector_num1 from sector_num2
  * here as that could overflow the return value.
@@ -442,6 +463,8 @@ static void virtio_blk_submit_multireq(VirtIOBlock *s, 
MultiReqBuffer *mrb)
 uint32_t 

[PATCH 2/8] block-backend: enable_write_cache should be atomic

2022-11-08 Thread Stefan Hajnoczi
From: Emanuele Giuseppe Esposito 

It is read from IO_CODE and written with BQL held,
so setting it as atomic should be enough.

Also remove the aiocontext lock that was sporadically
taken around the set.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20220609143727.1151816-3-eespo...@redhat.com>
---
 block/block-backend.c | 6 +++---
 hw/block/virtio-blk.c | 4 
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index c0c7d56c8d..949418cad4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -60,7 +60,7 @@ struct BlockBackend {
  * can be used to restore those options in the new BDS on insert) */
 BlockBackendRootState root_state;
 
-bool enable_write_cache;
+bool enable_write_cache; /* Atomic */
 
 /* I/O stats (display with "info blockstats"). */
 BlockAcctStats stats;
@@ -1939,13 +1939,13 @@ bool blk_is_sg(BlockBackend *blk)
 bool blk_enable_write_cache(BlockBackend *blk)
 {
 IO_CODE();
-return blk->enable_write_cache;
+return qatomic_read(&blk->enable_write_cache);
 }
 
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce)
 {
 IO_CODE();
-blk->enable_write_cache = wce;
+qatomic_set(&blk->enable_write_cache, wce);
 }
 
 void blk_activate(BlockBackend *blk, Error **errp)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cdc6fd5979..96d00103a4 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -961,9 +961,7 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 
  memcpy(&blkcfg, config, s->config_size);
 
-aio_context_acquire(blk_get_aio_context(s->blk));
 blk_set_enable_write_cache(s->blk, blkcfg.wce != 0);
-aio_context_release(blk_get_aio_context(s->blk));
 }
 
 static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
@@ -1031,11 +1029,9 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
  * s->blk would erroneously be placed in writethrough mode.
  */
 if (!virtio_vdev_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) {
-aio_context_acquire(blk_get_aio_context(s->blk));
 blk_set_enable_write_cache(s->blk,
virtio_vdev_has_feature(vdev,
VIRTIO_BLK_F_WCE));
-aio_context_release(blk_get_aio_context(s->blk));
 }
 }
 
-- 
2.38.1




[PATCH 7/8] virtio-blk: don't acquire AioContext in virtio_blk_handle_vq()

2022-11-08 Thread Stefan Hajnoczi
There is no need to acquire AioContext in virtio_blk_handle_vq() because
no APIs used in the function require it and nothing else in the
virtio-blk code requires mutual exclusion anymore.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index faea045178..771d87cfbe 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -784,7 +784,6 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 
 IO_CODE();
 
-aio_context_acquire(blk_get_aio_context(s->blk));
 blk_io_plug(s->blk);
 
 do {
@@ -810,7 +809,6 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 }
 
 blk_io_unplug(s->blk);
-aio_context_release(blk_get_aio_context(s->blk));
 }
 
 static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
-- 
2.38.1




Re: [PATCH v1 2/9] tests/avocado: improve behaviour waiting for login prompts

2022-11-08 Thread John Snow
On Tue, Nov 8, 2022 at 4:26 AM Alex Bennée  wrote:
>
> This attempts to deal with the problem of login prompts not being
> guaranteed to be terminated with a newline. The solution to this is to
> peek at the incoming data looking to see if we see an up-coming match
> before we fall back to the old readline() logic. The reason to mostly
> rely on readline is because I am occasionally seeing the peek stalling
> despite data being there.
>
> This seems kinda hacky and gross so I'm open to alternative approaches
> and cleaner python code.

Hm, yeah. I'm not too sure. I guess if it works, it works -- I don't
have better suggestions for you here. I need to rewrite a good bit of
how machine.py works, and time will tell if we still need this kind of
workaround when I do. I guess if it doesn't hurt anything, go for it.

*shrug*

>
> Signed-off-by: Alex Bennée 
> ---
>  tests/avocado/avocado_qemu/__init__.py | 89 +-
>  1 file changed, 88 insertions(+), 1 deletion(-)
>
> diff --git a/tests/avocado/avocado_qemu/__init__.py 
> b/tests/avocado/avocado_qemu/__init__.py
> index 910f3ba1ea..d6ff68e171 100644
> --- a/tests/avocado/avocado_qemu/__init__.py
> +++ b/tests/avocado/avocado_qemu/__init__.py
> @@ -131,6 +131,58 @@ def pick_default_qemu_bin(bin_prefix='qemu-system-', 
> arch=None):
>  return path
>  return None
>
> +def _peek_ahead(console, min_match, success_message):
> +"""
> +peek ahead in the console stream keeping an eye out for the
> +success message.
> +
> +Returns some message to process or None, indicating the normal
> +readline should occur.
> +"""
> +console_logger = logging.getLogger('console')
> +peek_len = 0
> +retries = 0
> +
> +while True:
> +try:
> +old_peek_len = peek_len
> +peek_ahead = console.peek(min_match).decode()
> +peek_len = len(peek_ahead)
> +
> +# if we get stuck too long lets just fallback to readline
> +if peek_len <= old_peek_len:
> +retries += 1
> +if retries > 10:
> +return None
> +
> +# if we see a newline in the peek we can let safely bail
> +# and let the normal readline() deal with it
> +if peek_ahead.endswith(('\n', '\r', '\r\n')):
> +return None
> +
> +# if we haven't seen enough for the whole message but the
> +# first part matches lets just loop again
> +if len(peek_ahead) < min_match and \
> +   success_message[:peek_len] in peek_ahead:
> +continue
> +
> +# if we see the whole success_message we are done, consume
> +# it and pass back so we can exit to the user
> +if success_message in peek_ahead:
> +return console.read(peek_len).decode()
> +
> +# of course if we've seen enough then this line probably
> +# doesn't contain what we are looking for, fallback
> +if peek_len > min_match:
> +return None
> +
> +except UnicodeDecodeError:
> +console_logger.log("error in decode of peek")
> +return None
> +
> +# we should never get here
> +return None
> +
>
>  def _console_interaction(test, success_message, failure_message,
>   send_string, keep_sending=False, vm=None):
> @@ -139,17 +191,52 @@ def _console_interaction(test, success_message, 
> failure_message,
>  vm = test.vm
>  console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
>  console_logger = logging.getLogger('console')
> +
> +# Establish the minimum number of bytes we would need to
> +# potentially match against success_message
> +if success_message is not None:
> +min_match = len(success_message)
> +else:
> +min_match = 0
> +
> +console_logger.debug("looking for %d:(%s), sending %s (always=%s)",
> + min_match, success_message, send_string, 
> keep_sending)
> +
>  while True:
> +msg = None
> +
> +# First send our string, optionally repeating the send next
> +# cycle.
>  if send_string:
>  vm.console_socket.sendall(send_string.encode())
>  if not keep_sending:
>  send_string = None # send only once
> +
> +# If the console has no data to read we briefly
> +# sleep before continuing.
> +if not console.readable():
> +time.sleep(0.1)
> +continue
> +
>  try:
> -msg = console.readline().decode().strip()
> +
> +# First we shall peek ahead for a potential match to cover 
> waiting
> +# for lines without any newlines.
> +if min_match > 0:
> +msg = _peek_ahead(console, min_match, success_message)
> +
> +# otherwise we block here for a full line
> +if not 

[PATCH 0/8] virtio-blk: remove AioContext lock

2022-11-08 Thread Stefan Hajnoczi
This is a continuation of Emanuele Esposito's work to remove the AioContext
lock in virtio-blk. In the past it was necessary to acquire the AioContext lock
in order to do I/O. Paolo Bonzini and Emanuele have removed the need for the
AioContext in the core block layer code, with a few exceptions like blk_drain()
and blk_set_aio_context().

This patch series annotates virtio-blk functions with
IO_CODE()/GLOBAL_STATE_CODE() so it's clear in which context they are called.
It also removes unnecessary AioContext locks.

The end result is that virtio-blk rarely takes the AioContext lock. Future
patch series will add support for multiple IOThreads so that true multi-queue
can be achieved, but right now virtio-blk still has unprotected shared state
like s->rq so that needs to wait for another day.
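
To illustrate the annotation convention, here is a minimal sketch (the two
functions below are made up for illustration and are not part of this series);
the macros simply mark, and where possible assert, the context in which a
function may run:

    /* runs only in the main loop thread with the BQL held */
    static void virtio_blk_example_gs(VirtIOBlock *s)
    {
        GLOBAL_STATE_CODE();
        /* ... e.g. starting/stopping dataplane, attaching notifiers ... */
    }

    /* may run in the device's IOThread (or the main loop) */
    static void virtio_blk_example_io(VirtIOBlockReq *req)
    {
        IO_CODE();
        /* ... per-request work that must not assume the BQL is held ... */
    }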

Based-on: <20221102182337.252202-1-stefa...@redhat.com>

Emanuele Giuseppe Esposito (6):
  virtio_queue_aio_attach_host_notifier: remove AioContext lock
  block-backend: enable_write_cache should be atomic
  virtio: categorize callbacks in GS
  virtio-blk: mark GLOBAL_STATE_CODE functions
  virtio-blk: mark IO_CODE functions
  virtio-blk: remove unnecessary AioContext lock from function already
safe

Stefan Hajnoczi (2):
  virtio-blk: don't acquire AioContext in virtio_blk_handle_vq()
  virtio-blk: minimize virtio_blk_reset() AioContext lock region

 include/block/aio-wait.h|  4 +-
 block/block-backend.c   |  6 +--
 hw/block/dataplane/virtio-blk.c | 18 +--
 hw/block/virtio-blk.c   | 92 -
 hw/scsi/virtio-scsi-dataplane.c | 10 ++--
 hw/virtio/virtio-bus.c  |  5 ++
 hw/virtio/virtio-pci.c  |  2 +
 hw/virtio/virtio.c  |  8 +++
 util/aio-wait.c |  2 +-
 9 files changed, 106 insertions(+), 41 deletions(-)

-- 
2.38.1




[PATCH 1/8] virtio_queue_aio_attach_host_notifier: remove AioContext lock

2022-11-08 Thread Stefan Hajnoczi
From: Emanuele Giuseppe Esposito 

virtio_queue_aio_attach_host_notifier() and
virtio_queue_aio_attach_host_notifier_nopoll() run always in the
main loop, so there is no need to protect them with AioContext
lock.

On the other side, virtio_queue_aio_detach_host_notifier() runs
in a bh in the iothread context, but it is always scheduled
(thus serialized) by the main loop. Therefore removing the
AioContext lock is safe.

In order to remove the AioContext lock it is necessary to switch
aio_wait_bh_oneshot() to AIO_WAIT_WHILE_UNLOCKED(). virtio-blk and
virtio-scsi are the only users of aio_wait_bh_oneshot() so it is
possible to make this change.

For now bdrv_set_aio_context() still needs the AioContext lock.
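
A sketch of the changed locking contract (illustrative only, not the actual
util/aio-wait.c hunk):

    /* old contract: the caller holds ctx exactly once around the wait */
    aio_context_acquire(ctx);
    AIO_WAIT_WHILE(ctx, !done);
    aio_context_release(ctx);

    /* new contract used by aio_wait_bh_oneshot(): ctx must NOT be held */
    AIO_WAIT_WHILE_UNLOCKED(ctx, !done);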

Signed-off-by: Emanuele Giuseppe Esposito 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20220609143727.1151816-2-eespo...@redhat.com>
---
 include/block/aio-wait.h|  4 ++--
 hw/block/dataplane/virtio-blk.c | 10 ++
 hw/block/virtio-blk.c   |  2 ++
 hw/scsi/virtio-scsi-dataplane.c | 10 --
 util/aio-wait.c |  2 +-
 5 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index dd9a7f6461..fce6bfee3a 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -131,8 +131,8 @@ void aio_wait_kick(void);
  *
  * Run a BH in @ctx and wait for it to complete.
  *
- * Must be called from the main loop thread with @ctx acquired exactly once.
- * Note that main loop event processing may occur.
+ * Must be called from the main loop thread. @ctx must not be acquired by the
+ * caller. Note that main loop event processing may occur.
  */
 void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
 
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index b28d81737e..975f5ca8c4 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -167,6 +167,8 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 Error *local_err = NULL;
 int r;
 
+GLOBAL_STATE_CODE();
+
 if (vblk->dataplane_started || s->starting) {
 return 0;
 }
@@ -245,13 +247,11 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 }
 
 /* Get this show started by hooking up our callbacks */
-aio_context_acquire(s->ctx);
 for (i = 0; i < nvqs; i++) {
 VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
 virtio_queue_aio_attach_host_notifier(vq, s->ctx);
 }
-aio_context_release(s->ctx);
 return 0;
 
   fail_aio_context:
@@ -301,6 +301,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 unsigned i;
 unsigned nvqs = s->conf->num_queues;
 
+GLOBAL_STATE_CODE();
+
 if (!vblk->dataplane_started || s->stopping) {
 return;
 }
@@ -314,9 +316,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 s->stopping = true;
 trace_virtio_blk_data_plane_stop(s);
 
-aio_context_acquire(s->ctx);
 aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
 
+aio_context_acquire(s->ctx);
+
 /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
 blk_drain(s->conf->conf.blk);
 
@@ -325,7 +328,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
  * BlockBackend in the iothread, that's ok
  */
 blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context(), NULL);
-
 aio_context_release(s->ctx);
 
 /*
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 1762517878..cdc6fd5979 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -100,6 +100,8 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 VirtIOBlock *s = next->dev;
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
+IO_CODE();
+
 aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 while (next) {
 VirtIOBlockReq *req = next;
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 20bb91766e..f6f55d4511 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -91,6 +91,8 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
+GLOBAL_STATE_CODE();
+
 if (s->dataplane_started ||
 s->dataplane_starting ||
 s->dataplane_fenced) {
@@ -138,20 +140,18 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
 /*
  * These fields are visible to the IOThread so we rely on implicit barriers
- * in aio_context_acquire() on the write side and aio_notify_accept() on
- * the read side.
+ * in virtio_queue_aio_attach_host_notifier() on the write side and
+ * aio_notify_accept() on the read side.
  */
 s->dataplane_starting = false;
 s->dataplane_started = true;
 
-aio_context_acquire(s->ctx);
 virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
 

Re: [PATCH v3 4/4] scripts: add script to compare compatible properties

2022-11-08 Thread John Snow
On Thu, Nov 3, 2022 at 6:29 AM Maksim Davydov
 wrote:
>
> This script run QEMU to obtain compat_props of machines and default
> values of different types and produce appropriate table. This table
> can be used to compare machine types to choose the most suitable
> machine. Also this table in json or csv format should be used to check that
> new machine doesn't affect previous ones by comparing tables with and
> without new machine.
> Default values of properties are needed to fill "holes" in the table (one
> machine has these properties and another not. For instance, 2.12 mt has
> `{ "EPYC-" TYPE_X86_CPU, "xlevel", "0x800a" }`, but compat_pros of
> 3.1 mt doesn't have it. So, to compare these machines we need to fill
> unknown value of "EPYC-x86_64-cpu-xlevel" for 3.1 mt. This unknown value
> in the table I called "hole". To get values (default values) for these
> "holes" the script uses list of appropriate methods.)
>
> Notes:
> * some init values from the devices can't be available like properties
>   from virtio-9p when configure has --disable-virtfs. This situations will
>   be seen in the table as "unavailable driver".
> * Default values can be obtained in an unobvious way, like x86 features.
>   If the script doesn't know how to get property default value to compare
>   one machine with another it fills "holes" with "unavailable method". This
>   is done because script uses whitelist model to get default values of
>   different types. It means that the method that can't be applied to a new
>   type that can crash this script. It is better to get an "unavailable
>   driver" when creating a new machine with new compatible properties than
>   to break this script. So it turns out a more stable and generic script.
> * If the default value can't be obtained because this property doesn't
>   exist or because this property can't have default value, appropriate
>   "hole" will be filled by "unknown property" or "no default value"
> * If the property is applied to the abstract class, the script collects
>   default values from all child classes (set of default values)
>
> Example:
>
> ./scripts/compare_mt.py --mt pc-q35-3.1 pc-q35-4.0
>
> ╒════════════════════════════════════════════════════════════╤════════════╤════════════════════╕
> │                                                            │ pc-q35-3.1 │ pc-q35-4.0         │
> ╞════════════════════════════════════════════════════════════╪════════════╪════════════════════╡
> │ Cascadelake-Server-x86_64-cpu:mpx                          │ True       │ False              │
> │ Cascadelake-Server-x86_64-cpu:stepping                     │ 5          │ 6                  │
> │ Icelake-Client-x86_64-cpu:mpx                              │ True       │ unavailable driver │
> │ Icelake-Server-x86_64-cpu:mpx                              │ True       │ False              │
> │ Opteron_G3-x86_64-cpu:rdtscp                               │ False      │ True               │
> │ Opteron_G4-x86_64-cpu:rdtscp                               │ False      │ True               │
> │ Opteron_G5-x86_64-cpu:rdtscp                               │ False      │ True               │
> │ Skylake-Client-IBRS-x86_64-cpu:mpx                         │ True       │ False              │
> │ Skylake-Client-x86_64-cpu:mpx                              │ True       │ False              │
> │ Skylake-Server-IBRS-x86_64-cpu:mpx                         │ True       │ False              │
> │ Skylake-Server-x86_64-cpu:mpx                              │ True       │ False              │
> │ intel-iommu:dma-drain                                      │ False      │ True               │
> │ memory-backend-file:x-use-canonical-path-for-ramblock-id  │ True       │

Re: [PULL 0/3] Memory/SDHCI/ParallelFlash patches for v7.2.0-rc0

2022-11-08 Thread Stefan Hajnoczi
I've dropped the SDHCI CVE fix due to the CI failure.

The rest of the commits are still in the staging tree and I plan to
include them in v7.2.0-rc0.

Stefan



Re: [PULL 0/3] Memory/SDHCI/ParallelFlash patches for v7.2.0-rc0

2022-11-08 Thread Stefan Hajnoczi
On Tue, 8 Nov 2022 at 13:35, Philippe Mathieu-Daudé  wrote:
>
> The following changes since commit ade760a2f63804b7ab1839fbc3e5ddbf30538718:
>
>   Merge tag 'pull-request-2022-11-08' of https://gitlab.com/thuth/qemu into 
> staging (2022-11-08 11:34:06 -0500)
>
> are available in the Git repository at:
>
>   https://github.com/philmd/qemu.git tags/memflash-20221108
>
> for you to fetch changes up to cf9b3efd816518f9f210f50a0fa3e46a00b33c27:
>
>   Revert "hw/block/pflash_cfi: Error out if dev length isn't power of 2" 
> (2022-11-08 19:29:25 +0100)
>
> 
> Memory/SDHCI/ParallelFlash patches queue
>
> - Fix wrong end address dump in 'info mtree' (Zhenzhong Duan)
> - Fix in SDHCI for CVE-2022-3872 (myself)

There is a CI failure:

>>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh 
>>> MALLOC_PERTURB_=127 QTEST_QEMU_BINARY=./qemu-system-arm 
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>> QTEST_QEMU_IMG=./qemu-img 
>>> /builds/qemu-project/qemu/build/tests/qtest/npcm7xx_sdhci-test --tap -k
― ✀ ―
stderr:
** Message: 19:27:52.411: /tmp/sdhci_ZD2EV1
**
ERROR:../tests/qtest/npcm7xx_sdhci-test.c:101:sdwrite_read: assertion
failed: (!memcmp(rmsg, msg, len))

https://gitlab.com/qemu-project/qemu/-/jobs/3292896670

Stefan



Re: [PATCH v3 2/4] python/qmp: increase read buffer size

2022-11-08 Thread John Snow
On Thu, Nov 3, 2022 at 6:29 AM Maksim Davydov
 wrote:
>
> After modification of "query-machines" command the buffer size should be
> more than 452kB to contain output with compat-props.
>
> Signed-off-by: Maksim Davydov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  python/qemu/qmp/qmp_client.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/python/qemu/qmp/qmp_client.py b/python/qemu/qmp/qmp_client.py
> index 5dcda04a75..659fe4d98c 100644
> --- a/python/qemu/qmp/qmp_client.py
> +++ b/python/qemu/qmp/qmp_client.py
> @@ -197,8 +197,8 @@ async def run(self, address='/tmp/qemu.socket'):
>  #: Logger object used for debugging messages.
>  logger = logging.getLogger(__name__)
>
> -# Read buffer limit; large enough to accept query-qmp-schema
> -_limit = (256 * 1024)
> +# Read buffer limit; large enough to accept query-machines
> +_limit = (512 * 1024)

wow :)

>
>  # Type alias for pending execute() result items
>  _PendingT = Union[Message, ExecInterruptedError]
> --
> 2.25.1
>

If you would please submit this to
https://gitlab.com/qemu-project/python-qemu-qmp I can get it merged
there quickly, then backport it to qemu.git.
Or, if you don't have a gitlab account (and do not want one), please
let me know and I'll port it there myself so you don't have to.

thanks,
--js




Re: [PATCH v3 08/17] migration/qemu-file: Add qemu_file_get_to_fd()

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/3/22 19:16, Avihai Horon wrote:

Add new function qemu_file_get_to_fd() that allows reading data from
QEMUFile and writing it straight into a given fd.

This will be used later in VFIO migration code.

Signed-off-by: Avihai Horon


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




Re: [PULL v4 29/83] virtio: introduce virtio_queue_enable()

2022-11-08 Thread Stefan Hajnoczi
On Mon, 7 Nov 2022 at 18:10, Michael S. Tsirkin  wrote:
>
> From: Kangjie Xu 
>
> Introduce the interface queue_enable() in VirtioDeviceClass and the
> function virtio_queue_enable() in virtio, it can be called when
> VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
> started. It only supports the devices of virtio 1 or later. The
> not-supported devices can only start the virtqueue when DRIVER_OK.
>
> Signed-off-by: Kangjie Xu 
> Signed-off-by: Xuan Zhuo 
> Acked-by: Jason Wang 
> Message-Id: <20221017092558.111082-4-xuanz...@linux.alibaba.com>
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/hw/virtio/virtio.h |  2 ++
>  hw/virtio/virtio.c | 14 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 74d76c1dbc..b00b3fcf31 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -149,6 +149,7 @@ struct VirtioDeviceClass {
>  void (*reset)(VirtIODevice *vdev);
>  void (*set_status)(VirtIODevice *vdev, uint8_t val);
>  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
> +void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
>  /* For transitional devices, this is a bitmap of features
>   * that are only exposed on the legacy interface but not
>   * the modern one.
> @@ -288,6 +289,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, 
> int n,
>  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
>  void virtio_reset(void *opaque);
>  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
> +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
>  void virtio_update_irq(VirtIODevice *vdev);
>  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index cf5f9ca387..9683b2e158 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2495,6 +2495,20 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
> queue_index)
>  __virtio_queue_reset(vdev, queue_index);
>  }
>
> +void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
> +{
> +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
> +
> +if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> +error_report("queue_enable is only suppported in devices of virtio "
> + "1.0 or later.");

Why is this triggering here? Maybe virtio_queue_enable() is called too
early. I have verified that the Linux guest driver sets VERSION_1. I
didn't check what SeaBIOS does.

$ build/qemu-system-x86_64 -M accel=kvm -m 1G -cpu host -blockdev
file,node-name=drive0,filename=test.img -device
virtio-blk-pci,drive=drive0
qemu: queue_enable is only suppported in devices of virtio 1.0 or later.

Stefan



Re: [PATCH-for-7.2 1/2] hw/sd/sdhci: Do not set Buf Wr Ena before writing block (CVE-2022-3872)

2022-11-08 Thread Alexander Bulekov
On 221108 1225, Alexander Bulekov wrote:
> On 221107 2312, Philippe Mathieu-Daudé wrote:
> > When sdhci_write_block_to_card() is called to transfer data from
> > the FIFO to the SD bus, the data is already present in the buffer
> > and we have to consume it directly.
> > 
> > See the description of the 'Buffer Write Enable' bit from the
> > 'Present State' register (prnsts::SDHC_SPACE_AVAILABLE) in Table
> > 2.14 from the SDHCI spec v2:
> > 
> >   Buffer Write Enable
> > 
> >   This status is used for non-DMA write transfers.
> > 
> >   The Host Controller can implement multiple buffers to transfer
> >   data efficiently. This read only flag indicates if space is
> >   available for write data. If this bit is 1, data can be written
> >   to the buffer. A change of this bit from 1 to 0 occurs when all
> >   the block data is written to the buffer. A change of this bit
> >   from 0 to 1 occurs when top of block data can be written to the
> >   buffer and generates the Buffer Write Ready interrupt.
> > 
> > In our case, we do not want to overwrite the buffer, so we want
> > this bit to be 0, then set it to 1 once the data is written onto
> > the bus.
> > 
> > This is probably a copy/paste error from commit d7dfca0807
> > ("hw/sdhci: introduce standard SD host controller").
> > 
> > Reproducer:
> > https://lore.kernel.org/qemu-devel/caa8xkjxrms0fkr28akvnnpyatm0y0b+5fichpsrhd+mugnu...@mail.gmail.com/
> > 
> > Fixes: CVE-2022-3872
> > Reported-by: RivenDell 
> > Reported-by: Siqi Chen 
> > Reported-by: ningqiang 
> > Signed-off-by: Philippe Mathieu-Daudé 
> 
> Seems like OSS-Fuzz also found this, not sure why it never made it into
> a gitlab issue:
> https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=45986#c4
> 
> Slightly shorter reproducer:
> 
> cat << EOF | ./qemu-system-i386 -display none -machine accel=qtest, -m \
> 512M -nodefaults -device sdhci-pci -device sd-card,drive=mydrive -drive \
> if=none,index=0,file=null-co://,format=raw,id=mydrive -nographic -qtest \
> stdio
> outl 0xcf8 0x80001010
> outl 0xcfc 0xe000
> outl 0xcf8 0x80001001
> outl 0xcfc 0x0600
> write 0xe058 0x1 0x6e
> write 0xe059 0x1 0x5a
> write 0xe028 0x1 0x10
> write 0xe02c 0x1 0x05
> write 0x5a6e 0x1 0x21
> write 0x5a75 0x1 0x20
> write 0xe005 0x1 0x02
> write 0xe00c 0x1 0x01
> write 0xe00e 0x1 0x20
> write 0xe00f 0x1 0x00
> write 0xe00c 0x1 0x00
> write 0xe020 0x1 0x00
> EOF

Hi Philippe,
I ran the fuzzer with this series applied and it found:

cat << EOF | ./qemu-system-i386 -display none -machine accel=qtest, -m \
512M -nodefaults -device sdhci-pci -device sd-card,drive=mydrive -drive \
if=none,index=0,file=null-co://,format=raw,id=mydrive -nographic -qtest \
stdio
outl 0xcf8 0x80001010
outl 0xcfc 0xe000
outl 0xcf8 0x80001004
outw 0xcfc 0x06
write 0xe028 0x1 0x08
write 0xe02c 0x1 0x05
write 0xe005 0x1 0x02
write 0x0 0x1 0x21
write 0x3 0x1 0x20
write 0xe00c 0x1 0x01
write 0xe00e 0x1 0x20
write 0xe00f 0x1 0x00
write 0xe00c 0x1 0x20
write 0xe020 0x1 0x00
EOF

The crash seems very similar, but it looks like instead of
SDHC_TRNS_READ this reproducer sets SDHC_TRNS_MULTI
-Alex



Re: [PATCH v3 06/17] vfio/migration: Fix NULL pointer dereference bug

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/3/22 19:16, Avihai Horon wrote:

As part of its error flow, vfio_vmstate_change() accesses
MigrationState->to_dst_file without any checks. This can cause a NULL
pointer dereference if the error flow is taken and
MigrationState->to_dst_file is not set.

For example, this can happen if VM is started or stopped not during
migration and vfio_vmstate_change() error flow is taken, as
MigrationState->to_dst_file is not set at that time.

Fix it by checking that MigrationState->to_dst_file is set before using
it.

Fixes: 02a7e71b1e5b ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Avihai Horon
Reviewed-by: Juan Quintela



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




Re: [PATCH v3 05/17] vfio/migration: Fix wrong enum usage

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/3/22 19:16, Avihai Horon wrote:

vfio_migration_init() initializes VFIOMigration->device_state using enum
of VFIO migration protocol v2. Current implemented protocol is v1 so v1
enum should be used. Fix it.

Fixes: 429c72800654 ("vfio/migration: Fix incorrect initialization value for 
parameters in VFIOMigration")
Signed-off-by: Avihai Horon
Reviewed-by: Zhang Chen


the commit is already in master branch

--
Best regards,
Vladimir




Re: [PULL 0/5] s390x fix and white space cleanup

2022-11-08 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.




Re: [PULL 00/14] MIPS patches for 2022-11-08

2022-11-08 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.




Re: [PATCH v3 04/17] migration: Simplify migration_iteration_run()

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/3/22 19:16, Avihai Horon wrote:

From: Juan Quintela 

Signed-off-by: Juan Quintela 
Signed-off-by: Avihai Horon 
---
  migration/migration.c | 25 +
  1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ffe868b86f..59cc3c309b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3743,23 +3743,24 @@ static MigIterateState 
migration_iteration_run(MigrationState *s)
  
  trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_post);
  
-if (pending_size && pending_size >= s->threshold_size) {

-/* Still a significant amount to transfer */
-if (!in_postcopy && pend_pre <= s->threshold_size &&
-qatomic_read(&s->start_postcopy)) {
-if (postcopy_start(s)) {
-error_report("%s: postcopy failed to start", __func__);
-}
-return MIG_ITERATE_SKIP;
-}
-/* Just another iteration step */
-qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
-} else {
+
+if (pending_size < s->threshold_size) {


Is the corner case "pending_size == s->threshold_size == 0" theoretically possible 
here? In that case, before this patch we go to completion; after it we go to the next iteration.


  trace_migration_thread_low_pending(pending_size);
  migration_completion(s);
  return MIG_ITERATE_BREAK;
  }
  
+/* Still a significant amount to transfer */

+if (!in_postcopy && pend_pre <= s->threshold_size &&
+qatomic_read(&s->start_postcopy)) {
+if (postcopy_start(s)) {
+error_report("%s: postcopy failed to start", __func__);
+}
+return MIG_ITERATE_SKIP;
+}
+
+/* Just another iteration step */
+qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
  return MIG_ITERATE_RESUME;
  }
  


--
Best regards,
Vladimir
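
To make the corner case above concrete, here is a small hypothetical guard that
would preserve the pre-patch behaviour when both pending_size and
s->threshold_size are zero; a sketch for discussion only, not a proposed patch:

    /*
     * Pre-patch:  if (pending_size && pending_size >= s->threshold_size)
     *             is false for 0/0, so we fall through to completion.
     * Post-patch: if (pending_size < s->threshold_size)
     *             is also false for 0/0, so we keep iterating instead.
     */
    if (!pending_size || pending_size < s->threshold_size) {
        trace_migration_thread_low_pending(pending_size);
        migration_completion(s);
        return MIG_ITERATE_BREAK;
    }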




Re: [PULL 00/55] MIPS patches for 2022-10-30

2022-11-08 Thread Paolo Bonzini
On Tue, 8 Nov 2022 at 16:09, Thomas Huth  wrote:

> >> If it is the last thing, we should put in the "Build Dependencies"
> >> part of the release notes that a C++ compiler is no longer required
> >> and mention that the configure options to specify it will go away in
> >> a future release.
> >
> > I guess the last use is from the Guest Agent on Windows...
> >
> > $ git ls-files | fgrep .cpp
> > qga/vss-win32/install.cpp
> > qga/vss-win32/provider.cpp
> > qga/vss-win32/requester.cpp
>
> Yes, I think the c++ configure options are still required for that Windows
> stuff ... but IIRC Paolo once mentioned that we could simplify the linker
> logic in configure or meson.build once the nanomips stuff has been
> converted, since we now do not have to mix C and C++ linkage anymore?
>

Yes, it can be simplified to remove the link_language checks.

Paolo

>


Re: [PATCH v3 03/17] migration: Block migration comment or code is wrong

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/8/22 21:36, Vladimir Sementsov-Ogievskiy wrote:

On 11/3/22 19:16, Avihai Horon wrote:

From: Juan Quintela 

And it appears that what is wrong is the code. During bulk stage we
need to make sure that some block is dirty, but no games with
max_size at all.


:) That made me interested in why we need this one block, so I decided to 
search through the history.

And what do I see? Haha, that was my commit 04636dc410b163c "migration/block: fix 
pending() return value" [1], which you actually revert with this patch.

So, at least we should note that it's a revert of [1].

Still, this will reintroduce the bug fixed by [1].

As I understand it, the problem is (was) that in block_save_complete() we finalize 
only dirty blocks, but don't finalize the bulk phase if it hasn't finished yet. 
So, we can fix block_save_complete() to finalize the bulk phase, instead of 
hacking with pending in [1].

It is interesting why we need this one block, described in the comment you refer to. 
Was it an incomplete workaround for the same problem, described in [1]? If so, 
we can fix block_save_complete() and remove this if() together with the comment 
from block_save_pending().
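
As a rough sketch of that idea (not a tested patch), the completion path could
drain the remaining bulk blocks itself, so block_save_pending() no longer has to
report a fake pending block. The helper and flag names below
(blk_mig_save_bulked_block(), block_mig_state.bulk_completed) are assumed to
behave as they do in migration/block.c:

    /* Finish whatever the iterative stage did not manage to send. */
    static void blk_mig_finish_bulk_phase(QEMUFile *f)
    {
        while (!block_mig_state.bulk_completed) {
            blk_mig_save_bulked_block(f);
        }
    }

block_save_complete() would then call this before flushing the dirty blocks it
already handles today.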



PS: Don't we want to deprecate block migration? Is it really used in 
production?  block-mirror is a recommended way to migrate block devices.

--
Best regards,
Vladimir




Re: [PATCH v3 03/17] migration: Block migration comment or code is wrong

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/3/22 19:16, Avihai Horon wrote:

From: Juan Quintela 

And it appears that what is wrong is the code. During bulk stage we
need to make sure that some block is dirty, but no games with
max_size at all.


:) That made me interested in why we need this one block, so I decided to 
search through the history.

And what do I see? Haha, that was my commit 04636dc410b163c "migration/block: fix 
pending() return value" [1], which you actually revert with this patch.

So, at least we should note that it's a revert of [1].

Still, this will reintroduce the bug fixed by [1].

As I understand it, the problem is (was) that in block_save_complete() we finalize 
only dirty blocks, but don't finalize the bulk phase if it hasn't finished yet. 
So, we can fix block_save_complete() to finalize the bulk phase, instead of 
hacking with pending in [1].

It is interesting why we need this one block, described in the comment you refer to. 
Was it an incomplete workaround for the same problem, described in [1]? If so, 
we can fix block_save_complete() and remove this if() together with the comment 
from block_save_pending().



Signed-off-by: Juan Quintela 
Reviewed-by: Stefan Hajnoczi 
---
  migration/block.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index b3d680af75..39ce4003c6 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -879,8 +879,8 @@ static void block_save_pending(void *opaque, uint64_t 
max_size,
  blk_mig_unlock();
  
  /* Report at least one block pending during bulk phase */

-if (pending <= max_size && !block_mig_state.bulk_completed) {
-pending = max_size + BLK_MIG_BLOCK_SIZE;
+if (!pending && !block_mig_state.bulk_completed) {
+pending = BLK_MIG_BLOCK_SIZE;
  }
  
  trace_migration_block_save_pending(pending);


--
Best regards,
Vladimir




[PULL 3/3] Revert "hw/block/pflash_cfi: Error out if dev length isn't power of 2"

2022-11-08 Thread Philippe Mathieu-Daudé
From: Daniel Henrique Barboza 

Commit 334c388f25 ("pflash_cfi: Error out if device length
isn't a power of two") aimed to finish the effort started by
commit 06f1521795 ("pflash: Require backend size to match device,
improve errors"), but unfortunately we are not quite there since
various machines are still ready to accept incomplete / oversized
pflash backend images, and now fail, i.e. on Debian bullseye:

 $ qemu-system-x86_64 \
   -drive \
   if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd
 qemu-system-x86_64: Device size must be a power of two.

where OVMF_CODE.fd comes from the ovmf package, which doesn't
pad the firmware images to the flash size:

 $ ls -lh /usr/share/OVMF/
 -rw-r--r-- 1 root root 3.5M Aug 19  2021 OVMF_CODE_4M.fd
 -rw-r--r-- 1 root root 1.9M Aug 19  2021 OVMF_CODE.fd
 -rw-r--r-- 1 root root 128K Aug 19  2021 OVMF_VARS.fd

Since we entered the freeze period to prepare the v7.2.0 release,
the safest is to revert commit 334c388f25707a234c4a0dea05b9df08d.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1294
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20221108175755.95141-1-phi...@linaro.org>
Signed-off-by: Daniel Henrique Barboza 
Message-Id: <20221108172633.860700-1-danielhb...@gmail.com>
---
 hw/block/pflash_cfi01.c | 8 ++--
 hw/block/pflash_cfi02.c | 5 -
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 9c235bf66e..0cbc2fb4cb 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -690,7 +690,7 @@ static const MemoryRegionOps pflash_cfi01_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl, Error **errp)
+static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl)
 {
 uint64_t blocks_per_device, sector_len_per_device, device_len;
 int num_devices;
@@ -708,10 +708,6 @@ static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl, 
Error **errp)
 sector_len_per_device = pfl->sector_len / num_devices;
 }
 device_len = sector_len_per_device * blocks_per_device;
-if (!is_power_of_2(device_len)) {
-error_setg(errp, "Device size must be a power of two.");
-return;
-}
 
 /* Hardcoded CFI table */
 /* Standard "QRY" string */
@@ -869,7 +865,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
  */
 pfl->cmd = 0x00;
 pfl->status = 0x80; /* WSM ready */
-pflash_cfi01_fill_cfi_table(pfl, errp);
+pflash_cfi01_fill_cfi_table(pfl);
 }
 
 static void pflash_cfi01_system_reset(DeviceState *dev)
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index ff2fe154c1..2a99b286b0 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -880,11 +880,6 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (!is_power_of_2(pfl->chip_len)) {
-error_setg(errp, "Device size must be a power of two.");
-return;
-}
-
 memory_region_init_rom_device(&pfl->orig_mem, OBJECT(pfl),
   &pflash_cfi02_ops, pfl, pfl->name,
   pfl->chip_len, errp);
-- 
2.38.1




[PULL 1/3] memory: Fix wrong end address dump

2022-11-08 Thread Philippe Mathieu-Daudé
From: Zhenzhong Duan 

The end address of memory region section isn't correctly calculated
which leads to overflowed mtree dump:

  Dispatch
Physical sections
  ..
  #70 @2000..00011fff io [ROOT]
  #71 @5000..5fff (noname)
  #72 @5000..00014fff io [ROOT]
  #73 @5658..5658 vmport
  #74 @5659..00015658 io [ROOT]
  #75 @6000..00015fff io [ROOT]

After fix:
  #70 @2000..4fff io [ROOT]
  #71 @5000..5fff (noname)
  #72 @5000..5657 io [ROOT]
  #73 @5658..5658 vmport
  #74 @5659..5fff io [ROOT]
  #75 @6000.. io [ROOT]

Fixes: 5e8fd947e2670 ("memory: Rework "info mtree" to print flat views and 
dispatch trees")
Signed-off-by: Zhenzhong Duan 
Reviewed-by: David Hildenbrand 
Reviewed-by: Peter Xu 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20220622095912.3430583-1-zhenzhong.d...@intel.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 softmmu/physmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index d9578ccfd4..1b606a3002 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3712,7 +3712,7 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, 
MemoryRegion *root)
 " %s%s%s%s%s",
 i,
 s->offset_within_address_space,
-s->offset_within_address_space + MR_SIZE(s->mr->size),
+s->offset_within_address_space + MR_SIZE(s->size),
 s->mr->name ? s->mr->name : "(noname)",
 i < ARRAY_SIZE(names) ? names[i] : "",
 s->mr == root ? " [ROOT]" : "",
-- 
2.38.1




[PULL 2/3] hw/sd/sdhci: Do not set Buf Wr Ena before writing block (CVE-2022-3872)

2022-11-08 Thread Philippe Mathieu-Daudé
When sdhci_write_block_to_card() is called to transfer data from
the FIFO to the SD bus, the data is already present in the buffer
and we have to consume it directly.

See the description of the 'Buffer Write Enable' bit from the
'Present State' register (prnsts::SDHC_SPACE_AVAILABLE) in Table
2.14 from the SDHCI spec v2:

  Buffer Write Enable

  This status is used for non-DMA write transfers.

  The Host Controller can implement multiple buffers to transfer
  data efficiently. This read only flag indicates if space is
  available for write data. If this bit is 1, data can be written
  to the buffer. A change of this bit from 1 to 0 occurs when all
  the block data is written to the buffer. A change of this bit
  from 0 to 1 occurs when top of block data can be written to the
  buffer and generates the Buffer Write Ready interrupt.

In our case, we do not want to overwrite the buffer, so we want
this bit to be 0, then set it to 1 once the data is written onto
the bus.

This is probably a copy/paste error from commit d7dfca0807
("hw/sdhci: introduce standard SD host controller").

OSS-Fuzz Report: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=45986#c4

Reproducers:

  $ cat << EOF | \
 qemu-system-x86_64 -nodefaults -display none -machine accel=qtest \
   -m 512M  -device sdhci-pci -device sd-card,drive=mydrive \
   -drive if=none,index=0,file=null-co://,format=raw,id=mydrive \
   -nographic -qtest stdio
  outl 0xcf8 0x80001010
  outl 0xcfc 0xe000
  outl 0xcf8 0x80001001
  outl 0xcfc 0x0600
  write 0xe058 0x1 0x6e
  write 0xe059 0x1 0x5a
  write 0xe028 0x1 0x10
  write 0xe02c 0x1 0x05
  write 0x5a6e 0x1 0x21
  write 0x5a75 0x1 0x20
  write 0xe005 0x1 0x02
  write 0xe00c 0x1 0x01
  write 0xe00e 0x1 0x20
  write 0xe00f 0x1 0x00
  write 0xe00c 0x1 0x00
  write 0xe020 0x1 0x00
  EOF

or 
https://lore.kernel.org/qemu-devel/caa8xkjxrms0fkr28akvnnpyatm0y0b+5fichpsrhd+mugnu...@mail.gmail.com/

Fixes: CVE-2022-3872
Reported-by: RivenDell 
Reported-by: Siqi Chen 
Reported-by: ningqiang 
Reported-by: ClusterFuzz
Signed-off-by: Philippe Mathieu-Daudé 
Tested-by: Mauro Matteo Cascella 
Message-Id: <20221107221236.47841-2-phi...@linaro.org>
---
 hw/sd/sdhci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 306070c872..f230e7475f 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -954,7 +954,7 @@ static void sdhci_data_transfer(void *opaque)
 sdhci_read_block_from_card(s);
 } else {
 s->prnsts |= SDHC_DOING_WRITE | SDHC_DAT_LINE_ACTIVE |
-SDHC_SPACE_AVAILABLE | SDHC_DATA_INHIBIT;
+   SDHC_DATA_INHIBIT;
 sdhci_write_block_to_card(s);
 }
 }
-- 
2.38.1




[PULL 0/3] Memory/SDHCI/ParallelFlash patches for v7.2.0-rc0

2022-11-08 Thread Philippe Mathieu-Daudé
The following changes since commit ade760a2f63804b7ab1839fbc3e5ddbf30538718:

  Merge tag 'pull-request-2022-11-08' of https://gitlab.com/thuth/qemu into 
staging (2022-11-08 11:34:06 -0500)

are available in the Git repository at:

  https://github.com/philmd/qemu.git tags/memflash-20221108

for you to fetch changes up to cf9b3efd816518f9f210f50a0fa3e46a00b33c27:

  Revert "hw/block/pflash_cfi: Error out if dev length isn't power of 2" 
(2022-11-08 19:29:25 +0100)


Memory/SDHCI/ParallelFlash patches queue

- Fix wrong end address dump in 'info mtree' (Zhenzhong Duan)
- Fix in SDHCI for CVE-2022-3872 (myself)
- Revert latest pflash check of underlying block size (Daniel
  Henrique Barboza & myself)



Daniel Henrique Barboza (1):
  Revert "hw/block/pflash_cfi: Error out if dev length isn't power of 2"

Philippe Mathieu-Daudé (1):
  hw/sd/sdhci: Do not set Buf Wr Ena before writing block
(CVE-2022-3872)

Zhenzhong Duan (1):
  memory: Fix wrong end address dump

 hw/block/pflash_cfi01.c | 8 ++--
 hw/block/pflash_cfi02.c | 5 -
 hw/sd/sdhci.c   | 2 +-
 softmmu/physmem.c   | 2 +-
 4 files changed, 4 insertions(+), 13 deletions(-)

-- 
2.38.1




Re: [PULL 0/2] Net patches

2022-11-08 Thread Stefan Hajnoczi
On Mon, 7 Nov 2022 at 23:20, Jason Wang  wrote:
> Si-Wei Liu (1):
>   vhost-vdpa: fix assert !virtio_net_get_subqueue(nc)->async_tx.elem in 
> virtio_net_reset

I have applied just this patch to the staging tree.

Thanks,
Stefan



Re: [PATCH-for-7.2 0/2] hw/sd/sdhci: Do not set Buf Wr Ena before writing block (CVE-2022-3872)

2022-11-08 Thread Stefan Hajnoczi
Applied to the staging tree. Thanks!

Stefan



Re: [PATCH-for-7.2] Revert "hw/block/pflash_cfi: Error out if dev length isn't power of 2"

2022-11-08 Thread Stefan Hajnoczi
Applied to staging. Thanks!

Stefan



Re: [PATCH] Revert "hw/block/pflash_cfi0{1, 2}: Error out if device length isn't a power of two"

2022-11-08 Thread Stefan Hajnoczi
On Tue, 8 Nov 2022 at 13:10, Philippe Mathieu-Daudé  wrote:
>
> On 8/11/22 18:26, Daniel Henrique Barboza wrote:
> > This commit caused a regression [1] that prevents machines that use
> > Open Virtual Machine Firmware (OVMF) from booting.
> >
> > This is a long standing behavior with how pflash handles images. More
> > information about why this happens can be found in [2] and commit
> > 06f1521795 ("pflash: Require backend size to match device, improve
> > errors").
> >
> > This reverts commit 334c388f25707a234c4a0dea05b9df08d7746638.
> >
> > [1] https://gitlab.com/qemu-project/qemu/-/issues/1294
> > [2] 
> > https://lore.kernel.org/qemu-devel/20190308062455.29755-1-arm...@redhat.com/
> >
> > Cc: Bernhard Beschow 
> > Cc: Philippe Mathieu-Daudé 
> > Cc: Stefan Hajnoczi 
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1294
> > Signed-off-by: Daniel Henrique Barboza 
> > ---
> >   hw/block/pflash_cfi01.c | 8 ++--
> >   hw/block/pflash_cfi02.c | 5 -
> >   2 files changed, 2 insertions(+), 11 deletions(-)
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
> Thanks, our patches crossed :)
> https://lore.kernel.org/qemu-devel/20221108175755.95141-1-phi...@linaro.org/
>
> I'm queuing yours which is first and will amend the description
> (if you don't disagree).

I've already applied yours, Philippe, because the description is more
comprehensive.

Daniel, thank you for sending your version of the patch!

Stefan



Re: [PATCH] Revert "hw/block/pflash_cfi0{1, 2}: Error out if device length isn't a power of two"

2022-11-08 Thread Philippe Mathieu-Daudé

On 8/11/22 18:26, Daniel Henrique Barboza wrote:

This commit caused a regression [1] that prevents machines that use
Open Virtual Machine Firmware (OVMF) from booting.

This is a long standing behavior with how pflash handles images. More
information about why this happens can be found in [2] and commit
06f1521795 ("pflash: Require backend size to match device, improve
errors").

This reverts commit 334c388f25707a234c4a0dea05b9df08d7746638.

[1] https://gitlab.com/qemu-project/qemu/-/issues/1294
[2] https://lore.kernel.org/qemu-devel/20190308062455.29755-1-arm...@redhat.com/

Cc: Bernhard Beschow 
Cc: Philippe Mathieu-Daudé 
Cc: Stefan Hajnoczi 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1294
Signed-off-by: Daniel Henrique Barboza 
---
  hw/block/pflash_cfi01.c | 8 ++--
  hw/block/pflash_cfi02.c | 5 -
  2 files changed, 2 insertions(+), 11 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 

Thanks, our patches crossed :)
https://lore.kernel.org/qemu-devel/20221108175755.95141-1-phi...@linaro.org/

I'm queuing yours which is first and will amend the description
(if you don't disagree).

Regards,

Phil.



Re: [PATCH v3 02/17] migration: No save_live_pending() method uses the QEMUFile parameter

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/3/22 19:16, Avihai Horon wrote:

From: Juan Quintela

So remove it everywhere.

Signed-off-by: Juan Quintela


Reviewed-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir




[PATCH-for-7.2] Revert "hw/block/pflash_cfi: Error out if dev length isn't power of 2"

2022-11-08 Thread Philippe Mathieu-Daudé
Commit 334c388f25 ("pflash_cfi: Error out if device length
isn't a power of two") aimed to finish the effort started by
commit 06f1521795 ("pflash: Require backend size to match device,
improve errors"), but unfortunately we are not quite there since
various machines are still ready to accept incomplete / oversized
pflash backend images, and now fail, i.e. on Debian bullseye:

 $ qemu-system-x86_64 \
   -drive \
   if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd
 qemu-system-x86_64: Device size must be a power of two.

where OVMF_CODE.fd comes from the ovmf package, which doesn't
pad the firmware images to the flash size:

 $ ls -lh /usr/share/OVMF/
 -rw-r--r-- 1 root root 3.5M Aug 19  2021 OVMF_CODE_4M.fd
 -rw-r--r-- 1 root root 1.9M Aug 19  2021 OVMF_CODE.fd
 -rw-r--r-- 1 root root 128K Aug 19  2021 OVMF_VARS.fd

Since we entered the freeze period to prepare the v7.2.0 release,
the safest is to revert commit 334c388f25707a234c4a0dea05b9df08d.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1294
Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Sunil V L 
Cc: Daniel Henrique Barboza 
Cc: Markus Armbruster 
Cc: Bernhard Beschow 
---
 hw/block/pflash_cfi01.c | 8 ++--
 hw/block/pflash_cfi02.c | 5 -
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 9c235bf66e..0cbc2fb4cb 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -690,7 +690,7 @@ static const MemoryRegionOps pflash_cfi01_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl, Error **errp)
+static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl)
 {
 uint64_t blocks_per_device, sector_len_per_device, device_len;
 int num_devices;
@@ -708,10 +708,6 @@ static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl, 
Error **errp)
 sector_len_per_device = pfl->sector_len / num_devices;
 }
 device_len = sector_len_per_device * blocks_per_device;
-if (!is_power_of_2(device_len)) {
-error_setg(errp, "Device size must be a power of two.");
-return;
-}
 
 /* Hardcoded CFI table */
 /* Standard "QRY" string */
@@ -869,7 +865,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
  */
 pfl->cmd = 0x00;
 pfl->status = 0x80; /* WSM ready */
-pflash_cfi01_fill_cfi_table(pfl, errp);
+pflash_cfi01_fill_cfi_table(pfl);
 }
 
 static void pflash_cfi01_system_reset(DeviceState *dev)
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index ff2fe154c1..2a99b286b0 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -880,11 +880,6 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (!is_power_of_2(pfl->chip_len)) {
-error_setg(errp, "Device size must be a power of two.");
-return;
-}
-
 memory_region_init_rom_device(&pfl->orig_mem, OBJECT(pfl),
   &pflash_cfi02_ops, pfl, pfl->name,
   pfl->chip_len, errp);
-- 
2.38.1




Re: [PULL 00/55] MIPS patches for 2022-10-30

2022-11-08 Thread Konstantin Kostiuk
On Tue, Nov 8, 2022 at 5:23 PM Philippe Mathieu-Daudé 
wrote:

> On 8/11/22 16:09, Thomas Huth wrote:
> > On 08/11/2022 15.23, Philippe Mathieu-Daudé wrote:
> >> On 8/11/22 14:59, Peter Maydell wrote:
>
> >>> Was this the last use of C++ in the tree, or am I forgetting
> >>> some other part that still needs the C++ compiler?
> >>>
> >>> If it is the last thing, we should put in the "Build Dependencies"
> >>> part of the release notes that a C++ compiler is no longer required
> >>> and mention that the configure options to specify it will go away in
> >>> a future release.
> >>
> >> I guess the last use is from the Guest Agent on Windows...
> >>
> >> $ git ls-files | fgrep .cpp
> >> qga/vss-win32/install.cpp
> >> qga/vss-win32/provider.cpp
> >> qga/vss-win32/requester.cpp
> >
> > Yes, I think the c++ configure options are still required for that
> > Windows stuff ... but IIRC Paolo once mentioned that we could simplify
> > the linker logic in configure or meson.build once the nanomips stuff has
> > been converted, since we now do not have to mix C and C++ linkage
> anymore?
>
> Oh I guess I got it, we only need to link qga.exe as a standalone binary
> unrelated to the qemu-system/user binaries, so we can simplify most of
> the linkage?
>
>
Hi All,

Currently, we need C++ only for the VSS part of Windows Guest Agent.
Anyway, the VSS source is fully based on Windows API, so in general,
we can rewrite it to C.

Best Regards,
Konstantin Kostiuk.


Re: QOM: should you be able to cast from an interface class to the concrete class?

2022-11-08 Thread Daniel P . Berrangé
On Tue, Nov 08, 2022 at 04:01:56PM +, Peter Maydell wrote:
> Hi; in the QOM model, are you supposed to be able to cast from
> an interface class to the concrete class that is implementing it?
> 
> To give a specific example, if I have a ResettableClass *rc
> should I be able to do DeviceClass *dc = DEVICE_CLASS(rc);
> (assuming that the rc I have is actually from a DeviceClass) ?
> 
> If I'm reading the code correctly, at the moment this isn't possible:
> object_class_dynamic_cast() has code for "if the class we're
> casting from implements interfaces and the class we're casting to
> is an interface, then look through the list of interfaces to
> see if we should be returning the class pointer from the interface
> list", which means you can cast from the concrete class to the
> interface class. But there's no code there to say "if the class
> we're casting from is an interface, try the concrete class".
> 
> As far as I can see we do actually record the information we need
> to do this -- InterfaceClass has a field concrete_class that points
> to the concrete class that's implementing it. But this field is
> currently only written, never read.
> 
> Should we:
> (a) support casting from the interface class back to the concrete
> class, by adding some extra code in object_class_dynamic_cast(), or
> (b) decide that that isn't something we should be wanting to do,
> and remove the dead concrete_class struct field ?

My rule of thumb would be, if it is possible with GObject, then
it is reasonable to want it in QOM, and indeed you can cast
from an interface in GObject, back to the concrete type that
has implemented it.  So I'd go for (a).

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
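
To make option (a) concrete, a minimal sketch of the extra branch
object_class_dynamic_cast() could grow, reusing the currently write-only
InterfaceClass::concrete_class field and the type_is_interface() helper from
qom/object.c; an illustration only, not a proposed patch:

    /* When casting *from* an interface, retry against the concrete class
     * that was recorded when the interface class was created. */
    if (type_is_interface(class->type)) {
        InterfaceClass *iface = INTERFACE_CLASS(class);

        if (iface->concrete_class) {
            return object_class_dynamic_cast(iface->concrete_class, typename);
        }
    }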




Re: [PATCH v3 01/17] migration: Remove res_compatible parameter

2022-11-08 Thread Vladimir Sementsov-Ogievskiy

On 11/3/22 19:16, Avihai Horon wrote:

From: Juan Quintela 

It was only used for RAM, and in that case, it means that this amount
of data was sent for memory. 


It's not clear to me what "this amount of data was sent for memory" means... That 
amount of data has not actually been sent yet.


Just delete the field in all callers.

Signed-off-by: Juan Quintela 
---
  hw/s390x/s390-stattrib.c   |  6 ++
  hw/vfio/migration.c| 10 --
  hw/vfio/trace-events   |  2 +-
  include/migration/register.h   | 20 ++--
  migration/block-dirty-bitmap.c |  7 +++
  migration/block.c  |  7 +++
  migration/migration.c  |  9 -
  migration/ram.c|  8 +++-
  migration/savevm.c | 14 +-
  migration/savevm.h |  4 +---
  migration/trace-events |  2 +-
  11 files changed, 37 insertions(+), 52 deletions(-)



[..]


diff --git a/include/migration/register.h b/include/migration/register.h
index c1dcff0f90..1950fee6a8 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -48,18 +48,18 @@ typedef struct SaveVMHandlers {
  int (*save_setup)(QEMUFile *f, void *opaque);
  void (*save_live_pending)(QEMUFile *f, void *opaque,
uint64_t threshold_size,
-  uint64_t *res_precopy_only,
-  uint64_t *res_compatible,
-  uint64_t *res_postcopy_only);
+  uint64_t *rest_precopy,
+  uint64_t *rest_postcopy);
  /* Note for save_live_pending:
- * - res_precopy_only is for data which must be migrated in precopy phase
- * or in stopped state, in other words - before target vm start
- * - res_compatible is for data which may be migrated in any phase
- * - res_postcopy_only is for data which must be migrated in postcopy phase
- * or in stopped state, in other words - after source vm stop
+ * - res_precopy is for data which must be migrated in precopy
+ * phase or in stopped state, in other words - before target
+ * vm start
+ * - res_postcopy is for data which must be migrated in postcopy
+ * phase or in stopped state, in other words - after source vm
+ * stop
   *
- * Sum of res_postcopy_only, res_compatible and res_postcopy_only is the
- * whole amount of pending data.
+ * Sum of res_precopy and res_postcopy is the whole amount of
+ * pending data.
   */
  
  


[..]


diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc..20167e1102 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3435,9 +3435,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
  }
  
  static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,

- uint64_t *res_precopy_only,
- uint64_t *res_compatible,
- uint64_t *res_postcopy_only)
+ uint64_t *res_precopy, uint64_t *res_postcopy)
  {
  RAMState **temp = opaque;
  RAMState *rs = *temp;
@@ -3457,9 +3455,9 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
  
  if (migrate_postcopy_ram()) {

  /* We can do postcopy, and all the data is postcopiable */
-*res_compatible += remaining_size;
+*res_postcopy += remaining_size;


That seems to be not quite correct.

res_postcopy is defined as "data which must be migrated in postcopy", but 
that's not true here, as RAM can be migrated both in precopy and postcopy.

Still we really can include "compat" into "postcopy" just because in the logic of migration_iteration_run() we don't 
actually distinguish "compat" and "post". The logic only depends on "total" and "pre".

So, if we want to combine "compat" into "post", we should redefine "post" in 
the comment in include/migration/register.h, something like this:

- res_precopy is for data which MUST be migrated in precopy
  phase or in stopped state, in other words - before target
  vm start

- res_postcopy is for all data except for declared in res_precopy.
  res_postcopy data CAN be migrated in postcopy, i.e. after target
  vm start.



  } else {
-*res_precopy_only += remaining_size;
+*res_precopy += remaining_size;
  }
  }
  



--
Best regards,
Vladimir




Re: [PATCH v3 4/4] scripts: add script to compare compatible properties

2022-11-08 Thread Maksim Davydov



On 11/8/22 18:37, Vladimir Sementsov-Ogievskiy wrote:

On 11/3/22 13:27, Maksim Davydov wrote:

This script runs QEMU to obtain the compat_props of machines and the default
values of different types, and produces an appropriate table. This table
can be used to compare machine types to choose the most suitable
machine. Also, this table in json or csv format should be used to check that
a new machine doesn't affect previous ones, by comparing tables with and
without the new machine.
Default values of properties are needed to fill "holes" in the table (one
machine has these properties and another does not. For instance, the 2.12 mt has
`{ "EPYC-" TYPE_X86_CPU, "xlevel", "0x800a" }`, but compat_props of the
3.1 mt doesn't have it. So, to compare these machines we need to fill in the
unknown value of "EPYC-x86_64-cpu-xlevel" for the 3.1 mt. This unknown value
in the table is what I called a "hole". To get values (default values) for these
"holes" the script uses a list of appropriate methods.)

Notes:
* some init values from the devices can't be available, like properties
   from virtio-9p when configure has --disable-virtfs. These situations will
   be seen in the table as "unavailable driver".
* Default values can be obtained in an unobvious way, like x86 features.
   If the script doesn't know how to get a property's default value to compare
   one machine with another, it fills the "holes" with "unavailable method". This
   is done because the script uses a whitelist model to get default values of
   different types. It means that a method that can't be applied to a new
   type could crash this script. It is better to get an "unavailable
   driver" when creating a new machine with new compatible properties than
   to break this script. So it turns out a more stable and generic script.
* If the default value can't be obtained because this property doesn't
   exist or because this property can't have a default value, the appropriate
   "hole" will be filled with "unknown property" or "no default value"
* If the property is applied to an abstract class, the script collects
   default values from all child classes (a set of default values)

Example:

./scripts/compare_mt.py --mt pc-q35-3.1 pc-q35-4.0

                                                           | pc-q35-3.1 | pc-q35-4.0
-----------------------------------------------------------+------------+-------------------
Cascadelake-Server-x86_64-cpu:mpx                          | True       | False
Cascadelake-Server-x86_64-cpu:stepping                     | 5          | 6
Icelake-Client-x86_64-cpu:mpx                              | True       | unavailable driver
Icelake-Server-x86_64-cpu:mpx                              | True       | False
Opteron_G3-x86_64-cpu:rdtscp                               | False      | True
Opteron_G4-x86_64-cpu:rdtscp                               | False      | True
Opteron_G5-x86_64-cpu:rdtscp                               | False      | True
Skylake-Client-IBRS-x86_64-cpu:mpx                         | True       | False
Skylake-Client-x86_64-cpu:mpx                              | True       | False
Skylake-Server-IBRS-x86_64-cpu:mpx                         | True       | False
Skylake-Server-x86_64-cpu:mpx                              | True       | False
intel-iommu:dma-drain                                      | False      | True
memory-backend-file:x-use-canonical-path-for-ramblock-id   | True       | no default value
memory-backend-memfd:x-use-canonical-path-for-ramblock-id  | True       | no default value

Re: [PULL 59/62] hw/block/pflash_cfi0{1, 2}: Error out if device length isn't a power of two

2022-11-08 Thread Daniel Henrique Barboza

Phil,

On 11/1/22 19:49, Philippe Mathieu-Daudé wrote:

On 1/11/22 23:23, Stefan Hajnoczi wrote:

There is a report that this commit breaks an existing OVMF setup:
https://gitlab.com/qemu-project/qemu/-/issues/1290#note_1156507334

I'm not familiar with pflash. Please find a way to avoid a regression
in QEMU 7.2 here.


Long-standing problem with pflash and underlying images... i.e:
https://lore.kernel.org/qemu-devel/20190308062455.29755-1-arm...@redhat.com/

Let's revert for 7.2. Daniel, I can prepare a patch explaining.


Just sent a revert. I'm not sure if the explanation I provided is
good enough. I'd appreciate it if you could review it.

If it's plausible I'll send a pull request ASAP.


Thanks,

Daniel



Re: [PATCH-for-7.2 1/2] hw/sd/sdhci: Do not set Buf Wr Ena before writing block (CVE-2022-3872)

2022-11-08 Thread Alexander Bulekov
On 221107 2312, Philippe Mathieu-Daudé wrote:
> When sdhci_write_block_to_card() is called to transfer data from
> the FIFO to the SD bus, the data is already present in the buffer
> and we have to consume it directly.
> 
> See the description of the 'Buffer Write Enable' bit from the
> 'Present State' register (prnsts::SDHC_SPACE_AVAILABLE) in Table
> 2.14 from the SDHCI spec v2:
> 
>   Buffer Write Enable
> 
>   This status is used for non-DMA write transfers.
> 
>   The Host Controller can implement multiple buffers to transfer
>   data efficiently. This read only flag indicates if space is
>   available for write data. If this bit is 1, data can be written
>   to the buffer. A change of this bit from 1 to 0 occurs when all
>   the block data is written to the buffer. A change of this bit
>   from 0 to 1 occurs when top of block data can be written to the
>   buffer and generates the Buffer Write Ready interrupt.
> 
> In our case, we do not want to overwrite the buffer, so we want
> this bit to be 0, then set it to 1 once the data is written onto
> the bus.
> 
> This is probably a copy/paste error from commit d7dfca0807
> ("hw/sdhci: introduce standard SD host controller").
> 
> Reproducer:
> https://lore.kernel.org/qemu-devel/caa8xkjxrms0fkr28akvnnpyatm0y0b+5fichpsrhd+mugnu...@mail.gmail.com/
> 
> Fixes: CVE-2022-3872
> Reported-by: RivenDell 
> Reported-by: Siqi Chen 
> Reported-by: ningqiang 
> Signed-off-by: Philippe Mathieu-Daudé 

Seems like OSS-Fuzz also found this, not sure why it never made it into
a gitlab issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=45986#c4

Slightly shorter reproducer:

cat << EOF | ./qemu-system-i386 -display none -machine accel=qtest, -m \
512M -nodefaults -device sdhci-pci -device sd-card,drive=mydrive -drive \
if=none,index=0,file=null-co://,format=raw,id=mydrive -nographic -qtest \
stdio
outl 0xcf8 0x80001010
outl 0xcfc 0xe000
outl 0xcf8 0x80001001
outl 0xcfc 0x0600
write 0xe058 0x1 0x6e
write 0xe059 0x1 0x5a
write 0xe028 0x1 0x10
write 0xe02c 0x1 0x05
write 0x5a6e 0x1 0x21
write 0x5a75 0x1 0x20
write 0xe005 0x1 0x02
write 0xe00c 0x1 0x01
write 0xe00e 0x1 0x20
write 0xe00f 0x1 0x00
write 0xe00c 0x1 0x00
write 0xe020 0x1 0x00
EOF




