Re: [PATCH v13 1/7] net/vmnet: add vmnet dependency and customizable option

2022-01-19 Thread Roman Bolshakov
On Thu, Jan 13, 2022 at 08:22:13PM +0300, Vladislav Yaroshchuk wrote:
> vmnet.framework dependency is added with 'vmnet' option
> to enable or disable it. Default value is 'auto'.
> 
> vmnet features to be used are available since macOS 11.0,

Hi Vladislav,

I'm not sure if the comment belongs here. Perhaps you mean that bridged
mode is available from 10.15:

VMNET_BRIDGED_MODE API_AVAILABLE(macos(10.15))  = 1002

This means vmnet.framework is supported on all macbooks starting from 2012.

With this fixed,
Tested-by: Roman Bolshakov 
Reviewed-by: Roman Bolshakov 

The other two modes - shared and host are supported on earlier versions
of macOS (from 10.10). But port forwarding is only available from macOS
10.15.

Theoretically it should be possible to support the framework on the earlier
models from 2010 or 2007 on Yosemite up to High Sierra with fewer
features using MacPorts but I don't think it'd be reasonable to ask
that.

Thanks,
Roman

> corresponding probe is created into meson.build.
> 
> Signed-off-by: Vladislav Yaroshchuk 
> ---
>  meson.build   | 16 +++-
>  meson_options.txt |  2 ++
>  scripts/meson-buildoptions.sh |  3 +++
>  3 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/meson.build b/meson.build
> index c1b1db1e28..285fb7bc41 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -496,6 +496,18 @@ if cocoa.found() and get_option('gtk').enabled()
>error('Cocoa and GTK+ cannot be enabled at the same time')
>  endif
>  
> +vmnet = dependency('appleframeworks', modules: 'vmnet', required: 
> get_option('vmnet'))
> +if vmnet.found() and not cc.has_header_symbol('vmnet/vmnet.h',
> +  'VMNET_BRIDGED_MODE',
> +  dependencies: vmnet)
> +  vmnet = not_found
> +  if get_option('vmnet').enabled()
> +error('vmnet.framework API is outdated')
> +  else
> +warning('vmnet.framework API is outdated, disabling')
> +  endif
> +endif
> +
>  seccomp = not_found
>  if not get_option('seccomp').auto() or have_system or have_tools
>seccomp = dependency('libseccomp', version: '>=2.3.0',
> @@ -1492,6 +1504,7 @@ config_host_data.set('CONFIG_SECCOMP', seccomp.found())
>  config_host_data.set('CONFIG_SNAPPY', snappy.found())
>  config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
>  config_host_data.set('CONFIG_VDE', vde.found())
> +config_host_data.set('CONFIG_VMNET', vmnet.found())
>  config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', 
> have_vhost_user_blk_server)
>  config_host_data.set('CONFIG_VNC', vnc.found())
>  config_host_data.set('CONFIG_VNC_JPEG', jpeg.found())
> @@ -3406,7 +3419,8 @@ summary(summary_info, bool_yn: true, section: 'Crypto')
>  # Libraries
>  summary_info = {}
>  if targetos == 'darwin'
> -  summary_info += {'Cocoa support':   cocoa}
> +  summary_info += {'Cocoa support':   cocoa}
> +  summary_info += {'vmnet.framework support': vmnet}
>  endif
>  summary_info += {'SDL support':   sdl}
>  summary_info += {'SDL image support': sdl_image}
> diff --git a/meson_options.txt b/meson_options.txt
> index 921967eddb..701e1381f9 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -151,6 +151,8 @@ option('netmap', type : 'feature', value : 'auto',
> description: 'netmap network backend support')
>  option('vde', type : 'feature', value : 'auto',
> description: 'vde network backend support')
> +option('vmnet', type : 'feature', value : 'auto',
> +   description: 'vmnet.framework network backend support')
>  option('virglrenderer', type : 'feature', value : 'auto',
> description: 'virgl rendering support')
>  option('vnc', type : 'feature', value : 'auto',
> diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
> index 50bd7bed4d..cdcece4b05 100644
> --- a/scripts/meson-buildoptions.sh
> +++ b/scripts/meson-buildoptions.sh
> @@ -84,6 +84,7 @@ meson_options_help() {
>printf "%s\n" '  u2f U2F emulation support'
>printf "%s\n" '  usb-redir   libusbredir support'
>printf "%s\n" '  vde vde network backend support'
> +  printf "%s\n" '  vmnet   vmnet.framework network backend support'
>printf "%s\n" '  vhost-user-blk-server'
>printf "%s\n" '  build vhost-user-blk server'
>printf "%s\n" '  virglrenderer   virgl rendering support'
> @@ -248,6 +249,8 @@ _meson_option_parse() {
>  --disable-usb-redir) printf "%s" -Dusb_redir=disabled ;;
>  --enable-vde) printf "%s" -Dvde=enabled ;;
>  --disable-vde) printf "%s" -Dvde=disabled ;;
> +--enable-vmnet) printf "%s" -Dvmnet=enabled ;;
> +--disable-vmnet) printf "%s" -Dvmnet=disabled ;;
>  --enable-vhost-user-blk-server) printf "%s" 
> -Dvhost_user_blk_server=enabled ;;
>  --disable-vhost-user-blk-server) printf "%s" 
> -Dvhost_user_blk_server=disabled ;;
>  --enable-virglrenderer) printf "%s" -Dvirglrenderer=enabled ;;
> -- 
> 

Re: [PATCH v7 21/22] target/riscv: Enable uxl field write

2022-01-19 Thread LIU Zhiwei

Hi Alistair,

Do you mind sharing your test method?

I follow the xvisor document on 
https://github.com/xvisor/xvisor/blob/v0.3.1/docs/riscv/riscv64-qemu.txt. 
But it can't run even on QEMU master branch.

It blocks on OpenSBI.

liuzw@b12e0231:/mnt/ssd/liuzw/git/xvisor$  qemu-system-riscv64 -cpu rv64,h=true -M virt 
-m 512M -nographic -bios ../opensbi/build/platform/generic/firmware/fw_jump.bin  -kernel 
./build/vmm.bin -initrd ./build/disk.img -append 'vmm.bootcmd="vfs mount initrd 
/;vfs run /boot.xscript;vfs cat /system/banner.txt"'

OpenSBI v1.0-2-g6dde435

       _  _

  / __ \  / |  _ \_   _|

 | |  | |_ __   ___ _ __ | (___ | |_) || |

 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |

 | |__| | |_) |  __/ | | |) | |_) || |_

  \/| .__/ \___|_| |_|_/|/_|

    | |

    |_|

Platform Name : riscv-virtio,qemu

Platform Features : medeleg

Platform HART Count   : 1

Platform IPI Device   : aclint-mswi

Platform Timer Device : aclint-mtimer @ 1000Hz

Platform Console Device   : uart8250

Platform HSM Device   : ---

Platform Reboot Device    : sifive_test

Platform Shutdown Device  : sifive_test

Firmware Base : 0x8000

Firmware Size : 252 KB

Runtime SBI Version   : 0.3

Domain0 Name  : root

Domain0 Boot HART : 0

Domain0 HARTs : 0*

Domain0 Region00  : 0x0200-0x0200 (I)

Domain0 Region01  : 0x8000-0x8003 ()

Domain0 Region02  : 0x-0x (R,W,X)

Domain0 Next Address  : 0x8020

Domain0 Next Arg1 : 0x8220

Domain0 Next Mode : S-mode

Domain0 SysReset  : yes

Boot HART ID  : 0

Boot HART Domain  : root

Boot HART ISA : rv64imafdcsuh

Boot HART Features    : scounteren,mcounteren,time

Boot HART PMP Count   : 16

Boot HART PMP Granularity : 4

Boot HART PMP Address Bits: 54

Boot HART MHPM Count  : 0

Boot HART MIDELEG : 0x0666

Boot HART MEDELEG : 0x00f0b509

QEMU: Terminated


Thanks,
Zhiwei

On 2022/1/20 上午11:29, Alistair Francis wrote:

On Thu, Jan 20, 2022 at 12:12 PM LIU Zhiwei  wrote:


On 2022/1/20 上午8:35, Alistair Francis wrote:

On Wed, Jan 19, 2022 at 3:34 PM LIU Zhiwei  wrote:

Signed-off-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
   target/riscv/csr.c | 17 -
   1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index b11d92b51b..90f78eca65 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -572,6 +572,7 @@ static RISCVException write_mstatus(CPURISCVState *env, int 
csrno,
   {
   uint64_t mstatus = env->mstatus;
   uint64_t mask = 0;
+RISCVMXL xl = riscv_cpu_mxl(env);

   /* flush tlb on mstatus fields that affect VM */
   if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPP | MSTATUS_MPV |
@@ -583,21 +584,22 @@ static RISCVException write_mstatus(CPURISCVState *env, 
int csrno,
   MSTATUS_MPP | MSTATUS_MXR | MSTATUS_TVM | MSTATUS_TSR |
   MSTATUS_TW | MSTATUS_VS;

-if (riscv_cpu_mxl(env) != MXL_RV32) {
+if (xl != MXL_RV32) {
   /*
* RV32: MPV and GVA are not in mstatus. The current plan is to
* add them to mstatush. For now, we just don't support it.
*/
   mask |= MSTATUS_MPV | MSTATUS_GVA;
+if ((val & MSTATUS64_UXL) != 0) {
+mask |= MSTATUS64_UXL;
+}
   }

   mstatus = (mstatus & ~mask) | (val & mask);

-RISCVMXL xl = riscv_cpu_mxl(env);
   if (xl > MXL_RV32) {
-/* SXL and UXL fields are for now read only */
+/* SXL field is for now read only */
   mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
-mstatus = set_field(mstatus, MSTATUS64_UXL, xl);

This change causes:

ERROR:../target/riscv/translate.c:295:get_gpr: code should not be reached

to assert when running an Xvisor (Hypervisor extension) guest on the
64-bit virt machine.

Hi Alistair,

I am almost sure that there is a UXL field write error in Xvisor.

You are probably right, but a guest bug like that shouldn't be able to
crash QEMU


I guess there is a write_sstatus instruction that writes a 0 to
SSTATUS64_UXL.

We can fix it on Xvisor. But before that, we should also give more
strict constraints on SSTATUS64_UXL write.

+if ((val & SSTATUS64_UXL) != 0) {
+mask |= SSTATUS64_UXL;
+}
-mask |= SSTATUS64_UXL;


I will send a v8 patch set for you to test later.

Thanks!

Alistair



Thanks,
Zhiwei


Alistair


Re: [PATCH 0/2] virtio: Add vhost-user-gpio device's support

2022-01-19 Thread Viresh Kumar
On 17-01-22, 10:11, Alex Bennée wrote:
> 
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, Jan 12, 2022 at 05:04:57PM +0530, Viresh Kumar wrote:
> >> Hello,
> >> 
> >> This patchset adds vhost-user-gpio device's support in Qemu. The support 
> >> for the
> >> same has already been added to virtio specification and Linux Kernel.
> >> 
> >> A Rust based backend is also in progress and is tested against this 
> >> patchset:
> >> 
> >> https://github.com/rust-vmm/vhost-device/pull/76
> >
> >
> > I'm reluctant to add this with no tests in tree.
> > Want to write a minimal libhost-user based backend?

I actually have one already, that I wrote before attempting the Rust
counterpart, but never upstreamed as I am not sure if anyone is ever
going to use it, as I am not. And I thought what's the point of
merging code which I will never end up using.

I am not sure what test I can add here to make sure this doesn't
break in the future though.

> This is going to be a problem going forward as we have more out-of-tree
> backends written as a first preference. While the first couple of vhost
> devices have C implementations in contrib before we worked on the rust
> version I think we are getting to the point of skipping a first C
> version for future devices.
> 
> However I notice we have qtest/vhost-user-test.c so would that be enough
> to ensure we can instantiate the device and a basic vhost-user
> initialisation sequence doesn't cause it to crap out. This obviously
> won't be exercising the virtq processing itself but does that really
> exercise any of QEMU's boilerplate anyway?
> 
> > We also need some maintainers to step up.
> 
> You mean more reviewers for the vhost and virtio sections of QEMU's
> maintainers?

And I too was waiting for replies on these. I can surely write
something up if you guys feel there is a need. I just want to
understand it better.

-- 
viresh



Re: [PATCH v7 21/22] target/riscv: Enable uxl field write

2022-01-19 Thread Alistair Francis
On Thu, Jan 20, 2022 at 12:12 PM LIU Zhiwei  wrote:
>
>
> On 2022/1/20 上午8:35, Alistair Francis wrote:
> > On Wed, Jan 19, 2022 at 3:34 PM LIU Zhiwei  wrote:
> >> Signed-off-by: LIU Zhiwei 
> >> Reviewed-by: Richard Henderson 
> >> Reviewed-by: Alistair Francis 
> >> ---
> >>   target/riscv/csr.c | 17 -
> >>   1 file changed, 12 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> >> index b11d92b51b..90f78eca65 100644
> >> --- a/target/riscv/csr.c
> >> +++ b/target/riscv/csr.c
> >> @@ -572,6 +572,7 @@ static RISCVException write_mstatus(CPURISCVState 
> >> *env, int csrno,
> >>   {
> >>   uint64_t mstatus = env->mstatus;
> >>   uint64_t mask = 0;
> >> +RISCVMXL xl = riscv_cpu_mxl(env);
> >>
> >>   /* flush tlb on mstatus fields that affect VM */
> >>   if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPP | MSTATUS_MPV |
> >> @@ -583,21 +584,22 @@ static RISCVException write_mstatus(CPURISCVState 
> >> *env, int csrno,
> >>   MSTATUS_MPP | MSTATUS_MXR | MSTATUS_TVM | MSTATUS_TSR |
> >>   MSTATUS_TW | MSTATUS_VS;
> >>
> >> -if (riscv_cpu_mxl(env) != MXL_RV32) {
> >> +if (xl != MXL_RV32) {
> >>   /*
> >>* RV32: MPV and GVA are not in mstatus. The current plan is to
> >>* add them to mstatush. For now, we just don't support it.
> >>*/
> >>   mask |= MSTATUS_MPV | MSTATUS_GVA;
> >> +if ((val & MSTATUS64_UXL) != 0) {
> >> +mask |= MSTATUS64_UXL;
> >> +}
> >>   }
> >>
> >>   mstatus = (mstatus & ~mask) | (val & mask);
> >>
> >> -RISCVMXL xl = riscv_cpu_mxl(env);
> >>   if (xl > MXL_RV32) {
> >> -/* SXL and UXL fields are for now read only */
> >> +/* SXL field is for now read only */
> >>   mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
> >> -mstatus = set_field(mstatus, MSTATUS64_UXL, xl);
> > This change causes:
> >
> > ERROR:../target/riscv/translate.c:295:get_gpr: code should not be reached
> >
> > to assert when running an Xvisor (Hypervisor extension) guest on the
> > 64-bit virt machine.
>
> Hi Alistair,
>
> I am almost sure that there is a UXL field write error in Xvisor.

You are probably right, but a guest bug like that shouldn't be able to
crash QEMU

>
> I guess there is a write_sstatus instruction that writes a 0 to
> SSTATUS64_UXL.
>
> We can fix it on Xvisor. But before that, we should also give more
> strict constraints on SSTATUS64_UXL write.
>
> +if ((val & SSTATUS64_UXL) != 0) {
> +mask |= SSTATUS64_UXL;
> +}
> -mask |= SSTATUS64_UXL;
>
>
> I will send a v8 patch set for you to test later.

Thanks!

Alistair

>
>
> Thanks,
> Zhiwei
>
> > Alistair



Re: [PATCH v3 5/6] docs/system/devices/usb: Add CanoKey to USB devices examples

2022-01-19 Thread Hongren (Zenithal) Zheng
On Tue, Jan 18, 2022 at 10:28:49AM +0100, Thomas Huth wrote:
> On 13/01/2022 19.11, Hongren (Zenithal) Zheng wrote:
> > Signed-off-by: Hongren (Zenithal) Zheng 
> > ---
> >   docs/system/devices/usb.rst | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
> > index afb7d6c226..341694403a 100644
> > --- a/docs/system/devices/usb.rst
> > +++ b/docs/system/devices/usb.rst
> > @@ -199,6 +199,9 @@ option or the ``device_add`` monitor command. Available 
> > devices are:
> >   ``u2f-{emulated,passthru}``
> >  Universal Second Factor device
> > +``canokey``
> > +   An Open-source Secure Key implementing FIDO2, OpenPGP, PIV and more.
> 
> Reviewed-by: Thomas Huth 
> 
> Just an additional idea: It might be helpful for the users if you put a link
> to the separate documentation from the previous patch here?

Will be added in the next version.

> 
>  Thomas
> 



Re: [PATCH v7 21/22] target/riscv: Enable uxl field write

2022-01-19 Thread LIU Zhiwei



On 2022/1/20 上午8:35, Alistair Francis wrote:

On Wed, Jan 19, 2022 at 3:34 PM LIU Zhiwei  wrote:

Signed-off-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
  target/riscv/csr.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index b11d92b51b..90f78eca65 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -572,6 +572,7 @@ static RISCVException write_mstatus(CPURISCVState *env, int 
csrno,
  {
  uint64_t mstatus = env->mstatus;
  uint64_t mask = 0;
+RISCVMXL xl = riscv_cpu_mxl(env);

  /* flush tlb on mstatus fields that affect VM */
  if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPP | MSTATUS_MPV |
@@ -583,21 +584,22 @@ static RISCVException write_mstatus(CPURISCVState *env, 
int csrno,
  MSTATUS_MPP | MSTATUS_MXR | MSTATUS_TVM | MSTATUS_TSR |
  MSTATUS_TW | MSTATUS_VS;

-if (riscv_cpu_mxl(env) != MXL_RV32) {
+if (xl != MXL_RV32) {
  /*
   * RV32: MPV and GVA are not in mstatus. The current plan is to
   * add them to mstatush. For now, we just don't support it.
   */
  mask |= MSTATUS_MPV | MSTATUS_GVA;
+if ((val & MSTATUS64_UXL) != 0) {
+mask |= MSTATUS64_UXL;
+}
  }

  mstatus = (mstatus & ~mask) | (val & mask);

-RISCVMXL xl = riscv_cpu_mxl(env);
  if (xl > MXL_RV32) {
-/* SXL and UXL fields are for now read only */
+/* SXL field is for now read only */
  mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
-mstatus = set_field(mstatus, MSTATUS64_UXL, xl);

This change causes:

ERROR:../target/riscv/translate.c:295:get_gpr: code should not be reached

to assert when running an Xvisor (Hypervisor extension) guest on the
64-bit virt machine.


Hi Alistair,

Thanks for pointing it out. I will have a test on Xvisor.

Thanks,
Zhiwei



Alistair




Re: [PATCH RFC 05/15] migration: Simplify unqueue_page()

2022-01-19 Thread Peter Xu
On Wed, Jan 19, 2022 at 04:36:50PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > This patch simplifies unqueue_page() on both sides of it (itself, and 
> > caller).
> > 
> > Firstly, due to the fact that right after unqueue_page() returned true, 
> > we'll
> > definitely send a huge page (see ram_save_huge_page() call - it will _never_
> > exit before finish sending that huge page), so unqueue_page() does not need 
> > to
> > jump in small page size if huge page is enabled on the ramblock.  IOW, it's
> > destined that only the 1st 4K page will be valid, when unqueue the 2nd+ time
> > we'll notice the whole huge page has already been sent anyway.  Switching to
> > operating on huge page reduces a lot of the loops of redundant 
> > unqueue_page().
> > 
> > Meanwhile, drop the dirty check.  It's not helpful to call test_bit() every
> > time to jump over clean pages, as ram_save_host_page() has already done so,
> > while in a faster way (see commit ba1b7c812c ("migration/ram: Optimize
> > ram_save_host_page()", 2021-05-13)).  So that's not necessary too.
> > 
> > Drop the two tracepoints along the way - based on above analysis it's very
> > possible that no one is really using it..
> > 
> > Signed-off-by: Peter Xu 
> 
> Yes, OK
> 
> Reviewed-by: Dr. David Alan Gilbert 
> 
> Although:
>   a) You might like to keep a trace in get_queued_page just to see
> what's getting unqueued
>   b) I think originally it was a useful diagnostic to find out when we
> were getting a lot of queue requests for pages that were already sent.

Ah, that makes sense.  How about I keep the test_bit but remove the loop?  I
can make both a) and b) into one tracepoint:


diff --git a/migration/ram.c b/migration/ram.c
index 0df15ff663..02f36fa6d5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1572,6 +1572,9 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t 
*offset)
 migration_consume_urgent_request();
 }
 
+trace_unqueue_page(block->idstr, *offset,
+   test_bit((*offset >> TARGET_PAGE_BITS), block->bmap));
+
 return block;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index 3a9b3567ae..efa3a95f81 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -110,6 +110,7 @@ ram_save_iterate_big_wait(uint64_t milliconds, int 
iterations) "big wait: %" PRI
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" 
PRIu64
 ram_write_tracking_ramblock_start(const char *block_id, size_t page_size, void 
*addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
 ram_write_tracking_ramblock_stop(const char *block_id, size_t page_size, void 
*addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
+unqueue_page(char *block, uint64_t offset, bool dirty) "ramblock '%s' offset 
0x%"PRIx64" dirty %d"
 
 # multifd.c
 multifd_new_send_channel_async(uint8_t id) "channel %d"


Thanks,

-- 
Peter Xu




Re: [PATCH v7 21/22] target/riscv: Enable uxl field write

2022-01-19 Thread LIU Zhiwei



On 2022/1/20 上午8:35, Alistair Francis wrote:

On Wed, Jan 19, 2022 at 3:34 PM LIU Zhiwei  wrote:

Signed-off-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
  target/riscv/csr.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index b11d92b51b..90f78eca65 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -572,6 +572,7 @@ static RISCVException write_mstatus(CPURISCVState *env, int 
csrno,
  {
  uint64_t mstatus = env->mstatus;
  uint64_t mask = 0;
+RISCVMXL xl = riscv_cpu_mxl(env);

  /* flush tlb on mstatus fields that affect VM */
  if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPP | MSTATUS_MPV |
@@ -583,21 +584,22 @@ static RISCVException write_mstatus(CPURISCVState *env, 
int csrno,
  MSTATUS_MPP | MSTATUS_MXR | MSTATUS_TVM | MSTATUS_TSR |
  MSTATUS_TW | MSTATUS_VS;

-if (riscv_cpu_mxl(env) != MXL_RV32) {
+if (xl != MXL_RV32) {
  /*
   * RV32: MPV and GVA are not in mstatus. The current plan is to
   * add them to mstatush. For now, we just don't support it.
   */
  mask |= MSTATUS_MPV | MSTATUS_GVA;
+if ((val & MSTATUS64_UXL) != 0) {
+mask |= MSTATUS64_UXL;
+}
  }

  mstatus = (mstatus & ~mask) | (val & mask);

-RISCVMXL xl = riscv_cpu_mxl(env);
  if (xl > MXL_RV32) {
-/* SXL and UXL fields are for now read only */
+/* SXL field is for now read only */
  mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
-mstatus = set_field(mstatus, MSTATUS64_UXL, xl);

This change causes:

ERROR:../target/riscv/translate.c:295:get_gpr: code should not be reached

to assert when running an Xvisor (Hypervisor extension) guest on the
64-bit virt machine.


Hi Alistair,

I am almost sure that there is a UXL field write error in Xvisor.

I guess there is a write_sstatus instruction that writes a 0 to
SSTATUS64_UXL.


We can fix it on Xvisor. But before that, we should also give more 
strict constraints on SSTATUS64_UXL write.


+if ((val & SSTATUS64_UXL) != 0) {
+mask |= SSTATUS64_UXL;
+}
-mask |= SSTATUS64_UXL;


I will send a v8 patch set for you to test later.


Thanks,
Zhiwei


Alistair




Re: [PATCH RFC 02/15] migration: Allow pss->page jump over clean pages

2022-01-19 Thread Peter Xu
On Wed, Jan 19, 2022 at 01:42:47PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > Commit ba1b7c812c ("migration/ram: Optimize ram_save_host_page()") managed 
> > to
> > optimize host huge page use case by scanning the dirty bitmap when looking 
> > for
> > the next dirty small page to migrate.
> > 
> > However when updating the pss->page before returning from that function, we
> > used MIN() of these two values: (1) next dirty bit, or (2) end of current 
> > sent
> > huge page, to fix up pss->page.
> > 
> > That sounds unnecessary, because I see nowhere that requires pss->page to be
> > not going over current huge page boundary.
> > 
> > What we need here is probably MAX() instead of MIN() so that we'll start
> > scanning from the next dirty bit next time. Since pss->page can't be smaller
> > than hostpage_boundary (the loop guarantees it), it probably means we don't
> > need to fix it up at all.
> > 
> > Cc: Keqian Zhu 
> > Cc: Kunkun Jiang 
> > Signed-off-by: Peter Xu 
> 
> 
> Hmm, I think that's potentially necessary.  note that the start of
> ram_save_host_page stores the 'start_page' at entry.
> That' start_page' goes to the ram_save_release_protection and so
> I think it needs to be pagesize aligned for the mmap/uffd that happens.

Right, that's indeed a functional change, but IMHO it's also fine.

When reaching ram_save_release_protection(), what we guaranteed is that below
page range contains no dirty bits in ramblock dirty bitmap:

  range0 = [start_page, pss->page)

Side note: inclusive on start, but not inclusive on the end side of range0
(that is, pss->page can be pointing to a dirty page).

What ram_save_release_protection() does is to unprotect the pages and let them
run free.  If we're sure range0 contains no dirty page, it means we have
already copied them over into the snapshot, so IIUC it's safe to unprotect all
of it (even if it's already bigger than the host page size)?

That can be slightly less efficient for live snapshot in some extreme cases
(when unprotect, we'll need to walk the pgtables in the uffd ioctl()), but I
don't assume live snapshot to be run on a huge VM, so hopefully it's still
fine?  Not to mention it should make live migration a little bit faster,
assuming that's more frequently used..

Thanks,

-- 
Peter Xu




Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper

2022-01-19 Thread Peter Xu
On Wed, Jan 19, 2022 at 03:06:55PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Thu, Jan 13, 2022 at 4:02 AM Peter Xu  wrote:
> >
> > On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
> > >  void migration_channel_process_incoming(QIOChannel *ioc)
> > >  {
> > > -MigrationState *s = migrate_get_current();
> > >  Error *local_err = NULL;
> > >
> > >  trace_migration_set_incoming_channel(
> > >  ioc, object_get_typename(OBJECT(ioc)));
> > >
> > > -if (s->parameters.tls_creds &&
> > > -*s->parameters.tls_creds &&
> > > +if (migrate_use_tls() &&
> > >  !object_dynamic_cast(OBJECT(ioc),
> > >   TYPE_QIO_CHANNEL_TLS)) {
> > > +MigrationState *s = migrate_get_current();
> > > +
> >
> > Trivial nit: I'd rather keep the line there; as the movement offers nothing,
> > imho..
> 
> The idea to move the 's' to inside the if  block is to make it clear
> it's only used in this case.

IMHO not necessary; I hardly read declarations for this, unless there's a bug,
e.g. on variable shadowing. Moving it downwards makes it easier to happen. :)

> 
> But if you think it's better to keep it at the beginning of the
> function, sure, I can change that.
> Just let me know.

Since there'll be a new version, that definitely looks nicer.

Thanks,

-- 
Peter Xu




Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-19 Thread Peter Xu
On Wed, Jan 19, 2022 at 02:22:56PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Thu, Jan 13, 2022 at 7:42 AM Daniel P. Berrangé  
> wrote:
> >
> > On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > > On Thu, Jan 13, 2022 at 10:06:14AM +, Daniel P. Berrangé wrote:
> > > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > > @@ -558,15 +575,26 @@ static ssize_t 
> > > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > > >  memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > > >  }
> > > > > >
> > > > > > +if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > > +sflags = MSG_ZEROCOPY;
> > > > > > +}
> > > > > > +
> > > > > >   retry:
> > > > > > -ret = sendmsg(sioc->fd, , 0);
> > > > > > +ret = sendmsg(sioc->fd, , sflags);
> > > > > >  if (ret <= 0) {
> > > > > > -if (errno == EAGAIN) {
> > > > > > +switch (errno) {
> > > > > > +case EAGAIN:
> > > > > >  return QIO_CHANNEL_ERR_BLOCK;
> > > > > > -}
> > > > > > -if (errno == EINTR) {
> > > > > > +case EINTR:
> > > > > >  goto retry;
> > > > > > +case ENOBUFS:
> > > > > > +if (sflags & MSG_ZEROCOPY) {
> > > > > > +error_setg_errno(errp, errno,
> > > > > > + "Process can't lock enough memory 
> > > > > > for using MSG_ZEROCOPY");
> > > > > > +return -1;
> > > > > > +}
> > > > >
> > > > > I have no idea whether it'll make a real differnece, but - should we 
> > > > > better add
> > > > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > > >
> > > > > Reviewed-by: Peter Xu 
> > > > >
> > > > > I also wonder whether you hit ENOBUFS in any of the environments.  On 
> > > > > Fedora
> > > > > here it's by default unlimited, but just curious when we should keep 
> > > > > an eye.
> > > >
> > > > Fedora doesn't allow unlimited locked memory by default
> > > >
> > > > $ grep "locked memory" /proc/self/limits
> > > > Max locked memory 6553665536
> > > > bytes
> > > >
> > > > And  regardless of Fedora defaults, libvirt will set a limit
> > > > for the guest. It will only be unlimited if requiring certain
> > > > things like VFIO.
> > >
> > > Thanks, I obviously checked up the wrong host..
> > >
> > > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > > there be a limit?  Is it linear to the number of sockets/channels?
> >
> > IIRC we decided it would be limited by the socket send buffer size, rather
> > than guest RAM, because writes will block once the send buffer is full.
> >
> > This has a default global setting, with per-socket override. On one box I
> > have it is 200 Kb. With multifd you'll need  "num-sockets * send buffer".
> 
> Oh, I was not aware there is a send buffer size (or maybe I am unable
> to recall).
> That sure makes things much easier.
> 
> >
> > > It'll be better if we can fail at enabling the feature when we detected 
> > > that
> > > the specified locked memory limit may not suffice.
> 
> sure
> 
> >
> > Checking this value against available locked memory though will always
> > have an error margin because other things in QEMU can use locked memory
> > too
> 
> We can get the current limit (before zerocopy) as an error margin:
> req_lock_mem = num-sockets * send buffer + BASE_LOCKED
> 
> Where BASE_LOCKED is the current libvirt value, or so on.

Hmm.. not familiar with libvirt, so I'm curious whether libvirt is actually
enlarging the allowed locked mem on Fedora since the default is 64KB?

I think it'll be great to capture the very major going-to-fail scenarios.  For
example, I'm wondering whether a qemu (without libvirt) will simply fail
directly on Fedora using non-root even with 1 channel due to the 64K limit, or
the other extreme case is when the user does not allow locking mem at all in
some container environment (when we see max locked mem is zero).

It's not only about failing early, it's also about failing with a meaningful
error so the user knows what to tune, while I'm not very sure that'll be easily
understandable when we wait until the failure of io_writev().

Thanks,

-- 
Peter Xu




Re: [PATCH v7 21/22] target/riscv: Enable uxl field write

2022-01-19 Thread Alistair Francis
On Wed, Jan 19, 2022 at 3:34 PM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Alistair Francis 
> ---
>  target/riscv/csr.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index b11d92b51b..90f78eca65 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -572,6 +572,7 @@ static RISCVException write_mstatus(CPURISCVState *env, 
> int csrno,
>  {
>  uint64_t mstatus = env->mstatus;
>  uint64_t mask = 0;
> +RISCVMXL xl = riscv_cpu_mxl(env);
>
>  /* flush tlb on mstatus fields that affect VM */
>  if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPP | MSTATUS_MPV |
> @@ -583,21 +584,22 @@ static RISCVException write_mstatus(CPURISCVState *env, 
> int csrno,
>  MSTATUS_MPP | MSTATUS_MXR | MSTATUS_TVM | MSTATUS_TSR |
>  MSTATUS_TW | MSTATUS_VS;
>
> -if (riscv_cpu_mxl(env) != MXL_RV32) {
> +if (xl != MXL_RV32) {
>  /*
>   * RV32: MPV and GVA are not in mstatus. The current plan is to
>   * add them to mstatush. For now, we just don't support it.
>   */
>  mask |= MSTATUS_MPV | MSTATUS_GVA;
> +if ((val & MSTATUS64_UXL) != 0) {
> +mask |= MSTATUS64_UXL;
> +}
>  }
>
>  mstatus = (mstatus & ~mask) | (val & mask);
>
> -RISCVMXL xl = riscv_cpu_mxl(env);
>  if (xl > MXL_RV32) {
> -/* SXL and UXL fields are for now read only */
> +/* SXL field is for now read only */
>  mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
> -mstatus = set_field(mstatus, MSTATUS64_UXL, xl);

This change causes:

ERROR:../target/riscv/translate.c:295:get_gpr: code should not be reached

to assert when running an Xvisor (Hypervisor extension) guest on the
64-bit virt machine.

Alistair



Re: [PATCH v2] qapi: Cleanup SGX related comments and restore @section-size

2022-01-19 Thread Philippe Mathieu-Daudé via
+Markus for QAPI deprecation

On 1/20/22 00:57, Yang Zhong wrote:
> The SGX NUMA patches were merged into Qemu 7.0 release, we need
> clarify detailed version history information and also change
> some related comments, which make SGX related comments clearer.
> 
> The QMP command schema promises backwards compatibility as standard.
> We temporarily restore "@section-size", which can avoid incompatible
> API breakage. The "@section-size" will be deprecated in 7.2 version.
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Yang Zhong 
> Reviewed-by: Daniel P. Berrangé 
> ---
>  qapi/machine.json |  4 ++--
>  qapi/misc-target.json | 17 -
>  hw/i386/sgx.c | 11 +--
>  3 files changed, 23 insertions(+), 9 deletions(-)

> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> index 1022aa0184..a87358ea44 100644
> --- a/qapi/misc-target.json
> +++ b/qapi/misc-target.json
> @@ -344,9 +344,9 @@
>  #
>  # @node: the numa node
>  #
> -# @size: the size of epc section
> +# @size: the size of EPC section
>  #
> -# Since: 6.2
> +# Since: 7.0
>  ##
>  { 'struct': 'SGXEPCSection',
>'data': { 'node': 'int',
> @@ -365,7 +365,9 @@
>  #
>  # @flc: true if FLC is supported
>  #
> -# @sections: The EPC sections info for guest
> +# @section-size: The EPC section size for guest (Will be deprecated in 7.2)

See commit 75ecee72625 ("qapi: Enable enum member introspection to show
more than name"). I'd change as:

  # @section-size: The EPC section size for guest
  # Redundant with @sections.  Just for backward
compatibility.

> +#
> +# @sections: The EPC sections info for guest (Since: 7.0)

and then add:


  # Features:
  # @deprecated: Member @section-size is deprecated.  Use @sections instead.

>  #
>  # Since: 6.2
>  ##
> @@ -374,6 +376,7 @@
>  'sgx1': 'bool',
>  'sgx2': 'bool',
>  'flc': 'bool',
> +'section-size': 'uint64',
>  'sections': ['SGXEPCSection']},
> 'if': 'TARGET_I386' }



[PATCH v3 1/2] tpm: CRB: Use ram_device for "tpm-crb-cmd" region

2022-01-19 Thread Philippe Mathieu-Daudé via
From: Eric Auger 

Representing the CRB cmd/response buffer as a standard
RAM region causes some trouble when the device is used
with VFIO. Indeed VFIO attempts to DMA_MAP this region
as usual RAM but this latter does not have a valid page
size alignment causing such an error report:
"vfio_listener_region_add received unaligned region".
To allow VFIO to detect that failing dma mapping
this region is not an issue, let's use a ram_device
memory region type instead.

Signed-off-by: Eric Auger 
Tested-by: Stefan Berger 
Acked-by: Stefan Berger 
[PMD: Keep tpm_crb.c in meson's softmmu_ss]
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/tpm/tpm_crb.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
index 58ebd1469c3..be0884ea603 100644
--- a/hw/tpm/tpm_crb.c
+++ b/hw/tpm/tpm_crb.c
@@ -25,6 +25,7 @@
 #include "sysemu/tpm_backend.h"
 #include "sysemu/tpm_util.h"
 #include "sysemu/reset.h"
+#include "exec/cpu-common.h"
 #include "tpm_prop.h"
 #include "tpm_ppi.h"
 #include "trace.h"
@@ -43,6 +44,7 @@ struct CRBState {
 
 bool ppi_enabled;
 TPMPPI ppi;
+uint8_t *crb_cmd_buf;
 };
 typedef struct CRBState CRBState;
 
@@ -291,10 +293,14 @@ static void tpm_crb_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
+s->crb_cmd_buf = qemu_memalign(qemu_real_host_page_size,
+HOST_PAGE_ALIGN(CRB_CTRL_CMD_SIZE));
+
memory_region_init_io(&s->mmio, OBJECT(s), &tpm_crb_memory_ops, s,
 "tpm-crb-mmio", sizeof(s->regs));
-memory_region_init_ram(&s->cmdmem, OBJECT(s),
-"tpm-crb-cmd", CRB_CTRL_CMD_SIZE, errp);
+memory_region_init_ram_device_ptr(&s->cmdmem, OBJECT(s), "tpm-crb-cmd",
+  CRB_CTRL_CMD_SIZE, s->crb_cmd_buf);
+vmstate_register_ram(&s->cmdmem, DEVICE(s));
 
memory_region_add_subregion(get_system_memory(),
TPM_CRB_ADDR_BASE, &s->mmio);
@@ -309,12 +315,24 @@ static void tpm_crb_realize(DeviceState *dev, Error 
**errp)
 qemu_register_reset(tpm_crb_reset, dev);
 }
 
+static void tpm_crb_unrealize(DeviceState *dev)
+{
+CRBState *s = CRB(dev);
+
+qemu_vfree(s->crb_cmd_buf);
+
+if (s->ppi_enabled) {
+qemu_vfree(s->ppi.buf);
+}
+}
+
 static void tpm_crb_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 TPMIfClass *tc = TPM_IF_CLASS(klass);
 
 dc->realize = tpm_crb_realize;
+dc->unrealize = tpm_crb_unrealize;
 device_class_set_props(dc, tpm_crb_properties);
dc->vmsd  = &vmstate_tpm_crb;
 dc->user_creatable = true;
-- 
2.34.1




[PATCH v3 2/2] hw/vfio/common: Silence ram device offset alignment error traces

2022-01-19 Thread Philippe Mathieu-Daudé via
From: Eric Auger 

Failing to DMA MAP a ram_device should not cause an error message.
This is currently happening with the TPM CRB command region and
this is causing confusion.

We may want to keep the trace for debug purpose though.

Signed-off-by: Eric Auger 
Tested-by: Stefan Berger 
Acked-by: Alex Williamson 
Acked-by: Stefan Berger 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/vfio/common.c | 15 ++-
 hw/vfio/trace-events |  1 +
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f51..9caa560b078 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -884,7 +884,20 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 if (unlikely((section->offset_within_address_space &
   ~qemu_real_host_page_mask) !=
  (section->offset_within_region & ~qemu_real_host_page_mask))) 
{
-error_report("%s received unaligned region", __func__);
+if (memory_region_is_ram_device(section->mr)) { /* just debug purpose 
*/
+trace_vfio_listener_region_add_bad_offset_alignment(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+section->offset_within_region, qemu_real_host_page_size);
+} else { /* error case we don't want to be fatal */
+error_report("%s received unaligned region %s iova=0x%"PRIx64
+ " offset_within_region=0x%"PRIx64
+ " qemu_real_host_page_mask=0x%"PRIx64,
+ __func__, memory_region_name(section->mr),
+ section->offset_within_address_space,
+ section->offset_within_region,
+ qemu_real_host_page_mask);
+}
 return;
 }
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 0ef1b5f4a65..ccd9d7610d6 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -100,6 +100,7 @@ vfio_listener_region_add_skip(uint64_t start, uint64_t end) 
"SKIPPING region_add
 vfio_spapr_group_attach(int groupfd, int tablefd) "Attached groupfd %d to 
liobn fd %d"
 vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add 
[iommu] 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_add_ram(uint64_t iova_start, uint64_t iova_end, void 
*vaddr) "region_add [ram] 0x%"PRIx64" - 0x%"PRIx64" [%p]"
+vfio_listener_region_add_bad_offset_alignment(const char *name, uint64_t iova, 
uint64_t offset_within_region, uint64_t page_size) "Region \"%s\" @0x%"PRIx64", 
offset_within_region=0x%"PRIx64", qemu_real_host_page_mask=0x%"PRIx64 " cannot 
be mapped for DMA"
 vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t 
size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not 
aligned to 0x%"PRIx64" and cannot be mapped for DMA"
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING 
region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" 
- 0x%"PRIx64
-- 
2.34.1




[PATCH v3 0/2] tpm: CRB: Use ram_device for "tpm-crb-cmd" region

2022-01-19 Thread Philippe Mathieu-Daudé via
This is a respin of Eric's work, but not making tpm_crb.c target
specific.

Based-on: <2022012836.229419-1-f4...@amsat.org>
"exec/cpu: Make host pages variables / macros 'target agnostic'"
https://lore.kernel.org/qemu-devel/2022012836.229419-1-f4...@amsat.org/

--

Eric's v2 cover:

This series aims at removing a spurious error message we get when
launching a guest with a TPM-CRB device and VFIO-PCI devices.

The CRB command buffer currently is a RAM MemoryRegion and given
its base address alignment, it causes an error report on
vfio_listener_region_add(). This series proposes to use a ram-device
region instead which helps in better assessing the dma map error
failure on VFIO side.

Eric Auger (2):
  tpm: CRB: Use ram_device for "tpm-crb-cmd" region
  hw/vfio/common: Silence ram device offset alignment error traces

 hw/tpm/tpm_crb.c | 22 --
 hw/vfio/common.c | 15 ++-
 hw/vfio/trace-events |  1 +
 3 files changed, 35 insertions(+), 3 deletions(-)

-- 
2.34.1




Re: [PATCH v5 03/18] pci: isolated address space for PCI bus

2022-01-19 Thread Michael S. Tsirkin
On Wed, Jan 19, 2022 at 04:41:52PM -0500, Jagannathan Raman wrote:
> Allow PCI buses to be part of isolated CPU address spaces. This has a
> niche usage.
> 
> TYPE_REMOTE_MACHINE allows multiple VMs to house their PCI devices in
> the same machine/server. This would cause address space collision as
> well as be a security vulnerability. Having separate address spaces for
> each PCI bus would solve this problem.

Fascinating, but I am not sure I understand. any examples?

I also wonder whether this special type could be modelled like a special
kind of iommu internally.

> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> ---
>  include/hw/pci/pci.h |  2 ++
>  include/hw/pci/pci_bus.h | 17 +
>  hw/pci/pci.c | 17 +
>  hw/pci/pci_bridge.c  |  5 +
>  4 files changed, 41 insertions(+)
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 023abc0f79..9bb4472abc 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -387,6 +387,8 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
>  int pci_device_load(PCIDevice *s, QEMUFile *f);
>  MemoryRegion *pci_address_space(PCIDevice *dev);
>  MemoryRegion *pci_address_space_io(PCIDevice *dev);
> +AddressSpace *pci_isol_as_mem(PCIDevice *dev);
> +AddressSpace *pci_isol_as_io(PCIDevice *dev);
>  
>  /*
>   * Should not normally be used by devices. For use by sPAPR target
> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> index 347440d42c..d78258e79e 100644
> --- a/include/hw/pci/pci_bus.h
> +++ b/include/hw/pci/pci_bus.h
> @@ -39,9 +39,26 @@ struct PCIBus {
>  void *irq_opaque;
>  PCIDevice *devices[PCI_SLOT_MAX * PCI_FUNC_MAX];
>  PCIDevice *parent_dev;
> +
>  MemoryRegion *address_space_mem;
>  MemoryRegion *address_space_io;
>  
> +/**
> + * Isolated address spaces - these allow the PCI bus to be part
> + * of an isolated address space as opposed to the global
> + * address_space_memory & address_space_io.

Are you sure address_space_memory & address_space_io are
always global? even in the case of an iommu?

> This allows the
> + * bus to be attached to CPUs from different machines. The
> + * following is not used commonly.
> + *
> + * TYPE_REMOTE_MACHINE allows emulating devices from multiple
> + * VM clients,

what are VM clients?

> as such it needs the PCI buses in the same machine
> + * to be part of different CPU address spaces. The following is
> + * useful in that scenario.
> + *
> + */
> +AddressSpace *isol_as_mem;
> +AddressSpace *isol_as_io;
> +
>  QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
>  QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
>  
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 5d30f9ca60..d5f1c6c421 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -442,6 +442,8 @@ static void pci_root_bus_internal_init(PCIBus *bus, 
> DeviceState *parent,
>  bus->slot_reserved_mask = 0x0;
>  bus->address_space_mem = address_space_mem;
>  bus->address_space_io = address_space_io;
> +bus->isol_as_mem = NULL;
> +bus->isol_as_io = NULL;
>  bus->flags |= PCI_BUS_IS_ROOT;
>  
>  /* host bridge */
> @@ -2676,6 +2678,16 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
>  return pci_get_bus(dev)->address_space_io;
>  }
>  
> +AddressSpace *pci_isol_as_mem(PCIDevice *dev)
> +{
> +return pci_get_bus(dev)->isol_as_mem;
> +}
> +
> +AddressSpace *pci_isol_as_io(PCIDevice *dev)
> +{
> +return pci_get_bus(dev)->isol_as_io;
> +}
> +
>  static void pci_device_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *k = DEVICE_CLASS(klass);
> @@ -2699,6 +2711,7 @@ static void pci_device_class_base_init(ObjectClass 
> *klass, void *data)
>  
>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>  {
> +AddressSpace *iommu_as = NULL;
>  PCIBus *bus = pci_get_bus(dev);
>  PCIBus *iommu_bus = bus;
>  uint8_t devfn = dev->devfn;
> @@ -2745,6 +2758,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
> *dev)
>  if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
>  return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
>  }
> +iommu_as = pci_isol_as_mem(dev);
> +if (iommu_as) {
> +return iommu_as;
> +}
>  return &address_space_memory;
>  }
>  
> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> index da34c8ebcd..98366768d2 100644
> --- a/hw/pci/pci_bridge.c
> +++ b/hw/pci/pci_bridge.c
> @@ -383,6 +383,11 @@ void pci_bridge_initfn(PCIDevice *dev, const char 
> *typename)
>  sec_bus->address_space_io = &br->address_space_io;
>  memory_region_init(&br->address_space_io, OBJECT(br), "pci_bridge_io",
> 4 * GiB);
> +
> +/* This PCI bridge puts the sec_bus in its parent's address space */
> +

[PATCH] exec/cpu: Make host pages variables / macros 'target agnostic'

2022-01-19 Thread Philippe Mathieu-Daudé via
"host" pages are related to the *host* not the *target*,
thus the qemu_host_page_size / qemu_host_page_mask variables
and the HOST_PAGE_ALIGN() / REAL_HOST_PAGE_ALIGN() macros
can be moved to "exec/cpu-common.h" which is target agnostic.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/exec/cpu-all.h| 9 -
 include/exec/cpu-common.h | 9 +
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index bb37239efa3..84caf5c3d9f 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -234,15 +234,6 @@ extern const TargetPageBits target_page;
 
 #define TARGET_PAGE_ALIGN(addr) ROUND_UP((addr), TARGET_PAGE_SIZE)
 
-/* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
- * when intptr_t is 32-bit and we are aligning a long long.
- */
-extern uintptr_t qemu_host_page_size;
-extern intptr_t qemu_host_page_mask;
-
-#define HOST_PAGE_ALIGN(addr) ROUND_UP((addr), qemu_host_page_size)
-#define REAL_HOST_PAGE_ALIGN(addr) ROUND_UP((addr), qemu_real_host_page_size)
-
 /* same as PROT_xxx */
 #define PAGE_READ  0x0001
 #define PAGE_WRITE 0x0002
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 039d422bf4c..de5f444b193 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -7,6 +7,15 @@
 #include "exec/hwaddr.h"
 #endif
 
+/* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
+ * when intptr_t is 32-bit and we are aligning a long long.
+ */
+extern uintptr_t qemu_host_page_size;
+extern intptr_t qemu_host_page_mask;
+
+#define HOST_PAGE_ALIGN(addr) ROUND_UP((addr), qemu_host_page_size)
+#define REAL_HOST_PAGE_ALIGN(addr) ROUND_UP((addr), qemu_real_host_page_size)
+
 /* The CPU list lock nests outside page_(un)lock or mmap_(un)lock */
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
-- 
2.34.1




Re: [PATCH v2 1/2] tpm: CRB: Use ram_device for "tpm-crb-cmd" region

2022-01-19 Thread Alex Williamson
On Wed, 19 Jan 2022 23:46:19 +0100
Philippe Mathieu-Daudé  wrote:

> On 18/1/22 16:33, Eric Auger wrote:
> > Representing the CRB cmd/response buffer as a standard
> > RAM region causes some trouble when the device is used
> > with VFIO. Indeed VFIO attempts to DMA_MAP this region
> > as usual RAM but this latter does not have a valid page
> > size alignment causing such an error report:
> > "vfio_listener_region_add received unaligned region".
> > To allow VFIO to detect that failing dma mapping
> > this region is not an issue, let's use a ram_device
> > memory region type instead.
> > 
> > The change in meson.build is required to include the
> > cpu.h header.
> > 
> > Signed-off-by: Eric Auger 
> > Tested-by: Stefan Berger 
> > 
> > ---
> > 
> > v1 -> v2:
> > - Add tpm_crb_unrealize
> > ---
> >   hw/tpm/meson.build |  2 +-
> >   hw/tpm/tpm_crb.c   | 22 --
> >   2 files changed, 21 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/tpm/meson.build b/hw/tpm/meson.build
> > index 1c68d81d6a..3e74df945b 100644
> > --- a/hw/tpm/meson.build
> > +++ b/hw/tpm/meson.build
> > @@ -1,8 +1,8 @@
> >   softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: files('tpm_tis_common.c'))
> >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_ISA', if_true: 
> > files('tpm_tis_isa.c'))
> >   softmmu_ss.add(when: 'CONFIG_TPM_TIS_SYSBUS', if_true: 
> > files('tpm_tis_sysbus.c'))
> > -softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))
> >   
> > +specific_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))  
> 
> We don't need to make this file target-specific.
> 
> >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_TIS'], if_true: 
> > files('tpm_ppi.c'))
> >   specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_CRB'], if_true: 
> > files('tpm_ppi.c'))
> >   specific_ss.add(when: 'CONFIG_TPM_SPAPR', if_true: files('tpm_spapr.c'))
> > diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
> > index 58ebd1469c..6ec19a9911 100644
> > --- a/hw/tpm/tpm_crb.c
> > +++ b/hw/tpm/tpm_crb.c
> > @@ -25,6 +25,7 @@
> >   #include "sysemu/tpm_backend.h"
> >   #include "sysemu/tpm_util.h"
> >   #include "sysemu/reset.h"
> > +#include "cpu.h"
> >   #include "tpm_prop.h"
> >   #include "tpm_ppi.h"
> >   #include "trace.h"
> > @@ -43,6 +44,7 @@ struct CRBState {
> >   
> >   bool ppi_enabled;
> >   TPMPPI ppi;
> > +uint8_t *crb_cmd_buf;
> >   };
> >   typedef struct CRBState CRBState;
> >   
> > @@ -291,10 +293,14 @@ static void tpm_crb_realize(DeviceState *dev, Error 
> > **errp)
> >   return;
> >   }
> >   
> > +s->crb_cmd_buf = qemu_memalign(qemu_real_host_page_size,
> > +HOST_PAGE_ALIGN(CRB_CTRL_CMD_SIZE));
> 
> HOST_PAGE_ALIGN() and qemu_real_host_page_size() actually belong
> to "exec/cpu-common.h".
> 
> Alex, could you hold on a few days for this patch? I am going to send
> a cleanup series. Otherwise no worry, I will clean this on top too.

Sure.  Thanks,

Alex




Re: Cross Architecture Kernel Modules?

2022-01-19 Thread Kenneth Adam Miller
Would it be possible somehow to save the TCG cache, as with user binaries,
but for a kernel module, before then loading that kernel module into memory
the target architecture whether in or outside of QEMU?

On Wed, Jan 19, 2022 at 2:42 PM Kenneth Adam Miller <
kennethadammil...@gmail.com> wrote:

> The source for it isn't available in order that it be compiled to the
> desired architecture.
>
> What 3rd party forks take this approach?
>
> On Wed, Jan 19, 2022 at 2:06 PM Alex Bennée 
> wrote:
>
>>
>> Kenneth Adam Miller  writes:
>>
>> > Hello all,
>> >
>> > I just want to pose the following problem:
>> >
>> > There is a kernel module for a non-native architecture, say, arch 1.
>> For performance reasons, the rest of all of the software needs to run
>> > natively on a different arch, arch 2. Is there any way to perhaps run
>> multiple QEMU instances for the different architectures in such a way
>> > to minimize the cross architecture performance penalty? For example, I
>> would like the kernel module in one (non-native) QEMU instance to
>> > be made available, literally equivalently, in the second (native) QEMU
>> instance. Would there be any API or way to map across the QEMU
>> > instances so that the non native arch kernel module could be mapped to
>> > the native QEMU instance?
>>
>> What you are describing sounds like heterogeneous system modelling which
>> QEMU only supports in a very limited way (all vCPUs must be the same
>> base architecture). You can link QEMU's together by way of shared memory
>> but there is no other wiring together done in that case although some
>> 3rd party forks take this approach.
>>
>> The kernel module sounds confusing - why would you have a kernel module
>> that wasn't the same architecture as the kernel you are running?
>>
>> --
>> Alex Bennée
>>
>


Re: [PATCH] hw/nvram: use at24 macro

2022-01-19 Thread Philippe Mathieu-Daudé via

On 19/1/22 22:43, Patrick Venture wrote:

Use the macro for going from I2CSlave to EEPROMState.

Signed-off-by: Patrick Venture 
---
  hw/nvram/eeprom_at24c.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v2 1/2] tpm: CRB: Use ram_device for "tpm-crb-cmd" region

2022-01-19 Thread Philippe Mathieu-Daudé via

On 18/1/22 16:33, Eric Auger wrote:

Representing the CRB cmd/response buffer as a standard
RAM region causes some trouble when the device is used
with VFIO. Indeed VFIO attempts to DMA_MAP this region
as usual RAM but this latter does not have a valid page
size alignment causing such an error report:
"vfio_listener_region_add received unaligned region".
To allow VFIO to detect that failing dma mapping
this region is not an issue, let's use a ram_device
memory region type instead.

The change in meson.build is required to include the
cpu.h header.

Signed-off-by: Eric Auger 
Tested-by: Stefan Berger 

---

v1 -> v2:
- Add tpm_crb_unrealize
---
  hw/tpm/meson.build |  2 +-
  hw/tpm/tpm_crb.c   | 22 --
  2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/hw/tpm/meson.build b/hw/tpm/meson.build
index 1c68d81d6a..3e74df945b 100644
--- a/hw/tpm/meson.build
+++ b/hw/tpm/meson.build
@@ -1,8 +1,8 @@
  softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: files('tpm_tis_common.c'))
  softmmu_ss.add(when: 'CONFIG_TPM_TIS_ISA', if_true: files('tpm_tis_isa.c'))
  softmmu_ss.add(when: 'CONFIG_TPM_TIS_SYSBUS', if_true: 
files('tpm_tis_sysbus.c'))
-softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))
  
+specific_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))


We don't need to make this file target-specific.


  specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_TIS'], if_true: 
files('tpm_ppi.c'))
  specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TPM_CRB'], if_true: 
files('tpm_ppi.c'))
  specific_ss.add(when: 'CONFIG_TPM_SPAPR', if_true: files('tpm_spapr.c'))
diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
index 58ebd1469c..6ec19a9911 100644
--- a/hw/tpm/tpm_crb.c
+++ b/hw/tpm/tpm_crb.c
@@ -25,6 +25,7 @@
  #include "sysemu/tpm_backend.h"
  #include "sysemu/tpm_util.h"
  #include "sysemu/reset.h"
+#include "cpu.h"
  #include "tpm_prop.h"
  #include "tpm_ppi.h"
  #include "trace.h"
@@ -43,6 +44,7 @@ struct CRBState {
  
  bool ppi_enabled;

  TPMPPI ppi;
+uint8_t *crb_cmd_buf;
  };
  typedef struct CRBState CRBState;
  
@@ -291,10 +293,14 @@ static void tpm_crb_realize(DeviceState *dev, Error **errp)

  return;
  }
  
+s->crb_cmd_buf = qemu_memalign(qemu_real_host_page_size,

+HOST_PAGE_ALIGN(CRB_CTRL_CMD_SIZE));


HOST_PAGE_ALIGN() and qemu_real_host_page_size() actually belong
to "exec/cpu-common.h".

Alex, could you hold on a few days for this patch? I am going to send
a cleanup series. Otherwise no worry, I will clean this on top too.

Thanks,

Phil.



[PATCH v5 18/18] vfio-user: avocado tests for vfio-user

2022-01-19 Thread Jagannathan Raman
Avocado tests for libvfio-user in QEMU - tests startup,
hotplug and migration of the server object

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 MAINTAINERS|   1 +
 tests/avocado/vfio-user.py | 225 +
 2 files changed, 226 insertions(+)
 create mode 100644 tests/avocado/vfio-user.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 93bce3fa62..9ef9e1f75a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3488,6 +3488,7 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: tests/avocado/vfio-user.py
 
 EBPF:
 M: Jason Wang 
diff --git a/tests/avocado/vfio-user.py b/tests/avocado/vfio-user.py
new file mode 100644
index 00..376c02c41f
--- /dev/null
+++ b/tests/avocado/vfio-user.py
@@ -0,0 +1,225 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+from avocado.utils import network
+from avocado.utils import wait
+
+class VfioUser(QemuSystemTest):
+"""
+:avocado: tags=vfiouser
+"""
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+timeout = 20
+
+@staticmethod
+def migration_finished(vm):
+res = vm.command('query-migrate')
+if 'status' in res:
+return res['status'] in ('completed', 'failed')
+else:
+return False
+
+def _get_free_port(self):
+port = network.find_free_port()
+if port is None:
+self.cancel('Failed to find a free port')
+return port
+
+def validate_vm_launch(self, vm):
+wait_for_console_pattern(self, 'as init process',
+ 'Kernel panic - not syncing', vm=vm)
+exec_command(self, 'mount -t sysfs sysfs /sys', vm=vm)
+exec_command_and_wait_for_pattern(self,
+  'cat /sys/bus/pci/devices/*/uevent',
+  'PCI_ID=1000:0012', vm=vm)
+
+def launch_server_startup(self, socket, *opts):
+server_vm = self.get_vm()
+server_vm.add_args('-machine', 'x-remote')
+server_vm.add_args('-nodefaults')
+server_vm.add_args('-device', 'lsi53c895a,id=lsi1')
+server_vm.add_args('-object', 'x-vfio-user-server,id=vfioobj1,'
+   'type=unix,path='+socket+',device=lsi1')
+for opt in opts:
+server_vm.add_args(opt)
+server_vm.launch()
+return server_vm
+
+def launch_server_hotplug(self, socket):
+server_vm = self.get_vm()
+server_vm.add_args('-machine', 'x-remote')
+server_vm.add_args('-nodefaults')
+server_vm.add_args('-device', 'lsi53c895a,id=lsi1')
+server_vm.launch()
+server_vm.command('human-monitor-command',
+  command_line='object_add x-vfio-user-server,'
+   'id=vfioobj,socket.type=unix,'
+   'socket.path='+socket+',device=lsi1')
+return server_vm
+
+def launch_client(self, kernel_path, initrd_path, kernel_command_line,
+  machine_type, socket, *opts):
+client_vm = self.get_vm()
+client_vm.set_console()
+client_vm.add_args('-machine', machine_type)
+client_vm.add_args('-accel', 'kvm')
+client_vm.add_args('-cpu', 'host')
+client_vm.add_args('-object',
+   'memory-backend-memfd,id=sysmem-file,size=2G')
+client_vm.add_args('--numa', 'node,memdev=sysmem-file')
+client_vm.add_args('-m', '2048')
+client_vm.add_args('-kernel', kernel_path,
+   '-initrd', initrd_path,
+   '-append', kernel_command_line)
+client_vm.add_args('-device',
+   'vfio-user-pci,x-enable-migration=true,'
+   'socket='+socket)
+for opt in opts:
+client_vm.add_args(opt)
+client_vm.launch()
+return client_vm
+
+def do_test_startup(self, kernel_url, initrd_url, kernel_command_line,
+machine_type):
+self.require_accelerator('kvm')
+
+kernel_path = self.fetch_asset(kernel_url)
+initrd_path = self.fetch_asset(initrd_url)
+socket = os.path.join('/tmp', str(uuid.uuid4()))
+if os.path.exists(socket):
+os.remove(socket)
+self.launch_server_startup(socket)
+client = self.launch_client(kernel_path, initrd_path,
+kernel_command_line, machine_type, socket)
+

[PATCH] hw/nvram: use at24 macro

2022-01-19 Thread Patrick Venture
Use the macro for going from I2CSlave to EEPROMState.

Signed-off-by: Patrick Venture 
---
 hw/nvram/eeprom_at24c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
index af6f5dbb99..da435500ba 100644
--- a/hw/nvram/eeprom_at24c.c
+++ b/hw/nvram/eeprom_at24c.c
@@ -54,7 +54,7 @@ struct EEPROMState {
 static
 int at24c_eeprom_event(I2CSlave *s, enum i2c_event event)
 {
-EEPROMState *ee = container_of(s, EEPROMState, parent_obj);
+EEPROMState *ee = AT24C_EE(s);
 
 switch (event) {
 case I2C_START_SEND:
-- 
2.34.1.703.g22d0c6ccf7-goog




[PATCH v5 16/18] vfio-user: handle device interrupts

2022-01-19 Thread Jagannathan Raman
Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 include/hw/pci/pci.h  |  6 +++
 hw/pci/msi.c  | 13 +-
 hw/pci/msix.c | 12 +-
 hw/remote/vfio-user-obj.c | 89 +++
 hw/remote/trace-events|  1 +
 5 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 8c18f10d9d..092334d2af 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -128,6 +128,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
 typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
 pcibus_t addr, pcibus_t size, int type);
 typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
+typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
+typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
 
 typedef struct PCIIORegion {
 pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
@@ -322,6 +324,10 @@ struct PCIDevice {
 /* Space to store MSIX table & pending bit array */
 uint8_t *msix_table;
 uint8_t *msix_pba;
+
+PCIMSINotify *msi_notify;
+PCIMSIxNotify *msix_notify;
+
 /* MemoryRegion container for msix exclusive BAR setup */
 MemoryRegion msix_exclusive_bar;
 /* Memory Regions for MSIX table and pending bit entries. */
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 47d2b0f33c..93f5e400cc 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -51,6 +51,8 @@
  */
 bool msi_nonbroken;
 
+static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 dev->msi_cap = config_offset;
 dev->cap_present |= QEMU_PCI_CAP_MSI;
 
+dev->msi_notify = pci_msi_notify;
+
 pci_set_word(dev->config + msi_flags_off(dev), flags);
 pci_set_word(dev->wmask + msi_flags_off(dev),
  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
@@ -307,7 +311,7 @@ bool msi_is_masked(const PCIDevice *dev, unsigned int 
vector)
 return mask & (1U << vector);
 }
 
-void msi_notify(PCIDevice *dev, unsigned int vector)
+static void pci_msi_notify(PCIDevice *dev, unsigned int vector)
 {
 uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
 bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
@@ -332,6 +336,13 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
 msi_send_message(dev, msg);
 }
 
+void msi_notify(PCIDevice *dev, unsigned int vector)
+{
+if (dev->msi_notify) {
+dev->msi_notify(dev, vector);
+}
+}
+
 void msi_send_message(PCIDevice *dev, MSIMessage msg)
 {
 MemTxAttrs attrs = {};
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331cd0b..1c71e67f53 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -31,6 +31,8 @@
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
+static void pci_msix_notify(PCIDevice *dev, unsigned vector);
+
 MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
 {
 uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
@@ -334,6 +336,7 @@ int msix_init(struct PCIDevice *dev, unsigned short 
nentries,
 dev->msix_table = g_malloc0(table_size);
 dev->msix_pba = g_malloc0(pba_size);
 dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
+dev->msix_notify = pci_msix_notify;
 
 msix_mask_all(dev, nentries);
 
@@ -485,7 +488,7 @@ int msix_enabled(PCIDevice *dev)
 }
 
 /* Send an MSI-X message */
-void msix_notify(PCIDevice *dev, unsigned vector)
+static void pci_msix_notify(PCIDevice *dev, unsigned vector)
 {
 MSIMessage msg;
 
@@ -503,6 +506,13 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 msi_send_message(dev, msg);
 }
 
+void msix_notify(PCIDevice *dev, unsigned vector)
+{
+if (dev->msix_notify) {
+dev->msix_notify(dev, vector);
+}
+}
+
 void msix_reset(PCIDevice *dev)
 {
 if (!msix_present(dev)) {
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index bf88eac8f1..1771dba1bf 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -53,6 +53,8 @@
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -106,6 +108,8 @@ struct VfuObject {
 int vfu_poll_fd;
 };
 
+static GHashTable *vfu_object_dev_to_ctx_table;
+
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
 
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
@@ -381,6 +385,72 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, 
PCIDevice *pdev)
 }
 }
 
+static int 

[PATCH v5 15/18] vfio-user: handle PCI BAR accesses

2022-01-19 Thread Jagannathan Raman
Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
---
 hw/remote/vfio-user-obj.c | 92 +++
 hw/remote/trace-events|  3 ++
 2 files changed, 95 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index e690f1eaae..bf88eac8f1 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -291,6 +291,96 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, 
vfu_dma_info_t *info)
 trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
 }
 
+static ssize_t vfu_object_bar_rw(PCIDevice *pci_dev, hwaddr addr, size_t count,
+ char * const buf, const bool is_write,
+ bool is_io)
+{
+AddressSpace *as = NULL;
+MemTxResult res;
+
+if (is_io) {
+as = pci_isol_as_io(pci_dev);
+as = as ? as : &address_space_io;
+} else {
+as = pci_isol_as_mem(pci_dev);
+as = as ? as : &address_space_memory;
+}
+
+trace_vfu_bar_rw_enter(is_write ? "Write" : "Read", (uint64_t)addr);
+
+res = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED, (void *)buf,
+   (hwaddr)count, is_write);
+if (res != MEMTX_OK) {
+warn_report("vfu: failed to %s 0x%"PRIx64"",
+is_write ? "write to" : "read from",
+addr);
+return -1;
+}
+
+trace_vfu_bar_rw_exit(is_write ? "Write" : "Read", (uint64_t)addr);
+
+return count;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO) 
\
+static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,
\
+char * const buf, size_t count,
\
+loff_t offset, const bool is_write)
\
+{  
\
+VfuObject *o = vfu_get_private(vfu_ctx);   
\
+PCIDevice *pci_dev = o->pci_dev;   
\
+hwaddr addr = (hwaddr)(pci_get_bar_addr(pci_dev, BAR_NO) + offset);
\
+bool is_io = !!(pci_dev->io_regions[BAR_NO].type & 
\
+PCI_BASE_ADDRESS_SPACE);   
\
+   
\
+return vfu_object_bar_rw(pci_dev, addr, count, buf, is_write, is_io);  
\
+}  
\
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+&vfu_object_bar0_handler,
+&vfu_object_bar1_handler,
+&vfu_object_bar2_handler,
+&vfu_object_bar3_handler,
+&vfu_object_bar4_handler,
+&vfu_object_bar5_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+int i;
+
+for (i = 0; i < PCI_NUM_REGIONS; i++) {
+if (!pdev->io_regions[i].size) {
+continue;
+}
+
+vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i,
+ (size_t)pdev->io_regions[i].size,
+ vfu_object_bar_handlers[i],
+ VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
+
+trace_vfu_bar_register(i, pdev->io_regions[i].addr,
+   pdev->io_regions[i].size);
+}
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -386,6 +476,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 goto fail;
 }
 
+vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e33b..847d50d88f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,6 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 
0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", 
%zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""

[PATCH v5 14/18] vfio-user: handle DMA mappings

2022-01-19 Thread Jagannathan Raman
Define and register callbacks to manage the RAM regions used for
device DMA

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Stefan Hajnoczi 
---
 hw/remote/vfio-user-obj.c | 50 +++
 hw/remote/trace-events|  2 ++
 2 files changed, 52 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 8951617545..e690f1eaae 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -248,6 +248,49 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, 
char * const buf,
 return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+MemoryRegion *subregion = NULL;
+g_autofree char *name = NULL;
+struct iovec *iov = &info->iova;
+
+if (!info->vaddr) {
+return;
+}
+
+name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
+   (uint64_t)info->vaddr);
+
+subregion = g_new0(MemoryRegion, 1);
+
+memory_region_init_ram_ptr(subregion, NULL, name,
+   iov->iov_len, info->vaddr);
+
+memory_region_add_subregion(pci_address_space(o->pci_dev),
+(hwaddr)iov->iov_base, subregion);
+
+trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+MemoryRegion *mr = NULL;
+ram_addr_t offset;
+
+mr = memory_region_from_host(info->vaddr, &offset);
+if (!mr) {
+return;
+}
+
+memory_region_del_subregion(pci_address_space(o->pci_dev), mr);
+
+object_unparent((OBJECT(mr)));
+
+trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -336,6 +379,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 goto fail;
 }
 
+ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+if (ret < 0) {
+error_setg(errp, "vfu: Failed to setup DMA handlers for %s",
+   o->device);
+goto fail;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884346..f945c7e33b 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to 
receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", 
%zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
2.20.1




[PATCH v5 06/18] vfio-user: add HotplugHandler for remote machine

2022-01-19 Thread Jagannathan Raman
Allow hotplugging of PCI(e) devices to remote machine

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/machine.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 952105eab5..220ff01aa9 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -54,14 +54,39 @@ static void remote_machine_init(MachineState *machine)
 
 pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
  &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+
+qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
+}
+
+static void remote_machine_pre_plug_cb(HotplugHandler *hotplug_dev,
+   DeviceState *dev, Error **errp)
+{
+if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+error_setg(errp, "Only allowing PCI hotplug");
+}
+}
+
+static void remote_machine_unplug_cb(HotplugHandler *hotplug_dev,
+ DeviceState *dev, Error **errp)
+{
+if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+error_setg(errp, "Only allowing PCI hot-unplug");
+return;
+}
+
+qdev_unrealize(dev);
 }
 
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
+HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
 mc->init = remote_machine_init;
 mc->desc = "Experimental remote machine";
+
+hc->pre_plug = remote_machine_pre_plug_cb;
+hc->unplug = remote_machine_unplug_cb;
 }
 
 static const TypeInfo remote_machine = {
@@ -69,6 +94,10 @@ static const TypeInfo remote_machine = {
 .parent = TYPE_MACHINE,
 .instance_size = sizeof(RemoteMachineState),
 .class_init = remote_machine_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_HOTPLUG_HANDLER },
+{ }
+}
 };
 
 static void remote_machine_register_types(void)
-- 
2.20.1




[PATCH v5 17/18] vfio-user: register handlers to facilitate migration

2022-01-19 Thread Jagannathan Raman
Store and load the device's state during migration. use libvfio-user's
handlers for this purpose

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 include/migration/vmstate.h |   2 +
 migration/savevm.h  |   2 +
 hw/remote/vfio-user-obj.c   | 323 
 migration/savevm.c  |  73 
 migration/vmstate.c |  19 +++
 5 files changed, 419 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 017c03675c..68bea576ea 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -1165,6 +1165,8 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_END_OF_LIST() \
 {}
 
+uint64_t vmstate_vmsd_size(PCIDevice *pci_dev);
+
 int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
void *opaque, int version_id);
 int vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..8007064ff2 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, 
MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 bool in_postcopy, bool inactivate_disks);
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
+int qemu_remote_loadvm(QEMUFile *f);
 
 #endif
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 1771dba1bf..d3c51577bd 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -55,6 +55,11 @@
 #include "qemu/timer.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/vmstate.h"
+#include "migration/global_state.h"
+#include "block/block.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -106,6 +111,35 @@ struct VfuObject {
 Error *unplug_blocker;
 
 int vfu_poll_fd;
+
+/*
+ * vfu_mig_buf holds the migration data. In the remote server, this
+ * buffer replaces the role of an IO channel which links the source
+ * and the destination.
+ *
+ * Whenever the client QEMU process initiates migration, the remote
+ * server gets notified via libvfio-user callbacks. The remote server
+ * sets up a QEMUFile object using this buffer as backend. The remote
+ * server passes this object to its migration subsystem, which slurps
+ * the VMSD of the device ('devid' above) referenced by this object
+ * and stores the VMSD in this buffer.
+ *
+ * The client subsequently asks the remote server for any data that
+ * needs to be moved over to the destination via libvfio-user
+ * library's vfu_migration_callbacks_t callbacks. The remote hands
+ * over this buffer as data at this time.
+ *
+ * A reverse of this process happens at the destination.
+ */
+uint8_t *vfu_mig_buf;
+
+uint64_t vfu_mig_buf_size;
+
+uint64_t vfu_mig_buf_pending;
+
+QEMUFile *vfu_mig_file;
+
+vfu_migr_state_t vfu_state;
 };
 
 static GHashTable *vfu_object_dev_to_ctx_table;
@@ -157,6 +191,272 @@ static void vfu_object_set_device(Object *obj, const char 
*str, Error **errp)
 vfu_object_init_ctx(o, errp);
 }
 
+/**
+ * Migration helper functions
+ *
+ * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
+ * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
+ * call these functions via QEMUFileOps to load/save the VMSD of a
+ * device into vfu_mig_buf
+ *
+ */
+static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
+size_t size, Error **errp)
+{
+VfuObject *o = opaque;
+
+if (pos > o->vfu_mig_buf_size) {
+size = 0;
+} else if ((pos + size) > o->vfu_mig_buf_size) {
+size = o->vfu_mig_buf_size - pos;
+}
+
+memcpy(buf, (o->vfu_mig_buf + pos), size);
+
+return size;
+}
+
+static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
+ int64_t pos, Error **errp)
+{
+VfuObject *o = opaque;
+uint64_t end = pos + iov_size(iov, iovcnt);
+int i;
+
+if (end > o->vfu_mig_buf_size) {
+o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+}
+
+for (i = 0; i < iovcnt; i++) {
+memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
+   iov[i].iov_len);
+o->vfu_mig_buf_size += iov[i].iov_len;
+o->vfu_mig_buf_pending += iov[i].iov_len;
+}
+
+return iov_size(iov, iovcnt);
+}
+
+static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
+{
+VfuObject *o = opaque;
+
+o->vfu_mig_buf_size = 0;
+
+g_free(o->vfu_mig_buf);
+
+o->vfu_mig_buf = NULL;
+
+

[PATCH v5 04/18] pci: create and free isolated PCI buses

2022-01-19 Thread Jagannathan Raman
Adds pci_isol_bus_new() and pci_isol_bus_free() functions to manage
creation and destruction of isolated PCI buses. Also adds qdev_get_bus
and qdev_put_bus callbacks to allow the choice of parent bus.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 include/hw/pci/pci.h   |   4 +
 include/hw/qdev-core.h |  16 
 hw/pci/pci.c   | 169 +
 softmmu/qdev-monitor.c |  39 +-
 4 files changed, 225 insertions(+), 3 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 9bb4472abc..8c18f10d9d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -452,6 +452,10 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus 
*rootbus,
 
 PCIDevice *pci_vga_init(PCIBus *bus);
 
+PCIBus *pci_isol_bus_new(BusState *parent_bus, const char *new_bus_type,
+ Error **errp);
+bool pci_isol_bus_free(PCIBus *pci_bus, Error **errp);
+
 static inline PCIBus *pci_get_bus(const PCIDevice *dev)
 {
 return PCI_BUS(qdev_get_parent_bus(DEVICE(dev)));
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 92c3d65208..eed2983072 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -419,6 +419,20 @@ void qdev_simple_device_unplug_cb(HotplugHandler 
*hotplug_dev,
 void qdev_machine_creation_done(void);
 bool qdev_machine_modified(void);
 
+/**
+ * Find parent bus - these callbacks are used during device addition
+ * and deletion.
+ *
+ * During addition, if no parent bus is specified in the options,
+ * these callbacks provide a way to figure it out based on the
+ * bus type. If these callbacks are not defined, defaults to
+ * finding the parent bus starting from default system bus
+ */
+typedef bool (QDevGetBusFunc)(const char *type, BusState **bus, Error **errp);
+typedef bool (QDevPutBusFunc)(BusState *bus, Error **errp);
+bool qdev_set_bus_cbs(QDevGetBusFunc *get_bus, QDevPutBusFunc *put_bus,
+  Error **errp);
+
 /**
  * GpioPolarity: Polarity of a GPIO line
  *
@@ -691,6 +705,8 @@ BusState *qdev_get_parent_bus(DeviceState *dev);
 /*** BUS API. ***/
 
 DeviceState *qdev_find_recursive(BusState *bus, const char *id);
+BusState *qbus_find_recursive(BusState *bus, const char *name,
+  const char *bus_typename);
 
 /* Returns 0 to walk children, > 0 to skip walk, < 0 to terminate walk. */
 typedef int (qbus_walkerfn)(BusState *bus, void *opaque);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d5f1c6c421..63ec1e47b5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -493,6 +493,175 @@ void pci_root_bus_cleanup(PCIBus *bus)
 qbus_unrealize(BUS(bus));
 }
 
+static void pci_bus_free_isol_mem(PCIBus *pci_bus)
+{
+if (pci_bus->address_space_mem) {
+memory_region_unref(pci_bus->address_space_mem);
+pci_bus->address_space_mem = NULL;
+}
+
+if (pci_bus->isol_as_mem) {
+address_space_destroy(pci_bus->isol_as_mem);
+pci_bus->isol_as_mem = NULL;
+}
+
+if (pci_bus->address_space_io) {
+memory_region_unref(pci_bus->address_space_io);
+pci_bus->address_space_io = NULL;
+}
+
+if (pci_bus->isol_as_io) {
+address_space_destroy(pci_bus->isol_as_io);
+pci_bus->isol_as_io = NULL;
+}
+}
+
+static void pci_bus_init_isol_mem(PCIBus *pci_bus, uint32_t unique_id)
+{
+g_autofree char *mem_mr_name = NULL;
+g_autofree char *mem_as_name = NULL;
+g_autofree char *io_mr_name = NULL;
+g_autofree char *io_as_name = NULL;
+
+if (!pci_bus) {
+return;
+}
+
+mem_mr_name = g_strdup_printf("mem-mr-%u", unique_id);
+mem_as_name = g_strdup_printf("mem-as-%u", unique_id);
+io_mr_name = g_strdup_printf("io-mr-%u", unique_id);
+io_as_name = g_strdup_printf("io-as-%u", unique_id);
+
+pci_bus->address_space_mem = g_malloc0(sizeof(MemoryRegion));
+pci_bus->isol_as_mem = g_malloc0(sizeof(AddressSpace));
+memory_region_init(pci_bus->address_space_mem, NULL,
+   mem_mr_name, UINT64_MAX);
+address_space_init(pci_bus->isol_as_mem,
+   pci_bus->address_space_mem, mem_as_name);
+
+pci_bus->address_space_io = g_malloc0(sizeof(MemoryRegion));
+pci_bus->isol_as_io = g_malloc0(sizeof(AddressSpace));
+memory_region_init(pci_bus->address_space_io, NULL,
+   io_mr_name, UINT64_MAX);
+address_space_init(pci_bus->isol_as_io,
+   pci_bus->address_space_io, io_as_name);
+}
+
+PCIBus *pci_isol_bus_new(BusState *parent_bus, const char *new_bus_type,
+ Error **errp)
+{
+ERRP_GUARD();
+PCIBus *parent_pci_bus = NULL;
+DeviceState *pcie_root_port = NULL;
+g_autofree char *new_bus_name = NULL;
+PCIBus *new_pci_bus = NULL;
+HotplugHandler *hotplug_handler = NULL;
+uint32_t devfn, slot;
+
+if (!parent_bus) {
+error_setg(errp, "parent PCI bus not found");
+

[PATCH v5 12/18] vfio-user: run vfio-user context

2022-01-19 Thread Jagannathan Raman
Setup a handler to run vfio-user context. The context is driven by
messages to the file descriptor associated with it - get the fd for
the context and hook up the handler with it

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 qapi/misc.json| 23 ++
 hw/remote/vfio-user-obj.c | 90 ++-
 2 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index e8054f415b..f0791d3311 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -527,3 +527,26 @@
  'data': { '*option': 'str' },
  'returns': ['CommandLineOptionInfo'],
  'allow-preconfig': true }
+
+##
+# @VFU_CLIENT_HANGUP:
+#
+# Emitted when the client of a TYPE_VFIO_USER_SERVER closes the
+# communication channel
+#
+# @device: ID of attached PCI device
+#
+# @path: path of the socket
+#
+# Since: 6.3
+#
+# Example:
+#
+# <- { "event": "VFU_CLIENT_HANGUP",
+#  "data": { "device": "lsi1",
+#"path": "/tmp/vfu1-sock" },
+#  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+#
+##
+{ 'event': 'VFU_CLIENT_HANGUP',
+  'data': { 'device': 'str', 'path': 'str' } }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 10db78eb8d..91d49a221f 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,6 +27,9 @@
  *
  * device - id of a device on the server, a required option. PCI devices
  *  alone are supported presently.
+ *
+ * notes - x-vfio-user-server could block IO and monitor during the
+ * initialization phase.
  */
 
 #include "qemu/osdep.h"
@@ -41,11 +44,14 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "qemu/timer.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -95,6 +101,8 @@ struct VfuObject {
 PCIDevice *pci_dev;
 
 Error *unplug_blocker;
+
+int vfu_poll_fd;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -144,6 +152,68 @@ static void vfu_object_set_device(Object *obj, const char 
*str, Error **errp)
 vfu_object_init_ctx(o, errp);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+VfuObject *o = opaque;
+int ret = -1;
+
+while (ret != 0) {
+ret = vfu_run_ctx(o->vfu_ctx);
+if (ret < 0) {
+if (errno == EINTR) {
+continue;
+} else if (errno == ENOTCONN) {
+qapi_event_send_vfu_client_hangup(o->device,
+  o->socket->u.q_unix.path);
+qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+o->vfu_poll_fd = -1;
+object_unparent(OBJECT(o));
+break;
+} else {
+VFU_OBJECT_ERROR(o, "vfu: Failed to run device %s - %s",
+ o->device, strerror(errno));
+break;
+}
+}
+}
+}
+
+static void vfu_object_attach_ctx(void *opaque)
+{
+VfuObject *o = opaque;
+GPollFD pfds[1];
+int ret;
+
+qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+
+pfds[0].fd = o->vfu_poll_fd;
+pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+
+retry_attach:
+ret = vfu_attach_ctx(o->vfu_ctx);
+if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+/**
+ * vfu_object_attach_ctx can block QEMU's main loop
+ * during attach - the monitor and other IO
+ * could be unresponsive during this time.
+ */
+qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
+goto retry_attach;
+} else if (ret < 0) {
+VFU_OBJECT_ERROR(o, "vfu: Failed to attach device %s to context - %s",
+ o->device, strerror(errno));
+return;
+}
+
+o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+if (o->vfu_poll_fd < 0) {
+VFU_OBJECT_ERROR(o, "vfu: Failed to get poll fd %s", o->device);
+return;
+}
+
+qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -182,7 +252,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 return;
 }
 
-o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path,
+LIBVFIO_USER_FLAG_ATTACH_NB,
 o, VFU_DEV_TYPE_PCI);
 if (o->vfu_ctx == NULL) {
 error_setg(errp, "vfu: Failed to create context - %s", 

[PATCH v5 11/18] vfio-user: find and init PCI device

2022-01-19 Thread Jagannathan Raman
Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 810a7c3943..10db78eb8d 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -44,6 +44,8 @@
 #include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -89,6 +91,10 @@ struct VfuObject {
 Notifier machine_done;
 
 vfu_ctx_t *vfu_ctx;
+
+PCIDevice *pci_dev;
+
+Error *unplug_blocker;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -161,6 +167,9 @@ static void vfu_object_machine_done(Notifier *notifier, 
void *data)
 static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 {
 ERRP_GUARD();
+DeviceState *dev = NULL;
+vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+int ret;
 
 if (o->vfu_ctx || !o->socket || !o->device ||
 !phase_check(PHASE_MACHINE_READY)) {
@@ -179,6 +188,49 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 error_setg(errp, "vfu: Failed to create context - %s", 
strerror(errno));
 return;
 }
+
+dev = qdev_find_recursive(sysbus_get_default(), o->device);
+if (dev == NULL) {
+error_setg(errp, "vfu: Device %s not found", o->device);
+goto fail;
+}
+
+if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+error_setg(errp, "vfu: %s not a PCI device", o->device);
+goto fail;
+}
+
+o->pci_dev = PCI_DEVICE(dev);
+
+if (pci_is_express(o->pci_dev)) {
+pci_type = VFU_PCI_TYPE_EXPRESS;
+}
+
+ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
+if (ret < 0) {
+error_setg(errp,
+   "vfu: Failed to attach PCI device %s to context - %s",
+   o->device, strerror(errno));
+goto fail;
+}
+
+error_setg(&o->unplug_blocker, "%s is in use", o->device);
+qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker, errp);
+if (*errp) {
+goto fail;
+}
+
+return;
+
+fail:
+vfu_destroy_ctx(o->vfu_ctx);
+if (o->unplug_blocker && o->pci_dev) {
+qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+error_free(o->unplug_blocker);
+o->unplug_blocker = NULL;
+}
+o->vfu_ctx = NULL;
+o->pci_dev = NULL;
 }
 
 static void vfu_object_init(Object *obj)
@@ -219,6 +271,14 @@ static void vfu_object_finalize(Object *obj)
 
 o->device = NULL;
 
+if (o->unplug_blocker && o->pci_dev) {
+qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+error_free(o->unplug_blocker);
+o->unplug_blocker = NULL;
+}
+
+o->pci_dev = NULL;
+
 if (!k->nr_devs && k->auto_shutdown) {
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 }
-- 
2.20.1




[PATCH v5 10/18] vfio-user: instantiate vfio-user context

2022-01-19 Thread Jagannathan Raman
create a context with the vfio-user library to run a PCI device

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 78 +++
 1 file changed, 78 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 80757b0029..810a7c3943 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -41,6 +41,9 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -82,13 +85,23 @@ struct VfuObject {
 char *device;
 
 Error *err;
+
+Notifier machine_done;
+
+vfu_ctx_t *vfu_ctx;
 };
 
+static void vfu_object_init_ctx(VfuObject *o, Error **errp);
+
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
   void *opaque, Error **errp)
 {
 VfuObject *o = VFU_OBJECT(obj);
 
+if (o->vfu_ctx) {
+return;
+}
+
 qapi_free_SocketAddress(o->socket);
 
 o->socket = NULL;
@@ -104,17 +117,68 @@ static void vfu_object_set_socket(Object *obj, Visitor 
*v, const char *name,
 }
 
 trace_vfu_prop("socket", o->socket->u.q_unix.path);
+
+vfu_object_init_ctx(o, errp);
 }
 
 static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
 {
 VfuObject *o = VFU_OBJECT(obj);
 
+if (o->vfu_ctx) {
+return;
+}
+
 g_free(o->device);
 
 o->device = g_strdup(str);
 
 trace_vfu_prop("device", str);
+
+vfu_object_init_ctx(o, errp);
+}
+
+/*
+ * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
+ * properties. It also depends on devices instantiated in QEMU. These
+ * dependencies are not available during the instance_init phase of this
+ * object's life-cycle. As such, the server is initialized after the
+ * machine is setup. machine_init_done_notifier notifies TYPE_VFU_OBJECT
+ * when the machine is setup, and the dependencies are available.
+ */
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+VfuObject *o = container_of(notifier, VfuObject, machine_done);
+Error *err = NULL;
+
+vfu_object_init_ctx(o, &err);
+
+if (err) {
+error_propagate(&error_abort, err);
+}
+}
+
+static void vfu_object_init_ctx(VfuObject *o, Error **errp)
+{
+ERRP_GUARD();
+
+if (o->vfu_ctx || !o->socket || !o->device ||
+!phase_check(PHASE_MACHINE_READY)) {
+return;
+}
+
+if (o->err) {
+error_propagate(errp, o->err);
+o->err = NULL;
+return;
+}
+
+o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+o, VFU_DEV_TYPE_PCI);
+if (o->vfu_ctx == NULL) {
+error_setg(errp, "vfu: Failed to create context - %s", 
strerror(errno));
+return;
+}
 }
 
 static void vfu_object_init(Object *obj)
@@ -124,6 +188,11 @@ static void vfu_object_init(Object *obj)
 
 k->nr_devs++;
 
+if (!phase_check(PHASE_MACHINE_READY)) {
+o->machine_done.notify = vfu_object_machine_done;
+qemu_add_machine_init_done_notifier(&o->machine_done);
+}
+
 if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
 error_setg(>err, "vfu: %s only compatible with %s machine",
TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
@@ -142,6 +211,10 @@ static void vfu_object_finalize(Object *obj)
 
 o->socket = NULL;
 
+if (o->vfu_ctx) {
+vfu_destroy_ctx(o->vfu_ctx);
+}
+
 g_free(o->device);
 
 o->device = NULL;
@@ -149,6 +222,11 @@ static void vfu_object_finalize(Object *obj)
 if (!k->nr_devs && k->auto_shutdown) {
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 }
+
+if (o->machine_done.notify) {
+qemu_remove_machine_init_done_notifier(&o->machine_done);
+o->machine_done.notify = NULL;
+}
 }
 
 static void vfu_object_class_init(ObjectClass *klass, void *data)
-- 
2.20.1




[PATCH v5 01/18] configure, meson: override C compiler for cmake

2022-01-19 Thread Jagannathan Raman
The compiler path that cmake gets from meson is corrupted. It results in
the following error:
| -- The C compiler identification is unknown
| CMake Error at CMakeLists.txt:35 (project):
| The CMAKE_C_COMPILER:
| /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
| is not a full path to an existing compiler tool.

Explicitly specify the C compiler for cmake to avoid this error

Signed-off-by: Jagannathan Raman 
Acked-by: Paolo Bonzini 
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index e1a31fb332..6a865f8713 100755
--- a/configure
+++ b/configure
@@ -3747,6 +3747,8 @@ if test "$skip_meson" = no; then
   echo "cpp_args = [$(meson_quote $CXXFLAGS $EXTRA_CXXFLAGS)]" >> $cross
   echo "c_link_args = [$(meson_quote $CFLAGS $LDFLAGS $EXTRA_CFLAGS 
$EXTRA_LDFLAGS)]" >> $cross
   echo "cpp_link_args = [$(meson_quote $CXXFLAGS $LDFLAGS $EXTRA_CXXFLAGS 
$EXTRA_LDFLAGS)]" >> $cross
+  echo "[cmake]" >> $cross
+  echo "CMAKE_C_COMPILER = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
   echo "[binaries]" >> $cross
   echo "c = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
   test -n "$cxx" && echo "cpp = [$(meson_quote $cxx $CPU_CFLAGS)]" >> $cross
-- 
2.20.1




[PATCH v5 13/18] vfio-user: handle PCI config space accesses

2022-01-19 Thread Jagannathan Raman
Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 45 +++
 hw/remote/trace-events|  2 ++
 2 files changed, 47 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 91d49a221f..8951617545 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -47,6 +47,7 @@
 #include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
@@ -214,6 +215,39 @@ retry_attach:
 qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+ size_t count, loff_t offset,
+ const bool is_write)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+uint32_t pci_access_width = sizeof(uint32_t);
+size_t bytes = count;
+uint32_t val = 0;
+char *ptr = buf;
+int len;
+
+while (bytes > 0) {
+len = (bytes > pci_access_width) ? pci_access_width : bytes;
+if (is_write) {
+memcpy(, ptr, len);
+pci_host_config_write_common(o->pci_dev, offset,
+ pci_config_size(o->pci_dev),
+ val, len);
+trace_vfu_cfg_write(offset, val);
+} else {
+val = pci_host_config_read_common(o->pci_dev, offset,
+  pci_config_size(o->pci_dev), 
len);
+memcpy(ptr, , len);
+trace_vfu_cfg_read(offset, val);
+}
+offset += len;
+ptr += len;
+bytes -= len;
+}
+
+return count;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -291,6 +325,17 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 goto fail;
 }
 
+ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+   pci_config_size(o->pci_dev), _object_cfg_access,
+   VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+   NULL, 0, -1, 0);
+if (ret < 0) {
+error_setg(errp,
+   "vfu: Failed to setup config space handlers for %s- %s",
+   o->device, strerror(errno));
+goto fail;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0d96..2ef7884346 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to 
receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
2.20.1




[PATCH v5 09/18] vfio-user: define vfio-user-server object

2022-01-19 Thread Jagannathan Raman
Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 qapi/qom.json |  20 +++-
 hw/remote/vfio-user-obj.c | 194 ++
 MAINTAINERS   |   1 +
 hw/remote/meson.build |   1 +
 hw/remote/trace-events|   3 +
 5 files changed, 217 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..ff266e4732 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -703,6 +703,20 @@
 { 'struct': 'RemoteObjectProperties',
   'data': { 'fd': 'str', 'devid': 'str' } }
 
+##
+# @VfioUserServerProperties:
+#
+# Properties for x-vfio-user-server objects.
+#
+# @socket: socket to be used by the libvfiouser library
+#
+# @device: the id of the device to be emulated at the server
+#
+# Since: 6.3
+##
+{ 'struct': 'VfioUserServerProperties',
+  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -842,7 +856,8 @@
 'tls-creds-psk',
 'tls-creds-x509',
 'tls-cipher-suites',
-{ 'name': 'x-remote-object', 'features': [ 'unstable' ] }
+{ 'name': 'x-remote-object', 'features': [ 'unstable' ] },
+{ 'name': 'x-vfio-user-server', 'features': [ 'unstable' ] }
   ] }
 
 ##
@@ -905,7 +920,8 @@
   'tls-creds-psk':  'TlsCredsPskProperties',
   'tls-creds-x509': 'TlsCredsX509Properties',
   'tls-cipher-suites':  'TlsCredsProperties',
-  'x-remote-object':'RemoteObjectProperties'
+  'x-remote-object':'RemoteObjectProperties',
+  'x-vfio-user-server': 'VfioUserServerProperties'
   } }
 
 ##
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 00..80757b0029
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,194 @@
+/**
+ * QEMU vfio-user-server server object
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ * -machine x-remote
+ * -device ,id=
+ * -object x-vfio-user-server,id=,type=unix,path=,
+ * device=
+ *
+ * Note that x-vfio-user-server object must be used with x-remote machine only.
+ * This server could only support PCI devices for now.
+ *
+ * type - SocketAddress type - presently "unix" alone is supported. Required
+ *option
+ *
+ * path - named unix socket, it will be created by the server. It is
+ *a required option
+ *
+ * device - id of a device on the server, a required option. PCI devices
+ *  alone are supported presently.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+#include "hw/boards.h"
+#include "hw/remote/machine.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-sockets.h"
+
+#define TYPE_VFU_OBJECT "x-vfio-user-server"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+/**
+ * VFU_OBJECT_ERROR - reports an error message. If auto_shutdown
+ * is set, it aborts the machine on error. Otherwise, it logs an
+ * error message without aborting.
+ */
+#define VFU_OBJECT_ERROR(o, fmt, ...) \
+{ \
+VfuObjectClass *oc = VFU_OBJECT_GET_CLASS(OBJECT(o)); \
+  \
+if (oc->auto_shutdown) {  \
+error_setg(&error_abort, (fmt), ## __VA_ARGS__);  \
+} else {  \
+error_report((fmt), ## __VA_ARGS__);  \
+} \
+} \
+
+struct VfuObjectClass {
+ObjectClass parent_class;
+
+unsigned int nr_devs;
+
+/*
+ * Can be set to shutdown automatically when all server object
+ * instances are destroyed
+ */
+bool auto_shutdown;
+};
+
+struct VfuObject {
+/* private */
+Object parent;
+
+SocketAddress *socket;
+
+char *device;
+
+Error *err;
+};
+
+static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+VfuObject *o = VFU_OBJECT(obj);
+
+qapi_free_SocketAddress(o->socket);
+
+o->socket = NULL;
+
+visit_type_SocketAddress(v, name, &o->socket, errp);
+
+if (o->socket->type != SOCKET_ADDRESS_TYPE_UNIX) {
+qapi_free_SocketAddress(o->socket);
+o->socket = NULL;
+

[PATCH v5 07/18] vfio-user: set qdev bus callbacks for remote machine

2022-01-19 Thread Jagannathan Raman
Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/machine.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 220ff01aa9..221a8430c1 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -22,6 +22,60 @@
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
 
+static bool remote_machine_get_bus(const char *type, BusState **bus,
+   Error **errp)
+{
+ERRP_GUARD();
+RemoteMachineState *s = REMOTE_MACHINE(current_machine);
+BusState *root_bus = NULL;
+PCIBus *new_pci_bus = NULL;
+
+if (!bus) {
+error_setg(errp, "Invalid argument");
+return false;
+}
+
+if (strcmp(type, TYPE_PCI_BUS) && strcmp(type, TYPE_PCIE_BUS)) {
+return true;
+}
+
+root_bus = qbus_find_recursive(sysbus_get_default(), NULL, TYPE_PCIE_BUS);
+if (!root_bus) {
+error_setg(errp, "Unable to find root PCI device");
+return false;
+}
+
+new_pci_bus = pci_isol_bus_new(root_bus, type, errp);
+if (!new_pci_bus) {
+return false;
+}
+
+*bus = BUS(new_pci_bus);
+
+pci_bus_irqs(new_pci_bus, remote_iohub_set_irq, remote_iohub_map_irq,
+ &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+
+return true;
+}
+
+static bool remote_machine_put_bus(BusState *bus, Error **errp)
+{
+PCIBus *pci_bus = NULL;
+
+if (!bus) {
+error_setg(errp, "Invalid argument");
+return false;
+}
+
+if (!object_dynamic_cast(OBJECT(bus), TYPE_PCI_BUS)) {
+return true;
+}
+
+pci_bus = PCI_BUS(bus);
+
+return pci_isol_bus_free(pci_bus, errp);
+}
+
 static void remote_machine_init(MachineState *machine)
 {
 MemoryRegion *system_memory, *system_io, *pci_memory;
@@ -56,6 +110,9 @@ static void remote_machine_init(MachineState *machine)
 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
 
 qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
+
+qdev_set_bus_cbs(remote_machine_get_bus, remote_machine_put_bus,
+ &error_fatal);
 }
 
 static void remote_machine_pre_plug_cb(HotplugHandler *hotplug_dev,
-- 
2.20.1




[PATCH v5 08/18] vfio-user: build library

2022-01-19 Thread Jagannathan Raman
add the libvfio-user library as a submodule. build it as a cmake
subproject.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 configure  | 19 +-
 meson.build| 44 +-
 .gitlab-ci.d/buildtest.yml |  2 +
 .gitmodules|  3 ++
 Kconfig.host   |  4 ++
 MAINTAINERS|  1 +
 hw/remote/Kconfig  |  4 ++
 hw/remote/meson.build  |  2 +
 meson_options.txt  |  2 +
 subprojects/libvfio-user   |  1 +
 tests/docker/dockerfiles/centos8.docker|  2 +
 tests/docker/dockerfiles/ubuntu2004.docker |  2 +
 12 files changed, 84 insertions(+), 2 deletions(-)
 create mode 16 subprojects/libvfio-user

diff --git a/configure b/configure
index 6a865f8713..c8035de952 100755
--- a/configure
+++ b/configure
@@ -356,6 +356,7 @@ ninja=""
 gio="$default_feature"
 skip_meson=no
 slirp_smbd="$default_feature"
+vfio_user_server="disabled"
 
 # The following Meson options are handled manually (still they
 # are included in the automatically generated help message)
@@ -1172,6 +1173,10 @@ for opt do
   ;;
   --disable-blobs) meson_option_parse --disable-install-blobs ""
   ;;
+  --enable-vfio-user-server) vfio_user_server="enabled"
+  ;;
+  --disable-vfio-user-server) vfio_user_server="disabled"
+  ;;
   --enable-tcmalloc) meson_option_parse --enable-malloc=tcmalloc tcmalloc
   ;;
   --enable-jemalloc) meson_option_parse --enable-malloc=jemalloc jemalloc
@@ -1439,6 +1444,7 @@ cat << EOF
   rng-nonedummy RNG, avoid using /dev/(u)random and getrandom()
   gio libgio support
   slirp-smbd  use smbd (at path --smbd=*) in slirp networking
+  vfio-user-servervfio-user server support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -3121,6 +3127,17 @@ but not implemented on your system"
 fi
 fi
 
+##
+# check for vfio_user_server
+
+case "$vfio_user_server" in
+  auto | enabled )
+if test "$git_submodules_action" != "ignore"; then
+  git_submodules="${git_submodules} subprojects/libvfio-user"
+fi
+;;
+esac
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -3811,7 +3828,7 @@ if test "$skip_meson" = no; then
 -Db_pie=$(if test "$pie" = yes; then echo true; else echo false; fi) \
 -Db_coverage=$(if test "$gcov" = yes; then echo true; else echo false; 
fi) \
 -Db_lto=$lto -Dcfi=$cfi -Dtcg=$tcg -Dxen=$xen \
--Dcapstone=$capstone -Dfdt=$fdt -Dslirp=$slirp \
+-Dcapstone=$capstone -Dfdt=$fdt -Dslirp=$slirp 
-Dvfio_user_server=$vfio_user_server \
 $(test -n "${LIB_FUZZING_ENGINE+xxx}" && echo 
"-Dfuzzing_engine=$LIB_FUZZING_ENGINE") \
 $(if test "$default_feature" = no; then echo 
"-Dauto_features=disabled"; fi) \
 "$@" $cross_arg "$PWD" "$source_path"
diff --git a/meson.build b/meson.build
index 333c61deba..15c2567543 100644
--- a/meson.build
+++ b/meson.build
@@ -274,6 +274,11 @@ if targetos != 'linux' and 
get_option('multiprocess').enabled()
 endif
 multiprocess_allowed = targetos == 'linux' and not 
get_option('multiprocess').disabled()
 
+if targetos != 'linux' and get_option('vfio_user_server').enabled()
+  error('vfio-user server is supported only on Linux')
+endif
+vfio_user_server_allowed = targetos == 'linux' and not 
get_option('vfio_user_server').disabled()
+
 # Target-specific libraries and flags
 libm = cc.find_library('m', required: false)
 threads = dependency('threads')
@@ -1877,7 +1882,8 @@ host_kconfig = \
   (have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
   ('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
   ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : []) + \
-  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
+  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
+  (vfio_user_server_allowed ? ['CONFIG_VFIO_USER_SERVER_ALLOWED=y'] : [])
 
 ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
 
@@ -2266,6 +2272,41 @@ if get_option('cfi') and slirp_opt == 'system'
  + ' Please configure with --enable-slirp=git')
 endif
 
+vfiouser = not_found
+if have_system and vfio_user_server_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 
'subprojects/libvfio-user/Makefile')
+
+  if not have_internal
+error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  json_c = dependency('json-c', required: false)
+  if not json_c.found()
+json_c = dependency('libjson-c', required: false)
+  endif
+  if not json_c.found()
+json_c = dependency('libjson-c-dev', required: false)
+  endif
+
+  if not json_c.found()
+error('Unable to find json-c package')
+  endif

[PATCH v5 03/18] pci: isolated address space for PCI bus

2022-01-19 Thread Jagannathan Raman
Allow PCI buses to be part of isolated CPU address spaces. This has a
niche usage.

TYPE_REMOTE_MACHINE allows multiple VMs to house their PCI devices in
the same machine/server. This would cause address space collision as
well as be a security vulnerability. Having separate address spaces for
each PCI bus would solve this problem.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 include/hw/pci/pci.h |  2 ++
 include/hw/pci/pci_bus.h | 17 +
 hw/pci/pci.c | 17 +
 hw/pci/pci_bridge.c  |  5 +
 4 files changed, 41 insertions(+)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 023abc0f79..9bb4472abc 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -387,6 +387,8 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
 int pci_device_load(PCIDevice *s, QEMUFile *f);
 MemoryRegion *pci_address_space(PCIDevice *dev);
 MemoryRegion *pci_address_space_io(PCIDevice *dev);
+AddressSpace *pci_isol_as_mem(PCIDevice *dev);
+AddressSpace *pci_isol_as_io(PCIDevice *dev);
 
 /*
  * Should not normally be used by devices. For use by sPAPR target
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 347440d42c..d78258e79e 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -39,9 +39,26 @@ struct PCIBus {
 void *irq_opaque;
 PCIDevice *devices[PCI_SLOT_MAX * PCI_FUNC_MAX];
 PCIDevice *parent_dev;
+
 MemoryRegion *address_space_mem;
 MemoryRegion *address_space_io;
 
+/**
+ * Isolated address spaces - these allow the PCI bus to be part
+ * of an isolated address space as opposed to the global
+ * address_space_memory & address_space_io. This allows the
+ * bus to be attached to CPUs from different machines. The
+ * following is not used commonly.
+ *
+ * TYPE_REMOTE_MACHINE allows emulating devices from multiple
+ * VM clients, as such it needs the PCI buses in the same machine
+ * to be part of different CPU address spaces. The following is
+ * useful in that scenario.
+ *
+ */
+AddressSpace *isol_as_mem;
+AddressSpace *isol_as_io;
+
 QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
 QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 5d30f9ca60..d5f1c6c421 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -442,6 +442,8 @@ static void pci_root_bus_internal_init(PCIBus *bus, 
DeviceState *parent,
 bus->slot_reserved_mask = 0x0;
 bus->address_space_mem = address_space_mem;
 bus->address_space_io = address_space_io;
+bus->isol_as_mem = NULL;
+bus->isol_as_io = NULL;
 bus->flags |= PCI_BUS_IS_ROOT;
 
 /* host bridge */
@@ -2676,6 +2678,16 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
 return pci_get_bus(dev)->address_space_io;
 }
 
+AddressSpace *pci_isol_as_mem(PCIDevice *dev)
+{
+return pci_get_bus(dev)->isol_as_mem;
+}
+
+AddressSpace *pci_isol_as_io(PCIDevice *dev)
+{
+return pci_get_bus(dev)->isol_as_io;
+}
+
 static void pci_device_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *k = DEVICE_CLASS(klass);
@@ -2699,6 +2711,7 @@ static void pci_device_class_base_init(ObjectClass 
*klass, void *data)
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 {
+AddressSpace *iommu_as = NULL;
 PCIBus *bus = pci_get_bus(dev);
 PCIBus *iommu_bus = bus;
 uint8_t devfn = dev->devfn;
@@ -2745,6 +2758,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
*dev)
 if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
 return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
 }
+iommu_as = pci_isol_as_mem(dev);
+if (iommu_as) {
+return iommu_as;
+}
 return &address_space_memory;
 }
 
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index da34c8ebcd..98366768d2 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -383,6 +383,11 @@ void pci_bridge_initfn(PCIDevice *dev, const char 
*typename)
 sec_bus->address_space_io = &br->address_space_io;
 memory_region_init(>address_space_io, OBJECT(br), "pci_bridge_io",
4 * GiB);
+
+/* This PCI bridge puts the sec_bus in its parent's address space */
+sec_bus->isol_as_mem = pci_isol_as_mem(dev);
+sec_bus->isol_as_io = pci_isol_as_io(dev);
+
 br->windows = pci_bridge_region_init(br);
 QLIST_INIT(_bus->child);
 QLIST_INSERT_HEAD(>child, sec_bus, sibling);
-- 
2.20.1




[PATCH v5 05/18] qdev: unplug blocker for devices

2022-01-19 Thread Jagannathan Raman
Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 include/hw/qdev-core.h |  5 +
 softmmu/qdev-monitor.c | 35 +++
 2 files changed, 40 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index eed2983072..67df5e0081 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
 int instance_id_alias;
 int alias_required_for_version;
 ResettableState reset;
+GSList *unplug_blockers;
 };
 
 struct DeviceListener {
@@ -433,6 +434,10 @@ typedef bool (QDevPutBusFunc)(BusState *bus, Error **errp);
 bool qdev_set_bus_cbs(QDevGetBusFunc *get_bus, QDevPutBusFunc *put_bus,
   Error **errp);
 
+int qdev_add_unplug_blocker(DeviceState *dev, Error *reason, Error **errp);
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason);
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp);
+
 /**
  * GpioPolarity: Polarity of a GPIO line
  *
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 7306074019..1a169f89a2 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -978,10 +978,45 @@ void qmp_device_del(const char *id, Error **errp)
 return;
 }
 
+if (qdev_unplug_blocked(dev, errp)) {
+return;
+}
+
 qdev_unplug(dev, errp);
 }
 }
 
+int qdev_add_unplug_blocker(DeviceState *dev, Error *reason, Error **errp)
+{
+ERRP_GUARD();
+
+if (!migration_is_idle()) {
+error_setg(errp, "migration is in progress");
+return -EBUSY;
+}
+
+dev->unplug_blockers = g_slist_prepend(dev->unplug_blockers, reason);
+
+return 0;
+}
+
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason)
+{
+dev->unplug_blockers = g_slist_remove(dev->unplug_blockers, reason);
+}
+
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp)
+{
+ERRP_GUARD();
+
+if (dev->unplug_blockers) {
+error_propagate(errp, error_copy(dev->unplug_blockers->data));
+return true;
+}
+
+return false;
+}
+
 void hmp_device_add(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
-- 
2.20.1




[PATCH v5 02/18] tests/avocado: Specify target VM argument to helper routines

2022-01-19 Thread Jagannathan Raman
Specify target VM for exec_command and
exec_command_and_wait_for_pattern routines

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Beraldo Leal 
---
 tests/avocado/avocado_qemu/__init__.py | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py 
b/tests/avocado/avocado_qemu/__init__.py
index 75063c0c30..b3fbf77577 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -198,7 +198,7 @@ def wait_for_console_pattern(test, success_message, 
failure_message=None,
 """
 _console_interaction(test, success_message, failure_message, None, vm=vm)
 
-def exec_command(test, command):
+def exec_command(test, command, vm=None):
 """
 Send a command to a console (appending CRLF characters), while logging
 the content.
@@ -207,11 +207,14 @@ def exec_command(test, command):
 :type test: :class:`avocado_qemu.QemuSystemTest`
 :param command: the command to send
 :type command: str
+:param vm: target vm
+:type vm: :class:`qemu.machine.QEMUMachine`
 """
-_console_interaction(test, None, None, command + '\r')
+_console_interaction(test, None, None, command + '\r', vm=vm)
 
 def exec_command_and_wait_for_pattern(test, command,
-  success_message, failure_message=None):
+  success_message, failure_message=None,
+  vm=None):
 """
 Send a command to a console (appending CRLF characters), then wait
 for success_message to appear on the console, while logging the.
@@ -223,8 +226,11 @@ def exec_command_and_wait_for_pattern(test, command,
 :param command: the command to send
 :param success_message: if this message appears, test succeeds
 :param failure_message: if this message appears, test fails
+:param vm: target vm
+:type vm: :class:`qemu.machine.QEMUMachine`
 """
-_console_interaction(test, success_message, failure_message, command + 
'\r')
+_console_interaction(test, success_message, failure_message, command + 
'\r',
+ vm=vm)
 
 class QemuBaseTest(avocado.Test):
 def _get_unique_tag_val(self, tag_name):
-- 
2.20.1




[PATCH v5 00/18] vfio-user server in QEMU

2022-01-19 Thread Jagannathan Raman
Hi,

Thank you for taking the time to provide a comprehensive feedback
of the last series of patches. We have addressed all the comments.

We are posting this v5 of the series, which incorporates all the
feedback. Kindly share your feedback for this latest series

We added the following patches to the series:
  - [PATCH v5 03/18] pci: isolated address space for PCI bus
  - [PATCH v5 04/18] pci: create and free isolated PCI buses
  - [PATCH v5 05/18] qdev: unplug blocker for devices
  - [PATCH v5 06/18] vfio-user: add HotplugHandler for remote machine
  - [PATCH v5 07/18] vfio-user: set qdev bus callbacks for remote machine

We made the following changes to the existing patches:

[PATCH v5 09/18] vfio-user: define vfio-user-server object
  - renamed object class member 'daemon' as 'auto_shutdown'
  - set VfioUserServerProperties version to 6.3
  - use SocketAddressType_str to compose error message
  - refuse setting 'socket' and 'device' properties after server starts
  - added VFU_OBJECT_ERROR macro to report error

[PATCH v5 10/18] vfio-user: instantiate vfio-user context
  - set error variable to NULL after transferring ownership with
error_propagate()

[PATCH v5 11/18] vfio-user: find and init PCI device
  - block hot-unplug of PCI device when it is attached to the server object

[PATCH v5 12/18] vfio-user: run vfio-user context
  - emit a hangup event to the monitor when the client disconnects
  - reset vfu_poll_fd member and disable FD handler during finalize
  - add a comment to explain that attach could block
  - use VFU_OBJECT_ERROR instead of setting error_abort

[PATCH v5 14/18] vfio-user: handle DMA mappings
  - use pci_address_space() to access device's root memory region
  - given we're using one bus per device, mapped memory regions get
destroyed automatically when device is unplugged

[PATCH v5 15/18] vfio-user: handle PCI BAR accesses
  - use pci_isol_as_io() & pci_isol_as_mem() to access the device's
PCI/CPU address space. This simultaneously fixes the AddressSpace
issue noted in the last review cycle

[PATCH v5 16/18] vfio-user: handle device interrupts
  - setting own IRQ handlers for each bus
  - renamed vfu_object_dev_table to vfu_object_dev_to_ctx_table
  - indexing into vfu_object_dev_to_ctx_table with device's
address pointer instead of devfn
  - not looking up before removing from table

[PATCH v5 17/18] vfio-user: register handlers to facilitate migration
  - use VFU_OBJECT_ERROR instead of setting error_abort

We dropped the following patch from previous series:
  - vfio-user: IOMMU support for remote device

Thank you very much!

Jagannathan Raman (18):
  configure, meson: override C compiler for cmake
  tests/avocado: Specify target VM argument to helper routines
  pci: isolated address space for PCI bus
  pci: create and free isolated PCI buses
  qdev: unplug blocker for devices
  vfio-user: add HotplugHandler for remote machine
  vfio-user: set qdev bus callbacks for remote machine
  vfio-user: build library
  vfio-user: define vfio-user-server object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: register handlers to facilitate migration
  vfio-user: avocado tests for vfio-user

 configure  |   21 +-
 meson.build|   44 +-
 qapi/misc.json |   23 +
 qapi/qom.json  |   20 +-
 include/hw/pci/pci.h   |   12 +
 include/hw/pci/pci_bus.h   |   17 +
 include/hw/qdev-core.h |   21 +
 include/migration/vmstate.h|2 +
 migration/savevm.h |2 +
 hw/pci/msi.c   |   13 +-
 hw/pci/msix.c  |   12 +-
 hw/pci/pci.c   |  186 
 hw/pci/pci_bridge.c|5 +
 hw/remote/machine.c|   86 ++
 hw/remote/vfio-user-obj.c  | 1019 
 migration/savevm.c |   73 ++
 migration/vmstate.c|   19 +
 softmmu/qdev-monitor.c |   74 +-
 .gitlab-ci.d/buildtest.yml |2 +
 .gitmodules|3 +
 Kconfig.host   |4 +
 MAINTAINERS|3 +
 hw/remote/Kconfig  |4 +
 hw/remote/meson.build  |3 +
 hw/remote/trace-events |   11 +
 meson_options.txt  |2 +
 subprojects/libvfio-user   |1 +
 tests/avocado/avocado_qemu/__init__.py |   14 +-
 tests/avocado/vfio-user.py |  225 +
 

Re: [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings

2022-01-19 Thread Andre Przywara
On Wed, 19 Jan 2022 10:15:52 +
Peter Maydell  wrote:

Hi Peter,

> On Tue, 18 Jan 2022 at 23:30, Andre Przywara  wrote:
> > Looking at k-u-t's arm/gic.c and QEMU's hw/intc/arm_gic.c I see two
> > problems here: QEMU implements word accesses as four successive calls to
> > gic_dist_readb() - which is probably fine if that helps code design,
> > but it won't allow it to actually spot access size issues. I just
> > remember that we spent some brain cells and CPP macros on getting the
> > access size right in KVM - hence those tests in kvm-unit-tests.  
> 
> Thanks for looking at this. I should have read the code rather
> than dashing off a reply last thing in the evening based just
> on the test case output! I think I was confusing how our GICv3
> emulation handles register accesses (with separate functions for
> byte/halfword/word/quad accesses) with the GICv2 emulation
> (which as you say calls down into the byte emulation code
> wherever it can).

No worries!

> > But more importantly it looks like GICD_IIDR is actually not
> > implemented: There is a dubious "if (offset < 0x08) return 0;" line,
> > but IIDR (offset 0x8) would actually fall through, and hit the bad_reg
> > label, which would return 0 (and cause the message, if enabled).  
> 
> Mmm. I actually have a patch in target-arm.next from Petr Pavlu
> which implements GICC_IIDR, but we do indeed not implement the
> distributor equivalent.

Well, returning 0 is actually not the worst idea. Using proper ID
values might not even be feasible for QEMU, or would create some hassle
with versioning. With 0 all a user can assume is spec compliance.

> > If that helps: from a GIC MMIO perspective 8-bit accesses are actually
> > the exception rather than the norm (ARM IHI 0048B.b 4.1.4 GIC register
> > access).  
> 
> Yes. We got this right in the GICv3 emulation design, where almost
> all the logic is in the 32-bit accessor functions and the 8/16-bit
> functions deal only with the very few registers that have to
> permit non-word accesses.

Indeed. I dusted off my old GICv3 MMIO patches for kvm-unit-tests, and
QEMU passed with flying colours. With the debug switch I see it
reporting exactly the violating accesses we except to see.
Will send those patches ASAP.

> The GICv2 code is a lot older (and to
> be fair to it, started out as 11MPcore interrupt controller
> emulation, and I bet the docs of that were not very specific about
> what registers could or could not be accessed byte at a time).
> Unless we want to rewrite all that logic in the GICv2 emulation
> (which I at least do not :-))

... and I can't ...

> I think we'll have to live with
> the warnings about bad-offsets reporting for each byte rather
> than just once for the word access.

Yeah, if those warnings appear only with that debug switch, there is
probably little reason to change that code just because of this. At
least it seemed to work quite well over the years.

Cheers,
Andre

P.S. I changed k-u-t to special case the UP case, so that TCG passes.
But now KVM fails, of course. So I will have to make a patch for the
kernel ...



Re: [PATCH v2 0/2] TPM-CRB: Remove spurious error report when used with VFIO

2022-01-19 Thread Stefan Berger



On 1/18/22 10:33, Eric Auger wrote:

This series aims at removing a spurious error message we get when
launching a guest with a TPM-CRB device and VFIO-PCI devices.

The CRB command buffer currently is a RAM MemoryRegion and given
its base address alignment, it causes an error report on
vfio_listener_region_add(). This series proposes to use a ram-device
region instead which helps in better assessing the dma map error
failure on VFIO side.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/tpm-crb-ram-device-v2

History:
v1 -> v2:
- added tpm_crb_unrealize (dared to keep Stefan's T-b though)

Eric Auger (2):
   tpm: CRB: Use ram_device for "tpm-crb-cmd" region
   hw/vfio/common: Silence ram device offset alignment error traces

  hw/tpm/meson.build   |  2 +-
  hw/tpm/tpm_crb.c | 22 --
  hw/vfio/common.c | 15 ++-
  hw/vfio/trace-events |  1 +
  4 files changed, 36 insertions(+), 4 deletions(-)


Acked-by: Stefan Berger 



Re: [PATCH v2 2/2] hw/vfio/common: Silence ram device offset alignment error traces

2022-01-19 Thread Stefan Berger



On 1/19/22 15:13, Alex Williamson wrote:

On Tue, 18 Jan 2022 16:33:06 +0100
Eric Auger  wrote:


Failing to DMA MAP a ram_device should not cause an error message.
This is currently happening with the TPM CRB command region and
this is causing confusion.

We may want to keep the trace for debug purpose though.

Signed-off-by: Eric Auger 
Tested-by: Stefan Berger 
---
  hw/vfio/common.c | 15 ++-
  hw/vfio/trace-events |  1 +
  2 files changed, 15 insertions(+), 1 deletion(-)

Thanks!  Looks good to me.

Stefan, I can provide an ack here if you want to send a pull request
for both or likewise I can send a pull request with your ack on the
previous patch.  I suppose the patches themselves are technically
independent if you want to split them.  Whichever you prefer.

Acked-by: Alex Williamson 


If you want to send the PR, please go ahead.

   Stefan




diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f5..9caa560b07 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -884,7 +884,20 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
  if (unlikely((section->offset_within_address_space &
~qemu_real_host_page_mask) !=
   (section->offset_within_region & 
~qemu_real_host_page_mask))) {
-error_report("%s received unaligned region", __func__);
+if (memory_region_is_ram_device(section->mr)) { /* just debug purpose 
*/
+trace_vfio_listener_region_add_bad_offset_alignment(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+section->offset_within_region, qemu_real_host_page_size);
+} else { /* error case we don't want to be fatal */
+error_report("%s received unaligned region %s iova=0x%"PRIx64
+ " offset_within_region=0x%"PRIx64
+ " qemu_real_host_page_mask=0x%"PRIx64,
+ __func__, memory_region_name(section->mr),
+ section->offset_within_address_space,
+ section->offset_within_region,
+ qemu_real_host_page_mask);
+}
  return;
  }
  
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events

index 0ef1b5f4a6..ccd9d7610d 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -100,6 +100,7 @@ vfio_listener_region_add_skip(uint64_t start, uint64_t end) 
"SKIPPING region_add
  vfio_spapr_group_attach(int groupfd, int tablefd) "Attached groupfd %d to liobn fd 
%d"
  vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] 
0x%"PRIx64" - 0x%"PRIx64
  vfio_listener_region_add_ram(uint64_t iova_start, uint64_t iova_end, void *vaddr) "region_add [ram] 
0x%"PRIx64" - 0x%"PRIx64" [%p]"
+vfio_listener_region_add_bad_offset_alignment(const char *name, uint64_t iova, uint64_t offset_within_region, uint64_t 
page_size) "Region \"%s\" @0x%"PRIx64", offset_within_region=0x%"PRIx64", 
qemu_real_host_page_mask=0x%"PRIx64 " cannot be mapped for DMA"
  vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region 
\"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped 
for DMA"
  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 
0x%"PRIx64" - 0x%"PRIx64
  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 
0x%"PRIx64




Re: [PATCH v2 2/2] hw/vfio/common: Silence ram device offset alignment error traces

2022-01-19 Thread Alex Williamson
On Tue, 18 Jan 2022 16:33:06 +0100
Eric Auger  wrote:

> Failing to DMA MAP a ram_device should not cause an error message.
> This is currently happening with the TPM CRB command region and
> this is causing confusion.
> 
> We may want to keep the trace for debug purpose though.
> 
> Signed-off-by: Eric Auger 
> Tested-by: Stefan Berger 
> ---
>  hw/vfio/common.c | 15 ++-
>  hw/vfio/trace-events |  1 +
>  2 files changed, 15 insertions(+), 1 deletion(-)

Thanks!  Looks good to me.

Stefan, I can provide an ack here if you want to send a pull request
for both or likewise I can send a pull request with your ack on the
previous patch.  I suppose the patches themselves are technically
independent if you want to split them.  Whichever you prefer.

Acked-by: Alex Williamson 

> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 080046e3f5..9caa560b07 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -884,7 +884,20 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  if (unlikely((section->offset_within_address_space &
>~qemu_real_host_page_mask) !=
>   (section->offset_within_region & 
> ~qemu_real_host_page_mask))) {
> -error_report("%s received unaligned region", __func__);
> +if (memory_region_is_ram_device(section->mr)) { /* just debug 
> purpose */
> +trace_vfio_listener_region_add_bad_offset_alignment(
> +memory_region_name(section->mr),
> +section->offset_within_address_space,
> +section->offset_within_region, qemu_real_host_page_size);
> +} else { /* error case we don't want to be fatal */
> +error_report("%s received unaligned region %s iova=0x%"PRIx64
> + " offset_within_region=0x%"PRIx64
> + " qemu_real_host_page_mask=0x%"PRIx64,
> + __func__, memory_region_name(section->mr),
> + section->offset_within_address_space,
> + section->offset_within_region,
> + qemu_real_host_page_mask);
> +}
>  return;
>  }
>  
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 0ef1b5f4a6..ccd9d7610d 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -100,6 +100,7 @@ vfio_listener_region_add_skip(uint64_t start, uint64_t 
> end) "SKIPPING region_add
>  vfio_spapr_group_attach(int groupfd, int tablefd) "Attached groupfd %d to 
> liobn fd %d"
>  vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add 
> [iommu] 0x%"PRIx64" - 0x%"PRIx64
>  vfio_listener_region_add_ram(uint64_t iova_start, uint64_t iova_end, void 
> *vaddr) "region_add [ram] 0x%"PRIx64" - 0x%"PRIx64" [%p]"
> +vfio_listener_region_add_bad_offset_alignment(const char *name, uint64_t 
> iova, uint64_t offset_within_region, uint64_t page_size) "Region \"%s\" 
> @0x%"PRIx64", offset_within_region=0x%"PRIx64", 
> qemu_real_host_page_mask=0x%"PRIx64 " cannot be mapped for DMA"
>  vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, 
> uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" 
> size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
>  vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING 
> region_del 0x%"PRIx64" - 0x%"PRIx64
>  vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 
> 0x%"PRIx64" - 0x%"PRIx64




Re: [RESEND] target/riscv: fix RV128 lq encoding

2022-01-19 Thread Philipp Tomsich
The cbo.* mnemonics share their opcode space with lq for those cases where
rd == 0 ("brownfield" encodings).
"Major opcode" refers to inst[6:0] according to chapter 26.

In overlapping multi-group syntax, this would look like:

> {
>
>   # *** RV32 Zicbom Standard Extension ***
>
>   cbo_clean  000 1 . 010 0 000 @sfence_vm
>
>   cbo_flush  000 00010 . 010 0 000 @sfence_vm
>
>   cbo_inval  000 0 . 010 0 000 @sfence_vm
>
>
>   # *** RV32 Zicboz Standard Extension ***
>
>   cbo_zero   000 00100 . 010 0 000 @sfence_vm
>
>
>   # *** RVI128 lq ***
>
>   lq      . 010 . 000 @i
>
> }
>

Instead of using a multigroup here, I would recommend that you take a look
at https://patchwork.kernel.org/project/qemu-devel/list/?series=605340
where we have added a table of optional decoders — this could be used to
split these off into separate decoders that are run before the regular
decoder, if & only if Zicboc and/or Zicboz are enabled.

Cheers,
Philipp.


On Tue, 18 Jan 2022 at 17:32, Christoph Muellner 
wrote:

> If LQ has func3==010 and is located in the MISC-MEM opcodes,
> then it conflicts with the CBO opcode space.
> However, since LQ is specified as: "LQ is added to the MISC-MEM major
> opcode", we have an implementation bug, because 'major opcode'
> refers to func3, which must be 111.
>
> This results in the following instruction encodings:
>
> lq  .111 .000
> cbo_clean  0001 .010 
> cbo_flush  0010 .010 
> cbo_inval   .010 
> cbo_zero   0100 .010 
>  ^^^-func3
>   ^^^-opcode
>
> Signed-off-by: Christoph Muellner 
> ---
>  target/riscv/insn32.decode | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 5bbedc254c..d3f798ca10 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -168,7 +168,7 @@ sraw 010 .  . 101 . 0111011 @r
>
>  # *** RV128I Base Instruction Set (in addition to RV64I) ***
>  ldu     . 111 . 011 @i
> -lq      . 010 . 000 @i
> +lq      . 111 . 000 @i
>  sq      . 100 . 0100011 @s
>  addid  .  000 . 1011011 @i
>  sllid00 ..  . 001 . 1011011 @sh6
> --
> 2.34.1
>
>


Re: iotest 040, 041, intermittent failure in netbsd VM

2022-01-19 Thread John Snow
On Tue, Jan 18, 2022 at 1:34 PM John Snow  wrote:
>
> On Tue, Jan 18, 2022 at 7:13 AM Peter Maydell  
> wrote:
> >
> > On Mon, 17 Jan 2022 at 20:35, John Snow  wrote:
> > > I do expect this to print more information on failure than it
> > > currently is, though (bug somewhere in machine.py, I think).
> > > Can you please try applying this temporary patch and running `./check
> > > -qcow2 040 041` until you see a breakage and show me the output from
> > > that?
> >
>
> Thanks for playing tele-debug.
>
> > Having fixed my setup to not use an ancient host QEMU, here's
> > the relevant bit of the log:
> >
> >   TEST   iotest-qcow2: 037
> >   TEST   iotest-qcow2: 038 [not run]
> >   TEST   iotest-qcow2: 039 [not run]
> >   TEST   iotest-qcow2: 040 [fail]
> > QEMU  --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64"
> > -nodefaults -display none -accel qtest -machine virt
> > QEMU_IMG  --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-img"
> > QEMU_IO   --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-io"
> > --cache writeback --aio threads -f qcow2
> > QEMU_NBD  --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-nbd"
> > IMGFMT-- qcow2
> > IMGPROTO  -- file
> > PLATFORM  -- NetBSD/amd64 localhost 9.2
> > TEST_DIR  -- 
> > /home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch
> > SOCK_DIR  -- /tmp/tmp1h12r7ev
> > GDB_OPTIONS   --
> > VALGRIND_QEMU --
> > PRINT_QEMU_OUTPUT --
> >
> > --- /home/qemu/qemu-test.vdrI02/src/tests/qemu-iotests/040.out
> > +++ 040.out.bad
> > @@ -1,5 +1,95 @@
> > -.
> > +...ERROR:qemu.aqmp.qmp_client.qemu-12407:Failed to establish
> > connection: concurrent.futures._base.CancelledError
> > +ERROR:qemu.machine.machine:Error launching VM
> > +ERROR:qemu.machine.machine:Process was forked, waiting on it
> > +ERROR:qemu.machine.machine:Command:
> > '/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64
> > -display none -vga none -chardev
> > socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock -mon
> > chardev=mon,mode=control -qtest
> > unix:path=/tmp/tmp1h12r7ev/qemu-12407-qtest.sock -accel qtest
> > -nodefaults -display none -accel qtest -machine virt -drive
> > if=none,id=drive0,file=/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=mid,backing.backing.node-name=base
> > -device virtio-scsi -device scsi-hd,id=scsi0,drive=drive0'
>
> > +ERROR:qemu.machine.machine:Output: "qemu-system-aarch64: -chardev
> > socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock: Failed to
> > connect to '/tmp/tmp1h12r7ev/qemu-12407-monitor.sock': No such file or
> > directory\n"
>
> ... Oh. That's unpleasant. My guess is that we aren't listening on the
> socket before the QEMU process gets far enough to want to connect to
> it. The change to an asynchronous backend must have jostled the
> timing.
>
> > +ERROR:qemu.machine.machine:exitcode: 1
>
> And, oh: The VM launching library only chirps about *negative* error
> codes. That's why it wasn't printing anything more useful. I suppose
> the thinking was that we use the VM launch utility to knowingly launch
> bad command lines, so we only wanted to see failure notifications on
> -errno style codes, but that obviously makes debugging unintentional
> failures a lot more awful. I'll try to improve the usability and
> legibility of the errors here.
>
> Thanks,
> --js

I've published '[PATCH v2 0/5] Python: minor fixes' and pushed to
jsnow/python. Test it if you want; otherwise I'll wait for
reviews/acks and send a PR like normal. CI is still running on the
final push, but early smoke tests looked good.

(Patch 1 fixes compatibility with QEMU 2.11, patch 3 adds better
diagnostic info to failures, patch 5 should ultimately fix the root
cause of the race condition.)

Thanks,
--js




Re: [RESEND] target/riscv: fix RV128 lq encoding

2022-01-19 Thread Frédéric Pétrot

Le 18/01/2022 à 17:32, Christoph Muellner a écrit :

If LQ has func3==010 and is located in the MISC-MEM opcodes,
then it conflicts with the CBO opcode space.
However, since LQ is specified as: "LQ is added to the MISC-MEM major
opcode", we have an implementation bug, because 'major opcode'
refers to func3, which must be 111.

This results in the following instruction encodings:

lq  .111 .000
cbo_clean  0001 .010 
cbo_flush  0010 .010 
cbo_inval   .010 
cbo_zero   0100 .010 
  ^^^-func3
   ^^^-opcode


  Hello Christoph,
  I see in Table 26.1 of the latest riscv-isa-manual.pdf what are called, in my
  understanding, major opcodes, and MISC-MEM is one of them with value 00_111_11.
  The value for func3 that I chose comes from
  https://github.com/michaeljclark/riscv-meta/blob/master/opcodes
  which admittedly is out-dated, but I don't see any particular value for
  LQ/SQ in the new spec either (I mean, riscv-isa-manual.pdf, any pointer we
  could refer to ?).
  I have nothing against changing the opcode, but then we need to change
  disas/riscv.c which also uses the previous opcode to dump instructions when
  running with -d in_asm.

  Frédéric


Signed-off-by: Christoph Muellner 
---
  target/riscv/insn32.decode | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5bbedc254c..d3f798ca10 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -168,7 +168,7 @@ sraw 010 .  . 101 . 0111011 @r
  
  # *** RV128I Base Instruction Set (in addition to RV64I) ***

  ldu     . 111 . 011 @i
-lq      . 010 . 000 @i
+lq      . 111 . 000 @i
  sq      . 100 . 0100011 @s
  addid  .  000 . 1011011 @i
  sllid00 ..  . 001 . 1011011 @sh6


--
+---+
| Frédéric Pétrot, Pr. Grenoble INP-Ensimag/TIMA,   Ensimag deputy director |
| Mob/Pho: +33 6 74 57 99 65/+33 4 76 57 48 70  Ad augusta  per angusta |
| http://tima.univ-grenoble-alpes.fr frederic.pet...@univ-grenoble-alpes.fr |
+---+



Re: Cross Architecture Kernel Modules?

2022-01-19 Thread Kenneth Adam Miller
The source for it isn't available, so it cannot simply be recompiled for the
desired architecture.

What 3rd party forks take this approach?

On Wed, Jan 19, 2022 at 2:06 PM Alex Bennée  wrote:

>
> Kenneth Adam Miller  writes:
>
> > Hello all,
> >
> > I just want to pose the following problem:
> >
> > There is a kernel module for a non-native architecture, say, arch 1. For
> performance reasons, the rest of all of the software needs to run
> > natively on a different arch, arch 2. Is there any way to perhaps run
> multiple QEMU instances for the different architectures in such a way
> > to minimize the cross architecture performance penalty? For example, I
> would like the kernel module in one (non-native) QEMU instance to
> > be made available, literally equivalently, in the second (native) QEMU
> instance. Would there be any API or way to map across the QEMU
> > instances so that the non native arch kernel module could be mapped to
> > the native QEMU instance?
>
> What you are describing sounds like heterogeneous system modelling which
> QEMU only supports in a very limited way (all vCPUs must be the same
> base architecture). You can link QEMU's together by way of shared memory
> but there is no other wiring together done in that case although some
> 3rd party forks take this approach.
>
> The kernel module sounds confusing - why would you have a kernel module
> that wasn't the same architecture as the kernel you are running?
>
> --
> Alex Bennée
>


Re: [PATCH 2/3] migration/migration.c: Avoid COLO boot in postcopy migration

2022-01-19 Thread Dr. David Alan Gilbert
* Zhang Chen (chen.zh...@intel.com) wrote:
> COLO dose not support postcopy migration and remove the Fixme.


'does' not 'dose'

> Signed-off-by: Zhang Chen 
> ---
>  migration/migration.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 2afa77da03..3fac9c67ca 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3230,7 +3230,11 @@ static void migration_completion(MigrationState *s)
>  goto fail_invalidate;
>  }
>  
> -if (!migrate_colo_enabled()) {
> +if (migrate_colo_enabled() && s->state == MIGRATION_STATUS_ACTIVE) {
> +/* COLO dose not support postcopy */
> +migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
> +  MIGRATION_STATUS_COLO);

I'm a bit confused; where were we setting the source state to COLO
before - I can't find it!

Dave

> +} else {
>  migrate_set_state(>state, current_active_state,
>MIGRATION_STATUS_COMPLETED);
>  }
> @@ -3621,10 +3625,6 @@ static void migration_iteration_finish(MigrationState 
> *s)
>   "COLO enabled", __func__);
>  }
>  migrate_start_colo_process(s);
> -/*
> - * Fixme: we will run VM in COLO no matter its old running state.
> - * After exited COLO, we will keep running.
> - */
>   /* Fallthrough */
>  case MIGRATION_STATUS_ACTIVE:
>  /*
> -- 
> 2.25.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH v2 5/5] python/aqmp: add socket bind step to legacy.py

2022-01-19 Thread John Snow
The old QMP library would actually bind to the server address during
__init__(). The new library delays this to the accept() call, because
binding occurs inside of the call to start_[unix_]server(), which is an
async method -- so it cannot happen during __init__ anymore.

Python 3.7+ adds the ability to create the server (and thus the bind()
call) and begin the active listening in separate steps, but we don't
have that functionality in 3.6, our current minimum.

Therefore ... Add a temporary workaround that allows the synchronous
version of the client to bind the socket in advance, guaranteeing that
there will be a UNIX socket in the filesystem ready for the QEMU client
to connect to without a race condition.

(Yes, it's ugly; fixing it more nicely will unfortunately have to wait
until I can stipulate Python 3.7+ as our minimum version. Python 3.6 is
EOL as of the beginning of this year, but I haven't checked if all of
our supported build platforms have a properly modern Python available
yet.)

Signed-off-by: John Snow 
---
 python/qemu/aqmp/legacy.py   |  3 +++
 python/qemu/aqmp/protocol.py | 41 +---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/python/qemu/aqmp/legacy.py b/python/qemu/aqmp/legacy.py
index 9e7b9fb80b..cf7634ee95 100644
--- a/python/qemu/aqmp/legacy.py
+++ b/python/qemu/aqmp/legacy.py
@@ -33,6 +33,9 @@ def __init__(self, address: SocketAddrT,
 self._address = address
 self._timeout: Optional[float] = None
 
+if server:
+self._aqmp._bind_hack(address)  # pylint: disable=protected-access
+
 _T = TypeVar('_T')
 
 def _sync(
diff --git a/python/qemu/aqmp/protocol.py b/python/qemu/aqmp/protocol.py
index c4fbe35a0e..eb740a5452 100644
--- a/python/qemu/aqmp/protocol.py
+++ b/python/qemu/aqmp/protocol.py
@@ -15,6 +15,7 @@
 from enum import Enum
 from functools import wraps
 import logging
+import socket
 from ssl import SSLContext
 from typing import (
 Any,
@@ -234,6 +235,9 @@ def __init__(self, name: Optional[str] = None) -> None:
 self._runstate = Runstate.IDLE
 self._runstate_changed: Optional[asyncio.Event] = None
 
+# Workaround for bind()
+self._sock: Optional[socket.socket] = None
+
 def __repr__(self) -> str:
 cls_name = type(self).__name__
 tokens = []
@@ -423,6 +427,34 @@ async def _establish_connection(
 else:
 await self._do_connect(address, ssl)
 
+def _bind_hack(self, address: Union[str, Tuple[str, int]]) -> None:
+"""
+Used to create a socket in advance of accept().
+
+This is a workaround to ensure that we can guarantee timing of
+precisely when a socket exists to avoid a connection attempt
+bouncing off of nothing.
+
+Python 3.7+ adds a feature to separate the server creation and
+listening phases instead, and should be used instead of this
+hack.
+"""
+if isinstance(address, tuple):
+family = socket.AF_INET
+else:
+family = socket.AF_UNIX
+
+sock = socket.socket(family, socket.SOCK_STREAM)
+sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+
+try:
+sock.bind(address)
+except:
+sock.close()
+raise
+
+self._sock = sock
+
 @upper_half
 async def _do_accept(self, address: Union[str, Tuple[str, int]],
  ssl: Optional[SSLContext] = None) -> None:
@@ -460,24 +492,27 @@ async def _client_connected_cb(reader: 
asyncio.StreamReader,
 if isinstance(address, tuple):
 coro = asyncio.start_server(
 _client_connected_cb,
-host=address[0],
-port=address[1],
+host=None if self._sock else address[0],
+port=None if self._sock else address[1],
 ssl=ssl,
 backlog=1,
 limit=self._limit,
+sock=self._sock,
 )
 else:
 coro = asyncio.start_unix_server(
 _client_connected_cb,
-path=address,
+path=None if self._sock else address,
 ssl=ssl,
 backlog=1,
 limit=self._limit,
+sock=self._sock,
 )
 
 server = await coro # Starts listening
 await connected.wait()  # Waits for the callback to fire (and finish)
 assert server is None
+self._sock = None
 
 self.logger.debug("Connection accepted.")
 
-- 
2.31.1




[PATCH v2 3/5] python/machine: raise VMLaunchFailure exception from launch()

2022-01-19 Thread John Snow
This allows us to pack in some extra information about the failure,
which guarantees that if the caller did not *intentionally* cause a
failure (by capturing this Exception), some pretty good clues will be
printed at the bottom of the traceback information.

This will help make failures in the event of a non-negative return code
more obvious when they go unhandled; the current behavior is to print a
warning message only in the event of signal-based terminations (for
negative return codes).

Signed-off-by: John Snow 
---
 python/qemu/machine/machine.py| 44 +++
 tests/qemu-iotests/tests/mirror-top-perms |  3 +-
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 67ab06ca2b..5b76ee9a36 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -74,6 +74,35 @@ class QEMUMachineAddDeviceError(QEMUMachineError):
 """
 
 
+class VMLaunchFailure(QEMUMachineError):
+"""
+Exception raised when a VM was attempted, but failed.
+"""
+def __init__(self, exitcode: Optional[int],
+ command: str, output: Optional[str]):
+super().__init__(exitcode, command, output)
+self.exitcode = exitcode
+self.command = command
+self.output = output
+
+def __str__(self) -> str:
+ret = ''
+if self.__cause__ is not None:
+name = type(self.__cause__).__name__
+reason = str(self.__cause__)
+if reason:
+ret += f"{name}: {reason}"
+else:
+ret += f"{name}"
+ret += '\n'
+
+if self.exitcode is not None:
+ret += f"\tExit code: {self.exitcode}\n"
+ret += f"\tCommand: {self.command}\n"
+ret += f"\tOutput: {self.output}\n"
+return ret
+
+
 class AbnormalShutdown(QEMUMachineError):
 """
 Exception raised when a graceful shutdown was requested, but not performed.
@@ -397,7 +426,7 @@ def launch(self) -> None:
 
 try:
 self._launch()
-except:
+except BaseException as exc:
 # We may have launched the process but it may
 # have exited before we could connect via QMP.
 # Assume the VM didn't launch or is exiting.
@@ -408,11 +437,14 @@ def launch(self) -> None:
 else:
 self._post_shutdown()
 
-LOG.debug('Error launching VM')
-if self._qemu_full_args:
-LOG.debug('Command: %r', ' '.join(self._qemu_full_args))
-if self._iolog:
-LOG.debug('Output: %r', self._iolog)
+if isinstance(exc, Exception):
+raise VMLaunchFailure(
+exitcode=self.exitcode(),
+command=' '.join(self._qemu_full_args),
+output=self._iolog
+) from exc
+
+# Leave BaseException priority exceptions alone.
 raise
 
 def _launch(self) -> None:
diff --git a/tests/qemu-iotests/tests/mirror-top-perms 
b/tests/qemu-iotests/tests/mirror-top-perms
index 0a51a613f3..b5849978c4 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -21,7 +21,6 @@
 
 import os
 
-from qemu.aqmp import ConnectError
 from qemu.machine import machine
 from qemu.qmp import QMPConnectError
 
@@ -107,7 +106,7 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
 self.vm_b.launch()
 print('ERROR: VM B launched successfully, '
   'this should not have happened')
-except (QMPConnectError, ConnectError):
+except (QMPConnectError, machine.VMLaunchFailure):
 assert 'Is another process using the image' in self.vm_b.get_log()
 
 result = self.vm.qmp('block-job-cancel',
-- 
2.31.1




[PATCH v2 4/5] python: upgrade mypy to 0.780

2022-01-19 Thread John Snow
We need a slightly newer version of mypy in order to use some features
of the asyncio server functions in a forthcoming patch.

(Note: pipenv is not really suited to upgrading individual packages; I
need to replace this tool with something better for the task. For now,
the miscellaneous updates not related to the mypy upgrade are simply
beyond my control.)

Signed-off-by: John Snow 
---
 python/Pipfile.lock | 66 ++---
 python/setup.cfg|  2 +-
 2 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index d2a7dbd88b..ce46404ce0 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -1,7 +1,7 @@
 {
 "_meta": {
 "hash": {
-"sha256": 
"784b327272db32403d5a488507853b5afba850ba26a5948e5b6a90c1baef2d9c"
+"sha256": 
"f1a25654d884a5b450e38d78b1f2e3ebb9073e421cc4358d4bbb83ac251a5670"
 },
 "pipfile-spec": 6,
 "requires": {
@@ -34,7 +34,7 @@
 
"sha256:09bdb456e02564731f8b5957cdd0c98a7f01d2db5e90eb1d794c353c28bfd705",
 
"sha256:6a8a51f64dae307f6e0c9db752b66a7951e282389d8362cc1d39a56f3feeb31d"
 ],
-"markers": "python_version ~= '3.6'",
+"index": "pypi",
 "version": "==2.6.0"
 },
 "avocado-framework": {
@@ -50,6 +50,7 @@
 
"sha256:106fef6dc37dd8c0e2c0a60d3fca3e77460a48907f335fa28420463a6f799736",
 
"sha256:23e223426b28491b1ced97dc3bbe183027419dfc7982b4fa2f05d5f3ff10711c"
 ],
+"index": "pypi",
 "version": "==0.3.2"
 },
 "filelock": {
@@ -57,6 +58,7 @@
 
"sha256:18d82244ee114f543149c66a6e0c14e9c4f8a1044b5cdaadd0f82159d6a6ff59",
 
"sha256:929b7d63ec5b7d6b71b0fa5ac14e030b3f70b75747cef1b10da9b879fef15836"
 ],
+"index": "pypi",
 "version": "==3.0.12"
 },
 "flake8": {
@@ -88,7 +90,7 @@
 
"sha256:54161657e8ffc76596c4ede7080ca68cb02962a2e074a2586b695a93a925d36e",
 
"sha256:e962bff7440364183203d179d7ae9ad90cb1f2b74dcb84300e88ecc42dca3351"
 ],
-"markers": "python_version < '3.7'",
+"index": "pypi",
 "version": "==5.1.4"
 },
 "isort": {
@@ -124,7 +126,7 @@
 
"sha256:ed361bb83436f117f9917d282a456f9e5009ea12fd6de8742d1a4752c3017e93",
 
"sha256:f5144c75445ae3ca2057faac03fda5a902eff196702b0a24daf1d6ce0650514b"
 ],
-"markers": "python_version >= '2.7' and python_version not in 
'3.0, 3.1, 3.2, 3.3, 3.4, 3.5'",
+"index": "pypi",
 "version": "==1.6.0"
 },
 "mccabe": {
@@ -136,23 +138,23 @@
 },
 "mypy": {
 "hashes": [
-
"sha256:15b948e1302682e3682f11f50208b726a246ab4e6c1b39f9264a8796bb416aa2",
-
"sha256:219a3116ecd015f8dca7b5d2c366c973509dfb9a8fc97ef044a36e3da66144a1",
-
"sha256:3b1fc683fb204c6b4403a1ef23f0b1fac8e4477091585e0c8c54cbdf7d7bb164",
-
"sha256:3beff56b453b6ef94ecb2996bea101a08f1f8a9771d3cbf4988a61e4d9973761",
-
"sha256:7687f6455ec3ed7649d1ae574136835a4272b65b3ddcf01ab8704ac65616c5ce",
-
"sha256:7ec45a70d40ede1ec7ad7f95b3c94c9cf4c186a32f6bacb1795b60abd2f9ef27",
-
"sha256:86c857510a9b7c3104cf4cde1568f4921762c8f9842e987bc03ed4f160925754",
-
"sha256:8a627507ef9b307b46a1fea9513d5c98680ba09591253082b4c48697ba05a4ae",
-
"sha256:8dfb69fbf9f3aeed18afffb15e319ca7f8da9642336348ddd6cab2713ddcf8f9",
-
"sha256:a34b577cdf6313bf24755f7a0e3f3c326d5c1f4fe7422d1d06498eb25ad0c600",
-
"sha256:a8ffcd53cb5dfc131850851cc09f1c44689c2812d0beb954d8138d4f5fc17f65",
-
"sha256:b90928f2d9eb2f33162405f32dde9f6dcead63a0971ca8a1b50eb4ca3e35ceb8",
-
"sha256:c56ffe22faa2e51054c5f7a3bc70a370939c2ed4de308c690e7949230c995913",
-
"sha256:f91c7ae919bbc3f96cd5e5b2e786b2b108343d1d7972ea130f7de27fdd547cf3"
+
"sha256:00cb1964a7476e871d6108341ac9c1a857d6bd20bf5877f4773ac5e9d92cd3cd",
+
"sha256:127de5a9b817a03a98c5ae8a0c46a20dc2af6dcfa2ae7f96cb519b312efa",
+
"sha256:1f3976a945ad7f0a0727aafdc5651c2d3278e3c88dee94e2bf75cd3386b7b2f4",
+
"sha256:2f8c098f12b402c19b735aec724cc9105cc1a9eea405d08814eb4b14a6fb1a41",
+
"sha256:4ef13b619a289aa025f2273e05e755f8049bb4eaba6d703a425de37d495d178d",
+
"sha256:5d142f219bf8c7894dfa79ebfb7d352c4c63a325e75f10dfb4c3db9417dcd135",
+
"sha256:62eb5dd4ea86bda8ce386f26684f7f26e4bfe6283c9f2b6ca6d17faf704dcfad",
+
"sha256:64c36eb0936d0bfb7d8da49f92c18e312ad2e3ed46e5548ae4ca997b0d33bd59",
+

[PATCH v2 1/5] python/aqmp: Fix negotiation with pre-"oob" QEMU

2022-01-19 Thread John Snow
QEMU versions prior to the "oob" capability *also* can't accept the
"enable" keyword argument at all. Fix the handshake process with older
QEMU versions.

Signed-off-by: John Snow 
---
 python/qemu/aqmp/qmp_client.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/qemu/aqmp/qmp_client.py b/python/qemu/aqmp/qmp_client.py
index 8105e29fa8..6b43e1dbbe 100644
--- a/python/qemu/aqmp/qmp_client.py
+++ b/python/qemu/aqmp/qmp_client.py
@@ -292,9 +292,9 @@ async def _negotiate(self) -> None:
 """
 self.logger.debug("Negotiating capabilities ...")
 
-arguments: Dict[str, List[str]] = {'enable': []}
+arguments: Dict[str, List[str]] = {}
 if self._greeting and 'oob' in self._greeting.QMP.capabilities:
-arguments['enable'].append('oob')
+arguments.setdefault('enable', []).append('oob')
 msg = self.make_execute_msg('qmp_capabilities', arguments=arguments)
 
 # It's not safe to use execute() here, because the reader/writers
-- 
2.31.1




[PATCH v2 2/5] python: use avocado's "new" runner

2022-01-19 Thread John Snow
The old legacy runner no longer seems to work with output logging, so we
can't see failure logs when a test case fails. The new runner doesn't
(seem to) support Coverage.py yet, but seeing error output is a more
important feature.

Signed-off-by: John Snow 
---
 python/avocado.cfg | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/avocado.cfg b/python/avocado.cfg
index c7722e7ecd..a460420059 100644
--- a/python/avocado.cfg
+++ b/python/avocado.cfg
@@ -1,5 +1,5 @@
 [run]
-test_runner = runner
+test_runner = nrunner
 
 [simpletests]
 # Don't show stdout/stderr in the test *summary*
-- 
2.31.1




[PATCH v2 0/5] Python: minor fixes

2022-01-19 Thread John Snow
GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-fixes
CI: https://gitlab.com/jsnow/qemu/-/pipelines/451899886

Fix a couple AQMP bugs and improve some minor irritants.

V2:
 - Hack-fix a race condition inherent
   between machine.py and aqmp/legacy.py
 - Improve error reporting from QEMUMachine.launch()

John Snow (5):
  python/aqmp: Fix negotiation with pre-"oob" QEMU
  python: use avocado's "new" runner
  python/machine: raise VMLaunchFailure exception from launch()
  python: upgrade mypy to 0.780
  python/aqmp: add socket bind step to legacy.py

 python/Pipfile.lock   | 66 +--
 python/avocado.cfg|  2 +-
 python/qemu/aqmp/legacy.py|  3 ++
 python/qemu/aqmp/protocol.py  | 41 --
 python/qemu/aqmp/qmp_client.py|  4 +-
 python/qemu/machine/machine.py| 44 ---
 python/setup.cfg  |  2 +-
 tests/qemu-iotests/tests/mirror-top-perms |  3 +-
 8 files changed, 123 insertions(+), 42 deletions(-)

-- 
2.31.1





Re: Cross Architecture Kernel Modules?

2022-01-19 Thread Alex Bennée


Kenneth Adam Miller  writes:

> Hello all,
>
> I just want to pose the following problem: 
>
> There is a kernel module for a non-native architecture, say, arch 1. For 
> performance reasons, the rest of all of the software needs to run
> natively on a different arch, arch 2. Is there any way to perhaps run 
> multiple QEMU instances for the different architectures in such a way
> to minimize the cross architecture performance penalty? For example, I would 
> like the kernel module in one (non-native) QEMU instance to
> be made available, literally equivalently, in the second (native) QEMU 
> instance. Would there be any API or way to map across the QEMU
> instances so that the non native arch kernel module could be mapped to
> the native QEMU instance?

What you are describing sounds like heterogeneous system modelling which
QEMU only supports in a very limited way (all vCPUs must be the same
base architecture). You can link QEMU's together by way of shared memory
but there is no other wiring together done in that case although some
3rd party forks take this approach.

The kernel module sounds confusing - why would you have a kernel module
that wasn't the same architecture as the kernel you are running?

-- 
Alex Bennée



Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux

2022-01-19 Thread Leonardo Bras Soares Passos
On Wed, Jan 19, 2022 at 3:16 PM Daniel P. Berrangé  wrote:
>
> On Wed, Jan 19, 2022 at 03:03:29PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel,
> >
> > On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > > > Add property that allows zero-copy migration of memory pages,
> > > > and also includes a helper function migrate_use_zero_copy() to check
> > > > if it's enabled.
> > > >
> > > > No code is introduced to actually do the migration, but it allow
> > > > future implementations to enable/disable this feature.
> > > >
> > > > On non-Linux builds this parameter is compiled-out.
> > > >
> > > > Signed-off-by: Leonardo Bras 
> > > > ---
> > > >  qapi/migration.json   | 24 
> > > >  migration/migration.h |  5 +
> > > >  migration/migration.c | 32 
> > > >  migration/socket.c|  5 +
> > > >  monitor/hmp-cmds.c|  6 ++
> > > >  5 files changed, 72 insertions(+)
> > >
> > > Reviewed-by: Daniel P. Berrangé 
> >
> > Thanks!
> >
> > >
> > > >
> > > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > > index bbfd48cf0b..2e62ea6ebd 100644
> > > > --- a/qapi/migration.json
> > > > +++ b/qapi/migration.json
> > > > @@ -730,6 +730,13 @@
> > > >  #  will consume more CPU.
> > > >  #  Defaults to 1. (Since 5.0)
> > > >  #
> > > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > > +# When true, enables a zero-copy mechanism for sending 
> > > > memory
> > > > +# pages, if host supports it.
> > > > +# Requires that QEMU be permitted to use locked memory for 
> > > > guest
> > > > +# RAM pages.
> > > > +# Defaults to false. (Since 7.0)
> > > > +#
> > > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > > >  #aliases for the purpose of dirty bitmap 
> > > > migration.  Such
> > > >  #aliases may for example be the corresponding 
> > > > names on the
> > > > @@ -769,6 +776,7 @@
> > > > 'xbzrle-cache-size', 'max-postcopy-bandwidth',
> > > > 'max-cpu-throttle', 'multifd-compression',
> > > > 'multifd-zlib-level' ,'multifd-zstd-level',
> > > > +   { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
> > > > 'block-bitmap-mapping' ] }
> > > >
> > > >  ##
> > > > @@ -895,6 +903,13 @@
> > > >  #  will consume more CPU.
> > > >  #  Defaults to 1. (Since 5.0)
> > > >  #
> > > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > > +# When true, enables a zero-copy mechanism for sending 
> > > > memory
> > > > +# pages, if host supports it.
> > > > +# Requires that QEMU be permitted to use locked memory for 
> > > > guest
> > > > +# RAM pages.
> > > > +# Defaults to false. (Since 7.0)
> > > > +#
> > > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > > >  #aliases for the purpose of dirty bitmap 
> > > > migration.  Such
> > > >  #aliases may for example be the corresponding 
> > > > names on the
> > > > @@ -949,6 +964,7 @@
> > > >  '*multifd-compression': 'MultiFDCompression',
> > > >  '*multifd-zlib-level': 'uint8',
> > > >  '*multifd-zstd-level': 'uint8',
> > > > +'*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> > > >  '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > >
> > > The current zerocopy impl is for the send path.
> > >
> > > Do you expect we might get zerocopy in the receive path
> > > later ?
> >
> > It's possible, but I haven't started the implementation yet.
> >
> > >
> > > If so then either call this 'send-zero-copy', or change it
> > > from a bool to an enum taking '["send", "recv", "both"]'.
> > >
> > > I'd probably take the former and just rename it.
> > >
> >
> > Well, my rationale:
> > - I want to set zero copy sending:
> > zero-copy is set in the sending host, start migration.
> >
> > - I want to set zero copy receiving:
> > zero-copy is set in the receiving host, wait for migration.
> > (Of course host support is checked when setting the parameter).
> >
> > The problem with the current approach is trying to enable zero-copy on
> > receive before it's implemented, which will 'fail' silently .
> > A possible solution would be to add a patch to check in the receiving
> > path if zero-copy is enabled, and fail for now.
>
> That's not good because mgmt apps cannot query the QAPI schema
> to find out if this feature is supported or not.
>
> If we want to support zero copy recv, then we need an explicit
> flag for it that is distinct from zero copy send, so that apps
> can introspect whether the feature is 

Re: [PULL 00/10] s390x patches (shift instructions, MAINTAINERS, ...)

2022-01-19 Thread Peter Maydell
On Wed, 19 Jan 2022 at 08:32, Thomas Huth  wrote:
>
>  Hi!
>
> The following changes since commit 69353c332c558cead5f8081d0bb69f989fe33fa3:
>
>   Merge remote-tracking branch 
> 'remotes/konstantin/tags/qga-win32-pull-2022-01-10' into staging (2022-01-16 
> 16:32:34 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2022-01-19
>
> for you to fetch changes up to 59b9b5186e44a90088a91ed7a7493b03027e4f1f:
>
>   s390x: sigp: Reorder the SIGP STOP code (2022-01-18 15:00:57 +0100)
>
> 
> * Fix bits in one of the PMCW channel subsystem masks
> * s390x TCG shift instruction fixes
> * Re-organization for the MAINTAINERS file
> * Support for extended length of kernel command lines
> * Re-order the SIGP STOP code
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-19 Thread Kevin Wolf
Am 20.12.2021 um 13:20 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> On 17/12/2021 17:38, Emanuele Giuseppe Esposito wrote:
> > 
> > 
> > On 17/12/2021 12:04, Hanna Reitz wrote:
> > > On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:
> > > > bdrv_co_invalidate_cache is special: it is an I/O function,
> > > 
> > > I still don’t believe it is, but well.
> > > 
> > > (Yes, it is called by a test in an iothread, but I believe we’ve
> > > seen that the tests simply sometimes test things that shouldn’t be
> > > allowed.)
> > > 
> > > > but uses the block layer permission API, which is GS.
> > > > 
> > > > Because of this, we can assert that either the function is
> > > > being called with BQL held, and thus can use the permission API,
> > > > or make sure that the permission API is not used, by ensuring that
> > > > bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.
> > > > 
> > > > Signed-off-by: Emanuele Giuseppe Esposito 
> > > > ---
> > > >   block.c | 26 ++
> > > >   1 file changed, 26 insertions(+)
> > > > 
> > > > diff --git a/block.c b/block.c
> > > > index a0309f827d..805974676b 100644
> > > > --- a/block.c
> > > > +++ b/block.c
> > > > @@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
> > > >   bdrv_init();
> > > >   }
> > > > +static bool bdrv_is_active(BlockDriverState *bs)
> > > > +{
> > > > +    BdrvChild *parent;
> > > > +
> > > > +    if (bs->open_flags & BDRV_O_INACTIVE) {
> > > > +    return false;
> > > > +    }
> > > > +
> > > > +    QLIST_FOREACH(parent, &bs->parents, next_parent) {
> > > > +    if (parent->klass->parent_is_bds) {
> > > > +    BlockDriverState *parent_bs = parent->opaque;
> > > 
> > > This looks like a really bad hack to me.  We purposefully have made
> > > the parent link opaque so that a BDS cannot easily reach its
> > > parents.  All accesses should go through BdrvChildClass methods.
> > > 
> > > I also don’t understand why we need to query parents at all.  The
> > > only fact that determines whether the current BDS will have its
> > > permissions changed is whether the BDS itself is active or
> > > inactive.  Sure, we’ll invoke bdrv_co_invalidate_cache() on the
> > > parents, too, but then we could simply let the assertion fail there.
> > > 
> > > > +    if (!bdrv_is_active(parent_bs)) {
> > > > +    return false;
> > > > +    }
> > > > +    }
> > > > +    }
> > > > +
> > > > +   return true;
> > > > +}
> > > > +
> > > >   int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState
> > > > *bs, Error **errp)
> > > >   {
> > > >   BdrvChild *child, *parent;
> > > > @@ -6585,6 +6605,12 @@ int coroutine_fn
> > > > bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
> > > >   return -ENOMEDIUM;
> > > >   }
> > > > +    /*
> > > > + * No need to muck with permissions if bs is active.
> > > > + * TODO: should activation be a separate function?
> > > > + */
> > > > +    assert(qemu_in_main_thread() || bdrv_is_active(bs));
> > > > +
> > > 
> > > I don’t understand this, really.  It looks to me like “if you don’t
> > > call this in the main thread, this better be a no-op”, i.e., you
> > > must never call this function in an I/O thread if you really want to
> > > use it.  I.e. what I’d classify as a GS function.
> > > 
> > > It sounds like this is just a special case for said test, and
> > > special-casing code for tests sounds like a bad idea.
> > 
> > Ok, but trying to leave just the qemu_in_main_thread() assertion makes
> > test 307 (./check 307) fail.
> > I am actually not sure on why it fails, but I am sure it is because of
> > the assertion, since without it it passes.
> > 
> > I tried with gdb (./check -gdb 307 on one terminal and
> > gdb -iex "target remote localhost:12345"
> > in another) but it points me to this below, which I think is the ndb
> > server getting the socket closed (because on the other side it crashed),
> > and not the actual error.
> > 
> > 
> > Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
> > 0x768af54d in sendmsg () from target:/lib64/libc.so.6
> > (gdb) bt
> > #0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
> > #1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>, iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>,
> > errp=0x0)
> >      at ../io/channel-socket.c:561
> > #2  0x55c19b18 in qio_channel_writev_full_all
> > (ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, niov=niov@entry=1,
> > fds=fds@entry=0x0,
> >      nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
> > #3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1,
> > iov=0x7fffe8dffd80, ioc=) at ../io/channel.c:220
> > #4  qio_channel_write_all (ioc=,
> > buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20,
> > errp=errp@entry=0x0) at ../io/channel.c:330
> > #5  0x55c27e75 in nbd_write (errp=0x0, size=20,
> > buffer=0x7fffe8dffdd0, ioc=) at 

Re: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback

2022-01-19 Thread Leonardo Bras Soares Passos
On Tue, Jan 18, 2022 at 10:58 PM Peter Xu  wrote:
>
> On Tue, Jan 18, 2022 at 05:45:09PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Peter,
> >
> > On Thu, Jan 13, 2022 at 3:28 AM Peter Xu  wrote:
> > >
> > > On Thu, Jan 06, 2022 at 07:13:38PM -0300, Leonardo Bras wrote:
> > > > diff --git a/io/channel.c b/io/channel.c
> > > > index e8b019dc36..904855e16e 100644
> > > > --- a/io/channel.c
> > > > +++ b/io/channel.c
> > > > @@ -67,12 +67,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> > > >  }
> > > >
> > > >
> > > > -ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > > > -const struct iovec *iov,
> > > > -size_t niov,
> > > > -int *fds,
> > > > -size_t nfds,
> > > > -Error **errp)
> > > > +ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
> > > > +  const struct iovec *iov,
> > > > +  size_t niov,
> > > > +  int *fds,
> > > > +  size_t nfds,
> > > > +  int flags,
> > > > +  Error **errp)
> > > >  {
> > > >  QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > > >
> > > > @@ -83,7 +84,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > > >  return -1;
> > > >  }
> > >
> > > Should we better also check QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY here when
> > > QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is set?  Just like what we do with:
> >
> > Yes, that's correct.
> > I will also test for fds + zerocopy_flag , which should also fail here.
> >
> > >
> > > if ((fds || nfds) &&
> > > !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
> > > error_setg_errno(errp, EINVAL,
> > >  "Channel does not support file descriptor 
> > > passing");
> > > return -1;
> > > }
> > >
> > > I still think it's better to have the caller be crystal clear when to use
> > > zero_copy feature because it has implication on buffer lifetime.
> >
> > I don't disagree with that suggestion.
> >
> > But the buffer lifetime limitation is something on the socket
> > implementation, right?
> > There could be some synchronous zerocopy implementation that does not
> > require flush, and thus
> > don't require the buffer to be treated any special. Or am I missing 
> > something?
>
> Currently the flush() is required for zerocopy and not required for all the
> existing non-zerocopy use cases, that's already an API difference so the 
> caller
> needs to identify it anyway.  Then I think it's simpler we expose all of it to
> the user.

Yeah, I agree.
Since one ZC implementation uses flush, all should use them. Even if
it's a no-op.
It was just an observation that not all ZC implementations have buffer
limitations, but I agree the user should expect them anyway, since
they will exist in some implementations.

>
> Not to mention IIUC if we don't fail here, it will just fail later when the
> code will unconditionally convert the flags=ZEROCOPY into MSG_ZEROCOPY in your
> next patch:
>
> if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> sflags = MSG_ZEROCOPY;
> }
>

Correct.

> So AFAIU it'll fail anyway, either here with the cap check I mentioned, or
> later in sendmsg().
>
> IOW, I think it fails cleaner here, rather than reaching sendmsg().

I Agree.

>
> >
> > >
> > > I might have commented similar things before, but I have missed a few 
> > > versions
> > > so I could also have missed some previous discussions..
> > >
> >
> > That's all great suggestions Peter!  Thanks for that!
> >
> > Some of the previous suggestions may have been missed because a lot of
> > code moved.
> > Sorry about that.
>
> Not a problem at all, I just want to make sure my question still makes
> sense. :)

Thanks for asking them!

>
> --
> Peter Xu
>

Best regards,
Leo




Re: [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-01-19 Thread Leonardo Bras Soares Passos
Hello Peter,

On Thu, Jan 13, 2022 at 4:15 AM Peter Xu  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:42PM -0300, Leonardo Bras wrote:
> > Implement zero copy on nocomp_send_write(), by making use of QIOChannel
> > writev + flags & flush interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync 
> > from
> > the setup and the completion, so a flush_zero_copy() can be called
> > after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
>
> Leo - could you remind me (since I remembered we've discussed something
> similar) on why we can't simply do the sync() unconditionally for zero copy?

On previous implementations, it would get stuck on the setup, since it
was waiting for any movement on the error queue before even starting
the sending process.
At the time we would sync only at 'complete', so it would only need to
run once. Running every iteration seemed a waste at the time.

Then, after some talk with Juan, it was decided to sync after each
migration, so on 'complete' it was unnecessary.
But sure, now it would add just 2 syncs in the whole migration, and
those should not even get to the syscall due to queued/sent counters.

>
> I remember why we need the sync(), but I can't remember what's the matter if 
> we
> simply sync() too during setup and complete of migration.
>



> Another trivial nit here:
>
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f, bool sync)
>
> I'd name it "bool full" or anything not called "sync", because the function
> already has a name that contains "sync", then it's weird to sync(sync==false).
>

Yeah, I agree.
But if we will flush every time, then there is no need for such a parameter :).

> The rest looks good to me.  Thanks.
>

Thanks!

> --
> Peter Xu
>

Best regards,
Leo




Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux

2022-01-19 Thread Daniel P . Berrangé
On Wed, Jan 19, 2022 at 03:03:29PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé  
> wrote:
> >
> > On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > > Add property that allows zero-copy migration of memory pages,
> > > and also includes a helper function migrate_use_zero_copy() to check
> > > if it's enabled.
> > >
> > > No code is introduced to actually do the migration, but it allow
> > > future implementations to enable/disable this feature.
> > >
> > > On non-Linux builds this parameter is compiled-out.
> > >
> > > Signed-off-by: Leonardo Bras 
> > > ---
> > >  qapi/migration.json   | 24 
> > >  migration/migration.h |  5 +
> > >  migration/migration.c | 32 
> > >  migration/socket.c|  5 +
> > >  monitor/hmp-cmds.c|  6 ++
> > >  5 files changed, 72 insertions(+)
> >
> > Reviewed-by: Daniel P. Berrangé 
> 
> Thanks!
> 
> >
> > >
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index bbfd48cf0b..2e62ea6ebd 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -730,6 +730,13 @@
> > >  #  will consume more CPU.
> > >  #  Defaults to 1. (Since 5.0)
> > >  #
> > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > +# When true, enables a zero-copy mechanism for sending memory
> > > +# pages, if host supports it.
> > > +# Requires that QEMU be permitted to use locked memory for 
> > > guest
> > > +# RAM pages.
> > > +# Defaults to false. (Since 7.0)
> > > +#
> > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > >  #aliases for the purpose of dirty bitmap 
> > > migration.  Such
> > >  #aliases may for example be the corresponding 
> > > names on the
> > > @@ -769,6 +776,7 @@
> > > 'xbzrle-cache-size', 'max-postcopy-bandwidth',
> > > 'max-cpu-throttle', 'multifd-compression',
> > > 'multifd-zlib-level' ,'multifd-zstd-level',
> > > +   { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
> > > 'block-bitmap-mapping' ] }
> > >
> > >  ##
> > > @@ -895,6 +903,13 @@
> > >  #  will consume more CPU.
> > >  #  Defaults to 1. (Since 5.0)
> > >  #
> > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > +# When true, enables a zero-copy mechanism for sending memory
> > > +# pages, if host supports it.
> > > +# Requires that QEMU be permitted to use locked memory for 
> > > guest
> > > +# RAM pages.
> > > +# Defaults to false. (Since 7.0)
> > > +#
> > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > >  #aliases for the purpose of dirty bitmap 
> > > migration.  Such
> > >  #aliases may for example be the corresponding 
> > > names on the
> > > @@ -949,6 +964,7 @@
> > >  '*multifd-compression': 'MultiFDCompression',
> > >  '*multifd-zlib-level': 'uint8',
> > >  '*multifd-zstd-level': 'uint8',
> > > +'*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> > >  '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >
> > The current zerocopy impl is for the send path.
> >
> > Do you expect we might get zerocopy in the receive path
> > later ?
> 
> It's possible, but I haven't started the implementation yet.
> 
> >
> > If so then either call this 'send-zero-copy', or change it
> > from a bool to an enum taking '["send", "recv", "both"]'.
> >
> > I'd probably take the former and just rename it.
> >
> 
> Well, my rationale:
> - I want to set zero copy sending:
> zero-copy is set in the sending host, start migration.
> 
> - I want to set zero copy receiving:
> zero-copy is set in the receiving host, wait for migration.
> (Of course host support is checked when setting the parameter).
> 
> The problem with the current approach is trying to enable zero-copy on
> receive before it's implemented, which will 'fail' silently .
> A possible solution would be to add a patch to check in the receiving
> path if zero-copy is enabled, and fail for now.

That's not good because mgmt apps cannot query the QAPI schema
to find out if this feature is supported or not.

If we want to support zero copy recv, then we need an explicit
flag for it that is distinct from zero copy send, so that apps
can introspect whether the feature is implemented in QEMU or
not.

Distinct named bool flags feels better, and also makes it clear
to anyone not familiar with the impl that the current  code is
strictly send only.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-

Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper

2022-01-19 Thread Leonardo Bras Soares Passos
Hello Daniel,

On Thu, Jan 13, 2022 at 10:11 AM Daniel P. Berrangé  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
> > A lot of places check parameters.tls_creds in order to evaluate if TLS is
> > in use, and sometimes call migrate_get_current() just for that test.
> >
> > Add new helper function migrate_use_tls() in order to simplify testing
> > for TLS usage.
> >
> > Signed-off-by: Leonardo Bras 
> > Reviewed-by: Juan Quintela 
> > ---
> >  migration/migration.h | 1 +
> >  migration/channel.c   | 6 +++---
> >  migration/migration.c | 9 +
> >  migration/multifd.c   | 5 +
> >  4 files changed, 14 insertions(+), 7 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé 
>

Thanks!

> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>

Best regards,
Leo




Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper

2022-01-19 Thread Leonardo Bras Soares Passos
Hello Peter,

On Thu, Jan 13, 2022 at 4:02 AM Peter Xu  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
> >  void migration_channel_process_incoming(QIOChannel *ioc)
> >  {
> > -MigrationState *s = migrate_get_current();
> >  Error *local_err = NULL;
> >
> >  trace_migration_set_incoming_channel(
> >  ioc, object_get_typename(OBJECT(ioc)));
> >
> > -if (s->parameters.tls_creds &&
> > -*s->parameters.tls_creds &&
> > +if (migrate_use_tls() &&
> >  !object_dynamic_cast(OBJECT(ioc),
> >   TYPE_QIO_CHANNEL_TLS)) {
> > +MigrationState *s = migrate_get_current();
> > +
>
> Trivial nit: I'd rather keep the line there; as the movement offers nothing,
> imho..

The idea to move the 's' to inside the if  block is to make it clear
it's only used in this case.

But if you think it's better to keep it at the beginning of the
function, sure, I can change that.
Just let me know.

>
> >  migration_tls_channel_process_incoming(s, ioc, &local_err);
> >  } else {
> >  migration_ioc_register_yank(ioc);
>
> Reviewed-by: Peter Xu 
>

Thanks!

> --
> Peter Xu
>

Best regards,
Leo




Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux

2022-01-19 Thread Leonardo Bras Soares Passos
Hello Daniel,

On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > Add property that allows zero-copy migration of memory pages,
> > and also includes a helper function migrate_use_zero_copy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allow
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
> >
> > Signed-off-by: Leonardo Bras 
> > ---
> >  qapi/migration.json   | 24 
> >  migration/migration.h |  5 +
> >  migration/migration.c | 32 
> >  migration/socket.c|  5 +
> >  monitor/hmp-cmds.c|  6 ++
> >  5 files changed, 72 insertions(+)
>
> Reviewed-by: Daniel P. Berrangé 

Thanks!

>
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index bbfd48cf0b..2e62ea6ebd 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -730,6 +730,13 @@
> >  #  will consume more CPU.
> >  #  Defaults to 1. (Since 5.0)
> >  #
> > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > +# When true, enables a zero-copy mechanism for sending memory
> > +# pages, if host supports it.
> > +# Requires that QEMU be permitted to use locked memory for 
> > guest
> > +# RAM pages.
> > +# Defaults to false. (Since 7.0)
> > +#
> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >  #aliases for the purpose of dirty bitmap 
> > migration.  Such
> >  #aliases may for example be the corresponding 
> > names on the
> > @@ -769,6 +776,7 @@
> > 'xbzrle-cache-size', 'max-postcopy-bandwidth',
> > 'max-cpu-throttle', 'multifd-compression',
> > 'multifd-zlib-level' ,'multifd-zstd-level',
> > +   { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
> > 'block-bitmap-mapping' ] }
> >
> >  ##
> > @@ -895,6 +903,13 @@
> >  #  will consume more CPU.
> >  #  Defaults to 1. (Since 5.0)
> >  #
> > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > +# When true, enables a zero-copy mechanism for sending memory
> > +# pages, if host supports it.
> > +# Requires that QEMU be permitted to use locked memory for 
> > guest
> > +# RAM pages.
> > +# Defaults to false. (Since 7.0)
> > +#
> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >  #aliases for the purpose of dirty bitmap 
> > migration.  Such
> >  #aliases may for example be the corresponding 
> > names on the
> > @@ -949,6 +964,7 @@
> >  '*multifd-compression': 'MultiFDCompression',
> >  '*multifd-zlib-level': 'uint8',
> >  '*multifd-zstd-level': 'uint8',
> > +'*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> >  '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>
> The current zerocopy impl is for the send path.
>
> Do you expect we might get zerocopy in the receive path
> later ?

It's possible, but I haven't started the implementation yet.

>
> If so then either call this 'send-zero-copy', or change it
> from a bool to an enum taking '["send", "recv", "both"]'.
>
> I'd probably take the former and just rename it.
>

Well, my rationale:
- I want to set zero copy sending:
zero-copy is set in the sending host, start migration.

- I want to set zero copy receiving:
zero-copy is set in the receiving host, wait for migration.
(Of course host support is checked when setting the parameter).

The problem with the current approach is trying to enable zero-copy on
receive before it's implemented, which will 'fail' silently .
A possible solution would be to add a patch to check in the receiving
path if zero-copy is enabled, and fail for now.

What do you think?

Best regards,
Leo

>
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>




Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux

2022-01-19 Thread Leonardo Bras Soares Passos
Hello Peter,

On Thu, Jan 13, 2022 at 4:00 AM Peter Xu  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > Add property that allows zero-copy migration of memory pages,
> > and also includes a helper function migrate_use_zero_copy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allow
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
>
> I feel sad every time seeing a new parameter needs to be mostly duplicated 3
> times in the code. :(
>
> > diff --git a/migration/socket.c b/migration/socket.c
> > index 05705a32d8..f7a77aafd3 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task,
> >  } else {
> >  trace_migration_socket_outgoing_connected(data->hostname);
> >  }
> > +
> > +if (migrate_use_zero_copy()) {
> > +error_setg(&err, "Zero copy not available in migration");
> > +}
>
> I got confused the 1st time looking at it..  I think this is not strongly
> needed, but that's okay:

The idea is to avoid some future issues on testing migration while bisecting.

>
> Reviewed-by: Peter Xu 

Thanks Peter!

>
> Thanks,
>
> --
> Peter Xu
>




Re: [PATCH 0/3] meson: Don't pass 'method' to dependency()

2022-01-19 Thread Daniel P . Berrangé
On Wed, Jan 19, 2022 at 06:17:57PM +0100, Andrea Bolognani wrote:
> See [1] for recent discussion about libgcrypt specifically, which the
> first patch is about.
> 
> After writing that one, I realized that there is no point in
> explicitly passing 'method' to dependency() because Meson will do the
> right thing by default - hence the next two patches.

This whole series is effectively reverting

  commit 1a94933fcc3d641bda9988244cde61769baae2e5
  Author: Paolo Bonzini 
  Date:   Mon Aug 31 06:27:00 2020 -0400

meson: use pkg-config method to find dependencies

We do not need to ask cmake for the dependencies, so just use the
pkg-config mechanism.  Keep "auto" for SDL so that it tries using
sdl-config too.

The documentation is adjusted to use SDL2_image as the example,
rather than SDL which does not use the "pkg-config" method.

Signed-off-by: Paolo Bonzini 

which IIRC was done to get rid of meson's confusing/misleading
attempts to probe for things via cmake when the pkg-config file
is not present.

> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg01224.html
> 
> Andrea Bolognani (3):
>   meson: Don't force use of libgcrypt-config
>   meson: Don't pass 'method' to dependency()
>   docs: Don't recommend passing 'method' to dependency()
> 
>  docs/devel/build-system.rst |  1 -
>  meson.build | 75 +++--
>  tcg/meson.build |  2 +-
>  3 files changed, 31 insertions(+), 47 deletions(-)
> 
> -- 
> 2.34.1
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 0/3] meson: Don't pass 'method' to dependency()

2022-01-19 Thread Andrea Bolognani
On Wed, Jan 19, 2022 at 05:38:38PM +, Daniel P. Berrangé wrote:
> On Wed, Jan 19, 2022 at 06:17:57PM +0100, Andrea Bolognani wrote:
> > See [1] for recent discussion about libgcrypt specifically, which the
> > first patch is about.
> >
> > After writing that one, I realized that there is no point in
> > explicitly passing 'method' to dependency() because Meson will do the
> > right thing by default - hence the next two patches.
>
> This whole series is effectively reverting
>
>   commit 1a94933fcc3d641bda9988244cde61769baae2e5
>   Author: Paolo Bonzini 
>   Date:   Mon Aug 31 06:27:00 2020 -0400
>
> meson: use pkg-config method to find dependencies
>
> We do not need to ask cmake for the dependencies, so just use the
> pkg-config mechanism.  Keep "auto" for SDL so that it tries using
> sdl-config too.
>
> The documentation is adjusted to use SDL2_image as the example,
> rather than SDL which does not use the "pkg-config" method.
>
> Signed-off-by: Paolo Bonzini 
>
> which IIRC was done to get rid of mesons' confusing/misleading
> attempts to probe for things via cmake when the pkg-config file
> is not present.

I guess I stand corrected on Meson doing the right thing by default
then :)

The first patch should still makes sense though: libgcrypt is like
SDL in that Meson implements special handling for it, and we should
allow the pkg-config file to be used if available.

-- 
Andrea Bolognani / Red Hat / Virtualization




Re: [PATCH 1/3] migration/migration.c: Add missed default error handler for migration state

2022-01-19 Thread Dr. David Alan Gilbert
* Zhang Chen (chen.zh...@intel.com) wrote:
> In the migration_completion() no other status is expected, for
> example MIGRATION_STATUS_CANCELLING, MIGRATION_STATUS_CANCELLED, etc.
> 
> Signed-off-by: Zhang Chen 

I think you're right;

Reviewed-by: Dr. David Alan Gilbert 

 however, did you actually see this trigger in a different state?

Dave
> ---
>  migration/migration.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 0652165610..2afa77da03 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3205,7 +3205,7 @@ static void migration_completion(MigrationState *s)
>  qemu_mutex_unlock_iothread();
>  
>  trace_migration_completion_postcopy_end_after_complete();
> -} else if (s->state == MIGRATION_STATUS_CANCELLING) {
> +} else {
>  goto fail;
>  }
>  
> -- 
> 2.25.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-19 Thread Leonardo Bras Soares Passos
On Thu, Jan 13, 2022 at 10:06 AM Daniel P. Berrangé  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync()
> >
> > qio_channel_socket_flush() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error 
> > occurs.
> >
> > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid 
> > copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happen, a newer version of the buffer may be sent 
> > instead.
> > - If this is a problem, it's recommended to call qio_channel_flush() before 
> > freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, 
> > and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may 
> > require
> > a larger amount of locked memory than usually available to non-root user.
> > - If the required amount of locked memory is not available, writev_zero_copy
> > will return an error, which can abort an operation like migration,
> > - Because of this, when an user code wants to add zero copy as a feature, it
> > requires a mechanism to disable it, so it can still be accessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras 
> > ---
> >  include/io/channel-socket.h |   2 +
> >  io/channel-socket.c | 107 ++--
> >  2 files changed, 105 insertions(+), 4 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé 
>

Thanks!

>
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>




Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-19 Thread Hanna Reitz

On 19.01.22 16:57, Hanna Reitz wrote:

On 23.12.21 18:11, Hanna Reitz wrote:

On 20.12.21 13:20, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 17:38, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 12:04, Hanna Reitz wrote:

On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:

bdrv_co_invalidate_cache is special: it is an I/O function,


I still don’t believe it is, but well.

(Yes, it is called by a test in an iothread, but I believe we’ve 
seen that the tests simply sometimes test things that shouldn’t be 
allowed.)



but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
  bdrv_init();
  }
+static bool bdrv_is_active(BlockDriverState *bs)
+{
+    BdrvChild *parent;
+
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+    return false;
+    }
+
+    QLIST_FOREACH(parent, &bs->parents, next_parent) {
+    if (parent->klass->parent_is_bds) {
+    BlockDriverState *parent_bs = parent->opaque;


This looks like a really bad hack to me.  We purposefully have 
made the parent link opaque so that a BDS cannot easily reach its 
parents.  All accesses should go through BdrvChildClass methods.


I also don’t understand why we need to query parents at all. The 
only fact that determines whether the current BDS will have its 
permissions changed is whether the BDS itself is active or 
inactive.  Sure, we’ll invoke bdrv_co_invalidate_cache() on the 
parents, too, but then we could simply let the assertion fail there.



+    if (!bdrv_is_active(parent_bs)) {
+    return false;
+    }
+    }
+    }
+
+   return true;
+}
+
  int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, 
Error **errp)

  {
  BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)

  return -ENOMEDIUM;
  }
+    /*
+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+    assert(qemu_in_main_thread() || bdrv_is_active(bs));
+


I don’t understand this, really.  It looks to me like “if you 
don’t call this in the main thread, this better be a no-op”, i.e., 
you must never call this function in an I/O thread if you really 
want to use it.  I.e. what I’d classify as a GS function.


It sounds like this is just a special case for said test, and 
special-casing code for tests sounds like a bad idea.


Ok, but trying to leave just the qemu_in_main_thread() assertion 
makes test 307 (./check 307) fail.
I am actually not sure on why it fails, but I am sure it is because 
of the assertion, since without it it passes.


I tried with gdb (./check -gdb 307 on one terminal and
gdb -iex "target remote localhost:12345"
in another) but it points me to this below, which I think is the 
ndb server getting the socket closed (because on the other side it 
crashed), and not the actual error.



Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
0x768af54d in sendmsg () from target:/lib64/libc.so.6
(gdb) bt
#0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
#1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>, iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>, 
errp=0x0)

 at ../io/channel-socket.c:561
#2  0x55c19b18 in qio_channel_writev_full_all 
(ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, 
niov=niov@entry=1, fds=fds@entry=0x0,

 nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
#3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1, 
iov=0x7fffe8dffd80, ioc=<optimized out>) at ../io/channel.c:220
#4  qio_channel_write_all (ioc=<optimized out>, 
buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20, 
errp=errp@entry=0x0) at ../io/channel.c:330
#5  0x55c27e75 in nbd_write (errp=0x0, size=20, 
buffer=0x7fffe8dffdd0, ioc=<optimized out>) at 
../nbd/nbd-internal.h:71
#6  nbd_negotiate_send_rep_len (client=client@entry=0x56f60930, 
type=type@entry=1, len=len@entry=0, errp=errp@entry=0x0) at 
../nbd/server.c:203
#7  0x55c29db1 in nbd_negotiate_send_rep (errp=0x0, type=1, 
client=0x56f60930) at ../nbd/server.c:211

--Type <return> for more, q to quit, c to continue without paging--
#8  nbd_negotiate_options (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1224
#9  nbd_negotiate (errp=0x7fffe8dffe88, client=<optimized out>) at 
../nbd/server.c:1340
#10 nbd_co_client_start (opaque=<optimized out>) at 
../nbd/server.c:2715
#11 0x55d70423 in coroutine_trampoline (i0=<optimized out>, 
i1=<optimized out>) at 

Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-19 Thread Leonardo Bras Soares Passos
Hello Daniel,

On Thu, Jan 13, 2022 at 7:42 AM Daniel P. Berrangé  wrote:
>
> On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > On Thu, Jan 13, 2022 at 10:06:14AM +, Daniel P. Berrangé wrote:
> > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > @@ -558,15 +575,26 @@ static ssize_t 
> > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > >  memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > >  }
> > > > >
> > > > > +if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > +sflags = MSG_ZEROCOPY;
> > > > > +}
> > > > > +
> > > > >   retry:
> > > > > -ret = sendmsg(sioc->fd, , 0);
> > > > > +ret = sendmsg(sioc->fd, , sflags);
> > > > >  if (ret <= 0) {
> > > > > -if (errno == EAGAIN) {
> > > > > +switch (errno) {
> > > > > +case EAGAIN:
> > > > >  return QIO_CHANNEL_ERR_BLOCK;
> > > > > -}
> > > > > -if (errno == EINTR) {
> > > > > +case EINTR:
> > > > >  goto retry;
> > > > > +case ENOBUFS:
> > > > > +if (sflags & MSG_ZEROCOPY) {
> > > > > +error_setg_errno(errp, errno,
> > > > > + "Process can't lock enough memory 
> > > > > for using MSG_ZEROCOPY");
> > > > > +return -1;
> > > > > +}
> > > >
> > > > I have no idea whether it'll make a real differnece, but - should we 
> > > > better add
> > > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > >
> > > > Reviewed-by: Peter Xu 
> > > >
> > > > I also wonder whether you hit ENOBUFS in any of the environments.  On 
> > > > Fedora
> > > > here it's by default unlimited, but just curious when we should keep an 
> > > > eye.
> > >
> > > Fedora doesn't allow unlimited locked memory by default
> > >
> > > $ grep "locked memory" /proc/self/limits
> > > Max locked memory     65536                65536                bytes
> > >
> > > And  regardless of Fedora defaults, libvirt will set a limit
> > > for the guest. It will only be unlimited if requiring certain
> > > things like VFIO.
> >
> > Thanks, I obviously checked up the wrong host..
> >
> > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > there be a limit?  Is it linear to the number of sockets/channels?
>
> IIRC we decided it would be limited by the socket send buffer size, rather
> than guest RAM, because writes will block once the send buffer is full.
>
> This has a default global setting, with per-socket override. On one box I
> have it is 200 Kb. With multifd you'll need  "num-sockets * send buffer".

Oh, I was not aware there is a send buffer size (or maybe I am unable
to recall).
That sure makes things much easier.

>
> > It'll be better if we can fail at enabling the feature when we detected that
> > the specified locked memory limit may not be suffice.

sure

>
> Checking this value against available locked memory though will always
> have an error margin because other things in QEMU can use locked memory
> too

We can get the current limit (before zerocopy) as an error margin:
req_lock_mem = num-sockets * send buffer + BASE_LOCKED

Where BASE_LOCKED is the current libvirt value, or so on.

What do you think?

Best regards,
Leo




Re: [PATCH 2/2] tests: Refresh lcitool submodule

2022-01-19 Thread Daniel P . Berrangé
On Mon, Jan 10, 2022 at 01:46:38PM +0100, Philippe Mathieu-Daudé wrote:
> Refresh lcitool submodule and the generated files by running:
> 
>   $ make lcitool-refresh
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/docker/dockerfiles/alpine.docker| 3 ++-
>  tests/docker/dockerfiles/centos8.docker   | 3 +--
>  tests/docker/dockerfiles/fedora.docker| 3 +--
>  tests/docker/dockerfiles/opensuse-leap.docker | 2 +-
>  tests/docker/dockerfiles/ubuntu1804.docker| 2 +-
>  tests/docker/dockerfiles/ubuntu2004.docker| 2 +-
>  tests/lcitool/libvirt-ci  | 2 +-
>  7 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/tests/docker/dockerfiles/alpine.docker 
> b/tests/docker/dockerfiles/alpine.docker
> index eb2251c81c8..9d7f74fc51e 100644
> --- a/tests/docker/dockerfiles/alpine.docker
> +++ b/tests/docker/dockerfiles/alpine.docker
> @@ -1,6 +1,6 @@
>  # THIS FILE WAS AUTO-GENERATED
>  #
> -#  $ lcitool dockerfile alpine-edge qemu
> +#  $ lcitool dockerfile --layers all alpine-edge qemu
>  #
>  # https://gitlab.com/libvirt/libvirt-ci
>  
> @@ -109,6 +109,7 @@ RUN apk update && \
>  zlib-dev \
>  zlib-static \
>  zstd-dev && \
> +apk list | sort > /packages.txt && \
>  mkdir -p /usr/libexec/ccache-wrappers && \
>  ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/c++ && \
>  ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/cc && \
> diff --git a/tests/docker/dockerfiles/centos8.docker 
> b/tests/docker/dockerfiles/centos8.docker
> index cbb909d02b3..fde6a036263 100644
> --- a/tests/docker/dockerfiles/centos8.docker
> +++ b/tests/docker/dockerfiles/centos8.docker
> @@ -1,6 +1,6 @@
>  # THIS FILE WAS AUTO-GENERATED
>  #
> -#  $ lcitool dockerfile centos-8 qemu
> +#  $ lcitool dockerfile --layers all centos-8 qemu
>  #
>  # https://gitlab.com/libvirt/libvirt-ci
>  
> @@ -69,7 +69,6 @@ RUN dnf update -y && \
>  libssh-devel \
>  libtasn1-devel \
>  libubsan \
> -libudev-devel \
>  liburing-devel \
>  libusbx-devel \
>  libxml2-devel \
> diff --git a/tests/docker/dockerfiles/fedora.docker 
> b/tests/docker/dockerfiles/fedora.docker
> index 60207f3da38..82f504e40d6 100644
> --- a/tests/docker/dockerfiles/fedora.docker
> +++ b/tests/docker/dockerfiles/fedora.docker
> @@ -1,6 +1,6 @@
>  # THIS FILE WAS AUTO-GENERATED
>  #
> -#  $ lcitool dockerfile fedora-35 qemu
> +#  $ lcitool dockerfile --layers all fedora-35 qemu
>  #
>  # https://gitlab.com/libvirt/libvirt-ci
>  
> @@ -77,7 +77,6 @@ exec "$@"' > /usr/bin/nosync && \
>  libssh-devel \
>  libtasn1-devel \
>  libubsan \
> -libudev-devel \
>  liburing-devel \
>  libusbx-devel \
>  libxml2-devel \
> diff --git a/tests/docker/dockerfiles/opensuse-leap.docker 
> b/tests/docker/dockerfiles/opensuse-leap.docker
> index f57d8cfb299..30e7038148a 100644
> --- a/tests/docker/dockerfiles/opensuse-leap.docker
> +++ b/tests/docker/dockerfiles/opensuse-leap.docker
> @@ -1,6 +1,6 @@
>  # THIS FILE WAS AUTO-GENERATED
>  #
> -#  $ lcitool dockerfile opensuse-leap-152 qemu
> +#  $ lcitool dockerfile --layers all opensuse-leap-152 qemu
>  #
>  # https://gitlab.com/libvirt/libvirt-ci
>  
> diff --git a/tests/docker/dockerfiles/ubuntu1804.docker 
> b/tests/docker/dockerfiles/ubuntu1804.docker
> index 0ffa3c4d4b5..4ea272d143b 100644
> --- a/tests/docker/dockerfiles/ubuntu1804.docker
> +++ b/tests/docker/dockerfiles/ubuntu1804.docker
> @@ -1,6 +1,6 @@
>  # THIS FILE WAS AUTO-GENERATED
>  #
> -#  $ lcitool dockerfile ubuntu-1804 qemu
> +#  $ lcitool dockerfile --layers all ubuntu-1804 qemu
>  #
>  # https://gitlab.com/libvirt/libvirt-ci
>  
> diff --git a/tests/docker/dockerfiles/ubuntu2004.docker 
> b/tests/docker/dockerfiles/ubuntu2004.docker
> index 4e562dfdcd3..90988b2bc53 100644
> --- a/tests/docker/dockerfiles/ubuntu2004.docker
> +++ b/tests/docker/dockerfiles/ubuntu2004.docker
> @@ -1,6 +1,6 @@
>  # THIS FILE WAS AUTO-GENERATED
>  #
> -#  $ lcitool dockerfile ubuntu-2004 qemu
> +#  $ lcitool dockerfile --layers all ubuntu-2004 qemu
>  #
>  # https://gitlab.com/libvirt/libvirt-ci
>  
> diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
> index 29cec2153b9..8f48e54238d 160000
> --- a/tests/lcitool/libvirt-ci
> +++ b/tests/lcitool/libvirt-ci
> @@ -1 +1 @@
> -Subproject commit 29cec2153b9a4dbb2e66f1cbc9866a4eff519cfd
> +Subproject commit 8f48e54238d28d7a427a541d6dbe56432e3c4660

If you update that further you'll get the commit you added to
support macos-12.


regardless

  Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 0/2] tests: Refresh lcitool submodule

2022-01-19 Thread Philippe Mathieu-Daudé via
ping?

(the testing/next queue this patch was depending on is now merged).

On 1/10/22 13:46, Philippe Mathieu-Daudé wrote:
> Refresh lcitool to latest.
> 
> Based on Alex's testing/next
> Based-on: <20220105135009.1584676-1-alex.ben...@linaro.org>
> 
> Philippe Mathieu-Daudé (2):
>   MAINTAINERS: Cover lcitool submodule with build test / automation
>   tests: Refresh lcitool submodule
> 
>  MAINTAINERS   | 1 +
>  tests/docker/dockerfiles/alpine.docker| 3 ++-
>  tests/docker/dockerfiles/centos8.docker   | 3 +--
>  tests/docker/dockerfiles/fedora.docker| 3 +--
>  tests/docker/dockerfiles/opensuse-leap.docker | 2 +-
>  tests/docker/dockerfiles/ubuntu1804.docker| 2 +-
>  tests/docker/dockerfiles/ubuntu2004.docker| 2 +-
>  tests/lcitool/libvirt-ci  | 2 +-
>  8 files changed, 9 insertions(+), 9 deletions(-)
> 



Re: [PATCH 1/2] MAINTAINERS: Cover lcitool submodule with build test / automation

2022-01-19 Thread Daniel P . Berrangé
On Mon, Jan 10, 2022 at 01:46:37PM +0100, Philippe Mathieu-Daudé wrote:
> lcitool is used by build test / automation, we want maintainers
> to get notified if the submodule is updated.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 2/2] tests: Refresh lcitool submodule

2022-01-19 Thread Philippe Mathieu-Daudé via
On 1/19/22 18:33, Daniel P. Berrangé wrote:
> On Mon, Jan 10, 2022 at 01:46:38PM +0100, Philippe Mathieu-Daudé wrote:
>> Refresh lcitool submodule and the generated files by running:
>>
>>   $ make lcitool-refresh
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  tests/docker/dockerfiles/alpine.docker| 3 ++-
>>  tests/docker/dockerfiles/centos8.docker   | 3 +--
>>  tests/docker/dockerfiles/fedora.docker| 3 +--
>>  tests/docker/dockerfiles/opensuse-leap.docker | 2 +-
>>  tests/docker/dockerfiles/ubuntu1804.docker| 2 +-
>>  tests/docker/dockerfiles/ubuntu2004.docker| 2 +-
>>  tests/lcitool/libvirt-ci  | 2 +-
>>  7 files changed, 8 insertions(+), 9 deletions(-)

>> diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
>> index 29cec2153b9..8f48e54238d 160000
>> --- a/tests/lcitool/libvirt-ci
>> +++ b/tests/lcitool/libvirt-ci
>> @@ -1 +1 @@
>> -Subproject commit 29cec2153b9a4dbb2e66f1cbc9866a4eff519cfd
>> +Subproject commit 8f48e54238d28d7a427a541d6dbe56432e3c4660
> 
> If you update that further you'll get the commit you added to
> support macos-12.

This in done in the "Add support for macOS 12 build on Cirrus-CI"
patch:

https://lore.kernel.org/qemu-devel/20220110131001.614319-8-f4...@amsat.org/

> regardless
> 
>   Reviewed-by: Daniel P. Berrangé 

Thanks!



[PATCH 1/3] meson: Don't force use of libgcrypt-config

2022-01-19 Thread Andrea Bolognani
libgcrypt 1.9.0 (released in 2021-01) ships with a proper
pkg-config file, which Meson's libgcrypt detection code can use
if available.

Passing 'config-tool' as 'method' when calling dependency(),
however, forces Meson to ignore the pkg-config file and always
use libgcrypt-config instead.

Signed-off-by: Andrea Bolognani 
---
 meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/meson.build b/meson.build
index 762d7cee85..bc17ba67fd 100644
--- a/meson.build
+++ b/meson.build
@@ -1036,7 +1036,6 @@ endif
 if not gnutls_crypto.found()
   if (not get_option('gcrypt').auto() or have_system) and not 
get_option('nettle').enabled()
 gcrypt = dependency('libgcrypt', version: '>=1.8',
-method: 'config-tool',
 required: get_option('gcrypt'),
 kwargs: static_kwargs)
 # Debian has removed -lgpg-error from libgcrypt-config
-- 
2.34.1




Re: [RFC PATCH v3 1/3] hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR

2022-01-19 Thread Philippe Mathieu-Daudé via
Hi Peter,

Can you take this single patch via your arm tree?

Thanks,

Phil.

On 12/15/21 19:24, Philippe Mathieu-Daudé wrote:
> Quoting Peter Maydell:
> 
>  "These MEMTX_* aren't from the memory transaction
>   API functions; they're just being used by gicd_readl() and
>   friends as a way to indicate a success/failure so that the
>   actual MemoryRegionOps read/write fns like gicv3_dist_read()
>   can log a guest error."
> 
> We are going to introduce more MemTxResult bits, so it is
> safer to check for !MEMTX_OK rather than MEMTX_ERROR.
> 
> Reviewed-by: Peter Xu 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Peter Maydell 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/intc/arm_gicv3_redist.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
> index c8ff3eca085..99b11ca5eee 100644
> --- a/hw/intc/arm_gicv3_redist.c
> +++ b/hw/intc/arm_gicv3_redist.c
> @@ -462,7 +462,7 @@ MemTxResult gicv3_redist_read(void *opaque, hwaddr 
> offset, uint64_t *data,
>  break;
>  }
>  
> -if (r == MEMTX_ERROR) {
> +if (r != MEMTX_OK) {
>  qemu_log_mask(LOG_GUEST_ERROR,
>"%s: invalid guest read at offset " TARGET_FMT_plx
>" size %u\n", __func__, offset, size);
> @@ -521,7 +521,7 @@ MemTxResult gicv3_redist_write(void *opaque, hwaddr 
> offset, uint64_t data,
>  break;
>  }
>  
> -if (r == MEMTX_ERROR) {
> +if (r != MEMTX_OK) {
>  qemu_log_mask(LOG_GUEST_ERROR,
>"%s: invalid guest write at offset " TARGET_FMT_plx
>" size %u\n", __func__, offset, size);



[PATCH 3/3] docs: Don't recommend passing 'method' to dependency()

2022-01-19 Thread Andrea Bolognani
Meson will do the right thing by default.

Signed-off-by: Andrea Bolognani 
---
 docs/devel/build-system.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst
index 431caba7aa..fcdc0cd187 100644
--- a/docs/devel/build-system.rst
+++ b/docs/devel/build-system.rst
@@ -316,7 +316,6 @@ dependency will be used::
   sdl_image = not_found
   if not get_option('sdl_image').auto() or have_system
 sdl_image = dependency('SDL2_image', required: get_option('sdl_image'),
-   method: 'pkg-config',
static: enable_static)
   endif
 
-- 
2.34.1




Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-19 Thread Leonardo Bras Soares Passos
On Thu, Jan 13, 2022 at 9:12 AM Peter Xu  wrote:
>
> On Thu, Jan 13, 2022 at 10:42:39AM +, Daniel P. Berrangé wrote:
> > On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > > On Thu, Jan 13, 2022 at 10:06:14AM +, Daniel P. Berrangé wrote:
> > > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > > @@ -558,15 +575,26 @@ static ssize_t 
> > > > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > > > >  memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > > >  }
> > > > > >
> > > > > > +if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > > +sflags = MSG_ZEROCOPY;
> > > > > > +}
> > > > > > +
> > > > > >   retry:
> > > > > > -ret = sendmsg(sioc->fd, , 0);
> > > > > > +ret = sendmsg(sioc->fd, , sflags);
> > > > > >  if (ret <= 0) {
> > > > > > -if (errno == EAGAIN) {
> > > > > > +switch (errno) {
> > > > > > +case EAGAIN:
> > > > > >  return QIO_CHANNEL_ERR_BLOCK;
> > > > > > -}
> > > > > > -if (errno == EINTR) {
> > > > > > +case EINTR:
> > > > > >  goto retry;
> > > > > > +case ENOBUFS:
> > > > > > +if (sflags & MSG_ZEROCOPY) {
> > > > > > +error_setg_errno(errp, errno,
> > > > > > + "Process can't lock enough memory 
> > > > > > for using MSG_ZEROCOPY");
> > > > > > +return -1;
> > > > > > +}
> > > > >
> > > > > I have no idea whether it'll make a real differnece, but - should we 
> > > > > better add
> > > > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > > >
> > > > > Reviewed-by: Peter Xu 
> > > > >
> > > > > I also wonder whether you hit ENOBUFS in any of the environments.  On 
> > > > > Fedora
> > > > > here it's by default unlimited, but just curious when we should keep 
> > > > > an eye.
> > > >
> > > > Fedora doesn't allow unlimited locked memory by default
> > > >
> > > > $ grep "locked memory" /proc/self/limits
> > > > Max locked memory     65536                65536                
> > > > bytes
> > > >
> > > > And  regardless of Fedora defaults, libvirt will set a limit
> > > > for the guest. It will only be unlimited if requiring certain
> > > > things like VFIO.
> > >
> > > Thanks, I obviously checked up the wrong host..
> > >
> > > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > > there be a limit?  Is it linear to the number of sockets/channels?
> >
> > IIRC we decided it would be limited by the socket send buffer size, rather
> > than guest RAM, because writes will block once the send buffer is full.
> >
> > This has a default global setting, with per-socket override. On one box I
> > have it is 200 Kb. With multifd you'll need  "num-sockets * send buffer".
> >
> > > It'll be better if we can fail at enabling the feature when we detected 
> > > that
> > > the specified locked memory limit may not be suffice.
> >
> > Checking this value against available locked memory though will always
> > have an error margin because other things in QEMU can use locked memory
> > too
>
> We could always still allow false positive in this check, so we can fail if we
> have a solid clue to know we'll fail later (e.g. minimum locked_vm needed is
> already less than total).  But no strong opinion; we could have this merged 
> and
> see whether that's needed in real life.  Thanks,

I agree, this is a good approach.

Leo




Re: [PATCH] configure: Use -mlittle-endian instead of -mlittle for ppc64

2022-01-19 Thread Philippe Mathieu-Daudé via
On 1/19/22 12:13, Paolo Bonzini wrote:
> On 1/19/22 10:56, mreza...@redhat.com wrote:
>> From: Miroslav Rezanina 
>>
>> GCC options -mlittle and -mlittle-endian are equivalent on ppc64
>> architecture. However, clang supports only -mlittle-endian option.
>>
>> Use longer form in configure to properly support both GCC and clang
>> compiler.
>>
>> Signed-off-by: Miroslav Rezanina 
>> ---
>>   configure | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index e1a31fb332..e63c78ca67 100755
>> --- a/configure
>> +++ b/configure
>> @@ -658,7 +658,7 @@ case "$cpu" in
>>   CPU_CFLAGS="-m64 -mbig" ;;
>>     ppc64le)
>>   cpu="ppc64"
>> -    CPU_CFLAGS="-m64 -mlittle" ;;
>> +    CPU_CFLAGS="-m64 -mlittle-endian" ;;
>>       s390)
>>   CPU_CFLAGS="-m31" ;;
> 
> Does -mbig need the same one line above?

Yes, and tests/tcg/configure.sh needs that change too.



Re: [PATCH] drop libxml2 checks since libxml is not actually used (for parallels)

2022-01-19 Thread Stefan Hajnoczi
On Wed, Jan 19, 2022 at 12:04:23PM +0300, Michael Tokarev wrote:
> [This is a trivial patch, but due to the number of files it touches
> I'm not using qemu-trivial@ route.]
> 
> For a long time, we assumed that libxml2 is neecessary for parallels
> block format support (block/parallels*). However, this format actually
> does not use libxml. Since this is the only user of libxml2 in while
> qemu tree, we can drop all libxml2 checks and dependencies too.
> 
> It is even more: --enable-parallels configure option was the only
> option which was silently ignored when it's (fake) dependency
> (libxml2) isn't installed.
> 
> Drop all mentions of libxml2.
> 
> Signed-off-by: Michael Tokarev 
> ---
>  .cirrus.yml | 1 -
>  .gitlab-ci.d/cirrus/freebsd-12.vars | 2 +-
>  .gitlab-ci.d/cirrus/freebsd-13.vars | 2 +-
>  .gitlab-ci.d/cirrus/macos-11.vars   | 2 +-
>  .gitlab-ci.d/windows.yml| 2 --
>  block/meson.build   | 3 +--
>  meson.build | 6 --
>  meson_options.txt   | 2 --
>  scripts/ci/org.centos/stream/8/x86_64/configure | 1 -
>  scripts/coverity-scan/coverity-scan.docker  | 1 -
>  scripts/coverity-scan/run-coverity-scan | 2 +-
>  tests/docker/dockerfiles/alpine.docker  | 1 -
>  tests/docker/dockerfiles/centos8.docker | 1 -
>  tests/docker/dockerfiles/fedora.docker  | 1 -
>  tests/docker/dockerfiles/opensuse-leap.docker   | 1 -
>  tests/docker/dockerfiles/ubuntu1804.docker  | 1 -
>  tests/docker/dockerfiles/ubuntu2004.docker  | 1 -
>  17 files changed, 5 insertions(+), 25 deletions(-)

Nice cleanup.

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 1/3] qsd: Add pre-init argument parsing pass

2022-01-19 Thread Kevin Wolf
Am 19.01.2022 um 14:44 hat Hanna Reitz geschrieben:
> On 19.01.22 13:58, Markus Armbruster wrote:
> > Hanna Reitz  writes:
> > 
> > > We want to add a --daemonize argument to QSD's command line.
> > Why?
> 
> OK, s/we/I/.  I find it useful, because without such an option, I need to
> have whoever invokes QSD loop until the PID file exists, before I can be
> sure that all exports are set up.  I make use of it in the test cases added
> in patch 3.
> 
> I suppose this could be worked around with a special character device, like
> so:
> 
> ```
ncat --listen -U /tmp/qsd-done.sock &
ncat_pid=$!
> 
> qemu-storage-daemon \
>     ... \
>     --chardev socket,id=signal_done,path=/tmp/qsd-done.sock \
>     --monitor signal_done \
>     --pidfile /tmp/qsd.pid &
> 
> wait $ncat_pid
> ```
> 
> But having to use an extra tool for this is unergonomic.  I mean, if there’s
> no other way...

The other point is that the system emulator has it, qemu-nbd has it,
so certainly qsd should have it as well. Not the least because it should
be able to replace qemu-nbd (at least for the purpose of exporting NBD,
not necessarily for attaching it to the host).

> > >This will
> > > require forking the process before we do any complex initialization
> > > steps, like setting up the block layer or QMP.  Therefore, we must scan
> > > the command line for it long before our current process_options() call.
> > Can you explain in a bit more detail why early forking is required?
> > 
> > I have a strong dislike for parsing more than once...
> 
> Because I don’t want to set up QMP and block devices, and then fork the
> process into two.  That sounds like there’d be a lot of stuff to think
> about, which just isn’t necessary, because we don’t need to set up any
> of this in the parent.

Here we can compare again: Both the system emulator and qemu-nbd behave
the same, they fork before they do anything interesting.

The difference is that they still parse the command line only once
because they don't immediately create things, but just store the options
and later process them in their own magic order. I'd much rather parse
the command line twice than copy that behaviour.

Kevin

> For example, if I set up a monitor on a Unix socket (server=true),
> processing is delayed until the client connects.  Say I put --daemonize
> afterwards.  I connect to the waiting server socket, the child is forked
> off, and then... I’m not sure what happens, actually.  Do I have a
> connection with both the parent and the child listening?  I know that in
> practice, what happens is that once the parent exits, the connection is
> closed, and I get a “qemu: qemu_thread_join: Invalid argument” warning/error
> on the QSD side.
> 
> There’s a lot of stuff to think about if you allow forking after other
> options, so it should be done first.  We could just require the user to put
> --daemonize before all other options, and so have a single pass; but still,
> before options are even parsed, we have already for example called
> bdrv_init(), init_qmp_commands(), qemu_init_main_loop().  These are all
> things that the parent of a daemonizing process doesn’t need to do, and
> where I’d simply rather not think about what impact it has if we fork
> afterwards.
> 
> Hanna
> 
> > > Instead of adding custom new code to do so, just reuse process_options()
> > > and give it a @pre_init_pass argument to distinguish the two passes.  I
> > > believe there are some other switches but --daemonize that deserve
> > > parsing in the first pass:
> > > 
> > > - --help and --version are supposed to only print some text and then
> > >immediately exit (so any initialization we do would be for naught).
> > >This changes behavior, because now "--blockdev inv-drv --help" will
> > >print a help text instead of complaining about the --blockdev
> > >argument.
> > >Note that this is similar in behavior to other tools, though: "--help"
> > >is generally immediately acted upon when finding it in the argument
> > >list, potentially before other arguments (even ones before it) are
> > >acted on.  For example, "ls /does-not-exist --help" prints a help text
> > >and does not complain about ENOENT.
> > > 
> > > - --pidfile does not need initialization, and is already exempted from
> > >the sequential order that process_options() claims to strictly follow
> > >(the PID file is only created after all arguments are processed, not
> > >at the time the --pidfile argument appears), so it makes sense to
> > >include it in the same category as --daemonize.
> > > 
> > > - Invalid arguments should always be reported as soon as possible.  (The
> > >same caveat with --help applies: That means that "--blockdev inv-drv
> > >--inv-arg" will now complain about --inv-arg, not inv-drv.)
> > > 
> > > Note that we could decide to check only for --daemonize in the first
> > > pass, and defer --help, 

[PATCH 0/3] meson: Don't pass 'method' to dependency()

2022-01-19 Thread Andrea Bolognani
See [1] for recent discussion about libgcrypt specifically, which the
first patch is about.

After writing that one, I realized that there is no point in
explicitly passing 'method' to dependency() because Meson will do the
right thing by default - hence the next two patches.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg01224.html

Andrea Bolognani (3):
  meson: Don't force use of libgcrypt-config
  meson: Don't pass 'method' to dependency()
  docs: Don't recommend passing 'method' to dependency()

 docs/devel/build-system.rst |  1 -
 meson.build | 75 +++--
 tcg/meson.build |  2 +-
 3 files changed, 31 insertions(+), 47 deletions(-)

-- 
2.34.1





[PATCH RESEND v2] qapi/block: Cosmetic change in BlockExportType schema

2022-01-19 Thread Philippe Mathieu-Daudé via
From: Philippe Mathieu-Daudé 

Fix long line introduced in commit bb01ea73110 ("qapi/block:
Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER").

Suggested-by: Markus Armbruster 
Acked-by: Markus Armbruster 
Signed-off-by: Philippe Mathieu-Daudé 
---
Trying another git option to see even if my email From:
is "Philippe Mathieu-Daudé via ",
the patch contains the correct From: and can be applied...
---
 qapi/block-export.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index f9ce79a974b..f183522d0d2 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -278,7 +278,8 @@
 ##
 { 'enum': 'BlockExportType',
   'data': [ 'nbd',
-{ 'name': 'vhost-user-blk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
+{ 'name': 'vhost-user-blk',
+  'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
 { 'name': 'fuse', 'if': 'CONFIG_FUSE' } ] }
 
 ##
-- 
2.34.1




Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-19 Thread Leonardo Bras Soares Passos
Hello Peter,

On Thu, Jan 13, 2022 at 3:48 AM Peter Xu  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >  memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >  }
> >
> > +if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +sflags = MSG_ZEROCOPY;
> > +}
> > +
> >   retry:
> > -ret = sendmsg(sioc->fd, &msg, 0);
> > +ret = sendmsg(sioc->fd, &msg, sflags);
> >  if (ret <= 0) {
> > -if (errno == EAGAIN) {
> > +switch (errno) {
> > +case EAGAIN:
> >  return QIO_CHANNEL_ERR_BLOCK;
> > -}
> > -if (errno == EINTR) {
> > +case EINTR:
> >  goto retry;
> > +case ENOBUFS:
> > +if (sflags & MSG_ZEROCOPY) {
> > +error_setg_errno(errp, errno,
> > + "Process can't lock enough memory for 
> > using MSG_ZEROCOPY");
> > +return -1;
> > +}
>
> I have no idea whether it'll make a real difference, but - should we better 
> add
> a "break" here?

Here I followed the standard of the EAGAIN error, that's why I just returned -1.

IIUC A break here would cause the errp to be re-set to the default
message, after the switch.
Another option would be to add a 'default' clause, and move the
default error msg there, and return the -1
after the switch.

In the end I thought the current way was simpler, but it's no issue
to change if you think the 'default' idea would be better.

>  If you agree and with that fixed, feel free to add:
>
> Reviewed-by: Peter Xu 
>

Thanks!

> I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> here it's by default unlimited, but just curious when we should keep an eye.

It's unlimited if you run as root IIRC.

>
> Thanks,
>
> --
> Peter Xu
>




[PATCH 2/3] meson: Don't pass 'method' to dependency()

2022-01-19 Thread Andrea Bolognani
Meson will do the right thing by default.

Signed-off-by: Andrea Bolognani 
---
 meson.build | 74 -
 tcg/meson.build |  2 +-
 2 files changed, 31 insertions(+), 45 deletions(-)

diff --git a/meson.build b/meson.build
index bc17ba67fd..b807ad9fbb 100644
--- a/meson.build
+++ b/meson.build
@@ -427,13 +427,13 @@ if 'CONFIG_GIO' in config_host
 endif
 lttng = not_found
 if 'ust' in get_option('trace_backends')
-  lttng = dependency('lttng-ust', required: true, method: 'pkg-config',
+  lttng = dependency('lttng-ust', required: true,
  kwargs: static_kwargs)
 endif
 pixman = not_found
 if have_system or have_tools
   pixman = dependency('pixman-1', required: have_system, version:'>=0.21.8',
-  method: 'pkg-config', kwargs: static_kwargs)
+  kwargs: static_kwargs)
 endif
 zlib = dependency('zlib', required: true, kwargs: static_kwargs)
 
@@ -446,18 +446,18 @@ endif
 linux_io_uring = not_found
 if not get_option('linux_io_uring').auto() or have_block
   linux_io_uring = dependency('liburing', required: 
get_option('linux_io_uring'),
-  method: 'pkg-config', kwargs: static_kwargs)
+  kwargs: static_kwargs)
 endif
 libxml2 = not_found
 if not get_option('libxml2').auto() or have_block
   libxml2 = dependency('libxml-2.0', required: get_option('libxml2'),
-   method: 'pkg-config', kwargs: static_kwargs)
+   kwargs: static_kwargs)
 endif
 libnfs = not_found
 if not get_option('libnfs').auto() or have_block
   libnfs = dependency('libnfs', version: '>=1.9.3',
   required: get_option('libnfs'),
-  method: 'pkg-config', kwargs: static_kwargs)
+  kwargs: static_kwargs)
 endif
 
 libattr_test = '''
@@ -505,7 +505,7 @@ seccomp = not_found
 if not get_option('seccomp').auto() or have_system or have_tools
   seccomp = dependency('libseccomp', version: '>=2.3.0',
required: get_option('seccomp'),
-   method: 'pkg-config', kwargs: static_kwargs)
+   kwargs: static_kwargs)
 endif
 
 libcap_ng = not_found
@@ -533,7 +533,7 @@ if get_option('xkbcommon').auto() and not have_system and 
not have_tools
   xkbcommon = not_found
 else
   xkbcommon = dependency('xkbcommon', required: get_option('xkbcommon'),
- method: 'pkg-config', kwargs: static_kwargs)
+ kwargs: static_kwargs)
 endif
 
 vde = not_found
@@ -562,30 +562,30 @@ endif
 pulse = not_found
 if not get_option('pa').auto() or (targetos == 'linux' and have_system)
   pulse = dependency('libpulse', required: get_option('pa'),
- method: 'pkg-config', kwargs: static_kwargs)
+ kwargs: static_kwargs)
 endif
 alsa = not_found
 if not get_option('alsa').auto() or (targetos == 'linux' and have_system)
   alsa = dependency('alsa', required: get_option('alsa'),
-method: 'pkg-config', kwargs: static_kwargs)
+kwargs: static_kwargs)
 endif
 jack = not_found
 if not get_option('jack').auto() or have_system
   jack = dependency('jack', required: get_option('jack'),
-method: 'pkg-config', kwargs: static_kwargs)
+kwargs: static_kwargs)
 endif
 
 spice_protocol = not_found
 if not get_option('spice_protocol').auto() or have_system
   spice_protocol = dependency('spice-protocol', version: '>=0.12.3',
   required: get_option('spice_protocol'),
-  method: 'pkg-config', kwargs: static_kwargs)
+  kwargs: static_kwargs)
 endif
 spice = not_found
 if not get_option('spice').auto() or have_system
   spice = dependency('spice-server', version: '>=0.12.5',
  required: get_option('spice'),
- method: 'pkg-config', kwargs: static_kwargs)
+ kwargs: static_kwargs)
 endif
 spice_headers = spice.partial_dependency(compile_args: true, includes: true)
 
@@ -595,32 +595,29 @@ libiscsi = not_found
 if not get_option('libiscsi').auto() or have_block
   libiscsi = dependency('libiscsi', version: '>=1.9.0',
  required: get_option('libiscsi'),
- method: 'pkg-config', kwargs: static_kwargs)
+ kwargs: static_kwargs)
 endif
 zstd = not_found
 if not get_option('zstd').auto() or have_block
   zstd = dependency('libzstd', version: '>=1.4.0',
 required: get_option('zstd'),
-method: 'pkg-config', kwargs: static_kwargs)
+kwargs: static_kwargs)
 endif
 virgl = not_found
 if not get_option('virglrenderer').auto() or have_system
   virgl = dependency('virglrenderer',
- method: 'pkg-config',
  required: get_option('virglrenderer'),
  

Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-19 Thread Leonardo Bras Soares Passos
On Thu, Jan 13, 2022 at 7:34 AM Peter Xu  wrote:
>
> On Thu, Jan 13, 2022 at 10:06:14AM +, Daniel P. Berrangé wrote:
> > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > @@ -558,15 +575,26 @@ static ssize_t 
> > > > qio_channel_socket_writev(QIOChannel *ioc,
> > > >  memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > >  }
> > > >
> > > > +if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > +sflags = MSG_ZEROCOPY;
> > > > +}
> > > > +
> > > >   retry:
> > > > -ret = sendmsg(sioc->fd, , 0);
> > > > +ret = sendmsg(sioc->fd, , sflags);
> > > >  if (ret <= 0) {
> > > > -if (errno == EAGAIN) {
> > > > +switch (errno) {
> > > > +case EAGAIN:
> > > >  return QIO_CHANNEL_ERR_BLOCK;
> > > > -}
> > > > -if (errno == EINTR) {
> > > > +case EINTR:
> > > >  goto retry;
> > > > +case ENOBUFS:
> > > > +if (sflags & MSG_ZEROCOPY) {
> > > > +error_setg_errno(errp, errno,
> > > > + "Process can't lock enough memory for 
> > > > using MSG_ZEROCOPY");
> > > > +return -1;
> > > > +}
> > >
> > > I have no idea whether it'll make a real differnece, but - should we 
> > > better add
> > > a "break" here?  If you agree and with that fixed, feel free to add:
> > >
> > > Reviewed-by: Peter Xu 
> > >
> > > I also wonder whether you hit ENOBUFS in any of the environments.  On 
> > > Fedora
> > > here it's by default unlimited, but just curious when we should keep an 
> > > eye.
> >
> > Fedora doesn't allow unlimited locked memory by default
> >
> > $ grep "locked memory" /proc/self/limits
> > Max locked memory     65536     65536     bytes
> >
> > And  regardless of Fedora defaults, libvirt will set a limit
> > for the guest. It will only be unlimited if requiring certain
> > things like VFIO.
>
> Thanks, I obviously checked up the wrong host..
>
> Leo, do you know how much locked memory will be needed by zero copy?  Will
> there be a limit?  Is it linear to the number of sockets/channels?

It depends on the number of channels, of course, but there are
influencing factors, like network bandwidth & usage, and cpu speed &
usage, network queue size, VM pagesize and so on.

A simple example:
If the cpu is free/fast, but there are other applications using the
network, we may enqueue a lot of stuff for sending, and end up needing
a lot of locked memory.

I don't think it's easy to calculate a good reference value for locked
memory here.

>
> It'll be better if we can fail at enabling the feature when we detected that
> the specified locked memory limit may not be suffice.

I agree it's a good idea. But having this reference value calculated
is not much simple, IIUC.

>
> --
> Peter Xu
>




[PATCH RESEND] qapi/block: Cosmetic change in BlockExportType schema

2022-01-19 Thread Philippe Mathieu-Daudé via
Fix long line introduced in commit bb01ea73110 ("qapi/block:
Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER").

Suggested-by: Markus Armbruster 
Acked-by: Markus Armbruster 
Signed-off-by: Philippe Mathieu-Daudé 
---
Trying another git option to see even if my email From:
is "Philippe Mathieu-Daudé via ",
the patch contains the correct From: and can be applied...
---
 qapi/block-export.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index f9ce79a974b..f183522d0d2 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -278,7 +278,8 @@
 ##
 { 'enum': 'BlockExportType',
   'data': [ 'nbd',
-{ 'name': 'vhost-user-blk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
+{ 'name': 'vhost-user-blk',
+  'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
 { 'name': 'fuse', 'if': 'CONFIG_FUSE' } ] }
 
 ##
-- 
2.34.1




Re: [RFC 05/10] vdpa-dev: implement the realize interface

2022-01-19 Thread Stefan Hajnoczi
On Mon, Jan 17, 2022 at 12:34:50PM +, Longpeng (Mike, Cloud Infrastructure 
Service Product Dept.) wrote:
> 
> 
> > -Original Message-
> > From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> > Sent: Thursday, January 6, 2022 7:34 PM
> > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > 
> > Cc: m...@redhat.com; jasow...@redhat.com; sgarz...@redhat.com;
> > coh...@redhat.com; pbonz...@redhat.com; Gonglei (Arei)
> > ; Yechuan ; Huangzhichao
> > ; qemu-devel@nongnu.org
> > Subject: Re: [RFC 05/10] vdpa-dev: implement the realize interface
> > 
> > On Thu, Jan 06, 2022 at 03:02:37AM +, Longpeng (Mike, Cloud 
> > Infrastructure
> > Service Product Dept.) wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> > > > Sent: Wednesday, January 5, 2022 6:18 PM
> > > > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > > 
> > > > Cc: m...@redhat.com; jasow...@redhat.com; sgarz...@redhat.com;
> > > > coh...@redhat.com; pbonz...@redhat.com; Gonglei (Arei)
> > > > ; Yechuan ; Huangzhichao
> > > > ; qemu-devel@nongnu.org
> > > > Subject: Re: [RFC 05/10] vdpa-dev: implement the realize interface
> > > >
> > > > On Wed, Jan 05, 2022 at 08:58:55AM +0800, Longpeng(Mike) wrote:
> > > > > From: Longpeng 
> > > > >
> > > > > Implements the .realize interface.
> > > > >
> > > > > Signed-off-by: Longpeng 
> > > > > ---
> > > > >  hw/virtio/vdpa-dev.c | 114 
> > > > > +++
> > > > >  include/hw/virtio/vdpa-dev.h |   8 +++
> > > > >  2 files changed, 122 insertions(+)
> > > > >
> > > > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > > > index 790117fb3b..2d534d837a 100644
> > > > > --- a/hw/virtio/vdpa-dev.c
> > > > > +++ b/hw/virtio/vdpa-dev.c
> > > > > @@ -15,9 +15,122 @@
> > > > >  #include "sysemu/sysemu.h"
> > > > >  #include "sysemu/runstate.h"
> > > > >
> > > > > +static void
> > > > > +vhost_vdpa_device_dummy_handle_output(VirtIODevice *vdev, VirtQueue
> > *vq)
> > > > > +{
> > > > > +/* Nothing to do */
> > > > > +}
> > > > > +
> > > > > +static int vdpa_dev_get_info_by_fd(int fd, uint64_t cmd, Error 
> > > > > **errp)
> > > >
> > > > This looks similar to the helper function in a previous patch but this
> > > > time the return value type is int instead of uint32_t. Please make the
> > > > types consistent.
> > > >
> > >
> > > OK.
> > >
> > > > > +{
> > > > > +int val;
> > > > > +
> > > > > +if (ioctl(fd, cmd, &val) < 0) {
> > > > > +error_setg(errp, "vhost-vdpa-device: cmd 0x%lx failed: %s",
> > > > > +   cmd, strerror(errno));
> > > > > +return -1;
> > > > > +}
> > > > > +
> > > > > +return val;
> > > > > +}
> > > > > +
> > > > > +static inline int vdpa_dev_get_queue_size(int fd, Error **errp)
> > > > > +{
> > > > > +return vdpa_dev_get_info_by_fd(fd, VHOST_VDPA_GET_VRING_NUM, 
> > > > > errp);
> > > > > +}
> > > > > +
> > > > > +static inline int vdpa_dev_get_vqs_num(int fd, Error **errp)
> > > > > +{
> > > > > +return vdpa_dev_get_info_by_fd(fd, VHOST_VDPA_GET_VQS_NUM, errp);
> > > > > +}
> > > > > +
> > > > > +static inline int vdpa_dev_get_config_size(int fd, Error **errp)
> > > > > +{
> > > > > +return vdpa_dev_get_info_by_fd(fd, VHOST_VDPA_GET_CONFIG_SIZE,
> > errp);
> > > > > +}
> > > > > +
> > > > >  static void vhost_vdpa_device_realize(DeviceState *dev, Error **errp)
> > > > >  {
> > > > > +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > > +VhostVdpaDevice *s = VHOST_VDPA_DEVICE(vdev);
> > > > > +uint32_t device_id;
> > > > > +int max_queue_size;
> > > > > +int fd;
> > > > > +int i, ret;
> > > > > +
> > > > > +fd = qemu_open(s->vdpa_dev, O_RDWR, errp);
> > > > > +if (fd == -1) {
> > > > > +return;
> > > > > +}
> > > > > +s->vdpa.device_fd = fd;
> > > >
> > > > This is the field I suggest exposing as a QOM property so it can be set
> > > > from the proxy object (e.g. when the PCI proxy opens the vdpa device
> > > > before our .realize() function is called).
> > > >
> > >
> > > OK.
> > >
> > > > > +
> > > > > +max_queue_size = vdpa_dev_get_queue_size(fd, errp);
> > > > > +if (*errp) {
> > > > > +goto out;
> > > > > +}
> > > > > +
> > > > > +if (s->queue_size > max_queue_size) {
> > > > > +error_setg(errp, "vhost-vdpa-device: invalid queue_size: %d
> > > > (max:%d)",
> > > > > +   s->queue_size, max_queue_size);
> > > > > +goto out;
> > > > > +} else if (!s->queue_size) {
> > > > > +s->queue_size = max_queue_size;
> > > > > +}
> > > > > +
> > > > > +ret = vdpa_dev_get_vqs_num(fd, errp);
> > > > > +if (*errp) {
> > > > > +goto out;
> > > > > +}
> > > > > +
> > > > > +s->dev.nvqs = ret;
> > > >
> > > > There is no input validation because we trust the kernel vDPA return
> > > > values. That seems okay for now but if there is a vhost-user version of

Re: [PATCH RFC 06/15] migration: Move temp page setup and cleanup into separate functions

2022-01-19 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> Temp pages will need to grow if we want to have multiple channels for 
> postcopy,
> because each channel will need its own temp page to cache huge page data.
> 
> Before doing that, cleanup the related code.  No functional change intended.
> 
> Since at it, touch up the errno handling a little bit on the setup side.
> 
> Signed-off-by: Peter Xu 


Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/postcopy-ram.c | 82 +---
>  1 file changed, 51 insertions(+), 31 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 2176ed68a5..e662dd05cc 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -523,6 +523,19 @@ int postcopy_ram_incoming_init(MigrationIncomingState 
> *mis)
>  return 0;
>  }
>  
> +static void postcopy_temp_pages_cleanup(MigrationIncomingState *mis)
> +{
> +if (mis->postcopy_tmp_page) {
> +munmap(mis->postcopy_tmp_page, mis->largest_page_size);
> +mis->postcopy_tmp_page = NULL;
> +}
> +
> +if (mis->postcopy_tmp_zero_page) {
> +munmap(mis->postcopy_tmp_zero_page, mis->largest_page_size);
> +mis->postcopy_tmp_zero_page = NULL;
> +}
> +}
> +
>  /*
>   * At the end of a migration where postcopy_ram_incoming_init was called.
>   */
> @@ -564,14 +577,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState 
> *mis)
>  }
>  }
>  
> -if (mis->postcopy_tmp_page) {
> -munmap(mis->postcopy_tmp_page, mis->largest_page_size);
> -mis->postcopy_tmp_page = NULL;
> -}
> -if (mis->postcopy_tmp_zero_page) {
> -munmap(mis->postcopy_tmp_zero_page, mis->largest_page_size);
> -mis->postcopy_tmp_zero_page = NULL;
> -}
> +postcopy_temp_pages_cleanup(mis);
> +
>  trace_postcopy_ram_incoming_cleanup_blocktime(
>  get_postcopy_total_blocktime());
>  
> @@ -1082,6 +1089,40 @@ retry:
>  return NULL;
>  }
>  
> +static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
> +{
> +int err;
> +
> +mis->postcopy_tmp_page = mmap(NULL, mis->largest_page_size,
> +  PROT_READ | PROT_WRITE,
> +  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +if (mis->postcopy_tmp_page == MAP_FAILED) {
> +err = errno;
> +mis->postcopy_tmp_page = NULL;
> +error_report("%s: Failed to map postcopy_tmp_page %s",
> + __func__, strerror(err));
> +return -err;
> +}
> +
> +/*
> + * Map large zero page when kernel can't use UFFDIO_ZEROPAGE for 
> hugepages
> + */
> +mis->postcopy_tmp_zero_page = mmap(NULL, mis->largest_page_size,
> +   PROT_READ | PROT_WRITE,
> +   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +if (mis->postcopy_tmp_zero_page == MAP_FAILED) {
> +err = errno;
> +mis->postcopy_tmp_zero_page = NULL;
> +error_report("%s: Failed to map large zero page %s",
> + __func__, strerror(err));
> +return -err;
> +}
> +
> +memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size);
> +
> +return 0;
> +}
> +
>  int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>  {
>  /* Open the fd for the kernel to give us userfaults */
> @@ -1122,32 +1163,11 @@ int 
> postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>  return -1;
>  }
>  
> -mis->postcopy_tmp_page = mmap(NULL, mis->largest_page_size,
> -  PROT_READ | PROT_WRITE, MAP_PRIVATE |
> -  MAP_ANONYMOUS, -1, 0);
> -if (mis->postcopy_tmp_page == MAP_FAILED) {
> -mis->postcopy_tmp_page = NULL;
> -error_report("%s: Failed to map postcopy_tmp_page %s",
> - __func__, strerror(errno));
> +if (postcopy_temp_pages_setup(mis)) {
> +/* Error dumped in the sub-function */
>  return -1;
>  }
>  
> -/*
> - * Map large zero page when kernel can't use UFFDIO_ZEROPAGE for 
> hugepages
> - */
> -mis->postcopy_tmp_zero_page = mmap(NULL, mis->largest_page_size,
> -   PROT_READ | PROT_WRITE,
> -   MAP_PRIVATE | MAP_ANONYMOUS,
> -   -1, 0);
> -if (mis->postcopy_tmp_zero_page == MAP_FAILED) {
> -int e = errno;
> -mis->postcopy_tmp_zero_page = NULL;
> -error_report("%s: Failed to map large zero page %s",
> - __func__, strerror(e));
> -return -e;
> -}
> -memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size);
> -
>  trace_postcopy_ram_enable_notify();
>  
>  return 0;
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH RFC 05/15] migration: Simplify unqueue_page()

2022-01-19 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> This patch simplifies unqueue_page() on both sides of it (itself, and caller).
> 
> Firstly, due to the fact that right after unqueue_page() returned true, we'll
> definitely send a huge page (see ram_save_huge_page() call - it will _never_
> exit before finish sending that huge page), so unqueue_page() does not need to
> jump in small page size if huge page is enabled on the ramblock.  IOW, it's
> destined that only the 1st 4K page will be valid, when unqueue the 2nd+ time
> we'll notice the whole huge page has already been sent anyway.  Switching to
> operating on huge page reduces a lot of the loops of redundant unqueue_page().
> 
> Meanwhile, drop the dirty check.  It's not helpful to call test_bit() every
> time to jump over clean pages, as ram_save_host_page() has already done so,
> while in a faster way (see commit ba1b7c812c ("migration/ram: Optimize
> ram_save_host_page()", 2021-05-13)).  So that's not necessary too.
> 
> Drop the two tracepoints along the way - based on above analysis it's very
> possible that no one is really using it..
> 
> Signed-off-by: Peter Xu 

Yes, OK

Reviewed-by: Dr. David Alan Gilbert 

Although:
  a) You might like to keep a trace in get_queued_page just to see
what's getting unqueued
  b) I think originally it was a useful diagnostic to find out when we
were getting a lot of queue requests for pages that were already sent.

Dave


> ---
>  migration/ram.c| 34 --
>  migration/trace-events |  2 --
>  2 files changed, 8 insertions(+), 28 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index dc6ba041fa..0df15ff663 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1541,6 +1541,7 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t 
> *offset)
>  {
>  struct RAMSrcPageRequest *entry;
>  RAMBlock *block = NULL;
> +size_t page_size;
>  
>  if (!postcopy_has_request(rs)) {
>  return NULL;
> @@ -1557,10 +1558,13 @@ static RAMBlock *unqueue_page(RAMState *rs, 
> ram_addr_t *offset)
>  entry = QSIMPLEQ_FIRST(&rs->src_page_requests);
>  block = entry->rb;
>  *offset = entry->offset;
> +page_size = qemu_ram_pagesize(block);
> +/* Each page request should only be multiple page size of the ramblock */
> +assert((entry->len % page_size) == 0);
>  
> -if (entry->len > TARGET_PAGE_SIZE) {
> -entry->len -= TARGET_PAGE_SIZE;
> -entry->offset += TARGET_PAGE_SIZE;
> +if (entry->len > page_size) {
> +entry->len -= page_size;
> +entry->offset += page_size;
>  } else {
>  memory_region_unref(block->mr);
>  QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> @@ -1942,30 +1946,8 @@ static bool get_queued_page(RAMState *rs, 
> PageSearchStatus *pss)
>  {
>  RAMBlock  *block;
>  ram_addr_t offset;
> -bool dirty;
>  
> -do {
> -block = unqueue_page(rs, &offset);
> -/*
> - * We're sending this page, and since it's postcopy nothing else
> - * will dirty it, and we must make sure it doesn't get sent again
> - * even if this queue request was received after the background
> - * search already sent it.
> - */
> -if (block) {
> -unsigned long page;
> -
> -page = offset >> TARGET_PAGE_BITS;
> -dirty = test_bit(page, block->bmap);
> -if (!dirty) {
> -trace_get_queued_page_not_dirty(block->idstr, 
> (uint64_t)offset,
> -page);
> -} else {
> -trace_get_queued_page(block->idstr, (uint64_t)offset, page);
> -}
> -}
> -
> -} while (block && !dirty);
> +block = unqueue_page(rs, &offset);
>  
>  if (!block) {
>  /*
> diff --git a/migration/trace-events b/migration/trace-events
> index e165687af2..3a9b3567ae 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -85,8 +85,6 @@ put_qlist_end(const char *field_name, const char 
> *vmsd_name) "%s(%s)"
>  qemu_file_fclose(void) ""
>  
>  # ram.c
> -get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long 
> page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> -get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, 
> unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
>  migration_bitmap_sync_start(void) ""
>  migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
>  migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, 
> unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v8 18/23] hw/intc: Add RISC-V AIA APLIC device emulation

2022-01-19 Thread Frank Chang
On Wed, Jan 19, 2022 at 11:27 PM Anup Patel  wrote:

> From: Anup Patel 
>
> The RISC-V AIA (Advanced Interrupt Architecture) defines a new
> interrupt controller for wired interrupts called APLIC (Advanced
> Platform Level Interrupt Controller). The APLIC is capable of
> forwarding wired interrupts to RISC-V HARTs directly or as MSIs
> (Message Signaled Interrupts).
>
> This patch adds device emulation for RISC-V AIA APLIC.
>
> Signed-off-by: Anup Patel 
> Signed-off-by: Anup Patel 
> Reviewed-by: Frank Chang 
> ---
>  hw/intc/Kconfig   |   3 +
>  hw/intc/meson.build   |   1 +
>  hw/intc/riscv_aplic.c | 975 ++
>  include/hw/intc/riscv_aplic.h |  79 +++
>  4 files changed, 1058 insertions(+)
>  create mode 100644 hw/intc/riscv_aplic.c
>  create mode 100644 include/hw/intc/riscv_aplic.h
>
> diff --git a/hw/intc/Kconfig b/hw/intc/Kconfig
> index 010ded7eae..528e77b4a6 100644
> --- a/hw/intc/Kconfig
> +++ b/hw/intc/Kconfig
> @@ -70,6 +70,9 @@ config LOONGSON_LIOINTC
>  config RISCV_ACLINT
>  bool
>
> +config RISCV_APLIC
> +bool
> +
>  config SIFIVE_PLIC
>  bool
>
> diff --git a/hw/intc/meson.build b/hw/intc/meson.build
> index 70080bc161..7466024402 100644
> --- a/hw/intc/meson.build
> +++ b/hw/intc/meson.build
> @@ -50,6 +50,7 @@ specific_ss.add(when: 'CONFIG_S390_FLIC', if_true:
> files('s390_flic.c'))
>  specific_ss.add(when: 'CONFIG_S390_FLIC_KVM', if_true:
> files('s390_flic_kvm.c'))
>  specific_ss.add(when: 'CONFIG_SH_INTC', if_true: files('sh_intc.c'))
>  specific_ss.add(when: 'CONFIG_RISCV_ACLINT', if_true:
> files('riscv_aclint.c'))
> +specific_ss.add(when: 'CONFIG_RISCV_APLIC', if_true:
> files('riscv_aplic.c'))
>  specific_ss.add(when: 'CONFIG_SIFIVE_PLIC', if_true:
> files('sifive_plic.c'))
>  specific_ss.add(when: 'CONFIG_XICS', if_true: files('xics.c'))
>  specific_ss.add(when: ['CONFIG_KVM', 'CONFIG_XICS'],
> diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
> new file mode 100644
> index 00..885c1de2af
> --- /dev/null
> +++ b/hw/intc/riscv_aplic.c
> @@ -0,0 +1,975 @@
> +/*
> + * RISC-V APLIC (Advanced Platform Level Interrupt Controller)
> + *
> + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qemu/error-report.h"
> +#include "qemu/bswap.h"
> +#include "exec/address-spaces.h"
> +#include "hw/sysbus.h"
> +#include "hw/pci/msi.h"
> +#include "hw/boards.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/intc/riscv_aplic.h"
> +#include "hw/irq.h"
> +#include "target/riscv/cpu.h"
> +#include "sysemu/sysemu.h"
> +#include "migration/vmstate.h"
> +
> +#define APLIC_MAX_IDC  (1UL << 14)
> +#define APLIC_MAX_SOURCE   1024
> +#define APLIC_MIN_IPRIO_BITS   1
> +#define APLIC_MAX_IPRIO_BITS   8
> +#define APLIC_MAX_CHILDREN 1024
> +
> +#define APLIC_DOMAINCFG0x
> +#define APLIC_DOMAINCFG_RDONLY 0x8000
> +#define APLIC_DOMAINCFG_IE (1 << 8)
> +#define APLIC_DOMAINCFG_DM (1 << 2)
> +#define APLIC_DOMAINCFG_BE (1 << 0)
> +
> +#define APLIC_SOURCECFG_BASE   0x0004
> +#define APLIC_SOURCECFG_D  (1 << 10)
> +#define APLIC_SOURCECFG_CHILDIDX_MASK  0x03ff
> +#define APLIC_SOURCECFG_SM_MASK0x0007
> +#define APLIC_SOURCECFG_SM_INACTIVE0x0
> +#define APLIC_SOURCECFG_SM_DETACH  0x1
> +#define APLIC_SOURCECFG_SM_EDGE_RISE   0x4
> +#define APLIC_SOURCECFG_SM_EDGE_FALL   0x5
> +#define APLIC_SOURCECFG_SM_LEVEL_HIGH  0x6
> +#define APLIC_SOURCECFG_SM_LEVEL_LOW   0x7
> +
> +#define APLIC_MMSICFGADDR  0x1bc0
> +#define APLIC_MMSICFGADDRH 0x1bc4
> +#define APLIC_SMSICFGADDR  0x1bc8
> +#define APLIC_SMSICFGADDRH 0x1bcc
> +
> +#define APLIC_xMSICFGADDRH_L   (1UL << 31)
> +#define APLIC_xMSICFGADDRH_HHXS_MASK   0x1f
> +#define APLIC_xMSICFGADDRH_HHXS_SHIFT  24
> +#define APLIC_xMSICFGADDRH_LHXS_MASK   0x7
> +#define APLIC_xMSICFGADDRH_LHXS_SHIFT  20
> +#define APLIC_xMSICFGADDRH_HHXW_MASK   0x7
> +#define APLIC_xMSICFGADDRH_HHXW_SHIFT  16
> +#define APLIC_xMSICFGADDRH_LHXW_MASK   0xf
> +#define APLIC_xMSICFGADDRH_LHXW_SHIFT  12
> 

[PATCH v8 23/23] hw/riscv: virt: Increase maximum number of allowed CPUs

2022-01-19 Thread Anup Patel
From: Anup Patel 

To facilitate software development of RISC-V systems with large number
of HARTs, we increase the maximum number of allowed CPUs to 512 (2^9).

We also add a detailed source level comments about limit defines which
impact the physical address space utilization.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Frank Chang 
---
 hw/riscv/virt.c | 10 ++
 include/hw/riscv/virt.h |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index dc1b3dc751..367d01d3a9 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -45,6 +45,16 @@
 #include "hw/pci-host/gpex.h"
 #include "hw/display/ramfb.h"
 
+/*
+ * The virt machine physical address space used by some of the devices
+ * namely ACLINT, PLIC, APLIC, and IMSIC depend on number of Sockets,
+ * number of CPUs, and number of IMSIC guest files.
+ *
+ * Various limits defined by VIRT_SOCKETS_MAX_BITS, VIRT_CPUS_MAX_BITS,
+ * and VIRT_IRQCHIP_MAX_GUESTS_BITS are tuned for maximum utilization
+ * of virt machine physical address space.
+ */
+
 #define VIRT_IMSIC_GROUP_MAX_SIZE  (1U << IMSIC_MMIO_GROUP_MIN_SHIFT)
 #if VIRT_IMSIC_GROUP_MAX_SIZE < \
 IMSIC_GROUP_SIZE(VIRT_CPUS_MAX_BITS, VIRT_IRQCHIP_MAX_GUESTS_BITS)
diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index 7898c574af..78f450eb60 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -24,7 +24,7 @@
 #include "hw/block/flash.h"
 #include "qom/object.h"
 
-#define VIRT_CPUS_MAX_BITS 3
+#define VIRT_CPUS_MAX_BITS 9
 #define VIRT_CPUS_MAX  (1 << VIRT_CPUS_MAX_BITS)
 #define VIRT_SOCKETS_MAX_BITS  2
 #define VIRT_SOCKETS_MAX   (1 << VIRT_SOCKETS_MAX_BITS)
-- 
2.25.1




[PATCH v8 22/23] docs/system: riscv: Document AIA options for virt machine

2022-01-19 Thread Anup Patel
From: Anup Patel 

We have two new machine options "aia" and "aia-guests" available
for the RISC-V virt machine so let's document these options.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Frank Chang 
---
 docs/system/riscv/virt.rst | 16 
 1 file changed, 16 insertions(+)

diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
index fa016584bf..373645513a 100644
--- a/docs/system/riscv/virt.rst
+++ b/docs/system/riscv/virt.rst
@@ -63,6 +63,22 @@ The following machine-specific options are supported:
   When this option is "on", ACLINT devices will be emulated instead of
   SiFive CLINT. When not specified, this option is assumed to be "off".
 
+- aia=[none|aplic|aplic-imsic]
+
+  This option allows selecting interrupt controller defined by the AIA
+  (advanced interrupt architecture) specification. The "aia=aplic" selects
+  APLIC (advanced platform level interrupt controller) to handle wired
+  interrupts whereas the "aia=aplic-imsic" selects APLIC and IMSIC (incoming
+  message signaled interrupt controller) to handle both wired interrupts and
+  MSIs. When not specified, this option is assumed to be "none" which selects
+  SiFive PLIC to handle wired interrupts.
+
+- aia-guests=nnn
+
+  The number of per-HART VS-level AIA IMSIC pages to be emulated for a guest
+  having AIA IMSIC (i.e. "aia=aplic-imsic" selected). When not specified,
+  the default number of per-HART VS-level AIA IMSIC pages is 0.
+
 Running Linux kernel
 
 
-- 
2.25.1




[PATCH v8 17/23] target/riscv: Allow users to force enable AIA CSRs in HART

2022-01-19 Thread Anup Patel
From: Anup Patel 

We add an "x-aia" command-line option for the RISC-V HART which
allows users to force-enable CPU AIA CSRs without changing the
interrupt controller available in the RISC-V machine.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Frank Chang 
---
 target/riscv/cpu.c | 5 +
 target/riscv/cpu.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f137c4bffb..2668f9c358 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -508,6 +508,10 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 }
 }
 
+if (cpu->cfg.aia) {
+riscv_set_feature(env, RISCV_FEATURE_AIA);
+}
+
 set_resetvec(env, cpu->cfg.resetvec);
 
 /* Validate that MISA_MXL is set properly. */
@@ -749,6 +753,7 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("x-j", RISCVCPU, cfg.ext_j, false),
 /* ePMP 0.9.3 */
 DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false),
+DEFINE_PROP_BOOL("x-aia", RISCVCPU, cfg.aia, false),
 
 DEFINE_PROP_UINT64("resetvec", RISCVCPU, cfg.resetvec, DEFAULT_RSTVEC),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 058ea9ce99..9d24d678e9 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -387,6 +387,7 @@ struct RISCVCPU {
 bool mmu;
 bool pmp;
 bool epmp;
+bool aia;
 uint64_t resetvec;
 } cfg;
 };
-- 
2.25.1




[PATCH v8 19/23] hw/riscv: virt: Add optional AIA APLIC support to virt machine

2022-01-19 Thread Anup Patel
From: Anup Patel 

We extend the virt machine to emulate AIA APLIC devices only when the
"aia=aplic" parameter is passed along with the machine name on the QEMU
command line. When "aia=none" is given, or the option is not specified,
we fall back to the original PLIC device emulation.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
---
 hw/riscv/Kconfig|   1 +
 hw/riscv/virt.c | 291 
 include/hw/riscv/virt.h |  26 +++-
 3 files changed, 259 insertions(+), 59 deletions(-)

diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index d2d869aaad..c30bb7cb6c 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -42,6 +42,7 @@ config RISCV_VIRT
 select PFLASH_CFI01
 select SERIAL
 select RISCV_ACLINT
+select RISCV_APLIC
 select SIFIVE_PLIC
 select SIFIVE_TEST
 select VIRTIO_MMIO
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index e3068d6126..6b06f79b46 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -33,6 +33,7 @@
 #include "hw/riscv/boot.h"
 #include "hw/riscv/numa.h"
 #include "hw/intc/riscv_aclint.h"
+#include "hw/intc/riscv_aplic.h"
 #include "hw/intc/sifive_plic.h"
 #include "hw/misc/sifive_test.h"
 #include "chardev/char.h"
@@ -52,6 +53,8 @@ static const MemMapEntry virt_memmap[] = {
 [VIRT_ACLINT_SSWI] = {  0x2F0,0x4000 },
 [VIRT_PCIE_PIO] ={  0x300,   0x1 },
 [VIRT_PLIC] ={  0xc00, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) },
+[VIRT_APLIC_M] = {  0xc00, APLIC_SIZE(VIRT_CPUS_MAX) },
+[VIRT_APLIC_S] = {  0xd00, APLIC_SIZE(VIRT_CPUS_MAX) },
 [VIRT_UART0] =   { 0x1000, 0x100 },
 [VIRT_VIRTIO] =  { 0x10001000,0x1000 },
 [VIRT_FW_CFG] =  { 0x1010,  0x18 },
@@ -133,12 +136,13 @@ static void virt_flash_map(RISCVVirtState *s,
 sysmem);
 }
 
-static void create_pcie_irq_map(void *fdt, char *nodename,
-uint32_t plic_phandle)
+static void create_pcie_irq_map(RISCVVirtState *s, void *fdt, char *nodename,
+uint32_t irqchip_phandle)
 {
 int pin, dev;
-uint32_t
-full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS * FDT_INT_MAP_WIDTH] = {};
+uint32_t irq_map_stride = 0;
+uint32_t full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS *
+  FDT_MAX_INT_MAP_WIDTH] = {};
 uint32_t *irq_map = full_irq_map;
 
 /* This code creates a standard swizzle of interrupts such that
@@ -156,23 +160,31 @@ static void create_pcie_irq_map(void *fdt, char *nodename,
 int irq_nr = PCIE_IRQ + ((pin + PCI_SLOT(devfn)) % GPEX_NUM_IRQS);
 int i = 0;
 
+/* Fill PCI address cells */
 irq_map[i] = cpu_to_be32(devfn << 8);
-
 i += FDT_PCI_ADDR_CELLS;
-irq_map[i] = cpu_to_be32(pin + 1);
 
+/* Fill PCI Interrupt cells */
+irq_map[i] = cpu_to_be32(pin + 1);
 i += FDT_PCI_INT_CELLS;
-irq_map[i++] = cpu_to_be32(plic_phandle);
 
-i += FDT_PLIC_ADDR_CELLS;
-irq_map[i] = cpu_to_be32(irq_nr);
+/* Fill interrupt controller phandle and cells */
+irq_map[i++] = cpu_to_be32(irqchip_phandle);
+irq_map[i++] = cpu_to_be32(irq_nr);
+if (s->aia_type != VIRT_AIA_TYPE_NONE) {
+irq_map[i++] = cpu_to_be32(0x4);
+}
 
-irq_map += FDT_INT_MAP_WIDTH;
+if (!irq_map_stride) {
+irq_map_stride = i;
+}
+irq_map += irq_map_stride;
 }
 }
 
-qemu_fdt_setprop(fdt, nodename, "interrupt-map",
- full_irq_map, sizeof(full_irq_map));
+qemu_fdt_setprop(fdt, nodename, "interrupt-map", full_irq_map,
+ GPEX_NUM_IRQS * GPEX_NUM_IRQS *
+ irq_map_stride * sizeof(uint32_t));
 
 qemu_fdt_setprop_cells(fdt, nodename, "interrupt-map-mask",
0x1800, 0, 0, 0x7);
@@ -404,8 +416,6 @@ static void create_fdt_socket_plic(RISCVVirtState *s,
 plic_addr = memmap[VIRT_PLIC].base + (memmap[VIRT_PLIC].size * socket);
 plic_name = g_strdup_printf("/soc/plic@%lx", plic_addr);
 qemu_fdt_add_subnode(mc->fdt, plic_name);
-qemu_fdt_setprop_cell(mc->fdt, plic_name,
-"#address-cells", FDT_PLIC_ADDR_CELLS);
 qemu_fdt_setprop_cell(mc->fdt, plic_name,
 "#interrupt-cells", FDT_PLIC_INT_CELLS);
 qemu_fdt_setprop_string_array(mc->fdt, plic_name, "compatible",
@@ -425,6 +435,76 @@ static void create_fdt_socket_plic(RISCVVirtState *s,
 g_free(plic_cells);
 }
 
+static void create_fdt_socket_aia(RISCVVirtState *s,
+  const MemMapEntry *memmap, int socket,
+  uint32_t *phandle, uint32_t *intc_phandles,
+  uint32_t *aplic_phandles)
+{
+int cpu;
+char *aplic_name;
+uint32_t *aplic_cells;
+unsigned 

Re: [PULL v2 00/31] testing/next and other misc fixes

2022-01-19 Thread Peter Maydell
On Tue, 18 Jan 2022 at 19:00, Alex Bennée  wrote:
>
> The following changes since commit 6621441db50d5bae7e34dbd04bf3c57a27a71b32:
>
>   Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20220115' 
> into staging (2022-01-16 20:12:23 +)
>
> are available in the Git repository at:
>
>   https://github.com/stsquad/qemu.git tags/pull-for-7.0-180122-2
>
> for you to fetch changes up to 3265d1fc77eb5da522accb37e50053dfdfda7e8f:
>
>   docker: include bison in debian-tricore-cross (2022-01-18 16:44:16 +)
>
> 
> Various testing and other misc updates:
>
>   - fix compiler warnings with ui and sdl
>   - update QXL/spice dependency
>   - skip I/O tests on Alpine
>   - update fedora image to latest version
>   - integrate lcitool and regenerate docker images
>   - favour CONFIG_LINUX_USER over CONFIG_LINUX
>   - add libfuse3 dependencies to docker images
>   - add dtb-kaslr-seed control knob to virt machine
>   - fix build breakage from HMP update
>   - update docs for C standard and suffix usage
>   - add more logging for debugging user hole finding
>   - expand reserve for brk() for static 64 bit programs
>   - fix bug with linux-user hole calculation
>   - avoid affecting flags when printing results in float tests
>   - add float reference files for ppc64
>   - update FreeBSD to 12.3
>   - add bison dependency to tricore images
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



  1   2   3   >