Re: [PATCH] tests/qemu-iotests: Rework the checks and spots using GNU sed

2022-02-16 Thread Thomas Huth

On 15/02/2022 23.10, Eric Blake wrote:

On Tue, Feb 15, 2022 at 02:20:31PM +0100, Thomas Huth wrote:

Instead of failing the iotests if GNU sed is not available (or skipping
them completely in the check-block.sh script), it would be better to
simply skip the bash-based tests that rely on GNU sed, so that the other
tests could still be run. Thus we now explicitly use "gsed" (either as
direct program or as a wrapper around "sed" if it's the GNU version)
in the spots that rely on the GNU sed behavior. Then we also remove the
sed checks from the check-block.sh script, so that "make check-block"
can now be run on systems without GNU sed, too.

...

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 935217aa65..a3b1708adc 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -21,58 +21,58 @@
  
  _filter_date()

  {
-$SED -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}/yyyy-mm-dd hh:mm:ss/'
+gsed -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}/yyyy-mm-dd hh:mm:ss/'


GNU sed recommends spelling it 'sed -E', not 'sed -r', when using
extended regex.  Older POSIX did not support either spelling, but the
next version will require -E, as many implementations have it now:
https://www.austingroupbugs.net/view.php?id=528


Thanks for the pointer ... I originally checked "man 1p sed" on
my system and did not see -r or -E in there, so I thought that
this must be really something specific to GNU sed. But now that
you've mentioned this, I just double-checked the build environments
that we support with QEMU, and seems like -E should be supported
everywhere:

macOS 11:

$ sed --help
sed: illegal option -- -
usage: sed script [-Ealnru] [-i extension] [file ...]
sed [-Ealnu] [-i extension] [-e script] ... [-f script_file] ... [file ...]


NetBSD 9.2:

$ sed --help
sed: unknown option -- -
Usage:  sed [-aElnru] command [file ...]
sed [-aElnru] [-e command] [-f command_file] [-I[extension]]
[-i[extension]] [file ...]


FreeBSD 12.3:

$ sed --help
sed: illegal option -- -
usage: sed script [-Ealnru] [-i extension] [file ...]
sed [-Ealnu] [-i extension] [-e script] ... [-f script_file] ... [file ...]


OpenBSD 7.0:

$ sed --help
sed: unknown option -- -
usage: sed [-aEnru] [-i[extension]] command [file ...]
   sed [-aEnru] [-e command] [-f command_file] [-i[extension]] [file ...]


Illumos:

Has -E according to https://illumos.org/man/1/sed


Busybox:

Has -E according to https://www.commandlinux.com/man-page/man1/busybox.1.html


Haiku:

Seems to use GNU sed, so -E is available.


We likely never will run the iotests on Windows, so I did not check
those (but I assume MSYS and friends are using GNU sed, too).


So I think it should be safe to change these spots that are
using "-r" to "sed -E" (and not use gsed here).


Other than the fact that this was easier to write with ERE, I'm not
seeing any other GNU-only extensions in use here; but I'd recommend
that as long as we're touching the line, we spell it 'gsed -Ee'
instead of -re (here, and in several other places).


  _filter_qom_path()
  {
-$SED -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
+gsed -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'


Here, it is our use of \+ that is a GNU sed extension, although it is
fairly easy (but verbose) to translate that one to portable sed
(PATTERN\+ is the same as PATTERNPATTERN*).  So gsed is correct.  On the
other hand, the use of [[0-9]\+\] looks ugly - it probably does NOT
match what we meant (we have a bracket expression '[...]' that matches
the 11 characters [ and 0-9, then '\+' to match that bracket
expression 1 or more times, then '\]' which in its context is
identical to ']' to match a closing ], since only opening [ needs \
escaping for literal treatment).  What we probably meant is:

'/Attached to:/s/\device\[[0-9][0-9]*]/device[N]/g'

at which point normal sed would do.


Ok ... but I'd prefer to clean such spots rather in a second step,
to make sure not to introduce bugs and to make the review easier.
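
The difference between the two bracket expressions discussed above can be checked outside sed. The following is only a sketch, assuming a POSIX regcomp() that follows the same basic-regex (BRE) rules as plain sed. The helper name bre_matches and the two pattern variables are made up for illustration and are not part of the iotests. The loose pattern spells the original '[[0-9]\+' portably as XX*, so it needs no GNU extension.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the basic regular expression `pattern` matches anywhere
 * in `text`; false on no match or if the pattern fails to compile. */
static bool bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    bool matched;

    /* Without REG_EXTENDED, regcomp() uses basic (BRE) syntax. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0) {
        return false;
    }
    matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* What the filter effectively says: the bracket expression '[[0-9]'
 * (a literal '[' or a digit) one or more times, then a literal ']'. */
static const char loose_pattern[] = "device[[0-9][[0-9]*]";

/* What was probably meant: a literal '[', one or more digits, then ']'. */
static const char strict_pattern[] = "device\\[[0-9][0-9]*]";
```

Both patterns accept "device[12]", but only the loose one also accepts a string such as "device12]", which is the mismatch pointed out in the review.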


  # Removes \r from messages
  _filter_win32()
  {
-$SED -e 's/\r//g'
+gsed -e 's/\r//g'


Yep, \r is another GNU sed extension.


  }
  
  # sanitize qemu-io output

  _filter_qemu_io()
  {
-_filter_win32 | $SED -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX YYY\/sec and XXX ops\/sec)/" \
+_filter_win32 | gsed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX YYY\/sec and XXX ops\/sec)/" \
  -e "s/: line [0-9][0-9]*:  *[0-9][0-9]*\( Aborted\| Killed\)/:\1/" \
  -e "s/qemu-io> //g"


I'm not seeing anything specific to GNU sed in this (long) sed script;
can we relax this one to plain 'sed'?  Use of s#some/text## might be
easier than having to s/some\/text//, but that's not essential.


If I switch that to plain 

Re: [PATCH v2 9/9] spapr: implement nested-hv capability for the virtual hypervisor

2022-02-16 Thread Nicholas Piggin
Excerpts from Cédric Le Goater's message of February 16, 2022 8:52 pm:
> On 2/16/22 11:25, Nicholas Piggin wrote:
>> This implements the Nested KVM HV hcall API for spapr under TCG.
>> 
>> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
>> L1 is switched back in, returning from the hcall, when a HV exception
>> is sent to the vhyp. Register state is copied in and out according to
>> the nested KVM HV hcall API specification.
>> 
>> The hdecr timer is started when the L2 is switched in, and it provides
>> the HDEC / 0x980 return to L1.
>> 
>> The MMU re-uses the bare metal radix 2-level page table walker by
>> using the get_pate method to point the MMU to the nested partition
>> table entry. MMU faults due to partition scope errors raise HV
>> exceptions and accordingly are routed back to the L1.
>> 
>> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
>> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
>> and exit).
>>
>> Reviewed-by: Fabiano Rosas 
>> Signed-off-by: Nicholas Piggin 
> 
> Reviewed-by: Cédric Le Goater 
> 
> Some last comments below,

[...]

>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index edbf3eeed0..852fe61b36 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -199,6 +199,9 @@ struct SpaprMachineState {
>>   bool has_graphics;
>>   uint32_t vsmt;   /* Virtual SMT mode (KVM's "core stride") */
>>   
>> +/* Nested HV support (TCG only) */
>> +uint64_t nested_ptcr;
>> +
> 
> this is new state to migrate.
> 

[...]

>> +/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
>> +struct kvmppc_pt_regs {
>> +uint64_t gpr[32];
>> +uint64_t nip;
>> +uint64_t msr;
>> +uint64_t orig_gpr3;/* Used for restarting system calls */
>> +uint64_t ctr;
>> +uint64_t link;
>> +uint64_t xer;
>> +uint64_t ccr;
>> +uint64_t softe;/* Soft enabled/disabled */
>> +uint64_t trap; /* Reason for being here */
>> +uint64_t dar;  /* Fault registers */
>> +uint64_t dsisr;/* on 4xx/Book-E used for ESR */
>> +uint64_t result;   /* Result of a system call */
>> +};
> 
> I think we need to start moving all the spapr hcall definitions under
> spapr_hcall.h. It can come later.

Sure.

[...]

>> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
>> index dab3dfc76c..b560514560 100644
>> --- a/include/hw/ppc/spapr_cpu_core.h
>> +++ b/include/hw/ppc/spapr_cpu_core.h
>> @@ -48,6 +48,11 @@ typedef struct SpaprCpuState {
>>   bool prod; /* not migrated, only used to improve dispatch latencies */
>>   struct ICPState *icp;
>>   struct XiveTCTX *tctx;
>> +
>> +/* Fields for nested-HV support */
>> +bool in_nested; /* true while the L2 is executing */
>> +CPUPPCState *nested_host_state; /* holds the L1 state while L2 executes */
>> +int64_t nested_tb_offset; /* L1->L2 TB offset */
> 
> This needs a new vmstate.

How about, instead of the vmstate (we would need all the L1 state in
nested_host_state as well), we just add a migration blocker in the
L2 entry path? We could limit the max hdecr to, say, 1 second to
ensure it unblocks before long.

I know migration blockers are not preferred but in this case it gives
us some iterations to debug and optimise first, which might change
the data to migrate.

Thanks,
Nick



Re: [PATCH v2] target/riscv: Add isa extenstion strings to the device tree

2022-02-16 Thread Heiko Stübner
On Wednesday, 16 February 2022, 01:09:04 CET, Atish Patra wrote:
> The Linux kernel parses the ISA extensions from "riscv,isa" DT
> property. It used to parse only the single letter base extensions
> until now. A generic ISA extension parsing framework was proposed[1]
> recently that can parse multi-letter ISA extensions as well.
> 
> Generate the extended ISA string by appending  the available ISA extensions
> to the "riscv,isa" string if it is enabled so that kernel can process it.
> 
> [1] https://lkml.org/lkml/2022/2/15/263
> 
> Suggested-by: Heiko Stubner 
> Signed-off-by: Atish Patra 

Tested-by: Heiko Stuebner 

> ---
> Changes from v1->v2:
> 1. Improved the code readability by using arrays instead of individual checks
> ---
>  target/riscv/cpu.c | 35 ++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index b0a40b83e7a8..9bf8923f164b 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -34,6 +34,13 @@
>  
>  /* RISC-V CPU definitions */
>  
> +/* This includes the null terminated character '\0' */
> +#define MAX_ISA_EXT_LEN 256
> +struct isa_ext_data {
> +const char *name;
> +bool enabled;
> +};
> +
>  static const char riscv_exts[26] = "IEMAFDQCLBJTPVNSUHKORWXYZG";
>  
>  const char * const riscv_int_regnames[] = {
> @@ -881,10 +888,35 @@ static void riscv_cpu_class_init(ObjectClass *c, void *data)
>  device_class_set_props(dc, riscv_cpu_properties);
>  }
>  
> +static void riscv_isa_string_ext(RISCVCPU *cpu, char *isa_str, int max_str_len)
> +{
> +int offset = strnlen(isa_str, max_str_len);
> +int i;
> +struct isa_ext_data isa_edata_arr[] = {
> +{ "svpbmt", cpu->cfg.ext_svpbmt   },
> +{ "svinval", cpu->cfg.ext_svinval },
> +{ "svnapot", cpu->cfg.ext_svnapot },
> +};
> +
> +for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
> +if (!isa_edata_arr[i].enabled) {
> +continue;
> +}
> +/* check available space */
> +if ((offset + strlen(isa_edata_arr[i].name) + 1) > max_str_len) {
> +qemu_log("No space left to append isa extension");
> +return;
> +}
> +offset += snprintf(isa_str + offset, max_str_len, "_%s",
> +   isa_edata_arr[i].name);
> +}
> +}
> +
>  char *riscv_isa_string(RISCVCPU *cpu)
>  {
>  int i;
> -const size_t maxlen = sizeof("rv128") + sizeof(riscv_exts) + 1;
> +const size_t maxlen = sizeof("rv128") + sizeof(riscv_exts) +
> +  MAX_ISA_EXT_LEN;
>  char *isa_str = g_new(char, maxlen);
>  char *p = isa_str + snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS);
>  for (i = 0; i < sizeof(riscv_exts); i++) {
> @@ -893,6 +925,7 @@ char *riscv_isa_string(RISCVCPU *cpu)
>  }
>  }
>  *p = '\0';
> +riscv_isa_string_ext(cpu, isa_str, maxlen);
>  return isa_str;
>  }
>  
> 
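
When appending to a fixed-size buffer as the patch above does, both the space check and the size passed to snprintf() should be based on the space remaining after the current offset, not on the total buffer size. The following is a minimal sketch of that pattern, not the code that was merged. The function name append_isa_ext is hypothetical, and the buffer is assumed to already hold a NUL-terminated string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append "_<name>" to the NUL-terminated string in buf, whose total
 * capacity (including the terminator) is buf_size. Returns false and
 * leaves buf unchanged when the result would not fit. */
static bool append_isa_ext(char *buf, size_t buf_size, const char *name)
{
    size_t used = strlen(buf);
    size_t need = 1 + strlen(name) + 1;   /* '_' + name + '\0' */

    if (used >= buf_size || need > buf_size - used) {
        return false;
    }
    /* Pass only the remaining space, not the total buffer size. */
    snprintf(buf + used, buf_size - used, "_%s", name);
    return true;
}
```

With a 16-byte buffer holding "rv64imac", appending "svpbmt" fills the buffer exactly, and a further append is rejected instead of overflowing.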







[PATCH] tests/tcg/s390x: Build tests with debian11

2022-02-16 Thread David Hildenbrand
We need a newer compiler to build upcoming tests that test for z15
features with -march=z15. So let's do it similar to arm64 and powerpc,
using an environment based on debian11 to build tests only.

Cc: Thomas Huth 
Cc: Cornelia Huck 
Cc: Richard Henderson 
Cc: "Alex Bennée" 
Cc: "Philippe Mathieu-Daudé" 
Cc: Wainer dos Santos Moschetta 
Cc: Beraldo Leal 
Cc: David Miller 
Signed-off-by: David Hildenbrand 
---
 .gitlab-ci.d/container-cross.yml|  7 +++
 tests/docker/Makefile.include   |  3 ++-
 .../dockerfiles/debian-s390x-test-cross.docker  | 13 +
 tests/tcg/configure.sh  |  2 +-
 4 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 tests/docker/dockerfiles/debian-s390x-test-cross.docker

diff --git a/.gitlab-ci.d/container-cross.yml b/.gitlab-ci.d/container-cross.yml
index a3b5b90552..f8544750ea 100644
--- a/.gitlab-ci.d/container-cross.yml
+++ b/.gitlab-ci.d/container-cross.yml
@@ -146,6 +146,13 @@ s390x-debian-cross-container:
   variables:
 NAME: debian-s390x-cross
 
+s390x-test-debian-cross-container:
+  extends: .container_job_template
+  stage: containers-layer2
+  needs: ['amd64-debian11-container']
+  variables:
+NAME: debian-s390x-test-cross
+
 sh4-debian-cross-container:
   extends: .container_job_template
   stage: containers-layer2
diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index f1a0c5db7a..b77f6088d9 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -210,6 +210,7 @@ docker-image-debian-arm64-test-cross: docker-image-debian11
 docker-image-debian-microblaze-cross: docker-image-debian10
 docker-image-debian-nios2-cross: docker-image-debian10
 docker-image-debian-powerpc-test-cross: docker-image-debian11
+docker-image-debian-s390x-test-cross: docker-image-debian11
 
 # These images may be good enough for building tests but not for test builds
 DOCKER_PARTIAL_IMAGES += debian-alpha-cross
@@ -219,7 +220,7 @@ DOCKER_PARTIAL_IMAGES += debian-hppa-cross
 DOCKER_PARTIAL_IMAGES += debian-m68k-cross debian-mips64-cross
 DOCKER_PARTIAL_IMAGES += debian-microblaze-cross
 DOCKER_PARTIAL_IMAGES += debian-nios2-cross
-DOCKER_PARTIAL_IMAGES += debian-sh4-cross debian-sparc64-cross
+DOCKER_PARTIAL_IMAGES += debian-s390x-test-cross debian-sh4-cross debian-sparc64-cross
 DOCKER_PARTIAL_IMAGES += debian-tricore-cross
 DOCKER_PARTIAL_IMAGES += debian-xtensa-cross
 DOCKER_PARTIAL_IMAGES += fedora-cris-cross
diff --git a/tests/docker/dockerfiles/debian-s390x-test-cross.docker b/tests/docker/dockerfiles/debian-s390x-test-cross.docker
new file mode 100644
index 0000000000..26435287b6
--- /dev/null
+++ b/tests/docker/dockerfiles/debian-s390x-test-cross.docker
@@ -0,0 +1,13 @@
+#
+# Docker s390x cross-compiler target (tests only)
+#
+# This docker target builds on the debian Bullseye base image.
+#
+FROM qemu/debian11
+
+# Add the foreign architecture we want and install dependencies
+RUN dpkg --add-architecture s390x
+RUN apt update && \
+DEBIAN_FRONTEND=noninteractive eatmydata \
+apt install -y --no-install-recommends \
+crossbuild-essential-s390x gcc-10-s390x-linux-gnu
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 763e9b6ad8..3f00f9307f 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -185,7 +185,7 @@ for target in $target_list; do
   ;;
 s390x-*)
   container_hosts=x86_64
-  container_image=debian-s390x-cross
+  container_image=debian-s390x-test-cross
   container_cross_cc=s390x-linux-gnu-gcc
   ;;
 sh4-*)
-- 
2.34.1




Re: [PATCH RFCv2 2/4] i386/pc: relocate 4g start to 1T where applicable

2022-02-16 Thread Joao Martins
On 2/16/22 08:19, Gerd Hoffmann wrote:
> On Tue, Feb 15, 2022 at 07:37:40PM +0000, Joao Martins wrote:
>> On 2/15/22 09:53, Gerd Hoffmann wrote:
>>> What is missing:
>>>
>>>  * Some way for the firmware to get a phys-bits value it can actually
>>>use.  One possible way would be to have a paravirtual bit somewhere
>>>telling whether host-phys-bits is enabled or not.
>>>
>> If we are not talking about *very old* processors... isn't what is already
>> advertised in CPUID.80000008 EAX enough? That's the maxphysaddr width
>> on x86, which in qemu we set from the phys-bits value;
> 
> Sigh.  Nope.  Did you read the complete reply?
> 
Yes, I did.

What I overlooked was the emphasis you had on desktops (qemu default bigger than
host-advertised), whereas I am thinking mostly of servers.

> Problem is this is not reliable.  With host-phys-bits=off (default) qemu
> allows setting phys-bits to whatever value you want, including values
> larger than what the host actually supports.  Which renders
> CPUID.80000008.EAX unusable.

I am looking at this from another angle: conveying the phys-bits via
this CPUID leaf is *maybe* enough (like real hardware). But we are setting it
to a bigger value than we should (in other words, not honoring
our physical boundary).

> To make things even worse:  The default
> value (phys-bits=40) is larger than what many intel boxes support.
> 
> host-phys-bits=on fixes that.  It makes guest-phys-bits == host-phys-bits
> by default, and also enforces guest-phys-bits <= host-phys-bits.
> So with host-phys-bits=on the guest can actually use CPUID.80000008.EAX
> to figure how big the guest physical address space is.
> 
Your two paragraphs sound like these are two different things, but +host-phys-bits
just sets CPUID.80000008.EAX to the value of the equivalent host CPUID
leaf/register. Yes, that makes it reliable, but the way to convey it is still
the same, regardless of phys-bits=bogus-bigger-than-host-number,
host-phys-bits=on or host-phys-bits-limit=N.
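
To make the discussion concrete: CPUID leaf 0x80000008 reports the physical address width in EAX[7:0] and the linear address width in EAX[15:8]. The sketch below decodes that field and models, in a simplified way, which value the guest ends up seeing under the options mentioned in this thread. It is an illustration only, not QEMU's implementation. The function names are hypothetical, and treating a limit of 0 as "no limit" is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.80000008:EAX[7:0] is the physical address width in bits. */
static inline unsigned phys_bits_from_eax(uint32_t eax)
{
    return eax & 0xff;
}

/* Simplified model of the value advertised to the guest:
 *  - host-phys-bits=off: whatever phys-bits was requested, unchecked,
 *    so it may exceed what the host supports;
 *  - host-phys-bits=on: the host value, optionally capped by
 *    host-phys-bits-limit (0 means no limit in this sketch). */
static unsigned advertised_phys_bits(unsigned requested, unsigned host,
                                     bool host_phys_bits, unsigned limit)
{
    unsigned bits;

    if (!host_phys_bits) {
        return requested;
    }
    bits = host;
    if (limit != 0 && bits > limit) {
        bits = limit;
    }
    return bits;
}
```

For a host reporting 39 bits, the default request of 40 is passed through unchanged with host-phys-bits=off, which is exactly the case where the guest cannot trust the leaf.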

> Problem is the guest doesn't know whether host-phys-bits is enabled or
> not ...
> 
> So the options to fix that are:
> 
>   (1) Make the host-phys-bits option visible to the guest (as suggested
>   above), or
>   (2) Advertise a _reliable_ phys-bits value to the guest somehow.

What I am not entirely sure about with (1) is the value of having a 'guest phys-bits'
and a 'host phys-bits' *exposed to the guest* when it seems we are picking the wrong
value for guests. It seems like unnecessary redirection (compared to real hw) unless
there's a use-case in keeping both that I am probably missing.

Joao



Re: [PATCH v6 00/10] virtiofsd: Add support for file security context at file creation

2022-02-16 Thread Dr. David Alan Gilbert
Queued

* Vivek Goyal (vgo...@redhat.com) wrote:
> Hi,
> 
> This is V6 of the patches. I posted V5 here.
> 
> https://listman.redhat.com/archives/virtio-fs/2022-February/msg00012.html
> 
> This patch series basically allows client to send a security context 
> (which is expected to be xattr security.selinux and its content) to
> virtiofsd and it will set that security context on file during creation
> based on various settings. Hence, this patch series basically allows
> supporting SELinux with virtiofs.
> 
> There are primarily 3 modes.
> 
> - If no security context enabled, then it continues to create files without
>   security context.
> 
> - If security context is enabled but security.selinux has not been
>   remapped, then it uses /proc/thread-self/attr/fscreate knob to set
>   security context and then create the file. This will make sure that
>   newly created file gets the security context as set in "fscreate" and
>   this is atomic w.r.t file creation.
> 
>   This is useful when host and guest SELinux policies don't conflict and
>   can work with each other. In that case, guest security.selinux xattr
>   is not remapped and is passed through as "security.selinux" xattr
>   on host.
> 
> - If security context is enabled but security.selinux xattr has been
>   remapped to something else, then it first creates the file and then
>   uses setxattr() to set the remapped xattr with the security context.
>   This is a non-atomic operation w.r.t file creation.
> 
>   This mode will be most versatile and allow host and guest to have their
>   own separate SELinux xattrs and have their own separate SELinux policies.
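
The three modes listed above amount to a small decision table. The following sketch restates it in code, purely as a reading aid. The enum and function names are hypothetical and do not come from the virtiofsd sources.

```c
#include <stdbool.h>

enum secctx_mode {
    SECCTX_NONE,      /* create the file without a security context */
    SECCTX_FSCREATE,  /* set the fscreate knob, then create: atomic */
    SECCTX_SETXATTR   /* create, then setxattr() the remapped name: not atomic */
};

/* Pick the creation mode from the two settings described in the cover letter. */
static enum secctx_mode pick_secctx_mode(bool secctx_enabled,
                                         bool selinux_xattr_remapped)
{
    if (!secctx_enabled) {
        return SECCTX_NONE;
    }
    return selinux_xattr_remapped ? SECCTX_SETXATTR : SECCTX_FSCREATE;
}

/* Only the fscreate path labels the file atomically with its creation. */
static bool secctx_mode_is_atomic(enum secctx_mode mode)
{
    return mode == SECCTX_FSCREATE;
}
```

The non-atomic mode is the more flexible one, since host and guest keep separate xattrs and policies, at the cost of a window between creation and labeling.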
> 
> Changes since V5:
> 
> - Added some documentation to recommend using xattr remapping to remap
>   "security.selinux" to "trusted.virtiofs.security.selinux" and also 
>   give CAP_SYS_ADMIN to daemon. Also put a warning to make users aware
>   of trade-off involved here. ("Daniel P. Berrangé")
> 
> - Used macro endof() to determine end of fuse_init_in struct. (David
>   Gilbert).
> 
> - Added a check to make sure fsecctx->size is not zero. Also added
>   "return" statement at few places where it was required. (David Gilbert)
> 
> - Split patch 7 in the series. Some of the handling of setting and
>   clearing fscreate knob has been moved into a separate patch. Found
>   it hard to break it down further. So it helps a bit but not too
>   much. (David Gilbert).
> 
> Thanks
> Vivek
> 
> Vivek Goyal (10):
>   virtiofsd: Fix breakage due to fuse_init_in size change
>   linux-headers: Update headers to v5.17-rc1
>   virtiofsd: Parse extended "struct fuse_init_in"
>   virtiofsd: Extend size of fuse_conn_info->capable and ->want fields
>   virtiofsd, fuse_lowlevel.c: Add capability to parse security context
>   virtiofsd: Move core file creation code in separate function
>   virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate
>   virtiofsd: Create new file with security context
>   virtiofsd: Create new file using O_TMPFILE and set security context
>   virtiofsd: Add an option to enable/disable security label
> 
>  docs/tools/virtiofsd.rst  |  32 ++
>  include/standard-headers/asm-x86/kvm_para.h   |   1 +
>  include/standard-headers/drm/drm_fourcc.h |  11 +
>  include/standard-headers/linux/ethtool.h  |   1 +
>  include/standard-headers/linux/fuse.h |  60 ++-
>  include/standard-headers/linux/pci_regs.h | 142 +++---
>  include/standard-headers/linux/virtio_gpio.h  |  72 +++
>  include/standard-headers/linux/virtio_i2c.h   |  47 ++
>  include/standard-headers/linux/virtio_iommu.h |   8 +-
>  .../standard-headers/linux/virtio_pcidev.h|  65 +++
>  include/standard-headers/linux/virtio_scmi.h  |  24 +
>  linux-headers/asm-generic/unistd.h|   5 +-
>  linux-headers/asm-mips/unistd_n32.h   |   2 +
>  linux-headers/asm-mips/unistd_n64.h   |   2 +
>  linux-headers/asm-mips/unistd_o32.h   |   2 +
>  linux-headers/asm-powerpc/unistd_32.h |   2 +
>  linux-headers/asm-powerpc/unistd_64.h |   2 +
>  linux-headers/asm-riscv/bitsperlong.h |  14 +
>  linux-headers/asm-riscv/mman.h|   1 +
>  linux-headers/asm-riscv/unistd.h  |  44 ++
>  linux-headers/asm-s390/unistd_32.h|   2 +
>  linux-headers/asm-s390/unistd_64.h|   2 +
>  linux-headers/asm-x86/kvm.h   |  16 +-
>  linux-headers/asm-x86/unistd_32.h |   1 +
>  linux-headers/asm-x86/unistd_64.h |   1 +
>  linux-headers/asm-x86/unistd_x32.h|   1 +
>  linux-headers/linux/kvm.h |  17 +
>  tools/virtiofsd/fuse_common.h |   9 +-
>  tools/virtiofsd/fuse_i.h  |   7 +
>  tools/virtiofsd/fuse_lowlevel.c   | 168 +--
>  tools/virtiofsd/helper.c  |   1 +
>  tools/virtiofsd/passthrough_ll.c  | 414 --
>  32 files changed, 1044 

Re: [PATCH v8] isa-applesmc: provide OSK forwarding on Apple hosts

2022-02-16 Thread Vladislav Yaroshchuk
ping
https://patchew.org/QEMU/20220113152836.60398-1-yaroshchuk2...@gmail.com/

On Thu, 13 Jan 2022 at 18:28, Vladislav Yaroshchuk wrote:

> On Apple hosts we can read AppleSMC OSK key directly from host's
> SMC and forward this value to QEMU Guest.
>
> New 'hostosk' property is added:
> * `-device isa-applesmc,hostosk=on`
> The property is set to 'on' by default for machine version > 6.2
>
> Apple licence allows use and run up to two additional copies
> or instances of macOS operating system within virtual operating system
> environments on each Apple-branded computer that is already running
> the Apple Software, for purposes of:
>  * software development
>  * testing during software development
>  * using macOS Server
>  * personal, non-commercial use
>
> Guest macOS requires AppleSMC with correct OSK. The most legal
> way to pass it to the Guest is to forward the key from host SMC
> without exposing its value.
>
> Based on
> https://web.archive.org/web/20200103161737/osxbook.com/book/bonus/chapter7/tpmdrmmyth/
>
> Signed-off-by: Vladislav Yaroshchuk 
> ---
>  hw/core/machine.c  |   4 +-
>  hw/misc/applesmc.c | 125 +++--
>  2 files changed, 125 insertions(+), 4 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index debcdc0e70..ea70be0270 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -37,7 +37,9 @@
>  #include "hw/virtio/virtio.h"
>  #include "hw/virtio/virtio-pci.h"
>
> -GlobalProperty hw_compat_6_2[] = {};
> +GlobalProperty hw_compat_6_2[] = {
> +{ "isa-applesmc", "hostosk", "off" }
> +};
>  const size_t hw_compat_6_2_len = G_N_ELEMENTS(hw_compat_6_2);
>
>  GlobalProperty hw_compat_6_1[] = {
> diff --git a/hw/misc/applesmc.c b/hw/misc/applesmc.c
> index 1b9acaf1d3..99bcc937f9 100644
> --- a/hw/misc/applesmc.c
> +++ b/hw/misc/applesmc.c
> @@ -37,6 +37,11 @@
>  #include "qemu/module.h"
>  #include "qemu/timer.h"
>  #include "qom/object.h"
> +#include "qapi/error.h"
> +
> +#if defined(__APPLE__) && defined(__MACH__)
> +#include <IOKit/IOKitLib.h>
> +#endif
>
>  /* #define DEBUG_SMC */
>
> @@ -80,7 +85,7 @@ enum {
>  #define smc_debug(...) do { } while (0)
>  #endif
>
> -static char default_osk[64] = "This is a dummy key. Enter the real key "
> +static char default_osk[65] = "This is a dummy key. Enter the real key "
>"using the -osk parameter";
>
>  struct AppleSMCData {
> @@ -109,6 +114,7 @@ struct AppleSMCState {
>  uint8_t data_pos;
>  uint8_t data[255];
>  char *osk;
> +bool hostosk;
>  QLIST_HEAD(, AppleSMCData) data_def;
>  };
>
> @@ -312,6 +318,101 @@ static const MemoryRegionOps applesmc_err_io_ops = {
>  },
>  };
>
> +#if defined(__APPLE__) && defined(__MACH__)
> +/*
> + * Based on
> + * https://web.archive.org/web/20200103161737/osxbook.com/book/bonus/chapter7/tpmdrmmyth/
> + */
> +enum {
> +SMC_HANDLE_EVENT = 2,
> +SMC_READ_KEY = 5
> +};
> +
> +struct AppleSMCParam {
> +uint32_t key;
> +uint8_t pad0[22];
> +IOByteCount data_size;
> +uint8_t pad1[10];
> +uint8_t command;
> +uint32_t pad2;
> +uint8_t bytes[32];
> +};
> +
> +static bool applesmc_read_host_osk(char *host_osk, Error **errp)
> +{
> +assert(host_osk != NULL);
> +
> +io_service_t hostsmc_service = IO_OBJECT_NULL;
> +io_connect_t hostsmc_connect = IO_OBJECT_NULL;
> +size_t smc_param_size = sizeof(struct AppleSMCParam);
> +IOReturn status = kIOReturnError;
> +int i;
> +
> +struct AppleSMCParam smc_param[2] = {
> + {
> + .key = ('OSK0'),
> + .data_size = sizeof(smc_param[0].bytes),
> + .command = SMC_READ_KEY,
> + }, {
> + .key = ('OSK1'),
> + .data_size = sizeof(smc_param[0].bytes),
> + .command = SMC_READ_KEY,
> + },
> +};
> +
> +hostsmc_service = IOServiceGetMatchingService(
> +kIOMasterPortDefault,
> +IOServiceMatching("AppleSMC"));
> +if (hostsmc_service == IO_OBJECT_NULL) {
> +error_setg(errp, "Unable to get host-AppleSMC service");
> +goto error;
> +}
> +
> +status = IOServiceOpen(hostsmc_service,
> +   mach_task_self(),
> +   0,
> +   &hostsmc_connect);
> +if (status != kIOReturnSuccess || hostsmc_connect == IO_OBJECT_NULL) {
> +error_setg(errp, "Unable to open host-AppleSMC service");
> +goto error;
> +}
> +
> +for (i = 0; i < ARRAY_SIZE(smc_param); ++i) {
> +status = IOConnectCallStructMethod(
> +hostsmc_connect,
> +SMC_HANDLE_EVENT,
> +&smc_param[i],
> +sizeof(struct AppleSMCParam),
> +&smc_param[i],
> +&smc_param_size
> +);
> +
> +if (status != kIOReturnSuccess) {
> +error_setg(errp, "Unable to read OSK from host-AppleSMC");
> +goto error;
> +}
> +}
> +
> + 

Re: [PATCH v6 1/1] virtiofsd: Add basic support for FUSE_SYNCFS request

2022-02-16 Thread Dr. David Alan Gilbert
* Vivek Goyal (vgo...@redhat.com) wrote:
> On Tue, Feb 15, 2022 at 07:15:29PM +0100, Greg Kurz wrote:
> > Honor the expected behavior of syncfs() to synchronously flush all data
> > and metadata to disk on linux systems.
> > 
> > If virtiofsd is started with '-o announce_submounts', the client is
> > expected to send a FUSE_SYNCFS request for each individual submount.
> > In this case, we just create a new file descriptor on the submount
> > inode with lo_inode_open(), call syncfs() on it and close it. The
> > intermediary file is needed because O_PATH descriptors aren't
> > backed by an actual file and syncfs() would fail with EBADF.
> > 
> > If virtiofsd is started without '-o announce_submounts' or if the
> > client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client
> > only sends a single FUSE_SYNCFS request for the root inode. The
> > server would thus need to track submounts internally and call
> > syncfs() on each of them. This will be implemented later.
> > 
> > Note that syncfs() might suffer from a time penalty if the submounts
> > are being hammered by some unrelated workload on the host. The only
> > solution to prevent that is to avoid shared mounts.
> > 
> > Signed-off-by: Greg Kurz 
> 
> Looks good to me. Thanks Greg.
> 
> Reviewed-by: Vivek Goyal 

Queued

> Vivek
> 
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c   | 11 +++
> >  tools/virtiofsd/fuse_lowlevel.h   | 13 
> >  tools/virtiofsd/passthrough_ll.c  | 44 +++
> >  tools/virtiofsd/passthrough_seccomp.c |  1 +
> >  4 files changed, 69 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index e4679c73abc2..e02d8b25a5f6 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -1876,6 +1876,16 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
> >  }
> >  }
> >  
> > +static void do_syncfs(fuse_req_t req, fuse_ino_t nodeid,
> > +  struct fuse_mbuf_iter *iter)
> > +{
> > +if (req->se->op.syncfs) {
> > +req->se->op.syncfs(req, nodeid);
> > +} else {
> > +fuse_reply_err(req, ENOSYS);
> > +}
> > +}
> > +
> >  static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >  struct fuse_mbuf_iter *iter)
> >  {
> > @@ -2280,6 +2290,7 @@ static struct {
> >  [FUSE_RENAME2] = { do_rename2, "RENAME2" },
> >  [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
> >  [FUSE_LSEEK] = { do_lseek, "LSEEK" },
> > +[FUSE_SYNCFS] = { do_syncfs, "SYNCFS" },
> >  };
> >  
> >  #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
> > diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> > index c55c0ca2fc1c..b889dae4de0e 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.h
> > +++ b/tools/virtiofsd/fuse_lowlevel.h
> > @@ -1226,6 +1226,19 @@ struct fuse_lowlevel_ops {
> >   */
> >  void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
> >struct fuse_file_info *fi);
> > +
> > +/**
> > + * Synchronize file system content
> > + *
> > + * If this request is answered with an error code of ENOSYS,
> > + * this is treated as success and future calls to syncfs() will
> > + * succeed automatically without being sent to the filesystem
> > + * process.
> > + *
> > + * @param req request handle
> > + * @param ino the inode number
> > + */
> > +void (*syncfs)(fuse_req_t req, fuse_ino_t ino);
> >  };
> >  
> >  /**
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index b3d0674f6d2f..0f65e6423cf5 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -3357,6 +3357,49 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
> >  }
> >  }
> >  
> > +static int lo_do_syncfs(struct lo_data *lo, struct lo_inode *inode)
> > +{
> > +int fd, ret = 0;
> > +
> > +fuse_log(FUSE_LOG_DEBUG, "lo_do_syncfs(ino=%" PRIu64 ")\n",
> > + inode->fuse_ino);
> > +
> > +fd = lo_inode_open(lo, inode, O_RDONLY);
> > +if (fd < 0) {
> > +return -fd;
> > +}
> > +
> > +if (syncfs(fd) < 0) {
> > +ret = errno;
> > +}
> > +
> > +close(fd);
> > +return ret;
> > +}
> > +
> > +static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
> > +{
> > +struct lo_data *lo = lo_data(req);
> > +struct lo_inode *inode = lo_inode(req, ino);
> > +int err;
> > +
> > +if (!inode) {
> > +fuse_reply_err(req, EBADF);
> > +return;
> > +}
> > +
> > +err = lo_do_syncfs(lo, inode);
> > +lo_inode_put(lo, &inode);
> > +
> > +/*
> > + * If submounts aren't announced, the client only sends a request to
> > + * sync the root inode. TODO: Track submounts internally and iterate
> > + * over them as well.
> > + */
> 
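
The do_syncfs() hunk quoted above follows a common pattern: an operations table with optional handlers, where a missing handler is answered with ENOSYS. Here is a stripped-down sketch of that pattern. The struct and function names are hypothetical stand-ins, not the real FUSE low-level API, and errors are returned rather than sent as a reply.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced operations table with one optional handler. */
struct fs_ops {
    int (*syncfs)(uint64_t ino);   /* returns 0 or a positive errno value */
};

/* Call the handler if the filesystem provides one; otherwise report
 * ENOSYS so the caller knows the operation is not implemented. */
static int dispatch_syncfs(const struct fs_ops *ops, uint64_t ino)
{
    if (ops->syncfs == NULL) {
        return ENOSYS;
    }
    return ops->syncfs(ino);
}

/* Trivial handler used to exercise the dispatch path. */
static int always_ok_syncfs(uint64_t ino)
{
    (void)ino;
    return 0;
}
```

As the header comment in the patch notes, a client that receives ENOSYS treats the request as successful and stops sending it, so the fallback is harmless for filesystems without a handler.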

Re: [RFC 1/8] ioregionfd: introduce a syscall and memory API

2022-02-16 Thread David Hildenbrand
Looks straight forward to me.

[...]

>  
> +int kvm_set_ioregionfd(struct kvm_ioregion *ioregionfd)
> +{
> +KVMState *s = kvm_state;
> +int ret = -1;
> +
> +ret = kvm_vm_ioctl(s, KVM_SET_IOREGION, ioregionfd);
> +if (ret < 0) {
> +error_report("Failed SET_IOREGION syscall ret is %d", ret);

Maybe print the textual representation via strerror(-ret).
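
A sketch of what that could look like, assuming the call returns a negative errno value on failure as the hunk above implies. The helper name format_ioctl_error is hypothetical. In QEMU the resulting text would go through error_report() rather than into a caller-supplied buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build a message for a call that failed with a negative errno value,
 * including both the textual description and the numeric code.
 * Returns the snprintf() result. */
static int format_ioctl_error(char *buf, size_t len, const char *what, int ret)
{
    return snprintf(buf, len, "%s failed: %s (%d)",
                    what, strerror(-ret), -ret);
}
```

Including both the text and the number keeps the message readable while still being easy to search for in logs.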

> +}
> +return ret;
> +}
> +
>  static int do_kvm_destroy_vcpu(CPUState *cpu)
>  {
>  KVMState *s = kvm_state;
> @@ -1635,6 +1648,104 @@ static void kvm_io_ioeventfd_del(MemoryListener 
> *listener,
>  }
>  }
>  
> +static void kvm_mem_ioregionfd_add(MemoryListener *listener,
> +   MemoryRegionSection *section,
> +   uint64_t data,
> +   int fd)
> +{
> +
> +struct kvm_ioregion ioregionfd;
> +int r = -1;
> +
> +ioregionfd.guest_paddr = section->offset_within_address_space;
> +ioregionfd.memory_size = int128_get64(section->size);
> +ioregionfd.user_data = data;
> +ioregionfd.read_fd = fd;
> +ioregionfd.write_fd = fd;
> +ioregionfd.flags = 0;
> +memset(, 0, sizeof(ioregionfd.pad));
> +
> +r = kvm_set_ioregionfd();
> +if (r < 0) {
> +fprintf(stderr, "%s: error adding ioregionfd: %s (%d)\n,",
> +__func__, strerror(-r), -r);

Oh, you're actually printing the error again? Why error_report() above
and here fprintf?
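
For illustration only (not part of the patch): a minimal sketch of building
one message from a negative-errno return value with strerror(), so that a
single reporting call can carry both the text and the number. The helper
name format_errno_msg() is hypothetical and is not a QEMU API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a negative-errno return value as
 * "<what>: <strerror text> (<errno>)" into buf.
 * Returns the number of characters snprintf() wanted to write.
 */
static int format_errno_msg(char *buf, size_t len, const char *what, int ret)
{
    assert(ret < 0);    /* callers pass the negative errno they received */
    return snprintf(buf, len, "%s: %s (%d)", what, strerror(-ret), -ret);
}
```

A caller would format once and hand the buffer to whichever reporting
function the file already uses, instead of mixing two mechanisms.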

[...]

>  void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
>AddressSpace *as, int as_id, const char 
> *name)
>  {
> @@ -1679,6 +1790,12 @@ static MemoryListener kvm_io_listener = {
>  .priority = 10,
>  };
>  
> +static MemoryListener kvm_ioregion_listener = {
> +.ioregionfd_add = kvm_io_ioregionfd_add,
> +.ioregionfd_del = kvm_io_ioregionfd_del,
> +.priority = 10,
> +};
> +
>  int kvm_set_irq(KVMState *s, int irq, int level)
>  {
>  struct kvm_irq_level event;
> @@ -2564,6 +2681,9 @@ static int kvm_init(MachineState *ms)
>  kvm_ioeventfd_any_length_allowed =
>  (kvm_check_extension(s, KVM_CAP_IOEVENTFD_ANY_LENGTH) > 0);
>  
> +kvm_ioregionfds_allowed =
> +(kvm_check_extension(s, KVM_CAP_IOREGIONFD) > 0);
> +
>  kvm_state = s;
>  
>  ret = kvm_arch_init(ms, s);
> @@ -2585,6 +2705,12 @@ static int kvm_init(MachineState *ms)
>  s->memory_listener.listener.eventfd_add = kvm_mem_ioeventfd_add;
>  s->memory_listener.listener.eventfd_del = kvm_mem_ioeventfd_del;
>  }
> +
> +if (kvm_ioregionfds_allowed) {
> +s->memory_listener.listener.ioregionfd_add = kvm_mem_ioregionfd_add;
> +s->memory_listener.listener.ioregionfd_del = kvm_mem_ioregionfd_del;
> +}
> +
>  s->memory_listener.listener.coalesced_io_add = kvm_coalesce_mmio_region;
>  s->memory_listener.listener.coalesced_io_del = 
> kvm_uncoalesce_mmio_region;
>  
> @@ -2594,6 +2720,12 @@ static int kvm_init(MachineState *ms)
>  memory_listener_register(_io_listener,
>   _space_io);
>  }
> +
> +if (kvm_ioregionfds_allowed) {
> +memory_listener_register(_ioregion_listener,
> + _space_io);
> +}
> +
>  memory_listener_register(_coalesced_pio_listener,
>   _space_io);
>  

Why are we using a single memory listener for address_space_memory but
individual listeners for address_space_io?

IOW, why don't we have >io_listener ?

-- 
Thanks,

David / dhildenb




Re: [PATCH v1 1/4] hyperv: SControl is optional to enable SynIc

2022-02-16 Thread Jon Doron

On 16/02/2022, Emanuele Giuseppe Esposito wrote:



On 04/02/2022 11:07, Jon Doron wrote:

SynIc can be enabled regardless of the SControl mechanism which can
register a GSI for a given SintRoute.

This behaviour can be achieved by enabling SIMP and then the guest
will poll on the message slot.

Once there is another message pending the host will set the message slot
with the pending flag.
When the guest polls from the message slot, incase the pending flag is


s/incase/in case

Done

set it will write to the HV_X64_MSR_EOM indicating it has cleared the
slow and we can try and push our message again.


what do you mean by "the slow"?

Just a typo to slot :) fixed
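
For illustration only: a toy model of the slot/pending/EOM handshake that
the commit message describes. The struct layout and function names are
assumptions made up for this sketch; the real message page layout and the
HV_X64_MSR_EOM write are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one message slot; field names are illustrative only. */
struct msg_slot {
    uint32_t msg_type;      /* 0 means the slot is empty */
    bool     msg_pending;   /* host has another message waiting */
    uint64_t payload;
};

/* Host side: returns true if posted, false if the slot was still busy. */
static bool host_post(struct msg_slot *slot, uint32_t type, uint64_t payload)
{
    assert(type != 0);
    if (slot->msg_type != 0) {
        slot->msg_pending = true;   /* ask the guest to signal EOM */
        return false;
    }
    slot->msg_type = type;
    slot->payload = payload;
    return true;
}

/*
 * Guest side: poll the slot. Returns true if a message was consumed.
 * *need_eom tells the guest whether it has to signal end-of-message so
 * that the host can retry posting.
 */
static bool guest_poll(struct msg_slot *slot, uint64_t *payload, bool *need_eom)
{
    if (slot->msg_type == 0) {
        return false;               /* nothing to read */
    }
    *payload = slot->payload;
    slot->msg_type = 0;             /* free the slot */
    *need_eom = slot->msg_pending;
    slot->msg_pending = false;
    return true;
}
```

In this model the host retries host_post() once the guest reports that an
EOM is needed, which mirrors the "try and push our message again" step.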


Signed-off-by: Jon Doron 
---
 hw/hyperv/hyperv.c | 233 -
 include/hw/hyperv/hyperv.h |   2 +
 2 files changed, 153 insertions(+), 82 deletions(-)

diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
index cb1074f234..88c9cc1334 100644
--- a/hw/hyperv/hyperv.c
+++ b/hw/hyperv/hyperv.c
@@ -27,18 +27,70 @@ struct SynICState {

 CPUState *cs;

-bool enabled;
+bool sctl_enabled;
 hwaddr msg_page_addr;
 hwaddr event_page_addr;
 MemoryRegion msg_page_mr;
 MemoryRegion event_page_mr;
 struct hyperv_message_page *msg_page;
 struct hyperv_event_flags_page *event_page;
+
+QemuMutex sint_routes_mutex;
+QLIST_HEAD(, HvSintRoute) sint_routes;
 };

 #define TYPE_SYNIC "hyperv-synic"
 OBJECT_DECLARE_SIMPLE_TYPE(SynICState, SYNIC)

+/*
+ * KVM has its own message producers (SynIC timers).  To guarantee
+ * serialization with both KVM vcpu and the guest cpu, the messages are first
+ * staged in an intermediate area and then posted to the SynIC message page in
+ * the vcpu thread.
+ */
+typedef struct HvSintStagedMessage {
+/* message content staged by hyperv_post_msg */
+struct hyperv_message msg;
+/* callback + data (r/o) to complete the processing in a BH */
+HvSintMsgCb cb;
+void *cb_data;
+/* message posting status filled by cpu_post_msg */
+int status;
+/* passing the buck: */
+enum {
+/* initial state */
+HV_STAGED_MSG_FREE,
+/*
+ * hyperv_post_msg (e.g. in main loop) grabs the staged area (FREE ->
+ * BUSY), copies msg, and schedules cpu_post_msg on the assigned cpu
+ */
+HV_STAGED_MSG_BUSY,
+/*
+ * cpu_post_msg (vcpu thread) tries to copy staged msg to msg slot,
+ * notify the guest, records the status, marks the posting done (BUSY
+ * -> POSTED), and schedules sint_msg_bh BH
+ */
+HV_STAGED_MSG_POSTED,
+/*
+ * sint_msg_bh (BH) verifies that the posting is done, runs the
+ * callback, and starts over (POSTED -> FREE)
+ */
+} state;
+} HvSintStagedMessage;
+
+struct HvSintRoute {
+uint32_t sint;
+SynICState *synic;
+int gsi;
+EventNotifier sint_set_notifier;
+EventNotifier sint_ack_notifier;
+
+HvSintStagedMessage *staged_msg;
+
+unsigned refcount;
+QLIST_ENTRY(HvSintRoute) link;
+};
+
 static bool synic_enabled;


Why did you move this struct above?
I think it was done purposefully to separate synic_* functions from the
others below (sint_*).

Done
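
For illustration only: the FREE -> BUSY -> POSTED -> FREE hand-off described
in the HvSintStagedMessage comment can be sketched with C11 atomics. This is
a simplified stand-in, not the QEMU code; the type and function names are
hypothetical, and the scheduling of the vcpu work and the BH is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum staged_state { STAGED_FREE, STAGED_BUSY, STAGED_POSTED };

struct staged_msg {
    atomic_int state;   /* one of enum staged_state */
    int payload;        /* stands in for the staged message */
    int status;         /* posting result, filled in by the vcpu step */
};

/* Poster (e.g. main loop): grab the staging area, FREE -> BUSY. */
static bool stage_msg(struct staged_msg *m, int payload)
{
    int expected = STAGED_FREE;

    if (!atomic_compare_exchange_strong(&m->state, &expected, STAGED_BUSY)) {
        return false;   /* a previous message is still in flight */
    }
    m->payload = payload;
    return true;
}

/* vcpu step: record the posting status, BUSY -> POSTED. */
static void post_msg(struct staged_msg *m, int status)
{
    assert(atomic_load(&m->state) == STAGED_BUSY);
    m->status = status;
    atomic_store(&m->state, STAGED_POSTED);
}

/* Completion step: read the status and start over, POSTED -> FREE. */
static int complete_msg(struct staged_msg *m)
{
    int status;

    assert(atomic_load(&m->state) == STAGED_POSTED);
    status = m->status;
    atomic_store(&m->state, STAGED_FREE);
    return status;
}
```

Only the first transition needs a compare-and-swap, because it is the only
one where several callers may compete for the staging area.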


 bool hyperv_is_synic_enabled(void)
@@ -51,11 +103,11 @@ static SynICState *get_synic(CPUState *cs)
 return SYNIC(object_resolve_path_component(OBJECT(cs), "synic"));
 }

-static void synic_update(SynICState *synic, bool enable,
+static void synic_update(SynICState *synic, bool sctl_enable,
  hwaddr msg_page_addr, hwaddr event_page_addr)
 {

-synic->enabled = enable;
+synic->sctl_enabled = sctl_enable;
 if (synic->msg_page_addr != msg_page_addr) {
 if (synic->msg_page_addr) {
 memory_region_del_subregion(get_system_memory(),
@@ -80,7 +132,7 @@ static void synic_update(SynICState *synic, bool enable,
 }
 }

-void hyperv_synic_update(CPUState *cs, bool enable,
+void hyperv_synic_update(CPUState *cs, bool sctl_enable,
  hwaddr msg_page_addr, hwaddr event_page_addr)
 {
 SynICState *synic = get_synic(cs);
@@ -89,7 +141,7 @@ void hyperv_synic_update(CPUState *cs, bool enable,
 return;
 }

-synic_update(synic, enable, msg_page_addr, event_page_addr);
+synic_update(synic, sctl_enable, msg_page_addr, event_page_addr);
 }

 static void synic_realize(DeviceState *dev, Error **errp)
@@ -110,16 +162,20 @@ static void synic_realize(DeviceState *dev, Error **errp)
sizeof(*synic->event_page), &error_abort);
 synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
 synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
+qemu_mutex_init(&synic->sint_routes_mutex);
+QLIST_INIT(&synic->sint_routes);

 g_free(msgp_name);
 g_free(eventp_name);
 }
+
 static void synic_reset(DeviceState *dev)
 {
 SynICState *synic = SYNIC(dev);
 memset(synic->msg_page, 0, 

Re: [PATCH v2 8/9] target/ppc: Introduce a vhyp framework for nested HV support

2022-02-16 Thread Cédric Le Goater

On 2/16/22 11:25, Nicholas Piggin wrote:

Introduce virtual hypervisor methods that can support a "Nested KVM HV"
implementation using the bare metal 2-level radix MMU, and using HV
exceptions to return from H_ENTER_NESTED (rather than cause interrupts).

HV exceptions can now be raised in the TCG spapr machine when running a
nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.

HV exceptions are intercepted in the exception handler code and instead
of causing interrupts in the guest and switching the machine to HV mode,
they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
interrupt vector number as return value as required by the hcall API.

Address translation is provided by the 2-level page table walker that is
implemented for the bare metal radix MMU. The partition scope page table
is pointed to the L1's partition scope by the get_pate vhc method.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 


Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  hw/ppc/pegasos2.c|  6 
  hw/ppc/spapr.c   |  6 
  target/ppc/cpu.h |  7 +
  target/ppc/excp_helper.c | 64 +---
  target/ppc/mmu-radix64.c | 11 +--
  5 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 298e6b93e2..d45008ac71 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -449,6 +449,11 @@ static target_ulong pegasos2_rtas(PowerPCCPU *cpu, 
Pegasos2MachineState *pm,
  }
  }
  
+static bool pegasos2_cpu_in_nested(PowerPCCPU *cpu)

+{
+return false;
+}
+
  static void pegasos2_hypercall(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
  {
  Pegasos2MachineState *pm = PEGASOS2_MACHINE(vhyp);
@@ -504,6 +509,7 @@ static void pegasos2_machine_class_init(ObjectClass *oc, 
void *data)
  mc->default_ram_id = "pegasos2.ram";
  mc->default_ram_size = 512 * MiB;
  
+vhc->cpu_in_nested = pegasos2_cpu_in_nested;

  vhc->hypercall = pegasos2_hypercall;
  vhc->cpu_exec_enter = vhyp_nop;
  vhc->cpu_exec_exit = vhyp_nop;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2c95a09d25..6fab70767f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4470,6 +4470,11 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
  return NULL;
  }
  
+static bool spapr_cpu_in_nested(PowerPCCPU *cpu)

+{
+return false;
+}
+
  static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
  {
  SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
@@ -4578,6 +4583,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
  fwc->get_dev_path = spapr_get_fw_dev_path;
  nc->nmi_monitor_handler = spapr_nmi;
  smc->phb_placement = spapr_phb_placement;
+vhc->cpu_in_nested = spapr_cpu_in_nested;
  vhc->hypercall = emulate_spapr_hypercall;
  vhc->hpt_mask = spapr_hpt_mask;
  vhc->map_hptes = spapr_map_hptes;
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c79ae74f10..2baa750729 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1311,6 +1311,8 @@ PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass 
*pcc);
  #ifndef CONFIG_USER_ONLY
  struct PPCVirtualHypervisorClass {
  InterfaceClass parent;
+bool (*cpu_in_nested)(PowerPCCPU *cpu);
+void (*deliver_hv_excp)(PowerPCCPU *cpu, int excp);
  void (*hypercall)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
  hwaddr (*hpt_mask)(PPCVirtualHypervisor *vhyp);
  const ppc_hash_pte64_t *(*map_hptes)(PPCVirtualHypervisor *vhyp,
@@ -1330,6 +1332,11 @@ struct PPCVirtualHypervisorClass {
  #define TYPE_PPC_VIRTUAL_HYPERVISOR "ppc-virtual-hypervisor"
  DECLARE_OBJ_CHECKERS(PPCVirtualHypervisor, PPCVirtualHypervisorClass,
   PPC_VIRTUAL_HYPERVISOR, TYPE_PPC_VIRTUAL_HYPERVISOR)
+
+static inline bool vhyp_cpu_in_nested(PowerPCCPU *cpu)
+{
+return PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp)->cpu_in_nested(cpu);
+}
  #endif /* CONFIG_USER_ONLY */
  
  void ppc_cpu_dump_state(CPUState *cpu, FILE *f, int flags);

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 778eb4f3b0..a78d06d648 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1279,6 +1279,18 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
  powerpc_set_excp_state(cpu, vector, new_msr);
  }
  
+/*

+ * When running a nested HV guest under vhyp, external interrupts are
+ * delivered as HVIRT.
+ */
+static bool books_vhyp_promotes_external_to_hvirt(PowerPCCPU *cpu)
+{
+if (cpu->vhyp) {
+return vhyp_cpu_in_nested(cpu);
+}
+return false;
+}
+
  #ifdef TARGET_PPC64
  /*
   * When running under vhyp, hcalls are always intercepted and sent to the
@@ -1287,7 +1299,21 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
  static bool books_vhyp_handles_hcall(PowerPCCPU *cpu)
  {
  if (cpu->vhyp) {
-return true;
+return 

Re: [PULL 0/5] 9p queue 2022-02-10

2022-02-16 Thread Christian Schoenebeck
On Dienstag, 15. Februar 2022 08:01:37 CET Greg Kurz wrote:
> On Mon, 14 Feb 2022 17:43:51 +0300
> 
> Vitaly Chikunov  wrote:
> > Why g_new0 and not just g_malloc0? This is smallest code change, which
> > seems appropriate for a bug fix.
> 
> I prefer g_new0() for the exact reasons that are provided in QEMU's
> official coding style docs/devel/style.rst:
[...]
> I'm fine with the acceptable version as well. The only important thing is
> to fix the synth backend.
> 
> Cheers,

Hi, is anybody working on a v5 of this patch? If not I will send one this 
evening to bring this issue forward, because it is blocking my queue.

Best regards,
Christian Schoenebeck





Re: [PATCH v1 4/4] hw: hyperv: Initial commit for Synthetic Debugging device

2022-02-16 Thread Jon Doron

On 16/02/2022, Emanuele Giuseppe Esposito wrote:



+
+static uint16_t handle_recv_msg(HvSynDbg *syndbg, uint64_t outgpa,
+uint32_t count, bool is_raw, uint32_t options,
+uint64_t timeout, uint32_t *retrieved_count)
+{
+uint16_t ret;
+uint8_t data_buf[TARGET_PAGE_SIZE - UDP_PKT_HEADER_SIZE];
+hwaddr out_len;
+void *out_data = NULL;
+ssize_t recv_byte_count;
+
+/* TODO: Handle options and timeout */
+(void)options;
+(void)timeout;
+
+if (!syndbg->has_data_pending) {
+recv_byte_count = 0;
+} else {
+recv_byte_count = qemu_recv(syndbg->socket, data_buf,
+MIN(sizeof(data_buf), count), MSG_WAITALL);
+if (recv_byte_count == -1) {
+ret = HV_STATUS_INVALID_PARAMETER;
+goto cleanup;
+}
+}
+
+if (!recv_byte_count) {
+*retrieved_count = 0;
+ret = HV_STATUS_NO_DATA;
+goto cleanup;
+}
+
+set_pending_state(syndbg, false);
+
+out_len = recv_byte_count;
+if (is_raw) {
+out_len += UDP_PKT_HEADER_SIZE;
+}
+out_data = cpu_physical_memory_map(outgpa, &out_len, 1);
+if (!out_data) {
+ret = HV_STATUS_INSUFFICIENT_MEMORY;
+goto cleanup;
+}
+
+if (is_raw &&
+!create_udp_pkt(syndbg, out_data,
+recv_byte_count + UDP_PKT_HEADER_SIZE,
+data_buf, recv_byte_count)) {
+ret = HV_STATUS_INSUFFICIENT_MEMORY;
+goto cleanup;
+} else if (!is_raw) {
+memcpy(out_data, data_buf, recv_byte_count);
+}
+
+*retrieved_count = recv_byte_count;
+if (is_raw) {
+*retrieved_count += UDP_PKT_HEADER_SIZE;
+}
+ret = HV_STATUS_SUCCESS;
+cleanup:
+if (out_data) {
+cpu_physical_memory_unmap(out_data, out_len, 1, out_len);
+}


Same nitpick as in patch 1: I think you can use more goto labels
instead of adding if statements.


Done
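
For illustration only: a minimal sketch of the "more goto labels" style the
review asks for, where each label undoes exactly what has been acquired so
far and no "if (ptr)" check is needed in the cleanup path. The function and
its arguments are hypothetical and unrelated to the syndbg code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example: copy len bytes from src to dst through a
 * temporary bounce buffer. Returns 0 or a negative errno value.
 */
static int copy_message(const uint8_t *src, size_t len,
                        uint8_t *dst, size_t dst_len, size_t *copied)
{
    uint8_t *bounce;
    int ret;

    if (len == 0) {
        ret = -EINVAL;
        goto out;               /* nothing acquired yet */
    }

    bounce = malloc(len);
    if (!bounce) {
        ret = -ENOMEM;
        goto out;               /* still nothing to release */
    }
    memcpy(bounce, src, len);

    if (len > dst_len) {
        ret = -ENOSPC;
        goto free_bounce;       /* bounce buffer must be released */
    }

    memcpy(dst, bounce, len);
    *copied = len;
    ret = 0;

free_bounce:
    free(bounce);               /* reached on success and on late errors */
out:
    return ret;
}
```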

+
+return ret;
+}
+






Re: [PATCH v1 2/4] hyperv: Add definitions for syndbg

2022-02-16 Thread Jon Doron

On 16/02/2022, Emanuele Giuseppe Esposito wrote:



On 04/02/2022 11:07, Jon Doron wrote:

Add all required definitions for hyperv synthetic debugger interface.

Signed-off-by: Jon Doron 
---
 include/hw/hyperv/hyperv-proto.h | 52 
 target/i386/kvm/hyperv-proto.h   | 37 +++
 2 files changed, 89 insertions(+)

diff --git a/include/hw/hyperv/hyperv-proto.h b/include/hw/hyperv/hyperv-proto.h
index 21dc28aee9..94c9658eb0 100644
--- a/include/hw/hyperv/hyperv-proto.h
+++ b/include/hw/hyperv/hyperv-proto.h
@@ -24,12 +24,17 @@
 #define HV_STATUS_INVALID_PORT_ID 17
 #define HV_STATUS_INVALID_CONNECTION_ID   18
 #define HV_STATUS_INSUFFICIENT_BUFFERS19
+#define HV_STATUS_NOT_ACKNOWLEDGED20
+#define HV_STATUS_NO_DATA 27

 /*
  * Hypercall numbers
  */
 #define HV_POST_MESSAGE   0x005c
 #define HV_SIGNAL_EVENT   0x005d
+#define HV_POST_DEBUG_DATA0x0069
+#define HV_RETREIVE_DEBUG_DATA0x006a


s/RETREIVE/RETRIEVE?


Done

+#define HV_RESET_DEBUG_SESSION0x006b
 #define HV_HYPERCALL_FAST (1u << 16)

 /*
@@ -127,4 +132,51 @@ struct hyperv_event_flags_page {
 struct hyperv_event_flags slot[HV_SINT_COUNT];
 };

+/*
+ * Kernel debugger structures
+ */
+
+/* Options flags for hyperv_reset_debug_session */
+#define HV_DEBUG_PURGE_INCOMING_DATA0x0001
+#define HV_DEBUG_PURGE_OUTGOING_DATA0x0002
+struct hyperv_reset_debug_session_input {
+uint32_t options;
+} __attribute__ ((__packed__));
+
+struct hyperv_reset_debug_session_output {
+uint32_t host_ip;
+uint32_t target_ip;
+uint16_t host_port;
+uint16_t target_port;
+uint8_t host_mac[6];
+uint8_t target_mac[6];
+} __attribute__ ((__packed__));
+
+/* Options for hyperv_post_debug_data */
+#define HV_DEBUG_POST_LOOP  0x0001
+
+struct hyperv_post_debug_data_input {
+uint32_t count;
+uint32_t options;



+/*uint8_t data[HV_HYP_PAGE_SIZE - 2 * sizeof(uint32_t)];*/


What is this comment for?


It's a reference for what the data really looks like.
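
For illustration only: one alternative to a commented-out member is a C99
flexible array member, which documents the trailing payload without
changing sizeof() of the header. This is a sketch, not what the patch does;
the struct name and PAGE_SIZE_ASSUMED (standing in for HV_HYP_PAGE_SIZE,
assumed here to be 4096) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for HV_HYP_PAGE_SIZE. */
#define PAGE_SIZE_ASSUMED 4096u

struct post_debug_data_input {
    uint32_t count;
    uint32_t options;
    uint8_t  data[];    /* payload follows the header in the same page */
} __attribute__ ((__packed__));

/* Largest payload that fits in one page after the fixed header. */
static size_t post_debug_max_payload(void)
{
    return PAGE_SIZE_ASSUMED - offsetof(struct post_debug_data_input, data);
}
```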

+} __attribute__ ((__packed__));
+
+struct hyperv_post_debug_data_output {
+uint32_t pending_count;
+} __attribute__ ((__packed__));
+
+/* Options for hyperv_retrieve_debug_data */
+#define HV_DEBUG_RETRIEVE_LOOP  0x0001
+#define HV_DEBUG_RETRIEVE_TEST_ACTIVITY 0x0002
+
+struct hyperv_retrieve_debug_data_input {
+uint32_t count;
+uint32_t options;
+uint64_t timeout;
+} __attribute__ ((__packed__));
+
+struct hyperv_retrieve_debug_data_output {
+uint32_t retrieved_count;
+uint32_t remaining_count;
+} __attribute__ ((__packed__));
 #endif
diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
index 89f81afda7..9480bcdf04 100644
--- a/target/i386/kvm/hyperv-proto.h
+++ b/target/i386/kvm/hyperv-proto.h
@@ -19,6 +19,9 @@
 #define HV_CPUID_ENLIGHTMENT_INFO 0x4004
 #define HV_CPUID_IMPLEMENT_LIMITS 0x4005
 #define HV_CPUID_NESTED_FEATURES  0x400A
+#define HV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS0x4080
+#define HV_CPUID_SYNDBG_INTERFACE   0x4081
+#define HV_CPUID_SYNDBG_PLATFORM_CAPABILITIES   0x4082
 #define HV_CPUID_MIN  0x4005
 #define HV_CPUID_MAX  0x4000
 #define HV_HYPERVISOR_PRESENT_BIT 0x8000
@@ -55,8 +58,14 @@
 #define HV_GUEST_IDLE_STATE_AVAILABLE   (1u << 5)
 #define HV_FREQUENCY_MSRS_AVAILABLE (1u << 8)
 #define HV_GUEST_CRASH_MSR_AVAILABLE(1u << 10)
+#define HV_FEATURE_DEBUG_MSRS_AVAILABLE (1u << 11)
 #define HV_STIMER_DIRECT_MODE_AVAILABLE (1u << 19)

+/*
+ * HV_CPUID_FEATURES.EBX bits
+ */
+#define HV_PARTITION_DEUBGGING_ALLOWED  (1u << 12)

s/DEUBGGING/DEBUGGING

Done

+
 /*
  * HV_CPUID_ENLIGHTMENT_INFO.EAX bits
  */
@@ -72,6 +81,11 @@
 #define HV_ENLIGHTENED_VMCS_RECOMMENDED (1u << 14)
 #define HV_NO_NONARCH_CORESHARING   (1u << 18)

+/*
+ * HV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX bits
+ */
+#define HV_SYNDBG_CAP_ALLOW_KERNEL_DEBUGGING(1u << 1)
+
 /*
  * Basic virtualized MSRs
  */
@@ -130,6 +144,18 @@
 #define HV_X64_MSR_STIMER3_CONFIG   0x40B6
 #define HV_X64_MSR_STIMER3_COUNT0x40B7

+/*
+ * Hyper-V Synthetic debug options MSR
+ */
+#define HV_X64_MSR_SYNDBG_CONTROL   0x40F1
+#define HV_X64_MSR_SYNDBG_STATUS0x40F2
+#define HV_X64_MSR_SYNDBG_SEND_BUFFER   0x40F3
+#define HV_X64_MSR_SYNDBG_RECV_BUFFER   0x40F4
+#define HV_X64_MSR_SYNDBG_PENDING_BUFFER0x40F5
+#define HV_X64_MSR_SYNDBG_OPTIONS   0x40FF
+
+#define HV_X64_SYNDBG_OPTION_USE_HCALLS BIT(2)
+
 /*
  * Guest crash notification MSRs
  */
@@ -168,5 

Re: [PATCH v3 1/3] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-16 Thread David Hildenbrand
> +static DisasJumpType op_sel(DisasContext *s, DisasOps *o)
> +{
> +DisasCompare c;
> +disas_jcc(s, , get_field(s, m4));
> +tcg_gen_movcond_i64(c.cond, o->out, c.u.s64.a, c.u.s64.b,
> +o->in1, o->in2);
> +free_compare();
> +return DISAS_NEXT;
> +}


I realize that SELECT really is mostly identical to LOAD ON CONDITION,
except that we have a second input.

The following on top would unify both


diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index fb482b08b7..493f1d669c 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -781,8 +781,8 @@
 /* SEARCH STRING UNICODE */
 C(0xb9be, SRSTU,   RRE,   ETF3, 0, 0, 0, 0, srstu, 0)
 /* SELECT */
-C(0xb9f0, SELR,RRF_a, MIE3, r2, r3, new, r1_32, sel, 0)
-C(0xb9e3, SELGR,   RRF_a, MIE3, r2, r3, r1, 0, sel, 0)
+C(0xb9f0, SELR,RRF_a, MIE3, r3, r2, new, r1_32, loc, 0)
+C(0xb9e3, SELGR,   RRF_a, MIE3, r3, r2, r1, 0, loc, 0)
 
 /* SET ACCESS */
 C(0xb24e, SAR, RRE,   Z,   0, r2_o, 0, 0, sar, 0)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index d5c536c60a..7805ffe879 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -1528,16 +1528,6 @@ static DisasJumpType op_nxor(DisasContext *s, DisasOps 
*o)
 return DISAS_NEXT;
 }
 
-static DisasJumpType op_sel(DisasContext *s, DisasOps *o)
-{
-DisasCompare c;
-disas_jcc(s, &c, get_field(s, m4));
-tcg_gen_movcond_i64(c.cond, o->out, c.u.s64.a, c.u.s64.b,
-o->in1, o->in2);
-free_compare(&c);
-return DISAS_NEXT;
-}
-
 static DisasJumpType op_ni(DisasContext *s, DisasOps *o)
 {
 o->in1 = tcg_temp_new_i64();
@@ -2998,7 +2988,13 @@ static DisasJumpType op_loc(DisasContext *s, DisasOps *o)
 {
 DisasCompare c;
 
-disas_jcc(s, &c, get_field(s, m3));
+if (have_field(s, m3)) {
+/* LOAD * ON CONDITION */
+disas_jcc(s, &c, get_field(s, m3));
+} else {
+/* SELECT */
+disas_jcc(s, &c, get_field(s, m4));
+}
 
 if (c.is_64) {
 tcg_gen_movcond_i64(c.cond, o->out, c.u.s64.a, c.u.s64.b,


I can spot some advanced magic in op_loc for "!c.is_64".

But even with that change, the SELECT test still crashes for me.

The problematic part is the last selfhrnz() test that makes QEMU crash.

This might be an existing BUG for op_loc already -- or in the TCG backend.

Maybe the disas_jcc/tcg_gen_movcond_i64 generates something unexpected on my 
machine?
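
For illustration only: a plain-C model of why the two instructions can share
one helper once the operands are swapped. It assumes that movcond yields its
first value operand when the condition holds, and that op_loc passes in2
before in1 (which is what the r3, r2 swap in the table above suggests).
The function names are hypothetical and no TCG API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a conditional move: result is v1 if cond holds, else v2. */
static uint64_t movcond(bool cond, uint64_t v1, uint64_t v2)
{
    return cond ? v1 : v2;
}

/* Shared helper in the assumed op_loc operand order: in2 first, then in1. */
static uint64_t loc_helper(bool cond, uint64_t in1, uint64_t in2)
{
    return movcond(cond, in2, in1);
}

/* LOAD ON CONDITION: in1 is the old r1, in2 is r2. */
static uint64_t load_on_condition(bool cond, uint64_t r1, uint64_t r2)
{
    return loc_helper(cond, r1, r2);
}

/* SELECT: in1 is r3, in2 is r2, so the result is cond ? r2 : r3. */
static uint64_t select_insn(bool cond, uint64_t r2, uint64_t r3)
{
    return loc_helper(cond, r3, r2);
}
```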

-- 
Thanks,

David / dhildenb




Re: [PATCH v2 9/9] spapr: implement nested-hv capability for the virtual hypervisor

2022-02-16 Thread Cédric Le Goater

On 2/16/22 11:25, Nicholas Piggin wrote:

This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in, returning from the hcall, when an HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).

Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 


Reviewed-by: Cédric Le Goater 

Some last comments below,



---
  hw/ppc/spapr.c  |  37 +++-
  hw/ppc/spapr_caps.c |  14 +-
  hw/ppc/spapr_hcall.c| 333 
  include/hw/ppc/spapr.h  |  74 ++-
  include/hw/ppc/spapr_cpu_core.h |   5 +
  5 files changed, 452 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6fab70767f..87e68da77f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1270,6 +1270,8 @@ static void emulate_spapr_hypercall(PPCVirtualHypervisor 
*vhyp,
  /* The TCG path should also be holding the BQL at this point */
  g_assert(qemu_mutex_iothread_locked());
  
+g_assert(!vhyp_cpu_in_nested(cpu));

+
  if (msr_pr) {
  hcall_dprintf("Hypercall made with MSR[PR]=1\n");
  env->gpr[3] = H_PRIVILEGE;
@@ -1313,12 +1315,34 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, 
PowerPCCPU *cpu,
 target_ulong lpid, ppc_v3_pate_t *entry)
  {
  SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
+SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
  
-assert(lpid == 0);

+if (!spapr_cpu->in_nested) {
+assert(lpid == 0);
  
-/* Copy PATE1:GR into PATE0:HR */

-entry->dw0 = spapr->patb_entry & PATE0_HR;
-entry->dw1 = spapr->patb_entry;
+/* Copy PATE1:GR into PATE0:HR */
+entry->dw0 = spapr->patb_entry & PATE0_HR;
+entry->dw1 = spapr->patb_entry;
+
+} else {
+uint64_t patb, pats;
+
+assert(lpid != 0);
+
+patb = spapr->nested_ptcr & PTCR_PATB;
+pats = spapr->nested_ptcr & PTCR_PATS;
+
+/* Calculate number of entries */
+pats = 1ull << (pats + 12 - 4);
+if (pats <= lpid) {
+return false;
+}
+
+/* Grab entry */
+patb += 16 * lpid;
+entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
+entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+}
  
  return true;

  }
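
For illustration only: the nested branch of spapr_get_pate() above computes
the number of partition table entries from the size field and then indexes
16-byte entries. The same arithmetic as a standalone sketch; the TOY_ mask
values are assumptions mirroring PTCR_PATB/PTCR_PATS, and pate_addr() is a
hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field masks, standing in for PTCR_PATB and PTCR_PATS. */
#define TOY_PTCR_PATB 0x0FFFFFFFFFFFF000ULL
#define TOY_PTCR_PATS 0x000000000000001FULL

/*
 * Compute the address of the partition table entry for lpid.
 * The table is 2^(12 + PATS) bytes and each entry is 16 bytes, so it
 * holds 2^(PATS + 12 - 4) entries. Returns false if lpid is out of range.
 */
static bool pate_addr(uint64_t ptcr, uint64_t lpid, uint64_t *addr)
{
    uint64_t patb = ptcr & TOY_PTCR_PATB;
    uint64_t pats = ptcr & TOY_PTCR_PATS;
    uint64_t nr_entries = 1ull << (pats + 12 - 4);

    if (nr_entries <= lpid) {
        return false;
    }
    *addr = patb + 16 * lpid;
    return true;
}
```

With a size field of 0 the table is one 4 KiB page and holds 256 entries, so
lpid 255 is the last valid index under these assumptions.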
@@ -4472,7 +4496,9 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
  
  static bool spapr_cpu_in_nested(PowerPCCPU *cpu)

  {
-return false;
+SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+
+return spapr_cpu->in_nested;
  }
  
  static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)

@@ -4584,6 +4610,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
  nc->nmi_monitor_handler = spapr_nmi;
  smc->phb_placement = spapr_phb_placement;
  vhc->cpu_in_nested = spapr_cpu_in_nested;
+vhc->deliver_hv_excp = spapr_exit_nested;
  vhc->hypercall = emulate_spapr_hypercall;
  vhc->hpt_mask = spapr_hpt_mask;
  vhc->map_hptes = spapr_map_hptes;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e2412aaa57..6d74345930 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -444,19 +444,23 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState 
*spapr,
  {
  ERRP_GUARD();
  PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+CPUPPCState *env = &cpu->env;
  
  if (!val) {

  /* capability disabled by default */
  return;
  }
  
-if (tcg_enabled()) {

-error_setg(errp, "No Nested KVM-HV support in TCG");
+if (!(env->insns_flags2 & PPC2_ISA300)) {
+error_setg(errp, "Nested-HV only supported on POWER9 and later");
  error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
-} else if (kvm_enabled()) {
+return;
+}
+
+if (kvm_enabled()) {
  if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
spapr->max_compat_pvr)) {
-error_setg(errp, "Nested KVM-HV only supported on POWER9");
+error_setg(errp, "Nested-HV only supported on POWER9 and later");
  error_append_hint(errp,
"Try appending -machine 
max-cpu-compat=power9\n");
  return;
@@ -464,7 +468,7 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState 
*spapr,
  
  if 

[PATCH v2 1/3] block: Make bdrv_refresh_limits() non-recursive

2022-02-16 Thread Hanna Reitz
bdrv_refresh_limits() recurses down to the node's children.  That does
not seem necessary: When we refresh limits on some node, and then
recurse down and were to change one of its children's BlockLimits, then
that would mean we noticed the changed limits by pure chance.  The fact
that we refresh the parent's limits has nothing to do with it, so the
reason for the change probably happened before this point in time, and
we should have refreshed the limits then.

On the other hand, we do not have infrastructure for noticing that block
limits change after they have been initialized for the first time (this
would require propagating the change upwards to the respective node's
parents), and so evidently we consider this case impossible.

If this case is impossible, then we will not need to recurse down in
bdrv_refresh_limits().  Every node's limits are initialized in
bdrv_open_driver(), and are refreshed whenever its children change.
We want to use the children's limits to get some initial default, but we
can just take them, we do not need to refresh them.

The problem with recursing is that bdrv_refresh_limits() is not atomic.
It begins with zeroing BDS.bl, and only then sets proper, valid limits.
If we do not drain all nodes whose limits are refreshed, then concurrent
I/O requests can encounter invalid request_alignment values and crash
qemu.  Therefore, a recursing bdrv_refresh_limits() requires the whole
subtree to be drained, which is currently not ensured by most callers.

A non-recursive bdrv_refresh_limits() only requires the node in question
to not receive I/O requests, and this is done by most callers in some
way or another:
- bdrv_open_driver() deals with a new node with no parents yet
- bdrv_set_file_or_backing_noperm() acts on a drained node
- bdrv_reopen_commit() acts only on drained nodes
- bdrv_append() should in theory require the node to be drained; in
  practice most callers just lock the AioContext, which should at least
  be enough to prevent concurrent I/O requests from accessing invalid
  limits

So we can resolve the bug by making bdrv_refresh_limits() non-recursive.
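
For illustration only: a toy model of the non-recursive refresh. The parent
zeroes only its own limits and then merges whatever its children currently
store, without refreshing them. The types and names are hypothetical and
much simpler than BlockDriverState/BlockLimits; only request_alignment is
modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct limits {
    uint32_t request_alignment;
};

struct node {
    struct limits bl;           /* cached limits, read by I/O requests */
    uint32_t own_alignment;     /* what this node's driver requires */
    struct node **children;
    size_t nr_children;
};

/* Non-recursive refresh: only n->bl passes through an invalid state. */
static void refresh_limits(struct node *n)
{
    size_t i;

    memset(&n->bl, 0, sizeof(n->bl));   /* transient: alignment is 0 here */
    n->bl.request_alignment = n->own_alignment;

    for (i = 0; i < n->nr_children; i++) {
        /* Take the child's stored limits as they are; do not refresh it. */
        uint32_t child_align = n->children[i]->bl.request_alignment;

        if (child_align > n->bl.request_alignment) {
            n->bl.request_alignment = child_align;
        }
    }
}

/* What an I/O request does with the limit; alignment 0 would be fatal. */
static uint64_t align_down(const struct node *n, uint64_t offset)
{
    assert(n->bl.request_alignment != 0);
    return offset - offset % n->bl.request_alignment;
}
```

In this model a request running on a child never observes the zeroed state
while the parent is refreshed, which is the property the commit message
relies on.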

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437
Signed-off-by: Hanna Reitz 
Reviewed-by: Eric Blake 
---
 block/io.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4e4cb556c5..c3e7301613 100644
--- a/block/io.c
+++ b/block/io.c
@@ -189,10 +189,6 @@ void bdrv_refresh_limits(BlockDriverState *bs, Transaction 
*tran, Error **errp)
 QLIST_FOREACH(c, &bs->children, next) {
 if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | BDRV_CHILD_COW))
 {
-bdrv_refresh_limits(c->bs, tran, errp);
-if (*errp) {
-return;
-}
 bdrv_merge_limits(&bs->bl, &c->bs->bl);
 have_limits = true;
 }
-- 
2.34.1




[PATCH v2 2/3] iotests: Allow using QMP with the QSD

2022-02-16 Thread Hanna Reitz
Add a parameter to optionally open a QMP connection when creating a
QemuStorageDaemon instance.

Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6ba65eb1ff..6027780180 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -39,6 +39,7 @@
 
 from qemu.machine import qtest
 from qemu.qmp import QMPMessage
+from qemu.aqmp.legacy import QEMUMonitorProtocol
 
 # Use this logger for logging messages directly from the iotests module
 logger = logging.getLogger('qemu.iotests')
@@ -348,14 +349,30 @@ def cmd(self, cmd):
 
 
 class QemuStorageDaemon:
-def __init__(self, *args: str, instance_id: str = 'a'):
+_qmp: Optional[QEMUMonitorProtocol] = None
+_qmpsock: Optional[str] = None
+# Python < 3.8 would complain if this type were not a string literal
+# (importing `annotations` from `__future__` would work; but not on <= 3.6)
+_p: 'Optional[subprocess.Popen[bytes]]' = None
+
+def __init__(self, *args: str, instance_id: str = 'a', qmp: bool = False):
 assert '--pidfile' not in args
 self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
 all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]
 
+if qmp:
+self._qmpsock = os.path.join(sock_dir, f'qsd-{instance_id}.sock')
+all_args += ['--chardev',
+ f'socket,id=qmp-sock,path={self._qmpsock}',
+ '--monitor', 'qmp-sock']
+
+self._qmp = QEMUMonitorProtocol(self._qmpsock, server=True)
+
 # Cannot use with here, we want the subprocess to stay around
 # pylint: disable=consider-using-with
 self._p = subprocess.Popen(all_args)
+if self._qmp is not None:
+self._qmp.accept()
 while not os.path.exists(self.pidfile):
 if self._p.poll() is not None:
 cmd = ' '.join(all_args)
@@ -370,11 +387,24 @@ def __init__(self, *args: str, instance_id: str = 'a'):
 
 assert self._pid == self._p.pid
 
+def qmp(self, cmd: str, args: Optional[Dict[str, object]] = None) \
+-> QMPMessage:
+assert self._qmp is not None
+return self._qmp.cmd(cmd, args)
+
 def stop(self, kill_signal=15):
 self._p.send_signal(kill_signal)
 self._p.wait()
 self._p = None
 
+if self._qmp:
+self._qmp.close()
+
+if self._qmpsock is not None:
+try:
+os.remove(self._qmpsock)
+except OSError:
+pass
 try:
 os.remove(self.pidfile)
 except OSError:
-- 
2.34.1




[PATCH v2 0/3] block: Make bdrv_refresh_limits() non-recursive

2022-02-16 Thread Hanna Reitz
Hi,

v1 with detailed reasoning:
https://lists.nongnu.org/archive/html/qemu-block/2022-02/msg00508.html

This series makes bdrv_refresh_limits() non-recursive so that it is
sufficient for callers to ensure that the node on which they call it
will not receive concurrent I/O requests (instead of ensuring the same
for the whole subtree).

We need to ensure such I/O does not happen because bdrv_refresh_limits()
is not atomic and will produce intermediate invalid values, which will
break concurrent I/O requests that read these values.


v2:
- Use separate `try` block to clean up in patch 2 instead of putting the
  `os.remove()` in the existing one (which would cause the second
  `os.remove()` to be skipped if the first one failed)
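
The v2 change above can be illustrated with a minimal sketch. The helper
names (cleanup_chained, cleanup_separately, demo) and the file names are
made up for this illustration and are not part of iotests.py; the sketch
only shows why one shared `try` block skips the second `os.remove()` once
the first one raises.

```python
import os
import tempfile


def cleanup_chained(paths):
    """One shared try block: the first failure skips every later path."""
    try:
        for path in paths:
            os.remove(path)
    except OSError:
        pass


def cleanup_separately(paths):
    """One try block per path: each removal is attempted regardless."""
    for path in paths:
        try:
            os.remove(path)
        except OSError:
            # Best-effort cleanup, ignore missing or unremovable files
            pass


def demo(cleanup):
    """Return True if the pidfile got removed although the socket is missing."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        sock = os.path.join(tmp_dir, 'qsd-a.sock')  # deliberately never created
        pidfile = os.path.join(tmp_dir, 'qsd-a-pid')
        with open(pidfile, 'w'):
            pass
        cleanup([sock, pidfile])
        return not os.path.exists(pidfile)
```

With the chained variant the pidfile is left behind whenever the socket
file is already gone; the per-path variant removes it either way.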


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/3:[----] [--] 'block: Make bdrv_refresh_limits() non-recursive'
002/3:[0005] [FC] 'iotests: Allow using QMP with the QSD'
003/3:[----] [--] 'iotests/graph-changes-while-io: New test'


Hanna Reitz (3):
  block: Make bdrv_refresh_limits() non-recursive
  iotests: Allow using QMP with the QSD
  iotests/graph-changes-while-io: New test

 block/io.c|  4 -
 tests/qemu-iotests/iotests.py | 32 ++-
 .../qemu-iotests/tests/graph-changes-while-io | 91 +++
 .../tests/graph-changes-while-io.out  |  5 +
 4 files changed, 127 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
 create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out

-- 
2.34.1




[PATCH v2 3/3] iotests/graph-changes-while-io: New test

2022-02-16 Thread Hanna Reitz
Test the following scenario:
1. Some block node (null-co) attached to a user (here: NBD server) that
   performs I/O and keeps the node in an I/O thread
2. Repeatedly run blockdev-add/blockdev-del to add/remove an overlay
   to/from that node

Each blockdev-add triggers bdrv_refresh_limits(), and because
blockdev-add runs in the main thread, it does not stop the I/O requests.
I/O can thus happen while the limits are refreshed, and when such a
request sees a temporarily invalid block limit (e.g. alignment is 0),
this may easily crash qemu (or the storage daemon in this case).

The block layer needs to ensure that I/O requests to a node are paused
while that node's BlockLimits are refreshed.

Signed-off-by: Hanna Reitz 
Reviewed-by: Eric Blake 
---
 .../qemu-iotests/tests/graph-changes-while-io | 91 +++
 .../tests/graph-changes-while-io.out  |  5 +
 2 files changed, 96 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
 create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out

diff --git a/tests/qemu-iotests/tests/graph-changes-while-io 
b/tests/qemu-iotests/tests/graph-changes-while-io
new file mode 100755
index 00..567e8cf21e
--- /dev/null
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -0,0 +1,91 @@
+#!/usr/bin/env python3
+# group: rw
+#
+# Test graph changes while I/O is happening
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+from threading import Thread
+import iotests
+from iotests import imgfmt, qemu_img, qemu_img_create, QMPTestCase, \
+QemuStorageDaemon
+
+
+top = os.path.join(iotests.test_dir, 'top.img')
+nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+
+
+def do_qemu_img_bench() -> None:
+"""
+Do some I/O requests on `nbd_sock`.
+"""
+assert qemu_img('bench', '-f', 'raw', '-c', '200',
+f'nbd+unix:///node0?socket={nbd_sock}') == 0
+
+
+class TestGraphChangesWhileIO(QMPTestCase):
+def setUp(self) -> None:
+# Create an overlay that can be added at runtime on top of the
+# null-co block node that will receive I/O
+assert qemu_img_create('-f', imgfmt, '-F', 'raw', '-b', 'null-co://',
+   top) == 0
+
+# QSD instance with a null-co block node in an I/O thread,
+# exported over NBD (on `nbd_sock`, export name "node0")
+self.qsd = QemuStorageDaemon(
+'--object', 'iothread,id=iothread0',
+'--blockdev', 'null-co,node-name=node0,read-zeroes=true',
+'--nbd-server', f'addr.type=unix,addr.path={nbd_sock}',
+'--export', 'nbd,id=exp0,node-name=node0,iothread=iothread0,' +
+'fixed-iothread=true,writable=true',
+qmp=True
+)
+
+def tearDown(self) -> None:
+self.qsd.stop()
+
+def test_blockdev_add_while_io(self) -> None:
+# Run qemu-img bench in the background
+bench_thr = Thread(target=do_qemu_img_bench)
+bench_thr.start()
+
+# While qemu-img bench is running, repeatedly add and remove an
+# overlay to/from node0
+while bench_thr.is_alive():
+result = self.qsd.qmp('blockdev-add', {
+'driver': imgfmt,
+'node-name': 'overlay',
+'backing': 'node0',
+'file': {
+'driver': 'file',
+'filename': top
+}
+})
+self.assert_qmp(result, 'return', {})
+
+result = self.qsd.qmp('blockdev-del', {
+'node-name': 'overlay'
+})
+self.assert_qmp(result, 'return', {})
+
+bench_thr.join()
+
+if __name__ == '__main__':
+# Format must support raw backing files
+iotests.main(supported_fmts=['qcow', 'qcow2', 'qed'],
+ supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out 
b/tests/qemu-iotests/tests/graph-changes-while-io.out
new file mode 100644
index 00..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK
-- 
2.34.1
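
The control flow of the test above (start an I/O load in a thread, then
keep changing state from the main thread until the load finishes) can be
sketched without QEMU. Everything below is a stand-in chosen for this
illustration: run_while_alive, fake_io_load and fake_graph_change are
hypothetical names, and the "I/O load" simply waits for three "graph
changes" so that the example terminates deterministically.

```python
from threading import Event, Thread


def run_while_alive(background, action):
    """Call `action` repeatedly for as long as `background` is running.

    Returns how many times `action` ran.
    """
    thr = Thread(target=background)
    thr.start()
    calls = 0
    while thr.is_alive():
        action()
        calls += 1
    thr.join()
    return calls


# Hypothetical stand-ins for qemu-img bench and blockdev-add/blockdev-del
enough_changes = Event()
changes = []


def fake_io_load():
    # Blocks until the main thread has performed enough "graph changes"
    enough_changes.wait()


def fake_graph_change():
    changes.append('add+del')
    if len(changes) >= 3:
        enough_changes.set()


total = run_while_alive(fake_io_load, fake_graph_change)
```

As in the real test, the loop body may run a few extra times after the
background work has finished but before is_alive() reports it.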




Re: [PATCH 3/3] x86: Switch to q35 as the default machine type

2022-02-16 Thread Gerd Hoffmann
  Hi,
 
> Given the semantic differences from 'i440fx', changing the default
> machine type has effects that are equivalent to breaking command
> line syntax compatibility, which is something we've always tried
> to avoid.

And if we are fine breaking backward compatibility I'd rather *not* pick
a default, effectively making -M $something mandatory, similar to arm.

take care,
  Gerd




Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID

2022-02-16 Thread Gavin Shan

On 2/15/22 4:32 PM, Andrew Jones wrote:

On Tue, Feb 15, 2022 at 04:19:01PM +0800, Gavin Shan wrote:

The issue isn't directly related to CPU topology. It's related to the
fact that a default NUMA node ID is picked for a CPU if the user doesn't
explicitly provide the associated NUMA node ID. So it's about the
CPU-to-NUMA association.

For example, without the code change in this patch, the CPU-to-NUMA
association breaks the socket boundary when the guest is booted with
command lines like the one below. With this patch applied, the CPU-to-NUMA
association follows the socket boundary, which keeps the Linux guest happy.


Gavin,

Please look at Igor's request for more information. Are we sure that a
socket is a NUMA node boundary? Are we sure we can assume an even
distribution for sockets to nodes or nodes to sockets? If so, where is
that documented?



Yes, I was investigating the code for Igor's questions, but I hadn't
reached a conclusion when I replied to Yanan. I will reply to Igor's
thread, and let's discuss it there.

Thanks,
Gavin





Re: [PATCH v3 1/3] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-16 Thread David Hildenbrand
On 16.02.22 11:31, David Hildenbrand wrote:
>> +static DisasJumpType op_sel(DisasContext *s, DisasOps *o)
>> +{
>> +DisasCompare c;
>> +disas_jcc(s, &c, get_field(s, m4));
>> +tcg_gen_movcond_i64(c.cond, o->out, c.u.s64.a, c.u.s64.b,
>> +o->in1, o->in2);
>> +free_compare(&c);
>> +return DISAS_NEXT;
>> +}
> 
> 
> I realize that SELECT really is mostly identical to LOAD ON CONDITION,
> except that we have a second input.
> 
> The following on top would unify both
> 
> 
> diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
> index fb482b08b7..493f1d669c 100644
> --- a/target/s390x/tcg/insn-data.def
> +++ b/target/s390x/tcg/insn-data.def
> @@ -781,8 +781,8 @@
>  /* SEARCH STRING UNICODE */
>  C(0xb9be, SRSTU,   RRE,   ETF3, 0, 0, 0, 0, srstu, 0)
>  /* SELECT */
> -C(0xb9f0, SELR,RRF_a, MIE3, r2, r3, new, r1_32, sel, 0)
> -C(0xb9e3, SELGR,   RRF_a, MIE3, r2, r3, r1, 0, sel, 0)
> +C(0xb9f0, SELR,RRF_a, MIE3, r3, r2, new, r1_32, loc, 0)
> +C(0xb9e3, SELGR,   RRF_a, MIE3, r3, r2, r1, 0, loc, 0)

I forgot SELECT HIGH, requires similar adjustment.

-- 
Thanks,

David / dhildenb
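
Why swapping r2/r3 lets SELECT reuse the LOAD ON CONDITION helper can be
shown with a small model. This is a sketch, not QEMU code: movcond mimics
"result = first value if the condition holds, else second value", op_sel
follows the operand order of the quoted op_sel(), and the operand order of
op_loc (in2 first, then in1) is an assumption inferred from the swapped
r3, r2 in the proposed table entries.

```python
def movcond(cond, if_true, if_false):
    """Model of a conditional move: pick if_true when cond holds."""
    return if_true if cond else if_false


def op_sel(cond, in1, in2):
    # Operand order as in the quoted op_sel(): in1 first, then in2
    return movcond(cond, in1, in2)


def op_loc(cond, in1, in2):
    # ASSUMPTION: the LOAD ON CONDITION helper passes in2 first, then in1
    return movcond(cond, in2, in1)


def selr_original(cond, r2, r3):
    # Table entry "r2, r3, ..., sel": in1 = r2, in2 = r3
    return op_sel(cond, in1=r2, in2=r3)


def selr_unified(cond, r2, r3):
    # Table entry "r3, r2, ..., loc": in1 = r3, in2 = r2
    return op_loc(cond, in1=r3, in2=r2)
```

Under that assumption both variants return r2 when the condition holds
and r3 otherwise, so the dedicated op_sel() helper becomes redundant.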




Re: [PATCH v3 3/3] s390x/tcg/tests: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-16 Thread David Hildenbrand
On 15.02.22 21:27, David Miller wrote:
> tests/tcg/s390x/mie3-compl.c: [N]*K instructions
> tests/tcg/s390x/mie3-mvcrl.c: MVCRL instruction
> tests/tcg/s390x/mie3-sel.c:  SELECT instruction
> 

(I know, a lot of mails from my side :) )

1. I think we usually use the prefix in the subject "tests/tcg/s390x: "

2. Make sure the patches are checkpatch clean as far as possible:

For this patch:

$ rm *.patch
$ git format-patch -1
$ ./scripts/checkpatch.pl *.patch
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#45:
new file mode 100644

ERROR: spaces required around that ':' (ctx:VxW)
#53: FILE: tests/tcg/s390x/mie3-compl.c:4:
+#define F_EPI "stg %%r0, %[res] ": [res] "+m" (res) : : "r0", "r2", "r3"
  ^

ERROR: unnecessary whitespace before a quoted newline
#57: FILE: tests/tcg/s390x/mie3-compl.c:8:
+"lg %%r2, %[a] \n" \

ERROR: space prohibited before that close parenthesis ')'
#61: FILE: tests/tcg/s390x/mie3-compl.c:12:
+: "r2", "r3" )

ERROR: unnecessary whitespace before a quoted newline
#67: FILE: tests/tcg/s390x/mie3-compl.c:18:
+FbinOp(_ncrk,  asm("ncrk  %%r0, %%r3, %%r2 \n" F_EPI))

ERROR: unnecessary whitespace before a quoted newline
#68: FILE: tests/tcg/s390x/mie3-compl.c:19:
+FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2 \n" F_EPI))

...

-- 
Thanks,

David / dhildenb




Re: [PATCH v3 3/3] s390x/tcg/tests: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-16 Thread David Hildenbrand
On 16.02.22 10:57, David Hildenbrand wrote:
> On 15.02.22 21:27, David Miller wrote:
>> tests/tcg/s390x/mie3-compl.c: [N]*K instructions
>> tests/tcg/s390x/mie3-mvcrl.c: MVCRL instruction
>> tests/tcg/s390x/mie3-sel.c:  SELECT instruction
>>
>> Signed-off-by: David Miller 
>> ---
>>   tests/tcg/s390x/Makefile.target |  2 +-
>>   tests/tcg/s390x/mie3-compl.c| 56 +
>>   tests/tcg/s390x/mie3-mvcrl.c| 31 ++
>>   tests/tcg/s390x/mie3-sel.c  | 42 +
>>   4 files changed, 130 insertions(+), 1 deletion(-)
>>   create mode 100644 tests/tcg/s390x/mie3-compl.c
>>   create mode 100644 tests/tcg/s390x/mie3-mvcrl.c
>>   create mode 100644 tests/tcg/s390x/mie3-sel.c
>>
>> diff --git a/tests/tcg/s390x/Makefile.target 
>> b/tests/tcg/s390x/Makefile.target
>> index 1a7238b4eb..16b9d45307 100644
>> --- a/tests/tcg/s390x/Makefile.target
>> +++ b/tests/tcg/s390x/Makefile.target
>> @@ -1,6 +1,6 @@
>>   S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
>>   VPATH+=$(S390X_SRC)
>> -CFLAGS+=-march=zEC12 -m64
>> +CFLAGS+=-march=z15 -m64
>>   TESTS+=hello-s390x
>>   TESTS+=csst
>>   TESTS+=ipm
> 
> Your patch is missing the following hunk:
> 
> diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
> index 16b9d45307..54e67446aa 100644
> --- a/tests/tcg/s390x/Makefile.target
> +++ b/tests/tcg/s390x/Makefile.target
> @@ -7,6 +7,9 @@ TESTS+=ipm
>  TESTS+=exrl-trt
>  TESTS+=exrl-trtr
>  TESTS+=pack
> +TESTS+=mie3-compl
> +TESTS+=mie3-mvcrl
> +TESTS+=mie3-sel
>  TESTS+=mvo
>  TESTS+=mvc
>  TESTS+=shift
> 
> 
> With debian11, I can build the tests. However, mie3-sel seems to have an 
> issue:
> 
> 
>   TEST    mie3-compl on s390x
>   TEST    mie3-mvcrl on s390x
>   TEST    mie3-sel on s390x
> timeout: the monitored command dumped core
> make[3]: *** [../Makefile.target:156: run-mie3-sel] Error 132
> make[2]: *** [/home/dhildenb/git/qemu/tests/tcg/Makefile.qemu:102: 
> run-guest-tests] Error 2
> make[1]: *** [/home/dhildenb/git/qemu/tests/Makefile.include:59: 
> run-tcg-tests-s390x-linux-user] Error 2
> make[1]: Leaving directory '/home/dhildenb/git/qemu/build'
> make: *** [GNUmakefile:11: run-tcg-tests-s390x-linux-user] Error 2
> 
> qemu-s390x gets killed via
> 
> "Program terminated with signal SIGILL, Illegal instruction."
> 

Fault on my end, I forgot to copy the "SELECT HIGH" instruction when
manually applying each hunk.

With that, tests run fine under debian11.


-- 
Thanks,

David / dhildenb




Re: [PATCH 3/3] x86: Switch to q35 as the default machine type

2022-02-16 Thread Dr. David Alan Gilbert
* Gerd Hoffmann (kra...@redhat.com) wrote:
>   Hi,
>  
> > Given the semantic differences from 'i440fx', changing the default
> > machine type has effects that are equivalent to breaking command
> > line syntax compatibility, which is something we've always tried
> > to avoid.
> 
> And if we are fine breaking backward compatibility I'd rather *not* pick
> a default, effectively making -M $something mandatory, similar to arm.

Oh, that's probably easy to do;  what are other people's thoughts on
that?

Dave

> take care,
>   Gerd
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [RFC 4/8] ioregionfd: Introduce IORegionDFObject type

2022-02-16 Thread Stefan Hajnoczi
On Tue, Feb 15, 2022 at 10:18:12AM -0800, Elena wrote:
> On Mon, Feb 14, 2022 at 02:37:21PM +, Stefan Hajnoczi wrote:
> > On Mon, Feb 07, 2022 at 11:22:18PM -0800, Elena Ufimtseva wrote:
> > > Signed-off-by: Elena Ufimtseva 
> > > ---
> > >  meson.build|  15 ++-
> > >  qapi/qom.json  |  32 +-
> > >  include/hw/remote/ioregionfd.h |  40 +++
> > >  hw/remote/ioregionfd.c | 196 +
> > >  Kconfig.host   |   3 +
> > >  MAINTAINERS|   2 +
> > >  hw/remote/Kconfig  |   4 +
> > >  hw/remote/meson.build  |   1 +
> > >  meson_options.txt  |   2 +
> > >  scripts/meson-buildoptions.sh  |   3 +
> > >  10 files changed, 294 insertions(+), 4 deletions(-)
> > >  create mode 100644 include/hw/remote/ioregionfd.h
> > >  create mode 100644 hw/remote/ioregionfd.c
> > > 
> > > diff --git a/meson.build b/meson.build
> > > index 96de1a6ef9..6483e754bd 100644
> > > --- a/meson.build
> > > +++ b/meson.build
> > > @@ -258,6 +258,17 @@ if targetos != 'linux' and 
> > > get_option('multiprocess').enabled()
> > >  endif
> > >  multiprocess_allowed = targetos == 'linux' and not 
> > > get_option('multiprocess').disabled()
> > >  
> > > +# TODO: drop this limitation
> > 
> > What is the reason for the limitation?
> >
> 
> The idea is to limit use of this acceleration until the API is more
> generic and does not need multiprocess.

Please document that intention so readers understand why a limitation
is in place.

Thanks,
Stefan




Re: [PATCH v3 1/3] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-16 Thread David Hildenbrand
>   /* Really format SS_b, but we pack both lengths into one argument
> @@ -735,6 +753,9 @@
>   /* PACK UNICODE */
>   C(0xe100, PKU, SS_f,  E2,  la1, a2, 0, 0, pku, 0)
>   +/* POPULATION COUNT */
> +C(0xb9e1, POPCNT,  RRE,   PC,  0, r2_o, r1, 0, popcnt, nz64)

You actually need RRF_c instead of RRE.

Otherwise QEMU aborts when the guest executes POPCNT as RRE does not
include the m3 field.


-- 
Thanks,

David / dhildenb
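
The difference between the two instruction formats can be sketched as
follows. This is an illustration only: the field positions follow the
z/Architecture RRE and RRF-c layouts (32-bit instruction, 16-bit opcode,
R1 and R2 in the lowest two nibbles, M3 in bits 16-19 counted from the
most significant bit), the function names are made up, and the operand
values in the sample word are arbitrary.

```python
def decode_rre(insn):
    """RRE: 16-bit opcode, 8 unused bits, then R1 and R2."""
    return {
        'op': (insn >> 16) & 0xffff,
        'r1': (insn >> 4) & 0xf,
        'r2': insn & 0xf,
    }


def decode_rrf_c(insn):
    """RRF-c: like RRE, but the nibble after the opcode holds M3."""
    fields = decode_rre(insn)
    fields['m3'] = (insn >> 12) & 0xf
    return fields


# POPCNT (opcode 0xb9e1) with m3=8, r1=1, r2=2
insn = 0xb9e18012
```

A decoder that treats POPCNT as RRE never extracts m3, so any later code
that asks for that field has nothing to read.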




Re: [RFC 0/8] ioregionfd introduction

2022-02-16 Thread Stefan Hajnoczi
On Tue, Feb 15, 2022 at 10:16:04AM -0800, Elena wrote:
> On Mon, Feb 14, 2022 at 02:52:29PM +, Stefan Hajnoczi wrote:
> > On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> > > This patchset is an RFC version for the ioregionfd implementation
> > > in QEMU. The kernel patches are to be posted with some fixes as a v4.
> > > 
> > > For this implementation version 3 of the posted kernel patches was used:
> > > https://lore.kernel.org/kvm/cover.1613828726.git.eafanas...@gmail.com/
> > > 
> > > The future version will include support for vfio/libvfio-user.
> > > Please refer to the design discussion here proposed by Stefan:
> > > https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> > > 
> > > The vfio-user version needed some bug-fixing and it was decided to send
> > > this for multiprocess first.
> > > 
> > > The ioregionfd is currently configured through the command line and each
> > > ioregionfd represents an object. This allows for easy parsing and does
> > > not require device/remote object command line option modifications.
> > > 
> > > The following command line can be used to specify ioregionfd:
> > > 
> > >   '-object', 
> > > 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> > >   '-object', 
> > > 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> > >   '-object', 
> > > 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
> > 
> 
> Hi Stefan
> 
> Thank you for taking a look!
> 
> > Explicit configuration of ioregionfd-object is okay for early
> > prototyping, but what is the plan for integrating this? I guess
> > x-remote-object would query the remote device to find out which
> > ioregionfds need to be registered and the user wouldn't need to specify
> > ioregionfds on the command-line?
> 
> Yes, this can be done. For some reason I thought that the user would be
> able to configure the number/size of the regions to be configured as
> ioregionfds.
> 
> > 
> > > 
> > > 
> > > Proxy side of ioregionfd in this version uses only one file descriptor:
> > > 
> > >   '-device', 
> > > 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()),
> > >  \
> > > 
> > 
> > This raises the question of the ioregionfd file descriptor lifecycle. In
> > the end I think it shouldn't be specified on the command-line. Instead
> > the remote device should create it and pass it to QEMU over the
> > mpqemu/remote fd?
> 
> Yes, this will be the same as what vfio-user does.
> > 
> > > 
> > > This is done for the RFC version and my thought was that the next
> > > version will be for vfio-user, so I have not dedicated much effort to
> > > these command line options.
> > > 
> > > The multiprocess messaging protocol was extended to support inquiries
> > > by the proxy about whether a device has any ioregionfds.
> > > This RFC implements inquiries by the proxy about whether a BAR is an
> > > ioregionfd BAR and about its type (memory/io).
> > > 
> > > Currently there are a few limitations in this version of ioregionfd.
> > >  - one ioregionfd per bar, only full bar size is supported;
> > >  - one file descriptor per device for all of its ioregionfds;
> > >  - each remote device runs fd handler for all its BARs in one IOThread;
> > >  - proxy supports only one fd.
> > > 
> > > Some of these limitations will be dropped in the future version.
> > > This RFC is to acquire the feedback/suggestions from the community
> > > on the general approach.
> > > 
> > > The quick performance test was done for the remote lsi device with
> > > ioregionfd and without for both mem BARs (1 and 2) with help
> > > of the fio tool:
> > > 
> > > Random R/W:
> > > 
> > >                read IOPS  read BW    write IOPS  write BW
> > > no ioregionfd  889        3559KiB/s  890         3561KiB/s
> > > ioregionfd     938        3756KiB/s  939         3757KiB/s
> > 
> > This is extremely slow, even for random I/O. How does this compare to
> > QEMU running the LSI device without multi-process mode?
> 
> These tests had the iodepth=256. I have changed this to 1 and tested
> without multiprocess, with multiprocess and multiprocess with both mmio
> regions as ioregionfds:
> 
>                          read IOPS  read BW (KiB/s)  write IOPS  write BW (KiB/s)
> no multiprocess          89         358              90          360
> multiprocess             138        556              139         557
> multiprocess ioregionfd  174        698              173         693
> 
> The fio config for randomrw:
> [global]
> bs=4K
> iodepth=1
> direct=0

Please set direct=1 so the guest page cache does not affect the I/O
pattern.

The host --drive option also needs cache.direct=on to avoid host page
cache effects.

The reason for benchmarking with direct=1 is to ensure that every I/O
request submitted by fio is forwarded to the underlying disk. Otherwise
the benchmark may be comparing guest page cache or host page cache hits,
which do not 

Re: [PATCH 3/3] x86: Switch to q35 as the default machine type

2022-02-16 Thread Daniel P . Berrangé
On Wed, Feb 16, 2022 at 11:01:24AM +, Dr. David Alan Gilbert wrote:
> * Gerd Hoffmann (kra...@redhat.com) wrote:
> >   Hi,
> >  
> > > Given the semantic differences from 'i440fx', changing the default
> > > machine type has effects that are equivalent to breaking command
> > > line syntax compatibility, which is something we've always tried
> > > to avoid.
> > 
> > And if we are fine breaking backward compatibility I'd rather *not* pick
> > a default, effectively making -M $something mandatory, similar to arm.
> 
> Oh, that's probably easy to do;  what are other people's thoughts on
> that?

On the libvirt side it won't matter & will have no effect. Libvirt ignores
QEMU defaults and explicitly sets 'pc' as the default, so that our users
are protected against QEMU changes in defaults that could break app usage.

We would of course suggest that all apps using libvirt explicitly pick
a machine type they want, but if they don't, we'll set it for them and
guarantee that default won't change as long as the machine type exists
in QEMU.

IOW, whether QEMU selects 'q35' or nothing as the default
machine, libvirt will continue to set 'pc' as the default in the absence
of an explicit choice by the mgmt app.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PULL v2 27/35] hw/riscv: virt: Use AIA INTC compatible string when available

2022-02-16 Thread Alistair Francis
From: Anup Patel 

We should use the AIA INTC compatible string in the CPU INTC
DT nodes when the CPUs support AIA feature. This will allow
Linux INTC driver to use AIA local interrupt CSRs.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Frank Chang 
Message-id: 20220204174700.534953-17-a...@brainfault.org
Signed-off-by: Alistair Francis 
---
 hw/riscv/virt.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 2643c8bc37..e3068d6126 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -212,8 +212,17 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
 qemu_fdt_add_subnode(mc->fdt, intc_name);
 qemu_fdt_setprop_cell(mc->fdt, intc_name, "phandle",
 intc_phandles[cpu]);
-qemu_fdt_setprop_string(mc->fdt, intc_name, "compatible",
-"riscv,cpu-intc");
+if (riscv_feature(&s->soc[socket].harts[cpu].env,
+  RISCV_FEATURE_AIA)) {
+static const char * const compat[2] = {
+"riscv,cpu-intc-aia", "riscv,cpu-intc"
+};
+qemu_fdt_setprop_string_array(mc->fdt, intc_name, "compatible",
+  (char **)&compat, ARRAY_SIZE(compat));
+} else {
+qemu_fdt_setprop_string(mc->fdt, intc_name, "compatible",
+"riscv,cpu-intc");
+}
 qemu_fdt_setprop(mc->fdt, intc_name, "interrupt-controller", NULL, 0);
 qemu_fdt_setprop_cell(mc->fdt, intc_name, "#interrupt-cells", 1);
 
-- 
2.34.1
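
What the two-entry "compatible" property looks like in the flattened
device tree can be sketched as follows. A devicetree string-list property
is encoded as the concatenation of its NUL-terminated strings; the helper
name fdt_string_array is made up for this illustration and is not the QEMU
function of a similar name.

```python
def fdt_string_array(strings):
    """Encode a devicetree string list: each string followed by a NUL byte."""
    return b''.join(s.encode('ascii') + b'\0' for s in strings)


# Most specific entry first, generic fallback last
aia_compat = fdt_string_array(['riscv,cpu-intc-aia', 'riscv,cpu-intc'])
plain_compat = fdt_string_array(['riscv,cpu-intc'])
```

Keeping "riscv,cpu-intc" as the second entry means a guest that does not
know the AIA string can still match the node through the generic one.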




[PULL v2 30/35] target/riscv: Ignore reserved bits in PTE for RV64

2022-02-16 Thread Alistair Francis
From: Guo Ren 

The highest bits of the PTE are used by Svpbmt (ref: [1], [2]), so we
need to ignore them. They cannot be part of the PPN.

1: The RISC-V Instruction Set Manual, Volume II: Privileged Architecture
   4.4 Sv39: Page-Based 39-bit Virtual-Memory System
   4.5 Sv48: Page-Based 48-bit Virtual-Memory System

2: https://github.com/riscv/virtual-memory/blob/main/specs/663-Svpbmt-diff.pdf

Signed-off-by: Guo Ren 
Reviewed-by: Liu Zhiwei 
Reviewed-by: Alistair Francis 
Cc: Bin Meng 
Reviewed-by: Alistair Francis 
Message-Id: <20220204022658.18097-2-liwei...@iscas.ac.cn>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h| 15 +++
 target/riscv/cpu_bits.h   |  3 +++
 target/riscv/cpu_helper.c | 13 -
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7ecb1387dd..cefccb4016 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -359,6 +359,8 @@ struct RISCVCPUConfig {
 bool ext_counters;
 bool ext_ifencei;
 bool ext_icsr;
+bool ext_svnapot;
+bool ext_svpbmt;
 bool ext_zfh;
 bool ext_zfhmin;
 bool ext_zve32f;
@@ -558,6 +560,19 @@ static inline int riscv_cpu_xlen(CPURISCVState *env)
 return 16 << env->xl;
 }
 
+#ifdef TARGET_RISCV32
+#define riscv_cpu_sxl(env)  ((void)(env), MXL_RV32)
+#else
+static inline RISCVMXL riscv_cpu_sxl(CPURISCVState *env)
+{
+#ifdef CONFIG_USER_ONLY
+return env->misa_mxl;
+#else
+return get_field(env->mstatus, MSTATUS64_SXL);
+#endif
+}
+#endif
+
 /*
  * Encode LMUL to lmul as follows:
  * LMULvlmullmul
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 068c4d8034..b3489cbc10 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -565,6 +565,9 @@ typedef enum {
 /* Page table PPN shift amount */
 #define PTE_PPN_SHIFT   10
 
+/* Page table PPN mask */
+#define PTE_PPN_MASK            0x3FFFFFFFFFFC00ULL
+
 /* Leaf page shift amount */
 #define PGSHIFT 12
 
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 430060dcd8..7df4569526 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -751,6 +751,8 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
 MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
 int mode = mmu_idx & TB_FLAGS_PRIV_MMU_MASK;
 bool use_background = false;
+hwaddr ppn;
+RISCVCPU *cpu = env_archcpu(env);
 
 /*
  * Check if we should use the background registers for the two
@@ -919,7 +921,16 @@ restart:
 return TRANSLATE_FAIL;
 }
 
-hwaddr ppn = pte >> PTE_PPN_SHIFT;
+if (riscv_cpu_sxl(env) == MXL_RV32) {
+ppn = pte >> PTE_PPN_SHIFT;
+} else if (cpu->cfg.ext_svpbmt || cpu->cfg.ext_svnapot) {
+ppn = (pte & (target_ulong)PTE_PPN_MASK) >> PTE_PPN_SHIFT;
+} else {
+ppn = pte >> PTE_PPN_SHIFT;
+if ((pte & ~(target_ulong)PTE_PPN_MASK) >> PTE_PPN_SHIFT) {
+return TRANSLATE_FAIL;
+}
+}
 
 if (!(pte & PTE_V)) {
 /* Invalid PTE */
-- 
2.34.1
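
The PPN selection logic of the patch above can be modelled in a few lines.
This is a sketch under stated assumptions, not the QEMU implementation:
the mask value assumes the Sv39/Sv48 PTE layout with the PPN in bits 53:10,
the function name pte_to_ppn is made up, and None stands in for
TRANSLATE_FAIL.

```python
PTE_PPN_SHIFT = 10
# ASSUMPTION: PPN occupies PTE bits 53:10 (Sv39/Sv48 layout)
PTE_PPN_MASK = 0x3FFFFFFFFFFC00


def pte_to_ppn(pte, rv32, svpbmt_or_svnapot):
    """Return the PPN of a PTE, or None where the patch returns TRANSLATE_FAIL."""
    if rv32:
        # RV32 PTEs have no reserved high bits above the PPN
        return pte >> PTE_PPN_SHIFT
    if svpbmt_or_svnapot:
        # High bits carry extension data, so mask them out of the PPN
        return (pte & PTE_PPN_MASK) >> PTE_PPN_SHIFT
    if (pte & ~PTE_PPN_MASK) >> PTE_PPN_SHIFT:
        # Reserved high bits set without the extensions: reject the PTE
        return None
    return pte >> PTE_PPN_SHIFT


# Valid bit, PPN 0x12345, plus bit 62 (one of the bits used by Svpbmt)
pte_with_high_bit = (1 << 62) | (0x12345 << PTE_PPN_SHIFT) | 0x1
pte_plain = (0x12345 << PTE_PPN_SHIFT) | 0x1
```

The same PTE therefore translates when one of the extensions is enabled
and is rejected when neither is.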




Re: [PATCH v1] hw: riscv: opentitan: fixup SPI addresses

2022-02-16 Thread Alistair Francis
On Wed, Feb 16, 2022 at 4:23 PM Alistair Francis
 wrote:
>
> From: Wilfred Mallawa 
>
> This patch updates the SPI_DEVICE, SPI_HOST0, SPI_HOST1
> base addresses. Also adds these as unimplemented devices.
>
> The address references can be found [1].
>
> [1] 
> https://github.com/lowRISC/opentitan/blob/6c317992fbd646818b34f2a2dbf44bc850e461e4/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h#L107
>
> Signed-off-by: Wilfred Mallawa 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/opentitan.c | 12 +---
>  include/hw/riscv/opentitan.h |  4 +++-
>  2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
> index aec7cfa33f..596b518a26 100644
> --- a/hw/riscv/opentitan.c
> +++ b/hw/riscv/opentitan.c
> @@ -33,8 +33,10 @@ static const MemMapEntry ibex_memmap[] = {
>  [IBEX_DEV_RAM] ={  0x1000,  0x1 },
>  [IBEX_DEV_FLASH] =  {  0x2000,  0x8 },
>  [IBEX_DEV_UART] =   {  0x4000,  0x1000  },
> +[IBEX_DEV_SPI_HOST0] =  {  0x4030,  0x1000  },
> +[IBEX_DEV_SPI_HOST1] =  {  0x4031,  0x1000  },
>  [IBEX_DEV_GPIO] =   {  0x4004,  0x1000  },
> -[IBEX_DEV_SPI] ={  0x4005,  0x1000  },
> +[IBEX_DEV_SPI_DEVICE] = {  0x4005,  0x1000  },
>  [IBEX_DEV_I2C] ={  0x4008,  0x1000  },
>  [IBEX_DEV_PATTGEN] ={  0x400e,  0x1000  },
>  [IBEX_DEV_TIMER] =  {  0x4010,  0x1000  },
> @@ -209,8 +211,12 @@ static void lowrisc_ibex_soc_realize(DeviceState 
> *dev_soc, Error **errp)
>
>  create_unimplemented_device("riscv.lowrisc.ibex.gpio",
>  memmap[IBEX_DEV_GPIO].base, memmap[IBEX_DEV_GPIO].size);
> -create_unimplemented_device("riscv.lowrisc.ibex.spi",
> -memmap[IBEX_DEV_SPI].base, memmap[IBEX_DEV_SPI].size);
> +create_unimplemented_device("riscv.lowrisc.ibex.spi_device",
> +memmap[IBEX_DEV_SPI_DEVICE].base, memmap[IBEX_DEV_SPI_DEVICE].size);
> +create_unimplemented_device("riscv.lowrisc.ibex.spi_host0",
> +memmap[IBEX_DEV_SPI_HOST0].base, memmap[IBEX_DEV_SPI_HOST0].size);
> +create_unimplemented_device("riscv.lowrisc.ibex.spi_host1",
> +memmap[IBEX_DEV_SPI_HOST1].base, memmap[IBEX_DEV_SPI_HOST1].size);
>  create_unimplemented_device("riscv.lowrisc.ibex.i2c",
>  memmap[IBEX_DEV_I2C].base, memmap[IBEX_DEV_I2C].size);
>  create_unimplemented_device("riscv.lowrisc.ibex.pattgen",
> diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h
> index eac35ef590..00da9ded43 100644
> --- a/include/hw/riscv/opentitan.h
> +++ b/include/hw/riscv/opentitan.h
> @@ -57,8 +57,10 @@ enum {
>  IBEX_DEV_FLASH,
>  IBEX_DEV_FLASH_VIRTUAL,
>  IBEX_DEV_UART,
> +IBEX_DEV_SPI_DEVICE,
> +IBEX_DEV_SPI_HOST0,
> +IBEX_DEV_SPI_HOST1,
>  IBEX_DEV_GPIO,
> -IBEX_DEV_SPI,
>  IBEX_DEV_I2C,
>  IBEX_DEV_PATTGEN,
>  IBEX_DEV_TIMER,
> --
> 2.35.1
>



[PULL v2 25/35] target/riscv: Implement AIA xiselect and xireg CSRs

2022-02-16 Thread Alistair Francis
From: Anup Patel 

The AIA specification defines [m|s|vs]iselect and [m|s|vs]ireg CSRs
which allow indirect access to interrupt priority arrays and per-HART
IMSIC registers. This patch implements AIA xiselect and xireg CSRs.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Frank Chang 
Message-id: 20220204174700.534953-15-a...@brainfault.org
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h |   7 ++
 target/riscv/csr.c | 177 +
 target/riscv/machine.c |   3 +
 3 files changed, 187 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index f0e69f2871..c70de10c85 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -196,6 +196,10 @@ struct CPURISCVState {
 uint8_t miprio[64];
 uint8_t siprio[64];
 
+/* AIA CSRs */
+target_ulong miselect;
+target_ulong siselect;
+
 /* Hypervisor CSRs */
 target_ulong hstatus;
 target_ulong hedeleg;
@@ -229,6 +233,9 @@ struct CPURISCVState {
 target_ulong vstval;
 target_ulong vsatp;
 
+/* AIA VS-mode CSRs */
+target_ulong vsiselect;
+
 target_ulong mtval2;
 target_ulong mtinst;
 
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 39402a6a49..a186b31fcf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -931,6 +931,169 @@ static int read_mtopi(CPURISCVState *env, int csrno, 
target_ulong *val)
 return RISCV_EXCP_NONE;
 }
 
+static int aia_xlate_vs_csrno(CPURISCVState *env, int csrno)
+{
+if (!riscv_cpu_virt_enabled(env)) {
+return csrno;
+}
+
+switch (csrno) {
+case CSR_SISELECT:
+return CSR_VSISELECT;
+case CSR_SIREG:
+return CSR_VSIREG;
+default:
+return csrno;
+};
+}
+
+static int rmw_xiselect(CPURISCVState *env, int csrno, target_ulong *val,
+target_ulong new_val, target_ulong wr_mask)
+{
+target_ulong *iselect;
+
+/* Translate CSR number for VS-mode */
+csrno = aia_xlate_vs_csrno(env, csrno);
+
+/* Find the iselect CSR based on CSR number */
+switch (csrno) {
+case CSR_MISELECT:
+iselect = &env->miselect;
+break;
+case CSR_SISELECT:
+iselect = &env->siselect;
+break;
+case CSR_VSISELECT:
+iselect = &env->vsiselect;
+break;
+default:
+ return RISCV_EXCP_ILLEGAL_INST;
+};
+
+if (val) {
+*val = *iselect;
+}
+
+wr_mask &= ISELECT_MASK;
+if (wr_mask) {
+*iselect = (*iselect & ~wr_mask) | (new_val & wr_mask);
+}
+
+return RISCV_EXCP_NONE;
+}
+
+static int rmw_iprio(target_ulong xlen,
+ target_ulong iselect, uint8_t *iprio,
+ target_ulong *val, target_ulong new_val,
+ target_ulong wr_mask, int ext_irq_no)
+{
+int i, firq, nirqs;
+target_ulong old_val;
+
+if (iselect < ISELECT_IPRIO0 || ISELECT_IPRIO15 < iselect) {
+return -EINVAL;
+}
+if (xlen != 32 && iselect & 0x1) {
+return -EINVAL;
+}
+
+nirqs = 4 * (xlen / 32);
+firq = ((iselect - ISELECT_IPRIO0) / (xlen / 32)) * (nirqs);
+
+old_val = 0;
+for (i = 0; i < nirqs; i++) {
+old_val |= ((target_ulong)iprio[firq + i]) << (IPRIO_IRQ_BITS * i);
+}
+
+if (val) {
+*val = old_val;
+}
+
+if (wr_mask) {
+new_val = (old_val & ~wr_mask) | (new_val & wr_mask);
+for (i = 0; i < nirqs; i++) {
+/*
+ * M-level and S-level external IRQ priority always read-only
+ * zero. This means default priority order is always preferred
+ * for M-level and S-level external IRQs.
+ */
+if ((firq + i) == ext_irq_no) {
+continue;
+}
+iprio[firq + i] = (new_val >> (IPRIO_IRQ_BITS * i)) & 0xff;
+}
+}
+
+return 0;
+}
+
+static int rmw_xireg(CPURISCVState *env, int csrno, target_ulong *val,
+ target_ulong new_val, target_ulong wr_mask)
+{
+bool virt;
+uint8_t *iprio;
+int ret = -EINVAL;
+target_ulong priv, isel, vgein;
+
+/* Translate CSR number for VS-mode */
+csrno = aia_xlate_vs_csrno(env, csrno);
+
+/* Decode register details from CSR number */
+virt = false;
+switch (csrno) {
+case CSR_MIREG:
+iprio = env->miprio;
+isel = env->miselect;
+priv = PRV_M;
+break;
+case CSR_SIREG:
+iprio = env->siprio;
+isel = env->siselect;
+priv = PRV_S;
+break;
+case CSR_VSIREG:
+iprio = env->hviprio;
+isel = env->vsiselect;
+priv = PRV_S;
+virt = true;
+break;
+default:
+ goto done;
+};
+
+/* Find the selected guest interrupt file */
+vgein = (virt) ? get_field(env->hstatus, HSTATUS_VGEIN) : 0;
+
+if (ISELECT_IPRIO0 <= isel && isel <= ISELECT_IPRIO15) {
+/* Local interrupt priority registers not available for VS-mode 

[PATCH 6/8] target/i386: Add MSR access interface for Arch LBR

2022-02-16 Thread Yang Weijiang
In the first generation of Arch LBR, the maximum supported
Arch LBR depth is 32, and both host and guest use that value
to set the depth MSR. This simplifies the implementation,
given the side effect of a host/guest depth mismatch:
XRSTORS resets all recording MSRs to 0
if the saved depth does not match MSR_ARCH_LBR_DEPTH.

In most cases Arch LBR is not active,
so check the control bit before saving/restoring the big
chunk of Arch LBR MSRs.
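
The gating described above can be sketched in isolation. This is only a
minimal illustration: arch_lbr_should_sync() is a hypothetical name that
does not appear in the patch, and treating bit 0 of the control MSR as the
enable bit is an assumption taken from the (env->msr_lbr_ctl & 0x1) test in
the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: decide whether the Arch LBR
 * record MSRs are worth writing back.
 */
static bool arch_lbr_should_sync(uint64_t guest_ctl, uint64_t guest_depth,
                                 uint64_t host_depth)
{
    /* Assumption: bit 0 of the control MSR means "LBR enabled". */
    bool enabled = guest_ctl & 0x1;

    /* Skip the MSRs unless LBR is on and both depths are equal and non-zero. */
    return enabled && host_depth != 0 && host_depth == guest_depth;
}
```

With these inputs, a disabled control bit or a depth mismatch makes the
helper return false, so none of the record MSRs would be queued.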

Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.h | 10 +++
 target/i386/kvm/kvm.c | 67 +++
 2 files changed, 77 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 07b198539b..0cadd37c47 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -388,6 +388,11 @@ typedef enum X86Seg {
 #define MSR_IA32_TSX_CTRL  0x122
 #define MSR_IA32_TSCDEADLINE0x6e0
 #define MSR_IA32_PKRS   0x6e1
+#define MSR_ARCH_LBR_CTL0x14ce
+#define MSR_ARCH_LBR_DEPTH  0x14cf
+#define MSR_ARCH_LBR_FROM_0 0x1500
+#define MSR_ARCH_LBR_TO_0   0x1600
+#define MSR_ARCH_LBR_INFO_0 0x1200
 
 #define FEATURE_CONTROL_LOCKED(1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX  (1ULL << 1)
@@ -1659,6 +1664,11 @@ typedef struct CPUX86State {
 uint64_t msr_xfd;
 uint64_t msr_xfd_err;
 
+/* Per-VCPU Arch LBR MSRs */
+uint64_t msr_lbr_ctl;
+uint64_t msr_lbr_depth;
+LBR_ENTRY lbr_records[ARCH_LBR_NR_ENTRIES];
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 764d110e0f..974ff3c0a5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3273,6 +3273,38 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_xfd_err);
 }
 
+if (kvm_enabled() && cpu->enable_pmu &&
+(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
+uint64_t depth;
+int i, ret;
+
+/*
+ * Only migrate Arch LBR states when: 1) Arch LBR is enabled
+ * for migrated vcpu. 2) the host Arch LBR depth equals that
+ * of source guest's, this is to avoid mismatch of guest/host
+ * config for the msr hence avoid unexpected misbehavior.
+ */
+ret = kvm_get_one_msr(cpu, MSR_ARCH_LBR_DEPTH, &depth);
+
+if (ret == 1 && (env->msr_lbr_ctl & 0x1) && !!depth &&
+depth == env->msr_lbr_depth) {
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_CTL, env->msr_lbr_ctl);
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_DEPTH, env->msr_lbr_depth);
+
+for (i = 0; i < ARCH_LBR_NR_ENTRIES; i++) {
+if (!env->lbr_records[i].from) {
+continue;
+}
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_FROM_0 + i,
+  env->lbr_records[i].from);
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_TO_0 + i,
+  env->lbr_records[i].to);
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_INFO_0 + i,
+  env->lbr_records[i].info);
+}
+}
+}
+
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
 }
@@ -3670,6 +3702,26 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
 }
 
+if (kvm_enabled() && cpu->enable_pmu &&
+(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
+uint64_t ctl, depth;
+int i, ret2;
+
+ret = kvm_get_one_msr(cpu, MSR_ARCH_LBR_CTL, &ctl);
+ret2 = kvm_get_one_msr(cpu, MSR_ARCH_LBR_DEPTH, &depth);
+if (ret == 1 && ret2 == 1 && (ctl & 0x1) &&
+depth == ARCH_LBR_NR_ENTRIES) {
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_CTL, 0);
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_DEPTH, 0);
+
+for (i = 0; i < ARCH_LBR_NR_ENTRIES; i++) {
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_FROM_0 + i, 0);
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_TO_0 + i, 0);
+kvm_msr_entry_add(cpu, MSR_ARCH_LBR_INFO_0 + i, 0);
+}
+}
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -3972,6 +4024,21 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_XFD_ERR:
 env->msr_xfd_err = msrs[i].data;
 break;
+case MSR_ARCH_LBR_CTL:
+env->msr_lbr_ctl = msrs[i].data;
+break;
+case MSR_ARCH_LBR_DEPTH:
+env->msr_lbr_depth = msrs[i].data;
+break;
+case MSR_ARCH_LBR_FROM_0 ... MSR_ARCH_LBR_FROM_0 + 

[PULL v2 35/35] docs/system: riscv: Update description of CPU

2022-02-16 Thread Alistair Francis
From: Yu Li 

Since the hypervisor extension is no longer experimental and is enabled
for the default CPU, the previous command is no longer available and the
option `x-h=true` or `h=true` is also no longer required.

Signed-off-by: Yu Li 
Reviewed-by: Alistair Francis 
Message-Id: <9040401e-8f87-ef4a-d840-6703f08d0...@bytedance.com>
Signed-off-by: Alistair Francis 
---
 docs/system/riscv/virt.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
index fa016584bf..08ce3c4177 100644
--- a/docs/system/riscv/virt.rst
+++ b/docs/system/riscv/virt.rst
@@ -23,9 +23,9 @@ The ``virt`` machine supports the following devices:
 * 1 generic PCIe host bridge
 * The fw_cfg device that allows a guest to obtain data from QEMU
 
-Note that the default CPU is a generic RV32GC/RV64GC. Optional extensions
-can be enabled via command line parameters, e.g.: ``-cpu rv64,x-h=true``
-enables the hypervisor extension for RV64.
+The hypervisor extension has been enabled for the default CPU, so virtual
+machines with hypervisor extension can simply be used without explicitly
+declaring.
 
 Hardware configuration information
 --
-- 
2.34.1




[PATCH v3] Check and report for incomplete 'global' option format

2022-02-16 Thread Rohit Kumar
QEMU might crash when given an incomplete '-global' option.
For example:
 qemu-system-x86_64 -global driver=isa-fdc
 qemu-system-x86_64: ../../devel/qemu/qapi/string-input-visitor.c:394:
 string_input_visitor_new: Assertion `str' failed.
 Aborted (core dumped)
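
The check added below amounts to rejecting the option string unless all
three keys are present. A stand-alone sketch of that idea follows; has_key()
and check_global_option() are hypothetical names, and the simple
comma-separated scan is only a stand-in for QemuOpts parsing, not how QEMU
does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for qemu_opt_get(): returns true if "key=" starts
 * one of the comma-separated fields of opts.
 */
static bool has_key(const char *opts, const char *key)
{
    size_t klen = strlen(key);
    const char *p = opts;

    while (p && *p) {
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            return true;
        }
        /* Advance to the start of the next field, if there is one. */
        p = strchr(p, ',');
        if (p) {
            p++;
        }
    }
    return false;
}

/* Returns 0 when driver, property and value are all given, -1 otherwise. */
static int check_global_option(const char *opts)
{
    if (!has_key(opts, "driver") ||
        !has_key(opts, "property") ||
        !has_key(opts, "value")) {
        return -1;
    }
    return 0;
}
```

For the failing command line above, check_global_option("driver=isa-fdc")
returns -1 instead of letting a missing value reach later code.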

Fixes: 3751d7c43f795b ("vl: allow full-blown QemuOpts syntax for -global")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/604
Signed-off-by: Rohit Kumar 
---
 diff to v2:
  - Avoided double reporting of error.
  - Added the "Fixes" line in the commit message.

 softmmu/qdev-monitor.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 01f3834db5..e918ab8bf3 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -1034,6 +1034,13 @@ int qemu_global_option(const char *str)
 if (!opts) {
 return -1;
 }
+if (!qemu_opt_get(opts, "driver")
+|| !qemu_opt_get(opts, "property")
+|| !qemu_opt_get(opts, "value")) {
+error_report("options 'driver', 'property', and 'value'"
+ " are required");
+return -1;
+}
 
 return 0;
 }
-- 
2.25.1




[PATCH 3/8] target/i386: Add kvm_get_one_msr helper

2022-02-16 Thread Yang Weijiang
When trying to get a single MSR from KVM, I found that there is no
existing interface for it, although kvm_put_one_msr() exists. This patch
adds one. It removes the redundant preparation code before the final
KVM_GET_MSRS ioctl call.

No functional change intended.
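
The pattern being factored out is a one-entry request buffer handed to a
"get MSRs" call. A minimal sketch with mock types, so that it builds without
the KVM headers: struct mock_msrs, struct mock_msr_entry, mock_get_msrs()
and get_one_msr() are all hypothetical names, and the values returned by the
mock backend are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Mock types, loosely modelled on the anonymous struct used in the patch. */
struct mock_msr_entry {
    uint32_t index;
    uint64_t data;
};

struct mock_msrs {
    uint32_t nmsrs;
};

struct one_msr_buf {
    struct mock_msrs info;
    struct mock_msr_entry entries[1];
};

/*
 * Hypothetical backend standing in for the ioctl: fills every requested
 * entry with a made-up value and returns the number of entries read.
 */
static int mock_get_msrs(struct one_msr_buf *buf)
{
    uint32_t i;

    for (i = 0; i < buf->info.nmsrs; i++) {
        buf->entries[i].data = 0x1000u + buf->entries[i].index;
    }
    return (int)buf->info.nmsrs;
}

/* Same shape as the new helper: one entry in, one value out. */
static int get_one_msr(uint32_t index, uint64_t *value)
{
    struct one_msr_buf buf = {
        .info.nmsrs = 1,
        .entries[0].index = index,
    };
    int ret = mock_get_msrs(&buf);

    if (ret < 0) {
        return ret;
    }
    assert(ret == 1);
    *value = buf.entries[0].data;
    return ret;
}
```

Callers then only need a uint64_t on the stack and a check for ret < 0,
which is what the kvm_get_tsc() and hyperv_init_vcpu() hunks below do.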

Signed-off-by: Yang Weijiang 
---
 target/i386/kvm/kvm.c | 48 ---
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8dbda2420d..764d110e0f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -136,6 +136,7 @@ static struct kvm_msr_list *kvm_feature_msrs;
 
 #define BUS_LOCK_SLICE_TIME 10ULL /* ns */
 static RateLimit bus_lock_ratelimit_ctrl;
+static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
 int kvm_has_pit_state2(void)
 {
@@ -206,28 +207,21 @@ static int kvm_get_tsc(CPUState *cs)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
-struct {
-struct kvm_msrs info;
-struct kvm_msr_entry entries[1];
-} msr_data = {};
+uint64_t value;
 int ret;
 
 if (env->tsc_valid) {
 return 0;
 }
 
-memset(&msr_data, 0, sizeof(msr_data));
-msr_data.info.nmsrs = 1;
-msr_data.entries[0].index = MSR_IA32_TSC;
 env->tsc_valid = !runstate_is_running();
 
-ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, &msr_data);
+ret = kvm_get_one_msr(cpu, MSR_IA32_TSC, &value);
 if (ret < 0) {
 return ret;
 }
 
-assert(ret == 1);
-env->tsc = msr_data.entries[0].data;
+env->tsc = value;
 return 0;
 }
 
@@ -1485,21 +1479,14 @@ static int hyperv_init_vcpu(X86CPU *cpu)
  * the kernel doesn't support setting vp_index; assert that its value
  * is in sync
  */
-struct {
-struct kvm_msrs info;
-struct kvm_msr_entry entries[1];
-} msr_data = {
-.info.nmsrs = 1,
-.entries[0].index = HV_X64_MSR_VP_INDEX,
-};
-
-ret = kvm_vcpu_ioctl(cs, KVM_GET_MSRS, &msr_data);
+uint64_t value;
+
+ret = kvm_get_one_msr(cpu, HV_X64_MSR_VP_INDEX, &value);
 if (ret < 0) {
 return ret;
 }
-assert(ret == 1);
 
-if (msr_data.entries[0].data != hyperv_vp_index(CPU(cpu))) {
+if (value != hyperv_vp_index(CPU(cpu))) {
 error_report("kernel's vp_index != QEMU's vp_index");
 return -ENXIO;
 }
@@ -2752,6 +2739,25 @@ static int kvm_put_one_msr(X86CPU *cpu, int index, uint64_t value)
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf);
 }
 
+static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value)
+{
+int ret;
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[1];
+} msr_data = {
+.info.nmsrs = 1,
+.entries[0].index = index,
+};
+
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, &msr_data);
+if (ret < 0) {
+return ret;
+}
+assert(ret == 1);
+*value = msr_data.entries[0].data;
+return ret;
+}
 void kvm_put_apicbase(X86CPU *cpu, uint64_t value)
 {
 int ret;
-- 
2.27.0




Re: [RFC] vhost-vdpa: make notifiers _init()/_uninit() symmetric

2022-02-16 Thread Stefano Garzarella

On Fri, Feb 11, 2022 at 05:13:09PM +0100, Laurent Vivier wrote:

vhost_vdpa_host_notifiers_init() initializes queue notifiers
for queues "dev->vq_index" to queue "dev->vq_index + dev->nvqs",
whereas vhost_vdpa_host_notifiers_uninit() uninitializes the
same notifiers for queue "0" to queue "dev->nvqs".

This asymmetry seems buggy, fix that by using dev->vq_index
as the base for both.
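
To make the symmetry concrete, here is a small model of the two loops after
the fix. It is only a sketch: struct mock_dev, MAX_QUEUES and the boolean
notifier array are hypothetical stand-ins for the real vhost_dev state, and
the caller is assumed to keep vq_index + nvqs within MAX_QUEUES.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUEUES 16

/* Hypothetical model of a vhost device: only the fields the patch uses. */
struct mock_dev {
    int vq_index;
    int nvqs;
    bool notifier[MAX_QUEUES];
};

/* Set up notifiers for queues vq_index .. vq_index + nvqs - 1. */
static void notifiers_init(struct mock_dev *dev)
{
    for (int i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
        dev->notifier[i] = true;
    }
}

/* Tear down n notifiers, starting from the same base as notifiers_init(). */
static void notifiers_uninit(struct mock_dev *dev, int n)
{
    for (int i = dev->vq_index; i < dev->vq_index + n; i++) {
        dev->notifier[i] = false;
    }
}

/* Count the notifiers that are still set up. */
static int active_notifiers(const struct mock_dev *dev)
{
    int count = 0;

    for (int i = 0; i < MAX_QUEUES; i++) {
        count += dev->notifier[i] ? 1 : 0;
    }
    return count;
}
```

With vq_index = 2 and nvqs = 2, an uninit loop running from 0 to nvqs would
touch queues 0 and 1 and leave 2 and 3 set up; using the same base in both
loops leaves nothing behind.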

Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if possible")
Cc: jasow...@redhat.com
Signed-off-by: Laurent Vivier 
---
hw/virtio/vhost-vdpa.c | 20 ++--
1 file changed, 10 insertions(+), 10 deletions(-)


Reviewed-by: Stefano Garzarella 



diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 04ea43704f5d..9be3dc66580c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -395,15 +395,6 @@ static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
}
}

-static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
-{
-int i;
-
-for (i = 0; i < n; i++) {
-vhost_vdpa_host_notifier_uninit(dev, i);
-}
-}
-
static int vhost_vdpa_host_notifier_init(struct vhost_dev *dev, int queue_index)
{
size_t page_size = qemu_real_host_page_size;
@@ -442,6 +433,15 @@ err:
return -1;
}

+static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
+{
+int i;
+
+for (i = dev->vq_index; i < dev->vq_index + n; i++) {
+vhost_vdpa_host_notifier_uninit(dev, i);
+}
+}
+
static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
{
int i;
@@ -455,7 +455,7 @@ static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
return;

err:
-vhost_vdpa_host_notifiers_uninit(dev, i);
+vhost_vdpa_host_notifiers_uninit(dev, i - dev->vq_index);
return;
}

--
2.34.1







[PATCH 6/6] aspeed/sdmc: Add trace events

2022-02-16 Thread Cédric Le Goater
This is useful to analyze changes in the U-Boot RAM driver when SDRAM
training is performed.

Signed-off-by: Cédric Le Goater 
---
 hw/misc/aspeed_sdmc.c | 2 ++
 hw/misc/trace-events  | 4 
 2 files changed, 6 insertions(+)

diff --git a/hw/misc/aspeed_sdmc.c b/hw/misc/aspeed_sdmc.c
index 08f856cbda7e..d2a3931033b3 100644
--- a/hw/misc/aspeed_sdmc.c
+++ b/hw/misc/aspeed_sdmc.c
@@ -130,6 +130,7 @@ static uint64_t aspeed_sdmc_read(void *opaque, hwaddr addr, unsigned size)
 return 0;
 }
 
+trace_aspeed_sdmc_read(addr, s->regs[addr]);
 return s->regs[addr];
 }
 
@@ -148,6 +149,7 @@ static void aspeed_sdmc_write(void *opaque, hwaddr addr, uint64_t data,
 return;
 }
 
+trace_aspeed_sdmc_write(addr, data);
 asc->write(s, addr, data);
 }
 
diff --git a/hw/misc/trace-events b/hw/misc/trace-events
index 1c373dd0a4c5..c3fc9fecbe34 100644
--- a/hw/misc/trace-events
+++ b/hw/misc/trace-events
@@ -205,6 +205,10 @@ aspeed_i3c_write(uint64_t offset, uint64_t data) "I3C write: offset 0x%" PRIx64
 aspeed_i3c_device_read(uint32_t deviceid, uint64_t offset, uint64_t data) "I3C Dev[%u] read: offset 0x%" PRIx64 " data 0x%" PRIx64
 aspeed_i3c_device_write(uint32_t deviceid, uint64_t offset, uint64_t data) "I3C Dev[%u] write: offset 0x%" PRIx64 " data 0x%" PRIx64
 
+# aspeed_sdmc.c
+aspeed_sdmc_write(uint32_t reg, uint32_t data) "reg @0x%" PRIx32 " data: 0x%" PRIx32
+aspeed_sdmc_read(uint32_t reg, uint32_t data) "reg @0x%" PRIx32 " data: 0x%" PRIx32
+
 # bcm2835_property.c
 bcm2835_mbox_property(uint32_t tag, uint32_t bufsize, size_t resplen) "mbox property tag:0x%08x in_sz:%u out_sz:%zu"
 
-- 
2.34.1




[PATCH 5/6] aspeed/smc: Add an address mask on segment registers

2022-02-16 Thread Cédric Le Goater
Only a limited set of bits are used for decoding the Start and End
addresses of the mapping window of a flash device.
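
The effect of the mask can be shown in isolation. A minimal sketch, assuming
a hypothetical helper name segment_reg_store(); the 0x0ff00ff0 value used in
the usage note is the one the diff assigns to the AST2600 controllers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the masking step added by the patch:
 * a zero mask means "keep every bit of the written value".
 */
static uint32_t segment_reg_store(uint32_t regval, uint32_t addr_mask)
{
    if (addr_mask) {
        regval &= addr_mask;
    }
    return regval;
}
```

Writing 0x12345678 with the 0x0ff00ff0 mask stores 0x02300670, so the bits
outside the decoded Start and End fields read back as zero.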

Signed-off-by: Cédric Le Goater 
---
 include/hw/ssi/aspeed_smc.h |  1 +
 hw/ssi/aspeed_smc.c | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/include/hw/ssi/aspeed_smc.h b/include/hw/ssi/aspeed_smc.h
index e2681996..cad73ddc13f2 100644
--- a/include/hw/ssi/aspeed_smc.h
+++ b/include/hw/ssi/aspeed_smc.h
@@ -99,6 +99,7 @@ struct AspeedSMCClass {
 uint8_t max_peripherals;
 const uint32_t *resets;
 const AspeedSegments *segments;
+uint32_t segment_addr_mask;
 hwaddr flash_window_base;
 uint32_t flash_window_size;
 uint32_t features;
diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
index ff154eb84f85..d899be17fd71 100644
--- a/hw/ssi/aspeed_smc.c
+++ b/hw/ssi/aspeed_smc.c
@@ -259,6 +259,10 @@ static void aspeed_smc_flash_set_segment_region(AspeedSMCState *s, int cs,
 memory_region_set_enabled(&fl->mmio, !!seg.size);
 memory_region_transaction_commit();
 
+if (asc->segment_addr_mask) {
+regval &= asc->segment_addr_mask;
+}
+
 s->regs[R_SEG_ADDR0 + cs] = regval;
 }
 
@@ -1364,6 +1368,7 @@ static void aspeed_2400_fmc_class_init(ObjectClass *klass, void *data)
 asc->conf_enable_w0= CONF_ENABLE_W0;
 asc->max_peripherals   = 5;
 asc->segments  = aspeed_2400_fmc_segments;
+asc->segment_addr_mask = 0x;
 asc->resets= aspeed_2400_fmc_resets;
 asc->flash_window_base = 0x2000;
 asc->flash_window_size = 0x1000;
@@ -1446,6 +1451,7 @@ static void aspeed_2500_fmc_class_init(ObjectClass *klass, void *data)
 asc->conf_enable_w0= CONF_ENABLE_W0;
 asc->max_peripherals   = 3;
 asc->segments  = aspeed_2500_fmc_segments;
+asc->segment_addr_mask = 0x;
 asc->resets= aspeed_2500_fmc_resets;
 asc->flash_window_base = 0x2000;
 asc->flash_window_size = 0x1000;
@@ -1483,6 +1489,7 @@ static void aspeed_2500_spi1_class_init(ObjectClass *klass, void *data)
 asc->conf_enable_w0= CONF_ENABLE_W0;
 asc->max_peripherals   = 2;
 asc->segments  = aspeed_2500_spi1_segments;
+asc->segment_addr_mask = 0x;
 asc->flash_window_base = 0x3000;
 asc->flash_window_size = 0x800;
 asc->features  = 0x0;
@@ -1517,6 +1524,7 @@ static void aspeed_2500_spi2_class_init(ObjectClass *klass, void *data)
 asc->conf_enable_w0= CONF_ENABLE_W0;
 asc->max_peripherals   = 2;
 asc->segments  = aspeed_2500_spi2_segments;
+asc->segment_addr_mask = 0x;
 asc->flash_window_base = 0x3800;
 asc->flash_window_size = 0x800;
 asc->features  = 0x0;
@@ -1598,6 +1606,7 @@ static void aspeed_2600_fmc_class_init(ObjectClass *klass, void *data)
 asc->conf_enable_w0= CONF_ENABLE_W0;
 asc->max_peripherals   = 3;
 asc->segments  = aspeed_2600_fmc_segments;
+asc->segment_addr_mask = 0x0ff00ff0;
 asc->resets= aspeed_2600_fmc_resets;
 asc->flash_window_base = 0x2000;
 asc->flash_window_size = 0x1000;
@@ -1636,6 +1645,7 @@ static void aspeed_2600_spi1_class_init(ObjectClass *klass, void *data)
 asc->conf_enable_w0= CONF_ENABLE_W0;
 asc->max_peripherals   = 2;
 asc->segments  = aspeed_2600_spi1_segments;
+asc->segment_addr_mask = 0x0ff00ff0;
 asc->flash_window_base = 0x3000;
 asc->flash_window_size = 0x1000;
 asc->features  = ASPEED_SMC_FEATURE_DMA |
@@ -1674,6 +1684,7 @@ static void aspeed_2600_spi2_class_init(ObjectClass *klass, void *data)
 asc->conf_enable_w0= CONF_ENABLE_W0;
 asc->max_peripherals   = 3;
 asc->segments  = aspeed_2600_spi2_segments;
+asc->segment_addr_mask = 0x0ff00ff0;
 asc->flash_window_base = 0x5000;
 asc->flash_window_size = 0x1000;
 asc->features  = ASPEED_SMC_FEATURE_DMA |
-- 
2.34.1




Re: [PATCH v1 2/4] hyperv: Add definitions for syndbg

2022-02-16 Thread Emanuele Giuseppe Esposito



On 04/02/2022 11:07, Jon Doron wrote:
> Add all required definitions for hyperv synthetic debugger interface.
> 
> Signed-off-by: Jon Doron 
> ---
>  include/hw/hyperv/hyperv-proto.h | 52 
>  target/i386/kvm/hyperv-proto.h   | 37 +++
>  2 files changed, 89 insertions(+)
> 
> diff --git a/include/hw/hyperv/hyperv-proto.h b/include/hw/hyperv/hyperv-proto.h
> index 21dc28aee9..94c9658eb0 100644
> --- a/include/hw/hyperv/hyperv-proto.h
> +++ b/include/hw/hyperv/hyperv-proto.h
> @@ -24,12 +24,17 @@
>  #define HV_STATUS_INVALID_PORT_ID 17
>  #define HV_STATUS_INVALID_CONNECTION_ID   18
>  #define HV_STATUS_INSUFFICIENT_BUFFERS19
> +#define HV_STATUS_NOT_ACKNOWLEDGED20
> +#define HV_STATUS_NO_DATA 27
>  
>  /*
>   * Hypercall numbers
>   */
>  #define HV_POST_MESSAGE   0x005c
>  #define HV_SIGNAL_EVENT   0x005d
> +#define HV_POST_DEBUG_DATA0x0069
> +#define HV_RETREIVE_DEBUG_DATA0x006a

s/RETREIVE/RETRIEVE?

> +#define HV_RESET_DEBUG_SESSION0x006b
>  #define HV_HYPERCALL_FAST (1u << 16)
>  
>  /*
> @@ -127,4 +132,51 @@ struct hyperv_event_flags_page {
>  struct hyperv_event_flags slot[HV_SINT_COUNT];
>  };
>  
> +/*
> + * Kernel debugger structures
> + */
> +
> +/* Options flags for hyperv_reset_debug_session */
> +#define HV_DEBUG_PURGE_INCOMING_DATA0x0001
> +#define HV_DEBUG_PURGE_OUTGOING_DATA0x0002
> +struct hyperv_reset_debug_session_input {
> +uint32_t options;
> +} __attribute__ ((__packed__));
> +
> +struct hyperv_reset_debug_session_output {
> +uint32_t host_ip;
> +uint32_t target_ip;
> +uint16_t host_port;
> +uint16_t target_port;
> +uint8_t host_mac[6];
> +uint8_t target_mac[6];
> +} __attribute__ ((__packed__));
> +
> +/* Options for hyperv_post_debug_data */
> +#define HV_DEBUG_POST_LOOP  0x0001
> +
> +struct hyperv_post_debug_data_input {
> +uint32_t count;
> +uint32_t options;

> +/*uint8_t data[HV_HYP_PAGE_SIZE - 2 * sizeof(uint32_t)];*/

What is this comment for?

> +} __attribute__ ((__packed__));
> +
> +struct hyperv_post_debug_data_output {
> +uint32_t pending_count;
> +} __attribute__ ((__packed__));
> +
> +/* Options for hyperv_retrieve_debug_data */
> +#define HV_DEBUG_RETRIEVE_LOOP  0x0001
> +#define HV_DEBUG_RETRIEVE_TEST_ACTIVITY 0x0002
> +
> +struct hyperv_retrieve_debug_data_input {
> +uint32_t count;
> +uint32_t options;
> +uint64_t timeout;
> +} __attribute__ ((__packed__));
> +
> +struct hyperv_retrieve_debug_data_output {
> +uint32_t retrieved_count;
> +uint32_t remaining_count;
> +} __attribute__ ((__packed__));
>  #endif
> diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
> index 89f81afda7..9480bcdf04 100644
> --- a/target/i386/kvm/hyperv-proto.h
> +++ b/target/i386/kvm/hyperv-proto.h
> @@ -19,6 +19,9 @@
>  #define HV_CPUID_ENLIGHTMENT_INFO 0x4004
>  #define HV_CPUID_IMPLEMENT_LIMITS 0x4005
>  #define HV_CPUID_NESTED_FEATURES  0x400A
> +#define HV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS0x4080
> +#define HV_CPUID_SYNDBG_INTERFACE   0x4081
> +#define HV_CPUID_SYNDBG_PLATFORM_CAPABILITIES   0x4082
>  #define HV_CPUID_MIN  0x4005
>  #define HV_CPUID_MAX  0x4000
>  #define HV_HYPERVISOR_PRESENT_BIT 0x8000
> @@ -55,8 +58,14 @@
>  #define HV_GUEST_IDLE_STATE_AVAILABLE   (1u << 5)
>  #define HV_FREQUENCY_MSRS_AVAILABLE (1u << 8)
>  #define HV_GUEST_CRASH_MSR_AVAILABLE(1u << 10)
> +#define HV_FEATURE_DEBUG_MSRS_AVAILABLE (1u << 11)
>  #define HV_STIMER_DIRECT_MODE_AVAILABLE (1u << 19)
>  
> +/*
> + * HV_CPUID_FEATURES.EBX bits
> + */
> +#define HV_PARTITION_DEUBGGING_ALLOWED  (1u << 12)
s/DEUBGGING/DEBUGGING
> +
>  /*
>   * HV_CPUID_ENLIGHTMENT_INFO.EAX bits
>   */
> @@ -72,6 +81,11 @@
>  #define HV_ENLIGHTENED_VMCS_RECOMMENDED (1u << 14)
>  #define HV_NO_NONARCH_CORESHARING   (1u << 18)
>  
> +/*
> + * HV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX bits
> + */
> +#define HV_SYNDBG_CAP_ALLOW_KERNEL_DEBUGGING(1u << 1)
> +
>  /*
>   * Basic virtualized MSRs
>   */
> @@ -130,6 +144,18 @@
>  #define HV_X64_MSR_STIMER3_CONFIG   0x40B6
>  #define HV_X64_MSR_STIMER3_COUNT0x40B7
>  
> +/*
> + * Hyper-V Synthetic debug options MSR
> + */
> +#define HV_X64_MSR_SYNDBG_CONTROL   0x40F1
> +#define HV_X64_MSR_SYNDBG_STATUS0x40F2
> +#define HV_X64_MSR_SYNDBG_SEND_BUFFER   0x40F3
> +#define HV_X64_MSR_SYNDBG_RECV_BUFFER   0x40F4
> +#define HV_X64_MSR_SYNDBG_PENDING_BUFFER0x40F5
> +#define 

Re: [PATCH 00/20] migration: Postcopy Preemption

2022-02-16 Thread Peter Xu
On Wed, Feb 16, 2022 at 02:27:49PM +0800, Peter Xu wrote:
> The new patch layout:
> 
> Patch 1-3: Three leftover patches from patchset "[PATCH v3 0/8] migration:
> Postcopy cleanup on ram disgard" that I picked up here too.
> 
>   https://lore.kernel.org/qemu-devel/20211224065000.97572-1-pet...@redhat.com/
> 
>   migration: Dump sub-cmd name in loadvm_process_command tp
>   migration: Finer grained tracepoints for POSTCOPY_LISTEN
>   migration: Tracepoint change in postcopy-run bottom half
> 
> Patch 4-9: Original postcopy preempt RFC preparation patches (with slight
> modifications).
> 
>   migration: Introduce postcopy channels on dest node
>   migration: Dump ramblock and offset too when non-same-page detected
>   migration: Add postcopy_thread_create()
>   migration: Move static var in ram_block_from_stream() into global
>   migration: Add pss.postcopy_requested status
>   migration: Move migrate_allow_multifd and helpers into migration.c
> 
> Patch 10-15: Some newly added patches when working on postcopy recovery
> support.  After these patches migrate-recover command will allow re-entrance,
> which is a very nice side effect.
> 
>   migration: Enlarge postcopy recovery to capture !-EIO too
>   migration: postcopy_pause_fault_thread() never fails
>   migration: Export ram_load_postcopy()
>   migration: Move channel setup out of postcopy_try_recover()
>   migration: Add migration_incoming_transport_cleanup()
>   migration: Allow migrate-recover to run multiple times

Patches before 15 are IMHO good in various aspects with/without the new
preemption, so they can be considered for review earlier.

Especially:

migration: Enlarge postcopy recovery to capture !-EIO too
migration: Add migration_incoming_transport_cleanup()
migration: Allow migrate-recover to run multiple times

Thanks,

-- 
Peter Xu




Re: [PATCH v1 3/4] hyperv: Add support to process syndbg commands

2022-02-16 Thread Emanuele Giuseppe Esposito



On 04/02/2022 11:07, Jon Doron wrote:
> SynDbg commands can come from two different flows:
> 1. Hypercalls, in this mode the data being sent is fully
>encapsulated network packets.
> 2. SynDbg specific MSRs, in this mode only the data that needs to be
>transferred is passed.
> 
> Signed-off-by: Jon Doron 
> ---
>  docs/hyperv.txt   |  15 +++
>  hw/hyperv/hyperv.c| 242 ++
>  include/hw/hyperv/hyperv.h|  58 
>  target/i386/cpu.c |   2 +
>  target/i386/cpu.h |   7 +
>  target/i386/kvm/hyperv-stub.c |   6 +
>  target/i386/kvm/hyperv.c  |  52 +++-
>  target/i386/kvm/kvm.c |  76 ++-
>  8 files changed, 450 insertions(+), 8 deletions(-)
> 
> diff --git a/docs/hyperv.txt b/docs/hyperv.txt
> index 0417c183a3..7abc1b2d89 100644
> --- a/docs/hyperv.txt
> +++ b/docs/hyperv.txt
> @@ -225,6 +225,21 @@ default (WS2016).
>  Note: hv-version-id-* are not enlightenments and thus don't enable Hyper-V
>  identification when specified without any other enlightenments.
>  
> +3.21. hv-syndbg
> +===
> +Enables Hyper-V synthetic debugger interface, this is a special interface 
> used
> +by Windows Kernel debugger to send the packets through, rather than sending
> +them via serial/network .
> +Whe enabled, this enlightenment provides additional communication facilities

When

> +to the guest: SynDbg messages.
> +This new communication is used by Windows Kernel debugger rather than sending
> +packets via serial/network, adding significant performance boost over the 
> other
> +comm channels.
> +This enlightenment requires a VMBus device (-device vmbus-bridge,irq=15)
> +and the follow enlightenments to work:
> +hv-relaxed,hv_time,hv-vapic,hv-vpindex,hv-synic,hv-runtime,hv-stimer
> +
> +
>  4. Supplementary features
>  =
>  
> diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> index 88c9cc1334..c86e2aa02e 100644
> --- a/hw/hyperv/hyperv.c
> +++ b/hw/hyperv/hyperv.c
> @@ -730,3 +730,245 @@ uint16_t hyperv_hcall_signal_event(uint64_t param, bool fast)
>  }
>  return HV_STATUS_INVALID_CONNECTION_ID;
>  }
> +
> +static HvSynDbgHandler hv_syndbg_handler;
> +static void *hv_syndbg_context;

Add a line here between field and function definition.

> +void hyperv_set_syndbg_handler(HvSynDbgHandler handler, void *context)
> +{
> +assert(!hv_syndbg_handler);
> +hv_syndbg_handler = handler;
> +hv_syndbg_context = context;
> +}
> +
> +uint16_t hyperv_hcall_reset_dbg_session(uint64_t outgpa)
> +{
> +uint16_t ret;
> +HvSynDbgMsg msg;
> +struct hyperv_reset_debug_session_output *reset_dbg_session = NULL;
> +hwaddr len;
> +
> +if (!hv_syndbg_handler) {
> +ret = HV_STATUS_INVALID_HYPERCALL_CODE;
> +goto cleanup;
> +}
> +
> +len = sizeof(*reset_dbg_session);
> +reset_dbg_session = cpu_physical_memory_map(outgpa, &len, 1);
> +if (!reset_dbg_session || len < sizeof(*reset_dbg_session)) {
> +ret = HV_STATUS_INSUFFICIENT_MEMORY;
> +goto cleanup;
> +}
> +
> +msg.type = HV_SYNDBG_MSG_CONNECTION_INFO;
> +ret = hv_syndbg_handler(hv_syndbg_context, &msg);
> +if (ret) {
> +goto cleanup;
> +}
> +
> +reset_dbg_session->host_ip = msg.u.connection_info.host_ip;
> +reset_dbg_session->host_port = msg.u.connection_info.host_port;
> +/* The following fields are only used as validation for KDVM */
> +memset(&reset_dbg_session->host_mac, 0,
> +   sizeof(reset_dbg_session->host_mac));
> +reset_dbg_session->target_ip = msg.u.connection_info.host_ip;
> +reset_dbg_session->target_port = msg.u.connection_info.host_port;
> +memset(&reset_dbg_session->target_mac, 0,
> +   sizeof(reset_dbg_session->target_mac));
> +cleanup:
> +if (reset_dbg_session) {
> +cpu_physical_memory_unmap(reset_dbg_session,
> +  sizeof(*reset_dbg_session), 1, len);
> +}
> +
> +return ret;
> +}
> +
> +uint16_t hyperv_hcall_retreive_dbg_data(uint64_t ingpa, uint64_t outgpa,
> +bool fast)
> +{
> +uint16_t ret;
> +struct hyperv_retrieve_debug_data_input *debug_data_in = NULL;
> +struct hyperv_retrieve_debug_data_output *debug_data_out = NULL;
> +hwaddr in_len, out_len;
> +HvSynDbgMsg msg;
> +
> +if (fast || !hv_syndbg_handler) {
> +ret = HV_STATUS_INVALID_HYPERCALL_CODE;
> +goto cleanup;
> +}
> +
> +in_len = sizeof(*debug_data_in);
> +debug_data_in = cpu_physical_memory_map(ingpa, &in_len, 0);
> +if (!debug_data_in || in_len < sizeof(*debug_data_in)) {
> +ret = HV_STATUS_INSUFFICIENT_MEMORY;
> +goto cleanup;
> +}
> +
> +out_len = sizeof(*debug_data_out);
> +debug_data_out = cpu_physical_memory_map(outgpa, &out_len, 1);
> +if (!debug_data_out || out_len < sizeof(*debug_data_out)) {
> +ret = HV_STATUS_INSUFFICIENT_MEMORY;
> + 

Re: [PATCH 1/1] vdpa: Make ncs autofree

2022-02-16 Thread Stefano Garzarella

On Mon, Feb 14, 2022 at 08:34:15PM +0100, Eugenio Pérez wrote:

Simplifying memory management.
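
For readers unfamiliar with g_autofree: the variable is freed automatically
when it goes out of scope, so every return path is covered without an
explicit g_free(). The sketch below imitates that with the GCC/Clang
cleanup attribute and plain free() so that it builds without GLib;
AUTOFREE, auto_free(), frees_seen and init_clients() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the cleanup handler ran, for demonstration only. */
static int frees_seen;

/* Cleanup handler: receives a pointer to the annotated variable. */
static void auto_free(void *p)
{
    void **pp = p;

    free(*pp);
    frees_seen++;
}

/* Hypothetical macro, a plain-C imitation of g_autofree (GCC/Clang only). */
#define AUTOFREE __attribute__((cleanup(auto_free)))

/* Allocates an array and returns early on failure without a manual free(). */
static int init_clients(int fail)
{
    AUTOFREE int *ncs = calloc(4, sizeof(*ncs));

    if (!ncs || fail) {
        return -1;      /* ncs is released here automatically */
    }
    return 0;           /* and here as well */
}
```

Both the success path and the error path release the array, which is why
the two g_free(ncs) calls can be dropped in the diff below.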

Signed-off-by: Eugenio Pérez 
---
net/vhost-vdpa.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)


Reviewed-by: Stefano Garzarella 



diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 4125d13118..4befba5cc7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -264,7 +264,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
{
const NetdevVhostVDPAOptions *opts;
int vdpa_device_fd;
-NetClientState **ncs, *nc;
+g_autofree NetClientState **ncs = NULL;
+NetClientState *nc;
int queue_pairs, i, has_cvq = 0;

assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
@@ -302,7 +303,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
goto err;
}

-g_free(ncs);
return 0;

err:
@@ -310,7 +310,6 @@ err:
qemu_del_net_client(ncs[0]);
}
qemu_close(vdpa_device_fd);
-g_free(ncs);

return -1;
}
--
2.27.0







Re: Adding a handshake to qemu-guest-agent

2022-02-16 Thread Markus Armbruster
Michael Roth  writes:

> On Mon, Feb 14, 2022 at 03:14:37PM +0100, Markus Armbruster wrote:
>> Cc: the qemu-ga maintainer
>> 
>> John Snow  writes:
>> 
>> > [Moving our discussion upstream, because it stopped being brief and simple.]
>
> Hi John, Markus,
>
>> 
>> Motivation: qemu-ga doesn't do capability negotiation as specified in
>> docs/interop/qmp-spec.txt.
>> 
>> Reminder: qmp-spec.txt specifies the server shall send a greeting
>> containing the capabilities on offer.  The client shall send a
>> qmp_capabilities command before any other command.
>> 
>> We can't just fix qemu-ga to comply, because it would break existing
>> clients.
>> 
>> We could document its behavior in qmp-spec.txt.  Easy enough, but also
>> kind of sad.
>
> I'm not sure we could've ever done it QMP-style with the initial
> greeting/negotiation mode. It's been a while, but I recall virtio-serial
> chardev in guest not having a very straight-forward way of flushing out
> data from the vring after a new client connects on the host side, so
> new clients had a chance of reading left-over garbage from previous
> client sessions. Or maybe it was open/close/open on the guest/chardev
> side that didn't cause the flush... anyway:
>
> This is why guest-sync was there, so you could verify the stream was
> in sync with a given "session ID" before continuing. But that doesn't
> help much if the stream is in some garbage state that parser can't
> recover from...
>
> This is why guest-sync-delimited was introduced; it inserts a 0xFF
> sentinel value (invalid in any normal QMP stream) before the response, which
> a client can scan for to flush the stream. Similarly, clients are
> supposed to precede guest-sync/guest-sync-delimited with a 0xFF byte so QGA
> doesn't get stuck trying to parse a partial read from an earlier client
> that is 'eating' a new request from a new client connection. I don't think these are really
> issues with vsock (or the other transports QGA accepts), but AFAIK
> Windows is still mostly reliant on virtio-serial, so these are probably
> still needed.

I believe you're right about the reason being virtio-serial.  I
documented it that way in commit 72e9e569d0 "docs/interop/qmp-spec: How
to force known good parser state".

2.6 Forcing the JSON parser into known-good state
-

Incomplete or invalid input can leave the server's JSON parser in a
state where it can't parse additional commands.  To get it back into
known-good state, the client should provoke a lexical error.

The cleanest way to do that is sending an ASCII control character
other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new
line).

Sadly, older versions of QEMU can fail to flag this as an error.  If a
client needs to deal with them, it should send a 0xFF byte.

2.7 QGA Synchronization
-----------------------

When a client connects to QGA over a transport lacking proper
connection semantics such as virtio-serial, QGA may have read partial
input from a previous client.  The client needs to force QGA's parser
into known-good state using the previous section's technique.
Moreover, the client may receive output a previous client didn't read.
To help with skipping that output, QGA provides the
'guest-sync-delimited' command.  Refer to its documentation for
details.
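
A minimal client-side sketch of the synchronization described above, in
Python. It assumes the documented behavior of guest-sync-delimited: the
client sends a 0xFF byte before the command, and QGA prepends a 0xFF byte
to its reply. The helper names build_sync_request and parse_sync_reply are
hypothetical and not part of any QEMU API; actual transport I/O is left out.

```python
import json

SENTINEL = b"\xff"


def build_sync_request(sync_id):
    """Return the bytes to write: a 0xFF byte, then the sync command."""
    cmd = {"execute": "guest-sync-delimited", "arguments": {"id": sync_id}}
    return SENTINEL + json.dumps(cmd).encode("ascii") + b"\n"


def parse_sync_reply(stream, sync_id):
    """Skip stale output up to the last 0xFF, then check the echoed id."""
    _, sep, tail = stream.rpartition(SENTINEL)
    if not sep:
        return False  # no sentinel seen yet; the caller should keep reading
    line = tail.split(b"\n", 1)[0]
    try:
        reply = json.loads(line.decode("utf-8"))
    except ValueError:
        return False  # incomplete or garbled reply after the sentinel
    return isinstance(reply, dict) and reply.get("return") == sync_id
```

A caller would pick sync_id at random, write build_sync_request(sync_id) to
the channel, and keep reading until parse_sync_reply() returns True.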

0xFF is invalid UTF-8, which is kind of icky.  We should've used a
proper control character like EOT (end of transmission) from the start.
Water under the bridge.

guest-sync has another design flaw: an unread command reply consisting
of just an integer can be confused with guest-sync's reply.  Unlikely as
long as guest-sync's @id argument is chosen at random, as its
documentation demands.

guest-sync could be deprecated, I guess.  

The @id argument of guest-sync and guest-sync-delimited feels kind of
redundant with the command object's @id member.  Except QGA didn't
conform to the QMP spec until commit 4eaca8de26 "qmp: common 'id'
handling & make QGA conform to QMP spec" (v4.0.0).  More water under the
bridge.

Note that there's no need for all this when the transport provides
proper connection semantics.  Clients relying on connection semantics
work fine even when they don't bother with this syncing stuff.  Do such
clients exist?  We probably don't know.  May or may not matter.

> So QGA has sort of always had its own hand-shake, ideally via
> guest-sync-delimited. So if this new negotiation mechanism could
> build off of that, rather than introducing something on top, that would
> be ideal. Unfortunately its naming isn't great for what's being done
> here, but 'synchronize' is sorta in the ball-park at least...

Fair point.

>> Is there a way to add capability negotiation to qemu-ga without breaking
>> existing clients?  We obviously have to make it optional.
>> 
>> The obvious idea "make qmp_capabilities optional" doesn't work, because
>> the client needs to receive the 

[PATCH 0/8] Enable Architectural LBR for guest

2022-02-16 Thread Yang Weijiang
Architectural LBR (Arch LBR) is an enhancement of the previous
non-architectural LBR (Legacy LBR). The feature is introduced
in the Intel Architecture Instruction Set Extensions and Future
Features Programming Reference[0]. The advantages of Arch LBR
are described in the native patch series[1].

Since Arch LBR relies on XSAVES/XRSTORS to speed up memory
save/restore, QEMU needs to enable support for XSS first. As with
Legacy LBR, QEMU uses the lbr-fmt=0x3f parameter to advertise
the Arch LBR feature to the guest.

Note that the depth MSR has the following side effects: 1) A write
to the MSR resets all Arch LBR recording MSRs to 0. 2) XRSTORS
resets all record MSRs to 0 if the saved depth does not match
MSR_ARCH_LBR_DEPTH. As a first step, the Arch LBR virtualization
solution only supports guest depth == host depth to simplify the
implementation.

During live migration, before the Arch LBR MSRs are put, the depth
setting of the destination host is checked; the LBR records are
written to the destination only if the source and destination host
depth MSR settings match.

This patch series should be built with AMX QEMU patches in order
to set proper xsave area size.

[0]https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
[1]https://lore.kernel.org/lkml/1593780569-62993-1-git-send-email-kan.li...@linux.intel.com/

QEMU base-commit: ad38520bde

patch 1~2: The support patches for legacy LBR.
patch 3:   Add a helper function to clean up code and it'll be 
   used by Arch LBR patch too.
patch 4~5: Enable XSAVES support for Arch LBR.
patch 6~7: Enable Arch LBR live migration support.
patch 8:   Advertise Arch LBR feature.

Yang Weijiang (8):
  qdev-properties: Add a new macro with bitmask check for uint64_t
property
  target/i386: Add lbr-fmt vPMU option to support guest LBR
  target/i386: Add kvm_get_one_msr helper
  target/i386: Enable support for XSAVES based features
  target/i386: Add XSAVES support for Arch LBR
  target/i386: Add MSR access interface for Arch LBR
  target/i386: Enable Arch LBR migration states in vmstate
  target/i386: Support Arch LBR in CPUID enumeration

 hw/core/qdev-properties.c|  19 
 include/hw/qdev-properties.h |  12 +++
 target/i386/cpu.c| 169 +--
 target/i386/cpu.h|  56 +++-
 target/i386/kvm/kvm.c| 115 +++-
 target/i386/machine.c|  38 
 6 files changed, 361 insertions(+), 48 deletions(-)

-- 
2.27.0




[PATCH 4/6] aspeed: rainier: Add strap values taken from hardware

2022-02-16 Thread Cédric Le Goater
From: Joel Stanley 

When time permits, we should introduce defines for the HW strapping
registers to cleanly decode the values.

SCU500 = 0x00422016
  Disable ARM JTAG trusted world debug: 0x1
  Disable ARM JTAG debug: 0x1
  VGA Memory Size: 0x1 [16MB]
  Cortex M3: 0x1 [Disabled]
  Boot device: 0x1 [eMMC]
  Reserved: 0x1

SCU510 = 0x8848
  Secure Boot Enable: 0x1
  Enable boot SPI or eMMC ABR (second boot): 0x1
  Enable LPC mode: 0x1 [LPC]
  Disable LPC SuperIO 0x2e/0x4e: 0x1
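
As a quick sanity check of the SCU500 value above, the following Python
sketch lists which bit positions are set. The helper name set_bits is
hypothetical. The mapping from bit position to field name is not stated
in this message and is not assumed here.

```python
def set_bits(value):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


SCU500 = 0x00422016
print(set_bits(SCU500))  # prints [1, 2, 4, 13, 17, 22]
```

Six bits are set, which is consistent with the six fields listed for
SCU500, each with value 0x1.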

Signed-off-by: Joel Stanley 
[ clg: rewrote the commit log ]
Signed-off-by: Cédric Le Goater 
---
 hw/arm/aspeed.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 0e5e5c31d59c..ffda0d51966a 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -171,8 +171,8 @@ struct AspeedMachineState {
 #define TACOMA_BMC_HW_STRAP2  0x0040
 
 /* Rainier hardware value: (QEMU prototype) */
-#define RAINIER_BMC_HW_STRAP1 0x
-#define RAINIER_BMC_HW_STRAP2 0x
+#define RAINIER_BMC_HW_STRAP1 0x00422016
+#define RAINIER_BMC_HW_STRAP2 0x8848
 
 /* Fuji hardware value */
 #define FUJI_BMC_HW_STRAP10x
-- 
2.34.1




Re: [PATCH 3/3] iotests/graph-changes-while-io: New test

2022-02-16 Thread Hanna Reitz

On 15.02.22 23:22, Eric Blake wrote:

On Tue, Feb 15, 2022 at 02:57:27PM +0100, Hanna Reitz wrote:

Test the following scenario:
1. Some block node (null-co) attached to a user (here: NBD server) that
performs I/O and keeps the node in an I/O thread
2. Repeatedly run blockdev-add/blockdev-del to add/remove an overlay
to/from that node

Each blockdev-add triggers bdrv_refresh_limits(), and because
blockdev-add runs in the main thread, it does not stop the I/O requests.
I/O can thus happen while the limits are refreshed, and when such a
request sees a temporarily invalid block limit (e.g. alignment is 0),
this may easily crash qemu (or the storage daemon in this case).

The block layer needs to ensure that I/O requests to a node are paused
while that node's BlockLimits are refreshed.

Signed-off-by: Hanna Reitz 
---
  .../qemu-iotests/tests/graph-changes-while-io | 91 +++
  .../tests/graph-changes-while-io.out  |  5 +
  2 files changed, 96 insertions(+)
  create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
  create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out

Reviewed-by: Eric Blake 

Since we found this with the help of NBD, should I be considering this
series for my NBD queue, or is there a better block-related maintainer
queue that it should go through?


Well, we found it by using a guest, it’s just that using a guest in the 
iotests is not quite that great, so we need some other way to induce I/O 
(concurrently to monitor commands).  I could’ve used FUSE, too, but NBD 
is always compiled in, so. :)


In any case, of course I don’t mind who takes this series.  If you want 
to take it, go ahead (and thanks!) – I’ll be sending a v2 to split the 
`try` block in patch 2, though.


Hanna




Re: [PATCH v1 4/4] hw: hyperv: Initial commit for Synthetic Debugging device

2022-02-16 Thread Emanuele Giuseppe Esposito


> +
> +static uint16_t handle_recv_msg(HvSynDbg *syndbg, uint64_t outgpa,
> +uint32_t count, bool is_raw, uint32_t 
> options,
> +uint64_t timeout, uint32_t *retrieved_count)
> +{
> +uint16_t ret;
> +uint8_t data_buf[TARGET_PAGE_SIZE - UDP_PKT_HEADER_SIZE];
> +hwaddr out_len;
> +void *out_data = NULL;
> +ssize_t recv_byte_count;
> +
> +/* TODO: Handle options and timeout */
> +(void)options;
> +(void)timeout;
> +
> +if (!syndbg->has_data_pending) {
> +recv_byte_count = 0;
> +} else {
> +recv_byte_count = qemu_recv(syndbg->socket, data_buf,
> +MIN(sizeof(data_buf), count), 
> MSG_WAITALL);
> +if (recv_byte_count == -1) {
> +ret = HV_STATUS_INVALID_PARAMETER;
> +goto cleanup;
> +}
> +}
> +
> +if (!recv_byte_count) {
> +*retrieved_count = 0;
> +ret = HV_STATUS_NO_DATA;
> +goto cleanup;
> +}
> +
> +set_pending_state(syndbg, false);
> +
> +out_len = recv_byte_count;
> +if (is_raw) {
> +out_len += UDP_PKT_HEADER_SIZE;
> +}
> +out_data = cpu_physical_memory_map(outgpa, &out_len, 1);
> +if (!out_data) {
> +ret = HV_STATUS_INSUFFICIENT_MEMORY;
> +goto cleanup;
> +}
> +
> +if (is_raw &&
> +!create_udp_pkt(syndbg, out_data,
> +recv_byte_count + UDP_PKT_HEADER_SIZE,
> +data_buf, recv_byte_count)) {
> +ret = HV_STATUS_INSUFFICIENT_MEMORY;
> +goto cleanup;
> +} else if (!is_raw) {
> +memcpy(out_data, data_buf, recv_byte_count);
> +}
> +
> +*retrieved_count = recv_byte_count;
> +if (is_raw) {
> +*retrieved_count += UDP_PKT_HEADER_SIZE;
> +}
> +ret = HV_STATUS_SUCCESS;
> +cleanup:
> +if (out_data) {
> +cpu_physical_memory_unmap(out_data, out_len, 1, out_len);
> +}

Same nitpick as in patch 1: I think you can use more goto labels
instead of adding if statements.

> +
> +return ret;
> +}
> +




Re: [RFC PATCH] i386/tcg: add AVX/AVX2 support (severely incomplete, just for preliminary feedback)

2022-02-16 Thread Richard Henderson

On 2/16/22 07:56, Alexander Kanavin wrote:

Lack of AVX/AVX2 support in the i386 TCG has been a significant gap
for a long while; I've started work to close this gap.

This is of course nowhere near complete, or even buildable, I'm
just requesting initial feedback from the qemu gurus - am I on
the right track with this? Does something need to be done differently?

There's an enormous amount of legacy SSE instructions to adjust
for VEX-128 and VEX-256 flavours, so I would want to know that this
way would be acceptable.

Signed-off-by: Alexander Kanavin
---


Have a look at updating some existing work:

https://lore.kernel.org/qemu-devel/20190821172951.15333-1-jan.bo...@gmail.com/


r~



Re: [PATCH v3 3/3] s390x/tcg/tests: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-16 Thread David Hildenbrand
On 16.02.22 10:43, Thomas Huth wrote:
> On 16/02/2022 10.17, David Hildenbrand wrote:
>> On 15.02.22 21:27, David Miller wrote:
> ...
>>> diff --git a/tests/tcg/s390x/Makefile.target
>>> b/tests/tcg/s390x/Makefile.target
>>> index 1a7238b4eb..16b9d45307 100644
>>> --- a/tests/tcg/s390x/Makefile.target
>>> +++ b/tests/tcg/s390x/Makefile.target
>>> @@ -1,6 +1,6 @@
>>>S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
>>>VPATH+=$(S390X_SRC)
>>> -CFLAGS+=-march=zEC12 -m64
>>> +CFLAGS+=-march=z15 -m64
>>
>> Unfortunately, this makes our docker builds unhappy -- fail. I assume the
>> compiler in the container is outdated.
>>
>> $ make run-tcg-tests-s390x-linux-user
>> changing dir to build for make "run-tcg-tests-s390x-linux-user"...
>> make[1]: Entering directory '/home/dhildenb/git/qemu/build'
>>GIT ui/keycodemapdb tests/fp/berkeley-testfloat-3 
>> tests/fp/berkeley-softfloat-3 dtc capstone slirp
>>BUILD   debian10
>>BUILD   debian-s390x-cross
>>BUILD   TCG tests for s390x-linux-user
>>CHECK   debian10
>>CHECK   debian-s390x-cross
>>BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>> s390x-linux-gnu-gcc: error: unrecognized argument in option '-march=z15'
>> s390x-linux-gnu-gcc: note: valid arguments to '-march=' are: arch10 arch11 
>> arch12 arch3 arch5 arch6 arch7 arch8 arch9 g5 g6 native z10 z13 z14 z196 
>> z9-109 z9-ec z900 z990 zEC12; did you mean 'z10'?
>>
>> Maybe debian11 could work.
>>
>> @Thomas do you have any idea if we could get this to work with
>> '-march=z15' or should we work around that by manually encoding
>> the relevant instructions instead?
> 
> I'm not an expert when it comes to containers, but I think you could try to 
> update to debian11 in tests/docker/dockerfiles/debian-s390x-cross.docker and 
> in ./.gitlab-ci.d/container-cross.yml ... if that does not work, it's maybe 
> better to manually encode the instructions.

debian11 won't work for general cross builds.

But, it should work with the tests:


From 8108b075e4f74fa4c590f3acf932e221e166889c Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Wed, 16 Feb 2022 10:45:21 +0100
Subject: [PATCH] tests/tcg/s390x: Build tests with debian11

We need a newer compiler to build with -march=z15, to be used soon.

Signed-off-by: David Hildenbrand 
---
 tests/docker/Makefile.include   |  2 ++
 .../dockerfiles/debian-s390x-test-cross.docker  | 13 +
 tests/tcg/configure.sh  |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 tests/docker/dockerfiles/debian-s390x-test-cross.docker

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index f1a0c5db7a..e77e5a2f40 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -210,6 +210,7 @@ docker-image-debian-arm64-test-cross: docker-image-debian11
 docker-image-debian-microblaze-cross: docker-image-debian10
 docker-image-debian-nios2-cross: docker-image-debian10
 docker-image-debian-powerpc-test-cross: docker-image-debian11
+docker-image-debian-s390x-test-cross: docker-image-debian11
 
 # These images may be good enough for building tests but not for test builds
 DOCKER_PARTIAL_IMAGES += debian-alpha-cross
@@ -219,6 +220,7 @@ DOCKER_PARTIAL_IMAGES += debian-hppa-cross
 DOCKER_PARTIAL_IMAGES += debian-m68k-cross debian-mips64-cross
 DOCKER_PARTIAL_IMAGES += debian-microblaze-cross
 DOCKER_PARTIAL_IMAGES += debian-nios2-cross
+DOCKER_PARTIAL_IMAGES += debian-s390x-test-cross
 DOCKER_PARTIAL_IMAGES += debian-sh4-cross debian-sparc64-cross
 DOCKER_PARTIAL_IMAGES += debian-tricore-cross
 DOCKER_PARTIAL_IMAGES += debian-xtensa-cross
diff --git a/tests/docker/dockerfiles/debian-s390x-test-cross.docker 
b/tests/docker/dockerfiles/debian-s390x-test-cross.docker
new file mode 100644
index 00..26435287b6
--- /dev/null
+++ b/tests/docker/dockerfiles/debian-s390x-test-cross.docker
@@ -0,0 +1,13 @@
+#
+# Docker s390x cross-compiler target (tests only)
+#
+# This docker target builds on the debian Bullseye base image.
+#
+FROM qemu/debian11
+
+# Add the foreign architecture we want and install dependencies
+RUN dpkg --add-architecture s390x
+RUN apt update && \
+DEBIAN_FRONTEND=noninteractive eatmydata \
+apt install -y --no-install-recommends \
+crossbuild-essential-s390x gcc-10-s390x-linux-gnu
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 763e9b6ad8..3f00f9307f 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -185,7 +185,7 @@ for target in $target_list; do
   ;;
 s390x-*)
   container_hosts=x86_64
-  container_image=debian-s390x-cross
+  container_image=debian-s390x-test-cross
   container_cross_cc=s390x-linux-gnu-gcc
   ;;
 sh4-*)
-- 
2.34.1

-- 
Thanks,

David / dhildenb




[PATCH v12 1/5] target/ppc: fix indent of powerpc_set_excp_state()

2022-02-16 Thread Daniel Henrique Barboza
Reviewed-by: David Gibson 
Signed-off-by: Daniel Henrique Barboza 
---
 target/ppc/excp_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index fcc83a7701..bbc75afbc0 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -361,7 +361,7 @@ static void ppc_excp_apply_ail(PowerPCCPU *cpu, int excp, 
target_ulong msr,
 #endif
 
 static void powerpc_set_excp_state(PowerPCCPU *cpu,
-  target_ulong vector, target_ulong 
msr)
+   target_ulong vector, target_ulong msr)
 {
 CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
-- 
2.34.1




Re: [RFC PATCH] i386/tcg: add AVX/AVX2 support (severely incomplete, just for preliminary feedback)

2022-02-16 Thread Alexander Kanavin
On Wed, 16 Feb 2022 at 10:24, Richard Henderson
 wrote:
> > There's an enormous amount of legacy SSE instructions to adjust
> > for VEX-128 and VEX-256 flavours, so I would want to know that this
> > way would be acceptable.
> >
> > Signed-off-by: Alexander Kanavin
> > ---
>
> Have a look at updating some existing work:
>
> https://lore.kernel.org/qemu-devel/20190821172951.15333-1-jan.bo...@gmail.com/

Nice! I agree that gen_sse() is in bad need of a structured,
extensible rewrite, and I'm glad to see it's been done, as I've been merely
tweaking existing code.

Jan, do you have a later version of these patches, or is v4 the final
revision as of now? Do you have them in a git branch somewhere?

Alex



[PATCH v12 0/5] PMU-EBB support for PPC64 TCG

2022-02-16 Thread Daniel Henrique Barboza
Hi,

This new version adds a new patch (patch 2) that fixes --disable-tcg
--disable-linux-user compilation.

The series was based on upstream master.

Changes from v11:
- patch 2 (new):
  * make power8-pmu.c compile only with CONFIG_TCG available
- patch 4 (former 3):
  * added Cedric's r-b
- v11 link: https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg02693.html

Daniel Henrique Barboza (5):
  target/ppc: fix indent of powerpc_set_excp_state()
  target/ppc: make power8-pmu.c CONFIG_TCG only
  target/ppc: finalize pre-EBB PMU logic
  target/ppc: add PPC_INTERRUPT_EBB and EBB exceptions
  target/ppc: trigger PERFM EBBs from power8-pmu.c

 target/ppc/cpu.h |  5 ++-
 target/ppc/cpu_init.c|  9 +++--
 target/ppc/excp_helper.c | 83 +++-
 target/ppc/helper.h  |  1 +
 target/ppc/machine.c |  2 +
 target/ppc/meson.build   |  2 +-
 target/ppc/power8-pmu.c  | 39 +--
 target/ppc/power8-pmu.h  |  4 +-
 8 files changed, 133 insertions(+), 12 deletions(-)

-- 
2.34.1




Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-16 Thread Richard W.M. Jones
On Tue, Feb 15, 2022 at 05:24:14PM -0600, Eric Blake wrote:
> Oh. The QMP command (which is immediately visible through
> nbd-server-add/block-storage-add to qemu and qemu-storage-daemon)
> gains "multi-conn":"on", but you may be right that qemu-nbd would want
> a command line option (either that, or we accelerate our plans that
> qsd should replace qemu-nbd).

I really hope there will always be something called "qemu-nbd"
that acts like qemu-nbd.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




[PATCH v2 0/4] HyperV: Synthetic Debugging device

2022-02-16 Thread Jon Doron
This patchset adds support for the synthetic debugging device.

HyperV supports a special transport layer for the kernel debugger when
running in HyperV.

This patchset adds support for this device so you can set up
fast Windows kernel debugging.

At this point DHCP is not implemented, so a few things need to be
noted when setting this up.

The scenario I used to test is having 2 VMs in the same virtual network
i.e a Debugger VM with the NIC:
-nic tap,model=virtio,mac=02:ca:01:01:01:01,script=/etc/qemu-ifup
Its IP is going to be the static address 192.168.53.12.
The VM we want to debug needs to have the enlightenments
and vmbus configured:
 -cpu host,hv-relaxed,hv_spinlocks=0x1fff,hv_time,+vmx,invtsc,hv-vapic,hv-vpindex,hv-synic,hv-syndbg \
 -device vmbus-bridge \
 -device hv-syndbg,host_ip=192.168.53.12,host_port=5,use_hcalls=false \
 -nic tap,model=virtio,mac=02:ca:01:01:01:02,script=/etc/qemu-ifup \

Then in the debuggee VM we would set up kernel debugging in the
following way:

If the VM is older than Win8:
* Copy the proper platform kdvm.dll (make sure it's called kdvm.dll even if the platform is 32-bit)
bcdedit /set {GUID} dbgtransport kdvm.dll
bcdedit /set {GUID} loadoptions host_ip="1.2.3.4",host_port="5",nodhcp
bcdedit /set {GUID} debug on
bcdedit /set {GUID} halbreakpoint on

Win8 and later:
bcdedit /dbgsettings net hostip:7.7.7.7 port:5 nodhcp

This is all the setup that is required to get the synthetic debugger
configured correctly.

Jon Doron (4):
  hyperv: SControl is optional to enable SynIc
  hyperv: Add definitions for syndbg
  hyperv: Add support to process syndbg commands
  hw: hyperv: Initial commit for Synthetic Debugging device

 docs/hyperv.txt  |  15 ++
 hw/hyperv/Kconfig|   5 +
 hw/hyperv/hyperv.c   | 352 ---
 hw/hyperv/meson.build|   1 +
 hw/hyperv/syndbg.c   | 402 +++
 include/hw/hyperv/hyperv-proto.h |  52 
 include/hw/hyperv/hyperv.h   |  58 +
 target/i386/cpu.c|   2 +
 target/i386/cpu.h|   7 +
 target/i386/kvm/hyperv-proto.h   |  37 +++
 target/i386/kvm/hyperv-stub.c|   6 +
 target/i386/kvm/hyperv.c |  52 +++-
 target/i386/kvm/kvm.c|  76 +-
 13 files changed, 1024 insertions(+), 41 deletions(-)
 create mode 100644 hw/hyperv/syndbg.c

-- 
2.35.1




[PATCH v2 2/9] spapr: prevent hdec timer being set up under virtual hypervisor

2022-02-16 Thread Nicholas Piggin
The spapr virtual hypervisor does not require the hdecr timer.
Remove it.

Reviewed-by: Daniel Henrique Barboza 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c| 2 +-
 hw/ppc/spapr_cpu_core.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index ba7fa0f3b5..c6dfc5975f 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1072,7 +1072,7 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t 
freq)
 }
 /* Create new timer */
 tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, _ppc_decr_cb, 
cpu);
-if (env->has_hv_mode) {
+if (env->has_hv_mode && !cpu->vhyp) {
 tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, 
_ppc_hdecr_cb,
 cpu);
 } else {
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index a781e97f8d..ed84713960 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -261,12 +261,12 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
 return false;
 }
 
-/* Set time-base frequency to 512 MHz */
-cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
-
 cpu_ppc_set_vhyp(cpu, PPC_VIRTUAL_HYPERVISOR(spapr));
 kvmppc_set_papr(cpu);
 
+/* Set time-base frequency to 512 MHz. vhyp must be set first. */
+cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
+
 if (spapr_irq_cpu_intc_create(spapr, cpu, errp) < 0) {
 qdev_unrealize(DEVICE(cpu));
 return false;
-- 
2.23.0




[PATCH v2 4/9] target/ppc: add vhyp addressing mode helper for radix MMU

2022-02-16 Thread Nicholas Piggin
The radix on vhyp MMU uses a single-level radix table walk, with the
partition scope mapping provided by the flat QEMU machine memory.

A subsequent change will use the two-level radix walk on vhyp in some
situations, so provide a helper which can abstract that logic.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu-radix64.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index df2fec80ce..5535f0fe20 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -354,6 +354,17 @@ static int ppc_radix64_partition_scoped_xlate(PowerPCCPU 
*cpu,
 return 0;
 }
 
+/*
+ * The spapr vhc has a flat partition scope provided by qemu memory.
+ */
+static bool vhyp_flat_addressing(PowerPCCPU *cpu)
+{
+if (cpu->vhyp) {
+return true;
+}
+return false;
+}
+
 static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
 MMUAccessType access_type,
 vaddr eaddr, uint64_t pid,
@@ -385,7 +396,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
 }
 prtbe_addr = (pate.dw1 & PATE1_R_PRTB) + offset;
 
-if (cpu->vhyp) {
+if (vhyp_flat_addressing(cpu)) {
 prtbe0 = ldq_phys(cs->as, prtbe_addr);
 } else {
 /*
@@ -411,7 +422,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu,
 *g_page_size = PRTBE_R_GET_RTS(prtbe0);
 base_addr = prtbe0 & PRTBE_R_RPDB;
 nls = prtbe0 & PRTBE_R_RPDS;
-if (msr_hv || cpu->vhyp) {
+if (msr_hv || vhyp_flat_addressing(cpu)) {
 /*
  * Can treat process table addresses as real addresses
  */
@@ -515,7 +526,7 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr 
eaddr,
 relocation = !mmuidx_real(mmu_idx);
 
 /* HV or virtual hypervisor Real Mode Access */
-if (!relocation && (mmuidx_hv(mmu_idx) || cpu->vhyp)) {
+if (!relocation && (mmuidx_hv(mmu_idx) || vhyp_flat_addressing(cpu))) {
 /* In real mode top 4 effective addr bits (mostly) ignored */
 *raddr = eaddr & 0x0FFFULL;
 
@@ -592,7 +603,7 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr 
eaddr,
 g_raddr = eaddr & R_EADDR_MASK;
 }
 
-if (cpu->vhyp) {
+if (vhyp_flat_addressing(cpu)) {
 *raddr = g_raddr;
 } else {
 /*
-- 
2.23.0




Re: [PATCH 3/3] x86: Switch to q35 as the default machine type

2022-02-16 Thread Thomas Huth

On 16/02/2022 12.01, Dr. David Alan Gilbert wrote:

* Gerd Hoffmann (kra...@redhat.com) wrote:

   Hi,
  

Given the semantic differences from 'i440fx', changing the default
machine type has effects that are equivalent to breaking command
line syntax compatibility, which is something we've always tried
to avoid.


And if we are fine breaking backward compatibility I'd rather *not* pick
a default, effectively making -M $something mandatory, similar to arm.


Oh, that's probably easy to do; what are other people's thoughts on
that?


I agree with Gerd. Getting rid of a default machine on x86 is likely better 
than silently changing it to q35. But I'd maybe say that this should go 
through the deprecation process first?


 Thomas




[PATCH 19/20] migration: Postcopy recover with preempt enabled

2022-02-16 Thread Peter Xu
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
instead of stopping the thread it halts with a semaphore, preparing to be
kicked again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation upon the
socket.  To make it simple, the fast ram load thread will take the mutex during
its whole procedure, and only release it if it's paused.  The fast-path socket
will be properly released by the main loading thread safely when there are
network failures during postcopy with that mutex held.
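
The locking scheme described above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not the QEMU
implementation: the class FastLoadWorker and its load_once callback are
hypothetical stand-ins for the fast load thread and ram_load_postcopy().
The worker holds the mutex for its whole procedure and drops it only while
it is parked on the pause semaphore.

```python
import threading


class FastLoadWorker:
    def __init__(self, load_once):
        self.mutex = threading.Lock()                 # guards the socket
        self.pause_sem = threading.Semaphore(0)       # posted on recovery
        self.load_once = load_once                    # True means finished
        self.pauses = 0

    def _pause(self):
        # Release the mutex only while paused, so another thread can
        # safely tear down the socket, then take it back on wake-up.
        self.pauses += 1
        self.mutex.release()
        self.pause_sem.acquire()
        self.mutex.acquire()

    def run(self):
        with self.mutex:
            while True:
                if self.load_once():
                    return        # clean termination
                self._pause()     # failure: halt until recovery

    def resume(self):
        # Called by the recovery path to kick the paused worker.
        self.pause_sem.release()
```

Because the semaphore keeps its count, a resume() issued before the worker
reaches _pause() is not lost.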

Signed-off-by: Peter Xu 
---
 migration/migration.c| 17 +++--
 migration/migration.h|  3 +++
 migration/postcopy-ram.c | 24 ++--
 migration/savevm.c   | 17 +
 migration/trace-events   |  2 ++
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d20db04097..c68a281406 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
 current_incoming->postcopy_remote_fds =
 g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
 qemu_mutex_init(&current_incoming->rp_mutex);
+qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
 qemu_event_init(&current_incoming->main_thread_load_event, false);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
 qemu_mutex_init(&current_incoming->page_request_mutex);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
 /*
  * Here, we only wake up the main loading thread (while the
- * fault thread will still be waiting), so that we can receive
+ * rest threads will still be waiting), so that we can receive
  * commands from source now, and answer it if needed. The
- * fault thread will be woken up afterwards until we are sure
+ * rest threads will be woken up afterwards until we are sure
  * that source is ready to reply to page requests.
  */
 qemu_sem_post(>postcopy_pause_sem_dst);
@@ -3466,6 +3468,17 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
+/*
+ * Do the same to postcopy fast path socket too if there is.  No
+ * locking needed because no racer as long as we do this before setting
+ * status to paused.
+ */
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
diff --git a/migration/migration.h b/migration/migration.h
index b8aacfe3af..945088064a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,8 @@ struct MigrationIncomingState {
 /* Postcopy priority thread is used to receive postcopy requested pages */
 QemuThread postcopy_prio_thread;
 bool postcopy_prio_thread_created;
+/* Used to sync with the prio thread */
+QemuMutex postcopy_prio_thread_mutex;
 /*
  * An array of temp host huge pages to be used, one for each postcopy
  * channel.
@@ -147,6 +149,7 @@ struct MigrationIncomingState {
 /* notify PAUSED postcopy incoming migrations to try to continue */
 QemuSemaphore postcopy_pause_sem_dst;
 QemuSemaphore postcopy_pause_sem_fault;
+QemuSemaphore postcopy_pause_sem_fast_load;
 
 /* List of listening socket addresses  */
 SocketAddressList *socket_address_list;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 30eddaacd1..b3d23804bc 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1575,6 +1575,15 @@ int postcopy_preempt_setup(MigrationState *s, Error 
**errp)
 return 0;
 }
 
+static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
+{
+trace_postcopy_pause_fast_load();
+qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
+qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
+qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+trace_postcopy_pause_fast_load_continued();
+}
+
 void *postcopy_preempt_thread(void *opaque)
 {
 MigrationIncomingState *mis = opaque;
@@ -1587,11 +1596,22 @@ void *postcopy_preempt_thread(void *opaque)
 qemu_sem_post(&mis->thread_sync_sem);
 
 /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
-ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
+qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+while (1) {
+ret = 

[PULL v2 18/35] target/riscv: Add defines for AIA CSRs

2022-02-16 Thread Alistair Francis
From: Anup Patel 

The RISC-V AIA specification extends RISC-V local interrupts and
introduces new CSRs. This patch adds defines for the new AIA CSRs.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Frank Chang 
Message-id: 20220204174700.534953-8-a...@brainfault.org
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_bits.h | 119 
 1 file changed, 119 insertions(+)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index a541705760..068c4d8034 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -168,6 +168,31 @@
 #define CSR_MTVAL   0x343
 #define CSR_MIP 0x344
 
+/* Machine-Level Window to Indirectly Accessed Registers (AIA) */
+#define CSR_MISELECT        0x350
+#define CSR_MIREG           0x351
+
+/* Machine-Level Interrupts (AIA) */
+#define CSR_MTOPI           0xfb0
+
+/* Machine-Level IMSIC Interface (AIA) */
+#define CSR_MSETEIPNUM      0x358
+#define CSR_MCLREIPNUM      0x359
+#define CSR_MSETEIENUM      0x35a
+#define CSR_MCLREIENUM      0x35b
+#define CSR_MTOPEI          0x35c
+
+/* Virtual Interrupts for Supervisor Level (AIA) */
+#define CSR_MVIEN           0x308
+#define CSR_MVIP            0x309
+
+/* Machine-Level High-Half CSRs (AIA) */
+#define CSR_MIDELEGH        0x313
+#define CSR_MIEH            0x314
+#define CSR_MVIENH          0x318
+#define CSR_MVIPH           0x319
+#define CSR_MIPH            0x354
+
 /* Supervisor Trap Setup */
 #define CSR_SSTATUS 0x100
 #define CSR_SEDELEG 0x102
@@ -187,6 +212,24 @@
 #define CSR_SPTBR   0x180
 #define CSR_SATP0x180
 
+/* Supervisor-Level Window to Indirectly Accessed Registers (AIA) */
+#define CSR_SISELECT        0x150
+#define CSR_SIREG           0x151
+
+/* Supervisor-Level Interrupts (AIA) */
+#define CSR_STOPI           0xdb0
+
+/* Supervisor-Level IMSIC Interface (AIA) */
+#define CSR_SSETEIPNUM      0x158
+#define CSR_SCLREIPNUM      0x159
+#define CSR_SSETEIENUM      0x15a
+#define CSR_SCLREIENUM      0x15b
+#define CSR_STOPEI          0x15c
+
+/* Supervisor-Level High-Half CSRs (AIA) */
+#define CSR_SIEH            0x114
+#define CSR_SIPH            0x154
+
 /* Hpervisor CSRs */
 #define CSR_HSTATUS 0x600
 #define CSR_HEDELEG 0x602
@@ -217,6 +260,35 @@
 #define CSR_MTINST  0x34a
 #define CSR_MTVAL2  0x34b
 
+/* Virtual Interrupts and Interrupt Priorities (H-extension with AIA) */
+#define CSR_HVIEN           0x608
+#define CSR_HVICTL          0x609
+#define CSR_HVIPRIO1        0x646
+#define CSR_HVIPRIO2        0x647
+
+/* VS-Level Window to Indirectly Accessed Registers (H-extension with AIA) */
+#define CSR_VSISELECT       0x250
+#define CSR_VSIREG          0x251
+
+/* VS-Level Interrupts (H-extension with AIA) */
+#define CSR_VSTOPI          0xeb0
+
+/* VS-Level IMSIC Interface (H-extension with AIA) */
+#define CSR_VSSETEIPNUM     0x258
+#define CSR_VSCLREIPNUM     0x259
+#define CSR_VSSETEIENUM     0x25a
+#define CSR_VSCLREIENUM     0x25b
+#define CSR_VSTOPEI         0x25c
+
+/* Hypervisor and VS-Level High-Half CSRs (H-extension with AIA) */
+#define CSR_HIDELEGH        0x613
+#define CSR_HVIENH          0x618
+#define CSR_HVIPH           0x655
+#define CSR_HVIPRIO1H       0x656
+#define CSR_HVIPRIO2H       0x657
+#define CSR_VSIEH           0x214
+#define CSR_VSIPH           0x254
+
 /* Enhanced Physical Memory Protection (ePMP) */
 #define CSR_MSECCFG 0x747
 #define CSR_MSECCFGH0x757
@@ -635,4 +707,51 @@ typedef enum RISCVException {
 #define UMTE_U_PM_INSN  U_PM_INSN
 #define UMTE_MASK (UMTE_U_PM_ENABLE | MMTE_U_PM_CURRENT | UMTE_U_PM_INSN)
 
+/* MISELECT, SISELECT, and VSISELECT bits (AIA) */
+#define ISELECT_IPRIO0                     0x30
+#define ISELECT_IPRIO15                    0x3f
+#define ISELECT_IMSIC_EIDELIVERY           0x70
+#define ISELECT_IMSIC_EITHRESHOLD          0x72
+#define ISELECT_IMSIC_EIP0                 0x80
+#define ISELECT_IMSIC_EIP63                0xbf
+#define ISELECT_IMSIC_EIE0                 0xc0
+#define ISELECT_IMSIC_EIE63                0xff
+#define ISELECT_IMSIC_FIRST                ISELECT_IMSIC_EIDELIVERY
+#define ISELECT_IMSIC_LAST                 ISELECT_IMSIC_EIE63
+#define ISELECT_MASK                       0x1ff
+
+/* Dummy [M|S|VS]ISELECT value for emulating [M|S|VS]TOPEI CSRs */
+#define ISELECT_IMSIC_TOPEI                (ISELECT_MASK + 1)
+
+/* IMSIC bits (AIA) */
+#define IMSIC_TOPEI_IID_SHIFT              16
+#define IMSIC_TOPEI_IID_MASK               0x7ff
+#define IMSIC_TOPEI_IPRIO_MASK             0x7ff
+#define IMSIC_EIPx_BITS                    32
+#define IMSIC_EIEx_BITS                    32
+
+/* MTOPI and STOPI bits (AIA) */
+#define TOPI_IID_SHIFT                     16
+#define TOPI_IID_MASK                      0xfff
+#define TOPI_IPRIO_MASK                    0xff
+
+/* Interrupt priority bits (AIA) */
+#define IPRIO_IRQ_BITS

[PATCH 20/20] tests: Add postcopy preempt test

2022-02-16 Thread Peter Xu
Two tests are added: a normal postcopy preempt test, and a recovery test.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 39 ++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7b42f6fd90..5053b40589 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -470,6 +470,7 @@ typedef struct {
  */
 bool hide_stderr;
 bool use_shmem;
+bool postcopy_preempt;
 /* only launch the target process */
 bool only_target;
 /* Use dirty ring if true; dirty logging otherwise */
@@ -673,6 +674,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
 
+if (args->postcopy_preempt) {
+migrate_set_capability(from, "postcopy-preempt", true);
+migrate_set_capability(to, "postcopy-preempt", true);
+}
+
 /* We want to pick a speed slow enough that the test completes
  * quickly, but that it doesn't complete precopy even on a slow
  * machine, so also set the downtime.
@@ -719,13 +725,29 @@ static void test_postcopy(void)
 migrate_postcopy_complete(from, to);
 }
 
-static void test_postcopy_recovery(void)
+static void test_postcopy_preempt(void)
+{
+MigrateStart *args = migrate_start_new();
+QTestState *from, *to;
+
+args->postcopy_preempt = true;
+
+if (migrate_postcopy_prepare(&from, &to, args)) {
+return;
+}
+migrate_postcopy_start(from, to);
+migrate_postcopy_complete(from, to);
+}
+
+/* @preempt: whether to use postcopy-preempt */
+static void test_postcopy_recovery(bool preempt)
 {
 MigrateStart *args = migrate_start_new();
 QTestState *from, *to;
 g_autofree char *uri = NULL;
 
 args->hide_stderr = true;
+args->postcopy_preempt = preempt;
 
 if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
@@ -781,6 +803,16 @@ static void test_postcopy_recovery(void)
 migrate_postcopy_complete(from, to);
 }
 
+static void test_postcopy_recovery_normal(void)
+{
+test_postcopy_recovery(false);
+}
+
+static void test_postcopy_recovery_preempt(void)
+{
+test_postcopy_recovery(true);
+}
+
 static void test_baddest(void)
 {
 MigrateStart *args = migrate_start_new();
@@ -1458,7 +1490,10 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
-qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery_normal);
+qtest_add_func("/migration/postcopy/preempt/unix", test_postcopy_preempt);
+qtest_add_func("/migration/postcopy/preempt/recovery",
+   test_postcopy_recovery_preempt);
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix", test_precopy_unix);
 qtest_add_func("/migration/precopy/tcp", test_precopy_tcp);
-- 
2.32.0




[PULL v2 20/35] target/riscv: Implement AIA local interrupt priorities

2022-02-16 Thread Alistair Francis
From: Anup Patel 

The AIA spec defines programmable 8-bit priority for each local interrupt
at M-level, S-level and VS-level so we extend local interrupt processing
to consider AIA interrupt priorities. The AIA CSRs which help software
configure local interrupt priorities will be added by subsequent patches.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Message-id: 20220204174700.534953-10-a...@brainfault.org
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h|  12 ++
 target/riscv/cpu.c|  19 +++
 target/riscv/cpu_helper.c | 281 +++---
 target/riscv/machine.c|   3 +
 4 files changed, 294 insertions(+), 21 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 6b6df57c42..89e9cc558d 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -192,6 +192,10 @@ struct CPURISCVState {
 target_ulong mcause;
 target_ulong mtval;  /* since: priv-1.10.0 */
 
+/* Machine and Supervisor interrupt priorities */
+uint8_t miprio[64];
+uint8_t siprio[64];
+
 /* Hypervisor CSRs */
 target_ulong hstatus;
 target_ulong hedeleg;
@@ -204,6 +208,9 @@ struct CPURISCVState {
 target_ulong hgeip;
 uint64_t htimedelta;
 
+/* Hypervisor controlled virtual interrupt priorities */
+uint8_t hviprio[64];
+
 /* Upper 64-bits of 128-bit CSRs */
 uint64_t mscratchh;
 uint64_t sscratchh;
@@ -415,6 +422,11 @@ int riscv_cpu_write_elf32_note(WriteCoreDumpFunction f, 
CPUState *cs,
int cpuid, void *opaque);
 int riscv_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int riscv_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+int riscv_cpu_hviprio_index2irq(int index, int *out_irq, int *out_rdzero);
+uint8_t riscv_cpu_default_priority(int irq);
+int riscv_cpu_mirq_pending(CPURISCVState *env);
+int riscv_cpu_sirq_pending(CPURISCVState *env);
+int riscv_cpu_vsirq_pending(CPURISCVState *env);
 bool riscv_cpu_fp_enabled(CPURISCVState *env);
 target_ulong riscv_cpu_get_geilen(CPURISCVState *env);
 void riscv_cpu_set_geilen(CPURISCVState *env, target_ulong geilen);
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ff766acc21..5fb0a61036 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -400,6 +400,10 @@ void restore_state_to_opc(CPURISCVState *env, 
TranslationBlock *tb,
 
 static void riscv_cpu_reset(DeviceState *dev)
 {
+#ifndef CONFIG_USER_ONLY
+uint8_t iprio;
+int i, irq, rdzero;
+#endif
 CPUState *cs = CPU(dev);
 RISCVCPU *cpu = RISCV_CPU(cs);
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
@@ -432,6 +436,21 @@ static void riscv_cpu_reset(DeviceState *dev)
 env->miclaim = MIP_SGEIP;
 env->pc = env->resetvec;
 env->two_stage_lookup = false;
+
+/* Initialize default priorities of local interrupts. */
+for (i = 0; i < ARRAY_SIZE(env->miprio); i++) {
+iprio = riscv_cpu_default_priority(i);
+env->miprio[i] = (i == IRQ_M_EXT) ? 0 : iprio;
+env->siprio[i] = (i == IRQ_S_EXT) ? 0 : iprio;
+env->hviprio[i] = 0;
+}
+i = 0;
+while (!riscv_cpu_hviprio_index2irq(i, &irq, &rdzero)) {
+if (!rdzero) {
+env->hviprio[irq] = env->miprio[irq];
+}
+i++;
+}
 /* mmte is supposed to have pm.current hardwired to 1 */
 env->mmte |= (PM_EXT_INITIAL | MMTE_M_PM_CURRENT);
 #endif
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 37c58a891b..1a9534d6d7 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -152,36 +152,275 @@ void riscv_cpu_update_mask(CPURISCVState *env)
 }
 
 #ifndef CONFIG_USER_ONLY
-static int riscv_cpu_local_irq_pending(CPURISCVState *env)
+
+/*
+ * The HS-mode is allowed to configure priority only for the
+ * following VS-mode local interrupts:
+ *
+ * 0  (Reserved interrupt, reads as zero)
+ * 1  Supervisor software interrupt
+ * 4  (Reserved interrupt, reads as zero)
+ * 5  Supervisor timer interrupt
+ * 8  (Reserved interrupt, reads as zero)
+ * 13 (Reserved interrupt)
+ * 14 "
+ * 15 "
+ * 16 "
+ * 18 Debug/trace interrupt
+ * 20 (Reserved interrupt)
+ * 22 "
+ * 24 "
+ * 26 "
+ * 28 "
+ * 30 (Reserved for standard reporting of bus or system errors)
+ */
+
+static const int hviprio_index2irq[] = {
+0, 1, 4, 5, 8, 13, 14, 15, 16, 18, 20, 22, 24, 26, 28, 30 };
+static const int hviprio_index2rdzero[] = {
+1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+
+int riscv_cpu_hviprio_index2irq(int index, int *out_irq, int *out_rdzero)
+{
+if (index < 0 || ARRAY_SIZE(hviprio_index2irq) <= index) {
+return -EINVAL;
+}
+
+if (out_irq) {
+*out_irq = hviprio_index2irq[index];
+}
+
+if (out_rdzero) {
+*out_rdzero = hviprio_index2rdzero[index];
+}
+
+return 0;
+}
+
+/*
+ * Default priorities of local interrupts are defined in the
+ * RISC-V Advanced Interrupt Architecture specification.
+ *

[PATCH v2 1/3] target/ppc: Fix POWER9 DD2.0 PVR, add DD2.1

2022-02-16 Thread Nicholas Piggin
The POWER9 DD2.0 PVR is incorrect. It doesn't cause problems because
the pvr check is masking it and matching against the base.

Correct it, add a PVR for DD2.1.
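
A minimal sketch of why the wrong DD2.0 value went unnoticed, assuming the PVR check compares only the upper 16 bits (the family field). The mask value and the helper name `pvr_same_family` are assumptions for illustration, not taken from the QEMU sources; the PVR constants are copied from the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* PVR values copied from the patch. */
#define PVR_POWER9_DD1       0x004E0100u
#define PVR_POWER9_DD20_OLD  0x004E1200u  /* value before the fix */
#define PVR_POWER9_DD20      0x004E0200u  /* corrected value */
#define PVR_POWER9_DD21      0x004E0201u
#define PVR_POWER10_DD1      0x00800100u

/* Assumption: the processor family lives in the upper 16 bits. */
#define PVR_FAMILY_MASK      0xFFFF0000u

/* Hypothetical helper: true if both PVRs share the same family field. */
static bool pvr_same_family(uint32_t pvr, uint32_t ref)
{
    return (pvr & PVR_FAMILY_MASK) == (ref & PVR_FAMILY_MASK);
}
```

Under that assumption, both the old and the corrected DD2.0 values compare equal to any other POWER9 PVR, so a masked check cannot tell them apart.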

Signed-off-by: Nicholas Piggin 
---
Since v1: new patch

 target/ppc/cpu-models.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu-models.h b/target/ppc/cpu-models.h
index 76775a74a9..9902285ac8 100644
--- a/target/ppc/cpu-models.h
+++ b/target/ppc/cpu-models.h
@@ -349,7 +349,8 @@ enum {
 CPU_POWERPC_POWER8NVL_v10  = 0x004C0100,
 CPU_POWERPC_POWER9_BASE= 0x004E,
 CPU_POWERPC_POWER9_DD1 = 0x004E0100,
-CPU_POWERPC_POWER9_DD20= 0x004E1200,
+CPU_POWERPC_POWER9_DD20= 0x004E0200,
+CPU_POWERPC_POWER9_DD21= 0x004E0201,
 CPU_POWERPC_POWER10_BASE   = 0x0080,
 CPU_POWERPC_POWER10_DD1= 0x00800100,
 CPU_POWERPC_POWER10_DD20   = 0x00800200,
-- 
2.23.0




[PATCH 2/8] target/i386: Add lbr-fmt vPMU option to support guest LBR

2022-02-16 Thread Yang Weijiang
The Last Branch Recording (LBR) is a performance monitor unit (PMU)
feature on Intel processors which records a running trace of the most
recent branches taken by the processor in the LBR stack. This option
indicates the LBR format to enable for guest perf.

The LBR feature is enabled if all of the following conditions are met:
1) KVM is enabled and the PMU is enabled.
2) The msr-based-feature IA32_PERF_CAPABILITIES is supported by KVM.
3) The lbr_fmt value returned by that MSR is non-zero.
4) The guest vcpu model supports FEAT_1_ECX.CPUID_EXT_PDCM.
5) The user-provided lbr-fmt value doesn't violate its bitmask (0x3f).
6) The target guest LBR format matches that of the host.
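
Conditions 5 and 6 can be sketched in isolation as follows. This is a simplified illustration, not the code from the patch: `lbr_fmt_acceptable` is a hypothetical helper, and the KVM, PMU and PDCM checks (conditions 1 to 4) are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERF_CAP_LBR_FMT 0x3fULL  /* LBR format bitmask from the patch */

/*
 * Hypothetical helper: returns true if the requested lbr-fmt value
 * stays within the bitmask (condition 5) and equals the LBR format
 * field of the host's IA32_PERF_CAPABILITIES value (condition 6).
 */
static bool lbr_fmt_acceptable(uint64_t requested, uint64_t host_perf_cap)
{
    uint64_t host_fmt = host_perf_cap & PERF_CAP_LBR_FMT;

    if ((requested & PERF_CAP_LBR_FMT) != requested) {
        return false;   /* bits set outside the 0x3f mask */
    }
    return requested == host_fmt;
}
```

For example, a request of 0x5 is accepted against a host capability word of 0xfc5, because only the low six bits of the host value are compared.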

Co-developed-by: Like Xu 
Signed-off-by: Like Xu 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 40 
 target/i386/cpu.h | 10 ++
 2 files changed, 50 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 9543762e7e..a037bba387 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6370,6 +6370,7 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 CPUX86State *env = &cpu->env;
 Error *local_err = NULL;
 static bool ht_warned;
+uint64_t requested_lbr_fmt;
 
 if (cpu->apic_id == UNASSIGNED_APIC_ID) {
 error_setg(errp, "apic-id property was not initialized properly");
@@ -6387,6 +6388,42 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 goto out;
 }
 
+/*
+ * Override env->features[FEAT_PERF_CAPABILITIES].LBR_FMT
+ * with user-provided setting.
+ */
+if (cpu->lbr_fmt != ~PERF_CAP_LBR_FMT) {
+if ((cpu->lbr_fmt & PERF_CAP_LBR_FMT) != cpu->lbr_fmt) {
+error_setg(errp, "invalid lbr-fmt");
+return;
+}
+env->features[FEAT_PERF_CAPABILITIES] &= ~PERF_CAP_LBR_FMT;
+env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt;
+}
+
+/*
+ * vPMU LBR is supported when 1) KVM is enabled 2) Option pmu=on and
+ * 3)vPMU LBR format matches that of host setting.
+ */
+requested_lbr_fmt =
+env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_LBR_FMT;
+if (requested_lbr_fmt && kvm_enabled()) {
+uint64_t host_perf_cap =
+x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES, false);
+uint64_t host_lbr_fmt = host_perf_cap & PERF_CAP_LBR_FMT;
+
+if (!cpu->enable_pmu) {
+error_setg(errp, "vPMU: LBR is unsupported without pmu=on");
+return;
+}
+if (requested_lbr_fmt != host_lbr_fmt) {
+error_setg(errp, "vPMU: the lbr-fmt value (0x%lx) mismatches "
+"the host supported value (0x%lx).",
+requested_lbr_fmt, host_lbr_fmt);
+return;
+}
+}
+
 x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
 
 if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
@@ -6739,6 +6776,8 @@ static void x86_cpu_initfn(Object *obj)
 object_property_add_alias(obj, "sse4_2", obj, "sse4.2");
 
 object_property_add_alias(obj, "hv-apicv", obj, "hv-avic");
+cpu->lbr_fmt = ~PERF_CAP_LBR_FMT;
+object_property_add_alias(obj, "lbr_fmt", obj, "lbr-fmt");
 
 if (xcc->model) {
 x86_cpu_load_model(cpu, xcc->model);
@@ -6894,6 +6933,7 @@ static Property x86_cpu_properties[] = {
 #endif
 DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
 DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
+DEFINE_PROP_UINT64_CHECKMASK("lbr-fmt", X86CPU, lbr_fmt, PERF_CAP_LBR_FMT),
 
 DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts,
HYPERV_SPINLOCK_NEVER_NOTIFY),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 509c16323a..852afabe0b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -383,6 +383,7 @@ typedef enum X86Seg {
 #define ARCH_CAP_TSX_CTRL_MSR  (1<<7)
 
 #define MSR_IA32_PERF_CAPABILITIES  0x345
+#define PERF_CAP_LBR_FMT            0x3f
 
 #define MSR_IA32_TSX_CTRL  0x122
 #define MSR_IA32_TSCDEADLINE        0x6e0
@@ -1819,6 +1820,15 @@ struct X86CPU {
  */
 bool enable_pmu;
 
+/*
+ * Enable LBR_FMT bits of IA32_PERF_CAPABILITIES MSR.
+ * This can't be initialized with a default because it doesn't have
+ * stable ABI support yet. It is only allowed to pass all LBR_FMT bits
+ * returned by kvm_arch_get_supported_msr_feature()(which depends on both
+ * host CPU and kernel capabilities) to the guest.
+ */
+uint64_t lbr_fmt;
+
 /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is
  * disabled by default to avoid breaking migration between QEMU with
  * different LMCE configurations.
-- 
2.27.0




[PATCH v2] arm: Remove swift-bmc machine

2022-02-16 Thread Joel Stanley
It was scheduled for removal in 7.0.

Signed-off-by: Joel Stanley 
---
v2: also remove from docs/about/deprecated.rst

 docs/about/deprecated.rst  |  7 -
 docs/system/arm/aspeed.rst |  1 -
 hw/arm/aspeed.c| 53 --
 3 files changed, 61 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 26d00812ba94..85773db631c1 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -315,13 +315,6 @@ Use the more generic event ``DEVICE_UNPLUG_GUEST_ERROR`` 
instead.
 System emulator machines
 
 
-Aspeed ``swift-bmc`` machine (since 6.1)
-
-
-This machine is deprecated because we have enough AST2500 based OpenPOWER
-machines. It can be easily replaced by the ``witherspoon-bmc`` or the
-``romulus-bmc`` machines.
-
 PPC 405 ``taihu`` machine (since 7.0)
 '
 
diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
index d8b102fa0ad0..60ed94f18759 100644
--- a/docs/system/arm/aspeed.rst
+++ b/docs/system/arm/aspeed.rst
@@ -22,7 +22,6 @@ AST2500 SoC based machines :
 - ``romulus-bmc``  OpenPOWER Romulus POWER9 BMC
 - ``witherspoon-bmc``  OpenPOWER Witherspoon POWER9 BMC
 - ``sonorapass-bmc``   OCP SonoraPass BMC
-- ``swift-bmc``OpenPOWER Swift BMC POWER9 (to be removed in v7.0)
 - ``fp5280g2-bmc`` Inspur FP5280G2 BMC
 - ``g220a-bmc``Bytedance G220A BMC
 
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index d911dc904fb3..9789a489047b 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -544,35 +544,6 @@ static void romulus_bmc_i2c_init(AspeedMachineState *bmc)
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 11), "ds1338", 0x32);
 }
 
-static void swift_bmc_i2c_init(AspeedMachineState *bmc)
-{
-AspeedSoCState *soc = &bmc->soc;
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 3), "pca9552", 0x60);
-
-/* The swift board expects a TMP275 but a TMP105 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "tmp105", 0x48);
-/* The swift board expects a pca9551 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x60);
-
-/* The swift board expects an Epson RX8900 RTC but a ds1338 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "ds1338", 0x32);
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "pca9552", 0x60);
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 9), "tmp423", 0x4c);
-/* The swift board expects a pca9539 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 9), "pca9552", 0x74);
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 10), "tmp423", 0x4c);
-/* The swift board expects a pca9539 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 10), "pca9552",
- 0x74);
-
-/* The swift board expects a TMP275 but a TMP105 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 12), "tmp105", 0x48);
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 12), "tmp105", 0x4a);
-}
-
 static void sonorapass_bmc_i2c_init(AspeedMachineState *bmc)
 {
 AspeedSoCState *soc = &bmc->soc;
@@ -1102,26 +1073,6 @@ static void 
aspeed_machine_sonorapass_class_init(ObjectClass *oc, void *data)
 aspeed_soc_num_cpus(amc->soc_name);
 };
 
-static void aspeed_machine_swift_class_init(ObjectClass *oc, void *data)
-{
-MachineClass *mc = MACHINE_CLASS(oc);
-AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
-
-mc->desc   = "OpenPOWER Swift BMC (ARM1176)";
-amc->soc_name  = "ast2500-a1";
-amc->hw_strap1 = SWIFT_BMC_HW_STRAP1;
-amc->fmc_model = "mx66l1g45g";
-amc->spi_model = "mx66l1g45g";
-amc->num_cs= 2;
-amc->i2c_init  = swift_bmc_i2c_init;
-mc->default_ram_size   = 512 * MiB;
-mc->default_cpus = mc->min_cpus = mc->max_cpus =
-aspeed_soc_num_cpus(amc->soc_name);
-
-mc->deprecation_reason = "redundant system. Please use a similar "
-"OpenPOWER BMC, Witherspoon or Romulus.";
-};
-
 static void aspeed_machine_witherspoon_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -1277,10 +1228,6 @@ static const TypeInfo aspeed_machine_types[] = {
 .name  = MACHINE_TYPE_NAME("romulus-bmc"),
 .parent= TYPE_ASPEED_MACHINE,
 .class_init= aspeed_machine_romulus_class_init,
-}, {
-.name  = MACHINE_TYPE_NAME("swift-bmc"),
-.parent= TYPE_ASPEED_MACHINE,
-.class_init= aspeed_machine_swift_class_init,
 }, {
 .name  = MACHINE_TYPE_NAME("sonorapass-bmc"),
 .parent= TYPE_ASPEED_MACHINE,
-- 
2.34.1




Re: [PATCH] util: Add iova_tree_alloc

2022-02-16 Thread Peter Xu
On Tue, Feb 15, 2022 at 08:34:23PM +0100, Eugenio Pérez wrote:
> This iova tree function allows it to look for a hole in allocated
> regions and return a totally new translation for a given translated
> address.
> 
> It's usage is mainly to allow devices to access qemu address space,
> remapping guest's one into a new iova space where qemu can add chunks of
> addresses.
> 
> Signed-off-by: Eugenio Pérez 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v1] hw: riscv: opentitan: fixup SPI addresses

2022-02-16 Thread Bin Meng
On Wed, Feb 16, 2022 at 2:23 PM Alistair Francis
 wrote:
>
> From: Wilfred Mallawa 
>
> This patch updates the SPI_DEVICE, SPI_HOST0, SPI_HOST1
> base addresses. Also adds these as unimplemented devices.
>
> The address references can be found [1].
>
> [1] 
> https://github.com/lowRISC/opentitan/blob/6c317992fbd646818b34f2a2dbf44bc850e461e4/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h#L107
>
> Signed-off-by: Wilfred Mallawa 
> ---
>  hw/riscv/opentitan.c | 12 +---
>  include/hw/riscv/opentitan.h |  4 +++-
>  2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
> index aec7cfa33f..596b518a26 100644
> --- a/hw/riscv/opentitan.c
> +++ b/hw/riscv/opentitan.c
> @@ -33,8 +33,10 @@ static const MemMapEntry ibex_memmap[] = {
>  [IBEX_DEV_RAM] ={  0x1000,  0x1 },
>  [IBEX_DEV_FLASH] =  {  0x2000,  0x8 },
>  [IBEX_DEV_UART] =   {  0x4000,  0x1000  },
> +[IBEX_DEV_SPI_HOST0] =  {  0x4030,  0x1000  },
> +[IBEX_DEV_SPI_HOST1] =  {  0x4031,  0x1000  },

Please insert these according to sorted order

>  [IBEX_DEV_GPIO] =   {  0x4004,  0x1000  },
> -[IBEX_DEV_SPI] ={  0x4005,  0x1000  },
> +[IBEX_DEV_SPI_DEVICE] = {  0x4005,  0x1000  },
>  [IBEX_DEV_I2C] ={  0x4008,  0x1000  },
>  [IBEX_DEV_PATTGEN] ={  0x400e,  0x1000  },
>  [IBEX_DEV_TIMER] =  {  0x4010,  0x1000  },
> @@ -209,8 +211,12 @@ static void lowrisc_ibex_soc_realize(DeviceState 
> *dev_soc, Error **errp)
>
>  create_unimplemented_device("riscv.lowrisc.ibex.gpio",
>  memmap[IBEX_DEV_GPIO].base, memmap[IBEX_DEV_GPIO].size);
> -create_unimplemented_device("riscv.lowrisc.ibex.spi",
> -memmap[IBEX_DEV_SPI].base, memmap[IBEX_DEV_SPI].size);
> +create_unimplemented_device("riscv.lowrisc.ibex.spi_device",
> +memmap[IBEX_DEV_SPI_DEVICE].base, memmap[IBEX_DEV_SPI_DEVICE].size);
> +create_unimplemented_device("riscv.lowrisc.ibex.spi_host0",
> +memmap[IBEX_DEV_SPI_HOST0].base, memmap[IBEX_DEV_SPI_HOST0].size);
> +create_unimplemented_device("riscv.lowrisc.ibex.spi_host1",
> +memmap[IBEX_DEV_SPI_HOST1].base, memmap[IBEX_DEV_SPI_HOST1].size);
>  create_unimplemented_device("riscv.lowrisc.ibex.i2c",
>  memmap[IBEX_DEV_I2C].base, memmap[IBEX_DEV_I2C].size);
>  create_unimplemented_device("riscv.lowrisc.ibex.pattgen",
> diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h
> index eac35ef590..00da9ded43 100644
> --- a/include/hw/riscv/opentitan.h
> +++ b/include/hw/riscv/opentitan.h
> @@ -57,8 +57,10 @@ enum {
>  IBEX_DEV_FLASH,
>  IBEX_DEV_FLASH_VIRTUAL,
>  IBEX_DEV_UART,
> +IBEX_DEV_SPI_DEVICE,
> +IBEX_DEV_SPI_HOST0,
> +IBEX_DEV_SPI_HOST1,
>  IBEX_DEV_GPIO,
> -IBEX_DEV_SPI,
>  IBEX_DEV_I2C,
>  IBEX_DEV_PATTGEN,
>  IBEX_DEV_TIMER,
> --

Regards,
Bin



[PATCH 8/8] target/i386: Support Arch LBR in CPUID enumeration

2022-02-16 Thread Yang Weijiang
If CPUID.(EAX=07H, ECX=0):EDX[19] is set to 1, the processor
supports Architectural LBRs. In this case, CPUID leaf 01CH
indicates details of the Architectural LBRs capabilities.
XSAVE support for Architectural LBRs is enumerated in
CPUID.(EAX=0DH, ECX=0FH).
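
As a small illustration of the enumeration rule above, the check reduces to testing bit 19 of the EDX value returned by leaf 7, subleaf 0. The helper name `cpuid_has_arch_lbr` is hypothetical, and the macro value is derived from the bit position stated in this message rather than copied from a header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 19 of CPUID.(EAX=07H, ECX=0):EDX, per the description above. */
#define CPUID_7_0_EDX_ARCH_LBR (1u << 19)

/* Hypothetical helper: leaf7_edx is the raw EDX of leaf 7, subleaf 0. */
static bool cpuid_has_arch_lbr(uint32_t leaf7_edx)
{
    return (leaf7_edx & CPUID_7_0_EDX_ARCH_LBR) != 0;
}
```

Only when this returns true is leaf 01CH expected to carry meaningful Architectural LBR details.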

Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e505c926b2..1092618683 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -858,7 +858,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "fsrm", NULL, NULL, NULL,
 "avx512-vp2intersect", NULL, "md-clear", NULL,
 NULL, NULL, "serialize", NULL,
-"tsx-ldtrk", NULL, NULL /* pconfig */, NULL,
+"tsx-ldtrk", NULL, NULL /* pconfig */, "arch-lbr",
 NULL, NULL, "amx-bf16", "avx512-fp16",
 "amx-tile", "amx-int8", "spec-ctrl", "stibp",
 NULL, "arch-capabilities", "core-capability", "ssbd",
@@ -5494,6 +5494,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 assert(!(*eax & ~0x1f));
 *ebx &= 0x; /* The count doesn't need to be reliable. */
 break;
+case 0x1C:
+*eax = kvm_arch_get_supported_cpuid(cs->kvm_state, 0x1C, 0, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(cs->kvm_state, 0x1C, 0, R_EBX);
+*ecx = kvm_arch_get_supported_cpuid(cs->kvm_state, 0x1C, 0, R_ECX);
+*edx = 0;
+break;
 case 0x1F:
 /* V2 Extended Topology Enumeration Leaf */
 if (env->nr_dies < 2) {
@@ -5556,6 +5562,19 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ebx = xsave_area_size(xstate, true);
 *ecx = env->features[FEAT_XSAVE_XSS_LO];
 *edx = env->features[FEAT_XSAVE_XSS_HI];
+if (kvm_enabled() && cpu->enable_pmu &&
+(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR) &&
+(*eax & CPUID_XSAVE_XSAVES)) {
+*ecx |= XSTATE_ARCH_LBR_MASK;
+} else {
+*ecx &= ~XSTATE_ARCH_LBR_MASK;
+}
+} else if (count == 0xf && kvm_enabled() && cpu->enable_pmu &&
+   (env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
+*eax = kvm_arch_get_supported_cpuid(cs->kvm_state, 0xD, 0xf, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(cs->kvm_state, 0xD, 0xf, R_EBX);
+*ecx = kvm_arch_get_supported_cpuid(cs->kvm_state, 0xD, 0xf, R_ECX);
+*edx = kvm_arch_get_supported_cpuid(cs->kvm_state, 0xD, 0xf, R_EDX);
 } else if (count < ARRAY_SIZE(x86_ext_save_areas)) {
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 
-- 
2.27.0




[PATCH 4/8] target/i386: Enable support for XSAVES based features

2022-02-16 Thread Yang Weijiang
Some new features, including Arch LBR, depend on XSAVES/XRSTORS
support. These instructions save/restore data based on the feature
bits enabled in XCR0 | XSS. This patch adds basic support for the
related CPUID enumeration and renames FEAT_XSAVE_COMP_{LO|HI} to
FEAT_XSAVE_XCR0_{LO|HI} to clearly differentiate the feature
bits in XCR0 from those in XSS.
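
The difference between the standard and the compacted save-area size, which the `compacted` parameter in the diff below selects, can be sketched as follows. The area table is made up for illustration and is not real CPUID data; `Area`, `areas` and `area_size` are hypothetical names, and alignment requirements of the real compacted format are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t offset;    /* offset in the standard (non-compacted) layout */
    uint32_t size;
} Area;

/* Made-up layout for illustration only. */
static const Area areas[] = {
    [0] = { 0, 576 },   /* base area, always present */
    [1] = { 0, 0 },     /* covered by the base area */
    [2] = { 576, 256 },
    [3] = { 960, 64 },
    [4] = { 1024, 64 },
};

/*
 * Standard layout: each enabled component sits at its fixed offset.
 * Compacted layout: enabled components are packed one after another
 * directly behind the base area.
 */
static uint32_t area_size(uint64_t mask, bool compacted)
{
    uint32_t ret = areas[0].size;
    size_t i;

    for (i = 2; i < sizeof(areas) / sizeof(areas[0]); i++) {
        if ((mask >> i) & 1) {
            uint32_t offset = compacted ? ret : areas[i].offset;
            uint32_t end = offset + areas[i].size;

            if (end > ret) {
                ret = end;
            }
        }
    }
    return ret;
}
```

With only component 4 enabled, the standard size includes the unused gap below offset 1024, while the compacted size is just the base area plus that one component.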

Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 104 +++---
 target/i386/cpu.h |  13 +-
 2 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a037bba387..496e906233 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -940,6 +940,34 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 },
 .tcg_features = TCG_XSAVE_FEATURES,
 },
+[FEAT_XSAVE_XSS_LO] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = {
+.eax = 0xD,
+.needs_ecx = true,
+.ecx = 1,
+.reg = R_ECX,
+},
+},
+[FEAT_XSAVE_XSS_HI] = {
+.type = CPUID_FEATURE_WORD,
+.cpuid = {
+.eax = 0xD,
+.needs_ecx = true,
+.ecx = 1,
+.reg = R_EDX
+},
+},
 [FEAT_6_EAX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -955,7 +983,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .cpuid = { .eax = 6, .reg = R_EAX, },
 .tcg_features = TCG_6_EAX_FEATURES,
 },
-[FEAT_XSAVE_COMP_LO] = {
+[FEAT_XSAVE_XCR0_LO] = {
 .type = CPUID_FEATURE_WORD,
 .cpuid = {
 .eax = 0xD,
@@ -968,7 +996,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 XSTATE_OPMASK_MASK | XSTATE_ZMM_Hi256_MASK | XSTATE_Hi16_ZMM_MASK |
 XSTATE_PKRU_MASK,
 },
-[FEAT_XSAVE_COMP_HI] = {
+[FEAT_XSAVE_XCR0_HI] = {
 .type = CPUID_FEATURE_WORD,
 .cpuid = {
 .eax = 0xD,
@@ -1385,6 +1413,9 @@ static const X86RegisterInfo32 
x86_reg_info_32[CPU_NB_REGS32] = {
 };
 #undef REGISTER
 
+/* CPUID feature bits available in XSS */
+#define CPUID_XSTATE_XSS_MASK    (0)
+
 ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
 [XSTATE_FP_BIT] = {
 /* x87 FP state component is always enabled if XSAVE is supported */
@@ -1427,15 +1458,18 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] 
= {
 },
 };
 
-static uint32_t xsave_area_size(uint64_t mask)
+static uint32_t xsave_area_size(uint64_t mask, bool compacted)
 {
+uint64_t ret = x86_ext_save_areas[0].size;
+const ExtSaveArea *esa;
+uint32_t offset = 0;
 int i;
-uint64_t ret = 0;
 
-for (i = 0; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
-const ExtSaveArea *esa = &x86_ext_save_areas[i];
+for (i = 2; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
+esa = &x86_ext_save_areas[i];
 if ((mask >> i) & 1) {
-ret = MAX(ret, esa->offset + esa->size);
+offset = compacted ? ret : esa->offset;
+ret = MAX(ret, offset + esa->size);
 }
 }
 return ret;
@@ -1446,10 +1480,10 @@ static inline bool accel_uses_host_cpuid(void)
 return kvm_enabled() || hvf_enabled();
 }
 
-static inline uint64_t x86_cpu_xsave_components(X86CPU *cpu)
+static inline uint64_t x86_cpu_xsave_xcr0_components(X86CPU *cpu)
 {
-return ((uint64_t)cpu->env.features[FEAT_XSAVE_COMP_HI]) << 32 |
-   cpu->env.features[FEAT_XSAVE_COMP_LO];
+return ((uint64_t)cpu->env.features[FEAT_XSAVE_XCR0_HI]) << 32 |
+   cpu->env.features[FEAT_XSAVE_XCR0_LO];
 }
 
 /* Return name of 32-bit register, from a R_* constant */
@@ -1461,6 +1495,12 @@ static const char *get_register_name_32(unsigned int reg)
 return x86_reg_info_32[reg].name;
 }
 
+static inline uint64_t x86_cpu_xsave_xss_components(X86CPU *cpu)
+{
+return ((uint64_t)cpu->env.features[FEAT_XSAVE_XSS_HI]) << 32 |
+   cpu->env.features[FEAT_XSAVE_XSS_LO];
+}
+
 /*
  * Returns the set of feature flags that are supported and migratable by
  * QEMU, for a given FeatureWord.
@@ -4628,8 +4668,8 @@ static const char *x86_cpu_feature_name(FeatureWord w, 
int bitnr)
 /* XSAVE components are automatically enabled by other features,
  * so return the original feature name instead
  */
-if (w == FEAT_XSAVE_COMP_LO || w == FEAT_XSAVE_COMP_HI) {
-int comp = (w == FEAT_XSAVE_COMP_HI) ? bitnr + 32 : bitnr;
+if (w == FEAT_XSAVE_XCR0_LO || w == FEAT_XSAVE_XCR0_HI) {
+int comp = (w == FEAT_XSAVE_XCR0_HI) ? bitnr + 32 : bitnr;
 
 if (comp < 

[PATCH 3/6] aspeed: rainier: Add i2c LED devices

2022-02-16 Thread Cédric Le Goater
From: Joel Stanley 

This helps quieten booting the current Rainier kernel.

Signed-off-by: Joel Stanley 
Signed-off-by: Cédric Le Goater 
---
 hw/arm/aspeed.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 9789a489047b..0e5e5c31d59c 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -723,6 +723,8 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
 
 aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 0), 0x51, 32 * KiB);
 
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 3), "pca9552", 0x61);
+
 /* The rainier expects a TMP275 but a TMP105 is compatible */
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), TYPE_TMP105,
  0x48);
@@ -735,11 +737,14 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 2), 0x52, 64 * KiB);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4), "pca9552", 0x60);
 
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 5), TYPE_TMP105,
  0x48);
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 5), TYPE_TMP105,
  0x49);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 5), "pca9552", 0x60);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 5), "pca9552", 0x61);
 i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 5),
   "pca9546", 0x70);
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
@@ -758,7 +763,12 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 2), 0x50, 64 * KiB);
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 3), 0x51, 64 * KiB);
 
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x30);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x31);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x32);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x33);
 /* Bus 7: TODO max31785@52 */
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x60);
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x61);
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "dps310", 0x76);
 /* Bus 7: TODO si7021-a20@20 */
@@ -773,6 +783,7 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
  0x4a);
 aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 8), 0x50, 64 * KiB);
 aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 8), 0x51, 64 * KiB);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "pca9552", 0x60);
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "pca9552", 0x61);
 /* Bus 8: ucd90320@11 */
 /* Bus 8: ucd90320@b */
@@ -794,13 +805,17 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
   "pca9546", 0x70);
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
 aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 11), "pca9552", 0x60);
 
 
 aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 13), 0x50, 64 * KiB);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 13), "pca9552", 0x60);
 
 aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 14), 0x50, 64 * KiB);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 14), "pca9552", 0x60);
 
 aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 15), 0x50, 64 * KiB);
+i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 15), "pca9552", 0x60);
 }
 
 static void get_pca9548_channels(I2CBus *bus, uint8_t mux_addr,
-- 
2.34.1




Re: [PATCH 2/3] iotests: Allow using QMP with the QSD

2022-02-16 Thread Hanna Reitz

On 15.02.22 23:19, Eric Blake wrote:

On Tue, Feb 15, 2022 at 02:57:26PM +0100, Hanna Reitz wrote:

Add a parameter to optionally open a QMP connection when creating a
QemuStorageDaemon instance.

Signed-off-by: Hanna Reitz 
---
  tests/qemu-iotests/iotests.py | 29 -
  1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6ba65eb1ff..47e3808ab9 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -39,6 +39,7 @@
  
  from qemu.machine import qtest

  from qemu.qmp import QMPMessage
+from qemu.aqmp.legacy import QEMUMonitorProtocol

I thought we were trying to get rid of aqmp.legacy usage, so this
feels like a temporary regression.  Oh well, not the end of the
testing world.


I fiddled around with the non-legacy interface and wasn’t very 
successful...  I thought since machine.py still uses qemu.aqmp.legacy 
for QEMUMachine, when one is reworked to get rid of it (if that ever 
becomes necessary), then we can just do it here, too.





      def stop(self, kill_signal=15):
          self._p.send_signal(kill_signal)
          self._p.wait()
          self._p = None
  
+        if self._qmp:
+            self._qmp.close()
+
          try:
+            if self._qmpsock is not None:
+                os.remove(self._qmpsock)
              os.remove(self.pidfile)
          except OSError:
              pass

Do we need two try: blocks here, to remove self.pidfile even if
os.remove(self._qmpsock) failed?


Honestly, no reason not to use two blocks except it being longer. You’re 
right, I should’ve just done that.
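
For illustration only, a minimal sketch of the "two blocks" variant discussed above. It is a standalone helper, not the actual stop() method; the name remove_quietly is hypothetical and does not exist in iotests.py. Each path gets its own try block, so a failure to remove the QMP socket does not prevent the pidfile from being removed:

```python
import os
import tempfile


def remove_quietly(*paths):
    """Try to remove every given path independently.

    Each os.remove() sits in its own try block, so an OSError on one
    path does not skip the remaining ones.  Paths that are None are
    ignored.  Returns the list of paths that could not be removed.
    """
    failed = []
    for path in paths:
        if path is None:
            continue
        try:
            os.remove(path)
        except OSError:
            failed.append(path)
    return failed


if __name__ == "__main__":
    # Demo: the socket path does not exist, the pidfile does.
    tmpdir = tempfile.mkdtemp()
    pidfile = os.path.join(tmpdir, "qsd.pid")
    open(pidfile, "w").close()
    qmpsock = os.path.join(tmpdir, "qsd-qmp.sock")
    print(remove_quietly(qmpsock, pidfile))
    os.rmdir(tmpdir)
```

In stop() this would correspond to something like two consecutive try/except OSError blocks, one around each os.remove() call.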



Otherwise, makes sense to me.


Thanks for reviewing!

Hanna




Re: [PATCH v3 3/3] s390x/tcg/tests: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-16 Thread David Hildenbrand
On 15.02.22 21:27, David Miller wrote:
> tests/tcg/s390x/mie3-compl.c: [N]*K instructions
> tests/tcg/s390x/mie3-mvcrl.c: MVCRL instruction
> tests/tcg/s390x/mie3-sel.c:  SELECT instruction
> 
> Signed-off-by: David Miller 
> ---
>   tests/tcg/s390x/Makefile.target |  2 +-
>   tests/tcg/s390x/mie3-compl.c| 56 +
>   tests/tcg/s390x/mie3-mvcrl.c| 31 ++
>   tests/tcg/s390x/mie3-sel.c  | 42 +
>   4 files changed, 130 insertions(+), 1 deletion(-)
>   create mode 100644 tests/tcg/s390x/mie3-compl.c
>   create mode 100644 tests/tcg/s390x/mie3-mvcrl.c
>   create mode 100644 tests/tcg/s390x/mie3-sel.c
> 
> diff --git a/tests/tcg/s390x/Makefile.target 
> b/tests/tcg/s390x/Makefile.target
> index 1a7238b4eb..16b9d45307 100644
> --- a/tests/tcg/s390x/Makefile.target
> +++ b/tests/tcg/s390x/Makefile.target
> @@ -1,6 +1,6 @@
>   S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
>   VPATH+=$(S390X_SRC)
> -CFLAGS+=-march=zEC12 -m64
> +CFLAGS+=-march=z15 -m64

Unfortunately, this makes our docker builds fail. I assume the
compiler in the container is outdated.

$ make run-tcg-tests-s390x-linux-user 
changing dir to build for make "run-tcg-tests-s390x-linux-user"...
make[1]: Entering directory '/home/dhildenb/git/qemu/build'
  GIT ui/keycodemapdb tests/fp/berkeley-testfloat-3 
tests/fp/berkeley-softfloat-3 dtc capstone slirp
  BUILD   debian10
  BUILD   debian-s390x-cross
  BUILD   TCG tests for s390x-linux-user
  CHECK   debian10
  CHECK   debian-s390x-cross
  BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
s390x-linux-gnu-gcc: error: unrecognized argument in option '-march=z15'
s390x-linux-gnu-gcc: note: valid arguments to '-march=' are: arch10 arch11 
arch12 arch3 arch5 arch6 arch7 arch8 arch9 g5 g6 native z10 z13 z14 z196 z9-109 
z9-ec z900 z990 zEC12; did you mean 'z10'?

Maybe debian11 could work.

@Thomas do you have any idea if we could get this to work with
'-march=z15' or should we work around that by manually encoding
the relevant instructions instead?

-- 
Thanks,

David / dhildenb




[PATCH 2/6] ast2600: Add Secure Boot Controller model

2022-02-16 Thread Cédric Le Goater
From: Joel Stanley 

Just a stub that indicates the system has booted in secure boot mode.
Used for testing the driver:

 https://lore.kernel.org/all/20211019080608.283324-1-j...@jms.id.au/

Signed-off-by: Joel Stanley 
Signed-off-by: Cédric Le Goater 
---
 include/hw/arm/aspeed_soc.h  |   3 +
 include/hw/misc/aspeed_sbc.h |  33 
 hw/arm/aspeed_ast2600.c  |   9 +++
 hw/misc/aspeed_sbc.c | 141 +++
 hw/misc/meson.build  |   1 +
 5 files changed, 187 insertions(+)
 create mode 100644 include/hw/misc/aspeed_sbc.h
 create mode 100644 hw/misc/aspeed_sbc.c

diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index cae9906684cb..da043dcb454d 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -24,6 +24,7 @@
 #include "hw/misc/aspeed_i3c.h"
 #include "hw/ssi/aspeed_smc.h"
 #include "hw/misc/aspeed_hace.h"
+#include "hw/misc/aspeed_sbc.h"
 #include "hw/watchdog/wdt_aspeed.h"
 #include "hw/net/ftgmac100.h"
 #include "target/arm/cpu.h"
@@ -60,6 +61,7 @@ struct AspeedSoCState {
 AspeedSMCState fmc;
 AspeedSMCState spi[ASPEED_SPIS_NUM];
 EHCISysBusState ehci[ASPEED_EHCIS_NUM];
+AspeedSBCState sbc;
 AspeedSDMCState sdmc;
 AspeedWDTState wdt[ASPEED_WDTS_NUM];
 FTGMAC100State ftgmac100[ASPEED_MACS_NUM];
@@ -109,6 +111,7 @@ enum {
 ASPEED_DEV_SDMC,
 ASPEED_DEV_SCU,
 ASPEED_DEV_ADC,
+ASPEED_DEV_SBC,
 ASPEED_DEV_VIDEO,
 ASPEED_DEV_SRAM,
 ASPEED_DEV_SDHCI,
diff --git a/include/hw/misc/aspeed_sbc.h b/include/hw/misc/aspeed_sbc.h
new file mode 100644
index 000000000000..e47333cb55db
--- /dev/null
+++ b/include/hw/misc/aspeed_sbc.h
@@ -0,0 +1,33 @@
+/*
+ * ASPEED Secure Boot Controller
+ *
+ * Copyright (C) 2021 IBM Corp.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ASPEED_SBC_H
+#define ASPEED_SBC_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_ASPEED_SBC "aspeed.sbc"
+#define TYPE_ASPEED_AST2600_SBC TYPE_ASPEED_SBC "-ast2600"
+OBJECT_DECLARE_TYPE(AspeedSBCState, AspeedSBCClass, ASPEED_SBC)
+
+#define ASPEED_SBC_NR_REGS (0x93c >> 2)
+
+struct AspeedSBCState {
+SysBusDevice parent;
+
+MemoryRegion iomem;
+
+uint32_t regs[ASPEED_SBC_NR_REGS];
+};
+
+
+struct AspeedSBCClass {
+SysBusDeviceClass parent_class;
+};
+
+#endif /* ASPEED_SBC_H */
diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index 12f6edc081fd..21cd3342c578 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -47,6 +47,7 @@ static const hwaddr aspeed_soc_ast2600_memmap[] = {
 [ASPEED_DEV_XDMA]  = 0x1E6E7000,
 [ASPEED_DEV_ADC]   = 0x1E6E9000,
 [ASPEED_DEV_DP]= 0x1E6EB000,
+[ASPEED_DEV_SBC]   = 0x1E6F2000,
 [ASPEED_DEV_VIDEO] = 0x1E700000,
 [ASPEED_DEV_SDHCI] = 0x1E740000,
 [ASPEED_DEV_EMMC]  = 0x1E750000,
@@ -227,6 +228,8 @@ static void aspeed_soc_ast2600_init(Object *obj)
 object_initialize_child(obj, "hace", &s->hace, typename);
 
 object_initialize_child(obj, "i3c", &s->i3c, TYPE_ASPEED_I3C);
+
+object_initialize_child(obj, "sbc", &s->sbc, TYPE_ASPEED_SBC);
 }
 
 /*
@@ -539,6 +542,12 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp)
 /* The AST2600 I3C controller has one IRQ per bus. */
 sysbus_connect_irq(SYS_BUS_DEVICE(&s->i3c.devices[i]), 0, irq);
 }
+
+/* Secure Boot Controller */
+if (!sysbus_realize(SYS_BUS_DEVICE(&s->sbc), errp)) {
+return;
+}
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->sbc), 0, sc->memmap[ASPEED_DEV_SBC]);
 }
 
 static void aspeed_soc_ast2600_class_init(ObjectClass *oc, void *data)
diff --git a/hw/misc/aspeed_sbc.c b/hw/misc/aspeed_sbc.c
new file mode 100644
index 000000000000..e5ac35eb975b
--- /dev/null
+++ b/hw/misc/aspeed_sbc.c
@@ -0,0 +1,141 @@
+/*
+ * ASPEED Secure Boot Controller
+ *
+ * Copyright (C) 2021 IBM Corp.
+ *
+ * Joel Stanley 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "hw/misc/aspeed_sbc.h"
+#include "qapi/error.h"
+#include "migration/vmstate.h"
+
+#define R_PROT   (0x000 / 4)
+#define R_STATUS (0x014 / 4)
+
+static uint64_t aspeed_sbc_read(void *opaque, hwaddr addr, unsigned int size)
+{
+AspeedSBCState *s = ASPEED_SBC(opaque);
+
+addr >>= 2;
+
+if (addr >= ASPEED_SBC_NR_REGS) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out-of-bounds read at offset 0x%" HWADDR_PRIx "\n",
+  __func__, addr << 2);
+return 0;
+}
+
+return s->regs[addr];
+}
+
+static void aspeed_sbc_write(void *opaque, hwaddr addr, uint64_t data,
+  unsigned int size)
+{
+AspeedSBCState *s = ASPEED_SBC(opaque);
+
+addr >>= 2;
+
+if (addr >= ASPEED_SBC_NR_REGS) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out-of-bounds write at offset 0x%" HWADDR_PRIx "\n",
+ 

[PATCH 5/8] target/i386: Add XSAVES support for Arch LBR

2022-02-16 Thread Yang Weijiang
Define Arch LBR bit in XSS and save/restore structure
for XSAVE area size calculation.

Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c |  6 +-
 target/i386/cpu.h | 23 +++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 496e906233..e505c926b2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1414,7 +1414,7 @@ static const X86RegisterInfo32 x86_reg_info_32[CPU_NB_REGS32] = {
 #undef REGISTER
 
 /* CPUID feature bits available in XSS */
-#define CPUID_XSTATE_XSS_MASK    (0)
+#define CPUID_XSTATE_XSS_MASK    (XSTATE_ARCH_LBR_MASK)
 
 ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
 [XSTATE_FP_BIT] = {
@@ -1448,6 +1448,10 @@ ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
 [XSTATE_PKRU_BIT] =
   { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
 .size = sizeof(XSavePKRU) },
+[XSTATE_ARCH_LBR_BIT] = {
+.feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_ARCH_LBR,
+.offset = 0 /*supervisor mode component, offset = 0 */,
+.size = sizeof(XSavesArchLBR) },
 [XSTATE_XTILE_CFG_BIT] = {
 .feature = FEAT_7_0_EDX, .bits = CPUID_7_0_EDX_AMX_TILE,
 .size = sizeof(XSaveXTILECFG),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1d17196a0b..07b198539b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -541,6 +541,7 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT            6
 #define XSTATE_Hi16_ZMM_BIT             7
 #define XSTATE_PKRU_BIT                 9
+#define XSTATE_ARCH_LBR_BIT             15
 #define XSTATE_XTILE_CFG_BIT            17
 #define XSTATE_XTILE_DATA_BIT           18
 
@@ -553,6 +554,7 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK           (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK            (1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK                (1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_ARCH_LBR_MASK            (1ULL << XSTATE_ARCH_LBR_BIT)
 #define XSTATE_XTILE_CFG_MASK           (1ULL << XSTATE_XTILE_CFG_BIT)
 #define XSTATE_XTILE_DATA_MASK          (1ULL << XSTATE_XTILE_DATA_BIT)
 #define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK \
@@ -867,6 +869,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_SERIALIZE (1U << 14)
 /* TSX Suspend Load Address Tracking instruction */
 #define CPUID_7_0_EDX_TSX_LDTRK (1U << 16)
+/* Architectural LBRs */
+#define CPUID_7_0_EDX_ARCH_LBR  (1U << 19)
 /* AVX512_FP16 instruction */
 #define CPUID_7_0_EDX_AVX512_FP16   (1U << 23)
 /* AMX tile (two-dimensional register) */
@@ -1386,6 +1390,24 @@ typedef struct XSaveXTILEDATA {
 uint8_t xtiledata[8][1024];
 } XSaveXTILEDATA;
 
+typedef struct {
+   uint64_t from;
+   uint64_t to;
+   uint64_t info;
+} LBR_ENTRY;
+
+#define ARCH_LBR_NR_ENTRIES            32
+
+/* Ext. save area 19: Supervisor mode Arch LBR state */
+typedef struct XSavesArchLBR {
+uint64_t lbr_ctl;
+uint64_t lbr_depth;
+uint64_t ler_from;
+uint64_t ler_to;
+uint64_t ler_info;
+LBR_ENTRY lbr_records[ARCH_LBR_NR_ENTRIES];
+} XSavesArchLBR;
+
 QEMU_BUILD_BUG_ON(sizeof(XSaveAVX) != 0x100);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDREG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveBNDCSR) != 0x40);
@@ -1395,6 +1417,7 @@ QEMU_BUILD_BUG_ON(sizeof(XSaveHi16_ZMM) != 0x400);
 QEMU_BUILD_BUG_ON(sizeof(XSavePKRU) != 0x8);
 QEMU_BUILD_BUG_ON(sizeof(XSaveXTILECFG) != 0x40);
 QEMU_BUILD_BUG_ON(sizeof(XSaveXTILEDATA) != 0x2000);
+QEMU_BUILD_BUG_ON(sizeof(XSavesArchLBR) != 0x328);
 
 typedef struct ExtSaveArea {
 uint32_t feature, bits;
-- 
2.27.0
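
The 0x328 in the QEMU_BUILD_BUG_ON above follows from the struct layout: five 64-bit header fields plus 32 records of three 64-bit fields each, i.e. 5 * 8 + 32 * 24 = 808 bytes. A small sketch that mirrors the layout with Python's ctypes to check that arithmetic (the class names below are illustrative; `from_` stands in for the C field `from`, which is a Python keyword):

```python
import ctypes

ARCH_LBR_NR_ENTRIES = 32


class LBREntry(ctypes.Structure):
    # Mirrors LBR_ENTRY from the patch: three 64-bit fields.
    _fields_ = [
        ("from_", ctypes.c_uint64),
        ("to", ctypes.c_uint64),
        ("info", ctypes.c_uint64),
    ]


class XSavesArchLBR(ctypes.Structure):
    # Mirrors XSavesArchLBR from the patch: five 64-bit header fields
    # followed by ARCH_LBR_NR_ENTRIES records.
    _fields_ = [
        ("lbr_ctl", ctypes.c_uint64),
        ("lbr_depth", ctypes.c_uint64),
        ("ler_from", ctypes.c_uint64),
        ("ler_to", ctypes.c_uint64),
        ("ler_info", ctypes.c_uint64),
        ("lbr_records", LBREntry * ARCH_LBR_NR_ENTRIES),
    ]


# Same check as QEMU_BUILD_BUG_ON(sizeof(XSavesArchLBR) != 0x328).
assert ctypes.sizeof(LBREntry) == 24
assert ctypes.sizeof(XSavesArchLBR) == 5 * 8 + ARCH_LBR_NR_ENTRIES * 24 == 0x328
```

All members are naturally aligned 64-bit values, so no padding is involved and the size is the plain sum of the fields.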




[PATCH 1/6] arm: Remove swift-bmc machine

2022-02-16 Thread Cédric Le Goater
From: Joel Stanley 

It was scheduled for removal in 7.0.

Signed-off-by: Joel Stanley 
Message-Id: <20220216080947.65955-1-j...@jms.id.au>
Signed-off-by: Cédric Le Goater 
---
 docs/about/deprecated.rst  |  7 -
 docs/system/arm/aspeed.rst |  1 -
 hw/arm/aspeed.c| 53 --
 3 files changed, 61 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 26d00812ba94..85773db631c1 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -315,13 +315,6 @@ Use the more generic event ``DEVICE_UNPLUG_GUEST_ERROR`` instead.
 System emulator machines
 
 
-Aspeed ``swift-bmc`` machine (since 6.1)
-
-
-This machine is deprecated because we have enough AST2500 based OpenPOWER
-machines. It can be easily replaced by the ``witherspoon-bmc`` or the
-``romulus-bmc`` machines.
-
 PPC 405 ``taihu`` machine (since 7.0)
 '
 
diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
index d8b102fa0ad0..60ed94f18759 100644
--- a/docs/system/arm/aspeed.rst
+++ b/docs/system/arm/aspeed.rst
@@ -22,7 +22,6 @@ AST2500 SoC based machines :
 - ``romulus-bmc``  OpenPOWER Romulus POWER9 BMC
 - ``witherspoon-bmc``  OpenPOWER Witherspoon POWER9 BMC
 - ``sonorapass-bmc``   OCP SonoraPass BMC
-- ``swift-bmc``OpenPOWER Swift BMC POWER9 (to be removed in v7.0)
 - ``fp5280g2-bmc`` Inspur FP5280G2 BMC
 - ``g220a-bmc``Bytedance G220A BMC
 
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index d911dc904fb3..9789a489047b 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -544,35 +544,6 @@ static void romulus_bmc_i2c_init(AspeedMachineState *bmc)
 i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 11), "ds1338", 0x32);
 }
 
-static void swift_bmc_i2c_init(AspeedMachineState *bmc)
-{
-AspeedSoCState *soc = &bmc->soc;
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 3), "pca9552", 0x60);
-
-/* The swift board expects a TMP275 but a TMP105 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "tmp105", 0x48);
-/* The swift board expects a pca9551 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x60);
-
-/* The swift board expects an Epson RX8900 RTC but a ds1338 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "ds1338", 0x32);
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "pca9552", 0x60);
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 9), "tmp423", 0x4c);
-/* The swift board expects a pca9539 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 9), "pca9552", 0x74);
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 10), "tmp423", 0x4c);
-/* The swift board expects a pca9539 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 10), "pca9552",
-                        0x74);
-
-/* The swift board expects a TMP275 but a TMP105 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 12), "tmp105", 0x48);
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 12), "tmp105", 0x4a);
-}
-
 static void sonorapass_bmc_i2c_init(AspeedMachineState *bmc)
 {
 AspeedSoCState *soc = &bmc->soc;
@@ -1102,26 +1073,6 @@ static void aspeed_machine_sonorapass_class_init(ObjectClass *oc, void *data)
 aspeed_soc_num_cpus(amc->soc_name);
 };
 
-static void aspeed_machine_swift_class_init(ObjectClass *oc, void *data)
-{
-MachineClass *mc = MACHINE_CLASS(oc);
-AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
-
-mc->desc   = "OpenPOWER Swift BMC (ARM1176)";
-amc->soc_name  = "ast2500-a1";
-amc->hw_strap1 = SWIFT_BMC_HW_STRAP1;
-amc->fmc_model = "mx66l1g45g";
-amc->spi_model = "mx66l1g45g";
-amc->num_cs= 2;
-amc->i2c_init  = swift_bmc_i2c_init;
-mc->default_ram_size   = 512 * MiB;
-mc->default_cpus = mc->min_cpus = mc->max_cpus =
-aspeed_soc_num_cpus(amc->soc_name);
-
-mc->deprecation_reason = "redundant system. Please use a similar "
-"OpenPOWER BMC, Witherspoon or Romulus.";
-};
-
 static void aspeed_machine_witherspoon_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -1277,10 +1228,6 @@ static const TypeInfo aspeed_machine_types[] = {
 .name  = MACHINE_TYPE_NAME("romulus-bmc"),
 .parent= TYPE_ASPEED_MACHINE,
 .class_init= aspeed_machine_romulus_class_init,
-}, {
-.name  = MACHINE_TYPE_NAME("swift-bmc"),
-.parent= TYPE_ASPEED_MACHINE,
-.class_init= aspeed_machine_swift_class_init,
 }, {
 .name  = MACHINE_TYPE_NAME("sonorapass-bmc"),
 .parent= TYPE_ASPEED_MACHINE,
-- 
2.34.1




Re: [PATCH v2] arm: Remove swift-bmc machine

2022-02-16 Thread Daniel P . Berrangé
On Wed, Feb 16, 2022 at 06:39:47PM +1030, Joel Stanley wrote:
> It was scheduled for removal in 7.0.
> 
> Signed-off-by: Joel Stanley 
> ---
> v2: also remove from docs/about/deprecated.rst
> 
>  docs/about/deprecated.rst  |  7 -
>  docs/system/arm/aspeed.rst |  1 -
>  hw/arm/aspeed.c| 53 --
>  3 files changed, 61 deletions(-)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 26d00812ba94..85773db631c1 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -315,13 +315,6 @@ Use the more generic event ``DEVICE_UNPLUG_GUEST_ERROR`` 
> instead.
>  System emulator machines
>  
>  
> -Aspeed ``swift-bmc`` machine (since 6.1)
> -
> -
> -This machine is deprecated because we have enough AST2500 based OpenPOWER
> -machines. It can be easily replaced by the ``witherspoon-bmc`` or the
> -``romulus-bmc`` machines.
> -

An equivalent note needs to be added to removed-features.rst

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PULL 00/30] Misc mostly build system patches for 2022-02-15

2022-02-16 Thread Peter Maydell
On Tue, 15 Feb 2022 at 09:35, Paolo Bonzini  wrote:
>
> The following changes since commit 2d88a3a595f1094e3ecc6cd2fd1e804634c84b0f:
>
>   Merge remote-tracking branch 'remotes/kwolf-gitlab/tags/for-upstream' into 
> staging (2022-02-14 19:54:00 +0000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 3dd33fd665e7fb041350849e35408f679dfa7383:
>
>   configure, meson: move CONFIG_IASL to a Meson option (2022-02-15 09:36:13 
> +0100)
>
> 
> * More Meson conversions (0.59.x now required rather than suggested)
> * UMIP support for TCG x86
> * Fix migration crash
> * Restore error output for check-block
>
> 

Hi; this fails to build on OpenBSD (on the tests/vm/ setup).

Meson thinks it's found OpenGL:
OpenGL support (epoxy)   : YES 1.5.4

but either it's wrong or else it's not putting the right
include directory onto the path, because the compiler
fails to find the headers:

In file included from ../src/hw/arm/virt.c:42:
In file included from
/home/qemu/qemu-test.sr5128/src/include/hw/vfio/vfio-calxeda-xgmac.h:17:
In file included from
/home/qemu/qemu-test.sr5128/src/include/hw/vfio/vfio-platform.h:20:
In file included from
/home/qemu/qemu-test.sr5128/src/include/hw/vfio/vfio-common.h:27:
/home/qemu/qemu-test.sr5128/src/include/ui/console.h:11:11: fatal
error: 'epoxy/gl.h' file not found
# include <epoxy/gl.h>
  ^~~~
1 error generated.

thanks
-- PMM



RE: [RFC v4 08/21] vfio-user: define socket receive functions

2022-02-16 Thread Thanos Makatos


> -Original Message-
> From: John Johnson 
> Sent: 16 February 2022 02:10
> To: Thanos Makatos 
> Cc: qemu-devel@nongnu.org
> Subject: Re: [RFC v4 08/21] vfio-user: define socket receive functions
> 
> 
> 
> > On Feb 15, 2022, at 6:50 AM, Thanos Makatos
>  wrote:
> >
> >>>
> >
> > On second thought, should we dump the entire header in case of such errors?
> If not by default then at least in debug builds?
> 
> 
>   I was thinking of adding qemu tracepoints in the recv and send paths
> for your other debug rfe.  Maybe I’ll add one set for the normal path that
> prints an abbreviated header, and another set for the error case that prints
> the whole header.  Would that work?

Yes that would be great.


Re: [PATCH RFCv2 2/4] i386/pc: relocate 4g start to 1T where applicable

2022-02-16 Thread Daniel P . Berrangé
On Tue, Feb 15, 2022 at 10:53:58AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > I don't know what behavior should be if firmware tries to program
> > PCI64 hole beyond supported phys-bits.
> 
> Well, you are basically f*cked.
> 
> Unfortunately there is no reliable way to figure what phys-bits actually
> is.  Because of that the firmware (both seabios and edk2) tries to place
> the pci64 hole as low as possible.
> 
> The long version:
> 
> qemu advertises phys-bits=40 to the guest by default.  Probably because
> this is what the first amd opteron processors had, assuming that it
> would be a safe default.  Then intel came, releasing processors with
> phys-bits=36, even recent (desktop-class) hardware has phys-bits=39.
> Boom.
> 
> End result is that edk2 uses a 32G pci64 window by default, which is
> placed at the first 32G border beyond normal ram.  So for virtual
> machines with up to ~ 30G ram (including reservations for memory
> hotplug) the pci64 hole covers 32G -> 64G in guest physical address
> space, which is low enough that it works on hardware with phys-bits=36.
> 
> If your VM has more than 32G of memory the pci64 hole will move and
> phys-bits=36 isn't enough any more, but given that you probably only do
> that on more beefy hosts which can take >= 64G of RAM and have a larger
> physical address space this heuristic works well enough in practice.
> 
> Changing phys-bits behavior has been discussed on and off for years.
> It's tricky to change for live migration compatibility reasons.
> 
> We got the host-phys-bits and host-phys-bits-limit properties, which
> solve some of the phys-bits problems.
> 
>  * host-phys-bits=on makes sure the phys-bits advertised to the guest
>actually works.  It's off by default though for backward
>compatibility reasons (except microvm).  Also because turning it on
>breaks live migration of machines between hosts with different
>phys-bits.

RHEL has shipped with host-phys-bits=on in its machine types
since RHEL-7. If it is good enough for RHEL machine types
for 8 years, IMHO, it is a sign that it's reasonable to do the
same with upstream for new machine types.


>  * host-phys-bits-limit can be used to tweak phys-bits to
>be lower than what the host supports.  Which can be used for
>live migration compatibility, i.e. if you have a pool of machines
>where some have 36 and some 39 you can limit phys-bits to 36 so
>live migration from 39 hosts to 36 hosts works.

RHEL machine types have set this to host-phys-bits-limit=48
since RHEL-8 days, to avoid accidentally enabling 5-level
paging in guests without explicit user opt-in.
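
To make the arithmetic in this thread concrete, here is a rough model of the heuristic as Gerd describes it. It is a sketch under assumptions, not edk2's or QEMU's actual code: all function names are made up for illustration, and "first 32G border beyond normal ram" is modelled as a plain align-up of the end of RAM.

```python
GiB = 1 << 30


def pci64_window(ram_end, window=32 * GiB):
    """Return (base, end) of a window placed at the first
    window-aligned address at or above ram_end (assumed model)."""
    base = (ram_end + window - 1) // window * window
    return base, base + window


def phys_bits_needed(end):
    """Smallest number of address bits that can address [0, end)."""
    return (end - 1).bit_length()


def guest_phys_bits(host_bits, limit=None):
    """Model of host-phys-bits=on with an optional host-phys-bits-limit:
    the guest sees the host value, clamped to the limit if one is set."""
    return host_bits if limit is None else min(host_bits, limit)


# ~30G of RAM: window covers 32G -> 64G, which fits in 36 bits.
base, end = pci64_window(30 * GiB)
assert (base, end) == (32 * GiB, 64 * GiB)
assert phys_bits_needed(end) == 36

# More RAM pushes the window up, and 36 bits no longer suffice.
assert phys_bits_needed(pci64_window(40 * GiB)[1]) == 37

# Mixed pool of 36-bit and 39-bit hosts, limited to 36 for migration.
assert guest_phys_bits(39, limit=36) == guest_phys_bits(36, limit=36) == 36
# A limit of 48 caps a larger host value (52 here is just an example).
assert guest_phys_bits(52, limit=48) == 48
```

The missing piece discussed below is that firmware currently has no reliable way to learn the value that guest_phys_bits() stands for here.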

> What is missing:
> 
>  * Some way for the firmware to get a phys-bits value it can actually
>use.  One possible way would be to have a paravirtual bit somewhere
>telling whether host-phys-bits is enabled or not.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 3/3] s390x/tcg/tests: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-16 Thread Thomas Huth

On 16/02/2022 10.17, David Hildenbrand wrote:

On 15.02.22 21:27, David Miller wrote:

...

diff --git a/tests/tcg/s390x/Makefile.target
b/tests/tcg/s390x/Makefile.target
index 1a7238b4eb..16b9d45307 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -1,6 +1,6 @@
   S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
   VPATH+=$(S390X_SRC)
-CFLAGS+=-march=zEC12 -m64
+CFLAGS+=-march=z15 -m64


Unfortunately, this makes our docker builds fail. I assume the
compiler in the container is outdated.

$ make run-tcg-tests-s390x-linux-user
changing dir to build for make "run-tcg-tests-s390x-linux-user"...
make[1]: Entering directory '/home/dhildenb/git/qemu/build'
   GIT ui/keycodemapdb tests/fp/berkeley-testfloat-3 
tests/fp/berkeley-softfloat-3 dtc capstone slirp
   BUILD   debian10
   BUILD   debian-s390x-cross
   BUILD   TCG tests for s390x-linux-user
   CHECK   debian10
   CHECK   debian-s390x-cross
   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
s390x-linux-gnu-gcc: error: unrecognized argument in option '-march=z15'
s390x-linux-gnu-gcc: note: valid arguments to '-march=' are: arch10 arch11 
arch12 arch3 arch5 arch6 arch7 arch8 arch9 g5 g6 native z10 z13 z14 z196 z9-109 
z9-ec z900 z990 zEC12; did you mean 'z10'?

Maybe debian11 could work.

@Thomas do you have any idea if we could get this to work with
'-march=z15' or should we work around that by manually encoding
the relevant instructions instead?


I'm not an expert when it comes to containers, but I think you could try to 
update to debian11 in tests/docker/dockerfiles/debian-s390x-cross.docker and 
in ./.gitlab-ci.d/container-cross.yml ... if that does not work, it's maybe 
better to manually encode the instructions.


 Thomas




Re: [PATCH v3 3/3] s390x/tcg/tests: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-16 Thread David Hildenbrand
On 15.02.22 21:27, David Miller wrote:
> tests/tcg/s390x/mie3-compl.c: [N]*K instructions
> tests/tcg/s390x/mie3-mvcrl.c: MVCRL instruction
> tests/tcg/s390x/mie3-sel.c:  SELECT instruction
> 
> Signed-off-by: David Miller 
> ---
>   tests/tcg/s390x/Makefile.target |  2 +-
>   tests/tcg/s390x/mie3-compl.c| 56 +
>   tests/tcg/s390x/mie3-mvcrl.c| 31 ++
>   tests/tcg/s390x/mie3-sel.c  | 42 +
>   4 files changed, 130 insertions(+), 1 deletion(-)
>   create mode 100644 tests/tcg/s390x/mie3-compl.c
>   create mode 100644 tests/tcg/s390x/mie3-mvcrl.c
>   create mode 100644 tests/tcg/s390x/mie3-sel.c
> 
> diff --git a/tests/tcg/s390x/Makefile.target 
> b/tests/tcg/s390x/Makefile.target
> index 1a7238b4eb..16b9d45307 100644
> --- a/tests/tcg/s390x/Makefile.target
> +++ b/tests/tcg/s390x/Makefile.target
> @@ -1,6 +1,6 @@
>   S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
>   VPATH+=$(S390X_SRC)
> -CFLAGS+=-march=zEC12 -m64
> +CFLAGS+=-march=z15 -m64
>   TESTS+=hello-s390x
>   TESTS+=csst
>   TESTS+=ipm

Your patch is missing the following hunk:

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 16b9d45307..54e67446aa 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -7,6 +7,9 @@ TESTS+=ipm
 TESTS+=exrl-trt
 TESTS+=exrl-trtr
 TESTS+=pack
+TESTS+=mie3-compl
+TESTS+=mie3-mvcrl
+TESTS+=mie3-sel
 TESTS+=mvo
 TESTS+=mvc
 TESTS+=shift


With debian11, I can build the tests. However, mie3-sel seems to have an issue:


  TEST    mie3-compl on s390x
  TEST    mie3-mvcrl on s390x
  TEST    mie3-sel on s390x
timeout: the monitored command dumped core
make[3]: *** [../Makefile.target:156: run-mie3-sel] Error 132
make[2]: *** [/home/dhildenb/git/qemu/tests/tcg/Makefile.qemu:102: 
run-guest-tests] Error 2
make[1]: *** [/home/dhildenb/git/qemu/tests/Makefile.include:59: 
run-tcg-tests-s390x-linux-user] Error 2
make[1]: Leaving directory '/home/dhildenb/git/qemu/build'
make: *** [GNUmakefile:11: run-tcg-tests-s390x-linux-user] Error 2

qemu-s390x gets killed via

"Program terminated with signal SIGILL, Illegal instruction."

-- 
Thanks,

David / dhildenb




Re: [PATCH v2 4/8] configure: Disable out-of-line atomic operations on Aarch64

2022-02-16 Thread Richard Henderson

On 2/16/22 04:01, Philippe Mathieu-Daudé via wrote:

GCC 10.1 introduced the -moutline-atomics option on Aarch64.
This option is enabled by default, and triggers a link failure:

   Undefined symbols for architecture arm64:
 "___aarch64_cas1_acq_rel", referenced from:
 _qmp_migrate_recover in migration_migration.c.o
 _cpu_atomic_cmpxchgb_mmu in accel_tcg_cputlb.c.o
 _cpu_atomic_fetch_sminb_mmu in accel_tcg_cputlb.c.o
 _cpu_atomic_fetch_uminb_mmu in accel_tcg_cputlb.c.o
 _cpu_atomic_fetch_smaxb_mmu in accel_tcg_cputlb.c.o
 _cpu_atomic_fetch_umaxb_mmu in accel_tcg_cputlb.c.o
 _cpu_atomic_smin_fetchb_mmu in accel_tcg_cputlb.c.o
 ...
 "___aarch64_ldadd4_acq_rel", referenced from:
 _multifd_recv_new_channel in migration_multifd.c.o
 _monitor_event in monitor_hmp.c.o
 _handle_hmp_command in monitor_hmp.c.o
 _colo_compare_finalize in net_colo-compare.c.o
 _flatview_unref in softmmu_memory.c.o
 _virtio_scsi_hotunplug in hw_scsi_virtio-scsi.c.o
 _tcg_register_thread in tcg_tcg.c.o
 ...
 "___aarch64_swp4_acq", referenced from:
 _qemu_spin_lock in softmmu_cpu-timers.c.o
 _cpu_get_ticks in softmmu_cpu-timers.c.o
 _qemu_spin_lock in softmmu_icount.c.o
 _cpu_exec in accel_tcg_cpu-exec.c.o
 _page_flush_tb_1.isra.0 in accel_tcg_translate-all.c.o
 _page_entry_lock in accel_tcg_translate-all.c.o
 _do_tb_phys_invalidate in accel_tcg_translate-all.c.o
 ...

QEMU implements its own atomic operations using C11 builtin helpers.
Disable the GCC out-of-line atomic ops.

Signed-off-by: Philippe Mathieu-Daudé
---
Cc: Stefan Hajnoczi
Cc: Paolo Bonzini

Clearly out of my understanding, but at least it links and the qtests
pass.
---
  configure | 12 
  1 file changed, 12 insertions(+)


These should have been supplied by libgcc.a, which we're supposed to be linking against. 
Something is wrong with your installation.



r~



Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor

2022-02-16 Thread Cédric Le Goater

On 2/16/22 02:16, Nicholas Piggin wrote:

Excerpts from Cédric Le Goater's message of February 16, 2022 4:21 am:

On 2/15/22 04:16, Nicholas Piggin wrote:

This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in, returning from the hcall, when an HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).

Signed-off-by: Nicholas Piggin 
---
   hw/ppc/spapr.c |  32 +++-
   hw/ppc/spapr_caps.c|  11 +-
   hw/ppc/spapr_hcall.c   | 321 +
   include/hw/ppc/spapr.h |  74 +-
   target/ppc/cpu.h   |   3 +
   5 files changed, 431 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3a5cf92c94..6988e3ec76 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1314,11 +1314,32 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, 
PowerPCCPU *cpu,
   {
   SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
   
-assert(lpid == 0);

+if (!cpu->in_spapr_nested) {


Since 'in_spapr_nested' is a spapr CPU characteristic, I don't think
it belongs to PowerPCCPU. See the end of the patch, for a proposal.


SpaprCpuState. Certainly that's a better place, I must have missed it.



btw, this helps the ordering of files :

[diff]
orderFile = /path/to/qemu/scripts/git.orderfile


+assert(lpid == 0);
   
-/* Copy PATE1:GR into PATE0:HR */

-entry->dw0 = spapr->patb_entry & PATE0_HR;
-entry->dw1 = spapr->patb_entry;
+/* Copy PATE1:GR into PATE0:HR */
+entry->dw0 = spapr->patb_entry & PATE0_HR;
+entry->dw1 = spapr->patb_entry;
+
+} else {
+uint64_t patb, pats;
+
+assert(lpid != 0);
+
+patb = spapr->nested_ptcr & PTCR_PATB;
+pats = spapr->nested_ptcr & PTCR_PATS;
+
+/* Calculate number of entries */
+pats = 1ull << (pats + 12 - 4);
+if (pats <= lpid) {
+return false;
+}
+
+/* Grab entry */
+patb += 16 * lpid;
+entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
+entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+}
   
   return true;

   }
@@ -4472,7 +4493,7 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
   
   static bool spapr_cpu_in_nested(PowerPCCPU *cpu)

   {
-return false;
+return cpu->in_spapr_nested;
   }
   
   static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)

@@ -4584,6 +4605,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
   nc->nmi_monitor_handler = spapr_nmi;
   smc->phb_placement = spapr_phb_placement;
   vhc->cpu_in_nested = spapr_cpu_in_nested;
+vhc->deliver_hv_excp = spapr_exit_nested;
   vhc->hypercall = emulate_spapr_hypercall;
   vhc->hpt_mask = spapr_hpt_mask;
   vhc->map_hptes = spapr_map_hptes;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 5cc80776d0..4d8bb2ad2c 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -444,19 +444,22 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState 
*spapr,
   {
   ERRP_GUARD();
   PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+CPUPPCState *env = &cpu->env;
   
   if (!val) {

   /* capability disabled by default */
   return;
   }
   
-if (tcg_enabled()) {

-error_setg(errp, "No Nested KVM-HV support in TCG");


I don't like using KVM-HV (which is KVM-over-PowerNV) when talking about
KVM-over-pseries. I think the platform name is important. Anyhow, this is
a more global discussion but we should talk about it someday because these
HV mode are becoming confusing ! We have PR also :)


The cap is nested-hv and QEMU describes it nested KVM HV. Are we stuck
with that? That could make a name change even more confusing.

It's really a new backend for the KVM HV front end. Like how POWER8 /
POWER9 bare metal backends are completely different now.

But I guess that does not help the end user to understand. On the other
hand, the user might not think "HV" is the HV mode of the CPU and just
thinks of it as "hypervisor".

I like paravirt-hv but nested-hv is not too bad. Anyway I'm happy to
change it.





+if (!(env->insns_flags2 & PPC2_ISA300)) {
+error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
   error_append_hint(errp, "Try appending -machine 
cap-nested-hv=off\n");


return ?



[PATCH v2 2/4] hyperv: Add definitions for syndbg

2022-02-16 Thread Jon Doron
Add all required definitions for hyperv synthetic debugger interface.

Signed-off-by: Jon Doron 
---
 include/hw/hyperv/hyperv-proto.h | 52 
 target/i386/kvm/hyperv-proto.h   | 37 +++
 2 files changed, 89 insertions(+)

diff --git a/include/hw/hyperv/hyperv-proto.h b/include/hw/hyperv/hyperv-proto.h
index 21dc28aee9..4a2297307b 100644
--- a/include/hw/hyperv/hyperv-proto.h
+++ b/include/hw/hyperv/hyperv-proto.h
@@ -24,12 +24,17 @@
 #define HV_STATUS_INVALID_PORT_ID 17
 #define HV_STATUS_INVALID_CONNECTION_ID   18
 #define HV_STATUS_INSUFFICIENT_BUFFERS19
+#define HV_STATUS_NOT_ACKNOWLEDGED20
+#define HV_STATUS_NO_DATA 27
 
 /*
  * Hypercall numbers
  */
 #define HV_POST_MESSAGE   0x005c
 #define HV_SIGNAL_EVENT   0x005d
+#define HV_POST_DEBUG_DATA0x0069
+#define HV_RETRIEVE_DEBUG_DATA0x006a
+#define HV_RESET_DEBUG_SESSION0x006b
 #define HV_HYPERCALL_FAST (1u << 16)
 
 /*
@@ -127,4 +132,51 @@ struct hyperv_event_flags_page {
 struct hyperv_event_flags slot[HV_SINT_COUNT];
 };
 
+/*
+ * Kernel debugger structures
+ */
+
+/* Options flags for hyperv_reset_debug_session */
+#define HV_DEBUG_PURGE_INCOMING_DATA0x0001
+#define HV_DEBUG_PURGE_OUTGOING_DATA0x0002
+struct hyperv_reset_debug_session_input {
+uint32_t options;
+} __attribute__ ((__packed__));
+
+struct hyperv_reset_debug_session_output {
+uint32_t host_ip;
+uint32_t target_ip;
+uint16_t host_port;
+uint16_t target_port;
+uint8_t host_mac[6];
+uint8_t target_mac[6];
+} __attribute__ ((__packed__));
+
+/* Options for hyperv_post_debug_data */
+#define HV_DEBUG_POST_LOOP  0x0001
+
+struct hyperv_post_debug_data_input {
+uint32_t count;
+uint32_t options;
+/*uint8_t data[HV_HYP_PAGE_SIZE - 2 * sizeof(uint32_t)];*/
+} __attribute__ ((__packed__));
+
+struct hyperv_post_debug_data_output {
+uint32_t pending_count;
+} __attribute__ ((__packed__));
+
+/* Options for hyperv_retrieve_debug_data */
+#define HV_DEBUG_RETRIEVE_LOOP  0x0001
+#define HV_DEBUG_RETRIEVE_TEST_ACTIVITY 0x0002
+
+struct hyperv_retrieve_debug_data_input {
+uint32_t count;
+uint32_t options;
+uint64_t timeout;
+} __attribute__ ((__packed__));
+
+struct hyperv_retrieve_debug_data_output {
+uint32_t retrieved_count;
+uint32_t remaining_count;
+} __attribute__ ((__packed__));
 #endif
diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
index 89f81afda7..e40e59411c 100644
--- a/target/i386/kvm/hyperv-proto.h
+++ b/target/i386/kvm/hyperv-proto.h
@@ -19,6 +19,9 @@
 #define HV_CPUID_ENLIGHTMENT_INFO 0x4004
 #define HV_CPUID_IMPLEMENT_LIMITS 0x4005
 #define HV_CPUID_NESTED_FEATURES  0x400A
+#define HV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS0x4080
+#define HV_CPUID_SYNDBG_INTERFACE   0x4081
+#define HV_CPUID_SYNDBG_PLATFORM_CAPABILITIES   0x4082
 #define HV_CPUID_MIN  0x4005
 #define HV_CPUID_MAX  0x4000
 #define HV_HYPERVISOR_PRESENT_BIT 0x8000
@@ -55,8 +58,14 @@
 #define HV_GUEST_IDLE_STATE_AVAILABLE   (1u << 5)
 #define HV_FREQUENCY_MSRS_AVAILABLE (1u << 8)
 #define HV_GUEST_CRASH_MSR_AVAILABLE(1u << 10)
+#define HV_FEATURE_DEBUG_MSRS_AVAILABLE (1u << 11)
 #define HV_STIMER_DIRECT_MODE_AVAILABLE (1u << 19)
 
+/*
+ * HV_CPUID_FEATURES.EBX bits
+ */
+#define HV_PARTITION_DEBUGGING_ALLOWED  (1u << 12)
+
 /*
  * HV_CPUID_ENLIGHTMENT_INFO.EAX bits
  */
@@ -72,6 +81,11 @@
 #define HV_ENLIGHTENED_VMCS_RECOMMENDED (1u << 14)
 #define HV_NO_NONARCH_CORESHARING   (1u << 18)
 
+/*
+ * HV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX bits
+ */
+#define HV_SYNDBG_CAP_ALLOW_KERNEL_DEBUGGING(1u << 1)
+
 /*
  * Basic virtualized MSRs
  */
@@ -130,6 +144,18 @@
 #define HV_X64_MSR_STIMER3_CONFIG   0x40B6
 #define HV_X64_MSR_STIMER3_COUNT0x40B7
 
+/*
+ * Hyper-V Synthetic debug options MSR
+ */
+#define HV_X64_MSR_SYNDBG_CONTROL   0x40F1
+#define HV_X64_MSR_SYNDBG_STATUS0x40F2
+#define HV_X64_MSR_SYNDBG_SEND_BUFFER   0x40F3
+#define HV_X64_MSR_SYNDBG_RECV_BUFFER   0x40F4
+#define HV_X64_MSR_SYNDBG_PENDING_BUFFER0x40F5
+#define HV_X64_MSR_SYNDBG_OPTIONS   0x40FF
+
+#define HV_X64_SYNDBG_OPTION_USE_HCALLS BIT(2)
+
 /*
  * Guest crash notification MSRs
  */
@@ -168,5 +194,16 @@
 
 #define HV_STIMER_COUNT   4
 
+/*
+ * Synthetic debugger control definitions
+ */
+#define HV_SYNDBG_CONTROL_SEND  (1u << 0)
+#define HV_SYNDBG_CONTROL_RECV  (1u << 1)
+#define 

[PATCH v2 4/4] hw: hyperv: Initial commit for Synthetic Debugging device

2022-02-16 Thread Jon Doron
Signed-off-by: Jon Doron 
---
 hw/hyperv/Kconfig |   5 +
 hw/hyperv/meson.build |   1 +
 hw/hyperv/syndbg.c| 402 ++
 3 files changed, 408 insertions(+)
 create mode 100644 hw/hyperv/syndbg.c

diff --git a/hw/hyperv/Kconfig b/hw/hyperv/Kconfig
index 3fbfe41c9e..fcf65903bd 100644
--- a/hw/hyperv/Kconfig
+++ b/hw/hyperv/Kconfig
@@ -11,3 +11,8 @@ config VMBUS
 bool
 default y
 depends on HYPERV
+
+config SYNDBG
+bool
+default y
+depends on VMBUS
diff --git a/hw/hyperv/meson.build b/hw/hyperv/meson.build
index 1367e2994f..b43f119ea5 100644
--- a/hw/hyperv/meson.build
+++ b/hw/hyperv/meson.build
@@ -1,3 +1,4 @@
 specific_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'))
 specific_ss.add(when: 'CONFIG_HYPERV_TESTDEV', if_true: 
files('hyperv_testdev.c'))
 specific_ss.add(when: 'CONFIG_VMBUS', if_true: files('vmbus.c'))
+specific_ss.add(when: 'CONFIG_SYNDBG', if_true: files('syndbg.c'))
diff --git a/hw/hyperv/syndbg.c b/hw/hyperv/syndbg.c
new file mode 100644
index 00..8816bc4082
--- /dev/null
+++ b/hw/hyperv/syndbg.c
@@ -0,0 +1,402 @@
+/*
+ * QEMU Hyper-V Synthetic Debugging device
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/ctype.h"
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "qemu/sockets.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "hw/loader.h"
+#include "cpu.h"
+#include "hw/hyperv/hyperv.h"
+#include "hw/hyperv/vmbus-bridge.h"
+#include "hw/hyperv/hyperv-proto.h"
+#include "net/net.h"
+#include "net/eth.h"
+#include "net/checksum.h"
+#include "trace.h"
+
+#define TYPE_HV_SYNDBG   "hv-syndbg"
+
+typedef struct HvSynDbg {
+DeviceState parent_obj;
+
+char *host_ip;
+uint16_t host_port;
+bool use_hcalls;
+
+uint32_t target_ip;
+struct sockaddr_in servaddr;
+int socket;
+bool has_data_pending;
+uint64_t pending_page_gpa;
+} HvSynDbg;
+
+#define HVSYNDBG(obj) OBJECT_CHECK(HvSynDbg, (obj), TYPE_HV_SYNDBG)
+
+/* returns NULL unless there is exactly one HV Synth debug device */
+static HvSynDbg *hv_syndbg_find(void)
+{
+/* Returns NULL unless there is exactly one hvsd device */
+return HVSYNDBG(object_resolve_path_type("", TYPE_HV_SYNDBG, NULL));
+}
+
+static void set_pending_state(HvSynDbg *syndbg, bool has_pending)
+{
+hwaddr out_len;
+void *out_data;
+
+syndbg->has_data_pending = has_pending;
+
+if (!syndbg->pending_page_gpa) {
+return;
+}
+
+out_len = 1;
+out_data = cpu_physical_memory_map(syndbg->pending_page_gpa, &out_len, 1);
+if (out_data) {
+*(uint8_t *)out_data = !!has_pending;
+cpu_physical_memory_unmap(out_data, out_len, 1, out_len);
+}
+}
+
+static bool get_udb_pkt_data(void *p, uint32_t len, uint32_t *data_ofs,
+ uint32_t *src_ip)
+{
+uint32_t offset, curr_len = len;
+
+if (curr_len < sizeof(struct eth_header) ||
+(be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto) != ETH_P_IP)) {
+return false;
+}
+offset = sizeof(struct eth_header);
+curr_len -= sizeof(struct eth_header);
+
+if (curr_len < sizeof(struct ip_header) ||
+PKT_GET_IP_HDR(p)->ip_p != IP_PROTO_UDP) {
+return false;
+}
+offset += PKT_GET_IP_HDR_LEN(p);
+curr_len -= PKT_GET_IP_HDR_LEN(p);
+
+if (curr_len < sizeof(struct udp_header)) {
+return false;
+}
+
+offset += sizeof(struct udp_header);
+*data_ofs = offset;
+*src_ip = PKT_GET_IP_HDR(p)->ip_src;
+return true;
+}
+
+static uint16_t handle_send_msg(HvSynDbg *syndbg, uint64_t ingpa,
+uint32_t count, bool is_raw,
+uint32_t *pending_count)
+{
+uint16_t ret;
+hwaddr data_len;
+void *debug_data = NULL;
+uint32_t udp_data_ofs = 0;
+const void *pkt_data;
+int sent_count;
+
+data_len = count;
+debug_data = cpu_physical_memory_map(ingpa, &data_len, 0);
+if (!debug_data || data_len < count) {
+ret = HV_STATUS_INSUFFICIENT_MEMORY;
+goto cleanup;
+}
+
+if (is_raw &&
+!get_udb_pkt_data(debug_data, count, &udp_data_ofs,
+  &syndbg->target_ip)) {
+ret = HV_STATUS_SUCCESS;
+goto cleanup;
+}
+
+pkt_data = (const void *)((uintptr_t)debug_data + udp_data_ofs);
+sent_count = qemu_sendto(syndbg->socket, pkt_data, count - udp_data_ofs,
+ MSG_NOSIGNAL, NULL, 0);
+if (sent_count == -1) {
+ret = HV_STATUS_INSUFFICIENT_MEMORY;
+goto cleanup;
+}
+
+*pending_count = count - (sent_count + udp_data_ofs);
+ret = HV_STATUS_SUCCESS;
+cleanup:
+if (debug_data) {
+cpu_physical_memory_unmap(debug_data, count, 0, data_len);
+}
+
+

[PATCH v2 1/4] hyperv: SControl is optional to enable SynIc

2022-02-16 Thread Jon Doron
SynIc can be enabled regardless of the SControl mechanism, which can
register a GSI for a given SintRoute.

This behaviour can be achieved by enabling SIMP; the guest will then
poll on the message slot.

Once there is another message pending the host will set the message slot
with the pending flag.
When the guest polls from the message slot, in case the pending flag is
set it will write to the HV_X64_MSR_EOM indicating it has cleared the
slot and we can try and push our message again.
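
A minimal sketch of the polling handshake described above, in plain C and
independent of QEMU. All names (sketch_slot_t, sketch_synic_t,
sketch_host_post, sketch_guest_poll, sketch_guest_eom) are hypothetical; they
only model the message slot, the pending flag and the EOM write. The real
code works on struct hyperv_message_page and per-vCPU callbacks, and may
differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message slot: type 0 means "empty". */
typedef struct {
    uint32_t type;
    bool pending;       /* host has another message waiting */
    uint32_t payload;
} sketch_slot_t;

/* Hypothetical host-side state: one slot plus one queued message. */
typedef struct {
    sketch_slot_t slot;
    bool has_queued;
    uint32_t queued_type;
    uint32_t queued_payload;
    unsigned eom_writes;    /* counts modelled HV_X64_MSR_EOM writes */
} sketch_synic_t;

/* Host: deliver into an empty slot, otherwise mark the slot as pending. */
static bool sketch_host_post(sketch_synic_t *s, uint32_t type,
                             uint32_t payload)
{
    assert(type != 0);
    if (s->slot.type != 0) {
        s->slot.pending = true;
        s->has_queued = true;
        s->queued_type = type;
        s->queued_payload = payload;
        return false;
    }
    s->slot.type = type;
    s->slot.payload = payload;
    s->slot.pending = false;
    return true;
}

/* Guest EOM write: tells the host the slot is free, so it can retry. */
static void sketch_guest_eom(sketch_synic_t *s)
{
    s->eom_writes++;
    if (s->has_queued) {
        s->has_queued = false;
        sketch_host_post(s, s->queued_type, s->queued_payload);
    }
}

/* Guest poll: consume the slot; signal EOM only if the pending flag was set. */
static bool sketch_guest_poll(sketch_synic_t *s, uint32_t *payload)
{
    bool was_pending;

    if (s->slot.type == 0) {
        return false;
    }
    *payload = s->slot.payload;
    was_pending = s->slot.pending;
    s->slot.type = 0;
    s->slot.pending = false;
    if (was_pending) {
        sketch_guest_eom(s);
    }
    return true;
}
```

In this model no interrupt is needed: the guest finds messages by polling,
and the EOM write is the only signal that lets the host push the next one.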

Signed-off-by: Jon Doron 
---
 hw/hyperv/hyperv.c | 109 +++--
 1 file changed, 76 insertions(+), 33 deletions(-)

diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
index cb1074f234..aaba6b4901 100644
--- a/hw/hyperv/hyperv.c
+++ b/hw/hyperv/hyperv.c
@@ -27,13 +27,16 @@ struct SynICState {
 
 CPUState *cs;
 
-bool enabled;
+bool sctl_enabled;
 hwaddr msg_page_addr;
 hwaddr event_page_addr;
 MemoryRegion msg_page_mr;
 MemoryRegion event_page_mr;
 struct hyperv_message_page *msg_page;
 struct hyperv_event_flags_page *event_page;
+
+QemuMutex sint_routes_mutex;
+QLIST_HEAD(, HvSintRoute) sint_routes;
 };
 
 #define TYPE_SYNIC "hyperv-synic"
@@ -51,11 +54,11 @@ static SynICState *get_synic(CPUState *cs)
 return SYNIC(object_resolve_path_component(OBJECT(cs), "synic"));
 }
 
-static void synic_update(SynICState *synic, bool enable,
+static void synic_update(SynICState *synic, bool sctl_enable,
  hwaddr msg_page_addr, hwaddr event_page_addr)
 {
 
-synic->enabled = enable;
+synic->sctl_enabled = sctl_enable;
 if (synic->msg_page_addr != msg_page_addr) {
 if (synic->msg_page_addr) {
 memory_region_del_subregion(get_system_memory(),
@@ -80,7 +83,7 @@ static void synic_update(SynICState *synic, bool enable,
 }
 }
 
-void hyperv_synic_update(CPUState *cs, bool enable,
+void hyperv_synic_update(CPUState *cs, bool sctl_enable,
  hwaddr msg_page_addr, hwaddr event_page_addr)
 {
 SynICState *synic = get_synic(cs);
@@ -89,7 +92,7 @@ void hyperv_synic_update(CPUState *cs, bool enable,
 return;
 }
 
-synic_update(synic, enable, msg_page_addr, event_page_addr);
+synic_update(synic, sctl_enable, msg_page_addr, event_page_addr);
 }
 
 static void synic_realize(DeviceState *dev, Error **errp)
@@ -110,16 +113,20 @@ static void synic_realize(DeviceState *dev, Error **errp)
sizeof(*synic->event_page), &error_abort);
 synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
 synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
+qemu_mutex_init(&synic->sint_routes_mutex);
+QLIST_INIT(&synic->sint_routes);
 
 g_free(msgp_name);
 g_free(eventp_name);
 }
+
 static void synic_reset(DeviceState *dev)
 {
 SynICState *synic = SYNIC(dev);
 memset(synic->msg_page, 0, sizeof(*synic->msg_page));
 memset(synic->event_page, 0, sizeof(*synic->event_page));
 synic_update(synic, false, 0, 0);
+assert(QLIST_EMPTY(&synic->sint_routes));
 }
 
 static void synic_class_init(ObjectClass *klass, void *data)
@@ -214,6 +221,7 @@ struct HvSintRoute {
 HvSintStagedMessage *staged_msg;
 
 unsigned refcount;
+QLIST_ENTRY(HvSintRoute) link;
 };
 
 static CPUState *hyperv_find_vcpu(uint32_t vp_index)
@@ -259,7 +267,7 @@ static void cpu_post_msg(CPUState *cs, run_on_cpu_data data)
 
 assert(staged_msg->state == HV_STAGED_MSG_BUSY);
 
-if (!synic->enabled || !synic->msg_page_addr) {
+if (!synic->msg_page_addr) {
 staged_msg->status = -ENXIO;
 goto posted;
 }
@@ -343,7 +351,7 @@ int hyperv_set_event_flag(HvSintRoute *sint_route, unsigned 
eventno)
 if (eventno > HV_EVENT_FLAGS_COUNT) {
 return -EINVAL;
 }
-if (!synic->enabled || !synic->event_page_addr) {
+if (!synic->sctl_enabled || !synic->event_page_addr) {
 return -ENXIO;
 }
 
@@ -364,11 +372,12 @@ int hyperv_set_event_flag(HvSintRoute *sint_route, 
unsigned eventno)
 HvSintRoute *hyperv_sint_route_new(uint32_t vp_index, uint32_t sint,
HvSintMsgCb cb, void *cb_data)
 {
-HvSintRoute *sint_route;
-EventNotifier *ack_notifier;
+HvSintRoute *sint_route = NULL;
+EventNotifier *ack_notifier = NULL;
 int r, gsi;
 CPUState *cs;
 SynICState *synic;
+bool ack_event_initialized = false;
 
 cs = hyperv_find_vcpu(vp_index);
 if (!cs) {
@@ -381,57 +390,77 @@ HvSintRoute *hyperv_sint_route_new(uint32_t vp_index, 
uint32_t sint,
 }
 
 sint_route = g_new0(HvSintRoute, 1);
-r = event_notifier_init(&sint_route->sint_set_notifier, false);
-if (r) {
-goto err;
+if (!sint_route) {
+return NULL;
 }
 
+sint_route->synic = synic;
+sint_route->sint = sint;
+sint_route->refcount = 1;
 
 ack_notifier = cb ? &sint_route->sint_ack_notifier : NULL;
 if (ack_notifier) {
 sint_route->staged_msg 

[PATCH v2 5/9] target/ppc: make vhyp get_pate method take lpid and return success

2022-02-16 Thread Nicholas Piggin
In preparation for implementing a full partition table option for
vhyp, update the get_pate method to take an lpid and return a
success/fail indicator.

The spapr implementation currently just asserts lpid is always 0
and always returns success.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr.c   | 7 ++-
 target/ppc/cpu.h | 3 ++-
 target/ppc/mmu-radix64.c | 7 ++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fd7eccbdfd..2c95a09d25 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1309,13 +1309,18 @@ void spapr_set_all_lpcrs(target_ulong value, 
target_ulong mask)
 }
 }
 
-static void spapr_get_pate(PPCVirtualHypervisor *vhyp, ppc_v3_pate_t *entry)
+static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
+   target_ulong lpid, ppc_v3_pate_t *entry)
 {
 SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
 
+assert(lpid == 0);
+
 /* Copy PATE1:GR into PATE0:HR */
 entry->dw0 = spapr->patb_entry & PATE0_HR;
 entry->dw1 = spapr->patb_entry;
+
+return true;
 }
 
 #define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 555c6b9245..c79ae74f10 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1320,7 +1320,8 @@ struct PPCVirtualHypervisorClass {
 hwaddr ptex, int n);
 void (*hpte_set_c)(PPCVirtualHypervisor *vhyp, hwaddr ptex, uint64_t pte1);
 void (*hpte_set_r)(PPCVirtualHypervisor *vhyp, hwaddr ptex, uint64_t pte1);
-void (*get_pate)(PPCVirtualHypervisor *vhyp, ppc_v3_pate_t *entry);
+bool (*get_pate)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
+ target_ulong lpid, ppc_v3_pate_t *entry);
 target_ulong (*encode_hpt_for_kvm_pr)(PPCVirtualHypervisor *vhyp);
 void (*cpu_exec_enter)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
 void (*cpu_exec_exit)(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu);
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index 5535f0fe20..3b6d75a292 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -563,7 +563,12 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr 
eaddr,
 if (cpu->vhyp) {
 PPCVirtualHypervisorClass *vhc;
 vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
-vhc->get_pate(cpu->vhyp, &pate);
+if (!vhc->get_pate(cpu->vhyp, cpu, lpid, &pate)) {
+if (guest_visible) {
+ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, 
DSISR_R_BADCONFIG);
+}
+return false;
+}
 } else {
 if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
 if (guest_visible) {
-- 
2.23.0




[PATCH v2 1/9] target/ppc: raise HV interrupts for partition table entry problems

2022-02-16 Thread Nicholas Piggin
Invalid or missing partition table entry exceptions should cause HV
interrupts. HDSISR is set to bad MMU config, which is consistent with
the ISA and experimentally matches what POWER9 generates.

Reviewed-by: Fabiano Rosas 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/mmu-radix64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index d4e16bd7db..df2fec80ce 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -556,13 +556,13 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr 
eaddr,
 } else {
 if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
 if (guest_visible) {
-ppc_radix64_raise_si(cpu, access_type, eaddr, DSISR_NOPTE);
+ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, 
DSISR_R_BADCONFIG);
 }
 return false;
 }
 if (!validate_pate(cpu, lpid, &pate)) {
 if (guest_visible) {
-ppc_radix64_raise_si(cpu, access_type, eaddr, 
DSISR_R_BADCONFIG);
+ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr, 
DSISR_R_BADCONFIG);
 }
 return false;
 }
-- 
2.23.0




[PATCH v2 3/9] ppc: allow the hdecr timer to be created/destroyed

2022-02-16 Thread Nicholas Piggin
Machines which don't emulate the HDEC facility are able to use the
timer for something else. Provide functions to start and stop the
hdecr timer.

Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 21 +
 include/hw/ppc/ppc.h |  3 +++
 2 files changed, 24 insertions(+)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index c6dfc5975f..ad64015551 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1083,6 +1083,27 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t 
freq)
 return &cpu_ppc_set_tb_clk;
 }
 
+/* cpu_ppc_hdecr_init may be used if the timer is not used by HDEC emulation */
+void cpu_ppc_hdecr_init(CPUPPCState *env)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+
+assert(env->tb_env->hdecr_timer == NULL);
+
+env->tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, 
&cpu_ppc_hdecr_cb,
+ cpu);
+}
+
+void cpu_ppc_hdecr_exit(CPUPPCState *env)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+
+timer_free(env->tb_env->hdecr_timer);
+env->tb_env->hdecr_timer = NULL;
+
+cpu_ppc_hdecr_lower(cpu);
+}
+
 /*/
 /* PowerPC 40x timers */
 
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index 93e614cffd..b0ba4bd6b9 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -54,6 +54,9 @@ struct ppc_tb_t {
 
 uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset);
 clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq);
+void cpu_ppc_hdecr_init(CPUPPCState *env);
+void cpu_ppc_hdecr_exit(CPUPPCState *env);
+
 /* Embedded PowerPC DCR management */
 typedef uint32_t (*dcr_read_cb)(void *opaque, int dcrn);
 typedef void (*dcr_write_cb)(void *opaque, int dcrn, uint32_t val);
-- 
2.23.0




[PATCH v2 9/9] spapr: implement nested-hv capability for the virtual hypervisor

2022-02-16 Thread Nicholas Piggin
This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in, returning from the hcall, when an HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).

Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr.c  |  37 +++-
 hw/ppc/spapr_caps.c |  14 +-
 hw/ppc/spapr_hcall.c| 333 
 include/hw/ppc/spapr.h  |  74 ++-
 include/hw/ppc/spapr_cpu_core.h |   5 +
 5 files changed, 452 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6fab70767f..87e68da77f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1270,6 +1270,8 @@ static void emulate_spapr_hypercall(PPCVirtualHypervisor 
*vhyp,
 /* The TCG path should also be holding the BQL at this point */
 g_assert(qemu_mutex_iothread_locked());
 
+g_assert(!vhyp_cpu_in_nested(cpu));
+
 if (msr_pr) {
 hcall_dprintf("Hypercall made with MSR[PR]=1\n");
 env->gpr[3] = H_PRIVILEGE;
@@ -1313,12 +1315,34 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, 
PowerPCCPU *cpu,
target_ulong lpid, ppc_v3_pate_t *entry)
 {
 SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
+SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
 
-assert(lpid == 0);
+if (!spapr_cpu->in_nested) {
+assert(lpid == 0);
 
-/* Copy PATE1:GR into PATE0:HR */
-entry->dw0 = spapr->patb_entry & PATE0_HR;
-entry->dw1 = spapr->patb_entry;
+/* Copy PATE1:GR into PATE0:HR */
+entry->dw0 = spapr->patb_entry & PATE0_HR;
+entry->dw1 = spapr->patb_entry;
+
+} else {
+uint64_t patb, pats;
+
+assert(lpid != 0);
+
+patb = spapr->nested_ptcr & PTCR_PATB;
+pats = spapr->nested_ptcr & PTCR_PATS;
+
+/* Calculate number of entries */
+pats = 1ull << (pats + 12 - 4);
+if (pats <= lpid) {
+return false;
+}
+
+/* Grab entry */
+patb += 16 * lpid;
+entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
+entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+}
 
 return true;
 }
@@ -4472,7 +4496,9 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
 
 static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
 {
-return false;
+SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+
+return spapr_cpu->in_nested;
 }
 
 static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
@@ -4584,6 +4610,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 nc->nmi_monitor_handler = spapr_nmi;
 smc->phb_placement = spapr_phb_placement;
 vhc->cpu_in_nested = spapr_cpu_in_nested;
+vhc->deliver_hv_excp = spapr_exit_nested;
 vhc->hypercall = emulate_spapr_hypercall;
 vhc->hpt_mask = spapr_hpt_mask;
 vhc->map_hptes = spapr_map_hptes;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e2412aaa57..6d74345930 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -444,19 +444,23 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState 
*spapr,
 {
 ERRP_GUARD();
 PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+CPUPPCState *env = &cpu->env;
 
 if (!val) {
 /* capability disabled by default */
 return;
 }
 
-if (tcg_enabled()) {
-error_setg(errp, "No Nested KVM-HV support in TCG");
+if (!(env->insns_flags2 & PPC2_ISA300)) {
+error_setg(errp, "Nested-HV only supported on POWER9 and later");
 error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
-} else if (kvm_enabled()) {
+return;
+}
+
+if (kvm_enabled()) {
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
   spapr->max_compat_pvr)) {
-error_setg(errp, "Nested KVM-HV only supported on POWER9");
+error_setg(errp, "Nested-HV only supported on POWER9 and later");
 error_append_hint(errp,
   "Try appending -machine 
max-cpu-compat=power9\n");
 return;
@@ -464,7 +468,7 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState 
*spapr,
 
 if (!kvmppc_has_cap_nested_kvm_hv()) {
 error_setg(errp,
-   "KVM implementation does not support Nested KVM-HV");
+   "KVM 

[PATCH v2 6/9] target/ppc: add helper for books vhyp hypercall handler

2022-02-16 Thread Nicholas Piggin
The virtual hypervisor currently always intercepts and handles
hypercalls but with a future change this will not always be the case.

Add a helper for the test so the logic is abstracted from the mechanism.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index fcc83a7701..6b6ec71bc2 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1278,6 +1278,18 @@ static void powerpc_excp_booke(PowerPCCPU *cpu, int excp)
 }
 
 #ifdef TARGET_PPC64
+/*
+ * When running under vhyp, hcalls are always intercepted and sent to the
+ * vhc->hypercall handler.
+ */
+static bool books_vhyp_handles_hcall(PowerPCCPU *cpu)
+{
+if (cpu->vhyp) {
+return true;
+}
+return false;
+}
+
 static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
 {
 CPUState *cs = CPU(cpu);
@@ -1439,7 +1451,7 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
 env->nip += 4;
 
 /* "PAPR mode" built-in hypercall emulation */
-if ((lev == 1) && cpu->vhyp) {
+if ((lev == 1) && books_vhyp_handles_hcall(cpu)) {
 PPCVirtualHypervisorClass *vhc =
 PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
 vhc->hypercall(cpu->vhyp, cpu);
-- 
2.23.0




Re: [PATCH v2] arm: Remove swift-bmc machine

2022-02-16 Thread Cédric Le Goater

On 2/16/22 09:09, Joel Stanley wrote:

It was scheduled for removal in 7.0.

Signed-off-by: Joel Stanley 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
v2: also remove from docs/about/deprecated.rst

  docs/about/deprecated.rst  |  7 -
  docs/system/arm/aspeed.rst |  1 -
  hw/arm/aspeed.c| 53 --
  3 files changed, 61 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 26d00812ba94..85773db631c1 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -315,13 +315,6 @@ Use the more generic event ``DEVICE_UNPLUG_GUEST_ERROR`` 
instead.
  System emulator machines
  
  
-Aspeed ``swift-bmc`` machine (since 6.1)

-
-
-This machine is deprecated because we have enough AST2500 based OpenPOWER
-machines. It can be easily replaced by the ``witherspoon-bmc`` or the
-``romulus-bmc`` machines.
-
  PPC 405 ``taihu`` machine (since 7.0)
  '
  
diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst

index d8b102fa0ad0..60ed94f18759 100644
--- a/docs/system/arm/aspeed.rst
+++ b/docs/system/arm/aspeed.rst
@@ -22,7 +22,6 @@ AST2500 SoC based machines :
  - ``romulus-bmc``  OpenPOWER Romulus POWER9 BMC
  - ``witherspoon-bmc``  OpenPOWER Witherspoon POWER9 BMC
  - ``sonorapass-bmc``   OCP SonoraPass BMC
-- ``swift-bmc``OpenPOWER Swift BMC POWER9 (to be removed in v7.0)
  - ``fp5280g2-bmc`` Inspur FP5280G2 BMC
  - ``g220a-bmc``Bytedance G220A BMC
  
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c

index d911dc904fb3..9789a489047b 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -544,35 +544,6 @@ static void romulus_bmc_i2c_init(AspeedMachineState *bmc)
  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 11), "ds1338", 
0x32);
  }
  
-static void swift_bmc_i2c_init(AspeedMachineState *bmc)

-{
-AspeedSoCState *soc = &bmc->soc;
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 3), "pca9552", 0x60);
-
-/* The swift board expects a TMP275 but a TMP105 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "tmp105", 0x48);
-/* The swift board expects a pca9551 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9552", 0x60);
-
-/* The swift board expects an Epson RX8900 RTC but a ds1338 is compatible 
*/
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "ds1338", 0x32);
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), "pca9552", 0x60);
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 9), "tmp423", 0x4c);
-/* The swift board expects a pca9539 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 9), "pca9552", 0x74);
-
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 10), "tmp423", 0x4c);
-/* The swift board expects a pca9539 but a pca9552 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 10), "pca9552",
- 0x74);
-
-/* The swift board expects a TMP275 but a TMP105 is compatible */
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 12), "tmp105", 0x48);
-i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 12), "tmp105", 0x4a);
-}
-
  static void sonorapass_bmc_i2c_init(AspeedMachineState *bmc)
  {
  AspeedSoCState *soc = >soc;
@@ -1102,26 +1073,6 @@ static void 
aspeed_machine_sonorapass_class_init(ObjectClass *oc, void *data)
  aspeed_soc_num_cpus(amc->soc_name);
  };
  
-static void aspeed_machine_swift_class_init(ObjectClass *oc, void *data)
-{
-MachineClass *mc = MACHINE_CLASS(oc);
-AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
-
-mc->desc   = "OpenPOWER Swift BMC (ARM1176)";
-amc->soc_name  = "ast2500-a1";
-amc->hw_strap1 = SWIFT_BMC_HW_STRAP1;
-amc->fmc_model = "mx66l1g45g";
-amc->spi_model = "mx66l1g45g";
-amc->num_cs= 2;
-amc->i2c_init  = swift_bmc_i2c_init;
-mc->default_ram_size   = 512 * MiB;
-mc->default_cpus = mc->min_cpus = mc->max_cpus =
-aspeed_soc_num_cpus(amc->soc_name);
-
-mc->deprecation_reason = "redundant system. Please use a similar "
-"OpenPOWER BMC, Witherspoon or Romulus.";
-};
-
  static void aspeed_machine_witherspoon_class_init(ObjectClass *oc, void *data)
  {
  MachineClass *mc = MACHINE_CLASS(oc);
@@ -1277,10 +1228,6 @@ static const TypeInfo aspeed_machine_types[] = {
  .name  = MACHINE_TYPE_NAME("romulus-bmc"),
  .parent= TYPE_ASPEED_MACHINE,
  .class_init= aspeed_machine_romulus_class_init,
-}, {
-.name  = MACHINE_TYPE_NAME("swift-bmc"),
-.parent= TYPE_ASPEED_MACHINE,
-.class_init= aspeed_machine_swift_class_init,
  }, {
  .name  = MACHINE_TYPE_NAME("sonorapass-bmc"),
  .parent= TYPE_ASPEED_MACHINE,





[PULL v2 29/35] hw/intc: Add RISC-V AIA APLIC device emulation

2022-02-16 Thread Alistair Francis
From: Anup Patel 

The RISC-V AIA (Advanced Interrupt Architecture) defines a new
interrupt controller for wired interrupts called APLIC (Advanced
Platform Level Interrupt Controller). The APLIC is capable of
forwarding wired interrupts to RISC-V HARTs directly or as MSIs
(Message Signaled Interrupts).

This patch adds device emulation for RISC-V AIA APLIC.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Frank Chang 
Message-id: 20220204174700.534953-19-a...@brainfault.org
Signed-off-by: Alistair Francis 
---
 include/hw/intc/riscv_aplic.h |  79 +++
 hw/intc/riscv_aplic.c | 978 ++
 hw/intc/Kconfig   |   3 +
 hw/intc/meson.build   |   1 +
 4 files changed, 1061 insertions(+)
 create mode 100644 include/hw/intc/riscv_aplic.h
 create mode 100644 hw/intc/riscv_aplic.c

diff --git a/include/hw/intc/riscv_aplic.h b/include/hw/intc/riscv_aplic.h
new file mode 100644
index 00..de8532fbc3
--- /dev/null
+++ b/include/hw/intc/riscv_aplic.h
@@ -0,0 +1,79 @@
+/*
+ * RISC-V APLIC (Advanced Platform Level Interrupt Controller) interface
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef HW_RISCV_APLIC_H
+#define HW_RISCV_APLIC_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+
+#define TYPE_RISCV_APLIC "riscv.aplic"
+
+typedef struct RISCVAPLICState RISCVAPLICState;
+DECLARE_INSTANCE_CHECKER(RISCVAPLICState, RISCV_APLIC, TYPE_RISCV_APLIC)
+
+#define APLIC_MIN_SIZE0x4000
+#define APLIC_SIZE_ALIGN(__x) (((__x) + (APLIC_MIN_SIZE - 1)) & \
+   ~(APLIC_MIN_SIZE - 1))
+#define APLIC_SIZE(__num_harts)   (APLIC_MIN_SIZE + \
+   APLIC_SIZE_ALIGN(32 * (__num_harts)))
+
+struct RISCVAPLICState {
+/*< private >*/
+SysBusDevice parent_obj;
+qemu_irq *external_irqs;
+
+/*< public >*/
+MemoryRegion mmio;
+uint32_t bitfield_words;
+uint32_t domaincfg;
+uint32_t mmsicfgaddr;
+uint32_t mmsicfgaddrH;
+uint32_t smsicfgaddr;
+uint32_t smsicfgaddrH;
+uint32_t genmsi;
+uint32_t *sourcecfg;
+uint32_t *state;
+uint32_t *target;
+uint32_t *idelivery;
+uint32_t *iforce;
+uint32_t *ithreshold;
+
+/* topology */
+#define QEMU_APLIC_MAX_CHILDREN16
+struct RISCVAPLICState *parent;
+struct RISCVAPLICState *children[QEMU_APLIC_MAX_CHILDREN];
+uint16_t num_children;
+
+/* config */
+uint32_t aperture_size;
+uint32_t hartid_base;
+uint32_t num_harts;
+uint32_t iprio_mask;
+uint32_t num_irqs;
+bool msimode;
+bool mmode;
+};
+
+void riscv_aplic_add_child(DeviceState *parent, DeviceState *child);
+
+DeviceState *riscv_aplic_create(hwaddr addr, hwaddr size,
+uint32_t hartid_base, uint32_t num_harts, uint32_t num_sources,
+uint32_t iprio_bits, bool msimode, bool mmode, DeviceState *parent);
+
+#endif
diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
new file mode 100644
index 00..e7809fb6b2
--- /dev/null
+++ b/hw/intc/riscv_aplic.c
@@ -0,0 +1,978 @@
+/*
+ * RISC-V APLIC (Advanced Platform Level Interrupt Controller)
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/error-report.h"
+#include "qemu/bswap.h"
+#include "exec/address-spaces.h"
+#include "hw/sysbus.h"
+#include "hw/pci/msi.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "hw/intc/riscv_aplic.h"
+#include "hw/irq.h"
+#include "target/riscv/cpu.h"
+#include "sysemu/sysemu.h"
+#include "migration/vmstate.h"
+
+#define APLIC_MAX_IDC  (1UL << 14)
+#define 
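
The APLIC_SIZE macros in the header above determine how large an MMIO aperture a board reserves for one APLIC. A minimal standalone sketch of that arithmetic follows; the function names are hypothetical, and the "32 bytes per hart" reading is taken only from the macro text, not from the rest of the patch:

```c
#include <stdint.h>

#define APLIC_MIN_SIZE 0x4000u /* 16 KiB, same value as in the header */

/* Round x up to the next multiple of APLIC_MIN_SIZE (a power of two). */
static inline uint32_t aplic_size_align(uint32_t x)
{
    return (x + (APLIC_MIN_SIZE - 1)) & ~(APLIC_MIN_SIZE - 1);
}

/*
 * Total aperture: one fixed 16 KiB block plus 32 bytes per hart,
 * with the per-hart part rounded up to a 16 KiB multiple.
 */
static inline uint32_t aplic_size(uint32_t num_harts)
{
    return APLIC_MIN_SIZE + aplic_size_align(32 * num_harts);
}
```

With these definitions, any hart count from 1 to 512 yields a 32 KiB aperture, and the 513th hart pushes it to 48 KiB.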

Re: [PATCH] hw/virtio: vdpa: Fix leak of host-notifier memory-region

2022-02-16 Thread Stefano Garzarella

On Fri, Feb 11, 2022 at 06:02:59PM +0100, Laurent Vivier wrote:

If the call to virtio_queue_set_host_notifier_mr() fails, the
host-notifier memory region should be freed.

This problem can trigger a coredump with some vDPA drivers (mlx5,
but not with the vdpasim), if we unplug the virtio-net card from
the guest after a stop/start.

The same fix has been done for vhost-user:
 1f89d3b91e3e ("hw/virtio: Fix leak of host-notifier memory-region")

Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if possible")
Cc: jasow...@redhat.com
Resolves: https://bugzilla.redhat.com/2027208
Signed-off-by: Laurent Vivier 
---
hw/virtio/vhost-vdpa.c | 1 +
1 file changed, 1 insertion(+)


Reviewed-by: Stefano Garzarella 



diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 04ea43704f5d..11f696468dc1 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -431,6 +431,7 @@ static int vhost_vdpa_host_notifier_init(struct vhost_dev 
*dev, int queue_index)
g_free(name);

if (virtio_queue_set_host_notifier_mr(vdev, queue_index, >mr, true)) {
+object_unparent(OBJECT(>mr));
munmap(addr, page_size);
goto err;
}
--
2.34.1







[PULL v2 32/35] target/riscv: add support for svnapot extension

2022-02-16 Thread Alistair Francis
From: Weiwei Li 

- add PTE_N bit
- add PTE_N bit check for inner PTE
- update address translation to support 64KiB continuous region (napot_bits = 4)

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Anup Patel 
Reviewed-by: Alistair Francis 
Message-Id: <20220204022658.18097-4-liwei...@iscas.ac.cn>
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_bits.h   |  1 +
 target/riscv/cpu.c|  2 ++
 target/riscv/cpu_helper.c | 18 +++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index b3489cbc10..37ed4da72c 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -561,6 +561,7 @@ typedef enum {
 #define PTE_A   0x040 /* Accessed */
 #define PTE_D   0x080 /* Dirty */
 #define PTE_SOFT0x300 /* Reserved for Software */
+#define PTE_N   0x8000ULL /* NAPOT translation */
 
 /* Page table PPN shift amount */
 #define PTE_PPN_SHIFT   10
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 9dce57a380..fda99c2a81 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -774,6 +774,8 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
 DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
 
+DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
+
 DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
 DEFINE_PROP_BOOL("zbb", RISCVCPU, cfg.ext_zbb, true),
 DEFINE_PROP_BOOL("zbc", RISCVCPU, cfg.ext_zbc, true),
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 25ebc76725..437c9488a6 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -753,6 +753,8 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
 bool use_background = false;
 hwaddr ppn;
 RISCVCPU *cpu = env_archcpu(env);
+int napot_bits = 0;
+target_ulong napot_mask;
 
 /*
  * Check if we should use the background registers for the two
@@ -937,7 +939,7 @@ restart:
 return TRANSLATE_FAIL;
 } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
 /* Inner PTE, continue walking */
-if (pte & (PTE_D | PTE_A | PTE_U)) {
+if (pte & (PTE_D | PTE_A | PTE_U | PTE_N)) {
 return TRANSLATE_FAIL;
 }
 base = ppn << PGSHIFT;
@@ -1013,8 +1015,18 @@ restart:
 /* for superpage mappings, make a fake leaf PTE for the TLB's
benefit. */
 target_ulong vpn = addr >> PGSHIFT;
-*physical = ((ppn | (vpn & ((1L << ptshift) - 1))) << PGSHIFT) |
-(addr & ~TARGET_PAGE_MASK);
+
+if (cpu->cfg.ext_svnapot && (pte & PTE_N)) {
+napot_bits = ctzl(ppn) + 1;
+if ((i != (levels - 1)) || (napot_bits != 4)) {
+return TRANSLATE_FAIL;
+}
+}
+
+napot_mask = (1 << napot_bits) - 1;
+*physical = (((ppn & ~napot_mask) | (vpn & napot_mask) |
+  (vpn & (((target_ulong)1 << ptshift) - 1))
+ ) << PGSHIFT) | (addr & ~TARGET_PAGE_MASK);
 
 /* set permissions on the TLB entry */
 if ((pte & PTE_R) || ((pte & PTE_X) && mxr)) {
-- 
2.34.1




[PULL v2 21/35] target/riscv: Implement AIA CSRs for 64 local interrupts on RV32

2022-02-16 Thread Alistair Francis
From: Anup Patel 

The AIA specification adds new CSRs for RV32 so that RISC-V hart can
support 64 local interrupts on both RV32 and RV64.

Signed-off-by: Anup Patel 
Signed-off-by: Anup Patel 
Reviewed-by: Alistair Francis 
Reviewed-by: Frank Chang 
Message-id: 20220204174700.534953-11-a...@brainfault.org
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h|  14 +-
 target/riscv/cpu_helper.c |  10 +-
 target/riscv/csr.c| 560 +++---
 target/riscv/machine.c|  10 +-
 4 files changed, 474 insertions(+), 120 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 89e9cc558d..2dc2485bb4 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -172,12 +172,12 @@ struct CPURISCVState {
  */
 uint64_t mstatus;
 
-target_ulong mip;
+uint64_t mip;
 
-uint32_t miclaim;
+uint64_t miclaim;
 
-target_ulong mie;
-target_ulong mideleg;
+uint64_t mie;
+uint64_t mideleg;
 
 target_ulong satp;   /* since: priv-1.10.0 */
 target_ulong stval;
@@ -199,7 +199,7 @@ struct CPURISCVState {
 /* Hypervisor CSRs */
 target_ulong hstatus;
 target_ulong hedeleg;
-target_ulong hideleg;
+uint64_t hideleg;
 target_ulong hcounteren;
 target_ulong htval;
 target_ulong htinst;
@@ -456,8 +456,8 @@ void riscv_cpu_list(void);
 #ifndef CONFIG_USER_ONLY
 bool riscv_cpu_exec_interrupt(CPUState *cs, int interrupt_request);
 void riscv_cpu_swap_hypervisor_regs(CPURISCVState *env);
-int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint32_t interrupts);
-uint32_t riscv_cpu_update_mip(RISCVCPU *cpu, uint32_t mask, uint32_t value);
+int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint64_t interrupts);
+uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t mask, uint64_t value);
 #define BOOL_TO_MASK(x) (-!!(x)) /* helper for riscv_cpu_update_mip value */
 void riscv_cpu_set_rdtime_fn(CPURISCVState *env, uint64_t (*fn)(uint32_t),
  uint32_t arg);
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 1a9534d6d7..430060dcd8 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -585,7 +585,7 @@ bool riscv_cpu_two_stage_lookup(int mmu_idx)
 return mmu_idx & TB_FLAGS_PRIV_HYP_ACCESS_MASK;
 }
 
-int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint32_t interrupts)
+int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint64_t interrupts)
 {
 CPURISCVState *env = >env;
 if (env->miclaim & interrupts) {
@@ -596,11 +596,11 @@ int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint32_t 
interrupts)
 }
 }
 
-uint32_t riscv_cpu_update_mip(RISCVCPU *cpu, uint32_t mask, uint32_t value)
+uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t mask, uint64_t value)
 {
 CPURISCVState *env = >env;
 CPUState *cs = CPU(cpu);
-uint32_t gein, vsgein = 0, old = env->mip;
+uint64_t gein, vsgein = 0, old = env->mip;
 bool locked = false;
 
 if (riscv_cpu_virt_enabled(env)) {
@@ -1306,7 +1306,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
  */
 bool async = !!(cs->exception_index & RISCV_EXCP_INT_FLAG);
 target_ulong cause = cs->exception_index & RISCV_EXCP_INT_MASK;
-target_ulong deleg = async ? env->mideleg : env->medeleg;
+uint64_t deleg = async ? env->mideleg : env->medeleg;
 target_ulong tval = 0;
 target_ulong htval = 0;
 target_ulong mtval2 = 0;
@@ -1373,7 +1373,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 cause < TARGET_LONG_BITS && ((deleg >> cause) & 1)) {
 /* handle the trap in S-mode */
 if (riscv_has_ext(env, RVH)) {
-target_ulong hdeleg = async ? env->hideleg : env->hedeleg;
+uint64_t hdeleg = async ? env->hideleg : env->hedeleg;
 
 if (riscv_cpu_virt_enabled(env) && ((hdeleg >> cause) & 1)) {
 /* Trap to VS mode */
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index b23195b479..d8283160b1 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -158,6 +158,15 @@ static RISCVException any32(CPURISCVState *env, int csrno)
 
 }
 
+static int aia_any32(CPURISCVState *env, int csrno)
+{
+if (!riscv_feature(env, RISCV_FEATURE_AIA)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+return any32(env, csrno);
+}
+
 static RISCVException smode(CPURISCVState *env, int csrno)
 {
 if (riscv_has_ext(env, RVS)) {
@@ -167,6 +176,24 @@ static RISCVException smode(CPURISCVState *env, int csrno)
 return RISCV_EXCP_ILLEGAL_INST;
 }
 
+static int smode32(CPURISCVState *env, int csrno)
+{
+if (riscv_cpu_mxl(env) != MXL_RV32) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+return smode(env, csrno);
+}
+
+static int aia_smode32(CPURISCVState *env, int csrno)
+{
+if (!riscv_feature(env, RISCV_FEATURE_AIA)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+return smode32(env, csrno);
+}
+
 static RISCVException hmode(CPURISCVState *env, int csrno)
 {
 if (riscv_has_ext(env, RVS) &&
@@ 
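
Widening mip, mie and the delegation registers to uint64_t means an RV32 guest has to reach the upper 32 bits through separate high-half CSRs. A minimal sketch of how such a 64-bit field can be read and updated in 32-bit halves is shown below; the helper names are hypothetical and this is not the code from csr.c, which is cut off above:

```c
#include <stdint.h>

/* Low half, as seen through the existing 32-bit CSR on RV32. */
static inline uint32_t csr64_read_lo(uint64_t reg)
{
    return (uint32_t)reg;
}

/* High half, as seen through the additional high-half CSR on RV32. */
static inline uint32_t csr64_read_hi(uint64_t reg)
{
    return (uint32_t)(reg >> 32);
}

/*
 * Read-modify-write of the high half: only bits set in wr_mask are
 * replaced by the corresponding bits of new_val; the low half is kept.
 */
static inline uint64_t csr64_write_hi(uint64_t reg, uint32_t new_val,
                                      uint32_t wr_mask)
{
    uint64_t mask = (uint64_t)wr_mask << 32;
    uint64_t val = (uint64_t)new_val << 32;

    return (reg & ~mask) | (val & mask);
}
```

Keeping one 64-bit backing field and shifting at the access boundary lets the RV64 path use the same storage unchanged.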

[PATCH 1/8] qdev-properties: Add a new macro with bitmask check for uint64_t property

2022-02-16 Thread Yang Weijiang
The DEFINE_PROP_UINT64_CHECKMASK macro applies a bitmask check against the
user-supplied property value and rejects the value if it violates the bitmask.

Co-developed-by: Like Xu 
Signed-off-by: Like Xu 
Signed-off-by: Yang Weijiang 
---
 hw/core/qdev-properties.c| 19 +++
 include/hw/qdev-properties.h | 12 
 2 files changed, 31 insertions(+)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index c34aac6ebc..27566e5ef7 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -428,6 +428,25 @@ const PropertyInfo qdev_prop_int64 = {
 .set_default_value = qdev_propinfo_set_default_value_int,
 };
 
+static void set_uint64_checkmask(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+Property *prop = opaque;
+uint64_t *ptr = object_field_prop_ptr(obj, prop);
+
+visit_type_uint64(v, name, ptr, errp);
+if (*ptr & ~prop->bitmask) {
+error_setg(errp, "Property value for '%s' violates bitmask '0x%lx'",
+   name, prop->bitmask);
+}
+}
+
+const PropertyInfo qdev_prop_uint64_checkmask = {
+.name  = "uint64",
+.get   = get_uint64,
+.set   = set_uint64_checkmask,
+};
+
 /* --- string --- */
 
 static void release_string(Object *obj, const char *name, void *opaque)
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index f7925f67d0..e1df08876c 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -17,6 +17,7 @@ struct Property {
 const PropertyInfo *info;
 ptrdiff_toffset;
 uint8_t  bitnr;
+uint64_t bitmask;
 bool set_default;
 union {
 int64_t i;
@@ -54,6 +55,7 @@ extern const PropertyInfo qdev_prop_uint16;
 extern const PropertyInfo qdev_prop_uint32;
 extern const PropertyInfo qdev_prop_int32;
 extern const PropertyInfo qdev_prop_uint64;
+extern const PropertyInfo qdev_prop_uint64_checkmask;
 extern const PropertyInfo qdev_prop_int64;
 extern const PropertyInfo qdev_prop_size;
 extern const PropertyInfo qdev_prop_string;
@@ -103,6 +105,16 @@ extern const PropertyInfo qdev_prop_link;
 .set_default = true, \
 .defval.u= (bool)_defval)
 
+/**
+ * The DEFINE_PROP_UINT64_CHECKMASK macro checks a user-supplied value
+ * against the corresponding bitmask and rejects the value if it violates it.
+ * The default value is set in instance_init().
+ */
+#define DEFINE_PROP_UINT64_CHECKMASK(_name, _state, _field, _bitmask)   \
+DEFINE_PROP(_name, _state, _field, qdev_prop_uint64_checkmask, uint64_t, \
+.bitmask= (_bitmask), \
+.set_default = false)
+
 #define PROP_ARRAY_LEN_PREFIX "len-"
 
 /**
-- 
2.27.0
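
The check performed by the new setter reduces to a single mask test. A minimal standalone sketch, with hypothetical helper names and no QEMU types, is given below; it formats the mask with PRIx64, since "%lx" only matches uint64_t on hosts where long is 64 bits wide:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A value is acceptable when it sets no bit outside the allowed mask. */
static bool value_fits_bitmask(uint64_t value, uint64_t bitmask)
{
    return (value & ~bitmask) == 0;
}

/* Build the error text; returns what snprintf() returns. */
static int format_violation(char *buf, size_t len, const char *name,
                            uint64_t bitmask)
{
    return snprintf(buf, len,
                    "Property value for '%s' violates bitmask '0x%" PRIx64 "'",
                    name, bitmask);
}
```

A caller would test value_fits_bitmask() first and only build the message on failure.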




Re: [PATCH v4 4/4] hw/i386/sgx: Attach SGX-EPC objects to machine

2022-02-16 Thread Igor Mammedov
On Mon, 14 Feb 2022 10:30:18 +
Daniel P. Berrangé  wrote:

> On Mon, Feb 14, 2022 at 09:21:07AM +0100, Igor Mammedov wrote:
> > On Mon, 14 Feb 2022 14:58:57 +0800
> > Yang Zhong  wrote:
> >   
> > > On Mon, Feb 07, 2022 at 09:37:52AM +0100, Igor Mammedov wrote:  
> > > > On Sat,  5 Feb 2022 13:45:26 +0100
> > > > Philippe Mathieu-Daudé  wrote:
> > > > 
> > > > > Previously SGX-EPC objects were exposed in the QOM tree at a path
> > > > > 
> > > > >   /machine/unattached/device[nn]
> > > > > 
> > > > > where the 'nn' varies depending on what devices were already created.
> > > > > 
> > > > > With this change the SGX-EPC objects are now at
> > > > > 
> > > > >   /machine/sgx-epc[nn]
> > > > > 
> > > > > where the 'nn' of the first SGX-EPC object is always zero.
> > > > 
> > > > yet again, why is it necessary?
> > > 
> > > 
> > >   Igor, sorry for the delayed feedback because of the Chinese New Year holiday.
> > > 
> > >   This patch series fixes the issue I reported before:
> > >   https://lists.nongnu.org/archive/html/qemu-devel/2021-11/msg05670.html
> > > 
> > >   /machine/unattached/device[0] is used by the vCPU, and libvirt uses
> > >   this interface to get the unavailable-features list. But in an SGX
> > >   VM, device[0] is occupied by the virtual SGX EPC device, so libvirt
> > >   can't get unavailable-features from device[0].
> > > 
> > >   Although patch 2 in this series already fixed the "unavailable-features"
> > >   issue,
> > 
> > I've seen patches on libvirt fixing "unavailable-features" in another way
> > without dependence on  /machine/unattached/device[0].
> > see:
> >  https://www.mail-archive.com/libvir-list@redhat.com/msg226244.html
> >   
> > >   this patch can move the SGX virtual device from
> > >   /machine/unattached/device[nn]
> > >   to /machine/sgx-epc[nn], which seems clearer. Thanks!
> > 
> > with those patches device[0] becomes non issue, and this patch also becomes
> > unnecessary.
> > I don't mind putting sgx-epc under machine, but that shall be justified
> > somehow. A drawback I noticed in this case is an extra manual
> > plumbing/wiring without apparent need for it.  
> 
> This is effectively questioning why we have a QOM hierarchy with
> named devices at all. IMHO we don't need to justify giving explicitly
> named nodes under QOM beyond  "this is normal QOM modelling", and
> anything under '/unattached' is subject to being fixed in this way.

I agree that we should fix '/unattached'; however, blindly naming it and
moving it wherever just because we can is not the fix I had in mind.

With QOM device models, I'd try to compose parent/child relationships
like it's done in real hardware (e.g. the APIC is part of the x86 CPU,
so we made the CPU its parent; many ARM device models follow suit).

The commit message must explain why the proposed parent has been
chosen.
The current reason (let's get it out of the way just because some
userspace abused direct access to QOM) is not a valid one (I'd even
say it wasn't valid to begin with).
All I'm asking for is a sane commit message explaining why
something is moved to where it's proposed, so that others can
understand it when looking at it.

With this patch I'm not sure whether SGX should be part of the machine
or part of the CPU device model (it seems SGX is a CPU feature
after all).
 
> Regards,
> Daniel



