Re: [PATCH 1/3] i386: Add missing "vmx-ept-wb" feature name

2021-02-01 Thread Paolo Bonzini

On 02/02/21 01:18, Eduardo Habkost wrote:

On Tue, Feb 02, 2021 at 12:28:38AM +0100, Paolo Bonzini wrote:

On Tue, 2 Feb 2021, 00:05 Eduardo Habkost wrote:


On Mon, Feb 01, 2021 at 11:59:48PM +0100, Paolo Bonzini wrote:

On Mon, 1 Feb 2021, 23:54 Eduardo Habkost wrote:



Not having a feature name in feature_word_info breaks error
reporting and query-cpu-model-expansion.  Add the missing feature
name to feature_word_info[FEAT_VMX_EPT_VPID_CAPS].feat_names[14].


This is intentional, because there's no way that any hypervisor can run
if this feature is disabled.


If leaving the feature without name enables some desirable
behavior, that's by accident and not by design.  Which part of
the existing behavior is intentional?



Not being able to disable it.


We can make it a hard dependency of vmx, then.  We shouldn't
leave it without a name, though.


The feature is already added to the MSRs unconditionally in 
kvm_msr_entry_add_vmx.  I think we can just remove it from the models 
instead.


Paolo




[PULL v3 00/38] Misc patches (buildsys, i386, fuzzing) for 2021-01-29

2021-02-01 Thread Paolo Bonzini
The following changes since commit 74208cd252c5da9d867270a178799abd802b9338:

  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/misc-fixes-pull-request' into staging (2021-01-29 19:51:25 +)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to a365bda83444f142bb1b9c1b5fdcdefade87981d:

  pc-bios/descriptors: fix paths in json files (2021-02-01 17:30:52 +0100)


* Fuzzing improvements (Qiuhao, Alexander)
* i386: Fix BMI decoding for instructions with the 0x66 prefix (David)
* slirp update (Marc-André)
* initial attempt at fixing event_notifier emulation (Maxim)
* i386: PKS emulation, fix for "qemu-system-i386 -cpu host" (myself)
* meson: RBD test fixes (myself)
* meson: TCI warnings (Philippe)
* Leaner build for --disable-guest-agent, --disable-system and
  --disable-tools (Philippe, Stefan)
* --enable-tcg-interpreter fix (Richard)
* i386: SVM feature bits (Wei)
* HVF bugfix (Alex)
* KVM bugfix (Thomas)



v1->v2: two extra bugfix patches, do move slirp/ to subprojects/libslirp/.

v2->v3: rebased

Alexander Bulekov (7):
  fuzz: ignore address_space_map is_write flag
  fuzz: refine the ide/ahci fuzzer configs
  docs/fuzz: fix pre-meson path
  fuzz: log the arguments used to initialize QEMU
  fuzz: enable dynamic args for generic-fuzz configs
  docs/fuzz: add some information about OSS-Fuzz
  fuzz: add virtio-9p configurations for fuzzing

Alexander Graf (1):
  hvf: Fetch cr4 before evaluating CPUID(1)

David Greenaway (1):
  target/i386: Fix decoding of certain BMI instructions

Igor Mammedov (1):
  machine: add missing doc for memory-backend option

Marc-André Lureau (1):
  slirp: update to git master

Maxim Levitsky (2):
  virtio-scsi: don't uninitialize queues that we didn't initialize
  event_notifier: handle initialization failure better

Paolo Bonzini (5):
  target/i386: do not set LM for 32-bit emulation "-cpu host/max"
  meson: accept either shared or static libraries if --disable-static
  meson: honor --enable-rbd if cc.links test fails
  target/i86: implement PKS
  build-sys: make libslirp a meson subproject

Pavel Dovgalyuk (1):
  replay: fix replay of the interrupts

Philippe Mathieu-Daudé (13):
  configure: Improve TCI feature description
  meson: Explicit TCG backend used
  meson: Warn when TCI is selected but TCG backend is available
  tests/meson: Only build softfloat objects if TCG is selected
  pc-bios/meson: Only install EDK2 blob firmwares with system emulation
  meson: Restrict block subsystem processing
  meson: Merge trace_events_subdirs array
  meson: Restrict some trace event directories to user/system emulation
  meson: Restrict emulation code
  qapi/meson: Restrict qdev code to system-mode emulation
  qapi/meson: Remove QMP from user-mode emulation
  qapi/meson: Restrict system-mode specific modules
  qapi/meson: Restrict UI module to system emulation and tools

Qiuhao Li (1):
  fuzz: fix wrong index in clear_bits

Richard Henderson (1):
  configure: Fix --enable-tcg-interpreter

Sergei Trofimovich (1):
  pc-bios/descriptors: fix paths in json files

Stefan Reiter (1):
  docs: don't install corresponding man page if guest agent is disabled

Thomas Huth (1):
  accel/kvm/kvm-all: Fix wrong return code handling in dirty log code

Wei Huang (1):
  x86/cpu: Populate SVM CPUID feature bits

 .gitmodules                              |   4 +-
 MAINTAINERS                              |   1 +
 accel/kvm/kvm-all.c                      |  21 ++-
 accel/tcg/tcg-cpus-icount.c              |   8 +-
 backends/hostmem.c                       |  10 ++
 configure                                |   9 +-
 docs/devel/build-system.rst              |   2 +-
 docs/devel/fuzzing.rst                   |  35 +++-
 docs/meson.build                         |   6 +-
 hw/scsi/virtio-scsi-dataplane.c          |   8 +-
 include/exec/memory.h                    |   8 +-
 include/exec/memory_ldst_cached.h.inc    |   6 +-
 include/qemu/event_notifier.h            |   1 +
 memory_ldst.c.inc                        |   8 +-
 meson.build                              | 277 ++-
 meson_options.txt                        |   2 +-
 pc-bios/descriptors/meson.build          |   2 +-
 pc-bios/meson.build                      |   1 +
 qapi/meson.build                         |  34 ++--
 qemu-options.hx                          |  26 ++-
 scripts/oss-fuzz/minimize_qtest_trace.py |   2 +-
 slirp                                    |   1 -
 softmmu/memory.c                         |   5 +-
 softmmu/physmem.c                        |   4 +-
 stubs/meson.build                        |   2 +
 stubs/qdev.c                             |  23 +++

Re: [PATCH 3/3] qmp: Resume OOB-enabled monitor before processing the request

2021-02-01 Thread Markus Armbruster
Kevin Wolf  writes:

> On 01.02.2021 at 17:15, Markus Armbruster wrote:
>> monitor_qmp_dispatcher_co() needs to resume the monitor if
>> handle_qmp_command() suspended it.  Two cases:
>> 
>> 1. OOB enabled: suspended if mon->qmp_requests has no more space
>> 
>> 2. OOB disabled: suspended always
>> 
>> We resume only after we processed the request.  Which can take a long
>> time.
>> 
>> Resume the monitor right when the queue has space to keep the monitor
>> available for out-of-band commands even in this corner case.
>> 
>> Leave the "OOB disabled" case alone.
>> 
>> Signed-off-by: Markus Armbruster 
>
>> +/*
>> + * We need to resume the monitor if handle_qmp_command()
>> + * suspended it.  Two cases:
>> + * 1. OOB enabled: mon->qmp_requests has no more space
>> + *    Resume right away, so that OOB commands can get executed while
>> + *    this request is being processed.
>> + * 2. OOB disabled: always
>> + *Resume only after we're done processing the request, 
>
> This line has trailing whitespace.

Trimming...

> With this fixed, the whole series is:
> Reviewed-by: Kevin Wolf 

Thanks!




Re: [PATCH] char: don't fail when client is not connected

2021-02-01 Thread Pavel Dovgalyuk

On 02.02.2021 10:27, Marc-André Lureau wrote:

Hi

On Tue, Feb 2, 2021 at 11:18 AM Pavel Dovgalyuk wrote:


This patch checks that ioc is not null before
using it in tcp socket tcp_chr_add_watch function.

Signed-off-by: Pavel Dovgalyuk


Do you have a backtrace or a reproducer when this happens?
thanks


Here is the backtrace:

Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x72506700 (LWP 64988)]
object_get_class (obj=obj@entry=0x0) at ../qom/object.c:999
999 return obj->class;
(gdb) bt
#0  object_get_class (obj=obj@entry=0x0) at ../qom/object.c:999
#1  0x55b70e26 in QIO_CHANNEL_GET_CLASS (obj=0x0) at /home/pasha/ispras/qemu-test/include/io/channel.h:29
#2  qio_channel_create_watch (ioc=0x0, condition=(G_IO_OUT | G_IO_HUP)) at ../io/channel.c:281
#3  0x55c1bf9b in qemu_chr_fe_add_watch (be=be@entry=0x56981648, cond=cond@entry=(G_IO_OUT | G_IO_HUP), func=func@entry=0x5597f170 , user_data=user_data@entry=0x569815a0) at /home/pasha/ispras/qemu-test/include/chardev/char.h:229
#4  0x5597f042 in serial_xmit (s=s@entry=0x569815a0) at ../hw/char/serial.c:265
#5  0x5597f437 in serial_ioport_write (opaque=0x569815a0, addr=, val=91, size=) at ../hw/char/serial.c:359
#6  0x55ab95e0 in memory_region_write_accessor (mr=mr@entry=0x56981700, addr=0, value=value@entry=0x72504fc8, size=size@entry=1, shift=, mask=mask@entry=255, attrs=...) at ../softmmu/memory.c:491
#7  0x55ab807e in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x72504fc8, size=size@entry=1, access_size_min=, access_size_max=, access_fn=access_fn@entry=0x55ab9550 , mr=0x56981700, attrs=...) at ../softmmu/memory.c:552
#8  0x55abb947 in memory_region_dispatch_write (mr=mr@entry=0x56981700, addr=0, data=, data@entry=91, op=op@entry=MO_8, attrs=attrs@entry=...) at ../softmmu/memory.c:1501
#9  0x55a721d8 in address_space_stb (as=, addr=, val=91, attrs=..., result=0x0) at /home/pasha/ispras/qemu-test/memory_ldst.c.inc:382
#10 0x7fffa8b63022 in code_gen_buffer ()
#11 0x55b10ab0 in cpu_tb_exec (tb_exit=, itb=, cpu=0x7fffa8635b00 ) at ../accel/tcg/cpu-exec.c:188
#12 cpu_loop_exec_tb (tb_exit=, last_tb=pointer>, tb=, cpu=0x7fffa8635b00 ) at ../accel/tcg/cpu-exec.c:700
#13 cpu_exec (cpu=cpu@entry=0x566b4350) at ../accel/tcg/cpu-exec.c:811
#14 0x55b0ce97 in tcg_cpus_exec (cpu=cpu@entry=0x566b4350) at ../accel/tcg/tcg-cpus.c:57
#15 0x55abfa73 in rr_cpu_thread_fn (arg=arg@entry=0x566b4350) at ../accel/tcg/tcg-cpus-rr.c:217
#16 0x55c80573 in qemu_thread_start (args=) at ../util/qemu-thread-posix.c:521
#17 0x76302609 in start_thread (arg=) at pthread_create.c:477
#18 0x76229293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95





---
  chardev/char-socket.c |    3 +++
  1 file changed, 3 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 213a4c8dd0..cef1d9438f 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -385,6 +385,9 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
  static GSource *tcp_chr_add_watch(Chardev *chr, GIOCondition cond)
  {
      SocketChardev *s = SOCKET_CHARDEV(chr);
+    if (!s->ioc) {
+        return NULL;
+    }
      return qio_channel_create_watch(s->ioc, cond);
  }







Re: [PATCH] char: don't fail when client is not connected

2021-02-01 Thread Marc-André Lureau
Hi

On Tue, Feb 2, 2021 at 11:18 AM Pavel Dovgalyuk wrote:

> This patch checks that ioc is not null before
> using it in tcp socket tcp_chr_add_watch function.
>
> Signed-off-by: Pavel Dovgalyuk 
>

Do you have a backtrace or a reproducer when this happens?
thanks

> ---
>  chardev/char-socket.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 213a4c8dd0..cef1d9438f 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -385,6 +385,9 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
>  static GSource *tcp_chr_add_watch(Chardev *chr, GIOCondition cond)
>  {
>      SocketChardev *s = SOCKET_CHARDEV(chr);
> +    if (!s->ioc) {
> +        return NULL;
> +    }
>      return qio_channel_create_watch(s->ioc, cond);
>  }
>
>
>


Re: [Bug 1914117] Short files returned via FTP on Qemu with various architectures and OSes

2021-02-01 Thread Chris Pinnock
Apologies.


Host OS is the latest Mac OS X Big Sur, with the latest Xcode. Qemu is 5.2,
built from the tarball downloaded directly from the website.

- Compile Qemu on Mac OS/Big Sur, completely stock build: install Ninja,
  mkdir build && cd build && ../configure && make && make install
- The issue also occurs with the Homebrew binary (e.g. brew install qemu);
  both methods get me to the same problem.

* Installed NetBSD/amd64 or i386 or OpenBSD/i386:
qemu-img create -f raw image 10G
qemu-system-ARCH -m 256M -hda image -cdrom "netbsd.iso" -boot d -net user -net nic

(For i386 & amd64 I tend to add -nographic for the installer)

* Run the image:
qemu-system-ARCH -m 256M -hda $IMAGE -net user -net nic

Also NetBSD/arm64 has the issue using their image.
qemu-system-aarch64 -M virt -cpu cortex-a53 -smp 4 -m 4g \
  -drive if=none,file=netbsd-disk-arm64.img,id=hd0 \
  -device virtio-blk-device,drive=hd0 \
  -netdev type=user,id=net0 \
  -device virtio-net-device,netdev=net0,mac=00:11:22:33:44:55 \
  -bios QEMU_EFI.fd -nographic

* The issue seems to occur when downloading large files.
In the guest OS, two files that often seem to tickle the bug are:

* ftp -a http://cpan.pair.com/src/5.0/perl-5.32.1.tar.xz
On NetBSD this file seems to be one byte shorter than it should be. On arm64 it
was several bytes shorter.

* ftp -a ftp://ftp.isc.org/isc/bind9/9.16.11/bind-9.16.11.tar.xz
Also seems to tickle the bug.

I saw this while trying to use pkgsrc on NetBSD. Saw this on amd64, i386
and arm64. Tried OpenBSD to rule out NetBSD as the problem. OpenBSD/i386
sees the same issue (ftp reports a short read and the file is a couple of
bytes smaller).

The screenshot is from amd64 - a fresh boot this morning running on a
fairly idle host.

Kind regards
Chris

> On 2 Feb 2021, at 05:24, Thomas Huth <1914...@bugs.launchpad.net> wrote:
> 
> Please provide more information: How did you compile QEMU? Which version
> did you exactly use? And most important: How do you *run* QEMU? System
> emulation? User mode? What kind of FTP are you doing??
> 
> ** Changed in: qemu
>   Status: New => Incomplete
> 
> -- 
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1914117
> 
> Title:
>  Short files returned via FTP on Qemu with various architectures and
>  OSes
> 
> Status in QEMU:
>  Incomplete
> 
> Bug description:
> 
>  Qemu 5.2 on Mac OS X Big Sur.
> 
>  I originally thought that it might be caused by the home-brew version of
>  Qemu, but this evening I have removed the brew edition and compiled from
>  scratch (using Ninja & Xcode compiler).
>  Still getting the same problem.
> 
>  On the following architectures:
>  arm64, amd64 and sometimes i386 running NetBSD as the guest OS;
>  i386 running OpenBSD as the guest OS:
> 
>  I have seen a consistent problem with FTP returning short files. The
>  file will be a couple of bytes too short. I do not believe this is a
>  problem with the OS. Downloading the perl source code from CPAN does
>  not work properly, nor does downloading bind from isc. I've tried this
>  on different architectures as above.
> 
>  (Qemu 4.2 on Ubuntu/x86_64 with NetBSD/i386 seems to function fine. My
>  gut feel is there is something not right on the Mac OS version of Qemu
>  or a bug in 5.2 - obviously in the network layer somewhere. If you
>  have anything you want me to try, please let me know - happy to help
>  get a resolution.)
> 
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1914117/+subscriptions


** Attachment added: "PastedGraphic-1.tiff"
   
https://bugs.launchpad.net/bugs/1914117/+attachment/5459003/+files/PastedGraphic-1.tiff


[Bug 1914117] Re: Short files returned via FTP on Qemu with various architectures and OSes

2021-02-01 Thread Chris Pinnock
Apologies.


Host OS is the latest Mac OS X Big Sur, with the latest Xcode. Qemu is 5.2,
built from the tarball downloaded directly from the website.

- Compile Qemu on Mac OS/Big Sur, completely stock build: install Ninja,
  mkdir build && cd build && ../configure && make && make install
- The issue also occurs with the Homebrew binary (e.g. brew install qemu);
  both methods get me to the same problem.

* Installed NetBSD/amd64 or i386 or OpenBSD/i386:
qemu-img create -f raw image 10G
qemu-system-ARCH -m 256M -hda image -cdrom "netbsd.iso" -boot d -net user -net nic

(For i386 & amd64 I tend to add -nographic for the installer)

* Run the image:
qemu-system-ARCH -m 256M -hda $IMAGE -net user -net nic

Also NetBSD/arm64 has the issue using their image.
qemu-system-aarch64 -M virt -cpu cortex-a53 -smp 4 -m 4g \
  -drive if=none,file=netbsd-disk-arm64.img,id=hd0 \
  -device virtio-blk-device,drive=hd0 \
  -netdev type=user,id=net0 \
  -device virtio-net-device,netdev=net0,mac=00:11:22:33:44:55 \
  -bios QEMU_EFI.fd -nographic

* The issue seems to occur when downloading large files.
In the guest OS, two files that often seem to tickle the bug are:

* ftp -a http://cpan.pair.com/src/5.0/perl-5.32.1.tar.xz
On NetBSD this file seems to be one byte shorter than it should be. On arm64 it
was several bytes shorter.

* ftp -a ftp://ftp.isc.org/isc/bind9/9.16.11/bind-9.16.11.tar.xz
Also seems to tickle the bug.

I saw this while trying to use pkgsrc on NetBSD. Saw this on amd64, i386
and arm64. Tried OpenBSD to rule out NetBSD as the problem. OpenBSD/i386
sees the same issue (ftp reports a short read and the file is a couple of
bytes smaller).

The screenshot is from amd64 - a fresh boot this morning running on a
fairly idle host.

Kind regards
Chris

** Attachment added: "Screenshot 2021-02-02 at 06.56.22.png"
   
https://bugs.launchpad.net/qemu/+bug/1914117/+attachment/5459002/+files/Screenshot%202021-02-02%20at%2006.56.22.png




[PATCH] char: don't fail when client is not connected

2021-02-01 Thread Pavel Dovgalyuk
This patch checks that ioc is not null before
using it in tcp socket tcp_chr_add_watch function.

Signed-off-by: Pavel Dovgalyuk 
---
 chardev/char-socket.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 213a4c8dd0..cef1d9438f 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -385,6 +385,9 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
 static GSource *tcp_chr_add_watch(Chardev *chr, GIOCondition cond)
 {
     SocketChardev *s = SOCKET_CHARDEV(chr);
+    if (!s->ioc) {
+        return NULL;
+    }
     return qio_channel_create_watch(s->ioc, cond);
 }
 




Re: [PATCH v4 00/16] 64bit block-layer: part I

2021-02-01 Thread Vladimir Sementsov-Ogievskiy

02.02.2021 05:56, Eric Blake wrote:

On 12/11/20 12:39 PM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

We want 64bit write-zeroes, and for this, convert all io functions to
64bit.

We chose a signed type, to be consistent with off_t (which is signed) and
to allow a signed return type (where a negative value means an
error).

Please refer to the initial cover letter
  https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08723.html
for more info.

v4: I found that some more work is needed for block/block-backend, so I
decided to make part I, converting block/io

v4 is based on Kevin's block branch ([PULL 00/34] Block layer patches)
for BDRV_MAX_LENGTH

changes:
01-05: new
06: add Alberto's r-b
07: new
08-16: rebase, add new-style request check, improve commit-msg, drop r-bs


I had planned to send a pull request for this series today, but ran into
a snag.  Without this series applied, './check -qcow2' fails 030, 185,
and 297.  With it applied, I now also get a failure in 206.  I'm trying
to bisect which patch caused the problem, but here's the failure:

206   fail   [20:54:54] [20:55:01]   6.9s   (last: 6.7s)  output mismatch (see 206.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/206.out
+++ 206.out.bad
@@ -180,7 +180,7 @@

  {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "qcow2", "file": "node0", "size": 9223372036854775296}}}
  {"return": {}}
-Job failed: Could not resize image: Required too big image size, it must be not greater than 9223372035781033984
+Job failed: Could not resize image: offset(9223372036854775296) exceeds maximum(9223372035781033984)
  {"execute": "job-dismiss", "arguments": {"id": "job0"}}
  {"return": {}}

Looks like it is just a changed error message, so I can touch up the
correct patch and then repackage the pull request tomorrow (it's too
late for me today).  Oh, and the 0 exit status of ./check when a test
fails is something I see you already plan on fixing...



Yes, Kevin has already sent a pull request with "iotests: check: return 1 on failure"

--
Best regards,
Vladimir



Re: [QEMU-SECURITY] [PATCH] hw/intc/arm_gic: Fix interrupt ID in GICD_SGIR register

2021-02-01 Thread P J P
On Sunday, 31 January, 2021, 08:48:26 pm IST, Philippe Mathieu-Daudé wrote:
>Forwarding to qemu-security@ to see if this issue is worth a CVE.
>
> | On 1/31/21 11:34 AM, Philippe Mathieu-Daudé wrote:
> | > Per the ARM Generic Interrupt Controller Architecture specification
> | > (document "ARM IHI 0048B.b (ID072613)"), the SGIINTID field is 4 bits
> | > wide, not 10:
> | > 
> | >    - Table 4-21 GICD_SGIR bit assignments
> | > 
> | >    The Interrupt ID of the SGI to forward to the specified CPU
> | >    interfaces. The value of this field is the Interrupt ID, in
> | >    the range 0-15, for example a value of 0b0011 specifies
> | >    Interrupt ID 3.
> | > 
> | > diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
> | > index af41e2fb448..75316329516 100644
> | > --- a/hw/intc/arm_gic.c
> | > +++ b/hw/intc/arm_gic.c
> | > @@ -1476,7 +1476,7 @@ static void gic_dist_writel(void *opaque, hwaddr offset,
> | >          int target_cpu;
> | >  
> | >          cpu = gic_get_current_cpu(s);
> | > -        irq = value & 0x3ff;
> | > +        irq = value & 0xf;
> | >          switch ((value >> 24) & 3) {
> | >          case 0:
> | >              mask = (value >> 16) & ALL_CPU_MASK;
> | > 
> | > Buglink: https://bugs.launchpad.net/qemu/+bug/1913916
> | > Buglink: https://bugs.launchpad.net/qemu/+bug/1913917

* Does the above patch address both of these bugs? For BZ#1913917, 'irq' seems
to be derived from 'offset':

        /* Interrupt Configuration.  */
        irq = (offset - 0xc00) * 4;


> | > Correct the irq mask to fix an undefined behavior (which eventually
> | > leads to a heap-buffer-overflow, see [Buglink]):
> | > 
> | >    $ echo 'writel 0x8000f00 0xff4affb0' | qemu-system-aarch64 -M virt,accel=qtest -qtest stdio
> | >    [I 1612088147.116987] OPENED
> | >    [R +0.278293] writel 0x8000f00 0xff4affb0
> | >    ../hw/intc/arm_gic.c:1498:13: runtime error: index 944 out of bounds for type 'uint8_t [16][8]'
> | >    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../hw/intc/arm_gic.c:1498:13
> | > 
> | > Cc: qemu-sta...@nongnu.org
> | > Fixes: 9ee6e8bb853 ("ARMv7 support.")
> |
> | > ---
> | > Isn't it worth a CVE to help distributions track backports?
> | > ---

Thank you for reporting this issue. Will process further.


Thank you.
---
  -P J P
http://feedmug.com



RE: [PATCH 1/1] virtiofsd: Allow to build it without the tools

2021-02-01 Thread misono.tomoh...@fujitsu.com
> Subject: [PATCH 1/1] virtiofsd: Allow to build it without the tools
> 
> This changed the Meson build script to allow virtiofsd be built even
> though the tools build is disabled, thus honoring the --enable-virtiofsd
> option.
> 
> Signed-off-by: Wainer dos Santos Moschetta 

At that time I mistakenly thought that the virtiofsd build somehow depended
on the tools build.
Thanks for fixing. I did a quick build test.

Reviewed-by: Misono Tomohiro 



Re: macOS (Big Sur, Apple Silicon) 'make check' fails in test-crypto-tlscredsx509

2021-02-01 Thread Yonggang Luo
SHA-1: 94c13c1048378cbffe552b6fe5c960dc04eaefb2

* gcrypt: test_tls_psk_init should write binary file instead text file.

On Windows, opening a file with "w" automatically converts
"\n" to "\r\n" when writing to the file.

Signed-off-by: Yonggang Luo 
Is this related?

On Wed, Jan 27, 2021 at 12:37 AM Peter Maydell wrote:

> My Big Sur/Apple Silicon system fails "make check" in
> test-crypto-tlscredsx509:
>
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> G_TEST_SRCDIR=/Users/pm215/qemu/tests
> G_TEST_BUILDDIR=/Users/pm215/qemu/build/all/tests
> tests/test-crypto-tlscredsx509 --tap -k
>
> ** (tests/test-crypto-tlscredsx509:35180): CRITICAL **: 16:23:34.590:
> Failed to sign certificate ASN1 parser: Value is not valid.
> ERROR test-crypto-tlscredsx509 - Bail out! FATAL-CRITICAL: Failed to
> sign certificate ASN1 parser: Value is not valid.
> make: *** [run-test-70] Error 1
>
>
> Does this failure ring any bells for anybody?
>
> Here's the crypto part of the meson-log:
>
>   Crypto
>      TLS priority: "NORMAL"
>    GNUTLS support: YES
>         libgcrypt: NO
>            nettle: YES
>               XTS: YES
>      crypto afalg: NO
>          rng-none: NO
>     Linux keyring: NO
>
>
> thanks
> -- PMM
>
>

-- 
Yours sincerely,
Yonggang Luo (罗勇刚)


[Bug 1914117] Re: Short files returned via FTP on Qemu with various architectures and OSes

2021-02-01 Thread Thomas Huth
Please provide more information: How did you compile QEMU? Which version
did you exactly use? And most important: How do you *run* QEMU? System
emulation? User mode? What kind of FTP are you doing??

** Changed in: qemu
   Status: New => Incomplete




Re: macOS (Big Sur, Apple Silicon) 'make check' fails in test-crypto-tlscredsx509

2021-02-01 Thread Roman Bolshakov
On Fri, Jan 29, 2021 at 09:53:27AM +, Daniel P. Berrangé wrote:
> On Fri, Jan 29, 2021 at 11:43:32AM +0300, Roman Bolshakov wrote:
> > On Wed, Jan 27, 2021 at 06:59:17PM +, Daniel P. Berrangé wrote:
> > > On Wed, Jan 27, 2021 at 07:56:16PM +0100, Stefan Weil wrote:
> > > > Am 27.01.21 um 19:17 schrieb Daniel P. Berrangé:
> > > > 
> > > > > On Wed, Jan 27, 2021 at 06:05:08PM +0100, Stefan Weil wrote:
> > > > > > Am 27.01.21 um 17:53 schrieb Daniel P. Berrangé:
> > > > > > 
> > > > > > > > In $QEMU.git/crypto/init.c can you uncomment the "#define DEBUG_GNUTLS"
> > > > > > > > line and then re-build and re-run the test case.
> > > > > > > > 
> > > > > > > > There's a bunch of debug logs in code paths from gnutls_x509_crt_privkey_sign
> > > > > > > > that might give us useful info.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Daniel
> > > > > > 
> > > > > > % LANG=C.UTF-8 tests/test-crypto-tlscredsx509
> > > > > > # random seed: R02S9b95072a368ad370cdd4c780b8074596
> > > > > > 3: ASSERT: mpi.c[wrap_nettle_mpi_print]:60
> > > > > > 3: ASSERT: mpi.c[wrap_nettle_mpi_print]:60
> > > > > > 2: signing structure using RSA-SHA256
> > > > > > 3: ASSERT: common.c[_gnutls_x509_der_encode]:855
> > > > > > 3: ASSERT: sign.c[_gnutls_x509_pkix_sign]:174
> > > > > > 3: ASSERT: x509_write.c[gnutls_x509_crt_privkey_sign]:1834
> > > > > > 3: ASSERT: x509_write.c[gnutls_x509_crt_sign2]:1152
> > > > > > > Bail out! FATAL-CRITICAL: Failed to sign certificate ASN1 parser: Value is not valid.
> > > > > > So it shows it's failing inside an asn1_der_coding call, but I can't see
> > > > > why it would fail, especially if the same test suite passes fine on
> > > > > macOS x86_64 hosts.
> > > > 
> > > > 
> > > > It returns ASN1_MEM_ERROR, so the input vector is too small.
> > > 
> > > Hmm, that's odd - "Value is not valid" corresponds to
> > > ASN1_VALUE_NOT_VALID error code.
> > > 
> > 
> > Hi Daniel, Stefan,
> > 
> > It's interesting that "make check" of libtasn1 fails with three tests
> > and two of them produce VALUE_NOT_VALID error.
> > 
> > The failing tests are:
> >   FAIL: Test_parser
> >   FAIL: Test_tree
> >   FAIL: copynode
> 
> That's interesting. Assuming 'make check' for libtasn1 succeeds on
> x86_64 macOS, then I'm inclined to blame this whole problem on
> libtasn1 not QEMU.
> 

'make check' of libtasn1 doesn't succeed on x86_64 either.

After a session of debugging I believe there's an issue with Clang 12.
Here's a test program (it reproduces unexpected ASN1_VALUE_NOT_VALID
from _asn1_time_der() in libtasn1):

#include <stdio.h>

static int func2(char *foo) {
    fprintf(stderr, "%s:%d foo: %p\n", __func__, __LINE__, foo);
    if (foo == NULL) {
        fprintf(stderr, "%s:%d foo: %p\n", __func__, __LINE__, foo);
        return 1;
    }
    return 0;
}

int func1(char *foo) {
    int counter = 0;
    if (fprintf(stderr, "IO\n") > 0)
        counter += 10;
    fprintf(stderr, "%s:%d foo: %p counter %d\n", __func__, __LINE__, foo, counter);
    if(!func2(foo + counter)) {
        fprintf(stderr, "good\n");
        return 0;
    } else {
        fprintf(stderr, "broken\n");
        return 1;
    }
}

int main() {
    char *foo = NULL;
    return func1(foo);
}


What return value would you expect from the program?

If the program is compiled with -O0/-O1 it returns a zero exit code.
Here's the output:
IO
func1:16 foo: 0x0 counter 10
func2:4 foo: 0xa
good

If it is compiled with -O2 it returns 1:
IO
func1:16 foo: 0x0 counter 10
func2:4 foo: 0xa
func2:6 foo: 0x0
broken

That happens because Clang, inside the inlined func2, uses the register that
holds foo from func1 (a null pointer) where it should use foo + counter (a
non-null pointer).

So an immediate workaround would be to lower the optimization level of
libtasn1 to -O1 in Homebrew.
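
As an illustration of a possible source-level alternative, the sketch below
never performs arithmetic on a null pointer, so the result is the same at
every optimization level. It assumes that the `foo + counter` expression on a
null `foo` is what the optimizer trips on; `offset_or_null` is a hypothetical
helper and not libtasn1 code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return base + offset, or NULL when base is NULL.
 * No pointer arithmetic is performed on a null pointer, so the result
 * does not depend on how the compiler treats NULL + offset.
 */
char *offset_or_null(char *base, size_t offset)
{
    if (base == NULL) {
        return NULL;
    }
    return base + offset;
}
```

With such a helper the callee always sees a plain NULL for a null base, rather
than a value that varies with the optimization level.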

I've submitted the issue to the Apple bug tracker:
FB8986815

Best regards,
Roman



[Bug 1911351] Re: x86-64 MTTCG Does not update page table entries atomically

2021-02-01 Thread Venkatesh Srinivas
BTW, the RISC-V MMU code _does_ get this right, and the x86 version could
follow its model. Something like
https://github.com/vsrinivas/qemu/commit/1efa7dc689c4572d8fe0880ddbe44ec22f8f4348
(but compiling and working) might solve this problem and model the
hardware more closely.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1911351

Title:
  x86-64 MTTCG Does not update page table entries atomically

Status in QEMU:
  Confirmed

Bug description:
  It seems that the QEMU TCG code for x86-64 doesn't update the accessed
  and dirty bits of page table entries atomically. Instead, it first
  reads the entry, checks whether it needs to set the bits, and then
  writes back the updated entry. So if you have two threads running at
  the same time, one accessing the virtual address over and over again
  and the other modifying the page table entry, it is possible that
  after the second thread modifies the entry, QEMU overwrites it with
  the old value with the accessed/dirty flags set.
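
A minimal sketch of the atomic alternative, using C11 atomics rather than
QEMU's own helpers: the flag update is a compare-and-swap against the value
the page walk observed, so a concurrent modification makes the update fail
instead of being overwritten. `pte_set_flags` and the macro names are
hypothetical; only the bit positions (accessed = bit 5, dirty = bit 6) come
from the x86 page table format.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_ACCESSED (UINT64_C(1) << 5)   /* x86 accessed bit */
#define PTE_DIRTY    (UINT64_C(1) << 6)   /* x86 dirty bit */

/*
 * Hypothetical helper: set flag bits in a PTE only if it still holds the
 * value the page walk observed ('seen').  Returns false when another CPU
 * changed the entry in the meantime, so the caller can restart the walk
 * instead of writing back a stale value.
 */
bool pte_set_flags(_Atomic uint64_t *pte, uint64_t seen, uint64_t flags)
{
    uint64_t expected = seen;

    if ((seen & flags) == flags) {
        return true;                 /* nothing to update */
    }
    return atomic_compare_exchange_strong(pte, &expected, seen | flags);
}
```

With the values from the failing test below, an entry changed to 0x0062e007
by the second thread would stay 0x0062e007, because the compare against the
stale 0x0062d007 fails.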

  Here's a unit test that reproduces this behavior:

  https://github.com/mvanotti/kvm-unit-tests/commit/09f9722807271226a714b04f25174776454b19cd

  You can run it with:

  ```
  /usr/bin/qemu-system-x86_64 --no-reboot -nodefaults \
  -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 \
  -vnc none -serial stdio -device pci-testdev \
  -smp 4 -machine q35 --accel tcg,thread=multi \
  -kernel x86/mmu-race.flat # -initrd /tmp/tmp.avvPpezMFf
  ```

  Expected output (failure):

  ```
  kvm-unit-tests$ make && /usr/bin/qemu-system-x86_64 --no-reboot -nodefaults \
  -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none \
  -serial stdio -device pci-testdev -smp 4 -machine q35 --accel tcg,thread=multi \
  -kernel x86/mmu-race.flat # -initrd /tmp/tmp.avvPpezMFf
  enabling apic
  enabling apic
  enabling apic
  enabling apic
  paging enabled
  cr0 = 80010011
  cr3 = 627000
  cr4 = 20
  found 4 cpus
  PASS: Need more than 1 CPU
  Detected overwritten PTE:
  want: 0x0062e007
  got:  0x0062d027
  FAIL: PTE not overwritten
  PASS: All Reads were zero
  SUMMARY: 3 tests, 1 unexpected failures
  ```

  This bug allows user-to-root privilege escalation inside the guest VM:
  if the user is able to overwrite an entry that belongs to a
  second-to-last level page table, and is able to allocate the referenced
  page, then the user would be in control of a last-level page table,
  being able to map any memory they want. This is not uncommon in
  situations where memory is being decommitted.




[Bug 1903752] Re: qemu-system-avr error: qemu-system-avr: execution left flash memory

2021-02-01 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1903752

Title:
  qemu-system-avr error: qemu-system-avr: execution left flash memory

Status in QEMU:
  Expired

Bug description:
  I compiled QEMU 5.1 from source with target avr-softmmu. Running
  demo.elf from
  https://github.com/seharris/qemu-avr-tests/blob/master/free-rtos/Demo/AVR_ATMega2560_GCC/demo.elf
  (linked from https://www.qemu.org/docs/master/system/target-avr.html)
  yields the following error:

  $ ./qemu-5.1.0/avr-softmmu/qemu-system-avr -machine mega2560 -bios demo.elf
  VNC server running on 127.0.0.1:5900
  qemu-system-avr: execution left flash memory
  Aborted (core dumped)

  I compiled QEMU on Ubuntu Server 20.10 with gcc (Ubuntu
  10.2.0-13ubuntu1) 10.2.0




[PATCH v8 13/13] s390: Recognize confidential-guest-support option

2021-02-01 Thread David Gibson
At least some s390 cpu models support "Protected Virtualization" (PV),
a mechanism to protect guests from eavesdropping by a compromised
hypervisor.

This is similar in function to other mechanisms like AMD's SEV and
POWER's PEF, which are controlled by the "confidential-guest-support"
machine option.  s390 is a slightly special case, because we already
supported PV, simply by using a CPU model with the required feature
(S390_FEAT_UNPACK).

To integrate this with the option used by other platforms, we
implement the following compromise:

 - When the confidential-guest-support option is set, s390 will
   recognize it, verify that the CPU can support PV (failing if not)
   and set virtio default options necessary for encrypted or protected
   guests, as on other platforms.  i.e. if confidential-guest-support
   is set, we will either create a guest capable of entering PV mode,
   or fail outright.

 - If confidential-guest-support is not set, guests might still be
   able to enter PV mode, if the CPU has the right model.  This may be
   a little surprising, but shouldn't actually be harmful.

To start a guest supporting Protected Virtualization using the new
option use the command line arguments:
-object s390-pv-guest,id=pv0 -machine confidential-guest-support=pv0
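
The compromise above can be summarized as a small decision function. This is
only a model of the behaviour described in this message, not the code in the
patch; `pv_startup_outcome` and the enum names are made up for illustration.

```c
#include <stdbool.h>

/* Outcome of the hypothetical start-up check. */
enum pv_outcome {
    PV_UNAVAILABLE,   /* guest cannot enter PV mode */
    PV_POSSIBLE,      /* guest may enter PV mode */
    PV_FAIL_STARTUP   /* option requested but CPU model lacks the feature */
};

/*
 * cgs_option_set: confidential-guest-support was given on the command line.
 * cpu_has_unpack: the CPU model includes the unpack feature.
 */
enum pv_outcome pv_startup_outcome(bool cgs_option_set, bool cpu_has_unpack)
{
    if (cgs_option_set && !cpu_has_unpack) {
        return PV_FAIL_STARTUP;
    }
    return cpu_has_unpack ? PV_POSSIBLE : PV_UNAVAILABLE;
}
```

The slightly surprising case is the one where the option is not set but the
CPU model has the feature: the guest may still enter PV mode.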

Signed-off-by: David Gibson 
---
 docs/confidential-guest-support.txt |  3 ++
 docs/system/s390x/protvirt.rst  | 19 ++---
 hw/s390x/pv.c   | 62 +
 hw/s390x/s390-virtio-ccw.c  |  3 ++
 include/hw/s390x/pv.h   | 17 
 5 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/docs/confidential-guest-support.txt 
b/docs/confidential-guest-support.txt
index 4da4c91bd3..71d07ba57a 100644
--- a/docs/confidential-guest-support.txt
+++ b/docs/confidential-guest-support.txt
@@ -43,4 +43,7 @@ AMD Secure Encrypted Virtualization (SEV)
 POWER Protected Execution Facility (PEF)
 docs/papr-pef.txt
 
+s390x Protected Virtualization (PV)
+docs/system/s390x/protvirt.rst
+
 Other mechanisms may be supported in future.
diff --git a/docs/system/s390x/protvirt.rst b/docs/system/s390x/protvirt.rst
index 712974ad87..0f481043d9 100644
--- a/docs/system/s390x/protvirt.rst
+++ b/docs/system/s390x/protvirt.rst
@@ -22,15 +22,22 @@ If those requirements are met, the capability 
`KVM_CAP_S390_PROTECTED`
 will indicate that KVM can support PVMs on that LPAR.
 
 
-QEMU Settings
--------------
+Running a Protected Virtual Machine
+-----------------------------------
 
-To indicate to the VM that it can transition into protected mode, the
+To run a PVM you will need to select a CPU model which includes the
 `Unpack facility` (stfle bit 161 represented by the feature
-`unpack`/`S390_FEAT_UNPACK`) needs to be part of the cpu model of
-the VM.
+`unpack`/`S390_FEAT_UNPACK`), and add these options to the command line::
+
+-object s390-pv-guest,id=pv0 \
+-machine confidential-guest-support=pv0
+
+Adding these options will:
+
+* Ensure the `unpack` facility is available
+* Enable the IOMMU by default for all I/O devices
+* Initialize the PV mechanism
 
-All I/O devices need to use the IOMMU.
 Passthrough (vfio) devices are currently not supported.
 
 Host huge page backings are not supported. However guests can use huge
diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index ab3a2482aa..93eccfc05d 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -14,8 +14,11 @@
 #include 
 
 #include "cpu.h"
+#include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "sysemu/kvm.h"
+#include "qom/object_interfaces.h"
+#include "exec/confidential-guest-support.h"
 #include "hw/s390x/ipl.h"
 #include "hw/s390x/pv.h"
 
@@ -111,3 +114,62 @@ void s390_pv_inject_reset_error(CPUState *cs)
 /* Report that we are unable to enter protected mode */
 env->regs[r1 + 1] = DIAG_308_RC_INVAL_FOR_PV;
 }
+
+#define TYPE_S390_PV_GUEST "s390-pv-guest"
+OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
+
+/**
+ * S390PVGuest:
+ *
+ * The S390PVGuest object is basically a dummy used to tell the
+ * confidential guest support system to use s390's PV mechanism.
+ *
+ * # $QEMU \
+ * -object s390-pv-guest,id=pv0 \
+ * -machine ...,confidential-guest-support=pv0
+ */
+struct S390PVGuest {
+ConfidentialGuestSupport parent_obj;
+};
+
+typedef struct S390PVGuestClass S390PVGuestClass;
+
+struct S390PVGuestClass {
+ConfidentialGuestSupportClass parent_class;
+};
+
+int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+{
+if (!object_dynamic_cast(OBJECT(cgs), TYPE_S390_PV_GUEST)) {
+return 0;
+}
+
+if (!s390_has_feat(S390_FEAT_UNPACK)) {
+error_setg(errp,
+   "CPU model does not support Protected Virtualization");
+return -1;
+}
+
+cgs->ready = true;
+
+return 0;
+}
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(S390PVGuest,
+   s390_pv_guest,
+   

[PATCH v8 10/13] spapr: Add PEF based confidential guest support

2021-02-01 Thread David Gibson
Some upcoming POWER machines have a system called PEF (Protected
Execution Facility) which uses a small ultravisor to allow guests to
run in a way that they can't be eavesdropped by the hypervisor.  The
effect is roughly similar to AMD SEV, although the mechanisms are
quite different.

Most of the work of this is done between the guest, KVM and the
ultravisor, with little need for involvement by qemu.  However qemu
does need to tell KVM to allow secure VMs.

Because the availability of secure mode is a guest visible difference
which depends on having the right hardware and firmware, we don't
enable this by default.  In order to run a secure guest you need to
create a "pef-guest" object and set the confidential-guest-support
property to point to it.

Note that this just *allows* secure guests; the architecture of PEF is
such that the guest still needs to talk to the ultravisor to enter
secure mode.  Qemu has no direct way of knowing if the guest is in
secure mode, and certainly can't know until well after machine
creation time.

To start a PEF-capable guest, use the command line options:
-object pef-guest,id=pef0 -machine confidential-guest-support=pef0

Signed-off-by: David Gibson 
---
 docs/confidential-guest-support.txt |   3 +
 docs/papr-pef.txt   |  30 +++
 hw/ppc/meson.build  |   1 +
 hw/ppc/pef.c| 133 
 hw/ppc/spapr.c  |   8 +-
 include/hw/ppc/pef.h|  17 
 target/ppc/kvm.c|  18 
 target/ppc/kvm_ppc.h|   6 --
 8 files changed, 191 insertions(+), 25 deletions(-)
 create mode 100644 docs/papr-pef.txt
 create mode 100644 hw/ppc/pef.c
 create mode 100644 include/hw/ppc/pef.h

diff --git a/docs/confidential-guest-support.txt 
b/docs/confidential-guest-support.txt
index bd439ac800..4da4c91bd3 100644
--- a/docs/confidential-guest-support.txt
+++ b/docs/confidential-guest-support.txt
@@ -40,4 +40,7 @@ Currently supported confidential guest mechanisms are:
 AMD Secure Encrypted Virtualization (SEV)
 docs/amd-memory-encryption.txt
 
+POWER Protected Execution Facility (PEF)
+docs/papr-pef.txt
+
 Other mechanisms may be supported in future.
diff --git a/docs/papr-pef.txt b/docs/papr-pef.txt
new file mode 100644
index 0000000000..72550e9bf8
--- /dev/null
+++ b/docs/papr-pef.txt
@@ -0,0 +1,30 @@
+POWER (PAPR) Protected Execution Facility (PEF)
+===============================================
+
+Protected Execution Facility (PEF), also known as Secure Guest support
+is a feature found on IBM POWER9 and POWER10 processors.
+
+If a suitable firmware including an Ultravisor is installed, it adds
+an extra memory protection mode to the CPU.  The ultravisor manages a
+pool of secure memory which cannot be accessed by the hypervisor.
+
+When this feature is enabled in QEMU, a guest can use ultracalls to
+enter "secure mode".  This transfers most of its memory to secure
+memory, where it cannot be eavesdropped by a compromised hypervisor.
+
+Launching
+---------
+
+To launch a guest which will be permitted to enter PEF secure mode:
+
+# ${QEMU} \
+-object pef-guest,id=pef0 \
+-machine confidential-guest-support=pef0 \
+...
+
+Live Migration
+--------------
+
+Live migration is not yet implemented for PEF guests.  For
+consistency, we currently prevent migration if the PEF feature is
+enabled, whether or not the guest has actually entered secure mode.
diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index ffa2ec37fa..218631c883 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -27,6 +27,7 @@ ppc_ss.add(when: 'CONFIG_PSERIES', if_true: files(
   'spapr_nvdimm.c',
   'spapr_rtas_ddw.c',
   'spapr_numa.c',
+  'pef.c',
 ))
 ppc_ss.add(when: 'CONFIG_SPAPR_RNG', if_true: files('spapr_rng.c'))
 ppc_ss.add(when: ['CONFIG_PSERIES', 'CONFIG_LINUX'], if_true: files(
diff --git a/hw/ppc/pef.c b/hw/ppc/pef.c
new file mode 100644
index 0000000000..f9fd1f2a71
--- /dev/null
+++ b/hw/ppc/pef.c
@@ -0,0 +1,133 @@
+/*
+ * PEF (Protected Execution Facility) for POWER support
+ *
+ * Copyright Red Hat.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/kvm.h"
+#include "migration/blocker.h"
+#include "exec/confidential-guest-support.h"
+#include "hw/ppc/pef.h"
+
+#define TYPE_PEF_GUEST "pef-guest"
+OBJECT_DECLARE_SIMPLE_TYPE(PefGuest, PEF_GUEST)
+
+typedef struct PefGuest PefGuest;
+typedef struct PefGuestClass PefGuestClass;
+
+struct PefGuestClass {
+ConfidentialGuestSupportClass parent_class;
+};
+
+/**
+ * PefGuest:
+ *
+ * The PefGuest object is used for creating and managing a PEF
+ * guest.
+ *
+ * # $QEMU \
+ * -object pef-guest,id=pef0 \
+ * -machine ...,confidential-guest-support=pef0
+ */
+struct PefGuest {
+

[PATCH v8 06/13] sev: Add Error ** to sev_kvm_init()

2021-02-01 Thread David Gibson
This allows failures to be reported richly and idiomatically.
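
For readers unfamiliar with the pattern, here is a self-contained sketch of
the idea: the callee describes the failure through an error out-parameter and
the caller decides how to report it. `DemoError`, `demo_error_set` and
`demo_init` are stand-ins invented for this example; they are not QEMU's
Error API, which has more features.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Store a formatted message in *errp, if the caller asked for one. */
void demo_error_set(DemoError **errp, const char *fmt, ...)
{
    va_list ap;

    if (errp == NULL) {
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (*errp == NULL) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Callee: describes the failure but leaves reporting to the caller. */
int demo_init(int host_cbitpos, int requested_cbitpos, DemoError **errp)
{
    if (host_cbitpos != requested_cbitpos) {
        demo_error_set(errp, "cbitpos check failed, host '%d' requested '%d'",
                       host_cbitpos, requested_cbitpos);
        return -1;
    }
    return 0;
}
```

The caller owns the returned object and is free to print it, wrap it, or
ignore it, which is what a bare error_report() in the callee does not allow.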

Signed-off-by: David Gibson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Reviewed-by: Cornelia Huck 
---
 accel/kvm/kvm-all.c  |  4 +++-
 accel/kvm/sev-stub.c |  2 +-
 include/sysemu/sev.h |  2 +-
 target/i386/sev.c| 31 +++
 4 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7e615b8e68..3d820d0c7d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2185,9 +2185,11 @@ static int kvm_init(MachineState *ms)
  * encryption context.
  */
 if (ms->cgs) {
+Error *local_err = NULL;
 /* FIXME handle mechanisms other than SEV */
-ret = sev_kvm_init(ms->cgs);
+ret = sev_kvm_init(ms->cgs, &local_err);
 if (ret < 0) {
+error_report_err(local_err);
 goto err;
 }
 }
diff --git a/accel/kvm/sev-stub.c b/accel/kvm/sev-stub.c
index 3d4787ae4a..512e205f7f 100644
--- a/accel/kvm/sev-stub.c
+++ b/accel/kvm/sev-stub.c
@@ -15,7 +15,7 @@
 #include "qemu-common.h"
 #include "sysemu/sev.h"
 
-int sev_kvm_init(ConfidentialGuestSupport *cgs)
+int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
 /* SEV can't be selected if it's not compiled */
 g_assert_not_reached();
diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
index 3b5b1aacf1..5c5a13c6ca 100644
--- a/include/sysemu/sev.h
+++ b/include/sysemu/sev.h
@@ -16,7 +16,7 @@
 
 #include "sysemu/kvm.h"
 
-int sev_kvm_init(ConfidentialGuestSupport *cgs);
+int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
 int sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error **errp);
 int sev_inject_launch_secret(const char *hdr, const char *secret,
  uint64_t gpa, Error **errp);
diff --git a/target/i386/sev.c b/target/i386/sev.c
index fa962d533c..590cb31fa8 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -662,7 +662,7 @@ sev_vm_state_change(void *opaque, int running, RunState 
state)
 }
 }
 
-int sev_kvm_init(ConfidentialGuestSupport *cgs)
+int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
 SevGuestState *sev = SEV_GUEST(cgs);
 char *devname;
@@ -684,14 +684,14 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs)
 host_cbitpos = ebx & 0x3f;
 
 if (host_cbitpos != sev->cbitpos) {
-error_report("%s: cbitpos check failed, host '%d' requested '%d'",
- __func__, host_cbitpos, sev->cbitpos);
+error_setg(errp, "%s: cbitpos check failed, host '%d' requested '%d'",
+   __func__, host_cbitpos, sev->cbitpos);
 goto err;
 }
 
 if (sev->reduced_phys_bits < 1) {
-error_report("%s: reduced_phys_bits check failed, it should be >=1,"
- " requested '%d'", __func__, sev->reduced_phys_bits);
+error_setg(errp, "%s: reduced_phys_bits check failed, it should be >=1,"
+   " requested '%d'", __func__, sev->reduced_phys_bits);
 goto err;
 }
 
@@ -700,20 +700,19 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs)
 devname = object_property_get_str(OBJECT(sev), "sev-device", NULL);
 sev->sev_fd = open(devname, O_RDWR);
 if (sev->sev_fd < 0) {
-error_report("%s: Failed to open %s '%s'", __func__,
- devname, strerror(errno));
-}
-g_free(devname);
-if (sev->sev_fd < 0) {
+error_setg(errp, "%s: Failed to open %s '%s'", __func__,
+   devname, strerror(errno));
+g_free(devname);
 goto err;
 }
+g_free(devname);
 
 ret = sev_platform_ioctl(sev->sev_fd, SEV_PLATFORM_STATUS, &status,
  &fw_error);
 if (ret) {
-error_report("%s: failed to get platform status ret=%d "
- "fw_error='%d: %s'", __func__, ret, fw_error,
- fw_error_to_str(fw_error));
+error_setg(errp, "%s: failed to get platform status ret=%d "
+   "fw_error='%d: %s'", __func__, ret, fw_error,
+   fw_error_to_str(fw_error));
 goto err;
 }
 sev->build_id = status.build;
@@ -723,14 +722,14 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs)
 trace_kvm_sev_init();
 ret = sev_ioctl(sev->sev_fd, KVM_SEV_INIT, NULL, &fw_error);
 if (ret) {
-error_report("%s: failed to initialize ret=%d fw_error=%d '%s'",
- __func__, ret, fw_error, fw_error_to_str(fw_error));
+error_setg(errp, "%s: failed to initialize ret=%d fw_error=%d '%s'",
+   __func__, ret, fw_error, fw_error_to_str(fw_error));
 goto err;
 }
 
 ret = sev_launch_start(sev);
 if (ret) {
-error_report("%s: failed to create encryption context", __func__);
+error_setg(errp, "%s: failed to create encryption context", __func__);
 goto err;
 }
 
-- 
2.29.2




[PATCH v8 11/13] spapr: PEF: prevent migration

2021-02-01 Thread David Gibson
We haven't yet implemented the fairly involved handshaking that will be
needed to migrate PEF protected guests.  For now, just use a migration
blocker so we get a meaningful error if someone attempts this (this is the
same approach used by AMD SEV).
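
The blocker idea can be modelled in a few lines. The sketch below is a toy
registry written for this explanation; `demo_add_blocker` and
`demo_migrate_check` are hypothetical names and do not reflect the signatures
of QEMU's migration code.

```c
#include <stddef.h>

#define MAX_BLOCKERS 8

/* Hypothetical registry of reasons why migration must be refused. */
static const char *blockers[MAX_BLOCKERS];
static size_t n_blockers;

/* Register a reason; returns 0 on success, -1 if the table is full. */
int demo_add_blocker(const char *reason)
{
    if (n_blockers == MAX_BLOCKERS) {
        return -1;
    }
    blockers[n_blockers++] = reason;
    return 0;
}

/*
 * Check whether a migration may start.  Returns NULL if it may proceed,
 * otherwise the first registered reason.
 */
const char *demo_migrate_check(void)
{
    return n_blockers ? blockers[0] : NULL;
}
```

Once a reason is registered, every later migration attempt gets that reason
back as its error instead of failing in some obscure way part-way through.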

Signed-off-by: David Gibson 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Greg Kurz 
---
 hw/ppc/pef.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/ppc/pef.c b/hw/ppc/pef.c
index f9fd1f2a71..573be3ed79 100644
--- a/hw/ppc/pef.c
+++ b/hw/ppc/pef.c
@@ -44,6 +44,8 @@ struct PefGuest {
 static int kvmppc_svm_init(Error **errp)
 {
 #ifdef CONFIG_KVM
+static Error *pef_mig_blocker;
+
 if (!kvm_check_extension(kvm_state, KVM_CAP_PPC_SECURE_GUEST)) {
 error_setg(errp,
"KVM implementation does not support Secure VMs (is an ultravisor running?)");
@@ -58,6 +60,11 @@ static int kvmppc_svm_init(Error **errp)
 }
 }
 
+/* add migration blocker */
+error_setg(&pef_mig_blocker, "PEF: Migration is not implemented");
+/* NB: This can fail if --only-migratable is used */
+migrate_add_blocker(pef_mig_blocker, &error_fatal);
+
 return 0;
 #else
 g_assert_not_reached();
-- 
2.29.2




[PATCH v8 12/13] confidential guest support: Alter virtio default properties for protected guests

2021-02-01 Thread David Gibson
The default behaviour for virtio devices is not to use the platform's normal
DMA paths, but instead to use the fact that it's running in a hypervisor
to directly access guest memory.  That doesn't work if the guest's memory
is protected from hypervisor access, such as with AMD's SEV or POWER's PEF.

So, if a confidential guest mechanism is enabled, then apply the
iommu_platform=on option so it will go through normal DMA mechanisms.
Those will presumably have some way of marking memory as shared with
the hypervisor or hardware so that DMA will work.
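
A sketch of the intended effect on one property, assuming that an explicit
user setting still takes precedence over the new default (that precedence is
an assumption of this example, not something stated above).
`effective_iommu_platform` and `enum prop_state` are hypothetical.

```c
#include <stdbool.h>

/* Tri-state for a boolean property that the user may leave unset. */
enum prop_state { PROP_UNSET, PROP_OFF, PROP_ON };

/*
 * Resolve the effective iommu_platform value for a hypothetical device:
 * an explicit user setting wins; otherwise the default is "on" when a
 * confidential guest mechanism is enabled and "off" when it is not.
 */
bool effective_iommu_platform(enum prop_state user_setting, bool cgs_enabled)
{
    if (user_setting != PROP_UNSET) {
        return user_setting == PROP_ON;
    }
    return cgs_enabled;
}
```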

Signed-off-by: David Gibson 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Cornelia Huck 
Reviewed-by: Greg Kurz 
---
 hw/core/machine.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 94194ab82d..497949899b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -33,6 +33,8 @@
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
 #include "exec/confidential-guest-support.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-pci.h"
 
 GlobalProperty hw_compat_5_2[] = {};
 const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
@@ -1196,6 +1198,17 @@ void machine_run_board_init(MachineState *machine)
  * areas.
  */
 machine_set_mem_merge(OBJECT(machine), false, &error_abort);
+
+/*
+ * Virtio devices can't count on directly accessing guest
+ * memory, so they need iommu_platform=on to use normal DMA
+ * mechanisms.  That requires also disabling legacy virtio
+ * support for those virtio pci devices which allow it.
+ */
+object_register_sugar_prop(TYPE_VIRTIO_PCI, "disable-legacy",
+   "on", true);
+object_register_sugar_prop(TYPE_VIRTIO_DEVICE, "iommu_platform",
+   "on", false);
 }
 
 machine_class->init(machine);
-- 
2.29.2




[PATCH v8 08/13] confidential guest support: Move SEV initialization into arch specific code

2021-02-01 Thread David Gibson
While we've abstracted some (potential) differences between mechanisms for
securing guest memory, the initialization is still specific to SEV.  Given
that, move it into x86's kvm_arch_init() code, rather than the generic
kvm_init() code.
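
The shape of the resulting hook, reduced to a standalone model: the arch code
calls the SEV init unconditionally, and the init itself returns early when
there is nothing for it to do. The type tag below stands in for the QOM
dynamic cast; `demo_cgs` and `demo_sev_init` are invented for this sketch.

```c
#include <stddef.h>

enum cgs_kind { CGS_SEV, CGS_OTHER };

struct demo_cgs {
    enum cgs_kind kind;
    int initialized;
};

/*
 * Hypothetical arch-level hook: returns 0 without doing anything when no
 * mechanism was requested or when the object is not the SEV kind.
 */
int demo_sev_init(struct demo_cgs *cgs)
{
    if (cgs == NULL || cgs->kind != CGS_SEV) {
        return 0;
    }
    cgs->initialized = 1;
    return 0;
}
```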

Signed-off-by: David Gibson 
Reviewed-by: Cornelia Huck 
---
 accel/kvm/kvm-all.c   | 14 --
 accel/kvm/sev-stub.c  |  4 ++--
 target/i386/kvm/kvm.c | 20 
 target/i386/sev.c |  7 ++-
 4 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 3d820d0c7d..7150acdbcc 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2180,20 +2180,6 @@ static int kvm_init(MachineState *ms)
 
 kvm_state = s;
 
-/*
- * if memory encryption object is specified then initialize the memory
- * encryption context.
- */
-if (ms->cgs) {
-Error *local_err = NULL;
-/* FIXME handle mechanisms other than SEV */
-ret = sev_kvm_init(ms->cgs, &local_err);
-if (ret < 0) {
-error_report_err(local_err);
-goto err;
-}
-}
-
 ret = kvm_arch_init(ms, s);
 if (ret < 0) {
 goto err;
diff --git a/accel/kvm/sev-stub.c b/accel/kvm/sev-stub.c
index 512e205f7f..9587d1b2a3 100644
--- a/accel/kvm/sev-stub.c
+++ b/accel/kvm/sev-stub.c
@@ -17,6 +17,6 @@
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
-/* SEV can't be selected if it's not compiled */
-g_assert_not_reached();
+/* If we get here, cgs must be some non-SEV thing */
+return 0;
 }
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6dc1ee052d..4788139128 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -42,6 +42,7 @@
 #include "hw/i386/intel_iommu.h"
 #include "hw/i386/x86-iommu.h"
 #include "hw/i386/e820_memory_layout.h"
+#include "sysemu/sev.h"
 
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
@@ -2135,6 +2136,25 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 uint64_t shadow_mem;
 int ret;
 struct utsname utsname;
+Error *local_err = NULL;
+
+/*
+ * Initialize SEV context, if required
+ *
+ * If no memory encryption is requested (ms->cgs == NULL) this is
+ * a no-op.
+ *
+ * It's also a no-op if a non-SEV confidential guest support
+ * mechanism is selected.  SEV is the only mechanism available to
+ * select on x86 at present, so this doesn't arise, but if new
+ * mechanisms are supported in future (e.g. TDX), they'll need
+ * their own initialization either here or elsewhere.
+ */
+ret = sev_kvm_init(ms->cgs, &local_err);
+if (ret < 0) {
+error_report_err(local_err);
+return ret;
+}
 
 if (!kvm_check_extension(s, KVM_CAP_IRQ_ROUTING)) {
 error_report("kvm: KVM_CAP_IRQ_ROUTING not supported by KVM");
diff --git a/target/i386/sev.c b/target/i386/sev.c
index f9e9b5d8ae..11c9a3cc21 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -664,13 +664,18 @@ sev_vm_state_change(void *opaque, int running, RunState 
state)
 
 int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
-SevGuestState *sev = SEV_GUEST(cgs);
+SevGuestState *sev
+= (SevGuestState *)object_dynamic_cast(OBJECT(cgs), TYPE_SEV_GUEST);
 char *devname;
 int ret, fw_error;
 uint32_t ebx;
 uint32_t host_cbitpos;
 struct sev_user_data_status status = {};
 
+if (!sev) {
+return 0;
+}
+
 ret = ram_block_discard_disable(true);
 if (ret) {
 error_report("%s: cannot disable RAM discard", __func__);
-- 
2.29.2




[PATCH v8 05/13] confidential guest support: Rework the "memory-encryption" property

2021-02-01 Thread David Gibson
Currently the "memory-encryption" property is only looked at once we
get to kvm_init().  Although protection of guest memory from the
hypervisor isn't something that could really ever work with TCG, it's
not conceptually tied to the KVM accelerator.

In addition, the way the string property is resolved to an object is
almost identical to how a QOM link property is handled.

So, create a new "confidential-guest-support" link property which sets
this QOM interface link directly in the machine.  For compatibility we
keep the "memory-encryption" property, but now implemented in terms of
the new property.
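
In outline, the compatibility property becomes a thin wrapper: look up the
object by id, then set the link. The model below uses a fixed table in place
of the QOM objects container; all `demo_` names and the ids "sev0" and "pef0"
are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct demo_object { const char *id; };

struct demo_machine { struct demo_object *cgs; };   /* the link */

/* Hypothetical object table standing in for the QOM objects container. */
static struct demo_object demo_objects[] = { { "sev0" }, { "pef0" } };

/* New-style setter: store the link directly. */
void demo_set_cgs(struct demo_machine *m, struct demo_object *obj)
{
    m->cgs = obj;
}

/*
 * Legacy setter: resolve the id, then defer to the new-style setter.
 * Returns 0 on success, -1 if no object has that id.
 */
int demo_set_memory_encryption(struct demo_machine *m, const char *id)
{
    size_t i;

    for (i = 0; i < sizeof(demo_objects) / sizeof(demo_objects[0]); i++) {
        if (strcmp(demo_objects[i].id, id) == 0) {
            demo_set_cgs(m, &demo_objects[i]);
            return 0;
        }
    }
    return -1;
}

/* Legacy getter: report the id of the linked object, or NULL. */
const char *demo_get_memory_encryption(const struct demo_machine *m)
{
    return m->cgs ? m->cgs->id : NULL;
}
```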

Signed-off-by: David Gibson 
Reviewed-by: Greg Kurz 
Reviewed-by: Cornelia Huck 
---
 accel/kvm/kvm-all.c  |  5 +++--
 accel/kvm/sev-stub.c |  5 +++--
 hw/core/machine.c| 43 +--
 include/hw/boards.h  |  2 +-
 include/sysemu/sev.h |  2 +-
 target/i386/sev.c| 32 ++--
 6 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 038ed93e7e..7e615b8e68 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2184,8 +2184,9 @@ static int kvm_init(MachineState *ms)
  * if memory encryption object is specified then initialize the memory
  * encryption context.
  */
-if (ms->memory_encryption) {
-ret = sev_guest_init(ms->memory_encryption);
+if (ms->cgs) {
+/* FIXME handle mechanisms other than SEV */
+ret = sev_kvm_init(ms->cgs);
 if (ret < 0) {
 goto err;
 }
diff --git a/accel/kvm/sev-stub.c b/accel/kvm/sev-stub.c
index 5db9ab8f00..3d4787ae4a 100644
--- a/accel/kvm/sev-stub.c
+++ b/accel/kvm/sev-stub.c
@@ -15,7 +15,8 @@
 #include "qemu-common.h"
 #include "sysemu/sev.h"
 
-int sev_guest_init(const char *id)
+int sev_kvm_init(ConfidentialGuestSupport *cgs)
 {
-return -1;
+/* SEV can't be selected if it's not compiled */
+g_assert_not_reached();
 }
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 8909117d80..94194ab82d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -32,6 +32,7 @@
 #include "hw/mem/nvdimm.h"
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
+#include "exec/confidential-guest-support.h"
 
 GlobalProperty hw_compat_5_2[] = {};
 const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
@@ -427,16 +428,37 @@ static char *machine_get_memory_encryption(Object *obj, 
Error **errp)
 {
 MachineState *ms = MACHINE(obj);
 
-return g_strdup(ms->memory_encryption);
+if (ms->cgs) {
+return g_strdup(object_get_canonical_path_component(OBJECT(ms->cgs)));
+}
+
+return NULL;
 }
 
 static void machine_set_memory_encryption(Object *obj, const char *value,
 Error **errp)
 {
-MachineState *ms = MACHINE(obj);
+Object *cgs =
+object_resolve_path_component(object_get_objects_root(), value);
+
+if (!cgs) {
+error_setg(errp, "No such memory encryption object '%s'", value);
+return;
+}
 
-g_free(ms->memory_encryption);
-ms->memory_encryption = g_strdup(value);
+object_property_set_link(obj, "confidential-guest-support", cgs, errp);
+}
+
+static void machine_check_confidential_guest_support(const Object *obj,
+ const char *name,
+ Object *new_target,
+ Error **errp)
+{
+/*
+ * So far the only constraint is that the target has the
+ * TYPE_CONFIDENTIAL_GUEST_SUPPORT interface, and that's checked
+ * by the QOM core
+ */
 }
 
 static bool machine_get_nvdimm(Object *obj, Error **errp)
@@ -836,6 +858,15 @@ static void machine_class_init(ObjectClass *oc, void *data)
 object_class_property_set_description(oc, "suppress-vmdesc",
 "Set on to disable self-describing migration");
 
+object_class_property_add_link(oc, "confidential-guest-support",
+   TYPE_CONFIDENTIAL_GUEST_SUPPORT,
+   offsetof(MachineState, cgs),
+   machine_check_confidential_guest_support,
+   OBJ_PROP_LINK_STRONG);
+object_class_property_set_description(oc, "confidential-guest-support",
+  "Set confidential guest scheme to support");
+
+/* For compatibility */
 object_class_property_add_str(oc, "memory-encryption",
 machine_get_memory_encryption, machine_set_memory_encryption);
 object_class_property_set_description(oc, "memory-encryption",
@@ -1158,9 +1189,9 @@ void machine_run_board_init(MachineState *machine)
 cc->deprecation_note);
 }
 
-if (machine->memory_encryption) {
+if (machine->cgs) {
 /*
- * With memory encryption, the host can't see the real
+ * 

[PATCH v8 07/13] confidential guest support: Introduce cgs "ready" flag

2021-02-01 Thread David Gibson
The platform specific details of mechanisms for implementing
confidential guest support may require setup at various points during
initialization.  Thus, it's not really feasible to have a single cgs
initialization hook, but instead each mechanism needs its own
initialization calls in arch or machine specific code.

However, to make it harder to have a bug where a mechanism isn't
properly initialized under some circumstances, we want to have a
common place, late in boot, where we verify that cgs has been
initialized if it was requested.

This patch introduces a ready flag to the ConfidentialGuestSupport
base type to accomplish this, which we verify in
qemu_machine_creation_done().
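
Reduced to its essentials, the check looks like the model below. The patch
itself uses an assert; the sketch returns a bool so the three cases are easy
to see. `demo_cgs`, `demo_machine` and both functions are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_cgs { bool ready; };
struct demo_machine { struct demo_cgs *cgs; };

/* Mechanism-specific init, called from arch or machine code. */
void demo_mechanism_init(struct demo_cgs *cgs)
{
    cgs->ready = true;
}

/*
 * Late check: startup may continue if no mechanism was requested, or if
 * the requested mechanism has marked itself ready.
 */
bool demo_cgs_check(const struct demo_machine *m)
{
    return m->cgs == NULL || m->cgs->ready;
}
```

The failing case is the one the flag exists for: a mechanism was requested
but no init code ever ran for it.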

Signed-off-by: David Gibson 
---
 include/exec/confidential-guest-support.h | 24 +++
 softmmu/vl.c  | 10 ++
 target/i386/sev.c |  2 ++
 3 files changed, 36 insertions(+)

diff --git a/include/exec/confidential-guest-support.h 
b/include/exec/confidential-guest-support.h
index 3db6380e63..5dcf602047 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -27,6 +27,30 @@ OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, 
CONFIDENTIAL_GUEST_SUPPORT)
 
 struct ConfidentialGuestSupport {
 Object parent;
+
+/*
+ * ready: flag set by CGS initialization code once it's ready to
+ *start executing instructions in a potentially-secure
+ *guest
+ *
+ * The definition here is a bit fuzzy, because this is essentially
+ * part of a self-sanity-check, rather than a strict mechanism.
+ *
+ * It's not feasible to have a single point in the common machine
+ * init path to configure confidential guest support, because
+ * different mechanisms have different interdependencies requiring
+ * initialization in different places, often in arch or machine
+ * type specific code.  It's also usually not possible to check
+ * for invalid configurations until that initialization code.
+ * That means it would be very easy to have a bug allowing CGS
+ * init to be bypassed entirely in certain configurations.
+ *
+ * Silently ignoring a requested security feature would be bad, so
+ * to avoid that we check late in init that this 'ready' flag is
+ * set if CGS was requested.  If the CGS init hasn't happened, and
+ * so 'ready' is not set, we'll abort.
+ */
+bool ready;
 };
 
 typedef struct ConfidentialGuestSupportClass {
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 1b464e3474..1869ed54a9 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -101,6 +101,7 @@
 #include "qemu/plugin.h"
 #include "qemu/queue.h"
 #include "sysemu/arch_init.h"
+#include "exec/confidential-guest-support.h"
 
 #include "ui/qemu-spice.h"
 #include "qapi/string-input-visitor.h"
@@ -2497,6 +2498,8 @@ static void qemu_create_cli_devices(void)
 
 static void qemu_machine_creation_done(void)
 {
+MachineState *machine = MACHINE(qdev_get_machine());
+
 /* Did we create any drives that we failed to create a device for? */
 drive_check_orphaned();
 
@@ -2516,6 +2519,13 @@ static void qemu_machine_creation_done(void)
 
 qdev_machine_creation_done();
 
+if (machine->cgs) {
+/*
+ * Verify that Confidential Guest Support has actually been initialized
+ */
+assert(machine->cgs->ready);
+}
+
 if (foreach_device_config(DEV_GDB, gdbserver_start) < 0) {
 exit(1);
 }
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 590cb31fa8..f9e9b5d8ae 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -737,6 +737,8 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
 qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
 
+cgs->ready = true;
+
 return 0;
 err:
 sev_guest = NULL;
-- 
2.29.2
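
To illustrate the late sanity check this patch describes, here is a
minimal standalone sketch.  It does not use QEMU's types: CgsModel,
MachineModel, cgs_model_init() and cgs_init_was_bypassed() are
hypothetical names made up for this example.  Only the 'ready' flag and
the idea of verifying it late in machine init come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ConfidentialGuestSupport. */
typedef struct CgsModel {
    bool ready;   /* set by mechanism-specific init once it has run */
} CgsModel;

/* Hypothetical stand-in for MachineState; cgs is NULL if not requested. */
typedef struct MachineModel {
    CgsModel *cgs;
} MachineModel;

/* Mechanism-specific init (SEV in the patch) marks the object ready. */
static int cgs_model_init(CgsModel *cgs)
{
    cgs->ready = true;
    return 0;
}

/* True if CGS was requested but its init code never ran. */
static bool cgs_init_was_bypassed(const MachineModel *machine)
{
    return machine->cgs != NULL && !machine->cgs->ready;
}
```

In the patch itself the same condition is expressed as
assert(machine->cgs->ready) inside qemu_machine_creation_done(), so a
bypassed init aborts instead of silently starting an unprotected guest.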




[PATCH v8 09/13] confidential guest support: Update documentation

2021-02-01 Thread David Gibson
Now that we've implemented a generic machine option for configuring various
confidential guest support mechanisms:
  1. Update docs/amd-memory-encryption.txt to reference this rather than
 the earlier SEV specific option
  2. Add a docs/confidential-guest-support.txt to cover the generalities of
 the confidential guest support scheme

Signed-off-by: David Gibson 
Reviewed-by: Greg Kurz 
---
 docs/amd-memory-encryption.txt  |  2 +-
 docs/confidential-guest-support.txt | 43 +
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 docs/confidential-guest-support.txt

diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
index 80b8eb00e9..145896aec7 100644
--- a/docs/amd-memory-encryption.txt
+++ b/docs/amd-memory-encryption.txt
@@ -73,7 +73,7 @@ complete flow chart.
 To launch a SEV guest
 
 # ${QEMU} \
--machine ...,memory-encryption=sev0 \
+-machine ...,confidential-guest-support=sev0 \
 -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1
 
 Debugging
diff --git a/docs/confidential-guest-support.txt 
b/docs/confidential-guest-support.txt
new file mode 100644
index 00..bd439ac800
--- /dev/null
+++ b/docs/confidential-guest-support.txt
@@ -0,0 +1,43 @@
+Confidential Guest Support
+==========================
+
+Traditionally, hypervisors such as QEMU have complete access to a
+guest's memory and other state, meaning that a compromised hypervisor
+can compromise any of its guests.  A number of platforms have added
+mechanisms in hardware and/or firmware which give guests at least some
+protection from a compromised hypervisor.  This is obviously
+especially desirable for public cloud environments.
+
+These mechanisms have different names and different modes of
+operation, but are often referred to as Secure Guests or Confidential
+Guests.  We use the term "Confidential Guest Support" to distinguish
+this from other aspects of guest security (such as security against
+attacks from other guests, or from network sources).
+
+Running a Confidential Guest
+----------------------------
+
+To run a confidential guest you need to add two command line parameters:
+
+1. Use "-object" to create a "confidential guest support" object.  The
+   type and parameters will vary with the specific mechanism to be
+   used
+2. Set the "confidential-guest-support" machine parameter to the ID of
+   the object from (1).
+
+Example (for AMD SEV)::
+
+qemu-system-x86_64 \
+ \
+-machine ...,confidential-guest-support=sev0 \
+-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1
+
+Supported mechanisms
+--------------------
+
+Currently supported confidential guest mechanisms are:
+
+AMD Secure Encrypted Virtualization (SEV)
+docs/amd-memory-encryption.txt
+
+Other mechanisms may be supported in future.
-- 
2.29.2




[PATCH v8 04/13] confidential guest support: Move side effect out of machine_set_memory_encryption()

2021-02-01 Thread David Gibson
When the "memory-encryption" property is set, we also disable KSM
merging for the guest, since it won't accomplish anything.

We want that, but doing it in the property set function itself is
theoretically incorrect, in the unlikely event of some configuration
environment that set the property then cleared it again before
constructing the guest.

More importantly, it makes some other cleanups we want more difficult.
So, instead move this logic to machine_run_board_init() conditional on
the final value of the property.

Signed-off-by: David Gibson 
Reviewed-by: Richard Henderson 
Reviewed-by: Greg Kurz 
Reviewed-by: Cornelia Huck 
---
 hw/core/machine.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index de3b8f1b31..8909117d80 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -437,14 +437,6 @@ static void machine_set_memory_encryption(Object *obj, 
const char *value,
 
 g_free(ms->memory_encryption);
 ms->memory_encryption = g_strdup(value);
-
-/*
- * With memory encryption, the host can't see the real contents of RAM,
- * so there's no point in it trying to merge areas.
- */
-if (value) {
-machine_set_mem_merge(obj, false, errp);
-}
 }
 
 static bool machine_get_nvdimm(Object *obj, Error **errp)
@@ -1166,6 +1158,15 @@ void machine_run_board_init(MachineState *machine)
 cc->deprecation_note);
 }
 
+if (machine->memory_encryption) {
+/*
+ * With memory encryption, the host can't see the real
+ * contents of RAM, so there's no point in it trying to merge
+ * areas.
+ */
+machine_set_mem_merge(OBJECT(machine), false, &error_abort);
+}
+
 machine_class->init(machine);
 phase_advance(PHASE_MACHINE_INITIALIZED);
 }
-- 
2.29.2
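
The ordering problem described in the commit message can be shown with a
small standalone model.  This is only a sketch: MachineModel and the
model_* functions are hypothetical, and the real code goes through QOM
property setters.  It shows the setter recording the value without side
effects, and board init acting on the final value only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two machine properties involved. */
typedef struct MachineModel {
    const char *memory_encryption;  /* NULL when unset */
    bool mem_merge;                 /* KSM merging, on by default */
} MachineModel;

/* Setter with no side effects: it only records the value. */
static void model_set_memory_encryption(MachineModel *m, const char *value)
{
    m->memory_encryption = value;
}

/* Board init applies the side effect based on the final value only. */
static void model_run_board_init(MachineModel *m)
{
    if (m->memory_encryption) {
        /* The host can't see real RAM contents, so merging is pointless. */
        m->mem_merge = false;
    }
}
```

With this ordering, setting the property and then clearing it again
before board init leaves merging enabled, which is the case the old
setter-based logic got wrong.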




[PATCH v8 01/13] qom: Allow optional sugar props

2021-02-01 Thread David Gibson
From: Greg Kurz 

Global properties have an @optional field, which allows a given property
to be applied to a given type even if one of its subclasses doesn't support
it. This is used in particular in the compat code when dealing with the
"disable-modern" and "disable-legacy" properties and the "virtio-pci"
type.

Allow object_register_sugar_prop() to set this field as well.

Signed-off-by: Greg Kurz 
Message-Id: <159738953558.377274.16617742952571083440.st...@bahia.lan>
Signed-off-by: David Gibson 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cornelia Huck 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/qom/object.h |  3 ++-
 qom/object.c |  4 +++-
 softmmu/rtc.c|  3 ++-
 softmmu/vl.c | 17 +++--
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/qom/object.h b/include/qom/object.h
index d378f13a11..6721cd312e 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -638,7 +638,8 @@ bool object_apply_global_props(Object *obj, const GPtrArray 
*props,
Error **errp);
 void object_set_machine_compat_props(GPtrArray *compat_props);
 void object_set_accelerator_compat_props(GPtrArray *compat_props);
-void object_register_sugar_prop(const char *driver, const char *prop, const 
char *value);
+void object_register_sugar_prop(const char *driver, const char *prop,
+const char *value, bool optional);
 void object_apply_compat_props(Object *obj);
 
 /**
diff --git a/qom/object.c b/qom/object.c
index 2fa0119647..491823db4a 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -442,7 +442,8 @@ static GPtrArray *object_compat_props[3];
  * other than "-global".  These are generally used for syntactic
  * sugar and legacy command line options.
  */
-void object_register_sugar_prop(const char *driver, const char *prop, const 
char *value)
+void object_register_sugar_prop(const char *driver, const char *prop,
+const char *value, bool optional)
 {
 GlobalProperty *g;
 if (!object_compat_props[2]) {
@@ -452,6 +453,7 @@ void object_register_sugar_prop(const char *driver, const 
char *prop, const char
 g->driver = g_strdup(driver);
 g->property = g_strdup(prop);
 g->value = g_strdup(value);
+g->optional = optional;
 g_ptr_array_add(object_compat_props[2], g);
 }
 
diff --git a/softmmu/rtc.c b/softmmu/rtc.c
index e1e15ef613..5632684fc9 100644
--- a/softmmu/rtc.c
+++ b/softmmu/rtc.c
@@ -179,7 +179,8 @@ void configure_rtc(QemuOpts *opts)
 if (!strcmp(value, "slew")) {
 object_register_sugar_prop("mc146818rtc",
"lost_tick_policy",
-   "slew");
+   "slew",
+   false);
 } else if (!strcmp(value, "none")) {
 /* discard is default */
 } else {
diff --git a/softmmu/vl.c b/softmmu/vl.c
index a8876b8965..1b464e3474 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1663,16 +1663,20 @@ static int machine_set_property(void *opaque,
 return 0;
 }
 if (g_str_equal(qom_name, "igd-passthru")) {
-object_register_sugar_prop(ACCEL_CLASS_NAME("xen"), qom_name, value);
+object_register_sugar_prop(ACCEL_CLASS_NAME("xen"), qom_name, value,
+   false);
 return 0;
 }
 if (g_str_equal(qom_name, "kvm-shadow-mem")) {
-object_register_sugar_prop(ACCEL_CLASS_NAME("kvm"), qom_name, value);
+object_register_sugar_prop(ACCEL_CLASS_NAME("kvm"), qom_name, value,
+   false);
 return 0;
 }
 if (g_str_equal(qom_name, "kernel-irqchip")) {
-object_register_sugar_prop(ACCEL_CLASS_NAME("kvm"), qom_name, value);
-object_register_sugar_prop(ACCEL_CLASS_NAME("whpx"), qom_name, value);
+object_register_sugar_prop(ACCEL_CLASS_NAME("kvm"), qom_name, value,
+   false);
+object_register_sugar_prop(ACCEL_CLASS_NAME("whpx"), qom_name, value,
+   false);
 return 0;
 }
 
@@ -2297,9 +2301,10 @@ static void qemu_process_sugar_options(void)
 
 val = g_strdup_printf("%d",
  (uint32_t) 
qemu_opt_get_number(qemu_find_opts_singleton("smp-opts"), "cpus", 1));
-object_register_sugar_prop("memory-backend", "prealloc-threads", val);
+object_register_sugar_prop("memory-backend", "prealloc-threads", val,
+   false);
 g_free(val);
-object_register_sugar_prop("memory-backend", "prealloc", "on");
+object_register_sugar_prop("memory-backend", "prealloc", "on", false);
 }
 
 if (watchdog) {
-- 
2.29.2
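
As a rough illustration of what the @optional field means when a global
property is applied, here is a simplified standalone model.  It is not
QEMU's implementation: GlobalPropModel, ObjectModel and the model_*
functions are hypothetical, and the real code also matches on the driver
type and parses the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced form of a global property entry. */
typedef struct GlobalPropModel {
    const char *property;
    const char *value;
    bool optional;
} GlobalPropModel;

/* Hypothetical object that supports a fixed list of property names. */
typedef struct ObjectModel {
    const char *const *supported;  /* NULL-terminated list of names */
    size_t applied;                /* number of properties applied */
} ObjectModel;

static bool model_supports(const ObjectModel *obj, const char *name)
{
    for (size_t i = 0; obj->supported[i] != NULL; i++) {
        if (strcmp(obj->supported[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns false only when a non-optional property can't be applied. */
static bool model_apply_global_prop(ObjectModel *obj,
                                    const GlobalPropModel *g)
{
    if (!model_supports(obj, g->property)) {
        /* An optional property is skipped quietly; others are errors. */
        return g->optional;
    }
    obj->applied++;
    return true;
}
```

Every caller touched by this patch passes false, so existing behaviour
is unchanged; the new argument only matters for callers that opt in.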




[PATCH v8 02/13] confidential guest support: Introduce new confidential guest support class

2021-02-01 Thread David Gibson
Several architectures have mechanisms which are designed to protect
guest memory from interference or eavesdropping by a compromised
hypervisor.  AMD SEV does this with in-chip memory encryption and
Intel's TDX can do similar things.  POWER's Protected Execution
Framework (PEF) accomplishes a similar goal using an ultravisor and
new memory protection features, instead of encryption.

To (partially) unify handling for these, this introduces a new
ConfidentialGuestSupport QOM base class.  "Confidential" is kind of vague,
but "confidential computing" seems to be the buzzword about these schemes,
and "secure" or "protected" are often used in connection to unrelated
things (such as hypervisor-from-guest or guest-from-guest security).

The "support" in the name is significant because in at least some of the
cases it requires the guest to take specific actions in order to protect
itself from hypervisor eavesdropping.

Signed-off-by: David Gibson 
---
 backends/confidential-guest-support.c | 33 
 backends/meson.build  |  1 +
 include/exec/confidential-guest-support.h | 38 +++
 include/qemu/typedefs.h   |  1 +
 target/i386/sev.c |  5 +--
 5 files changed, 76 insertions(+), 2 deletions(-)
 create mode 100644 backends/confidential-guest-support.c
 create mode 100644 include/exec/confidential-guest-support.h

diff --git a/backends/confidential-guest-support.c 
b/backends/confidential-guest-support.c
new file mode 100644
index 00..052fde8db0
--- /dev/null
+++ b/backends/confidential-guest-support.c
@@ -0,0 +1,33 @@
+/*
+ * QEMU Confidential Guest support
+ *
+ * Copyright Red Hat.
+ *
+ * Authors:
+ *  David Gibson 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "exec/confidential-guest-support.h"
+
+OBJECT_DEFINE_ABSTRACT_TYPE(ConfidentialGuestSupport,
+confidential_guest_support,
+CONFIDENTIAL_GUEST_SUPPORT,
+OBJECT)
+
+static void confidential_guest_support_class_init(ObjectClass *oc, void *data)
+{
+}
+
+static void confidential_guest_support_init(Object *obj)
+{
+}
+
+static void confidential_guest_support_finalize(Object *obj)
+{
+}
diff --git a/backends/meson.build b/backends/meson.build
index 484456ece7..d4221831fc 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -6,6 +6,7 @@ softmmu_ss.add([files(
   'rng-builtin.c',
   'rng-egd.c',
   'rng.c',
+  'confidential-guest-support.c',
 ), numa])
 
 softmmu_ss.add(when: 'CONFIG_POSIX', if_true: files('rng-random.c'))
diff --git a/include/exec/confidential-guest-support.h 
b/include/exec/confidential-guest-support.h
new file mode 100644
index 00..3db6380e63
--- /dev/null
+++ b/include/exec/confidential-guest-support.h
@@ -0,0 +1,38 @@
+/*
+ * QEMU Confidential Guest support
+ *   This interface describes the common pieces between various
+ *   schemes for protecting guest memory or other state against a
+ *   compromised hypervisor.  This includes memory encryption (AMD's
+ *   SEV and Intel's MKTME) or special protection modes (PEF on POWER,
+ *   or PV on s390x).
+ *
+ * Copyright Red Hat.
+ *
+ * Authors:
+ *  David Gibson 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_CONFIDENTIAL_GUEST_SUPPORT_H
+#define QEMU_CONFIDENTIAL_GUEST_SUPPORT_H
+
+#ifndef CONFIG_USER_ONLY
+
+#include "qom/object.h"
+
+#define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
+OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, 
CONFIDENTIAL_GUEST_SUPPORT)
+
+struct ConfidentialGuestSupport {
+Object parent;
+};
+
+typedef struct ConfidentialGuestSupportClass {
+ObjectClass parent;
+} ConfidentialGuestSupportClass;
+
+#endif /* !CONFIG_USER_ONLY */
+
+#endif /* QEMU_CONFIDENTIAL_GUEST_SUPPORT_H */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 68deb74ef6..dc39b05c30 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -37,6 +37,7 @@ typedef struct Chardev Chardev;
 typedef struct Clock Clock;
 typedef struct CompatProperty CompatProperty;
 typedef struct CoMutex CoMutex;
+typedef struct ConfidentialGuestSupport ConfidentialGuestSupport;
 typedef struct CPUAddressSpace CPUAddressSpace;
 typedef struct CPUState CPUState;
 typedef struct DeviceListener DeviceListener;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1546606811..b738dc45b6 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -31,6 +31,7 @@
 #include "qom/object.h"
 #include "exec/address-spaces.h"
 #include "monitor/monitor.h"
+#include "exec/confidential-guest-support.h"
 
 #define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
@@ -47,7 +48,7 @@ 

[PATCH v8 03/13] sev: Remove false abstraction of flash encryption

2021-02-01 Thread David Gibson
When AMD's SEV memory encryption is in use, flash memory banks (which are
initialized by pc_system_flash_map()) need to be encrypted with the guest's
key, so that the guest can read them.

That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM
state... except that it doesn't really abstract much at all.

For starters, the only call site is in code specific to the 'pc'
family of machine types, so it's obviously specific to those and to
x86 to begin with.  But it makes a bunch of further assumptions that
need not be true about an arbitrary confidential guest system based on
memory encryption, let alone one based on other mechanisms:

 * it assumes that the flash memory is defined to be encrypted with the
   guest key, rather than being shared with the hypervisor
 * it assumes that the hypervisor has some mechanism to encrypt data into
   the guest, even though it can't decrypt it out, since that's the whole
   point
 * the interface assumes that this encryption can be done in place, which
   implies that the hypervisor can write into a confidential guest's
   memory, even if what it writes isn't meaningful

So really, this "abstraction" is actually pretty specific to the way SEV
works.  So, this patch removes it and instead has the PC flash
initialization code call into a SEV specific callback.

Signed-off-by: David Gibson 
Reviewed-by: Cornelia Huck 
---
 accel/kvm/kvm-all.c| 31 ++-
 accel/kvm/sev-stub.c   |  9 ++---
 accel/stubs/kvm-stub.c | 10 --
 hw/i386/pc_sysfw.c | 17 ++---
 include/sysemu/kvm.h   | 16 
 include/sysemu/sev.h   |  4 ++--
 target/i386/sev-stub.c |  5 +
 target/i386/sev.c  | 24 ++--
 8 files changed, 31 insertions(+), 85 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 3feb17d965..038ed93e7e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -123,10 +123,6 @@ struct KVMState
 KVMMemoryListener memory_listener;
 QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
 
-/* memory encryption */
-void *memcrypt_handle;
-int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
-
 /* For "info mtree -f" to tell if an MR is registered in KVM */
 int nr_as;
 struct KVMAs {
@@ -225,26 +221,6 @@ int kvm_get_max_memslots(void)
 return s->nr_slots;
 }
 
-bool kvm_memcrypt_enabled(void)
-{
-if (kvm_state && kvm_state->memcrypt_handle) {
-return true;
-}
-
-return false;
-}
-
-int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
-{
-if (kvm_state->memcrypt_handle &&
-kvm_state->memcrypt_encrypt_data) {
-return kvm_state->memcrypt_encrypt_data(kvm_state->memcrypt_handle,
-  ptr, len);
-}
-
-return 1;
-}
-
 /* Called with KVMMemoryListener.slots_lock held */
 static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
 {
@@ -2209,13 +2185,10 @@ static int kvm_init(MachineState *ms)
  * encryption context.
  */
 if (ms->memory_encryption) {
-kvm_state->memcrypt_handle = sev_guest_init(ms->memory_encryption);
-if (!kvm_state->memcrypt_handle) {
-ret = -1;
+ret = sev_guest_init(ms->memory_encryption);
+if (ret < 0) {
 goto err;
 }
-
-kvm_state->memcrypt_encrypt_data = sev_encrypt_data;
 }
 
 ret = kvm_arch_init(ms, s);
diff --git a/accel/kvm/sev-stub.c b/accel/kvm/sev-stub.c
index 4f97452585..5db9ab8f00 100644
--- a/accel/kvm/sev-stub.c
+++ b/accel/kvm/sev-stub.c
@@ -15,12 +15,7 @@
 #include "qemu-common.h"
 #include "sysemu/sev.h"
 
-int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len)
+int sev_guest_init(const char *id)
 {
-abort();
-}
-
-void *sev_guest_init(const char *id)
-{
-return NULL;
+return -1;
 }
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 680e099463..0f17acfac0 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -81,16 +81,6 @@ int kvm_on_sigbus(int code, void *addr)
 return 1;
 }
 
-bool kvm_memcrypt_enabled(void)
-{
-return false;
-}
-
-int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
-{
-  return 1;
-}
-
 #ifndef CONFIG_USER_ONLY
 int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 {
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 92e90ff013..11172214f1 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -38,6 +38,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
+#include "sysemu/sev.h"
 
 #define FLASH_SECTOR_SIZE 4096
 
@@ -147,7 +148,7 @@ static void pc_system_flash_map(PCMachineState *pcms,
 PFlashCFI01 *system_flash;
 MemoryRegion *flash_mem;
 void *flash_ptr;
-int ret, flash_size;
+int flash_size;
 
 assert(PC_MACHINE_GET_CLASS(pcms)->pci_enabled);
 
@@ -191,16 +192,10 @@ static void pc_system_flash_map(PCMachineState *pcms,
  

[PATCH v8 00/13] Generalize memory encryption models

2021-02-01 Thread David Gibson
A number of hardware platforms are implementing mechanisms whereby the
hypervisor does not have unfettered access to guest memory, in order
to mitigate the security impact of a compromised hypervisor.

AMD's SEV implements this with in-cpu memory encryption, and Intel has
its own memory encryption mechanism.  POWER has an upcoming mechanism
to accomplish this in a different way, using a new memory protection
level plus a small trusted ultravisor.  s390 also has a protected
execution environment.

The current code (committed or draft) for these features has each
platform's version configured entirely differently.  That doesn't seem
ideal for users, or particularly for management layers.

AMD SEV introduces a notionally generic machine option
"memory-encryption", but it doesn't actually cover any cases other
than SEV.

This series is a proposal to at least partially unify configuration
for these mechanisms, by renaming and generalizing AMD's
"memory-encryption" property.  It is replaced by a
"confidential-guest-support" property pointing to a platform specific
object which configures and manages the specific details.

Note to Ram Pai: the documentation I've included for PEF is very
minimal.  If you could send a patch expanding on that, it would be
very helpful.

Changes since v7:
 * Tweaked and clarified meaning of the 'ready' flag
 * Polished the interface to the PEF internals
 * Shifted initialization for s390 PV later (I hope I've finally got
   this after apply_cpu_model() where it needs to be)
Changes since v6:
 * Moved to using OBJECT_DECLARE_TYPE and OBJECT_DEFINE_TYPE macros
 * Assorted minor fixes
Changes since v5:
 * Renamed from "securable guest memory" to "confidential guest
   support"
 * Simpler reworking of x86 boot time flash encryption
 * Added a bunch of documentation
 * Fixed some compile errors on POWER
Changes since v4:
 * Renamed from "host trust limitation" to "securable guest memory",
   which I think is marginally more descriptive
 * Re-organized initialization, because the previous model called at
   kvm_init didn't work for s390
 * Assorted fixes to the s390 implementation; rudimentary testing
   (gitlab CI) only
Changes since v3:
 * Rebased
 * Added first cut at handling of s390 protected virtualization
Changes since RFCv2:
 * Rebased
 * Removed preliminary SEV cleanups (they've been merged)
 * Changed name to "host trust limitation"
 * Added migration blocker to the PEF code (based on SEV's version)
Changes since RFCv1:
 * Rebased
 * Fixed some errors pointed out by Dave Gilbert

David Gibson (12):
  confidential guest support: Introduce new confidential guest support
class
  sev: Remove false abstraction of flash encryption
  confidential guest support: Move side effect out of
machine_set_memory_encryption()
  confidential guest support: Rework the "memory-encryption" property
  sev: Add Error ** to sev_kvm_init()
  confidential guest support: Introduce cgs "ready" flag
  confidential guest support: Move SEV initialization into arch specific
code
  confidential guest support: Update documentation
  spapr: Add PEF based confidential guest support
  spapr: PEF: prevent migration
  confidential guest support: Alter virtio default properties for
protected guests
  s390: Recognize confidential-guest-support option

Greg Kurz (1):
  qom: Allow optional sugar props

 accel/kvm/kvm-all.c   |  38 --
 accel/kvm/sev-stub.c  |  10 +-
 accel/stubs/kvm-stub.c|  10 --
 backends/confidential-guest-support.c |  33 +
 backends/meson.build  |   1 +
 docs/amd-memory-encryption.txt|   2 +-
 docs/confidential-guest-support.txt   |  49 
 docs/papr-pef.txt |  30 +
 docs/system/s390x/protvirt.rst|  19 ++-
 hw/core/machine.c |  63 --
 hw/i386/pc_sysfw.c|  17 +--
 hw/ppc/meson.build|   1 +
 hw/ppc/pef.c  | 140 ++
 hw/ppc/spapr.c|   8 +-
 hw/s390x/pv.c |  62 ++
 hw/s390x/s390-virtio-ccw.c|   3 +
 include/exec/confidential-guest-support.h |  62 ++
 include/hw/boards.h   |   2 +-
 include/hw/ppc/pef.h  |  17 +++
 include/hw/s390x/pv.h |  17 +++
 include/qemu/typedefs.h   |   1 +
 include/qom/object.h  |   3 +-
 include/sysemu/kvm.h  |  16 ---
 include/sysemu/sev.h  |   4 +-
 qom/object.c  |   4 +-
 softmmu/rtc.c |   3 +-
 softmmu/vl.c  |  27 -
 target/i386/kvm/kvm.c |  20 
 target/i386/sev-stub.c|   5 +
 target/i386/sev.c |  95 

Re: [PATCH v3 0/4] MIPS Bootloader helper

2021-02-01 Thread Jiaxun Yang

On 2021/1/27 2:54 PM, Jiaxun Yang wrote:

v2:
A big reconstruction. Rewrite helpers with CPU feature and separate
changesets.
v3:
respin


ping?



Jiaxun Yang (4):
   hw/mips: Add a bootloader helper
   hw/mips: Use bl_gen_kernel_jump to generate bootloaders
   hw/mips/malta: Use bootloader helper to set BAR registers
   hw/mips/boston: Use bootloader helper to set GCRs

  include/hw/mips/bootloader.h |  49 +++
  hw/mips/bootloader.c | 164 +++
  hw/mips/boston.c |  64 +++---
  hw/mips/fuloong2e.c  |  28 +-
  hw/mips/malta.c  | 109 +++
  hw/mips/meson.build  |   2 +-
  6 files changed, 260 insertions(+), 156 deletions(-)
  create mode 100644 include/hw/mips/bootloader.h
  create mode 100644 hw/mips/bootloader.c






[PATCH] savevm: check for incoming-state in savevm

2021-02-01 Thread lichun
Running #qemu-system-i386 test.img -monitor stdio -incoming tcp:0.0.0.0:1234
(qemu) savevm
we get:

before the patch:
bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
Aborted
after:
Error: Guest is waiting for an incoming migration

Signed-off-by: lichun 
---
 migration/savevm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 4f3b69e..84e76e4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1522,6 +1522,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 return -EINVAL;
 }
 
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+error_setg(errp, "Guest is waiting for an incoming migration");
+return -EINVAL;
+}
+
 if (migrate_use_block()) {
 error_setg(errp, "Block migration and snapshots are incompatible");
 return -EINVAL;
-- 
1.8.3.1
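
The effect of the added check can be sketched in isolation as below.
This is a simplified model, not QEMU code: RunStateModel and
model_savevm_check() are hypothetical, and a plain string pointer stands
in for the Error ** reporting used by the real function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced set of run states. */
typedef enum RunStateModel {
    MODEL_RUNNING,
    MODEL_INMIGRATE
} RunStateModel;

/*
 * Refuse to save a snapshot while waiting for an incoming migration.
 * On failure, *errmsg points at a static message; on success it is NULL.
 */
static int model_savevm_check(RunStateModel state, const char **errmsg)
{
    if (state == MODEL_INMIGRATE) {
        *errmsg = "Guest is waiting for an incoming migration";
        return -EINVAL;
    }
    *errmsg = NULL;
    return 0;
}
```

Failing early with an error is what turns the assertion failure shown
above into a message on the monitor.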




Re: [PATCH 12/13] target/mips: Let get_seg*_physical_address() take MMUAccessType arg

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

get_physical_address() calls get_seg_physical_address() and
get_segctl_physical_address() passing a MMUAccessType type.
Let the prototypes use it as argument, as it is stricter than
an integer.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/tlb_helper.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index 64e89591abc..14f5b1a0a9c 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -222,7 +222,7 @@ static int is_seg_am_mapped(unsigned int am, bool eu, int 
mmu_idx)
  
  static int get_seg_physical_address(CPUMIPSState *env, hwaddr *physical,

  int *prot, target_ulong real_address,
-int rw, int mmu_idx,
+MMUAccessType access_type, int mmu_idx,
  unsigned int am, bool eu,
  target_ulong segmask,
  hwaddr physical_base)
@@ -234,7 +234,8 @@ static int get_seg_physical_address(CPUMIPSState *env, 
hwaddr *physical,
  return mapped;
  } else if (mapped) {
  /* The segment is TLB mapped */
-return env->tlb->map_address(env, physical, prot, real_address, rw);
+return env->tlb->map_address(env, physical, prot, real_address,
+ access_type);
  } else {
  /* The segment is unmapped */
  *physical = physical_base | (real_address & segmask);
@@ -245,15 +246,15 @@ static int get_seg_physical_address(CPUMIPSState *env, 
hwaddr *physical,
  
  static int get_segctl_physical_address(CPUMIPSState *env, hwaddr *physical,

 int *prot, target_ulong real_address,
-   int rw, int mmu_idx,
+   MMUAccessType access_type, int mmu_idx,
 uint16_t segctl, target_ulong segmask)
  {
  unsigned int am = (segctl & CP0SC_AM_MASK) >> CP0SC_AM;
  bool eu = (segctl >> CP0SC_EU) & 1;
  hwaddr pa = ((hwaddr)segctl & CP0SC_PA_MASK) << 20;
  
-return get_seg_physical_address(env, physical, prot, real_address, rw,

-mmu_idx, am, eu, segmask,
+return get_seg_physical_address(env, physical, prot, real_address,
+access_type, mmu_idx, am, eu, segmask,
  pa & ~(hwaddr)segmask);
  }
  





Re: [PATCH 13/13] target/mips: Let CPUMIPSTLBContext::map_address() take MMUAccessType

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

get_seg_physical_address() calls CPUMIPSTLBContext::map_address()
handlers passing a MMUAccessType type. Update the prototype
handlers to take a MMUAccessType argument, as it is stricter than
an integer.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/internal.h   |  8 
  target/mips/tlb_helper.c | 12 ++--
  2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/mips/internal.h b/target/mips/internal.h
index 34915c275c4..99264b8bf6a 100644
--- a/target/mips/internal.h
+++ b/target/mips/internal.h
@@ -111,7 +111,7 @@ struct CPUMIPSTLBContext {
  uint32_t nb_tlb;
  uint32_t tlb_in_use;
  int (*map_address)(struct CPUMIPSState *env, hwaddr *physical, int *prot,
-   target_ulong address, int rw);
+   target_ulong address, MMUAccessType access_type);
  void (*helper_tlbwi)(struct CPUMIPSState *env);
  void (*helper_tlbwr)(struct CPUMIPSState *env);
  void (*helper_tlbp)(struct CPUMIPSState *env);
@@ -126,11 +126,11 @@ struct CPUMIPSTLBContext {
  };
  
  int no_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,

-   target_ulong address, int rw);
+   target_ulong address, MMUAccessType access_type);
  int fixed_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-  target_ulong address, int rw);
+  target_ulong address, MMUAccessType access_type);
  int r4k_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-target_ulong address, int rw);
+target_ulong address, MMUAccessType access_type);
  void r4k_helper_tlbwi(CPUMIPSState *env);
  void r4k_helper_tlbwr(CPUMIPSState *env);
  void r4k_helper_tlbp(CPUMIPSState *env);
diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index 14f5b1a0a9c..2dc8ecafc3b 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -39,7 +39,7 @@ enum {
  
  /* no MMU emulation */

  int no_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-   target_ulong address, int rw)
+   target_ulong address, MMUAccessType access_type)
  {
  *physical = address;
  *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
@@ -48,7 +48,7 @@ int no_mmu_map_address(CPUMIPSState *env, hwaddr *physical, 
int *prot,
  
  /* fixed mapping MMU emulation */

  int fixed_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-  target_ulong address, int rw)
+  target_ulong address, MMUAccessType access_type)
  {
  if (address <= (int32_t)0x7FFFUL) {
  if (!(env->CP0_Status & (1 << CP0St_ERL))) {
@@ -68,7 +68,7 @@ int fixed_mmu_map_address(CPUMIPSState *env, hwaddr 
*physical, int *prot,
  
  /* MIPS32/MIPS64 R4000-style MMU emulation */

  int r4k_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-target_ulong address, int rw)
+target_ulong address, MMUAccessType access_type)
  {
  uint16_t ASID = env->CP0_EntryHi & env->CP0_EntryHi_ASID_mask;
  uint32_t MMID = env->CP0_MemoryMapID;
@@ -97,13 +97,13 @@ int r4k_map_address(CPUMIPSState *env, hwaddr *physical, 
int *prot,
  if (!(n ? tlb->V1 : tlb->V0)) {
  return TLBRET_INVALID;
  }
-if (rw == MMU_INST_FETCH && (n ? tlb->XI1 : tlb->XI0)) {
+if (access_type == MMU_INST_FETCH && (n ? tlb->XI1 : tlb->XI0)) {
  return TLBRET_XI;
  }
-if (rw == MMU_DATA_LOAD && (n ? tlb->RI1 : tlb->RI0)) {
+if (access_type == MMU_DATA_LOAD && (n ? tlb->RI1 : tlb->RI0)) {
  return TLBRET_RI;
  }
-if (rw != MMU_DATA_STORE || (n ? tlb->D1 : tlb->D0)) {
+if (access_type != MMU_DATA_STORE || (n ? tlb->D1 : tlb->D0)) {
  *physical = tlb->PFN[n] | (address & (mask >> 1));
  *prot = PAGE_READ;
  if (n ? tlb->D1 : tlb->D0) {
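
For readers following the access_type checks in the r4k_map_address()
hunk above, here is a standalone sketch of the same decision order.  It
is a simplified model and not QEMU code: AccessModel, TlbResult,
TlbPageModel and model_check_access() are hypothetical names, and the
"dirty" result for a store to a page without the D bit is an assumption,
since the quoted hunk ends before that branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counterpart of MMUAccessType. */
typedef enum AccessModel {
    ACCESS_LOAD,
    ACCESS_STORE,
    ACCESS_FETCH
} AccessModel;

/* Hypothetical counterpart of the TLBRET_* return codes. */
typedef enum TlbResult {
    TLB_MATCH,
    TLB_INVALID,
    TLB_XI,
    TLB_RI,
    TLB_DIRTY
} TlbResult;

/* Per-page bits consulted by the check (V, D, XI, RI). */
typedef struct TlbPageModel {
    bool valid;
    bool dirty;
    bool exec_inhibit;
    bool read_inhibit;
} TlbPageModel;

static TlbResult model_check_access(const TlbPageModel *page,
                                    AccessModel access)
{
    if (!page->valid) {
        return TLB_INVALID;
    }
    if (access == ACCESS_FETCH && page->exec_inhibit) {
        return TLB_XI;
    }
    if (access == ACCESS_LOAD && page->read_inhibit) {
        return TLB_RI;
    }
    if (access != ACCESS_STORE || page->dirty) {
        return TLB_MATCH;
    }
    /* Assumed: a store to a page without the D bit is reported as dirty. */
    return TLB_DIRTY;
}
```

Naming the parameter by its enum type, as the patch does, makes it
clearer at each call site which of these cases is being requested.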





Re: [PATCH 11/13] target/mips: Let get_physical_address() take MMUAccessType argument

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

All these functions:
- mips_cpu_get_phys_page_debug()
- cpu_mips_translate_address()
- mips_cpu_tlb_fill()
- page_table_walk_refill()
- walk_directory()
call get_physical_address() passing a MMUAccessType type. Let the
prototype use it as argument, as it is stricter than an integer.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 
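
For readers skimming the series, here is a minimal standalone sketch (not QEMU code) of why an enum parameter is stricter than a plain int. The enum mirrors the three access types used in these patches; MMU_DATA_LOAD being 0 matches patch 06, the other numeric values are an assumption, and describe_access() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for QEMU's MMUAccessType; values other than LOAD == 0 are assumed. */
typedef enum {
    MMU_DATA_LOAD  = 0,
    MMU_DATA_STORE = 1,
    MMU_INST_FETCH = 2,
} MMUAccessType;

/*
 * Hypothetical helper: with an enum parameter the prototype documents
 * which values are expected, and a compiler can warn (e.g. -Wswitch)
 * when a switch forgets one of the enumerators.
 */
static const char *describe_access(MMUAccessType access_type)
{
    switch (access_type) {
    case MMU_DATA_LOAD:
        return "load";
    case MMU_DATA_STORE:
        return "store";
    case MMU_INST_FETCH:
        return "fetch";
    }
    return "unknown";
}
```

With an `int rw` prototype, a caller passing a bare 0 or 1 gives the reader no hint about what is meant; the enum makes the intent visible at every call site.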


---
  target/mips/tlb_helper.c | 20 ++++++++++----------
  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index 21b7d38f11c..64e89591abc 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -259,7 +259,7 @@ static int get_segctl_physical_address(CPUMIPSState *env, 
hwaddr *physical,
  
  static int get_physical_address(CPUMIPSState *env, hwaddr *physical,

  int *prot, target_ulong real_address,
-int rw, int mmu_idx)
+MMUAccessType access_type, int mmu_idx)
  {
  /* User mode can only access useg/xuseg */
  #if defined(TARGET_MIPS64)
@@ -306,14 +306,14 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  segctl = env->CP0_SegCtl2 >> 16;
  }
  ret = get_segctl_physical_address(env, physical, prot,
-  real_address, rw,
+  real_address, access_type,
mmu_idx, segctl, 0x3FFF);
  #if defined(TARGET_MIPS64)
  } else if (address < 0x4000ULL) {
  /* xuseg */
  if (UX && address <= (0x3FFFULL & env->SEGMask)) {
  ret = env->tlb->map_address(env, physical, prot,
-real_address, rw);
+real_address, access_type);
  } else {
  ret = TLBRET_BADADDR;
  }
@@ -322,7 +322,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  if ((supervisor_mode || kernel_mode) &&
  SX && address <= (0x7FFFULL & env->SEGMask)) {
  ret = env->tlb->map_address(env, physical, prot,
-real_address, rw);
+real_address, access_type);
  } else {
  ret = TLBRET_BADADDR;
  }
@@ -349,7 +349,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  /* Does CP0_Status.KX/SX/UX permit the access mode (am) */
  if (env->CP0_Status & am_ksux[am]) {
  ret = get_seg_physical_address(env, physical, prot,
-   real_address, rw,
+   real_address, access_type,
 mmu_idx, am, false, env->PAMask,
 0);
  } else {
@@ -363,7 +363,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  if (kernel_mode && KX &&
  address <= (0x7FFFULL & env->SEGMask)) {
  ret = env->tlb->map_address(env, physical, prot,
-real_address, rw);
+real_address, access_type);
  } else {
  ret = TLBRET_BADADDR;
  }
@@ -371,17 +371,17 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  } else if (address < KSEG1_BASE) {
  /* kseg0 */
  ret = get_segctl_physical_address(env, physical, prot, real_address,
-  rw, mmu_idx,
+  access_type, mmu_idx,
env->CP0_SegCtl1 >> 16, 0x1FFF);
  } else if (address < KSEG2_BASE) {
  /* kseg1 */
  ret = get_segctl_physical_address(env, physical, prot, real_address,
-  rw, mmu_idx,
+  access_type, mmu_idx,
env->CP0_SegCtl1, 0x1FFF);
  } else if (address < KSEG3_BASE) {
  /* sseg (kseg2) */
  ret = get_segctl_physical_address(env, physical, prot, real_address,
-  rw, mmu_idx,
+  access_type, mmu_idx,
env->CP0_SegCtl0 >> 16, 0x1FFF);
  } else {
  /*
@@ -389,7 +389,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
   * XXX: debug segment is not emulated
   */
  ret = get_segctl_physical_address(env, physical, prot, real_address,
-  rw, mmu_idx,
+  access_type, 

Re: [PATCH 10/13] target/mips: Let raise_mmu_exception() take MMUAccessType argument

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

Both mips_cpu_tlb_fill() and cpu_mips_translate_address() pass
MMUAccessType to raise_mmu_exception(). Let the prototype use it
as argument, as it is stricter than an integer.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 
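
As a reading aid, the exception selection touched by this patch can be sketched in isolation. This is a simplified sketch, not the real function: the numeric values of the TLBRET_* and EXCP_* codes below are placeholders, pick_exception() is a hypothetical name, and only the three cases visible in the diff are covered.

```c
#include <assert.h>

typedef enum { MMU_DATA_LOAD, MMU_DATA_STORE, MMU_INST_FETCH } MMUAccessType;

/* Placeholder codes; the real TLBRET_* and EXCP_* values live in QEMU. */
enum { TLBRET_BADADDR, TLBRET_NOMATCH, TLBRET_INVALID };
enum { EXCP_SKETCH_NONE, EXCP_AdEL, EXCP_AdES, EXCP_TLBL, EXCP_TLBS };

/*
 * Hypothetical helper: stores raise the "S" flavour of the exception,
 * loads and instruction fetches raise the "L" flavour.
 */
static int pick_exception(MMUAccessType access_type, int tlb_error)
{
    int is_store = (access_type == MMU_DATA_STORE);

    switch (tlb_error) {
    case TLBRET_BADADDR:
        return is_store ? EXCP_AdES : EXCP_AdEL;
    case TLBRET_NOMATCH:
    case TLBRET_INVALID:
        return is_store ? EXCP_TLBS : EXCP_TLBL;
    default:
        return EXCP_SKETCH_NONE;
    }
}
```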


---
  target/mips/tlb_helper.c | 10 +++++-----
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index e9c3adeade6..21b7d38f11c 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -405,12 +405,12 @@ void cpu_mips_tlb_flush(CPUMIPSState *env)
  #endif /* !CONFIG_USER_ONLY */
  
  static void raise_mmu_exception(CPUMIPSState *env, target_ulong address,

-int rw, int tlb_error)
+MMUAccessType access_type, int tlb_error)
  {
  CPUState *cs = env_cpu(env);
  int exception = 0, error_code = 0;
  
-if (rw == MMU_INST_FETCH) {

+if (access_type == MMU_INST_FETCH) {
  error_code |= EXCP_INST_NOTAVAIL;
  }
  
@@ -419,7 +419,7 @@ static void raise_mmu_exception(CPUMIPSState *env, target_ulong address,

  case TLBRET_BADADDR:
  /* Reference to kernel address from user mode or supervisor mode */
  /* Reference to supervisor address from user mode */
-if (rw == MMU_DATA_STORE) {
+if (access_type == MMU_DATA_STORE) {
  exception = EXCP_AdES;
  } else {
  exception = EXCP_AdEL;
@@ -427,7 +427,7 @@ static void raise_mmu_exception(CPUMIPSState *env, 
target_ulong address,
  break;
  case TLBRET_NOMATCH:
  /* No TLB match for a mapped address */
-if (rw == MMU_DATA_STORE) {
+if (access_type == MMU_DATA_STORE) {
  exception = EXCP_TLBS;
  } else {
  exception = EXCP_TLBL;
@@ -436,7 +436,7 @@ static void raise_mmu_exception(CPUMIPSState *env, 
target_ulong address,
  break;
  case TLBRET_INVALID:
  /* TLB match with no valid bit */
-if (rw == MMU_DATA_STORE) {
+if (access_type == MMU_DATA_STORE) {
  exception = EXCP_TLBS;
  } else {
  exception = EXCP_TLBL;





Re: [PATCH 09/13] target/mips: Let cpu_mips_translate_address() take MMUAccessType arg

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

The single caller, do_translate_address(), passes MMUAccessType
to cpu_mips_translate_address(). Let the prototype use it as
argument, as it is stricter than an integer.

Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Jiaxun Yang 

---
  target/mips/internal.h   | 2 +-
  target/mips/tlb_helper.c | 6 +++---
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/mips/internal.h b/target/mips/internal.h
index d09afded5ea..34915c275c4 100644
--- a/target/mips/internal.h
+++ b/target/mips/internal.h
@@ -146,7 +146,7 @@ void mips_cpu_do_transaction_failed(CPUState *cs, hwaddr 
physaddr,
  int mmu_idx, MemTxAttrs attrs,
  MemTxResult response, uintptr_t retaddr);
  hwaddr cpu_mips_translate_address(CPUMIPSState *env, target_ulong address,
-  int rw);
+  MMUAccessType access_type);
  #endif
  
  #define cpu_signal_handler cpu_mips_signal_handler

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index afcc269750d..e9c3adeade6 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -903,17 +903,17 @@ bool mips_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
  
  #ifndef CONFIG_USER_ONLY

  hwaddr cpu_mips_translate_address(CPUMIPSState *env, target_ulong address,
-  int rw)
+  MMUAccessType access_type)
  {
  hwaddr physical;
  int prot;
  int ret = 0;
  
  /* data access */

-ret = get_physical_address(env, &physical, &prot, address, rw,
+ret = get_physical_address(env, &physical, &prot, address, access_type,
 cpu_mmu_index(env, false));
  if (ret != TLBRET_MATCH) {
-raise_mmu_exception(env, address, rw, ret);
+raise_mmu_exception(env, address, access_type, ret);
  return -1LL;
  } else {
  return physical;





Re: [PATCH 08/13] target/mips: Let do_translate_address() take MMUAccessType argument

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

The single caller, HELPER_LD_ATOMIC(), passes MMUAccessType to
do_translate_address(). Let the prototype use it as argument,
as it is stricter than an integer.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/op_helper.c | 7 ++++---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/mips/op_helper.c b/target/mips/op_helper.c
index 9fce0194b3e..b80e8f75401 100644
--- a/target/mips/op_helper.c
+++ b/target/mips/op_helper.c
@@ -288,13 +288,14 @@ target_ulong helper_rotx(target_ulong rs, uint32_t shift, 
uint32_t shiftx,
  #ifndef CONFIG_USER_ONLY
  
  static inline hwaddr do_translate_address(CPUMIPSState *env,

-  target_ulong address,
-  int rw, uintptr_t retaddr)
+  target_ulong address,
+  MMUAccessType access_type,
+  uintptr_t retaddr)
  {
  hwaddr paddr;
  CPUState *cs = env_cpu(env);
  
-paddr = cpu_mips_translate_address(env, address, rw);

+paddr = cpu_mips_translate_address(env, address, access_type);
  
  if (paddr == -1LL) {

  cpu_loop_exit_restore(cs, retaddr);





Re: [PATCH 07/13] target/mips: Let page_table_walk_refill() take MMUAccessType argument

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

The single caller, mips_cpu_tlb_fill(), passes MMUAccessType
to page_table_walk_refill(). Let the prototype use it as
argument, as it is stricter than an integer.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/tlb_helper.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index 9216c7a91b3..afcc269750d 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -621,8 +621,8 @@ static int walk_directory(CPUMIPSState *env, uint64_t 
*vaddr,
  }
  }
  
-static bool page_table_walk_refill(CPUMIPSState *env, vaddr address, int rw,

-int mmu_idx)
+static bool page_table_walk_refill(CPUMIPSState *env, vaddr address,
+   MMUAccessType access_type, int mmu_idx)
  {
  int gdw = (env->CP0_PWSize >> CP0PS_GDW) & 0x3F;
  int udw = (env->CP0_PWSize >> CP0PS_UDW) & 0x3F;





Re: [PATCH 06/13] target/mips: Replace magic value by MMU_DATA_LOAD definition

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/op_helper.c  | 2 +-
  target/mips/tlb_helper.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/mips/op_helper.c b/target/mips/op_helper.c
index 89c7d4556a0..9fce0194b3e 100644
--- a/target/mips/op_helper.c
+++ b/target/mips/op_helper.c
@@ -312,7 +312,7 @@ target_ulong helper_##name(CPUMIPSState *env, target_ulong arg, int mem_idx)  \
  } \
  do_raise_exception(env, EXCP_AdEL, GETPC()); \
  } \
-env->CP0_LLAddr = do_translate_address(env, arg, 0, GETPC()); \
+env->CP0_LLAddr = do_translate_address(env, arg, MMU_DATA_LOAD, GETPC()); \
  env->lladdr = arg; \
  env->llval = do_cast cpu_##insn##_mmuidx_ra(env, arg, mem_idx, GETPC()); \
  return env->llval; \
diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index c9535b7f72f..9216c7a91b3 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -492,7 +492,7 @@ hwaddr mips_cpu_get_phys_page_debug(CPUState *cs, vaddr 
addr)
  hwaddr phys_addr;
  int prot;
  
-if (get_physical_address(env, &phys_addr, &prot, addr, 0,
+if (get_physical_address(env, &phys_addr, &prot, addr, MMU_DATA_LOAD,
   cpu_mmu_index(env, false)) != 0) {
  return -1;
  }





Re: [PATCH 05/13] target/mips: Remove unused MMU definitions

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

Remove these confusing and unused definitions.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/cpu.h | 16 ----------------
  1 file changed, 16 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index b9e227a30e9..9e6028f8e63 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -1220,22 +1220,6 @@ typedef MIPSCPU ArchCPU;
  
  #include "exec/cpu-all.h"
  
-/*

- * Memory access type :
- * may be needed for precise access rights control and precise exceptions.
- */
-enum {
-/* 1 bit to define user level / supervisor access */
-ACCESS_USER  = 0x00,
-ACCESS_SUPER = 0x01,
-/* 1 bit to indicate direction */
-ACCESS_STORE = 0x02,
-/* Type of instruction that generated the access */
-ACCESS_CODE  = 0x10, /* Code fetch access*/
-ACCESS_INT   = 0x20, /* Integer load/store access*/
-ACCESS_FLOAT = 0x30, /* floating point load/store access */
-};
-
  /* Exceptions */
  enum {
  EXCP_NONE  = -1,





Re: [RFC 05/10] vhost: Add vhost_dev_from_virtio

2021-02-01 Thread Jason Wang



On 2021/2/1 4:28 PM, Eugenio Perez Martin wrote:

On Mon, Feb 1, 2021 at 7:13 AM Jason Wang  wrote:


On 2021/1/30 4:54 AM, Eugenio Pérez wrote:

Signed-off-by: Eugenio Pérez 
---
   include/hw/virtio/vhost.h |  1 +
   hw/virtio/vhost.c | 17 +++++++++++++++++
   2 files changed, 18 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 4a8bc75415..fca076e3f0 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -123,6 +123,7 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, const 
int *feature_bits,
   void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits,
   uint64_t features);
   bool vhost_has_free_slot(void);
+struct vhost_dev *vhost_dev_from_virtio(const VirtIODevice *vdev);

   int vhost_net_set_backend(struct vhost_dev *hdev,
 struct vhost_vring_file *file);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 28c7d78172..8683d507f5 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -61,6 +61,23 @@ bool vhost_has_free_slot(void)
   return slots_limit > used_memslots;
   }

+/*
+ * Get the vhost device associated to a VirtIO device.
+ */
+struct vhost_dev *vhost_dev_from_virtio(const VirtIODevice *vdev)
+{
+struct vhost_dev *hdev;
+
+QLIST_FOREACH(hdev, &vhost_devices, entry) {
+if (hdev->vdev == vdev) {
+return hdev;
+}
+}
+
+assert(hdev);
+return NULL;
+}


I'm not sure this can work in the case of multiqueue. E.g. vhost-net
multiqueue is an N:1 mapping between vhost devices and virtio devices.

Thanks


Right. We could add a "vdev vq index" parameter to the function in
this case, but I guess the most reliable way to do this is to add a
vhost_opaque value to VirtQueue, as Stefan proposed in the previous RFC.
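
To make the "vdev vq index" idea concrete, a rough standalone sketch follows. It is not the proposed patch: the two structs are hypothetical, heavily simplified stand-ins for VirtIODevice and struct vhost_dev, a plain singly linked list replaces the QLIST, and vhost_dev_from_virtio_vq() is an invented name. It assumes each vhost device serves a contiguous range of virtqueues starting at vq_index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for VirtIODevice and struct vhost_dev. */
typedef struct VirtIODeviceSketch {
    int id;
} VirtIODeviceSketch;

typedef struct VhostDevSketch {
    const VirtIODeviceSketch *vdev;
    int vq_index;                 /* first virtqueue served by this vhost device */
    int nvqs;                     /* number of virtqueues it serves */
    struct VhostDevSketch *next;  /* plain list link instead of a QLIST entry */
} VhostDevSketch;

/*
 * With an N:1 mapping, matching on the virtio device alone is ambiguous,
 * so the lookup also checks that the virtqueue index falls in the range
 * served by the candidate vhost device. Returns NULL if nothing matches.
 */
static VhostDevSketch *vhost_dev_from_virtio_vq(VhostDevSketch *head,
                                                const VirtIODeviceSketch *vdev,
                                                int vq_index)
{
    for (VhostDevSketch *hdev = head; hdev != NULL; hdev = hdev->next) {
        if (hdev->vdev == vdev &&
            vq_index >= hdev->vq_index &&
            vq_index < hdev->vq_index + hdev->nvqs) {
            return hdev;
        }
    }
    return NULL;
}
```

The vhost_opaque alternative would avoid the list walk entirely by storing the owner on each virtqueue.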



So the question remains: it looks easier to hide the shadow virtqueue
handling in the vhost layer instead of exposing it to the virtio layer:

1) the vhost protocol is a stable ABI
2) there is no need to deal with virtio internals, which are more complex than vhost

Or are there any advantages if we do it at the virtio layer?

Thanks




I need to take this into account in qmp_x_vhost_enable_shadow_vq too.


+
   static void vhost_dev_sync_region(struct vhost_dev *dev,
 MemoryRegionSection *section,
 uint64_t mfirst, uint64_t mlast,





Re: [PATCH 04/13] target/mips: Remove access_type argument from get_physical_address()

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

get_physical_address() doesn't use the 'access_type' argument,
remove it to simplify.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/tlb_helper.c | 22 +++++++++-------------
  1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index d89ad87cb9d..c9535b7f72f 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -259,7 +259,7 @@ static int get_segctl_physical_address(CPUMIPSState *env, 
hwaddr *physical,
  
  static int get_physical_address(CPUMIPSState *env, hwaddr *physical,

  int *prot, target_ulong real_address,
-int rw, int access_type, int mmu_idx)
+int rw, int mmu_idx)
  {
  /* User mode can only access useg/xuseg */
  #if defined(TARGET_MIPS64)
@@ -492,7 +492,7 @@ hwaddr mips_cpu_get_phys_page_debug(CPUState *cs, vaddr 
addr)
  hwaddr phys_addr;
  int prot;
  
-if (get_physical_address(env, &phys_addr, &prot, addr, 0, ACCESS_INT,
+if (get_physical_address(env, &phys_addr, &prot, addr, 0,
   cpu_mmu_index(env, false)) != 0) {
  return -1;
  }
@@ -570,7 +570,7 @@ static int walk_directory(CPUMIPSState *env, uint64_t 
*vaddr,
  uint64_t w = 0;
  
  if (get_physical_address(env, &paddr, &prot, *vaddr, MMU_DATA_LOAD,

- ACCESS_INT, cpu_mmu_index(env, false)) !=
+ cpu_mmu_index(env, false)) !=
   TLBRET_MATCH) {
  /* wrong base address */
  return 0;
@@ -598,7 +598,7 @@ static int walk_directory(CPUMIPSState *env, uint64_t 
*vaddr,
  *pw_entrylo0 = entry;
  }
  if (get_physical_address(env, &paddr, &prot, vaddr2, MMU_DATA_LOAD,
- ACCESS_INT, cpu_mmu_index(env, false)) !=
+ cpu_mmu_index(env, false)) !=
   TLBRET_MATCH) {
  return 0;
  }
@@ -752,7 +752,7 @@ static bool page_table_walk_refill(CPUMIPSState *env, vaddr 
address, int rw,
  /* Leaf Level Page Table - First half of PTE pair */
  vaddr |= ptoffset0;
  if (get_physical_address(env, &paddr, &prot, vaddr, MMU_DATA_LOAD,
- ACCESS_INT, cpu_mmu_index(env, false)) !=
+ cpu_mmu_index(env, false)) !=
   TLBRET_MATCH) {
  return false;
  }
@@ -765,7 +765,7 @@ static bool page_table_walk_refill(CPUMIPSState *env, vaddr 
address, int rw,
  /* Leaf Level Page Table - Second half of PTE pair */
  vaddr |= ptoffset1;
  if (get_physical_address(env, &paddr, &prot, vaddr, MMU_DATA_LOAD,
- ACCESS_INT, cpu_mmu_index(env, false)) !=
+ cpu_mmu_index(env, false)) !=
   TLBRET_MATCH) {
  return false;
  }
@@ -843,16 +843,14 @@ bool mips_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
  #if !defined(CONFIG_USER_ONLY)
  hwaddr physical;
  int prot;
-int mips_access_type;
  #endif
  int ret = TLBRET_BADADDR;
  
  /* data access */

  #if !defined(CONFIG_USER_ONLY)
  /* XXX: put correct access by using cpu_restore_state() correctly */
-mips_access_type = ACCESS_INT;
  ret = get_physical_address(env, &physical, &prot, address,
-   access_type, mips_access_type, mmu_idx);
+   access_type, mmu_idx);
  switch (ret) {
  case TLBRET_MATCH:
  qemu_log_mask(CPU_LOG_MMU,
@@ -884,7 +882,7 @@ bool mips_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
  env->hflags |= mode;
  if (ret_walker) {
  ret = get_physical_address(env, &physical, &prot, address,
-   access_type, mips_access_type, mmu_idx);
+   access_type, mmu_idx);
  if (ret == TLBRET_MATCH) {
  tlb_set_page(cs, address & TARGET_PAGE_MASK,
   physical & TARGET_PAGE_MASK, prot,
@@ -909,12 +907,10 @@ hwaddr cpu_mips_translate_address(CPUMIPSState *env, 
target_ulong address,
  {
  hwaddr physical;
  int prot;
-int access_type;
  int ret = 0;
  
  /* data access */

-access_type = ACCESS_INT;
-ret = get_physical_address(env, &physical, &prot, address, rw, access_type,
+ret = get_physical_address(env, &physical, &prot, address, rw,
 cpu_mmu_index(env, false));
  if (ret != TLBRET_MATCH) {
  raise_mmu_exception(env, address, rw, ret);





Re: [PATCH 03/13] target/mips: Remove access_type arg from get_segctl_physical_address()

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

get_segctl_physical_address() doesn't use the 'access_type' argument,
remove it to simplify.

Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Jiaxun Yang 

---
  target/mips/tlb_helper.c | 20 ++++++++++----------
  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index 9906292440c..d89ad87cb9d 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -245,7 +245,7 @@ static int get_seg_physical_address(CPUMIPSState *env, 
hwaddr *physical,
  
  static int get_segctl_physical_address(CPUMIPSState *env, hwaddr *physical,

 int *prot, target_ulong real_address,
-   int rw, int access_type, int mmu_idx,
+   int rw, int mmu_idx,
 uint16_t segctl, target_ulong segmask)
  {
  unsigned int am = (segctl & CP0SC_AM_MASK) >> CP0SC_AM;
@@ -306,7 +306,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  segctl = env->CP0_SegCtl2 >> 16;
  }
  ret = get_segctl_physical_address(env, physical, prot,
-  real_address, rw, access_type,
+  real_address, rw,
mmu_idx, segctl, 0x3FFF);
  #if defined(TARGET_MIPS64)
  } else if (address < 0x4000ULL) {
@@ -370,26 +370,26 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  #endif
  } else if (address < KSEG1_BASE) {
  /* kseg0 */
-ret = get_segctl_physical_address(env, physical, prot, real_address, rw,
-  access_type, mmu_idx,
+ret = get_segctl_physical_address(env, physical, prot, real_address,
+  rw, mmu_idx,
env->CP0_SegCtl1 >> 16, 0x1FFF);
  } else if (address < KSEG2_BASE) {
  /* kseg1 */
-ret = get_segctl_physical_address(env, physical, prot, real_address, rw,
-  access_type, mmu_idx,
+ret = get_segctl_physical_address(env, physical, prot, real_address,
+  rw, mmu_idx,
env->CP0_SegCtl1, 0x1FFF);
  } else if (address < KSEG3_BASE) {
  /* sseg (kseg2) */
-ret = get_segctl_physical_address(env, physical, prot, real_address, rw,
-  access_type, mmu_idx,
+ret = get_segctl_physical_address(env, physical, prot, real_address,
+  rw, mmu_idx,
env->CP0_SegCtl0 >> 16, 0x1FFF);
  } else {
  /*
   * kseg3
   * XXX: debug segment is not emulated
   */
-ret = get_segctl_physical_address(env, physical, prot, real_address, rw,
-  access_type, mmu_idx,
+ret = get_segctl_physical_address(env, physical, prot, real_address,
+  rw, mmu_idx,
env->CP0_SegCtl0, 0x1FFF);
  }
  return ret;





Re: [PATCH 02/13] target/mips: Remove access_type argument from get_seg_physical_address

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

get_seg_physical_address() doesn't use the 'access_type' argument,
remove it to simplify.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/tlb_helper.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index 1af2dc969d6..9906292440c 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -222,7 +222,7 @@ static int is_seg_am_mapped(unsigned int am, bool eu, int 
mmu_idx)
  
  static int get_seg_physical_address(CPUMIPSState *env, hwaddr *physical,

  int *prot, target_ulong real_address,
-int rw, int access_type, int mmu_idx,
+int rw, int mmu_idx,
  unsigned int am, bool eu,
  target_ulong segmask,
  hwaddr physical_base)
@@ -253,7 +253,7 @@ static int get_segctl_physical_address(CPUMIPSState *env, 
hwaddr *physical,
  hwaddr pa = ((hwaddr)segctl & CP0SC_PA_MASK) << 20;
  
  return get_seg_physical_address(env, physical, prot, real_address, rw,

-access_type, mmu_idx, am, eu, segmask,
+mmu_idx, am, eu, segmask,
  pa & ~(hwaddr)segmask);
  }
  
@@ -349,7 +349,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr *physical,

  /* Does CP0_Status.KX/SX/UX permit the access mode (am) */
  if (env->CP0_Status & am_ksux[am]) {
  ret = get_seg_physical_address(env, physical, prot,
-   real_address, rw, access_type,
+   real_address, rw,
 mmu_idx, am, false, env->PAMask,
 0);
  } else {





Re: [PATCH 01/13] target/mips: Remove access_type argument from map_address() handler

2021-02-01 Thread Jiaxun Yang

On 2021/1/28 10:41 PM, Philippe Mathieu-Daudé wrote:

TLB map_address() handlers don't use the 'access_type' argument,
remove it to simplify.

Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Jiaxun Yang 


---
  target/mips/internal.h   |  8 ++++----
  target/mips/tlb_helper.c | 15 +++++++--------
  2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/target/mips/internal.h b/target/mips/internal.h
index 5dd17ff7333..d09afded5ea 100644
--- a/target/mips/internal.h
+++ b/target/mips/internal.h
@@ -111,7 +111,7 @@ struct CPUMIPSTLBContext {
  uint32_t nb_tlb;
  uint32_t tlb_in_use;
  int (*map_address)(struct CPUMIPSState *env, hwaddr *physical, int *prot,
-   target_ulong address, int rw, int access_type);
+   target_ulong address, int rw);
  void (*helper_tlbwi)(struct CPUMIPSState *env);
  void (*helper_tlbwr)(struct CPUMIPSState *env);
  void (*helper_tlbp)(struct CPUMIPSState *env);
@@ -126,11 +126,11 @@ struct CPUMIPSTLBContext {
  };
  
  int no_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,

-   target_ulong address, int rw, int access_type);
+   target_ulong address, int rw);
  int fixed_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-  target_ulong address, int rw, int access_type);
+  target_ulong address, int rw);
  int r4k_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-target_ulong address, int rw, int access_type);
+target_ulong address, int rw);
  void r4k_helper_tlbwi(CPUMIPSState *env);
  void r4k_helper_tlbwr(CPUMIPSState *env);
  void r4k_helper_tlbp(CPUMIPSState *env);
diff --git a/target/mips/tlb_helper.c b/target/mips/tlb_helper.c
index 082c17928d3..1af2dc969d6 100644
--- a/target/mips/tlb_helper.c
+++ b/target/mips/tlb_helper.c
@@ -39,7 +39,7 @@ enum {
  
  /* no MMU emulation */

  int no_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-   target_ulong address, int rw, int access_type)
+   target_ulong address, int rw)
  {
  *physical = address;
  *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
@@ -48,7 +48,7 @@ int no_mmu_map_address(CPUMIPSState *env, hwaddr *physical, 
int *prot,
  
  /* fixed mapping MMU emulation */

  int fixed_mmu_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-  target_ulong address, int rw, int access_type)
+  target_ulong address, int rw)
  {
  if (address <= (int32_t)0x7FFFUL) {
  if (!(env->CP0_Status & (1 << CP0St_ERL))) {
@@ -68,7 +68,7 @@ int fixed_mmu_map_address(CPUMIPSState *env, hwaddr 
*physical, int *prot,
  
  /* MIPS32/MIPS64 R4000-style MMU emulation */

  int r4k_map_address(CPUMIPSState *env, hwaddr *physical, int *prot,
-target_ulong address, int rw, int access_type)
+target_ulong address, int rw)
  {
  uint16_t ASID = env->CP0_EntryHi & env->CP0_EntryHi_ASID_mask;
  uint32_t MMID = env->CP0_MemoryMapID;
@@ -234,8 +234,7 @@ static int get_seg_physical_address(CPUMIPSState *env, 
hwaddr *physical,
  return mapped;
  } else if (mapped) {
  /* The segment is TLB mapped */
-return env->tlb->map_address(env, physical, prot, real_address, rw,
- access_type);
+return env->tlb->map_address(env, physical, prot, real_address, rw);
  } else {
  /* The segment is unmapped */
  *physical = physical_base | (real_address & segmask);
@@ -314,7 +313,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  /* xuseg */
  if (UX && address <= (0x3FFFULL & env->SEGMask)) {
  ret = env->tlb->map_address(env, physical, prot,
-real_address, rw, access_type);
+real_address, rw);
  } else {
  ret = TLBRET_BADADDR;
  }
@@ -323,7 +322,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  if ((supervisor_mode || kernel_mode) &&
  SX && address <= (0x7FFFULL & env->SEGMask)) {
  ret = env->tlb->map_address(env, physical, prot,
-real_address, rw, access_type);
+real_address, rw);
  } else {
  ret = TLBRET_BADADDR;
  }
@@ -364,7 +363,7 @@ static int get_physical_address(CPUMIPSState *env, hwaddr 
*physical,
  if (kernel_mode && KX &&
  address <= (0x7FFFULL & env->SEGMask)) {
  ret = env->tlb->map_address(env, physical, prot,
-real_address, rw, access_type);
+real_address, rw);
  } 

Re: [PATCH v4 00/16] 64bit block-layer: part I

2021-02-01 Thread Eric Blake
On 12/11/20 12:39 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> We want 64bit write-zeroes, and for this, convert all io functions to
> 64bit.
> 
> We chose a signed type, to be consistent with off_t (which is signed) and
> to allow a signed return type (where a negative value means an
> error).
> 
> Please refer to initial cover-letter 
>  https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08723.html
> for more info.
> 
> v4: I found that some more work is needed for block/block-backend, so I
> decided to split out part I, converting block/io
> 
> v4 is based on Kevin's block branch ([PULL 00/34] Block layer patches)
>for BDRV_MAX_LENGTH
> 
> changes:
> 01-05: new
> 06: add Alberto's r-b
> 07: new
> 08-16: rebase, add new-style request check, improve commit-msg, drop r-bs
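
For context, the "signed 64-bit, negative means error" convention from the cover letter can be sketched as below. This is only an illustration under assumptions: sketch_write_zeroes() is a hypothetical function, not a block-layer API, and the choice of -EINVAL and -EIO is mine.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper following the convention: the return value is the
 * number of bytes handled (>= 0) or a negative errno value. image_size is
 * assumed to be non-negative.
 */
static int64_t sketch_write_zeroes(int64_t offset, int64_t bytes,
                                   int64_t image_size)
{
    if (offset < 0 || bytes < 0) {
        return -EINVAL;
    }
    /* Written so that neither comparison can overflow int64_t. */
    if (offset > image_size || bytes > image_size - offset) {
        return -EIO;
    }
    return bytes;
}
```

A request larger than 4 GiB fits in the return value, and callers still test for failure with a simple `ret < 0`.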

I had planned to send a pull request for this series today, but ran into
a snag.  Without this series applied, './check -qcow2' fails 030, 185,
and 297.  With it applied, I now also get a failure in 206.  I'm trying
to bisect which patch caused the problem, but here's the failure:

206   fail   [20:54:54] [20:55:01]   6.9s   (last: 6.7s)  output
mismatch (see 206.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/206.out
+++ 206.out.bad
@@ -180,7 +180,7 @@

 {"execute": "blockdev-create", "arguments": {"job-id": "job0",
"options": {"driver": "qcow2", "file": "node0", "size":
9223372036854775296}}}
 {"return": {}}
-Job failed: Could not resize image: Required too big image size, it
must be not greater than 9223372035781033984
+Job failed: Could not resize image: offset(9223372036854775296) exceeds
maximum(9223372035781033984)
 {"execute": "job-dismiss", "arguments": {"id": "job0"}}
 {"return": {}}

Looks like it is just a changed error message, so I can touch up the
correct patch and then repackage the pull request tomorrow (it's too
late for me today).  Oh, and the 0 exit status of ./check when a test
fails is something I see you already plan on fixing...
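
As an aside, the maximum in the new message is consistent with INT64_MAX rounded down to a 1 GiB boundary. A small sketch of that arithmetic follows; the SKETCH_* names and sketch_offset_ok() are made up for illustration, and the assumption that the limit is derived this way comes from the number itself rather than from the patch text.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed derivation of the limit: INT64_MAX rounded down to 1 GiB. */
#define SKETCH_MAX_ALIGNMENT ((int64_t)1 << 30)
#define SKETCH_MAX_LENGTH    (INT64_MAX - (INT64_MAX % SKETCH_MAX_ALIGNMENT))

/* Hypothetical check in the spirit of the "offset exceeds maximum" error. */
static int sketch_offset_ok(int64_t offset)
{
    return offset >= 0 && offset <= SKETCH_MAX_LENGTH;
}
```

The size requested by test 206, 9223372036854775296, is above that limit, which is why either wording of the error is expected there.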

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 07/13] confidential guest support: Introduce cgs "ready" flag

2021-02-01 Thread David Gibson
On Tue, Jan 19, 2021 at 09:16:08AM +0100, Cornelia Huck wrote:
> On Mon, 18 Jan 2021 19:47:30 +
> "Dr. David Alan Gilbert"  wrote:
> 
> > * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > > The platform specific details of mechanisms for implementing
> > > confidential guest support may require setup at various points during
> > > initialization.  Thus, it's not really feasible to have a single cgs
> > > initialization hook, but instead each mechanism needs its own
> > > initialization calls in arch or machine specific code.
> > > 
> > > However, to make it harder to have a bug where a mechanism isn't
> > > properly initialized under some circumstances, we want to have a
> > > common place, relatively late in boot, where we verify that cgs has
> > > been initialized if it was requested.
> > > 
> > > This patch introduces a ready flag to the ConfidentialGuestSupport
> > > base type to accomplish this, which we verify just before the machine
> > > specific initialization function.  
> > 
> > You may find you need to define 'ready' and the answer might be a bit
> > variable; for example, on SEV there's a setup bit and then you may end
> > > up doing an attestation and receiving some data before you actually let
> > the guest execute code.   Is it ready before it's received the
> > attestation response or only when it can run code?
> > Is a Power or 390 machine 'ready' before it's executed the magic
> > instruction to enter the confidential mode?
> 
> I would consider those machines where the guest makes the transition
> itself "ready" as soon as everything is set up so that the guest can
> actually initiate the transition. Otherwise, those machines would never
> be "ready" if the guest does not transition.
> 
> Maybe we can define "ready" as "the guest can start to execute in
> secure mode", where "guest" includes the bootloader and everything that
> runs in a guest context, and "start to execute" implies that some setup
> may be done only after the guest has kicked it off?

That was pretty much my intention.  I've put a big comment on the
field definition and tweaked things around a bit in the hopes of
making that clearer for the next spin.
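
Roughly, the intended semantics could be sketched like this. It is a simplified illustration, not the code from the series: CgsSketch is a hypothetical stand-in for the ConfidentialGuestSupport base type, and both function names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ConfidentialGuestSupport base type. */
typedef struct CgsSketch {
    /*
     * Set once the mechanism has done enough setup that the guest can
     * start to execute in secure mode (or initiate the transition itself).
     */
    bool ready;
} CgsSketch;

/* Called from arch or machine specific code, wherever the setup fits. */
static void sketch_mechanism_init(CgsSketch *cgs)
{
    /* ... platform specific setup would go here ... */
    cgs->ready = true;
}

/*
 * Common late check: passes if cgs was not requested (NULL), or if it was
 * requested and some mechanism has marked it ready.
 */
static bool sketch_cgs_check(const CgsSketch *cgs)
{
    return cgs == NULL || cgs->ready;
}
```

The late check only catches a mechanism that was never initialized; it does not say anything about attestation having completed.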


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH v3 00/31] CXL 2.0 Support

2021-02-01 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20210202005948.241655-1-ben.widaw...@intel.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210202005948.241655-1-ben.widaw...@intel.com
Subject: [RFC PATCH v3 00/31] CXL 2.0 Support

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20210202005948.241655-1-ben.widaw...@intel.com -> 
patchew/20210202005948.241655-1-ben.widaw...@intel.com
Switched to a new branch 'test'
e26ed22 WIP: i386/cxl: Initialize a host bridge
9329c2b qtest/cxl: Add very basic sanity tests
c140fd9 hw/cxl/device: Implement get/set LSA
8ed7755 hw/cxl/device: Plumb real LSA sizing
5f683ab hw/cxl/device: Add some trivial commands
4399501 tests/acpi: Add new CEDT files
6c13c92 acpi/cxl: Create the CEDT (9.14.1)
04a874a tests/acpi: allow CEDT table addition
50f82e6 acpi/cxl: Add _OSC implementation (9.14.2)
7eb8038 hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)
ba80470 hw/cxl/device: Add a memory device (8.2.8.5)
54b9662 hw/cxl/rp: Add a root port
e70de08 hw/pxb/cxl: Add "windows" for host bridges
606831a acpi/pxb/cxl: Reserve host bridge MMIO
29a562b hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)
32e7bdd hw/pci: Plumb _UID through host bridges
6651f84 tests/acpi: remove stale allowed tables
24837fc acpi/pci: Consolidate host bridge setup
52f548c qtest: allow DSDT acpi table changes
bdcd7d9 hw/pxb: Allow creation of a CXL PXB (host bridge)
5d67d7e hw/pci/cxl: Create a CXL bus type
3b0d310 hw/pxb: Use a type for realizing expanders
5ccf850 hw/cxl/device: Add log commands (8.2.9.4) + CEL
892e722 hw/cxl/device: Timestamp implementation (8.2.9.3)
f2444bb hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)
67fa438 hw/cxl/device: Add memory device utilities
cfa875c hw/cxl/device: Implement basic mailbox (8.2.8.4)
bdd7975 hw/cxl/device: Implement the CAP array (8.2.8.1-2)
c9e87d1 hw/cxl/device: Introduce a CXL device (8.2.8)
1cc9e2a hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)
7b0e042 hw/pci/cxl: Add a CXL component type (interface)

=== OUTPUT BEGIN ===
1/31 Checking commit 7b0e042bc22b (hw/pci/cxl: Add a CXL component type 
(interface))
2/31 Checking commit 1cc9e2a0a6d5 (hw/cxl/component: Introduce CXL components 
(8.1.x, 8.2.5))
WARNING: line over 80 characters
#187: FILE: hw/cxl/cxl-component-utils.c:101:
+reg_state[R_CXL_RAS_ERR_CAP_CTRL] = 0; /* CXL switches and devices must 
set */

WARNING: line over 80 characters
#193: FILE: hw/cxl/cxl-component-utils.c:107:
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_GLOBAL_CONTROL, 
HDM_DECODER_ENABLE, 0);

WARNING: line over 80 characters
#406: FILE: include/hw/cxl/cxl_component.h:62:
+#define CXL_RAS_REGISTERS_OFFSET 0x80 /* Give ample space for caps before this 
*/

WARNING: line over 80 characters
#417: FILE: include/hw/cxl/cxl_component.h:73:
+#define CXL_SEC_REGISTERS_OFFSET (CXL_RAS_REGISTERS_OFFSET + 
CXL_RAS_REGISTERS_SIZE)

WARNING: line over 80 characters
#421: FILE: include/hw/cxl/cxl_component.h:77:
+#define CXL_LINK_REGISTERS_OFFSET (CXL_SEC_REGISTERS_OFFSET + 
CXL_SEC_REGISTERS_SIZE)

WARNING: line over 80 characters
#465: FILE: include/hw/cxl/cxl_component.h:121:
+#define CXL_EXTSEC_REGISTERS_OFFSET (CXL_HDM_REGISTERS_OFFSET + 
CXL_HDM_REGISTERS_SIZE)

WARNING: line over 80 characters
#469: FILE: include/hw/cxl/cxl_component.h:125:
+#define CXL_IDE_REGISTERS_OFFSET (CXL_EXTSEC_REGISTERS_OFFSET + 
CXL_EXTSEC_REGISTERS_SIZE)

WARNING: line over 80 characters
#473: FILE: include/hw/cxl/cxl_component.h:129:
+#define CXL_SNOOP_REGISTERS_OFFSET (CXL_IDE_REGISTERS_OFFSET + 
CXL_IDE_REGISTERS_SIZE)

total: 0 errors, 8 warnings, 582 lines checked

Patch 2/31 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/31 Checking commit c9e87d150708 (hw/cxl/device: Introduce a CXL device 
(8.2.8))
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#36: 
new file mode 100644

WARNING: line over 80 characters
#156: FILE: include/hw/cxl/cxl_device.h:116:
+#define CXL_DEVICE_CAPABILITY_HEADER_REGISTER(n, offset)   
 \

total: 0 errors, 2 warnings, 162 lines checked

Patch 3/31 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/31 Checking commit bdd7975aa4bc (hw/cxl/device: Implement the CAP array 
(8.2.8.1-2))
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#23: 
new file mode 100644

ERROR: Macros with complex values should be enclosed in parenthesis

Re: [PATCH v2 2/2] accel: kvm: Add aligment assert for kvm_log_clear_one_slot

2021-02-01 Thread Keqian Zhu
Hi Philippe,

On 2021/2/1 23:14, Philippe Mathieu-Daudé wrote:
> Hi,
> 
> On 12/17/20 2:49 AM, Keqian Zhu wrote:
>> The parameters start and size are transferred from the QEMU memory
>> emulation layer, which can promise that they are TARGET_PAGE_SIZE
>> aligned. However, KVM needs them to be qemu_real_host_page_size aligned.
>>
>> Though no caller currently breaks this alignment requirement, we'd
>> better add an explicit assert to guard against future breakage.
>>
>> Signed-off-by: Keqian Zhu 
>> ---
>>  accel/kvm/kvm-all.c | 7 +++
>>  1 file changed, 7 insertions(+)
>>
>> ---
>> v2
>>  - Address Andrew's comment (use assert instead of returning an error).
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index f6b16a8df8..73b195cc41 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -692,6 +692,10 @@ out:
>>  #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << 
>> KVM_CLEAR_LOG_SHIFT)
>>  #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>>  
>> +/*
>> + * As the granule of kvm dirty log is qemu_real_host_page_size,
>> + * @start and @size are expected and restricted to align to it.
>> + */
>>  static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>uint64_t size)
>>  {
>> @@ -701,6 +705,9 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int 
>> as_id, uint64_t start,
>>  unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
>>  int ret;
>>  
>> +/* Make sure start and size are qemu_real_host_page_size aligned */
>> +assert(QEMU_IS_ALIGNED(start | size, psize));
> 
> Why not return an error instead of aborting the VM?
Yep, I returned an error in v1. As Peter Xu pointed out: "Returning -EINVAL is
the same as abort() currently - it'll just abort() at
kvm_log_clear() instead."
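
For readers following the alignment argument, here is a minimal sketch of why a single test on `start | size` covers both values. It assumes a power-of-two page size; `IS_ALIGNED` and `range_is_page_aligned` are hypothetical stand-ins written for this illustration, not the QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an alignment macro; valid for any non-zero m. */
#define IS_ALIGNED(n, m) (((n) % (m)) == 0)

/*
 * For a power-of-two page size, a value is aligned exactly when its low
 * log2(psize) bits are zero. OR-ing start and size keeps any low bit that
 * is set in either operand, so one test rejects a misaligned start, a
 * misaligned size, or both.
 */
static bool range_is_page_aligned(uint64_t start, uint64_t size, uint64_t psize)
{
    /* The OR trick is only sound for power-of-two page sizes. */
    assert(psize != 0 && (psize & (psize - 1)) == 0);
    return IS_ALIGNED(start | size, psize);
}
```

With a 4 KiB page, a start of 0x10800 or a size of 0x4800 each leaves bit 11 set in the OR, so the check fails in either case.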

> 
>>  /*
>>   * We need to extend either the start or the size or both to
>>   * satisfy the KVM interface requirement.  Firstly, do the start
>>
> 
> .
> 
Thanks for the review.

Keqian.



[RFC PATCH v3 31/31] WIP: i386/cxl: Initialize a host bridge

2021-02-01 Thread Ben Widawsky
This patch allows initializing the primary host bridge as a CXL-capable
host bridge.

Signed-off-by: Ben Widawsky 

--
This patch is WIP.
---
 hw/arm/virt.c|  1 +
 hw/core/machine.c| 26 ++
 hw/i386/acpi-build.c |  8 +++-
 hw/i386/microvm.c|  1 +
 hw/i386/pc.c |  1 +
 hw/ppc/spapr.c   |  2 ++
 include/hw/boards.h  |  2 ++
 include/hw/cxl/cxl.h |  4 
 8 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 399da73454..fd5f5b656c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2547,6 +2547,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 hc->unplug_request = virt_machine_device_unplug_request_cb;
 hc->unplug = virt_machine_device_unplug_cb;
 mc->nvdimm_supported = true;
+mc->cxl_supported = false;
 mc->auto_enable_numa_with_memhp = true;
 mc->auto_enable_numa_with_memdev = true;
 mc->default_ram_id = "mach-virt.ram";
diff --git a/hw/core/machine.c b/hw/core/machine.c
index de3b8f1b31..c739803854 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -30,6 +30,7 @@
 #include "sysemu/qtest.h"
 #include "hw/pci/pci.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
 
@@ -502,6 +503,20 @@ static void machine_set_nvdimm_persistence(Object *obj, 
const char *value,
 nvdimms_state->persistence_string = g_strdup(value);
 }
 
+static bool machine_get_cxl(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->cxl_devices_state->is_enabled;
+}
+
+static void machine_set_cxl(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->cxl_devices_state->is_enabled = value;
+}
+
 void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char *type)
 {
 QAPI_LIST_PREPEND(mc->allowed_dynamic_sysbus_devices, g_strdup(type));
@@ -903,6 +918,16 @@ static void machine_initfn(Object *obj)
 "Valid values are cpu, mem-ctrl");
 }
 
+if (mc->cxl_supported) {
+Object *obj = OBJECT(ms);
+
+ms->cxl_devices_state = g_new0(CXLState, 1);
+object_property_add_bool(obj, "cxl", machine_get_cxl, machine_set_cxl);
+object_property_set_description(obj, "cxl",
+"Set on/off to enable/disable "
+"CXL instantiation");
+}
+
 if (mc->cpu_index_to_instance_props && mc->get_default_cpu_node_id) {
 ms->numa_state = g_new0(NumaState, 1);
 object_property_add_bool(obj, "hmat",
@@ -939,6 +964,7 @@ static void machine_finalize(Object *obj)
 g_free(ms->device_memory);
 g_free(ms->nvdimms_state);
 g_free(ms->numa_state);
+g_free(ms->cxl_devices_state);
 }
 
 bool machine_usb(MachineState *machine)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 7706856c49..2250e6d27b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -53,6 +53,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/reset.h"
 #include "hw/hyperv/vmbus-bridge.h"
+#include "hw/cxl/cxl.h"
 
 /* Supported chipsets: */
 #include "hw/southbridge/piix.h"
@@ -1277,8 +1278,13 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 build_piix4_pci0_int(dsdt);
 } else {
 sb_scope = aml_scope("_SB");
+/*
+ * XXX: CXL spec calls this "CXL0", but that would require lots of
+ * changes throughout and so even for CXL enabled, we call it "PCI0"
+ */
 dev = aml_device("PCI0");
-init_pci_acpi(dev, 0, PCIE);
+init_pci_acpi(dev, 0,
+machine->cxl_devices_state->is_enabled ? CXL : PCIE);
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
 aml_append(sb_scope, dev);
 
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index edf2b0f061..970b299a69 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -688,6 +688,7 @@ static void microvm_class_init(ObjectClass *oc, void *data)
 mc->auto_enable_numa_with_memdev = false;
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = false;
+mc->cxl_supported = false;
 mc->default_ram_id = "microvm.ram";
 
 /* Avoid relying too much on kernel components */
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5d41809b37..7350eeea9c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1725,6 +1725,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 hc->unplug = pc_machine_device_unplug_cb;
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = true;
+mc->cxl_supported = true;
 mc->default_ram_id = "pc.ram";
 
 object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6c47466fc2..9773dbd83c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4440,6 +4440,7 @@ static void 

[RFC PATCH v3 27/31] hw/cxl/device: Add some trivial commands

2021-02-01 Thread Ben Widawsky
For this emulation, GET_FW_INFO and GET_PARTITION_INFO return information
equivalent to what the IDENTIFY command already reports. Add them anyway
to make the implementation more robust.

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c | 65 ++
 1 file changed, 65 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index f92dfad882..dc8e0eb08e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -43,6 +43,8 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+FIRMWARE_UPDATE = 0x02,
+#define GET_INFO  0x0
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
@@ -51,6 +53,8 @@ enum {
 #define GET_LOG   0x1
 IDENTIFY= 0x40,
 #define MEMORY_DEVICE 0x0
+CCLS= 0x41,
+#define GET_PARTITION_INFO 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -125,11 +129,13 @@ define_mailbox_handler_zeroed(EVENTS_GET_RECORDS, 0x20);
 define_mailbox_handler_nop(EVENTS_CLEAR_RECORDS);
 define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
 define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
+declare_mailbox_handler(FIRMWARE_UPDATE_GET_INFO);
 declare_mailbox_handler(TIMESTAMP_GET);
 declare_mailbox_handler(TIMESTAMP_SET);
 declare_mailbox_handler(LOGS_GET_SUPPORTED);
 declare_mailbox_handler(LOGS_GET_LOG);
 declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
+declare_mailbox_handler(CCLS_GET_PARTITION_INFO);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -143,15 +149,50 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(EVENTS, CLEAR_RECORDS, ~0, IMMEDIATE_LOG_CHANGE),
 CXL_CMD(EVENTS, GET_INTERRUPT_POLICY, 0, 0),
 CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
+CXL_CMD(FIRMWARE_UPDATE, GET_INFO, 0, 0),
 CXL_CMD(TIMESTAMP, GET, 0, 0),
 CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
 CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
 CXL_CMD(LOGS, GET_LOG, 0x18, 0),
 CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
+CXL_CMD(CCLS, GET_PARTITION_INFO, 0, 0),
 };
 
 #undef CXL_CMD
 
+/*
+ * 8.2.9.2.1
+ */
+define_mailbox_handler(FIRMWARE_UPDATE_GET_INFO)
+{
+struct {
+uint8_t slots_supported;
+uint8_t slot_info;
+uint8_t caps;
+uint8_t rsvd[0xd];
+char fw_rev1[0x10];
+char fw_rev2[0x10];
+char fw_rev3[0x10];
+char fw_rev4[0x10];
+} __attribute__((packed)) *fw_info;
+_Static_assert(sizeof(*fw_info) == 0x50, "Bad firmware info size");
+
+if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+fw_info = (void *)cmd->payload;
+memset(fw_info, 0, sizeof(*fw_info));
+
+fw_info->slots_supported = 2;
+fw_info->slot_info = BIT(0) | BIT(3);
+fw_info->caps = 0;
+snprintf(fw_info->fw_rev1, 0x10, "BWFW VERSION %02d", 0);
+
+*len = sizeof(*fw_info);
+return CXL_MBOX_SUCCESS;
+}
+
 /*
  * 8.2.9.3.1
  */
@@ -296,6 +337,30 @@ define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
 return CXL_MBOX_SUCCESS;
 }
 
+define_mailbox_handler(CCLS_GET_PARTITION_INFO)
+{
+struct {
+uint64_t active_vmem;
+uint64_t active_pmem;
+uint64_t next_vmem;
+uint64_t next_pmem;
+} __attribute__((packed)) *part_info = (void *)cmd->payload;
+_Static_assert(sizeof(*part_info) == 0x20, "Bad get partition info size");
+
+if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+/* PMEM only */
+part_info->active_vmem = 0;
+part_info->next_vmem = 0;
+part_info->active_pmem = memory_region_size(cxl_dstate->pmem);
+part_info->next_pmem = part_info->active_pmem;
+
+*len = sizeof(*part_info);
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
-- 
2.30.0
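
The handlers in this patch rely on packed structs whose size is pinned with `_Static_assert`. The following is a minimal standalone sketch of that pattern, assuming a GCC/Clang-style `__attribute__((packed))`. The names `fw_info_payload` and `fill_fw_info` are hypothetical, and the buffer-length check is an addition for the sketch rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Field list mirrors the patch; the type name is hypothetical. */
struct fw_info_payload {
    uint8_t slots_supported;
    uint8_t slot_info;
    uint8_t caps;
    uint8_t rsvd[0xd];
    char fw_rev1[0x10];
    char fw_rev2[0x10];
    char fw_rev3[0x10];
    char fw_rev4[0x10];
} __attribute__((packed));

/* 3 + 13 header bytes, then four 16-byte revision strings: 0x50 total. */
_Static_assert(sizeof(struct fw_info_payload) == 0x50,
               "Bad firmware info size");
_Static_assert(offsetof(struct fw_info_payload, fw_rev1) == 0x10,
               "Revision strings must start at offset 0x10");

/* Fill a raw payload buffer; returns bytes written, or 0 if it is too small. */
static size_t fill_fw_info(uint8_t *payload, size_t payload_len)
{
    struct fw_info_payload info;

    if (payload_len < sizeof(info)) {
        return 0;
    }
    memset(&info, 0, sizeof(info));
    info.slots_supported = 2;
    info.slot_info = (1u << 0) | (1u << 3);
    snprintf(info.fw_rev1, sizeof(info.fw_rev1), "BWFW VERSION %02d", 0);
    memcpy(payload, &info, sizeof(info));
    return sizeof(info);
}
```

Checking the layout at compile time means a field added or resized later breaks the build instead of silently shifting the wire format.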




[RFC PATCH v3 23/31] acpi/cxl: Add _OSC implementation (9.14.2)

2021-02-01 Thread Ben Widawsky
The CXL 2.0 specification adds two new dwords to the existing _OSC
definition from PCIe. The new dwords are accessed with a new UUID. This
implementation supports what is in the specification.

We are currently working on a new definition for _OSC. See later work
for an explanation.
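
As a reading aid, the status handling that the AML in this patch builds can be modelled in plain C. This is only a sketch under assumptions: it follows the PCIe-style flow in which the requested control dword is masked with 0x1F and the result is compared with the request, and the names `osc_negotiate` and `osc_result` are hypothetical. The CXL-specific dwords (CDW4, CDW5) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status bits OR-ed into DWORD1 (CDW1), as used in the patch. */
#define OSC_UNRECOGNIZED_UUID 0x04
#define OSC_UNKNOWN_REVISION  0x08
#define OSC_CAPS_MASKED       0x10

/* Control bits the method is willing to grant (the 0x1F mask). */
#define OSC_CTRL_MASK 0x1F

struct osc_result {
    uint32_t cdw1; /* status dword returned to the OS */
    uint32_t cdw3; /* control dword returned to the OS */
};

/* Assumed flow: mask the request, then flag what could not be granted. */
static struct osc_result osc_negotiate(bool uuid_known, uint32_t revision,
                                       uint32_t cdw1, uint32_t cdw3)
{
    struct osc_result r = { cdw1, cdw3 };
    uint32_t ctrl;

    if (!uuid_known) {
        r.cdw1 |= OSC_UNRECOGNIZED_UUID;
        return r;
    }

    ctrl = cdw3 & OSC_CTRL_MASK;
    if (revision != 1) {
        r.cdw1 |= OSC_UNKNOWN_REVISION;
    }
    if (ctrl != cdw3) {
        r.cdw1 |= OSC_CAPS_MASKED;
    }
    r.cdw3 = ctrl; /* DWORD3 carries the granted control bits back */
    return r;
}
```

For example, a request of 0x3F with a known UUID and revision 1 would come back as 0x1F with the "capabilities masked" bit set in DWORD1.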

Signed-off-by: Ben Widawsky 
---
 hw/acpi/Kconfig   |   5 ++
 hw/acpi/cxl.c | 104 ++
 hw/acpi/meson.build   |   1 +
 hw/i386/acpi-build.c  |  12 -
 include/hw/acpi/cxl.h |  23 ++
 5 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/cxl.c
 create mode 100644 include/hw/acpi/cxl.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 1932f66af8..b27907953e 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -5,6 +5,7 @@ config ACPI_X86
 bool
 select ACPI
 select ACPI_NVDIMM
+select ACPI_CXL
 select ACPI_CPU_HOTPLUG
 select ACPI_MEMORY_HOTPLUG
 select ACPI_HMAT
@@ -42,3 +43,7 @@ config ACPI_VMGENID
 depends on PC
 
 config ACPI_HW_REDUCED
+
+config ACPI_CXL
+bool
+depends on ACPI
diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
new file mode 100644
index 0000000000..7124d5a1a3
--- /dev/null
+++ b/hw/acpi/cxl.c
@@ -0,0 +1,104 @@
+/*
+ * CXL ACPI Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cxl/cxl.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/cxl.h"
+#include "qapi/error.h"
+#include "qemu/uuid.h"
+
+static Aml *__build_cxl_osc_method(void)
+{
+Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, 
*if_caps_masked;
+Aml *a_ctrl = aml_local(0);
+Aml *a_cdw1 = aml_name("CDW1");
+
+method = aml_method("_OSC", 4, AML_NOTSERIALIZED);
+aml_append(method, aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
+
+/* 9.14.2.1.4 */
+if_uuid = aml_if(
+aml_lor(aml_equal(aml_arg(0),
+  aml_touuid("33DB4D5B-1FF7-401C-9657-7441C03DD766")),
+aml_equal(aml_arg(0),
+  aml_touuid("68F2D50B-C469-4D8A-BD3D-941A103FD3FC"))));
+aml_append(if_uuid, aml_create_dword_field(aml_arg(3), aml_int(4), 
"CDW2"));
+aml_append(if_uuid, aml_create_dword_field(aml_arg(3), aml_int(8), 
"CDW3"));
+
+aml_append(if_uuid, aml_store(aml_name("CDW3"), a_ctrl));
+
+/* This is all the same as what's used for PCIe */
+aml_append(if_uuid,
+   aml_and(aml_name("CTRL"), aml_int(0x1F), aml_name("CTRL")));
+
+if_arg1_not_1 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(0x1))));
+/* Unknown revision */
+aml_append(if_arg1_not_1, aml_or(a_cdw1, aml_int(0x08), a_cdw1));
+aml_append(if_uuid, if_arg1_not_1);
+
+if_caps_masked = aml_if(aml_lnot(aml_equal(aml_name("CDW3"), a_ctrl)));
+/* Capability bits were masked */
+aml_append(if_caps_masked, aml_or(a_cdw1, aml_int(0x10), a_cdw1));
+aml_append(if_uuid, if_caps_masked);
+
+aml_append(if_uuid, aml_store(aml_name("CDW2"), aml_name("SUPP")));
+aml_append(if_uuid, aml_store(aml_name("CDW3"), aml_name("CTRL")));
+
+if_cxl = aml_if(aml_equal(
+aml_arg(0), aml_touuid("68F2D50B-C469-4D8A-BD3D-941A103FD3FC")));
+/* CXL support field */
+aml_append(if_cxl, aml_create_dword_field(aml_arg(3), aml_int(12), 
"CDW4"));
+/* CXL capabilities */
+aml_append(if_cxl, aml_create_dword_field(aml_arg(3), aml_int(16), 
"CDW5"));
+aml_append(if_cxl, aml_store(aml_name("CDW4"), aml_name("SUPC")));
+aml_append(if_cxl, aml_store(aml_name("CDW5"), aml_name("CTRC")));
+
+/* CXL 2.0 Port/Device Register access */
+aml_append(if_cxl,
+   aml_or(aml_name("CDW5"), aml_int(0x1), aml_name("CDW5")));
+aml_append(if_uuid, if_cxl);
+
+/* Update DWORD3 (the return value) */
+aml_append(if_uuid, aml_store(a_ctrl, aml_name("CDW3")));
+
+aml_append(if_uuid, aml_return(aml_arg(3)));
+aml_append(method, if_uuid);
+
+else_uuid = aml_else();
+
+/* unrecognized uuid */
+aml_append(else_uuid,
+   aml_or(aml_name("CDW1"), aml_int(0x4), aml_name("CDW1")));
+aml_append(else_uuid, aml_return(aml_arg(3)));
+aml_append(method, else_uuid);
+
+return method;

[RFC PATCH v3 30/31] qtest/cxl: Add very basic sanity tests

2021-02-01 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 tests/qtest/cxl-test.c  | 93 +
 tests/qtest/meson.build |  4 ++
 2 files changed, 97 insertions(+)
 create mode 100644 tests/qtest/cxl-test.c

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
new file mode 100644
index 0000000000..00eca14faa
--- /dev/null
+++ b/tests/qtest/cxl-test.c
@@ -0,0 +1,93 @@
+/*
+ * QTest testcase for CXL
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+
+#define QEMU_PXB_CMD "-machine q35 -object memory-backend-file,id=cxl-mem1," \
+ "share,mem-path=%s,size=512M "  \
+ "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,"  \
+ 
"len-window-base=1,window-base[0]=0x4c000,memdev[0]=cxl-mem1"
+#define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0"
+
+#define QEMU_T3D "-device 
cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
+
+static void cxl_basic_hb(void)
+{
+qtest_start("-machine q35,cxl");
+qtest_end();
+}
+
+static void cxl_basic_pxb(void)
+{
+qtest_start("-machine q35 -device pxb-cxl,bus=pcie.0,uid=0");
+qtest_end();
+}
+
+static void cxl_pxb_with_window(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+static void cxl_root_port(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD " %s", tmpfs, QEMU_RP);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+static void cxl_t3d(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD " %s %s", tmpfs, QEMU_RP, QEMU_T3D);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(&argc, &argv, NULL);
+
+qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb);
+qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
+qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window);
+qtest_add_func("/pci/cxl/root_port", cxl_root_port);
+qtest_add_func("/pci/cxl/type3_device", cxl_t3d);
+
+return g_test_run();
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c83bc211b6..554152b7c5 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -22,6 +22,9 @@ qtests_pci = \
   (config_all_devices.has_key('CONFIG_VGA') ? ['display-vga-test'] : []) + 
 \
   (config_all_devices.has_key('CONFIG_IVSHMEM_DEVICE') ? ['ivshmem-test'] : [])
 
+qtests_cxl = \
+  (config_all_devices.has_key('CONFIG_CXL') ? ['cxl-test'] : [])
+
 qtests_i386 = \
   (slirp.found() ? ['pxe-test', 'test-netfilter'] : []) + \
   (config_host.has_key('CONFIG_POSIX') ? ['test-filter-mirror'] : []) +
 \
@@ -48,6 +51,7 @@ qtests_i386 = \
   (config_all_devices.has_key('CONFIG_TPM_TIS_ISA') ? ['tpm-tis-swtpm-test'] : 
[]) +\
   (config_all_devices.has_key('CONFIG_RTL8139_PCI') ? ['rtl8139-test'] : []) + 
 \
   qtests_pci + 
 \
+  qtests_cxl + 
 \
   ['fdc-test',
'ide-test',
'hd-geo-test',
-- 
2.30.0




[RFC PATCH v3 21/31] hw/cxl/device: Add a memory device (8.2.8.5)

2021-02-01 Thread Ben Widawsky
A CXL memory device (AKA Type 3) is a CXL component that contains some
combination of volatile and persistent memory. It also implements the
previously defined mailbox interface as well as the memory device
firmware interface.

Although the memory device is configured like a normal PCIe device, the
memory traffic is conceptually on an entirely separate bus (using the
same physical wires as PCIe, but a different protocol).

The guest physical address for the memory device is part of a larger
window which is owned by the platform. Currently, this is hardcoded as
an object property on host bridge (PXB) creation, but that will need to
change for interleaving.

The following example will create a 256M device in a 512M window:
-object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
-device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"

Signed-off-by: Ben Widawsky 
---
 hw/core/numa.c |   3 +
 hw/cxl/cxl-mailbox-utils.c |  41 ++
 hw/i386/pc.c   |   1 +
 hw/mem/Kconfig |   5 +
 hw/mem/cxl_type3.c | 281 +
 hw/mem/meson.build |   1 +
 hw/pci/pcie.c  |  30 
 include/hw/cxl/cxl.h   |   2 +
 include/hw/cxl/cxl_pci.h   |  22 +++
 include/hw/pci/pci_ids.h   |   1 +
 monitor/hmp-cmds.c |  15 ++
 qapi/machine.json  |   1 +
 12 files changed, 403 insertions(+)
 create mode 100644 hw/mem/cxl_type3.c

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 68cee65f61..cd7df371e6 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -770,6 +770,9 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 node_mem[pcdimm_info->node].node_plugged_mem +=
 pcdimm_info->size;
 break;
+case MEMORY_DEVICE_INFO_KIND_CXL:
+/* FINISHME */
+break;
 case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM:
 vpi = value->u.virtio_pmem.data;
 /* TODO: once we support numa, assign to right node */
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3f0ae8b9e5..f92dfad882 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -49,6 +49,8 @@ enum {
 LOGS= 0x04,
 #define GET_SUPPORTED 0x0
 #define GET_LOG   0x1
+IDENTIFY= 0x40,
+#define MEMORY_DEVICE 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -127,6 +129,7 @@ declare_mailbox_handler(TIMESTAMP_GET);
 declare_mailbox_handler(TIMESTAMP_SET);
 declare_mailbox_handler(LOGS_GET_SUPPORTED);
 declare_mailbox_handler(LOGS_GET_LOG);
+declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -144,6 +147,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
 CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
 CXL_CMD(LOGS, GET_LOG, 0x18, 0),
+CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
 };
 
 #undef CXL_CMD
@@ -255,6 +259,43 @@ define_mailbox_handler(LOGS_GET_LOG)
 return CXL_MBOX_SUCCESS;
 }
 
+/* 8.2.9.5.1.1 */
+define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
+{
+struct {
+char fw_revision[0x10];
+uint64_t total_capacity;
+uint64_t volatile_capacity;
+uint64_t persistent_capacity;
+uint64_t partition_align;
+uint16_t info_event_log_size;
+uint16_t warning_event_log_size;
+uint16_t failure_event_log_size;
+uint16_t fatal_event_log_size;
+uint32_t lsa_size;
+uint8_t poison_list_max_mer[3];
+uint16_t inject_poison_limit;
+uint8_t poison_caps;
+uint8_t qos_telemetry_caps;
+} __attribute__((packed)) *id;
+_Static_assert(sizeof(*id) == 0x43, "Bad identify size");
+
+if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+id = (void *)cmd->payload;
+memset(id, 0, sizeof(*id));
+
+/* PMEM only */
+snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
+id->total_capacity = memory_region_size(cxl_dstate->pmem);
+id->persistent_capacity = memory_region_size(cxl_dstate->pmem);
+
+*len = sizeof(*id);
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5458f61d10..5d41809b37 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -79,6 +79,7 @@
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-common.h"
 #include "qapi/visitor.h"
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index a0ef2cf648..7d9d1ced3e 100644
--- a/hw/mem/Kconfig
+++ b/hw/mem/Kconfig
@@ -10,3 +10,8 @@ config NVDIMM
 default y
 depends on (PC || PSERIES || ARM_VIRT)
 select MEM_DEVICE
+
+config 

[RFC PATCH v3 19/31] hw/pxb/cxl: Add "windows" for host bridges

2021-02-01 Thread Ben Widawsky
In a bare metal CXL capable system, system firmware will program
physical address ranges on the host. This is done by programming
internal registers that aren't typically known to OS. These address
ranges might be contiguous or interleaved across host bridges.

For a QEMU guest a new construct is introduced allowing passing a memory
backend to the host bridge for this same purpose. Each memory backend
needs to be passed to the host bridge as well as any device that will be
emulating that memory (not implemented here).

I'm hopeful the interleaving work in the link can be re-purposed here
(see Link).

An example that creates a host bridge with a 512M window at 0x4c000:
 -object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M
 -device 
pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,window-base\[0\]=0x4c000,memdev\[0\]=cxl-mem1
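
The realize path in this patch counts the memory backends that were supplied and compares that count with the number of window bases. The following is a minimal standalone sketch of that consistency check. `WINDOW_MAX`, `window_cfg` and `count_windows` are hypothetical names for illustration, and the value 10 is an assumption, not the real `CXL_WINDOW_MAX`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_MAX 10 /* hypothetical stand-in for CXL_WINDOW_MAX */

struct window_cfg {
    const void *memdev[WINDOW_MAX]; /* NULL when no backend is set */
    uint64_t base[WINDOW_MAX];      /* guest physical base per window */
    uint32_t num_bases;             /* number of bases the user supplied */
};

/* Returns the number of windows, or -1 when the two counts disagree. */
static int count_windows(const struct window_cfg *cfg)
{
    uint32_t count = 0;

    for (size_t i = 0; i < WINDOW_MAX; i++) {
        if (cfg->memdev[i] != NULL) {
            count++;
        }
    }
    return count == cfg->num_bases ? (int)count : -1;
}
```

A configuration with one backend but no base (or the reverse) is rejected, which is the mismatch the `error_setg()` call in the patch reports.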

Link: https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg03680.html
Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 65 +++--
 include/hw/cxl/cxl.h|  1 +
 2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 226a8a5fff..af1450c69d 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -69,12 +69,19 @@ struct PXBDev {
 uint8_t bus_nr;
 uint16_t numa_node;
 int32_t uid;
+struct cxl_dev {
+HostMemoryBackend *memory_window[CXL_WINDOW_MAX];
+
+uint32_t num_windows;
+hwaddr *window_base[CXL_WINDOW_MAX];
+} cxl;
 };
 
 typedef struct CXLHost {
 PCIHostState parent_obj;
 
 CXLComponentState cxl_cstate;
+PXBDev *dev;
 } CXLHost;
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
@@ -213,16 +220,31 @@ static void pxb_cxl_realize(DeviceState *dev, Error 
**errp)
 SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 PCIHostState *phb = PCI_HOST_BRIDGE(dev);
 CXLHost *cxl = PXB_CXL_HOST(dev);
+struct cxl_dev *cxl_dev = &cxl->dev->cxl;
 CXLComponentState *cxl_cstate = &cxl->cxl_cstate;
 struct MemoryRegion *mr = &cxl_cstate->crb.component_registers;
+int uid = pci_bus_uid(phb->bus);
 
 cxl_component_register_block_init(OBJECT(dev), cxl_cstate,
   TYPE_PXB_CXL_HOST);
 sysbus_init_mmio(sbd, mr);
 
-/* FIXME: support multiple host bridges. */
-sysbus_mmio_map(sbd, 0, CXL_HOST_BASE +
-memory_region_size(mr) * pci_bus_uid(phb->bus));
+sysbus_mmio_map(sbd, 0, CXL_HOST_BASE + memory_region_size(mr) * uid);
+
+/*
+ * A CXL host bridge can exist without a fixed memory window, but it would
+ * only operate in legacy PCIe mode.
+ */
+if (!cxl_dev->memory_window[uid]) {
+warn_report(
+"CXL expander bridge created without window. Consider using %s",
+"memdev[0]=");
+return;
+}
+
+mr = host_memory_backend_get_memory(cxl_dev->memory_window[uid]);
+sysbus_init_mmio(sbd, mr);
+sysbus_mmio_map(sbd, 1 + uid, *cxl_dev->window_base[uid]);
 }
 
 static void pxb_cxl_host_class_init(ObjectClass *class, void *data)
@@ -328,6 +350,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum 
BusType type,
 } else if (type == CXL) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
 bus->flags |= PCI_BUS_CXL;
+PXB_CXL_HOST(ds)->dev = PXB_CXL_DEV(dev);
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
@@ -389,6 +412,8 @@ static Property pxb_dev_properties[] = {
 DEFINE_PROP_UINT8("bus_nr", PXBDev, bus_nr, 0),
 DEFINE_PROP_UINT16("numa_node", PXBDev, numa_node, NUMA_NODE_UNASSIGNED),
 DEFINE_PROP_INT32("uid", PXBDev, uid, -1),
+DEFINE_PROP_ARRAY("window-base", PXBDev, cxl.num_windows, cxl.window_base,
+  qdev_prop_uint64, hwaddr),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -460,7 +485,9 @@ static const TypeInfo pxb_pcie_dev_info = {
 
 static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
 {
-PXBDev *pxb = convert_to_pxb(dev);
+PXBDev *pxb = PXB_CXL_DEV(dev);
+struct cxl_dev *cxl = &pxb->cxl;
+int count = 0;
 
 /* A CXL PXB's parent bus is still PCIe */
 if (!pci_bus_is_express(pci_get_bus(dev))) {
@@ -476,6 +503,23 @@ static void pxb_cxl_dev_realize(PCIDevice *dev, Error 
**errp)
 /* FIXME: Check that uid doesn't collide with UIDs of other host bridges */
 
 pxb_dev_realize_common(dev, CXL, errp);
+
+for (unsigned i = 0; i < CXL_WINDOW_MAX; i++) {
+if (!cxl->memory_window[i]) {
+continue;
+}
+
+count++;
+}
+
+if (!count) {
+warn_report("memory-windows should be set when creating CXL host 
bridges");
+}
+
+if (count != cxl->num_windows) {
+error_setg(errp, "window bases count (%d) must match window count 

[RFC PATCH v3 29/31] hw/cxl/device: Implement get/set LSA

2021-02-01 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
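
The GET_LSA handler in this patch rejects requests where `offset + length` exceeds the LSA size. Both operands are `uint32_t`, so the sum can wrap if it is computed in 32 bits. The following is a minimal sketch of an overflow-safe form of the same check. `lsa_range_ok` is a hypothetical helper, and the 64-bit size parameter is an assumption about what the size callback returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Widen before adding so that a large offset plus a small length cannot
 * wrap around and slip past the bound.
 */
static bool lsa_range_ok(uint32_t offset, uint32_t length, uint64_t lsa_size)
{
    return (uint64_t)offset + length <= lsa_size;
}
```

With a 1 KiB LSA, an offset of 0xFFFFFFF0 and a length of 0x20 sum to 0x10 in 32-bit arithmetic, which a naive check would accept; the widened form rejects it.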
---
 hw/cxl/cxl-mailbox-utils.c  | 50 +
 hw/mem/cxl_type3.c  | 56 -
 include/hw/cxl/cxl_device.h |  9 ++
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2637250c7b..c133cf0341 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -55,6 +55,8 @@ enum {
 #define MEMORY_DEVICE 0x0
 CCLS= 0x41,
 #define GET_PARTITION_INFO 0x0
+#define GET_LSA   0x2
+#define SET_LSA   0x3
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -136,8 +138,11 @@ declare_mailbox_handler(LOGS_GET_SUPPORTED);
 declare_mailbox_handler(LOGS_GET_LOG);
 declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
 declare_mailbox_handler(CCLS_GET_PARTITION_INFO);
+declare_mailbox_handler(CCLS_GET_LSA);
+declare_mailbox_handler(CCLS_SET_LSA);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
@@ -156,6 +161,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(LOGS, GET_LOG, 0x18, 0),
 CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
 CXL_CMD(CCLS, GET_PARTITION_INFO, 0, 0),
+CXL_CMD(CCLS, GET_LSA, 0, 0),
+CXL_CMD(CCLS, SET_LSA, ~0, IMMEDIATE_CONFIG_CHANGE | 
IMMEDIATE_DATA_CHANGE),
 };
 
 #undef CXL_CMD
@@ -365,6 +372,49 @@ define_mailbox_handler(CCLS_GET_PARTITION_INFO)
 return CXL_MBOX_SUCCESS;
 }
 
+define_mailbox_handler(CCLS_GET_LSA)
+{
+struct {
+uint32_t offset;
+uint32_t length;
+} __attribute__((packed, __aligned__(16))) *get_lsa;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+uint32_t offset, length;
+
+get_lsa = (void *)cmd->payload;
+offset = get_lsa->offset;
+length = get_lsa->length;
+
+*len = 0;
+if (offset + length > cvc->get_lsa_size(ct3d)) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+*len = cvc->get_lsa(ct3d, get_lsa, length, offset);
+return CXL_MBOX_SUCCESS;
+}
+
+define_mailbox_handler(CCLS_SET_LSA)
+{
+struct {
+uint32_t offset;
+uint32_t rsvd;
+void *data;
+} __attribute__((packed, __aligned__(16))) *set_lsa = (void *)cmd->payload;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+uint16_t plen = *len;
+
+*len = 0;
+if ((set_lsa->offset + plen) > cvc->get_lsa_size(ct3d)) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+cvc->set_lsa(ct3d, set_lsa->data, plen, set_lsa->offset);
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 074d1dd41f..d091e645aa 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -8,6 +8,7 @@
 #include "qapi/error.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "qemu/pmem.h"
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
@@ -148,6 +149,11 @@ static void cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 return;
 }
 
+if (!ct3d->lsa) {
+error_setg(errp, "lsa property must be set");
+return;
+}
+
 /* FIXME: need to check mr is the host bridge's MR */
 mr = host_memory_backend_get_memory(ct3d->hostmem);
 
@@ -267,6 +273,8 @@ static Property ct3_props[] = {
 DEFINE_PROP_SIZE("size", CXLType3Dev, size, -1),
 DEFINE_PROP_LINK("memdev", CXLType3Dev, hostmem, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
+ HostMemoryBackend *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -297,7 +305,51 @@ static void pc_dimm_md_fill_device_info(const 
MemoryDeviceState *md,
 
 static uint64_t get_lsa_size(CXLType3Dev *ct3d)
 {
-return 0;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+return memory_region_size(mr);
+}
+
+static void validate_lsa_access(MemoryRegion *mr, uint64_t size,
+uint64_t offset)
+{
+assert(offset + size <= memory_region_size(mr));
+assert(offset + size > offset);
+}
+
+static uint64_t get_lsa(CXLType3Dev *ct3d, void *buf, uint64_t size,
+uint64_t offset)
+{
+MemoryRegion *mr;
+void *lsa;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+validate_lsa_access(mr, size, offset);
+
+lsa = memory_region_get_ram_ptr(mr) + offset;
+memcpy(buf, lsa, size);
+
+return size;
+}
+
+static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
+uint64_t offset)
+{
+MemoryRegion *mr;
+void *lsa;
+
+mr = 

[RFC PATCH v3 20/31] hw/cxl/rp: Add a root port

2021-02-01 Thread Ben Widawsky
This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO and the ability to write some of the DVSEC entries.

A root port can be added on the QEMU command line by attaching it to a
specific CXL host bridge. For example:
  -device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4

Like the host bridge patch, the ACPI tables aren't generated at this
point and so system software cannot use it.

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/Kconfig  |   5 +
 hw/pci-bridge/cxl_root_port.c  | 231 +
 hw/pci-bridge/meson.build  |   1 +
 hw/pci-bridge/pcie_root_port.c |   6 +-
 hw/pci/pci.c   |   4 +-
 5 files changed, 245 insertions(+), 2 deletions(-)
 create mode 100644 hw/pci-bridge/cxl_root_port.c

diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
index f8df4315ba..02614f49aa 100644
--- a/hw/pci-bridge/Kconfig
+++ b/hw/pci-bridge/Kconfig
@@ -27,3 +27,8 @@ config DEC_PCI
 
 config SIMBA
 bool
+
+config CXL
+bool
+default y if PCI_EXPRESS && PXB
+depends on PCI_EXPRESS && MSI_NONBROKEN && PXB
diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
new file mode 100644
index 00..6c3b215bb3
--- /dev/null
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -0,0 +1,231 @@
+/*
+ * CXL 2.0 Root Port Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/range.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+#include "hw/cxl/cxl.h"
+
+#define CXL_ROOT_PORT_DID 0x7075
+
+/* Copied from the gen root port which we derive */
+#define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
+#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
+(GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+#define CXL_ROOT_PORT_DVSEC_OFFSET \
+(GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
+
+typedef struct CXLRootPort {
+/*< private >*/
+PCIESlot parent_obj;
+
+CXLComponentState cxl_cstate;
+PCIResReserve res_reserve;
+} CXLRootPort;
+
+#define TYPE_CXL_ROOT_PORT "cxl-rp"
+DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
+
+static void latch_registers(CXLRootPort *crp)
+{
+uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_ROOT_PORT);
+}
+
+static void build_dvsecs(CXLComponentState *cxl)
+{
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(struct extensions_dvsec_port){ 0 };
+cxl_component_create_dvsec(cxl, EXTENSIONS_PORT_DVSEC_LENGTH,
+   EXTENSIONS_PORT_DVSEC,
+   EXTENSIONS_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(struct dvsec_port_gpf){
+.rsvd= 0,
+.phase1_ctrl = 1, /* 1μs timeout */
+.phase2_ctrl = 1, /* 1μs timeout */
+};
+cxl_component_create_dvsec(cxl, GPF_PORT_DVSEC_LENGTH, GPF_PORT_DVSEC,
+   GPF_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(struct dvsec_port_flexbus){
+.cap  = 0x26, /* IO, Mem, non-MLD */
+.ctrl = 0,
+.status   = 0x26, /* same */
+.rcvd_mod_ts_data = 0xef, /* WTF? */
+};
+cxl_component_create_dvsec(cxl, PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
+   PCIE_FLEXBUS_PORT_DVSEC,
+   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
+
+dvsec = (uint8_t *)&(struct dvsec_register_locator){
+.rsvd = 0,
+.reg0_base_lo = RBI_COMPONENT_REG | COMPONENT_REG_BAR_IDX,
+.reg0_base_hi = 0,
+};
+cxl_component_create_dvsec(cxl, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
+   REG_LOC_DVSEC_REVID, dvsec);
+}
+
+static void cxl_rp_realize(DeviceState *dev, Error **errp)
+{
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+CXLRootPort *crp   = CXL_ROOT_PORT(dev);
+CXLComponentState *cxl_cstate = &crp->cxl_cstate;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+MemoryRegion *component_bar = &cregs->component_registers;
+

[RFC PATCH v3 15/31] tests/acpi: remove stale allowed tables

2021-02-01 Thread Ben Widawsky
 Scope (_SB)
 {
 Device (PCI0)
 {
 Name (_HID, EisaId ("PNP0A03") /* PCI Bus */)  // _HID: Hardware ID
-Name (_ADR, Zero)  // _ADR: Address
 Name (_UID, Zero)  // _UID: Unique ID
+Name (_ADR, Zero)  // _ADR: Address

Signed-off-by: Ben Widawsky 
---
 tests/data/acpi/pc/DSDT | Bin 5065 -> 5065 bytes
 tests/data/acpi/pc/DSDT.acpihmat| Bin 6390 -> 6390 bytes
 tests/data/acpi/pc/DSDT.bridge  | Bin 6924 -> 6924 bytes
 tests/data/acpi/pc/DSDT.cphp| Bin 5529 -> 5529 bytes
 tests/data/acpi/pc/DSDT.dimmpxm | Bin 6719 -> 6719 bytes
 tests/data/acpi/pc/DSDT.hpbridge| Bin 5026 -> 5026 bytes
 tests/data/acpi/pc/DSDT.hpbrroot| Bin 3084 -> 3084 bytes
 tests/data/acpi/pc/DSDT.ipmikcs | Bin 5137 -> 5137 bytes
 tests/data/acpi/pc/DSDT.memhp   | Bin 6424 -> 6424 bytes
 tests/data/acpi/pc/DSDT.numamem | Bin 5071 -> 5071 bytes
 tests/data/acpi/pc/DSDT.roothp  | Bin 5261 -> 5261 bytes
 tests/data/acpi/q35/DSDT| Bin 7801 -> 7801 bytes
 tests/data/acpi/q35/DSDT.acpihmat   | Bin 9126 -> 9126 bytes
 tests/data/acpi/q35/DSDT.bridge | Bin 7819 -> 7819 bytes
 tests/data/acpi/q35/DSDT.cphp   | Bin 8265 -> 8265 bytes
 tests/data/acpi/q35/DSDT.dimmpxm| Bin 9455 -> 9455 bytes
 tests/data/acpi/q35/DSDT.ipmibt | Bin 7876 -> 7876 bytes
 tests/data/acpi/q35/DSDT.memhp  | Bin 9160 -> 9160 bytes
 tests/data/acpi/q35/DSDT.mmio64 | Bin 8932 -> 8932 bytes
 tests/data/acpi/q35/DSDT.numamem| Bin 7807 -> 7807 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |  21 
 21 files changed, 21 deletions(-)

diff --git a/tests/data/acpi/pc/DSDT b/tests/data/acpi/pc/DSDT
index 
f6173df1d598767a79aa34ad7585ad7d45c5d4f3..b516745128e3f1a297b6327e9057026a2d16229c
 100644
GIT binary patch
delta 20
bcmX@9eo}oxJ7=h;3j;^Iqf5}n36{bDOsEE~

delta 20
bcmX@9eo}oxJEx;d5CcbisHe-u36{bDOlAhI

diff --git a/tests/data/acpi/pc/DSDT.acpihmat b/tests/data/acpi/pc/DSDT.acpihmat
index 
67f3f7249eaaa9404ebf0f2d0a324b8c8e3bd445..aeae285c6434ae6cf3c53660e34425727a497871
 100644
GIT binary patch
delta 20
bcmexn_|0%aJ7=h;3j;^Iqf5}n3271lRUHRT

delta 20
bcmexn_|0%aJEx;d5CcbisHe-u3271lRNDtm

diff --git a/tests/data/acpi/pc/DSDT.bridge b/tests/data/acpi/pc/DSDT.bridge
index 
643390f4c4138b37fc481656d3f555d0eeedcb02..4cd26a87dd11d96e10bf6de786b9d56ebfe0a4f9
 100644
GIT binary patch
delta 20
bcmeA%>oJ?q_I!oU=n}MXLX8vvMneXi

delta 20
bcmeA%>oJ?q#J~|B>glp^LX8vvMgaz#

diff --git a/tests/data/acpi/pc/DSDT.cphp b/tests/data/acpi/pc/DSDT.cphp
index 
1ddcf7d8812f5d8d4d38fe7e7b35fd5885806046..fecb784812cbb2308ef58acf4a2c580f56d35c39
 100644
GIT binary patch
delta 20
bcmbQKJyUx^J7=h;3j;^Iqf5}n37nz;MY;wk

delta 20
bcmbQKJyUx^JEx;d5CcbisHe-u37nz;MR*1%

diff --git a/tests/data/acpi/pc/DSDT.dimmpxm b/tests/data/acpi/pc/DSDT.dimmpxm
index 
c44385cc01879324738ffb7f997b8cdd762cbf97..f2c31e150ead16e4931367a6dab42704950a21e9
 100644
GIT binary patch
delta 20
bcmdmQvfpGvJ7=h;3j;^Iqf5}n3F{>RP4WjY

delta 20
bcmdmQvfpGvJEx;d5CcbisHe-u3F{>RO|Sglp^LJcglp^!e3zkMNS6(

diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
index 
d25cd7072932886d6967f4023faac1e1fa6e836c..17e2aebde98e0a3161d93e9b2e200737b13699ac
 100644
GIT binary patch
delta 21
dcmexq^V4R+R}_#>)Z#RX+z<

diff --git a/tests/data/acpi/q35/DSDT.acpihmat 
b/tests/data/acpi/q35/DSDT.acpihmat
index 
722e06af83abcde203a2b96a8ec81fd3bab9fc98..7b3d659352a0923822f6a5db1dbd0a6ad853c446
 100644
GIT binary patch
delta 21
dcmZ4HzRZ2XR}__9y`WOK1lA

diff --git a/tests/data/acpi/q35/DSDT.bridge b/tests/data/acpi/q35/DSDT.bridge
index 
06bac139d668ddfc7914e258b471a303c9dbd192..5961b55b1067c3090b2f1f4cd3386d71efee241d
 100644
GIT binary patch
delta 21
ccmeCS?Y5mTdE(4QHja2lmmr4CQjCSN09fk={{R30

delta 19
acmeCS?Y5mTnZ?m1h+*Qy=FL)!g|Yxf4F-?^

diff --git a/tests/data/acpi/q35/DSDT.cphp b/tests/data/acpi/q35/DSDT.cphp
index 
2b933ac482e6883efccbd7d6c96089602f2c0b4d..09c92d52f92bb346ed807945b9638cad958446f8
 100644
GIT binary patch
delta 21
dcmX@R}_>dONFPN@dc

diff --git a/tests/data/acpi/q35/DSDT.dimmpxm b/tests/data/acpi/q35/DSDT.dimmpxm
index 
bd8f8305b028ef20f9b6d1a0c69ac428d027e3d1..1da97afb32dddafefe7f27934acbcb7d56a67489
 100644
GIT binary patch
delta 21
dcmaFw`QCHFR}_UR4GFR)YuH

diff --git a/tests/data/acpi/q35/DSDT.ipmibt b/tests/data/acpi/q35/DSDT.ipmibt
index 
a8f868e23c25688ab1c0371016c071f23e9d732f..c7e68432b66e7b4d03284c882c65bbf3066825dc
 100644
GIT binary patch
delta 21
dcmX?NdR}_u95`+PJ;(K

diff --git a/tests/data/acpi/q35/DSDT.memhp b/tests/data/acpi/q35/DSDT.memhp
index 
9a802e4c67022386442976d5cb997ea3fc57b58f..3af457dd550461b2d2ea85aa85d7740452913b34
 100644
GIT binary patch
delta 21
dcmX@%e!_jiR}_u2TX4P;>`i

diff --git a/tests/data/acpi/q35/DSDT.mmio64 b/tests/data/acpi/q35/DSDT.mmio64
index 

[RFC PATCH v3 28/31] hw/cxl/device: Plumb real LSA sizing

2021-02-01 Thread Ben Widawsky
This should introduce no change. Subsequent work will make use of this
new class member.

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c  |  4 
 hw/mem/cxl_type3.c  | 24 +---
 include/hw/cxl/cxl.h|  1 -
 include/hw/cxl/cxl_device.h | 24 
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index dc8e0eb08e..2637250c7b 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -321,6 +321,9 @@ define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
 } __attribute__((packed)) *id;
 _Static_assert(sizeof(*id) == 0x43, "Bad identify size");
 
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+
 if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
@@ -332,6 +335,7 @@ define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
 snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 id->total_capacity = memory_region_size(cxl_dstate->pmem);
 id->persistent_capacity = memory_region_size(cxl_dstate->pmem);
+id->lsa_size = cvc->get_lsa_size(ct3d);
 
 *len = sizeof(*id);
 return CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index fe02c3b63c..074d1dd41f 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -13,21 +13,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/cxl/cxl.h"
 
-typedef struct cxl_type3_dev {
-/* Private */
-PCIDevice parent_obj;
-
-/* Properties */
-uint64_t size;
-HostMemoryBackend *hostmem;
-
-/* State */
-CXLComponentState cxl_cstate;
-CXLDeviceState cxl_dstate;
-} CXLType3Dev;
-
-#define CT3(obj) OBJECT_CHECK(CXLType3Dev, (obj), TYPE_CXL_TYPE3_DEV)
-
 static void build_dvsecs(CXLType3Dev *ct3d)
 {
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
@@ -310,11 +295,17 @@ static void pc_dimm_md_fill_device_info(const 
MemoryDeviceState *md,
 info->type = MEMORY_DEVICE_INFO_KIND_CXL;
 }
 
+static uint64_t get_lsa_size(CXLType3Dev *ct3d)
+{
+return 0;
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(oc);
+CXLType3Class *cvc = CXL_TYPE3_DEV_CLASS(oc);
 
 pc->realize = ct3_realize;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
@@ -332,11 +323,14 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 mdc->fill_device_info = pc_dimm_md_fill_device_info;
 mdc->get_plugged_size = memory_device_get_region_size;
 mdc->set_addr = cxl_md_set_addr;
+
+cvc->get_lsa_size = get_lsa_size;
 }
 
 static const TypeInfo ct3d_info = {
 .name = TYPE_CXL_TYPE3_DEV,
 .parent = TYPE_PCI_DEVICE,
+.class_size = sizeof(struct CXLType3Class),
 .class_init = ct3_class_init,
 .instance_size = sizeof(CXLType3Dev),
 .instance_init = ct3_instance_init,
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 809ed7de60..c7ca42930f 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -23,4 +23,3 @@
 #define CXL_WINDOW_MAX 10
 
 #endif
-
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index ca5328a581..a79a0f106c 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -219,4 +219,28 @@ REG32(CXL_MEM_DEV_STS, 0)
 FIELD(CXL_MEM_DEV_STS, MBOX_READY, 4, 1)
 FIELD(CXL_MEM_DEV_STS, RESET_NEEDED, 5, 3)
 
+typedef struct cxl_type3_dev {
+/* Private */
+PCIDevice parent_obj;
+
+/* Properties */
+uint64_t size;
+HostMemoryBackend *hostmem;
+HostMemoryBackend *lsa;
+
+/* State */
+CXLComponentState cxl_cstate;
+CXLDeviceState cxl_dstate;
+} CXLType3Dev;
+
+#define CT3(obj) OBJECT_CHECK(CXLType3Dev, (obj), TYPE_CXL_TYPE3_DEV)
+
+struct CXLType3Class {
+/* Private */
+PCIDeviceClass parent_class;
+
+/* public */
+uint64_t (*get_lsa_size)(CXLType3Dev *ct3d);
+};
+
 #endif
-- 
2.30.0




[RFC PATCH v3 18/31] acpi/pxb/cxl: Reserve host bridge MMIO

2021-02-01 Thread Ben Widawsky
For all host bridges, reserve MMIO space with _CRS. The MMIO for the
host bridge lives in a magically hard coded space in the system's
physical address space. The standard mechanism to tell the OS about
regions which can't be used for host bridges is _CRS.
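
As a rough illustration of the carve-out described above, the sketch
below computes the base and inclusive limit of a per-host-bridge MMIO
block from its UID. The base address, the block size, and every name
here (EXAMPLE_HOST_BASE, EXAMPLE_BLOCK_SIZE, host_bridge_mmio) are
made-up placeholders for illustration, not the values this series uses.

```c
#include <stdint.h>

/* Placeholder values, chosen only for this example. */
#define EXAMPLE_HOST_BASE  0x100000000ULL /* hypothetical start of the region */
#define EXAMPLE_BLOCK_SIZE 0x100000ULL    /* hypothetical per-bridge size */

/* Inclusive [base, limit] range, the form a _CRS range entry needs. */
struct mmio_range {
    uint64_t base;
    uint64_t limit;
};

/* Each host bridge gets one fixed-size block, indexed by its UID. */
static struct mmio_range host_bridge_mmio(uint32_t uid)
{
    struct mmio_range r;

    r.base = EXAMPLE_HOST_BASE + (uint64_t)uid * EXAMPLE_BLOCK_SIZE;
    r.limit = r.base + EXAMPLE_BLOCK_SIZE - 1;
    return r;
}
```

The limit is inclusive, so consecutive UIDs yield adjacent,
non-overlapping ranges.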

Signed-off-by: Ben Widawsky 
---
 hw/i386/acpi-build.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 145a503e92..ecdc10b148 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -28,6 +28,7 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 #include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
 #include "hw/core/cpu.h"
 #include "target/i386/cpu.h"
 #include "hw/misc/pvpanic.h"
@@ -1194,7 +1195,7 @@ static void build_smb0(Aml *table, I2CBus *smbus, int 
devnr, int func)
 aml_append(table, scope);
 }
 
-enum { PCI, PCIE };
+enum { PCI, PCIE, CXL };
 static void init_pci_acpi(Aml *dev, int uid, int type)
 {
 if (type == PCI) {
@@ -1344,20 +1345,28 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
 int32_t uid = pci_bus_uid(bus);
+int type;
 
 /* look only for expander root buses */
 if (!pci_bus_is_root(bus)) {
 continue;
 }
 
+type = pci_bus_is_cxl(bus) ? CXL :
+ pci_bus_is_express(bus) ? PCIE : PCI;
+
 if (bus_num < root_bus_limit) {
 root_bus_limit = bus_num - 1;
 }
 
 scope = aml_scope("\\_SB");
-dev = aml_device("PC%.02X", bus_num);
+if (type == CXL) {
+dev = aml_device("CXL%.01X", pci_bus_uid(bus));
+} else {
+dev = aml_device("PC%.02X", bus_num);
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-init_pci_acpi(dev, uid, pci_bus_is_express(bus) ? PCIE : PCI);
+init_pci_acpi(dev, uid, type);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
@@ -1369,6 +1378,13 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 aml_append(dsdt, scope);
+
+/* Handle the ranges for the PXB expanders */
+if (type == CXL) {
+uint64_t base = CXL_HOST_BASE + uid * 0x1;
+crs_range_insert(crs_range_set.mem_ranges, base,
+ base + 0x1 - 1);
+}
 }
 }
 
-- 
2.30.0




[RFC PATCH v3 24/31] tests/acpi: allow CEDT table addition

2021-02-01 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 tests/data/acpi/pc/CEDT | 0
 tests/data/acpi/q35/CEDT| 0
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 3 files changed, 2 insertions(+)
 create mode 100644 tests/data/acpi/pc/CEDT
 create mode 100644 tests/data/acpi/q35/CEDT

diff --git a/tests/data/acpi/pc/CEDT b/tests/data/acpi/pc/CEDT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/CEDT b/tests/data/acpi/q35/CEDT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..9b07f1e1ff 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/CEDT",
+"tests/data/acpi/q35/CEDT",
-- 
2.30.0




[RFC PATCH v3 13/31] qtest: allow DSDT acpi table changes

2021-02-01 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..5c695cdf37 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,22 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/DSDT",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/pc/DSDT.bridge",
+"tests/data/acpi/pc/DSDT.cphp",
+"tests/data/acpi/pc/DSDT.dimmpxm",
+"tests/data/acpi/pc/DSDT.hpbridge",
+"tests/data/acpi/pc/DSDT.hpbrroot",
+"tests/data/acpi/pc/DSDT.ipmikcs",
+"tests/data/acpi/pc/DSDT.memhp",
+"tests/data/acpi/pc/DSDT.numamem",
+"tests/data/acpi/pc/DSDT.roothp",
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/q35/DSDT.acpihmat",
+"tests/data/acpi/q35/DSDT.bridge",
+"tests/data/acpi/q35/DSDT.cphp",
+"tests/data/acpi/q35/DSDT.dimmpxm",
+"tests/data/acpi/q35/DSDT.ipmibt",
+"tests/data/acpi/q35/DSDT.memhp",
+"tests/data/acpi/q35/DSDT.mmio64",
+"tests/data/acpi/q35/DSDT.numamem",
+"tests/data/acpi/q35/DSDT.tis",
-- 
2.30.0




[RFC PATCH v3 26/31] tests/acpi: Add new CEDT files

2021-02-01 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 tests/data/acpi/pc/CEDT | Bin 0 -> 36 bytes
 tests/data/acpi/q35/CEDT| Bin 0 -> 36 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   2 --
 3 files changed, 2 deletions(-)

diff --git a/tests/data/acpi/pc/CEDT b/tests/data/acpi/pc/CEDT
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..ebf9b54b0b27d9efca53359c3c2e560511f0e165
 100644
GIT binary patch
literal 36
kcmZ>EbqP^nU|?X};NEbqP^nU|?X};N

[RFC PATCH v3 11/31] hw/pci/cxl: Create a CXL bus type

2021-02-01 Thread Ben Widawsky
The easiest way to differentiate a CXL bus from a PCIe bus is to use a
flag. A CXL bus, in hardware, is backward compatible with PCIe, and
therefore the code tries pretty hard to keep them in sync as much as
possible.

The other way to implement this would be to try to cast the bus to the
correct type. Using a flag is less code, and it is useful for debugging
because one can simply look at the flags.
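
The flag approach can be sketched in isolation roughly as follows. This
is a standalone illustration: struct example_bus and the example_*
helpers are hypothetical stand-ins, not QEMU's PCIBus or its accessors,
and only the flag values mirror the enum touched by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits mirroring enum PCIBusFlags as extended by this patch. */
enum example_bus_flags {
    EXAMPLE_BUS_IS_ROOT               = 0x0001,
    EXAMPLE_BUS_EXTENDED_CONFIG_SPACE = 0x0002,
    EXAMPLE_BUS_CXL                   = 0x0004,
};

/* Hypothetical stand-in for the bus object; only flags matter here. */
struct example_bus {
    uint32_t flags;
};

/* Flags are independent bits, so a bus can be both root and CXL. */
static inline bool example_bus_is_cxl(const struct example_bus *bus)
{
    return !!(bus->flags & EXAMPLE_BUS_CXL);
}

static inline bool example_bus_is_root(const struct example_bus *bus)
{
    return !!(bus->flags & EXAMPLE_BUS_IS_ROOT);
}
```

Because the CXL property is just another bit, existing checks such as
the root-bus test keep working unchanged on a CXL bus.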

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 9 -
 include/hw/pci/pci_bus.h| 7 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 232b7ce305..88c45dc3b5 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,7 +24,7 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
-enum BusType { PCI, PCIE };
+enum BusType { PCI, PCIE, CXL };
 
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
@@ -35,6 +35,10 @@ DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_PCIE_BUS,
  TYPE_PXB_PCIE_BUS)
 
+#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
+DECLARE_INSTANCE_CHECKER(PXBBus, PXB_CXL_BUS,
+ TYPE_PXB_CXL_BUS)
+
 struct PXBBus {
 /*< private >*/
 PCIBus parent_obj;
@@ -244,6 +248,9 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum 
BusType type,
 ds = qdev_new(TYPE_PXB_HOST);
 if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
+} else if (type == CXL) {
+bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
+bus->flags |= PCI_BUS_CXL;
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 347440d42c..eb94e7e85c 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -24,6 +24,8 @@ enum PCIBusFlags {
 PCI_BUS_IS_ROOT = 0x0001,
 /* PCIe extended configuration space is accessible on this bus */
 PCI_BUS_EXTENDED_CONFIG_SPACE   = 0x0002,
+/* This is a CXL Type BUS */
+PCI_BUS_CXL = 0x0004,
 };
 
 struct PCIBus {
@@ -53,6 +55,11 @@ struct PCIBus {
 Notifier machine_done;
 };
 
+static inline bool pci_bus_is_cxl(PCIBus *bus)
+{
+return !!(bus->flags & PCI_BUS_CXL);
+}
+
 static inline bool pci_bus_is_root(PCIBus *bus)
 {
 return !!(bus->flags & PCI_BUS_IS_ROOT);
-- 
2.30.0




[RFC PATCH v3 14/31] acpi/pci: Consolidate host bridge setup

2021-02-01 Thread Ben Widawsky
This cleanup will make it easier to add support for CXL to the mix.

Signed-off-by: Ben Widawsky 
---
 hw/i386/acpi-build.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f56d699c7f..cf6eb54c22 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1194,6 +1194,20 @@ static void build_smb0(Aml *table, I2CBus *smbus, int 
devnr, int func)
 aml_append(table, scope);
 }
 
+enum { PCI, PCIE };
+static void init_pci_acpi(Aml *dev, int uid, int type)
+{
+if (type == PCI) {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+} else {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+aml_append(dev, build_q35_osc_method());
+}
+}
+
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker,
AcpiPmInfo *pm, AcpiMiscInfo *misc,
@@ -1222,9 +1236,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 if (misc->is_piix4) {
 sb_scope = aml_scope("_SB");
 dev = aml_device("PCI0");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
+init_pci_acpi(dev, 0, PCI);
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_int(0)));
 aml_append(sb_scope, dev);
 aml_append(dsdt, sb_scope);
 
@@ -1238,11 +1251,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 } else {
 sb_scope = aml_scope("_SB");
 dev = aml_device("PCI0");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+init_pci_acpi(dev, 0, PCIE);
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_int(0)));
-aml_append(dev, build_q35_osc_method());
 aml_append(sb_scope, dev);
 
 if (pm->smi_on_cpuhp) {
@@ -1345,15 +1355,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 
 scope = aml_scope("\\_SB");
 dev = aml_device("PC%.02X", bus_num);
-aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-if (pci_bus_is_express(bus)) {
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
-aml_append(dev, build_q35_osc_method());
-} else {
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
-}
+init_pci_acpi(dev, bus_num, pci_bus_is_express(bus) ? PCIE : PCI);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
-- 
2.30.0




[RFC PATCH v3 12/31] hw/pxb: Allow creation of a CXL PXB (host bridge)

2021-02-01 Thread Ben Widawsky
This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
  -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1

A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.

One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.

Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.

Upcoming patches will allow creating multiple host bridges.

v2: Remove vendor and device ID (Ben)

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 67 -
 hw/pci/pci.c|  7 +++
 include/hw/pci/pci.h|  6 +++
 3 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 88c45dc3b5..b42592e1ff 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -56,6 +56,10 @@ DECLARE_INSTANCE_CHECKER(PXBDev, PXB_DEV,
 DECLARE_INSTANCE_CHECKER(PXBDev, PXB_PCIE_DEV,
  TYPE_PXB_PCIE_DEVICE)
 
+#define TYPE_PXB_CXL_DEVICE "pxb-cxl"
+DECLARE_INSTANCE_CHECKER(PXBDev, PXB_CXL_DEV,
+ TYPE_PXB_CXL_DEVICE)
+
 struct PXBDev {
 /*< private >*/
 PCIDevice parent_obj;
@@ -67,6 +71,11 @@ struct PXBDev {
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
 {
+/* A CXL PXB's parent bus is PCIe, so the normal check won't work */
+if (object_dynamic_cast(OBJECT(dev), TYPE_PXB_CXL_DEVICE)) {
+return PXB_CXL_DEV(dev);
+}
+
 return pci_bus_is_express(pci_get_bus(dev))
 ? PXB_PCIE_DEV(dev) : PXB_DEV(dev);
 }
@@ -111,11 +120,20 @@ static const TypeInfo pxb_pcie_bus_info = {
 .class_init= pxb_bus_class_init,
 };
 
+static const TypeInfo pxb_cxl_bus_info = {
+.name  = TYPE_PXB_CXL_BUS,
+.parent= TYPE_CXL_BUS,
+.instance_size = sizeof(PXBBus),
+.class_init= pxb_bus_class_init,
+};
+
 static const char *pxb_host_root_bus_path(PCIHostState *host_bridge,
   PCIBus *rootbus)
 {
-PXBBus *bus = pci_bus_is_express(rootbus) ?
-  PXB_PCIE_BUS(rootbus) : PXB_BUS(rootbus);
+PXBBus *bus = pci_bus_is_cxl(rootbus) ?
+  PXB_CXL_BUS(rootbus) :
+  pci_bus_is_express(rootbus) ? PXB_PCIE_BUS(rootbus) :
+PXB_BUS(rootbus);
 
 snprintf(bus->bus_path, 8, ":%02x", pxb_bus_num(rootbus));
 return bus->bus_path;
@@ -380,13 +398,58 @@ static const TypeInfo pxb_pcie_dev_info = {
 },
 };
 
+static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
+{
+/* A CXL PXB's parent bus is still PCIe */
+if (!pci_bus_is_express(pci_get_bus(dev))) {
+error_setg(errp, "pxb-cxl devices cannot reside on a PCI bus");
+return;
+}
+
+pxb_dev_realize_common(dev, CXL, errp);
+}
+
+static void pxb_cxl_dev_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc   = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k->realize = pxb_cxl_dev_realize;
+k->exit= pxb_dev_exitfn;
+/*
+ * XXX: These types of bridges don't actually show up in the hierarchy so
+ * vendor, device, class, etc. ids are intentionally left out.
+ */
+
+dc->desc = "CXL Host Bridge";
+device_class_set_props(dc, pxb_dev_properties);
+set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+
+/* Host bridges aren't hotpluggable. FIXME: spec reference */
+dc->hotpluggable = false;
+}
+
+static const TypeInfo pxb_cxl_dev_info = {
+.name  = TYPE_PXB_CXL_DEVICE,
+.parent= TYPE_PCI_DEVICE,
+.instance_size = sizeof(PXBDev),
+.class_init= pxb_cxl_dev_class_init,
+.interfaces =
+(InterfaceInfo[]){
+{ INTERFACE_CONVENTIONAL_PCI_DEVICE },
+{},
+},
+};
+
 static void pxb_register_types(void)
 {
 type_register_static(&pxb_bus_info);
 type_register_static(&pxb_pcie_bus_info);
+type_register_static(&pxb_cxl_bus_info);
 type_register_static(&pxb_host_info);
 type_register_static(&pxb_dev_info);
 type_register_static(&pxb_pcie_dev_info);
+type_register_static(&pxb_cxl_dev_info);
 }
 
 type_init(pxb_register_types)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index a45ca326ed..adbe8aa260 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -222,6 +222,12 @@ static const TypeInfo pcie_bus_info = {
 .class_init = pcie_bus_class_init,
 

[RFC PATCH v3 25/31] acpi/cxl: Create the CEDT (9.14.1)

2021-02-01 Thread Ben Widawsky
The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.

The CXL 2.0 specification adds an _HID, ACPI0016, for CXL capable host
bridges, with a _CID of PNP0A08 (PCIe host bridge). CXL aware software
is able to use this to initiate the proper _OSC method, and get the _UID
which is referenced by the CEDT. Therefore the existence of an ACPI0016
device allows a CXL aware driver to perform the necessary actions. For a
CXL capable OS, this works. For a CXL unaware OS, this works.

CEDT awareness requires more. The motivation for ACPI0017 is to provide
the possibility of having a Linux CXL module that can work on a legacy
Linux kernel. Linux core PCI/ACPI, which won't be built as a module,
will see the _CID of PNP0A08 and bind a driver to it. If we later loaded
a driver for ACPI0016, Linux wouldn't be able to bind it to the hardware
because it has already bound the PNP0A08 driver. The ACPI0017 device is
an opportunity to have an object that a Linux driver can bind to and
then use to walk the CXL topology and do everything that we would
have preferred to do with ACPI0016.

There is another motivation for an ACPI0017 device which isn't
implemented here. An operating system needs an attach point for a
non-volatile region provider that understands cross-hostbridge
interleaving. Since QEMU emulation doesn't support interleaving yet,
this is more important on the OS side, for now.

As of the CXL 2.0 spec, only one substructure is defined: the CXL Host
Bridge Structure (CHBS), which is primarily useful for telling the OS
exactly where the MMIO for the host bridge is.
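
As a rough illustration of the record that cedt_build_chbs() emits, here is a
standalone sketch of the 32-byte CHBS layout. The field order and widths
follow the build_append_int_noprefix() calls in this patch; put_le() and
chbs_encode() are hypothetical helpers written for this example, not QEMU
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHBS_LEN 32 /* Record Length used by this patch */

/* Hypothetical helper: store 'size' bytes of 'value', little-endian. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t value, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Hypothetical encoder: fills one CHBS record, returns bytes written. */
static size_t chbs_encode(uint8_t out[CHBS_LEN], uint32_t uid,
                          uint64_t base, uint64_t length)
{
    size_t off = 0;

    off = put_le(out, off, 0, 1);        /* Type: 0 == CHBS */
    off = put_le(out, off, 0, 1);        /* Reserved */
    off = put_le(out, off, CHBS_LEN, 2); /* Record Length */
    off = put_le(out, off, uid, 4);      /* UID, matches the ACPI _UID */
    off = put_le(out, off, 1, 4);        /* Version */
    off = put_le(out, off, 0, 4);        /* Reserved */
    off = put_le(out, off, base, 8);     /* Base of host bridge MMIO */
    off = put_le(out, off, length, 8);   /* Length of host bridge MMIO */
    return off;
}
```

The widths add up to 1 + 1 + 2 + 4 + 4 + 4 + 8 + 8 = 32 bytes, which is
where the Record Length value comes from.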

v2: Update CHBS to spec released definition
v3: squash ACPI0017 in now that it's ratified.

Link: 
https://lore.kernel.org/linux-cxl/20210115034911.nkgpzc756d6qm...@intel.com/T/#t
Signed-off-by: Ben Widawsky 
---
 hw/acpi/cxl.c   | 69 +
 hw/i386/acpi-build.c| 25 ++-
 hw/pci-bridge/pci_expander_bridge.c | 21 +
 include/hw/acpi/cxl.h   |  4 ++
 include/hw/pci/pci_bridge.h | 25 +++
 5 files changed, 123 insertions(+), 21 deletions(-)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index 7124d5a1a3..68db0fe3a8 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -18,14 +18,83 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
 #include "hw/cxl/cxl.h"
+#include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/acpi/cxl.h"
+#include "hw/acpi/cxl.h"
 #include "qapi/error.h"
 #include "qemu/uuid.h"
 
+static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(cxl->cxl.cxl_host_bridge);
+struct MemoryRegion *mr = sbd->mmio[0].memory;
+
+/* Type */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 32, 2);
+
+/* UID */
+build_append_int_noprefix(table_data, cxl->uid, 4);
+
+/* Version */
+build_append_int_noprefix(table_data, 1, 4);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base */
+build_append_int_noprefix(table_data, mr->addr, 8);
+
+/* Length */
+build_append_int_noprefix(table_data, memory_region_size(mr), 8);
+}
+
+static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
+{
+Aml *cedt = opaque;
+
+if (object_dynamic_cast(obj, TYPE_PXB_CXL_DEVICE)) {
+PXBDev *pxb = PXB_CXL_DEV(obj);
+
+cedt_build_chbs(cedt->buf, pxb);
+}
+
+return 0;
+}
+
+void cxl_build_cedt(GArray *table_offsets, GArray *table_data,
+BIOSLinker *linker)
+{
+const int cedt_start = table_data->len;
+Aml *cedt;
+
+cedt = init_aml_allocator();
+
+/* reserve space for CEDT header */
+acpi_add_table(table_offsets, table_data);
+acpi_data_push(cedt->buf, sizeof(AcpiTableHeader));
+
+object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, cedt);
+
+/* copy AML table into ACPI tables blob and patch header there */
+g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
+build_header(linker, table_data, (void *)(table_data->data + cedt_start),
+ "CEDT", table_data->len - cedt_start, 1, NULL, NULL);
+free_aml_allocator();
+}
+
 static Aml *__build_cxl_osc_method(void)
 {
 Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2c2293b55f..7706856c49 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -75,6 +75,8 @@
 #include "hw/acpi/ipmi.h"
 #include "hw/acpi/hmat.h"
 
+#include "hw/acpi/cxl.h"
+
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M 

[RFC PATCH v3 08/31] hw/cxl/device: Timestamp implementation (8.2.9.3)

2021-02-01 Thread Ben Widawsky
Per spec, the timestamp appears to be a free-running counter starting from
a value set by the host via the Set Timestamp command (0301h). There are
references to the epoch, which seem like a red herring. Therefore, the
implementation models the timestamp as a free-running counter that starts
from the last value issued by the Set Timestamp command.
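
The counter arithmetic can be sketched on its own, outside of QEMU. This is
a minimal sketch, assuming the caller passes in the current time in
nanoseconds instead of calling clock_gettime() as the patch does; ts_state,
ts_set() and ts_get() are names made up for this example, mirroring the
timestamp struct added below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the 'timestamp' struct added to cxl_device_state. */
struct ts_state {
    bool set;          /* has the host issued Set Timestamp yet? */
    uint64_t last_set; /* local clock (ns) when the host set the value */
    uint64_t host_set; /* value the host wrote */
};

/* Set Timestamp: remember the host value and when it arrived. */
static void ts_set(struct ts_state *s, uint64_t host_value, uint64_t now_ns)
{
    s->set = true;
    s->last_set = now_ns;
    s->host_set = host_value;
}

/* Get Timestamp: host value plus elapsed local time, or 0 if never set. */
static uint64_t ts_get(const struct ts_state *s, uint64_t now_ns)
{
    if (!s->set) {
        return 0;
    }
    return s->host_set + (now_ns - s->last_set);
}
```

Passing the clock in makes the arithmetic easy to check; which clock source
is appropriate is a separate question that this sketch does not address.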

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c  | 53 +
 include/hw/cxl/cxl_device.h |  6 +
 2 files changed, 59 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7c939a1851..3d36614c0c 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -43,6 +43,9 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+TIMESTAMP   = 0x03,
+#define GET   0x0
+#define SET   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -117,8 +120,11 @@ define_mailbox_handler_zeroed(EVENTS_GET_RECORDS, 0x20);
 define_mailbox_handler_nop(EVENTS_CLEAR_RECORDS);
 define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
 define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
+declare_mailbox_handler(TIMESTAMP_GET);
+declare_mailbox_handler(TIMESTAMP_SET);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
 #define CXL_CMD(s, c, in, cel_effect) \
@@ -129,10 +135,57 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(EVENTS, CLEAR_RECORDS, ~0, IMMEDIATE_LOG_CHANGE),
 CXL_CMD(EVENTS, GET_INTERRUPT_POLICY, 0, 0),
 CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
+CXL_CMD(TIMESTAMP, GET, 0, 0),
+CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
 };
 
 #undef CXL_CMD
 
+/*
+ * 8.2.9.3.1
+ */
+define_mailbox_handler(TIMESTAMP_GET)
+{
+struct timespec ts;
+uint64_t delta;
+
+if (!cxl_dstate->timestamp.set) {
+*(uint64_t *)cmd->payload = 0;
+goto done;
+}
+
+/* First find the delta from the last time the host set the time. */
+clock_gettime(CLOCK_REALTIME, &ts);
+delta = (ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec) -
+cxl_dstate->timestamp.last_set;
+
+/* Then adjust the actual time */
+stq_le_p(cmd->payload, cxl_dstate->timestamp.host_set + delta);
+
+done:
+*len = 8;
+return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * 8.2.9.3.2
+ */
+define_mailbox_handler(TIMESTAMP_SET)
+{
+struct timespec ts;
+
+clock_gettime(CLOCK_REALTIME, &ts);
+
+cxl_dstate->timestamp.set = true;
+cxl_dstate->timestamp.last_set =
+ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
+
+cxl_dstate->timestamp.host_set = le64_to_cpu(*(uint64_t *)cmd->payload);
+
+*len = 0;
+return CXL_MBOX_SUCCESS;
+}
+
 QemuUUID cel_uuid;
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 0cc5354ba4..ca5328a581 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -107,6 +107,12 @@ typedef struct cxl_device_state {
 size_t cel_size;
 };
 
+struct {
+bool set;
+uint64_t last_set;
+uint64_t host_set;
+} timestamp;
+
 /* memory region for persistent memory, HDM */
 MemoryRegion *pmem;
 
-- 
2.30.0




[RFC PATCH v3 09/31] hw/cxl/device: Add log commands (8.2.9.4) + CEL

2021-02-01 Thread Ben Widawsky
The CXL specification provides for the ability to obtain logs from the
device. Logs are either spec defined, like the "Command Effects Log"
(CEL), or vendor specific. UUIDs are defined for all log types.

The CEL is a mechanism to provide information to the host about which
commands are supported. It is useful both to determine which spec'd
optional commands are supported and to provide a list of vendor
specified commands that might be used. The CEL is already created as
part of mailbox initialization, but here it is now exported to hosts
that use these log commands.
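
For the Get Log bounds check, one possible shape is sketched below. It is
not what the patch does: the patch compares offset + length against the
mailbox payload size, while this sketch widens the sum to 64 bits and also
compares against the log size. log_range_ok() and its parameters are
hypothetical names for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for a Get Log request:
 *  - offset + length must not run past the end of the log
 *  - length must fit in the mailbox payload area
 * The sum is computed in 64 bits so that large 32-bit inputs cannot wrap.
 */
static bool log_range_ok(uint32_t offset, uint32_t length,
                         uint64_t log_size, uint64_t payload_max)
{
    uint64_t end = (uint64_t)offset + length;

    return end <= log_size && length <= payload_max;
}
```

A handler could return CXL_MBOX_INVALID_INPUT whenever such a check fails.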

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c | 67 ++
 1 file changed, 67 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3d36614c0c..3f0ae8b9e5 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -46,6 +46,9 @@ enum {
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
+LOGS= 0x04,
+#define GET_SUPPORTED 0x0
+#define GET_LOG   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -122,6 +125,8 @@ define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
 define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
 declare_mailbox_handler(TIMESTAMP_GET);
 declare_mailbox_handler(TIMESTAMP_SET);
+declare_mailbox_handler(LOGS_GET_SUPPORTED);
+declare_mailbox_handler(LOGS_GET_LOG);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -137,6 +142,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
 CXL_CMD(TIMESTAMP, GET, 0, 0),
 CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
+CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
+CXL_CMD(LOGS, GET_LOG, 0x18, 0),
 };
 
 #undef CXL_CMD
@@ -188,6 +195,66 @@ define_mailbox_handler(TIMESTAMP_SET)
 
 QemuUUID cel_uuid;
 
+/* 8.2.9.4.1 */
+define_mailbox_handler(LOGS_GET_SUPPORTED)
+{
+struct {
+uint16_t entries;
+uint8_t rsvd[6];
+struct {
+QemuUUID uuid;
+uint32_t size;
+} log_entries[1];
+} __attribute__((packed)) *supported_logs = (void *)cmd->payload;
+_Static_assert(sizeof(*supported_logs) == 0x1c, "Bad supported log size");
+
+supported_logs->entries = 1;
+supported_logs->log_entries[0].uuid = cel_uuid;
+supported_logs->log_entries[0].size = 4 * cxl_dstate->cel_size;
+
+*len = sizeof(*supported_logs);
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.4.2 */
+define_mailbox_handler(LOGS_GET_LOG)
+{
+struct {
+QemuUUID uuid;
+uint32_t offset;
+uint32_t length;
+} __attribute__((packed, __aligned__(16))) *get_log = (void *)cmd->payload;
+
+/*
+ * 8.2.9.4.2
+ *   The device shall return Invalid Parameter if the Offset or Length
+ *   fields attempt to access beyond the size of the log as reported by Get
+ *   Supported Logs.
+ *
+ * XXX: Spec is wrong, "Invalid Parameter" isn't a thing.
+ * XXX: Spec doesn't address incorrect UUID incorrectness.
+ *
+ * The CEL buffer is large enough to fit all commands in the emulation, so
+ * the only possible failure would be if the mailbox itself isn't big
+ * enough.
+ */
+if (get_log->offset + get_log->length > cxl_dstate->payload_size) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+if (!qemu_uuid_is_equal(&get_log->uuid, &cel_uuid)) {
+return CXL_MBOX_UNSUPPORTED;
+}
+
+/* Store off everything to local variables so we can wipe out the payload */
+*len = get_log->length;
+
+memmove(cmd->payload, cxl_dstate->cel_log + get_log->offset,
+   get_log->length);
+
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
-- 
2.30.0




[RFC PATCH v3 10/31] hw/pxb: Use a type for realizing expanders

2021-02-01 Thread Ben Widawsky
This opens up the possibility for more types of expanders (other than
PCI and PCIe). We'll need this to create a CXL expander.

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index aedded1064..232b7ce305 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,6 +24,8 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
+enum BusType { PCI, PCIE };
+
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
@@ -214,7 +216,8 @@ static gint pxb_compare(gconstpointer a, gconstpointer b)
0;
 }
 
-static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
+static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
+   Error **errp)
 {
 PXBDev *pxb = convert_to_pxb(dev);
 DeviceState *ds, *bds = NULL;
@@ -239,7 +242,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
 }
 
 ds = qdev_new(TYPE_PXB_HOST);
-if (pcie) {
+if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
@@ -287,7 +290,7 @@ static void pxb_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, false, errp);
+pxb_dev_realize_common(dev, PCI, errp);
 }
 
 static void pxb_dev_exitfn(PCIDevice *pci_dev)
@@ -339,7 +342,7 @@ static void pxb_pcie_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, true, errp);
+pxb_dev_realize_common(dev, PCIE, errp);
 }
 
 static void pxb_pcie_dev_class_init(ObjectClass *klass, void *data)
-- 
2.30.0




[RFC PATCH v3 22/31] hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)

2021-02-01 Thread Ben Widawsky
A device's volatile and persistent memory are known as Host-managed Device
Memory (HDM) regions. The mechanism by which the device is programmed to
claim the addresses associated with those regions is through dedicated
logic known as the HDM decoder. In order to allow the OS to properly
program the HDMs, the HDM decoders must be modeled.

There are two ways the HDM decoders can be implemented. The legacy
mechanism is through the PCIe DVSEC programming from CXL 1.1 (8.1.3.8),
and the MMIO mechanism is found in 8.2.5.12 of the spec. For now, 8.1.3.8
is not implemented.

Much of CXL device logic is implemented in cxl-utils. The HDM decoder,
however, is implemented directly by the device implementation. The
generic cxl-utils is probably the correct place to put this, since
HDM decoders aren't unique to a type3 device. It is, however, easier at
the moment, and requires less design consideration, to simply implement
it in the device and figure out how to consolidate it later.
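
The commit decision in hdm_decoder_commit() boils down to joining the
HI/LO register halves and then doing two checks: the programmed range must
sit inside the backing window, and its size must equal the pmem region
size. A standalone sketch of that logic follows, using plain integers
instead of QEMU's Range API; join_hi_lo(), range_within() and
hdm_commit_ok() are hypothetical names, and the outer window is assumed
not to wrap around the 64-bit address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Join the HI and LO 32-bit register halves into one 64-bit value. */
static uint64_t join_hi_lo(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* True if [in_base, in_base + in_size) lies inside the outer range. */
static bool range_within(uint64_t out_base, uint64_t out_size,
                         uint64_t in_base, uint64_t in_size)
{
    uint64_t off;

    if (in_size == 0 || out_size == 0 || in_base < out_base) {
        return false;
    }
    /* Work with offsets so that no addition can overflow. */
    off = in_base - out_base;
    return off < out_size && in_size <= out_size - off;
}

/* Both checks from the patch: containment, then exact size match. */
static bool hdm_commit_ok(uint32_t base_hi, uint32_t base_lo,
                          uint32_t size_hi, uint32_t size_lo,
                          uint64_t win_base, uint64_t win_size,
                          uint64_t pmem_size)
{
    uint64_t base = join_hi_lo(base_hi, base_lo);
    uint64_t size = join_hi_lo(size_hi, size_lo);

    return range_within(win_base, win_size, base, size) &&
           size == pmem_size;
}
```

A false result corresponds to the paths in the patch that set the ERROR bit
in CXL_HDM_DECODER0_CTRL instead of COMMITTED.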

Signed-off-by: Ben Widawsky 
---
 hw/mem/cxl_type3.c | 92 ++
 1 file changed, 84 insertions(+), 8 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 4e9a016448..fe02c3b63c 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -57,6 +57,84 @@ static void build_dvsecs(CXLType3Dev *ct3d)
REG_LOC_DVSEC_REVID, dvsec);
 }
 
+static void cxl_set_addr(CXLType3Dev *ct3d, hwaddr addr, Error **errp)
+{
+MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(ct3d);
+mdc->set_addr(MEMORY_DEVICE(ct3d), addr, errp);
+}
+
+static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
+{
+MemoryRegion *pmem = ct3d->cxl_dstate.pmem;
+MemoryRegion *mr = host_memory_backend_get_memory(ct3d->hostmem);
+Range window, device;
+ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+uint64_t offset, size;
+Error *err = NULL;
+
+assert(which == 0);
+
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 0);
+
+offset = ((uint64_t)cache_mem[R_CXL_HDM_DECODER0_BASE_HI] << 32) |
+ cache_mem[R_CXL_HDM_DECODER0_BASE_LO];
+size = ((uint64_t)cache_mem[R_CXL_HDM_DECODER0_SIZE_HI] << 32) |
+   cache_mem[R_CXL_HDM_DECODER0_SIZE_LO];
+
+range_init_nofail(&window, mr->addr, memory_region_size(mr));
+range_init_nofail(&device, offset, size);
+
+if (!range_contains_range(&window, &device)) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 1);
+return;
+}
+
+/*
+ * FIXME: Support resizing.
+ * Maybe just memory_region_ram_resize(pmem, size, &err)?
+ */
+if (size != memory_region_size(pmem)) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 1);
+return;
+}
+
+cxl_set_addr(ct3d, offset, &err);
+if (err) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 1);
+return;
+}
+memory_region_set_enabled(pmem, true);
+
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+
+static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value, unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+CXLType3Dev *ct3d = container_of(cxl_cstate, CXLType3Dev, cxl_cstate);
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+int which_hdm = -1;
+
+assert(size == 4);
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+which_hdm = 0;
+break;
+default:
+break;
+}
+
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit)
+hdm_decoder_commit(ct3d, which_hdm);
+}
+
 static void ct3_instance_init(Object *obj)
 {
 /* MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj); */
@@ -65,18 +143,13 @@ static void ct3_instance_init(Object *obj)
 static void ct3_finalize(Object *obj)
 {
 CXLType3Dev *ct3d = CT3(obj);
+CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
+ComponentRegisters *regs = &cxl_cstate->crb;
 
+g_free((void *)regs->special_ops);
 g_free(ct3d->cxl_dstate.pmem);
 }
 
-#ifdef SET_PMEM_PADDR
-static void cxl_set_addr(CXLType3Dev *ct3d, hwaddr addr, Error **errp)
-{
-MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(ct3d);
-mdc->set_addr(MEMORY_DEVICE(ct3d), addr, errp);
-}
-#endif
-
 static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
 MemoryRegionSection mrs;
@@ -160,6 +233,9 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 ct3d->cxl_cstate.pdev = pci_dev;
 build_dvsecs(ct3d);
 
+regs->special_ops = g_new0(MemoryRegionOps, 1);
+regs->special_ops->write = ct3d_reg_write;
+
 cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate,
   TYPE_CXL_TYPE3_DEV);
 
-- 

[RFC PATCH v3 04/31] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2021-02-01 Thread Ben Widawsky
This implements all device MMIO up to the first capability. That
includes the CXL Device Capabilities Array Register, as well as all of
the CXL Device Capability Header Registers. The latter are filled in as
they are implemented in the following patches.

Endianness and alignment are managed by the softmmu memory core.
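
To make the register contents concrete, here is a small standalone sketch
of how the Capabilities Array Register could be packed into two 32-bit
words and read back with 4-byte implementation accesses. The bit positions
(capability ID in bits 15:0, version in bits 23:16, count in the low 16
bits of the second word) are an assumption based on how this series splits
the register into CXL_DEV_CAP_ARRAY and CXL_DEV_CAP_ARRAY2; the field
definitions themselves are not shown in this message. cap_array_init() and
cap_read32() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the first two words of caps_reg_state32. */
static void cap_array_init(uint32_t regs[2], uint16_t cap_id,
                           uint8_t version, uint16_t cap_count)
{
    /* Word 0: ID in bits 15:0, version in bits 23:16 (assumed layout). */
    regs[0] = (uint32_t)cap_id | ((uint32_t)version << 16);
    /* Word 1: capability count in bits 15:0 (assumed layout). */
    regs[1] = cap_count;
}

/* 4-byte read at a 4-byte aligned offset, like caps_reg_read(). */
static uint32_t cap_read32(const uint32_t *regs, uint64_t offset)
{
    return regs[offset / 4];
}
```

With .impl access sizes fixed at 4, the memory core is what turns guest
accesses of other widths into calls like the read above.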

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-device-utils.c   | 105 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl_device.h |  27 +-
 3 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/cxl-device-utils.c

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
new file mode 100644
index 00..bb15ad9a0f
--- /dev/null
+++ b/hw/cxl/cxl-device-utils.c
@@ -0,0 +1,105 @@
+/*
+ * CXL Utility library for devices
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/cxl/cxl.h"
+
+/*
+ * Device registers have no restrictions per the spec, and so fall back to the
+ * default memory mapped register rules in 8.2:
+ *   Software shall use CXL.io Memory Read and Write to access memory mapped
+ *   register defined in this section. Unless otherwise specified, software
+ *   shall restrict the accesses width based on the following:
+ *   • A 32 bit register shall be accessed as a 1 Byte, 2 Bytes or 4 Bytes
+ * quantity.
+ *   • A 64 bit register shall be accessed as a 1 Byte, 2 Bytes, 4 Bytes or 8
+ * Bytes
+ *   • The address shall be a multiple of the access width, e.g. when
+ * accessing a register as a 4 Byte quantity, the address shall be
+ * multiple of 4.
+ *   • The accesses shall map to contiguous bytes. If these rules are not
+ * followed, the behavior is undefined
+ */
+
+static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+return cxl_dstate->caps_reg_state32[offset / 4];
+}
+
+static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+return 0;
+}
+
+static const MemoryRegionOps dev_ops = {
+.read = dev_reg_read,
+.write = NULL, /* status register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+
+static const MemoryRegionOps caps_ops = {
+.read = caps_reg_read,
+.write = NULL, /* caps registers are read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
+void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
+{
+/* This will be a BAR, so needs to be rounded up to pow2 for PCI spec */
+memory_region_init(&cxl_dstate->device_registers, obj, "device-registers",
+   pow2ceil(CXL_MMIO_SIZE));
+
+memory_region_init_io(&cxl_dstate->caps, obj, &caps_ops, cxl_dstate,
+  "cap-array", CXL_DEVICE_REGISTERS_OFFSET - 0);
+memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
+  "device-status", CXL_DEVICE_REGISTERS_LENGTH);
+
+memory_region_add_subregion(&cxl_dstate->device_registers, 0,
+&cxl_dstate->caps);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->device);
+}
+
+static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
+void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
+{
+uint32_t *cap_hdrs = cxl_dstate->caps_reg_state32;
+const int cap_count = 1;
+
+/* CXL Device Capabilities Array Register */
+ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
+ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_VERSION, 1);
+ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY2, CAP_COUNT, cap_count);
+
+cxl_device_cap_init(cxl_dstate, DEVICE, 1);
+device_reg_init_common(cxl_dstate);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 00c3876a0f..47154d6850 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -1,3 +1,4 @@
 softmmu_ss.add(when: 'CONFIG_CXL', if_true: files(
   'cxl-component-utils.c',
+  'cxl-device-utils.c',
 ))
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index a85f250503..f3bcf19410 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -58,6 +58,8 @@
 #define CXL_DEVICE_CAP_HDR1_OFFSET 0x10 /* Figure 138 */
 #define CXL_DEVICE_CAP_REG_SIZE 0x10 /* 8.2.8.2 */
 #define CXL_DEVICE_CAPS_MAX 4 /* 8.2.8.2.1 + 8.2.8.5 */

[RFC PATCH v3 06/31] hw/cxl/device: Add memory device utilities

2021-02-01 Thread Ben Widawsky
Memory devices implement extra capabilities on top of CXL devices. This
adds support for that.

A large part of memory devices is the mailbox/command interface. All of
the mailbox handling is done in the mailbox-utils library. Longer term,
new CXL devices that are being emulated may want to handle commands
differently, and therefore would need a mechanism to opt in/out of the
specific generic handlers. As such, this is considered sufficient for
now, but may need more depth in the future.

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-device-utils.c   | 38 -
 include/hw/cxl/cxl_device.h | 18 +-
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 6602606f3d..639ace523d 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -130,6 +130,31 @@ static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
 cxl_process_mailbox(cxl_dstate);
 }
 
+static uint64_t mdev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t retval = 0;
+
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MEDIA_STATUS, 1);
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MBOX_READY, 1);
+
+return retval;
+}
+
+static const MemoryRegionOps mdev_ops = {
+.read = mdev_reg_read,
+.write = NULL, /* memory device register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 8,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps mailbox_ops = {
 .read = mailbox_reg_read,
 .write = mailbox_reg_write,
@@ -187,6 +212,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
   "device-status", CXL_DEVICE_REGISTERS_LENGTH);
 memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
   "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->memory_device, obj, &mdev_ops,
+  cxl_dstate, "memory device caps",
+  CXL_MEMORY_DEVICE_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
@@ -196,6 +224,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
 memory_region_add_subregion(&cxl_dstate->device_registers,
 CXL_MAILBOX_REGISTERS_OFFSET,
 &cxl_dstate->mailbox);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_MEMORY_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->memory_device);
 }
 
 static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
@@ -208,10 +239,12 @@ static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
 cxl_dstate->payload_size = CXL_MAILBOX_MAX_PAYLOAD_SIZE;
 }
 
+static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
 void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 {
 uint32_t *cap_hdrs = cxl_dstate->caps_reg_state32;
-const int cap_count = 2;
+const int cap_count = 3;
 
 /* CXL Device Capabilities Array Register */
 ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
@@ -224,5 +257,8 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 cxl_device_cap_init(cxl_dstate, MAILBOX, 2);
 mailbox_reg_init_common(cxl_dstate);
 
+cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000);
+memdev_reg_init_common(cxl_dstate);
+
 assert(cxl_initialize_mailbox(cxl_dstate) == 0);
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index af91bec10c..0cc5354ba4 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -72,15 +72,20 @@
 #define CXL_MAILBOX_REGISTERS_LENGTH \
 (CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
 
+#define CXL_MEMORY_DEVICE_REGISTERS_OFFSET \
+(CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_LENGTH)
+#define CXL_MEMORY_DEVICE_REGISTERS_LENGTH 0x8
+
 #define CXL_MMIO_SIZE   \
 CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_REGISTERS_LENGTH + \
-CXL_MAILBOX_REGISTERS_LENGTH
+CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH
 
 typedef struct cxl_device_state {
 MemoryRegion device_registers;
 
 /* mmio for device capabilities array - 8.2.8.2 */
 MemoryRegion device;
+MemoryRegion memory_device;
 struct {
 MemoryRegion caps;
 uint32_t caps_reg_state32[CXL_CAPS_SIZE / 4];
@@ -145,6 +150,9 @@ REG32(CXL_DEV_CAP_ARRAY2, 4) /* We're going to pretend it's 64b */
 CXL_DEVICE_CAPABILITY_HEADER_REGISTER(DEVICE, CXL_DEVICE_CAP_HDR1_OFFSET)
 CXL_DEVICE_CAPABILITY_HEADER_REGISTER(MAILBOX, 

[RFC PATCH v3 17/31] hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

2021-02-01 Thread Ben Widawsky
CXL host bridges themselves may have MMIO. Since host bridges don't have
a BAR they are treated as special for MMIO.

Signed-off-by: Ben Widawsky 

--

It's arbitrarily chosen here to pick 0xD000 as the base for the host
bridge MMIO. I'm not sure what the right way to find free space for
platform hardcoded things like this is.
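
For reference, the address computation in pxb_cxl_realize() places each
host bridge's register block at the base plus its UID times the block
size. A minimal sketch of that arithmetic, with the base and block size
passed in as parameters instead of using CXL_HOST_BASE and
memory_region_size(); host_bridge_mmio_base() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: MMIO base for the host bridge with the given UID,
 * assuming every host bridge register block has the same size.
 */
static uint64_t host_bridge_mmio_base(uint64_t host_base,
                                      uint64_t block_size, uint32_t uid)
{
    return host_base + block_size * (uint64_t)uid;
}
```

Because consecutive UIDs are exactly one block apart, the blocks cannot
overlap each other, but nothing here checks for collisions with other
platform MMIO.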
---
 hw/pci-bridge/pci_expander_bridge.c | 53 -
 include/hw/cxl/cxl.h|  2 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index 5021b60435..226a8a5fff 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -17,6 +17,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
@@ -70,6 +71,12 @@ struct PXBDev {
 int32_t uid;
 };
 
+typedef struct CXLHost {
+PCIHostState parent_obj;
+
+CXLComponentState cxl_cstate;
+} CXLHost;
+
 static PXBDev *convert_to_pxb(PCIDevice *dev)
 {
 /* A CXL PXB's parent bus is PCIe, so the normal check won't work */
@@ -85,6 +92,9 @@ static GList *pxb_dev_list;
 
 #define TYPE_PXB_HOST "pxb-host"
 
+#define TYPE_PXB_CXL_HOST "pxb-cxl-host"
+#define PXB_CXL_HOST(obj) OBJECT_CHECK(CXLHost, (obj), TYPE_PXB_CXL_HOST)
+
 static int pxb_bus_num(PCIBus *bus)
 {
 PXBDev *pxb = convert_to_pxb(bus->parent_dev);
@@ -198,6 +208,46 @@ static const TypeInfo pxb_host_info = {
 .class_init= pxb_host_class_init,
 };
 
+static void pxb_cxl_realize(DeviceState *dev, Error **errp)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+PCIHostState *phb = PCI_HOST_BRIDGE(dev);
+CXLHost *cxl = PXB_CXL_HOST(dev);
+CXLComponentState *cxl_cstate = &cxl->cxl_cstate;
+struct MemoryRegion *mr = &cxl_cstate->crb.component_registers;
+
+cxl_component_register_block_init(OBJECT(dev), cxl_cstate,
+  TYPE_PXB_CXL_HOST);
+sysbus_init_mmio(sbd, mr);
+
+/* FIXME: support multiple host bridges. */
+sysbus_mmio_map(sbd, 0, CXL_HOST_BASE +
+memory_region_size(mr) * pci_bus_uid(phb->bus));
+}
+
+static void pxb_cxl_host_class_init(ObjectClass *class, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(class);
+PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(class);
+
+hc->root_bus_path = pxb_host_root_bus_path;
+dc->fw_name = "cxl";
+dc->realize = pxb_cxl_realize;
+/* Reason: Internal part of the pxb/pxb-pcie device, not usable by itself */
+dc->user_creatable = false;
+}
+
+/*
+ * This is a device to handle the MMIO for a CXL host bridge. It does nothing
+ * else.
+ */
+static const TypeInfo cxl_host_info = {
+.name  = TYPE_PXB_CXL_HOST,
+.parent= TYPE_PCI_HOST_BRIDGE,
+.instance_size = sizeof(CXLHost),
+.class_init= pxb_cxl_host_class_init,
+};
+
 /*
  * Registers the PXB bus as a child of pci host root bus.
  */
@@ -272,7 +322,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
 dev_name = dev->qdev.id;
 }
 
-ds = qdev_new(TYPE_PXB_HOST);
+ds = qdev_new(type == CXL ? TYPE_PXB_CXL_HOST : TYPE_PXB_HOST);
 if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else if (type == CXL) {
@@ -466,6 +516,7 @@ static void pxb_register_types(void)
 type_register_static(&pxb_pcie_bus_info);
 type_register_static(&pxb_cxl_bus_info);
 type_register_static(&pxb_host_info);
+type_register_static(&cxl_host_info);
 type_register_static(&pxb_dev_info);
 type_register_static(&pxb_pcie_dev_info);
 type_register_static(&pxb_cxl_dev_info);
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 362cda40de..6bc344f205 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -17,5 +17,7 @@
 #define COMPONENT_REG_BAR_IDX 0
 #define DEVICE_REG_BAR_IDX 2
 
+#define CXL_HOST_BASE 0xD000
+
 #endif
 
-- 
2.30.0




[RFC PATCH v3 03/31] hw/cxl/device: Introduce a CXL device (8.2.8)

2021-02-01 Thread Ben Widawsky
A CXL device is a type of CXL component. Conceptually, a CXL device
would be a leaf node in a CXL topology. From an emulation perspective,
CXL devices are the most complex and so the actual implementation is
reserved for discrete commits.

This new device type is specifically catered towards the eventual
implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
specification.
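
As a rough illustration of the register layout that the new cxl_device.h header describes, the sketch below recomputes the mailbox and payload offsets from the constants in this patch. The constant values are copied from the patch; the helper functions (sketch_mailbox_offset, sketch_payload_offset, sketch_block_end) are hypothetical names used only for this example and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from include/hw/cxl/cxl_device.h as added by this patch. */
#define CXL_DEVICE_REGISTERS_OFFSET 0x80
#define CXL_DEVICE_REGISTERS_LENGTH 0x8

#define CXL_MAILBOX_REGISTERS_OFFSET \
    (CXL_DEVICE_REGISTERS_OFFSET + CXL_DEVICE_REGISTERS_LENGTH)
#define CXL_MAILBOX_REGISTERS_SIZE 0x20
#define CXL_MAILBOX_PAYLOAD_SHIFT 11
#define CXL_MAILBOX_MAX_PAYLOAD_SIZE (1 << CXL_MAILBOX_PAYLOAD_SHIFT)
#define CXL_MAILBOX_REGISTERS_LENGTH \
    (CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)

/* Hypothetical helper: where the mailbox registers start (after "device"). */
static inline uint32_t sketch_mailbox_offset(void)
{
    return CXL_MAILBOX_REGISTERS_OFFSET;
}

/* Hypothetical helper: "n" in the diagram, the start of the command payload. */
static inline uint32_t sketch_payload_offset(void)
{
    return CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_SIZE;
}

/* Hypothetical helper: first byte past the payload (n + PAYLOAD_SIZE_MAX). */
static inline uint32_t sketch_block_end(void)
{
    return CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_LENGTH;
}
```

With m == 0x80, the mailbox registers begin at 0x88, the payload at 0xa8, and the whole block ends at 0x8a8, which agrees with n = m + sizeof(mailbox registers) + sizeof(device registers) in the header comment.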

Signed-off-by: Ben Widawsky 
---
 include/hw/cxl/cxl.h|   1 +
 include/hw/cxl/cxl_device.h | 155 
 2 files changed, 156 insertions(+)
 create mode 100644 include/hw/cxl/cxl_device.h

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 55f6cc30a5..23f52c4cf9 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -12,6 +12,7 @@
 
 #include "cxl_pci.h"
 #include "cxl_component.h"
+#include "cxl_device.h"
 
 #endif
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
new file mode 100644
index 00..a85f250503
--- /dev/null
+++ b/include/hw/cxl/cxl_device.h
@@ -0,0 +1,155 @@
+/*
+ * QEMU CXL Devices
+ *
+ * Copyright (c) 2020 Intel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_DEVICE_H
+#define CXL_DEVICE_H
+
+#include "hw/register.h"
+
+/*
+ * The following is how a CXL device's MMIO space is laid out. The only
+ * requirement from the spec is that the capabilities array and the capability
+ * headers start at offset 0 and are contiguously packed. The headers themselves
+ * provide offsets to the register fields. For this emulation, registers will
+ * start at offset 0x80 (m == 0x80). No secondary mailbox is implemented which
+ * means that n = m + sizeof(mailbox registers) + sizeof(device registers).
+ *
+ * This is roughly described in 8.2.8 Figure 138 of the CXL 2.0 spec.
+ *
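+ * n + PAYLOAD_SIZE_MAX  +----------------------------------+
+ *                       |                                  |
+ *           ^           |                                  |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           |           |         Command Payload          |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           n           +----------------------------------+
+ *           ^           |                                  |
+ *           |           |   Device Capability Registers    |
+ *           |           |          x, mailbox, y           |
+ *           |           |                                  |
+ *           m           +----------------------------------+
+ *           ^           |    Device Capability Header y    |
+ *           |           +----------------------------------+
+ *           |           | Device Capability Header Mailbox |
+ *           |           +----------------------------------+
+ *           |           |    Device Capability Header x    |
+ *           |           +----------------------------------+
+ *           |           |                                  |
+ *           |           |                                  |
+ *           |           |      Device Cap Array[0..n]      |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           |           |                                  |
+ *           0           +----------------------------------+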
+ */
+
+#define CXL_DEVICE_CAP_HDR1_OFFSET 0x10 /* Figure 138 */
+#define CXL_DEVICE_CAP_REG_SIZE 0x10 /* 8.2.8.2 */
+#define CXL_DEVICE_CAPS_MAX 4 /* 8.2.8.2.1 + 8.2.8.5 */
+
+#define CXL_DEVICE_REGISTERS_OFFSET 0x80 /* Read comment above */
+#define CXL_DEVICE_REGISTERS_LENGTH 0x8 /* 8.2.8.3.1 */
+
+#define CXL_MAILBOX_REGISTERS_OFFSET \
+(CXL_DEVICE_REGISTERS_OFFSET + CXL_DEVICE_REGISTERS_LENGTH)
+#define CXL_MAILBOX_REGISTERS_SIZE 0x20
+#define CXL_MAILBOX_PAYLOAD_SHIFT 11
+#define CXL_MAILBOX_MAX_PAYLOAD_SIZE (1 << CXL_MAILBOX_PAYLOAD_SHIFT)
+#define CXL_MAILBOX_REGISTERS_LENGTH \
+(CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
+
+typedef struct cxl_device_state {
+MemoryRegion device_registers;
+
+/* mmio for device capabilities array - 8.2.8.2 */
+MemoryRegion caps;
+
+/* mmio for the device status registers 8.2.8.3 */
+MemoryRegion device;
+
+/* mmio for the mailbox registers 8.2.8.4 */
+MemoryRegion mailbox;
+
+/* memory region for persistent memory, HDM */
+MemoryRegion *pmem;
+
+/* memory region for volatile  memory, HDM */
+MemoryRegion *vmem;
+} CXLDeviceState;
+
+/* Initialize the register block for a 

[RFC PATCH v3 05/31] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2021-02-01 Thread Ben Widawsky
This is the beginning of implementing mailbox support for CXL 2.0
devices. The implementation recognizes when the doorbell is rung,
handles the command/payload, clears the doorbell while returning error
codes and data.

Generally the mailbox mechanism is designed to permit communication
between the host OS and the firmware running on the device. For our
purposes, we emulate both the firmware, implemented primarily in
cxl-mailbox-utils.c, and the hardware.

No commands are implemented yet.
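
The doorbell flow described above can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the code from this patch: the SketchMailbox structure, the doorbell bit position, the SKETCH_RC_UNSUPPORTED value and all sketch_* function names are invented for illustration. Only the general sequence (host writes a command and rings the doorbell, the emulated firmware runs the handler, stores a return code and clears the doorbell) follows the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DOORBELL (1u << 0)   /* assumed bit position, illustration only */

enum {
    SKETCH_RC_SUCCESS = 0x0,
    SKETCH_RC_UNSUPPORTED = 0x1,    /* illustrative value, not taken from the spec */
};

/* Hypothetical, heavily reduced mailbox register state. */
typedef struct {
    uint32_t ctrl;      /* doorbell lives here */
    uint16_t opcode;    /* command written by the host */
    uint16_t status;    /* return code reported back to the host */
} SketchMailbox;

typedef uint16_t (*SketchHandler)(SketchMailbox *mbox);

/* Host side: write the command, then ring the doorbell. */
static void sketch_ring(SketchMailbox *mbox, uint16_t opcode)
{
    mbox->opcode = opcode;
    mbox->ctrl |= SKETCH_DOORBELL;
}

/*
 * Device side: called after a register write. If the doorbell is set, run
 * the handler (or report "unsupported"), store the return code and clear
 * the doorbell. Returns true if a command was processed.
 */
static bool sketch_process(SketchMailbox *mbox, SketchHandler handler)
{
    if (!(mbox->ctrl & SKETCH_DOORBELL)) {
        return false;
    }
    mbox->status = handler ? handler(mbox) : SKETCH_RC_UNSUPPORTED;
    mbox->ctrl &= ~SKETCH_DOORBELL;
    return true;
}

/* A handler that does nothing and reports success. */
static uint16_t sketch_nop(SketchMailbox *mbox)
{
    (void)mbox;
    return SKETCH_RC_SUCCESS;
}
```

Because the doorbell is cleared only after the status is stored, a host polling the doorbell sees a consistent return code once the bit drops.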

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-device-utils.c   | 125 ++-
 hw/cxl/cxl-mailbox-utils.c  | 197 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl.h|   3 +
 include/hw/cxl/cxl_device.h |  28 -
 5 files changed, 349 insertions(+), 5 deletions(-)
 create mode 100644 hw/cxl/cxl-mailbox-utils.c

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index bb15ad9a0f..6602606f3d 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -40,6 +40,111 @@ static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
 return 0;
 }
 
+static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+switch (size) {
+case 8:
+return cxl_dstate->mbox_reg_state64[offset / 8];
+case 4:
+return cxl_dstate->mbox_reg_state32[offset / 4];
+default:
+g_assert_not_reached();
+}
+}
+
+static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CTRL:
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_CAP:
+/* RO register */
+break;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 32-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+break;
+}
+
+reg_state[offset / 4] = value;
+}
+
+static void mailbox_mem_writeq(uint64_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CMD:
+break;
+case A_CXL_DEV_BG_CMD_STS:
+/* BG not supported */
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_STS:
+/* Read only register, will get updated by the state machine */
+return;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 64-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+
+reg_state[offset / 8] = value;
+}
+
+static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
+  unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (offset >= A_CXL_DEV_CMD_PAYLOAD) {
+memcpy(cxl_dstate->mbox_reg_state + offset, &value, size);
+return;
+}
+
+/*
+ * Lock is needed to prevent concurrent writes as well as to prevent writes
+ * coming in while the firmware is processing. Without background commands
+ * or the second mailbox implemented, this serves no purpose since the
+ * memory access is synchronized at a higher level (per memory region).
+ */
+RCU_READ_LOCK_GUARD();
+
+switch (size) {
+case 4:
+mailbox_mem_writel(cxl_dstate->mbox_reg_state32, offset, value);
+break;
+case 8:
+mailbox_mem_writeq(cxl_dstate->mbox_reg_state64, offset, value);
+break;
+default:
+g_assert_not_reached();
+}
+
+if (ARRAY_FIELD_EX32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CTRL,
+ DOORBELL))
+cxl_process_mailbox(cxl_dstate);
+}
+
+static const MemoryRegionOps mailbox_ops = {
+.read = mailbox_reg_read,
+.write = mailbox_reg_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps dev_ops = {
 .read = dev_reg_read,
 .write = NULL, /* status register is read only */
@@ -80,20 +185,33 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
   "cap-array", CXL_DEVICE_REGISTERS_OFFSET - 0);
 memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
                       "device-status", CXL_DEVICE_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
+                      "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
                             &cxl_dstate->caps);
 memory_region_add_subregion(&cxl_dstate->device_registers,
                             CXL_DEVICE_REGISTERS_OFFSET,
                             &cxl_dstate->device);
+

[RFC PATCH v3 07/31] hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)

2021-02-01 Thread Ben Widawsky
Using the previously implemented stubbed helpers, it is now possible to
easily add the missing, required commands to the implementation.

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 466055b01a..7c939a1851 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -37,6 +37,14 @@
  *  a register interface that already deals with it.
  */
 
+enum {
+EVENTS  = 0x01,
+#define GET_RECORDS   0x0
+#define CLEAR_RECORDS   0x1
+#define GET_INTERRUPT_POLICY   0x2
+#define SET_INTERRUPT_POLICY   0x3
+};
+
 /* 8.2.8.4.5.1 Command Return Codes */
 typedef enum {
 CXL_MBOX_SUCCESS = 0x0,
@@ -105,10 +113,23 @@ struct cxl_cmd {
 return CXL_MBOX_SUCCESS;  \
 }
 
+define_mailbox_handler_zeroed(EVENTS_GET_RECORDS, 0x20);
+define_mailbox_handler_nop(EVENTS_CLEAR_RECORDS);
+define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
+define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
+
+#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_LOG_CHANGE (1 << 4)
+
 #define CXL_CMD(s, c, in, cel_effect) \
 [s][c] = { stringify(s##_##c), cmd_##s##_##c, in, cel_effect }
 
-static struct cxl_cmd cxl_cmd_set[256][256] = {};
+static struct cxl_cmd cxl_cmd_set[256][256] = {
+CXL_CMD(EVENTS, GET_RECORDS, 1, 0),
+CXL_CMD(EVENTS, CLEAR_RECORDS, ~0, IMMEDIATE_LOG_CHANGE),
+CXL_CMD(EVENTS, GET_INTERRUPT_POLICY, 0, 0),
+CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
+};
 
 #undef CXL_CMD
 
-- 
2.30.0




[RFC PATCH v3 16/31] hw/pci: Plumb _UID through host bridges

2021-02-01 Thread Ben Widawsky
Currently, QEMU makes _UID equivalent to the bus number (_BBN). While
there is nothing wrong with doing it this way, CXL spec has a heavy
reliance on _UID to identify host bridges and there is no link to the
bus number. Having a distinct UID solves two problems. The first is it
gets us around the limitation of 256 (current max bus number). The
second is it allows us to replicate hardware configurations where bus
number and uid aren't equivalent. The latter has benefits for our
development and debugging using QEMU.

The other way to do this would be to implement the expanded bus
numbering, but having an explicit uid makes more sense when trying to
replicate real hardware configurations.

The QEMU commandline to utilize this would be:
  -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1,uid=x

Signed-off-by: Ben Widawsky 

--

I'm guessing this patch will be somewhat controversial. For early CXL
work, this can be dropped without too much heartache.
---
 hw/i386/acpi-build.c|  3 ++-
 hw/pci-bridge/pci_expander_bridge.c | 19 +++
 hw/pci/pci.c| 11 +++
 include/hw/pci/pci.h|  1 +
 include/hw/pci/pci_bus.h|  1 +
 5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index cf6eb54c22..145a503e92 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1343,6 +1343,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 QLIST_FOREACH(bus, &bus->child, sibling) {
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
+int32_t uid = pci_bus_uid(bus);
 
 /* look only for expander root buses */
 if (!pci_bus_is_root(bus)) {
@@ -1356,7 +1357,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 scope = aml_scope("\\_SB");
 dev = aml_device("PC%.02X", bus_num);
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-init_pci_acpi(dev, bus_num, pci_bus_is_express(bus) ? PCIE : PCI);
+init_pci_acpi(dev, uid, pci_bus_is_express(bus) ? PCIE : PCI);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index b42592e1ff..5021b60435 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -67,6 +67,7 @@ struct PXBDev {
 
 uint8_t bus_nr;
 uint16_t numa_node;
+int32_t uid;
 };
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
@@ -98,12 +99,20 @@ static uint16_t pxb_bus_numa_node(PCIBus *bus)
 return pxb->numa_node;
 }
 
+static int32_t pxb_bus_uid(PCIBus *bus)
+{
+PXBDev *pxb = convert_to_pxb(bus->parent_dev);
+
+return pxb->uid;
+}
+
 static void pxb_bus_class_init(ObjectClass *class, void *data)
 {
 PCIBusClass *pbc = PCI_BUS_CLASS(class);
 
 pbc->bus_num = pxb_bus_num;
 pbc->numa_node = pxb_bus_numa_node;
+pbc->uid = pxb_bus_uid;
 }
 
 static const TypeInfo pxb_bus_info = {
@@ -329,6 +338,7 @@ static Property pxb_dev_properties[] = {
 /* Note: 0 is not a legal PXB bus number. */
 DEFINE_PROP_UINT8("bus_nr", PXBDev, bus_nr, 0),
 DEFINE_PROP_UINT16("numa_node", PXBDev, numa_node, NUMA_NODE_UNASSIGNED),
+DEFINE_PROP_INT32("uid", PXBDev, uid, -1),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -400,12 +410,21 @@ static const TypeInfo pxb_pcie_dev_info = {
 
 static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
 {
+PXBDev *pxb = convert_to_pxb(dev);
+
 /* A CXL PXB's parent bus is still PCIe */
 if (!pci_bus_is_express(pci_get_bus(dev))) {
 error_setg(errp, "pxb-cxl devices cannot reside on a PCI bus");
 return;
 }
 
+if (pxb->uid < 0) {
+error_setg(errp, "pxb-cxl devices must have a valid uid (0-2147483647)");
+return;
+}
+
+/* FIXME: Check that uid doesn't collide with UIDs of other host bridges */
+
 pxb_dev_realize_common(dev, CXL, errp);
 }
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index adbe8aa260..bf019d91a0 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -170,6 +170,11 @@ static uint16_t pcibus_numa_node(PCIBus *bus)
 return NUMA_NODE_UNASSIGNED;
 }
 
+static int32_t pcibus_uid(PCIBus *bus)
+{
+return -1;
+}
+
 static void pci_bus_class_init(ObjectClass *klass, void *data)
 {
 BusClass *k = BUS_CLASS(klass);
@@ -184,6 +189,7 @@ static void pci_bus_class_init(ObjectClass *klass, void *data)
 
 pbc->bus_num = pcibus_num;
 pbc->numa_node = pcibus_numa_node;
+pbc->uid = pcibus_uid;
 }
 
 static const TypeInfo pci_bus_info = {
@@ -530,6 +536,11 @@ int pci_bus_numa_node(PCIBus *bus)
 return PCI_BUS_GET_CLASS(bus)->numa_node(bus);
 }
 
+int pci_bus_uid(PCIBus *bus)
+{
+return PCI_BUS_GET_CLASS(bus)->uid(bus);
+}
+
 static int get_pci_config_device(QEMUFile *f, void *pv, 

[RFC PATCH v3 01/31] hw/pci/cxl: Add a CXL component type (interface)

2021-02-01 Thread Ben Widawsky
A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)

A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
support builds on existing PCI code paths here, and will continue to do so.

Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core pci code to treat these devices as special where appropriate.

Signed-off-by: Ben Widawsky 
---
 hw/pci/pci.c | 10 ++
 include/hw/pci/pci.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 512e9042ff..a45ca326ed 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -194,6 +194,11 @@ static const TypeInfo pci_bus_info = {
 .class_init = pci_bus_class_init,
 };
 
+static const TypeInfo cxl_interface_info = {
+.name  = INTERFACE_CXL_DEVICE,
+.parent= TYPE_INTERFACE,
+};
+
 static const TypeInfo pcie_interface_info = {
 .name  = INTERFACE_PCIE_DEVICE,
 .parent= TYPE_INTERFACE,
@@ -2091,6 +2096,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
 pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
+if (object_class_dynamic_cast(klass, INTERFACE_CXL_DEVICE)) {
+pci_dev->cap_present |= QEMU_PCIE_CAP_CXL;
+}
+
 pci_dev = do_pci_register_device(pci_dev,
  object_get_typename(OBJECT(qdev)),
  pci_dev->devfn, errp);
@@ -2817,6 +2826,7 @@ static void pci_register_types(void)
 type_register_static(&pci_bus_info);
 type_register_static(&pcie_bus_info);
 type_register_static(&conventional_pci_interface_info);
+type_register_static(&cxl_interface_info);
 type_register_static(&pcie_interface_info);
 type_register_static(&pci_device_type_info);
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 66db08462f..528cef341c 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -195,6 +195,8 @@ enum {
 QEMU_PCIE_LNKSTA_DLLLA = (1 << QEMU_PCIE_LNKSTA_DLLLA_BITNR),
 #define QEMU_PCIE_EXTCAP_INIT_BITNR 9
 QEMU_PCIE_EXTCAP_INIT = (1 << QEMU_PCIE_EXTCAP_INIT_BITNR),
+#define QEMU_PCIE_CXL_BITNR 10
+QEMU_PCIE_CAP_CXL = (1 << QEMU_PCIE_CXL_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -202,6 +204,12 @@ typedef struct PCIDeviceClass PCIDeviceClass;
 DECLARE_OBJ_CHECKERS(PCIDevice, PCIDeviceClass,
  PCI_DEVICE, TYPE_PCI_DEVICE)
 
+/*
+ * Implemented by devices that can be plugged on CXL buses. In the spec, this is
+ * actually a "CXL Component", but we name it device to match the PCI naming.
+ */
+#define INTERFACE_CXL_DEVICE "cxl-device"
+
 /* Implemented by devices that can be plugged on PCI Express buses */
 #define INTERFACE_PCIE_DEVICE "pci-express-device"
 
-- 
2.30.0




[RFC PATCH v3 00/31] CXL 2.0 Support

2021-02-01 Thread Ben Widawsky
Major changes since v2 [1]:
 * Removed all register endian/alignment/size checking. Using core functionality
   instead. This is untested on big endian hosts, but Should Work(tm).
 * Fix component capability header generation (off by 1).
 * Fixed HDM programming (multiple issues).
 * Fixed timestamp command implementations.
 * Added commands: GET_FIRMWARE_UPDATE_INFO, GET_PARTITION_INFO, GET_LSA,
   SET_LSA

Things have remained fairly stable since v2. The biggest change here is
definitely the HDM programming which has received limited (but not 0) testing in
the Linux driver.

Jonathan Cameron has gotten this patch series working on ARM [2], and added some
much sought after functionality [3].

---

I've started #cxl on OFTC IRC for discussion. Please feel free to use that
channel for questions or suggestions in addition to #qemu.

---

Introduce emulation of Compute Express Link 2.0
(https://www.computeexpresslink.org/). Specifically, add support for Type 3
memory expanders with persistent memory.

The emulation has been critical to get the Linux enabling started [4]. It would
be an ideal place to land regression tests for different topology handling, and
there may be applications for this emulation as a way for a guest to manipulate
its address space relative to different performance memories.

Three of the five CXL component types are emulated with some level of
functionality: host bridge, root port, and memory device. All components and
devices implement basic MMIO. Devices/memory devices implement the mailbo
interface. Basic ACPI support is also included. Upstream ports and downstream
ports aren't implemented (the two components needed to make up a switch).

CXL 2.0 is built on top of PCIe (see spec for details). As a result, much of the
implementation utilizes existing PCI paradigms. To implement the host bridge,
I've chosen to use PXB (PCI Expander Bridge). It seemed to be the most natural
fit even though it doesn't directly map to how hardware will work. For
persistent capacity of the memory device, I utilized the memory subsystem
(hw/mem).

We have 3 reasons why this work is valuable:
1. Linux driver feature development benefits from emulation both due to a lack
   of initial hardware availability, but also, as is seen with NVDIMM/PMEM
   emulation, there is value in being able to share topologies with
   system-software developers even after hardware is available.

2. The Linux kernel's unit test suite for NVDIMM/PMEM ended up injecting fake
   resources via custom modules (nfit_test). In retrospect a QEMU emulation of
   nfit_test capabilities would have made the test environment more portable,
   and allowed for easier community contributions of example configurations.

3. This is still being fleshed out, but in short it provides a standardized
   mechanism for the guest to provide feedback to the host about size and
   placement needs of the memory. After the host gives the guest a physical
   window mapping to the CXL device, the emulated HDM decoders allow the guest a
   way to tell the host how much it wants and where. There are likely simpler
   ways to do this, but they'd require inventing a new interface and you'd need
   to have diverging driver code in the guest programming of the HDM decoder vs.
   the host. Since we've already done this work, why not use it?

There is quite a long list of work to do for full spec compliance, but I don't
believe that any of it precludes merging. Off the top of my head:
- Main host bridge support (WIP)
- Interleaving
- Better Tests
- Hot plug support
- Emulating volatile capacity
- CDAT emulation [3]

The flow of the patches in general is to define all the data structures and
registers associated with the various components in a top down manner. Host
bridge, component, ports, devices. Then, the actual implementation is done in
the same order.

The summary is:
1-5: Infrastructure for component and device emulation
6-9: Basic mailbox command implementations
10-19: Implement CXL host bridges as PXB devices
20: Implement a root port
21-22: Implement a memory device
23-26: ACPI bits
27-29: Add some more advanced mailbox command implementations
30: Start working on enabling the main host bridge
31: Basic test case

---

[1]: https://lore.kernel.org/qemu-devel/20210105165323.783725-1-ben.widaw...@intel.com/
[2]: https://lore.kernel.org/qemu-devel/20210201152655.31027-1-jonathan.came...@huawei.com/
[3]: https://lore.kernel.org/qemu-devel/20210201151629.29656-1-jonathan.came...@huawei.com/
[4]: https://lore.kernel.org/linux-cxl/20210130002438.1872527-1-ben.widaw...@intel.com/

---

Ben Widawsky (31):
  hw/pci/cxl: Add a CXL component type (interface)
  hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)
  hw/cxl/device: Introduce a CXL device (8.2.8)
  hw/cxl/device: Implement the CAP array (8.2.8.1-2)
  hw/cxl/device: Implement basic mailbox (8.2.8.4)
  hw/cxl/device: Add memory device utilities
  hw/cxl/device: Add cheap EVENTS implementation 

[RFC PATCH v3 02/31] hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

2021-02-01 Thread Ben Widawsky
A CXL 2.0 component is any entity in the CXL topology. All components
have an analogous function in PCIe. Except for the CXL host bridge, all
have a PCIe config space that is accessible via the common PCIe
mechanisms. CXL components are enumerated via DVSEC fields in the
extended PCIe header space. CXL components will minimally implement some
subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
2.0 specification. Two headers and a utility library are introduced to
support the minimum functionality needed to enumerate components.

The cxl_pci header manages bits associated with PCI, specifically the
DVSEC and related fields. The cxl_component.h variant has data
structures and APIs that are useful for drivers implementing any of the
CXL 2.0 components. The library takes care of making use of the DVSEC
bits and the CXL.[mem|cache] registers. Per spec, the registers are
little endian.

None of the mechanisms required to enumerate a CXL capable hostbridge
are introduced at this point.

Note that the CXL.mem and CXL.cache registers used are always 4B wide.
It's possible in the future that this constraint will not hold.
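
The access model described here (a bank of 4-byte registers indexed by offset / 4, with an optional per-component override) can be sketched on its own as follows. This is an illustration under assumptions rather than the library code: SketchRegs, SketchSpecialOps, the register count and the read-only example are all invented names and choices, loosely modelled on the special_ops hook in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_REGS 16   /* arbitrary size chosen for the sketch */

typedef struct SketchRegs SketchRegs;

/* Optional per-component overrides; either hook may be NULL. */
typedef struct {
    uint32_t (*read)(SketchRegs *r, uint64_t offset);
    void (*write)(SketchRegs *r, uint64_t offset, uint32_t value);
} SketchSpecialOps;

struct SketchRegs {
    uint32_t regs[SKETCH_NUM_REGS];
    const SketchSpecialOps *special_ops;
};

/* 4-byte aligned read: use the override if present, else the backing array. */
static uint32_t sketch_read(SketchRegs *r, uint64_t offset)
{
    assert(offset % 4 == 0 && offset / 4 < SKETCH_NUM_REGS);
    if (r->special_ops && r->special_ops->read) {
        return r->special_ops->read(r, offset);
    }
    return r->regs[offset / 4];
}

/* 4-byte aligned write with the same override rule. */
static void sketch_write(SketchRegs *r, uint64_t offset, uint32_t value)
{
    assert(offset % 4 == 0 && offset / 4 < SKETCH_NUM_REGS);
    if (r->special_ops && r->special_ops->write) {
        r->special_ops->write(r, offset, value);
        return;
    }
    r->regs[offset / 4] = value;
}

/* Example override: treat the register at offset 0 as read-only. */
static void sketch_ro_write(SketchRegs *r, uint64_t offset, uint32_t value)
{
    if (offset == 0) {
        return;
    }
    r->regs[offset / 4] = value;
}

static const SketchSpecialOps sketch_ro_ops = {
    .read = NULL,
    .write = sketch_ro_write,
};
```

Keeping the default path as a plain array access means a component only has to supply hooks for the registers whose behaviour differs, which is the role special_ops plays in the patch.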

Signed-off-by: Ben Widawsky 
---
 MAINTAINERS|   6 +
 hw/Kconfig |   1 +
 hw/cxl/Kconfig |   3 +
 hw/cxl/cxl-component-utils.c   | 208 +
 hw/cxl/meson.build |   3 +
 hw/meson.build |   1 +
 include/hw/cxl/cxl.h   |  17 +++
 include/hw/cxl/cxl_component.h | 187 +
 include/hw/cxl/cxl_pci.h   | 138 ++
 9 files changed, 564 insertions(+)
 create mode 100644 hw/cxl/Kconfig
 create mode 100644 hw/cxl/cxl-component-utils.c
 create mode 100644 hw/cxl/meson.build
 create mode 100644 include/hw/cxl/cxl.h
 create mode 100644 include/hw/cxl/cxl_component.h
 create mode 100644 include/hw/cxl/cxl_pci.h

diff --git a/MAINTAINERS b/MAINTAINERS
index bcd88668bc..981dc92e25 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2234,6 +2234,12 @@ F: qapi/block*.json
 F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
+Compute Express Link
+M: Ben Widawsky 
+S: Supported
+F: hw/cxl/
+F: include/hw/cxl/
+
 Dirty Bitmaps
 M: Eric Blake 
 M: Vladimir Sementsov-Ogievskiy 
diff --git a/hw/Kconfig b/hw/Kconfig
index 5ad3c6b5a4..c03650c5ed 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -6,6 +6,7 @@ source audio/Kconfig
 source block/Kconfig
 source char/Kconfig
 source core/Kconfig
+source cxl/Kconfig
 source display/Kconfig
 source dma/Kconfig
 source gpio/Kconfig
diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
new file mode 100644
index 00..8e67519b16
--- /dev/null
+++ b/hw/cxl/Kconfig
@@ -0,0 +1,3 @@
+config CXL
+bool
+default y if PCI_EXPRESS
diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
new file mode 100644
index 00..8d56ad5c7d
--- /dev/null
+++ b/hw/cxl/cxl-component-utils.c
@@ -0,0 +1,208 @@
+/*
+ * CXL Utility library for components
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+
+static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
+   unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+assert(size == 4);
+
+if (cregs->special_ops && cregs->special_ops->read) {
+return cregs->special_ops->read(cxl_cstate, offset, size);
+} else {
+return cregs->cache_mem_registers[offset / 4];
+}
+}
+
+static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
+unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+assert(size == 4);
+
+if (cregs->special_ops && cregs->special_ops->write) {
+cregs->special_ops->write(cxl_cstate, offset, value, size);
+} else {
+cregs->cache_mem_registers[offset / 4] = value;
+}
+}
+
+/*
+ * 8.2.3
+ *   The access restrictions specified in Section 8.2.2 also apply to CXL 2.0
+ *   Component Registers.
+ *
+ * 8.2.2
+ *   • A 32 bit register shall be accessed as a 4 Bytes quantity. Partial
+ *   reads are not permitted.
+ *   • A 64 bit register shall be accessed as a 8 Bytes quantity. Partial
+ *   reads are not permitted.
+ *
+ * As of the spec defined today, only 4 byte registers exist.
+ */
+static const MemoryRegionOps cache_mem_ops = {
+.read = cxl_cache_mem_read_reg,
+.write = cxl_cache_mem_write_reg,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+

Re: [PATCH 1/3] i386: Add missing "vmx-ept-wb" feature name

2021-02-01 Thread Eduardo Habkost
On Tue, Feb 02, 2021 at 12:28:38AM +0100, Paolo Bonzini wrote:
> Il mar 2 feb 2021, 00:05 Eduardo Habkost  ha scritto:
> 
> > On Mon, Feb 01, 2021 at 11:59:48PM +0100, Paolo Bonzini wrote:
> > > Il lun 1 feb 2021, 23:54 Eduardo Habkost  ha
> > scritto:
> > >
> > > > Not having a feature name in feature_word_info breaks error
> > > > reporting and query-cpu-model-expansion.  Add the missing feature
> > > > name to feature_word_info[FEAT_VMX_EPT_VPID_CAPS].feat_names[14].
> > > >
> > > This is intentional, because there's no way that any hypervisor can run
> > if
> > > this feature is disabled.
> >
> > If leaving the feature without name enables some desirable
> > behavior, that's by accident and not by design.  Which part of
> > the existing behavior is intentional?
> >
> 
> Not being able to disable it.

We can make it a hard dependency of vmx, then.  We shouldn't
leave it without a name, though.

-- 
Eduardo




Re: [PATCH] docs/system: document an example vexpress-a15 invocation

2021-02-01 Thread Alex Bennée


Peter Maydell  writes:

> On Mon, 1 Feb 2021 at 20:09, Alex Bennée  wrote:
>>
>>
>> Peter Maydell  writes:
>>
>> > On Thu, 28 Jan 2021 at 18:53, Alex Bennée  wrote:
>> >>
>> >> The wiki and the web are curiously absent of the right runes to boot a
>> >> vexpress model so I had to work from first principles to work it out.
>> >> Use the more modern -drive notation so alternative backends can be
>> >> used (unlike the hardwired -sd mode).
>> >>
>> >> Signed-off-by: Alex Bennée 
>> >> Cc: Anders Roxell 
>> >> ---
>> >>  docs/system/arm/vexpress.rst | 26 ++
>> >>  1 file changed, 26 insertions(+)
>> >>
>> >> diff --git a/docs/system/arm/vexpress.rst b/docs/system/arm/vexpress.rst
>> >> index 7f1bcbef07..30b1823b95 100644
>> >> --- a/docs/system/arm/vexpress.rst
>> >> +++ b/docs/system/arm/vexpress.rst
>> >> @@ -58,3 +58,29 @@ Other differences between the hardware and the QEMU 
>> >> model:
>> >>``vexpress-a15``, and have IRQs from 40 upwards. If a dtb is
>> >>provided on the command line then QEMU will edit it to include
>> >>suitable entries describing these transports for the guest.
>> >> +
>> >> +Booting a Linux kernel
>> >> +--
>> >> +
>> >> +Building a current Linux kernel with ``multi_v7_defconfig`` should be
>> >> +enough to get something running.
>> >> +
>> >> +.. code-block:: bash
>> >> +
>> >> +  $ export ARCH=arm
>> >> +  $ export CROSS_COMPILE=arm-linux-gnueabihf-
>> >> +  $ make multi_v7_defconfig
>> >> +  $ make
>> >
>> > We probably shouldn't be recommending in-tree kernel builds, or
>> > polluting the user's environment with random variables. Try:
>> >
>> > $ make O=builddir ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- 
>> > multi_v7_defconfig
>> > $ make O=builddir ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
>
>> Building a current Linux kernel with ``multi_v7_defconfig`` should be
>> enough to get something running. Nowadays an out-of-tree build is
>> recommended (and also useful if you build a lot of different targets).
>> $SRC points at root of the linux source tree.
>>
>> .. code-block:: bash
>>
>>   $ mkdir build; cd build
>>   $ make O=$(pwd) -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- 
>> multi_v7_defconfig
>>   $ make O=$(pwd) -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
>
> That works, but do you really commonly cd into the build directory?
> I usually sit in the source tree...

I have tmux panes, lots and lots of tmux panes. It's more common to live
in the build directories for QEMU because I have to run the binaries I
build. For the kernels I usually have one pane in the tip of source tree
and a bunch of others for build configurations I'm actively messing
about with and rebuilding.

>
> thanks
> -- PMM


-- 
Alex Bennée



Re: [PATCH 1/3] i386: Add missing "vmx-ept-wb" feature name

2021-02-01 Thread Paolo Bonzini
On Tue, Feb 2, 2021, 00:05 Eduardo Habkost wrote:

> On Mon, Feb 01, 2021 at 11:59:48PM +0100, Paolo Bonzini wrote:
> > Il lun 1 feb 2021, 23:54 Eduardo Habkost  ha
> scritto:
> >
> > > Not having a feature name in feature_word_info breaks error
> > > reporting and query-cpu-model-expansion.  Add the missing feature
> > > name to feature_word_info[FEAT_VMX_EPT_VPID_CAPS].feat_names[14].
> > >
> > This is intentional, because there's no way that any hypervisor can run
> if
> > this feature is disabled.
>
> If leaving the feature without name enables some desirable
> behavior, that's by accident and not by design.  Which part of
> the existing behavior is intentional?
>

Not being able to disable it.

Paolo


> --
> Eduardo
>
>


Re: [PATCH 1/3] i386: Add missing "vmx-ept-wb" feature name

2021-02-01 Thread Eduardo Habkost
On Mon, Feb 01, 2021 at 11:59:48PM +0100, Paolo Bonzini wrote:
> Il lun 1 feb 2021, 23:54 Eduardo Habkost  ha scritto:
> 
> > Not having a feature name in feature_word_info breaks error
> > reporting and query-cpu-model-expansion.  Add the missing feature
> > name to feature_word_info[FEAT_VMX_EPT_VPID_CAPS].feat_names[14].
> >
> This is intentional, because there's no way that any hypervisor can run if
> this feature is disabled.

If leaving the feature without name enables some desirable
behavior, that's by accident and not by design.  Which part of
the existing behavior is intentional?

-- 
Eduardo




Re: [PATCH 0/1] Allow to build virtiofsd without the entire tools

2021-02-01 Thread Paolo Bonzini
On Mon, Feb 1, 2021, 22:15 Wainer dos Santos Moschetta wrote:

> Not too long ago (QEMU 5.0) it was possible to configure with
> --disable-tools and still have virtiofsd built. With the recent port of the
> build system to Meson, it is now built together with the tools though.
>
> The Kata Containers [1] project builds QEMU with --disable-tools to
> decrease the attack surface


--enable-tools only adds separate executables, therefore it can't add to
the attack surface of the emulators. So this is misleading.

That said, it does make sense to let --enable-virtiofsd override
--disable-tools, and the same in the other direction too.

Paolo

> Side note: in a private chat with Stefan Hajnoczi he came up with the idea
> that perhaps --disable-tools could be like --without-default-features where
> one can add back on a feature-by-feature basis. This is outside the scope of
> this series, but I thought of sharing it because IMHO it deserves a discussion.


> [1] https://katacontainers.io
>
> Wainer dos Santos Moschetta (1):
>   virtiofsd: Allow to build it without the tools
>
>  tools/meson.build | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> --
> 2.29.2
>
>


Re: [PATCH 1/3] i386: Add missing "vmx-ept-wb" feature name

2021-02-01 Thread Paolo Bonzini
This is intentional, because there's no way that any hypervisor can run if
this feature is disabled.

Paolo

On Mon, Feb 1, 2021, 23:54 Eduardo Habkost wrote:

> Not having a feature name in feature_word_info breaks error
> reporting and query-cpu-model-expansion.  Add the missing feature
> name to feature_word_info[FEAT_VMX_EPT_VPID_CAPS].feat_names[14].
>
> Fixes: 0723cc8a5558 ("target/i386: add VMX features to named CPU models")
> Signed-off-by: Eduardo Habkost 
> ---
>  target/i386/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index ae89024d366..2bf3ab78056 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1262,7 +1262,7 @@ static FeatureWordInfo
> feature_word_info[FEATURE_WORDS] = {
>  "vmx-ept-execonly", NULL, NULL, NULL,
>  NULL, NULL, "vmx-page-walk-4", "vmx-page-walk-5",
>  NULL, NULL, NULL, NULL,
> -NULL, NULL, NULL, NULL,
> +NULL, NULL, "vmx-ept-wb", NULL,
>  "vmx-ept-2mb", "vmx-ept-1gb", NULL, NULL,
>  "vmx-invept", "vmx-eptad", "vmx-ept-advanced-exitinfo", NULL,
>  NULL, "vmx-invept-single-context", "vmx-invept-all-context",
> NULL,
> --
> 2.28.0
>
>


[PATCH 0/3] i386: Ensure feature names are always defined

2021-02-01 Thread Eduardo Habkost
Forgetting to add feature names to the feature array
seems to be a very common mistake.

Examples:

- Missing name for MSR_VMX_EPT_WB
  commit 0723cc8a5558 ("target/i386: add VMX features to named CPU models")
- Missing name for "ibrs" at
  https://lore.kernel.org/qemu-devel/0ad4017d-e755-94a3-859e-800661bcd...@amd.com

This series fixes the MSR_VMX_EPT_WB problem and adds a runtime
check that should detect similar mistakes even before CPU model
classes are registered.

Eduardo Habkost (3):
  i386: Add missing "vmx-ept-wb" feature name
  i386: Move asserts to separate x86_cpudef_validate() function
  i386: Sanity check CPU model feature sets

 target/i386/cpu.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

-- 
2.28.0





[PATCH 3/3] i386: Sanity check CPU model feature sets

2021-02-01 Thread Eduardo Habkost
All CPU models must refer only to features that have their names
defined in feature_word_info[].feat_names, otherwise error
reporting and query-cpu-model-expansion will break.

Validate CPU feature flags in x86_cpudef_validate(), so we can catch
mistakes more easily.

Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6285fb00eb8..3c066738e82 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5435,12 +5435,27 @@ static void x86_register_cpu_model_type(const char 
*name, X86CPUModel *model)
 static void x86_cpudef_validate(X86CPUDefinition *def)
 {
 #ifndef NDEBUG
+FeatureWord w;
+int bitnr;
+
 /* AMD aliases are handled at runtime based on CPUID vendor, so
  * they shouldn't be set on the CPU model table.
  */
 assert(!(def->features[FEAT_8000_0001_EDX] & CPUID_EXT2_AMD_ALIASES));
 /* catch mistakes instead of silently truncating model_id when too long */
 assert(def->model_id && strlen(def->model_id) <= 48);
+
+/*
+ * CPU models must enable only features with valid names, otherwise
+ * error reporting and query-cpu-model-expansion can't work correctly.
+ */
+for (w = 0; w < FEATURE_WORDS; w++) {
+for (bitnr = 0; bitnr < 64; bitnr++) {
+uint64_t mask = (1ULL << bitnr);
+assert(!(def->features[w] & mask) ||
+   feature_word_info[w].feat_names[bitnr]);
+}
+}
 #endif
 }
 
-- 
2.28.0
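
The check added by this patch can be tried outside QEMU. What follows is a minimal sketch, not the code from the series: FeatureNames and first_unnamed_feature() are hypothetical names made up for illustration, and the function returns the offending bit number instead of asserting, so that a caller can report which feature lacks a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64

/* Hypothetical, simplified stand-in for one feature_word_info[] entry. */
typedef struct {
    const char *feat_names[WORD_BITS];
} FeatureNames;

/*
 * Return the lowest bit number that is set in 'features' but has no
 * name in 'info', or -1 if every set bit is named.
 */
static int first_unnamed_feature(uint64_t features, const FeatureNames *info)
{
    assert(info != NULL);
    for (int bitnr = 0; bitnr < WORD_BITS; bitnr++) {
        uint64_t mask = 1ULL << bitnr;
        if ((features & mask) && !info->feat_names[bitnr]) {
            return bitnr;
        }
    }
    return -1;
}
```

With a table that names bits 6 and 16 only, a feature set that also enables bit 14 (the position patch 1/3 fills in for "vmx-ept-wb") would be reported as bit 14.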




[PATCH 2/3] i386: Move asserts to separate x86_cpudef_validate() function

2021-02-01 Thread Eduardo Habkost
Additional sanity checks will be added to the code, so move the
existing asserts to a separate function.

Wrap the whole function in `#ifndef NDEBUG` because the checks
will become more complex than trivial assert() calls.

Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2bf3ab78056..6285fb00eb8 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5431,17 +5431,25 @@ static void x86_register_cpu_model_type(const char 
*name, X86CPUModel *model)
 type_register(&ti);
 }
 
-static void x86_register_cpudef_types(X86CPUDefinition *def)
+/* Sanity check CPU model definition before registering it */
+static void x86_cpudef_validate(X86CPUDefinition *def)
 {
-X86CPUModel *m;
-const X86CPUVersionDefinition *vdef;
-
+#ifndef NDEBUG
 /* AMD aliases are handled at runtime based on CPUID vendor, so
  * they shouldn't be set on the CPU model table.
  */
 assert(!(def->features[FEAT_8000_0001_EDX] & CPUID_EXT2_AMD_ALIASES));
 /* catch mistakes instead of silently truncating model_id when too long */
 assert(def->model_id && strlen(def->model_id) <= 48);
+#endif
+}
+
+static void x86_register_cpudef_types(X86CPUDefinition *def)
+{
+X86CPUModel *m;
+const X86CPUVersionDefinition *vdef;
+
+x86_cpudef_validate(def);
 
 /* Unversioned model: */
 m = g_new0(X86CPUModel, 1);
-- 
2.28.0




[PATCH 1/3] i386: Add missing "vmx-ept-wb" feature name

2021-02-01 Thread Eduardo Habkost
Not having a feature name in feature_word_info breaks error
reporting and query-cpu-model-expansion.  Add the missing feature
name to feature_word_info[FEAT_VMX_EPT_VPID_CAPS].feat_names[14].

Fixes: 0723cc8a5558 ("target/i386: add VMX features to named CPU models")
Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ae89024d366..2bf3ab78056 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1262,7 +1262,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = 
{
 "vmx-ept-execonly", NULL, NULL, NULL,
 NULL, NULL, "vmx-page-walk-4", "vmx-page-walk-5",
 NULL, NULL, NULL, NULL,
-NULL, NULL, NULL, NULL,
+NULL, NULL, "vmx-ept-wb", NULL,
 "vmx-ept-2mb", "vmx-ept-1gb", NULL, NULL,
 "vmx-invept", "vmx-eptad", "vmx-ept-advanced-exitinfo", NULL,
 NULL, "vmx-invept-single-context", "vmx-invept-all-context", NULL,
-- 
2.28.0




[PATCH] usb-host: use correct altsetting in usb_host_ep_update

2021-02-01 Thread Nick Rosbrook
In order to keep track of the alternate setting that should be used for
a given interface, the USBDevice struct keeps an array of alternate
setting values, which is indexed by the interface number. In
usb_host_set_interface, when this array is updated, usb_host_ep_update
is called as a result. However, when usb_host_ep_update accesses the
active libusb_config_descriptor, it indexes udev->altsetting with the
loop variable, rather than the interface number.

With the simple trace backend enabled, this behavior can be seen:

  [...]

  usb_xhci_xfer_start 0.440 pid=1215 xfer=0x5596a4b85930 slotid=0x1 epid=0x1 
streamid=0x0
  usb_packet_state_change 1.703 pid=1215 bus=0x1 port=b'1' ep=0x0 
p=0x5596a4b85938 o=b'undef' n=b'setup'
  usb_host_req_control 2.269 pid=1215 bus=0x1 addr=0x5 p=0x5596a4b85938 
req=0x10b value=0x1 index=0xd
  usb_host_set_interface 0.449 pid=1215 bus=0x1 addr=0x5 interface=0xd alt=0x1
  usb_host_parse_config 2542.648 pid=1215 bus=0x1 addr=0x5 value=0x2 active=0x1
  usb_host_parse_interface 1.804 pid=1215 bus=0x1 addr=0x5 num=0xc alt=0x0 
active=0x1
  usb_host_parse_endpoint 2.012 pid=1215 bus=0x1 addr=0x5 ep=0x2 dir=b'in' 
type=b'int' active=0x1
  usb_host_parse_interface 1.598 pid=1215 bus=0x1 addr=0x5 num=0xd alt=0x0 
active=0x1
  usb_host_req_emulated 3.593 pid=1215 bus=0x1 addr=0x5 p=0x5596a4b85938 
status=0x0
  usb_packet_state_change 2.550 pid=1215 bus=0x1 port=b'1' ep=0x0 
p=0x5596a4b85938 o=b'setup' n=b'complete'
  usb_xhci_xfer_success 4.298 pid=1215 xfer=0x5596a4b85930 bytes=0x0

  [...]

In particular, it is seen that although usb_host_set_interface sets the
alternate setting of interface 0xd to 0x1, usb_host_ep_update uses 0x0
as the alternate setting due to using the incorrect index to
udev->altsetting.

Fix this problem by getting the interface number from the active
libusb_config_descriptor, and then using that as the index to
udev->altsetting.

Signed-off-by: Nick Rosbrook 
---
 hw/usb/host-libusb.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index fcf48c0193..6ab75e2feb 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -810,7 +810,7 @@ static void usb_host_ep_update(USBHostDevice *s)
 struct libusb_ss_endpoint_companion_descriptor *endp_ss_comp;
 #endif
 uint8_t devep, type;
-int pid, ep;
+int pid, ep, alt;
 int rc, i, e;
 
 usb_ep_reset(udev);
@@ -822,8 +822,20 @@ static void usb_host_ep_update(USBHostDevice *s)
 conf->bConfigurationValue, true);
 
 for (i = 0; i < conf->bNumInterfaces; i++) {
-assert(udev->altsetting[i] < conf->interface[i].num_altsetting);
-intf = &conf->interface[i].altsetting[udev->altsetting[i]];
+/*
+ * The udev->altsetting array indexes alternate settings
+ * by the interface number. Get the 0th alternate setting
+ * first so that we can grab the interface number, and
+ * then correct the alternate setting value if necessary.
+ */
+intf = &conf->interface[i].altsetting[0];
+alt = udev->altsetting[intf->bInterfaceNumber];
+
+if (alt != 0) {
+assert(alt < conf->interface[i].num_altsetting);
+intf = &conf->interface[i].altsetting[alt];
+}
+
 trace_usb_host_parse_interface(s->bus_num, s->addr,
intf->bInterfaceNumber,
intf->bAlternateSetting, true);
-- 
2.17.1
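
The indexing mistake described in this patch can be shown in isolation. This is only a sketch under simplifying assumptions: IfaceDesc, MAX_INTERFACES and active_altsetting() are hypothetical names, not the libusb or QEMU types, and only the interface number is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INTERFACES 16

/* Hypothetical, simplified descriptor: only the interface number matters here. */
typedef struct {
    uint8_t bInterfaceNumber;
} IfaceDesc;

/*
 * altsetting[] is indexed by interface number, not by the position of the
 * interface inside the configuration descriptor, so look the number up first.
 */
static int active_altsetting(const int altsetting[MAX_INTERFACES],
                             const IfaceDesc *ifaces, int pos)
{
    uint8_t num = ifaces[pos].bInterfaceNumber;
    assert(num < MAX_INTERFACES);
    return altsetting[num];
}
```

For a device like the one in the trace, with interfaces 0xc and 0xd at positions 0 and 1 and alternate setting 1 selected on interface 0xd, altsetting[1] still holds 0 while the lookup by interface number yields 1.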




Re: [PATCH] i386: Add the support for AMD EPYC 3rd generation processors

2021-02-01 Thread Babu Moger
Eduardo,
Please hold off on this patch. I need to update this patch.
Actually, we need to add one more bit to the SVM
feature word (CPUID_SVM_SVME_ADDR_CHK). I was planning to do that this week.
I got busy with some other priorities. I will send it this week, sorry about that.
thanks
Babu


On 2/1/21 4:16 PM, Eduardo Habkost wrote:
> On Fri, Jan 22, 2021 at 10:36:27AM -0600, Babu Moger wrote:
>> Adds the support for AMD 3rd generation processors. The model
>> display for the new processor will be EPYC-Milan.
>>
>> Adds the following new feature bits on top of the feature bits from
>> the first and second generation EPYC models.
>>
>> pcid: Process context identifiers support
>> ibrs: Indirect Branch Restricted Speculation
>> ssbd: Speculative Store Bypass Disable
>> erms: Enhanced REP MOVSB/STOSB support
>> fsrm: Fast Short REP MOVSB support
>> invpcid : Invalidate processor context ID
>> pku : Protection keys support
>>
>> Signed-off-by: Babu Moger 
> [...]
>> @@ -4130,6 +4180,61 @@ static X86CPUDefinition builtin_x86_defs[] = {
>>  .model_id = "AMD EPYC-Rome Processor",
>>  .cache_info = &epyc_rome_cache_info,
>>  },
>> +{
>> +.name = "EPYC-Milan",
> [...]
>> +.features[FEAT_8000_0008_EBX] =
>> +CPUID_8000_0008_EBX_CLZERO | CPUID_8000_0008_EBX_XSAVEERPTR |
>> +CPUID_8000_0008_EBX_WBNOINVD | CPUID_8000_0008_EBX_IBPB |
>> +CPUID_8000_0008_EBX_IBRS | CPUID_8000_0008_EBX_STIBP |
>> +CPUID_8000_0008_EBX_AMD_SSBD,
> 
> This breaks query-cpu-model-expansion, see:
> https://gitlab.com/ehabkost/qemu/-/jobs/1000347471#L350
> 
> 20:11:28 ERROR| Reproduced traceback from: 
> /builds/ehabkost/qemu/build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py:767
> 20:11:28 ERROR| Traceback (most recent call last):
> 20:11:28 ERROR|   File 
> "/builds/ehabkost/qemu/build/tests/acceptance/cpu_queries.py", line 31, in 
> test
> 20:11:28 ERROR| self.assertNotIn('', c['unavailable-features'], c['name'])
> 20:11:28 ERROR|   File "/usr/lib64/python3.6/unittest/case.py", line 1096, in 
> assertNotIn
> 20:11:28 ERROR| self.fail(self._formatMessage(msg, standardMsg))
> 20:11:28 ERROR|   File 
> "/builds/ehabkost/qemu/build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py",
>  line 953, in fail
> 20:11:28 ERROR| raise exceptions.TestFail(message)
> 20:11:28 ERROR| avocado.core.exceptions.TestFail: '' unexpectedly found in 
> ['fma', 'pcid', 'avx', 'f16c', 'avx2', 'invpcid', 'rdseed', 'sha-ni', 'umip', 
> 'rdpid', 'fsrm', 'fxsr-opt', 'misalignsse', '3dnowprefetch', 'osvw', 
> 'topoext', 'perfctr-core', 'clzero', 'xsaveerptr', 'wbnoinvd', 'ibpb', '', 
> 'amd-stibp', 'amd-ssbd', 'nrip-save', 'xsavec', 'xsaves'] : EPYC-Milan-v1
> 
> The root cause is the lack of name for CPUID_8000_0008_EBX_IBRS at
> feature_word_info[FEAT_8000_0008_EBX].feat_names[14].
> 
> I'm applying the following fixup.
> 
> Signed-off-by: Eduardo Habkost 
> ---
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 06c92650a17..8d4baf72e5b 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1033,7 +1033,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] 
> = {
>  "clzero", NULL, "xsaveerptr", NULL,
>  NULL, NULL, NULL, NULL,
>  NULL, "wbnoinvd", NULL, NULL,
> -"ibpb", NULL, NULL, "amd-stibp",
> +"ibpb", NULL, "ibrs", "amd-stibp",
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  "amd-ssbd", "virt-ssbd", "amd-no-ssb", NULL,
> 



Re: [PATCH] i386: Add the support for AMD EPYC 3rd generation processors

2021-02-01 Thread Eduardo Habkost
On Mon, Feb 01, 2021 at 04:29:50PM -0600, Babu Moger wrote:
> Eduardo,
> Please hold off on this patch. I need to update this patch.
> Actually We need to add one more bit to SVM
> feature(CPUID_SVM_SVME_ADDR_CHK). I was planning to do that this week.
> Got busy with some other priority. Will send it this week, Sorry about it.

No problem, thanks for the heads up!

-- 
Eduardo




Re: [PATCH] i386: Add the support for AMD EPYC 3rd generation processors

2021-02-01 Thread Eduardo Habkost
On Fri, Jan 22, 2021 at 10:36:27AM -0600, Babu Moger wrote:
> Adds the support for AMD 3rd generation processors. The model
> display for the new processor will be EPYC-Milan.
> 
> Adds the following new feature bits on top of the feature bits from
> the first and second generation EPYC models.
> 
> pcid: Process context identifiers support
> ibrs: Indirect Branch Restricted Speculation
> ssbd: Speculative Store Bypass Disable
> erms: Enhanced REP MOVSB/STOSB support
> fsrm: Fast Short REP MOVSB support
> invpcid : Invalidate processor context ID
> pku : Protection keys support
> 
> Signed-off-by: Babu Moger 
[...]
> @@ -4130,6 +4180,61 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  .model_id = "AMD EPYC-Rome Processor",
>  .cache_info = &epyc_rome_cache_info,
>  },
> +{
> +.name = "EPYC-Milan",
[...]
> +.features[FEAT_8000_0008_EBX] =
> +CPUID_8000_0008_EBX_CLZERO | CPUID_8000_0008_EBX_XSAVEERPTR |
> +CPUID_8000_0008_EBX_WBNOINVD | CPUID_8000_0008_EBX_IBPB |
> +CPUID_8000_0008_EBX_IBRS | CPUID_8000_0008_EBX_STIBP |
> +CPUID_8000_0008_EBX_AMD_SSBD,

This breaks query-cpu-model-expansion, see:
https://gitlab.com/ehabkost/qemu/-/jobs/1000347471#L350

20:11:28 ERROR| Reproduced traceback from: 
/builds/ehabkost/qemu/build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py:767
20:11:28 ERROR| Traceback (most recent call last):
20:11:28 ERROR|   File 
"/builds/ehabkost/qemu/build/tests/acceptance/cpu_queries.py", line 31, in test
20:11:28 ERROR| self.assertNotIn('', c['unavailable-features'], c['name'])
20:11:28 ERROR|   File "/usr/lib64/python3.6/unittest/case.py", line 1096, in 
assertNotIn
20:11:28 ERROR| self.fail(self._formatMessage(msg, standardMsg))
20:11:28 ERROR|   File 
"/builds/ehabkost/qemu/build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py",
 line 953, in fail
20:11:28 ERROR| raise exceptions.TestFail(message)
20:11:28 ERROR| avocado.core.exceptions.TestFail: '' unexpectedly found in 
['fma', 'pcid', 'avx', 'f16c', 'avx2', 'invpcid', 'rdseed', 'sha-ni', 'umip', 
'rdpid', 'fsrm', 'fxsr-opt', 'misalignsse', '3dnowprefetch', 'osvw', 'topoext', 
'perfctr-core', 'clzero', 'xsaveerptr', 'wbnoinvd', 'ibpb', '', 'amd-stibp', 
'amd-ssbd', 'nrip-save', 'xsavec', 'xsaves'] : EPYC-Milan-v1

The root cause is the lack of name for CPUID_8000_0008_EBX_IBRS at
feature_word_info[FEAT_8000_0008_EBX].feat_names[14].

I'm applying the following fixup.

Signed-off-by: Eduardo Habkost 
---
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 06c92650a17..8d4baf72e5b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1033,7 +1033,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = 
{
 "clzero", NULL, "xsaveerptr", NULL,
 NULL, NULL, NULL, NULL,
 NULL, "wbnoinvd", NULL, NULL,
-"ibpb", NULL, NULL, "amd-stibp",
+"ibpb", NULL, "ibrs", "amd-stibp",
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 "amd-ssbd", "virt-ssbd", "amd-no-ssb", NULL,

-- 
Eduardo




Re: eMMC support

2021-02-01 Thread Cédric Le Goater
On 2/1/21 11:01 PM, Cédric Le Goater wrote:
> Hello, 
> 
>>> FYI, aspeed machines successfully boot on top of 16G emmc disk images.
>>> I merged some of xilinx patches on top of the aspeed-6.0 branch to
>>> improve the model completeness but only the one fixing powerup was
>>> really necessary.
>>>  
>>> The initial diffstat is rather small.
>>>  
>>>  hw/sd/sd.c |  168 
>>> ++---
>>>  
>>> We can surely find a way to merge support in mainline without
>>> covering the whole specs. The Extended CSD register would be the
>>> big part.
>>
>> [Sai Pavan Boddu] I’m revisiting eMMC now, made some patches on top of
>> previous series sent by “Vincent Palatin”.
>>
>> Would you like to share your changes, which made aspeed machines work?
>>
>> Regards,
>> Sai Pavan
> 
> The patchset is in the aspeed-6.0 branch : 
> 
> df91d012672c Cédric Le Goater - hw/arm/aspeed: Load eMMC first boot area 
> as a boot rom
> 27b75a7ad322 Cédric Le Goater - hw/arm/aspeed: Add eMMC property
> 2836cf5a15a1 Joel Stanley - hw/arm/aspeed: Set boot device to emmc
> 42c9d57f5cd0 Joel Stanley - sd: mmc: Subtract bootarea size from blk
> 218301406607 Joel Stanley - sd: mmc: Support boot area in emmc image
> df0452923b56 Cédric Le Goater - sd: mmc: Add Extended CSD register 
> definitions
> 416c02bbfd32 Sai Pavan Boddu - sd: mmc: Add mmc switch function support
> a228aef1a209 Sai Pavan Boddu - sd: mmc: add CMD21 tuning sequence
> 9b177d7baf8e Sai Pavan Boddu - sd: mmc: Update CMD1 definition for MMC
> 6677e4eb6812 Vincent Palatin - sd: add eMMC support

here : 

https://github.com/legoater/qemu/commits/aspeed-6.0

C.




[PATCH v2] linux-user: fix O_NONBLOCK usage for hppa target

2021-02-01 Thread Helge Deller
Historically the parisc linux port tried to be compatible with HP-UX
userspace and as such defined the O_NONBLOCK constant to 024 to
emulate separate NDELAY & NONBLOCK values.

Since parisc was the only Linux platform which had two bits set, this
produced various userspace issues. Finally it was decided to drop the
(never completed) HP-UX compatibility, which is why O_NONBLOCK was
changed upstream to only have one bit set in future with this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=75ae04206a4d0e4f541c1d692b7febd1c0fdb814

This patch simply adjusts the value for qemu-user too.

Signed-off-by: Helge Deller 

---

diff --git a/linux-user/hppa/target_fcntl.h b/linux-user/hppa/target_fcntl.h
index bd966a59b8..08e3a4fcb0 100644
--- a/linux-user/hppa/target_fcntl.h
+++ b/linux-user/hppa/target_fcntl.h
@@ -8,7 +8,7 @@
 #ifndef HPPA_TARGET_FCNTL_H
 #define HPPA_TARGET_FCNTL_H

-#define TARGET_O_NONBLOCK00024 /* HPUX has separate NDELAY & NONBLOCK 
*/
+#define TARGET_O_NONBLOCK00020
 #define TARGET_O_APPEND  00010
 #define TARGET_O_CREAT   00400 /* not fcntl */
 #define TARGET_O_EXCL02000 /* not fcntl */



Re: eMMC support

2021-02-01 Thread Cédric Le Goater
Hello, 

>> FYI, aspeed machines successfully boot on top of 16G emmc disk images.
>> I merged some of xilinx patches on top of the aspeed-6.0 branch to
>> improve the model completeness but only the one fixing powerup was
>> really necessary.
>> 
>> The initial diffstat is rather small.
>> 
>>  hw/sd/sd.c |  168 
>> ++---
>> 
>> We can surely find a way to merge support in mainline without
>> covering the whole specs. The Extended CSD register would be the
>> big part.
> 
> [Sai Pavan Boddu] I’m revisiting eMMC now, made some patches on top of
> previous series sent by “Vincent Palatin”.
>
> Would you like to share your changes, which made aspeed machines work?
>
> Regards,
> Sai Pavan

The patchset is in the aspeed-6.0 branch : 

df91d012672c Cédric Le Goater - hw/arm/aspeed: Load eMMC first boot area as 
a boot rom
27b75a7ad322 Cédric Le Goater - hw/arm/aspeed: Add eMMC property
2836cf5a15a1 Joel Stanley - hw/arm/aspeed: Set boot device to emmc
42c9d57f5cd0 Joel Stanley - sd: mmc: Subtract bootarea size from blk
218301406607 Joel Stanley - sd: mmc: Support boot area in emmc image
df0452923b56 Cédric Le Goater - sd: mmc: Add Extended CSD register 
definitions
416c02bbfd32 Sai Pavan Boddu - sd: mmc: Add mmc switch function support
a228aef1a209 Sai Pavan Boddu - sd: mmc: add CMD21 tuning sequence
9b177d7baf8e Sai Pavan Boddu - sd: mmc: Update CMD1 definition for MMC
6677e4eb6812 Vincent Palatin - sd: add eMMC support

Also based on Vincent Palatin's initial patches, reworked by Joel and me.

Booting an aspeed machine requires a bit of work since you need to build
an eMMC disk image with U-Boot in the boot partitions.

Here is a little script from Joel to get you going.

Thanks,

C.


#!/bin/sh

URLBASE=https://jenkins.openbmc.org/view/latest/job/latest-master/label=docker-builder,target=witherspoon-tacoma/lastSuccessfulBuild/artifact/openbmc/build/tmp/deploy/images/witherspoon-tacoma/

IMAGESIZE=128
OUTFILE=mmc.img

FILES="u-boot.bin u-boot-spl.bin obmc-phosphor-image-witherspoon-tacoma.wic.xz"

# Fetch each artifact unless it is already present in the current directory.
for file in ${FILES}; do
    if test -f "${file}"; then
        echo "${file}: Already downloaded"
    else
        echo "${file}: Downloading"
        wget -nv "${URLBASE}/${file}"
    fi
done

echo

echo "Creating empty image..."
dd status=none if=/dev/zero of="${OUTFILE}" bs=1M count="${IMAGESIZE}"
echo "Adding SPL..."
dd status=none if=u-boot-spl.bin of="${OUTFILE}" conv=notrunc
echo "Adding u-boot..."
dd status=none if=u-boot.bin of="${OUTFILE}" conv=notrunc bs=1K seek=64
echo "Adding userdata..."
unxz -c obmc-phosphor-image-witherspoon-tacoma.wic.xz | dd status=progress of="${OUTFILE}" conv=notrunc bs=1M seek=2
echo "Fixing size to keep qemu happy..."
truncate --size 16G "${OUTFILE}"

echo "Done!"
echo
echo " qemu-system-arm -M tacoma-bmc -nographic -drive file=mmc.img,if=sd,index=2,format=raw"




Re: [PATCH v5 5/5] hw/block/nvme: add simple copy command

2021-02-01 Thread Klaus Jensen
On Jan 29 10:15, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Add support for TP 4065a ("Simple Copy Command"), v2020.05.04
> ("Ratified").
> 
> The implementation uses a bounce buffer to first read in the source
> logical blocks, then issue a write of that bounce buffer. The default
> maximum number of source logical blocks is 128, translating to 512 KiB
> for 4k logical blocks which aligns with the default value of MDTS.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme-ns.h|   4 +
>  hw/block/nvme.h   |   1 +
>  hw/block/nvme-ns.c|   8 ++
>  hw/block/nvme.c   | 253 +-
>  hw/block/trace-events |   7 ++
>  5 files changed, 272 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index c083000b8c1f..b26866ba4338 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -43,12 +43,18 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t 
> opcode, const char *opna
>  pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, 
> uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 
> 0x%"PRIx64""
>  pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, 
> uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb 
> %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
>  pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
> +pci_nvme_copy(uint16_t cid, uint32_t nsid, uint16_t nr, uint8_t format) "cid 
> %"PRIu16" nsid %"PRIu32" nr %"PRIu16" format 0x%"PRIx8""
> +pci_nvme_copy_source_range(uint64_t slba, uint32_t nlb) "slba 0x%"PRIx64" 
> nlb %"PRIu32""
> +pci_nvme_copy_in_complete(uint16_t cid) "cid %"PRIu16""
> +pci_nvme_copy_cb(uint16_t cid) "cid %"PRIu16""
> +pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t 
> nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""

Woops. An old trace event ended up in there when rebasing.
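
The size claim in the quoted commit message (128 source blocks of 4k each matching the default MDTS) can be checked with a short calculation. This is only a sketch: the helper names are hypothetical, and it assumes an MDTS exponent of 7 with a 4 KiB minimum page size, which is not stated in the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to bounce 'nlb' logical blocks of 'lba_size' bytes each. */
static uint64_t copy_bounce_bytes(uint32_t nlb, uint32_t lba_size)
{
    return (uint64_t)nlb * lba_size;
}

/*
 * Maximum transfer size for an MDTS exponent, expressed as a power-of-two
 * multiple of the minimum page size (both values are assumptions here).
 */
static uint64_t mdts_bytes(uint8_t mdts, uint32_t min_page_size)
{
    assert(mdts < 32);
    return (uint64_t)min_page_size << mdts;
}
```

Under those assumptions both copy_bounce_bytes(128, 4096) and mdts_bytes(7, 4096) come to 524288 bytes, i.e. 512 KiB.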

