[PATCH v3] target-i386: Walk NPT in guest real mode

2024-09-21 Thread Alexander Graf
When translating virtual to physical address with a guest CPU that
supports nested paging (NPT), we need to perform every page table walk
access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) special: In that case,
we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform NPT walk to do GVA -> GPA -> HPA
which we fail to do so far.

The net result of that is that TCG VMs with NPT enabled that execute
real mode code (like SeaBIOS) end up with GPA==HPA mappings which means
the guest accesses host code and data. This typically shows as failure
to boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation and then skip any logic that
requires an actual PTE.

That way, all remaining logic to walk the NPT stays and we successfully
walk the NPT in real mode.

Fixes: fe441054bb3f0 ("target-i386: Add NPT support")

Signed-off-by: Alexander Graf 
Reported-by: Eduard Vlad 
Reviewed-by: Richard Henderson 

---

v1 -> v2:

  - Remove hack where we fake a PTE and instead just set the
corresponding resolved variables and jump straight to the
stage2 code.

v2 -> v3:

  - Fix comment
---
 target/i386/tcg/sysemu/excp_helper.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/sysemu/excp_helper.c 
b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..24dd6935f9 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -298,7 +298,7 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
 /* combine pde and pte nx, user and rw protections */
 ptep &= pte ^ PG_NX_MASK;
 page_size = 4096;
-} else {
+} else if (pg_mode) {
 /*
  * Page table level 2
  */
@@ -343,6 +343,15 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
 ptep &= pte | PG_NX_MASK;
 page_size = 4096;
 rsvd_mask = 0;
+} else {
+/*
+ * No paging (real mode), let's tentatively resolve the address as 1:1
+ * here, but conditionally still perform an NPT walk on it later.
+ */
+page_size = 0x4000;
+paddr = in->addr;
+prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+goto stage2;
 }
 
 do_check_protect:
@@ -420,6 +429,7 @@ do_check_protect_pse36:
 
 /* merge offset within page */
 paddr = (pte & PG_ADDRESS_MASK & ~(page_size - 1)) | (addr & (page_size - 
1));
+stage2:
 
 /*
  * Note that NPT is walked (for both paging structures and final guest
@@ -562,7 +572,7 @@ static bool get_physical_address(CPUX86State *env, vaddr 
addr,
 addr = (uint32_t)addr;
 }
 
-if (likely(env->cr[0] & CR0_PG_MASK)) {
+if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
 in.cr3 = env->cr[3];
 in.mmu_idx = mmu_idx;
 in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;
-- 
2.40.1




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597




[PATCH v2] target-i386: Walk NPT in guest real mode

2024-09-20 Thread Alexander Graf
When translating virtual to physical address with a guest CPU that
supports nested paging (NPT), we need to perform every page table walk
access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) special: In that case,
we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform NPT walk to do GVA -> GPA -> HPA
which we fail to do so far.

The net result of that is that TCG VMs with NPT enabled that execute
real mode code (like SeaBIOS) end up with GPA==HPA mappings which means
the guest accesses host code and data. This typically shows as failure
to boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation and then skip any logic that
requires an actual PTE.

That way, all remaining logic to walk the NPT stays and we successfully
walk the NPT in real mode.

Fixes: fe441054bb3f0 ("target-i386: Add NPT support")

Signed-off-by: Alexander Graf 
Reported-by: Eduard Vlad 

---

v1 -> v2:

  - Remove hack where we fake a PTE and instead just set the
corresponding resolved variables and jump straight to the
stage2 code.
---
 target/i386/tcg/sysemu/excp_helper.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/sysemu/excp_helper.c 
b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..4622d45643 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -298,7 +298,7 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
 /* combine pde and pte nx, user and rw protections */
 ptep &= pte ^ PG_NX_MASK;
 page_size = 4096;
-} else {
+} else if (pg_mode) {
 /*
  * Page table level 2
  */
@@ -343,6 +343,12 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
 ptep &= pte | PG_NX_MASK;
 page_size = 4096;
 rsvd_mask = 0;
+} else {
+/* No paging (real mode), let's assemble a fake 1:1 1GiB PTE */
+page_size = 0x4000;
+paddr = in->addr;
+prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+goto stage2;
 }
 
 do_check_protect:
@@ -420,6 +426,7 @@ do_check_protect_pse36:
 
 /* merge offset within page */
 paddr = (pte & PG_ADDRESS_MASK & ~(page_size - 1)) | (addr & (page_size - 
1));
+stage2:
 
 /*
  * Note that NPT is walked (for both paging structures and final guest
@@ -562,7 +569,7 @@ static bool get_physical_address(CPUX86State *env, vaddr 
addr,
 addr = (uint32_t)addr;
 }
 
-if (likely(env->cr[0] & CR0_PG_MASK)) {
+if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
 in.cr3 = env->cr[3];
 in.mmu_idx = mmu_idx;
 in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;
-- 
2.40.1




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597




Re: vm events, userspace, the vmgenid driver, and the future [was: the uevent revert thread]

2024-09-18 Thread Alexander Graf


On 19.09.24 00:27, Jason A. Donenfeld wrote:

[broadened subject line and added relevant parties to cc list]

On Tue, Sep 17, 2024 at 10:55:20PM +0200, Alexander Graf wrote:

What is still open are user space applications that require event based
notification on VM clone events - and *only* VM clone events. This
mostly caters for tools like systemd which need to execute policy - such
as generating randomly generated MAC addresses - in the event a VM was
cloned.

That's the use case this patch "vmgenid: emit uevent when VMGENID
updates" is about and I think the best path forward is to just revert
the revert. A uevent from the device driver is a well established, well
fitting Linux mechanism for that type of notification.

The thing that worries me is that vmgenid is just some weird random
microsoft acpi driver. It's one sort of particular device, and not a
very good one at that. There's still room for virtio/qemu to improve on
it with their own thing, or for vbox or whatever else to have their
version, and xen theirs, and so forth. That is to say, I'm not sure that
this virtual hardware is *the* way of doing it.



I agree, but given that it's been a few years and nobody else really 
came up with a different device, it means the current semantics for the 
scope of what the device is doing are close to "good enough". So I don't 
expect a lot of innovation here. And if there will be innovation - as 
you point out - it will bring different semantics that will then also 
require user space changes anyway.




Even in terms of the entropy stuff (which I know you no longer care
about, but I do), mst's original virtio-rng draft mentioned reporting
events beyond just VM forks, extending it generically to any kind of
entropy reduction situation. For example, migration or suspend or
whatever might be interesting things to trigger. Heck, one could imagine
those coming through vmgenid at some point, which would then change the
semantics you're after for systemd.



If they come through vmgenid, it would need to gain a new type of event 
at which point the uevent notification would also change.


I'm also not sure why live migration would trigger either a vm clone or 
any rng relevant event. And suspend is something we already have the 
machinery for to detect.




Even in terms of reporting exclusively about external VM events, there's
a subtle thing to consider between clones/forks and rollbacks, as well
as migrations. Vmgenid kind of lumps it all together, and hopefully the



It's the opposite: VMGenID is exclusively concerned about clones. It 
doesn't care about rollbacks. It doesn't care about migrations. Its 
value effectively changes when you clone a VM; and only then.




hypervisor notifies in a way consistent with what userspace was hoping
to learn about. (Right now, maybe we're doing what Hyper-V does, maybe,
but also maybe not; it's kind of loose.) So at some point, there's a
question about the limitations of vmgenid and the possible extensions of
it, or whether this will come in a different driver or virtual hardware,
and how.



To me a lot of this is too vague to be actionable. Unless someone comes 
in with real scenarios where they care about other scenarios, it sounds 
to me like the one scenario that vmgenid covers is what system level 
user space cares about. If in a few years we realize that we need 3 
different types of events, we can start looking at ways to funnel those 
in a more abstract way. Until then, because we don't know what these 
events will be, we can't even design an API that would address them.


Keep in mind that we're not really talking here about building a generic 
API for any random user space application. We only want to give system 
software the ability to reason about system events. IMHO any more 
abstract layer to funnel multiple different of these to downstream user 
space (if we ever care) would be a user space problem to solve, like for 
example a dbus event.




Right now, this is mostly unexplored. The virtio-rng avenue was largest
step in terms of exploring this problem space, but there are obviously a
few directions to go, depending on what your primary concern is.

But all of that makes me think that exposing the particulars of this
virtual hardware driver to userspace is not the best option, or at least
not an option to rush into (or to trick Greg into), and will both limit



I'm pretty sure I never tricked Greg into anything :)



what we can do with it later, and potentially burden userspace with
having to check multiple different things with confusing interactions
down the road. So I think it's worth stepping back a bit and thinking



This interface here is only available to effectively udev/systemd type 
software. Any abstraction above that should be on them. And if we 
eventually decide that we need a better interface to generic use

[PATCH] target-i386: Walk NPT in guest real mode

2024-08-27 Thread Alexander Graf
When translating virtual to physical address with a guest CPU that
supports nested paging (NPT), we need to perform every page table walk
access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) special: In that case,
we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform NPT walk to do GVA -> GPA -> HPA
which we fail to do so far.

The net result of that is that TCG VMs with NPT enabled that execute
real mode code (like SeaBIOS) end up with GPA==HPA mappings which means
the guest accesses host code and data. This typically shows as failure
to boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation, but simply provide a 1 GiB fake
PTE when in real mode. That way, all remaining logic to walk the NPT
stays and we successfully walk the NPT in real mode.

Fixes: fe441054bb3f0 ("target-i386: Add NPT support")

Signed-off-by: Alexander Graf 
Reported-by: Eduard Vlad 
---
 target/i386/tcg/sysemu/excp_helper.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/sysemu/excp_helper.c 
b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..17f45431f6 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -298,7 +298,7 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
 /* combine pde and pte nx, user and rw protections */
 ptep &= pte ^ PG_NX_MASK;
 page_size = 4096;
-} else {
+} else if (pg_mode) {
 /*
  * Page table level 2
  */
@@ -343,6 +343,12 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
 ptep &= pte | PG_NX_MASK;
 page_size = 4096;
 rsvd_mask = 0;
+} else {
+/* No paging (real mode), let's assemble a fake 1:1 1GiB PTE */
+page_size = 0x4000;
+pte = (in->addr & ~(page_size - 1)) | PG_DIRTY_MASK | PG_ACCESSED_MASK;
+ptep = PG_NX_MASK | PG_USER_MASK | PG_RW_MASK;
+rsvd_mask = 0;
 }
 
 do_check_protect:
@@ -562,7 +568,7 @@ static bool get_physical_address(CPUX86State *env, vaddr 
addr,
 addr = (uint32_t)addr;
 }
 
-if (likely(env->cr[0] & CR0_PG_MASK)) {
+if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
 in.cr3 = env->cr[3];
 in.mmu_idx = mmu_idx;
 in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;
-- 
2.40.1




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597




Re: [PATCH v4 4/6] machine/nitro-enclave: Add built-in Nitro Secure Module device

2024-08-19 Thread Alexander Graf


On 19.08.24 17:28, Dorjoy Chowdhury wrote:

Hey Alex,

On Mon, Aug 19, 2024 at 4:13 PM Alexander Graf  wrote:

Hey Dorjoy,

On 18.08.24 13:42, Dorjoy Chowdhury wrote:

AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
is used for stripped down TPM functionality like attestation. This commit
adds the built-in NSM device in the nitro-enclave machine type.

In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the
VM for validation.

Some optional nitro-enclave machine options have been added:
  - 'id': Enclave identifier, reflected in the module-id of the NSM
device. If not provided, a default id will be set.
  - 'parent-role': Parent instance IAM role ARN, reflected in PCR3
of the NSM device.
  - 'parent-id': Parent instance identifier, reflected in PCR4 of the
NSM device.

Signed-off-by: Dorjoy Chowdhury 
---
   crypto/meson.build  |   2 +-
   crypto/x509-utils.c |  73 +++


Can you please put this new API into its own patch file?



   hw/core/eif.c   | 225 +---
   hw/core/eif.h   |   5 +-


These changes to eif.c should ideally already be part of the patch that
introduces eif.c (patch 1), no? In fact, do you think you can make the
whole eif logic its own patch file?


Good point. I guess it should be possible if I have the virtio-nsm
device commit first and then add the machine/nitro-enclave commit with
full support with the devices. That will of course make the
machine/nitro-enclave commit larger. What do you think?



As long as nothing compiles the code, it can rely on not yet implemented 
functions. So it's perfectly legit to add all your code in individual 
commits and then at the end add the meson.build change that implements 
the config option. How about the order below?


* Crypto patch for SHA384
* Crypto patch for x509 fingerprint
* NSM device emulation (including libcbor check, introduces 
CONFIG_VIRTIO_NSM)

* EIF format parsing (not compiled yet)
* Nitro Enclaves machine (introduces CONFIG_NITRO_ENCLAVE)
* Nitro Enclaves docs


Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


Re: [PATCH v4 4/6] machine/nitro-enclave: Add built-in Nitro Secure Module device

2024-08-19 Thread Alexander Graf

Hey Dorjoy,

On 18.08.24 13:42, Dorjoy Chowdhury wrote:

AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
is used for stripped down TPM functionality like attestation. This commit
adds the built-in NSM device in the nitro-enclave machine type.

In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the
VM for validation.

Some optional nitro-enclave machine options have been added:
 - 'id': Enclave identifier, reflected in the module-id of the NSM
device. If not provided, a default id will be set.
 - 'parent-role': Parent instance IAM role ARN, reflected in PCR3
of the NSM device.
 - 'parent-id': Parent instance identifier, reflected in PCR4 of the
NSM device.

Signed-off-by: Dorjoy Chowdhury 
---
  crypto/meson.build  |   2 +-
  crypto/x509-utils.c |  73 +++



Can you please put this new API into its own patch file?



  hw/core/eif.c   | 225 +---
  hw/core/eif.h   |   5 +-



These changes to eif.c should ideally already be part of the patch that 
introduces eif.c (patch 1), no? In fact, do you think you can make the 
whole eif logic its own patch file?




  hw/core/meson.build |   4 +-
  hw/i386/Kconfig |   1 +
  hw/i386/nitro_enclave.c | 141 +++-
  include/crypto/x509-utils.h |  22 
  include/hw/i386/nitro_enclave.h |  26 
  9 files changed, 479 insertions(+), 20 deletions(-)
  create mode 100644 crypto/x509-utils.c
  create mode 100644 include/crypto/x509-utils.h

diff --git a/crypto/meson.build b/crypto/meson.build
index c46f9c22a7..09633194ed 100644
--- a/crypto/meson.build
+++ b/crypto/meson.build
@@ -62,7 +62,7 @@ endif
  if gcrypt.found()
util_ss.add(gcrypt, files('random-gcrypt.c'))
  elif gnutls.found()
-  util_ss.add(gnutls, files('random-gnutls.c'))
+  util_ss.add(gnutls, files('random-gnutls.c', 'x509-utils.c'))



What if we don't have gnutls. Will everything still compile or do we 
need to add any dependencies?




  elif get_option('rng_none')
util_ss.add(files('random-none.c'))
  else
diff --git a/crypto/x509-utils.c b/crypto/x509-utils.c
new file mode 100644
index 00..2422eb995c
--- /dev/null
+++ b/crypto/x509-utils.c
@@ -0,0 +1,73 @@
+/*
+ * X.509 certificate related helpers
+ *
+ * Copyright (c) 2024 Dorjoy Chowdhury 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "crypto/x509-utils.h"
+#include 
+#include 
+#include 
+
+static int qcrypto_to_gnutls_hash_alg_map[QCRYPTO_HASH_ALG__MAX] = {
+[QCRYPTO_HASH_ALG_MD5] = GNUTLS_DIG_MD5,
+[QCRYPTO_HASH_ALG_SHA1] = GNUTLS_DIG_SHA1,
+[QCRYPTO_HASH_ALG_SHA224] = GNUTLS_DIG_SHA224,
+[QCRYPTO_HASH_ALG_SHA256] = GNUTLS_DIG_SHA256,
+[QCRYPTO_HASH_ALG_SHA384] = GNUTLS_DIG_SHA384,
+[QCRYPTO_HASH_ALG_SHA512] = GNUTLS_DIG_SHA512,
+[QCRYPTO_HASH_ALG_RIPEMD160] = GNUTLS_DIG_RMD160,
+};
+
+int qcrypto_get_x509_cert_fingerprint(uint8_t *cert, size_t size,
+  QCryptoHashAlgorithm alg,
+  uint8_t **result,
+  size_t *resultlen,
+  Error **errp)
+{
+int ret;
+gnutls_x509_crt_t crt;
+gnutls_datum_t datum = {.data = cert, .size = size};
+
+if (alg >= G_N_ELEMENTS(qcrypto_to_gnutls_hash_alg_map)) {
+error_setg(errp, "Unknown hash algorithm");
+return -1;
+}
+
+gnutls_x509_crt_init(&crt);
+
+if (gnutls_x509_crt_import(crt, &datum, GNUTLS_X509_FMT_PEM) != 0) {
+error_setg(errp, "Failed to import certificate");
+goto cleanup;
+}
+
+ret = gnutls_hash_get_len(qcrypto_to_gnutls_hash_alg_map[alg]);
+if (*resultlen == 0) {
+*resultlen = ret;
+*result = g_new0(uint8_t, *resultlen);
+} else if (*resultlen < ret) {
+error_setg(errp,
+   "Result buffer size %zu is smaller than hash %d",
+   *resultlen, ret);
+goto cleanup;
+}
+
+if (gnutls_x509_crt_get_fingerprint(crt,
+qcrypto_to_gnutls_hash_alg_map[alg],
+*result, resultlen) != 0) {
+error_setg(errp, "Failed to get fingerprint from certificate");
+goto cleanup;
+}
+
+return 0;
+
+ cleanup:
+gnutls_x509_crt_deinit(crt);
+return -1;
+}
diff --git a/hw/core/eif.c b/hw/core/eif.c
index 5558879a96..8e15142d36 100644
--- a/hw/core/eif.c
+++ b/hw/core/eif.c
@@ -11,7 +11,10 @@
  #include "qemu/osdep.h"
  #include "qemu/bswap.h"
  #include "qapi/error.h"
+#include 

Re: [PATCH v4 3/6] device/virtio-nsm: Support for Nitro Secure Module device

2024-08-19 Thread Alexander Graf


On 18.08.24 13:42, Dorjoy Chowdhury wrote:

Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves for
stripped down TPM functionality like cryptographic attestation. The
requests to and responses from NSM device are CBOR[2] encoded.

This commit adds support for NSM device in QEMU. Although related to
AWS Nitro Enclaves, the virito-nsm device is independent and can be
used in other machine types as well. The libcbor[3] library has been
used for the CBOR encoding and decoding functionalities.

[1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
[2] http://cbor.io/
[3] https://libcbor.readthedocs.io/en/latest/

Signed-off-by: Dorjoy Chowdhury 



[...]



+static bool add_payload_to_cose(cbor_item_t *cose, VirtIONSM *vnsm)
+{
+cbor_item_t *root = NULL;
+cbor_item_t *nested_map;
+cbor_item_t *bs = NULL;
+size_t locked_cnt;
+uint8_t ind[NSM_MAX_PCRS];
+size_t payload_map_size = 6;
+size_t len;
+struct PCRInfo *pcr;
+uint8_t zero[64] = {0};
+bool r = false;
+size_t buf_len = 16384;
+uint8_t *buf = g_malloc(buf_len);
+
+if (vnsm->public_key_len > 0) {
+payload_map_size++;
+}
+if (vnsm->user_data_len > 0) {
+payload_map_size++;
+}
+if (vnsm->nonce_len > 0) {
+payload_map_size++;
+}



Now that you're always emitting user_data and nonce, you should include 
them in payload_map_size unconditionally as well; otherwise your map is 
too small to hold all members.


In addition, a real Nitro Enclave attestation document will return Null 
objects for these fields when they're not set instead of empty strings. 
With the patch below I was able to generate a doc that looks very 
similar to a real one:


diff --git a/hw/virtio/cbor-helpers.c b/hw/virtio/cbor-helpers.c
index 5140020d4e..ffecc97c48 100644
--- a/hw/virtio/cbor-helpers.c
+++ b/hw/virtio/cbor-helpers.c
@@ -140,7 +140,11 @@ bool qemu_cbor_add_bytestring_to_map(cbor_item_t 
*map, const char *key,

 if (!key_cbor) {
 goto cleanup;
 }
-    value_cbor = cbor_build_bytestring(arr, len);
+    if (len) {
+    value_cbor = cbor_build_bytestring(arr, len);
+    } else {
+    value_cbor = cbor_new_null();
+    }
 if (!value_cbor) {
 goto cleanup;
 }
@@ -241,7 +245,11 @@ bool 
qemu_cbor_add_uint8_key_bytestring_to_map(cbor_item_t *map, uint8_t key,

 if (!key_cbor) {
 goto cleanup;
 }
-    value_cbor = cbor_build_bytestring(buf, len);
+    if (len) {
+    value_cbor = cbor_build_bytestring(buf, len);
+    } else {
+    value_cbor = cbor_new_null();
+    }
 if (!value_cbor) {
 goto cleanup;
 }
diff --git a/hw/virtio/virtio-nsm.c b/hw/virtio/virtio-nsm.c
index e91848a2b0..b45d97efe2 100644
--- a/hw/virtio/virtio-nsm.c
+++ b/hw/virtio/virtio-nsm.c
@@ -1126,7 +1126,7 @@ static bool add_payload_to_cose(cbor_item_t *cose, 
VirtIONSM *vnsm)

 cbor_item_t *bs = NULL;
 size_t locked_cnt;
 uint8_t ind[NSM_MAX_PCRS];
-    size_t payload_map_size = 6;
+    size_t payload_map_size = 8;
 size_t len;
 struct PCRInfo *pcr;
 uint8_t zero[64] = {0};
@@ -1137,12 +1137,6 @@ static bool add_payload_to_cose(cbor_item_t 
*cose, VirtIONSM *vnsm)

 if (vnsm->public_key_len > 0) {
 payload_map_size++;
 }
-    if (vnsm->user_data_len > 0) {
-    payload_map_size++;
-    }
-    if (vnsm->nonce_len > 0) {
-    payload_map_size++;
-    }
 root = cbor_new_definite_map(payload_map_size);
 if (!root) {
 goto cleanup;


Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


Re: [PATCH v3 2/5] machine/nitro-enclave: Add vhost-user-vsock device

2024-08-14 Thread Alexander Graf


On 13.08.24 20:02, Dorjoy Chowdhury wrote:

On Mon, Aug 12, 2024 at 8:24 PM Daniel P. Berrangé  wrote:

On Sat, Aug 10, 2024 at 10:44:59PM +0600, Dorjoy Chowdhury wrote:

AWS Nitro Enclaves have built-in vhost-vsock device support which
enables applications in enclave VMs to communicate with the parent
EC2 VM over vsock. The enclave VMs have dynamic CID while the parent
always has CID 3. In QEMU, the vsock emulation for nitro enclave is
added using vhost-user-vsock as opposed to vhost-vsock. vhost-vsock
doesn't support sibling VM communication which is needed for nitro
enclaves.

In QEMU's nitro-enclave emulation, for the vsock communication to CID
3 to work, another process that does the vsock emulation in  userspace
must be run, for example, vhost-device-vsock[1] from rust-vmm, with
necessary vsock communication support in another guest VM with CID 3.
A new mandatory nitro-enclave machine option 'vsock' has been added.
The value for this option should be the chardev id from the '-chardev'
option for the vhost-user-vsock device to work.

Using vhost-user-vsock also enables the possibility to implement some
proxying support in the vhost-user-vsock daemon that will forward all
the packets to the host machine instead of CID 3 so that users of
nitro-enclave can run the necessary applications in their host machine
instead of running another whole VM with CID 3.

[1] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock

Signed-off-by: Dorjoy Chowdhury 
---
  backends/hostmem-memfd.c|   2 -
  hw/core/machine.c   |  71 +-
  hw/i386/Kconfig |   1 +
  hw/i386/nitro_enclave.c | 123 
  include/hw/boards.h |   2 +
  include/hw/i386/nitro_enclave.h |   8 +++
  include/sysemu/hostmem.h|   2 +
  7 files changed, 174 insertions(+), 35 deletions(-)

diff --git a/hw/i386/nitro_enclave.c b/hw/i386/nitro_enclave.c
index 98690c6373..280ab4cc9b 100644
--- a/hw/i386/nitro_enclave.c
+++ b/hw/i386/nitro_enclave.c
@@ -11,11 +11,81 @@
  #include "qemu/osdep.h"
  #include "qemu/error-report.h"
  #include "qapi/error.h"
+#include "qom/object_interfaces.h"

+#include "chardev/char.h"
+#include "hw/sysbus.h"
  #include "hw/core/eif.h"
  #include "hw/i386/x86.h"
  #include "hw/i386/microvm.h"
  #include "hw/i386/nitro_enclave.h"
+#include "hw/virtio/virtio-mmio.h"
+#include "hw/virtio/vhost-user-vsock.h"
+#include "sysemu/hostmem.h"
+
+static BusState *find_free_virtio_mmio_bus(void)
+{
+BusChild *kid;
+BusState *bus = sysbus_get_default();
+
+QTAILQ_FOREACH(kid, &bus->children, sibling) {
+DeviceState *dev = kid->child;
+if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MMIO)) {
+VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
+VirtioBusState *mmio_virtio_bus = &mmio->bus;
+BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
+if (QTAILQ_EMPTY(&mmio_bus->children)) {
+return mmio_bus;
+}
+}
+}
+
+return NULL;
+}
+
+static void vhost_user_vsock_init(NitroEnclaveMachineState *nems)
+{
+DeviceState *dev = qdev_new(TYPE_VHOST_USER_VSOCK);
+VHostUserVSock *vsock = VHOST_USER_VSOCK(dev);
+BusState *bus;
+
+if (!nems->vsock) {
+error_report("A valid chardev id for vhost-user-vsock device must be "
+ "provided using the 'vsock' machine option");
+exit(1);
+}
+
+bus = find_free_virtio_mmio_bus();
+if (!bus) {
+error_report("Failed to find bus for vhost-user-vsock device");
+exit(1);
+}
+
+Chardev *chardev = qemu_chr_find(nems->vsock);
+if (!chardev) {
+error_report("Failed to find chardev with id %s", nems->vsock);
+exit(1);
+}
+
+vsock->conf.chardev.chr = chardev;
+
+qdev_realize_and_unref(dev, bus, &error_fatal);
+}

Why does this machine need to create the vhost-user-vsock device itself ?
Doing it this way prevents the mgmt app from changing any of the other
vsock device settings beyond 'chardev'. The entity creating QEMU can use
-device to create the vsock device.


Hi Daniel. Good point. The reason to make the vhost-user-vsock device
built-in is because nitro VMs will always need it anyway (like the
virtio-nsm device which is built-in too). When an EIF image is built
using nitro-cli the "init" process in the EIF image tries to connect
to (AF_VSOCK, CID 3, port 9000) to send a heartbeat and expects a
heartbeat reply. So my understanding is that if we don't create it
inside the machine code itself, users would always need to provide the
explicit options for the device anyway. But as you point out this also
makes the device settings non-configurable.

Hey Alex, do you have any suggestions on this?



IMHO devices that are required for the machine to function should be 
part of the machine. Since vsock is a core part of the Nitro Enclave, it 
should be part of the machine definitio

Re: [PATCH v3 4/5] machine/nitro-enclave: Add built-in Nitro Secure Module device

2024-08-13 Thread Alexander Graf


On 10.08.24 18:45, Dorjoy Chowdhury wrote:

AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
is used for stripped down TPM functionality like attestation. This commit
adds the built-in NSM device in the nitro-enclave machine type.

In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the
VM for validation.

A new optional nitro-enclave machine option 'id' has been added which will
be the enclave identifier reflected in the module-id of the NSM device.
Otherwise, the device will have a default id set.

Signed-off-by: Dorjoy Chowdhury 
---
  hw/core/eif.c   | 205 +++-
  hw/core/eif.h   |   5 +-
  hw/core/meson.build |   4 +-
  hw/i386/Kconfig |   1 +
  hw/i386/nitro_enclave.c |  85 -
  include/hw/i386/nitro_enclave.h |  19 +++
  6 files changed, 310 insertions(+), 9 deletions(-)



[...]



@@ -87,10 +106,46 @@ static void nitro_enclave_machine_state_init(MachineState 
*machine)
  nitro_enclave_devices_init(ne_state);
  }

+static void nitro_enclave_machine_reset(MachineState *machine,
+ShutdownCause reason)
+{
+NitroEnclaveMachineClass *ne_class =
+NITRO_ENCLAVE_MACHINE_GET_CLASS(machine);
+NitroEnclaveMachineState *ne_state = NITRO_ENCLAVE_MACHINE(machine);
+
+ne_class->parent_reset(machine, reason);
+
+memset(ne_state->vnsm->pcrs, 0, sizeof(ne_state->vnsm->pcrs));
+
+/* PCR0 */
+ne_state->vnsm->extend_pcr(ne_state->vnsm, 0, ne_state->image_sha384,
+   SHA384_BYTE_LEN);
+/* PCR1 */
+ne_state->vnsm->extend_pcr(ne_state->vnsm, 1, ne_state->bootstrap_sha384,
+   SHA384_BYTE_LEN);
+/* PCR2 */
+ne_state->vnsm->extend_pcr(ne_state->vnsm, 2, ne_state->app_sha384,
+   SHA384_BYTE_LEN);



What about PCR3 and PCR4? Both are just sha384 values of input 
strings[1]. Can you make these input strings NSM device as well as 
machine properties as well?


[1] https://docs.aws.amazon.com/enclaves/latest/user/set-up-attestation.html



Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


Re: [PATCH v3 3/5] device/virtio-nsm: Support for Nitro Secure Module device

2024-08-13 Thread Alexander Graf


On 10.08.24 18:45, Dorjoy Chowdhury wrote:

Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves for
stripped down TPM functionality like cryptographic attestation. The
requests to and responses from NSM device are CBOR[2] encoded.

This commit adds support for NSM device in QEMU. Although related to
AWS Nitro Enclaves, the virito-nsm device is independent and can be
used in other machine types as well. The libcbor[3] library has been
used for the CBOR encoding and decoding functionalities.

[1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
[2] http://cbor.io/
[3] https://libcbor.readthedocs.io/en/latest/

Signed-off-by: Dorjoy Chowdhury 
---
  MAINTAINERS|8 +
  hw/virtio/Kconfig  |5 +
  hw/virtio/meson.build  |4 +
  hw/virtio/virtio-nsm-pci.c |   73 ++
  hw/virtio/virtio-nsm.c | 1929 
  include/hw/virtio/virtio-nsm.h |   59 +
  6 files changed, 2078 insertions(+)
  create mode 100644 hw/virtio/virtio-nsm-pci.c
  create mode 100644 hw/virtio/virtio-nsm.c
  create mode 100644 include/hw/virtio/virtio-nsm.h



[...]



+
+/*
+ * Attestation request structure:
+ *
+ *   Map(1) {
+ * key = String("Attestation"),
+ * value = Map(3) {
+ *   key = String("user_data"),
+ *   value = Byte_String() || null,
+ *   key = String("nonce"),
+ *   value = Byte_String() || null,
+ *   key = String("public_key"),
+ *   value = Byte_String() || null,
+ * }
+ *   }
+ * }
+ */
+typedef struct NSMAttestationReq {
+uint16_t user_data_len;
+uint8_t user_data[NSM_USER_DATA_MAX_SIZE];
+
+uint16_t nonce_len;
+uint8_t nonce[NSM_NONCE_MAX_SIZE];
+
+uint16_t public_key_len;
+uint8_t public_key[NSM_PUBLIC_KEY_MAX_SIZE];
+} NSMAttestationReq;
+
+static enum NSMResponseTypes get_nsm_attestation_req(uint8_t *req, size_t len,
+ NSMAttestationReq 
*nsm_req)
+{
+cbor_item_t *item = NULL;
+size_t size;
+uint8_t *str;
+struct cbor_pair *pair;
+struct cbor_load_result result;
+enum NSMResponseTypes r = NSM_INVALID_ARGUMENT;
+
+item = cbor_load(req, len, &result);
+if (!item || result.error.code != CBOR_ERR_NONE) {
+goto cleanup;
+}
+
+pair = cbor_map_handle(item);
+if (!cbor_isa_map(pair->value) || cbor_map_size(pair->value) != 3) {
+goto cleanup;
+}
+pair = cbor_map_handle(pair->value);
+if (!cbor_isa_string(pair->key)) {
+goto cleanup;
+}
+str = cbor_string_handle(pair->key);
+size = cbor_string_length(pair->key);
+if (!str || size != 9 || memcmp(str, "user_data", 9) != 0) {
+goto cleanup;
+}
+
+if (cbor_isa_bytestring(pair->value)) {
+str = cbor_bytestring_handle(pair->value);
+size = cbor_bytestring_length(pair->value);
+if (!str || size == 0) {
+goto cleanup;
+}
+if (size > NSM_USER_DATA_MAX_SIZE) {
+r = NSM_INPUT_TOO_LARGE;
+goto cleanup;
+}
+memcpy(nsm_req->user_data, str, size);
+nsm_req->user_data_len = size;
+} else if (cbor_is_null(pair->value)) {
+nsm_req->user_data_len = 0;
+} else {
+goto cleanup;
+}
+
+/* let's move forward */
+pair++;
+if (!cbor_isa_string(pair->key)) {
+goto cleanup;
+}
+str = cbor_string_handle(pair->key);
+size = cbor_string_length(pair->key);
+if (!str || size != 5 || memcmp(str, "nonce", 5) != 0) {
+goto cleanup;
+}
+
+if (cbor_isa_bytestring(pair->value)) {
+str = cbor_bytestring_handle(pair->value);
+size = cbor_bytestring_length(pair->value);
+if (!str || size == 0) {
+goto cleanup;
+}
+if (size > NSM_NONCE_MAX_SIZE) {
+r = NSM_INPUT_TOO_LARGE;
+goto cleanup;
+}
+memcpy(nsm_req->nonce, str, size);
+nsm_req->nonce_len = size;
+} else if (cbor_is_null(pair->value)) {
+nsm_req->nonce_len = 0;
+} else {
+goto cleanup;
+}
+
+/* let's move forward */
+pair++;
+if (!cbor_isa_string(pair->key)) {
+goto cleanup;
+}
+str = cbor_string_handle(pair->key);
+size = cbor_string_length(pair->key);
+if (!str || size != 10 || memcmp(str, "public_key", 10) != 0) {
+goto cleanup;
+}
+
+if (cbor_isa_bytestring(pair->value)) {
+str = cbor_bytestring_handle(pair->value);
+size = cbor_bytestring_length(pair->value);
+if (!str || size == 0) {
+goto cleanup;
+}
+if (size > NSM_PUBLIC_KEY_MAX_SIZE) {
+r = NSM_INPUT_TOO_LARGE;
+goto cleanup;
+}
+memcpy(nsm_req->public_key, str, size);
+nsm_req->public_key_len = size;
+} else if (cbor_is_null(pair->value)) {
+nsm_req->public_key_len = 0;
+} else {
+goto

Re: [PATCH v3 1/5] machine/nitro-enclave: New machine type for AWS Nitro Enclaves

2024-08-12 Thread Alexander Graf


On 10.08.24 18:44, Dorjoy Chowdhury wrote:

AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating
isolated execution environments, called enclaves, from Amazon EC2
instances which are used for processing highly sensitive data.
Enclaves have no persistent storage and no external networking. The
enclave VMs are based on Firecracker microvm with a vhost-vsock
device for communication with the parent EC2 instance that spawned
it and a Nitro Secure Module (NSM) device for cryptographic attestation.
The parent instance VM always has CID 3 while the enclave VM gets a
dynamic CID.

An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro
enclave virtual machine. The EIF file contains the necessary kernel,
cmdline, ramdisk(s) sections to boot.

This commit adds support for limited AWS nitro enclave emulation using
a new machine type option '-M nitro-enclave'. This new machine type is
based on the 'microvm' machine type, similar to how real nitro enclave
VMs are based on Firecracker microvm. For nitro-enclave to boot from
an EIF file, the kernel and ramdisk(s) are extracted into a temporary
kernel and a temporary initrd file which are then hooked into the
regular x86 boot mechanism along with the extracted cmdline. The EIF
file path should be provided using the '-kernel' QEMU option.

The vsock and NSM devices will be implemented so that they are available
automatically in nitro-enclave machine type in the following commits.

[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://aws.amazon.com/ec2/
[3] https://github.com/aws/aws-nitro-enclaves-image-format

Signed-off-by: Dorjoy Chowdhury 



If I run this code with an invalid kernel parameter, something in the 
error path is off. Can you please try to exercise your error paths to 
validate they work correctly?


$ ./build/qemu-system-x86_64 -M nitro-enclave -nographic -kernel foobar
qemu-system-x86_64: ../util/error.c:68: error_setv: Assertion `*errp == 
NULL' failed.



Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


Re: [PATCH v3 4/5] machine/nitro-enclave: Add built-in Nitro Secure Module device

2024-08-12 Thread Alexander Graf


On 10.08.24 18:45, Dorjoy Chowdhury wrote:

AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
is used for stripped down TPM functionality like attestation. This commit
adds the built-in NSM device in the nitro-enclave machine type.

In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the
VM for validation.

A new optional nitro-enclave machine option 'id' has been added which will
be the enclave identifier reflected in the module-id of the NSM device.
Otherwise, the device will have a default id set.

Signed-off-by: Dorjoy Chowdhury 
---
  hw/core/eif.c   | 205 +++-
  hw/core/eif.h   |   5 +-
  hw/core/meson.build |   4 +-
  hw/i386/Kconfig |   1 +
  hw/i386/nitro_enclave.c |  85 -
  include/hw/i386/nitro_enclave.h |  19 +++
  6 files changed, 310 insertions(+), 9 deletions(-)

diff --git a/hw/core/eif.c b/hw/core/eif.c
index 5558879a96..d2c65668ef 100644
--- a/hw/core/eif.c
+++ b/hw/core/eif.c
@@ -12,6 +12,9 @@
  #include "qemu/bswap.h"
  #include "qapi/error.h"
  #include  /* for crc32 */
+#include 
+#include 
+#include 

  #include "hw/core/eif.h"

@@ -180,6 +183,8 @@ static void safe_unlink(char *f)
   * Upon success, the caller is reponsible for unlinking and freeing 
*kernel_path
   */
  static bool read_eif_kernel(FILE *f, uint64_t size, char **kernel_path,
+GChecksum *image_hasher,
+GChecksum *bootstrap_hasher,
  uint32_t *crc, Error **errp)
  {
  size_t got;
@@ -213,6 +218,8 @@ static bool read_eif_kernel(FILE *f, uint64_t size, char 
**kernel_path,
  }

  *crc = crc32(*crc, kernel, size);
+g_checksum_update(image_hasher, kernel, size);
+g_checksum_update(bootstrap_hasher, kernel, size);
  g_free(kernel);
  fclose(tmp_file);

@@ -230,6 +237,8 @@ static bool read_eif_kernel(FILE *f, uint64_t size, char 
**kernel_path,
  }

  static bool read_eif_cmdline(FILE *f, uint64_t size, char *cmdline,
+ GChecksum *image_hasher,
+ GChecksum *bootstrap_hasher,
   uint32_t *crc, Error **errp)
  {
  size_t got = fread(cmdline, 1, size, f);
@@ -239,10 +248,14 @@ static bool read_eif_cmdline(FILE *f, uint64_t size, char 
*cmdline,
  }

  *crc = crc32(*crc, (uint8_t *)cmdline, size);
+g_checksum_update(image_hasher, (uint8_t *)cmdline, size);
+g_checksum_update(bootstrap_hasher, (uint8_t *)cmdline, size);
  return true;
  }

  static bool read_eif_ramdisk(FILE *eif, FILE *initrd, uint64_t size,
+ GChecksum *image_hasher,
+ GChecksum *bootstrap_or_app_hasher,
   uint32_t *crc, Error **errp)
  {
  size_t got;
@@ -261,6 +274,8 @@ static bool read_eif_ramdisk(FILE *eif, FILE *initrd, 
uint64_t size,
  }

  *crc = crc32(*crc, ramdisk, size);
+g_checksum_update(image_hasher, ramdisk, size);
+g_checksum_update(bootstrap_or_app_hasher, ramdisk, size);
  g_free(ramdisk);
  return true;

@@ -269,6 +284,125 @@ static bool read_eif_ramdisk(FILE *eif, FILE *initrd, 
uint64_t size,
  return false;
  }

+static bool get_fingerprint_sha384_from_cert(uint8_t *cert, size_t size,
+ uint8_t *sha384, Error **errp)
+{
+gnutls_x509_crt_t crt;
+size_t hash_size = 48;
+gnutls_datum_t datum = {.data = cert, .size = size};
+
+gnutls_global_init();
+gnutls_x509_crt_init(&crt);
+
+if (gnutls_x509_crt_import(crt, &datum, GNUTLS_X509_FMT_PEM) != 0) {
+error_setg(errp, "Failed to import certificate");
+goto cleanup;
+}
+
+if (gnutls_x509_crt_get_fingerprint(crt, GNUTLS_DIG_SHA384, sha384,
+&hash_size) != 0) {
+error_setg(errp, "Failed to compute SHA384 fingerprint");
+goto cleanup;
+}
+
+return true;
+
+ cleanup:
+gnutls_x509_crt_deinit(crt);
+gnutls_global_deinit();
+return false;
+}
+
+static bool get_signature_fingerprint_sha384(FILE *eif, uint64_t size,
+ uint8_t *sha384,
+ uint32_t *crc,
+ Error **errp)
+{
+size_t got;
+uint8_t *sig = NULL;
+uint8_t *cert = NULL;
+cbor_item_t *item = NULL;
+cbor_item_t *pcr0 = NULL;
+size_t len;
+struct cbor_pair *pair;
+struct cbor_load_result result;
+
+sig = g_malloc(size);
+got = fread(sig, 1, size, eif);
+if ((uint64_t) got != size) {
+error_setg(errp, "Failed to read EIF signature section data");
+goto cleanup;
+}
+
+*crc = crc

Re: [PATCH v3 4/5] machine/nitro-enclave: Add built-in Nitro Secure Module device

2024-08-12 Thread Alexander Graf


On 10.08.24 18:45, Dorjoy Chowdhury wrote:

AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
is used for stripped down TPM functionality like attestation. This commit
adds the built-in NSM device in the nitro-enclave machine type.

In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the
VM for validation.

A new optional nitro-enclave machine option 'id' has been added which will
be the enclave identifier reflected in the module-id of the NSM device.
Otherwise, the device will have a default id set.

Signed-off-by: Dorjoy Chowdhury 
---
  hw/core/eif.c   | 205 +++-
  hw/core/eif.h   |   5 +-
  hw/core/meson.build |   4 +-
  hw/i386/Kconfig |   1 +
  hw/i386/nitro_enclave.c |  85 -
  include/hw/i386/nitro_enclave.h |  19 +++
  6 files changed, 310 insertions(+), 9 deletions(-)



[...]



diff --git a/hw/core/meson.build b/hw/core/meson.build
index f32d1ad943..7e7a14ee00 100644
--- a/hw/core/meson.build
+++ b/hw/core/meson.build
@@ -12,6 +12,8 @@ hwcore_ss.add(files(
'qdev-clock.c',
  ))

+libcbor = dependency('libcbor', version: '>=0.8.0')
+
  common_ss.add(files('cpu-common.c'))
  common_ss.add(files('machine-smp.c'))
  system_ss.add(when: 'CONFIG_FITLOADER', if_true: files('loader-fit.c'))
@@ -24,7 +26,7 @@ system_ss.add(when: 'CONFIG_REGISTER', if_true: 
files('register.c'))
  system_ss.add(when: 'CONFIG_SPLIT_IRQ', if_true: files('split-irq.c'))
  system_ss.add(when: 'CONFIG_XILINX_AXI', if_true: files('stream.c'))
  system_ss.add(when: 'CONFIG_PLATFORM_BUS', if_true: files('sysbus-fdt.c'))
-system_ss.add(when: 'CONFIG_NITRO_ENCLAVE', if_true: [files('eif.c'), zlib])
+system_ss.add(when: 'CONFIG_NITRO_ENCLAVE', if_true: [files('eif.c'), zlib, 
libcbor, gnutls])



I think this is missing a dependency check somewhere:

../hw/core/eif.c:16:10: fatal error: gnutls/gnutls.h: No such file or 
directory

   16 | #include 
  |  ^

It's also the first time anything accesses gnutls directly instead of 
through the QEMU crypto framework. Is there a particular reason you can 
not use qcrypto?



Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


Re: [PATCH v2 0/2] AWS Nitro Enclave emulation

2024-06-14 Thread Alexander Graf


On 01.06.24 18:26, Dorjoy Chowdhury wrote:

This is v2 submission for AWS Nitro Enclave emulation in QEMU. v1 is at:
https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg03524.html

Changes in v2:
 - moved eif.c and eif.h files from hw/i386 to hw/core

Hi,

Hope everyone is doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work. This is
a patch series adding, not yet complete, but useful emulation support of nitro
enclaves. I have a gitlab branch where you can view the patches in the gitlab
web UI for each commit:
https://gitlab.com/dorjoy03/qemu/-/tree/nitro-enclave-emulation

AWS nitro enclaves is an Amazon EC2[2] feature that allows creating isolated
execution environments, called enclaves, from Amazon EC2 instances, which are
used for processing highly sensitive data. Enclaves have no persistent storage
and no external networking. The enclave VMs are based on Firecracker microvm
and have a vhost-vsock device for communication with the parent EC2 instance
that spawned it and a Nitro Secure Module (NSM) device for cryptographic
attestation. The parent instance VM always has CID 3 while the enclave VM gets
a dynamic CID. The enclave VMs can communicate with the parent instance over
various ports to CID 3, for example, the init process inside an enclave sends a
heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
parent instance know that the enclave VM has successfully booted.

 From inside an EC2 instance, nitro-cli[3] is used to spawn an enclave VM using
an EIF (Enclave Image Format)[4] file. EIF files can be built using nitro-cli
as well. There is no official EIF specification apart from the github
aws-nitro-enclaves-image-format repository[4]. An EIF file contains the kernel,
cmdline and ramdisk(s) in different sections which are used to boot the enclave
VM. You can look at the structs in hw/i386/eif.c file for more details about
the EIF file format.

Adding nitro enclave emulation support in QEMU will make the life of AWS Nitro
Enclave users easier as they will be able to test their EIF images locally
without having to run real nitro enclaves which can be difficult for debugging
due to its roots in security. This will also make quick prototyping easier.

In QEMU, the new nitro-enclave machine type is implemented based on the microvm
machine type similar to how AWS Nitro Enclaves are based on Firecracker microvm.
The vhost-vsock device support is already part of this patch series so that the
enclave VM can communicate to CID 3 using vsock. A mandatory 'guest-cid'
machine type option is needed which becomes the CID of the enclave VM. Some
documentation for the new 'nitro-enclave' machine type has also been added. The
NSM device support will be added in the future.

The plan is to eventually make the nitro enclave emulation in QEMU standalone
i.e., without needing to run another VM with CID 3 with proper vsock
communication support. For this to work, one approach could be to teach the
vhost-vsock driver in kernel to forward CID 3 messages to another CID
(set to CID 2 for host) so that users of the nitro-enclave machine type can
run the necessary vsock server/clients in the host machine (some defaults can
be implemented in QEMU as well, for example, sending a reply to the heartbeat)
which will rid them of the cumbersome way of running another whole VM with CID
3. This way, users of nitro-enclave machine in QEMU, could potentially also run
multiple enclaves with their messages for CID 3 forwarded to different CIDs
which, in QEMU side, could then be specified using a new machine type option
(parent-cid) if implemented. I will be posting an email to the linux
virtualization mailing list about this approach asking for feedback and
suggestions soon.

For local testing you need to generate a hello.eif image by first building
nitro-cli locally[5]. Then you can use nitro-cli to build a hello.eif image[6].

You need to build qemu-system-x86_64 after applying the patches and then you
can run the following command to boot a hello.eif image using the new
'nitro-enclave' machine type option in QEMU:

sudo ./qemu-system-x86_64 -M nitro-enclave,guest-cid=8 -kernel 
path/to/hello.eif -nographic -m 4G --enable-kvm -cpu host

The command needs to be run as sudo because for the vhost-vsock device to work
QEMU needs to be able to open vhost device in host.

Right now, if you just run the nitro-enclave machine, the kernel panics because
the init process exits abnormally because it cannot connect to port 9000 to CID
3 to send its heartbeat message (the connection times out), so another VM with
CID 3 with proper vsock communication support must be run for it to be useful.
But this restriction can be lifted once the approach about forwarding CID 3
messages is implemented if it gets accepted.



Reviewed-by: Alexander Graf 

I'm happy to see Nitro Enclaves guest support merged even if th

Re: [PATCH v1 1/2] machine/microvm: support for loading EIF image

2024-05-31 Thread Alexander Graf


On 22.05.24 19:23, Dorjoy Chowdhury wrote:

Hi Daniel,
Thanks for reviewing.

On Wed, May 22, 2024 at 9:32 PM Daniel P. Berrangé  wrote:

On Sat, May 18, 2024 at 02:07:52PM +0600, Dorjoy Chowdhury wrote:

An EIF (Enclave Image Format)[1] image is used to boot an AWS nitro
enclave[2] virtual machine. The EIF file contains the necessary
kernel, cmdline, ramdisk(s) sections to boot.

This commit adds support for loading EIF image using the microvm
machine code. For microvm to boot from an EIF file, the kernel and
ramdisk(s) are extracted into a temporary kernel and a temporary
initrd file which are then hooked into the regular x86 boot mechanism
along with the extracted cmdline.

I can understand why you did it this way, but I feel its pretty
gross to be loading everything into memory, writing it back to
disk, and then immediately loading it all back into memory.

The root problem is the x86_load_linux() method, which directly
accesses the machine properties:

 const char *kernel_filename = machine->kernel_filename;
 const char *initrd_filename = machine->initrd_filename;
 const char *dtb_filename = machine->dtb;
 const char *kernel_cmdline = machine->kernel_cmdline;

To properly handle this, I'd say we need to introduce an abstraction
for loading the kernel/inittrd/cmdlkine data.

ie on the   X86MachineClass object, we should introduce four virtual
methods

uint8_t *(*load_kernel)(X86MachineState *machine);
uint8_t *(*load_initrd)(X86MachineState *machine);
uint8_t *(*load_dtb)(X86MachineState *machine);
uint8_t *(*load_cmdline)(X86MachineState *machine);

The default impl of these four methods should just following the
existing logic, of reading and returning the data associated with
the kernel_filename, initrd_filename, dtb and kernel_cmdline
properties.

The Nitro machine sub-class, however, can provide an alternative
impl of thse virtual methods which returns data directly from
the EIF file instead.


Great suggestion! I agree the implementation path you suggested would
look much nicer as a whole. Now that I looked a bit into the
"x86_load_linux" implementation, it looks like pretty much everything
is tied to expecting a filename. For example, after reading the header
from the kernel_filename x86_load_linux calls into load_multiboot,
load_elf (which in turn calls load_elf64, 32) and they all expect a
filename. I think I would need to refactor all of them to instead work
with (uint8_t *) buffers instead, right? Also in case of
initrd_filename the existing code maps the file using
g_mapped_file_new to the X86MachineState member initrd_mapped_file. So
that will need to be looked into and refactored. Please correct me if
I misunderstood something about the way to implement your suggested
approach.

If I am understanding this right, this probably requires a lot of work
which will also probably not be straightforward to implement or test.
Right now, the way the code is, it's easy to see that the existing
code paths are still correct as they are not changed and the new
nitro-enclave machine code just hooks into them. As the loading to
memory, writing to disk and loading back to memory only is in the
execution path of the new nitro-enclave machine type, I think the way
the patch is right now, is a good first implementation. What do you
think?



I think the "real" fix here is to move all of the crufty loader logic 
from "file access" to "block layer access". Along with that, create a 
generic helper function (like this[1]) that opens all 
-kernel/-initrd/-dtb files through the block layer and calls a board 
specific callback to perform the load.


With that in place, we would have a reentrant code path for the EIF 
case: EIF could plug into the generic x86 loader and when it detects 
EIF, create internal block devices that reference the existing file plus 
an offset/limit setting to ensure it only accesses the correct range in 
the target file. It can then simply reinvoke the x86 loader with the new 
block device objects.


With that in place, we could also finally support -kernel 
http://.../vmlinuz command line invocations which currently only works 
on block devices.


However, I do agree that the above is significant effort to get going 
and shouldn't hold back the Nitro Enclave machine model.



Alex


[1] 
https://github.com/agraf/qemu/commit/e49b7a18f2d8a386e5f207c567ad9ab2e3cb5429






Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


Re: [PATCH v1 1/2] machine/microvm: support for loading EIF image

2024-05-27 Thread Alexander Graf


On 27.05.24 16:52, Dorjoy Chowdhury wrote:

Hi Philippe,
Thank you for reviewing.

On Mon, May 27, 2024 at 4:47 PM Philippe Mathieu-Daudé
 wrote:

Hi Dorjoy,

On 18/5/24 10:07, Dorjoy Chowdhury wrote:

An EIF (Enclave Image Format)[1] image is used to boot an AWS nitro
enclave[2] virtual machine. The EIF file contains the necessary
kernel, cmdline, ramdisk(s) sections to boot.

This commit adds support for loading EIF image using the microvm
machine code. For microvm to boot from an EIF file, the kernel and
ramdisk(s) are extracted into a temporary kernel and a temporary
initrd file which are then hooked into the regular x86 boot mechanism
along with the extracted cmdline.

Although not useful for the microvm machine itself, this is needed
as the following commit adds support for a new machine type
'nitro-enclave' which uses the microvm machine type as parent. The
code for checking and loading EIF will be put inside a 'nitro-enclave'
machine type check in the following commit so that microvm cannot load
EIF because it shouldn't.

[1] https://github.com/aws/aws-nitro-enclaves-image-format

The documentation is rather scarse...


Do you mean documentation about EIF file format?  If so, yes, right
now there is no specification other than the github repo for EIF.


[2] https://aws.amazon.com/ec2/nitro/nitro-enclaves/

Signed-off-by: Dorjoy Chowdhury 
---
   hw/i386/eif.c   | 486 
   hw/i386/eif.h   |  20 ++
   hw/i386/meson.build |   2 +-

... still it seems a generic loader, not restricted to x86.

Maybe better add it as hw/core/loader-eif.[ch]?


Yes, the code in eif.c is architecture agnostic. So it could make
sense to move the files to hw/core. But I don't think the names should
have "loader" prefix as there is no loading logic in eif.c. There is
only logic for parsing and extracting kernel, intird, cmdline etc.
Because nitro-enclave machine type is based on microvm which only
supports x86 now, I think it also makes sense to keep the files inside
hw/i386 for now as we can only really load x86 kernel using it. Maybe
if we, in the future, add support for other architectures, then we can
move them to hw/core. What do you think?



I think it makes sense to put EIF parsing into generic code from the 
start. Nitro Enclaves supports Aarch64 with the same EIF semantics. In 
fact, it would be pretty simple to do a sub-machine-class similar to the 
x86 NE one for arm based on -M virt as a follow-up and by making the EIF 
logic x86 only we're only making our lives harder for that future.



Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


Re: [PATCH] hvf: arm: Fix encodings for ID_AA64PFR1_EL1 and debug System registers

2024-05-05 Thread Alexander Graf



On 03.05.24 19:34, Zenghui Yu wrote:

We wrongly encoded ID_AA64PFR1_EL1 using {3,0,0,4,2} in hvf_sreg_match[] so
we fail to get the expected ARMCPRegInfo from cp_regs hash table with the
wrong key.

Fix it with the correct encoding {3,0,0,4,1}. With that fixed, the Linux
guest can properly detect FEAT_SSBS2 on my M1 HW.

All DBG{B,W}{V,C}R_EL1 registers are also wrongly encoded with op0 == 14.
It happens to work because HVF_SYSREG(CRn, CRm, 14, op1, op2) equals to
HVF_SYSREG(CRn, CRm, 2, op1, op2), by definition. But we shouldn't rely on
it.

Fixes: a1477da3ddeb ("hvf: Add Apple Silicon support")
Signed-off-by: Zenghui Yu 



Nice catch! Did you find them only because of functional issues or have 
you taken an automated pass somehow to validate the sysreg definitions 
are correct?


Reviewed-by: Alexander Graf 


Alex





Re: Call for GSoC/Outreachy internship project ideas

2024-01-30 Thread Alexander Graf

Hey Stefan,

Thanks a lot for setting up GSoC this year again!

On 15.01.24 17:32, Stefan Hajnoczi wrote:

Dear QEMU and KVM communities,
QEMU will apply for the Google Summer of Code and Outreachy internship
programs again this year. Regular contributors can submit project
ideas that they'd like to mentor by replying to this email before
January 30th.

Internship programs
---
GSoC (https://summerofcode.withgoogle.com/) and Outreachy
(https://www.outreachy.org/) offer paid open source remote work
internships to eligible people wishing to participate in open source
development. QEMU has been part of these internship programs for many
years. Our mentors have enjoyed helping talented interns make their
first open source contributions and some former interns continue to
participate today.

Who can mentor
--
Regular contributors to QEMU and KVM can participate as mentors.
Mentorship involves about 5 hours of time commitment per week to
communicate with the intern, review their patches, etc. Time is also
required during the intern selection phase to communicate with
applicants. Being a mentor is an opportunity to help someone get
started in open source development, will give you experience with
managing a project in a low-stakes environment, and a chance to
explore interesting technical ideas that you may not have time to
develop yourself.

How to propose your idea
--
Reply to this email with the following project idea template filled in:

=== TITLE ===

'''Summary:''' Short description of the project

Detailed description of the project that explains the general idea,
including a list of high-level tasks that will be completed by the
project, and provides enough background for someone unfamiliar with
the codebase to do research. Typically 2 or 3 paragraphs.

'''Links:'''
* Wiki links to relevant material
* External links to mailing lists or web sites

'''Details:'''
* Skill level: beginner or intermediate or advanced
* Language: C/Python/Rust/etc



=== Implement -M nitro-enclave in QEMU  ===

'''Summary:''' AWS EC2 provides the ability to create an isolated 
sibling VM context from within a VM. This project implements the machine 
model and input data format parsing needed to run these sibling VMs 
stand alone in QEMU.


Nitro Enclaves are the first widely adopted implementation of hypervisor 
assisted compute isolation. Similar to technologies like SGX, it allows 
to spawn a separate context that is inaccessible by the parent Operating 
System. This is implemented by "giving up" resources of the parent VM 
(CPU cores, memory) to the hypervisor which then spawns a second vmm to 
execute a completely separate virtual machine. That new VM only has a 
vsock communication channel to the parent and has a built-in lightweight 
TPM called NSM.


One big challenge with Nitro Enclaves is that due to its roots in 
security, there are very few debugging / introspection capabilities. 
That makes OS bringup, debugging and bootstrapping very difficult. 
Having a local dev&test environment that looks like an Enclave, but is 
100% controlled by the developer and introspectable would make life a 
lot easier for everyone working on them. It also may pave the way to see 
Nitro Enclaves adopted in VM environments outside of EC2.


This project will consist of adding a new machine model to QEMU that 
mimics a Nitro Enclave environment, including NSM, the vsock 
communication channel and building firmware which loads the special 
"EIF" file format which contains kernel, initramfs and metadata from a 
-kernel image.


If the student finishes early, we can then proceed to implement the 
Nitro Enclaves parent driver in QEMU as well to create a full QEMU-only 
Nitro Enclaves environment.


'''Tasks:'''
* Implement a device model for the NSM device (link to spec and driver 
code below)

* Implement a new machine model
* Implement firmware for the new machine model that implements EIF parsing
* Add tests for the NSM device
* Add integration test for the machine model executing an actual EIF payload

'''Links:'''
* https://aws.amazon.com/ec2/nitro/nitro-enclaves/
* 
https://lore.kernel.org/lkml/20200921121732.44291-10-andra...@amazon.com/T/
* 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/nsm.c


'''Details:'''
* Skill level: intermediate - advanced (some understanding of QEMU 
machine modeling would be good)

* Language: C
* Mentor: agraf
* Suggested by: Alexander Graf (OFTC: agraf, Email: g...@amazon.com)



Alex





Re: [PATCH v12 04/10] hvf: Add Apple Silicon support

2023-12-01 Thread Alexander Graf



On 30.11.23 15:17, Philippe Mathieu-Daudé wrote:

Hi,

On 16/9/21 17:53, Alexander Graf wrote:
With Apple Silicon available to the masses, it's a good time to add 
support

for driving its virtualization extensions from QEMU.

This patch adds all necessary architecture specific code to get basic 
VMs

working, including save/restore.

Known limitations:

   - WFI handling is missing (follows in later patch)
   - No watchpoint/breakpoint support

Signed-off-by: Alexander Graf 
Reviewed-by: Roman Bolshakov 
Reviewed-by: Sergio Lopez 
Reviewed-by: Peter Maydell 




---
  MAINTAINERS |   5 +
  accel/hvf/hvf-accel-ops.c   |   9 +
  include/sysemu/hvf_int.h    |  10 +-
  meson.build |   1 +
  target/arm/hvf/hvf.c    | 794 
  target/arm/hvf/trace-events |  10 +
  target/i386/hvf/hvf.c   |   5 +
  7 files changed, 833 insertions(+), 1 deletion(-)
  create mode 100644 target/arm/hvf/hvf.c
  create mode 100644 target/arm/hvf/trace-events




+int hvf_arch_init_vcpu(CPUState *cpu)
+{
+    ARMCPU *arm_cpu = ARM_CPU(cpu);
+    CPUARMState *env = &arm_cpu->env;
+    uint32_t sregs_match_len = ARRAY_SIZE(hvf_sreg_match);
+    uint32_t sregs_cnt = 0;
+    uint64_t pfr;
+    hv_return_t ret;
+    int i;
+
+    env->aarch64 = 1;
+    asm volatile("mrs %0, cntfrq_el0" : "=r"(arm_cpu->gt_cntfrq_hz));
+
+    /* Allocate enough space for our sysreg sync */
+    arm_cpu->cpreg_indexes = g_renew(uint64_t, arm_cpu->cpreg_indexes,
+ sregs_match_len);
+    arm_cpu->cpreg_values = g_renew(uint64_t, arm_cpu->cpreg_values,
+    sregs_match_len);
+    arm_cpu->cpreg_vmstate_indexes = g_renew(uint64_t,
+ arm_cpu->cpreg_vmstate_indexes,
+ sregs_match_len);
+    arm_cpu->cpreg_vmstate_values = g_renew(uint64_t,
+ arm_cpu->cpreg_vmstate_values,
+    sregs_match_len);
+
+    memset(arm_cpu->cpreg_values, 0, sregs_match_len * 
sizeof(uint64_t));

+
+    /* Populate cp list for all known sysregs */
+    for (i = 0; i < sregs_match_len; i++) {
+    const ARMCPRegInfo *ri;
+    uint32_t key = hvf_sreg_match[i].key;
+
+    ri = get_arm_cp_reginfo(arm_cpu->cp_regs, key);
+    if (ri) {
+    assert(!(ri->type & ARM_CP_NO_RAW));
+    hvf_sreg_match[i].cp_idx = sregs_cnt;
+    arm_cpu->cpreg_indexes[sregs_cnt++] = cpreg_to_kvm_id(key);


So this hash ...:

    /*
 * Convert a truncated 32 bit hashtable key into the full
 * 64 bit KVM register ID.
 */
    static uint64_t cpreg_to_kvm_id(uint32_t cpregid)
    {
    uint64_t kvmid;

    if (cpregid & CP_REG_AA64_MASK) {
    kvmid = cpregid & ~CP_REG_AA64_MASK;
    kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM64;
    } else {
    kvmid = cpregid & ~(1 << 15);
    if (cpregid & (1 << 15)) {
    kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM;
    } else {
    kvmid |= CP_REG_SIZE_U32 | CP_REG_ARM;
    }
    }
    return kvmid;
    }

... just happens to work the same way for HVF?



It never feeds into HVF - we only use these values as unique identifiers 
inside QEMU, no? See write_cpustate_to_list() and 
write_list_to_cpustate() for reference.



Alex





Re: [PATCH 00/16] hw/uefi: add uefi variable service

2023-11-20 Thread Alexander Graf

Hey Gerd!

On 15.11.23 16:12, Gerd Hoffmann wrote:

This patch adds a virtual device to qemu which the uefi firmware can use
to store variables.  This moves the UEFI variable management from
privileged guest code (managing vars in pflash) to the host.  Main
advantage is that the need to have privilege separation in the guest
goes away.

On x86 privileged guest code runs in SMM.  It's supported by kvm, but
not liked much by various stakeholders in cloud space due to the
complexity SMM emulation brings.

On arm privileged guest code runs in el3 (aka secure world).  This is
not supported by kvm, which is unlikely to change anytime soon given
that even el2 support (nested virt) is being worked on for years and is
not yet in mainline.

The design idea is to reuse the request serialization protocol edk2 uses
for communication between SMM and non-SMM code, so large chunks of the
edk2 variable driver stack can be used unmodified.  Only the driver
which traps into SMM mode must be replaced by a driver which talks to
qemu instead.



I'm not sure I like the split :). If we cut things off at the SMM 
communication layer, we still have a lot of code inside the Runtime 
Services (RTS) code that is edk2 specific which means we're tying 
ourselves tightly to the edk2 data format. It also means we can not 
easily expose UEFI variables that QEMU owns, which can come in very 
handy when implementing MOR - another feature that depends on SMM today.


In EC2, we are simply serializing all variable RTS calls to the 
hypervisor, similar to the Xen implementation 
(https://www.youtube.com/watch?v=jiR8khaECEk).


The edk2 side code we have built is here: 
https://github.com/aws/uefi/tree/main/edk2-stable202211 (see anything 
with VarStore in the name).


For the vmm side, we currently have an AWS-internal C++ implementation 
that I can convert into QEMU code and send as patch if there is real 
interest. Given that we deal with untrusted input, I would strongly 
prefer if we could move it to a Rust implementation instead though. We 
started a Rust reimplementation of it that interface that can build as a 
library with C bindings which QEMU could then link against:


  https://github.com/Mimoja/rs-uefi-varstore/tree/for_main

The code never went beyond the initial stages, but if we're pulling the 
variable store to QEMU, this would be the best path forward IMHO.



If instead, we just want something we can quickly integrate while eating 
up the additional security risk, I think we should just reuse the Xen 
implementation.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH] hvf: Enable 1G page support

2023-10-17 Thread Alexander Graf



On 21.04.23 00:52, Alexander Graf wrote:

Hvf on x86 only supported 2MiB large pages, but never bothered to strip
out the 1GiB page size capability from -cpu host. With QEMU 8.0.0 this
became a problem because OVMF started to use 1GiB pages by default.

Let's just unconditionally add 1GiB page walk support to the walker.

With this fix applied, I can successfully run OVMF again.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1603
Signed-off-by: Alexander Graf 
Reported-by: Akihiro Suda 
Reported-by: Philippe Mathieu-Daudé 



Ping. Anyone willing to pick this up? :)


Alex





Re: [PATCH v2 05/11] tpm_crb: use the ISA bus

2023-10-17 Thread Alexander Graf

Hi Joelle,

On 01.08.23 03:46, Joelle van Dyne wrote:

On Tue, Jul 18, 2023 at 7:16 AM Stefan Berger  wrote:



On 7/17/23 09:46, Igor Mammedov wrote:

On Fri, 14 Jul 2023 00:09:21 -0700
Joelle van Dyne  wrote:


Since this device is gated to only build for targets with the PC
configuration, we should use the ISA bus like with TPM TIS.

does it affect migration in any way?
  From guest pov it looks like there a new ISA device will appear
and then if you do ping pong migration between old - new QEMU will really it 
work?



If it will, then commit message here shall describe why it safe and why it works


I would just leave the existing device as-is. This seems safest and we know 
thta it works.
 Stefan

Alexander, do you have any comments here? I know you wanted to move
away from the default bus. The concern is that switching from the
default bus to the ISA bus may cause issues in migration. The current
course of action is to drop this patch.



The big problem I have with the CRB device is this code:

https://gitlab.com/qemu-project/qemu/-/blob/master/hw/tpm/tpm_crb.c?ref_type=heads#L297-305

It's a device that maps itself autonomously into system memory. The way 
mapping is supposed to work is that the parent of the device maps it 
into a bus region. If the parent is a machine, it is free to also map it 
into system memory. But a device should not even know what system memory 
is :).


That said, I'm happy if we just introduce a new "sane" sysdev TPM CRB 
device that we use for non-PCs and leave the current layering violating 
one as is.



Alex




Re: [PATCH v2 01/12] build: Only define OS_OBJECT_USE_OBJC with gcc

2023-08-31 Thread Alexander Graf


On 31.08.23 10:53, Akihiko Odaki wrote:



On 2023/08/31 17:12, Philippe Mathieu-Daudé wrote:

On 30/8/23 18:14, Alexander Graf wrote:
Recent versions of macOS use clang instead of gcc. The 
OS_OBJECT_USE_OBJC
define is only necessary when building with gcc. Let's not define it 
when

building with clang.

With this patch, I can successfully include GCD headers in QEMU when
building with clang.

Signed-off-by: Alexander Graf 
---
  meson.build | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 98e68ef0b1..0d6a0015a1 100644
--- a/meson.build
+++ b/meson.build
@@ -224,7 +224,9 @@ qemu_ldflags = []
  if targetos == 'darwin'
    # Disable attempts to use ObjectiveC features in os/object.h since
they
    # won't work when we're compiling with gcc as a C compiler.
-  qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  if compiler.get_id() == 'gcc'
+    qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  endif
  elif targetos == 'solaris'
    # needed for CMSG_ macros in sys/socket.h
    qemu_common_flags += '-D_XOPEN_SOURCE=600'


Reviewed-by: Philippe Mathieu-Daudé 



Defining OS_OBJECT_USE_OBJC does not look like a proper solution.
Looking at os/object.h, it seems OS_OBJECT_USE_OBJC is defined as 0 when:
!defined(OS_OBJECT_HAVE_OBJC_SUPPORT) && (!defined(__OBJC__) ||
defined(__OBJC_GC__))

This means OS_OBJECT_USE_OBJC is always 0 if Objective-C is disabled. I
also confirmed os/object.h will not use Objective-C features when
compiled as C code on clang with the following command:

clang -E -x -c - <
EOF

If compilation fails with GCC when not defining OS_OBJECT_USE_OBJC, it
probably means GCC incorrectly treats C code as Objective-C and that is
the problem we should solve. I cannot confirm this theory however since
I have only an Apple Silicon Mac that is incompatible with GCC.



My take on this was to make the gcc hack be a "legacy" thing that we put 
into its own corner, so that in a few years we can just drop it 
altogether. I don't really think it's worth wasting much time on this 
workaround and its potential compatibility with old macOS versions.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




[PATCH v2 07/12] hw/vmapple/aes: Introduce aes engine

2023-08-30 Thread Alexander Graf
VMApple contains an "aes" engine device that it uses to encrypt and
decrypt its nvram. It has trivial hard coded keys it uses for that
purpose.

Add device emulation for this device model.

Signed-off-by: Alexander Graf 
---
 hw/vmapple/aes.c| 583 
 hw/vmapple/Kconfig  |   2 +
 hw/vmapple/meson.build  |   1 +
 hw/vmapple/trace-events |  18 ++
 4 files changed, 604 insertions(+)
 create mode 100644 hw/vmapple/aes.c

diff --git a/hw/vmapple/aes.c b/hw/vmapple/aes.c
new file mode 100644
index 00..eaf1e26abe
--- /dev/null
+++ b/hw/vmapple/aes.c
@@ -0,0 +1,583 @@
+/*
+ * QEMU Apple AES device emulation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "crypto/hash.h"
+#include "crypto/aes.h"
+#include "crypto/cipher.h"
+
+#define TYPE_AES  "apple-aes"
+#define MAX_FIFO_SIZE 9
+
+#define CMD_KEY   0x1
+#define CMD_KEY_CONTEXT_SHIFT27
+#define CMD_KEY_CONTEXT_MASK (0x1 << CMD_KEY_CONTEXT_SHIFT)
+#define CMD_KEY_SELECT_SHIFT 24
+#define CMD_KEY_SELECT_MASK  (0x7 << CMD_KEY_SELECT_SHIFT)
+#define CMD_KEY_KEY_LEN_SHIFT22
+#define CMD_KEY_KEY_LEN_MASK (0x3 << CMD_KEY_KEY_LEN_SHIFT)
+#define CMD_KEY_ENCRYPT_SHIFT20
+#define CMD_KEY_ENCRYPT_MASK (0x1 << CMD_KEY_ENCRYPT_SHIFT)
+#define CMD_KEY_BLOCK_MODE_SHIFT 16
+#define CMD_KEY_BLOCK_MODE_MASK  (0x3 << CMD_KEY_BLOCK_MODE_SHIFT)
+#define CMD_IV0x2
+#define CMD_IV_CONTEXT_SHIFT 26
+#define CMD_IV_CONTEXT_MASK  (0x3 << CMD_KEY_CONTEXT_SHIFT)
+#define CMD_DSB   0x3
+#define CMD_SKG   0x4
+#define CMD_DATA  0x5
+#define CMD_DATA_KEY_CTX_SHIFT   27
+#define CMD_DATA_KEY_CTX_MASK(0x1 << CMD_DATA_KEY_CTX_SHIFT)
+#define CMD_DATA_IV_CTX_SHIFT25
+#define CMD_DATA_IV_CTX_MASK (0x3 << CMD_DATA_IV_CTX_SHIFT)
+#define CMD_DATA_LEN_MASK0xff
+#define CMD_STORE_IV  0x6
+#define CMD_STORE_IV_ADDR_MASK   0xff
+#define CMD_WRITE_REG 0x7
+#define CMD_FLAG  0x8
+#define CMD_FLAG_STOP_MASK   BIT(26)
+#define CMD_FLAG_RAISE_IRQ_MASK  BIT(27)
+#define CMD_FLAG_INFO_MASK   0xff
+#define CMD_MAX   0x10
+
+#define CMD_SHIFT 28
+
+#define REG_STATUS0xc
+#define REG_STATUS_DMA_READ_RUNNING BIT(0)
+#define REG_STATUS_DMA_READ_PENDING BIT(1)
+#define REG_STATUS_DMA_WRITE_RUNNINGBIT(2)
+#define REG_STATUS_DMA_WRITE_PENDINGBIT(3)
+#define REG_STATUS_BUSY BIT(4)
+#define REG_STATUS_EXECUTINGBIT(5)
+#define REG_STATUS_READYBIT(6)
+#define REG_STATUS_TEXT_DPA_SEEDED  BIT(7)
+#define REG_STATUS_UNWRAP_DPA_SEEDEDBIT(8)
+
+#define REG_IRQ_STATUS0x18
+#define REG_IRQ_STATUS_INVALID_CMD  BIT(2)
+#define REG_IRQ_STATUS_FLAG BIT(5)
+#define REG_IRQ_ENABLE0x1c
+#define REG_WATERMARK 0x20
+#define REG_Q_STATUS  0x24
+#define REG_FLAG_INFO 0x30
+#define REG_FIFO  0x200
+
+static const uint32_t key_lens[4] = {
+[0] = 16,
+[1] = 24,
+[2] = 32,
+[3] = 64,
+};
+
+struct key {
+uint32_t key_len;
+uint32_t key[8];
+};
+
+struct iv {
+uint32_t iv[4];
+};
+
+struct context {
+struct key key;
+struct iv iv;
+};
+
+static struct key builtin_keys[7] = {
+[1] = {
+.key_len = 32,
+.key = { 0x1 },
+},
+[2] = {
+.key_len = 32,
+.key = { 0x2 },
+},
+[3] = {
+.key_len = 32,
+.key = { 0x3 },
+}
+};
+
+typedef struct AESState {
+/* Private */
+SysBusDevice parent_obj;
+
+/* Public */
+qemu_irq irq;
+MemoryRegion iomem1;
+MemoryRegion iomem2;
+
+uint32_t status;
+uint32_t q_status;
+uint32_t irq_status;
+uint32_t irq_enable;
+uint32_t watermark;
+uint32_t flag_info;
+uint32_t fifo[MAX_FIFO_SIZE];
+uint32_t fifo_idx;
+struct key key[2];
+struct iv iv[4];
+bool is_encrypt;
+QCryptoCipherMode block_mode;
+} AESState;
+
+OBJECT_DECLARE_SIMPLE_TYPE(AESState, AES)
+
+static void aes_update_irq(AESState *s)
+{
+qemu_set_irq(s->irq, !!(s->irq_status & s->irq_enable));
+}
+
+static uint64_t aes1_read(void *opaque, hwaddr offset, unsigned size)
+{
+AESState *s = opaque;
+uint64_t res = 0;
+
+switch (offset) {
+case REG_STATUS:
+res = s->status;
+break;
+case REG_IRQ_STATUS:
+res = s->irq_status;
+break;
+

[PATCH v2 01/12] build: Only define OS_OBJECT_USE_OBJC with gcc

2023-08-30 Thread Alexander Graf
Recent versions of macOS use clang instead of gcc. The OS_OBJECT_USE_OBJC
define is only necessary when building with gcc. Let's not define it when
building with clang.

With this patch, I can successfully include GCD headers in QEMU when
building with clang.

Signed-off-by: Alexander Graf 
---
 meson.build | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 98e68ef0b1..0d6a0015a1 100644
--- a/meson.build
+++ b/meson.build
@@ -224,7 +224,9 @@ qemu_ldflags = []
 if targetos == 'darwin'
   # Disable attempts to use ObjectiveC features in os/object.h since they
   # won't work when we're compiling with gcc as a C compiler.
-  qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  if compiler.get_id() == 'gcc'
+qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  endif
 elif targetos == 'solaris'
   # needed for CMSG_ macros in sys/socket.h
   qemu_common_flags += '-D_XOPEN_SOURCE=600'
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH v2 12/12] hw/vmapple/vmapple: Add vmapple machine type

2023-08-30 Thread Alexander Graf
Apple defines a new "vmapple" machine type as part of its proprietary
macOS Virtualization.Framework vmm. This machine type is similar to the
virt one, but with subtle differences in base devices, a few special
vmapple device additions and a vastly different boot chain.

This patch reimplements this machine type in QEMU. To use it, you
have to have a readily installed version of macOS for VMApple,
run on macOS with -accel hvf, pass the Virtualization.Framework
boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
and pass aux and root volume as virtio drives. In addition, you also
need to find the machine UUID and pass that as -M vmapple,uuid= parameter:

$ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
-bios 
/System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin
-drive file=aux,if=pflash,format=raw \
-drive file=root,if=pflash,format=raw \
-drive file=aux,if=none,id=aux,format=raw \
-device vmapple-virtio-aux,drive=aux \
-drive file=root,if=none,id=root,format=raw \
-device vmapple-virtio-root,drive=root

With all these in place, you should be able to see macOS booting
successfully.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Adapt to system_ss meson.build target
  - Add documentation
---
 MAINTAINERS |   1 +
 docs/system/arm/vmapple.rst |  63 
 docs/system/target-arm.rst  |   1 +
 hw/vmapple/vmapple.c| 661 
 hw/vmapple/Kconfig  |  19 ++
 hw/vmapple/meson.build  |   1 +
 6 files changed, 746 insertions(+)
 create mode 100644 docs/system/arm/vmapple.rst
 create mode 100644 hw/vmapple/vmapple.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3104e58eff..1d3b1e0034 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2578,6 +2578,7 @@ M: Alexander Graf 
 S: Maintained
 F: hw/vmapple/*
 F: include/hw/vmapple/*
+F: docs/system/arm/vmapple.rst
 
 Subsystems
 --
diff --git a/docs/system/arm/vmapple.rst b/docs/system/arm/vmapple.rst
new file mode 100644
index 00..c7486b21d9
--- /dev/null
+++ b/docs/system/arm/vmapple.rst
@@ -0,0 +1,63 @@
+VMApple machine emulation
+
+
+VMApple is the device model that the macOS built-in hypervisor called 
"Virtualization.framework"
+exposes to Apple Silicon macOS guests. The "vmapple" machine model in QEMU 
implements the same
+device model, but does not use any code from Virtualization.Framework.
+
+Prerequisites
+-
+
+To run the vmapple machine model, you need to
+
+ * Run on Apple Silicon
+ * Run on macOS 12.0 or above
+ * Have an already installed copy of a Virtualization.Framework macOS virtual 
machine. I will
+   assume that you installed it using the macosvm CLI.
+
+First, we need to extract the UUID from the virtual machine that you 
installed. You can do this
+by running the following shell script:
+
+.. code-block:: bash
+  :caption: uuid.sh script to extract the UUID from a macosvm.json file
+
+  #!/bin/bash
+
+  MID=$(cat "$1" | python3 -c 'import 
json,sys;obj=json.load(sys.stdin);print(obj["machineId"]);')
+  echo "$MID" | base64 -d | plutil -extract ECID raw -
+
+Now we also need to trim the aux partition. It contains metadata that we can 
just discard:
+
+.. code-block:: bash
+  :caption: Command to trim the aux file
+
+  $ dd if="aux.img" of="aux.img.trimmed" bs=$(( 0x4000 )) skip=1
+
+How to run
+--
+
+Then, we can launch QEMU with the Virtualization.Framework pre-boot 
environment and the readily
+installed target disk images. I recommend to port forward the VM's ssh and vnc 
ports to the host
+to get better interactive access into the target system:
+
+.. code-block:: bash
+  :caption: Example execution command line
+
+  $ UUID=$(uuid.sh macosvm.json)
+  $ 
AVPBOOTER=/System/Library/Frameworks/Virtualization.framework/Resources/AVPBooter.vmapple2.bin
+  $ AUX=aux.img.trimmed
+  $ DISK=disk.img
+  $ qemu-system-aarch64 \
+   -serial mon:stdio \
+   -m 4G \
+   -accel hvf \
+   -M vmapple,uuid=$UUID \
+   -bios $AVPBOOTER \
+-drive file="$AUX",if=pflash,format=raw \
+-drive file="$DISK",if=pflash,format=raw \
+   -drive file="$AUX",if=none,id=aux,format=raw \
+   -drive file="$DISK",if=none,id=root,format=raw \
+   -device vmapple-virtio-aux,drive=aux \
+   -device vmapple-virtio-root,drive=root \
+   -net user,ipv6=off,hostfwd=tcp::-:22,hostfwd=tcp::5901-:5900 \
+   -net nic,model=virtio \
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 790ac1b8a2..bf663df4a6 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
arm/stellaris
arm/stm3

[PATCH v2 00/12] Introduce new vmapple machine type

2023-08-30 Thread Alexander Graf
This patch set introduces a new ARM and HVF specific machine type
called "vmapple". It mimicks the device model that Apple's proprietary
Virtualization.Framework exposes, but implements it in QEMU.

With this new machine type, you can run macOS guests on Apple Silicon
systems via HVF. To do so, you need to first install macOS using
Virtualization.Framework onto a virtual disk image using a tool like
macosvm (https://github.com/s-u/macosvm)

  $ macosvm --disk disk.img,size=32g --aux aux.img \
--restore UniversalMac_12.0.1_21A559_Restore.ipsw vm.json

Then, extract the ECID from the installed VM:

  $ cat "$DIR/macosvm.json" | python3 -c \
  'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]) |\
  base64 -d | plutil -extract ECID raw -

In addition, cut off the first 16kb of the aux.img:

  $ dd if=aux.img of=aux.img.trimmed bs=$(( 0x4000 )) skip=1

Now, you can just launch QEMU with the bits generated above:

  $ qemu-system-aarch64 -serial mon:stdio\
  -m 4G  \
  -M vmapple,uuid=6240349656165161789\
  -bios /Sys*/Lib*/Fra*/Virtualization.f*/R*/AVPBooter.vmapple2.bin  \
  -pflash aux.img.trimmed\
  -pflash disk.img   \
  -drive file=disk.img,if=none,id=root   \
  -device vmapple-virtio-root,drive=root \
  -drive file=aux.img.trimmed,if=none,id=aux \
  -device vmapple-virtio-aix,drive=aux   \
  -accel hvf

There are a few limitations with this implementation:

  - Only runs on macOS because it relies on
ParavirtualizesGraphics.Framework
  - Something is not fully correct on interrupt delivery or
similar - the keyboard does not work
  - No Rosetta in the guest because we lack the private
entitlement to enable TSO

Over time, I hope that some of the limitations above could cease to exist.
This device model would enable very nice use cases with KVM on an Asahi
Linux device.

Please beware that the vmapple device model only works with macOS 12 guests
for now. Newer guests run into Hypervisor.Framework incompatibilities.

---

v1 -> v2:

  - Adapt to system_ss meson.build target
  - Add documentation
  - Rework virtio-blk patch to make all vmapple virtio-blk logic subclasses
  - Add log message on write
  - Move max slot number to define
  - Use SPDX header
  - Remove useless includes

Alexander Graf (12):
  build: Only define OS_OBJECT_USE_OBJC with gcc
  hw/misc/pvpanic: Add MMIO interface
  hvf: Increase number of possible memory slots
  hvf: arm: Ignore writes to CNTP_CTL_EL0
  hw: Add vmapple subdir
  gpex: Allow more than 4 legacy IRQs
  hw/vmapple/aes: Introduce aes engine
  hw/vmapple/bdif: Introduce vmapple backdoor interface
  hw/vmapple/cfg: Introduce vmapple cfg region
  hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework
support
  hw/vmapple/virtio-blk: Add support for apple virtio-blk
  hw/vmapple/vmapple: Add vmapple machine type

 MAINTAINERS |   7 +
 docs/system/arm/vmapple.rst |  68 
 docs/system/target-arm.rst  |   1 +
 meson.build |   9 +-
 hw/vmapple/trace.h  |   1 +
 include/hw/misc/pvpanic.h   |   1 +
 include/hw/pci-host/gpex.h  |   7 +-
 include/hw/pci/pci_ids.h|   1 +
 include/hw/virtio/virtio-blk.h  |  11 +-
 include/hw/vmapple/bdif.h   |  31 ++
 include/hw/vmapple/cfg.h|  68 
 include/hw/vmapple/virtio-blk.h |  39 ++
 include/sysemu/hvf_int.h|   4 +-
 accel/hvf/hvf-accel-ops.c   |   2 +-
 hw/arm/sbsa-ref.c   |   2 +-
 hw/arm/virt.c   |   2 +-
 hw/block/virtio-blk.c   |  18 +-
 hw/i386/microvm.c   |   2 +-
 hw/loongarch/virt.c |   2 +-
 hw/mips/loongson3_virt.c|   2 +-
 hw/misc/pvpanic-mmio.c  |  61 +++
 hw/openrisc/virt.c  |  12 +-
 hw/pci-host/gpex.c  |  36 +-
 hw/riscv/virt.c |  12 +-
 hw/vmapple/aes.c| 583 
 hw/vmapple/bdif.c   | 245 
 hw/vmapple/cfg.c| 105 +
 hw/vmapple/virtio-blk.c | 212 ++
 hw/vmapple/vmapple.c| 661 
 hw/xtensa/virt.c|   2 +-
 target/arm/hvf/hvf.c|   9 +
 hw/Kconfig  |   1 +
 hw/meson.build  |   1 +
 hw/misc/Kconfig |   4 +
 hw/misc/meson.build |   1 +
 hw/vmapple/Kconfig  |  33 ++
 hw/vmapple/apple-gfx.m  | 578 
 hw/vmapple/meson.build  |   6 +
 hw/vmapple/trace

[PATCH v2 09/12] hw/vmapple/cfg: Introduce vmapple cfg region

2023-08-30 Thread Alexander Graf
Instead of device tree or other more standardized means, VMApple passes
platform configuration to the first stage boot loader in a binary encoded
format that resides at a dedicated RAM region in physical address space.

This patch models this configuration space as a qdev device which we can
then map at the fixed location in the address space. That way, we can
influence and annotate all configuration fields easily.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Adapt to system_ss meson.build target
---
 include/hw/vmapple/cfg.h |  68 +
 hw/vmapple/cfg.c | 105 +++
 hw/vmapple/Kconfig   |   3 ++
 hw/vmapple/meson.build   |   1 +
 4 files changed, 177 insertions(+)
 create mode 100644 include/hw/vmapple/cfg.h
 create mode 100644 hw/vmapple/cfg.c

diff --git a/include/hw/vmapple/cfg.h b/include/hw/vmapple/cfg.h
new file mode 100644
index 00..3337064e44
--- /dev/null
+++ b/include/hw/vmapple/cfg.h
@@ -0,0 +1,68 @@
+/*
+ * VMApple Configuration Region
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VMAPPLE_CFG_H
+#define HW_VMAPPLE_CFG_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+#include "net/net.h"
+
+typedef struct VMAppleCfg {
+uint32_t version; /* 0x000 */
+uint32_t nr_cpus; /* 0x004 */
+uint32_t unk1;/* 0x008 */
+uint32_t unk2;/* 0x00c */
+uint32_t unk3;/* 0x010 */
+uint32_t unk4;/* 0x014 */
+uint64_t ecid;/* 0x018 */
+uint64_t ram_size;/* 0x020 */
+uint32_t run_installer1;  /* 0x028 */
+uint32_t unk5;/* 0x02c */
+uint32_t unk6;/* 0x030 */
+uint32_t run_installer2;  /* 0x034 */
+uint32_t rnd; /* 0x038 */
+uint32_t unk7;/* 0x03c */
+MACAddr mac_en0;  /* 0x040 */
+uint8_t pad1[2];
+MACAddr mac_en1;  /* 0x048 */
+uint8_t pad2[2];
+MACAddr mac_wifi0;/* 0x050 */
+uint8_t pad3[2];
+MACAddr mac_bt0;  /* 0x058 */
+uint8_t pad4[2];
+uint8_t reserved[0xa0];   /* 0x060 */
+uint32_t cpu_ids[0x80];   /* 0x100 */
+uint8_t scratch[0x200];   /* 0x180 */
+char serial[32];  /* 0x380 */
+char unk8[32];/* 0x3a0 */
+char model[32];   /* 0x3c0 */
+uint8_t unk9[32]; /* 0x3e0 */
+uint32_t unk10;   /* 0x400 */
+char soc_name[32];/* 0x404 */
+} VMAppleCfg;
+
+#define TYPE_VMAPPLE_CFG "vmapple-cfg"
+OBJECT_DECLARE_SIMPLE_TYPE(VMAppleCfgState, VMAPPLE_CFG)
+
+struct VMAppleCfgState {
+/*  */
+SysBusDevice parent_obj;
+VMAppleCfg cfg;
+
+/*  */
+MemoryRegion mem;
+char *serial;
+char *model;
+char *soc_name;
+};
+
+#define VMAPPLE_CFG_SIZE 0x0001
+
+#endif /* HW_VMAPPLE_CFG_H */
diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c
new file mode 100644
index 00..d48e3c3afa
--- /dev/null
+++ b/hw/vmapple/cfg.c
@@ -0,0 +1,105 @@
+/*
+ * VMApple Configuration Region
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vmapple/cfg.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+
+static void vmapple_cfg_reset(DeviceState *dev)
+{
+VMAppleCfgState *s = VMAPPLE_CFG(dev);
+VMAppleCfg *cfg;
+
+cfg = memory_region_get_ram_ptr(&s->mem);
+memset((void *)cfg, 0, VMAPPLE_CFG_SIZE);
+*cfg = s->cfg;
+}
+
+static void vmapple_cfg_realize(DeviceState *dev, Error **errp)
+{
+VMAppleCfgState *s = VMAPPLE_CFG(dev);
+uint32_t i;
+
+strncpy(s->cfg.serial, s->serial, sizeof(s->cfg.serial));
+strncpy(s->cfg.model, s->model, sizeof(s->cfg.model));
+strncpy(s->cfg.soc_name, s->soc_name, sizeof(s->cfg.soc_name));
+strncpy(s->cfg.unk8, "D/A", sizeof(s->cfg.soc_name));
+s->cfg.ecid = cpu_to_be64(s->cfg.ecid);
+s->cfg.version = 2;
+s->cfg.unk1 = 1;
+s->cfg.unk2 = 1;
+s->cfg.unk3 = 0x20;
+s->cfg.unk4 = 0;
+s->cfg.unk5 = 1;
+s->cfg.unk6 = 1;
+s->cfg.unk7 = 0;
+s->cfg.unk10 = 1;
+
+g_assert(s->cfg.nr_cpus < ARRAY_SIZE(s->cfg.cpu_ids));
+for (i = 0; i < s->cfg.nr_cpus; i++) {
+s->cfg.cpu_ids[i] = i;
+}
+}
+
+static void vmapple_cfg_init(Object *obj)
+{
+VMAppleCfgState *s = VMAPPLE_CFG(obj);
+
+memory_region_init_ram(&s->mem, obj, "VMApple Config", VMAPPLE_CFG_SI

[PATCH v2 08/12] hw/vmapple/bdif: Introduce vmapple backdoor interface

2023-08-30 Thread Alexander Graf
The VMApple machine exposes AUX and ROOT block devices (as well as USB OTG
emulation) via virtio-pci as well as a special, simple backdoor platform
device.

This patch implements this backdoor platform device to the best of my
understanding. I left out any USB OTG parts; they're only needed for
guest recovery and I don't understand the protocol yet.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Adapt to system_ss meson.build target
---
 include/hw/vmapple/bdif.h |  31 +
 hw/vmapple/bdif.c | 245 ++
 hw/vmapple/Kconfig|   2 +
 hw/vmapple/meson.build|   1 +
 hw/vmapple/trace-events   |   5 +
 5 files changed, 284 insertions(+)
 create mode 100644 include/hw/vmapple/bdif.h
 create mode 100644 hw/vmapple/bdif.c

diff --git a/include/hw/vmapple/bdif.h b/include/hw/vmapple/bdif.h
new file mode 100644
index 00..65ee43457b
--- /dev/null
+++ b/include/hw/vmapple/bdif.h
@@ -0,0 +1,31 @@
+/*
+ * VMApple Backdoor Interface
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VMAPPLE_BDIF_H
+#define HW_VMAPPLE_BDIF_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+
+#define TYPE_VMAPPLE_BDIF "vmapple-bdif"
+OBJECT_DECLARE_SIMPLE_TYPE(VMAppleBdifState, VMAPPLE_BDIF)
+
+struct VMAppleBdifState {
+/*  */
+SysBusDevice parent_obj;
+
+/*  */
+BlockBackend *aux;
+BlockBackend *root;
+MemoryRegion mmio;
+};
+
+#define VMAPPLE_BDIF_SIZE 0x0020
+
+#endif /* HW_VMAPPLE_BDIF_H */
diff --git a/hw/vmapple/bdif.c b/hw/vmapple/bdif.c
new file mode 100644
index 00..36b5915ff3
--- /dev/null
+++ b/hw/vmapple/bdif.c
@@ -0,0 +1,245 @@
+/*
+ * VMApple Backdoor Interface
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vmapple/bdif.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "trace.h"
+#include "hw/block/block.h"
+#include "sysemu/block-backend.h"
+
+#define REG_DEVID_MASK  0x
+#define DEVID_ROOT  0x
+#define DEVID_AUX   0x0001
+#define DEVID_USB   0x0010
+
+#define REG_STATUS  0x0
+#define REG_STATUS_ACTIVE BIT(0)
+#define REG_CFG 0x4
+#define REG_CFG_ACTIVEBIT(1)
+#define REG_UNK10x8
+#define REG_BUSY0x10
+#define REG_BUSY_READYBIT(0)
+#define REG_UNK20x400
+#define REG_CMD 0x408
+#define REG_NEXT_DEVICE 0x420
+#define REG_UNK30x434
+
+typedef struct vblk_sector {
+uint32_t pad;
+uint32_t pad2;
+uint32_t sector;
+uint32_t pad3;
+} VblkSector;
+
+typedef struct vblk_req_cmd {
+uint64_t addr;
+uint32_t len;
+uint32_t flags;
+} VblkReqCmd;
+
+typedef struct vblk_req {
+VblkReqCmd sector;
+VblkReqCmd data;
+VblkReqCmd retval;
+} VblkReq;
+
+#define VBLK_DATA_FLAGS_READ  0x00030001
+#define VBLK_DATA_FLAGS_WRITE 0x00010001
+
+#define VBLK_RET_SUCCESS  0
+#define VBLK_RET_FAILED   1
+
+static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t ret = -1;
+uint64_t devid = (offset & REG_DEVID_MASK);
+
+switch (offset & ~REG_DEVID_MASK) {
+case REG_STATUS:
+ret = REG_STATUS_ACTIVE;
+break;
+case REG_CFG:
+ret = REG_CFG_ACTIVE;
+break;
+case REG_UNK1:
+ret = 0x420;
+break;
+case REG_BUSY:
+ret = REG_BUSY_READY;
+break;
+case REG_UNK2:
+ret = 0x1;
+break;
+case REG_UNK3:
+ret = 0x0;
+break;
+case REG_NEXT_DEVICE:
+switch (devid) {
+case DEVID_ROOT:
+ret = 0x800;
+break;
+case DEVID_AUX:
+ret = 0x1;
+break;
+}
+break;
+}
+
+trace_bdif_read(offset, size, ret);
+return ret;
+}
+
+static void le2cpu_sector(VblkSector *sector)
+{
+sector->sector = le32_to_cpu(sector->sector);
+}
+
+static void le2cpu_reqcmd(VblkReqCmd *cmd)
+{
+cmd->addr = le64_to_cpu(cmd->addr);
+cmd->len = le32_to_cpu(cmd->len);
+cmd->flags = le32_to_cpu(cmd->flags);
+}
+
+static void le2cpu_req(VblkReq *req)
+{
+le2cpu_reqcmd(&req->sector);
+le2cpu_reqcmd(&req->data);
+le2cpu_reqcmd(&req->retval);
+}
+
+static void vblk_cmd(uint64_t devid, BlockBackend *blk, uint64_t value,
+ uint64_t static_off)
+{
+VblkReq req;
+VblkSector sector;
+uint64_t off = 0;
+ 

[PATCH v2 10/12] hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework support

2023-08-30 Thread Alexander Graf
MacOS provides a framework (library) that allows any vmm to implement a
paravirtualized 3d graphics passthrough to the host metal stack called
ParavirtualizedGraphics.Framework (PVG). The library abstracts away
almost every aspect of the paravirtualized device model and only provides
and receives callbacks on MMIO access as well as to share memory address
space between the VM and PVG.

This patch implements a QEMU device that drives PVG for the VMApple
variant of it.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Adapt to system_ss meson.build target
---
 meson.build |   4 +
 hw/vmapple/Kconfig  |   3 +
 hw/vmapple/apple-gfx.m  | 578 
 hw/vmapple/meson.build  |   1 +
 hw/vmapple/trace-events |  22 ++
 5 files changed, 608 insertions(+)
 create mode 100644 hw/vmapple/apple-gfx.m

diff --git a/meson.build b/meson.build
index dc5242a5f4..d34310b5eb 100644
--- a/meson.build
+++ b/meson.build
@@ -607,6 +607,8 @@ socket = []
 version_res = []
 coref = []
 iokit = []
+pvg = []
+metal = []
 emulator_link_args = []
 nvmm =not_found
 hvf = not_found
@@ -630,6 +632,8 @@ elif targetos == 'darwin'
   coref = dependency('appleframeworks', modules: 'CoreFoundation')
   iokit = dependency('appleframeworks', modules: 'IOKit', required: false)
   host_dsosuf = '.dylib'
+  pvg = dependency('appleframeworks', modules: 'ParavirtualizedGraphics')
+  metal = dependency('appleframeworks', modules: 'Metal')
 elif targetos == 'sunos'
   socket = [cc.find_library('socket'),
 cc.find_library('nsl'),
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index 542426a740..ba37fc5b81 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -6,3 +6,6 @@ config VMAPPLE_BDIF
 
 config VMAPPLE_CFG
 bool
+
+config VMAPPLE_PVG
+bool
diff --git a/hw/vmapple/apple-gfx.m b/hw/vmapple/apple-gfx.m
new file mode 100644
index 00..97dd2cd9ae
--- /dev/null
+++ b/hw/vmapple/apple-gfx.m
@@ -0,0 +1,578 @@
+/*
+ * QEMU Apple ParavirtualizedGraphics.framework device
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
+ * which implements 3d graphics passthrough to the host as well as a
+ * proprietary guest communication channel to drive it. This device model
+ * implements support to drive that library from within QEMU.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "hw/pci/msi.h"
+#include "crypto/hash.h"
+#include "sysemu/cpus.h"
+#include "ui/console.h"
+#include "monitor/monitor.h"
+#import 
+
+#define TYPE_APPLE_GFX  "apple-gfx"
+
+#define MAX_MRS 512
+
+static const PGDisplayCoord_t apple_gfx_modes[] = {
+{ .x = 1440, .y = 1080 },
+{ .x = 1280, .y = 1024 },
+};
+
+/*
+ * We have to map PVG memory into our address space. Use the one below
+ * as base start address. In normal linker setups it points to a free
+ * memory range.
+ */
+#define APPLE_GFX_BASE_VA ((void *)(uintptr_t)0x5000UL)
+
+/*
+ * ParavirtualizedGraphics.Framework only ships header files for the x86
+ * variant which does not include IOSFC descriptors and host devices. We add
+ * their definitions here so that we can also work with the ARM version.
+ */
+typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector);
+typedef bool(^IOSFCUnmapMemory)(void *a, void *b, void *c, void *d, void *e, 
void *f);
+typedef bool(^IOSFCMapMemory)(uint64_t phys, uint64_t len, bool ro, void **va, 
void *e, void *f);
+
+@interface PGDeviceDescriptorExt : PGDeviceDescriptor
+@property (readwrite, nonatomic) bool usingIOSurfaceMapper;
+@end
+
+@interface PGIOSurfaceHostDeviceDescriptor : NSObject
+-(PGIOSurfaceHostDeviceDescriptor *)init;
+@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory mapMemory;
+@property (readwrite, nonatomic, copy, nullable) IOSFCUnmapMemory unmapMemory;
+@property (readwrite, nonatomic, copy, nullable) IOSFCRaiseInterrupt 
raiseInterrupt;
+@end
+
+@interface PGIOSurfaceHostDevice : NSObject
+-(void)initWithDescriptor:(PGIOSurfaceHostDeviceDescriptor *) desc;
+-(uint32_t)mmioReadAtOffset:(size_t) offset;
+-(void)mmioWriteAtOffset:(size_t) offset value:(uint32_t)value;
+@end
+
+typedef struct AppleGFXMR {
+QTAILQ_ENTRY(AppleGFXMR) node;
+hwaddr pa;
+void *va;
+uint64_t len;
+} AppleGFXMR;
+
+typedef QTAILQ_HEAD(, AppleGFXMR) AppleGFXMRList;
+
+typedef struct AppleGFXTask {
+QTAILQ_ENTRY(Apple

[PATCH v2 02/12] hw/misc/pvpanic: Add MMIO interface

2023-08-30 Thread Alexander Graf
In addition to the ISA and PCI variants of pvpanic, let's add an MMIO
platform device that we can use in embedded arm environments.

Signed-off-by: Alexander Graf 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

---

v1 -> v2:

  - Use SPDX header
  - Remove useless includes
  - Adapt to new meson.build target (system_ss)
---
 include/hw/misc/pvpanic.h |  1 +
 hw/misc/pvpanic-mmio.c| 61 +++
 hw/misc/Kconfig   |  4 +++
 hw/misc/meson.build   |  1 +
 4 files changed, 67 insertions(+)
 create mode 100644 hw/misc/pvpanic-mmio.c

diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
index fab94165d0..f9e7c1ea17 100644
--- a/include/hw/misc/pvpanic.h
+++ b/include/hw/misc/pvpanic.h
@@ -20,6 +20,7 @@
 
 #define TYPE_PVPANIC_ISA_DEVICE "pvpanic"
 #define TYPE_PVPANIC_PCI_DEVICE "pvpanic-pci"
+#define TYPE_PVPANIC_MMIO_DEVICE "pvpanic-mmio"
 
 #define PVPANIC_IOPORT_PROP "ioport"
 
diff --git a/hw/misc/pvpanic-mmio.c b/hw/misc/pvpanic-mmio.c
new file mode 100644
index 00..99a24f104c
--- /dev/null
+++ b/hw/misc/pvpanic-mmio.c
@@ -0,0 +1,61 @@
+/*
+ * QEMU simulated pvpanic device (MMIO frontend)
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/misc/pvpanic.h"
+#include "hw/sysbus.h"
+#include "standard-headers/linux/pvpanic.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(PVPanicMMIOState, PVPANIC_MMIO_DEVICE)
+
+#define PVPANIC_MMIO_SIZE 0x2
+
+struct PVPanicMMIOState {
+SysBusDevice parent_obj;
+
+PVPanicState pvpanic;
+};
+
+static void pvpanic_mmio_initfn(Object *obj)
+{
+PVPanicMMIOState *s = PVPANIC_MMIO_DEVICE(obj);
+
+pvpanic_setup_io(&s->pvpanic, DEVICE(s), PVPANIC_MMIO_SIZE);
+sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->pvpanic.mr);
+}
+
+static Property pvpanic_mmio_properties[] = {
+DEFINE_PROP_UINT8("events", PVPanicMMIOState, pvpanic.events,
+  PVPANIC_PANICKED | PVPANIC_CRASH_LOADED),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pvpanic_mmio_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+device_class_set_props(dc, pvpanic_mmio_properties);
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo pvpanic_mmio_info = {
+.name  = TYPE_PVPANIC_MMIO_DEVICE,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(PVPanicMMIOState),
+.instance_init = pvpanic_mmio_initfn,
+.class_init= pvpanic_mmio_class_init,
+};
+
+static void pvpanic_register_types(void)
+{
+type_register_static(&pvpanic_mmio_info);
+}
+
+type_init(pvpanic_register_types)
diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 6996d265e4..b69746a60a 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -125,6 +125,10 @@ config PVPANIC_ISA
 depends on ISA_BUS
 select PVPANIC_COMMON
 
+config PVPANIC_MMIO
+bool
+select PVPANIC_COMMON
+
 config AUX
 bool
 select I2C
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 892f8b91c5..63821d6040 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -116,6 +116,7 @@ system_ss.add(when: 'CONFIG_ARMSSE_MHU', if_true: 
files('armsse-mhu.c'))
 
 system_ss.add(when: 'CONFIG_PVPANIC_ISA', if_true: files('pvpanic-isa.c'))
 system_ss.add(when: 'CONFIG_PVPANIC_PCI', if_true: files('pvpanic-pci.c'))
+system_ss.add(when: 'CONFIG_PVPANIC_MMIO', if_true: files('pvpanic-mmio.c'))
 system_ss.add(when: 'CONFIG_AUX', if_true: files('auxbus.c'))
 system_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files(
   'aspeed_hace.c',
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




[PATCH v2 03/12] hvf: Increase number of possible memory slots

2023-08-30 Thread Alexander Graf
For PVG we will need more than the current 32 possible memory slots.
Bump the limit to 512 instead.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Move max slot number to define
---
 include/sysemu/hvf_int.h  | 4 +++-
 accel/hvf/hvf-accel-ops.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index 718beddcdd..36aa9b4eff 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -17,6 +17,8 @@
 #include 
 #endif
 
+#define HVF_MAX_SLOTS 512
+
 /* hvf_slot flags */
 #define HVF_SLOT_LOG (1 << 0)
 
@@ -40,7 +42,7 @@ typedef struct hvf_vcpu_caps {
 
 struct HVFState {
 AccelState parent;
-hvf_slot slots[32];
+hvf_slot slots[HVF_MAX_SLOTS];
 int num_slots;
 
 hvf_vcpu_caps *hvf_caps;
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 3c94c79747..7aee0d6f72 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -88,7 +88,7 @@ struct mac_slot {
 uint64_t gva;
 };
 
-struct mac_slot mac_slots[32];
+struct mac_slot mac_slots[HVF_MAX_SLOTS];
 
 static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 {
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH v2 05/12] hw: Add vmapple subdir

2023-08-30 Thread Alexander Graf
We will introduce a number of devices that are specific to the vmapple
target machine. To keep them all tidily together, let's put them into
a single target directory.

Signed-off-by: Alexander Graf 
---
 MAINTAINERS | 6 ++
 meson.build | 1 +
 hw/vmapple/trace.h  | 1 +
 hw/Kconfig  | 1 +
 hw/meson.build  | 1 +
 hw/vmapple/Kconfig  | 1 +
 hw/vmapple/meson.build  | 0
 hw/vmapple/trace-events | 2 ++
 8 files changed, 13 insertions(+)
 create mode 100644 hw/vmapple/trace.h
 create mode 100644 hw/vmapple/Kconfig
 create mode 100644 hw/vmapple/meson.build
 create mode 100644 hw/vmapple/trace-events

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..3104e58eff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2573,6 +2573,12 @@ F: hw/usb/canokey.c
 F: hw/usb/canokey.h
 F: docs/system/devices/canokey.rst
 
+VMapple
+M: Alexander Graf 
+S: Maintained
+F: hw/vmapple/*
+F: include/hw/vmapple/*
+
 Subsystems
 --
 Overall Audio backends
diff --git a/meson.build b/meson.build
index 0d6a0015a1..dc5242a5f4 100644
--- a/meson.build
+++ b/meson.build
@@ -3282,6 +3282,7 @@ if have_system
 'hw/usb',
 'hw/vfio',
 'hw/virtio',
+'hw/vmapple',
 'hw/watchdog',
 'hw/xen',
 'hw/gpio',
diff --git a/hw/vmapple/trace.h b/hw/vmapple/trace.h
new file mode 100644
index 00..572adbefe0
--- /dev/null
+++ b/hw/vmapple/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_vmapple.h"
diff --git a/hw/Kconfig b/hw/Kconfig
index ba62ff6417..d99854afdd 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source tpm/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source vmapple/Kconfig
 source xen/Kconfig
 source watchdog/Kconfig
 
diff --git a/hw/meson.build b/hw/meson.build
index c7ac7d3d75..e156a6618f 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -40,6 +40,7 @@ subdir('tpm')
 subdir('usb')
 subdir('vfio')
 subdir('virtio')
+subdir('vmapple')
 subdir('watchdog')
 subdir('xen')
 subdir('xenpv')
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
new file mode 100644
index 00..8b13789179
--- /dev/null
+++ b/hw/vmapple/Kconfig
@@ -0,0 +1 @@
+
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
new file mode 100644
index 00..e69de29bb2
diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
new file mode 100644
index 00..9ccc579048
--- /dev/null
+++ b/hw/vmapple/trace-events
@@ -0,0 +1,2 @@
+# See docs/devel/tracing.rst for syntax documentation.
+
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH v2 04/12] hvf: arm: Ignore writes to CNTP_CTL_EL0

2023-08-30 Thread Alexander Graf
MacOS unconditionally disables interrupts of the physical timer on boot
and then continues to use the virtual one. We don't really want to support
a full physical timer emulation, so let's just ignore those writes.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Add log message on write
---
 target/arm/hvf/hvf.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 486f90be1d..02db3dc908 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
 
 #include "sysemu/runstate.h"
 #include "sysemu/hvf.h"
@@ -179,6 +180,7 @@ void hvf_arm_init_debug(void)
 #define SYSREG_OSLSR_EL1  SYSREG(2, 0, 1, 1, 4)
 #define SYSREG_OSDLR_EL1  SYSREG(2, 0, 1, 3, 4)
 #define SYSREG_CNTPCT_EL0 SYSREG(3, 3, 14, 0, 1)
+#define SYSREG_CNTP_CTL_EL0   SYSREG(3, 3, 14, 2, 1)
 #define SYSREG_PMCR_EL0   SYSREG(3, 3, 9, 12, 0)
 #define SYSREG_PMUSERENR_EL0  SYSREG(3, 3, 9, 14, 0)
 #define SYSREG_PMCNTENSET_EL0 SYSREG(3, 3, 9, 12, 1)
@@ -1551,6 +1553,13 @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, 
uint64_t val)
 case SYSREG_OSLAR_EL1:
 env->cp15.oslsr_el1 = val & 1;
 break;
+case SYSREG_CNTP_CTL_EL0:
+/*
+ * Guests should not rely on the physical counter, but macOS emits
+ * disable writes to it. Let it do so, but ignore the requests.
+ */
+qemu_log_mask(LOG_UNIMP, "Unsupported write to CNTP_CTL_EL0\n");
+break;
 case SYSREG_OSDLR_EL1:
 /* Dummy register */
 break;
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH v2 11/12] hw/vmapple/virtio-blk: Add support for apple virtio-blk

2023-08-30 Thread Alexander Graf
Apple has its own virtio-blk PCI device ID where it deviates from the
official virtio-pci spec slightly: It puts a new "apple type"
field at a static offset in config space and introduces a new barrier
command.

This patch first creates a mechanism for virtio-blk downstream classes to
handle unknown commands. It then creates such a downstream class and a new
vmapple-virtio-blk-pci class which support the additional apple type config
identifier as well as the barrier command.

It then exposes 2 subclasses from that that we can use to expose root and
aux virtio-blk devices: "vmapple-virtio-root" and "vmapple-virtio-aux".

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Rework to make all vmapple virtio-blk logic a subclass
---
 include/hw/pci/pci_ids.h|   1 +
 include/hw/virtio/virtio-blk.h  |  12 +-
 include/hw/vmapple/virtio-blk.h |  39 ++
 hw/block/virtio-blk.c   |  19 ++-
 hw/vmapple/virtio-blk.c | 212 
 hw/vmapple/Kconfig  |   3 +
 hw/vmapple/meson.build  |   1 +
 7 files changed, 282 insertions(+), 5 deletions(-)
 create mode 100644 include/hw/vmapple/virtio-blk.h
 create mode 100644 hw/vmapple/virtio-blk.c

diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index e4386ebb20..74e589a298 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -188,6 +188,7 @@
 #define PCI_DEVICE_ID_APPLE_UNI_N_AGP0x0020
 #define PCI_DEVICE_ID_APPLE_U3_AGP   0x004b
 #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC   0x0021
+#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK   0x1a00
 
 #define PCI_VENDOR_ID_SUN0x108e
 #define PCI_DEVICE_ID_SUN_EBUS   0x1000
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index dafec432ce..381a906410 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -23,7 +23,7 @@
 #include "qom/object.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
-OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK)
+OBJECT_DECLARE_TYPE(VirtIOBlock, VirtIOBlkClass, VIRTIO_BLK)
 
 /* This is the last element of the write scatter-gather list */
 struct virtio_blk_inhdr
@@ -91,6 +91,16 @@ typedef struct MultiReqBuffer {
 bool is_write;
 } MultiReqBuffer;
 
+typedef struct VirtIOBlkClass {
+/*< private >*/
+VirtioDeviceClass parent;
+/*< public >*/
+bool (*handle_unknown_request)(VirtIOBlockReq *req, MultiReqBuffer *mrb,
+   uint32_t type);
+} VirtIOBlkClass;
+
 void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
+void virtio_blk_free_request(VirtIOBlockReq *req);
+void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status);
 
 #endif
diff --git a/include/hw/vmapple/virtio-blk.h b/include/hw/vmapple/virtio-blk.h
new file mode 100644
index 00..b23106a3df
--- /dev/null
+++ b/include/hw/vmapple/virtio-blk.h
@@ -0,0 +1,39 @@
+/*
+ * VMApple specific VirtIO Block implementation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VMAPPLE_CFG_H
+#define HW_VMAPPLE_CFG_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+#include "hw/virtio/virtio-pci.h"
+#include "hw/virtio/virtio-blk.h"
+
+#define TYPE_VMAPPLE_VIRTIO_BLK "vmapple-virtio-blk"
+#define TYPE_VMAPPLE_VIRTIO_ROOT "vmapple-virtio-root"
+#define TYPE_VMAPPLE_VIRTIO_AUX "vmapple-virtio-aux"
+
+OBJECT_DECLARE_TYPE(VMAppleVirtIOBlk, VMAppleVirtIOBlkClass, 
VMAPPLE_VIRTIO_BLK)
+
+typedef struct VMAppleVirtIOBlkClass {
+/*< private >*/
+VirtIOBlkClass parent;
+/*< public >*/
+void (*get_config)(VirtIODevice *vdev, uint8_t *config);
+} VMAppleVirtIOBlkClass;
+
+typedef struct VMAppleVirtIOBlk {
+/*  */
+VirtIOBlock parent_obj;
+
+/*  */
+uint32_t apple_type;
+} VMAppleVirtIOBlk;
+
+#endif /* HW_VMAPPLE_CFG_H */
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 39e7f23fab..1645cdccbe 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -48,12 +48,12 @@ static void virtio_blk_init_request(VirtIOBlock *s, 
VirtQueue *vq,
 req->mr_next = NULL;
 }
 
-static void virtio_blk_free_request(VirtIOBlockReq *req)
+void virtio_blk_free_request(VirtIOBlockReq *req)
 {
 g_free(req);
 }
 
-static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
+void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
 {
 VirtIOBlock *s = req->dev;
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
@@ -1121,8 +1121,18 @@ static int virtio_blk_handle_request(VirtIOBlockReq 
*req, MultiReqBuffer *mrb)
 break;
 }
 default:
-virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-virtio_b

[PATCH v2 06/12] gpex: Allow more than 4 legacy IRQs

2023-08-30 Thread Alexander Graf
Some boards such as vmapple don't do real legacy PCI IRQ swizzling.
Instead, they just keep allocating more board IRQ lines for each new
legacy IRQ. Let's support that mode by giving instantiators a new
"nr_irqs" property they can use to support more than 4 legacy IRQ lines.
In this mode, GPEX will export more IRQ lines, one for each device.

Signed-off-by: Alexander Graf 
---
 include/hw/pci-host/gpex.h |  7 +++
 hw/arm/sbsa-ref.c  |  2 +-
 hw/arm/virt.c  |  2 +-
 hw/i386/microvm.c  |  2 +-
 hw/loongarch/virt.c|  2 +-
 hw/mips/loongson3_virt.c   |  2 +-
 hw/openrisc/virt.c | 12 ++--
 hw/pci-host/gpex.c | 36 +++-
 hw/riscv/virt.c| 12 ++--
 hw/xtensa/virt.c   |  2 +-
 10 files changed, 52 insertions(+), 27 deletions(-)

diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index b0240bd768..098dc4d1cc 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -32,8 +32,6 @@ OBJECT_DECLARE_SIMPLE_TYPE(GPEXHost, GPEX_HOST)
 #define TYPE_GPEX_ROOT_DEVICE "gpex-root"
 OBJECT_DECLARE_SIMPLE_TYPE(GPEXRootState, GPEX_ROOT_DEVICE)
 
-#define GPEX_NUM_IRQS 4
-
 struct GPEXRootState {
 /*< private >*/
 PCIDevice parent_obj;
@@ -51,8 +49,9 @@ struct GPEXHost {
 MemoryRegion io_mmio;
 MemoryRegion io_ioport_window;
 MemoryRegion io_mmio_window;
-qemu_irq irq[GPEX_NUM_IRQS];
-int irq_num[GPEX_NUM_IRQS];
+uint32_t nr_irqs;
+qemu_irq *irq;
+int *irq_num;
 
 bool allow_unmapped_accesses;
 };
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index bc89eb4806..a786849238 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -681,7 +681,7 @@ static void create_pcie(SBSAMachineState *sms)
 /* Map IO port space */
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
qdev_get_gpio_in(sms->gic, irq + i));
 gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a13c658bbf..3a4ef3adc2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1467,7 +1467,7 @@ static void create_pcie(VirtMachineState *vms)
 /* Map IO port space */
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
qdev_get_gpio_in(vms->gic, irq + i));
 gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 7227a2156c..9ca007b870 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -139,7 +139,7 @@ static void create_gpex(MicrovmMachineState *mms)
 mms->gpex.mmio64.base, mmio64_alias);
 }
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
x86ms->gsi[mms->gpex.irq + i]);
 }
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 2629128aed..36bfcea53b 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -533,7 +533,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, 
LoongArchMachineState *
 memory_region_add_subregion(get_system_memory(), VIRT_PCI_IO_BASE,
 pio_alias);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(d, i,
qdev_get_gpio_in(pch_pic, 16 + i));
 gpex_set_irq_num(GPEX_HOST(gpex_dev), i, 16 + i);
diff --git a/hw/mips/loongson3_virt.c b/hw/mips/loongson3_virt.c
index b74b358874..6d54f679eb 100644
--- a/hw/mips/loongson3_virt.c
+++ b/hw/mips/loongson3_virt.c
@@ -437,7 +437,7 @@ static inline void loongson3_virt_devices_init(MachineState 
*machine,
 virt_memmap[VIRT_PCIE_PIO].base, s->pio_alias);
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, virt_memmap[VIRT_PCIE_PIO].base);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 irq = qdev_get_gpio_in(pic, PCIE_IRQ_BASE + i);
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, irq);
 gpex_set_irq_num(GPEX_HOST(dev), i, PCIE_IRQ_BASE + i);
diff --git a/hw/openrisc/virt.c b/hw/openrisc/virt.c
index f8a68a6a6b..16a5676c4b 100644
--- a/hw/openrisc/virt.c
+++ b/hw/openrisc/virt.c
@@ -318,7 +318,7 @@ static void create_pcie_irq_map(void *fdt, char *nodename, 
int irq_base,
 {
 int pin, dev;
 uint32_t irq_map_stride = 0;
-uint32_t full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS * 6] = {};
+uint32_t full_irq_map[PCI_NUM_PINS * PCI_NUM_PINS * 6] = {};
 uint32_t *irq_map = ful

Re: [PATCH 12/12] hw/vmapple/vmapple: Add vmapple machine type

2023-08-30 Thread Alexander Graf


On 20.06.23 19:35, Bernhard Beschow wrote:



Am 14. Juni 2023 22:57:34 UTC schrieb Alexander Graf :

Apple defines a new "vmapple" machine type as part of its proprietary
macOS Virtualization.Framework vmm. This machine type is similar to the
virt one, but with subtle differences in base devices, a few special
vmapple device additions and a vastly different boot chain.

This patch reimplements this machine type in QEMU. To use it, you
have to have a readily installed version of macOS for VMApple,
run on macOS with -accel hvf, pass the Virtualization.Framework
boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
and pass aux and root volume as virtio drives. In addition, you also
need to find the machine UUID and pass that as -M vmapple,uuid= parameter:

$ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
-bios 
/System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin
-drive file=aux,if=pflash,format=raw \
-drive file=root,if=pflash,format=raw \
-drive file=aux,if=none,id=aux,format=raw \
-device virtio-blk-pci,drive=aux,x-apple-type=2 \
-drive file=root,if=none,id=root,format=raw \
-device virtio-blk-pci,drive=root,x-apple-type=1

With all these in place, you should be able to see macOS booting
successfully.

This documentation seems valuable for the QEMU manual. But AFAICS there is no 
documentation like this added to the QEMU manual in this series. This means that it'll 
get "lost". How about adding it, possibly in this patch?



Thanks, I love the idea :). Let me do that for v2!




Note that I'm not able to test this series. I'm just seeing the 
valuable-information-in-the-commit-message-which-will-get-lost pattern.


Signed-off-by: Alexander Graf 
---
hw/vmapple/Kconfig |  19 ++
hw/vmapple/meson.build |   1 +
hw/vmapple/vmapple.c   | 661 +
3 files changed, 681 insertions(+)
create mode 100644 hw/vmapple/vmapple.c

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index ba37fc5b81..7a2375dc95 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -9,3 +9,22 @@ config VMAPPLE_CFG

config VMAPPLE_PVG
 bool
+
+config VMAPPLE
+bool
+depends on ARM && HVF
+default y if ARM && HVF
+imply PCI_DEVICES
+select ARM_GIC
+select PLATFORM_BUS
+select PCI_EXPRESS
+select PCI_EXPRESS_GENERIC_BRIDGE
+select PL011 # UART
+select PL031 # RTC
+select PL061 # GPIO
+select GPIO_PWR
+select PVPANIC_MMIO
+select VMAPPLE_AES
+select VMAPPLE_BDIF
+select VMAPPLE_CFG
+select VMAPPLE_PVG
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
index 31fec87156..d732873d35 100644
--- a/hw/vmapple/meson.build
+++ b/hw/vmapple/meson.build
@@ -2,3 +2,4 @@ softmmu_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: 
files('aes.c'))
softmmu_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
softmmu_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
softmmu_ss.add(when: 'CONFIG_VMAPPLE_PVG',  if_true: [files('apple-gfx.m'), 
pvg, metal])
+specific_ss.add(when: 'CONFIG_VMAPPLE', if_true: files('vmapple.c'))
diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c
new file mode 100644
index 00..5d3fe54b96
--- /dev/null
+++ b/hw/vmapple/vmapple.c
@@ -0,0 +1,661 @@
+/*
+ * VMApple machine emulation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Is an "All Rights Reserved" wording compatible with the GPL?



IANAL. You will find the pattern commonly across the code base already. 
My understanding is that all rights are reserved, but additionally I 
grant you the permissions of the GPL.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH 05/12] hw/virtio: Add support for apple virtio-blk

2023-08-24 Thread Alexander Graf


On 16.06.23 13:48, Kevin Wolf wrote:


Am 15.06.2023 um 00:56 hat Alexander Graf geschrieben:

Apple has its own virtio-blk PCI device ID where it deviates from the
official virtio-pci spec slightly: It puts a new "apple type"
field at a static offset in config space and introduces a new discard
command.

In other words, it's a different device. We shouldn't try to
differentiate only with a property, but actually model it as a separate
device.



I agree and is what I tried at first, but how do I change behavior of a 
virtio-blk-pci subclass all the way down to its virtio-blk 
implementation which lives completely outside of the scope of the 
respective class?


The best thing I could come up with was the QEMU internal qom property 
x-apple-type. Happy to split them: Make the change of virtio-blk 
behavior depend on the property and make all of the PCI device/vendor 
swapping depend on a new class which then sets the x-apple-type.






This patch adds a new qdev property called "apple-type" to virtio-blk-pci.
When that property is set, we assume the virtio-blk device is an Apple one
of the specific type and act accordingly.

Do we have any information on what the number in "apple-type" actually
means or do we have to treat it as a black box?



I have ideas, but no documentation. It's an enum space that defines 
different types of devices (AUX device, root device, etc)






Signed-off-by: Alexander Graf 
---
  hw/block/virtio-blk.c   | 23 +
  hw/virtio/virtio-blk-pci.c  |  7 +++
  include/hw/pci/pci_ids.h|  1 +
  include/hw/virtio/virtio-blk.h  |  1 +
  include/standard-headers/linux/virtio_blk.h |  3 +++
  5 files changed, 35 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 39e7f23fab..76b85bb3cb 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1120,6 +1120,20 @@ static int virtio_blk_handle_request(VirtIOBlockReq 
*req, MultiReqBuffer *mrb)

  break;
  }
+case VIRTIO_BLK_T_APPLE1:

Can we have a more descriptive name?


+{
+if (s->conf.x_apple_type) {
+/* Only valid on Apple Virtio */
+char buf[iov_size(in_iov, in_num)];
+memset(buf, 0, sizeof(buf));
+iov_from_buf(in_iov, in_num, 0, buf, sizeof(buf));
+virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);

So this is a command that simply fills the guest buffer with zeros
without accessing the disk content? Weird, but ok, if that's what they
are doing...

The commit message talks about a discard command. I would have expected
a command that discards/unmaps data from the disk. I think it would be
good to call it something else in the commit message if it has nothing
to do with this.



You're completely right. I looked it up again and turns out this is 
actually a barrier command. Any ideas on how to best implement an actual 
barrier in virtio-blk? Otherwise I'll just ignore it and always return 
S_OK. No need for the memset muckery above.






+} else {
+virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+}
+virtio_blk_free_request(req);
+break;
+}
  default:
  virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
  virtio_blk_free_request(req);
@@ -1351,6 +1365,10 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
  } else {
  blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
  }
+if (s->conf.x_apple_type) {
+/* Apple abuses the same location for its type id */
+blkcfg.max_secure_erase_sectors = s->conf.x_apple_type;

Ideally, blkcfg would contain a union there. Since this is a type
imported from the kernel, we can't change it inside of QEMU only. Works
for me with this comment.


+}
  memcpy(config, &blkcfg, s->config_size);
  }

@@ -1625,6 +1643,10 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)

  s->config_size = virtio_get_config_size(&virtio_blk_cfg_size_params,
  s->host_features);
+if (s->conf.x_apple_type) {
+/* Apple Virtio puts the blk type at 0x3c, make sure we have space. */
+s->config_size = MAX(s->config_size, 0x3d);
+}
  virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);

  s->blk = conf->conf.blk;
@@ -1734,6 +1756,7 @@ static Property virtio_blk_properties[] = {
 conf.max_write_zeroes_sectors, 
BDRV_REQUEST_MAX_SECTORS),
  DEFINE_PROP_BOOL("x-enable-wce-if-config-wce", VirtIOBlock,
   conf.x_enable_wce_if_config_wce, true),
+DEFINE_PROP_UINT32("x-apple-type", VirtIOBlock, conf.x_apple_type, 0),

In a separate device, this would probably be called "apple-type"
(without "x-") like promi

Re: [PATCH 10/12] hw/vmapple/cfg: Introduce vmapple cfg region

2023-08-22 Thread Alexander Graf


On 16.06.23 12:47, Philippe Mathieu-Daudé wrote:


On 15/6/23 00:57, Alexander Graf wrote:

Instead of device tree or other more standardized means, VMApple passes
platform configuration to the first stage boot loader in a binary 
encoded

format that resides at a dedicated RAM region in physical address space.

This patch models this configuration space as a qdev device which we can
then map at the fixed location in the address space. That way, we can
influence and annotate all configuration fields easily.

Signed-off-by: Alexander Graf 
---
  hw/vmapple/Kconfig   |   3 ++
  hw/vmapple/cfg.c | 105 +++
  hw/vmapple/meson.build   |   1 +
  include/hw/vmapple/cfg.h |  68 +
  4 files changed, 177 insertions(+)
  create mode 100644 hw/vmapple/cfg.c
  create mode 100644 include/hw/vmapple/cfg.h




diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c
new file mode 100644
index 00..d48e3c3afa
--- /dev/null
+++ b/hw/vmapple/cfg.c
@@ -0,0 +1,105 @@
+/*
+ * VMApple Configuration Region
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 
or later.

+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vmapple/cfg.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+
+static void vmapple_cfg_reset(DeviceState *dev)
+{
+    VMAppleCfgState *s = VMAPPLE_CFG(dev);
+    VMAppleCfg *cfg;
+
+    cfg = memory_region_get_ram_ptr(&s->mem);
+    memset((void *)cfg, 0, VMAPPLE_CFG_SIZE);


I'm a bit confused here: DeviceReset() handler is called _after_
DeviceRealize().



Yes. In Realize we set up s->cfg (the template). In reset, we fetch a 
pointer to the guest exposed memory region (cfg), wipe it and then copy 
the template over it in the next line:






+    *cfg = s->cfg;



[...]





diff --git a/include/hw/vmapple/cfg.h b/include/hw/vmapple/cfg.h
new file mode 100644
index 00..3337064e44
--- /dev/null
+++ b/include/hw/vmapple/cfg.h
@@ -0,0 +1,68 @@
+/*
+ * VMApple Configuration Region
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights 
Reserved.

+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 
or later.

+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VMAPPLE_CFG_H
+#define HW_VMAPPLE_CFG_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+#include "net/net.h"
+
+typedef struct VMAppleCfg {
+    uint32_t version; /* 0x000 */
+    uint32_t nr_cpus; /* 0x004 */
+    uint32_t unk1;    /* 0x008 */
+    uint32_t unk2;    /* 0x00c */
+    uint32_t unk3;    /* 0x010 */
+    uint32_t unk4;    /* 0x014 */
+    uint64_t ecid;    /* 0x018 */
+    uint64_t ram_size;    /* 0x020 */
+    uint32_t run_installer1;  /* 0x028 */
+    uint32_t unk5;    /* 0x02c */
+    uint32_t unk6;    /* 0x030 */
+    uint32_t run_installer2;  /* 0x034 */
+    uint32_t rnd; /* 0x038 */
+    uint32_t unk7;    /* 0x03c */
+    MACAddr mac_en0;  /* 0x040 */
+    uint8_t pad1[2];
+    MACAddr mac_en1;  /* 0x048 */
+    uint8_t pad2[2];
+    MACAddr mac_wifi0;    /* 0x050 */
+    uint8_t pad3[2];
+    MACAddr mac_bt0;  /* 0x058 */
+    uint8_t pad4[2];
+    uint8_t reserved[0xa0];   /* 0x060 */
+    uint32_t cpu_ids[0x80];   /* 0x100 */
+    uint8_t scratch[0x200];   /* 0x180 */
+    char serial[32];  /* 0x380 */
+    char unk8[32];    /* 0x3a0 */
+    char model[32];   /* 0x3c0 */
+    uint8_t unk9[32]; /* 0x3e0 */
+    uint32_t unk10;   /* 0x400 */
+    char soc_name[32];    /* 0x404 */
+} VMAppleCfg;


Since you access this structure via qdev properties (which is
good), then we can restrict its definition to cfg.c (no need to
expose it).



This struct is part of VMAppleCfgState which (unless we go through 
pointers and allocate dynamically - bleks) means it needs to know the 
size of the struct which again means it needs to be part of the header :)



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH 09/12] hw/vmapple/bdif: Introduce vmapple backdoor interface

2023-08-22 Thread Alexander Graf


On 16.06.23 12:39, Philippe Mathieu-Daudé wrote:


On 15/6/23 00:56, Alexander Graf wrote:
The VMApple machine exposes AUX and ROOT block devices (as well as 
USB OTG

emulation) via virtio-pci as well as a special, simple backdoor platform
device.

This patch implements this backdoor platform device to the best of my
understanding. I left out any USB OTG parts; they're only needed for
guest recovery and I don't understand the protocol yet.

Signed-off-by: Alexander Graf 
---
  hw/vmapple/Kconfig    |   2 +
  hw/vmapple/bdif.c | 245 ++
  hw/vmapple/meson.build    |   1 +
  hw/vmapple/trace-events   |   5 +
  include/hw/vmapple/bdif.h |  31 +


Please enable scripts/git.orderfile if possible.



Sure, happy to :)





+#define REG_DEVID_MASK  0x
+#define DEVID_ROOT  0x
+#define DEVID_AUX   0x0001
+#define DEVID_USB   0x0010
+
+#define REG_STATUS  0x0
+#define REG_STATUS_ACTIVE BIT(0)
+#define REG_CFG 0x4
+#define REG_CFG_ACTIVE    BIT(1)
+#define REG_UNK1    0x8
+#define REG_BUSY    0x10
+#define REG_BUSY_READY    BIT(0)
+#define REG_UNK2    0x400
+#define REG_CMD 0x408
+#define REG_NEXT_DEVICE 0x420
+#define REG_UNK3    0x434




+static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size)
+{
+    uint64_t ret = -1;
+    uint64_t devid = (offset & REG_DEVID_MASK);
+
+    switch (offset & ~REG_DEVID_MASK) {
+    case REG_STATUS:
+    ret = REG_STATUS_ACTIVE;
+    break;
+    case REG_CFG:
+    ret = REG_CFG_ACTIVE;
+    break;
+    case REG_UNK1:
+    ret = 0x420;
+    break;
+    case REG_BUSY:
+    ret = REG_BUSY_READY;
+    break;
+    case REG_UNK2:
+    ret = 0x1;
+    break;
+    case REG_UNK3:
+    ret = 0x0;
+    break;
+    case REG_NEXT_DEVICE:
+    switch (devid) {
+    case DEVID_ROOT:
+    ret = 0x800;
+    break;
+    case DEVID_AUX:
+    ret = 0x1;
+    break;
+    }
+    break;
+    }
+
+    trace_bdif_read(offset, size, ret);
+    return ret;
+}




+static const MemoryRegionOps bdif_ops = {
+    .read = bdif_read,
+    .write = bdif_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+    .min_access_size = 1,
+    .max_access_size = 8,
+    },
+    .impl = {
+    .min_access_size = 1,
+    .max_access_size = 8,


IIUC your implementation is using (min, max) = (4, 4):
i.e. if the guest emits a 64-bit read at offset 0, we want to return
both REG_STATUS/REG_CFG registers.



I don't know if the BDIF device carries those semantics. Today, I'm only 
seeing 32bit accesses which is what I can vouch for. Will 8bit accesses 
go to a different register space or just access a subset of the 32bit 
register? I don't know :)


The same applies to 64bit ones. For all I know, they might as well end 
up as completely different registers.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: hvf: Invalid ISV on data abort

2023-08-02 Thread Alexander Graf

Hi Antonio,

On 02.08.23 11:43, Antonio Caggiano wrote:


Hi there,

I am trying to bring up a guest on HVF, which at a certain point is
trying to write to an area of mmio space and it triggers a data abort
where ISV=0 (translation fault level 2).

I wonder what could cause it and how to recover.



QEMU's HVF implementation - like KVM - only supports MMIO accesses from 
hardware decoded, "simple" load/store instructions. It will only execute 
guest OSs that are aware of that limitation and limit MMIO accesses to 
that set of instructions, such as Linux.


If you see this effect with an enlightened OS, you are most likely 
exposing memory that the guest expects to be represented as RAM as MMIO.



Thanks,

Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH 00/12] Introduce new vmapple machine type

2023-06-21 Thread Alexander Graf

Hi Mads,


On 20.06.23 13:17, Mads Ynddal wrote:




On 15 Jun 2023, at 00.40, Alexander Graf  wrote:

This patch set introduces a new ARM and HVF specific machine type
called "vmapple". It mimicks the device model that Apple's proprietary
Virtualization.Framework exposes, but implements it in QEMU.

With this new machine type, you can run macOS guests on Apple Silicon
systems via HVF. To do so, you need to first install macOS using
Virtualization.Framework onto a virtual disk image using a tool like
macosvm (https://github.com/s-u/macosvm)

  $ macosvm --disk disk.img,size=32g --aux aux.img \
--restore UniversalMac_12.0.1_21A559_Restore.ipsw vm.json

Then, extract the ECID from the installed VM:

  $ cat "$DIR/macosvm.json" | python3 -c \
  'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]) |\
  base64 -d | plutil -extract ECID raw -

Beware, that the file will be called 'vm.json' and DIR is undefined following
the previous line. Also, it's missing a single-quote at the end of
`["machineId"])`.



Thanks :)





In addition, cut off the first 16kb of the aux.img:

  $ dd if=aux.img of=aux.img.trimmed bs=$(( 0x4000 )) skip=1

Now, you can just launch QEMU with the bits generated above:

  $ qemu-system-aarch64 -serial mon:stdio\
  -m 4G  \
  -M vmapple,uuid=6240349656165161789\
  -bios /Sys*/Lib*/Fra*/Virtualization.f*/R*/AVPBooter.vmapple2.bin  \
  -pflash aux.img.trimmed\
  -pflash disk.img   \
  -drive file=disk.img,if=none,id=root   \
  -device virtio-blk-pci,drive=root,x-apple-type=1   \
  -drive file=aux.img.trimmed,if=none,id=aux \
  -device virtio-blk-pci,drive=aux,x-apple-type=2\
  -accel hvf -no-reboot

Just for clarity, I'd add that the 'vmapple,uuid=...' has to be set to the
ECID the previous step.

You haven't defined a display, but I'm not sure if that is on purpose to
show a minimal setup. I had to add '-display sdl' for it to fully work.



Weird, I do get a normal cocoa output screen by default.





There are a few limitations with this implementation:

  - Only runs on macOS because it relies on
ParavirtualizesGraphics.Framework
  - Something is not fully correct on interrupt delivery or
similar - the keyboard does not work
  - No Rosetta in the guest because we lack the private
entitlement to enable TSO

Would it be possible to mitigate the keyboard issue using an emulated USB
keyboard? I tried poking around with it, but with no success.



Unfortunately I was not able to get USB stable inside the guest. This 
may be an issue with interrupt propagation: With usb-kbd I see macOS not 
pick up key up or down events in time.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH 03/12] hvf: Increase number of possible memory slots

2023-06-21 Thread Alexander Graf

Hi Philippe,


On 16.06.23 12:28, Philippe Mathieu-Daudé wrote:



On 15/6/23 00:40, Alexander Graf wrote:

For PVG we will need more than the current 32 possible memory slots.
Bump the limit to 512 instead.

Signed-off-by: Alexander Graf 
---
  accel/hvf/hvf-accel-ops.c | 2 +-
  include/sysemu/hvf_int.h  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 9c3da03c94..bf0caaa852 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -88,7 +88,7 @@ struct mac_slot {
  uint64_t gva;
  };

-struct mac_slot mac_slots[32];
+struct mac_slot mac_slots[512];

  static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
  {
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index 6ab119e49f..c7623a2c09 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -40,7 +40,7 @@ typedef struct hvf_vcpu_caps {

  struct HVFState {
  AccelState parent;
-    hvf_slot slots[32];
+    hvf_slot slots[512];
  int num_slots;

  hvf_vcpu_caps *hvf_caps;


Please add a definition in this header (using in ops.c).



Happy to :)




In order to save memory and woods, what about keeping
32 on x86 and only raising to 512 on arm?



I am hoping that someone takes the apple-gfx driver and enables it for 
x86 as well, so I'd rather keep them consistent.


Alex




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




[PATCH 11/12] hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework support

2023-06-14 Thread Alexander Graf
MacOS provides a framework (library) that allows any vmm to implement a
paravirtualized 3d graphics passthrough to the host metal stack called
ParavirtualizedGraphics.Framework (PVG). The library abstracts away
almost every aspect of the paravirtualized device model and only provides
and receives callbacks on MMIO access as well as to share memory address
space between the VM and PVG.

This patch implements a QEMU device that drives PVG for the VMApple
variant of it.

Signed-off-by: Alexander Graf 
---
 hw/vmapple/Kconfig  |   3 +
 hw/vmapple/apple-gfx.m  | 578 
 hw/vmapple/meson.build  |   1 +
 hw/vmapple/trace-events |  22 ++
 meson.build |   4 +
 5 files changed, 608 insertions(+)
 create mode 100644 hw/vmapple/apple-gfx.m

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index 542426a740..ba37fc5b81 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -6,3 +6,6 @@ config VMAPPLE_BDIF
 
 config VMAPPLE_CFG
 bool
+
+config VMAPPLE_PVG
+bool
diff --git a/hw/vmapple/apple-gfx.m b/hw/vmapple/apple-gfx.m
new file mode 100644
index 00..97dd2cd9ae
--- /dev/null
+++ b/hw/vmapple/apple-gfx.m
@@ -0,0 +1,578 @@
+/*
+ * QEMU Apple ParavirtualizedGraphics.framework device
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * ParavirtualizedGraphics.framework is a set of libraries that macOS provides
+ * which implements 3d graphics passthrough to the host as well as a
+ * proprietary guest communication channel to drive it. This device model
+ * implements support to drive that library from within QEMU.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "hw/pci/msi.h"
+#include "crypto/hash.h"
+#include "sysemu/cpus.h"
+#include "ui/console.h"
+#include "monitor/monitor.h"
+#import 
+
+#define TYPE_APPLE_GFX  "apple-gfx"
+
+#define MAX_MRS 512
+
+static const PGDisplayCoord_t apple_gfx_modes[] = {
+{ .x = 1440, .y = 1080 },
+{ .x = 1280, .y = 1024 },
+};
+
+/*
+ * We have to map PVG memory into our address space. Use the one below
+ * as base start address. In normal linker setups it points to a free
+ * memory range.
+ */
+#define APPLE_GFX_BASE_VA ((void *)(uintptr_t)0x5000UL)
+
+/*
+ * ParavirtualizedGraphics.Framework only ships header files for the x86
+ * variant which does not include IOSFC descriptors and host devices. We add
+ * their definitions here so that we can also work with the ARM version.
+ */
+typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector);
+typedef bool(^IOSFCUnmapMemory)(void *a, void *b, void *c, void *d, void *e, 
void *f);
+typedef bool(^IOSFCMapMemory)(uint64_t phys, uint64_t len, bool ro, void **va, 
void *e, void *f);
+
+@interface PGDeviceDescriptorExt : PGDeviceDescriptor
+@property (readwrite, nonatomic) bool usingIOSurfaceMapper;
+@end
+
+@interface PGIOSurfaceHostDeviceDescriptor : NSObject
+-(PGIOSurfaceHostDeviceDescriptor *)init;
+@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory mapMemory;
+@property (readwrite, nonatomic, copy, nullable) IOSFCUnmapMemory unmapMemory;
+@property (readwrite, nonatomic, copy, nullable) IOSFCRaiseInterrupt 
raiseInterrupt;
+@end
+
+@interface PGIOSurfaceHostDevice : NSObject
+-(void)initWithDescriptor:(PGIOSurfaceHostDeviceDescriptor *) desc;
+-(uint32_t)mmioReadAtOffset:(size_t) offset;
+-(void)mmioWriteAtOffset:(size_t) offset value:(uint32_t)value;
+@end
+
+typedef struct AppleGFXMR {
+QTAILQ_ENTRY(AppleGFXMR) node;
+hwaddr pa;
+void *va;
+uint64_t len;
+} AppleGFXMR;
+
+typedef QTAILQ_HEAD(, AppleGFXMR) AppleGFXMRList;
+
+typedef struct AppleGFXTask {
+QTAILQ_ENTRY(AppleGFXTask) node;
+void *mem;
+uint64_t len;
+} AppleGFXTask;
+
+typedef QTAILQ_HEAD(, AppleGFXTask) AppleGFXTaskList;
+
+typedef struct AppleGFXState {
+/* Private */
+SysBusDevice parent_obj;
+
+/* Public */
+qemu_irq irq_gfx;
+qemu_irq irq_iosfc;
+MemoryRegion iomem_gfx;
+MemoryRegion iomem_iosfc;
+id pgdev;
+id pgdisp;
+PGIOSurfaceHostDevice *pgiosfc;
+AppleGFXMRList mrs;
+AppleGFXTaskList tasks;
+QemuConsole *con;
+void *vram;
+id mtl;
+id texture;
+bool handles_frames;
+bool new_frame;
+bool cursor_show;
+DisplaySurface *surface;
+QEMUCursor *cursor;
+} AppleGFXState;
+
+
+OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXState, APPLE_GFX)
+
+static AppleGFXTask *apple_gfx_new_task(AppleGFXState *s, uint64_t len)
+{
+void *base = APPLE_GFX_BASE_VA;
+AppleGFXTask *task;
+
+QTAILQ_FOREACH(task, &s->tasks,

[PATCH 12/12] hw/vmapple/vmapple: Add vmapple machine type

2023-06-14 Thread Alexander Graf
Apple defines a new "vmapple" machine type as part of its proprietary
macOS Virtualization.Framework vmm. This machine type is similar to the
virt one, but with subtle differences in base devices, a few special
vmapple device additions and a vastly different boot chain.

This patch reimplements this machine type in QEMU. To use it, you
have to have a readily installed version of macOS for VMApple,
run on macOS with -accel hvf, pass the Virtualization.Framework
boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
and pass aux and root volume as virtio drives. In addition, you also
need to find the machine UUID and pass that as -M vmapple,uuid= parameter:

$ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
-bios 
/System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin
-drive file=aux,if=pflash,format=raw \
-drive file=root,if=pflash,format=raw \
-drive file=aux,if=none,id=aux,format=raw \
-device virtio-blk-pci,drive=aux,x-apple-type=2 \
-drive file=root,if=none,id=root,format=raw \
-device virtio-blk-pci,drive=root,x-apple-type=1

With all these in place, you should be able to see macOS booting
successfully.

Signed-off-by: Alexander Graf 
---
 hw/vmapple/Kconfig |  19 ++
 hw/vmapple/meson.build |   1 +
 hw/vmapple/vmapple.c   | 661 +
 3 files changed, 681 insertions(+)
 create mode 100644 hw/vmapple/vmapple.c

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index ba37fc5b81..7a2375dc95 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -9,3 +9,22 @@ config VMAPPLE_CFG
 
 config VMAPPLE_PVG
 bool
+
+config VMAPPLE
+bool
+depends on ARM && HVF
+default y if ARM && HVF
+imply PCI_DEVICES
+select ARM_GIC
+select PLATFORM_BUS
+select PCI_EXPRESS
+select PCI_EXPRESS_GENERIC_BRIDGE
+select PL011 # UART
+select PL031 # RTC
+select PL061 # GPIO
+select GPIO_PWR
+select PVPANIC_MMIO
+select VMAPPLE_AES
+select VMAPPLE_BDIF
+select VMAPPLE_CFG
+select VMAPPLE_PVG
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
index 31fec87156..d732873d35 100644
--- a/hw/vmapple/meson.build
+++ b/hw/vmapple/meson.build
@@ -2,3 +2,4 @@ softmmu_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: 
files('aes.c'))
 softmmu_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
 softmmu_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
 softmmu_ss.add(when: 'CONFIG_VMAPPLE_PVG',  if_true: [files('apple-gfx.m'), 
pvg, metal])
+specific_ss.add(when: 'CONFIG_VMAPPLE', if_true: files('vmapple.c'))
diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c
new file mode 100644
index 00..5d3fe54b96
--- /dev/null
+++ b/hw/vmapple/vmapple.c
@@ -0,0 +1,661 @@
+/*
+ * VMApple machine emulation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * VMApple is the device model that the macOS built-in hypervisor called
+ * "Virtualization.framework" exposes to Apple Silicon macOS guests. The
+ * machine model in this file implements the same device model in QEMU, but
+ * does not use any code from Virtualization.Framework.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/help-texts.h"
+#include "qemu/datadir.h"
+#include "qemu/units.h"
+#include "qemu/option.h"
+#include "monitor/qdev.h"
+#include "qapi/error.h"
+#include "hw/sysbus.h"
+#include "hw/arm/boot.h"
+#include "hw/arm/primecell.h"
+#include "hw/boards.h"
+#include "net/net.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
+#include "sysemu/kvm.h"
+#include "sysemu/hvf.h"
+#include "hw/loader.h"
+#include "qapi/error.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "hw/pci-host/gpex.h"
+#include "hw/virtio/virtio-pci.h"
+#include "hw/qdev-properties.h"
+#include "hw/intc/arm_gic.h"
+#include "hw/intc/arm_gicv3_common.h"
+#include "hw/irq.h"
+#include "qapi/visitor.h"
+#include "qapi/qapi-visit-common.h"
+#include "standard-headers/linux/input.h"
+#include "target/arm/internals.h"
+#include "target/arm/kvm_arm.h"
+#include "hw/char/pl011.h"
+#include "qemu/guest-random.h"
+#include "sysemu/reset.h"
+#include "qemu/log.h"
+#include "hw/vmapple/cfg.h"
+#include "hw/misc/pvpanic.h"
+#include &quo

[PATCH 08/12] hw/vmapple/aes: Introduce aes engine

2023-06-14 Thread Alexander Graf
VMApple contains an "aes" engine device that it uses to encrypt and
decrypt its nvram. It has trivial hard coded keys it uses for that
purpose.

Add device emulation for this device model.

Signed-off-by: Alexander Graf 
---
 hw/vmapple/Kconfig  |   2 +
 hw/vmapple/aes.c| 583 
 hw/vmapple/meson.build  |   1 +
 hw/vmapple/trace-events |  18 ++
 4 files changed, 604 insertions(+)
 create mode 100644 hw/vmapple/aes.c

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index 8b13789179..a73504d599 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -1 +1,3 @@
+config VMAPPLE_AES
+bool
 
diff --git a/hw/vmapple/aes.c b/hw/vmapple/aes.c
new file mode 100644
index 00..eaf1e26abe
--- /dev/null
+++ b/hw/vmapple/aes.c
@@ -0,0 +1,583 @@
+/*
+ * QEMU Apple AES device emulation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "trace.h"
+#include "hw/sysbus.h"
+#include "crypto/hash.h"
+#include "crypto/aes.h"
+#include "crypto/cipher.h"
+
+#define TYPE_AES  "apple-aes"
+#define MAX_FIFO_SIZE 9
+
+#define CMD_KEY   0x1
+#define CMD_KEY_CONTEXT_SHIFT27
+#define CMD_KEY_CONTEXT_MASK (0x1 << CMD_KEY_CONTEXT_SHIFT)
+#define CMD_KEY_SELECT_SHIFT 24
+#define CMD_KEY_SELECT_MASK  (0x7 << CMD_KEY_SELECT_SHIFT)
+#define CMD_KEY_KEY_LEN_SHIFT22
+#define CMD_KEY_KEY_LEN_MASK (0x3 << CMD_KEY_KEY_LEN_SHIFT)
+#define CMD_KEY_ENCRYPT_SHIFT20
+#define CMD_KEY_ENCRYPT_MASK (0x1 << CMD_KEY_ENCRYPT_SHIFT)
+#define CMD_KEY_BLOCK_MODE_SHIFT 16
+#define CMD_KEY_BLOCK_MODE_MASK  (0x3 << CMD_KEY_BLOCK_MODE_SHIFT)
+#define CMD_IV0x2
+#define CMD_IV_CONTEXT_SHIFT 26
+#define CMD_IV_CONTEXT_MASK  (0x3 << CMD_KEY_CONTEXT_SHIFT)
+#define CMD_DSB   0x3
+#define CMD_SKG   0x4
+#define CMD_DATA  0x5
+#define CMD_DATA_KEY_CTX_SHIFT   27
+#define CMD_DATA_KEY_CTX_MASK(0x1 << CMD_DATA_KEY_CTX_SHIFT)
+#define CMD_DATA_IV_CTX_SHIFT25
+#define CMD_DATA_IV_CTX_MASK (0x3 << CMD_DATA_IV_CTX_SHIFT)
+#define CMD_DATA_LEN_MASK0xff
+#define CMD_STORE_IV  0x6
+#define CMD_STORE_IV_ADDR_MASK   0xff
+#define CMD_WRITE_REG 0x7
+#define CMD_FLAG  0x8
+#define CMD_FLAG_STOP_MASK   BIT(26)
+#define CMD_FLAG_RAISE_IRQ_MASK  BIT(27)
+#define CMD_FLAG_INFO_MASK   0xff
+#define CMD_MAX   0x10
+
+#define CMD_SHIFT 28
+
+#define REG_STATUS0xc
+#define REG_STATUS_DMA_READ_RUNNING BIT(0)
+#define REG_STATUS_DMA_READ_PENDING BIT(1)
+#define REG_STATUS_DMA_WRITE_RUNNINGBIT(2)
+#define REG_STATUS_DMA_WRITE_PENDINGBIT(3)
+#define REG_STATUS_BUSY BIT(4)
+#define REG_STATUS_EXECUTINGBIT(5)
+#define REG_STATUS_READYBIT(6)
+#define REG_STATUS_TEXT_DPA_SEEDED  BIT(7)
+#define REG_STATUS_UNWRAP_DPA_SEEDEDBIT(8)
+
+#define REG_IRQ_STATUS0x18
+#define REG_IRQ_STATUS_INVALID_CMD  BIT(2)
+#define REG_IRQ_STATUS_FLAG BIT(5)
+#define REG_IRQ_ENABLE0x1c
+#define REG_WATERMARK 0x20
+#define REG_Q_STATUS  0x24
+#define REG_FLAG_INFO 0x30
+#define REG_FIFO  0x200
+
+static const uint32_t key_lens[4] = {
+[0] = 16,
+[1] = 24,
+[2] = 32,
+[3] = 64,
+};
+
+struct key {
+uint32_t key_len;
+uint32_t key[8];
+};
+
+struct iv {
+uint32_t iv[4];
+};
+
+struct context {
+struct key key;
+struct iv iv;
+};
+
+static struct key builtin_keys[7] = {
+[1] = {
+.key_len = 32,
+.key = { 0x1 },
+},
+[2] = {
+.key_len = 32,
+.key = { 0x2 },
+},
+[3] = {
+.key_len = 32,
+.key = { 0x3 },
+}
+};
+
+typedef struct AESState {
+/* Private */
+SysBusDevice parent_obj;
+
+/* Public */
+qemu_irq irq;
+MemoryRegion iomem1;
+MemoryRegion iomem2;
+
+uint32_t status;
+uint32_t q_status;
+uint32_t irq_status;
+uint32_t irq_enable;
+uint32_t watermark;
+uint32_t flag_info;
+uint32_t fifo[MAX_FIFO_SIZE];
+uint32_t fifo_idx;
+struct key key[2];
+struct iv iv[4];
+bool is_encrypt;
+QCryptoCipherMode block_mode;
+} AESState;
+
+OBJECT_DECLARE_SIMPLE_TYPE(AESState, AES)
+
+static void aes_update_irq(AESState *s)
+{
+qemu_set_irq(s->irq, !!(s->irq_status & s->irq_enable));
+}
+
+static uint64_t aes1_read(void *opaque, hwaddr offset, unsigned size)
+{
+AESState *s = opaque;
+uint64_

[PATCH 10/12] hw/vmapple/cfg: Introduce vmapple cfg region

2023-06-14 Thread Alexander Graf
Instead of device tree or other more standardized means, VMApple passes
platform configuration to the first stage boot loader in a binary encoded
format that resides at a dedicated RAM region in physical address space.

This patch models this configuration space as a qdev device which we can
then map at the fixed location in the address space. That way, we can
influence and annotate all configuration fields easily.

Signed-off-by: Alexander Graf 
---
 hw/vmapple/Kconfig   |   3 ++
 hw/vmapple/cfg.c | 105 +++
 hw/vmapple/meson.build   |   1 +
 include/hw/vmapple/cfg.h |  68 +
 4 files changed, 177 insertions(+)
 create mode 100644 hw/vmapple/cfg.c
 create mode 100644 include/hw/vmapple/cfg.h

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index 388a2bc60c..542426a740 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -3,3 +3,6 @@ config VMAPPLE_AES
 
 config VMAPPLE_BDIF
 bool
+
+config VMAPPLE_CFG
+bool
diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c
new file mode 100644
index 00..d48e3c3afa
--- /dev/null
+++ b/hw/vmapple/cfg.c
@@ -0,0 +1,105 @@
+/*
+ * VMApple Configuration Region
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vmapple/cfg.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+
+static void vmapple_cfg_reset(DeviceState *dev)
+{
+VMAppleCfgState *s = VMAPPLE_CFG(dev);
+VMAppleCfg *cfg;
+
+cfg = memory_region_get_ram_ptr(&s->mem);
+memset((void *)cfg, 0, VMAPPLE_CFG_SIZE);
+*cfg = s->cfg;
+}
+
+static void vmapple_cfg_realize(DeviceState *dev, Error **errp)
+{
+VMAppleCfgState *s = VMAPPLE_CFG(dev);
+uint32_t i;
+
+strncpy(s->cfg.serial, s->serial, sizeof(s->cfg.serial));
+strncpy(s->cfg.model, s->model, sizeof(s->cfg.model));
+strncpy(s->cfg.soc_name, s->soc_name, sizeof(s->cfg.soc_name));
+strncpy(s->cfg.unk8, "D/A", sizeof(s->cfg.soc_name));
+s->cfg.ecid = cpu_to_be64(s->cfg.ecid);
+s->cfg.version = 2;
+s->cfg.unk1 = 1;
+s->cfg.unk2 = 1;
+s->cfg.unk3 = 0x20;
+s->cfg.unk4 = 0;
+s->cfg.unk5 = 1;
+s->cfg.unk6 = 1;
+s->cfg.unk7 = 0;
+s->cfg.unk10 = 1;
+
+g_assert(s->cfg.nr_cpus < ARRAY_SIZE(s->cfg.cpu_ids));
+for (i = 0; i < s->cfg.nr_cpus; i++) {
+s->cfg.cpu_ids[i] = i;
+}
+}
+
+static void vmapple_cfg_init(Object *obj)
+{
+VMAppleCfgState *s = VMAPPLE_CFG(obj);
+
+memory_region_init_ram(&s->mem, obj, "VMApple Config", VMAPPLE_CFG_SIZE,
+   &error_fatal);
+sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mem);
+
+s->serial = (char *)"1234";
+s->model = (char *)"VM0001";
+s->soc_name = (char *)"Apple M1 (Virtual)";
+}
+
+static Property vmapple_cfg_properties[] = {
+DEFINE_PROP_UINT32("nr-cpus", VMAppleCfgState, cfg.nr_cpus, 1),
+DEFINE_PROP_UINT64("ecid", VMAppleCfgState, cfg.ecid, 0),
+DEFINE_PROP_UINT64("ram-size", VMAppleCfgState, cfg.ram_size, 0),
+DEFINE_PROP_UINT32("run_installer1", VMAppleCfgState, cfg.run_installer1, 
0),
+DEFINE_PROP_UINT32("run_installer2", VMAppleCfgState, cfg.run_installer2, 
0),
+DEFINE_PROP_UINT32("rnd", VMAppleCfgState, cfg.rnd, 0),
+DEFINE_PROP_MACADDR("mac-en0", VMAppleCfgState, cfg.mac_en0),
+DEFINE_PROP_MACADDR("mac-en1", VMAppleCfgState, cfg.mac_en1),
+DEFINE_PROP_MACADDR("mac-wifi0", VMAppleCfgState, cfg.mac_wifi0),
+DEFINE_PROP_MACADDR("mac-bt0", VMAppleCfgState, cfg.mac_bt0),
+DEFINE_PROP_STRING("serial", VMAppleCfgState, serial),
+DEFINE_PROP_STRING("model", VMAppleCfgState, model),
+DEFINE_PROP_STRING("soc_name", VMAppleCfgState, soc_name),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vmapple_cfg_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->realize = vmapple_cfg_realize;
+dc->desc = "VMApple Configuration Region";
+device_class_set_props(dc, vmapple_cfg_properties);
+dc->reset = vmapple_cfg_reset;
+}
+
+static const TypeInfo vmapple_cfg_info = {
+.name  = TYPE_VMAPPLE_CFG,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(VMAppleCfgState),
+.instance_init = vmapple_cfg_init,
+.class_init= vmapple_cfg_class_init,
+};
+
+static void vmapple_cfg_register_types(void)
+{
+type_register_static(&vmapple_cfg_info);
+}
+
+type_init(vmapple

[PATCH 07/12] gpex: Allow more than 4 legacy IRQs

2023-06-14 Thread Alexander Graf
Some boards such as vmapple don't do real legacy PCI IRQ swizzling.
Instead, they just keep allocating more board IRQ lines for each new
legacy IRQ. Let's support that mode by giving instantiators a new
"nr_irqs" property they can use to support more than 4 legacy IRQ lines.
In this mode, GPEX will export more IRQ lines, one for each device.

Signed-off-by: Alexander Graf 
---
 hw/arm/sbsa-ref.c  |  2 +-
 hw/arm/virt.c  |  2 +-
 hw/i386/microvm.c  |  2 +-
 hw/loongarch/virt.c|  2 +-
 hw/mips/loongson3_virt.c   |  2 +-
 hw/openrisc/virt.c | 12 ++--
 hw/pci-host/gpex.c | 36 +++-
 hw/riscv/virt.c| 12 ++--
 hw/xtensa/virt.c   |  2 +-
 include/hw/pci-host/gpex.h |  7 +++
 10 files changed, 52 insertions(+), 27 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index de21200ff9..2715ea7b2f 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -647,7 +647,7 @@ static void create_pcie(SBSAMachineState *sms)
 /* Map IO port space */
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
qdev_get_gpio_in(sms->gic, irq + i));
 gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9b9f7d9c68..cabb5d14f2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1466,7 +1466,7 @@ static void create_pcie(VirtMachineState *vms)
 /* Map IO port space */
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
qdev_get_gpio_in(vms->gic, irq + i));
 gpex_set_irq_num(GPEX_HOST(dev), i, irq + i);
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 7227a2156c..9ca007b870 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -139,7 +139,7 @@ static void create_gpex(MicrovmMachineState *mms)
 mms->gpex.mmio64.base, mmio64_alias);
 }
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
x86ms->gsi[mms->gpex.irq + i]);
 }
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index ceddec1b23..6a0c2f3103 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -512,7 +512,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, 
LoongArchMachineState *
 memory_region_add_subregion(get_system_memory(), VIRT_PCI_IO_BASE,
 pio_alias);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 sysbus_connect_irq(d, i,
qdev_get_gpio_in(pch_pic, 16 + i));
 gpex_set_irq_num(GPEX_HOST(gpex_dev), i, 16 + i);
diff --git a/hw/mips/loongson3_virt.c b/hw/mips/loongson3_virt.c
index 216812f660..acf9fead85 100644
--- a/hw/mips/loongson3_virt.c
+++ b/hw/mips/loongson3_virt.c
@@ -438,7 +438,7 @@ static inline void loongson3_virt_devices_init(MachineState 
*machine,
 virt_memmap[VIRT_PCIE_PIO].base, s->pio_alias);
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, virt_memmap[VIRT_PCIE_PIO].base);
 
-for (i = 0; i < GPEX_NUM_IRQS; i++) {
+for (i = 0; i < PCI_NUM_PINS; i++) {
 irq = qdev_get_gpio_in(pic, PCIE_IRQ_BASE + i);
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, irq);
 gpex_set_irq_num(GPEX_HOST(dev), i, PCIE_IRQ_BASE + i);
diff --git a/hw/openrisc/virt.c b/hw/openrisc/virt.c
index f8a68a6a6b..16a5676c4b 100644
--- a/hw/openrisc/virt.c
+++ b/hw/openrisc/virt.c
@@ -318,7 +318,7 @@ static void create_pcie_irq_map(void *fdt, char *nodename, 
int irq_base,
 {
 int pin, dev;
 uint32_t irq_map_stride = 0;
-uint32_t full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS * 6] = {};
+uint32_t full_irq_map[PCI_NUM_PINS * PCI_NUM_PINS * 6] = {};
 uint32_t *irq_map = full_irq_map;
 
 /*
@@ -330,11 +330,11 @@ static void create_pcie_irq_map(void *fdt, char 
*nodename, int irq_base,
  * possible slot) seeing the interrupt-map-mask will allow the table
  * to wrap to any number of devices.
  */
-for (dev = 0; dev < GPEX_NUM_IRQS; dev++) {
+for (dev = 0; dev < PCI_NUM_PINS; dev++) {
 int devfn = dev << 3;
 
-for (pin = 0; pin < GPEX_NUM_IRQS; pin++) {
-int irq_nr = irq_base + ((pin + PCI_SLOT(devfn)) % GPEX_NUM_IRQS);
+for (pin = 0; pin < PCI_NUM_PINS; pin++) {
+int irq_nr = irq_base + ((pin + PCI_SLOT(devfn)) % PCI_NUM_PINS);
 int i = 0;
 
 /* Fill PCI address cells */
@@ -357,7 +357,7 @@ static void create_pci

[PATCH 09/12] hw/vmapple/bdif: Introduce vmapple backdoor interface

2023-06-14 Thread Alexander Graf
The VMApple machine exposes AUX and ROOT block devices (as well as USB OTG
emulation) via virtio-pci as well as a special, simple backdoor platform
device.

This patch implements this backdoor platform device to the best of my
understanding. I left out any USB OTG parts; they're only needed for
guest recovery and I don't understand the protocol yet.

Signed-off-by: Alexander Graf 
---
 hw/vmapple/Kconfig|   2 +
 hw/vmapple/bdif.c | 245 ++
 hw/vmapple/meson.build|   1 +
 hw/vmapple/trace-events   |   5 +
 include/hw/vmapple/bdif.h |  31 +
 5 files changed, 284 insertions(+)
 create mode 100644 hw/vmapple/bdif.c
 create mode 100644 include/hw/vmapple/bdif.h

diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
index a73504d599..388a2bc60c 100644
--- a/hw/vmapple/Kconfig
+++ b/hw/vmapple/Kconfig
@@ -1,3 +1,5 @@
 config VMAPPLE_AES
 bool
 
+config VMAPPLE_BDIF
+bool
diff --git a/hw/vmapple/bdif.c b/hw/vmapple/bdif.c
new file mode 100644
index 00..36b5915ff3
--- /dev/null
+++ b/hw/vmapple/bdif.c
@@ -0,0 +1,245 @@
+/*
+ * VMApple Backdoor Interface
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vmapple/bdif.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "trace.h"
+#include "hw/block/block.h"
+#include "sysemu/block-backend.h"
+
+#define REG_DEVID_MASK  0x
+#define DEVID_ROOT  0x
+#define DEVID_AUX   0x0001
+#define DEVID_USB   0x0010
+
+#define REG_STATUS  0x0
+#define REG_STATUS_ACTIVE BIT(0)
+#define REG_CFG 0x4
+#define REG_CFG_ACTIVEBIT(1)
+#define REG_UNK10x8
+#define REG_BUSY0x10
+#define REG_BUSY_READYBIT(0)
+#define REG_UNK20x400
+#define REG_CMD 0x408
+#define REG_NEXT_DEVICE 0x420
+#define REG_UNK30x434
+
+typedef struct vblk_sector {
+uint32_t pad;
+uint32_t pad2;
+uint32_t sector;
+uint32_t pad3;
+} VblkSector;
+
+typedef struct vblk_req_cmd {
+uint64_t addr;
+uint32_t len;
+uint32_t flags;
+} VblkReqCmd;
+
+typedef struct vblk_req {
+VblkReqCmd sector;
+VblkReqCmd data;
+VblkReqCmd retval;
+} VblkReq;
+
+#define VBLK_DATA_FLAGS_READ  0x00030001
+#define VBLK_DATA_FLAGS_WRITE 0x00010001
+
+#define VBLK_RET_SUCCESS  0
+#define VBLK_RET_FAILED   1
+
+static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t ret = -1;
+uint64_t devid = (offset & REG_DEVID_MASK);
+
+switch (offset & ~REG_DEVID_MASK) {
+case REG_STATUS:
+ret = REG_STATUS_ACTIVE;
+break;
+case REG_CFG:
+ret = REG_CFG_ACTIVE;
+break;
+case REG_UNK1:
+ret = 0x420;
+break;
+case REG_BUSY:
+ret = REG_BUSY_READY;
+break;
+case REG_UNK2:
+ret = 0x1;
+break;
+case REG_UNK3:
+ret = 0x0;
+break;
+case REG_NEXT_DEVICE:
+switch (devid) {
+case DEVID_ROOT:
+ret = 0x800;
+break;
+case DEVID_AUX:
+ret = 0x1;
+break;
+}
+break;
+}
+
+trace_bdif_read(offset, size, ret);
+return ret;
+}
+
+static void le2cpu_sector(VblkSector *sector)
+{
+sector->sector = le32_to_cpu(sector->sector);
+}
+
+static void le2cpu_reqcmd(VblkReqCmd *cmd)
+{
+cmd->addr = le64_to_cpu(cmd->addr);
+cmd->len = le32_to_cpu(cmd->len);
+cmd->flags = le32_to_cpu(cmd->flags);
+}
+
+static void le2cpu_req(VblkReq *req)
+{
+le2cpu_reqcmd(&req->sector);
+le2cpu_reqcmd(&req->data);
+le2cpu_reqcmd(&req->retval);
+}
+
+static void vblk_cmd(uint64_t devid, BlockBackend *blk, uint64_t value,
+ uint64_t static_off)
+{
+VblkReq req;
+VblkSector sector;
+uint64_t off = 0;
+char *buf = NULL;
+uint8_t ret = VBLK_RET_FAILED;
+int r;
+
+cpu_physical_memory_read(value, &req, sizeof(req));
+le2cpu_req(&req);
+
+if (req.sector.len != sizeof(sector)) {
+ret = VBLK_RET_FAILED;
+goto out;
+}
+
+/* Read the vblk command */
+cpu_physical_memory_read(req.sector.addr, §or, sizeof(sector));
+le2cpu_sector(§or);
+
+off = sector.sector * 512ULL + static_off;
+
+/* Sanity check that we're not allocating bogus sizes */
+if (req.data.len > (128 * 1024 * 1024)) {
+goto out;
+}
+
+buf = g_malloc0(req.data.len);
+switch (req.data.flags) {
+case VBLK_DATA_FLAGS_READ:
+r = blk_pread(blk, off, req.data.len, buf

[PATCH 06/12] hw: Add vmapple subdir

2023-06-14 Thread Alexander Graf
We will introduce a number of devices that are specific to the vmapple
target machine. To keep them all tidily together, let's put them into
a single target directory.

Signed-off-by: Alexander Graf 
---
 MAINTAINERS | 6 ++
 hw/Kconfig  | 1 +
 hw/meson.build  | 1 +
 hw/vmapple/Kconfig  | 1 +
 hw/vmapple/meson.build  | 0
 hw/vmapple/trace-events | 2 ++
 hw/vmapple/trace.h  | 1 +
 meson.build | 1 +
 8 files changed, 13 insertions(+)
 create mode 100644 hw/vmapple/Kconfig
 create mode 100644 hw/vmapple/meson.build
 create mode 100644 hw/vmapple/trace-events
 create mode 100644 hw/vmapple/trace.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 4a80a38511..7d5cb3e3e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2547,6 +2547,12 @@ F: hw/usb/canokey.c
 F: hw/usb/canokey.h
 F: docs/system/devices/canokey.rst
 
+VMapple
+M: Alexander Graf 
+S: Maintained
+F: hw/vmapple/*
+F: include/hw/vmapple/*
+
 Subsystems
 --
 Overall Audio backends
diff --git a/hw/Kconfig b/hw/Kconfig
index ba62ff6417..d99854afdd 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source tpm/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source vmapple/Kconfig
 source xen/Kconfig
 source watchdog/Kconfig
 
diff --git a/hw/meson.build b/hw/meson.build
index c7ac7d3d75..e156a6618f 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -40,6 +40,7 @@ subdir('tpm')
 subdir('usb')
 subdir('vfio')
 subdir('virtio')
+subdir('vmapple')
 subdir('watchdog')
 subdir('xen')
 subdir('xenpv')
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
new file mode 100644
index 00..8b13789179
--- /dev/null
+++ b/hw/vmapple/Kconfig
@@ -0,0 +1 @@
+
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
new file mode 100644
index 00..e69de29bb2
diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
new file mode 100644
index 00..9ccc579048
--- /dev/null
+++ b/hw/vmapple/trace-events
@@ -0,0 +1,2 @@
+# See docs/devel/tracing.rst for syntax documentation.
+
diff --git a/hw/vmapple/trace.h b/hw/vmapple/trace.h
new file mode 100644
index 00..572adbefe0
--- /dev/null
+++ b/hw/vmapple/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_vmapple.h"
diff --git a/meson.build b/meson.build
index 0bb5ea9d10..e0203518ef 100644
--- a/meson.build
+++ b/meson.build
@@ -3273,6 +3273,7 @@ if have_system
 'hw/usb',
 'hw/vfio',
 'hw/virtio',
+'hw/vmapple',
 'hw/watchdog',
 'hw/xen',
 'hw/gpio',
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH 05/12] hw/virtio: Add support for apple virtio-blk

2023-06-14 Thread Alexander Graf
Apple has its own virtio-blk PCI device ID where it deviates from the
official virtio-pci spec slightly: It puts a new "apple type"
field at a static offset in config space and introduces a new discard
command.

This patch adds a new qdev property called "apple-type" to virtio-blk-pci.
When that property is set, we assume the virtio-blk device is an Apple one
of the specific type and act accordingly.

Signed-off-by: Alexander Graf 
---
 hw/block/virtio-blk.c   | 23 +
 hw/virtio/virtio-blk-pci.c  |  7 +++
 include/hw/pci/pci_ids.h|  1 +
 include/hw/virtio/virtio-blk.h  |  1 +
 include/standard-headers/linux/virtio_blk.h |  3 +++
 5 files changed, 35 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 39e7f23fab..76b85bb3cb 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1120,6 +1120,20 @@ static int virtio_blk_handle_request(VirtIOBlockReq 
*req, MultiReqBuffer *mrb)
 
 break;
 }
+case VIRTIO_BLK_T_APPLE1:
+{
+if (s->conf.x_apple_type) {
+/* Only valid on Apple Virtio */
+char buf[iov_size(in_iov, in_num)];
+memset(buf, 0, sizeof(buf));
+iov_from_buf(in_iov, in_num, 0, buf, sizeof(buf));
+virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+} else {
+virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+}
+virtio_blk_free_request(req);
+break;
+}
 default:
 virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
 virtio_blk_free_request(req);
@@ -1351,6 +1365,10 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 } else {
 blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
 }
+if (s->conf.x_apple_type) {
+/* Apple abuses the same location for its type id */
+blkcfg.max_secure_erase_sectors = s->conf.x_apple_type;
+}
 memcpy(config, &blkcfg, s->config_size);
 }
 
@@ -1625,6 +1643,10 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 
 s->config_size = virtio_get_config_size(&virtio_blk_cfg_size_params,
 s->host_features);
+if (s->conf.x_apple_type) {
+/* Apple Virtio puts the blk type at 0x3c, make sure we have space. */
+s->config_size = MAX(s->config_size, 0x3d);
+}
 virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
 s->blk = conf->conf.blk;
@@ -1734,6 +1756,7 @@ static Property virtio_blk_properties[] = {
conf.max_write_zeroes_sectors, 
BDRV_REQUEST_MAX_SECTORS),
 DEFINE_PROP_BOOL("x-enable-wce-if-config-wce", VirtIOBlock,
  conf.x_enable_wce_if_config_wce, true),
+DEFINE_PROP_UINT32("x-apple-type", VirtIOBlock, conf.x_apple_type, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
index 9743bee965..5fbf98f750 100644
--- a/hw/virtio/virtio-blk-pci.c
+++ b/hw/virtio/virtio-blk-pci.c
@@ -62,6 +62,13 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, 
Error **errp)
 }
 
 qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+
+if (conf->x_apple_type) {
+/* Apple virtio-blk uses a different vendor/device id */
+pci_config_set_vendor_id(vpci_dev->pci_dev.config, 
PCI_VENDOR_ID_APPLE);
+pci_config_set_device_id(vpci_dev->pci_dev.config,
+ PCI_DEVICE_ID_APPLE_VIRTIO_BLK);
+}
 }
 
 static void virtio_blk_pci_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index e4386ebb20..74e589a298 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -188,6 +188,7 @@
 #define PCI_DEVICE_ID_APPLE_UNI_N_AGP0x0020
 #define PCI_DEVICE_ID_APPLE_U3_AGP   0x004b
 #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC   0x0021
+#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK   0x1a00
 
 #define PCI_VENDOR_ID_SUN0x108e
 #define PCI_DEVICE_ID_SUN_EBUS   0x1000
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index dafec432ce..7117ce754c 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -46,6 +46,7 @@ struct VirtIOBlkConf
 uint32_t max_discard_sectors;
 uint32_t max_write_zeroes_sectors;
 bool x_enable_wce_if_config_wce;
+uint32_t x_apple_type;
 };
 
 struct VirtIOBlockDataPlane;
diff --git a/include/standard-headers/linux/virtio_blk.h 
b/include/standard-headers/linux/virtio_blk.h
index 7155b1a470..bbea5d50b9 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -204,6 +204,9 @@ struct virtio_blk_config {
 /* Reset All zones command */
 #define VIRTIO_BLK_T_ZONE_RESET_ALL 26
 
+/* Write zeroes c

[PATCH 04/12] hvf: arm: Ignore writes to CNTP_CTL_EL0

2023-06-14 Thread Alexander Graf
MacOS unconditionally disables interrupts of the physical timer on boot
and then continues to use the virtual one. We don't really want to support
a full physical timer emulation, so let's just ignore those writes.

Signed-off-by: Alexander Graf 
---
 target/arm/hvf/hvf.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 8f72624586..0dff63fb5f 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -179,6 +179,7 @@ void hvf_arm_init_debug(void)
 #define SYSREG_OSLSR_EL1  SYSREG(2, 0, 1, 1, 4)
 #define SYSREG_OSDLR_EL1  SYSREG(2, 0, 1, 3, 4)
 #define SYSREG_CNTPCT_EL0 SYSREG(3, 3, 14, 0, 1)
+#define SYSREG_CNTP_CTL_EL0   SYSREG(3, 3, 14, 2, 1)
 #define SYSREG_PMCR_EL0   SYSREG(3, 3, 9, 12, 0)
 #define SYSREG_PMUSERENR_EL0  SYSREG(3, 3, 9, 14, 0)
 #define SYSREG_PMCNTENSET_EL0 SYSREG(3, 3, 9, 12, 1)
@@ -1551,6 +1552,12 @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, 
uint64_t val)
 case SYSREG_OSLAR_EL1:
 env->cp15.oslsr_el1 = val & 1;
 break;
+case SYSREG_CNTP_CTL_EL0:
+/*
+ * Guests should not rely on the physical counter, but macOS emits
+ * disable writes to it. Let it do so, but ignore the requests.
+ */
+break;
 case SYSREG_OSDLR_EL1:
 /* Dummy register */
 break;
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH 01/12] build: Only define OS_OBJECT_USE_OBJC with gcc

2023-06-14 Thread Alexander Graf
Recent versions of macOS use clang instead of gcc. The OS_OBJECT_USE_OBJC
define is only necessary when building with gcc. Let's not define it when
building with clang.

With this patch, I can successfully include GCD headers in QEMU when
building with clang.

Signed-off-by: Alexander Graf 
---
 meson.build | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 34306a6205..0bb5ea9d10 100644
--- a/meson.build
+++ b/meson.build
@@ -225,7 +225,9 @@ qemu_ldflags = []
 if targetos == 'darwin'
   # Disable attempts to use ObjectiveC features in os/object.h since they
   # won't work when we're compiling with gcc as a C compiler.
-  qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  if compiler.get_id() == 'gcc'
+qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  endif
 elif targetos == 'solaris'
   # needed for CMSG_ macros in sys/socket.h
   qemu_common_flags += '-D_XOPEN_SOURCE=600'
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH 03/12] hvf: Increase number of possible memory slots

2023-06-14 Thread Alexander Graf
For PVG we will need more than the current 32 possible memory slots.
Bump the limit to 512 instead.

Signed-off-by: Alexander Graf 
---
 accel/hvf/hvf-accel-ops.c | 2 +-
 include/sysemu/hvf_int.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 9c3da03c94..bf0caaa852 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -88,7 +88,7 @@ struct mac_slot {
 uint64_t gva;
 };
 
-struct mac_slot mac_slots[32];
+struct mac_slot mac_slots[512];
 
 static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 {
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index 6ab119e49f..c7623a2c09 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -40,7 +40,7 @@ typedef struct hvf_vcpu_caps {
 
 struct HVFState {
 AccelState parent;
-hvf_slot slots[32];
+hvf_slot slots[512];
 int num_slots;
 
 hvf_vcpu_caps *hvf_caps;
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH 04/12] hvf: arm: Ignore writes to CNTP_CTL_EL0

2023-06-14 Thread Alexander Graf
MacOS unconditionally disables interrupts of the physical timer on boot
and then continues to use the virtual one. We don't really want to support
a full physical timer emulation, so let's just ignore those writes.

Signed-off-by: Alexander Graf 
---
 target/arm/hvf/hvf.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 8f72624586..0dff63fb5f 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -179,6 +179,7 @@ void hvf_arm_init_debug(void)
 #define SYSREG_OSLSR_EL1  SYSREG(2, 0, 1, 1, 4)
 #define SYSREG_OSDLR_EL1  SYSREG(2, 0, 1, 3, 4)
 #define SYSREG_CNTPCT_EL0 SYSREG(3, 3, 14, 0, 1)
+#define SYSREG_CNTP_CTL_EL0   SYSREG(3, 3, 14, 2, 1)
 #define SYSREG_PMCR_EL0   SYSREG(3, 3, 9, 12, 0)
 #define SYSREG_PMUSERENR_EL0  SYSREG(3, 3, 9, 14, 0)
 #define SYSREG_PMCNTENSET_EL0 SYSREG(3, 3, 9, 12, 1)
@@ -1551,6 +1552,12 @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, 
uint64_t val)
 case SYSREG_OSLAR_EL1:
 env->cp15.oslsr_el1 = val & 1;
 break;
+case SYSREG_CNTP_CTL_EL0:
+/*
+ * Guests should not rely on the physical counter, but macOS emits
+ * disable writes to it. Let it do so, but ignore the requests.
+ */
+break;
 case SYSREG_OSDLR_EL1:
 /* Dummy register */
 break;
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH 02/12] hw/misc/pvpanic: Add MMIO interface

2023-06-14 Thread Alexander Graf
In addition to the ISA and PCI variants of pvpanic, let's add an MMIO
platform device that we can use in embedded arm environments.

Signed-off-by: Alexander Graf 
---
 hw/misc/Kconfig   |  4 +++
 hw/misc/meson.build   |  1 +
 hw/misc/pvpanic-mmio.c| 66 +++
 include/hw/misc/pvpanic.h |  1 +
 4 files changed, 72 insertions(+)
 create mode 100644 hw/misc/pvpanic-mmio.c

diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index e4c2149175..21913ef191 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -125,6 +125,10 @@ config PVPANIC_ISA
 depends on ISA_BUS
 select PVPANIC_COMMON
 
+config PVPANIC_MMIO
+bool
+select PVPANIC_COMMON
+
 config AUX
 bool
 select I2C
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 78ca857c9d..b935e74d51 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -115,6 +115,7 @@ softmmu_ss.add(when: 'CONFIG_ARMSSE_MHU', if_true: 
files('armsse-mhu.c'))
 
 softmmu_ss.add(when: 'CONFIG_PVPANIC_ISA', if_true: files('pvpanic-isa.c'))
 softmmu_ss.add(when: 'CONFIG_PVPANIC_PCI', if_true: files('pvpanic-pci.c'))
+softmmu_ss.add(when: 'CONFIG_PVPANIC_MMIO', if_true: files('pvpanic-mmio.c'))
 softmmu_ss.add(when: 'CONFIG_AUX', if_true: files('auxbus.c'))
 softmmu_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files(
   'aspeed_hace.c',
diff --git a/hw/misc/pvpanic-mmio.c b/hw/misc/pvpanic-mmio.c
new file mode 100644
index 00..aebe7227e6
--- /dev/null
+++ b/hw/misc/pvpanic-mmio.c
@@ -0,0 +1,66 @@
+/*
+ * QEMU simulated pvpanic device (MMIO frontend)
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "sysemu/runstate.h"
+
+#include "hw/nvram/fw_cfg.h"
+#include "hw/qdev-properties.h"
+#include "hw/misc/pvpanic.h"
+#include "qom/object.h"
+#include "hw/isa/isa.h"
+#include "standard-headers/linux/pvpanic.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(PVPanicMMIOState, PVPANIC_MMIO_DEVICE)
+
+#define PVPANIC_MMIO_SIZE 0x2
+
+struct PVPanicMMIOState {
+SysBusDevice parent_obj;
+
+PVPanicState pvpanic;
+};
+
+static void pvpanic_mmio_initfn(Object *obj)
+{
+PVPanicMMIOState *s = PVPANIC_MMIO_DEVICE(obj);
+
+pvpanic_setup_io(&s->pvpanic, DEVICE(s), PVPANIC_MMIO_SIZE);
+sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->pvpanic.mr);
+}
+
+static Property pvpanic_mmio_properties[] = {
+DEFINE_PROP_UINT8("events", PVPanicMMIOState, pvpanic.events,
+  PVPANIC_PANICKED | PVPANIC_CRASH_LOADED),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pvpanic_mmio_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+device_class_set_props(dc, pvpanic_mmio_properties);
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo pvpanic_mmio_info = {
+.name  = TYPE_PVPANIC_MMIO_DEVICE,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(PVPanicMMIOState),
+.instance_init = pvpanic_mmio_initfn,
+.class_init= pvpanic_mmio_class_init,
+};
+
+static void pvpanic_register_types(void)
+{
+type_register_static(&pvpanic_mmio_info);
+}
+
+type_init(pvpanic_register_types)
diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
index fab94165d0..f9e7c1ea17 100644
--- a/include/hw/misc/pvpanic.h
+++ b/include/hw/misc/pvpanic.h
@@ -20,6 +20,7 @@
 
 #define TYPE_PVPANIC_ISA_DEVICE "pvpanic"
 #define TYPE_PVPANIC_PCI_DEVICE "pvpanic-pci"
+#define TYPE_PVPANIC_MMIO_DEVICE "pvpanic-mmio"
 
 #define PVPANIC_IOPORT_PROP "ioport"
 
-- 
2.39.2 (Apple Git-143)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




[PATCH 00/12] Introduce new vmapple machine type

2023-06-14 Thread Alexander Graf
This patch set introduces a new ARM and HVF specific machine type
called "vmapple". It mimicks the device model that Apple's proprietary
Virtualization.Framework exposes, but implements it in QEMU.

With this new machine type, you can run macOS guests on Apple Silicon
systems via HVF. To do so, you need to first install macOS using
Virtualization.Framework onto a virtual disk image using a tool like
macosvm (https://github.com/s-u/macosvm)

  $ macosvm --disk disk.img,size=32g --aux aux.img \
--restore UniversalMac_12.0.1_21A559_Restore.ipsw vm.json

Then, extract the ECID from the installed VM:

  $ cat "$DIR/macosvm.json" | python3 -c \
  'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]) |\
  base64 -d | plutil -extract ECID raw -

In addition, cut off the first 16kb of the aux.img:

  $ dd if=aux.img of=aux.img.trimmed bs=$(( 0x4000 )) skip=1

Now, you can just launch QEMU with the bits generated above:

  $ qemu-system-aarch64 -serial mon:stdio\
  -m 4G  \
  -M vmapple,uuid=6240349656165161789\
  -bios /Sys*/Lib*/Fra*/Virtualization.f*/R*/AVPBooter.vmapple2.bin  \
  -pflash aux.img.trimmed\
  -pflash disk.img   \
  -drive file=disk.img,if=none,id=root   \
  -device virtio-blk-pci,drive=root,x-apple-type=1   \
  -drive file=aux.img.trimmed,if=none,id=aux \
  -device virtio-blk-pci,drive=aux,x-apple-type=2\
  -accel hvf -no-reboot

There are a few limitations with this implementation:

  - Only runs on macOS because it relies on
ParavirtualizesGraphics.Framework
  - Something is not fully correct on interrupt delivery or
similar - the keyboard does not work
  - No Rosetta in the guest because we lack the private
entitlement to enable TSO

Over time, I hope that some of the limitations above could cease to exist.
This device model would enable very nice use cases with KVM on an Asahi
Linux device.


Alexander Graf (12):
  build: Only define OS_OBJECT_USE_OBJC with gcc
  hw/misc/pvpanic: Add MMIO interface
  hvf: Increase number of possible memory slots
  hvf: arm: Ignore writes to CNTP_CTL_EL0
  hw/virtio: Add support for apple virtio-blk
  hw: Add vmapple subdir
  gpex: Allow more than 4 legacy IRQs
  hw/vmapple/aes: Introduce aes engine
  hw/vmapple/bdif: Introduce vmapple backdoor interface
  hw/vmapple/cfg: Introduce vmapple cfg region
  hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework
support
  hw/vmapple/vmapple: Add vmapple machine type

 MAINTAINERS |   6 +
 accel/hvf/hvf-accel-ops.c   |   2 +-
 hw/Kconfig  |   1 +
 hw/arm/sbsa-ref.c   |   2 +-
 hw/arm/virt.c   |   2 +-
 hw/block/virtio-blk.c   |  23 +
 hw/i386/microvm.c   |   2 +-
 hw/loongarch/virt.c |   2 +-
 hw/meson.build  |   1 +
 hw/mips/loongson3_virt.c|   2 +-
 hw/misc/Kconfig |   4 +
 hw/misc/meson.build |   1 +
 hw/misc/pvpanic-mmio.c  |  66 ++
 hw/openrisc/virt.c  |  12 +-
 hw/pci-host/gpex.c  |  36 +-
 hw/riscv/virt.c |  12 +-
 hw/virtio/virtio-blk-pci.c  |   7 +
 hw/vmapple/Kconfig  |  30 +
 hw/vmapple/aes.c| 583 +
 hw/vmapple/apple-gfx.m  | 578 +
 hw/vmapple/bdif.c   | 245 
 hw/vmapple/cfg.c| 105 
 hw/vmapple/meson.build  |   5 +
 hw/vmapple/trace-events |  47 ++
 hw/vmapple/trace.h  |   1 +
 hw/vmapple/vmapple.c| 661 
 hw/xtensa/virt.c|   2 +-
 include/hw/misc/pvpanic.h   |   1 +
 include/hw/pci-host/gpex.h  |   7 +-
 include/hw/pci/pci_ids.h|   1 +
 include/hw/virtio/virtio-blk.h  |   1 +
 include/hw/vmapple/bdif.h   |  31 +
 include/hw/vmapple/cfg.h|  68 ++
 include/standard-headers/linux/virtio_blk.h |   3 +
 include/sysemu/hvf_int.h|   2 +-
 meson.build |   9 +-
 target/arm/hvf/hvf.c|   7 +
 37 files changed, 2538 insertions(+), 30 deletions(-)
 create mode 100644 hw/misc/pvpan

Re: [PATCH 2/3] hw/ppc/e500plat: Fix modifying QOM class internal state from instance

2023-05-23 Thread Alexander Graf

Hi Philippe,

On 23.05.23 08:44, Philippe Mathieu-Daudé wrote:

QOM object instance should not modify its class state (because
all other objects instanciated from this class get affected).

Instead of modifying the PPCE500MachineClass 'mpic_version' field
in the instance machine_init() handler, set it in the machine
class init handler (e500plat_machine_class_init).

Inspired-by: Bernhard Beschow 
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/ppc/e500plat.c | 25 +++--
  1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c
index 3032bd3f6d..c3b0ed01cf 100644
--- a/hw/ppc/e500plat.c
+++ b/hw/ppc/e500plat.c
@@ -30,18 +30,6 @@ static void e500plat_fixup_devtree(void *fdt)
   sizeof(compatible));
  }
  
-static void e500plat_init(MachineState *machine)

-{
-PPCE500MachineClass *pmc = PPCE500_MACHINE_GET_CLASS(machine);
-/* Older KVM versions don't support EPR which breaks guests when we 
announce
-   MPIC variants that support EPR. Revert to an older one for those */
-if (kvm_enabled() && !kvmppc_has_cap_epr()) {
-pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_20;
-}
-
-ppce500_init(machine);



Won't this drop the call to ppce500_init(machine)?


-}
-
  static void e500plat_machine_device_plug_cb(HotplugHandler *hotplug_dev,
  DeviceState *dev, Error **errp)
  {
@@ -81,7 +69,6 @@ static void e500plat_machine_class_init(ObjectClass *oc, void 
*data)
  pmc->pci_first_slot = 0x1;
  pmc->pci_nr_slots = PCI_SLOT_MAX - 1;
  pmc->fixup_devtree = e500plat_fixup_devtree;
-pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_42;
  pmc->has_mpc8xxx_gpio = true;
  pmc->has_esdhc = true;
  pmc->platform_bus_base = 0xfULL;
@@ -94,8 +81,18 @@ static void e500plat_machine_class_init(ObjectClass *oc, 
void *data)
  pmc->pci_mmio_bus_base = 0xE000ULL;
  pmc->spin_base = 0xFEF00ULL;
  
+if (kvm_enabled() && !kvmppc_has_cap_epr()) {

+/*
+ * Older KVM versions don't support EPR which breaks guests when
+ * we announce MPIC variants that support EPR. Revert to an older
+ * one for those.
+ */
+pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_20;
+} else {
+pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_42;
+}
+
  mc->desc = "generic paravirt e500 platform";
-mc->init = e500plat_init;



I suppose best would be to just put it in here instead of e500plat_init?


Alex



  mc->max_cpus = 32;
  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("e500v2_v30");
  mc->default_ram_id = "mpc8544ds.ram";




[PATCH] hvf: Enable 1G page support

2023-04-20 Thread Alexander Graf
Hvf on x86 only supported 2MiB large pages, but never bothered to strip
out the 1GiB page size capability from -cpu host. With QEMU 8.0.0 this
became a problem because OVMF started to use 1GiB pages by default.

Let's just unconditionally add 1GiB page walk support to the walker.

With this fix applied, I can successfully run OVMF again.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1603
Signed-off-by: Alexander Graf 
Reported-by: Akihiro Suda 
Reported-by: Philippe Mathieu-Daudé 

---

On my test VM, Linux dies later on with issues in interrupt delivery. But
those are unrelated to this patch; I confirmed that I get the same behavior
with 1GiB page support disabled.
---
 target/i386/hvf/x86_mmu.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
index 96d117567e..1d860651c6 100644
--- a/target/i386/hvf/x86_mmu.c
+++ b/target/i386/hvf/x86_mmu.c
@@ -38,6 +38,7 @@
 #define LEGACY_PTE_PAGE_MASK(0xllu << 12)
 #define PAE_PTE_PAGE_MASK   ((-1llu << 12) & ((1llu << 52) - 1))
 #define PAE_PTE_LARGE_PAGE_MASK ((-1llu << (21)) & ((1llu << 52) - 1))
+#define PAE_PTE_SUPER_PAGE_MASK ((-1llu << (30)) & ((1llu << 52) - 1))
 
 struct gpt_translation {
 target_ulong  gva;
@@ -96,7 +97,7 @@ static bool get_pt_entry(struct CPUState *cpu, struct 
gpt_translation *pt,
 
 /* test page table entry */
 static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
-  int level, bool *is_large, bool pae)
+  int level, int *largeness, bool pae)
 {
 uint64_t pte = pt->pte[level];
 
@@ -118,9 +119,9 @@ static bool test_pt_entry(struct CPUState *cpu, struct 
gpt_translation *pt,
 goto exit;
 }
 
-if (1 == level && pte_large_page(pte)) {
+if (level && pte_large_page(pte)) {
 pt->err_code |= MMU_PAGE_PT;
-*is_large = true;
+*largeness = level;
 }
 if (!level) {
 pt->err_code |= MMU_PAGE_PT;
@@ -152,9 +153,18 @@ static inline uint64_t pse_pte_to_page(uint64_t pte)
 return ((pte & 0x1fe000) << 19) | (pte & 0xffc0);
 }
 
-static inline uint64_t large_page_gpa(struct gpt_translation *pt, bool pae)
+static inline uint64_t large_page_gpa(struct gpt_translation *pt, bool pae,
+  int largeness)
 {
-VM_PANIC_ON(!pte_large_page(pt->pte[1]))
+VM_PANIC_ON(!pte_large_page(pt->pte[largeness]))
+
+/* 1Gib large page  */
+if (pae && largeness == 2) {
+return (pt->pte[2] & PAE_PTE_SUPER_PAGE_MASK) | (pt->gva & 0x3fff);
+}
+
+VM_PANIC_ON(largeness != 1)
+
 /* 2Mb large page  */
 if (pae) {
 return (pt->pte[1] & PAE_PTE_LARGE_PAGE_MASK) | (pt->gva & 0x1f);
@@ -170,7 +180,7 @@ static bool walk_gpt(struct CPUState *cpu, target_ulong 
addr, int err_code,
  struct gpt_translation *pt, bool pae)
 {
 int top_level, level;
-bool is_large = false;
+int largeness = 0;
 target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
 uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
 
@@ -186,19 +196,19 @@ static bool walk_gpt(struct CPUState *cpu, target_ulong 
addr, int err_code,
 for (level = top_level; level > 0; level--) {
 get_pt_entry(cpu, pt, level, pae);
 
-if (!test_pt_entry(cpu, pt, level - 1, &is_large, pae)) {
+if (!test_pt_entry(cpu, pt, level - 1, &largeness, pae)) {
 return false;
 }
 
-if (is_large) {
+if (largeness) {
 break;
 }
 }
 
-if (!is_large) {
+if (!largeness) {
 pt->gpa = (pt->pte[0] & page_mask) | (pt->gva & 0xfff);
 } else {
-pt->gpa = large_page_gpa(pt, pae);
+pt->gpa = large_page_gpa(pt, pae, largeness);
 }
 
 return true;
-- 
2.39.2 (Apple Git-143)




[PATCH v5] hostmem-file: add offset option

2023-04-03 Thread Alexander Graf
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

To make this work consistently, also fix up all places in QEMU that
expect fd offsets to be 0.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - add qom documentation
  - propagate offset into truncate, size and alignment checks

v2 -> v3:

  - failed attempt at fixing typo

v3 -> v4:

  - fix typo

v4 -> v5:

  - improve qom doc comment
  - account for fd_offset in more places
---
 backends/hostmem-file.c | 40 +++-
 hw/virtio/vhost-user.c  |  1 +
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 ++-
 include/exec/ramblock.h |  1 +
 qapi/qom.json   |  5 +
 qemu-options.hx |  6 +-
 softmmu/memory.c|  3 ++-
 softmmu/physmem.c   | 17 -
 9 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
 char *mem_path;
 uint64_t align;
+uint64_t offset;
 bool discard_data;
 bool is_pmem;
 bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
  backend->size, fb->align, ram_flags,
- fb->mem_path, fb->readonly, errp);
+ fb->mem_path, fb->offset, fb->readonly,
+ errp);
 g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val = fb->offset;
+
+visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property '%s' of %s", name,
+   object_get_typename(o));
+return;
+}
+
+if (!visit_type_size(v, name, &val, errp)) {
+return;
+}
+fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+object_class_property_add(oc, "offset", "int",
+file_memory_backend_get_offset,
+file_memory_backend_set_offset,
+NULL, NULL);
+object_class_property_set_description(oc, "offset",
+"Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e5285df4ba..39dc803b03 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -483,6 +483,7 @@ static MemoryRegion *vhost_user_get_mr_data(uint64_t addr, 
ram_addr_t *offset,
 assert((uintptr_t)addr == addr);
 mr = memory_region_from_host((void *)(uintptr_t)addr, offset);
 *fd = memory_region_get_fd(mr);
+*offset += mr->ram_block->fd_offset;
 
 return mr;
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  * RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,

Re: [PATCH v4] hostmem-file: add offset option

2023-04-03 Thread Alexander Graf


On 03.04.23 09:13, David Hildenbrand wrote:



On 01.04.23 19:47, Stefan Hajnoczi wrote:

On Sat, Apr 01, 2023 at 12:42:57PM +, Alexander Graf wrote:

Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 

---

v1 -> v2:

   - add qom documentation
   - propagate offset into truncate, size and alignment checks

v2 -> v3:

   - failed attempt at fixing typo

v2 -> v4:

   - fix typo
---
  backends/hostmem-file.c | 40 +++-
  include/exec/memory.h   |  2 ++
  include/exec/ram_addr.h |  3 ++-
  qapi/qom.json   |  5 +
  qemu-options.hx |  6 +-
  softmmu/memory.c    |  3 ++-
  softmmu/physmem.c   | 14 ++
  7 files changed, 65 insertions(+), 8 deletions(-)


Reviewed-by: Stefan Hajnoczi 


The change itself looks good to me, but I do think some other QEMU code
that ends up working on the RAMBlock is not prepared yet. Most probably,
because we never ended up using fd with an offset as guest RAM.

We don't seem to be remembering that offset in the RAMBlock. First, I
thought block->offset would be used for that, but that's just the offset
in the ram_addr_t space. Maybe we need a new "block->fd_offset" to
remember the offset (unless I am missing something).

The real offset in the file would be required at least in two cases I
can see (whenever we essentially end up calling mmap() on the fd again):

1) qemu_ram_remap(): We'd have to add the file offset on top of the
calculated offset.



This one is a bit tricky to test, as we're only running into that code 
path with KVM when we see an #MCE. But it's trivial, so I'm confident it 
will work as expected.





2) vhost-user: most probably whenever we set the mmap_offset. For
example, in vhost_user_fill_set_mem_table_msg() we'd similarly have to
add the file_offset on top of the calculated offset.
vhost_user_get_mr_data() should most probably do that.



I agree - adding the offset as part of get_mr_data() is sufficient. I 
have validated it works correctly with QEMU's vhost-user-blk target.


I think the changes are still obvious enough that I'll fold them all 
into a single patch.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH v2] hostmem-file: add offset option

2023-04-03 Thread Alexander Graf


On 03.04.23 08:28, Markus Armbruster wrote:


Alexander Graf  writes:


Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 

[...]


diff --git a/qapi/qom.json b/qapi/qom.json
index a877b879b9..8f5eaa8415 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -635,6 +635,10 @@
  # specify the required alignment via this option.
  # 0 selects a default alignment (currently the page size). (default: 
0)
  #
+# @offset: the offset into the target file that the region starts at. You can
+#  use this option to overload multiple regions into a single fils.

single file

I'm not sure about "to overload multiple regions into a single file".
Maybe "to back multiple regions with a single file".



I like it, I'll use that version here and in the qemu-options.hx file.



Any alignment requirements?



Page size, I'll add it.




What happens when the regions overlap?



It "just works" - same as mapping the same file twice. It's up to the 
user to ensure that nothing bad happens because of that.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




[PATCH v4] hostmem-file: add offset option

2023-04-01 Thread Alexander Graf
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 

---

v1 -> v2:

  - add qom documentation
  - propagate offset into truncate, size and alignment checks

v2 -> v3:

  - failed attempt at fixing typo

v2 -> v4:

  - fix typo
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 ++-
 qapi/qom.json   |  5 +
 qemu-options.hx |  6 +-
 softmmu/memory.c|  3 ++-
 softmmu/physmem.c   | 14 ++
 7 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
 char *mem_path;
 uint64_t align;
+uint64_t offset;
 bool discard_data;
 bool is_pmem;
 bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
  backend->size, fb->align, ram_flags,
- fb->mem_path, fb->readonly, errp);
+ fb->mem_path, fb->offset, fb->readonly,
+ errp);
 g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val = fb->offset;
+
+visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property '%s' of %s", name,
+   object_get_typename(o));
+return;
+}
+
+if (!visit_type_size(v, name, &val, errp)) {
+return;
+}
+fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+object_class_property_add(oc, "offset", "int",
+file_memory_backend_get_offset,
+file_memory_backend_set_offset,
+NULL, NULL);
+object_class_property_set_description(oc, "offset",
+"Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  * RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
   uint64_t align,
   uint32_t ram_flags,
   const char *path,
+  ram_addr_t offset,
   bool readonly,
   Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *  RAM_NORESERVE.
  *  @mem_path or @fd: specify the backing file or device
+ *  @offset: Offset into targ

[PATCH v3] hostmem-file: add offset option

2023-04-01 Thread Alexander Graf
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 

---

v1 -> v2:

  - add qom documentation
  - propagate offset into truncate, size and alignment checks

v2 -> v3:

  - fix typo
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 ++-
 qapi/qom.json   |  5 +
 qemu-options.hx |  6 +-
 softmmu/memory.c|  3 ++-
 softmmu/physmem.c   | 14 ++
 7 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
 char *mem_path;
 uint64_t align;
+uint64_t offset;
 bool discard_data;
 bool is_pmem;
 bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
  backend->size, fb->align, ram_flags,
- fb->mem_path, fb->readonly, errp);
+ fb->mem_path, fb->offset, fb->readonly,
+ errp);
 g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val = fb->offset;
+
+visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property '%s' of %s", name,
+   object_get_typename(o));
+return;
+}
+
+if (!visit_type_size(v, name, &val, errp)) {
+return;
+}
+fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+object_class_property_add(oc, "offset", "int",
+file_memory_backend_get_offset,
+file_memory_backend_set_offset,
+NULL, NULL);
+object_class_property_set_description(oc, "offset",
+"Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  * RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
   uint64_t align,
   uint32_t ram_flags,
   const char *path,
+  ram_addr_t offset,
   bool readonly,
   Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *  RAM_NORESERVE.
  *  @mem_path or @fd: specify the backing file or device
+ *  @offset: Offset into target file
  *  @readonly: true to open @pa

[PATCH v2] hostmem-file: add offset option

2023-04-01 Thread Alexander Graf
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - add qom documentation
  - propagate offset into truncate, size and alignment checks
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 ++-
 qapi/qom.json   |  5 +
 qemu-options.hx |  6 +-
 softmmu/memory.c|  3 ++-
 softmmu/physmem.c   | 14 ++
 7 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
 char *mem_path;
 uint64_t align;
+uint64_t offset;
 bool discard_data;
 bool is_pmem;
 bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
  backend->size, fb->align, ram_flags,
- fb->mem_path, fb->readonly, errp);
+ fb->mem_path, fb->offset, fb->readonly,
+ errp);
 g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val = fb->offset;
+
+visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property '%s' of %s", name,
+   object_get_typename(o));
+return;
+}
+
+if (!visit_type_size(v, name, &val, errp)) {
+return;
+}
+fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+object_class_property_add(oc, "offset", "int",
+file_memory_backend_get_offset,
+file_memory_backend_set_offset,
+NULL, NULL);
+object_class_property_set_description(oc, "offset",
+"Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  * RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
   uint64_t align,
   uint32_t ram_flags,
   const char *path,
+  ram_addr_t offset,
   bool readonly,
   Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *  RAM_NORESERVE.
  *  @mem_path or @fd: specify the backing file or device
+ *  @offset: Offset into target file
  *  @readonly: true to open @path for reading, false for read/write.
  *  @e

Re: [PATCH v2 28/32] contrib/gitdm: add Amazon to the domain map

2023-03-17 Thread Alexander Graf


On 15.03.23 20:18, Durrant, Paul wrote:

-Original Message-
From: Alex Bennée 
Sent: 15 March 2023 17:43
To: qemu-devel@nongnu.org
Cc: Akihiko Odaki ; Marc-André Lureau
; qemu-ri...@nongnu.org; Riku Voipio
; Igor Mammedov ; Xiao Guangrong
; Thomas Huth ; Wainer dos
Santos Moschetta ; Dr. David Alan Gilbert
; Alex Williamson ; Hao
Wu ; Cleber Rosa ; Daniel Henrique
Barboza ; Jan Kiszka ; Aurelien
Jarno ; qemu-...@nongnu.org; Marcelo Tosatti
; Eduardo Habkost ; Alexandre
Iooss ; Gerd Hoffmann ; Palmer
Dabbelt ; Ilya Leoshkevich ; qemu-
p...@nongnu.org; Juan Quintela ; Cédric Le Goater
; Darren Kenny ;
k...@vger.kernel.org; Marcel Apfelbaum ; Peter
Maydell ; Richard Henderson
; Stafford Horne ; Weiwei
Li ; Sunil V L ; Stefan
Hajnoczi ; Thomas Huth ; Vijai
Kumar K ; Liu Zhiwei
; David Gibson
; Song Gao ; Paolo
Bonzini ; Michael S. Tsirkin ; Niek
Linnenbank ; Greg Kurz ; Laurent
Vivier ; Qiuhao Li ; Philippe
Mathieu-Daudé ; Xiaojuan Yang
; Mahmoud Mandour ;
Alexander Bulekov ; Jiaxun Yang ;
qemu-bl...@nongnu.org; Yanan Wang ; David
Woodhouse ; qemu-s3...@nongnu.org; Strahinja Jankovic
; Bandan Das ; Alistair
Francis ; Aleksandar Rikalo
; Tyrone Ting ; Kevin
Wolf ; David Hildenbrand ; Beraldo
Leal ; Beniamino Galvani ; Paul
Durrant ; Bin Meng ; Sunil
Muthuswamy ; Hanna Reitz ;
Peter Xu ; Alex Bennée ; Graf
(AWS), Alexander ; Durrant, Paul ;
Woodhouse, David 
Subject: [EXTERNAL] [PATCH v2 28/32] contrib/gitdm: add Amazon to the
domain map

CAUTION: This email originated from outside of the organization. Do not
click links or open attachments unless you can confirm the sender and know
the content is safe.



We have multiple contributors from both .co.uk and .com versions of
the address.

Signed-off-by: Alex Bennée 
Cc: Alexander Graf 
Cc: Paul Durrant 
Cc: David Wooodhouse 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230310180332.2274827-7-alex.ben...@linaro.org>
---
  contrib/gitdm/domain-map | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map
index 4a988c5b5f..8dce276a1c 100644
--- a/contrib/gitdm/domain-map
+++ b/contrib/gitdm/domain-map
@@ -4,6 +4,8 @@
  # This maps email domains to nice easy to read company names
  #

+amazon.com  Amazon
+amazon.co.ukAmazon

You might want 'amazon.de' too but as far as it goes...



Yes, please add amazon.de here. Once that's added, feel free to take my

Reviewed-by: Alexander Graf 


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16

2023-03-10 Thread Alexander Graf



On 03.01.23 18:41, Peter Maydell wrote:

On Fri, 23 Dec 2022 at 08:50, Alexander Graf  wrote:

While trying to make Windows work with GICv3 emulation, I stumbled over
the fact that it only supports ITT entry sizes that are power of 2 sized.

While the spec allows arbitrary sizes, in practice hardware will always
expose power of 2 sizes and so this limitation is not really a problem
in real world scenarios. However, we only expose a 12 byte ITT entry size
which makes Windows blue screen on boot.

The easy way to get around that problem is to bump the size to 16. That
is a power of 2, basically is what hardware would expose given the amount
of bits we need per entry and doesn't break any existing scenarios. To
play it safe, this patch set only bumps them on newer machine types.

This is a Windows bug and should IMHO be fixed in that guest OS.
Changing the ITT entry size of QEMU's implementation introduces
an unnecessary incompatibility in migration and wastes memory
(we're already a bit unnecessarily profligate with ITT entries
compared to real hardware).



Follow-up on this: Microsoft has fixed the issue in Windows. That won't 
make older versions work, but the current should be fine with GICv3:


https://fosstodon.org/@itanium/109909281184181276


Alex





Re: [PATCH v2] hvf: arm: Add support for GICv3

2023-02-03 Thread Alexander Graf

Hey Peter,

On 03.02.23 11:57, Peter Maydell wrote:

On Thu, 2 Feb 2023 at 17:56, Peter Maydell  wrote:

On Sat, 28 Jan 2023 at 22:45, Alexander Graf  wrote:

We currently only support GICv2 emulation. To also support GICv3, we will
need to pass a few system registers into their respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3
system register handlers. This is safe because the GICv3 TCG code is generic
as long as we limit ourselves to EL0 and EL1 - which are the only modes
supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't
work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.

Signed-off-by: Alexander Graf 

---



Applied to target-arm.next, thanks.

This one *also* fails 'make check'. Please can you test your
patches before sending them?

The fix is not difficult (another missing qtest_enabled() check),
so I've squashed it in.



Sorry for the mess :(. I usually do test TCG and HVF when submitting 
these patches with various VMs, but keep forgetting about "make check". 
I'll try hard to remember next time.



Thanks,

Alex





[PATCH] hostmem-file: add offset option

2023-02-02 Thread Alexander Graf
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 ++-
 qapi/qom.json   |  1 +
 qemu-options.hx |  6 +-
 softmmu/memory.c|  3 ++-
 softmmu/physmem.c   |  5 +++--
 7 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
 char *mem_path;
 uint64_t align;
+uint64_t offset;
 bool discard_data;
 bool is_pmem;
 bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
  backend->size, fb->align, ram_flags,
- fb->mem_path, fb->readonly, errp);
+ fb->mem_path, fb->offset, fb->readonly,
+ errp);
 g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val = fb->offset;
+
+visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property '%s' of %s", name,
+   object_get_typename(o));
+return;
+}
+
+if (!visit_type_size(v, name, &val, errp)) {
+return;
+}
+fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+object_class_property_add(oc, "offset", "int",
+file_memory_backend_get_offset,
+file_memory_backend_set_offset,
+NULL, NULL);
+object_class_property_set_description(oc, "offset",
+"Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..bd67198111 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  * RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
   uint64_t align,
   uint32_t ram_flags,
   const char *path,
+  ram_addr_t offset,
   bool readonly,
   Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *  RAM_NORESERVE.
  *  @mem_path or @fd: specify the backing file or device
+ *  @offset: Offset into target file
  *  @readonly: true to open @path for reading, false for read/write.
  *  @errp: pointer to Error*, to store an error if it happens
  *
@@ -119,7 +120,7 @@ long qemu_maxrampagesize(void);
  */
 RAMBlock *qemu_ram_allo

[PATCH v2] hvf: arm: Add support for GICv3

2023-01-28 Thread Alexander Graf
We currently only support GICv2 emulation. To also support GICv3, we will
need to pass a few system registers into their respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3
system register handlers. This is safe because the GICv3 TCG code is generic
as long as we limit ourselves to EL0 and EL1 - which are the only modes
supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't
work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - assert when guest has EL2/EL3 and uses non-TCG GICv3
  - use defines for sysreg masks
---
 hw/intc/arm_gicv3_cpuif.c   |  15 +++-
 target/arm/hvf/hvf.c| 151 
 target/arm/hvf/trace-events |   2 +
 3 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index b17b29288c..c4ff595742 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -21,6 +21,7 @@
 #include "hw/irq.h"
 #include "cpu.h"
 #include "target/arm/cpregs.h"
+#include "sysemu/tcg.h"
 
 /*
  * Special case return value from hppvi_index(); must be larger than
@@ -2810,6 +2811,8 @@ void gicv3_init_cpuif(GICv3State *s)
  * which case we'd get the wrong value.
  * So instead we define the regs with no ri->opaque info, and
  * get back to the GICv3CPUState from the CPUARMState.
+ *
+ * These CP regs callbacks can be called from either TCG or HVF code.
  */
 define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
 
@@ -2905,6 +2908,16 @@ void gicv3_init_cpuif(GICv3State *s)
 define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo);
 }
 }
-arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+if (tcg_enabled()) {
+/*
+ * We can only trap EL changes with TCG. However the GIC interrupt
+ * state only changes on EL changes involving EL2 or EL3, so for
+ * the non-TCG case this is OK, as EL2 and EL3 can't exist.
+ */
+arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+} else {
+assert(!arm_feature(&cpu->env, ARM_FEATURE_EL2));
+assert(!arm_feature(&cpu->env, ARM_FEATURE_EL3));
+}
 }
 }
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 060aa0ccf4..ad65603445 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -80,6 +80,33 @@
 #define SYSREG_PMCCNTR_EL0SYSREG(3, 3, 9, 13, 0)
 #define SYSREG_PMCCFILTR_EL0  SYSREG(3, 3, 14, 15, 7)
 
+#define SYSREG_ICC_AP0R0_EL1 SYSREG(3, 0, 12, 8, 4)
+#define SYSREG_ICC_AP0R1_EL1 SYSREG(3, 0, 12, 8, 5)
+#define SYSREG_ICC_AP0R2_EL1 SYSREG(3, 0, 12, 8, 6)
+#define SYSREG_ICC_AP0R3_EL1 SYSREG(3, 0, 12, 8, 7)
+#define SYSREG_ICC_AP1R0_EL1 SYSREG(3, 0, 12, 9, 0)
+#define SYSREG_ICC_AP1R1_EL1 SYSREG(3, 0, 12, 9, 1)
+#define SYSREG_ICC_AP1R2_EL1 SYSREG(3, 0, 12, 9, 2)
+#define SYSREG_ICC_AP1R3_EL1 SYSREG(3, 0, 12, 9, 3)
+#define SYSREG_ICC_ASGI1R_EL1SYSREG(3, 0, 12, 11, 6)
+#define SYSREG_ICC_BPR0_EL1  SYSREG(3, 0, 12, 8, 3)
+#define SYSREG_ICC_BPR1_EL1  SYSREG(3, 0, 12, 12, 3)
+#define SYSREG_ICC_CTLR_EL1  SYSREG(3, 0, 12, 12, 4)
+#define SYSREG_ICC_DIR_EL1   SYSREG(3, 0, 12, 11, 1)
+#define SYSREG_ICC_EOIR0_EL1 SYSREG(3, 0, 12, 8, 1)
+#define SYSREG_ICC_EOIR1_EL1 SYSREG(3, 0, 12, 12, 1)
+#define SYSREG_ICC_HPPIR0_EL1SYSREG(3, 0, 12, 8, 2)
+#define SYSREG_ICC_HPPIR1_EL1SYSREG(3, 0, 12, 12, 2)
+#define SYSREG_ICC_IAR0_EL1  SYSREG(3, 0, 12, 8, 0)
+#define SYSREG_ICC_IAR1_EL1  SYSREG(3, 0, 12, 12, 0)
+#define SYSREG_ICC_IGRPEN0_EL1   SYSREG(3, 0, 12, 12, 6)
+#define SYSREG_ICC_IGRPEN1_EL1   SYSREG(3, 0, 12, 12, 7)
+#define SYSREG_ICC_PMR_EL1   SYSREG(3, 0, 4, 6, 0)
+#define SYSREG_ICC_RPR_EL1   SYSREG(3, 0, 12, 11, 3)
+#define SYSREG_ICC_SGI0R_EL1 SYSREG(3, 0, 12, 11, 7)
+#define SYSREG_ICC_SGI1R_EL1 SYSREG(3, 0, 12, 11, 5)
+#define SYSREG_ICC_SRE_EL1   SYSREG(3, 0, 12, 12, 5)
+
 #define WFX_IS_WFE (1 << 0)
 
 #define TMR_CTL_ENABLE  (1 << 0)
@@ -788,6 +815,43 @@ static bool is_id_sysreg(uint32_t reg)
SYSREG_CRM(reg) < 8;
 }
 
+static uint32_t hvf_reg2cp_reg(uint32_t reg)
+{
+return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP,
+  (reg >> SYSREG_CRN_SHIFT) & SYSREG_CRN_MASK,
+  (reg >> SYSREG_CRM_SHIFT) & SYSREG_CRM_MASK,
+  (reg >> SYSREG_OP0_SHIFT) & SYSREG_OP0_MASK,
+  (reg >> SYSREG_OP1_SHIFT) & SYSREG_OP1_MASK,
+  (reg >> SYSREG_OP2_SHIFT

Re: [PATCH] hvf: arm: Add support for GICv3

2023-01-28 Thread Alexander Graf



On 06.01.23 17:37, Peter Maydell wrote:

On Mon, 19 Dec 2022 at 22:08, Alexander Graf  wrote:

We currently only support GICv2 emulation. To also support GICv3, we will
need to pass a few system registers into their respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3
system register handlers. This is safe because the GICv3 TCG code is generic
as long as we limit ourselves to EL0 and EL1 - which are the only modes
supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't
work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.

Signed-off-by: Alexander Graf 
---
  hw/intc/arm_gicv3_cpuif.c   |   8 +-
  target/arm/hvf/hvf.c| 151 
  target/arm/hvf/trace-events |   2 +
  3 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index b17b29288c..b4e387268c 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -21,6 +21,7 @@
  #include "hw/irq.h"
  #include "cpu.h"
  #include "target/arm/cpregs.h"
+#include "sysemu/tcg.h"

  /*
   * Special case return value from hppvi_index(); must be larger than
@@ -2810,6 +2811,8 @@ void gicv3_init_cpuif(GICv3State *s)
   * which case we'd get the wrong value.
   * So instead we define the regs with no ri->opaque info, and
   * get back to the GICv3CPUState from the CPUARMState.
+ *
+ * These CP regs callbacks can be called from either TCG or HVF code.
   */
  define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);

@@ -2905,6 +2908,9 @@ void gicv3_init_cpuif(GICv3State *s)
  define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo);
  }
  }
-arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+if (tcg_enabled()) {
+/* We can only trap EL changes with TCG for now */

We could expand this a bit:

  We can only trap EL changes with TCG. However the GIC interrupt
  state only changes on EL changes involving EL2 or EL3, so for
  the non-TCG case this is OK, as EL2 and EL3 can't exist.

and assert:
  assert(!arm_feature(&cpu->env, ARM_FEATURE_EL2));
  assert(!arm_feature(&cpu->env, ARM_FEATURE_EL3));



Good idea! Let me add that.





+static uint32_t hvf_reg2cp_reg(uint32_t reg)
+{
+return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP,
+  (reg >> 10) & 0xf,
+  (reg >> 1) & 0xf,
+  (reg >> 20) & 0x3,
+  (reg >> 14) & 0x7,
+  (reg >> 17) & 0x7);

This file has #defines for these shift and mask constants
(SYSREG_OP0_SHIFT etc).



Ugh, thanks for catching that!





+}
+
+static bool hvf_sysreg_read_cp(CPUState *cpu, uint32_t reg, uint64_t *val)
+{
+ARMCPU *arm_cpu = ARM_CPU(cpu);
+CPUARMState *env = &arm_cpu->env;
+const ARMCPRegInfo *ri;
+
+ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg));
+if (ri) {
+if (ri->accessfn) {
+if (ri->accessfn(env, ri, true) != CP_ACCESS_OK) {
+return false;
+}
+}
+if (ri->type & ARM_CP_CONST) {
+*val = ri->resetvalue;
+} else if (ri->readfn) {
+*val = ri->readfn(env, ri);
+} else {
+*val = CPREG_FIELD64(env, ri);
+}
+trace_hvf_vgic_read(ri->name, *val);
+return true;
+}

Can we get here for attempts by EL0 to access EL1-only
sysregs, or does hvf send the exception to EL1 without
trapping out to us? If we can get here for EL0 accesses we
need to check against ri->access as well as ri->accessfn.



I just validated, GICv3 EL1 registers trap to EL1 inside the guest:


$ cat a.S
.global start
.global _main
_main:
start:
    mrs x0, ICC_AP0R0_EL1
    mov x0, #0x1234
    msr ICC_AP0R0_EL1, x0
    mov x0, #0
    ret
$ gcc -nostdlib a.S
$ gdb ./a.out
(gdb) r
Program received signal SIGILL, Illegal instruction.
0x004000d4 in start ()
(gdb) x/i $pc
=> 0x4000d4 :    mrs x0, icc_ap0r0_el1


So no need to check ri->access :)


Alex




Re: [PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16

2023-01-03 Thread Alexander Graf

Hi Peter,

On 03.01.23 18:41, Peter Maydell wrote:

On Fri, 23 Dec 2022 at 08:50, Alexander Graf  wrote:

While trying to make Windows work with GICv3 emulation, I stumbled over
the fact that it only supports ITT entry sizes that are power of 2 sized.

While the spec allows arbitrary sizes, in practice hardware will always
expose power of 2 sizes and so this limitation is not really a problem
in real world scenarios. However, we only expose a 12 byte ITT entry size
which makes Windows blue screen on boot.

The easy way to get around that problem is to bump the size to 16. That
is a power of 2, basically is what hardware would expose given the amount
of bits we need per entry and doesn't break any existing scenarios. To
play it safe, this patch set only bumps them on newer machine types.

This is a Windows bug and should IMHO be fixed in that guest OS.



I don't have access to the Windows source code, but the compiled binary 
very explicitly checks and validates that an ITT entry is Po2 sized. 
That means the MS folks deliberately decided to make simplifying 
assumptions that hardware will never use any other sizes.


After thinking about it for a while, I ended up with the same 
conclusion: Hardware would never use anything but Po2 sizes because 
those are trivial to map to indexes in hardware, while anything even 
remotely odd is much more costly (in die space and/or time) to extract 
an index from.


So while I'm really curious about the rationale they had here, I doubt 
it's a bug. It's a deliberate decision. And one that makes sense in the 
context of hardware. I don't see a good reason for them to change the 
behavior, given that there's a close-to-0 chance we will ever see real 
hardware ITS structures with ITT entries that are not Po2 sized.




Changing the ITT entry size of QEMU's implementation introduces
an unnecessary incompatibility in migration and wastes memory


The patch set deals with migration through machine versions. We do these 
type of changes all the time, why would it be a problem here?


As for memory waste, I agree. If I understand the ITS code correctly, 
basically all of the contents that are >8 bytes is GICv4 related and 
useless in a GICv3 vGIC. So I think if we really care strongly about 
memory waste, we could try to condense it down to 8 bytes in the GICv3 
case and make it 16 only for GICv4.


I think keeping GICv3 and GICv4 code paths identical does have its 
attractiveness though, so I'd prefer not to do it.




(we're already a bit unnecessarily profligate with ITT entries
compared to real hardware).


Do you mean the number of entries or the size per entry?


Alex





Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success

2022-12-31 Thread Alexander Graf

Hi Vitaly,

On 31.12.22 11:17, Vitaly Chikunov wrote:

Alexander,

On Sat, Dec 31, 2022 at 10:28:21AM +0100, Alexander Graf wrote:

On 30.12.22 19:16, Vitaly Chikunov wrote:

On Fri, Dec 30, 2022 at 06:44:14PM +0100, Alexander Graf wrote:

This is a kvm kernel bug and should be fixed with the latest stable releases. 
Which kernel version are you running?

This is on latest v6.0 stable - 6.0.15.

Maybe there could be workaround for such situations? (Or maybe it's
possible to make this error non-fatal?) We use qemu+kvm for testing and
now we cannot test on x86.

I'm confused what's going wrong for you. I tried to reproduce the issue
locally, but am unable to:

$ uname -a
Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 CET
2022 x86_64 x86_64 x86_64 GNU/Linux
$ linux32 chroot .
$ uname -a
Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 CET
2022 i686 GNU/Linux
$ cd qemu
$ file ./build/qemu-system-i386
./build/qemu-system-i386: ELF 32-bit LSB shared object, Intel 80386, version
1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux
3.2.0, BuildID[sha1]=f75e20572be5c604c121de4497397665c168aa4c, with
debug_info, not stripped
$ ./build/qemu-system-i386 --version
QEMU emulator version 7.2.0 (v7.2.0-dirty)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
$ ./build/qemu-system-i386 -nographic -enable-kvm
SeaBIOS (version rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org)
[...]


Can you please double check whether your host kernel version is 6.0.15?
Please paste the output of "uname -a".

Excuse me, I'm incorrectly reported kernel version I tried to boot instead
of host one. Host kernels are quite old, 5.15.59 and even 5.17.15 --
where failure is occurring.

I just tested on 5.15.85 and there is no failure.



Awesome, great to hear :). That means everything works as expected at least.



   builder@i586:/.in$ uname -a
   Linux localhost.localdomain 5.15.85-std-def-alt1 #1 SMP Wed Dec 21 21:14:40 
UTC 2022 i686 GNU/Linux
   builder@i586:/.in$ qemu-system-i386 -nographic -enable-kvm
   SeaBIOS (version 1.16.1-alt1)

Perhaps, one of solutions it to reboot our build fleet to newer kernels.
[This maybe hard, though, since special builder node image should be
created and reboot shall be coordinated through all systems, in compare,
updating QEMU would be easier since chroot is created on every build].



I understand that it may be slightly painful to update your build fleet, 
but given this is a genuine kernel bug that has a fix available upstream 
and it only happens on niche corner cases (i386 QEMU on x86-64 Linux 
kernels with the bug) that I doubt anyone will use in production, I'd 
prefer we keep the QEMU logic as is :).


In the meanwhile, while you're patching the build fleet, you can apply 
the patch below as part of your build process to ensure you don't fail 
due to the kernel bug. Just make sure to remove it again as soon as 
you're done with the fleet update :).



diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a213209379..b9396bc7a6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2632,7 +2632,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 return ret;
 }
 }
+#ifdef __x86_64__
 if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+#else
+    if (0) {
+#endif
 bool r;

 ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0,

Alex





Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success

2022-12-31 Thread Alexander Graf

Hi Vitaly,

On 30.12.22 19:16, Vitaly Chikunov wrote:

Alexander,

On Fri, Dec 30, 2022 at 06:44:14PM +0100, Alexander Graf wrote:

Hi Vitaly,

This is a kvm kernel bug and should be fixed with the latest stable releases. 
Which kernel version are you running?

This is on latest v6.0 stable - 6.0.15.

Maybe there could be workaround for such situations? (Or maybe it's
possible to make this error non-fatal?) We use qemu+kvm for testing and
now we cannot test on x86.


I'm confused what's going wrong for you. I tried to reproduce the issue 
locally, but am unable to:


$ uname -a
Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 
CET 2022 x86_64 x86_64 x86_64 GNU/Linux

$ linux32 chroot .
$ uname -a
Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 
CET 2022 i686 GNU/Linux

$ cd qemu
$ file ./build/qemu-system-i386
./build/qemu-system-i386: ELF 32-bit LSB shared object, Intel 80386, 
version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, 
for GNU/Linux 3.2.0, 
BuildID[sha1]=f75e20572be5c604c121de4497397665c168aa4c, with debug_info, 
not stripped

$ ./build/qemu-system-i386 --version
QEMU emulator version 7.2.0 (v7.2.0-dirty)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
$ ./build/qemu-system-i386 -nographic -enable-kvm
SeaBIOS (version rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org)
[...]


Can you please double check whether your host kernel version is 6.0.15? 
Please paste the output of "uname -a".



Alex




Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success

2022-12-30 Thread Alexander Graf
Hi Vitaly,

This is a kvm kernel bug and should be fixed with the latest stable releases. 
Which kernel version are you running?

Thanks,

Alex


> Am 30.12.2022 um 15:30 schrieb Vitaly Chikunov :
> 
> Hi,
> 
> QEMU 7.2.0 when run on 32-bit x86 architecture fails with:
> 
>  i586$ qemu-system-i386 -enable-kvm
>  qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success
>  i586$ qemu-system-x86_64 -enable-kvm
>  qemu-system-x86_64: Could not install MSR_CORE_THREAD_COUNT handler: Success
> 
> Minimal reproducer is `qemu-system-i386 -enable-kvm'. And this only
> happens on x86 (linux32 personality and binaries on x86_64 host):
> 
>  i586$ file /usr/bin/qemu-system-i386
>  /usr/bin/qemu-system-i386: ELF 32-bit LSB pie executable, Intel 80386, 
> version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, 
> BuildID[sha1]=0ba1d953bcb7a691014255954f060ff404c8df90, for GNU/Linux 3.2.0, 
> stripped
>  i586$ /usr/bin/qemu-system-i386 --version
>  QEMU emulator version 7.2.0 (qemu-7.2.0-alt1)
>  Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
> 
> Thanks,
> 



Re: [PATCH v3 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-23 Thread Alexander Graf

Hey Cornelia,

On 23.12.22 13:30, Cornelia Huck wrote:

On Fri, Dec 23 2022, Alexander Graf  wrote:


Up to now, the finalize_gic_version() code open coded what is essentially
a support bitmap match between host/emulation environment and desired
target GIC type.

This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making
everything a 2 pass process. First, we determine which GIC versions the
current environment is able to support, then we go through a single
state machine to determine which target GIC mode that means for us.

After this patch, the only user noticable changes should be consolidated
error messages as well as TCG -M virt supporting -smp > 8 automatically.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

   - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation

v2 -> v3:

   - Fix comment
   - Flip kvm-enabled logic for host around
---
  hw/arm/virt.c | 198 ++
  include/hw/arm/virt.h |  15 ++--
  2 files changed, 112 insertions(+), 101 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea2413a0ba..6d27f044fe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  }
  }
  
+static VirtGICType finalize_gic_version_do(const char *accel_name,

+   VirtGICType gic_version,
+   int gics_supported,
+   unsigned int max_cpus)
+{
+/* Convert host/max/nosel to GIC version number */
+switch (gic_version) {
+case VIRT_GIC_VERSION_HOST:
+if (!kvm_enabled()) {
+error_report("gic-version=host requires KVM");
+exit(1);
+}
+
+/* For KVM, gic-version=host means gic-version=max */
+return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+   gics_supported, max_cpus);

I think I'd still rather use /* fallthrough */ here, but let's leave
that decision to the maintainers.



I originally had a fallthrough here, then looked at the code and 
concluded for myself that I dislike fallthroughs :). They make more 
complicated code flows insanely complicated and are super error prone.



In any case,

Reviewed-by: Cornelia Huck 

[As an aside, we have a QEMU_FALLTHROUGH #define that maps to
__attribute__((fallthrough)) if available, but unlike the Linux kernel,
we didn't bother to convert everything to use it in QEMU. Should we?
Would using the attribute give us some extra benefits?]



IMHO we're be better off just refactoring code in ways that don't 
require fall-throughs. Modern compilers inline functions pretty well, so 
I think there's very little reason for them anymore.


Thanks a lot for the reviews!


Alex





[PATCH v3 0/2] hw/arm/virt: Handle HVF in finalize_gic_version()

2022-12-23 Thread Alexander Graf
The finalize_gic_version() function tries to determine which GIC version
the current accelerator / host combination supports. During the initial
HVF porting efforts, I didn't realize that I also had to touch this
function. Then Zenghui brought up this function as reply to my HVF GICv3
enablement patch - and boy it is a mess.

This patch set cleans up all of the GIC finalization so that we can
easily plug HVF in and also hopefully will have a better time extending
it in the future. As second step, it explicitly adds HVF support and
fails loudly for any unsupported accelerators.

Alex

v1 -> v2:

  - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation
  - Include TCG header for tcg_enabled()

v2 -> v3:

  - Fix comment
  - Flip kvm-enabled logic for host around

Alexander Graf (2):
  hw/arm/virt: Consolidate GIC finalize logic
  hw/arm/virt: Make accels in GIC finalize logic explicit

 hw/arm/virt.c | 200 ++
 include/hw/arm/virt.h |  15 ++--
 2 files changed, 115 insertions(+), 100 deletions(-)

-- 
2.37.1 (Apple Git-137.1)




[PATCH v3 2/2] hw/arm/virt: Make accels in GIC finalize logic explicit

2022-12-23 Thread Alexander Graf
Let's explicitly list out all accelerators that we support when trying to
determine the supported set of GIC versions. KVM was already separate, so
the only missing one is HVF which simply reuses all of TCG's emulation
code and thus has the same compatibility matrix.

Signed-off-by: Alexander Graf 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Cornelia Huck 

---

v1 -> v2:

  - Include TCG header for tcg_enabled()
---
 hw/arm/virt.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6d27f044fe..611f40c1da 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -47,6 +47,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/runstate.h"
 #include "sysemu/tpm.h"
+#include "sysemu/tcg.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
 #include "hw/loader.h"
@@ -1929,7 +1930,7 @@ static void finalize_gic_version(VirtMachineState *vms)
 /* KVM w/o kernel irqchip can only deal with GICv2 */
 gics_supported |= VIRT_GIC_VERSION_2_MASK;
 accel_name = "KVM with kernel-irqchip=off";
-} else {
+} else if (tcg_enabled() || hvf_enabled())  {
 gics_supported |= VIRT_GIC_VERSION_2_MASK;
 if (module_object_class_by_name("arm-gicv3")) {
 gics_supported |= VIRT_GIC_VERSION_3_MASK;
@@ -1938,6 +1939,9 @@ static void finalize_gic_version(VirtMachineState *vms)
 gics_supported |= VIRT_GIC_VERSION_4_MASK;
 }
 }
+} else {
+error_report("Unsupported accelerator, can not determine GIC support");
+exit(1);
 }
 
 /*
-- 
2.37.1 (Apple Git-137.1)




[PATCH v3 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-23 Thread Alexander Graf
Up to now, the finalize_gic_version() code open coded what is essentially
a support bitmap match between host/emulation environment and desired
target GIC type.

This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making
everything a 2 pass process. First, we determine which GIC versions the
current environment is able to support, then we go through a single
state machine to determine which target GIC mode that means for us.

After this patch, the only user noticable changes should be consolidated
error messages as well as TCG -M virt supporting -smp > 8 automatically.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation

v2 -> v3:

  - Fix comment
  - Flip kvm-enabled logic for host around
---
 hw/arm/virt.c | 198 ++
 include/hw/arm/virt.h |  15 ++--
 2 files changed, 112 insertions(+), 101 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea2413a0ba..6d27f044fe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 }
 }
 
+static VirtGICType finalize_gic_version_do(const char *accel_name,
+   VirtGICType gic_version,
+   int gics_supported,
+   unsigned int max_cpus)
+{
+/* Convert host/max/nosel to GIC version number */
+switch (gic_version) {
+case VIRT_GIC_VERSION_HOST:
+if (!kvm_enabled()) {
+error_report("gic-version=host requires KVM");
+exit(1);
+}
+
+/* For KVM, gic-version=host means gic-version=max */
+return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+   gics_supported, max_cpus);
+case VIRT_GIC_VERSION_MAX:
+if (gics_supported & VIRT_GIC_VERSION_4_MASK) {
+gic_version = VIRT_GIC_VERSION_4;
+} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+gic_version = VIRT_GIC_VERSION_3;
+} else {
+gic_version = VIRT_GIC_VERSION_2;
+}
+break;
+case VIRT_GIC_VERSION_NOSEL:
+if ((gics_supported & VIRT_GIC_VERSION_2_MASK) &&
+max_cpus <= GIC_NCPU) {
+gic_version = VIRT_GIC_VERSION_2;
+} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+/*
+ * in case the host does not support v2 emulation or
+ * the end-user requested more than 8 VCPUs we now default
+ * to v3. In any case defaulting to v2 would be broken.
+ */
+gic_version = VIRT_GIC_VERSION_3;
+} else if (max_cpus > GIC_NCPU) {
+error_report("%s only supports GICv2 emulation but more than 8 "
+ "vcpus are requested", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_2:
+case VIRT_GIC_VERSION_3:
+case VIRT_GIC_VERSION_4:
+break;
+}
+
+/* Check chosen version is effectively supported */
+switch (gic_version) {
+case VIRT_GIC_VERSION_2:
+if (!(gics_supported & VIRT_GIC_VERSION_2_MASK)) {
+error_report("%s does not support GICv2 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_3:
+if (!(gics_supported & VIRT_GIC_VERSION_3_MASK)) {
+error_report("%s does not support GICv3 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_4:
+if (!(gics_supported & VIRT_GIC_VERSION_4_MASK)) {
+error_report("%s does not support GICv4 emulation, is 
virtualization=on?",
+ accel_name);
+exit(1);
+}
+break;
+default:
+error_report("logic error in finalize_gic_version");
+exit(1);
+break;
+}
+
+return gic_version;
+}
+
 /*
  * finalize_gic_version - Determines the final gic_version
  * according to the gic-version property
@@ -1828,118 +1906,46 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  */
 static void finalize_gic_version(VirtMachineState *vms)
 {
+const char *accel_name = current_accel_name();
 unsigned int max_cpus = MACHINE(vms)->smp.max_cpus;
+int gics_supported = 0;
 
-if (kvm_enabled()) {
-int probe_bitmap;
+/* Determine which GIC versions the current environment supports */
+if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+int probe_bitmap = kvm_arm_vgic_

[PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16

2022-12-23 Thread Alexander Graf
While trying to make Windows work with GICv3 emulation, I stumbled over
the fact that it only supports ITT entry sizes that are power of 2 sized.

While the spec allows arbitrary sizes, in practice hardware will always
expose power of 2 sizes and so this limitation is not really a problem
in real world scenarios. However, we only expose a 12 byte ITT entry size
which makes Windows blue screen on boot.

The easy way to get around that problem is to bump the size to 16. That
is a power of 2, basically is what hardware would expose given the amount
of bits we need per entry and doesn't break any existing scenarios. To
play it safe, this patch set only bumps them on newer machine types.

Alexander Graf (2):
  hw/intc/arm_gicv3: Make ITT entry size configurable
  hw/intc/arm_gicv3: Bump ITT entry size to 16

 hw/core/machine.c  |  4 +++-
 hw/intc/arm_gicv3_its.c| 13 ++---
 hw/intc/gicv3_internal.h   |  2 +-
 include/hw/intc/arm_gicv3_its_common.h |  1 +
 4 files changed, 15 insertions(+), 5 deletions(-)

-- 
2.37.1 (Apple Git-137.1)




[PATCH 1/2] hw/intc/arm_gicv3: Make ITT entry size configurable

2022-12-23 Thread Alexander Graf
An ITT entry is opaque to the OS. The only thing it does get told by HW is
its size. In theory, that size can be any byte aligned number, in practice
HW will always use power of 2s to simplify offset calculation. We currently
expose the size as 12, which is not a power of 2.

To prepare for a future where we expose power of 2 sized entry sizes, let's
make the size itself configurable. We only need to watch out that we don't
have an entry be smaller than the fields we want to access inside. Bigger
is always fine.

Signed-off-by: Alexander Graf 
---
 hw/intc/arm_gicv3_its.c| 14 +++---
 hw/intc/gicv3_internal.h   |  2 +-
 include/hw/intc/arm_gicv3_its_common.h |  1 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 57c79da5c5..e7cabeb46c 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -215,7 +215,7 @@ static bool update_ite(GICv3ITSState *s, uint32_t eventid, 
const DTEntry *dte,
 {
 AddressSpace *as = &s->gicv3->dma_as;
 MemTxResult res = MEMTX_OK;
-hwaddr iteaddr = dte->ittaddr + eventid * ITS_ITT_ENTRY_SIZE;
+hwaddr iteaddr = dte->ittaddr + eventid * s->itt_entry_size;
 uint64_t itel = 0;
 uint32_t iteh = 0;
 
@@ -253,7 +253,7 @@ static MemTxResult get_ite(GICv3ITSState *s, uint32_t 
eventid,
 MemTxResult res = MEMTX_OK;
 uint64_t itel;
 uint32_t iteh;
-hwaddr iteaddr = dte->ittaddr + eventid * ITS_ITT_ENTRY_SIZE;
+hwaddr iteaddr = dte->ittaddr + eventid * s->itt_entry_size;
 
 itel = address_space_ldq_le(as, iteaddr, MEMTXATTRS_UNSPECIFIED, &res);
 if (res != MEMTX_OK) {
@@ -1934,6 +1934,12 @@ static void gicv3_arm_its_realize(DeviceState *dev, 
Error **errp)
 }
 }
 
+if (s->itt_entry_size < MIN_ITS_ITT_ENTRY_SIZE) {
+error_setg(errp, "ITT entry size must be at least %d",
+   MIN_ITS_ITT_ENTRY_SIZE);
+return;
+}
+
 gicv3_add_its(s->gicv3, dev);
 
 gicv3_its_init_mmio(s, &gicv3_its_control_ops, &gicv3_its_translation_ops);
@@ -1941,7 +1947,7 @@ static void gicv3_arm_its_realize(DeviceState *dev, Error 
**errp)
 /* set the ITS default features supported */
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, PHYSICAL, 1);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, ITT_ENTRY_SIZE,
-  ITS_ITT_ENTRY_SIZE - 1);
+  s->itt_entry_size - 1);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, IDBITS, ITS_IDBITS);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, DEVBITS, ITS_DEVBITS);
 s->typer = FIELD_DP64(s->typer, GITS_TYPER, CIL, 1);
@@ -2008,6 +2014,8 @@ static void gicv3_its_post_load(GICv3ITSState *s)
 static Property gicv3_its_props[] = {
 DEFINE_PROP_LINK("parent-gicv3", GICv3ITSState, gicv3, "arm-gicv3",
  GICv3State *),
+DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size,
+  MIN_ITS_ITT_ENTRY_SIZE),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 29d5cdc1b6..2aca1ba095 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -450,7 +450,7 @@ FIELD(VINVALL_1, VPEID, 32, 16)
  * the value of that field in memory cannot be relied upon -- older
  * versions of QEMU did not correctly write to that memory.)
  */
-#define ITS_ITT_ENTRY_SIZE0xC
+#define MIN_ITS_ITT_ENTRY_SIZE0xC
 
 FIELD(ITE_L, VALID, 0, 1)
 FIELD(ITE_L, INTTYPE, 1, 1)
diff --git a/include/hw/intc/arm_gicv3_its_common.h 
b/include/hw/intc/arm_gicv3_its_common.h
index a11a0f6654..e730a5482c 100644
--- a/include/hw/intc/arm_gicv3_its_common.h
+++ b/include/hw/intc/arm_gicv3_its_common.h
@@ -66,6 +66,7 @@ struct GICv3ITSState {
 int dev_fd; /* kvm device fd if backed by kvm vgic support */
 uint64_t gits_translater_gpa;
 bool translater_gpa_known;
+uint8_t itt_entry_size;
 
 /* Registers */
 uint32_t ctlr;
-- 
2.37.1 (Apple Git-137.1)




[PATCH 2/2] hw/intc/arm_gicv3: Bump ITT entry size to 16

2022-12-23 Thread Alexander Graf
Some Operating Systems (like Windows) can only deal with ITT entry sizes
that are a power of 2. While the spec allows arbitrarily sized ITT entry
sizes, in practice all hardware will use power of 2 because that
simplifies offset calculation and ensures that a power of 2 sized region
can hold a set of entries without gap at the end.

So let's just bump the entry size to 16. That gives us enough space for
the 12 bytes of data that we want to have in each ITT entry and makes
QEMU look a bit more like real hardware.

Signed-off-by: Alexander Graf 
---
 hw/core/machine.c   | 4 +++-
 hw/intc/arm_gicv3_its.c | 3 +--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index f589b92909..d9a3f01ed9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,7 +40,9 @@
 #include "hw/virtio/virtio-pci.h"
 #include "qom/object_interfaces.h"
 
-GlobalProperty hw_compat_7_2[] = {};
+GlobalProperty hw_compat_7_2[] = {
+{ "arm-gicv3-its", "itt-entry-size", "12" },
+};
 const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 
 GlobalProperty hw_compat_7_1[] = {
diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index e7cabeb46c..6754523321 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -2014,8 +2014,7 @@ static void gicv3_its_post_load(GICv3ITSState *s)
 static Property gicv3_its_props[] = {
 DEFINE_PROP_LINK("parent-gicv3", GICv3ITSState, gicv3, "arm-gicv3",
  GICv3State *),
-DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size,
-  MIN_ITS_ITT_ENTRY_SIZE),
+DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size, 16),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.37.1 (Apple Git-137.1)




[PATCH v2 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-21 Thread Alexander Graf
Up to now, the finalize_gic_version() code open coded what is essentially
a support bitmap match between host/emulation environment and desired
target GIC type.

This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making
everything a 2 pass process. First, we determine which GIC versions the
current environment is able to support, then we go through a single
state machine to determine which target GIC mode that means for us.

After this patch, the only user noticable changes should be consolidated
error messages as well as TCG -M virt supporting -smp > 8 automatically.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - leave VIRT_GIC_VERSION defines intact, we need them for MADT generation
---
 hw/arm/virt.c | 199 ++
 include/hw/arm/virt.h |  15 ++--
 2 files changed, 113 insertions(+), 101 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 04eb6c201d..7b54387958 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,6 +1820,85 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 }
 }
 
+static VirtGICType finalize_gic_version_do(const char *accel_name,
+   VirtGICType gic_version,
+   int gics_supported,
+   unsigned int max_cpus)
+{
+/* Convert host/max/nosel to GIC version number */
+switch (gic_version) {
+case VIRT_GIC_VERSION_HOST:
+if (kvm_enabled()) {
+/* For KVM, -cpu host means -cpu max */
+return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+   gics_supported, max_cpus);
+}
+
+error_report("gic-version=host requires KVM");
+exit(1);
+break;
+case VIRT_GIC_VERSION_MAX:
+if (gics_supported & VIRT_GIC_VERSION_4_MASK) {
+gic_version = VIRT_GIC_VERSION_4;
+} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+gic_version = VIRT_GIC_VERSION_3;
+} else {
+gic_version = VIRT_GIC_VERSION_2;
+}
+break;
+case VIRT_GIC_VERSION_NOSEL:
+if ((gics_supported & VIRT_GIC_VERSION_2_MASK) &&
+max_cpus <= GIC_NCPU) {
+gic_version = VIRT_GIC_VERSION_2;
+} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+/*
+ * in case the host does not support v2 emulation or
+ * the end-user requested more than 8 VCPUs we now default
+ * to v3. In any case defaulting to v2 would be broken.
+ */
+gic_version = VIRT_GIC_VERSION_3;
+} else if (max_cpus > GIC_NCPU) {
+error_report("%s only supports GICv2 emulation but more than 8 "
+ "vcpus are requested", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_2:
+case VIRT_GIC_VERSION_3:
+case VIRT_GIC_VERSION_4:
+break;
+}
+
+/* Check chosen version is effectively supported */
+switch (gic_version) {
+case VIRT_GIC_VERSION_2:
+if (!(gics_supported & VIRT_GIC_VERSION_2_MASK)) {
+error_report("%s does not support GICv2 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_3:
+if (!(gics_supported & VIRT_GIC_VERSION_3_MASK)) {
+error_report("%s does not support GICv3 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_4:
+if (!(gics_supported & VIRT_GIC_VERSION_4_MASK)) {
+error_report("%s does not support GICv4 emulation, is 
virtualization=on?",
+ accel_name);
+exit(1);
+}
+break;
+default:
+error_report("logic error in finalize_gic_version");
+exit(1);
+break;
+}
+
+return gic_version;
+}
+
 /*
  * finalize_gic_version - Determines the final gic_version
  * according to the gic-version property
@@ -1828,118 +1907,46 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  */
 static void finalize_gic_version(VirtMachineState *vms)
 {
+const char *accel_name = current_accel_name();
 unsigned int max_cpus = MACHINE(vms)->smp.max_cpus;
+int gics_supported = 0;
 
-if (kvm_enabled()) {
-int probe_bitmap;
-
-if (!kvm_irqchip_in_kernel()) {
-switch (vms->gic_version) {
-case VIRT_GIC_VERSION_HOST:
-warn_report(
-"gic-version=host not relevant with kernel-irqchip=off "
- 

[PATCH 0/2] hw/arm/virt: Handle HVF in finalize_gic_version()

2022-12-21 Thread Alexander Graf
The finalize_gic_version() function tries to determine which GIC version
the current accelerator / host combination supports. During the initial
HVF porting efforts, I didn't realize that I also had to touch this
function. Then Zenghui brought up this function as reply to my HVF GICv3
enablement patch - and boy it is a mess.

This patch set cleans up all of the GIC finalization so that we can
easily plug HVF in and also hopefully will have a better time extending
it in the future. As second step, it explicitly adds HVF support and
fails loudly for any unsupported accelerators.

Alex

v1 -> v2:

  - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation
  - Include TCG header for tcg_enabled()

Alexander Graf (2):
  hw/arm/virt: Consolidate GIC finalize logic
  hw/arm/virt: Make accels in GIC finalize logic explicit

 hw/arm/virt.c | 201 ++
 include/hw/arm/virt.h |  15 ++--
 2 files changed, 116 insertions(+), 100 deletions(-)

-- 
2.37.1 (Apple Git-137.1)




[PATCH v2 2/2] hw/arm/virt: Make accels in GIC finalize logic explicit

2022-12-21 Thread Alexander Graf
Let's explicitly list out all accelerators that we support when trying to
determine the supported set of GIC versions. KVM was already separate, so
the only missing one is HVF which simply reuses all of TCG's emulation
code and thus has the same compatibility matrix.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Include TCG header for tcg_enabled()
---
 hw/arm/virt.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7b54387958..76d8d5cc5a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -47,6 +47,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/runstate.h"
 #include "sysemu/tpm.h"
+#include "sysemu/tcg.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
 #include "hw/loader.h"
@@ -1930,7 +1931,7 @@ static void finalize_gic_version(VirtMachineState *vms)
 /* KVM w/o kernel irqchip can only deal with GICv2 */
 gics_supported |= VIRT_GIC_VERSION_2_MASK;
 accel_name = "KVM with kernel-irqchip=off";
-} else {
+} else if (tcg_enabled() || hvf_enabled())  {
 gics_supported |= VIRT_GIC_VERSION_2_MASK;
 if (module_object_class_by_name("arm-gicv3")) {
 gics_supported |= VIRT_GIC_VERSION_3_MASK;
@@ -1939,6 +1940,9 @@ static void finalize_gic_version(VirtMachineState *vms)
 gics_supported |= VIRT_GIC_VERSION_4_MASK;
 }
 }
+} else {
+error_report("Unsupported accelerator, can not determine GIC support");
+exit(1);
 }
 
 /*
-- 
2.37.1 (Apple Git-137.1)




Re: [PATCH 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-21 Thread Alexander Graf

Hey Zengui,

On 21.12.22 04:35, Zenghui Yu wrote:

On 2022/12/21 7:04, Alexander Graf wrote:

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index c7dd59d7f1..365d19f7a3 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -109,12 +109,12 @@ typedef enum VirtMSIControllerType {
 } VirtMSIControllerType;

 typedef enum VirtGICType {
-    VIRT_GIC_VERSION_MAX,
-    VIRT_GIC_VERSION_HOST,
-    VIRT_GIC_VERSION_2,
-    VIRT_GIC_VERSION_3,
-    VIRT_GIC_VERSION_4,
-    VIRT_GIC_VERSION_NOSEL,
+    VIRT_GIC_VERSION_MAX = 0,
+    VIRT_GIC_VERSION_HOST = 1,
+    VIRT_GIC_VERSION_NOSEL = 2,
+    VIRT_GIC_VERSION_2 = (1 << 2),
+    VIRT_GIC_VERSION_3 = (1 << 3),
+    VIRT_GIC_VERSION_4 = (1 << 4),


This would break the ACPI case. When building the MADT, we currently
write the raw vms->gic_version value into "GIC version" field of the
GICD structure. This happens to work because VIRT_GIC_VERSION_x == x (by
luck, I think). We may need to fix build_madt() before this change.



Ouch, thanks a lot for the catch! I don't think it's by luck - the 
versions are put very deliberately at a place where they equal the gic 
number. But I agree that it's missing a comment - I'll add one for 
clarification and make sure the defines looks explicit in v2.



Alex




[PATCH 2/2] hw/arm/virt: Make accels in GIC finalize logic explicit

2022-12-20 Thread Alexander Graf
Let's explicitly list out all accelerators that we support when trying to
determine the supported set of GIC versions. KVM was already separate, so
the only missing one is HVF which simply reuses all of TCG's emulation
code and thus has the same compatibility matrix.

Signed-off-by: Alexander Graf 
---
 hw/arm/virt.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c79f5b6a66..b7fb473788 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1929,7 +1929,7 @@ static void finalize_gic_version(VirtMachineState *vms)
 /* KVM w/o kernel irqchip can only deal with GICv2 */
 gics_supported |= VIRT_GIC_VERSION_2;
 accel_name = "KVM with kernel-irqchip=off";
-} else {
+} else if (tcg_enabled() || hvf_enabled())  {
 gics_supported |= VIRT_GIC_VERSION_2;
 if (module_object_class_by_name("arm-gicv3")) {
 gics_supported |= VIRT_GIC_VERSION_3;
@@ -1938,6 +1938,9 @@ static void finalize_gic_version(VirtMachineState *vms)
 gics_supported |= VIRT_GIC_VERSION_4;
 }
 }
+} else {
+error_report("Unsupported accelerator, can not determine GIC support");
+exit(1);
 }
 
 /*
-- 
2.37.1 (Apple Git-137.1)




[PATCH 1/2] hw/arm/virt: Consolidate GIC finalize logic

2022-12-20 Thread Alexander Graf
Up to now, the finalize_gic_version() code open coded what is essentially
a support bitmap match between host/emulation environment and desired
target GIC type.

This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making
everything a 2 pass process. First, we determine which GIC versions the
current environment is able to support, then we go through a single
state machine to determine which target GIC mode that means for us.

After this patch, the only user noticable changes should be consolidated
error messages as well as TCG -M virt supporting -smp > 8 automatically.

Signed-off-by: Alexander Graf 
---
 hw/arm/virt.c | 198 ++
 include/hw/arm/virt.h |  12 +--
 2 files changed, 108 insertions(+), 102 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 04eb6c201d..c79f5b6a66 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 }
 }
 
+static VirtGICType finalize_gic_version_do(const char *accel_name,
+   VirtGICType gic_version,
+   int gics_supported,
+   unsigned int max_cpus)
+{
+/* Convert host/max/nosel to GIC version number */
+switch (gic_version) {
+case VIRT_GIC_VERSION_HOST:
+if (kvm_enabled()) {
+/* For KVM, -cpu host means -cpu max */
+return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+   gics_supported, max_cpus);
+}
+
+error_report("gic-version=host requires KVM");
+exit(1);
+break;
+case VIRT_GIC_VERSION_MAX:
+if (gics_supported & VIRT_GIC_VERSION_4) {
+gic_version = VIRT_GIC_VERSION_4;
+} else if (gics_supported & VIRT_GIC_VERSION_3) {
+gic_version = VIRT_GIC_VERSION_3;
+} else {
+gic_version = VIRT_GIC_VERSION_2;
+}
+break;
+case VIRT_GIC_VERSION_NOSEL:
+if ((gics_supported & VIRT_GIC_VERSION_2) && max_cpus <= GIC_NCPU) {
+gic_version = VIRT_GIC_VERSION_2;
+} else if (gics_supported & VIRT_GIC_VERSION_3) {
+/*
+ * in case the host does not support v2 emulation or
+ * the end-user requested more than 8 VCPUs we now default
+ * to v3. In any case defaulting to v2 would be broken.
+ */
+gic_version = VIRT_GIC_VERSION_3;
+} else if (max_cpus > GIC_NCPU) {
+error_report("%s only supports GICv2 emulation but more than 8 "
+ "vcpus are requested", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_2:
+case VIRT_GIC_VERSION_3:
+case VIRT_GIC_VERSION_4:
+break;
+}
+
+/* Check chosen version is effectively supported */
+switch (gic_version) {
+case VIRT_GIC_VERSION_2:
+if (!(gics_supported & VIRT_GIC_VERSION_2)) {
+error_report("%s does not support GICv2 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_3:
+if (!(gics_supported & VIRT_GIC_VERSION_3)) {
+error_report("%s does not support GICv3 emulation", accel_name);
+exit(1);
+}
+break;
+case VIRT_GIC_VERSION_4:
+if (!(gics_supported & VIRT_GIC_VERSION_4)) {
+error_report("%s does not support GICv4 emulation, is 
virtualization=on?",
+ accel_name);
+exit(1);
+}
+break;
+default:
+error_report("logic error in finalize_gic_version");
+exit(1);
+break;
+}
+
+return gic_version;
+}
+
 /*
  * finalize_gic_version - Determines the final gic_version
  * according to the gic-version property
@@ -1828,118 +1906,46 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  */
 static void finalize_gic_version(VirtMachineState *vms)
 {
+const char *accel_name = current_accel_name();
 unsigned int max_cpus = MACHINE(vms)->smp.max_cpus;
+int gics_supported = 0;
 
-if (kvm_enabled()) {
-int probe_bitmap;
-
-if (!kvm_irqchip_in_kernel()) {
-switch (vms->gic_version) {
-case VIRT_GIC_VERSION_HOST:
-warn_report(
-"gic-version=host not relevant with kernel-irqchip=off "
- "as only userspace GICv2 is supported. Using v2 ...");
-return;
-case VIRT_GIC_VERSION_MAX:

[PATCH 0/2] hw/arm/virt: Handle HVF in finalize_gic_version()

2022-12-20 Thread Alexander Graf
The finalize_gic_version() function tries to determine which GIC version
the current accelerator / host combination supports. During the initial
HVF porting efforts, I didn't realize that I also had to touch this
function. Then Zenghui brought up this function as reply to my HVF GICv3
enablement patch - and boy it is a mess.

This patch set cleans up all of the GIC finalization so that we can
easily plug HVF in and also hopefully will have a better time extending
it in the future. As second step, it explicitly adds HVF support and
fails loudly for any unsupported accelerators.

Alex

Alexander Graf (2):
  hw/arm/virt: Consolidate GIC finalize logic
  hw/arm/virt: Make accels in GIC finalize logic explicit

 hw/arm/virt.c | 199 ++
 include/hw/arm/virt.h |  12 +--
 2 files changed, 110 insertions(+), 101 deletions(-)

-- 
2.37.1 (Apple Git-137.1)




Re: [PATCH 1/5] target/arm: only build psci for TCG

2022-12-20 Thread Alexander Graf



On 20.12.22 14:53, Fabiano Rosas wrote:

Alexander Graf  writes:


Hey Fabiano,

On 19.12.22 12:42, Fabiano Rosas wrote:

Claudio Fontana  writes:


Ciao Alex,

On 12/19/22 11:47, Alexander Graf wrote:

Hey Claudio,

On 19.12.22 09:37, Claudio Fontana wrote:

On 12/16/22 22:59, Alexander Graf wrote:

Hi Claudio,

If the PSCI implementation becomes TCG only, can we also move to a tcg accel 
directory? It slowly gets super confusing to keep track of which files are 
supposed to be generic target code and which ones TCG specific>
Alex

Hi Alex, Fabiano, Peter and all,

that was the plan but at the time of:

https://lore.kernel.org/all/20210416162824.25131-1-cfont...@suse.de/

Peter mentioned that HVF AArch64 might use that code too:

https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg00509.html

so from v2 to v3 the series changed to not move the code under tcg/ , expecting HVF to be 
reusing that code "soon".

I see that your hvf code in hvf/ implements psci, is there some plan to reuse 
pieces from the tcg implementation now?

I originally reused the PSCI code in earlier versions of my hvf patch
set, but then we realized that some functions like remote CPU reset are
wired in a TCG specific view of the world with full target CPU register
ownership. So if we want to actually share it, we'll need to abstract it
up a level.

Hence I'd suggest to move it to a TCG directory for now and then later
move it back into a generic helper if we want / need to. The code just
simply isn't generic yet.

Or alternatively, you create a patch (set) to actually merge the 2
implementations into a generic one again which then can live at a
generic place :)


Alex

Thanks for the clarification, I'll leave the choice up to Fabiano now, since he 
is working on the series currently :-)

Ciao,

Claudio

Hello, thank you all for the comments.

I like the idea of merging the two implementations. However, I won't get
to it anytime soon. There's still ~70 patches in the original series
that I need to understand, rebase and test, including the introduction
of the tcg directory.


Sure, I am definitely fine with leaving them separate for now as well :).



I'd say we merge this as is now, since this patch has no
dependencies. Later when I introduce the tcg directory I can move the
code there along with the other tcg-only files. I'll take note to come
back to the PSCI code as well.

I'm confused about the patch ordering :). Why is it easier to move the
psci.c compilation target from generic to an if(CONFIG_TCG) only to
later move it into a tcg/ directory?

It's a simple patch, so the overhead didn't cross my mind. But you are
right, this could go directly into tcg/ without having to put it under
CONFIG_TCG first.



I'm sure more like this will follow, and it will be a lot easier on 
everyone if the pattern is going to be "move tcg specific code to tcg/ 
and leave generic code in the main directory".






Wouldn't it be easier to create a
tcg/ directory from the start and just put it there?

I don't know about "from the start". At this point we cannot have a
single commit moving everything into the tcg/ directory because some
files still contain tcg + non-tcg code.



Yes, the only thing the initial commit at the start would do is create 
the directory and ninja config, pretty much nothing else. All follow-on 
commits then split the currently entangled code into tcg/ once code is 
clearly separate :).



I believe this was also the approach Claudio took in his patch set last 
year, and I find it very reasonable. It allows you to stop at any point 
mid-way.




We would end up with several
commits moving files into tcg/ along the history, which I think results
in a poor experience when inspecting the log later on (git blame and so
on). So my idea was to split as much as I can upfront and only later
move everything into the directory.



Quite the opposite: Please make sure to move everything slowly at a 
digestible pace. There is no way you will get 100 patches in at once. 
Make sure you can cut off at any point in between.


What you basically want is to move from "target/arm is tcg+generic code" 
to "target/arm is generic, target/arm/tcg is tcg code". You will be in a 
transitional phase along the way whatever you do, so just make it 
"target/arm is tcg+generic code, target/arm/tcg is tcg code" while 
things are in flight and have a final commit that indicates the 
conversion is done.




I'm also rebasing this series [1] from 2021, which means I'd rather have
small chunks of code moved under CONFIG_TCG that I can build-test with
--disable-tcg (even though the build doesn't finish, I can see the
number of errors going down), than to move non-tcg code into tcg/ and
then pull it out later like in the original series.



I think we're saying the same thing. Please don't move

Re: [PATCH] hvf: arm: Add support for GICv3

2022-12-19 Thread Alexander Graf

Hi Zenghui,

On 20.12.22 08:14, Zenghui Yu wrote:

On 2022/12/20 6:08, Alexander Graf wrote:

We currently only support GICv2 emulation.


Before looking into it, I think it's worth finalizing the GIC version in
the hvf case - only v2 is allowed and fail early if user selects the
unsupported versions. Currently finalize_gic_version() does not deal
with hvf at all.



Currently finalize_gic_version() treats HVF the same as TCG, which is 
incorrect. However, with this patch applied, they happen to match.


I don't think it's worth changing the finalize_gic_version() 
implementation to reflect the gicv2 only state for HVF. Instead, let's 
rather get GICv3 support in and then add explicit handling for HVF on top.


Alex




Re: [PATCH 1/5] target/arm: only build psci for TCG

2022-12-19 Thread Alexander Graf

Hey Fabiano,

On 19.12.22 12:42, Fabiano Rosas wrote:

Claudio Fontana  writes:


Ciao Alex,

On 12/19/22 11:47, Alexander Graf wrote:

Hey Claudio,

On 19.12.22 09:37, Claudio Fontana wrote:

On 12/16/22 22:59, Alexander Graf wrote:

Hi Claudio,

If the PSCI implementation becomes TCG only, can we also move to a tcg accel 
directory? It slowly gets super confusing to keep track of which files are 
supposed to be generic target code and which ones TCG specific>
Alex

Hi Alex, Fabiano, Peter and all,

that was the plan but at the time of:

https://lore.kernel.org/all/20210416162824.25131-1-cfont...@suse.de/

Peter mentioned that HVF AArch64 might use that code too:

https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg00509.html

so from v2 to v3 the series changed to not move the code under tcg/ , expecting HVF to be 
reusing that code "soon".

I see that your hvf code in hvf/ implements psci, is there some plan to reuse 
pieces from the tcg implementation now?

I originally reused the PSCI code in earlier versions of my hvf patch
set, but then we realized that some functions like remote CPU reset are
wired in a TCG specific view of the world with full target CPU register
ownership. So if we want to actually share it, we'll need to abstract it
up a level.

Hence I'd suggest to move it to a TCG directory for now and then later
move it back into a generic helper if we want / need to. The code just
simply isn't generic yet.

Or alternatively, you create a patch (set) to actually merge the 2
implementations into a generic one again which then can live at a
generic place :)


Alex

Thanks for the clarification, I'll leave the choice up to Fabiano now, since he 
is working on the series currently :-)

Ciao,

Claudio

Hello, thank you all for the comments.

I like the idea of merging the two implementations. However, I won't get
to it anytime soon. There's still ~70 patches in the original series
that I need to understand, rebase and test, including the introduction
of the tcg directory.



Sure, I am definitely fine with leaving them separate for now as well :).



I'd say we merge this as is now, since this patch has no
dependencies. Later when I introduce the tcg directory I can move the
code there along with the other tcg-only files. I'll take note to come
back to the PSCI code as well.


I'm confused about the patch ordering :). Why is it easier to move the 
psci.c compilation target from generic to an if(CONFIG_TCG) only to 
later move it into a tcg/ directory? Wouldn't it be easier to create a 
tcg/ directory from the start and just put it there?


The current approach just looks like duplicate effort to me.


Alex




[PATCH] hvf: arm: Add support for GICv3

2022-12-19 Thread Alexander Graf
We currently only support GICv2 emulation. To also support GICv3, we will
need to pass a few system registers into their respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3
system register handlers. This is safe because the GICv3 TCG code is generic
as long as we limit ourselves to EL0 and EL1 - which are the only modes
supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't
work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.

Signed-off-by: Alexander Graf 
---
 hw/intc/arm_gicv3_cpuif.c   |   8 +-
 target/arm/hvf/hvf.c| 151 
 target/arm/hvf/trace-events |   2 +
 3 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index b17b29288c..b4e387268c 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -21,6 +21,7 @@
 #include "hw/irq.h"
 #include "cpu.h"
 #include "target/arm/cpregs.h"
+#include "sysemu/tcg.h"
 
 /*
  * Special case return value from hppvi_index(); must be larger than
@@ -2810,6 +2811,8 @@ void gicv3_init_cpuif(GICv3State *s)
  * which case we'd get the wrong value.
  * So instead we define the regs with no ri->opaque info, and
  * get back to the GICv3CPUState from the CPUARMState.
+ *
+ * These CP regs callbacks can be called from either TCG or HVF code.
  */
 define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
 
@@ -2905,6 +2908,9 @@ void gicv3_init_cpuif(GICv3State *s)
 define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo);
 }
 }
-arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+if (tcg_enabled()) {
+/* We can only trap EL changes with TCG for now */
+arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+}
 }
 }
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 060aa0ccf4..8ea4be5f30 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -80,6 +80,33 @@
 #define SYSREG_PMCCNTR_EL0SYSREG(3, 3, 9, 13, 0)
 #define SYSREG_PMCCFILTR_EL0  SYSREG(3, 3, 14, 15, 7)
 
+#define SYSREG_ICC_AP0R0_EL1 SYSREG(3, 0, 12, 8, 4)
+#define SYSREG_ICC_AP0R1_EL1 SYSREG(3, 0, 12, 8, 5)
+#define SYSREG_ICC_AP0R2_EL1 SYSREG(3, 0, 12, 8, 6)
+#define SYSREG_ICC_AP0R3_EL1 SYSREG(3, 0, 12, 8, 7)
+#define SYSREG_ICC_AP1R0_EL1 SYSREG(3, 0, 12, 9, 0)
+#define SYSREG_ICC_AP1R1_EL1 SYSREG(3, 0, 12, 9, 1)
+#define SYSREG_ICC_AP1R2_EL1 SYSREG(3, 0, 12, 9, 2)
+#define SYSREG_ICC_AP1R3_EL1 SYSREG(3, 0, 12, 9, 3)
+#define SYSREG_ICC_ASGI1R_EL1SYSREG(3, 0, 12, 11, 6)
+#define SYSREG_ICC_BPR0_EL1  SYSREG(3, 0, 12, 8, 3)
+#define SYSREG_ICC_BPR1_EL1  SYSREG(3, 0, 12, 12, 3)
+#define SYSREG_ICC_CTLR_EL1  SYSREG(3, 0, 12, 12, 4)
+#define SYSREG_ICC_DIR_EL1   SYSREG(3, 0, 12, 11, 1)
+#define SYSREG_ICC_EOIR0_EL1 SYSREG(3, 0, 12, 8, 1)
+#define SYSREG_ICC_EOIR1_EL1 SYSREG(3, 0, 12, 12, 1)
+#define SYSREG_ICC_HPPIR0_EL1SYSREG(3, 0, 12, 8, 2)
+#define SYSREG_ICC_HPPIR1_EL1SYSREG(3, 0, 12, 12, 2)
+#define SYSREG_ICC_IAR0_EL1  SYSREG(3, 0, 12, 8, 0)
+#define SYSREG_ICC_IAR1_EL1  SYSREG(3, 0, 12, 12, 0)
+#define SYSREG_ICC_IGRPEN0_EL1   SYSREG(3, 0, 12, 12, 6)
+#define SYSREG_ICC_IGRPEN1_EL1   SYSREG(3, 0, 12, 12, 7)
+#define SYSREG_ICC_PMR_EL1   SYSREG(3, 0, 4, 6, 0)
+#define SYSREG_ICC_RPR_EL1   SYSREG(3, 0, 12, 11, 3)
+#define SYSREG_ICC_SGI0R_EL1 SYSREG(3, 0, 12, 11, 7)
+#define SYSREG_ICC_SGI1R_EL1 SYSREG(3, 0, 12, 11, 5)
+#define SYSREG_ICC_SRE_EL1   SYSREG(3, 0, 12, 12, 5)
+
 #define WFX_IS_WFE (1 << 0)
 
 #define TMR_CTL_ENABLE  (1 << 0)
@@ -788,6 +815,43 @@ static bool is_id_sysreg(uint32_t reg)
SYSREG_CRM(reg) < 8;
 }
 
+static uint32_t hvf_reg2cp_reg(uint32_t reg)
+{
+return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP,
+  (reg >> 10) & 0xf,
+  (reg >> 1) & 0xf,
+  (reg >> 20) & 0x3,
+  (reg >> 14) & 0x7,
+  (reg >> 17) & 0x7);
+}
+
+static bool hvf_sysreg_read_cp(CPUState *cpu, uint32_t reg, uint64_t *val)
+{
+ARMCPU *arm_cpu = ARM_CPU(cpu);
+CPUARMState *env = &arm_cpu->env;
+const ARMCPRegInfo *ri;
+
+ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg));
+if (ri) {
+if (ri->accessfn) {
+if (ri->accessfn(env, ri, true) != CP_ACCESS_OK) {
+return false;
+}
+}
+if (ri->type & ARM_CP_CONST) {
+*val = ri->resetvalue;
+} else if (ri->readfn) {
+ 

Re: [PATCH 1/5] target/arm: only build psci for TCG

2022-12-19 Thread Alexander Graf

Hey Claudio,

On 19.12.22 09:37, Claudio Fontana wrote:


On 12/16/22 22:59, Alexander Graf wrote:

Hi Claudio,

If the PSCI implementation becomes TCG only, can we also move to a tcg accel 
directory? It slowly gets super confusing to keep track of which files are 
supposed to be generic target code and which ones TCG specific>
Alex

Hi Alex, Fabiano, Peter and all,

that was the plan but at the time of:

https://lore.kernel.org/all/20210416162824.25131-1-cfont...@suse.de/

Peter mentioned that HVF AArch64 might use that code too:

https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg00509.html

so from v2 to v3 the series changed to not move the code under tcg/ , expecting HVF to be 
reusing that code "soon".

I see that your hvf code in hvf/ implements psci, is there some plan to reuse 
pieces from the tcg implementation now?


I originally reused the PSCI code in earlier versions of my hvf patch 
set, but then we realized that some functions like remote CPU reset are 
wired in a TCG specific view of the world with full target CPU register 
ownership. So if we want to actually share it, we'll need to abstract it 
up a level.


Hence I'd suggest to move it to a TCG directory for now and then later 
move it back into a generic helper if we want / need to. The code just 
simply isn't generic yet.


Or alternatively, you create a patch (set) to actually merge the 2 
implementations into a generic one again which then can live at a 
generic place :)



Alex





Re: [PATCH 1/5] target/arm: only build psci for TCG

2022-12-16 Thread Alexander Graf
Hi Claudio,

If the PSCI implementation becomes TCG only, can we also move to a tcg accel 
directory? It slowly gets super confusing to keep track of which files are 
supposed to be generic target code and which ones TCG specific.

Alex

> Am 16.12.2022 um 22:37 schrieb Fabiano Rosas :
> 
> From: Claudio Fontana 
> 
> Signed-off-by: Claudio Fontana 
> Cc: Alexander Graf 
> Reviewed-by: Richard Henderson 
> Reviewed-by: Alex Bennée 
> Signed-off-by: Fabiano Rosas 
> ---
> Originally from:
> [RFC v14 09/80] target/arm: only build psci for TCG
> https://lore.kernel.org/r/20210416162824.25131-10-cfont...@suse.de
> ---
> target/arm/meson.build | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/target/arm/meson.build b/target/arm/meson.build
> index 87e911b27f..26e425418f 100644
> --- a/target/arm/meson.build
> +++ b/target/arm/meson.build
> @@ -61,10 +61,13 @@ arm_softmmu_ss.add(files(
>   'arm-powerctl.c',
>   'machine.c',
>   'monitor.c',
> -  'psci.c',
>   'ptw.c',
> ))
> 
> +arm_softmmu_ss.add(when: 'CONFIG_TCG', if_true: files(
> +  'psci.c',
> +))
> +
> subdir('hvf')
> 
> target_arch += {'arm': arm_ss}
> -- 
> 2.35.3
> 



[PATCH 2/3] i386: kvm: Add support for MSR filtering

2022-10-04 Thread Alexander Graf
KVM has grown support to deflect arbitrary MSRs to user space since
Linux 5.10. For now we don't expect to make a lot of use of this
feature, so let's expose it the easiest way possible: With up to 16
individually maskable MSRs.

This patch adds a kvm_filter_msr() function that other code can call
to install a hook on KVM MSR reads or writes.

Signed-off-by: Alexander Graf 
---
 target/i386/kvm/kvm.c  | 124 +
 target/i386/kvm/kvm_i386.h |  11 
 2 files changed, 135 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a1fd1f5379..ea53092dd0 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -139,6 +139,8 @@ static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
 
+static KVMMSRHandlers msr_handlers[KVM_MSR_FILTER_MAX_RANGES];
+
 #define BUS_LOCK_SLICE_TIME 10ULL /* ns */
 static RateLimit bus_lock_ratelimit_ctrl;
 static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
@@ -2588,6 +2590,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0,
+KVM_MSR_EXIT_REASON_FILTER);
+if (ret) {
+error_report("Could not enable user space MSRs: %s",
+ strerror(-ret));
+exit(1);
+}
+}
+
 return 0;
 }
 
@@ -5077,6 +5089,108 @@ void kvm_arch_update_guest_debug(CPUState *cpu, struct 
kvm_guest_debug *dbg)
 }
 }
 
+static bool kvm_install_msr_filters(KVMState *s)
+{
+uint64_t zero = 0;
+struct kvm_msr_filter filter = {
+.flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
+};
+int r, i, j = 0;
+
+for (i = 0; i < KVM_MSR_FILTER_MAX_RANGES; i++) {
+KVMMSRHandlers *handler = &msr_handlers[i];
+if (handler->msr) {
+struct kvm_msr_filter_range *range = &filter.ranges[j++];
+
+*range = (struct kvm_msr_filter_range) {
+.flags = 0,
+.nmsrs = 1,
+.base = handler->msr,
+.bitmap = (__u8 *)&zero,
+};
+
+if (handler->rdmsr) {
+range->flags |= KVM_MSR_FILTER_READ;
+}
+
+if (handler->wrmsr) {
+range->flags |= KVM_MSR_FILTER_WRITE;
+}
+}
+}
+
+r = kvm_vm_ioctl(s, KVM_X86_SET_MSR_FILTER, &filter);
+if (r) {
+return false;
+}
+
+return true;
+}
+
+bool kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
+QEMUWRMSRHandler *wrmsr)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
+if (!msr_handlers[i].msr) {
+msr_handlers[i] = (KVMMSRHandlers) {
+.msr = msr,
+.rdmsr = rdmsr,
+.wrmsr = wrmsr,
+};
+
+if (!kvm_install_msr_filters(s)) {
+msr_handlers[i] = (KVMMSRHandlers) { };
+return false;
+}
+
+return true;
+}
+}
+
+return false;
+}
+
+static int kvm_handle_rdmsr(X86CPU *cpu, struct kvm_run *run)
+{
+int i;
+bool r;
+
+for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
+KVMMSRHandlers *handler = &msr_handlers[i];
+if (run->msr.index == handler->msr) {
+if (handler->rdmsr) {
+r = handler->rdmsr(cpu, handler->msr,
+   (uint64_t *)&run->msr.data);
+run->msr.error = r ? 0 : 1;
+return 0;
+}
+}
+}
+
+assert(false);
+}
+
+static int kvm_handle_wrmsr(X86CPU *cpu, struct kvm_run *run)
+{
+int i;
+bool r;
+
+for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) {
+KVMMSRHandlers *handler = &msr_handlers[i];
+if (run->msr.index == handler->msr) {
+if (handler->wrmsr) {
+r = handler->wrmsr(cpu, handler->msr, run->msr.data);
+run->msr.error = r ? 0 : 1;
+return 0;
+}
+}
+}
+
+assert(false);
+}
+
 static bool has_sgx_provisioning;
 
 static bool __kvm_enable_sgx_provisioning(KVMState *s)
@@ -5176,6 +5290,16 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run 
*run)
 /* already handled in kvm_arch_post_run */
 ret = 0;
 break;
+case KVM_EXIT_X86_RDMSR:
+/* We only enable MSR filtering, any other exit is bogus */
+assert(run->msr.reason == KVM_MSR_EXIT_REASON_FILTER);
+ret = kvm_handle_rdmsr(cpu, run);
+break;
+case KVM_EXIT_X86_WRMSR:
+/* We only enable MSR filtering, any other exit is bogus */
+assert(run->msr.re

[PATCH 3/3] KVM: x86: Implement MSR_CORE_THREAD_COUNT MSR

2022-10-04 Thread Alexander Graf
The MSR_CORE_THREAD_COUNT MSR describes CPU package topology, such as number
of threads and cores for a given package. This is information that QEMU has
readily available and can provide through the new user space MSR deflection
interface.

This patch propagates the existing hvf logic from patch 027ac0cb516
("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") to KVM.

Signed-off-by: Alexander Graf 
---
 target/i386/kvm/kvm.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ea53092dd0..791e995389 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2403,6 +2403,17 @@ static int kvm_get_supported_msrs(KVMState *s)
 return ret;
 }
 
+static bool kvm_rdmsr_core_thread_count(X86CPU *cpu, uint32_t msr,
+uint64_t *val)
+{
+CPUState *cs = CPU(cpu);
+
+*val = cs->nr_threads * cs->nr_cores; /* thread count, bits 15..0 */
+*val |= ((uint32_t)cs->nr_cores << 16); /* core count, bits 31..16 */
+
+return true;
+}
+
 static Notifier smram_machine_done;
 static KVMMemoryListener smram_listener;
 static AddressSpace smram_address_space;
@@ -2591,6 +2602,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 
 if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+bool r;
+
 ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0,
 KVM_MSR_EXIT_REASON_FILTER);
 if (ret) {
@@ -2598,6 +2611,14 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  strerror(-ret));
 exit(1);
 }
+
+r = kvm_filter_msr(s, MSR_CORE_THREAD_COUNT,
+   kvm_rdmsr_core_thread_count, NULL);
+if (!r) {
+error_report("Could not install MSR_CORE_THREAD_COUNT handler: %s",
+ strerror(-ret));
+exit(1);
+}
 }
 
 return 0;
-- 
2.37.0 (Apple Git-136)




[PATCH 0/3] Add TCG & KVM support for MSR_CORE_THREAD_COUNT

2022-10-04 Thread Alexander Graf
Commit 027ac0cb516 ("target/i386/hvf: add rdmsr 35H
MSR_CORE_THREAD_COUNT") added support for the MSR_CORE_THREAD_COUNT MSR
to HVF. This MSR is mandatory to execute macOS when run with -cpu
host,+hypervisor.

This patch set adds support for the very same MSR to TCG as well as
KVM - as long as host KVM is recent enough to support MSR trapping.

With this support added, I can successfully execute macOS guests in
KVM with an APFS enabled OVMF build, a valid applesmc plus OSK and

  -cpu Skylake-Client,+invtsc,+hypervisor


Alex

Alexander Graf (3):
  x86: Implement MSR_CORE_THREAD_COUNT MSR
  i386: kvm: Add support for MSR filtering
  KVM: x86: Implement MSR_CORE_THREAD_COUNT MSR

 target/i386/kvm/kvm.c| 145 +++
 target/i386/kvm/kvm_i386.h   |  11 ++
 target/i386/tcg/sysemu/misc_helper.c |   5 +
 3 files changed, 161 insertions(+)

-- 
2.37.0 (Apple Git-136)




[PATCH 1/3] x86: Implement MSR_CORE_THREAD_COUNT MSR

2022-10-04 Thread Alexander Graf
Intel CPUs starting with Haswell-E implement a new MSR called
MSR_CORE_THREAD_COUNT which exposes the number of threads and cores
inside of a package.

This MSR is used by XNU to populate internal data structures and not
implementing it prevents virtual machines with more than 1 vCPU from
booting if the emulated CPU generation is at least Haswell-E.

This patch propagates the existing hvf logic from patch 027ac0cb516
("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") to TCG.

Signed-off-by: Alexander Graf 
---
 target/i386/tcg/sysemu/misc_helper.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/i386/tcg/sysemu/misc_helper.c 
b/target/i386/tcg/sysemu/misc_helper.c
index 1328aa656f..e1528b7f80 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -450,6 +450,11 @@ void helper_rdmsr(CPUX86State *env)
  case MSR_IA32_UCODE_REV:
 val = x86_cpu->ucode_rev;
 break;
+case MSR_CORE_THREAD_COUNT: {
+CPUState *cs = CPU(x86_cpu);
+val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
+break;
+}
 default:
 if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
 && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
-- 
2.37.0 (Apple Git-136)




Re: [PATCH v2 03/11] target/arm: ensure HVF traps set appropriate MemTxAttrs

2022-09-26 Thread Alexander Graf



On 26.09.22 15:38, Alex Bennée wrote:

As most HVF devices are done purely in software we need to make sure
we properly encode the source CPU in MemTxAttrs. This will allow the
device emulations to use those attributes rather than relying on
current_cpu (although current_cpu will still be correct in this case).

Signed-off-by: Alex Bennée 
Cc: Mads Ynddal 
Cc: Alexander Graf 
---
  target/arm/hvf/hvf.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 060aa0ccf4..13b7971560 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -1233,11 +1233,11 @@ int hvf_vcpu_exec(CPUState *cpu)
  val = hvf_get_reg(cpu, srt);
  address_space_write(&address_space_memory,
  hvf_exit->exception.physical_address,
-MEMTXATTRS_UNSPECIFIED, &val, len);
+MEMTXATTRS_CPU(cpu->cpu_index), &val, len);



I think it would make a safer API if MEMTXATTRS_CPU() would take 
CPUState * as argument so you can just pass in cpu here.


For the HVF part however,

Acked-by: Alexander Graf 


Alex





  1   2   3   4   5   6   7   8   9   10   >