[PATCH v3] target-i386: Walk NPT in guest real mode
When translating a virtual address to a physical address with a guest
CPU that supports nested paging (NPT), we need to perform every page
table walk access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) specially: in that
case, we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform an NPT walk to do
GVA -> GPA -> HPA, which we fail to do so far.

The net result is that TCG VMs with NPT enabled that execute real mode
code (like SeaBIOS) end up with GPA==HPA mappings, which means the guest
accesses host code and data. This typically shows up as a failure to
boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation and then skip any logic that
requires an actual PTE. That way, all remaining logic to walk the NPT
stays and we successfully walk the NPT in real mode.

Fixes: fe441054bb3f0 ("target-i386: Add NPT support")
Signed-off-by: Alexander Graf
Reported-by: Eduard Vlad
Reviewed-by: Richard Henderson

---

v1 -> v2:

  - Remove hack where we fake a PTE and instead just set the
    corresponding resolved variables and jump straight to the stage2
    code.

v2 -> v3:

  - Fix comment
---
 target/i386/tcg/sysemu/excp_helper.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..24dd6935f9 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -298,7 +298,7 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         /* combine pde and pte nx, user and rw protections */
         ptep &= pte ^ PG_NX_MASK;
         page_size = 4096;
-    } else {
+    } else if (pg_mode) {
         /*
          * Page table level 2
          */
@@ -343,6 +343,15 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         ptep &= pte | PG_NX_MASK;
         page_size = 4096;
         rsvd_mask = 0;
+    } else {
+        /*
+         * No paging (real mode), let's tentatively resolve the address as 1:1
+         * here, but conditionally still perform an NPT walk on it later.
+         */
+        page_size = 0x40000000;
+        paddr = in->addr;
+        prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        goto stage2;
     }
 
 do_check_protect:
@@ -420,6 +429,7 @@ do_check_protect_pse36:
     /* merge offset within page */
     paddr = (pte & PG_ADDRESS_MASK & ~(page_size - 1))
             | (addr & (page_size - 1));
+stage2:
 
     /*
      * Note that NPT is walked (for both paging structures and final guest
@@ -562,7 +572,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
         addr = (uint32_t)addr;
     }
 
-    if (likely(env->cr[0] & CR0_PG_MASK)) {
+    if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
         in.cr3 = env->cr[3];
         in.mmu_idx = mmu_idx;
         in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;
-- 
2.40.1

Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Managing directors: Christian Schlaeger, Jonathan Weiss
Registered at Amtsgericht Charlottenburg under HRB 257764 B
Registered office: Berlin
VAT ID: DE 365 538 597
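The GVA -> GPA -> HPA flow that the commit message describes can be pictured with a toy model. This is only a sketch and not QEMU code: every `toy_*` name, the 4 KiB nested page size and the four-entry table are assumptions made up for illustration. With paging disabled, stage 1 is an identity mapping, and stage 2 still has to run whenever nested paging is in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not QEMU code: one nested page is 4 KiB. */
#define TOY_PAGE_SIZE 4096u
#define TOY_NPT_PAGES 4u

/* Toy NPT: guest-physical page number -> host-physical page number. */
static const uint64_t toy_npt[TOY_NPT_PAGES] = { 0x100, 0x2a0, 0x101, 0x3ff };

/* Stage 2: GPA -> HPA via the toy nested table; false models an NPT fault. */
static bool toy_stage2(uint64_t gpa, uint64_t *hpa)
{
    uint64_t gfn = gpa / TOY_PAGE_SIZE;

    if (gfn >= TOY_NPT_PAGES) {
        return false;
    }
    *hpa = toy_npt[gfn] * TOY_PAGE_SIZE + gpa % TOY_PAGE_SIZE;
    return true;
}

/* Stage 1 with guest paging disabled: the address resolves 1:1 to a GPA. */
static uint64_t toy_stage1_no_paging(uint64_t gva)
{
    return gva;
}

/*
 * Full translation for a guest without paging. Skipping stage 2 when
 * use_stage2 is true would hand the guest a GPA == HPA mapping, which is
 * the failure mode the commit message describes.
 */
static bool toy_translate(uint64_t gva, bool use_stage2, uint64_t *out)
{
    uint64_t gpa = toy_stage1_no_paging(gva);

    if (!use_stage2) {
        *out = gpa;
        return true;
    }
    return toy_stage2(gpa, out);
}
```

In this model, `toy_translate(0x1234, true, &hpa)` goes through entry 1 of the table, while the same call with `use_stage2` set to false returns the guest address unchanged.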
[PATCH v2] target-i386: Walk NPT in guest real mode
When translating a virtual address to a physical address with a guest
CPU that supports nested paging (NPT), we need to perform every page
table walk access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) specially: in that
case, we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform an NPT walk to do
GVA -> GPA -> HPA, which we fail to do so far.

The net result is that TCG VMs with NPT enabled that execute real mode
code (like SeaBIOS) end up with GPA==HPA mappings, which means the guest
accesses host code and data. This typically shows up as a failure to
boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation and then skip any logic that
requires an actual PTE. That way, all remaining logic to walk the NPT
stays and we successfully walk the NPT in real mode.

Fixes: fe441054bb3f0 ("target-i386: Add NPT support")
Signed-off-by: Alexander Graf
Reported-by: Eduard Vlad

---

v1 -> v2:

  - Remove hack where we fake a PTE and instead just set the
    corresponding resolved variables and jump straight to the stage2
    code.
---
 target/i386/tcg/sysemu/excp_helper.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..4622d45643 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -298,7 +298,7 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         /* combine pde and pte nx, user and rw protections */
         ptep &= pte ^ PG_NX_MASK;
         page_size = 4096;
-    } else {
+    } else if (pg_mode) {
         /*
          * Page table level 2
          */
@@ -343,6 +343,12 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         ptep &= pte | PG_NX_MASK;
         page_size = 4096;
         rsvd_mask = 0;
+    } else {
+        /* No paging (real mode), let's assemble a fake 1:1 1GiB PTE */
+        page_size = 0x40000000;
+        paddr = in->addr;
+        prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        goto stage2;
     }
 
 do_check_protect:
@@ -420,6 +426,7 @@ do_check_protect_pse36:
     /* merge offset within page */
     paddr = (pte & PG_ADDRESS_MASK & ~(page_size - 1))
             | (addr & (page_size - 1));
+stage2:
 
     /*
      * Note that NPT is walked (for both paging structures and final guest
@@ -562,7 +569,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
         addr = (uint32_t)addr;
     }
 
-    if (likely(env->cr[0] & CR0_PG_MASK)) {
+    if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
         in.cr3 = env->cr[3];
         in.mmu_idx = mmu_idx;
         in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;
-- 
2.40.1
Re: vm events, userspace, the vmgenid driver, and the future [was: the uevent revert thread]
On 19.09.24 00:27, Jason A. Donenfeld wrote:
> [broadened subject line and added relevant parties to cc list]
>
> On Tue, Sep 17, 2024 at 10:55:20PM +0200, Alexander Graf wrote:
>> What is still open are user space applications that require event
>> based notification on VM clone events - and *only* VM clone events.
>> This mostly caters for tools like systemd which need to execute
>> policy - such as generating randomly generated MAC addresses - in the
>> event a VM was cloned.
>>
>> That's the use case this patch "vmgenid: emit uevent when VMGENID
>> updates" is about and I think the best path forward is to just revert
>> the revert. A uevent from the device driver is a well established,
>> well fitting Linux mechanism for that type of notification.
>
> The thing that worries me is that vmgenid is just some weird random
> microsoft acpi driver. It's one sort of particular device, and not a
> very good one at that. There's still room for virtio/qemu to improve
> on it with their own thing, or for vbox or whatever else to have their
> version, and xen theirs, and so forth. That is to say, I'm not sure
> that this virtual hardware is *the* way of doing it.

I agree, but given that it's been a few years and nobody else really
came up with a different device, it means the current semantics for the
scope of what the device is doing are close to "good enough". So I don't
expect a lot of innovation here. And if there will be innovation - as
you point out - it will bring different semantics that will then also
require user space changes anyway.

> Even in terms of the entropy stuff (which I know you no longer care
> about, but I do), mst's original virtio-rng draft mentioned reporting
> events beyond just VM forks, extending it generically to any kind of
> entropy reduction situation. For example, migration or suspend or
> whatever might be interesting things to trigger. Heck, one could
> imagine those coming through vmgenid at some point, which would then
> change the semantics you're after for systemd.

If they come through vmgenid, it would need to gain a new type of event
at which point the uevent notification would also change. I'm also not
sure why live migration would trigger either a vm clone or any rng
relevant event. And suspend is something we already have the machinery
for to detect.

> Even in terms of reporting exclusively about external VM events,
> there's a subtle thing to consider between clones/forks and rollbacks,
> as well as migrations. Vmgenid kind of lumps it all together, and
> hopefully the

It's the opposite: VMGenID is exclusively concerned about clones. It
doesn't care about rollbacks. It doesn't care about migrations. Its
value effectively changes when you clone a VM; and only then.

> hypervisor notifies in a way consistent with what userspace was hoping
> to learn about. (Right now, maybe we're doing what Hyper-V does,
> maybe, but also maybe not; it's kind of loose.)
>
> So at some point, there's a question about the limitations of vmgenid
> and the possible extensions of it, or whether this will come in a
> different driver or virtual hardware, and how.

To me a lot of this is too vague to be actionable. Unless someone comes
in with real scenarios where they care about other scenarios, it sounds
to me like the one scenario that vmgenid covers is what system level
user space cares about.

If in a few years we realize that we need 3 different types of events,
we can start looking at ways to funnel those in a more abstract way.
Until then, because we don't know what these events will be, we can't
even design an API that would address them.

Keep in mind that we're not really talking here about building a generic
API for any random user space application. We only want to give system
software the ability to reason about system events. IMHO any more
abstract layer to funnel multiple different of these to downstream user
space (if we ever care) would be a user space problem to solve, like for
example a dbus event.

> Right now, this is mostly unexplored. The virtio-rng avenue was
> largest step in terms of exploring this problem space, but there are
> obviously a few directions to go, depending on what your primary
> concern is.
>
> But all of that makes me think that exposing the particulars of this
> virtual hardware driver to userspace is not the best option, or at
> least not an option to rush into (or to trick Greg into), and will
> both limit

I'm pretty sure I never tricked Greg into anything :)

> what we can do with it later, and potentially burden userspace with
> having to check multiple different things with confusing interactions
> down the road.
>
> So I think it's worth stepping back a bit and thinking

This interface here is only available to effectively udev/systemd type
software. Any abstraction above that should be on them. And if we
eventually decide that we need a better interface to generic use
[PATCH] target-i386: Walk NPT in guest real mode
When translating a virtual address to a physical address with a guest
CPU that supports nested paging (NPT), we need to perform every page
table walk access indirectly through the NPT, which we correctly do.

However, we treat real mode (no page table walk) specially: in that
case, we currently just skip any walks and translate VA -> PA. With NPT
enabled, we also need to then perform an NPT walk to do
GVA -> GPA -> HPA, which we fail to do so far.

The net result is that TCG VMs with NPT enabled that execute real mode
code (like SeaBIOS) end up with GPA==HPA mappings, which means the guest
accesses host code and data. This typically shows up as a failure to
boot guests.

This patch changes the page walk logic for NPT enabled guests so that we
always perform a GVA -> GPA translation, but simply provide a 1 GiB fake
PTE when in real mode. That way, all remaining logic to walk the NPT
stays and we successfully walk the NPT in real mode.

Fixes: fe441054bb3f0 ("target-i386: Add NPT support")
Signed-off-by: Alexander Graf
Reported-by: Eduard Vlad
---
 target/i386/tcg/sysemu/excp_helper.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..17f45431f6 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -298,7 +298,7 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         /* combine pde and pte nx, user and rw protections */
         ptep &= pte ^ PG_NX_MASK;
         page_size = 4096;
-    } else {
+    } else if (pg_mode) {
         /*
          * Page table level 2
          */
@@ -343,6 +343,12 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
         ptep &= pte | PG_NX_MASK;
         page_size = 4096;
         rsvd_mask = 0;
+    } else {
+        /* No paging (real mode), let's assemble a fake 1:1 1GiB PTE */
+        page_size = 0x40000000;
+        pte = (in->addr & ~(page_size - 1)) | PG_DIRTY_MASK | PG_ACCESSED_MASK;
+        ptep = PG_NX_MASK | PG_USER_MASK | PG_RW_MASK;
+        rsvd_mask = 0;
     }
 
 do_check_protect:
@@ -562,7 +568,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
         addr = (uint32_t)addr;
     }
 
-    if (likely(env->cr[0] & CR0_PG_MASK)) {
+    if (likely(env->cr[0] & CR0_PG_MASK || use_stage2)) {
         in.cr3 = env->cr[3];
         in.mmu_idx = mmu_idx;
         in.ptw_idx = use_stage2 ? MMU_NESTED_IDX : MMU_PHYS_IDX;
-- 
2.40.1
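The "fake 1:1 1GiB PTE" idea from this first version comes down to mask arithmetic, sketched below. This is an illustration rather than the patch's code: the `toy_*` names are hypothetical, the flag bits (dirty, accessed) are left out, and the page size is assumed to be the 1 GiB named in the comment, i.e. 0x40000000 bytes.

```c
#include <stdint.h>

/* 1 GiB = 2^30 bytes; assumed to be the page size the comment refers to. */
#define TOY_1GIB 0x40000000ull

/*
 * Frame part of a fake 1:1 PTE: the address rounded down to a 1 GiB
 * boundary (flag bits deliberately omitted in this sketch).
 */
static uint64_t toy_fake_pte_base(uint64_t addr)
{
    return addr & ~(TOY_1GIB - 1);
}

/*
 * Same shape as the "merge offset within page" step in the diff:
 * keep the frame bits from the PTE and the offset bits from the address.
 */
static uint64_t toy_merge_offset(uint64_t pte_base, uint64_t addr,
                                 uint64_t page_size)
{
    return (pte_base & ~(page_size - 1)) | (addr & (page_size - 1));
}
```

Because the frame and the offset are both taken from the same address, `toy_merge_offset(toy_fake_pte_base(a), a, TOY_1GIB)` gives back `a` for any address, which is what makes the fake PTE an identity mapping.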
Re: [PATCH v4 4/6] machine/nitro-enclave: Add built-in Nitro Secure Module device
On 19.08.24 17:28, Dorjoy Chowdhury wrote:
> Hey Alex,
>
> On Mon, Aug 19, 2024 at 4:13 PM Alexander Graf wrote:
>> Hey Dorjoy,
>>
>> On 18.08.24 13:42, Dorjoy Chowdhury wrote:
>>> AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device
>>> which is used for stripped down TPM functionality like attestation.
>>> This commit adds the built-in NSM device in the nitro-enclave machine
>>> type.
>>>
>>> In Nitro Enclaves, all the PCRs start in a known zero state and the
>>> first 16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2
>>> and PCR8 contain the SHA384 hashes related to the EIF file used to
>>> boot the VM for validation.
>>>
>>> Some optional nitro-enclave machine options have been added:
>>>   - 'id': Enclave identifier, reflected in the module-id of the NSM
>>>     device. If not provided, a default id will be set.
>>>   - 'parent-role': Parent instance IAM role ARN, reflected in PCR3
>>>     of the NSM device.
>>>   - 'parent-id': Parent instance identifier, reflected in PCR4 of the
>>>     NSM device.
>>>
>>> Signed-off-by: Dorjoy Chowdhury
>>> ---
>>>  crypto/meson.build  |  2 +-
>>>  crypto/x509-utils.c | 73 +++
>>
>> Can you please put this new API into its own patch file?
>>
>>>  hw/core/eif.c | 225 +---
>>>  hw/core/eif.h |   5 +-
>>
>> These changes to eif.c should ideally already be part of the patch
>> that introduces eif.c (patch 1), no? In fact, do you think you can
>> make the whole eif logic its own patch file?
>
> Good point. I guess it should be possible if I have the virtio-nsm
> device commit first and then add the machine/nitro-enclave commit with
> full support with the devices. That will of course make the
> machine/nitro-enclave commit larger. What do you think?

As long as nothing compiles the code, it can rely on not yet implemented
functions. So it's perfectly legit to add all your code in individual
commits and then at the end add the meson.build change that implements
the config option.

How about the order below?

  * Crypto patch for SHA384
  * Crypto patch for x509 fingerprint
  * NSM device emulation (including libcbor check, introduces
    CONFIG_VIRTIO_NSM)
  * EIF format parsing (not compiled yet)
  * Nitro Enclaves machine (introduces CONFIG_NITRO_ENCLAVE)
  * Nitro Enclaves docs

Alex
Re: [PATCH v4 4/6] machine/nitro-enclave: Add built-in Nitro Secure Module device
Hey Dorjoy,

On 18.08.24 13:42, Dorjoy Chowdhury wrote:
> AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
> is used for stripped down TPM functionality like attestation. This
> commit adds the built-in NSM device in the nitro-enclave machine type.
>
> In Nitro Enclaves, all the PCRs start in a known zero state and the
> first 16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2
> and PCR8 contain the SHA384 hashes related to the EIF file used to boot
> the VM for validation.
>
> Some optional nitro-enclave machine options have been added:
>   - 'id': Enclave identifier, reflected in the module-id of the NSM
>     device. If not provided, a default id will be set.
>   - 'parent-role': Parent instance IAM role ARN, reflected in PCR3 of
>     the NSM device.
>   - 'parent-id': Parent instance identifier, reflected in PCR4 of the
>     NSM device.
>
> Signed-off-by: Dorjoy Chowdhury
> ---
>  crypto/meson.build  |  2 +-
>  crypto/x509-utils.c | 73 +++

Can you please put this new API into its own patch file?

>  hw/core/eif.c | 225 +---
>  hw/core/eif.h |   5 +-

These changes to eif.c should ideally already be part of the patch that
introduces eif.c (patch 1), no? In fact, do you think you can make the
whole eif logic its own patch file?

>  hw/core/meson.build             |   4 +-
>  hw/i386/Kconfig                 |   1 +
>  hw/i386/nitro_enclave.c         | 141 +++-
>  include/crypto/x509-utils.h     |  22
>  include/hw/i386/nitro_enclave.h |  26
>  9 files changed, 479 insertions(+), 20 deletions(-)
>  create mode 100644 crypto/x509-utils.c
>  create mode 100644 include/crypto/x509-utils.h
>
> diff --git a/crypto/meson.build b/crypto/meson.build
> index c46f9c22a7..09633194ed 100644
> --- a/crypto/meson.build
> +++ b/crypto/meson.build
> @@ -62,7 +62,7 @@ endif
>  if gcrypt.found()
>    util_ss.add(gcrypt, files('random-gcrypt.c'))
>  elif gnutls.found()
> -  util_ss.add(gnutls, files('random-gnutls.c'))
> +  util_ss.add(gnutls, files('random-gnutls.c', 'x509-utils.c'))

What if we don't have gnutls. Will everything still compile or do we
need to add any dependencies?

>  elif get_option('rng_none')
>    util_ss.add(files('random-none.c'))
>  else
> diff --git a/crypto/x509-utils.c b/crypto/x509-utils.c
> new file mode 100644
> index 0000000000..2422eb995c
> --- /dev/null
> +++ b/crypto/x509-utils.c
> @@ -0,0 +1,73 @@
> +/*
> + * X.509 certificate related helpers
> + *
> + * Copyright (c) 2024 Dorjoy Chowdhury
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "crypto/x509-utils.h"
> +#include
> +#include
> +#include
> +
> +static int qcrypto_to_gnutls_hash_alg_map[QCRYPTO_HASH_ALG__MAX] = {
> +    [QCRYPTO_HASH_ALG_MD5] = GNUTLS_DIG_MD5,
> +    [QCRYPTO_HASH_ALG_SHA1] = GNUTLS_DIG_SHA1,
> +    [QCRYPTO_HASH_ALG_SHA224] = GNUTLS_DIG_SHA224,
> +    [QCRYPTO_HASH_ALG_SHA256] = GNUTLS_DIG_SHA256,
> +    [QCRYPTO_HASH_ALG_SHA384] = GNUTLS_DIG_SHA384,
> +    [QCRYPTO_HASH_ALG_SHA512] = GNUTLS_DIG_SHA512,
> +    [QCRYPTO_HASH_ALG_RIPEMD160] = GNUTLS_DIG_RMD160,
> +};
> +
> +int qcrypto_get_x509_cert_fingerprint(uint8_t *cert, size_t size,
> +                                      QCryptoHashAlgorithm alg,
> +                                      uint8_t **result,
> +                                      size_t *resultlen,
> +                                      Error **errp)
> +{
> +    int ret;
> +    gnutls_x509_crt_t crt;
> +    gnutls_datum_t datum = {.data = cert, .size = size};
> +
> +    if (alg >= G_N_ELEMENTS(qcrypto_to_gnutls_hash_alg_map)) {
> +        error_setg(errp, "Unknown hash algorithm");
> +        return -1;
> +    }
> +
> +    gnutls_x509_crt_init(&crt);
> +
> +    if (gnutls_x509_crt_import(crt, &datum, GNUTLS_X509_FMT_PEM) != 0) {
> +        error_setg(errp, "Failed to import certificate");
> +        goto cleanup;
> +    }
> +
> +    ret = gnutls_hash_get_len(qcrypto_to_gnutls_hash_alg_map[alg]);
> +    if (*resultlen == 0) {
> +        *resultlen = ret;
> +        *result = g_new0(uint8_t, *resultlen);
> +    } else if (*resultlen < ret) {
> +        error_setg(errp,
> +                   "Result buffer size %zu is smaller than hash %d",
> +                   *resultlen, ret);
> +        goto cleanup;
> +    }
> +
> +    if (gnutls_x509_crt_get_fingerprint(crt,
> +                                        qcrypto_to_gnutls_hash_alg_map[alg],
> +                                        *result, resultlen) != 0) {
> +        error_setg(errp, "Failed to get fingerprint from certificate");
> +        goto cleanup;
> +    }
> +
> +    return 0;
> +
> + cleanup:
> +    gnutls_x509_crt_deinit(crt);
> +    return -1;
> +}
> diff --git a/hw/core/eif.c b/hw/core/eif.c
> index 5558879a96..8e15142d36 100644
> --- a/hw/core/eif.c
> +++ b/hw/core/eif.c
> @@ -11,7 +11,10 @@
>  #include "qemu/osdep.h"
>  #include "qemu/bswap.h"
>  #include "qapi/error.h"
> +#include
Re: [PATCH v4 3/6] device/virtio-nsm: Support for Nitro Secure Module device
On 18.08.24 13:42, Dorjoy Chowdhury wrote:
> Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves for
> stripped down TPM functionality like cryptographic attestation. The
> requests to and responses from NSM device are CBOR[2] encoded.
>
> This commit adds support for NSM device in QEMU. Although related to
> AWS Nitro Enclaves, the virtio-nsm device is independent and can be
> used in other machine types as well. The libcbor[3] library has been
> used for the CBOR encoding and decoding functionalities.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
> [2] http://cbor.io/
> [3] https://libcbor.readthedocs.io/en/latest/
>
> Signed-off-by: Dorjoy Chowdhury

[...]

> +static bool add_payload_to_cose(cbor_item_t *cose, VirtIONSM *vnsm)
> +{
> +    cbor_item_t *root = NULL;
> +    cbor_item_t *nested_map;
> +    cbor_item_t *bs = NULL;
> +    size_t locked_cnt;
> +    uint8_t ind[NSM_MAX_PCRS];
> +    size_t payload_map_size = 6;
> +    size_t len;
> +    struct PCRInfo *pcr;
> +    uint8_t zero[64] = {0};
> +    bool r = false;
> +    size_t buf_len = 16384;
> +    uint8_t *buf = g_malloc(buf_len);
> +
> +    if (vnsm->public_key_len > 0) {
> +        payload_map_size++;
> +    }
> +    if (vnsm->user_data_len > 0) {
> +        payload_map_size++;
> +    }
> +    if (vnsm->nonce_len > 0) {
> +        payload_map_size++;
> +    }

Now that you're always emitting user_data and nonce, you should include
them in payload_map_size unconditionally as well; otherwise your map is
too small to hold all members.

In addition, a real Nitro Enclave attestation document will return Null
objects for these fields when they're not set instead of empty strings.
With the patch below I was able to generate a doc that looks very
similar to a real one:

diff --git a/hw/virtio/cbor-helpers.c b/hw/virtio/cbor-helpers.c
index 5140020d4e..ffecc97c48 100644
--- a/hw/virtio/cbor-helpers.c
+++ b/hw/virtio/cbor-helpers.c
@@ -140,7 +140,11 @@ bool qemu_cbor_add_bytestring_to_map(cbor_item_t *map, const char *key,
     if (!key_cbor) {
         goto cleanup;
     }
-    value_cbor = cbor_build_bytestring(arr, len);
+    if (len) {
+        value_cbor = cbor_build_bytestring(arr, len);
+    } else {
+        value_cbor = cbor_new_null();
+    }
     if (!value_cbor) {
         goto cleanup;
     }
@@ -241,7 +245,11 @@ bool qemu_cbor_add_uint8_key_bytestring_to_map(cbor_item_t *map, uint8_t key,
     if (!key_cbor) {
         goto cleanup;
     }
-    value_cbor = cbor_build_bytestring(buf, len);
+    if (len) {
+        value_cbor = cbor_build_bytestring(buf, len);
+    } else {
+        value_cbor = cbor_new_null();
+    }
     if (!value_cbor) {
         goto cleanup;
     }
diff --git a/hw/virtio/virtio-nsm.c b/hw/virtio/virtio-nsm.c
index e91848a2b0..b45d97efe2 100644
--- a/hw/virtio/virtio-nsm.c
+++ b/hw/virtio/virtio-nsm.c
@@ -1126,7 +1126,7 @@ static bool add_payload_to_cose(cbor_item_t *cose, VirtIONSM *vnsm)
     cbor_item_t *bs = NULL;
     size_t locked_cnt;
     uint8_t ind[NSM_MAX_PCRS];
-    size_t payload_map_size = 6;
+    size_t payload_map_size = 8;
     size_t len;
     struct PCRInfo *pcr;
     uint8_t zero[64] = {0};
@@ -1137,12 +1137,6 @@ static bool add_payload_to_cose(cbor_item_t *cose, VirtIONSM *vnsm)
     if (vnsm->public_key_len > 0) {
         payload_map_size++;
     }
-    if (vnsm->user_data_len > 0) {
-        payload_map_size++;
-    }
-    if (vnsm->nonce_len > 0) {
-        payload_map_size++;
-    }
     root = cbor_new_definite_map(payload_map_size);
     if (!root) {
         goto cleanup;

Alex
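Why the map size has to count fields that are always emitted is easier to see at the byte level. The sketch below hand-encodes a two-entry CBOR map without libcbor, following the RFC 8949 initial-byte layout for short items; the `toy_*` and `put_*` names are hypothetical, only lengths below 24 are handled, and this is not how virtio-nsm builds its document.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Short-form CBOR heads (count or length < 24), per RFC 8949. */
static size_t put_map_head(uint8_t *out, uint8_t pairs)
{
    out[0] = 0xa0 | pairs;          /* major type 5: map of `pairs` entries */
    return 1;
}

static size_t put_text(uint8_t *out, const char *s)
{
    size_t n = strlen(s);

    out[0] = 0x60 | (uint8_t)n;     /* major type 3: text string */
    memcpy(out + 1, s, n);
    return n + 1;
}

static size_t put_bytes_or_null(uint8_t *out, const uint8_t *buf, size_t len)
{
    if (len == 0) {
        out[0] = 0xf6;              /* CBOR null instead of an empty string */
        return 1;
    }
    out[0] = 0x40 | (uint8_t)len;   /* major type 2: byte string */
    memcpy(out + 1, buf, len);
    return len + 1;
}

/*
 * Encode {"user_data": ..., "nonce": ...}. Both keys are always written,
 * so the definite-length head must always announce two pairs, whether or
 * not the values are null. Returns 0 for lengths this sketch cannot encode.
 */
static size_t toy_encode_optional_fields(uint8_t *out,
                                         const uint8_t *user_data, size_t ud_len,
                                         const uint8_t *nonce, size_t nonce_len)
{
    size_t pos = 0;

    if (ud_len >= 24 || nonce_len >= 24) {
        return 0;
    }
    pos += put_map_head(out + pos, 2);
    pos += put_text(out + pos, "user_data");
    pos += put_bytes_or_null(out + pos, user_data, ud_len);
    pos += put_text(out + pos, "nonce");
    pos += put_bytes_or_null(out + pos, nonce, nonce_len);
    return pos;
}
```

If the head announced one pair while two were written, a decoder would stop after the first pair and treat the rest as trailing data, which is the byte-level counterpart of a map that is "too small to hold all members".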
Re: [PATCH v3 2/5] machine/nitro-enclave: Add vhost-user-vsock device
On 13.08.24 20:02, Dorjoy Chowdhury wrote:
> On Mon, Aug 12, 2024 at 8:24 PM Daniel P. Berrangé wrote:
>> On Sat, Aug 10, 2024 at 10:44:59PM +0600, Dorjoy Chowdhury wrote:
>>> AWS Nitro Enclaves have built-in vhost-vsock device support which
>>> enables applications in enclave VMs to communicate with the parent
>>> EC2 VM over vsock. The enclave VMs have dynamic CID while the parent
>>> always has CID 3. In QEMU, the vsock emulation for nitro enclave is
>>> added using vhost-user-vsock as opposed to vhost-vsock. vhost-vsock
>>> doesn't support sibling VM communication which is needed for nitro
>>> enclaves.
>>>
>>> In QEMU's nitro-enclave emulation, for the vsock communication to CID
>>> 3 to work, another process that does the vsock emulation in userspace
>>> must be run, for example, vhost-device-vsock[1] from rust-vmm, with
>>> necessary vsock communication support in another guest VM with CID 3.
>>> A new mandatory nitro-enclave machine option 'vsock' has been added.
>>> The value for this option should be the chardev id from the
>>> '-chardev' option for the vhost-user-vsock device to work.
>>>
>>> Using vhost-user-vsock also enables the possibility to implement some
>>> proxying support in the vhost-user-vsock daemon that will forward all
>>> the packets to the host machine instead of CID 3 so that users of
>>> nitro-enclave can run the necessary applications in their host
>>> machine instead of running another whole VM with CID 3.
>>>
>>> [1] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>
>>> Signed-off-by: Dorjoy Chowdhury
>>> ---
>>>  backends/hostmem-memfd.c        |   2 -
>>>  hw/core/machine.c               |  71 +-
>>>  hw/i386/Kconfig                 |   1 +
>>>  hw/i386/nitro_enclave.c         | 123
>>>  include/hw/boards.h             |   2 +
>>>  include/hw/i386/nitro_enclave.h |   8 +++
>>>  include/sysemu/hostmem.h        |   2 +
>>>  7 files changed, 174 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/hw/i386/nitro_enclave.c b/hw/i386/nitro_enclave.c
>>> index 98690c6373..280ab4cc9b 100644
>>> --- a/hw/i386/nitro_enclave.c
>>> +++ b/hw/i386/nitro_enclave.c
>>> @@ -11,11 +11,81 @@
>>>  #include "qemu/osdep.h"
>>>  #include "qemu/error-report.h"
>>>  #include "qapi/error.h"
>>> +#include "qom/object_interfaces.h"
>>> +#include "chardev/char.h"
>>> +#include "hw/sysbus.h"
>>>  #include "hw/core/eif.h"
>>>  #include "hw/i386/x86.h"
>>>  #include "hw/i386/microvm.h"
>>>  #include "hw/i386/nitro_enclave.h"
>>> +#include "hw/virtio/virtio-mmio.h"
>>> +#include "hw/virtio/vhost-user-vsock.h"
>>> +#include "sysemu/hostmem.h"
>>> +
>>> +static BusState *find_free_virtio_mmio_bus(void)
>>> +{
>>> +    BusChild *kid;
>>> +    BusState *bus = sysbus_get_default();
>>> +
>>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>>> +        DeviceState *dev = kid->child;
>>> +        if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MMIO)) {
>>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>>> +            if (QTAILQ_EMPTY(&mmio_bus->children)) {
>>> +                return mmio_bus;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static void vhost_user_vsock_init(NitroEnclaveMachineState *nems)
>>> +{
>>> +    DeviceState *dev = qdev_new(TYPE_VHOST_USER_VSOCK);
>>> +    VHostUserVSock *vsock = VHOST_USER_VSOCK(dev);
>>> +    BusState *bus;
>>> +
>>> +    if (!nems->vsock) {
>>> +        error_report("A valid chardev id for vhost-user-vsock device must be "
>>> +                     "provided using the 'vsock' machine option");
>>> +        exit(1);
>>> +    }
>>> +
>>> +    bus = find_free_virtio_mmio_bus();
>>> +    if (!bus) {
>>> +        error_report("Failed to find bus for vhost-user-vsock device");
>>> +        exit(1);
>>> +    }
>>> +
>>> +    Chardev *chardev = qemu_chr_find(nems->vsock);
>>> +    if (!chardev) {
>>> +        error_report("Failed to find chardev with id %s", nems->vsock);
>>> +        exit(1);
>>> +    }
>>> +
>>> +    vsock->conf.chardev.chr = chardev;
>>> +
>>> +    qdev_realize_and_unref(dev, bus, &error_fatal);
>>> +}
>>
>> Why does this machine need to create the vhost-user-vsock device
>> itself ? Doing it this way prevents the mgmt app from changing any of
>> the other vsock device settings beyond 'chardev'. The entity creating
>> QEMU can use -device to create the vsock device.
>
> Hi Daniel. Good point. The reason to make the vhost-user-vsock device
> built-in is because nitro VMs will always need it anyway (like the
> virtio-nsm device which is built-in too). When an EIF image is built
> using nitro-cli the "init" process in the EIF image tries to connect
> to (AF_VSOCK, CID 3, port 9000) to send a heartbeat and expects a
> heartbeat reply. So my understanding is that if we don't create it
> inside the machine code itself, users would always need to provide the
> explicit options for the device anyway. But as you point out this also
> makes the device settings non-configurable.
>
> Hey Alex, do you have any suggestions on this?

IMHO devices that are required for the machine to function should be
part of the machine. Since vsock is a core part of the Nitro Enclave, it
should be part of the machine definitio
Re: [PATCH v3 4/5] machine/nitro-enclave: Add built-in Nitro Secure Module device
On 10.08.24 18:45, Dorjoy Chowdhury wrote:
> AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
> is used for stripped down TPM functionality like attestation. This
> commit adds the built-in NSM device in the nitro-enclave machine type.
>
> In Nitro Enclaves, all the PCRs start in a known zero state and the
> first 16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2
> and PCR8 contain the SHA384 hashes related to the EIF file used to boot
> the VM for validation.
>
> A new optional nitro-enclave machine option 'id' has been added which
> will be the enclave identifier reflected in the module-id of the NSM
> device. Otherwise, the device will have a default id set.
>
> Signed-off-by: Dorjoy Chowdhury
> ---
>  hw/core/eif.c                   | 205 +++-
>  hw/core/eif.h                   |   5 +-
>  hw/core/meson.build             |   4 +-
>  hw/i386/Kconfig                 |   1 +
>  hw/i386/nitro_enclave.c         |  85 -
>  include/hw/i386/nitro_enclave.h |  19 +++
>  6 files changed, 310 insertions(+), 9 deletions(-)

[...]

> @@ -87,10 +106,46 @@ static void nitro_enclave_machine_state_init(MachineState *machine)
>      nitro_enclave_devices_init(ne_state);
>  }
>
> +static void nitro_enclave_machine_reset(MachineState *machine,
> +                                        ShutdownCause reason)
> +{
> +    NitroEnclaveMachineClass *ne_class =
> +        NITRO_ENCLAVE_MACHINE_GET_CLASS(machine);
> +    NitroEnclaveMachineState *ne_state = NITRO_ENCLAVE_MACHINE(machine);
> +
> +    ne_class->parent_reset(machine, reason);
> +
> +    memset(ne_state->vnsm->pcrs, 0, sizeof(ne_state->vnsm->pcrs));
> +
> +    /* PCR0 */
> +    ne_state->vnsm->extend_pcr(ne_state->vnsm, 0, ne_state->image_sha384,
> +                               SHA384_BYTE_LEN);
> +    /* PCR1 */
> +    ne_state->vnsm->extend_pcr(ne_state->vnsm, 1, ne_state->bootstrap_sha384,
> +                               SHA384_BYTE_LEN);
> +    /* PCR2 */
> +    ne_state->vnsm->extend_pcr(ne_state->vnsm, 2, ne_state->app_sha384,
> +                               SHA384_BYTE_LEN);

What about PCR3 and PCR4? Both are just sha384 values of input
strings[1]. Can you make these input strings NSM device as well as
machine properties as well?

[1] https://docs.aws.amazon.com/enclaves/latest/user/set-up-attestation.html

Alex
Re: [PATCH v3 3/5] device/virtio-nsm: Support for Nitro Secure Module device
On 10.08.24 18:45, Dorjoy Chowdhury wrote: Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves for stripped down TPM functionality like cryptographic attestation. The requests to and responses from NSM device are CBOR[2] encoded. This commit adds support for NSM device in QEMU. Although related to AWS Nitro Enclaves, the virito-nsm device is independent and can be used in other machine types as well. The libcbor[3] library has been used for the CBOR encoding and decoding functionalities. [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html [2] http://cbor.io/ [3] https://libcbor.readthedocs.io/en/latest/ Signed-off-by: Dorjoy Chowdhury --- MAINTAINERS|8 + hw/virtio/Kconfig |5 + hw/virtio/meson.build |4 + hw/virtio/virtio-nsm-pci.c | 73 ++ hw/virtio/virtio-nsm.c | 1929 include/hw/virtio/virtio-nsm.h | 59 + 6 files changed, 2078 insertions(+) create mode 100644 hw/virtio/virtio-nsm-pci.c create mode 100644 hw/virtio/virtio-nsm.c create mode 100644 include/hw/virtio/virtio-nsm.h [...] 
+ +/* + * Attestation request structure: + * + * Map(1) { + * key = String("Attestation"), + * value = Map(3) { + * key = String("user_data"), + * value = Byte_String() || null, + * key = String("nonce"), + * value = Byte_String() || null, + * key = String("public_key"), + * value = Byte_String() || null, + * } + * } + * } + */ +typedef struct NSMAttestationReq { +uint16_t user_data_len; +uint8_t user_data[NSM_USER_DATA_MAX_SIZE]; + +uint16_t nonce_len; +uint8_t nonce[NSM_NONCE_MAX_SIZE]; + +uint16_t public_key_len; +uint8_t public_key[NSM_PUBLIC_KEY_MAX_SIZE]; +} NSMAttestationReq; + +static enum NSMResponseTypes get_nsm_attestation_req(uint8_t *req, size_t len, + NSMAttestationReq *nsm_req) +{ +cbor_item_t *item = NULL; +size_t size; +uint8_t *str; +struct cbor_pair *pair; +struct cbor_load_result result; +enum NSMResponseTypes r = NSM_INVALID_ARGUMENT; + +item = cbor_load(req, len, &result); +if (!item || result.error.code != CBOR_ERR_NONE) { +goto cleanup; +} + +pair = cbor_map_handle(item); +if (!cbor_isa_map(pair->value) || cbor_map_size(pair->value) != 3) { +goto cleanup; +} +pair = cbor_map_handle(pair->value); +if (!cbor_isa_string(pair->key)) { +goto cleanup; +} +str = cbor_string_handle(pair->key); +size = cbor_string_length(pair->key); +if (!str || size != 9 || memcmp(str, "user_data", 9) != 0) { +goto cleanup; +} + +if (cbor_isa_bytestring(pair->value)) { +str = cbor_bytestring_handle(pair->value); +size = cbor_bytestring_length(pair->value); +if (!str || size == 0) { +goto cleanup; +} +if (size > NSM_USER_DATA_MAX_SIZE) { +r = NSM_INPUT_TOO_LARGE; +goto cleanup; +} +memcpy(nsm_req->user_data, str, size); +nsm_req->user_data_len = size; +} else if (cbor_is_null(pair->value)) { +nsm_req->user_data_len = 0; +} else { +goto cleanup; +} + +/* let's move forward */ +pair++; +if (!cbor_isa_string(pair->key)) { +goto cleanup; +} +str = cbor_string_handle(pair->key); +size = cbor_string_length(pair->key); +if (!str || size != 5 || memcmp(str, "nonce", 5) != 0) 
{ +goto cleanup; +} + +if (cbor_isa_bytestring(pair->value)) { +str = cbor_bytestring_handle(pair->value); +size = cbor_bytestring_length(pair->value); +if (!str || size == 0) { +goto cleanup; +} +if (size > NSM_NONCE_MAX_SIZE) { +r = NSM_INPUT_TOO_LARGE; +goto cleanup; +} +memcpy(nsm_req->nonce, str, size); +nsm_req->nonce_len = size; +} else if (cbor_is_null(pair->value)) { +nsm_req->nonce_len = 0; +} else { +goto cleanup; +} + +/* let's move forward */ +pair++; +if (!cbor_isa_string(pair->key)) { +goto cleanup; +} +str = cbor_string_handle(pair->key); +size = cbor_string_length(pair->key); +if (!str || size != 10 || memcmp(str, "public_key", 10) != 0) { +goto cleanup; +} + +if (cbor_isa_bytestring(pair->value)) { +str = cbor_bytestring_handle(pair->value); +size = cbor_bytestring_length(pair->value); +if (!str || size == 0) { +goto cleanup; +} +if (size > NSM_PUBLIC_KEY_MAX_SIZE) { +r = NSM_INPUT_TOO_LARGE; +goto cleanup; +} +memcpy(nsm_req->public_key, str, size); +nsm_req->public_key_len = size; +} else if (cbor_is_null(pair->value)) { +nsm_req->public_key_len = 0; +} else { +goto
Re: [PATCH v3 1/5] machine/nitro-enclave: New machine type for AWS Nitro Enclaves
On 10.08.24 18:44, Dorjoy Chowdhury wrote: AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating isolated execution environments, called enclaves, from Amazon EC2 instances which are used for processing highly sensitive data. Enclaves have no persistent storage and no external networking. The enclave VMs are based on Firecracker microvm with a vhost-vsock device for communication with the parent EC2 instance that spawned it and a Nitro Secure Module (NSM) device for cryptographic attestation. The parent instance VM always has CID 3 while the enclave VM gets a dynamic CID. An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro enclave virtual machine. The EIF file contains the necessary kernel, cmdline, ramdisk(s) sections to boot. This commit adds support for limited AWS nitro enclave emulation using a new machine type option '-M nitro-enclave'. This new machine type is based on the 'microvm' machine type, similar to how real nitro enclave VMs are based on Firecracker microvm. For nitro-enclave to boot from an EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel and a temporary initrd file which are then hooked into the regular x86 boot mechanism along with the extracted cmdline. The EIF file path should be provided using the '-kernel' QEMU option. The vsock and NSM devices will be implemented so that they are available automatically in nitro-enclave machine type in the following commits. [1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html [2] https://aws.amazon.com/ec2/ [3] https://github.com/aws/aws-nitro-enclaves-image-format Signed-off-by: Dorjoy Chowdhury If I run this code with an invalid kernel parameter, something in the error path is off. Can you please try to exercise your error paths to validate they work correctly? $ ./build/qemu-system-x86_64 -M nitro-enclave -nographic -kernel foobar qemu-system-x86_64: ../util/error.c:68: error_setv: Assertion `*errp == NULL' failed. 
Alex Amazon Web Services Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597
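The assertion quoted above is the check error_setv() uses to reject a second error being stored into an errp that already holds one. The sketch below is a stand-alone model of that rule, not QEMU code: `Error`, `error_set()`, `load_buggy()` and `load_fixed()` are hypothetical stand-ins. It shows one common way to trip the check (falling through after a failed step and reporting again) and the usual fix of returning right after the first error. Whether this is what actually happens in the nitro-enclave error path is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

/* Counts attempts to store an error into an errp that already holds one. */
static int double_sets;

/* Model of the "*errp == NULL" rule: an errp may carry only one error. */
static void error_set(Error **errp, const char *msg)
{
    assert(errp != NULL);
    if (*errp != NULL) {
        /* QEMU's error_setv() asserts here; this model only counts it. */
        double_sets++;
        return;
    }
    *errp = malloc(sizeof(**errp));
    if (*errp != NULL) {
        (*errp)->msg = msg;
    }
}

/* Buggy shape: falls through after the first failure and reports again. */
static bool load_buggy(const char *path, Error **errp)
{
    FILE *f = fopen(path, "rb");

    if (!f) {
        error_set(errp, "failed to open image");
        /* missing early return */
    }
    /* Reached even though errp may already be set. */
    error_set(errp, "failed to read image header");
    if (f) {
        fclose(f);
    }
    return false;
}

/* Fixed shape: every failure sets the error once and returns immediately. */
static bool load_fixed(const char *path, Error **errp)
{
    FILE *f = fopen(path, "rb");

    if (!f) {
        error_set(errp, "failed to open image");
        return false;
    }
    fclose(f);
    return true;
}
```

Calling `load_buggy()` with a path that does not exist records one double-set, which is the situation the real assertion aborts on; `load_fixed()` leaves exactly one error in place.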
Re: [PATCH v3 4/5] machine/nitro-enclave: Add built-in Nitro Secure Module device
On 10.08.24 18:45, Dorjoy Chowdhury wrote: AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which is used for stripped down TPM functionality like attestation. This commit adds the built-in NSM device in the nitro-enclave machine type. In Nitro Enclaves, all the PCRs start in a known zero state and the first 16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8 contain the SHA384 hashes related to the EIF file used to boot the VM for validation. A new optional nitro-enclave machine option 'id' has been added which will be the enclave identifier reflected in the module-id of the NSM device. Otherwise, the device will have a default id set. Signed-off-by: Dorjoy Chowdhury --- hw/core/eif.c | 205 +++- hw/core/eif.h | 5 +- hw/core/meson.build | 4 +- hw/i386/Kconfig | 1 + hw/i386/nitro_enclave.c | 85 - include/hw/i386/nitro_enclave.h | 19 +++ 6 files changed, 310 insertions(+), 9 deletions(-) diff --git a/hw/core/eif.c b/hw/core/eif.c index 5558879a96..d2c65668ef 100644 --- a/hw/core/eif.c +++ b/hw/core/eif.c @@ -12,6 +12,9 @@ #include "qemu/bswap.h" #include "qapi/error.h" #include /* for crc32 */ +#include +#include +#include #include "hw/core/eif.h" @@ -180,6 +183,8 @@ static void safe_unlink(char *f) * Upon success, the caller is reponsible for unlinking and freeing *kernel_path */ static bool read_eif_kernel(FILE *f, uint64_t size, char **kernel_path, +GChecksum *image_hasher, +GChecksum *bootstrap_hasher, uint32_t *crc, Error **errp) { size_t got; @@ -213,6 +218,8 @@ static bool read_eif_kernel(FILE *f, uint64_t size, char **kernel_path, } *crc = crc32(*crc, kernel, size); +g_checksum_update(image_hasher, kernel, size); +g_checksum_update(bootstrap_hasher, kernel, size); g_free(kernel); fclose(tmp_file); @@ -230,6 +237,8 @@ static bool read_eif_kernel(FILE *f, uint64_t size, char **kernel_path, } static bool read_eif_cmdline(FILE *f, uint64_t size, char *cmdline, + GChecksum *image_hasher, + GChecksum *bootstrap_hasher, uint32_t 
*crc, Error **errp) { size_t got = fread(cmdline, 1, size, f); @@ -239,10 +248,14 @@ static bool read_eif_cmdline(FILE *f, uint64_t size, char *cmdline, } *crc = crc32(*crc, (uint8_t *)cmdline, size); +g_checksum_update(image_hasher, (uint8_t *)cmdline, size); +g_checksum_update(bootstrap_hasher, (uint8_t *)cmdline, size); return true; } static bool read_eif_ramdisk(FILE *eif, FILE *initrd, uint64_t size, + GChecksum *image_hasher, + GChecksum *bootstrap_or_app_hasher, uint32_t *crc, Error **errp) { size_t got; @@ -261,6 +274,8 @@ static bool read_eif_ramdisk(FILE *eif, FILE *initrd, uint64_t size, } *crc = crc32(*crc, ramdisk, size); +g_checksum_update(image_hasher, ramdisk, size); +g_checksum_update(bootstrap_or_app_hasher, ramdisk, size); g_free(ramdisk); return true; @@ -269,6 +284,125 @@ static bool read_eif_ramdisk(FILE *eif, FILE *initrd, uint64_t size, return false; } +static bool get_fingerprint_sha384_from_cert(uint8_t *cert, size_t size, + uint8_t *sha384, Error **errp) +{ +gnutls_x509_crt_t crt; +size_t hash_size = 48; +gnutls_datum_t datum = {.data = cert, .size = size}; + +gnutls_global_init(); +gnutls_x509_crt_init(&crt); + +if (gnutls_x509_crt_import(crt, &datum, GNUTLS_X509_FMT_PEM) != 0) { +error_setg(errp, "Failed to import certificate"); +goto cleanup; +} + +if (gnutls_x509_crt_get_fingerprint(crt, GNUTLS_DIG_SHA384, sha384, +&hash_size) != 0) { +error_setg(errp, "Failed to compute SHA384 fingerprint"); +goto cleanup; +} + +return true; + + cleanup: +gnutls_x509_crt_deinit(crt); +gnutls_global_deinit(); +return false; +} + +static bool get_signature_fingerprint_sha384(FILE *eif, uint64_t size, + uint8_t *sha384, + uint32_t *crc, + Error **errp) +{ +size_t got; +uint8_t *sig = NULL; +uint8_t *cert = NULL; +cbor_item_t *item = NULL; +cbor_item_t *pcr0 = NULL; +size_t len; +struct cbor_pair *pair; +struct cbor_load_result result; + +sig = g_malloc(size); +got = fread(sig, 1, size, eif); +if ((uint64_t) got != size) { +error_setg(errp, "Failed to 
read EIF signature section data"); +goto cleanup; +} + +*crc = crc
Re: [PATCH v3 4/5] machine/nitro-enclave: Add built-in Nitro Secure Module device
On 10.08.24 18:45, Dorjoy Chowdhury wrote: AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which is used for stripped down TPM functionality like attestation. This commit adds the built-in NSM device in the nitro-enclave machine type. In Nitro Enclaves, all the PCRs start in a known zero state and the first 16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8 contain the SHA384 hashes related to the EIF file used to boot the VM for validation. A new optional nitro-enclave machine option 'id' has been added which will be the enclave identifier reflected in the module-id of the NSM device. Otherwise, the device will have a default id set. Signed-off-by: Dorjoy Chowdhury --- hw/core/eif.c | 205 +++- hw/core/eif.h | 5 +- hw/core/meson.build | 4 +- hw/i386/Kconfig | 1 + hw/i386/nitro_enclave.c | 85 - include/hw/i386/nitro_enclave.h | 19 +++ 6 files changed, 310 insertions(+), 9 deletions(-) [...] diff --git a/hw/core/meson.build b/hw/core/meson.build index f32d1ad943..7e7a14ee00 100644 --- a/hw/core/meson.build +++ b/hw/core/meson.build @@ -12,6 +12,8 @@ hwcore_ss.add(files( 'qdev-clock.c', )) +libcbor = dependency('libcbor', version: '>=0.8.0') + common_ss.add(files('cpu-common.c')) common_ss.add(files('machine-smp.c')) system_ss.add(when: 'CONFIG_FITLOADER', if_true: files('loader-fit.c')) @@ -24,7 +26,7 @@ system_ss.add(when: 'CONFIG_REGISTER', if_true: files('register.c')) system_ss.add(when: 'CONFIG_SPLIT_IRQ', if_true: files('split-irq.c')) system_ss.add(when: 'CONFIG_XILINX_AXI', if_true: files('stream.c')) system_ss.add(when: 'CONFIG_PLATFORM_BUS', if_true: files('sysbus-fdt.c')) -system_ss.add(when: 'CONFIG_NITRO_ENCLAVE', if_true: [files('eif.c'), zlib]) +system_ss.add(when: 'CONFIG_NITRO_ENCLAVE', if_true: [files('eif.c'), zlib, libcbor, gnutls]) I think this is missing a dependency check somewhere: ../hw/core/eif.c:16:10: fatal error: gnutls/gnutls.h: No such file or directory 16 | #include | ^ It's also the first time 
anything accesses gnutls directly instead of through the QEMU crypto framework. Is there a particular reason you can not use qcrypto? Alex
Re: [PATCH v2 0/2] AWS Nitro Enclave emulation
On 01.06.24 18:26, Dorjoy Chowdhury wrote: This is v2 submission for AWS Nitro Enclave emulation in QEMU. v1 is at: https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg03524.html Changes in v2: - moved eif.c and eif.h files from hw/i386 to hw/core Hi, Hope everyone is doing well. I am working on adding AWS Nitro Enclave[1] emulation support in QEMU. Alexander Graf is mentoring me on this work. This is a patch series adding, not yet complete, but useful emulation support of nitro enclaves. I have a gitlab branch where you can view the patches in the gitlab web UI for each commit: https://gitlab.com/dorjoy03/qemu/-/tree/nitro-enclave-emulation AWS nitro enclaves is an Amazon EC2[2] feature that allows creating isolated execution environments, called enclaves, from Amazon EC2 instances, which are used for processing highly sensitive data. Enclaves have no persistent storage and no external networking. The enclave VMs are based on Firecracker microvm and have a vhost-vsock device for communication with the parent EC2 instance that spawned it and a Nitro Secure Module (NSM) device for cryptographic attestation. The parent instance VM always has CID 3 while the enclave VM gets a dynamic CID. The enclave VMs can communicate with the parent instance over various ports to CID 3, for example, the init process inside an enclave sends a heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the parent instance know that the enclave VM has successfully booted. From inside an EC2 instance, nitro-cli[3] is used to spawn an enclave VM using an EIF (Enclave Image Format)[4] file. EIF files can be built using nitro-cli as well. There is no official EIF specification apart from the github aws-nitro-enclaves-image-format repository[4]. An EIF file contains the kernel, cmdline and ramdisk(s) in different sections which are used to boot the enclave VM. You can look at the structs in hw/i386/eif.c file for more details about the EIF file format. 
Adding nitro enclave emulation support in QEMU will make the life of AWS Nitro Enclave users easier as they will be able to test their EIF images locally without having to run real nitro enclaves which can be difficult for debugging due to its roots in security. This will also make quick prototyping easier. In QEMU, the new nitro-enclave machine type is implemented based on the microvm machine type similar to how AWS Nitro Enclaves are based on Firecracker microvm. The vhost-vsock device support is already part of this patch series so that the enclave VM can communicate to CID 3 using vsock. A mandatory 'guest-cid' machine type option is needed which becomes the CID of the enclave VM. Some documentation for the new 'nitro-enclave' machine type has also been added. The NSM device support will be added in the future. The plan is to eventually make the nitro enclave emulation in QEMU standalone i.e., without needing to run another VM with CID 3 with proper vsock communication support. For this to work, one approach could be to teach the vhost-vsock driver in kernel to forward CID 3 messages to another CID (set to CID 2 for host) so that users of the nitro-enclave machine type can run the necessary vsock server/clients in the host machine (some defaults can be implemented in QEMU as well, for example, sending a reply to the heartbeat) which will rid them of the cumbersome way of running another whole VM with CID 3. This way, users of nitro-enclave machine in QEMU, could potentially also run multiple enclaves with their messages for CID 3 forwarded to different CIDs which, in QEMU side, could then be specified using a new machine type option (parent-cid) if implemented. I will be posting an email to the linux virtualization mailing list about this approach asking for feedback and suggestions soon. For local testing you need to generate a hello.eif image by first building nitro-cli locally[5]. Then you can use nitro-cli to build a hello.eif image[6]. 
You need to build qemu-system-x86_64 after applying the patches and then you can run the following command to boot a hello.eif image using the new 'nitro-enclave' machine type option in QEMU: sudo ./qemu-system-x86_64 -M nitro-enclave,guest-cid=8 -kernel path/to/hello.eif -nographic -m 4G --enable-kvm -cpu host The command needs to be run as sudo because for the vhost-vsock device to work QEMU needs to be able to open vhost device in host. Right now, if you just run the nitro-enclave machine, the kernel panics because the init process exits abnormally because it cannot connect to port 9000 to CID 3 to send its heartbeat message (the connection times out), so another VM with CID 3 with proper vsock communication support must be run for it to be useful. But this restriction can be lifted once the approach about forwarding CID 3 messages is implemented if it gets accepted. Reviewed-by: Alexander Graf I'm happy to see Nitro Enclaves guest support merged even if th
Re: [PATCH v1 1/2] machine/microvm: support for loading EIF image
On 22.05.24 19:23, Dorjoy Chowdhury wrote: Hi Daniel, Thanks for reviewing. On Wed, May 22, 2024 at 9:32 PM Daniel P. Berrangé wrote: On Sat, May 18, 2024 at 02:07:52PM +0600, Dorjoy Chowdhury wrote: An EIF (Enclave Image Format)[1] image is used to boot an AWS nitro enclave[2] virtual machine. The EIF file contains the necessary kernel, cmdline, ramdisk(s) sections to boot. This commit adds support for loading EIF image using the microvm machine code. For microvm to boot from an EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel and a temporary initrd file which are then hooked into the regular x86 boot mechanism along with the extracted cmdline.

I can understand why you did it this way, but I feel it's pretty gross to be loading everything into memory, writing it back to disk, and then immediately loading it all back into memory. The root problem is the x86_load_linux() method, which directly accesses the machine properties:

    const char *kernel_filename = machine->kernel_filename;
    const char *initrd_filename = machine->initrd_filename;
    const char *dtb_filename = machine->dtb;
    const char *kernel_cmdline = machine->kernel_cmdline;

To properly handle this, I'd say we need to introduce an abstraction for loading the kernel/initrd/cmdline data, i.e. on the X86MachineClass object, we should introduce four virtual methods:

    uint8_t *(*load_kernel)(X86MachineState *machine);
    uint8_t *(*load_initrd)(X86MachineState *machine);
    uint8_t *(*load_dtb)(X86MachineState *machine);
    uint8_t *(*load_cmdline)(X86MachineState *machine);

The default impl of these four methods should just follow the existing logic of reading and returning the data associated with the kernel_filename, initrd_filename, dtb and kernel_cmdline properties. The Nitro machine sub-class, however, can provide an alternative impl of these virtual methods which returns data directly from the EIF file instead.

Great suggestion!
I agree the implementation path you suggested would look much nicer as a whole. Now that I looked a bit into the "x86_load_linux" implementation, it looks like pretty much everything is tied to expecting a filename. For example, after reading the header from the kernel_filename x86_load_linux calls into load_multiboot, load_elf (which in turn calls load_elf64, 32) and they all expect a filename. I think I would need to refactor all of them to instead work with (uint8_t *) buffers instead, right? Also in case of initrd_filename the existing code maps the file using g_mapped_file_new to the X86MachineState member initrd_mapped_file. So that will need to be looked into and refactored. Please correct me if I misunderstood something about the way to implement your suggested approach. If I am understanding this right, this probably requires a lot of work which will also probably not be straightforward to implement or test. Right now, the way the code is, it's easy to see that the existing code paths are still correct as they are not changed and the new nitro-enclave machine code just hooks into them. As the loading to memory, writing to disk and loading back to memory only is in the execution path of the new nitro-enclave machine type, I think the way the patch is right now, is a good first implementation. What do you think? I think the "real" fix here is to move all of the crufty loader logic from "file access" to "block layer access". Along with that, create a generic helper function (like this[1]) that opens all -kernel/-initrd/-dtb files through the block layer and calls a board specific callback to perform the load. With that in place, we would have a reentrant code path for the EIF case: EIF could plug into the generic x86 loader and when it detects EIF, create internal block devices that reference the existing file plus an offset/limit setting to ensure it only accesses the correct range in the target file. 
It can then simply reinvoke the x86 loader with the new block device objects. With that in place, we could also finally support -kernel http://.../vmlinuz command line invocations, which currently only work for block devices. However, I do agree that the above is a significant effort to get going and shouldn't hold back the Nitro Enclave machine model. Alex [1] https://github.com/agraf/qemu/commit/e49b7a18f2d8a386e5f207c567ad9ab2e3cb5429
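The offset/limit idea from the reply above can be illustrated without the block layer. The sketch below is plain C over stdio, not QEMU code, and `FileWindow` and `window_read()` are hypothetical names. It exposes a sub-range of an already open file as if it were a standalone image and clamps every read to that range, so a section embedded in a container file could be handed to a loader without first copying it to a temporary file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view onto [offset, offset + limit) of an already open file. */
typedef struct FileWindow {
    FILE *f;
    long offset;   /* where the embedded section starts in the container */
    size_t limit;  /* size of the embedded section in bytes */
} FileWindow;

/*
 * Read up to len bytes starting at pos, where pos is relative to the
 * window start. Returns the number of bytes read. Reads are clamped to
 * the window, so the caller never sees bytes outside the section.
 */
static size_t window_read(const FileWindow *w, size_t pos,
                          void *buf, size_t len)
{
    assert(w != NULL && w->f != NULL && buf != NULL);

    if (pos >= w->limit) {
        return 0;
    }
    if (len > w->limit - pos) {
        len = w->limit - pos;
    }
    if (fseek(w->f, w->offset + (long)pos, SEEK_SET) != 0) {
        return 0;
    }
    return fread(buf, 1, len, w->f);
}
```

With a container holding `HEADERkernelTRAILER` and a window of offset 6 and limit 6, a read at position 0 yields only `kernel`, however large the caller's buffer is.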
Re: [PATCH v1 1/2] machine/microvm: support for loading EIF image
On 27.05.24 16:52, Dorjoy Chowdhury wrote: Hi Philippe, Thank you for reviewing. On Mon, May 27, 2024 at 4:47 PM Philippe Mathieu-Daudé wrote: Hi Dorjoy, On 18/5/24 10:07, Dorjoy Chowdhury wrote: An EIF (Enclave Image Format)[1] image is used to boot an AWS nitro enclave[2] virtual machine. The EIF file contains the necessary kernel, cmdline, ramdisk(s) sections to boot. This commit adds support for loading EIF image using the microvm machine code. For microvm to boot from an EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel and a temporary initrd file which are then hooked into the regular x86 boot mechanism along with the extracted cmdline. Although not useful for the microvm machine itself, this is needed as the following commit adds support for a new machine type 'nitro-enclave' which uses the microvm machine type as parent. The code for checking and loading EIF will be put inside a 'nitro-enclave' machine type check in the following commit so that microvm cannot load EIF because it shouldn't. [1] https://github.com/aws/aws-nitro-enclaves-image-format The documentation is rather scarce... Do you mean documentation about EIF file format? If so, yes, right now there is no specification other than the github repo for EIF. [2] https://aws.amazon.com/ec2/nitro/nitro-enclaves/ Signed-off-by: Dorjoy Chowdhury --- hw/i386/eif.c | 486 hw/i386/eif.h | 20 ++ hw/i386/meson.build | 2 +- ... still it seems a generic loader, not restricted to x86. Maybe better add it as hw/core/loader-eif.[ch]? Yes, the code in eif.c is architecture agnostic. So it could make sense to move the files to hw/core. But I don't think the names should have "loader" prefix as there is no loading logic in eif.c. There is only logic for parsing and extracting kernel, initrd, cmdline etc.
Because nitro-enclave machine type is based on microvm which only supports x86 now, I think it also makes sense to keep the files inside hw/i386 for now as we can only really load x86 kernel using it. Maybe if we, in the future, add support for other architectures, then we can move them to hw/core. What do you think? I think it makes sense to put EIF parsing into generic code from the start. Nitro Enclaves supports Aarch64 with the same EIF semantics. In fact, it would be pretty simple to do a sub-machine-class similar to the x86 NE one for arm based on -M virt as a follow-up and by making the EIF logic x86 only we're only making our lives harder for that future. Alex
Re: [PATCH] hvf: arm: Fix encodings for ID_AA64PFR1_EL1 and debug System registers
On 03.05.24 19:34, Zenghui Yu wrote: We wrongly encoded ID_AA64PFR1_EL1 using {3,0,0,4,2} in hvf_sreg_match[] so we fail to get the expected ARMCPRegInfo from cp_regs hash table with the wrong key. Fix it with the correct encoding {3,0,0,4,1}. With that fixed, the Linux guest can properly detect FEAT_SSBS2 on my M1 HW. All DBG{B,W}{V,C}R_EL1 registers are also wrongly encoded with op0 == 14. It happens to work because HVF_SYSREG(CRn, CRm, 14, op1, op2) equals to HVF_SYSREG(CRn, CRm, 2, op1, op2), by definition. But we shouldn't rely on it. Fixes: a1477da3ddeb ("hvf: Add Apple Silicon support") Signed-off-by: Zenghui Yu Nice catch! Did you find them only because of functional issues or have you taken an automated pass somehow to validate the sysreg definitions are correct? Reviewed-by: Alexander Graf Alex
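Why op0 == 14 and op0 == 2 produce the same key can be checked with a few lines of arithmetic. The sketch below is a stand-alone model, not the QEMU source: the shift values and the 0x13 coprocessor field are written from memory of QEMU's register-encoding headers and should be treated as assumptions to verify against the tree, and `hvf_sysreg()` is a hypothetical function mirroring the argument order of HVF_SYSREG() as used in the commit message. With this layout, op0 == 14 (0b1110) shifted left by 14 sets bits 15, 16 and 17, and bits 16 and 17 are already set by the coprocessor field, so only bit 15 (the same bit op0 == 2 sets) makes a difference.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Field layout assumed from QEMU's cpregs.h / kvm-consts.h.
 * Verify against the actual headers before relying on it.
 */
#define CP_REG_AA64_MASK               (1u << 28)
#define CP_REG_ARM_COPROC_SHIFT        16
#define CP_REG_ARM64_SYSREG_CP         0x13u
#define CP_REG_ARM64_SYSREG_OP0_SHIFT  14
#define CP_REG_ARM64_SYSREG_OP1_SHIFT  11
#define CP_REG_ARM64_SYSREG_CRN_SHIFT  7
#define CP_REG_ARM64_SYSREG_CRM_SHIFT  3
#define CP_REG_ARM64_SYSREG_OP2_SHIFT  0

/* Hypothetical function form of HVF_SYSREG(crn, crm, op0, op1, op2). */
static uint32_t hvf_sysreg(uint32_t crn, uint32_t crm, uint32_t op0,
                           uint32_t op1, uint32_t op2)
{
    /* op0 is deliberately not range-checked: the point is what happens
     * when an out-of-range value such as 14 is passed. */
    assert(crn < 16 && crm < 16 && op1 < 8 && op2 < 8);

    return CP_REG_AA64_MASK |
           (CP_REG_ARM64_SYSREG_CP << CP_REG_ARM_COPROC_SHIFT) |
           (op0 << CP_REG_ARM64_SYSREG_OP0_SHIFT) |
           (op1 << CP_REG_ARM64_SYSREG_OP1_SHIFT) |
           (crn << CP_REG_ARM64_SYSREG_CRN_SHIFT) |
           (crm << CP_REG_ARM64_SYSREG_CRM_SHIFT) |
           (op2 << CP_REG_ARM64_SYSREG_OP2_SHIFT);
}
```

Under these assumptions `hvf_sysreg(0, 0, 14, 0, 4)` equals `hvf_sysreg(0, 0, 2, 0, 4)`, while changing op2 from 2 to 1 in the ID_AA64PFR1_EL1 case yields a different key, which is why that lookup failed.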
Re: Call for GSoC/Outreachy internship project ideas
Hey Stefan,

Thanks a lot for setting up GSoC this year again!

On 15.01.24 17:32, Stefan Hajnoczi wrote:

Dear QEMU and KVM communities,
QEMU will apply for the Google Summer of Code and Outreachy internship programs again this year. Regular contributors can submit project ideas that they'd like to mentor by replying to this email before January 30th.

Internship programs
-------------------
GSoC (https://summerofcode.withgoogle.com/) and Outreachy (https://www.outreachy.org/) offer paid open source remote work internships to eligible people wishing to participate in open source development. QEMU has been part of these internship programs for many years. Our mentors have enjoyed helping talented interns make their first open source contributions and some former interns continue to participate today.

Who can mentor
--------------
Regular contributors to QEMU and KVM can participate as mentors. Mentorship involves about 5 hours of time commitment per week to communicate with the intern, review their patches, etc. Time is also required during the intern selection phase to communicate with applicants. Being a mentor is an opportunity to help someone get started in open source development, will give you experience with managing a project in a low-stakes environment, and a chance to explore interesting technical ideas that you may not have time to develop yourself.

How to propose your idea
------------------------
Reply to this email with the following project idea template filled in:

=== TITLE ===

'''Summary:''' Short description of the project

Detailed description of the project that explains the general idea, including a list of high-level tasks that will be completed by the project, and provides enough background for someone unfamiliar with the codebase to do research. Typically 2 or 3 paragraphs.
'''Links:'''
* Wiki links to relevant material
* External links to mailing lists or web sites

'''Details:'''
* Skill level: beginner or intermediate or advanced
* Language: C/Python/Rust/etc

=== Implement -M nitro-enclave in QEMU ===

'''Summary:''' AWS EC2 provides the ability to create an isolated sibling VM context from within a VM. This project implements the machine model and input data format parsing needed to run these sibling VMs stand alone in QEMU.

Nitro Enclaves are the first widely adopted implementation of hypervisor assisted compute isolation. Similar to technologies like SGX, it allows to spawn a separate context that is inaccessible by the parent Operating System. This is implemented by "giving up" resources of the parent VM (CPU cores, memory) to the hypervisor which then spawns a second vmm to execute a completely separate virtual machine. That new VM only has a vsock communication channel to the parent and has a built-in lightweight TPM called NSM.

One big challenge with Nitro Enclaves is that due to its roots in security, there are very few debugging / introspection capabilities. That makes OS bringup, debugging and bootstrapping very difficult. Having a local dev&test environment that looks like an Enclave, but is 100% controlled by the developer and introspectable would make life a lot easier for everyone working on them. It also may pave the way to see Nitro Enclaves adopted in VM environments outside of EC2.

This project will consist of adding a new machine model to QEMU that mimics a Nitro Enclave environment, including NSM, the vsock communication channel and building firmware which loads the special "EIF" file format which contains kernel, initramfs and metadata from a -kernel image.

If the student finishes early, we can then proceed to implement the Nitro Enclaves parent driver in QEMU as well to create a full QEMU-only Nitro Enclaves environment.
'''Tasks:'''
* Implement a device model for the NSM device (link to spec and driver code below)
* Implement a new machine model
* Implement firmware for the new machine model that implements EIF parsing
* Add tests for the NSM device
* Add integration test for the machine model executing an actual EIF payload

'''Links:'''
* https://aws.amazon.com/ec2/nitro/nitro-enclaves/
* https://lore.kernel.org/lkml/20200921121732.44291-10-andra...@amazon.com/T/
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/nsm.c

'''Details:'''
* Skill level: intermediate - advanced (some understanding of QEMU machine modeling would be good)
* Language: C
* Mentor: agraf
* Suggested by: Alexander Graf (OFTC: agraf, Email: g...@amazon.com)

Alex
Re: [PATCH v12 04/10] hvf: Add Apple Silicon support
On 30.11.23 15:17, Philippe Mathieu-Daudé wrote: Hi, On 16/9/21 17:53, Alexander Graf wrote: With Apple Silicon available to the masses, it's a good time to add support for driving its virtualization extensions from QEMU. This patch adds all necessary architecture specific code to get basic VMs working, including save/restore. Known limitations: - WFI handling is missing (follows in later patch) - No watchpoint/breakpoint support Signed-off-by: Alexander Graf Reviewed-by: Roman Bolshakov Reviewed-by: Sergio Lopez Reviewed-by: Peter Maydell --- MAINTAINERS | 5 + accel/hvf/hvf-accel-ops.c | 9 + include/sysemu/hvf_int.h | 10 +- meson.build | 1 + target/arm/hvf/hvf.c | 794 target/arm/hvf/trace-events | 10 + target/i386/hvf/hvf.c | 5 + 7 files changed, 833 insertions(+), 1 deletion(-) create mode 100644 target/arm/hvf/hvf.c create mode 100644 target/arm/hvf/trace-events +int hvf_arch_init_vcpu(CPUState *cpu) +{ + ARMCPU *arm_cpu = ARM_CPU(cpu); + CPUARMState *env = &arm_cpu->env; + uint32_t sregs_match_len = ARRAY_SIZE(hvf_sreg_match); + uint32_t sregs_cnt = 0; + uint64_t pfr; + hv_return_t ret; + int i; + + env->aarch64 = 1; + asm volatile("mrs %0, cntfrq_el0" : "=r"(arm_cpu->gt_cntfrq_hz)); + + /* Allocate enough space for our sysreg sync */ + arm_cpu->cpreg_indexes = g_renew(uint64_t, arm_cpu->cpreg_indexes, + sregs_match_len); + arm_cpu->cpreg_values = g_renew(uint64_t, arm_cpu->cpreg_values, + sregs_match_len); + arm_cpu->cpreg_vmstate_indexes = g_renew(uint64_t, + arm_cpu->cpreg_vmstate_indexes, + sregs_match_len); + arm_cpu->cpreg_vmstate_values = g_renew(uint64_t, + arm_cpu->cpreg_vmstate_values, + sregs_match_len); + + memset(arm_cpu->cpreg_values, 0, sregs_match_len * sizeof(uint64_t)); + + /* Populate cp list for all known sysregs */ + for (i = 0; i < sregs_match_len; i++) { + const ARMCPRegInfo *ri; + uint32_t key = hvf_sreg_match[i].key; + + ri = get_arm_cp_reginfo(arm_cpu->cp_regs, key); + if (ri) { + assert(!(ri->type & ARM_CP_NO_RAW)); + 
hvf_sreg_match[i].cp_idx = sregs_cnt; + arm_cpu->cpreg_indexes[sregs_cnt++] = cpreg_to_kvm_id(key); So this hash ...: /* * Convert a truncated 32 bit hashtable key into the full * 64 bit KVM register ID. */ static uint64_t cpreg_to_kvm_id(uint32_t cpregid) { uint64_t kvmid; if (cpregid & CP_REG_AA64_MASK) { kvmid = cpregid & ~CP_REG_AA64_MASK; kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM64; } else { kvmid = cpregid & ~(1 << 15); if (cpregid & (1 << 15)) { kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM; } else { kvmid |= CP_REG_SIZE_U32 | CP_REG_ARM; } } return kvmid; } ... just happens to work the same way for HVF? It never feeds into HVF - we only use these values as unique identifiers inside QEMU, no? See write_cpustate_to_list() and write_list_to_cpustate() for reference. Alex
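The point of the reply, that the converted value only serves as an opaque lookup key inside QEMU, can be made concrete with a small stand-alone model. The conversion function below follows the code quoted in the mail; the size and architecture constants are the ones from the KVM register ID layout, while `CP_REG_AA64_MASK` being bit 28 is an assumption, and `find_cpreg_slot()` is a hypothetical helper that stands in for the list lookups done when syncing state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CP_REG_AA64_MASK is QEMU-internal; bit 28 is assumed here. */
#define CP_REG_AA64_MASK  (1u << 28)
/* Architecture and size fields as in the KVM register ID layout. */
#define CP_REG_ARM        0x4000000000000000ULL
#define CP_REG_ARM64      0x6000000000000000ULL
#define CP_REG_SIZE_U32   0x0020000000000000ULL
#define CP_REG_SIZE_U64   0x0030000000000000ULL

/* Convert a truncated 32 bit hashtable key into a 64 bit register ID. */
static uint64_t cpreg_to_kvm_id(uint32_t cpregid)
{
    uint64_t kvmid;

    if (cpregid & CP_REG_AA64_MASK) {
        kvmid = cpregid & ~CP_REG_AA64_MASK;
        kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM64;
    } else {
        kvmid = cpregid & ~(1u << 15);
        if (cpregid & (1u << 15)) {
            kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM;
        } else {
            kvmid |= CP_REG_SIZE_U32 | CP_REG_ARM;
        }
    }
    return kvmid;
}

/*
 * Hypothetical helper: find the slot holding a given ID. The ID is only
 * compared for equality; nothing here interprets its bits or hands it
 * to a hypervisor, so any injective mapping would serve equally well.
 */
static int find_cpreg_slot(const uint64_t *indexes, size_t n, uint64_t id)
{
    assert(indexes != NULL);

    for (size_t i = 0; i < n; i++) {
        if (indexes[i] == id) {
            return (int)i;
        }
    }
    return -1;
}
```

Two distinct keys map to two distinct IDs, and a value stored through one slot is found again by looking up the same ID, which is all the save/restore path needs.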
Re: [PATCH 00/16] hw/uefi: add uefi variable service
Hey Gerd! On 15.11.23 16:12, Gerd Hoffmann wrote: This patch adds a virtual device to qemu which the uefi firmware can use to store variables. This moves the UEFI variable management from privileged guest code (managing vars in pflash) to the host. Main advantage is that the need to have privilege separation in the guest goes away. On x86 privileged guest code runs in SMM. It's supported by kvm, but not liked much by various stakeholders in cloud space due to the complexity SMM emulation brings. On arm privileged guest code runs in el3 (aka secure world). This is not supported by kvm, which is unlikely to change anytime soon given that even el2 support (nested virt) is being worked on for years and is not yet in mainline. The design idea is to reuse the request serialization protocol edk2 uses for communication between SMM and non-SMM code, so large chunks of the edk2 variable driver stack can be used unmodified. Only the driver which traps into SMM mode must be replaced by a driver which talks to qemu instead. I'm not sure I like the split :). If we cut things off at the SMM communication layer, we still have a lot of code inside the Runtime Services (RTS) code that is edk2 specific which means we're tying ourselves tightly to the edk2 data format. It also means we can not easily expose UEFI variables that QEMU owns, which can come in very handy when implementing MOR - another feature that depends on SMM today. In EC2, we are simply serializing all variable RTS calls to the hypervisor, similar to the Xen implementation (https://www.youtube.com/watch?v=jiR8khaECEk). The edk2 side code we have built is here: https://github.com/aws/uefi/tree/main/edk2-stable202211 (see anything with VarStore in the name). For the vmm side, we currently have an AWS-internal C++ implementation that I can convert into QEMU code and send as patch if there is real interest. 
Given that we deal with untrusted input, I would strongly prefer if we could move it to a Rust implementation instead though. We started a Rust reimplementation of that interface that can build as a library with C bindings, which QEMU could then link against: https://github.com/Mimoja/rs-uefi-varstore/tree/for_main

The code never went beyond the initial stages, but if we're pulling the variable store to QEMU, this would be the best path forward IMHO.

If instead, we just want something we can quickly integrate while eating up the additional security risk, I think we should just reuse the Xen implementation.

Alex

Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
Re: [PATCH] hvf: Enable 1G page support
On 21.04.23 00:52, Alexander Graf wrote: Hvf on x86 only supported 2MiB large pages, but never bothered to strip out the 1GiB page size capability from -cpu host. With QEMU 8.0.0 this became a problem because OVMF started to use 1GiB pages by default. Let's just unconditionally add 1GiB page walk support to the walker. With this fix applied, I can successfully run OVMF again. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1603 Signed-off-by: Alexander Graf Reported-by: Akihiro Suda Reported-by: Philippe Mathieu-Daudé Ping. Anyone willing to pick this up? :) Alex
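To illustrate what adding 1GiB page walk support involves, here is a minimal sketch in C. It is not the hvf walker's code: the names leaf_page_size(), leaf_to_phys() and the PTE_* macros are hypothetical and exist only for this example. It assumes the standard x86-64 4-level paging layout, where the PS bit (bit 7) marks a 1GiB page in a PDPT entry and a 2MiB page in a PD entry.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; bit positions follow x86-64 paging. */
#define PTE_PRESENT   (1ull << 0)
#define PTE_PS        (1ull << 7)              /* "page size" bit */
#define PTE_ADDR_MASK 0x000ffffffffff000ull    /* physical address bits 51:12 */

/*
 * Return the size of the page mapped by 'entry' at the given level
 * (4 = PML4, 3 = PDPT, 2 = PD, 1 = PT), or 0 if the entry does not
 * terminate the walk and points to the next table instead.
 */
static uint64_t leaf_page_size(int level, uint64_t entry)
{
    if (level == 1) {
        return 1ull << 12;                     /* 4KiB page */
    }
    if (!(entry & PTE_PS)) {
        return 0;                              /* keep walking */
    }
    if (level == 2) {
        return 1ull << 21;                     /* 2MiB page */
    }
    if (level == 3) {
        return 1ull << 30;                     /* 1GiB page */
    }
    return 0;                                  /* PS is not valid at PML4 */
}

/* Combine the page frame from the leaf entry with the offset from vaddr. */
static uint64_t leaf_to_phys(uint64_t entry, uint64_t page_size, uint64_t vaddr)
{
    return (entry & PTE_ADDR_MASK & ~(page_size - 1)) |
           (vaddr & (page_size - 1));
}
```

A walker that only knows about 2MiB pages would treat a PDPT entry with PS set as a pointer to a page directory, which is the kind of mismatch the patch description refers to.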
Re: [PATCH v2 05/11] tpm_crb: use the ISA bus
Hi Joelle,

On 01.08.23 03:46, Joelle van Dyne wrote:

On Tue, Jul 18, 2023 at 7:16 AM Stefan Berger wrote:

On 7/17/23 09:46, Igor Mammedov wrote:

On Fri, 14 Jul 2023 00:09:21 -0700 Joelle van Dyne wrote:

Since this device is gated to only build for targets with the PC configuration, we should use the ISA bus like with TPM TIS.

does it affect migration in any way? From guest pov it looks like a new ISA device will appear, and then if you do ping pong migration between old - new QEMU, will it really work? If it will, then commit message here shall describe why it is safe and why it works

I would just leave the existing device as-is. This seems safest and we know that it works. Stefan

Alexander, do you have any comments here? I know you wanted to move away from the default bus. The concern is that switching from the default bus to the ISA bus may cause issues in migration. The current course of action is to drop this patch.

The big problem I have with the CRB device is this code: https://gitlab.com/qemu-project/qemu/-/blob/master/hw/tpm/tpm_crb.c?ref_type=heads#L297-305

It's a device that maps itself autonomously into system memory. The way mapping is supposed to work is that the parent of the device maps it into a bus region. If the parent is a machine, it is free to also map it into system memory. But a device should not even know what system memory is :).

That said, I'm happy if we just introduce a new "sane" sysdev TPM CRB device that we use for non-PCs and leave the current layering violating one as is.

Alex
Re: [PATCH v2 01/12] build: Only define OS_OBJECT_USE_OBJC with gcc
On 31.08.23 10:53, Akihiko Odaki wrote: On 2023/08/31 17:12, Philippe Mathieu-Daudé wrote: On 30/8/23 18:14, Alexander Graf wrote: Recent versions of macOS use clang instead of gcc. The OS_OBJECT_USE_OBJC define is only necessary when building with gcc. Let's not define it when building with clang. With this patch, I can successfully include GCD headers in QEMU when building with clang. Signed-off-by: Alexander Graf --- meson.build | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/meson.build b/meson.build index 98e68ef0b1..0d6a0015a1 100644 --- a/meson.build +++ b/meson.build @@ -224,7 +224,9 @@ qemu_ldflags = [] if targetos == 'darwin' # Disable attempts to use ObjectiveC features in os/object.h since they # won't work when we're compiling with gcc as a C compiler. - qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0' + if compiler.get_id() == 'gcc' + qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0' + endif elif targetos == 'solaris' # needed for CMSG_ macros in sys/socket.h qemu_common_flags += '-D_XOPEN_SOURCE=600' Reviewed-by: Philippe Mathieu-Daudé Defining OS_OBJECT_USE_OBJC does not look like a proper solution. Looking at os/object.h, it seems OS_OBJECT_USE_OBJC is defined as 0 when: !defined(OS_OBJECT_HAVE_OBJC_SUPPORT) && (!defined(__OBJC__) || defined(__OBJC_GC__)) This means OS_OBJECT_USE_OBJC is always 0 if Objective-C is disabled. I also confirmed os/object.h will not use Objective-C features when compiled as C code on clang with the following command: clang -E -x -c - < EOF If compilation fails with GCC when not defining OS_OBJECT_USE_OBJC, it probably means GCC incorrectly treats C code as Objective-C and that is the problem we should solve. I cannot confirm this theory however since I have only an Apple Silicon Mac that is incompatible with GCC. My take on this was to make the gcc hack be a "legacy" thing that we put into its own corner, so that in a few years we can just drop it altogether. 
I don't really think it's worth wasting much time on this workaround and its potential compatibility with old macOS versions.

Alex
[PATCH v2 07/12] hw/vmapple/aes: Introduce aes engine
VMApple contains an "aes" engine device that it uses to encrypt and decrypt its nvram. It has trivial hard coded keys it uses for that purpose. Add device emulation for this device model. Signed-off-by: Alexander Graf --- hw/vmapple/aes.c| 583 hw/vmapple/Kconfig | 2 + hw/vmapple/meson.build | 1 + hw/vmapple/trace-events | 18 ++ 4 files changed, 604 insertions(+) create mode 100644 hw/vmapple/aes.c diff --git a/hw/vmapple/aes.c b/hw/vmapple/aes.c new file mode 100644 index 00..eaf1e26abe --- /dev/null +++ b/hw/vmapple/aes.c @@ -0,0 +1,583 @@ +/* + * QEMU Apple AES device emulation + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "hw/irq.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "trace.h" +#include "hw/sysbus.h" +#include "crypto/hash.h" +#include "crypto/aes.h" +#include "crypto/cipher.h" + +#define TYPE_AES "apple-aes" +#define MAX_FIFO_SIZE 9 + +#define CMD_KEY 0x1 +#define CMD_KEY_CONTEXT_SHIFT27 +#define CMD_KEY_CONTEXT_MASK (0x1 << CMD_KEY_CONTEXT_SHIFT) +#define CMD_KEY_SELECT_SHIFT 24 +#define CMD_KEY_SELECT_MASK (0x7 << CMD_KEY_SELECT_SHIFT) +#define CMD_KEY_KEY_LEN_SHIFT22 +#define CMD_KEY_KEY_LEN_MASK (0x3 << CMD_KEY_KEY_LEN_SHIFT) +#define CMD_KEY_ENCRYPT_SHIFT20 +#define CMD_KEY_ENCRYPT_MASK (0x1 << CMD_KEY_ENCRYPT_SHIFT) +#define CMD_KEY_BLOCK_MODE_SHIFT 16 +#define CMD_KEY_BLOCK_MODE_MASK (0x3 << CMD_KEY_BLOCK_MODE_SHIFT) +#define CMD_IV0x2 +#define CMD_IV_CONTEXT_SHIFT 26 +#define CMD_IV_CONTEXT_MASK (0x3 << CMD_KEY_CONTEXT_SHIFT) +#define CMD_DSB 0x3 +#define CMD_SKG 0x4 +#define CMD_DATA 0x5 +#define CMD_DATA_KEY_CTX_SHIFT 27 +#define CMD_DATA_KEY_CTX_MASK(0x1 << CMD_DATA_KEY_CTX_SHIFT) +#define CMD_DATA_IV_CTX_SHIFT25 +#define CMD_DATA_IV_CTX_MASK (0x3 << CMD_DATA_IV_CTX_SHIFT) +#define 
CMD_DATA_LEN_MASK0xff +#define CMD_STORE_IV 0x6 +#define CMD_STORE_IV_ADDR_MASK 0xff +#define CMD_WRITE_REG 0x7 +#define CMD_FLAG 0x8 +#define CMD_FLAG_STOP_MASK BIT(26) +#define CMD_FLAG_RAISE_IRQ_MASK BIT(27) +#define CMD_FLAG_INFO_MASK 0xff +#define CMD_MAX 0x10 + +#define CMD_SHIFT 28 + +#define REG_STATUS0xc +#define REG_STATUS_DMA_READ_RUNNING BIT(0) +#define REG_STATUS_DMA_READ_PENDING BIT(1) +#define REG_STATUS_DMA_WRITE_RUNNINGBIT(2) +#define REG_STATUS_DMA_WRITE_PENDINGBIT(3) +#define REG_STATUS_BUSY BIT(4) +#define REG_STATUS_EXECUTINGBIT(5) +#define REG_STATUS_READYBIT(6) +#define REG_STATUS_TEXT_DPA_SEEDED BIT(7) +#define REG_STATUS_UNWRAP_DPA_SEEDEDBIT(8) + +#define REG_IRQ_STATUS0x18 +#define REG_IRQ_STATUS_INVALID_CMD BIT(2) +#define REG_IRQ_STATUS_FLAG BIT(5) +#define REG_IRQ_ENABLE0x1c +#define REG_WATERMARK 0x20 +#define REG_Q_STATUS 0x24 +#define REG_FLAG_INFO 0x30 +#define REG_FIFO 0x200 + +static const uint32_t key_lens[4] = { +[0] = 16, +[1] = 24, +[2] = 32, +[3] = 64, +}; + +struct key { +uint32_t key_len; +uint32_t key[8]; +}; + +struct iv { +uint32_t iv[4]; +}; + +struct context { +struct key key; +struct iv iv; +}; + +static struct key builtin_keys[7] = { +[1] = { +.key_len = 32, +.key = { 0x1 }, +}, +[2] = { +.key_len = 32, +.key = { 0x2 }, +}, +[3] = { +.key_len = 32, +.key = { 0x3 }, +} +}; + +typedef struct AESState { +/* Private */ +SysBusDevice parent_obj; + +/* Public */ +qemu_irq irq; +MemoryRegion iomem1; +MemoryRegion iomem2; + +uint32_t status; +uint32_t q_status; +uint32_t irq_status; +uint32_t irq_enable; +uint32_t watermark; +uint32_t flag_info; +uint32_t fifo[MAX_FIFO_SIZE]; +uint32_t fifo_idx; +struct key key[2]; +struct iv iv[4]; +bool is_encrypt; +QCryptoCipherMode block_mode; +} AESState; + +OBJECT_DECLARE_SIMPLE_TYPE(AESState, AES) + +static void aes_update_irq(AESState *s) +{ +qemu_set_irq(s->irq, !!(s->irq_status & s->irq_enable)); +} + +static uint64_t aes1_read(void *opaque, hwaddr offset, unsigned size) +{ 
+AESState *s = opaque; +uint64_t res = 0; + +switch (offset) { +case REG_STATUS: +res = s->status; +break; +case REG_IRQ_STATUS: +res = s->irq_status; +break; +
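The command word layout implied by the defines in this patch can be sketched as a small standalone decoder. This is only an illustration under stated assumptions, not code from the patch: the DecodedKeyCmd struct and decode_key_cmd() are hypothetical names, the opcode is assumed to live in the top four bits (from CMD_SHIFT being 28), and the key_lens table is assumed to hold byte counts.

```c
#include <stdint.h>

/* Field layout copied from the defines in the patch. */
#define CMD_SHIFT                 28
#define CMD_KEY                   0x1
#define CMD_KEY_CONTEXT_SHIFT     27
#define CMD_KEY_CONTEXT_MASK      (0x1u << CMD_KEY_CONTEXT_SHIFT)
#define CMD_KEY_SELECT_SHIFT      24
#define CMD_KEY_SELECT_MASK       (0x7u << CMD_KEY_SELECT_SHIFT)
#define CMD_KEY_KEY_LEN_SHIFT     22
#define CMD_KEY_KEY_LEN_MASK      (0x3u << CMD_KEY_KEY_LEN_SHIFT)
#define CMD_KEY_ENCRYPT_SHIFT     20
#define CMD_KEY_ENCRYPT_MASK      (0x1u << CMD_KEY_ENCRYPT_SHIFT)
#define CMD_KEY_BLOCK_MODE_SHIFT  16
#define CMD_KEY_BLOCK_MODE_MASK   (0x3u << CMD_KEY_BLOCK_MODE_SHIFT)

/* Same table as in the patch; assumed here to be key lengths in bytes. */
static const uint32_t key_lens[4] = { 16, 24, 32, 64 };

/* Hypothetical container for the decoded fields of a CMD_KEY word. */
typedef struct DecodedKeyCmd {
    uint32_t opcode;      /* top four bits of the command word */
    uint32_t context;     /* which key context slot to fill */
    uint32_t select;      /* 0 = key follows in FIFO (assumed), else builtin */
    uint32_t key_len;     /* looked up through key_lens[] */
    uint32_t encrypt;     /* 1 = encrypt, 0 = decrypt */
    uint32_t block_mode;  /* raw block mode field */
} DecodedKeyCmd;

/* Split one 32-bit command word into its CMD_KEY fields. */
static DecodedKeyCmd decode_key_cmd(uint32_t word)
{
    DecodedKeyCmd d;

    d.opcode = word >> CMD_SHIFT;
    d.context = (word & CMD_KEY_CONTEXT_MASK) >> CMD_KEY_CONTEXT_SHIFT;
    d.select = (word & CMD_KEY_SELECT_MASK) >> CMD_KEY_SELECT_SHIFT;
    d.key_len = key_lens[(word & CMD_KEY_KEY_LEN_MASK) >> CMD_KEY_KEY_LEN_SHIFT];
    d.encrypt = (word & CMD_KEY_ENCRYPT_MASK) >> CMD_KEY_ENCRYPT_SHIFT;
    d.block_mode = (word & CMD_KEY_BLOCK_MODE_MASK) >> CMD_KEY_BLOCK_MODE_SHIFT;
    return d;
}
```

Deriving every mask from its own shift define, as done here, keeps the two from drifting apart when fields are added later.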
[PATCH v2 01/12] build: Only define OS_OBJECT_USE_OBJC with gcc
Recent versions of macOS use clang instead of gcc. The OS_OBJECT_USE_OBJC define is only necessary when building with gcc. Let's not define it when building with clang.

With this patch, I can successfully include GCD headers in QEMU when building with clang.

Signed-off-by: Alexander Graf
---
 meson.build | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 98e68ef0b1..0d6a0015a1 100644
--- a/meson.build
+++ b/meson.build
@@ -224,7 +224,9 @@ qemu_ldflags = []
 if targetos == 'darwin'
   # Disable attempts to use ObjectiveC features in os/object.h since they
   # won't work when we're compiling with gcc as a C compiler.
-  qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  if compiler.get_id() == 'gcc'
+    qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  endif
 elif targetos == 'solaris'
   # needed for CMSG_ macros in sys/socket.h
   qemu_common_flags += '-D_XOPEN_SOURCE=600'
--
2.39.2 (Apple Git-143)
[PATCH v2 12/12] hw/vmapple/vmapple: Add vmapple machine type
Apple defines a new "vmapple" machine type as part of its proprietary macOS Virtualization.Framework vmm. This machine type is similar to the virt one, but with subtle differences in base devices, a few special vmapple device additions and a vastly different boot chain.

This patch reimplements this machine type in QEMU. To use it, you have to have a readily installed version of macOS for VMApple, run on macOS with -accel hvf, pass the Virtualization.Framework boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash and pass aux and root volume as virtio drives. In addition, you also need to find the machine UUID and pass that as -M vmapple,uuid= parameter:

$ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
    -bios /System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin \
    -drive file=aux,if=pflash,format=raw \
    -drive file=root,if=pflash,format=raw \
    -drive file=aux,if=none,id=aux,format=raw \
    -device vmapple-virtio-aux,drive=aux \
    -drive file=root,if=none,id=root,format=raw \
    -device vmapple-virtio-root,drive=root

With all these in place, you should be able to see macOS booting successfully.
Signed-off-by: Alexander Graf --- v1 -> v2: - Adapt to system_ss meson.build target - Add documentation --- MAINTAINERS | 1 + docs/system/arm/vmapple.rst | 63 docs/system/target-arm.rst | 1 + hw/vmapple/vmapple.c| 661 hw/vmapple/Kconfig | 19 ++ hw/vmapple/meson.build | 1 + 6 files changed, 746 insertions(+) create mode 100644 docs/system/arm/vmapple.rst create mode 100644 hw/vmapple/vmapple.c diff --git a/MAINTAINERS b/MAINTAINERS index 3104e58eff..1d3b1e0034 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2578,6 +2578,7 @@ M: Alexander Graf S: Maintained F: hw/vmapple/* F: include/hw/vmapple/* +F: docs/system/arm/vmapple.rst Subsystems -- diff --git a/docs/system/arm/vmapple.rst b/docs/system/arm/vmapple.rst new file mode 100644 index 00..c7486b21d9 --- /dev/null +++ b/docs/system/arm/vmapple.rst @@ -0,0 +1,63 @@ +VMApple machine emulation + + +VMApple is the device model that the macOS built-in hypervisor called "Virtualization.framework" +exposes to Apple Silicon macOS guests. The "vmapple" machine model in QEMU implements the same +device model, but does not use any code from Virtualization.Framework. + +Prerequisites +- + +To run the vmapple machine model, you need to + + * Run on Apple Silicon + * Run on macOS 12.0 or above + * Have an already installed copy of a Virtualization.Framework macOS virtual machine. I will + assume that you installed it using the macosvm CLI. + +First, we need to extract the UUID from the virtual machine that you installed. You can do this +by running the following shell script: + +.. code-block:: bash + :caption: uuid.sh script to extract the UUID from a macosvm.json file + + #!/bin/bash + + MID=$(cat "$1" | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]);') + echo "$MID" | base64 -d | plutil -extract ECID raw - + +Now we also need to trim the aux partition. It contains metadata that we can just discard: + +.. 
code-block:: bash + :caption: Command to trim the aux file + + $ dd if="aux.img" of="aux.img.trimmed" bs=$(( 0x4000 )) skip=1 + +How to run +-- + +Then, we can launch QEMU with the Virtualization.Framework pre-boot environment and the readily +installed target disk images. I recommend to port forward the VM's ssh and vnc ports to the host +to get better interactive access into the target system: + +.. code-block:: bash + :caption: Example execution command line + + $ UUID=$(uuid.sh macosvm.json) + $ AVPBOOTER=/System/Library/Frameworks/Virtualization.framework/Resources/AVPBooter.vmapple2.bin + $ AUX=aux.img.trimmed + $ DISK=disk.img + $ qemu-system-aarch64 \ + -serial mon:stdio \ + -m 4G \ + -accel hvf \ + -M vmapple,uuid=$UUID \ + -bios $AVPBOOTER \ +-drive file="$AUX",if=pflash,format=raw \ +-drive file="$DISK",if=pflash,format=raw \ + -drive file="$AUX",if=none,id=aux,format=raw \ + -drive file="$DISK",if=none,id=root,format=raw \ + -device vmapple-virtio-aux,drive=aux \ + -device vmapple-virtio-root,drive=root \ + -net user,ipv6=off,hostfwd=tcp::-:22,hostfwd=tcp::5901-:5900 \ + -net nic,model=virtio \ diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst index 790ac1b8a2..bf663df4a6 100644 --- a/docs/system/target-arm.rst +++ b/docs/system/target-arm.rst @@ -106,6 +106,7 @@ undocumented; you can get a complete list by running arm/stellaris arm/stm3
[PATCH v2 00/12] Introduce new vmapple machine type
This patch set introduces a new ARM and HVF specific machine type called "vmapple". It mimics the device model that Apple's proprietary Virtualization.Framework exposes, but implements it in QEMU.

With this new machine type, you can run macOS guests on Apple Silicon systems via HVF. To do so, you need to first install macOS using Virtualization.Framework onto a virtual disk image using a tool like macosvm (https://github.com/s-u/macosvm)

  $ macosvm --disk disk.img,size=32g --aux aux.img \
            --restore UniversalMac_12.0.1_21A559_Restore.ipsw vm.json

Then, extract the ECID from the installed VM:

  $ cat "$DIR/macosvm.json" | python3 -c \
    'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"])' |\
    base64 -d | plutil -extract ECID raw -

In addition, cut off the first 16 KiB of the aux.img:

  $ dd if=aux.img of=aux.img.trimmed bs=$(( 0x4000 )) skip=1

Now, you can just launch QEMU with the bits generated above:

  $ qemu-system-aarch64 -serial mon:stdio \
    -m 4G \
    -M vmapple,uuid=6240349656165161789 \
    -bios /Sys*/Lib*/Fra*/Virtualization.f*/R*/AVPBooter.vmapple2.bin \
    -pflash aux.img.trimmed \
    -pflash disk.img \
    -drive file=disk.img,if=none,id=root \
    -device vmapple-virtio-root,drive=root \
    -drive file=aux.img.trimmed,if=none,id=aux \
    -device vmapple-virtio-aux,drive=aux \
    -accel hvf

There are a few limitations with this implementation:

  - Only runs on macOS because it relies on ParavirtualizedGraphics.Framework
  - Something is not fully correct on interrupt delivery or similar - the keyboard does not work
  - No Rosetta in the guest because we lack the private entitlement to enable TSO

Over time, I hope that some of the limitations above could cease to exist. This device model would enable very nice use cases with KVM on an Asahi Linux device.

Please be aware that the vmapple device model only works with macOS 12 guests for now. Newer guests run into Hypervisor.Framework incompatibilities.
---

v1 -> v2:

  - Adapt to system_ss meson.build target
  - Add documentation
  - Rework virtio-blk patch to make all vmapple virtio-blk logic subclasses
  - Add log message on write
  - Move max slot number to define
  - Use SPDX header
  - Remove useless includes

Alexander Graf (12):
  build: Only define OS_OBJECT_USE_OBJC with gcc
  hw/misc/pvpanic: Add MMIO interface
  hvf: Increase number of possible memory slots
  hvf: arm: Ignore writes to CNTP_CTL_EL0
  hw: Add vmapple subdir
  gpex: Allow more than 4 legacy IRQs
  hw/vmapple/aes: Introduce aes engine
  hw/vmapple/bdif: Introduce vmapple backdoor interface
  hw/vmapple/cfg: Introduce vmapple cfg region
  hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  hw/vmapple/virtio-blk: Add support for apple virtio-blk
  hw/vmapple/vmapple: Add vmapple machine type

 MAINTAINERS                     | 7 +
 docs/system/arm/vmapple.rst     | 68
 docs/system/target-arm.rst      | 1 +
 meson.build                     | 9 +-
 hw/vmapple/trace.h              | 1 +
 include/hw/misc/pvpanic.h       | 1 +
 include/hw/pci-host/gpex.h      | 7 +-
 include/hw/pci/pci_ids.h        | 1 +
 include/hw/virtio/virtio-blk.h  | 11 +-
 include/hw/vmapple/bdif.h       | 31 ++
 include/hw/vmapple/cfg.h        | 68
 include/hw/vmapple/virtio-blk.h | 39 ++
 include/sysemu/hvf_int.h        | 4 +-
 accel/hvf/hvf-accel-ops.c       | 2 +-
 hw/arm/sbsa-ref.c               | 2 +-
 hw/arm/virt.c                   | 2 +-
 hw/block/virtio-blk.c           | 18 +-
 hw/i386/microvm.c               | 2 +-
 hw/loongarch/virt.c             | 2 +-
 hw/mips/loongson3_virt.c        | 2 +-
 hw/misc/pvpanic-mmio.c          | 61 +++
 hw/openrisc/virt.c              | 12 +-
 hw/pci-host/gpex.c              | 36 +-
 hw/riscv/virt.c                 | 12 +-
 hw/vmapple/aes.c                | 583
 hw/vmapple/bdif.c               | 245
 hw/vmapple/cfg.c                | 105 +
 hw/vmapple/virtio-blk.c         | 212 ++
 hw/vmapple/vmapple.c            | 661
 hw/xtensa/virt.c                | 2 +-
 target/arm/hvf/hvf.c            | 9 +
 hw/Kconfig                      | 1 +
 hw/meson.build                  | 1 +
 hw/misc/Kconfig                 | 4 +
 hw/misc/meson.build             | 1 +
 hw/vmapple/Kconfig              | 33 ++
 hw/vmapple/apple-gfx.m          | 578
 hw/vmapple/meson.build          | 6 +
 hw/vmapple/trace
[PATCH v2 09/12] hw/vmapple/cfg: Introduce vmapple cfg region
Instead of device tree or other more standardized means, VMApple passes platform configuration to the first stage boot loader in a binary encoded format that resides at a dedicated RAM region in physical address space. This patch models this configuration space as a qdev device which we can then map at the fixed location in the address space. That way, we can influence and annotate all configuration fields easily. Signed-off-by: Alexander Graf --- v1 -> v2: - Adapt to system_ss meson.build target --- include/hw/vmapple/cfg.h | 68 + hw/vmapple/cfg.c | 105 +++ hw/vmapple/Kconfig | 3 ++ hw/vmapple/meson.build | 1 + 4 files changed, 177 insertions(+) create mode 100644 include/hw/vmapple/cfg.h create mode 100644 hw/vmapple/cfg.c diff --git a/include/hw/vmapple/cfg.h b/include/hw/vmapple/cfg.h new file mode 100644 index 00..3337064e44 --- /dev/null +++ b/include/hw/vmapple/cfg.h @@ -0,0 +1,68 @@ +/* + * VMApple Configuration Region + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef HW_VMAPPLE_CFG_H +#define HW_VMAPPLE_CFG_H + +#include "hw/sysbus.h" +#include "qom/object.h" +#include "net/net.h" + +typedef struct VMAppleCfg { +uint32_t version; /* 0x000 */ +uint32_t nr_cpus; /* 0x004 */ +uint32_t unk1;/* 0x008 */ +uint32_t unk2;/* 0x00c */ +uint32_t unk3;/* 0x010 */ +uint32_t unk4;/* 0x014 */ +uint64_t ecid;/* 0x018 */ +uint64_t ram_size;/* 0x020 */ +uint32_t run_installer1; /* 0x028 */ +uint32_t unk5;/* 0x02c */ +uint32_t unk6;/* 0x030 */ +uint32_t run_installer2; /* 0x034 */ +uint32_t rnd; /* 0x038 */ +uint32_t unk7;/* 0x03c */ +MACAddr mac_en0; /* 0x040 */ +uint8_t pad1[2]; +MACAddr mac_en1; /* 0x048 */ +uint8_t pad2[2]; +MACAddr mac_wifi0;/* 0x050 */ +uint8_t pad3[2]; +MACAddr mac_bt0; /* 0x058 */ +uint8_t pad4[2]; +uint8_t reserved[0xa0]; /* 0x060 */ +uint32_t cpu_ids[0x80]; /* 0x100 */ +uint8_t scratch[0x200]; /* 0x180 */ +char serial[32]; /* 0x380 */ +char unk8[32];/* 0x3a0 */ +char model[32]; /* 0x3c0 */ +uint8_t unk9[32]; /* 0x3e0 */ +uint32_t unk10; /* 0x400 */ +char soc_name[32];/* 0x404 */ +} VMAppleCfg; + +#define TYPE_VMAPPLE_CFG "vmapple-cfg" +OBJECT_DECLARE_SIMPLE_TYPE(VMAppleCfgState, VMAPPLE_CFG) + +struct VMAppleCfgState { +/* */ +SysBusDevice parent_obj; +VMAppleCfg cfg; + +/* */ +MemoryRegion mem; +char *serial; +char *model; +char *soc_name; +}; + +#define VMAPPLE_CFG_SIZE 0x0001 + +#endif /* HW_VMAPPLE_CFG_H */ diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c new file mode 100644 index 00..d48e3c3afa --- /dev/null +++ b/hw/vmapple/cfg.c @@ -0,0 +1,105 @@ +/* + * VMApple Configuration Region + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "hw/vmapple/cfg.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "qapi/error.h" + +static void vmapple_cfg_reset(DeviceState *dev) +{ +VMAppleCfgState *s = VMAPPLE_CFG(dev); +VMAppleCfg *cfg; + +cfg = memory_region_get_ram_ptr(&s->mem); +memset((void *)cfg, 0, VMAPPLE_CFG_SIZE); +*cfg = s->cfg; +} + +static void vmapple_cfg_realize(DeviceState *dev, Error **errp) +{ +VMAppleCfgState *s = VMAPPLE_CFG(dev); +uint32_t i; + +strncpy(s->cfg.serial, s->serial, sizeof(s->cfg.serial)); +strncpy(s->cfg.model, s->model, sizeof(s->cfg.model)); +strncpy(s->cfg.soc_name, s->soc_name, sizeof(s->cfg.soc_name)); +strncpy(s->cfg.unk8, "D/A", sizeof(s->cfg.soc_name)); +s->cfg.ecid = cpu_to_be64(s->cfg.ecid); +s->cfg.version = 2; +s->cfg.unk1 = 1; +s->cfg.unk2 = 1; +s->cfg.unk3 = 0x20; +s->cfg.unk4 = 0; +s->cfg.unk5 = 1; +s->cfg.unk6 = 1; +s->cfg.unk7 = 0; +s->cfg.unk10 = 1; + +g_assert(s->cfg.nr_cpus < ARRAY_SIZE(s->cfg.cpu_ids)); +for (i = 0; i < s->cfg.nr_cpus; i++) { +s->cfg.cpu_ids[i] = i; +} +} + +static void vmapple_cfg_init(Object *obj) +{ +VMAppleCfgState *s = VMAPPLE_CFG(obj); + +memory_region_init_ram(&s->mem, obj, "VMApple Config", VMAPPLE_CFG_SI
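Because the guest boot loader reads this region as a fixed binary layout, it can help to let the compiler verify the annotated offsets. The following is a minimal sketch of that idea, not part of the patch: it covers only the first 0x60 bytes of the structure, and CfgHeadSketch and MACAddrSketch are hypothetical stand-ins (the latter assumes QEMU's MACAddr is a plain 6-byte array).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's MACAddr, assumed to be six bytes without padding. */
typedef struct MACAddrSketch {
    uint8_t a[6];
} MACAddrSketch;

/* First 0x60 bytes of the configuration layout, fields as in the patch. */
typedef struct CfgHeadSketch {
    uint32_t version;         /* 0x000 */
    uint32_t nr_cpus;         /* 0x004 */
    uint32_t unk1;            /* 0x008 */
    uint32_t unk2;            /* 0x00c */
    uint32_t unk3;            /* 0x010 */
    uint32_t unk4;            /* 0x014 */
    uint64_t ecid;            /* 0x018 */
    uint64_t ram_size;        /* 0x020 */
    uint32_t run_installer1;  /* 0x028 */
    uint32_t unk5;            /* 0x02c */
    uint32_t unk6;            /* 0x030 */
    uint32_t run_installer2;  /* 0x034 */
    uint32_t rnd;             /* 0x038 */
    uint32_t unk7;            /* 0x03c */
    MACAddrSketch mac_en0;    /* 0x040 */
    uint8_t pad1[2];
    MACAddrSketch mac_en1;    /* 0x048 */
    uint8_t pad2[2];
    MACAddrSketch mac_wifi0;  /* 0x050 */
    uint8_t pad3[2];
    MACAddrSketch mac_bt0;    /* 0x058 */
    uint8_t pad4[2];
} CfgHeadSketch;

/* Compile-time checks: the build fails if a comment and the layout disagree. */
_Static_assert(offsetof(CfgHeadSketch, ecid) == 0x018, "ecid offset");
_Static_assert(offsetof(CfgHeadSketch, ram_size) == 0x020, "ram_size offset");
_Static_assert(offsetof(CfgHeadSketch, rnd) == 0x038, "rnd offset");
_Static_assert(offsetof(CfgHeadSketch, mac_en0) == 0x040, "mac_en0 offset");
_Static_assert(offsetof(CfgHeadSketch, mac_bt0) == 0x058, "mac_bt0 offset");
_Static_assert(sizeof(CfgHeadSketch) == 0x060, "size of leading fields");
```

The same pattern could be extended to the remaining fields of the real structure if the layout needs to be pinned down further.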
[PATCH v2 08/12] hw/vmapple/bdif: Introduce vmapple backdoor interface
The VMApple machine exposes AUX and ROOT block devices (as well as USB OTG emulation) via virtio-pci as well as a special, simple backdoor platform device. This patch implements this backdoor platform device to the best of my understanding. I left out any USB OTG parts; they're only needed for guest recovery and I don't understand the protocol yet. Signed-off-by: Alexander Graf --- v1 -> v2: - Adapt to system_ss meson.build target --- include/hw/vmapple/bdif.h | 31 + hw/vmapple/bdif.c | 245 ++ hw/vmapple/Kconfig| 2 + hw/vmapple/meson.build| 1 + hw/vmapple/trace-events | 5 + 5 files changed, 284 insertions(+) create mode 100644 include/hw/vmapple/bdif.h create mode 100644 hw/vmapple/bdif.c diff --git a/include/hw/vmapple/bdif.h b/include/hw/vmapple/bdif.h new file mode 100644 index 00..65ee43457b --- /dev/null +++ b/include/hw/vmapple/bdif.h @@ -0,0 +1,31 @@ +/* + * VMApple Backdoor Interface + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef HW_VMAPPLE_BDIF_H +#define HW_VMAPPLE_BDIF_H + +#include "hw/sysbus.h" +#include "qom/object.h" + +#define TYPE_VMAPPLE_BDIF "vmapple-bdif" +OBJECT_DECLARE_SIMPLE_TYPE(VMAppleBdifState, VMAPPLE_BDIF) + +struct VMAppleBdifState { +/* */ +SysBusDevice parent_obj; + +/* */ +BlockBackend *aux; +BlockBackend *root; +MemoryRegion mmio; +}; + +#define VMAPPLE_BDIF_SIZE 0x0020 + +#endif /* HW_VMAPPLE_BDIF_H */ diff --git a/hw/vmapple/bdif.c b/hw/vmapple/bdif.c new file mode 100644 index 00..36b5915ff3 --- /dev/null +++ b/hw/vmapple/bdif.c @@ -0,0 +1,245 @@ +/* + * VMApple Backdoor Interface + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "hw/vmapple/bdif.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "qapi/error.h" +#include "trace.h" +#include "hw/block/block.h" +#include "sysemu/block-backend.h" + +#define REG_DEVID_MASK 0x +#define DEVID_ROOT 0x +#define DEVID_AUX 0x0001 +#define DEVID_USB 0x0010 + +#define REG_STATUS 0x0 +#define REG_STATUS_ACTIVE BIT(0) +#define REG_CFG 0x4 +#define REG_CFG_ACTIVEBIT(1) +#define REG_UNK10x8 +#define REG_BUSY0x10 +#define REG_BUSY_READYBIT(0) +#define REG_UNK20x400 +#define REG_CMD 0x408 +#define REG_NEXT_DEVICE 0x420 +#define REG_UNK30x434 + +typedef struct vblk_sector { +uint32_t pad; +uint32_t pad2; +uint32_t sector; +uint32_t pad3; +} VblkSector; + +typedef struct vblk_req_cmd { +uint64_t addr; +uint32_t len; +uint32_t flags; +} VblkReqCmd; + +typedef struct vblk_req { +VblkReqCmd sector; +VblkReqCmd data; +VblkReqCmd retval; +} VblkReq; + +#define VBLK_DATA_FLAGS_READ 0x00030001 +#define VBLK_DATA_FLAGS_WRITE 0x00010001 + +#define VBLK_RET_SUCCESS 0 +#define VBLK_RET_FAILED 1 + +static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size) +{ +uint64_t ret = -1; +uint64_t devid = (offset & REG_DEVID_MASK); + +switch (offset & ~REG_DEVID_MASK) { +case REG_STATUS: +ret = REG_STATUS_ACTIVE; +break; +case REG_CFG: +ret = REG_CFG_ACTIVE; +break; +case REG_UNK1: +ret = 0x420; +break; +case REG_BUSY: +ret = REG_BUSY_READY; +break; +case REG_UNK2: +ret = 0x1; +break; +case REG_UNK3: +ret = 0x0; +break; +case REG_NEXT_DEVICE: +switch (devid) { +case DEVID_ROOT: +ret = 0x800; +break; +case DEVID_AUX: +ret = 0x1; +break; +} +break; +} + +trace_bdif_read(offset, size, ret); +return ret; +} + +static void le2cpu_sector(VblkSector *sector) +{ +sector->sector = le32_to_cpu(sector->sector); +} + +static void le2cpu_reqcmd(VblkReqCmd *cmd) +{ +cmd->addr = le64_to_cpu(cmd->addr); +cmd->len = le32_to_cpu(cmd->len); +cmd->flags = le32_to_cpu(cmd->flags); +} + +static void le2cpu_req(VblkReq *req) +{ 
+le2cpu_reqcmd(&req->sector); +le2cpu_reqcmd(&req->data); +le2cpu_reqcmd(&req->retval); +} + +static void vblk_cmd(uint64_t devid, BlockBackend *blk, uint64_t value, + uint64_t static_off) +{ +VblkReq req; +VblkSector sector; +uint64_t off = 0; +
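The le2cpu_* helpers above convert request descriptors from guest (little-endian) byte order into host order. As a standalone illustration of what that conversion amounts to, here is a minimal sketch that decodes one descriptor from raw bytes. It is not the patch's code: VblkReqCmdSketch, load_le32(), load_le64() and parse_reqcmd() are hypothetical names, and the sketch assumes the descriptor is laid out as addr, len, flags with no padding, as the struct in the patch suggests.

```c
#include <stdint.h>

/* Mirrors the VblkReqCmd fields from the patch; the name is hypothetical. */
typedef struct VblkReqCmdSketch {
    uint64_t addr;   /* guest physical address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint32_t flags;  /* e.g. 0x00030001 for read data, per the defines */
} VblkReqCmdSketch;

/* Read a 32-bit little-endian value independent of host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a 64-bit little-endian value as two 32-bit halves. */
static uint64_t load_le64(const uint8_t *p)
{
    return (uint64_t)load_le32(p) | ((uint64_t)load_le32(p + 4) << 32);
}

/* Decode one 16-byte descriptor as it would sit in guest memory. */
static VblkReqCmdSketch parse_reqcmd(const uint8_t raw[16])
{
    VblkReqCmdSketch cmd;

    cmd.addr = load_le64(raw);
    cmd.len = load_le32(raw + 8);
    cmd.flags = load_le32(raw + 12);
    return cmd;
}
```

Decoding byte by byte like this avoids depending on host endianness or alignment, which is the same goal the in-place le32_to_cpu()/le64_to_cpu() conversions serve in the patch.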
[PATCH v2 10/12] hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
MacOS provides a framework (library) that allows any vmm to implement a paravirtualized 3d graphics passthrough to the host metal stack called ParavirtualizedGraphics.Framework (PVG). The library abstracts away almost every aspect of the paravirtualized device model and only provides and receives callbacks on MMIO access as well as to share memory address space between the VM and PVG. This patch implements a QEMU device that drives PVG for the VMApple variant of it. Signed-off-by: Alexander Graf --- v1 -> v2: - Adapt to system_ss meson.build target --- meson.build | 4 + hw/vmapple/Kconfig | 3 + hw/vmapple/apple-gfx.m | 578 hw/vmapple/meson.build | 1 + hw/vmapple/trace-events | 22 ++ 5 files changed, 608 insertions(+) create mode 100644 hw/vmapple/apple-gfx.m diff --git a/meson.build b/meson.build index dc5242a5f4..d34310b5eb 100644 --- a/meson.build +++ b/meson.build @@ -607,6 +607,8 @@ socket = [] version_res = [] coref = [] iokit = [] +pvg = [] +metal = [] emulator_link_args = [] nvmm =not_found hvf = not_found @@ -630,6 +632,8 @@ elif targetos == 'darwin' coref = dependency('appleframeworks', modules: 'CoreFoundation') iokit = dependency('appleframeworks', modules: 'IOKit', required: false) host_dsosuf = '.dylib' + pvg = dependency('appleframeworks', modules: 'ParavirtualizedGraphics') + metal = dependency('appleframeworks', modules: 'Metal') elif targetos == 'sunos' socket = [cc.find_library('socket'), cc.find_library('nsl'), diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig index 542426a740..ba37fc5b81 100644 --- a/hw/vmapple/Kconfig +++ b/hw/vmapple/Kconfig @@ -6,3 +6,6 @@ config VMAPPLE_BDIF config VMAPPLE_CFG bool + +config VMAPPLE_PVG +bool diff --git a/hw/vmapple/apple-gfx.m b/hw/vmapple/apple-gfx.m new file mode 100644 index 00..97dd2cd9ae --- /dev/null +++ b/hw/vmapple/apple-gfx.m @@ -0,0 +1,578 @@ +/* + * QEMU Apple ParavirtualizedGraphics.framework device + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
+ * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + * ParavirtualizedGraphics.framework is a set of libraries that macOS provides + * which implements 3d graphics passthrough to the host as well as a + * proprietary guest communication channel to drive it. This device model + * implements support to drive that library from within QEMU. + */ + +#include "qemu/osdep.h" +#include "hw/irq.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "trace.h" +#include "hw/sysbus.h" +#include "hw/pci/msi.h" +#include "crypto/hash.h" +#include "sysemu/cpus.h" +#include "ui/console.h" +#include "monitor/monitor.h" +#import + +#define TYPE_APPLE_GFX "apple-gfx" + +#define MAX_MRS 512 + +static const PGDisplayCoord_t apple_gfx_modes[] = { +{ .x = 1440, .y = 1080 }, +{ .x = 1280, .y = 1024 }, +}; + +/* + * We have to map PVG memory into our address space. Use the one below + * as base start address. In normal linker setups it points to a free + * memory range. + */ +#define APPLE_GFX_BASE_VA ((void *)(uintptr_t)0x5000UL) + +/* + * ParavirtualizedGraphics.Framework only ships header files for the x86 + * variant which does not include IOSFC descriptors and host devices. We add + * their definitions here so that we can also work with the ARM version. 
+ */ +typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector); +typedef bool(^IOSFCUnmapMemory)(void *a, void *b, void *c, void *d, void *e, void *f); +typedef bool(^IOSFCMapMemory)(uint64_t phys, uint64_t len, bool ro, void **va, void *e, void *f); + +@interface PGDeviceDescriptorExt : PGDeviceDescriptor +@property (readwrite, nonatomic) bool usingIOSurfaceMapper; +@end + +@interface PGIOSurfaceHostDeviceDescriptor : NSObject +-(PGIOSurfaceHostDeviceDescriptor *)init; +@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory mapMemory; +@property (readwrite, nonatomic, copy, nullable) IOSFCUnmapMemory unmapMemory; +@property (readwrite, nonatomic, copy, nullable) IOSFCRaiseInterrupt raiseInterrupt; +@end + +@interface PGIOSurfaceHostDevice : NSObject +-(void)initWithDescriptor:(PGIOSurfaceHostDeviceDescriptor *) desc; +-(uint32_t)mmioReadAtOffset:(size_t) offset; +-(void)mmioWriteAtOffset:(size_t) offset value:(uint32_t)value; +@end + +typedef struct AppleGFXMR { +QTAILQ_ENTRY(AppleGFXMR) node; +hwaddr pa; +void *va; +uint64_t len; +} AppleGFXMR; + +typedef QTAILQ_HEAD(, AppleGFXMR) AppleGFXMRList; + +typedef struct AppleGFXTask { +QTAILQ_ENTRY(Apple
[PATCH v2 02/12] hw/misc/pvpanic: Add MMIO interface
In addition to the ISA and PCI variants of pvpanic, let's add an MMIO
platform device that we can use in embedded arm environments.

Signed-off-by: Alexander Graf
Reviewed-by: Philippe Mathieu-Daudé
Tested-by: Philippe Mathieu-Daudé

---

v1 -> v2:

  - Use SPDX header
  - Remove useless includes
  - Adapt to new meson.build target (system_ss)
---
 include/hw/misc/pvpanic.h |  1 +
 hw/misc/pvpanic-mmio.c    | 61 +++
 hw/misc/Kconfig           |  4 +++
 hw/misc/meson.build       |  1 +
 4 files changed, 67 insertions(+)
 create mode 100644 hw/misc/pvpanic-mmio.c

diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
index fab94165d0..f9e7c1ea17 100644
--- a/include/hw/misc/pvpanic.h
+++ b/include/hw/misc/pvpanic.h
@@ -20,6 +20,7 @@
 
 #define TYPE_PVPANIC_ISA_DEVICE "pvpanic"
 #define TYPE_PVPANIC_PCI_DEVICE "pvpanic-pci"
+#define TYPE_PVPANIC_MMIO_DEVICE "pvpanic-mmio"
 
 #define PVPANIC_IOPORT_PROP "ioport"
 
diff --git a/hw/misc/pvpanic-mmio.c b/hw/misc/pvpanic-mmio.c
new file mode 100644
index 00..99a24f104c
--- /dev/null
+++ b/hw/misc/pvpanic-mmio.c
@@ -0,0 +1,61 @@
+/*
+ * QEMU simulated pvpanic device (MMIO frontend)
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/misc/pvpanic.h"
+#include "hw/sysbus.h"
+#include "standard-headers/linux/pvpanic.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(PVPanicMMIOState, PVPANIC_MMIO_DEVICE)
+
+#define PVPANIC_MMIO_SIZE 0x2
+
+struct PVPanicMMIOState {
+    SysBusDevice parent_obj;
+
+    PVPanicState pvpanic;
+};
+
+static void pvpanic_mmio_initfn(Object *obj)
+{
+    PVPanicMMIOState *s = PVPANIC_MMIO_DEVICE(obj);
+
+    pvpanic_setup_io(&s->pvpanic, DEVICE(s), PVPANIC_MMIO_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->pvpanic.mr);
+}
+
+static Property pvpanic_mmio_properties[] = {
+    DEFINE_PROP_UINT8("events", PVPanicMMIOState, pvpanic.events,
+                      PVPANIC_PANICKED | PVPANIC_CRASH_LOADED),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pvpanic_mmio_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    device_class_set_props(dc, pvpanic_mmio_properties);
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo pvpanic_mmio_info = {
+    .name          = TYPE_PVPANIC_MMIO_DEVICE,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(PVPanicMMIOState),
+    .instance_init = pvpanic_mmio_initfn,
+    .class_init    = pvpanic_mmio_class_init,
+};
+
+static void pvpanic_register_types(void)
+{
+    type_register_static(&pvpanic_mmio_info);
+}
+
+type_init(pvpanic_register_types)
diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 6996d265e4..b69746a60a 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -125,6 +125,10 @@ config PVPANIC_ISA
     depends on ISA_BUS
     select PVPANIC_COMMON
 
+config PVPANIC_MMIO
+    bool
+    select PVPANIC_COMMON
+
 config AUX
     bool
     select I2C
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 892f8b91c5..63821d6040 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -116,6 +116,7 @@ system_ss.add(when: 'CONFIG_ARMSSE_MHU', if_true: files('armsse-mhu.c'))
 
 system_ss.add(when: 'CONFIG_PVPANIC_ISA', if_true: files('pvpanic-isa.c'))
 system_ss.add(when: 'CONFIG_PVPANIC_PCI', if_true: files('pvpanic-pci.c'))
+system_ss.add(when: 'CONFIG_PVPANIC_MMIO', if_true: files('pvpanic-mmio.c'))
 system_ss.add(when: 'CONFIG_AUX', if_true: files('auxbus.c'))
 system_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files(
   'aspeed_hace.c',
-- 
2.39.2 (Apple Git-143)
[PATCH v2 03/12] hvf: Increase number of possible memory slots
For PVG we will need more than the current 32 possible memory slots.
Bump the limit to 512 instead.

Signed-off-by: Alexander Graf

---

v1 -> v2:

  - Move max slot number to define
---
 include/sysemu/hvf_int.h  | 4 +++-
 accel/hvf/hvf-accel-ops.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index 718beddcdd..36aa9b4eff 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -17,6 +17,8 @@
 #include
 #endif
 
+#define HVF_MAX_SLOTS 512
+
 /* hvf_slot flags */
 #define HVF_SLOT_LOG (1 << 0)
 
@@ -40,7 +42,7 @@ typedef struct hvf_vcpu_caps {
 
 struct HVFState {
     AccelState parent;
-    hvf_slot slots[32];
+    hvf_slot slots[HVF_MAX_SLOTS];
     int num_slots;
 
     hvf_vcpu_caps *hvf_caps;
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 3c94c79747..7aee0d6f72 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -88,7 +88,7 @@ struct mac_slot {
     uint64_t gva;
 };
 
-struct mac_slot mac_slots[32];
+struct mac_slot mac_slots[HVF_MAX_SLOTS];
 
 static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 {
-- 
2.39.2 (Apple Git-143)
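The point of moving the limit into a define is that both arrays and any loop over them stay in sync. A minimal sketch of that pattern follows; the `Sketch*` types and the "size == 0 means free" convention are hypothetical and only stand in for the real hvf_slot bookkeeping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_SLOTS 512

/* Hypothetical slot record; a size of 0 marks the slot as free. */
typedef struct SketchSlot {
    uint64_t start;
    uint64_t size;
} SketchSlot;

typedef struct SketchState {
    SketchSlot slots[SKETCH_MAX_SLOTS];
    int num_slots;
} SketchState;

static void sketch_state_init(SketchState *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    /* Derive the count from the array itself so it tracks the define. */
    s->num_slots = (int)(sizeof(s->slots) / sizeof(s->slots[0]));
}

/* Claim the first free slot, or return NULL once the table is full. */
static SketchSlot *sketch_alloc_slot(SketchState *s, uint64_t start,
                                     uint64_t size)
{
    if (size == 0) {
        return NULL;
    }
    for (int i = 0; i < s->num_slots; i++) {
        if (s->slots[i].size == 0) {
            s->slots[i].start = start;
            s->slots[i].size = size;
            return &s->slots[i];
        }
    }
    return NULL;
}
```

In this model, the 513th allocation fails cleanly instead of writing past the array, which is the failure mode a hard-coded 32 in two places makes easy to hit.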
[PATCH v2 05/12] hw: Add vmapple subdir
We will introduce a number of devices that are specific to the vmapple
target machine. To keep them all tidily together, let's put them into
a single target directory.

Signed-off-by: Alexander Graf
---
 MAINTAINERS             | 6 ++++++
 meson.build             | 1 +
 hw/vmapple/trace.h      | 1 +
 hw/Kconfig              | 1 +
 hw/meson.build          | 1 +
 hw/vmapple/Kconfig      | 1 +
 hw/vmapple/meson.build  | 0
 hw/vmapple/trace-events | 2 ++
 8 files changed, 13 insertions(+)
 create mode 100644 hw/vmapple/trace.h
 create mode 100644 hw/vmapple/Kconfig
 create mode 100644 hw/vmapple/meson.build
 create mode 100644 hw/vmapple/trace-events

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..3104e58eff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2573,6 +2573,12 @@ F: hw/usb/canokey.c
 F: hw/usb/canokey.h
 F: docs/system/devices/canokey.rst
 
+VMapple
+M: Alexander Graf
+S: Maintained
+F: hw/vmapple/*
+F: include/hw/vmapple/*
+
 Subsystems
 ----------
 Overall Audio backends
diff --git a/meson.build b/meson.build
index 0d6a0015a1..dc5242a5f4 100644
--- a/meson.build
+++ b/meson.build
@@ -3282,6 +3282,7 @@ if have_system
     'hw/usb',
     'hw/vfio',
     'hw/virtio',
+    'hw/vmapple',
     'hw/watchdog',
     'hw/xen',
     'hw/gpio',
diff --git a/hw/vmapple/trace.h b/hw/vmapple/trace.h
new file mode 100644
index 00..572adbefe0
--- /dev/null
+++ b/hw/vmapple/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_vmapple.h"
diff --git a/hw/Kconfig b/hw/Kconfig
index ba62ff6417..d99854afdd 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source tpm/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source vmapple/Kconfig
 source xen/Kconfig
 source watchdog/Kconfig
 
diff --git a/hw/meson.build b/hw/meson.build
index c7ac7d3d75..e156a6618f 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -40,6 +40,7 @@ subdir('tpm')
 subdir('usb')
 subdir('vfio')
 subdir('virtio')
+subdir('vmapple')
 subdir('watchdog')
 subdir('xen')
 subdir('xenpv')
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
new file mode 100644
index 00..8b13789179
--- /dev/null
+++ b/hw/vmapple/Kconfig
@@ -0,0 +1 @@
+
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
new file mode 100644
index 00..e69de29bb2
diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
new file mode 100644
index 00..9ccc579048
--- /dev/null
+++ b/hw/vmapple/trace-events
@@ -0,0 +1,2 @@
+# See docs/devel/tracing.rst for syntax documentation.
+
-- 
2.39.2 (Apple Git-143)
[PATCH v2 04/12] hvf: arm: Ignore writes to CNTP_CTL_EL0
MacOS unconditionally disables interrupts of the physical timer on boot
and then continues to use the virtual one. We don't really want to support
a full physical timer emulation, so let's just ignore those writes.

Signed-off-by: Alexander Graf

---

v1 -> v2:

  - Add log message on write
---
 target/arm/hvf/hvf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 486f90be1d..02db3dc908 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
 
 #include "sysemu/runstate.h"
 #include "sysemu/hvf.h"
@@ -179,6 +180,7 @@ void hvf_arm_init_debug(void)
 #define SYSREG_OSLSR_EL1      SYSREG(2, 0, 1, 1, 4)
 #define SYSREG_OSDLR_EL1      SYSREG(2, 0, 1, 3, 4)
 #define SYSREG_CNTPCT_EL0     SYSREG(3, 3, 14, 0, 1)
+#define SYSREG_CNTP_CTL_EL0   SYSREG(3, 3, 14, 2, 1)
 #define SYSREG_PMCR_EL0       SYSREG(3, 3, 9, 12, 0)
 #define SYSREG_PMUSERENR_EL0  SYSREG(3, 3, 9, 14, 0)
 #define SYSREG_PMCNTENSET_EL0 SYSREG(3, 3, 9, 12, 1)
@@ -1551,6 +1553,13 @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
     case SYSREG_OSLAR_EL1:
         env->cp15.oslsr_el1 = val & 1;
         break;
+    case SYSREG_CNTP_CTL_EL0:
+        /*
+         * Guests should not rely on the physical counter, but macOS emits
+         * disable writes to it. Let it do so, but ignore the requests.
+         */
+        qemu_log_mask(LOG_UNIMP, "Unsupported write to CNTP_CTL_EL0\n");
+        break;
     case SYSREG_OSDLR_EL1:
         /* Dummy register */
         break;
-- 
2.39.2 (Apple Git-143)
[PATCH v2 11/12] hw/vmapple/virtio-blk: Add support for apple virtio-blk
Apple has its own virtio-blk PCI device ID where it deviates from the official virtio-pci spec slightly: It puts a new "apple type" field at a static offset in config space and introduces a new barrier command. This patch first creates a mechanism for virtio-blk downstream classes to handle unknown commands. It then creates such a downstream class and a new vmapple-virtio-blk-pci class which support the additional apple type config identifier as well as the barrier command. It then exposes 2 subclasses from that that we can use to expose root and aux virtio-blk devices: "vmapple-virtio-root" and "vmapple-virtio-aux". Signed-off-by: Alexander Graf --- v1 -> v2: - Rework to make all vmapple virtio-blk logic a subclass --- include/hw/pci/pci_ids.h| 1 + include/hw/virtio/virtio-blk.h | 12 +- include/hw/vmapple/virtio-blk.h | 39 ++ hw/block/virtio-blk.c | 19 ++- hw/vmapple/virtio-blk.c | 212 hw/vmapple/Kconfig | 3 + hw/vmapple/meson.build | 1 + 7 files changed, 282 insertions(+), 5 deletions(-) create mode 100644 include/hw/vmapple/virtio-blk.h create mode 100644 hw/vmapple/virtio-blk.c diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h index e4386ebb20..74e589a298 100644 --- a/include/hw/pci/pci_ids.h +++ b/include/hw/pci/pci_ids.h @@ -188,6 +188,7 @@ #define PCI_DEVICE_ID_APPLE_UNI_N_AGP0x0020 #define PCI_DEVICE_ID_APPLE_U3_AGP 0x004b #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC 0x0021 +#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK 0x1a00 #define PCI_VENDOR_ID_SUN0x108e #define PCI_DEVICE_ID_SUN_EBUS 0x1000 diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h index dafec432ce..381a906410 100644 --- a/include/hw/virtio/virtio-blk.h +++ b/include/hw/virtio/virtio-blk.h @@ -23,7 +23,7 @@ #include "qom/object.h" #define TYPE_VIRTIO_BLK "virtio-blk-device" -OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK) +OBJECT_DECLARE_TYPE(VirtIOBlock, VirtIOBlkClass, VIRTIO_BLK) /* This is the last element of the write scatter-gather list */ struct 
virtio_blk_inhdr @@ -91,6 +91,16 @@ typedef struct MultiReqBuffer { bool is_write; } MultiReqBuffer; +typedef struct VirtIOBlkClass { +/*< private >*/ +VirtioDeviceClass parent; +/*< public >*/ +bool (*handle_unknown_request)(VirtIOBlockReq *req, MultiReqBuffer *mrb, + uint32_t type); +} VirtIOBlkClass; + void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq); +void virtio_blk_free_request(VirtIOBlockReq *req); +void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status); #endif diff --git a/include/hw/vmapple/virtio-blk.h b/include/hw/vmapple/virtio-blk.h new file mode 100644 index 00..b23106a3df --- /dev/null +++ b/include/hw/vmapple/virtio-blk.h @@ -0,0 +1,39 @@ +/* + * VMApple specific VirtIO Block implementation + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef HW_VMAPPLE_CFG_H +#define HW_VMAPPLE_CFG_H + +#include "hw/sysbus.h" +#include "qom/object.h" +#include "hw/virtio/virtio-pci.h" +#include "hw/virtio/virtio-blk.h" + +#define TYPE_VMAPPLE_VIRTIO_BLK "vmapple-virtio-blk" +#define TYPE_VMAPPLE_VIRTIO_ROOT "vmapple-virtio-root" +#define TYPE_VMAPPLE_VIRTIO_AUX "vmapple-virtio-aux" + +OBJECT_DECLARE_TYPE(VMAppleVirtIOBlk, VMAppleVirtIOBlkClass, VMAPPLE_VIRTIO_BLK) + +typedef struct VMAppleVirtIOBlkClass { +/*< private >*/ +VirtIOBlkClass parent; +/*< public >*/ +void (*get_config)(VirtIODevice *vdev, uint8_t *config); +} VMAppleVirtIOBlkClass; + +typedef struct VMAppleVirtIOBlk { +/* */ +VirtIOBlock parent_obj; + +/* */ +uint32_t apple_type; +} VMAppleVirtIOBlk; + +#endif /* HW_VMAPPLE_CFG_H */ diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 39e7f23fab..1645cdccbe 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -48,12 +48,12 @@ static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq, req->mr_next = NULL; } -static void 
virtio_blk_free_request(VirtIOBlockReq *req) +void virtio_blk_free_request(VirtIOBlockReq *req) { g_free(req); } -static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status) +void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status) { VirtIOBlock *s = req->dev; VirtIODevice *vdev = VIRTIO_DEVICE(s); @@ -1121,8 +1121,18 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) break; } default: -virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP); -virtio_b
[PATCH v2 06/12] gpex: Allow more than 4 legacy IRQs
Some boards such as vmapple don't do real legacy PCI IRQ swizzling. Instead, they just keep allocating more board IRQ lines for each new legacy IRQ. Let's support that mode by giving instantiators a new "nr_irqs" property they can use to support more than 4 legacy IRQ lines. In this mode, GPEX will export more IRQ lines, one for each device. Signed-off-by: Alexander Graf --- include/hw/pci-host/gpex.h | 7 +++ hw/arm/sbsa-ref.c | 2 +- hw/arm/virt.c | 2 +- hw/i386/microvm.c | 2 +- hw/loongarch/virt.c| 2 +- hw/mips/loongson3_virt.c | 2 +- hw/openrisc/virt.c | 12 ++-- hw/pci-host/gpex.c | 36 +++- hw/riscv/virt.c| 12 ++-- hw/xtensa/virt.c | 2 +- 10 files changed, 52 insertions(+), 27 deletions(-) diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h index b0240bd768..098dc4d1cc 100644 --- a/include/hw/pci-host/gpex.h +++ b/include/hw/pci-host/gpex.h @@ -32,8 +32,6 @@ OBJECT_DECLARE_SIMPLE_TYPE(GPEXHost, GPEX_HOST) #define TYPE_GPEX_ROOT_DEVICE "gpex-root" OBJECT_DECLARE_SIMPLE_TYPE(GPEXRootState, GPEX_ROOT_DEVICE) -#define GPEX_NUM_IRQS 4 - struct GPEXRootState { /*< private >*/ PCIDevice parent_obj; @@ -51,8 +49,9 @@ struct GPEXHost { MemoryRegion io_mmio; MemoryRegion io_ioport_window; MemoryRegion io_mmio_window; -qemu_irq irq[GPEX_NUM_IRQS]; -int irq_num[GPEX_NUM_IRQS]; +uint32_t nr_irqs; +qemu_irq *irq; +int *irq_num; bool allow_unmapped_accesses; }; diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c index bc89eb4806..a786849238 100644 --- a/hw/arm/sbsa-ref.c +++ b/hw/arm/sbsa-ref.c @@ -681,7 +681,7 @@ static void create_pcie(SBSAMachineState *sms) /* Map IO port space */ sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, qdev_get_gpio_in(sms->gic, irq + i)); gpex_set_irq_num(GPEX_HOST(dev), i, irq + i); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index a13c658bbf..3a4ef3adc2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1467,7 
+1467,7 @@ static void create_pcie(VirtMachineState *vms) /* Map IO port space */ sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, qdev_get_gpio_in(vms->gic, irq + i)); gpex_set_irq_num(GPEX_HOST(dev), i, irq + i); diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c index 7227a2156c..9ca007b870 100644 --- a/hw/i386/microvm.c +++ b/hw/i386/microvm.c @@ -139,7 +139,7 @@ static void create_gpex(MicrovmMachineState *mms) mms->gpex.mmio64.base, mmio64_alias); } -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, x86ms->gsi[mms->gpex.irq + i]); } diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c index 2629128aed..36bfcea53b 100644 --- a/hw/loongarch/virt.c +++ b/hw/loongarch/virt.c @@ -533,7 +533,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState * memory_region_add_subregion(get_system_memory(), VIRT_PCI_IO_BASE, pio_alias); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { sysbus_connect_irq(d, i, qdev_get_gpio_in(pch_pic, 16 + i)); gpex_set_irq_num(GPEX_HOST(gpex_dev), i, 16 + i); diff --git a/hw/mips/loongson3_virt.c b/hw/mips/loongson3_virt.c index b74b358874..6d54f679eb 100644 --- a/hw/mips/loongson3_virt.c +++ b/hw/mips/loongson3_virt.c @@ -437,7 +437,7 @@ static inline void loongson3_virt_devices_init(MachineState *machine, virt_memmap[VIRT_PCIE_PIO].base, s->pio_alias); sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, virt_memmap[VIRT_PCIE_PIO].base); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { irq = qdev_get_gpio_in(pic, PCIE_IRQ_BASE + i); sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, irq); gpex_set_irq_num(GPEX_HOST(dev), i, PCIE_IRQ_BASE + i); diff --git a/hw/openrisc/virt.c b/hw/openrisc/virt.c index f8a68a6a6b..16a5676c4b 100644 --- a/hw/openrisc/virt.c +++ b/hw/openrisc/virt.c @@ 
-318,7 +318,7 @@ static void create_pcie_irq_map(void *fdt, char *nodename, int irq_base, { int pin, dev; uint32_t irq_map_stride = 0; -uint32_t full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS * 6] = {}; +uint32_t full_irq_map[PCI_NUM_PINS * PCI_NUM_PINS * 6] = {}; uint32_t *irq_map = ful
Re: [PATCH 12/12] hw/vmapple/vmapple: Add vmapple machine type
On 20.06.23 19:35, Bernhard Beschow wrote:
> On 14 June 2023 22:57:34 UTC, Alexander Graf wrote:
>> Apple defines a new "vmapple" machine type as part of its proprietary
>> macOS Virtualization.Framework vmm. This machine type is similar to the
>> virt one, but with subtle differences in base devices, a few special
>> vmapple device additions and a vastly different boot chain.
>>
>> This patch reimplements this machine type in QEMU. To use it, you
>> have to have a readily installed version of macOS for VMApple,
>> run on macOS with -accel hvf, pass the Virtualization.Framework
>> boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
>> and pass aux and root volume as virtio drives. In addition, you also
>> need to find the machine UUID and pass that as -M vmapple,uuid= parameter:
>>
>> $ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
>>     -bios /System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin \
>>     -drive file=aux,if=pflash,format=raw \
>>     -drive file=root,if=pflash,format=raw \
>>     -drive file=aux,if=none,id=aux,format=raw \
>>     -device virtio-blk-pci,drive=aux,x-apple-type=2 \
>>     -drive file=root,if=none,id=root,format=raw \
>>     -device virtio-blk-pci,drive=root,x-apple-type=1
>>
>> With all these in place, you should be able to see macOS booting
>> successfully.
>
> This documentation seems valuable for the QEMU manual. But AFAICS there
> is no documentation like this added to the QEMU manual in this series.
> This means that it'll get "lost". How about adding it, possibly in this
> patch?

Thanks, I love the idea :). Let me do that for v2!

> Note that I'm not able to test this series. I'm just seeing the
> valuable-information-in-the-commit-message-which-will-get-lost pattern.
>
>> Signed-off-by: Alexander Graf
>> ---
>>  hw/vmapple/Kconfig     |  19 ++
>>  hw/vmapple/meson.build |   1 +
>>  hw/vmapple/vmapple.c   | 661 +
>>  3 files changed, 681 insertions(+)
>>  create mode 100644 hw/vmapple/vmapple.c
>>
>> diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
>> index ba37fc5b81..7a2375dc95 100644
>> --- a/hw/vmapple/Kconfig
>> +++ b/hw/vmapple/Kconfig
>> @@ -9,3 +9,22 @@ config VMAPPLE_CFG
>>
>>  config VMAPPLE_PVG
>>      bool
>> +
>> +config VMAPPLE
>> +    bool
>> +    depends on ARM && HVF
>> +    default y if ARM && HVF
>> +    imply PCI_DEVICES
>> +    select ARM_GIC
>> +    select PLATFORM_BUS
>> +    select PCI_EXPRESS
>> +    select PCI_EXPRESS_GENERIC_BRIDGE
>> +    select PL011 # UART
>> +    select PL031 # RTC
>> +    select PL061 # GPIO
>> +    select GPIO_PWR
>> +    select PVPANIC_MMIO
>> +    select VMAPPLE_AES
>> +    select VMAPPLE_BDIF
>> +    select VMAPPLE_CFG
>> +    select VMAPPLE_PVG
>> diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
>> index 31fec87156..d732873d35 100644
>> --- a/hw/vmapple/meson.build
>> +++ b/hw/vmapple/meson.build
>> @@ -2,3 +2,4 @@ softmmu_ss.add(when: 'CONFIG_VMAPPLE_AES', if_true: files('aes.c'))
>>  softmmu_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
>>  softmmu_ss.add(when: 'CONFIG_VMAPPLE_CFG', if_true: files('cfg.c'))
>>  softmmu_ss.add(when: 'CONFIG_VMAPPLE_PVG', if_true: [files('apple-gfx.m'), pvg, metal])
>> +specific_ss.add(when: 'CONFIG_VMAPPLE', if_true: files('vmapple.c'))
>> diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c
>> new file mode 100644
>> index 00..5d3fe54b96
>> --- /dev/null
>> +++ b/hw/vmapple/vmapple.c
>> @@ -0,0 +1,661 @@
>> +/*
>> + * VMApple machine emulation
>> + *
>> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>
> Is an "All Rights Reserved" wording compatible with the GPL? IANAL.

You will find the pattern commonly across the code base already. My
understanding is that all rights are reserved, but additionally I grant
you the permissions of the GPL.

Alex
Re: [PATCH 05/12] hw/virtio: Add support for apple virtio-blk
On 16.06.23 13:48, Kevin Wolf wrote: Am 15.06.2023 um 00:56 hat Alexander Graf geschrieben: Apple has its own virtio-blk PCI device ID where it deviates from the official virtio-pci spec slightly: It puts a new "apple type" field at a static offset in config space and introduces a new discard command. In other words, it's a different device. We shouldn't try to differentiate only with a property, but actually model it as a separate device. I agree and is what I tried at first, but how do I change behavior of a virtio-blk-pci subclass all the way down to its virtio-blk implementation which lives completely outside of the scope of the respective class? The best thing I could come up with was the QEMU internal qom property x-apple-type. Happy to split them: Make the change of virtio-blk behavior depend on the property and make all of the PCI device/vendor swapping depend on a new class which then sets the x-apple-type. This patch adds a new qdev property called "apple-type" to virtio-blk-pci. When that property is set, we assume the virtio-blk device is an Apple one of the specific type and act accordingly. Do we have any information on what the number in "apple-type" actually means or do we have to treat it as a black box? I have ideas, but no documentation. It's an enum space that defines different types of devices (AUX device, root device, etc) Signed-off-by: Alexander Graf --- hw/block/virtio-blk.c | 23 + hw/virtio/virtio-blk-pci.c | 7 +++ include/hw/pci/pci_ids.h| 1 + include/hw/virtio/virtio-blk.h | 1 + include/standard-headers/linux/virtio_blk.h | 3 +++ 5 files changed, 35 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 39e7f23fab..76b85bb3cb 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -1120,6 +1120,20 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) break; } +case VIRTIO_BLK_T_APPLE1: Can we have a more descriptive name? 
+{ +if (s->conf.x_apple_type) { +/* Only valid on Apple Virtio */ +char buf[iov_size(in_iov, in_num)]; +memset(buf, 0, sizeof(buf)); +iov_from_buf(in_iov, in_num, 0, buf, sizeof(buf)); +virtio_blk_req_complete(req, VIRTIO_BLK_S_OK); So this is a command that simply fills the guest buffer with zeros without accessing the disk content? Weird, but ok, if that's what they are doing... The commit message talks about a discard command. I would have expected a command that discards/unmaps data from the disk. I think it would be good to call it something else in the commit message if it has nothing to do with this. You're completely right. I looked it up again and turns out this is actually a barrier command. Any ideas on how to best implement an actual barrier in virtio-blk? Otherwise I'll just ignore it and always return S_OK. No need for the memset muckery above. +} else { +virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP); +} +virtio_blk_free_request(req); +break; +} default: virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP); virtio_blk_free_request(req); @@ -1351,6 +1365,10 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) } else { blkcfg.zoned.model = VIRTIO_BLK_Z_NONE; } +if (s->conf.x_apple_type) { +/* Apple abuses the same location for its type id */ +blkcfg.max_secure_erase_sectors = s->conf.x_apple_type; Ideally, blkcfg would contain a union there. Since this is a type imported from the kernel, we can't change it inside of QEMU only. Works for me with this comment. +} memcpy(config, &blkcfg, s->config_size); } @@ -1625,6 +1643,10 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) s->config_size = virtio_get_config_size(&virtio_blk_cfg_size_params, s->host_features); +if (s->conf.x_apple_type) { +/* Apple Virtio puts the blk type at 0x3c, make sure we have space. 
*/ +s->config_size = MAX(s->config_size, 0x3d); +} virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size); s->blk = conf->conf.blk; @@ -1734,6 +1756,7 @@ static Property virtio_blk_properties[] = { conf.max_write_zeroes_sectors, BDRV_REQUEST_MAX_SECTORS), DEFINE_PROP_BOOL("x-enable-wce-if-config-wce", VirtIOBlock, conf.x_enable_wce_if_config_wce, true), +DEFINE_PROP_UINT32("x-apple-type", VirtIOBlock, conf.x_apple_type, 0), In a separate device, this would probably be called "apple-type" (without "x-") like promi
Re: [PATCH 10/12] hw/vmapple/cfg: Introduce vmapple cfg region
On 16.06.23 12:47, Philippe Mathieu-Daudé wrote:
> On 15/6/23 00:57, Alexander Graf wrote:
>> Instead of device tree or other more standardized means, VMApple passes
>> platform configuration to the first stage boot loader in a binary encoded
>> format that resides at a dedicated RAM region in physical address space.
>>
>> This patch models this configuration space as a qdev device which we can
>> then map at the fixed location in the address space. That way, we can
>> influence and annotate all configuration fields easily.
>>
>> Signed-off-by: Alexander Graf
>> ---
>>  hw/vmapple/Kconfig       |   3 ++
>>  hw/vmapple/cfg.c         | 105 +++
>>  hw/vmapple/meson.build   |   1 +
>>  include/hw/vmapple/cfg.h |  68 +
>>  4 files changed, 177 insertions(+)
>>  create mode 100644 hw/vmapple/cfg.c
>>  create mode 100644 include/hw/vmapple/cfg.h
>>
>> diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c
>> new file mode 100644
>> index 00..d48e3c3afa
>> --- /dev/null
>> +++ b/hw/vmapple/cfg.c
>> @@ -0,0 +1,105 @@
>> +/*
>> + * VMApple Configuration Region
>> + *
>> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "hw/vmapple/cfg.h"
>> +#include "qemu/log.h"
>> +#include "qemu/module.h"
>> +#include "qapi/error.h"
>> +
>> +static void vmapple_cfg_reset(DeviceState *dev)
>> +{
>> +    VMAppleCfgState *s = VMAPPLE_CFG(dev);
>> +    VMAppleCfg *cfg;
>> +
>> +    cfg = memory_region_get_ram_ptr(&s->mem);
>> +    memset((void *)cfg, 0, VMAPPLE_CFG_SIZE);
>
> I'm a bit confused here: DeviceReset() handler is called _after_
> DeviceRealize().

Yes. In Realize we set up s->cfg (the template). In reset, we fetch a
pointer to the guest exposed memory region (cfg), wipe it and then copy
the template over it in the next line:

>> +    *cfg = s->cfg;
>
> [...]
>
>> diff --git a/include/hw/vmapple/cfg.h b/include/hw/vmapple/cfg.h
>> new file mode 100644
>> index 00..3337064e44
>> --- /dev/null
>> +++ b/include/hw/vmapple/cfg.h
>> @@ -0,0 +1,68 @@
>> +/*
>> + * VMApple Configuration Region
>> + *
>> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef HW_VMAPPLE_CFG_H
>> +#define HW_VMAPPLE_CFG_H
>> +
>> +#include "hw/sysbus.h"
>> +#include "qom/object.h"
>> +#include "net/net.h"
>> +
>> +typedef struct VMAppleCfg {
>> +    uint32_t version;         /* 0x000 */
>> +    uint32_t nr_cpus;         /* 0x004 */
>> +    uint32_t unk1;            /* 0x008 */
>> +    uint32_t unk2;            /* 0x00c */
>> +    uint32_t unk3;            /* 0x010 */
>> +    uint32_t unk4;            /* 0x014 */
>> +    uint64_t ecid;            /* 0x018 */
>> +    uint64_t ram_size;        /* 0x020 */
>> +    uint32_t run_installer1;  /* 0x028 */
>> +    uint32_t unk5;            /* 0x02c */
>> +    uint32_t unk6;            /* 0x030 */
>> +    uint32_t run_installer2;  /* 0x034 */
>> +    uint32_t rnd;             /* 0x038 */
>> +    uint32_t unk7;            /* 0x03c */
>> +    MACAddr mac_en0;          /* 0x040 */
>> +    uint8_t pad1[2];
>> +    MACAddr mac_en1;          /* 0x048 */
>> +    uint8_t pad2[2];
>> +    MACAddr mac_wifi0;        /* 0x050 */
>> +    uint8_t pad3[2];
>> +    MACAddr mac_bt0;          /* 0x058 */
>> +    uint8_t pad4[2];
>> +    uint8_t reserved[0xa0];   /* 0x060 */
>> +    uint32_t cpu_ids[0x80];   /* 0x100 */
>> +    uint8_t scratch[0x200];   /* 0x180 */
>> +    char serial[32];          /* 0x380 */
>> +    char unk8[32];            /* 0x3a0 */
>> +    char model[32];           /* 0x3c0 */
>> +    uint8_t unk9[32];         /* 0x3e0 */
>> +    uint32_t unk10;           /* 0x400 */
>> +    char soc_name[32];        /* 0x404 */
>> +} VMAppleCfg;
>
> Since you access this structure via qdev properties (which is good),
> then we can restrict its definition to cfg.c (no need to expose it).

This struct is part of VMAppleCfgState which (unless we go through
pointers and allocate dynamically - bleks) means it needs to know the
size of the struct which again means it needs to be part of the header :)

Alex
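Since the struct quoted above is a guest-visible binary layout, the offset comments can be checked mechanically. The sketch below does that for a prefix of the struct; `SketchMACAddr` and `SketchCfg` are hypothetical stand-ins (assuming MACAddr is six bytes with byte alignment), not the real QEMU types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's MACAddr: six bytes, byte alignment (assumption). */
typedef struct SketchMACAddr {
    uint8_t a[6];
} SketchMACAddr;

/* Prefix of the quoted layout, with the element types as quoted. */
typedef struct SketchCfg {
    uint32_t version;
    uint32_t nr_cpus;
    uint32_t unk1, unk2, unk3, unk4;
    uint64_t ecid;
    uint64_t ram_size;
    uint32_t run_installer1;
    uint32_t unk5, unk6;
    uint32_t run_installer2;
    uint32_t rnd;
    uint32_t unk7;
    SketchMACAddr mac_en0;
    uint8_t pad1[2];
    SketchMACAddr mac_en1;
    uint8_t pad2[2];
    SketchMACAddr mac_wifi0;
    uint8_t pad3[2];
    SketchMACAddr mac_bt0;
    uint8_t pad4[2];
    uint8_t reserved[0xa0];
    uint32_t cpu_ids[0x80];
    uint8_t scratch[0x200];
} SketchCfg;

/* Compile-time checks against the offsets given in the comments. */
static_assert(offsetof(SketchCfg, ecid) == 0x018, "ecid offset");
static_assert(offsetof(SketchCfg, ram_size) == 0x020, "ram_size offset");
static_assert(offsetof(SketchCfg, mac_en0) == 0x040, "mac_en0 offset");
static_assert(offsetof(SketchCfg, mac_bt0) == 0x058, "mac_bt0 offset");
static_assert(offsetof(SketchCfg, reserved) == 0x060, "reserved offset");
static_assert(offsetof(SketchCfg, cpu_ids) == 0x100, "cpu_ids offset");

/* Offset a compiler assigns to scratch for the types as quoted. */
static size_t sketch_scratch_offset(void)
{
    return offsetof(SketchCfg, scratch);
}
```

One thing this exercise surfaces: with `uint32_t cpu_ids[0x80]` as quoted, the array spans 0x200 bytes, so a compiler places `scratch` at 0x300 rather than at the commented 0x180. Which of the two reflects the layout the boot loader expects is not something this thread settles.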
Re: [PATCH 09/12] hw/vmapple/bdif: Introduce vmapple backdoor interface
On 16.06.23 12:39, Philippe Mathieu-Daudé wrote: On 15/6/23 00:56, Alexander Graf wrote: The VMApple machine exposes AUX and ROOT block devices (as well as USB OTG emulation) via virtio-pci as well as a special, simple backdoor platform device. This patch implements this backdoor platform device to the best of my understanding. I left out any USB OTG parts; they're only needed for guest recovery and I don't understand the protocol yet. Signed-off-by: Alexander Graf --- hw/vmapple/Kconfig | 2 + hw/vmapple/bdif.c | 245 ++ hw/vmapple/meson.build | 1 + hw/vmapple/trace-events | 5 + include/hw/vmapple/bdif.h | 31 + Please enable scripts/git.orderfile if possible. Sure, happy to :) +#define REG_DEVID_MASK 0x +#define DEVID_ROOT 0x +#define DEVID_AUX 0x0001 +#define DEVID_USB 0x0010 + +#define REG_STATUS 0x0 +#define REG_STATUS_ACTIVE BIT(0) +#define REG_CFG 0x4 +#define REG_CFG_ACTIVE BIT(1) +#define REG_UNK1 0x8 +#define REG_BUSY 0x10 +#define REG_BUSY_READY BIT(0) +#define REG_UNK2 0x400 +#define REG_CMD 0x408 +#define REG_NEXT_DEVICE 0x420 +#define REG_UNK3 0x434 +static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size) +{ + uint64_t ret = -1; + uint64_t devid = (offset & REG_DEVID_MASK); + + switch (offset & ~REG_DEVID_MASK) { + case REG_STATUS: + ret = REG_STATUS_ACTIVE; + break; + case REG_CFG: + ret = REG_CFG_ACTIVE; + break; + case REG_UNK1: + ret = 0x420; + break; + case REG_BUSY: + ret = REG_BUSY_READY; + break; + case REG_UNK2: + ret = 0x1; + break; + case REG_UNK3: + ret = 0x0; + break; + case REG_NEXT_DEVICE: + switch (devid) { + case DEVID_ROOT: + ret = 0x800; + break; + case DEVID_AUX: + ret = 0x1; + break; + } + break; + } + + trace_bdif_read(offset, size, ret); + return ret; +} +static const MemoryRegionOps bdif_ops = { + .read = bdif_read, + .write = bdif_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 1, + .max_access_size = 8, + }, + .impl = { + .min_access_size = 1, + .max_access_size = 8, IIUC your 
implementation is using (min, max) = (4, 4): i.e. if the guest emits a 64-bit read at offset 0, we want to return both REG_STATUS/REG_CFG registers. I don't know if the BDIF device carries those semantics. Today, I'm only seeing 32bit accesses which is what I can vouch for. Will 8bit accesses go to a different register space or just access a subset of the 32bit register? I don't know :) The same applies to 64bit ones. For all I know, they might as well end up as completely different registers. Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
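To make the access-size question concrete, here is a minimal standalone sketch (not part of the patch) of what Philippe describes: a 64-bit read at offset 0 served as two 32-bit register reads. The names sketch_read32 and sketch_read64 are hypothetical, and combining the halves in little-endian order is an assumption; as Alex notes above, the real device may treat 8-bit or 64-bit accesses differently.

```c
#include <stdint.h>

#define BIT(n)            (1u << (n))
#define REG_STATUS        0x0
#define REG_STATUS_ACTIVE BIT(0)
#define REG_CFG           0x4
#define REG_CFG_ACTIVE    BIT(1)

/* Hypothetical 32-bit register file holding only the two registers above. */
static uint32_t sketch_read32(uint64_t offset)
{
    switch (offset) {
    case REG_STATUS:
        return REG_STATUS_ACTIVE;
    case REG_CFG:
        return REG_CFG_ACTIVE;
    default:
        return UINT32_MAX; /* unknown register, like the -1 default in bdif_read */
    }
}

/* Assumption: a 64-bit access is two adjacent 32-bit reads, low word first. */
static uint64_t sketch_read64(uint64_t offset)
{
    uint64_t lo = sketch_read32(offset);
    uint64_t hi = sketch_read32(offset + 4);

    return lo | (hi << 32);
}
```

Under that assumption, a 64-bit read at REG_STATUS returns REG_CFG in the upper half and REG_STATUS in the lower half.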
Re: hvf: Invalid ISV on data abort
Hi Antonio, On 02.08.23 11:43, Antonio Caggiano wrote: Hi there, I am trying to bring up a guest on HVF, which at a certain point is trying to write to an area of mmio space and it triggers a data abort where ISV=0 (translation fault level 2). I wonder what could cause it and how to recover. QEMU's HVF implementation - like KVM - only supports MMIO accesses from hardware decoded, "simple" load/store instructions. It will only execute guest OSs that are aware of that limitation and limit MMIO accesses to that set of instructions, such as Linux. If you see this effect with an enlightened OS, you are most likely exposing memory that the guest expects to be represented as RAM as MMIO. Thanks, Alex
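As a rough illustration of the condition discussed above, the following standalone sketch decodes the relevant syndrome bits of a data abort. The field positions follow the Arm architecture's ESR_EL2 layout (EC in bits 31:26, ISV in bit 24, WnR in bit 6, DFSC in bits 5:0); the type and function names are hypothetical and are not QEMU's.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       0x3fu
#define ESR_EC_DABT_LOWER 0x24u      /* data abort from a lower exception level */
#define ESR_ISS_ISV       (1u << 24) /* instruction syndrome valid */
#define ESR_ISS_WNR       (1u << 6)  /* write, not read */
#define ESR_ISS_DFSC_MASK 0x3fu      /* data fault status code */

typedef struct {
    uint32_t ec;
    bool isv;
    bool is_write;
    uint32_t dfsc;
} DataAbortInfo;

/* Pull the fields of interest out of a raw syndrome value. */
static DataAbortInfo decode_data_abort(uint32_t esr)
{
    DataAbortInfo info;

    info.ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
    info.isv = (esr & ESR_ISS_ISV) != 0;
    info.is_write = (esr & ESR_ISS_WNR) != 0;
    info.dfsc = esr & ESR_ISS_DFSC_MASK;
    return info;
}

/* Without ISV, the hardware gave no register/size info to emulate the access. */
static bool can_emulate_mmio(uint32_t esr)
{
    DataAbortInfo info = decode_data_abort(esr);

    return info.ec == ESR_EC_DABT_LOWER && info.isv;
}
```

A syndrome of 0x92000046, for example, decodes as a write with DFSC 0x06 (translation fault, level 2) and ISV clear, which is the case reported in this thread.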
Re: [PATCH 00/12] Introduce new vmapple machine type
Hi Mads, On 20.06.23 13:17, Mads Ynddal wrote: On 15 Jun 2023, at 00.40, Alexander Graf wrote: This patch set introduces a new ARM and HVF specific machine type called "vmapple". It mimics the device model that Apple's proprietary Virtualization.Framework exposes, but implements it in QEMU. With this new machine type, you can run macOS guests on Apple Silicon systems via HVF. To do so, you need to first install macOS using Virtualization.Framework onto a virtual disk image using a tool like macosvm (https://github.com/s-u/macosvm) $ macosvm --disk disk.img,size=32g --aux aux.img \ --restore UniversalMac_12.0.1_21A559_Restore.ipsw vm.json Then, extract the ECID from the installed VM: $ cat "$DIR/macosvm.json" | python3 -c \ 'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]) |\ base64 -d | plutil -extract ECID raw - Beware that the file will be called 'vm.json' and DIR is undefined following the previous line. Also, it's missing a single-quote at the end of `["machineId"])`. Thanks :) In addition, cut off the first 16 KiB of the aux.img: $ dd if=aux.img of=aux.img.trimmed bs=$(( 0x4000 )) skip=1 Now, you can just launch QEMU with the bits generated above: $ qemu-system-aarch64 -serial mon:stdio\ -m 4G \ -M vmapple,uuid=6240349656165161789\ -bios /Sys*/Lib*/Fra*/Virtualization.f*/R*/AVPBooter.vmapple2.bin \ -pflash aux.img.trimmed\ -pflash disk.img \ -drive file=disk.img,if=none,id=root \ -device virtio-blk-pci,drive=root,x-apple-type=1 \ -drive file=aux.img.trimmed,if=none,id=aux \ -device virtio-blk-pci,drive=aux,x-apple-type=2\ -accel hvf -no-reboot Just for clarity, I'd add that the 'vmapple,uuid=...' has to be set to the ECID from the previous step. You haven't defined a display, but I'm not sure if that is on purpose to show a minimal setup. I had to add '-display sdl' for it to fully work. Weird, I do get a normal Cocoa output screen by default.
There are a few limitations with this implementation: - Only runs on macOS because it relies on ParavirtualizedGraphics.Framework - Something is not fully correct on interrupt delivery or similar - the keyboard does not work - No Rosetta in the guest because we lack the private entitlement to enable TSO Would it be possible to mitigate the keyboard issue using an emulated USB keyboard? I tried poking around with it, but with no success. Unfortunately I was not able to get USB stable inside the guest. This may be an issue with interrupt propagation: With usb-kbd I see macOS not pick up key up or down events in time. Alex
Re: [PATCH 03/12] hvf: Increase number of possible memory slots
Hi Philippe, On 16.06.23 12:28, Philippe Mathieu-Daudé wrote: On 15/6/23 00:40, Alexander Graf wrote: For PVG we will need more than the current 32 possible memory slots. Bump the limit to 512 instead. Signed-off-by: Alexander Graf --- accel/hvf/hvf-accel-ops.c | 2 +- include/sysemu/hvf_int.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c index 9c3da03c94..bf0caaa852 100644 --- a/accel/hvf/hvf-accel-ops.c +++ b/accel/hvf/hvf-accel-ops.c @@ -88,7 +88,7 @@ struct mac_slot { uint64_t gva; }; -struct mac_slot mac_slots[32]; +struct mac_slot mac_slots[512]; static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) { diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h index 6ab119e49f..c7623a2c09 100644 --- a/include/sysemu/hvf_int.h +++ b/include/sysemu/hvf_int.h @@ -40,7 +40,7 @@ typedef struct hvf_vcpu_caps { struct HVFState { AccelState parent; - hvf_slot slots[32]; + hvf_slot slots[512]; int num_slots; hvf_vcpu_caps *hvf_caps; Please add a definition in this header (using in ops.c). Happy to :) In order to save memory and woods, what about keeping 32 on x86 and only raising to 512 on arm? I am hoping that someone takes the apple-gfx driver and enables it for x86 as well, so I'd rather keep them consistent. Alex
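The review request above (one definition in the header, used by both arrays) could look roughly like the standalone sketch below. The macro name HVF_NUM_SLOTS_SKETCH and all struct layouts are placeholders made up for illustration; only the value 512 and the idea of sharing one constant come from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the thread only asks for "a definition in this header". */
#define HVF_NUM_SLOTS_SKETCH 512

/* Placeholder stand-ins for hvf_slot and struct mac_slot. */
typedef struct {
    uint64_t start;
    uint64_t size;
} sketch_slot;

struct sketch_mac_slot {
    int present;
    uint64_t size;
    uint64_t gpa_start;
    uint64_t gva;
};

struct SketchHVFState {
    sketch_slot slots[HVF_NUM_SLOTS_SKETCH];
    int num_slots;
};

/* Both tables are sized by the same constant, so they cannot drift apart. */
static struct sketch_mac_slot sketch_mac_slots[HVF_NUM_SLOTS_SKETCH];

static size_t sketch_mac_slot_count(void)
{
    return sizeof(sketch_mac_slots) / sizeof(sketch_mac_slots[0]);
}

static size_t sketch_state_slot_count(void)
{
    /* sizeof does not evaluate its operand, so no object is needed here. */
    return sizeof(((struct SketchHVFState *)0)->slots) / sizeof(sketch_slot);
}
```

With a single constant, raising or lowering the limit (or making it per-architecture later) is a one-line change.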
[PATCH 11/12] hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
macOS provides a framework (library) called ParavirtualizedGraphics.Framework (PVG) that allows any VMM to implement paravirtualized 3D graphics passthrough to the host Metal stack. The library abstracts away almost every aspect of the paravirtualized device model; it only provides and receives callbacks on MMIO access and for sharing memory address space between the VM and PVG. This patch implements a QEMU device that drives the VMApple variant of PVG. Signed-off-by: Alexander Graf --- hw/vmapple/Kconfig | 3 + hw/vmapple/apple-gfx.m | 578 hw/vmapple/meson.build | 1 + hw/vmapple/trace-events | 22 ++ meson.build | 4 + 5 files changed, 608 insertions(+) create mode 100644 hw/vmapple/apple-gfx.m diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig index 542426a740..ba37fc5b81 100644 --- a/hw/vmapple/Kconfig +++ b/hw/vmapple/Kconfig @@ -6,3 +6,6 @@ config VMAPPLE_BDIF config VMAPPLE_CFG bool + +config VMAPPLE_PVG +bool diff --git a/hw/vmapple/apple-gfx.m b/hw/vmapple/apple-gfx.m new file mode 100644 index 00..97dd2cd9ae --- /dev/null +++ b/hw/vmapple/apple-gfx.m @@ -0,0 +1,578 @@ +/* + * QEMU Apple ParavirtualizedGraphics.framework device + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + * ParavirtualizedGraphics.framework is a set of libraries that macOS provides + * which implements 3d graphics passthrough to the host as well as a + * proprietary guest communication channel to drive it. This device model + * implements support to drive that library from within QEMU.
+ */ + +#include "qemu/osdep.h" +#include "hw/irq.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "trace.h" +#include "hw/sysbus.h" +#include "hw/pci/msi.h" +#include "crypto/hash.h" +#include "sysemu/cpus.h" +#include "ui/console.h" +#include "monitor/monitor.h" +#import + +#define TYPE_APPLE_GFX "apple-gfx" + +#define MAX_MRS 512 + +static const PGDisplayCoord_t apple_gfx_modes[] = { +{ .x = 1440, .y = 1080 }, +{ .x = 1280, .y = 1024 }, +}; + +/* + * We have to map PVG memory into our address space. Use the one below + * as base start address. In normal linker setups it points to a free + * memory range. + */ +#define APPLE_GFX_BASE_VA ((void *)(uintptr_t)0x5000UL) + +/* + * ParavirtualizedGraphics.Framework only ships header files for the x86 + * variant which does not include IOSFC descriptors and host devices. We add + * their definitions here so that we can also work with the ARM version. + */ +typedef bool(^IOSFCRaiseInterrupt)(uint32_t vector); +typedef bool(^IOSFCUnmapMemory)(void *a, void *b, void *c, void *d, void *e, void *f); +typedef bool(^IOSFCMapMemory)(uint64_t phys, uint64_t len, bool ro, void **va, void *e, void *f); + +@interface PGDeviceDescriptorExt : PGDeviceDescriptor +@property (readwrite, nonatomic) bool usingIOSurfaceMapper; +@end + +@interface PGIOSurfaceHostDeviceDescriptor : NSObject +-(PGIOSurfaceHostDeviceDescriptor *)init; +@property (readwrite, nonatomic, copy, nullable) IOSFCMapMemory mapMemory; +@property (readwrite, nonatomic, copy, nullable) IOSFCUnmapMemory unmapMemory; +@property (readwrite, nonatomic, copy, nullable) IOSFCRaiseInterrupt raiseInterrupt; +@end + +@interface PGIOSurfaceHostDevice : NSObject +-(void)initWithDescriptor:(PGIOSurfaceHostDeviceDescriptor *) desc; +-(uint32_t)mmioReadAtOffset:(size_t) offset; +-(void)mmioWriteAtOffset:(size_t) offset value:(uint32_t)value; +@end + +typedef struct AppleGFXMR { +QTAILQ_ENTRY(AppleGFXMR) node; +hwaddr pa; +void *va; 
+uint64_t len; +} AppleGFXMR; + +typedef QTAILQ_HEAD(, AppleGFXMR) AppleGFXMRList; + +typedef struct AppleGFXTask { +QTAILQ_ENTRY(AppleGFXTask) node; +void *mem; +uint64_t len; +} AppleGFXTask; + +typedef QTAILQ_HEAD(, AppleGFXTask) AppleGFXTaskList; + +typedef struct AppleGFXState { +/* Private */ +SysBusDevice parent_obj; + +/* Public */ +qemu_irq irq_gfx; +qemu_irq irq_iosfc; +MemoryRegion iomem_gfx; +MemoryRegion iomem_iosfc; +id pgdev; +id pgdisp; +PGIOSurfaceHostDevice *pgiosfc; +AppleGFXMRList mrs; +AppleGFXTaskList tasks; +QemuConsole *con; +void *vram; +id mtl; +id texture; +bool handles_frames; +bool new_frame; +bool cursor_show; +DisplaySurface *surface; +QEMUCursor *cursor; +} AppleGFXState; + + +OBJECT_DECLARE_SIMPLE_TYPE(AppleGFXState, APPLE_GFX) + +static AppleGFXTask *apple_gfx_new_task(AppleGFXState *s, uint64_t len) +{ +void *base = APPLE_GFX_BASE_VA; +AppleGFXTask *task; + +QTAILQ_FOREACH(task, &s->tasks,
[PATCH 12/12] hw/vmapple/vmapple: Add vmapple machine type
Apple defines a new "vmapple" machine type as part of its proprietary macOS Virtualization.Framework VMM. This machine type is similar to the virt one, but with subtle differences in base devices, a few special vmapple device additions and a vastly different boot chain. This patch reimplements this machine type in QEMU. To use it, you need an already installed version of macOS for VMApple, you have to run on macOS with -accel hvf, pass the Virtualization.Framework boot ROM (AVPBooter) in via -bios, pass the aux and root volumes as pflash and also pass the aux and root volumes as virtio drives. In addition, you need to find the machine UUID and pass it as the -M vmapple,uuid= parameter: $ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \ -bios /System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin \ -drive file=aux,if=pflash,format=raw \ -drive file=root,if=pflash,format=raw \ -drive file=aux,if=none,id=aux,format=raw \ -device virtio-blk-pci,drive=aux,x-apple-type=2 \ -drive file=root,if=none,id=root,format=raw \ -device virtio-blk-pci,drive=root,x-apple-type=1 With all these in place, you should be able to see macOS booting successfully.
Signed-off-by: Alexander Graf --- hw/vmapple/Kconfig | 19 ++ hw/vmapple/meson.build | 1 + hw/vmapple/vmapple.c | 661 + 3 files changed, 681 insertions(+) create mode 100644 hw/vmapple/vmapple.c diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig index ba37fc5b81..7a2375dc95 100644 --- a/hw/vmapple/Kconfig +++ b/hw/vmapple/Kconfig @@ -9,3 +9,22 @@ config VMAPPLE_CFG config VMAPPLE_PVG bool + +config VMAPPLE +bool +depends on ARM && HVF +default y if ARM && HVF +imply PCI_DEVICES +select ARM_GIC +select PLATFORM_BUS +select PCI_EXPRESS +select PCI_EXPRESS_GENERIC_BRIDGE +select PL011 # UART +select PL031 # RTC +select PL061 # GPIO +select GPIO_PWR +select PVPANIC_MMIO +select VMAPPLE_AES +select VMAPPLE_BDIF +select VMAPPLE_CFG +select VMAPPLE_PVG diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build index 31fec87156..d732873d35 100644 --- a/hw/vmapple/meson.build +++ b/hw/vmapple/meson.build @@ -2,3 +2,4 @@ softmmu_ss.add(when: 'CONFIG_VMAPPLE_AES', if_true: files('aes.c')) softmmu_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c')) softmmu_ss.add(when: 'CONFIG_VMAPPLE_CFG', if_true: files('cfg.c')) softmmu_ss.add(when: 'CONFIG_VMAPPLE_PVG', if_true: [files('apple-gfx.m'), pvg, metal]) +specific_ss.add(when: 'CONFIG_VMAPPLE', if_true: files('vmapple.c')) diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c new file mode 100644 index 00..5d3fe54b96 --- /dev/null +++ b/hw/vmapple/vmapple.c @@ -0,0 +1,661 @@ +/* + * VMApple machine emulation + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + * VMApple is the device model that the macOS built-in hypervisor called + * "Virtualization.framework" exposes to Apple Silicon macOS guests. The + * machine model in this file implements the same device model in QEMU, but + * does not use any code from Virtualization.Framework. 
+ */ + +#include "qemu/osdep.h" +#include "qemu/help-texts.h" +#include "qemu/datadir.h" +#include "qemu/units.h" +#include "qemu/option.h" +#include "monitor/qdev.h" +#include "qapi/error.h" +#include "hw/sysbus.h" +#include "hw/arm/boot.h" +#include "hw/arm/primecell.h" +#include "hw/boards.h" +#include "net/net.h" +#include "sysemu/sysemu.h" +#include "sysemu/runstate.h" +#include "sysemu/kvm.h" +#include "sysemu/hvf.h" +#include "hw/loader.h" +#include "qapi/error.h" +#include "qemu/bitops.h" +#include "qemu/error-report.h" +#include "qemu/module.h" +#include "hw/pci-host/gpex.h" +#include "hw/virtio/virtio-pci.h" +#include "hw/qdev-properties.h" +#include "hw/intc/arm_gic.h" +#include "hw/intc/arm_gicv3_common.h" +#include "hw/irq.h" +#include "qapi/visitor.h" +#include "qapi/qapi-visit-common.h" +#include "standard-headers/linux/input.h" +#include "target/arm/internals.h" +#include "target/arm/kvm_arm.h" +#include "hw/char/pl011.h" +#include "qemu/guest-random.h" +#include "sysemu/reset.h" +#include "qemu/log.h" +#include "hw/vmapple/cfg.h" +#include "hw/misc/pvpanic.h" +#include &quo
[PATCH 08/12] hw/vmapple/aes: Introduce aes engine
VMApple contains an "aes" engine device that it uses to encrypt and decrypt its nvram. It has trivial hard coded keys it uses for that purpose. Add device emulation for this device model. Signed-off-by: Alexander Graf --- hw/vmapple/Kconfig | 2 + hw/vmapple/aes.c| 583 hw/vmapple/meson.build | 1 + hw/vmapple/trace-events | 18 ++ 4 files changed, 604 insertions(+) create mode 100644 hw/vmapple/aes.c diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig index 8b13789179..a73504d599 100644 --- a/hw/vmapple/Kconfig +++ b/hw/vmapple/Kconfig @@ -1 +1,3 @@ +config VMAPPLE_AES +bool diff --git a/hw/vmapple/aes.c b/hw/vmapple/aes.c new file mode 100644 index 00..eaf1e26abe --- /dev/null +++ b/hw/vmapple/aes.c @@ -0,0 +1,583 @@ +/* + * QEMU Apple AES device emulation + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "hw/irq.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "trace.h" +#include "hw/sysbus.h" +#include "crypto/hash.h" +#include "crypto/aes.h" +#include "crypto/cipher.h" + +#define TYPE_AES "apple-aes" +#define MAX_FIFO_SIZE 9 + +#define CMD_KEY 0x1 +#define CMD_KEY_CONTEXT_SHIFT27 +#define CMD_KEY_CONTEXT_MASK (0x1 << CMD_KEY_CONTEXT_SHIFT) +#define CMD_KEY_SELECT_SHIFT 24 +#define CMD_KEY_SELECT_MASK (0x7 << CMD_KEY_SELECT_SHIFT) +#define CMD_KEY_KEY_LEN_SHIFT22 +#define CMD_KEY_KEY_LEN_MASK (0x3 << CMD_KEY_KEY_LEN_SHIFT) +#define CMD_KEY_ENCRYPT_SHIFT20 +#define CMD_KEY_ENCRYPT_MASK (0x1 << CMD_KEY_ENCRYPT_SHIFT) +#define CMD_KEY_BLOCK_MODE_SHIFT 16 +#define CMD_KEY_BLOCK_MODE_MASK (0x3 << CMD_KEY_BLOCK_MODE_SHIFT) +#define CMD_IV0x2 +#define CMD_IV_CONTEXT_SHIFT 26 +#define CMD_IV_CONTEXT_MASK (0x3 << CMD_KEY_CONTEXT_SHIFT) +#define CMD_DSB 0x3 +#define CMD_SKG 0x4 +#define CMD_DATA 0x5 +#define 
CMD_DATA_KEY_CTX_SHIFT 27 +#define CMD_DATA_KEY_CTX_MASK(0x1 << CMD_DATA_KEY_CTX_SHIFT) +#define CMD_DATA_IV_CTX_SHIFT25 +#define CMD_DATA_IV_CTX_MASK (0x3 << CMD_DATA_IV_CTX_SHIFT) +#define CMD_DATA_LEN_MASK0xff +#define CMD_STORE_IV 0x6 +#define CMD_STORE_IV_ADDR_MASK 0xff +#define CMD_WRITE_REG 0x7 +#define CMD_FLAG 0x8 +#define CMD_FLAG_STOP_MASK BIT(26) +#define CMD_FLAG_RAISE_IRQ_MASK BIT(27) +#define CMD_FLAG_INFO_MASK 0xff +#define CMD_MAX 0x10 + +#define CMD_SHIFT 28 + +#define REG_STATUS0xc +#define REG_STATUS_DMA_READ_RUNNING BIT(0) +#define REG_STATUS_DMA_READ_PENDING BIT(1) +#define REG_STATUS_DMA_WRITE_RUNNINGBIT(2) +#define REG_STATUS_DMA_WRITE_PENDINGBIT(3) +#define REG_STATUS_BUSY BIT(4) +#define REG_STATUS_EXECUTINGBIT(5) +#define REG_STATUS_READYBIT(6) +#define REG_STATUS_TEXT_DPA_SEEDED BIT(7) +#define REG_STATUS_UNWRAP_DPA_SEEDEDBIT(8) + +#define REG_IRQ_STATUS0x18 +#define REG_IRQ_STATUS_INVALID_CMD BIT(2) +#define REG_IRQ_STATUS_FLAG BIT(5) +#define REG_IRQ_ENABLE0x1c +#define REG_WATERMARK 0x20 +#define REG_Q_STATUS 0x24 +#define REG_FLAG_INFO 0x30 +#define REG_FIFO 0x200 + +static const uint32_t key_lens[4] = { +[0] = 16, +[1] = 24, +[2] = 32, +[3] = 64, +}; + +struct key { +uint32_t key_len; +uint32_t key[8]; +}; + +struct iv { +uint32_t iv[4]; +}; + +struct context { +struct key key; +struct iv iv; +}; + +static struct key builtin_keys[7] = { +[1] = { +.key_len = 32, +.key = { 0x1 }, +}, +[2] = { +.key_len = 32, +.key = { 0x2 }, +}, +[3] = { +.key_len = 32, +.key = { 0x3 }, +} +}; + +typedef struct AESState { +/* Private */ +SysBusDevice parent_obj; + +/* Public */ +qemu_irq irq; +MemoryRegion iomem1; +MemoryRegion iomem2; + +uint32_t status; +uint32_t q_status; +uint32_t irq_status; +uint32_t irq_enable; +uint32_t watermark; +uint32_t flag_info; +uint32_t fifo[MAX_FIFO_SIZE]; +uint32_t fifo_idx; +struct key key[2]; +struct iv iv[4]; +bool is_encrypt; +QCryptoCipherMode block_mode; +} AESState; + +OBJECT_DECLARE_SIMPLE_TYPE(AESState, AES) 
+ +static void aes_update_irq(AESState *s) +{ +qemu_set_irq(s->irq, !!(s->irq_status & s->irq_enable)); +} + +static uint64_t aes1_read(void *opaque, hwaddr offset, unsigned size) +{ +AESState *s = opaque; +uint64_
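To show how the CMD_KEY field definitions above fit together, here is a small standalone sketch that unpacks a key command word. The shift and mask values and the key_lens table are taken from the defines in this patch; the KeyCmd struct and decode_key_cmd() are hypothetical helpers, and the way the real device consumes these fields may differ.

```c
#include <stdint.h>

#define CMD_SHIFT                28
#define CMD_KEY                  0x1u
#define CMD_KEY_CONTEXT_SHIFT    27
#define CMD_KEY_CONTEXT_MASK     (0x1u << CMD_KEY_CONTEXT_SHIFT)
#define CMD_KEY_SELECT_SHIFT     24
#define CMD_KEY_SELECT_MASK      (0x7u << CMD_KEY_SELECT_SHIFT)
#define CMD_KEY_KEY_LEN_SHIFT    22
#define CMD_KEY_KEY_LEN_MASK     (0x3u << CMD_KEY_KEY_LEN_SHIFT)
#define CMD_KEY_ENCRYPT_SHIFT    20
#define CMD_KEY_ENCRYPT_MASK     (0x1u << CMD_KEY_ENCRYPT_SHIFT)
#define CMD_KEY_BLOCK_MODE_SHIFT 16
#define CMD_KEY_BLOCK_MODE_MASK  (0x3u << CMD_KEY_BLOCK_MODE_SHIFT)

/* Key length in bytes, indexed by the 2-bit KEY_LEN field. */
static const uint32_t key_lens[4] = { 16, 24, 32, 64 };

/* Hypothetical container for the decoded fields. */
typedef struct {
    uint32_t opcode;
    uint32_t context;
    uint32_t select;
    uint32_t key_len;
    uint32_t encrypt;
    uint32_t block_mode;
} KeyCmd;

/* Extract each field with its mask, then shift it down to bit 0. */
static KeyCmd decode_key_cmd(uint32_t word)
{
    KeyCmd c;

    c.opcode = word >> CMD_SHIFT;
    c.context = (word & CMD_KEY_CONTEXT_MASK) >> CMD_KEY_CONTEXT_SHIFT;
    c.select = (word & CMD_KEY_SELECT_MASK) >> CMD_KEY_SELECT_SHIFT;
    c.key_len = key_lens[(word & CMD_KEY_KEY_LEN_MASK) >> CMD_KEY_KEY_LEN_SHIFT];
    c.encrypt = (word & CMD_KEY_ENCRYPT_MASK) >> CMD_KEY_ENCRYPT_SHIFT;
    c.block_mode = (word & CMD_KEY_BLOCK_MODE_MASK) >> CMD_KEY_BLOCK_MODE_SHIFT;
    return c;
}
```

For instance, the word 0x1A910000 decodes as opcode CMD_KEY, context 1, key select 2, a 32-byte key, encrypt set and block mode 1.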
[PATCH 10/12] hw/vmapple/cfg: Introduce vmapple cfg region
Instead of device tree or other more standardized means, VMApple passes platform configuration to the first stage boot loader in a binary encoded format that resides at a dedicated RAM region in physical address space. This patch models this configuration space as a qdev device which we can then map at the fixed location in the address space. That way, we can influence and annotate all configuration fields easily. Signed-off-by: Alexander Graf --- hw/vmapple/Kconfig | 3 ++ hw/vmapple/cfg.c | 105 +++ hw/vmapple/meson.build | 1 + include/hw/vmapple/cfg.h | 68 + 4 files changed, 177 insertions(+) create mode 100644 hw/vmapple/cfg.c create mode 100644 include/hw/vmapple/cfg.h diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig index 388a2bc60c..542426a740 100644 --- a/hw/vmapple/Kconfig +++ b/hw/vmapple/Kconfig @@ -3,3 +3,6 @@ config VMAPPLE_AES config VMAPPLE_BDIF bool + +config VMAPPLE_CFG +bool diff --git a/hw/vmapple/cfg.c b/hw/vmapple/cfg.c new file mode 100644 index 00..d48e3c3afa --- /dev/null +++ b/hw/vmapple/cfg.c @@ -0,0 +1,105 @@ +/* + * VMApple Configuration Region + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "hw/vmapple/cfg.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "qapi/error.h" + +static void vmapple_cfg_reset(DeviceState *dev) +{ +VMAppleCfgState *s = VMAPPLE_CFG(dev); +VMAppleCfg *cfg; + +cfg = memory_region_get_ram_ptr(&s->mem); +memset((void *)cfg, 0, VMAPPLE_CFG_SIZE); +*cfg = s->cfg; +} + +static void vmapple_cfg_realize(DeviceState *dev, Error **errp) +{ +VMAppleCfgState *s = VMAPPLE_CFG(dev); +uint32_t i; + +strncpy(s->cfg.serial, s->serial, sizeof(s->cfg.serial)); +strncpy(s->cfg.model, s->model, sizeof(s->cfg.model)); +strncpy(s->cfg.soc_name, s->soc_name, sizeof(s->cfg.soc_name)); +strncpy(s->cfg.unk8, "D/A", sizeof(s->cfg.soc_name)); +s->cfg.ecid = cpu_to_be64(s->cfg.ecid); +s->cfg.version = 2; +s->cfg.unk1 = 1; +s->cfg.unk2 = 1; +s->cfg.unk3 = 0x20; +s->cfg.unk4 = 0; +s->cfg.unk5 = 1; +s->cfg.unk6 = 1; +s->cfg.unk7 = 0; +s->cfg.unk10 = 1; + +g_assert(s->cfg.nr_cpus < ARRAY_SIZE(s->cfg.cpu_ids)); +for (i = 0; i < s->cfg.nr_cpus; i++) { +s->cfg.cpu_ids[i] = i; +} +} + +static void vmapple_cfg_init(Object *obj) +{ +VMAppleCfgState *s = VMAPPLE_CFG(obj); + +memory_region_init_ram(&s->mem, obj, "VMApple Config", VMAPPLE_CFG_SIZE, + &error_fatal); +sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mem); + +s->serial = (char *)"1234"; +s->model = (char *)"VM0001"; +s->soc_name = (char *)"Apple M1 (Virtual)"; +} + +static Property vmapple_cfg_properties[] = { +DEFINE_PROP_UINT32("nr-cpus", VMAppleCfgState, cfg.nr_cpus, 1), +DEFINE_PROP_UINT64("ecid", VMAppleCfgState, cfg.ecid, 0), +DEFINE_PROP_UINT64("ram-size", VMAppleCfgState, cfg.ram_size, 0), +DEFINE_PROP_UINT32("run_installer1", VMAppleCfgState, cfg.run_installer1, 0), +DEFINE_PROP_UINT32("run_installer2", VMAppleCfgState, cfg.run_installer2, 0), +DEFINE_PROP_UINT32("rnd", VMAppleCfgState, cfg.rnd, 0), +DEFINE_PROP_MACADDR("mac-en0", VMAppleCfgState, cfg.mac_en0), +DEFINE_PROP_MACADDR("mac-en1", VMAppleCfgState, cfg.mac_en1), 
+DEFINE_PROP_MACADDR("mac-wifi0", VMAppleCfgState, cfg.mac_wifi0), +DEFINE_PROP_MACADDR("mac-bt0", VMAppleCfgState, cfg.mac_bt0), +DEFINE_PROP_STRING("serial", VMAppleCfgState, serial), +DEFINE_PROP_STRING("model", VMAppleCfgState, model), +DEFINE_PROP_STRING("soc_name", VMAppleCfgState, soc_name), +DEFINE_PROP_END_OF_LIST(), +}; + +static void vmapple_cfg_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); + +dc->realize = vmapple_cfg_realize; +dc->desc = "VMApple Configuration Region"; +device_class_set_props(dc, vmapple_cfg_properties); +dc->reset = vmapple_cfg_reset; +} + +static const TypeInfo vmapple_cfg_info = { +.name = TYPE_VMAPPLE_CFG, +.parent= TYPE_SYS_BUS_DEVICE, +.instance_size = sizeof(VMAppleCfgState), +.instance_init = vmapple_cfg_init, +.class_init= vmapple_cfg_class_init, +}; + +static void vmapple_cfg_register_types(void) +{ +type_register_static(&vmapple_cfg_info); +} + +type_init(vmapple
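Since the realize function copies several strings into fixed-width fields of the binary config blob, here is a standalone sketch of one defensive way to do such a copy, always sized by the destination field itself. The SketchCfg layout and its field sizes are invented for illustration (cfg.h is not shown in this excerpt), and whether the boot loader requires NUL termination is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the real field sizes live in hw/vmapple/cfg.h. */
typedef struct {
    char serial[32];
    char model[32];
} SketchCfg;

/*
 * Copy src into a fixed-width field: zero the whole field first, then copy
 * at most dst_size - 1 bytes so the result is always NUL-terminated.
 */
static void sketch_set_field(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0) {
        return;
    }
    n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;
    }
    memset(dst, 0, dst_size);
    memcpy(dst, src, n);
}
```

Calling it as sketch_set_field(cfg.serial, sizeof(cfg.serial), "1234") ties the bound to the field being written, which avoids accidentally passing the size of a different field.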
[PATCH 07/12] gpex: Allow more than 4 legacy IRQs
Some boards such as vmapple don't do real legacy PCI IRQ swizzling. Instead, they just keep allocating more board IRQ lines for each new legacy IRQ. Let's support that mode by giving instantiators a new "nr_irqs" property they can use to support more than 4 legacy IRQ lines. In this mode, GPEX will export more IRQ lines, one for each device. Signed-off-by: Alexander Graf --- hw/arm/sbsa-ref.c | 2 +- hw/arm/virt.c | 2 +- hw/i386/microvm.c | 2 +- hw/loongarch/virt.c| 2 +- hw/mips/loongson3_virt.c | 2 +- hw/openrisc/virt.c | 12 ++-- hw/pci-host/gpex.c | 36 +++- hw/riscv/virt.c| 12 ++-- hw/xtensa/virt.c | 2 +- include/hw/pci-host/gpex.h | 7 +++ 10 files changed, 52 insertions(+), 27 deletions(-) diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c index de21200ff9..2715ea7b2f 100644 --- a/hw/arm/sbsa-ref.c +++ b/hw/arm/sbsa-ref.c @@ -647,7 +647,7 @@ static void create_pcie(SBSAMachineState *sms) /* Map IO port space */ sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, qdev_get_gpio_in(sms->gic, irq + i)); gpex_set_irq_num(GPEX_HOST(dev), i, irq + i); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 9b9f7d9c68..cabb5d14f2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1466,7 +1466,7 @@ static void create_pcie(VirtMachineState *vms) /* Map IO port space */ sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, qdev_get_gpio_in(vms->gic, irq + i)); gpex_set_irq_num(GPEX_HOST(dev), i, irq + i); diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c index 7227a2156c..9ca007b870 100644 --- a/hw/i386/microvm.c +++ b/hw/i386/microvm.c @@ -139,7 +139,7 @@ static void create_gpex(MicrovmMachineState *mms) mms->gpex.mmio64.base, mmio64_alias); } -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { 
sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, x86ms->gsi[mms->gpex.irq + i]); } diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c index ceddec1b23..6a0c2f3103 100644 --- a/hw/loongarch/virt.c +++ b/hw/loongarch/virt.c @@ -512,7 +512,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState * memory_region_add_subregion(get_system_memory(), VIRT_PCI_IO_BASE, pio_alias); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { sysbus_connect_irq(d, i, qdev_get_gpio_in(pch_pic, 16 + i)); gpex_set_irq_num(GPEX_HOST(gpex_dev), i, 16 + i); diff --git a/hw/mips/loongson3_virt.c b/hw/mips/loongson3_virt.c index 216812f660..acf9fead85 100644 --- a/hw/mips/loongson3_virt.c +++ b/hw/mips/loongson3_virt.c @@ -438,7 +438,7 @@ static inline void loongson3_virt_devices_init(MachineState *machine, virt_memmap[VIRT_PCIE_PIO].base, s->pio_alias); sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, virt_memmap[VIRT_PCIE_PIO].base); -for (i = 0; i < GPEX_NUM_IRQS; i++) { +for (i = 0; i < PCI_NUM_PINS; i++) { irq = qdev_get_gpio_in(pic, PCIE_IRQ_BASE + i); sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, irq); gpex_set_irq_num(GPEX_HOST(dev), i, PCIE_IRQ_BASE + i); diff --git a/hw/openrisc/virt.c b/hw/openrisc/virt.c index f8a68a6a6b..16a5676c4b 100644 --- a/hw/openrisc/virt.c +++ b/hw/openrisc/virt.c @@ -318,7 +318,7 @@ static void create_pcie_irq_map(void *fdt, char *nodename, int irq_base, { int pin, dev; uint32_t irq_map_stride = 0; -uint32_t full_irq_map[GPEX_NUM_IRQS * GPEX_NUM_IRQS * 6] = {}; +uint32_t full_irq_map[PCI_NUM_PINS * PCI_NUM_PINS * 6] = {}; uint32_t *irq_map = full_irq_map; /* @@ -330,11 +330,11 @@ static void create_pcie_irq_map(void *fdt, char *nodename, int irq_base, * possible slot) seeing the interrupt-map-mask will allow the table * to wrap to any number of devices. 
*/ -for (dev = 0; dev < GPEX_NUM_IRQS; dev++) { +for (dev = 0; dev < PCI_NUM_PINS; dev++) { int devfn = dev << 3; -for (pin = 0; pin < GPEX_NUM_IRQS; pin++) { -int irq_nr = irq_base + ((pin + PCI_SLOT(devfn)) % GPEX_NUM_IRQS); +for (pin = 0; pin < PCI_NUM_PINS; pin++) { +int irq_nr = irq_base + ((pin + PCI_SLOT(devfn)) % PCI_NUM_PINS); int i = 0; /* Fill PCI address cells */ @@ -357,7 +357,7 @@ static void create_pci
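For readers unfamiliar with the two schemes contrasted in the commit message, the standalone sketch below compares conventional INTx swizzling (the formula used in the create_pcie_irq_map hunk above) with a linear "one board IRQ per legacy IRQ" allocation. linear_irq() and its bounds check are an illustrative guess at the new mode, not the code from this patch.

```c
#define PCI_NUM_PINS 4

/* Conventional swizzle: slot and pin rotate over the four INTx lines. */
static int swizzled_irq(int irq_base, int slot, int pin)
{
    return irq_base + ((pin + slot) % PCI_NUM_PINS);
}

/*
 * Hypothetical linear scheme: legacy IRQ number `index` simply gets the next
 * board line, up to nr_irqs lines; returns -1 when the lines are exhausted.
 */
static int linear_irq(int irq_base, int index, int nr_irqs)
{
    if (index < 0 || index >= nr_irqs) {
        return -1;
    }
    return irq_base + index;
}
```

With swizzling, slots 1/pin 0 and 3/pin 2 share the same line; with the linear scheme each index gets a line of its own until nr_irqs is reached.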
[PATCH 09/12] hw/vmapple/bdif: Introduce vmapple backdoor interface
The VMApple machine exposes AUX and ROOT block devices (as well as USB OTG emulation) via virtio-pci as well as a special, simple backdoor platform device. This patch implements this backdoor platform device to the best of my understanding. I left out any USB OTG parts; they're only needed for guest recovery and I don't understand the protocol yet. Signed-off-by: Alexander Graf --- hw/vmapple/Kconfig| 2 + hw/vmapple/bdif.c | 245 ++ hw/vmapple/meson.build| 1 + hw/vmapple/trace-events | 5 + include/hw/vmapple/bdif.h | 31 + 5 files changed, 284 insertions(+) create mode 100644 hw/vmapple/bdif.c create mode 100644 include/hw/vmapple/bdif.h diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig index a73504d599..388a2bc60c 100644 --- a/hw/vmapple/Kconfig +++ b/hw/vmapple/Kconfig @@ -1,3 +1,5 @@ config VMAPPLE_AES bool +config VMAPPLE_BDIF +bool diff --git a/hw/vmapple/bdif.c b/hw/vmapple/bdif.c new file mode 100644 index 00..36b5915ff3 --- /dev/null +++ b/hw/vmapple/bdif.c @@ -0,0 +1,245 @@ +/* + * VMApple Backdoor Interface + * + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "hw/vmapple/bdif.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "qapi/error.h" +#include "trace.h" +#include "hw/block/block.h" +#include "sysemu/block-backend.h" + +#define REG_DEVID_MASK 0x +#define DEVID_ROOT 0x +#define DEVID_AUX 0x0001 +#define DEVID_USB 0x0010 + +#define REG_STATUS 0x0 +#define REG_STATUS_ACTIVE BIT(0) +#define REG_CFG 0x4 +#define REG_CFG_ACTIVEBIT(1) +#define REG_UNK10x8 +#define REG_BUSY0x10 +#define REG_BUSY_READYBIT(0) +#define REG_UNK20x400 +#define REG_CMD 0x408 +#define REG_NEXT_DEVICE 0x420 +#define REG_UNK30x434 + +typedef struct vblk_sector { +uint32_t pad; +uint32_t pad2; +uint32_t sector; +uint32_t pad3; +} VblkSector; + +typedef struct vblk_req_cmd { +uint64_t addr; +uint32_t len; +uint32_t flags; +} VblkReqCmd; + +typedef struct vblk_req { +VblkReqCmd sector; +VblkReqCmd data; +VblkReqCmd retval; +} VblkReq; + +#define VBLK_DATA_FLAGS_READ 0x00030001 +#define VBLK_DATA_FLAGS_WRITE 0x00010001 + +#define VBLK_RET_SUCCESS 0 +#define VBLK_RET_FAILED 1 + +static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size) +{ +uint64_t ret = -1; +uint64_t devid = (offset & REG_DEVID_MASK); + +switch (offset & ~REG_DEVID_MASK) { +case REG_STATUS: +ret = REG_STATUS_ACTIVE; +break; +case REG_CFG: +ret = REG_CFG_ACTIVE; +break; +case REG_UNK1: +ret = 0x420; +break; +case REG_BUSY: +ret = REG_BUSY_READY; +break; +case REG_UNK2: +ret = 0x1; +break; +case REG_UNK3: +ret = 0x0; +break; +case REG_NEXT_DEVICE: +switch (devid) { +case DEVID_ROOT: +ret = 0x800; +break; +case DEVID_AUX: +ret = 0x1; +break; +} +break; +} + +trace_bdif_read(offset, size, ret); +return ret; +} + +static void le2cpu_sector(VblkSector *sector) +{ +sector->sector = le32_to_cpu(sector->sector); +} + +static void le2cpu_reqcmd(VblkReqCmd *cmd) +{ +cmd->addr = le64_to_cpu(cmd->addr); +cmd->len = le32_to_cpu(cmd->len); +cmd->flags = le32_to_cpu(cmd->flags); +} + +static void le2cpu_req(VblkReq *req) +{ 
+le2cpu_reqcmd(&req->sector); +le2cpu_reqcmd(&req->data); +le2cpu_reqcmd(&req->retval); +} + +static void vblk_cmd(uint64_t devid, BlockBackend *blk, uint64_t value, + uint64_t static_off) +{ +VblkReq req; +VblkSector sector; +uint64_t off = 0; +char *buf = NULL; +uint8_t ret = VBLK_RET_FAILED; +int r; + +cpu_physical_memory_read(value, &req, sizeof(req)); +le2cpu_req(&req); + +if (req.sector.len != sizeof(sector)) { +ret = VBLK_RET_FAILED; +goto out; +} + +/* Read the vblk command */ +cpu_physical_memory_read(req.sector.addr, &sector, sizeof(sector)); +le2cpu_sector(&sector); + +off = sector.sector * 512ULL + static_off; + +/* Sanity check that we're not allocating bogus sizes */ +if (req.data.len > (128 * 1024 * 1024)) { +goto out; +} + +buf = g_malloc0(req.data.len); +switch (req.data.flags) { +case VBLK_DATA_FLAGS_READ: +r = blk_pread(blk, off, req.data.len, buf
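The request handling above boils down to three steps: parse little-endian descriptors from guest memory, turn the sector number into a byte offset, and bound the transfer size. The standalone sketch below walks through those steps on a plain byte buffer. The 16-byte descriptor layout, the 512-byte sector size and the 128 MiB limit come from the patch; the helper names are hypothetical and no QEMU APIs are used.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE  512ULL
#define SKETCH_MAX_DATA_LEN (128u * 1024 * 1024)

/* Mirrors VblkReqCmd: 64-bit address, 32-bit length, 32-bit flags. */
typedef struct {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
} SketchReqCmd;

/* Byte-wise little-endian loads, independent of host endianness. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t load_le64(const uint8_t *p)
{
    return (uint64_t)load_le32(p) | ((uint64_t)load_le32(p + 4) << 32);
}

/* Parse one 16-byte descriptor as the guest laid it out in memory. */
static SketchReqCmd parse_reqcmd(const uint8_t *buf)
{
    SketchReqCmd cmd;

    cmd.addr = load_le64(buf);
    cmd.len = load_le32(buf + 8);
    cmd.flags = load_le32(buf + 12);
    return cmd;
}

/* Widen to 64 bits before multiplying so large sector numbers cannot wrap. */
static uint64_t sector_to_offset(uint32_t sector, uint64_t static_off)
{
    return sector * SKETCH_SECTOR_SIZE + static_off;
}

/* Reject transfer sizes above the sanity limit before allocating a buffer. */
static bool data_len_ok(uint32_t len)
{
    return len <= SKETCH_MAX_DATA_LEN;
}
```

Loading fields byte by byte sidesteps both host endianness and any alignment assumptions about the guest buffer.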
[PATCH 06/12] hw: Add vmapple subdir
We will introduce a number of devices that are specific to the vmapple
target machine. To keep them all tidily together, let's put them into
a single target directory.

Signed-off-by: Alexander Graf
---
 MAINTAINERS             | 6 ++
 hw/Kconfig              | 1 +
 hw/meson.build          | 1 +
 hw/vmapple/Kconfig      | 1 +
 hw/vmapple/meson.build  | 0
 hw/vmapple/trace-events | 2 ++
 hw/vmapple/trace.h      | 1 +
 meson.build             | 1 +
 8 files changed, 13 insertions(+)
 create mode 100644 hw/vmapple/Kconfig
 create mode 100644 hw/vmapple/meson.build
 create mode 100644 hw/vmapple/trace-events
 create mode 100644 hw/vmapple/trace.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 4a80a38511..7d5cb3e3e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2547,6 +2547,12 @@ F: hw/usb/canokey.c
 F: hw/usb/canokey.h
 F: docs/system/devices/canokey.rst
 
+VMapple
+M: Alexander Graf
+S: Maintained
+F: hw/vmapple/*
+F: include/hw/vmapple/*
+
 Subsystems
 --
 Overall Audio backends
diff --git a/hw/Kconfig b/hw/Kconfig
index ba62ff6417..d99854afdd 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source tpm/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source vmapple/Kconfig
 source xen/Kconfig
 source watchdog/Kconfig
 
diff --git a/hw/meson.build b/hw/meson.build
index c7ac7d3d75..e156a6618f 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -40,6 +40,7 @@ subdir('tpm')
 subdir('usb')
 subdir('vfio')
 subdir('virtio')
+subdir('vmapple')
 subdir('watchdog')
 subdir('xen')
 subdir('xenpv')
diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
new file mode 100644
index 00..8b13789179
--- /dev/null
+++ b/hw/vmapple/Kconfig
@@ -0,0 +1 @@
+
diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
new file mode 100644
index 00..e69de29bb2
diff --git a/hw/vmapple/trace-events b/hw/vmapple/trace-events
new file mode 100644
index 00..9ccc579048
--- /dev/null
+++ b/hw/vmapple/trace-events
@@ -0,0 +1,2 @@
+# See docs/devel/tracing.rst for syntax documentation.
+
diff --git a/hw/vmapple/trace.h b/hw/vmapple/trace.h
new file mode 100644
index 00..572adbefe0
--- /dev/null
+++ b/hw/vmapple/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_vmapple.h"
diff --git a/meson.build b/meson.build
index 0bb5ea9d10..e0203518ef 100644
--- a/meson.build
+++ b/meson.build
@@ -3273,6 +3273,7 @@ if have_system
     'hw/usb',
     'hw/vfio',
     'hw/virtio',
+    'hw/vmapple',
     'hw/watchdog',
     'hw/xen',
     'hw/gpio',
-- 
2.39.2 (Apple Git-143)

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Managing directors: Christian Schlaeger, Jonathan Weiss
Registered at the Charlottenburg district court (Amtsgericht Charlottenburg) under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879
[PATCH 05/12] hw/virtio: Add support for apple virtio-blk
Apple has its own virtio-blk PCI device ID where it deviates from the
official virtio-pci spec slightly: It puts a new "apple type" field at a
static offset in config space and introduces a new discard command.

This patch adds a new qdev property called "x-apple-type" to
virtio-blk-pci. When that property is set, we assume the virtio-blk device
is an Apple one of the specific type and act accordingly.

Signed-off-by: Alexander Graf
---
 hw/block/virtio-blk.c                       | 23 +
 hw/virtio/virtio-blk-pci.c                  | 7 +++
 include/hw/pci/pci_ids.h                    | 1 +
 include/hw/virtio/virtio-blk.h              | 1 +
 include/standard-headers/linux/virtio_blk.h | 3 +++
 5 files changed, 35 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 39e7f23fab..76b85bb3cb 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1120,6 +1120,20 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         break;
     }
+    case VIRTIO_BLK_T_APPLE1:
+    {
+        if (s->conf.x_apple_type) {
+            /* Only valid on Apple Virtio */
+            char buf[iov_size(in_iov, in_num)];
+            memset(buf, 0, sizeof(buf));
+            iov_from_buf(in_iov, in_num, 0, buf, sizeof(buf));
+            virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+        } else {
+            virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+        }
+        virtio_blk_free_request(req);
+        break;
+    }
     default:
         virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
         virtio_blk_free_request(req);
@@ -1351,6 +1365,10 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
     } else {
         blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
     }
+    if (s->conf.x_apple_type) {
+        /* Apple abuses the same location for its type id */
+        blkcfg.max_secure_erase_sectors = s->conf.x_apple_type;
+    }
     memcpy(config, &blkcfg, s->config_size);
 }
 
@@ -1625,6 +1643,10 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     s->config_size = virtio_get_config_size(&virtio_blk_cfg_size_params,
                                             s->host_features);
+    if (s->conf.x_apple_type) {
+        /* Apple Virtio puts the blk type at 0x3c, make sure we have space. */
+        s->config_size = MAX(s->config_size, 0x3d);
+    }
     virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
     s->blk = conf->conf.blk;
@@ -1734,6 +1756,7 @@ static Property virtio_blk_properties[] = {
                        conf.max_write_zeroes_sectors, BDRV_REQUEST_MAX_SECTORS),
     DEFINE_PROP_BOOL("x-enable-wce-if-config-wce", VirtIOBlock,
                      conf.x_enable_wce_if_config_wce, true),
+    DEFINE_PROP_UINT32("x-apple-type", VirtIOBlock, conf.x_apple_type, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
index 9743bee965..5fbf98f750 100644
--- a/hw/virtio/virtio-blk-pci.c
+++ b/hw/virtio/virtio-blk-pci.c
@@ -62,6 +62,13 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     }
 
     qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+
+    if (conf->x_apple_type) {
+        /* Apple virtio-blk uses a different vendor/device id */
+        pci_config_set_vendor_id(vpci_dev->pci_dev.config, PCI_VENDOR_ID_APPLE);
+        pci_config_set_device_id(vpci_dev->pci_dev.config,
+                                 PCI_DEVICE_ID_APPLE_VIRTIO_BLK);
+    }
 }
 
 static void virtio_blk_pci_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index e4386ebb20..74e589a298 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -188,6 +188,7 @@
 #define PCI_DEVICE_ID_APPLE_UNI_N_AGP    0x0020
 #define PCI_DEVICE_ID_APPLE_U3_AGP       0x004b
 #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC   0x0021
+#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK   0x1a00
 
 #define PCI_VENDOR_ID_SUN                0x108e
 #define PCI_DEVICE_ID_SUN_EBUS           0x1000
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index dafec432ce..7117ce754c 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -46,6 +46,7 @@ struct VirtIOBlkConf
     uint32_t max_discard_sectors;
     uint32_t max_write_zeroes_sectors;
     bool x_enable_wce_if_config_wce;
+    uint32_t x_apple_type;
 };
 
 struct VirtIOBlockDataPlane;
diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
index 7155b1a470..bbea5d50b9 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -204,6 +204,9 @@ struct virtio_blk_config {
 /* Reset All zones command */
 #define VIRTIO_BLK_T_ZONE_RESET_ALL 26
 
+/* Write zeroes c
[PATCH 04/12] hvf: arm: Ignore writes to CNTP_CTL_EL0
macOS unconditionally disables interrupts of the physical timer on boot
and then continues to use the virtual one. We don't really want to support
a full physical timer emulation, so let's just ignore those writes.

Signed-off-by: Alexander Graf
---
 target/arm/hvf/hvf.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 8f72624586..0dff63fb5f 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -179,6 +179,7 @@ void hvf_arm_init_debug(void)
 #define SYSREG_OSLSR_EL1      SYSREG(2, 0, 1, 1, 4)
 #define SYSREG_OSDLR_EL1      SYSREG(2, 0, 1, 3, 4)
 #define SYSREG_CNTPCT_EL0     SYSREG(3, 3, 14, 0, 1)
+#define SYSREG_CNTP_CTL_EL0   SYSREG(3, 3, 14, 2, 1)
 #define SYSREG_PMCR_EL0       SYSREG(3, 3, 9, 12, 0)
 #define SYSREG_PMUSERENR_EL0  SYSREG(3, 3, 9, 14, 0)
 #define SYSREG_PMCNTENSET_EL0 SYSREG(3, 3, 9, 12, 1)
@@ -1551,6 +1552,12 @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
     case SYSREG_OSLAR_EL1:
         env->cp15.oslsr_el1 = val & 1;
         break;
+    case SYSREG_CNTP_CTL_EL0:
+        /*
+         * Guests should not rely on the physical counter, but macOS emits
+         * disable writes to it. Let it do so, but ignore the requests.
+         */
+        break;
     case SYSREG_OSDLR_EL1:
         /* Dummy register */
         break;
-- 
2.39.2 (Apple Git-143)
[PATCH 01/12] build: Only define OS_OBJECT_USE_OBJC with gcc
Recent versions of macOS use clang instead of gcc. The OS_OBJECT_USE_OBJC
define is only necessary when building with gcc. Let's not define it when
building with clang.

With this patch, I can successfully include GCD headers in QEMU when
building with clang.

Signed-off-by: Alexander Graf
---
 meson.build | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 34306a6205..0bb5ea9d10 100644
--- a/meson.build
+++ b/meson.build
@@ -225,7 +225,9 @@ qemu_ldflags = []
 if targetos == 'darwin'
   # Disable attempts to use ObjectiveC features in os/object.h since they
   # won't work when we're compiling with gcc as a C compiler.
-  qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  if compiler.get_id() == 'gcc'
+    qemu_common_flags += '-DOS_OBJECT_USE_OBJC=0'
+  endif
 elif targetos == 'solaris'
   # needed for CMSG_ macros in sys/socket.h
   qemu_common_flags += '-D_XOPEN_SOURCE=600'
-- 
2.39.2 (Apple Git-143)
[PATCH 03/12] hvf: Increase number of possible memory slots
For PVG (ParavirtualizedGraphics.Framework) we will need more than the
current 32 possible memory slots. Bump the limit to 512 instead.

Signed-off-by: Alexander Graf
---
 accel/hvf/hvf-accel-ops.c | 2 +-
 include/sysemu/hvf_int.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 9c3da03c94..bf0caaa852 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -88,7 +88,7 @@ struct mac_slot {
     uint64_t gva;
 };
 
-struct mac_slot mac_slots[32];
+struct mac_slot mac_slots[512];
 
 static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 {
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index 6ab119e49f..c7623a2c09 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -40,7 +40,7 @@ typedef struct hvf_vcpu_caps {
 
 struct HVFState {
     AccelState parent;
-    hvf_slot slots[32];
+    hvf_slot slots[512];
     int num_slots;
 
     hvf_vcpu_caps *hvf_caps;
-- 
2.39.2 (Apple Git-143)
[PATCH 02/12] hw/misc/pvpanic: Add MMIO interface
In addition to the ISA and PCI variants of pvpanic, let's add an MMIO
platform device that we can use in embedded arm environments.

Signed-off-by: Alexander Graf
---
 hw/misc/Kconfig           | 4 +++
 hw/misc/meson.build       | 1 +
 hw/misc/pvpanic-mmio.c    | 66 +++
 include/hw/misc/pvpanic.h | 1 +
 4 files changed, 72 insertions(+)
 create mode 100644 hw/misc/pvpanic-mmio.c

diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index e4c2149175..21913ef191 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -125,6 +125,10 @@ config PVPANIC_ISA
     depends on ISA_BUS
     select PVPANIC_COMMON
 
+config PVPANIC_MMIO
+    bool
+    select PVPANIC_COMMON
+
 config AUX
     bool
     select I2C
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 78ca857c9d..b935e74d51 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -115,6 +115,7 @@ softmmu_ss.add(when: 'CONFIG_ARMSSE_MHU', if_true: files('armsse-mhu.c'))
 
 softmmu_ss.add(when: 'CONFIG_PVPANIC_ISA', if_true: files('pvpanic-isa.c'))
 softmmu_ss.add(when: 'CONFIG_PVPANIC_PCI', if_true: files('pvpanic-pci.c'))
+softmmu_ss.add(when: 'CONFIG_PVPANIC_MMIO', if_true: files('pvpanic-mmio.c'))
 softmmu_ss.add(when: 'CONFIG_AUX', if_true: files('auxbus.c'))
 softmmu_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files(
   'aspeed_hace.c',
diff --git a/hw/misc/pvpanic-mmio.c b/hw/misc/pvpanic-mmio.c
new file mode 100644
index 00..aebe7227e6
--- /dev/null
+++ b/hw/misc/pvpanic-mmio.c
@@ -0,0 +1,66 @@
+/*
+ * QEMU simulated pvpanic device (MMIO frontend)
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "sysemu/runstate.h"
+
+#include "hw/nvram/fw_cfg.h"
+#include "hw/qdev-properties.h"
+#include "hw/misc/pvpanic.h"
+#include "qom/object.h"
+#include "hw/isa/isa.h"
+#include "standard-headers/linux/pvpanic.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(PVPanicMMIOState, PVPANIC_MMIO_DEVICE)
+
+#define PVPANIC_MMIO_SIZE 0x2
+
+struct PVPanicMMIOState {
+    SysBusDevice parent_obj;
+
+    PVPanicState pvpanic;
+};
+
+static void pvpanic_mmio_initfn(Object *obj)
+{
+    PVPanicMMIOState *s = PVPANIC_MMIO_DEVICE(obj);
+
+    pvpanic_setup_io(&s->pvpanic, DEVICE(s), PVPANIC_MMIO_SIZE);
+    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->pvpanic.mr);
+}
+
+static Property pvpanic_mmio_properties[] = {
+    DEFINE_PROP_UINT8("events", PVPanicMMIOState, pvpanic.events,
+                      PVPANIC_PANICKED | PVPANIC_CRASH_LOADED),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pvpanic_mmio_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    device_class_set_props(dc, pvpanic_mmio_properties);
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo pvpanic_mmio_info = {
+    .name          = TYPE_PVPANIC_MMIO_DEVICE,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(PVPanicMMIOState),
+    .instance_init = pvpanic_mmio_initfn,
+    .class_init    = pvpanic_mmio_class_init,
+};
+
+static void pvpanic_register_types(void)
+{
+    type_register_static(&pvpanic_mmio_info);
+}
+
+type_init(pvpanic_register_types)
diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
index fab94165d0..f9e7c1ea17 100644
--- a/include/hw/misc/pvpanic.h
+++ b/include/hw/misc/pvpanic.h
@@ -20,6 +20,7 @@
 
 #define TYPE_PVPANIC_ISA_DEVICE "pvpanic"
 #define TYPE_PVPANIC_PCI_DEVICE "pvpanic-pci"
+#define TYPE_PVPANIC_MMIO_DEVICE "pvpanic-mmio"
 
 #define PVPANIC_IOPORT_PROP "ioport"
 
-- 
2.39.2 (Apple Git-143)
[PATCH 00/12] Introduce new vmapple machine type
This patch set introduces a new ARM and HVF specific machine type
called "vmapple". It mimics the device model that Apple's proprietary
Virtualization.Framework exposes, but implements it in QEMU.

With this new machine type, you can run macOS guests on Apple Silicon
systems via HVF. To do so, you need to first install macOS using
Virtualization.Framework onto a virtual disk image using a tool like
macosvm (https://github.com/s-u/macosvm)

  $ macosvm --disk disk.img,size=32g --aux aux.img \
            --restore UniversalMac_12.0.1_21A559_Restore.ipsw vm.json

Then, extract the ECID from the installed VM:

  $ cat "$DIR/macosvm.json" | python3 -c \
    'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"])' |\
    base64 -d | plutil -extract ECID raw -

In addition, cut off the first 16 KiB of the aux.img:

  $ dd if=aux.img of=aux.img.trimmed bs=$(( 0x4000 )) skip=1

Now, you can just launch QEMU with the bits generated above:

  $ qemu-system-aarch64 -serial mon:stdio \
    -m 4G \
    -M vmapple,uuid=6240349656165161789 \
    -bios /Sys*/Lib*/Fra*/Virtualization.f*/R*/AVPBooter.vmapple2.bin \
    -pflash aux.img.trimmed \
    -pflash disk.img \
    -drive file=disk.img,if=none,id=root \
    -device virtio-blk-pci,drive=root,x-apple-type=1 \
    -drive file=aux.img.trimmed,if=none,id=aux \
    -device virtio-blk-pci,drive=aux,x-apple-type=2 \
    -accel hvf -no-reboot

There are a few limitations with this implementation:

  - Only runs on macOS because it relies on
    ParavirtualizedGraphics.Framework
  - Something is not fully correct on interrupt delivery or
    similar - the keyboard does not work
  - No Rosetta in the guest because we lack the private
    entitlement to enable TSO

Over time, I hope that some of the limitations above could cease to exist.
This device model would enable very nice use cases with KVM on an Asahi
Linux device.

Alexander Graf (12):
  build: Only define OS_OBJECT_USE_OBJC with gcc
  hw/misc/pvpanic: Add MMIO interface
  hvf: Increase number of possible memory slots
  hvf: arm: Ignore writes to CNTP_CTL_EL0
  hw/virtio: Add support for apple virtio-blk
  hw: Add vmapple subdir
  gpex: Allow more than 4 legacy IRQs
  hw/vmapple/aes: Introduce aes engine
  hw/vmapple/bdif: Introduce vmapple backdoor interface
  hw/vmapple/cfg: Introduce vmapple cfg region
  hw/vmapple/apple-gfx: Introduce ParavirtualizedGraphics.Framework support
  hw/vmapple/vmapple: Add vmapple machine type

 MAINTAINERS                                 | 6 +
 accel/hvf/hvf-accel-ops.c                   | 2 +-
 hw/Kconfig                                  | 1 +
 hw/arm/sbsa-ref.c                           | 2 +-
 hw/arm/virt.c                               | 2 +-
 hw/block/virtio-blk.c                       | 23 +
 hw/i386/microvm.c                           | 2 +-
 hw/loongarch/virt.c                         | 2 +-
 hw/meson.build                              | 1 +
 hw/mips/loongson3_virt.c                    | 2 +-
 hw/misc/Kconfig                             | 4 +
 hw/misc/meson.build                         | 1 +
 hw/misc/pvpanic-mmio.c                      | 66 ++
 hw/openrisc/virt.c                          | 12 +-
 hw/pci-host/gpex.c                          | 36 +-
 hw/riscv/virt.c                             | 12 +-
 hw/virtio/virtio-blk-pci.c                  | 7 +
 hw/vmapple/Kconfig                          | 30 +
 hw/vmapple/aes.c                            | 583 +
 hw/vmapple/apple-gfx.m                      | 578 +
 hw/vmapple/bdif.c                           | 245
 hw/vmapple/cfg.c                            | 105
 hw/vmapple/meson.build                      | 5 +
 hw/vmapple/trace-events                     | 47 ++
 hw/vmapple/trace.h                          | 1 +
 hw/vmapple/vmapple.c                        | 661
 hw/xtensa/virt.c                            | 2 +-
 include/hw/misc/pvpanic.h                   | 1 +
 include/hw/pci-host/gpex.h                  | 7 +-
 include/hw/pci/pci_ids.h                    | 1 +
 include/hw/virtio/virtio-blk.h              | 1 +
 include/hw/vmapple/bdif.h                   | 31 +
 include/hw/vmapple/cfg.h                    | 68 ++
 include/standard-headers/linux/virtio_blk.h | 3 +
 include/sysemu/hvf_int.h                    | 2 +-
 meson.build                                 | 9 +-
 target/arm/hvf/hvf.c                        | 7 +
 37 files changed, 2538 insertions(+), 30 deletions(-)
 create mode 100644 hw/misc/pvpan
Re: [PATCH 2/3] hw/ppc/e500plat: Fix modifying QOM class internal state from instance
Hi Philippe,

On 23.05.23 08:44, Philippe Mathieu-Daudé wrote:
> QOM object instance should not modify its class state (because
> all other objects instantiated from this class get affected).
>
> Instead of modifying the PPCE500MachineClass 'mpic_version' field
> in the instance machine_init() handler, set it in the machine
> class init handler (e500plat_machine_class_init).
>
> Inspired-by: Bernhard Beschow
> Signed-off-by: Philippe Mathieu-Daudé
> ---
>  hw/ppc/e500plat.c | 25 +++--
>  1 file changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c
> index 3032bd3f6d..c3b0ed01cf 100644
> --- a/hw/ppc/e500plat.c
> +++ b/hw/ppc/e500plat.c
> @@ -30,18 +30,6 @@ static void e500plat_fixup_devtree(void *fdt)
>                       sizeof(compatible));
>  }
>
> -static void e500plat_init(MachineState *machine)
> -{
> -    PPCE500MachineClass *pmc = PPCE500_MACHINE_GET_CLASS(machine);
> -    /* Older KVM versions don't support EPR which breaks guests when we announce
> -       MPIC variants that support EPR. Revert to an older one for those */
> -    if (kvm_enabled() && !kvmppc_has_cap_epr()) {
> -        pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_20;
> -    }
> -
> -    ppce500_init(machine);

Won't this drop the call to ppce500_init(machine)?

> -}
> -
>  static void e500plat_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>                                              DeviceState *dev, Error **errp)
>  {
> @@ -81,7 +69,6 @@ static void e500plat_machine_class_init(ObjectClass *oc, void *data)
>      pmc->pci_first_slot = 0x1;
>      pmc->pci_nr_slots = PCI_SLOT_MAX - 1;
>      pmc->fixup_devtree = e500plat_fixup_devtree;
> -    pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_42;
>      pmc->has_mpc8xxx_gpio = true;
>      pmc->has_esdhc = true;
>      pmc->platform_bus_base = 0xfULL;
> @@ -94,8 +81,18 @@ static void e500plat_machine_class_init(ObjectClass *oc, void *data)
>      pmc->pci_mmio_bus_base = 0xE000ULL;
>      pmc->spin_base = 0xFEF00ULL;
>
> +    if (kvm_enabled() && !kvmppc_has_cap_epr()) {
> +        /*
> +         * Older KVM versions don't support EPR which breaks guests when
> +         * we announce MPIC variants that support EPR. Revert to an older
> +         * one for those.
> +         */
> +        pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_20;
> +    } else {
> +        pmc->mpic_version = OPENPIC_MODEL_FSL_MPIC_42;
> +    }
> +
>      mc->desc = "generic paravirt e500 platform";
> -    mc->init = e500plat_init;

I suppose best would be to just put it in here instead of e500plat_init?

Alex

>      mc->max_cpus = 32;
>      mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("e500v2_v30");
>      mc->default_ram_id = "mpc8544ds.ram";
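
The suggestion in the review above, as I read it, is to keep an init hook registered on the class and point it straight at the common init function once the wrapper is gone. The following is only a sketch with mock types, not QEMU code: MachineClassSketch, MachineSketch, common_init_sketch(), machine_class_init_sketch() and the MPIC_SKETCH_* values are all made up for illustration, and the boolean parameter stands in for the kvm_enabled() && !kvmppc_has_cap_epr() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for OPENPIC_MODEL_FSL_MPIC_20 / _42 (values are arbitrary). */
enum { MPIC_SKETCH_20 = 20, MPIC_SKETCH_42 = 42 };

typedef struct MachineSketch MachineSketch;

/* Per-class state: shared by every instance created from the class. */
typedef struct MachineClassSketch {
    int mpic_version;
    void (*init)(MachineSketch *m);
} MachineClassSketch;

/* Per-instance state: only holds a read-only pointer to its class. */
struct MachineSketch {
    const MachineClassSketch *klass;
    bool initialized;
};

/* Plays the role of the common init function (ppce500_init in the patch). */
static void common_init_sketch(MachineSketch *m)
{
    m->initialized = true;
}

/*
 * Class init: decide the MPIC version once, and keep an init hook
 * registered so the common init code still runs for each instance.
 */
static void machine_class_init_sketch(MachineClassSketch *mc,
                                      bool kvm_without_epr)
{
    assert(mc != NULL);
    mc->mpic_version = kvm_without_epr ? MPIC_SKETCH_20 : MPIC_SKETCH_42;
    mc->init = common_init_sketch;
}
```

Because instances only see a const pointer to the class, nothing on the instance path can change mpic_version for its siblings, which is the property the patch description asks for.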
[PATCH] hvf: Enable 1G page support
Hvf on x86 only supported 2MiB large pages, but never bothered to strip
out the 1GiB page size capability from -cpu host. With QEMU 8.0.0 this
became a problem because OVMF started to use 1GiB pages by default.

Let's just unconditionally add 1GiB page walk support to the walker.

With this fix applied, I can successfully run OVMF again.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1603
Signed-off-by: Alexander Graf
Reported-by: Akihiro Suda
Reported-by: Philippe Mathieu-Daudé

---

On my test VM, Linux dies later on with issues in interrupt delivery. But
those are unrelated to this patch; I confirmed that I get the same behavior
with 1GiB page support disabled.
---
 target/i386/hvf/x86_mmu.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
index 96d117567e..1d860651c6 100644
--- a/target/i386/hvf/x86_mmu.c
+++ b/target/i386/hvf/x86_mmu.c
@@ -38,6 +38,7 @@
 #define LEGACY_PTE_PAGE_MASK(0xllu << 12)
 #define PAE_PTE_PAGE_MASK       ((-1llu << 12) & ((1llu << 52) - 1))
 #define PAE_PTE_LARGE_PAGE_MASK ((-1llu << (21)) & ((1llu << 52) - 1))
+#define PAE_PTE_SUPER_PAGE_MASK ((-1llu << (30)) & ((1llu << 52) - 1))
 
 struct gpt_translation {
     target_ulong gva;
@@ -96,7 +97,7 @@ static bool get_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
 
 /* test page table entry */
 static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
-                          int level, bool *is_large, bool pae)
+                          int level, int *largeness, bool pae)
 {
     uint64_t pte = pt->pte[level];
 
@@ -118,9 +119,9 @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
         goto exit;
     }
 
-    if (1 == level && pte_large_page(pte)) {
+    if (level && pte_large_page(pte)) {
         pt->err_code |= MMU_PAGE_PT;
-        *is_large = true;
+        *largeness = level;
     }
     if (!level) {
         pt->err_code |= MMU_PAGE_PT;
@@ -152,9 +153,18 @@ static inline uint64_t pse_pte_to_page(uint64_t pte)
     return ((pte & 0x1fe000) << 19) | (pte & 0xffc0);
 }
 
-static inline uint64_t large_page_gpa(struct gpt_translation *pt, bool pae)
+static inline uint64_t large_page_gpa(struct gpt_translation *pt, bool pae,
+                                      int largeness)
 {
-    VM_PANIC_ON(!pte_large_page(pt->pte[1]))
+    VM_PANIC_ON(!pte_large_page(pt->pte[largeness]))
+
+    /* 1Gib large page */
+    if (pae && largeness == 2) {
+        return (pt->pte[2] & PAE_PTE_SUPER_PAGE_MASK) | (pt->gva & 0x3fff);
+    }
+
+    VM_PANIC_ON(largeness != 1)
+
     /* 2Mb large page */
     if (pae) {
         return (pt->pte[1] & PAE_PTE_LARGE_PAGE_MASK) | (pt->gva & 0x1f);
@@ -170,7 +180,7 @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
                      struct gpt_translation *pt, bool pae)
 {
     int top_level, level;
-    bool is_large = false;
+    int largeness = 0;
     target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
     uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
 
@@ -186,19 +196,19 @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
     for (level = top_level; level > 0; level--) {
         get_pt_entry(cpu, pt, level, pae);
 
-        if (!test_pt_entry(cpu, pt, level - 1, &is_large, pae)) {
+        if (!test_pt_entry(cpu, pt, level - 1, &largeness, pae)) {
             return false;
         }
 
-        if (is_large) {
+        if (largeness) {
             break;
         }
     }
 
-    if (!is_large) {
+    if (!largeness) {
         pt->gpa = (pt->pte[0] & page_mask) | (pt->gva & 0xfff);
     } else {
-        pt->gpa = large_page_gpa(pt, pae, largeness);
+        pt->gpa = large_page_gpa(pt, pae, largeness);
     }
 
     return true;
-- 
2.39.2 (Apple Git-143)
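
To make the address arithmetic behind the large page case concrete, here is a small standalone sketch. It is not the code from the patch: large_page_to_gpa() and the SKETCH_* macros are hypothetical names, and it assumes PAE/long mode entry formats where bit 7 (PS) marks a large page, the frame address sits below bit 52, level 2 means a 1 GiB page (30 offset bits) and level 1 means a 2 MiB page (21 offset bits).

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch (see the text above). */
#define SKETCH_PTE_PS         (1ULL << 7)          /* "large page" bit      */
#define SKETCH_PHYS_ADDR_MASK ((1ULL << 52) - 1)   /* bits 51:0 are address */

/*
 * Hypothetical helper: combine the frame address stored in a large-page
 * entry with the low bits of the guest virtual address.
 *   level 2 -> 1 GiB page, 30 offset bits
 *   level 1 -> 2 MiB page, 21 offset bits
 */
static uint64_t large_page_to_gpa(uint64_t pte, uint64_t gva, int level)
{
    unsigned shift;
    uint64_t frame_mask, offset_mask;

    assert(pte & SKETCH_PTE_PS);
    assert(level == 1 || level == 2);

    shift = (level == 2) ? 30 : 21;
    frame_mask = (~0ULL << shift) & SKETCH_PHYS_ADDR_MASK;
    offset_mask = (1ULL << shift) - 1;

    /* Flag bits (low bits, NX in bit 63) fall outside frame_mask. */
    return (pte & frame_mask) | (gva & offset_mask);
}
```

For example, an entry of 0x40000083 at level 2 together with a guest virtual address of 0x12345678 yields 0x52345678: the 1 GiB frame at 0x40000000 plus the 30-bit offset taken from the virtual address.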
[PATCH v5] hostmem-file: add offset option
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

To make this work consistently, also fix up all places in QEMU that
expect fd offsets to be 0.

Signed-off-by: Alexander Graf

---

v1 -> v2:

  - add qom documentation
  - propagate offset into truncate, size and alignment checks

v2 -> v3:

  - failed attempt at fixing typo

v3 -> v4:

  - fix typo

v4 -> v5:

  - improve qom doc comment
  - account for fd_offset in more places
---
 backends/hostmem-file.c | 40 +++-
 hw/virtio/vhost-user.c  | 1 +
 include/exec/memory.h   | 2 ++
 include/exec/ram_addr.h | 3 ++-
 include/exec/ramblock.h | 1 +
 qapi/qom.json           | 5 +
 qemu-options.hx         | 6 +-
 softmmu/memory.c        | 3 ++-
 softmmu/physmem.c       | 17 -
 9 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
     char *mem_path;
     uint64_t align;
+    uint64_t offset;
     bool discard_data;
     bool is_pmem;
     bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
                                      backend->size, fb->align, ram_flags,
-                                     fb->mem_path, fb->readonly, errp);
+                                     fb->mem_path, fb->offset, fb->readonly,
+                                     errp);
     g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val = fb->offset;
+
+    visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property '%s' of %s", name,
+                   object_get_typename(o));
+        return;
+    }
+
+    if (!visit_type_size(v, name, &val, errp)) {
+        return;
+    }
+    fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL);
+    object_class_property_add(oc, "offset", "int",
+        file_memory_backend_get_offset,
+        file_memory_backend_set_offset,
+        NULL, NULL);
+    object_class_property_set_description(oc, "offset",
+        "Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e5285df4ba..39dc803b03 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -483,6 +483,7 @@ static MemoryRegion *vhost_user_get_mr_data(uint64_t addr, ram_addr_t *offset,
     assert((uintptr_t)addr == addr);
     mr = memory_region_from_host((void *)(uintptr_t)addr, offset);
     *fd = memory_region_get_fd(mr);
+    *offset += mr->ram_block->fd_offset;
 
     return mr;
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
Re: [PATCH v4] hostmem-file: add offset option
On 03.04.23 09:13, David Hildenbrand wrote:
> On 01.04.23 19:47, Stefan Hajnoczi wrote:
>> On Sat, Apr 01, 2023 at 12:42:57PM +, Alexander Graf wrote:
>>> Add an option for hostmem-file to start the memory object at an offset
>>> into the target file. This is useful if multiple memory objects reside
>>> inside the same target file, such as a device node.
>>>
>>> In particular, it's useful to map guest memory directly into /dev/mem
>>> for experimentation.
>>>
>>> Signed-off-by: Alexander Graf
>>> Reviewed-by: Stefan Hajnoczi
>>>
>>> ---
>>>
>>> v1 -> v2:
>>>
>>>   - add qom documentation
>>>   - propagate offset into truncate, size and alignment checks
>>>
>>> v2 -> v3:
>>>
>>>   - failed attempt at fixing typo
>>>
>>> v2 -> v4:
>>>
>>>   - fix typo
>>> ---
>>>  backends/hostmem-file.c | 40 +++-
>>>  include/exec/memory.h   | 2 ++
>>>  include/exec/ram_addr.h | 3 ++-
>>>  qapi/qom.json           | 5 +
>>>  qemu-options.hx         | 6 +-
>>>  softmmu/memory.c        | 3 ++-
>>>  softmmu/physmem.c       | 14 ++
>>>  7 files changed, 65 insertions(+), 8 deletions(-)
>>
>> Reviewed-by: Stefan Hajnoczi
>
> The change itself looks good to me, but I do think some other QEMU code
> that ends up working on the RAMBlock is not prepared yet. Most probably,
> because we never ended up using fd with an offset as guest RAM.
>
> We don't seem to be remembering that offset in the RAMBlock. First, I
> thought block->offset would be used for that, but that's just the offset
> in the ram_addr_t space. Maybe we need a new "block->fd_offset" to
> remember the offset (unless I am missing something).
>
> The real offset in the file would be required at least in two cases I
> can see (whenever we essentially end up calling mmap() on the fd again):
>
> 1) qemu_ram_remap(): We'd have to add the file offset on top of the
> calculated offset.

This one is a bit tricky to test, as we're only running into that code
path with KVM when we see an #MCE. But it's trivial, so I'm confident it
will work as expected.

> 2) vhost-user: most probably whenever we set the mmap_offset. For
> example, in vhost_user_fill_set_mem_table_msg() we'd similarly have to
> add the file_offset on top of the calculated offset.
> vhost_user_get_mr_data() should most probably do that.

I agree - adding the offset as part of get_mr_data() is sufficient. I
have validated it works correctly with QEMU's vhost-user-blk target.

I think the changes are still obvious enough that I'll fold them all
into a single patch.

Alex
Re: [PATCH v2] hostmem-file: add offset option
On 03.04.23 08:28, Markus Armbruster wrote:

Alexander Graf writes:

Add an option for hostmem-file to start the memory object at an offset into the target file. This is useful if multiple memory objects reside inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem for experimentation.

Signed-off-by: Alexander Graf

[...]

diff --git a/qapi/qom.json b/qapi/qom.json
index a877b879b9..8f5eaa8415 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -635,6 +635,10 @@
 # specify the required alignment via this option.
 # 0 selects a default alignment (currently the page size). (default: 0)
 #
+# @offset: the offset into the target file that the region starts at. You can
+# use this option to overload multiple regions into a single fils.

single file

I'm not sure about "to overload multiple regions into a single file". Maybe "to back multiple regions with a single file".

I like it, I'll use that version here and in the qemu-options.hx file.

Any alignment requirements?

Page size, I'll add it.

What happens when the regions overlap?

It "just works" - same as mapping the same file twice. It's up to the user to ensure that nothing bad happens because of that.

Alex
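The two questions answered above (page-size alignment of the offset, and overlapping regions being the user's responsibility) come down to two small checks. The sketch below is an illustration under stated assumptions, not code from the patch: both function names are hypothetical, the page size is passed in by the caller, lengths are assumed nonzero, and sums are assumed not to overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when offset is a multiple of page_size (a nonzero power of two). */
static bool offset_is_page_aligned(uint64_t offset, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (offset & (page_size - 1)) == 0;
}

/*
 * True when the file ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) share at least one byte.
 * Assumes nonzero lengths and no overflow in the additions.
 */
static bool file_ranges_overlap(uint64_t a_off, uint64_t a_len,
                                uint64_t b_off, uint64_t b_len)
{
    return a_off < b_off + b_len && b_off < a_off + a_len;
}
```

A management layer that wants to rule out accidental sharing could run the overlap check over its own list of (offset, size) pairs before starting the guest; as the reply says, the option itself does not prevent overlapping regions.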
[PATCH v4] hostmem-file: add offset option
Add an option for hostmem-file to start the memory object at an offset into the target file. This is useful if multiple memory objects reside inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem for experimentation.

Signed-off-by: Alexander Graf
Reviewed-by: Stefan Hajnoczi
---
v1 -> v2:
- add qom documentation
- propagate offset into truncate, size and alignment checks

v2 -> v3:
- failed attempt at fixing typo

v2 -> v4:
- fix typo
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   | 2 ++
 include/exec/ram_addr.h | 3 ++-
 qapi/qom.json           | 5 +
 qemu-options.hx         | 6 +-
 softmmu/memory.c        | 3 ++-
 softmmu/physmem.c       | 14 ++
 7 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {

     char *mem_path;
     uint64_t align;
+    uint64_t offset;
     bool discard_data;
     bool is_pmem;
     bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
                                      backend->size, fb->align, ram_flags,
-                                     fb->mem_path, fb->readonly, errp);
+                                     fb->mem_path, fb->offset, fb->readonly,
+                                     errp);
     g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     fb->align = val;
 }

+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val = fb->offset;
+
+    visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property '%s' of %s", name,
+                   object_get_typename(o));
+        return;
+    }
+
+    if (!visit_type_size(v, name, &val, errp)) {
+        return;
+    }
+    fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL);
+    object_class_property_add(oc, "offset", "int",
+        file_memory_backend_get_offset,
+        file_memory_backend_set_offset,
+        NULL, NULL);
+    object_class_property_set_description(oc, "offset",
+        "Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t align,
                                       uint32_t ram_flags,
                                       const char *path,
+                                      ram_addr_t offset,
                                       bool readonly,
                                       Error **errp);

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE.
  * @mem_path or @fd: specify the backing file or device
+ * @offset: Offset into targ
[PATCH v3] hostmem-file: add offset option
Add an option for hostmem-file to start the memory object at an offset into the target file. This is useful if multiple memory objects reside inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem for experimentation.

Signed-off-by: Alexander Graf
Reviewed-by: Stefan Hajnoczi
---
v1 -> v2:
- add qom documentation
- propagate offset into truncate, size and alignment checks

v2 -> v3:
- fix typo
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   | 2 ++
 include/exec/ram_addr.h | 3 ++-
 qapi/qom.json           | 5 +
 qemu-options.hx         | 6 +-
 softmmu/memory.c        | 3 ++-
 softmmu/physmem.c       | 14 ++
 7 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {

     char *mem_path;
     uint64_t align;
+    uint64_t offset;
     bool discard_data;
     bool is_pmem;
     bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
                                      backend->size, fb->align, ram_flags,
-                                     fb->mem_path, fb->readonly, errp);
+                                     fb->mem_path, fb->offset, fb->readonly,
+                                     errp);
     g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     fb->align = val;
 }

+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val = fb->offset;
+
+    visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property '%s' of %s", name,
+                   object_get_typename(o));
+        return;
+    }
+
+    if (!visit_type_size(v, name, &val, errp)) {
+        return;
+    }
+    fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL);
+    object_class_property_add(oc, "offset", "int",
+        file_memory_backend_get_offset,
+        file_memory_backend_set_offset,
+        NULL, NULL);
+    object_class_property_set_description(oc, "offset",
+        "Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t align,
                                       uint32_t ram_flags,
                                       const char *path,
+                                      ram_addr_t offset,
                                       bool readonly,
                                       Error **errp);

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE.
  * @mem_path or @fd: specify the backing file or device
+ * @offset: Offset into target file
  * @readonly: true to open @pa
[PATCH v2] hostmem-file: add offset option
Add an option for hostmem-file to start the memory object at an offset into the target file. This is useful if multiple memory objects reside inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem for experimentation.

Signed-off-by: Alexander Graf
---
v1 -> v2:
- add qom documentation
- propagate offset into truncate, size and alignment checks
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   | 2 ++
 include/exec/ram_addr.h | 3 ++-
 qapi/qom.json           | 5 +
 qemu-options.hx         | 6 +-
 softmmu/memory.c        | 3 ++-
 softmmu/physmem.c       | 14 ++
 7 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {

     char *mem_path;
     uint64_t align;
+    uint64_t offset;
     bool discard_data;
     bool is_pmem;
     bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
                                      backend->size, fb->align, ram_flags,
-                                     fb->mem_path, fb->readonly, errp);
+                                     fb->mem_path, fb->offset, fb->readonly,
+                                     errp);
     g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     fb->align = val;
 }

+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val = fb->offset;
+
+    visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property '%s' of %s", name,
+                   object_get_typename(o));
+        return;
+    }
+
+    if (!visit_type_size(v, name, &val, errp)) {
+        return;
+    }
+    fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL);
+    object_class_property_add(oc, "offset", "int",
+        file_memory_backend_get_offset,
+        file_memory_backend_set_offset,
+        NULL, NULL);
+    object_class_property_set_description(oc, "offset",
+        "Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t align,
                                       uint32_t ram_flags,
                                       const char *path,
+                                      ram_addr_t offset,
                                       bool readonly,
                                       Error **errp);

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE.
  * @mem_path or @fd: specify the backing file or device
+ * @offset: Offset into target file
  * @readonly: true to open @path for reading, false for read/write.
  * @e
Re: [PATCH v2 28/32] contrib/gitdm: add Amazon to the domain map
On 15.03.23 20:18, Durrant, Paul wrote:

-Original Message-
From: Alex Bennée
Sent: 15 March 2023 17:43
To: qemu-devel@nongnu.org
Cc: Akihiko Odaki; Marc-André Lureau; qemu-ri...@nongnu.org; Riku Voipio; Igor Mammedov; Xiao Guangrong; Thomas Huth; Wainer dos Santos Moschetta; Dr. David Alan Gilbert; Alex Williamson; Hao Wu; Cleber Rosa; Daniel Henrique Barboza; Jan Kiszka; Aurelien Jarno; qemu-...@nongnu.org; Marcelo Tosatti; Eduardo Habkost; Alexandre Iooss; Gerd Hoffmann; Palmer Dabbelt; Ilya Leoshkevich; qemu-p...@nongnu.org; Juan Quintela; Cédric Le Goater; Darren Kenny; k...@vger.kernel.org; Marcel Apfelbaum; Peter Maydell; Richard Henderson; Stafford Horne; Weiwei Li; Sunil V L; Stefan Hajnoczi; Thomas Huth; Vijai Kumar K; Liu Zhiwei; David Gibson; Song Gao; Paolo Bonzini; Michael S. Tsirkin; Niek Linnenbank; Greg Kurz; Laurent Vivier; Qiuhao Li; Philippe Mathieu-Daudé; Xiaojuan Yang; Mahmoud Mandour; Alexander Bulekov; Jiaxun Yang; qemu-bl...@nongnu.org; Yanan Wang; David Woodhouse; qemu-s3...@nongnu.org; Strahinja Jankovic; Bandan Das; Alistair Francis; Aleksandar Rikalo; Tyrone Ting; Kevin Wolf; David Hildenbrand; Beraldo Leal; Beniamino Galvani; Paul Durrant; Bin Meng; Sunil Muthuswamy; Hanna Reitz; Peter Xu; Alex Bennée; Graf (AWS), Alexander; Durrant, Paul; Woodhouse, David
Subject: [PATCH v2 28/32] contrib/gitdm: add Amazon to the domain map

We have multiple contributors from both .co.uk and .com versions of the address.

Signed-off-by: Alex Bennée
Cc: Alexander Graf
Cc: Paul Durrant
Cc: David Woodhouse
Reviewed-by: Philippe Mathieu-Daudé
Message-Id: <20230310180332.2274827-7-alex.ben...@linaro.org>
---
 contrib/gitdm/domain-map | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map
index 4a988c5b5f..8dce276a1c 100644
--- a/contrib/gitdm/domain-map
+++ b/contrib/gitdm/domain-map
@@ -4,6 +4,8 @@
 # This maps email domains to nice easy to read company names
 #
+amazon.com      Amazon
+amazon.co.uk    Amazon

You might want 'amazon.de' too but as far as it goes...

Yes, please add amazon.de here. Once that's added, feel free to take my

Reviewed-by: Alexander Graf

Alex
Re: [PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16
On 03.01.23 18:41, Peter Maydell wrote:

On Fri, 23 Dec 2022 at 08:50, Alexander Graf wrote:

While trying to make Windows work with GICv3 emulation, I stumbled over the fact that it only supports ITT entry sizes that are a power of 2. While the spec allows arbitrary sizes, in practice hardware will always expose power of 2 sizes and so this limitation is not really a problem in real world scenarios.

However, we only expose a 12 byte ITT entry size which makes Windows blue screen on boot. The easy way to get around that problem is to bump the size to 16. That is a power of 2, is basically what hardware would expose given the amount of bits we need per entry and doesn't break any existing scenarios.

To play it safe, this patch set only bumps them on newer machine types.

This is a Windows bug and should IMHO be fixed in that guest OS. Changing the ITT entry size of QEMU's implementation introduces an unnecessary incompatibility in migration and wastes memory (we're already a bit unnecessarily profligate with ITT entries compared to real hardware).

Follow-up on this: Microsoft has fixed the issue in Windows. That won't make older versions work, but the current version should be fine with GICv3:

https://fosstodon.org/@itanium/109909281184181276

Alex
Re: [PATCH v2] hvf: arm: Add support for GICv3
Hey Peter,

On 03.02.23 11:57, Peter Maydell wrote:

On Thu, 2 Feb 2023 at 17:56, Peter Maydell wrote:

On Sat, 28 Jan 2023 at 22:45, Alexander Graf wrote:

We currently only support GICv2 emulation. To also support GICv3, we will need to pass a few system registers into their respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3 system register handlers. This is safe because the GICv3 TCG code is generic as long as we limit ourselves to EL0 and EL1 - which are the only modes supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.

Signed-off-by: Alexander Graf
---

Applied to target-arm.next, thanks.

This one *also* fails 'make check'. Please can you test your patches before sending them? The fix is not difficult (another missing qtest_enabled() check), so I've squashed it in.

Sorry for the mess :(. I usually do test TCG and HVF when submitting these patches with various VMs, but keep forgetting about "make check". I'll try hard to remember next time.

Thanks,

Alex
[PATCH] hostmem-file: add offset option
Add an option for hostmem-file to start the memory object at an offset into the target file. This is useful if multiple memory objects reside inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem for experimentation.

Signed-off-by: Alexander Graf
---
 backends/hostmem-file.c | 40 +++-
 include/exec/memory.h   | 2 ++
 include/exec/ram_addr.h | 3 ++-
 qapi/qom.json           | 1 +
 qemu-options.hx         | 6 +-
 softmmu/memory.c        | 3 ++-
 softmmu/physmem.c       | 5 +++--
 7 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {

     char *mem_path;
     uint64_t align;
+    uint64_t offset;
     bool discard_data;
     bool is_pmem;
     bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
                                      backend->size, fb->align, ram_flags,
-                                     fb->mem_path, fb->readonly, errp);
+                                     fb->mem_path, fb->offset, fb->readonly,
+                                     errp);
     g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     fb->align = val;
 }

+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val = fb->offset;
+
+    visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property '%s' of %s", name,
+                   object_get_typename(o));
+        return;
+    }
+
+    if (!visit_type_size(v, name, &val, errp)) {
+        return;
+    }
+    fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL);
+    object_class_property_add(oc, "offset", "int",
+        file_memory_backend_get_offset,
+        file_memory_backend_set_offset,
+        NULL, NULL);
+    object_class_property_set_description(oc, "offset",
+        "Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..bd67198111 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t align,
                                       uint32_t ram_flags,
                                       const char *path,
+                                      ram_addr_t offset,
                                       bool readonly,
                                       Error **errp);

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE.
  * @mem_path or @fd: specify the backing file or device
+ * @offset: Offset into target file
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens
  *
@@ -119,7 +120,7 @@ long qemu_maxrampagesize(void);
  */
 RAMBlock *qemu_ram_allo
[PATCH v2] hvf: arm: Add support for GICv3
We currently only support GICv2 emulation. To also support GICv3, we will need to pass a few system registers into their respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3 system register handlers. This is safe because the GICv3 TCG code is generic as long as we limit ourselves to EL0 and EL1 - which are the only modes supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.

Signed-off-by: Alexander Graf
---
v1 -> v2:
- assert when guest has EL2/EL3 and uses non-TCG GICv3
- use defines for sysreg masks
---
 hw/intc/arm_gicv3_cpuif.c   | 15 +++-
 target/arm/hvf/hvf.c        | 151
 target/arm/hvf/trace-events | 2 +
 3 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index b17b29288c..c4ff595742 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -21,6 +21,7 @@
 #include "hw/irq.h"
 #include "cpu.h"
 #include "target/arm/cpregs.h"
+#include "sysemu/tcg.h"

 /*
  * Special case return value from hppvi_index(); must be larger than
@@ -2810,6 +2811,8 @@ void gicv3_init_cpuif(GICv3State *s)
          * which case we'd get the wrong value.
          * So instead we define the regs with no ri->opaque info, and
          * get back to the GICv3CPUState from the CPUARMState.
+         *
+         * These CP regs callbacks can be called from either TCG or HVF code.
          */
         define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);

@@ -2905,6 +2908,16 @@ void gicv3_init_cpuif(GICv3State *s)
                 define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo);
             }
         }
-        arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+        if (tcg_enabled()) {
+            /*
+             * We can only trap EL changes with TCG. However the GIC interrupt
+             * state only changes on EL changes involving EL2 or EL3, so for
+             * the non-TCG case this is OK, as EL2 and EL3 can't exist.
+             */
+            arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+        } else {
+            assert(!arm_feature(&cpu->env, ARM_FEATURE_EL2));
+            assert(!arm_feature(&cpu->env, ARM_FEATURE_EL3));
+        }
     }
 }
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 060aa0ccf4..ad65603445 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -80,6 +80,33 @@
 #define SYSREG_PMCCNTR_EL0    SYSREG(3, 3, 9, 13, 0)
 #define SYSREG_PMCCFILTR_EL0  SYSREG(3, 3, 14, 15, 7)

+#define SYSREG_ICC_AP0R0_EL1     SYSREG(3, 0, 12, 8, 4)
+#define SYSREG_ICC_AP0R1_EL1     SYSREG(3, 0, 12, 8, 5)
+#define SYSREG_ICC_AP0R2_EL1     SYSREG(3, 0, 12, 8, 6)
+#define SYSREG_ICC_AP0R3_EL1     SYSREG(3, 0, 12, 8, 7)
+#define SYSREG_ICC_AP1R0_EL1     SYSREG(3, 0, 12, 9, 0)
+#define SYSREG_ICC_AP1R1_EL1     SYSREG(3, 0, 12, 9, 1)
+#define SYSREG_ICC_AP1R2_EL1     SYSREG(3, 0, 12, 9, 2)
+#define SYSREG_ICC_AP1R3_EL1     SYSREG(3, 0, 12, 9, 3)
+#define SYSREG_ICC_ASGI1R_EL1    SYSREG(3, 0, 12, 11, 6)
+#define SYSREG_ICC_BPR0_EL1      SYSREG(3, 0, 12, 8, 3)
+#define SYSREG_ICC_BPR1_EL1      SYSREG(3, 0, 12, 12, 3)
+#define SYSREG_ICC_CTLR_EL1      SYSREG(3, 0, 12, 12, 4)
+#define SYSREG_ICC_DIR_EL1       SYSREG(3, 0, 12, 11, 1)
+#define SYSREG_ICC_EOIR0_EL1     SYSREG(3, 0, 12, 8, 1)
+#define SYSREG_ICC_EOIR1_EL1     SYSREG(3, 0, 12, 12, 1)
+#define SYSREG_ICC_HPPIR0_EL1    SYSREG(3, 0, 12, 8, 2)
+#define SYSREG_ICC_HPPIR1_EL1    SYSREG(3, 0, 12, 12, 2)
+#define SYSREG_ICC_IAR0_EL1      SYSREG(3, 0, 12, 8, 0)
+#define SYSREG_ICC_IAR1_EL1      SYSREG(3, 0, 12, 12, 0)
+#define SYSREG_ICC_IGRPEN0_EL1   SYSREG(3, 0, 12, 12, 6)
+#define SYSREG_ICC_IGRPEN1_EL1   SYSREG(3, 0, 12, 12, 7)
+#define SYSREG_ICC_PMR_EL1       SYSREG(3, 0, 4, 6, 0)
+#define SYSREG_ICC_RPR_EL1       SYSREG(3, 0, 12, 11, 3)
+#define SYSREG_ICC_SGI0R_EL1     SYSREG(3, 0, 12, 11, 7)
+#define SYSREG_ICC_SGI1R_EL1     SYSREG(3, 0, 12, 11, 5)
+#define SYSREG_ICC_SRE_EL1       SYSREG(3, 0, 12, 12, 5)
+
 #define WFX_IS_WFE (1 << 0)

 #define TMR_CTL_ENABLE (1 << 0)
@@ -788,6 +815,43 @@ static bool is_id_sysreg(uint32_t reg)
            SYSREG_CRM(reg) < 8;
 }

+static uint32_t hvf_reg2cp_reg(uint32_t reg)
+{
+    return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP,
+                              (reg >> SYSREG_CRN_SHIFT) & SYSREG_CRN_MASK,
+                              (reg >> SYSREG_CRM_SHIFT) & SYSREG_CRM_MASK,
+                              (reg >> SYSREG_OP0_SHIFT) & SYSREG_OP0_MASK,
+                              (reg >> SYSREG_OP1_SHIFT) & SYSREG_OP1_MASK,
+                              (reg >> SYSREG_OP2_SHIFT
Re: [PATCH] hvf: arm: Add support for GICv3
On 06.01.23 17:37, Peter Maydell wrote:

On Mon, 19 Dec 2022 at 22:08, Alexander Graf wrote:

We currently only support GICv2 emulation. To also support GICv3, we will need to pass a few system registers into their respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3 system register handlers. This is safe because the GICv3 TCG code is generic as long as we limit ourselves to EL0 and EL1 - which are the only modes supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.

Signed-off-by: Alexander Graf
---
 hw/intc/arm_gicv3_cpuif.c   | 8 +-
 target/arm/hvf/hvf.c        | 151
 target/arm/hvf/trace-events | 2 +
 3 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index b17b29288c..b4e387268c 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -21,6 +21,7 @@
 #include "hw/irq.h"
 #include "cpu.h"
 #include "target/arm/cpregs.h"
+#include "sysemu/tcg.h"

 /*
  * Special case return value from hppvi_index(); must be larger than
@@ -2810,6 +2811,8 @@ void gicv3_init_cpuif(GICv3State *s)
          * which case we'd get the wrong value.
          * So instead we define the regs with no ri->opaque info, and
          * get back to the GICv3CPUState from the CPUARMState.
+         *
+         * These CP regs callbacks can be called from either TCG or HVF code.
          */
         define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);

@@ -2905,6 +2908,9 @@ void gicv3_init_cpuif(GICv3State *s)
                 define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo);
             }
         }
-        arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+        if (tcg_enabled()) {
+            /* We can only trap EL changes with TCG for now */

We could expand this a bit:

We can only trap EL changes with TCG. However the GIC interrupt state only changes on EL changes involving EL2 or EL3, so for the non-TCG case this is OK, as EL2 and EL3 can't exist.

and assert:

assert(!arm_feature(&cpu->env, ARM_FEATURE_EL2));
assert(!arm_feature(&cpu->env, ARM_FEATURE_EL3));

Good idea! Let me add that.

+static uint32_t hvf_reg2cp_reg(uint32_t reg)
+{
+    return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP,
+                              (reg >> 10) & 0xf,
+                              (reg >> 1) & 0xf,
+                              (reg >> 20) & 0x3,
+                              (reg >> 14) & 0x7,
+                              (reg >> 17) & 0x7);

This file has #defines for these shift and mask constants (SYSREG_OP0_SHIFT etc).

Ugh, thanks for catching that!

+}
+
+static bool hvf_sysreg_read_cp(CPUState *cpu, uint32_t reg, uint64_t *val)
+{
+    ARMCPU *arm_cpu = ARM_CPU(cpu);
+    CPUARMState *env = &arm_cpu->env;
+    const ARMCPRegInfo *ri;
+
+    ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg));
+    if (ri) {
+        if (ri->accessfn) {
+            if (ri->accessfn(env, ri, true) != CP_ACCESS_OK) {
+                return false;
+            }
+        }
+        if (ri->type & ARM_CP_CONST) {
+            *val = ri->resetvalue;
+        } else if (ri->readfn) {
+            *val = ri->readfn(env, ri);
+        } else {
+            *val = CPREG_FIELD64(env, ri);
+        }
+        trace_hvf_vgic_read(ri->name, *val);
+        return true;
+    }

Can we get here for attempts by EL0 to access EL1-only sysregs, or does hvf send the exception to EL1 without trapping out to us? If we can get here for EL0 accesses we need to check against ri->access as well as ri->accessfn.

I just validated, GICv3 EL1 registers trap to EL1 inside the guest:

$ cat a.S
.global start
.global _main
_main:
start:
    mrs x0, ICC_AP0R0_EL1
    mov x0, #0x1234
    msr ICC_AP0R0_EL1, x0
    mov x0, #0
    ret
$ gcc -nostdlib a.S
$ gdb ./a.out
(gdb) r
Program received signal SIGILL, Illegal instruction.
0x004000d4 in start ()
(gdb) x/i $pc
=> 0x4000d4 : mrs x0, icc_ap0r0_el1

So no need to check ri->access :)

Alex
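The field extraction in the hvf_reg2cp_reg() hunk quoted above can be exercised on its own. The sketch below reuses the literal shift and mask values from that hunk under hypothetical `SK_*` names; `sk_pack()` is an assumed inverse written only to build test inputs, with the (op0, op1, crn, crm, op2) argument order taken from the way the SYSREG(...) defines are listed in the patch. It is not the code from hvf.c.

```c
#include <assert.h>
#include <stdint.h>

/* Shift and mask values as they appear in the quoted v1 hunk. */
#define SK_OP0_SHIFT 20
#define SK_OP0_MASK  0x3u
#define SK_OP1_SHIFT 14
#define SK_OP1_MASK  0x7u
#define SK_CRN_SHIFT 10
#define SK_CRN_MASK  0xfu
#define SK_CRM_SHIFT 1
#define SK_CRM_MASK  0xfu
#define SK_OP2_SHIFT 17
#define SK_OP2_MASK  0x7u

struct sk_fields {
    uint32_t op0, op1, crn, crm, op2;
};

/* Assumed inverse of sk_unpack(); every field must fit its mask. */
static uint32_t sk_pack(uint32_t op0, uint32_t op1, uint32_t crn,
                        uint32_t crm, uint32_t op2)
{
    assert(op0 <= SK_OP0_MASK && op1 <= SK_OP1_MASK);
    assert(crn <= SK_CRN_MASK && crm <= SK_CRM_MASK && op2 <= SK_OP2_MASK);
    return (op0 << SK_OP0_SHIFT) | (op1 << SK_OP1_SHIFT) |
           (crn << SK_CRN_SHIFT) | (crm << SK_CRM_SHIFT) |
           (op2 << SK_OP2_SHIFT);
}

/* Split a packed register id back into its five encoding fields. */
static struct sk_fields sk_unpack(uint32_t reg)
{
    struct sk_fields f = {
        .op0 = (reg >> SK_OP0_SHIFT) & SK_OP0_MASK,
        .op1 = (reg >> SK_OP1_SHIFT) & SK_OP1_MASK,
        .crn = (reg >> SK_CRN_SHIFT) & SK_CRN_MASK,
        .crm = (reg >> SK_CRM_SHIFT) & SK_CRM_MASK,
        .op2 = (reg >> SK_OP2_SHIFT) & SK_OP2_MASK,
    };
    return f;
}
```

Naming the constants, as the review asks, keeps the pack and unpack sides visibly in sync; with bare literals a mismatch between the two would be easy to miss.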
Re: [PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16
Hi Peter, On 03.01.23 18:41, Peter Maydell wrote: On Fri, 23 Dec 2022 at 08:50, Alexander Graf wrote: While trying to make Windows work with GICv3 emulation, I stumbled over the fact that it only supports ITT entry sizes that are power of 2 sized. While the spec allows arbitrary sizes, in practice hardware will always expose power of 2 sizes and so this limitation is not really a problem in real world scenarios. However, we only expose a 12 byte ITT entry size which makes Windows blue screen on boot. The easy way to get around that problem is to bump the size to 16. That is a power of 2, basically is what hardware would expose given the amount of bits we need per entry and doesn't break any existing scenarios. To play it safe, this patch set only bumps them on newer machine types. This is a Windows bug and should IMHO be fixed in that guest OS. I don't have access to the Windows source code, but the compiled binary very explicitly checks and validates that an ITT entry is Po2 sized. That means the MS folks deliberately decided to make simplifying assumptions that hardware will never use any other sizes. After thinking about it for a while, I ended up with the same conclusion: Hardware would never use anything but Po2 sizes because those are trivial to map to indexes in hardware, while anything even remotely odd is much more costly (in die space and/or time) to extract an index from. So while I'm really curious about the rationale they had here, I doubt it's a bug. It's a deliberate decision. And one that makes sense in the context of hardware. I don't see a good reason for them to change the behavior, given that there's a close-to-0 chance we will ever see real hardware ITS structures with ITT entries that are not Po2 sized. Changing the ITT entry size of QEMU's implementation introduces an unnecessary incompatibility in migration and wastes memory The patch set deals with migration through machine versions. 
We do this type of change all the time, so why would it be a problem here? As for memory waste, I agree. If I understand the ITS code correctly, basically all of the contents beyond the first 8 bytes are GICv4 related and useless in a GICv3 vGIC. So if we really care strongly about memory waste, we could try to condense the entry down to 8 bytes in the GICv3 case and make it 16 only for GICv4. I think keeping the GICv3 and GICv4 code paths identical does have its appeal though, so I'd prefer not to do it. (we're already a bit unnecessarily profligate with ITT entries compared to real hardware). Do you mean the number of entries or the size per entry? Alex
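To make the indexing argument in this reply concrete, here is a minimal sketch in C. It is neither QEMU code nor a description of any real ITS implementation; all helper names (is_pow2, pow2_shift, ite_addr_mul, ite_addr_shift) are hypothetical. It only illustrates that with a power-of-2 entry size the entry offset is a plain shift of the event ID, whereas a 12 byte entry needs a real multiplication.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if size is a non-zero power of two. */
static bool is_pow2(uint32_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}

/* Hypothetical helper: log2 of a power-of-two size, i.e. the shift amount. */
static unsigned pow2_shift(uint32_t size)
{
    unsigned shift = 0;

    while ((size >> shift) > 1) {
        shift++;
    }
    return shift;
}

/* Generic entry address: works for any size, but needs a multiplier. */
static uint64_t ite_addr_mul(uint64_t itt_base, uint32_t eventid,
                             uint32_t entry_size)
{
    return itt_base + (uint64_t)eventid * entry_size;
}

/* Power-of-two entry address: the multiply collapses into a shift. */
static uint64_t ite_addr_shift(uint64_t itt_base, uint32_t eventid,
                               unsigned shift)
{
    return itt_base + ((uint64_t)eventid << shift);
}
```

With a 16 byte entry both address functions agree, which is the case the Windows check appears to assume; with a 12 byte entry only the multiplying variant applies.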
Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success
Hi Vitaly, On 31.12.22 11:17, Vitaly Chikunov wrote: Alexander, On Sat, Dec 31, 2022 at 10:28:21AM +0100, Alexander Graf wrote: On 30.12.22 19:16, Vitaly Chikunov wrote: On Fri, Dec 30, 2022 at 06:44:14PM +0100, Alexander Graf wrote: This is a kvm kernel bug and should be fixed with the latest stable releases. Which kernel version are you running? This is on latest v6.0 stable - 6.0.15. Maybe there could be workaround for such situations? (Or maybe it's possible to make this error non-fatal?) We use qemu+kvm for testing and now we cannot test on x86. I'm confused what's going wrong for you. I tried to reproduce the issue locally, but am unable to: $ uname -a Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 CET 2022 x86_64 x86_64 x86_64 GNU/Linux $ linux32 chroot . $ uname -a Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 CET 2022 i686 GNU/Linux $ cd qemu $ file ./build/qemu-system-i386 ./build/qemu-system-i386: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=f75e20572be5c604c121de4497397665c168aa4c, with debug_info, not stripped $ ./build/qemu-system-i386 --version QEMU emulator version 7.2.0 (v7.2.0-dirty) Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers $ ./build/qemu-system-i386 -nographic -enable-kvm SeaBIOS (version rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org) [...] Can you please double check whether your host kernel version is 6.0.15? Please paste the output of "uname -a". Excuse me, I incorrectly reported the kernel version I tried to boot instead of the host one. Host kernels are quite old, 5.15.59 and even 5.17.15 -- where failure is occurring. I just tested on 5.15.85 and there is no failure. Awesome, great to hear :). That means everything works as expected at least. 
builder@i586:/.in$ uname -a
Linux localhost.localdomain 5.15.85-std-def-alt1 #1 SMP Wed Dec 21 21:14:40 UTC 2022 i686 GNU/Linux
builder@i586:/.in$ qemu-system-i386 -nographic -enable-kvm
SeaBIOS (version 1.16.1-alt1)

Perhaps one of the solutions is to reboot our build fleet to newer kernels. [This may be hard, though, since a special builder node image would have to be created and the reboot coordinated across all systems; in comparison, updating QEMU would be easier since the chroot is created on every build].

I understand that it may be slightly painful to update your build fleet, but given this is a genuine kernel bug that has a fix available upstream and it only happens in niche corner cases (i386 QEMU on x86-64 Linux kernels with the bug) that I doubt anyone will use in production, I'd prefer we keep the QEMU logic as is :).

In the meantime, while you're patching the build fleet, you can apply the patch below as part of your build process to ensure you don't fail due to the kernel bug. Just make sure to remove it again as soon as you're done with the fleet update :).

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a213209379..b9396bc7a6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2632,7 +2632,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
             return ret;
         }
     }
+#ifdef __x86_64__
     if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+#else
+    if (0) {
+#endif
         bool r;
         ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0,

Alex
Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success
Hi Vitaly, On 30.12.22 19:16, Vitaly Chikunov wrote: Alexander, On Fri, Dec 30, 2022 at 06:44:14PM +0100, Alexander Graf wrote: Hi Vitaly, This is a kvm kernel bug and should be fixed with the latest stable releases. Which kernel version are you running? This is on latest v6.0 stable - 6.0.15. Maybe there could be workaround for such situations? (Or maybe it's possible to make this error non-fatal?) We use qemu+kvm for testing and now we cannot test on x86. I'm confused what's going wrong for you. I tried to reproduce the issue locally, but am unable to: $ uname -a Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 CET 2022 x86_64 x86_64 x86_64 GNU/Linux $ linux32 chroot . $ uname -a Linux server 6.0.15-default #1 SMP PREEMPT_DYNAMIC Sat Dec 31 07:52:52 CET 2022 i686 GNU/Linux $ cd qemu $ file ./build/qemu-system-i386 ./build/qemu-system-i386: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=f75e20572be5c604c121de4497397665c168aa4c, with debug_info, not stripped $ ./build/qemu-system-i386 --version QEMU emulator version 7.2.0 (v7.2.0-dirty) Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers $ ./build/qemu-system-i386 -nographic -enable-kvm SeaBIOS (version rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org) [...] Can you please double check whether your host kernel version is 6.0.15? Please paste the output of "uname -a". Alex
Re: qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success
Hi Vitaly, This is a kvm kernel bug and should be fixed with the latest stable releases. Which kernel version are you running? Thanks, Alex > Am 30.12.2022 um 15:30 schrieb Vitaly Chikunov : > > Hi, > > QEMU 7.2.0 when run on 32-bit x86 architecture fails with: > > i586$ qemu-system-i386 -enable-kvm > qemu-system-i386: Could not install MSR_CORE_THREAD_COUNT handler: Success > i586$ qemu-system-x86_64 -enable-kvm > qemu-system-x86_64: Could not install MSR_CORE_THREAD_COUNT handler: Success > > Minimal reproducer is `qemu-system-i386 -enable-kvm'. And this only > happens on x86 (linux32 personality and binaries on x86_64 host): > > i586$ file /usr/bin/qemu-system-i386 > /usr/bin/qemu-system-i386: ELF 32-bit LSB pie executable, Intel 80386, > version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, > BuildID[sha1]=0ba1d953bcb7a691014255954f060ff404c8df90, for GNU/Linux 3.2.0, > stripped > i586$ /usr/bin/qemu-system-i386 --version > QEMU emulator version 7.2.0 (qemu-7.2.0-alt1) > Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers > > Thanks, >
Re: [PATCH v3 1/2] hw/arm/virt: Consolidate GIC finalize logic
Hey Cornelia, On 23.12.22 13:30, Cornelia Huck wrote: On Fri, Dec 23 2022, Alexander Graf wrote: Up to now, the finalize_gic_version() code open coded what is essentially a support bitmap match between host/emulation environment and desired target GIC type. This open coding leads to undesirable side effects. For example, a VM with KVM and -smp 10 will automatically choose GICv3 while the same command line with TCG will stay on GICv2 and fail the launch. This patch combines the TCG and KVM matching code paths by making everything a 2 pass process. First, we determine which GIC versions the current environment is able to support, then we go through a single state machine to determine which target GIC mode that means for us. After this patch, the only user noticable changes should be consolidated error messages as well as TCG -M virt supporting -smp > 8 automatically. Signed-off-by: Alexander Graf --- v1 -> v2: - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation v2 -> v3: - Fix comment - Flip kvm-enabled logic for host around --- hw/arm/virt.c | 198 ++ include/hw/arm/virt.h | 15 ++-- 2 files changed, 112 insertions(+), 101 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ea2413a0ba..6d27f044fe 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) } } +static VirtGICType finalize_gic_version_do(const char *accel_name, + VirtGICType gic_version, + int gics_supported, + unsigned int max_cpus) +{ +/* Convert host/max/nosel to GIC version number */ +switch (gic_version) { +case VIRT_GIC_VERSION_HOST: +if (!kvm_enabled()) { +error_report("gic-version=host requires KVM"); +exit(1); +} + +/* For KVM, gic-version=host means gic-version=max */ +return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX, + gics_supported, max_cpus); I think I'd still rather use /* fallthrough */ here, but let's leave that decision to the maintainers. 
I originally had a fallthrough here, then looked at the code and concluded for myself that I dislike fallthroughs :). They make already complicated code flows insanely complicated and are super error-prone. In any case, Reviewed-by: Cornelia Huck [As an aside, we have a QEMU_FALLTHROUGH #define that maps to __attribute__((fallthrough)) if available, but unlike the Linux kernel, we didn't bother to convert everything to use it in QEMU. Should we? Would using the attribute give us some extra benefits?] IMHO we'd be better off just refactoring code in ways that don't require fall-throughs. Modern compilers inline functions pretty well, so I think there's very little reason for them anymore. Thanks a lot for the reviews! Alex
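For readers weighing the two styles discussed in this reply, here is a small sketch in C that contrasts an annotated fallthrough with a helper call. It is an illustration only, not the code from the patch: SKETCH_FALLTHROUGH, the req enum, best_supported() and both resolve_* functions are hypothetical names, and the macro is merely modelled on the kind of definition the aside describes.

```c
/* Hypothetical macro: use the compiler attribute where it is available. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define SKETCH_FALLTHROUGH __attribute__((fallthrough))
# endif
#endif
#ifndef SKETCH_FALLTHROUGH
# define SKETCH_FALLTHROUGH do {} while (0) /* fallthrough */
#endif

enum req { REQ_HOST, REQ_MAX, REQ_V2 };

/* Stand-in for "highest version the environment supports". */
static int best_supported(void)
{
    return 3;
}

/* Style 1: the "host" case falls through into the "max" case. */
static int resolve_fallthrough(enum req r, int have_kvm)
{
    switch (r) {
    case REQ_HOST:
        if (!have_kvm) {
            return -1;          /* "host" only makes sense with KVM */
        }
        SKETCH_FALLTHROUGH;
    case REQ_MAX:
        return best_supported();
    default:
        return 2;
    }
}

/* Style 2: no fallthrough, every case returns through a shared helper. */
static int resolve_max(void)
{
    return best_supported();
}

static int resolve_helper(enum req r, int have_kvm)
{
    switch (r) {
    case REQ_HOST:
        if (!have_kvm) {
            return -1;
        }
        return resolve_max();
    case REQ_MAX:
        return resolve_max();
    default:
        return 2;
    }
}
```

Both variants return the same results; the difference is only whether control flow crosses a case label or goes through a function the compiler is free to inline.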
[PATCH v3 0/2] hw/arm/virt: Handle HVF in finalize_gic_version()
The finalize_gic_version() function tries to determine which GIC version the current accelerator / host combination supports. During the initial HVF porting efforts, I didn't realize that I also had to touch this function. Then Zenghui brought up this function in reply to my HVF GICv3 enablement patch - and boy, it is a mess.

This patch set cleans up all of the GIC finalization so that we can easily plug HVF in and will hopefully also have a better time extending it in the future. As a second step, it explicitly adds HVF support and fails loudly for any unsupported accelerators.

Alex

v1 -> v2:
- Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation
- Include TCG header for tcg_enabled()

v2 -> v3:
- Fix comment
- Flip kvm-enabled logic for host around

Alexander Graf (2):
hw/arm/virt: Consolidate GIC finalize logic
hw/arm/virt: Make accels in GIC finalize logic explicit

hw/arm/virt.c | 200 ++
include/hw/arm/virt.h | 15 ++--
2 files changed, 115 insertions(+), 100 deletions(-)

--
2.37.1 (Apple Git-137.1)
[PATCH v3 2/2] hw/arm/virt: Make accels in GIC finalize logic explicit
Let's explicitly list out all accelerators that we support when trying to determine the supported set of GIC versions. KVM was already separate, so the only missing one is HVF which simply reuses all of TCG's emulation code and thus has the same compatibility matrix. Signed-off-by: Alexander Graf Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Cornelia Huck --- v1 -> v2: - Include TCG header for tcg_enabled() --- hw/arm/virt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 6d27f044fe..611f40c1da 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -47,6 +47,7 @@ #include "sysemu/numa.h" #include "sysemu/runstate.h" #include "sysemu/tpm.h" +#include "sysemu/tcg.h" #include "sysemu/kvm.h" #include "sysemu/hvf.h" #include "hw/loader.h" @@ -1929,7 +1930,7 @@ static void finalize_gic_version(VirtMachineState *vms) /* KVM w/o kernel irqchip can only deal with GICv2 */ gics_supported |= VIRT_GIC_VERSION_2_MASK; accel_name = "KVM with kernel-irqchip=off"; -} else { +} else if (tcg_enabled() || hvf_enabled()) { gics_supported |= VIRT_GIC_VERSION_2_MASK; if (module_object_class_by_name("arm-gicv3")) { gics_supported |= VIRT_GIC_VERSION_3_MASK; @@ -1938,6 +1939,9 @@ static void finalize_gic_version(VirtMachineState *vms) gics_supported |= VIRT_GIC_VERSION_4_MASK; } } +} else { +error_report("Unsupported accelerator, can not determine GIC support"); +exit(1); } /* -- 2.37.1 (Apple Git-137.1)
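The patch above fills in a bitmap of supported GIC versions per accelerator; the series then resolves the user's request against that bitmap in a second pass. The following is a simplified sketch of that second pass in C, assuming hypothetical names (REQ_*, MASK_OF, GIC_NCPU_SKETCH, pick_gic) and returning -1 where the real code reports an error and exits. It leaves out the "host" case and is not the function from the patch.

```c
/* GICv2 CPU limit; 8 is the number quoted in the patch's error message. */
#define GIC_NCPU_SKETCH 8

/* Hypothetical request values: two selectors plus explicit versions. */
enum { REQ_MAX = 0, REQ_NOSEL = 1, REQ_V2 = 2, REQ_V3 = 3, REQ_V4 = 4 };

/* One bit per GIC version in the "supported" bitmap. */
#define MASK_OF(v) (1u << (v))

/*
 * Pass 2: turn a request plus the supported bitmap into a final version.
 * Returns 2, 3 or 4, or -1 if the request cannot be satisfied.
 */
static int pick_gic(int requested, unsigned supported, unsigned max_cpus)
{
    int version = requested;

    if (requested == REQ_MAX) {
        /* Highest supported version wins, GICv2 is the last resort. */
        version = (supported & MASK_OF(4)) ? 4 :
                  (supported & MASK_OF(3)) ? 3 : 2;
    } else if (requested == REQ_NOSEL) {
        if ((supported & MASK_OF(2)) && max_cpus <= GIC_NCPU_SKETCH) {
            version = 2;
        } else if (supported & MASK_OF(3)) {
            /* No usable GICv2, or too many CPUs for it: default to v3. */
            version = 3;
        } else {
            return -1;
        }
    }

    /* Final check: the chosen version must actually be supported. */
    if (version < 2 || version > 4 || !(supported & MASK_OF(version))) {
        return -1;
    }
    return version;
}
```

Under these assumptions, an environment that offers GICv2 and GICv3 picks GICv3 for 10 CPUs no matter which accelerator filled in the bitmap, which is the behavioral change the series describes for TCG.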
[PATCH v3 1/2] hw/arm/virt: Consolidate GIC finalize logic
Up to now, the finalize_gic_version() code open coded what is essentially a support bitmap match between host/emulation environment and desired target GIC type. This open coding leads to undesirable side effects. For example, a VM with KVM and -smp 10 will automatically choose GICv3 while the same command line with TCG will stay on GICv2 and fail the launch. This patch combines the TCG and KVM matching code paths by making everything a 2 pass process. First, we determine which GIC versions the current environment is able to support, then we go through a single state machine to determine which target GIC mode that means for us. After this patch, the only user noticable changes should be consolidated error messages as well as TCG -M virt supporting -smp > 8 automatically. Signed-off-by: Alexander Graf --- v1 -> v2: - Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation v2 -> v3: - Fix comment - Flip kvm-enabled logic for host around --- hw/arm/virt.c | 198 ++ include/hw/arm/virt.h | 15 ++-- 2 files changed, 112 insertions(+), 101 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ea2413a0ba..6d27f044fe 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) } } +static VirtGICType finalize_gic_version_do(const char *accel_name, + VirtGICType gic_version, + int gics_supported, + unsigned int max_cpus) +{ +/* Convert host/max/nosel to GIC version number */ +switch (gic_version) { +case VIRT_GIC_VERSION_HOST: +if (!kvm_enabled()) { +error_report("gic-version=host requires KVM"); +exit(1); +} + +/* For KVM, gic-version=host means gic-version=max */ +return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX, + gics_supported, max_cpus); +case VIRT_GIC_VERSION_MAX: +if (gics_supported & VIRT_GIC_VERSION_4_MASK) { +gic_version = VIRT_GIC_VERSION_4; +} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) { +gic_version = VIRT_GIC_VERSION_3; +} else { +gic_version = 
VIRT_GIC_VERSION_2; +} +break; +case VIRT_GIC_VERSION_NOSEL: +if ((gics_supported & VIRT_GIC_VERSION_2_MASK) && +max_cpus <= GIC_NCPU) { +gic_version = VIRT_GIC_VERSION_2; +} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) { +/* + * in case the host does not support v2 emulation or + * the end-user requested more than 8 VCPUs we now default + * to v3. In any case defaulting to v2 would be broken. + */ +gic_version = VIRT_GIC_VERSION_3; +} else if (max_cpus > GIC_NCPU) { +error_report("%s only supports GICv2 emulation but more than 8 " + "vcpus are requested", accel_name); +exit(1); +} +break; +case VIRT_GIC_VERSION_2: +case VIRT_GIC_VERSION_3: +case VIRT_GIC_VERSION_4: +break; +} + +/* Check chosen version is effectively supported */ +switch (gic_version) { +case VIRT_GIC_VERSION_2: +if (!(gics_supported & VIRT_GIC_VERSION_2_MASK)) { +error_report("%s does not support GICv2 emulation", accel_name); +exit(1); +} +break; +case VIRT_GIC_VERSION_3: +if (!(gics_supported & VIRT_GIC_VERSION_3_MASK)) { +error_report("%s does not support GICv3 emulation", accel_name); +exit(1); +} +break; +case VIRT_GIC_VERSION_4: +if (!(gics_supported & VIRT_GIC_VERSION_4_MASK)) { +error_report("%s does not support GICv4 emulation, is virtualization=on?", + accel_name); +exit(1); +} +break; +default: +error_report("logic error in finalize_gic_version"); +exit(1); +break; +} + +return gic_version; +} + /* * finalize_gic_version - Determines the final gic_version * according to the gic-version property @@ -1828,118 +1906,46 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) */ static void finalize_gic_version(VirtMachineState *vms) { +const char *accel_name = current_accel_name(); unsigned int max_cpus = MACHINE(vms)->smp.max_cpus; +int gics_supported = 0; -if (kvm_enabled()) { -int probe_bitmap; +/* Determine which GIC versions the current environment supports */ +if (kvm_enabled() && kvm_irqchip_in_kernel()) { +int probe_bitmap = kvm_arm_vgic_
[PATCH 0/2] hw/intc/arm_gicv3: Bump ITT entry size to 16
While trying to make Windows work with GICv3 emulation, I stumbled over the fact that it only supports ITT entry sizes that are a power of 2. While the spec allows arbitrary sizes, in practice hardware will always expose power of 2 sizes and so this limitation is not really a problem in real world scenarios. However, we only expose a 12 byte ITT entry size which makes Windows blue screen on boot. The easy way to get around that problem is to bump the size to 16. That is a power of 2, is basically what hardware would expose given the number of bits we need per entry, and doesn't break any existing scenarios. To play it safe, this patch set only bumps the size on newer machine types.

Alexander Graf (2):
hw/intc/arm_gicv3: Make ITT entry size configurable
hw/intc/arm_gicv3: Bump ITT entry size to 16

hw/core/machine.c | 4 +++-
hw/intc/arm_gicv3_its.c | 13 ++---
hw/intc/gicv3_internal.h | 2 +-
include/hw/intc/arm_gicv3_its_common.h | 1 +
4 files changed, 15 insertions(+), 5 deletions(-)

--
2.37.1 (Apple Git-137.1)
[PATCH 1/2] hw/intc/arm_gicv3: Make ITT entry size configurable
An ITT entry is opaque to the OS. The only thing it does get told by HW is its size. In theory, that size can be any byte aligned number, in practice HW will always use power of 2s to simplify offset calculation. We currently expose the size as 12, which is not a power of 2. To prepare for a future where we expose power of 2 sized entry sizes, let's make the size itself configurable. We only need to watch out that we don't have an entry be smaller than the fields we want to access inside. Bigger is always fine. Signed-off-by: Alexander Graf --- hw/intc/arm_gicv3_its.c| 14 +++--- hw/intc/gicv3_internal.h | 2 +- include/hw/intc/arm_gicv3_its_common.h | 1 + 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c index 57c79da5c5..e7cabeb46c 100644 --- a/hw/intc/arm_gicv3_its.c +++ b/hw/intc/arm_gicv3_its.c @@ -215,7 +215,7 @@ static bool update_ite(GICv3ITSState *s, uint32_t eventid, const DTEntry *dte, { AddressSpace *as = &s->gicv3->dma_as; MemTxResult res = MEMTX_OK; -hwaddr iteaddr = dte->ittaddr + eventid * ITS_ITT_ENTRY_SIZE; +hwaddr iteaddr = dte->ittaddr + eventid * s->itt_entry_size; uint64_t itel = 0; uint32_t iteh = 0; @@ -253,7 +253,7 @@ static MemTxResult get_ite(GICv3ITSState *s, uint32_t eventid, MemTxResult res = MEMTX_OK; uint64_t itel; uint32_t iteh; -hwaddr iteaddr = dte->ittaddr + eventid * ITS_ITT_ENTRY_SIZE; +hwaddr iteaddr = dte->ittaddr + eventid * s->itt_entry_size; itel = address_space_ldq_le(as, iteaddr, MEMTXATTRS_UNSPECIFIED, &res); if (res != MEMTX_OK) { @@ -1934,6 +1934,12 @@ static void gicv3_arm_its_realize(DeviceState *dev, Error **errp) } } +if (s->itt_entry_size < MIN_ITS_ITT_ENTRY_SIZE) { +error_setg(errp, "ITT entry size must be at least %d", + MIN_ITS_ITT_ENTRY_SIZE); +return; +} + gicv3_add_its(s->gicv3, dev); gicv3_its_init_mmio(s, &gicv3_its_control_ops, &gicv3_its_translation_ops); @@ -1941,7 +1947,7 @@ static void gicv3_arm_its_realize(DeviceState *dev, Error **errp) 
/* set the ITS default features supported */ s->typer = FIELD_DP64(s->typer, GITS_TYPER, PHYSICAL, 1); s->typer = FIELD_DP64(s->typer, GITS_TYPER, ITT_ENTRY_SIZE, - ITS_ITT_ENTRY_SIZE - 1); + s->itt_entry_size - 1); s->typer = FIELD_DP64(s->typer, GITS_TYPER, IDBITS, ITS_IDBITS); s->typer = FIELD_DP64(s->typer, GITS_TYPER, DEVBITS, ITS_DEVBITS); s->typer = FIELD_DP64(s->typer, GITS_TYPER, CIL, 1); @@ -2008,6 +2014,8 @@ static void gicv3_its_post_load(GICv3ITSState *s) static Property gicv3_its_props[] = { DEFINE_PROP_LINK("parent-gicv3", GICv3ITSState, gicv3, "arm-gicv3", GICv3State *), +DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size, + MIN_ITS_ITT_ENTRY_SIZE), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h index 29d5cdc1b6..2aca1ba095 100644 --- a/hw/intc/gicv3_internal.h +++ b/hw/intc/gicv3_internal.h @@ -450,7 +450,7 @@ FIELD(VINVALL_1, VPEID, 32, 16) * the value of that field in memory cannot be relied upon -- older * versions of QEMU did not correctly write to that memory.) */ -#define ITS_ITT_ENTRY_SIZE0xC +#define MIN_ITS_ITT_ENTRY_SIZE0xC FIELD(ITE_L, VALID, 0, 1) FIELD(ITE_L, INTTYPE, 1, 1) diff --git a/include/hw/intc/arm_gicv3_its_common.h b/include/hw/intc/arm_gicv3_its_common.h index a11a0f6654..e730a5482c 100644 --- a/include/hw/intc/arm_gicv3_its_common.h +++ b/include/hw/intc/arm_gicv3_its_common.h @@ -66,6 +66,7 @@ struct GICv3ITSState { int dev_fd; /* kvm device fd if backed by kvm vgic support */ uint64_t gits_translater_gpa; bool translater_gpa_known; +uint8_t itt_entry_size; /* Registers */ uint32_t ctlr; -- 2.37.1 (Apple Git-137.1)
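The patch above writes the entry size into GITS_TYPER as "size minus one". A minimal sketch of that encoding in C follows. The field position (bits [7:4]) is an assumption taken from my reading of the GIC architecture, not from the patch, and the macro and function names are hypothetical; QEMU itself uses its FIELD_DP64() helpers instead.

```c
#include <stdint.h>

/* Assumed field position: GITS_TYPER.ITT_entry_size in bits [7:4]. */
#define TYPER_ITT_ENTRY_SIZE_SHIFT 4
#define TYPER_ITT_ENTRY_SIZE_MASK  (0xFULL << TYPER_ITT_ENTRY_SIZE_SHIFT)

/* Store an entry size in bytes; the register field holds size - 1. */
static uint64_t typer_set_itt_entry_size(uint64_t typer, unsigned size_bytes)
{
    typer &= ~TYPER_ITT_ENTRY_SIZE_MASK;
    typer |= ((uint64_t)(size_bytes - 1) << TYPER_ITT_ENTRY_SIZE_SHIFT) &
             TYPER_ITT_ENTRY_SIZE_MASK;
    return typer;
}

/* Read the entry size in bytes back, as a guest driver would. */
static unsigned typer_get_itt_entry_size(uint64_t typer)
{
    return (unsigned)((typer & TYPER_ITT_ENTRY_SIZE_MASK) >>
                      TYPER_ITT_ENTRY_SIZE_SHIFT) + 1;
}
```

If the 4-bit field assumption holds, a 12 byte entry is advertised as 0xB and a 16 byte entry as 0xF, the largest value the field can carry.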
[PATCH 2/2] hw/intc/arm_gicv3: Bump ITT entry size to 16
Some Operating Systems (like Windows) can only deal with ITT entry sizes that are a power of 2. While the spec allows arbitrarily sized ITT entry sizes, in practice all hardware will use power of 2 because that simplifies offset calculation and ensures that a power of 2 sized region can hold a set of entries without gap at the end. So let's just bump the entry size to 16. That gives us enough space for the 12 bytes of data that we want to have in each ITT entry and makes QEMU look a bit more like real hardware. Signed-off-by: Alexander Graf --- hw/core/machine.c | 4 +++- hw/intc/arm_gicv3_its.c | 3 +-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/hw/core/machine.c b/hw/core/machine.c index f589b92909..d9a3f01ed9 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -40,7 +40,9 @@ #include "hw/virtio/virtio-pci.h" #include "qom/object_interfaces.h" -GlobalProperty hw_compat_7_2[] = {}; +GlobalProperty hw_compat_7_2[] = { +{ "arm-gicv3-its", "itt-entry-size", "12" }, +}; const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2); GlobalProperty hw_compat_7_1[] = { diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c index e7cabeb46c..6754523321 100644 --- a/hw/intc/arm_gicv3_its.c +++ b/hw/intc/arm_gicv3_its.c @@ -2014,8 +2014,7 @@ static void gicv3_its_post_load(GICv3ITSState *s) static Property gicv3_its_props[] = { DEFINE_PROP_LINK("parent-gicv3", GICv3ITSState, gicv3, "arm-gicv3", GICv3State *), -DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size, - MIN_ITS_ITT_ENTRY_SIZE), +DEFINE_PROP_UINT8("itt-entry-size", GICv3ITSState, itt_entry_size, 16), DEFINE_PROP_END_OF_LIST(), }; -- 2.37.1 (Apple Git-137.1)
[PATCH v2 1/2] hw/arm/virt: Consolidate GIC finalize logic
Up to now, the finalize_gic_version() code open coded what is essentially a support bitmap match between host/emulation environment and desired target GIC type. This open coding leads to undesirable side effects. For example, a VM with KVM and -smp 10 will automatically choose GICv3 while the same command line with TCG will stay on GICv2 and fail the launch. This patch combines the TCG and KVM matching code paths by making everything a 2 pass process. First, we determine which GIC versions the current environment is able to support, then we go through a single state machine to determine which target GIC mode that means for us. After this patch, the only user noticable changes should be consolidated error messages as well as TCG -M virt supporting -smp > 8 automatically. Signed-off-by: Alexander Graf --- v1 -> v2: - leave VIRT_GIC_VERSION defines intact, we need them for MADT generation --- hw/arm/virt.c | 199 ++ include/hw/arm/virt.h | 15 ++-- 2 files changed, 113 insertions(+), 101 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 04eb6c201d..7b54387958 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1820,6 +1820,85 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) } } +static VirtGICType finalize_gic_version_do(const char *accel_name, + VirtGICType gic_version, + int gics_supported, + unsigned int max_cpus) +{ +/* Convert host/max/nosel to GIC version number */ +switch (gic_version) { +case VIRT_GIC_VERSION_HOST: +if (kvm_enabled()) { +/* For KVM, -cpu host means -cpu max */ +return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX, + gics_supported, max_cpus); +} + +error_report("gic-version=host requires KVM"); +exit(1); +break; +case VIRT_GIC_VERSION_MAX: +if (gics_supported & VIRT_GIC_VERSION_4_MASK) { +gic_version = VIRT_GIC_VERSION_4; +} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) { +gic_version = VIRT_GIC_VERSION_3; +} else { +gic_version = VIRT_GIC_VERSION_2; +} +break; +case VIRT_GIC_VERSION_NOSEL: +if 
((gics_supported & VIRT_GIC_VERSION_2_MASK) && +max_cpus <= GIC_NCPU) { +gic_version = VIRT_GIC_VERSION_2; +} else if (gics_supported & VIRT_GIC_VERSION_3_MASK) { +/* + * in case the host does not support v2 emulation or + * the end-user requested more than 8 VCPUs we now default + * to v3. In any case defaulting to v2 would be broken. + */ +gic_version = VIRT_GIC_VERSION_3; +} else if (max_cpus > GIC_NCPU) { +error_report("%s only supports GICv2 emulation but more than 8 " + "vcpus are requested", accel_name); +exit(1); +} +break; +case VIRT_GIC_VERSION_2: +case VIRT_GIC_VERSION_3: +case VIRT_GIC_VERSION_4: +break; +} + +/* Check chosen version is effectively supported */ +switch (gic_version) { +case VIRT_GIC_VERSION_2: +if (!(gics_supported & VIRT_GIC_VERSION_2_MASK)) { +error_report("%s does not support GICv2 emulation", accel_name); +exit(1); +} +break; +case VIRT_GIC_VERSION_3: +if (!(gics_supported & VIRT_GIC_VERSION_3_MASK)) { +error_report("%s does not support GICv3 emulation", accel_name); +exit(1); +} +break; +case VIRT_GIC_VERSION_4: +if (!(gics_supported & VIRT_GIC_VERSION_4_MASK)) { +error_report("%s does not support GICv4 emulation, is virtualization=on?", + accel_name); +exit(1); +} +break; +default: +error_report("logic error in finalize_gic_version"); +exit(1); +break; +} + +return gic_version; +} + /* * finalize_gic_version - Determines the final gic_version * according to the gic-version property @@ -1828,118 +1907,46 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) */ static void finalize_gic_version(VirtMachineState *vms) { +const char *accel_name = current_accel_name(); unsigned int max_cpus = MACHINE(vms)->smp.max_cpus; +int gics_supported = 0; -if (kvm_enabled()) { -int probe_bitmap; - -if (!kvm_irqchip_in_kernel()) { -switch (vms->gic_version) { -case VIRT_GIC_VERSION_HOST: -warn_report( -"gic-version=host not relevant with kernel-irqchip=off " -
[PATCH 0/2] hw/arm/virt: Handle HVF in finalize_gic_version()
The finalize_gic_version() function tries to determine which GIC version the current accelerator / host combination supports. During the initial HVF porting efforts, I didn't realize that I also had to touch this function. Then Zenghui brought up this function in reply to my HVF GICv3 enablement patch - and boy, it is a mess.

This patch set cleans up all of the GIC finalization so that we can easily plug HVF in and will hopefully also have a better time extending it in the future. As a second step, it explicitly adds HVF support and fails loudly for any unsupported accelerators.

Alex

v1 -> v2:
- Leave VIRT_GIC_VERSION defines intact, we need them for MADT generation
- Include TCG header for tcg_enabled()

Alexander Graf (2):
hw/arm/virt: Consolidate GIC finalize logic
hw/arm/virt: Make accels in GIC finalize logic explicit

hw/arm/virt.c | 201 ++
include/hw/arm/virt.h | 15 ++--
2 files changed, 116 insertions(+), 100 deletions(-)

--
2.37.1 (Apple Git-137.1)
[PATCH v2 2/2] hw/arm/virt: Make accels in GIC finalize logic explicit
Let's explicitly list out all accelerators that we support when trying to determine the supported set of GIC versions. KVM was already separate, so the only missing one is HVF which simply reuses all of TCG's emulation code and thus has the same compatibility matrix. Signed-off-by: Alexander Graf --- v1 -> v2: - Include TCG header for tcg_enabled() --- hw/arm/virt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 7b54387958..76d8d5cc5a 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -47,6 +47,7 @@ #include "sysemu/numa.h" #include "sysemu/runstate.h" #include "sysemu/tpm.h" +#include "sysemu/tcg.h" #include "sysemu/kvm.h" #include "sysemu/hvf.h" #include "hw/loader.h" @@ -1930,7 +1931,7 @@ static void finalize_gic_version(VirtMachineState *vms) /* KVM w/o kernel irqchip can only deal with GICv2 */ gics_supported |= VIRT_GIC_VERSION_2_MASK; accel_name = "KVM with kernel-irqchip=off"; -} else { +} else if (tcg_enabled() || hvf_enabled()) { gics_supported |= VIRT_GIC_VERSION_2_MASK; if (module_object_class_by_name("arm-gicv3")) { gics_supported |= VIRT_GIC_VERSION_3_MASK; @@ -1939,6 +1940,9 @@ static void finalize_gic_version(VirtMachineState *vms) gics_supported |= VIRT_GIC_VERSION_4_MASK; } } +} else { +error_report("Unsupported accelerator, can not determine GIC support"); +exit(1); } /* -- 2.37.1 (Apple Git-137.1)
Re: [PATCH 1/2] hw/arm/virt: Consolidate GIC finalize logic
Hey Zenghui,

On 21.12.22 04:35, Zenghui Yu wrote: On 2022/12/21 7:04, Alexander Graf wrote:

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index c7dd59d7f1..365d19f7a3 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -109,12 +109,12 @@ typedef enum VirtMSIControllerType {
 } VirtMSIControllerType;
 typedef enum VirtGICType {
- VIRT_GIC_VERSION_MAX,
- VIRT_GIC_VERSION_HOST,
- VIRT_GIC_VERSION_2,
- VIRT_GIC_VERSION_3,
- VIRT_GIC_VERSION_4,
- VIRT_GIC_VERSION_NOSEL,
+ VIRT_GIC_VERSION_MAX = 0,
+ VIRT_GIC_VERSION_HOST = 1,
+ VIRT_GIC_VERSION_NOSEL = 2,
+ VIRT_GIC_VERSION_2 = (1 << 2),
+ VIRT_GIC_VERSION_3 = (1 << 3),
+ VIRT_GIC_VERSION_4 = (1 << 4),

This would break the ACPI case. When building the MADT, we currently write the raw vms->gic_version value into the "GIC version" field of the GICD structure. This happens to work because VIRT_GIC_VERSION_x == x (by luck, I think). We may need to fix build_madt() before this change.

Ouch, thanks a lot for the catch! I don't think it's by luck - the versions are put very deliberately at a place where they equal the GIC number. But I agree that it's missing a comment - I'll add one for clarification and make sure the defines look explicit in v2.

Alex
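The problem raised in this exchange can be shown with a short sketch in C: keep the enum values equal to the GIC version number, because the raw value is what ends up in the MADT, and derive the bitmap masks separately. All names here (SketchGICType, SKETCH_GIC_MASK, sketch_madt_gic_version, sketch_supported) are hypothetical and only approximate what the later revisions of the series do with their _MASK defines.

```c
#include <stdint.h>

/* Selector values kept equal to the architectural GIC version number. */
typedef enum {
    SKETCH_GIC_VERSION_2 = 2,
    SKETCH_GIC_VERSION_3 = 3,
    SKETCH_GIC_VERSION_4 = 4,
} SketchGICType;

/* Separate single-bit masks for the "supported versions" bitmap. */
#define SKETCH_GIC_MASK(v) (1u << (v))

/* Stand-in for the MADT writer: the raw enum value lands in the table. */
static uint8_t sketch_madt_gic_version(SketchGICType v)
{
    return (uint8_t)v;
}

/* Bitmap test used when matching a request against the environment. */
static int sketch_supported(unsigned bitmap, SketchGICType v)
{
    return (bitmap & SKETCH_GIC_MASK(v)) != 0;
}
```

Had the enum itself carried the shifted values as in v1, the table writer in this sketch would emit 8 for GICv3 instead of 3.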
[PATCH 2/2] hw/arm/virt: Make accels in GIC finalize logic explicit
Let's explicitly list out all accelerators that we support when trying to determine the supported set of GIC versions. KVM was already separate, so the only missing one is HVF which simply reuses all of TCG's emulation code and thus has the same compatibility matrix. Signed-off-by: Alexander Graf --- hw/arm/virt.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index c79f5b6a66..b7fb473788 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1929,7 +1929,7 @@ static void finalize_gic_version(VirtMachineState *vms) /* KVM w/o kernel irqchip can only deal with GICv2 */ gics_supported |= VIRT_GIC_VERSION_2; accel_name = "KVM with kernel-irqchip=off"; -} else { +} else if (tcg_enabled() || hvf_enabled()) { gics_supported |= VIRT_GIC_VERSION_2; if (module_object_class_by_name("arm-gicv3")) { gics_supported |= VIRT_GIC_VERSION_3; @@ -1938,6 +1938,9 @@ static void finalize_gic_version(VirtMachineState *vms) gics_supported |= VIRT_GIC_VERSION_4; } } +} else { +error_report("Unsupported accelerator, can not determine GIC support"); +exit(1); } /* -- 2.37.1 (Apple Git-137.1)
[PATCH 1/2] hw/arm/virt: Consolidate GIC finalize logic
Up to now, the finalize_gic_version() code open coded what is essentially a support bitmap match between host/emulation environment and desired target GIC type.

This open coding leads to undesirable side effects. For example, a VM with KVM and -smp 10 will automatically choose GICv3 while the same command line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making everything a 2 pass process. First, we determine which GIC versions the current environment is able to support, then we go through a single state machine to determine which target GIC mode that means for us.

After this patch, the only user noticeable changes should be consolidated error messages as well as TCG -M virt supporting -smp > 8 automatically.

Signed-off-by: Alexander Graf
---
 hw/arm/virt.c         | 198 ++
 include/hw/arm/virt.h |  12 +--
 2 files changed, 108 insertions(+), 102 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 04eb6c201d..c79f5b6a66 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,6 +1820,84 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
     }
 }

+static VirtGICType finalize_gic_version_do(const char *accel_name,
+                                           VirtGICType gic_version,
+                                           int gics_supported,
+                                           unsigned int max_cpus)
+{
+    /* Convert host/max/nosel to GIC version number */
+    switch (gic_version) {
+    case VIRT_GIC_VERSION_HOST:
+        if (kvm_enabled()) {
+            /* For KVM, -cpu host means -cpu max */
+            return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+                                           gics_supported, max_cpus);
+        }
+
+        error_report("gic-version=host requires KVM");
+        exit(1);
+        break;
+    case VIRT_GIC_VERSION_MAX:
+        if (gics_supported & VIRT_GIC_VERSION_4) {
+            gic_version = VIRT_GIC_VERSION_4;
+        } else if (gics_supported & VIRT_GIC_VERSION_3) {
+            gic_version = VIRT_GIC_VERSION_3;
+        } else {
+            gic_version = VIRT_GIC_VERSION_2;
+        }
+        break;
+    case VIRT_GIC_VERSION_NOSEL:
+        if ((gics_supported & VIRT_GIC_VERSION_2) && max_cpus <= GIC_NCPU) {
+            gic_version = VIRT_GIC_VERSION_2;
+        } else if (gics_supported & VIRT_GIC_VERSION_3) {
+            /*
+             * in case the host does not support v2 emulation or
+             * the end-user requested more than 8 VCPUs we now default
+             * to v3. In any case defaulting to v2 would be broken.
+             */
+            gic_version = VIRT_GIC_VERSION_3;
+        } else if (max_cpus > GIC_NCPU) {
+            error_report("%s only supports GICv2 emulation but more than 8 "
+                         "vcpus are requested", accel_name);
+            exit(1);
+        }
+        break;
+    case VIRT_GIC_VERSION_2:
+    case VIRT_GIC_VERSION_3:
+    case VIRT_GIC_VERSION_4:
+        break;
+    }
+
+    /* Check chosen version is effectively supported */
+    switch (gic_version) {
+    case VIRT_GIC_VERSION_2:
+        if (!(gics_supported & VIRT_GIC_VERSION_2)) {
+            error_report("%s does not support GICv2 emulation", accel_name);
+            exit(1);
+        }
+        break;
+    case VIRT_GIC_VERSION_3:
+        if (!(gics_supported & VIRT_GIC_VERSION_3)) {
+            error_report("%s does not support GICv3 emulation", accel_name);
+            exit(1);
+        }
+        break;
+    case VIRT_GIC_VERSION_4:
+        if (!(gics_supported & VIRT_GIC_VERSION_4)) {
+            error_report("%s does not support GICv4 emulation, is virtualization=on?",
+                         accel_name);
+            exit(1);
+        }
+        break;
+    default:
+        error_report("logic error in finalize_gic_version");
+        exit(1);
+        break;
+    }
+
+    return gic_version;
+}
+
 /*
  * finalize_gic_version - Determines the final gic_version
  * according to the gic-version property
@@ -1828,118 +1906,46 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
  */
 static void finalize_gic_version(VirtMachineState *vms)
 {
+    const char *accel_name = current_accel_name();
     unsigned int max_cpus = MACHINE(vms)->smp.max_cpus;
+    int gics_supported = 0;

-    if (kvm_enabled()) {
-        int probe_bitmap;
-
-        if (!kvm_irqchip_in_kernel()) {
-            switch (vms->gic_version) {
-            case VIRT_GIC_VERSION_HOST:
-                warn_report(
-                    "gic-version=host not relevant with kernel-irqchip=off "
-                     "as only userspace GICv2 is supported. Using v2 ...");
-                return;
-            case VIRT_GIC_VERSION_MAX:
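For readers who want the two-pass idea from the commit message in isolation, here is a minimal standalone sketch of the second pass (request plus supported bitmap in, one version out). It is not the QEMU code: the flag names, the request enum, pick_gic() and GICV2_MAX_CPUS are hypothetical stand-ins, and it returns 0 where the patch calls error_report() and exits.

```c
#include <assert.h>

/* Hypothetical bit flags, one per GIC version the environment can provide. */
enum {
    GIC_V2 = 1 << 0,
    GIC_V3 = 1 << 1,
    GIC_V4 = 1 << 2,
};

/* Hypothetical request values standing in for the gic-version property. */
enum gic_request { REQ_NOSEL, REQ_MAX, REQ_V2, REQ_V3, REQ_V4 };

/* GICv2 handles at most 8 CPUs; the patch's error message uses the same limit. */
#define GICV2_MAX_CPUS 8u

/*
 * Pass 2: turn a request plus the supported bitmap (computed in pass 1)
 * into one concrete version flag. Returns 0 where the real code would
 * report an error and exit.
 */
static int pick_gic(enum gic_request req, int supported, unsigned int max_cpus)
{
    int chosen = 0;

    assert(max_cpus > 0);

    switch (req) {
    case REQ_MAX:
        /* Highest supported version, falling back to v2. */
        chosen = (supported & GIC_V4) ? GIC_V4 :
                 (supported & GIC_V3) ? GIC_V3 : GIC_V2;
        break;
    case REQ_NOSEL:
        /* Prefer v2 when it fits, otherwise move up to v3. */
        if ((supported & GIC_V2) && max_cpus <= GICV2_MAX_CPUS) {
            chosen = GIC_V2;
        } else if (supported & GIC_V3) {
            chosen = GIC_V3;
        }
        break;
    case REQ_V2:
        chosen = GIC_V2;
        break;
    case REQ_V3:
        chosen = GIC_V3;
        break;
    case REQ_V4:
        chosen = GIC_V4;
        break;
    }

    /* Final check: the chosen version must be in the supported bitmap. */
    return (supported & chosen) ? chosen : 0;
}
```

With a bitmap of GIC_V2 | GIC_V3 and 10 CPUs, the unselected case lands on GIC_V3 regardless of which accelerator produced the bitmap, which is the behaviour change the commit message describes for TCG.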
[PATCH 0/2] hw/arm/virt: Handle HVF in finalize_gic_version()
The finalize_gic_version() function tries to determine which GIC version the current accelerator / host combination supports. During the initial HVF porting efforts, I didn't realize that I also had to touch this function. Then Zenghui brought up this function as reply to my HVF GICv3 enablement patch - and boy it is a mess.

This patch set cleans up all of the GIC finalization so that we can easily plug HVF in and also hopefully will have a better time extending it in the future. As second step, it explicitly adds HVF support and fails loudly for any unsupported accelerators.

Alex

Alexander Graf (2):
  hw/arm/virt: Consolidate GIC finalize logic
  hw/arm/virt: Make accels in GIC finalize logic explicit

 hw/arm/virt.c         | 199 ++
 include/hw/arm/virt.h |  12 +--
 2 files changed, 110 insertions(+), 101 deletions(-)

--
2.37.1 (Apple Git-137.1)
Re: [PATCH 1/5] target/arm: only build psci for TCG
On 20.12.22 14:53, Fabiano Rosas wrote: Alexander Graf writes: Hey Fabiano, On 19.12.22 12:42, Fabiano Rosas wrote: Claudio Fontana writes: Ciao Alex, On 12/19/22 11:47, Alexander Graf wrote: Hey Claudio, On 19.12.22 09:37, Claudio Fontana wrote: On 12/16/22 22:59, Alexander Graf wrote: Hi Claudio, If the PSCI implementation becomes TCG only, can we also move to a tcg accel directory? It slowly gets super confusing to keep track of which files are supposed to be generic target code and which ones TCG specific. Alex Hi Alex, Fabiano, Peter and all, that was the plan but at the time of: https://lore.kernel.org/all/20210416162824.25131-1-cfont...@suse.de/ Peter mentioned that HVF AArch64 might use that code too: https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg00509.html so from v2 to v3 the series changed to not move the code under tcg/ , expecting HVF to be reusing that code "soon". I see that your hvf code in hvf/ implements psci, is there some plan to reuse pieces from the tcg implementation now? I originally reused the PSCI code in earlier versions of my hvf patch set, but then we realized that some functions like remote CPU reset are wired in a TCG specific view of the world with full target CPU register ownership. So if we want to actually share it, we'll need to abstract it up a level. Hence I'd suggest to move it to a TCG directory for now and then later move it back into a generic helper if we want / need to. The code just simply isn't generic yet. Or alternatively, you create a patch (set) to actually merge the 2 implementations into a generic one again which then can live at a generic place :) Alex Thanks for the clarification, I'll leave the choice up to Fabiano now, since he is working on the series currently :-) Ciao, Claudio Hello, thank you all for the comments. I like the idea of merging the two implementations. However, I won't get to it anytime soon.
There's still ~70 patches in the original series that I need to understand, rebase and test, including the introduction of the tcg directory. Sure, I am definitely fine with leaving them separate for now as well :). I'd say we merge this as is now, since this patch has no dependencies. Later when I introduce the tcg directory I can move the code there along with the other tcg-only files. I'll take note to come back to the PSCI code as well. I'm confused about the patch ordering :). Why is it easier to move the psci.c compilation target from generic to an if(CONFIG_TCG) only to later move it into a tcg/ directory? It's a simple patch, so the overhead didn't cross my mind. But you are right, this could go directly into tcg/ without having to put it under CONFIG_TCG first. I'm sure more like this will follow, and it will be a lot easier on everyone if the pattern is going to be "move tcg specific code to tcg/ and leave generic code in the main directory". Wouldn't it be easier to create a tcg/ directory from the start and just put it there? I don't know about "from the start". At this point we cannot have a single commit moving everything into the tcg/ directory because some files still contain tcg + non-tcg code. Yes, the only thing the initial commit at the start would do is create the directory and ninja config, pretty much nothing else. All follow-on commits then split the currently entangled code into tcg/ once code is clearly separate :). I believe this was also the approach Claudio took in his patch set last year, and I find it very reasonable. It allows you to stop at any point mid-way. We would end up with several commits moving files into tcg/ along the history, which I think results in a poor experience when inspecting the log later on (git blame and so on). So my idea was to split as much as I can upfront and only later move everything into the directory. Quite the opposite: Please make sure to move everything slowly at a digestible pace. 
There is no way you will get 100 patches in at once. Make sure you can cut off at any point in between. What you basically want is to move from "target/arm is tcg+generic code" to "target/arm is generic, target/arm/tcg is tcg code". You will be in a transitional phase along the way whatever you do, so just make it "target/arm is tcg+generic code, target/arm/tcg is tcg code" while things are in flight and have a final commit that indicates the conversion is done. I'm also rebasing this series [1] from 2021, which means I'd rather have small chunks of code moved under CONFIG_TCG that I can build-test with --disable-tcg (even though the build doesn't finish, I can see the number of errors going down), than to move non-tcg code into tcg/ and then pull it out later like in the original series. I think we're saying the same thing. Please don't move
Re: [PATCH] hvf: arm: Add support for GICv3
Hi Zenghui,

On 20.12.22 08:14, Zenghui Yu wrote:
> On 2022/12/20 6:08, Alexander Graf wrote:
>> We currently only support GICv2 emulation.
>
> Before looking into it, I think it's worth finalizing the GIC version in the hvf case - only v2 is allowed and fail early if user selects the unsupported versions. Currently finalize_gic_version() does not deal with hvf at all.

Currently finalize_gic_version() treats HVF the same as TCG, which is incorrect. However, with this patch applied, they happen to match.

I don't think it's worth changing the finalize_gic_version() implementation to reflect the gicv2 only state for HVF. Instead, let's rather get GICv3 support in and then add explicit handling for HVF on top.

Alex
Re: [PATCH 1/5] target/arm: only build psci for TCG
Hey Fabiano,

On 19.12.22 12:42, Fabiano Rosas wrote:
> Claudio Fontana writes:
>> Ciao Alex,
>>
>> On 12/19/22 11:47, Alexander Graf wrote:
>>> Hey Claudio,
>>>
>>> On 19.12.22 09:37, Claudio Fontana wrote:
>>>> On 12/16/22 22:59, Alexander Graf wrote:
>>>>> Hi Claudio,
>>>>>
>>>>> If the PSCI implementation becomes TCG only, can we also move to a tcg accel directory? It slowly gets super confusing to keep track of which files are supposed to be generic target code and which ones TCG specific.
>>>>>
>>>>> Alex
>>>>
>>>> Hi Alex, Fabiano, Peter and all,
>>>>
>>>> that was the plan but at the time of:
>>>> https://lore.kernel.org/all/20210416162824.25131-1-cfont...@suse.de/
>>>> Peter mentioned that HVF AArch64 might use that code too:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg00509.html
>>>> so from v2 to v3 the series changed to not move the code under tcg/ , expecting HVF to be reusing that code "soon".
>>>>
>>>> I see that your hvf code in hvf/ implements psci, is there some plan to reuse pieces from the tcg implementation now?
>>>
>>> I originally reused the PSCI code in earlier versions of my hvf patch set, but then we realized that some functions like remote CPU reset are wired in a TCG specific view of the world with full target CPU register ownership. So if we want to actually share it, we'll need to abstract it up a level. Hence I'd suggest to move it to a TCG directory for now and then later move it back into a generic helper if we want / need to. The code just simply isn't generic yet.
>>>
>>> Or alternatively, you create a patch (set) to actually merge the 2 implementations into a generic one again which then can live at a generic place :)
>>>
>>> Alex
>>
>> Thanks for the clarification, I'll leave the choice up to Fabiano now, since he is working on the series currently :-)
>>
>> Ciao,
>> Claudio
>
> Hello, thank you all for the comments.
>
> I like the idea of merging the two implementations. However, I won't get to it anytime soon. There's still ~70 patches in the original series that I need to understand, rebase and test, including the introduction of the tcg directory.

Sure, I am definitely fine with leaving them separate for now as well :).

> I'd say we merge this as is now, since this patch has no dependencies. Later when I introduce the tcg directory I can move the code there along with the other tcg-only files. I'll take note to come back to the PSCI code as well.

I'm confused about the patch ordering :). Why is it easier to move the psci.c compilation target from generic to an if(CONFIG_TCG) only to later move it into a tcg/ directory? Wouldn't it be easier to create a tcg/ directory from the start and just put it there? The current approach just looks like duplicate effort to me.

Alex
[PATCH] hvf: arm: Add support for GICv3
We currently only support GICv2 emulation. To also support GICv3, we will need to pass a few system registers into their respective handler functions. This patch adds support for HVF to call into the TCG callbacks for GICv3 system register handlers. This is safe because the GICv3 TCG code is generic as long as we limit ourselves to EL0 and EL1 - which are the only modes supported by HVF. To make sure nobody trips over that, we also annotate callbacks that don't work in HVF mode, such as EL state change hooks. With GICv3 support in place, we can run with more than 8 vCPUs. Signed-off-by: Alexander Graf --- hw/intc/arm_gicv3_cpuif.c | 8 +- target/arm/hvf/hvf.c| 151 target/arm/hvf/trace-events | 2 + 3 files changed, 160 insertions(+), 1 deletion(-) diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c index b17b29288c..b4e387268c 100644 --- a/hw/intc/arm_gicv3_cpuif.c +++ b/hw/intc/arm_gicv3_cpuif.c @@ -21,6 +21,7 @@ #include "hw/irq.h" #include "cpu.h" #include "target/arm/cpregs.h" +#include "sysemu/tcg.h" /* * Special case return value from hppvi_index(); must be larger than @@ -2810,6 +2811,8 @@ void gicv3_init_cpuif(GICv3State *s) * which case we'd get the wrong value. * So instead we define the regs with no ri->opaque info, and * get back to the GICv3CPUState from the CPUARMState. + * + * These CP regs callbacks can be called from either TCG or HVF code. 
*/ define_arm_cp_regs(cpu, gicv3_cpuif_reginfo); @@ -2905,6 +2908,9 @@ void gicv3_init_cpuif(GICv3State *s) define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo); } } -arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs); +if (tcg_enabled()) { +/* We can only trap EL changes with TCG for now */ +arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs); +} } } diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c index 060aa0ccf4..8ea4be5f30 100644 --- a/target/arm/hvf/hvf.c +++ b/target/arm/hvf/hvf.c @@ -80,6 +80,33 @@ #define SYSREG_PMCCNTR_EL0SYSREG(3, 3, 9, 13, 0) #define SYSREG_PMCCFILTR_EL0 SYSREG(3, 3, 14, 15, 7) +#define SYSREG_ICC_AP0R0_EL1 SYSREG(3, 0, 12, 8, 4) +#define SYSREG_ICC_AP0R1_EL1 SYSREG(3, 0, 12, 8, 5) +#define SYSREG_ICC_AP0R2_EL1 SYSREG(3, 0, 12, 8, 6) +#define SYSREG_ICC_AP0R3_EL1 SYSREG(3, 0, 12, 8, 7) +#define SYSREG_ICC_AP1R0_EL1 SYSREG(3, 0, 12, 9, 0) +#define SYSREG_ICC_AP1R1_EL1 SYSREG(3, 0, 12, 9, 1) +#define SYSREG_ICC_AP1R2_EL1 SYSREG(3, 0, 12, 9, 2) +#define SYSREG_ICC_AP1R3_EL1 SYSREG(3, 0, 12, 9, 3) +#define SYSREG_ICC_ASGI1R_EL1SYSREG(3, 0, 12, 11, 6) +#define SYSREG_ICC_BPR0_EL1 SYSREG(3, 0, 12, 8, 3) +#define SYSREG_ICC_BPR1_EL1 SYSREG(3, 0, 12, 12, 3) +#define SYSREG_ICC_CTLR_EL1 SYSREG(3, 0, 12, 12, 4) +#define SYSREG_ICC_DIR_EL1 SYSREG(3, 0, 12, 11, 1) +#define SYSREG_ICC_EOIR0_EL1 SYSREG(3, 0, 12, 8, 1) +#define SYSREG_ICC_EOIR1_EL1 SYSREG(3, 0, 12, 12, 1) +#define SYSREG_ICC_HPPIR0_EL1SYSREG(3, 0, 12, 8, 2) +#define SYSREG_ICC_HPPIR1_EL1SYSREG(3, 0, 12, 12, 2) +#define SYSREG_ICC_IAR0_EL1 SYSREG(3, 0, 12, 8, 0) +#define SYSREG_ICC_IAR1_EL1 SYSREG(3, 0, 12, 12, 0) +#define SYSREG_ICC_IGRPEN0_EL1 SYSREG(3, 0, 12, 12, 6) +#define SYSREG_ICC_IGRPEN1_EL1 SYSREG(3, 0, 12, 12, 7) +#define SYSREG_ICC_PMR_EL1 SYSREG(3, 0, 4, 6, 0) +#define SYSREG_ICC_RPR_EL1 SYSREG(3, 0, 12, 11, 3) +#define SYSREG_ICC_SGI0R_EL1 SYSREG(3, 0, 12, 11, 7) +#define SYSREG_ICC_SGI1R_EL1 SYSREG(3, 0, 12, 11, 5) +#define 
SYSREG_ICC_SRE_EL1 SYSREG(3, 0, 12, 12, 5) + #define WFX_IS_WFE (1 << 0) #define TMR_CTL_ENABLE (1 << 0) @@ -788,6 +815,43 @@ static bool is_id_sysreg(uint32_t reg) SYSREG_CRM(reg) < 8; } +static uint32_t hvf_reg2cp_reg(uint32_t reg) +{ +return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP, + (reg >> 10) & 0xf, + (reg >> 1) & 0xf, + (reg >> 20) & 0x3, + (reg >> 14) & 0x7, + (reg >> 17) & 0x7); +} + +static bool hvf_sysreg_read_cp(CPUState *cpu, uint32_t reg, uint64_t *val) +{ +ARMCPU *arm_cpu = ARM_CPU(cpu); +CPUARMState *env = &arm_cpu->env; +const ARMCPRegInfo *ri; + +ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg)); +if (ri) { +if (ri->accessfn) { +if (ri->accessfn(env, ri, true) != CP_ACCESS_OK) { +return false; +} +} +if (ri->type & ARM_CP_CONST) { +*val = ri->resetvalue; +} else if (ri->readfn) { +
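To make the bit shuffling in hvf_reg2cp_reg() easier to follow, here is a small standalone sketch of the field layout implied by the shifts and masks in that hunk. It is an illustration only: struct sysreg_fields, sysreg_pack() and sysreg_unpack() are hypothetical names, and mapping the five extracted values to crn, crm, op0, op1, op2 (in that order) is my reading of the hunk, which does not name them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the five system register encoding fields. */
struct sysreg_fields {
    unsigned int op0, op1, crn, crm, op2;
};

/*
 * Pack the fields using the bit positions that hvf_reg2cp_reg() reads:
 * crm at bit 1, crn at bit 10, op1 at bit 14, op2 at bit 17, op0 at bit 20.
 */
static uint32_t sysreg_pack(unsigned int op0, unsigned int op1,
                            unsigned int crn, unsigned int crm,
                            unsigned int op2)
{
    return ((uint32_t)(op0 & 0x3) << 20) |
           ((uint32_t)(op2 & 0x7) << 17) |
           ((uint32_t)(op1 & 0x7) << 14) |
           ((uint32_t)(crn & 0xf) << 10) |
           ((uint32_t)(crm & 0xf) << 1);
}

/* Reverse of sysreg_pack(), using the same shifts and masks as the patch. */
static struct sysreg_fields sysreg_unpack(uint32_t reg)
{
    struct sysreg_fields f = {
        .op0 = (reg >> 20) & 0x3,
        .op1 = (reg >> 14) & 0x7,
        .crn = (reg >> 10) & 0xf,
        .crm = (reg >> 1) & 0xf,
        .op2 = (reg >> 17) & 0x7,
    };
    return f;
}
```

Packing (3, 0, 12, 12, 5), the values the patch lists for SYSREG_ICC_SRE_EL1, and unpacking the result returns the same five fields, since the bit ranges do not overlap.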
Re: [PATCH 1/5] target/arm: only build psci for TCG
Hey Claudio,

On 19.12.22 09:37, Claudio Fontana wrote:
> On 12/16/22 22:59, Alexander Graf wrote:
>> Hi Claudio,
>>
>> If the PSCI implementation becomes TCG only, can we also move to a tcg accel directory? It slowly gets super confusing to keep track of which files are supposed to be generic target code and which ones TCG specific.
>>
>> Alex
>
> Hi Alex, Fabiano, Peter and all,
>
> that was the plan but at the time of:
> https://lore.kernel.org/all/20210416162824.25131-1-cfont...@suse.de/
> Peter mentioned that HVF AArch64 might use that code too:
> https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg00509.html
> so from v2 to v3 the series changed to not move the code under tcg/ , expecting HVF to be reusing that code "soon".
>
> I see that your hvf code in hvf/ implements psci, is there some plan to reuse pieces from the tcg implementation now?

I originally reused the PSCI code in earlier versions of my hvf patch set, but then we realized that some functions like remote CPU reset are wired in a TCG specific view of the world with full target CPU register ownership. So if we want to actually share it, we'll need to abstract it up a level. Hence I'd suggest to move it to a TCG directory for now and then later move it back into a generic helper if we want / need to. The code just simply isn't generic yet.

Or alternatively, you create a patch (set) to actually merge the 2 implementations into a generic one again which then can live at a generic place :)

Alex
Re: [PATCH 1/5] target/arm: only build psci for TCG
Hi Claudio,

If the PSCI implementation becomes TCG only, can we also move to a tcg accel directory? It slowly gets super confusing to keep track of which files are supposed to be generic target code and which ones TCG specific.

Alex

> On 16.12.2022 at 22:37, Fabiano Rosas wrote:
>
> From: Claudio Fontana
>
> Signed-off-by: Claudio Fontana
> Cc: Alexander Graf
> Reviewed-by: Richard Henderson
> Reviewed-by: Alex Bennée
> Signed-off-by: Fabiano Rosas
> ---
> Originally from:
> [RFC v14 09/80] target/arm: only build psci for TCG
> https://lore.kernel.org/r/20210416162824.25131-10-cfont...@suse.de
> ---
> target/arm/meson.build | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/meson.build b/target/arm/meson.build
> index 87e911b27f..26e425418f 100644
> --- a/target/arm/meson.build
> +++ b/target/arm/meson.build
> @@ -61,10 +61,13 @@ arm_softmmu_ss.add(files(
> 'arm-powerctl.c',
> 'machine.c',
> 'monitor.c',
> - 'psci.c',
> 'ptw.c',
> ))
>
> +arm_softmmu_ss.add(when: 'CONFIG_TCG', if_true: files(
> + 'psci.c',
> +))
> +
> subdir('hvf')
>
> target_arch += {'arm': arm_ss}
> --
> 2.35.3
>
[PATCH 2/3] i386: kvm: Add support for MSR filtering
KVM has grown support to deflect arbitrary MSRs to user space since Linux 5.10. For now we don't expect to make a lot of use of this feature, so let's expose it the easiest way possible: With up to 16 individually maskable MSRs. This patch adds a kvm_filter_msr() function that other code can call to install a hook on KVM MSR reads or writes. Signed-off-by: Alexander Graf --- target/i386/kvm/kvm.c | 124 + target/i386/kvm/kvm_i386.h | 11 2 files changed, 135 insertions(+) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index a1fd1f5379..ea53092dd0 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -139,6 +139,8 @@ static struct kvm_cpuid2 *cpuid_cache; static struct kvm_cpuid2 *hv_cpuid_cache; static struct kvm_msr_list *kvm_feature_msrs; +static KVMMSRHandlers msr_handlers[KVM_MSR_FILTER_MAX_RANGES]; + #define BUS_LOCK_SLICE_TIME 10ULL /* ns */ static RateLimit bus_lock_ratelimit_ctrl; static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value); @@ -2588,6 +2590,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) { +ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0, +KVM_MSR_EXIT_REASON_FILTER); +if (ret) { +error_report("Could not enable user space MSRs: %s", + strerror(-ret)); +exit(1); +} +} + return 0; } @@ -5077,6 +5089,108 @@ void kvm_arch_update_guest_debug(CPUState *cpu, struct kvm_guest_debug *dbg) } } +static bool kvm_install_msr_filters(KVMState *s) +{ +uint64_t zero = 0; +struct kvm_msr_filter filter = { +.flags = KVM_MSR_FILTER_DEFAULT_ALLOW, +}; +int r, i, j = 0; + +for (i = 0; i < KVM_MSR_FILTER_MAX_RANGES; i++) { +KVMMSRHandlers *handler = &msr_handlers[i]; +if (handler->msr) { +struct kvm_msr_filter_range *range = &filter.ranges[j++]; + +*range = (struct kvm_msr_filter_range) { +.flags = 0, +.nmsrs = 1, +.base = handler->msr, +.bitmap = (__u8 *)&zero, +}; + +if (handler->rdmsr) { +range->flags |= KVM_MSR_FILTER_READ; +} + +if (handler->wrmsr) 
{ +range->flags |= KVM_MSR_FILTER_WRITE; +} +} +} + +r = kvm_vm_ioctl(s, KVM_X86_SET_MSR_FILTER, &filter); +if (r) { +return false; +} + +return true; +} + +bool kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr, +QEMUWRMSRHandler *wrmsr) +{ +int i; + +for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) { +if (!msr_handlers[i].msr) { +msr_handlers[i] = (KVMMSRHandlers) { +.msr = msr, +.rdmsr = rdmsr, +.wrmsr = wrmsr, +}; + +if (!kvm_install_msr_filters(s)) { +msr_handlers[i] = (KVMMSRHandlers) { }; +return false; +} + +return true; +} +} + +return false; +} + +static int kvm_handle_rdmsr(X86CPU *cpu, struct kvm_run *run) +{ +int i; +bool r; + +for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) { +KVMMSRHandlers *handler = &msr_handlers[i]; +if (run->msr.index == handler->msr) { +if (handler->rdmsr) { +r = handler->rdmsr(cpu, handler->msr, + (uint64_t *)&run->msr.data); +run->msr.error = r ? 0 : 1; +return 0; +} +} +} + +assert(false); +} + +static int kvm_handle_wrmsr(X86CPU *cpu, struct kvm_run *run) +{ +int i; +bool r; + +for (i = 0; i < ARRAY_SIZE(msr_handlers); i++) { +KVMMSRHandlers *handler = &msr_handlers[i]; +if (run->msr.index == handler->msr) { +if (handler->wrmsr) { +r = handler->wrmsr(cpu, handler->msr, run->msr.data); +run->msr.error = r ? 0 : 1; +return 0; +} +} +} + +assert(false); +} + static bool has_sgx_provisioning; static bool __kvm_enable_sgx_provisioning(KVMState *s) @@ -5176,6 +5290,16 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) /* already handled in kvm_arch_post_run */ ret = 0; break; +case KVM_EXIT_X86_RDMSR: +/* We only enable MSR filtering, any other exit is bogus */ +assert(run->msr.reason == KVM_MSR_EXIT_REASON_FILTER); +ret = kvm_handle_rdmsr(cpu, run); +break; +case KVM_EXIT_X86_WRMSR: +/* We only enable MSR filtering, any other exit is bogus */ +assert(run->msr.re
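The registration and dispatch pattern in this patch (a fixed table of handler slots, first free slot wins, lookup by MSR index on exit) can be sketched without any KVM dependency as below. This is a simplified illustration, not the patch itself: all names are hypothetical, the table size of 16 follows the commit message, and the ioctl that installs the filter in the kernel is left out. Like the patch, it treats a slot whose msr field is 0 as free, so MSR index 0 cannot be registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed slot count, matching the "up to 16" in the commit message. */
#define MAX_MSR_HANDLERS 16

/* Hypothetical handler signatures; the real ones also take a CPU pointer. */
typedef bool (*rdmsr_fn)(uint32_t msr, uint64_t *val);
typedef bool (*wrmsr_fn)(uint32_t msr, uint64_t val);

struct msr_handler {
    uint32_t msr;      /* 0 marks a free slot */
    rdmsr_fn rdmsr;
    wrmsr_fn wrmsr;
};

static struct msr_handler handlers[MAX_MSR_HANDLERS];

/* Claim the first free slot; fails once all slots are taken. */
static bool msr_register(uint32_t msr, rdmsr_fn rd, wrmsr_fn wr)
{
    assert(msr != 0);

    for (size_t i = 0; i < MAX_MSR_HANDLERS; i++) {
        if (handlers[i].msr == 0) {
            handlers[i] = (struct msr_handler) {
                .msr = msr,
                .rdmsr = rd,
                .wrmsr = wr,
            };
            return true;
        }
    }
    return false;
}

/* Route a read to the matching handler; false means nobody handled it. */
static bool msr_dispatch_read(uint32_t msr, uint64_t *val)
{
    for (size_t i = 0; i < MAX_MSR_HANDLERS; i++) {
        if (handlers[i].msr == msr && handlers[i].rdmsr) {
            return handlers[i].rdmsr(msr, val);
        }
    }
    return false;
}

/* Example handler used only to exercise the table. */
static bool read_const_42(uint32_t msr, uint64_t *val)
{
    (void)msr;
    *val = 42;
    return true;
}
```

One difference worth noting: the sketch returns false for an unmatched read, whereas the patch asserts, because with only filtered MSRs enabled an unmatched exit should never happen there.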
[PATCH 3/3] KVM: x86: Implement MSR_CORE_THREAD_COUNT MSR
The MSR_CORE_THREAD_COUNT MSR describes CPU package topology, such as number of threads and cores for a given package. This is information that QEMU has readily available and can provide through the new user space MSR deflection interface.

This patch propagates the existing hvf logic from patch 027ac0cb516 ("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") to KVM.

Signed-off-by: Alexander Graf
---
 target/i386/kvm/kvm.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ea53092dd0..791e995389 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2403,6 +2403,17 @@ static int kvm_get_supported_msrs(KVMState *s)
     return ret;
 }

+static bool kvm_rdmsr_core_thread_count(X86CPU *cpu, uint32_t msr,
+                                        uint64_t *val)
+{
+    CPUState *cs = CPU(cpu);
+
+    *val = cs->nr_threads * cs->nr_cores; /* thread count, bits 15..0 */
+    *val |= ((uint32_t)cs->nr_cores << 16); /* core count, bits 31..16 */
+
+    return true;
+}
+
 static Notifier smram_machine_done;
 static KVMMemoryListener smram_listener;
 static AddressSpace smram_address_space;
@@ -2591,6 +2602,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     }

     if (kvm_vm_check_extension(s, KVM_CAP_X86_USER_SPACE_MSR)) {
+        bool r;
+
         ret = kvm_vm_enable_cap(s, KVM_CAP_X86_USER_SPACE_MSR, 0,
                                 KVM_MSR_EXIT_REASON_FILTER);
         if (ret) {
@@ -2598,6 +2611,14 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
                          strerror(-ret));
             exit(1);
         }
+
+        r = kvm_filter_msr(s, MSR_CORE_THREAD_COUNT,
+                           kvm_rdmsr_core_thread_count, NULL);
+        if (!r) {
+            error_report("Could not install MSR_CORE_THREAD_COUNT handler: %s",
+                         strerror(-ret));
+            exit(1);
+        }
     }

     return 0;
--
2.37.0 (Apple Git-136)
[PATCH 0/3] Add TCG & KVM support for MSR_CORE_THREAD_COUNT
Commit 027ac0cb516 ("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") added support for the MSR_CORE_THREAD_COUNT MSR to HVF. This MSR is mandatory to execute macOS when run with -cpu host,+hypervisor.

This patch set adds support for the very same MSR to TCG as well as KVM - as long as host KVM is recent enough to support MSR trapping.

With this support added, I can successfully execute macOS guests in KVM with an APFS enabled OVMF build, a valid applesmc plus OSK and

  -cpu Skylake-Client,+invtsc,+hypervisor

Alex

Alexander Graf (3):
  x86: Implement MSR_CORE_THREAD_COUNT MSR
  i386: kvm: Add support for MSR filtering
  KVM: x86: Implement MSR_CORE_THREAD_COUNT MSR

 target/i386/kvm/kvm.c                | 145 +++
 target/i386/kvm/kvm_i386.h           |  11 ++
 target/i386/tcg/sysemu/misc_helper.c |   5 +
 3 files changed, 161 insertions(+)

--
2.37.0 (Apple Git-136)
[PATCH 1/3] x86: Implement MSR_CORE_THREAD_COUNT MSR
Intel CPUs starting with Haswell-E implement a new MSR called MSR_CORE_THREAD_COUNT which exposes the number of threads and cores inside of a package.

This MSR is used by XNU to populate internal data structures and not implementing it prevents virtual machines with more than 1 vCPU from booting if the emulated CPU generation is at least Haswell-E.

This patch propagates the existing hvf logic from patch 027ac0cb516 ("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") to TCG.

Signed-off-by: Alexander Graf
---
 target/i386/tcg/sysemu/misc_helper.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index 1328aa656f..e1528b7f80 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -450,6 +450,11 @@ void helper_rdmsr(CPUX86State *env)
     case MSR_IA32_UCODE_REV:
         val = x86_cpu->ucode_rev;
         break;
+    case MSR_CORE_THREAD_COUNT: {
+        CPUState *cs = CPU(x86_cpu);
+        val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL &&
             (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
--
2.37.0 (Apple Git-136)
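As a worked example of the value this MSR returns, the following standalone sketch packs the total thread count into bits 15..0 and the core count into bits 31..16, the layout used in the hunk above. core_thread_count() and its parameter names are hypothetical; unlike the patch, the sketch masks both fields to 16 bits so an oversized count cannot spill into the neighbouring field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the MSR_CORE_THREAD_COUNT value for one package:
 *   bits 15..0  = total threads (cores * threads per core)
 *   bits 31..16 = cores
 */
static uint64_t core_thread_count(uint32_t nr_cores, uint32_t threads_per_core)
{
    uint64_t threads = (uint64_t)nr_cores * threads_per_core;

    return (threads & 0xffff) | ((uint64_t)(nr_cores & 0xffff) << 16);
}
```

For 4 cores with 2 threads each, this gives 8 in the low half and 4 in the high half, that is 0x00040008.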
Re: [PATCH v2 03/11] target/arm: ensure HVF traps set appropriate MemTxAttrs
On 26.09.22 15:38, Alex Bennée wrote:
> As most HVF devices are done purely in software we need to make sure we properly encode the source CPU in MemTxAttrs. This will allow the device emulations to use those attributes rather than relying on current_cpu (although current_cpu will still be correct in this case).
>
> Signed-off-by: Alex Bennée
> Cc: Mads Ynddal
> Cc: Alexander Graf
> ---
>  target/arm/hvf/hvf.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
> index 060aa0ccf4..13b7971560 100644
> --- a/target/arm/hvf/hvf.c
> +++ b/target/arm/hvf/hvf.c
> @@ -1233,11 +1233,11 @@ int hvf_vcpu_exec(CPUState *cpu)
>              val = hvf_get_reg(cpu, srt);
>              address_space_write(&address_space_memory,
>                                  hvf_exit->exception.physical_address,
> -                                MEMTXATTRS_UNSPECIFIED, &val, len);
> +                                MEMTXATTRS_CPU(cpu->cpu_index), &val, len);

I think it would make a safer API if MEMTXATTRS_CPU() took CPUState * as its argument, so you can just pass in cpu here.

For the HVF part however,

Acked-by: Alexander Graf

Alex