Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On 07/05/2024 3:45 pm, Roger Pau Monné wrote: > On Tue, May 07, 2024 at 03:31:19PM +0100, Andrew Cooper wrote: >> On 07/05/2024 3:24 pm, Roger Pau Monné wrote: >>> On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote: >>>> Ever since Xen 4.14, there has been a latent bug with migration. >>>> >>>> While some toolstacks can level the features properly, they don't shrink >>>> feat.max_subleaf when all features have been dropped. This is because >>>> we *still* have not completed the toolstack side work for full CPU Policy >>>> objects. >>>> >>>> As a consequence, even when properly feature levelled, VMs can't migrate >>>> "backwards" across hardware which reduces feat.max_subleaf. One such >>>> example >>>> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). >>>> >>>> Extend the max policies feat.max_subleaf to the highest number Xen knows >>>> about, but leave the default policies matching the host. This will allow >>>> VMs >>>> with a higher feat.max_subleaf than strictly necessary to migrate in. >>>> >>>> Eventually we'll manage to teach the toolstack how to avoid creating such >>>> VMs >>>> in the first place, but there's still more work to do there. >>>> >>>> Signed-off-by: Andrew Cooper >>> Acked-by: Roger Pau Monné >> Thanks. >> >>> Even if we have just found one glitch with PSFD and Ice Lake vs >>> Cascade Lake, wouldn't it be safer to always extend the max policies >>> max leafs and subleafs to match the known array sizes? >> This is the final max leaf (containing feature information) to gain >> custom handling, I think? > Couldn't the same happen with extended leaves? Some of the extended > leaves contain features, and hence for policy leveling toolstack might > decide to zero them, yet extd.max_leaf won't be adjusted. Hmm. Right now, extd max leaf is also the one with the bit that we unconditionally advertise, and it's inherited all the way from the host policy.
So yes, in principle, but anything that bumps this limit is going to have other implications too, and I'd prefer not to second-guess them at this point. I hope we can get the toolstack side fixes before this becomes a real problem... ~Andrew
Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On 07/05/2024 3:24 pm, Roger Pau Monné wrote: > On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote: >> Ever since Xen 4.14, there has been a latent bug with migration. >> >> While some toolstacks can level the features properly, they don't shrink >> feat.max_subleaf when all features have been dropped. This is because >> we *still* have not completed the toolstack side work for full CPU Policy >> objects. >> >> As a consequence, even when properly feature levelled, VMs can't migrate >> "backwards" across hardware which reduces feat.max_subleaf. One such example >> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). >> >> Extend the max policies feat.max_subleaf to the highest number Xen knows >> about, but leave the default policies matching the host. This will allow VMs >> with a higher feat.max_subleaf than strictly necessary to migrate in. >> >> Eventually we'll manage to teach the toolstack how to avoid creating such VMs >> in the first place, but there's still more work to do there. >> >> Signed-off-by: Andrew Cooper > Acked-by: Roger Pau Monné Thanks. > > Even if we have just found one glitch with PSFD and Ice Lake vs > Cascade Lake, wouldn't it be safer to always extend the max policies > max leafs and subleafs to match the known array sizes? This is the final max leaf (containing feature information) to gain custom handling, I think? ~Andrew
Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
On 07/05/2024 3:23 pm, Marek Marczykowski-Górecki wrote: > On Tue, May 07, 2024 at 03:15:48PM +0100, Andrew Cooper wrote: >> On 07/05/2024 12:32 pm, Marek Marczykowski-Górecki wrote: >>> On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote: >>>> `xl devd` has been observed leaking /var/log/xldevd.log into children. >>>> >>>> Link: https://github.com/QubesOS/qubes-issues/issues/8292 >>>> Reported-by: Demi Marie Obenour >>>> Signed-off-by: Andrew Cooper >>>> --- >>>> CC: Anthony PERARD >>>> CC: Juergen Gross >>>> CC: Demi Marie Obenour >>>> CC: Marek Marczykowski-Górecki >>>> >>>> Also entirely speculative based on the QubesOS ticket. >>>> --- >>>> tools/xl/xl_utils.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c >>>> index 17489d182954..060186db3a59 100644 >>>> --- a/tools/xl/xl_utils.c >>>> +++ b/tools/xl/xl_utils.c >>>> @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) >>>> exit(-1); >>>> } >>>> >>>> -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, >>>> 0644)); >>>> +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | >>>> O_CLOEXEC, 0644)); >>> This one might be not enough, as the FD gets dup2()-ed to stdout/stderr >>> just outside of the context here, and then inherited by various hotplug >>> script. Just adding O_CLOEXEC here means the hotplug scripts will run >>> with stdout/stderr closed. >> Lovely :( Yes - this won't work. I guess what we want instead is: >> >> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c >> index 060186db3a59..a0ce7dd7fa21 100644 >> --- a/tools/xl/xl_utils.c >> +++ b/tools/xl/xl_utils.c >> @@ -282,6 +282,7 @@ int do_daemonize(const char *name, const char *pidfile) >> dup2(logfile, 2); >> >> close(nullfd); >> + close(logfile); >> >> CHK_SYSCALL(daemon(0, 1)); >> >> which at least means there's not a random extra fd attached to the logfile. 
> But logfile is a global variable, and it looks to be used in dolog()... Urgh, fine. Let's go back to your suggestion of setting CLOEXEC after dup()ing. ~Andrew
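The approach settled on above — dup2() the logfile onto stdout/stderr first so hotplug scripts still inherit them, then mark only the original logfile fd close-on-exec so dolog() keeps working but the fd doesn't leak into children — can be sketched roughly as follows. The function and path names are illustrative, not the actual xl code, and the target fds are parameters here purely so the sketch is testable:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Illustrative sketch only: open a logfile, point two caller-chosen fds
 * at it for children to inherit, and mark just the original logfile fd
 * FD_CLOEXEC so it does not leak across exec().  In xl itself the
 * targets would be fds 1 and 2.
 */
static int open_log_and_redirect(const char *path, int out_fd, int err_fd)
{
    int logfile = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if ( logfile < 0 )
        return -1;

    /* The dup'd copies deliberately do NOT have FD_CLOEXEC set... */
    dup2(logfile, out_fd);
    dup2(logfile, err_fd);

    /* ...but the original fd, still used for logging, does. */
    if ( fcntl(logfile, F_SETFD, FD_CLOEXEC) < 0 )
    {
        close(logfile);
        return -1;
    }

    return logfile;
}
```

This split works because dup2() does not copy the close-on-exec flag: FD_CLOEXEC is a property of each file descriptor, not of the shared open file description.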
Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
On 07/05/2024 12:32 pm, Marek Marczykowski-Górecki wrote: > On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote: >> `xl devd` has been observed leaking /var/log/xldevd.log into children. >> >> Link: https://github.com/QubesOS/qubes-issues/issues/8292 >> Reported-by: Demi Marie Obenour >> Signed-off-by: Andrew Cooper >> --- >> CC: Anthony PERARD >> CC: Juergen Gross >> CC: Demi Marie Obenour >> CC: Marek Marczykowski-Górecki >> >> Also entirely speculative based on the QubesOS ticket. >> --- >> tools/xl/xl_utils.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c >> index 17489d182954..060186db3a59 100644 >> --- a/tools/xl/xl_utils.c >> +++ b/tools/xl/xl_utils.c >> @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) >> exit(-1); >> } >> >> -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644)); >> +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | >> O_CLOEXEC, 0644)); > This one might be not enough, as the FD gets dup2()-ed to stdout/stderr > just outside of the context here, and then inherited by various hotplug > script. Just adding O_CLOEXEC here means the hotplug scripts will run > with stdout/stderr closed. Lovely :( Yes - this won't work. I guess what we want instead is: diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c index 060186db3a59..a0ce7dd7fa21 100644 --- a/tools/xl/xl_utils.c +++ b/tools/xl/xl_utils.c @@ -282,6 +282,7 @@ int do_daemonize(const char *name, const char *pidfile) dup2(logfile, 2); close(nullfd); + close(logfile); CHK_SYSCALL(daemon(0, 1)); which at least means there's not a random extra fd attached to the logfile. >> free(fullname); >> assert(logfile >= 3); >> >> >> base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 >> prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919 > Which one is this? I don't see it in staging, nor in any of your > branches on xenbits. 
Lore finds "tools/libxs: Open /dev/xen/xenbus fds > as O_CLOEXEC" which I guess is correct, but I have no idea how it > correlates it, as this hash doesn't appear anywhere in the message, nor > its headers... It's the libxs patch, but rebased over yesterday's push to staging. auto-base doesn't work quite so well in cases like this. ~Andrew
Re: [PATCH net] xen-netfront: Add missing skb_mark_for_recycle
Hello, Please could we request a CVE for "xen-netfront: Add missing skb_mark_for_recycle" which is 037965402a010898d34f4e35327d22c0a95cd51f in Linus' tree. This is a kernel memory leak trigger-able from unprivileged userspace. I can't see any evidence of this fix having been assigned a CVE thus far on the linux-cve-announce mailing list. Thanks, ~Andrew On 25/04/2024 4:13 pm, Greg KH wrote: > On Thu, Apr 25, 2024 at 02:39:38PM +0100, George Dunlap wrote: >> Greg, >> >> We're issuing an XSA for this; can you issue a CVE? > To ask for a cve, please contact c...@kernel.org as per our > documentation. Please provide the git id of the commit you wish to have > the cve assigned to. > > thanks, > > greg k-h
[PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
Ever since Xen 4.14, there has been a latent bug with migration. While some toolstacks can level the features properly, they don't shrink feat.max_subleaf when all features have been dropped. This is because we *still* have not completed the toolstack side work for full CPU Policy objects. As a consequence, even when properly feature levelled, VMs can't migrate "backwards" across hardware which reduces feat.max_subleaf. One such example is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). Extend the max policies feat.max_subleaf to the highest number Xen knows about, but leave the default policies matching the host. This will allow VMs with a higher feat.max_subleaf than strictly necessary to migrate in. Eventually we'll manage to teach the toolstack how to avoid creating such VMs in the first place, but there's still more work to do there. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné v2: * Adjust max policies rather than the host policy. --- xen/arch/x86/cpu-policy.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index 4b6d96276399..f7e2910c01b5 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -590,6 +590,13 @@ static void __init calculate_pv_max_policy(void) unsigned int i; *p = host_cpu_policy; + +/* + * Some VMs may have a larger-than-necessary feat max_leaf. Allow them to + * migrate in. + */ +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; + x86_cpu_policy_to_featureset(p, fs); for ( i = 0; i < ARRAY_SIZE(fs); ++i ) @@ -630,6 +637,10 @@ static void __init calculate_pv_def_policy(void) unsigned int i; *p = pv_max_cpu_policy; + +/* Default to the same max_subleaf as the host.
*/ +p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf; + x86_cpu_policy_to_featureset(p, fs); for ( i = 0; i < ARRAY_SIZE(fs); ++i ) @@ -666,6 +677,13 @@ static void __init calculate_hvm_max_policy(void) const uint32_t *mask; *p = host_cpu_policy; + +/* + * Some VMs may have a larger-than-necessary feat max_leaf. Allow them to + * migrate in. + */ +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; + x86_cpu_policy_to_featureset(p, fs); mask = hvm_hap_supported() ? @@ -783,6 +801,10 @@ static void __init calculate_hvm_def_policy(void) const uint32_t *mask; *p = hvm_max_cpu_policy; + +/* Default to the same max_subleaf as the host. */ +p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf; + x86_cpu_policy_to_featureset(p, fs); mask = hvm_hap_supported() ? base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 -- 2.30.2
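To illustrate why widening only the *max* policy is enough to fix the migration direction described in the commit message, here is a deliberately simplified model. All names and the check itself are hypothetical — the real acceptance logic lives in Xen's policy serialisation/validation paths — but the shape is the same: an incoming VM is accepted when its feat.max_subleaf fits within the destination's max policy, so extending that to the highest subleaf Xen knows about admits an Ice-Lake-created guest on Cascade Lake, while the default policy for newly created VMs still tracks the host:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for ARRAY_SIZE(p->feat.raw): subleaves 0..2 known. */
#define DEMO_FEAT_ARRAY_SIZE 3

struct demo_policy {
    unsigned int feat_max_subleaf;
};

/*
 * Hypothetical acceptance check: the destination admits an incoming VM
 * if the VM's feat.max_subleaf does not exceed the max policy's.
 */
static bool demo_can_migrate_in(const struct demo_policy *max_policy,
                                unsigned int incoming_max_subleaf)
{
    return incoming_max_subleaf <= max_policy->feat_max_subleaf;
}
```

With the max policy left at the host-reported value (0 on Cascade Lake), an Ice Lake guest carrying max_subleaf=2 is rejected; with the patch's adjustment it is accepted, without changing what freshly created VMs see by default.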
Re: [PATCH] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On 07/05/2024 12:50 pm, Roger Pau Monné wrote: > On Tue, May 07, 2024 at 12:29:57PM +0100, Andrew Cooper wrote: >> Ever since Xen 4.14, there has been a latent bug with migration. >> >> While some toolstacks can level the features properly, they don't shrink >> feat.max_subleaf when all features have been dropped. This is because >> we *still* have not completed the toolstack side work for full CPU Policy >> objects. >> >> As a consequence, even when properly feature levelled, VMs can't migrate >> "backwards" across hardware which reduces feat.max_subleaf. One such example >> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). >> >> Extend the host policy's feat.max_subleaf to the highest number Xen knows >> about, similarly to how we extend extd.max_leaf for LFENCE_DISPATCH. This >> will allow VMs with a higher feat.max_subleaf than strictly necessary to >> migrate in. > Seeing what we do for max_extd_leaf, shouldn't we switch to doing what > you propose for feat.max_subleaf to max_extd_leaf also? > > To allow migration between hosts that have 0x80000021.eax and hosts > that don't have such extended leaf. > > cpu_has_lfence_dispatch kind of does that, but if lfence cannot be > made serializing then the max extended leaf is not expanded. And we > should also likely account for more feature leafs possibly appearing > after 0x80000021? On second thoughts, this adjustment ought to be in the max policies only. It's slightly different to LFENCE_DISPATCH, in that we don't actually have any set bits in those leaves. I'll do a different patch. ~Andrew
[PATCH] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
Ever since Xen 4.14, there has been a latent bug with migration. While some toolstacks can level the features properly, they don't shrink feat.max_subleaf when all features have been dropped. This is because we *still* have not completed the toolstack side work for full CPU Policy objects. As a consequence, even when properly feature levelled, VMs can't migrate "backwards" across hardware which reduces feat.max_subleaf. One such example is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). Extend the host policy's feat.max_subleaf to the highest number Xen knows about, similarly to how we extend extd.max_leaf for LFENCE_DISPATCH. This will allow VMs with a higher feat.max_subleaf than strictly necessary to migrate in. Eventually we'll manage to teach the toolstack how to avoid creating such VMs in the first place, but there's still more work to do there. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/cpu-policy.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index 4b6d96276399..a216fc8b886f 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -373,8 +373,13 @@ static void __init calculate_host_policy(void) p->basic.max_leaf = min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1); -p->feat.max_subleaf = -min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1); + +/* + * p->feat is "just" featureset information. We know about more than may + * be present in this hardware. Also, VMs may have a higher max_subleaf + * than strictly necessary, and we can accept those too. + */ +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; max_extd_leaf = p->extd.max_leaf; base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 -- 2.30.2
[PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
`xl devd` has been observed leaking /var/log/xldevd.log into children. Link: https://github.com/QubesOS/qubes-issues/issues/8292 Reported-by: Demi Marie Obenour Signed-off-by: Andrew Cooper --- CC: Anthony PERARD CC: Juergen Gross CC: Demi Marie Obenour CC: Marek Marczykowski-Górecki Also entirely speculative based on the QubesOS ticket. --- tools/xl/xl_utils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c index 17489d182954..060186db3a59 100644 --- a/tools/xl/xl_utils.c +++ b/tools/xl/xl_utils.c @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) exit(-1); } -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644)); +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644)); free(fullname); assert(logfile >= 3); base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919 -- 2.30.2
[PATCH] tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC
The header description for xs_open() goes as far as to suggest that the fd is O_CLOEXEC, but it isn't actually. `xl devd` has been observed leaking /dev/xen/xenbus into children. Link: https://github.com/QubesOS/qubes-issues/issues/8292 Reported-by: Demi Marie Obenour Signed-off-by: Andrew Cooper --- CC: Anthony PERARD CC: Juergen Gross CC: Demi Marie Obenour CC: Marek Marczykowski-Górecki Entirely speculative patch based on a Matrix report --- tools/libs/store/xs.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c index 140b9a28395e..1f74fb3c44a2 100644 --- a/tools/libs/store/xs.c +++ b/tools/libs/store/xs.c @@ -54,6 +54,10 @@ struct xs_stored_msg { #include #endif +#ifndef O_CLOEXEC +#define O_CLOEXEC 0 +#endif + struct xs_handle { /* Communications channel to xenstore daemon. */ int fd; @@ -227,7 +231,7 @@ static int get_socket(const char *connect_to) static int get_dev(const char *connect_to) { /* We cannot open read-only because requests are writes */ - return open(connect_to, O_RDWR); + return open(connect_to, O_RDWR|O_CLOEXEC); } static int all_restrict_cb(Xentoolcore__Active_Handle *ah, domid_t domid) { base-commit: feb9158a620040846d76981acbe8ea9e2255a07b -- 2.30.2
Re: [REGRESSION] Re: [XEN PATCH 0/3] automation/eclair: do not allow failure for triggered analyses
On 03/05/2024 9:01 pm, Federico Serafini wrote: > On 03/05/24 21:46, Andrew Cooper wrote: >> On 03/05/2024 8:44 pm, Federico Serafini wrote: >>> On 03/05/24 21:14, Andrew Cooper wrote: >>>> On 29/04/2024 4:21 pm, Federico Serafini wrote: >>>>> Patch 1/3 does some preparation work. >>>>> >>>>> Patch 2/3, as the title says, removes allow_failure = true for >>>>> triggered >>>>> analyses. >>>>> >>>>> Patch 3/3 makes explicit that initially no files are tagged as >>>>> adopted, this >>>>> is needed by the scheduled analysis. >>>> >>>> I'm afraid that something in this series is broken. >>>> >>>> Since these patches went in, all pipelines are now getting a status of >>>> blocked rather than passed. >>>> >>>> If I manually start the Eclair jobs, then eventually the pipeline gets >>>> to Passed. >>> >>> Can you provide us a link to those failures? >>> I am looking at gitlab xen-project/xen and xen-project/patchew >>> and everything seems ok. >>> >> >> https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1276081658 >> is the first one I noticed as blocked, and I manually ran. That ended >> up working. >> >> https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1277724847 >> is still in the blocked state. The build-each-commit failure is >> unrelated. > > This is intentional and was introduced by > commit 7c1bf8661db5c00bd8c9a25015fe8678b2ff9ac6 > > The ECLAIR analysis under people/* need to be activated > manually. Yes. I know, and that matches the behaviour I saw. > > Is this causing some problems to the CI? > Yes. See https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines Prior to this series, the manual actions were not used but the pipeline was overall in the Passed state. Specifically, they ended up being skipped. After this series, the manual actions are now blocking the pipeline, not letting it complete, and not marking it as passed. ~Andrew
Re: [REGRESSION] Re: [XEN PATCH 0/3] automation/eclair: do not allow failure for triggered analyses
On 03/05/2024 8:44 pm, Federico Serafini wrote: > On 03/05/24 21:14, Andrew Cooper wrote: >> On 29/04/2024 4:21 pm, Federico Serafini wrote: >>> Patch 1/3 does some preparation work. >>> >>> Patch 2/3, as the title says, removes allow_failure = true for >>> triggered >>> analyses. >>> >>> Patch 3/3 makes explicit that initially no files are tagged as >>> adopted, this >>> is needed by the scheduled analysis. >> >> I'm afraid that something in this series is broken. >> >> Since these patches went in, all pipelines are now getting a status of >> blocked rather than passed. >> >> If I manually start the Eclair jobs, then eventually the pipeline gets >> to Passed. > > Can you provide us a link to those failures? > I am looking at gitlab xen-project/xen and xen-project/patchew > and everything seems ok. > https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1276081658 is the first one I noticed as blocked, and I manually ran. That ended up working. https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1277724847 is still in the blocked state. The build-each-commit failure is unrelated. ~Andrew
[REGRESSION] Re: [XEN PATCH 0/3] automation/eclair: do not allow failure for triggered analyses
On 29/04/2024 4:21 pm, Federico Serafini wrote: > Patch 1/3 does some preparation work. > > Patch 2/3, as the title says, removes allow_failure = true for triggered > analyses. > > Patch 3/3 makes explicit that initially no files are tagged as adopted, this > is needed by the scheduled analysis. I'm afraid that something in this series is broken. Since these patches went in, all pipelines are now getting a status of blocked rather than passed. If I manually start the Eclair jobs, then eventually the pipeline gets to Passed. ~Andrew
Re: [PATCH] tools/tests: don't let test-xenstore write nodes exceeding default size
On 02/05/2024 2:25 pm, Andrew Cooper wrote: > On 02/05/2024 2:21 pm, Juergen Gross wrote: >> Today test-xenstore will write nodes with 3000 bytes node data. This >> size is exceeding the default quota for the allowed node size. While >> working in dom0 with C-xenstored, OCAML-xenstored does not like that. >> >> Use a size of 2000 instead, which is lower than the allowed default >> node size of 2048. >> >> Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework") >> Signed-off-by: Juergen Gross > Acked-by: Andrew Cooper > > Thanks for figuring this out. I'll give this and the return code fix a > spin through XenRT and check there's nothing else unexpected hiding. All seems happy. I'll put these in in due course. ~Andrew
[PATCH] xen/Kconfig: Drop the final remnants of ---help---
We deprecated the use of ---help--- a while ago, but a lot of new content copies bad examples. Convert the remaining instances, and update Kconfig's parser to no longer recognise it. This now causes builds to fail with: Kconfig.debug:8: syntax error Kconfig.debug:7: unknown statement "---help---" which short-circuits one common piece of churn in new content. No functional change. Signed-off-by: Andrew Cooper --- CC: George Dunlap CC: Jan Beulich CC: Stefano Stabellini CC: Julien Grall CC: Oleksii Kurochko For 4.19. This cleans up a legacy we've been wanting to get rid of for a while, and will be least disruptive on people if it gets in ahead of most people starting work for 4.20. --- xen/Kconfig | 2 +- xen/Kconfig.debug | 28 +-- xen/arch/arm/Kconfig| 8 +++--- xen/arch/arm/platforms/Kconfig | 12 - xen/arch/x86/Kconfig| 32 +++--- xen/common/Kconfig | 48 - xen/common/sched/Kconfig| 10 +++ xen/drivers/passthrough/Kconfig | 8 +++--- xen/drivers/video/Kconfig | 2 +- xen/tools/kconfig/lexer.l | 2 +- 10 files changed, 76 insertions(+), 76 deletions(-) diff --git a/xen/Kconfig b/xen/Kconfig index 1e1b041fd52f..e459cdac0cd7 100644 --- a/xen/Kconfig +++ b/xen/Kconfig @@ -84,7 +84,7 @@ config UNSUPPORTED config LTO bool "Link Time Optimisation" depends on BROKEN - ---help--- + help Enable Link Time Optimisation. If unsure, say N. diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug index fa81853e9385..61b24ac552cd 100644 --- a/xen/Kconfig.debug +++ b/xen/Kconfig.debug @@ -4,7 +4,7 @@ menu "Debugging Options" config DEBUG bool "Developer Checks" default y - ---help--- + help If you say Y here this will enable developer checks such as asserts and extra printks. This option is intended for development purposes only, and not for production use. @@ -17,14 +17,14 @@ config GDBSX bool "Guest debugging with gdbsx" depends on X86 default y - ---help--- + help If you want to enable support for debugging guests from dom0 via gdbsx then say Y.
config FRAME_POINTER bool "Compile Xen with frame pointers" default DEBUG - ---help--- + help If you say Y here the resulting Xen will be slightly larger and maybe slower, but it gives very useful debugging information in case of any Xen bugs. @@ -33,7 +33,7 @@ config COVERAGE bool "Code coverage support" depends on !LIVEPATCH select SUPPRESS_DUPLICATE_SYMBOL_WARNINGS if !ENFORCE_UNIQUE_SYMBOLS - ---help--- + help Enable code coverage support. If unsure, say N here. @@ -41,7 +41,7 @@ config COVERAGE config DEBUG_LOCK_PROFILE bool "Lock Profiling" select DEBUG_LOCKS - ---help--- + help Lock profiling allows you to see how often locks are taken and blocked. You can use serial console to print (and reset) using 'l' and 'L' respectively, or the 'xenlockprof' tool. @@ -49,13 +49,13 @@ config DEBUG_LOCK_PROFILE config DEBUG_LOCKS bool "Lock debugging" default DEBUG - ---help--- + help Enable debugging features of lock handling. Some additional checks will be performed when acquiring and releasing locks. config PERF_COUNTERS bool "Performance Counters" - ---help--- + help Enables software performance counters that allows you to analyze bottlenecks in the system. To access this data you can use serial console to print (and reset) using 'p' and 'P' respectively, or @@ -64,21 +64,21 @@ config PERF_COUNTERS config PERF_ARRAYS bool "Performance Counter Array Histograms" depends on PERF_COUNTERS - ---help--- + help Enables software performance counter array histograms. config VERBOSE_DEBUG bool "Verbose debug messages" default DEBUG - ---help--- + help Guest output from HYPERVISOR_console_io and hypervisor parsing ELF images (dom0) will be logged in the Xen ring buffer. config DEVICE_TREE_DEBUG bool "Device tree debug messages" depends on HAS_DEVICE_TREE - ---help--- + help Device tree parsing and DOM0 device tree building messages are logged in the Xen ring buffer. If unsure, say N here. 
@@ -86,14 +86,14 @@ config DEVICE_TREE_DEBUG config SCRUB_DEBUG bool "Page scrubbing test" default DEBUG - ---help--- + help Verify that pages that need
Re: [PATCH v2 4/4] x86/cpuid: Fix handling of xsave dynamic leaves
On 02/05/2024 2:04 pm, Jan Beulich wrote: > On 29.04.2024 20:28, Andrew Cooper wrote: >> If max leaf is greater than 0xd but xsave not available to the guest, then >> the >> current XSAVE size should not be filled in. This is a latent bug for now as >> the guest max leaf is 0xd, but will become problematic in the future. > Why would this not be an issue when .max_leaf == 0xd, but .xsave == 0? Hmm, true. I'll adjust the description. > >> The comment concerning XSS state is wrong. VT-x doesn't manage host/guest >> state automatically, but there is provision for "host only" bits to be set, >> so >> the implications are still accurate. >> >> Introduce {xstate,hw}_compressed_size() helpers to mirror the uncompressed >> ones. >> >> This in turn highlights a bug in xstate_init(). Defaulting this_cpu(xss) to >> ~0 >> requires a forced write to clear it back out. This in turn highlights that >> it's only a safe default on systems with XSAVES. > Well, yes, such an explicit write was expected to appear when some xsaves- > based component would actually be enabled. Much like the set_xcr0() there. I stumbled over this because the debug logic ended up trying to restore 0x (the thing it read out as the "last" value) back into XSS. This works about as well as you'd expect. >> Signed-off-by: Andrew Cooper >> --- >> CC: Jan Beulich >> CC: Roger Pau Monné >> >> The more I think about it, the more I think that cross-checking with hardware >> is a bad move. It's horribly expensive, and for supervisor states in >> particular, liable to interfere with functionality. > I agree, but I'd also like to see the cross checking not dropped entirely. > Can't we arrange for such to happen during boot, before we enable any such > functionality? After all even in debug builds it's not overly useful to do > the same cross-checking (i.e. for identical inputs) over and over again.
> Of course doing in an exhaustive manner may be okay for the uncompressed > values, but might be a little too much for all possible combinations in > order to check compressed values, too. Given the observation of patch 2 being buggy and my final sanity test before posting didn't notice, I think doing this all at boot would be a much better idea. I think I'm going to do a new patch early in the series as an adjustment to xstate_init(). We can't feasibly test every combination, but what we ought to do is linearly accumulate the xstates Xen knows about and check that in each case the size(s) increase as appropriate. This will have a substantial impact on the remainder of the series, but I think the end result will be all the better for it. >> --- a/xen/arch/x86/xstate.c >> +++ b/xen/arch/x86/xstate.c >> @@ -614,6 +614,65 @@ unsigned int xstate_uncompressed_size(uint64_t xcr0) >> return size; >> } >> >> +static unsigned int hw_compressed_size(uint64_t xstates) >> +{ >> +uint64_t curr_xcr0 = get_xcr0(), curr_xss = get_msr_xss(); >> +unsigned int size; >> +bool ok; >> + >> +ok = set_xcr0(xstates & ~XSTATE_XSAVES_ONLY); >> +ASSERT(ok); >> +set_msr_xss(xstates & XSTATE_XSAVES_ONLY); >> + >> +size = cpuid_count_ebx(XSTATE_CPUID, 1); >> + >> +ok = set_xcr0(curr_xcr0); >> +ASSERT(ok); >> +set_msr_xss(curr_xss); >> + >> +return size; >> +} >> + >> +unsigned int xstate_compressed_size(uint64_t xstates) >> +{ >> +unsigned int i, size = XSTATE_AREA_MIN_SIZE; >> + >> +if ( xstates == 0 ) /* TODO: clean up paths passing 0 in here. */ >> +return 0; >> + >> +if ( xstates <= (X86_XCR0_SSE | X86_XCR0_FP) ) > Same comment as on the earlier change regarding the (lack of) use of >> +return size; >> + >> +/* >> + * For the compressed size, every component matters. Some are >> + * automatically rounded up to 64 first. >> + */ >> +xstates &= ~XSTATE_FP_SSE; > ... this constant up there.
> >> +for_each_set_bit ( i, &xstates, 63 ) >> +{ >> +if ( test_bit(i, &xstate_align) ) >> +size = ROUNDUP(size, 64); >> + >> +size += xstate_sizes[i]; >> +} > The comment is a little misleading: As you have it in code, it is not the > component's size that is rounded up, but its position. Maybe "Some have > their start automatically rounded up to 64 first"? Size in that sentence refers to the compressed size of everything, not the size of the component. But I'll try to make it clearer. ~Andrew
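The compressed-size accumulation being discussed — every enabled component contributes its size, with some components' start offsets rounded up to 64 bytes first — can be modelled standalone like this. The tables and sizes below are made-up sample values purely for illustration, not real hardware data, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* 512-byte legacy region + 64-byte XSAVE header. */
#define DEMO_XSTATE_AREA_MIN_SIZE 576

/*
 * Sketch of the compressed (XSAVES) size calculation: unlike the
 * uncompressed layout, every enabled component matters, and components
 * flagged in align_mask have their start rounded up to 64 bytes.
 */
static unsigned int demo_compressed_size(uint64_t xstates, uint64_t align_mask,
                                         const unsigned int *sizes)
{
    unsigned int size = DEMO_XSTATE_AREA_MIN_SIZE, i;

    xstates &= ~3ULL;  /* FP and SSE live in the fixed legacy region. */

    for ( i = 2; i < 64; ++i )
    {
        if ( !(xstates & (1ULL << i)) )
            continue;

        if ( align_mask & (1ULL << i) )
            size = (size + 63) & ~63u;  /* round the start up to 64 */

        size += sizes[i];
    }

    return size;
}
```

As Jan's comment-wording point suggests, it is each component's *position* that gets aligned; the running total only grows by the component's size afterwards.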
Re: [PATCH] tools/tests: don't let test-xenstore write nodes exceeding default size
On 02/05/2024 2:21 pm, Juergen Gross wrote: > Today test-xenstore will write nodes with 3000 bytes node data. This > size is exceeding the default quota for the allowed node size. While > working in dom0 with C-xenstored, OCAML-xenstored does not like that. > > Use a size of 2000 instead, which is lower than the allowed default > node size of 2048. > > Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework") > Signed-off-by: Juergen Gross Acked-by: Andrew Cooper Thanks for figuring this out. I'll give this and the return code fix a spin through XenRT and check there's nothing else unexpected hiding.
Re: [PATCH v2 3/4] x86/cpu-policy: Simplify recalculate_xstate()
On 02/05/2024 1:39 pm, Jan Beulich wrote: > On 29.04.2024 20:28, Andrew Cooper wrote: >> Make use of the new xstate_uncompressed_size() helper rather than maintaining >> the running calculation while accumulating feature components. > xstate_uncompressed_size() isn't really a new function, but the re-work of > an earlier one. That, aiui, could have been used here, too, just that it > would have been inefficient to do so. IOW perhaps drop "the new"? Ok. > >> The rest of the CPUID data can come direct from the raw cpu policy. All >> per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}* >> instructions. >> >> Use for_each_set_bit() rather than opencoding a slightly awkward version of >> it. Mask the attributes in ecx down based on the visible features. This >> isn't actually necessary for any components or attributes defined at the time >> of writing (up to AMX), but is added out of an abundance of caution. > As to this, ... > >> @@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p) >> return; >> >> if ( p->basic.avx ) >> -{ >> xstates |= X86_XCR0_YMM; >> -xstate_size = max(xstate_size, >> - xstate_offsets[X86_XCR0_YMM_POS] + >> - xstate_sizes[X86_XCR0_YMM_POS]); >> -} >> >> if ( p->feat.mpx ) >> -{ >> xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR; >> -xstate_size = max(xstate_size, >> - xstate_offsets[X86_XCR0_BNDCSR_POS] + >> - xstate_sizes[X86_XCR0_BNDCSR_POS]); >> -} >> >> if ( p->feat.avx512f ) >> -{ >> xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM; >> -xstate_size = max(xstate_size, >> - xstate_offsets[X86_XCR0_HI_ZMM_POS] + >> - xstate_sizes[X86_XCR0_HI_ZMM_POS]); >> -} >> >> if ( p->feat.pku ) >> -{ >> xstates |= X86_XCR0_PKRU; >> -xstate_size = max(xstate_size, >> - xstate_offsets[X86_XCR0_PKRU_POS] + >> - xstate_sizes[X86_XCR0_PKRU_POS]); >> -} >> >> -p->xstate.max_size = xstate_size; >> +/* Subleaf 0 */ >> +p->xstate.max_size = >> +xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY); >> 
p->xstate.xcr0_low = xstates & ~XSTATE_XSAVES_ONLY; >> p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32; >> >> +/* Subleaf 1 */ >> p->xstate.Da1 = Da1; >> +if ( p->xstate.xsavec ) >> +ecx_mask |= XSTATE_ALIGN64; >> + >> if ( p->xstate.xsaves ) >> { >> +ecx_mask |= XSTATE_XSS; >> p->xstate.xss_low = xstates & XSTATE_XSAVES_ONLY; >> p->xstate.xss_high = (xstates & XSTATE_XSAVES_ONLY) >> 32; >> } >> -else >> -xstates &= ~XSTATE_XSAVES_ONLY; >> >> -for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i ) >> +/* Subleafs 2+ */ >> +xstates &= ~XSTATE_FP_SSE; >> +BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63); >> +for_each_set_bit ( i, &xstates, 63 ) >> { >> -uint64_t curr_xstate = 1UL << i; >> - >> -if ( !(xstates & curr_xstate) ) >> -continue; >> - >> -p->xstate.comp[i].size = xstate_sizes[i]; >> -p->xstate.comp[i].offset = xstate_offsets[i]; >> -p->xstate.comp[i].xss= curr_xstate & XSTATE_XSAVES_ONLY; >> -p->xstate.comp[i].align = curr_xstate & xstate_align; > ... for this bit, isn't the move from this ... > >> +/* >> + * Pass through size (eax) and offset (ebx) directly. Visibility of >> + * attributes in ecx limited by visible features in Da1. >> + */ >> +p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a; >> +p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b; >> +p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask; > ... to this changing what guests get to see, i.e. (mildly?) incompatible? No. The only "rows" in leaf 0xd we expose to guests are AVX, MPX, AVX512 and PKU (higher up in this hunk, selecting valid bits in xstates). None of these have a non-zero value in ecx. This is a latent bug until we offer AMX or CET, hence why I wanted to complete this series before your AMX series goes in. ~Andrew
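[Editor's note: the ecx-attribute masking discussed above is small enough to sketch standalone. This is an illustrative reconstruction, not Xen's code; the helper name and signature are hypothetical, and only the two attribute bits and their gating features come from the patch.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bits reported in CPUID.0xD[i].ecx. */
#define XSTATE_XSS     (1u << 0)  /* Component is XSS (supervisor) managed. */
#define XSTATE_ALIGN64 (1u << 1)  /* Component is 64-byte aligned when compacted. */

/*
 * Per-component attributes are only meaningful alongside the feature that
 * uses them: ALIGN64 matters only with XSAVEC, XSS only with XSAVES.
 * Hide each attribute unless the corresponding feature is visible.
 */
static uint32_t visible_ecx_mask(bool xsavec, bool xsaves)
{
    uint32_t mask = 0;

    if ( xsavec )
        mask |= XSTATE_ALIGN64;
    if ( xsaves )
        mask |= XSTATE_XSS;

    return mask;
}
```

In real hardware XSAVES implies XSAVEC; the sketch does not enforce that dependency.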
Re: [PATCH v2 2/4] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()
On 02/05/2024 1:19 pm, Jan Beulich wrote: > On 29.04.2024 20:28, Andrew Cooper wrote: >> @@ -567,16 +567,51 @@ static unsigned int hw_uncompressed_size(uint64_t xcr0) >> return size; >> } >> >> -/* Fastpath for common xstate size requests, avoiding reloads of xcr0. */ >> -unsigned int xstate_ctxt_size(u64 xcr0) >> +unsigned int xstate_uncompressed_size(uint64_t xcr0) >> { >> +unsigned int size = XSTATE_AREA_MIN_SIZE, i; >> + >> if ( xcr0 == xfeature_mask ) >> return xsave_cntxt_size; >> >> if ( xcr0 == 0 ) /* TODO: clean up paths passing 0 in here. */ >> return 0; >> >> -return hw_uncompressed_size(xcr0); >> +if ( xcr0 <= (X86_XCR0_SSE | X86_XCR0_FP) ) > This is open-coded XSTATE_FP_SSE, which I wouldn't mind if ... > >> +return size; >> + >> +/* >> + * For the non-legacy states, search all active states and find the >> + * maximum offset+size. Some states (e.g. LWP, APX_F) are out-of-order >> + * with respect to their index. >> + */ >> +xcr0 &= ~XSTATE_FP_SSE; > ... you didn't use that macro here (and once further down). IOW please > be consistent, no matter which way round. Oh yes - that's a consequence of these hunks having been written 3 years apart. It's important for the first one (logical comparison against a bitmap) that it's split apart. > >> +for_each_set_bit ( i, &xcr0, 63 ) >> +{ >> +unsigned int s; >> + >> +ASSERT(xstate_offsets[i] && xstate_sizes[i]); >> + >> +s = xstate_offsets[i] && xstate_sizes[i]; > You mean + here, don't you? Yes I do... That was a victim of a last minute refactor. It also shows that even the cross-check with hardware isn't terribly effective. More on that in the other thread. > Then: > Reviewed-by: Jan Beulich > > I'm also inclined to suggest making this the initializer of s. Hmm, good point. Will change. Thanks, ~Andrew
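[Editor's note: the corrected computation (with `+` rather than the accidental `&&`) amounts to taking the maximum offset+size over the enabled components. A standalone sketch, with made-up layout tables standing in for Xen's `xstate_offsets[]`/`xstate_sizes[]`; only the max-over-components idea comes from the patch.]

```c
#include <stdint.h>

#define XSTATE_AREA_MIN_SIZE 576  /* Legacy region (512) + XSAVE header (64). */

/* Illustrative per-component uncompressed offsets/sizes, indexed by XCR0 bit. */
static const unsigned int xstate_offsets[] = { 0, 0, 576, 0, 0, 0, 0, 0, 0, 2688 };
static const unsigned int xstate_sizes[]   = { 0, 0, 256, 0, 0, 0, 0, 0, 0,   64 };

/*
 * The uncompressed image size is fixed by the largest offset + size among
 * the enabled components -- which is not necessarily the highest-indexed
 * component, hence scanning them all rather than just the last.
 */
static unsigned int uncompressed_size(uint64_t xstates)
{
    unsigned int size = XSTATE_AREA_MIN_SIZE;

    for ( unsigned int i = 2; i < 10; ++i )
    {
        unsigned int s;

        if ( !(xstates & (1ULL << i)) )
            continue;

        s = xstate_offsets[i] + xstate_sizes[i];  /* '+': the reviewed bug was '&&'. */
        if ( s > size )
            size = s;
    }

    return size;
}
```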
Re: [PATCH for-4.19] tools/xen-cpuid: switch to use cpu-policy defined names
On 30/04/2024 1:54 pm, Roger Pau Monné wrote: > On Tue, Apr 30, 2024 at 02:06:38PM +0200, Jan Beulich wrote: >> On 30.04.2024 13:25, Roger Pau Monné wrote: >>> On Tue, Apr 30, 2024 at 12:37:44PM +0200, Jan Beulich wrote: On 30.04.2024 10:29, Roger Pau Monne wrote: > @@ -301,21 +52,32 @@ static const char *const fs_names[] = { > [XEN_SYSCTL_cpu_featureset_hvm_max] = "HVM Max", > }; > > -static void dump_leaf(uint32_t leaf, const char *const *strs) > +static const char *find_name(unsigned int index) > { > -unsigned i; > +static const struct feature_name { > +const char *name; > +unsigned int bit; > +} feature_names[] = INIT_FEATURE_NAMES; > +unsigned int i; > > -if ( !strs ) > -{ > -printf(" ???"); > -return; > -} > +for ( i = 0; i < ARRAY_SIZE(feature_names); i++ ) > +if ( feature_names[i].bit == index ) > +return feature_names[i].name; ... a linear search, repeated perhaps hundreds of times, looks still a little odd to me. >>> I didn't benchmark what kind of performance impact this change would >>> have on the tool, but I didn't think it was that relevant, as this is >>> a diagnostic/debug tool, and hence performance (unless it took seconds >>> to execute) shouldn't be that important. >> As indicated, performance itself isn't much of a concern here. My earlier >> question wants reading in relation to the other question raised, regarding >> the script maybe wanting to produce macro(s) more suitable for the purpose >> here. > Hm, we could maybe produce an array of strings, one per feature bit > (features without names would get NULL). > > I will see, albeit my python skills are very limited. See how Xen's cmdline parsing uses feature_names; they're intentionally sorted already. But, having gen-cpuid.py write out something that's closer to what xen-cpuid.c wants is also very easy to do. ~Andrew
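[Editor's note: the direction suggested above (have gen-cpuid.py emit something closer to what xen-cpuid.c wants) amounts to a bit-indexed name table. A hypothetical sketch with a three-entry stand-in for the generated INIT_FEATURE_NAMES initialiser; the names, bit numbers, and helper are illustrative only.]

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for gen-cpuid.py's INIT_FEATURE_NAMES output. */
static const struct feature_name {
    const char *name;
    unsigned int bit;
} feature_names[] = {
    { "avx", 60 }, { "fpu", 0 }, { "sse", 25 },
};

#define NR_FEATURE_BITS 128

/*
 * Instead of a linear scan of feature_names[] per queried bit, invert the
 * (name, bit) list once into a bit-indexed array, making every subsequent
 * lookup O(1).  Bits without a name report "UNKNOWN".
 */
static const char *find_name(unsigned int bit)
{
    static const char *bit_names[NR_FEATURE_BITS];
    static int initialised;

    if ( !initialised )
    {
        for ( size_t i = 0; i < sizeof(feature_names) / sizeof(feature_names[0]); ++i )
            bit_names[feature_names[i].bit] = feature_names[i].name;
        initialised = 1;
    }

    return (bit < NR_FEATURE_BITS && bit_names[bit]) ? bit_names[bit] : "UNKNOWN";
}
```

Having gen-cpuid.py emit the per-bit string array directly, as floated in the thread, would remove even the one-off inversion pass.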
Re: [PATCH] tools/tests: let test-xenstore exit with non-0 status in case of error
On 02/05/2024 10:22 am, Juergen Gross wrote: > In case a test is failing in test-xenstore, let the tool exit with an > exit status other than 0. > > Fix a typo in an error message. > > Reported-by: Andrew Cooper > Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework") > Signed-off-by: Juergen Gross Thanks. Acked-by: Andrew Cooper although this could do with waiting until after we've fixed whatever else is causing it to fail. I'll try to find someone to work on that bug too. ~Andrew
Re: [PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd
On 01/05/2024 2:29 pm, Anthony PERARD wrote: > On Fri, Apr 26, 2024 at 09:51:47AM +0100, Anthony PERARD wrote: >> Run `systemd-notify --ready` instead. Hopefully, that will be enough. >> ($NOTIFY_SOCKET is a socket, and a bit more complicated than I thought, >> it can start with "@" for example) > FTR: If it turns out that calling systemd-notify binary isn't working > well enough, we could have an implementation of sd_notify() in our tree, > openssh are doing their own here: > https://bugzilla.mindrot.org/show_bug.cgi?id=2641 > and there's an example implementation on systemd's documentation: > https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes > (Nothing for ocaml) > > But let's go with `systemd-notify --ready` as it is just easier to > write a bit of shell script. I was already thinking of going down the small-library-function route. Given that I mis-analysed the launch-xenstore script, I'm not overly enthused with just falling back to waiting on the pidfile, because that's adding technical debt rather than removing it. ~Andrew
Re: [PATCH 2/5] x86/cpu-policy: Add SVM features already used by Xen
On 01/05/2024 11:00 am, George Dunlap wrote: > On Mon, Apr 29, 2024 at 4:16 PM Andrew Cooper > wrote: >> These will replace svm_feature_flags and the SVM_FEATURE_* constants over the >> next few changes. Take the opportunity to rationalise some names. >> >> Drop the opencoded "inherit from host" logic in calculate_hvm_max_policy() >> and >> use 'h'/'!' annotations. The logic needs to operate on fs, not the policy >> object, given its position within the function. >> >> Drop some trailing whitespace introduced when this block of code was last >> moved. >> >> Signed-off-by: Andrew Cooper >> --- >> CC: Jan Beulich >> CC: Roger Pau Monné >> CC: Stefano Stabellini >> CC: Xenia Ragiadakou >> CC: Sergiy Kibrik >> CC: George Dunlap >> CC: Andrei Semenov >> CC: Vaishali Thakkar >> --- >> tools/misc/xen-cpuid.c | 11 +++ >> xen/arch/x86/cpu-policy.c | 17 + >> xen/include/public/arch-x86/cpufeatureset.h | 14 ++ >> xen/tools/gen-cpuid.py | 3 +++ >> 4 files changed, 33 insertions(+), 12 deletions(-) >> >> diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c >> index ab09410a05d6..0d01b0e797f1 100644 >> --- a/tools/misc/xen-cpuid.c >> +++ b/tools/misc/xen-cpuid.c >> @@ -266,6 +266,17 @@ static const char *const str_m10Ah[32] = >> >> static const char *const str_eAd[32] = >> { >> +[ 0] = "npt", [ 1] = "v-lbr", >> +[ 2] = "svm-lock",[ 3] = "nrips", >> +[ 4] = "v-tsc-rate", [ 5] = "vmcb-cleanbits", >> +[ 6] = "flush-by-asid", [ 7] = "decode-assist", >> + >> +[10] = "pause-filter", >> +[12] = "pause-thresh", >> +/* 14 */ [15] = "v-loadsave", >> +[16] = "v-gif", >> +/* 18 */ [19] = "npt-sss", >> +[20] = "v-spec-ctrl", >> }; >> >> static const char *const str_e1Fa[32] = >> diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c >> index 4b6d96276399..da4401047e89 100644 >> --- a/xen/arch/x86/cpu-policy.c >> +++ b/xen/arch/x86/cpu-policy.c >> @@ -9,7 +9,6 @@ >> #include >> #include >> #include >> -#include >> #include >> #include >> #include >> @@ -748,22 +747,16 @@ 
static void __init calculate_hvm_max_policy(void) >> if ( !cpu_has_vmx ) >> __clear_bit(X86_FEATURE_PKS, fs); >> >> -/* >> +/* >> * Make adjustments to possible (nested) virtualization features exposed >> * to the guest >> */ >> -if ( p->extd.svm ) >> +if ( test_bit(X86_FEATURE_SVM, fs) ) >> { >> -/* Clamp to implemented features which require hardware support. */ >> -p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) | >> - (1u << SVM_FEATURE_LBRV) | >> - (1u << SVM_FEATURE_NRIPS) | >> - (1u << SVM_FEATURE_PAUSEFILTER) | >> - (1u << SVM_FEATURE_DECODEASSISTS)); >> -/* Enable features which are always emulated. */ >> -p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN); >> +/* Xen always emulates cleanbits. */ >> +__set_bit(X86_FEATURE_VMCB_CLEANBITS, fs); >> } > What about this line, at the end of recalculate_cpuid_policy()? > > if ( !p->extd.svm ) > p->extd.raw[0xa] = EMPTY_LEAF; > > Is there a reason to continue to operate directly on the policy here, > or would it be better to do this up in the featureset section? That's still needed. Otherwise in a !SVM VM you still see svm_rev and nr_asids in a leaf that should be all zeroes. ~Andrew
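[Editor's note: the point being made at the end of the thread above can be illustrated in isolation. A toy sketch with simplified types, not Xen's actual cpu_policy/EMPTY_LEAF definitions: feature bits are levelled via the featureset, but non-feature fields in the same leaf only vanish by zeroing the leaf itself.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for one CPUID leaf's four registers. */
struct cpuid_leaf { uint32_t a, b, c, d; };

/*
 * Clearing the SVM feature *bits* is not enough: leaf 0x800a also carries
 * non-feature data (svm_rev in eax, nr_asids in ebx per the thread above),
 * which must also disappear when SVM itself is hidden from the guest --
 * hence the whole-leaf zeroing kept in recalculate_cpuid_policy().
 */
static void sanitise_svm_leaf(struct cpuid_leaf *leaf, bool svm_visible)
{
    if ( !svm_visible )
        *leaf = (struct cpuid_leaf){ 0, 0, 0, 0 }; /* EMPTY_LEAF */
}
```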
Re: [PATCH 1/5] x86/cpu-policy: Infrastructure for the AMD SVM and SEV leaves
On 30/04/2024 1:45 pm, Jan Beulich wrote: > On 29.04.2024 17:16, Andrew Cooper wrote: >> Allocate two new feature leaves, and extend cpu_policy with the non-feature >> fields too. >> >> The CPUID dependency between the SVM bit on the whole SVM leaf is >> intentionally deferred, to avoid transiently breaking nested virt. > In reply to this I meant to ask that you at least add those dependencies in > commented-out form, such that from looking at gen-cpuid.py it becomes clear > they're intentionally omitted. But you don't add feature identifiers either, > making dependencies impossible to express. Maybe this sentence was really > meant for another of the patches? (Then my request would actually apply > there.) This is necessary because c/s 4f8b0e94d7ca is buggy. Notice how it puts an edit to the policy object in the middle of a block of logic editing the featureset, which ends with writing the featureset back over the policy object. And it's not the first outstanding problem from what is a very small number of nested-virt patches so far... >> @@ -296,7 +298,16 @@ struct cpu_policy >> uint32_t /* d */:32; >> >> uint64_t :64, :64; /* Leaf 0x8009. */ >> -uint64_t :64, :64; /* Leaf 0x800a - SVM rev and features. */ >> + >> +/* Leaf 0x800a - SVM rev and features. */ >> +uint8_t svm_rev, :8, :8, :8; >> +uint32_t /* b */ :32; >> +uint32_t nr_asids; > According to the doc I'm looking at it is %ebx which holds the number of > ASIDs and %ecx is reserved. With that adjusted That's fun... The PPR I used for this has it wrong. A sample of others match the APM, so I'll raise a bug with AMD against this PPR. > Reviewed-by: Jan Beulich Thanks. ~Andrew
Re: xen | Failed pipeline for staging | b819bd65
On 30/04/2024 10:00 am, Jan Beulich wrote: > On 30.04.2024 10:03, GitLab wrote: >> >> Pipeline #1272869158 has failed! >> >> Project: xen ( https://gitlab.com/xen-project/hardware/xen ) >> Branch: staging ( >> https://gitlab.com/xen-project/hardware/xen/-/commits/staging ) >> >> Commit: b819bd65 ( >> https://gitlab.com/xen-project/hardware/xen/-/commit/b819bd65f4fb25be582f66ba2e4134f61d86f459 >> ) >> Commit Message: revert "x86/mm: re-implement get_page_light() u... >> Commit Author: Jan Beulich ( https://gitlab.com/jbeulich ) >> >> >> Pipeline #1272869158 ( >> https://gitlab.com/xen-project/hardware/xen/-/pipelines/1272869158 ) >> triggered by Jan Beulich ( https://gitlab.com/jbeulich ) >> had 3 failed jobs. >> >> Job #6745823842 ( >> https://gitlab.com/xen-project/hardware/xen/-/jobs/6745823842/raw ) >> >> Stage: test >> Name: adl-pci-hvm-x86-64-gcc-debug >> Job #6745823720 ( >> https://gitlab.com/xen-project/hardware/xen/-/jobs/6745823720/raw ) >> >> Stage: analyze >> Name: eclair-x86_64 > This flags start_nested_{svm,vmx}() as regressions, when the regression was > previously spotted already. Is that intended? Yes. > I.e. shouldn't the comparison > be to the previous pipeline run, such that issues are pointed out only for > what is actually being added anew with the patch / batch under test? Why should it be? That's unlike every other CI we use, even OSSTest. Gitlab, like many others, is stateless between runs. These violations are ones where we had got down to 0 in Xen, and Xen was marked as "clean" for these rules. Any nonzero count (in the subset of things we think we've fixed fully) is a failure. This is no different to e.g. a panic on boot. OSSTest will keep on complaining of a regression until it gets fixed. ~Andrew
[PATCH v2 3/4] x86/cpu-policy: Simplify recalculate_xstate()
Make use of the new xstate_uncompressed_size() helper rather than maintaining the running calculation while accumulating feature components. The rest of the CPUID data can come direct from the raw cpu policy. All per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}* instructions. Use for_each_set_bit() rather than opencoding a slightly awkward version of it. Mask the attributes in ecx down based on the visible features. This isn't actually necessary for any components or attributes defined at the time of writing (up to AMX), but is added out of an abundance of caution. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné v2: * Tie ALIGN64 to xsavec rather than xsaves. --- xen/arch/x86/cpu-policy.c | 55 +++ xen/arch/x86/include/asm/xstate.h | 1 + 2 files changed, 21 insertions(+), 35 deletions(-) diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index 4b6d96276399..fc7933be8577 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -193,8 +193,7 @@ static void sanitise_featureset(uint32_t *fs) static void recalculate_xstate(struct cpu_policy *p) { uint64_t xstates = XSTATE_FP_SSE; -uint32_t xstate_size = XSTATE_AREA_MIN_SIZE; -unsigned int i, Da1 = p->xstate.Da1; +unsigned int i, ecx_mask = 0, Da1 = p->xstate.Da1; /* * The Da1 leaf is the only piece of information preserved in the common @@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p) return; if ( p->basic.avx ) -{ xstates |= X86_XCR0_YMM; -xstate_size = max(xstate_size, - xstate_offsets[X86_XCR0_YMM_POS] + - xstate_sizes[X86_XCR0_YMM_POS]); -} if ( p->feat.mpx ) -{ xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR; -xstate_size = max(xstate_size, - xstate_offsets[X86_XCR0_BNDCSR_POS] + - xstate_sizes[X86_XCR0_BNDCSR_POS]); -} if ( p->feat.avx512f ) -{ xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM; -xstate_size = max(xstate_size, - xstate_offsets[X86_XCR0_HI_ZMM_POS] + - xstate_sizes[X86_XCR0_HI_ZMM_POS]); 
-} if ( p->feat.pku ) -{ xstates |= X86_XCR0_PKRU; -xstate_size = max(xstate_size, - xstate_offsets[X86_XCR0_PKRU_POS] + - xstate_sizes[X86_XCR0_PKRU_POS]); -} -p->xstate.max_size = xstate_size; +/* Subleaf 0 */ +p->xstate.max_size = +xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY); p->xstate.xcr0_low = xstates & ~XSTATE_XSAVES_ONLY; p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32; +/* Subleaf 1 */ p->xstate.Da1 = Da1; +if ( p->xstate.xsavec ) +ecx_mask |= XSTATE_ALIGN64; + if ( p->xstate.xsaves ) { +ecx_mask |= XSTATE_XSS; p->xstate.xss_low = xstates & XSTATE_XSAVES_ONLY; p->xstate.xss_high = (xstates & XSTATE_XSAVES_ONLY) >> 32; } -else -xstates &= ~XSTATE_XSAVES_ONLY; -for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i ) +/* Subleafs 2+ */ +xstates &= ~XSTATE_FP_SSE; +BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63); +for_each_set_bit ( i, &xstates, 63 ) { -uint64_t curr_xstate = 1UL << i; - -if ( !(xstates & curr_xstate) ) -continue; - -p->xstate.comp[i].size = xstate_sizes[i]; -p->xstate.comp[i].offset = xstate_offsets[i]; -p->xstate.comp[i].xss= curr_xstate & XSTATE_XSAVES_ONLY; -p->xstate.comp[i].align = curr_xstate & xstate_align; +/* + * Pass through size (eax) and offset (ebx) directly. Visibility of + * attributes in ecx limited by visible features in Da1. + */ +p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a; +p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b; +p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask; } } diff --git a/xen/arch/x86/include/asm/xstate.h b/xen/arch/x86/include/asm/xstate.h index f5115199d4f9..bfb66dd766b6 100644 --- a/xen/arch/x86/include/asm/xstate.h +++ b/xen/arch/x86/include/asm/xstate.h @@ -40,6 +40,7 @@ extern uint32_t mxcsr_mask; #define XSTATE_XSAVES_ONLY 0 #define XSTATE_COMPACTION_ENABLED (1ULL << 63) +#define XSTATE_XSS (1U << 0) #define XSTATE_ALIGN64 (1U << 1) extern u64 xfeature_mask; -- 2.30.2
[PATCH v2 4/4] x86/cpuid: Fix handling of xsave dynamic leaves
If max leaf is greater than 0xd but xsave not available to the guest, then the current XSAVE size should not be filled in. This is a latent bug for now as the guest max leaf is 0xd, but will become problematic in the future. The comment concerning XSS state is wrong. VT-x doesn't manage host/guest state automatically, but there is provision for "host only" bits to be set, so the implications are still accurate. Introduce {xstate,hw}_compressed_size() helpers to mirror the uncompressed ones. This in turn highlights a bug in xstate_init(). Defaulting this_cpu(xss) to ~0 requires a forced write to clear it back out. This in turn highlights that it's only a safe default on systems with XSAVES. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné The more I think about it, the more I think that cross-checking with hardware is a bad move. It's horribly expensive, and for supervisor states in particular, liable to interfere with functionality. v2: * Cope with blind reads of CPUID.0xD[1] prior to %xcr0 having been set up. --- xen/arch/x86/cpuid.c | 24 xen/arch/x86/include/asm/xstate.h | 1 + xen/arch/x86/xstate.c | 64 ++- 3 files changed, 72 insertions(+), 17 deletions(-) diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index 7a38e032146a..a822e80c7ea7 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -330,23 +330,15 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf, case XSTATE_CPUID: switch ( subleaf ) { -case 1: -if ( !p->xstate.xsavec && !p->xstate.xsaves ) -break; - -/* - * TODO: Figure out what to do for XSS state. VT-x manages host - * vs guest MSR_XSS automatically, so as soon as we start - * supporting any XSS states, the wrong XSS will be in context. - */ -BUILD_BUG_ON(XSTATE_XSAVES_ONLY != 0); -fallthrough; case 0: -/* - * Read CPUID[0xD,0/1].EBX from hardware. They vary with enabled - * XSTATE, and appropriate XCR0|XSS are in context. 
- */ -res->b = cpuid_count_ebx(leaf, subleaf); +if ( p->basic.xsave ) +res->b = xstate_uncompressed_size(v->arch.xcr0); +break; + +case 1: +if ( p->xstate.xsavec ) +res->b = xstate_compressed_size(v->arch.xcr0 | +v->arch.msrs->xss.raw); break; } break; diff --git a/xen/arch/x86/include/asm/xstate.h b/xen/arch/x86/include/asm/xstate.h index bfb66dd766b6..da1d89d2f416 100644 --- a/xen/arch/x86/include/asm/xstate.h +++ b/xen/arch/x86/include/asm/xstate.h @@ -109,6 +109,7 @@ void xstate_free_save_area(struct vcpu *v); int xstate_alloc_save_area(struct vcpu *v); void xstate_init(struct cpuinfo_x86 *c); unsigned int xstate_uncompressed_size(uint64_t xcr0); +unsigned int xstate_compressed_size(uint64_t xstates); static inline uint64_t xgetbv(unsigned int index) { diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index a94f4025fce5..b2cf3ae6acee 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -614,6 +614,65 @@ unsigned int xstate_uncompressed_size(uint64_t xcr0) return size; } +static unsigned int hw_compressed_size(uint64_t xstates) +{ +uint64_t curr_xcr0 = get_xcr0(), curr_xss = get_msr_xss(); +unsigned int size; +bool ok; + +ok = set_xcr0(xstates & ~XSTATE_XSAVES_ONLY); +ASSERT(ok); +set_msr_xss(xstates & XSTATE_XSAVES_ONLY); + +size = cpuid_count_ebx(XSTATE_CPUID, 1); + +ok = set_xcr0(curr_xcr0); +ASSERT(ok); +set_msr_xss(curr_xss); + +return size; +} + +unsigned int xstate_compressed_size(uint64_t xstates) +{ +unsigned int i, size = XSTATE_AREA_MIN_SIZE; + +if ( xstates == 0 ) /* TODO: clean up paths passing 0 in here. */ +return 0; + +if ( xstates <= (X86_XCR0_SSE | X86_XCR0_FP) ) +return size; + +/* + * For the compressed size, every component matters. Some are + * automatically rounded up to 64 first. + */ +xstates &= ~XSTATE_FP_SSE; +for_each_set_bit ( i, &xstates, 63 ) +{ +if ( test_bit(i, &xstate_align) ) +size = ROUNDUP(size, 64); + +size += xstate_sizes[i]; +} + +/* In debug builds, cross-check our calculation with hardware. 
*/ +if ( IS_ENABLED(CONFIG_DEBUG) ) +{ +unsigned int hwsize; + +xstates |= XSTATE_FP_SSE; +hwsize = hw_compressed_size(xstates); + +if ( size != hwsize ) +printk_once(XENLOG_ERR "%s(%#"PRIx64") size %#x != hwsize %#x\n", +__func__, xstates, size, hwsize); +size = hwsize; +} + +return size; +} + static bool valid_xcr0
[PATCH v2 0/4] x86/xstate: Fixes to size calculations
Various fixes and improvements to xsave image size calculations. Since v1: * Rebase over exposing XSAVEC. This has highlighted several latent bugs. * Rework the uncompressed size handling. LWP / APX_F make for much sadness. Be aware that Intel and AMD have different ABIs for the xstate layout. Andrew Cooper (4): x86/hvm: Defer the size calculation in hvm_save_cpu_xsave_states() x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size() x86/cpu-policy: Simplify recalculate_xstate() x86/cpuid: Fix handling of xsave dynamic leaves xen/arch/x86/cpu-policy.c | 55 ++- xen/arch/x86/cpuid.c | 24 +++ xen/arch/x86/domctl.c | 2 +- xen/arch/x86/hvm/hvm.c| 12 +++- xen/arch/x86/include/asm/xstate.h | 4 +- xen/arch/x86/xstate.c | 111 -- 6 files changed, 145 insertions(+), 63 deletions(-) -- 2.30.2
[PATCH v2 2/4] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()
We're soon going to need a compressed helper of the same form. The size of the uncompressed image depends on the single element with the largest offset+size. Sadly this isn't always the element with the largest index. Retain the cross-check with hardware in debug builds, but forgo it in normal builds. In particular, this means that the migration paths don't need to mess with XCR0 just to sanity check the buffer size. No practical change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné v2: * Scan all features. LWP/APX_F are out-of-order. --- xen/arch/x86/domctl.c | 2 +- xen/arch/x86/hvm/hvm.c| 2 +- xen/arch/x86/include/asm/xstate.h | 2 +- xen/arch/x86/xstate.c | 45 +++ 4 files changed, 43 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 9a72d57333e9..c2f2016ed45a 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -833,7 +833,7 @@ long arch_do_domctl( uint32_t offset = 0; #define PV_XSAVE_HDR_SIZE (2 * sizeof(uint64_t)) -#define PV_XSAVE_SIZE(xcr0) (PV_XSAVE_HDR_SIZE + xstate_ctxt_size(xcr0)) +#define PV_XSAVE_SIZE(xcr0) (PV_XSAVE_HDR_SIZE + xstate_uncompressed_size(xcr0)) ret = -ESRCH; if ( (evc->vcpu >= d->max_vcpus) || diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 9594e0a5c530..9e677d891d6c 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1191,7 +1191,7 @@ HVM_REGISTER_SAVE_RESTORE(CPU, hvm_save_cpu_ctxt, NULL, hvm_load_cpu_ctxt, 1, #define HVM_CPU_XSAVE_SIZE(xcr0) (offsetof(struct hvm_hw_cpu_xsave, \ save_area) + \ - xstate_ctxt_size(xcr0)) + xstate_uncompressed_size(xcr0)) static int cf_check hvm_save_cpu_xsave_states( struct vcpu *v, hvm_domain_context_t *h) diff --git a/xen/arch/x86/include/asm/xstate.h b/xen/arch/x86/include/asm/xstate.h index c08c267884f0..f5115199d4f9 100644 --- a/xen/arch/x86/include/asm/xstate.h +++ b/xen/arch/x86/include/asm/xstate.h @@ -107,7 +107,7 @@ void compress_xsave_states(struct vcpu *v, const void *src, 
unsigned int size); void xstate_free_save_area(struct vcpu *v); int xstate_alloc_save_area(struct vcpu *v); void xstate_init(struct cpuinfo_x86 *c); -unsigned int xstate_ctxt_size(u64 xcr0); +unsigned int xstate_uncompressed_size(uint64_t xcr0); static inline uint64_t xgetbv(unsigned int index) { diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index 99cedb4f5e24..a94f4025fce5 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -183,7 +183,7 @@ void expand_xsave_states(const struct vcpu *v, void *dest, unsigned int size) /* Check there is state to serialise (i.e. at least an XSAVE_HDR) */ BUG_ON(!v->arch.xcr0_accum); /* Check there is the correct room to decompress into. */ -BUG_ON(size != xstate_ctxt_size(v->arch.xcr0_accum)); +BUG_ON(size != xstate_uncompressed_size(v->arch.xcr0_accum)); if ( !(xstate->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED) ) { @@ -245,7 +245,7 @@ void compress_xsave_states(struct vcpu *v, const void *src, unsigned int size) u64 xstate_bv, valid; BUG_ON(!v->arch.xcr0_accum); -BUG_ON(size != xstate_ctxt_size(v->arch.xcr0_accum)); +BUG_ON(size != xstate_uncompressed_size(v->arch.xcr0_accum)); ASSERT(!xsave_area_compressed(src)); xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv; @@ -567,16 +567,51 @@ static unsigned int hw_uncompressed_size(uint64_t xcr0) return size; } -/* Fastpath for common xstate size requests, avoiding reloads of xcr0. */ -unsigned int xstate_ctxt_size(u64 xcr0) +unsigned int xstate_uncompressed_size(uint64_t xcr0) { +unsigned int size = XSTATE_AREA_MIN_SIZE, i; + if ( xcr0 == xfeature_mask ) return xsave_cntxt_size; if ( xcr0 == 0 ) /* TODO: clean up paths passing 0 in here. */ return 0; -return hw_uncompressed_size(xcr0); +if ( xcr0 <= (X86_XCR0_SSE | X86_XCR0_FP) ) +return size; + +/* + * For the non-legacy states, search all active states and find the + * maximum offset+size. Some states (e.g. LWP, APX_F) are out-of-order + * with respect to their index. 
+ */ +xcr0 &= ~XSTATE_FP_SSE; +for_each_set_bit ( i, &xcr0, 63 ) +{ +unsigned int s; + +ASSERT(xstate_offsets[i] && xstate_sizes[i]); + +s = xstate_offsets[i] && xstate_sizes[i]; + +size = max(size, s); +} + +/* In debug builds, cross-check our calculation with hardware. */ +if ( IS_ENABLED(CONFIG_DEBUG) ) +{ +unsigned int hwsize; + +xcr0 |= XSTATE_FP_SSE; +hwsize = hw_uncompressed_size(xcr0); + +if ( size != hwsize ) +printk_once(XENLOG_ERR "%s(%#"
[PATCH v2 1/4] x86/hvm: Defer the size calculation in hvm_save_cpu_xsave_states()
HVM_CPU_XSAVE_SIZE() may rewrite %xcr0 twice. Defer the calculation until after we've decided to write out an XSAVE record. Note in hvm_load_cpu_xsave_states() that there were versions of Xen which wrote out a useless XSAVE record. This sadly limits our ability to tidy up the existing infrastructure. Also leave a note in xstate_ctxt_size() that 0 still needs tolerating for now. No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné v2: * New --- xen/arch/x86/hvm/hvm.c | 10 -- xen/arch/x86/xstate.c | 2 +- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 0ce45b177cf4..9594e0a5c530 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1197,12 +1197,13 @@ static int cf_check hvm_save_cpu_xsave_states( struct vcpu *v, hvm_domain_context_t *h) { struct hvm_hw_cpu_xsave *ctxt; -unsigned int size = HVM_CPU_XSAVE_SIZE(v->arch.xcr0_accum); +unsigned int size; int err; -if ( !cpu_has_xsave || !xsave_enabled(v) ) +if ( !xsave_enabled(v) ) return 0; /* do nothing */ +size = HVM_CPU_XSAVE_SIZE(v->arch.xcr0_accum); err = _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, size); if ( err ) return err; @@ -1255,6 +1256,11 @@ static int cf_check hvm_load_cpu_xsave_states( if ( !cpu_has_xsave ) return -EOPNOTSUPP; +/* + * Note: Xen prior to 4.12 would write out empty XSAVE records for VMs + * running on XSAVE-capable hardware but without XSAVE active. + */ + /* Customized checking for entry since our entry is of variable length */ desc = (struct hvm_save_descriptor *)&h->data[h->cur]; if ( sizeof (*desc) > h->size - h->cur) diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index cf94761d0542..99cedb4f5e24 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -573,7 +573,7 @@ unsigned int xstate_ctxt_size(u64 xcr0) if ( xcr0 == xfeature_mask ) return xsave_cntxt_size; -if ( xcr0 == 0 ) +if ( xcr0 == 0 ) /* TODO: clean up paths passing 0 in here. 
*/ return 0; return hw_uncompressed_size(xcr0); -- 2.30.2
Re: [PATCH 2/5] x86/cpu-policy: Add SVM features already used by Xen
On 29/04/2024 4:16 pm, Andrew Cooper wrote: > diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py > index bf3f9ec01e6e..f07b1f4cf905 100755 > --- a/xen/tools/gen-cpuid.py > +++ b/xen/tools/gen-cpuid.py > @@ -280,6 +280,9 @@ def crunch_numbers(state): > # standard 3DNow in the earlier K6 processors. > _3DNOW: [_3DNOWEXT], > > +# The SVM bit enumerates the whole SVM leave. > +SVM: list(range(NPT, NPT + 32)), This should say leaf. Fixed locally. ~Andrew
[PATCH 1/5] x86/cpu-policy: Infrastructure for the AMD SVM and SEV leaves
Allocate two new feature leaves, and extend cpu_policy with the non-feature fields too. The CPUID dependency between the SVM bit on the whole SVM leaf is intentionally deferred, to avoid transiently breaking nested virt. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Xenia Ragiadakou CC: Sergiy Kibrik CC: George Dunlap CC: Andrei Semenov CC: Vaishali Thakkar --- tools/libs/light/libxl_cpuid.c | 2 ++ tools/misc/xen-cpuid.c | 10 + xen/arch/x86/cpu/common.c | 4 xen/include/public/arch-x86/cpufeatureset.h | 4 xen/include/xen/lib/x86/cpu-policy.h| 24 +++-- xen/lib/x86/cpuid.c | 4 6 files changed, 46 insertions(+), 2 deletions(-) diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c index ce4f3c7095ba..c7a8b76f541d 100644 --- a/tools/libs/light/libxl_cpuid.c +++ b/tools/libs/light/libxl_cpuid.c @@ -342,6 +342,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *policy, const char* str) CPUID_ENTRY(0x0007, 1, CPUID_REG_EDX), MSR_ENTRY(0x10a, CPUID_REG_EAX), MSR_ENTRY(0x10a, CPUID_REG_EDX), +CPUID_ENTRY(0x800a, NA, CPUID_REG_EDX), +CPUID_ENTRY(0x801f, NA, CPUID_REG_EAX), #undef MSR_ENTRY #undef CPUID_ENTRY }; diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c index 8893547bebce..ab09410a05d6 100644 --- a/tools/misc/xen-cpuid.c +++ b/tools/misc/xen-cpuid.c @@ -264,6 +264,14 @@ static const char *const str_m10Ah[32] = { }; +static const char *const str_eAd[32] = +{ +}; + +static const char *const str_e1Fa[32] = +{ +}; + static const struct { const char *name; const char *abbr; @@ -288,6 +296,8 @@ static const struct { { "CPUID 0x0007:1.edx", "7d1", str_7d1 }, { "MSR_ARCH_CAPS.lo", "m10Al", str_m10Al }, { "MSR_ARCH_CAPS.hi", "m10Ah", str_m10Ah }, +{ "CPUID 0x800a.edx", "eAd", str_eAd }, +{ "CPUID 0x801f.eax", "e1Fa", str_e1Fa }, }; #define COL_ALIGN "24" diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c index 28d7f34c4dbe..25b11e6472b8 100644 --- 
a/xen/arch/x86/cpu/common.c +++ b/xen/arch/x86/cpu/common.c @@ -477,6 +477,10 @@ static void generic_identify(struct cpuinfo_x86 *c) c->x86_capability[FEATURESET_e7d] = cpuid_edx(0x8007); if (c->extended_cpuid_level >= 0x8008) c->x86_capability[FEATURESET_e8b] = cpuid_ebx(0x8008); + if (c->extended_cpuid_level >= 0x800a) + c->x86_capability[FEATURESET_eAd] = cpuid_edx(0x800a); + if (c->extended_cpuid_level >= 0x801f) + c->x86_capability[FEATURESET_e1Fa] = cpuid_eax(0x801f); if (c->extended_cpuid_level >= 0x8021) c->x86_capability[FEATURESET_e21a] = cpuid_eax(0x8021); diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 53f13dec31f7..0f869214811e 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -357,6 +357,10 @@ XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register File(s) cleared by VE /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */ +/* AMD-defined CPU features, CPUID level 0x800a.edx, word 18 */ + +/* AMD-defined CPU features, CPUID level 0x801f.eax, word 19 */ + #endif /* XEN_CPUFEATURE */ /* Clean up from a default include. Close the enum (for C). */ diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h index d5e447e9dc06..936e00e4da73 100644 --- a/xen/include/xen/lib/x86/cpu-policy.h +++ b/xen/include/xen/lib/x86/cpu-policy.h @@ -22,6 +22,8 @@ #define FEATURESET_7d1 15 /* 0x0007:1.edx*/ #define FEATURESET_m10Al 16 /* 0x010a.eax */ #define FEATURESET_m10Ah 17 /* 0x010a.edx */ +#define FEATURESET_eAd 18 /* 0x800a.edx */ +#define FEATURESET_e1Fa 19 /* 0x801f.eax */ struct cpuid_leaf { @@ -296,7 +298,16 @@ struct cpu_policy uint32_t /* d */:32; uint64_t :64, :64; /* Leaf 0x8009. */ -uint64_t :64, :64; /* Leaf 0x800a - SVM rev and features. */ + +/* Leaf 0x800a - SVM rev and features. 
*/ +uint8_t svm_rev, :8, :8, :8; +uint32_t /* b */ :32; +uint32_t nr_asids; +union { +uint32_t eAd; +struct { DECL_BITFIELD(eAd); }; +}; + uint64_t :64, :64; /* Leaf 0x800b. */ uint64_t :64, :64; /* Leaf 0x800c. */ uint64_t :64, :64; /* Leaf
[PATCH 4/5] x86/svm: Switch SVM features over normal cpu_has_*
Delete the boot time rendering of advanced features. It's entirely ad-hoc and not even everything printed here is used by Xen. It is available in `xen-cpuid` now. With (only) svm_load_segs_{,prefetch}() declared now in svm.h, only svm.c and domain.c which need the header. Clean up all others. No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Xenia Ragiadakou CC: Sergiy Kibrik CC: George Dunlap CC: Andrei Semenov CC: Vaishali Thakkar --- xen/arch/x86/hvm/svm/asid.c| 5 ++- xen/arch/x86/hvm/svm/emulate.c | 3 +- xen/arch/x86/hvm/svm/intr.c| 1 - xen/arch/x86/hvm/svm/nestedsvm.c | 14 xen/arch/x86/hvm/svm/svm.c | 50 +++--- xen/arch/x86/hvm/svm/vmcb.c| 1 - xen/arch/x86/include/asm/cpufeature.h | 10 ++ xen/arch/x86/include/asm/hvm/svm/svm.h | 36 --- 8 files changed, 31 insertions(+), 89 deletions(-) diff --git a/xen/arch/x86/hvm/svm/asid.c b/xen/arch/x86/hvm/svm/asid.c index 7977a8e86b53..6117a362d310 100644 --- a/xen/arch/x86/hvm/svm/asid.c +++ b/xen/arch/x86/hvm/svm/asid.c @@ -6,7 +6,6 @@ #include #include -#include #include "svm.h" @@ -39,7 +38,7 @@ void svm_asid_handle_vmrun(void) { vmcb_set_asid(vmcb, true); vmcb->tlb_control = -cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL; +cpu_has_flush_by_asid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL; return; } @@ -48,7 +47,7 @@ void svm_asid_handle_vmrun(void) vmcb->tlb_control = !need_flush ? TLB_CTRL_NO_FLUSH : -cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL; +cpu_has_flush_by_asid ? 
TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL; } /* diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c index 93ac1d3435f9..da6e21b2e270 100644 --- a/xen/arch/x86/hvm/svm/emulate.c +++ b/xen/arch/x86/hvm/svm/emulate.c @@ -11,7 +11,6 @@ #include #include #include -#include #include #include "svm.h" @@ -20,7 +19,7 @@ static unsigned long svm_nextrip_insn_length(struct vcpu *v) { struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb; -if ( !cpu_has_svm_nrips ) +if ( !cpu_has_nrips ) return 0; #ifndef NDEBUG diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c index 4805c5567213..facd2894a2c6 100644 --- a/xen/arch/x86/hvm/svm/intr.c +++ b/xen/arch/x86/hvm/svm/intr.c @@ -17,7 +17,6 @@ #include #include #include -#include #include /* for nestedhvm_vcpu_in_guestmode */ #include #include diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c index 35a2cbfd7d13..255af112661f 100644 --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -6,7 +6,6 @@ */ #include -#include #include #include #include @@ -1620,7 +1619,7 @@ void svm_nested_features_on_efer_update(struct vcpu *v) { if ( !vmcb->virt_ext.fields.vloadsave_enable && paging_mode_hap(v->domain) && - cpu_has_svm_vloadsave ) + cpu_has_v_loadsave ) { vmcb->virt_ext.fields.vloadsave_enable = 1; general2_intercepts = vmcb_get_general2_intercepts(vmcb); @@ -1629,8 +1628,7 @@ void svm_nested_features_on_efer_update(struct vcpu *v) vmcb_set_general2_intercepts(vmcb, general2_intercepts); } -if ( !vmcb->_vintr.fields.vgif_enable && - cpu_has_svm_vgif ) +if ( !vmcb->_vintr.fields.vgif_enable && cpu_has_v_gif ) { vintr = vmcb_get_vintr(vmcb); vintr.fields.vgif = svm->ns_gif; @@ -1675,8 +1673,8 @@ void __init start_nested_svm(struct hvm_function_table *hvm_function_table) */ hvm_function_table->caps.nested_virt = hvm_function_table->caps.hap && -cpu_has_svm_lbrv && -cpu_has_svm_nrips && -cpu_has_svm_flushbyasid && -cpu_has_svm_decode; +cpu_has_v_lbr 
&& +cpu_has_nrips && +cpu_has_flush_by_asid && +cpu_has_decode_assist; } diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 4719fffae589..16eb875aab94 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1287,7 +1287,7 @@ static void cf_check svm_inject_event(const struct x86_event *event) * that hardware doesn't perform DPL checking on injection. */ if ( event->type == X86_EVENTTYPE_PRI_SW_EXCEPTION || - (!cpu_has_svm_nrips && (event->type >= X86_EVENTTYPE_SW_INTERRUPT)) ) + (!cpu_has_nrips && (event->type >= X86_EVENTTYPE_SW_INTERRUPT)) ) svm_emul_swint_injection(&_event);
[PATCH 0/5] x86: AMD CPUID handling improvements
This is (half) the series I've promised various people, to untangle some AMD CPUID handling.  It moves the SVM feature leaf into the standard x86_capabilities[] infrastructure.

On a random Milan system, with this in place, xen-cpuid reports:

  Dynamic sets:
  Raw      178bfbff:7eda320b:2fd3fbff:75c237ff:000f:219c95a9:0040069c:6799:91bef75f:::204d:::::::119b9cff:0101fd3f
    [18] CPUID 0x800a.edx npt v-lbr svm-lock nrips v-tsc-rate vmcb-cleanbits flush-by-asid decode-assist pause-filter <11> pause-thresh v-loadsave v-gif <17> npt-sss v-spec-ctrl <23> <24> <28>
    [19] CPUID 0x801f.eax sme sev <2> sev-es sev-snp <5> <8> <10> <11> <12> <13> <14> <15> <16> <24>
  Host     178bf3ff:76da320b:2fd3fbff:644037ff:000f:219c95a9:0040068c:0780:319ed205:::1845:::::::001994ff:
    [18] CPUID 0x800a.edx npt v-lbr svm-lock nrips v-tsc-rate vmcb-cleanbits flush-by-asid decode-assist pause-filter pause-thresh v-loadsave v-gif npt-sss v-spec-ctrl
    [19] CPUID 0x801f.eax
  HVM Max  178bfbff:f6fa3203:2e500800:440001f7:000f:219c05a9:0040060c:0100:331ed005:::1845:::::::04ab:
    [18] CPUID 0x800a.edx npt v-lbr nrips vmcb-cleanbits decode-assist pause-filter
    [19] CPUID 0x801f.eax

Unfortunately, I haven't managed to do the second half to make the host_policy usable earlier on boot.  Untangling __setup_xen() is proving stubborn due to (ab)uses of the MB1 module list.
Andrew Cooper (5): x86/cpu-policy: Infrastructure for the AMD SVM and SEV leaves x86/cpu-policy: Add SVM features already used by Xen x86/spec-ctrl: Remove open-coded check of SVM_FEATURE_SPEC_CTRL x86/svm: Switch SVM features over normal cpu_has_* x86/cpu-policy: Introduce some SEV features tools/libs/light/libxl_cpuid.c | 2 + tools/misc/xen-cpuid.c | 24 ++ xen/arch/x86/cpu-policy.c | 17 +++ xen/arch/x86/cpu/common.c | 4 ++ xen/arch/x86/hvm/svm/asid.c | 5 +-- xen/arch/x86/hvm/svm/emulate.c | 3 +- xen/arch/x86/hvm/svm/intr.c | 1 - xen/arch/x86/hvm/svm/nestedsvm.c| 14 +++--- xen/arch/x86/hvm/svm/svm.c | 50 + xen/arch/x86/hvm/svm/vmcb.c | 1 - xen/arch/x86/include/asm/cpufeature.h | 16 +++ xen/arch/x86/include/asm/hvm/svm/svm.h | 36 --- xen/arch/x86/spec_ctrl.c| 7 +-- xen/include/public/arch-x86/cpufeatureset.h | 22 + xen/include/xen/lib/x86/cpu-policy.h| 24 +- xen/lib/x86/cpuid.c | 4 ++ xen/tools/gen-cpuid.py | 7 +++ 17 files changed, 128 insertions(+), 109 deletions(-) -- 2.30.2
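As an aside for anyone decoding the featureset lines in the cover letter by hand: each colon-separated field is one 32-bit feature word, with word 18 being CPUID 0x800a.edx and word 19 being CPUID 0x801f.eax in this series. A minimal sketch of the naming step follows — `decode_word` and `EAD_NAMES` are illustrative helpers written for this note, not anything in the patches; the bit positions mirror those added in patch 2.

```python
# Sketch: name the set bits in one 32-bit word of a xen-cpuid featureset
# string.  EAD_NAMES mirrors the word 18 (CPUID 0x800a.edx) names used in
# this series; unknown bits are rendered as "<n>", as xen-cpuid does.
EAD_NAMES = {
    0: "npt",           1: "v-lbr",          2: "svm-lock",
    3: "nrips",         4: "v-tsc-rate",     5: "vmcb-cleanbits",
    6: "flush-by-asid", 7: "decode-assist", 10: "pause-filter",
    12: "pause-thresh", 15: "v-loadsave",   16: "v-gif",
    19: "npt-sss",      20: "v-spec-ctrl",
}

def decode_word(featureset, word, names):
    """Return the names of the bits set in one word of a featureset."""
    val = int(featureset.split(":")[word] or "0", 16)
    return [names.get(bit, "<%u>" % bit) for bit in range(32)
            if val & (1 << bit)]
```

Feeding it the Host line from the cover letter reproduces the word 18 names that xen-cpuid printed.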
[PATCH 5/5] x86/cpu-policy: Introduce some SEV features
For display purposes only right now. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Xenia Ragiadakou CC: Sergiy Kibrik CC: George Dunlap CC: Andrei Semenov CC: Vaishali Thakkar This is only half the work to get SEV working nicely. The other half (rearranging __start_xen() so we can move the host policy collection earlier) is still a work-in-progress. --- tools/misc/xen-cpuid.c | 3 +++ xen/arch/x86/include/asm/cpufeature.h | 3 +++ xen/include/public/arch-x86/cpufeatureset.h | 4 xen/tools/gen-cpuid.py | 6 +- 4 files changed, 15 insertions(+), 1 deletion(-) diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c index 0d01b0e797f1..1463e0429ba1 100644 --- a/tools/misc/xen-cpuid.c +++ b/tools/misc/xen-cpuid.c @@ -281,6 +281,9 @@ static const char *const str_eAd[32] = static const char *const str_e1Fa[32] = { +[ 0] = "sme", [ 1] = "sev", +/* 2 */ [ 3] = "sev-es", +[ 4] = "sev-snp", }; static const struct { diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h index b6fb8c24423c..732f0d2bf758 100644 --- a/xen/arch/x86/include/asm/cpufeature.h +++ b/xen/arch/x86/include/asm/cpufeature.h @@ -230,6 +230,9 @@ static inline bool boot_cpu_has(unsigned int feat) #define cpu_has_v_gif boot_cpu_has(X86_FEATURE_V_GIF) #define cpu_has_v_spec_ctrl boot_cpu_has(X86_FEATURE_V_SPEC_CTRL) +/* CPUID level 0x801f.eax */ +#define cpu_has_sev boot_cpu_has(X86_FEATURE_SEV) + /* Synthesized. 
*/ #define cpu_has_arch_perfmonboot_cpu_has(X86_FEATURE_ARCH_PERFMON) #define cpu_has_cpuid_faulting boot_cpu_has(X86_FEATURE_CPUID_FAULTING) diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 80d252a38c2d..7ee0f2329151 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -374,6 +374,10 @@ XEN_CPUFEATURE(NPT_SSS,18*32+19) /* NPT Supervisor Shadow Stacks * XEN_CPUFEATURE(V_SPEC_CTRL,18*32+20) /* Virtualised MSR_SPEC_CTRL */ /* AMD-defined CPU features, CPUID level 0x801f.eax, word 19 */ +XEN_CPUFEATURE(SME,19*32+ 0) /* Secure Memory Encryption */ +XEN_CPUFEATURE(SEV,19*32+ 1) /* Secure Encryped VM */ +XEN_CPUFEATURE(SEV_ES, 19*32+ 3) /* SEV Encrypted State */ +XEN_CPUFEATURE(SEV_SNP,19*32+ 4) /* SEV Secure Nested Paging */ #endif /* XEN_CPUFEATURE */ diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py index f07b1f4cf905..bff4d9389ff6 100755 --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -281,7 +281,7 @@ def crunch_numbers(state): _3DNOW: [_3DNOWEXT], # The SVM bit enumerates the whole SVM leave. -SVM: list(range(NPT, NPT + 32)), +SVM: list(range(NPT, NPT + 32)) + [SEV], # This is just the dependency between AVX512 and AVX2 of XSTATE # feature flags. If want to use AVX512, AVX2 must be supported and @@ -341,6 +341,10 @@ def crunch_numbers(state): # The behaviour described by RRSBA depend on eIBRS being active. EIBRS: [RRSBA], + +SEV: [SEV_ES], + +SEV_ES: [SEV_SNP], } deep_features = tuple(sorted(deps.keys())) -- 2.30.2
[PATCH 3/5] x86/spec-ctrl: Remove open-coded check of SVM_FEATURE_SPEC_CTRL
Now that the SVM feature leaf has been included in normal feature handling, it is available early enough for init_speculation_mitigations() to use. No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Xenia Ragiadakou CC: Sergiy Kibrik CC: George Dunlap CC: Andrei Semenov CC: Vaishali Thakkar --- xen/arch/x86/include/asm/cpufeature.h | 3 +++ xen/arch/x86/spec_ctrl.c | 7 +-- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h index 9bc553681f4a..77cfd900cb56 100644 --- a/xen/arch/x86/include/asm/cpufeature.h +++ b/xen/arch/x86/include/asm/cpufeature.h @@ -217,6 +217,9 @@ static inline bool boot_cpu_has(unsigned int feat) #define cpu_has_rfds_no boot_cpu_has(X86_FEATURE_RFDS_NO) #define cpu_has_rfds_clear boot_cpu_has(X86_FEATURE_RFDS_CLEAR) +/* CPUID level 0x800a.edx */ +#define cpu_has_v_spec_ctrl boot_cpu_has(X86_FEATURE_V_SPEC_CTRL) + /* Synthesized. */ #define cpu_has_arch_perfmonboot_cpu_has(X86_FEATURE_ARCH_PERFMON) #define cpu_has_cpuid_faulting boot_cpu_has(X86_FEATURE_CPUID_FAULTING) diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c index 40f6ae017010..0bda9d01def5 100644 --- a/xen/arch/x86/spec_ctrl.c +++ b/xen/arch/x86/spec_ctrl.c @@ -11,7 +11,6 @@ #include #include -#include #include #include #include @@ -1896,12 +1895,8 @@ void __init init_speculation_mitigations(void) * * No need for SCF_ist_sc_msr because Xen's value is restored * atomically WRT NMIs in the VMExit path. - * - * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot. */ -if ( opt_msr_sc_hvm && - (boot_cpu_data.extended_cpuid_level >= 0x800aU) && - (cpuid_edx(0x800aU) & (1u << SVM_FEATURE_SPEC_CTRL)) ) +if ( opt_msr_sc_hvm && cpu_has_v_spec_ctrl ) setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM); } -- 2.30.2
[PATCH 2/5] x86/cpu-policy: Add SVM features already used by Xen
These will replace svm_feature_flags and the SVM_FEATURE_* constants over the next few changes. Take the opportunity to rationalise some names. Drop the opencoded "inherit from host" logic in calculate_hvm_max_policy() and use 'h'/'!' annotations. The logic needs to operate on fs, not the policy object, given its position within the function. Drop some trailing whitespace introduced when this block of code was last moved. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Xenia Ragiadakou CC: Sergiy Kibrik CC: George Dunlap CC: Andrei Semenov CC: Vaishali Thakkar --- tools/misc/xen-cpuid.c | 11 +++ xen/arch/x86/cpu-policy.c | 17 + xen/include/public/arch-x86/cpufeatureset.h | 14 ++ xen/tools/gen-cpuid.py | 3 +++ 4 files changed, 33 insertions(+), 12 deletions(-) diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c index ab09410a05d6..0d01b0e797f1 100644 --- a/tools/misc/xen-cpuid.c +++ b/tools/misc/xen-cpuid.c @@ -266,6 +266,17 @@ static const char *const str_m10Ah[32] = static const char *const str_eAd[32] = { +[ 0] = "npt", [ 1] = "v-lbr", +[ 2] = "svm-lock",[ 3] = "nrips", +[ 4] = "v-tsc-rate", [ 5] = "vmcb-cleanbits", +[ 6] = "flush-by-asid", [ 7] = "decode-assist", + +[10] = "pause-filter", +[12] = "pause-thresh", +/* 14 */ [15] = "v-loadsave", +[16] = "v-gif", +/* 18 */ [19] = "npt-sss", +[20] = "v-spec-ctrl", }; static const char *const str_e1Fa[32] = diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index 4b6d96276399..da4401047e89 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -9,7 +9,6 @@ #include #include #include -#include #include #include #include @@ -748,22 +747,16 @@ static void __init calculate_hvm_max_policy(void) if ( !cpu_has_vmx ) __clear_bit(X86_FEATURE_PKS, fs); -/* +/* * Make adjustments to possible (nested) virtualization features exposed * to the guest */ -if ( p->extd.svm ) +if ( test_bit(X86_FEATURE_SVM, fs) ) { -/* Clamp to implemented 
features which require hardware support. */ -p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) | - (1u << SVM_FEATURE_LBRV) | - (1u << SVM_FEATURE_NRIPS) | - (1u << SVM_FEATURE_PAUSEFILTER) | - (1u << SVM_FEATURE_DECODEASSISTS)); -/* Enable features which are always emulated. */ -p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN); +/* Xen always emulates cleanbits. */ +__set_bit(X86_FEATURE_VMCB_CLEANBITS, fs); } - + guest_common_max_feature_adjustments(fs); guest_common_feature_adjustments(fs); diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 0f869214811e..80d252a38c2d 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -358,6 +358,20 @@ XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register File(s) cleared by VE /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */ /* AMD-defined CPU features, CPUID level 0x800a.edx, word 18 */ +XEN_CPUFEATURE(NPT,18*32+ 0) /*h Nested PageTables */ +XEN_CPUFEATURE(V_LBR, 18*32+ 1) /*h Virtualised LBR */ +XEN_CPUFEATURE(SVM_LOCK, 18*32+ 2) /* SVM locking MSR */ +XEN_CPUFEATURE(NRIPS, 18*32+ 3) /*h Next-RIP saved on VMExit */ +XEN_CPUFEATURE(V_TSC_RATE, 18*32+ 4) /* Virtualised TSC Ratio */ +XEN_CPUFEATURE(VMCB_CLEANBITS, 18*32+ 5) /*! 
VMCB Clean-bits */ +XEN_CPUFEATURE(FLUSH_BY_ASID, 18*32+ 6) /* TLB Flush by ASID */ +XEN_CPUFEATURE(DECODE_ASSIST, 18*32+ 7) /*h Decode assists */ +XEN_CPUFEATURE(PAUSE_FILTER, 18*32+10) /*h Pause filter */ +XEN_CPUFEATURE(PAUSE_THRESH, 18*32+12) /* Pause filter threshold */ +XEN_CPUFEATURE(V_LOADSAVE, 18*32+15) /* Virtualised VMLOAD/SAVE */ +XEN_CPUFEATURE(V_GIF, 18*32+16) /* Virtualised GIF */ +XEN_CPUFEATURE(NPT_SSS,18*32+19) /* NPT Supervisor Shadow Stacks */ +XEN_CPUFEATURE(V_SPEC_CTRL,18*32+20) /* Virtualised MSR_SPEC_CTRL */ /* AMD-defined CPU features, CPUID level 0x801f.eax, word 19 */ diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py index bf3f9ec01e6e..f07b1f4cf905 100755 --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -280,6 +280,9 @@ def crunc
Re: [PATCH] revert "x86/mm: re-implement get_page_light() using an atomic increment"
On 29/04/2024 2:01 pm, Jan Beulich wrote: > revert "x86/mm: re-implement get_page_light() using an atomic increment" > > This reverts commit c40bc0576dcc5acd4d7e22ef628eb4642f568533. > > That change aimed at eliminating an open-coded lock-like construct, > which really isn't all that similar to, in particular, get_page(). The > function always succeeds. Any remaining concern would want taking care > of by placing block_lock_speculation() at the end of the function. > Since the function is called only during page (de)validation, any > possible performance concerns over such extra serialization could > likely be addressed by pre-validating (e.g. via pinning) page tables. > > The fundamental issue with the change being reverted is that it detects > bad state only after already having caused possible corruption. While > the system is going to be halted in such an event, there is a time > window during which the resulting incorrect state could be leveraged by > a clever (in particular: fast enough) attacker. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH] x86/cpu-policy: Annotate the accumulated features
On 29/04/2024 8:49 am, Jan Beulich wrote:
> On 26.04.2024 18:08, Andrew Cooper wrote:
>> Some features need accumulating rather than intersecting to make migration
>> safe.  Introduce the new '|' attribute for this purpose.
>>
>> Right now, it's only used by the Xapi toolstack, but it will be used by
>> xl/libxl when the full policy-object work is complete, and until then it's
>> still a useful hint for hand-crafted cpuid= lines in vm.cfg files.
>>
>> Signed-off-by: Andrew Cooper
> Reviewed-by: Jan Beulich

Thanks.

> The one feature you don't annotate such that I'm not entirely sure about is
> NO_FPU_SEL: On one hand it tells code not to rely on / use the selector
> fields in FPU state.

It's sadly far more complicated than this.

This feature, and its AMD partner RSTR_FP_ERR_PTRS, were introduced to stop Windows BSOD-ing under virt, and came with an accidental breakage of x86emul/DoSBox/etc which Intel and AMD have declined to fix.

If you recall, prior to these features, the hypervisor needed to divine the operand size of Windows' {F,}X{SAVE,RESTORE} instructions, as it blindly does a memcmp() across the region to confirm that the interrupt handler didn't corrupt any state.

Both features are discontinuities across which it is not safe to migrate.  Advertising "reliably store as 0" on older systems will still cause Windows to BSOD on occasion, whereas not advertising it on newer systems will suggest that legacy emulators will work, when they don't.

I don't have any good idea of how to make this work nicely, other than having some admin booleans in the xl.cfg file saying "I certify I'm not using a legacy emulator in my VM" to override the safety check.

Other discontinuities I'm aware of are the introduction of DAZ (changing MXCSR_MASK), and any migration which changes LBR_FORMAT.

~Andrew
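As a sketch of the distinction the '|' annotation captures — a hypothetical helper, not toolstack code: most feature bits are only safe to advertise if every host in the migration pool has them (intersection), whereas '|'-annotated bits should be visible to a guest if anywhere it might run has them (accumulation).

```python
# Sketch of pool-wide feature levelling.  accumulate_mask marks the
# '|'-annotated bits: visible to the guest if *any* host in the pool has
# them; every other bit is visible only if *every* host has it.
def level_features(host_featuresets, accumulate_mask):
    intersected, accumulated = ~0, 0
    for fs in host_featuresets:
        intersected &= fs
        accumulated |= fs
    return (intersected & ~accumulate_mask) | (accumulated & accumulate_mask)
```

E.g. with hosts 0b1011 and 0b0110 and bit 3 annotated '|', the levelled set is 0b1010 rather than the plain intersection 0b0010.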
Re: [PATCH] xen/x86: Fix Syntax warning in gen-cpuid.py
On 26/04/2024 5:07 am, Jason Andryuk wrote: > Python 3.12.2 warns: > > xen/tools/gen-cpuid.py:50: SyntaxWarning: invalid escape sequence '\s' > "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)" > xen/tools/gen-cpuid.py:51: SyntaxWarning: invalid escape sequence '\s' > "\s+/\*([\w!]*) .*$") > > Specify the strings as raw strings so '\s' is read as literal '\' + 's'. > This avoids escaping all the '\'s in the strings. > > Signed-off-by: Jason Andryuk That's something I didn't know about how python does string concatenation. I was expecting the whole string to be considered raw, not just the first line. Acked-by: Andrew Cooper I'll rebase my pending change altering the regex over this.
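For anyone else caught out by the detail above: the r prefix applies per string literal, not to an implicitly-concatenated whole. A minimal illustration:

```python
# Adjacent string literals are concatenated at compile time, but each
# keeps its own escaping rules: only the first literal below is raw.
pattern = (r"\s+foo"   # raw: backslash and 's' stay two separate characters
           "\t")       # not raw: a single tab character
assert pattern == "\\s+foo\t"   # 7 characters in total
```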
[PATCH] x86/cpu-policy: Annotate the accumulated features
Some features need accumulating rather than intersecting to make migration safe. Introduce the new '|' attribute for this purpose. Right now, it's only used by the Xapi toolstack, but it will be used by xl/libxl when the full policy-object work is complete, and until then it's still a useful hint for hand-crafted cpuid= lines in vm.cfg files. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/include/public/arch-x86/cpufeatureset.h | 15 ++- xen/tools/gen-cpuid.py | 7 +-- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 53f13dec31f7..6627453e3985 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -72,6 +72,11 @@ enum { * 'H' = HVM HAP guests (not PV or HVM Shadow guests). * Upper case => Available by default * Lower case => Can be opted-in to, but not available by default. + * + * Migration: '|' + * This bit should be visible to a guest if any anywhere it might run has + * the bit set. i.e. it needs accumulating across the migration pool, + * rather than intersecting. */ /* Intel-defined CPU features, CPUID level 0x0001.edx, word 0 */ @@ -248,7 +253,7 @@ XEN_CPUFEATURE(IBRS_ALWAYS, 8*32+16) /*S IBRS preferred always on */ XEN_CPUFEATURE(STIBP_ALWAYS, 8*32+17) /*S STIBP preferred always on */ XEN_CPUFEATURE(IBRS_FAST, 8*32+18) /*S IBRS preferred over software options */ XEN_CPUFEATURE(IBRS_SAME_MODE, 8*32+19) /*S IBRS provides same-mode protection */ -XEN_CPUFEATURE(NO_LMSL, 8*32+20) /*S EFER.LMSLE no longer supported. */ +XEN_CPUFEATURE(NO_LMSL, 8*32+20) /*S| EFER.LMSLE no longer supported. */ XEN_CPUFEATURE(AMD_PPIN, 8*32+23) /* Protected Processor Inventory Number */ XEN_CPUFEATURE(AMD_SSBD, 8*32+24) /*S MSR_SPEC_CTRL.SSBD available */ XEN_CPUFEATURE(VIRT_SSBD, 8*32+25) /*! 
MSR_VIRT_SPEC_CTRL.SSBD */ @@ -263,7 +268,7 @@ XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A AVX512 Multiply Accumulation Single XEN_CPUFEATURE(FSRM, 9*32+ 4) /*A Fast Short REP MOVS */ XEN_CPUFEATURE(AVX512_VP2INTERSECT, 9*32+8) /*a VP2INTERSECT{D,Q} insns */ XEN_CPUFEATURE(SRBDS_CTRL,9*32+ 9) /* MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS. */ -XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*!A VERW clears microarchitectural buffers */ +XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*!A| VERW clears microarchitectural buffers */ XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! RTM disabled (but XBEGIN wont fault) */ XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */ XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A SERIALIZE insn */ @@ -292,7 +297,7 @@ XEN_CPUFEATURE(AVX_IFMA, 10*32+23) /*A AVX-IFMA Instructions */ /* AMD-defined CPU features, CPUID level 0x8021.eax, word 11 */ XEN_CPUFEATURE(NO_NEST_BP, 11*32+ 0) /*A No Nested Data Breakpoints */ -XEN_CPUFEATURE(FS_GS_NS, 11*32+ 1) /*S FS/GS base MSRs non-serialising */ +XEN_CPUFEATURE(FS_GS_NS, 11*32+ 1) /*S| FS/GS base MSRs non-serialising */ XEN_CPUFEATURE(LFENCE_DISPATCH,11*32+ 2) /*A LFENCE always serializing */ XEN_CPUFEATURE(NSCB, 11*32+ 6) /*A Null Selector Clears Base (and limit too) */ XEN_CPUFEATURE(AUTO_IBRS, 11*32+ 8) /*S Automatic IBRS */ @@ -343,7 +348,7 @@ XEN_CPUFEATURE(DOITM, 16*32+12) /* Data Operand Invariant Timing XEN_CPUFEATURE(SBDR_SSDP_NO, 16*32+13) /*A No Shared Buffer Data Read or Sideband Stale Data Propagation */ XEN_CPUFEATURE(FBSDP_NO, 16*32+14) /*A No Fill Buffer Stale Data Propagation */ XEN_CPUFEATURE(PSDP_NO,16*32+15) /*A No Primary Stale Data Propagation */ -XEN_CPUFEATURE(FB_CLEAR, 16*32+17) /*!A Fill Buffers cleared by VERW */ +XEN_CPUFEATURE(FB_CLEAR, 16*32+17) /*!A| Fill Buffers cleared by VERW */ XEN_CPUFEATURE(FB_CLEAR_CTRL, 16*32+18) /* MSR_OPT_CPU_CTRL.FB_CLEAR_DIS */ XEN_CPUFEATURE(RRSBA, 16*32+19) /*! 
Restricted RSB Alternative */ XEN_CPUFEATURE(BHI_NO, 16*32+20) /*A No Branch History Injection */ @@ -353,7 +358,7 @@ XEN_CPUFEATURE(PBRSB_NO, 16*32+24) /*A No Post-Barrier RSB prediction XEN_CPUFEATURE(GDS_CTRL, 16*32+25) /* MCU_OPT_CTRL.GDS_MIT_{DIS,LOCK} */ XEN_CPUFEATURE(GDS_NO, 16*32+26) /*A No Gather Data Sampling */ XEN_CPUFEATURE(RFDS_NO,16*32+27) /*A No Register File Data Sampling */ -XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register File(s) cleared by VERW */ +XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A| Register File(s) cleared by VERW */ /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */ diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py index bf3f9ec01e6e..1fb76f664529 100755 --- a/xen/too
Re: [PATCH 1/3] x86/hvm/trace: Use a different trace type for AMD processors
On 26/04/2024 4:29 pm, George Dunlap wrote: > On Fri, Apr 26, 2024 at 4:18 PM Andrew Cooper > wrote: >> On 26/04/2024 3:32 pm, George Dunlap wrote: >>> In xenalyze, first remove the redundant call to init_hvm_data(); >>> there's no way to get to hvm_vmexit_process() without it being already >>> initialized by the set_vcpu_type call in hvm_process(). >>> >>> Replace this with set_hvm_exit_reson_data(), and move setting of >>> hvm->exit_reason_* into that function. >>> >>> Modify hvm_process and hvm_vmexit_process to handle all four potential >>> values appropriately. >>> >>> If SVM entries are encountered, set opt.svm_mode so that other >>> SVM-specific functionality is triggered. >> Given that xenalyze is now closely tied to Xen, and that we're >> technically changing the ABI here, is there any point keeping `--svm-mode` ? >> >> I'm unsure of the utility of reading the buggy trace records from an >> older version of Xen. > Yeah, I thought about that. If nobody argues to keep it, I guess I'll > rip it out for v2. That's the way I'd suggest going. >>> Signed-off-by: George Dunlap >>> --- >>> NB that this patch goes on top of Andrew's trace cleanup series: >>> >>> https://lore.kernel.org/xen-devel/20240318163552.3808695-1-andrew.coop...@citrix.com/ >> The delta in Xen is trivial. I'm happy if you want to commit this, and >> I can rebase over it. > It's trivial in part *because of* your series. Without your series I > would have had to do a bunch of munging around with DO_TRC_BLAH_BLAH > to make it compile. In fact, I started doing it directly on staging, > but quickly moved onto your series to save myself some time. :-) Ah. I'll be refreshing mine next week. Given that it's missed everything since 4.16, I'm not intending to let it miss this one... But I've still got a few things which need to go out before the past-post deadline. ~Andrew
Re: [PATCH 1/3] x86/hvm/trace: Use a different trace type for AMD processors
On 26/04/2024 3:32 pm, George Dunlap wrote: > A long-standing usability sub-optimality with xenalyze is the > necessity to specify `--svm-mode` when analyzing AMD processors. This > fundamentally comes about because the same trace event ID is used for > both VMX and SVM, but the contents of the trace must be interpreted > differently. > > Instead, allocate separate trace events for VMX and SVM vmexits in > Xen; this will allow all readers to properly intrepret the meaning of interpret ? > the vmexit reason. > > In xenalyze, first remove the redundant call to init_hvm_data(); > there's no way to get to hvm_vmexit_process() without it being already > initialized by the set_vcpu_type call in hvm_process(). > > Replace this with set_hvm_exit_reson_data(), and move setting of > hvm->exit_reason_* into that function. > > Modify hvm_process and hvm_vmexit_process to handle all four potential > values appropriately. > > If SVM entries are encountered, set opt.svm_mode so that other > SVM-specific functionality is triggered. Given that xenalyze is now closely tied to Xen, and that we're technically changing the ABI here, is there any point keeping `--svm-mode` ? I'm unsure of the utility of reading the buggy trace records from an older version of Xen. > Also add lines in `formats` for xentrace_format. Personally I'd have put patch 3 first, and reduced the churn here. > Signed-off-by: George Dunlap > --- > NB that this patch goes on top of Andrew's trace cleanup series: > > https://lore.kernel.org/xen-devel/20240318163552.3808695-1-andrew.coop...@citrix.com/ The delta in Xen is trivial. I'm happy if you want to commit this, and I can rebase over it. ~Andrew
Re: [PATCH 2/3] tools/xenalyze: Ignore HVM_EMUL events harder
On 26/04/2024 3:32 pm, George Dunlap wrote: > To unify certain common sanity checks, checks are done very early in > processing based only on the top-level type. > > Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the > assumptions about how the top-level types worked. Namely, traces of > this type will show up outside of HVM contexts: in idle domains and in > PV domains. > > Make an explicit exception for TRC_HVM_EMUL types in a number of places: > > - Pass the record info pointer to toplevel_assert_check, so that it >can exclude TRC_HVM_EMUL records from idle and vcpu data_mode >checks > > - Don't attempt to set the vcpu data_type in hvm_process for >TRC_HVM_EMUL records. > > Signed-off-by: George Dunlap Acked-by: Andrew Cooper Although I'm tempted to say that if records of this type show up outside of HVM context, then it's misnamed or we've got an error in Xen. Any view on which? ~Andrew
Re: [PATCH 3/3] tools/xentrace: Remove xentrace_format
On 26/04/2024 3:32 pm, George Dunlap wrote: > xentrace_format was always of limited utility, since trace records > across pcpus were processed out of order; it was superseded by xenalyze > over a decade ago. > > But for several releases, the `formats` file it has depended on for > proper operation has not even been included in `make install` (which > generally means it doesn't get picked up by distros either); yet > nobody has seemed to complain. > > Simple remove xentrace_format, and point people to xenalyze instead. > > NB that there is no man page for xenalyze, so the "see also" on the > xentrace man page is simply removed for now. > > Signed-off-by: George Dunlap Acked-by: Andrew Cooper
[PATCH 0/3] x86/boot: Untangling
This started out as another series trying to fix CPUID handling for SEV. I'm still trying to sort out the SEV side of things, but I might need to resort to something *far* more unpleasant in order to hit 4.19. This series is some cleanup of module handling from the work I've done so far. It interacts with the HyperLaunch Boot Module cleanup, but should make it simpler overall. Andrew Cooper (3): x86/boot: Explain how moving mod[0] works x86/boot: Explain discard_initial_images() and untangle PV initrd handling x86/boot: Refactor pvh_load_kernel() to have an initrd_len local xen/arch/x86/hvm/dom0_build.c | 9 + xen/arch/x86/pv/dom0_build.c | 22 +++--- xen/arch/x86/setup.c | 14 +- 3 files changed, 29 insertions(+), 16 deletions(-) -- 2.30.2
[PATCH 2/3] x86/boot: Explain discard_initial_images() and untangle PV initrd handling
discard_initial_images() frees the memory backing the boot modules. It is called by dom0_construct_pv{,h}(), but obtains the module list by global pointer, which is why there is no apparent link with the initrd pointer. Generally, module contents are copied into dom0 as it's being built, but the initrd for PV dom0 might be directly mapped instead of copied. dom0_construct_pv() does its own ad-hoc freeing of the module in the copy case, and points the global reference at the new copy, then sets the size to 0. This only functions because init_domheap_pages(x, x) happens to be a nop. Delete the ad-hoc freeing, and leave it to discard_initial_images(). This requires (not) adjusting initrd->mod_start in the copy case, and only setting the size to 0 in the mapping case. Alter discard_initial_images() to explicitly check for an ignored module, and explain what's going on. This is more robust and will allow for fixing other problems with module handling. The later logic in dom0_construct_pv() now cannot use initrd->mod_start, but that's fine because initrd_mfn is already a duplicate of the information wanted, and is more consistent with initrd_{pfn,len} used elsewhere. Invalidate the initrd pointer with LIST_POISON1 to make it clearer that it shouldn't be used. No practical change in behaviour, but a substantial reduction in the complexity of how this works. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Daniel Smith CC: Christopher Clark In other awkward questions, why does initial_images_nrpages() account for all modules when only 1 or 2 are relevant for how we construct dom0?
--- xen/arch/x86/pv/dom0_build.c | 22 +++--- xen/arch/x86/setup.c | 9 - 2 files changed, 19 insertions(+), 12 deletions(-) diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index d8043fa58a27..64d9984b8308 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -630,18 +630,20 @@ int __init dom0_construct_pv(struct domain *d, } memcpy(page_to_virt(page), mfn_to_virt(initrd->mod_start), initrd_len); -mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT; -init_domheap_pages(mpt_alloc, - mpt_alloc + PAGE_ALIGN(initrd_len)); -initrd->mod_start = initrd_mfn = mfn_x(page_to_mfn(page)); +initrd_mfn = mfn_x(page_to_mfn(page)); } else { while ( count-- ) if ( assign_pages(mfn_to_page(_mfn(mfn++)), 1, d, 0) ) BUG(); +/* + * Mapped rather than copied. Tell discard_initial_images() to + * ignore it. + */ +initrd->mod_end = 0; } -initrd->mod_end = 0; +initrd = LIST_POISON1; /* No longer valid to use. */ iommu_memory_setup(d, "initrd", mfn_to_page(_mfn(initrd_mfn)), PFN_UP(initrd_len), _flags); @@ -653,12 +655,10 @@ int __init dom0_construct_pv(struct domain *d, if ( domain_tot_pages(d) < nr_pages ) printk(" (%lu pages to be allocated)", nr_pages - domain_tot_pages(d)); -if ( initrd ) -{ -mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT; +if ( initrd_len ) printk("\n Init. 
ramdisk: %"PRIpaddr"->%"PRIpaddr, - mpt_alloc, mpt_alloc + initrd_len); -} + pfn_to_paddr(initrd_mfn), + pfn_to_paddr(initrd_mfn) + initrd_len); printk("\nVIRTUAL MEMORY ARRANGEMENT:\n"); printk(" Loaded kernel: %p->%p\n", _p(vkern_start), _p(vkern_end)); @@ -881,7 +881,7 @@ int __init dom0_construct_pv(struct domain *d, if ( pfn >= initrd_pfn ) { if ( pfn < initrd_pfn + PFN_UP(initrd_len) ) -mfn = initrd->mod_start + (pfn - initrd_pfn); +mfn = initrd_mfn + (pfn - initrd_pfn); else mfn -= PFN_UP(initrd_len); } diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 59907fae095f..785a46a44995 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -294,7 +294,7 @@ unsigned long __init initial_images_nrpages(nodeid_t node) return nr; } -void __init discard_initial_images(void) +void __init discard_initial_images(void) /* a.k.a. free multiboot modules */ { unsigned int i; @@ -302,6 +302,13 @@ void __init discard_initial_images(void) { uint64_t start = (uint64_t)initial_images[i].mod_start << PAGE_SHIFT; +/* + * Sometimes the initrd is mapped, rather than copied, into dom0. + * end=0 signifies that we should leave it alone. + */ +if ( initial_images[i].mod_end == 0 ) +
[PATCH 3/3] x86/boot: Refactor pvh_load_kernel() to have an initrd_len local
The expressions get more complicated when ->mod_end isn't being abused as a size field. Introduce and use an initrd_len local variable. No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Daniel Smith CC: Christopher Clark --- xen/arch/x86/hvm/dom0_build.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c index ac71d43a6b3f..b0cb96c3bc76 100644 --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -650,6 +650,7 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, { void *image_start = image_base + image_headroom; unsigned long image_len = image->mod_end; +unsigned long initrd_len = initrd ? initrd->mod_end : 0; struct elf_binary elf; struct elf_dom_parms parms; paddr_t last_addr; @@ -710,7 +711,7 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, * simplify it. */ last_addr = find_memory(d, , sizeof(start_info) + -(initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) + +(initrd ? ROUNDUP(initrd_len, PAGE_SIZE) + sizeof(mod) : 0) + (cmdline ? ROUNDUP(strlen(cmdline) + 1, @@ -725,7 +726,7 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, if ( initrd != NULL ) { rc = hvm_copy_to_guest_phys(last_addr, mfn_to_virt(initrd->mod_start), -initrd->mod_end, v); +initrd_len, v); if ( rc ) { printk("Unable to copy initrd to guest\n"); @@ -733,8 +734,8 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, } mod.paddr = last_addr; -mod.size = initrd->mod_end; -last_addr += ROUNDUP(initrd->mod_end, elf_64bit() ? 8 : 4); +mod.size = initrd_len; +last_addr += ROUNDUP(initrd_len, elf_64bit() ? 8 : 4); if ( initrd->string ) { char *str = __va(initrd->string); -- 2.30.2
[PATCH 1/3] x86/boot: Explain how moving mod[0] works
modules_headroom is a misleading name as it applies strictly to mod[0] only, and the movement loop is deeply unintuitive and completely undocumented. Provide help to whoever needs to look at this code next. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: Daniel Smith CC: Christopher Clark --- xen/arch/x86/setup.c | 5 + 1 file changed, 5 insertions(+) diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index caf235c6286d..59907fae095f 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -1432,6 +1432,11 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p) /* Is the region suitable for relocating the multiboot modules? */ for ( j = mbi->mods_count - 1; j >= 0; j-- ) { +/* + * 'headroom' is a guess for the decompressed size and + * decompressor overheads of mod[0] (the dom0 kernel). When we + * move mod[0], we incorporate this as extra space at the start. + */ unsigned long headroom = j ? 0 : modules_headroom; unsigned long size = PAGE_ALIGN(headroom + mod[j].mod_end); -- 2.30.2
Re: [PATCH v3 7/8] gzip: move bitbuffer into gunzip state
On 26/04/2024 6:57 am, Jan Beulich wrote: > On 26.04.2024 07:55, Jan Beulich wrote: >> On 25.04.2024 21:23, Andrew Cooper wrote: >>> On 24/04/2024 5:34 pm, Daniel P. Smith wrote: >>>> --- a/xen/common/gzip/inflate.c >>>> +++ b/xen/common/gzip/inflate.c >>>> @@ -1017,8 +1014,8 @@ static int __init inflate(struct gunzip_state *s) >>>> /* Undo too much lookahead. The next read will be byte aligned so we >>>> * can discard unused bits in the last meaningful byte. >>>> */ >>>> -while (bk >= 8) { >>>> -bk -= 8; >>>> +while (s->bk >= 8) { >>>> +s->bk -= 8; >>>> s->inptr--; >>>> } >>> Isn't it just me, but isn't this just: >>> >>> s->inptr -= (s->bk >> 3); >>> s->bk &= 7; >>> >>> ? >> I'd say yes, if only there wasn't the comment talking of just a single byte, >> and even there supposedly some of the bits. The comments in this file leave a lot to be desired... I'm reasonably confident they were not adjusted when a piece of userspace code was imported into Linux. This cannot ever have been just a byte's worth of bits, seeing as the bit count is from 8 or more. Right now it's an unsigned long's worth of bits. > Talking of the comment: Considering what patch 1 supposedly does, how come > that isn't Xen-style (yet)? I had been collecting some minor fixes like this to fold into patch 1 when I committed the whole series. I'll see if I can fold them elsewhere. ~Andrew
MISRA and -Wextra-semi
Hi, Based on a call a long while back, I experimented with -Wextra-semi. This is what led to 8e36c668ca107 "xen: Drop superfluous semi-colons". However, there are a number of problems with getting this working fully. First, we need workarounds like this: diff --git a/xen/include/xen/config.h b/xen/include/xen/config.h index d888b2314daf..12e99c6dded4 100644 --- a/xen/include/xen/config.h +++ b/xen/include/xen/config.h @@ -26,7 +26,7 @@ #include -#define EXPORT_SYMBOL(var) +#define EXPORT_SYMBOL(var) typedef int var##_ignore_t /* * The following log levels are as follows: to avoid a failure for users who do legitimately have a semi-colon. It occurs to me that we could swap the typedef for an asm("") which might be a little less unpleasant. But with the simple cases taken care of, we then hit: In file included from common/grant_table.c:3813: common/compat/grant_table.c:10:1: error: extra ';' outside of a function [-Werror,-Wextra-semi] CHECK_grant_entry_v1; ^ which is quickly starting to unravel. Finally, while Clang does have -Wextra-semi working properly for C, GCC does this: cc1: warning: command-line option ‘-Wextra-semi’ is valid for C++/ObjC++ but not for C instead, which passes the cc-option-add check (which doesn't contain -Werror), but causes the real build to fail. So, while -Wextra-semi is definitely useful to find some hidden problems, it seems like using it fully might be very complicated. How much do we care? ~Andrew
Re: [PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd
On 26/04/2024 9:51 am, Anthony PERARD wrote: > On Thu, Apr 25, 2024 at 07:16:23PM +0100, Andrew Cooper wrote: >> On 25/04/2024 7:06 pm, Anthony PERARD wrote: >>> On Thu, Apr 25, 2024 at 06:32:15PM +0100, Andrew Cooper wrote: >>>> libsystemd is a giant dependency for one single function, but in the wake >>>> of >>>> the xz backdoor, it turns out that even systemd leadership recommend >>>> against >>>> linking against libsystemd for sd_notify(). >>>> >>>> Since commit 7b61011e1450 ("tools: make xenstore domain easy >>>> configurable") in >>>> Xen 4.8, the launch-xenstore script invokes systemd-notify directly, so it's >>> That's not enough, it needs to be `systemd-notify --ready` to be a >>> replacement for sd_notify(READY=1). >>> >>>> not even necessary for the xenstored's to call sd_notify() themselves. >>> So, sd_notify() or equivalent is still necessary. >>> >>>> Therefore, just drop the calls to sd_notify() and stop linking against >>>> libsystemd. >>> Sounds good, but we need to replace the call by something like: >>> echo READY=1 > $NOTIFY_SOCKET >>> implemented in C and ocaml. Detail to be checked. >>> >>> Otherwise, things won't work. >> Hmm. It worked in XenRT when stripping this all out, but that is > I don't know how XenServer is setup, maybe it doesn't matter? In theory it's straight systemd, but I could also believe that Xapi checks for the pidfile too. > Anyway... > >> extremely unintuitive behaviour for `systemd-notify --booted`, seeing as >> it's entirely different to --ready. > Yes, this --booted option should probably not exist, and there's > `systemctl is-system-running` that does something similar. > >> I've got no interest in keeping the C around, but if: >> >> [ -n "$NOTIFY_SOCKET" ] && echo READY=1 > $NOTIFY_SOCKET >> >> works, then can't we just use that after waiting for the pidfile? > Run `systemd-notify --ready` instead. Hopefully, that will be enough. 
> ($NOTIFY_SOCKET is a socket, and a bit more complicated than I thought, > it can start with "@" for example) I'll do a prep patch to adjust launch-xenstore after which this patch should be fine in this form (modulo a tweak in the commit message). ~Andrew
Re: [PATCH v3 7/8] gzip: move bitbuffer into gunzip state
On 24/04/2024 5:34 pm, Daniel P. Smith wrote: > Signed-off-by: Daniel P. Smith Acked-by: Andrew Cooper > diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c > index bec8801df487..8da14880cfbe 100644 > --- a/xen/common/gzip/inflate.c > +++ b/xen/common/gzip/inflate.c > @@ -1017,8 +1014,8 @@ static int __init inflate(struct gunzip_state *s) > /* Undo too much lookahead. The next read will be byte aligned so we > * can discard unused bits in the last meaningful byte. > */ > -while (bk >= 8) { > -bk -= 8; > +while (s->bk >= 8) { > +s->bk -= 8; > s->inptr--; > } Is it just me, or isn't this just: s->inptr -= (s->bk >> 3); s->bk &= 7; ? ~Andrew
Re: [PATCH v3 8/8] gzip: move crc state into gunzip state
On 24/04/2024 5:34 pm, Daniel P. Smith wrote: > Move the crc and its state into struct gunzip_state. In the process, expand > the > only use of CRC_VALUE as it hides what is being compared. "All variables here should be uint32_t rather than unsigned long, which halves the storage space required." > diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c > index 8da14880cfbe..dc940e59d853 100644 > --- a/xen/common/gzip/inflate.c > +++ b/xen/common/gzip/inflate.c > @@ -1069,11 +1065,11 @@ static void __init makecrc(void) > if (k & 1) > c ^= e; > } > -crc_32_tab[i] = c; > +s->crc_32_tab[i] = c; > > /* this is initialized here so this code could reside in ROM */ > -crc = (ulg)0xUL; /* shift register contents */ > +s->crc = (ulg)~0u; /* shift register contents */ The (ulg) wasn't ever necessary, and can be dropped as part of this cleanup. Can fix on commit. ~Andrew
Re: [PATCH v3 1/8] gzip: clean up comments and fix code alignment
On 24/04/2024 5:34 pm, Daniel P. Smith wrote: > This commit cleans up the comments and fixes the code alignment using Xen > coding style. This is done to make the code more legible before refactoring. > > Signed-off-by: Daniel P. Smith Acked-by: Andrew Cooper
Re: [PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd
On 25/04/2024 7:06 pm, Anthony PERARD wrote: > On Thu, Apr 25, 2024 at 06:32:15PM +0100, Andrew Cooper wrote: >> libsystemd is a giant dependency for one single function, but in the wake of >> the xz backdoor, it turns out that even systemd leadership recommend against >> linking against libsystemd for sd_notify(). >> >> Since commit 7b61011e1450 ("tools: make xenstore domain easy configurable") >> in >> Xen 4.8, the launch-xenstore script invokes systemd-notify directly, so it's > That's not enough, it needs to be `systemd-notify --ready` to be a > replacement for sd_notify(READY=1). > >> not even necessary for the xenstored's to call sd_notify() themselves. > So, sd_notify() or equivalent is still necessary. > >> Therefore, just drop the calls to sd_notify() and stop linking against >> libsystemd. > Sounds good, but we need to replace the call by something like: > echo READY=1 > $NOTIFY_SOCKET > implemented in C and ocaml. Detail to be checked. > > Otherwise, things won't work. Hmm. It worked in XenRT when stripping this all out, but that is extremely unintuitive behaviour for `systemd-notify --booted`, seeing as it's entirely different to --ready. I've got no interest in keeping the C around, but if: [ -n "$NOTIFY_SOCKET" ] && echo READY=1 > $NOTIFY_SOCKET works, then can't we just use that after waiting for the pidfile? ~Andrew
[PATCH] CI: Drop glibc-i386 from the build containers
Xen 4.14 no longer runs in Gitlab CI. Drop the dependency to shrink the build containers a little. Signed-off-by: Andrew Cooper --- CC: Anthony PERARD CC: Stefano Stabellini CC: Michal Orzel CC: Doug Goldstein CC: Roger Pau Monné --- automation/build/archlinux/current.dockerfile| 2 -- automation/build/centos/7.dockerfile | 2 -- automation/build/debian/bookworm.dockerfile | 2 -- automation/build/debian/jessie.dockerfile| 2 -- automation/build/debian/stretch.dockerfile | 2 -- automation/build/fedora/29.dockerfile| 2 -- automation/build/suse/opensuse-leap.dockerfile | 2 -- automation/build/suse/opensuse-tumbleweed.dockerfile | 2 -- automation/build/ubuntu/bionic.dockerfile| 2 -- automation/build/ubuntu/focal.dockerfile | 2 -- automation/build/ubuntu/trusty.dockerfile| 2 -- automation/build/ubuntu/xenial.dockerfile| 2 -- 12 files changed, 24 deletions(-) diff --git a/automation/build/archlinux/current.dockerfile b/automation/build/archlinux/current.dockerfile index d974a1434fd5..3e37ab5c40c1 100644 --- a/automation/build/archlinux/current.dockerfile +++ b/automation/build/archlinux/current.dockerfile @@ -19,8 +19,6 @@ RUN pacman -S --refresh --sysupgrade --noconfirm --noprogressbar --needed \ iasl \ inetutils \ iproute \ -# lib32-glibc for Xen < 4.15 -lib32-glibc \ libaio \ libcacard \ libgl \ diff --git a/automation/build/centos/7.dockerfile b/automation/build/centos/7.dockerfile index ab450f0b3a0e..1cdc16fc05f9 100644 --- a/automation/build/centos/7.dockerfile +++ b/automation/build/centos/7.dockerfile @@ -32,8 +32,6 @@ RUN yum -y update \ yajl-devel \ pixman-devel \ glibc-devel \ -# glibc-devel.i686 for Xen < 4.15 -glibc-devel.i686 \ make \ binutils \ git \ diff --git a/automation/build/debian/bookworm.dockerfile b/automation/build/debian/bookworm.dockerfile index 459f8e30bdc6..d893218fc4bd 100644 --- a/automation/build/debian/bookworm.dockerfile +++ b/automation/build/debian/bookworm.dockerfile @@ -31,8 +31,6 @@ RUN apt-get update && \ bin86 \ bcc \ liblzma-dev \ 
-# libc6-dev-i386 for Xen < 4.15 -libc6-dev-i386 \ libnl-3-dev \ ocaml-nox \ libfindlib-ocaml-dev \ diff --git a/automation/build/debian/jessie.dockerfile b/automation/build/debian/jessie.dockerfile index 32fc952fbc2d..308675cac150 100644 --- a/automation/build/debian/jessie.dockerfile +++ b/automation/build/debian/jessie.dockerfile @@ -37,8 +37,6 @@ RUN apt-get update && \ bin86 \ bcc \ liblzma-dev \ -# libc6-dev-i386 for Xen < 4.15 -libc6-dev-i386 \ libnl-3-dev \ ocaml-nox \ libfindlib-ocaml-dev \ diff --git a/automation/build/debian/stretch.dockerfile b/automation/build/debian/stretch.dockerfile index e2706a8f3589..59794ed4677b 100644 --- a/automation/build/debian/stretch.dockerfile +++ b/automation/build/debian/stretch.dockerfile @@ -38,8 +38,6 @@ RUN apt-get update && \ bin86 \ bcc \ liblzma-dev \ -# libc6-dev-i386 for Xen < 4.15 -libc6-dev-i386 \ libnl-3-dev \ ocaml-nox \ libfindlib-ocaml-dev \ diff --git a/automation/build/fedora/29.dockerfile b/automation/build/fedora/29.dockerfile index 42a87ce6c84b..f473ae13e7c1 100644 --- a/automation/build/fedora/29.dockerfile +++ b/automation/build/fedora/29.dockerfile @@ -21,8 +21,6 @@ RUN dnf -y install \ yajl-devel \ pixman-devel \ glibc-devel \ -# glibc-devel.i686 for Xen < 4.15 -glibc-devel.i686 \ make \ binutils \ git \ diff --git a/automation/build/suse/opensuse-leap.dockerfile b/automation/build/suse/opensuse-leap.dockerfile index e1ec38a41445..48d0d50d005d 100644 --- a/automation/build/suse/opensuse-leap.dockerfile +++ b/automation/build/suse/opensuse-leap.dockerfile @@ -28,8 +28,6 @@ RUN zypper install -y --no-recommends \ ghostscript \ glib2-devel \ glibc-devel \ -# glibc-devel-32bit for Xen < 4.15 -glibc-devel-32bit \ gzip \ hostname \ libaio-devel \ diff --git a/automation/build/suse/opensuse-tumbleweed.dockerfile b/automation/build/suse/opensuse-tumbleweed.dockerfile index f00e03eda7b1..53542ba1f4d9 100644 --- a/automation/build/suse/opensuse-tumbleweed.dockerfile +++ 
b/automation/build/suse/opensuse-tumbleweed.dockerfile @@ -26,8 +26,6 @@ RUN zypper install -y --no-recommends \ ghostscript \ glib2-devel \ glibc-devel \ -# glibc-devel-32bit for Xen < 4.15 -glibc-devel-32bit \ gzip \ hostname \ libaio-devel \ diff --git a/automation/bu
[PATCH 2/2] tools: Drop libsystemd as a dependency
There are no more users, and we want to dissuade people from introducing new users just for sd_notify() and friends. Drop the dependency. We still want the overall --with{,out}-systemd to gate the generation of the service/unit/mount/etc files. Rerun autogen.sh, and mark the dependency as removed in the build containers. Signed-off-by: Andrew Cooper --- CC: Anthony PERARD CC: Juergen Gross CC: Christian Lindig CC: Edwin Török CC: Stefano Stabellini --- automation/build/archlinux/current.dockerfile | 1 + .../build/suse/opensuse-leap.dockerfile | 1 + .../build/suse/opensuse-tumbleweed.dockerfile | 1 + automation/build/ubuntu/focal.dockerfile | 1 + config/Tools.mk.in| 2 - m4/systemd.m4 | 23 +- tools/config.h.in | 3 - tools/configure | 523 +- 8 files changed, 7 insertions(+), 548 deletions(-) diff --git a/automation/build/archlinux/current.dockerfile b/automation/build/archlinux/current.dockerfile index d974a1434fd5..a350b53ad8e2 100644 --- a/automation/build/archlinux/current.dockerfile +++ b/automation/build/archlinux/current.dockerfile @@ -39,6 +39,7 @@ RUN pacman -S --refresh --sysupgrade --noconfirm --noprogressbar --needed \ sdl2 \ spice \ spice-protocol \ +# systemd for Xen < 4.19 systemd \ transfig \ usbredir \ diff --git a/automation/build/suse/opensuse-leap.dockerfile b/automation/build/suse/opensuse-leap.dockerfile index e1ec38a41445..0483d9826d7d 100644 --- a/automation/build/suse/opensuse-leap.dockerfile +++ b/automation/build/suse/opensuse-leap.dockerfile @@ -61,6 +61,7 @@ RUN zypper install -y --no-recommends \ 'pkgconfig(sdl2)' \ python3-devel \ python3-setuptools \ +# systemd-devel for Xen < 4.19 systemd-devel \ tar \ transfig \ diff --git a/automation/build/suse/opensuse-tumbleweed.dockerfile b/automation/build/suse/opensuse-tumbleweed.dockerfile index f00e03eda7b1..59b168e717b2 100644 --- a/automation/build/suse/opensuse-tumbleweed.dockerfile +++ b/automation/build/suse/opensuse-tumbleweed.dockerfile @@ -62,6 +62,7 @@ RUN zypper install -y 
--no-recommends \ 'pkgconfig(sdl2)' \ python3-devel \ python3-setuptools \ +# systemd-devel for Xen < 4.19 systemd-devel \ tar \ transfig \ diff --git a/automation/build/ubuntu/focal.dockerfile b/automation/build/ubuntu/focal.dockerfile index 30a9b8e84ffe..8171347c4656 100644 --- a/automation/build/ubuntu/focal.dockerfile +++ b/automation/build/ubuntu/focal.dockerfile @@ -36,6 +36,7 @@ RUN apt-get update && \ libnl-3-dev \ ocaml-nox \ libfindlib-ocaml-dev \ +# libsystemd-dev for Xen < 4.19 libsystemd-dev \ transfig \ pandoc \ diff --git a/config/Tools.mk.in b/config/Tools.mk.in index b54ab21f966b..50fbef841f3f 100644 --- a/config/Tools.mk.in +++ b/config/Tools.mk.in @@ -52,8 +52,6 @@ CONFIG_PYGRUB := @pygrub@ CONFIG_LIBFSIMAGE := @libfsimage@ CONFIG_SYSTEMD := @systemd@ -SYSTEMD_CFLAGS := @SYSTEMD_CFLAGS@ -SYSTEMD_LIBS:= @SYSTEMD_LIBS@ XEN_SYSTEMD_DIR := @SYSTEMD_DIR@ XEN_SYSTEMD_MODULES_LOAD := @SYSTEMD_MODULES_LOAD@ CONFIG_9PFS := @ninepfs@ diff --git a/m4/systemd.m4 b/m4/systemd.m4 index 112dc11b5e05..aa1ebe94f56c 100644 --- a/m4/systemd.m4 +++ b/m4/systemd.m4 @@ -41,15 +41,6 @@ AC_DEFUN([AX_ALLOW_SYSTEMD_OPTS], [ ]) AC_DEFUN([AX_CHECK_SYSTEMD_LIBS], [ - PKG_CHECK_MODULES([SYSTEMD], [libsystemd-daemon],, - [PKG_CHECK_MODULES([SYSTEMD], [libsystemd >= 209])] -) - dnl pkg-config older than 0.24 does not set these for - dnl PKG_CHECK_MODULES() worth also noting is that as of version 208 - dnl of systemd pkg-config --cflags currently yields no extra flags yet. 
- AC_SUBST([SYSTEMD_CFLAGS]) - AC_SUBST([SYSTEMD_LIBS]) - AS_IF([test "x$SYSTEMD_DIR" = x], [ dnl In order to use the line below we need to fix upstream systemd dnl to properly ${prefix} for child variables in @@ -83,23 +74,11 @@ AC_DEFUN([AX_CHECK_SYSTEMD_LIBS], [ AC_DEFUN([AX_CHECK_SYSTEMD], [ dnl Respect user override to disable AS_IF([test "x$enable_systemd" != "xno"], [ -AS_IF([test "x$systemd" = "xy" ], [ - AC_DEFINE([HAVE_SYSTEMD], [1], [Systemd available and enabled]) - systemd=y - AX_CHECK_SYSTEMD_LIBS() - ],[ - AS_IF([test "x$enable_systemd" = "xyes"], - [AC_MSG_ERROR([Unable to find systemd development library])], - [systemd=n]) - ]) + systemd=y ],[systemd=
[PATCH 0/2] Drop libsystemd
On advise from the systemd leadership. See patch 1 for details. Andrew Cooper (2): tools/{c,o}xenstored: Don't link against libsystemd tools: Drop libsystemd as a dependency automation/build/archlinux/current.dockerfile | 1 + .../build/suse/opensuse-leap.dockerfile | 1 + .../build/suse/opensuse-tumbleweed.dockerfile | 1 + automation/build/ubuntu/focal.dockerfile | 1 + config/Tools.mk.in| 2 - m4/systemd.m4 | 23 +- tools/config.h.in | 3 - tools/configure | 523 +- tools/ocaml/xenstored/Makefile| 12 +- tools/ocaml/xenstored/systemd.ml | 15 - tools/ocaml/xenstored/systemd.mli | 16 - tools/ocaml/xenstored/systemd_stubs.c | 47 -- tools/ocaml/xenstored/xenstored.ml| 1 - tools/xenstored/Makefile | 5 - tools/xenstored/posix.c | 9 - 15 files changed, 8 insertions(+), 652 deletions(-) delete mode 100644 tools/ocaml/xenstored/systemd.ml delete mode 100644 tools/ocaml/xenstored/systemd.mli delete mode 100644 tools/ocaml/xenstored/systemd_stubs.c base-commit: 23cd1207e7f6ee3e51fb42e11dba8d7cdb28e1e5 -- 2.30.2
[PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd
libsystemd is a giant dependency for one single function, but in the wake of the xz backdoor, it turns out that even systemd leadership recommend against linking against libsystemd for sd_notify(). Since commit 7b61011e1450 ("tools: make xenstore domain easy configurable") in Xen 4.8, the launch-xenstore script invokes systemd-notify directly, so it's not even necessary for the xenstored's to call sd_notify() themselves. Therefore, just drop the calls to sd_notify() and stop linking against libsystemd. No functional change. Signed-off-by: Andrew Cooper --- CC: Anthony PERARD CC: Juergen Gross CC: Christian Lindig CC: Edwin Török CC: Stefano Stabellini --- tools/ocaml/xenstored/Makefile| 12 +-- tools/ocaml/xenstored/systemd.ml | 15 - tools/ocaml/xenstored/systemd.mli | 16 - tools/ocaml/xenstored/systemd_stubs.c | 47 --- tools/ocaml/xenstored/xenstored.ml| 1 - tools/xenstored/Makefile | 5 --- tools/xenstored/posix.c | 9 - 7 files changed, 1 insertion(+), 104 deletions(-) delete mode 100644 tools/ocaml/xenstored/systemd.ml delete mode 100644 tools/ocaml/xenstored/systemd.mli delete mode 100644 tools/ocaml/xenstored/systemd_stubs.c diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile index e8aaecf2e630..1e4b51cc5432 100644 --- a/tools/ocaml/xenstored/Makefile +++ b/tools/ocaml/xenstored/Makefile @@ -4,8 +4,6 @@ include $(OCAML_TOPLEVEL)/common.make # Include configure output (config.h) CFLAGS += -include $(XEN_ROOT)/tools/config.h -CFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_CFLAGS) -LDFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_LIBS) CFLAGS += $(CFLAGS-y) CFLAGS += $(APPEND_CFLAGS) @@ -25,13 +23,6 @@ poll_OBJS = poll poll_C_OBJS = select_stubs OCAML_LIBRARY = syslog poll -LIBS += systemd.cma systemd.cmxa -systemd_OBJS = systemd -systemd_C_OBJS = systemd_stubs -OCAML_LIBRARY += systemd - -LIBS_systemd += $(LDFLAGS-y) - OBJS = paths \ define \ stdext \ @@ -56,12 +47,11 @@ OBJS = paths \ process \ xenstored -INTF = symbol.cmi trie.cmi syslog.cmi systemd.cmi 
poll.cmi +INTF = symbol.cmi trie.cmi syslog.cmi poll.cmi XENSTOREDLIBS = \ unix.cmxa \ -ccopt -L -ccopt . syslog.cmxa \ - -ccopt -L -ccopt . systemd.cmxa \ -ccopt -L -ccopt . poll.cmxa \ -ccopt -L -ccopt $(OCAML_TOPLEVEL)/libs/mmap $(OCAML_TOPLEVEL)/libs/mmap/xenmmap.cmxa \ -ccopt -L -ccopt $(OCAML_TOPLEVEL)/libs/eventchn $(OCAML_TOPLEVEL)/libs/eventchn/xeneventchn.cmxa \ diff --git a/tools/ocaml/xenstored/systemd.ml b/tools/ocaml/xenstored/systemd.ml deleted file mode 100644 index 39127f712d72.. --- a/tools/ocaml/xenstored/systemd.ml +++ /dev/null @@ -1,15 +0,0 @@ -(* - * Copyright (C) 2014 Luis R. Rodriguez - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; version 2.1 only. with the special - * exception on linking described in file LICENSE. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU Lesser General Public License for more details. - *) - -external sd_notify_ready: unit -> unit = "ocaml_sd_notify_ready" diff --git a/tools/ocaml/xenstored/systemd.mli b/tools/ocaml/xenstored/systemd.mli deleted file mode 100644 index 18b9331031f9.. --- a/tools/ocaml/xenstored/systemd.mli +++ /dev/null @@ -1,16 +0,0 @@ -(* - * Copyright (C) 2014 Luis R. Rodriguez - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; version 2.1 only. with the special - * exception on linking described in file LICENSE. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the - * GNU Lesser General Public License for more details. - *) - -(** Tells systemd we're ready *) -external sd_notify_ready: unit -> unit = "ocaml_sd_notify_ready" diff --git a/tools/ocaml/xenstored/systemd_stubs.c b/tools/ocaml/xenstored/systemd_stubs.c deleted file mode 100644 index f4c875075abe.. --- a/tools/ocaml/xenstored/systemd_stubs.c +++ /dev/null @@ -1,47 +0,0 @@ -/* - * Copyright (C) 2014 Luis R. Rodriguez - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; version 2.1 only. with the special - * exception on linking described in file LICENSE. - * - * This program is distribute
Re: [PATCH] svm: Fix MISRA 8.2 violation
On 25/04/2024 10:12 am, George Dunlap wrote: > Misra 8.2 requires named parameters in prototypes. Use the name from > the implementation. > > Fixes 0d19d3aab0 ("svm/nestedsvm: Introduce nested capabilities bit"). Nit. We treat Fixes: as a tag. Also you might want to include the Reported-by's. Otherwise, Acked-by: Andrew Cooper although I have a strong wish to shorten the parameter name. That can be done at a later point. ~Andrew
Re: [PATCH v3] x86/entry: shrink insn size for some of our EFLAGS manipulation
On 25/04/2024 3:26 pm, Jan Beulich wrote: > Much like was recently done for setting entry vector, and along the > lines of what we already had in handle_exception_saved, avoid 32-bit > immediates where 8-bit ones do. Reduces .text.entry size by 16 bytes in > my non-CET reference build, while in my CET reference build section size > doesn't change (there and in .text only padding space increases). > > Inspired by other long->byte conversion work. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH] VMX: no open-coding in vmx_get_cpl()
On 25/04/2024 2:27 pm, Jan Beulich wrote: > Neither X86_SEG_AR_DPL nor MASK_EXTR() should really be avoided here, > using a literal number instead. > > No difference in generated code (with gcc13 at least). > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH v2 2/2] x86/spec: adjust logic to logic that elides lfence
On 18/04/2024 4:52 pm, Roger Pau Monne wrote: > It's currently too restrictive by just checking whether there's a BHB clearing > sequence selected. It should instead check whether BHB clearing is used on > entry from PV or HVM specifically. > > Switch to use opt_bhb_entry_{pv,hvm} instead, and then remove cpu_has_bhb_seq > since it no longer has any users. > > Reported-by: Jan Beulich > Fixes: 954c983abcee ('x86/spec-ctrl: Software BHB-clearing sequences') > Signed-off-by: Roger Pau Monné > --- > Changes since v1: > - New in this version. > > There's (possibly) still a bit of overhead for dom0 if BHB clearing is not used > for dom0, as Xen would still add the lfence if domUs require it. This is the note about dom0 that I made on the previous patch. "protect dom0" only has any effect if the appropriate foo_pv or foo_hvm is also selected. It's not possible to express "protect dom0 but not domU of $TYPE". This early in boot we have no idea whether dom0 is going to be PV or HVM. We could in principle figure it out by peeking at dom0's ELF notes, but that needs a lot of rearranging of __start_xen() to do safely. Reviewed-by: Andrew Cooper
Re: [PATCH v2 1/2] x86/spec: fix reporting of BHB clearing usage from guest entry points
On 18/04/2024 4:52 pm, Roger Pau Monne wrote: > Reporting whether the BHB clearing on entry is done for the different domains > types based on cpu_has_bhb_seq is incorrect, as that variable signals whether I'd prefer s/incorrect/unhelpful/ here. As pointed out, it's currently like IBPB-entry in this regard, identifying whether the block is in place; not whether it's used by the condition. > there's a BHB clearing sequence selected, but that alone doesn't imply that > such sequence is used from the PV and/or HVM entry points. > > Instead use opt_bhb_entry_{pv,hvm} which do signal whether BHB clearing is > performed on entry from PV/HVM. I was going to ask for a note about dom0, but instead I need to make a correction to the cmdline docs. I'll do that in a separate patch. > Fixes: 689ad48ce9cf ('x86/spec-ctrl: Wire up the Native-BHI software > sequences') > Signed-off-by: Roger Pau Monné Reviewed-by: Andrew Cooper Can make the one tweak on commit.
Re: [XEN PATCH 2/2] x86/msr: add suffix 'U' to MSR_AMD_CSTATE_CFG macro.
On 24/04/2024 1:51 pm, Jan Beulich wrote: > On 24.04.2024 14:11, Alessandro Zucchelli wrote: >> This addresses violations of MISRA C:2012 Rule 7.2 which states as >> following: A “u” or “U” suffix shall be applied to all integer constants >> that are represented in an unsigned type. >> >> No functional change. > I'm inclined to suggest > Fixes: 652683e1aeaa ("x86/hvm: address violations of MISRA C:2012 Rule 7.2") > as that change clearly should have taken care of this already. The line > changed here is even visible in patch context there. > >> Signed-off-by: Alessandro Zucchelli > Acked-by: Jan Beulich > > I expect it was a race condition. MSR_AMD_CSTATE_CFG is a recent addition. ~Andrew
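For context, a minimal illustration of what Rule 7.2 is guarding against (assumptions: a target with 32-bit int; the macro and function names below are invented for the example, not Xen code):

```c
#include <assert.h>

/*
 * Illustrative only: MISRA C:2012 Rule 7.2 requires a 'U' suffix on any
 * integer constant that is represented in an unsigned type.  On a target
 * with 32-bit int, 0x80000000 does not fit in 'int' and so already has
 * type 'unsigned int' -- the suffix merely makes that explicit.
 */
#define NO_SUFFIX   0x80000000   /* type: unsigned int (32-bit-int target) */
#define WITH_SUFFIX 0x80000000U  /* same value and type; intent explicit */

/*
 * The unsigned type silently infects comparisons: a negative x is
 * converted to a huge unsigned value here, which is the rule's concern.
 */
static int below_top_bit(int x)
{
    return x < NO_SUFFIX;
}
```

With these definitions, `below_top_bit(-1)` is 0 rather than the "obvious" 1, because -1 converts to UINT_MAX before the comparison.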
Re: [PATCH 1/4] x86/P2M: write_p2m_entry() is HVM-only anyway
On 23/04/2024 3:31 pm, Jan Beulich wrote: > The latest as of e2b2ff677958 ("x86/P2M: split out init/teardown > functions") the function is obviously unreachable for PV guests. This doesn't parse. Do you mean "Since e2b2ff677958 ..." ? > Hence > the paging_mode_enabled(d) check is pointless. > > Further host mode of a vCPU is always set, by virtue of > paging_vcpu_init() being part of vCPU creation. Hence the > paging_get_hostmode() check is pointless. > > With that the v local variable is unnecessary too. Drop the "if()" > conditional and its corresponding "else". > > Signed-off-by: Jan Beulich > --- > I have to confess that this if() has been puzzling me before. Puzzling yes, but it can't blindly be dropped. This is the "did the toolstack initiate this update" check. i.e. I think it's "bypass the normal side effects of making this update". I suspect it exists because of improper abstraction between the guest physmap and the shadow pagetables as-were - which were/are tightly coupled to vCPUs even for aspects where they shouldn't have been. For better or worse, the toolstack can add_to_physmap() before it creates vCPUs, and it will take this path you're trying to delete. There may be other cases too; I could see foreign mapping ending up ticking this too. Whether we ought to permit a toolstack to do this is a different question, but seeing as we explicitly intend (eventually for AMX) to have a set_policy call between domain_create() and vcpu_create(), I don't think we can reasonably restrict other hypercalls too in this period. ~Andrew
Re: [PATCH 1/6] x86: Introduce x86_decode_lite()
On 23/04/2024 10:17 am, Jan Beulich wrote: > On 22.04.2024 20:14, Andrew Cooper wrote: >> --- /dev/null >> +++ b/xen/arch/x86/x86_emulate/decode-lite.c >> @@ -0,0 +1,245 @@ >> +/* SPDX-License-Identifier: GPL-2.0-only */ >> + >> +#include "private.h" >> + >> +#define Imm8 (1 << 0) >> +#define Imm(1 << 1) >> +#define Branch (1 << 5) /* ... that we care about */ >> +/* ModRM (1 << 6) */ >> +#define Known (1 << 7) >> + >> +#define ALU_OPS \ >> +(Known|ModRM), \ >> +(Known|ModRM), \ >> +(Known|ModRM), \ >> +(Known|ModRM), \ >> +(Known|Imm8), \ >> +(Known|Imm) >> + >> +static const uint8_t init_or_livepatch_const onebyte[256] = { >> +[0x00] = ALU_OPS, /* ADD */ [0x08] = ALU_OPS, /* OR */ >> +[0x10] = ALU_OPS, /* ADC */ [0x18] = ALU_OPS, /* SBB */ >> +[0x20] = ALU_OPS, /* AND */ [0x28] = ALU_OPS, /* SUB */ >> +[0x30] = ALU_OPS, /* XOR */ [0x38] = ALU_OPS, /* CMP */ >> + >> +[0x50 ... 0x5f] = (Known), /* PUSH/POP %reg */ >> + >> +[0x70 ... 0x7f] = (Known|Branch|Imm8), /* Jcc disp8 */ >> +[0x80] = (Known|ModRM|Imm8), >> +[0x81] = (Known|ModRM|Imm), >> + >> +[0x83] = (Known|ModRM|Imm8), >> +[0x84 ... 0x8e] = (Known|ModRM), /* TEST/XCHG/MOV/MOV-SREG/LEA >> r/rm */ >> + >> +[0xb0 ... 0xb7] = (Known|Imm8),/* MOV $imm8, %reg */ > I'm surprised you get away without at least NOP also marked as known. > Imo the whole 0x90 row should be marked so. > > Similarly I'm not convinced that leaving the 0xA0 row unpopulated is a > good idea. It's at best a trap set up for somebody to fall into rather > sooner than later. > >> +[0xb8 ... 0xbf] = (Known|Imm), /* MOV $imm32, %reg */ >> + >> +[0xcc] = (Known), /* INT3 */ >> +[0xcd] = (Known|Imm8),/* INT $imm8 */ > Like above, what about in particular any of the shifts/rotates and the > MOV that's in the 0xC0 row? > > While the last sentence in the description is likely meant to cover > that, I think the description wants to go further as to any pretty > common but omitted insns. 
Already "x86: re-work memset()" and "x86: re- > work memcpy()" (v2 pending for, soon, 3 years) would make it necessary > to touch this table, thus increasing complexity of those changes to an > area they shouldn't be concerned about at all. > >> +[0xe8 ... 0xe9] = (Known|Branch|Imm), /* CALL/JMP disp32 */ >> +[0xeb] = (Known|Branch|Imm8), /* JMP disp8 */ > 0xe0 ... 0xe7 and 0xec ... 0xef would imo also better be covered, as > they easily can be (much like you also cover e.g. CMC despite it > appearing pretty unlikely that we use that insn anywhere). > >> +[0xf1] = (Known), /* ICEBP */ >> +[0xf4] = (Known), /* HLT */ >> +[0xf5] = (Known), /* CMC */ >> +[0xf6 ... 0xf7] = (Known|ModRM), /* Grp3 */ >> +[0xf8 ... 0xfd] = (Known), /* CLC ... STD */ >> +[0xfe ... 0xff] = (Known|ModRM), /* Grp4 */ >> +}; >> +static const uint8_t init_or_livepatch_const twobyte[256] = { >> +[0x00 ... 0x01] = (Known|ModRM), /* Grp6/Grp7 */ > LAR / LSL? CLTS? WBINVD? UD2? > >> +[0x18 ... 0x1f] = (Known|ModRM), /* Grp16 (Hint Nop) */ >> + >> +[0x20 ... 0x23] = (Known|ModRM), /* MOV CR/DR */ >> + >> +[0x30 ... 0x34] = (Known), /* WRMSR ... RDPMC */ > 0x34 is SYSENTER, isn't it? > >> +[0x40 ... 0x4f] = (Known|ModRM), /* CMOVcc */ >> + >> +[0x80 ... 0x8f] = (Known|Branch|Imm), /* Jcc disp32 */ > What about things like VMREAD/VMWRITE? > >> +[0x90 ... 0x9f] = (Known|ModRM), /* SETcc */ > PUSH/POP fs/gs? CPUID? > >> +[0xab] = (Known|ModRM), /* BTS */ > BTS (and BTC below) but not BT and BTR? > >> +[0xac] = (Known|ModRM|Imm8), /* SHRD $imm8 */ >> +[0xad ... 0xaf] = (Known|ModRM), /* SHRD %cl / Grp15 / IMUL */ > SHRD but not SHLD? > > CMPXCHG? > >> +[0xb8 ... 0xb9] = (Known|ModRM), /* POPCNT/Grp10 (UD1) */ >> +[0xba] = (Known|ModRM|Imm8), /* Grp8 */ >> +[0xbb ... 0xbf] = (Known|ModRM), /* BSR/BSF/BSR/MOVSX */ > Nit (comment only): 0xb
Re: [PATCH 3/6] x86/alternative: Intend the relocation logic
This of course intended to say indent... ~Andrew
[PATCH 2/6] x86/alternative: Walk all replacements in debug builds
In debug builds, walk all alternative replacements with x86_decode_lite(). This checks that we can decode all instructions, and also lets us check that disp8's don't leave the replacement block. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/alternative.c | 49 ++ 1 file changed, 49 insertions(+) diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c index 2e7ba6e0b833..5bd256365def 100644 --- a/xen/arch/x86/alternative.c +++ b/xen/arch/x86/alternative.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #define MAX_PATCH_LEN (255-1) @@ -464,6 +465,54 @@ static void __init _alternative_instructions(bool force) void __init alternative_instructions(void) { arch_init_ideal_nops(); + +/* + * Walk all replacement instructions with x86_decode_lite(). This checks + * both that we can decode all instructions within the replacement, and + * that any near branch with a disp8 stays within the alternative itself. + */ +if ( IS_ENABLED(CONFIG_DEBUG) ) +{ +struct alt_instr *a; + +for ( a = __alt_instructions; + a < __alt_instructions_end; ++a ) +{ +void *repl = ALT_REPL_PTR(a); +void *ip = repl, *end = ip + a->repl_len; + +if ( !a->repl_len ) +continue; + +for ( x86_decode_lite_t res; ip < end; ip += res.len ) +{ +res = x86_decode_lite(ip, end); + +if ( res.len <= 0 ) +{ +printk("Alternative for %ps [%*ph]\n", + ALT_ORIG_PTR(a), a->repl_len, repl); +panic("Unable to decode instruction at +%u in alternative\n", + (unsigned int)(unsigned long)(ip - repl)); +} + +if ( res.rel_type == REL_TYPE_d8 ) +{ +int8_t *d8 = res.rel; +void *target = ip + res.len + *d8; + +if ( target < repl || target > end ) +{ +printk("Alternative for %ps [%*ph]\n", + ALT_ORIG_PTR(a), a->repl_len, repl); +panic("'JMP/Jcc disp8' at +%u leaves alternative block\n", + (unsigned int)(unsigned long)(ip - repl)); +} +} +} +} +} + _alternative_instructions(false); } -- 2.30.2
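The check the patch performs can be sketched in plain C. This is illustrative only: the toy decoder below stands in for x86_decode_lite() and recognises just two opcodes (NOP and JMP disp8); the real decoder covers far more, and the function names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for x86_decode_lite(): returns insn length, or -1 if the
 * byte at ip is unknown.  For 'JMP disp8' (0xEB), reports where the
 * disp8 byte lives. */
static int toy_decode(const uint8_t *buf, size_t ip, size_t end,
                      int *is_d8, size_t *d8_off)
{
    *is_d8 = 0;
    *d8_off = 0;

    if ( buf[ip] == 0x90 )          /* NOP */
        return 1;
    if ( buf[ip] == 0xeb && ip + 2 <= end ) /* JMP disp8 */
    {
        *is_d8 = 1;
        *d8_off = ip + 1;
        return 2;
    }
    return -1;                      /* undecodable */
}

/* Walk a replacement block: 0 if every insn decodes and every disp8
 * target stays within [0, len]; -1 on a decode failure; -2 if a disp8
 * leaves the block. */
static int check_block(const uint8_t *repl, size_t len)
{
    size_t ip = 0;

    while ( ip < len )
    {
        int is_d8;
        size_t d8_off;
        int insn_len = toy_decode(repl, ip, len, &is_d8, &d8_off);

        if ( insn_len <= 0 )
            return -1;

        if ( is_d8 )
        {
            long target = (long)(ip + insn_len) + (int8_t)repl[d8_off];

            /* Branching exactly to the end of the block is fine. */
            if ( target < 0 || (size_t)target > len )
                return -2;
        }
        ip += insn_len;
    }
    return 0;
}
```

For example, `{0xeb, 0x01, 0x90, 0x90, 0x90}` passes (the jump lands inside the block), while `{0xeb, 0x10}` is rejected because the disp8 target lies past the end.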
[PATCH 5/6] x86/alternative: Relocate all insn-relative fields
Right now, relocation of displacements is restricted to finding 0xe8/e9 as the first byte of the replacement, but this is overly restrictive. Use x86_decode_lite() to find and adjust all insn-relative fields. As with disp8's not leaving the replacement block, some disp32's don't either. e.g. the RSB stuffing loop. These stay unmodified. For now, leave the altcall devirtualisation alone. These require more care to transform into the new scheme. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/alternative.c | 46 +++--- 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c index c86ea235e865..4d7dc9418cf0 100644 --- a/xen/arch/x86/alternative.c +++ b/xen/arch/x86/alternative.c @@ -244,10 +244,31 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start, memcpy(buf, repl, a->repl_len); +/* Walk buf[] and adjust any insn-relative operands. */ +if ( a->repl_len ) { -/* 0xe8/0xe9 are relative branches; fix the offset. 
*/ -if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 ) +uint8_t *ip = buf, *end = ip + a->repl_len; + +for ( x86_decode_lite_t res; ip < end; ip += res.len ) { +int32_t *d32; +uint8_t *target; + +res = x86_decode_lite(ip, end); + +if ( res.len <= 0 ) +{ +printk("Alternative for %ps [%*ph]\n", + ALT_ORIG_PTR(a), a->repl_len, repl); +printk("Unable to decode instruction in alternative - ignoring.\n"); +goto skip_this_alternative; +} + +if ( res.rel_type != REL_TYPE_d32 ) +continue; + +d32 = res.rel; + /* * Detect the special case of indirect-to-direct branch patching: * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; already @@ -317,14 +338,23 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start, */ goto skip_this_alternative; } + +continue; } -else if ( force && system_state < SYS_STATE_active ) -ASSERT_UNREACHABLE(); -else -*(int32_t *)(buf + 1) += repl - orig; + +target = ip + res.len + *d32; + +if ( target >= buf && target <= end ) +{ +/* + * Target doesn't leave the replacement block. e.g. RSB + * stuffing. Leave it unmodified. + */ +continue; +} + +*d32 += repl - orig; } -else if ( force && system_state < SYS_STATE_active ) -ASSERT_UNREACHABLE(); } a->priv = 1; -- 2.30.2
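The displacement fixup this patch generalises can be sketched as follows. This is illustrative only: the function name and the explicit base-address parameters are invented for clarity, and the intra-block early exit mirrors the RSB-stuffing case the commit message describes (the patch itself expresses the rebias as `*d32 += repl - orig`).

```c
#include <stdint.h>
#include <string.h>

/*
 * After copying a code block from old_base to new_base, a rel32 operand
 * whose target lies outside the block must be rebiased so it still
 * reaches the same absolute address.  Intra-block targets move with the
 * copy and are left alone.
 *
 * @site      offset of the 4-byte displacement within the block
 * @insn_end  offset of the end of the instruction (displacements are
 *            relative to the next instruction)
 */
static void fixup_rel32(uint8_t *blk, unsigned int len,
                        long old_base, long new_base,
                        unsigned int site, unsigned int insn_end)
{
    int32_t disp;
    long target;

    memcpy(&disp, blk + site, sizeof(disp));
    target = old_base + insn_end + disp;

    if ( target >= old_base && target <= old_base + len )
        return; /* intra-block branch: still correct after the copy */

    disp = target - (new_base + insn_end);
    memcpy(blk + site, &disp, sizeof(disp));
}
```

E.g. a 5-byte `call` assembled at 0x1000 aiming at 0x3000 carries disp 0x1ffb; relocated to 0x2000 the disp becomes 0xffb, while a disp of -5 (pointing back into the block) is left untouched.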
[PATCH 1/6] x86: Introduce x86_decode_lite()
In order to relocate all IP-relative fields in an alternative replacement block, we need to decode the instructions enough to obtain their length and any relative fields. Full x86_decode() is far too heavyweight, so introduce a minimal form which can make several simplifying assumptions. This logic is sufficient to decode all alternative blocks that exist in Xen right now. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/x86_emulate/Makefile | 1 + xen/arch/x86/x86_emulate/decode-lite.c | 245 + xen/arch/x86/x86_emulate/private.h | 2 + xen/arch/x86/x86_emulate/x86_emulate.h | 17 ++ 4 files changed, 265 insertions(+) create mode 100644 xen/arch/x86/x86_emulate/decode-lite.c diff --git a/xen/arch/x86/x86_emulate/Makefile b/xen/arch/x86/x86_emulate/Makefile index 2e20d65d788a..f06731913d67 100644 --- a/xen/arch/x86/x86_emulate/Makefile +++ b/xen/arch/x86/x86_emulate/Makefile @@ -3,6 +3,7 @@ obj-y += 0fae.o obj-y += 0fc7.o obj-y += blk.o obj-y += decode.o +obj-y += decode-lite.o obj-$(CONFIG_HVM) += fpu.o obj-y += util.o obj-y += util-xen.o diff --git a/xen/arch/x86/x86_emulate/decode-lite.c b/xen/arch/x86/x86_emulate/decode-lite.c new file mode 100644 index ..050b0d8cf4a8 --- /dev/null +++ b/xen/arch/x86/x86_emulate/decode-lite.c @@ -0,0 +1,245 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#include "private.h" + +#define Imm8 (1 << 0) +#define Imm(1 << 1) +#define Branch (1 << 5) /* ... that we care about */ +/* ModRM (1 << 6) */ +#define Known (1 << 7) + +#define ALU_OPS \ +(Known|ModRM), \ +(Known|ModRM), \ +(Known|ModRM), \ +(Known|ModRM), \ +(Known|Imm8), \ +(Known|Imm) + +static const uint8_t init_or_livepatch_const onebyte[256] = { +[0x00] = ALU_OPS, /* ADD */ [0x08] = ALU_OPS, /* OR */ +[0x10] = ALU_OPS, /* ADC */ [0x18] = ALU_OPS, /* SBB */ +[0x20] = ALU_OPS, /* AND */ [0x28] = ALU_OPS, /* SUB */ +[0x30] = ALU_OPS, /* XOR */ [0x38] = ALU_OPS, /* CMP */ + +[0x50 ... 0x5f] = (Known), /* PUSH/POP %reg */ + +[0x70 ... 
0x7f] = (Known|Branch|Imm8), /* Jcc disp8 */ +[0x80] = (Known|ModRM|Imm8), +[0x81] = (Known|ModRM|Imm), + +[0x83] = (Known|ModRM|Imm8), +[0x84 ... 0x8e] = (Known|ModRM), /* TEST/XCHG/MOV/MOV-SREG/LEA r/rm */ + +[0xb0 ... 0xb7] = (Known|Imm8),/* MOV $imm8, %reg */ +[0xb8 ... 0xbf] = (Known|Imm), /* MOV $imm32, %reg */ + +[0xcc] = (Known), /* INT3 */ +[0xcd] = (Known|Imm8),/* INT $imm8 */ + +[0xe8 ... 0xe9] = (Known|Branch|Imm), /* CALL/JMP disp32 */ +[0xeb] = (Known|Branch|Imm8), /* JMP disp8 */ + +[0xf1] = (Known), /* ICEBP */ +[0xf4] = (Known), /* HLT */ +[0xf5] = (Known), /* CMC */ +[0xf6 ... 0xf7] = (Known|ModRM), /* Grp3 */ +[0xf8 ... 0xfd] = (Known), /* CLC ... STD */ +[0xfe ... 0xff] = (Known|ModRM), /* Grp4 */ +}; +static const uint8_t init_or_livepatch_const twobyte[256] = { +[0x00 ... 0x01] = (Known|ModRM), /* Grp6/Grp7 */ + +[0x18 ... 0x1f] = (Known|ModRM), /* Grp16 (Hint Nop) */ + +[0x20 ... 0x23] = (Known|ModRM), /* MOV CR/DR */ + +[0x30 ... 0x34] = (Known), /* WRMSR ... RDPMC */ +[0x40 ... 0x4f] = (Known|ModRM), /* CMOVcc */ + +[0x80 ... 0x8f] = (Known|Branch|Imm), /* Jcc disp32 */ +[0x90 ... 0x9f] = (Known|ModRM), /* SETcc */ + +[0xab] = (Known|ModRM), /* BTS */ +[0xac] = (Known|ModRM|Imm8), /* SHRD $imm8 */ +[0xad ... 0xaf] = (Known|ModRM), /* SHRD %cl / Grp15 / IMUL */ + +[0xb8 ... 0xb9] = (Known|ModRM), /* POPCNT/Grp10 (UD1) */ +[0xba] = (Known|ModRM|Imm8), /* Grp8 */ +[0xbb ... 0xbf] = (Known|ModRM), /* BSR/BSF/BSR/MOVSX */ +}; + +/* + * Bare minimum x86 instruction decoder to parse the alternative replacement + * instructions and locate the IP-relative references that may need updating. + * + * These are: + * - disp8/32 from near branches + * - RIP-relative memory references + * + * The following simplifications are used: + * - All code is 64bit, and the instruction stream is safe to read. + * - The 67 prefix is not implemented, so the address size is only 64bit. + * + * Inputs: + * @ip The position to start decoding from. 
+ * @end End of the replacement block. Exceeding this is considered an error. + * + * Returns: x86_decode_lite_t + * - On failure, length of -1. + * - On success, length > 0 and REL_TYPE_*. For REL_TYPE != NONE, rel points + *at t
[PATCH 6/6] x86/spec-ctrl: Introduce and use DO_COND_BHB_SEQ
Now that alternatives can fix up call displacements even when they're not the first instruction of the replacement, move the SCF_entry_bhb conditional inside the replacement block. This removes a conditional branch from the fastpaths of BHI-unaffected hardware. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/hvm/vmx/entry.S | 12 +++ xen/arch/x86/include/asm/spec_ctrl_asm.h | 43 +--- 2 files changed, 30 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S index 7233e771d88a..77bf9ea564ea 100644 --- a/xen/arch/x86/hvm/vmx/entry.S +++ b/xen/arch/x86/hvm/vmx/entry.S @@ -62,12 +62,12 @@ ENTRY(vmx_asm_vmexit_handler) * Clear the BHB to mitigate BHI. Used on eIBRS parts, and uses RETs * itself so must be after we've perfomed all the RET-safety we can. */ -testb $SCF_entry_bhb, CPUINFO_scf(%rsp) -jz .L_skip_bhb -ALTERNATIVE_2 "",\ -"call clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ -"call clear_bhb_tsx", X86_SPEC_BHB_TSX -.L_skip_bhb: +.macro VMX_BHB_SEQ fn:req +DO_COND_BHB_SEQ \fn scf=CPUINFO_scf(%rsp) +.endm +ALTERNATIVE_2 "", \ +"VMX_BHB_SEQ fn=clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ +"VMX_BHB_SEQ fn=clear_bhb_tsx", X86_SPEC_BHB_TSX ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_VMX /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */ diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h index 729a830411eb..559dad88f967 100644 --- a/xen/arch/x86/include/asm/spec_ctrl_asm.h +++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h @@ -92,6 +92,21 @@ .L\@_skip: .endm +.macro DO_COND_BHB_SEQ fn:req, scf=%bl +/* + * Requires SCF (defaults to %rbx), fn=clear_bhb_{loops,tsx} + * Clobbers %rax, %rcx + * + * Conditionally use a BHB clearing software sequence. 
+ */ +testb $SCF_entry_bhb, \scf +jz .L\@_skip_bhb + +call \fn + +.L\@_skip_bhb: +.endm + .macro DO_OVERWRITE_RSB tmp=rax, xu /* * Requires nothing @@ -277,12 +292,9 @@ * Clear the BHB to mitigate BHI. Used on eIBRS parts, and uses RETs * itself so must be after we've perfomed all the RET-safety we can. */ -testb $SCF_entry_bhb, %bl -jz .L\@_skip_bhb -ALTERNATIVE_2 "",\ -"call clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ -"call clear_bhb_tsx", X86_SPEC_BHB_TSX -.L\@_skip_bhb: +ALTERNATIVE_2 "", \ +"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ +"DO_COND_BHB_SEQ clear_bhb_tsx", X86_SPEC_BHB_TSX ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_PV .endm @@ -322,12 +334,9 @@ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \ X86_FEATURE_SC_MSR_PV -testb $SCF_entry_bhb, %bl -jz .L\@_skip_bhb -ALTERNATIVE_2 "",\ -"call clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ -"call clear_bhb_tsx", X86_SPEC_BHB_TSX -.L\@_skip_bhb: +ALTERNATIVE_2 "", \ +"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ +"DO_COND_BHB_SEQ clear_bhb_tsx", X86_SPEC_BHB_TSX ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_INTR .endm @@ -433,13 +442,9 @@ * Clear the BHB to mitigate BHI. Used on eIBRS parts, and uses RETs * itself so must be after we've perfomed all the RET-safety we can. */ -testb $SCF_entry_bhb, %bl -jz .L\@_skip_bhb - -ALTERNATIVE_2 "",\ -"call clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ -"call clear_bhb_tsx", X86_SPEC_BHB_TSX -.L\@_skip_bhb: +ALTERNATIVE_2 "", \ +"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \ +"DO_COND_BHB_SEQ clear_bhb_tsx", X86_SPEC_BHB_TSX lfence .endm -- 2.30.2
[PATCH 3/6] x86/alternative: Intend the relocation logic
... to make subsequent patches legible. No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/alternative.c | 126 +++-- 1 file changed, 64 insertions(+), 62 deletions(-) diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c index 5bd256365def..2ca4dfd569bc 100644 --- a/xen/arch/x86/alternative.c +++ b/xen/arch/x86/alternative.c @@ -244,78 +244,80 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start, memcpy(buf, repl, a->repl_len); -/* 0xe8/0xe9 are relative branches; fix the offset. */ -if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 ) { -/* - * Detect the special case of indirect-to-direct branch patching: - * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; already - * checked above), - * - replacement's displacement is -5 (pointing back at the very - * insn, which makes no sense in a real replacement insn), - * - original is an indirect CALL/JMP (opcodes 0xFF/2 or 0xFF/4) - * using RIP-relative addressing. - * Some branch destinations may still be NULL when we come here - * the first time. Defer patching of those until the post-presmp- - * initcalls re-invocation (with force set to true). If at that - * point the branch destination is still NULL, insert "UD2; UD0" - * (for ease of recognition) instead of CALL/JMP. - */ -if ( a->cpuid == X86_FEATURE_ALWAYS && - *(int32_t *)(buf + 1) == -5 && - a->orig_len >= 6 && - orig[0] == 0xff && - orig[1] == (*buf & 1 ? 0x25 : 0x15) ) +/* 0xe8/0xe9 are relative branches; fix the offset. 
*/ +if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 ) { -long disp = *(int32_t *)(orig + 2); -const uint8_t *dest = *(void **)(orig + 6 + disp); - -if ( dest ) +/* + * Detect the special case of indirect-to-direct branch patching: + * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; already + * checked above), + * - replacement's displacement is -5 (pointing back at the very + * insn, which makes no sense in a real replacement insn), + * - original is an indirect CALL/JMP (opcodes 0xFF/2 or 0xFF/4) + * using RIP-relative addressing. + * Some branch destinations may still be NULL when we come here + * the first time. Defer patching of those until the post-presmp- + * initcalls re-invocation (with force set to true). If at that + * point the branch destination is still NULL, insert "UD2; UD0" + * (for ease of recognition) instead of CALL/JMP. + */ +if ( a->cpuid == X86_FEATURE_ALWAYS && + *(int32_t *)(buf + 1) == -5 && + a->orig_len >= 6 && + orig[0] == 0xff && + orig[1] == (*buf & 1 ? 0x25 : 0x15) ) { -/* - * When building for CET-IBT, all function pointer targets - * should have an endbr64 instruction. - * - * If this is not the case, leave a warning because - * something is probably wrong with the build. A CET-IBT - * enabled system might have exploded already. - * - * Otherwise, skip the endbr64 instruction. This is a - * marginal perf improvement which saves on instruction - * decode bandwidth. - */ -if ( IS_ENABLED(CONFIG_XEN_IBT) ) +long disp = *(int32_t *)(orig + 2); +const uint8_t *dest = *(void **)(orig + 6 + disp); + +if ( dest ) { -if ( is_endbr64(dest) ) -dest += ENDBR64_LEN; -else -printk(XENLOG_WARNING - "altcall %ps dest %ps has no endbr64\n", - orig, dest); +/* + * When building for CET-IBT, all function pointer targets + * should have an endbr64 instruction. + * + * If this is not the case, leave a warning bec
[PATCH 0/6] x86/alternatives: Adjust all insn-relative fields
Alternatives have had a reasonably severe restriction since their introduction. This has been the source of several bugs, and several inefficiencies, particularly in the speculative safety paths, and I've finally gotten bored enough to fix it. Introduce the new infrastructure, and adjust the BHB scrubbing logic to use it. Andrew Cooper (6): x86: Introduce x86_decode_lite() x86/alternative: Walk all replacements in debug builds x86/alternative: Intend the relocation logic x86/alternative: Replace a continue with a goto x86/alternative: Relocate all insn-relative fields x86/spec-ctrl: Introduce and use DO_COND_BHB_SEQ xen/arch/x86/alternative.c | 210 +-- xen/arch/x86/hvm/vmx/entry.S | 12 +- xen/arch/x86/include/asm/spec_ctrl_asm.h | 43 ++-- xen/arch/x86/x86_emulate/Makefile| 1 + xen/arch/x86/x86_emulate/decode-lite.c | 245 +++ xen/arch/x86/x86_emulate/private.h | 2 + xen/arch/x86/x86_emulate/x86_emulate.h | 17 ++ 7 files changed, 445 insertions(+), 85 deletions(-) create mode 100644 xen/arch/x86/x86_emulate/decode-lite.c -- 2.30.2
[PATCH 4/6] x86/alternative: Replace a continue with a goto
A subsequent patch is going to insert a loop, which interferes with the continue in the devirtualisation logic. Replace it with a goto, and a paragraph explaining why we intentionally avoid setting a->priv = 1. No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/alternative.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c index 2ca4dfd569bc..c86ea235e865 100644 --- a/xen/arch/x86/alternative.c +++ b/xen/arch/x86/alternative.c @@ -308,7 +308,15 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start, buf[4] = 0xff; } else -continue; +{ +/* + * The function pointer we're wanting to devirtualise + * is still NULL, and we're not sealing yet. Leave + * the alternative fully un-processed, in order to + * process it the next time around. + */ +goto skip_this_alternative; +} } else if ( force && system_state < SYS_STATE_active ) ASSERT_UNREACHABLE(); @@ -323,6 +331,7 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start, add_nops(buf + a->repl_len, total_len - a->repl_len); text_poke(orig, buf, total_len); +skip_this_alternative:; } /* -- 2.30.2
Re: [PATCH v2 1/2] xen: introduce header file with section related symbols
On 19/04/2024 11:12 am, Jan Beulich wrote: > On 19.04.2024 12:08, Andrew Cooper wrote: >> On 19/04/2024 11:02 am, Roger Pau Monne wrote: >>> Start by declaring the beginning and end of the init section. >>> >>> No functional change intended. >>> >>> Requested-by: Andrew Cooper >>> Signed-off-by: Roger Pau Monné >> TYVM for doing this. There's a lot of cleanup which can follow on for it. >> >> Reviewed-by: Andrew Cooper >> >> although if anyone has a better name than sections.h then speak now. > For what is put there now, and for any other section bounds markers the > name is fine with me. I'm not presently convinced though we want to put > __read_mostly and friends there. Well that's exactly what I intend to clean up into it, because it's far better in sections.h than (duplicated per arch) cache.h (Also I intend to strip down kernel.h for the other major sections too.) ~Andrew
Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections
On 19/04/2024 11:02 am, Roger Pau Monne wrote: > Livepatch payloads containing symbols that belong to init sections can only > lead to page faults later on, as by the time the livepatch is loaded init > sections have already been freed. > > Refuse to resolve such symbols and return an error instead. > > Note such resolutions are only relevant for symbols that point to undefined > sections (SHN_UNDEF), as that implies the symbol is not in the current payload > and hence must either be a Xen or a different livepatch payload symbol. > > Do not allow to resolve symbols that point to __init_begin, as that address is > also unmapped. On the other hand, __init_end is not unmapped, and hence allow > resolutions against it. > > Signed-off-by: Roger Pau Monné Reviewed-by: Andrew Cooper , although ... > --- > Changes since v1: > - Fix off-by-one in range checking. > --- > xen/common/livepatch_elf.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c > index 45d73912a3cd..a67101eadc02 100644 > --- a/xen/common/livepatch_elf.c > +++ b/xen/common/livepatch_elf.c > @@ -4,6 +4,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct livepatch_elf > *elf) > break; > } > } > + > +/* > + * Ensure not an init symbol. Only applicable to Xen symbols, as > + * livepatch payloads don't have init sections or equivalent. > + */ > +else if ( st_value >= (uintptr_t)&__init_begin && > + st_value < (uintptr_t)&__init_end ) ... I normally vertically align the (casts) in cases like this for improved legibility. Happy to fold on commit. ~Andrew
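The range check being reviewed is half-open, which is exactly the off-by-one fixed between v1 and v2. A minimal sketch of the boundary behaviour (illustrative only; the function and parameter names are invented, and plain integers stand in for the real linker symbols):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Half-open check matching the patch: a symbol resolving exactly to
 * __init_end is accepted (that address remains mapped after init is
 * freed), while __init_begin and everything up to but excluding
 * __init_end is refused.
 */
static int is_init_symbol(uintptr_t st_value,
                          uintptr_t init_begin, uintptr_t init_end)
{
    return st_value >= init_begin && st_value < init_end;
}
```

So resolution against `__init_begin` or any interior init address fails, while resolution against `__init_end` itself is allowed.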
Re: [PATCH v2 1/2] xen: introduce header file with section related symbols
On 19/04/2024 11:02 am, Roger Pau Monne wrote: > Start by declaring the beginning and end of the init section. > > No functional change intended. > > Requested-by: Andrew Cooper > Signed-off-by: Roger Pau Monné TYVM for doing this. There's a lot of cleanup which can follow on for it. Reviewed-by: Andrew Cooper although if anyone has a better name than sections.h then speak now.
Re: [XEN PATCH] docs/misra: mark the gzip folder as adopted code
On 18/04/2024 8:39 am, Jan Beulich wrote: > On 15.04.2024 17:44, Andrew Cooper wrote: >> On 15/04/2024 10:56 am, Federico Serafini wrote: >>> Mark the whole gzip folder as adopted code and remove the redundant >>> deviation of file inflate. >>> >>> Signed-off-by: Federico Serafini >> Acked-by: Andrew Cooper >> >> I hadn't realised that we had a special case like this. Definitely >> better to get rid of it. >> >> I've pulled this into my `for-next` branch, and will get it committed >> properly when OSSTest (our non-gitlab CI) is in a happier state. > Hmm. Considering Daniel's work (which I'll comment on separately), is this > really going to remain "adopted"? We're about to diverge to a degree where > simply taking patches from the original source isn't going to work anymore. > IOW I think we want either Daniel's work (and perhaps follow-on adjustments) > or marking of that code as adopted. inflate.c was imported from Linux in 2010. There's only one build fix and one comment change in Linux since 2010, whereas Xen's copy has seen several bugfixes and cleanups. gunzip.c has floated around rather more (it was originally some glue code in bZImage.c) but it was entirely rewritten first, to support other types of decompression (we did this differently in Xen), and second to support KASLR. In both cases, there's not an upstream to usefully track, and we should probably take ownership. ~Andrew
Re: [PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()
On 17/04/2024 8:14 am, Roger Pau Monné wrote: > On Tue, Apr 16, 2024 at 04:52:51PM +0100, Andrew Cooper wrote: >> Misra Rule 21.16 doesn't like the use of memcmp() between a string literal >> and >> a UINT8 array. Rewrite using plain compares. > The commit message makes it look like it's a type mismatch issue > between the two elements being compared, but from my reading of the > rule the issue is with the usage of a char pointer with memcmp(). > IOW: even if the two parameters are char pointers it would still be a > violation. > > "Misra Rule 21.16 forbids the use of memcmp() against character > arrays. Rewrite using plain compares since checking for "PE\0\0" > cannot be done using strncmp()." I've tweaked the sentence to say character array, but IMO it's really not about strncmp() which wouldn't be correct to use here even if it happened to produce a correct answer. >> No functional change. >> >> Signed-off-by: Andrew Cooper > LGTM (possibly pending the adjustment of the commit message): > > Acked-by: Roger Pau Monné Thanks.
Re: [PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()
On 18/04/2024 12:09 pm, Jan Beulich wrote: > On 16.04.2024 17:52, Andrew Cooper wrote: >> Misra Rule 21.16 doesn't like the use of memcmp() between a string literal >> and >> a UINT8 array. Rewrite using plain compares. >> >> No functional change. >> >> Signed-off-by: Andrew Cooper > Reviewed-by: Jan Beulich Thanks. > after having realized that sadly gcc13 instantiates the compound literals > (that I had thought of using) in .rodata (or one of its variants), rather > than leveraging them being as constant as string literals. gcc10 manages to optimise both checks to a cmp{w,l}, which I did check before sending the patch. However, it chooses a different base register for the cmpl and there's a sad cascade effect where a bunch of JMP disp8's turn into disp32's. But oh well - it's init code. ~Andrew
Re: [PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()
On 18/04/2024 12:06 pm, Jan Beulich wrote: > On 17.04.2024 09:14, Roger Pau Monné wrote: >> On Tue, Apr 16, 2024 at 04:52:51PM +0100, Andrew Cooper wrote: >>> --- a/xen/common/efi/pe.c >>> +++ b/xen/common/efi/pe.c >>> @@ -111,7 +111,8 @@ const void *__init pe_find_section(const void *image, >>> const UINTN image_size, >>> UINTN offset, i; >>> >>> if ( image_size < sizeof(*dos) || >>> - memcmp(dos->Magic, "MZ", 2) != 0 ) >>> + dos->Magic[0] != 'M' || >>> + dos->Magic[1] != 'Z' ) >> For this one you could likely use strncmp()? > strncmp() against UINT8[2] wouldn't be liked by the compiler, I guess. Indeed. And this MISRA rule is very much "you are mixing string and non-string types. Don't do that." This is a very rare pattern, where we are looking for a binary marker that just happens to also make sense when expressed as an ASCII string. Although the MISRA complaint does raise a good point. The memcmp() form would malfunction on any system with CHAR_BIT != 8, in a way that the {u,}int8_t-at-a-time form wouldn't. ~Andrew
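The rewrite pattern under discussion — plain byte compares in place of memcmp() against a string literal — can be sketched standalone. The struct layout below is illustrative only, not the real EFI header definition:

```c
#include <assert.h>
#include <stdint.h>

/*
 * A two-byte binary magic checked byte-at-a-time.  Unlike
 * memcmp(dos->Magic, "MZ", 2), this mixes no string type into the
 * comparison, and keeps working even where CHAR_BIT != 8.
 */
struct dos_header {
    uint8_t Magic[2];   /* "MZ" in a well-formed image */
};

static int has_mz_magic(const struct dos_header *dos)
{
    return dos->Magic[0] == 'M' && dos->Magic[1] == 'Z';
}
```

The same shape extends to the four-byte "PE\0\0" check, where strncmp() could never be used because of the embedded NULs.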
Re: [XEN PATCH v1 08/15] x86/vpmu: separate amd/intel vPMU code
On 18/04/2024 2:25 pm, Sergiy Kibrik wrote: > 16.04.24 14:05, Andrew Cooper: >> On 16/04/2024 7:35 am, Sergiy Kibrik wrote: >>> diff --git a/xen/arch/x86/cpu/Makefile b/xen/arch/x86/cpu/Makefile >>> index 35561fe51d..d3d7b8fb2e 100644 >>> --- a/xen/arch/x86/cpu/Makefile >>> +++ b/xen/arch/x86/cpu/Makefile >>> @@ -10,4 +10,6 @@ obj-y += intel.o >>> obj-y += intel_cacheinfo.o >>> obj-y += mwait-idle.o >>> obj-y += shanghai.o >>> -obj-y += vpmu.o vpmu_amd.o vpmu_intel.o >>> +obj-y += vpmu.o >>> +obj-$(CONFIG_SVM) += vpmu_amd.o >>> +obj-$(CONFIG_VMX) += vpmu_intel.o >> >> I'm afraid this breaks perf counters on PV guests. These files are >> joint guest-type implementations. >> >> Seeing as you leave vpmu.o alone, I guess that all you're actually >> wanting to do is compile out vpmu_intel.o? In which case, use >> CONFIG_{AMD,INTEL} rather than CONFIG_{SVM,VMX} please. >> > > Thanks for pointing that out. > I think I'll just exclude this patch from the series, and make a > separate series with the CONFIG_{AMD,INTEL} options and code separation > that is unrelated to VMX/SVM & HVM/PV, only to the CPUs themselves. > > BTW, how would you suggest CONFIG_{AMD,INTEL} shall relate to > CONFIG_{SVM,VMX}? Should CONFIG_VMX just plainly depend on CONFIG_INTEL, > or are more complex relations needed? To a first approximation, no linkage. Centaur have an implementation of VMX on the market, and Hygon have an implementation of SVM. ~Andrew
Re: [PATCH 1/4] xen/xlat: Sort out whitespace
On 15/04/2024 10:49 pm, Stefano Stabellini wrote: > On Mon, 15 Apr 2024, Andrew Cooper wrote: >> * Fix tabs/spaces mismatch for certain rows >> * Insert lines between header files to improve legibility >> >> Signed-off-by: Andrew Cooper >> --- >> CC: George Dunlap >> CC: Jan Beulich >> CC: Stefano Stabellini >> CC: Julien Grall >> --- >> xen/include/xlat.lst | 31 +++ >> 1 file changed, 27 insertions(+), 4 deletions(-) >> >> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst >> index 9c41948514bf..e811342bb096 100644 >> --- a/xen/include/xlat.lst >> +++ b/xen/include/xlat.lst >> @@ -20,19 +20,24 @@ >> # First column indicator: >> # ! - needs translation >> # ? - needs checking >> + >> ? dom0_vga_console_info xen.h >> ? xenctl_bitmap xen.h >> ? mmu_update xen.h >> ! mmuext_op xen.h >> ! start_info xen.h >> ? vcpu_time_info xen.h >> + >> ? pmu_amd_ctxt arch-x86/pmu.h >> ? pmu_arch arch-x86/pmu.h >> ? pmu_cntr_pair arch-x86/pmu.h >> ? pmu_intel_ctxt arch-x86/pmu.h >> ? pmu_regs arch-x86/pmu.h >> + >> ! cpu_user_regs arch-x86/xen-@arch@.h >> + >> ! trap_info arch-x86/xen.h >> + >> ? cpu_offline_action arch-x86/xen-mca.h >> ? mc arch-x86/xen-mca.h >> ? mcinfo_bank arch-x86/xen-mca.h >> @@ -50,6 +55,7 @@ >> ? mc_notifydomain arch-x86/xen-mca.h >> ! mc_physcpuinfo arch-x86/xen-mca.h >> ? page_offline_action arch-x86/xen-mca.h >> + >> ? argo_addr argo.h >> ! argo_iov argo.h >> ? argo_register_ring argo.h >> @@ -59,6 +65,7 @@ >> ? argo_ring_message_header argo.h >> ? argo_send_addr argo.h >> ? argo_unregister_ring argo.h >> + >> ? evtchn_alloc_unbound event_channel.h >> ? evtchn_bind_interdomain event_channel.h >> ? evtchn_bind_ipi event_channel.h >> @@ -74,6 +81,7 @@ >> ? evtchn_set_priority event_channel.h >> ? evtchn_status event_channel.h >> ? evtchn_unmask event_channel.h >> + >> ? gnttab_cache_flush grant_table.h >> ! gnttab_copy grant_table.h >> ? gnttab_dump_table grant_table.h >> @@ -86,9 +94,10 @@ >> ? gnttab_get_version grant_table.h >> ! 
gnttab_get_status_frames grant_table.h >> ? grant_entry_v1 grant_table.h >> -? grant_entry_header grant_table.h >> +? grant_entry_header grant_table.h >> ? grant_entry_v2 grant_table.h >> ? gnttab_swap_grant_ref grant_table.h >> + >> ! dm_op_buf hvm/dm_op.h >> ? dm_op_create_ioreq_server hvm/dm_op.h >> ? dm_op_destroy_ioreq_server hvm/dm_op.h >> @@ -108,15 +117,20 @@ >> ? dm_op_set_pci_intx_level hvm/dm_op.h >> ? dm_op_set_pci_link_route hvm/dm_op.h >> ? dm_op_track_dirty_vram hvm/dm_op.h >> + >> ! hvm_altp2m_set_mem_access_multi hvm/hvm_op.h >> + >> ? vcpu_hvm_context hvm/hvm_vcpu.h >> ? vcpu_hvm_x86_32 hvm/hvm_vcpu.h >> ? vcpu_hvm_x86_64 hvm/hvm_vcpu.h >> + >> ? hypfs_direntry hypfs.h >> ? hypfs_dirlistentry hypfs.h >> + >> ? kexec_exec kexec.h >> ! kexec_image kexec.h >> ! kexec_range kexec.h >> + >> ! add_to_physmap memory.h >> ! add_to_physmap_batch memory.h >> ! foreign_memory_map memory.h >> @@ -130,6 +144,7 @@ >> ! reserved_device_memory_map memory.h >> ? vmemrange memory.h >> ! vnuma_topology_info memory.h >> + >> ? physdev_eoi physdev.h >> ? physdev_get_free_pirq physdev.h >> ? physdev_irq physdev.h >> @@ -
Re: [PATCH v2 1/5] x86: Update x86 low level version check of microcode
On 16/04/2024 10:15 am, Fouad Hilly wrote: > Update microcode version check at Intel and AMD level by: > Preventing the low level code from sending errors if the microcode > version provided is not a newer version. Other errors will be sent like > before. > When the provided microcode version is the same as the current one, the > code is updated to point at the provided microcode. > The microcode version check happens at a higher and common level in core.c. > Keep at the low level all the required code that checks for signature and CPU > compatibility. > > [v2] > Update message description to better describe the changes > > Signed-off-by: Fouad Hilly > --- As a general note, your v2/v3/etc changelog needs to go under this --- line. ~Andrew
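For reference, the conventional layout puts per-version notes below the first `---` marker, where `git am` discards them when applying the patch. A generic sketch (file names and diffstat are illustrative, not the actual patch):

```text
x86: update low level microcode version checks

<commit message proper: what changed and why, kept version-agnostic>

Signed-off-by: Author Name <author@example.com>
---
Changes in v2:
 - Reworded the commit message.

 xen/arch/x86/cpu/microcode/core.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
...
```

Everything above the first `---` becomes the permanent commit message; everything between it and the diffstat exists only for reviewers.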
Re: [PATCH v2 4/6] gzip: refactor state tracking
On 17/04/2024 3:37 pm, Daniel P. Smith wrote: > diff --git a/xen/common/gzip/gunzip.c b/xen/common/gzip/gunzip.c > index 1b448d6e3655..8178a05a0190 100644 > --- a/xen/common/gzip/gunzip.c > +++ b/xen/common/gzip/gunzip.c > @@ -4,18 +4,25 @@ > #include > #include > > -static unsigned char *__initdata window; > - > #define WSIZE 0x8000U > > -static unsigned char *__initdata inbuf; > -static unsigned int __initdata insize; > +struct gzip_state { > +unsigned char *window; > + > +unsigned char *inbuf; > +unsigned int insize; > + > +/* Index of next byte to be processed in inbuf: */ > +unsigned int inptr; perform_gunzip() passes in an unsigned long size, which means it's truncated in this state. Please at least make these size_t. Life is too short to deal with the fallout of this going wrong. > > -/* Index of next byte to be processed in inbuf: */ > -static unsigned int __initdata inptr; > +/* Bytes in output buffer: */ > +unsigned int outcnt; See later, but I think the comment for outcnt is wrong. > > -/* Bytes in output buffer: */ > -static unsigned int __initdata outcnt; > +long bytes_out; See later, but I don't think this can be signed. It's only used to cross-check the header-reported length, which is done as an unsigned long. > + > +unsigned long bb; /* bit buffer */ > +unsigned bk; /* bits in bit buffer */ As this is effectively new code, "unsigned int" please. > @@ -27,7 +34,7 @@ typedef unsigned char uch; > typedef unsigned short ush; > typedef unsigned long ulg; > > -#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf()) > +#define get_byte() (s->inptr < s->insize ? s->inbuf[s->inptr++] : > fill_inbuf()) This is a bit weird: > $ git grep -w get_byte common/gzip > common/gzip/gunzip.c:40:#define get_byte() (s->inptr < s->insize ? 
> s->inbuf[s->inptr++] : fill_inbuf()) > common/gzip/inflate.c:224:#define NEXTBYTE(s) ({ int v = get_byte(); > if (v < 0) goto underrun; (uch)v; }) > common/gzip/inflate.c:1131: flags = (uch)get_byte(); NEXTBYTE() gets an s parameter, but doesn't pass it to get_byte() and instead relies on the state always having the name 's' in scope. I'd suggest turning get_byte() into a proper static function. It will be more readable than throwing extra ()'s around s in the existing macro. Except... fill_inbuf() is a fatal error anyway, so that can be folded together to simplify the result. > @@ -72,16 +78,16 @@ static __init void flush_window(void) > unsigned int n; > unsigned char *in, ch; > > -in = window; > -for ( n = 0; n < outcnt; n++ ) > +in = s->window; > +for ( n = 0; n < s->outcnt; n++ ) > { > ch = *in++; > c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8); > } > crc = c; > > -bytes_out += (unsigned long)outcnt; > -outcnt = 0; > +s->bytes_out += (unsigned long)s->outcnt; I can't figure out why this was written this way, even originally. AFAICT, outcnt doesn't even match its comment. /* Bytes in output buffer: */ It looks like it's the number of bytes in window. This also matches the "wp" define which I guess is "window position". Anyway, irrespective of naming, the cast is useless so let's drop it while modifying the line. > diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c > index 512d9bf0ee2e..5735bbcf7eb4 100644 > --- a/xen/common/gzip/inflate.c > +++ b/xen/common/gzip/inflate.c > @@ -119,7 +119,7 @@ static char rcsid[] = "#Id: inflate.c,v 0.14 1993/06/10 > 13:27:04 jloup Exp #"; > > #endif /* !__XEN__ */ > > -#define slide window > +#define slide s->window Given only 8 uses, I'd expand this in place and drop the macro. But, there's an entire comment block discussing "slide" which I think is wrong for Xen's implementation. That wants to go too, I think. 
> > /* > * Huffman code lookup table entry--this entry is four bytes for machines > @@ -143,12 +143,12 @@ struct huft { > static int huft_build OF((unsigned *, unsigned, unsigned, >const ush *, const ush *, struct huft **, int *)); > static int huft_free OF((struct huft *)); > -static int inflate_codes OF((struct huft *, struct huft *, int, int)); > -static int inflate_stored OF((void)); > -static int inflate_fixed OF((void)); > -static int inflate_dynamic OF((void)); > -static int inflate_block OF((int *)); > -static int inflate OF((void)); > +static int inflate_codes OF((struct gzip_state *, struct huft *, struct huft > *, int, int)); > +static int inflate_stored OF((struct gzip_state *)); > +static int inflate_fixed OF((struct gzip_state *)); > +static int inflate_dynamic OF((struct gzip_state *)); > +static int inflate_block OF((struct gzip_state *, int *)); > +static int inflate OF((struct gzip_state *)); These are the only users of OF, and it turns out it's a no-op macro. This code would be far less WTF-worthy if it was expanded as part of
Re: [PATCH v2 5/6] gzip: move crc state into consolidated gzip state
On 17/04/2024 3:37 pm, Daniel P. Smith wrote: > Signed-off-by: Daniel P. Smith The change in type is fine, but does need discussing. Furthermore, ... > diff --git a/xen/common/gzip/gunzip.c b/xen/common/gzip/gunzip.c > index 8178a05a0190..bef324d3d166 100644 > --- a/xen/common/gzip/gunzip.c > +++ b/xen/common/gzip/gunzip.c > @@ -74,7 +77,7 @@ static __init void flush_window(struct gzip_state *s) > * The window is equal to the output buffer therefore only need to > * compute the crc. > */ > -unsigned long c = crc; > +unsigned long c = s->crc; ... this wants to be unsigned int I think. > diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c > index 5735bbcf7eb4..c18ce20210b0 100644 > --- a/xen/common/gzip/inflate.c > +++ b/xen/common/gzip/inflate.c > @@ -1063,16 +1063,14 @@ static int __init inflate(struct gzip_state *s) > * > **/ > > -static ulg __initdata crc_32_tab[256]; > -static ulg __initdata crc; /* initialized in makecrc() so it'll reside in > bss */ > -#define CRC_VALUE (crc ^ 0xffffffffUL) > +#define CRC_VALUE (s->crc ^ 0xffffffffUL) $ git grep CRC_VALUE common/gzip/inflate.c:1052:#define CRC_VALUE (s->crc ^ 0xffffffffUL) common/gzip/inflate.c:1207: if (orig_crc != CRC_VALUE) { I'd expand this in its single user, but like ... > > /* > * Code to compute the CRC-32 table. Borrowed from > * gzip-1.0.3/makecrc.c. > */ > > -static void __init makecrc(void) > +static void __init makecrc(struct gzip_state *s) > { > /* Not copyrighted 1990 Mark Adler */ > > @@ -1089,7 +1087,7 @@ static void __init makecrc(void) > for (i = 0; i < sizeof(p)/sizeof(int); i++) > e |= 1L << (31 - p[i]); > > -crc_32_tab[0] = 0; > +s->crc_32_tab[0] = 0; > > for (i = 1; i < 256; i++) > { > @@ -1100,11 +1098,11 @@ static void __init makecrc(void) > if (k & 1) > c ^= e; > } > -crc_32_tab[i] = c; > +s->crc_32_tab[i] = c; > } > > /* this is initialized here so this code could reside in ROM */ > -crc = (ulg)0xffffffffUL; /* shift register contents */ ... 
this, the constant should become -1u or ~0u because of the type change. I'm not sure what to make of the ROM comment, but I suspect it means the XOR can be dropped with a bit of care too. ~Andrew
Re: [PATCH v2 3/6] gzip: remove custom memory allocator
On 17/04/2024 3:37 pm, Daniel P. Smith wrote: > All the other decompression routines use xmalloc_bytes(), thus there is no > reason for gzip to be handling its own allocation of memory. In fact, there is > a bug somewhere in the allocator as decompression started to break when adding > additional allocations. Instead of troubleshooting the allocator, replace it > with xmalloc_bytes(). > > Signed-off-by: Daniel P. Smith > --- > xen/common/gzip/gunzip.c | 17 ++ > xen/common/gzip/inflate.c | 47 --- > 2 files changed, 2 insertions(+), 62 deletions(-) Good riddance. Reviewed-by: Andrew Cooper
Re: [PATCH v2 6/6] gzip: drop huffman code table tracking
On 17/04/2024 3:37 pm, Daniel P. Smith wrote: > The "tracking" bits do not appear to be used, so drop them from the code. > > Signed-off-by: Daniel P. Smith > --- > xen/common/gzip/inflate.c | 6 -- > 1 file changed, 6 deletions(-) > > diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c > index c18ce20210b0..15bc187c2bbe 100644 > --- a/xen/common/gzip/inflate.c > +++ b/xen/common/gzip/inflate.c > @@ -264,8 +264,6 @@ static const int dbits = 6; /* bits in base > distance lookup table */ > #define BMAX 16 /* maximum bit length of any code (16 for explode) */ > #define N_MAX 288 /* maximum number of codes in any set */ > > -static unsigned __initdata hufts; /* track memory usage */ > - > /* > * Given a list of code lengths and a maximum table size, make a set of > * tables to decode that set of codes. Return zero on success, one if > @@ -445,7 +443,6 @@ static int __init huft_build( > goto out; > } > DEBG1("4 "); > -hufts += z + 1; /* track memory usage */ > *t = q + 1; /* link to list for huft_free() */ > *(t = &(q->v.t)) = (struct huft *)NULL; > u[h] = ++q; /* table starts after link */ > @@ -1028,15 +1025,12 @@ static int __init inflate(struct gzip_state *s) > /* decompress until the last block */ > h = 0; > do { > -hufts = 0; > #ifdef ARCH_HAS_DECOMP_WDOG > arch_decomp_wdog(); > #endif > r = inflate_block(s, &e); > if (r) > return r; > -if (hufts > h) > -h = hufts; > } while (!e); With 'hufts' removed, the local variable 'h' is now dead too. It gets read inside an #ifdef DEBUG, but as it's rendering an unqualified number to stderr, it can also be deleted. 
Specifically, I recommend this additional delta: diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c index 15bc187c2bbe..13015bb45f4a 100644 --- a/xen/common/gzip/inflate.c +++ b/xen/common/gzip/inflate.c @@ -1015,7 +1015,6 @@ static int __init inflate(struct gzip_state *s) { int e; /* last block flag */ int r; /* result code */ - unsigned h; /* maximum struct huft's malloc'ed */ /* initialize window, bit buffer */ wp = 0; @@ -1023,7 +1022,6 @@ static int __init inflate(struct gzip_state *s) s->bb = 0; /* decompress until the last block */ - h = 0; do { #ifdef ARCH_HAS_DECOMP_WDOG arch_decomp_wdog(); @@ -1045,9 +1043,6 @@ static int __init inflate(struct gzip_state *s) flush_output(s, wp); /* return success */ -#ifdef DEBUG - fprintf(stderr, "<%u> ", h); -#endif /* DEBUG */ return 0; } which I'm happy to fold on commit. ~Andrew
Re: [PATCH v2 1/6] gzip: drop unused define checks
On 17/04/2024 3:37 pm, Daniel P. Smith wrote: > Dropping the define checks for PKZIP_BUG_WORKAROUND and NOMEMCPY define as > they > never are set. > > Signed-off-by: Daniel P. Smith It looks like ARCH_HAS_DECOMP_WDOG is another one that can go. There's only a single instance, in inflate(), which I'm happy to fold on commit. ~Andrew
[PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()
Misra Rule 21.16 doesn't like the use of memcmp() between a string literal and a UINT8 array. Rewrite using plain compares. No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Stefano Stabellini CC: consult...@bugseng.com CC: Roberto Bagnara CC: Federico Serafini CC: Nicola Vetrini --- xen/common/efi/pe.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/xen/common/efi/pe.c b/xen/common/efi/pe.c index a84992df9afe..ef8a2543e0a1 100644 --- a/xen/common/efi/pe.c +++ b/xen/common/efi/pe.c @@ -111,7 +111,8 @@ const void *__init pe_find_section(const void *image, const UINTN image_size, UINTN offset, i; if ( image_size < sizeof(*dos) || - memcmp(dos->Magic, "MZ", 2) != 0 ) + dos->Magic[0] != 'M' || + dos->Magic[1] != 'Z' ) return NULL; offset = dos->ExeHeader; @@ -119,7 +120,10 @@ const void *__init pe_find_section(const void *image, const UINTN image_size, offset += sizeof(*pe); if ( image_size < offset || - memcmp(pe->Magic, "PE\0\0", 4) != 0 ) + pe->Magic[0] != 'P' || + pe->Magic[1] != 'E' || + pe->Magic[2] != '\0' || + pe->Magic[3] != '\0' ) return NULL; if ( pe->FileHeader.Machine != PE_HEADER_MACHINE ) base-commit: c0f890cd9d5fd2c17a1e3110cb26f98c90ce8429 -- 2.30.2
Re: [PATCH] x86/hvm: Allow supplying a dynamic start ASID
On 16/04/2024 9:54 am, Vaishali Thakkar wrote: > Currently, Xen always starts the ASID allocation at 1. But > for SEV technologies the ASID space is divided. This is > because it's a security issue if a guest is started as > ES/SNP and is migrated to SEV-only. So, the types are > tracked explicitly. > > Thus, in preparation of SEV support in Xen, add min_asid > to allow supplying the dynamic start ASID during the > allocation process. > > Signed-off-by: Vaishali Thakkar Mechanically, this is fine, but is it going to be useful in the long term? For SEV and SEV-ES/SNP, we must run the VM in the single fixed ASID negotiated with the ASP. This is not optional. For non-encrypted VMs, we are free to choose between using the remaining ASID space as we used to (i.e. run it per-pCPU and tick it over to flush the TLBs), or to run it in a fixed ASID per guest too. The latter lets us use INVLPGB, and would avoid maintaining two different TLB-tagging schemes. I'd say that we absolutely do want INVLPGB support for managing non-encrypted VMs, and we cannot mix both fixed and floating ASIDs at the same time. We should seriously consider whether it's worth maintaining two schemes, or just switching wholesale to a fixed ASID scheme. I don't have a good answer here... If it were anything but TLB flushing, it would be an obvious choice, but TLB handling bugs are some of the nastiest to show up. ~Andrew
Re: [PATCH] x86/svm: Add flushbyasid in the supported features
On 16/04/2024 10:08 am, Vaishali Thakkar wrote: > TLB Flush by ASID is missing in the list of supported features > here. So, add it. > > Signed-off-by: Vaishali Thakkar > --- > xen/arch/x86/hvm/svm/svm.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c > index a745acd903..4719fffae5 100644 > --- a/xen/arch/x86/hvm/svm/svm.c > +++ b/xen/arch/x86/hvm/svm/svm.c > @@ -2510,6 +2510,7 @@ const struct hvm_function_table * __init start_svm(void) > P(cpu_has_svm_lbrv, "Last Branch Record (LBR) Virtualisation"); > P(cpu_has_svm_nrips, "Next-RIP Saved on #VMEXIT"); > P(cpu_has_svm_cleanbits, "VMCB Clean Bits"); > +P(cpu_has_svm_flushbyasid, "TLB flush by ASID"); > P(cpu_has_svm_decode, "DecodeAssists"); > P(cpu_has_svm_vloadsave, "Virtual VMLOAD/VMSAVE"); > P(cpu_has_svm_vgif, "Virtual GIF"); This is consistent with pre-existing behaviour, so Acked-by: Andrew Cooper However, an ever-increasing list of lines like this is something I'm trying to push back against. They don't match the configured state of VMs in the system, not least because one of the things required to fix security vulnerabilities in nested virt is to break the (false) assumption that there is a single global state of how a VM is configured. These ones in particular are just about to appear in CPU policies. ~Andrew
Re: [XEN PATCH v1 13/15] x86: wire cpu_has_{svm/vmx}_* to false when svm/vmx not enabled
On 16/04/2024 7:46 am, Sergiy Kibrik wrote: > From: Xenia Ragiadakou > > To be able to use cpu_has_{svm/vmx}_* macros in common code without enclosing > them inside #ifdef guards when the respective virtualization technology is > not enabled, define corresponding helper routines as false when not > applicable. > > No functional change intended. > > Signed-off-by: Xenia Ragiadakou > Signed-off-by: Sergiy Kibrik > --- > xen/arch/x86/include/asm/hvm/svm/svm.h | 8 > xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 7 +++ > 2 files changed, 15 insertions(+) > > diff --git a/xen/arch/x86/include/asm/hvm/svm/svm.h > b/xen/arch/x86/include/asm/hvm/svm/svm.h > index 4eeeb25da9..7e8cdb4a27 100644 > --- a/xen/arch/x86/include/asm/hvm/svm/svm.h > +++ b/xen/arch/x86/include/asm/hvm/svm/svm.h > @@ -38,10 +38,18 @@ extern u32 svm_feature_flags; > #define SVM_FEATURE_SSS 19 /* NPT Supervisor Shadow Stacks */ > #define SVM_FEATURE_SPEC_CTRL 20 /* MSR_SPEC_CTRL virtualisation */ > > +#ifdef CONFIG_SVM > static inline bool cpu_has_svm_feature(unsigned int feat) > { > return svm_feature_flags & (1u << feat); > } > +#else > +static inline bool cpu_has_svm_feature(unsigned int feat) > +{ > +return false; > +} > +#endif > + > #define cpu_has_svm_npt cpu_has_svm_feature(SVM_FEATURE_NPT) > #define cpu_has_svm_lbrv cpu_has_svm_feature(SVM_FEATURE_LBRV) > #define cpu_has_svm_svml cpu_has_svm_feature(SVM_FEATURE_SVML) > diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > index fd197e2603..2d927d3100 100644 > --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > @@ -287,10 +287,17 @@ extern uint64_t vmx_tertiary_exec_control; > #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x800ULL > extern u64 vmx_ept_vpid_cap; > > +#ifdef CONFIG_VMX > static inline bool vmx_ctrl_has_feature(uint64_t control, unsigned long > feature) > { > return control & feature; > } > +#else > +static inline bool 
vmx_ctrl_has_feature(uint64_t control, unsigned long > feature) > +{ > +return false; > +} > +#endif > > #define VMX_MISC_ACTIVITY_MASK 0x01c0 > #define VMX_MISC_PROC_TRACE 0x4000 I'm afraid this is going in an unhelpful direction. We want to move both of these files to be local to arch/x86/hvm/{vmx,svm}/. cpu_has_svm_* isn't actually used outside of svm/; only the plain SVM_FEATURE_* constants are, and that's only because they're not expressed as plain cpu features yet. cpu_has_vmx_* has a few more users, but most are unlikely to remain in this form. One critical set of changes to fix vulnerabilities in nested-virt is to make almost all of these decisions based on per-domain state, not host state. The aspects which are host state should be in regular cpu features. I already volunteered to sort out the SEV feature leaf properly, and I was going to do the SVM leaf while I was at it. If you can wait a few days, I might be able to make half of this problem disappear. ~Andrew
Re: [XEN PATCH v1 08/15] x86/vpmu: separate amd/intel vPMU code
On 16/04/2024 7:35 am, Sergiy Kibrik wrote: > diff --git a/xen/arch/x86/cpu/Makefile b/xen/arch/x86/cpu/Makefile > index 35561fe51d..d3d7b8fb2e 100644 > --- a/xen/arch/x86/cpu/Makefile > +++ b/xen/arch/x86/cpu/Makefile > @@ -10,4 +10,6 @@ obj-y += intel.o > obj-y += intel_cacheinfo.o > obj-y += mwait-idle.o > obj-y += shanghai.o > -obj-y += vpmu.o vpmu_amd.o vpmu_intel.o > +obj-y += vpmu.o > +obj-$(CONFIG_SVM) += vpmu_amd.o > +obj-$(CONFIG_VMX) += vpmu_intel.o I'm afraid this breaks perf counters on PV guests. These files are joint guest-type implementations. Seeing as you leave vpmu.o alone, I guess that all you're actually wanting to do is compile out vpmu_intel.o? In which case, use CONFIG_{AMD,INTEL} rather than CONFIG_{SVM,VMX} please. ~Andrew
Re: [PATCH] Mini-OS: add some macros for asm statements
On 16/04/2024 8:11 am, Juergen Gross wrote: > diff --git a/arch/x86/sched.c b/arch/x86/sched.c > index dabe6fd6..460dea2e 100644 > --- a/arch/x86/sched.c > +++ b/arch/x86/sched.c > @@ -119,20 +113,12 @@ struct thread* arch_create_thread(char *name, void > (*function)(void *), > > void run_idle_thread(void) > { > -/* Switch stacks and run the thread */ > -#if defined(__i386__) > -__asm__ __volatile__("mov %0,%%esp\n\t" > - "push %1\n\t" > - "ret" > +/* Switch stacks and run the thread */ > +__asm__ __volatile__("mov %0,"ASM_SP"\n\t" > + "push %1\n\t" > + "ret" > :"=m" (idle_thread->sp) > - :"m" (idle_thread->ip)); > -#elif defined(__x86_64__) > -__asm__ __volatile__("mov %0,%%rsp\n\t" > - "push %1\n\t" > - "ret" > - :"=m" (idle_thread->sp) > - :"m" (idle_thread->ip)); > > -#endif > + :"m" (idle_thread->ip)); I know you're only transforming the existing logic, but this is dodgy in several ways. First, PUSH/RET is more commonly spelt JMP. Second, sp is an input parameter not an output, so I'm pretty sure this only works by luck. 01ee : 1ee: 55 push %rbp 1ef: 48 89 e5 mov %rsp,%rbp 1f2: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 1f9 1f9: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 200 200: 48 8b 60 10 mov 0x10(%rax),%rsp 204: ff 72 18 pushq 0x18(%rdx) 207: c3 retq 208: 90 nop 209: 5d pop %rbp 20a: c3 retq This only works because the address constructed for sp's "output" happens to be on a single-parameter instruction where it's good as an input too. Anyway, this is a much better way of writing it: asm volatile ("mov %[sp], %%"ASM_SP"\n\t" "jmp *%[ip]\n\t" : : [sp] "m" (idle_thread->sp), [ip] "m" (idle_thread->ip)); and also highlights that run_idle_thread() should be marked noreturn. 
> } > > unsigned long __local_irq_save(void) > diff --git a/include/x86/os.h b/include/x86/os.h > index ee34d784..485d90b8 100644 > --- a/include/x86/os.h > +++ b/include/x86/os.h > @@ -77,6 +77,17 @@ int arch_suspend(void); > void arch_post_suspend(int canceled); > void arch_fini(void); > > +#if defined(__i386__) > +#define __SZ"l" > +#define __REG "e" > +#else > +#define __SZ"q" > +#define __REG "r" > +#endif > + > +#define ASM_SP "%%"__REG"sp" The %% should be at the usage sites, not here. That way, you can use ASM_SP in e.g. file-scope asm where it's spelt with a single %. > +#define ASM_MOV "mov"__SZ There's no need for ASM_MOV. Regular plain mov (no suffix) will work in all the cases you're trying to use it, and makes the asm easier to read. Notice how run_idle_thread() is already like this. ~Andrew
Re: [XEN PATCH v1 06/15] x86/p2m: guard altp2m code with CONFIG_VMX option
On 16/04/2024 7:31 am, Sergiy Kibrik wrote: > Instead of using the generic CONFIG_HVM option, switch to the more specific > CONFIG_VMX option for altp2m support, as it depends on VMX. Also guard > the altp2m routines, so that the feature can be disabled completely in the build. > > Signed-off-by: Sergiy Kibrik Altp2m is not VMX-specific. It's just that no-one has wired it up on AMD, or accepted the long-outstanding ARM patchset where it was made to work. If you want to make it compile-time configurable, you probably want CONFIG_ALTP2M. However, it's not even x86-specific. See the uses in common/monitor.c CC Tamas. ~Andrew
Re: [XEN PATCH v1 15/15] x86/hvm: make AMD-V and Intel VT-x support configurable
On 16/04/2024 7:50 am, Sergiy Kibrik wrote: > From: Xenia Ragiadakou > > Provide the user with configuration control over the cpu virtualization > support > in Xen by making SVM and VMX options user selectable. > > To preserve the current default behavior, both options depend on HVM and > default to Y. > > To prevent users from unknowingly disabling virtualization support, make the > controls user selectable only if EXPERT is enabled. > Also make INTEL_IOMMU/AMD_IOMMU options dependent on VMX/SVM options. Everything else seems ok, but there's no inherent dependency between VMX/SVM and IOMMUs. There are certain features (HAP/IOMMU pagetable sharing, posted interrupts) which do depend on both, but the vast majority of functionality is independent. It would be a legitimate config (although getting less plausible, these days) to have PV && IOMMU && !HVM. Furthermore, randconfig will do a better job without such restrictions in place. ~Andrew
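A sketch of the Kconfig shape being discussed (option layout and help text are assumed for illustration, not taken from the actual patch), with the IOMMU dependency deliberately dropped:

```kconfig
config VMX
	bool "Intel VT-x support" if EXPERT
	depends on HVM
	default y
	help
	  Support for hardware-assisted HVM guests on VT-x capable CPUs.

config SVM
	bool "AMD-V support" if EXPERT
	depends on HVM
	default y
	help
	  Support for hardware-assisted HVM guests on SVM capable CPUs.

# Deliberately no "depends on VMX" / "depends on SVM" on the IOMMU
# options: PV && IOMMU && !HVM is a legitimate configuration, and
# randconfig exercises more combinations without the restriction.
```

Only the genuinely joint features (HAP/IOMMU pagetable sharing, posted interrupts) would then carry a `depends on` linking the two subsystems.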
Re: [XEN PATCH] docs/misra: mark the gzip folder as adopted code
On 15/04/2024 10:56 am, Federico Serafini wrote: > Mark the whole gzip folder as adopted code and remove the redundant > deviation of file inflate. > > Signed-off-by: Federico Serafini Acked-by: Andrew Cooper I hadn't realised that we had a special case like this. Definitely better to get rid of it. I've pulled this into my `for-next` branch, and will get it committed properly when OSSTest (our non-gitlab CI) is in a happier state.