Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake

2024-05-07 Thread Andrew Cooper
On 07/05/2024 3:45 pm, Roger Pau Monné wrote:
> On Tue, May 07, 2024 at 03:31:19PM +0100, Andrew Cooper wrote:
>> On 07/05/2024 3:24 pm, Roger Pau Monné wrote:
>>> On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote:
>>>> Ever since Xen 4.14, there has been a latent bug with migration.
>>>>
>>>> While some toolstacks can level the features properly, they don't shrink
>>>> feat.max_subleaf when all features have been dropped.  This is because
>>>> we *still* have not completed the toolstack side work for full CPU Policy
>>>> objects.
>>>>
>>>> As a consequence, even when properly feature levelled, VMs can't migrate
>>>> "backwards" across hardware which reduces feat.max_subleaf.  One such example
>>>> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).
>>>>
>>>> Extend the max policies feat.max_subleaf to the highest number Xen knows
>>>> about, but leave the default policies matching the host.  This will allow VMs
>>>> with a higher feat.max_subleaf than strictly necessary to migrate in.
>>>>
>>>> Eventually we'll manage to teach the toolstack how to avoid creating such VMs
>>>> in the first place, but there's still more work to do there.
>>>>
>>>> Signed-off-by: Andrew Cooper 
>>> Acked-by: Roger Pau Monné 
>> Thanks.
>>
>>> Even if we have just found one glitch with PSFD and Ice Lake vs
>>> Cascade Lake, wouldn't it be safer to always extend the max policies
>>> max leafs and subleafs to match the known array sizes?
>> This is the final max leaf (containing feature information) to gain
>> custom handling, I think?
> Couldn't the same happen with extended leaves?  Some of the extended
> leaves contain features, and hence for policy leveling toolstack might
> decide to zero them, yet extd.max_leaf won't be adjusted.

Hmm.  Right now, extd max leaf is also the one with the bit that we
unconditionally advertise, and it's inherited all the way from the host
policy.

So yes, in principle, but anything that bumps this limit is going to
have other implications too, and I'd prefer not to second-guess them at
this point.

I hope we can get the toolstack side fixes before this becomes a real
problem...

~Andrew



Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake

2024-05-07 Thread Andrew Cooper
On 07/05/2024 3:24 pm, Roger Pau Monné wrote:
> On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote:
>> Ever since Xen 4.14, there has been a latent bug with migration.
>>
>> While some toolstacks can level the features properly, they don't shrink
>> feat.max_subleaf when all features have been dropped.  This is because
>> we *still* have not completed the toolstack side work for full CPU Policy
>> objects.
>>
>> As a consequence, even when properly feature levelled, VMs can't migrate
>> "backwards" across hardware which reduces feat.max_subleaf.  One such example
>> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).
>>
>> Extend the max policies feat.max_subleaf to the highest number Xen knows
>> about, but leave the default policies matching the host.  This will allow VMs
>> with a higher feat.max_subleaf than strictly necessary to migrate in.
>>
>> Eventually we'll manage to teach the toolstack how to avoid creating such VMs
>> in the first place, but there's still more work to do there.
>>
>> Signed-off-by: Andrew Cooper 
> Acked-by: Roger Pau Monné 

Thanks.

>
> Even if we have just found one glitch with PSFD and Ice Lake vs
> Cascade Lake, wouldn't it be safer to always extend the max policies
> max leafs and subleafs to match the known array sizes?

This is the final max leaf (containing feature information) to gain
custom handling, I think?

~Andrew



Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC

2024-05-07 Thread Andrew Cooper
On 07/05/2024 3:23 pm, Marek Marczykowski-Górecki wrote:
> On Tue, May 07, 2024 at 03:15:48PM +0100, Andrew Cooper wrote:
>> On 07/05/2024 12:32 pm, Marek Marczykowski-Górecki wrote:
>>> On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote:
>>>> `xl devd` has been observed leaking /var/log/xldevd.log into children.
>>>>
>>>> Link: https://github.com/QubesOS/qubes-issues/issues/8292
>>>> Reported-by: Demi Marie Obenour 
>>>> Signed-off-by: Andrew Cooper 
>>>> ---
>>>> CC: Anthony PERARD 
>>>> CC: Juergen Gross 
>>>> CC: Demi Marie Obenour 
>>>> CC: Marek Marczykowski-Górecki 
>>>>
>>>> Also entirely speculative based on the QubesOS ticket.
>>>> ---
>>>>  tools/xl/xl_utils.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c
>>>> index 17489d182954..060186db3a59 100644
>>>> --- a/tools/xl/xl_utils.c
>>>> +++ b/tools/xl/xl_utils.c
>>>> @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile)
>>>>  exit(-1);
>>>>  }
>>>>  
>>>> -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644));
>>>> +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644));
>>> This one might be not enough, as the FD gets dup2()-ed to stdout/stderr
>>> just outside of the context here, and then inherited by various hotplug
>>> script. Just adding O_CLOEXEC here means the hotplug scripts will run
>>> with stdout/stderr closed.
>> Lovely :(  Yes - this won't work.  I guess what we want instead is:
>>
>> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c
>> index 060186db3a59..a0ce7dd7fa21 100644
>> --- a/tools/xl/xl_utils.c
>> +++ b/tools/xl/xl_utils.c
>> @@ -282,6 +282,7 @@ int do_daemonize(const char *name, const char *pidfile)
>>  dup2(logfile, 2);
>>  
>>  close(nullfd);
>> +    close(logfile);
>>  
>>  CHK_SYSCALL(daemon(0, 1));
>>  
>> which at least means there's not a random extra fd attached to the logfile.
> But logfile is a global variable, and it looks to be used in dolog()...

Urgh, fine.  Let's go back to your suggestion of setting CLOEXEC after
dup()ing.

~Andrew



Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC

2024-05-07 Thread Andrew Cooper
On 07/05/2024 12:32 pm, Marek Marczykowski-Górecki wrote:
> On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote:
>> `xl devd` has been observed leaking /var/log/xldevd.log into children.
>>
>> Link: https://github.com/QubesOS/qubes-issues/issues/8292
>> Reported-by: Demi Marie Obenour 
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Anthony PERARD 
>> CC: Juergen Gross 
>> CC: Demi Marie Obenour 
>> CC: Marek Marczykowski-Górecki 
>>
>> Also entirely speculative based on the QubesOS ticket.
>> ---
>>  tools/xl/xl_utils.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c
>> index 17489d182954..060186db3a59 100644
>> --- a/tools/xl/xl_utils.c
>> +++ b/tools/xl/xl_utils.c
>> @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile)
>>  exit(-1);
>>  }
>>  
>> -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644));
>> +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644));
> This one might be not enough, as the FD gets dup2()-ed to stdout/stderr
> just outside of the context here, and then inherited by various hotplug
> script. Just adding O_CLOEXEC here means the hotplug scripts will run
> with stdout/stderr closed.

Lovely :(  Yes - this won't work.  I guess what we want instead is:

diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c
index 060186db3a59..a0ce7dd7fa21 100644
--- a/tools/xl/xl_utils.c
+++ b/tools/xl/xl_utils.c
@@ -282,6 +282,7 @@ int do_daemonize(const char *name, const char *pidfile)
 dup2(logfile, 2);
 
 close(nullfd);
+    close(logfile);
 
 CHK_SYSCALL(daemon(0, 1));
 
which at least means there's not a random extra fd attached to the logfile.
>>  free(fullname);
>>  assert(logfile >= 3);
>>  
>>
>> base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81
>> prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919
> Which one is this? I don't see it in staging, nor in any of your
> branches on xenbits. Lore finds "tools/libxs: Open /dev/xen/xenbus fds
> as O_CLOEXEC" which I guess is correct, but I have no idea how it
> correlates it, as this hash doesn't appear anywhere in the message, nor
> its headers...

It's the libxs patch, but rebased over yesterday's push to staging. 
auto-base doesn't work quite so well in cases like this.

~Andrew



Re: [PATCH net] xen-netfront: Add missing skb_mark_for_recycle

2024-05-07 Thread Andrew Cooper
Hello,

Please could we request a CVE for "xen-netfront: Add missing
skb_mark_for_recycle" which is 037965402a010898d34f4e35327d22c0a95cd51f
in Linus' tree.

This is a kernel memory leak trigger-able from unprivileged userspace.

I can't see any evidence of this fix having been assigned a CVE thus far
on the linux-cve-announce mailing list.

Thanks,

~Andrew


On 25/04/2024 4:13 pm, Greg KH wrote:
> On Thu, Apr 25, 2024 at 02:39:38PM +0100, George Dunlap wrote:
>> Greg,
>>
>> We're issuing an XSA for this; can you issue a CVE?
> To ask for a cve, please contact c...@kernel.org as per our
> documentation.  Please provide the git id of the commit you wish to have
> the cve assigned to.
>
> thanks,
>
> greg k-h




[PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake

2024-05-07 Thread Andrew Cooper
Ever since Xen 4.14, there has been a latent bug with migration.

While some toolstacks can level the features properly, they don't shrink
feat.max_subleaf when all features have been dropped.  This is because
we *still* have not completed the toolstack side work for full CPU Policy
objects.

As a consequence, even when properly feature levelled, VMs can't migrate
"backwards" across hardware which reduces feat.max_subleaf.  One such example
is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).

Extend the max policies feat.max_subleaf to the highest number Xen knows
about, but leave the default policies matching the host.  This will allow VMs
with a higher feat.max_subleaf than strictly necessary to migrate in.

Eventually we'll manage to teach the toolstack how to avoid creating such VMs
in the first place, but there's still more work to do there.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v2:
 * Adjust max policies rather than the host policy.
---
 xen/arch/x86/cpu-policy.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 4b6d96276399..f7e2910c01b5 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -590,6 +590,13 @@ static void __init calculate_pv_max_policy(void)
 unsigned int i;
 
 *p = host_cpu_policy;
+
+/*
+ * Some VMs may have a larger-than-necessary feat max_leaf.  Allow them to
+ * migrate in.
+ */
+p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
+
 x86_cpu_policy_to_featureset(p, fs);
 
 for ( i = 0; i < ARRAY_SIZE(fs); ++i )
@@ -630,6 +637,10 @@ static void __init calculate_pv_def_policy(void)
 unsigned int i;
 
 *p = pv_max_cpu_policy;
+
+/* Default to the same max_subleaf as the host. */
+p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf;
+
 x86_cpu_policy_to_featureset(p, fs);
 
 for ( i = 0; i < ARRAY_SIZE(fs); ++i )
@@ -666,6 +677,13 @@ static void __init calculate_hvm_max_policy(void)
 const uint32_t *mask;
 
 *p = host_cpu_policy;
+
+/*
+ * Some VMs may have a larger-than-necessary feat max_leaf.  Allow them to
+ * migrate in.
+ */
+p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
+
 x86_cpu_policy_to_featureset(p, fs);
 
 mask = hvm_hap_supported() ?
@@ -783,6 +801,10 @@ static void __init calculate_hvm_def_policy(void)
 const uint32_t *mask;
 
 *p = hvm_max_cpu_policy;
+
+/* Default to the same max_subleaf as the host. */
+p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf;
+
 x86_cpu_policy_to_featureset(p, fs);
 
 mask = hvm_hap_supported() ?

base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81
-- 
2.30.2




Re: [PATCH] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake

2024-05-07 Thread Andrew Cooper
On 07/05/2024 12:50 pm, Roger Pau Monné wrote:
> On Tue, May 07, 2024 at 12:29:57PM +0100, Andrew Cooper wrote:
>> Ever since Xen 4.14, there has been a latent bug with migration.
>>
>> While some toolstacks can level the features properly, they don't shrink
>> feat.max_subleaf when all features have been dropped.  This is because
>> we *still* have not completed the toolstack side work for full CPU Policy
>> objects.
>>
>> As a consequence, even when properly feature levelled, VMs can't migrate
>> "backwards" across hardware which reduces feat.max_subleaf.  One such example
>> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).
>>
>> Extend the host policy's feat.max_subleaf to the highest number Xen knows
>> about, similarly to how we extend extd.max_leaf for LFENCE_DISPATCH.  This
>> will allow VMs with a higher feat.max_subleaf than strictly necessary to
>> migrate in.
> Seeing what we do for max_extd_leaf, shouldn't we switch to doing what
> you propose for feat.max_subleaf to max_extd_leaf also?
>
> To allow migration between hosts that have 0x8021.eax and hosts
> that don't have such extended leaf.
>
> cpu_has_lfence_dispatch kind of does that, but if lfence cannot be
> made serializing then the max extended leaf is not expanded.  And we
> should also likely account for more feature leafs possibly appearing
> after 0x8021?

On second thoughts, this adjustment ought to be in the max policies only.

It's slightly different to LFENCE_DISPATCH, in that we don't actually
have any set bits in those leaves.

I'll do a different patch.

~Andrew



[PATCH] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake

2024-05-07 Thread Andrew Cooper
Ever since Xen 4.14, there has been a latent bug with migration.

While some toolstacks can level the features properly, they don't shrink
feat.max_subleaf when all features have been dropped.  This is because
we *still* have not completed the toolstack side work for full CPU Policy
objects.

As a consequence, even when properly feature levelled, VMs can't migrate
"backwards" across hardware which reduces feat.max_subleaf.  One such example
is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).

Extend the host policy's feat.max_subleaf to the highest number Xen knows
about, similarly to how we extend extd.max_leaf for LFENCE_DISPATCH.  This
will allow VMs with a higher feat.max_subleaf than strictly necessary to
migrate in.

Eventually we'll manage to teach the toolstack how to avoid creating such VMs
in the first place, but there's still more work to do there.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/cpu-policy.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 4b6d96276399..a216fc8b886f 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -373,8 +373,13 @@ static void __init calculate_host_policy(void)
 
 p->basic.max_leaf =
 min_t(uint32_t, p->basic.max_leaf,   ARRAY_SIZE(p->basic.raw) - 1);
-p->feat.max_subleaf =
-min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1);
+
+/*
+ * p->feat is "just" featureset information.  We know about more than may
+ * be present in this hardware.  Also, VMs may have a higher max_subleaf
+ * than strictly necessary, and we can accept those too.
+ */
+p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
 
 max_extd_leaf = p->extd.max_leaf;
 

base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81
-- 
2.30.2




[PATCH] tools/xl: Open xldevd.log with O_CLOEXEC

2024-05-07 Thread Andrew Cooper
`xl devd` has been observed leaking /var/log/xldevd.log into children.

Link: https://github.com/QubesOS/qubes-issues/issues/8292
Reported-by: Demi Marie Obenour 
Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Demi Marie Obenour 
CC: Marek Marczykowski-Górecki 

Also entirely speculative based on the QubesOS ticket.
---
 tools/xl/xl_utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c
index 17489d182954..060186db3a59 100644
--- a/tools/xl/xl_utils.c
+++ b/tools/xl/xl_utils.c
@@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile)
 exit(-1);
 }
 
-CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644));
+CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644));
 free(fullname);
 assert(logfile >= 3);
 

base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81
prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919
-- 
2.30.2




[PATCH] tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC

2024-05-03 Thread Andrew Cooper
The header description for xs_open() goes as far as to suggest that the fd is
O_CLOEXEC, but it isn't actually.

`xl devd` has been observed leaking /dev/xen/xenbus into children.

Link: https://github.com/QubesOS/qubes-issues/issues/8292
Reported-by: Demi Marie Obenour 
Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Demi Marie Obenour 
CC: Marek Marczykowski-Górecki 

Entirely speculative patch based on a Matrix report
---
 tools/libs/store/xs.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
index 140b9a28395e..1f74fb3c44a2 100644
--- a/tools/libs/store/xs.c
+++ b/tools/libs/store/xs.c
@@ -54,6 +54,10 @@ struct xs_stored_msg {
 #include 
 #endif
 
+#ifndef O_CLOEXEC
+#define O_CLOEXEC 0
+#endif
+
 struct xs_handle {
/* Communications channel to xenstore daemon. */
int fd;
@@ -227,7 +231,7 @@ static int get_socket(const char *connect_to)
 static int get_dev(const char *connect_to)
 {
/* We cannot open read-only because requests are writes */
-   return open(connect_to, O_RDWR);
+   return open(connect_to, O_RDWR|O_CLOEXEC);
 }
 
 static int all_restrict_cb(Xentoolcore__Active_Handle *ah, domid_t domid) {

base-commit: feb9158a620040846d76981acbe8ea9e2255a07b
-- 
2.30.2




Re: [REGRESSION] Re: [XEN PATCH 0/3] automation/eclair: do not allow failure for triggered analyses

2024-05-03 Thread Andrew Cooper
On 03/05/2024 9:01 pm, Federico Serafini wrote:
> On 03/05/24 21:46, Andrew Cooper wrote:
>> On 03/05/2024 8:44 pm, Federico Serafini wrote:
>>> On 03/05/24 21:14, Andrew Cooper wrote:
>>>> On 29/04/2024 4:21 pm, Federico Serafini wrote:
>>>>> Patch 1/3 does some preparation work.
>>>>>
>>>>> Patch 2/3, as the title says, removes allow_failure = true for
>>>>> triggered
>>>>> analyses.
>>>>>
>>>>> Patch 3/3 makes explicit that initially no files are tagged as adopted;
>>>>> this is needed by the scheduled analysis.
>>>>
>>>> I'm afraid that something in this series is broken.
>>>>
>>>> Since these patches went in, all pipelines are now getting a status of
>>>> blocked rather than passed.
>>>>
>>>> If I manually start the Eclair jobs, then eventually the pipeline gets
>>>> to Passed.
>>>
>>> Can you provide us a link to those failures?
>>> I am looking at gitlab xen-project/xen and xen-project/patchew
>>> and everything seems ok.
>>>
>>
>> https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1276081658
>> is the first one I noticed as blocked, and I manually ran.  That ended
>> up working.
>>
>> https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1277724847
>> is still in the blocked state.  The build-each-commit failure is
>> unrelated.
>
> This is intentional and was introduced by
> commit 7c1bf8661db5c00bd8c9a25015fe8678b2ff9ac6
>
> The ECLAIR analysis under people/* need to be activated
> manually.

Yes.  I know, and that matches the behaviour I saw.

>
> Is this causing some problems to the CI?
>

Yes.

See https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines

Prior to this series, the manual actions were not used but the pipeline
was overall in the Passed state.  Specifically, they ended up being skipped.

After this series, the manual actions are now blocking the pipeline, not
letting it complete, and not marking it as passed.

~Andrew
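For reference, the regression Andrew describes maps onto documented GitLab CI semantics: a plain `when: manual` job defaults to `allow_failure: true` and is simply skippable, but when `when: manual` comes from a `rules:` entry, `allow_failure` defaults to `false`, which leaves the whole pipeline in the "blocked" state until the job is started.  A hypothetical sketch of the two behaviours (job names and script invented for illustration):

```yaml
# Pipeline reaches "passed" even if this job is never run:
eclair-analysis-optional:
  script: ./run-eclair-analysis.sh
  when: manual              # plain manual: allow_failure defaults to true

# Pipeline sits in "blocked" until someone starts this job:
eclair-analysis-blocking:
  script: ./run-eclair-analysis.sh
  rules:
    - when: manual          # under rules:, allow_failure defaults to false
```

Restoring `allow_failure: true` on the manual rule (or gating the job on the project path instead of manual action) would return people/* pipelines to the pre-series "passed" behaviour.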



Re: [REGRESSION] Re: [XEN PATCH 0/3] automation/eclair: do not allow failure for triggered analyses

2024-05-03 Thread Andrew Cooper
On 03/05/2024 8:44 pm, Federico Serafini wrote:
> On 03/05/24 21:14, Andrew Cooper wrote:
>> On 29/04/2024 4:21 pm, Federico Serafini wrote:
>>> Patch 1/3 does some preparation work.
>>>
>>> Patch 2/3, as the title says, removes allow_failure = true for
>>> triggered
>>> analyses.
>>>
>>> Patch 3/3 makes explicit that initially no files are tagged as adopted;
>>> this is needed by the scheduled analysis.
>>
>> I'm afraid that something in this series is broken.
>>
>> Since these patches went in, all pipelines are now getting a status of
>> blocked rather than passed.
>>
>> If I manually start the Eclair jobs, then eventually the pipeline gets
>> to Passed.
>
> Can you provide us a link to those failures?
> I am looking at gitlab xen-project/xen and xen-project/patchew
> and everything seems ok.
>

https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1276081658
is the first one I noticed as blocked, and I manually ran.  That ended
up working.

https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1277724847
is still in the blocked state.  The build-each-commit failure is unrelated.

~Andrew



[REGRESSION] Re: [XEN PATCH 0/3] automation/eclair: do not allow failure for triggered analyses

2024-05-03 Thread Andrew Cooper
On 29/04/2024 4:21 pm, Federico Serafini wrote:
> Patch 1/3 does some preparation work.
>
> Patch 2/3, as the title says, removes allow_failure = true for triggered
> analyses.
>
> Patch 3/3 makes explicit that initially no files are tagged as adopted; this
> is needed by the scheduled analysis.

I'm afraid that something in this series is broken.

Since these patches went in, all pipelines are now getting a status of
blocked rather than passed.

If I manually start the Eclair jobs, then eventually the pipeline gets
to Passed.

~Andrew



Re: [PATCH] tools/tests: don't let test-xenstore write nodes exceeding default size

2024-05-02 Thread Andrew Cooper
On 02/05/2024 2:25 pm, Andrew Cooper wrote:
> On 02/05/2024 2:21 pm, Juergen Gross wrote:
>> Today test-xenstore will write nodes with 3000 bytes of node data.  This
>> size exceeds the default quota for the allowed node size.  While this
>> works in dom0 with C-xenstored, OCAML-xenstored does not like that.
>>
>> Use a size of 2000 instead, which is lower than the allowed default
>> node size of 2048.
>>
>> Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework")
>> Signed-off-by: Juergen Gross 
> Acked-by: Andrew Cooper 
>
> Thanks for figuring this out.  I'll give this and the return code fix a
> spin through XenRT and check there's nothing else unexpected hiding.

All seems happy.  I'll put these in in due course.

~Andrew



[PATCH] xen/Kconfig: Drop the final remnants of ---help---

2024-05-02 Thread Andrew Cooper
We deprecated the use of ---help--- a while ago, but a lot of new content
copies bad examples.  Convert the remaining instances, and update
Kconfig's parser to no longer recognise it.

This now causes builds to fail with:

  Kconfig.debug:8: syntax error
  Kconfig.debug:7: unknown statement "---help---"

which short circuits one common piece of churn in new content.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: George Dunlap 
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Oleksii Kurochko 

For 4.19.  This cleans up a legacy we've been wanting to get rid of for a
while, and will be least disruptive on people if it gets in ahead of most
people starting work for 4.20.
---
 xen/Kconfig |  2 +-
 xen/Kconfig.debug   | 28 +--
 xen/arch/arm/Kconfig|  8 +++---
 xen/arch/arm/platforms/Kconfig  | 12 -
 xen/arch/x86/Kconfig| 32 +++---
 xen/common/Kconfig  | 48 -
 xen/common/sched/Kconfig| 10 +++
 xen/drivers/passthrough/Kconfig |  8 +++---
 xen/drivers/video/Kconfig   |  2 +-
 xen/tools/kconfig/lexer.l   |  2 +-
 10 files changed, 76 insertions(+), 76 deletions(-)

diff --git a/xen/Kconfig b/xen/Kconfig
index 1e1b041fd52f..e459cdac0cd7 100644
--- a/xen/Kconfig
+++ b/xen/Kconfig
@@ -84,7 +84,7 @@ config UNSUPPORTED
 config LTO
bool "Link Time Optimisation"
depends on BROKEN
-   ---help---
+   help
  Enable Link Time Optimisation.
 
  If unsure, say N.
diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index fa81853e9385..61b24ac552cd 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -4,7 +4,7 @@ menu "Debugging Options"
 config DEBUG
bool "Developer Checks"
default y
-   ---help---
+   help
  If you say Y here this will enable developer checks such as asserts
  and extra printks. This option is intended for development purposes
  only, and not for production use.
@@ -17,14 +17,14 @@ config GDBSX
bool "Guest debugging with gdbsx"
depends on X86
default y
-   ---help---
+   help
  If you want to enable support for debugging guests from dom0 via
  gdbsx then say Y.
 
 config FRAME_POINTER
bool "Compile Xen with frame pointers"
default DEBUG
-   ---help---
+   help
  If you say Y here the resulting Xen will be slightly larger and
  maybe slower, but it gives very useful debugging information
  in case of any Xen bugs.
@@ -33,7 +33,7 @@ config COVERAGE
bool "Code coverage support"
depends on !LIVEPATCH
select SUPPRESS_DUPLICATE_SYMBOL_WARNINGS if !ENFORCE_UNIQUE_SYMBOLS
-   ---help---
+   help
  Enable code coverage support.
 
  If unsure, say N here.
@@ -41,7 +41,7 @@ config COVERAGE
 config DEBUG_LOCK_PROFILE
bool "Lock Profiling"
select DEBUG_LOCKS
-   ---help---
+   help
	  Lock profiling allows you to see how often locks are taken and blocked.
  You can use serial console to print (and reset) using 'l' and 'L'
  respectively, or the 'xenlockprof' tool.
@@ -49,13 +49,13 @@ config DEBUG_LOCK_PROFILE
 config DEBUG_LOCKS
bool "Lock debugging"
default DEBUG
-   ---help---
+   help
  Enable debugging features of lock handling.  Some additional
  checks will be performed when acquiring and releasing locks.
 
 config PERF_COUNTERS
bool "Performance Counters"
-   ---help---
+   help
  Enables software performance counters that allows you to analyze
  bottlenecks in the system.  To access this data you can use serial
  console to print (and reset) using 'p' and 'P' respectively, or
@@ -64,21 +64,21 @@ config PERF_COUNTERS
 config PERF_ARRAYS
bool "Performance Counter Array Histograms"
depends on PERF_COUNTERS
-   ---help---
+   help
  Enables software performance counter array histograms.
 
 
 config VERBOSE_DEBUG
bool "Verbose debug messages"
default DEBUG
-   ---help---
+   help
  Guest output from HYPERVISOR_console_io and hypervisor parsing
  ELF images (dom0) will be logged in the Xen ring buffer.
 
 config DEVICE_TREE_DEBUG
bool "Device tree debug messages"
depends on HAS_DEVICE_TREE
-   ---help---
+   help
  Device tree parsing and DOM0 device tree building messages are
  logged in the Xen ring buffer.
  If unsure, say N here.
@@ -86,14 +86,14 @@ config DEVICE_TREE_DEBUG
 config SCRUB_DEBUG
bool "Page scrubbing test"
default DEBUG
-   ---help---
+   help
  Verify that pages that need

Re: [PATCH v2 4/4] x86/cpuid: Fix handling of xsave dynamic leaves

2024-05-02 Thread Andrew Cooper
On 02/05/2024 2:04 pm, Jan Beulich wrote:
> On 29.04.2024 20:28, Andrew Cooper wrote:
>> If max leaf is greater than 0xd but xsave not available to the guest, then the
>> current XSAVE size should not be filled in.  This is a latent bug for now as
>> the guest max leaf is 0xd, but will become problematic in the future.
> Why would this not be an issue when .max_leaf == 0xd, but .xsave == 0?

Hmm, true.  I'll adjust the description.

>
>> The comment concerning XSS state is wrong.  VT-x doesn't manage host/guest
>> state automatically, but there is provision for "host only" bits to be set, so
>> the implications are still accurate.
>>
>> Introduce {xstate,hw}_compressed_size() helpers to mirror the uncompressed
>> ones.
>>
>> This in turn highlights a bug in xstate_init().  Defaulting this_cpu(xss) to ~0
>> requires a forced write to clear it back out.  This in turn highlights that
>> it's only a safe default on systems with XSAVES.
> Well, yes, such an explicit write was expected to appear when some xsaves-
> based component would actually be enabled. Much like the set_xcr0() there.

I stumbled over this because the debug logic ended up trying to restore
0x (the thing it read out as the "last" value) back into XSS. 
This works about as well as you'd expect.
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Roger Pau Monné 
>>
>> The more I think about it, the more I think that cross-checking with hardware
>> is a bad move.  It's horribly expensive, and for supervisor states in
>> particular, liable to interfere with functionality.
> I agree, but I'd also like to see the cross checking not dropped entirely.
> Can't we arrange for such to happen during boot, before we enable any such
> functionality? After all even in debug builds it's not overly useful to do
> the same cross-checking (i.e. for identical inputs) over and over again.
> Of course doing in an exhaustive manner may be okay for the uncompressed
> values, but might be a little too much for all possible combinations in
> order to check compressed values, too.

Given the observation of patch 2 being buggy and my final sanity test
before posting didn't notice, I think doing this all at boot would be a
much better idea.

I think I'm going to do a new patch early in the series as an adjustment
to xstate_init().

We can't feasibly test every combination, but what we ought to do is
linearly accumulate the xstates Xen knows about and checking that in
each case the size(s) increase as appropriate.

This will have a substantial impact on the remainder of the series, but
I think the end result will be all the better for it.
>> --- a/xen/arch/x86/xstate.c
>> +++ b/xen/arch/x86/xstate.c
>> @@ -614,6 +614,65 @@ unsigned int xstate_uncompressed_size(uint64_t xcr0)
>>  return size;
>>  }
>>  
>> +static unsigned int hw_compressed_size(uint64_t xstates)
>> +{
>> +uint64_t curr_xcr0 = get_xcr0(), curr_xss = get_msr_xss();
>> +unsigned int size;
>> +bool ok;
>> +
>> +ok = set_xcr0(xstates & ~XSTATE_XSAVES_ONLY);
>> +ASSERT(ok);
>> +set_msr_xss(xstates & XSTATE_XSAVES_ONLY);
>> +
>> +size = cpuid_count_ebx(XSTATE_CPUID, 1);
>> +
>> +ok = set_xcr0(curr_xcr0);
>> +ASSERT(ok);
>> +set_msr_xss(curr_xss);
>> +
>> +return size;
>> +}
>> +
>> +unsigned int xstate_compressed_size(uint64_t xstates)
>> +{
>> +unsigned int i, size = XSTATE_AREA_MIN_SIZE;
>> +
>> +if ( xstates == 0 ) /* TODO: clean up paths passing 0 in here. */
>> +return 0;
>> +
>> +if ( xstates <= (X86_XCR0_SSE | X86_XCR0_FP) )
> Same comment as on the earlier change regarding the (lack of) use of 
>
>> +return size;
>> +
>> +/*
>> + * For the compressed size, every component matters.  Some are
>> + * automatically rounded up to 64 first.
>> + */
>> +xstates &= ~XSTATE_FP_SSE;
> ... this constant up there.
>
>> +for_each_set_bit ( i, &xstates, 63 )
>> +{
>> +if ( test_bit(i, &xstate_align) )
>> +size = ROUNDUP(size, 64);
>> +
>> +size += xstate_sizes[i];
>> +}
> The comment is a little misleading: As you have it in code, it is not the
> component's size that is rounded up, but its position. Maybe "Some have
> their start automatically rounded up to 64 first"?

Size in that sentence refers to the compressed size of everything, not
the size of the component.

But I'll try to make it clearer.

~Andrew



Re: [PATCH] tools/tests: don't let test-xenstore write nodes exceeding default size

2024-05-02 Thread Andrew Cooper
On 02/05/2024 2:21 pm, Juergen Gross wrote:
> Today test-xenstore will write nodes with 3000 bytes of node data.  This
> size exceeds the default quota for the allowed node size.  While this
> works in dom0 with C-xenstored, OCAML-xenstored does not like that.
>
> Use a size of 2000 instead, which is lower than the allowed default
> node size of 2048.
>
> Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework")
> Signed-off-by: Juergen Gross 

Acked-by: Andrew Cooper 

Thanks for figuring this out.  I'll give this and the return code fix a
spin through XenRT and check there's nothing else unexpected hiding.



Re: [PATCH v2 3/4] x86/cpu-policy: Simplify recalculate_xstate()

2024-05-02 Thread Andrew Cooper
On 02/05/2024 1:39 pm, Jan Beulich wrote:
> On 29.04.2024 20:28, Andrew Cooper wrote:
>> Make use of the new xstate_uncompressed_size() helper rather than maintaining
>> the running calculation while accumulating feature components.
> xstate_uncompressed_size() isn't really a new function, but the re-work of
> an earlier one. That, aiui, could have been used here, too, just that it
> would have been inefficient to do so. IOW perhaps drop "the new"?

Ok.

>
>> The rest of the CPUID data can come direct from the raw cpu policy.  All
>> per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
>> instructions.
>>
>> Use for_each_set_bit() rather than opencoding a slightly awkward version of
>> it.  Mask the attributes in ecx down based on the visible features.  This
>> isn't actually necessary for any components or attributes defined at the time
>> of writing (up to AMX), but is added out of an abundance of caution.
> As to this, ...
>
>> @@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p)
>>  return;
>>  
>>  if ( p->basic.avx )
>> -{
>>  xstates |= X86_XCR0_YMM;
>> -xstate_size = max(xstate_size,
>> -  xstate_offsets[X86_XCR0_YMM_POS] +
>> -  xstate_sizes[X86_XCR0_YMM_POS]);
>> -}
>>  
>>  if ( p->feat.mpx )
>> -{
>>  xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
>> -xstate_size = max(xstate_size,
>> -  xstate_offsets[X86_XCR0_BNDCSR_POS] +
>> -  xstate_sizes[X86_XCR0_BNDCSR_POS]);
>> -}
>>  
>>  if ( p->feat.avx512f )
>> -{
>>  xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
>> -xstate_size = max(xstate_size,
>> -  xstate_offsets[X86_XCR0_HI_ZMM_POS] +
>> -  xstate_sizes[X86_XCR0_HI_ZMM_POS]);
>> -}
>>  
>>  if ( p->feat.pku )
>> -{
>>  xstates |= X86_XCR0_PKRU;
>> -xstate_size = max(xstate_size,
>> -  xstate_offsets[X86_XCR0_PKRU_POS] +
>> -  xstate_sizes[X86_XCR0_PKRU_POS]);
>> -}
>>  
>> -p->xstate.max_size  =  xstate_size;
>> +/* Subleaf 0 */
>> +p->xstate.max_size =
>> +xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
>>  p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
>>  p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
>>  
>> +/* Subleaf 1 */
>>  p->xstate.Da1 = Da1;
>> +if ( p->xstate.xsavec )
>> +ecx_mask |= XSTATE_ALIGN64;
>> +
>>  if ( p->xstate.xsaves )
>>  {
>> +ecx_mask |= XSTATE_XSS;
>>  p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
>>  p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
>>  }
>> -else
>> -xstates &= ~XSTATE_XSAVES_ONLY;
>>  
>> -for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i )
>> +/* Subleafs 2+ */
>> +xstates &= ~XSTATE_FP_SSE;
>> +BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
>> +for_each_set_bit ( i, &xstates, 63 )
>>  {
>> -uint64_t curr_xstate = 1UL << i;
>> -
>> -if ( !(xstates & curr_xstate) )
>> -continue;
>> -
>> -p->xstate.comp[i].size   = xstate_sizes[i];
>> -p->xstate.comp[i].offset = xstate_offsets[i];
>> -p->xstate.comp[i].xss= curr_xstate & XSTATE_XSAVES_ONLY;
>> -p->xstate.comp[i].align  = curr_xstate & xstate_align;
> ... for this bit, isn't the move from this ...
>
>> +/*
>> + * Pass through size (eax) and offset (ebx) directly.  Visibility of
>> + * attributes in ecx limited by visible features in Da1.
>> + */
>> +p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a;
>> +p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b;
>> +p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask;
> ... to this changing what guests get to see, i.e. (mildly?) incompatible?

No.

The only "rows" in leaf 0xd we expose to guests are AVX, MPX, AVX512 and
PKU  (higher up in this hunk, selecting valid bits in xstates).  None of
these have a non-zero value in ecx.

This is a latent bug until we offer AMX or CET, hence why I wanted to
complete this series before your AMX series goes in.

~Andrew



Re: [PATCH v2 2/4] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()

2024-05-02 Thread Andrew Cooper
On 02/05/2024 1:19 pm, Jan Beulich wrote:
> On 29.04.2024 20:28, Andrew Cooper wrote:
>> @@ -567,16 +567,51 @@ static unsigned int hw_uncompressed_size(uint64_t xcr0)
>>  return size;
>>  }
>>  
>> -/* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
>> -unsigned int xstate_ctxt_size(u64 xcr0)
>> +unsigned int xstate_uncompressed_size(uint64_t xcr0)
>>  {
>> +unsigned int size = XSTATE_AREA_MIN_SIZE, i;
>> +
>>  if ( xcr0 == xfeature_mask )
>>  return xsave_cntxt_size;
>>  
>>  if ( xcr0 == 0 ) /* TODO: clean up paths passing 0 in here. */
>>  return 0;
>>  
>> -return hw_uncompressed_size(xcr0);
>> +if ( xcr0 <= (X86_XCR0_SSE | X86_XCR0_FP) )
> This is open-coded XSTATE_FP_SSE, which I wouldn't mind if ...
>
>> +return size;
>> +
>> +/*
>> + * For the non-legacy states, search all active states and find the
>> + * maximum offset+size.  Some states (e.g. LWP, APX_F) are out-of-order
>> + * with respect to their index.
>> + */
>> +xcr0 &= ~XSTATE_FP_SSE;
> ... you didn't use that macro here (and once further down). IOW please
> be consistent, no matter which way round.

Oh yes - that's a consequence of these hunks having been written 3
years apart.

It's important for the first one (logical comparison against a bitmap)
that it's split apart.

>
>> +for_each_set_bit ( i, &xcr0, 63 )
>> +{
>> +unsigned int s;
>> +
>> +ASSERT(xstate_offsets[i] && xstate_sizes[i]);
>> +
>> +s = xstate_offsets[i] && xstate_sizes[i];
> You mean + here, don't you?

Yes I do...  That was a victim of a last minute refactor.

It also shows that even the cross-check with hardware isn't terribly
effective.  More on that in the other thread.

>  Then:
> Reviewed-by: Jan Beulich 
>
> I'm also inclined to suggest making this the initializer of s.

Hmm, good point.  Will change.

Thanks,

~Andrew



Re: [PATCH for-4.19] tools/xen-cpuid: switch to use cpu-policy defined names

2024-05-02 Thread Andrew Cooper
On 30/04/2024 1:54 pm, Roger Pau Monné wrote:
> On Tue, Apr 30, 2024 at 02:06:38PM +0200, Jan Beulich wrote:
>> On 30.04.2024 13:25, Roger Pau Monné wrote:
>>> On Tue, Apr 30, 2024 at 12:37:44PM +0200, Jan Beulich wrote:
 On 30.04.2024 10:29, Roger Pau Monne wrote:
> @@ -301,21 +52,32 @@ static const char *const fs_names[] = {
>  [XEN_SYSCTL_cpu_featureset_hvm_max] = "HVM Max",
>  };
>  
> -static void dump_leaf(uint32_t leaf, const char *const *strs)
> +static const char *find_name(unsigned int index)
>  {
> -unsigned i;
> +static const struct feature_name {
> +const char *name;
> +unsigned int bit;
> +} feature_names[] = INIT_FEATURE_NAMES;
> +unsigned int i;
>  
> -if ( !strs )
> -{
> -printf(" ???");
> -return;
> -}
> +for ( i = 0; i < ARRAY_SIZE(feature_names); i++ )
> +if ( feature_names[i].bit == index )
> +return feature_names[i].name;
 ... a linear search, repeated perhaps hundreds of times, looks still a
 little odd to me.
>>> I didn't benchmark what kind of performance impact this change would
>>> have on the tool, but I didn't think it was that relevant, as this is
>>> a diagnostic/debug tool, and hence performance (unless it took seconds
>>> to execute) shouldn't be that important.
>> As indicated, performance itself isn't much of a concern here. My earlier
>> question wants reading in relation to the other question raised, regarding
>> the script maybe wanting to produce macro(s) more suitable for the purpose
>> here.
> Hm, we could maybe produce an array of strings, one per feature bit
> (features without names would get NULL).
>
> I will see, albeit my python skills are very limited.

See how Xen's cmdline parsing uses feature_names; they're intentionally
sorted already.

But, having gen-cpuid.py write out something that's closer to what
xen-cpuid.c wants is also very easy to do.

~Andrew



Re: [PATCH] tools/tests: let test-xenstore exit with non-0 status in case of error

2024-05-02 Thread Andrew Cooper
On 02/05/2024 10:22 am, Juergen Gross wrote:
> In case a test is failing in test-xenstore, let the tool exit with an
> exit status other than 0.
>
> Fix a typo in an error message.
>
> Reported-by: Andrew Cooper 
> Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework")
> Signed-off-by: Juergen Gross 

Thanks.  Acked-by: Andrew Cooper  although
this could do with waiting until after we've fixed whatever else is
causing it to fail.

I'll try to find someone to work on that bug too.

~Andrew



Re: [PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd

2024-05-01 Thread Andrew Cooper
On 01/05/2024 2:29 pm, Anthony PERARD wrote:
> On Fri, Apr 26, 2024 at 09:51:47AM +0100, Anthony PERARD wrote:
>> Run `systemd-notify --ready` instead. Hopefully, that will be enough.
>> ($NOTIFY_SOCKET is a socket, and a bit more complicated than I thought;
>> it can start with "@" for example)
> FTR: If it turns out that calling systemd-notify binary isn't working
> well enough, we could have an implementation of sd_notify() in our tree,
> openssh are doing there own here:
> https://bugzilla.mindrot.org/show_bug.cgi?id=2641
> and there's an example implementation on systemd's documentation:
> https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes
> (Nothing for ocaml)
>
> But let's go with `systemd-notify --ready` as it is just easier to
> write a bit of shell script.

I was already thinking of going down the small-library-function route.

Given that I mis-analysed the launch-xenstore script, I'm not overly
enthused with just falling back to waiting on the pidfile, because
that's adding technical debt rather than removing it.

~Andrew



Re: [PATCH 2/5] x86/cpu-policy: Add SVM features already used by Xen

2024-05-01 Thread Andrew Cooper
On 01/05/2024 11:00 am, George Dunlap wrote:
> On Mon, Apr 29, 2024 at 4:16 PM Andrew Cooper  
> wrote:
>> These will replace svm_feature_flags and the SVM_FEATURE_* constants over the
>> next few changes.  Take the opportunity to rationalise some names.
>>
>> Drop the opencoded "inherit from host" logic in calculate_hvm_max_policy() 
>> and
>> use 'h'/'!' annotations.  The logic needs to operate on fs, not the policy
>> object, given its position within the function.
>>
>> Drop some trailing whitespace introduced when this block of code was last
>> moved.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Roger Pau Monné 
>> CC: Stefano Stabellini 
>> CC: Xenia Ragiadakou 
>> CC: Sergiy Kibrik 
>> CC: George Dunlap 
>> CC: Andrei Semenov 
>> CC: Vaishali Thakkar 
>> ---
>>  tools/misc/xen-cpuid.c  | 11 +++
>>  xen/arch/x86/cpu-policy.c   | 17 +
>>  xen/include/public/arch-x86/cpufeatureset.h | 14 ++
>>  xen/tools/gen-cpuid.py  |  3 +++
>>  4 files changed, 33 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
>> index ab09410a05d6..0d01b0e797f1 100644
>> --- a/tools/misc/xen-cpuid.c
>> +++ b/tools/misc/xen-cpuid.c
>> @@ -266,6 +266,17 @@ static const char *const str_m10Ah[32] =
>>
>>  static const char *const str_eAd[32] =
>>  {
>> +[ 0] = "npt", [ 1] = "v-lbr",
>> +[ 2] = "svm-lock",[ 3] = "nrips",
>> +[ 4] = "v-tsc-rate",  [ 5] = "vmcb-cleanbits",
>> +[ 6] = "flush-by-asid",   [ 7] = "decode-assist",
>> +
>> +[10] = "pause-filter",
>> +[12] = "pause-thresh",
>> +/* 14 */  [15] = "v-loadsave",
>> +[16] = "v-gif",
>> +/* 18 */  [19] = "npt-sss",
>> +[20] = "v-spec-ctrl",
>>  };
>>
>>  static const char *const str_e1Fa[32] =
>> diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
>> index 4b6d96276399..da4401047e89 100644
>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -9,7 +9,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -748,22 +747,16 @@ static void __init calculate_hvm_max_policy(void)
>>  if ( !cpu_has_vmx )
>>  __clear_bit(X86_FEATURE_PKS, fs);
>>
>> -/*
>> +/*
>>   * Make adjustments to possible (nested) virtualization features exposed
>>   * to the guest
>>   */
>> -if ( p->extd.svm )
>> +if ( test_bit(X86_FEATURE_SVM, fs) )
>>  {
>> -/* Clamp to implemented features which require hardware support. */
>> -p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) |
>> -   (1u << SVM_FEATURE_LBRV) |
>> -   (1u << SVM_FEATURE_NRIPS) |
>> -   (1u << SVM_FEATURE_PAUSEFILTER) |
>> -   (1u << SVM_FEATURE_DECODEASSISTS));
>> -/* Enable features which are always emulated. */
>> -p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
>> +/* Xen always emulates cleanbits. */
>> +__set_bit(X86_FEATURE_VMCB_CLEANBITS, fs);
>>  }
> What about this line, at the end of recalculate_cpuid_policy()?
>
>   if ( !p->extd.svm )
> p->extd.raw[0xa] = EMPTY_LEAF;
>
> Is there a reason to continue to operate directly on the policy here,
> or would it be better to do this up in the featureset section?

That's still needed.

Otherwise in a !SVM VM you still see svm_rev and nr_asids in a leaf that
should be all zeroes.

~Andrew



Re: [PATCH 1/5] x86/cpu-policy: Infrastructure for the AMD SVM and SEV leaves

2024-04-30 Thread Andrew Cooper
On 30/04/2024 1:45 pm, Jan Beulich wrote:
> On 29.04.2024 17:16, Andrew Cooper wrote:
>> Allocate two new feature leaves, and extend cpu_policy with the non-feature
>> fields too.
>>
>> The CPUID dependency between the SVM bit and the whole SVM leaf is
>> intentionally deferred, to avoid transiently breaking nested virt.
> In reply to this I meant to ask that you at least add those dependencies in
> commented-out form, such that from looking at gen-cpuid.py it becomes clear
> they're intentionally omitted. But you don't add feature identifiers either,
> making dependencies impossible to express. Maybe this sentence was really
> meant for another of the patches? (Then my request would actually apply
> there.)

This is necessary because c/s 4f8b0e94d7ca is buggy.  Notice how it puts
an edit to the policy object in the middle of a block of logic editing
the featureset, which ends with writing the featureset back over the
policy object.

And it's not the first outstanding problem from what is a very small
number of nested-virt patches so far...

>> @@ -296,7 +298,16 @@ struct cpu_policy
>>  uint32_t /* d */:32;
>>  
>>  uint64_t :64, :64; /* Leaf 0x8009. */
>> -uint64_t :64, :64; /* Leaf 0x800a - SVM rev and features. */
>> +
>> +/* Leaf 0x800a - SVM rev and features. */
>> +uint8_t svm_rev, :8, :8, :8;
>> +uint32_t /* b */ :32;
>> +uint32_t nr_asids;
> According to the doc I'm looking at it is %ebx which holds the number of
> ASIDs and %ecx is reserved. With that adjusted

That's fun...  The PPR I used for this has it wrong.  A sample of others
match the APM, so I'll raise a bug with AMD against this PPR.

> Reviewed-by: Jan Beulich 

Thanks.

~Andrew



Re: xen | Failed pipeline for staging | b819bd65

2024-04-30 Thread Andrew Cooper
On 30/04/2024 10:00 am, Jan Beulich wrote:
> On 30.04.2024 10:03, GitLab wrote:
>>
>> Pipeline #1272869158 has failed!
>>
>> Project: xen ( https://gitlab.com/xen-project/hardware/xen )
>> Branch: staging ( 
>> https://gitlab.com/xen-project/hardware/xen/-/commits/staging )
>>
>> Commit: b819bd65 ( 
>> https://gitlab.com/xen-project/hardware/xen/-/commit/b819bd65f4fb25be582f66ba2e4134f61d86f459
>>  )
>> Commit Message: revert "x86/mm: re-implement get_page_light() u...
>> Commit Author: Jan Beulich ( https://gitlab.com/jbeulich )
>>
>>
>> Pipeline #1272869158 ( 
>> https://gitlab.com/xen-project/hardware/xen/-/pipelines/1272869158 ) 
>> triggered by Jan Beulich ( https://gitlab.com/jbeulich )
>> had 3 failed jobs.
>>
>> Job #6745823842 ( 
>> https://gitlab.com/xen-project/hardware/xen/-/jobs/6745823842/raw )
>>
>> Stage: test
>> Name: adl-pci-hvm-x86-64-gcc-debug
>> Job #6745823720 ( 
>> https://gitlab.com/xen-project/hardware/xen/-/jobs/6745823720/raw )
>>
>> Stage: analyze
>> Name: eclair-x86_64
> This flags start_nested_{svm,vmx}() as regressions, when the regression was
> previously spotted already. Is that intended?

Yes.

>  I.e. shouldn't the comparison
> be to the previous pipeline run, such that issues are pointed out only for
> what is actually being added anew with the patch / batch under test?

Why should it be?  That's unlike every other CI we use, even OSSTest.

Gitlab, like many others, is stateless between runs.

These violations are ones where we had got down to 0 in Xen, and Xen was
marked as "clean" for these rules.  Any nonzero count (in the subset of
things we think we've fixed fully) is a failure.

This is no different to e.g. a panic on boot.  OSSTest will keep on
complaining of a regression until it gets fixed.

~Andrew



[PATCH v2 3/4] x86/cpu-policy: Simplify recalculate_xstate()

2024-04-29 Thread Andrew Cooper
Make use of the new xstate_uncompressed_size() helper rather than maintaining
the running calculation while accumulating feature components.

The rest of the CPUID data can come direct from the raw cpu policy.  All
per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
instructions.

Use for_each_set_bit() rather than opencoding a slightly awkward version of
it.  Mask the attributes in ecx down based on the visible features.  This
isn't actually necessary for any components or attributes defined at the time
of writing (up to AMX), but is added out of an abundance of caution.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v2:
 * Tie ALIGN64 to xsavec rather than xsaves.
---
 xen/arch/x86/cpu-policy.c | 55 +++
 xen/arch/x86/include/asm/xstate.h |  1 +
 2 files changed, 21 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 4b6d96276399..fc7933be8577 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -193,8 +193,7 @@ static void sanitise_featureset(uint32_t *fs)
 static void recalculate_xstate(struct cpu_policy *p)
 {
 uint64_t xstates = XSTATE_FP_SSE;
-uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
-unsigned int i, Da1 = p->xstate.Da1;
+unsigned int i, ecx_mask = 0, Da1 = p->xstate.Da1;
 
 /*
  * The Da1 leaf is the only piece of information preserved in the common
@@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p)
 return;
 
 if ( p->basic.avx )
-{
 xstates |= X86_XCR0_YMM;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_YMM_POS] +
-  xstate_sizes[X86_XCR0_YMM_POS]);
-}
 
 if ( p->feat.mpx )
-{
 xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_BNDCSR_POS] +
-  xstate_sizes[X86_XCR0_BNDCSR_POS]);
-}
 
 if ( p->feat.avx512f )
-{
 xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_HI_ZMM_POS] +
-  xstate_sizes[X86_XCR0_HI_ZMM_POS]);
-}
 
 if ( p->feat.pku )
-{
 xstates |= X86_XCR0_PKRU;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_PKRU_POS] +
-  xstate_sizes[X86_XCR0_PKRU_POS]);
-}
 
-p->xstate.max_size  =  xstate_size;
+/* Subleaf 0 */
+p->xstate.max_size =
+xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
 p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
 p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
 
+/* Subleaf 1 */
 p->xstate.Da1 = Da1;
+if ( p->xstate.xsavec )
+ecx_mask |= XSTATE_ALIGN64;
+
 if ( p->xstate.xsaves )
 {
+ecx_mask |= XSTATE_XSS;
 p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
 p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
 }
-else
-xstates &= ~XSTATE_XSAVES_ONLY;
 
-for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i )
+/* Subleafs 2+ */
+xstates &= ~XSTATE_FP_SSE;
+BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
for_each_set_bit ( i, &xstates, 63 )
 {
-uint64_t curr_xstate = 1UL << i;
-
-if ( !(xstates & curr_xstate) )
-continue;
-
-p->xstate.comp[i].size   = xstate_sizes[i];
-p->xstate.comp[i].offset = xstate_offsets[i];
-p->xstate.comp[i].xss= curr_xstate & XSTATE_XSAVES_ONLY;
-p->xstate.comp[i].align  = curr_xstate & xstate_align;
+/*
+ * Pass through size (eax) and offset (ebx) directly.  Visibility of
+ * attributes in ecx limited by visible features in Da1.
+ */
+p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a;
+p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b;
+p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask;
 }
 }
 
diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index f5115199d4f9..bfb66dd766b6 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -40,6 +40,7 @@ extern uint32_t mxcsr_mask;
 #define XSTATE_XSAVES_ONLY 0
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
+#define XSTATE_XSS (1U << 0)
 #define XSTATE_ALIGN64 (1U << 1)
 
 extern u64 xfeature_mask;
-- 
2.30.2




[PATCH v2 4/4] x86/cpuid: Fix handling of xsave dynamic leaves

2024-04-29 Thread Andrew Cooper
If max leaf is greater than 0xd but xsave not available to the guest, then the
current XSAVE size should not be filled in.  This is a latent bug for now as
the guest max leaf is 0xd, but will become problematic in the future.

The comment concerning XSS state is wrong.  VT-x doesn't manage host/guest
state automatically, but there is provision for "host only" bits to be set, so
the implications are still accurate.

Introduce {xstate,hw}_compressed_size() helpers to mirror the uncompressed
ones.

This in turn highlights a bug in xstate_init().  Defaulting this_cpu(xss) to ~0
requires a forced write to clear it back out.  This in turn highlights that
it's only a safe default on systems with XSAVES.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

The more I think about it, the more I think that cross-checking with hardware
is a bad move.  It's horribly expensive, and for supervisor states in
particular, liable to interfere with functionality.

v2:
 * Cope with blind reads of CPUID.0xD[1] prior to %xcr0 having been set up.
---
 xen/arch/x86/cpuid.c  | 24 
 xen/arch/x86/include/asm/xstate.h |  1 +
 xen/arch/x86/xstate.c | 64 ++-
 3 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 7a38e032146a..a822e80c7ea7 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -330,23 +330,15 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
 case XSTATE_CPUID:
 switch ( subleaf )
 {
-case 1:
-if ( !p->xstate.xsavec && !p->xstate.xsaves )
-break;
-
-/*
- * TODO: Figure out what to do for XSS state.  VT-x manages host
- * vs guest MSR_XSS automatically, so as soon as we start
- * supporting any XSS states, the wrong XSS will be in context.
- */
-BUILD_BUG_ON(XSTATE_XSAVES_ONLY != 0);
-fallthrough;
 case 0:
-/*
- * Read CPUID[0xD,0/1].EBX from hardware.  They vary with enabled
- * XSTATE, and appropriate XCR0|XSS are in context.
- */
-res->b = cpuid_count_ebx(leaf, subleaf);
+if ( p->basic.xsave )
+res->b = xstate_uncompressed_size(v->arch.xcr0);
+break;
+
+case 1:
+if ( p->xstate.xsavec )
+res->b = xstate_compressed_size(v->arch.xcr0 |
+v->arch.msrs->xss.raw);
 break;
 }
 break;
diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index bfb66dd766b6..da1d89d2f416 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -109,6 +109,7 @@ void xstate_free_save_area(struct vcpu *v);
 int xstate_alloc_save_area(struct vcpu *v);
 void xstate_init(struct cpuinfo_x86 *c);
 unsigned int xstate_uncompressed_size(uint64_t xcr0);
+unsigned int xstate_compressed_size(uint64_t xstates);
 
 static inline uint64_t xgetbv(unsigned int index)
 {
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index a94f4025fce5..b2cf3ae6acee 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -614,6 +614,65 @@ unsigned int xstate_uncompressed_size(uint64_t xcr0)
 return size;
 }
 
+static unsigned int hw_compressed_size(uint64_t xstates)
+{
+uint64_t curr_xcr0 = get_xcr0(), curr_xss = get_msr_xss();
+unsigned int size;
+bool ok;
+
+ok = set_xcr0(xstates & ~XSTATE_XSAVES_ONLY);
+ASSERT(ok);
+set_msr_xss(xstates & XSTATE_XSAVES_ONLY);
+
+size = cpuid_count_ebx(XSTATE_CPUID, 1);
+
+ok = set_xcr0(curr_xcr0);
+ASSERT(ok);
+set_msr_xss(curr_xss);
+
+return size;
+}
+
+unsigned int xstate_compressed_size(uint64_t xstates)
+{
+unsigned int i, size = XSTATE_AREA_MIN_SIZE;
+
+if ( xstates == 0 ) /* TODO: clean up paths passing 0 in here. */
+return 0;
+
+if ( xstates <= (X86_XCR0_SSE | X86_XCR0_FP) )
+return size;
+
+/*
+ * For the compressed size, every component matters.  Some are
+ * automatically rounded up to 64 first.
+ */
+xstates &= ~XSTATE_FP_SSE;
+for_each_set_bit ( i, &xstates, 63 )
+{
+if ( test_bit(i, &xstate_align) )
+size = ROUNDUP(size, 64);
+
+size += xstate_sizes[i];
+}
+
+/* In debug builds, cross-check our calculation with hardware. */
+if ( IS_ENABLED(CONFIG_DEBUG) )
+{
+unsigned int hwsize;
+
+xstates |= XSTATE_FP_SSE;
+hwsize = hw_compressed_size(xstates);
+
+if ( size != hwsize )
+printk_once(XENLOG_ERR "%s(%#"PRIx64") size %#x != hwsize %#x\n",
+__func__, xstates, size, hwsize);
+size = hwsize;
+}
+
+return size;
+}
+
 static bool valid_xcr0

[PATCH v2 0/4] x86/xstate: Fixes to size calculations

2024-04-29 Thread Andrew Cooper
Various fixes and improvements to xsave image size calculations.

Since v1:
 * Rebase over exposing XSAVEC.  This has highlighted several latent bugs.
 * Rework the uncompressed size handling.  LWP / APX_F make for much sadness.

Be aware that Intel and AMD have different ABIs for the xstate layout.

Andrew Cooper (4):
  x86/hvm: Defer the size calculation in hvm_save_cpu_xsave_states()
  x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()
  x86/cpu-policy: Simplify recalculate_xstate()
  x86/cpuid: Fix handling of xsave dynamic leaves

 xen/arch/x86/cpu-policy.c |  55 ++-
 xen/arch/x86/cpuid.c  |  24 +++
 xen/arch/x86/domctl.c |   2 +-
 xen/arch/x86/hvm/hvm.c|  12 +++-
 xen/arch/x86/include/asm/xstate.h |   4 +-
 xen/arch/x86/xstate.c | 111 --
 6 files changed, 145 insertions(+), 63 deletions(-)

-- 
2.30.2




[PATCH v2 2/4] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()

2024-04-29 Thread Andrew Cooper
We're soon going to need a compressed helper of the same form.

The size of the uncompressed image depends on the single element with the
largest offset+size.  Sadly this isn't always the element with the largest
index.

Retain the cross-check with hardware in debug builds, but forgo it in normal
builds.  In particular, this means that the migration paths don't need to mess
with XCR0 just to sanity check the buffer size.

No practical change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v2:
 * Scan all features.  LWP/APX_F are out-of-order.
---
 xen/arch/x86/domctl.c |  2 +-
 xen/arch/x86/hvm/hvm.c|  2 +-
 xen/arch/x86/include/asm/xstate.h |  2 +-
 xen/arch/x86/xstate.c | 45 +++
 4 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9a72d57333e9..c2f2016ed45a 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -833,7 +833,7 @@ long arch_do_domctl(
 uint32_t offset = 0;
 
 #define PV_XSAVE_HDR_SIZE (2 * sizeof(uint64_t))
-#define PV_XSAVE_SIZE(xcr0) (PV_XSAVE_HDR_SIZE + xstate_ctxt_size(xcr0))
+#define PV_XSAVE_SIZE(xcr0) (PV_XSAVE_HDR_SIZE + 
xstate_uncompressed_size(xcr0))
 
 ret = -ESRCH;
 if ( (evc->vcpu >= d->max_vcpus) ||
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 9594e0a5c530..9e677d891d6c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1191,7 +1191,7 @@ HVM_REGISTER_SAVE_RESTORE(CPU, hvm_save_cpu_ctxt, NULL, 
hvm_load_cpu_ctxt, 1,
 
 #define HVM_CPU_XSAVE_SIZE(xcr0) (offsetof(struct hvm_hw_cpu_xsave, \
save_area) + \
-  xstate_ctxt_size(xcr0))
+  xstate_uncompressed_size(xcr0))
 
 static int cf_check hvm_save_cpu_xsave_states(
 struct vcpu *v, hvm_domain_context_t *h)
diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index c08c267884f0..f5115199d4f9 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -107,7 +107,7 @@ void compress_xsave_states(struct vcpu *v, const void *src, 
unsigned int size);
 void xstate_free_save_area(struct vcpu *v);
 int xstate_alloc_save_area(struct vcpu *v);
 void xstate_init(struct cpuinfo_x86 *c);
-unsigned int xstate_ctxt_size(u64 xcr0);
+unsigned int xstate_uncompressed_size(uint64_t xcr0);
 
 static inline uint64_t xgetbv(unsigned int index)
 {
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 99cedb4f5e24..a94f4025fce5 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -183,7 +183,7 @@ void expand_xsave_states(const struct vcpu *v, void *dest, 
unsigned int size)
 /* Check there is state to serialise (i.e. at least an XSAVE_HDR) */
 BUG_ON(!v->arch.xcr0_accum);
 /* Check there is the correct room to decompress into. */
-BUG_ON(size != xstate_ctxt_size(v->arch.xcr0_accum));
+BUG_ON(size != xstate_uncompressed_size(v->arch.xcr0_accum));
 
 if ( !(xstate->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED) )
 {
@@ -245,7 +245,7 @@ void compress_xsave_states(struct vcpu *v, const void *src, 
unsigned int size)
 u64 xstate_bv, valid;
 
 BUG_ON(!v->arch.xcr0_accum);
-BUG_ON(size != xstate_ctxt_size(v->arch.xcr0_accum));
+BUG_ON(size != xstate_uncompressed_size(v->arch.xcr0_accum));
 ASSERT(!xsave_area_compressed(src));
 
 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
@@ -567,16 +567,51 @@ static unsigned int hw_uncompressed_size(uint64_t xcr0)
 return size;
 }
 
-/* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
-unsigned int xstate_ctxt_size(u64 xcr0)
+unsigned int xstate_uncompressed_size(uint64_t xcr0)
 {
+unsigned int size = XSTATE_AREA_MIN_SIZE, i;
+
 if ( xcr0 == xfeature_mask )
 return xsave_cntxt_size;
 
 if ( xcr0 == 0 ) /* TODO: clean up paths passing 0 in here. */
 return 0;
 
-return hw_uncompressed_size(xcr0);
+if ( xcr0 <= (X86_XCR0_SSE | X86_XCR0_FP) )
+return size;
+
+/*
+ * For the non-legacy states, search all active states and find the
+ * maximum offset+size.  Some states (e.g. LWP, APX_F) are out-of-order
+ * with respect to their index.
+ */
+xcr0 &= ~XSTATE_FP_SSE;
+for_each_set_bit ( i, &xcr0, 63 )
+{
+unsigned int s;
+
+ASSERT(xstate_offsets[i] && xstate_sizes[i]);
+
+s = xstate_offsets[i] && xstate_sizes[i];
+
+size = max(size, s);
+}
+
+/* In debug builds, cross-check our calculation with hardware. */
+if ( IS_ENABLED(CONFIG_DEBUG) )
+{
+unsigned int hwsize;
+
+xcr0 |= XSTATE_FP_SSE;
+hwsize = hw_uncompressed_size(xcr0);
+
+if ( size != hwsize )
+printk_once(XENLOG_ERR "%s(%#"

[PATCH v2 1/4] x86/hvm: Defer the size calculation in hvm_save_cpu_xsave_states()

2024-04-29 Thread Andrew Cooper
HVM_CPU_XSAVE_SIZE() may rewrite %xcr0 twice.  Defer the calculation until
after we've decided to write out an XSAVE record.

Note in hvm_load_cpu_xsave_states() that there were versions of Xen which
wrote out a useless XSAVE record.  This sadly limits our ability to tidy up
the existing infrastructure.  Also leave a note in xstate_ctxt_size() that 0
still needs tolerating for now.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v2:
 * New
---
 xen/arch/x86/hvm/hvm.c | 10 --
 xen/arch/x86/xstate.c  |  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0ce45b177cf4..9594e0a5c530 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1197,12 +1197,13 @@ static int cf_check hvm_save_cpu_xsave_states(
 struct vcpu *v, hvm_domain_context_t *h)
 {
 struct hvm_hw_cpu_xsave *ctxt;
-unsigned int size = HVM_CPU_XSAVE_SIZE(v->arch.xcr0_accum);
+unsigned int size;
 int err;
 
-if ( !cpu_has_xsave || !xsave_enabled(v) )
+if ( !xsave_enabled(v) )
 return 0;   /* do nothing */
 
+size = HVM_CPU_XSAVE_SIZE(v->arch.xcr0_accum);
 err = _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, size);
 if ( err )
 return err;
@@ -1255,6 +1256,11 @@ static int cf_check hvm_load_cpu_xsave_states(
 if ( !cpu_has_xsave )
 return -EOPNOTSUPP;
 
+/*
+ * Note: Xen prior to 4.12 would write out empty XSAVE records for VMs
+ * running on XSAVE-capable hardware but without XSAVE active.
+ */
+
 /* Customized checking for entry since our entry is of variable length */
 desc = (struct hvm_save_descriptor *)&h->data[h->cur];
 if ( sizeof (*desc) > h->size - h->cur)
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index cf94761d0542..99cedb4f5e24 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -573,7 +573,7 @@ unsigned int xstate_ctxt_size(u64 xcr0)
 if ( xcr0 == xfeature_mask )
 return xsave_cntxt_size;
 
-if ( xcr0 == 0 )
+if ( xcr0 == 0 ) /* TODO: clean up paths passing 0 in here. */
 return 0;
 
 return hw_uncompressed_size(xcr0);
-- 
2.30.2




Re: [PATCH 2/5] x86/cpu-policy: Add SVM features already used by Xen

2024-04-29 Thread Andrew Cooper
On 29/04/2024 4:16 pm, Andrew Cooper wrote:
> diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
> index bf3f9ec01e6e..f07b1f4cf905 100755
> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -280,6 +280,9 @@ def crunch_numbers(state):
>  # standard 3DNow in the earlier K6 processors.
>  _3DNOW: [_3DNOWEXT],
>  
> +# The SVM bit enumerates the whole SVM leave.
> +SVM: list(range(NPT, NPT + 32)),

This should say leaf.  Fixed locally.

~Andrew



[PATCH 1/5] x86/cpu-policy: Infrastructure for the AMD SVM and SEV leaves

2024-04-29 Thread Andrew Cooper
Allocate two new feature leaves, and extend cpu_policy with the non-feature
fields too.

The CPUID dependency between the SVM bit and the whole SVM leaf is
intentionally deferred, to avoid transiently breaking nested virt.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Xenia Ragiadakou 
CC: Sergiy Kibrik 
CC: George Dunlap 
CC: Andrei Semenov 
CC: Vaishali Thakkar 
---
 tools/libs/light/libxl_cpuid.c  |  2 ++
 tools/misc/xen-cpuid.c  | 10 +
 xen/arch/x86/cpu/common.c   |  4 
 xen/include/public/arch-x86/cpufeatureset.h |  4 
 xen/include/xen/lib/x86/cpu-policy.h| 24 +++--
 xen/lib/x86/cpuid.c |  4 
 6 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index ce4f3c7095ba..c7a8b76f541d 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -342,6 +342,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list 
*policy, const char* str)
 CPUID_ENTRY(0x00000007,  1, CPUID_REG_EDX),
 MSR_ENTRY(0x10a, CPUID_REG_EAX),
 MSR_ENTRY(0x10a, CPUID_REG_EDX),
+CPUID_ENTRY(0x8000000a, NA, CPUID_REG_EDX),
+CPUID_ENTRY(0x8000001f, NA, CPUID_REG_EAX),
 #undef MSR_ENTRY
 #undef CPUID_ENTRY
 };
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 8893547bebce..ab09410a05d6 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -264,6 +264,14 @@ static const char *const str_m10Ah[32] =
 {
 };
 
+static const char *const str_eAd[32] =
+{
+};
+
+static const char *const str_e1Fa[32] =
+{
+};
+
 static const struct {
 const char *name;
 const char *abbr;
@@ -288,6 +296,8 @@ static const struct {
 { "CPUID 0x00000007:1.edx", "7d1", str_7d1 },
 { "MSR_ARCH_CAPS.lo", "m10Al", str_m10Al },
 { "MSR_ARCH_CAPS.hi", "m10Ah", str_m10Ah },
+{ "CPUID 0x8000000a.edx",   "eAd", str_eAd },
+{ "CPUID 0x8000001f.eax",  "e1Fa", str_e1Fa },
 };
 
 #define COL_ALIGN "24"
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 28d7f34c4dbe..25b11e6472b8 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -477,6 +477,10 @@ static void generic_identify(struct cpuinfo_x86 *c)
c->x86_capability[FEATURESET_e7d] = cpuid_edx(0x80000007);
if (c->extended_cpuid_level >= 0x80000008)
c->x86_capability[FEATURESET_e8b] = cpuid_ebx(0x80000008);
+   if (c->extended_cpuid_level >= 0x8000000a)
+   c->x86_capability[FEATURESET_eAd] = cpuid_edx(0x8000000a);
+   if (c->extended_cpuid_level >= 0x8000001f)
+   c->x86_capability[FEATURESET_e1Fa] = cpuid_eax(0x8000001f);
if (c->extended_cpuid_level >= 0x80000021)
c->x86_capability[FEATURESET_e21a] = cpuid_eax(0x80000021);
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index 53f13dec31f7..0f869214811e 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -357,6 +357,10 @@ XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register 
File(s) cleared by VE
 
 /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */
 
+/* AMD-defined CPU features, CPUID level 0x8000000a.edx, word 18 */
+
+/* AMD-defined CPU features, CPUID level 0x8000001f.eax, word 19 */
+
 #endif /* XEN_CPUFEATURE */
 
 /* Clean up from a default include.  Close the enum (for C). */
diff --git a/xen/include/xen/lib/x86/cpu-policy.h 
b/xen/include/xen/lib/x86/cpu-policy.h
index d5e447e9dc06..936e00e4da73 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -22,6 +22,8 @@
#define FEATURESET_7d1   15 /* 0x00000007:1.edx*/
#define FEATURESET_m10Al 16 /* 0x0000010a.eax  */
#define FEATURESET_m10Ah 17 /* 0x0000010a.edx  */
+#define FEATURESET_eAd   18 /* 0x8000000a.edx  */
+#define FEATURESET_e1Fa  19 /* 0x8000001f.eax  */
 
 struct cpuid_leaf
 {
@@ -296,7 +298,16 @@ struct cpu_policy
 uint32_t /* d */:32;
 
 uint64_t :64, :64; /* Leaf 0x80000009. */
-uint64_t :64, :64; /* Leaf 0x8000000a - SVM rev and features. */
+
+/* Leaf 0x8000000a - SVM rev and features. */
+uint8_t svm_rev, :8, :8, :8;
+uint32_t /* b */ :32;
+uint32_t nr_asids;
+union {
+uint32_t eAd;
+struct { DECL_BITFIELD(eAd); };
+};
+
 uint64_t :64, :64; /* Leaf 0x8000000b. */
 uint64_t :64, :64; /* Leaf 0x8000000c. */
 uint64_t :64, :64; /* Leaf

[PATCH 4/5] x86/svm: Switch SVM features over to normal cpu_has_*

2024-04-29 Thread Andrew Cooper
Delete the boot time rendering of advanced features.  It's entirely ad-hoc and
not even everything printed here is used by Xen.  It is available in
`xen-cpuid` now.

With (only) svm_load_segs_{,prefetch}() now declared in svm.h, only svm.c and
domain.c need the header.  Clean up all others.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Xenia Ragiadakou 
CC: Sergiy Kibrik 
CC: George Dunlap 
CC: Andrei Semenov 
CC: Vaishali Thakkar 
---
 xen/arch/x86/hvm/svm/asid.c|  5 ++-
 xen/arch/x86/hvm/svm/emulate.c |  3 +-
 xen/arch/x86/hvm/svm/intr.c|  1 -
 xen/arch/x86/hvm/svm/nestedsvm.c   | 14 
 xen/arch/x86/hvm/svm/svm.c | 50 +++---
 xen/arch/x86/hvm/svm/vmcb.c|  1 -
 xen/arch/x86/include/asm/cpufeature.h  | 10 ++
 xen/arch/x86/include/asm/hvm/svm/svm.h | 36 ---
 8 files changed, 31 insertions(+), 89 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/asid.c b/xen/arch/x86/hvm/svm/asid.c
index 7977a8e86b53..6117a362d310 100644
--- a/xen/arch/x86/hvm/svm/asid.c
+++ b/xen/arch/x86/hvm/svm/asid.c
@@ -6,7 +6,6 @@
 
 #include 
 #include 
-#include 
 
 #include "svm.h"
 
@@ -39,7 +38,7 @@ void svm_asid_handle_vmrun(void)
 {
 vmcb_set_asid(vmcb, true);
 vmcb->tlb_control =
-cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
+cpu_has_flush_by_asid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
 return;
 }
 
@@ -48,7 +47,7 @@ void svm_asid_handle_vmrun(void)
 
 vmcb->tlb_control =
 !need_flush ? TLB_CTRL_NO_FLUSH :
-cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
+cpu_has_flush_by_asid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
 }
 
 /*
diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 93ac1d3435f9..da6e21b2e270 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "svm.h"
@@ -20,7 +19,7 @@ static unsigned long svm_nextrip_insn_length(struct vcpu *v)
 {
 struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
 
-if ( !cpu_has_svm_nrips )
+if ( !cpu_has_nrips )
 return 0;
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c
index 4805c5567213..facd2894a2c6 100644
--- a/xen/arch/x86/hvm/svm/intr.c
+++ b/xen/arch/x86/hvm/svm/intr.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 #include  /* for nestedhvm_vcpu_in_guestmode */
 #include 
 #include 
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 35a2cbfd7d13..255af112661f 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -6,7 +6,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1620,7 +1619,7 @@ void svm_nested_features_on_efer_update(struct vcpu *v)
 {
 if ( !vmcb->virt_ext.fields.vloadsave_enable &&
  paging_mode_hap(v->domain) &&
- cpu_has_svm_vloadsave )
+ cpu_has_v_loadsave )
 {
 vmcb->virt_ext.fields.vloadsave_enable = 1;
 general2_intercepts  = vmcb_get_general2_intercepts(vmcb);
@@ -1629,8 +1628,7 @@ void svm_nested_features_on_efer_update(struct vcpu *v)
 vmcb_set_general2_intercepts(vmcb, general2_intercepts);
 }
 
-if ( !vmcb->_vintr.fields.vgif_enable &&
- cpu_has_svm_vgif )
+if ( !vmcb->_vintr.fields.vgif_enable && cpu_has_v_gif )
 {
 vintr = vmcb_get_vintr(vmcb);
 vintr.fields.vgif = svm->ns_gif;
@@ -1675,8 +1673,8 @@ void __init start_nested_svm(struct hvm_function_table 
*hvm_function_table)
  */
 hvm_function_table->caps.nested_virt =
 hvm_function_table->caps.hap && 
-cpu_has_svm_lbrv &&
-cpu_has_svm_nrips &&
-cpu_has_svm_flushbyasid &&
-cpu_has_svm_decode;
+cpu_has_v_lbr &&
+cpu_has_nrips &&
+cpu_has_flush_by_asid &&
+cpu_has_decode_assist;
 }
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 4719fffae589..16eb875aab94 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1287,7 +1287,7 @@ static void cf_check svm_inject_event(const struct 
x86_event *event)
  * that hardware doesn't perform DPL checking on injection.
  */
 if ( event->type == X86_EVENTTYPE_PRI_SW_EXCEPTION ||
- (!cpu_has_svm_nrips && (event->type >= X86_EVENTTYPE_SW_INTERRUPT)) )
+ (!cpu_has_nrips && (event->type >= X86_EVENTTYPE_SW_INTERRUPT)) )
 svm_emul_swint_injection(&_event);
 

[PATCH 0/5] x86: AMD CPUID handling improvements

2024-04-29 Thread Andrew Cooper
This is (half) the series I've promised various people, to untangle some AMD
CPUID handling.  It moves the SVM feature leaf into the standard
x86_capabilities[] infrastructure.

On a random Milan system, with this in place, xen-cpuid reports:

Dynamic sets:
Raw 
178bfbff:7eda320b:2fd3fbff:75c237ff:0000000f:219c95a9:0040069c:00006799:91bef75f:00000000:00000000:0000204d:00000000:00000000:00000000:00000000:00000000:00000000:119b9cff:0101fd3f
  [18] CPUID 0x8000000a.edx npt v-lbr svm-lock nrips v-tsc-rate vmcb-cleanbits flush-by-asid decode-assist pause-filter <11> pause-thresh v-loadsave v-gif <17> npt-sss v-spec-ctrl <23> <24> <28>
  [19] CPUID 0x8000001f.eax sme sev <2> sev-es sev-snp <5> <8> <10> <11> <12> <13> <14> <15> <16> <24>

Host
178bf3ff:76da320b:2fd3fbff:644037ff:0000000f:219c95a9:0040068c:00000780:319ed205:00000000:00000000:00001845:00000000:00000000:00000000:00000000:00000000:00000000:001994ff:00000000
  [18] CPUID 0x8000000a.edx npt v-lbr svm-lock nrips v-tsc-rate vmcb-cleanbits flush-by-asid decode-assist pause-filter pause-thresh v-loadsave v-gif npt-sss v-spec-ctrl
  [19] CPUID 0x8000001f.eax

HVM Max 
178bfbff:f6fa3203:2e500800:440001f7:0000000f:219c05a9:0040060c:00000100:331ed005:00000000:00000000:00001845:00000000:00000000:00000000:00000000:00000000:00000000:000004ab:00000000
  [18] CPUID 0x8000000a.edx npt v-lbr nrips vmcb-cleanbits decode-assist pause-filter
  [19] CPUID 0x8000001f.eax


Unfortunately, I haven't managed to do the second half to make the host_policy
usable earlier on boot.  Untangling __setup_xen() is proving stubborn due to
(ab)uses of the MB1 module list.

Andrew Cooper (5):
  x86/cpu-policy: Infrastructure for the AMD SVM and SEV leaves
  x86/cpu-policy: Add SVM features already used by Xen
  x86/spec-ctrl: Remove open-coded check of SVM_FEATURE_SPEC_CTRL
  x86/svm: Switch SVM features over to normal cpu_has_*
  x86/cpu-policy: Introduce some SEV features

 tools/libs/light/libxl_cpuid.c  |  2 +
 tools/misc/xen-cpuid.c  | 24 ++
 xen/arch/x86/cpu-policy.c   | 17 +++
 xen/arch/x86/cpu/common.c   |  4 ++
 xen/arch/x86/hvm/svm/asid.c |  5 +--
 xen/arch/x86/hvm/svm/emulate.c  |  3 +-
 xen/arch/x86/hvm/svm/intr.c |  1 -
 xen/arch/x86/hvm/svm/nestedsvm.c| 14 +++---
 xen/arch/x86/hvm/svm/svm.c  | 50 +
 xen/arch/x86/hvm/svm/vmcb.c |  1 -
 xen/arch/x86/include/asm/cpufeature.h   | 16 +++
 xen/arch/x86/include/asm/hvm/svm/svm.h  | 36 ---
 xen/arch/x86/spec_ctrl.c|  7 +--
 xen/include/public/arch-x86/cpufeatureset.h | 22 +
 xen/include/xen/lib/x86/cpu-policy.h| 24 +-
 xen/lib/x86/cpuid.c |  4 ++
 xen/tools/gen-cpuid.py  |  7 +++
 17 files changed, 128 insertions(+), 109 deletions(-)

--
2.30.2



[PATCH 5/5] x86/cpu-policy: Introduce some SEV features

2024-04-29 Thread Andrew Cooper
For display purposes only right now.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Xenia Ragiadakou 
CC: Sergiy Kibrik 
CC: George Dunlap 
CC: Andrei Semenov 
CC: Vaishali Thakkar 

This is only half the work to get SEV working nicely.  The other
half (rearranging __start_xen() so we can move the host policy collection
earlier) is still a work-in-progress.
---
 tools/misc/xen-cpuid.c  | 3 +++
 xen/arch/x86/include/asm/cpufeature.h   | 3 +++
 xen/include/public/arch-x86/cpufeatureset.h | 4 
 xen/tools/gen-cpuid.py  | 6 +-
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 0d01b0e797f1..1463e0429ba1 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -281,6 +281,9 @@ static const char *const str_eAd[32] =
 
 static const char *const str_e1Fa[32] =
 {
+[ 0] = "sme", [ 1] = "sev",
+/* 2 */   [ 3] = "sev-es",
+[ 4] = "sev-snp",
 };
 
 static const struct {
diff --git a/xen/arch/x86/include/asm/cpufeature.h 
b/xen/arch/x86/include/asm/cpufeature.h
index b6fb8c24423c..732f0d2bf758 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -230,6 +230,9 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_v_gif   boot_cpu_has(X86_FEATURE_V_GIF)
 #define cpu_has_v_spec_ctrl boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)
 
+/* CPUID level 0x8000001f.eax */
+#define cpu_has_sev boot_cpu_has(X86_FEATURE_SEV)
+
 /* Synthesized. */
 #define cpu_has_arch_perfmonboot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_cpuid_faulting  boot_cpu_has(X86_FEATURE_CPUID_FAULTING)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index 80d252a38c2d..7ee0f2329151 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -374,6 +374,10 @@ XEN_CPUFEATURE(NPT_SSS,18*32+19) /*   NPT 
Supervisor Shadow Stacks *
 XEN_CPUFEATURE(V_SPEC_CTRL,18*32+20) /*   Virtualised MSR_SPEC_CTRL */
 
 /* AMD-defined CPU features, CPUID level 0x8000001f.eax, word 19 */
+XEN_CPUFEATURE(SME,19*32+ 0) /*   Secure Memory Encryption */
+XEN_CPUFEATURE(SEV,19*32+ 1) /*   Secure Encrypted VM */
+XEN_CPUFEATURE(SEV_ES, 19*32+ 3) /*   SEV Encrypted State */
+XEN_CPUFEATURE(SEV_SNP,19*32+ 4) /*   SEV Secure Nested Paging */
 
 #endif /* XEN_CPUFEATURE */
 
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index f07b1f4cf905..bff4d9389ff6 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -281,7 +281,7 @@ def crunch_numbers(state):
 _3DNOW: [_3DNOWEXT],
 
 # The SVM bit enumerates the whole SVM leave.
-SVM: list(range(NPT, NPT + 32)),
+SVM: list(range(NPT, NPT + 32)) + [SEV],
 
 # This is just the dependency between AVX512 and AVX2 of XSTATE
 # feature flags.  If want to use AVX512, AVX2 must be supported and
@@ -341,6 +341,10 @@ def crunch_numbers(state):
 
 # The behaviour described by RRSBA depend on eIBRS being active.
 EIBRS: [RRSBA],
+
+SEV: [SEV_ES],
+
+SEV_ES: [SEV_SNP],
 }
 
 deep_features = tuple(sorted(deps.keys()))
-- 
2.30.2
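The SEV dependency chain added above (SVM -> SEV -> SEV_ES -> SEV_SNP) feeds gen-cpuid.py's dependency crunching, whose effect can be sketched as a transitive clear.  The deps table below is an illustrative subset in the same shape as the script's, not the real table:

```python
# A feature maps to the features that directly depend on it.
deps = {
    "SVM": ["NPT", "NRIPS", "SEV"],
    "SEV": ["SEV_ES"],
    "SEV_ES": ["SEV_SNP"],
}

def clear_feature(feat, enabled):
    """Clearing a deep feature also clears everything depending on it."""
    enabled.discard(feat)
    for child in deps.get(feat, ()):
        clear_feature(child, enabled)

fs = {"SVM", "NPT", "NRIPS", "SEV", "SEV_ES", "SEV_SNP", "FPU"}
clear_feature("SEV", fs)        # SEV_ES and SEV_SNP go with it
print(sorted(fs))               # ['FPU', 'NPT', 'NRIPS', 'SVM']
```

This is why the dependency edges matter for migration safety: a toolstack that clears SEV without the closure would leave SEV_SNP dangling in the policy.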




[PATCH 3/5] x86/spec-ctrl: Remove open-coded check of SVM_FEATURE_SPEC_CTRL

2024-04-29 Thread Andrew Cooper
Now that the SVM feature leaf has been included in normal feature handling, it
is available early enough for init_speculation_mitigations() to use.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Xenia Ragiadakou 
CC: Sergiy Kibrik 
CC: George Dunlap 
CC: Andrei Semenov 
CC: Vaishali Thakkar 
---
 xen/arch/x86/include/asm/cpufeature.h | 3 +++
 xen/arch/x86/spec_ctrl.c  | 7 +--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/include/asm/cpufeature.h 
b/xen/arch/x86/include/asm/cpufeature.h
index 9bc553681f4a..77cfd900cb56 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -217,6 +217,9 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_rfds_no boot_cpu_has(X86_FEATURE_RFDS_NO)
 #define cpu_has_rfds_clear  boot_cpu_has(X86_FEATURE_RFDS_CLEAR)
 
+/* CPUID level 0x8000000a.edx */
+#define cpu_has_v_spec_ctrl boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)
+
 /* Synthesized. */
 #define cpu_has_arch_perfmonboot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_cpuid_faulting  boot_cpu_has(X86_FEATURE_CPUID_FAULTING)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 40f6ae017010..0bda9d01def5 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -11,7 +11,6 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1896,12 +1895,8 @@ void __init init_speculation_mitigations(void)
  *
  * No need for SCF_ist_sc_msr because Xen's value is restored
  * atomically WRT NMIs in the VMExit path.
- *
- * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
  */
-if ( opt_msr_sc_hvm &&
- (boot_cpu_data.extended_cpuid_level >= 0x8000000aU) &&
- (cpuid_edx(0x8000000aU) & (1u << SVM_FEATURE_SPEC_CTRL)) )
+if ( opt_msr_sc_hvm && cpu_has_v_spec_ctrl )
 setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
 }
 
-- 
2.30.2




[PATCH 2/5] x86/cpu-policy: Add SVM features already used by Xen

2024-04-29 Thread Andrew Cooper
These will replace svm_feature_flags and the SVM_FEATURE_* constants over the
next few changes.  Take the opportunity to rationalise some names.

Drop the opencoded "inherit from host" logic in calculate_hvm_max_policy() and
use 'h'/'!' annotations.  The logic needs to operate on fs, not the policy
object, given its position within the function.

Drop some trailing whitespace introduced when this block of code was last
moved.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Xenia Ragiadakou 
CC: Sergiy Kibrik 
CC: George Dunlap 
CC: Andrei Semenov 
CC: Vaishali Thakkar 
---
 tools/misc/xen-cpuid.c  | 11 +++
 xen/arch/x86/cpu-policy.c   | 17 +
 xen/include/public/arch-x86/cpufeatureset.h | 14 ++
 xen/tools/gen-cpuid.py  |  3 +++
 4 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index ab09410a05d6..0d01b0e797f1 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -266,6 +266,17 @@ static const char *const str_m10Ah[32] =
 
 static const char *const str_eAd[32] =
 {
+[ 0] = "npt", [ 1] = "v-lbr",
+[ 2] = "svm-lock",[ 3] = "nrips",
+[ 4] = "v-tsc-rate",  [ 5] = "vmcb-cleanbits",
+[ 6] = "flush-by-asid",   [ 7] = "decode-assist",
+
+[10] = "pause-filter",
+[12] = "pause-thresh",
+/* 14 */  [15] = "v-loadsave",
+[16] = "v-gif",
+/* 18 */  [19] = "npt-sss",
+[20] = "v-spec-ctrl",
 };
 
 static const char *const str_e1Fa[32] =
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 4b6d96276399..da4401047e89 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -9,7 +9,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -748,22 +747,16 @@ static void __init calculate_hvm_max_policy(void)
 if ( !cpu_has_vmx )
 __clear_bit(X86_FEATURE_PKS, fs);
 
-/* 
+/*
  * Make adjustments to possible (nested) virtualization features exposed
  * to the guest
  */
-if ( p->extd.svm )
+if ( test_bit(X86_FEATURE_SVM, fs) )
 {
-/* Clamp to implemented features which require hardware support. */
-p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) |
-   (1u << SVM_FEATURE_LBRV) |
-   (1u << SVM_FEATURE_NRIPS) |
-   (1u << SVM_FEATURE_PAUSEFILTER) |
-   (1u << SVM_FEATURE_DECODEASSISTS));
-/* Enable features which are always emulated. */
-p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
+/* Xen always emulates cleanbits. */
+__set_bit(X86_FEATURE_VMCB_CLEANBITS, fs);
 }
-
+
 guest_common_max_feature_adjustments(fs);
 guest_common_feature_adjustments(fs);
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index 0f869214811e..80d252a38c2d 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -358,6 +358,20 @@ XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register 
File(s) cleared by VE
 /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */
 
 /* AMD-defined CPU features, CPUID level 0x8000000a.edx, word 18 */
+XEN_CPUFEATURE(NPT,18*32+ 0) /*h  Nested PageTables */
+XEN_CPUFEATURE(V_LBR,  18*32+ 1) /*h  Virtualised LBR */
+XEN_CPUFEATURE(SVM_LOCK,   18*32+ 2) /*   SVM locking MSR */
+XEN_CPUFEATURE(NRIPS,  18*32+ 3) /*h  Next-RIP saved on VMExit */
+XEN_CPUFEATURE(V_TSC_RATE, 18*32+ 4) /*   Virtualised TSC Ratio */
+XEN_CPUFEATURE(VMCB_CLEANBITS, 18*32+ 5) /*!  VMCB Clean-bits */
+XEN_CPUFEATURE(FLUSH_BY_ASID,  18*32+ 6) /*   TLB Flush by ASID */
+XEN_CPUFEATURE(DECODE_ASSIST,  18*32+ 7) /*h  Decode assists */
+XEN_CPUFEATURE(PAUSE_FILTER,   18*32+10) /*h  Pause filter */
+XEN_CPUFEATURE(PAUSE_THRESH,   18*32+12) /*   Pause filter threshold */
+XEN_CPUFEATURE(V_LOADSAVE, 18*32+15) /*   Virtualised VMLOAD/SAVE */
+XEN_CPUFEATURE(V_GIF,  18*32+16) /*   Virtualised GIF */
+XEN_CPUFEATURE(NPT_SSS,18*32+19) /*   NPT Supervisor Shadow Stacks 
*/
+XEN_CPUFEATURE(V_SPEC_CTRL,18*32+20) /*   Virtualised MSR_SPEC_CTRL */
 
 /* AMD-defined CPU features, CPUID level 0x8000001f.eax, word 19 */
 
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index bf3f9ec01e6e..f07b1f4cf905 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -280,6 +280,9 @@ def crunc

Re: [PATCH] revert "x86/mm: re-implement get_page_light() using an atomic increment"

2024-04-29 Thread Andrew Cooper
On 29/04/2024 2:01 pm, Jan Beulich wrote:
> revert "x86/mm: re-implement get_page_light() using an atomic increment"
>
> This reverts commit c40bc0576dcc5acd4d7e22ef628eb4642f568533.
>
> That change aimed at eliminating an open-coded lock-like construct,
> which really isn't all that similar to, in particular, get_page(). The
> function always succeeds. Any remaining concern would want taking care
> of by placing block_lock_speculation() at the end of the function.
> Since the function is called only during page (de)validation, any
> possible performance concerns over such extra serialization could
> likely be addressed by pre-validating (e.g. via pinning) page tables.
>
> The fundamental issue with the change being reverted is that it detects
> bad state only after already having caused possible corruption. While
> the system is going to be halted in such an event, there is a time
> window during which the resulting incorrect state could be leveraged by
> a clever (in particular: fast enough) attacker.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 



Re: [PATCH] x86/cpu-policy: Annotate the accumulated features

2024-04-29 Thread Andrew Cooper
On 29/04/2024 8:49 am, Jan Beulich wrote:
> On 26.04.2024 18:08, Andrew Cooper wrote:
>> Some features need accumulating rather than intersecting to make migration
>> safe.  Introduce the new '|' attribute for this purpose.
>>
>> Right now, it's only used by the Xapi toolstack, but it will be used by
>> xl/libxl when the full policy-object work is complete, and until then it's
>> still a useful hint for hand-crafted cpuid= lines in vm.cfg files.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks.

> The one feature you don't annotate such that I'm not entirely sure about is
> NO_FPU_SEL: On one hand it tells code not to rely on / use the selector
> fields in FPU state.

It's sadly far more complicated than this.

This feature, and its AMD partner RSTR_FP_ERR_PTRS, were introduced to
stop Windows BSOD-ing under virt, and came with an accidental breakage
of x86emul/DoSBox/etc which Intel and AMD have declined to fix.

If you recall, prior to these features, the hypervisor needed to divine
the operand size of Windows' {F,}X{SAVE,RESTORE} instructions, as it
blindly does a memcmp() across the region to confirm that the interrupt
handler didn't corrupt any state.


Both features are discontinuities across which it is not safe to migrate.

Advertising "reliably store as 0" on older systems will still cause
Windows to BSOD on occasion, whereas not advertising it on newer systems
will suggest that legacy emulators will work, when they don't.


I don't have any good idea of how to make this work nicely, other than
having some admin booleans in the xl.cfg file saying "I certify I'm not
using a legacy emulator in my VM" to override the safety check.

Other discontinuities I'm aware of are the introduction of DAZ (changing
MXCSR_MASK), and any migration which changes LBR_FORMAT.

~Andrew
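The accumulate-vs-intersect rule being annotated in this patch can be sketched as follows.  The feature names and the ACCUMULATE set are an illustrative subset of the '|'-annotated bits, not an authoritative list:

```python
# '|'-annotated bits must be accumulated across a migration pool: a guest
# must see the bit if any host it might run on has it (e.g. so it uses the
# VERW scrubbing side effect on the fixed host).  Everything else is
# intersected: a guest may only rely on what every host provides.
ACCUMULATE = {"MD_CLEAR", "FB_CLEAR", "RFDS_CLEAR", "NO_LMSL", "FS_GS_NS"}

def level(hosts):
    intersected = set.intersection(*hosts)
    accumulated = set.union(*hosts) & ACCUMULATE
    return intersected | accumulated

ice_lake = {"SSE2", "AVX512F", "MD_CLEAR"}
cascade_lake = {"SSE2", "AVX512F"}
print(sorted(level([ice_lake, cascade_lake])))
# ['AVX512F', 'MD_CLEAR', 'SSE2']
```

Plain intersection would hide MD_CLEAR from the guest, which is exactly the unsafe outcome the annotation exists to prevent.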



Re: [PATCH] xen/x86: Fix Syntax warning in gen-cpuid.py

2024-04-26 Thread Andrew Cooper
On 26/04/2024 5:07 am, Jason Andryuk wrote:
> Python 3.12.2 warns:
>
> xen/tools/gen-cpuid.py:50: SyntaxWarning: invalid escape sequence '\s'
>   "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
> xen/tools/gen-cpuid.py:51: SyntaxWarning: invalid escape sequence '\s'
>   "\s+/\*([\w!]*) .*$")
>
> Specify the strings as raw strings so '\s' is read as literal '\' + 's'.
> This avoids escaping all the '\'s in the strings.
>
> Signed-off-by: Jason Andryuk 

That's something I didn't know about how Python does string
concatenation.  I was expecting the whole string to be considered raw,
not just the first line.

Acked-by: Andrew Cooper 

I'll rebase my pending change altering the regex over this.
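The subtlety is that rawness is a per-literal property, not a property of the implicitly-concatenated whole; a minimal sketch (the regex is a simplified stand-in, not gen-cpuid.py's actual pattern):

```python
import re

# Adjacent literals are concatenated at compile time, but each keeps its
# own prefix: only literals written with r"..." are raw.
s = r"\n" "\n"
assert s == "\\n\n"      # first literal is backslash+n, second is a newline

# Hence every continuation line of a split pattern needs its own r prefix:
pattern = re.compile(r"\s+(\d+)"
                     r"\s*/\*(\w*)")
assert pattern.match("  7 /*A").groups() == ("7", "A")
```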



[PATCH] x86/cpu-policy: Annotate the accumulated features

2024-04-26 Thread Andrew Cooper
Some features need accumulating rather than intersecting to make migration
safe.  Introduce the new '|' attribute for this purpose.

Right now, it's only used by the Xapi toolstack, but it will be used by
xl/libxl when the full policy-object work is complete, and until then it's
still a useful hint for hand-crafted cpuid= lines in vm.cfg files.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/include/public/arch-x86/cpufeatureset.h | 15 ++-
 xen/tools/gen-cpuid.py  |  7 +--
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index 53f13dec31f7..6627453e3985 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -72,6 +72,11 @@ enum {
  *   'H' = HVM HAP guests (not PV or HVM Shadow guests).
  *   Upper case => Available by default
  *   Lower case => Can be opted-in to, but not available by default.
+ *
+ * Migration: '|'
+ *   This bit should be visible to a guest if anywhere it might run has
+ *   the bit set.  i.e. it needs accumulating across the migration pool,
+ *   rather than intersecting.
  */
 
 /* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
@@ -248,7 +253,7 @@ XEN_CPUFEATURE(IBRS_ALWAYS,   8*32+16) /*S  IBRS preferred 
always on */
 XEN_CPUFEATURE(STIBP_ALWAYS,  8*32+17) /*S  STIBP preferred always on */
 XEN_CPUFEATURE(IBRS_FAST, 8*32+18) /*S  IBRS preferred over software 
options */
 XEN_CPUFEATURE(IBRS_SAME_MODE, 8*32+19) /*S  IBRS provides same-mode 
protection */
-XEN_CPUFEATURE(NO_LMSL,   8*32+20) /*S  EFER.LMSLE no longer supported. */
+XEN_CPUFEATURE(NO_LMSL,   8*32+20) /*S| EFER.LMSLE no longer supported. */
 XEN_CPUFEATURE(AMD_PPIN,  8*32+23) /*   Protected Processor Inventory 
Number */
 XEN_CPUFEATURE(AMD_SSBD,  8*32+24) /*S  MSR_SPEC_CTRL.SSBD available */
 XEN_CPUFEATURE(VIRT_SSBD, 8*32+25) /*!  MSR_VIRT_SPEC_CTRL.SSBD */
@@ -263,7 +268,7 @@ XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply 
Accumulation Single
 XEN_CPUFEATURE(FSRM,  9*32+ 4) /*A  Fast Short REP MOVS */
 XEN_CPUFEATURE(AVX512_VP2INTERSECT, 9*32+8) /*a  VP2INTERSECT{D,Q} insns */
 XEN_CPUFEATURE(SRBDS_CTRL,9*32+ 9) /*   MSR_MCU_OPT_CTRL and 
RNGDS_MITG_DIS. */
-XEN_CPUFEATURE(MD_CLEAR,  9*32+10) /*!A VERW clears microarchitectural 
buffers */
+XEN_CPUFEATURE(MD_CLEAR,  9*32+10) /*!A| VERW clears microarchitectural 
buffers */
 XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! RTM disabled (but XBEGIN wont 
fault) */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A  SERIALIZE insn */
@@ -292,7 +297,7 @@ XEN_CPUFEATURE(AVX_IFMA, 10*32+23) /*A  AVX-IFMA 
Instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000021.eax, word 11 */
 XEN_CPUFEATURE(NO_NEST_BP, 11*32+ 0) /*A  No Nested Data Breakpoints */
-XEN_CPUFEATURE(FS_GS_NS,   11*32+ 1) /*S  FS/GS base MSRs 
non-serialising */
+XEN_CPUFEATURE(FS_GS_NS,   11*32+ 1) /*S| FS/GS base MSRs 
non-serialising */
 XEN_CPUFEATURE(LFENCE_DISPATCH,11*32+ 2) /*A  LFENCE always serializing */
 XEN_CPUFEATURE(NSCB,   11*32+ 6) /*A  Null Selector Clears Base 
(and limit too) */
 XEN_CPUFEATURE(AUTO_IBRS,  11*32+ 8) /*S  Automatic IBRS */
@@ -343,7 +348,7 @@ XEN_CPUFEATURE(DOITM,  16*32+12) /*   Data 
Operand Invariant Timing
 XEN_CPUFEATURE(SBDR_SSDP_NO,   16*32+13) /*A  No Shared Buffer Data Read 
or Sideband Stale Data Propagation */
 XEN_CPUFEATURE(FBSDP_NO,   16*32+14) /*A  No Fill Buffer Stale Data 
Propagation */
 XEN_CPUFEATURE(PSDP_NO,16*32+15) /*A  No Primary Stale Data 
Propagation */
-XEN_CPUFEATURE(FB_CLEAR,   16*32+17) /*!A Fill Buffers cleared by VERW 
*/
+XEN_CPUFEATURE(FB_CLEAR,   16*32+17) /*!A| Fill Buffers cleared by 
VERW */
 XEN_CPUFEATURE(FB_CLEAR_CTRL,  16*32+18) /*   
MSR_OPT_CPU_CTRL.FB_CLEAR_DIS */
 XEN_CPUFEATURE(RRSBA,  16*32+19) /*!  Restricted RSB Alternative */
 XEN_CPUFEATURE(BHI_NO, 16*32+20) /*A  No Branch History Injection  
*/
@@ -353,7 +358,7 @@ XEN_CPUFEATURE(PBRSB_NO,   16*32+24) /*A  No 
Post-Barrier RSB prediction
 XEN_CPUFEATURE(GDS_CTRL,   16*32+25) /*   
MCU_OPT_CTRL.GDS_MIT_{DIS,LOCK} */
 XEN_CPUFEATURE(GDS_NO, 16*32+26) /*A  No Gather Data Sampling */
 XEN_CPUFEATURE(RFDS_NO,16*32+27) /*A  No Register File Data 
Sampling */
-XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register File(s) cleared by 
VERW */
+XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A| Register File(s) cleared by 
VERW */
 
 /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */
 
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index bf3f9ec01e6e..1fb76f664529 100755
--- a/xen/too

Re: [PATCH 1/3] x86/hvm/trace: Use a different trace type for AMD processors

2024-04-26 Thread Andrew Cooper
On 26/04/2024 4:29 pm, George Dunlap wrote:
> On Fri, Apr 26, 2024 at 4:18 PM Andrew Cooper  
> wrote:
>> On 26/04/2024 3:32 pm, George Dunlap wrote:
>>> In xenalyze, first remove the redundant call to init_hvm_data();
>>> there's no way to get to hvm_vmexit_process() without it being already
>>> initialized by the set_vcpu_type call in hvm_process().
>>>
>>> Replace this with set_hvm_exit_reson_data(), and move setting of
>>> hvm->exit_reason_* into that function.
>>>
>>> Modify hvm_process and hvm_vmexit_process to handle all four potential
>>> values appropriately.
>>>
>>> If SVM entries are encountered, set opt.svm_mode so that other
>>> SVM-specific functionality is triggered.
>> Given that xenalyze is now closely tied to Xen, and that we're
>> technically changing the ABI here, is there any point keeping `--svm-mode` ?
>>
>> I'm unsure of the utility of reading the buggy trace records from an
>> older version of Xen.
> Yeah, I thought about that.  If nobody argues to keep it, I guess I'll
> rip it out for v2.

That's the way I'd suggest going.
>>> Signed-off-by: George Dunlap 
>>> ---
>>> NB that this patch goes on top of Andrew's trace cleanup series:
>>>
>>> https://lore.kernel.org/xen-devel/20240318163552.3808695-1-andrew.coop...@citrix.com/
>> The delta in Xen is trivial.  I'm happy if you want to commit this, and
>> I can rebase over it.
> It's trivial in part *because of* your series.  Without your series I
> would have had to do a bunch of munging around with DO_TRC_BLAH_BLAH
> to make it compile.  In fact, I started doing it directly on staging,
> but quickly moved onto your series to save myself some time. :-)

Ah.  I'll be refreshing mine next week.  Given that it's missed
everything since 4.16, I'm not intending to let it miss this one...

But I've still got a few things which need to go out before the
last-post deadline.

~Andrew



Re: [PATCH 1/3] x86/hvm/trace: Use a different trace type for AMD processors

2024-04-26 Thread Andrew Cooper
On 26/04/2024 3:32 pm, George Dunlap wrote:
> A long-standing usability sub-optimality with xenalyze is the
> necessity to specify `--svm-mode` when analyzing AMD processors.  This
> fundamentally comes about because the same trace event ID is used for
> both VMX and SVM, but the contents of the trace must be interpreted
> differently.
>
> Instead, allocate separate trace events for VMX and SVM vmexits in
> Xen; this will allow all readers to properly intrepret the meaning of

interpret ?

> the vmexit reason.
>
> In xenalyze, first remove the redundant call to init_hvm_data();
> there's no way to get to hvm_vmexit_process() without it being already
> initialized by the set_vcpu_type call in hvm_process().
>
> Replace this with set_hvm_exit_reson_data(), and move setting of
> hvm->exit_reason_* into that function.
>
> Modify hvm_process and hvm_vmexit_process to handle all four potential
> values appropriately.
>
> If SVM entries are encountered, set opt.svm_mode so that other
> SVM-specific functionality is triggered.

Given that xenalyze is now closely tied to Xen, and that we're
technically changing the ABI here, is there any point keeping `--svm-mode` ?

I'm unsure of the utility of reading the buggy trace records from an
older version of Xen.

> Also add lines in `formats` for xentrace_format.

Personally I'd have put patch 3 first, and reduced the churn here.

> Signed-off-by: George Dunlap 
> ---
> NB that this patch goes on top of Andrew's trace cleanup series:
>
> https://lore.kernel.org/xen-devel/20240318163552.3808695-1-andrew.coop...@citrix.com/

The delta in Xen is trivial.  I'm happy if you want to commit this, and
I can rebase over it.

~Andrew



Re: [PATCH 2/3] tools/xenalyze: Ignore HVM_EMUL events harder

2024-04-26 Thread Andrew Cooper
On 26/04/2024 3:32 pm, George Dunlap wrote:
> To unify certain common sanity checks, checks are done very early in
> processing based only on the top-level type.
>
> Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the
> assumptions about how the top-level types worked.  Namely, traces of
> this type will show up outside of HVM contexts: in idle domains and in
> PV domains.
>
> Make an explicit exception for TRC_HVM_EMUL types in a number of places:
>
>  - Pass the record info pointer to toplevel_assert_check, so that it
>can exclude TRC_HVM_EMUL records from idle and vcpu data_mode
>checks
>
>  - Don't attempt to set the vcpu data_type in hvm_process for
>TRC_HVM_EMUL records.
>
> Signed-off-by: George Dunlap 

Acked-by: Andrew Cooper 

Although I'm tempted to say that if records of this type show up outside
of HVM context, then it's misnamed or we've got an error in Xen.  Any
view on which?

~Andrew



Re: [PATCH 3/3] tools/xentrace: Remove xentrace_format

2024-04-26 Thread Andrew Cooper
On 26/04/2024 3:32 pm, George Dunlap wrote:
> xentrace_format was always of limited utility, since trace records
> across pcpus were processed out of order; it was superseded by xenalyze
> over a decade ago.
>
> But for several releases, the `formats` file it has depended on for
> proper operation has not even been included in `make install` (which
> generally means it doesn't get picked up by distros either); yet
> nobody has seemed to complain.
>
> Simply remove xentrace_format, and point people to xenalyze instead.
>
> NB that there is no man page for xenalyze, so the "see also" on the
> xentrace man page is simply removed for now.
>
> Signed-off-by: George Dunlap 

Acked-by: Andrew Cooper 



[PATCH 0/3] x86/boot: Untangling

2024-04-26 Thread Andrew Cooper
This started out as another series trying to fix CPUID handling for SEV.

I'm still trying to sort out the SEV side of things, but I might need to
resort to something *far* more unpleasant in order to hit 4.19.

This series is some cleanup of module handling from the work I've done so far.
It interacts with the HyperLaunch Boot Module cleanup, but should make it
simpler overall.

Andrew Cooper (3):
  x86/boot: Explain how moving mod[0] works
  x86/boot: Explain discard_initial_images() and untangle PV initrd handling
  x86/boot: Refactor pvh_load_kernel() to have an initrd_len local

 xen/arch/x86/hvm/dom0_build.c |  9 +
 xen/arch/x86/pv/dom0_build.c  | 22 +++---
 xen/arch/x86/setup.c  | 14 +-
 3 files changed, 29 insertions(+), 16 deletions(-)

-- 
2.30.2




[PATCH 2/3] x86/boot: Explain discard_initial_images() and untangle PV initrd handling

2024-04-26 Thread Andrew Cooper
discard_initial_images() frees the memory backing the boot modules.  It is
called by dom0_construct_pv{,h}(), but obtains the module list via a global
pointer, which is why there is no apparent link with the initrd pointer.

Generally, module contents are copied into dom0 as it's being built, but the
initrd for PV dom0 might be directly mapped instead of copied.

dom0_construct_pv() does its own ad-hoc freeing of the module in the copy
case, and points the global reference at the new copy, then sets the size to
0.  This only functions because init_domheap_pages(x, x) happens to be a nop.

Delete the ad-hoc freeing, and leave it to discard_initial_images().  This
requires (not) adjusting initrd->mod_start in the copy case, and only setting
the size to 0 in the mapping case.

Alter discard_initial_images() to explicitly check for an ignored module, and
explain what's going on.  This is more robust and will allow for fixing other
problems with module handling.

The later logic in dom0_construct_pv() now cannot use initrd->mod_start, but
that's fine because initrd_mfn is already a duplicate of the information
wanted, and is more consistent with initrd_{pfn,len} used elsewhere.

Invalidate the initrd pointer with LIST_POISON1 to make it clearer that it
shouldn't be used.

No practical change in behaviour, but a substantial reduction in the
complexity of how this works.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Daniel Smith 
CC: Christopher Clark 

In other awkward questions, why does initial_images_nrpages() account for all
modules when only 1 or 2 are relevant for how we construct dom0 ?
---
 xen/arch/x86/pv/dom0_build.c | 22 +++---
 xen/arch/x86/setup.c |  9 -
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index d8043fa58a27..64d9984b8308 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -630,18 +630,20 @@ int __init dom0_construct_pv(struct domain *d,
 }
 memcpy(page_to_virt(page), mfn_to_virt(initrd->mod_start),
initrd_len);
-mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT;
-init_domheap_pages(mpt_alloc,
-   mpt_alloc + PAGE_ALIGN(initrd_len));
-initrd->mod_start = initrd_mfn = mfn_x(page_to_mfn(page));
+initrd_mfn = mfn_x(page_to_mfn(page));
 }
 else
 {
 while ( count-- )
 if ( assign_pages(mfn_to_page(_mfn(mfn++)), 1, d, 0) )
 BUG();
+/*
+ * Mapped rather than copied.  Tell discard_initial_images() to
+ * ignore it.
+ */
+initrd->mod_end = 0;
 }
-initrd->mod_end = 0;
+initrd = LIST_POISON1; /* No longer valid to use. */
 
 iommu_memory_setup(d, "initrd", mfn_to_page(_mfn(initrd_mfn)),
PFN_UP(initrd_len), _flags);
@@ -653,12 +655,10 @@ int __init dom0_construct_pv(struct domain *d,
 if ( domain_tot_pages(d) < nr_pages )
 printk(" (%lu pages to be allocated)",
nr_pages - domain_tot_pages(d));
-if ( initrd )
-{
-mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT;
+if ( initrd_len )
 printk("\n Init. ramdisk: %"PRIpaddr"->%"PRIpaddr,
-   mpt_alloc, mpt_alloc + initrd_len);
-}
+   pfn_to_paddr(initrd_mfn),
+   pfn_to_paddr(initrd_mfn) + initrd_len);
 
 printk("\nVIRTUAL MEMORY ARRANGEMENT:\n");
 printk(" Loaded kernel: %p->%p\n", _p(vkern_start), _p(vkern_end));
@@ -881,7 +881,7 @@ int __init dom0_construct_pv(struct domain *d,
 if ( pfn >= initrd_pfn )
 {
 if ( pfn < initrd_pfn + PFN_UP(initrd_len) )
-mfn = initrd->mod_start + (pfn - initrd_pfn);
+mfn = initrd_mfn + (pfn - initrd_pfn);
 else
 mfn -= PFN_UP(initrd_len);
 }
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 59907fae095f..785a46a44995 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -294,7 +294,7 @@ unsigned long __init initial_images_nrpages(nodeid_t node)
 return nr;
 }
 
-void __init discard_initial_images(void)
+void __init discard_initial_images(void) /* a.k.a. free multiboot modules */
 {
 unsigned int i;
 
@@ -302,6 +302,13 @@ void __init discard_initial_images(void)
 {
 uint64_t start = (uint64_t)initial_images[i].mod_start << PAGE_SHIFT;
 
+/*
+ * Sometimes the initrd is mapped, rather than copied, into dom0.
+ * end=0 signifies that we should leave it alone.
+ */
+if ( initial_images[i].mod_end == 0 )
+   

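Stepping back from the diff: the end == 0 sentinel the commit message describes is easy to sketch in isolation (hypothetical reduced module struct, not the real multiboot module_t):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced module descriptor -- the real multiboot module_t
 * carries frame numbers and further fields. */
struct mod {
    uint64_t start, end;
};

/* Sketch of the new discard rule: end == 0 marks a module whose memory was
 * mapped (not copied) into dom0, so it must be left alone. */
static uint64_t bytes_to_free(const struct mod *mods, size_t n)
{
    uint64_t total = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( mods[i].end == 0 )
            continue; /* mapped rather than copied: skip */
        total += mods[i].end - mods[i].start;
    }

    return total;
}
```

The explicit check is what makes the mapped case robust, rather than relying on init_domheap_pages(x, x) happening to be a nop.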
[PATCH 3/3] x86/boot: Refactor pvh_load_kernel() to have an initrd_len local

2024-04-26 Thread Andrew Cooper
The expressions get more complicated when ->mod_end isn't being abused as a
size field.  Introduce and use an initrd_len local variable.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Daniel Smith 
CC: Christopher Clark 
---
 xen/arch/x86/hvm/dom0_build.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index ac71d43a6b3f..b0cb96c3bc76 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -650,6 +650,7 @@ static int __init pvh_load_kernel(struct domain *d, const 
module_t *image,
 {
 void *image_start = image_base + image_headroom;
 unsigned long image_len = image->mod_end;
+unsigned long initrd_len = initrd ? initrd->mod_end : 0;
 struct elf_binary elf;
 struct elf_dom_parms parms;
 paddr_t last_addr;
@@ -710,7 +711,7 @@ static int __init pvh_load_kernel(struct domain *d, const 
module_t *image,
  * simplify it.
  */
 last_addr = find_memory(d, , sizeof(start_info) +
-(initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
+(initrd ? ROUNDUP(initrd_len, PAGE_SIZE) +
   sizeof(mod)
 : 0) +
 (cmdline ? ROUNDUP(strlen(cmdline) + 1,
@@ -725,7 +726,7 @@ static int __init pvh_load_kernel(struct domain *d, const 
module_t *image,
 if ( initrd != NULL )
 {
 rc = hvm_copy_to_guest_phys(last_addr, mfn_to_virt(initrd->mod_start),
-initrd->mod_end, v);
+initrd_len, v);
 if ( rc )
 {
 printk("Unable to copy initrd to guest\n");
@@ -733,8 +734,8 @@ static int __init pvh_load_kernel(struct domain *d, const 
module_t *image,
 }
 
 mod.paddr = last_addr;
-mod.size = initrd->mod_end;
-last_addr += ROUNDUP(initrd->mod_end, elf_64bit() ? 8 : 4);
+mod.size = initrd_len;
+last_addr += ROUNDUP(initrd_len, elf_64bit() ? 8 : 4);
 if ( initrd->string )
 {
 char *str = __va(initrd->string);
-- 
2.30.2




[PATCH 1/3] x86/boot: Explain how moving mod[0] works

2024-04-26 Thread Andrew Cooper
modules_headroom is a misleading name as it applies strictly to mod[0] only,
and the movement loop is deeply unintuitive and completely undocumented.

Provide help to whoever needs to look at this code next.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Daniel Smith 
CC: Christopher Clark 
---
 xen/arch/x86/setup.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index caf235c6286d..59907fae095f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1432,6 +1432,11 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 /* Is the region suitable for relocating the multiboot modules? */
 for ( j = mbi->mods_count - 1; j >= 0; j-- )
 {
+/*
+ * 'headroom' is a guess for the decompressed size and
+ * decompressor overheads of mod[0] (the dom0 kernel).  When we
+ * move mod[0], we incorporate this as extra space at the start.
+ */
 unsigned long headroom = j ? 0 : modules_headroom;
 unsigned long size = PAGE_ALIGN(headroom + mod[j].mod_end);
 
-- 
2.30.2
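The headroom rule the new comment documents reduces to a single computation; a standalone sketch, assuming x86's 4k PAGE_SIZE and a hypothetical reloc_size() helper:

```c
#include <assert.h>

#define PAGE_SIZE     4096UL
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Hypothetical helper mirroring the loop body: only mod[0] (the dom0
 * kernel, j == 0) gets the decompression headroom folded in before the
 * page-aligned size is taken. */
static unsigned long reloc_size(int j, unsigned long modules_headroom,
                                unsigned long mod_end)
{
    unsigned long headroom = j ? 0 : modules_headroom;

    return PAGE_ALIGN(headroom + mod_end);
}
```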




Re: [PATCH v3 7/8] gzip: move bitbuffer into gunzip state

2024-04-26 Thread Andrew Cooper
On 26/04/2024 6:57 am, Jan Beulich wrote:
> On 26.04.2024 07:55, Jan Beulich wrote:
>> On 25.04.2024 21:23, Andrew Cooper wrote:
>>> On 24/04/2024 5:34 pm, Daniel P. Smith wrote:
>>>> --- a/xen/common/gzip/inflate.c
>>>> +++ b/xen/common/gzip/inflate.c
>>>> @@ -1017,8 +1014,8 @@ static int __init inflate(struct gunzip_state *s)
>>>>  /* Undo too much lookahead. The next read will be byte aligned so we
>>>>   * can discard unused bits in the last meaningful byte.
>>>>   */
>>>> -while (bk >= 8) {
>>>> -bk -= 8;
>>>> +while (s->bk >= 8) {
>>>> +s->bk -= 8;
>>>>  s->inptr--;
>>>>  }
>>> Isn't it just me, but isn't this just:
>>>
>>>     s->inptr -= (s->bk >> 3);
>>>     s->bk &= 7;
>>>
>>> ?
>> I'd say yes, if only there wasn't the comment talking of just a single byte,
>> and even there supposedly some of the bits.

The comments in this file leave a lot to be desired...

I'm reasonably confident they were not adjusted when a piece of
userspace code was imported into Linux.

This cannot ever have been just a byte's worth of bits, seeing as the
bit count can be 8 or more.  Right now it's an unsigned long's worth of
bits.
> Talking of the comment: Considering what patch 1 supposedly does, how come
> that isn't Xen-style (yet)?

I had been collecting some minor fixes like this to fold into patch 1
when I committed the whole series.

I'll see if I can fold them elsewhere.

~Andrew
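For what it's worth, the loop/closed-form equivalence discussed above can be checked standalone (hypothetical two-field stand-in for the state struct, not the real gunzip_state):

```c
#include <assert.h>

/* Hypothetical reduced gunzip state: only the two fields involved. */
struct gunzip_state {
    unsigned long bk;           /* bits in the bit buffer */
    const unsigned char *inptr; /* input position */
};

/* The loop as it stands in inflate(). */
static void undo_lookahead_loop(struct gunzip_state *s)
{
    while ( s->bk >= 8 )
    {
        s->bk -= 8;
        s->inptr--;
    }
}

/* The proposed closed form. */
static void undo_lookahead_arith(struct gunzip_state *s)
{
    s->inptr -= s->bk >> 3;
    s->bk &= 7;
}
```

Both leave bk < 8 and rewind inptr by one byte per whole byte's worth of lookahead.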



MISRA and -Wextra-semi

2024-04-26 Thread Andrew Cooper
Hi,

Based on a call a long while back, I experimented with -Wextra-semi. 
This is what led to 8e36c668ca107 "xen: Drop superfluous semi-colons".

However, there are a number of problems with getting this working
fully.  First, we need workarounds like this:

diff --git a/xen/include/xen/config.h b/xen/include/xen/config.h
index d888b2314daf..12e99c6dded4 100644
--- a/xen/include/xen/config.h
+++ b/xen/include/xen/config.h
@@ -26,7 +26,7 @@
 
 #include 
 
-#define EXPORT_SYMBOL(var)
+#define EXPORT_SYMBOL(var) typedef int var##_ignore_t
 
 /*
  * The following log levels are as follows:

to avoid a failure for users that do legitimately have a semi-colon.
It occurs to me that we could swap the typedef for an asm("") which
might be a little less unpleasant.
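To illustrate the workaround: with the empty definition, `EXPORT_SYMBOL(foo);` at file scope expands to a lone semicolon, which -Wextra-semi objects to; the typedef form gives that semicolon a complete declaration to terminate (hypothetical names):

```c
#include <assert.h>

/* An empty definition would make `EXPORT_SYMBOL(foo);` expand to a bare
 * `;` at file scope.  This version expands to a typedef, so the user's
 * trailing semicolon becomes legitimate. */
#define EXPORT_SYMBOL(var) typedef int var##_ignore_t

int foo;
EXPORT_SYMBOL(foo);   /* expands to: typedef int foo_ignore_t; */
```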

But with the simple cases taken care of, we then hit:

In file included from common/grant_table.c:3813:
common/compat/grant_table.c:10:1: error: extra ';' outside of a function
[-Werror,-Wextra-semi]
CHECK_grant_entry_v1;
^

which is quickly starting to unravel.

Finally, while Clang does have -Wextra-semi working properly for C, GCC
does this:

cc1: warning: command-line option ‘-Wextra-semi’ is valid for C++/ObjC++
but not for C

instead, which passes the cc-option-add check (which doesn't contain
-Werror), but causes the real build to fail.


So, while -Wextra-semi is definitely useful to find some hidden
problems, it seems like using it fully might be very complicated.  How
much do we care?

~Andrew



Re: [PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd

2024-04-26 Thread Andrew Cooper
On 26/04/2024 9:51 am, Anthony PERARD wrote:
> On Thu, Apr 25, 2024 at 07:16:23PM +0100, Andrew Cooper wrote:
>> On 25/04/2024 7:06 pm, Anthony PERARD wrote:
>>> On Thu, Apr 25, 2024 at 06:32:15PM +0100, Andrew Cooper wrote:
>>>> libsystemd is a giant dependency for one single function, but in the wake 
>>>> of
>>>> the xz backdoor, it turns out that even systemd leadership recommend 
>>>> against
>>>> linking against libsystemd for sd_notify().
>>>>
>>>> Since commit 7b61011e1450 ("tools: make xenstore domain easy 
>>>> configurable") in
>>>> Xen 4.8, the launch-xenstore script invokes systemd-notify directly, so it's
>>> That's not enough, it's needs to be `systemd-notify --ready` to be a
>>> replacement for sd_notify(READY=1).
>>>
>>>> not even necessary for the xenstored's to call sd_notify() themselves.
>>> So, sd_notify() or equivalent is still necessary.
>>>
>>>> Therefore, just drop the calls to sd_notify() and stop linking against
>>>> libsystemd.
>>> Sounds good, be we need to replace the call by something like:
>>> echo READY=1 > $NOTIFY_SOCKET
>>> implemented in C and ocaml. Detail to be checked.
>>>
>>> Otherwise, things won't work.
>> Hmm.  It worked in XenRT when stripping this all out, but that is
> I don't know how XenServer is setup, maybe it doesn't matter?

In theory it's straight systemd, but I could also believe that Xapi
checks for the pidfile too.

>  Anyway...
>
>> extremely unintuitive behaviour for `systemd-notify --booted`, seeing as
>> it's entirely different to --ready.
> Yes, this --booted option should probably not exist, and there's
> `systemctl is-system-running` that does something similar.
>
>> I've got no interest in keeping the C around, but if:
>>
>> [ -n "$NOTIFY_SOCKET" ] && echo READY=1 > $NOTIFY_SOCKET
>>
>> works, then can't we just use that after waiting for the pidfile ?
> Run `systemd-notify --ready` instead. Hopefully, that will be enough.
> ($NOTIFY_SOCKET is a socket, and a bit more complicated that I though,
> it can start with "@" for example)

I'll do a prep patch to adjust launch-xenstore after which this patch
should be fine in this form (modulo a tweak in the commit message).

~Andrew
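As a reference point, everything sd_notify(READY=1) does is one AF_UNIX datagram; a minimal sketch (notify_ready() is a hypothetical helper, and the '@' handling reflects the abstract-namespace detail Anthony mentions -- this is not the systemd implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical sd_notify(READY=1) replacement: send one datagram to
 * $NOTIFY_SOCKET.  A leading '@' names an abstract-namespace socket and
 * is translated to a leading NUL.  Returns 0 on success, -1 otherwise.
 */
static int notify_ready(void)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    static const char msg[] = "READY=1";
    size_t len;
    int fd, rc = -1;

    if ( !path || !*path )
        return -1;

    len = strlen(path);
    if ( len >= sizeof(sa.sun_path) )
        return -1;

    memcpy(sa.sun_path, path, len);
    if ( sa.sun_path[0] == '@' )
        sa.sun_path[0] = '\0'; /* abstract namespace */

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if ( fd < 0 )
        return -1;

    if ( sendto(fd, msg, sizeof(msg) - 1, 0, (struct sockaddr *)&sa,
                offsetof(struct sockaddr_un, sun_path) + len) ==
         sizeof(msg) - 1 )
        rc = 0;

    close(fd);
    return rc;
}
```

The shell-side equivalent in launch-xenstore is `systemd-notify --ready`, as suggested above.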



Re: [PATCH v3 7/8] gzip: move bitbuffer into gunzip state

2024-04-25 Thread Andrew Cooper
On 24/04/2024 5:34 pm, Daniel P. Smith wrote:
> Signed-off-by: Daniel P. Smith 

Acked-by: Andrew Cooper 

> diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c
> index bec8801df487..8da14880cfbe 100644
> --- a/xen/common/gzip/inflate.c
> +++ b/xen/common/gzip/inflate.c
> @@ -1017,8 +1014,8 @@ static int __init inflate(struct gunzip_state *s)
>  /* Undo too much lookahead. The next read will be byte aligned so we
>   * can discard unused bits in the last meaningful byte.
>   */
> -while (bk >= 8) {
> -bk -= 8;
> +while (s->bk >= 8) {
> +s->bk -= 8;
>  s->inptr--;
>  }

Isn't it just me, but isn't this just:

    s->inptr -= (s->bk >> 3);
    s->bk &= 7;

?

~Andrew



Re: [PATCH v3 8/8] gzip: move crc state into gunzip state

2024-04-25 Thread Andrew Cooper
On 24/04/2024 5:34 pm, Daniel P. Smith wrote:
> Move the crc and its state into struct gunzip_state. In the process, expand 
> the
> only use of CRC_VALUE as it is hides what is being compared.

"All variables here should be uint32_t rather than unsigned long, which
halves the storage space required."

> diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c
> index 8da14880cfbe..dc940e59d853 100644
> --- a/xen/common/gzip/inflate.c
> +++ b/xen/common/gzip/inflate.c
> @@ -1069,11 +1065,11 @@ static void __init makecrc(void)
>  if (k & 1)
>  c ^= e;
>  }
> -crc_32_tab[i] = c;
> +s->crc_32_tab[i] = c;
>  }
>  
>  /* this is initialized here so this code could reside in ROM */
> -crc = (ulg)0xffffffffUL; /* shift register contents */
> +s->crc = (ulg)~0u; /* shift register contents */

The (ulg) wasn't ever necessary, and can be dropped as part of this cleanup.

Can fix on commit.

~Andrew
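The uint32_t point can be seen in a standalone sketch of table-driven CRC-32 (gzip's reflected 0xEDB88320 polynomial; illustrative only, not the Xen gzip source):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Table-driven CRC-32 using uint32_t throughout.  On LP64, 256 x uint32_t
 * is a 1k table where 256 x unsigned long would be 2k. */
static uint32_t crc_32_tab[256];

static void makecrc(void)
{
    for ( uint32_t i = 0; i < 256; i++ )
    {
        uint32_t c = i;

        for ( int k = 0; k < 8; k++ )
            c = (c & 1) ? 0xedb88320u ^ (c >> 1) : c >> 1;
        crc_32_tab[i] = c;
    }
}

static uint32_t crc32_buf(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = ~0u; /* shift register contents */

    while ( len-- )
        crc = crc_32_tab[(crc ^ *p++) & 0xff] ^ (crc >> 8);

    return ~crc;
}
```

0xcbf43926 is the standard CRC-32 check value for "123456789", which makes for a convenient self-test.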



Re: [PATCH v3 1/8] gzip: clean up comments and fix code alignment

2024-04-25 Thread Andrew Cooper
On 24/04/2024 5:34 pm, Daniel P. Smith wrote:
> This commit cleans up the comments and fixes the code alignment using Xen
> coding style. This is done to make the code more legible before refactoring.
>
> Signed-off-by: Daniel P. Smith 

Acked-by: Andrew Cooper 



Re: [PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd

2024-04-25 Thread Andrew Cooper
On 25/04/2024 7:06 pm, Anthony PERARD wrote:
> On Thu, Apr 25, 2024 at 06:32:15PM +0100, Andrew Cooper wrote:
>> libsystemd is a giant dependency for one single function, but in the wake of
>> the xz backdoor, it turns out that even systemd leadership recommend against
>> linking against libsystemd for sd_notify().
>>
>> Since commit 7b61011e1450 ("tools: make xenstore domain easy configurable") 
>> in
>> Xen 4.8, the launch-xenstore script invokes systemd-notify directly, so it's
> That's not enough, it's needs to be `systemd-notify --ready` to be a
> replacement for sd_notify(READY=1).
>
>> not even necessary for the xenstored's to call sd_notify() themselves.
> So, sd_notify() or equivalent is still necessary.
>
>> Therefore, just drop the calls to sd_notify() and stop linking against
>> libsystemd.
> Sounds good, be we need to replace the call by something like:
> echo READY=1 > $NOTIFY_SOCKET
> implemented in C and ocaml. Detail to be checked.
>
> Otherwise, things won't work.

Hmm.  It worked in XenRT when stripping this all out, but that is
extremely unintuitive behaviour for `systemd-notify --booted`, seeing as
it's entirely different to --ready.

I've got no interest in keeping the C around, but if:

[ -n "$NOTIFY_SOCKET" ] && echo READY=1 > $NOTIFY_SOCKET

works, then can't we just use that after waiting for the pidfile ?

~Andrew



[PATCH] CI: Drop glibc-i386 from the build containers

2024-04-25 Thread Andrew Cooper
Xen 4.14 no longer runs in Gitlab CI.  Drop the dependency to shrink the build
containers a little.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Michal Orzel 
CC: Doug Goldstein 
CC: Roger Pau Monné 
---
 automation/build/archlinux/current.dockerfile| 2 --
 automation/build/centos/7.dockerfile | 2 --
 automation/build/debian/bookworm.dockerfile  | 2 --
 automation/build/debian/jessie.dockerfile| 2 --
 automation/build/debian/stretch.dockerfile   | 2 --
 automation/build/fedora/29.dockerfile| 2 --
 automation/build/suse/opensuse-leap.dockerfile   | 2 --
 automation/build/suse/opensuse-tumbleweed.dockerfile | 2 --
 automation/build/ubuntu/bionic.dockerfile| 2 --
 automation/build/ubuntu/focal.dockerfile | 2 --
 automation/build/ubuntu/trusty.dockerfile| 2 --
 automation/build/ubuntu/xenial.dockerfile| 2 --
 12 files changed, 24 deletions(-)

diff --git a/automation/build/archlinux/current.dockerfile 
b/automation/build/archlinux/current.dockerfile
index d974a1434fd5..3e37ab5c40c1 100644
--- a/automation/build/archlinux/current.dockerfile
+++ b/automation/build/archlinux/current.dockerfile
@@ -19,8 +19,6 @@ RUN pacman -S --refresh --sysupgrade --noconfirm 
--noprogressbar --needed \
 iasl \
 inetutils \
 iproute \
-# lib32-glibc for Xen < 4.15
-lib32-glibc \
 libaio \
 libcacard \
 libgl \
diff --git a/automation/build/centos/7.dockerfile 
b/automation/build/centos/7.dockerfile
index ab450f0b3a0e..1cdc16fc05f9 100644
--- a/automation/build/centos/7.dockerfile
+++ b/automation/build/centos/7.dockerfile
@@ -32,8 +32,6 @@ RUN yum -y update \
 yajl-devel \
 pixman-devel \
 glibc-devel \
-# glibc-devel.i686 for Xen < 4.15
-glibc-devel.i686 \
 make \
 binutils \
 git \
diff --git a/automation/build/debian/bookworm.dockerfile 
b/automation/build/debian/bookworm.dockerfile
index 459f8e30bdc6..d893218fc4bd 100644
--- a/automation/build/debian/bookworm.dockerfile
+++ b/automation/build/debian/bookworm.dockerfile
@@ -31,8 +31,6 @@ RUN apt-get update && \
 bin86 \
 bcc \
 liblzma-dev \
-# libc6-dev-i386 for Xen < 4.15
-libc6-dev-i386 \
 libnl-3-dev \
 ocaml-nox \
 libfindlib-ocaml-dev \
diff --git a/automation/build/debian/jessie.dockerfile 
b/automation/build/debian/jessie.dockerfile
index 32fc952fbc2d..308675cac150 100644
--- a/automation/build/debian/jessie.dockerfile
+++ b/automation/build/debian/jessie.dockerfile
@@ -37,8 +37,6 @@ RUN apt-get update && \
 bin86 \
 bcc \
 liblzma-dev \
-# libc6-dev-i386 for Xen < 4.15
-libc6-dev-i386 \
 libnl-3-dev \
 ocaml-nox \
 libfindlib-ocaml-dev \
diff --git a/automation/build/debian/stretch.dockerfile 
b/automation/build/debian/stretch.dockerfile
index e2706a8f3589..59794ed4677b 100644
--- a/automation/build/debian/stretch.dockerfile
+++ b/automation/build/debian/stretch.dockerfile
@@ -38,8 +38,6 @@ RUN apt-get update && \
 bin86 \
 bcc \
 liblzma-dev \
-# libc6-dev-i386 for Xen < 4.15
-libc6-dev-i386 \
 libnl-3-dev \
 ocaml-nox \
 libfindlib-ocaml-dev \
diff --git a/automation/build/fedora/29.dockerfile 
b/automation/build/fedora/29.dockerfile
index 42a87ce6c84b..f473ae13e7c1 100644
--- a/automation/build/fedora/29.dockerfile
+++ b/automation/build/fedora/29.dockerfile
@@ -21,8 +21,6 @@ RUN dnf -y install \
 yajl-devel \
 pixman-devel \
 glibc-devel \
-# glibc-devel.i686 for Xen < 4.15
-glibc-devel.i686 \
 make \
 binutils \
 git \
diff --git a/automation/build/suse/opensuse-leap.dockerfile 
b/automation/build/suse/opensuse-leap.dockerfile
index e1ec38a41445..48d0d50d005d 100644
--- a/automation/build/suse/opensuse-leap.dockerfile
+++ b/automation/build/suse/opensuse-leap.dockerfile
@@ -28,8 +28,6 @@ RUN zypper install -y --no-recommends \
 ghostscript \
 glib2-devel \
 glibc-devel \
-# glibc-devel-32bit for Xen < 4.15
-glibc-devel-32bit \
 gzip \
 hostname \
 libaio-devel \
diff --git a/automation/build/suse/opensuse-tumbleweed.dockerfile 
b/automation/build/suse/opensuse-tumbleweed.dockerfile
index f00e03eda7b1..53542ba1f4d9 100644
--- a/automation/build/suse/opensuse-tumbleweed.dockerfile
+++ b/automation/build/suse/opensuse-tumbleweed.dockerfile
@@ -26,8 +26,6 @@ RUN zypper install -y --no-recommends \
 ghostscript \
 glib2-devel \
 glibc-devel \
-# glibc-devel-32bit for Xen < 4.15
-glibc-devel-32bit \
 gzip \
 hostname \
 libaio-devel \
diff --git a/automation/bu

[PATCH 2/2] tools: Drop libsystemd as a dependency

2024-04-25 Thread Andrew Cooper
There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends.  Drop the dependency.

We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.

Rerun autogen.sh, and mark the dependency as removed in the build containers.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Christian Lindig 
CC: Edwin Török 
CC: Stefano Stabellini 
---
 automation/build/archlinux/current.dockerfile |   1 +
 .../build/suse/opensuse-leap.dockerfile   |   1 +
 .../build/suse/opensuse-tumbleweed.dockerfile |   1 +
 automation/build/ubuntu/focal.dockerfile  |   1 +
 config/Tools.mk.in|   2 -
 m4/systemd.m4 |  23 +-
 tools/config.h.in |   3 -
 tools/configure   | 523 +-
 8 files changed, 7 insertions(+), 548 deletions(-)

diff --git a/automation/build/archlinux/current.dockerfile 
b/automation/build/archlinux/current.dockerfile
index d974a1434fd5..a350b53ad8e2 100644
--- a/automation/build/archlinux/current.dockerfile
+++ b/automation/build/archlinux/current.dockerfile
@@ -39,6 +39,7 @@ RUN pacman -S --refresh --sysupgrade --noconfirm 
--noprogressbar --needed \
 sdl2 \
 spice \
 spice-protocol \
+# systemd for Xen < 4.19
 systemd \
 transfig \
 usbredir \
diff --git a/automation/build/suse/opensuse-leap.dockerfile 
b/automation/build/suse/opensuse-leap.dockerfile
index e1ec38a41445..0483d9826d7d 100644
--- a/automation/build/suse/opensuse-leap.dockerfile
+++ b/automation/build/suse/opensuse-leap.dockerfile
@@ -61,6 +61,7 @@ RUN zypper install -y --no-recommends \
 'pkgconfig(sdl2)' \
 python3-devel \
 python3-setuptools \
+# systemd-devel for Xen < 4.19
 systemd-devel \
 tar \
 transfig \
diff --git a/automation/build/suse/opensuse-tumbleweed.dockerfile 
b/automation/build/suse/opensuse-tumbleweed.dockerfile
index f00e03eda7b1..59b168e717b2 100644
--- a/automation/build/suse/opensuse-tumbleweed.dockerfile
+++ b/automation/build/suse/opensuse-tumbleweed.dockerfile
@@ -62,6 +62,7 @@ RUN zypper install -y --no-recommends \
 'pkgconfig(sdl2)' \
 python3-devel \
 python3-setuptools \
+# systemd-devel for Xen < 4.19
 systemd-devel \
 tar \
 transfig \
diff --git a/automation/build/ubuntu/focal.dockerfile 
b/automation/build/ubuntu/focal.dockerfile
index 30a9b8e84ffe..8171347c4656 100644
--- a/automation/build/ubuntu/focal.dockerfile
+++ b/automation/build/ubuntu/focal.dockerfile
@@ -36,6 +36,7 @@ RUN apt-get update && \
 libnl-3-dev \
 ocaml-nox \
 libfindlib-ocaml-dev \
+# libsystemd-dev for Xen < 4.19
 libsystemd-dev \
 transfig \
 pandoc \
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index b54ab21f966b..50fbef841f3f 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -52,8 +52,6 @@ CONFIG_PYGRUB   := @pygrub@
 CONFIG_LIBFSIMAGE   := @libfsimage@
 
 CONFIG_SYSTEMD  := @systemd@
-SYSTEMD_CFLAGS  := @SYSTEMD_CFLAGS@
-SYSTEMD_LIBS:= @SYSTEMD_LIBS@
 XEN_SYSTEMD_DIR := @SYSTEMD_DIR@
 XEN_SYSTEMD_MODULES_LOAD := @SYSTEMD_MODULES_LOAD@
 CONFIG_9PFS := @ninepfs@
diff --git a/m4/systemd.m4 b/m4/systemd.m4
index 112dc11b5e05..aa1ebe94f56c 100644
--- a/m4/systemd.m4
+++ b/m4/systemd.m4
@@ -41,15 +41,6 @@ AC_DEFUN([AX_ALLOW_SYSTEMD_OPTS], [
 ])
 
 AC_DEFUN([AX_CHECK_SYSTEMD_LIBS], [
-   PKG_CHECK_MODULES([SYSTEMD], [libsystemd-daemon],,
-   [PKG_CHECK_MODULES([SYSTEMD], [libsystemd >= 209])]
-)
-   dnl pkg-config older than 0.24 does not set these for
-   dnl PKG_CHECK_MODULES() worth also noting is that as of version 208
-   dnl of systemd pkg-config --cflags currently yields no extra flags yet.
-   AC_SUBST([SYSTEMD_CFLAGS])
-   AC_SUBST([SYSTEMD_LIBS])
-
AS_IF([test "x$SYSTEMD_DIR" = x], [
dnl In order to use the line below we need to fix upstream systemd
dnl to properly ${prefix} for child variables in
@@ -83,23 +74,11 @@ AC_DEFUN([AX_CHECK_SYSTEMD_LIBS], [
 AC_DEFUN([AX_CHECK_SYSTEMD], [
dnl Respect user override to disable
AS_IF([test "x$enable_systemd" != "xno"], [
-AS_IF([test "x$systemd" = "xy" ], [
-   AC_DEFINE([HAVE_SYSTEMD], [1], [Systemd available and enabled])
-   systemd=y
-   AX_CHECK_SYSTEMD_LIBS()
-   ],[
-   AS_IF([test "x$enable_systemd" = "xyes"],
-   [AC_MSG_ERROR([Unable to find systemd development 
library])],
-   [systemd=n])
-   ])
+   systemd=y
],[systemd=

[PATCH 0/2] Drop libsystemd

2024-04-25 Thread Andrew Cooper
On advise from the systemd leadership.  See patch 1 for details.

Andrew Cooper (2):
  tools/{c,o}xenstored: Don't link against libsystemd
  tools: Drop libsystemd as a dependency

 automation/build/archlinux/current.dockerfile |   1 +
 .../build/suse/opensuse-leap.dockerfile   |   1 +
 .../build/suse/opensuse-tumbleweed.dockerfile |   1 +
 automation/build/ubuntu/focal.dockerfile  |   1 +
 config/Tools.mk.in|   2 -
 m4/systemd.m4 |  23 +-
 tools/config.h.in |   3 -
 tools/configure   | 523 +-
 tools/ocaml/xenstored/Makefile|  12 +-
 tools/ocaml/xenstored/systemd.ml  |  15 -
 tools/ocaml/xenstored/systemd.mli |  16 -
 tools/ocaml/xenstored/systemd_stubs.c |  47 --
 tools/ocaml/xenstored/xenstored.ml|   1 -
 tools/xenstored/Makefile  |   5 -
 tools/xenstored/posix.c   |   9 -
 15 files changed, 8 insertions(+), 652 deletions(-)
 delete mode 100644 tools/ocaml/xenstored/systemd.ml
 delete mode 100644 tools/ocaml/xenstored/systemd.mli
 delete mode 100644 tools/ocaml/xenstored/systemd_stubs.c


base-commit: 23cd1207e7f6ee3e51fb42e11dba8d7cdb28e1e5
-- 
2.30.2




[PATCH 1/2] tools/{c,o}xenstored: Don't link against libsystemd

2024-04-25 Thread Andrew Cooper
libsystemd is a giant dependency for one single function, but in the wake of
the xz backdoor, it turns out that even systemd leadership recommend against
linking against libsystemd for sd_notify().

Since commit 7b61011e1450 ("tools: make xenstore domain easy configurable") in
Xen 4.8, the launch-xenstore script invokes systemd-notify directly, so it's
not even necessary for the xenstoreds to call sd_notify() themselves.

Therefore, just drop the calls to sd_notify() and stop linking against
libsystemd.

No functional change.
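For reference, the protocol behind sd_notify() is tiny: readiness is signalled
by sending a single datagram containing "READY=1" to the unix socket named by
the NOTIFY_SOCKET environment variable.  A minimal libsystemd-free sketch
(hypothetical, for illustration only; not part of this patch):

```c
/* Hypothetical sketch of sd_notify(0, "READY=1") without libsystemd. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static int notify_ready(void)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    int fd, rc;

    if ( !path || strlen(path) >= sizeof(sa.sun_path) )
        return 0; /* Not running under systemd: nothing to do. */

    memcpy(sa.sun_path, path, strlen(path));
    if ( sa.sun_path[0] == '@' )
        sa.sun_path[0] = '\0'; /* Abstract namespace socket. */

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if ( fd < 0 )
        return -1;

    rc = sendto(fd, "READY=1", 7, 0, (struct sockaddr *)&sa, sizeof(sa));
    close(fd);

    return rc == 7 ? 0 : -1;
}
```

Whether the abstract-namespace handling above matches every systemd version is
an assumption; the in-tree approach of invoking systemd-notify from
launch-xenstore sidesteps the question entirely.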

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Christian Lindig 
CC: Edwin Török 
CC: Stefano Stabellini 
---
 tools/ocaml/xenstored/Makefile| 12 +--
 tools/ocaml/xenstored/systemd.ml  | 15 -
 tools/ocaml/xenstored/systemd.mli | 16 -
 tools/ocaml/xenstored/systemd_stubs.c | 47 ---
 tools/ocaml/xenstored/xenstored.ml|  1 -
 tools/xenstored/Makefile  |  5 ---
 tools/xenstored/posix.c   |  9 -
 7 files changed, 1 insertion(+), 104 deletions(-)
 delete mode 100644 tools/ocaml/xenstored/systemd.ml
 delete mode 100644 tools/ocaml/xenstored/systemd.mli
 delete mode 100644 tools/ocaml/xenstored/systemd_stubs.c

diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile
index e8aaecf2e630..1e4b51cc5432 100644
--- a/tools/ocaml/xenstored/Makefile
+++ b/tools/ocaml/xenstored/Makefile
@@ -4,8 +4,6 @@ include $(OCAML_TOPLEVEL)/common.make
 
 # Include configure output (config.h)
 CFLAGS += -include $(XEN_ROOT)/tools/config.h
-CFLAGS-$(CONFIG_SYSTEMD)  += $(SYSTEMD_CFLAGS)
-LDFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_LIBS)
 
 CFLAGS  += $(CFLAGS-y)
 CFLAGS  += $(APPEND_CFLAGS)
@@ -25,13 +23,6 @@ poll_OBJS = poll
 poll_C_OBJS = select_stubs
 OCAML_LIBRARY = syslog poll
 
-LIBS += systemd.cma systemd.cmxa
-systemd_OBJS = systemd
-systemd_C_OBJS = systemd_stubs
-OCAML_LIBRARY += systemd
-
-LIBS_systemd += $(LDFLAGS-y)
-
 OBJS = paths \
define \
stdext \
@@ -56,12 +47,11 @@ OBJS = paths \
process \
xenstored
 
-INTF = symbol.cmi trie.cmi syslog.cmi systemd.cmi poll.cmi
+INTF = symbol.cmi trie.cmi syslog.cmi poll.cmi
 
 XENSTOREDLIBS = \
unix.cmxa \
-ccopt -L -ccopt . syslog.cmxa \
-   -ccopt -L -ccopt . systemd.cmxa \
-ccopt -L -ccopt . poll.cmxa \
-ccopt -L -ccopt $(OCAML_TOPLEVEL)/libs/mmap 
$(OCAML_TOPLEVEL)/libs/mmap/xenmmap.cmxa \
-ccopt -L -ccopt $(OCAML_TOPLEVEL)/libs/eventchn 
$(OCAML_TOPLEVEL)/libs/eventchn/xeneventchn.cmxa \
diff --git a/tools/ocaml/xenstored/systemd.ml b/tools/ocaml/xenstored/systemd.ml
deleted file mode 100644
index 39127f712d72..
--- a/tools/ocaml/xenstored/systemd.ml
+++ /dev/null
@@ -1,15 +0,0 @@
-(*
- * Copyright (C) 2014 Luis R. Rodriguez 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published
- * by the Free Software Foundation; version 2.1 only. with the special
- * exception on linking described in file LICENSE.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *)
-
-external sd_notify_ready: unit -> unit = "ocaml_sd_notify_ready"
diff --git a/tools/ocaml/xenstored/systemd.mli b/tools/ocaml/xenstored/systemd.mli
deleted file mode 100644
index 18b9331031f9..
--- a/tools/ocaml/xenstored/systemd.mli
+++ /dev/null
@@ -1,16 +0,0 @@
-(*
- * Copyright (C) 2014 Luis R. Rodriguez 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published
- * by the Free Software Foundation; version 2.1 only. with the special
- * exception on linking described in file LICENSE.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *)
-
-(** Tells systemd we're ready *)
-external sd_notify_ready: unit -> unit = "ocaml_sd_notify_ready"
diff --git a/tools/ocaml/xenstored/systemd_stubs.c b/tools/ocaml/xenstored/systemd_stubs.c
deleted file mode 100644
index f4c875075abe..
--- a/tools/ocaml/xenstored/systemd_stubs.c
+++ /dev/null
@@ -1,47 +0,0 @@
-/*
- * Copyright (C) 2014 Luis R. Rodriguez 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published
- * by the Free Software Foundation; version 2.1 only. with the special
- * exception on linking described in file LICENSE.
- *
- * This program is distribute

Re: [PATCH] svm: Fix MISRA 8.2 violation

2024-04-25 Thread Andrew Cooper
On 25/04/2024 10:12 am, George Dunlap wrote:
> Misra 8.2 requires named parameters in prototypes.  Use the name from
> the implementation.
>
> Fixes 0d19d3aab0 ("svm/nestedsvm: Introduce nested capabilities bit").

Nit.  We treat Fixes: as a tag.

Also you might want to include the Reported-by's.

Otherwise, Acked-by: Andrew Cooper  although
I have a strong wish to shorten the parameter name.  That can be done at
a later point.

~Andrew



Re: [PATCH v3] x86/entry: shrink insn size for some of our EFLAGS manipulation

2024-04-25 Thread Andrew Cooper
On 25/04/2024 3:26 pm, Jan Beulich wrote:
> Much like was recently done for setting entry vector, and along the
> lines of what we already had in handle_exception_saved, avoid 32-bit
> immediates where 8-bit ones do. Reduces .text.entry size by 16 bytes in
> my non-CET reference build, while in my CET reference build section size
> doesn't change (there and in .text only padding space increases).
>
> Inspired by other long->byte conversion work.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 



Re: [PATCH] VMX: no open-coding in vmx_get_cpl()

2024-04-25 Thread Andrew Cooper
On 25/04/2024 2:27 pm, Jan Beulich wrote:
> Neither X86_SEG_AR_DPL nor MASK_EXTR() should really be avoided here,
> using literal number instead.
>
> No difference in generated code (with gcc13 at least).
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 



Re: [PATCH v2 2/2] x86/spec: adjust logic to logic that elides lfence

2024-04-25 Thread Andrew Cooper
On 18/04/2024 4:52 pm, Roger Pau Monne wrote:
> It's currently too restrictive by just checking whether there's a BHB clearing
> sequence selected.  It should instead check whether BHB clearing is used on
> entry from PV or HVM specifically.
>
> Switch to use opt_bhb_entry_{pv,hvm} instead, and then remove cpu_has_bhb_seq
> since it no longer has any users.
>
> Reported-by: Jan Beulich 
> Fixes: 954c983abcee ('x86/spec-ctrl: Software BHB-clearing sequences')
> Signed-off-by: Roger Pau Monné 
> ---
> Changes since v1:
>  - New in this version.
>
> There's (possibly) still a bit of overhead for dom0 if BHB clearing is not used
> for dom0, as Xen would still add the lfence if domUs require it.

This is the note about dom0 that I made on the previous patch.

"protect dom0" only has any effect if the appropriate foo_pv or foo_hvm
is also selected.  It's not possible to express "protect dom0 but not
domU of $TYPE".

This early on boot we have no idea whether dom0 is going to be PV or
HVM.  We could in principle figure it out by peeking at dom0's ELF
notes, but that needs a lot of rearranging of __start_xen() to do safely.

Reviewed-by: Andrew Cooper 



Re: [PATCH v2 1/2] x86/spec: fix reporting of BHB clearing usage from guest entry points

2024-04-25 Thread Andrew Cooper
On 18/04/2024 4:52 pm, Roger Pau Monne wrote:
> Reporting whether the BHB clearing on entry is done for the different domains
> types based on cpu_has_bhb_seq is incorrect, as that variable signals whether

I'd prefer s/incorrect/unhelpful/ here.

As pointed out, it's currently like IBPB-entry in this regard,
identifying whether the block is in place; not whether it's used by the
condition.

> there's a BHB clearing sequence selected, but that alone doesn't imply that
> such sequence is used from the PV and/or HVM entry points.
>
> Instead use opt_bhb_entry_{pv,hvm} which do signal whether BHB clearing is
> performed on entry from PV/HVM.

I was going to ask for a note about dom0, but instead I need to make a
correction to the cmdline docs.  I'll do that in a separate patch.


> Fixes: 689ad48ce9cf ('x86/spec-ctrl: Wire up the Native-BHI software 
> sequences')
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Andrew Cooper 

Can make the one tweak on commit.



Re: [XEN PATCH 2/2] x86/msr: add suffix 'U' to MSR_AMD_CSTATE_CFG macro.

2024-04-24 Thread Andrew Cooper
On 24/04/2024 1:51 pm, Jan Beulich wrote:
> On 24.04.2024 14:11, Alessandro Zucchelli wrote:
>> This addresses violations of MISRA C:2012 Rule 7.2 which states as
>> following: A “u” or “U” suffix shall be applied to all integer constants
>> that are represented in an unsigned type.
>>
>> No functional change.
> I'm inclined to suggest
> Fixes: 652683e1aeaa ("x86/hvm: address violations of MISRA C:2012 Rule 7.2")
> as that change clearly should have taken care of this already. The line
> changed here is even visible in patch context there.
>
>> Signed-off-by: Alessandro Zucchelli 
> Acked-by: Jan Beulich 
>
>

I expect it was a race condition.  MSR_AMD_CSTATE_CFG is a recent addition.

~Andrew



Re: [PATCH 1/4] x86/P2M: write_p2m_entry() is HVM-only anyway

2024-04-23 Thread Andrew Cooper
On 23/04/2024 3:31 pm, Jan Beulich wrote:
> The latest as of e2b2ff677958 ("x86/P2M: split out init/teardown
> functions") the function is obviously unreachable for PV guests.

This doesn't parse.  Do you mean "Since e2b2ff677958 ..." ?

>  Hence
> the paging_mode_enabled(d) check is pointless.
>
> Further host mode of a vCPU is always set, by virtue of
> paging_vcpu_init() being part of vCPU creation. Hence the
> paging_get_hostmode() check is pointless.
>
> With that the v local variable is unnecessary too. Drop the "if()"
> conditional and its corresponding "else".
>
> Signed-off-by: Jan Beulich 
> ---
> I have to confess that this if() has been puzzling me before.

Puzzling yes, but it can't blindly be dropped.

This is the "did the toolstack initiate this update" check.  i.e. I
think it's "bypass the normal side effects of making this update".

I suspect it exists because of improper abstraction between the guest
physmap and the shadow pagetables as-were - which were/are tighly
coupled to vCPUs even for aspects where they shouldn't have been.

For better or worse, the toolstack can add_to_physmap() before it
creates vCPUs, and it will take this path you're trying to delete. 
There may be other cases too; I could see foreign mapping ending up
ticking this too.

Whether we ought to permit a toolstack to do this is a different
question, but seeing as we explicitly intend (eventually for AMX) to have a
set_policy call between domain_create() and vcpu_create(), I don't think
we can reasonably restrict other hypercalls too in this period.

~Andrew



Re: [PATCH 1/6] x86: Introduce x86_decode_lite()

2024-04-23 Thread Andrew Cooper
On 23/04/2024 10:17 am, Jan Beulich wrote:
> On 22.04.2024 20:14, Andrew Cooper wrote:
>> --- /dev/null
>> +++ b/xen/arch/x86/x86_emulate/decode-lite.c
>> @@ -0,0 +1,245 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +
>> +#include "private.h"
>> +
>> +#define Imm8   (1 << 0)
>> +#define Imm(1 << 1)
>> +#define Branch (1 << 5) /* ... that we care about */
>> +/*  ModRM  (1 << 6) */
>> +#define Known  (1 << 7)
>> +
>> +#define ALU_OPS \
>> +(Known|ModRM),  \
>> +(Known|ModRM),  \
>> +(Known|ModRM),  \
>> +(Known|ModRM),  \
>> +(Known|Imm8),   \
>> +(Known|Imm)
>> +
>> +static const uint8_t init_or_livepatch_const onebyte[256] = {
>> +[0x00] = ALU_OPS, /* ADD */ [0x08] = ALU_OPS, /* OR  */
>> +[0x10] = ALU_OPS, /* ADC */ [0x18] = ALU_OPS, /* SBB */
>> +[0x20] = ALU_OPS, /* AND */ [0x28] = ALU_OPS, /* SUB */
>> +[0x30] = ALU_OPS, /* XOR */ [0x38] = ALU_OPS, /* CMP */
>> +
>> +[0x50 ... 0x5f] = (Known), /* PUSH/POP %reg */
>> +
>> +[0x70 ... 0x7f] = (Known|Branch|Imm8), /* Jcc disp8 */
>> +[0x80]  = (Known|ModRM|Imm8),
>> +[0x81]  = (Known|ModRM|Imm),
>> +
>> +[0x83]  = (Known|ModRM|Imm8),
>> +[0x84 ... 0x8e] = (Known|ModRM),   /* TEST/XCHG/MOV/MOV-SREG/LEA r/rm */
>> +
>> +[0xb0 ... 0xb7] = (Known|Imm8),/* MOV $imm8, %reg */
> I'm surprised you get away without at least NOP also marked as known.
> Imo the whole 0x90 row should be marked so.
>
> Similarly I'm not convinced that leaving the 0xA0 row unpopulated is a
> good idea. It's at best a trap set up for somebody to fall into rather
> sooner than later.
>
>> +[0xb8 ... 0xbf] = (Known|Imm), /* MOV $imm32, %reg */
>> +
>> +[0xcc]  = (Known), /* INT3 */
>> +[0xcd]  = (Known|Imm8),/* INT $imm8 */
> Like above, what about in particular any of the shifts/rotates and the
> MOV that's in the 0xC0 row?
>
> While the last sentence in the description is likely meant to cover
> that, I think the description wants to go further as to any pretty
> common but omitted insns. Already "x86: re-work memset()" and "x86: re-
> work memcpy()" (v2 pending for, soon, 3 years) would make it necessary
> to touch this table, thus increasing complexity of those changes to an
> area they shouldn't be concerned about at all.
>
>> +[0xe8 ... 0xe9] = (Known|Branch|Imm),  /* CALL/JMP disp32 */
>> +[0xeb]  = (Known|Branch|Imm8), /* JMP disp8 */
> 0xe0 ... 0xe7 and 0xec ... 0xef would imo also better be covered, as
> they easily can be (much like you also cover e.g. CMC despite it
> appearing pretty unlikely that we use that insn anywhere).
>
>> +[0xf1]  = (Known), /* ICEBP */
>> +[0xf4]  = (Known), /* HLT */
>> +[0xf5]  = (Known), /* CMC */
>> +[0xf6 ... 0xf7] = (Known|ModRM),   /* Grp3 */
>> +[0xf8 ... 0xfd] = (Known), /* CLC ... STD */
>> +[0xfe ... 0xff] = (Known|ModRM),   /* Grp4 */
>> +};
>> +static const uint8_t init_or_livepatch_const twobyte[256] = {
>> +[0x00 ... 0x01] = (Known|ModRM),   /* Grp6/Grp7 */
> LAR / LSL? CLTS? WBINVD? UD2?
>
>> +[0x18 ... 0x1f] = (Known|ModRM),   /* Grp16 (Hint Nop) */
>> +
>> +[0x20 ... 0x23] = (Known|ModRM),   /* MOV CR/DR */
>> +
>> +[0x30 ... 0x34] = (Known), /* WRMSR ... RDPMC */
> 0x34 is SYSENTER, isn't it?
>
>> +[0x40 ... 0x4f] = (Known|ModRM),   /* CMOVcc */
>> +
>> +[0x80 ... 0x8f] = (Known|Branch|Imm),  /* Jcc disp32 */
> What about things like VMREAD/VMWRITE?
>
>> +[0x90 ... 0x9f] = (Known|ModRM),   /* SETcc */
> PUSH/POP fs/gs? CPUID?
>
>> +[0xab]  = (Known|ModRM),   /* BTS */
> BTS (and BTC below) but not BT and BTR?
>
>> +[0xac]  = (Known|ModRM|Imm8),  /* SHRD $imm8 */
>> +[0xad ... 0xaf] = (Known|ModRM),   /* SHRD %cl / Grp15 / IMUL */
> SHRD but not SHLD?
>
> CMPXCHG?
>
>> +[0xb8 ... 0xb9] = (Known|ModRM),   /* POPCNT/Grp10 (UD1) */
>> +[0xba]  = (Known|ModRM|Imm8),  /* Grp8 */
>> +[0xbb ... 0xbf] = (Known|ModRM),   /* BSR/BSF/BSR/MOVSX */
> Nit (comment only): 0xb

Re: [PATCH 3/6] x86/alternative: Intend the relocation logic

2024-04-22 Thread Andrew Cooper
This was of course intended to say indent...

~Andrew



[PATCH 2/6] x86/alternative: Walk all replacements in debug builds

2024-04-22 Thread Andrew Cooper
In debug builds, walk all alternative replacements with x86_decode_lite().

This checks that we can decode all instructions, and also lets us check that
disp8's don't leave the replacement block.
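The disp8 check reduces to pointer arithmetic: a disp8 is signed and relative
to the byte after the branch instruction.  An illustrative sketch of the
containment test (hypothetical helper, not the patch itself):

```c
/* Illustrative: does a branch with a signed 8-bit displacement stay
 * within the replacement block [repl, repl + repl_len]? */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool disp8_stays_inside(const uint8_t *repl, unsigned int repl_len,
                               const uint8_t *ip, unsigned int insn_len,
                               int8_t disp8)
{
    /* A disp8 is relative to the byte after the branch instruction. */
    const uint8_t *target = ip + insn_len + disp8;

    return target >= repl && target <= repl + repl_len;
}
```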

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 2e7ba6e0b833..5bd256365def 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define MAX_PATCH_LEN (255-1)
@@ -464,6 +465,54 @@ static void __init _alternative_instructions(bool force)
 void __init alternative_instructions(void)
 {
 arch_init_ideal_nops();
+
+/*
+ * Walk all replacement instructions with x86_decode_lite().  This checks
+ * both that we can decode all instructions within the replacement, and
+ * that any near branch with a disp8 stays within the alternative itself.
+ */
+if ( IS_ENABLED(CONFIG_DEBUG) )
+{
+struct alt_instr *a;
+
+for ( a = __alt_instructions;
+  a < __alt_instructions_end; ++a )
+{
+void *repl = ALT_REPL_PTR(a);
+void *ip = repl, *end = ip + a->repl_len;
+
+if ( !a->repl_len )
+continue;
+
+for ( x86_decode_lite_t res; ip < end; ip += res.len )
+{
+res = x86_decode_lite(ip, end);
+
+if ( res.len <= 0 )
+{
+printk("Alternative for %ps [%*ph]\n",
+   ALT_ORIG_PTR(a), a->repl_len, repl);
+panic("Unable to decode instruction at +%u in alternative\n",
+  (unsigned int)(unsigned long)(ip - repl));
+}
+
+if ( res.rel_type == REL_TYPE_d8 )
+{
+int8_t *d8 = res.rel;
+void *target = ip + res.len + *d8;
+
+if ( target < repl || target > end )
+{
+printk("Alternative for %ps [%*ph]\n",
+   ALT_ORIG_PTR(a), a->repl_len, repl);
+panic("'JMP/Jcc disp8' at +%u leaves alternative block\n",
+  (unsigned int)(unsigned long)(ip - repl));
+}
+}
+}
+}
+}
+
 _alternative_instructions(false);
 }
 
-- 
2.30.2




[PATCH 5/6] x86/alternative: Relocate all insn-relative fields

2024-04-22 Thread Andrew Cooper
Right now, relocation of displacements is restricted to finding 0xe8/e9 as the
first byte of the replacement, but this is overly restrictive.

Use x86_decode_lite() to find and adjust all insn-relative fields.

As with disp8's not leaving the replacement block, some disp32's don't either.
e.g. the RSB stuffing loop.  These stay unmodified.
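The adjustment itself is one addition, but the sign is easy to get backwards.
A disp32 encodes target minus next-instruction address; copying the
instruction from the replacement block at repl to the patch site at orig
shifts that base, so preserving an external target means adding (repl - orig).
Sketched here for illustration (not the patch itself):

```c
/* Illustrative: adjust a disp32 when an instruction moves from the
 * replacement block (repl) to the patch site (orig), keeping the
 * absolute branch target unchanged. */
#include <assert.h>
#include <stdint.h>

static int32_t fixup_disp32(int32_t disp, intptr_t repl, intptr_t orig)
{
    return disp + (int32_t)(repl - orig);
}
```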

For now, leave the altcall devirtualisation alone.  These require more care to
transform into the new scheme.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 46 +++---
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index c86ea235e865..4d7dc9418cf0 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -244,10 +244,31 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
 
 memcpy(buf, repl, a->repl_len);
 
+/* Walk buf[] and adjust any insn-relative operands. */
+if ( a->repl_len )
 {
-/* 0xe8/0xe9 are relative branches; fix the offset. */
-if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
+uint8_t *ip = buf, *end = ip + a->repl_len;
+
+for ( x86_decode_lite_t res; ip < end; ip += res.len )
 {
+int32_t *d32;
+uint8_t *target;
+
+res = x86_decode_lite(ip, end);
+
+if ( res.len <= 0 )
+{
+printk("Alternative for %ps [%*ph]\n",
+   ALT_ORIG_PTR(a), a->repl_len, repl);
+printk("Unable to decode instruction in alternative - ignoring.\n");
+goto skip_this_alternative;
+}
+
+if ( res.rel_type != REL_TYPE_d32 )
+continue;
+
+d32 = res.rel;
+
 /*
  * Detect the special case of indirect-to-direct branch patching:
  * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; already
@@ -317,14 +338,23 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
  */
 goto skip_this_alternative;
 }
+
+continue;
 }
-else if ( force && system_state < SYS_STATE_active )
-ASSERT_UNREACHABLE();
-else
-*(int32_t *)(buf + 1) += repl - orig;
+
+target = ip + res.len + *d32;
+
+if ( target >= buf && target <= end )
+{
+/*
+ * Target doesn't leave the replacement block.  e.g. RSB
+ * stuffing.  Leave it unmodified.
+ */
+continue;
+}
+
+*d32 += repl - orig;
 }
-else if ( force && system_state < SYS_STATE_active  )
-ASSERT_UNREACHABLE();
 }
 
 a->priv = 1;
-- 
2.30.2




[PATCH 1/6] x86: Introduce x86_decode_lite()

2024-04-22 Thread Andrew Cooper
In order to relocate all IP-relative fields in an alternative replacement
block, we need to decode the instructions enough to obtain their length and
any relative fields.

Full x86_decode() is far too heavyweight, so introduce a minimal form which
can make several simplifying assumptions.

This logic is sufficient to decode all alternative blocks that exist in Xen
right now.
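To make the shape of the simplification concrete, here is the table-driven
idea in miniature (illustrative only, far smaller than the real decoder; the
flag values and opcode coverage are invented for the example):

```c
/* Illustrative sketch (not the Xen implementation) of the table-driven
 * idea: each opcode byte maps to flags describing what follows, which is
 * enough to derive the instruction length and the position of any
 * IP-relative displacement.  Only a handful of opcodes are filled in. */
#include <assert.h>
#include <stdint.h>

#define IMM8   (1 << 0)
#define IMM32  (1 << 1)
#define BRANCH (1 << 5)
#define KNOWN  (1 << 7)

/* Returns length (-1 if unknown); *rel = offset of the disp field, or -1. */
static int decode_lite(const uint8_t *ip, int *rel)
{
    static const uint8_t onebyte[256] = {
        [0xcc] = KNOWN,                  /* INT3 */
        [0xe8] = KNOWN | BRANCH | IMM32, /* CALL disp32 */
        [0xe9] = KNOWN | BRANCH | IMM32, /* JMP  disp32 */
        [0xeb] = KNOWN | BRANCH | IMM8,  /* JMP  disp8  */
    };
    uint8_t flags = onebyte[*ip];
    int len = 1;

    *rel = -1;
    if ( !(flags & KNOWN) )
        return -1;              /* Unrecognised: caller must bail out. */

    if ( flags & (IMM8 | IMM32) )
    {
        if ( flags & BRANCH )
            *rel = len;         /* Displacement immediately follows. */
        len += (flags & IMM8) ? 1 : 4;
    }

    return len;
}
```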

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/x86_emulate/Makefile  |   1 +
 xen/arch/x86/x86_emulate/decode-lite.c | 245 +
 xen/arch/x86/x86_emulate/private.h |   2 +
 xen/arch/x86/x86_emulate/x86_emulate.h |  17 ++
 4 files changed, 265 insertions(+)
 create mode 100644 xen/arch/x86/x86_emulate/decode-lite.c

diff --git a/xen/arch/x86/x86_emulate/Makefile b/xen/arch/x86/x86_emulate/Makefile
index 2e20d65d788a..f06731913d67 100644
--- a/xen/arch/x86/x86_emulate/Makefile
+++ b/xen/arch/x86/x86_emulate/Makefile
@@ -3,6 +3,7 @@ obj-y += 0fae.o
 obj-y += 0fc7.o
 obj-y += blk.o
 obj-y += decode.o
+obj-y += decode-lite.o
 obj-$(CONFIG_HVM) += fpu.o
 obj-y += util.o
 obj-y += util-xen.o
diff --git a/xen/arch/x86/x86_emulate/decode-lite.c b/xen/arch/x86/x86_emulate/decode-lite.c
new file mode 100644
index ..050b0d8cf4a8
--- /dev/null
+++ b/xen/arch/x86/x86_emulate/decode-lite.c
@@ -0,0 +1,245 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include "private.h"
+
+#define Imm8   (1 << 0)
+#define Imm(1 << 1)
+#define Branch (1 << 5) /* ... that we care about */
+/*  ModRM  (1 << 6) */
+#define Known  (1 << 7)
+
+#define ALU_OPS \
+(Known|ModRM),  \
+(Known|ModRM),  \
+(Known|ModRM),  \
+(Known|ModRM),  \
+(Known|Imm8),   \
+(Known|Imm)
+
+static const uint8_t init_or_livepatch_const onebyte[256] = {
+[0x00] = ALU_OPS, /* ADD */ [0x08] = ALU_OPS, /* OR  */
+[0x10] = ALU_OPS, /* ADC */ [0x18] = ALU_OPS, /* SBB */
+[0x20] = ALU_OPS, /* AND */ [0x28] = ALU_OPS, /* SUB */
+[0x30] = ALU_OPS, /* XOR */ [0x38] = ALU_OPS, /* CMP */
+
+[0x50 ... 0x5f] = (Known), /* PUSH/POP %reg */
+
+[0x70 ... 0x7f] = (Known|Branch|Imm8), /* Jcc disp8 */
+[0x80]  = (Known|ModRM|Imm8),
+[0x81]  = (Known|ModRM|Imm),
+
+[0x83]  = (Known|ModRM|Imm8),
+[0x84 ... 0x8e] = (Known|ModRM),   /* TEST/XCHG/MOV/MOV-SREG/LEA r/rm */
+
+[0xb0 ... 0xb7] = (Known|Imm8),/* MOV $imm8, %reg */
+[0xb8 ... 0xbf] = (Known|Imm), /* MOV $imm32, %reg */
+
+[0xcc]  = (Known), /* INT3 */
+[0xcd]  = (Known|Imm8),/* INT $imm8 */
+
+[0xe8 ... 0xe9] = (Known|Branch|Imm),  /* CALL/JMP disp32 */
+[0xeb]  = (Known|Branch|Imm8), /* JMP disp8 */
+
+[0xf1]  = (Known), /* ICEBP */
+[0xf4]  = (Known), /* HLT */
+[0xf5]  = (Known), /* CMC */
+[0xf6 ... 0xf7] = (Known|ModRM),   /* Grp3 */
+[0xf8 ... 0xfd] = (Known), /* CLC ... STD */
+[0xfe ... 0xff] = (Known|ModRM),   /* Grp4 */
+};
+static const uint8_t init_or_livepatch_const twobyte[256] = {
+[0x00 ... 0x01] = (Known|ModRM),   /* Grp6/Grp7 */
+
+[0x18 ... 0x1f] = (Known|ModRM),   /* Grp16 (Hint Nop) */
+
+[0x20 ... 0x23] = (Known|ModRM),   /* MOV CR/DR */
+
+[0x30 ... 0x34] = (Known), /* WRMSR ... RDPMC */
+[0x40 ... 0x4f] = (Known|ModRM),   /* CMOVcc */
+
+[0x80 ... 0x8f] = (Known|Branch|Imm),  /* Jcc disp32 */
+[0x90 ... 0x9f] = (Known|ModRM),   /* SETcc */
+
+[0xab]  = (Known|ModRM),   /* BTS */
+[0xac]  = (Known|ModRM|Imm8),  /* SHRD $imm8 */
+[0xad ... 0xaf] = (Known|ModRM),   /* SHRD %cl / Grp15 / IMUL */
+
+[0xb8 ... 0xb9] = (Known|ModRM),   /* POPCNT/Grp10 (UD1) */
+[0xba]  = (Known|ModRM|Imm8),  /* Grp8 */
+[0xbb ... 0xbf] = (Known|ModRM),   /* BSR/BSF/BSR/MOVSX */
+};
+
+/*
+ * Bare minimum x86 instruction decoder to parse the alternative replacement
+ * instructions and locate the IP-relative references that may need updating.
+ *
+ * These are:
+ *  - disp8/32 from near branches
+ *  - RIP-relative memory references
+ *
+ * The following simplifications are used:
+ *  - All code is 64bit, and the instruction stream is safe to read.
+ *  - The 67 prefix is not implemented, so the address size is only 64bit.
+ *
+ * Inputs:
+ *  @ip  The position to start decoding from.
+ *  @end End of the replacement block.  Exceeding this is considered an error.
+ *
+ * Returns: x86_decode_lite_t
+ *  - On failure, length of -1.
+ *  - On success, length > 0 and REL_TYPE_*.  For REL_TYPE != NONE, rel points
+ *at t

[PATCH 6/6] x86/spec-ctrl: Introduce and use DO_COND_BHB_SEQ

2024-04-22 Thread Andrew Cooper
Now that alternatives can fix up call displacements even when they're not the
first instruction of the replacement, move the SCF_entry_bhb conditional
inside the replacement block.

This removes a conditional branch from the fastpaths of BHI-unaffected
hardware.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/hvm/vmx/entry.S | 12 +++
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 43 +---
 2 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 7233e771d88a..77bf9ea564ea 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -62,12 +62,12 @@ ENTRY(vmx_asm_vmexit_handler)
  * Clear the BHB to mitigate BHI.  Used on eIBRS parts, and uses RETs
  * itself so must be after we've perfomed all the RET-safety we can.
  */
-testb $SCF_entry_bhb, CPUINFO_scf(%rsp)
-jz .L_skip_bhb
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L_skip_bhb:
+.macro VMX_BHB_SEQ fn:req
+DO_COND_BHB_SEQ \fn scf=CPUINFO_scf(%rsp)
+.endm
+ALTERNATIVE_2 "", \
+"VMX_BHB_SEQ fn=clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"VMX_BHB_SEQ fn=clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_VMX
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 729a830411eb..559dad88f967 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -92,6 +92,21 @@
 .L\@_skip:
 .endm
 
+.macro DO_COND_BHB_SEQ fn:req, scf=%bl
+/*
+ * Requires SCF (defaults to %rbx), fn=clear_bhb_{loops,tsx}
+ * Clobbers %rax, %rcx
+ *
+ * Conditionally use a BHB clearing software sequence.
+ */
+testb  $SCF_entry_bhb, \scf
+jz .L\@_skip_bhb
+
+call   \fn
+
+.L\@_skip_bhb:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax, xu
 /*
  * Requires nothing
@@ -277,12 +292,9 @@
  * Clear the BHB to mitigate BHI.  Used on eIBRS parts, and uses RETs
  * itself so must be after we've perfomed all the RET-safety we can.
  */
-testb $SCF_entry_bhb, %bl
-jz .L\@_skip_bhb
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L\@_skip_bhb:
+ALTERNATIVE_2 "",  \
+"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"DO_COND_BHB_SEQ clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_PV
 .endm
@@ -322,12 +334,9 @@
 ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
 X86_FEATURE_SC_MSR_PV
 
-testb $SCF_entry_bhb, %bl
-jz .L\@_skip_bhb
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L\@_skip_bhb:
+ALTERNATIVE_2 "",  \
+"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"DO_COND_BHB_SEQ clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_INTR
 .endm
@@ -433,13 +442,9 @@
  * Clear the BHB to mitigate BHI.  Used on eIBRS parts, and uses RETs
  * itself so must be after we've perfomed all the RET-safety we can.
  */
-testb $SCF_entry_bhb, %bl
-jz .L\@_skip_bhb
-
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L\@_skip_bhb:
+ALTERNATIVE_2 "",  \
+"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"DO_COND_BHB_SEQ clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 lfence
 .endm
-- 
2.30.2




[PATCH 3/6] x86/alternative: Intend the relocation logic

2024-04-22 Thread Andrew Cooper
... to make subsequent patches legible.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 126 +++--
 1 file changed, 64 insertions(+), 62 deletions(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 5bd256365def..2ca4dfd569bc 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -244,78 +244,80 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
 
 memcpy(buf, repl, a->repl_len);
 
-/* 0xe8/0xe9 are relative branches; fix the offset. */
-if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
 {
-/*
- * Detect the special case of indirect-to-direct branch patching:
- * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; already
- *   checked above),
- * - replacement's displacement is -5 (pointing back at the very
- *   insn, which makes no sense in a real replacement insn),
- * - original is an indirect CALL/JMP (opcodes 0xFF/2 or 0xFF/4)
- *   using RIP-relative addressing.
- * Some branch destinations may still be NULL when we come here
- * the first time. Defer patching of those until the post-presmp-
- * initcalls re-invocation (with force set to true). If at that
- * point the branch destination is still NULL, insert "UD2; UD0"
- * (for ease of recognition) instead of CALL/JMP.
- */
-if ( a->cpuid == X86_FEATURE_ALWAYS &&
- *(int32_t *)(buf + 1) == -5 &&
- a->orig_len >= 6 &&
- orig[0] == 0xff &&
- orig[1] == (*buf & 1 ? 0x25 : 0x15) )
+/* 0xe8/0xe9 are relative branches; fix the offset. */
+if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
 {
-long disp = *(int32_t *)(orig + 2);
-const uint8_t *dest = *(void **)(orig + 6 + disp);
-
-if ( dest )
+/*
+ * Detect the special case of indirect-to-direct branch patching:
+ * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; already
+ *   checked above),
+ * - replacement's displacement is -5 (pointing back at the very
+ *   insn, which makes no sense in a real replacement insn),
+ * - original is an indirect CALL/JMP (opcodes 0xFF/2 or 0xFF/4)
+ *   using RIP-relative addressing.
+ * Some branch destinations may still be NULL when we come here
+ * the first time. Defer patching of those until the post-presmp-
+ * initcalls re-invocation (with force set to true). If at that
+ * point the branch destination is still NULL, insert "UD2; UD0"
+ * (for ease of recognition) instead of CALL/JMP.
+ */
+if ( a->cpuid == X86_FEATURE_ALWAYS &&
+ *(int32_t *)(buf + 1) == -5 &&
+ a->orig_len >= 6 &&
+ orig[0] == 0xff &&
+ orig[1] == (*buf & 1 ? 0x25 : 0x15) )
 {
-/*
- * When building for CET-IBT, all function pointer targets
- * should have an endbr64 instruction.
- *
- * If this is not the case, leave a warning because
- * something is probably wrong with the build.  A CET-IBT
- * enabled system might have exploded already.
- *
- * Otherwise, skip the endbr64 instruction.  This is a
- * marginal perf improvement which saves on instruction
- * decode bandwidth.
- */
-if ( IS_ENABLED(CONFIG_XEN_IBT) )
+long disp = *(int32_t *)(orig + 2);
+const uint8_t *dest = *(void **)(orig + 6 + disp);
+
+if ( dest )
 {
-if ( is_endbr64(dest) )
-dest += ENDBR64_LEN;
-else
-printk(XENLOG_WARNING
-   "altcall %ps dest %ps has no endbr64\n",
-   orig, dest);
+/*
+ * When building for CET-IBT, all function pointer 
targets
+ * should have an endbr64 instruction.
+ *
+ * If this is not the case, leave a warning bec

[PATCH 0/6] x86/alternatives: Adjust all insn-relative fields

2024-04-22 Thread Andrew Cooper
Alternatives have had a reasonably severe restriction since their
introduction.  This has been the source of several bugs, and several
inefficiencies particularly in the speculative safety paths, and I've finally
gotten bored enough to fix it.

Introduce the new infrastructure, and adjust the BHB scrubbing logic to use
it.

Andrew Cooper (6):
  x86: Introduce x86_decode_lite()
  x86/alternative: Walk all replacements in debug builds
  x86/alternative: Intend the relocation logic
  x86/alternative: Replace a continue with a goto
  x86/alternative: Relocate all insn-relative fields
  x86/spec-ctrl: Introduce and use DO_COND_BHB_SEQ

 xen/arch/x86/alternative.c   | 210 +--
 xen/arch/x86/hvm/vmx/entry.S |  12 +-
 xen/arch/x86/include/asm/spec_ctrl_asm.h |  43 ++--
 xen/arch/x86/x86_emulate/Makefile|   1 +
 xen/arch/x86/x86_emulate/decode-lite.c   | 245 +++
 xen/arch/x86/x86_emulate/private.h   |   2 +
 xen/arch/x86/x86_emulate/x86_emulate.h   |  17 ++
 7 files changed, 445 insertions(+), 85 deletions(-)
 create mode 100644 xen/arch/x86/x86_emulate/decode-lite.c

-- 
2.30.2




[PATCH 4/6] x86/alternative: Replace a continue with a goto

2024-04-22 Thread Andrew Cooper
A subsequent patch is going to insert a loop, which interferes with the
continue in the devirtualisation logic.

Replace it with a goto, and a paragraph explaining why we intentionally avoid
setting a->priv = 1.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 2ca4dfd569bc..c86ea235e865 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -308,7 +308,15 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
 buf[4] = 0xff;
 }
 else
-continue;
+{
+/*
+ * The function pointer we're wanting to devirtualise
+ * is still NULL, and we're not sealing yet.  Leave
+ * the alternative fully un-processed, in order to
+ * process it the next time around.
+ */
+goto skip_this_alternative;
+}
 }
 else if ( force && system_state < SYS_STATE_active )
 ASSERT_UNREACHABLE();
@@ -323,6 +331,7 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
 
 add_nops(buf + a->repl_len, total_len - a->repl_len);
 text_poke(orig, buf, total_len);
+skip_this_alternative:;
 }
 
 /*
-- 
2.30.2




Re: [PATCH v2 1/2] xen: introduce header file with section related symbols

2024-04-19 Thread Andrew Cooper
On 19/04/2024 11:12 am, Jan Beulich wrote:
> On 19.04.2024 12:08, Andrew Cooper wrote:
>> On 19/04/2024 11:02 am, Roger Pau Monne wrote:
>>> Start by declaring the beginning and end of the init section.
>>>
>>> No functional change intended.
>>>
>>> Requested-by: Andrew Cooper 
>>> Signed-off-by: Roger Pau Monné 
>> TYVM for doing this.  There's a lot of cleanup which can follow on for it.
>>
>> Reviewed-by: Andrew Cooper 
>>
>> although if anyone has a better name than sections.h then speak now.
> For what is put there now, and for any other section bounds markers the
> name is fine with me. I'm not presently convinced though we want to put
> __read_mostly and friends there.

Well, that's exactly what I intend to clean up into it, because it's far
better in sections.h than (duplicated per-arch) cache.h.

(Also I intend to strip down kernel.h for the other major sections too.)

~Andrew



Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-19 Thread Andrew Cooper
On 19/04/2024 11:02 am, Roger Pau Monne wrote:
> Livepatch payloads containing symbols that belong to init sections can only
> lead to page faults later on, as by the time the livepatch is loaded init
> sections have already been freed.
>
> Refuse to resolve such symbols and return an error instead.
>
> Note such resolutions are only relevant for symbols that point to undefined
> sections (SHN_UNDEF), as that implies the symbol is not in the current payload
> and hence must either be a Xen or a different livepatch payload symbol.
>
> Do not allow to resolve symbols that point to __init_begin, as that address is
> also unmapped.  On the other hand, __init_end is not unmapped, and hence allow
> resolutions against it.
>
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Andrew Cooper , although ...

> ---
> Changes since v1:
>  - Fix off-by-one in range checking.
> ---
>  xen/common/livepatch_elf.c | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
> index 45d73912a3cd..a67101eadc02 100644
> --- a/xen/common/livepatch_elf.c
> +++ b/xen/common/livepatch_elf.c
> @@ -4,6 +4,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct livepatch_elf *elf)
>  break;
>  }
>  }
> +
> +/*
> + * Ensure not an init symbol.  Only applicable to Xen symbols, as
> + * livepatch payloads don't have init sections or equivalent.
> + */
> +else if ( st_value >= (uintptr_t)&__init_begin &&
> +  st_value < (uintptr_t)&__init_end )

... I normally vertically align the (casts) in cases like this for improved
legibility.  Happy to fold on commit.
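For illustration, the range check under discussion can be exercised stand-alone. This is a hedged sketch with stand-in section bounds (the real `__init_begin`/`__init_end` are linker-provided symbols), showing the half-open range that deliberately permits resolutions against `__init_end`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the linker-provided init section bounds. */
static char fake_init_section[64];
#define INIT_BEGIN (&fake_init_section[0])
#define INIT_END   (&fake_init_section[64]) /* one-past-the-end is valid */

/*
 * Half-open range check, as in the quoted patch: __init_begin itself is
 * rejected, __init_end itself is allowed.  Casts vertically aligned, per
 * the review comment.
 */
static bool is_init_symbol(uintptr_t st_value)
{
    return st_value >= (uintptr_t)INIT_BEGIN &&
           st_value <  (uintptr_t)INIT_END;
}
```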

~Andrew



Re: [PATCH v2 1/2] xen: introduce header file with section related symbols

2024-04-19 Thread Andrew Cooper
On 19/04/2024 11:02 am, Roger Pau Monne wrote:
> Start by declaring the beginning and end of the init section.
>
> No functional change intended.
>
> Requested-by: Andrew Cooper 
> Signed-off-by: Roger Pau Monné 

TYVM for doing this.  There's a lot of cleanup which can follow on for it.

Reviewed-by: Andrew Cooper 

although if anyone has a better name than sections.h then speak now.



Re: [XEN PATCH] docs/misra: mark the gzip folder as adopted code

2024-04-18 Thread Andrew Cooper
On 18/04/2024 8:39 am, Jan Beulich wrote:
> On 15.04.2024 17:44, Andrew Cooper wrote:
>> On 15/04/2024 10:56 am, Federico Serafini wrote:
>>> Mark the whole gzip folder as adopted code and remove the redundant
>>> deviation of file inflate.
>>>
>>> Signed-off-by: Federico Serafini 
>> Acked-by: Andrew Cooper 
>>
>> I hadn't realised that we had a special case like this.  Definitely
>> better to get rid of it.
>>
>> I've pulled this into my `for-next` branch, and will get it committed
>> properly when OSSTest (our non-gitlab CI) is in a happier state.
> Hmm. Considering Daniel's work (which I'll comment on separately), is this
> really going to remain "adopted"? We're about to diverge to a degree where
> simply taking patches from the original source isn't going to work anymore.
> IOW I think we want either Daniel's work (and perhaps follow-on adjustments)
> or marking of that code as adopted.

inflate.c was taken from Linux in 2010.  There's only one build fix and one
comment change in Linux since 2010, whereas Xen's copy has seen several
bugfixes and cleanups.

gunzip.c has floated around rather more (it was originally some glue
code in bzImage.c) but it was entirely rewritten, first to support other
types of decompression (we did this differently in Xen), and second to
support KASLR.


In both cases, there's no upstream to usefully track, and we should
probably take ownership.

~Andrew



Re: [PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()

2024-04-18 Thread Andrew Cooper
On 17/04/2024 8:14 am, Roger Pau Monné wrote:
> On Tue, Apr 16, 2024 at 04:52:51PM +0100, Andrew Cooper wrote:
>> Misra Rule 21.16 doesn't like the use of memcmp() between a string literal 
>> and
>> a UINT8 array.  Rewrite using plain compares.
> The commit message makes it look like it's a type mismatch issue
> between the two elements being compared, but from my reading of the
> rule the issue is with the usage of a char pointer with memcmp().
> IOW: even if the two parameters are char pointers it would still be a
> violation.
>
> "Misra Rule 21.16 forbids the use of memcmp() against character
> arrays.  Rewrite using plain compares since checking for "PE\0\0"
> cannot be done using strncmp()."

I've tweaked the sentence to say character array, but IMO it's really
not about strncmp() which wouldn't be correct to use here even if it
happened to produce a correct answer.
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper 
> LGTM (possibly pending the adjustment of the commit message):
>
> Acked-by: Roger Pau Monné 

Thanks.



Re: [PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()

2024-04-18 Thread Andrew Cooper
On 18/04/2024 12:09 pm, Jan Beulich wrote:
> On 16.04.2024 17:52, Andrew Cooper wrote:
>> Misra Rule 21.16 doesn't like the use of memcmp() between a string literal 
>> and
>> a UINT8 array.  Rewrite using plain compares.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks.

> after having realized that sadly gcc13 instantiates the compound literals
> (that I had thought of using) in .rodata (or one of its variants), rather
> than leveraging them being as constant as string literals.

gcc10 manages to optimise both checks to a cmp{w,l}, which I did check
before sending the patch.

However, it chooses a different base register for the cmpl and there's a
sad cascade effect where a bunch of JMP disp8's turn into disp32's.

But oh well - it's init code.

~Andrew



Re: [PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()

2024-04-18 Thread Andrew Cooper
On 18/04/2024 12:06 pm, Jan Beulich wrote:
> On 17.04.2024 09:14, Roger Pau Monné wrote:
>> On Tue, Apr 16, 2024 at 04:52:51PM +0100, Andrew Cooper wrote:
>>> --- a/xen/common/efi/pe.c
>>> +++ b/xen/common/efi/pe.c
>>> @@ -111,7 +111,8 @@ const void *__init pe_find_section(const void *image, const UINTN image_size,
>>>  UINTN offset, i;
>>>  
>>>  if ( image_size < sizeof(*dos) ||
>>> - memcmp(dos->Magic, "MZ", 2) != 0 )
>>> + dos->Magic[0] != 'M' ||
>>> + dos->Magic[1] != 'Z' )
>> For this one you could likely use strncmp()?
> strncmp() against UINT8[2] wouldn't be liked by the compiler, I guess.

Indeed.  And this MISRA rule is very much "you are mixing string and
non-string types.  Don't do that."

This is a very rare pattern, where we are looking for a binary marker
that just happens to also make sense when expressed as an ASCII string.

Although the MISRA complaint does raise a good point.   The memcmp()
form would malfunction on any system with CHAR_BIT != 8, in a way that
the {u,}int8_t-at-a-time form wouldn't.

~Andrew



Re: [XEN PATCH v1 08/15] x86/vpmu: separate amd/intel vPMU code

2024-04-18 Thread Andrew Cooper
On 18/04/2024 2:25 pm, Sergiy Kibrik wrote:
> 16.04.24 14:05, Andrew Cooper:
>> On 16/04/2024 7:35 am, Sergiy Kibrik wrote:
>>> diff --git a/xen/arch/x86/cpu/Makefile b/xen/arch/x86/cpu/Makefile
>>> index 35561fe51d..d3d7b8fb2e 100644
>>> --- a/xen/arch/x86/cpu/Makefile
>>> +++ b/xen/arch/x86/cpu/Makefile
>>> @@ -10,4 +10,6 @@ obj-y += intel.o
>>>   obj-y += intel_cacheinfo.o
>>>   obj-y += mwait-idle.o
>>>   obj-y += shanghai.o
>>> -obj-y += vpmu.o vpmu_amd.o vpmu_intel.o
>>> +obj-y += vpmu.o
>>> +obj-$(CONFIG_SVM) += vpmu_amd.o
>>> +obj-$(CONFIG_VMX) += vpmu_intel.o
>>
>> I'm afraid this breaks perf counters on PV guests.  These files are
>> joint guest-type implementations.
>>
>> Seeing as you leave vpmu.o alone, I guess that all you're actually
>> wanting to do is compile out vpmu_intel.o?  In which case, use
>> CONFIG_{AMD,INTEL} rather than CONFIG_{SVM,VMX} please.
>>
>
> Thanks for pointing that out.
> I think I'll just exclude this patch from the series, and make a
> separate series with CONFIG_{AMD,INTEL} option and code separation
> that unrelated to VMX/SVM & HVM/PV, only to CPUs themselves.
>
> BTW, how would you suggest CONFIG_{AMD,INTEL} shall relate to
> CONFIG_{SVM,VMX}? Should CONFIG_VMX just plainly depend on CONFIG_AMD,
> or more complex relations needed?

To a first approximation, no linkage.

Centaur have an implementation of VMX on the market, and Hygon have an
implementation of SVM.

~Andrew



Re: [PATCH 1/4] xen/xlat: Sort out whitespace

2024-04-18 Thread Andrew Cooper
On 15/04/2024 10:49 pm, Stefano Stabellini wrote:
> On Mon, 15 Apr 2024, Andrew Cooper wrote:
>>  * Fix tabs/spaces mismatch for certain rows
>>  * Insert lines between header files to improve legibility
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: George Dunlap 
>> CC: Jan Beulich 
>> CC: Stefano Stabellini 
>> CC: Julien Grall 
>> ---
>>  xen/include/xlat.lst | 31 +++
>>  1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
>> index 9c41948514bf..e811342bb096 100644
>> --- a/xen/include/xlat.lst
>> +++ b/xen/include/xlat.lst
>> @@ -20,19 +20,24 @@
>>  # First column indicator:
>>  # ! - needs translation
>>  # ? - needs checking
>> +
>>  ?   dom0_vga_console_info   xen.h
>>  ?   xenctl_bitmap   xen.h
>>  ?   mmu_update  xen.h
>>  !   mmuext_op   xen.h
>>  !   start_info  xen.h
>>  ?   vcpu_time_info  xen.h
>> +
>>  ?   pmu_amd_ctxtarch-x86/pmu.h
>>  ?   pmu_archarch-x86/pmu.h
>>  ?   pmu_cntr_pair   arch-x86/pmu.h
>>  ?   pmu_intel_ctxt  arch-x86/pmu.h
>>  ?   pmu_regsarch-x86/pmu.h
>> +
>>  !   cpu_user_regs   arch-x86/xen-@arch@.h
>> +
>>  !   trap_info   arch-x86/xen.h
>> +
>>  ?   cpu_offline_action  arch-x86/xen-mca.h
>>  ?   mc  arch-x86/xen-mca.h
>>  ?   mcinfo_bank arch-x86/xen-mca.h
>> @@ -50,6 +55,7 @@
>>  ?   mc_notifydomain arch-x86/xen-mca.h
>>  !   mc_physcpuinfo  arch-x86/xen-mca.h
>>  ?   page_offline_action arch-x86/xen-mca.h
>> +
>>  ?   argo_addr   argo.h
>>  !   argo_iovargo.h
>>  ?   argo_register_ring  argo.h
>> @@ -59,6 +65,7 @@
>>  ?   argo_ring_message_headerargo.h
>>  ?   argo_send_addr  argo.h
>>  ?   argo_unregister_ringargo.h
>> +
>>  ?   evtchn_alloc_unboundevent_channel.h
>>  ?   evtchn_bind_interdomain event_channel.h
>>  ?   evtchn_bind_ipi event_channel.h
>> @@ -74,6 +81,7 @@
>>  ?   evtchn_set_priority event_channel.h
>>  ?   evtchn_status   event_channel.h
>>  ?   evtchn_unmask   event_channel.h
>> +
>>  ?   gnttab_cache_flush  grant_table.h
>>  !   gnttab_copy grant_table.h
>>  ?   gnttab_dump_table   grant_table.h
>> @@ -86,9 +94,10 @@
>>  ?   gnttab_get_version  grant_table.h
>>  !   gnttab_get_status_framesgrant_table.h
>>  ?   grant_entry_v1  grant_table.h
>> -?   grant_entry_header  grant_table.h
>> +?   grant_entry_header  grant_table.h
>>  ?   grant_entry_v2  grant_table.h
>>  ?   gnttab_swap_grant_ref   grant_table.h
>> +
>>  !   dm_op_buf   hvm/dm_op.h
>>  ?   dm_op_create_ioreq_server   hvm/dm_op.h
>>  ?   dm_op_destroy_ioreq_server  hvm/dm_op.h
>> @@ -108,15 +117,20 @@
>>  ?   dm_op_set_pci_intx_levelhvm/dm_op.h
>>  ?   dm_op_set_pci_link_routehvm/dm_op.h
>>  ?   dm_op_track_dirty_vram  hvm/dm_op.h
>> +
>>  !   hvm_altp2m_set_mem_access_multi hvm/hvm_op.h
>> +
>>  ?   vcpu_hvm_contexthvm/hvm_vcpu.h
>>  ?   vcpu_hvm_x86_32 hvm/hvm_vcpu.h
>>  ?   vcpu_hvm_x86_64 hvm/hvm_vcpu.h
>> +
>>  ?   hypfs_direntry  hypfs.h
>>  ?   hypfs_dirlistentry  hypfs.h
>> +
>>  ?   kexec_exec  kexec.h
>>  !   kexec_image kexec.h
>>  !   kexec_range kexec.h
>> +
>>  !   add_to_physmap  memory.h
>>  !   add_to_physmap_batchmemory.h
>>  !   foreign_memory_map  memory.h
>> @@ -130,6 +144,7 @@
>>  !   reserved_device_memory_map  memory.h
>>  ?   vmemrange   memory.h
>>  !   vnuma_topology_info memory.h
>> +
>>  ?   physdev_eoi physdev.h
>>  ?   physdev_get_free_pirq   physdev.h
>>  ?   physdev_irq physdev.h
>> @@ -

Re: [PATCH v2 1/5] x86: Update x86 low level version check of microcode

2024-04-18 Thread Andrew Cooper
On 16/04/2024 10:15 am, Fouad Hilly wrote:
> Update microcode version check at Intel and AMD Level by:
> Preventing the low level code from sending errors if the microcode
> version provided is not a newer version. Other errors will be sent like 
> before.
> When the provided microcode version is the same as the current one, code
> to point to microcode provided.
> Microcode version check happens at higher and common level in core.c.
> Keep all the required code at low level that checks for signature and CPU 
> compatibility
>
> [v2]
> Update message description to better describe the changes
>
> Signed-off-by: Fouad Hilly 
> ---


As a general note, your v2/v3/etc changelog needs to go under this --- line.

~Andrew



Re: [PATCH v2 4/6] gzip: refactor state tracking

2024-04-17 Thread Andrew Cooper
On 17/04/2024 3:37 pm, Daniel P. Smith wrote:
> diff --git a/xen/common/gzip/gunzip.c b/xen/common/gzip/gunzip.c
> index 1b448d6e3655..8178a05a0190 100644
> --- a/xen/common/gzip/gunzip.c
> +++ b/xen/common/gzip/gunzip.c
> @@ -4,18 +4,25 @@
>  #include 
>  #include 
>  
> -static unsigned char *__initdata window;
> -
>  #define WSIZE   0x8000U
>  
> -static unsigned char *__initdata inbuf;
> -static unsigned int __initdata insize;
> +struct gzip_state {
> +unsigned char *window;
> +
> +unsigned char *inbuf;
> +unsigned int insize;
> +
> +/* Index of next byte to be processed in inbuf: */
> +unsigned int inptr;

perform_gunzip() passes in an unsigned long size, which means it's
truncated in this state.

Please at least make these size_t.  Life is too short to deal with the
fallout of this going wrong.

>  
> -/* Index of next byte to be processed in inbuf: */
> -static unsigned int __initdata inptr;
> +/* Bytes in output buffer: */
> +unsigned int outcnt;

See later, but I think the comment for outcnt is wrong.

>  
> -/* Bytes in output buffer: */
> -static unsigned int __initdata outcnt;
> +long bytes_out;

See later, but I don't think this can be signed.  It's only used to
cross-check the header-reported length, which is done as an unsigned long.

> +
> +unsigned long bb;  /* bit buffer */
> +unsigned bk;   /* bits in bit buffer */

As this is effectively new code, "unsigned int" please.

> @@ -27,7 +34,7 @@ typedef unsigned char   uch;
>  typedef unsigned short  ush;
>  typedef unsigned long   ulg;
>  
> -#define get_byte()  (inptr < insize ? inbuf[inptr++] : fill_inbuf())
> +#define get_byte() (s->inptr < s->insize ? s->inbuf[s->inptr++] : fill_inbuf())

This is a bit weird:
> $ git grep -w get_byte common/gzip
> common/gzip/gunzip.c:40:#define get_byte() (s->inptr < s->insize ? s->inbuf[s->inptr++] : fill_inbuf())
> common/gzip/inflate.c:224:#define NEXTBYTE(s)  ({ int v = get_byte(); if (v < 0) goto underrun; (uch)v; })
> common/gzip/inflate.c:1131:    flags  = (uch)get_byte();

NEXTBYTE() gets an s parameter, but doesn't pass it to get_byte() and
instead relies on state always having the name 's' in scope.

I'd suggest turning get_byte() into a proper static function.  It will
be more readable than throwing extra ()'s around s in the existing macro.

Except...  fill_inbuf() is a fatal error anyway, so that can be folded
together to simplify the result.
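As a hedged sketch of that suggestion (the struct layout and names are assumptions based on the quoted patch, not the final code), a static-function `get_byte()` with the fatal `fill_inbuf()` path folded into a single underrun result might look like:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed state layout, mirroring the quoted patch. */
struct gzip_state {
    unsigned char *inbuf;
    size_t insize;
    size_t inptr;   /* index of next byte to be processed in inbuf */
};

/*
 * Sketch of get_byte() as a proper static function taking the state
 * explicitly, rather than a macro relying on a variable named 's' being
 * in scope.  The old fill_inbuf() was a fatal error anyway, so it is
 * folded into a plain underrun return.
 */
static int get_byte(struct gzip_state *s)
{
    if ( s->inptr < s->insize )
        return s->inbuf[s->inptr++];

    return -1;  /* underrun */
}
```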


> @@ -72,16 +78,16 @@ static __init void flush_window(void)
>  unsigned int n;
>  unsigned char *in, ch;
>  
> -in = window;
> -for ( n = 0; n < outcnt; n++ )
> +in = s->window;
> +for ( n = 0; n < s->outcnt; n++ )
>  {
>  ch = *in++;
>  c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
>  }
>  crc = c;
>  
> -bytes_out += (unsigned long)outcnt;
> -outcnt = 0;
> +s->bytes_out += (unsigned long)s->outcnt;

I can't figure out why this was written this way, even originally.

AFAICT, outcnt doesn't even match its comment.

/* Bytes in output buffer: */

It looks like it's the number of bytes in window.  This also matches the
"wp" define which I guess is "window position".

Anyway, irrespective of naming, the cast is useless so let's drop it
while modifying the line.

> diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c
> index 512d9bf0ee2e..5735bbcf7eb4 100644
> --- a/xen/common/gzip/inflate.c
> +++ b/xen/common/gzip/inflate.c
> @@ -119,7 +119,7 @@ static char rcsid[] = "#Id: inflate.c,v 0.14 1993/06/10 13:27:04 jloup Exp #";
>  
>  #endif /* !__XEN__ */
>  
> -#define slide window
> +#define slide s->window

Given only 8 uses, I'd expand this in place and drop the macro.

But, there's an entire comment block discussing "slide" which I think is
wrong for Xen's implementation.  That wants to go too, I think.

>  
>  /*
>   * Huffman code lookup table entry--this entry is four bytes for machines
> @@ -143,12 +143,12 @@ struct huft {
>  static int huft_build OF((unsigned *, unsigned, unsigned,
>const ush *, const ush *, struct huft **, int *));
>  static int huft_free OF((struct huft *));
> -static int inflate_codes OF((struct huft *, struct huft *, int, int));
> -static int inflate_stored OF((void));
> -static int inflate_fixed OF((void));
> -static int inflate_dynamic OF((void));
> -static int inflate_block OF((int *));
> -static int inflate OF((void));
> +static int inflate_codes OF((struct gzip_state *, struct huft *, struct huft *, int, int));
> +static int inflate_stored OF((struct gzip_state *));
> +static int inflate_fixed OF((struct gzip_state *));
> +static int inflate_dynamic OF((struct gzip_state *));
> +static int inflate_block OF((struct gzip_state *, int *));
> +static int inflate OF((struct gzip_state *));

These are the only users of OF, and it turns out it's a no-op macro. 
This code would be far less WTF-worthy if it were expanded as part of

Re: [PATCH v2 5/6] gzip: move crc state into consolidated gzip state

2024-04-17 Thread Andrew Cooper
On 17/04/2024 3:37 pm, Daniel P. Smith wrote:
> Signed-off-by: Daniel P. Smith 

The change in type is fine, but does need discussing.  Furthermore, ...

> diff --git a/xen/common/gzip/gunzip.c b/xen/common/gzip/gunzip.c
> index 8178a05a0190..bef324d3d166 100644
> --- a/xen/common/gzip/gunzip.c
> +++ b/xen/common/gzip/gunzip.c
> @@ -74,7 +77,7 @@ static __init void flush_window(struct gzip_state *s)
>   * The window is equal to the output buffer therefore only need to
>   * compute the crc.
>   */
> -unsigned long c = crc;
> +unsigned long c = s->crc;

... this wants to be unsigned int I think.

> diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c
> index 5735bbcf7eb4..c18ce20210b0 100644
> --- a/xen/common/gzip/inflate.c
> +++ b/xen/common/gzip/inflate.c
> @@ -1063,16 +1063,14 @@ static int __init inflate(struct gzip_state *s)
>   *
>   **/
>  
> -static ulg __initdata crc_32_tab[256];
> -static ulg __initdata crc;  /* initialized in makecrc() so it'll reside in bss */
> -#define CRC_VALUE (crc ^ 0xffffffffUL)
> +#define CRC_VALUE (s->crc ^ 0xffffffffUL)

$ git grep CRC_VALUE
common/gzip/inflate.c:1052:#define CRC_VALUE (s->crc ^ 0xffffffffUL)
common/gzip/inflate.c:1207:    if (orig_crc != CRC_VALUE) {

I'd expand this in its single user, but like ...

>  
>  /*
>   * Code to compute the CRC-32 table. Borrowed from
>   * gzip-1.0.3/makecrc.c.
>   */
>  
> -static void __init makecrc(void)
> +static void __init makecrc(struct gzip_state *s)
>  {
>  /* Not copyrighted 1990 Mark Adler */
>  
> @@ -1089,7 +1087,7 @@ static void __init makecrc(void)
>  for (i = 0; i < sizeof(p)/sizeof(int); i++)
>  e |= 1L << (31 - p[i]);
>  
> -crc_32_tab[0] = 0;
> +s->crc_32_tab[0] = 0;
>  
>  for (i = 1; i < 256; i++)
>  {
> @@ -1100,11 +1098,11 @@ static void __init makecrc(void)
>  if (k & 1)
>  c ^= e;
>  }
> -crc_32_tab[i] = c;
> +s->crc_32_tab[i] = c;
>  }
>  
>  /* this is initialized here so this code could reside in ROM */
> -crc = (ulg)0xffffffffUL; /* shift register contents */
> +s->crc = (ulg)0xffffffffUL; /* shift register contents */

... this, the constant should become -1u or ~0u because of the type change.

I'm not sure what to make of the ROM comment, but I suspect it means the
XOR can be dropped with a bit of care too.
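To make the constants concrete, here is a hedged, self-contained sketch of the standard CRC-32 computation the gzip code performs (illustrative names, with the shift register written as ~0u rather than a spelled-out 0xffffffffUL, per the suggestion above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_32_tab[256];

/*
 * Build the CRC-32 lookup table for the standard reflected polynomial
 * 0xEDB88320 -- equivalent to what makecrc() computes from the bit
 * positions, but written in the common bit-at-a-time form.
 */
static void makecrc(void)
{
    for ( unsigned int i = 0; i < 256; i++ )
    {
        uint32_t c = i;

        for ( unsigned int k = 0; k < 8; k++ )
            c = (c & 1) ? 0xedb88320u ^ (c >> 1) : c >> 1;

        crc_32_tab[i] = c;
    }
}

/* One step of the update loop seen in flush_window(). */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
    while ( n-- )
        crc = crc_32_tab[(crc ^ *p++) & 0xff] ^ (crc >> 8);

    return crc;
}
```

The shift register starts at ~0u, and the final value (CRC_VALUE in the quoted code) is the register XORed with ~0u again.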

~Andrew



Re: [PATCH v2 3/6] gzip: remove custom memory allocator

2024-04-17 Thread Andrew Cooper
On 17/04/2024 3:37 pm, Daniel P. Smith wrote:
> All the other decompression routines use xmalloc_bytes(), thus there is no
> reason for gzip to be handling its own allocation of memory. In fact, there is
> a bug somewhere in the allocator as decompression started to break when adding
> additional allocations. Instead of troubleshooting the allocator, replace it
> with xmalloc_bytes().
>
> Signed-off-by: Daniel P. Smith 
> ---
>  xen/common/gzip/gunzip.c  | 17 ++
>  xen/common/gzip/inflate.c | 47 ---
>  2 files changed, 2 insertions(+), 62 deletions(-)

Good riddance.

Reviewed-by: Andrew Cooper 



Re: [PATCH v2 6/6] gzip: drop huffman code table tracking

2024-04-17 Thread Andrew Cooper
On 17/04/2024 3:37 pm, Daniel P. Smith wrote:
> The "tracking" bits does not appear to be used, so dropping from the code.
>
> Signed-off-by: Daniel P. Smith 
> ---
>  xen/common/gzip/inflate.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c
> index c18ce20210b0..15bc187c2bbe 100644
> --- a/xen/common/gzip/inflate.c
> +++ b/xen/common/gzip/inflate.c
> @@ -264,8 +264,6 @@ static const int dbits = 6;  /* bits in base distance lookup table */
>  #define BMAX 16 /* maximum bit length of any code (16 for explode) */
>  #define N_MAX 288   /* maximum number of codes in any set */
>  
> -static unsigned __initdata hufts;  /* track memory usage */
> -
>  /*
>   * Given a list of code lengths and a maximum table size, make a set of
>   * tables to decode that set of codes.  Return zero on success, one if
> @@ -445,7 +443,6 @@ static int __init huft_build(
>  goto out;
>  }
>  DEBG1("4 ");
> -hufts += z + 1; /* track memory usage */
>  *t = q + 1; /* link to list for huft_free() */
>  *(t = &(q->v.t)) = (struct huft *)NULL;
>  u[h] = ++q; /* table starts after link */
> @@ -1028,15 +1025,12 @@ static int __init inflate(struct gzip_state *s)
>  /* decompress until the last block */
>  h = 0;
>  do {
> -hufts = 0;
>  #ifdef ARCH_HAS_DECOMP_WDOG
>  arch_decomp_wdog();
>  #endif
>  r = inflate_block(s, &e);
>  if (r)
>  return r;
> -if (hufts > h)
> -h = hufts;
>  } while (!e);

With 'hufts' removed, the local variable 'h' is now dead too.  It gets
read inside an #ifdef DEBUG, but as it's rendering an unqualified number
to stderr, it can also be deleted.

Specifically, I recommend this additional delta:

diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c
index 15bc187c2bbe..13015bb45f4a 100644
--- a/xen/common/gzip/inflate.c
+++ b/xen/common/gzip/inflate.c
@@ -1015,7 +1015,6 @@ static int __init inflate(struct gzip_state *s)
 {
 int e;    /* last block flag */
 int r;    /* result code */
-    unsigned h;   /* maximum struct huft's malloc'ed */
 
 /* initialize window, bit buffer */
 wp = 0;
@@ -1023,7 +1022,6 @@ static int __init inflate(struct gzip_state *s)
 s->bb = 0;
 
 /* decompress until the last block */
-    h = 0;
 do {
 #ifdef ARCH_HAS_DECOMP_WDOG
 arch_decomp_wdog();
@@ -1045,9 +1043,6 @@ static int __init inflate(struct gzip_state *s)
 flush_output(s, wp);
 
 /* return success */
-#ifdef DEBUG
-    fprintf(stderr, "<%u> ", h);
-#endif /* DEBUG */
 return 0;
 }
 

which I'm happy to fold on commit.

~Andrew



Re: [PATCH v2 1/6] gzip: drop unused define checks

2024-04-17 Thread Andrew Cooper
On 17/04/2024 3:37 pm, Daniel P. Smith wrote:
> Dropping the define checks for PKZIP_BUG_WORKAROUND and NOMEMCPY define as 
> they
> never are set.
>
> Signed-off-by: Daniel P. Smith 

It looks like ARCH_HAS_DECOMP_WDOG is another one that can go.

There's only a single instance, in inflate(), which I'm happy to fold on
commit.

~Andrew



[PATCH] xen/efi: Rewrite DOS/PE magic checking without memcmp()

2024-04-16 Thread Andrew Cooper
Misra Rule 21.16 doesn't like the use of memcmp() between a string literal and
a UINT8 array.  Rewrite using plain compares.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: consult...@bugseng.com 
CC: Roberto Bagnara 
CC: Federico Serafini 
CC: Nicola Vetrini 
---
 xen/common/efi/pe.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/common/efi/pe.c b/xen/common/efi/pe.c
index a84992df9afe..ef8a2543e0a1 100644
--- a/xen/common/efi/pe.c
+++ b/xen/common/efi/pe.c
@@ -111,7 +111,8 @@ const void *__init pe_find_section(const void *image, const UINTN image_size,
 UINTN offset, i;
 
 if ( image_size < sizeof(*dos) ||
- memcmp(dos->Magic, "MZ", 2) != 0 )
+ dos->Magic[0] != 'M' ||
+ dos->Magic[1] != 'Z' )
 return NULL;
 
 offset = dos->ExeHeader;
@@ -119,7 +120,10 @@ const void *__init pe_find_section(const void *image, const UINTN image_size,
 
 offset += sizeof(*pe);
 if ( image_size < offset ||
- memcmp(pe->Magic, "PE\0\0", 4) != 0 )
+ pe->Magic[0] != 'P' ||
+ pe->Magic[1] != 'E' ||
+ pe->Magic[2] != '\0' ||
+ pe->Magic[3] != '\0' )
 return NULL;
 
 if ( pe->FileHeader.Machine != PE_HEADER_MACHINE )

base-commit: c0f890cd9d5fd2c17a1e3110cb26f98c90ce8429
-- 
2.30.2




Re: [PATCH] x86/hvm: Allow supplying a dynamic start ASID

2024-04-16 Thread Andrew Cooper
On 16/04/2024 9:54 am, Vaishali Thakkar wrote:
> Currently, Xen always starts the ASID allocation at 1. But
> for SEV technologies the ASID space is divided. This is
> because it's a security issue if a guest is started as
> ES/SNP and is migrated to SEV-only. So, the types are
> tracked explicitly.
>
> Thus, in preparation of SEV support in Xen, add min_asid
> to allow supplying the dynamic start ASID during the
> allocation process.
>
> Signed-off-by: Vaishali Thakkar 

Mechanically, this is fine, but is it going to be useful in the long term?

For SEV and SEV-ES/SNP, we must run the VM in the single fixed ASID
negotiated with the ASP.  This is not optional.

For non-encrypted VMs, we are free to choose between using the remaining
ASID space as we used to (i.e. run it per-pCPU and tick it over to flush
the TLBs), or to run it in a fixed ASID per guest too.

The latter lets us use INVLPGB, and would avoid maintaining two
different TLB-tagging schemes.


I'd say that we absolutely do want INVLPGB support for managing
non-encrypted VMs, and we cannot mix both fixed and floating ASIDs at
the same time.

We should seriously consider whether it's worth maintaining two schemes,
or just switching wholesale to a fixed ASID scheme.

I don't have a good answer here...  If it were anything but TLB
flushing, it would be an obvious choice, but TLB handling bugs are some
of the nastiest to show up.
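For context, the "floating" per-pCPU scheme being weighed against fixed per-guest ASIDs can be sketched as follows (a minimal illustration with invented names and constants, not Xen's implementation): each pCPU hands out ASIDs from a generation, and when the space is exhausted, the generation ticks over, forcing a full TLB flush and fresh ASIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_ASID 1u     /* would be raised where SEV reserves low ASIDs */
#define MAX_ASID 1024u  /* illustrative; real limit comes from CPUID */

struct asid_core {
    uint64_t generation;
    uint32_t next_asid;
};

struct vcpu_asid {
    uint64_t generation;
    uint32_t asid;
};

/* Returns true if the caller must flush the TLB before running the vCPU. */
static bool asid_assign(struct asid_core *core, struct vcpu_asid *v)
{
    bool need_flush = false;

    if ( core->next_asid > MAX_ASID )
    {
        /* Space exhausted: new generation, flush, restart numbering. */
        core->generation++;
        core->next_asid = MIN_ASID;
        need_flush = true;
    }

    if ( v->generation != core->generation )
    {
        /* Stale vCPU: give it a fresh ASID from the current generation. */
        v->asid = core->next_asid++;
        v->generation = core->generation;
    }

    return need_flush;
}
```

A fixed-ASID scheme, by contrast, would pin each VM to one ASID for its lifetime, which is what INVLPGB-based remote invalidation wants.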

~Andrew



Re: [PATCH] x86/svm: Add flushbyasid in the supported features

2024-04-16 Thread Andrew Cooper
On 16/04/2024 10:08 am, Vaishali Thakkar wrote:
> TLB Flush by ASID is missing in the list of supported features
> here. So, add it.
>
> Signed-off-by: Vaishali Thakkar 
> ---
>  xen/arch/x86/hvm/svm/svm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index a745acd903..4719fffae5 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2510,6 +2510,7 @@ const struct hvm_function_table * __init start_svm(void)
>  P(cpu_has_svm_lbrv, "Last Branch Record (LBR) Virtualisation");
>  P(cpu_has_svm_nrips, "Next-RIP Saved on #VMEXIT");
>  P(cpu_has_svm_cleanbits, "VMCB Clean Bits");
> +P(cpu_has_svm_flushbyasid, "TLB flush by ASID");
>  P(cpu_has_svm_decode, "DecodeAssists");
>  P(cpu_has_svm_vloadsave, "Virtual VMLOAD/VMSAVE");
>  P(cpu_has_svm_vgif, "Virtual GIF");

This is consistent with pre-existing behaviour, so

Acked-by: Andrew Cooper 

However, an ever-increasing list of lines like this is something I'm
trying to push back against.

They don't match the configured state of VMs in the system, not least
because one of the things required to fix security vulnerabilities in
nested virt is to break the (false) assumption that there is a single
global state of how a VM is configured.

These ones in particular are just about to appear in CPU policies.

~Andrew



Re: [XEN PATCH v1 13/15] x86: wire cpu_has_{svm/vmx}_* to false when svm/vmx not enabled

2024-04-16 Thread Andrew Cooper
On 16/04/2024 7:46 am, Sergiy Kibrik wrote:
> From: Xenia Ragiadakou 
>
> To be able to use cpu_has_{svm/vmx}_* macros in common code without enclosing
> them inside #ifdef guards when the respective virtualization technology is
> not enabled, define corresponding helper routines as false when not 
> applicable.
>
> No functional change intended.
>
> Signed-off-by: Xenia Ragiadakou 
> Signed-off-by: Sergiy Kibrik 
> ---
>  xen/arch/x86/include/asm/hvm/svm/svm.h  | 8 
>  xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 7 +++
>  2 files changed, 15 insertions(+)
>
> diff --git a/xen/arch/x86/include/asm/hvm/svm/svm.h b/xen/arch/x86/include/asm/hvm/svm/svm.h
> index 4eeeb25da9..7e8cdb4a27 100644
> --- a/xen/arch/x86/include/asm/hvm/svm/svm.h
> +++ b/xen/arch/x86/include/asm/hvm/svm/svm.h
> @@ -38,10 +38,18 @@ extern u32 svm_feature_flags;
>  #define SVM_FEATURE_SSS   19 /* NPT Supervisor Shadow Stacks */
>  #define SVM_FEATURE_SPEC_CTRL 20 /* MSR_SPEC_CTRL virtualisation */
>  
> +#ifdef CONFIG_SVM
>  static inline bool cpu_has_svm_feature(unsigned int feat)
>  {
>  return svm_feature_flags & (1u << feat);
>  }
> +#else
> +static inline bool cpu_has_svm_feature(unsigned int feat)
> +{
> +return false;
> +}
> +#endif
> +
>  #define cpu_has_svm_npt   cpu_has_svm_feature(SVM_FEATURE_NPT)
>  #define cpu_has_svm_lbrv  cpu_has_svm_feature(SVM_FEATURE_LBRV)
>  #define cpu_has_svm_svml  cpu_has_svm_feature(SVM_FEATURE_SVML)
> diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h 
> b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> index fd197e2603..2d927d3100 100644
> --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> @@ -287,10 +287,17 @@ extern uint64_t vmx_tertiary_exec_control;
>  #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x800ULL
>  extern u64 vmx_ept_vpid_cap;
>  
> +#ifdef CONFIG_VMX
>  static inline bool vmx_ctrl_has_feature(uint64_t control, unsigned long 
> feature)
>  {
>  return control & feature;
>  }
> +#else
> +static inline bool vmx_ctrl_has_feature(uint64_t control, unsigned long 
> feature)
> +{
> +return false;
> +}
> +#endif
>  
>  #define VMX_MISC_ACTIVITY_MASK  0x01c0
>  #define VMX_MISC_PROC_TRACE 0x4000

I'm afraid this is going in an unhelpful direction.  We want to move
both of these files to be local to arch/x86/hvm/{vmx,svm}/.

cpu_has_svm_* isn't actually used outside of svm/; only the plain
SVM_FEATURE_* constants are, and that's only because they're not
expressed as plain cpu features yet.

cpu_has_vmx_* has a few more users, but most are unlikely to remain in
this form.  One critical set of changes to fix vulnerabilities in
nested-virt is to make almost all of these decisions based on per-domain
state, not host state.  The aspects which are host state should be in
regular cpu features.

I already volunteered to sort out the SEV feature leaf properly, and I
was going to do the SVM leaf while I was at it.  If you can wait a few
days, I might be able to make half of this problem disappear.

~Andrew



Re: [XEN PATCH v1 08/15] x86/vpmu: separate amd/intel vPMU code

2024-04-16 Thread Andrew Cooper
On 16/04/2024 7:35 am, Sergiy Kibrik wrote:
> diff --git a/xen/arch/x86/cpu/Makefile b/xen/arch/x86/cpu/Makefile
> index 35561fe51d..d3d7b8fb2e 100644
> --- a/xen/arch/x86/cpu/Makefile
> +++ b/xen/arch/x86/cpu/Makefile
> @@ -10,4 +10,6 @@ obj-y += intel.o
>  obj-y += intel_cacheinfo.o
>  obj-y += mwait-idle.o
>  obj-y += shanghai.o
> -obj-y += vpmu.o vpmu_amd.o vpmu_intel.o
> +obj-y += vpmu.o
> +obj-$(CONFIG_SVM) += vpmu_amd.o
> +obj-$(CONFIG_VMX) += vpmu_intel.o

I'm afraid this breaks perf counters on PV guests.  These files are
joint guest-type implementations.

Seeing as you leave vpmu.o alone, I guess that all you're actually
wanting to do is compile out vpmu_intel.o?  In which case, use
CONFIG_{AMD,INTEL} rather than CONFIG_{SVM,VMX} please.
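i.e. keeping vpmu.o unconditional, and something like this (sketch) for the
per-vendor objects:

```make
obj-y += vpmu.o
obj-$(CONFIG_AMD)   += vpmu_amd.o
obj-$(CONFIG_INTEL) += vpmu_intel.o
```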

~Andrew



Re: [PATCH] Mini-OS: add some macros for asm statements

2024-04-16 Thread Andrew Cooper
On 16/04/2024 8:11 am, Juergen Gross wrote:
> diff --git a/arch/x86/sched.c b/arch/x86/sched.c
> index dabe6fd6..460dea2e 100644
> --- a/arch/x86/sched.c
> +++ b/arch/x86/sched.c
> @@ -119,20 +113,12 @@ struct thread* arch_create_thread(char *name, void 
> (*function)(void *),
>  
>  void run_idle_thread(void)
>  {
> -/* Switch stacks and run the thread */ 
> -#if defined(__i386__)
> -__asm__ __volatile__("mov %0,%%esp\n\t"
> - "push %1\n\t" 
> - "ret"
> +/* Switch stacks and run the thread */
> +__asm__ __volatile__("mov %0,"ASM_SP"\n\t"
> + "push %1\n\t"
> + "ret"
>   :"=m" (idle_thread->sp)
> - :"m" (idle_thread->ip));  
> -#elif defined(__x86_64__)
> -__asm__ __volatile__("mov %0,%%rsp\n\t"
> - "push %1\n\t" 
> - "ret"
> - :"=m" (idle_thread->sp)
> - :"m" (idle_thread->ip));
> 
> -#endif
> + :"m" (idle_thread->ip));

I know you're only transforming the existing logic, but this is dodgy in
several ways.

First, PUSH/RET is more commonly spelt JMP.  Second, sp is an input
parameter not an output, so I'm pretty sure this only works by luck.

01ee <run_idle_thread>:
 1ee:   55                      push   %rbp
 1ef:   48 89 e5                mov    %rsp,%rbp
 1f2:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax    # 1f9
 1f9:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx    # 200
 200:   48 8b 60 10             mov    0x10(%rax),%rsp
 204:   ff 72 18                pushq  0x18(%rdx)
 207:   c3                      retq
 208:   90                      nop
 209:   5d                      pop    %rbp
 20a:   c3                      retq


This only works because the address constructed for sp's "output" happens
to be on a single-parameter instruction, where it's just as good as an
input too.

Anyway, this is a much better way of writing it:

    asm volatile ("mov %[sp], %%"ASM_SP"\n\t"
  "jmp *%[ip]\n\t"
  :
  : [sp] "m" (idle_thread->sp),
    [ip] "m" (idle_thread->ip));

and also highlights that run_idle_thread() should be marked noreturn.

>  }
>  
>  unsigned long __local_irq_save(void)
> diff --git a/include/x86/os.h b/include/x86/os.h
> index ee34d784..485d90b8 100644
> --- a/include/x86/os.h
> +++ b/include/x86/os.h
> @@ -77,6 +77,17 @@ int  arch_suspend(void);
>  void arch_post_suspend(int canceled);
>  void arch_fini(void);
>  
> +#if defined(__i386__)
> +#define __SZ"l"
> +#define __REG   "e"
> +#else
> +#define __SZ"q"
> +#define __REG   "r"
> +#endif
> +
> +#define ASM_SP  "%%"__REG"sp"

The %% should be at the usage sites, not here.  That way, you can use
ASM_SP in e.g. file-scope asm where it's spelt with a single %.

> +#define ASM_MOV "mov"__SZ

There's no need for ASM_MOV.  Regular plain mov (no suffix) will work in
all the cases you're trying to use it, and makes the asm easier to
read.  Notice how run_idle_thread() is already like this.

~Andrew



Re: [XEN PATCH v1 06/15] x86/p2m: guard altp2m code with CONFIG_VMX option

2024-04-16 Thread Andrew Cooper
On 16/04/2024 7:31 am, Sergiy Kibrik wrote:
> Instead of using generic CONFIG_HVM option switch to a bit more specific
> CONFIG_VMX option for altp2m support, as it depends on VMX. Also guard
> altp2m routines, so that it can be disabled completely in the build.
>
> Signed-off-by: Sergiy Kibrik 

Altp2m is not VMX-specific.  It's just no-one has wired it up on AMD, or
accepted the long-outstanding ARM patchset where it was made to work.

If you want to compile it, you probably want CONFIG_ALTP2M.

However, it's not even x86 specific.  See the uses in common/monitor.c

CC Tamas.

~Andrew



Re: [XEN PATCH v1 15/15] x86/hvm: make AMD-V and Intel VT-x support configurable

2024-04-16 Thread Andrew Cooper
On 16/04/2024 7:50 am, Sergiy Kibrik wrote:
> From: Xenia Ragiadakou 
>
> Provide the user with configuration control over the cpu virtualization 
> support
> in Xen by making SVM and VMX options user selectable.
>
> To preserve the current default behavior, both options depend on HVM and
> default to Y.
>
> To prevent users from unknowingly disabling virtualization support, make the
> controls user selectable only if EXPERT is enabled.
> Also make INTEL_IOMMU/AMD_IOMMU options dependant on VMX/SVM options.

Everything else seems ok, but there's no inherent dependency between
VMX/SVM and IOMMUs.  There are certain features (HAP/IOMMU pagetable
sharing, posted interrupts) which do depend on both, but the vast
majority of functionality is independent.

It would be a legitimate config (although getting less plausible, these
days) to have PV && IOMMU && !HVM.

Furthermore, randconfig will do a better job without such restrictions
in place.

~Andrew



Re: [XEN PATCH] docs/misra: mark the gzip folder as adopted code

2024-04-15 Thread Andrew Cooper
On 15/04/2024 10:56 am, Federico Serafini wrote:
> Mark the whole gzip folder as adopted code and remove the redundant
> deviation of file inflate.
>
> Signed-off-by: Federico Serafini 

Acked-by: Andrew Cooper 

I hadn't realised that we had a special case like this.  Definitely
better to get rid of it.

I've pulled this into my `for-next` branch, and will get it committed
properly when OSSTest (our non-gitlab CI) is in a happier state.


