Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-14 Thread Jan Beulich
On 14.02.2022 18:09, Jane Malalane wrote:
> On 14/02/2022 13:18, Jan Beulich wrote:
>>
>> On 14.02.2022 14:11, Jane Malalane wrote:
>>> On 11/02/2022 11:46, Jan Beulich wrote:

 On 11.02.2022 12:29, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
>> On 10/02/2022 10:03, Roger Pau Monné wrote:
>>> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
 diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
 index 7ab15e07a0..4060aef1bd 100644
 --- a/xen/arch/x86/hvm/vmx/vmcs.c
 +++ b/xen/arch/x86/hvm/vmx/vmcs.c
 @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
 MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch);
 }
 
 +/* Check whether hardware supports accelerated xapic and x2apic. */
 +if ( bsp )
 +{
 +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
 +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
 +                             cpu_has_vmx_virtual_intr_delivery) &&
 +                            cpu_has_vmx_virtualize_x2apic_mode;
>>>
>>> I've been think about this, and it seems kind of asymmetric that for
>>> xAPIC mode we report hw assisted support only with
>>> virtualize_apic_accesses available, while for x2APIC we require
>>> virtualize_x2apic_mode plus either apic_reg_virt or
>>> virtual_intr_delivery.
>>>
>>> I think we likely need to be more consistent here, and report hw
>>> assisted x2APIC support as long as virtualize_x2apic_mode is
>>> available.
>>>
>>> This will likely have some effect on patch 2 also, as you will have to
>>> adjust vmx_vlapic_msr_changed.
>>>
>>> Thanks, Roger.
>>
>> Any other thoughts on this? On one hand it is asymmetric, but on the
>> other there isn't much assistance with only virtualize_x2apic_mode set,
>> as in that case a VM exit is avoided only when trying to access the TPR
>> register.
>
> I've been thinking about this, and reporting hardware assisted
> x{2}APIC virtualization with just
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
> those provide some assistance to the VMM in order to handle APIC
> accesses, it will still require a trap into the hypervisor to handle
> most of the accesses.
>
> So maybe we should only report hardware assisted support when the
> mentioned features are present together with
> SECONDARY_EXEC_APIC_REGISTER_VIRT?

 Not sure - "some assistance" seems still a little better than none at all.
 Which route to go depends on what exactly we intend the bit to be used for.

>>> True. I intended this bit to be specifically for enabling
>>> assisted_x{2}apic. So, would it be inconsistent to report hardware
>>> assistance with just VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE,
>>> while only claiming to the guest (via XEN_HVM_CPUID_X2APIC_VIRT in
>>> traps.c, i.e. when no MSR accesses are intercepted) that x{2}apic is
>>> virtualized once that virtualization is actually complete? That way, as
>>> you say, the guest gets at least "some assistance" instead of none.
>>> Maybe I could also add a comment alluding to this in the xl documentation.
>>
>> To rephrase my earlier point: Which kind of decisions are the consumer(s)
>> of us reporting hardware assistance going to take? In how far is there a
>> risk that "some assistance" is overall going to lead to a loss of
>> performance? I guess I'd need to see comment and actual code all in one
>> place ...
>>
> So, I was thinking of adding something along the lines of:
> 
> +=item B B<(x86 only)>
> +Enables or disables hardware assisted virtualization for xAPIC. This
> +allows accessing APIC registers without a VM-exit. Notice enabling
> +this does not guarantee full virtualization for xAPIC, as this can
> +only be achieved if hardware supports “APIC-register virtualization”
> +and “virtual-interrupt delivery”. The default is settable via
> +L.

But isn't this contradictory? Doesn't lack of APIC-register virtualization
mean VM exits upon (most) accesses?

Jan

> and going for assisted_x2apic_available =
> cpu_has_vmx_virtualize_x2apic_mode.
> 
> This would prevent the customer from expecting full acceleration when
> apic_register_virt and/or virtual_intr_delivery aren't available, whilst
> still offering some assistance when they are absent, as Xen currently
> does. In
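
To make the trade-off concrete, the two candidate conditions under
discussion boil down to the following (a sketch using the cpu_has_vmx_*
predicates quoted above, not a final patch):

    /* As posted in v2: require x2APIC mode plus register virtualization
     * or virtual interrupt delivery. */
    assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
                                 cpu_has_vmx_virtual_intr_delivery) &&
                                cpu_has_vmx_virtualize_x2apic_mode;

    /* Alternative being discussed: any hardware assistance counts, even
     * if only TPR accesses avoid a VM exit. */
    assisted_x2apic_available = cpu_has_vmx_virtualize_x2apic_mode;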

[xen-unstable test] 168111: tolerable FAIL - PUSHED

2022-02-14 Thread osstest service owner
flight 168111 xen-unstable real [real]
flight 168116 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168111/
http://logs.test-lab.xenproject.org/osstest/logs/168116/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail pass in 168116-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail in 168116 like 168105
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168105
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168105
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168105
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168105
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168105
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168105
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168105
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168105
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168105
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168105
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168105
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl  15 migrate-support-check fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl  15 migrate-support-check fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check

[PATCH V3 11/13] media: tda8083: use time_is_after_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/media/dvb-frontends/tda8083.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/dvb-frontends/tda8083.c b/drivers/media/dvb-frontends/tda8083.c
index 5be11fd..49c4fe1
--- a/drivers/media/dvb-frontends/tda8083.c
+++ b/drivers/media/dvb-frontends/tda8083.c
@@ -162,7 +162,7 @@ static void tda8083_wait_diseqc_fifo (struct tda8083_state* state, int timeout)
 {
unsigned long start = jiffies;
 
-   while (jiffies - start < timeout &&
+   while (time_is_after_jiffies(start + timeout) &&
   !(tda8083_readreg(state, 0x02) & 0x80))
{
msleep(50);
-- 
2.7.4




[PATCH V3 12/13] media: wl128x: use time_is_before_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/media/radio/wl128x/fmdrv_common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/media/radio/wl128x/fmdrv_common.c b/drivers/media/radio/wl128x/fmdrv_common.c
index 6142484d..a599d08
--- a/drivers/media/radio/wl128x/fmdrv_common.c
+++ b/drivers/media/radio/wl128x/fmdrv_common.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 
 #include "fmdrv.h"
 #include "fmdrv_v4l2.h"
@@ -342,7 +343,7 @@ static void send_tasklet(struct tasklet_struct *t)
return;
 
/* Check, is there any timeout happened to last transmitted packet */
-   if ((jiffies - fmdev->last_tx_jiffies) > FM_DRV_TX_TIMEOUT) {
+   if (time_is_before_jiffies(fmdev->last_tx_jiffies + FM_DRV_TX_TIMEOUT)) {
fmerr("TX timeout occurred\n");
 atomic_set(&fmdev->tx_cnt, 1);
}
-- 
2.7.4




[PATCH V3 9/13] media: si21xx: use time_is_before_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/media/dvb-frontends/si21xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/dvb-frontends/si21xx.c b/drivers/media/dvb-frontends/si21xx.c
index 001b235..1c6cf76
--- a/drivers/media/dvb-frontends/si21xx.c
+++ b/drivers/media/dvb-frontends/si21xx.c
@@ -336,7 +336,7 @@ static int si21xx_wait_diseqc_idle(struct si21xx_state *state, int timeout)
dprintk("%s\n", __func__);
 
while ((si21_readreg(state, LNB_CTRL_REG_1) & 0x8) == 8) {
-   if (jiffies - start > timeout) {
+   if (time_is_before_jiffies(start + timeout)) {
dprintk("%s: timeout!!\n", __func__);
return -ETIMEDOUT;
}
-- 
2.7.4




[PATCH V3 10/13] media: stv0299: use time_is_before_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/media/dvb-frontends/stv0299.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/dvb-frontends/stv0299.c b/drivers/media/dvb-frontends/stv0299.c
index 421395e..867ae04
--- a/drivers/media/dvb-frontends/stv0299.c
+++ b/drivers/media/dvb-frontends/stv0299.c
@@ -183,7 +183,7 @@ static int stv0299_wait_diseqc_fifo (struct stv0299_state* state, int timeout)
dprintk ("%s\n", __func__);
 
while (stv0299_readreg(state, 0x0a) & 1) {
-   if (jiffies - start > timeout) {
+   if (time_is_before_jiffies(start + timeout)) {
dprintk ("%s: timeout!!\n", __func__);
return -ETIMEDOUT;
}
@@ -200,7 +200,7 @@ static int stv0299_wait_diseqc_idle (struct stv0299_state* state, int timeout)
dprintk ("%s\n", __func__);
 
while ((stv0299_readreg(state, 0x0a) & 3) != 2 ) {
-   if (jiffies - start > timeout) {
+   if (time_is_before_jiffies(start + timeout)) {
dprintk ("%s: timeout!!\n", __func__);
return -ETIMEDOUT;
}
-- 
2.7.4




[PATCH V3 6/13] input: serio: use time_is_before_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/input/serio/ps2-gpio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/serio/ps2-gpio.c b/drivers/input/serio/ps2-gpio.c
index 8970b49..7834296
--- a/drivers/input/serio/ps2-gpio.c
+++ b/drivers/input/serio/ps2-gpio.c
@@ -136,7 +136,7 @@ static irqreturn_t ps2_gpio_irq_rx(struct ps2_gpio_data *drvdata)
if (old_jiffies == 0)
old_jiffies = jiffies;
 
-   if ((jiffies - old_jiffies) > usecs_to_jiffies(100)) {
+   if (time_is_before_jiffies(old_jiffies + usecs_to_jiffies(100))) {
dev_err(drvdata->dev,
"RX: timeout, probably we missed an interrupt\n");
goto err;
@@ -237,7 +237,7 @@ static irqreturn_t ps2_gpio_irq_tx(struct ps2_gpio_data *drvdata)
if (old_jiffies == 0)
old_jiffies = jiffies;
 
-   if ((jiffies - old_jiffies) > usecs_to_jiffies(100)) {
+   if (time_is_before_jiffies(old_jiffies + usecs_to_jiffies(100))) {
dev_err(drvdata->dev,
"TX: timeout, probably we missed an interrupt\n");
goto err;
-- 
2.7.4




[PATCH V3 10/13] md: use time_is_before_eq_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/md/dm-writecache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 5630b47..125bb5d
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 #include "dm-io-tracker.h"
 
 #define DM_MSG_PREFIX "writecache"
@@ -1971,8 +1972,8 @@ static void writecache_writeback(struct work_struct *work)
 	while (!list_empty(&wc->lru) &&
 	       (wc->writeback_all ||
 		wc->freelist_size + wc->writeback_size <= wc->freelist_low_watermark ||
-		(jiffies - container_of(wc->lru.prev, struct wc_entry, lru)->age >=
-		 wc->max_age - wc->max_age / MAX_AGE_DIV))) {
+		time_is_before_eq_jiffies(container_of(wc->lru.prev, struct wc_entry, lru)->age +
+					  wc->max_age - wc->max_age / MAX_AGE_DIV))) {
 
n_walked++;
if (unlikely(n_walked > WRITEBACK_LATENCY) &&
-- 
2.7.4
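
For readability, the replacement condition in the hunk above is equivalent
to the following decomposition (a sketch, not part of the patch; the local
variables are introduced here purely for illustration):

	unsigned long age   = container_of(wc->lru.prev, struct wc_entry, lru)->age;
	unsigned long grace = wc->max_age - wc->max_age / MAX_AGE_DIV;

	if (time_is_before_eq_jiffies(age + grace)) {
		/* Oldest entry has aged past the grace period: write it back. */
	}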




[PATCH V3 13/13] media: vivid: use time_is_after_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/media/test-drivers/vivid/vivid-kthread-cap.c   | 3 ++-
 drivers/media/test-drivers/vivid/vivid-kthread-out.c   | 3 ++-
 drivers/media/test-drivers/vivid/vivid-kthread-touch.c | 3 ++-
 drivers/media/test-drivers/vivid/vivid-sdr-cap.c   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/media/test-drivers/vivid/vivid-kthread-cap.c b/drivers/media/test-drivers/vivid/vivid-kthread-cap.c
index 6baa046..295f4a3
--- a/drivers/media/test-drivers/vivid/vivid-kthread-cap.c
+++ b/drivers/media/test-drivers/vivid/vivid-kthread-cap.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 #include 
 #include 
 #include 
@@ -893,7 +894,7 @@ static int vivid_thread_vid_cap(void *data)
next_jiffies_since_start = jiffies_since_start;
 
wait_jiffies = next_jiffies_since_start - jiffies_since_start;
-   while (jiffies - cur_jiffies < wait_jiffies &&
+   while (time_is_after_jiffies(cur_jiffies + wait_jiffies) &&
   !kthread_should_stop())
schedule();
}
diff --git a/drivers/media/test-drivers/vivid/vivid-kthread-out.c b/drivers/media/test-drivers/vivid/vivid-kthread-out.c
index b6d4316..13f737e
--- a/drivers/media/test-drivers/vivid/vivid-kthread-out.c
+++ b/drivers/media/test-drivers/vivid/vivid-kthread-out.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 #include 
 #include 
 #include 
@@ -234,7 +235,7 @@ static int vivid_thread_vid_out(void *data)
next_jiffies_since_start = jiffies_since_start;
 
wait_jiffies = next_jiffies_since_start - jiffies_since_start;
-   while (jiffies - cur_jiffies < wait_jiffies &&
+   while (time_is_after_jiffies(cur_jiffies + wait_jiffies) &&
   !kthread_should_stop())
schedule();
}
diff --git a/drivers/media/test-drivers/vivid/vivid-kthread-touch.c b/drivers/media/test-drivers/vivid/vivid-kthread-touch.c
index f065faae..8828243
--- a/drivers/media/test-drivers/vivid/vivid-kthread-touch.c
+++ b/drivers/media/test-drivers/vivid/vivid-kthread-touch.c
@@ -5,6 +5,7 @@
  */
 
 #include 
+#include <linux/jiffies.h>
 #include "vivid-core.h"
 #include "vivid-kthread-touch.h"
 #include "vivid-touch-cap.h"
@@ -134,7 +135,7 @@ static int vivid_thread_touch_cap(void *data)
next_jiffies_since_start = jiffies_since_start;
 
wait_jiffies = next_jiffies_since_start - jiffies_since_start;
-   while (jiffies - cur_jiffies < wait_jiffies &&
+   while (time_is_after_jiffies(cur_jiffies + wait_jiffies) &&
   !kthread_should_stop())
schedule();
}
diff --git a/drivers/media/test-drivers/vivid/vivid-sdr-cap.c b/drivers/media/test-drivers/vivid/vivid-sdr-cap.c
index 59fd508..f82856b
--- a/drivers/media/test-drivers/vivid/vivid-sdr-cap.c
+++ b/drivers/media/test-drivers/vivid/vivid-sdr-cap.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 
 #include "vivid-core.h"
 #include "vivid-ctrls.h"
@@ -205,7 +206,7 @@ static int vivid_thread_sdr_cap(void *data)
next_jiffies_since_start = jiffies_since_start;
 
wait_jiffies = next_jiffies_since_start - jiffies_since_start;
-   while (jiffies - cur_jiffies < wait_jiffies &&
+   while (time_is_after_jiffies(cur_jiffies + wait_jiffies) &&
   !kthread_should_stop())
schedule();
}
-- 
2.7.4




[PATCH V3 7/13] md: use time_is_before_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/md/dm-thin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index f4234d6..dced764
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -161,7 +161,7 @@ static void throttle_work_start(struct throttle *t)
 
 static void throttle_work_update(struct throttle *t)
 {
-   if (!t->throttle_applied && jiffies > t->threshold) {
+   if (!t->throttle_applied && time_is_before_jiffies(t->threshold)) {
 down_write(&t->lock);
t->throttle_applied = true;
}
-- 
2.7.4
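
Note the hunk above is more than cosmetic: the open-coded
"jiffies > t->threshold" comparison is not wraparound-safe, whereas
time_is_before_jiffies() compares via signed subtraction. A minimal sketch
(not part of the patch) of the difference:

	static bool expired_open_coded(unsigned long now, unsigned long deadline)
	{
		return now > deadline;	/* wrong for half the range once jiffies wraps */
	}

	static bool expired_safe(unsigned long now, unsigned long deadline)
	{
		/* What time_is_before_jiffies(deadline) boils down to. */
		return (long)(deadline - now) < 0;
	}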




[PATCH V3 5/13] hid: use time_is_after_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
Acked-by: Srinivas Pandruvada 
---
 drivers/hid/intel-ish-hid/ipc/ipc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hid/intel-ish-hid/ipc/ipc.c b/drivers/hid/intel-ish-hid/ipc/ipc.c
index 8ccb246..15e1423
--- a/drivers/hid/intel-ish-hid/ipc/ipc.c
+++ b/drivers/hid/intel-ish-hid/ipc/ipc.c
@@ -578,7 +578,7 @@ static void _ish_sync_fw_clock(struct ishtp_device *dev)
 static unsigned long prev_sync;
 uint64_t usec;
 
-   if (prev_sync && jiffies - prev_sync < 20 * HZ)
+   if (prev_sync && time_is_after_jiffies(prev_sync + 20 * HZ))
return;
 
prev_sync = jiffies;
-- 
2.7.4




[PATCH V3 4/13] gpu: drm: radeon: use time_is_before_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/gpu/drm/radeon/radeon_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index c67b6dd..53d536a
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 
 #include 
 
@@ -1899,7 +1900,7 @@ static void radeon_dynpm_idle_work_handler(struct work_struct *work)
 * to false since we want to wait for vbl to avoid flicker.
 */
if (rdev->pm.dynpm_planned_action != DYNPM_ACTION_NONE &&
-   jiffies > rdev->pm.dynpm_action_timeout) {
+   time_is_before_jiffies(rdev->pm.dynpm_action_timeout)) {
radeon_pm_get_dynpm_state(rdev);
radeon_pm_set_clocks(rdev);
}
-- 
2.7.4




[PATCH V3 3/13] gpu: drm: i915: use time_is_after_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
index 9db3dcb..b289abb
--- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
@@ -56,7 +56,7 @@ static bool pool_free_older_than(struct intel_gt_buffer_pool *pool, long keep)
node = list_entry(pos, typeof(*node), link);
 
age = READ_ONCE(node->age);
-   if (!age || jiffies - age < keep)
+   if (!age || time_is_after_jiffies(age + keep))
break;
 
/* Check we are the first to claim this node */
-- 
2.7.4




[PATCH V3 2/13] clk: mvebu: use time_is_before_eq_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/clk/mvebu/armada-37xx-periph.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/mvebu/armada-37xx-periph.c b/drivers/clk/mvebu/armada-37xx-periph.c
index 32ac6b6..14d73f8
--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 
 #define TBG_SEL        0x0
 #define DIV_SEL0   0x4
@@ -541,7 +542,7 @@ static void clk_pm_cpu_set_rate_wa(struct clk_pm_cpu *pm_cpu,
 * We are going to L0 with rate >= 1GHz. Check whether we have been at
 * L1 for long enough time. If not, go to L1 for 20ms.
 */
-   if (pm_cpu->l1_expiration && jiffies >= pm_cpu->l1_expiration)
+   if (pm_cpu->l1_expiration && time_is_before_eq_jiffies(pm_cpu->l1_expiration))
goto invalidate_l1_exp;
 
regmap_update_bits(base, ARMADA_37XX_NB_CPU_LOAD,
-- 
2.7.4




[PATCH V3 00/13] use time_is_{before,after}_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

V2:
Batch them in a series suggested by Joe.
Use time_xxx_jiffies() instead of time_xxx() suggested by Kieran.

V3:
Fix subject and description suggested by Ted.
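
For reference, the helpers this series adopts are defined in
include/linux/jiffies.h as thin wrappers around the wraparound-safe
time_after()/time_before() family (quoted from the kernel headers; exact
formatting may vary between versions):

	#define time_after(a,b)		\
		(typecheck(unsigned long, a) && \
		 typecheck(unsigned long, b) && \
		 ((long)((b) - (a)) < 0))

	/* "a" is the jiffies value to compare against the current jiffies: */
	#define time_is_before_jiffies(a)	time_after(jiffies, a)
	#define time_is_after_jiffies(a)	time_before(jiffies, a)
	#define time_is_before_eq_jiffies(a)	time_after_eq(jiffies, a)
	#define time_is_after_eq_jiffies(a)	time_before_eq(jiffies, a)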

Wang Qing (14):
  block: xen: use time_is_before_eq_jiffies() instead of open coding it
  clk: mvebu: use time_is_before_eq_jiffies() instead of open coding it
  gpu: drm: i915: use time_is_after_jiffies() instead of open coding it
  gpu: drm: radeon: use time_is_before_jiffies() instead of open coding it
  hid: use time_is_after_jiffies() instead of open coding it
  input: serio: use time_is_before_jiffies() instead of open coding it
  md: use time_is_before_jiffies() instead of open coding it
  md: use time_is_before_eq_jiffies() instead of open coding it
  media: si21xx: use time_is_before_jiffies() instead of open coding it
  media: stv0299: use time_is_before_jiffies() instead of open coding it
  media: tda8083: use time_is_after_jiffies() instead of open coding it
  media: wl128x: use time_is_before_jiffies() instead of open coding it
  media: vivid: use time_is_after_jiffies() instead of open coding it

 drivers/block/xen-blkback/blkback.c| 5 +++--
 drivers/clk/mvebu/armada-37xx-periph.c | 3 ++-
 drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 2 +-
 drivers/gpu/drm/radeon/radeon_pm.c | 3 ++-
 drivers/hid/intel-ish-hid/ipc/ipc.c| 2 +-
 drivers/input/serio/ps2-gpio.c | 4 ++--
 drivers/md/dm-thin.c   | 2 +-
 drivers/md/dm-writecache.c | 5 +++--
 drivers/media/dvb-frontends/si21xx.c   | 2 +-
 drivers/media/dvb-frontends/stv0299.c  | 4 ++--
 drivers/media/dvb-frontends/tda8083.c  | 2 +-
 drivers/media/radio/wl128x/fmdrv_common.c  | 3 ++-
 drivers/media/test-drivers/vivid/vivid-kthread-cap.c   | 3 ++-
 drivers/media/test-drivers/vivid/vivid-kthread-out.c   | 3 ++-
 drivers/media/test-drivers/vivid/vivid-kthread-touch.c | 3 ++-
 drivers/media/test-drivers/vivid/vivid-sdr-cap.c   | 3 ++-
 17 files changed, 31 insertions(+), 22 deletions(-)

-- 
2.7.4




[PATCH V3 1/13] block: xen: use time_is_before_eq_jiffies() instead of open coding it

2022-02-14 Thread Qing Wang
From: Wang Qing 

Use the helper function time_is_{before,after}_jiffies() to improve
code readability.

Signed-off-by: Wang Qing 
---
 drivers/block/xen-blkback/blkback.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index d1e2646..aecc1f4
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include <linux/jiffies.h>
 
 #include 
 #include 
@@ -134,8 +135,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
 {
-   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
-   HZ * pgrant_timeout);
+   return pgrant_timeout && time_is_before_eq_jiffies(
+   persistent_gnt->last_used + HZ * pgrant_timeout);
 }
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
-- 
2.7.4




[qemu-mainline test] 168109: FAIL

2022-02-14 Thread osstest service owner
flight 168109 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168109/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm  broken  in 168104
 build-arm64  broken  in 168104
 build-arm64-pvopsbroken  in 168104
 build-arm64-pvops  4 host-install(4) broken in 168104 REGR. vs. 168059
 build-arm64-xsm4 host-install(4) broken in 168104 REGR. vs. 168059
 build-arm644 host-install(4) broken in 168104 REGR. vs. 168059

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt  8 xen-boot fail in 168104 pass in 168109
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168104

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 168104 n/a
 build-arm64-libvirt   1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 168104 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 168104 n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168059
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168059
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168059
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168059
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168059
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168059
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168059
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168059
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail never pass
 test-arm64-arm64-xl  15 migrate-support-check fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 

Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver

2022-02-14 Thread Stefano Stabellini
On Mon, 14 Feb 2022, Oleksii Moisieiev wrote:
> Hi Bertrand,
> 
> On Mon, Feb 14, 2022 at 11:27:21AM +, Bertrand Marquis wrote:
> > Hi Oleksii,
> > 
> > > > On 14 Feb 2022, at 11:13, Oleksii Moisieiev wrote:
> > > 
> > > Hi Julien,
> > > 
> > > On Sat, Feb 12, 2022 at 12:43:56PM +, Julien Grall wrote:
> > >> Hi,
> > >> 
> > >> On 11/02/2022 11:18, Bertrand Marquis wrote:
> > >>> Do you plan to add support for other boards ?
> > >>> 
> > >>> Did you discuss more in general with the linux kernel guys to see if
> > >>> this approach was agreed and will be adopted by other manufacturers?
> > >>> 
> > >>> All in all I think this is a good idea, but I fear that all this will
> > >>> actually only be used by one board or one manufacturer and others might
> > >>> use a different strategy; I would like to de-risk this before merging
> > >>> this in Xen.
> > >> 
> > >> In the past we merged code that would only benefit one vendor (i.e.
> > >> EEMI). That said, this was a vendor-specific protocol. I believe the
> > >> situation is different here because the spec is meant to be generic.
> > >> 
> > >>> @julien and Stefano: what is your view here ?
> > >> 
> > >> I share the same concerns as you. I think we need to make sure all the
> > >> pieces we rely on (e.g. firmware, DT bindings) have been agreed before we
> > >> can merge such code in Xen.
> > >> 
> > >> The first step is to have all the pieces available in public so they
> > >> can be reviewed and tested together.
> > >>
> > >> Oleksii, on a separate e-mail, you said you made changes for ATF. How
> > >> much of those changes was related to support for Xen? If there are
> > >> some, then I think they should be upstreamed first.
> > >> 
> > > 
> > > Let me share changes, that were done to AT-F and Linux kernel
> > > device-tree in terms of the SCMI mediator POC.
> > > Changes to the Linux kernel:
> > > https://github.com/oleksiimoisieiev/arm-trusted-firmware/pull/4
> > > Based on renesas-rcar linux-bsp, branch v5.10/rcar-5.0.0.rc5
> > > 
> > > Changes to AT-F:
> > > https://github.com/oleksiimoisieiev/linux-bsp/pull/3
> > > Based on renesas-rcar/arm-trusted-firmware branch rcar_gen3_v2.5.
> > 
> > You inverted the links but thanks this is really useful.
> > 
> 
> That's strange. The links look good from the xen.markmail.org interface.
> 
> > Did you push the ATF changes to mainstream ATF or discuss those with
> > the maintainers ?
> 
> No. We did changes in ATF as a proof of concept.
> 
> > 
> > The strategy overall is nice but we need to make sure this is accepted and
> > merged by all parties (ATF and Linux) to make sure the support for this
> > will not only be available in Xen and for one board.

+1


> I've prepared patch to Linux kernel, which is introducing scmi_devid
> binding, needed to set device permissions via SCMI. I've contacted
> Sudeep Holla , who is the maintainer of the SCMI protocol
> drivers. Waiting for the response.
> 
> Changes to ATF are not Xen specific and were done in terms of POC. We do
> not have plans to upstream those changes right now.

If this work relies on a new interface in ATF, and the interface is not
vendor-specific, then at least the interface (if not the code) should be
reviewed and accepted by ATF.

Otherwise we risk ending up with an upstream SCMI implementation in Xen
that cannot be used anywhere, except the PoC. To make things worse, this
could happen:

- we upstream the SCMI mediator to Xen
- we upstream any required changes to Linux
- ATF rejects the SCMI-related interface changes
- ATF comes up with a different interface

At this point we would have to deprecate the implementation in Xen. It
might also be difficult to do so due to versioning issues. We would
need to be able to detect which version of ATF we are running on, to
distinguish the ATF PoC version that works with the old interface from
the new ATF version that supports a different interface.

To avoid this kind of issues we typically expect that all relevant
communities agree on the public interfaces before upstreaming the code.



Re: [PATCH] RFC: Version support policy

2022-02-14 Thread George Dunlap


> On Aug 19, 2021, at 10:18 AM, Jan Beulich  wrote:
> 
> On 13.08.2021 13:37, Ian Jackson wrote:
>> The current policy for minimum supported versions of tools, compilers,
>> etc. is unsatisfactory: For many dependencies no minimum version is
>> specified.  For those where a version is stated, updating it is a
>> decision that has to be explicitly taken for that tool.
> 
> Considering your submission of this having been close to a glibc
> version issue you and I have been discussing, I wonder whether
> "etc" above includes library dependencies as well.
> 
> In any event the precise scope of what is meant to be covered is
> quite important to me: There are affected entities that I'm happy
> to replace on older distros (binutils, gcc). There are potentially
> affected entities that I'm less happy to replace, but at the time
> I did work my way through it for example for Python (to still be
> able to build qemu, the community of which doesn't appear to care
> at all to have their stuff buildable in older environments). The
> point where I'd be really in trouble would be when base platform
> libraries like glibc are required to be a certain minimum version:
> I'd then be (potentially severely) restricted in what systems I
> can actually test stuff on.

The question here is, why would someone running a 10-year-old distro that’s 
been out of support for 6 years want to run a bleeding edge version of Xen?  I 
understand wanting to run Xen 4.16 on (say) Ubuntu 18.04, but who on earth 
would want to run Xen 4.16 on Ubuntu 14.04, and why?  If such people exist, is 
it really worth the effort to try to support them?

> In addition I see a difference between actively breaking e.g.
> building with older tool chains vs (like you have it in your
> README adjustment) merely a statement about what we believe
> things may work with, leaving room for people to fix issues with
> their (older) environments, and such changes then not getting
> rejected simply because of policy.

Yes; I think the principle should be that we *promise* to keep it working on 
the currently-supported releases of a specific set of distros (e.g., Debian, 
Ubuntu, Fedora, SUSE, RHEL).  Working on older versions can be best-effort; if 
simple changes make it compatible with older versions, and aren’t too 
burdensome from a code complexity point of view, they can be accepted.

One of the issues however is build-time checks.  If we have a build-time check 
for version X, but only test it on X+10 or later, then the build may break in 
strange ways when someone tries it on something in between.

I think it’s too much effort to ask developers to try to find the actual 
minimum version of each individual dependency as things evolve.

> While generally I find Marek's proposal better to tie the baseline
> to distros of interest, in a way it only shifts the issue, I'm
> afraid.

What do you mean “shifts the issue”?  You mean shifts it from versions of 
individual components to versions of distros?

That’s why I think we should support only currently-supported distros.  If the 
distro’s maintainers don’t consider the distro worth supporting any more, I 
don’t see why we should make the effort to do so.

 -George




Re: [PATCH] RFC: Version support policy

2022-02-14 Thread George Dunlap


> On Aug 18, 2021, at 12:16 PM, Marek Marczykowski-Górecki wrote:
> 
> On Fri, Aug 13, 2021 at 12:37:27PM +0100, Ian Jackson wrote:
>> The current policy for minimum supported versions of tools, compilers,
>> etc. is unsatisfactory: For many dependencies no minimum version is
>> specified.  For those where a version is stated, updating it is a
>> decision that has to be explicitly taken for that tool.
>> 
>> The result is persistent debates over what is good to support,
>> conducted in detail in the context of individual patches.
>> 
>> Decisions about support involve tradeoffs, often tradeoffs between the
>> interests of different people.  Currently we don't have anything
>> resembling a guideline.  The result is that the individual debates are
>> inconclusive; and also, this framework does not lead to good feelings
>> amongst participants.
>> 
>> I suggest instead that we adopt a date-based policy: we define a
>> maximum *age* of dependencies that we will support.
> 
> I wonder about another approach: specify supported toolchain version(s)
> based on environments we choose to care about. That would be things like
> "Debian, including LTS (or even ELTS) one", "RHEL/CentOS until X...",
> etc. Based on this, it's easy to derive what's the oldest version that
> needs to be supported.
> This would be also much friendlier for testing - a clear definition
> what environments should be used (in gitlab-ci, I guess).

This is in fact what I’ve been thinking and talking about proposing for a very 
long time.  As far as an open-source offering, what we really want is for the 
newest version of Xen to build on all currently-supported distros.  If the 
distro maintainers themselves no longer want to support a distro, I don’t see 
why we should make the effort to do so.

As you say, this should make testing super easy as well: All we have to do is 
have docker images on gitlab for all the supported distros.

 -George






Re: [PATCH 0/3] amd/msr: implement MSR_VIRT_SPEC_CTRL for HVM guests

2022-02-14 Thread Andrew Cooper
On 01/02/2022 16:46, Roger Pau Monne wrote:
> Hello,
>
> The following series implements support for MSR_VIRT_SPEC_CTRL on
> different AMD CPU families.
>
> Note that the support is added backwards, starting with the newer CPUs
> that support MSR_SPEC_CTRL and moving to the older ones either using
> MSR_VIRT_SPEC_CTRL or the SSBD bit in LS_CFG.
>
> First patch is quite clean, as it uses the shadow SPEC_CTRL in order to
> set the SSBD bit and have it context switched by Xen using the existing
> logic recently added.
>
> The next two patches introduce a different way to context switch SSBD
> either depending on the underlying SSBD support, so it's either using
> VIRT_SPEC_CTRL or the LS_CFG MSR. They also kind of overload the usage of
> several spec_ctrl variables in the hypervisor in order to store the
> status of SSBD even when not using MSR_SPEC_CTRL itself. I've tried to
> document those in the commit messages, but it could be controversial.
>
> Thanks, Roger.

I suspect it would help reviewing things to state what the end result is
intended to be.

1) Xen should use the AMD provided algorithm for engaging SSBD itself. 
This includes using MSR_VIRT_SPEC_CTRL if Xen is nested under another
hypervisor.  In the current code, this is implemented by amd_init_ssbd()
even if only limited to boot paths for simplicity.

2) On Fam15h thru Zen1, Xen should expose MSR_VIRT_SPEC_CTRL to guests
by default to abstract away the model- and/or hypervisor-specific
differences in MSR_LS_CFG/MSR_VIRT_SPEC_CTRL.

3) On Zen2 and later, MSR_SPEC_CTRL exists and should be used in
preference.  However, for migration compatibility, Xen should be capable
of offering MSR_VIRT_SPEC_CTRL to guests (max, not default) implemented
in terms of MSR_SPEC_CTRL.

This way, a VM levelled to run on Zen1 and Zen2 sees MSR_VIRT_SPEC_CTRL
and can use it on both hosts, whereas a VM only intending to run on Zen2
gets MSR_SPEC_CTRL by default.

Obviously this means that a VM on Zen2 can opt in to MSR_VIRT_SPEC_CTRL
because of how max vs default works and this is a legal configuration,
even if it's not one you'd expect to see outside of testing scenarios.
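
A sketch of how that max/default split might look (the identifiers below
are illustrative only, not the actual Xen policy code):

    /* Offer VIRT_SPEC_CTRL in the max policy whenever Xen can implement
     * it, but default to it only where MSR_SPEC_CTRL is not the better
     * option (i.e. Fam15h thru Zen1).  All names are hypothetical. */
    if ( cpu_has_virt_ssbd || cpu_has_amd_ssbd || cpu_has_ssbd_via_ls_cfg )
        host_max_policy.virt_ssbd = true;

    host_def_policy.virt_ssbd = host_max_policy.virt_ssbd &&
                                !cpu_has_amd_ssbd;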

~Andrew


[linux-linus test] 168108: regressions - trouble: blocked/broken/fail/pass

2022-02-14 Thread osstest service owner
flight 168108 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168108/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64  broken
 build-arm64-pvopsbroken
 build-arm64-xsm  broken
 build-arm64-xsm   4 host-install(4)  broken REGR. vs. 168080
 build-arm64   4 host-install(4)  broken REGR. vs. 168080
 build-arm64-pvops 4 host-install(4)  broken REGR. vs. 168080
 test-armhf-armhf-libvirt-qcow2  broken in 168103
 test-armhf-armhf-libvirt-qcow2 10 host-ping-check-xen fail REGR. vs. 168080

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-qcow2 5 host-install(5) broken in 168103 pass in 168108
 test-amd64-amd64-xl-rtds 18 guest-localmigrate fail pass in 168103

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds18 guest-start/debian.repeat fail REGR. vs. 168080
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10 fail in 168103 REGR. vs. 168080

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168080
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168080
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168080
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168080
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168080
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168080
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168080
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl  15 migrate-support-check fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail never pass

version targeted for testing:
 linux  754e0b0e35608ed5206d6a67a791563c631cec07
baseline version:
 linux  f1baf68e1383f6ed93eb9cff2866d46562607a43

Last test of basis   168080  2022-02-11 00:09:22 Z3 days
Failing since168086  2022-02-11 20:11:19 Z2 days7 attempts
Testing same since   168103  2022-02-13 21:41:20 Z0 days2 attempts


People who touched revisions under test:
  Aaron Liu 
  

Re: [PATCH v2 02/70] xen/sort: Switch to an extern inline implementation

2022-02-14 Thread Andrew Cooper
On 14/02/2022 13:13, Bertrand Marquis wrote:
> Hi Andrew,
>
>> On 14 Feb 2022, at 12:50, Andrew Cooper  wrote:
>>
>> There are exactly 3 callers of sort() in the hypervisor.  Callbacks in a
>> tight loop like this are problematic for performance, especially with
>> Spectre v2 protections, which is why extern inline is used commonly by
>> libraries.
>>
>> Both ARM callers pass in NULL for the swap function, and while this might
>> seem like an attractive option at first, it causes generic_swap() to be
>> used, which forced a byte-wise copy.  Provide real swap functions so the
>> compiler can optimise properly, which is very important for ARM
>> downstreams where milliseconds until the system is up matters.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper 
>> Reviewed-by: Jan Beulich 
> Just one comment fix after, with it fixed for the arm part:
>
> Reviewed-by: Bertrand Marquis 

Thanks.

>> diff --git a/xen/include/xen/sort.h b/xen/include/xen/sort.h
>> index a403652948e7..01479ea44606 100644
>> --- a/xen/include/xen/sort.h
>> +++ b/xen/include/xen/sort.h
>> @@ -3,8 +3,61 @@
>>
>> #include <xen/types.h>
>>
>> +/*
>> + * sort - sort an array of elements
>> + * @base: pointer to data to sort
>> + * @num: number of elements
>> + * @size: size of each element
>> + * @cmp: pointer to comparison function
>> + * @swap: pointer to swap function or NULL
> The function no longer accepts NULL as a parameter.
> The comment should be fixed here.

Will fix.

~Andrew
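
For illustration, the pattern the patch asks callers to adopt looks roughly
like this (a sketch assuming the xen/sort.h swap prototype is
void (*swap)(void *, void *, size_t); a typed swap lets the compiler emit a
single 4-byte exchange instead of a byte-wise loop):

    static int cmp_u32(const void *a, const void *b)
    {
        uint32_t l = *(const uint32_t *)a, r = *(const uint32_t *)b;

        return l < r ? -1 : l > r;
    }

    static void swap_u32(void *a, void *b, size_t size)
    {
        uint32_t tmp = *(uint32_t *)a;

        *(uint32_t *)a = *(uint32_t *)b;
        *(uint32_t *)b = tmp;
    }

    /* Usage: sort(array, nr_elems, sizeof(uint32_t), cmp_u32, swap_u32); */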


[linux-5.4 test] 168106: trouble: blocked/broken/fail/pass

2022-02-14 Thread osstest service owner
flight 168106 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168106/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64  broken
 build-arm64-pvopsbroken
 build-arm64-xsm  broken
 build-arm64-xsm   4 host-install(4)  broken REGR. vs. 168060
 build-arm64-pvops 4 host-install(4)  broken REGR. vs. 168060
 build-arm64   4 host-install(4)  broken REGR. vs. 168060

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail in 168102 pass in 168106
 test-armhf-armhf-xl-credit1   8 xen-boot   fail pass in 168102

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail in 168102 never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail in 168102 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168060
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168060
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168060
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168060
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168060
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168060
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168060
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168060
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168060
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168060
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168060
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168060
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl  15 migrate-support-check fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail never pass

version targeted for testing:
 linux  

Re: [RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree

2022-02-14 Thread Luca Fancellu



> On 11 Feb 2022, at 16:12, Julien Grall  wrote:
> 
> 
> 
> On 11/02/2022 15:45, Luca Fancellu wrote:
>>> On 11 Feb 2022, at 15:26, Julien Grall  wrote:
>>> 
>>> Hi Luca,
>>> 
>>> On 11/02/2022 15:00, Luca Fancellu wrote:
 pend_lpi_tree is a radix tree used to store pending irqs, the tree is
 protected by a lock for read/write operations.
 Currently the radix tree default function to free items uses the
 RCU mechanism, calling call_rcu and deferring the operation.
 However every access to the structure is protected by the lock so we
 can avoid using the default free function that, by using RCU,
 increases memory usage and impacts the predictability of the system.
>>> 
>>> I understand the goal, but look at the implementation of
>>> vgic_v3_lpi_to_pending() (copied below for convenience). We would release
>>> the lock as soon as the look-up finishes, yet the element is returned.
>>> 
>>> static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
>>>  unsigned int lpi)
>>> {
>>>struct pending_irq *pirq;
>>> 
>>>read_lock(>arch.vgic.pend_lpi_tree_lock);
>>>pirq = radix_tree_lookup(>arch.vgic.pend_lpi_tree, lpi);
>>>read_unlock(>arch.vgic.pend_lpi_tree_lock);
>>> 
>>>return pirq;
>>> }
>>> 
>>> So the lock will not protect us against removal. If you want to drop the 
>>> RCU, you will need to ensure the structure pending_irq is suitably 
>>> protected. I haven't check whether there are other locks that may suit us 
>>> here.
>>> 
>> Hi Julien,
>> Yes you are right! I missed that, sorry for the noise.
> 
> Actually,... I think I am wrong :/.
> 
> I thought the lock pend_lpi_tree_lock would protect pending_irq, but it only
> protects the radix tree element (not the value).
> 
> The use in its_discard_event() seems to confirm that because the
> pending_irq is re-initialized as soon as it gets destroyed.
> 
> I would like a second opinion though.
> 

Hi Julien,

I think you are right: the structure itself is protected, but the usage of
the element is not. I guess it is currently safe because RCU frees the
element only once no CPUs are using it anymore.

 - radix_tree_lookup
   - vgic_v3_lpi_to_pending (return pointer to item)
 - lpi_to_pending (function pointer to vgic_v3_lpi_to_pending)
   - irq_to_pending (return pointer to item if it is lpi -> is_lpi(irq))

 - vgic_vcpu_inject_lpi
   - gicv3_do_LPI (rcu_lock_domain_by_id on domain)
 - gic_interrupt (do_LPI function pointer)
   - do_trap_irq
   - do_trap_fiq
   - its_handle_int
 - vgic_its_handle_cmds
   - vgic_v3_its_mmio_write
 - handle_write
   - try_handle_mmio
 - do_trap_stage2_abort_guest
   - do_trap_guest_sync

 - vgic_get_hw_irq_desc 
   - release_guest_irq 
 - arch_do_domctl (XEN_DOMCTL_unbind_pt_irq)
   - do_domctl
 - domain_vgic_free
   - arch_domain_destroy

 - gic_raise_inflight_irq (assert v->arch.vgic.lock)
 - gic_raise_guest_irq (assert v->arch.vgic.lock)
 - gic_update_one_lr (assert v->arch.vgic.lock, irq are disabled)
 - vgic_connect_hw_irq
   - gic_route_irq_to_guest (Assert !is_lpi)
   - gic_remove_irq_from_guest (Assert !is_lpi(virq))
 - vgic_migrate_irq (lock old->arch.vgic.lock)
 - arch_move_irqs (Assert not lpi in loop)
 - vgic_disable_irqs (lock v_target->arch.vgic.lock)
 - vgic_enable_irqs (lock v_target->arch.vgic.lock)
 - vgic_inject_irq (lock v->arch.vgic.lock)
 - vgic_evtchn_irq_pending (assert !is_lpi(v->domain->arch.evtchn_irq))
 - vgic_check_inflight_irqs_pending (lock v_target->arch.vgic.lock)

   - vgic_v3_lpi_get_priority (return value from pointer)
 - lpi_get_priority (function pointer to vgic_v3_lpi_get_priority)

 - radix_tree_delete
   - its_discard_event (lock vcpu->arch.vgic.lock)

From a quick analysis I see there are paths using that pointer which don't
share any lock.

I will put on hold this patch.

Cheers,
Luca


> Cheers,
> 
> -- 
> Julien Grall




Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-14 Thread Jane Malalane
On 14/02/2022 13:18, Jan Beulich wrote:
> 
> On 14.02.2022 14:11, Jane Malalane wrote:
>> On 11/02/2022 11:46, Jan Beulich wrote:
>>>
>>> On 11.02.2022 12:29, Roger Pau Monné wrote:
 On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
> On 10/02/2022 10:03, Roger Pau Monné wrote:
>> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index 7ab15e07a0..4060aef1bd 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
>>>             MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch);
>>>     }
>>> 
>>> +    /* Check whether hardware supports accelerated xapic and x2apic. */
>>> +    if ( bsp )
>>> +    {
>>> +        assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
>>> +        assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
>>> +                                     cpu_has_vmx_virtual_intr_delivery) &&
>>> +                                    cpu_has_vmx_virtualize_x2apic_mode;
>>
>> I've been thinking about this, and it seems kind of asymmetric that for
>> xAPIC mode we report hw assisted support only with
>> virtualize_apic_accesses available, while for x2APIC we require
>> virtualize_x2apic_mode plus either apic_reg_virt or
>> virtual_intr_delivery.
>>
>> I think we likely need to be more consistent here, and report hw
>> assisted x2APIC support as long as virtualize_x2apic_mode is
>> available.
>>
>> This will likely have some effect on patch 2 also, as you will have to
>> adjust vmx_vlapic_msr_changed.
>>
>> Thanks, Roger.
>
> Any other thoughts on this? As on one hand it is asymmetric but also
> there isn't much assistance with only virtualize_x2apic_mode set as, in
> this case, a VM exit will be avoided only when trying to access the TPR
> register.

 I've been thinking about this, and reporting hardware assisted
 x{2}APIC virtualization with just
 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
 those provide some assistance to the VMM in order to handle APIC
 accesses, it will still require a trap into the hypervisor to handle
 most of the accesses.

 So maybe we should only report hardware assisted support when the
 mentioned features are present together with
 SECONDARY_EXEC_APIC_REGISTER_VIRT?
>>>
>>> Not sure - "some assistance" seems still a little better than none at all.
>>> Which route to go depends on what exactly we intend the bit to be used for.
>>>
>> True. I intended this bit to be specifically for enabling
>> assisted_x{2}apic. So, would it be inconsistent to report hardware
>> assistance with just VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE
>> but still claim that x{2}apic is virtualized if no MSR accesses are
>> intercepted with XEN_HVM_CPUID_X2APIC_VIRT (in traps.c) so that, as you
>> say, the guest gets at least "some assistance" instead of none but we
>> still claim x{2}apic virtualization when it is actually complete? Maybe
>> I could also add a comment alluding to this in the xl documentation.
> 
> To rephrase my earlier point: Which kind of decisions are the consumer(s)
> of us reporting hardware assistance going to take? In how far is there a
> risk that "some assistance" is overall going to lead to a loss of
> performance? I guess I'd need to see comment and actual code all in one
> place ...
> 
So, I was thinking of adding something along the lines of:

+=item B B<(x86 only)>
+Enables or disables hardware assisted virtualization for xAPIC. This
+allows accessing APIC registers without a VM-exit. Notice enabling
+this does not guarantee full virtualization for xAPIC, as this can
+only be achieved if hardware supports “APIC-register virtualization”
+and “virtual-interrupt delivery”. The default is settable via
+L.

and going for assisted_x2apic_available = cpu_has_vmx_virtualize_x2apic_mode.

This would prevent the customer from expecting full acceleration when 
apic_register_virt and/or virtual_intr_delivery aren't available, whilst 
still offering some assistance in that case, as Xen currently does. In a 
future patch, we could also expose and add config options for these 
controls if we wanted to.
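
Concretely, the availability check would then become (a sketch of the
resulting vmcs.c hunk, untested):

    /* Check whether hardware supports accelerated xapic and x2apic. */
    if ( bsp )
    {
        assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
        assisted_x2apic_available = cpu_has_vmx_virtualize_x2apic_mode;
    }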

Thank you for your help,

Jane.

Re: [PATCH v2 21/70] xen/evtchn: CFI hardening

2022-02-14 Thread Andrew Cooper
On 14/02/2022 16:53, David Vrabel wrote:
> On 14/02/2022 12:50, Andrew Cooper wrote:
>> Control Flow Integrity schemes use toolchain and optionally hardware
>> support
>> to help protect against call/jump/return oriented programming attacks.
>>
>> Use cf_check to annotate function pointer targets for the toolchain.
> [...]
>> -static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn
>> *evtchn)
>> +static void cf_check evtchn_2l_set_pending(
>> +    struct vcpu *v, struct evtchn *evtchn)
>
> Why manually annotate functions instead of getting the compiler to
> automatically work it out?

Because the compilers are not currently capable of working it out
automatically.
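
To illustrate the difficulty (a contrived sketch, not Xen code): whether a
function needs the annotation depends on whether its address is ever taken,
and that can happen in a translation unit the compiler never sees while
compiling the function itself:

    /* a.c */
    void handler(void) { }             /* needs cf_check?  Unknowable here. */

    /* b.c */
    extern void handler(void);
    void (*fp)(void) = handler;        /* the address only escapes here */

Deciding this automatically needs whole-program knowledge (e.g. LTO), hence
the manual annotations.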

~Andrew



Re: [PATCH v2 21/70] xen/evtchn: CFI hardening

2022-02-14 Thread David Vrabel

On 14/02/2022 12:50, Andrew Cooper wrote:

Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

[...]

-static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
+static void cf_check evtchn_2l_set_pending(
+struct vcpu *v, struct evtchn *evtchn)


Why manually annotate functions instead of getting the compiler to 
automatically work it out?


David



Re: [PATCH v2 5/7] x86/hvm: Use __initdata_cf_clobber for hvm_funcs

2022-02-14 Thread Jan Beulich
On 14.02.2022 17:39, Andrew Cooper wrote:
> On 14/02/2022 13:35, Andrew Cooper wrote:
>> On 14/02/2022 13:10, Jan Beulich wrote:
>>> On 14.02.2022 13:56, Andrew Cooper wrote:
 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -88,7 +88,7 @@ unsigned int opt_hvm_debug_level __read_mostly;
  integer_param("hvm_debug", opt_hvm_debug_level);
  #endif
  
 -struct hvm_function_table hvm_funcs __read_mostly;
 +struct hvm_function_table __ro_after_init hvm_funcs;
>>> Strictly speaking this is an unrelated change. I'm fine with it living here,
>>> but half a sentence would be nice in the description.
>> I could split it out, but we could probably make 200 patches of
>> "sprinkle some __ro_after_init around, now that it exists".
>>
 --- a/xen/arch/x86/hvm/svm/svm.c
 +++ b/xen/arch/x86/hvm/svm/svm.c
 @@ -2513,7 +2513,7 @@ static void cf_check svm_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
  }
  }
  
 -static struct hvm_function_table __initdata svm_function_table = {
 +static struct hvm_function_table __initdata_cf_clobber svm_function_table = {
  .name = "SVM",
  .cpu_up_prepare   = svm_cpu_up_prepare,
  .cpu_dead = svm_cpu_dead,
 diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
 index 41db538a9e3d..758df3321884 100644
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -2473,7 +2473,7 @@ static void cf_check vmx_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
  vmx_vmcs_exit(v);
  }
  
 -static struct hvm_function_table __initdata vmx_function_table = {
 +static struct hvm_function_table __initdata_cf_clobber vmx_function_table = {
  .name = "VMX",
  .cpu_up_prepare   = vmx_cpu_up_prepare,
  .cpu_dead = vmx_cpu_dead,
>>> While I'd like to re-raise my concern regarding the non-pointer fields
>>> in these structure instances (just consider a sequence of enough bool
>>> bitfields, which effectively can express any value, including ones
>>> which would appear like pointers into .text), since for now all is okay
>>> afaict:
>>> Reviewed-by: Jan Beulich 
>> I should probably put something in the commit message too.  It is a
>> theoretical risk, but not (IMO) a practical one.
> 
> Updated commit message:
> 
> x86/hvm: Use __initdata_cf_clobber for hvm_funcs
> 
> Now that all calls through hvm_funcs are fully altcall'd, harden all the svm
> and vmx function pointer targets.  This drops 106 endbr64 instructions.
> 
> Clobbering does come with a theoretical risk.  The non-pointer fields of
> {svm,vmx}_function_table can in theory happen to form a bit pattern matching a
> pointer into .text at a legal endbr64 instruction, but this is expected to be
> implausible for anything liable to pass code review.
> 
> While at it, move hvm_funcs into __ro_after_init now that this exists.

SGTM, thanks.

Jan




Re: [PATCH 3/3] amd/msr: implement VIRT_SPEC_CTRL for HVM guests using legacy SSBD

2022-02-14 Thread Jan Beulich
On 01.02.2022 17:46, Roger Pau Monne wrote:
> @@ -716,26 +702,117 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
>   if (rdmsr_safe(MSR_AMD64_LS_CFG, val) ||
>   ({
>   val &= ~mask;
> - if (opt_ssbd)
> + if (enable)
>   val |= mask;
>   false;
>   }) ||
>   wrmsr_safe(MSR_AMD64_LS_CFG, val) ||
>   ({
>   rdmsrl(MSR_AMD64_LS_CFG, val);
> - (val & mask) != (opt_ssbd * mask);
> + (val & mask) != (enable * mask);
>   }))
>   bit = -1;
>   }
>  
> - if (bit < 0)
> + return bit >= 0;
> +}
> +
> +void amd_init_ssbd(const struct cpuinfo_x86 *c)
> +{
> + struct cpu_info *info = get_cpu_info();
> +
> + if (cpu_has_ssb_no)
> + return;
> +
> + if (cpu_has_amd_ssbd) {
> + /* Handled by common MSR_SPEC_CTRL logic */
> + return;
> + }
> +
> + if (cpu_has_virt_ssbd) {
> + wrmsrl(MSR_VIRT_SPEC_CTRL, opt_ssbd ? SPEC_CTRL_SSBD : 0);
> + goto out;
> + }
> +
> + if (!set_legacy_ssbd(c, opt_ssbd)) {
>   printk_once(XENLOG_ERR "No SSBD controls available\n");
> + return;
> + }
> +
> + if (!smp_processor_id())
> + setup_force_cpu_cap(X86_FEATURE_LEGACY_SSBD);

I don't think you need a new feature flag here: You only ever use it
with boot_cpu_has() and there's no alternatives patching keyed to it,
so a single global flag will likely do.
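
I.e. something like (a sketch; the variable name is made up):

    static bool __read_mostly legacy_ssbd;

    if (!smp_processor_id())
        legacy_ssbd = true;

with the consumers then testing legacy_ssbd instead of
boot_cpu_has(X86_FEATURE_LEGACY_SSBD).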

>  out:
> 	info->last_spec_ctrl = info->xen_spec_ctrl = opt_ssbd ? SPEC_CTRL_SSBD : 0;
>  }
>  
> +static struct ssbd_core {
> +spinlock_t lock;
> +unsigned int count;
> +} *ssbd_core;
> +static unsigned int __read_mostly ssbd_max_cores;

__ro_after_init?

> +bool __init amd_setup_legacy_ssbd(void)
> +{
> + unsigned int i;
> +
> + if (boot_cpu_data.x86 != 0x17 || boot_cpu_data.x86_num_siblings == 1)

Maybe better "<= 1", not the least ...

> + return true;
> +
> + /*
> +  * One could be forgiven for thinking that c->x86_max_cores is the
> +  * correct value to use here.
> +  *
> +  * However, that value is derived from the current configuration, and
> +  * c->cpu_core_id is sparse on all but the top end CPUs.  Derive
> +  * max_cpus from ApicIdCoreIdSize which will cover any sparseness.
> +  */
> + if (boot_cpu_data.extended_cpuid_level >= 0x80000008) {
> + 	ssbd_max_cores = 1u << MASK_EXTR(cpuid_ecx(0x80000008), 0xf000);
> + 	ssbd_max_cores /= boot_cpu_data.x86_num_siblings;

... because of this division. I don't know whether we're also susceptible
to this, but I've seen Linux (on top of Xen) being confused enough about
the topology related CPUID data we expose that it ended up running with
the value set to zero (and then exploding e.g. on a similar use).

> + }
> + if (!ssbd_max_cores)
> + return false;
> +
> + /* Max is two sockets for Fam17h hardware. */
> + ssbd_core = xzalloc_array(struct ssbd_core, ssbd_max_cores * 2);
> + if (!ssbd_core)
> + return false;
> +
> + for (i = 0; i < ssbd_max_cores * 2; i++) {
> + spin_lock_init(&ssbd_core[i].lock);
> + /* Record the current state. */
> + ssbd_core[i].count = opt_ssbd ?
> +  boot_cpu_data.x86_num_siblings : 0;
> + }
> +
> + return true;
> +}
> +
> +void amd_set_legacy_ssbd(bool enable)
> +{
> + const struct cpuinfo_x86 *c = &current_cpu_data;
> + struct ssbd_core *core;
> + unsigned long flags;
> +
> + if (c->x86 != 0x17 || c->x86_num_siblings == 1) {
> + set_legacy_ssbd(c, enable);
> + return;
> + }
> +
> + ASSERT(c->phys_proc_id < 2);
> + ASSERT(c->cpu_core_id < ssbd_max_cores);
> + core = &ssbd_core[c->phys_proc_id * ssbd_max_cores + c->cpu_core_id];
> + spin_lock_irqsave(&core->lock, flags);

May I suggest a brief comment on the irqsave aspect here? Aiui when
called from vmexit_virt_spec_ctrl() while we're still in a GIF=0
section, IF is 1 and hence check_lock() would be unhappy (albeit in
a false positive way).

> + core->count += enable ? 1 : -1;
> + ASSERT(core->count <= c->x86_num_siblings);
> + if ((enable  && core->count == 1) ||
> + (!enable && core->count == 0))

Maybe simply "if ( core->count == enable )"? Or do compilers not like
comparisons with booleans?
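
I.e. (a sketch; this relies on bool promoting to 0/1, and the if() body is
guessed, as the hunk is cut off in the quote above):

    if ( core->count == enable )    /* 1 on first enable, 0 on last disable */
        set_legacy_ssbd(c, enable);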

> --- a/xen/arch/x86/spec_ctrl.c
> +++ b/xen/arch/x86/spec_ctrl.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1056,7 +1057,8 @@ void __init init_speculation_mitigations(void)
>  setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);

Re: [PATCH v2 5/7] x86/hvm: Use __initdata_cf_clobber for hvm_funcs

2022-02-14 Thread Andrew Cooper
On 14/02/2022 13:35, Andrew Cooper wrote:
> On 14/02/2022 13:10, Jan Beulich wrote:
>> On 14.02.2022 13:56, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -88,7 +88,7 @@ unsigned int opt_hvm_debug_level __read_mostly;
>>>  integer_param("hvm_debug", opt_hvm_debug_level);
>>>  #endif
>>>  
>>> -struct hvm_function_table hvm_funcs __read_mostly;
>>> +struct hvm_function_table __ro_after_init hvm_funcs;
>> Strictly speaking this is an unrelated change. I'm fine with it living here,
>> but half a sentence would be nice in the description.
> I could split it out, but we could probably make 200 patches of
> "sprinkle some __ro_after_init around, now that it exists".
>
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -2513,7 +2513,7 @@ static void cf_check svm_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
>>>  }
>>>  }
>>>  
>>> -static struct hvm_function_table __initdata svm_function_table = {
>>> +static struct hvm_function_table __initdata_cf_clobber svm_function_table = {
>>>  .name = "SVM",
>>>  .cpu_up_prepare   = svm_cpu_up_prepare,
>>>  .cpu_dead = svm_cpu_dead,
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index 41db538a9e3d..758df3321884 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -2473,7 +2473,7 @@ static void cf_check vmx_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
>>>  vmx_vmcs_exit(v);
>>>  }
>>>  
>>> -static struct hvm_function_table __initdata vmx_function_table = {
>>> +static struct hvm_function_table __initdata_cf_clobber vmx_function_table = {
>>>  .name = "VMX",
>>>  .cpu_up_prepare   = vmx_cpu_up_prepare,
>>>  .cpu_dead = vmx_cpu_dead,
>> While I'd like to re-raise my concern regarding the non-pointer fields
>> in these structure instances (just consider a sequence of enough bool
>> bitfields, which effectively can express any value, including ones
>> which would appear like pointers into .text), since for now all is okay
>> afaict:
>> Reviewed-by: Jan Beulich 
> I should probably put something in the commit message too.  It is a
> theoretical risk, but not (IMO) a practical one.

Updated commit message:

x86/hvm: Use __initdata_cf_clobber for hvm_funcs

Now that all calls through hvm_funcs are fully altcall'd, harden all the svm
and vmx function pointer targets.  This drops 106 endbr64 instructions.

Clobbering does come with a theoretical risk.  The non-pointer fields of
{svm,vmx}_function_table can in theory happen to form a bit pattern matching a
pointer into .text at a legal endbr64 instruction, but this is expected to be
implausible for anything liable to pass code review.

While at it, move hvm_funcs into __ro_after_init now that this exists.
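
For concreteness, the shape of that theoretical risk (a contrived sketch, not
real Xen code):

    struct fns {
        void (*fn)(void);     /* genuine pointer: intended clobber target */
        bool a:1, b:1, c:1;   /* ...enough adjacent non-pointer bits could,
                               * in principle, happen to encode the address
                               * of an unrelated endbr64 in .text, which the
                               * clobbering scan would then nop out. */
    };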

~Andrew


Re: [PATCH v2 3/7] x86/altcall: Optimise away endbr64 instruction where possible

2022-02-14 Thread Jan Beulich
On 14.02.2022 17:03, Andrew Cooper wrote:
> On 14/02/2022 13:51, Jan Beulich wrote:
>> On 14.02.2022 14:31, Andrew Cooper wrote:
>>> On 14/02/2022 13:06, Jan Beulich wrote:
 On 14.02.2022 13:56, Andrew Cooper wrote:
> @@ -330,6 +333,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>          text_poke(orig, buf, total_len);
>      }
> +
> +    /*
> +     * Clobber endbr64 instructions now that altcall has finished optimising
> +     * all indirect branches to direct ones.
> +     */
> +    if ( force && cpu_has_xen_ibt )
> +    {
> +        void *const *val;
> +        unsigned int clobbered = 0;
> +
> +        /*
> +         * This is some minor structure (ab)use.  We walk the entire contents
> +         * of .init.{ro,}data.cf_clobber as if it were an array of pointers.
> +         *
> +         * If the pointer points into .text, and at an endbr64 instruction,
> +         * nop out the endbr64.  This causes the pointer to no longer be a
> +         * legal indirect branch target under CET-IBT.  This is a
> +         * defence-in-depth measure, to reduce the options available to an
> +         * adversary who has managed to hijack a function pointer.
> +         */
> +        for ( val = __initdata_cf_clobber_start;
> +              val < __initdata_cf_clobber_end;
> +              val++ )
> +        {
> +            void *ptr = *val;
> +
> +            if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
> +                continue;
> +
> +            add_nops(ptr, 4);
 This literal 4 would be nice to have a #define next to where the ENDBR64
 encoding has its central place.
>>> We don't have an encoding of ENDBR64 in a central place.
>>>
>>> The best you can probably have is
>>>
>>> #define ENDBR64_LEN 4
>>>
>>> in endbr.h ?
>> Perhaps. That's not in this series nor in staging already, so it's a little
>> hard to check. By "central place" I really meant is_endbr64() if that's the
>> only place where the encoding actually appears.
> 
> endbr.h is the header which contains is_endbr64(), and deliberately does
> not contain the raw encoding.

Well, yes, it's intentionally the inverted encoding, but I thought
you would get the point.

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -221,6 +221,12 @@ SECTIONS
> *(.initcall1.init)
> __initcall_end = .;
>  
> +   . = ALIGN(POINTER_ALIGN);
> +   __initdata_cf_clobber_start = .;
> +   *(.init.data.cf_clobber)
> +   *(.init.rodata.cf_clobber)
> +   __initdata_cf_clobber_end = .;
> +
> *(.init.data)
> *(.init.data.rel)
> *(.init.data.rel.*)
 With r/o data ahead and r/w data following, may I suggest to flip the
 order of the two section specifiers you add?
>>> I don't follow.  This is all initdata which is merged together into a
>>> single section.
>>>
>>> The only reason const data is split out in the first place is to appease
>>> the toolchains, not because it makes a difference.
>> It's marginal, I agree, but it would still seem more clean to me if all
>> (pseudo) r/o init data lived side by side.
> 
> I still don't understand what you're asking.
> 
> There is no such thing as actually read-only init data.
> 
> Wherever the .init.rodata goes in here, it's bounded by .init.data.

Well, looking at the linker script again I notice that while r/o items
like .init.setup and .initcall*.init come first, some further ones
(.init_array etc) come quite late. Personally I'd prefer if all r/o
items sat side by side, no matter that currently we munge them all
into a single section. Then, if we decided to stop this practice, all
it would take would be to insert an output section closing and re-
opening. (Or it would have been so until now; with your addition it
wouldn't be as simple anymore anyway.)
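
For reference, the flip suggested earlier would merely be (sketch):

   . = ALIGN(POINTER_ALIGN);
   __initdata_cf_clobber_start = .;
   *(.init.rodata.cf_clobber)
   *(.init.data.cf_clobber)
   __initdata_cf_clobber_end = .;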

But anyway, if at this point I still didn't get my point across, then
please leave things as you have them.

Jan




Re: [PATCH v2 64/70] x86: Introduce helpers/checks for endbr64 instructions

2022-02-14 Thread Andrew Cooper
On 14/02/2022 12:51, Andrew Cooper wrote:
> ... to prevent the optimiser creating unsafe code.  See the code comment for
> full details.
>
> Signed-off-by: Andrew Cooper 

From review in the follow-up series, I've merged this delta:

diff --git a/xen/arch/x86/include/asm/endbr.h
b/xen/arch/x86/include/asm/endbr.h
index 6b6f46afaf29..6090afeb0bd8 100644
--- a/xen/arch/x86/include/asm/endbr.h
+++ b/xen/arch/x86/include/asm/endbr.h
@@ -19,6 +19,8 @@
 
 #include 
 
+#define ENDBR64_LEN 4
+
 /*
  * In some cases we need to inspect/insert endbr64 instructions.
  *

in, to replace some raw 4's.
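
Call sites then read, e.g.:

    add_nops(ptr, ENDBR64_LEN);    /* previously: add_nops(ptr, 4); */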

~Andrew


Re: [PATCH 16/16] x86/P2M: the majority for struct p2m_domain's fields are HVM-only

2022-02-14 Thread Jan Beulich
On 14.02.2022 16:51, George Dunlap wrote:
> 
> 
>> On Jul 5, 2021, at 5:15 PM, Jan Beulich  wrote:
>>
>> ..., as are the majority of the locks involved. Conditionalize things
>> accordingly.
>>
>> Also adjust the ioreq field's indentation at this occasion.
>>
>> Signed-off-by: Jan Beulich 
> 
> Reviewed-by: George Dunlap 

Thanks.

> With one question…
> 
>> @@ -905,10 +917,10 @@ int p2m_altp2m_propagate_change(struct d
>> /* Set a specific p2m view visibility */
>> int p2m_set_altp2m_view_visibility(struct domain *d, unsigned int idx,
>>uint8_t visible);
>> -#else
>> +#else /* CONFIG_HVM */
>> struct p2m_domain *p2m_get_altp2m(struct vcpu *v);
>> static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx) {}
>> -#endif
>> +#endif /* CONFIG_HVM */
> 
> This is relatively minor, but what’s the norm for how to label #else macros 
> here?  Wouldn’t you normally see “#endif /* CONFIG_HVM */“ and think that the 
> immediately preceding lines are compiled only if CONFIG_HVM is defined?  
> I.e., would this be more accurate to write “!CONFIG_HVM” here?
> 
> I realize in this case it’s not a big deal since the #else is just three 
> lines above it, but since you took the time to add the comment in there, it 
> seems like it’s worth the time to have a quick think about whether that’s the 
> right thing to do.

Hmm, yes, let me make this !CONFIG_HVM. I think we're not really
consistent with this, but I agree it's more natural like you say.
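
The resulting convention (sketch):

    #ifdef CONFIG_HVM
    /* HVM-only declarations */
    #else /* !CONFIG_HVM */
    struct p2m_domain *p2m_get_altp2m(struct vcpu *v);
    static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx) {}
    #endif /* CONFIG_HVM */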

Jan




Re: [PATCH v2 3/7] x86/altcall: Optimise away endbr64 instruction where possible

2022-02-14 Thread Andrew Cooper
On 14/02/2022 13:51, Jan Beulich wrote:
> On 14.02.2022 14:31, Andrew Cooper wrote:
>> On 14/02/2022 13:06, Jan Beulich wrote:
>>> On 14.02.2022 13:56, Andrew Cooper wrote:
 @@ -330,6 +333,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
          add_nops(buf + a->repl_len, total_len - a->repl_len);
          text_poke(orig, buf, total_len);
      }
 +
 +    /*
 +     * Clobber endbr64 instructions now that altcall has finished optimising
 +     * all indirect branches to direct ones.
 +     */
 +    if ( force && cpu_has_xen_ibt )
 +    {
 +        void *const *val;
 +        unsigned int clobbered = 0;
 +
 +        /*
 +         * This is some minor structure (ab)use.  We walk the entire contents
 +         * of .init.{ro,}data.cf_clobber as if it were an array of pointers.
 +         *
 +         * If the pointer points into .text, and at an endbr64 instruction,
 +         * nop out the endbr64.  This causes the pointer to no longer be a
 +         * legal indirect branch target under CET-IBT.  This is a
 +         * defence-in-depth measure, to reduce the options available to an
 +         * adversary who has managed to hijack a function pointer.
 +         */
 +        for ( val = __initdata_cf_clobber_start;
 +              val < __initdata_cf_clobber_end;
 +              val++ )
 +        {
 +            void *ptr = *val;
 +
 +            if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
 +                continue;
 +
 +            add_nops(ptr, 4);
>>> This literal 4 would be nice to have a #define next to where the ENDBR64
>>> encoding has its central place.
>> We don't have an encoding of ENDBR64 in a central place.
>>
>> The best you can probably have is
>>
>> #define ENDBR64_LEN 4
>>
>> in endbr.h ?
> Perhaps. That's not in this series nor in staging already, so it's a little
> hard to check. By "central place" I really meant is_endbr64() if that's the
> only place where the encoding actually appears.

endbr.h is the header which contains is_endbr64(), and deliberately does
not contain the raw encoding.

>
 --- a/xen/arch/x86/xen.lds.S
 +++ b/xen/arch/x86/xen.lds.S
 @@ -221,6 +221,12 @@ SECTIONS
 *(.initcall1.init)
 __initcall_end = .;
  
 +   . = ALIGN(POINTER_ALIGN);
 +   __initdata_cf_clobber_start = .;
 +   *(.init.data.cf_clobber)
 +   *(.init.rodata.cf_clobber)
 +   __initdata_cf_clobber_end = .;
 +
 *(.init.data)
 *(.init.data.rel)
 *(.init.data.rel.*)
>>> With r/o data ahead and r/w data following, may I suggest to flip the
>>> order of the two section specifiers you add?
>> I don't follow.  This is all initdata which is merged together into a
>> single section.
>>
>> The only reason const data is split out in the first place is to appease
>> the toolchains, not because it makes a difference.
> It's marginal, I agree, but it would still seem more clean to me if all
> (pseudo) r/o init data lived side by side.

I still don't understand what you're asking.

There is no such thing as actually read-only init data.

Wherever the .init.rodata goes in here, it's bounded by .init.data.

~Andrew


Re: [PATCH 2/3] amd/msr: allow passthrough of VIRT_SPEC_CTRL for HVM guests

2022-02-14 Thread Jan Beulich
On 01.02.2022 17:46, Roger Pau Monne wrote:
> Allow HVM guests untrapped access to MSR_VIRT_SPEC_CTRL if the
> hardware has support for it. This requires adding logic in the
> vm{entry,exit} paths for SVM in order to context switch between the
> hypervisor value and the guest one. The added handlers for context
> switch will also be used for the legacy SSBD support.

So by "hardware" you mean virtual hardware here, when we run
virtualized ourselves? While the wording in AMD's whitepaper suggests
hardware could exist with both MSRs implemented, so far it was my
understanding that VIRT_SPEC_CTRL was rather left for hypervisors to
implement. Maybe I'm wrong with this, in which case some of the
further comments may also be wrong.

> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -687,6 +687,7 @@ void amd_init_lfence(struct cpuinfo_x86 *c)
>   */
>  void amd_init_ssbd(const struct cpuinfo_x86 *c)
>  {
> + struct cpu_info *info = get_cpu_info();
>   int bit = -1;
>  
>   if (cpu_has_ssb_no)
> @@ -699,7 +700,7 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
>  
>   if (cpu_has_virt_ssbd) {
>   wrmsrl(MSR_VIRT_SPEC_CTRL, opt_ssbd ? SPEC_CTRL_SSBD : 0);
> - return;
> + goto out;
>   }
>  
>   switch (c->x86) {
> @@ -729,6 +730,10 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
>  
>   if (bit < 0)
>   printk_once(XENLOG_ERR "No SSBD controls available\n");
> +
> + out:
> + 	info->last_spec_ctrl = info->xen_spec_ctrl = opt_ssbd ? SPEC_CTRL_SSBD : 0;
>  }

Besides me being uncertain about the placement of these (preferably
the writes would be where the other similar writes are), this re-use
of the values suggests that you mean to prefer VIRT_SPEC_CTRL use
over that of SPEC_CTRL (see below).

Additionally - the value you store isn't necessarily the value you
wrote to the MSR. It only is if you cam here via the "goto out".

> --- a/xen/arch/x86/hvm/svm/entry.S
> +++ b/xen/arch/x86/hvm/svm/entry.S
> @@ -71,7 +71,9 @@ __UNLIKELY_END(nsvm_hap)
> mov %al, CPUINFO_last_spec_ctrl(%rsp)
> 1:  /* No Spectre v1 concerns.  Execution will hit VMRUN imminently. */
>  .endm
> -ALTERNATIVE "", svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
> +ALTERNATIVE_2 "", STR(call vmentry_virt_spec_ctrl), \

I'm afraid this violates the "ret" part of the warning a few lines up,
while ...

> +  X86_FEATURE_VIRT_SC_MSR_HVM, \
> +  svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>  
>  pop  %r15
>  pop  %r14
> @@ -111,7 +113,9 @@ __UNLIKELY_END(nsvm_hap)
>  wrmsr
> mov %al, CPUINFO_last_spec_ctrl(%rsp)
>  .endm
> -ALTERNATIVE "", svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
> +ALTERNATIVE_2 "", STR(call vmexit_virt_spec_ctrl), \

... this violates ...

> +  X86_FEATURE_VIRT_SC_MSR_HVM, \
> +  svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
>  /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */

... the "ret" part of this warning.

Furthermore, opposite to what the change to amd_init_ssbd() suggests,
the ordering of the alternatives here means you prefer SPEC_CTRL over
VIRT_SPEC_CTRL; see the comment near the top of _apply_alternatives().
Unless I've missed logic guaranteeing that both of the keyed to
features can't be active at the same time.

Jan




Re: [PATCH 16/16] x86/P2M: the majority for struct p2m_domain's fields are HVM-only

2022-02-14 Thread George Dunlap


> On Jul 5, 2021, at 5:15 PM, Jan Beulich  wrote:
> 
> ..., as are the majority of the locks involved. Conditionalize things
> accordingly.
> 
> Also adjust the ioreq field's indentation at this occasion.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: George Dunlap 

With one question…

> @@ -905,10 +917,10 @@ int p2m_altp2m_propagate_change(struct d
> /* Set a specific p2m view visibility */
> int p2m_set_altp2m_view_visibility(struct domain *d, unsigned int idx,
>uint8_t visible);
> -#else
> +#else /* CONFIG_HVM */
> struct p2m_domain *p2m_get_altp2m(struct vcpu *v);
> static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx) {}
> -#endif
> +#endif /* CONFIG_HVM */

This is relatively minor, but what’s the norm for how to label #else macros 
here?  Wouldn’t you normally see “#endif /* CONFIG_HVM */“ and think that the 
immediately preceding lines are compiled only if CONFIG_HVM is defined?  I.e., 
would this be more accurate to write “!CONFIG_HVM” here?

I realize in this case it’s not a big deal since the #else is just three lines 
above it, but since you took the time to add the comment in there, it seems 
like it’s worth the time to have a quick think about whether that’s the right 
thing to do.

 -George




Re: [PATCH 15/16] x86/P2M: p2m.c is HVM-only

2022-02-14 Thread George Dunlap


> On Jul 5, 2021, at 5:14 PM, Jan Beulich  wrote:
> 
> This only requires moving p2m_percpu_rwlock elsewhere (ultimately I
> think all P2M locking should go away as well when !HVM, but this looks
> to require further code juggling). The two other unguarded functions are
> already unneeded (by virtue of DCE) when !HVM.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: George Dunlap 





Re: [PATCH 14/16] paged_pages field is MEM_PAGING-only

2022-02-14 Thread George Dunlap


> On Jul 5, 2021, at 5:14 PM, Jan Beulich  wrote:
> 
> Conditionalize it and its uses accordingly.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: George Dunlap 





Re: [PATCH 13/16] shr_pages field is MEM_SHARING-only

2022-02-14 Thread George Dunlap


> On Jul 5, 2021, at 5:13 PM, Jan Beulich  wrote:
> 
> Conditionalize it and its uses accordingly. The main goal though is to
> demonstrate that x86's p2m_teardown() is now empty when !HVM, which in
> particular means the last remaining use of p2m_lock() in this cases goes
> away.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: George Dunlap 





Re: SecureBoot and PCI passthrough with kernel lockdown in place (on Xen)

2022-02-14 Thread marma...@invisiblethingslab.com
On Mon, Feb 14, 2022 at 03:25:31PM +, Andrew Cooper wrote:
> On 14/02/2022 15:02, Dario Faggioli wrote:
> > Hello,
> >
> > We have run into an issue when trying to use PCI passthrough for a Xen
> > VM running on an host where dom0 kernel is 5.14.21 (but we think it
> > could be any kernel > 5.4) and SecureBoot is enabled.
> 
> Back up a bit...
> 
> Xen doesn't support SecureBoot and there's a massive pile of work to
> make it function, let alone work in a way that MSFT aren't liable to
> revoke your cert on 0 notice.
> 
> >
> > The error we get, when (for instance) trying to attach a device to an
> > (HVM) VM, on such system is:
> >
> > # xl pci-attach 2-fv-sles15sp4beta2 :58:03.0 
> > libxl: error: libxl_qmp.c:1838:qmp_ev_parse_error_messages: Domain 
> > 12:Failed to initialize 12/15, type = 0x1, rc: -1
> > libxl: error: libxl_pci.c:1777:device_pci_add_done: Domain 
> > 12:libxl__device_pci_add failed for PCI device 0:58:3.0 (rc -28)
> > libxl: error: libxl_device.c:1420:device_addrm_aocomplete: unable to add 
> > device
> >
> > QEMU, is telling us the following:
> >
> > [00:04.0] xen_pt_msix_init: Error: Can't open /dev/mem: Operation not 
> > permitted
> > [00:04.0] xen_pt_msix_size_init: Error: Internal error: Invalid 
> > xen_pt_msix_init.
> >
> > And the kernel reports this:
> >
> > Jan 27 16:20:53 narvi-sr860v2-bps-sles15sp4b2 kernel: Lockdown: 
> > qemu-system-i38: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
> >
> > So, it's related to lockdown. Which AFAIUI it's consistent with the
> > fact that the problem only shows up when SecureBoot is enabled, as
> > that's implies lockdown. It's also consistent with the fact that we
> > don't seem to have any problems doing the same with a 5.3.x dom0
> > kernel... As there's no lockdown there!
> >
> > Some digging revealed that QEMU tries to open /dev/mem in
> > xen_pt_msix_init():
> >
> > fd = open("/dev/mem", O_RDWR);
> > ...
> > msix->phys_iomem_base =
> > mmap(NULL,
> >  total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust,
> >  PROT_READ,
> >  MAP_SHARED | MAP_LOCKED,
> >  fd,
> >  msix->table_base + table_off - msix->table_offset_adjust);
> > close(fd);
> 
> Yes.  Use of /dev/mem is not permitted in lockdown mode.  This wants
> reworking into something which is lockdown compatible.

FWIW, Qubes has PCI passthrough working with qemu in stubdomain, which
works without access to /dev/mem in dom0. We do this, by disabling
MSI-X, including the above piece of code...

https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/master/qemu/patches/0005-Disable-MSI-X-caps.patch

> The real elephant in the room is that privcmd is not remotely safe to
> use in a SecureBoot environment, because it lets any root userspace
> trivially escalate privilege into the dom0 kernel, bypassing the
> specific protection that SecureBoot is trying to achieve.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: SecureBoot and PCI passthrough with kernel lockdown in place (on Xen)

2022-02-14 Thread Andrew Cooper
On 14/02/2022 15:02, Dario Faggioli wrote:
> Hello,
>
> We have run into an issue when trying to use PCI passthrough for a Xen
> VM running on an host where dom0 kernel is 5.14.21 (but we think it
> could be any kernel > 5.4) and SecureBoot is enabled.

Back up a bit...

Xen doesn't support SecureBoot and there's a massive pile of work to
make it function, let alone work in a way that MSFT aren't liable to
revoke your cert on 0 notice.

>
> The error we get, when (for instance) trying to attach a device to an
> (HVM) VM, on such system is:
>
> # xl pci-attach 2-fv-sles15sp4beta2 :58:03.0 
> libxl: error: libxl_qmp.c:1838:qmp_ev_parse_error_messages: Domain 12:Failed 
> to initialize 12/15, type = 0x1, rc: -1
> libxl: error: libxl_pci.c:1777:device_pci_add_done: Domain 
> 12:libxl__device_pci_add failed for PCI device 0:58:3.0 (rc -28)
> libxl: error: libxl_device.c:1420:device_addrm_aocomplete: unable to add 
> device
>
> QEMU, is telling us the following:
>
> [00:04.0] xen_pt_msix_init: Error: Can't open /dev/mem: Operation not 
> permitted
> [00:04.0] xen_pt_msix_size_init: Error: Internal error: Invalid 
> xen_pt_msix_init.
>
> And the kernel reports this:
>
> Jan 27 16:20:53 narvi-sr860v2-bps-sles15sp4b2 kernel: Lockdown: 
> qemu-system-i38: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
>
> So, it's related to lockdown. Which AFAIUI it's consistent with the
> fact that the problem only shows up when SecureBoot is enabled, as
> that's implies lockdown. It's also consistent with the fact that we
> don't seem to have any problems doing the same with a 5.3.x dom0
> kernel... As there's no lockdown there!
>
> Some digging revealed that QEMU tries to open /dev/mem in
> xen_pt_msix_init():
>
> fd = open("/dev/mem", O_RDWR);
> ...
> msix->phys_iomem_base =
> mmap(NULL,
>  total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust,
>  PROT_READ,
>  MAP_SHARED | MAP_LOCKED,
>  fd,
>  msix->table_base + table_off - msix->table_offset_adjust);
> close(fd);

Yes.  Use of /dev/mem is not permitted in lockdown mode.  This wants
reworking into something which is lockdown compatible.

The real elephant in the room is that privcmd is not remotely safe to
use in a SecureBoot environment, because it lets any root userspace
trivially escalate privilege into the dom0 kernel, bypassing the
specific protection that SecureBoot is trying to achieve.

~Andrew


Re: [PATCH 11/16] x86/P2M: derive a HVM-only variant from __get_gfn_type_access()

2022-02-14 Thread Jan Beulich
On 14.02.2022 16:12, George Dunlap wrote:
>> On Jul 5, 2021, at 5:12 PM, Jan Beulich  wrote:
>>
>> Introduce an inline wrapper dealing with the non-translated-domain case,
>> while stripping that logic from the main function, which gets renamed to
>> p2m_get_gfn_type_access(). HVM-only callers can then directly use the
>> main function.
>>
>> Along with renaming the main function also make its and the new inline
>> helper's GFN parameters type-safe.
>>
>> Signed-off-by: Jan Beulich 
> 
> Nit in the title: I read “HVM” as “aych vee emm”, and so I use ‘an’ before it 
> rather than ‘a’; i.e., “derive an HVM-only…”
> 
> I feel obligated to mention it but I’ll leave it to you whether you want to 
> change it or not:

Thanks - I always appreciate clarification on my, frequently, improper
language use. In the case here, however, I know people saying "aych"
as well as ones saying "haych", so I'm always in trouble to judge
which one's right (and probably both are). I therefore decided to
simply drop the "a" from the title, which I think still leaves it be a
proper one.

> Reviewed-by: George Dunlap 

And thanks again.

Jan




Re: [PATCH 12/16] x86/p2m: re-arrange {,__}put_gfn()

2022-02-14 Thread George Dunlap


> On Jul 5, 2021, at 5:12 PM, Jan Beulich  wrote:
> 
> All explicit callers of __put_gfn() are in HVM-only code and hold a valid
> P2M pointer in their hands. Move the paging_mode_translate() check out of
> there into put_gfn(), renaming __put_gfn() and making its GFN parameter
> type-safe.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: George Dunlap 





Re: SecureBoot and PCI passthrough with kernel lockdown in place (on Xen)

2022-02-14 Thread Jan Beulich
On 14.02.2022 16:02, Dario Faggioli wrote:
> We have run into an issue when trying to use PCI passthrough for a Xen
> VM running on an host where dom0 kernel is 5.14.21 (but we think it
> could be any kernel > 5.4) and SecureBoot is enabled.
> 
> The error we get, when (for instance) trying to attach a device to an
> (HVM) VM, on such system is:
> 
> # xl pci-attach 2-fv-sles15sp4beta2 :58:03.0 
> libxl: error: libxl_qmp.c:1838:qmp_ev_parse_error_messages: Domain 12:Failed 
> to initialize 12/15, type = 0x1, rc: -1
> libxl: error: libxl_pci.c:1777:device_pci_add_done: Domain 
> 12:libxl__device_pci_add failed for PCI device 0:58:3.0 (rc -28)
> libxl: error: libxl_device.c:1420:device_addrm_aocomplete: unable to add 
> device
> 
> QEMU, is telling us the following:
> 
> [00:04.0] xen_pt_msix_init: Error: Can't open /dev/mem: Operation not 
> permitted
> [00:04.0] xen_pt_msix_size_init: Error: Internal error: Invalid 
> xen_pt_msix_init.
> 
> And the kernel reports this:
> 
> Jan 27 16:20:53 narvi-sr860v2-bps-sles15sp4b2 kernel: Lockdown: 
> qemu-system-i38: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
> 
> So, it's related to lockdown. Which AFAIUI it's consistent with the
> fact that the problem only shows up when SecureBoot is enabled, as
> that's implies lockdown. It's also consistent with the fact that we
> don't seem to have any problems doing the same with a 5.3.x dom0
> kernel... As there's no lockdown there!
> 
> Some digging revealed that QEMU tries to open /dev/mem in
> xen_pt_msix_init():
> 
> fd = open("/dev/mem", O_RDWR);
> ...
> msix->phys_iomem_base =
> mmap(NULL,
>  total_entries * PCI_MSIX_ENTRY_SIZE + 
> msix->table_offset_adjust,
>  PROT_READ,
>  MAP_SHARED | MAP_LOCKED,
>  fd,
>  msix->table_base + table_off - msix->table_offset_adjust);
> close(fd);

I think this is finally a clear indication that it has always been
wrong for qemu to access hardware directly like this. I see no way
around replacing this by something which isn't a bodge / layering
violation.

Jan

> This comes from commit:
> 
> commit 3854ca577dad92c4fe97b4a6ebce360e25407af7
> Author: Jiang Yunhong 
> Date:   Thu Jun 21 15:42:35 2012 +
> 
> Introduce Xen PCI Passthrough, MSI
> 
> A more complete history can be found here:
> git://xenbits.xensource.com/qemu-xen-unstable.git
> 
> Signed-off-by: Jiang Yunhong 
> Signed-off-by: Shan Haitao 
> Signed-off-by: Anthony PERARD 
> Acked-by: Stefano Stabellini 
> 
> Now, the questions:
> - is this (i.e., PCI-Passthrough with a locked-down dom0 kernel) 
>   working for anyone? I've Cc-ed Marek, because I think I've read that 
>   it does work on QubesOS, but I'm not sure if the situation 
>   is the same...
> - if it's working, how?
> 
> Thanks and Regards




Re: [PATCH 11/16] x86/P2M: derive a HVM-only variant from __get_gfn_type_access()

2022-02-14 Thread George Dunlap


> On Jul 5, 2021, at 5:12 PM, Jan Beulich  wrote:
> 
> Introduce an inline wrapper dealing with the non-translated-domain case,
> while stripping that logic from the main function, which gets renamed to
> p2m_get_gfn_type_access(). HVM-only callers can then directly use the
> main function.
> 
> Along with renaming the main function also make its and the new inline
> helper's GFN parameters type-safe.
> 
> Signed-off-by: Jan Beulich 

Nit in the title: I read “HVM” as “aych vee emm”, and so I use ‘an’ before it 
rather than ‘a’; i.e., “derive an HVM-only…”

I feel obligated to mention it but I’ll leave it to you whether you want to 
change it or not:

Reviewed-by: George Dunlap 





Re: [PATCH 1/3] amd/msr: implement VIRT_SPEC_CTRL for HVM guests on top of SPEC_CTRL

2022-02-14 Thread Jan Beulich
On 01.02.2022 17:46, Roger Pau Monne wrote:
> Use the logic to set shadow SPEC_CTRL values in order to implement
> support for VIRT_SPEC_CTRL (signaled by VIRT_SSBD CPUID flag) for HVM
> guests. This includes using the spec_ctrl vCPU MSR variable to store
> the guest set value of VIRT_SPEC_CTRL.SSBD.

This leverages the guest running on the OR of host and guest values,
aiui. If so, this could do with spelling out.

> Note that VIRT_SSBD is only set in the HVM max CPUID policy, as the
> default should be to expose SPEC_CTRL only and support VIRT_SPEC_CTRL
> for migration compatibility.

I'm afraid I don't understand this last statement: How would this be
about migration compatibility? No guest so far can use VIRT_SPEC_CTRL,
and a future guest using it is unlikely to be able to cope with the
MSR "disappearing" during migration.

> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2273,8 +2273,9 @@ to use.
>  * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
>respectively.
>  * `msr-sc=` offers control over Xen's support for manipulating 
> `MSR_SPEC_CTRL`
> -  on entry and exit.  These blocks are necessary to virtualise support for
> -  guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
> +  and/or `MSR_VIRT_SPEC_CTRL` on entry and exit.  These blocks are necessary to

Why would Xen be manipulating an MSR it only brings into existence for its
guests?

> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -543,6 +543,13 @@ static void __init calculate_hvm_max_policy(void)
>  __clear_bit(X86_FEATURE_IBRSB, hvm_featureset);
>  __clear_bit(X86_FEATURE_IBRS, hvm_featureset);
>  }
> +else
> +/*
> + * If SPEC_CTRL is available VIRT_SPEC_CTRL can also be implemented as
> + * it's a subset of the controls exposed in SPEC_CTRL (SSBD only).
> + * Expose in the max policy for compatibility migration.
> + */
> +__set_bit(X86_FEATURE_VIRT_SSBD, hvm_featureset);

This means even Intel guests can use the feature then? I thought it was
meanwhile deemed bad to offer such cross-vendor features?

Additionally, is SPEC_CTRL (i.e. IBRS) availability enough? Don't you
need AMD_SSBD as a prereq (which may want expressing in gen-cpuid.py)?
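
Expressing the prerequisite would be something along these lines (a sketch;
the exact spelling of the entry in gen-cpuid.py's deps table is assumed):

    # VIRT_SSBD only makes sense when backed by a real SSBD control.
    AMD_SSBD: [VIRT_SSBD],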

> --- a/xen/arch/x86/include/asm/msr.h
> +++ b/xen/arch/x86/include/asm/msr.h
> @@ -291,6 +291,7 @@ struct vcpu_msrs
>  {
>  /*
>   * 0x0048 - MSR_SPEC_CTRL
> + * 0xc001011f - MSR_VIRT_SPEC_CTRL
>   *
>   * For PV guests, this holds the guest kernel value.  It is accessed on
>   * every entry/exit path.
> @@ -301,7 +302,10 @@ struct vcpu_msrs
>   * For SVM, the guest value lives in the VMCB, and hardware saves/restores
>   * the host value automatically.  However, guests run with the OR of the
>   * host and guest value, which allows Xen to set protections behind the
> - * guest's back.
> + * guest's back.  Use such functionality in order to implement support for
> + * VIRT_SPEC_CTRL as a shadow value of SPEC_CTRL and thus store the value
> + * of VIRT_SPEC_CTRL in this field, taking advantage of both MSRs having
> + * compatible layouts.

I guess "shadow value" means more like an alternative value, but
(see above) this is about setting for now just one bit behind the
guest's back.

> --- a/xen/arch/x86/spec_ctrl.c
> +++ b/xen/arch/x86/spec_ctrl.c
> @@ -395,12 +395,13 @@ static void __init print_details(enum ind_thunk thunk, 
> uint64_t caps)
>   * mitigation support for guests.
>   */
>  #ifdef CONFIG_HVM
> -printk("  Support for HVM VMs:%s%s%s%s%s\n",
> +printk("  Support for HVM VMs:%s%s%s%s%s%s\n",
> (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
>  boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
>  boot_cpu_has(X86_FEATURE_MD_CLEAR)   ||
> opt_eager_fpu)   ? ""   : " None",
> boot_cpu_has(X86_FEATURE_SC_MSR_HVM)  ? " MSR_SPEC_CTRL" : "",
> +   boot_cpu_has(X86_FEATURE_SC_MSR_HVM)  ? " MSR_VIRT_SPEC_CTRL" : "",
> boot_cpu_has(X86_FEATURE_SC_RSB_HVM)  ? " RSB"   : "",
> opt_eager_fpu ? " EAGER_FPU" : "",
> boot_cpu_has(X86_FEATURE_MD_CLEAR)? " MD_CLEAR"  : "");

The output getting longish, can the two SC_MSR_HVM dependent items
perhaps be folded, e.g. by making it "MSR_{,VIRT_}SPEC_CTRL"?

> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -265,7 +265,7 @@ XEN_CPUFEATURE(IBRS_SAME_MODE, 8*32+19) /*S  IBRS provides same-mode protection
>  XEN_CPUFEATURE(NO_LMSL,   8*32+20) /*S  EFER.LMSLE no longer supported. */
>  XEN_CPUFEATURE(AMD_PPIN,  8*32+23) /*   Protected Processor Inventory Number */
>  XEN_CPUFEATURE(AMD_SSBD,  8*32+24) /*S  MSR_SPEC_CTRL.SSBD available

SecureBoot and PCI passthrough with kernel lockdown in place (on Xen)

2022-02-14 Thread Dario Faggioli
Hello,

We have run into an issue when trying to use PCI passthrough for a Xen
VM running on an host where dom0 kernel is 5.14.21 (but we think it
could be any kernel > 5.4) and SecureBoot is enabled.

The error we get, when (for instance) trying to attach a device to an
(HVM) VM, on such system is:

# xl pci-attach 2-fv-sles15sp4beta2 :58:03.0 
libxl: error: libxl_qmp.c:1838:qmp_ev_parse_error_messages: Domain 12:Failed to 
initialize 12/15, type = 0x1, rc: -1
libxl: error: libxl_pci.c:1777:device_pci_add_done: Domain 
12:libxl__device_pci_add failed for PCI device 0:58:3.0 (rc -28)
libxl: error: libxl_device.c:1420:device_addrm_aocomplete: unable to add device

QEMU, is telling us the following:

[00:04.0] xen_pt_msix_init: Error: Can't open /dev/mem: Operation not permitted
[00:04.0] xen_pt_msix_size_init: Error: Internal error: Invalid 
xen_pt_msix_init.

And the kernel reports this:

Jan 27 16:20:53 narvi-sr860v2-bps-sles15sp4b2 kernel: Lockdown: 
qemu-system-i38: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7

So, it's related to lockdown. Which AFAIUI it's consistent with the
fact that the problem only shows up when SecureBoot is enabled, as
that's implies lockdown. It's also consistent with the fact that we
don't seem to have any problems doing the same with a 5.3.x dom0
kernel... As there's no lockdown there!

Some digging revealed that QEMU tries to open /dev/mem in
xen_pt_msix_init():

fd = open("/dev/mem", O_RDWR);
...
msix->phys_iomem_base =
mmap(NULL,
         total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust,
 PROT_READ,
 MAP_SHARED | MAP_LOCKED,
 fd,
 msix->table_base + table_off - msix->table_offset_adjust);
close(fd);

This comes from commit:

commit 3854ca577dad92c4fe97b4a6ebce360e25407af7
Author: Jiang Yunhong 
Date:   Thu Jun 21 15:42:35 2012 +

Introduce Xen PCI Passthrough, MSI

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Jiang Yunhong 
Signed-off-by: Shan Haitao 
Signed-off-by: Anthony PERARD 
Acked-by: Stefano Stabellini 

Now, the questions:
- is this (i.e., PCI-Passthrough with a locked-down dom0 kernel) 
  working for anyone? I've Cc-ed Marek, because I think I've read that 
  it does work on QubesOS, but I'm not sure if the situation 
  is the same...
- if it's working, how?

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)




Re: [PATCH v2 00/70] x86: Support for CET Indirect Branch Tracking

2022-02-14 Thread Jan Beulich
On 14.02.2022 15:15, Andrew Cooper wrote:
> On 14/02/2022 13:43, Jan Beulich wrote:
>> On 14.02.2022 14:10, Andrew Cooper wrote:
>>> On 14/02/2022 12:50, Andrew Cooper wrote:
 CET Indirect Branch Tracking is a hardware feature designed to protect against
 forward-edge control flow hijacking (Call/Jump oriented programming), and is a
 companion feature to CET Shadow Stacks added in Xen 4.14.

 Patches 1 thru 5 are prerequisites.  Patches 6 thru 60 are fairly mechanical
 annotations of function pointer targets.  Patches 61 thru 70 are the final
 enablement of CET-IBT.

 This series functions correctly with GCC 9 and later, although an experimental
 GCC patch is required to get more helpful typechecking at build time.

 Tested on a TigerLake NUC.

 CI pipelines:
   https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/470453652
   https://cirrus-ci.com/build/4962308362338304

 Major changes from v1:
  * Boilerplate for mechanical commits
  * UEFI runtime services unconditionally disable IBT
  * Comprehensive build time check for embedded endbr's
>>> There's one thing I considered, and wanted to discuss.
>>>
>>> I'm tempted to rename cf_check to cfi for the function annotation, as
>>> it's shorter without reducing clarity.
>> What would the 'i' stand for in this acronym?
> 
> The class of techniques is called Control Flow Integrity.
> 
>>  Irrespective of the answer
>> I'd like to point out the name collision with the CFI directives at
>> assembler level. This isn't necessarily an objection (I'm certainly for
>> shortening), but we want to avoid introducing confusion.
> 
> I doubt there is confusion to be had here.  One is entirely a compiler
> construct which turns into ENDBR64 instructions in the assembler, and
> one is a general toolchain construct we explicitly disable.

Hmm. I'm still at best half convinced. Plus we generally have been
naming our shorthands after the actual attribute names. By using
"cfi" such a connection would also be largely lost. Roger, Wei,
others - do you opinions either way?

Jan




Re: [PATCH v2 03/70] xen/xsm: Move {do,compat}_flask_op() declarations into a header

2022-02-14 Thread Daniel P. Smith
On 2/14/22 07:50, Andrew Cooper wrote:
> Declaring sideways like this is unsafe, because the compiler can't check that
> the implementation in flask_op.c still has the same type.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Daniel De Graaf 
> CC: Daniel Smith 
> 
> v2:
>  * Rework in the face of no useful progress on the better fix.
> ---
>  xen/xsm/flask/flask_op.c | 1 +
>  xen/xsm/flask/hooks.c| 4 +---
>  xen/xsm/flask/private.h  | 9 +
>  3 files changed, 11 insertions(+), 3 deletions(-)
>  create mode 100644 xen/xsm/flask/private.h
> 
> diff --git a/xen/xsm/flask/flask_op.c b/xen/xsm/flask/flask_op.c
> index 221ff00fd3cc..bb3bebc30e01 100644
> --- a/xen/xsm/flask/flask_op.c
> +++ b/xen/xsm/flask/flask_op.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include "private.h"
>  
>  #define ret_t long
>  #define _copy_to_guest copy_to_guest
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 3b29f7fde372..6ff1be28e4a4 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  #include 
> +#include "private.h"
>  
>  static u32 domain_sid(const struct domain *dom)
>  {
> @@ -1742,9 +1743,6 @@ static int flask_argo_send(const struct domain *d, 
> const struct domain *t)
>  
>  #endif
>  
> -long do_flask_op(XEN_GUEST_HANDLE_PARAM(void) u_flask_op);
> -int compat_flask_op(XEN_GUEST_HANDLE_PARAM(void) u_flask_op);
> -
>  static const struct xsm_ops __initconstrel flask_ops = {
>  .security_domaininfo = flask_security_domaininfo,
>  .domain_create = flask_domain_create,
> diff --git a/xen/xsm/flask/private.h b/xen/xsm/flask/private.h
> new file mode 100644
> index ..73b0de87245a
> --- /dev/null
> +++ b/xen/xsm/flask/private.h
> @@ -0,0 +1,9 @@
> +#ifndef XSM_FLASK_PRIVATE
> +#define XSM_FLASK_PRIVATE
> +
> +#include 
> +
> +long do_flask_op(XEN_GUEST_HANDLE_PARAM(void) u_flask_op);
> +int compat_flask_op(XEN_GUEST_HANDLE_PARAM(void) u_flask_op);
> +
> +#endif /* XSM_FLASK_PRIVATE */

Reviewed-by: Daniel P. Smith 




Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Oleksandr Andrushchenko


On 14.02.22 16:31, Jan Beulich wrote:
> On 14.02.2022 15:26, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 16:19, Jan Beulich wrote:
>>> On 09.02.2022 14:36, Oleksandr Andrushchenko wrote:
 @@ -410,14 +428,37 @@ static void vpci_write_helper(const struct pci_dev *pdev,
                               r->private);
  }
 
 +static bool vpci_header_write_lock(const struct pci_dev *pdev,
 +                                   unsigned int start, unsigned int size)
 +{
 +    /*
 +     * Writing the command register and ROM BAR register may trigger
 +     * modify_bars to run which in turn may access multiple pdevs while
 +     * checking for the existing BAR's overlap. The overlapping check, if done
 +     * under the read lock, requires vpci->lock to be acquired on both devices
 +     * being compared, which may produce a deadlock. It is not possible to
 +     * upgrade read lock to write lock in such a case. So, in order to prevent
 +     * the deadlock, check which registers are going to be written and acquire
 +     * the lock in the appropriate mode from the beginning.
 +     */
 +    if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
 +        return true;
 +
 +    if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
 +        return true;
 +
 +    return false;
 +}
>>> A function of this name gives (especially at the call site(s)) the
>>> impression of acquiring a lock. Considering that of the prefixes
>>> neither "vpci" nor "header" are really relevant here, may I suggest
>>> to use need_write_lock()?
>>>
>>> May I further suggest that you either split the comment or combine
>>> the two if()-s (perhaps even straight into single return statement)?
>>> Personally I'd prefer the single return statement approach here ...
>> That was already questioned by Roger and now it looks like:
>>
>> static bool overlap(unsigned int r1_offset, unsigned int r1_size,
>>       unsigned int r2_offset, unsigned int r2_size)
>> {
>>       /* Return true if there is an overlap. */
>>       return r1_offset < r2_offset + r2_size && r2_offset < r1_offset + 
>> r1_size;
>> }
>>
>> bool vpci_header_write_lock(const struct pci_dev *pdev,
>>       unsigned int start, unsigned int size)
>> {
>>       /*
>>    * Writing the command register and ROM BAR register may trigger
>>    * modify_bars to run which in turn may access multiple pdevs while
>>    * checking for the existing BAR's overlap. The overlapping check, if 
>> done
>>    * under the read lock, requires vpci->lock to be acquired on both 
>> devices
>>    * being compared, which may produce a deadlock. It is not possible to
>>    * upgrade read lock to write lock in such a case. So, in order to 
>> prevent
>>    * the deadlock, check which registers are going to be written and 
>> acquire
>>    * the lock in the appropriate mode from the beginning.
>>    */
>>       if ( overlap(start, size, PCI_COMMAND, 2) ||
>>    (pdev->vpci->header.rom_reg &&
>>     overlap(start, size, pdev->vpci->header.rom_reg, 4)) )
>>       return true;
>>
>>       return false;
>> }
>>
>> vpci_header_write_lock moved to header.c and is not static anymore.
>> So, sitting in header.c, the name seems to be appropriate now
> The prefix of the name - yes. But as said, a function of this name looks
> as if it would acquire a lock. Imo you want to insert "need" or some
> such.
Agree. Then vpci_header_need_write_lock it is.
I will also update the comment, as it gives the impression that
the function acquires the lock.
>
> Jan
>
Thank you,
Oleksandr
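
For reference, a minimal sketch of the shape converged on above (the renamed
function and the rom_reg guard follow this thread; this is a sketch, not the
final patch):

    static bool overlap(unsigned int r1_offset, unsigned int r1_size,
                        unsigned int r2_offset, unsigned int r2_size)
    {
        /* True iff [r1_offset, r1_offset + r1_size) intersects r2. */
        return r1_offset < r2_offset + r2_size &&
               r2_offset < r1_offset + r1_size;
    }

    bool vpci_header_need_write_lock(const struct pci_dev *pdev,
                                     unsigned int start, unsigned int size)
    {
        /*
         * Writes to the command register or the ROM BAR can trigger
         * modify_bars(), which accesses multiple pdevs, so such writes
         * need the per-domain lock in write mode from the start.
         */
        return overlap(start, size, PCI_COMMAND, 2) ||
               (pdev->vpci->header.rom_reg &&
                overlap(start, size, pdev->vpci->header.rom_reg, 4));
    }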

Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Jan Beulich
On 14.02.2022 15:26, Oleksandr Andrushchenko wrote:
> 
> 
> On 14.02.22 16:19, Jan Beulich wrote:
>> On 09.02.2022 14:36, Oleksandr Andrushchenko wrote:
>>> @@ -410,14 +428,37 @@ static void vpci_write_helper(const struct pci_dev 
>>> *pdev,
>>>r->private);
>>>   }
>>>   
>>> +static bool vpci_header_write_lock(const struct pci_dev *pdev,
>>> +   unsigned int start, unsigned int size)
>>> +{
>>> +/*
>>> + * Writing the command register and ROM BAR register may trigger
>>> + * modify_bars to run which in turn may access multiple pdevs while
>>> + * checking for the existing BAR's overlap. The overlapping check, if 
>>> done
>>> + * under the read lock, requires vpci->lock to be acquired on both 
>>> devices
>>> + * being compared, which may produce a deadlock. It is not possible to
>>> + * upgrade read lock to write lock in such a case. So, in order to 
>>> prevent
>>> + * the deadlock, check which registers are going to be written and 
>>> acquire
>>> + * the lock in the appropriate mode from the beginning.
>>> + */
>>> +if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
>>> +return true;
>>> +
>>> +if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
>>> +return true;
>>> +
>>> +return false;
>>> +}
>> A function of this name gives (especially at the call site(s)) the
>> impression of acquiring a lock. Considering that of the prefixes
>> neither "vpci" nor "header" are really relevant here, may I suggest
>> to use need_write_lock()?
>>
>> May I further suggest that you either split the comment or combine
>> the two if()-s (perhaps even straight into single return statement)?
>> Personally I'd prefer the single return statement approach here ...
> That was already questioned by Roger and now it looks like:
> 
> static bool overlap(unsigned int r1_offset, unsigned int r1_size,
>      unsigned int r2_offset, unsigned int r2_size)
> {
>      /* Return true if there is an overlap. */
>      return r1_offset < r2_offset + r2_size && r2_offset < r1_offset + 
> r1_size;
> }
> 
> bool vpci_header_write_lock(const struct pci_dev *pdev,
>      unsigned int start, unsigned int size)
> {
>      /*
>   * Writing the command register and ROM BAR register may trigger
>   * modify_bars to run which in turn may access multiple pdevs while
>   * checking for the existing BAR's overlap. The overlapping check, if 
> done
>   * under the read lock, requires vpci->lock to be acquired on both 
> devices
>   * being compared, which may produce a deadlock. It is not possible to
>   * upgrade read lock to write lock in such a case. So, in order to 
> prevent
>   * the deadlock, check which registers are going to be written and 
> acquire
>   * the lock in the appropriate mode from the beginning.
>   */
>      if ( overlap(start, size, PCI_COMMAND, 2) ||
>   (pdev->vpci->header.rom_reg &&
>    overlap(start, size, pdev->vpci->header.rom_reg, 4)) )
>      return true;
> 
>      return false;
> }
> 
> vpci_header_write_lock moved to header.c and is not static anymore.
> So, sitting in header.c, the name seems to be appropriate now

The prefix of the name - yes. But as said, a function of this name looks
as if it would acquire a lock. Imo you want to insert "need" or some
such.

Jan




Re: [PATCH v2 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-02-14 Thread Jan Beulich
On 08.02.2022 17:17, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 06:21:01PM +, Jane Malalane wrote:
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1115,7 +1115,8 @@ void cpuid_hypervisor_leaves(const struct vcpu *v, 
>> uint32_t leaf,
>>  if ( !is_hvm_domain(d) || subleaf != 0 )
>>  break;
>>  
>> -if ( cpu_has_vmx_apic_reg_virt )
>> +if ( cpu_has_vmx_apic_reg_virt &&
> 
> You can drop the cpu_has_vmx_apic_reg_virt check here, if
> cpu_has_vmx_apic_reg_virt is false assisted_xapic won't be set to true.

Along these lines ...

>> + v->domain->arch.hvm.assisted_xapic )
>>  res->a |= XEN_HVM_CPUID_APIC_ACCESS_VIRT;
>>  
>>  /*
>> @@ -1124,9 +1125,8 @@ void cpuid_hypervisor_leaves(const struct vcpu *v, 
>> uint32_t leaf,
>>   * and wrmsr in the guest will run without VMEXITs (see
>>   * vmx_vlapic_msr_changed()).
>>   */
>> -if ( cpu_has_vmx_virtualize_x2apic_mode &&
>> - cpu_has_vmx_apic_reg_virt &&
>> - cpu_has_vmx_virtual_intr_delivery )
>> +if ( (cpu_has_vmx_apic_reg_virt && 
>> cpu_has_vmx_virtual_intr_delivery) &&
> ^ unneeded parentheses

... this also wants simplifying to just v->domain->arch.hvm.assisted_x2apic:
The apparently stray parentheses were, I think, added in reply to me pointing
out that the check here isn't in line with that put in place by patch 1 in
vmx_init_vmcs_config(). I.e. the inner && really was meant to be || as it
looks. Yet once the two are in line, the same simplification as above is
possible.

Jan
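
For concreteness, a sketch of the simplification being asked for, assuming
patch 1 guarantees assisted_x{2}apic can only be set when the required VMX
features are present:

    if ( v->domain->arch.hvm.assisted_xapic )
        res->a |= XEN_HVM_CPUID_APIC_ACCESS_VIRT;

    if ( v->domain->arch.hvm.assisted_x2apic )
        res->a |= XEN_HVM_CPUID_X2APIC_VIRT;

The cpu_has_vmx_* checks would then live solely in vmx_init_vmcs_config().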




Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Oleksandr Andrushchenko


On 14.02.22 16:19, Jan Beulich wrote:
> On 09.02.2022 14:36, Oleksandr Andrushchenko wrote:
>> @@ -410,14 +428,37 @@ static void vpci_write_helper(const struct pci_dev 
>> *pdev,
>>r->private);
>>   }
>>   
>> +static bool vpci_header_write_lock(const struct pci_dev *pdev,
>> +   unsigned int start, unsigned int size)
>> +{
>> +/*
>> + * Writing the command register and ROM BAR register may trigger
>> + * modify_bars to run which in turn may access multiple pdevs while
>> + * checking for the existing BAR's overlap. The overlapping check, if 
>> done
>> + * under the read lock, requires vpci->lock to be acquired on both 
>> devices
>> + * being compared, which may produce a deadlock. It is not possible to
>> + * upgrade read lock to write lock in such a case. So, in order to 
>> prevent
>> + * the deadlock, check which registers are going to be written and 
>> acquire
>> + * the lock in the appropriate mode from the beginning.
>> + */
>> +if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
>> +return true;
>> +
>> +if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
>> +return true;
>> +
>> +return false;
>> +}
> A function of this name gives (especially at the call site(s)) the
> impression of acquiring a lock. Considering that of the prefixes
> neither "vpci" nor "header" are really relevant here, may I suggest
> to use need_write_lock()?
>
> May I further suggest that you either split the comment or combine
> the two if()-s (perhaps even straight into single return statement)?
> Personally I'd prefer the single return statement approach here ...
That was already questioned by Roger and now it looks like:

static bool overlap(unsigned int r1_offset, unsigned int r1_size,
     unsigned int r2_offset, unsigned int r2_size)
{
     /* Return true if there is an overlap. */
     return r1_offset < r2_offset + r2_size && r2_offset < r1_offset + r1_size;
}

bool vpci_header_write_lock(const struct pci_dev *pdev,
     unsigned int start, unsigned int size)
{
     /*
  * Writing the command register and ROM BAR register may trigger
  * modify_bars to run which in turn may access multiple pdevs while
  * checking for the existing BAR's overlap. The overlapping check, if done
  * under the read lock, requires vpci->lock to be acquired on both devices
  * being compared, which may produce a deadlock. It is not possible to
  * upgrade read lock to write lock in such a case. So, in order to prevent
  * the deadlock, check which registers are going to be written and acquire
  * the lock in the appropriate mode from the beginning.
  */
     if ( overlap(start, size, PCI_COMMAND, 2) ||
  (pdev->vpci->header.rom_reg &&
   overlap(start, size, pdev->vpci->header.rom_reg, 4)) )
     return true;

     return false;
}

vpci_header_write_lock moved to header.c and is not static anymore.
So, sitting in header.c, the name seems to be appropriate now
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH 10/16] x86/P2M: p2m_get_page_from_gfn() is HVM-only

2022-02-14 Thread George Dunlap


> On Jul 5, 2021, at 5:10 PM, Jan Beulich  wrote:
> 
> This function is the wrong layer to go through for PV guests. It happens
> to work, but produces results which aren't fully consistent with
> get_page_from_gfn(). The latter function, however, cannot be used in
> map_domain_gfn() as it may not be the host P2M we mean to act on.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: George Dunlap 





Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Jan Beulich
On 09.02.2022 14:36, Oleksandr Andrushchenko wrote:
> @@ -410,14 +428,37 @@ static void vpci_write_helper(const struct pci_dev 
> *pdev,
>   r->private);
>  }
>  
> +static bool vpci_header_write_lock(const struct pci_dev *pdev,
> +   unsigned int start, unsigned int size)
> +{
> +/*
> + * Writing the command register and ROM BAR register may trigger
> + * modify_bars to run which in turn may access multiple pdevs while
> + * checking for the existing BAR's overlap. The overlapping check, if 
> done
> + * under the read lock, requires vpci->lock to be acquired on both 
> devices
> + * being compared, which may produce a deadlock. It is not possible to
> + * upgrade read lock to write lock in such a case. So, in order to 
> prevent
> + * the deadlock, check which registers are going to be written and 
> acquire
> + * the lock in the appropriate mode from the beginning.
> + */
> +if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
> +return true;
> +
> +if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
> +return true;
> +
> +return false;
> +}

A function of this name gives (especially at the call site(s)) the
impression of acquiring a lock. Considering that of the prefixes
neither "vpci" nor "header" are really relevant here, may I suggest
to use need_write_lock()?

May I further suggest that you either split the comment or combine
the two if()-s (perhaps even straight into single return statement)?
Personally I'd prefer the single return statement approach here ...

Jan




Re: [PATCH v2 00/70] x86: Support for CET Indirect Branch Tracking

2022-02-14 Thread Andrew Cooper
On 14/02/2022 13:43, Jan Beulich wrote:
> On 14.02.2022 14:10, Andrew Cooper wrote:
>> On 14/02/2022 12:50, Andrew Cooper wrote:
>>> CET Indirect Branch Tracking is a hardware feature designed to protect 
>>> against
>>> forward-edge control flow hijacking (Call/Jump oriented programming), and 
>>> is a
>>> companion feature to CET Shadow Stacks added in Xen 4.14.
>>>
>>> Patches 1 thru 5 are prerequisites.  Patches 6 thru 60 are fairly mechanical
>>> annotations of function pointer targets.  Patches 61 thru 70 are the final
>>> enablement of CET-IBT.
>>>
>>> This series functions correctly with GCC 9 and later, although an 
>>> experimental
>>> GCC patch is required to get more helpful typechecking at build time.
>>>
>>> Tested on a TigerLake NUC.
>>>
>>> CI pipelines:
>>>   https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/470453652
>>>   https://cirrus-ci.com/build/4962308362338304
>>>
>>> Major changes from v1:
>>>  * Boilerplate for mechanical commits
>>>  * UEFI runtime services unconditionally disable IBT
>>>  * Comprehensive build time check for embedded endbr's
>> There's one thing I considered, and wanted to discuss.
>>
>> I'm tempted to rename cf_check to cfi for the function annotation, as
>> it's shorter without reducing clarity.
> What would the 'i' stand for in this acronym?

The class of techniques is called Control Flow Integrity.

>  Irrespective of the answer
> I'd like to point out the name collision with the CFI directives at
> assembler level. This isn't necessarily an objection (I'm certainly for
> shortening), but we want to avoid introducing confusion.

I doubt there is confusion to be had here.  One is entirely a compiler
construct which turns into ENDBR64 instructions in the assembler, and
one is a general toolchain construct we explicitly disable.

~Andrew
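
(For contrast: the annotation in this series marks C-level indirect branch
targets, e.g.

    static void cf_check my_handler(void *data);   /* hypothetical example */

whereas the assembler-level .cfi_startproc/.cfi_endproc directives emit
DWARF call-frame information for unwinding, the toolchain construct the
thread notes Xen explicitly disables.)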


Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Jan Beulich
On 14.02.2022 15:00, Oleksandr Andrushchenko wrote:
> /*
> * FIXME: apply_map is called from dom0 specific init code when
> * system_state < SYS_STATE_active, so there is no race condition
> * possible between this code and vpci_process_pending. So, neither
> * vpci_process_pending may try to acquire the lock in read mode and
> * also destroy pdev->vpci in its error path nor pdev may be disposed yet.
> * This means that it is not required to check if the relevant pdev
> * still exists after re-acquiring the lock.
> */

I think I'm okay with this variant, pending me seeing it in context.

Jan




Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Oleksandr Andrushchenko


On 14.02.22 15:48, Jan Beulich wrote:
> On 14.02.2022 14:27, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 15:22, Jan Beulich wrote:
>>> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
 On 14.02.22 14:57, Jan Beulich wrote:
> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>> On Mon, Feb 14, 2022 at 11:15:27AM +, Oleksandr Andrushchenko wrote:
 On 14.02.22 13:11, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 10:53:43AM +, Oleksandr Andrushchenko 
> wrote:
>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>> On Mon, Feb 14, 2022 at 09:36:39AM +, Oleksandr Andrushchenko 
>>> wrote:
 On 11.02.22 13:40, Roger Pau Monné wrote:
> +
  for ( i = 0; i < msix->max_entries; i++ )
  {
  const struct vpci_msix_entry *entry = &msix->entries[i];
>>> Since this function is now called with the per-domain rwlock 
>>> read
>>> locked it's likely not appropriate to call 
>>> process_pending_softirqs
>>> while holding such lock (check below).
>> You are right, as it is possible that:
>>
>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>
>> Even more, vpci_process_pending may also
>>
>> read_unlock -> vpci_remove_device -> write_lock
>>
>> in its error path. So, any invocation of process_pending_softirqs
>> must not hold d->vpci_rwlock at least.
>>
>> And also we need to check that pdev->vpci was not removed
>> in between or *re-created*
>>> We will likely need to re-iterate over the list of pdevs 
>>> assigned to
>>> the domain and assert that the pdev is still assigned to the 
>>> same
>>> domain.
>> So, do you mean a pattern like the below should be used at all
>> places where we need to call process_pending_softirqs?
>>
>> read_unlock
>> process_pending_softirqs
>> read_lock
>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>> 
> Something along those lines. You likely need to continue iterate 
> using
> for_each_pdev.
 How do we tell if pdev->vpci is the same? Jan has already brought
 this question before [1] and I was about to use some ID for that 
 purpose:
 pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id  for 
 checks
>>> Given this is a debug message I would be OK with just doing the
>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>> and that the resume MSI entry is not past the current limit. 
>>> Otherwise
>>> just print a message and move on to the next device.
>> Agree, I see no big issue (probably) if we are not able to print
>>
>> How about this one:
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 809a6b4773e1..50373f04da82 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, 
>> const struct pci_dev *pdev,
>>    struct rangeset *mem, uint16_t 
>> cmd)
>>    {
>>    struct map_data data = { .d = d, .map = true };
>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>    int rc;
>>
>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>> +
>>    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>> +    {
>> +
>> +    /*
>> + * process_pending_softirqs may trigger 
>> vpci_process_pending which
>> + * may need to acquire pdev->domain->vpci_rwlock in read 
>> mode.
>> + */
>> +    write_unlock(&pdev->domain->vpci_rwlock);
>>    process_pending_softirqs();
>> +    write_lock(&pdev->domain->vpci_rwlock);
>> +
>> +    /* Check if pdev still exists and vPCI was not removed or 
>> re-created. */
>> +    if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, 
>> sbdf.devfn) != pdev)
>> +    if ( vpci is NOT the same )
>> +    {
>> +    rc = 0;
>> +    break;
>> +    }
>> +    }
>> +
>> 

Re: [PATCH v2 04/70] x86/pv-shim: Don't modify the hypercall table

2022-02-14 Thread Jan Beulich
On 14.02.2022 14:50, Andrew Cooper wrote:
> On 14/02/2022 13:33, Jan Beulich wrote:
>> On 14.02.2022 13:50, Andrew Cooper wrote:
>>> From: Juergen Gross 
>>>
>>> When running as pv-shim the hypercall is modified today in order to
>>> replace the functions for __HYPERVISOR_event_channel_op and
>>> __HYPERVISOR_grant_table_op hypercalls.
>>>
>>> Change this to call the related functions from the normal handlers
>>> instead when running as shim. The performance implications are not
>>> really relevant, as a normal production hypervisor will not be
>>> configured to support shim mode, so the related calls will be dropped
>>> due to optimization of the compiler.
>>>
>>> Note that for the CONFIG_PV_SHIM_EXCLUSIVE case there is a dummy
>>> wrapper do_grant_table_op() needed, as in this case grant_table.c
>>> isn't being built.
>>>
>>> Signed-off-by: Juergen Gross 
>>> Signed-off-by: Andrew Cooper 
>> I don't think you sync-ed this with Jürgen's v3. There were only minor
>> changes but having a stale version sent two months later isn't very
>> nice.
> 
> I did resync.  What do you think is missing?

A few likely() / unlikely() as far as I could see.

>>> --- a/xen/common/compat/multicall.c
>>> +++ b/xen/common/compat/multicall.c
>>> @@ -5,7 +5,7 @@
>>>  EMIT_FILE;
>>>  
>>>  #include 
>>> -#include 
>>> +#include 
>>>  #include 
>>>  
>>>  #define COMPAT
>>> @@ -19,7 +19,6 @@ static inline void xlat_multicall_entry(struct mc_state 
>>> *mcs)
>>>  mcs->compat_call.args[i] = mcs->call.args[i];
>>>  }
>>>  
>>> -DEFINE_XEN_GUEST_HANDLE(multicall_entry_compat_t);
>>>  #define multicall_entry  compat_multicall_entry
>>>  #define multicall_entry_tmulticall_entry_compat_t
>>>  #define do_multicall_callcompat_multicall_call
>> Jürgen's patch doesn't have any change to this file, and I'm afraid I
>> also don't see how these adjustments are related here. The commit
>> message sadly also doesn't help ...
> 
> The changes are very necessary to split it out of Juergen's series.
> 
> Without the adjustment, the correction of compat_platform_op()'s guest
> handle type from void to compat_platform_op_t doesn't compile.

Interesting. That's quite far from obvious in this context, so clarifying
the purpose in the description would seem helpful.

Coming back to the syncing with v3: Was this change then the reason why
you dropped my R-b?

Jan




Re: [PATCH v2 3/7] x86/altcall: Optimise away endbr64 instruction where possible

2022-02-14 Thread Jan Beulich
On 14.02.2022 14:31, Andrew Cooper wrote:
> On 14/02/2022 13:06, Jan Beulich wrote:
>> On 14.02.2022 13:56, Andrew Cooper wrote:
>>> @@ -330,6 +333,41 @@ static void init_or_livepatch 
>>> _apply_alternatives(struct alt_instr *start,
>>>  add_nops(buf + a->repl_len, total_len - a->repl_len);
>>>  text_poke(orig, buf, total_len);
>>>  }
>>> +
>>> +/*
>>> + * Clobber endbr64 instructions now that altcall has finished 
>>> optimising
>>> + * all indirect branches to direct ones.
>>> + */
>>> +if ( force && cpu_has_xen_ibt )
>>> +{
>>> +void *const *val;
>>> +unsigned int clobbered = 0;
>>> +
>>> +/*
>>> + * This is some minor structure (ab)use.  We walk the entire 
>>> contents
>>> + * of .init.{ro,}data.cf_clobber as if it were an array of 
>>> pointers.
>>> + *
>>> + * If the pointer points into .text, and at an endbr64 instruction,
>>> + * nop out the endbr64.  This causes the pointer to no longer be a
>>> + * legal indirect branch target under CET-IBT.  This is a
>>> + * defence-in-depth measure, to reduce the options available to an
>>> + * adversary who has managed to hijack a function pointer.
>>> + */
>>> +for ( val = __initdata_cf_clobber_start;
>>> +  val < __initdata_cf_clobber_end;
>>> +  val++ )
>>> +{
>>> +void *ptr = *val;
>>> +
>>> +if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
>>> +continue;
>>> +
>>> +add_nops(ptr, 4);
>> This literal 4 would be nice to have a #define next to where the ENDBR64
>> encoding has its central place.
> 
> We don't have an encoding of ENDBR64 in a central place.
> 
> The best you can probably have is
> 
> #define ENDBR64_LEN 4
> 
> in endbr.h ?

Perhaps. That's neither in this series nor in staging yet, so it's a little
hard to check. By "central place" I really meant is_endbr64(), if that's the
only place where the encoding actually appears.
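
For concreteness, a minimal sketch of such a central place (ENDBR64_LEN is
the hypothetical addition; the is_endbr64() body shown is an assumption
based on the endbr64 encoding, not code quoted from this series):

    /* endbr64 encodes as F3 0F 1E FA. */
    #define ENDBR64_LEN 4

    static inline bool is_endbr64(const void *ptr)
    {
        /* Little-endian load of the 4 opcode bytes. */
        return *(const uint32_t *)ptr == 0xfa1e0ff3;
    }

The clobbering loop could then use add_nops(ptr, ENDBR64_LEN) instead of
the literal 4.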

>>> --- a/xen/arch/x86/xen.lds.S
>>> +++ b/xen/arch/x86/xen.lds.S
>>> @@ -221,6 +221,12 @@ SECTIONS
>>> *(.initcall1.init)
>>> __initcall_end = .;
>>>  
>>> +   . = ALIGN(POINTER_ALIGN);
>>> +   __initdata_cf_clobber_start = .;
>>> +   *(.init.data.cf_clobber)
>>> +   *(.init.rodata.cf_clobber)
>>> +   __initdata_cf_clobber_end = .;
>>> +
>>> *(.init.data)
>>> *(.init.data.rel)
>>> *(.init.data.rel.*)
>> With r/o data ahead and r/w data following, may I suggest to flip the
>> order of the two section specifiers you add?
> 
> I don't follow.  This is all initdata which is merged together into a
> single section.
> 
> The only reason const data is split out in the first place is to appease
> the toolchains, not because it makes a difference.

It's marginal, I agree, but it would still seem cleaner to me if all
(pseudo) r/o init data lived side by side.

Jan




Re: [PATCH v2 04/70] x86/pv-shim: Don't modify the hypercall table

2022-02-14 Thread Andrew Cooper
On 14/02/2022 13:33, Jan Beulich wrote:
> On 14.02.2022 13:50, Andrew Cooper wrote:
>> From: Juergen Gross 
>>
>> When running as pv-shim the hypercall is modified today in order to
>> replace the functions for __HYPERVISOR_event_channel_op and
>> __HYPERVISOR_grant_table_op hypercalls.
>>
>> Change this to call the related functions from the normal handlers
>> instead when running as shim. The performance implications are not
>> really relevant, as a normal production hypervisor will not be
>> configured to support shim mode, so the related calls will be dropped
>> due to optimization of the compiler.
>>
>> Note that for the CONFIG_PV_SHIM_EXCLUSIVE case there is a dummy
>> wrapper do_grant_table_op() needed, as in this case grant_table.c
>> isn't being built.
>>
>> Signed-off-by: Juergen Gross 
>> Signed-off-by: Andrew Cooper 
> I don't think you sync-ed this with Jürgen's v3. There were only minor
> changes but having a stale version sent two months later isn't very
> nice.

I did resync.  What do you think is missing?

>
>> --- a/xen/common/compat/multicall.c
>> +++ b/xen/common/compat/multicall.c
>> @@ -5,7 +5,7 @@
>>  EMIT_FILE;
>>  
>>  #include 
>> -#include 
>> +#include 
>>  #include 
>>  
>>  #define COMPAT
>> @@ -19,7 +19,6 @@ static inline void xlat_multicall_entry(struct mc_state 
>> *mcs)
>>  mcs->compat_call.args[i] = mcs->call.args[i];
>>  }
>>  
>> -DEFINE_XEN_GUEST_HANDLE(multicall_entry_compat_t);
>>  #define multicall_entry  compat_multicall_entry
>>  #define multicall_entry_tmulticall_entry_compat_t
>>  #define do_multicall_callcompat_multicall_call
> Jürgen's patch doesn't have any change to this file, and I'm afraid I
> also don't see how these adjustments are related here. The commit
> message sadly also doesn't help ...

The changes are very necessary to split it out of Juergen's series.

Without the adjustment, the correction of compat_platform_op()'s guest
handle type from void to compat_platform_op_t doesn't compile.

~Andrew



Re: Development Issue of Concern

2022-02-14 Thread George Dunlap
On Sat, Feb 12, 2022 at 1:46 AM Elliott Mitchell  wrote:

> The tradition has been to name the active development branch in GIT has
> been named "master".  Quite a number of people object to the name due to
> its history.
>
> In light of such concerns, perhaps the Xen Project should join with other
> similar projects and move to have the active development branch renamed
> "main"?
>

There was a general intention to do that switch a few years ago, but there
were some technical pieces missing.  Probably time to take another look.

 -George


Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Jan Beulich
On 14.02.2022 14:27, Oleksandr Andrushchenko wrote:
> 
> 
> On 14.02.22 15:22, Jan Beulich wrote:
>> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>>>
>>> On 14.02.22 14:57, Jan Beulich wrote:
 On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
> On 14.02.22 13:25, Roger Pau Monné wrote:
>> On Mon, Feb 14, 2022 at 11:15:27AM +, Oleksandr Andrushchenko wrote:
>>> On 14.02.22 13:11, Roger Pau Monné wrote:
 On Mon, Feb 14, 2022 at 10:53:43AM +, Oleksandr Andrushchenko 
 wrote:
> On 14.02.22 12:34, Roger Pau Monné wrote:
>> On Mon, Feb 14, 2022 at 09:36:39AM +, Oleksandr Andrushchenko 
>> wrote:
>>> On 11.02.22 13:40, Roger Pau Monné wrote:
 +
>>> for ( i = 0; i < msix->max_entries; i++ )
>>> {
>>> const struct vpci_msix_entry *entry = &msix->entries[i];
>> Since this function is now called with the per-domain rwlock read
>> locked it's likely not appropriate to call 
>> process_pending_softirqs
>> while holding such lock (check below).
> You are right, as it is possible that:
>
> process_pending_softirqs -> vpci_process_pending -> read_lock
>
> Even more, vpci_process_pending may also
>
> read_unlock -> vpci_remove_device -> write_lock
>
> in its error path. So, any invocation of process_pending_softirqs
> must not hold d->vpci_rwlock at least.
>
> And also we need to check that pdev->vpci was not removed
> in between or *re-created*
>> We will likely need to re-iterate over the list of pdevs 
>> assigned to
>> the domain and assert that the pdev is still assigned to the same
>> domain.
> So, do you mean a pattern like the below should be used at all
> places where we need to call process_pending_softirqs?
>
> read_unlock
> process_pending_softirqs
> read_lock
> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
> 
 Something along those lines. You likely need to continue iterate 
 using
 for_each_pdev.
>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>> this question before [1] and I was about to use some ID for that 
>>> purpose:
>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id  for 
>>> checks
>> Given this is a debug message I would be OK with just doing the
>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>> and that the resume MSI entry is not past the current limit. 
>> Otherwise
>> just print a message and move on to the next device.
> Agree, I see no big issue (probably) if we are not able to print
>
> How about this one:
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 809a6b4773e1..50373f04da82 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, 
> const struct pci_dev *pdev,
>   struct rangeset *mem, uint16_t cmd)
>   {
>   struct map_data data = { .d = d, .map = true };
> +    pci_sbdf_t sbdf = pdev->sbdf;
>   int rc;
>
> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
> +
>   while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
> +    {
> +
> +    /*
> + * process_pending_softirqs may trigger vpci_process_pending 
> which
> + * may need to acquire pdev->domain->vpci_rwlock in read 
> mode.
> + */
> +    write_unlock(&pdev->domain->vpci_rwlock);
>   process_pending_softirqs();
> +    write_lock(&pdev->domain->vpci_rwlock);
> +
> +    /* Check if pdev still exists and vPCI was not removed or 
> re-created. */
> +    if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, 
> sbdf.devfn) != pdev)
> +    if ( vpci is NOT the same )
> +    {
> +    rc = 0;
> +    break;
> +    }
> +    }
> +
>   rangeset_destroy(mem);
>   if ( !rc )
>   modify_decoding(pdev, cmd, false);
>
> This one also wants process_pending_softirqs to run so 
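
Stripped of the mail-quoting damage, the unlock/re-lock pattern being
converged on in this thread looks roughly as follows (a sketch; whether and
how to re-validate pdev->vpci is still under discussion above):

    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
    {
        /*
         * process_pending_softirqs() may run vpci_process_pending(),
         * which takes the per-domain lock, so drop it first.
         */
        write_unlock(&pdev->domain->vpci_rwlock);
        process_pending_softirqs();
        write_lock(&pdev->domain->vpci_rwlock);

        /* Re-validate: the device may have been removed or re-created. */
        if ( pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != pdev )
        {
            rc = 0;
            break;
        }
    }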

Re: [PATCH v2 00/70] x86: Support for CET Indirect Branch Tracking

2022-02-14 Thread Jan Beulich
On 14.02.2022 14:10, Andrew Cooper wrote:
> On 14/02/2022 12:50, Andrew Cooper wrote:
>> CET Indirect Branch Tracking is a hardware feature designed to protect 
>> against
>> forward-edge control flow hijacking (Call/Jump oriented programming), and is 
>> a
>> companion feature to CET Shadow Stacks added in Xen 4.14.
>>
>> Patches 1 thru 5 are prerequisites.  Patches 6 thru 60 are fairly mechanical
>> annotations of function pointer targets.  Patches 61 thru 70 are the final
>> enablement of CET-IBT.
>>
>> This series functions correctly with GCC 9 and later, although an 
>> experimental
>> GCC patch is required to get more helpful typechecking at build time.
>>
>> Tested on a TigerLake NUC.
>>
>> CI pipelines:
>>   https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/470453652
>>   https://cirrus-ci.com/build/4962308362338304
>>
>> Major changes from v1:
>>  * Boilerplate for mechanical commits
>>  * UEFI runtime services unconditionally disable IBT
>>  * Comprehensive build time check for embedded endbr's
> 
> There's one thing I considered, and wanted to discuss.
> 
> I'm tempted to rename cf_check to cfi for the function annotation, as
> it's shorter without reducing clarity.

What would the 'i' stand for in this acronym? Irrespective of the answer
I'd like to point out the name collision with the CFI directives at
assembler level. This isn't necessarily an objection (I'm certainly for
shortening), but we want to avoid introducing confusion.

Jan




Re: [PATCH v2 34/70] x86/emul: CFI hardening

2022-02-14 Thread Jan Beulich
On 14.02.2022 13:50, Andrew Cooper wrote:
> Control Flow Integrity schemes use toolchain and optionally hardware support
> to help protect against call/jump/return oriented programming attacks.
> 
> Use cf_check to annotate function pointer targets for the toolchain.
> 
> pv_emul_is_mem_write() is only used in a single file.  Having it as a static
> inline is pointless because it can't be inlined to begin with.

I'd like you to consider re-wording this: it being static inline was for
the case of a 2nd user appearing. I don't view that as pointless.

Jan




Re: [PATCH v2 5/7] x86/hvm: Use __initdata_cf_clobber for hvm_funcs

2022-02-14 Thread Andrew Cooper
On 14/02/2022 13:10, Jan Beulich wrote:
> On 14.02.2022 13:56, Andrew Cooper wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -88,7 +88,7 @@ unsigned int opt_hvm_debug_level __read_mostly;
>>  integer_param("hvm_debug", opt_hvm_debug_level);
>>  #endif
>>  
>> -struct hvm_function_table hvm_funcs __read_mostly;
>> +struct hvm_function_table __ro_after_init hvm_funcs;
> Strictly speaking this is an unrelated change. I'm fine with it living here,
> but half a sentence would be nice in the description.

I could split it out, but we could probably make 200 patches of
"sprinkle some __ro_after_init around, now that it exists".

>
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -2513,7 +2513,7 @@ static void cf_check svm_set_reg(struct vcpu *v, 
>> unsigned int reg, uint64_t val)
>>  }
>>  }
>>  
>> -static struct hvm_function_table __initdata svm_function_table = {
>> +static struct hvm_function_table __initdata_cf_clobber svm_function_table = 
>> {
>>  .name = "SVM",
>>  .cpu_up_prepare   = svm_cpu_up_prepare,
>>  .cpu_dead = svm_cpu_dead,
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 41db538a9e3d..758df3321884 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2473,7 +2473,7 @@ static void cf_check vmx_set_reg(struct vcpu *v, 
>> unsigned int reg, uint64_t val)
>>  vmx_vmcs_exit(v);
>>  }
>>  
>> -static struct hvm_function_table __initdata vmx_function_table = {
>> +static struct hvm_function_table __initdata_cf_clobber vmx_function_table = 
>> {
>>  .name = "VMX",
>>  .cpu_up_prepare   = vmx_cpu_up_prepare,
>>  .cpu_dead = vmx_cpu_dead,
> While I'd like to re-raise my concern regarding the non-pointer fields
> in these structure instances (just consider a sequence of enough bool
> bitfields, which effectively can express any value, including ones
> which would appear like pointers into .text), since for now all is okay
> afaict:
> Reviewed-by: Jan Beulich 

I should probably put something in the commit message too.  It is a
theoretical risk, but not (IMO) a practical one.

~Andrew


Re: [PATCH v2 04/70] x86/pv-shim: Don't modify the hypercall table

2022-02-14 Thread Jan Beulich
On 14.02.2022 13:50, Andrew Cooper wrote:
> From: Juergen Gross 
> 
> When running as pv-shim the hypercall is modified today in order to
> replace the functions for __HYPERVISOR_event_channel_op and
> __HYPERVISOR_grant_table_op hypercalls.
> 
> Change this to call the related functions from the normal handlers
> instead when running as shim. The performance implications are not
> really relevant, as a normal production hypervisor will not be
> configured to support shim mode, so the related calls will be dropped
> due to optimization of the compiler.
> 
> Note that for the CONFIG_PV_SHIM_EXCLUSIVE case there is a dummy
> wrapper do_grant_table_op() needed, as in this case grant_table.c
> isn't being built.
> 
> Signed-off-by: Juergen Gross 
> Signed-off-by: Andrew Cooper 

I don't think you sync-ed this with Jürgen's v3. There were only minor
changes but having a stale version sent two months later isn't very
nice.

> --- a/xen/common/compat/multicall.c
> +++ b/xen/common/compat/multicall.c
> @@ -5,7 +5,7 @@
>  EMIT_FILE;
>  
>  #include 
> -#include 
> +#include 
>  #include 
>  
>  #define COMPAT
> @@ -19,7 +19,6 @@ static inline void xlat_multicall_entry(struct mc_state 
> *mcs)
>  mcs->compat_call.args[i] = mcs->call.args[i];
>  }
>  
> -DEFINE_XEN_GUEST_HANDLE(multicall_entry_compat_t);
>  #define multicall_entry  compat_multicall_entry
>  #define multicall_entry_tmulticall_entry_compat_t
>  #define do_multicall_callcompat_multicall_call

Jürgen's patch doesn't have any change to this file, and I'm afraid I
also don't see how these adjustments are related here. The commit
message sadly also doesn't help ...

Jan
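
For context, the approach the commit message describes amounts to the
following shape in the normal handler (a sketch; pv_shim_event_channel_op()
is the pre-existing shim implementation the thread refers to):

    long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
    {
        /*
         * In shim mode, forward to the shim implementation instead of
         * patching the hypercall table at boot.  In non-shim builds
         * pv_shim is constant-false, so the branch is optimised away.
         */
        if ( unlikely(pv_shim) )
            return pv_shim_event_channel_op(cmd, arg);

        /* ... normal event channel handling ... */
    }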




Re: [PATCH v2 3/7] x86/altcall: Optimise away endbr64 instruction where possible

2022-02-14 Thread Andrew Cooper
On 14/02/2022 13:06, Jan Beulich wrote:
> On 14.02.2022 13:56, Andrew Cooper wrote:
>> With altcall, we convert indirect branches into direct ones.  With that
>> complete, none of the potential targets need an endbr64 instruction.
>>
>> Furthermore, removing the endbr64 instructions is a security defence-in-depth
>> improvement, because it limits the options available to an attacker who has
>> managed to hijack a function pointer.
>>
>> Introduce new .init.{ro,}data.cf_clobber sections.  Have 
>> _apply_alternatives()
>> walk over this, looking for any pointers into .text, and clobber an endbr64
>> instruction if found.  This is some minor structure (ab)use but it works
>> alarmingly well.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks,

> with two remarks, which ideally would be addressed by respective
> small adjustments:
>
>> @@ -330,6 +333,41 @@ static void init_or_livepatch 
>> _apply_alternatives(struct alt_instr *start,
>>  add_nops(buf + a->repl_len, total_len - a->repl_len);
>>  text_poke(orig, buf, total_len);
>>  }
>> +
>> +/*
>> + * Clobber endbr64 instructions now that altcall has finished optimising
>> + * all indirect branches to direct ones.
>> + */
>> +if ( force && cpu_has_xen_ibt )
>> +{
>> +void *const *val;
>> +unsigned int clobbered = 0;
>> +
>> +/*
>> + * This is some minor structure (ab)use.  We walk the entire 
>> contents
>> + * of .init.{ro,}data.cf_clobber as if it were an array of pointers.
>> + *
>> + * If the pointer points into .text, and at an endbr64 instruction,
>> + * nop out the endbr64.  This causes the pointer to no longer be a
>> + * legal indirect branch target under CET-IBT.  This is a
>> + * defence-in-depth measure, to reduce the options available to an
>> + * adversary who has managed to hijack a function pointer.
>> + */
>> +for ( val = __initdata_cf_clobber_start;
>> +  val < __initdata_cf_clobber_end;
>> +  val++ )
>> +{
>> +void *ptr = *val;
>> +
>> +if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
>> +continue;
>> +
>> +add_nops(ptr, 4);
> This literal 4 would be nice to have a #define next to where the ENDBR64
> encoding has its central place.

We don't have an encoding of ENDBR64 in a central place.

The best you can probably have is

#define ENDBR64_LEN 4

in endbr.h ?

>
>> --- a/xen/arch/x86/xen.lds.S
>> +++ b/xen/arch/x86/xen.lds.S
>> @@ -221,6 +221,12 @@ SECTIONS
>> *(.initcall1.init)
>> __initcall_end = .;
>>  
>> +   . = ALIGN(POINTER_ALIGN);
>> +   __initdata_cf_clobber_start = .;
>> +   *(.init.data.cf_clobber)
>> +   *(.init.rodata.cf_clobber)
>> +   __initdata_cf_clobber_end = .;
>> +
>> *(.init.data)
>> *(.init.data.rel)
>> *(.init.data.rel.*)
> With r/o data ahead and r/w data following, may I suggest to flip the
> order of the two section specifiers you add?

I don't follow.  This is all initdata which is merged together into a
single section.

The only reason const data is split out in the first place is to appease
the toolchains, not because it makes a difference.

~Andrew


[xen-unstable-smoke test] 168110: tolerable all pass - PUSHED

2022-02-14 Thread osstest service owner
flight 168110 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168110/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  94334d854bd358bd1d9c61d5e3306e4d903b120b
baseline version:
 xen  87319afb96973213ec0a76270d93696f3b8d6743

Last test of basis   168071  2022-02-09 17:02:46 Z4 days
Testing same since   168110  2022-02-14 10:00:43 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Norbert Manthey 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   87319afb96..94334d854b  94334d854bd358bd1d9c61d5e3306e4d903b120b -> smoke



Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Oleksandr Andrushchenko


On 14.02.22 15:22, Jan Beulich wrote:
> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 14:57, Jan Beulich wrote:
>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
 On 14.02.22 13:25, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 11:15:27AM +, Oleksandr Andrushchenko wrote:
>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>> On Mon, Feb 14, 2022 at 10:53:43AM +, Oleksandr Andrushchenko wrote:
 On 14.02.22 12:34, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 09:36:39AM +, Oleksandr Andrushchenko 
> wrote:
>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>> +
>> for ( i = 0; i < msix->max_entries; i++ )
>> {
>> const struct vpci_msix_entry *entry = &msix->entries[i];
> Since this function is now called with the per-domain rwlock read
> locked it's likely not appropriate to call 
> process_pending_softirqs
> while holding such lock (check below).
 You are right, as it is possible that:

 process_pending_softirqs -> vpci_process_pending -> read_lock

 Even more, vpci_process_pending may also

 read_unlock -> vpci_remove_device -> write_lock

 in its error path. So, any invocation of process_pending_softirqs
 must not hold d->vpci_rwlock at least.

 And also we need to check that pdev->vpci was not removed
 in between or *re-created*
> We will likely need to re-iterate over the list of pdevs assigned 
> to
> the domain and assert that the pdev is still assigned to the same
> domain.
 So, do you mean a pattern like the below should be used at all
 places where we need to call process_pending_softirqs?

 read_unlock
 process_pending_softirqs
 read_lock
 pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
 if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
 
>>> Something along those lines. You likely need to continue iterate 
>>> using
>>> for_each_pdev.
>> How do we tell if pdev->vpci is the same? Jan has already brought
>> this question before [1] and I was about to use some ID for that 
>> purpose:
>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id  for 
>> checks
> Given this is a debug message I would be OK with just doing the
> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
> and that the resume MSI entry is not past the current limit. Otherwise
> just print a message and move on to the next device.
 Agree, I see no big issue (probably) if we are not able to print

 How about this one:

 diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
 index 809a6b4773e1..50373f04da82 100644
 --- a/xen/drivers/vpci/header.c
 +++ b/xen/drivers/vpci/header.c
 @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, 
 const struct pci_dev *pdev,
   struct rangeset *mem, uint16_t cmd)
   {
   struct map_data data = { .d = d, .map = true };
 +    pci_sbdf_t sbdf = pdev->sbdf;
   int rc;

 + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
 +
   while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
 +    {
 +
 +    /*
 + * process_pending_softirqs may trigger vpci_process_pending 
 which
 + * may need to acquire pdev->domain->vpci_rwlock in read mode.
 + */
 +    write_unlock(&pdev->domain->vpci_rwlock);
   process_pending_softirqs();
 +    write_lock(&pdev->domain->vpci_rwlock);
 +
 +    /* Check if pdev still exists and vPCI was not removed or 
 re-created. */
 +    if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) 
 != pdev)
 +    if ( vpci is NOT the same )
 +    {
 +    rc = 0;
 +    break;
 +    }
 +    }
 +
   rangeset_destroy(mem);
   if ( !rc )
   modify_decoding(pdev, cmd, false);

 This one also wants process_pending_softirqs to run so it *might*
 want pdev and vpci checks. But at the same time apply_map runs
 at ( system_state < SYS_STATE_active ), so defer_map won't be
 running yet, thus no 

Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-14 Thread Jan Beulich
On 14.02.2022 14:11, Jane Malalane wrote:
> On 11/02/2022 11:46, Jan Beulich wrote:
>> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments 
>> unless you have verified the sender and know the content is safe.
>>
>> On 11.02.2022 12:29, Roger Pau Monné wrote:
>>> On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
 On 10/02/2022 10:03, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 7ab15e07a0..4060aef1bd 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
>>MSR_IA32_VMX_PROCBASED_CTLS2, );
>>}
>>
>> +/* Check whether hardware supports accelerated xapic and x2apic. */
>> +if ( bsp )
>> +{
>> +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
>> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
>> + cpu_has_vmx_virtual_intr_delivery) 
>> &&
>> +cpu_has_vmx_virtualize_x2apic_mode;
>
> I've been think about this, and it seems kind of asymmetric that for
> xAPIC mode we report hw assisted support only with
> virtualize_apic_accesses available, while for x2APIC we require
> virtualize_x2apic_mode plus either apic_reg_virt or
> virtual_intr_delivery.
>
> I think we likely need to be more consistent here, and report hw
> assisted x2APIC support as long as virtualize_x2apic_mode is
> available.
>
> This will likely have some effect on patch 2 also, as you will have to
> adjust vmx_vlapic_msr_changed.
>
> Thanks, Roger.

 Any other thoughts on this? As on one hand it is asymmetric but also
 there isn't much assistance with only virtualize_x2apic_mode set as, in
 this case, a VM exit will be avoided only when trying to access the TPR
 register.
>>>
>>> I've been thinking about this, and reporting hardware assisted
>>> x{2}APIC virtualization with just
>>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
>>> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
>>> those provide some assistance to the VMM in order to handle APIC
>>> accesses, it will still require a trap into the hypervisor to handle
>>> most of the accesses.
>>>
>>> So maybe we should only report hardware assisted support when the
>>> mentioned features are present together with
>>> SECONDARY_EXEC_APIC_REGISTER_VIRT?
>>
>> Not sure - "some assistance" seems still a little better than none at all.
>> Which route to go depends on what exactly we intend the bit to be used for.
>>
> True. I intended this bit to be specifically for enabling 
> assisted_x{2}apic. So, would it be inconsistent to report hardware 
> assistance with just VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE 
> but still claim that x{2}apic is virtualized if no MSR accesses are 
> intercepted with XEN_HVM_CPUID_X2APIC_VIRT (in traps.c) so that, as you 
> say, the guest gets at least "some assistance" instead of none but we 
> still claim x{2}apic virtualization when it is actually complete? Maybe 
> I could also add a comment alluding to this in the xl documentation.

To rephrase my earlier point: Which kind of decisions are the consumer(s)
of us reporting hardware assistance going to take? In how far is there a
risk that "some assistance" is overall going to lead to a loss of
performance? I guess I'd need to see comment and actual code all in one
place ...

Jan




Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-14 Thread Jan Beulich
On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
> 
> 
> On 14.02.22 14:57, Jan Beulich wrote:
>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>
>>> On 14.02.22 13:25, Roger Pau Monné wrote:
 On Mon, Feb 14, 2022 at 11:15:27AM +, Oleksandr Andrushchenko wrote:
> On 14.02.22 13:11, Roger Pau Monné wrote:
>> On Mon, Feb 14, 2022 at 10:53:43AM +, Oleksandr Andrushchenko wrote:
>>> On 14.02.22 12:34, Roger Pau Monné wrote:
 On Mon, Feb 14, 2022 at 09:36:39AM +, Oleksandr Andrushchenko 
 wrote:
> On 11.02.22 13:40, Roger Pau Monné wrote:
>> +
>for ( i = 0; i < msix->max_entries; i++ )
>{
>const struct vpci_msix_entry *entry = &msix->entries[i];
 Since this function is now called with the per-domain rwlock read
 locked it's likely not appropriate to call process_pending_softirqs
 while holding such lock (check below).
>>> You are right, as it is possible that:
>>>
>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>
>>> Even more, vpci_process_pending may also
>>>
>>> read_unlock -> vpci_remove_device -> write_lock
>>>
>>> in its error path. So, any invocation of process_pending_softirqs
>>> must not hold d->vpci_rwlock at least.
>>>
>>> And also we need to check that pdev->vpci was not removed
>>> in between or *re-created*
 We will likely need to re-iterate over the list of pdevs assigned 
 to
 the domain and assert that the pdev is still assigned to the same
 domain.
>>> So, do you mean a pattern like the below should be used at all
>>> places where we need to call process_pending_softirqs?
>>>
>>> read_unlock
>>> process_pending_softirqs
>>> read_lock
>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>> 
>> Something along those lines. You likely need to continue iterate 
>> using
>> for_each_pdev.
> How do we tell if pdev->vpci is the same? Jan has already brought
> this question before [1] and I was about to use some ID for that 
> purpose:
> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id  for 
> checks
 Given this is a debug message I would be OK with just doing the
 minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
 and that the resume MSI entry is not past the current limit. Otherwise
 just print a message and move on to the next device.
>>> Agree, I see no big issue (probably) if we are not able to print
>>>
>>> How about this one:
>>>
>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>> index 809a6b4773e1..50373f04da82 100644
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, 
>>> const struct pci_dev *pdev,
>>>  struct rangeset *mem, uint16_t cmd)
>>>  {
>>>  struct map_data data = { .d = d, .map = true };
>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>  int rc;
>>>
>>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>> +
>>>  while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>> +    {
>>> +
>>> +    /*
>>> + * process_pending_softirqs may trigger vpci_process_pending 
>>> which
>>> + * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>> + */
>>> +    write_unlock(&pdev->domain->vpci_rwlock);
>>>  process_pending_softirqs();
>>> +    write_lock(&pdev->domain->vpci_rwlock);
>>> +
>>> +    /* Check if pdev still exists and vPCI was not removed or 
>>> re-created. */
>>> +    if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) 
>>> != pdev)
>>> +    if ( vpci is NOT the same )
>>> +    {
>>> +    rc = 0;
>>> +    break;
>>> +    }
>>> +    }
>>> +
>>>  rangeset_destroy(mem);
>>>  if ( !rc )
>>>  modify_decoding(pdev, cmd, false);
>>>
>>> This one also wants process_pending_softirqs to run so it *might*
>>> want pdev and vpci checks. But at the same time apply_map runs
>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>> running yet, thus no vpci_process_pending is possible yet (in terms
>>> it has something to do yet). So, I think we just need:
>>>
>>>     write_unlock(&pdev->domain->vpci_rwlock);
>>> 

[PATCH v2 12/70] xen: CFI hardening for acpi_table_parse()

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/acpi/boot.c | 24 
 xen/arch/x86/hvm/dom0_build.c| 16 
 xen/arch/x86/include/asm/tboot.h |  2 +-
 xen/arch/x86/srat.c  |  4 ++--
 xen/arch/x86/tboot.c |  2 +-
 xen/arch/x86/x86_64/acpi_mmcfg.c |  2 +-
 xen/arch/x86/x86_64/mmconfig.h   |  2 +-
 xen/drivers/acpi/apei/hest.c |  4 ++--
 xen/drivers/acpi/numa.c  | 10 +-
 xen/drivers/passthrough/amd/iommu_acpi.c |  9 +
 xen/drivers/passthrough/pci.c|  3 ++-
 xen/drivers/passthrough/vtd/dmar.c   |  2 +-
 xen/include/xen/acpi.h   |  2 +-
 13 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index cc4bbc0284fa..54b72d716bed 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -60,7 +60,7 @@ static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
   Boot-time Configuration
   -------------------------------------------------------------------- */
 
-static int __init acpi_parse_madt(struct acpi_table_header *table)
+static int __init cf_check acpi_parse_madt(struct acpi_table_header *table)
 {
struct acpi_table_madt *madt =
container_of(table, struct acpi_table_madt, header);
@@ -77,7 +77,7 @@ static int __init acpi_parse_madt(struct acpi_table_header 
*table)
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
 {
struct acpi_madt_local_x2apic *processor =
@@ -133,7 +133,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, 
const unsigned long end)
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end)
 {
struct acpi_madt_local_apic *processor =
@@ -171,7 +171,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, 
const unsigned long end)
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
  const unsigned long end)
 {
@@ -187,7 +187,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * 
header,
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_x2apic_nmi(struct acpi_subtable_header *header,
  const unsigned long end)
 {
@@ -206,7 +206,7 @@ acpi_parse_x2apic_nmi(struct acpi_subtable_header *header,
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long 
end)
 {
struct acpi_madt_local_apic_nmi *lapic_nmi =
@@ -223,7 +223,7 @@ acpi_parse_lapic_nmi(struct acpi_subtable_header * header, 
const unsigned long e
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long 
end)
 {
struct acpi_madt_io_apic *ioapic =
@@ -240,7 +240,7 @@ acpi_parse_ioapic(struct acpi_subtable_header * header, 
const unsigned long end)
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
   const unsigned long end)
 {
@@ -267,7 +267,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
return 0;
 }
 
-static int __init
+static int __init cf_check
 acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long 
end)
 {
struct acpi_madt_nmi_source *nmi_src =
@@ -283,7 +283,7 @@ acpi_parse_nmi_src(struct acpi_subtable_header * header, 
const unsigned long end
return 0;
 }
 
-static int __init acpi_parse_hpet(struct acpi_table_header *table)
+static int __init cf_check acpi_parse_hpet(struct acpi_table_header *table)
 {
const struct acpi_table_hpet *hpet_tbl =
container_of(table, const struct acpi_table_hpet, header);
@@ -319,7 +319,7 @@ static int __init acpi_parse_hpet(struct acpi_table_header 
*table)
return 0;
 }
 
-static int __init acpi_invalidate_bgrt(struct acpi_table_header *table)
+static int __init cf_check acpi_invalidate_bgrt(struct acpi_table_header 
*table)
 {
struct acpi_table_bgrt *bgrt_tbl =
container_of(table, struct acpi_table_bgrt, header);
@@ -472,7 +472,7 @@ acpi_fadt_parse_sleep_info(const struct acpi_table_fadt 
*fadt)
   acpi_sinfo.wakeup_vector, acpi_sinfo.vector_width);
 }
 
-static int __init acpi_parse_fadt(struct acpi_table_header *table)

[PATCH v2 19/70] xsm: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Reviewed-by: Daniel P. Smith 
---
 xen/include/xsm/dummy.h  | 211 ++
 xen/xsm/flask/flask_op.c |   2 +-
 xen/xsm/flask/hooks.c| 232 ++-
 xen/xsm/flask/private.h  |   4 +-
 xen/xsm/silo.c   |  24 ++---
 5 files changed, 257 insertions(+), 216 deletions(-)
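
Why static inlines need the annotation at all: with XSM compiled in, the same
definitions are built out-of-line and their addresses populate the default
ops table, so every hook becomes an indirect-call target.  A rough sketch,
with assumed field names rather than the exact table:

    static const struct xsm_ops dummy_ops = {
        .security_domaininfo = xsm_security_domaininfo,
        .domain_create       = xsm_domain_create,
        .getdomaininfo       = xsm_getdomaininfo,
    };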

diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index b024119896e6..58afc1d58973 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -101,46 +101,48 @@ static always_inline int xsm_default_action(
 }
 }
 
-static XSM_INLINE void xsm_security_domaininfo(
+static XSM_INLINE void cf_check xsm_security_domaininfo(
 struct domain *d, struct xen_domctl_getdomaininfo *info)
 {
 return;
 }
 
-static XSM_INLINE int xsm_domain_create(
+static XSM_INLINE int cf_check xsm_domain_create(
 XSM_DEFAULT_ARG struct domain *d, uint32_t ssidref)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, current->domain, d);
 }
 
-static XSM_INLINE int xsm_getdomaininfo(XSM_DEFAULT_ARG struct domain *d)
+static XSM_INLINE int cf_check xsm_getdomaininfo(
+XSM_DEFAULT_ARG struct domain *d)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, current->domain, d);
 }
 
-static XSM_INLINE int xsm_domctl_scheduler_op(
+static XSM_INLINE int cf_check xsm_domctl_scheduler_op(
 XSM_DEFAULT_ARG struct domain *d, int cmd)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, current->domain, d);
 }
 
-static XSM_INLINE int xsm_sysctl_scheduler_op(XSM_DEFAULT_ARG int cmd)
+static XSM_INLINE int cf_check xsm_sysctl_scheduler_op(XSM_DEFAULT_ARG int cmd)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, current->domain, NULL);
 }
 
-static XSM_INLINE int xsm_set_target(
+static XSM_INLINE int cf_check xsm_set_target(
 XSM_DEFAULT_ARG struct domain *d, struct domain *e)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, current->domain, NULL);
 }
 
-static XSM_INLINE int xsm_domctl(XSM_DEFAULT_ARG struct domain *d, int cmd)
+static XSM_INLINE int cf_check xsm_domctl(
+XSM_DEFAULT_ARG struct domain *d, int cmd)
 {
 XSM_ASSERT_ACTION(XSM_OTHER);
 switch ( cmd )
@@ -157,91 +159,93 @@ static XSM_INLINE int xsm_domctl(XSM_DEFAULT_ARG struct 
domain *d, int cmd)
 }
 }
 
-static XSM_INLINE int xsm_sysctl(XSM_DEFAULT_ARG int cmd)
+static XSM_INLINE int cf_check xsm_sysctl(XSM_DEFAULT_ARG int cmd)
 {
 XSM_ASSERT_ACTION(XSM_PRIV);
 return xsm_default_action(action, current->domain, NULL);
 }
 
-static XSM_INLINE int xsm_readconsole(XSM_DEFAULT_ARG uint32_t clear)
+static XSM_INLINE int cf_check xsm_readconsole(XSM_DEFAULT_ARG uint32_t clear)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, current->domain, NULL);
 }
 
-static XSM_INLINE int xsm_alloc_security_domain(struct domain *d)
+static XSM_INLINE int cf_check xsm_alloc_security_domain(struct domain *d)
 {
 return 0;
 }
 
-static XSM_INLINE void xsm_free_security_domain(struct domain *d)
+static XSM_INLINE void cf_check xsm_free_security_domain(struct domain *d)
 {
 return;
 }
 
-static XSM_INLINE int xsm_grant_mapref(
+static XSM_INLINE int cf_check xsm_grant_mapref(
 XSM_DEFAULT_ARG struct domain *d1, struct domain *d2, uint32_t flags)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, d1, d2);
 }
 
-static XSM_INLINE int xsm_grant_unmapref(
+static XSM_INLINE int cf_check xsm_grant_unmapref(
 XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, d1, d2);
 }
 
-static XSM_INLINE int xsm_grant_setup(
+static XSM_INLINE int cf_check xsm_grant_setup(
 XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
 {
 XSM_ASSERT_ACTION(XSM_TARGET);
 return xsm_default_action(action, d1, d2);
 }
 
-static XSM_INLINE int xsm_grant_transfer(
+static XSM_INLINE int cf_check xsm_grant_transfer(
 XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, d1, d2);
 }
 
-static XSM_INLINE int xsm_grant_copy(
+static XSM_INLINE int cf_check xsm_grant_copy(
 XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
 return xsm_default_action(action, d1, d2);
 }
 
-static XSM_INLINE int xsm_grant_query_size(
+static XSM_INLINE int cf_check xsm_grant_query_size(
 XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
 {
 XSM_ASSERT_ACTION(XSM_TARGET);
 return xsm_default_action(action, d1, d2);
 }
 
-static XSM_INLINE int xsm_memory_exchange(XSM_DEFAULT_ARG 

Re: [PATCH v2 3/7] x86/altcall: Optimise away endbr64 instruction where possible

2022-02-14 Thread Jan Beulich
On 14.02.2022 13:56, Andrew Cooper wrote:
> With altcall, we convert indirect branches into direct ones.  With that
> complete, none of the potential targets need an endbr64 instruction.
> 
> Furthermore, removing the endbr64 instructions is a security defence-in-depth
> improvement, because it limits the options available to an attacker who has
> managed to hijack a function pointer.
> 
> Introduce new .init.{ro,}data.cf_clobber sections.  Have _apply_alternatives()
> walk over this, looking for any pointers into .text, and clobber an endbr64
> instruction if found.  This is some minor structure (ab)use but it works
> alarmingly well.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 
with two remarks, which ideally would be addressed by respective
small adjustments:

> @@ -330,6 +333,41 @@ static void init_or_livepatch _apply_alternatives(struct 
> alt_instr *start,
>  add_nops(buf + a->repl_len, total_len - a->repl_len);
>  text_poke(orig, buf, total_len);
>  }
> +
> +/*
> + * Clobber endbr64 instructions now that altcall has finished optimising
> + * all indirect branches to direct ones.
> + */
> +if ( force && cpu_has_xen_ibt )
> +{
> +void *const *val;
> +unsigned int clobbered = 0;
> +
> +/*
> + * This is some minor structure (ab)use.  We walk the entire contents
> + * of .init.{ro,}data.cf_clobber as if it were an array of pointers.
> + *
> + * If the pointer points into .text, and at an endbr64 instruction,
> + * nop out the endbr64.  This causes the pointer to no longer be a
> + * legal indirect branch target under CET-IBT.  This is a
> + * defence-in-depth measure, to reduce the options available to an
> + * adversary who has managed to hijack a function pointer.
> + */
> +for ( val = __initdata_cf_clobber_start;
> +  val < __initdata_cf_clobber_end;
> +  val++ )
> +{
> +void *ptr = *val;
> +
> +if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
> +continue;
> +
> +add_nops(ptr, 4);

This literal 4 would be nice to have a #define next to where the ENDBR64
encoding has its central place.
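
I.e. something along these lines (a sketch only; the names are illustrative,
not the committed ones):

    /* asm/endbr.h: keep the length beside the opcode check. */
    #define ENDBR64_LEN 4

    static inline bool is_endbr64(const void *ptr)
    {
        /* endbr64 encodes as f3 0f 1e fa; compared as a little-endian uint32_t. */
        return *(const uint32_t *)ptr == 0xfa1e0ff3;
    }

with the clobbering loop then reading add_nops(ptr, ENDBR64_LEN).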

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -221,6 +221,12 @@ SECTIONS
> *(.initcall1.init)
> __initcall_end = .;
>  
> +   . = ALIGN(POINTER_ALIGN);
> +   __initdata_cf_clobber_start = .;
> +   *(.init.data.cf_clobber)
> +   *(.init.rodata.cf_clobber)
> +   __initdata_cf_clobber_end = .;
> +
> *(.init.data)
> *(.init.data.rel)
> *(.init.data.rel.*)

With r/o data ahead and r/w data following, may I suggest to flip the
order of the two section specifiers you add?
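
I.e. (only the two inner lines swap):

   . = ALIGN(POINTER_ALIGN);
   __initdata_cf_clobber_start = .;
   *(.init.rodata.cf_clobber)
   *(.init.data.cf_clobber)
   __initdata_cf_clobber_end = .;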

Jan




[PATCH v2 17/70] xen: CFI hardening for open_softirq()

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/cpu/mcheck/mce.c   | 2 +-
 xen/arch/x86/domain.c   | 2 +-
 xen/arch/x86/include/asm/flushtlb.h | 2 +-
 xen/arch/x86/pv/traps.c | 2 +-
 xen/arch/x86/smp.c  | 2 +-
 xen/arch/x86/time.c | 2 +-
 xen/common/rcupdate.c   | 2 +-
 xen/common/sched/core.c | 6 +++---
 xen/common/tasklet.c| 2 +-
 xen/common/timer.c  | 2 +-
 xen/drivers/passthrough/x86/hvm.c   | 2 +-
 11 files changed, 13 insertions(+), 13 deletions(-)
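
All of these handlers end up in the softirq dispatch table and are called
indirectly from do_softirq(), hence the annotation.  A minimal sketch of the
registration pattern, using open_softirq()'s shape:

    static void cf_check vcpu_kick_softirq(void);

    /* Illustrative registration; the handler is later reached via an
     * indirect call through the softirq table. */
    open_softirq(VCPU_KICK_SOFTIRQ, vcpu_kick_softirq);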

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 43f6c8471a90..3467e0f1a315 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1837,7 +1837,7 @@ static int mce_delayed_action(mctelem_cookie_t mctc)
 }
 
 /* Softirq Handler for this MCE# processing */
-static void mce_softirq(void)
+static void cf_check mce_softirq(void)
 {
 static DEFINE_MCE_BARRIER(mce_inside_bar);
 static DEFINE_MCE_BARRIER(mce_severity_bar);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f943283b2a88..1c3a1ec2a080 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2523,7 +2523,7 @@ void vcpu_mark_events_pending(struct vcpu *v)
 vcpu_kick(v);
 }
 
-static void vcpu_kick_softirq(void)
+static void cf_check vcpu_kick_softirq(void)
 {
 /*
  * Nothing to do here: we merely prevent notifiers from racing with checks
diff --git a/xen/arch/x86/include/asm/flushtlb.h 
b/xen/arch/x86/include/asm/flushtlb.h
index 0be2273387ed..18777f1d4c00 100644
--- a/xen/arch/x86/include/asm/flushtlb.h
+++ b/xen/arch/x86/include/asm/flushtlb.h
@@ -87,7 +87,7 @@ static inline void tlbflush_filter(cpumask_t *mask, uint32_t 
page_timestamp)
 __cpumask_clear_cpu(cpu, mask);
 }
 
-void new_tlbflush_clock_period(void);
+void cf_check new_tlbflush_clock_period(void);
 
 /* Read pagetable base. */
 static inline unsigned long read_cr3(void)
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 170e1030982b..97fe54b5ee5a 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -130,7 +130,7 @@ bool set_guest_nmi_trapbounce(void)
 
 static DEFINE_PER_CPU(struct vcpu *, softirq_nmi_vcpu);
 
-static void nmi_softirq(void)
+static void cf_check nmi_softirq(void)
 {
 struct vcpu **v_ptr = &this_cpu(softirq_nmi_vcpu);
 
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index f6fd7f95df58..b9a696f61963 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -290,7 +290,7 @@ void flush_area_mask(const cpumask_t *mask, const void *va, 
unsigned int flags)
 }
 
 /* Call with no locks held and interrupts enabled (e.g., softirq context). */
-void new_tlbflush_clock_period(void)
+void cf_check new_tlbflush_clock_period(void)
 {
 cpumask_t allbutself;
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index b444d6344e79..5a72b66800e4 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1455,7 +1455,7 @@ int cpu_frequency_change(u64 freq)
 static DEFINE_PER_CPU(struct cpu_time_stamp, cpu_calibration);
 
 /* Softirq handler for per-CPU time calibration. */
-static void local_time_calibration(void)
+static void cf_check local_time_calibration(void)
 {
 struct cpu_time *t = &this_cpu(cpu_time);
 const struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 423d6b1d6d02..212a99acd8c8 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -466,7 +466,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp,
 rcu_do_batch(rdp);
 }
 
-static void rcu_process_callbacks(void)
+static void cf_check rcu_process_callbacks(void)
 {
 struct rcu_data *rdp = &this_cpu(rcu_data);
 
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 285de9ee2a19..b1836b591c0a 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2568,7 +2568,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct 
sched_unit *prev,
 return prev->next_task;
 }
 
-static void sched_slave(void)
+static void cf_check sched_slave(void)
 {
 struct vcpu  *v, *vprev = current;
 struct sched_unit*prev = vprev->sched_unit, *next;
@@ -2632,7 +2632,7 @@ static void sched_slave(void)
  * - deschedule the current domain (scheduler independent).
  * - pick a new domain (scheduler dependent).
  */
-static void schedule(void)
+static void cf_check schedule(void)
 {
 struct vcpu  *vnext, *vprev = current;
 struct sched_unit*prev = vprev->sched_unit, *next = NULL;
@@ -2928,7 +2928,7 @@ const cpumask_t *sched_get_opt_cpumask(enum sched_gran 
opt, unsigned int cpu)
 return mask;
 }
 
-static void 

[PATCH v2 47/70] x86/logdirty: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/mm/hap/hap.c   |  6 +++---
 xen/arch/x86/mm/shadow/common.c | 12 ++--
 2 files changed, 9 insertions(+), 9 deletions(-)
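
Both triples are consumed as a log-dirty ops vtable, which is where the
indirect calls come from.  A sketch of the wiring, assuming the
log_dirty_ops structure has this rough shape:

    static const struct log_dirty_ops sh_ops = {
        .enable  = sh_enable_log_dirty,
        .disable = sh_disable_log_dirty,
        .clean   = sh_clean_dirty_bitmap,
    };

    paging_log_dirty_init(d, &sh_ops);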

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index de4b13565ab4..ed5112b00b63 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -180,7 +180,7 @@ int hap_track_dirty_vram(struct domain *d,
  * NB: Domain that having device assigned should not set log_global. Because
  * there is no way to track the memory updating from device.
  */
-static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
+static int cf_check hap_enable_log_dirty(struct domain *d, bool log_global)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
@@ -211,7 +211,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t 
log_global)
 return 0;
 }
 
-static int hap_disable_log_dirty(struct domain *d)
+static int cf_check hap_disable_log_dirty(struct domain *d)
 {
 paging_lock(d);
 d->arch.paging.mode &= ~PG_log_dirty;
@@ -228,7 +228,7 @@ static int hap_disable_log_dirty(struct domain *d)
 return 0;
 }
 
-static void hap_clean_dirty_bitmap(struct domain *d)
+static void cf_check hap_clean_dirty_bitmap(struct domain *d)
 {
 /*
  * Switch to log-dirty mode, either by setting l1e entries of P2M table to
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 83dedc8870aa..071a19adce82 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -40,9 +40,9 @@
 
 DEFINE_PER_CPU(uint32_t,trace_shadow_path_flags);
 
-static int sh_enable_log_dirty(struct domain *, bool log_global);
-static int sh_disable_log_dirty(struct domain *);
-static void sh_clean_dirty_bitmap(struct domain *);
+static int cf_check sh_enable_log_dirty(struct domain *, bool log_global);
+static int cf_check sh_disable_log_dirty(struct domain *);
+static void cf_check sh_clean_dirty_bitmap(struct domain *);
 
 /* Set up the shadow-specific parts of a domain struct at start of day.
  * Called for every domain from arch_domain_create() */
@@ -3016,7 +3016,7 @@ static int shadow_test_disable(struct domain *d)
 /* Shadow specific code which is called in paging_log_dirty_enable().
  * Return 0 if no problem found.
  */
-static int sh_enable_log_dirty(struct domain *d, bool log_global)
+static int cf_check sh_enable_log_dirty(struct domain *d, bool log_global)
 {
 int ret;
 
@@ -3044,7 +3044,7 @@ static int sh_enable_log_dirty(struct domain *d, bool 
log_global)
 }
 
 /* shadow specfic code which is called in paging_log_dirty_disable() */
-static int sh_disable_log_dirty(struct domain *d)
+static int cf_check sh_disable_log_dirty(struct domain *d)
 {
 int ret;
 
@@ -3058,7 +3058,7 @@ static int sh_disable_log_dirty(struct domain *d)
 /* This function is called when we CLEAN log dirty bitmap. See
  * paging_log_dirty_op() for details.
  */
-static void sh_clean_dirty_bitmap(struct domain *d)
+static void cf_check sh_clean_dirty_bitmap(struct domain *d)
 {
 paging_lock(d);
 /* Need to revoke write access to the domain's pages again.
-- 
2.11.0




[PATCH v2 61/70] x86/setup: Read CR4 earlier in __start_xen()

2022-02-14 Thread Andrew Cooper
This is necessary for read_cr4() to function correctly.  Move the EFER caching
at the same time.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
 xen/arch/x86/setup.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
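
For reference, read_cr4() never executes a mov from %cr4; it returns a
per-cpu cached value, which is why the cache has to be seeded this early.
Simplified sketch of the accessor:

    static inline unsigned long read_cr4(void)
    {
        return get_cpu_info()->cr4;   /* stale/garbage until seeded */
    }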

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 735f69d2cae8..2b1192d85b77 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -888,6 +888,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
 /* Full exception support from here on in. */
 
+rdmsrl(MSR_EFER, this_cpu(efer));
+asm volatile ( "mov %%cr4,%0" : "=r" (get_cpu_info()->cr4) );
+
 /* Enable NMIs.  Our loader (e.g. Tboot) may have left them disabled. */
 enable_nmis();
 
@@ -934,9 +937,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
 parse_video_info();
 
-rdmsrl(MSR_EFER, this_cpu(efer));
-asm volatile ( "mov %%cr4,%0" : "=r" (get_cpu_info()->cr4) );
-
 /* We initialise the serial devices very early so we can get debugging. */
 ns16550.io_base = 0x3f8;
 ns16550.irq = 4;
-- 
2.11.0




[PATCH v2 70/70] x86: Enable CET Indirect Branch Tracking

2022-02-14 Thread Andrew Cooper
With all the pieces now in place, turn CET-IBT on when available.

MSR_S_CET, like SMEP/SMAP, controls Ring1 meaning that ENDBR_EN can't be
enabled for Xen independently of PV32 kernels.  As we already disable PV32 for
CET-SS, extend this to all CET, adjusting the documentation/comments as
appropriate.

Introduce a cet=no-ibt command line option to allow the admin to disable IBT
even when everything else is configured correctly.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Rebase over change to UEFI RS handling
---
 docs/misc/xen-command-line.pandoc | 16 +++
 xen/arch/x86/cpu/common.c |  1 +
 xen/arch/x86/setup.c  | 42 ++-
 3 files changed, 50 insertions(+), 9 deletions(-)
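
For illustration, the resulting command line combinations (hypothetical boot
entries, on CET-IBT capable hardware):

    xen.gz cet=no-ibt            # shadow stacks if available, no IBT
    xen.gz cet=no-shstk,no-ibt   # no CET use by Xen at all
    xen.gz cet=ibt               # explicit opt-in (the default here)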

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 1ca817f5e1b9..92891a856971 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -271,7 +271,7 @@ enough. Setting this to a high value may cause boot 
failure, particularly if
 the NMI watchdog is also enabled.
 
 ### cet
-= List of [ shstk=<bool> ]
+= List of [ shstk=<bool>, ibt=<bool> ]
 
 Applicability: x86
 
 Controls for the use of Control-flow Enforcement Technology.  CET is a group
 of hardware features designed to combat Return-oriented Programming (ROP, also
 call/jmp COP/JOP) attacks.
 
+CET is incompatible with 32bit PV guests.  If any CET sub-options are active,
+they will override the `pv=32` boolean to `false`.  Backwards compatibility
+can be maintained with the pv-shim mechanism.
+
 *   The `shstk=` boolean controls whether Xen uses Shadow Stacks for its own
 protection.
 
@@ -287,9 +291,13 @@ call/jmp COP/JOP) attacks.
 `cet=no-shstk` will cause Xen not to use Shadow Stacks even when support
 is available in hardware.
 
-Shadow Stacks are incompatible with 32bit PV guests.  This option will
-override the `pv=32` boolean to false.  Backwards compatibility can be
-maintained with the `pv-shim` mechanism.
+*   The `ibt=` boolean controls whether Xen uses Indirect Branch Tracking for
+its own protection.
+
+The option is available when `CONFIG_XEN_IBT` is compiled in, and defaults
+to `true` on hardware supporting CET-IBT.  Specifying `cet=no-ibt` will
+cause Xen not to use Indirect Branch Tracking even when support is
+available in hardware.
 
 ### clocksource (x86)
 > `= pit | hpet | acpi | tsc`
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 6b674bf15e8b..bfb8cf9f100b 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -345,6 +345,7 @@ void __init early_cpu_init(void)
if (c->cpuid_level >= 7) {
	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
c->x86_capability[cpufeat_word(X86_FEATURE_CET_SS)] = ecx;
+   c->x86_capability[cpufeat_word(X86_FEATURE_CET_IBT)] = edx;
}
 
	eax = cpuid_eax(0x80000000);
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f6a59d5f0412..f5449c972825 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -102,6 +102,12 @@ static bool __initdata opt_xen_shstk = true;
 #define opt_xen_shstk false
 #endif
 
+#ifdef CONFIG_XEN_IBT
+static bool __initdata opt_xen_ibt = true;
+#else
+#define opt_xen_ibt false
+#endif
+
 static int __init cf_check parse_cet(const char *s)
 {
 const char *ss;
@@ -120,6 +126,14 @@ static int __init cf_check parse_cet(const char *s)
 no_config_param("XEN_SHSTK", "cet", s, ss);
 #endif
 }
+else if ( (val = parse_boolean("ibt", s, ss)) >= 0 )
+{
+#ifdef CONFIG_XEN_IBT
+opt_xen_ibt = val;
+#else
+no_config_param("XEN_IBT", "cet", s, ss);
+#endif
+}
 else
 rc = -EINVAL;
 
@@ -1118,11 +1132,33 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 printk("Enabling Supervisor Shadow Stacks\n");
 
 setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK);
+}
+
+if ( opt_xen_ibt && boot_cpu_has(X86_FEATURE_CET_IBT) )
+{
+printk("Enabling Indirect Branch Tracking\n");
+
+setup_force_cpu_cap(X86_FEATURE_XEN_IBT);
+
+if ( efi_enabled(EFI_RS) )
+printk("  - IBT disabled in UEFI Runtime Services\n");
+
+/*
+ * Enable IBT now.  Only require the endbr64 on callees, which is
+ * entirely build-time arrangements.
+ */
+wrmsrl(MSR_S_CET, CET_ENDBR_EN);
+}
+
+if ( cpu_has_xen_shstk || cpu_has_xen_ibt )
+{
+set_in_cr4(X86_CR4_CET);
+
 #ifdef CONFIG_PV32
 if ( opt_pv32 )
 {
 opt_pv32 = 0;
-printk("  - Disabling PV32 due to Shadow Stacks\n");
+printk("  - Disabling PV32 due to CET\n");
 }
 #endif
 }
@@ -1849,10 +1885,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
 

[PATCH v2 21/70] xen/evtchn: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/common/event_2l.c  | 21 -
 xen/common/event_channel.c |  3 ++-
 xen/common/event_fifo.c| 30 --
 3 files changed, 30 insertions(+), 24 deletions(-)
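
These functions form the per-ABI port-ops vtable, which is the source of the
indirect calls.  A trimmed sketch of the consumer side (the real structure
has more hooks than shown):

    struct evtchn_port_ops {
        void (*set_pending)(struct vcpu *v, struct evtchn *evtchn);
        void (*clear_pending)(struct domain *d, struct evtchn *evtchn);
        void (*unmask)(struct domain *d, struct evtchn *evtchn);
    };

    static const struct evtchn_port_ops evtchn_port_ops_2l = {
        .set_pending   = evtchn_2l_set_pending,
        .clear_pending = evtchn_2l_clear_pending,
        .unmask        = evtchn_2l_unmask,
    };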

diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
index 7424320e525a..d40dd51ab555 100644
--- a/xen/common/event_2l.c
+++ b/xen/common/event_2l.c
@@ -16,7 +16,8 @@
 
 #include 
 
-static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
+static void cf_check evtchn_2l_set_pending(
+struct vcpu *v, struct evtchn *evtchn)
 {
 struct domain *d = v->domain;
 unsigned int port = evtchn->port;
@@ -41,12 +42,14 @@ static void evtchn_2l_set_pending(struct vcpu *v, struct 
evtchn *evtchn)
 evtchn_check_pollers(d, port);
 }
 
-static void evtchn_2l_clear_pending(struct domain *d, struct evtchn *evtchn)
+static void cf_check evtchn_2l_clear_pending(
+struct domain *d, struct evtchn *evtchn)
 {
 guest_clear_bit(d, evtchn->port, &shared_info(d, evtchn_pending));
 }
 
-static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
+static void cf_check evtchn_2l_unmask(
+struct domain *d, struct evtchn *evtchn)
 {
 struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id];
 unsigned int port = evtchn->port;
@@ -64,8 +67,8 @@ static void evtchn_2l_unmask(struct domain *d, struct evtchn 
*evtchn)
 }
 }
 
-static bool evtchn_2l_is_pending(const struct domain *d,
- const struct evtchn *evtchn)
+static bool cf_check evtchn_2l_is_pending(
+const struct domain *d, const struct evtchn *evtchn)
 {
 evtchn_port_t port = evtchn->port;
 unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
@@ -75,8 +78,8 @@ static bool evtchn_2l_is_pending(const struct domain *d,
 guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
 }
 
-static bool evtchn_2l_is_masked(const struct domain *d,
-const struct evtchn *evtchn)
+static bool cf_check evtchn_2l_is_masked(
+const struct domain *d, const struct evtchn *evtchn)
 {
 evtchn_port_t port = evtchn->port;
 unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
@@ -86,8 +89,8 @@ static bool evtchn_2l_is_masked(const struct domain *d,
 guest_test_bit(d, port, &shared_info(d, evtchn_mask)));
 }
 
-static void evtchn_2l_print_state(struct domain *d,
-  const struct evtchn *evtchn)
+static void cf_check evtchn_2l_print_state(
+struct domain *d, const struct evtchn *evtchn)
 {
 struct vcpu *v = d->vcpu[evtchn->notify_vcpu_id];
 
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 2026bc30dc95..183e78ac17f1 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -99,7 +99,8 @@ static xen_event_channel_notification_t __read_mostly
 xen_consumers[NR_XEN_CONSUMERS];
 
 /* Default notification action: wake up from wait_on_xen_event_channel(). */
-static void default_xen_notification_fn(struct vcpu *v, unsigned int port)
+static void cf_check default_xen_notification_fn(
+struct vcpu *v, unsigned int port)
 {
 /* Consumer needs notification only if blocked. */
 if ( test_and_clear_bit(_VPF_blocked_in_xen, &v->pause_flags) )
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 2fb01b82db84..ed4d3beb10f3 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -78,7 +78,7 @@ static inline event_word_t *evtchn_fifo_word_from_port(const 
struct domain *d,
 return d->evtchn_fifo->event_array[p] + w;
 }
 
-static void evtchn_fifo_init(struct domain *d, struct evtchn *evtchn)
+static void cf_check evtchn_fifo_init(struct domain *d, struct evtchn *evtchn)
 {
 event_word_t *word;
 
@@ -158,7 +158,8 @@ static bool_t evtchn_fifo_set_link(struct domain *d, 
event_word_t *word,
 return 1;
 }
 
-static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
+static void cf_check evtchn_fifo_set_pending(
+struct vcpu *v, struct evtchn *evtchn)
 {
 struct domain *d = v->domain;
 unsigned int port;
@@ -317,7 +318,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct 
evtchn *evtchn)
 evtchn_check_pollers(d, port);
 }
 
-static void evtchn_fifo_clear_pending(struct domain *d, struct evtchn *evtchn)
+static void cf_check evtchn_fifo_clear_pending(
+struct domain *d, struct evtchn *evtchn)
 {
 event_word_t *word;
 
@@ -334,7 +336,7 @@ static void evtchn_fifo_clear_pending(struct domain *d, 
struct evtchn *evtchn)
 guest_clear_bit(d, EVTCHN_FIFO_PENDING, word);
 }
 
-static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn)
+static void cf_check evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn)

[PATCH v2 67/70] x86/entry: Make IDT entrypoints CET-IBT compatible

2022-02-14 Thread Andrew Cooper
Each IDT vector needs to land on an endbr64 instruction.  This is especially
important for the #CP handler, which will recurse indefinitely if the endbr64
is missing, eventually escalating to #DF if guard pages are active.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Extra newlines in asm
 * Reword commit message
---
 xen/arch/x86/x86_64/compat/entry.S |  1 +
 xen/arch/x86/x86_64/entry.S| 30 --
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/x86_64/compat/entry.S 
b/xen/arch/x86/x86_64/compat/entry.S
index c84ff7ea6476..5fd6dbbd4513 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -12,6 +12,7 @@
 #include 
 
 ENTRY(entry_int82)
+ENDBR64
 ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 pushq $0
 movl  $HYPERCALL_VECTOR, 4(%rsp)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 9abcf95bd010..ea6f0afbc2b4 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -386,6 +386,7 @@ UNLIKELY_END(sysenter_gpf)
 jmp   .Lbounce_exception
 
 ENTRY(int80_direct_trap)
+ENDBR64
 ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 pushq $0
 movl  $0x80, 4(%rsp)
@@ -698,6 +699,7 @@ ENTRY(common_interrupt)
 jmp ret_from_intr
 
 ENTRY(page_fault)
+ENDBR64
 movl  $TRAP_page_fault,4(%rsp)
 /* No special register assumptions. */
 GLOBAL(handle_exception)
@@ -872,75 +874,91 @@ FATAL_exception_with_ints_disabled:
 BUG   /* fatal_trap() shouldn't return. */
 
 ENTRY(divide_error)
+ENDBR64
 pushq $0
 movl  $TRAP_divide_error,4(%rsp)
 jmp   handle_exception
 
 ENTRY(coprocessor_error)
+ENDBR64
 pushq $0
 movl  $TRAP_copro_error,4(%rsp)
 jmp   handle_exception
 
 ENTRY(simd_coprocessor_error)
+ENDBR64
 pushq $0
 movl  $TRAP_simd_error,4(%rsp)
 jmp   handle_exception
 
 ENTRY(device_not_available)
+ENDBR64
 pushq $0
 movl  $TRAP_no_device,4(%rsp)
 jmp   handle_exception
 
 ENTRY(debug)
+ENDBR64
 pushq $0
 movl  $TRAP_debug,4(%rsp)
 jmp   handle_ist_exception
 
 ENTRY(int3)
+ENDBR64
 pushq $0
 movl  $TRAP_int3,4(%rsp)
 jmp   handle_exception
 
 ENTRY(overflow)
+ENDBR64
 pushq $0
 movl  $TRAP_overflow,4(%rsp)
 jmp   handle_exception
 
 ENTRY(bounds)
+ENDBR64
 pushq $0
 movl  $TRAP_bounds,4(%rsp)
 jmp   handle_exception
 
 ENTRY(invalid_op)
+ENDBR64
 pushq $0
 movl  $TRAP_invalid_op,4(%rsp)
 jmp   handle_exception
 
 ENTRY(invalid_TSS)
+ENDBR64
 movl  $TRAP_invalid_tss,4(%rsp)
 jmp   handle_exception
 
 ENTRY(segment_not_present)
+ENDBR64
 movl  $TRAP_no_segment,4(%rsp)
 jmp   handle_exception
 
 ENTRY(stack_segment)
+ENDBR64
 movl  $TRAP_stack_error,4(%rsp)
 jmp   handle_exception
 
 ENTRY(general_protection)
+ENDBR64
 movl  $TRAP_gp_fault,4(%rsp)
 jmp   handle_exception
 
 ENTRY(alignment_check)
+ENDBR64
 movl  $TRAP_alignment_check,4(%rsp)
 jmp   handle_exception
 
 ENTRY(entry_CP)
+ENDBR64
 movl  $X86_EXC_CP, 4(%rsp)
 jmp   handle_exception
 
 ENTRY(double_fault)
+ENDBR64
 movl  $TRAP_double_fault,4(%rsp)
 /* Set AC to reduce chance of further SMAP faults */
 ALTERNATIVE "", stac, X86_FEATURE_XEN_SMAP
@@ -966,6 +984,7 @@ ENTRY(double_fault)
 
 .pushsection .init.text, "ax", @progbits
 ENTRY(early_page_fault)
+ENDBR64
 movl  $TRAP_page_fault,4(%rsp)
 SAVE_ALL
 movq  %rsp,%rdi
@@ -974,6 +993,7 @@ ENTRY(early_page_fault)
 .popsection
 
 ENTRY(nmi)
+ENDBR64
 pushq $0
 movl  $TRAP_nmi,4(%rsp)
 handle_ist_exception:
@@ -1102,12 +1122,14 @@ handle_ist_exception:
 #endif
 
 ENTRY(machine_check)
+ENDBR64
 pushq $0
 movl  $TRAP_machine_check,4(%rsp)
 jmp   handle_ist_exception
 
 /* No op trap handler.  Required for kexec crash path. */
 GLOBAL(trap_nop)
+ENDBR64
 iretq
 
 /* Table of automatically generated entry points.  One per vector. */
@@ -1136,7 +1158,9 @@ autogen_stubs: /* Automatically generated stubs. */
 #endif
 
 ALIGN
-1:  pushq $0
+1:
+ENDBR64
+pushq $0
 movb  $vec,4(%rsp)
 jmp   common_interrupt
 
@@ -1146,7 +1170,9 @@ autogen_stubs: /* Automatically generated stubs. */
 .elseif vec == X86_EXC_CSO || vec == X86_EXC_SPV || \
 vec == X86_EXC_VE  || (vec > X86_EXC_CP && vec < TRAP_nr)
 
-1:  test  $8,%spl/* 64bit exception frames are 16 byte aligned, 
but the 

[PATCH v2 54/70] x86/dpci: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/hvm/hvm.c| 4 ++--
 xen/drivers/passthrough/vtd/x86/hvm.c | 4 ++--
 xen/drivers/passthrough/x86/hvm.c | 8 
 3 files changed, 8 insertions(+), 8 deletions(-)
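
All three are pt_pirq_iterate() callbacks.  A sketch of the iterator they
plug into (signature as in Xen, simplified):

    int pt_pirq_iterate(struct domain *d,
                        int (*cb)(struct domain *,
                                  struct hvm_pirq_dpci *, void *),
                        void *arg);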

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4cf313a0ad0a..cdd1529014f2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -474,8 +474,8 @@ void hvm_migrate_pirq(struct hvm_pirq_dpci *pirq_dpci, 
const struct vcpu *v)
 }
 }
 
-static int migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
-void *arg)
+static int cf_check migrate_pirq(
+struct domain *d, struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
 hvm_migrate_pirq(pirq_dpci, arg);
 
diff --git a/xen/drivers/passthrough/vtd/x86/hvm.c 
b/xen/drivers/passthrough/vtd/x86/hvm.c
index b531fe907a94..132d252d1cca 100644
--- a/xen/drivers/passthrough/vtd/x86/hvm.c
+++ b/xen/drivers/passthrough/vtd/x86/hvm.c
@@ -21,8 +21,8 @@
 #include 
 #include 
 
-static int _hvm_dpci_isairq_eoi(struct domain *d,
-struct hvm_pirq_dpci *pirq_dpci, void *arg)
+static int cf_check _hvm_dpci_isairq_eoi(
+struct domain *d, struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
 struct hvm_irq *hvm_irq = hvm_domain_irq(d);
 unsigned int isairq = (long)arg;
diff --git a/xen/drivers/passthrough/x86/hvm.c 
b/xen/drivers/passthrough/x86/hvm.c
index 0e3c0f6aeed3..0f94203af817 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -777,8 +777,8 @@ static void __msi_pirq_eoi(struct hvm_pirq_dpci *pirq_dpci)
 }
 }
 
-static int _hvm_dpci_msi_eoi(struct domain *d,
- struct hvm_pirq_dpci *pirq_dpci, void *arg)
+static int cf_check _hvm_dpci_msi_eoi(
+struct domain *d, struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
 int vector = (long)arg;
 
@@ -947,8 +947,8 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi)
 spin_unlock(&d->event_lock);
 }
 
-static int pci_clean_dpci_irq(struct domain *d,
-  struct hvm_pirq_dpci *pirq_dpci, void *arg)
+static int cf_check pci_clean_dpci_irq(
+struct domain *d, struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
 struct dev_intx_gsi_link *digl, *tmp;
 
-- 
2.11.0




[PATCH v2 24/70] xen/keyhandler: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Tweak {IRQ_,}KEYHANDLER() to use a named initialiser instead of requiring a
pointer cast to compile in the IRQ case.

Reposition iommu_dump_page_tables() to avoid a forward declaration.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/acpi/cpu_idle.c |  2 +-
 xen/arch/x86/hvm/irq.c   |  2 +-
 xen/arch/x86/hvm/svm/vmcb.c  |  2 +-
 xen/arch/x86/hvm/vmx/vmcs.c  |  2 +-
 xen/arch/x86/io_apic.c   |  2 +-
 xen/arch/x86/irq.c   |  2 +-
 xen/arch/x86/mm/p2m-ept.c|  2 +-
 xen/arch/x86/mm/shadow/common.c  |  4 +--
 xen/arch/x86/msi.c   |  2 +-
 xen/arch/x86/nmi.c   |  4 +--
 xen/arch/x86/numa.c  |  2 +-
 xen/arch/x86/time.c  |  2 +-
 xen/common/debugtrace.c  |  2 +-
 xen/common/event_channel.c   |  2 +-
 xen/common/grant_table.c |  2 +-
 xen/common/kexec.c   |  2 +-
 xen/common/keyhandler.c  | 35 -
 xen/common/livepatch.c   |  2 +-
 xen/common/page_alloc.c  |  4 +--
 xen/common/perfc.c   |  4 +--
 xen/common/sched/cpupool.c   |  2 +-
 xen/common/spinlock.c|  4 +--
 xen/common/timer.c   |  2 +-
 xen/drivers/char/console.c   |  8 ++---
 xen/drivers/passthrough/amd/iommu.h  |  2 +-
 xen/drivers/passthrough/amd/iommu_intr.c |  2 +-
 xen/drivers/passthrough/iommu.c  | 52 +++-
 xen/drivers/passthrough/pci.c|  2 +-
 xen/drivers/passthrough/vtd/extern.h |  2 +-
 xen/drivers/passthrough/vtd/utils.c  |  2 +-
 xen/include/xen/perfc.h  |  4 +--
 xen/include/xen/sched.h  |  2 +-
 xen/include/xen/spinlock.h   |  4 +--
 33 files changed, 86 insertions(+), 83 deletions(-)
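
The {IRQ_,}KEYHANDLER() tweak mentioned above, sketched with an abbreviated
structure (the real one also carries a description and flags):

    struct keyhandler {
        union {
            keyhandler_fn_t *fn;          /* regular handlers */
            irq_keyhandler_fn_t *irq_fn;  /* IRQ-context handlers */
        };
    };

    /* Before, both macros initialised .fn, forcing a cast in the IRQ case.
     * Naming the union member keeps the two types distinct: */
    #define KEYHANDLER(f)     { .fn = (f) }
    #define IRQ_KEYHANDLER(f) { .irq_fn = (f) }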

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 22c8bb0c2d94..0142671bb836 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -377,7 +377,7 @@ static void print_acpi_power(uint32_t cpu, struct 
acpi_processor_power *power)
 print_hw_residencies(cpu);
 }
 
-static void dump_cx(unsigned char key)
+static void cf_check dump_cx(unsigned char key)
 {
 unsigned int cpu;
 
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 6045c9149bad..a7f8991a7b84 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -635,7 +635,7 @@ static void irq_dump(struct domain *d)
hvm_irq->callback_via_asserted ? "" : " not");
 }
 
-static void dump_irq_info(unsigned char key)
+static void cf_check dump_irq_info(unsigned char key)
 {
 struct domain *d;
 
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index efa085032bb5..958309657799 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -226,7 +226,7 @@ void svm_destroy_vmcb(struct vcpu *v)
 svm->vmcb = NULL;
 }
 
-static void vmcb_dump(unsigned char ch)
+static void cf_check vmcb_dump(unsigned char ch)
 {
 struct domain *d;
 struct vcpu *v;
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 2b6bafe9d542..d2cafd8ca1c5 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -2117,7 +2117,7 @@ void vmcs_dump_vcpu(struct vcpu *v)
 vmx_vmcs_exit(v);
 }
 
-static void vmcs_dump(unsigned char ch)
+static void cf_check vmcs_dump(unsigned char ch)
 {
 struct domain *d;
 struct vcpu *v;
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 4135a9c06052..4c5eaef86273 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1268,7 +1268,7 @@ static void __init print_IO_APIC(void)
 __print_IO_APIC(1);
 }
 
-static void _print_IO_APIC_keyhandler(unsigned char key)
+static void cf_check _print_IO_APIC_keyhandler(unsigned char key)
 {
 __print_IO_APIC(0);
 }
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index d9bd355113d7..f43b926ed26b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2424,7 +2424,7 @@ void free_domain_pirqs(struct domain *d)
 pcidevs_unlock();
 }
 
-static void dump_irqs(unsigned char key)
+static void cf_check dump_irqs(unsigned char key)
 {
 int i, irq, pirq;
 struct irq_desc *desc;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index b7ee441d4573..a8a6ad629528 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1433,7 +1433,7 @@ static const char *memory_type_to_str(unsigned int x)
 return memory_types[x][0] ? memory_types[x] : "?";
 }
 
-static void 

[PATCH v2 31/70] x86: CFI hardening for request_irq()

2022-02-14 Thread Andrew Cooper
... and friends; alloc_direct_apic_vector() and set_direct_apic_vector().

Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/apic.c  |  8 
 xen/arch/x86/cpu/mcheck/mce_intel.c  |  4 ++--
 xen/arch/x86/guest/xen/xen.c |  2 +-
 xen/arch/x86/hpet.c  |  4 ++--
 xen/arch/x86/hvm/vmx/vmx.c   |  4 ++--
 xen/arch/x86/include/asm/irq.h   | 16 
 xen/arch/x86/irq.c   |  2 +-
 xen/arch/x86/smp.c   |  6 +++---
 xen/arch/x86/time.c  |  3 ++-
 xen/drivers/passthrough/amd/iommu_init.c |  4 ++--
 xen/drivers/passthrough/vtd/iommu.c  |  4 ++--
 11 files changed, 29 insertions(+), 28 deletions(-)
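
The common thread is that each function is passed as the handler argument to
request_irq() or to the direct-APIC-vector helpers.  The registration
surface, abbreviated:

    int request_irq(unsigned int irq, unsigned int irqflags,
                    void (*handler)(int irq, void *dev_id,
                                    struct cpu_user_regs *regs),
                    const char *devname, void *dev_id);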

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 68e4d870c749..5a7a58dc9830 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1361,7 +1361,7 @@ int reprogram_timer(s_time_t timeout)
 return apic_tmict || !timeout;
 }
 
-void apic_timer_interrupt(struct cpu_user_regs * regs)
+void cf_check apic_timer_interrupt(struct cpu_user_regs *regs)
 {
 ack_APIC_irq();
 perfc_incr(apic_timer);
@@ -1380,7 +1380,7 @@ void smp_send_state_dump(unsigned int cpu)
 /*
  * Spurious interrupts should _never_ happen with our APIC/SMP architecture.
  */
-void spurious_interrupt(struct cpu_user_regs *regs)
+void cf_check spurious_interrupt(struct cpu_user_regs *regs)
 {
 /*
  * Check if this is a vectored interrupt (most likely, as this is probably
@@ -1411,7 +1411,7 @@ void spurious_interrupt(struct cpu_user_regs *regs)
  * This interrupt should never happen with our APIC/SMP architecture
  */
 
-void error_interrupt(struct cpu_user_regs *regs)
+void cf_check error_interrupt(struct cpu_user_regs *regs)
 {
 static const char *const esr_fields[] = {
 "Send CS error",
@@ -1444,7 +1444,7 @@ void error_interrupt(struct cpu_user_regs *regs)
  * This interrupt handles performance counters interrupt
  */
 
-void pmu_apic_interrupt(struct cpu_user_regs *regs)
+void cf_check pmu_apic_interrupt(struct cpu_user_regs *regs)
 {
 ack_APIC_irq();
 vpmu_do_interrupt(regs);
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c 
b/xen/arch/x86/cpu/mcheck/mce_intel.c
index a691e10bdcd6..7aaa56fd02eb 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -55,7 +55,7 @@ bool __read_mostly lmce_support;
 #define MCE_RING0x1
 static DEFINE_PER_CPU(int, last_state);
 
-static void intel_thermal_interrupt(struct cpu_user_regs *regs)
+static void cf_check intel_thermal_interrupt(struct cpu_user_regs *regs)
 {
 uint64_t msr_content;
 unsigned int cpu = smp_processor_id();
@@ -639,7 +639,7 @@ static void cpu_mcheck_disable(void)
 clear_cmci();
 }
 
-static void cmci_interrupt(struct cpu_user_regs *regs)
+static void cf_check cmci_interrupt(struct cpu_user_regs *regs)
 {
 mctelem_cookie_t mctc;
 struct mca_summary bs;
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index b2aa3a009b4a..17807cdea688 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -170,7 +170,7 @@ static void __init init_memmap(void)
 }
 }
 
-static void xen_evtchn_upcall(struct cpu_user_regs *regs)
+static void cf_check xen_evtchn_upcall(struct cpu_user_regs *regs)
 {
 struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
 unsigned long pending;
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 7b009a930498..c31fd97579dc 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -240,8 +240,8 @@ static void handle_hpet_broadcast(struct hpet_event_channel 
*ch)
 }
 }
 
-static void hpet_interrupt_handler(int irq, void *data,
-struct cpu_user_regs *regs)
+static void cf_check hpet_interrupt_handler(
+int irq, void *data, struct cpu_user_regs *regs)
 {
 struct hpet_event_channel *ch = data;
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 36c8a12cfe7d..dade08f60279 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2533,7 +2533,7 @@ static struct hvm_function_table __initdata 
vmx_function_table = {
 };
 
 /* Handle VT-d posted-interrupt when VCPU is blocked. */
-static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+static void cf_check pi_wakeup_interrupt(struct cpu_user_regs *regs)
 {
 struct vmx_vcpu *vmx, *tmp;
 spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock;
@@ -2565,7 +2565,7 @@ static void pi_wakeup_interrupt(struct cpu_user_regs 
*regs)
 }
 
 /* Handle VT-d posted-interrupt when VCPU is running. */
-static void pi_notification_interrupt(struct cpu_user_regs *regs)
+static void cf_check pi_notification_interrupt(struct cpu_user_regs *regs)

Re: [PATCH v2 02/70] xen/sort: Switch to an extern inline implementation

2022-02-14 Thread Julien Grall

Hi,

On 14/02/2022 12:50, Andrew Cooper wrote:

There are exactly 3 callers of sort() in the hypervisor.  Callbacks in a tight
loop like this are problematic for performance, especially with Spectre v2
protections, which is why extern inline is used commonly by libraries.

Both ARM callers pass in NULL for the swap function, and while this might seem
like an attractive option at first, it causes generic_swap() to be used, which
forces a byte-wise copy.  Provide real swap functions so the compiler can
optimise properly, which is very important for ARM downstreams where
milliseconds until the system is up matters.


Did you actually benchmark it? Both those lists will have < 128 elements 
in them. So I would be extremely surprised if you save more than a few
hundred microseconds with this approach.


So, my opinion on this approach hasn't changed. On v1, we discussed an 
approach that would suit both Stefano and me. Jan seemed to confirm that 
would also suit x86.


Therefore, for this approach:

Nacked-by: Julien Grall 

Cheers,

--
Julien Grall



[PATCH v2 65/70] x86/emul: Update emulation stubs to be CET-IBT compatible

2022-02-14 Thread Andrew Cooper
All indirect branches need to land on an endbr64 instruction.

For stub_selftests(), use endbr64 unconditionally for simplicity.  For ioport
and instruction emulation, add endbr64 conditionally.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Use local endbr64 define rather than raw opcodes in stub_selftest()
v1.1:
 * Update to use endbr helpers
---
 xen/arch/x86/extable.c | 12 +++-
 xen/arch/x86/pv/emul-priv-op.c |  7 +++
 xen/arch/x86/x86_emulate.c | 13 +++--
 3 files changed, 25 insertions(+), 7 deletions(-)
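
place_endbr64() is one of the new endbr helpers.  A plausible minimal
implementation, sketched here for reference:

    /* Write the 4-byte endbr64 encoding (f3 0f 1e fa) as a single store,
     * so a partially written instruction is never observable. */
    static inline void place_endbr64(void *ptr)
    {
        *(uint32_t *)ptr = 0xfa1e0ff3;
    }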

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 4d1875585f9d..4913c4a6dd5d 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -129,20 +129,22 @@ search_exception_table(const struct cpu_user_regs *regs)
 static int __init cf_check stub_selftest(void)
 {
 static const struct {
-uint8_t opc[4];
+uint8_t opc[8];
 uint64_t rax;
 union stub_exception_token res;
 } tests[] __initconst = {
-{ .opc = { 0x0f, 0xb9, 0xc3, 0xc3 }, /* ud1 */
+#define endbr64 0xf3, 0x0f, 0x1e, 0xfa
+{ .opc = { endbr64, 0x0f, 0xb9, 0xc3, 0xc3 }, /* ud1 */
   .res.fields.trapnr = TRAP_invalid_op },
-{ .opc = { 0x90, 0x02, 0x00, 0xc3 }, /* nop; add (%rax),%al */
+{ .opc = { endbr64, 0x90, 0x02, 0x00, 0xc3 }, /* nop; add (%rax),%al */
   .rax = 0x0123456789abcdef,
   .res.fields.trapnr = TRAP_gp_fault },
-{ .opc = { 0x02, 0x04, 0x04, 0xc3 }, /* add (%rsp,%rax),%al */
+{ .opc = { endbr64, 0x02, 0x04, 0x04, 0xc3 }, /* add (%rsp,%rax),%al */
   .rax = 0xfedcba9876543210,
   .res.fields.trapnr = TRAP_stack_error },
-{ .opc = { 0xcc, 0xc3, 0xc3, 0xc3 }, /* int3 */
+{ .opc = { endbr64, 0xcc, 0xc3, 0xc3, 0xc3 }, /* int3 */
   .res.fields.trapnr = TRAP_int3 },
+#undef endbr64
 };
 unsigned long addr = this_cpu(stubs.addr) + STUB_BUF_SIZE / 2;
 unsigned int i;
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index c46c072f93db..22b10dec2a6e 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include <asm/endbr.h>
 #include 
 #include 
 #include 
@@ -111,6 +112,12 @@ static io_emul_stub_t *io_emul_stub_setup(struct 
priv_op_ctxt *ctxt, u8 opcode,
 
 p = ctxt->io_emul_stub;
 
+if ( cpu_has_xen_ibt )
+{
+place_endbr64(p);
+p += 4;
+}
+
 APPEND_BUFF(prologue);
 APPEND_CALL(load_guest_gprs);
 
diff --git a/xen/arch/x86/x86_emulate.c b/xen/arch/x86/x86_emulate.c
index 60191a94dc18..720740f29b84 100644
--- a/xen/arch/x86/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate.c
@@ -17,6 +17,7 @@
 #include 
 #include  /* cpu_has_amd_erratum() */
 #include 
+#include <asm/endbr.h>
 
 /* Avoid namespace pollution. */
 #undef cmpxchg
@@ -29,11 +30,19 @@
 cpu_has_amd_erratum(_cpu_data, AMD_ERRATUM_##nr)
 
 #define get_stub(stb) ({\
+void *ptr;  \
 BUILD_BUG_ON(STUB_BUF_SIZE / 2 < MAX_INST_LEN + 1); \
 ASSERT(!(stb).ptr); \
 (stb).addr = this_cpu(stubs.addr) + STUB_BUF_SIZE / 2;  \
-memset(((stb).ptr = map_domain_page(_mfn(this_cpu(stubs.mfn)))) +  \
-   ((stb).addr & ~PAGE_MASK), 0xcc, STUB_BUF_SIZE / 2);\
+(stb).ptr = map_domain_page(_mfn(this_cpu(stubs.mfn))) +\
+((stb).addr & ~PAGE_MASK);  \
+ptr = memset((stb).ptr, 0xcc, STUB_BUF_SIZE / 2);   \
+if ( cpu_has_xen_ibt )  \
+{   \
+place_endbr64(ptr); \
+ptr += 4;   \
+}   \
+ptr;\
 })
 #define put_stub(stb) ({   \
 if ( (stb).ptr )   \
-- 
2.11.0




[PATCH v2 28/70] xen/video: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/drivers/video/lfb.c  | 4 ++--
 xen/drivers/video/lfb.h  | 4 ++--
 xen/drivers/video/vesa.c | 4 ++--
 xen/drivers/video/vga.c  | 6 +++---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/video/lfb.c b/xen/drivers/video/lfb.c
index 75b749b3303b..48c66f8acf10 100644
--- a/xen/drivers/video/lfb.c
+++ b/xen/drivers/video/lfb.c
@@ -53,7 +53,7 @@ static void lfb_show_line(
 }
 
 /* Fast mode which redraws all modified parts of a 2D text buffer. */
-void lfb_redraw_puts(const char *s, size_t nr)
+void cf_check lfb_redraw_puts(const char *s, size_t nr)
 {
 unsigned int i, min_redraw_y = lfb.ypos;
 
@@ -98,7 +98,7 @@ void lfb_redraw_puts(const char *s, size_t nr)
 }
 
 /* Slower line-based scroll mode which interacts better with dom0. */
-void lfb_scroll_puts(const char *s, size_t nr)
+void cf_check lfb_scroll_puts(const char *s, size_t nr)
 {
 unsigned int i;
 
diff --git a/xen/drivers/video/lfb.h b/xen/drivers/video/lfb.h
index e743ccdd6b11..42161402d611 100644
--- a/xen/drivers/video/lfb.h
+++ b/xen/drivers/video/lfb.h
@@ -35,8 +35,8 @@ struct lfb_prop {
 unsigned int text_rows;
 };
 
-void lfb_redraw_puts(const char *s, size_t nr);
-void lfb_scroll_puts(const char *s, size_t nr);
+void cf_check lfb_redraw_puts(const char *s, size_t nr);
+void cf_check lfb_scroll_puts(const char *s, size_t nr);
 void lfb_carriage_return(void);
 void lfb_free(void);
 
diff --git a/xen/drivers/video/vesa.c b/xen/drivers/video/vesa.c
index cb0e443be4dd..155bc09d3237 100644
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -17,7 +17,7 @@
 
 #define vlfb_info vga_console_info.u.vesa_lfb
 
-static void lfb_flush(void);
+static void cf_check lfb_flush(void);
 
 static unsigned char *lfb;
 static const struct font_desc *font;
@@ -177,7 +177,7 @@ void __init vesa_mtrr_init(void)
 } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) );
 }
 
-static void lfb_flush(void)
+static void cf_check lfb_flush(void)
 {
 if ( vesa_mtrr == 3 )
 __asm__ __volatile__ ("sfence" : : : "memory");
diff --git a/xen/drivers/video/vga.c b/xen/drivers/video/vga.c
index b7f04d0d97f4..abe295e477b1 100644
--- a/xen/drivers/video/vga.c
+++ b/xen/drivers/video/vga.c
@@ -19,8 +19,8 @@ static int vgacon_keep;
 static unsigned int xpos, ypos;
 static unsigned char *video;
 
-static void vga_text_puts(const char *s, size_t nr);
-static void vga_noop_puts(const char *s, size_t nr) {}
+static void cf_check vga_text_puts(const char *s, size_t nr);
+static void cf_check vga_noop_puts(const char *s, size_t nr) {}
 void (*video_puts)(const char *, size_t nr) = vga_noop_puts;
 
 /*
@@ -175,7 +175,7 @@ void __init video_endboot(void)
 }
 }
 
-static void vga_text_puts(const char *s, size_t nr)
+static void cf_check vga_text_puts(const char *s, size_t nr)
 {
 for ( ; nr > 0; nr--, s++ )
 {
-- 
2.11.0




[PATCH v2 63/70] x86/traps: Rework write_stub_trampoline() to not hardcode the jmp

2022-02-14 Thread Andrew Cooper
For CET-IBT, we will need to optionally insert an endbr64 instruction at the
start of the stub.  Don't hardcode the jmp displacement assuming that it
starts at byte 24 of the stub.

Also add extra comments describing what is going on.  The mix of %rax and %rsp
is far from trivial to follow.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Retain the rounding up to 16 bytes.
---
 xen/arch/x86/x86_64/traps.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)
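
The subtle part of the rewrite is the displacement arithmetic.  A
self-contained sketch of rel32 encoding, for reference:

    /* rel32 is measured from the end of the 5-byte jmp, i.e. the address
     * of the next instruction, hence the "+ 4" after the opcode above. */
    static void emit_jmp(uint8_t *p, uint64_t va_of_p, uint64_t target_va)
    {
        *p++ = 0xe9;                                  /* jmp rel32 */
        *(int32_t *)p = (int32_t)(target_va - (va_of_p + 5));
    }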

diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index d661d7ffcaaf..edc6820b85c7 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -293,30 +293,39 @@ static unsigned int write_stub_trampoline(
 unsigned char *stub, unsigned long stub_va,
 unsigned long stack_bottom, unsigned long target_va)
 {
+unsigned char *p = stub;
+
+/* Store guest %rax into %ss slot */
 /* movabsq %rax, stack_bottom - 8 */
-stub[0] = 0x48;
-stub[1] = 0xa3;
-*(uint64_t *)&stub[2] = stack_bottom - 8;
+*p++ = 0x48;
+*p++ = 0xa3;
+*(uint64_t *)p = stack_bottom - 8;
+p += 8;
 
+/* Store guest %rsp in %rax */
 /* movq %rsp, %rax */
-stub[10] = 0x48;
-stub[11] = 0x89;
-stub[12] = 0xe0;
+*p++ = 0x48;
+*p++ = 0x89;
+*p++ = 0xe0;
 
+/* Switch to Xen stack */
 /* movabsq $stack_bottom - 8, %rsp */
-stub[13] = 0x48;
-stub[14] = 0xbc;
-*(uint64_t *)&stub[15] = stack_bottom - 8;
+*p++ = 0x48;
+*p++ = 0xbc;
+*(uint64_t *)p = stack_bottom - 8;
+p += 8;
 
+/* Store guest %rsp into %rsp slot */
 /* pushq %rax */
-stub[23] = 0x50;
+*p++ = 0x50;
 
 /* jmp target_va */
-stub[24] = 0xe9;
-*(int32_t *)&stub[25] = target_va - (stub_va + 29);
+*p++ = 0xe9;
+*(int32_t *)p = target_va - (stub_va + (p - stub) + 4);
+p += 4;
 
 /* Round up to a multiple of 16 bytes. */
-return 32;
+return ROUNDUP(p - stub, 16);
 }
 
 DEFINE_PER_CPU(struct stubs, stubs);
-- 
2.11.0




[PATCH v2 62/70] x86/alternatives: Clear CR4.CET when clearing CR0.WP

2022-02-14 Thread Andrew Cooper
This allows us to have CET active much earlier in boot.  The architecture
requires CR0.WP to remain set while CR4.CET is enabled, so CR4.CET has to be
dropped around any window in which WP is cleared for text patching.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
 xen/arch/x86/alternative.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 436047abe021..ec24692e9595 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -333,9 +333,13 @@ static int __init cf_check nmi_apply_alternatives(
  */
 if ( !(alt_done & alt_todo) )
 {
-unsigned long cr0;
+unsigned long cr0, cr4;
 
 cr0 = read_cr0();
+cr4 = read_cr4();
+
+if ( cr4 & X86_CR4_CET )
+write_cr4(cr4 & ~X86_CR4_CET);
 
 /* Disable WP to allow patching read-only pages. */
 write_cr0(cr0 & ~X86_CR0_WP);
@@ -345,6 +349,9 @@ static int __init cf_check nmi_apply_alternatives(
 
 write_cr0(cr0);
 
+if ( cr4 & X86_CR4_CET )
+write_cr4(cr4);
+
 alt_done |= alt_todo;
 }
 
-- 
2.11.0




[PATCH v2 40/70] x86/idle: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/acpi/cpu_idle.c | 31 +---
 xen/arch/x86/acpi/cpuidle_menu.c |  6 +++---
 xen/arch/x86/cpu/mwait-idle.c|  2 +-
 xen/arch/x86/domain.c|  6 +++---
 xen/arch/x86/hpet.c  |  4 ++--
 xen/arch/x86/include/asm/cpuidle.h   |  4 ++--
 xen/arch/x86/include/asm/hpet.h  |  4 ++--
 xen/arch/x86/include/asm/time.h  |  6 +++---
 xen/arch/x86/time.c  |  6 +++---
 xen/drivers/cpufreq/cpufreq_misc_governors.c | 14 ++---
 10 files changed, 49 insertions(+), 34 deletions(-)
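
The central consumer is the idle loop's indirect call, sketched for context:

    /* pm_idle is set once at boot to acpi_processor_idle / mwait_idle /
     * default_idle; every call through it is an indirect branch, so the
     * targets need cf_check (and hence carry endbr64 under IBT). */
    void (*pm_idle)(void) __read_mostly = default_idle;

    static void idle_loop(void)   /* simplified */
    {
        for ( ; ; )
            pm_idle();
    }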

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 0142671bb836..557bc6ef8642 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -75,7 +75,7 @@
 #define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
 #define PHI_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FF, val) /* Xeon Phi only */
 
-static void lapic_timer_nop(void) { }
+static void cf_check lapic_timer_nop(void) { }
 void (*__read_mostly lapic_timer_off)(void);
 void (*__read_mostly lapic_timer_on)(void);
 
@@ -310,12 +310,27 @@ static char* acpi_cstate_method_name[] =
 "HALT"
 };
 
-static uint64_t get_stime_tick(void) { return (uint64_t)NOW(); }
-static uint64_t stime_ticks_elapsed(uint64_t t1, uint64_t t2) { return t2 - 
t1; }
-static uint64_t stime_tick_to_ns(uint64_t ticks) { return ticks; }
+static uint64_t cf_check get_stime_tick(void)
+{
+return NOW();
+}
+
+static uint64_t cf_check stime_ticks_elapsed(uint64_t t1, uint64_t t2)
+{
+return t2 - t1;
+}
+
+static uint64_t cf_check stime_tick_to_ns(uint64_t ticks)
+{
+return ticks;
+}
+
+static uint64_t cf_check get_acpi_pm_tick(void)
+{
+return inl(pmtmr_ioport);
+}
 
-static uint64_t get_acpi_pm_tick(void) { return (uint64_t)inl(pmtmr_ioport); }
-static uint64_t acpi_pm_ticks_elapsed(uint64_t t1, uint64_t t2)
+static uint64_t cf_check acpi_pm_ticks_elapsed(uint64_t t1, uint64_t t2)
 {
 if ( t2 >= t1 )
 return (t2 - t1);
@@ -664,7 +679,7 @@ void update_idle_stats(struct acpi_processor_power *power,
 spin_unlock(&power->stat_lock);
 }
 
-static void acpi_processor_idle(void)
+static void cf_check acpi_processor_idle(void)
 {
 unsigned int cpu = smp_processor_id();
 struct acpi_processor_power *power = processor_powers[cpu];
@@ -869,7 +884,7 @@ static void acpi_processor_idle(void)
 cpuidle_current_governor->reflect(power);
 }
 
-void acpi_dead_idle(void)
+void cf_check acpi_dead_idle(void)
 {
 struct acpi_processor_power *power;
 struct acpi_processor_cx *cx;
diff --git a/xen/arch/x86/acpi/cpuidle_menu.c b/xen/arch/x86/acpi/cpuidle_menu.c
index 6ff5fb8ff215..a275436d799c 100644
--- a/xen/arch/x86/acpi/cpuidle_menu.c
+++ b/xen/arch/x86/acpi/cpuidle_menu.c
@@ -185,7 +185,7 @@ static unsigned int get_sleep_length_us(void)
 return (us >> 32) ? (unsigned int)-2000 : (unsigned int)us;
 }
 
-static int menu_select(struct acpi_processor_power *power)
+static int cf_check menu_select(struct acpi_processor_power *power)
 {
 struct menu_device *data = &this_cpu(menu_devices);
 int i;
@@ -237,7 +237,7 @@ static int menu_select(struct acpi_processor_power *power)
 return data->last_state_idx;
 }
 
-static void menu_reflect(struct acpi_processor_power *power)
+static void cf_check menu_reflect(struct acpi_processor_power *power)
 {
 struct menu_device *data = &this_cpu(menu_devices);
 u64 new_factor;
@@ -275,7 +275,7 @@ static void menu_reflect(struct acpi_processor_power *power)
 data->correction_factor[data->bucket] = new_factor;
 }
 
-static int menu_enable_device(struct acpi_processor_power *power)
+static int cf_check menu_enable_device(struct acpi_processor_power *power)
 {
 memset(&per_cpu(menu_devices, power->cpu), 0, sizeof(struct menu_device));
 
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 927ce1b67aa5..f76c64e04b20 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -773,7 +773,7 @@ static const struct cpuidle_state snr_cstates[] = {
{}
 };
 
-static void mwait_idle(void)
+static void cf_check mwait_idle(void)
 {
unsigned int cpu = smp_processor_id();
struct acpi_processor_power *power = processor_powers[cpu];
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1c3a1ec2a080..ae7c88b51af1 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -74,11 +74,11 @@
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
-static void default_idle(void);
+static void cf_check default_idle(void);
 void (*pm_idle) (void) __read_mostly = default_idle;
 void 

[PATCH v2 34/70] x86/emul: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

pv_emul_is_mem_write() is only used in a single file.  Having it as a static
inline is pointless because it can't be inlined to begin with.
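
A condensed sketch of why (names here are hypothetical): once a function's
address is stored in a hook table, every use of it is an indirect call, so
`inline` can never apply.

    struct emul_ops {
        bool (*is_mem_write)(void);         /* hypothetical hook */
    };

    static bool cf_check pv_is_mem_write(void) { return true; }

    static const struct emul_ops ops = {
        .is_mem_write = pv_is_mem_write,    /* address taken here */
    };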

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
v2:
 * Correct details in commit message.
---
 xen/arch/x86/hvm/emulate.c | 72 +-
 xen/arch/x86/hvm/hvm.c |  8 ++--
 xen/arch/x86/hvm/svm/svm.c |  4 +-
 xen/arch/x86/include/asm/hvm/emulate.h |  8 ++--
 xen/arch/x86/include/asm/mm.h  | 16 +++-
 xen/arch/x86/mm.c  |  4 +-
 xen/arch/x86/mm/shadow/hvm.c   |  8 ++--
 xen/arch/x86/pv/emul-gate-op.c |  9 +++--
 xen/arch/x86/pv/emul-priv-op.c | 64 +++---
 xen/arch/x86/pv/emulate.h  |  7 
 xen/arch/x86/pv/ro-page-fault.c| 31 +--
 xen/arch/x86/x86_emulate.c | 21 +-
 xen/arch/x86/x86_emulate/x86_emulate.c | 10 ++---
 xen/arch/x86/x86_emulate/x86_emulate.h | 33 
 14 files changed, 148 insertions(+), 147 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 39dac7fd9d6d..e8d510e0be91 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1272,7 +1272,7 @@ static int __hvmemul_read(
 return linear_read(addr, bytes, p_data, pfec, hvmemul_ctxt);
 }
 
-static int hvmemul_read(
+static int cf_check hvmemul_read(
 enum x86_segment seg,
 unsigned long offset,
 void *p_data,
@@ -1290,7 +1290,7 @@ static int hvmemul_read(
 container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
 }
 
-int hvmemul_insn_fetch(
+int cf_check hvmemul_insn_fetch(
 unsigned long offset,
 void *p_data,
 unsigned int bytes,
@@ -1336,7 +1336,7 @@ int hvmemul_insn_fetch(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_write(
+static int cf_check hvmemul_write(
 enum x86_segment seg,
 unsigned long offset,
 void *p_data,
@@ -1384,7 +1384,7 @@ static int hvmemul_write(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_rmw(
+static int cf_check hvmemul_rmw(
 enum x86_segment seg,
 unsigned long offset,
 unsigned int bytes,
@@ -1437,7 +1437,7 @@ static int hvmemul_rmw(
 return rc;
 }
 
-static int hvmemul_blk(
+static int cf_check hvmemul_blk(
 enum x86_segment seg,
 unsigned long offset,
 void *p_data,
@@ -1478,7 +1478,7 @@ static int hvmemul_blk(
 return rc;
 }
 
-static int hvmemul_write_discard(
+static int cf_check hvmemul_write_discard(
 enum x86_segment seg,
 unsigned long offset,
 void *p_data,
@@ -1489,7 +1489,7 @@ static int hvmemul_write_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_rep_ins_discard(
+static int cf_check hvmemul_rep_ins_discard(
 uint16_t src_port,
 enum x86_segment dst_seg,
 unsigned long dst_offset,
@@ -1500,7 +1500,7 @@ static int hvmemul_rep_ins_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_rep_movs_discard(
+static int cf_check hvmemul_rep_movs_discard(
enum x86_segment src_seg,
unsigned long src_offset,
enum x86_segment dst_seg,
@@ -1512,7 +1512,7 @@ static int hvmemul_rep_movs_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_rep_stos_discard(
+static int cf_check hvmemul_rep_stos_discard(
 void *p_data,
 enum x86_segment seg,
 unsigned long offset,
@@ -1523,7 +1523,7 @@ static int hvmemul_rep_stos_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_rep_outs_discard(
+static int cf_check hvmemul_rep_outs_discard(
 enum x86_segment src_seg,
 unsigned long src_offset,
 uint16_t dst_port,
@@ -1534,7 +1534,7 @@ static int hvmemul_rep_outs_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_cmpxchg_discard(
+static int cf_check hvmemul_cmpxchg_discard(
 enum x86_segment seg,
 unsigned long offset,
 void *p_old,
@@ -1546,7 +1546,7 @@ static int hvmemul_cmpxchg_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_read_io_discard(
+static int cf_check hvmemul_read_io_discard(
 unsigned int port,
 unsigned int bytes,
 unsigned long *val,
@@ -1555,7 +1555,7 @@ static int hvmemul_read_io_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_write_io_discard(
+static int cf_check hvmemul_write_io_discard(
 unsigned int port,
 unsigned int bytes,
 unsigned long val,
@@ -1564,7 +1564,7 @@ static int hvmemul_write_io_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_write_msr_discard(
+static int cf_check hvmemul_write_msr_discard(
 unsigned int reg,
 uint64_t val,
 struct x86_emulate_ctxt *ctxt)
@@ -1572,7 +1572,7 @@ static int hvmemul_write_msr_discard(
 return X86EMUL_OKAY;
 }
 
-static int hvmemul_cache_op_discard(
+static int cf_check 

[PATCH v2 29/70] xen/console: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/drivers/char/console.c   |  4 ++--
 xen/drivers/char/ehci-dbgp.c | 24 +---
 xen/drivers/char/ns16550.c   | 26 +-
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 380765ab02fd..d9d6556c2293 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -552,7 +552,7 @@ static void __serial_rx(char c, struct cpu_user_regs *regs)
 #endif
 }
 
-static void serial_rx(char c, struct cpu_user_regs *regs)
+static void cf_check serial_rx(char c, struct cpu_user_regs *regs)
 {
 static int switch_code_count = 0;
 
@@ -1286,7 +1286,7 @@ void panic(const char *fmt, ...)
  * **
  */
 
-static void suspend_steal_fn(const char *str, size_t nr) { }
+static void cf_check suspend_steal_fn(const char *str, size_t nr) { }
 static int suspend_steal_id;
 
 int console_suspend(void)
diff --git a/xen/drivers/char/ehci-dbgp.c b/xen/drivers/char/ehci-dbgp.c
index a6b57fdf2d19..e205c0da6a61 100644
--- a/xen/drivers/char/ehci-dbgp.c
+++ b/xen/drivers/char/ehci-dbgp.c
@@ -1000,13 +1000,15 @@ static int ehci_dbgp_external_startup(struct ehci_dbgp *dbgp)
 
 typedef void (*set_debug_port_t)(struct ehci_dbgp *, unsigned int);
 
-static void default_set_debug_port(struct ehci_dbgp *dbgp, unsigned int port)
+static void cf_check default_set_debug_port(
+struct ehci_dbgp *dbgp, unsigned int port)
 {
 }
 
 static set_debug_port_t __read_mostly set_debug_port = default_set_debug_port;
 
-static void nvidia_set_debug_port(struct ehci_dbgp *dbgp, unsigned int port)
+static void cf_check nvidia_set_debug_port(
+struct ehci_dbgp *dbgp, unsigned int port)
 {
 uint32_t dword = pci_conf_read32(PCI_SBDF(0, dbgp->bus, dbgp->slot,
   dbgp->func), 0x74);
@@ -1167,7 +1169,7 @@ static inline void _ehci_dbgp_flush(struct ehci_dbgp *dbgp)
 dbgp->out.chunk = 0;
 }
 
-static void ehci_dbgp_flush(struct serial_port *port)
+static void cf_check ehci_dbgp_flush(struct serial_port *port)
 {
 struct ehci_dbgp *dbgp = port->uart;
 s_time_t goal;
@@ -1196,7 +1198,7 @@ static void ehci_dbgp_flush(struct serial_port *port)
 set_timer(&dbgp->timer, goal);
 }
 
-static void ehci_dbgp_putc(struct serial_port *port, char c)
+static void cf_check ehci_dbgp_putc(struct serial_port *port, char c)
 {
 struct ehci_dbgp *dbgp = port->uart;
 
@@ -1209,7 +1211,7 @@ static void ehci_dbgp_putc(struct serial_port *port, char c)
 ehci_dbgp_flush(port);
 }
 
-static int ehci_dbgp_tx_ready(struct serial_port *port)
+static int cf_check ehci_dbgp_tx_ready(struct serial_port *port)
 {
 struct ehci_dbgp *dbgp = port->uart;
 
@@ -1228,7 +1230,7 @@ static int ehci_dbgp_tx_ready(struct serial_port *port)
(dbgp->state == dbgp_idle) * DBGP_MAX_PACKET;
 }
 
-static int ehci_dbgp_getc(struct serial_port *port, char *pc)
+static int cf_check ehci_dbgp_getc(struct serial_port *port, char *pc)
 {
 struct ehci_dbgp *dbgp = port->uart;
 
@@ -1309,7 +1311,7 @@ static bool_t ehci_dbgp_setup_preirq(struct ehci_dbgp *dbgp)
 return 0;
 }
 
-static void __init ehci_dbgp_init_preirq(struct serial_port *port)
+static void __init cf_check ehci_dbgp_init_preirq(struct serial_port *port)
 {
 struct ehci_dbgp *dbgp = port->uart;
 u32 debug_port, offset;
@@ -1358,7 +1360,7 @@ static void ehci_dbgp_setup_postirq(struct ehci_dbgp *dbgp)
 set_timer(&dbgp->timer, NOW() + MILLISECS(1));
 }
 
-static void __init ehci_dbgp_init_postirq(struct serial_port *port)
+static void __init cf_check ehci_dbgp_init_postirq(struct serial_port *port)
 {
 struct ehci_dbgp *dbgp = port->uart;
 
@@ -1409,12 +1411,12 @@ static int ehci_dbgp_check_release(struct ehci_dbgp *dbgp)
 return 0;
 }
 
-static void __init ehci_dbgp_endboot(struct serial_port *port)
+static void __init cf_check ehci_dbgp_endboot(struct serial_port *port)
 {
 ehci_dbgp_check_release(port->uart);
 }
 
-static void ehci_dbgp_suspend(struct serial_port *port)
+static void cf_check ehci_dbgp_suspend(struct serial_port *port)
 {
 struct ehci_dbgp *dbgp = port->uart;
 
@@ -1431,7 +1433,7 @@ static void ehci_dbgp_suspend(struct serial_port *port)
 dbgp->state = dbgp_unsafe;
 }
 
-static void ehci_dbgp_resume(struct serial_port *port)
+static void cf_check ehci_dbgp_resume(struct serial_port *port)
 {
 struct ehci_dbgp *dbgp = port->uart;
 
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 990cad39fe85..8df1ee4d5c2c 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -174,7 +174,7 @@ static void 

[PATCH v2 22/70] xen/hypfs: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Juergen Gross 
---
 xen/common/hypfs.c | 57 +++---
 xen/common/sched/cpupool.c | 25 ++--
 xen/include/xen/hypfs.h| 49 +++
 3 files changed, 65 insertions(+), 66 deletions(-)

diff --git a/xen/common/hypfs.c b/xen/common/hypfs.c
index 1526bcc52810..0d22396f5dd7 100644
--- a/xen/common/hypfs.c
+++ b/xen/common/hypfs.c
@@ -113,12 +113,13 @@ static void hypfs_unlock(void)
 }
 }
 
-const struct hypfs_entry *hypfs_node_enter(const struct hypfs_entry *entry)
+const struct hypfs_entry *cf_check hypfs_node_enter(
+const struct hypfs_entry *entry)
 {
 return entry;
 }
 
-void hypfs_node_exit(const struct hypfs_entry *entry)
+void cf_check hypfs_node_exit(const struct hypfs_entry *entry)
 {
 }
 
@@ -289,16 +290,14 @@ static int hypfs_get_path_user(char *buf,
 return 0;
 }
 
-struct hypfs_entry *hypfs_leaf_findentry(const struct hypfs_entry_dir *dir,
- const char *name,
- unsigned int name_len)
+struct hypfs_entry *cf_check hypfs_leaf_findentry(
+const struct hypfs_entry_dir *dir, const char *name, unsigned int name_len)
 {
 return ERR_PTR(-ENOTDIR);
 }
 
-struct hypfs_entry *hypfs_dir_findentry(const struct hypfs_entry_dir *dir,
-const char *name,
-unsigned int name_len)
+struct hypfs_entry *cf_check hypfs_dir_findentry(
+const struct hypfs_entry_dir *dir, const char *name, unsigned int name_len)
 {
 struct hypfs_entry *entry;
 
@@ -360,7 +359,7 @@ static struct hypfs_entry *hypfs_get_entry(const char *path)
 return hypfs_get_entry_rel(&hypfs_root, path + 1);
 }
 
-unsigned int hypfs_getsize(const struct hypfs_entry *entry)
+unsigned int cf_check hypfs_getsize(const struct hypfs_entry *entry)
 {
 return entry->size;
 }
@@ -396,7 +395,7 @@ int hypfs_read_dyndir_id_entry(const struct hypfs_entry_dir *template,
 return 0;
 }
 
-static const struct hypfs_entry *hypfs_dyndir_enter(
+static const struct hypfs_entry *cf_check hypfs_dyndir_enter(
 const struct hypfs_entry *entry)
 {
 const struct hypfs_dyndir_id *data;
@@ -407,7 +406,7 @@ static const struct hypfs_entry *hypfs_dyndir_enter(
 return data->template->e.funcs->enter(&data->template->e);
 }
 
-static struct hypfs_entry *hypfs_dyndir_findentry(
+static struct hypfs_entry *cf_check hypfs_dyndir_findentry(
 const struct hypfs_entry_dir *dir, const char *name, unsigned int name_len)
 {
 const struct hypfs_dyndir_id *data;
@@ -418,8 +417,8 @@ static struct hypfs_entry *hypfs_dyndir_findentry(
 return data->template->e.funcs->findentry(data->template, name, name_len);
 }
 
-static int hypfs_read_dyndir(const struct hypfs_entry *entry,
- XEN_GUEST_HANDLE_PARAM(void) uaddr)
+static int cf_check hypfs_read_dyndir(
+const struct hypfs_entry *entry, XEN_GUEST_HANDLE_PARAM(void) uaddr)
 {
 const struct hypfs_dyndir_id *data;
 
@@ -463,8 +462,8 @@ unsigned int hypfs_dynid_entry_size(const struct hypfs_entry *template,
 return DIRENTRY_SIZE(snprintf(NULL, 0, template->name, id));
 }
 
-int hypfs_read_dir(const struct hypfs_entry *entry,
-   XEN_GUEST_HANDLE_PARAM(void) uaddr)
+int cf_check hypfs_read_dir(const struct hypfs_entry *entry,
+XEN_GUEST_HANDLE_PARAM(void) uaddr)
 {
 const struct hypfs_entry_dir *d;
 const struct hypfs_entry *e;
@@ -510,8 +509,8 @@ int hypfs_read_dir(const struct hypfs_entry *entry,
 return 0;
 }
 
-int hypfs_read_leaf(const struct hypfs_entry *entry,
-XEN_GUEST_HANDLE_PARAM(void) uaddr)
+int cf_check hypfs_read_leaf(
+const struct hypfs_entry *entry, XEN_GUEST_HANDLE_PARAM(void) uaddr)
 {
 const struct hypfs_entry_leaf *l;
 unsigned int size = entry->funcs->getsize(entry);
@@ -555,9 +554,9 @@ static int hypfs_read(const struct hypfs_entry *entry,
 return ret;
 }
 
-int hypfs_write_leaf(struct hypfs_entry_leaf *leaf,
- XEN_GUEST_HANDLE_PARAM(const_void) uaddr,
- unsigned int ulen)
+int cf_check hypfs_write_leaf(
+struct hypfs_entry_leaf *leaf, XEN_GUEST_HANDLE_PARAM(const_void) uaddr,
+unsigned int ulen)
 {
 char *buf;
 int ret;
@@ -596,9 +595,9 @@ int hypfs_write_leaf(struct hypfs_entry_leaf *leaf,
 return ret;
 }
 
-int hypfs_write_bool(struct hypfs_entry_leaf *leaf,
- XEN_GUEST_HANDLE_PARAM(const_void) uaddr,
- unsigned int ulen)
+int cf_check hypfs_write_bool(
+struct hypfs_entry_leaf *leaf, XEN_GUEST_HANDLE_PARAM(const_void) uaddr,
+

[PATCH v2 66/70] x86/entry: Make syscall/sysenter entrypoints CET-IBT compatible

2022-02-14 Thread Andrew Cooper
Each of MSR_{L,C}STAR and MSR_SYSENTER_EIP need to land on an endbr64
instruction.  For sysenter, this is easy.

Unfortunately for syscall, the stubs are already 29 bytes long, with a limit of
32.  endbr64 is 4 bytes.  Luckily, there is a 1-byte instruction which can be
moved from the stubs into the main handlers.

Move the push %rax out of the stub and into {l,c}star_entry(), allowing room
for the endbr64 instruction when appropriate.  Update the comment describing
the entry state.
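
(Byte budget, for reference: 29 bytes of stub, plus 4 for endbr64
(f3 0f 1e fa), minus 1 for the push %rax (opcode 50) which moves into the
main handlers, comes to exactly the 32 byte limit.)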

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v1.1:
 * Update to use endbr helpers.
---
 xen/arch/x86/x86_64/entry.S | 18 +-
 xen/arch/x86/x86_64/traps.c | 11 +++
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 8494b97a54a2..9abcf95bd010 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -241,18 +241,17 @@ iret_exit_to_guest:
  * When entering SYSCALL from user mode:
  *  Vector directly to the registered arch.syscall_addr.
  *
- * Initial work is done by per-CPU trampolines. At this point %rsp has been
- * initialised to point at the correct Xen stack, %rsp has been saved, and
- * %rax needs to be restored from the %ss save slot. All other registers are
- * still to be saved onto the stack, starting with RFLAGS, and an appropriate
- * %ss must be saved into the space left by the trampoline.
+ * Initial work is done by per-CPU trampolines.
+ *  - Guest %rax stored in the %ss slot
+ *  - Guest %rsp stored in %rax
+ *  - Xen stack loaded, pointing at the %ss slot
  */
 ENTRY(lstar_enter)
 #ifdef CONFIG_XEN_SHSTK
 ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
 #endif
-/* sti could live here when we don't switch page tables below. */
-movq  8(%rsp),%rax /* Restore %rax. */
+push  %rax  /* Guest %rsp */
+movq  8(%rsp), %rax /* Restore guest %rax */
 movq  $FLAT_KERNEL_SS,8(%rsp)
 pushq %r11
 pushq $FLAT_KERNEL_CS64
@@ -288,9 +287,9 @@ ENTRY(cstar_enter)
 #ifdef CONFIG_XEN_SHSTK
 ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
 #endif
-/* sti could live here when we don't switch page tables below. */
+push  %rax  /* Guest %rsp */
 CR4_PV32_RESTORE
-movq  8(%rsp), %rax /* Restore %rax. */
+movq  8(%rsp), %rax /* Restore guest %rax. */
movq  $FLAT_USER_SS32, 8(%rsp) /* Assume a 64bit domain.  Compat handled lower. */
 pushq %r11
 pushq $FLAT_USER_CS32
@@ -323,6 +322,7 @@ ENTRY(cstar_enter)
 jmp   switch_to_kernel
 
 ENTRY(sysenter_entry)
+ENDBR64
 #ifdef CONFIG_XEN_SHSTK
 ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
 #endif
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index edc6820b85c7..fccfb7c17283 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -295,6 +296,12 @@ static unsigned int write_stub_trampoline(
 {
 unsigned char *p = stub;
 
+if ( cpu_has_xen_ibt )
+{
+place_endbr64(p);
+p += 4;
+}
+
 /* Store guest %rax into %ss slot */
 /* movabsq %rax, stack_bottom - 8 */
 *p++ = 0x48;
@@ -315,10 +322,6 @@ static unsigned int write_stub_trampoline(
 *(uint64_t *)p = stack_bottom - 8;
 p += 8;
 
-/* Store guest %rsp into %rsp slot */
-/* pushq %rax */
-*p++ = 0x50;
-
 /* jmp target_va */
 *p++ = 0xe9;
 *(int32_t *)p = target_va - (stub_va + (p - stub) + 4);
-- 
2.11.0




[PATCH v2 42/70] x86/hvmsave: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/cpu/mcheck/vmce.c   |  4 ++--
 xen/arch/x86/emul-i8254.c|  4 ++--
 xen/arch/x86/hvm/hpet.c  |  4 ++--
 xen/arch/x86/hvm/hvm.c   | 18 ++
 xen/arch/x86/hvm/irq.c   | 12 ++--
 xen/arch/x86/hvm/mtrr.c  |  4 ++--
 xen/arch/x86/hvm/pmtimer.c   |  4 ++--
 xen/arch/x86/hvm/rtc.c   |  4 ++--
 xen/arch/x86/hvm/vioapic.c   |  4 ++--
 xen/arch/x86/hvm/viridian/viridian.c | 15 ---
 xen/arch/x86/hvm/vlapic.c|  8 
 xen/arch/x86/hvm/vpic.c  |  4 ++--
 12 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index eb6434a3ba20..458120f9ad8d 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -353,7 +353,7 @@ int vmce_wrmsr(uint32_t msr, uint64_t val)
 }
 
 #if CONFIG_HVM
-static int vmce_save_vcpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
+static int cf_check vmce_save_vcpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
 {
 struct hvm_vmce_vcpu ctxt = {
 .caps = v->arch.vmce.mcg_cap,
@@ -365,7 +365,7 @@ static int vmce_save_vcpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
 return hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, );
 }
 
-static int vmce_load_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+static int cf_check vmce_load_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 {
 unsigned int vcpuid = hvm_load_instance(h);
 struct vcpu *v;
diff --git a/xen/arch/x86/emul-i8254.c b/xen/arch/x86/emul-i8254.c
index 0e09a173187f..d170f464d966 100644
--- a/xen/arch/x86/emul-i8254.c
+++ b/xen/arch/x86/emul-i8254.c
@@ -391,7 +391,7 @@ void pit_stop_channel0_irq(PITState *pit)
 spin_unlock(>lock);
 }
 
-static int pit_save(struct vcpu *v, hvm_domain_context_t *h)
+static int cf_check pit_save(struct vcpu *v, hvm_domain_context_t *h)
 {
 struct domain *d = v->domain;
 PITState *pit = domain_vpit(d);
@@ -409,7 +409,7 @@ static int pit_save(struct vcpu *v, hvm_domain_context_t *h)
 return rc;
 }
 
-static int pit_load(struct domain *d, hvm_domain_context_t *h)
+static int cf_check pit_load(struct domain *d, hvm_domain_context_t *h)
 {
 PITState *pit = domain_vpit(d);
 int i, rc = 0;
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 7bdb51cfa1c4..ed512fa65b63 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -582,7 +582,7 @@ static const struct hvm_mmio_ops hpet_mmio_ops = {
 };
 
 
-static int hpet_save(struct vcpu *v, hvm_domain_context_t *h)
+static int cf_check hpet_save(struct vcpu *v, hvm_domain_context_t *h)
 {
 const struct domain *d = v->domain;
 HPETState *hp = domain_vhpet(d);
@@ -645,7 +645,7 @@ static int hpet_save(struct vcpu *v, hvm_domain_context_t *h)
 return rc;
 }
 
-static int hpet_load(struct domain *d, hvm_domain_context_t *h)
+static int cf_check hpet_load(struct domain *d, hvm_domain_context_t *h)
 {
 HPETState *hp = domain_vhpet(d);
 struct hvm_hw_hpet *rec;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e87e809a945d..4cf313a0ad0a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -756,7 +756,7 @@ void hvm_domain_destroy(struct domain *d)
 destroy_vpci_mmcfg(d);
 }
 
-static int hvm_save_tsc_adjust(struct vcpu *v, hvm_domain_context_t *h)
+static int cf_check hvm_save_tsc_adjust(struct vcpu *v, hvm_domain_context_t *h)
 {
 struct hvm_tsc_adjust ctxt = {
 .tsc_adjust = v->arch.hvm.msr_tsc_adjust,
@@ -765,7 +765,7 @@ static int hvm_save_tsc_adjust(struct vcpu *v, hvm_domain_context_t *h)
 return hvm_save_entry(TSC_ADJUST, v->vcpu_id, h, );
 }
 
-static int hvm_load_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
+static int cf_check hvm_load_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
 {
 unsigned int vcpuid = hvm_load_instance(h);
 struct vcpu *v;
@@ -788,7 +788,7 @@ static int hvm_load_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
 HVM_REGISTER_SAVE_RESTORE(TSC_ADJUST, hvm_save_tsc_adjust,
   hvm_load_tsc_adjust, 1, HVMSR_PER_VCPU);
 
-static int hvm_save_cpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
+static int cf_check hvm_save_cpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
 {
 struct segment_register seg;
 struct hvm_hw_cpu ctxt = {
@@ -971,7 +971,7 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d)
 (cet  ? X86_CR4_CET   : 0));
 }
 
-static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
+static int cf_check hvm_load_cpu_ctxt(struct domain *d, 

[PATCH v2 30/70] xen/misc: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/mm.c| 6 --
 xen/arch/x86/setup.c | 4 ++--
 xen/common/domain.c  | 2 +-
 xen/common/gdbstub.c | 5 ++---
 xen/common/livepatch.c   | 7 +++
 xen/common/memory.c  | 4 ++--
 xen/common/page_alloc.c  | 2 +-
 xen/common/radix-tree.c  | 4 ++--
 xen/common/rangeset.c| 2 +-
 xen/common/spinlock.c| 6 +++---
 xen/common/vm_event.c| 6 +++---
 xen/common/xmalloc_tlsf.c| 4 ++--
 xen/drivers/passthrough/amd/iommu_init.c | 2 +-
 13 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 3b8bc3dda977..4b6956c5be78 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -835,7 +835,8 @@ struct mmio_emul_range_ctxt {
 unsigned long mfn;
 };
 
-static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
+static int cf_check print_mmio_emul_range(
+unsigned long s, unsigned long e, void *arg)
 {
 const struct mmio_emul_range_ctxt *ctxt = arg;
 
@@ -4606,7 +4607,8 @@ static int _handle_iomem_range(unsigned long s, unsigned long e,
 return 0;
 }
 
-static int handle_iomem_range(unsigned long s, unsigned long e, void *p)
+static int cf_check handle_iomem_range(
+unsigned long s, unsigned long e, void *p)
 {
 int err = 0;
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index eceff0a4e2b4..735f69d2cae8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2023,8 +2023,8 @@ int __hwdom_init xen_in_range(unsigned long mfn)
 return 0;
 }
 
-static int __hwdom_init io_bitmap_cb(unsigned long s, unsigned long e,
- void *ctx)
+static int __hwdom_init cf_check io_bitmap_cb(
+unsigned long s, unsigned long e, void *ctx)
 {
 struct domain *d = ctx;
 unsigned int i;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index a49c26064601..a3614539e472 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -370,7 +370,7 @@ static void cf_check _free_pirq_struct(struct rcu_head *head)
 xfree(container_of(head, struct pirq, rcu_head));
 }
 
-static void free_pirq_struct(void *ptr)
+static void cf_check free_pirq_struct(void *ptr)
 {
 struct pirq *pirq = ptr;
 
diff --git a/xen/common/gdbstub.c b/xen/common/gdbstub.c
index 079c3ca9616a..d6872721dc0d 100644
--- a/xen/common/gdbstub.c
+++ b/xen/common/gdbstub.c
@@ -69,7 +69,7 @@ static void gdb_smp_resume(void);
 static char __initdata opt_gdb[30];
 string_param("gdb", opt_gdb);
 
-static void gdbstub_console_puts(const char *str, size_t nr);
+static void cf_check gdbstub_console_puts(const char *str, size_t nr);
 
 /* value <-> char (de)serialzers */
 static char
@@ -546,8 +546,7 @@ __gdb_ctx = {
 };
 static struct gdb_context *gdb_ctx = &__gdb_ctx;
 
-static void
-gdbstub_console_puts(const char *str, size_t nr)
+static void cf_check gdbstub_console_puts(const char *str, size_t nr)
 {
 const char *p;
 
diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index e8714920dc8f..ec301a9f120c 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -157,10 +157,9 @@ unsigned long livepatch_symbols_lookup_by_name(const char *symname)
 return 0;
 }
 
-static const char *livepatch_symbols_lookup(unsigned long addr,
-unsigned long *symbolsize,
-unsigned long *offset,
-char *namebuf)
+static const char *cf_check livepatch_symbols_lookup(
+unsigned long addr, unsigned long *symbolsize, unsigned long *offset,
+char *namebuf)
 {
 const struct payload *data;
 unsigned int i, best;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index ede45c4af9db..69b0cd1e50de 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1051,8 +1051,8 @@ struct get_reserved_device_memory {
 unsigned int used_entries;
 };
 
-static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
-  u32 id, void *ctxt)
+static int cf_check get_reserved_device_memory(
+xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt)
 {
 struct get_reserved_device_memory *grdm = ctxt;
 uint32_t sbdf = PCI_SBDF3(grdm->map.dev.pci.seg, grdm->map.dev.pci.bus,
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 3caf5c954b24..46357182375a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1238,7 +1238,7 @@ struct scrub_wait_state {
 bool drop;
 };
 
-static void 

[PATCH v2 16/70] xen: CFI hardening for IPIs

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.
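
As a hedged sketch of the pattern: anything handed to the cross-call
machinery (e.g. on_selected_cpus()) is invoked indirectly from the IPI
handler, and so needs the annotation.  The handler below is made up:

    static void cf_check read_cpu_id(void *info)
    {
        *(unsigned int *)info = smp_processor_id();
    }

    /* on_selected_cpus(cpumask_of(cpu), read_cpu_id, &id, 1); */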

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/acpi/cpu_idle.c|  2 +-
 xen/arch/x86/acpi/cpufreq/cpufreq.c |  8 
 xen/arch/x86/acpi/cpufreq/powernow.c|  6 +++---
 xen/arch/x86/acpi/lib.c |  2 +-
 xen/arch/x86/cpu/amd.c  |  2 +-
 xen/arch/x86/cpu/mcheck/amd_nonfatal.c  |  2 +-
 xen/arch/x86/cpu/mcheck/mce.c   |  6 +++---
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/non-fatal.c |  2 +-
 xen/arch/x86/cpu/microcode/core.c   |  2 +-
 xen/arch/x86/cpu/mtrr/generic.c |  2 +-
 xen/arch/x86/cpu/mtrr/main.c|  2 +-
 xen/arch/x86/cpu/mwait-idle.c   |  6 +++---
 xen/arch/x86/cpu/vpmu.c |  4 ++--
 xen/arch/x86/guest/xen/xen.c|  2 +-
 xen/arch/x86/hvm/nestedhvm.c|  3 +--
 xen/arch/x86/hvm/vmx/vmcs.c |  2 +-
 xen/arch/x86/include/asm/mtrr.h |  2 +-
 xen/arch/x86/irq.c  |  4 ++--
 xen/arch/x86/nmi.c  |  2 +-
 xen/arch/x86/oprofile/nmi_int.c | 10 +-
 xen/arch/x86/oprofile/op_model_athlon.c |  2 +-
 xen/arch/x86/platform_hypercall.c   |  4 ++--
 xen/arch/x86/psr.c  |  2 +-
 xen/arch/x86/shutdown.c |  4 ++--
 xen/arch/x86/smp.c  |  2 +-
 xen/arch/x86/sysctl.c   |  2 +-
 xen/arch/x86/time.c |  8 
 xen/common/cpu.c|  4 ++--
 xen/common/gdbstub.c|  2 +-
 xen/common/keyhandler.c |  2 +-
 xen/common/page_alloc.c |  2 +-
 32 files changed, 53 insertions(+), 54 deletions(-)

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index fb47eb9ad68e..22c8bb0c2d94 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -145,7 +145,7 @@ struct hw_residencies
 uint64_t cc7;
 };
 
-static void do_get_hw_residencies(void *arg)
+static void cf_check do_get_hw_residencies(void *arg)
 {
 struct cpuinfo_x86 *c = &current_cpu_data;
 struct hw_residencies *hw_res = arg;
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 9510f05340aa..8133c2dd958c 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -129,7 +129,7 @@ struct drv_cmd {
 u32 val;
 };
 
-static void do_drv_read(void *drvcmd)
+static void cf_check do_drv_read(void *drvcmd)
 {
 struct drv_cmd *cmd;
 
@@ -148,7 +148,7 @@ static void do_drv_read(void *drvcmd)
 }
 }
 
-static void do_drv_write(void *drvcmd)
+static void cf_check do_drv_write(void *drvcmd)
 {
 struct drv_cmd *cmd;
 uint64_t msr_content;
@@ -244,7 +244,7 @@ struct perf_pair {
 static DEFINE_PER_CPU(struct perf_pair, gov_perf_pair);
 static DEFINE_PER_CPU(struct perf_pair, usr_perf_pair);
 
-static void read_measured_perf_ctrs(void *_readin)
+static void cf_check read_measured_perf_ctrs(void *_readin)
 {
 struct perf_pair *readin = _readin;
 
@@ -340,7 +340,7 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
 return extract_freq(get_cur_val(cpumask_of(cpu)), data);
 }
 
-static void feature_detect(void *info)
+static void cf_check feature_detect(void *info)
 {
 struct cpufreq_policy *policy = info;
 unsigned int eax;
diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
index da8fc40b9a6f..ca71ecf72d67 100644
--- a/xen/arch/x86/acpi/cpufreq/powernow.c
+++ b/xen/arch/x86/acpi/cpufreq/powernow.c
@@ -44,12 +44,12 @@
 
 #define ARCH_CPU_FLAG_RESUME   1
 
-static void transition_pstate(void *pstate)
+static void cf_check transition_pstate(void *pstate)
 {
 wrmsrl(MSR_PSTATE_CTRL, *(unsigned int *)pstate);
 }
 
-static void update_cpb(void *data)
+static void cf_check update_cpb(void *data)
 {
 struct cpufreq_policy *policy = data;
 
@@ -165,7 +165,7 @@ struct amd_cpu_data {
 u32 max_hw_pstate;
 };
 
-static void get_cpu_data(void *arg)
+static void cf_check get_cpu_data(void *arg)
 {
 struct amd_cpu_data *data = arg;
 struct processor_performance *perf = data->perf;
diff --git a/xen/arch/x86/acpi/lib.c b/xen/arch/x86/acpi/lib.c
index b66e7338e74d..43831b92d132 100644
--- a/xen/arch/x86/acpi/lib.c
+++ b/xen/arch/x86/acpi/lib.c
@@ -99,7 +99,7 @@ unsigned int acpi_get_processor_id(unsigned int cpu)
return INVALID_ACPIID;
 }
 
-static void get_mwait_ecx(void *info)
+static void cf_check get_mwait_ecx(void *info)
 {
*(u32 *)info = cpuid_ecx(CPUID_MWAIT_LEAF);
 }
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index a8e37dbb1f5c..2d18223f20ef 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -430,7 

[PATCH v2 15/70] xen: CFI hardening for call_rcu()

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.
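
A hedged sketch of the shape being annotated (struct widget is
illustrative; the mechanism matches the hunks below):

    struct widget {
        struct rcu_head rcu;
        /* ... payload ... */
    };

    static void cf_check widget_free(struct rcu_head *head)
    {
        xfree(container_of(head, struct widget, rcu));
    }

    /* call_rcu(&w->rcu, widget_free); the callback is invoked
     * indirectly once all RCU readers have drained. */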

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/arch/x86/hvm/mtrr.c   | 2 +-
 xen/arch/x86/hvm/vmsi.c   | 2 +-
 xen/arch/x86/mm/mem_sharing.c | 2 +-
 xen/arch/x86/percpu.c | 2 +-
 xen/common/domain.c   | 4 ++--
 xen/common/radix-tree.c   | 2 +-
 xen/common/rcupdate.c | 2 +-
 xen/common/sched/core.c   | 2 +-
 xen/xsm/flask/avc.c   | 2 +-
 9 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index b3ef1bf54133..42f3d8319296 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -586,7 +586,7 @@ int hvm_get_mem_pinned_cacheattr(struct domain *d, gfn_t gfn,
 return rc;
 }
 
-static void free_pinned_cacheattr_entry(struct rcu_head *rcu)
+static void cf_check free_pinned_cacheattr_entry(struct rcu_head *rcu)
 {
 xfree(container_of(rcu, struct hvm_mem_pinned_cacheattr_range, rcu));
 }
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b439..2889575a2035 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -441,7 +441,7 @@ static void add_msixtbl_entry(struct domain *d,
 list_add_rcu(&entry->list, &d->arch.hvm.msixtbl_list);
 }
 
-static void free_msixtbl_entry(struct rcu_head *rcu)
+static void cf_check free_msixtbl_entry(struct rcu_head *rcu)
 {
 struct msixtbl_entry *entry;
 
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 74d2869c0e6f..15e6a7ed814b 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -75,7 +75,7 @@ static DEFINE_SPINLOCK(shr_audit_lock);
 static DEFINE_RCU_READ_LOCK(shr_audit_read_lock);
 
 /* RCU delayed free of audit list entry */
-static void _free_pg_shared_info(struct rcu_head *head)
+static void cf_check _free_pg_shared_info(struct rcu_head *head)
 {
 xfree(container_of(head, struct page_sharing_info, rcu_head));
 }
diff --git a/xen/arch/x86/percpu.c b/xen/arch/x86/percpu.c
index eb3ba7bc8874..46460689b73d 100644
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -45,7 +45,7 @@ struct free_info {
 };
 static DEFINE_PER_CPU(struct free_info, free_info);
 
-static void _free_percpu_area(struct rcu_head *head)
+static void cf_check _free_percpu_area(struct rcu_head *head)
 {
 struct free_info *info = container_of(head, struct free_info, rcu);
 unsigned int cpu = info->cpu;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5df0d167537b..32ec156e6f6a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -365,7 +365,7 @@ static int __init cf_check parse_extra_guest_irqs(const char *s)
 }
 custom_param("extra_guest_irqs", parse_extra_guest_irqs);
 
-static void _free_pirq_struct(struct rcu_head *head)
+static void cf_check _free_pirq_struct(struct rcu_head *head)
 {
 xfree(container_of(head, struct pirq, rcu_head));
 }
@@ -1108,7 +1108,7 @@ void vcpu_end_shutdown_deferral(struct vcpu *v)
 }
 
 /* Complete domain destroy after RCU readers are not holding old references. */
-static void complete_domain_destroy(struct rcu_head *head)
+static void cf_check complete_domain_destroy(struct rcu_head *head)
 {
 struct domain *d = container_of(head, struct domain, rcu);
 struct vcpu *v;
diff --git a/xen/common/radix-tree.c b/xen/common/radix-tree.c
index 628a7e06988f..33b47748ae49 100644
--- a/xen/common/radix-tree.c
+++ b/xen/common/radix-tree.c
@@ -58,7 +58,7 @@ static struct radix_tree_node *rcu_node_alloc(void *arg)
 return rcu_node ? &rcu_node->node : NULL;
 }
 
-static void _rcu_node_free(struct rcu_head *head)
+static void cf_check _rcu_node_free(struct rcu_head *head)
 {
struct rcu_node *rcu_node =
container_of(head, struct rcu_node, rcu_head);
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index f9dd2584a8b7..423d6b1d6d02 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -167,7 +167,7 @@ static int rsinterval = 1000;
 static atomic_t cpu_count = ATOMIC_INIT(0);
 static atomic_t pending_count = ATOMIC_INIT(0);
 
-static void rcu_barrier_callback(struct rcu_head *head)
+static void cf_check rcu_barrier_callback(struct rcu_head *head)
 {
 /*
  * We need a barrier making all previous writes visible to other cpus
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index cf1ba01b4d87..285de9ee2a19 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2798,7 +2798,7 @@ static int cpu_schedule_up(unsigned int cpu)
 return 0;
 }
 
-static void sched_res_free(struct rcu_head *head)
+static void cf_check sched_res_free(struct rcu_head *head)
 {
 struct sched_resource *sr = container_of(head, struct sched_resource, rcu);
 
diff --git a/xen/xsm/flask/avc.c b/xen/xsm/flask/avc.c
index 

[PATCH v2 25/70] xen/vpci: CFI hardening

2022-02-14 Thread Andrew Cooper
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.

Use cf_check to annotate function pointer targets for the toolchain.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
---
 xen/drivers/vpci/header.c | 18 +-
 xen/drivers/vpci/msi.c| 42 +-
 xen/drivers/vpci/msix.c   | 20 ++--
 xen/drivers/vpci/vpci.c   | 16 
 xen/include/xen/vpci.h|  8 
 5 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 40ff79c33f8f..a1c928a0d26f 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -33,8 +33,8 @@ struct map_data {
 bool map;
 };
 
-static int map_range(unsigned long s, unsigned long e, void *data,
- unsigned long *c)
+static int cf_check map_range(
+unsigned long s, unsigned long e, void *data, unsigned long *c)
 {
 const struct map_data *map = data;
 int rc;
@@ -332,8 +332,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 return 0;
 }
 
-static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
-  uint32_t cmd, void *data)
+static void cf_check cmd_write(
+const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
 {
 uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
 
@@ -353,8 +353,8 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
 pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
-static void bar_write(const struct pci_dev *pdev, unsigned int reg,
-  uint32_t val, void *data)
+static void cf_check bar_write(
+const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
 struct vpci_bar *bar = data;
 bool hi = false;
@@ -397,8 +397,8 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
 pci_conf_write32(pdev->sbdf, reg, val);
 }
 
-static void rom_write(const struct pci_dev *pdev, unsigned int reg,
-  uint32_t val, void *data)
+static void cf_check rom_write(
+const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
 struct vpci_header *header = &pdev->vpci->header;
 struct vpci_bar *rom = data;
@@ -445,7 +445,7 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
 rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
-static int init_bars(struct pci_dev *pdev)
+static int cf_check init_bars(struct pci_dev *pdev)
 {
 uint16_t cmd;
 uint64_t addr, size;
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 5757a7aed20f..8f2b59e61aa4 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -22,8 +22,8 @@
 
 #include 
 
-static uint32_t control_read(const struct pci_dev *pdev, unsigned int reg,
- void *data)
+static uint32_t cf_check control_read(
+const struct pci_dev *pdev, unsigned int reg, void *data)
 {
 const struct vpci_msi *msi = data;
 
@@ -34,8 +34,8 @@ static uint32_t control_read(const struct pci_dev *pdev, unsigned int reg,
(msi->address64 ? PCI_MSI_FLAGS_64BIT : 0);
 }
 
-static void control_write(const struct pci_dev *pdev, unsigned int reg,
-  uint32_t val, void *data)
+static void cf_check control_write(
+const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
 struct vpci_msi *msi = data;
 unsigned int vectors = min_t(uint8_t,
@@ -89,16 +89,16 @@ static void update_msi(const struct pci_dev *pdev, struct vpci_msi *msi)
 }
 
 /* Handlers for the address field (32bit or low part of a 64bit address). */
-static uint32_t address_read(const struct pci_dev *pdev, unsigned int reg,
- void *data)
+static uint32_t cf_check address_read(
+const struct pci_dev *pdev, unsigned int reg, void *data)
 {
 const struct vpci_msi *msi = data;
 
 return msi->address;
 }
 
-static void address_write(const struct pci_dev *pdev, unsigned int reg,
-  uint32_t val, void *data)
+static void cf_check address_write(
+const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
 struct vpci_msi *msi = data;
 
@@ -110,16 +110,16 @@ static void address_write(const struct pci_dev *pdev, unsigned int reg,
 }
 
 /* Handlers for the high part of a 64bit address field. */
-static uint32_t address_hi_read(const struct pci_dev *pdev, unsigned int reg,
-void *data)
+static uint32_t cf_check address_hi_read(
+const struct pci_dev *pdev, unsigned int reg, void *data)
 {
 const struct vpci_msi *msi = data;
 
 return msi->address >> 32;
 }
 
-static void address_hi_write(const struct pci_dev *pdev, unsigned int reg,
- uint32_t val, void *data)
+static void cf_check 
