Re: [Xen-devel] [PATCH v1 1/6] x86/vvmx: introduce vvmcx_valid()

2018-10-30 Thread Sergey Dyasli
On 30/10/2018 07:41, Tian, Kevin wrote:
>> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
>> Sent: Friday, October 12, 2018 11:28 PM
>>
>> Introduce it as a convenient helper function and refactor the code to use it.
>>
>> No functional change.
>>
>> Signed-off-by: Sergey Dyasli 
> 
> since vmcx is hvm abstracted term, what about using this
> helper in svm side too?

I couldn't find any code in nestedsvm.c that would benefit from this new
helper.

--
Thanks,
Sergey


Re: [Xen-devel] [PATCH v1 5/6] x86/vvmx: correctly report vvmcs size

2018-10-30 Thread Sergey Dyasli
On 30/10/2018 08:06, Tian, Kevin wrote:
>> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
>> Sent: Friday, October 12, 2018 11:28 PM
>>
>> The size of Xen's virtual vmcs region is 4096 bytes. Correctly report
>> it to the guest when VMCS shadowing is not available, instead of
>> providing the H/W value (which is usually smaller).
> 
> what is the problem of reporting smaller size even when actual
> size is 4096? is L1 expected to access the portion beyond h/w
> reported size?
> 

Here's the code snippet from kvm-unit-tests:

vmcs[0]->hdr.revision_id = basic.revision;
assert(!vmcs_clear(vmcs[0]));
assert(!make_vmcs_current(vmcs[0]));
set_all_vmcs_fields(0x86);

assert(!vmcs_clear(vmcs[0]));
memcpy(vmcs[1], vmcs[0], basic.size);
assert(!make_vmcs_current(vmcs[1]));
report("test vmclear flush (current VMCS)", 
check_all_vmcs_fields(0x86));

set_all_vmcs_fields() vmwrites almost 4K worth of fields, but memcpy() copies
only 1024 bytes (the H/W-reported basic.size), so subsequent vmreads return
incorrect values.
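
To make the size encoding concrete, here is a standalone illustration (not
Xen code; the raw MSR value is made up): bits 44:32 of IA32_VMX_BASIC hold
the VMCS region size, so forcing bit 44 while clearing the rest of the
field reports 4096 bytes.

#include <stdint.h>
#include <stdio.h>

#define VMX_BASIC_VMCS_SIZE_MASK  (0x1fffULL << 32)   /* bits 44:32 */

int main(void)
{
    /* Hypothetical raw IA32_VMX_BASIC value reporting a 1024-byte region. */
    uint64_t basic = 0x00da040000000004ULL;

    printf("h/w vmcs_region_size: %llu\n",
           (unsigned long long)((basic & VMX_BASIC_VMCS_SIZE_MASK) >> 32));

    /* What the patch does when VMCS shadowing is unavailable: */
    basic &= ~VMX_BASIC_VMCS_SIZE_MASK;
    basic |= 1ULL << 44;

    printf("reported vmcs_region_size: %llu\n",
           (unsigned long long)((basic & VMX_BASIC_VMCS_SIZE_MASK) >> 32));

    return 0;
}

This prints 1024 for the hypothetical h/w value and 4096 after the
adjustment, which is exactly the mismatch that breaks the memcpy() above.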

>>
>> Signed-off-by: Sergey Dyasli 
>> ---
>>  xen/arch/x86/hvm/vmx/vvmx.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
>> b/xen/arch/x86/hvm/vmx/vvmx.c
>> index 8b691bfc04..2c2ba36d94 100644
>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>> @@ -2064,6 +2064,14 @@ int nvmx_msr_read_intercept(unsigned int msr,
>> u64 *msr_content)
>>  data = (host_data & (~0ul << 32)) |
>> (vmcs->vmcs_revision_id & 0x7fffffff);
>>  unmap_domain_page(vmcs);
>> +
>> +if ( !cpu_has_vmx_vmcs_shadowing )
>> +{
>> +/* Report vmcs_region_size as 4096 */
>> +data &= ~VMX_BASIC_VMCS_SIZE_MASK;
>> +data |= 1ULL << 44;
>> +}
>> +
>>  break;
>>  }
>>  case MSR_IA32_VMX_PINBASED_CTLS:
>> --
>> 2.17.1
> 


Re: [Xen-devel] [PATCH 4/4] x86/vvmx: Don't handle unknown nested vmexit reasons at L0

2018-10-26 Thread Sergey Dyasli
On 26/10/2018 12:09, Andrew Cooper wrote:
> Ok - that patch is now as follows:
> 
> x86/vvmx: Let L1 handle all the unconditional vmexit instructions
> 
> Signed-off-by: Andrew Cooper 
> ...
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index aa202e0..7051eb3 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -2383,6 +2383,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>  case EXIT_REASON_TRIPLE_FAULT:
>  case EXIT_REASON_TASK_SWITCH:
>  case EXIT_REASON_CPUID:
> +    case EXIT_REASON_GETSEC:
> +    case EXIT_REASON_INVD:
>  case EXIT_REASON_VMCALL:
>  case EXIT_REASON_VMCLEAR:
>  case EXIT_REASON_VMLAUNCH:
> @@ -2395,6 +2397,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>  case EXIT_REASON_VMXON:
>  case EXIT_REASON_INVEPT:
>  case EXIT_REASON_XSETBV:
> +    case EXIT_REASON_INVVPID:
>  /* inject to L1 */
>  nvcpu->nv_vmexit_pending = 1;
>  break;
> 
> RDTSCP and VMFUNC mustn't be blindly forwarded to L1.  We need to check
> whether the intercepts are active and either forward to L1 or handle at L0.

I think the below will do for RDTSCP.
VMFUNC is trickier but Xen currently doesn't provide this capability to
L1 anyway.

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index aa202e0d12..1d1a0921af 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2653,6 +2653,13 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs
*regs,
         if ( ctrl & CPU_BASED_TPR_SHADOW )
             nvcpu->nv_vmexit_pending = 1;
         break;
+
+    case EXIT_REASON_RDTSCP:
+        ctrl = __n2_exec_control(v);
+        if ( ctrl & CPU_BASED_RDTSC_EXITING )
+            nvcpu->nv_vmexit_pending = 1;
+        break;
+
     default:
         gprintk(XENLOG_ERR, "Unexpected nested vmexit: reason %u\n",
                 exit_reason);

---
Thanks,
Sergey


Re: [Xen-devel] [PATCH 4/4] x86/vvmx: Don't handle unknown nested vmexit reasons at L0

2018-10-26 Thread Sergey Dyasli
On 26/10/2018 10:10, Andrew Cooper wrote:
> On 26/10/2018 10:05, Sergey Dyasli wrote:
>>
>> On 25/10/2018 16:39, Andrew Cooper wrote:
>>> This is very dangerous from a security point of view, because a missing 
>>> entry
>>> will cause L2's action to be interpreted as L1's action.
>>>
>>> Signed-off-by: Andrew Cooper 
>>> ---
>>> CC: Sergey Dyasli 
>>> CC: Jan Beulich 
>>> CC: Wei Liu 
>>> CC: Jun Nakajima 
>>> CC: Kevin Tian 
>>> ---
>>>  xen/arch/x86/hvm/vmx/vvmx.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>>> index d1c8a41..817d85f 100644
>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>> @@ -2609,8 +2609,9 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>>>  nvcpu->nv_vmexit_pending = 1;
>>>  break;
>>>  default:
>>> -gprintk(XENLOG_ERR, "Unexpected nested vmexit: reason %u\n",
>>> +gprintk(XENLOG_ERR, "Unhandled nested vmexit: reason %u\n",
>>>  exit_reason);
>>> +domain_crash(v->domain);
>>>  }
>>>  
>>>  return ( nvcpu->nv_vmexit_pending == 1 );
>> Can you consider adding handling for the following?
>>
>>  EXIT_REASON_INVD
>>  EXIT_REASON_RDTSCP
>>  EXIT_REASON_VMFUNC
> 
> Looks like these should be merged in with the INVVPID change, if you're
> happy with that?

Yes, that would be a cleaner way, thanks.

--
Sergey


Re: [Xen-devel] [PATCH 2/4] x86/vvmx: Drop the now-obsolete vmx_inst_check_privilege()

2018-10-26 Thread Sergey Dyasli
On 25/10/2018 16:38, Andrew Cooper wrote:
> Now that nvmx_handle_vmx_insn() performs all VT-x instruction checks, there is
> no need for redundant checking in vmx_inst_check_privilege().  Remove it, and
> take out the vmxon_check boolean which was plumbed through decode_vmx_inst().
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Sergey Dyasli 

--
Thanks,
Sergey


Re: [Xen-devel] [PATCH 1/4] x86/vvmx: Unconditionally initialise vmxon_region_pa during vcpu construction

2018-10-26 Thread Sergey Dyasli
On 25/10/2018 16:36, Andrew Cooper wrote:
> This is a stopgap solution until the toolstack side of initialisation can be
> sorted out, but it does result in the nvmx_vcpu_in_vmx() predicate working
> correctly even when nested virt hasn't been enabled for the domain.
> 
> Update nvmx_handle_vmx_insn() to include the in-vmx mode check (for all
> instructions other than VMXON) to complete the set of #UD checks.
> 
> In addition, sanity check that the nested vmexit handler has worked correctly,
> and that we are only providing emulation of the VT-x instructions to L1
> guests.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Sergey Dyasli 

--
Thanks,
Sergey


Re: [Xen-devel] [PATCH 4/4] x86/vvmx: Don't handle unknown nested vmexit reasons at L0

2018-10-26 Thread Sergey Dyasli


On 25/10/2018 16:39, Andrew Cooper wrote:
> This is very dangerous from a security point of view, because a missing entry
> will cause L2's action to be interpreted as L1's action.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Sergey Dyasli 
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Jun Nakajima 
> CC: Kevin Tian 
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index d1c8a41..817d85f 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -2609,8 +2609,9 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>  nvcpu->nv_vmexit_pending = 1;
>  break;
>  default:
> -gprintk(XENLOG_ERR, "Unexpected nested vmexit: reason %u\n",
> +gprintk(XENLOG_ERR, "Unhandled nested vmexit: reason %u\n",
>  exit_reason);
> +domain_crash(v->domain);
>  }
>  
>  return ( nvcpu->nv_vmexit_pending == 1 );

Can you consider adding handling for the following?

EXIT_REASON_INVD
EXIT_REASON_RDTSCP
EXIT_REASON_VMFUNC

But in any case:

Reviewed-by: Sergey Dyasli 

--
Thanks,
Sergey


Re: [Xen-devel] [PATCH 3/4] x86/vvmx: INVVPID instructions should be handled at L1

2018-10-26 Thread Sergey Dyasli
On 25/10/2018 16:38, Andrew Cooper wrote:

> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 9390705..d1c8a41 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -2348,6 +2348,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>  case EXIT_REASON_VMXOFF:
>  case EXIT_REASON_VMXON:
>  case EXIT_REASON_INVEPT:
> +case EXIT_REASON_INVVPID:
>  case EXIT_REASON_XSETBV:
>  /* inject to L1 */
>  nvcpu->nv_vmexit_pending = 1;

Reviewed-by: Sergey Dyasli 

I think this patch should be the first one in the series.



[Xen-devel] [PATCH v2] x86/boot: enable NMIs after traps init

2018-10-23 Thread Sergey Dyasli
In certain scenarios, NMIs might be disabled during the Xen boot process.
Such a situation will cause alternative_instructions() to:

panic("Timed out waiting for alternatives self-NMI to hit");

This bug was originally seen when using Tboot to boot Xen.

To prevent this from happening, enable NMIs during cpu_init() and
during __start_xen() for the BSP.
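
For context (not part of this patch; a simplified sketch with illustrative
names, not Xen's exact code): the alternatives code sends itself an NMI and
waits for the handler to run, roughly like below. If the loader left NMIs
blocked, the self-NMI is never delivered and the wait times out.

/* Simplified sketch of the failure mode in alternative_instructions(). */
self_nmi();                                  /* send ourselves an NMI */

s_time_t deadline = NOW() + SECONDS(5);      /* illustrative timeout */
while ( !nmi_seen && NOW() < deadline )      /* nmi_seen: illustrative flag */
    cpu_relax();

if ( !nmi_seen )
    panic("Timed out waiting for alternatives self-NMI to hit");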

Signed-off-by: Sergey Dyasli 
---
v2:
- Added enable_nmis() to __start_xen() for BSP
- Added comments as per Andrew's suggestion

CC: Jan Beulich 
CC: Andrew Cooper 
CC: Wei Liu 
---
 xen/arch/x86/cpu/common.c | 3 +++
 xen/arch/x86/setup.c  | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 057859ab14..2d02335fef 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -819,6 +819,9 @@ void cpu_init(void)
 #define CD(register) asm volatile ( "mov %0,%%db" #register : : "r"(0UL) );
CD(0); CD(1); CD(2); CD(3); /* no db4 and db5 */; CD(6); CD(7);
 #undef CD
+
+   /* Enable NMIs in case our loader (e.g. Tboot) has left them disabled */
+   enable_nmis();
 }
 
 void cpu_uninit(unsigned int cpu)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 93b79c7c0c..194388ef47 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -708,6 +708,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
 /* Full exception support from here on in. */
 
+/* Re-enable NMIs in case our loader (e.g. Tboot) has left them disabled. */
+enable_nmis();
+
 if ( pvh_boot )
 {
 /*
-- 
2.17.1



[Xen-devel] [PATCH] x86/boot: enable NMIs during cpu_init()

2018-10-23 Thread Sergey Dyasli
In certain scenarios, NMIs might be disabled during the Xen boot process.
Such a situation will cause alternative_instructions() to:

panic("Timed out waiting for alternatives self-NMI to hit");

This bug was originally seen when using tboot to boot Xen.
To prevent this from happening, enable NMIs during cpu_init().

Signed-off-by: Sergey Dyasli 
---
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Wei Liu 
---
 xen/arch/x86/cpu/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 057859ab14..09fbd98764 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -819,6 +819,8 @@ void cpu_init(void)
 #define CD(register) asm volatile ( "mov %0,%%db" #register : : "r"(0UL) );
CD(0); CD(1); CD(2); CD(3); /* no db4 and db5 */; CD(6); CD(7);
 #undef CD
+
+   enable_nmis();
 }
 
 void cpu_uninit(unsigned int cpu)
-- 
2.17.1



Re: [Xen-devel] [PATCH v2] mm/page_alloc: make bootscrub happen in idle-loop

2018-10-15 Thread Sergey Dyasli
On 12/10/18 14:40, Jan Beulich wrote:
 On 09.10.18 at 17:21,  wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -161,8 +161,42 @@ string_param("badpage", opt_badpage);
>>  /*
>>   * no-bootscrub -> Free pages are not zeroed during boot.
>>   */
>> -static bool_t opt_bootscrub __initdata = 1;
>> -boolean_param("bootscrub", opt_bootscrub);
>> +enum bootscrub_mode {
>> +BOOTSCRUB_OFF = 0,
> 
> The "= 0" is pointless.

I don't mind this change.

>> @@ -2039,8 +2077,24 @@ void __init heap_init_late(void)
>>   */
>>  setup_low_mem_virq();
>>  
>> -if ( opt_bootscrub )
>> +switch ( opt_bootscrub )
>> +{
>> +default:
>> +ASSERT_UNREACHABLE();
>> +/* Fall through */
>> +
>> +case BOOTSCRUB_IDLE:
>> +printk("Scrubbing Free RAM on %d nodes in background\n",
>> +   num_online_nodes());
> 
> Still the question whether this printk(), and in particular the inclusion
> of the node count, is meaningful in any way. Other than this

I don't have any strong opinion about how this printk() statement should
look. It can be changed to whatever the maintainers find more appropriate.

> Reviewed-by: Jan Beulich 
> and one or both changes would be easy enough to make while
> committing, provided we can reach agreement.

Thanks,
Sergey


[Xen-devel] [PATCH v1 2/6] x86/vvmx: correct vmfail() usage for vmptrld and vmclear

2018-10-12 Thread Sergey Dyasli
Calling vmfail_valid() is correct only if vvmcx is valid. Modify the
functions to use vmfail() instead, which performs the necessary check
before choosing between VMfailValid and VMfailInvalid.

Signed-off-by: Sergey Dyasli 
---
 xen/arch/x86/hvm/vmx/vvmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 8eee6d0ea8..26b7d72660 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1744,7 +1744,7 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
  !map_io_bitmap_all(v) ||
  !_map_msr_bitmap(v) )
 {
-vmfail_valid(regs, VMX_INSN_VMPTRLD_INVALID_PHYADDR);
+vmfail(regs, VMX_INSN_VMPTRLD_INVALID_PHYADDR);
 goto out;
 }
 }
@@ -1828,7 +1828,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
 if ( rc == VMSUCCEED )
 vmsucceed(regs);
 else if ( rc == VMFAIL_VALID )
-vmfail_valid(regs, VMX_INSN_VMCLEAR_INVALID_PHYADDR);
+vmfail(regs, VMX_INSN_VMCLEAR_INVALID_PHYADDR);
 else
 vmfail_invalid(regs);
 
-- 
2.17.1



[Xen-devel] [PATCH v1 6/6] x86/vvmx: fix I/O and MSR bitmaps mapping

2018-10-12 Thread Sergey Dyasli
Currently, Xen tries to map the I/O and MSR bitmaps during emulation of
vmptrld and vmwrite. This is wrong: the guest can store arbitrary values
in those vmcs fields.

Make bitmap mapping happen only during a nested vmentry and only if the
appropriate execution controls are turned on by the L1 hypervisor.
Mapping will happen only:

1. During the first nested vmentry
2. After L1 has changed an appropriate vmcs field
3. After nvmx_purge_vvmcs() was previously called
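
As an illustration of the guest-visible effect (a sketch in the style of
kvm-unit-tests; the constants and helpers are illustrative, not taken from
this patch): a bogus bitmap address is now only noticed at vmentry time and
fails the entry, instead of being mapped during vmptrld/vmwrite emulation.

/* L1-side sketch: enable the MSR bitmap with a non-page-aligned address. */
vmcs_write(CPU_EXEC_CTRL0, vmcs_read(CPU_EXEC_CTRL0) | CPU_MSR_BITMAP);
vmcs_write(MSR_BITMAP, 0x1234);      /* not 4K-aligned */

/*
 * VMLAUNCH/VMRESUME is now expected to fail with VMfailValid and
 * VM-instruction error 7 (invalid control state).
 */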

Signed-off-by: Sergey Dyasli 
---
 xen/arch/x86/hvm/vmx/vvmx.c | 104 +++-
 1 file changed, 67 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 2c2ba36d94..e10ed79f66 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -750,6 +750,17 @@ static void __clear_current_vvmcs(struct vcpu *v)
 __vmpclear(nvcpu->nv_n2vmcx_pa);
 }
 
+static void unmap_msr_bitmap(struct vcpu *v)
+{
+struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+if ( nvmx->msrbitmap )
+{
+hvm_unmap_guest_frame(nvmx->msrbitmap, 1);
+nvmx->msrbitmap = NULL;
+}
+}
+
 /*
  * Refreshes the MSR bitmap mapping for the current nested vcpu.  Returns true
  * for a successful mapping, and returns false for MSR_BITMAP parameter errors
@@ -760,12 +771,7 @@ static bool __must_check _map_msr_bitmap(struct vcpu *v)
 struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 uint64_t gpa;
 
-if ( nvmx->msrbitmap )
-{
-hvm_unmap_guest_frame(nvmx->msrbitmap, 1);
-nvmx->msrbitmap = NULL;
-}
-
+unmap_msr_bitmap(v);
 gpa = get_vvmcs(v, MSR_BITMAP);
 
 if ( !IS_ALIGNED(gpa, PAGE_SIZE) )
@@ -776,6 +782,17 @@ static bool __must_check _map_msr_bitmap(struct vcpu *v)
 return nvmx->msrbitmap != NULL;
 }
 
+static void unmap_io_bitmap(struct vcpu *v, unsigned int idx)
+{
+struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+if ( nvmx->iobitmap[idx] )
+{
+hvm_unmap_guest_frame(nvmx->iobitmap[idx], 1);
+nvmx->iobitmap[idx] = NULL;
+}
+}
+
 static bool_t __must_check _map_io_bitmap(struct vcpu *v, u64 vmcs_reg)
 {
 struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
@@ -783,8 +800,7 @@ static bool_t __must_check _map_io_bitmap(struct vcpu *v, 
u64 vmcs_reg)
 int index;
 
 index = vmcs_reg == IO_BITMAP_A ? 0 : 1;
-if (nvmx->iobitmap[index])
-hvm_unmap_guest_frame(nvmx->iobitmap[index], 1);
+unmap_io_bitmap(v, index);
 gpa = get_vvmcs(v, vmcs_reg);
 nvmx->iobitmap[index] = hvm_map_guest_frame_ro(gpa >> PAGE_SHIFT, 1);
 
@@ -799,7 +815,6 @@ static inline bool_t __must_check map_io_bitmap_all(struct 
vcpu *v)
 
 static void nvmx_purge_vvmcs(struct vcpu *v)
 {
-struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
 int i;
 
@@ -809,16 +824,11 @@ static void nvmx_purge_vvmcs(struct vcpu *v)
 nvcpu->nv_vvmcx = NULL;
 nvcpu->nv_vvmcxaddr = INVALID_PADDR;
 v->arch.hvm.vmx.vmcs_shadow_maddr = 0;
-for (i=0; i<2; i++) {
-if ( nvmx->iobitmap[i] ) {
-hvm_unmap_guest_frame(nvmx->iobitmap[i], 1);
-nvmx->iobitmap[i] = NULL;
-}
-}
-if ( nvmx->msrbitmap ) {
-hvm_unmap_guest_frame(nvmx->msrbitmap, 1);
-nvmx->msrbitmap = NULL;
-}
+
+for ( i = 0; i < 2; i++ )
+unmap_io_bitmap(v, i);
+
+unmap_msr_bitmap(v);
 }
 
 u64 nvmx_get_tsc_offset(struct vcpu *v)
@@ -1594,20 +1604,34 @@ static void clear_vvmcs_launched(struct list_head 
*launched_list,
 }
 }
 
-static int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
+static enum vmx_insn_errno nvmx_vmresume(struct vcpu *v)
 {
 struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+unsigned int exec_ctrl;
 
-/* check VMCS is valid and IO BITMAP is set */
-if ( vvmcx_valid(v) &&
-((nvmx->iobitmap[0] && nvmx->iobitmap[1]) ||
-!(__n2_exec_control(v) & CPU_BASED_ACTIVATE_IO_BITMAP) ) )
-nvcpu->nv_vmentry_pending = 1;
-else
-vmfail_invalid(regs);
+ASSERT(vvmcx_valid(v));
+exec_ctrl = __n2_exec_control(v);
 
-return X86EMUL_OKAY;
+if ( exec_ctrl & CPU_BASED_ACTIVATE_IO_BITMAP )
+{
+if ( (nvmx->iobitmap[0] == NULL || nvmx->iobitmap[1] == NULL) &&
+ !map_io_bitmap_all(v) )
+goto invalid_control_state;
+}
+
+if ( exec_ctrl & CPU_BASED_ACTIVATE_MSR_BITMAP )
+{
+if ( nvmx->msrbitmap == NULL && !_map_msr_bitmap(v) )
+goto invalid_control_state;
+}
+
+nvcpu->nv_vmentry_pending = 1;
+
+return VMX_INSN_SUCCEED;
+
+invalid_control_state:
+return VMX_INSN_INVALID_CONTROL_STATE;
 }
 
 int nvmx_handle_vmresume(struct cpu_user_regs *regs)
@@ -1641,7 +1665,12 @@ int 

[Xen-devel] [PATCH v1 4/6] x86/vvmx: add VMX_INSN_VMCLEAR_WITH_VMXON_PTR errno

2018-10-12 Thread Sergey Dyasli
And make nvmx_handle_vmclear() return the new errno in case the provided
address is the same as the vmxon region address.

While at it, correct the return value for the not-4KB-aligned case and for
an invalid physaddr.
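
For reference (how L1 observes these errors; standard VMfailValid
behaviour, not something introduced here): VMfailValid sets RFLAGS.ZF and
stores the error number in the VM-instruction error field of the current
VMCS, so after this patch VMCLEAR with the VMXON pointer reads back error 3
there, while an invalid/unaligned physical address reads back error 2.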

Signed-off-by: Sergey Dyasli 
---
 xen/arch/x86/hvm/vmx/vvmx.c| 23 ++-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 4caa5811a1..8b691bfc04 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1804,9 +1804,20 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
 return rc;
 
 BUILD_BUG_ON(X86EMUL_OKAY != VMSUCCEED); /* rc = VMSUCCEED; */
+
+if ( gpa == vcpu_2_nvmx(v).vmxon_region_pa )
+{
+vmfail(regs, VMX_INSN_VMCLEAR_WITH_VMXON_PTR);
+goto out;
+}
+
 if ( gpa & 0xfff )
-rc = VMFAIL_INVALID;
-else if ( gpa == nvcpu->nv_vvmcxaddr )
+{
+vmfail(regs, VMX_INSN_VMCLEAR_INVALID_PHYADDR);
+goto out;
+}
+
+if ( gpa == nvcpu->nv_vvmcxaddr )
 {
 if ( cpu_has_vmx_vmcs_shadowing )
 nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
@@ -1820,7 +1831,10 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
 bool_t writable;
 
 vvmcs = hvm_map_guest_frame_rw(paddr_to_pfn(gpa), 0, &writable);
-if ( vvmcs ) 
+
+if ( !vvmcs )
+rc = VMFAIL_VALID;
+else
 {
 if ( writable )
 clear_vvmcs_launched(&nvmx->launched_list,
@@ -1835,9 +1849,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
 vmsucceed(regs);
 else if ( rc == VMFAIL_VALID )
 vmfail(regs, VMX_INSN_VMCLEAR_INVALID_PHYADDR);
-else
-vmfail_invalid(regs);
 
+out:
 return X86EMUL_OKAY;
 }
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index cae1861610..e84d2e482b 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -529,6 +529,7 @@ enum vmx_insn_errno
 {
 VMX_INSN_SUCCEED   = 0,
 VMX_INSN_VMCLEAR_INVALID_PHYADDR   = 2,
+VMX_INSN_VMCLEAR_WITH_VMXON_PTR= 3,
 VMX_INSN_VMLAUNCH_NONCLEAR_VMCS= 4,
 VMX_INSN_VMRESUME_NONLAUNCHED_VMCS = 5,
 VMX_INSN_INVALID_CONTROL_STATE = 7,
-- 
2.17.1



[Xen-devel] [PATCH v1 0/6] x86/vvmx: various fixes

2018-10-12 Thread Sergey Dyasli
These were found by running nested VMX tests from kvm-unit-tests.

Sergey Dyasli (6):
  x86/vvmx: introduce vvmcx_valid()
  x86/vvmx: correct vmfail() usage for vmptrld and vmclear
  x86/vvmx: add VMX_INSN_VMPTRLD_WITH_VMXON_PTR errno
  x86/vvmx: add VMX_INSN_VMCLEAR_WITH_VMXON_PTR errno
  x86/vvmx: correctly report vvmcs size
  x86/vvmx: fix I/O and MSR bitmaps mapping

 xen/arch/x86/hvm/vmx/vvmx.c | 164 +++-
 xen/include/asm-x86/hvm/nestedhvm.h |   5 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |   2 +
 3 files changed, 117 insertions(+), 54 deletions(-)

-- 
2.17.1



[Xen-devel] [PATCH v1 5/6] x86/vvmx: correctly report vvmcs size

2018-10-12 Thread Sergey Dyasli
The size of Xen's virtual vmcs region is 4096 bytes. Correctly report
it to the guest when VMCS shadowing is not available, instead of
providing the H/W value (which is usually smaller).

Signed-off-by: Sergey Dyasli 
---
 xen/arch/x86/hvm/vmx/vvmx.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 8b691bfc04..2c2ba36d94 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2064,6 +2064,14 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 
*msr_content)
 data = (host_data & (~0ul << 32)) |
(vmcs->vmcs_revision_id & 0x7fffffff);
 unmap_domain_page(vmcs);
+
+if ( !cpu_has_vmx_vmcs_shadowing )
+{
+/* Report vmcs_region_size as 4096 */
+data &= ~VMX_BASIC_VMCS_SIZE_MASK;
+data |= 1ULL << 44;
+}
+
 break;
 }
 case MSR_IA32_VMX_PINBASED_CTLS:
-- 
2.17.1



[Xen-devel] [PATCH v1 3/6] x86/vvmx: add VMX_INSN_VMPTRLD_WITH_VMXON_PTR errno

2018-10-12 Thread Sergey Dyasli
And make nvmx_handle_vmptrld() return the new errno in case the provided
address is the same as the vmxon region address.

While at it, correct the return value for the not-4KB-aligned case.

Signed-off-by: Sergey Dyasli 
---
 xen/arch/x86/hvm/vmx/vvmx.c| 10 --
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 26b7d72660..4caa5811a1 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1699,9 +1699,15 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
 if ( rc != X86EMUL_OKAY )
 return rc;
 
-if ( gpa == vcpu_2_nvmx(v).vmxon_region_pa || gpa & 0xfff )
+if ( gpa & 0xfff )
 {
-vmfail_invalid(regs);
+vmfail(regs, VMX_INSN_VMPTRLD_INVALID_PHYADDR);
+goto out;
+}
+
+if ( gpa == vcpu_2_nvmx(v).vmxon_region_pa )
+{
+vmfail(regs, VMX_INSN_VMPTRLD_WITH_VMXON_PTR);
 goto out;
 }
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 76dd04a72d..cae1861610 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -534,6 +534,7 @@ enum vmx_insn_errno
 VMX_INSN_INVALID_CONTROL_STATE = 7,
 VMX_INSN_INVALID_HOST_STATE= 8,
 VMX_INSN_VMPTRLD_INVALID_PHYADDR   = 9,
+VMX_INSN_VMPTRLD_WITH_VMXON_PTR= 10,
 VMX_INSN_VMPTRLD_INCORRECT_VMCS_ID = 11,
 VMX_INSN_UNSUPPORTED_VMCS_COMPONENT= 12,
 VMX_INSN_VMXON_IN_VMX_ROOT = 15,
-- 
2.17.1



[Xen-devel] [PATCH v1 1/6] x86/vvmx: introduce vvmcx_valid()

2018-10-12 Thread Sergey Dyasli
Introduce it as a convenient helper function and refactor the code to use it.

No functional change.

Signed-off-by: Sergey Dyasli 
---
 xen/arch/x86/hvm/vmx/vvmx.c | 17 -
 xen/include/asm-x86/hvm/nestedhvm.h |  5 +
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 0e45db83e5..8eee6d0ea8 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -521,8 +521,7 @@ static void vmfail(struct cpu_user_regs *regs, enum 
vmx_insn_errno errno)
 if ( errno == VMX_INSN_SUCCEED )
 return;
 
-if ( vcpu_nestedhvm(current).nv_vvmcxaddr != INVALID_PADDR &&
- errno != VMX_INSN_FAIL_INVALID )
+if ( vvmcx_valid(current) && errno != VMX_INSN_FAIL_INVALID )
 vmfail_valid(regs, errno);
 else
 vmfail_invalid(regs);
@@ -805,7 +804,7 @@ static void nvmx_purge_vvmcs(struct vcpu *v)
 int i;
 
 __clear_current_vvmcs(v);
-if ( nvcpu->nv_vvmcxaddr != INVALID_PADDR )
+if ( vvmcx_valid(v) )
 hvm_unmap_guest_frame(nvcpu->nv_vvmcx, 1);
 nvcpu->nv_vvmcx = NULL;
 nvcpu->nv_vvmcxaddr = INVALID_PADDR;
@@ -1601,7 +1600,7 @@ static int nvmx_vmresume(struct vcpu *v, struct 
cpu_user_regs *regs)
 struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
 
 /* check VMCS is valid and IO BITMAP is set */
-if ( (nvcpu->nv_vvmcxaddr != INVALID_PADDR) &&
+if ( vvmcx_valid(v) &&
 ((nvmx->iobitmap[0] && nvmx->iobitmap[1]) ||
 !(__n2_exec_control(v) & CPU_BASED_ACTIVATE_IO_BITMAP) ) )
 nvcpu->nv_vmentry_pending = 1;
@@ -1622,7 +1621,7 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
 if ( rc != X86EMUL_OKAY )
 return rc;
 
-if ( vcpu_nestedhvm(v).nv_vvmcxaddr == INVALID_PADDR )
+if ( !vvmcx_valid(v) )
 {
 vmfail_invalid(regs);
 return X86EMUL_OKAY;
@@ -1656,7 +1655,7 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
 if ( rc != X86EMUL_OKAY )
 return rc;
 
-if ( vcpu_nestedhvm(v).nv_vvmcxaddr == INVALID_PADDR )
+if ( !vvmcx_valid(v) )
 {
 vmfail_invalid(regs);
 return X86EMUL_OKAY;
@@ -1709,7 +1708,7 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
 if ( nvcpu->nv_vvmcxaddr != gpa )
 nvmx_purge_vvmcs(v);
 
-if ( nvcpu->nv_vvmcxaddr == INVALID_PADDR )
+if ( !vvmcx_valid(v) )
 {
 bool_t writable;
 void *vvmcx = hvm_map_guest_frame_rw(paddr_to_pfn(gpa), 1, &writable);
@@ -1848,7 +1847,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs)
 if ( rc != X86EMUL_OKAY )
 return rc;
 
-if ( vcpu_nestedhvm(v).nv_vvmcxaddr == INVALID_PADDR )
+if ( !vvmcx_valid(v) )
 {
 vmfail_invalid(regs);
 return X86EMUL_OKAY;
@@ -1891,7 +1890,7 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
  != X86EMUL_OKAY )
 return X86EMUL_EXCEPTION;
 
-if ( vcpu_nestedhvm(v).nv_vvmcxaddr == INVALID_PADDR )
+if ( !vvmcx_valid(v) )
 {
 vmfail_invalid(regs);
 return X86EMUL_OKAY;
diff --git a/xen/include/asm-x86/hvm/nestedhvm.h 
b/xen/include/asm-x86/hvm/nestedhvm.h
index 9d1c2742b5..e09fa9d47d 100644
--- a/xen/include/asm-x86/hvm/nestedhvm.h
+++ b/xen/include/asm-x86/hvm/nestedhvm.h
@@ -92,4 +92,9 @@ static inline void nestedhvm_set_cr(struct vcpu *v, unsigned 
int cr,
 v->arch.hvm.nvcpu.guest_cr[cr] = value;
 }
 
+static inline bool vvmcx_valid(const struct vcpu *v)
+{
+return vcpu_nestedhvm(v).nv_vvmcxaddr != INVALID_PADDR;
+}
+
 #endif /* _HVM_NESTEDHVM_H */
-- 
2.17.1



[Xen-devel] [PATCH v2] mm/page_alloc: make bootscrub happen in idle-loop

2018-10-09 Thread Sergey Dyasli
Scrubbing RAM during boot may take a long time on machines with lots
of RAM. Add an 'idle' option to bootscrub which marks all pages dirty
initially so that they will eventually be scrubbed in the idle loop on
every online CPU.

The allocator is still guaranteed to return scrubbed pages, by scrubbing
eagerly during allocation (unless MEMF_no_scrub was provided).

Use the new 'idle' option as the default one.
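
For reference, the accepted command-line forms after this patch (a
summary, not part of the patch text):

bootscrub=idle        # new default: scrub from the idle loop in background
bootscrub             # positive boolean form, same as bootscrub=true
bootscrub=true        # scrub all free RAM synchronously during boot
bootscrub=false       # do not scrub free pages during boot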

Signed-off-by: Sergey Dyasli 
---
v1 --> v2:
- dropped comment about performance
- changed default to 'idle'
- changed type of opt_bootscrub to enum
- restored __initdata for opt_bootscrub
- call parse_bool() first during parsing
- using switch() in heap_init_late()

Note: The message "Scrubbing Free RAM on %d nodes" corresponds to
the similar one in scrub_heap_pages()

CC: Andrew Cooper 
CC: Boris Ostrovsky 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Wei Liu 
---
 docs/misc/xen-command-line.markdown |  9 -
 xen/common/page_alloc.c | 62 +++--
 2 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 1ffd586224..9c15d52bf7 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -227,14 +227,19 @@ that byte `0x12345678` is bad, you would place 
`badpage=0x12345` on
 Xen's command line.
 
 ### bootscrub
-> `= `
+> `= idle | `
 
-> Default: `true`
+> Default: `idle`
 
 Scrub free RAM during boot.  This is a safety feature to prevent
 accidentally leaking sensitive VM data into other VMs if Xen crashes
 and reboots.
 
+In `idle` mode, RAM is scrubbed in background on all CPUs during idle-loop
+with a guarantee that memory allocations always provide scrubbed pages.
+This option reduces boot time on machines with a large amount of RAM while
+still providing security benefits.
+
 ### bootscrub\_chunk
 > `= `
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 16e1b0c357..7df5d5c545 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -161,8 +161,42 @@ string_param("badpage", opt_badpage);
 /*
  * no-bootscrub -> Free pages are not zeroed during boot.
  */
-static bool_t opt_bootscrub __initdata = 1;
-boolean_param("bootscrub", opt_bootscrub);
+enum bootscrub_mode {
+BOOTSCRUB_OFF = 0,
+BOOTSCRUB_ON,
+BOOTSCRUB_IDLE,
+};
+static enum bootscrub_mode __initdata opt_bootscrub = BOOTSCRUB_IDLE;
+static int __init parse_bootscrub_param(const char *s)
+{
+/* Interpret 'bootscrub' alone in its positive boolean form */
+if ( *s == '\0' )
+{
+opt_bootscrub = BOOTSCRUB_ON;
+return 0;
+}
+
+switch ( parse_bool(s, NULL) )
+{
+case 0:
+opt_bootscrub = BOOTSCRUB_OFF;
+break;
+
+case 1:
+opt_bootscrub = BOOTSCRUB_ON;
+break;
+
+default:
+if ( !strcmp(s, "idle") )
+opt_bootscrub = BOOTSCRUB_IDLE;
+else
+return -EINVAL;
+break;
+}
+
+return 0;
+}
+custom_param("bootscrub", parse_bootscrub_param);
 
 /*
  * bootscrub_chunk -> Amount of bytes to scrub lockstep on non-SMT CPUs
@@ -1726,6 +1760,7 @@ static void init_heap_pages(
 struct page_info *pg, unsigned long nr_pages)
 {
 unsigned long i;
+bool idle_scrub = false;
 
 /*
  * Some pages may not go through the boot allocator (e.g reserved
@@ -1737,6 +1772,9 @@ static void init_heap_pages(
 first_valid_mfn = mfn_min(page_to_mfn(pg), first_valid_mfn);
 spin_unlock(&heap_lock);
 
+if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
+idle_scrub = true;
+
 for ( i = 0; i < nr_pages; i++ )
 {
 unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
@@ -1763,7 +1801,7 @@ static void init_heap_pages(
 nr_pages -= n;
 }
 
-free_heap_pages(pg + i, 0, scrub_debug);
+free_heap_pages(pg + i, 0, scrub_debug || idle_scrub);
 }
 }
 
@@ -2039,8 +2077,24 @@ void __init heap_init_late(void)
  */
 setup_low_mem_virq();
 
-if ( opt_bootscrub )
+switch ( opt_bootscrub )
+{
+default:
+ASSERT_UNREACHABLE();
+/* Fall through */
+
+case BOOTSCRUB_IDLE:
+printk("Scrubbing Free RAM on %d nodes in background\n",
+   num_online_nodes());
+break;
+
+case BOOTSCRUB_ON:
 scrub_heap_pages();
+break;
+
+case BOOTSCRUB_OFF:
+break;
+}
 }
 
 
-- 
2.17.1



[Xen-devel] [PATCH] mm/page_alloc: add bootscrub=idle cmdline option

2018-10-03 Thread Sergey Dyasli
Scrubbing RAM during boot may take a long time on machines with lots
of RAM. Add an 'idle' option which marks all pages dirty initially so
that they will eventually be scrubbed in the idle loop on every online
CPU.

Performance of idle-loop scrubbing is worse than bootscrub, but the
allocator is still guaranteed to return scrubbed pages by scrubbing
eagerly during allocation (unless MEMF_no_scrub was provided).

Signed-off-by: Sergey Dyasli 
---
CC: Andrew Cooper 
CC: Boris Ostrovsky 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Tim Deegan 
---
 docs/misc/xen-command-line.markdown |  7 +-
 xen/common/page_alloc.c | 36 +
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 1ffd586224..4c60905837 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -227,7 +227,7 @@ that byte `0x12345678` is bad, you would place 
`badpage=0x12345` on
 Xen's command line.
 
 ### bootscrub
-> `= `
+> `=  | idle`
 
 > Default: `true`
 
@@ -235,6 +235,11 @@ Scrub free RAM during boot.  This is a safety feature to 
prevent
 accidentally leaking sensitive VM data into other VMs if Xen crashes
 and reboots.
 
+In `idle` mode, RAM is scrubbed in background on all CPUs during idle-loop
+with a guarantee that memory allocations always provide scrubbed pages.
+This option reduces boot time on machines with a large amount of RAM while
+still providing security benefits.
+
 ### bootscrub\_chunk
 > `= `
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 16e1b0c357..c85f44874a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -161,8 +161,32 @@ string_param("badpage", opt_badpage);
 /*
  * no-bootscrub -> Free pages are not zeroed during boot.
  */
-static bool_t opt_bootscrub __initdata = 1;
-boolean_param("bootscrub", opt_bootscrub);
+enum {
+BOOTSCRUB_OFF = 0,
+BOOTSCRUB_ON,
+BOOTSCRUB_IDLE,
+};
+static int __read_mostly opt_bootscrub = BOOTSCRUB_ON;
+static int __init parse_bootscrub_param(const char *s)
+{
+if ( *s == '\0' )
+return 0;
+
+if ( !strcmp(s, "idle") )
+opt_bootscrub = BOOTSCRUB_IDLE;
+else
+opt_bootscrub = parse_bool(s, NULL);
+
+if ( opt_bootscrub < 0 )
+{
+opt_bootscrub = BOOTSCRUB_ON;
+return -EINVAL;
+}
+
+return 0;
+}
+
+custom_param("bootscrub", parse_bootscrub_param);
 
 /*
  * bootscrub_chunk -> Amount of bytes to scrub lockstep on non-SMT CPUs
@@ -1763,7 +1787,8 @@ static void init_heap_pages(
 nr_pages -= n;
 }
 
-free_heap_pages(pg + i, 0, scrub_debug);
+free_heap_pages(pg + i, 0, scrub_debug ||
+   opt_bootscrub == BOOTSCRUB_IDLE);
 }
 }
 
@@ -2039,8 +2064,11 @@ void __init heap_init_late(void)
  */
 setup_low_mem_virq();
 
-if ( opt_bootscrub )
+if ( opt_bootscrub == BOOTSCRUB_ON )
 scrub_heap_pages();
+else if ( opt_bootscrub == BOOTSCRUB_IDLE )
+printk("Scrubbing Free RAM on %d nodes in background\n",
+   num_online_nodes());
 }
 
 
-- 
2.17.1



Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On 01/10/18 12:13, Jan Beulich wrote:
 On 01.10.18 at 11:58,  wrote:
>> Having the allocator return unscrubbed pages is a potential security
>> concern: some domain can be given pages with memory contents of another
>> domain. This may happen, for example, if a domain voluntarily releases
>> its own memory (ballooning being the easiest way for doing this).
> 
> And we've always said that in this case it's the domain's responsibility
> to scrub the memory of secrets it cares about. Therefore I'm at the
> very least missing some background on this change of expectations.
> 
>> Change the allocator to always scrub the pages given to it by:
>>
>> 1. free_xenheap_pages()
>> 2. free_domheap_pages()
>> 3. online_page()
>> 4. init_heap_pages()
>>
>> Performance testing has shown that on multi-node machines bootscrub
>> vastly outperforms idle-loop scrubbing. So instead of marking all pages
>> dirty initially, introduce bootscrub_done to track the completion of
>> the process and eagerly scrub all allocated pages during boot.
> 
> I'm afraid I'm somewhat lost: There still is active boot time scrubbing,
> or at least I can't see how that might be skipped (other than due to
> "bootscrub=0"). I was actually expecting this to change at some
> point. Am I perhaps simply mis-reading this part of the description?
> 
>> If bootscrub is disabled, then all pages will be marked as dirty right
>> away and scrubbed either in idle-loop or eagerly during allocation.
>>
>> After this patch, alloc_heap_pages() is guaranteed to return scrubbed
>> pages to a caller unless MEMF_no_scrub flag was provided.
> 
> I also don't understand the point of this: Xen's internal allocations
> have no need to come from scrubbed memory. This in particular
> also puts under question the need to "eagerly scrub all allocated
> pages during boot" (quoted from an earlier paragraph).

There are ways to share one of Xen's pages with a guest (the shared-info
page, for example). So from a security point of view, there is no
guarantee that a page allocated with alloc_xenheap_pages() will not end
up accessible to some guest.

--
Thanks,
Sergey


Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On Mon, 2018-10-01 at 14:54 +0100, George Dunlap wrote:
> On 10/01/2018 02:44 PM, Sergey Dyasli wrote:
> > On Mon, 2018-10-01 at 07:38 -0600, Jan Beulich wrote:
> > > > > > On 01.10.18 at 15:12,  wrote:
> > > > 
> > > > On 01/10/18 12:13, Jan Beulich wrote:
> > > > > > > > On 01.10.18 at 11:58,  wrote:
> > > > > > 
> > > > > > Having the allocator return unscrubbed pages is a potential security
> > > > > > concern: some domain can be given pages with memory contents of 
> > > > > > another
> > > > > > domain. This may happen, for example, if a domain voluntarily 
> > > > > > releases
> > > > > > its own memory (ballooning being the easiest way for doing this).
> > > > > 
> > > > > And we've always said that in this case it's the domain's 
> > > > > responsibility
> > > > > to scrub the memory of secrets it cares about. Therefore I'm at the
> > > > > very least missing some background on this change of expectations.
> > > > 
> > > > You were on the call when this was discussed, along with the synchronous
> > > > scrubbing in destroydomain.
> > > 
> > > Quite possible, but it has been a while.
> > > 
> > > > Put simply, the current behaviour is not good enough for a number of
> > > > security sensitive usecases.
> > > 
> > > Well, I'm looking forward for Sergey to expand on this in the commit
> > > message.
> > > 
> > > > The main reason however for doing this is the optimisations it enables,
> > > > and in particular, not double scrubbing most of our pages.
> > > 
> > > Well, wait - scrubbing != zeroing (taking into account also what you
> > > say further down).
> > > 
> > > > > > Change the allocator to always scrub the pages given to it by:
> > > > > > 
> > > > > > 1. free_xenheap_pages()
> > > > > > 2. free_domheap_pages()
> > > > > > 3. online_page()
> > > > > > 4. init_heap_pages()
> > > > > > 
> > > > > > Performance testing has shown that on multi-node machines bootscrub
> > > > > > vastly outperforms idle-loop scrubbing. So instead of marking all 
> > > > > > pages
> > > > > > dirty initially, introduce bootscrub_done to track the completion of
> > > > > > the process and eagerly scrub all allocated pages during boot.
> > > > > 
> > > > > I'm afraid I'm somewhat lost: There still is active boot time 
> > > > > scrubbing,
> > > > > or at least I can't see how that might be skipped (other than due to
> > > > > "bootscrub=0"). I was actually expecting this to change at some
> > > > > point. Am I perhaps simply mis-reading this part of the description?
> > > > 
> > > > No.  Sergey tried that, and found a massive perf difference between
> > > > scrubbing in the idle loop and scrubbing at boot.  (1.2s vs 40s iirc)
> > > 
> > > That's not something you can reasonably compare, imo: For one,
> > > it is certainly expected for the background scrubbing to be slower,
> > > simply because of other activity on the system. And then 1.2s
> > > looks awfully small for a multi-Tb system. Yet it is mainly large
> > > systems where the synchronous boot time scrubbing is a problem.
> > 
> > Let me throw in some numbers.
> > 
> > Performance of current idle loop scrubbing is just not good enough:
> > on 8 nodes, 32 CPUs and 512GB RAM machine it takes ~40 seconds to scrub
> > all the memory instead of ~8 seconds for current bootscrub implementation.
> > 
> > This was measured while synchronously waiting for CPUs to scrub all the
> > memory in idle-loop. But scrubbing can happen in background, of course.
> 
> Right, the whole point of idle loop scrubbing is that you *don't*
> syncronously wait for *all* the memory to finish scrubbing before you
> can use part of it.  So why is this an issue for you guys -- what
> concrete problem did it cause, that the full amount of memory took 40s
> to finish scrubbing rather than only 8s?

There is no issue at the moment. Using the idle loop to scrub all the
memory is just not viable: it doesn't scale. How long does it currently
take to bootscrub a multi-TB system? If it takes a few minutes, then I
fear that it might take an hour to idle-scrub it.
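
(For reference, restating the numbers from earlier in the thread as rates:
512 GB in ~40 s is roughly 13 GB/s of aggregate idle-loop scrubbing
throughput, versus 512 GB in ~8 s, roughly 64 GB/s, for the current
bootscrub on that machine, i.e. about a 5x difference.)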

-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On Mon, 2018-10-01 at 07:38 -0600, Jan Beulich wrote:
> > > > On 01.10.18 at 15:12,  wrote:
> > 
> > On 01/10/18 12:13, Jan Beulich wrote:
> > > > > > On 01.10.18 at 11:58,  wrote:
> > > > 
> > > > Having the allocator return unscrubbed pages is a potential security
> > > > concern: some domain can be given pages with memory contents of another
> > > > domain. This may happen, for example, if a domain voluntarily releases
> > > > its own memory (ballooning being the easiest way for doing this).
> > > 
> > > And we've always said that in this case it's the domain's responsibility
> > > to scrub the memory of secrets it cares about. Therefore I'm at the
> > > very least missing some background on this change of expectations.
> > 
> > You were on the call when this was discussed, along with the synchronous
> > scrubbing in destroydomain.
> 
> Quite possible, but it has been a while.
> 
> > Put simply, the current behaviour is not good enough for a number of
> > security sensitive usecases.
> 
> Well, I'm looking forward for Sergey to expand on this in the commit
> message.

Jan,

I think this is the main argument here: what to do about those
security-sensitive use cases? Scrubbing everything unconditionally might
be too radical an approach. Would introducing a new cmdline param be
appropriate?

> 
> > The main reason however for doing this is the optimisations it enables,
> > and in particular, not double scrubbing most of our pages.
> 
> Well, wait - scrubbing != zeroing (taking into account also what you
> say further down).

Andrew,

I'm not yet convinced myself about the value of having the allocator
always return zeroed pages. Most of the pages are given to guests anyway,
and re-zeroing a few pages in the hypervisor doesn't sound too bad. But
maybe I'm just not aware of cases where Xen performs large allocations
and zeroes them afterwards?

-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On Mon, 2018-10-01 at 07:38 -0600, Jan Beulich wrote:
> > > > On 01.10.18 at 15:12,  wrote:
> > 
> > On 01/10/18 12:13, Jan Beulich wrote:
> > > > > > On 01.10.18 at 11:58,  wrote:
> > > > 
> > > > Having the allocator return unscrubbed pages is a potential security
> > > > concern: some domain can be given pages with memory contents of another
> > > > domain. This may happen, for example, if a domain voluntarily releases
> > > > its own memory (ballooning being the easiest way for doing this).
> > > 
> > > And we've always said that in this case it's the domain's responsibility
> > > to scrub the memory of secrets it cares about. Therefore I'm at the
> > > very least missing some background on this change of expectations.
> > 
> > You were on the call when this was discussed, along with the synchronous
> > scrubbing in destroydomain.
> 
> Quite possible, but it has been a while.
> 
> > Put simply, the current behaviour is not good enough for a number of
> > security sensitive usecases.
> 
> Well, I'm looking forward for Sergey to expand on this in the commit
> message.
> 
> > The main reason however for doing this is the optimisations it enables,
> > and in particular, not double scrubbing most of our pages.
> 
> Well, wait - scrubbing != zeroing (taking into account also what you
> say further down).
> 
> > > > Change the allocator to always scrub the pages given to it by:
> > > > 
> > > > 1. free_xenheap_pages()
> > > > 2. free_domheap_pages()
> > > > 3. online_page()
> > > > 4. init_heap_pages()
> > > > 
> > > > Performance testing has shown that on multi-node machines bootscrub
> > > > vastly outperforms idle-loop scrubbing. So instead of marking all pages
> > > > dirty initially, introduce bootscrub_done to track the completion of
> > > > the process and eagerly scrub all allocated pages during boot.
> > > 
> > > I'm afraid I'm somewhat lost: There still is active boot time scrubbing,
> > > or at least I can't see how that might be skipped (other than due to
> > > "bootscrub=0"). I was actually expecting this to change at some
> > > point. Am I perhaps simply mis-reading this part of the description?
> > 
> > No.  Sergey tried that, and found a massive perf difference between
> > scrubbing in the idle loop and scrubbing at boot.  (1.2s vs 40s iirc)
> 
> That's not something you can reasonably compare, imo: For one,
> it is certainly expected for the background scrubbing to be slower,
> simply because of other activity on the system. And then 1.2s
> looks awfully small for a multi-Tb system. Yet it is mainly large
> systems where the synchronous boot time scrubbing is a problem.

Let me throw in some numbers.

Performance of the current idle-loop scrubbing is just not good enough:
on an 8-node, 32-CPU, 512GB RAM machine it takes ~40 seconds to scrub
all the memory, instead of ~8 seconds for the current bootscrub
implementation.

This was measured while synchronously waiting for CPUs to scrub all the
memory in the idle loop. But scrubbing can happen in the background, of
course.

-- 
Thanks,
Sergey

[Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
Having the allocator return unscrubbed pages is a potential security
concern: a domain can be given pages with memory contents of another
domain. This may happen, for example, if a domain voluntarily releases
its own memory (ballooning being the easiest way of doing this).

Change the allocator to always scrub the pages given to it by:

1. free_xenheap_pages()
2. free_domheap_pages()
3. online_page()
4. init_heap_pages()

Performance testing has shown that on multi-node machines bootscrub
vastly outperforms idle-loop scrubbing. So instead of marking all pages
dirty initially, introduce bootscrub_done to track the completion of
the process and eagerly scrub all allocated pages during boot.

If bootscrub is disabled, then all pages will be marked as dirty right
away and scrubbed either in idle-loop or eagerly during allocation.

After this patch, alloc_heap_pages() is guaranteed to return scrubbed
pages to the caller unless the MEMF_no_scrub flag was provided.

Signed-off-by: Sergey Dyasli 
---
CC: Andrew Cooper 
CC: Boris Ostrovsky 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Tim Deegan 
---
 docs/misc/xen-command-line.markdown |  3 ++-
 xen/common/page_alloc.c | 29 ++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 1ffd586224..d9bebf4e4b 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -233,7 +233,8 @@ Xen's command line.
 
 Scrub free RAM during boot.  This is a safety feature to prevent
 accidentally leaking sensitive VM data into other VMs if Xen crashes
-and reboots.
+and reboots. Note: even if disabled, RAM will still be scrubbed in
+background.
 
 ### bootscrub\_chunk
 > `= `
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 16e1b0c357..cb1c265f9c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -161,8 +161,9 @@ string_param("badpage", opt_badpage);
 /*
  * no-bootscrub -> Free pages are not zeroed during boot.
  */
-static bool_t opt_bootscrub __initdata = 1;
+static bool __read_mostly opt_bootscrub = true;
 boolean_param("bootscrub", opt_bootscrub);
+static bool __read_mostly bootscrub_done;
 
 /*
  * bootscrub_chunk -> Amount of bytes to scrub lockstep on non-SMT CPUs
@@ -1026,6 +1027,12 @@ static struct page_info *alloc_heap_pages(
 }
 }
 
+if ( unlikely(opt_bootscrub && !bootscrub_done) )
+{
+for ( i = 0; i < (1U << order); i++ )
+scrub_one_page(&pg[i]);
+}
+
 if ( need_tlbflush )
 filtered_flush_tlb_mask(tlbflush_timestamp);
 
@@ -1684,7 +1691,7 @@ unsigned int online_page(unsigned long mfn, uint32_t 
*status)
 spin_unlock(&heap_lock);
 
 if ( (y & PGC_state) == PGC_state_offlined )
-free_heap_pages(pg, 0, false);
+free_heap_pages(pg, 0, true);
 
 return ret;
 }
@@ -1763,7 +1770,8 @@ static void init_heap_pages(
 nr_pages -= n;
 }
 
-free_heap_pages(pg + i, 0, scrub_debug);
+free_heap_pages(pg + i, 0, scrub_debug ||
+   (opt_bootscrub ? bootscrub_done : true));
 }
 }
 
@@ -2024,6 +2032,7 @@ static void __init scrub_heap_pages(void)
 }
 }
 
+bootscrub_done = true;
 printk("done.\n");
 
 #ifdef CONFIG_SCRUB_DEBUG
@@ -2098,7 +2107,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
 memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-free_heap_pages(virt_to_page(v), order, false);
+free_heap_pages(virt_to_page(v), order, true);
 }
 
 #else
@@ -2293,8 +2302,6 @@ void free_domheap_pages(struct page_info *pg, unsigned 
int order)
 }
 else
 {
-bool_t scrub;
-
 if ( likely(d) && likely(d != dom_cow) )
 {
 /* NB. May recursively lock from relinquish_memory(). */
@@ -2309,13 +2316,6 @@ void free_domheap_pages(struct page_info *pg, unsigned 
int order)
 drop_dom_ref = !domain_adjust_tot_pages(d, -(1 << order));
 
 spin_unlock_recursive(&d->page_alloc_lock);
-
-/*
- * Normally we expect a domain to clear pages before freeing them,
- * if it cares about the secrecy of their contents. However, after
- * a domain has died we assume responsibility for erasure.
- */
-scrub = d->is_dying || scrub_debug;
 }
 else
 {
@@ -2328,10 +2328,9 @@ void free_domheap_pages(struct page_info *pg, unsigned 
int order)
  */
 ASSERT(!d || !order);
 drop_dom_ref = 0;
-scrub = 1;
 }
 
-free_heap_pages(pg, order, scrub);
+free_heap_pages(pg, order, true);
 }
 
 if ( drop_dom_ref )
-- 
2.17.1



Re: [Xen-devel] [PATCH v3 3/3] x86: Clean up the Xen MSR infrastructure

2018-09-13 Thread Sergey Dyasli
On Wed, 2018-09-12 at 11:23 +0100, Andrew Cooper wrote:
> On 12/09/18 10:46, Sergey Dyasli wrote:
> > On Wed, 2018-09-12 at 10:12 +0100, Andrew Cooper wrote:
> > > On 12/09/18 09:29, Sergey Dyasli wrote:
> > > > On Tue, 2018-09-11 at 19:56 +0100, Andrew Cooper wrote:
> > > > > Rename them to guest_{rd,wr}msr_xen() for consistency, and because 
> > > > > the _regs
> > > > > suffix isn't very appropriate.
> > > > > 
> > > > > Update them to take a vcpu pointer rather than presuming that they 
> > > > > act on
> > > > > current, and switch to using X86EMUL_* return values.
> > > > > 
> > > > > Signed-off-by: Andrew Cooper 
> > > > > ---
> > > > > CC: Jan Beulich 
> > > > > CC: Wei Liu 
> > > > > CC: Roger Pau Monné 
> > > > > CC: Sergey Dyasli 
> > > > > 
> > > > > v3:
> > > > >  * Clean up after splitting the series.
> > > > > ---
> > > > >  xen/arch/x86/msr.c  |  6 ++
> > > > >  xen/arch/x86/traps.c| 29 +
> > > > >  xen/include/asm-x86/processor.h |  4 ++--
> > > > >  3 files changed, 17 insertions(+), 22 deletions(-)
> > > > > 
> > > > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> > > > > index cf0dc27..8f02a89 100644
> > > > > --- a/xen/arch/x86/msr.c
> > > > > +++ b/xen/arch/x86/msr.c
> > > > > @@ -156,8 +156,7 @@ int guest_rdmsr(const struct vcpu *v, uint32_t 
> > > > > msr, uint64_t *val)
> > > > >  
> > > > >  /* Fallthrough. */
> > > > >  case 0x40000200 ... 0x400002ff:
> > > > > -ret = (rdmsr_hypervisor_regs(msr, val)
> > > > > -   ? X86EMUL_OKAY : X86EMUL_EXCEPTION);
> > > > > +ret = guest_rdmsr_xen(v, msr, val);
> > > > >  break;
> > > > >  
> > > > >  default:
> > > > > @@ -277,8 +276,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, 
> > > > > uint64_t val)
> > > > >  
> > > > >  /* Fallthrough. */
> > > > >  case 0x40000200 ... 0x400002ff:
> > > > > -ret = (wrmsr_hypervisor_regs(msr, val) == 1
> > > > > -   ? X86EMUL_OKAY : X86EMUL_EXCEPTION);
> > > > > +ret = guest_wrmsr_xen(v, msr, val);
> > > > >  break;
> > > > >  
> > > > >  default:
> > > > > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> > > > > index 7c17806..3988753 100644
> > > > > --- a/xen/arch/x86/traps.c
> > > > > +++ b/xen/arch/x86/traps.c
> > > > > @@ -768,29 +768,25 @@ static void do_trap(struct cpu_user_regs *regs)
> > > > >trapnr, trapstr(trapnr), regs->error_code);
> > > > >  }
> > > > >  
> > > > > -/* Returns 0 if not handled, and non-0 for success. */
> > > > > -int rdmsr_hypervisor_regs(uint32_t idx, uint64_t *val)
> > > > > +int guest_rdmsr_xen(const struct vcpu *v, uint32_t idx, uint64_t 
> > > > > *val)
> > > > >  {
> > > > > -struct domain *d = current->domain;
> > > > > +const struct domain *d = v->domain;
> > > > >  /* Optionally shift out of the way of Viridian architectural 
> > > > > MSRs. */
> > > > >  uint32_t base = is_viridian_domain(d) ? 0x40000200 : 0x40000000;
> > > > >  
> > > > >  switch ( idx - base )
> > > > >  {
> > > > >  case 0: /* Write hypercall page MSR.  Read as zero. */
> > > > > -{
> > > > >  *val = 0;
> > > > > -return 1;
> > > > > -}
> > > > > +return X86EMUL_OKAY;
> > > > >  }
> > > > >  
> > > > > -return 0;
> > > > > +return X86EMUL_EXCEPTION;
> > > > >  }
> > > > >  
> > > > > -/* Returns 1 if handled, 0 if not and -Exx for error. */
> > > > > -int wrmsr_hypervisor_regs(uint32_t idx, uint64_t val)
> > > > > +int guest_wrmsr_xen(struct vcpu *v, uint32_t idx, uint64_t val)
> > > > >  {
> > > > > 

Re: [Xen-devel] [PATCH v4 1/3] x86/msr: Dispatch Xen and Viridian MSRs from guest_{wr, rd}msr()

2018-09-13 Thread Sergey Dyasli
On Wed, 2018-09-12 at 13:00 +0100, Andrew Cooper wrote:
> Despite the complicated diff in {svm,vmx}_msr_write_intercept(), it is just
> the 0 case losing one level of indentation, as part of removing the call to
> wrmsr_hypervisor_regs().
> 
> The case blocks in guest_{wr,rd}msr() use raw numbers, partly for consistency
> with the CPUID side of things, but mainly because this is clearer code to
> follow.  In particular, the Xen block may overlap with the Viridian block if
> Viridian is not enabled for the domain, and trying to express this with named
> literals caused more confusion than it solved.
> 
> Future changes will clean up the individual APIs, including allowing these
> MSRs to be usable for vcpus other than current (no callers exist with v !=
> current).
> 
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Boris Ostrovsky 
> Reviewed-by: Kevin Tian 
> Reviewed-by: Paul Durrant 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Sergey Dyasli 
> 
> v3:
>  * Split out of previous series.  Retain appropriate R-by's
> v4:
>  * Retain switch() for interpreting the result of wrmsr_hypervisor_regs()
> ---
>  xen/arch/x86/hvm/svm/svm.c | 27 +++---
>  xen/arch/x86/hvm/vmx/vmx.c | 28 ---
>  xen/arch/x86/msr.c | 43 
> ++
>  xen/arch/x86/pv/emul-priv-op.c |  6 --
>  4 files changed, 46 insertions(+), 58 deletions(-)

Reviewed-by: Sergey Dyasli 

-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH v3 3/3] x86: Clean up the Xen MSR infrastructure

2018-09-12 Thread Sergey Dyasli
On Wed, 2018-09-12 at 10:12 +0100, Andrew Cooper wrote:
> On 12/09/18 09:29, Sergey Dyasli wrote:
> > On Tue, 2018-09-11 at 19:56 +0100, Andrew Cooper wrote:
> > > Rename them to guest_{rd,wr}msr_xen() for consistency, and because the 
> > > _regs
> > > suffix isn't very appropriate.
> > > 
> > > Update them to take a vcpu pointer rather than presuming that they act on
> > > current, and switch to using X86EMUL_* return values.
> > > 
> > > Signed-off-by: Andrew Cooper 
> > > ---
> > > CC: Jan Beulich 
> > > CC: Wei Liu 
> > > CC: Roger Pau Monné 
> > > CC: Sergey Dyasli 
> > > 
> > > v3:
> > >  * Clean up after splitting the series.
> > > ---
> > >  xen/arch/x86/msr.c  |  6 ++
> > >  xen/arch/x86/traps.c| 29 +
> > >  xen/include/asm-x86/processor.h |  4 ++--
> > >  3 files changed, 17 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> > > index cf0dc27..8f02a89 100644
> > > --- a/xen/arch/x86/msr.c
> > > +++ b/xen/arch/x86/msr.c
> > > @@ -156,8 +156,7 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
> > > uint64_t *val)
> > >  
> > >  /* Fallthrough. */
> > >  case 0x40000200 ... 0x400002ff:
> > > -ret = (rdmsr_hypervisor_regs(msr, val)
> > > -   ? X86EMUL_OKAY : X86EMUL_EXCEPTION);
> > > +ret = guest_rdmsr_xen(v, msr, val);
> > >  break;
> > >  
> > >  default:
> > > @@ -277,8 +276,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, 
> > > uint64_t val)
> > >  
> > >  /* Fallthrough. */
> > >  case 0x40000200 ... 0x400002ff:
> > > -ret = (wrmsr_hypervisor_regs(msr, val) == 1
> > > -   ? X86EMUL_OKAY : X86EMUL_EXCEPTION);
> > > +ret = guest_wrmsr_xen(v, msr, val);
> > >  break;
> > >  
> > >  default:
> > > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> > > index 7c17806..3988753 100644
> > > --- a/xen/arch/x86/traps.c
> > > +++ b/xen/arch/x86/traps.c
> > > @@ -768,29 +768,25 @@ static void do_trap(struct cpu_user_regs *regs)
> > >trapnr, trapstr(trapnr), regs->error_code);
> > >  }
> > >  
> > > -/* Returns 0 if not handled, and non-0 for success. */
> > > -int rdmsr_hypervisor_regs(uint32_t idx, uint64_t *val)
> > > +int guest_rdmsr_xen(const struct vcpu *v, uint32_t idx, uint64_t *val)
> > >  {
> > > -struct domain *d = current->domain;
> > > +const struct domain *d = v->domain;
> > >  /* Optionally shift out of the way of Viridian architectural MSRs. */
> > >  uint32_t base = is_viridian_domain(d) ? 0x40000200 : 0x40000000;
> > >  
> > >  switch ( idx - base )
> > >  {
> > >  case 0: /* Write hypercall page MSR.  Read as zero. */
> > > -{
> > >  *val = 0;
> > > -return 1;
> > > -}
> > > +return X86EMUL_OKAY;
> > >  }
> > >  
> > > -return 0;
> > > +return X86EMUL_EXCEPTION;
> > >  }
> > >  
> > > -/* Returns 1 if handled, 0 if not and -Exx for error. */
> > > -int wrmsr_hypervisor_regs(uint32_t idx, uint64_t val)
> > > +int guest_wrmsr_xen(struct vcpu *v, uint32_t idx, uint64_t val)
> > >  {
> > > -struct domain *d = current->domain;
> > > +struct domain *d = v->domain;
> > >  /* Optionally shift out of the way of Viridian architectural MSRs. */
> > >  uint32_t base = is_viridian_domain(d) ? 0x40000200 : 0x40000000;
> > >  
> > > @@ -809,7 +805,7 @@ int wrmsr_hypervisor_regs(uint32_t idx, uint64_t val)
> > >  gdprintk(XENLOG_WARNING,
> > >   "wrmsr hypercall page index %#x unsupported\n",
> > >   page_index);
> > > -return 0;
> > > +return X86EMUL_EXCEPTION;
> > >  }
> > >  
> > >  page = get_page_from_gfn(d, gmfn, &t, P2M_ALLOC);
> > > @@ -822,13 +818,13 @@ int wrmsr_hypervisor_regs(uint32_t idx, uint64_t 
> > > val)
> > >  if ( p2m_is_paging(t) )
> > >  {
> > >  p2m_mem_paging_populate(d, gmfn);
> > 

Re: [Xen-devel] [PATCH v3 3/3] x86: Clean up the Xen MSR infrastructure

2018-09-12 Thread Sergey Dyasli
On Tue, 2018-09-11 at 19:56 +0100, Andrew Cooper wrote:
> Rename them to guest_{rd,wr}msr_xen() for consistency, and because the _regs
> suffix isn't very appropriate.
> 
> Update them to take a vcpu pointer rather than presuming that they act on
> current, and switch to using X86EMUL_* return values.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Sergey Dyasli 
> 
> v3:
>  * Clean up after splitting the series.
> ---
>  xen/arch/x86/msr.c  |  6 ++
>  xen/arch/x86/traps.c| 29 +
>  xen/include/asm-x86/processor.h |  4 ++--
>  3 files changed, 17 insertions(+), 22 deletions(-)
> 
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index cf0dc27..8f02a89 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -156,8 +156,7 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
> uint64_t *val)
>  
>  /* Fallthrough. */
>  case 0x40000200 ... 0x400002ff:
> -ret = (rdmsr_hypervisor_regs(msr, val)
> -   ? X86EMUL_OKAY : X86EMUL_EXCEPTION);
> +ret = guest_rdmsr_xen(v, msr, val);
>  break;
>  
>  default:
> @@ -277,8 +276,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t 
> val)
>  
>  /* Fallthrough. */
>  case 0x40000200 ... 0x400002ff:
> -ret = (wrmsr_hypervisor_regs(msr, val) == 1
> -   ? X86EMUL_OKAY : X86EMUL_EXCEPTION);
> +ret = guest_wrmsr_xen(v, msr, val);
>  break;
>  
>  default:
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 7c17806..3988753 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -768,29 +768,25 @@ static void do_trap(struct cpu_user_regs *regs)
>trapnr, trapstr(trapnr), regs->error_code);
>  }
>  
> -/* Returns 0 if not handled, and non-0 for success. */
> -int rdmsr_hypervisor_regs(uint32_t idx, uint64_t *val)
> +int guest_rdmsr_xen(const struct vcpu *v, uint32_t idx, uint64_t *val)
>  {
> -struct domain *d = current->domain;
> +const struct domain *d = v->domain;
>  /* Optionally shift out of the way of Viridian architectural MSRs. */
>  uint32_t base = is_viridian_domain(d) ? 0x40000200 : 0x40000000;
>  
>  switch ( idx - base )
>  {
>  case 0: /* Write hypercall page MSR.  Read as zero. */
> -{
>  *val = 0;
> -return 1;
> -}
> +return X86EMUL_OKAY;
>  }
>  
> -return 0;
> +return X86EMUL_EXCEPTION;
>  }
>  
> -/* Returns 1 if handled, 0 if not and -Exx for error. */
> -int wrmsr_hypervisor_regs(uint32_t idx, uint64_t val)
> +int guest_wrmsr_xen(struct vcpu *v, uint32_t idx, uint64_t val)
>  {
> -struct domain *d = current->domain;
> +struct domain *d = v->domain;
>  /* Optionally shift out of the way of Viridian architectural MSRs. */
>  uint32_t base = is_viridian_domain(d) ? 0x40000200 : 0x40000000;
>  
> @@ -809,7 +805,7 @@ int wrmsr_hypervisor_regs(uint32_t idx, uint64_t val)
>  gdprintk(XENLOG_WARNING,
>   "wrmsr hypercall page index %#x unsupported\n",
>   page_index);
> -return 0;
> +return X86EMUL_EXCEPTION;
>  }
>  
>  page = get_page_from_gfn(d, gmfn, &t, P2M_ALLOC);
> @@ -822,13 +818,13 @@ int wrmsr_hypervisor_regs(uint32_t idx, uint64_t val)
>  if ( p2m_is_paging(t) )
>  {
>  p2m_mem_paging_populate(d, gmfn);
> -return -ERESTART;
> +return X86EMUL_RETRY;

Previously -ERESTART would've been converted to X86EMUL_EXCEPTION. But
with this patch, X86EMUL_RETRY will actually be returned. I don't think
that callers can handle this situation.

E.g. the code from vmx_vmexit_handler():

case EXIT_REASON_MSR_WRITE:
switch ( hvm_msr_write_intercept(regs->ecx, msr_fold(regs), 1) )
{
case X86EMUL_OKAY:
update_guest_eip(); /* Safe: WRMSR */
break;

case X86EMUL_EXCEPTION:
hvm_inject_hw_exception(TRAP_gp_fault, 0);
break;
}
break;
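
If X86EMUL_RETRY genuinely needs to reach this point, the call site would have
to learn about it, e.g. (just a sketch, not part of this series) by leaving RIP
untouched so the guest re-executes the WRMSR once the paging work has finished:

case EXIT_REASON_MSR_WRITE:
    switch ( hvm_msr_write_intercept(regs->ecx, msr_fold(regs), 1) )
    {
    case X86EMUL_OKAY:
        update_guest_eip(); /* Safe: WRMSR */
        break;

    case X86EMUL_RETRY:
        /* Sketch: don't advance RIP, so the guest retries the WRMSR
         * after p2m_mem_paging_populate() has populated the frame. */
        break;

    case X86EMUL_EXCEPTION:
        hvm_inject_hw_exception(TRAP_gp_fault, 0);
        break;
    }
    break;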

>  }
>  
>  gdprintk(XENLOG_WARNING,
>   "Bad GMFN %lx (MFN %#"PRI_mfn") to MSR %08x\n",
>   gmfn, mfn_x(page ? page_to_mfn(page) : INVALID_MFN), 
> base);
> -return 0;
> +return X86EMUL_EXCEPTION;
>  }
>  
>  hypercall_page = __map_domain_page(page);
> @@ -836,11 +832,12 @@ int wrmsr_hypervisor_regs(uint32_t idx, uin

Re: [Xen-devel] [PATCH v3 2/3] x86/viridan: Clean up Viridian MSR infrastructure

2018-09-12 Thread Sergey Dyasli
On Tue, 2018-09-11 at 19:56 +0100, Andrew Cooper wrote:
> Rename the functions to guest_{rd,wr}msr_viridian() for consistency, and
> because the _regs() suffix isn't very appropriate.
> 
> Update them to take a vcpu pointer rather than presuming that they act on
> current, which is safe for all implemented operations, and switch their return
> ABI to use X86EMUL_*.
> 
> The default cases no longer need to deal with MSRs out of the Viridian range,
> but drop the printks to debug builds only and identify the value attempting to
> be written.
> 
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Paul Durrant 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Sergey Dyasli 
> 
> v3:
>  * Clean up after splitting the series.  Retain appropriate R-by's
> ---
>  xen/arch/x86/hvm/viridian.c| 46 
> ++
>  xen/arch/x86/msr.c |  6 ++---
>  xen/include/asm-x86/hvm/viridian.h | 11 ++---
>  3 files changed, 21 insertions(+), 42 deletions(-)
> 

Reviewed-by: Sergey Dyasli 

-- 
Thanks,
Sergey

Re: [Xen-devel] x86 Community Call - Wed Aug 15, 14:00 - 15:00 UTC - Agenda items

2018-08-14 Thread Sergey Dyasli
On Mon, 2018-08-13 at 02:54 -0600, Jan Beulich wrote:
> > > > On 13.08.18 at 09:46,  wrote:
> > 
> > proposed topics so far:
> > * 4.10+ changes to Xen's memory scrubbing: discussion of the changes 
> > that made to it in recent versions of Xen (4.10+) - Christopher
> > * Project Management stuff to keep the Momentum going - primarily 
> > looking for Intel updates
> 
> Timing is not really good for this, but deferring to the next meeting is
> also too long. I realize everyone's quite busy, and I'm myself also
> struggling to get to look at
> - VMX MSRs policy for Nested Virt: part 1 (I've looked over this, and I
>   think it's okay, but I also think that in particular nested stuff wants
>   both maintainers and Andrew to look over)

Regarding VMX MSRs, I plan to refresh the series once the new CPUID+MSR
infrastructure is in place. This will allow VMX features to be configured
for each guest independently.

-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH] mm/page_alloc: correct first_dirty calculations during block merging

2018-07-11 Thread Sergey Dyasli
On Tue, 2018-07-10 at 11:34 -0400, Boris Ostrovsky wrote:
> On 07/10/2018 11:15 AM, Jan Beulich wrote:
> > > > > On 10.07.18 at 16:49,  wrote:
> > > 
> > > Currently it's possible to hit an assertion in alloc_heap_pages():
> > > 
> > > Assertion 'first_dirty != INVALID_DIRTY_IDX || !(pg[i].count_info & 
> > > PGC_need_scrub)' failed at page_alloc.c:988
> > > 
> > > This can happen because a piece of logic to calculate first_dirty
> > > during block merging in free_heap_pages() is missing for the following
> > > scenario:
> > > 
> > > 1. Current block's first_dirty equals to INVALID_DIRTY_IDX
> > > 2. Successor block is free but its first_dirty != INVALID_DIRTY_IDX
> > > 3. The successor is merged into current block
> > > 4. Current block's first_dirty still equals to INVALID_DIRTY_IDX
> > > 
> > > This will trigger the assertion during allocation of such block in
> > > alloc_heap_pages() because there will be pages with PGC_need_scrub
> > > bit set despite the claim of first_dirty that the block is scrubbed.
> > > 
> > > Add the missing piece of logic and slightly update the comment for
> > > the predecessor case to better capture the code's intent.
> > > 
> > > Fixes 1a37f33ea613 ("mm: Place unscrubbed pages at the end of pagelist")
> > > 
> > > Signed-off-by: Sergey Dyasli 
> > > ---
> > > CC: Andrew Cooper 
> > > CC: George Dunlap 
> > > CC: Jan Beulich 
> > > CC: Julien Grall 
> > > CC: Wei Liu 
> > > CC: Boris Ostrovsky 
> > > ---
> > >  xen/common/page_alloc.c | 8 +++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > > index 20ee1e4897..aa911f2dc5 100644
> > > --- a/xen/common/page_alloc.c
> > > +++ b/xen/common/page_alloc.c
> > > @@ -1426,7 +1426,7 @@ static void free_heap_pages(
> > >  
> > >  page_list_del(predecessor, &heap(node, zone, order));
> > >  
> > > -/* Keep predecessor's first_dirty if it is already set. */
> > > +/* Keep block's first_dirty if the predecessor doesn't have 
> > > one */
> > >  if ( predecessor->u.free.first_dirty == INVALID_DIRTY_IDX &&
> > >   pg->u.free.first_dirty != INVALID_DIRTY_IDX )
> > >  predecessor->u.free.first_dirty = (1U << order) +
> > 
> > How about "Convert pg's first_dirty if predecessor doesn't already have
> > one"? "Keep" isn't describing well enough what's being done here imo.
> 
> "Keep" was used here for the (not provided) "else" clause. But I can see
> how it can be confusing.
> 
> "Update predecessor's first_dirty if necessary"? Or maybe even drop it.

I'd like to retain the comments. Personally, I like the following
variant because the if statement logic is pretty self-explanatory:

/* Update predecessor's first_dirty if necessary */
...
/* Update pg's first_dirty if necessary */

These changes can be done while committing.

-- 
Thanks,
Sergey

[Xen-devel] [PATCH] mm/page_alloc: correct first_dirty calculations during block merging

2018-07-10 Thread Sergey Dyasli
Currently it's possible to hit an assertion in alloc_heap_pages():

Assertion 'first_dirty != INVALID_DIRTY_IDX || !(pg[i].count_info & 
PGC_need_scrub)' failed at page_alloc.c:988

This can happen because a piece of logic to calculate first_dirty
during block merging in free_heap_pages() is missing for the following
scenario:

1. Current block's first_dirty equals to INVALID_DIRTY_IDX
2. Successor block is free but its first_dirty != INVALID_DIRTY_IDX
3. The successor is merged into current block
4. Current block's first_dirty still equals to INVALID_DIRTY_IDX

This will trigger the assertion during allocation of such block in
alloc_heap_pages() because there will be pages with PGC_need_scrub
bit set despite the claim of first_dirty that the block is scrubbed.
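
For a concrete example (hypothetical numbers): take a fully scrubbed order-8
block (first_dirty == INVALID_DIRTY_IDX) whose free order-8 successor has
first_dirty == 5. After merging into an order-9 block, first_dirty should
become (1U << 8) + 5 == 261, but without this change it stays
INVALID_DIRTY_IDX even though page 261 (at least) still has PGC_need_scrub set.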

Add the missing piece of logic and slightly update the comment for
the predecessor case to better capture the code's intent.

Fixes 1a37f33ea613 ("mm: Place unscrubbed pages at the end of pagelist")

Signed-off-by: Sergey Dyasli 
---
CC: Andrew Cooper 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Wei Liu 
CC: Boris Ostrovsky 
---
 xen/common/page_alloc.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 20ee1e4897..aa911f2dc5 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1426,7 +1426,7 @@ static void free_heap_pages(
 
 page_list_del(predecessor, &heap(node, zone, order));
 
-/* Keep predecessor's first_dirty if it is already set. */
+/* Keep block's first_dirty if the predecessor doesn't have one */
 if ( predecessor->u.free.first_dirty == INVALID_DIRTY_IDX &&
  pg->u.free.first_dirty != INVALID_DIRTY_IDX )
 predecessor->u.free.first_dirty = (1U << order) +
@@ -1447,6 +1447,12 @@ static void free_heap_pages(
 
 check_and_stop_scrub(successor);
 
+/* Keep successor's first_dirty if the block doesn't have one */
+if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX &&
+ successor->u.free.first_dirty != INVALID_DIRTY_IDX )
+pg->u.free.first_dirty = (1U << order) +
+ successor->u.free.first_dirty;
+
 page_list_del(successor, &heap(node, zone, order));
 }
 
-- 
2.17.1



Re: [Xen-devel] [PATCH 10/13] x86/domctl: Implement XEN_DOMCTL_get_cpumsr_policy

2018-07-05 Thread Sergey Dyasli
On Tue, 2018-07-03 at 21:55 +0100, Andrew Cooper wrote:
> From: Sergey Dyasli 
> 
> This finally (after literally years of work!) marks the point where the
> toolstack can ask the hypervisor for the current CPUID configuration of a
> specific domain.
> 
> Also extend xen-cpuid's --policy mode to be able to take a domid and dump a
> specific domain's CPUID and MSR policy.
> 
> Signed-off-by: Andrew Cooper 
> Signed-off-by: Sergey Dyasli 
> ---
> 
> diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
> index a5b3004..52a3694 100644
> --- a/tools/misc/xen-cpuid.c
> +++ b/tools/misc/xen-cpuid.c
...
> @@ -407,17 +434,36 @@ int main(int argc, char **argv)
>  if ( !msrs )
>  err(1, "calloc(max_msrs)");
>  
> -for ( pol = 0; pol < ARRAY_SIZE(sys_policies); ++pol )
> +if ( domid != -1 )
>  {
> +char name[20];
>  uint32_t nr_leaves = max_leaves;
>  uint32_t nr_msrs = max_msrs;
>  
> -if ( xc_get_system_cpumsr_policy(xch, pol, &nr_leaves, leaves,
> +if ( xc_get_domain_cpumsr_policy(xch, domid, &nr_leaves, leaves,
>   &nr_msrs, msrs) )
> -err(1, "xc_get_system_cpumsr_policy(, %s,,)",
> -sys_policies[pol]);
> +err(1, "xc_get_domain_cpuid_policy(, %d, %d,, %d,)",
 ^
This wants to be "cpumsr" for consistency.

> +domid, nr_leaves, nr_msrs);
>  
> -print_policy(sys_policies[pol], leaves, nr_leaves, msrs, 
> nr_msrs);
> +snprintf(name, sizeof(name), "Domain %d", domid);
> +print_policy(name, leaves, nr_leaves, msrs, nr_msrs);
> +}
> +else
> +{
> +/* Get system policies */
> +for ( pol = 0; pol < ARRAY_SIZE(sys_policies); ++pol )
> +{
> +uint32_t nr_leaves = max_leaves;
> +uint32_t nr_msrs = max_msrs;
> +
> +if ( xc_get_system_cpumsr_policy(xch, pol, &nr_leaves, 
> leaves,
> + &nr_msrs, msrs) )
> +err(1, "xc_get_system_cpumsr_policy(, %s,,)",
> +sys_policies[pol]);
> +
> +print_policy(sys_policies[pol], leaves, nr_leaves,
> + msrs, nr_msrs);
> +}
>  }
>  
>  free(leaves);
> 
-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH 3/3] x86/msr: Use the architectural layout for MSR_{MISC_ENABLES, PLATFORM_INFO}

2018-07-02 Thread Sergey Dyasli
On Mon, 2018-07-02 at 10:57 +0100, Andrew Cooper wrote:
> This simplifies future interactions with the toolstack, by removing the need
> for per-MSR custom accessors when shuffling data in/out of a policy.
> 
> Use a 32bit raw backing integer (for simplicity), and use a bitfield to move
> the cpuid_faulting field to its appropriate position.
> 
> Signed-off-by: Andrew Cooper 
> ---
> 

Reviewed-by: Sergey Dyasli 
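
For reference, the raw-plus-bitfield layout being described would look roughly
like this (a sketch inferred from the commit message rather than the actual
hunk; the field position is taken from the architectural definition):

    /* 0x00000140  MSR_INTEL_MISC_FEATURES_ENABLES */
    union {
        uint32_t raw;
        struct {
            bool cpuid_faulting:1; /* bit 0: fault CPUID executed in CPL > 0 */
        };
    } misc_features_enables;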

-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH 2/3] x86/msr: Drop {MISC_ENABLES, PLATFORM_INFO}.available

2018-07-02 Thread Sergey Dyasli
On Mon, 2018-07-02 at 10:57 +0100, Andrew Cooper wrote:
> These MSRs are non-architectural and the available booleans were used in lieu
> of an architectural signal of availability.  The MSRs are unconditionally
> available to HVM guests, but currently for PV guests, are hidden when CPUID
> faulting is unavailable.
> 
> However, in hindsight, the additional booleans make toolstack MSR interactions
> more complicated.  As the behaviour of the MSRs is reserved when unavailable,
> unconditionally letting the MSRs be accessible is compatible behaviour, even
> for PV guests.
> 
> The new behaviour is:
>   * PLATFORM_INFO is unconditionally readable even for PV guests and will
> indicate the presence or absence of CPUID Faulting in bit 31.
>   * MISC_FEATURES_ENABLES is unconditionally readable, and bit 0 may be set iff
> PLATFORM_INFO reports that CPUID Faulting is available.
> 
> Signed-off-by: Andrew Cooper 
> ---
> 
> diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
> index afbeb7f..c28371c 100644
> --- a/xen/include/asm-x86/msr.h
> +++ b/xen/include/asm-x86/msr.h
> @@ -262,7 +262,6 @@ struct msr_domain_policy
>  {
>  /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
>  struct {
> -bool available; /* This MSR is non-architectural */
>  bool cpuid_faulting;
>  } plaform_info;
>  };
> @@ -290,7 +289,6 @@ struct msr_vcpu_policy
>  
>  /* 0x00000140  MSR_INTEL_MISC_FEATURES_ENABLES */
>  struct {
> -bool available; /* This MSR is non-architectural */
>  bool cpuid_faulting;
>  } misc_features_enables;
>  };

Could you add comments saying that those 2 MSRs are always available
for all guests?
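
Something along these lines is what I mean (wording is just a sketch):

    /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
    /* Unconditionally readable by all guests (PV and HVM). */
    struct {
        bool cpuid_faulting;
    } plaform_info;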

With that, Reviewed-by: Sergey Dyasli 

-- 
Thanks,
Sergey

[Xen-devel] [PATCH v1 for 4.7] x86/cpuid: fix raw FEATURESET_7d0 reporting

2018-05-15 Thread Sergey Dyasli
Commit 62b1879693e0 ("x86: further CPUID handling adjustments") added
FEATURESET_7d0 reporting but forgot to update the calculate_raw_featureset()
function. As a result, the value reported by xen-cpuid contains 0.

Fix that by properly filling raw_featureset[FEATURESET_7d0].

Signed-off-by: Sergey Dyasli 
---
I see that at least 4.8 also contains this bug, so other releases also
need checking.

CC: Jan Beulich 
CC: Andrew Cooper 
---
 xen/arch/x86/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 451952cabe..fffcecd878 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -113,7 +113,7 @@ static void __init calculate_raw_featureset(void)
 cpuid_count(0x7, 0, &tmp,
 &raw_featureset[FEATURESET_7b0],
 &raw_featureset[FEATURESET_7c0],
-&tmp);
+&raw_featureset[FEATURESET_7d0]);
 if ( max >= 0xd )
 cpuid_count(0xd, 1,
 &raw_featureset[FEATURESET_Da1],
-- 
2.17.0



Re: [Xen-devel] [PATCH v9 3/9] xen/x86: support per-domain flag for xpti

2018-04-27 Thread Sergey Dyasli
On Thu, 2018-04-26 at 13:33 +0200, Juergen Gross wrote:
> Instead of switching XPTI globally on or off add a per-domain flag for
> that purpose. This allows to modify the xpti boot parameter to support
> running dom0 without Meltdown mitigations. Using "xpti=no-dom0" as boot

"xpti=dom0=0"

> parameter will achieve that.
> 
> Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as
> it is pv-domain specific.
> 
> Signed-off-by: Juergen Gross 
> Reviewed-by: Jan Beulich 
> ---
> V9:
> - adjust boot message (Sergey Dyasli)
> - adjust boot parameter documentation (Sergey Dyasli)
> 
> V6.1:
> - address some minor comments (Jan Beulich)
> 
> V6:
> - modify xpti boot parameter options (Andrew Cooper)
> - move xpti_init() code to spec_ctrl.c (Andrew Cooper)
> - irework init of per-domain xpti flag (Andrew Cooper)
> 
> V3:
> - latch get_cpu_info() return value in variable (Jan Beulich)
> - call always xpti_domain_init() for pv dom0 (Jan Beulich)
> - add __init annotations (Jan Beulich)
> - drop per domain XPTI message (Jan Beulich)
> - document xpti=default support (Jan Beulich)
> - move domain xpti flag into a padding hole (Jan Beulich)
> ---
>  docs/misc/xen-command-line.markdown | 14 ++-
>  xen/arch/x86/mm.c   | 17 -
>  xen/arch/x86/pv/dom0_build.c|  1 +
>  xen/arch/x86/pv/domain.c|  6 +++
>  xen/arch/x86/setup.c| 19 --
>  xen/arch/x86/smpboot.c  |  4 +-
>  xen/arch/x86/spec_ctrl.c| 75 
> -
>  xen/include/asm-x86/current.h   |  3 +-
>  xen/include/asm-x86/domain.h|  3 ++
>  xen/include/asm-x86/flushtlb.h  |  2 +-
>  xen/include/asm-x86/spec_ctrl.h |  4 ++
>  11 files changed, 119 insertions(+), 29 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index b353352adf..220d1ba020 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1955,14 +1955,24 @@ clustered mode.  The default, given no hint from the 
> **FADT**, is cluster
>  mode.
>  
>  ### xpti
> -> `= `
> +> `= List of [ default |  | dom0= | domu= ]`
>  
> -> Default: `false` on AMD hardware
> +> Default: `false` on hardware not to be vulnerable to Meltdown (e.g. AMD)
 ^
 known

>  > Default: `true` everywhere else
>  
>  Override default selection of whether to isolate 64-bit PV guest page
>  tables.
>  
> +`true` activates page table isolation even on hardware not vulnerable by
> +Meltdown for all domains.
> +
> +`false` deactivates page table isolation on all systems for all domains.
> +
> +`default` sets the default behaviour.
> +
> +With `dom0` and `domu` it is possible to control page table isolation
> +for dom0 or guest domains only.
> +
>  ### xsave
>  > `= `

-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Sergey Dyasli
Hi Juergen,

2 small requests from me below.

On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote:
> Instead of switching XPTI globally on or off add a per-domain flag for
> that purpose. This allows to modify the xpti boot parameter to support
> running dom0 without Meltdown mitigations. Using "xpti=nodom0" as boot
> parameter will achieve that.
> 
> Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as
> it is pv-domain specific.
> 
> Signed-off-by: Juergen Gross 
> Reviewed-by: Jan Beulich 


> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index b353352adf..d4f758487a 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1955,14 +1955,24 @@ clustered mode.  The default, given no hint from the 
> **FADT**, is cluster
>  mode.
>  
>  ### xpti
> -> `= `
> +> `= List of [ default |  | dom0= | domu= ]`
>  
> -> Default: `false` on AMD hardware
> +> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD)

Could this line please be changed to:

`false` on hardware known not to be vulnerable to Meltdown (e.g. AMD)

>  > Default: `true` everywhere else
>  
>  Override default selection of whether to isolate 64-bit PV guest page
>  tables.
>  
> +`true` activates page table isolation even on hardware not vulnerable by
> +Meltdown for all domains.
> +
> +`false` deactivates page table isolation on all systems for all domains.
> +
> +`default` sets the default behaviour.
> +
> +With `dom0` and `domu` it is possible to control page table isolation
> +for dom0 or guest domains only.
> +
>  ### xsave
>  > `= `
>  

> diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
> index 5ab4ff3f68..b4fa43269e 100644
> --- a/xen/include/asm-x86/spec_ctrl.h
> +++ b/xen/include/asm-x86/spec_ctrl.h
> @@ -29,6 +29,10 @@ void init_speculation_mitigations(void);
>  extern bool opt_ibpb;
>  extern uint8_t default_bti_ist_info;
>  
> +extern uint8_t opt_xpti;
> +#define OPT_XPTI_DOM0  0x01
> +#define OPT_XPTI_DOMU  0x02
> +
>  static inline void init_shadow_spec_ctrl_state(void)
>  {
>  struct cpu_info *info = get_cpu_info();

Could you please also include something like the following:

@@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk)
boot_cpu_has(X86_FEATURE_RSB_NATIVE)  ? " RSB_NATIVE" : "",
boot_cpu_has(X86_FEATURE_RSB_VMEXIT)  ? " RSB_VMEXIT" : "");
 
-printk("XPTI: %s\n",
-   boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled");
+printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n",
+   opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled",
+   opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled");
 }


(just noticed that commit message also needs update regarding param name)

-- 
Thanks,
Sergey

[Xen-devel] [PATCH v6 5/5] x86/msr: handle VMX MSRs with guest_rd/wrmsr()

2018-03-22 Thread Sergey Dyasli
Now that each domain has a correct view of VMX MSRs in its per-domain
MSR policy, it's possible to handle guest's RD/WRMSR with the new
handlers. Do it and remove the old nvmx_msr_read_intercept() and
associated bits.

There is no functional change to what a guest sees in its VMX MSRs.

Signed-off-by: Sergey Dyasli 
Reviewed-by: Andrew Cooper 
---
v5 --> v6:
- Moved VMX MSRs case to the read-only block in guest_wrmsr()
- Added Reviewed-by
---
 xen/arch/x86/hvm/vmx/vmx.c |   6 --
 xen/arch/x86/hvm/vmx/vvmx.c| 178 -
 xen/arch/x86/msr.c |  32 +++
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 4 files changed, 32 insertions(+), 186 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 847c314a08..ba5b78a9c2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2869,10 +2869,6 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 if ( nestedhvm_enabled(curr->domain) )
 *msr_content |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
 break;
-case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
-if ( !nvmx_msr_read_intercept(msr, msr_content) )
-goto gp_fault;
-break;
 case MSR_IA32_MISC_ENABLE:
 rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
 /* Debug Trace Store is not supported. */
@@ -3126,8 +3122,6 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 break;
 }
 case MSR_IA32_FEATURE_CONTROL:
-case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
-/* None of these MSRs are writeable. */
 goto gp_fault;
 
 case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 43f7297c04..5a1d9c8fc5 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1980,184 +1980,6 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 return X86EMUL_OKAY;
 }
 
-#define __emul_value(enable1, default1) \
-((enable1 | default1) << 32 | (default1))
-
-#define gen_vmx_msr(enable1, default1, host_value) \
-(((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
-((uint32_t)(__emul_value(enable1, default1) | host_value)))
-
-/*
- * Capability reporting
- */
-int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
-{
-struct vcpu *v = current;
-struct domain *d = v->domain;
-u64 data = 0, host_data = 0;
-int r = 1;
-
-/* VMX capablity MSRs are available only when guest supports VMX. */
-if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
-return 0;
-
-/*
- * These MSRs are only available when flags in other MSRs are set.
- * These prerequisites are listed in the Intel 64 and IA-32
- * Architectures Software Developer’s Manual, Vol 3, Appendix A.
- */
-switch ( msr )
-{
-case MSR_IA32_VMX_PROCBASED_CTLS2:
-if ( !cpu_has_vmx_secondary_exec_control )
-return 0;
-break;
-
-case MSR_IA32_VMX_EPT_VPID_CAP:
-if ( !(cpu_has_vmx_ept || cpu_has_vmx_vpid) )
-return 0;
-break;
-
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_EXIT_CTLS:
-case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
-if ( !(vmx_basic_msr & VMX_BASIC_DEFAULT1_ZERO) )
-return 0;
-break;
-
-case MSR_IA32_VMX_VMFUNC:
-if ( !cpu_has_vmx_vmfunc )
-return 0;
-break;
-}
-
-rdmsrl(msr, host_data);
-
-/*
- * Remove unsupport features from n1 guest capability MSR
- */
-switch (msr) {
-case MSR_IA32_VMX_BASIC:
-{
-const struct vmcs_struct *vmcs =
-map_domain_page(_mfn(PFN_DOWN(v->arch.hvm_vmx.vmcs_pa)));
-
-data = (host_data & (~0ul << 32)) |
-   (vmcs->vmcs_revision_id & 0x7fffffff);
-unmap_domain_page(vmcs);
-break;
-}
-case MSR_IA32_VMX_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-/* 1-settings */
-data = PIN_BASED_EXT_INTR_MASK |
-   PIN_BASED_NMI_EXITING |
-   PIN_BASED_PREEMPT_TIMER;
-data = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, host_data);
-break;
-case MSR_IA32_VMX_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-{
-u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
-/* 1-settings */
-data = CPU_BASED_HLT_EXITING |
-   CPU_BASED_VIRTUAL_INTR_PENDING |
-   CPU_BASED_CR8_LOAD_EXITING |
-   CPU_BASED_CR8_STORE_EXITING |
-   CPU_BASED_INVLPG_EXITING |
-   CPU_BASED_CR3_LOAD_EXITING |
-   CPU_BASED_CR3_STORE_EXITING |
-   CPU_BASED_MONITOR_EXITING |
-   

[Xen-devel] [PATCH v6 2/5] x86/msr: add VMX MSRs into HVM_max domain policy

2018-03-22 Thread Sergey Dyasli
Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

Add calculate_hvm_max_vmx_policy() which will save the end result of
nvmx_msr_read_intercept() on current H/W into HVM_max domain policy.
There will be no functional change to what L1 sees in VMX MSRs. But the
actual use of HVM_max domain policy will happen later, when VMX MSRs
are handled by guest_rd/wrmsr().

Signed-off-by: Sergey Dyasli 
---
v5 --> v6:
- Replaced !cpu_has_vmx check with !hvm_max_cpuid_policy.basic.vmx
- Added a TODO reminder
- Added brackets around bit or expressions
---
 xen/arch/x86/msr.c | 135 +
 1 file changed, 135 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 87239e151e..01a5b52f95 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -106,6 +106,139 @@ static void __init calculate_host_policy(void)
 dp->plaform_info.cpuid_faulting = cpu_has_cpuid_faulting;
 }
 
+static void vmx_clear_policy(struct msr_domain_policy *dp)
+{
+memset(dp->vmx.raw, 0, sizeof(dp->vmx.raw));
+dp->vmx_procbased_ctls2.raw = 0;
+dp->vmx_ept_vpid_cap.raw = 0;
+memset(dp->vmx_true_ctls.raw, 0, sizeof(dp->vmx_true_ctls.raw));
+dp->vmx_vmfunc.raw = 0;
+}
+
+static void __init calculate_hvm_max_vmx_policy(struct msr_domain_policy *dp)
+{
+const struct msr_domain_policy *hp = &host_msr_domain_policy;
+uint32_t supported;
+
+if ( !hvm_max_cpuid_policy.basic.vmx )
+return;
+
+vmx_clear_policy(dp);
+
+ /* TODO: actually make vmx features selection sane */
+dp->vmx.basic.raw = hp->vmx.basic.raw;
+
+dp->vmx.pinbased_ctls.allowed_0.raw = VMX_PINBASED_CTLS_DEFAULT1;
+dp->vmx.pinbased_ctls.allowed_1.raw = VMX_PINBASED_CTLS_DEFAULT1;
+supported = (PIN_BASED_EXT_INTR_MASK |
+ PIN_BASED_NMI_EXITING   |
+ PIN_BASED_PREEMPT_TIMER);
+dp->vmx.pinbased_ctls.allowed_1.raw |= supported;
+dp->vmx.pinbased_ctls.allowed_1.raw &= hp->vmx.pinbased_ctls.allowed_1.raw;
+
+dp->vmx.procbased_ctls.allowed_0.raw = VMX_PROCBASED_CTLS_DEFAULT1;
+dp->vmx.procbased_ctls.allowed_1.raw = VMX_PROCBASED_CTLS_DEFAULT1;
+supported = (CPU_BASED_HLT_EXITING  |
+ CPU_BASED_VIRTUAL_INTR_PENDING |
+ CPU_BASED_CR8_LOAD_EXITING |
+ CPU_BASED_CR8_STORE_EXITING|
+ CPU_BASED_INVLPG_EXITING   |
+ CPU_BASED_MONITOR_EXITING  |
+ CPU_BASED_MWAIT_EXITING|
+ CPU_BASED_MOV_DR_EXITING   |
+ CPU_BASED_ACTIVATE_IO_BITMAP   |
+ CPU_BASED_USE_TSC_OFFSETING|
+ CPU_BASED_UNCOND_IO_EXITING|
+ CPU_BASED_RDTSC_EXITING|
+ CPU_BASED_MONITOR_TRAP_FLAG|
+ CPU_BASED_VIRTUAL_NMI_PENDING  |
+ CPU_BASED_ACTIVATE_MSR_BITMAP  |
+ CPU_BASED_PAUSE_EXITING|
+ CPU_BASED_RDPMC_EXITING|
+ CPU_BASED_TPR_SHADOW   |
+ CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
+dp->vmx.procbased_ctls.allowed_1.raw |= supported;
+dp->vmx.procbased_ctls.allowed_1.raw &=
+hp->vmx.procbased_ctls.allowed_1.raw;
+
+dp->vmx.exit_ctls.allowed_0.raw = VMX_EXIT_CTLS_DEFAULT1;
+dp->vmx.exit_ctls.allowed_1.raw = VMX_EXIT_CTLS_DEFAULT1;
+supported = (VM_EXIT_ACK_INTR_ON_EXIT   |
+ VM_EXIT_IA32E_MODE |
+ VM_EXIT_SAVE_PREEMPT_TIMER |
+ VM_EXIT_SAVE_GUEST_PAT |
+ VM_EXIT_LOAD_HOST_PAT  |
+ VM_EXIT_SAVE_GUEST_EFER|
+ VM_EXIT_LOAD_HOST_EFER |
+ VM_EXIT_LOAD_PERF_GLOBAL_CTRL);
+dp->vmx.exit_ctls.allowed_1.raw |= supported;
+dp->vmx.exit_ctls.allowed_1.raw &= hp->vmx.exit_ctls.allowed_1.raw;
+
+dp->vmx.entry_ctls.allowed_0.raw = VMX_ENTRY_CTLS_DEFAULT1;
+dp->vmx.entry_ctls.allowed_1.raw = VMX_ENTRY_CTLS_DEFAULT1;
+supported = (VM_ENTRY_LOAD_GUEST_PAT|
+ VM_ENTRY_LOAD_GUEST_EFER   |
+ VM_ENTRY_LOAD_PERF_GLOBAL_CTRL |
+ VM_ENTRY_IA32E_MODE);
+dp->vmx.entry_ctls.allowed_1.raw |= supported;
+dp->vmx.entry_ctls.allowed_1.raw &= hp->vmx.entry_ctls.allowed_1.raw;
+
+dp->vmx.misc.raw = hp->vmx.misc.raw;
+/* Do not support CR3-target feature now */
+dp->vmx.misc.cr3_target = false;
+
+/* PG, PE bits must be 1 in VMX operation */
+dp->vmx.cr0_fixed0.allowed_0.pe = true;
+dp->vmx.cr0_fixe

[Xen-devel] [PATCH v6 1/5] x86/msr: add VMX MSRs definitions and populate Raw domain policy

2018-03-22 Thread Sergey Dyasli
New definitions provide a convenient way of accessing contents of
VMX MSRs. They are separated into 5 logical blocks based on the
availability conditions of the MSRs in each block:

1. vmx: [VMX_BASIC, VMX_VMCS_ENUM]
2. VMX_PROCBASED_CTLS2
3. VMX_EPT_VPID_CAP
4. vmx_true_ctls: [VMX_TRUE_PINBASED_CTLS, VMX_TRUE_ENTRY_CTLS]
5. VMX_VMFUNC

Every bit value is accessible by its name, and bit names match existing
Xen definitions as closely as possible. There is a "raw" 64-bit field
for each MSR as well as "raw" arrays for vmx and vmx_true_ctls blocks.

Add calculate_raw_vmx_policy() which fills Raw policy with H/W values
of VMX MSRs. Host policy will contain a copy of these values (for now).

Signed-off-by: Sergey Dyasli 
---
v5 --> v6:
- Removed "_bits" and "_based" from union names
- Removed "_exiting" suffixes from control bit names
- Various shortenings of control bit names
---
 xen/arch/x86/msr.c  | 118 ++
 xen/include/asm-x86/msr.h   | 330 
 xen/include/asm-x86/x86-defns.h |  54 +++
 3 files changed, 502 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 369b4754ce..87239e151e 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -34,10 +34,65 @@ struct msr_domain_policy __read_mostly 
raw_msr_domain_policy,
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static bool vmx_procbased_ctls2_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx.procbased_ctls.allowed_1.secondary;
+}
+
+static bool vmx_ept_vpid_cap_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx_procbased_ctls2.allowed_1.ept ||
+   dp->vmx_procbased_ctls2.allowed_1.vpid;
+}
+
+static bool vmx_true_ctls_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx.basic.default1_zero;
+}
+
+static bool vmx_vmfunc_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx_procbased_ctls2.allowed_1.vmfunc;
+}
+
+static void __init calculate_raw_vmx_policy(struct msr_domain_policy *dp)
+{
+unsigned int i, start_msr, end_msr;
+
+if ( !cpu_has_vmx )
+return;
+
+start_msr = MSR_IA32_VMX_BASIC;
+end_msr = MSR_IA32_VMX_VMCS_ENUM;
+for ( i = start_msr; i <= end_msr; i++ )
+rdmsrl(i, dp->vmx.raw[i - start_msr]);
+
+if ( vmx_procbased_ctls2_available(dp) )
+rdmsrl(MSR_IA32_VMX_PROCBASED_CTLS2, dp->vmx_procbased_ctls2.raw);
+
+if ( vmx_ept_vpid_cap_available(dp) )
+rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, dp->vmx_ept_vpid_cap.raw);
+
+if ( vmx_true_ctls_available(dp) )
+{
+start_msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS;
+end_msr = MSR_IA32_VMX_TRUE_ENTRY_CTLS;
+for ( i = start_msr; i <= end_msr; i++ )
+rdmsrl(i, dp->vmx_true_ctls.raw[i - start_msr]);
+}
+
+if ( vmx_vmfunc_available(dp) )
+rdmsrl(MSR_IA32_VMX_VMFUNC, dp->vmx_vmfunc.raw);
+}
+
 static void __init calculate_raw_policy(void)
 {
+struct msr_domain_policy *dp = &raw_msr_domain_policy;
+
 /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
 /* Was already added by probe_cpuid_faulting() */
+
+calculate_raw_vmx_policy(dp);
 }
 
 static void __init calculate_host_policy(void)
@@ -284,6 +339,69 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 return X86EMUL_EXCEPTION;
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+struct msr_domain_policy dp;
+
+BUILD_BUG_ON(sizeof(dp.vmx.basic) !=
+ sizeof(dp.vmx.basic.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.pinbased_ctls) !=
+ sizeof(dp.vmx.pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.procbased_ctls) !=
+ sizeof(dp.vmx.procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.exit_ctls) !=
+ sizeof(dp.vmx.exit_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.entry_ctls) !=
+ sizeof(dp.vmx.entry_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.misc) !=
+ sizeof(dp.vmx.misc.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed0) !=
+ sizeof(dp.vmx.cr0_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed1) !=
+ sizeof(dp.vmx.cr0_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed0) !=
+ sizeof(dp.vmx.cr4_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed1) !=
+ sizeof(dp.vmx.cr4_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.vmcs_enum) !=
+ sizeof(dp.vmx.vmcs_enum.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.raw) !=
+ sizeof(dp.vmx.basic) +
+ sizeof(dp.vmx.pinbased_ctls) +
+ sizeof(dp.vmx.procbased_ctls) +
+ sizeof(dp.vmx.exit_ctls) +
+ sizeof(dp.vmx.entry_

[Xen-devel] [PATCH v6 4/5] x86/msr: update domain policy on CPUID policy changes

2018-03-22 Thread Sergey Dyasli
Availability of some MSRs depends on certain CPUID bits. Add function
recalculate_domain_msr_policy() which updates availability of MSRs
based on current domain's CPUID policy. This function is called when
CPUID policy is changed from a toolstack.

Add recalculate_domain_vmx_msr_policy() which changes availability of
VMX MSRs based on domain's nested virt settings. If it's enabled, then
the domain receives a copy of HVM_max vmx policy with allowed CR4 bits
adjusted by CPUID policy.

Signed-off-by: Sergey Dyasli 
Reviewed-by: Andrew Cooper 
---
v5 --> v6:
- Updated recalculate_msr_policy() comment and commit message
- Added Reviewed-by
---
 xen/arch/x86/domctl.c |  1 +
 xen/arch/x86/msr.c| 35 +++
 xen/include/asm-x86/msr.h |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..5bde1a22b7 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -125,6 +125,7 @@ static int update_domain_cpuid_info(struct domain *d,
 }
 
 recalculate_cpuid_policy(d);
+recalculate_msr_policy(d);
 
 switch ( ctl->input[0] )
 {
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 01a5b52f95..26d987098b 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(uint32_t, tsc_aux);
 
@@ -283,6 +284,39 @@ void __init init_guest_msr_policy(void)
 calculate_pv_max_policy();
 }
 
+static void vmx_copy_policy(const struct msr_domain_policy *src,
+  struct msr_domain_policy *dst)
+{
+memcpy(dst->vmx.raw, src->vmx.raw, sizeof(dst->vmx.raw));
+dst->vmx_procbased_ctls2.raw = src->vmx_procbased_ctls2.raw;
+dst->vmx_ept_vpid_cap.raw = src->vmx_ept_vpid_cap.raw;
+memcpy(dst->vmx_true_ctls.raw, src->vmx_true_ctls.raw,
+   sizeof(dst->vmx_true_ctls.raw));
+dst->vmx_vmfunc.raw = src->vmx_vmfunc.raw;
+}
+
+static void recalculate_vmx_msr_policy(struct domain *d)
+{
+struct msr_domain_policy *dp = d->arch.msr;
+
+if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
+{
+vmx_clear_policy(dp);
+
+return;
+}
+
+vmx_copy_policy(&hvm_max_msr_domain_policy, dp);
+
+/* Get allowed CR4 bits from CPUID policy */
+dp->vmx.cr4_fixed1.allowed_1.raw = hvm_cr4_guest_valid_bits(d, false);
+}
+
+void recalculate_msr_policy(struct domain *d)
+{
+recalculate_vmx_msr_policy(d);
+}
+
 int init_domain_msr_policy(struct domain *d)
 {
 struct msr_domain_policy *dp;
@@ -303,6 +337,7 @@ int init_domain_msr_policy(struct domain *d)
 }
 
 d->arch.msr = dp;
+recalculate_msr_policy(d);
 
 return 0;
 }
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 5fdf82860e..41433fea94 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -641,6 +641,9 @@ int init_vcpu_msr_policy(struct vcpu *v);
 int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val);
 int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val);
 
+/* Updates availability of MSRs based on CPUID policy */
+void recalculate_msr_policy(struct domain *d);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_MSR_H */
-- 
2.14.1



[Xen-devel] [PATCH v6 3/5] x86/cpuid: update signature of hvm_cr4_guest_valid_bits()

2018-03-22 Thread Sergey Dyasli
With the new cpuid infrastructure there is a domain-wide struct cpuid
policy and there is no need to pass a separate struct vcpu * into
hvm_cr4_guest_valid_bits() anymore. Make the function accept struct
domain * instead and update callers.

Signed-off-by: Sergey Dyasli 
Reviewed-by: Andrew Cooper 
---
v5 --> v6:
- Added brackets to expression in vmx.c and replaced 0 with false
- Added Reviewed-by
---
 xen/arch/x86/hvm/domain.c   | 3 ++-
 xen/arch/x86/hvm/hvm.c  | 7 +++
 xen/arch/x86/hvm/svm/svmdebug.c | 4 ++--
 xen/arch/x86/hvm/vmx/vmx.c  | 4 ++--
 xen/arch/x86/hvm/vmx/vvmx.c | 2 +-
 xen/include/asm-x86/hvm/hvm.h   | 2 +-
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 60474649de..ce15ce0470 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -111,6 +111,7 @@ static int check_segment(struct segment_register *reg, enum 
x86_segment seg)
 /* Called by VCPUOP_initialise for HVM guests. */
 int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx)
 {
+const struct domain *d = v->domain;
 struct cpu_user_regs *uregs = &v->arch.user_regs;
 struct segment_register cs, ds, ss, es, tr;
 const char *errstr;
@@ -272,7 +273,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const 
vcpu_hvm_context_t *ctx)
 if ( v->arch.hvm_vcpu.guest_efer & EFER_LME )
 v->arch.hvm_vcpu.guest_efer |= EFER_LMA;
 
-if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(d, false) )
 {
 gprintk(XENLOG_ERR, "Bad CR4 value: %#016lx\n",
 v->arch.hvm_vcpu.guest_cr[4]);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5759c73dd4..fe253034f2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -931,9 +931,8 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t 
value,
 X86_CR0_CD | X86_CR0_PG)))
 
 /* These bits in CR4 can be set by the guest. */
-unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
+unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore)
 {
-const struct domain *d = v->domain;
 const struct cpuid_policy *p;
 bool mce, vmxe;
 
@@ -1000,7 +999,7 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
 return -EINVAL;
 }
 
-if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(v, 1) )
+if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(d, true) )
 {
 printk(XENLOG_G_ERR "HVM%d restore: bad CR4 %#" PRIx64 "\n",
d->domain_id, ctxt.cr4);
@@ -2350,7 +2349,7 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
 struct vcpu *v = current;
 unsigned long old_cr;
 
-if ( value & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( value & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 {
 HVM_DBG_LOG(DBG_LEVEL_1,
 "Guest attempts to set reserved bit in CR4: %lx",
diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index 091c58fa1b..6c215d19fe 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -121,9 +121,9 @@ bool svm_vmcb_isvalid(const char *from, const struct 
vmcb_struct *vmcb,
(cr3 >> v->domain->arch.cpuid->extd.maxphysaddr))) )
 PRINTF("CR3: MBZ bits are set (%#"PRIx64")\n", cr3);
 
-if ( cr4 & ~hvm_cr4_guest_valid_bits(v, false) )
+if ( cr4 & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 PRINTF("CR4: invalid bits are set (%#"PRIx64", valid: %#"PRIx64")\n",
-   cr4, hvm_cr4_guest_valid_bits(v, false));
+   cr4, hvm_cr4_guest_valid_bits(v->domain, false));
 
 if ( vmcb_get_dr6(vmcb) >> 32 )
 PRINTF("DR6: bits [63:32] are not zero (%#"PRIx64")\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c5cc96339e..847c314a08 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1598,8 +1598,8 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned 
int cr,
  * Update CR4 host mask to only trap when the guest tries to set
  * bits that are controlled by the hypervisor.
  */
-v->arch.hvm_vmx.cr4_host_mask = HVM_CR4_HOST_MASK | X86_CR4_PKE |
-~hvm_cr4_guest_valid_bits(v, 0);
+v->arch.hvm_vmx.cr4_host_mask = (HVM_CR4_HOST_MASK | X86_CR4_PKE |
+   ~hvm_cr4_guest_valid_bits(v->domain, 
false));
 v->arch.hvm_vmx.cr4_host_mask |= v->arch.hvm_vmx.vmx_realmode ?
  X86_CR4_VME : 0;
 

[Xen-devel] [PATCH v6 0/5] VMX MSRs policy for Nested Virt: part 1

2018-03-22 Thread Sergey Dyasli
The end goal of having VMX MSRs policy is to be able to manage
L1 VMX features. This patch series is the first part of this work.
There is no functional change to what L1 sees in VMX MSRs at this
point. But each domain will have a policy object which allows to
sensibly query what VMX features the domain has. This will unblock
some other nested virtualization work items.

Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.
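
Conceptually, for each VMX capability MSR the current calculation boils down
to something like this (pseudo-C sketch, names are illustrative only):

    l1_msr_val = xen_supported_l1_features & host_msr_val;
    if ( msr == MSR_IA32_VMX_EPT_VPID_CAP )
        l1_msr_val &= features_used_by_xen;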

The above makes L1 VMX feature set inconsistent between different H/W
and there is no ability to control what features are available to L1.
The overall set of issues has much in common with CPUID policy.

Part 1 adds VMX MSRs into struct msr_domain_policy and initializes them
during domain creation based on CPUID policy. In the future it should be
possible to independently configure values of VMX MSRs for each domain.

v5 --> v6:
- Various shortenings of control bit names
- Added Reviewed-by: Andrew Cooper to pathes 3,4 and 5
- Other changes are provided on per-patch basis

Sergey Dyasli (5):
  x86/msr: add VMX MSRs definitions and populate Raw domain policy
  x86/msr: add VMX MSRs into HVM_max domain policy
  x86/cpuid: update signature of hvm_cr4_guest_valid_bits()
  x86/msr: update domain policy on CPUID policy changes
  x86/msr: handle VMX MSRs with guest_rd/wrmsr()

 xen/arch/x86/domctl.c  |   1 +
 xen/arch/x86/hvm/domain.c  |   3 +-
 xen/arch/x86/hvm/hvm.c |   7 +-
 xen/arch/x86/hvm/svm/svmdebug.c|   4 +-
 xen/arch/x86/hvm/vmx/vmx.c |  10 +-
 xen/arch/x86/hvm/vmx/vvmx.c| 178 
 xen/arch/x86/msr.c | 320 +++
 xen/include/asm-x86/hvm/hvm.h  |   2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 xen/include/asm-x86/msr.h  | 333 +
 xen/include/asm-x86/x86-defns.h|  54 ++
 11 files changed, 718 insertions(+), 196 deletions(-)

-- 
2.14.1



Re: [Xen-devel] [PATCH v5 2/5] x86/msr: add VMX MSRs into HVM_max domain policy

2018-03-22 Thread Sergey Dyasli
On Wed, 2018-03-21 at 20:46 +, Andrew Cooper wrote:
> On 28/02/2018 16:09, Sergey Dyasli wrote:
> > +
> > +dp->vmx.pinbased_ctls.allowed_0.raw = VMX_PINBASED_CTLS_DEFAULT1;
> > +dp->vmx.pinbased_ctls.allowed_1.raw = VMX_PINBASED_CTLS_DEFAULT1;
> > +supported = PIN_BASED_EXT_INTR_MASK |
> > +PIN_BASED_NMI_EXITING   |
> > +PIN_BASED_PREEMPT_TIMER;
> 
> Please have a single set of brackets around the entire or statement, so
> editors will indent new changes correctly.

Which editors? My editor is doing it fine. Anyway, is this what you are
asking for?

supported = (PIN_BASED_EXT_INTR_MASK |
 PIN_BASED_NMI_EXITING   |
 PIN_BASED_PREEMPT_TIMER);

-- 
Thanks,
Sergey

Re: [Xen-devel] [PATCH v5 1/5] x86/msr: add VMX MSRs definitions and populate Raw domain policy

2018-03-22 Thread Sergey Dyasli
On Wed, 2018-03-21 at 19:52 +, Andrew Cooper wrote:
> On 28/02/18 16:09, Sergey Dyasli wrote:
> > 
> > +struct {
> > +/* 0x00000480  MSR_IA32_VMX_BASIC */
> > +union {
> > +uint64_t raw;
> > +struct {
> > +uint32_t vmcs_revision_id:31;
> 
> vmcs_rev_id
> 
> > +bool  mbz:1;  /* 31 always zero */
> 
> Is this really mbz?  Isn't this the shadow identifier bit for shadow vmcs's?

Yes, in the VMCS itself it's the shadow bit. However, it always reads as zero
in the MSR, since the job of the MSR is to report the VMCS revision identifier.
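
To spell out the distinction (sketch of the two layouts):

    /* MSR_IA32_VMX_BASIC, bits 31:0 */
    uint32_t vmcs_rev_id:31;  /* revision identifier reported to software */
    bool     mbz:1;           /* bit 31: always reads as zero in the MSR */

    /* VMCS region header in memory */
    uint32_t vmcs_rev_id:31;  /* must be written with the value from the MSR */
    bool     shadow:1;        /* bit 31: set by software for a shadow VMCS */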

-- 
Thanks,
Sergey

Re: [Xen-devel] Ping: [PATCH] x86: ignore guest microcode loading attempts

2018-03-15 Thread Sergey Dyasli
On Thu, 2018-03-15 at 02:37 -0600, Jan Beulich wrote:
> > > > On 13.03.18 at 12:51,  wrote:
> > > > > On 13.03.18 at 11:36,  wrote:
> > > 
> > > On 13/03/2018 10:13, Jan Beulich wrote:
> > > > The respective MSRs are write-only, and hence attempts by guests to
> > > > write to these are - as of 1f1d183d49 ("x86/HVM: don't give the wrong
> > > > impression of WRMSR succeeding") no longer ignored. Restore original
> > > > behavior for the two affected MSRs.
> > > > 
> > > > Signed-off-by: Jan Beulich 
> > > > ---
> > > > While what is being logged for the current osstest failure on the 4.7
> > > > branch (I have to admit that I don't understand why it's only that
> > > > branch which shows a regression)
> > > 
> > > Differences in advertised viridian?
> > > 
> > > >  doesn't fully prove this to be the
> > > > problem, RCX holding 0x79 and there being a recorded hypervisor level
> > > > #GP recovery immediately before the guest triple fault is sufficient
> > > > indication imo.
> > > > What I'm unsure about is whether we want to ignore such writes also for
> > > > PV guests. If not, at least the WRMSR change would need to move into
> > > > hvm/hvm.c.
> > > 
> > > Hmm - I'd very much like not to make this change, because it sets a bad
> > > precedent for making the MSR handling sane.  We shouldn't be silently
> > > dropping MSR writes, as that will cause more problems for the guests,
> > > rather than less.
> > > 
> > > Given that it is appears to be just 4.7 which is affected, I think it is
> > > worth trying to work out what is causing 4.8 and later to be fine, and
> > > whether that is a better solution to backport.
> > 
> > The latest flight on 4.9 shows the same issue.
> 
> I realize it's generally too early for a ping, but with at least two of
> the stable trees now blocked on this (and specifically the ones we
> want to cut a release from soon), I'd really like to either see this
> patch acked or viable alternative proposals be made. FTR, I don't
> think reverting the change that caused the issue to surface is an
> option: We had specifically agreed to deal with fallout on a case by
> case basis.
> 
> From the pattern of the failures it's only a matter of time for other
> branches to also become blocked on this.

The SDM states that this MSR is architectural and hence should always be
available to a guest. I couldn't find any information about how a #GP
fault may be raised during wrmsr or what happens if a microcode update
fails. It looks like the guest should just check the resulting value in
MSR_IA32_UCODE_REV (which can be emulated by Xen).
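
Roughly the flow a guest follows is (a sketch of the guest side, not Xen code;
0x79 is MSR_IA32_UCODE_WRITE aka IA32_BIOS_UPDT_TRIG, 0x8b is
MSR_IA32_UCODE_REV aka IA32_BIOS_SIGN_ID):

    wrmsrl(MSR_IA32_UCODE_WRITE, (uint64_t)__pa(ucode)); /* may now be dropped by Xen */
    wrmsrl(MSR_IA32_UCODE_REV, 0);
    cpuid_eax(1);                    /* serialising; latches the new signature */
    rdmsrl(MSR_IA32_UCODE_REV, rev);
    if ( (uint32_t)(rev >> 32) != expected_rev )
        /* the update did not take effect */;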

Overall, the patch looks sensible to me. The only thing I would like to
see is some code comments about why this wrmsr is silently dropped.

-- 
Thanks,
Sergey

[Xen-devel] [PATCH v1] pv_console: remove unnecessary #ifdefs

2018-03-06 Thread Sergey Dyasli
The PV console header provides empty function definitions in the
!CONFIG_XEN_GUEST case, specifically so that calling code can avoid #ifdefs
and look cleaner.

Unfortunately, during the shim-comet release, calls to the PV console
functions were enclosed in unnecessary CONFIG_X86 #ifdefs. Remove them.

Signed-off-by: Sergey Dyasli 
---
Compile tested with aarch64 compiler.
---
 xen/drivers/char/console.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 121073c8ed..47aff69869 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -30,10 +30,10 @@
 #include  /* for do_console_io */
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86
 #include 
-#include 
 #include 
 #endif
 
@@ -347,10 +347,8 @@ static void sercon_puts(const char *s)
 else
 serial_puts(sercon_handle, s);
 
-#ifdef CONFIG_X86
 /* Copy all serial output into PV console */
 pv_console_puts(s);
-#endif
 }
 
 static void dump_console_ring_key(unsigned char key)
@@ -816,9 +814,9 @@ void __init console_init_preirq(void)
 p++;
 if ( !strncmp(p, "vga", 3) )
 video_init();
-#ifdef CONFIG_X86
else if ( !strncmp(p, "pv", 2) )
 pv_console_init();
+#ifdef CONFIG_X86
 else if ( !strncmp(p, "xen", 3) )
 opt_console_xen = true;
 #endif
@@ -841,10 +839,7 @@ void __init console_init_preirq(void)
 }
 
 serial_set_rx_handler(sercon_handle, serial_rx);
-
-#ifdef CONFIG_X86
 pv_console_set_rx_handler(serial_rx);
-#endif
 
 /* HELLO WORLD --- start-of-day banner text. */
 spin_lock(&console_lock);
@@ -897,10 +892,7 @@ void __init console_init_ring(void)
 void __init console_init_postirq(void)
 {
 serial_init_postirq();
-
-#ifdef CONFIG_X86
 pv_console_init_postirq();
-#endif
 
 if ( conring != _conring )
 return;
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] vvmx: fixes after CR4 trapping optimizations

2018-03-05 Thread Sergey Dyasli
On Fri, 2018-03-02 at 16:19 +, Roger Pau Monne wrote:
> Commit 406817 doesn't update nested VMX code in order to take into
> account L1 CR4 host mask when nested guest (L2) writes to CR4, and
> thus the mask written to CR4_GUEST_HOST_MASK is likely not as
> restrictive as it should be.
> 
> Also the VVMCS GUEST_CR4 value should be updated to match the
> underlying value when syncing the VVMCS state.
> 
> Fixes: 406817 ("vmx/hap: optimize CR4 trapping")
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Jun Nakajima 
> Cc: Kevin Tian 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Sergey Dyasli 
> ---
> I've manually tested and AFAICT this fixes the osstest failure
> detected in 120076 ("test-amd64-amd64-qemuu-nested-intel").
> ---
> Changes since v1:
>  - Use guest_cr[4] in order to update the nested VMCS GUEST_CR4.

Reviewed-by: Sergey Dyasli 

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] vvmx: fixes after CR4 trapping optimizations

2018-03-02 Thread Sergey Dyasli
On Thu, 2018-03-01 at 16:19 +, Roger Pau Monne wrote:
> Commit 406817 doesn't update nested VMX code in order to take into
> account L1 CR4 host mask when nested guest (L2) writes to CR4, and
> thus the mask written to CR4_GUEST_HOST_MASK is likely not as
> restrictive as it should be.
> 
> Also the VVMCS GUEST_CR4 value should be updated to match the
> underlying value when syncing the VVMCS state.
> 
> Fixes: 406817 ("vmx/hap: optimize CR4 trapping")
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Jun Nakajima 
> Cc: Kevin Tian 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
> I've manually tested and AFAICT this fixes the osstest failure
> detected in 120076 ("test-amd64-amd64-qemuu-nested-intel").
> ---
>  xen/arch/x86/hvm/vmx/vmx.c  |  4 
>  xen/arch/x86/hvm/vmx/vvmx.c | 13 -
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 5cee364b0c..18d8ce2303 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1617,6 +1617,10 @@ static void vmx_update_guest_cr(struct vcpu *v, 
> unsigned int cr,
>  v->arch.hvm_vmx.cr4_host_mask |=
>  
> ~v->domain->arch.monitor.write_ctrlreg_mask[VM_EVENT_X86_CR4];
>  
> +if ( nestedhvm_vcpu_in_guestmode(v) )
> +/* Add the nested host mask to get the more restrictive one. 
> */
> +v->arch.hvm_vmx.cr4_host_mask |= get_vvmcs(v,
> +   
> CR4_GUEST_HOST_MASK);
>  }
>  __vmwrite(CR4_GUEST_HOST_MASK, v->arch.hvm_vmx.cr4_host_mask);
>  
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 8176736e8f..2baf707eec 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1101,7 +1101,8 @@ static void load_shadow_guest_state(struct vcpu *v)
>   (get_vvmcs(v, CR4_READ_SHADOW) & cr_gh_mask);
>  __vmwrite(CR4_READ_SHADOW, cr_read_shadow);
>  /* Add the nested host mask to the one set by vmx_update_guest_cr. */
> -__vmwrite(CR4_GUEST_HOST_MASK, cr_gh_mask | 
> v->arch.hvm_vmx.cr4_host_mask);
> +v->arch.hvm_vmx.cr4_host_mask |= cr_gh_mask;
> +__vmwrite(CR4_GUEST_HOST_MASK, v->arch.hvm_vmx.cr4_host_mask);
>  
>  /* TODO: CR3 target control */
>  }
> @@ -1232,6 +1233,16 @@ static void sync_vvmcs_guest_state(struct vcpu *v, 
> struct cpu_user_regs *regs)
>  /* CR3 sync if exec doesn't want cr3 load exiting: i.e. nested EPT */
>  if ( !(__n2_exec_control(v) & CPU_BASED_CR3_LOAD_EXITING) )
>  shadow_to_vvmcs(v, GUEST_CR3);
> +
> +if ( v->arch.hvm_vmx.cr4_host_mask != ~0UL )
> +{
> +   /* Only need to update nested GUEST_CR4 if not all bits are trapped. 
> */
> +unsigned long nested_cr4_mask = get_vvmcs(v, CR4_GUEST_HOST_MASK);
> +
> +set_vvmcs(v, GUEST_CR4,
> +  (get_vvmcs(v, GUEST_CR4) & nested_cr4_mask) |
> +  (v->arch.hvm_vcpu.guest_cr[4] & ~nested_cr4_mask));

Why is reading the old GUEST_CR4 needed here? Can the new value be set
directly from guest_cr[4]?

> +}
>  }
>  
>  static void sync_vvmcs_ro(struct vcpu *v)
-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v5 5/5] x86/msr: handle VMX MSRs with guest_rd/wrmsr()

2018-02-28 Thread Sergey Dyasli
Now that each domain has a correct view of the VMX MSRs in its per-domain
MSR policy, it's possible to handle a guest's RD/WRMSR with the new
handlers. Do so and remove the old nvmx_msr_read_intercept() and its
associated bits.

There is no functional change to what a guest sees in its VMX MSRs.
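
For illustration, the general shape of serving such a read from the
per-domain policy might be (a sketch with assumed helper and field
names, not the actual hunk):

static bool read_vmx_msr_from_policy(const struct msr_domain_policy *dp,
                                     uint32_t msr, uint64_t *val)
{
    /* Only the contiguous [VMX_BASIC, VMX_VMCS_ENUM] block is shown. */
    if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_VMCS_ENUM )
        return false;

    *val = dp->vmx.raw[msr - MSR_IA32_VMX_BASIC];

    return true;
}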

Signed-off-by: Sergey Dyasli 
---
v4 --> v5:
- New msr availability helpers are used
---
 xen/arch/x86/hvm/vmx/vmx.c |   6 --
 xen/arch/x86/hvm/vmx/vvmx.c| 178 -
 xen/arch/x86/msr.c |  35 
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 4 files changed, 35 insertions(+), 186 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4856ad7c24..e850ef913f 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2934,10 +2934,6 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 if ( nestedhvm_enabled(curr->domain) )
 *msr_content |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
 break;
-case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
-if ( !nvmx_msr_read_intercept(msr, msr_content) )
-goto gp_fault;
-break;
 case MSR_IA32_MISC_ENABLE:
 rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
 /* Debug Trace Store is not supported. */
@@ -3160,8 +3156,6 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 break;
 }
 case MSR_IA32_FEATURE_CONTROL:
-case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
-/* None of these MSRs are writeable. */
 goto gp_fault;
 
 case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index c8c168b7d0..8f4a68cf9a 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1975,184 +1975,6 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 return X86EMUL_OKAY;
 }
 
-#define __emul_value(enable1, default1) \
-((enable1 | default1) << 32 | (default1))
-
-#define gen_vmx_msr(enable1, default1, host_value) \
-(((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
-((uint32_t)(__emul_value(enable1, default1) | host_value)))
-
-/*
- * Capability reporting
- */
-int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
-{
-struct vcpu *v = current;
-struct domain *d = v->domain;
-u64 data = 0, host_data = 0;
-int r = 1;
-
-/* VMX capablity MSRs are available only when guest supports VMX. */
-if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
-return 0;
-
-/*
- * These MSRs are only available when flags in other MSRs are set.
- * These prerequisites are listed in the Intel 64 and IA-32
- * Architectures Software Developer’s Manual, Vol 3, Appendix A.
- */
-switch ( msr )
-{
-case MSR_IA32_VMX_PROCBASED_CTLS2:
-if ( !cpu_has_vmx_secondary_exec_control )
-return 0;
-break;
-
-case MSR_IA32_VMX_EPT_VPID_CAP:
-if ( !(cpu_has_vmx_ept || cpu_has_vmx_vpid) )
-return 0;
-break;
-
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_EXIT_CTLS:
-case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
-if ( !(vmx_basic_msr & VMX_BASIC_DEFAULT1_ZERO) )
-return 0;
-break;
-
-case MSR_IA32_VMX_VMFUNC:
-if ( !cpu_has_vmx_vmfunc )
-return 0;
-break;
-}
-
-rdmsrl(msr, host_data);
-
-/*
- * Remove unsupport features from n1 guest capability MSR
- */
-switch (msr) {
-case MSR_IA32_VMX_BASIC:
-{
-const struct vmcs_struct *vmcs =
-map_domain_page(_mfn(PFN_DOWN(v->arch.hvm_vmx.vmcs_pa)));
-
-data = (host_data & (~0ul << 32)) |
-   (vmcs->vmcs_revision_id & 0x7fff);
-unmap_domain_page(vmcs);
-break;
-}
-case MSR_IA32_VMX_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-/* 1-settings */
-data = PIN_BASED_EXT_INTR_MASK |
-   PIN_BASED_NMI_EXITING |
-   PIN_BASED_PREEMPT_TIMER;
-data = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, host_data);
-break;
-case MSR_IA32_VMX_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-{
-u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
-/* 1-settings */
-data = CPU_BASED_HLT_EXITING |
-   CPU_BASED_VIRTUAL_INTR_PENDING |
-   CPU_BASED_CR8_LOAD_EXITING |
-   CPU_BASED_CR8_STORE_EXITING |
-   CPU_BASED_INVLPG_EXITING |
-   CPU_BASED_CR3_LOAD_EXITING |
-   CPU_BASED_CR3_STORE_EXITING |
-   CPU_BASED_MONITOR_EXITING |
-   CPU_BASED_MWAIT_EXITING |
-

[Xen-devel] [PATCH v5 1/5] x86/msr: add VMX MSRs definitions and populate Raw domain policy

2018-02-28 Thread Sergey Dyasli
New definitions provide a convenient way of accessing contents of
VMX MSRs. They are separated into 5 logical blocks based on the
availability conditions of MSRs in the each block:

1. vmx: [VMX_BASIC, VMX_VMCS_ENUM]
2. VMX_PROCBASED_CTLS2
3. VMX_EPT_VPID_CAP
4. vmx_true_ctls: [VMX_TRUE_PINBASED_CTLS, VMX_TRUE_ENTRY_CTLS]
5. VMX_VMFUNC

Every bit value is accessible by its name, and the bit names match Xen's
existing definitions as closely as possible. There is a "raw" 64-bit field
for each MSR, as well as "raw" arrays for the vmx and vmx_true_ctls blocks.
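
For illustration, one such block has roughly the following shape
(a simplified sketch; the bit names shown are examples only):

union vmx_pinbased_ctls {
    uint64_t raw;
    struct {
        struct {
            uint32_t ext_intr_exiting:1;
            uint32_t :2;
            uint32_t nmi_exiting:1;
            uint32_t :1;
            uint32_t virtual_nmis:1;
            uint32_t preempt_timer:1;
        } allowed_0, allowed_1;    /* bits 31:0 and 63:32 respectively */
    };
};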

Add calculate_raw_vmx_policy() which fills Raw policy with H/W values
of VMX MSRs. Host policy will contain a copy of these values (for now).

Signed-off-by: Sergey Dyasli 
---
v4 --> v5:
- Clarified the reason for splitting MSRs into 5 blocks
- Added raw field into cr0/4_bits
- Moved cr0/4_bits definitions into asm-x86/x86-defns.h
- Added msr availability helpers
---
 xen/arch/x86/msr.c  | 118 ++
 xen/include/asm-x86/msr.h   | 330 
 xen/include/asm-x86/x86-defns.h |  54 +++
 3 files changed, 502 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 8ae3b4e616..43607b5107 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -34,10 +34,65 @@ struct msr_domain_policy __read_mostly 
raw_msr_domain_policy,
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static bool vmx_procbased_ctls2_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx.procbased_ctls.allowed_1.activate_secondary_controls;
+}
+
+static bool vmx_ept_vpid_cap_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx_procbased_ctls2.allowed_1.enable_ept ||
+   dp->vmx_procbased_ctls2.allowed_1.enable_vpid;
+}
+
+static bool vmx_true_ctls_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx.basic.default1_zero;
+}
+
+static bool vmx_vmfunc_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx_procbased_ctls2.allowed_1.enable_vm_functions;
+}
+
+static void __init calculate_raw_vmx_policy(struct msr_domain_policy *dp)
+{
+unsigned int i, start_msr, end_msr;
+
+if ( !cpu_has_vmx )
+return;
+
+start_msr = MSR_IA32_VMX_BASIC;
+end_msr = MSR_IA32_VMX_VMCS_ENUM;
+for ( i = start_msr; i <= end_msr; i++ )
+rdmsrl(i, dp->vmx.raw[i - start_msr]);
+
+if ( vmx_procbased_ctls2_available(dp) )
+rdmsrl(MSR_IA32_VMX_PROCBASED_CTLS2, dp->vmx_procbased_ctls2.raw);
+
+if ( vmx_ept_vpid_cap_available(dp) )
+rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, dp->vmx_ept_vpid_cap.raw);
+
+if ( vmx_true_ctls_available(dp) )
+{
+start_msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS;
+end_msr = MSR_IA32_VMX_TRUE_ENTRY_CTLS;
+for ( i = start_msr; i <= end_msr; i++ )
+rdmsrl(i, dp->vmx_true_ctls.raw[i - start_msr]);
+}
+
+if ( vmx_vmfunc_available(dp) )
+rdmsrl(MSR_IA32_VMX_VMFUNC, dp->vmx_vmfunc.raw);
+}
+
 static void __init calculate_raw_policy(void)
 {
+struct msr_domain_policy *dp = &raw_msr_domain_policy;
+
 /* 0x00ce  MSR_INTEL_PLATFORM_INFO */
 /* Was already added by probe_cpuid_faulting() */
+
+calculate_raw_vmx_policy(dp);
 }
 
 static void __init calculate_host_policy(void)
@@ -260,6 +315,69 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 return X86EMUL_EXCEPTION;
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+struct msr_domain_policy dp;
+
+BUILD_BUG_ON(sizeof(dp.vmx.basic) !=
+ sizeof(dp.vmx.basic.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.pinbased_ctls) !=
+ sizeof(dp.vmx.pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.procbased_ctls) !=
+ sizeof(dp.vmx.procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.exit_ctls) !=
+ sizeof(dp.vmx.exit_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.entry_ctls) !=
+ sizeof(dp.vmx.entry_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.misc) !=
+ sizeof(dp.vmx.misc.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed0) !=
+ sizeof(dp.vmx.cr0_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed1) !=
+ sizeof(dp.vmx.cr0_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed0) !=
+ sizeof(dp.vmx.cr4_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed1) !=
+ sizeof(dp.vmx.cr4_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.vmcs_enum) !=
+ sizeof(dp.vmx.vmcs_enum.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.raw) !=
+ sizeof(dp.vmx.basic) +
+ sizeof(dp.vmx.pinbased_ctls) +
+ sizeof(dp.vmx.procbased_ctls) +
+ sizeof(dp.vmx.exit_ctls) +
+ 

[Xen-devel] [PATCH v5 3/5] x86/cpuid: update signature of hvm_cr4_guest_valid_bits()

2018-02-28 Thread Sergey Dyasli
With the new cpuid infrastructure there is a domain-wide struct cpuid
policy and there is no need to pass a separate struct vcpu * into
hvm_cr4_guest_valid_bits() anymore. Make the function accept struct
domain * instead and update callers.
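
For illustration, the prototype change amounts to:

/* old */
unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore);
/* new */
unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);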

Signed-off-by: Sergey Dyasli 
---
v4 --> v5: rebased to the latest staging
---
 xen/arch/x86/hvm/domain.c   | 3 ++-
 xen/arch/x86/hvm/hvm.c  | 7 +++
 xen/arch/x86/hvm/svm/svmdebug.c | 4 ++--
 xen/arch/x86/hvm/vmx/vmx.c  | 2 +-
 xen/arch/x86/hvm/vmx/vvmx.c | 2 +-
 xen/include/asm-x86/hvm/hvm.h   | 2 +-
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 60474649de..ce15ce0470 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -111,6 +111,7 @@ static int check_segment(struct segment_register *reg, enum 
x86_segment seg)
 /* Called by VCPUOP_initialise for HVM guests. */
 int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx)
 {
+const struct domain *d = v->domain;
 struct cpu_user_regs *uregs = &v->arch.user_regs;
 struct segment_register cs, ds, ss, es, tr;
 const char *errstr;
@@ -272,7 +273,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const 
vcpu_hvm_context_t *ctx)
 if ( v->arch.hvm_vcpu.guest_efer & EFER_LME )
 v->arch.hvm_vcpu.guest_efer |= EFER_LMA;
 
-if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(d, false) )
 {
 gprintk(XENLOG_ERR, "Bad CR4 value: %#016lx\n",
 v->arch.hvm_vcpu.guest_cr[4]);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0539551851..c96e166952 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -932,9 +932,8 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t 
value,
 X86_CR0_CD | X86_CR0_PG)))
 
 /* These bits in CR4 can be set by the guest. */
-unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
+unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore)
 {
-const struct domain *d = v->domain;
 const struct cpuid_policy *p;
 bool mce, vmxe;
 
@@ -1001,7 +1000,7 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
 return -EINVAL;
 }
 
-if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(v, 1) )
+if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(d, true) )
 {
 printk(XENLOG_G_ERR "HVM%d restore: bad CR4 %#" PRIx64 "\n",
d->domain_id, ctxt.cr4);
@@ -2344,7 +2343,7 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
 struct vcpu *v = current;
 unsigned long old_cr;
 
-if ( value & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( value & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 {
 HVM_DBG_LOG(DBG_LEVEL_1,
 "Guest attempts to set reserved bit in CR4: %lx",
diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index 091c58fa1b..6c215d19fe 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -121,9 +121,9 @@ bool svm_vmcb_isvalid(const char *from, const struct 
vmcb_struct *vmcb,
(cr3 >> v->domain->arch.cpuid->extd.maxphysaddr))) )
 PRINTF("CR3: MBZ bits are set (%#"PRIx64")\n", cr3);
 
-if ( cr4 & ~hvm_cr4_guest_valid_bits(v, false) )
+if ( cr4 & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 PRINTF("CR4: invalid bits are set (%#"PRIx64", valid: %#"PRIx64")\n",
-   cr4, hvm_cr4_guest_valid_bits(v, false));
+   cr4, hvm_cr4_guest_valid_bits(v->domain, false));
 
 if ( vmcb_get_dr6(vmcb) >> 32 )
 PRINTF("DR6: bits [63:32] are not zero (%#"PRIx64")\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index aa0505036b..4856ad7c24 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1699,7 +1699,7 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned 
int cr)
  * bits that are controlled by the hypervisor.
  */
 v->arch.hvm_vmx.cr4_host_mask = HVM_CR4_HOST_MASK | X86_CR4_PKE |
-~hvm_cr4_guest_valid_bits(v, 0);
+~hvm_cr4_guest_valid_bits(v->domain, 
0);
 v->arch.hvm_vmx.cr4_host_mask |= v->arch.hvm_vmx.vmx_realmode ?
  X86_CR4_VME : 0;
 v->arch.hvm_vmx.cr4_host_mask |= !hvm_paging_enabled(v) ?
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 8176736e8f..c8c168b7d0 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -

[Xen-devel] [PATCH v5 4/5] x86/msr: update domain policy on CPUID policy changes

2018-02-28 Thread Sergey Dyasli
Availability of some MSRs depends on certain CPUID bits. Add a function
recalculate_msr_policy() which updates the availability of per-domain
MSRs based on the current domain's CPUID policy. This function is called
when the CPUID policy is changed by the toolstack.

Add recalculate_vmx_msr_policy() which changes the availability of VMX
MSRs based on the domain's nested virt settings. If nested virt is
enabled, the domain receives a copy of the HVM_max vmx policy with the
allowed CR4 bits adjusted by the CPUID policy.

Signed-off-by: Sergey Dyasli 
---
v4 --> v5:
- Removed _domain from function names
- Added vmx_copy_policy() helper
- recalculate_vmx_msr_policy() was rewritten
---
 xen/arch/x86/domctl.c |  1 +
 xen/arch/x86/msr.c| 35 +++
 xen/include/asm-x86/msr.h |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..5bde1a22b7 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -125,6 +125,7 @@ static int update_domain_cpuid_info(struct domain *d,
 }
 
 recalculate_cpuid_policy(d);
+recalculate_msr_policy(d);
 
 switch ( ctl->input[0] )
 {
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index f700e05570..9114b8f53b 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(uint32_t, tsc_aux);
 
@@ -282,6 +283,39 @@ void __init init_guest_msr_policy(void)
 calculate_pv_max_policy();
 }
 
+static void vmx_copy_policy(const struct msr_domain_policy *src,
+  struct msr_domain_policy *dst)
+{
+memcpy(dst->vmx.raw, src->vmx.raw, sizeof(dst->vmx.raw));
+dst->vmx_procbased_ctls2.raw = src->vmx_procbased_ctls2.raw;
+dst->vmx_ept_vpid_cap.raw = src->vmx_ept_vpid_cap.raw;
+memcpy(dst->vmx_true_ctls.raw, src->vmx_true_ctls.raw,
+   sizeof(dst->vmx_true_ctls.raw));
+dst->vmx_vmfunc.raw = src->vmx_vmfunc.raw;
+}
+
+static void recalculate_vmx_msr_policy(struct domain *d)
+{
+struct msr_domain_policy *dp = d->arch.msr;
+
+if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
+{
+vmx_clear_policy(dp);
+
+return;
+}
+
+vmx_copy_policy(&hvm_max_msr_domain_policy, dp);
+
+/* Get allowed CR4 bits from CPUID policy */
+dp->vmx.cr4_fixed1.allowed_1.raw = hvm_cr4_guest_valid_bits(d, false);
+}
+
+void recalculate_msr_policy(struct domain *d)
+{
+recalculate_vmx_msr_policy(d);
+}
+
 int init_domain_msr_policy(struct domain *d)
 {
 struct msr_domain_policy *dp;
@@ -302,6 +336,7 @@ int init_domain_msr_policy(struct domain *d)
 }
 
 d->arch.msr = dp;
+recalculate_msr_policy(d);
 
 return 0;
 }
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 419ab6f8a7..4747572871 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -606,6 +606,9 @@ int init_vcpu_msr_policy(struct vcpu *v);
 int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val);
 int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val);
 
+/* Update availability of per-domain MSRs based on CPUID policy */
+void recalculate_msr_policy(struct domain *d);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_MSR_H */
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v5 2/5] x86/msr: add VMX MSRs into HVM_max domain policy

2018-02-28 Thread Sergey Dyasli
Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

Add calculate_hvm_max_vmx_policy() which will save the end result of
nvmx_msr_read_intercept() on the current H/W into the HVM_max domain
policy. There will be no functional change to what L1 sees in VMX MSRs.
But the actual use of the HVM_max domain policy will happen later, when
VMX MSRs are handled by guest_rd/wrmsr().

Signed-off-by: Sergey Dyasli 
---
v4 --> v5:
- Macros are removed and now supported bitmask is used to derive policy
- Added vmx_clear_policy() helper
---
 xen/arch/x86/msr.c | 134 +
 1 file changed, 134 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 43607b5107..f700e05570 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -106,6 +106,138 @@ static void __init calculate_host_policy(void)
 dp->plaform_info.cpuid_faulting = cpu_has_cpuid_faulting;
 }
 
+static void vmx_clear_policy(struct msr_domain_policy *dp)
+{
+memset(dp->vmx.raw, 0, sizeof(dp->vmx.raw));
+dp->vmx_procbased_ctls2.raw = 0;
+dp->vmx_ept_vpid_cap.raw = 0;
+memset(dp->vmx_true_ctls.raw, 0, sizeof(dp->vmx_true_ctls.raw));
+dp->vmx_vmfunc.raw = 0;
+}
+
+static void __init calculate_hvm_max_vmx_policy(struct msr_domain_policy *dp)
+{
+const struct msr_domain_policy *hp = &host_msr_domain_policy;
+uint32_t supported;
+
+if ( !cpu_has_vmx )
+return;
+
+vmx_clear_policy(dp);
+
+dp->vmx.basic.raw = hp->vmx.basic.raw;
+
+dp->vmx.pinbased_ctls.allowed_0.raw = VMX_PINBASED_CTLS_DEFAULT1;
+dp->vmx.pinbased_ctls.allowed_1.raw = VMX_PINBASED_CTLS_DEFAULT1;
+supported = PIN_BASED_EXT_INTR_MASK |
+PIN_BASED_NMI_EXITING   |
+PIN_BASED_PREEMPT_TIMER;
+dp->vmx.pinbased_ctls.allowed_1.raw |= supported;
+dp->vmx.pinbased_ctls.allowed_1.raw &= hp->vmx.pinbased_ctls.allowed_1.raw;
+
+dp->vmx.procbased_ctls.allowed_0.raw = VMX_PROCBASED_CTLS_DEFAULT1;
+dp->vmx.procbased_ctls.allowed_1.raw = VMX_PROCBASED_CTLS_DEFAULT1;
+supported = CPU_BASED_HLT_EXITING  |
+CPU_BASED_VIRTUAL_INTR_PENDING |
+CPU_BASED_CR8_LOAD_EXITING |
+CPU_BASED_CR8_STORE_EXITING|
+CPU_BASED_INVLPG_EXITING   |
+CPU_BASED_MONITOR_EXITING  |
+CPU_BASED_MWAIT_EXITING|
+CPU_BASED_MOV_DR_EXITING   |
+CPU_BASED_ACTIVATE_IO_BITMAP   |
+CPU_BASED_USE_TSC_OFFSETING|
+CPU_BASED_UNCOND_IO_EXITING|
+CPU_BASED_RDTSC_EXITING|
+CPU_BASED_MONITOR_TRAP_FLAG|
+CPU_BASED_VIRTUAL_NMI_PENDING  |
+CPU_BASED_ACTIVATE_MSR_BITMAP  |
+CPU_BASED_PAUSE_EXITING|
+CPU_BASED_RDPMC_EXITING|
+CPU_BASED_TPR_SHADOW   |
+CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+dp->vmx.procbased_ctls.allowed_1.raw |= supported;
+dp->vmx.procbased_ctls.allowed_1.raw &=
+hp->vmx.procbased_ctls.allowed_1.raw;
+
+dp->vmx.exit_ctls.allowed_0.raw = VMX_EXIT_CTLS_DEFAULT1;
+dp->vmx.exit_ctls.allowed_1.raw = VMX_EXIT_CTLS_DEFAULT1;
+supported = VM_EXIT_ACK_INTR_ON_EXIT   |
+VM_EXIT_IA32E_MODE |
+VM_EXIT_SAVE_PREEMPT_TIMER |
+VM_EXIT_SAVE_GUEST_PAT |
+VM_EXIT_LOAD_HOST_PAT  |
+VM_EXIT_SAVE_GUEST_EFER|
+VM_EXIT_LOAD_HOST_EFER |
+VM_EXIT_LOAD_PERF_GLOBAL_CTRL;
+dp->vmx.exit_ctls.allowed_1.raw |= supported;
+dp->vmx.exit_ctls.allowed_1.raw &= hp->vmx.exit_ctls.allowed_1.raw;
+
+dp->vmx.entry_ctls.allowed_0.raw = VMX_ENTRY_CTLS_DEFAULT1;
+dp->vmx.entry_ctls.allowed_1.raw = VMX_ENTRY_CTLS_DEFAULT1;
+supported = VM_ENTRY_LOAD_GUEST_PAT|
+VM_ENTRY_LOAD_GUEST_EFER   |
+VM_ENTRY_LOAD_PERF_GLOBAL_CTRL |
+VM_ENTRY_IA32E_MODE;
+dp->vmx.entry_ctls.allowed_1.raw |= supported;
+dp->vmx.entry_ctls.allowed_1.raw &= hp->vmx.entry_ctls.allowed_1.raw;
+
+dp->vmx.misc.raw = hp->vmx.misc.raw;
+/* Do not support CR3-target feature now */
+dp->vmx.misc.cr3_target = false;
+
+/* PG, PE bits must be 1 in VMX operation */
+dp->vmx.cr0_fixed0.allowed_0.pe = true;
+dp->vmx.cr0_fixed0.allowed_0.pg = true;
+
+/* allow 0-settings for all bits */
+dp->vmx.cr0_fixed1.allowed_1.raw = 0x;
+
+/* VMXE bi

[Xen-devel] [PATCH v5 0/5] VMX MSRs policy for Nested Virt: part 1

2018-02-28 Thread Sergey Dyasli
The end goal of having VMX MSRs policy is to be able to manage
L1 VMX features. This patch series is the first part of this work.
There is no functional change to what L1 sees in VMX MSRs at this
point. But each domain will have a policy object which makes it possible
to sensibly query which VMX features the domain has. This will unblock
some other nested virtualization work items.

Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

The above makes the L1 VMX feature set inconsistent across different H/W,
and there is no way to control which features are available to L1.
The overall set of issues has much in common with CPUID policy.

Part 1 adds VMX MSRs into struct msr_domain_policy and initializes them
during domain creation based on CPUID policy. In the future it should be
possible to independently configure values of VMX MSRs for each domain.
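
As an illustration (a sketch only, using field names from this series),
such a query could look like:

/* Does the domain's policy grant nested EPT? */
static bool domain_has_nested_ept(const struct domain *d)
{
    const struct msr_domain_policy *dp = d->arch.msr;

    return dp->vmx.procbased_ctls.allowed_1.activate_secondary_controls &&
           dp->vmx_procbased_ctls2.allowed_1.enable_ept;
}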

v4 --> v5:
- First patch "x86/msr: add Raw and Host domain policies" was upstreamed
  separately
- Combined the next 2 patches into 1

Sergey Dyasli (5):
  x86/msr: add VMX MSRs definitions and populate Raw domain policy
  x86/msr: add VMX MSRs into HVM_max domain policy
  x86/cpuid: update signature of hvm_cr4_guest_valid_bits()
  x86/msr: update domain policy on CPUID policy changes
  x86/msr: handle VMX MSRs with guest_rd/wrmsr()

 xen/arch/x86/domctl.c  |   1 +
 xen/arch/x86/hvm/domain.c  |   3 +-
 xen/arch/x86/hvm/hvm.c |   7 +-
 xen/arch/x86/hvm/svm/svmdebug.c|   4 +-
 xen/arch/x86/hvm/vmx/vmx.c |   8 +-
 xen/arch/x86/hvm/vmx/vvmx.c| 178 
 xen/arch/x86/msr.c | 322 +++
 xen/include/asm-x86/hvm/hvm.h  |   2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 xen/include/asm-x86/msr.h  | 333 +
 xen/include/asm-x86/x86-defns.h|  54 ++
 11 files changed, 719 insertions(+), 195 deletions(-)

-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2] x86/msr: add Raw and Host domain policies

2018-02-19 Thread Sergey Dyasli
Raw policy contains the actual values from H/W MSRs. Add PLATFORM_INFO
msr to the policy during probe_cpuid_faulting().

Host policy may have certain features disabled if Xen decides not
to use them. For now, make Host policy equal to Raw policy with
cpuid_faulting availability dependent on X86_FEATURE_CPUID_FAULTING.

Finally, derive HVM/PV max domain policies from the Host policy.

Signed-off-by: Sergey Dyasli 
---
v2:
- Moved *dp into a narrower scope in probe_cpuid_faulting()
- Changes to how Host/pv/hvm domain policies are calculated
---
 xen/arch/x86/cpu/common.c | 12 +++-
 xen/arch/x86/msr.c| 37 -
 xen/include/asm-x86/msr.h |  8 
 3 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 4306e59650..0a452aea2c 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -119,8 +119,18 @@ void (* __read_mostly ctxt_switch_masking)(const struct 
vcpu *next);
 bool __init probe_cpuid_faulting(void)
 {
uint64_t val;
+   int rc;
 
-   if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) ||
+   if ((rc = rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val)) == 0)
+   {
+   struct msr_domain_policy *dp = &raw_msr_domain_policy;
+
+   dp->plaform_info.available = true;
+   if (val & MSR_PLATFORM_INFO_CPUID_FAULTING)
+   dp->plaform_info.cpuid_faulting = true;
+   }
+
+   if (rc ||
!(val & MSR_PLATFORM_INFO_CPUID_FAULTING) ||
rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES,
   this_cpu(msr_misc_features)))
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 7875d9c1e0..7aaa2b0406 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -24,12 +24,31 @@
 #include 
 #include 
 
-struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
+struct msr_domain_policy __read_mostly raw_msr_domain_policy,
+ __read_mostly host_msr_domain_policy,
+ __read_mostly hvm_max_msr_domain_policy,
  __read_mostly  pv_max_msr_domain_policy;
 
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static void __init calculate_raw_policy(void)
+{
+/* 0x00ce  MSR_INTEL_PLATFORM_INFO */
+/* Was already added by probe_cpuid_faulting() */
+}
+
+static void __init calculate_host_policy(void)
+{
+struct msr_domain_policy *dp = &host_msr_domain_policy;
+
+*dp = raw_msr_domain_policy;
+
+/* 0x00ce  MSR_INTEL_PLATFORM_INFO */
+/* probe_cpuid_faulting() sanity checks presence of MISC_FEATURES_ENABLES 
*/
+dp->plaform_info.cpuid_faulting = cpu_has_cpuid_faulting;
+}
+
 static void __init calculate_hvm_max_policy(void)
 {
 struct msr_domain_policy *dp = &hvm_max_msr_domain_policy;
@@ -38,7 +57,10 @@ static void __init calculate_hvm_max_policy(void)
 if ( !hvm_enabled )
 return;
 
+*dp = host_msr_domain_policy;
+
 /* 0x00ce  MSR_INTEL_PLATFORM_INFO */
+/* It's always possible to emulate CPUID faulting for HVM guests */
 if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
  boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
 {
@@ -47,7 +69,7 @@ static void __init calculate_hvm_max_policy(void)
 }
 
 /* 0x0140  MSR_INTEL_MISC_FEATURES_ENABLES */
-vp->misc_features_enables.available = dp->plaform_info.available;
+vp->misc_features_enables.available = dp->plaform_info.cpuid_faulting;
 }
 
 static void __init calculate_pv_max_policy(void)
@@ -55,19 +77,16 @@ static void __init calculate_pv_max_policy(void)
 struct msr_domain_policy *dp = &pv_max_msr_domain_policy;
 struct msr_vcpu_policy *vp = &pv_max_msr_vcpu_policy;
 
-/* 0x00ce  MSR_INTEL_PLATFORM_INFO */
-if ( cpu_has_cpuid_faulting )
-{
-dp->plaform_info.available = true;
-dp->plaform_info.cpuid_faulting = true;
-}
+*dp = host_msr_domain_policy;
 
 /* 0x0140  MSR_INTEL_MISC_FEATURES_ENABLES */
-vp->misc_features_enables.available = dp->plaform_info.available;
+vp->misc_features_enables.available = dp->plaform_info.cpuid_faulting;
 }
 
 void __init init_guest_msr_policy(void)
 {
+calculate_raw_policy();
+calculate_host_policy();
 calculate_hvm_max_policy();
 calculate_pv_max_policy();
 }
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 928f1cc454..94c142289b 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -220,6 +220,14 @@ struct msr_domain_policy
 } plaform_info;
 };
 
+/* RAW msr domain policy: contains the actual values from H/W MSRs */
+extern struct msr_domain_policy raw_msr_domain_policy;
+/*
+ * HOST msr domain policy: features that Xen actually decided to use,
+ * a subset of 

Re: [Xen-devel] [PATCH v1] x86/msr: add Raw and Host domain policies

2018-02-16 Thread Sergey Dyasli
On Fri, 2018-02-16 at 11:38 +, Andrew Cooper wrote:
> On 16/02/18 11:31, Sergey Dyasli wrote:
> > On Fri, 2018-02-16 at 04:06 -0700, Jan Beulich wrote:
> > > > > > On 16.02.18 at 11:33,  wrote:
> > > > 
> > > > On Thu, 2018-02-15 at 06:33 -0700, Jan Beulich wrote:
> > > > > > > > On 08.02.18 at 11:23,  wrote:
> > > > > > 
> > > > > > uint64_t val;
> > > > > > +   int rc;
> > > > > >  
> > > > > > -   if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) ||
> > > > > > +   if ((rc = rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val)) == 0)
> > > > > > +   {
> > > > > > +   dp->plaform_info.available = true;
> > > > > > +   if (val & MSR_PLATFORM_INFO_CPUID_FAULTING)
> > > > > > +   dp->plaform_info.cpuid_faulting = true;
> > > > > > +   }
> > > > > > +
> > > > > > +   if (rc ||
> > > > > > !(val & MSR_PLATFORM_INFO_CPUID_FAULTING) ||
> > > > > > rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES,
> > > > > >this_cpu(msr_misc_features)))
> > > > > 
> > > > > Below here we have
> > > > > 
> > > > >   setup_clear_cpu_cap(X86_FEATURE_CPUID_FAULTING);
> > > > > 
> > > > > Shouldn't this be reflected in the host policy?
> > > > 
> > > > I guess the correct thing to do for now for host_msr_domain_policy is:
> > > > 
> > > > dp->plaform_info.cpuid_faulting = cpu_has_cpuid_faulting;
> > > > 
> > > > Looking at the code, calculate_pv_max_policy() will be simplified with
> > > > the above change: pv_max_msr_domain_policy will become a copy of host
> > > > policy.
> > > > 
> > > > This actually brings a question: what to do about per-pCPU MSRs in the
> > > > context of MSR policy?
> > > 
> > > How does per-pCPU-ness of an MSR affect the policy? Are you
> > > thinking of CPUs with different capabilities? We assume all CPUs
> > > are identical in many other places.
> > 
> > Yes, CPUs are assumed to be identical. But currently Xen checks
> > the presence of MISC_FEATURES_ENABLES (which is a per-pCPU msr)
> > on the boot CPU, and it affects X86_FEATURE_CPUID_FAULTING. Which
> > in it's turn affects the presence of MISC_FEATURES_ENABLES for PV vCPUs.
> > 
> > So the actual question is: where to store the availability of
> > MISC_FEATURES_ENABLES (and possibly other per-pCPU msrs in the future)
> > and is it even needed to do so?
> 
> Store it in one single host policy.

And where do you propose to actually store it? Currently there are
two distinct structures: msr_domain_policy and msr_vcpu_policy.

> Part of my CPUID work will be cleaning up some of these warts in the
> detection logic.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/msr: add Raw and Host domain policies

2018-02-16 Thread Sergey Dyasli
On Fri, 2018-02-16 at 04:06 -0700, Jan Beulich wrote:
> > > > On 16.02.18 at 11:33,  wrote:
> > 
> > On Thu, 2018-02-15 at 06:33 -0700, Jan Beulich wrote:
> > > > > > On 08.02.18 at 11:23,  wrote:
> > > > 
> > > > uint64_t val;
> > > > +   int rc;
> > > >  
> > > > -   if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) ||
> > > > +   if ((rc = rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val)) == 0)
> > > > +   {
> > > > +   dp->plaform_info.available = true;
> > > > +   if (val & MSR_PLATFORM_INFO_CPUID_FAULTING)
> > > > +   dp->plaform_info.cpuid_faulting = true;
> > > > +   }
> > > > +
> > > > +   if (rc ||
> > > > !(val & MSR_PLATFORM_INFO_CPUID_FAULTING) ||
> > > > rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES,
> > > >this_cpu(msr_misc_features)))
> > > 
> > > Below here we have
> > > 
> > >   setup_clear_cpu_cap(X86_FEATURE_CPUID_FAULTING);
> > > 
> > > Shouldn't this be reflected in the host policy?
> > 
> > I guess the correct thing to do for now for host_msr_domain_policy is:
> > 
> > dp->plaform_info.cpuid_faulting = cpu_has_cpuid_faulting;
> > 
> > Looking at the code, calculate_pv_max_policy() will be simplified with
> > the above change: pv_max_msr_domain_policy will become a copy of host
> > policy.
> > 
> > This actually brings a question: what to do about per-pCPU MSRs in the
> > context of MSR policy?
> 
> How does per-pCPU-ness of an MSR affect the policy? Are you
> thinking of CPUs with different capabilities? We assume all CPUs
> are identical in many other places.

Yes, CPUs are assumed to be identical. But currently Xen checks
the presence of MISC_FEATURES_ENABLES (which is a per-pCPU MSR)
on the boot CPU, and it affects X86_FEATURE_CPUID_FAULTING, which
in its turn affects the presence of MISC_FEATURES_ENABLES for PV vCPUs.

So the actual question is: where to store the availability of
MISC_FEATURES_ENABLES (and possibly other per-pCPU msrs in the future)
and is it even needed to do so?

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/msr: add Raw and Host domain policies

2018-02-16 Thread Sergey Dyasli
On Thu, 2018-02-15 at 06:33 -0700, Jan Beulich wrote:
> > > > On 08.02.18 at 11:23,  wrote:
> > 
> > --- a/xen/arch/x86/cpu/common.c
> > +++ b/xen/arch/x86/cpu/common.c
> > @@ -118,9 +118,18 @@ void (* __read_mostly ctxt_switch_masking)(const 
> > struct vcpu *next);
> >  
> >  bool __init probe_cpuid_faulting(void)
> >  {
> > +   struct msr_domain_policy *dp = &raw_msr_domain_policy;
> 
> Unless you foresee the variable to be needed for further things
> here, could this be moved into the more narrow scope it's used in
> please?

Will do in v2.

> > uint64_t val;
> > +   int rc;
> >  
> > -   if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) ||
> > +   if ((rc = rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val)) == 0)
> > +   {
> > +   dp->plaform_info.available = true;
> > +   if (val & MSR_PLATFORM_INFO_CPUID_FAULTING)
> > +   dp->plaform_info.cpuid_faulting = true;
> > +   }
> > +
> > +   if (rc ||
> > !(val & MSR_PLATFORM_INFO_CPUID_FAULTING) ||
> > rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES,
> >this_cpu(msr_misc_features)))
> 
> Below here we have
> 
>   setup_clear_cpu_cap(X86_FEATURE_CPUID_FAULTING);
> 
> Shouldn't this be reflected in the host policy?

I guess the correct thing to do for now for host_msr_domain_policy is:

dp->plaform_info.cpuid_faulting = cpu_has_cpuid_faulting;

Looking at the code, calculate_pv_max_policy() will be simplified with
the above change: pv_max_msr_domain_policy will become a copy of host
policy.

This actually brings a question: what to do about per-pCPU MSRs in the
context of MSR policy?

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/msr: add Raw and Host domain policies

2018-02-08 Thread Sergey Dyasli
On Thu, 2018-02-08 at 11:21 +, Roger Pau Monné wrote:
> On Thu, Feb 08, 2018 at 10:23:21AM +0000, Sergey Dyasli wrote:
> > +static void __init calculate_host_policy(void)
> > +{
> > +struct msr_domain_policy *dp = &host_msr_domain_policy;
> > +
> > +*dp = raw_msr_domain_policy;
> 
> host_msr_domain_policy = raw_msr_domain_policy;
> 
> Should work AFAICT.

This is a template for the future. The code with *dp is much shorter
than with host_msr_domain_policy.

> 
> > diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
> > index 928f1cc454..8401d376c3 100644
> > --- a/xen/include/asm-x86/msr.h
> > +++ b/xen/include/asm-x86/msr.h
> > @@ -220,6 +220,14 @@ struct msr_domain_policy
> >  } plaform_info;
> >  };
> >  
> > +/* RAW msr domain policy: contains the actual values from H/W MSRs */
> > +extern msr_domain_policy raw_msr_domain_policy;
> > +/*
> > + * HOST msr domain policy: features that Xen actually decided to use,
> > + * a subset of RAW policy.
> > + */
> > +extern msr_domain_policy host_msr_domain_policy;
> 
> Aren't you missing a 'struct' here? I don't see any typedef for struct
> msr_domain_policy.

I don't know how I missed this. You are right, 'struct' is mandatory.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v1] x86/msr: add Raw and Host domain policies

2018-02-08 Thread Sergey Dyasli
Raw policy contains the actual values from H/W MSRs. Add PLATFORM_INFO
msr to the policy during probe_cpuid_faulting().

Host policy might have certain features disabled if Xen decides not
to use them. For now, make Host policy equal to Raw policy.

Signed-off-by: Sergey Dyasli 
---
v1: Decided to upstream this separately from VMX MSRs policy
---
 xen/arch/x86/cpu/common.c | 11 ++-
 xen/arch/x86/msr.c| 19 ++-
 xen/include/asm-x86/msr.h |  8 
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 4306e59650..0875b5478b 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -118,9 +118,18 @@ void (* __read_mostly ctxt_switch_masking)(const struct 
vcpu *next);
 
 bool __init probe_cpuid_faulting(void)
 {
+   struct msr_domain_policy *dp = &raw_msr_domain_policy;
uint64_t val;
+   int rc;
 
-   if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) ||
+   if ((rc = rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val)) == 0)
+   {
+   dp->plaform_info.available = true;
+   if (val & MSR_PLATFORM_INFO_CPUID_FAULTING)
+   dp->plaform_info.cpuid_faulting = true;
+   }
+
+   if (rc ||
!(val & MSR_PLATFORM_INFO_CPUID_FAULTING) ||
rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES,
   this_cpu(msr_misc_features)))
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 7875d9c1e0..fbc8cd47a7 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -24,12 +24,27 @@
 #include 
 #include 
 
-struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
+struct msr_domain_policy __read_mostly raw_msr_domain_policy,
+ __read_mostly host_msr_domain_policy,
+ __read_mostly hvm_max_msr_domain_policy,
  __read_mostly  pv_max_msr_domain_policy;
 
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static void __init calculate_raw_policy(void)
+{
+/* 0x00ce  MSR_INTEL_PLATFORM_INFO */
+/* Was already added by probe_cpuid_faulting() */
+}
+
+static void __init calculate_host_policy(void)
+{
+struct msr_domain_policy *dp = &host_msr_domain_policy;
+
+*dp = raw_msr_domain_policy;
+}
+
 static void __init calculate_hvm_max_policy(void)
 {
 struct msr_domain_policy *dp = &hvm_max_msr_domain_policy;
@@ -68,6 +83,8 @@ static void __init calculate_pv_max_policy(void)
 
 void __init init_guest_msr_policy(void)
 {
+calculate_raw_policy();
+calculate_host_policy();
 calculate_hvm_max_policy();
 calculate_pv_max_policy();
 }
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 928f1cc454..8401d376c3 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -220,6 +220,14 @@ struct msr_domain_policy
 } plaform_info;
 };
 
+/* RAW msr domain policy: contains the actual values from H/W MSRs */
+extern msr_domain_policy raw_msr_domain_policy;
+/*
+ * HOST msr domain policy: features that Xen actually decided to use,
+ * a subset of RAW policy.
+ */
+extern msr_domain_policy host_msr_domain_policy;
+
 /* MSR policy object for per-vCPU MSRs */
 struct msr_vcpu_policy
 {
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU

2018-01-10 Thread Sergey Dyasli
On Tue, 2018-01-09 at 09:28 -0700, Jan Beulich wrote:
> > > > On 09.01.18 at 16:43,  wrote:
> > 
> > On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
> > > > > > On 04.01.18 at 14:06,  wrote:
> > > > 
> > > > +size_t consoled_guest_rx(void)
> > > > +{
> > > > +size_t recv = 0, idx = 0;
> > > > +XENCONS_RING_IDX cons, prod;
> > > > +
> > > > +if ( !cons_ring )
> > > > +return 0;
> > > > +
> > > > +spin_lock(&rx_lock);
> > > > +
> > > > +cons = cons_ring->out_cons;
> > > > +prod = ACCESS_ONCE(cons_ring->out_prod);
> > > > +ASSERT((prod - cons) <= sizeof(cons_ring->out));
> > > > +
> > > > +/* Is the ring empty? */
> > > > +if ( cons == prod )
> > > > +goto out;
> > > > +
> > > > +/* Update pointers before accessing the ring */
> > > > +smp_rmb();
> > > 
> > > I think this need to move up ahead of the if(). In the comment
> > > perhaps s/Update/Latch/?
> > 
> > The read/write memory barriers here are between read/write accesses to
> > ring->out_prod and ring->out array. So there is no need to move them.
> > (the same goes for the input ring)
> 
> And there is no multiple-read issue here?

As Andrew has kindly explained to me, there is indeed an issue.
So I moved smp_rmb() to sit right after the cons and prod reads, and
updated the comment to say:

"Latch pointers before accessing the ring. Included compiler barrier also
ensures that pointers are really read only once into local variables."
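
For illustration, the resulting consumer-side ordering looks roughly
like this (a sketch, not the final committed hunk):

    cons = cons_ring->out_cons;
    prod = ACCESS_ONCE(cons_ring->out_prod);

    /*
     * Latch pointers before accessing the ring. Included compiler
     * barrier also ensures that pointers are really read only once
     * into local variables.
     */
    smp_rmb();

    ASSERT((prod - cons) <= sizeof(cons_ring->out));

    /* Is the ring empty? */
    if ( cons == prod )
        goto out;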

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU

2018-01-09 Thread Sergey Dyasli
On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
> > > > On 04.01.18 at 14:06,  wrote:
> > +size_t consoled_guest_rx(void)
> > +{
> > +size_t recv = 0, idx = 0;
> > +XENCONS_RING_IDX cons, prod;
> > +
> > +if ( !cons_ring )
> > +return 0;
> > +
> > +spin_lock(&rx_lock);
> > +
> > +cons = cons_ring->out_cons;
> > +prod = ACCESS_ONCE(cons_ring->out_prod);
> > +ASSERT((prod - cons) <= sizeof(cons_ring->out));
> > +
> > +/* Is the ring empty? */
> > +if ( cons == prod )
> > +goto out;
> > +
> > +/* Update pointers before accessing the ring */
> > +smp_rmb();
> 
> I think this need to move up ahead of the if(). In the comment
> perhaps s/Update/Latch/?

The read/write memory barriers here are between the read/write accesses
to ring->out_prod and the ring->out array, so there is no need to move
them (the same goes for the input ring).

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
