[Xen-devel] [PATCH V11 0/3] x86/hvm: pkeys, add memory protection-key support

2016-02-19 Thread Huaitong Han
Changes in v11:
*Move pkru_ad/pkru_wd variable initialization position.
*Undo v10 changes.

Changes in v10:
*Move PFEC_page_present check.

Changes in v9:
*Rename _write_cr4 to raw_write_cr4.
*Clear X86_FEATURE_OSPKE and X86_FEATURE_OSXSAVE when the condition is not 
satisfied.

Changes in v8:
*Add a comment describing paging_gva_to_gfn.
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete the outer parentheses around l3e_get_pkey.
*Use uint32_t for the first parameter of read_pkru_*.
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.
*Update SDM chapter comments for patch 4.
*Add hvm_vcpu check in sh_gva_to_gfn.
*Rebase in the latest tree for patch 5.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation for patch 5.

Changes in v6:
*2 patches merged are not included.
*Don't write XSTATE_PKRU to PV's xcr0.
*Use "if()" instead of "?:" in cpuid handling patch.
*Update read_pkru function.
*Use value 4 instead of CONFIG_PAGING_LEVELS.
*Add George's patch for PFEC_insn_fetch handling.

Changes in v5:
*Add static for opt_pku.
*Update commit message for some patches.
*Add condition 5 (the access is to a user page) to pkey_fault, and simplify the
#ifdef for the guest_walk_tables patch.
*Don't write XSTATE_PKRU to PV's xcr0.
*count == 0 is combined in hvm_cpuid function.

Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Make CPUID:ospke depend on guest cpuid instead of host hardware capability,
and move the cpuid patch to the end of the series.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn functions.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is reported by
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE (bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.
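
As an illustration (not part of the series), a minimal sketch of pulling the
key out of a 64-bit leaf entry; the mask and helper name here are made up for
the example:

    #include <stdint.h>

    #define PTE_PKEY_SHIFT 59
    #define PTE_PKEY_MASK  (0xfULL << PTE_PKEY_SHIFT)

    /* Return the 4-bit protection key held in bits 62:59 of a leaf entry. */
    static inline unsigned int pte_get_pkey(uint64_t pte)
    {
        return (pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT;
    }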

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).
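
The series adds read_pkru_ad()/read_pkru_wd() helpers for this layout; a
minimal standalone sketch of the same bit extraction (helper names here are
illustrative):

    #include <stdint.h>

    /* PKRU[2i] is ADi and PKRU[2i+1] is WDi, for protection key i. */
    static inline unsigned int pkru_ad(uint32_t pkru, unsigned int pkey)
    {
        return (pkru >> (2 * pkey)) & 1;
    }

    static inline unsigned int pkru_wd(uint32_t pkru, unsigned int pkey)
    {
        return (pkru >> (2 * pkey + 1)) & 1;
    }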

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.
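
A hedged sketch of the two instructions as GCC inline assembly (the series
itself only needs the read side, via read_pkru(); an assembler that knows the
mnemonics is assumed):

    #include <stdint.h>

    static inline uint32_t rdpkru(void)
    {
        uint32_t eax, edx;

        /* RDPKRU requires ECX = 0; PKRU is returned in EAX, EDX is zeroed. */
        asm volatile ( "rdpkru" : "=a" (eax), "=d" (edx) : "c" (0) );
        return eax;
    }

    static inline void wrpkru(uint32_t pkru)
    {
        /* WRPKRU writes EAX to PKRU and requires ECX = 0 and EDX = 0. */
        asm volatile ( "wrpkru" : : "a" (pkru), "c" (0), "d" (0) : "memory" );
    }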

PFEC.PK (bit 5) is defined as protection key violations.
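
In Xen's page-fault error code terms this is the PFEC_prot_key bit mentioned
in the changelogs; a tiny illustrative sketch (the helper name is made up):

    #define PFEC_prot_key (1U << 5)   /* PK: protection-key violation */

    static inline int is_pk_fault(unsigned int pfec)
    {
        return !!(pfec & PFEC_prot_key);
    }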

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.


Huaitong Han (3):
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for cpuid handling

 tools/libxc/xc_cpufeature.h   |  3 +++
 tools/libxc/xc_cpuid_x86.c|  6 +++--
 xen/arch/x86/hvm/hvm.c| 26 ++--
 xen/arch/x86/mm/guest_walk.c  | 52 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/arch/x86/xstate.c |  4 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 47 ++-
 xen/include/asm-x86/x86_64/page.h | 12 +
 xen/include/asm-x86/xstate.h  |  4 ++-
 12 files changed, 165 insertions(+), 11 deletions(-)

-- 
2.5.0




[Xen-devel] [PATCH V11 3/3] x86/hvm: pkeys, add pkeys support for cpuid handling

2016-02-19 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE.

X86_FEATURE_OSXSAVE depends on the guest's X86_FEATURE_XSAVE, but the
cpu_has_xsave check reflects the hypervisor's X86_FEATURE_XSAVE; this is
fixed as well.
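
For context, a small guest-side sketch (not part of the series) of probing the
two bits this patch exposes, leaf 7 sub-leaf 0, ECX bits 3 and 4, using GCC's
<cpuid.h>:

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

        /* CPUID.7.0: PKU is ECX[3], OSPKE is ECX[4]. */
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        printf("PKU   (hardware support)   : %u\n", (ecx >> 3) & 1);
        printf("OSPKE (OS set CR4.PKE)     : %u\n", (ecx >> 4) & 1);
        return 0;
    }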

Signed-off-by: Huaitong Han 
Reviewed-by: Jan Beulich 
Acked-by: Wei Liu 
---
Changes in v9:
*Clear X86_FEATURE_OSPKE and X86_FEATURE_OSXSAVE when the condition is not 
satisfied.

Changes in v7:
*Rebase in the latest tree.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation.

 tools/libxc/xc_cpufeature.h |  3 +++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 26 +++---
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index ee53679..866cf0b 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -144,4 +144,7 @@
 #define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB24 /* CLWB instruction */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
+
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c142595..5408dd0 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -430,9 +430,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_PCOMMIT) |
 bitmaskof(X86_FEATURE_CLWB) |
 bitmaskof(X86_FEATURE_CLFLUSHOPT));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5ec2ae1..73fb54c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4572,9 +4572,11 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
-*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
- cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+else
+*ecx &= ~cpufeat_mask(X86_FEATURE_OSXSAVE);
 
 /* Don't expose PCID to non-hap hvm. */
 if ( !hap_enabled(d) )
@@ -4593,16 +4595,26 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 if ( !cpu_has_smap )
 *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
 
-/* Don't expose MPX to hvm when VMX support is not available */
+/* Don't expose MPX to hvm when VMX support is not available. */
 if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
 
-/* Don't expose INVPCID to non-hap hvm. */
 if ( !hap_enabled(d) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+{
+ /* Don't expose INVPCID to non-hap hvm. */
+ *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+ /* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+ *ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+}
+
+if ( (*ecx & cpufeat_mask(X86_FEATURE_PKU)) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
+else
+*ecx &= ~cpufeat_mask(X86_FEATURE_OSPKE);
 
-/* Don't expose PCOMMIT to hvm when VMX support is not available */
+/* Don't expose PCOMMIT to hvm when VMX support is not available. */
 if ( !cpu_has_vmx_pcommit )
 *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
 }
-- 
2.5.0




[Xen-devel] [PATCH V11 2/3] x86/hvm: pkeys, add xstate support for pkeys

2016-02-19 Thread Huaitong Han
The XSAVE feature set can operate on PKRU state only if the feature set is
enabled (CR4.OSXSAVE = 1) and has been configured to manage PKRU state
(XCR0[9] = 1). In addition, XCR0.PKRU is not allowed to be set in PV mode,
where the PKU feature is not enabled.
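
For reference, a ring-0 sketch (not Xen code) of the architectural
prerequisite from a guest kernel's point of view, assuming CR4.OSXSAVE is
already set and CPUID reports PKU:

    #include <stdint.h>

    #define XSTATE_PKRU (1ULL << 9)

    static inline uint64_t xgetbv(uint32_t index)
    {
        uint32_t lo, hi;

        asm volatile ( "xgetbv" : "=a" (lo), "=d" (hi) : "c" (index) );
        return lo | ((uint64_t)hi << 32);
    }

    static inline void xsetbv(uint32_t index, uint64_t val)
    {
        asm volatile ( "xsetbv" : : "a" ((uint32_t)val),
                       "d" ((uint32_t)(val >> 32)), "c" (index) );
    }

    /* Add PKRU to the set of XSAVE-managed state (XCR0[9] = 1). */
    static inline void enable_xcr0_pkru(void)
    {
        xsetbv(0, xgetbv(0) | XSTATE_PKRU);
    }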

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
Reviewed-by: Kevin Tian 
---
Changes in v7:
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.

 xen/arch/x86/xstate.c| 4 
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 4e87ab3..50d9e48 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -579,6 +579,10 @@ int handle_xsetbv(u32 index, u64 new_bv)
 if ( (new_bv & ~xfeature_mask) || !valid_xcr0(new_bv) )
 return -EINVAL;
 
+/* XCR0.PKRU is disabled on PV mode. */
+if ( is_pv_vcpu(curr) && (new_bv & XSTATE_PKRU) )
+return -EOPNOTSUPP;
+
 if ( !set_xcr0(new_bv) )
 return -EFAULT;
 
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.5.0




[Xen-devel] [PATCH V11 1/3] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2016-02-19 Thread Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of
leaf entries of the page tables.

The PKRU register is 32 bits wide: it holds 16 protection domains with 2
attribute bits each. For each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable
bit for protection key i (ADi) and PKRU[2i+1] is the write-disable bit for
protection key i (WDi). PKEY is the index into these domains.

A fault is considered a PKU violation if all of the following conditions are
true:
1. CR4.PKE = 1.
2. EFER.LMA = 1.
3. The page is present with no reserved-bit violations.
4. The access is not an instruction fetch.
5. The access is to a user page.
6. PKRU.AD = 1, or
   the access is a data write and PKRU.WD = 1
   and either CR0.WP = 1 or it is a user access.
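
A condensed, standalone restatement of condition 6 (the pkey_fault() hunk
below implements the same logic inside the guest walker):

    #include <stdint.h>

    /* Evaluate condition 6 for one leaf entry; 1 means the access faults. */
    static int pkru_denies(uint32_t pkru, unsigned int pkey, int is_write,
                           int cr0_wp, int user_access)
    {
        int ad = (pkru >> (2 * pkey)) & 1;      /* ADi */
        int wd = (pkru >> (2 * pkey + 1)) & 1;  /* WDi */

        return ad || (wd && is_write && (cr0_wp || user_access));
    }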

Signed-off-by: Huaitong Han 
Reviewed-by: Jan Beulich 
Reviewed-by: Kevin Tian 
---
Changes in v11:
*Move pkru_ad/pkru_wd variable initialization position.
*Undo v10 changes.

Changes in v10:
*Move PFEC_page_present check.

Changes in v9:
*Rename _write_cr4 to raw_write_cr4.

Changes in v8:
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete the outer parentheses around l3e_get_pkey.
*Use uint32_t for the first parameter of read_pkru_*.

 xen/arch/x86/mm/guest_walk.c  | 52 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 47 ++-
 xen/include/asm-x86/x86_64/page.h | 12 +
 7 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..01a64ae 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,52 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey)
+{
+uint32_t pkru;
+
+/* When page isn't present,  PKEY isn't checked. */
+if ( !(pfec & PFEC_page_present) || is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.Page is present with no reserved bit violations.
+ * 4.The access is not an instruction fetch.
+ * 5.The access is to a user page.
+ * 6.PKRU.AD=1 or
+ *  the access is a data write and PKRU.WD=1 and
+ *  either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) ||
+ !hvm_long_mode_enabled(vcpu) ||
+ (pfec & PFEC_reserved_bit) ||
+ (pfec & PFEC_insn_fetch) ||
+ !(pte_flags & _PAGE_USER) )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+bool_t pkru_ad = read_pkru_ad(pkru, pte_pkey);
+bool_t pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && (pfec & PFEC_write_access) &&
+(hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +153,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
 #endif
+unsigned int pkey;
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
 bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +237,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -261,6 +309,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #endif /* All levels... */
 
+pkey = guest_l2e_get_pkey(gw->l2e);
 gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -324,6 +373,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 if(l1p == NULL)
 goto out;
 gw->l1e = l1p[guest_l1_table_offset(va)];
+pkey = guest_l1e_get_pkey(gw->l1e);
 gflags = guest_l1e_get_fl

[Xen-devel] [PATCH V10 4/5] xen/mm: Clean up pfec handling in gva_to_gfn

2016-02-03 Thread Huaitong Han
From: George Dunlap 

At the moment, the pfec argument to gva_to_gfn has two functions:

* To inform guest_walk what kind of access is happening

* As a value to pass back into the guest in the event of a fault.

Unfortunately this is not quite treated consistently: the hvm_fetch_*
functions "pre-clear" the PFEC_insn_fetch flag before calling
gva_to_gfn; meaning guest_walk doesn't actually know whether a given
access is an instruction fetch or not.  This works now, but will cause
issues when pkeys are introduced, since guest_walk will need to know
whether an access is an instruction fetch even if it doesn't return
PFEC_insn_fetch.

Fix this by making a clean separation for in and out functionalities
of the pfec argument:

1. Always pass in the access type to gva_to_gfn

2. Filter out inappropriate access flags before returning from gva_to_gfn.

(The PFEC_insn_fetch flag should only be passed to the guest if either NX or
SMEP is enabled.  See Intel 64 Developer's Manual, Volume 3, Chapter Paging,
PAGE-FAULT EXCEPTIONS)
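
As a standalone illustration of step 2 (the placeholder flags stand in for the
hvm_nx_enabled()/hvm_smep_enabled() checks used in the patch below):

    #include <stdint.h>

    #define PFEC_insn_fetch (1U << 4)

    /*
     * Filter the "out" value: only report an instruction fetch to the guest
     * when the guest architecture (NX or SMEP enabled) can generate it.
     */
    static uint32_t filter_pfec_for_guest(uint32_t pfec, int nx, int smep)
    {
        if ( !nx && !smep )
            pfec &= ~PFEC_insn_fetch;
        return pfec;
    }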

Signed-off-by: George Dunlap 
Signed-off-by: Huaitong Han 
Acked-by: Jan Beulich 
Acked-by: Tim Deegan 
---
Changes in v8:
*Add a comment describing paging_gva_to_gfn.

Changes in v7:
*Update SDM chapter comments.
*Add hvm_vcpu check in sh_gva_to_gfn.

 xen/arch/x86/hvm/hvm.c   |  8 ++--
 xen/arch/x86/mm/hap/guest_walk.c | 10 +-
 xen/arch/x86/mm/shadow/multi.c   |  6 ++
 xen/include/asm-x86/paging.h |  6 +-
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 674feea..5ec2ae1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4438,11 +4438,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
 enum hvm_copy_result hvm_fetch_from_guest_virt(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
@@ -4464,11 +4462,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 49d0328..d2716f9 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -82,7 +82,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( !top_page )
 {
 pfec[0] &= ~PFEC_page_present;
-return INVALID_GFN;
+goto out_tweak_pfec;
 }
 top_mfn = _mfn(page_to_mfn(top_page));
 
@@ -139,6 +139,14 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( missing & _PAGE_SHARED )
 pfec[0] = PFEC_page_shared;
 
+ out_tweak_pfec:
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
+
 return INVALID_GFN;
 }
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 162c06f..d42597c 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3669,6 +3669,12 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 pfec[0] &= ~PFEC_page_present;
 if ( missing & _PAGE_INVALID_BITS )
 pfec[0] |= PFEC_reserved_bit;
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
 return INVALID_GFN;
 }
 gfn = guest_walk_to_gfn(&gw);
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 9a8653d..195fe8f 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -255,7 +255,11 @@ static inline bool_t paging_invlpg(struct vcpu *v, 
unsigned long va)
  * tables don't map this address for this kind of access.
  * pfec[0] is used to determine which kind of access thi

[Xen-devel] [PATCH V10 5/5] x86/hvm: pkeys, add pkeys support for cpuid handling

2016-02-03 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE.

X86_FEATURE_OSXSAVE depends on the guest's X86_FEATURE_XSAVE, but the
cpu_has_xsave check reflects the hypervisor's X86_FEATURE_XSAVE; this is
fixed as well.

Signed-off-by: Huaitong Han 
Reviewed-by: Jan Beulich 
---
Changes in v9:
*Clear X86_FEATURE_OSPKE and X86_FEATURE_OSXSAVE when the condition is not 
satisfied.

Changes in v7:
*Rebase in the latest tree.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation.

 tools/libxc/xc_cpufeature.h |  3 +++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 26 +++---
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index ee53679..866cf0b 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -144,4 +144,7 @@
 #define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB24 /* CLWB instruction */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
+
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c142595..5408dd0 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -430,9 +430,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_PCOMMIT) |
 bitmaskof(X86_FEATURE_CLWB) |
 bitmaskof(X86_FEATURE_CLFLUSHOPT));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5ec2ae1..73fb54c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4572,9 +4572,11 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
-*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
- cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+else
+*ecx &= ~cpufeat_mask(X86_FEATURE_OSXSAVE);
 
 /* Don't expose PCID to non-hap hvm. */
 if ( !hap_enabled(d) )
@@ -4593,16 +4595,26 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 if ( !cpu_has_smap )
 *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
 
-/* Don't expose MPX to hvm when VMX support is not available */
+/* Don't expose MPX to hvm when VMX support is not available. */
 if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
 
-/* Don't expose INVPCID to non-hap hvm. */
 if ( !hap_enabled(d) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+{
+ /* Don't expose INVPCID to non-hap hvm. */
+ *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+ /* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+ *ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+}
+
+if ( (*ecx & cpufeat_mask(X86_FEATURE_PKU)) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
+else
+*ecx &= ~cpufeat_mask(X86_FEATURE_OSPKE);
 
-/* Don't expose PCOMMIT to hvm when VMX support is not available */
+/* Don't expose PCOMMIT to hvm when VMX support is not available. */
 if ( !cpu_has_vmx_pcommit )
 *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
 }
-- 
2.4.3




[Xen-devel] [PATCH V10 2/5] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2016-02-03 Thread Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of
leaf entries of the page tables.

The PKRU register is 32 bits wide: it holds 16 protection domains with 2
attribute bits each. For each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable
bit for protection key i (ADi) and PKRU[2i+1] is the write-disable bit for
protection key i (WDi). PKEY is the index into these domains.

A fault is considered a PKU violation if all of the following conditions are
true:
1. CR4.PKE = 1.
2. EFER.LMA = 1.
3. The page is present with no reserved-bit violations.
4. The access is not an instruction fetch.
5. The access is to a user page.
6. PKRU.AD = 1, or
   the access is a data write and PKRU.WD = 1
   and either CR0.WP = 1 or it is a user access.

Signed-off-by: Huaitong Han 
Reviewed-by: Jan Beulich 
Reviewed-by: Kevin Tian 
---
Changes in v10:
*Move PFEC_page_present check.

Changes in v9:
*Rename _write_cr4 to raw_write_cr4.

Changes in v8:
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete the outer parentheses around l3e_get_pkey.
*Use uint32_t for the first parameter of read_pkru_*.

 xen/arch/x86/mm/guest_walk.c  | 53 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 47 +-
 xen/include/asm-x86/x86_64/page.h | 12 +
 7 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..4a6d292 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,53 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey)
+{
+uint32_t pkru = 0;
+bool_t pkru_ad = 0, pkru_wd = 0;
+
+if ( is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.Page is present with no reserved bit violations.
+ * 4.The access is not an instruction fetch.
+ * 5.The access is to a user page.
+ * 6.PKRU.AD=1 or
+ *  the access is a data write and PKRU.WD=1 and
+ *  either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) ||
+ !hvm_long_mode_enabled(vcpu) ||
+ !(pfec & PFEC_page_present) ||
+ (pfec & PFEC_reserved_bit) ||
+ (pfec & PFEC_insn_fetch) ||
+ !(pte_flags & _PAGE_USER) )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+pkru_ad = read_pkru_ad(pkru, pte_pkey);
+pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && (pfec & PFEC_write_access) &&
+(hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +154,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
 #endif
+unsigned int pkey;
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
 bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +238,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -261,6 +310,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #endif /* All levels... */
 
+pkey = guest_l2e_get_pkey(gw->l2e);
 gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -324,6 +374,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 if(l1p == NULL)
 goto out;
 gw->l1e = l1p[guest_l1_table_offset(va)];
+pkey = guest_l1e_get_pkey(gw->l1e);
 gflags = guest_l1e_get_flags(gw->l1e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -334,6 +385,8 @@ guest

[Xen-devel] [PATCH V10 3/5] x86/hvm: pkeys, add xstate support for pkeys

2016-02-03 Thread Huaitong Han
The XSAVE feature set can operate on PKRU state only if the feature set is
enabled (CR4.OSXSAVE = 1) and has been configured to manage PKRU state
(XCR0[9] = 1). In addition, XCR0.PKRU is not allowed to be set in PV mode,
where the PKU feature is not enabled.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
Reviewed-by: Kevin Tian 
---
Changes in v7:
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.

 xen/arch/x86/xstate.c| 4 
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 4e87ab3..50d9e48 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -579,6 +579,10 @@ int handle_xsetbv(u32 index, u64 new_bv)
 if ( (new_bv & ~xfeature_mask) || !valid_xcr0(new_bv) )
 return -EINVAL;
 
+/* XCR0.PKRU is disabled on PV mode. */
+if ( is_pv_vcpu(curr) && (new_bv & XSTATE_PKRU) )
+return -EOPNOTSUPP;
+
 if ( !set_xcr0(new_bv) )
 return -EFAULT;
 
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.4.3




[Xen-devel] [PATCH V10 1/5] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2016-02-03 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. Xen always uses
paging mode to emulate guest non-paging mode, so to emulate the hardware
behavior, pkeys need to be manually disabled when the guest switches to
non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
Acked-by: Kevin Tian 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 04dde83..a0d51cb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1368,12 +1368,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3




[Xen-devel] [PATCH V10 0/5] x86/hvm: pkeys, add memory protection-key support

2016-02-03 Thread Huaitong Han
Changes in v10:
*Move PFEC_page_present check.

Changes in v9:
*Rename _write_cr4 to raw_write_cr4.
*Clear X86_FEATURE_OSPKE and X86_FEATURE_OSXSAVE when the condition is not 
satisfied.

Changes in v8:
*Add a comment describing paging_gva_to_gfn.
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete the outer parentheses around l3e_get_pkey.
*Use uint32_t for the first parameter of read_pkru_*.
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.
*Update SDM chapter comments for patch 4.
*Add hvm_vcpu check in sh_gva_to_gfn.
*Rebase in the latest tree for patch 5.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation for patch 5.

Changes in v6:
*2 patches merged are not included.
*Don't write XSTATE_PKRU to PV's xcr0.
*Use "if()" instead of "?:" in cpuid handling patch.
*Update read_pkru function.
*Use value 4 instead of CONFIG_PAGING_LEVELS.
*Add George's patch for PFEC_insn_fetch handling.

Changes in v5:
*Add static for opt_pku.
*Update commit message for some patches.
*Add condition 5 (the access is to a user page) to pkey_fault, and simplify the
#ifdef for the guest_walk_tables patch.
*Don't write XSTATE_PKRU to PV's xcr0.
*count == 0 is combined in hvm_cpuid function.

Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Make CPUID:ospke depend on guest cpuid instead of host hardware capability,
and move the cpuid patch to the end of the series.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn functions.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is reported by
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE (bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

George Dunlap (1):
  xen/mm: Clean up pfec handling in gva_to_gfn

Huaitong Han (4):
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for cpuid handling

 tools/libxc/xc_cpufeature.h   |  3 +++
 tools/libxc/xc_cpuid_x86.c|  6 +++--
 xen/arch/x86/hvm/hvm.c| 34 +++--
 xen/arch/x86/hvm/vmx/vmx.c| 11 
 xen/arch/x86/mm/guest_walk.c  | 53 +++
 xen/arch/x86/mm/hap/guest_walk.c  | 13 +-
 xen/arch/x86/mm/shadow/multi.c|  6 +
 xen/arch/x86/xstate.c |  4 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/paging.h  |  6 -
 xen/include/asm-x86/processor.h   | 47 +-
 xen/include/asm-x86/x86_64/page.h | 12 +
 xen/include/asm-x86/xstate.h  |  4 ++-
 15 files changed, 

[Xen-devel] [PATCH V9 2/5] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2016-02-03 Thread Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of
leaf entries of the page tables.

The PKRU register is 32 bits wide: it holds 16 protection domains with 2
attribute bits each. For each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable
bit for protection key i (ADi) and PKRU[2i+1] is the write-disable bit for
protection key i (WDi). PKEY is the index into these domains.

A fault is considered a PKU violation if all of the following conditions are
true:
1. CR4.PKE = 1.
2. EFER.LMA = 1.
3. The page is present with no reserved-bit violations.
4. The access is not an instruction fetch.
5. The access is to a user page.
6. PKRU.AD = 1, or
   the access is a data write and PKRU.WD = 1
   and either CR0.WP = 1 or it is a user access.

Signed-off-by: Huaitong Han 
Reviewed-by: Jan Beulich 
---
Changes in v9:
*Rename _write_cr4 to raw_write_cr4.

Changes in v8:
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete the outer parentheses around l3e_get_pkey.
*Use uint32_t for the first parameter of read_pkru_*.

 xen/arch/x86/mm/guest_walk.c  | 54 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 47 +-
 xen/include/asm-x86/x86_64/page.h | 12 +
 7 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..5eb 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,54 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey)
+{
+uint32_t pkru = 0;
+bool_t pkru_ad = 0, pkru_wd = 0;
+
+/* When page isn't present,  PKEY isn't checked. */
+if ( !(pfec & PFEC_page_present) || is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.Page is present with no reserved bit violations.
+ * 4.The access is not an instruction fetch.
+ * 5.The access is to a user page.
+ * 6.PKRU.AD=1 or
+ *  the access is a data write and PKRU.WD=1 and
+ *  either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) ||
+ !hvm_long_mode_enabled(vcpu) ||
+ /* The present bit is guaranteed by the caller. */
+ (pfec & PFEC_reserved_bit) ||
+ (pfec & PFEC_insn_fetch) ||
+ !(pte_flags & _PAGE_USER) )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+pkru_ad = read_pkru_ad(pkru, pte_pkey);
+pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && (pfec & PFEC_write_access) &&
+(hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +155,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
 #endif
+unsigned int pkey;
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
 bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +239,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -261,6 +311,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #endif /* All levels... */
 
+pkey = guest_l2e_get_pkey(gw->l2e);
 gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -324,6 +375,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 if(l1p == NULL)
 goto out;
 gw->l1e = l1p[guest_l1_table_offset(va)];
+pkey = guest_l1e_get_pkey(gw->l1e);
 gflags = guest_l1e_get_flags(gw->l1e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc 

[Xen-devel] [PATCH V9 4/5] xen/mm: Clean up pfec handling in gva_to_gfn

2016-02-03 Thread Huaitong Han
From: George Dunlap 

At the moment, the pfec argument to gva_to_gfn has two functions:

* To inform guest_walk what kind of access is happening

* As a value to pass back into the guest in the event of a fault.

Unfortunately this is not quite treated consistently: the hvm_fetch_*
functions "pre-clear" the PFEC_insn_fetch flag before calling
gva_to_gfn; meaning guest_walk doesn't actually know whether a given
access is an instruction fetch or not.  This works now, but will cause
issues when pkeys are introduced, since guest_walk will need to know
whether an access is an instruction fetch even if it doesn't return
PFEC_insn_fetch.

Fix this by making a clean separation for in and out functionalities
of the pfec argument:

1. Always pass in the access type to gva_to_gfn

2. Filter out inappropriate access flags before returning from gva_to_gfn.

(The PFEC_insn_fetch flag should only be passed to the guest if either NX or
SMEP is enabled.  See Intel 64 Developer's Manual, Volume 3, Chapter Paging,
PAGE-FAULT EXCEPTIONS)

Signed-off-by: George Dunlap 
Signed-off-by: Huaitong Han 
Acked-by: Jan Beulich 
---
Changes in v8:
*Add a comment describing paging_gva_to_gfn.

Changes in v7:
*Update SDM chapter comments.
*Add hvm_vcpu check in sh_gva_to_gfn.

 xen/arch/x86/hvm/hvm.c   |  8 ++--
 xen/arch/x86/mm/hap/guest_walk.c | 10 +-
 xen/arch/x86/mm/shadow/multi.c   |  6 ++
 xen/include/asm-x86/paging.h |  6 +-
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 674feea..5ec2ae1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4438,11 +4438,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
 enum hvm_copy_result hvm_fetch_from_guest_virt(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
@@ -4464,11 +4462,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 49d0328..d2716f9 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -82,7 +82,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( !top_page )
 {
 pfec[0] &= ~PFEC_page_present;
-return INVALID_GFN;
+goto out_tweak_pfec;
 }
 top_mfn = _mfn(page_to_mfn(top_page));
 
@@ -139,6 +139,14 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( missing & _PAGE_SHARED )
 pfec[0] = PFEC_page_shared;
 
+ out_tweak_pfec:
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
+
 return INVALID_GFN;
 }
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 162c06f..d42597c 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3669,6 +3669,12 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 pfec[0] &= ~PFEC_page_present;
 if ( missing & _PAGE_INVALID_BITS )
 pfec[0] |= PFEC_reserved_bit;
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
 return INVALID_GFN;
 }
 gfn = guest_walk_to_gfn(&gw);
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 9a8653d..195fe8f 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -255,7 +255,11 @@ static inline bool_t paging_invlpg(struct vcpu *v, 
unsigned long va)
  * tables don't map this address for this kind of access.
  * pfec[0] is used to determine which kind of access this is when
  * walking t

[Xen-devel] [PATCH V9 3/5] x86/hvm: pkeys, add xstate support for pkeys

2016-02-03 Thread Huaitong Han
The XSAVE feature set can operate on PKRU state only if the feature set is
enabled (CR4.OSXSAVE = 1) and has been configured to manage PKRU state
(XCR0[9] = 1). In addition, XCR0.PKRU is not allowed to be set in PV mode,
where the PKU feature is not enabled.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
Changes in v7:
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.

 xen/arch/x86/xstate.c| 4 
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 4e87ab3..50d9e48 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -579,6 +579,10 @@ int handle_xsetbv(u32 index, u64 new_bv)
 if ( (new_bv & ~xfeature_mask) || !valid_xcr0(new_bv) )
 return -EINVAL;
 
+/* XCR0.PKRU is disabled on PV mode. */
+if ( is_pv_vcpu(curr) && (new_bv & XSTATE_PKRU) )
+return -EOPNOTSUPP;
+
 if ( !set_xcr0(new_bv) )
 return -EFAULT;
 
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.4.3




[Xen-devel] [PATCH V9 1/5] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2016-02-03 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. Xen always uses
paging mode to emulate guest non-paging mode, so to emulate the hardware
behavior, pkeys need to be manually disabled when the guest switches to
non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 04dde83..a0d51cb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1368,12 +1368,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3




[Xen-devel] [PATCH V9 0/5] x86/hvm: pkeys, add memory protection-key support

2016-02-03 Thread Huaitong Han
Changes in v9:
*Rename _write_cr4 to raw_write_cr4.
*Clear X86_FEATURE_OSPKE and X86_FEATURE_OSXSAVE when the condition is not 
satisfied.

Changes in v8:
*Add a comment describing paging_gva_to_gfn.
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete the outer parentheses around l3e_get_pkey.
*Use uint32_t for the first parameter of read_pkru_*.
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.
*Update SDM chapter comments for patch 4.
*Add hvm_vcpu check in sh_gva_to_gfn.
*Rebase in the latest tree for patch 5.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation for patch 5.

Changes in v6:
*2 patches merged are not included.
*Don't write XSTATE_PKRU to PV's xcr0.
*Use "if()" instead of "?:" in cpuid handling patch.
*Update read_pkru function.
*Use value 4 instead of CONFIG_PAGING_LEVELS.
*Add George's patch for PFEC_insn_fetch handling.

Changes in v5:
*Add static for opt_pku.
*Update commit message for some patches.
*Add condition 5 (the access is to a user page) to pkey_fault, and simplify the
#ifdef for the guest_walk_tables patch.
*Don't write XSTATE_PKRU to PV's xcr0.
*count == 0 is combined in hvm_cpuid function.

Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Make CPUID:ospke depend on guest cpuid instead of host hardware capability,
and move the cpuid patch to the end of the series.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn functions.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is reported by
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE (bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

George Dunlap (1):
  xen/mm: Clean up pfec handling in gva_to_gfn

Huaitong Han (4):
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for cpuid handling

 tools/libxc/xc_cpufeature.h   |  3 +++
 tools/libxc/xc_cpuid_x86.c|  6 +++--
 xen/arch/x86/hvm/hvm.c| 34 ++--
 xen/arch/x86/hvm/vmx/vmx.c| 11 
 xen/arch/x86/mm/guest_walk.c  | 54 +++
 xen/arch/x86/mm/hap/guest_walk.c  | 13 +-
 xen/arch/x86/mm/shadow/multi.c|  6 +
 xen/arch/x86/xstate.c |  4 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/paging.h  |  6 -
 xen/include/asm-x86/processor.h   | 47 +-
 xen/include/asm-x86/x86_64/page.h | 12 +
 xen/include/asm-x86/xstate.h  |  4 ++-
 15 files changed, 195 insertions(+), 24 deletions(-)

-- 
2.4.3



[Xen-devel] [PATCH V9 5/5] x86/hvm: pkeys, add pkeys support for cpuid handling

2016-02-03 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE.

X86_FEATURE_OSXSAVE depends on the guest's X86_FEATURE_XSAVE, but the
cpu_has_xsave check reflects the hypervisor's X86_FEATURE_XSAVE; this is
fixed as well.

Signed-off-by: Huaitong Han 
---
Changes in v9:
*Clear X86_FEATURE_OSPKE and X86_FEATURE_OSXSAVE when the condition is not 
satisfied.

Changes in v7:
*Rebase in the latest tree.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation.

 tools/libxc/xc_cpufeature.h |  3 +++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 26 +++---
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index ee53679..866cf0b 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -144,4 +144,7 @@
 #define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB24 /* CLWB instruction */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
+
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c142595..5408dd0 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -430,9 +430,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_PCOMMIT) |
 bitmaskof(X86_FEATURE_CLWB) |
 bitmaskof(X86_FEATURE_CLFLUSHOPT));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5ec2ae1..73fb54c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4572,9 +4572,11 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
-*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
- cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+else
+*ecx &= ~cpufeat_mask(X86_FEATURE_OSXSAVE);
 
 /* Don't expose PCID to non-hap hvm. */
 if ( !hap_enabled(d) )
@@ -4593,16 +4595,26 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 if ( !cpu_has_smap )
 *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
 
-/* Don't expose MPX to hvm when VMX support is not available */
+/* Don't expose MPX to hvm when VMX support is not available. */
 if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
 
-/* Don't expose INVPCID to non-hap hvm. */
 if ( !hap_enabled(d) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+{
+ /* Don't expose INVPCID to non-hap hvm. */
+ *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+ /* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+ *ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+}
+
+if ( (*ecx & cpufeat_mask(X86_FEATURE_PKU)) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
+else
+*ecx &= ~cpufeat_mask(X86_FEATURE_OSPKE);
 
-/* Don't expose PCOMMIT to hvm when VMX support is not available */
+/* Don't expose PCOMMIT to hvm when VMX support is not available. */
 if ( !cpu_has_vmx_pcommit )
 *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
 }
-- 
2.4.3




[Xen-devel] [PATCH V8 4/5] xen/mm: Clean up pfec handling in gva_to_gfn

2016-02-01 Thread Huaitong Han
From: George Dunlap 

At the moment, the pfec argument to gva_to_gfn has two functions:

* To inform guest_walk what kind of access is happening

* As a value to pass back into the guest in the event of a fault.

Unfortunately this is not quite treated consistently: the hvm_fetch_*
function will "pre-clear" the PFEC_insn_fetch flag before calling
gva_to_gfn; meaning guest_walk doesn't actually know whether a given
access is an instruction fetch or not.  This works now, but will cause
issues when pkeys are introduced, since guest_walk will need to know
whether an access is an instruction fetch even if it doesn't return
PFEC_insn_fetch.

Fix this by making a clean separation for in and out functionalities
of the pfec argument:

1. Always pass in the access type to gva_to_gfn

2. Filter out inappropriate access flags before returning from gva_to_gfn.

(The PFEC_insn_fetch flag should only be passed to the guest if either NX or
SMEP is enabled.  See Intel 64 Developer's Manual, Volume 3, Chapter Paging,
PAGE-FAULT EXCEPTIONS)

Signed-off-by: George Dunlap 
Signed-off-by: Huaitong Han 
Acked-by: Jan Beulich 
---
Changes in v8:
*Add the comment describing for paging_gva_to_gfn.

Changes in v7:
*Update SDM chapter comments.
*Add hvm_vcpu check in sh_gva_to_gfn.

 xen/arch/x86/hvm/hvm.c   |  8 ++--
 xen/arch/x86/mm/hap/guest_walk.c | 10 +-
 xen/arch/x86/mm/shadow/multi.c   |  6 ++
 xen/include/asm-x86/paging.h |  6 +-
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 674feea..5ec2ae1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4438,11 +4438,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
 enum hvm_copy_result hvm_fetch_from_guest_virt(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
@@ -4464,11 +4462,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 49d0328..d2716f9 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -82,7 +82,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( !top_page )
 {
 pfec[0] &= ~PFEC_page_present;
-return INVALID_GFN;
+goto out_tweak_pfec;
 }
 top_mfn = _mfn(page_to_mfn(top_page));
 
@@ -139,6 +139,14 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( missing & _PAGE_SHARED )
 pfec[0] = PFEC_page_shared;
 
+ out_tweak_pfec:
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
+
 return INVALID_GFN;
 }
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 162c06f..d42597c 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3669,6 +3669,12 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 pfec[0] &= ~PFEC_page_present;
 if ( missing & _PAGE_INVALID_BITS )
 pfec[0] |= PFEC_reserved_bit;
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
 return INVALID_GFN;
 }
 gfn = guest_walk_to_gfn(&gw);
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 9a8653d..195fe8f 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -255,7 +255,11 @@ static inline bool_t paging_invlpg(struct vcpu *v, 
unsigned long va)
  * tables don't map this address for this kind of access.
  * pfec[0] is used to determine which kind of access this is when
  * walking t

[Xen-devel] [PATCH V8 3/5] x86/hvm: pkeys, add xstate support for pkeys

2016-02-01 Thread Huaitong Han
The XSAVE feature set can operate on PKRU state only if the feature set is
enabled (CR4.OSXSAVE = 1) and has been configured to manage PKRU state
(XCR0[9] = 1). XCR0.PKRU is kept disabled for PV guests, which do not have
the PKU feature enabled.
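
For context, the guest-side step that eventually reaches handle_xsetbv() below is
an XSETBV with ECX = 0 that ORs bit 9 into XCR0. A rough, hedged sketch of that
step (the EX_/ex_ names are illustrative; it assumes CPL 0 and CR4.OSXSAVE = 1):

    #include <stdint.h>

    #define EX_XSTATE_PKRU  (1ULL << 9)     /* XCR0 bit for PKRU state */

    /* XSETBV writes EDX:EAX into the XCR selected by ECX. */
    static inline void ex_xsetbv(uint32_t index, uint64_t value)
    {
        asm volatile ( "xsetbv"
                       :
                       : "a" ((uint32_t)value),
                         "d" ((uint32_t)(value >> 32)),
                         "c" (index) );
    }

    /* Usage: ex_xsetbv(0, old_xcr0 | EX_XSTATE_PKRU);  (traps to Xen for HVM) */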

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
Changes in v7:
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.

 xen/arch/x86/xstate.c| 4 
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 4e87ab3..50d9e48 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -579,6 +579,10 @@ int handle_xsetbv(u32 index, u64 new_bv)
 if ( (new_bv & ~xfeature_mask) || !valid_xcr0(new_bv) )
 return -EINVAL;
 
+/* XCR0.PKRU is disabled on PV mode. */
+if ( is_pv_vcpu(curr) && (new_bv & XSTATE_PKRU) )
+return -EOPNOTSUPP;
+
 if ( !set_xcr0(new_bv) )
 return -EFAULT;
 
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.4.3




[Xen-devel] [PATCH V8 0/5] x86/hvm: pkeys, add memory protection-key support

2016-02-01 Thread Huaitong Han
Changes in v8:
*Add the comment describing for paging_gva_to_gfn.
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete l3e_get_pkey the outer parentheses.
*The first parameter of read_pkru_* use uint32_t type.
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.
*Update SDM chapter comments for patch 4.
*Add hvm_vcpu check in sh_gva_to_gfn.
*Rebase in the latest tree for patch 5.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation for patch 5.

Changes in v6:
*2 patches merged are not included.
*Don't write XSTATE_PKRU to PV's xcr0.
*Use "if()" instead of "?:" in cpuid handling patch.
*Update read_pkru function.
*Use value 4 instead of CONFIG_PAGING_LEVELS.
*Add George's patch for PFEC_insn_fetch handling.

Changes in v5:
*Add static for opt_pku.
*Update commit message for some patches.
*Add condition 5:the access is to a user page to pkey_fault, and simplify #ifdef
for guest_walk_tables patch.
*Don't write XSTATE_PKRU to PV's xcr0.
*count == 0 is combined in hvm_cpuid function.

Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Get CPUID:ospke depend on guest cpuid instead of host hardware capable,
and Move cpuid patch to the last of patches.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn funcitons.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.
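
As a concrete illustration of the PKRU layout and RDPKRU usage described above,
here is a minimal sketch; the ex_-prefixed helpers are illustrative only and are
not the names this series adds to processor.h:

    #include <stdint.h>

    /* RDPKRU (opcode 0f 01 ee): ECX must be 0, PKRU is returned in EAX, EDX is cleared. */
    static inline uint32_t ex_read_pkru(void)
    {
        uint32_t pkru, tmp;

        asm volatile ( ".byte 0x0f,0x01,0xee"
                       : "=a" (pkru), "=d" (tmp)
                       : "c" (0) );
        return pkru;
    }

    /* For key i: PKRU[2i] is the access-disable bit, PKRU[2i+1] the write-disable bit. */
    static inline uint32_t ex_pkru_ad(uint32_t pkru, uint32_t key)
    {
        return (pkru >> (2 * key)) & 1;
    }

    static inline uint32_t ex_pkru_wd(uint32_t pkru, uint32_t key)
    {
        return (pkru >> (2 * key + 1)) & 1;
    }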

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

George Dunlap (1):
  xen/mm: Clean up pfec handling in gva_to_gfn

Huaitong Han (4):
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for cpuid handling

 tools/libxc/xc_cpufeature.h   |  3 +++
 tools/libxc/xc_cpuid_x86.c|  6 +++--
 xen/arch/x86/hvm/hvm.c| 26 +++
 xen/arch/x86/hvm/vmx/vmx.c| 11 
 xen/arch/x86/mm/guest_walk.c  | 54 +++
 xen/arch/x86/mm/hap/guest_walk.c  | 13 +-
 xen/arch/x86/mm/shadow/multi.c|  6 +
 xen/arch/x86/xstate.c |  4 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/paging.h  |  6 -
 xen/include/asm-x86/processor.h   | 47 +-
 xen/include/asm-x86/x86_64/page.h | 12 +
 xen/include/asm-x86/xstate.h  |  4 ++-
 15 files changed, 189 insertions(+), 22 deletions(-)

-- 
2.4.3




[Xen-devel] [PATCH V8 1/5] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2016-02-01 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. However, Xen always uses
paging mode to emulate guest non-paging mode; to emulate this behavior, pkeys
need to be manually disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
Changes in v7:
no changes.

 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 04dde83..a0d51cb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1368,12 +1368,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3




[Xen-devel] [PATCH V8 5/5] x86/hvm: pkeys, add pkeys support for cpuid handling

2016-02-01 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE.

X86_FEATURE_OSXSAVE depends on the guest's X86_FEATURE_XSAVE, but the
cpu_has_xsave check reflects the hypervisor's X86_FEATURE_XSAVE, so it is
fixed here as well.

Signed-off-by: Huaitong Han 
---
Changes in v7:
*Rebase in the latest tree.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation.

 tools/libxc/xc_cpufeature.h |  3 +++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 18 +-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index ee53679..866cf0b 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -144,4 +144,7 @@
 #define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB24 /* CLWB instruction */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
+
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c142595..5408dd0 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -430,9 +430,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_PCOMMIT) |
 bitmaskof(X86_FEATURE_CLWB) |
 bitmaskof(X86_FEATURE_CLFLUSHOPT));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5ec2ae1..1389173 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4572,7 +4572,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
 *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
  cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
 
@@ -4593,16 +4593,24 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 if ( !cpu_has_smap )
 *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
 
-/* Don't expose MPX to hvm when VMX support is not available */
+/* Don't expose MPX to hvm when VMX support is not available. */
 if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
 
-/* Don't expose INVPCID to non-hap hvm. */
 if ( !hap_enabled(d) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+{
+ /* Don't expose INVPCID to non-hap hvm. */
+ *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+ /* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+ *ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+}
+
+if ( (*ecx & cpufeat_mask(X86_FEATURE_PKU)) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
 
-/* Don't expose PCOMMIT to hvm when VMX support is not available */
+/* Don't expose PCOMMIT to hvm when VMX support is not available. */
 if ( !cpu_has_vmx_pcommit )
 *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
 }
-- 
2.4.3




[Xen-devel] [PATCH V8 2/5] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2016-02-01 Thread Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of
leaf entries of the page tables.

The PKRU register defines 32 bits: there are 16 domains and 2 attribute bits per
domain in PKRU. For each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable bit for
protection key i (ADi); PKRU[2i+1] is the write-disable bit for protection key
i (WDi). PKEY is an index into a defined domain.
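
Since the key sits in bits 62:59 of a leaf entry, extracting it is a shift and
mask. A minimal sketch follows; the EX_/ex_ names are illustrative and not
necessarily what the x86_64/page.h hunk of this patch defines:

    #include <stdint.h>

    #define EX_PTE_PKEY_SHIFT  59
    #define EX_PTE_PKEY_MASK   0xfULL       /* 4-bit protection key, bits 62:59 */

    /* Return the protection-key field of a leaf page-table entry. */
    static inline unsigned int ex_pte_pkey(uint64_t pte)
    {
        return (unsigned int)((pte >> EX_PTE_PKEY_SHIFT) & EX_PTE_PKEY_MASK);
    }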

A fault is considered as a PKU violation if all of the following conditions are
true:
1.CR4_PKE=1.
2.EFER_LMA=1.
3.Page is present with no reserved bit violations.
4.The access is not an instruction fetch.
5.The access is to a user page.
6.PKRU.AD=1
or The access is a data write and PKRU.WD=1
and either CR0.WP=1 or it is a user access.

Signed-off-by: Huaitong Han 
---
Changes in v8:
*Abstract out _write_cr4.

Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete l3e_get_pkey the outer parentheses.
*The first parameter of read_pkru_* use uint32_t type.


 xen/arch/x86/mm/guest_walk.c  | 54 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 47 +-
 xen/include/asm-x86/x86_64/page.h | 12 +
 7 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..5eb 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,54 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey)
+{
+uint32_t pkru = 0;
+bool_t pkru_ad = 0, pkru_wd = 0;
+
+/* When page isn't present,  PKEY isn't checked. */
+if ( !(pfec & PFEC_page_present) || is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.Page is present with no reserved bit violations.
+ * 4.The access is not an instruction fetch.
+ * 5.The access is to a user page.
+ * 6.PKRU.AD=1 or
+ *  the access is a data write and PKRU.WD=1 and
+ *  either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) ||
+ !hvm_long_mode_enabled(vcpu) ||
+ /* The present bit is guaranteed by the caller. */
+ (pfec & PFEC_reserved_bit) ||
+ (pfec & PFEC_insn_fetch) ||
+ !(pte_flags & _PAGE_USER) )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+pkru_ad = read_pkru_ad(pkru, pte_pkey);
+pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && (pfec & PFEC_write_access) &&
+(hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode))) )
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +155,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
 #endif
+unsigned int pkey;
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
 bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +239,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -261,6 +311,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #endif /* All levels... */
 
+pkey = guest_l2e_get_pkey(gw->l2e);
 gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -324,6 +375,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 if(l1p == NULL)
 goto out;
 gw->l1e = l1p[guest_l1_table_offset(va)];
+pkey = guest_l1e_get_pkey(gw->l1e);
 gflags = guest_l1e_get_flags(gw->l1e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -334,6 +386,8 @@ guest_walk_tables(struct vcpu *v, stru

[Xen-devel] [PATCH V7 1/5] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2016-01-27 Thread Huaitong Han
Changes in v7:
no changes.


This patch disables pkeys for guests in non-paging mode. However, Xen always uses
paging mode to emulate guest non-paging mode; to emulate this behavior, pkeys
need to be manually disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 04dde83..a0d51cb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1368,12 +1368,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3




[Xen-devel] [PATCH V7 4/5] xen/mm: Clean up pfec handling in gva_to_gfn

2016-01-27 Thread Huaitong Han
From: George Dunlap 

Changes in v7:
*Update SDM chapter comments.
*Add hvm_vcpu check in sh_gva_to_gfn.
---

At the moment, the pfec argument to gva_to_gfn has two functions:

* To inform guest_walk what kind of access is happening

* As a value to pass back into the guest in the event of a fault.

Unfortunately this is not quite treated consistently: the hvm_fetch_*
function will "pre-clear" the PFEC_insn_fetch flag before calling
gva_to_gfn; meaning guest_walk doesn't actually know whether a given
access is an instruction fetch or not.  This works now, but will cause
issues when pkeys are introduced, since guest_walk will need to know
whether an access is an instruction fetch even if it doesn't return
PFEC_insn_fetch.

Fix this by making a clean separation for in and out functionalities
of the pfec argument:

1. Always pass in the access type to gva_to_gfn

2. Filter out inappropriate access flags before returning from gva_to_gfn.

(The PFEC_insn_fetch flag should only be passed to the guest if either NX or
SMEP is enabled.  See Intel 64 Developer's Manual, Volume 3, Chapter Paging,
PAGE-FAULT EXCEPTIONS)

Signed-off-by: George Dunlap 
Signed-off-by: Huaitong Han 
Acked-by: Jan Beulich 
---
 xen/arch/x86/hvm/hvm.c   |  8 ++--
 xen/arch/x86/mm/hap/guest_walk.c | 10 +-
 xen/arch/x86/mm/shadow/multi.c   |  6 ++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 674feea..5ec2ae1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4438,11 +4438,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
 enum hvm_copy_result hvm_fetch_from_guest_virt(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
@@ -4464,11 +4462,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 49d0328..d2716f9 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -82,7 +82,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( !top_page )
 {
 pfec[0] &= ~PFEC_page_present;
-return INVALID_GFN;
+goto out_tweak_pfec;
 }
 top_mfn = _mfn(page_to_mfn(top_page));
 
@@ -139,6 +139,14 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( missing & _PAGE_SHARED )
 pfec[0] = PFEC_page_shared;
 
+ out_tweak_pfec:
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
+
 return INVALID_GFN;
 }
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 162c06f..d42597c 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3669,6 +3669,12 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 pfec[0] &= ~PFEC_page_present;
 if ( missing & _PAGE_INVALID_BITS )
 pfec[0] |= PFEC_reserved_bit;
+/*
+ * SDM Intel 64 Volume 3, Chapter Paging, PAGE-FAULT EXCEPTIONS:
+ * The PFEC_insn_fetch flag is set only when NX or SMEP are enabled.
+ */
+if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
 return INVALID_GFN;
 }
 gfn = guest_walk_to_gfn(&gw);
-- 
2.4.3




[Xen-devel] [PATCH V7 3/5] x86/hvm: pkeys, add xstate support for pkeys

2016-01-27 Thread Huaitong Han
Changes in v7:
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.
---

The XSAVE feature set can operate on PKRU state only if the feature set is
enabled (CR4.OSXSAVE = 1) and has been configured to manage PKRU state
(XCR0[9] = 1). XCR0.PKRU is kept disabled for PV guests, which do not have
the PKU feature enabled.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/xstate.c| 4 
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 4e87ab3..50d9e48 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -579,6 +579,10 @@ int handle_xsetbv(u32 index, u64 new_bv)
 if ( (new_bv & ~xfeature_mask) || !valid_xcr0(new_bv) )
 return -EINVAL;
 
+/* XCR0.PKRU is disabled on PV mode. */
+if ( is_pv_vcpu(curr) && (new_bv & XSTATE_PKRU) )
+return -EOPNOTSUPP;
+
 if ( !set_xcr0(new_bv) )
 return -EFAULT;
 
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.4.3




[Xen-devel] [PATCH V7 0/5] x86/hvm: pkeys, add memory protection-key support

2016-01-27 Thread Huaitong Han
Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete l3e_get_pkey the outer parentheses.
*The first parameter of read_pkru_* use uint32_t type.
*Use EOPNOTSUPP instead of EINVAL as return value on is_pv_vcpu condition.
*Update SDM chapter comments for patch 4.
*Add hvm_vcpu check in sh_gva_to_gfn.
*Rebase in the latest tree for patch 5.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation for patch 5.

Changes in v6:
*2 patches merged are not included.
*Don't write XSTATE_PKRU to PV's xcr0.
*Use "if()" instead of "?:" in cpuid handling patch.
*Update read_pkru function.
*Use value 4 instead of CONFIG_PAGING_LEVELS.
*Add George's patch for PFEC_insn_fetch handling.

Changes in v5:
*Add static for opt_pku.
*Update commit message for some patches.
*Add condition 5:the access is to a user page to pkey_fault, and simplify #ifdef
for guest_walk_tables patch.
*Don't write XSTATE_PKRU to PV's xcr0.
*count == 0 is combined in hvm_cpuid function.

Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Get CPUID:ospke depend on guest cpuid instead of host hardware capable,
and Move cpuid patch to the last of patches.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn funcitons.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

George Dunlap (1):
  xen/mm: Clean up pfec handling in gva_to_gfn

Huaitong Han (4):
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for cpuid handling

 tools/libxc/xc_cpufeature.h   |  3 +++
 tools/libxc/xc_cpuid_x86.c|  6 +++--
 xen/arch/x86/hvm/hvm.c| 26 +++
 xen/arch/x86/hvm/vmx/vmx.c| 11 
 xen/arch/x86/mm/guest_walk.c  | 54 +++
 xen/arch/x86/mm/hap/guest_walk.c  | 13 +-
 xen/arch/x86/mm/shadow/multi.c|  6 +
 xen/arch/x86/xstate.c |  4 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 40 +
 xen/include/asm-x86/x86_64/page.h | 12 +
 xen/include/asm-x86/xstate.h  |  4 ++-
 14 files changed, 178 insertions(+), 20 deletions(-)

-- 
2.4.3




[Xen-devel] [PATCH V7 5/5] x86/hvm: pkeys, add pkeys support for cpuid handling

2016-01-27 Thread Huaitong Han
Changes in v7:
*Rebase in the latest tree.
*Add a comment for cpu_has_xsave adjustment.
*Adjust indentation.
---

This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE.

X86_FEATURE_OSXSAVE depends on the guest's X86_FEATURE_XSAVE, but the
cpu_has_xsave check reflects the hypervisor's X86_FEATURE_XSAVE, so it is
fixed here as well.

Signed-off-by: Huaitong Han 
---
 tools/libxc/xc_cpufeature.h |  3 +++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 18 +-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index ee53679..866cf0b 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -144,4 +144,7 @@
 #define X86_FEATURE_CLFLUSHOPT  23 /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB24 /* CLWB instruction */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
+
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c142595..5408dd0 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -430,9 +430,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_PCOMMIT) |
 bitmaskof(X86_FEATURE_CLWB) |
 bitmaskof(X86_FEATURE_CLFLUSHOPT));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5ec2ae1..1389173 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4572,7 +4572,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
 *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
  cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
 
@@ -4593,16 +4593,24 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 if ( !cpu_has_smap )
 *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
 
-/* Don't expose MPX to hvm when VMX support is not available */
+/* Don't expose MPX to hvm when VMX support is not available. */
 if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
 
-/* Don't expose INVPCID to non-hap hvm. */
 if ( !hap_enabled(d) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+{
+ /* Don't expose INVPCID to non-hap hvm. */
+ *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+ /* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+ *ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+}
+
+if ( (*ecx & cpufeat_mask(X86_FEATURE_PKU)) &&
+ (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
 
-/* Don't expose PCOMMIT to hvm when VMX support is not available */
+/* Don't expose PCOMMIT to hvm when VMX support is not available. */
 if ( !cpu_has_vmx_pcommit )
 *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
 }
-- 
2.4.3




[Xen-devel] [PATCH V7 2/5] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2016-01-27 Thread Huaitong Han
Changes in v7:
*Add static for pkey_fault.
*Add a comment for page present check and adjust indentation.
*Init pkru_ad and pkru_wd.
*Delete l3e_get_pkey the outer parentheses.
*The first parameter of read_pkru_* use uint32_t type.


Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of
leaf entries of the page tables.

The PKRU register defines 32 bits: there are 16 domains and 2 attribute bits per
domain in PKRU. For each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable bit for
protection key i (ADi); PKRU[2i+1] is the write-disable bit for protection key
i (WDi). PKEY is an index into a defined domain.

A fault is considered as a PKU violation if all of the following conditions are
true:
1.CR4_PKE=1.
2.EFER_LMA=1.
3.Page is present with no reserved bit violations.
4.The access is not an instruction fetch.
5.The access is to a user page.
6.PKRU.AD=1
or The access is a data write and PKRU.WD=1
and either CR0.WP=1 or it is a user access.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/mm/guest_walk.c  | 54 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 40 +
 xen/include/asm-x86/x86_64/page.h | 12 +
 7 files changed, 128 insertions(+)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..5eb 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,54 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+static bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey)
+{
+uint32_t pkru = 0;
+bool_t pkru_ad = 0, pkru_wd = 0;
+
+/* When page isn't present,  PKEY isn't checked. */
+if ( !(pfec & PFEC_page_present) || is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.Page is present with no reserved bit violations.
+ * 4.The access is not an instruction fetch.
+ * 5.The access is to a user page.
+ * 6.PKRU.AD=1 or
+ *  the access is a data write and PKRU.WD=1 and
+ *  either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) ||
+ !hvm_long_mode_enabled(vcpu) ||
+ /* The present bit is guaranteed by the caller. */
+ (pfec & PFEC_reserved_bit) ||
+ (pfec & PFEC_insn_fetch) ||
+ !(pte_flags & _PAGE_USER) )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+pkru_ad = read_pkru_ad(pkru, pte_pkey);
+pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && (pfec & PFEC_write_access) &&
+(hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode))) )
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +155,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
 #endif
+unsigned int pkey;
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
 bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +239,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -261,6 +311,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #endif /* All levels... */
 
+pkey = guest_l2e_get_pkey(gw->l2e);
 gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -324,6 +375,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 if(l1p == NULL)
 goto out;
 gw->l1e = l1p[guest_l1_table_offset(va)];
+pkey = guest_l1e_get_pkey(gw->l1e);
 gflags = guest_l1e_get_flags(gw->l1e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -334,6 +386,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #if GUEST_PAGING_LEVELS >

[Xen-devel] [PATCH V6 1/5] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2016-01-18 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. However, Xen always uses
paging mode to emulate guest non-paging mode; to emulate this behavior, pkeys
need to be manually disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b918b8a..9a8cfb5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1368,12 +1368,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V6 0/5] x86/hvm: pkeys, add memory protection-key support

2016-01-18 Thread Huaitong Han
Changes in v6:
*2 patches merged are not included.
*Don't write XSTATE_PKRU to PV's xcr0.
*Use "if()" instead of "?:" in cpuid handling patch.
*Update read_pkru function.
*Use value 4 instead of CONFIG_PAGING_LEVELS.
*Add George's patch for PFEC_insn_fetch handling.

Changes in v5:
*Add static for opt_pku.
*Update commit message for some patches.
*Add condition 5:the access is to a user page to pkey_fault, and simplify #ifdef
for guest_walk_tables patch.
*Don't write XSTATE_PKRU to PV's xcr0.
*count == 0 is combined in hvm_cpuid function.

Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Get CPUID:ospke depend on guest cpuid instead of host hardware capable,
and Move cpuid patch to the last of patches.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn funcitons.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

Huaitong Han (5):
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  xen/mm: Clean up pfec handling in gva_to_gfn
  x86/hvm: pkeys, add pkeys support for cpuid handling

 tools/libxc/xc_cpufeature.h   |  2 ++
 tools/libxc/xc_cpuid_x86.c|  6 +++--
 xen/arch/x86/hvm/hvm.c| 44 ++--
 xen/arch/x86/hvm/vmx/vmx.c| 11 
 xen/arch/x86/mm/guest_walk.c  | 53 +++
 xen/arch/x86/mm/hap/guest_walk.c  | 13 +-
 xen/arch/x86/mm/shadow/multi.c|  6 +
 xen/arch/x86/xstate.c |  4 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 40 +
 xen/include/asm-x86/x86_64/page.h | 12 +
 xen/include/asm-x86/xstate.h  |  4 ++-
 14 files changed, 186 insertions(+), 28 deletions(-)

-- 
2.4.3




[Xen-devel] [PATCH V6 5/5] x86/hvm: pkeys, add pkeys support for cpuid handling

2016-01-18 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects the setting of CR4.PKE.

Signed-off-by: Huaitong Han 
---
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 36 +++-
 3 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index c3ddc80..f6a9778 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -141,5 +141,7 @@
 #define X86_FEATURE_ADX 19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP20 /* Supervisor Mode Access Protection */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
 
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 8882c01..1ce979b 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -427,9 +427,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_ADX)  |
 bitmaskof(X86_FEATURE_SMAP) |
 bitmaskof(X86_FEATURE_FSGSBASE));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 688d200..39123fe 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4566,7 +4566,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
 *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
  cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
 
@@ -4579,21 +4579,31 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
 break;
 case 0x7:
-if ( (count == 0) && !cpu_has_smep )
-*ebx &= ~cpufeat_mask(X86_FEATURE_SMEP);
+if ( count == 0 )
+{
+if ( !cpu_has_smep )
+*ebx &= ~cpufeat_mask(X86_FEATURE_SMEP);
 
-if ( (count == 0) && !cpu_has_smap )
-*ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
+if ( !cpu_has_smap )
+*ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
 
-/* Don't expose MPX to hvm when VMX support is not available */
-if ( (count == 0) &&
- (!(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
-  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS)) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
+/* Don't expose MPX to hvm when VMX support is not available. */
+if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
+!(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
+*ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
 
-/* Don't expose INVPCID to non-hap hvm. */
-if ( (count == 0) && !hap_enabled(d) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+if ( !hap_enabled(d) )
+{
+/* Don't expose INVPCID to non-hap hvm. */
+*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+/* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+*ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+}
+
+if ( (*ecx & cpufeat_mask(X86_FEATURE_PKU)) &&
+(v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) )
+*ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
+}
 break;
 case 0xb:
 /* Fix the x2APIC identifier. */
-- 
2.4.3




[Xen-devel] [PATCH V6 4/5] xen/mm: Clean up pfec handling in gva_to_gfn

2016-01-18 Thread Huaitong Han
At the moment, the pfec argument to gva_to_gfn has two functions:

* To inform guest_walk what kind of access is happening

* As a value to pass back into the guest in the event of a fault.

Unfortunately this is not quite treated consistently: the hvm_fetch_*
function will "pre-clear" the PFEC_insn_fetch flag before calling
gva_to_gfn; meaning guest_walk doesn't actually know whether a given
access is an instruction fetch or not.  This works now, but will cause
issues when pkeys are introduced, since guest_walk will need to know
whether an access is an instruction fetch even if it doesn't return
PFEC_insn_fetch.

Fix this by making a clean separation for in and out functionalities
of the pfec argument:

1. Always pass in the access type to gva_to_gfn

2. Filter out inappropriate access flags before returning from gva_to_gfn.

(The PFEC_insn_fetch flag should only be passed to the guest if either NX or
SMEP is enabled.  See Intel 64 Developer's Manual, Volume 3, Section 4.7.)

Signed-off-by: George Dunlap 
Signed-off-by: Huaitong Han 
---
 xen/arch/x86/hvm/hvm.c   |  8 ++--
 xen/arch/x86/mm/hap/guest_walk.c | 10 +-
 xen/arch/x86/mm/shadow/multi.c   |  6 ++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21470ec..688d200 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4432,11 +4432,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
 enum hvm_copy_result hvm_fetch_from_guest_virt(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
@@ -4458,11 +4456,9 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
 void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-if ( hvm_nx_enabled(current) || hvm_smep_enabled(current) )
-pfec |= PFEC_insn_fetch;
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | PFEC_insn_fetch | pfec);
 }
 
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 49d0328..3eb8597 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -82,7 +82,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( !top_page )
 {
 pfec[0] &= ~PFEC_page_present;
-return INVALID_GFN;
+goto out_tweak_pfec;
 }
 top_mfn = _mfn(page_to_mfn(top_page));
 
@@ -139,6 +139,14 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
 if ( missing & _PAGE_SHARED )
 pfec[0] = PFEC_page_shared;
 
+out_tweak_pfec:
+/*
+ * Intel 64 Volume 3, Section 4.7: The PFEC_insn_fetch flag is set
+ * only when NX or SMEP are enabled.
+ */
+if ( !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
+
 return INVALID_GFN;
 }
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 58f7e72..bbbc706 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3668,6 +3668,12 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 pfec[0] &= ~PFEC_page_present;
 if ( missing & _PAGE_INVALID_BITS )
 pfec[0] |= PFEC_reserved_bit;
+/*
+ * Intel 64 Volume 3, Section 4.7: The PFEC_insn_fetch flag is
+ * set only when NX or SMEP are enabled.
+ */
+if ( !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
+pfec[0] &= ~PFEC_insn_fetch;
 return INVALID_GFN;
 }
 gfn = guest_walk_to_gfn(&gw);
-- 
2.4.3




[Xen-devel] [PATCH V6 2/5] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2016-01-18 Thread Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of
leaf entries of the page tables.

The PKRU register defines 32 bits: there are 16 domains and 2 attribute bits per
domain in PKRU. For each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable bit for
protection key i (ADi); PKRU[2i+1] is the write-disable bit for protection key
i (WDi). PKEY is an index into a defined domain.

A fault is considered as a PKU violation if all of the following conditions are
true:
1.CR4_PKE=1.
2.EFER_LMA=1.
3.Page is present with no reserved bit violations.
4.The access is not an instruction fetch.
5.The access is to a user page.
6.PKRU.AD=1
or The access is a data write and PKRU.WD=1
and either CR0.WP=1 or it is a user access.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/mm/guest_walk.c  | 53 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 +++
 xen/include/asm-x86/guest_pt.h| 12 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 
 xen/include/asm-x86/processor.h   | 40 +
 xen/include/asm-x86/x86_64/page.h | 12 +
 7 files changed, 127 insertions(+)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..dfee43f 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,53 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey)
+{
+unsigned int pkru = 0;
+bool_t pkru_ad, pkru_wd;
+
+/* When page isn't present,  PKEY isn't checked. */
+if ( !(pfec & PFEC_page_present) || is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.Page is present with no reserved bit violations.
+ * 4.The access is not an instruction fetch.
+ * 5.The access is to a user page.
+ * 6.PKRU.AD=1 or
+ *  the access is a data write and PKRU.WD=1 and
+ *  either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) ||
+!hvm_long_mode_enabled(vcpu) ||
+(pfec & PFEC_reserved_bit) ||
+(pfec & PFEC_insn_fetch) ||
+!(pte_flags & _PAGE_USER) )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+pkru_ad = read_pkru_ad(pkru, pte_pkey);
+pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && (pfec & PFEC_write_access) &&
+(hvm_wp_enabled(vcpu) || (pfec & PFEC_user_mode
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +154,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
 #endif
+unsigned int pkey;
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
 bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +238,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -261,6 +310,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #endif /* All levels... */
 
+pkey = guest_l2e_get_pkey(gw->l2e);
 gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -324,6 +374,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 if(l1p == NULL)
 goto out;
 gw->l1e = l1p[guest_l1_table_offset(va)];
+pkey = guest_l1e_get_pkey(gw->l1e);
 gflags = guest_l1e_get_flags(gw->l1e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -334,6 +385,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 set_ad:
+if ( pkey_fault(v, pfec, gflags, pkey) )
+rc |= _PAGE_PKEY_BITS;
 #endif
 /* Now re-invert the user-mode requirement for SMEP and SMAP */
 if ( smep || smap )
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 11c1b35..

[Xen-devel] [PATCH V6 3/5] x86/hvm: pkeys, add xstate support for pkeys

2016-01-18 Thread Huaitong Han
The XSAVE feature set can operate on PKRU state only if the feature set is
enabled (CR4.OSXSAVE = 1) and has been configured to manage PKRU state
(XCR0[9] = 1). In addition, XCR0.PKRU is kept disabled for PV guests, since
the PKU feature is not enabled for them.
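
As a hedged, guest-side illustration (not part of this hypervisor patch),
enabling PKRU management amounts to setting bit 9 of XCR0 via XSETBV once
CR4.OSXSAVE is set; that is exactly the value handle_xsetbv() below ends up
validating. The sketch assumes ring-0 code and an assembler that knows the
xgetbv/xsetbv mnemonics.

#include <stdint.h>

/* Illustrative only: turn on XCR0.PKRU (bit 9) in addition to whatever is
 * already enabled. */
static inline void enable_xcr0_pkru(void)
{
    uint32_t lo, hi;

    asm volatile ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
    lo |= 1u << 9;                           /* XSTATE_PKRU */
    asm volatile ("xsetbv" : : "a" (lo), "d" (hi), "c" (0));
}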

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/xstate.c| 4 
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 4e87ab3..b79b20b 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -579,6 +579,10 @@ int handle_xsetbv(u32 index, u64 new_bv)
 if ( (new_bv & ~xfeature_mask) || !valid_xcr0(new_bv) )
 return -EINVAL;
 
+/* XCR0.PKRU is disabled on PV mode. */
+if ( is_pv_vcpu(curr) && (new_bv & XSTATE_PKRU) )
+return -EINVAL;
+
 if ( !set_xcr0(new_bv) )
 return -EFAULT;
 
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.4.3




[Xen-devel] [PATCH V5 6/6] x86/hvm: pkeys, add pkeys support for cpuid handling

2015-12-22 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects whether CR4.PKE has been set.
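
As an illustration of what a guest would observe (not part of the patch), the
two bits can be probed from CPUID leaf 7, sub-leaf 0; the program below is a
standalone sketch using GCC's cpuid.h helper.

#include <cpuid.h>
#include <stdio.h>

/* Illustrative only: report CPUID.7.0.ECX[3] (PKU) and ECX[4] (OSPKE). */
int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    printf("PKU: %u  OSPKE: %u\n", (ecx >> 3) & 1, (ecx >> 4) & 1);
    return 0;
}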

Signed-off-by: Huaitong Han 
---
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 36 +++-
 3 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index c3ddc80..f6a9778 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -141,5 +141,7 @@
 #define X86_FEATURE_ADX 19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP20 /* Supervisor Mode Access Protection */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
 
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 8882c01..1ce979b 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -427,9 +427,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_ADX)  |
 bitmaskof(X86_FEATURE_SMAP) |
 bitmaskof(X86_FEATURE_FSGSBASE));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 59916ed..076313b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4572,7 +4572,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
 *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
  cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
 
@@ -4585,21 +4585,31 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
 break;
 case 0x7:
-if ( (count == 0) && !cpu_has_smep )
-*ebx &= ~cpufeat_mask(X86_FEATURE_SMEP);
+if ( count == 0 )
+{
+if ( !cpu_has_smep )
+*ebx &= ~cpufeat_mask(X86_FEATURE_SMEP);
 
-if ( (count == 0) && !cpu_has_smap )
-*ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
+if ( !cpu_has_smap )
+*ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
 
-/* Don't expose MPX to hvm when VMX support is not available */
-if ( (count == 0) &&
- (!(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
-  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS)) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
+/* Don't expose MPX to hvm when VMX support is not available */
+if (!(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
+  !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS))
+*ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
 
-/* Don't expose INVPCID to non-hap hvm. */
-if ( (count == 0) && !hap_enabled(d) )
-*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+if ( !hap_enabled(d) )
+{
+/* Don't expose INVPCID to non-hap hvm. */
+*ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+/* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+*ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+}
+
+if ( *ecx & cpufeat_mask(X86_FEATURE_PKU))
+*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) ?
+ cpufeat_mask(X86_FEATURE_OSPKE) : 0;
+}
 break;
 case 0xb:
 /* Fix the x2APIC identifier. */
-- 
2.4.3




[Xen-devel] [PATCH V5 3/6] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2015-12-22 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. Hardware disables
pkeys when the CPU is in non-paging mode, but Xen always uses paging mode to
emulate guest non-paging mode; to emulate that behavior, pkeys need to be
manually disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2581e97..7123912 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1380,12 +1380,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3




[Xen-devel] [PATCH V5 1/6] x86/hvm: pkeys, add the flag to enable Memory Protection Keys

2015-12-22 Thread Huaitong Han
This patch adds the "pku" command-line flag to enable Memory Protection Keys,
and updates the command-line markdown documentation accordingly.
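
For example, appending pku=0 to the Xen command line turns the feature off on
PKU-capable hardware; the option defaults to true.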

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 docs/misc/xen-command-line.markdown | 10 ++
 xen/arch/x86/cpu/common.c   | 10 +-
 xen/include/asm-x86/cpufeature.h|  6 +-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index c103894..36ecf80 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1177,6 +1177,16 @@ This option can be specified more than once (up to 8 
times at present).
 ### ple\_window
 > `= `
 
+### pku
+> `= `
+
+> Default: `true`
+
+Flag to enable Memory Protection Keys.
+
+The protection-key feature provides an additional mechanism by which IA-32e
+paging controls access to usermode addresses.
+
 ### psr (Intel)
 > `= List of ( cmt: | rmid_max: | cat: | 
 > cos_max: | cdp: )`
 
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 310ec85..a018855 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -22,6 +22,10 @@ boolean_param("xsave", use_xsave);
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
 
+/* pku: Flag to enable Memory Protection Keys (default on). */
+static bool_t opt_pku = 1;
+boolean_param("pku", opt_pku);
+
 unsigned int opt_cpuid_mask_ecx = ~0u;
 integer_param("cpuid_mask_ecx", opt_cpuid_mask_ecx);
 unsigned int opt_cpuid_mask_edx = ~0u;
@@ -270,7 +274,8 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 
*c)
if ( c->cpuid_level >= 0x0007 )
cpuid_count(0x0007, 0, &tmp,

&c->x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)],
-   &tmp, &tmp);
+   &c->x86_capability[cpufeat_word(X86_FEATURE_PKU)],
+   &tmp);
 }
 
 /*
@@ -323,6 +328,9 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
if ( cpu_has_xsave )
xstate_init(c);
 
+   if ( !opt_pku )
+   setup_clear_cpu_cap(X86_FEATURE_PKU);
+
/*
 * The vendor-specific functions might have changed features.  Now
 * we do "generic changes."
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index af127cf..ef96514 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -11,7 +11,7 @@
 
 #include 
 
-#define NCAPINTS   8   /* N 32-bit words worth of info */
+#define NCAPINTS   9   /* N 32-bit words worth of info */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */
 #define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */
@@ -163,6 +163,10 @@
 #define X86_FEATURE_ADX(7*32+19) /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP   (7*32+20) /* Supervisor Mode Access Prevention 
*/
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 8 */
+#define X86_FEATURE_PKU(8*32+ 3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE  (8*32+ 4) /* OS Protection Keys Enable */
+
 #define cpufeat_word(idx)  ((idx) / 32)
 #define cpufeat_bit(idx)   ((idx) % 32)
 #define cpufeat_mask(idx)  (_AC(1, U) << cpufeat_bit(idx))
-- 
2.4.3




[Xen-devel] [PATCH V5 5/6] x86/hvm: pkeys, add xstate support for pkeys

2015-12-22 Thread Huaitong Han
This patch adds xstate support for pkeys.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/xstate.c| 3 ++-
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index b65da38..baa4b58 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -487,7 +487,8 @@ void xstate_init(struct cpuinfo_x86 *c)
  * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size.
  */
 set_in_cr4(X86_CR4_OSXSAVE);
-if ( !set_xcr0(feature_mask) )
+/* PKU is disabled on PV mode. */
+if ( !set_xcr0(feature_mask & ~XSTATE_PKRU) )
 BUG();
 
 if ( bsp )
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.4.3




[Xen-devel] [PATCH V5 4/6] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2015-12-22 Thread Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59
of leaf entries of the page tables.

The PKRU register is 32 bits wide and covers 16 protection-key domains with 2
attribute bits each: for each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable
bit for protection key i (ADi) and PKRU[2i+1] is the write-disable bit for
protection key i (WDi). PKEY is an index into one of these domains.

A fault is considered a PKU violation if all of the following conditions are
true:
1.CR4_PKE=1.
2.EFER_LMA=1.
3.Page is present with no reserved bit violations.
4.The access is not an instruction fetch.
5.The access is to a user page.
6.PKRU.AD=1, or the access is a data write and PKRU.WD=1 and either CR0.WP=1
or it is a user access.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/mm/guest_walk.c  | 64 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 ++
 xen/include/asm-x86/guest_pt.h| 12 
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 +++
 xen/include/asm-x86/processor.h   | 39 
 xen/include/asm-x86/x86_64/page.h | 12 
 7 files changed, 137 insertions(+)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..9cdd607 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,57 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+extern bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey);
+#if GUEST_PAGING_LEVELS == CONFIG_PAGING_LEVELS
+bool_t pkey_fault(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_flags, uint32_t pte_pkey)
+{
+unsigned int pkru = 0;
+bool_t pkru_ad, pkru_wd;
+
+bool_t pf = !!(pfec & PFEC_page_present);
+bool_t uf = !!(pfec & PFEC_user_mode);
+bool_t wf = !!(pfec & PFEC_write_access);
+bool_t ff = !!(pfec & PFEC_insn_fetch);
+bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
+
+/* When page isn't present,  PKEY isn't checked. */
+if ( !pf || is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.page is present with no reserved bit violations.
+ * 4.the access is not an instruction fetch.
+ * 5.the access is to a user page.
+ * 6.PKRU.AD=1
+ *   or The access is a data write and PKRU.WD=1
+ *and either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) || !hvm_long_mode_enabled(vcpu) ||
+rsvdf || ff || !(pte_flags & _PAGE_USER) )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+pkru_ad = read_pkru_ad(pkru, pte_pkey);
+pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && wf && (hvm_wp_enabled(vcpu) || uf)))
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -107,6 +158,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
 #endif
+unsigned int pkey;
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
 bool_t pse1G = 0, pse2M = 0;
@@ -190,6 +242,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -199,6 +252,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse1G = (gflags & _PAGE_PSE) && guest_supports_1G_superpages(v); 
 
+if ( pse1G && pkey_fault(v, pfec, gflags, pkey) )
+rc |= _PAGE_PKEY_BITS;
+
 if ( pse1G )
 {
 /* Generate a fake l1 table entry so callers don't all 
@@ -270,6 +326,10 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse2M = (gflags & _PAGE_PSE) && guest_supports_superpages(v); 
 
+pkey = guest_l2e_get_pkey(gw->l2e);
+if ( pse2M && pkey_fault(v, pfec, gflags, pkey) )
+rc |= _PAGE_PKEY_BITS;
+
 if ( pse2M )
 {
 /* Special case: this guest VA is in a PSE superpage, so there's
@@ -330,6 +390,10 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 }

[Xen-devel] [PATCH V5 0/6] x86/hvm: pkeys, add memory protection-key support

2015-12-22 Thread Huaitong Han
Changes in v5:
*Add static for opt_pku.
*Update commit message for some patches.
*Add condition 5:the access is to a user page to pkey_fault, and simplify #ifdef
for guest_walk_tables patch.
*Don't write XSTATE_PKRU to PV's xcr0.
*count == 0 is combined in hvm_cpuid function.

Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Get CPUID:ospke depend on guest cpuid instead of host hardware capable,
and Move cpuid patch to the last of patches.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn functions.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.
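
For illustration only (not part of this series), RDPKRU and WRPKRU can be
wrapped as below; the byte sequences are the documented opcodes and the helper
names are placeholders.

#include <stdint.h>

/* RDPKRU (0f 01 ee) and WRPKRU (0f 01 ef). Both require ECX = 0; WRPKRU
 * additionally requires EDX = 0. */
static inline uint32_t rdpkru(void)
{
    uint32_t pkru, tmp;

    asm volatile (".byte 0x0f, 0x01, 0xee"
                  : "=a" (pkru), "=d" (tmp) : "c" (0));
    return pkru;
}

static inline void wrpkru(uint32_t pkru)
{
    asm volatile (".byte 0x0f, 0x01, 0xef"
                  : : "a" (pkru), "c" (0), "d" (0) : "memory");
}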

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

Huaitong Han (6):
  x86/hvm: pkeys, add the flag to enable Memory Protection Keys
  x86/hvm: pkeys, add pkeys support when setting CR4
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for cpuid handling

 docs/misc/xen-command-line.markdown | 10 ++
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 ++--
 xen/arch/x86/cpu/common.c   | 10 +-
 xen/arch/x86/hvm/hvm.c  | 41 
 xen/arch/x86/hvm/vmx/vmx.c  | 11 ---
 xen/arch/x86/mm/guest_walk.c| 64 +
 xen/arch/x86/mm/hap/guest_walk.c|  3 ++
 xen/arch/x86/xstate.c   |  3 +-
 xen/include/asm-x86/cpufeature.h|  6 +++-
 xen/include/asm-x86/guest_pt.h  | 12 +++
 xen/include/asm-x86/hvm/hvm.h   |  2 ++
 xen/include/asm-x86/page.h  |  5 +++
 xen/include/asm-x86/processor.h | 39 ++
 xen/include/asm-x86/x86_64/page.h   | 12 +++
 xen/include/asm-x86/xstate.h|  4 ++-
 16 files changed, 205 insertions(+), 25 deletions(-)

-- 
2.4.3




[Xen-devel] [PATCH V5 2/6] x86/hvm: pkeys, add pkeys support when setting CR4

2015-12-22 Thread Huaitong Han
CR4.PKE (bit 22) enables the RDPKRU/WRPKRU instructions to access PKRU, and
enables the protection-key check (which can raise a page fault).

This patch adds X86_CR4_PKE to hvm_cr4_guest_reserved_bits so that the CR4
reserved-bit check works before the guest sets CR4.PKE.
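
A minimal sketch of the idea, with simplified stand-in names (the real code
derives the mask from the boot CPU's feature words as shown in the hunk
below):

#define X86_CR4_PKE (1UL << 22)

/* Illustrative only: a guest CR4 write is refused (Xen would inject #GP(0))
 * if it sets any bit left in the reserved-bit mask; with this patch the mask
 * no longer contains PKE when the PKU feature is exposed. */
static int check_guest_cr4(unsigned long new_cr4, unsigned long reserved_bits)
{
    if ( new_cr4 & reserved_bits )
        return -1;
    return 0;
}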

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/hvm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index db0aeba..59916ed 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1924,6 +1924,7 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
 leaf1_edx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_VME)];
 leaf1_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PCID)];
 leaf7_0_ebx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)];
+leaf7_0_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PKU)];
 }
 
 return ~(unsigned long)
@@ -1959,7 +1960,9 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMEP) ?
   X86_CR4_SMEP : 0) |
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMAP) ?
-  X86_CR4_SMAP : 0));
+  X86_CR4_SMAP : 0) |
+  (leaf7_0_ecx & cpufeat_mask(X86_FEATURE_PKU) ?
+  X86_CR4_PKE : 0));
 }
 
 static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
-- 
2.4.3




[Xen-devel] [PATCH V4 6/6] x86/hvm: pkeys, add pkeys support for cpuid handling

2015-12-20 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, which reflects whether CR4.PKE has been set.

Signed-off-by: Huaitong Han 
---
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 10 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index c3ddc80..f6a9778 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -141,5 +141,7 @@
 #define X86_FEATURE_ADX 19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP20 /* Supervisor Mode Access Protection */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
 
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 8882c01..1ce979b 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -427,9 +427,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_ADX)  |
 bitmaskof(X86_FEATURE_SMAP) |
 bitmaskof(X86_FEATURE_FSGSBASE));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 59916ed..05821ed 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4572,7 +4572,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
 *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
  cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
 
@@ -4600,6 +4600,14 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 /* Don't expose INVPCID to non-hap hvm. */
 if ( (count == 0) && !hap_enabled(d) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+
+/* X86_FEATURE_PKU is not yet implemented for shadow paging. */
+if ( (count == 0) && !hap_enabled(d) )
+*ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+
+if ( (count == 0) && (*ecx & cpufeat_mask(X86_FEATURE_PKU)) )
+*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) ?
+ cpufeat_mask(X86_FEATURE_OSPKE) : 0;
 break;
 case 0xb:
 /* Fix the x2APIC identifier. */
-- 
2.4.3




[Xen-devel] [PATCH V4 5/6] x86/hvm: pkeys, add xstate support for pkeys

2015-12-20 Thread Huaitong Han
This patch adds xstate support for pkeys.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/include/asm-x86/xstate.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..f7c41ba 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
+XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
-- 
2.4.3




[Xen-devel] [PATCH V4 0/6] x86/hvm: pkeys, add memory protection-key support

2015-12-20 Thread Huaitong Han
Changes in v4:
*Delete gva2gfn patch, and when page is present, PFEC_prot_key is always 
checked.
*Use RDPKRU instead of xsave_read because RDPKRU does cost less.
*Squash pkeys patch and pkru patch to guest_walk_tables patch.

Changes in v3:
*Get CPUID:ospke depend on guest cpuid instead of host hardware capable,
and Move cpuid patch to the last of patches.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn functions.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

Huaitong Han (6):
  x86/hvm: pkeys, add the flag to enable Memory Protection Keys
  x86/hvm: pkeys, add pkeys support when setting CR4
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for cpuid handling

 docs/misc/xen-command-line.markdown | 10 ++
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 ++--
 xen/arch/x86/cpu/common.c   | 10 +-
 xen/arch/x86/hvm/hvm.c  | 15 +++--
 xen/arch/x86/hvm/vmx/vmx.c  | 11 ---
 xen/arch/x86/mm/guest_walk.c| 65 +
 xen/arch/x86/mm/hap/guest_walk.c|  3 ++
 xen/include/asm-x86/cpufeature.h|  6 +++-
 xen/include/asm-x86/guest_pt.h  |  7 
 xen/include/asm-x86/hvm/hvm.h   |  2 ++
 xen/include/asm-x86/page.h  |  5 +++
 xen/include/asm-x86/processor.h | 39 ++
 xen/include/asm-x86/x86_64/page.h   | 12 +++
 xen/include/asm-x86/xstate.h|  4 ++-
 15 files changed, 185 insertions(+), 12 deletions(-)

-- 
2.4.3




[Xen-devel] [PATCH V4 4/6] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2015-12-20 Thread Huaitong Han
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59
of leaf entries of the page tables.

The PKRU register is 32 bits wide and covers 16 protection-key domains with 2
attribute bits each: for each i (0 ≤ i ≤ 15), PKRU[2i] is the access-disable
bit for protection key i (ADi) and PKRU[2i+1] is the write-disable bit for
protection key i (WDi). PKEY is an index into one of these domains.

A fault is considered a PKU violation if all of the following conditions are
true:
1.CR4_PKE=1.
2.EFER_LMA=1.
3.Page is present with no reserved bit violations.
4.The access is not an instruction fetch.
5.The access is to a user page.
6.PKRU.AD=1, or the access is a data write and PKRU.WD=1 and either CR0.WP=1
or it is a user access.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/mm/guest_walk.c  | 65 +++
 xen/arch/x86/mm/hap/guest_walk.c  |  3 ++
 xen/include/asm-x86/guest_pt.h|  7 +
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/page.h|  5 +++
 xen/include/asm-x86/processor.h   | 39 +++
 xen/include/asm-x86/x86_64/page.h | 12 
 7 files changed, 133 insertions(+)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..f65ba27 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -90,6 +90,55 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= CONFIG_PAGING_LEVELS
+bool_t leaf_pte_pkeys_check(struct vcpu *vcpu,
+uint32_t pfec, uint32_t pte_pkey)
+{
+unsigned int pkru = 0;
+bool_t pkru_ad, pkru_wd;
+
+bool_t pf = !!(pfec & PFEC_page_present);
+bool_t uf = !!(pfec & PFEC_user_mode);
+bool_t wf = !!(pfec & PFEC_write_access);
+bool_t ff = !!(pfec & PFEC_insn_fetch);
+bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
+
+/* When page is present,  PFEC_prot_key is always checked */
+if ( !pf || is_pv_vcpu(vcpu) )
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.page is present with no reserved bit violations.
+ * 4.the access is not an instruction fetch.
+ * 5.the access is to a user page.
+ * 6.PKRU.AD=1
+ *   or The access is a data write and PKRU.WD=1
+ *and either CR0.WP=1 or it is a user access.
+ */
+if ( !hvm_pku_enabled(vcpu) ||
+!hvm_long_mode_enabled(vcpu) || rsvdf || ff )
+return 0;
+
+pkru = read_pkru();
+if ( unlikely(pkru) )
+{
+pkru_ad = read_pkru_ad(pkru, pte_pkey);
+pkru_wd = read_pkru_wd(pkru, pte_pkey);
+/* Condition 6 */
+if ( pkru_ad || (pkru_wd && wf && (hvm_wp_enabled(vcpu) || uf)))
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -106,6 +155,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
+unsigned int pkey;
 #endif
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
@@ -190,6 +240,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkey = guest_l3e_get_pkey(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -199,6 +250,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse1G = (gflags & _PAGE_PSE) && guest_supports_1G_superpages(v); 
 
+if ( pse1G && leaf_pte_pkeys_check(v, pfec, pkey) )
+rc |= _PAGE_PKEY_BITS;
+
 if ( pse1G )
 {
 /* Generate a fake l1 table entry so callers don't all 
@@ -270,6 +324,12 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse2M = (gflags & _PAGE_PSE) && guest_supports_superpages(v); 
 
+#if GUEST_PAGING_LEVELS >= 4
+pkey = guest_l2e_get_pkey(gw->l2e);
+if ( pse2M && leaf_pte_pkeys_check(v, pfec, pkey) )
+rc |= _PAGE_PKEY_BITS;
+#endif
+
 if ( pse2M )
 {
 /* Special case: this guest VA is in a PSE superpage, so there's
@@ -330,6 +390,11 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 }
 rc |= ((gflags & mflags) ^ mflags);
+#if GUEST_PAGING_LEVELS >= 4
+pkey

[Xen-devel] [PATCH V4 3/6] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2015-12-20 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. Hardware disables
pkeys when the CPU is in non-paging mode, but Xen always uses paging mode to
emulate guest non-paging mode; to emulate that behavior, pkeys need to be
manually disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2581e97..7123912 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1380,12 +1380,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3




[Xen-devel] [PATCH V4 2/6] x86/hvm: pkeys, add pkeys support when setting CR4

2015-12-20 Thread Huaitong Han
This patch adds pkeys support when setting CR4.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/hvm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index db0aeba..59916ed 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1924,6 +1924,7 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
 leaf1_edx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_VME)];
 leaf1_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PCID)];
 leaf7_0_ebx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)];
+leaf7_0_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PKU)];
 }
 
 return ~(unsigned long)
@@ -1959,7 +1960,9 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMEP) ?
   X86_CR4_SMEP : 0) |
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMAP) ?
-  X86_CR4_SMAP : 0));
+  X86_CR4_SMAP : 0) |
+  (leaf7_0_ecx & cpufeat_mask(X86_FEATURE_PKU) ?
+  X86_CR4_PKE : 0));
 }
 
 static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
-- 
2.4.3




[Xen-devel] [PATCH V4 1/6] x86/hvm: pkeys, add the flag to enable Memory Protection Keys

2015-12-20 Thread Huaitong Han
This patch adds the flag to enable Memory Protection Keys.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 docs/misc/xen-command-line.markdown | 10 ++
 xen/arch/x86/cpu/common.c   | 10 +-
 xen/include/asm-x86/cpufeature.h|  6 +-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index c103894..36ecf80 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1177,6 +1177,16 @@ This option can be specified more than once (up to 8 
times at present).
 ### ple\_window
 > `= `
 
+### pku
+> `= `
+
+> Default: `true`
+
+Flag to enable Memory Protection Keys.
+
+The protection-key feature provides an additional mechanism by which IA-32e
+paging controls access to usermode addresses.
+
 ### psr (Intel)
 > `= List of ( cmt: | rmid_max: | cat: | 
 > cos_max: | cdp: )`
 
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 310ec85..7d03e52 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -22,6 +22,10 @@ boolean_param("xsave", use_xsave);
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
 
+/* pku: Flag to enable Memory Protection Keys (default on). */
+bool_t opt_pku = 1;
+boolean_param("pku", opt_pku);
+
 unsigned int opt_cpuid_mask_ecx = ~0u;
 integer_param("cpuid_mask_ecx", opt_cpuid_mask_ecx);
 unsigned int opt_cpuid_mask_edx = ~0u;
@@ -270,7 +274,8 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 
*c)
if ( c->cpuid_level >= 0x0007 )
cpuid_count(0x0007, 0, &tmp,

&c->x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)],
-   &tmp, &tmp);
+   &c->x86_capability[cpufeat_word(X86_FEATURE_PKU)],
+   &tmp);
 }
 
 /*
@@ -323,6 +328,9 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
if ( cpu_has_xsave )
xstate_init(c);
 
+   if ( !opt_pku )
+   setup_clear_cpu_cap(X86_FEATURE_PKU);
+
/*
 * The vendor-specific functions might have changed features.  Now
 * we do "generic changes."
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index af127cf..ef96514 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -11,7 +11,7 @@
 
 #include 
 
-#define NCAPINTS   8   /* N 32-bit words worth of info */
+#define NCAPINTS   9   /* N 32-bit words worth of info */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */
 #define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */
@@ -163,6 +163,10 @@
 #define X86_FEATURE_ADX(7*32+19) /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP   (7*32+20) /* Supervisor Mode Access Prevention 
*/
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 8 */
+#define X86_FEATURE_PKU(8*32+ 3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE  (8*32+ 4) /* OS Protection Keys Enable */
+
 #define cpufeat_word(idx)  ((idx) / 32)
 #define cpufeat_bit(idx)   ((idx) % 32)
 #define cpufeat_mask(idx)  (_AC(1, U) << cpufeat_bit(idx))
-- 
2.4.3




[Xen-devel] [PATCH V2] x86/xsaves: get_xsave_addr, check xsave header and support uncompressed format

2015-12-18 Thread Huaitong Han
The check needs to be against the xsave header in the area, rather than Xen's
maximum xfeature_mask. A guest might easily have a smaller xcr0 than the
maximum Xen is willing to allow, causing the pointer below to be bogus.

get_xsave_addr() is also modified to support uncompressed xstate areas.
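
A hedged usage sketch (not part of the patch; get_xsave_addr() is static in
xstate.c, and the vcpu field names are those of the contemporary Xen tree):
after the change the lookup honours the area's own xstate_bv and layout, and
returns NULL when the component is absent. Index 9 is the PKRU component.

/* Illustrative only, within xen/arch/x86/xstate.c. */
static void dump_guest_pkru(struct vcpu *v)
{
    const uint32_t *pkru = get_xsave_addr(v->arch.xsave_area, 9 /* PKRU */);

    if ( pkru )
        printk("d%dv%d PKRU = %#x\n",
               v->domain->domain_id, v->vcpu_id, *pkru);
}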

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/xstate.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index b65da38..4e87ab3 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -146,12 +146,15 @@ static void __init setup_xstate_comp(void)
 }
 }
 
-static void *get_xsave_addr(void *xsave, unsigned int xfeature_idx)
+static void *get_xsave_addr(struct xsave_struct *xsave,
+unsigned int xfeature_idx)
 {
-if ( !((1ul << xfeature_idx) & xfeature_mask) )
+if ( !((1ul << xfeature_idx) & xsave->xsave_hdr.xstate_bv) )
 return NULL;
 
-return xsave + xstate_comp_offsets[xfeature_idx];
+return (void *)xsave + (xsave_area_compressed(xsave)
+? xstate_comp_offsets
+: xstate_offsets)[xfeature_idx];
 }
 
 void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size)
-- 
2.4.3




[Xen-devel] [PATCH] x86/xsaves: get_xsave_addr needs check the xsave header

2015-12-18 Thread Huaitong Han
The check needs to be against the xsave header in the area, rather than
Xen's maximum xfeature_mask. A guest might easily have a smaller xcr0
than the maximum Xen is willing to allow, causing the pointer below to
be bogus.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/xstate.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index b65da38..d87ab40 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -146,12 +146,13 @@ static void __init setup_xstate_comp(void)
 }
 }
 
-static void *get_xsave_addr(void *xsave, unsigned int xfeature_idx)
+static void *get_xsave_addr(struct xsave_struct *xsave,
+unsigned int xfeature_idx)
 {
-if ( !((1ul << xfeature_idx) & xfeature_mask) )
+if ( !((1ul << xfeature_idx) & xsave->xsave_hdr.xstate_bv) )
 return NULL;
 
-return xsave + xstate_comp_offsets[xfeature_idx];
+return (void *)xsave + xstate_comp_offsets[xfeature_idx];
 }
 
 void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size)
-- 
2.4.3




[Xen-devel] [V3 PATCH 6/9] x86/hvm: pkeys, add xstate support for pkeys

2015-12-07 Thread Huaitong Han
This patch adds xstate support for pkeys.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/xstate.c| 7 +--
 xen/include/asm-x86/xstate.h | 4 +++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index b65da38..db978c4 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -146,12 +146,15 @@ static void __init setup_xstate_comp(void)
 }
 }
 
-static void *get_xsave_addr(void *xsave, unsigned int xfeature_idx)
+void *get_xsave_addr(void *xsave, unsigned int xfeature_idx)
 {
 if ( !((1ul << xfeature_idx) & xfeature_mask) )
 return NULL;
 
-return xsave + xstate_comp_offsets[xfeature_idx];
+if ( xsave_area_compressed(xsave) )
+return xsave + xstate_comp_offsets[xfeature_idx];
+else
+return xsave + xstate_offsets[xfeature_idx];
 }
 
 void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size)
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 12d939b..6536813 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,14 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | 
XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
@@ -90,6 +91,7 @@ uint64_t get_msr_xss(void);
 void xsave(struct vcpu *v, uint64_t mask);
 void xrstor(struct vcpu *v, uint64_t mask);
 bool_t xsave_enabled(const struct vcpu *v);
+void *get_xsave_addr(void *xsave, unsigned int xfeature_idx);
 int __must_check validate_xstate(u64 xcr0, u64 xcr0_accum, u64 xstate_bv);
 int __must_check handle_xsetbv(u32 index, u64 new_bv);
 void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size);
-- 
2.4.3




[Xen-devel] [V3 PATCH 0/9] x86/hvm: pkeys, add memory protection-key support

2015-12-07 Thread Huaitong Han
Changes in v3:
*Get CPUID:ospke depend on guest cpuid instead of host hardware capable, 
and Move cpuid patch to the last of patches.
*Move opt_pku to cpu/common.c.
*Use MASK_EXTR for get_pte_pkeys.
*Add quoting for pkru macro, and use static inline pkru_read functions.
*Rebase get_xsave_addr for updated codes, and add uncompressed format support
for xsave state.
*Use fpu_xsave instead of vcpu_save_fpu, and adjust the code style for
leaf_pte_pkeys_check.
*Add parentheses for PFEC_prot_key of gva2gfn functions.

Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

Huaitong Han (9):
  x86/hvm: pkeys, add the flag to enable Memory Protection Keys
  x86/hvm: pkeys, add pkeys support when setting CR4
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add functions to get pkeys value from PTE
  x86/hvm: pkeys, add functions to support PKRU access
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add pkeys support for gva2gfn functions
  x86/hvm: pkeys, add pkeys support for cpuid handling

 docs/misc/xen-command-line.markdown | 21 +++
 tools/libxc/xc_cpufeature.h |  2 +
 tools/libxc/xc_cpuid_x86.c  |  6 ++-
 xen/arch/x86/cpu/common.c   | 10 -
 xen/arch/x86/hvm/hvm.c  | 34 +
 xen/arch/x86/hvm/vmx/vmx.c  | 11 +++---
 xen/arch/x86/i387.c |  2 +-
 xen/arch/x86/mm/guest_walk.c| 73 +
 xen/arch/x86/xstate.c   |  7 +++-
 xen/include/asm-x86/cpufeature.h|  6 ++-
 xen/include/asm-x86/guest_pt.h  |  7 
 xen/include/asm-x86/hvm/hvm.h   |  2 +
 xen/include/asm-x86/i387.h  |  1 +
 xen/include/asm-x86/page.h  |  5 +++
 xen/include/asm-x86/processor.h | 20 ++
 xen/include/asm-x86/x86_64/page.h   | 12 ++
 xen/include/asm-x86/xstate.h|  4 +-
 17 files changed, 203 insertions(+), 20 deletions(-)

-- 
2.4.3




[Xen-devel] [V3 PATCH 8/9] x86/hvm: pkeys, add pkeys support for gva2gfn functions

2015-12-07 Thread Huaitong Han
This patch adds pkeys support for the gva2gfn functions.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/hvm/hvm.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 59916ed..b88f381 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4319,7 +4319,8 @@ static enum hvm_copy_result __hvm_clear(paddr_t addr, int 
size)
 p2m_type_t p2mt;
 char *p;
 int count, todo = size;
-uint32_t pfec = PFEC_page_present | PFEC_write_access;
+uint32_t pfec = PFEC_page_present | PFEC_write_access |
+(hvm_pku_enabled(curr) ? PFEC_prot_key : 0);
 
 /*
  * XXX Disable for 4.1.0: PV-on-HVM drivers will do grant-table ops
@@ -4420,7 +4421,8 @@ enum hvm_copy_result hvm_copy_to_guest_virt(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | PFEC_write_access | pfec);
+  PFEC_page_present | PFEC_write_access | pfec |
+  (hvm_pku_enabled(current) ? PFEC_prot_key : 0));
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt(
@@ -4428,7 +4430,8 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | pfec |
+  (hvm_pku_enabled(current) ? PFEC_prot_key : 0));
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt(
@@ -4446,7 +4449,8 @@ enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | PFEC_write_access | pfec);
+  PFEC_page_present | PFEC_write_access | pfec |
+  (hvm_pku_enabled(current) ? PFEC_prot_key : 0));
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
@@ -4454,7 +4458,8 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | pfec |
+  (hvm_pku_enabled(current) ? PFEC_prot_key : 0));
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
-- 
2.4.3




[Xen-devel] [V3 PATCH 3/9] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2015-12-07 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. Hardware disables
pkeys when the CPU is in non-paging mode, but Xen always uses paging mode to
emulate guest non-paging mode; to emulate that behavior, pkeys need to be
manually disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2581e97..7123912 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1380,12 +1380,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3




[Xen-devel] [V3 PATCH 4/9] x86/hvm: pkeys, add functions to get pkeys value from PTE

2015-12-07 Thread Huaitong Han
This patch adds functions to get pkeys value from PTE.
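
A minimal illustrative sketch (the names are placeholders, not the patch's
helpers): the protection key occupies bits 62:59 of a 64-bit leaf entry, which
is what the l{1,2,3}e_get_pkeys()/get_pte_pkeys() macros below extract via the
24-bit flags encoding.

#include <stdint.h>

/* Illustrative only: pull the 4-bit protection key out of bits 62:59 of a
 * raw 64-bit leaf page-table entry. */
static inline unsigned int pte_pkey(uint64_t pte)
{
    return (pte >> 59) & 0xf;
}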

Signed-off-by: Huaitong Han 
---
 xen/include/asm-x86/guest_pt.h|  7 +++
 xen/include/asm-x86/page.h|  5 +
 xen/include/asm-x86/x86_64/page.h | 12 
 3 files changed, 24 insertions(+)

diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 3447973..6b0af70 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -154,6 +154,13 @@ static inline u32 guest_l4e_get_flags(guest_l4e_t gl4e)
 { return l4e_get_flags(gl4e); }
 #endif
 
+static inline u32 guest_l1e_get_pkeys(guest_l1e_t gl1e)
+{ return l1e_get_pkeys(gl1e); }
+static inline u32 guest_l2e_get_pkeys(guest_l2e_t gl2e)
+{ return l2e_get_pkeys(gl2e); }
+static inline u32 guest_l3e_get_pkeys(guest_l3e_t gl3e)
+{ return l3e_get_pkeys(gl3e); }
+
 static inline guest_l1e_t guest_l1e_from_gfn(gfn_t gfn, u32 flags)
 { return l1e_from_pfn(gfn_x(gfn), flags); }
 static inline guest_l2e_t guest_l2e_from_gfn(gfn_t gfn, u32 flags)
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index a095a93..93a0db0 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -93,6 +93,11 @@
 #define l3e_get_flags(x)   (get_pte_flags((x).l3))
 #define l4e_get_flags(x)   (get_pte_flags((x).l4))
 
+/* Get pte pkeys (unsigned int). */
+#define l1e_get_pkeys(x)   (get_pte_pkeys((x).l1))
+#define l2e_get_pkeys(x)   (get_pte_pkeys((x).l2))
+#define l3e_get_pkeys(x)   (get_pte_pkeys((x).l3))
+
 /* Construct an empty pte. */
 #define l1e_empty()((l1_pgentry_t) { 0 })
 #define l2e_empty()((l2_pgentry_t) { 0 })
diff --git a/xen/include/asm-x86/x86_64/page.h 
b/xen/include/asm-x86/x86_64/page.h
index 19ab4d0..3ca489a 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -134,6 +134,18 @@ typedef l4_pgentry_t root_pgentry_t;
 #define get_pte_flags(x) (((int)((x) >> 40) & ~0xFFF) | ((int)(x) & 0xFFF))
 #define put_pte_flags(x) (((intpte_t)((x) & ~0xFFF) << 40) | ((x) & 0xFFF))
 
+/*
+ * Protection keys define a new 4-bit protection key field
+ * (PKEY) in bits 62:59 of leaf entries of the page tables.
+ * This corresponds to bits 22:19 of the 24-bit flags.
+ *
+ * Notice: Bit 22 is used by _PAGE_GNTTAB which is visible to PV guests,
+ * so Protection keys must be disabled on PV guests.
+ */
+#define _PAGE_PKEY_BITS  (0x780000) /* Protection Keys, 22:19 */
+
+#define get_pte_pkeys(x) (MASK_EXTR(get_pte_flags(x), _PAGE_PKEY_BITS))
+
 /* Bit 23 of a 24-bit flag mask. This corresponds to bit 63 of a pte.*/
 #define _PAGE_NX_BIT (1U<<23)
 
-- 
2.4.3
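
To see what get_pte_pkeys() computes, here is a minimal standalone sketch of the same arithmetic in plain C (it does not use Xen's MASK_EXTR macro, only the equivalent mask-and-shift): a PTE's bits 62:59 land at bits 22:19 of the 24-bit flags word, and shifting that field back down yields the protection key.

    #include <stdint.h>
    #include <stdio.h>

    /* Same mapping as get_pte_flags(): pte bits 52..63 -> flag bits 12..23,
     * pte bits 0..11 -> flag bits 0..11. */
    static uint32_t pte_flags(uint64_t pte)
    {
        return ((uint32_t)(pte >> 40) & ~0xFFFu) | ((uint32_t)pte & 0xFFFu);
    }

    /* Equivalent of MASK_EXTR(flags, 0x780000): isolate flag bits 22:19. */
    static uint32_t pte_pkey(uint64_t pte)
    {
        return (pte_flags(pte) & 0x780000u) >> 19;
    }

    int main(void)
    {
        uint64_t pte = 0x8000000000000867ULL;  /* NX plus low permission bits */

        pte |= (uint64_t)5 << 59;              /* tag the page with key 5 */
        printf("pkey = %u\n", pte_pkey(pte));  /* prints 5 */
        return 0;
    }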


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2015-12-07 Thread Huaitong Han
This patch adds pkeys support for guest_walk_tables.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/i387.c   |  2 +-
 xen/arch/x86/mm/guest_walk.c  | 73 +++
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 xen/include/asm-x86/i387.h|  1 +
 4 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index b661d39..83c8465 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -132,7 +132,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
 }
 
 /* Save x87 extended state */
-static inline void fpu_xsave(struct vcpu *v)
+void fpu_xsave(struct vcpu *v)
 {
 bool_t ok;
 uint64_t mask = vcpu_xsave_mask(v);
diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..e79f72f 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -31,6 +31,8 @@ asm(".file \"" __OBJECT_FILE__ "\"");
 #include 
 #include 
 #include 
+#include 
+#include 
 
 extern const uint32_t gw_page_flags[];
 #if GUEST_PAGING_LEVELS == CONFIG_PAGING_LEVELS
@@ -90,6 +92,61 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= CONFIG_PAGING_LEVELS
+bool_t leaf_pte_pkeys_check(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_access, uint32_t pte_pkeys)
+{
+void *xsave_addr;
+unsigned int pkru = 0;
+bool_t pkru_ad, pkru_wd;
+
+bool_t uf = !!(pfec & PFEC_user_mode);
+bool_t wf = !!(pfec & PFEC_write_access);
+bool_t ff = !!(pfec & PFEC_insn_fetch);
+bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
+bool_t pkuf  = !!(pfec & PFEC_prot_key);
+
+if ( !cpu_has_xsave || !pkuf || is_pv_vcpu(vcpu) )
+return 0;
+
+/* PKRU dom0 is always zero */
+if ( likely(!pte_pkeys) )
+return 0;
+
+/* Update vcpu xsave area */
+fpu_xsave(vcpu);
+xsave_addr = get_xsave_addr(vcpu->arch.xsave_area, fls64(XSTATE_PKRU)-1);
+if ( !!xsave_addr )
+memcpy(&pkru, xsave_addr, sizeof(pkru));
+
+if ( unlikely(pkru) )
+{
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.page is present with no reserved bit violations.
+ * 4.the access is not an instruction fetch.
+ * 5.the access is to a user page.
+ * 6.PKRU.AD=1
+ *   or The access is a data write and PKRU.WD=1
+ *and either CR0.WP=1 or it is a user access.
+ */
+pkru_ad = read_pkru_ad(pkru, pte_pkeys);
+pkru_wd = read_pkru_wd(pkru, pte_pkeys);
+if ( hvm_pku_enabled(vcpu) && hvm_long_mode_enabled(vcpu) &&
+!rsvdf && !ff && (pkru_ad ||
+(pkru_wd && wf && (hvm_wp_enabled(vcpu) || uf
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -106,6 +163,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
+unsigned int pkeys;
 #endif
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
@@ -190,6 +248,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkeys = guest_l3e_get_pkeys(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -199,6 +258,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse1G = (gflags & _PAGE_PSE) && guest_supports_1G_superpages(v); 
 
+if ( pse1G && leaf_pte_pkeys_check(v, pfec, gflags, pkeys) )
+rc |= _PAGE_PKEY_BITS;
+
 if ( pse1G )
 {
 /* Generate a fake l1 table entry so callers don't all 
@@ -270,6 +332,12 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse2M = (gflags & _PAGE_PSE) && guest_supports_superpages(v); 
 
+#if GUEST_PAGING_LEVELS >= 4
+pkeys = guest_l2e_get_pkeys(gw->l2e);
+if ( pse2M && leaf_pte_pkeys_check(v, pfec, gflags, pkeys) )
+rc |= _PAGE_PKEY_BITS;
+#endif
+
 if ( pse2M )
 {
 /* Special case: this guest VA is in a PSE superpage, so there's
@@ -330,6 +398,11 @@ guest_walk_tables(struct vcpu *v, struct p2m_
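
The fault condition that leaf_pte_pkeys_check() evaluates can be restated as a single predicate; the sketch below is a standalone restatement of the six conditions listed in the comment above (SDM 4.6.2), with all inputs supplied as plain booleans. It is an illustration, not the Xen function itself.

    #include <stdbool.h>
    #include <stdio.h>

    static bool is_pkey_violation(bool cr4_pke, bool efer_lma,
                                  bool present_no_rsvd, bool insn_fetch,
                                  bool user_page, bool user_access,
                                  bool data_write, bool cr0_wp,
                                  bool pkru_ad, bool pkru_wd)
    {
        return cr4_pke && efer_lma && present_no_rsvd && !insn_fetch &&
               user_page &&
               (pkru_ad ||
                (pkru_wd && data_write && (cr0_wp || user_access)));
    }

    int main(void)
    {
        /* Supervisor write, CR0.WP=1, to a user page whose key has WD set. */
        printf("%d\n", is_pkey_violation(true, true, true, false,
                                         true, false, true, true,
                                         false, true));   /* prints 1 */
        return 0;
    }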

[Xen-devel] [V3 PATCH 9/9] x86/hvm: pkeys, add pkeys support for cpuid handling

2015-12-07 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of CR4.PKE.

Signed-off-by: Huaitong Han 
---
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 --
 xen/arch/x86/hvm/hvm.c  | 14 +-
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index c3ddc80..f6a9778 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -141,5 +141,7 @@
 #define X86_FEATURE_ADX 19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP20 /* Supervisor Mode Access Protection */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
 
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 8882c01..1ce979b 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -427,9 +427,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_ADX)  |
 bitmaskof(X86_FEATURE_SMAP) |
 bitmaskof(X86_FEATURE_FSGSBASE));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b88f381..d7b3b43 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4577,7 +4577,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 __clear_bit(X86_FEATURE_APIC & 31, edx);
 
 /* Fix up OSXSAVE. */
-if ( cpu_has_xsave )
+if ( *ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
 *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
  cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
 
@@ -4605,6 +4605,18 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 /* Don't expose INVPCID to non-hap hvm. */
 if ( (count == 0) && !hap_enabled(d) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+
+/* X86_FEATURE_PKU is not yet implemented for shadow paging.
+ *
+ * The hypervisor gets the guest pkru value from XSAVE state, because the
+ * hypervisor runs with CR4.PKE clear, which disables the RDPKRU
+ * instruction.
+ */
+if ( (count == 0) && (!hap_enabled(d) || !cpu_has_xsave) )
+*ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+
+if ( (count == 0) && (*ecx & cpufeat_mask(X86_FEATURE_PKU)) )
+*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) ?
+ cpufeat_mask(X86_FEATURE_OSPKE) : 0;
 break;
 case 0xb:
 /* Fix the x2APIC identifier. */
-- 
2.4.3
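
From inside the guest, the result of this handling can be probed directly. A minimal user-space sketch of such a check is below; it assumes GCC's or clang's cpuid.h helper macros and is not part of the patch.

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID.(EAX=7, ECX=0): structured extended feature leaf. */
        __cpuid_count(7, 0, eax, ebx, ecx, edx);

        printf("PKU   (hardware support)    : %u\n", (ecx >> 3) & 1);
        printf("OSPKE (OS enabled, CR4.PKE) : %u\n", (ecx >> 4) & 1);
        return 0;
    }

With the patch applied, a HAP guest that has set CR4.PKE should see both bits; a guest with CR4.PKE clear should see only PKU.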


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V3 PATCH 1/9] x86/hvm: pkeys, add the flag to enable Memory Protection Keys

2015-12-07 Thread Huaitong Han
This patch adds the flag to enable Memory Protection Keys.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 docs/misc/xen-command-line.markdown | 21 +
 xen/arch/x86/cpu/common.c   | 10 +-
 xen/include/asm-x86/cpufeature.h|  6 +-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index c103894..ef5ef6c 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1177,6 +1177,27 @@ This option can be specified more than once (up to 8 
times at present).
 ### ple\_window
 > `= `
 
+### pku
+> `= <boolean>`
+
+> Default: `true`
+
+Flag to enable Memory Protection Keys.
+
+The protection-key feature provides an additional mechanism by which IA-32e
+paging controls access to usermode addresses.
+
+When CR4.PKE = 1, every linear address is associated with the 4-bit protection
+key located in bits 62:59 of the paging-structure entry that mapped the page
+containing the linear address. The PKRU register determines, for each
+protection key, whether user-mode addresses with that protection key may be
+read or written.
+
+The PKRU register (protection key rights for user pages) is a 32-bit register
+with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
+access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
+bit for protection key i (WDi).
+
 ### psr (Intel)
 > `= List of ( cmt: | rmid_max: | cat: | 
 > cos_max: | cdp: )`
 
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 310ec85..7d03e52 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -22,6 +22,10 @@ boolean_param("xsave", use_xsave);
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
 
+/* pku: Flag to enable Memory Protection Keys (default on). */
+bool_t opt_pku = 1;
+boolean_param("pku", opt_pku);
+
 unsigned int opt_cpuid_mask_ecx = ~0u;
 integer_param("cpuid_mask_ecx", opt_cpuid_mask_ecx);
 unsigned int opt_cpuid_mask_edx = ~0u;
@@ -270,7 +274,8 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 
*c)
if ( c->cpuid_level >= 0x0007 )
cpuid_count(0x0007, 0, &tmp,

&c->x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)],
-   &tmp, &tmp);
+   &c->x86_capability[cpufeat_word(X86_FEATURE_PKU)],
+   &tmp);
 }
 
 /*
@@ -323,6 +328,9 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
if ( cpu_has_xsave )
xstate_init(c);
 
+   if ( !opt_pku )
+   setup_clear_cpu_cap(X86_FEATURE_PKU);
+
/*
 * The vendor-specific functions might have changed features.  Now
 * we do "generic changes."
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index af127cf..ef96514 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -11,7 +11,7 @@
 
 #include 
 
-#define NCAPINTS   8   /* N 32-bit words worth of info */
+#define NCAPINTS   9   /* N 32-bit words worth of info */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */
 #define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */
@@ -163,6 +163,10 @@
 #define X86_FEATURE_ADX(7*32+19) /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP   (7*32+20) /* Supervisor Mode Access Prevention 
*/
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 8 */
+#define X86_FEATURE_PKU(8*32+ 3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE  (8*32+ 4) /* OS Protection Keys Enable */
+
 #define cpufeat_word(idx)  ((idx) / 32)
 #define cpufeat_bit(idx)   ((idx) % 32)
 #define cpufeat_mask(idx)  (_AC(1, U) << cpufeat_bit(idx))
-- 
2.4.3
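
For illustration only: with this option in place, protection keys could be turned off at boot by appending the parameter to the Xen command line in the bootloader entry (the path and the other options below are placeholders):

    multiboot2 /boot/xen.gz dom0_mem=1024M pku=false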


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V3 PATCH 5/9] x86/hvm: pkeys, add functions to support PKRU access

2015-12-07 Thread Huaitong Han
This patch adds functions to support PKRU access.

Signed-off-by: Huaitong Han 
---
 xen/include/asm-x86/processor.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 3f8411f..c345787 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -342,6 +342,26 @@ static inline void write_cr4(unsigned long val)
 asm volatile ( "mov %0,%%cr4" : : "r" (val) );
 }
 
+/* Macros for PKRU domain */
+#define PKRU_READ  (0)
+#define PKRU_WRITE (1)
+#define PKRU_ATTRS (2)
+
+/*
+ * PKRU defines 32 bits, there are 16 domains and 2 attribute bits per
+ * domain in pkru, pkeys is index to a defined domain, so the value of
+ * pte_pkeys * PKRU_ATTRS + R/W is offset of a defined domain attribute.
+ */
+static inline bool_t read_pkru_ad(unsigned int pkru, unsigned int pkey)
+{
+return (pkru >> (pkey * PKRU_ATTRS + PKRU_READ)) & 1;
+}
+
+static inline bool_t read_pkru_wd(unsigned int pkru, unsigned int pkey)
+{
+return (pkru >> (pkey * PKRU_ATTRS + PKRU_WRITE)) & 1;
+}
+
 /* Clear and set 'TS' bit respectively */
 static inline void clts(void) 
 {
-- 
2.4.3
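
As a quick worked example of the layout these helpers decode (standalone sketch, independent of Xen): with PKRU = 0x8, protection key 1 has AD = 0 and WD = 1, i.e. user-mode reads of pages tagged with key 1 are allowed but user-mode writes are not.

    #include <stdio.h>

    #define PKRU_READ  0
    #define PKRU_WRITE 1
    #define PKRU_ATTRS 2

    static int pkru_ad(unsigned int pkru, unsigned int pkey)
    {
        return (pkru >> (pkey * PKRU_ATTRS + PKRU_READ)) & 1;
    }

    static int pkru_wd(unsigned int pkru, unsigned int pkey)
    {
        return (pkru >> (pkey * PKRU_ATTRS + PKRU_WRITE)) & 1;
    }

    int main(void)
    {
        unsigned int pkru = 0x8;    /* WD for key 1: bit 2*1+1 = bit 3 */

        printf("key 1: AD=%d WD=%d\n", pkru_ad(pkru, 1), pkru_wd(pkru, 1));
        printf("key 0: AD=%d WD=%d\n", pkru_ad(pkru, 0), pkru_wd(pkru, 0));
        return 0;
    }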


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V3 PATCH 2/9] x86/hvm: pkeys, add pkeys support when setting CR4

2015-12-07 Thread Huaitong Han
This patch adds pkeys support when setting CR4.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/hvm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index db0aeba..59916ed 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1924,6 +1924,7 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
 leaf1_edx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_VME)];
 leaf1_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PCID)];
 leaf7_0_ebx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)];
+leaf7_0_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PKU)];
 }
 
 return ~(unsigned long)
@@ -1959,7 +1960,9 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMEP) ?
   X86_CR4_SMEP : 0) |
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMAP) ?
-  X86_CR4_SMAP : 0));
+  X86_CR4_SMAP : 0) |
+  (leaf7_0_ecx & cpufeat_mask(X86_FEATURE_PKU) ?
+  X86_CR4_PKE : 0));
 }
 
 static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
-- 
2.4.3
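
The effect of adding X86_CR4_PKE here is that CR4.PKE stops being a guest-reserved bit exactly when the guest's CPUID policy advertises PKU; otherwise any attempt by the guest to set it is refused. A toy standalone model of that reserved-mask logic (not Xen code; X86_CR4_PKE is bit 22 as in the SDM):

    #include <stdio.h>

    #define X86_CR4_PKE (1UL << 22)

    /* Toy model: return ~(bits the guest may set); other allowed bits are
     * omitted for brevity. */
    static unsigned long cr4_reserved_bits(int guest_has_pku)
    {
        unsigned long allowed = 0;

        if ( guest_has_pku )
            allowed |= X86_CR4_PKE;
        return ~allowed;
    }

    int main(void)
    {
        unsigned long new_cr4 = X86_CR4_PKE;

        printf("PKU guest rejects PKE?     %d\n",
               !!(new_cr4 & cr4_reserved_bits(1)));   /* 0 */
        printf("non-PKU guest rejects PKE? %d\n",
               !!(new_cr4 & cr4_reserved_bits(0)));   /* 1 */
        return 0;
    }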


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 2/9] x86/hvm: pkeys, add the flag to enable Memory Protection Keys

2015-11-27 Thread Huaitong Han
This patch adds the flag to enable Memory Protection Keys.

Signed-off-by: Huaitong Han 
---
 docs/misc/xen-command-line.markdown | 21 +
 xen/arch/x86/setup.c|  7 +++
 2 files changed, 28 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index afb9548..c0bd84d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1177,6 +1177,27 @@ This option can be specified more than once (up to 8 
times at present).
 ### ple\_window
 > `= `
 
+### pku
+> `= <boolean>`
+
+> Default: `true`
+
+Flag to enable Memory Protection Keys.
+
+The protection-key feature provides an additional mechanism by which IA-32e
+paging controls access to usermode addresses.
+
+When CR4.PKE = 1, every linear address is associated with the 4-bit protection
+key located in bits 62:59 of the paging-structure entry that mapped the page
+containing the linear address. The PKRU register determines, for each
+protection key, whether user-mode addresses with that protection key may be
+read or written.
+
+The PKRU register (protection key rights for user pages) is a 32-bit register
+with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
+access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
+bit for protection key i (WDi).
+
 ### psr (Intel)
 > `= List of ( cmt: | rmid_max: | cat: | 
 > cos_max: | cdp: )`
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 6714473..2aa2f83 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -67,6 +67,10 @@ invbool_param("smep", disable_smep);
 static bool_t __initdata disable_smap;
 invbool_param("smap", disable_smap);
 
+/* pku: Flag to enable Memory Protection Keys (default on). */
+static bool_t __initdata opt_pku = 1;
+boolean_param("pku", opt_pku);
+
 /* Boot dom0 in pvh mode */
 static bool_t __initdata opt_dom0pvh;
 boolean_param("dom0pvh", opt_dom0pvh);
@@ -1307,6 +1311,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 if ( cpu_has_smap )
 set_in_cr4(X86_CR4_SMAP);
 
+if ( !opt_pku )
+setup_clear_cpu_cap(X86_FEATURE_PKU);
+
 if ( cpu_has_fsgsbase )
 set_in_cr4(X86_CR4_FSGSBASE);
 
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 8/9] x86/hvm: pkeys, add xstate support for pkeys

2015-11-27 Thread Huaitong Han
This patch adds xstate support for pkeys.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/xstate.c| 18 ++
 xen/include/asm-x86/xstate.h |  5 -
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 827e0e1..00bddb0 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -294,6 +294,24 @@ unsigned int xstate_ctxt_size(u64 xcr0)
 return _xstate_ctxt_size(xcr0);
 }
 
+/*
+ * Given the xsave area and a state inside, this function returns the
+ * address of the state.
+ *
+ * This returns the address assuming the standard (non-compacted) format,
+ * because the XSAVE usage here does not write the compacted format of the
+ * xsave area.
+ */
+void *get_xsave_addr(struct xsave_struct *xsave, u32 xfeature)
+{
+u32 xstate_offsets, xstate_sizes, ecx, edx;
+u32 xstate_nr = fls64(xfeature) - 1;
+
+cpuid_count(XSTATE_CPUID, xstate_nr, &xstate_sizes, &xstate_offsets, &ecx, 
&edx);
+
+return (void *)xsave + xstate_offsets;
+}
+
 /* Collect the information of processor's extended state */
 void xstate_init(struct cpuinfo_x86 *c)
 {
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index b95a5b5..e9abe71 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,13 +34,15 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_UNUSED  (1ULL << 8)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | 
XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
@@ -89,6 +91,7 @@ uint64_t get_msr_xss(void);
 void xsave(struct vcpu *v, uint64_t mask);
 void xrstor(struct vcpu *v, uint64_t mask);
 bool_t xsave_enabled(const struct vcpu *v);
+void *get_xsave_addr(struct xsave_struct *xsave, u32 xfeature);
 int __must_check validate_xstate(u64 xcr0, u64 xcr0_accum, u64 xstate_bv);
 int __must_check handle_xsetbv(u32 index, u64 new_bv);
 
-- 
2.4.3
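
The offset used here comes from CPUID leaf 0xD: for sub-leaf n, EAX is the size of state component n and EBX is its offset within the standard (non-compacted) XSAVE layout. A small user-space sketch of the same lookup for the PKRU component (sub-leaf 9), assuming GCC's or clang's cpuid.h helpers; this is not Xen code, and on hardware without PKU the sub-leaf simply reads as zero.

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int size, offset, ecx, edx;

        /* CPUID.(EAX=0xD, ECX=9): PKRU state component. */
        __cpuid_count(0xd, 9, size, offset, ecx, edx);

        printf("PKRU state: %u bytes at offset %u of the XSAVE area\n",
               size, offset);
        return 0;
    }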


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 6/9] x86/hvm: pkeys, add functions to support PKRU access

2015-11-27 Thread Huaitong Han
This patch adds functions to support PKRU access.

Signed-off-by: Huaitong Han 
---
 xen/include/asm-x86/processor.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 3f8411f..68d86cb 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -342,6 +342,21 @@ static inline void write_cr4(unsigned long val)
 asm volatile ( "mov %0,%%cr4" : : "r" (val) );
 }
 
+/* Macros for PKRU domain */
+#define PKRU_READ  0
+#define PKRU_WRITE 1
+#define PKRU_ATTRS 2
+
+/*
+ * PKRU defines 32 bits, there are 16 domains and 2 attribute bits per
+ * domain in pkru, pkeys is index to a defined domain, so the value of
+ * pte_pkeys * PKRU_ATTRS + R/W is offset of a defined domain attribute.
+ */
+#define READ_PKRU_AD(pkru, pkey) \
+((pkru >> (pkey * PKRU_ATTRS + PKRU_READ)) & 1)
+#define READ_PKRU_WD(pkru, pkey) \
+((pkru >> (pkey * PKRU_ATTRS + PKRU_WRITE)) & 1)
+
 /* Clear and set 'TS' bit respectively */
 static inline void clts(void) 
 {
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 9/9] x86/hvm: pkeys, add pkeys support for gva2gfn functions

2015-11-27 Thread Huaitong Han
This patch adds pkeys support for the gva2gfn functions.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/hvm/hvm.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0103bcb..b28d104 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4304,7 +4304,8 @@ static enum hvm_copy_result __hvm_clear(paddr_t addr, int 
size)
 p2m_type_t p2mt;
 char *p;
 int count, todo = size;
-uint32_t pfec = PFEC_page_present | PFEC_write_access;
+uint32_t pfec = PFEC_page_present | PFEC_write_access |
+hvm_pku_enabled(curr) ? PFEC_prot_key : 0;
 
 /*
  * XXX Disable for 4.1.0: PV-on-HVM drivers will do grant-table ops
@@ -4405,7 +4406,8 @@ enum hvm_copy_result hvm_copy_to_guest_virt(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | PFEC_write_access | pfec);
+  PFEC_page_present | PFEC_write_access | pfec |
+  hvm_pku_enabled(current) ? PFEC_prot_key : 0);
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt(
@@ -4413,7 +4415,8 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | pfec |
+  hvm_pku_enabled(current) ? PFEC_prot_key : 0);
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt(
@@ -4431,7 +4434,8 @@ enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | PFEC_write_access | pfec);
+  PFEC_page_present | PFEC_write_access | pfec |
+  hvm_pku_enabled(current) ? PFEC_prot_key : 0);
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
@@ -4439,7 +4443,8 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 {
 return __hvm_copy(buf, vaddr, size,
   HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-  PFEC_page_present | pfec);
+  PFEC_page_present | pfec |
+  hvm_pku_enabled(current) ? PFEC_prot_key : 0);
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 5/9] x86/hvm: pkeys, add functions to get pkeys value from PTE

2015-11-27 Thread Huaitong Han
This patch adds functions to get pkeys value from PTE.

Signed-off-by: Huaitong Han 
---
 xen/include/asm-x86/guest_pt.h|  7 +++
 xen/include/asm-x86/page.h|  5 +
 xen/include/asm-x86/x86_64/page.h | 19 +++
 3 files changed, 31 insertions(+)

diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 3447973..6b0af70 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -154,6 +154,13 @@ static inline u32 guest_l4e_get_flags(guest_l4e_t gl4e)
 { return l4e_get_flags(gl4e); }
 #endif
 
+static inline u32 guest_l1e_get_pkeys(guest_l1e_t gl1e)
+{ return l1e_get_pkeys(gl1e); }
+static inline u32 guest_l2e_get_pkeys(guest_l2e_t gl2e)
+{ return l2e_get_pkeys(gl2e); }
+static inline u32 guest_l3e_get_pkeys(guest_l3e_t gl3e)
+{ return l3e_get_pkeys(gl3e); }
+
 static inline guest_l1e_t guest_l1e_from_gfn(gfn_t gfn, u32 flags)
 { return l1e_from_pfn(gfn_x(gfn), flags); }
 static inline guest_l2e_t guest_l2e_from_gfn(gfn_t gfn, u32 flags)
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index a095a93..93a0db0 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -93,6 +93,11 @@
 #define l3e_get_flags(x)   (get_pte_flags((x).l3))
 #define l4e_get_flags(x)   (get_pte_flags((x).l4))
 
+/* Get pte pkeys (unsigned int). */
+#define l1e_get_pkeys(x)   (get_pte_pkeys((x).l1))
+#define l2e_get_pkeys(x)   (get_pte_pkeys((x).l2))
+#define l3e_get_pkeys(x)   (get_pte_pkeys((x).l3))
+
 /* Construct an empty pte. */
 #define l1e_empty()((l1_pgentry_t) { 0 })
 #define l2e_empty()((l2_pgentry_t) { 0 })
diff --git a/xen/include/asm-x86/x86_64/page.h 
b/xen/include/asm-x86/x86_64/page.h
index 19ab4d0..49343ec 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -134,6 +134,25 @@ typedef l4_pgentry_t root_pgentry_t;
 #define get_pte_flags(x) (((int)((x) >> 40) & ~0xFFF) | ((int)(x) & 0xFFF))
 #define put_pte_flags(x) (((intpte_t)((x) & ~0xFFF) << 40) | ((x) & 0xFFF))
 
+/*
+ * Protection keys define a new 4-bit protection key field
+ * (PKEY) in bits 62:59 of leaf entries of the page tables.
+ * This corresponds to bits 22:19 of the 24-bit flags.
+ *
+ * Notice: Bit 22 is used by _PAGE_GNTTAB which is visible to PV guests,
+ * so Protection keys must be disabled on PV guests.
+ */
+#define _PAGE_PKEY_BIT0 (1u<<19)   /* Protection Keys, bit 1/4 */
+#define _PAGE_PKEY_BIT1 (1u<<20)   /* Protection Keys, bit 2/4 */
+#define _PAGE_PKEY_BIT2 (1u<<21)   /* Protection Keys, bit 3/4 */
+#define _PAGE_PKEY_BIT3 (1u<<22)   /* Protection Keys, bit 4/4 */
+
+/* The order of mask _PAGE_PKEY_BIT0 is 19 */
+#define get_pte_pkeys(x) ((int)(get_pte_flags(x) >> 19) & 0xF)
+
+/* Take pkey first bit as pkey feature */
+#define _PAGE_PKEY_BIT _PAGE_PKEY_BIT0
+
 /* Bit 23 of a 24-bit flag mask. This corresponds to bit 63 of a pte.*/
 #define _PAGE_NX_BIT (1U<<23)
 
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 1/9] x86/hvm: pkeys, add pkeys support for cpuid handling

2015-11-27 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of CR4.PKE.

Signed-off-by: Huaitong Han 
---
 tools/libxc/xc_cpufeature.h  |  2 ++
 tools/libxc/xc_cpuid_x86.c   |  6 --
 xen/arch/x86/cpu/common.c|  5 +++--
 xen/arch/x86/hvm/hvm.c   | 12 
 xen/include/asm-x86/cpufeature.h |  7 ++-
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index c3ddc80..f6a9778 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -141,5 +141,7 @@
 #define X86_FEATURE_ADX 19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP20 /* Supervisor Mode Access Protection */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
 
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 031c848..5c1d076 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -421,9 +421,11 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
 bitmaskof(X86_FEATURE_ADX)  |
 bitmaskof(X86_FEATURE_SMAP) |
 bitmaskof(X86_FEATURE_FSGSBASE));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e60929d..84d3a10 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -264,8 +264,9 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 
*c)
/* Intel-defined flags: level 0x0007 */
if ( c->cpuid_level >= 0x0007 )
cpuid_count(0x0007, 0, &tmp,
-   
&c->x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)],
-   &tmp, &tmp);
+&c->x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)],
+&c->x86_capability[cpufeat_word(X86_FEATURE_PKU)],
+&tmp);
 }
 
 /*
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ea982e2..0adafe9 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4582,6 +4582,18 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 /* Don't expose INVPCID to non-hap hvm. */
 if ( (count == 0) && !hap_enabled(d) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+
+/* X86_FEATURE_PKU is not yet implemented for shadow paging.
+ *
+ * The hypervisor gets the guest pkru value from XSAVE state, because the
+ * hypervisor runs with CR4.PKE clear, which disables the RDPKRU
+ * instruction.
+ */
+if ( (count == 0) && (!hap_enabled(d) || !cpu_has_xsave) )
+*ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+
+if ( (count == 0) && cpu_has_pku )
+*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) ?
+ cpufeat_mask(X86_FEATURE_OSPKE) : 0;
 break;
 case 0xb:
 /* Fix the x2APIC identifier. */
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index af127cf..f041efa 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -11,7 +11,7 @@
 
 #include 
 
-#define NCAPINTS   8   /* N 32-bit words worth of info */
+#define NCAPINTS   9   /* N 32-bit words worth of info */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */
 #define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */
@@ -163,6 +163,10 @@
 #define X86_FEATURE_ADX(7*32+19) /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP   (7*32+20) /* Supervisor Mode Access Prevention 
*/
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 8 */
+#define X86_FEATURE_PKU(8*32+ 3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE  (8*32+ 4) /* OS Protection Keys Enable */
+
 #define cpufeat_word(idx)  ((idx) / 32)
 #define cpufeat_bit(idx)   ((idx) % 32)
 #define cpufeat_mask(idx)  (_AC(1, U) << cpufeat_bit(idx))
@@ -199,6 +203,7 @@
 
 #define cpu_has_smepboot_cpu_has(X86_FEATURE_SMEP)
 #define cpu_has_smapboot_cpu_has(X86_FEATURE_SMAP)
+#define cpu_has_pku boot_cpu_has(X86_FEATURE_PKU)
 #define cpu_has_fpu_sel (!boot_cpu_has(X86_FEATURE_NO_FPU_SEL))
 
 #define cpu_has_ffxsr   ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) \
-- 
2.4.3
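
The word/bit arithmetic behind these definitions is simple to check by hand; the standalone sketch below mirrors the macros above and shows where PKU and OSPKE land in the x86_capability array.

    #include <stdio.h>

    #define X86_FEATURE_PKU    (8*32 + 3)
    #define X86_FEATURE_OSPKE  (8*32 + 4)

    #define cpufeat_word(idx)  ((idx) / 32)
    #define cpufeat_bit(idx)   ((idx) % 32)
    #define cpufeat_mask(idx)  (1u << cpufeat_bit(idx))

    int main(void)
    {
        /* PKU lives in capability word 8, bit 3 (CPUID.7.0 ECX bit 3). */
        printf("PKU:   word %d, bit %d, mask %#x\n",
               cpufeat_word(X86_FEATURE_PKU),
               cpufeat_bit(X86_FEATURE_PKU),
               cpufeat_mask(X86_FEATURE_PKU));
        printf("OSPKE: word %d, bit %d, mask %#x\n",
               cpufeat_word(X86_FEATURE_OSPKE),
               cpufeat_bit(X86_FEATURE_OSPKE),
               cpufeat_mask(X86_FEATURE_OSPKE));
        return 0;
    }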


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 0/9] x86/hvm: pkeys, add memory protection-key support

2015-11-27 Thread Huaitong Han
Changes in v2:
*Rebase all patches in staging branch
*Disable X86_CR4_PKE on hypervisor, and delete pkru_read/write functions, and
use xsave state read to get pkru value.
*Delete the patch that adds pkeys support for do_page_fault.
*Add pkeys support for gva2gfn so that setting _PAGE_PK_BIT in the return
value can get propagated to the guest correctly.

The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

Huaitong Han (9):
  x86/hvm: pkeys, add pkeys support for cpuid handling
  x86/hvm: pkeys, add the flag to enable Memory Protection Keys
  x86/hvm: pkeys, add pkeys support when setting CR4
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add functions to get pkeys value from PTE
  x86/hvm: pkeys, add functions to support PKRU access
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys
  x86/hvm: pkeys, add pkeys support for gva2gfn functions

 docs/misc/xen-command-line.markdown | 21 
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 ++--
 xen/arch/x86/cpu/common.c   |  5 +--
 xen/arch/x86/hvm/hvm.c  | 32 ++
 xen/arch/x86/hvm/vmx/vmx.c  | 11 ---
 xen/arch/x86/mm/guest_walk.c| 65 +
 xen/arch/x86/setup.c|  7 
 xen/arch/x86/xstate.c   | 18 ++
 xen/include/asm-x86/cpufeature.h|  7 +++-
 xen/include/asm-x86/guest_pt.h  |  7 
 xen/include/asm-x86/hvm/hvm.h   |  2 ++
 xen/include/asm-x86/page.h  |  5 +++
 xen/include/asm-x86/processor.h | 15 +
 xen/include/asm-x86/x86_64/page.h   | 19 +++
 xen/include/asm-x86/xstate.h|  5 ++-
 16 files changed, 210 insertions(+), 17 deletions(-)

-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2015-11-27 Thread Huaitong Han
This patch adds pkeys support for guest_walk_tables.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/mm/guest_walk.c  | 65 +++
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 2 files changed, 67 insertions(+)

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 18d1acf..3e443b3 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -31,6 +31,8 @@ asm(".file \"" __OBJECT_FILE__ "\"");
 #include 
 #include 
 #include 
+#include 
+#include 
 
 extern const uint32_t gw_page_flags[];
 #if GUEST_PAGING_LEVELS == CONFIG_PAGING_LEVELS
@@ -90,6 +92,53 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int 
set_dirty)
 return 0;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+uint32_t leaf_pte_pkeys_check(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_access, uint32_t pte_pkeys)
+{
+bool_t pkru_ad, pkru_wd;
+bool_t ff, wf, uf, rsvdf, pkuf;
+unsigned int pkru = 0;
+
+uf = pfec & PFEC_user_mode;
+wf = pfec & PFEC_write_access;
+rsvdf = pfec & PFEC_reserved_bit;
+ff = pfec & PFEC_insn_fetch;
+pkuf = pfec & PFEC_prot_key;
+
+if ( !cpu_has_xsave || !pkuf || is_pv_vcpu(vcpu) )
+return 0;
+
+vcpu_save_fpu(vcpu);
+pkru = *(unsigned int*)get_xsave_addr(vcpu->arch.xsave_area, XSTATE_PKRU);
+if ( unlikely(pkru) )
+{
+/*
+ * PKU:  additional mechanism by which the paging controls
+ * access to user-mode addresses based on the value in the
+ * PKRU register. A fault is considered as a PKU violation if all
+ * of the following conditions are true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.page is present with no reserved bit violations.
+ * 4.the access is not an instruction fetch.
+ * 5.the access is to a user page.
+ * 6.PKRU.AD=1
+ *   or The access is a data write and PKRU.WD=1
+ *and either CR0.WP=1 or it is a user access.
+ */
+pkru_ad = READ_PKRU_AD(pkru, pte_pkeys);
+pkru_wd = READ_PKRU_WD(pkru, pte_pkeys);
+if ( hvm_pku_enabled(vcpu) && hvm_long_mode_enabled(vcpu) &&
+!rsvdf && !ff && (pkru_ad ||
+(pkru_wd && wf && (hvm_wp_enabled(vcpu) || uf
+return 1;
+}
+
+return 0;
+}
+#endif
+
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -106,6 +155,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
+uint32_t pkeys;
 #endif
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
@@ -190,6 +240,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkeys = guest_l3e_get_pkeys(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -199,6 +250,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse1G = (gflags & _PAGE_PSE) && guest_supports_1G_superpages(v); 
 
+if (pse1G && leaf_pte_pkeys_check(v, pfec, gflags, pkeys))
+rc |= _PAGE_PKEY_BIT;
+
 if ( pse1G )
 {
 /* Generate a fake l1 table entry so callers don't all 
@@ -270,6 +324,12 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse2M = (gflags & _PAGE_PSE) && guest_supports_superpages(v); 
 
+#if GUEST_PAGING_LEVELS >= 4
+pkeys = guest_l2e_get_pkeys(gw->l2e);
+if (pse2M && leaf_pte_pkeys_check(v, pfec, gflags, pkeys))
+rc |= _PAGE_PKEY_BIT;
+#endif
+
 if ( pse2M )
 {
 /* Special case: this guest VA is in a PSE superpage, so there's
@@ -330,6 +390,11 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 }
 rc |= ((gflags & mflags) ^ mflags);
+#if GUEST_PAGING_LEVELS >= 4
+pkeys = guest_l1e_get_pkeys(gw->l1e);
+if (leaf_pte_pkeys_check(v, pfec, gflags, pkeys))
+rc |= _PAGE_PKEY_BIT;
+#endif
 }
 
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index da799a0..cfbb1ef 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -276,6 +276,8 @@ int hvm_girq_dest_2_vcpu_id(struct domain *d, uint8_t dest, 
uint8_t dest_mode);
 (hvm_paging_enabled(v) && ((v)->arch.hvm_vcpu.guest_cr[4] & X86_CR4_SMAP))
 #define hvm_nx_enabled(v) \
 (!!((v)->

[Xen-devel] [V2 PATCH 3/9] x86/hvm: pkeys, add pkeys support when setting CR4

2015-11-27 Thread Huaitong Han
This patch adds pkeys support when setting CR4.

Signed-off-by: Huaitong Han 
---
 xen/arch/x86/hvm/hvm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0adafe9..0103bcb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1924,6 +1924,7 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
 leaf1_edx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_VME)];
 leaf1_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PCID)];
 leaf7_0_ebx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_FSGSBASE)];
+leaf7_0_ecx = 
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_PKU)];
 }
 
 return ~(unsigned long)
@@ -1959,7 +1960,9 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMEP) ?
   X86_CR4_SMEP : 0) |
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMAP) ?
-  X86_CR4_SMAP : 0));
+  X86_CR4_SMAP : 0) |
+  (leaf7_0_ecx & cpufeat_mask(X86_FEATURE_PKU) ?
+  X86_CR4_PKE : 0));
 }
 
 static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [V2 PATCH 4/9] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2015-11-27 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. However, Xen always uses
paging mode to emulate guest non-paging mode, so pkeys need to be manually
disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index eb6248e..4a0c95f 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1361,12 +1361,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 01/10] x86/hvm: pkeys, add pkeys support for cpuid handling

2015-11-16 Thread Huaitong Han
This patch adds pkeys support for cpuid handling.

Pkeys hardware support is CPUID.7.0.ECX[3]:PKU. Software support is
CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of CR4.PKE.

Signed-off-by: Huaitong Han 

diff --git a/tools/libxc/xc_cpufeature.h b/tools/libxc/xc_cpufeature.h
index c3ddc80..f6a9778 100644
--- a/tools/libxc/xc_cpufeature.h
+++ b/tools/libxc/xc_cpufeature.h
@@ -141,5 +141,7 @@
 #define X86_FEATURE_ADX 19 /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP20 /* Supervisor Mode Access Protection */
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
+#define X86_FEATURE_PKU 3
 
 #endif /* __LIBXC_CPUFEATURE_H */
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index e146a3e..34bb964 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -367,9 +367,11 @@ static void xc_cpuid_hvm_policy(
 bitmaskof(X86_FEATURE_ADX)  |
 bitmaskof(X86_FEATURE_SMAP) |
 bitmaskof(X86_FEATURE_FSGSBASE));
+regs[2] &= bitmaskof(X86_FEATURE_PKU);
 } else
-regs[1] = 0;
-regs[0] = regs[2] = regs[3] = 0;
+regs[1] = regs[2] = 0;
+
+regs[0] = regs[3] = 0;
 break;
 
 case 0x000d:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 615fa89..66917ff 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4518,6 +4518,12 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
unsigned int *ebx,
 /* Don't expose INVPCID to non-hap hvm. */
 if ( (count == 0) && !hap_enabled(d) )
 *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
+
+if ( (count == 0) && !(cpu_has_pku && hap_enabled(d)) )
+*ecx &= ~cpufeat_mask(X86_FEATURE_PKU);
+if ( (count == 0) && cpu_has_pku )
+*ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE) ?
+ cpufeat_mask(X86_FEATURE_OSPKE) : 0;
 break;
 case 0xb:
 /* Fix the x2APIC identifier. */
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 9a01563..3c3b95f 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -154,6 +154,10 @@
 #define X86_FEATURE_ADX(7*32+19) /* ADCX, ADOX instructions */
 #define X86_FEATURE_SMAP   (7*32+20) /* Supervisor Mode Access Prevention 
*/
 
+/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 8 */
+#define X86_FEATURE_PKU(8*32+ 3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE  (8*32+ 4) /* OS Protection Keys Enable */
+
 #if !defined(__ASSEMBLY__) && !defined(X86_FEATURES_ONLY)
 #include 
 
@@ -193,6 +197,7 @@
 
 #define cpu_has_smepboot_cpu_has(X86_FEATURE_SMEP)
 #define cpu_has_smapboot_cpu_has(X86_FEATURE_SMAP)
+#define cpu_has_pku boot_cpu_has(X86_FEATURE_PKU)
 #define cpu_has_fpu_sel (!boot_cpu_has(X86_FEATURE_NO_FPU_SEL))
 
 #define cpu_has_ffxsr   ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) \
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 04/10] x86/hvm: pkeys, add pkeys support when setting CR4

2015-11-16 Thread Huaitong Han
This patch adds pkeys support when setting CR4.

Signed-off-by: Huaitong Han 

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 66917ff..953047f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1911,6 +1911,7 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
 leaf1_edx = boot_cpu_data.x86_capability[X86_FEATURE_VME / 32];
 leaf1_ecx = boot_cpu_data.x86_capability[X86_FEATURE_PCID / 32];
 leaf7_0_ebx = boot_cpu_data.x86_capability[X86_FEATURE_FSGSBASE / 32];
+leaf7_0_ecx = boot_cpu_data.x86_capability[X86_FEATURE_PKU / 32];
 }
 
 return ~(unsigned long)
@@ -1946,7 +1947,9 @@ static unsigned long hvm_cr4_guest_reserved_bits(const 
struct vcpu *v,
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMEP) ?
   X86_CR4_SMEP : 0) |
  (leaf7_0_ebx & cpufeat_mask(X86_FEATURE_SMAP) ?
-  X86_CR4_SMAP : 0));
+  X86_CR4_SMAP : 0) |
+ (leaf7_0_ecx & cpufeat_mask(X86_FEATURE_PKU) ?
+  X86_CR4_PKE : 0));
 }
 
 static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index c1f924e..8101a1b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1310,6 +1310,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
 if ( disable_pku )
 setup_clear_cpu_cap(X86_FEATURE_PKU);
+if ( cpu_has_pku )
+set_in_cr4(X86_CR4_PKE);
 
 if ( cpu_has_fsgsbase )
 set_in_cr4(X86_CR4_FSGSBASE);
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 03/10] x86/hvm: pkeys, add the flag to enable Memory Protection Keys

2015-11-16 Thread Huaitong Han
This patch adds the flag to enable Memory Protection Keys.

Signed-off-by: Huaitong Han 

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index a565c1b..0ded4bf 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1303,6 +1303,13 @@ Flag to enable Supervisor Mode Execution Protection
 
 Flag to enable Supervisor Mode Access Prevention
 
+### pku
+> `= <boolean>`
+
+> Default: `true`
+
+Flag to enable Memory Protection Keys
+
 ### snb\_igd\_quirk
 > `=  | cap | `
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3946e4c..c1f924e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -67,6 +67,10 @@ invbool_param("smep", disable_smep);
 static bool_t __initdata disable_smap;
 invbool_param("smap", disable_smap);
 
+/* pku: Enable/disable Memory Protection Keys (default on). */
+static bool_t __initdata disable_pku;
+invbool_param("pku", disable_pku);
+
 /* Boot dom0 in pvh mode */
 static bool_t __initdata opt_dom0pvh;
 boolean_param("dom0pvh", opt_dom0pvh);
@@ -1304,6 +1308,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 if ( cpu_has_smap )
 set_in_cr4(X86_CR4_SMAP);
 
+if ( disable_pku )
+setup_clear_cpu_cap(X86_FEATURE_PKU);
+
 if ( cpu_has_fsgsbase )
 set_in_cr4(X86_CR4_FSGSBASE);
 
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 09/10] x86/hvm: pkeys, add pkeys support for guest_walk_tables

2015-11-16 Thread Huaitong Han
This patch adds pkeys support for guest_walk_tables.

Signed-off-by: Huaitong Han 

diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 773454d..7a7ae96 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -124,6 +124,46 @@ void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, 
mfn_t *mfn,
 return map;
 }
 
+#if GUEST_PAGING_LEVELS >= 4
+uint32_t leaf_pte_pkeys_check(struct vcpu *vcpu, uint32_t pfec,
+uint32_t pte_access, uint32_t pte_pkeys)
+{
+unsigned int pkru_ad, pkru_wd;
+unsigned int ff, wf, uf, rsvdf, pkuf;
+
+uf = pfec & PFEC_user_mode;
+wf = pfec & PFEC_write_access;
+rsvdf = pfec & PFEC_reserved_bit;
+ff = pfec & PFEC_insn_fetch;
+pkuf = pfec & PFEC_protection_key;
+
+if (!pkuf)
+return 0;
+
+/*
+ * PKU:  additional mechanism by which the paging controls
+* access to user-mode addresses based on the value in the
+* PKRU register. A fault is considered as a PKU violation if all
+* of the following conditions are true:
+* 1.CR4_PKE=1.
+* 2.EFER_LMA=1.
+* 3.page is present with no reserved bit violations.
+* 4.the access is not an instruction fetch.
+* 5.the access is to a user page.
+* 6.PKRU.AD=1
+*   or The access is a data write and PKRU.WD=1
+*and either CR0.WP=1 or it is a user access.
+*/
+pkru_ad = READ_PKRU_AD(pte_pkeys);
+pkru_wd = READ_PKRU_WD(pte_pkeys);
+if ( hvm_pku_enabled(vcpu) && hvm_long_mode_enabled(vcpu) &&
+!rsvdf && !ff && (pkru_ad ||
+(pkru_wd && wf && (hvm_wp_enabled(vcpu) || uf
+return 1;
+
+return 0;
+}
+#endif
 
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
@@ -141,6 +181,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 guest_l3e_t *l3p = NULL;
 guest_l4e_t *l4p;
+uint32_t pkeys;
 #endif
 uint32_t gflags, mflags, iflags, rc = 0;
 bool_t smep = 0, smap = 0;
@@ -225,6 +266,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 /* Get the l3e and check its flags*/
 gw->l3e = l3p[guest_l3_table_offset(va)];
+pkeys = guest_l3e_get_pkeys(gw->l3e);
 gflags = guest_l3e_get_flags(gw->l3e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -234,6 +276,9 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse1G = (gflags & _PAGE_PSE) && guest_supports_1G_superpages(v); 
 
+if (pse1G && leaf_pte_pkeys_check(v, pfec, gflags, pkeys))
+rc |= _PAGE_PK_BIT;
+
 if ( pse1G )
 {
 /* Generate a fake l1 table entry so callers don't all 
@@ -295,7 +340,6 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 gw->l2e = l2p[guest_l2_table_offset(va)];
 
 #endif /* All levels... */
-
 gflags = guest_l2e_get_flags(gw->l2e) ^ iflags;
 if ( !(gflags & _PAGE_PRESENT) ) {
 rc |= _PAGE_PRESENT;
@@ -305,6 +349,12 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 
 pse2M = (gflags & _PAGE_PSE) && guest_supports_superpages(v); 
 
+#if GUEST_PAGING_LEVELS >= 4
+pkeys = guest_l2e_get_pkeys(gw->l2e);
+if (pse2M && leaf_pte_pkeys_check(v, pfec, gflags, pkeys))
+rc |= _PAGE_PK_BIT;
+#endif
+
 if ( pse2M )
 {
 /* Special case: this guest VA is in a PSE superpage, so there's
@@ -365,6 +415,11 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
 goto out;
 }
 rc |= ((gflags & mflags) ^ mflags);
+#if GUEST_PAGING_LEVELS >= 4
+pkeys = guest_l1e_get_pkeys(gw->l1e);
+if (leaf_pte_pkeys_check(v, pfec, gflags, pkeys))
+rc |= _PAGE_PK_BIT;
+#endif
 }
 
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index f8a0d76..1c0f050 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -154,6 +154,17 @@ static inline u32 guest_l4e_get_flags(guest_l4e_t gl4e)
 { return l4e_get_flags(gl4e); }
 #endif
 
+static inline u32 guest_l1e_get_pkeys(guest_l1e_t gl1e)
+{ return l1e_get_pkeys(gl1e); }
+static inline u32 guest_l2e_get_pkeys(guest_l2e_t gl2e)
+{ return l2e_get_pkeys(gl2e); }
+static inline u32 guest_l3e_get_pkeys(guest_l3e_t gl3e)
+{ return l3e_get_pkeys(gl3e); }
+#if GUEST_PAGING_LEVELS >= 4
+static inline u32 guest_l4e_get_pkeys(guest_l4e_t gl4e)
+{ return l4e_get_pkeys(gl4e); }
+#endif
+
 static inline guest_l1e_t guest_l1e_from_gfn(gfn_t gfn, u32 flags)
 { return l1e_from_pfn(gfn_x(gfn), flags); }
 static inline guest_l2e_t guest_l2e_from_gfn(gfn_t gfn, u32 flags)
diff --git a/xen/

[Xen-devel] [PATCH 05/10] x86/hvm: pkeys, disable pkeys for guests in non-paging mode

2015-11-16 Thread Huaitong Han
This patch disables pkeys for guests in non-paging mode. However, Xen always uses
paging mode to emulate guest non-paging mode, so pkeys need to be manually
disabled when the guest switches to non-paging mode.

Signed-off-by: Huaitong Han 

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2582cdd..bc9c4b0 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1367,12 +1367,13 @@ static void vmx_update_guest_cr(struct vcpu *v, 
unsigned int cr)
 if ( !hvm_paging_enabled(v) )
 {
 /*
- * SMEP/SMAP is disabled if CPU is in non-paging mode in hardware.
- * However Xen always uses paging mode to emulate guest non-paging
- * mode. To emulate this behavior, SMEP/SMAP needs to be manually
- * disabled when guest VCPU is in non-paging mode.
+ * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+ * hardware. However Xen always uses paging mode to emulate guest
+ * non-paging mode. To emulate this behavior, SMEP/SMAP/PKU needs
+ * to be manually disabled when guest VCPU is in non-paging mode.
  */
-v->arch.hvm_vcpu.hw_cr[4] &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+v->arch.hvm_vcpu.hw_cr[4] &=
+~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 }
 __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
 break;
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 02/10] x86/hvm: pkeys, add pku support for x86_capability

2015-11-16 Thread Huaitong Han
This patch adds pku support for x86_capability.

Signed-off-by: Huaitong Han 

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 35ef21b..04bf4fb 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -211,7 +211,7 @@ static void __init early_cpu_detect(void)
 
 static void __cpuinit generic_identify(struct cpuinfo_x86 *c)
 {
-   u32 tfms, capability, excap, ebx;
+   u32 tfms, capability, excap, ebx, ecx;
 
/* Get vendor name */
cpuid(0x, &c->cpuid_level,
@@ -258,8 +258,9 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 
*c)
/* Intel-defined flags: level 0x0007 */
if ( c->cpuid_level >= 0x0007 ) {
u32 dummy;
-   cpuid_count(0x0007, 0, &dummy, &ebx, &dummy, &dummy);
+   cpuid_count(0x0007, 0, &dummy, &ebx, &ecx, &dummy);
c->x86_capability[X86_FEATURE_FSGSBASE / 32] = ebx;
+   c->x86_capability[X86_FEATURE_PKU / 32] = ecx;
}
 }
 
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 3c3b95f..b2899e3 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -9,7 +9,7 @@
 #define __ASM_I386_CPUFEATURE_H
 #endif
 
-#define NCAPINTS   8   /* N 32-bit words worth of info */
+#define NCAPINTS   9   /* N 32-bit words worth of info */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */
 #define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 07/10] x86/hvm: pkeys, add functions to support PKRU access/write

2015-11-16 Thread Huaitong Han
This patch adds functions to support PKRU access/write.

Signed-off-by: Huaitong Han 

diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index f507f5e..427eb84 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -336,6 +336,44 @@ static inline void write_cr4(unsigned long val)
 asm volatile ( "mov %0,%%cr4" : : "r" (val) );
 }
 
+static inline unsigned int read_pkru(void)
+{
+unsigned int eax, edx;
+unsigned int ecx = 0;
+unsigned int pkru;
+
+asm volatile(".byte 0x0f,0x01,0xee\n\t"
+ : "=a" (eax), "=d" (edx)
+ : "c" (ecx));
+pkru = eax;
+return pkru;
+}
+
+static inline void write_pkru(unsigned int pkru)
+{
+unsigned int eax = pkru;
+unsigned int ecx = 0;
+unsigned int edx = 0;
+
+asm volatile(".byte 0x0f,0x01,0xef\n\t"
+ : : "a" (eax), "c" (ecx), "d" (edx));
+}
+
+/* macros for pkru */
+#define PKRU_READ  0
+#define PKRU_WRITE 1
+#define PKRU_ATTRS 2
+
+/*
+ * PKRU is a 32-bit register holding 16 pkey "domains" with 2 attribute
+ * bits (AD/WD) each. A PTE's pkey indexes one of those domains, so
+ * pte_pkeys * PKRU_ATTRS + PKRU_READ/PKRU_WRITE is the bit offset of
+ * that domain's attribute.
+ */
+#define READ_PKRU_AD(x) ((read_pkru() >> (x * PKRU_ATTRS + PKRU_READ)) & 1)
+#define READ_PKRU_WD(x) ((read_pkru() >> (x * PKRU_ATTRS + PKRU_WRITE)) & 1)
+
+
+
 /* Clear and set 'TS' bit respectively */
 static inline void clts(void) 
 {
-- 
2.4.3
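
The .byte sequences above encode RDPKRU (0f 01 ee) and WRPKRU (0f 01 ef). A rough standalone sketch (not part of the patch; pkru below is a sample value rather than one read with RDPKRU) mirroring the READ_PKRU_AD()/READ_PKRU_WD() arithmetic to show how the AD/WD bits for a given protection key are located:

#include <stdint.h>
#include <stdio.h>

#define PKRU_READ  0
#define PKRU_WRITE 1
#define PKRU_ATTRS 2

/* Same bit arithmetic as the patch's READ_PKRU_AD()/READ_PKRU_WD(),
 * but applied to a caller-supplied PKRU value instead of RDPKRU. */
static unsigned int pkru_ad(uint32_t pkru, unsigned int pkey)
{
    return (pkru >> (pkey * PKRU_ATTRS + PKRU_READ)) & 1;
}

static unsigned int pkru_wd(uint32_t pkru, unsigned int pkey)
{
    return (pkru >> (pkey * PKRU_ATTRS + PKRU_WRITE)) & 1;
}

int main(void)
{
    /* Sample value: key 1 is write-disabled (WD1 = bit 3), key 2 is
     * access-disabled (AD2 = bit 4); all other keys are unrestricted. */
    uint32_t pkru = (1u << 3) | (1u << 4);
    unsigned int key;

    for ( key = 0; key < 16; key++ )
        printf("pkey %2u: AD=%u WD=%u\n", key,
               pkru_ad(pkru, key), pkru_wd(pkru, key));
    return 0;
}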




[Xen-devel] [PATCH 00/10] x86/hvm: pkeys, add memory protection-key support

2015-11-16 Thread Huaitong Han
The protection-key feature provides an additional mechanism by which IA-32e
paging controls access to usermode addresses.

Hardware support for protection keys for user pages is enumerated with CPUID
feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
with the setting of CR4.PKE(bit 22).

When CR4.PKE = 1, every linear address is associated with the 4-bit protection
key located in bits 62:59 of the paging-structure entry that mapped the page
containing the linear address. The PKRU register determines, for each
protection key, whether user-mode addresses with that protection key may be
read or written.

The PKRU register (protection key rights for user pages) is a 32-bit register
with the following format: for each i (0 ≤ i ≤ 15), PKRU[2i] is the
access-disable bit for protection key i (ADi); PKRU[2i+1] is the write-disable
bit for protection key i (WDi).
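
For example, for protection key i = 5 the relevant bits are PKRU[10] (AD5) and PKRU[11] (WD5), so a PKRU value of 0x800 (only bit 11 set) leaves pages tagged with key 5 readable but not writable from user mode.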

Software can use the RDPKRU and WRPKRU instructions with ECX = 0 to read and
write PKRU. In addition, the PKRU register is XSAVE-managed state and can thus
be read and written by instructions in the XSAVE feature set.

PFEC.PK (bit 5) is defined as protection key violations.

The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

Huaitong Han (10):
  x86/hvm: pkeys, add pkeys support for cpuid handling
  x86/hvm: pkeys, add pku support for x86_capability
  x86/hvm: pkeys, add the flag to enable Memory Protection Keys
  x86/hvm: pkeys, add pkeys support when setting CR4
  x86/hvm: pkeys, disable pkeys for guests in non-paging mode
  x86/hvm: pkeys, add functions to get pkeys value from PTE
  x86/hvm: pkeys, add functions to support PKRU access/write
  x86/hvm: pkeys, add pkeys support for do_page_fault
  x86/hvm: pkeys, add pkeys support for guest_walk_tables
  x86/hvm: pkeys, add xstate support for pkeys

 docs/misc/xen-command-line.markdown |  7 +
 tools/libxc/xc_cpufeature.h |  2 ++
 tools/libxc/xc_cpuid_x86.c  |  6 ++--
 xen/arch/x86/cpu/common.c   |  5 ++--
 xen/arch/x86/hvm/hvm.c  | 11 ++-
 xen/arch/x86/hvm/vmx/vmx.c  | 11 +++
 xen/arch/x86/mm/guest_walk.c| 57 -
 xen/arch/x86/setup.c|  9 ++
 xen/arch/x86/traps.c| 47 +-
 xen/include/asm-x86/cpufeature.h|  7 -
 xen/include/asm-x86/guest_pt.h  | 11 +++
 xen/include/asm-x86/hvm/hvm.h   |  2 ++
 xen/include/asm-x86/page.h  |  7 +
 xen/include/asm-x86/processor.h | 53 +-
 xen/include/asm-x86/x86_64/page.h   | 14 +
 xen/include/asm-x86/xstate.h|  3 +-
 16 files changed, 225 insertions(+), 27 deletions(-)

-- 
2.4.3




[Xen-devel] [PATCH 10/10] x86/hvm: pkeys, add xstate support for pkeys

2015-11-16 Thread Huaitong Han
This patch adds xstate support for pkeys.

Signed-off-by: Huaitong Han 

diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 4c690db..5674f3e 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -33,13 +33,14 @@
 #define XSTATE_OPMASK  (1ULL << 5)
 #define XSTATE_ZMM (1ULL << 6)
 #define XSTATE_HI_ZMM  (1ULL << 7)
+#define XSTATE_PKRU(1ULL << 9)
 #define XSTATE_LWP (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK(XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | \
 XSTATE_ZMM | XSTATE_HI_ZMM | XSTATE_NONLAZY)
 
 #define XSTATE_ALL (~(1ULL << 63))
-#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
+#define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | 
XSTATE_PKRU)
 #define XSTATE_LAZY(XSTATE_ALL & ~XSTATE_NONLAZY)
 
 extern u64 xfeature_mask;
-- 
2.4.3
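
As background (not part of the patch): because PKRU is XSAVE-managed state component 9, its size and offset in the standard-format XSAVE area can be enumerated from CPUID leaf 0xD, sub-leaf 9. A minimal user-space sketch, assuming a GCC/Clang toolchain:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(0, &eax, &ebx, &ecx, &edx) || eax < 0xd )
    {
        puts("CPUID leaf 0xD not supported");
        return 1;
    }

    /* CPUID.(EAX=0xD, ECX=9): size (EAX) and offset (EBX) of the PKRU
     * state component in the non-compacted XSAVE area. */
    __cpuid_count(0xd, 9, eax, ebx, ecx, edx);

    if ( eax == 0 )
        puts("XSTATE_PKRU (state component 9) not enumerated");
    else
        printf("PKRU state: %u bytes at offset %u\n", eax, ebx);
    return 0;
}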




[Xen-devel] [PATCH 06/10] x86/hvm: pkeys, add functions to get pkeys value from PTE

2015-11-16 Thread Huaitong Han
This patch adds functions to get pkeys value from PTE.

Signed-off-by: Huaitong Han 

diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 87b3341..1cdbfc8 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -93,6 +93,13 @@
 #define l3e_get_flags(x)   (get_pte_flags((x).l3))
 #define l4e_get_flags(x)   (get_pte_flags((x).l4))
 
+/* Get pte pkeys (unsigned int). */
+#define l1e_get_pkeys(x)   (get_pte_pkeys((x).l1))
+#define l2e_get_pkeys(x)   (get_pte_pkeys((x).l2))
+#define l3e_get_pkeys(x)   (get_pte_pkeys((x).l3))
+#define l4e_get_pkeys(x)   (get_pte_pkeys((x).l4))
+
+
 /* Construct an empty pte. */
 #define l1e_empty()((l1_pgentry_t) { 0 })
 #define l2e_empty()((l2_pgentry_t) { 0 })
diff --git a/xen/include/asm-x86/x86_64/page.h 
b/xen/include/asm-x86/x86_64/page.h
index 19ab4d0..03418ba 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -134,6 +134,18 @@ typedef l4_pgentry_t root_pgentry_t;
 #define get_pte_flags(x) (((int)((x) >> 40) & ~0xFFF) | ((int)(x) & 0xFFF))
 #define put_pte_flags(x) (((intpte_t)((x) & ~0xFFF) << 40) | ((x) & 0xFFF))
 
+/*
+ * Protection keys define a new 4-bit protection key field
+ * (PKEY) in bits 62:59 of leaf entries of the page tables.
+ * This corresponds to bits 22:19 of the 24-bit pte flags value.
+ */
+#define _PAGE_PKEY_BIT0 19   /* Protection Keys, bit 1/4 */
+#define _PAGE_PKEY_BIT1 20   /* Protection Keys, bit 2/4 */
+#define _PAGE_PKEY_BIT2 21   /* Protection Keys, bit 3/4 */
+#define _PAGE_PKEY_BIT3 22   /* Protection Keys, bit 4/4 */
+
+#define get_pte_pkeys(x) ((int)(get_pte_flags(x) >> _PAGE_PKEY_BIT0) & 0xF)
+
 /* Bit 23 of a 24-bit flag mask. This corresponds to bit 63 of a pte.*/
 #define _PAGE_NX_BIT (1U<<23)
 
-- 
2.4.3
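
As an illustration (not part of the patch): the pkey occupies PTE bits 62:59, and get_pte_flags() places PTE bits 63:52 at bits 23:12 of the 24-bit flags value, so the key lands at flag bits 22:19. A standalone sketch of that equivalence, using a hypothetical sample PTE:

#include <stdint.h>
#include <stdio.h>

/* Same folding as Xen's get_pte_flags(): keep the low 12 flag bits
 * (PTE bits 11:0) and place PTE bits 63:52 at flag bits 23:12. */
#define GET_PTE_FLAGS(x) (((int)((x) >> 40) & ~0xFFF) | ((int)(x) & 0xFFF))
#define _PAGE_PKEY_BIT0  19
#define GET_PTE_PKEYS(x) ((int)(GET_PTE_FLAGS(x) >> _PAGE_PKEY_BIT0) & 0xF)

int main(void)
{
    /* Hypothetical leaf PTE with protection key 0xA in bits 62:59 and
     * P/RW/US/A/D set in the low flag bits. */
    uint64_t pte = ((uint64_t)0xA << 59) | 0x067;

    printf("pkey via flags:      %d\n", GET_PTE_PKEYS(pte));
    printf("pkey via bits 62:59: %d\n", (int)((pte >> 59) & 0xF));
    return 0;
}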




[Xen-devel] [PATCH 08/10] x86/hvm: pkeys, add pkeys support for do_page_fault

2015-11-16 Thread Huaitong Han
This patch adds pkeys support for do_page_fault.

The protection-keys architecture defines a new status bit in the PFEC: PFEC.PK
(bit 5) is set to 1 if and only if protection keys block the access.

Protection keys block an access and induce a page fault if and only if
1.Protection keys are enabled (CR4.PKE=1 and EFER.LMA=1), and
2.The page has a valid translation (page is present with no reserved bit
  violations), and
3.The access is not an instruction fetch, and
4.The access is to a user page, and
5.At least one of the following restrictions applies:
--The access is a data read or data write and AD=1, or
--The access is a data write and WD=1 and either CR0.WP=1 or (CR0.WP=0 and
it is a user access)
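
A condensed sketch of the resulting check (illustrative only, not the patch code; the boolean inputs stand for the PFEC bits and the relevant CR0/CR4/PKRU state):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: each input is assumed to have been derived from the
 * page-fault error code (PFEC), CR0/CR4 and the PKRU register. */
static bool pkey_blocks_access(bool pke, bool lma, bool rsvdf, bool ff,
                               bool user_page, bool wf, bool uf, bool wp,
                               bool pkru_ad, bool pkru_wd)
{
    if ( !pke || !lma )   /* 1. protection keys enabled (CR4.PKE, EFER.LMA) */
        return false;
    if ( rsvdf )          /* 2. valid translation (no reserved-bit fault)   */
        return false;
    if ( ff )             /* 3. not an instruction fetch                    */
        return false;
    if ( !user_page )     /* 4. the access is to a user page                */
        return false;
    /* 5. AD blocks any data access; WD blocks writes when CR0.WP=1 or the
     *    write itself comes from user mode. */
    return pkru_ad || (pkru_wd && wf && (wp || uf));
}

int main(void)
{
    /* Kernel write to a user page whose key has WD set, with CR0.WP=1. */
    printf("%d\n", pkey_blocks_access(true, true, false, false,
                                      true, true, false, true,
                                      false, true));   /* prints 1 */
    return 0;
}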

Signed-off-by: Huaitong Han 

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 9f5a6c6..73abb3b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1287,7 +1287,7 @@ enum pf_type {
 spurious_fault
 };
 
-static enum pf_type __page_fault_type(
+static enum pf_type __page_fault_type(struct vcpu *vcpu,
 unsigned long addr, const struct cpu_user_regs *regs)
 {
 unsigned long mfn, cr3 = read_cr3();
@@ -1295,7 +1295,7 @@ static enum pf_type __page_fault_type(
 l3_pgentry_t l3e, *l3t;
 l2_pgentry_t l2e, *l2t;
 l1_pgentry_t l1e, *l1t;
-unsigned int required_flags, disallowed_flags, page_user;
+unsigned int required_flags, disallowed_flags, page_user, pte_pkeys;
 unsigned int error_code = regs->error_code;
 
 /*
@@ -1340,6 +1340,7 @@ static enum pf_type __page_fault_type(
 if ( ((l3e_get_flags(l3e) & required_flags) != required_flags) ||
  (l3e_get_flags(l3e) & disallowed_flags) )
 return real_fault;
+pte_pkeys = l3e_get_pkeys(l3e);
 page_user &= l3e_get_flags(l3e);
 if ( l3e_get_flags(l3e) & _PAGE_PSE )
 goto leaf;
@@ -1351,6 +1352,7 @@ static enum pf_type __page_fault_type(
 if ( ((l2e_get_flags(l2e) & required_flags) != required_flags) ||
  (l2e_get_flags(l2e) & disallowed_flags) )
 return real_fault;
+pte_pkeys = l2e_get_pkeys(l2e);
 page_user &= l2e_get_flags(l2e);
 if ( l2e_get_flags(l2e) & _PAGE_PSE )
 goto leaf;
@@ -1362,12 +1364,22 @@ static enum pf_type __page_fault_type(
 if ( ((l1e_get_flags(l1e) & required_flags) != required_flags) ||
  (l1e_get_flags(l1e) & disallowed_flags) )
 return real_fault;
+pte_pkeys = l1e_get_pkeys(l1e);
 page_user &= l1e_get_flags(l1e);
 
 leaf:
 if ( page_user )
 {
 unsigned long cr4 = read_cr4();
+unsigned int ff, wf, uf, rsvdf, pkuf;
+unsigned int pkru_ad, pkru_wd;
+
+uf = error_code & PFEC_user_mode;
+wf = error_code & PFEC_write_access;
+rsvdf = error_code & PFEC_reserved_bit;
+ff = error_code & PFEC_insn_fetch;
+pkuf = error_code & PFEC_protection_key;
+
 /*
  * Supervisor Mode Execution Prevention (SMEP):
  * Disallow supervisor execution from user-accessible mappings
@@ -1386,15 +1398,35 @@ leaf:
  *   - CPL=3 or X86_EFLAGS_AC is clear
  *   - Page fault in kernel mode
  */
-if ( (cr4 & X86_CR4_SMAP) && !(error_code & PFEC_user_mode) &&
+if ( (cr4 & X86_CR4_SMAP) && !uf &&
  (((regs->cs & 3) == 3) || !(regs->eflags & X86_EFLAGS_AC)) )
 return smap_fault;
+ /*
+ * PKU: additional mechanism by which the paging controls access to
+ * user-mode addresses based on the value in the PKRU register. A fault
+ * is considered a PKU violation if all of the following conditions are
+ * true:
+ * 1.CR4_PKE=1.
+ * 2.EFER_LMA=1.
+ * 3.page is present with no reserved bit violations.
+ * 4.the access is not an instruction fetch.
+ * 5.the access is to a user page.
+ * 6.PKRU.AD=1
+ *   or the access is a data write and PKRU.WD=1
+ *   and either CR0.WP=1 or it is a user access.
+ */
+ pkru_ad = READ_PKRU_AD(pte_pkeys);
+ pkru_wd = READ_PKRU_WD(pte_pkeys);
+ if ( pkuf && (cr4 & X86_CR4_PKE) && hvm_long_mode_enabled(vcpu) &&
+      !rsvdf && !ff && (pkru_ad ||
+      (pkru_wd && wf && (hvm_wp_enabled(vcpu) || uf))) )
+     return real_fault;
 }
 
 return spurious_fault;
 }
 
-static enum pf_type spurious_page_fault(
+static enum pf_type spurious_page_fault(struct vcpu *vcpu,
 unsigned long addr, const struct cpu_user_regs *regs)
 {
 unsigned long flags;
@@ -1405,7 +1437,7 @@ static enum pf_type spurious_page_fault(
  * page tables from becoming invalid under our feet during the walk.
  */
 local_irq_save(flags);
-pf_type = __page_fault_type(addr, regs);
+pf_type = __page_fault_type(vcpu, addr, regs);

[Xen-devel] [V7] x86/cpuidle: get accurate C0 value with xenpm tool

2015-05-21 Thread Huaitong Han
When checking the ACPI function of C-states, the sampled C0 value reported
by the xenpm tool decreases after a 100-second sleep.
Because C0 = NOW() - C1 - C2 - C3 - C4, when NOW() is taken during idle
time it is larger than the last C-state update time, so the C0 value is
also larger than the true value. If the margin of the second error cannot
make up for the margin of the first error, the value of C0 decreases.
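
A worked illustration with hypothetical numbers: suppose the C-state statistics were last updated at t = 100 s and the CPU has been sitting idle in C1 since then. If xenpm samples at NOW() = 110 s, the old calculation reports C0 = 110 s - (C1 + C2 + C3 + C4), but C1 still only contains time accumulated up to t = 100 s, so the 10 idle seconds are wrongly credited to C0. Once the CPU leaves C1 and its residency is updated, a later sample moves those 10 seconds back into C1, and the reported C0 value can be smaller than in the earlier sample.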

Signed-off-by: Huaitong Han 

---
ChangeLog:
V7:
delete uint64_t;
delete extern variable and keep variable static.

V6:
Add blank lines.
delete res_ticks[] and a wrong comment.

V5:
Ticks clock source may be acpi_pm, so use the common function "ticks_elapsed".
Take every "tick_to_ns" outside the spin_lock.
Split the "for" loop.

V4:
delete pointless initializers and hard tabs.

V3:
1.Don't use tick_to_ns inside lock in print_acpi_power.
2.Use 08 padding in printk.
3.Merge the two "for" loops into one for coding style.

V2:
C0 = last_cx_update_time-C1-C2-C3-C4, but last_cx_update_time is not now,
so the C0 value is stale, NOW-last_update_time should be calculated.
C[current_cx_stat]+=NOW-last_update_time, so the CX value is fresh.

V1:
Initial patch
---

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index e639c99..39b5e4d 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -252,11 +252,33 @@ static char* acpi_cstate_method_name[] =
 "HALT"
 };
 
+static uint64_t get_stime_tick(void) { return (uint64_t)NOW(); }
+static uint64_t stime_ticks_elapsed(uint64_t t1, uint64_t t2) { return t2 - 
t1; }
+static uint64_t stime_tick_to_ns(uint64_t ticks) { return ticks; }
+
+static uint64_t get_acpi_pm_tick(void) { return (uint64_t)inl(pmtmr_ioport); }
+static uint64_t acpi_pm_ticks_elapsed(uint64_t t1, uint64_t t2)
+{
+if ( t2 >= t1 )
+return (t2 - t1);
+else if ( !(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER) )
+return (((0x00FFFFFF - t1) + t2 + 1) & 0x00FFFFFF);
+else
+return ((0xFFFFFFFF - t1) + t2 + 1);
+}
+
+uint64_t (*__read_mostly cpuidle_get_tick)(void) = get_acpi_pm_tick;
+static uint64_t (*__read_mostly ticks_elapsed)(uint64_t, uint64_t)
+= acpi_pm_ticks_elapsed;
+
+
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
-uint32_t i, idle_usage = 0;
-uint64_t res, idle_res = 0;
-u32 usage;
+uint64_t idle_res = 0, idle_usage = 0;
+uint64_t last_state_update_tick, current_tick, current_stime;
+uint64_t usage[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+uint64_t res_tick[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+unsigned int i;
 u8 last_state_idx;
 
 printk("==cpu%d==\n", cpu);
@@ -264,28 +286,37 @@ static void print_acpi_power(uint32_t cpu, struct 
acpi_processor_power *power)
 printk("active state:\t\tC%d\n", last_state_idx);
 printk("max_cstate:\t\tC%d\n", max_cstate);
 printk("states:\n");
-
+
+spin_lock_irq(&power->stat_lock);
+current_tick = cpuidle_get_tick();
+current_stime = NOW();
 for ( i = 1; i < power->count; i++ )
 {
-spin_lock_irq(&power->stat_lock);  
-res = tick_to_ns(power->states[i].time);
-usage = power->states[i].usage;
-spin_unlock_irq(&power->stat_lock);
+res_tick[i] = power->states[i].time;
+usage[i] = power->states[i].usage;
+}
+last_state_update_tick = power->last_state_update_tick;
+spin_unlock_irq(&power->stat_lock);
 
-idle_usage += usage;
-idle_res += res;
+res_tick[last_state_idx] += ticks_elapsed(last_state_update_tick, 
current_tick);
+usage[last_state_idx]++;
+
+for ( i = 1; i < power->count; i++ )
+{
+idle_usage += usage[i];
+idle_res += tick_to_ns(res_tick[i]);
 
 printk((last_state_idx == i) ? "   *" : "");
 printk("C%d:\t", i);
 printk("type[C%d] ", power->states[i].type);
 printk("latency[%03d] ", power->states[i].latency);
-printk("usage[%08d] ", usage);
+printk("usage[%08"PRIu64"] ", usage[i]);
 printk("method[%5s] ", 
acpi_cstate_method_name[power->states[i].entry_method]);
-printk("duration[%"PRId64"]\n", res);
+printk("duration[%"PRIu64"]\n", tick_to_ns(res_tick[i]));
 }
 printk((last_state_idx == 0) ? "   *" : "");
-printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
-   idle_usage, NOW() - idle_res);
+printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n",
+   usage[0] + idle_usage, current_stime - idle_res);
 
 print_hw_residencies(cpu);
 }
@@ -313,24 +344,6 @@ static int __init cpu_idl

[Xen-devel] [V6] x86/cpuidle: get accurate C0 value with xenpm tool

2015-05-20 Thread Huaitong Han
When checking the ACPI function of C-states, the sampled C0 value reported
by the xenpm tool decreases after a 100-second sleep.
Because C0 = NOW() - C1 - C2 - C3 - C4, when NOW() is taken during idle
time it is larger than the last C-state update time, so the C0 value is
also larger than the true value. If the margin of the second error cannot
make up for the margin of the first error, the value of C0 decreases.

Signed-off-by: Huaitong Han 

---
ChangeLog:
V6:
Add blank lines.
delete res_ticks[] and a wrong comment.

V5:
Ticks clock source may be acpi_pm, so use the common function "ticks_elapsed".
Take every "tick_to_ns" outside the spin_lock.
Split the "for" loop.

V4:
delete pointless initializers and hard tabs.

V3:
1.Don't use tick_to_ns inside lock in print_acpi_power.
2.Use 08 padding in printk.
3.Merge the two "for" loops into one for coding style.

V2:
C0 = last_cx_update_time-C1-C2-C3-C4, but last_cx_update_time is not now,
so the C0 value is stale, NOW-last_update_time should be calculated.
C[current_cx_stat]+=NOW-last_update_time, so the CX value is fresh.

V1:
Initial patch
---

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index e639c99..d16e3e4 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -254,9 +254,11 @@ static char* acpi_cstate_method_name[] =
 
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
-uint32_t i, idle_usage = 0;
-uint64_t res, idle_res = 0;
-u32 usage;
+uint64_t idle_res = 0, idle_usage = 0;
+uint64_t last_state_update_tick, current_tick, current_stime;
+uint64_t usage[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+uint64_t res_tick[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+unsigned int i;
 u8 last_state_idx;
 
 printk("==cpu%d==\n", cpu);
@@ -264,28 +266,37 @@ static void print_acpi_power(uint32_t cpu, struct 
acpi_processor_power *power)
 printk("active state:\t\tC%d\n", last_state_idx);
 printk("max_cstate:\t\tC%d\n", max_cstate);
 printk("states:\n");
-
+
+spin_lock_irq(&power->stat_lock);
+current_tick = cpuidle_get_tick();
+current_stime = NOW();
 for ( i = 1; i < power->count; i++ )
 {
-spin_lock_irq(&power->stat_lock);  
-res = tick_to_ns(power->states[i].time);
-usage = power->states[i].usage;
-spin_unlock_irq(&power->stat_lock);
+res_tick[i] = power->states[i].time;
+usage[i] = power->states[i].usage;
+}
+last_state_update_tick = power->last_state_update_tick;
+spin_unlock_irq(&power->stat_lock);
 
-idle_usage += usage;
-idle_res += res;
+res_tick[last_state_idx] += ticks_elapsed(last_state_update_tick, 
current_tick);
+usage[last_state_idx]++;
+
+for ( i = 1; i < power->count; i++ )
+{
+idle_usage += usage[i];
+idle_res += tick_to_ns(res_tick[i]);
 
 printk((last_state_idx == i) ? "   *" : "");
 printk("C%d:\t", i);
 printk("type[C%d] ", power->states[i].type);
 printk("latency[%03d] ", power->states[i].latency);
-printk("usage[%08d] ", usage);
+printk("usage[%08"PRIu64"] ", usage[i]);
 printk("method[%5s] ", 
acpi_cstate_method_name[power->states[i].entry_method]);
-printk("duration[%"PRId64"]\n", res);
+printk("duration[%"PRIu64"]\n", tick_to_ns(res_tick[i]));
 }
 printk((last_state_idx == 0) ? "   *" : "");
-printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
-   idle_usage, NOW() - idle_res);
+printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n",
+   usage[0] + idle_usage, current_stime - idle_res);
 
 print_hw_residencies(cpu);
 }
@@ -329,7 +340,7 @@ static uint64_t acpi_pm_ticks_elapsed(uint64_t t1, uint64_t 
t2)
 }
 
 uint64_t (*__read_mostly cpuidle_get_tick)(void) = get_acpi_pm_tick;
-static uint64_t (*__read_mostly ticks_elapsed)(uint64_t, uint64_t)
+uint64_t (*__read_mostly ticks_elapsed)(uint64_t, uint64_t)
 = acpi_pm_ticks_elapsed;
 
 /*
@@ -486,6 +497,17 @@ bool_t errata_c6_eoi_workaround(void)
 return (fix_needed && cpu_has_pending_apic_eoi());
 }
 
+void update_last_cx_stat(struct acpi_processor_power *power,
+ struct acpi_processor_cx *cx, uint64_t ticks)
+{
+ASSERT(!local_irq_is_enabled());
+
+spin_lock(&power->stat_lock);
+power->last_state = cx;
+power->last_state_update_tick = ticks;
+spin_unlock(&power->stat_lock);
+}
+
 void update_idle_stats(struct acpi_processor_power *power,
struct acpi_processor_cx *cx,
   

[Xen-devel] [V5] x86/cpuidle: get accurate C0 value with xenpm tool

2015-05-13 Thread Huaitong Han
When checking the ACPI function of C-states, the sampled C0 value reported
by the xenpm tool decreases after a 100-second sleep.
Because C0 = NOW() - C1 - C2 - C3 - C4, when NOW() is taken during idle
time it is larger than the last C-state update time, so the C0 value is
also larger than the true value. If the margin of the second error cannot
make up for the margin of the first error, the value of C0 decreases.

Signed-off-by: Huaitong Han 

---
ChangeLog:
V5:
Ticks clock source may be acpi_pm, so use the common function "ticks_elapsed".
Take every "tick_to_ns" outside the spin_lock.
Split the "for" loop.

V4:
delete pointless initializers and hard tabs.

V3:
1.Don't use tick_to_ns inside lock in print_acpi_power.
2.Use 08 padding in printk.
3.Merge the two "for" loops into one for coding style.

V2:
C0 = last_cx_update_time-C1-C2-C3-C4, but last_cx_update_time is not now,
so the C0 value is stale, NOW-last_update_time should be calculated.
C[current_cx_stat]+=NOW-last_update_time, so the CX value is fresh.

V1:
Initial patch
---

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index e639c99..07ee3a2 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -254,9 +254,11 @@ static char* acpi_cstate_method_name[] =
 
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
-uint32_t i, idle_usage = 0;
-uint64_t res, idle_res = 0;
-u32 usage;
+uint64_t idle_res = 0, idle_usage = 0;
+uint64_t last_state_update_tick, current_tick, current_stime;
+uint64_t usage[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+uint64_t res_tick[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+unsigned int i;
 u8 last_state_idx;
 
 printk("==cpu%d==\n", cpu);
@@ -264,28 +266,37 @@ static void print_acpi_power(uint32_t cpu, struct 
acpi_processor_power *power)
 printk("active state:\t\tC%d\n", last_state_idx);
 printk("max_cstate:\t\tC%d\n", max_cstate);
 printk("states:\n");
-
+
+spin_lock_irq(&power->stat_lock);
+current_tick = cpuidle_get_tick();
+current_stime = NOW();
 for ( i = 1; i < power->count; i++ )
 {
-spin_lock_irq(&power->stat_lock);  
-res = tick_to_ns(power->states[i].time);
-usage = power->states[i].usage;
-spin_unlock_irq(&power->stat_lock);
+res_tick[i] = power->states[i].time;
+usage[i] = power->states[i].usage;
+}
+last_state_update_tick = power->last_state_update_tick;
+spin_unlock_irq(&power->stat_lock);
+
+res_tick[last_state_idx] += ticks_elapsed(last_state_update_tick, 
current_tick);
+usage[last_state_idx]++;
 
-idle_usage += usage;
-idle_res += res;
+for ( i = 1; i < power->count; i++ )
+{
+idle_usage += usage[i];
+idle_res += tick_to_ns(res_tick[i]);
 
 printk((last_state_idx == i) ? "   *" : "");
 printk("C%d:\t", i);
 printk("type[C%d] ", power->states[i].type);
 printk("latency[%03d] ", power->states[i].latency);
-printk("usage[%08d] ", usage);
+printk("usage[%08"PRIu64"] ", usage[i]);
 printk("method[%5s] ", 
acpi_cstate_method_name[power->states[i].entry_method]);
-printk("duration[%"PRId64"]\n", res);
+printk("duration[%"PRIu64"]\n", tick_to_ns(res_tick[i]));
 }
 printk((last_state_idx == 0) ? "   *" : "");
-printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
-   idle_usage, NOW() - idle_res);
+printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n",
+   usage[0] + idle_usage, current_stime - idle_res);
 
 print_hw_residencies(cpu);
 }
@@ -329,7 +340,7 @@ static uint64_t acpi_pm_ticks_elapsed(uint64_t t1, uint64_t 
t2)
 }
 
 uint64_t (*__read_mostly cpuidle_get_tick)(void) = get_acpi_pm_tick;
-static uint64_t (*__read_mostly ticks_elapsed)(uint64_t, uint64_t)
+uint64_t (*__read_mostly ticks_elapsed)(uint64_t, uint64_t)
 = acpi_pm_ticks_elapsed;
 
 /*
@@ -486,6 +497,17 @@ bool_t errata_c6_eoi_workaround(void)
 return (fix_needed && cpu_has_pending_apic_eoi());
 }
 
+void update_last_cx_stat(struct acpi_processor_power *power,
+ struct acpi_processor_cx *cx, uint64_t ticks)
+{
+ASSERT(!local_irq_is_enabled());
+
+spin_lock(&power->stat_lock);
+power->last_state = cx;
+power->last_state_update_tick = ticks;
+spin_unlock(&power->stat_lock);
+}
+
 void update_idle_stats(struct acpi_processor_power *power,
struct acpi_processor_cx *cx,
uint64_t before, uint64_t after)
@@ -501,6 +523,8 

[Xen-devel] [V4] x86/cpuidle: get accurate C0 value with xenpm tool

2015-05-11 Thread Huaitong Han
When checking the ACPI function of C-states, the sampled C0 value reported
by the xenpm tool decreases after a 100-second sleep.
Because C0 = NOW() - C1 - C2 - C3 - C4, when NOW() is taken during idle
time it is larger than the last C-state update time, so the C0 value is
also larger than the true value. If the margin of the second error cannot
make up for the margin of the first error, the value of C0 decreases.

Signed-off-by: Huaitong Han 

---
ChangeLog:
V4:
delete pointless initializers and hard tabs.

V3:
1.Don't use tick_to_ns inside lock in print_acpi_power.
2.Use 08 padding in printk.
3.Merge the two "for" loops into one for coding style.

V2:
C0 = last_cx_update_time-C1-C2-C3-C4, but last_cx_update_time is not now,
so the C0 value is stale, NOW-last_update_time should be calculated.
C[current_cx_stat]+=NOW-last_update_time, so the CX value is fresh.

V1:
Initial patch
---

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index e639c99..9f7ccc4 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -254,9 +254,10 @@ static char* acpi_cstate_method_name[] =
 
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
-uint32_t i, idle_usage = 0;
-uint64_t res, idle_res = 0;
-u32 usage;
+uint64_t idle_res = 0, idle_usage = 0, last_state_update_tick, now;
+uint64_t usage[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+uint64_t res[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+unsigned int i;
 u8 last_state_idx;
 
 printk("==cpu%d==\n", cpu);
@@ -264,28 +265,36 @@ static void print_acpi_power(uint32_t cpu, struct 
acpi_processor_power *power)
 printk("active state:\t\tC%d\n", last_state_idx);
 printk("max_cstate:\t\tC%d\n", max_cstate);
 printk("states:\n");
-
+
+spin_lock_irq(&power->stat_lock);
+now = NOW();
 for ( i = 1; i < power->count; i++ )
 {
-spin_lock_irq(&power->stat_lock);  
-res = tick_to_ns(power->states[i].time);
-usage = power->states[i].usage;
-spin_unlock_irq(&power->stat_lock);
+res[i] = tick_to_ns(power->states[i].time);
+usage[i] = power->states[i].usage;
+}
+last_state_update_tick = power->last_state_update_tick;
+spin_unlock_irq(&power->stat_lock);
+
+res[last_state_idx] += now - tick_to_ns(last_state_update_tick);
+usage[last_state_idx] += 1;
 
-idle_usage += usage;
-idle_res += res;
+for ( i = 1; i < power->count; i++ )
+{
+idle_usage += usage[i];
+idle_res += res[i];
 
 printk((last_state_idx == i) ? "   *" : "");
 printk("C%d:\t", i);
 printk("type[C%d] ", power->states[i].type);
 printk("latency[%03d] ", power->states[i].latency);
-printk("usage[%08d] ", usage);
+printk("usage[%08"PRIu64"] ", usage[i]);
 printk("method[%5s] ", 
acpi_cstate_method_name[power->states[i].entry_method]);
-printk("duration[%"PRId64"]\n", res);
+printk("duration[%"PRIu64"]\n", res[i]);
 }
 printk((last_state_idx == 0) ? "   *" : "");
-printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
-   idle_usage, NOW() - idle_res);
+printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n",
+   usage[0] + idle_usage, res[0] + tick_to_ns(last_state_update_tick) 
- idle_res);
 
 print_hw_residencies(cpu);
 }
@@ -486,6 +495,17 @@ bool_t errata_c6_eoi_workaround(void)
 return (fix_needed && cpu_has_pending_apic_eoi());
 }
 
+void update_last_cx_stat(struct acpi_processor_power *power,
+ struct acpi_processor_cx *cx, uint64_t ticks)
+{
+ASSERT(!local_irq_is_enabled());
+
+spin_lock(&power->stat_lock);
+power->last_state = cx;
+power->last_state_update_tick = ticks;
+spin_unlock(&power->stat_lock);
+}
+
 void update_idle_stats(struct acpi_processor_power *power,
struct acpi_processor_cx *cx,
uint64_t before, uint64_t after)
@@ -501,6 +521,8 @@ void update_idle_stats(struct acpi_processor_power *power,
 power->last_residency = tick_to_ns(sleep_ticks) / 1000UL;
 cx->time += sleep_ticks;
 }
+power->last_state = &power->states[0];
+power->last_state_update_tick = after;
 
 spin_unlock(&power->stat_lock);
 }
@@ -557,7 +579,6 @@ static void acpi_processor_idle(void)
 if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() )
 cx = power->safe_state;
 
-power->last_state = cx;
 
 /*
  * Sleep:
@@ -574,6 +595,7 @@ static void acpi_processor_idle(voi

[Xen-devel] [V3] x86/cpuidle: get accurate C0 value with xenpm tool

2015-05-08 Thread Huaitong Han
When checking the ACPI function of C-states, the sampled C0 value reported
by the xenpm tool decreases after a 100-second sleep.
Because C0 = NOW() - C1 - C2 - C3 - C4, when NOW() is taken during idle
time it is larger than the last C-state update time, so the C0 value is
also larger than the true value. If the margin of the second error cannot
make up for the margin of the first error, the value of C0 decreases.

Signed-off-by: Huaitong Han 

---
ChangeLog:
V3:
1.Don't use tick_to_ns inside lock in print_acpi_power.
2.Use 08 padding in printk.
3.Merge the two "for" loops into one for coding style.

V2:
C0 = last_cx_update_time-C1-C2-C3-C4, but last_cx_update_time is not now,
so the C0 value is stale, NOW-last_update_time should be calculated.
C[current_cx_stat]+=NOW-last_update_time, so the CX value is fresh.

V1:
Initial patch.
---

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index e639c99..bd31b09 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -254,9 +254,10 @@ static char* acpi_cstate_method_name[] =
 
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
-uint32_t i, idle_usage = 0;
-uint64_t res, idle_res = 0;
-u32 usage;
+uint64_t idle_res = 0, idle_usage = 0, last_state_update_tick = 0, now = 0;
+uint64_t usage[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+uint64_t res[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+unsigned int i;
 u8 last_state_idx;
 
 printk("==cpu%d==\n", cpu);
@@ -264,28 +265,36 @@ static void print_acpi_power(uint32_t cpu, struct 
acpi_processor_power *power)
 printk("active state:\t\tC%d\n", last_state_idx);
 printk("max_cstate:\t\tC%d\n", max_cstate);
 printk("states:\n");
-
+
+spin_lock_irq(&power->stat_lock);
+now = NOW();
 for ( i = 1; i < power->count; i++ )
 {
-spin_lock_irq(&power->stat_lock);  
-res = tick_to_ns(power->states[i].time);
-usage = power->states[i].usage;
-spin_unlock_irq(&power->stat_lock);
+res[i] = tick_to_ns(power->states[i].time);
+usage[i] = power->states[i].usage;
+}
+last_state_update_tick = power->last_state_update_tick;
+spin_unlock_irq(&power->stat_lock);
 
-idle_usage += usage;
-idle_res += res;
+res[last_state_idx] += now - tick_to_ns(last_state_update_tick);
+usage[last_state_idx] += 1;
+
+for ( i = 1; i < power->count; i++ )
+{
+idle_usage += usage[i];
+idle_res += res[i];
 
 printk((last_state_idx == i) ? "   *" : "");
 printk("C%d:\t", i);
 printk("type[C%d] ", power->states[i].type);
 printk("latency[%03d] ", power->states[i].latency);
-printk("usage[%08d] ", usage);
+printk("usage[%08"PRIu64"] ", usage[i]);
 printk("method[%5s] ", 
acpi_cstate_method_name[power->states[i].entry_method]);
-printk("duration[%"PRId64"]\n", res);
+printk("duration[%"PRIu64"]\n", res[i]);
 }
 printk((last_state_idx == 0) ? "   *" : "");
-printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
-   idle_usage, NOW() - idle_res);
+printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n",
+   usage[0] + idle_usage, res[0] + tick_to_ns(last_state_update_tick) 
- idle_res);
 
 print_hw_residencies(cpu);
 }
@@ -486,6 +495,17 @@ bool_t errata_c6_eoi_workaround(void)
 return (fix_needed && cpu_has_pending_apic_eoi());
 }
 
+void update_last_cx_stat(struct acpi_processor_power *power,
+ struct acpi_processor_cx *cx, uint64_t ticks)
+{
+   ASSERT(!local_irq_is_enabled());
+
+   spin_lock(&power->stat_lock);
+   power->last_state = cx;
+   power->last_state_update_tick = ticks;
+   spin_unlock(&power->stat_lock);
+}
+
 void update_idle_stats(struct acpi_processor_power *power,
struct acpi_processor_cx *cx,
uint64_t before, uint64_t after)
@@ -501,6 +521,8 @@ void update_idle_stats(struct acpi_processor_power *power,
 power->last_residency = tick_to_ns(sleep_ticks) / 1000UL;
 cx->time += sleep_ticks;
 }
+power->last_state = &power->states[0];
+power->last_state_update_tick = after;
 
 spin_unlock(&power->stat_lock);
 }
@@ -557,7 +579,6 @@ static void acpi_processor_idle(void)
 if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() )
 cx = power->safe_state;
 
-power->last_state = cx;
 
 /*
  * Sleep:
@@ -574,6 +595,7 @@ static void acpi_processor_idle(void)
 t1 = cpuidle_g

[Xen-devel] [V2] x86/cpuidle: get accurate C0 value with xenpm tool

2015-05-03 Thread Huaitong Han
When checking the ACPI function of C-states, the sampled C0 value reported
by the xenpm tool decreases after a 100-second sleep.
Because C0 = NOW() - C1 - C2 - C3 - C4, when NOW() is taken during idle
time it is larger than the last C-state update time, so the C0 value is
also larger than the true value. If the margin of the second error cannot
make up for the margin of the first error, the value of C0 decreases.

Signed-off-by: Huaitong Han 

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index e639c99..e5fffe8 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -254,9 +254,10 @@ static char* acpi_cstate_method_name[] =
 
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
-uint32_t i, idle_usage = 0;
-uint64_t res, idle_res = 0;
-u32 usage;
+uint64_t idle_res = 0, idle_usage = 0, last_state_update_time = 0, now = 0;
+uint64_t usage[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+uint64_t res[ACPI_PROCESSOR_MAX_POWER] = { 0 };
+uint32_t i;
 u8 last_state_idx;
 
 printk("==cpu%d==\n", cpu);
@@ -264,28 +265,36 @@ static void print_acpi_power(uint32_t cpu, struct 
acpi_processor_power *power)
 printk("active state:\t\tC%d\n", last_state_idx);
 printk("max_cstate:\t\tC%d\n", max_cstate);
 printk("states:\n");
-
+
+spin_lock_irq(&power->stat_lock);
+now = NOW();
 for ( i = 1; i < power->count; i++ )
 {
-spin_lock_irq(&power->stat_lock);  
-res = tick_to_ns(power->states[i].time);
-usage = power->states[i].usage;
-spin_unlock_irq(&power->stat_lock);
+res[i] = tick_to_ns(power->states[i].time);
+usage[i] = power->states[i].usage;
+}
+last_state_update_time = tick_to_ns(power->last_state_update_tick);
+spin_unlock_irq(&power->stat_lock);
 
-idle_usage += usage;
-idle_res += res;
+res[last_state_idx] += now - last_state_update_time;
+usage[last_state_idx] += 1;
+
+for ( i = 1; i < power->count; i++ )
+{
+idle_usage += usage[i];
+idle_res += res[i];
 
 printk((last_state_idx == i) ? "   *" : "");
 printk("C%d:\t", i);
 printk("type[C%d] ", power->states[i].type);
 printk("latency[%03d] ", power->states[i].latency);
-printk("usage[%08d] ", usage);
+printk("usage[%"PRIu64"] ", usage[i]);
 printk("method[%5s] ", 
acpi_cstate_method_name[power->states[i].entry_method]);
-printk("duration[%"PRId64"]\n", res);
+   printk("duration[%"PRIu64"]\n", res[i]);
 }
 printk((last_state_idx == 0) ? "   *" : "");
-printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
-   idle_usage, NOW() - idle_res);
+printk("C0:\tusage[%"PRIu64"] duration[%"PRIu64"]\n",
+   usage[0] + idle_usage, res[0] + last_state_update_time - idle_res);
 
 print_hw_residencies(cpu);
 }
@@ -486,6 +495,15 @@ bool_t errata_c6_eoi_workaround(void)
 return (fix_needed && cpu_has_pending_apic_eoi());
 }
 
+void update_last_cx_stat(struct acpi_processor_power *power,
+ struct acpi_processor_cx *cx, uint64_t ticks)
+{
+   spin_lock(&power->stat_lock);
+   power->last_state = cx;
+   power->last_state_update_tick = ticks;
+   spin_unlock(&power->stat_lock);
+}
+
 void update_idle_stats(struct acpi_processor_power *power,
struct acpi_processor_cx *cx,
uint64_t before, uint64_t after)
@@ -501,6 +519,8 @@ void update_idle_stats(struct acpi_processor_power *power,
 power->last_residency = tick_to_ns(sleep_ticks) / 1000UL;
 cx->time += sleep_ticks;
 }
+power->last_state = &power->states[0];
+power->last_state_update_tick = after;
 
 spin_unlock(&power->stat_lock);
 }
@@ -557,7 +577,6 @@ static void acpi_processor_idle(void)
 if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() )
 cx = power->safe_state;
 
-power->last_state = cx;
 
 /*
  * Sleep:
@@ -574,6 +593,7 @@ static void acpi_processor_idle(void)
 t1 = cpuidle_get_tick();
 /* Trace cpu idle entry */
 TRACE_4D(TRC_PM_IDLE_ENTRY, cx->idx, t1, exp, pred);
+update_last_cx_stat(power, cx, t1);
 /* Invoke C2 */
 acpi_idle_do_entry(cx);
 /* Get end time (ticks) */
@@ -602,7 +622,7 @@ static void acpi_processor_idle(void)
 t1 = cpuidle_get_tick();
 /* Trace cpu idle entry */
 TRACE_4D(TRC_PM_IDLE_ENTRY, cx->idx, t1, exp, pr

[Xen-devel] [v1] x86/cpuidle: get accurate C0 value with xenpm tool

2015-04-15 Thread Huaitong Han
When checking the ACPI function of C-states, the sampled C0 value reported
by the xenpm tool decreases after a 100-second sleep.
Because C0 = NOW() - C1 - C2 - C3 - C4, when NOW() is taken during idle
time it is larger than the last C-state update time, so the C0 value is
also larger than the true value. If the margin of the second error cannot
make up for the margin of the first error, the value of C0 decreases.

Signed-off-by: Huaitong Han 

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index e639c99..fd80227 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -499,6 +499,7 @@ void update_idle_stats(struct acpi_processor_power *power,
 if ( sleep_ticks > 0 )
 {
 power->last_residency = tick_to_ns(sleep_ticks) / 1000UL;
+power->last_cx_update_tick = after;
 cx->time += sleep_ticks;
 }
 
@@ -1171,7 +1172,7 @@ uint32_t pmstat_get_cx_nr(uint32_t cpuid)
 int pmstat_get_cx_stat(uint32_t cpuid, struct pm_cx_stat *stat)
 {
 struct acpi_processor_power *power = processor_powers[cpuid];
-uint64_t idle_usage = 0, idle_res = 0;
+uint64_t idle_usage = 0, idle_res = 0, last_cx_update_time = 0;
 uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
 unsigned int i, nr, nr_pc = 0, nr_cc = 0;
 
@@ -1214,6 +1215,10 @@ int pmstat_get_cx_stat(uint32_t cpuid, struct pm_cx_stat 
*stat)
 idle_res += res[i];
 }
 
+spin_lock_irq(&power->stat_lock);
+last_cx_update_time = tick_to_ns(power->last_cx_update_tick);
+spin_unlock_irq(&power->stat_lock);
+
 get_hw_residencies(cpuid, &hw_res);
 
 #define PUT_xC(what, n) do { \
@@ -1243,7 +1248,7 @@ int pmstat_get_cx_stat(uint32_t cpuid, struct pm_cx_stat 
*stat)
 }
 
 usage[0] = idle_usage;
-res[0] = NOW() - idle_res;
+res[0] = last_cx_update_time - idle_res;
 
 if ( copy_to_guest(stat->triggers, usage, nr) ||
  copy_to_guest(stat->residencies, res, nr) )
diff --git a/xen/include/xen/cpuidle.h b/xen/include/xen/cpuidle.h
index b7b9e8c..19e7c7a 100644
--- a/xen/include/xen/cpuidle.h
+++ b/xen/include/xen/cpuidle.h
@@ -66,6 +66,7 @@ struct acpi_processor_power
 struct acpi_processor_cx *last_state;
 struct acpi_processor_cx *safe_state;
 void *gdata; /* governor specific data */
+u64 last_cx_update_tick;
 u32 last_residency;
 u32 count;
 spinlock_t stat_lock;
-- 
1.9.1

