Re: [Xen-devel] [V5 PATCH 1/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-15 Thread Mukesh Rathor
On Fri, 12 Sep 2014 16:42:58 -0400
Konrad Rzeszutek Wilk  wrote:

> On Wed, Sep 10, 2014 at 04:36:06PM -0700, Mukesh Rathor wrote:

Sorry, I didn't realize you had more comments... didn't scroll down :)..

> >  cpumask_var_t xen_cpu_initialized_map;
> >  
> > @@ -99,10 +100,14 @@ static void cpu_bringup(void)
> > wmb();  /* make sure everything is out */
> >  }
> >  
> > -/* Note: cpu parameter is only relevant for PVH */
> > -static void cpu_bringup_and_idle(int cpu)
> > +/*
> > + * Note: cpu parameter is only relevant for PVH. The reason for passing it
> > + * is we can't do smp_processor_id until the percpu segments are loaded, for
> > + * which we need the cpu number! So we pass it in rdi as first parameter.
> > + */
> 
> Thank you for expanding on that (I totally forgot why we did that).

sure.

> > +* The vcpu comes on kernel page tables which have the NX pte
> > +* bit set. This means before DS/SS is touched, NX in
> > +* EFER must be set. Hence the following assembly glue code.
> 
> And you ripped out the nice 'N.B' comment I added. Sad :-(
> >  */
> > +   ctxt->user_regs.eip = (unsigned long)xen_pvh_early_cpu_init;
> > ctxt->user_regs.rdi = cpu;
> > +   ctxt->user_regs.rsi = true;  /* secondary cpu == true */
> 
> Oh, that is new. Ah yes we can use that [looking at Xen code].
> I wonder what other registers we can use to pass stuff around.

All GPRs. I commented that we can do that in Roger's PVH doc.
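
For background on why any GPR works here: the entry point is entered with
whatever the vcpu context holds, and under the SysV AMD64 calling
convention the first two integer arguments of a C function arrive in %rdi
and %rsi. A minimal sketch, assuming only that ABI rule (the second
parameter name is illustrative):

    /* %rdi = cpu number, %rsi = secondary flag at vcpu entry */
    asmlinkage __visible void cpu_bringup_and_idle(int cpu);   /* consumes %rdi */
    void xen_pvh_early_cpu_init(int cpu, bool secondary);      /* consumes %rdi and %rsi */

The assembly glue only has to preserve %rdi before jumping on to
cpu_bringup_and_idle, since the same register doubles as the first C
argument.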

Looks like David responded to other comments.

Thanks,
Mukesh



Re: [Xen-devel] [V5 PATCH 1/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-12 Thread Mukesh Rathor
On Fri, 12 Sep 2014 16:42:58 -0400
Konrad Rzeszutek Wilk  wrote:

> On Wed, Sep 10, 2014 at 04:36:06PM -0700, Mukesh Rathor wrote:
> > This fixes two bugs in PVH guests:
> > 
> >   - Not setting EFER.NX means the NX bit in page table entries is
> > ignored on Intel processors and causes reserved bit page faults
> > on AMD processors.
> > 
> >   - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE
> > for pvh guest") PVH guests are required to set EFER.SCE to enable
> > the SYSCALL instruction.
> > 
> > Secondary VCPUs are started with pagetables with the NX bit set so
> > EFER.NX must be set before using any stack or data segment.
> > xen_pvh_cpu_early_init() is the new secondary VCPU entry point that
> > sets EFER before jumping to cpu_bringup_and_idle().
> > 
> > Signed-off-by: Mukesh Rathor 
> > Signed-off-by: David Vrabel 
> 
> Huh? So who wrote it? Or did you mean 'Reviewed-by'?

No, meant SOB. I wrote v1 and v2, then David came up with v3 and v4,
then I took comments from v4 and came up with v5.

-Mukesh



[V5 PATCH 1/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-10 Thread Mukesh Rathor
This fixes two bugs in PVH guests:

  - Not setting EFER.NX means the NX bit in page table entries is
ignored on Intel processors and causes reserved bit page faults on
AMD processors.

  - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for
pvh guest") PVH guests are required to set EFER.SCE to enable the
SYSCALL instruction.

Secondary VCPUs are started with pagetables with the NX bit set so
EFER.NX must be set before using any stack or data segment.
xen_pvh_cpu_early_init() is the new secondary VCPU entry point that
sets EFER before jumping to cpu_bringup_and_idle().

Signed-off-by: Mukesh Rathor 
Signed-off-by: David Vrabel 
---
 arch/x86/xen/enlighten.c |  6 ++
 arch/x86/xen/smp.c   | 29 ++---
 arch/x86/xen/smp.h   |  8 
 arch/x86/xen/xen-head.S  | 33 +
 4 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..424d831 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1463,6 +1463,7 @@ static void __ref xen_setup_gdt(int cpu)
pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+#ifdef CONFIG_XEN_PVH
 /*
  * A PV guest starts with default flags that are not set for PVH, set them
  * here asap.
@@ -1508,12 +1509,15 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+
+   xen_pvh_early_cpu_init(0, false);
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
BUG(); /* PVH: Implement proper support. */
 #endif
 }
+#endif /* CONFIG_XEN_PVH */
 
 /* First C function to be called on Xen boot */
 asmlinkage __visible void __init xen_start_kernel(void)
@@ -1527,7 +1531,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
xen_domain_type = XEN_PV_DOMAIN;
 
xen_setup_features();
+#ifdef CONFIG_XEN_PVH
xen_pvh_early_guest_init();
+#endif
xen_setup_machphys_mapping();
 
/* Install Xen paravirt ops */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..b25f8942 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,10 +100,14 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_XEN_PVH
if (xen_feature(XENFEAT_auto_translated_physmap) &&
xen_feature(XENFEAT_supervisor_mode_kernel))
xen_pvh_secondary_vcpu_init(cpu);
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -413,15 +417,18 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
(unsigned long)xen_failsafe_callback;
ctxt->user_regs.cs = __KERNEL_CS;
per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
-#ifdef CONFIG_X86_32
}
-#else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+#ifdef CONFIG_XEN_PVH
+   else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
 */
+   ctxt->user_regs.eip = (unsigned long)xen_pvh_early_cpu_init;
ctxt->user_regs.rdi = cpu;
+   ctxt->user_regs.rsi = true;  /* secondary cpu == true */
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --git a/arch/x86/xen/smp.

[V5 PATCH 0/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-10 Thread Mukesh Rathor
Hi,

Attached V5 patch for fixing the EFER bugs on PVH.

Changes in v5 (Mukesh):
  - Jan reminded us that vcpu 0 could go offline/online. So, add flag back 
instead of using cpuid to return from xen_pvh_early_cpu_init.
  - Boris comments: 
   o Rename to xen_pvh_early_cpu_init
   o Add ifdef around pvh functions in enlighten.c too.
  - Tab before closing brace to pacify checkpatch.pl

Changes in v4 (David):
  - cpu == 0 => boot CPU
  - Reduce #ifdefs.
  - Add patch for XEN_PVH docs.

Changes in v3 (David):
  - Use common xen_pvh_cpu_early_init() function for boot and secondary
VCPUs.

Changes in v2: (Mukesh):
  - Use assembly macro to unify code for boot and secondary VCPUs.



Re: [Xen-devel] [V2 PATCH 1/1] PVH: set EFER.NX and EFER.SCE

2014-09-03 Thread Mukesh Rathor
On Wed, 3 Sep 2014 14:58:04 +0100
David Vrabel  wrote:

> On 03/09/14 02:19, Mukesh Rathor wrote:
> > This patch addresses two things for a pvh boot vcpu:
> > 
> >   - NX bug on intel: It was recently discovered that NX is not being
> > honored in PVH on intel since EFER.NX is not being set.
> > 
> >   - PVH boot hang on newer xen:  Following c/s on xen
> > 
> > c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest
> > 
> > removes setting of EFER.SCE for PVH guests. As such, existing
> > intel pvh guest will no longer boot on xen after that c/s.
> > 
> > Both above changes will be applicable to AMD also when xen support
> > of AMD pvh is added.
> > 
> > Also, we create a new glue assembly entry point for secondary vcpus
> > because they come up on kernel page tables that have pte.NX
> > bits set. As such, before anything is touched in DS/SS, EFER.NX
> > must be set.
> [...]
> > --- a/arch/x86/xen/xen-head.S
> > +++ b/arch/x86/xen/xen-head.S
> > @@ -47,6 +47,35 @@ ENTRY(startup_xen)
> >  
> > __FINIT
> >  
> > +#ifdef CONFIG_XEN_PVH
> > +#ifdef CONFIG_X86_64
> > +.macro PVH_EARLY_SET_EFER
> 
> I don't think a macro is the right way to do this.  We can instead
> pass a parameter to say whether it is a boot or secondary CPU.
> 
> Something like this (untested) patch?

That's fine too. But, since vcpu 0 is always primary vcpu, we can
just use that and not worry about passing another parameter.
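
A minimal sketch of the two options being weighed here (names are
illustrative, not taken verbatim from the patch):

    /* Option A: infer boot vs. secondary from the vcpu number alone */
    void xen_pvh_early_cpu_init(int cpu)
    {
            bool secondary = (cpu != 0);
            /* set EFER.NX/EFER.SCE, then return (boot) or idle (secondary) */
    }

    /* Option B, which V5 ends up taking per its changelog because
     * vcpu 0 can later be offlined and brought back online: pass an
     * explicit flag as the second argument, i.e. in %rsi. */
    void xen_pvh_early_cpu_init(int cpu, bool secondary);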

-Mukesh



[V2 PATCH 1/1] PVH: set EFER.NX and EFER.SCE

2014-09-02 Thread Mukesh Rathor
This patch addresses two things for a pvh boot vcpu:

  - NX bug on intel: It was recently discovered that NX is not being
honored in PVH on intel since EFER.NX is not being set.

  - PVH boot hang on newer xen:  Following c/s on xen

c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest

removes setting of EFER.SCE for PVH guests. As such, existing intel
pvh guest will no longer boot on xen after that c/s.

Both above changes will be applicable to AMD also when xen support
of AMD pvh is added.

Also, we create a new glue assembly entry point for secondary vcpus
because they come up on kernel page tables that have pte.NX
bits set. As such, before anything is touched in DS/SS, EFER.NX
must be set.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |  3 +++
 arch/x86/xen/smp.c   | 28 
 arch/x86/xen/smp.h   |  1 +
 arch/x86/xen/xen-head.S  | 29 +
 4 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..e17fa2d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -85,6 +85,8 @@
 
 EXPORT_SYMBOL_GPL(hypercall_page);
 
+extern void xen_pvh_configure_efer(void);
+
 /*
  * Pointer to the xen_vcpu_info structure or
 * &HYPERVISOR_shared_info->vcpu_info[cpu]. See xen_hvm_init_shared_info
@@ -1508,6 +1510,7 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+   xen_pvh_configure_efer();
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..073bbf4 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,8 +100,12 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -416,12 +420,20 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
}
 #else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+   } else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
+*/
+   ctxt->user_regs.eip = (unsigned long)pvh_smp_cpu_bringup;
+
+   /* N.B. The bringup function cpu_bringup_and_idle is called with
+* %rdi having the cpu number - which means we are passing it in
+* as the first parameter. Subtle!
 */
ctxt->user_regs.rdi = cpu;
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..d6628cb 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -7,5 +7,6 @@ extern void xen_send_IPI_mask_allbutself(const struct cpumask *mask,
 extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
+extern void pvh_smp_cpu_bringup(int cpu);
 
 #endif
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 485b695..97ee831 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -47,6 +47,35 @@ ENTRY(startup_xen)
 
__FINIT
 
+#ifdef CONFIG_XEN_PVH
+#ifdef CONFIG_X86_64
+.macro PVH_

[V2 PATCH 0/1] PVH: set EFER bits

2014-09-02 Thread Mukesh Rathor
Changes from V1:
   - Unify the patches into one
   - Unify the code to set the EFER bits.

thanks,
Mukesh



Re: [Xen-devel] [V1 PATCH 1/2] PVH: set EFER.NX and EFER.SCE for boot vcpu

2014-08-28 Thread Mukesh Rathor
On Thu, 28 Aug 2014 15:18:26 +0100
David Vrabel  wrote:

> On 27/08/14 23:33, Mukesh Rathor wrote:
> > This patch addresses three things for a pvh boot vcpu:
> > 
> >   - NX bug on intel: It was recently discovered that NX is not being
> > honored in PVH on intel since EFER.NX is not being set. The
> > pte.NX bits are ignored if EFER.NX is not set on intel.
> 
> I am unconvinced by this explanation.  The Intel SDM clearly states
> that the XD bit in the page table entries is reserved if EFER.NXE is
> clear, and thus using a entry with XD set and EFER.NXE clear should
> generate a page fault (same as AMD).
> 
> You either need to find out why Intel really worked (perhaps Xen is
> setting EFER.NXE on Intel?) or you need to included an errata (or
> similar) reference.

Nope, verified that again. The vcpu is coming up on efer 0x501, i.e.,
LME/LMA/SCE (older xen prior to SCE removal change). The pte entry for
rsp is: 80003e32b063 that has NX set. No exception is generated upon
push rbp instruction (like on amd).
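
For reference, decoding that EFER value against the architectural bit
positions (SCE is bit 0, LME bit 8, LMA bit 10, NXE bit 11):

    0x501 == (1 <<  0)   /* SCE */
           | (1 <<  8)   /* LME */
           | (1 << 10);  /* LMA */
    /* NXE would be bit 11 (0x800); it is clear in 0x501. */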

Could be that Intel docs are incomplete on vmx, I didn't hear back from them 
on the last one I had found. Anyways, we are not addressing an intel errata
here, but fixing our issue of setting the EFER.NX bit.

-Mukesh


[V1 PATCH 1/2] PVH: set EFER.NX and EFER.SCE for boot vcpu

2014-08-27 Thread Mukesh Rathor
This patch addresses three things for a pvh boot vcpu:

  - NX bug on intel: It was recently discovered that NX is not being
honored in PVH on intel since EFER.NX is not being set. The pte.NX
bits are ignored if EFER.NX is not set on intel.

  - PVH boot hang on newer xen:  Following c/s on xen

c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest

removes setting of EFER.SCE for PVH guests. As such, existing intel pvh
guest will no longer boot on xen after that c/s.

  - Both above changes will be applicable to AMD also when xen support of
AMD pvh is added.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..4af512d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1499,6 +1499,17 @@ void __ref xen_pvh_secondary_vcpu_init(int cpu)
xen_pvh_set_cr_flags(cpu);
 }
 
+/* This is done in secondary_startup_64 for hvm guests. */
+static void __init xen_configure_efer(void)
+{
+   u64 efer;
+
+   rdmsrl(MSR_EFER, efer);
+   efer |= EFER_SCE;
+   efer |= (cpuid_edx(0x80000001) & (1 << 20)) ? EFER_NX : 0;
+   wrmsrl(MSR_EFER, efer);
+}
+
 static void __init xen_pvh_early_guest_init(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap))
@@ -1508,6 +1519,7 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+   xen_configure_efer();
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
-- 
1.8.3.1



[V1 PATCH 0/2] Linux PVH: set EFER bits..

2014-08-27 Thread Mukesh Rathor
Resending with comments fixed up. Please note, these are no longer
AMD only, but address existing broken boot and broken NX on intel.

thanks
mukesh



[V1 PATCH 2/2] PVH: set EFER.NX and EFER.SCE for secondary vcpus

2014-08-27 Thread Mukesh Rathor
This patch addresses three things for a pvh secondary vcpu:

  - NX bug on intel: It was recently discovered that NX is not being
honored in PVH on intel since EFER.NX is not being set. The pte.NX
bits are ignored if EFER.NX is not set on intel.

  - PVH boot hang on newer xen:  Following c/s on xen

c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest

removes setting of EFER.SCE for PVH guests. As such, existing intel pvh
guest will no longer boot on xen after that c/s.

  - Both above changes will be applicable to AMD also when xen support of
AMD pvh is added.

Please note: We create a new glue assembly entry point because the
secondary vcpus come up on kernel page tables that have pte.NX
bits set. While on Intel these are ignored if EFER.NX is not set, on
AMD a RSVD bit fault is generated.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/smp.c  | 28 
 arch/x86/xen/smp.h  |  1 +
 arch/x86/xen/xen-head.S | 21 +
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..66058b9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,8 +100,12 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -416,12 +420,20 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
}
 #else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+   } else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set on AMD. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
+*/
+   ctxt->user_regs.eip = (unsigned long)pvh_cpu_bringup;
+
+   /* N.B. The bringup function cpu_bringup_and_idle is called with
+* %rdi having the cpu number - which means we are passing it in
+* as the first parameter. Subtle!
 */
ctxt->user_regs.rdi = cpu;
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..b20ba68 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -7,5 +7,6 @@ extern void xen_send_IPI_mask_allbutself(const struct cpumask *mask,
 extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
+extern void pvh_cpu_bringup(int cpu);
 
 #endif
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 485b695..db8dca5 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -47,6 +47,27 @@ ENTRY(startup_xen)
 
__FINIT
 
+#ifdef CONFIG_XEN_PVH
+#ifdef CONFIG_X86_64
+/* Note that rdi contains the cpu number and must be preserved */
+ENTRY(pvh_cpu_bringup)
+   /* Gather features to see if NX implemented. (no EFER.NX on intel) */
+   movl    $0x80000001, %eax
+   cpuid
+   movl    %edx, %esi
+
+   movl    $MSR_EFER, %ecx
+   rdmsr
+   btsl    $_EFER_SCE, %eax
+
+   btl     $20, %esi
+   jnc     1f      /* No NX, skip it */
+   btsl    $_EFER_NX, %eax
+1: wrmsr
+   jmp     cpu_bringup_and_idle
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_XEN_PVH */
+
 .pushsection .text


Re: [Xen-devel] [V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 12:09:27 -0700
Mukesh Rathor  wrote:

> On Fri, 22 Aug 2014 06:41:40 +0200
> Borislav Petkov  wrote:
> 
> > On Thu, Aug 21, 2014 at 07:46:56PM -0700, Mukesh Rathor wrote:
> > > Intel doesn't have EFER.NX bit.
> > 
> > Of course it does.
> > 
> 
> Right, it does. Some code/comment is misleading... Anyways, reading
> intel SDMs, if I understand the convoluted text correctly, EFER.NX is
> not required to be set for l1.nx to be set, thus allowing for page
> level protection. Whereas on AMD, EFER.NX must be set for l1.nx to
> be used. So, in the end, this patch would apply to both amd/intel 
> 
> I'll reword and submit.

Err, try again, the section "4.1.1 Three Paging Modes" says:

"Execute-disable access rights are applied only if IA32_EFER.NXE = 1"

So, I guess NX is broken on Intel PVH because EFER.NX is currently 
not being set.  While AMD will #GP if l1.NX is set and EFER.NX is not, 
I guess Intel just ignores the l1.XD if EFER.NX is not set. 

Mukesh


Re: [V0 PATCH 0/2] AMD PVH domU support

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 14:52:41 +0100
David Vrabel  wrote:

> On 21/08/14 03:16, Mukesh Rathor wrote:
> > Hi,
> > 
> > Here's first stab at AMD PVH domU support. Pretty much the only
> > thing needed is EFER bits set. Please review.
> 
> I'm not going to accept this until there is some ABI documentation
> stating explicitly what state non-boot CPUs will be in.
> 
> I'm particularly concerned that: a) there is a difference between AMD
> and Intel; and b) you want to change the ABI by clearing a the
> EFER.SCE bit.

Correct, I realize it changes the ABI, but I believe that is the right 
thing to do while we can, especially since we need to fix the EFER for
NX anyway. Looking at the code, it appears this would be the final
cleanup for this ABI... :)..

However, if that's not possible, I suppose we can just leave it as is
too for the SC bit.

thanks
Mukesh


Re: [V0 PATCH 0/2] AMD PVH domU support

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 14:55:21 +0100
David Vrabel  wrote:

> On 22/08/14 14:52, David Vrabel wrote:
> > On 21/08/14 03:16, Mukesh Rathor wrote:
> >> Hi,
> >>
> >> Here's first stab at AMD PVH domU support. Pretty much the only
> >> thing needed is EFER bits set. Please review.
> > 
> > I'm not going to accept this until there is some ABI documentation
> > stating explicitly what state non-boot CPUs will be in.
> 
> Also the boot CPU.
> 
> David

Sure, but looks like Roger already beat me to it... 

From Roger's "very initial PVH design document":

And finally on `EFER` the following features are enabled:

  * LME (bit 8): Long mode enable.
  * LMA (bit 10): Long mode active.


LMK if anything additional needs to be done.

Mukesh


Re: [Xen-devel] [V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 06:41:40 +0200
Borislav Petkov  wrote:

> On Thu, Aug 21, 2014 at 07:46:56PM -0700, Mukesh Rathor wrote:
> > Intel doesn't have EFER.NX bit.
> 
> Of course it does.
> 

Right, it does. Some code/comment is misleading... Anyways, reading
intel SDMs, if I understand the convoluted text correctly, EFER.NX is
not required to be set for l1.nx to be set, thus allowing for page
level protection. Whereas on AMD, EFER.NX must be set for l1.nx to
be used. So, in the end, this patch would apply to both amd/intel 

I'll reword and submit.

Thanks,
Mukesh



Re: [Xen-devel] [V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-21 Thread Mukesh Rathor
On Thu, 21 Aug 2014 21:39:04 -0400
Konrad Rzeszutek Wilk  wrote:

> On Wed, Aug 20, 2014 at 07:16:39PM -0700, Mukesh Rathor wrote:
> > On AMD, NX feature must be enabled in the efer for NX to be honored
> > in the pte entries, otherwise protection fault. We also set SC for
> > system calls to be enabled.
> 
> How come we don't need to do that for Intel (that is set the NX bit)?
> Could you include the explanation here please?

Intel doesn't have EFER.NX bit. The SC bit is being set in xen, but it
doesn't need to be, and I'm going to submit a patch to undo it.

> 
> > 
> > Signed-off-by: Mukesh Rathor 
> > ---
> >  arch/x86/xen/enlighten.c | 12 
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index c0cb11f..4af512d 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1499,6 +1499,17 @@ void __ref xen_pvh_secondary_vcpu_init(int
> > cpu) xen_pvh_set_cr_flags(cpu);
> >  }
> >  
> > +/* This is done in secondary_startup_64 for hvm guests. */
> > +static void __init xen_configure_efer(void)
> > +{
> > +   u64 efer;
> > +
> > +   rdmsrl(MSR_EFER, efer);
> > +   efer |= EFER_SCE;
> > +   efer |= (cpuid_edx(0x80000001) & (1 << 20)) ? EFER_NX : 0;
> 
> Ahem? #defines for these magic values please?

Linux uses these directly all over the code as they are set in stone
pretty much, and I didn't find any #defines. See cpu/common.c for one of
the places. Also see secondary_startup_64, and others...

> Or could you use 'boot_cpu_has'?

Nop, it's not initialized at this point.

thanks,
Mukesh
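
A minimal sketch of the same EFER setup with the magic values given local
names; the two #defines below are purely illustrative and are not existing
kernel symbols (as noted above, boot_cpu_has() cannot be used this early):

/* Sketch of xen_configure_efer() with the CPUID leaf and NX bit named. */
#define CPUID_EXT_FEATURES	0x80000001
#define CPUID_EDX_NX		(1 << 20)

static void __init example_configure_efer(void)
{
	u64 efer;

	rdmsrl(MSR_EFER, efer);
	efer |= EFER_SCE;				/* enable SYSCALL/SYSRET */
	if (cpuid_edx(CPUID_EXT_FEATURES) & CPUID_EDX_NX)
		efer |= EFER_NX;			/* honor NX in the ptes */
	wrmsrl(MSR_EFER, efer);
}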
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-20 Thread Mukesh Rathor
On AMD, the NX feature must be enabled in the EFER for NX to be honored in
the pte entries, otherwise we get a protection fault. We also set SCE so
that system calls are enabled.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..4af512d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1499,6 +1499,17 @@ void __ref xen_pvh_secondary_vcpu_init(int cpu)
xen_pvh_set_cr_flags(cpu);
 }
 
+/* This is done in secondary_startup_64 for hvm guests. */
+static void __init xen_configure_efer(void)
+{
+   u64 efer;
+
+   rdmsrl(MSR_EFER, efer);
+   efer |= EFER_SCE;
+   efer |= (cpuid_edx(0x80000001) & (1 << 20)) ? EFER_NX : 0;
+   wrmsrl(MSR_EFER, efer);
+}
+
 static void __init xen_pvh_early_guest_init(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap))
@@ -1508,6 +1519,7 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+   xen_configure_efer();
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V0 PATCH 0/2] AMD PVH domU support

2014-08-20 Thread Mukesh Rathor
Hi,

Here's first stab at AMD PVH domU support. Pretty much the only thing
needed is EFER bits set. Please review.

thanks,
Mukesh


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V0 PATCH 2/2] AMD-PVH: set EFER.NX and EFER.SCE for secondary vcpus

2014-08-20 Thread Mukesh Rathor
The secondary vcpus come on kernel page tables which have the NX bit set
in pte entries for DS/SS. On AMD, EFER.NX must be set to avoid protection
fault.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/smp.c  | 28 
 arch/x86/xen/smp.h  |  1 +
 arch/x86/xen/xen-head.S | 21 +
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..66058b9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,8 +100,12 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -416,12 +420,20 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
 #ifdef CONFIG_X86_32
}
 #else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+   } else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set on AMD. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
+*/
+   ctxt->user_regs.eip = (unsigned long)pvh_cpu_bringup;
+
+   /* N.B. The bringup function cpu_bringup_and_idle is called with
+* %rdi having the cpu number - which means we are passing it in
+* as the first parameter. Subtle!
 */
ctxt->user_regs.rdi = cpu;
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..b20ba68 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -7,5 +7,6 @@ extern void xen_send_IPI_mask_allbutself(const struct cpumask 
*mask,
 extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
+extern void pvh_cpu_bringup(int cpu);
 
 #endif
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 485b695..db8dca5 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -47,6 +47,27 @@ ENTRY(startup_xen)
 
__FINIT
 
+#ifdef CONFIG_XEN_PVH
+#ifdef CONFIG_X86_64
+/* Note that rdi contains the cpu number and must be preserved */
+ENTRY(pvh_cpu_bringup)
+   /* Gather features to see if NX implemented. (no EFER.NX on intel) */
+   movl    $0x80000001, %eax
+   cpuid
+   movl    %edx, %esi
+
+   movl    $MSR_EFER, %ecx
+   rdmsr
+   btsl    $_EFER_SCE, %eax
+
+   btl     $20, %esi
+   jnc     1f      /* No NX, skip it */
+   btsl    $_EFER_NX, %eax
+1: wrmsr
+   jmp cpu_bringup_and_idle
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_XEN_PVH */
+
 .pushsection .text
.balign PAGE_SIZE
 ENTRY(hypercall_page)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [V1 PATCH] dom0 pvh: map foreign pfns in our p2m for toolstack

2014-05-27 Thread Mukesh Rathor
On Tue, 27 May 2014 11:59:26 +0100
David Vrabel  wrote:

> On 27/05/14 11:43, Roger Pau Monné wrote:
> > On 24/05/14 03:33, Mukesh Rathor wrote:
> >> When running as dom0 in pvh mode, foreign pfns that are accessed
> >> must be added to our p2m which is managed by xen. This is done via
> >> XENMEM_add_to_physmap_range hypercall. This is needed for toolstack
> >> building guests and mapping guest memory, xentrace mapping xen
> >> pages, etc..
> 
> Thanks.
> 
> Applied to devel/for-linus-3.16, but see comments below.
> 
> >> +static int xlate_add_to_p2m(unsigned long lpfn, unsigned long
> >> fgmfn,
> >> +  unsigned int domid)
> 
> The preferred abbreviation is GFN not GMFN.  I fixed this up.
> 
> >> +{
> >> +  int rc, err = 0;
> >> +  xen_pfn_t gpfn = lpfn;
> >> +  xen_ulong_t idx = fgmfn;
> >> +
> >> +  struct xen_add_to_physmap_range xatp = {
> >> +  .domid = DOMID_SELF,
> >> +  .foreign_domid = domid,
> >> +  .size = 1,
> >> +  .space = XENMAPSPACE_gmfn_foreign,
> >> +  };
> >> +  set_xen_guest_handle(xatp.idxs, &idx);
> >> +  set_xen_guest_handle(xatp.gpfns, &gpfn);
> >> +  set_xen_guest_handle(xatp.errs, &err);
> >> +
> >> +  rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range,
> >> &xatp);
> >> +  return rc;
> > 
> > Thanks for the patches, I see two problems with this approach, the
> > first one is that you are completely ignoring the error in the
> > variable "err", which means that you can end up with a pfn that
> > Linux thinks it's valid, but it's not mapped to any mfn, so when
> > you try to access it you will trigger the vioapic crash.
> 
> I spotted this and fixed this up by adding:
> 
> +   if (rc < 0)
> +   return rc;
> +   return err;

Thanks a lot.

> > The second one is that this seems extremely inefficient, you are
> > issuing one hypercall for each memory page, when you could instead
> > batch all the pages into a single hypercall and map them in one
> > shot.
> 
> I agree, but the 3.16 merge window is nearly here so I've applied it
> as-is.  Note that the privcmd driver calls this function once per
> page, so the lack of batching doesn't really hurt here.

Thanks again, pleasure working with maintainer like you!

Mukesh
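
A rough sketch of the batching Roger suggests, mapping up to REMAP_BATCH_SIZE
consecutive foreign frames with a single hypercall; this is illustrative only
and is not the code that was applied:

/* Hypothetical batched variant of xlate_add_to_p2m(). */
static int example_xlate_add_to_p2m_batch(unsigned long lpfn,
					  unsigned long fgmfn,
					  int count, unsigned int domid)
{
	xen_pfn_t gpfns[REMAP_BATCH_SIZE];
	xen_ulong_t idxs[REMAP_BATCH_SIZE];
	int errs[REMAP_BATCH_SIZE];
	int i, rc;

	struct xen_add_to_physmap_range xatp = {
		.domid = DOMID_SELF,
		.foreign_domid = domid,
		.size = count,
		.space = XENMAPSPACE_gmfn_foreign,
	};

	if (count > REMAP_BATCH_SIZE)
		return -EINVAL;

	for (i = 0; i < count; i++) {
		gpfns[i] = lpfn + i;
		idxs[i] = fgmfn + i;
		errs[i] = 0;
	}
	set_xen_guest_handle(xatp.idxs, idxs);
	set_xen_guest_handle(xatp.gpfns, gpfns);
	set_xen_guest_handle(xatp.errs, errs);

	rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
	if (rc < 0)
		return rc;

	/* Do not ignore the per-page errors (the problem spotted above). */
	for (i = 0; i < count; i++)
		if (errs[i])
			return errs[i];
	return 0;
}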

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V1 PATCH] dom0 pvh linux support

2014-05-23 Thread Mukesh Rathor
Hi,

Attached please find patch for linux to support toolstack on pvh dom0.

thanks, 
Mukesh


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V1 PATCH] dom0 pvh: map foreign pfns in our p2m for toolstack

2014-05-23 Thread Mukesh Rathor
When running as dom0 in pvh mode, foreign pfns that are accessed must be
added to our p2m which is managed by xen. This is done via
XENMEM_add_to_physmap_range hypercall. This is needed for toolstack
building guests and mapping guest memory, xentrace mapping xen pages,
etc..

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/mmu.c | 115 +++--
 1 file changed, 112 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 86e02ea..8efc066 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2510,6 +2510,93 @@ void __init xen_hvm_init_mmu_ops(void)
 }
 #endif
 
+#ifdef CONFIG_XEN_PVH
+/*
+ * Map foreign gmfn, fgmfn, to local pfn, lpfn. This for the user space
+ * creating new guest on pvh dom0 and needing to map domU pages.
+ */
+static int xlate_add_to_p2m(unsigned long lpfn, unsigned long fgmfn,
+   unsigned int domid)
+{
+   int rc, err = 0;
+   xen_pfn_t gpfn = lpfn;
+   xen_ulong_t idx = fgmfn;
+
+   struct xen_add_to_physmap_range xatp = {
+   .domid = DOMID_SELF,
+   .foreign_domid = domid,
+   .size = 1,
+   .space = XENMAPSPACE_gmfn_foreign,
+   };
+   set_xen_guest_handle(xatp.idxs, &idx);
+   set_xen_guest_handle(xatp.gpfns, &gpfn);
+   set_xen_guest_handle(xatp.errs, &err);
+
+   rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
+   return rc;
+}
+
+static int xlate_remove_from_p2m(unsigned long spfn, int count)
+{
+   struct xen_remove_from_physmap xrp;
+   int i, rc;
+
+   for (i = 0; i < count; i++) {
+   xrp.domid = DOMID_SELF;
+   xrp.gpfn = spfn+i;
+   rc = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
+   if (rc)
+   break;
+   }
+   return rc;
+}
+
+struct xlate_remap_data {
+   unsigned long fgmfn; /* foreign domain's gmfn */
+   pgprot_t prot;
+   domid_t  domid;
+   int index;
+   struct page **pages;
+};
+
+static int xlate_map_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
+   void *data)
+{
+   int rc;
+   struct xlate_remap_data *remap = data;
+   unsigned long pfn = page_to_pfn(remap->pages[remap->index++]);
+   pte_t pteval = pte_mkspecial(pfn_pte(pfn, remap->prot));
+
+   rc = xlate_add_to_p2m(pfn, remap->fgmfn, remap->domid);
+   if (rc)
+   return rc;
+   native_set_pte(ptep, pteval);
+
+   return 0;
+}
+
+static int xlate_remap_gmfn_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long mfn,
+ int nr, pgprot_t prot, unsigned domid,
+ struct page **pages)
+{
+   int err;
+   struct xlate_remap_data pvhdata;
+
+   BUG_ON(!pages);
+
+   pvhdata.fgmfn = mfn;
+   pvhdata.prot = prot;
+   pvhdata.domid = domid;
+   pvhdata.index = 0;
+   pvhdata.pages = pages;
+   err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
+ xlate_map_pte_fn, &pvhdata);
+   flush_tlb_all();
+   return err;
+}
+#endif
+
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
@@ -2544,13 +2631,20 @@ int xen_remap_domain_mfn_range(struct vm_area_struct 
*vma,
unsigned long range;
int err = 0;
 
-   if (xen_feature(XENFEAT_auto_translated_physmap))
-   return -EINVAL;
-
prot = __pgprot(pgprot_val(prot) | _PAGE_IOMAP);
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+#ifdef CONFIG_XEN_PVH
+   /* We need to update the local page tables and the xen HAP */
+   return xlate_remap_gmfn_range(vma, addr, mfn, nr, prot,
+ domid, pages);
+#else
+   return -EINVAL;
+#endif
+   }
+
rmd.mfn = mfn;
rmd.prot = prot;
 
@@ -2588,6 +2682,21 @@ int xen_unmap_domain_mfn_range(struct vm_area_struct 
*vma,
if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
return 0;
 
+#ifdef CONFIG_XEN_PVH
+   while (numpgs--) {
+
+   /* The mmu has already cleaned up the process mmu resources at
+* this point (lookup_address will return NULL). */
+   unsigned long pfn = page_to_pfn(pages[numpgs]);
+
+   xlate_remove_from_p2m(pfn, 1);
+   }
+   /* We don't need to flush tlbs because as part of xlate_remove_from_p2m,
+* the hypervisor will do tlb flushes after removing the p2m entries
+* from the EPT/NPT */
+   return 0;
+#else
return -EINVAL;
+#endif
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_mfn_range);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pvh: set cr4 flags for APs

2014-02-03 Thread Mukesh Rathor
On Mon, 3 Feb 2014 15:43:46 -0500
Konrad Rzeszutek Wilk  wrote:

> On Mon, Feb 03, 2014 at 02:52:40PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Feb 03, 2014 at 11:30:01AM -0800, Mukesh Rathor wrote:
> > > On Mon, 3 Feb 2014 06:49:14 -0500
> > > Konrad Rzeszutek Wilk  wrote:
> > > 
> > > > On Wed, Jan 29, 2014 at 04:15:18PM -0800, Mukesh Rathor wrote:
> > > > > We need to set cr4 flags for APs that are already set for BSP.
> > > > 
> > > > The title is missing the 'xen' part.
> > > 
> > > The patch is for linux, not xen.
> > 
> > Right. And hence you need to prefix the title with 'xen' in it
> > otherwise it won't be obvious from the Linux log line for what
> > component of the Linux tree it is.
> > 
> > > 
> > > > I rewrote it a bit and I think this should go in 3.14.
> > > > 
> > > > David, Boris: It is not the full fix as there are other parts to
> > > > make an PVH guest use 2MB or 1GB pages- but this fixes an
> > > > obvious bug.
> > > > 
> > > > 
> > > > 
> > > > From 797ea6812ff0a90cce966a4ff6bad57cbadc43b5 Mon Sep 17
> > > > 00:00:00 2001 From: Mukesh Rathor 
> > > > Date: Wed, 29 Jan 2014 16:15:18 -0800
> > > > Subject: [PATCH] xen/pvh: set CR4 flags for APs
> > > > 
> > > > The Xen ABI sets said flags for the BSP, but it does
> > > 
> > > NO it does not. I said it few times, it's set by
> > > probe_page_size_mask (which is in linux) for the BSP. The comment
> > > below also says it.
> > 
> > Where does it set it for APs? Can we piggyback on that?
> 
> And since I am in a hurry to fix an build regression I did the
> research myself - but this kind of information needs to be in the
> commit message.
> 
> Here is what I have, please comment as I want to send a git pull to
> Linux within the hour.
> 
> From 125ef07fd58e963cc286554f6536e46c9712033c Mon Sep 17 00:00:00 2001
> From: Mukesh Rathor 
> Date: Wed, 29 Jan 2014 16:15:18 -0800
> Subject: [PATCH] xen/pvh: set CR4 flags for APs
> 
> During bootup in the 'probe_page_size_mask' these CR4
> flags are set in there. But for AP processors they
> are not set as we do not use 'secondary_startup_64' which
> the baremetal kernels uses. Instead do it in
> this function which we use in Xen PVH during our
> startup for AP and BSP processors.
> 
> As such fix it up to make sure we have that flag set.

That's good enough for me.

Mukesh


> Signed-off-by: Mukesh Rathor 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/xen/enlighten.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a4d7b64..201d09a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1473,6 +1473,18 @@ static void xen_pvh_set_cr_flags(int cpu)
>* X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM
> guests
>* (which PVH shared codepaths), while X86_CR0_PG is for
> PVH. */ write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_NE | X86_CR0_WP |
> X86_CR0_AM); +
> + if (!cpu)
> + return;
> + /*
> +  * For BSP, PSE PGE are set in probe_page_size_mask(), for
> APs
> +  * set them here. For all, OSFXSR OSXMMEXCPT are set in
> fpu_init.
> + */
> + if (cpu_has_pse)
> + set_in_cr4(X86_CR4_PSE);
> +
> + if (cpu_has_pge)
> + set_in_cr4(X86_CR4_PGE);
>  }
>  
>  /*

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pvh: set cr4 flags for APs

2014-02-03 Thread Mukesh Rathor
On Mon, 3 Feb 2014 06:49:14 -0500
Konrad Rzeszutek Wilk  wrote:

> On Wed, Jan 29, 2014 at 04:15:18PM -0800, Mukesh Rathor wrote:
> > We need to set cr4 flags for APs that are already set for BSP.
> 
> The title is missing the 'xen' part.

The patch is for linux, not xen.

> I rewrote it a bit and I think this should go in 3.14.
> 
> David, Boris: It is not the full fix as there are other parts to
> make an PVH guest use 2MB or 1GB pages- but this fixes an obvious
> bug.
> 
> 
> 
> From 797ea6812ff0a90cce966a4ff6bad57cbadc43b5 Mon Sep 17 00:00:00 2001
> From: Mukesh Rathor 
> Date: Wed, 29 Jan 2014 16:15:18 -0800
> Subject: [PATCH] xen/pvh: set CR4 flags for APs
> 
> The Xen ABI sets said flags for the BSP, but it does

NO it does not. I said it few times, it's set by probe_page_size_mask
(which is in linux) for the BSP. The comment below also says it.

thanks
mukesh

> not do that for the CR4. As such fix it up to make
> sure we have that flag set.
> 
> Signed-off-by: Mukesh Rathor 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/xen/enlighten.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a4d7b64..201d09a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1473,6 +1473,18 @@ static void xen_pvh_set_cr_flags(int cpu)
>* X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM
> guests
>* (which PVH shared codepaths), while X86_CR0_PG is for
> PVH. */ write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_NE | X86_CR0_WP |
> X86_CR0_AM); +
> + if (!cpu)
> + return;
> + /*
> +  * For BSP, PSE PGE are set in probe_page_size_mask(), for
> APs
> +  * set them here. For all, OSFXSR OSXMMEXCPT are set in
> fpu_init.
> + */
> + if (cpu_has_pse)
> + set_in_cr4(X86_CR4_PSE);
> +
> + if (cpu_has_pge)
> + set_in_cr4(X86_CR4_PGE);
>  }
>  
>  /*

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V0] linux PVH: Set CR4 flags

2014-01-30 Thread Mukesh Rathor
On Thu, 30 Jan 2014 11:40:44 +
Roger Pau Monné  wrote:

> On 30/01/14 00:15, Mukesh Rathor wrote:
> > Konrad,
> > 
> > The CR4 settings were dropped from my earlier patch because you
> > didn't wanna enable them. But since you do now, we need to set them
> > in the APs also. If you decide not too again, please apply my prev
> > patch "pvh: disable pse feature for now".
> 
> Hello Mukesh,
> 
> Could you push your CR related patches to a git repo branch? I'm
> currently having a bit of a mess in figuring out which ones should be
> applied and in which order.
> 
> Thanks, Roger.

Hey Roger,

Unfortunately, I don't have them in a tree because my first patch was
changed during merge, and also the tree was refreshed.  Basically, the end
result is that we leave the features enabled on the Linux side, thus setting
not only the CR0 bits, but also the CR4 PSE and PGE bits for the APs (they
were already set for the BSP).

Konrad only merged the CR0 setting part of my first patch, hence this 
patch to set the CR4 bits. Hope that makes sense. My latest tree is:

http://oss.us.oracle.com/git/mrathor/linux.git  muk2

thanks
mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] pvh: set cr4 flags for APs

2014-01-29 Thread Mukesh Rathor
We need to set cr4 flags for APs that are already set for BSP.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a4d7b64..201d09a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1473,6 +1473,18 @@ static void xen_pvh_set_cr_flags(int cpu)
 * X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM guests
 * (which PVH shared codepaths), while X86_CR0_PG is for PVH. */
write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | 
X86_CR0_AM);
+
+   if (!cpu)
+   return;
+   /*
+* For BSP, PSE PGE are set in probe_page_size_mask(), for APs
+* set them here. For all, OSFXSR OSXMMEXCPT are set in fpu_init.
+   */
+   if (cpu_has_pse)
+   set_in_cr4(X86_CR4_PSE);
+
+   if (cpu_has_pge)
+   set_in_cr4(X86_CR4_PGE);
 }
 
 /*
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V0] linux PVH: Set CR4 flags

2014-01-29 Thread Mukesh Rathor
Konrad,

The CR4 settings were dropped from my earlier patch because you didn't
wanna enable them. But since you do now, we need to set them in the APs
also. If you decide not to again, please apply my prev patch
"pvh: disable pse feature for now".

thanks
Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [V0 PATCH] pvh: Disable PSE feature for now

2014-01-28 Thread Mukesh Rathor
On Tue, 28 Jan 2014 10:39:23 +
"Jan Beulich"  wrote:

> >>> On 28.01.14 at 03:18, Mukesh Rathor 
> >>> wrote:
> > Until now, xen did not expose PSE to pvh guest, but a patch was
> > submitted to xen list to enable bunch of features for a pvh guest.
> > PSE has not been looked into for PVH, so until we can do that and
> > test it to make sure it works, disable the feature to avoid flood
> > of bugs.
> > 
> > Signed-off-by: Mukesh Rathor 
> > ---
> >  arch/x86/xen/enlighten.c |5 +
> >  1 files changed, 5 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index a4d7b64..4e952046 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1497,6 +1497,11 @@ static void __init
> > xen_pvh_early_guest_init(void) xen_have_vector_callback = 1;
> > xen_pvh_set_cr_flags(0);
> >  
> > +/* pvh guests are not quite ready for large pages yet */
> > +setup_clear_cpu_cap(X86_FEATURE_PSE);
> > +setup_clear_cpu_cap(X86_FEATURE_PSE36);
> 
> And why would you not want to also turn off 1Gb pages then?

Right, that should be turned off too, but Konrad thinks we should
leave them on in linux and deal with issues as they come. I've not
tested them, or looked/thought about them, so I thought it would be
better to turn them on after I/someone gets to test them.

thanks
Mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [V0 PATCH] pvh: Disable PSE feature for now

2014-01-28 Thread Mukesh Rathor
On Mon, 27 Jan 2014 22:46:34 -0500
Konrad Rzeszutek Wilk  wrote:

> On Mon, Jan 27, 2014 at 06:18:39PM -0800, Mukesh Rathor wrote:
> > Until now, xen did not expose PSE to pvh guest, but a patch was
> > submitted to xen list to enable bunch of features for a pvh guest.
> > PSE has not been
> 
> Which 'patch'?
> 
> > looked into for PVH, so until we can do that and test it to make
> > sure it works, disable the feature to avoid flood of bugs.
> 
> I think we want a flood of bugs, no?

Ok, but let's document (via this email :)) that they are not tested.

thanks
Mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V0 PATCH] pvh: Disable PSE feature for now

2014-01-27 Thread Mukesh Rathor
Until now, xen did not expose PSE to pvh guests, but a patch was submitted
to the xen list to enable a bunch of features for a pvh guest. PSE has not
been looked into for PVH, so until we can do that and test it to make sure
it works, disable the feature to avoid a flood of bugs.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a4d7b64..4e952046 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1497,6 +1497,11 @@ static void __init xen_pvh_early_guest_init(void)
xen_have_vector_callback = 1;
xen_pvh_set_cr_flags(0);
 
+/* pvh guests are not quite ready for large pages yet */
+setup_clear_cpu_cap(X86_FEATURE_PSE);
+setup_clear_cpu_cap(X86_FEATURE_PSE36);
+
+
 #ifdef CONFIG_X86_32
BUG(); /* PVH: Implement proper support. */
 #endif
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


pvh: disable pse feature for now

2014-01-27 Thread Mukesh Rathor
Konrad,

Following will turn off PSE in linux until we can get to it. It's better
to turn it off here than in xen, so if BSD gets there sooner, they are not 
dependent on us.

thanks
Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-22 Thread Mukesh Rathor
On Mon, 20 Jan 2014 10:09:30 -0500
Konrad Rzeszutek Wilk  wrote:

> On Fri, Jan 17, 2014 at 06:24:55PM -0800, Mukesh Rathor wrote:
> > pvh was designed to start with pv flags, but a commit in xen tree
> 
> Thank you for posting this!
> 
> > 51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags
> > as
> 
> You need to always include the title of said commit.
> 
> > they are not necessary. As a result, these CR flags must be set in
> > the guest.
> 
> I sent out replies to this over the weekend but somehow they are not
> showing up.
> 

Well, they finally showed up today... US mail must be slow :)...


> 
> > +
> > +   if (!cpu)
> > +   return;
> 
> And what happens if don't have this check? Will be bad if do multiple
> cr4 writes?

no, but just confuses the reader/debugger of the code IMO :)... 


> Fyi, this (cr4) should have been a seperate patch. I fixed it up that
> way.
> > +   /*
> > +* Unlike PV, for pvh xen does not set: PSE PGE OSFXSR
> > OSXMMEXCPT
> > +* For BSP, PSE PGE will be set in probe_page_size_mask(),
> > for AP
> > +* set them here. For all, OSFXSR OSXMMEXCPT will be set
> > in fpu_init
> > +*/
> > +   if (cpu_has_pse)
> > +   set_in_cr4(X86_CR4_PSE);
> > +
> > +   if (cpu_has_pge)
> > +   set_in_cr4(X86_CR4_PGE);
> > +}
> 
> Seperate patch and since the PGE part is more complicated that just
> setting the CR4 - you also have to tweak this:
> 
> 1512 /* Prevent unwanted bits from being set in PTEs.
> */ 1513 __supported_pte_mask &=
> ~_PAGE_GLOBAL;  
> 
> I think it should be done once we have actually confirmed that you can
> do 2MB pages within the guest. (might need some more tweaking?)

Umm... well, the above is just setting PSE and PGE in the APs; the
BSP is already doing that in probe_page_size_mask(), and setting
__supported_pte_mask, which needs to be set just once. So, because it's
being set in the BSP, it's already broken/untested if we start exposing PGE
from xen to a linux PVH guest...

IOW, leaving the above does no more harm, or we should 'if (pvh)' the code in
probe_page_size_mask() for PSE, and wait till we can test it...
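
For illustration, roughly what the 'if (pvh)' guard in probe_page_size_mask()
could look like; hypothetical sketch only, assuming the layout of
arch/x86/mm/init.c from around this time, not a tested patch:

static void __init probe_page_size_mask(void)
{
        /* Skip 2MB pages and CR4.PSE on PVH until they have been tested. */
        if (cpu_has_pse && !xen_pvh_domain()) {
                page_size_mask |= 1 << PG_LEVEL_2M;
                set_in_cr4(X86_CR4_PSE);
        }

        /* PGE would need the same treatment, since enabling it also sets
         * _PAGE_GLOBAL in __supported_pte_mask. */
        if (cpu_has_pge && !xen_pvh_domain()) {
                set_in_cr4(X86_CR4_PGE);
                __supported_pte_mask |= _PAGE_GLOBAL;
        }
}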

thanks
Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-17 Thread Mukesh Rathor
Konrad,

The following patch sets the bits in CR0 and CR4. Please note, I'm working
on patch for the xen side. The CR4 features are not currently exported
to a PVH guest. 

Roger, I added your SOB line, please lmk if I need to add anything else.

This patch was built on top of a71accb67e7645c68061cec2bee6067205e439fc in
konrad devel/pvh.v13 branch.

thanks
Mukesh


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-17 Thread Mukesh Rathor
pvh was designed to start with pv flags, but a commit in xen tree
51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags as
they are not necessary. As a result, these CR flags must be set in the
guest.

Signed-off-by: Roger Pau Monne 
Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   43 +--
 arch/x86/xen/smp.c   |2 +-
 arch/x86/xen/xen-ops.h   |2 +-
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 628099a..4a2aaa6 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1410,12 +1410,8 @@ static void __init xen_boot_params_init_edd(void)
  * Set up the GDT and segment registers for -fstack-protector.  Until
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
- *
- * Note, that it is refok - because the only caller of this after init
- * is PVH which is not going to use xen_load_gdt_boot or other
- * __init functions.
  */
-void __ref xen_setup_gdt(int cpu)
+static void xen_setup_gdt(int cpu)
 {
if (xen_feature(XENFEAT_auto_translated_physmap)) {
 #ifdef CONFIG_X86_64
@@ -1463,13 +1459,48 @@ void __ref xen_setup_gdt(int cpu)
pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+/*
+ * A pv guest starts with default flags that are not set for pvh, set them
+ * here asap.
+ */
+static void xen_pvh_set_cr_flags(int cpu)
+{
+   write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_WP | X86_CR0_AM);
+
+   if (!cpu)
+   return;
+   /*
+* Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT
+* For BSP, PSE PGE will be set in probe_page_size_mask(), for AP
+* set them here. For all, OSFXSR OSXMMEXCPT will be set in fpu_init
+*/
+   if (cpu_has_pse)
+   set_in_cr4(X86_CR4_PSE);
+
+   if (cpu_has_pge)
+   set_in_cr4(X86_CR4_PGE);
+}
+
+/*
+ * Note, that it is refok - because the only caller of this after init
+ * is PVH which is not going to use xen_load_gdt_boot or other
+ * __init functions.
+ */
+void __ref xen_pvh_secondary_vcpu_init(int cpu)
+{
+   xen_setup_gdt(cpu);
+   xen_pvh_set_cr_flags(cpu);
+}
+
 static void __init xen_pvh_early_guest_init(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap))
return;
 
-   if (xen_feature(XENFEAT_hvm_callback_vector))
+   if (xen_feature(XENFEAT_hvm_callback_vector)) {
xen_have_vector_callback = 1;
+   xen_pvh_set_cr_flags(0);
+   }
 
 #ifdef CONFIG_X86_32
BUG(); /* PVH: Implement proper support. */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 5e46190..a18eadd 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -105,7 +105,7 @@ static void cpu_bringup_and_idle(int cpu)
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
xen_feature(XENFEAT_supervisor_mode_kernel))
-   xen_setup_gdt(cpu);
+   xen_pvh_secondary_vcpu_init(cpu);
 #endif
cpu_bringup();
cpu_startup_entry(CPUHP_ONLINE);
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9059c24..1cb6f4c 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -123,5 +123,5 @@ __visible void xen_adjust_exception_frame(void);
 
 extern int xen_panic_handler_init(void);
 
-void xen_setup_gdt(int cpu);
+void xen_pvh_secondary_vcpu_init(int cpu);
 #endif /* XEN_OPS_H */
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)

2014-01-03 Thread Mukesh Rathor
On Thu, 2 Jan 2014 13:41:34 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 02, 2014 at 04:14:32PM +, David Vrabel wrote:
> > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor 
> > > 
> > > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > > by Xen. PVH maps the entire IO space, but only RAM pages need
> > > to be repopulated.
> > 
> > So this looks minimal but I can't work out what PVH actually needs
> > to do here.  This code really doesn't need to be made any more
> > confusing.
> 
> I gather you prefer Mukesh's original version?

I think, Konrad, that's easier to follow as one can quickly spot
the PVH difference... but your call.

thanks
mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).

2014-01-03 Thread Mukesh Rathor
On Fri, 3 Jan 2014 12:35:55 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 02, 2014 at 05:34:38PM -0800, Mukesh Rathor wrote:
> > On Thu, 2 Jan 2014 13:32:21 -0500
> > Konrad Rzeszutek Wilk  wrote:
> > 
> > > On Thu, Jan 02, 2014 at 03:32:33PM +, David Vrabel wrote:
> > > > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > > From: Mukesh Rathor 
> > > > > 
> > > > > In the bootup code for PVH we can trap cpuid via vmexit, so
> > > > > don't need to use emulated prefix call. We also check for
> > > > > vector callback early on, as it is a required feature. PVH
> > > > > also runs at default kernel IOPL.
> > > > > 
> > > > > Finally, pure PV settings are moved to a separate function
> > > > > that are only called for pure PV, ie, pv with pvmmu. They are
> > > > > also #ifdef with CONFIG_XEN_PVMMU.
> > > > [...]
> > > > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > > > unsigned int *bx, break;
> > > > >   }
> > > > >  
> > > > > - asm(XEN_EMULATE_PREFIX "cpuid"
> > > > > - : "=a" (*ax),
> > > > > -   "=b" (*bx),
> > > > > -   "=c" (*cx),
> > > > > -   "=d" (*dx)
> > > > > - : "0" (*ax), "2" (*cx));
> > > > > + if (xen_pvh_domain())
> > > > > + native_cpuid(ax, bx, cx, dx);
> > > > > + else
> > > > > + asm(XEN_EMULATE_PREFIX "cpuid"
> > > > > + : "=a" (*ax),
> > > > > + "=b" (*bx),
> > > > > + "=c" (*cx),
> > > > > + "=d" (*dx)
> > > > > + : "0" (*ax), "2" (*cx));
> > > > 
> > > > For this one off cpuid call it seems preferrable to me to use
> > > > the emulate prefix rather than diverge from PV.
> > > 
> > > This was before the PV cpuid was deemed OK to be used on PVH.
> > > Will rip this out to use the same version.
> > 
> > Whats wrong with using native cpuid? That is one of the benefits
> > that cpuid can be trapped via vmexit, and also there is talk of
> > making PV cpuid trap obsolete in the future. I suggest leaving it
> > native.
> 
> I chatted with David, Andrew and Roger on IRC about this. I like the
> idea of using xen_cpuid because:
>  1) It filters some of the CPUID flags that guests should not use.
> There is the 'aperfmperf,'x2apic', 'xsave', and whether the MWAIT_LEAF
> should be exposed (so that the ACPI AML code can call the right
> initialization code to use the extended C3 states instead of the
> legacy IOPORT ones). All of that is in xen_cpuid.
>
>  2) It works, while we can concentrate on making 1) work in the
> hypervisor/toolstack.
> 
> Meaning that the future way would be to use the native cpuid and have
> the hypervisor/toolstack setup the proper cpuid. In other words - use
> the xen_cpuid as is until that code for filtering is in the
> hypervisor.
> 
> 
> Except that PVH does not work the PV cpuid at all. I get a triple
> fault. The instruction it fails at is at the 'XEN_EMULATE_PREFIX'.
> 
> Mukesh, can you point me to the patch where the PV cpuid functionality
> is enabled?
> 
> Anyhow, as it stands, I will just use the native cpuid.

I am referring to using the "cpuid" instruction instead of XEN_EMULATE_PREFIX.
cpuid is faster and better long term... there is no benefit to using
XEN_EMULATE_PREFIX IMO. We can look at removing xen_cpuid() altogether for
PVH when/after the pvh 32bit work gets done.
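
For reference, the PVH path boils down to a plain cpuid; this is essentially
what native_cpuid() already does, with the same operand constraints as the
xen_cpuid() snippet quoted above (sketch only):

static inline void pvh_cpuid(unsigned int *ax, unsigned int *bx,
                             unsigned int *cx, unsigned int *dx)
{
        /* No XEN_EMULATE_PREFIX: the bare instruction already traps to the
         * hypervisor via VMEXIT on PVH. */
        asm volatile("cpuid"
                     : "=a" (*ax), "=b" (*bx), "=c" (*cx), "=d" (*dx)
                     : "0" (*ax), "2" (*cx));
}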

The triple fault seems to be a new bug... I can file a bug, but for
now, with the cpuid instruction in use, that won't be an issue.

thanks
mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.

2014-01-03 Thread Mukesh Rathor
On Wed, 18 Dec 2013 16:17:39 -0500
Konrad Rzeszutek Wilk  wrote:

> On Wed, Dec 18, 2013 at 06:31:43PM +, Stefano Stabellini wrote:
> > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor 
> > > 
> > > PVH is a PV guest with a twist - there are certain things
> > > that work in it like HVM and some like PV. There is
> > > a similar mode - PVHVM where we run in HVM mode with
> > > PV code enabled - and this patch explores that.
> > > 
> > > The most notable PV interfaces are the XenBus and event channels.
> > > For PVH, we will use XenBus and event channels.
> > > 
> > > For the XenBus mechanism we piggyback on how it is done for
> > > PVHVM guests.
> > > 
> > > Ditto for the event channel mechanism - we piggyback on PVHVM -
> > > by setting up a specific vector callback and that
> > > vector ends up calling the event channel mechanism to
> > > dispatch the events as needed.
> > > 
> > > This means that from a pvops perspective, we can use
> > > native_irq_ops instead of the Xen PV specific. Albeit in the
> > > future we could support pirq_eoi_map. But that is
> > > a feature request that can be shared with PVHVM.
> > > 
> > > Signed-off-by: Mukesh Rathor 
> > > Signed-off-by: Konrad Rzeszutek Wilk 
> > > ---
> > >  arch/x86/xen/enlighten.c   | 6 ++
> > >  arch/x86/xen/irq.c | 5 -
> > >  drivers/xen/events.c   | 5 +
> > >  drivers/xen/xenbus/xenbus_client.c | 3 ++-
> > >  4 files changed, 17 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > > index e420613..7fceb51 100644
> > > --- a/arch/x86/xen/enlighten.c
> > > +++ b/arch/x86/xen/enlighten.c
> > > @@ -1134,6 +1134,8 @@ void xen_setup_shared_info(void)
> > >   /* In UP this is as good a place as any to set up shared
> > > info */ xen_setup_vcpu_info_placement();
> > >  #endif
> > > + if (xen_pvh_domain())
> > > + return;
> > >  
> > >   xen_setup_mfn_list_list();
> > >  }
> > 
> > This is another one of those cases where I think we would benefit
> > from introducing xen_setup_shared_info_pvh instead of adding more
> > ifs here.
> 
> Actually this one can be removed.
> 
> > 
> > 
> > > @@ -1146,6 +1148,10 @@ void xen_setup_vcpu_info_placement(void)
> > >   for_each_possible_cpu(cpu)
> > >   xen_vcpu_setup(cpu);
> > >  
> > > + /* PVH always uses native IRQ ops */
> > > + if (xen_pvh_domain())
> > > + return;
> > > +
> > >   /* xen_vcpu_setup managed to place the vcpu_info within
> > > the percpu area for all cpus, so make use of it */
> > >   if (have_vcpu_info_placement) {
> > 
> > Same here?
> 
> Hmmm, I wonder if the vcpu info placement could work with PVH.

It should now (after a patch I sent a while ago)... the comment implies
that PVH uses native IRQs even in the case of vcpu info placement...

perhaps it would be clearer to do:

for_each_possible_cpu(cpu)
xen_vcpu_setup(cpu);
/* PVH always uses native IRQ ops */
if (have_vcpu_info_placement && !xen_pvh_domain()) {
pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
.
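
Spelled out a bit more (illustration only; the assignments under the test are
the ones already in xen_setup_vcpu_info_placement(), just left out above for
brevity, so treat this as a sketch rather than the exact diff):

void xen_setup_vcpu_info_placement(void)
{
        int cpu;

        for_each_possible_cpu(cpu)
                xen_vcpu_setup(cpu);

        /* PVH always uses native IRQ ops, so skip the _direct variants */
        if (have_vcpu_info_placement && !xen_pvh_domain()) {
                pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
                pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
                pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
                pv_irq_ops.irq_enable = __PV_IS_CALLEE_SAVE(xen_irq_enable_direct);
                pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
        }
}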

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.

2014-01-03 Thread Mukesh Rathor
On Fri, 3 Jan 2014 15:04:27 +
Stefano Stabellini  wrote:

> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > > --- a/drivers/xen/xenbus/xenbus_client.c
> > > > +++ b/drivers/xen/xenbus/xenbus_client.c
> > > > @@ -45,6 +45,7 @@
> > > >  #include <xen/grant_table.h>
> > > >  #include <xen/xenbus.h>
> > > >  #include <xen/xen.h>
> > > > +#include <xen/features.h>
> > > >  
> > > >  #include "xenbus_probe.h"
> > > >  
> > > > @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops
> > > > ring_ops_hvm = { 
> > > >  void __init xenbus_ring_ops_init(void)
> > > >  {
> > > > -   if (xen_pv_domain())
> > > > +   if (xen_pv_domain()
> > > > && !xen_feature(XENFEAT_auto_translated_physmap))
> > > 
> > > Can we just change this test to
> > > 
> > > if (!xen_feature(XENFEAT_auto_translated_physmap))
> > > 
> > > ?
> > 
> > No. If we do then the HVM domains (which are also !auto-xlat)
> > will end up using the PV version of ring_ops.
> 
> Actually HVM guests have XENFEAT_auto_translated_physmap, so in this
> case they would get &ring_ops_hvm.

Right. Back then I was confused about all the other PV modes, like
shadow, supervisor, ... but looks like they are all obsolete. It could 
just be:

if (!xen_feature(XENFEAT_auto_translated_physmap))
        ring_ops = &ring_ops_pv;
else
        ring_ops = &ring_ops_hvm;

thanks,
Mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)

2014-01-02 Thread Mukesh Rathor
On Thu, 2 Jan 2014 11:24:50 +
David Vrabel  wrote:

> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor 
> > 
> > .. which are surprinsingly small compared to the amount for PV code.
> > 
> > PVH uses mostly native mmu ops, we leave the generic (native_*) for
> > the majority and just overwrite the baremetal with the ones we need.
> > 
> > We also optimize one - the TLB flush. The native operation would
> > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > Xen one avoids that and lets the hypervisor determine which
> > VCPU needs the TLB flush.
> 
> This TLB flush optimization should be a separate patch.

It's not really an "optimization"; we are using the PV mechanism instead
of the native one because the PV one performs better. So, I think it's ok
for it to belong here.
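
For illustration, the shape of the change (the init function name here is
made up; the real wiring is in the PVH parts of mmu.c): keep the native ops
for everything else and override only the remote flush, so one hypercall
replaces IPIs to every CPU in the mask and Xen can skip VCPUs that are not
currently running.

static void __init xen_pvh_mmu_init(void)
{
        pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
}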

Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).

2014-01-02 Thread Mukesh Rathor
On Thu, 2 Jan 2014 13:32:21 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 02, 2014 at 03:32:33PM +, David Vrabel wrote:
> > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor 
> > > 
> > > In the bootup code for PVH we can trap cpuid via vmexit, so don't
> > > need to use emulated prefix call. We also check for vector
> > > callback early on, as it is a required feature. PVH also runs at
> > > default kernel IOPL.
> > > 
> > > Finally, pure PV settings are moved to a separate function that
> > > are only called for pure PV, ie, pv with pvmmu. They are also
> > > #ifdef with CONFIG_XEN_PVMMU.
> > [...]
> > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > unsigned int *bx, break;
> > >   }
> > >  
> > > - asm(XEN_EMULATE_PREFIX "cpuid"
> > > - : "=a" (*ax),
> > > -   "=b" (*bx),
> > > -   "=c" (*cx),
> > > -   "=d" (*dx)
> > > - : "0" (*ax), "2" (*cx));
> > > + if (xen_pvh_domain())
> > > + native_cpuid(ax, bx, cx, dx);
> > > + else
> > > + asm(XEN_EMULATE_PREFIX "cpuid"
> > > + : "=a" (*ax),
> > > + "=b" (*bx),
> > > + "=c" (*cx),
> > > + "=d" (*dx)
> > > + : "0" (*ax), "2" (*cx));
> > 
> > For this one off cpuid call it seems preferrable to me to use the
> > emulate prefix rather than diverge from PV.
> 
> This was before the PV cpuid was deemed OK to be used on PVH.
> Will rip this out to use the same version.

What's wrong with using native cpuid? That is one of the benefits: cpuid
can be trapped via vmexit, and there is also talk of making the PV cpuid
trap obsolete in the future. I suggest leaving it native.

Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH

2013-12-18 Thread Mukesh Rathor
On Wed, 18 Dec 2013 18:25:15 +
Stefano Stabellini  wrote:

> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor 
> > 
> > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > by Xen. PVH maps the entire IO space, but only RAM pages need
> > to be repopulated.
> > 
> > Signed-off-by: Mukesh Rathor 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> >  arch/x86/xen/setup.c | 19 +--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c

> > @@ -231,7 +246,7 @@ static void __init
> > xen_set_identity_and_release_chunk( (void)HYPERVISOR_update_va_mapping(
> > (unsigned long)__va(pfn << PAGE_SHIFT),
> > pte, 0); }
> > -
> > +skip:
> > if (start_pfn < nr_pages)
> > *released += xen_release_chunk(
> > start_pfn, min(end_pfn, nr_pages));
... 
> Also considering that you are turning xen_release_chunk into a nop,
> the only purpose of this function on PVH is to call
> set_phys_range_identity. Can't we just do that?

xen_release_chunk() is called for PVH to give us the count of released pages,
although we don't need to release anything for pvh as it was already done in
xen. The released count is then used later to add memory.

I had a separate function to just adjust the stats, which is all we need
to do for pvh; Konrad just merged it with the pv functions.
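
To show just the PVH piece in isolation, a hypothetical helper (name made up,
roughly what a pre-merge version might look like): nothing is handed back to
Xen, we only count the RAM pages in the chunk so the caller can re-add that
much memory later.

static unsigned long __init xen_pvh_count_released(unsigned long start_pfn,
                                                   unsigned long end_pfn,
                                                   unsigned long nr_pages)
{
        if (start_pfn >= nr_pages)
                return 0;
        return min(end_pfn, nr_pages) - start_pfn;
}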

thanks
mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V1]PVH: vcpu info placement, load CS selector, and remove debug printk.

2013-06-05 Thread Mukesh Rathor
This patch addresses 3 things:
   - Resolve the vcpu info placement fixme.
   - Load the CS selector for PVH after switching to the new gdt.
   - Remove the printk in case of failure to map pfns in the p2m. This is
 because qemu has a lot of expected failures when mapping HVM pages.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   19 +++
 arch/x86/xen/mmu.c   |3 ---
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a7ee39f..d55a578 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1083,14 +1083,12 @@ void xen_setup_shared_info(void)
HYPERVISOR_shared_info =
(struct shared_info *)__va(xen_start_info->shared_info);
 
-   /* PVH TBD/FIXME: vcpu info placement in phase 2 */
-   if (xen_pvh_domain())
-   return;
-
 #ifndef CONFIG_SMP
/* In UP this is as good a place as any to set up shared info */
xen_setup_vcpu_info_placement();
 #endif
+   if (xen_pvh_domain())
+   return;
 
xen_setup_mfn_list_list();
 }
@@ -1103,6 +1101,10 @@ void xen_setup_vcpu_info_placement(void)
for_each_possible_cpu(cpu)
xen_vcpu_setup(cpu);
 
+   /* PVH always uses native IRQ ops */
+   if (xen_pvh_domain())
+   return;
+
/* xen_vcpu_setup managed to place the vcpu_info within the
   percpu area for all cpus, so make use of it */
if (have_vcpu_info_placement) {
@@ -1326,7 +1328,16 @@ static void __init xen_setup_stackprotector(void)
 {
/* PVH TBD/FIXME: investigate setup_stack_canary_segment */
if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   unsigned long dummy;
+
switch_to_new_gdt(0);
+
+   asm volatile ("pushq %0\n"
+ "leaq 1f(%%rip),%0\n"
+ "pushq %0\n"
+ "lretq\n"
+ "1:\n"
+ : "=&r" (dummy) : "0" (__KERNEL_CS));
return;
}
pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 31cc1ef..c104895 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2527,9 +2527,6 @@ static int pvh_add_to_xen_p2m(unsigned long lpfn, 
unsigned long fgmfn,
	set_xen_guest_handle(xatp.errs, &err);
 
	rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   if (rc || err)
-   pr_warn("d0: Failed to map pfn (0x%lx) to mfn (0x%lx) 
rc:%d:%d\n",
-   lpfn, fgmfn, rc, err);
return rc;
 }
 
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] PVH: vcpu info placement, load selectors, and remove debug printk.

2013-06-05 Thread Mukesh Rathor
On Wed, 05 Jun 2013 08:03:12 +0100
"Jan Beulich"  wrote:

> >>> On 04.06.13 at 23:53, Mukesh Rathor 
> >>> wrote:
> > Following OK? :
> > 
> > if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > switch_to_new_gdt(0);
> > 
> > asm volatile (
> > "pushq %%rax\n"
> > "leaq 1f(%%rip),%%rax\n"
> > "pushq %%rax\n"
> > "lretq\n"
> > "1:\n"
> > : : "a" (__KERNEL_CS) : "memory");
> > 
> > return;
> > }
> 
> While generally the choice of using %%rax instead of %0 here is
> a matter of taste to some degree, I still don't see why you can't
> use "r" as the constraint here in the first place.

The compiler mostly picks eax anyways, but good suggestion.

> Furthermore, assuming this sits in a function guaranteed to not be
> inlined, this has a latent bug (and if the assumption isn't right, the
> bug is real) in that the asm() modifies %rax without telling the
> compiler.

According to one of the unofficial asm tutorials I have here, the compiler
knows since it's an input operand and doesn't need to be told. In fact
it'll barf if the register is added to the clobber list.
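
A stand-alone illustration of the operand rule in question (plain user-space
C, nothing Xen-specific; compiles with gcc):

#include <stdio.h>

int main(void)
{
        unsigned long v = 5;

        /* An input-only operand promises gcc the register is left unchanged.
         * If the asm body writes it, it has to be declared as an output (or
         * in/out, as here), not listed as a clobber, because a register
         * can't be both an operand and a clobber.  Jan's version below uses
         * a dummy output tied to the input ("0") for the same reason. */
        asm volatile("addq $1, %0" : "+r" (v));

        printf("%lu\n", v);     /* prints 6 */
        return 0;
}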

> This is how I would have done it:
> 
>   unsigned long dummy;
> 
>   asm volatile ("pushq %0\n"
> "leaq 1f(%%rip),%0\n"
> "pushq %0\n"
> "lretq\n"
> "1:\n"
> : "=&r" (dummy) : "0" (__KERNEL_CS));
> 

Looks good. Thanks,
Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] PVH: vcpu info placement, load selectors, and remove debug printk.

2013-06-04 Thread Mukesh Rathor
On Tue, 04 Jun 2013 09:27:03 +0100
"Jan Beulich"  wrote:

> >>> On 04.06.13 at 02:43, Mukesh Rathor 
> >>> wrote:
> > @@ -1327,6 +1329,18 @@ static void __init
> > xen_setup_stackprotector(void) /* PVH TBD/FIXME: investigate
> > setup_stack_canary_segment */ if
> > (xen_feature(XENFEAT_auto_translated_physmap))
> > { switch_to_new_gdt(0); +
> > +   /* xen started us with null selectors. load them
> > now */
> > +   __asm__ __volatile__ (
> > +   "movl %0,%%ds\n"
> > +   "movl %0,%%ss\n"
> > +   "pushq %%rax\n"
> > +   "leaq 1f(%%rip),%%rax\n"
> > +   "pushq %%rax\n"
> > +   "retfq\n"
> > +   "1:\n"
> > +   : : "r" (__KERNEL_DS), "a" (__KERNEL_CS) :
> > "memory"); +
> 
> I can see why you want CS to be reloaded (and CS, other than the
> comment says, clearly hasn't been holding a null selector up to here.
> 
> I can't immediately see why you'd need SS to be other than null, and
> it completely escapes me why you'd need to DS (but not ES) to be
> non-null.
> 
> Furthermore, is there any reason why you use "retfq" (Intel syntax)
> when all assembly code otherwise uses AT&T syntax (the proper
> equivalent here would be "lretq")?
> 
> And finally, please consistently use %<number> (which, once
> fixed, will make clear that the second constraint really can be "r"),
> and avoid using suffixes on moves to/from selector registers
> (which, once fixed, will make clear that at least the first constraint
> really can be relaxed to "rm").

Following OK? :

if (xen_feature(XENFEAT_auto_translated_physmap)) {
switch_to_new_gdt(0);

asm volatile (
"pushq %%rax\n"
"leaq 1f(%%rip),%%rax\n"
"pushq %%rax\n"
"lretq\n"
"1:\n"
: : "a" (__KERNEL_CS) : "memory");

return;
}

thanks,
Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] PVH: vcpu info placement, load selectors, and remove debug printk.

2013-06-03 Thread Mukesh Rathor
This patch addresses 3 things:
   - Resolve the vcpu info placement fixme.
   - Load the DS/SS/CS selectors for PVH after switching to the new gdt.
   - Remove the printk in case of failure to map pfns in the p2m. This is
 because qemu has a lot of benign failures when mapping HVM pages.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   22 ++
 arch/x86/xen/mmu.c   |3 ---
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a7ee39f..6ff30d8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1083,14 +1083,12 @@ void xen_setup_shared_info(void)
HYPERVISOR_shared_info =
(struct shared_info *)__va(xen_start_info->shared_info);
 
-   /* PVH TBD/FIXME: vcpu info placement in phase 2 */
-   if (xen_pvh_domain())
-   return;
-
 #ifndef CONFIG_SMP
/* In UP this is as good a place as any to set up shared info */
xen_setup_vcpu_info_placement();
 #endif
+   if (xen_pvh_domain())
+   return;
 
xen_setup_mfn_list_list();
 }
@@ -1103,6 +1101,10 @@ void xen_setup_vcpu_info_placement(void)
for_each_possible_cpu(cpu)
xen_vcpu_setup(cpu);
 
+   /* PVH always uses native IRQ ops */
+   if (xen_pvh_domain())
+   return;
+
/* xen_vcpu_setup managed to place the vcpu_info within the
   percpu area for all cpus, so make use of it */
if (have_vcpu_info_placement) {
@@ -1327,6 +1329,18 @@ static void __init xen_setup_stackprotector(void)
/* PVH TBD/FIXME: investigate setup_stack_canary_segment */
if (xen_feature(XENFEAT_auto_translated_physmap)) {
switch_to_new_gdt(0);
+
+   /* xen started us with null selectors. load them now */
+   __asm__ __volatile__ (
+   "movl %0,%%ds\n"
+   "movl %0,%%ss\n"
+   "pushq %%rax\n"
+   "leaq 1f(%%rip),%%rax\n"
+   "pushq %%rax\n"
+   "retfq\n"
+   "1:\n"
+   : : "r" (__KERNEL_DS), "a" (__KERNEL_CS) : "memory");
+
return;
}
pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 31cc1ef..c104895 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2527,9 +2527,6 @@ static int pvh_add_to_xen_p2m(unsigned long lpfn, 
unsigned long fgmfn,
	set_xen_guest_handle(xatp.errs, &err);
 
	rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   if (rc || err)
-   pr_warn("d0: Failed to map pfn (0x%lx) to mfn (0x%lx) 
rc:%d:%d\n",
-   lpfn, fgmfn, rc, err);
return rc;
 }
 
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH]: PVH linux: don't print warning in case of failed mapping

2013-02-15 Thread Mukesh Rathor
Remove the printing of warning in case of failed mappings. Sometimes
they are expected as in case of Qemu mapping pages during HVM guest
creation.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>

---
 arch/x86/xen/mmu.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fbf6a63..afa6af6 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2499,9 +2499,6 @@ static int pvh_add_to_xen_p2m(unsigned long lpfn, unsigned long fgmfn,
set_xen_guest_handle(xatpr.gpfns, &gpfn);
 
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatpr);
-   if (rc)
-   pr_warn("d0: Failed to map pfn to mfn rc:%d pfn:%lx mfn:%lx\n",
-   rc, lpfn, fgmfn);
return rc;
 }
 
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PVH: remove code to map iomem from guest

2013-02-06 Thread Mukesh Rathor
On Wed, 6 Feb 2013 10:39:13 -0500
Konrad Rzeszutek Wilk <kon...@darnok.org> wrote:

> On Wed, Jan 30, 2013 at 02:55:29PM -0800, Mukesh Rathor wrote:
> > It was decided during xen patch review that xen map the iomem
> > transparently, so remove xen_set_clr_mmio_pvh_pte() and the sub
> > hypercall PHYSDEVOP_map_iomem.
> > 
> 
> G..
> 
> No Signed-off-by??

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>


BTW, thanks a lot konrad for managing this while xen patch is being
reviewed. It's a huge help for me and allows me to focus on xen side.
Appreciate much.

Mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-06 Thread Mukesh Rathor
On Wed, 6 Feb 2013 10:49:13 -0500
Konrad Rzeszutek Wilk <kon...@darnok.org> wrote:

> On Thu, Jan 31, 2013 at 06:30:15PM -0800, Mukesh Rathor wrote:
> > This patch fixes a fixme in Linux to use alloc_xenballooned_pages()
> > to allocate pfns for grant table pages instead of kmalloc. This also
> > simplifies add to physmap on the xen side a bit.
> 
> Pulled this.
> > 


Konrad, no, there was a follow up email on this thread to discard this.
Please discard this. I resent yesterday with proper fixes. I realize
now I should've given the one yesterday a version number. My bad, this head
cold is crippling my brain :).. 

Sorry for the confusion.

Mukesh

Following is the latest patch I emailed yesterday :


This patch fixes a fixme in Linux to use alloc_xenballooned_pages() to
allocate pfns for grant table pages instead of kmalloc. This also
simplifies add to physmap on the xen side a bit.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 9c0019d..fdb1d88 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -47,6 +47,7 @@
 #include <xen/grant_table.h>
 #include <xen/interface/memory.h>
 #include <xen/hvc-console.h>
+#include <xen/balloon.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/interface.h>
 
@@ -1026,10 +1027,22 @@ static void gnttab_unmap_frames_v2(void)
arch_gnttab_unmap(grstatus, nr_status_frames(nr_grant_frames));
 }
 
+static xen_pfn_t pvh_get_grant_pfn(int grant_idx)
+{
+   unsigned long vaddr;
+   unsigned int level;
+   pte_t *pte;
+
+   vaddr = (unsigned long)(gnttab_shared.addr) + grant_idx * PAGE_SIZE;
pte = lookup_address(vaddr, &level);
+   BUG_ON(pte == NULL);
+   return pte_mfn(*pte);
+}
+
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
struct gnttab_setup_table setup;
-   unsigned long *frames, start_gpfn;
+   unsigned long *frames, start_gpfn = 0;
unsigned int nr_gframes = end_idx + 1;
int rc;
 
@@ -1040,8 +1053,6 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 
if (xen_hvm_domain())
start_gpfn = xen_hvm_resume_frames >> PAGE_SHIFT;
-   else
-   start_gpfn = virt_to_pfn(gnttab_shared.addr);
/*
 * Loop backwards, so that the first hypercall has the largest
 * index, ensuring that the table will grow only once.
@@ -1050,7 +1061,11 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
-   xatp.gpfn = start_gpfn + i;
+   if (xen_hvm_domain())
+   xatp.gpfn = start_gpfn + i;
+   else
+   xatp.gpfn = pvh_get_grant_pfn(i);
+
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
printk(KERN_WARNING
@@ -1138,27 +1153,51 @@ static void gnttab_request_version(void)
grant_table_version);
 }
 
+/*
+ * PVH: we need three things: virtual address, pfns, and mfns. The pfns
+ * are allocated via ballooning, then we call arch_gnttab_map_shared to
+ * allocate the VA and put pfn's in the pte's for the VA. The mfn's are
+ * finally allocated in gnttab_map() by xen which also populates the P2M.
+ */
+static int xlated_setup_gnttab_pages(unsigned long numpages, void **addr)
+{
+   int i, rc;
+   unsigned long pfns[numpages];
+   struct page *pages[numpages];
+
+   rc = alloc_xenballooned_pages(numpages, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+   numpages, rc);
+   return rc;
+   }
+   for (i = 0; i < numpages; i++)
+   pfns[i] = page_to_pfn(pages[i]);
+
+   rc = arch_gnttab_map_shared(pfns, numpages, numpages, addr);
+   if (rc != 0)
+   free_xenballooned_pages(numpages, pages);
+
+   return rc;
+}
+
 int gnttab_resume(void)
 {
+   int rc;
unsigned int max_nr_gframes;
-   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
 
-   /* PVH note: xen will free existing kmalloc'd mfn in
-* XENMEM_add_to_physmap. TBD/FIXME: use xen ballooning instead of
-* kmalloc(). */
if (xen_pv_domain() && xen_feature(XENFEAT_auto_translated_physmap) &&
!gnttab_shared.addr) {
-   gnttab_shared.addr =
-   kmalloc(max_nr_gframes * PAGE_SIZE, GFP_KERNEL);
-   if (!gnttab_shared.addr) {
-   pr_warn("%s", kmsg);
-  


[PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-05 Thread Mukesh Rathor
This patch fixes a fixme in Linux to use alloc_xenballooned_pages() to
allocate pfns for grant table pages instead of kmalloc. This also
simplifies add to physmap on the xen side a bit.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 9c0019d..fdb1d88 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -47,6 +47,7 @@
 #include <xen/grant_table.h>
 #include <xen/interface/memory.h>
 #include <xen/hvc-console.h>
+#include <xen/balloon.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/interface.h>
 
@@ -1026,10 +1027,22 @@ static void gnttab_unmap_frames_v2(void)
arch_gnttab_unmap(grstatus, nr_status_frames(nr_grant_frames));
 }
 
+static xen_pfn_t pvh_get_grant_pfn(int grant_idx)
+{
+   unsigned long vaddr;
+   unsigned int level;
+   pte_t *pte;
+
+   vaddr = (unsigned long)(gnttab_shared.addr) + grant_idx * PAGE_SIZE;
pte = lookup_address(vaddr, &level);
+   BUG_ON(pte == NULL);
+   return pte_mfn(*pte);
+}
+
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
struct gnttab_setup_table setup;
-   unsigned long *frames, start_gpfn;
+   unsigned long *frames, start_gpfn = 0;
unsigned int nr_gframes = end_idx + 1;
int rc;
 
@@ -1040,8 +1053,6 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 
if (xen_hvm_domain())
start_gpfn = xen_hvm_resume_frames >> PAGE_SHIFT;
-   else
-   start_gpfn = virt_to_pfn(gnttab_shared.addr);
/*
 * Loop backwards, so that the first hypercall has the largest
 * index, ensuring that the table will grow only once.
@@ -1050,7 +1061,11 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
-   xatp.gpfn = start_gpfn + i;
+   if (xen_hvm_domain())
+   xatp.gpfn = start_gpfn + i;
+   else
+   xatp.gpfn = pvh_get_grant_pfn(i);
+
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
printk(KERN_WARNING
@@ -1138,27 +1153,51 @@ static void gnttab_request_version(void)
grant_table_version);
 }
 
+/*
+ * PVH: we need three things: virtual address, pfns, and mfns. The pfns
+ * are allocated via ballooning, then we call arch_gnttab_map_shared to
+ * allocate the VA and put pfn's in the pte's for the VA. The mfn's are
+ * finally allocated in gnttab_map() by xen which also populates the P2M.
+ */
+static int xlated_setup_gnttab_pages(unsigned long numpages, void **addr)
+{
+   int i, rc;
+   unsigned long pfns[numpages];
+   struct page *pages[numpages];
+
+   rc = alloc_xenballooned_pages(numpages, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+   numpages, rc);
+   return rc;
+   }
+   for (i = 0; i < numpages; i++)
+   pfns[i] = page_to_pfn(pages[i]);
+
+   rc = arch_gnttab_map_shared(pfns, numpages, numpages, addr);
+   if (rc != 0)
+   free_xenballooned_pages(numpages, pages);
+
+   return rc;
+}
+
 int gnttab_resume(void)
 {
+   int rc;
unsigned int max_nr_gframes;
-   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
 
-   /* PVH note: xen will free existing kmalloc'd mfn in
-* XENMEM_add_to_physmap. TBD/FIXME: use xen ballooning instead of
-* kmalloc(). */
if (xen_pv_domain() && xen_feature(XENFEAT_auto_translated_physmap) &&
!gnttab_shared.addr) {
-   gnttab_shared.addr =
-   kmalloc(max_nr_gframes * PAGE_SIZE, GFP_KERNEL);
-   if (!gnttab_shared.addr) {
-   pr_warn("%s", kmsg);
-   return -ENOMEM;
-   }
+
+   rc = xlated_setup_gnttab_pages((unsigned long)max_nr_gframes,
+   &gnttab_shared.addr);
+   if (rc != 0)
+   return rc;
}
if (xen_pv_domain())
return gnttab_map(0, nr_grant_frames - 1);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-01 Thread Mukesh Rathor
On Fri, 1 Feb 2013 14:00:58 -0800
Mukesh Rathor <mukesh.rat...@oracle.com> wrote:

> On Thu, 31 Jan 2013 18:44:46 -0800
> Mukesh Rathor <mukesh.rat...@oracle.com> wrote:
> 
> > On Thu, 31 Jan 2013 18:30:15 -0800
> > Mukesh Rathor <mukesh.rat...@oracle.com> wrote:
> > 
> > > This patch fixes a fixme in Linux to use
> > > alloc_xenballooned_pages() to allocate pfns for grant table pages
> > > instead of kmalloc. This also simplifies add to physmap on the
> > > xen side a bit.
> > 
> > Looking at it again, I realized rc should be signed in
> > gnttab_resume(). Below again. Thanks.
> 
> Konrad, Please hold off on this patch. I discovered an issue on the 
> domU side with this change. I'm currently investigating if it's 
> related.

Ah right, I forgot the pfn's from balloon may not always be contiguous.
Besides these are special pfns so to speak, so in gnttab_map()
virt_to_pfn doesn't work. 

I'm gonna have to create a separate gnttab map routine for pvh case, it
appears. Shouldn't be too bad tho.
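
(Illustration only, with hypothetical names pvh_gnttab_pages and
pvh_gnttab_frame_pfn: since ballooned pages need not be physically
contiguous, frame i cannot be computed as virt_to_pfn(base) + i and has to
be looked up per frame.)

        /* Sketch, not the actual patch: keep the per-frame pages around. */
        static struct page **pvh_gnttab_pages; /* filled from alloc_xenballooned_pages() */

        static unsigned long pvh_gnttab_frame_pfn(unsigned int idx)
        {
                /* each ballooned frame carries its own pfn; no base + idx arithmetic */
                return page_to_pfn(pvh_gnttab_pages[idx]);
        }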

thanks,
Mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-01 Thread Mukesh Rathor
On Thu, 31 Jan 2013 18:44:46 -0800
Mukesh Rathor <mukesh.rat...@oracle.com> wrote:

> On Thu, 31 Jan 2013 18:30:15 -0800
> Mukesh Rathor <mukesh.rat...@oracle.com> wrote:
> 
> > This patch fixes a fixme in Linux to use alloc_xenballooned_pages()
> > to allocate pfns for grant table pages instead of kmalloc. This also
> > simplifies add to physmap on the xen side a bit.
> 
> Looking at it again, I realized rc should be signed in
> gnttab_resume(). Below again. Thanks.

Konrad, Please hold off on this patch. I discovered an issue on the 
domU side with this change. I'm currently investigating if it's 
related.

thanks,
Mukesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

