Re: [Xen-devel] [V5 PATCH 1/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-15 Thread Mukesh Rathor
On Fri, 12 Sep 2014 16:42:58 -0400
Konrad Rzeszutek Wilk  wrote:

> On Wed, Sep 10, 2014 at 04:36:06PM -0700, Mukesh Rathor wrote:

Sorry, I didn't realize you had more comments... didn't scroll down :)..

> >  cpumask_var_t xen_cpu_initialized_map;
> >  
> > @@ -99,10 +100,14 @@ static void cpu_bringup(void)
> > wmb();  /* make sure everything is out */
> >  }
> >  
> > -/* Note: cpu parameter is only relevant for PVH */
> > -static void cpu_bringup_and_idle(int cpu)
> > +/*
> > + * Note: cpu parameter is only relevant for PVH. The reason for passing it
> > + * is we can't do smp_processor_id until the percpu segments are loaded, for
> > + * which we need the cpu number! So we pass it in rdi as first parameter.
> > + */
> 
> Thank you for expanding on that (I totally forgot why we did that).

sure.

> > +* The vcpu comes on kernel page tables which have the NX pte
> > +* bit set. This means before DS/SS is touched, NX in
> > +* EFER must be set. Hence the following assembly glue code.
> 
> And you ripped out the nice 'N.B' comment I added. Sad :-(
> >  */
> > +   ctxt->user_regs.eip = (unsigned long)xen_pvh_early_cpu_init;
> > +   ctxt->user_regs.rdi = cpu;
> > +   ctxt->user_regs.rsi = true;  /* secondary cpu == true */
> 
> Oh, that is new. Ah yes we can use that [looking at Xen code].
> I wonder what other registers we can use to pass stuff around.

All GPRs. I commented that we can do that in Roger's PVH doc.
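
A minimal sketch of that idea, assuming the SysV x86-64 calling convention
(first two integer arguments in %rdi and %rsi); the helper name below is made
up, the fields are the same vcpu_guest_context ones the patch uses:

/* Sketch only: seed the new vcpu's registers so that, when it starts at
 * 'entry', the values below arrive as the C function's arguments. */
static void seed_vcpu_args(struct vcpu_guest_context *ctxt,
			   unsigned long entry, int cpu, bool secondary)
{
	ctxt->user_regs.eip = entry;		/* where the vcpu starts    */
	ctxt->user_regs.rdi = cpu;		/* 1st argument: cpu number */
	ctxt->user_regs.rsi = secondary;	/* 2nd argument: flag       */
}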

Looks like David responded to other comments.

Thanks,
Mukesh



Re: [Xen-devel] [V5 PATCH 1/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-12 Thread Mukesh Rathor
On Fri, 12 Sep 2014 16:42:58 -0400
Konrad Rzeszutek Wilk  wrote:

> On Wed, Sep 10, 2014 at 04:36:06PM -0700, Mukesh Rathor wrote:
> > This fixes two bugs in PVH guests:
> > 
> >   - Not setting EFER.NX means the NX bit in page table entries is
> > ignored on Intel processors and causes reserved bit page faults
> > on AMD processors.
> > 
> >   - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE
> > for pvh guest") PVH guests are required to set EFER.SCE to enable
> > the SYSCALL instruction.
> > 
> > Secondary VCPUs are started with pagetables with the NX bit set so
> > EFER.NX must be set before using any stack or data segment.
> > xen_pvh_cpu_early_init() is the new secondary VCPU entry point that
> > sets EFER before jumping to cpu_bringup_and_idle().
> > 
> > Signed-off-by: Mukesh Rathor 
> > Signed-off-by: David Vrabel 
> 
> Huh? So who wrote it? Or did you mean 'Reviewed-by'?

No, I meant SOB. I wrote v1 and v2, then David came up with v3 and v4,
and then I took the comments from v4 and came up with v5.

-Mukesh



[V5 PATCH 1/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-10 Thread Mukesh Rathor
This fixes two bugs in PVH guests:

  - Not setting EFER.NX means the NX bit in page table entries is
ignored on Intel processors and causes reserved bit page faults on
AMD processors.

  - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for
pvh guest") PVH guests are required to set EFER.SCE to enable the
SYSCALL instruction.

Secondary VCPUs are started with pagetables with the NX bit set so
EFER.NX must be set before using any stack or data segment.
xen_pvh_cpu_early_init() is the new secondary VCPU entry point that
sets EFER before jumping to cpu_bringup_and_idle().

Signed-off-by: Mukesh Rathor 
Signed-off-by: David Vrabel 
---
 arch/x86/xen/enlighten.c |  6 ++
 arch/x86/xen/smp.c   | 29 ++---
 arch/x86/xen/smp.h   |  8 
 arch/x86/xen/xen-head.S  | 33 +
 4 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..424d831 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1463,6 +1463,7 @@ static void __ref xen_setup_gdt(int cpu)
pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+#ifdef CONFIG_XEN_PVH
 /*
  * A PV guest starts with default flags that are not set for PVH, set them
  * here asap.
@@ -1508,12 +1509,15 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+
+   xen_pvh_early_cpu_init(0, false);
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
BUG(); /* PVH: Implement proper support. */
 #endif
 }
+#endif /* CONFIG_XEN_PVH */
 
 /* First C function to be called on Xen boot */
 asmlinkage __visible void __init xen_start_kernel(void)
@@ -1527,7 +1531,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
xen_domain_type = XEN_PV_DOMAIN;
 
xen_setup_features();
+#ifdef CONFIG_XEN_PVH
xen_pvh_early_guest_init();
+#endif
xen_setup_machphys_mapping();
 
/* Install Xen paravirt ops */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..b25f8942 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include 
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,10 +100,14 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_XEN_PVH
if (xen_feature(XENFEAT_auto_translated_physmap) &&
xen_feature(XENFEAT_supervisor_mode_kernel))
xen_pvh_secondary_vcpu_init(cpu);
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -413,15 +417,18 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
(unsigned long)xen_failsafe_callback;
ctxt->user_regs.cs = __KERNEL_CS;
per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
-#ifdef CONFIG_X86_32
}
-#else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+#ifdef CONFIG_XEN_PVH
+   else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
 */
+   ctxt->user_regs.eip = (unsigned long)xen_pvh_early_cpu_init;
ctxt->user_regs.rdi = cpu;
+   ctxt->user_regs.rsi = true;  /* secondary cpu == true */
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --g
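
The smp.h and xen-head.S hunks are cut off above; a hedged sketch of what the
header side plausibly declares, inferred from the call sites in the diff and
the V2 assembly later in the thread (an assumption, not the literal hunk):

/* Hedged sketch: the same routine is called as a plain C function on the
 * boot CPU (xen_pvh_early_cpu_init(0, false) in enlighten.c) and used as
 * the raw entry point for secondary vcpus, which arrive with the cpu
 * number in %rdi and entry == true in %rsi. */
#ifdef CONFIG_XEN_PVH
extern void xen_pvh_early_cpu_init(int cpu, bool entry);
#else
static inline void xen_pvh_early_cpu_init(int cpu, bool entry)
{
}
#endif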

[V5 PATCH 0/1] x86/xen: Set EFER.NX and EFER.SCE in PVH guests

2014-09-10 Thread Mukesh Rathor
Hi,

Attached is the V5 patch fixing the EFER bugs on PVH.

Changes in v5 (Mukesh):
  - Jan reminded us that vcpu 0 could go offline/online. So, add the flag back
instead of relying on the cpu id to decide whether to return from xen_pvh_early_cpu_init.
  - Boris comments: 
   o Rename to xen_pvh_early_cpu_init
   o Add ifdef around pvh functions in enlighten.c too.
  - Tab before closing brace to pacify checkpatch.pl

Changes in v4 (David):
  - cpu == 0 => boot CPU
  - Reduce #ifdefs.
  - Add patch for XEN_PVH docs.

Changes in v3 (David):
  - Use common xen_pvh_cpu_early_init() function for boot and secondary
VCPUs.

Changes in v2: (Mukesh):
  - Use assembly macro to unify code for boot and secondary VCPUs.



Re: [Xen-devel] [V2 PATCH 1/1] PVH: set EFER.NX and EFER.SCE

2014-09-03 Thread Mukesh Rathor
On Wed, 3 Sep 2014 14:58:04 +0100
David Vrabel  wrote:

> On 03/09/14 02:19, Mukesh Rathor wrote:
> > This patch addresses two things for a pvh boot vcpu:
> > 
> >   - NX bug on intel: It was recently discovered that NX is not being
> > honored in PVH on intel since EFER.NX is not being set.
> > 
> >   - PVH boot hang on newer xen:  Following c/s on xen
> > 
> > c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest
> > 
> > removes setting of EFER.SCE for PVH guests. As such, existing
> > intel pvh guest will no longer boot on xen after that c/s.
> > 
> > Both above changes will be applicable to AMD also when xen support
> > of AMD pvh is added.
> > 
> > Also, we create a new glue assembly entry point for secondary vcpus
> > because they come up on kernel page tables that have pte.NX
> > bits set. As such, before anything is touched in DS/SS, EFER.NX
> > must be set.
> [...]
> > --- a/arch/x86/xen/xen-head.S
> > +++ b/arch/x86/xen/xen-head.S
> > @@ -47,6 +47,35 @@ ENTRY(startup_xen)
> >  
> > __FINIT
> >  
> > +#ifdef CONFIG_XEN_PVH
> > +#ifdef CONFIG_X86_64
> > +.macro PVH_EARLY_SET_EFER
> 
> I don't think a macro is the right way to do this.  We can instead
> pass a parameter to say whether it is a boot or secondary CPU.
> 
> Something like this (untested) patch?

That's fine too. But since vcpu 0 is always the primary vcpu, we can
just use that and not worry about passing another parameter.

-Mukesh



[V2 PATCH 1/1] PVH: set EFER.NX and EFER.SCE

2014-09-02 Thread Mukesh Rathor
This patch addresses two things for a pvh boot vcpu:

  - NX bug on intel: It was recently discovered that NX is not being
honored in PVH on intel since EFER.NX is not being set.

  - PVH boot hang on newer xen:  Following c/s on xen

c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest

removes setting of EFER.SCE for PVH guests. As such, existing intel
pvh guest will no longer boot on xen after that c/s.

Both above changes will be applicable to AMD also when xen support
of AMD pvh is added.

Also, we create a new glue assembly entry point for secondary vcpus
because they come up on kernel page tables that have pte.NX
bits set. As such, before anything is touched in DS/SS, EFER.NX
must be set.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |  3 +++
 arch/x86/xen/smp.c   | 28 
 arch/x86/xen/smp.h   |  1 +
 arch/x86/xen/xen-head.S  | 29 +
 4 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..e17fa2d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -85,6 +85,8 @@
 
 EXPORT_SYMBOL_GPL(hypercall_page);
 
+extern void xen_pvh_configure_efer(void);
+
 /*
  * Pointer to the xen_vcpu_info structure or
  * &HYPERVISOR_shared_info->vcpu_info[cpu]. See xen_hvm_init_shared_info
@@ -1508,6 +1510,7 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+   xen_pvh_configure_efer();
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..073bbf4 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include 
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,8 +100,12 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -416,12 +420,20 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
 #ifdef CONFIG_X86_32
}
 #else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+   } else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
+*/
+   ctxt->user_regs.eip = (unsigned long)pvh_smp_cpu_bringup;
+
+   /* N.B. The bringup function cpu_bringup_and_idle is called with
+* %rdi having the cpu number - which means we are passing it in
+* as the first parameter. Subtle!
 */
ctxt->user_regs.rdi = cpu;
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..d6628cb 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -7,5 +7,6 @@ extern void xen_send_IPI_mask_allbutself(const struct cpumask 
*mask,
 extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
+extern void pvh_smp_cpu_bringup(int cpu);
 
 #endif
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 485b695..97ee831 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -47,6 +47,35 @@ ENTRY(startup_xen)
 
__FINIT
 
+#ifdef CONFIG_XEN_PVH

[V2 PATCH 0/1] PVH: set EFER bits

2014-09-02 Thread Mukesh Rathor
Changes from V1:
   - Unify the patches into one
   - Unify the code to set the EFER bits.

thanks,
Mukesh



Re: [Xen-devel] [V1 PATCH 1/2] PVH: set EFER.NX and EFER.SCE for boot vcpu

2014-08-28 Thread Mukesh Rathor
On Thu, 28 Aug 2014 15:18:26 +0100
David Vrabel  wrote:

> On 27/08/14 23:33, Mukesh Rathor wrote:
> > This patch addresses three things for a pvh boot vcpu:
> > 
> >   - NX bug on intel: It was recently discovered that NX is not being
> > honored in PVH on intel since EFER.NX is not being set. The
> > pte.NX bits are ignored if EFER.NX is not set on intel.
> 
> I am unconvinced by this explanation.  The Intel SDM clearly states
> that the XD bit in the page table entries is reserved if EFER.NXE is
> clear, and thus using a entry with XD set and EFER.NXE clear should
> generate a page fault (same as AMD).
> 
> You either need to find out why Intel really worked (perhaps Xen is
> setting EFER.NXE on Intel?) or you need to included an errata (or
> similar) reference.

Nope, I verified that again. The vcpu is coming up with EFER 0x501, i.e.,
LME/LMA/SCE (older Xen, prior to the SCE removal change). The pte entry for
%rsp is 0x80003e32b063, which has NX set. No exception is generated on the
push %rbp instruction (as there would be on AMD).
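
For decoding those numbers: EFER bit 0 is SCE, bit 8 LME, bit 10 LMA, bit 11
NXE, and bit 63 of a 64-bit pte is the NX/XD bit. A quick user-space check
(plain C sketch):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t efer = 0x501;              /* value observed on the vcpu */
	uint64_t pte  = 0x80003e32b063ULL;  /* pte covering %rsp          */

	/* EFER: bit 0 SCE, bit 8 LME, bit 10 LMA, bit 11 NXE */
	printf("SCE=%u LME=%u LMA=%u NXE=%u\n",
	       (unsigned)(efer & 1), (unsigned)((efer >> 8) & 1),
	       (unsigned)((efer >> 10) & 1), (unsigned)((efer >> 11) & 1));

	/* pte bit 63 is NX/XD */
	printf("PTE.NX=%u\n", (unsigned)(pte >> 63));
	return 0;
}

This prints SCE=1 LME=1 LMA=1 NXE=0 and PTE.NX=1, matching the description
above.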

It could be that the Intel docs are incomplete on VMX; I didn't hear back from
them on the last issue I found. Anyway, we are not addressing an Intel erratum
here, but fixing our own bug of not setting the EFER.NX bit.

-Mukesh


[V1 PATCH 1/2] PVH: set EFER.NX and EFER.SCE for boot vcpu

2014-08-27 Thread Mukesh Rathor
This patch addresses three things for a pvh boot vcpu:

  - NX bug on intel: It was recently discovered that NX is not being
honored in PVH on intel since EFER.NX is not being set. The pte.NX
bits are ignored if EFER.NX is not set on intel.

  - PVH boot hang on newer xen:  Following c/s on xen

c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest

removes setting of EFER.SCE for PVH guests. As such, existing intel pvh
guest will no longer boot on xen after that c/s.

  - Both above changes will be applicable to AMD also when xen support of
AMD pvh is added.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..4af512d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1499,6 +1499,17 @@ void __ref xen_pvh_secondary_vcpu_init(int cpu)
xen_pvh_set_cr_flags(cpu);
 }
 
+/* This is done in secondary_startup_64 for hvm guests. */
+static void __init xen_configure_efer(void)
+{
+   u64 efer;
+
+   rdmsrl(MSR_EFER, efer);
+   efer |= EFER_SCE;
+   efer |= (cpuid_edx(0x80000001) & (1 << 20)) ? EFER_NX : 0;
+   wrmsrl(MSR_EFER, efer);
+}
+
 static void __init xen_pvh_early_guest_init(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap))
@@ -1508,6 +1519,7 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+   xen_configure_efer();
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
-- 
1.8.3.1



[V1 PATCH 0/2] Linux PVH: set EFER bits..

2014-08-27 Thread Mukesh Rathor
Resending with comments fixed up. Please note, these are no longer
AMD only, but address existing broken boot and broken NX on intel.

thanks
mukesh



[V1 PATCH 2/2] PVH: set EFER.NX and EFER.SCE for secondary vcpus

2014-08-27 Thread Mukesh Rathor
This patch addresses three things for a pvh secondary vcpu:

  - NX bug on intel: It was recently discovered that NX is not being
honored in PVH on intel since EFER.NX is not being set. The pte.NX
bits are ignored if EFER.NX is not set on intel.

  - PVH boot hang on newer xen:  Following c/s on xen

c/s 7645640:  x86/PVH: don't set EFER_SCE for pvh guest

removes setting of EFER.SCE for PVH guests. As such, existing intel pvh
guest will no longer boot on xen after that c/s.

  - Both above changes will be applicable to AMD also when xen support of
AMD pvh is added.

Please note: We create a new glue assembly entry point because the
secondary vcpus come up on kernel page tables that have pte.NX
bits set. While on Intel these are ignored if EFER.NX is not set, on
AMD a RSVD bit fault is generated.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/smp.c  | 28 
 arch/x86/xen/smp.h  |  1 +
 arch/x86/xen/xen-head.S | 21 +
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..66058b9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include 
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,8 +100,12 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -416,12 +420,20 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
 #ifdef CONFIG_X86_32
}
 #else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+   } else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set on AMD. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
+*/
+   ctxt->user_regs.eip = (unsigned long)pvh_cpu_bringup;
+
+   /* N.B. The bringup function cpu_bringup_and_idle is called with
+* %rdi having the cpu number - which means we are passing it in
+* as the first parameter. Subtle!
 */
ctxt->user_regs.rdi = cpu;
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..b20ba68 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -7,5 +7,6 @@ extern void xen_send_IPI_mask_allbutself(const struct cpumask 
*mask,
 extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
+extern void pvh_cpu_bringup(int cpu);
 
 #endif
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 485b695..db8dca5 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -47,6 +47,27 @@ ENTRY(startup_xen)
 
__FINIT
 
+#ifdef CONFIG_XEN_PVH
+#ifdef CONFIG_X86_64
+/* Note that rdi contains the cpu number and must be preserved */
+ENTRY(pvh_cpu_bringup)
+   /* Gather features to see if NX implemented. (no EFER.NX on intel) */
+   movl    $0x80000001, %eax
+   cpuid
+   movl    %edx, %esi
+
+   movl    $MSR_EFER, %ecx
+   rdmsr
+   btsl    $_EFER_SCE, %eax
+
+   btl     $20, %esi
+   jnc     1f  /* No NX, skip it */
+   btsl    $_EFER_NX, %eax
+1: wrmsr
+   jmp     cpu_bringup_and_idle
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_XEN_PVH */

Re: [Xen-devel] [V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 12:09:27 -0700
Mukesh Rathor  wrote:

> On Fri, 22 Aug 2014 06:41:40 +0200
> Borislav Petkov  wrote:
> 
> > On Thu, Aug 21, 2014 at 07:46:56PM -0700, Mukesh Rathor wrote:
> > > Intel doesn't have EFER.NX bit.
> > 
> > Of course it does.
> > 
> 
> Right, it does. Some code/comment is misleading... Anyways, reading
> intel SDMs, if I understand the convoluted text correctly, EFER.NX is
> not required to be set for l1.nx to be set, thus allowing for page
> level protection. Where as on AMD, EFER.NX must be set for l1.nx to
> be used. So, in the end, this patch would apply to both amd/intel 
> 
> I'll reword and submit.

Err, let me try again. The section "4.1.1 Three Paging Modes" says:

"Execute-disable access rights are applied only if IA32_EFER.NXE = 1"

So, I guess NX is broken on Intel PVH because EFER.NX is currently
not being set. While AMD will raise a reserved-bit page fault if l1.NX is set
and EFER.NX is not, I guess Intel just ignores the l1.XD bit if EFER.NX is not set.

Mukesh


Re: [V0 PATCH 0/2] AMD PVH domU support

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 14:52:41 +0100
David Vrabel  wrote:

> On 21/08/14 03:16, Mukesh Rathor wrote:
> > Hi,
> > 
> > Here's first stab at AMD PVH domU support. Pretty much the only
> > thing needed is EFER bits set. Please review.
> 
> I'm not going to accept this until there is some ABI documentation
> stating explicitly what state non-boot CPUs will be in.
> 
> I'm particularly concerned that: a) there is a difference between AMD
> and Intel; and b) you want to change the ABI by clearing a the
> EFER.SCE bit.

Correct, I realize it changes the ABI, but I believe that is the right
thing to do while we still can, especially since we need to fix EFER for
NX anyway. Looking at the code, it appears this would be the final
cleanup for this ABI... :)..

However, if that's not possible, I suppose we can just leave the SCE bit
as is.

thanks
Mukesh


Re: [V0 PATCH 0/2] AMD PVH domU support

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 14:55:21 +0100
David Vrabel  wrote:

> On 22/08/14 14:52, David Vrabel wrote:
> > On 21/08/14 03:16, Mukesh Rathor wrote:
> >> Hi,
> >>
> >> Here's first stab at AMD PVH domU support. Pretty much the only
> >> thing needed is EFER bits set. Please review.
> > 
> > I'm not going to accept this until there is some ABI documentation
> > stating explicitly what state non-boot CPUs will be in.
> 
> Also the boot CPU.
> 
> David

Sure, but it looks like Roger already beat me to it...

From Roger's "very initial PVH design document":

And finally on `EFER` the following features are enabled:

  * LME (bit 8): Long mode enable.
  * LMA (bit 10): Long mode active.


LMK if anything additional needs to be done.

Mukesh


Re: [Xen-devel] [V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-22 Thread Mukesh Rathor
On Fri, 22 Aug 2014 06:41:40 +0200
Borislav Petkov  wrote:

> On Thu, Aug 21, 2014 at 07:46:56PM -0700, Mukesh Rathor wrote:
> > Intel doesn't have EFER.NX bit.
> 
> Of course it does.
> 

Right, it does. Some code/comment is misleading... Anyway, reading the
Intel SDMs, if I understand the convoluted text correctly, EFER.NX is
not required to be set for l1.nx to be set, thus allowing for page-level
protection. Whereas on AMD, EFER.NX must be set for l1.nx to be used.
So, in the end, this patch would apply to both AMD and Intel.

I'll reword and submit.

Thanks,
Mukesh



Re: [Xen-devel] [V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-21 Thread Mukesh Rathor
On Thu, 21 Aug 2014 21:39:04 -0400
Konrad Rzeszutek Wilk  wrote:

> On Wed, Aug 20, 2014 at 07:16:39PM -0700, Mukesh Rathor wrote:
> > On AMD, NX feature must be enabled in the efer for NX to be honored
> > in the pte entries, otherwise protection fault. We also set SC for
> > system calls to be enabled.
> 
> How come we don't need to do that for Intel (that is set the NX bit)?
> Could you include the explanation here please?

Intel doesn't have EFER.NX bit. The SCE bit is being set in xen, but it
doesn't need to be, and I'm going to submit a patch to undo it.

> 
> > 
> > Signed-off-by: Mukesh Rathor 
> > ---
> >  arch/x86/xen/enlighten.c | 12 
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index c0cb11f..4af512d 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1499,6 +1499,17 @@ void __ref xen_pvh_secondary_vcpu_init(int
> > cpu) xen_pvh_set_cr_flags(cpu);
> >  }
> >  
> > +/* This is done in secondary_startup_64 for hvm guests. */
> > +static void __init xen_configure_efer(void)
> > +{
> > +   u64 efer;
> > +
> > +   rdmsrl(MSR_EFER, efer);
> > +   efer |= EFER_SCE;
> > +   efer |= (cpuid_edx(0x80000001) & (1 << 20)) ? EFER_NX : 0;
> 
> Ahem? #defines for these magic values please?

Linux uses these values directly all over the code since they are pretty much
set in stone, and I didn't find any #defines. See cpu/common.c for one of
the places; also see secondary_startup_64, among others...

> Or could you use 'boot_cpu_has'?

Nope, it's not initialized at this point.
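
Purely as an illustration of what named constants could look like (the macro
names below are made up; the posted patch keeps the raw values since
boot_cpu_data isn't populated this early):

/* Illustrative only -- hypothetical macro names, same logic as the patch. */
#define CPUID_EXT_FEATURES_LEAF	0x80000001
#define CPUID_EDX_NX_BIT	(1 << 20)

static void __init xen_configure_efer_sketch(void)
{
	u64 efer;

	rdmsrl(MSR_EFER, efer);
	efer |= EFER_SCE;
	if (cpuid_edx(CPUID_EXT_FEATURES_LEAF) & CPUID_EDX_NX_BIT)
		efer |= EFER_NX;
	wrmsrl(MSR_EFER, efer);
}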

thanks,
Mukesh


[V0 PATCH 1/2] AMD-PVH: set EFER.NX and EFER.SCE for the boot vcpu

2014-08-20 Thread Mukesh Rathor
On AMD, the NX feature must be enabled in EFER for NX to be honored in
the pte entries; otherwise a protection fault occurs. We also set SCE so
that system calls are enabled.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..4af512d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1499,6 +1499,17 @@ void __ref xen_pvh_secondary_vcpu_init(int cpu)
xen_pvh_set_cr_flags(cpu);
 }
 
+/* This is done in secondary_startup_64 for hvm guests. */
+static void __init xen_configure_efer(void)
+{
+   u64 efer;
+
+   rdmsrl(MSR_EFER, efer);
+   efer |= EFER_SCE;
+   efer |= (cpuid_edx(0x80000001) & (1 << 20)) ? EFER_NX : 0;
+   wrmsrl(MSR_EFER, efer);
+}
+
 static void __init xen_pvh_early_guest_init(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap))
@@ -1508,6 +1519,7 @@ static void __init xen_pvh_early_guest_init(void)
return;
 
xen_have_vector_callback = 1;
+   xen_configure_efer();
xen_pvh_set_cr_flags(0);
 
 #ifdef CONFIG_X86_32
-- 
1.8.3.1



[V0 PATCH 0/2] AMD PVH domU support

2014-08-20 Thread Mukesh Rathor
Hi,

Here's a first stab at AMD PVH domU support. Pretty much the only thing
needed is setting the EFER bits. Please review.

thanks,
Mukesh




[V0 PATCH 2/2] AMD-PVH: set EFER.NX and EFER.SCE for secondary vcpus

2014-08-20 Thread Mukesh Rathor
The secondary vcpus come on kernel page tables which have the NX bit set
in pte entries for DS/SS. On AMD, EFER.NX must be set to avoid protection
fault.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/smp.c  | 28 
 arch/x86/xen/smp.h  |  1 +
 arch/x86/xen/xen-head.S | 21 +
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..66058b9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -37,6 +37,7 @@
 #include 
 #include "xen-ops.h"
 #include "mmu.h"
+#include "smp.h"
 
 cpumask_var_t xen_cpu_initialized_map;
 
@@ -99,8 +100,12 @@ static void cpu_bringup(void)
wmb();  /* make sure everything is out */
 }
 
-/* Note: cpu parameter is only relevant for PVH */
-static void cpu_bringup_and_idle(int cpu)
+/*
+ * Note: cpu parameter is only relevant for PVH. The reason for passing it
+ * is we can't do smp_processor_id until the percpu segments are loaded, for
+ * which we need the cpu number! So we pass it in rdi as first parameter.
+ */
+asmlinkage __visible void cpu_bringup_and_idle(int cpu)
 {
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
@@ -374,11 +379,10 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
ctxt->user_regs.fs = __KERNEL_PERCPU;
ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-
memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
@@ -416,12 +420,20 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
 #ifdef CONFIG_X86_32
}
 #else
-   } else
-   /* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
-* %rdi having the cpu number - which means are passing in
-* as the first parameter the cpu. Subtle!
+   } else {
+   /*
+* The vcpu comes on kernel page tables which have the NX pte
+* bit set on AMD. This means before DS/SS is touched, NX in
+* EFER must be set. Hence the following assembly glue code.
+*/
+   ctxt->user_regs.eip = (unsigned long)pvh_cpu_bringup;
+
+   /* N.B. The bringup function cpu_bringup_and_idle is called with
+* %rdi having the cpu number - which means we are passing it in
+* as the first parameter. Subtle!
 */
ctxt->user_regs.rdi = cpu;
+   }
 #endif
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..b20ba68 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -7,5 +7,6 @@ extern void xen_send_IPI_mask_allbutself(const struct cpumask 
*mask,
 extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
+extern void pvh_cpu_bringup(int cpu);
 
 #endif
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 485b695..db8dca5 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -47,6 +47,27 @@ ENTRY(startup_xen)
 
__FINIT
 
+#ifdef CONFIG_XEN_PVH
+#ifdef CONFIG_X86_64
+/* Note that rdi contains the cpu number and must be preserved */
+ENTRY(pvh_cpu_bringup)
+   /* Gather features to see if NX implemented. (no EFER.NX on intel) */
+   movl    $0x80000001, %eax
+   cpuid
+   movl    %edx, %esi
+
+   movl    $MSR_EFER, %ecx
+   rdmsr
+   btsl    $_EFER_SCE, %eax
+
+   btl     $20, %esi
+   jnc     1f  /* No NX, skip it */
+   btsl    $_EFER_NX, %eax
+1: wrmsr
+   jmp     cpu_bringup_and_idle
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_XEN_PVH */
+
 .pushsection .text
.balign PAGE_SIZE
 ENTRY(hypercall_page)
-- 
1.8.3.1



Re: [Xen-devel] [V1 PATCH] dom0 pvh: map foreign pfns in our p2m for toolstack

2014-05-27 Thread Mukesh Rathor
On Tue, 27 May 2014 11:59:26 +0100
David Vrabel  wrote:

> On 27/05/14 11:43, Roger Pau Monné wrote:
> > On 24/05/14 03:33, Mukesh Rathor wrote:
> >> When running as dom0 in pvh mode, foreign pfns that are accessed
> >> must be added to our p2m which is managed by xen. This is done via
> >> XENMEM_add_to_physmap_range hypercall. This is needed for toolstack
> >> building guests and mapping guest memory, xentrace mapping xen
> >> pages, etc..
> 
> Thanks.
> 
> Applied to devel/for-linus-3.16, but see comments below.
> 
> >> +static int xlate_add_to_p2m(unsigned long lpfn, unsigned long
> >> fgmfn,
> >> +  unsigned int domid)
> 
> The preferred abbreviation is GFN not GMFN.  I fixed this up.
> 
> >> +{
> >> +  int rc, err = 0;
> >> +  xen_pfn_t gpfn = lpfn;
> >> +  xen_ulong_t idx = fgmfn;
> >> +
> >> +  struct xen_add_to_physmap_range xatp = {
> >> +  .domid = DOMID_SELF,
> >> +  .foreign_domid = domid,
> >> +  .size = 1,
> >> +  .space = XENMAPSPACE_gmfn_foreign,
> >> +  };
> >> +  set_xen_guest_handle(xatp.idxs, &idx);
> >> +  set_xen_guest_handle(xatp.gpfns, &gpfn);
> >> +  set_xen_guest_handle(xatp.errs, &err);
> >> +
> >> +  rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range,
> >> &xatp);
> >> +  return rc;
> > 
> > Thanks for the patches, I see two problems with this approach, the
> > first one is that you are completely ignoring the error in the
> > variable "err", which means that you can end up with a pfn that
> > Linux thinks it's valid, but it's not mapped to any mfn, so when
> > you try to access it you will trigger the vioapic crash.
> 
> I spotted this and fixed this up by adding:
> 
> +   if (rc < 0)
> +   return rc;
> +   return err;

Thanks a lot.

> > The second one is that this seems extremely inefficient, you are
> > issuing one hypercall for each memory page, when you could instead
> > batch all the pages into a single hypercall and map them in one
> > shot.
> 
> I agree, but the 3.16 merge window is nearly here so I've applied it
> as-is.  Note that the privcmd driver calls this function once per
> page, so the lack of batching doesn't really hurt here.

Thanks again, a pleasure working with a maintainer like you!

Mukesh
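
For reference, a batched variant along the lines Roger suggests might look
roughly like this (a hedged sketch, not the applied patch; per-page status
comes back in errs[] and error handling is left to the caller):

/* Hedged sketch: map 'count' foreign gfns with a single hypercall. */
static int xlate_add_to_p2m_batch(xen_pfn_t *gpfns, xen_ulong_t *idxs,
				  int *errs, uint16_t count,
				  unsigned int domid)
{
	struct xen_add_to_physmap_range xatp = {
		.domid = DOMID_SELF,
		.foreign_domid = domid,
		.size = count,
		.space = XENMAPSPACE_gmfn_foreign,
	};

	set_xen_guest_handle(xatp.idxs, idxs);
	set_xen_guest_handle(xatp.gpfns, gpfns);
	set_xen_guest_handle(xatp.errs, errs);

	return HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
}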



[V1 PATCH] dom0 pvh linux support

2014-05-23 Thread Mukesh Rathor
Hi,

Attached please find a patch for Linux to support the toolstack on PVH dom0.

thanks, 
Mukesh




[V1 PATCH] dom0 pvh: map foreign pfns in our p2m for toolstack

2014-05-23 Thread Mukesh Rathor
When running as dom0 in PVH mode, foreign pfns that are accessed must be
added to our p2m, which is managed by xen. This is done via the
XENMEM_add_to_physmap_range hypercall. This is needed for the toolstack
building guests and mapping guest memory, for xentrace mapping xen pages,
etc.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/mmu.c | 115 +++--
 1 file changed, 112 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 86e02ea..8efc066 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2510,6 +2510,93 @@ void __init xen_hvm_init_mmu_ops(void)
 }
 #endif
 
+#ifdef CONFIG_XEN_PVH
+/*
+ * Map foreign gmfn, fgmfn, to local pfn, lpfn. This is for user space
+ * creating a new guest on pvh dom0 and needing to map domU pages.
+ */
+static int xlate_add_to_p2m(unsigned long lpfn, unsigned long fgmfn,
+   unsigned int domid)
+{
+   int rc, err = 0;
+   xen_pfn_t gpfn = lpfn;
+   xen_ulong_t idx = fgmfn;
+
+   struct xen_add_to_physmap_range xatp = {
+   .domid = DOMID_SELF,
+   .foreign_domid = domid,
+   .size = 1,
+   .space = XENMAPSPACE_gmfn_foreign,
+   };
+   set_xen_guest_handle(xatp.idxs, &idx);
+   set_xen_guest_handle(xatp.gpfns, &gpfn);
+   set_xen_guest_handle(xatp.errs, &err);
+
+   rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
+   return rc;
+}
+
+static int xlate_remove_from_p2m(unsigned long spfn, int count)
+{
+   struct xen_remove_from_physmap xrp;
+   int i, rc;
+
+   for (i = 0; i < count; i++) {
+   xrp.domid = DOMID_SELF;
+   xrp.gpfn = spfn+i;
+   rc = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
+   if (rc)
+   break;
+   }
+   return rc;
+}
+
+struct xlate_remap_data {
+   unsigned long fgmfn; /* foreign domain's gmfn */
+   pgprot_t prot;
+   domid_t  domid;
+   int index;
+   struct page **pages;
+};
+
+static int xlate_map_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
+   void *data)
+{
+   int rc;
+   struct xlate_remap_data *remap = data;
+   unsigned long pfn = page_to_pfn(remap->pages[remap->index++]);
+   pte_t pteval = pte_mkspecial(pfn_pte(pfn, remap->prot));
+
+   rc = xlate_add_to_p2m(pfn, remap->fgmfn, remap->domid);
+   if (rc)
+   return rc;
+   native_set_pte(ptep, pteval);
+
+   return 0;
+}
+
+static int xlate_remap_gmfn_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long mfn,
+ int nr, pgprot_t prot, unsigned domid,
+ struct page **pages)
+{
+   int err;
+   struct xlate_remap_data pvhdata;
+
+   BUG_ON(!pages);
+
+   pvhdata.fgmfn = mfn;
+   pvhdata.prot = prot;
+   pvhdata.domid = domid;
+   pvhdata.index = 0;
+   pvhdata.pages = pages;
+   err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
+ xlate_map_pte_fn, &pvhdata);
+   flush_tlb_all();
+   return err;
+}
+#endif
+
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
@@ -2544,13 +2631,20 @@ int xen_remap_domain_mfn_range(struct vm_area_struct 
*vma,
unsigned long range;
int err = 0;
 
-   if (xen_feature(XENFEAT_auto_translated_physmap))
-   return -EINVAL;
-
prot = __pgprot(pgprot_val(prot) | _PAGE_IOMAP);
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+#ifdef CONFIG_XEN_PVH
+   /* We need to update the local page tables and the xen HAP */
+   return xlate_remap_gmfn_range(vma, addr, mfn, nr, prot,
+ domid, pages);
+#else
+   return -EINVAL;
+#endif
+   }
+
rmd.mfn = mfn;
rmd.prot = prot;
 
@@ -2588,6 +2682,21 @@ int xen_unmap_domain_mfn_range(struct vm_area_struct 
*vma,
if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
return 0;
 
+#ifdef CONFIG_XEN_PVH
+   while (numpgs--) {
+
+   /* The mmu has already cleaned up the process mmu resources at
+* this point (lookup_address will return NULL). */
+   unsigned long pfn = page_to_pfn(pages[numpgs]);
+
+   xlate_remove_from_p2m(pfn, 1);
+   }
+   /* We don't need to flush tlbs because as part of xlate_remove_from_p2m,
+* the hypervisor will do tlb flushes after removing the p2m entries
+* from the EPT/NPT */
+   return 0;
+#else
return -EINVAL;
+#endif
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_mfn_range);
-- 
1.8.3.1


Re: [PATCH] pvh: set cr4 flags for APs

2014-02-03 Thread Mukesh Rathor
On Mon, 3 Feb 2014 15:43:46 -0500
Konrad Rzeszutek Wilk  wrote:

> On Mon, Feb 03, 2014 at 02:52:40PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Feb 03, 2014 at 11:30:01AM -0800, Mukesh Rathor wrote:
> > > On Mon, 3 Feb 2014 06:49:14 -0500
> > > Konrad Rzeszutek Wilk  wrote:
> > > 
> > > > On Wed, Jan 29, 2014 at 04:15:18PM -0800, Mukesh Rathor wrote:
> > > > > We need to set cr4 flags for APs that are already set for BSP.
> > > > 
> > > > The title is missing the 'xen' part.
> > > 
> > > The patch is for linux, not xen.
> > 
> > Right. And hence you need to prefix the title with 'xen' in it
> > otherwise it won't be obvious from the Linux log line for what
> > component of the Linux tree it is.
> > 
> > > 
> > > > I rewrote it a bit and I think this should go in 3.14.
> > > > 
> > > > David, Boris: It is not the full fix as there are other parts to
> > > > make an PVH guest use 2MB or 1GB pages- but this fixes an
> > > > obvious bug.
> > > > 
> > > > 
> > > > 
> > > > From 797ea6812ff0a90cce966a4ff6bad57cbadc43b5 Mon Sep 17
> > > > 00:00:00 2001 From: Mukesh Rathor 
> > > > Date: Wed, 29 Jan 2014 16:15:18 -0800
> > > > Subject: [PATCH] xen/pvh: set CR4 flags for APs
> > > > 
> > > > The Xen ABI sets said flags for the BSP, but it does
> > > 
> > > NO it does not. I said it few times, it's set by
> > > probe_page_size_mask (which is in linux) for the BSP. The comment
> > > below also says it.
> > 
> > Where does it set it for APs? Can we piggyback on that?
> 
> And since I am in a hurry to fix a build regression I did the
> research myself - but this kind of information needs to be in the
> commit message.
> 
> Here is what I have, please comment as I want to send a git pull to
> Linux within the hour.
> 
> From 125ef07fd58e963cc286554f6536e46c9712033c Mon Sep 17 00:00:00 2001
> From: Mukesh Rathor 
> Date: Wed, 29 Jan 2014 16:15:18 -0800
> Subject: [PATCH] xen/pvh: set CR4 flags for APs
> 
> During bootup these CR4 flags are set in 'probe_page_size_mask'.
> But for AP processors they are not set, as we do not use
> 'secondary_startup_64', which the baremetal kernel uses. Instead,
> do it in this function, which we use in Xen PVH during our
> startup for both AP and BSP processors.
> 
> As such, fix it up to make sure we have those flags set.

That's good enough for me.

Mukesh


> Signed-off-by: Mukesh Rathor 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/xen/enlighten.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a4d7b64..201d09a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1473,6 +1473,18 @@ static void xen_pvh_set_cr_flags(int cpu)
>* X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM guests
>* (which PVH shared codepaths), while X86_CR0_PG is for PVH. */
> write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM);
> +
> + if (!cpu)
> + return;
> + /*
> +  * For BSP, PSE PGE are set in probe_page_size_mask(), for APs
> +  * set them here. For all, OSFXSR OSXMMEXCPT are set in fpu_init.
> +  */
> + if (cpu_has_pse)
> + set_in_cr4(X86_CR4_PSE);
> +
> + if (cpu_has_pge)
> + set_in_cr4(X86_CR4_PGE);
>  }
>  
>  /*



Re: [PATCH] pvh: set cr4 flags for APs

2014-02-03 Thread Mukesh Rathor
On Mon, 3 Feb 2014 06:49:14 -0500
Konrad Rzeszutek Wilk  wrote:

> On Wed, Jan 29, 2014 at 04:15:18PM -0800, Mukesh Rathor wrote:
> > We need to set cr4 flags for APs that are already set for BSP.
> 
> The title is missing the 'xen' part.

The patch is for linux, not xen.

> I rewrote it a bit and I think this should go in 3.14.
> 
> David, Boris: It is not the full fix as there are other parts to
> make an PVH guest use 2MB or 1GB pages- but this fixes an obvious
> bug.
> 
> 
> 
> From 797ea6812ff0a90cce966a4ff6bad57cbadc43b5 Mon Sep 17 00:00:00 2001
> From: Mukesh Rathor 
> Date: Wed, 29 Jan 2014 16:15:18 -0800
> Subject: [PATCH] xen/pvh: set CR4 flags for APs
> 
> The Xen ABI sets said flags for the BSP, but it does

NO it does not. I said it a few times: it's set by probe_page_size_mask()
(which is in Linux) for the BSP. The comment below also says so.

thanks
mukesh

> not do that for the CR4. As such fix it up to make
> sure we have that flag set.
> 
> Signed-off-by: Mukesh Rathor 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/xen/enlighten.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a4d7b64..201d09a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1473,6 +1473,18 @@ static void xen_pvh_set_cr_flags(int cpu)
>* X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM guests
>* (which PVH shared codepaths), while X86_CR0_PG is for PVH. */
> write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM);
> +
> + if (!cpu)
> + return;
> + /*
> +  * For BSP, PSE PGE are set in probe_page_size_mask(), for APs
> +  * set them here. For all, OSFXSR OSXMMEXCPT are set in fpu_init.
> +  */
> + if (cpu_has_pse)
> + set_in_cr4(X86_CR4_PSE);
> +
> + if (cpu_has_pge)
> + set_in_cr4(X86_CR4_PGE);
>  }
>  
>  /*



Re: [PATCH V0] linux PVH: Set CR4 flags

2014-01-30 Thread Mukesh Rathor
On Thu, 30 Jan 2014 11:40:44 +
Roger Pau Monné  wrote:

> On 30/01/14 00:15, Mukesh Rathor wrote:
> > Konrad,
> > 
> > The CR4 settings were dropped from my earlier patch because you
> > didn't wanna enable them. But since you do now, we need to set them
> > in the APs also. If you decide not too again, please apply my prev
> > patch "pvh: disable pse feature for now".
> 
> Hello Mukesh,
> 
> Could you push your CR related patches to a git repo branch? I'm
> currently having a bit of a mess in figuring out which ones should be
> applied and in which order.
> 
> Thanks, Roger.

Hey Roger,

Unfortunately, I don't have them in a tree because my first patch was
changed during the merge, and the tree was also refreshed. Basically, the end
result is that we leave the features enabled on the Linux side, thus setting
not only the CR0 bits but also the CR4 PSE and PGE bits for the APs (they
were already set for the BSP).

Konrad only merged the CR0 setting part of my first patch, hence this 
patch to set the CR4 bits. Hope that makes sense. My latest tree is:

http://oss.us.oracle.com/git/mrathor/linux.git  muk2

thanks
mukesh


[PATCH] pvh: set cr4 flags for APs

2014-01-29 Thread Mukesh Rathor
We need to set cr4 flags for APs that are already set for BSP.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a4d7b64..201d09a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1473,6 +1473,18 @@ static void xen_pvh_set_cr_flags(int cpu)
 * X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM guests
 * (which PVH shared codepaths), while X86_CR0_PG is for PVH. */
write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM);
+
+   if (!cpu)
+   return;
+   /*
+* For BSP, PSE PGE are set in probe_page_size_mask(), for APs
+* set them here. For all, OSFXSR OSXMMEXCPT are set in fpu_init.
+   */
+   if (cpu_has_pse)
+   set_in_cr4(X86_CR4_PSE);
+
+   if (cpu_has_pge)
+   set_in_cr4(X86_CR4_PGE);
 }
 
 /*
-- 
1.7.2.3



[PATCH V0] linux PVH: Set CR4 flags

2014-01-29 Thread Mukesh Rathor
Konrad,

The CR4 settings were dropped from my earlier patch because you didn't
want to enable them. But since you do now, we need to set them on the APs
as well. If you decide not to again, please apply my previous patch
"pvh: disable pse feature for now".

thanks
Mukesh



Re: [Xen-devel] [V0 PATCH] pvh: Disable PSE feature for now

2014-01-28 Thread Mukesh Rathor
On Tue, 28 Jan 2014 10:39:23 +
"Jan Beulich"  wrote:

> >>> On 28.01.14 at 03:18, Mukesh Rathor 
> >>> wrote:
> > Until now, xen did not expose PSE to pvh guest, but a patch was
> > submitted to xen list to enable bunch of features for a pvh guest.
> > PSE has not been looked into for PVH, so until we can do that and
> > test it to make sure it works, disable the feature to avoid flood
> > of bugs.
> > 
> > Signed-off-by: Mukesh Rathor 
> > ---
> >  arch/x86/xen/enlighten.c |5 +
> >  1 files changed, 5 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index a4d7b64..4e952046 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1497,6 +1497,11 @@ static void __init
> > xen_pvh_early_guest_init(void) xen_have_vector_callback = 1;
> > xen_pvh_set_cr_flags(0);
> >  
> > +/* pvh guests are not quite ready for large pages yet */
> > +setup_clear_cpu_cap(X86_FEATURE_PSE);
> > +setup_clear_cpu_cap(X86_FEATURE_PSE36);
> 
> And why would you not want to also turn of 1Gb pages then?

Right, that should be turned off too, but Konrad thinks we should
leave them on in Linux and deal with issues as they come. I've not
tested them, or looked/thought about them, so I thought it would be
better to turn them on only after I (or someone) gets to test them.

thanks
Mukesh


Re: [V0 PATCH] pvh: Disable PSE feature for now

2014-01-28 Thread Mukesh Rathor
On Mon, 27 Jan 2014 22:46:34 -0500
Konrad Rzeszutek Wilk  wrote:

> On Mon, Jan 27, 2014 at 06:18:39PM -0800, Mukesh Rathor wrote:
> > Until now, xen did not expose PSE to pvh guest, but a patch was
> > submitted to xen list to enable bunch of features for a pvh guest.
> > PSE has not been
> 
> Which 'patch'?
> 
> > looked into for PVH, so until we can do that and test it to make
> > sure it works, disable the feature to avoid flood of bugs.
> 
> I think we want a flood of bugs, no?

OK, but let's document (via this email :)) that they are not tested.

thanks
Mukesh


[V0 PATCH] pvh: Disable PSE feature for now

2014-01-27 Thread Mukesh Rathor
Until now, xen did not expose PSE to PVH guests, but a patch was submitted
to the xen list to enable a bunch of features for a PVH guest. PSE has not
been looked into for PVH, so until we can do that and test it to make sure
it works, disable the feature to avoid a flood of bugs.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a4d7b64..4e952046 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1497,6 +1497,11 @@ static void __init xen_pvh_early_guest_init(void)
xen_have_vector_callback = 1;
xen_pvh_set_cr_flags(0);
 
+/* pvh guests are not quite ready for large pages yet */
+setup_clear_cpu_cap(X86_FEATURE_PSE);
+setup_clear_cpu_cap(X86_FEATURE_PSE36);
+
+
 #ifdef CONFIG_X86_32
BUG(); /* PVH: Implement proper support. */
 #endif
-- 
1.7.2.3



pvh: disable pse feature for now

2014-01-27 Thread Mukesh Rathor
Konrad,

The following will turn off PSE in Linux until we can get to it. It's better
to turn it off here than in xen, so that if BSD gets there sooner, they are
not dependent on us.

thanks
Mukesh



Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-22 Thread Mukesh Rathor
On Mon, 20 Jan 2014 10:09:30 -0500
Konrad Rzeszutek Wilk  wrote:

> On Fri, Jan 17, 2014 at 06:24:55PM -0800, Mukesh Rathor wrote:
> > pvh was designed to start with pv flags, but a commit in xen tree
> 
> Thank you for posting this!
> 
> > 51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags
> > as
> 
> You need to always include the title of said commit.
> 
> > they are not necessary. As a result, these CR flags must be set in
> > the guest.
> 
> I sent out replies to this over the weekend but somehow they are not
> showing up.
> 

Well, they finally showed up today... US mail must be slow :)...


> 
> > +
> > +   if (!cpu)
> > +   return;
> 
> And what happens if don't have this check? Will be bad if do multiple
> cr4 writes?

no, but it just confuses the reader/debugger of the code IMO :)... 


> Fyi, this (cr4) should have been a seperate patch. I fixed it up that
> way.
> > +   /*
> > +* Unlike PV, for pvh xen does not set: PSE PGE OSFXSR
> > OSXMMEXCPT
> > +* For BSP, PSE PGE will be set in probe_page_size_mask(),
> > for AP
> > +* set them here. For all, OSFXSR OSXMMEXCPT will be set
> > in fpu_init
> > +*/
> > +   if (cpu_has_pse)
> > +   set_in_cr4(X86_CR4_PSE);
> > +
> > +   if (cpu_has_pge)
> > +   set_in_cr4(X86_CR4_PGE);
> > +}
> 
> Seperate patch and since the PGE part is more complicated that just
> setting the CR4 - you also have to tweak this:
> 
> 1512 /* Prevent unwanted bits from being set in PTEs.
> */ 1513 __supported_pte_mask &=
> ~_PAGE_GLOBAL;  
> 
> I think it should be done once we have actually confirmed that you can
> do 2MB pages within the guest. (might need some more tweaking?)

Umm... well, the above is just setting PSE and PGE in the APs; the
BSP is already doing that in probe_page_size_mask(), and setting
__supported_pte_mask, which needs to be set just once. So, because it's
being set in the BSP, it's already broken/untested if we start exposing
PGE from xen to a linux PVH guest...

IOW, leaving the above does no more harm, or we should 'if (pvh)' the code in
probe_page_size_mask() for PSE, and wait till we can test it...
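By 'if (pvh)' I mean a guard along these lines inside probe_page_size_mask()
(untested, purely illustrative, and it would need xen/xen.h pulled in there):

	/* hypothetical: skip PSE setup until it has been tested on PVH */
	if (cpu_has_pse && !xen_pvh_domain())
		set_in_cr4(X86_CR4_PSE);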

thanks
Mukesh



[V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-17 Thread Mukesh Rathor
Konrad,

The following patch sets the bits in CR0 and CR4. Please note, I'm working
on patch for the xen side. The CR4 features are not currently exported
to a PVH guest. 

Roger, I added your SOB line, please lmk if I need to add anything else.

This patch was build on top of a71accb67e7645c68061cec2bee6067205e439fc in
konrad devel/pvh.v13 branch.

thanks
Mukesh




[V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-17 Thread Mukesh Rathor
pvh was designed to start with pv flags, but a commit in xen tree
51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags as
they are not necessary. As a result, these CR flags must be set in the
guest.

Signed-off-by: Roger Pau Monne 
Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   43 +--
 arch/x86/xen/smp.c   |2 +-
 arch/x86/xen/xen-ops.h   |2 +-
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 628099a..4a2aaa6 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1410,12 +1410,8 @@ static void __init xen_boot_params_init_edd(void)
  * Set up the GDT and segment registers for -fstack-protector.  Until
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
- *
- * Note, that it is refok - because the only caller of this after init
- * is PVH which is not going to use xen_load_gdt_boot or other
- * __init functions.
  */
-void __ref xen_setup_gdt(int cpu)
+static void xen_setup_gdt(int cpu)
 {
if (xen_feature(XENFEAT_auto_translated_physmap)) {
 #ifdef CONFIG_X86_64
@@ -1463,13 +1459,48 @@ void __ref xen_setup_gdt(int cpu)
pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+/*
+ * A pv guest starts with default flags that are not set for pvh, set them
+ * here asap.
+ */
+static void xen_pvh_set_cr_flags(int cpu)
+{
+   write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_WP | X86_CR0_AM);
+
+   if (!cpu)
+   return;
+   /*
+* Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT
+* For BSP, PSE PGE will be set in probe_page_size_mask(), for AP
+* set them here. For all, OSFXSR OSXMMEXCPT will be set in fpu_init
+*/
+   if (cpu_has_pse)
+   set_in_cr4(X86_CR4_PSE);
+
+   if (cpu_has_pge)
+   set_in_cr4(X86_CR4_PGE);
+}
+
+/*
+ * Note, that it is refok - because the only caller of this after init
+ * is PVH which is not going to use xen_load_gdt_boot or other
+ * __init functions.
+ */
+void __ref xen_pvh_secondary_vcpu_init(int cpu)
+{
+   xen_setup_gdt(cpu);
+   xen_pvh_set_cr_flags(cpu);
+}
+
 static void __init xen_pvh_early_guest_init(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap))
return;
 
-   if (xen_feature(XENFEAT_hvm_callback_vector))
+   if (xen_feature(XENFEAT_hvm_callback_vector)) {
xen_have_vector_callback = 1;
+   xen_pvh_set_cr_flags(0);
+   }
 
 #ifdef CONFIG_X86_32
BUG(); /* PVH: Implement proper support. */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 5e46190..a18eadd 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -105,7 +105,7 @@ static void cpu_bringup_and_idle(int cpu)
 #ifdef CONFIG_X86_64
if (xen_feature(XENFEAT_auto_translated_physmap) &&
xen_feature(XENFEAT_supervisor_mode_kernel))
-   xen_setup_gdt(cpu);
+   xen_pvh_secondary_vcpu_init(cpu);
 #endif
cpu_bringup();
cpu_startup_entry(CPUHP_ONLINE);
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9059c24..1cb6f4c 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -123,5 +123,5 @@ __visible void xen_adjust_exception_frame(void);
 
 extern int xen_panic_handler_init(void);
 
-void xen_setup_gdt(int cpu);
+void xen_pvh_secondary_vcpu_init(int cpu);
 #endif /* XEN_OPS_H */
-- 
1.7.2.3



Re: [PATCH v12 10/18] xen/pvh: Update E820 to work with PVH (v2)

2014-01-03 Thread Mukesh Rathor
On Thu, 2 Jan 2014 13:41:34 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 02, 2014 at 04:14:32PM +, David Vrabel wrote:
> > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor 
> > > 
> > > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > > by Xen. PVH maps the entire IO space, but only RAM pages need
> > > to be repopulated.
> > 
> > So this looks minimal but I can't work out what PVH actually needs
> > to do here.  This code really doesn't need to be made any more
> > confusing.
> 
> I gather you prefer Mukesh's original version?

I think, Konrad, that's easier to follow as one can quickly spot
the PVH difference... but your call.

thanks
mukesh



Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).

2014-01-03 Thread Mukesh Rathor
On Fri, 3 Jan 2014 12:35:55 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 02, 2014 at 05:34:38PM -0800, Mukesh Rathor wrote:
> > On Thu, 2 Jan 2014 13:32:21 -0500
> > Konrad Rzeszutek Wilk  wrote:
> > 
> > > On Thu, Jan 02, 2014 at 03:32:33PM +, David Vrabel wrote:
> > > > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > > > From: Mukesh Rathor 
> > > > > 
> > > > > In the bootup code for PVH we can trap cpuid via vmexit, so
> > > > > don't need to use emulated prefix call. We also check for
> > > > > vector callback early on, as it is a required feature. PVH
> > > > > also runs at default kernel IOPL.
> > > > > 
> > > > > Finally, pure PV settings are moved to a separate function
> > > > > that are only called for pure PV, ie, pv with pvmmu. They are
> > > > > also #ifdef with CONFIG_XEN_PVMMU.
> > > > [...]
> > > > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > > > unsigned int *bx, break;
> > > > >   }
> > > > >  
> > > > > - asm(XEN_EMULATE_PREFIX "cpuid"
> > > > > - : "=a" (*ax),
> > > > > -   "=b" (*bx),
> > > > > -   "=c" (*cx),
> > > > > -   "=d" (*dx)
> > > > > - : "0" (*ax), "2" (*cx));
> > > > > + if (xen_pvh_domain())
> > > > > + native_cpuid(ax, bx, cx, dx);
> > > > > + else
> > > > > + asm(XEN_EMULATE_PREFIX "cpuid"
> > > > > + : "=a" (*ax),
> > > > > + "=b" (*bx),
> > > > > + "=c" (*cx),
> > > > > + "=d" (*dx)
> > > > > + : "0" (*ax), "2" (*cx));
> > > > 
> > > > For this one off cpuid call it seems preferrable to me to use
> > > > the emulate prefix rather than diverge from PV.
> > > 
> > > This was before the PV cpuid was deemed OK to be used on PVH.
> > > Will rip this out to use the same version.
> > 
> > Whats wrong with using native cpuid? That is one of the benefits
> > that cpuid can be trapped via vmexit, and also there is talk of
> > making PV cpuid trap obsolete in the future. I suggest leaving it
> > native.
> 
> I chatted with David, Andrew and Roger on IRC about this. I like the
> idea of using xen_cpuid because:
>  1) It filters some of the CPUID flags that guests should not use.
> There is the 'aperfmperf,'x2apic', 'xsave', and whether the MWAIT_LEAF
> should be exposed (so that the ACPI AML code can call the right
> initialization code to use the extended C3 states instead of the
> legacy IOPORT ones). All of that is in xen_cpuid.
>
>  2) It works, while we can concentrate on making 1) work in the
> hypervisor/toolstack.
> 
> Meaning that the future way would be to use the native cpuid and have
> the hypervisor/toolstack setup the proper cpuid. In other words - use
> the xen_cpuid as is until that code for filtering is in the
> hypervisor.
> 
> 
> Except that PVH does not work the PV cpuid at all. I get a triple
> fault. The instruction it fails at is at the 'XEN_EMULATE_PREFIX'.
> 
> Mukesh, can you point me to the patch where the PV cpuid functionality
> is enabled?
> 
> Anyhow, as it stands, I will just use the native cpuid.

I am referring to using the "cpuid" instruction instead of XEN_EMULATE_PREFIX.
cpuid is faster and better long term... there is no benefit to using
XEN_EMULATE_PREFIX IMO. We can look at removing xen_cpuid() altogether for
PVH when/after the pvh 32bit work gets done.

The triple fault seems to be a new bug... I can file a bug, but for
now, using the cpuid instruction, it won't be an issue.

thanks
mukesh



Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.

2014-01-03 Thread Mukesh Rathor
On Wed, 18 Dec 2013 16:17:39 -0500
Konrad Rzeszutek Wilk  wrote:

> On Wed, Dec 18, 2013 at 06:31:43PM +, Stefano Stabellini wrote:
> > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor 
> > > 
> > > PVH is a PV guest with a twist - there are certain things
> > > that work in it like HVM and some like PV. There is
> > > a similar mode - PVHVM where we run in HVM mode with
> > > PV code enabled - and this patch explores that.
> > > 
> > > The most notable PV interfaces are the XenBus and event channels.
> > > For PVH, we will use XenBus and event channels.
> > > 
> > > For the XenBus mechanism we piggyback on how it is done for
> > > PVHVM guests.
> > > 
> > > Ditto for the event channel mechanism - we piggyback on PVHVM -
> > > by setting up a specific vector callback and that
> > > vector ends up calling the event channel mechanism to
> > > dispatch the events as needed.
> > > 
> > > This means that from a pvops perspective, we can use
> > > native_irq_ops instead of the Xen PV specific. Albeit in the
> > > future we could support pirq_eoi_map. But that is
> > > a feature request that can be shared with PVHVM.
> > > 
> > > Signed-off-by: Mukesh Rathor 
> > > Signed-off-by: Konrad Rzeszutek Wilk 
> > > ---
> > >  arch/x86/xen/enlighten.c   | 6 ++
> > >  arch/x86/xen/irq.c | 5 -
> > >  drivers/xen/events.c   | 5 +
> > >  drivers/xen/xenbus/xenbus_client.c | 3 ++-
> > >  4 files changed, 17 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > > index e420613..7fceb51 100644
> > > --- a/arch/x86/xen/enlighten.c
> > > +++ b/arch/x86/xen/enlighten.c
> > > @@ -1134,6 +1134,8 @@ void xen_setup_shared_info(void)
> > >   /* In UP this is as good a place as any to set up shared
> > > info */ xen_setup_vcpu_info_placement();
> > >  #endif
> > > + if (xen_pvh_domain())
> > > + return;
> > >  
> > >   xen_setup_mfn_list_list();
> > >  }
> > 
> > This is another one of those cases where I think we would benefit
> > from introducing xen_setup_shared_info_pvh instead of adding more
> > ifs here.
> 
> Actually this one can be removed.
> 
> > 
> > 
> > > @@ -1146,6 +1148,10 @@ void xen_setup_vcpu_info_placement(void)
> > >   for_each_possible_cpu(cpu)
> > >   xen_vcpu_setup(cpu);
> > >  
> > > + /* PVH always uses native IRQ ops */
> > > + if (xen_pvh_domain())
> > > + return;
> > > +
> > >   /* xen_vcpu_setup managed to place the vcpu_info within
> > > the percpu area for all cpus, so make use of it */
> > >   if (have_vcpu_info_placement) {
> > 
> > Same here?
> 
> Hmmm, I wonder if the vcpu info placement could work with PVH.

It should now (after a patch I sent a while ago)... the comment implies
that PVH uses native IRQs even in the case of vcpu info placement...

perhaps it would be more clear to do:

	for_each_possible_cpu(cpu)
		xen_vcpu_setup(cpu);

	/* PVH always uses native IRQ ops */
	if (have_vcpu_info_placement && !xen_pvh_domain()) {
		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
		.



Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.

2014-01-03 Thread Mukesh Rathor
On Fri, 3 Jan 2014 15:04:27 +
Stefano Stabellini  wrote:

> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > > --- a/drivers/xen/xenbus/xenbus_client.c
> > > > +++ b/drivers/xen/xenbus/xenbus_client.c
> > > > @@ -45,6 +45,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  
> > > >  #include "xenbus_probe.h"
> > > >  
> > > > @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops
> > > > ring_ops_hvm = { 
> > > >  void __init xenbus_ring_ops_init(void)
> > > >  {
> > > > -   if (xen_pv_domain())
> > > > +   if (xen_pv_domain()
> > > > && !xen_feature(XENFEAT_auto_translated_physmap))
> > > 
> > > Can we just change this test to
> > > 
> > > if (!xen_feature(XENFEAT_auto_translated_physmap))
> > > 
> > > ?
> > 
> > No. If we do then the HVM domains (which are also !auto-xlat)
> > will end up using the PV version of ring_ops.
> 
> Actually HVM guests have XENFEAT_auto_translated_physmap, so in this
> case they would get &ring_ops_hvm.

Right. Back then I was confused about all the other PV modes, like
shadow, supervisor, ... but it looks like they are all obsolete. It could
just be:

if (!xen_feature(XENFEAT_auto_translated_physmap))
ring_ops = &ring_ops_pv;
else
ring_ops = &ring_ops_hvm;

thanks,
Mukesh


Re: [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)

2014-01-02 Thread Mukesh Rathor
On Thu, 2 Jan 2014 11:24:50 +
David Vrabel  wrote:

> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor 
> > 
> > .. which are surprinsingly small compared to the amount for PV code.
> > 
> > PVH uses mostly native mmu ops, we leave the generic (native_*) for
> > the majority and just overwrite the baremetal with the ones we need.
> > 
> > We also optimize one - the TLB flush. The native operation would
> > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > Xen one avoids that and lets the hypervisor determine which
> > VCPU needs the TLB flush.
> 
> This TLB flush optimization should be a separate patch.

It's not really an "optimization"; we are using the PV mechanism instead
of the native one because the PV one performs better. So, I think it's ok
for it to belong here.
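For reference, the non-native bit boils down to pointing the cross-CPU
flush at the PV path, roughly like this (names taken from the PV side,
the actual PVH patch may differ):

	/* let xen decide which vcpus really need the TLB flush */
	pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;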

Mukesh



Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).

2014-01-02 Thread Mukesh Rathor
On Thu, 2 Jan 2014 13:32:21 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 02, 2014 at 03:32:33PM +, David Vrabel wrote:
> > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor 
> > > 
> > > In the bootup code for PVH we can trap cpuid via vmexit, so don't
> > > need to use emulated prefix call. We also check for vector
> > > callback early on, as it is a required feature. PVH also runs at
> > > default kernel IOPL.
> > > 
> > > Finally, pure PV settings are moved to a separate function that
> > > are only called for pure PV, ie, pv with pvmmu. They are also
> > > #ifdef with CONFIG_XEN_PVMMU.
> > [...]
> > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > unsigned int *bx, break;
> > >   }
> > >  
> > > - asm(XEN_EMULATE_PREFIX "cpuid"
> > > - : "=a" (*ax),
> > > -   "=b" (*bx),
> > > -   "=c" (*cx),
> > > -   "=d" (*dx)
> > > - : "0" (*ax), "2" (*cx));
> > > + if (xen_pvh_domain())
> > > + native_cpuid(ax, bx, cx, dx);
> > > + else
> > > + asm(XEN_EMULATE_PREFIX "cpuid"
> > > + : "=a" (*ax),
> > > + "=b" (*bx),
> > > + "=c" (*cx),
> > > + "=d" (*dx)
> > > + : "0" (*ax), "2" (*cx));
> > 
> > For this one off cpuid call it seems preferrable to me to use the
> > emulate prefix rather than diverge from PV.
> 
> This was before the PV cpuid was deemed OK to be used on PVH.
> Will rip this out to use the same version.

What's wrong with using native cpuid? One of the benefits is that cpuid
can be trapped via vmexit, and there is also talk of making the PV cpuid
trap obsolete in the future. I suggest leaving it native.

Mukesh



Re: [Xen-devel] [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH

2013-12-18 Thread Mukesh Rathor
On Wed, 18 Dec 2013 18:25:15 +
Stefano Stabellini  wrote:

> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor 
> > 
> > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > by Xen. PVH maps the entire IO space, but only RAM pages need
> > to be repopulated.
> > 
> > Signed-off-by: Mukesh Rathor 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> >  arch/x86/xen/setup.c | 19 +--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c

> > @@ -231,7 +246,7 @@ static void __init
> > xen_set_identity_and_release_chunk( (void)HYPERVISOR_update_va_mapping(
> > (unsigned long)__va(pfn << PAGE_SHIFT),
> > pte, 0); }
> > -
> > +skip:
> > if (start_pfn < nr_pages)
> > *released += xen_release_chunk(
> > start_pfn, min(end_pfn, nr_pages));
... 
> Also considering that you are turning xen_release_chunk into a nop,
> the only purpose of this function on PVH is to call
> set_phys_range_identity. Can't we just do that?

xen_release_chunk() is called for PVH to give us the count of released
pages, although we don't need to release anything for pvh as it was already
done in xen. The released count is then used later to add memory.

I had a separate function to just adjust the stats, which is all we need
to do for pvh; Konrad just merged it with the pv functions.

thanks
mukesh


[PATCH V1]PVH: vcpu info placement, load CS selector, and remove debug printk.

2013-06-05 Thread Mukesh Rathor
This patch addresses 3 things:
   - Resolve vcpu info placement fixme.
   - Load CS selector for PVH after switching to new gdt.
   - Remove printk in case of failure to map pfns in p2m. This is because qemu
 has a lot of expected failures when mapping HVM pages.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   19 +++
 arch/x86/xen/mmu.c   |3 ---
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a7ee39f..d55a578 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1083,14 +1083,12 @@ void xen_setup_shared_info(void)
HYPERVISOR_shared_info =
(struct shared_info *)__va(xen_start_info->shared_info);
 
-   /* PVH TBD/FIXME: vcpu info placement in phase 2 */
-   if (xen_pvh_domain())
-   return;
-
 #ifndef CONFIG_SMP
/* In UP this is as good a place as any to set up shared info */
xen_setup_vcpu_info_placement();
 #endif
+   if (xen_pvh_domain())
+   return;
 
xen_setup_mfn_list_list();
 }
@@ -1103,6 +1101,10 @@ void xen_setup_vcpu_info_placement(void)
for_each_possible_cpu(cpu)
xen_vcpu_setup(cpu);
 
+   /* PVH always uses native IRQ ops */
+   if (xen_pvh_domain())
+   return;
+
/* xen_vcpu_setup managed to place the vcpu_info within the
   percpu area for all cpus, so make use of it */
if (have_vcpu_info_placement) {
@@ -1326,7 +1328,16 @@ static void __init xen_setup_stackprotector(void)
 {
/* PVH TBD/FIXME: investigate setup_stack_canary_segment */
if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   unsigned long dummy;
+
switch_to_new_gdt(0);
+
+   asm volatile ("pushq %0\n"
+ "leaq 1f(%%rip),%0\n"
+ "pushq %0\n"
+ "lretq\n"
+ "1:\n"
+ : "=&r" (dummy) : "0" (__KERNEL_CS));
return;
}
pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 31cc1ef..c104895 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2527,9 +2527,6 @@ static int pvh_add_to_xen_p2m(unsigned long lpfn, 
unsigned long fgmfn,
set_xen_guest_handle(xatp.errs, &err);
 
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   if (rc || err)
-   pr_warn("d0: Failed to map pfn (0x%lx) to mfn (0x%lx) 
rc:%d:%d\n",
-   lpfn, fgmfn, rc, err);
return rc;
 }
 
-- 
1.7.2.3



Re: [Xen-devel] [PATCH] PVH: vcpu info placement, load selectors, and remove debug printk.

2013-06-05 Thread Mukesh Rathor
On Wed, 05 Jun 2013 08:03:12 +0100
"Jan Beulich"  wrote:

> >>> On 04.06.13 at 23:53, Mukesh Rathor 
> >>> wrote:
> > Following OK? :
> > 
> > if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > switch_to_new_gdt(0);
> > 
> > asm volatile (
> > "pushq %%rax\n"
> > "leaq 1f(%%rip),%%rax\n"
> > "pushq %%rax\n"
> > "lretq\n"
> > "1:\n"
> > : : "a" (__KERNEL_CS) : "memory");
> > 
> > return;
> > }
> 
> While generally the choice of using %%rax instead of %0 here is
> a matter of taste to some degree, I still don't see why you can't
> use "r" as the constraint here in the first place.

The compiler mostly picks eax anyways, but good suggestion.

> Furthermore, assuming this sits in a function guaranteed to not be
> inlined, this has a latent bug (and if the assumption isn't right, the
> bug is real) in that the asm() modifies %rax without telling the
> compiler.

According to one of the unofficial asm tutorials I have here, the compiler
knows since it's an input and doesn't need to be told. In fact, it'll barf
if it's added to the clobber list.

> This is how I would have done it:
> 
>   unsigned long dummy;
> 
>   asm volatile ("pushq %0\n"
> "leaq 1f(%%rip),%0\n"
> "pushq %0\n"
> "lretq\n"
> "1:\n"
> : "=&r" (dummy) : "0" (__KERNEL_CS));
> 

Looks good. Thanks,
Mukesh



Re: [Xen-devel] [PATCH] PVH: vcpu info placement, load selectors, and remove debug printk.

2013-06-04 Thread Mukesh Rathor
On Tue, 04 Jun 2013 09:27:03 +0100
"Jan Beulich"  wrote:

> >>> On 04.06.13 at 02:43, Mukesh Rathor 
> >>> wrote:
> > @@ -1327,6 +1329,18 @@ static void __init
> > xen_setup_stackprotector(void) /* PVH TBD/FIXME: investigate
> > setup_stack_canary_segment */ if
> > (xen_feature(XENFEAT_auto_translated_physmap))
> > { switch_to_new_gdt(0); +
> > +   /* xen started us with null selectors. load them
> > now */
> > +   __asm__ __volatile__ (
> > +   "movl %0,%%ds\n"
> > +   "movl %0,%%ss\n"
> > +   "pushq %%rax\n"
> > +   "leaq 1f(%%rip),%%rax\n"
> > +   "pushq %%rax\n"
> > +   "retfq\n"
> > +   "1:\n"
> > +   : : "r" (__KERNEL_DS), "a" (__KERNEL_CS) :
> > "memory"); +
> 
> I can see why you want CS to be reloaded (and CS, other than the
> comment says, clearly hasn't been holding a null selector up to here.
> 
> I can't immediately see why you'd need SS to be other than null, and
> it completely escapes me why you'd need to DS (but not ES) to be
> non-null.
> 
> Furthermore, is there any reason why you use "retfq" (Intel syntax)
> when all assembly code otherwise uses AT&T syntax (the proper
> equivalent here would be "lretq")?
> 
> And finally, please consistently use % (which, once
> fixed, will make clear that the second constraint really can be "r"),
> and avoid using suffixes on moves to/from selector registers
> (which, once fixed, will make clear that at least the first constraint
> really can be relaxed to "rm").

Following OK? :

if (xen_feature(XENFEAT_auto_translated_physmap)) {
switch_to_new_gdt(0);

asm volatile (
"pushq %%rax\n"
"leaq 1f(%%rip),%%rax\n"
"pushq %%rax\n"
"lretq\n"
"1:\n"
: : "a" (__KERNEL_CS) : "memory");

return;
}

thanks,
Mukesh



[PATCH] PVH: vcpu info placement, load selectors, and remove debug printk.

2013-06-03 Thread Mukesh Rathor
This patch addresses 3 things:
   - Resolve vcpu info placement fixme.
   - Load DS/SS/CS selectors for PVH after switching to new gdt.
   - Remove printk in case of failure to map pfns in p2m. This is because qemu
 has a lot of benign failures when mapping HVM pages.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   22 ++
 arch/x86/xen/mmu.c   |3 ---
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a7ee39f..6ff30d8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1083,14 +1083,12 @@ void xen_setup_shared_info(void)
HYPERVISOR_shared_info =
(struct shared_info *)__va(xen_start_info->shared_info);
 
-   /* PVH TBD/FIXME: vcpu info placement in phase 2 */
-   if (xen_pvh_domain())
-   return;
-
 #ifndef CONFIG_SMP
/* In UP this is as good a place as any to set up shared info */
xen_setup_vcpu_info_placement();
 #endif
+   if (xen_pvh_domain())
+   return;
 
xen_setup_mfn_list_list();
 }
@@ -1103,6 +1101,10 @@ void xen_setup_vcpu_info_placement(void)
for_each_possible_cpu(cpu)
xen_vcpu_setup(cpu);
 
+   /* PVH always uses native IRQ ops */
+   if (xen_pvh_domain())
+   return;
+
/* xen_vcpu_setup managed to place the vcpu_info within the
   percpu area for all cpus, so make use of it */
if (have_vcpu_info_placement) {
@@ -1327,6 +1329,18 @@ static void __init xen_setup_stackprotector(void)
/* PVH TBD/FIXME: investigate setup_stack_canary_segment */
if (xen_feature(XENFEAT_auto_translated_physmap)) {
switch_to_new_gdt(0);
+
+   /* xen started us with null selectors. load them now */
+   __asm__ __volatile__ (
+   "movl %0,%%ds\n"
+   "movl %0,%%ss\n"
+   "pushq %%rax\n"
+   "leaq 1f(%%rip),%%rax\n"
+   "pushq %%rax\n"
+   "retfq\n"
+   "1:\n"
+   : : "r" (__KERNEL_DS), "a" (__KERNEL_CS) : "memory");
+
return;
}
pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 31cc1ef..c104895 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2527,9 +2527,6 @@ static int pvh_add_to_xen_p2m(unsigned long lpfn, 
unsigned long fgmfn,
set_xen_guest_handle(xatp.errs, &err);
 
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   if (rc || err)
-   pr_warn("d0: Failed to map pfn (0x%lx) to mfn (0x%lx) 
rc:%d:%d\n",
-   lpfn, fgmfn, rc, err);
return rc;
 }
 
-- 
1.7.2.3



[PATCH]: PVH linux: don't print warning in case of failed mapping

2013-02-15 Thread Mukesh Rathor
Remove the printing of a warning in case of failed mappings. Sometimes
they are expected, as in the case of Qemu mapping pages during HVM guest
creation.

Signed-off-by: Mukesh Rathor 

---
 arch/x86/xen/mmu.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fbf6a63..afa6af6 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2499,9 +2499,6 @@ static int pvh_add_to_xen_p2m(unsigned long lpfn, 
unsigned long fgmfn,
set_xen_guest_handle(xatpr.gpfns, &gpfn);
 
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatpr);
-   if (rc)
-   pr_warn("d0: Failed to map pfn to mfn rc:%d pfn:%lx mfn:%lx\n",
-   rc, lpfn, fgmfn);
return rc;
 }
 
-- 
1.7.2.3



Re: [PATCH] PVH: remove code to map iomem from guest

2013-02-06 Thread Mukesh Rathor
On Wed, 6 Feb 2013 10:39:13 -0500
Konrad Rzeszutek Wilk  wrote:

> On Wed, Jan 30, 2013 at 02:55:29PM -0800, Mukesh Rathor wrote:
> > It was decided during xen patch review that xen map the iomem
> > transparently, so remove xen_set_clr_mmio_pvh_pte() and the sub
> > hypercall PHYSDEVOP_map_iomem.
> > 
> 
> G..
> 
> No Signed-off-by??

Signed-off-by: Mukesh Rathor 


BTW, thanks a lot, Konrad, for managing this while the xen patch is being
reviewed. It's a huge help for me and allows me to focus on the xen side.
Much appreciated.

Mukesh


Re: [PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-06 Thread Mukesh Rathor
On Wed, 6 Feb 2013 10:49:13 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 31, 2013 at 06:30:15PM -0800, Mukesh Rathor wrote:
> > This patch fixes a fixme in Linux to use alloc_xenballooned_pages()
> > to allocate pfns for grant table pages instead of kmalloc. This also
> > simplifies add to physmap on the xen side a bit.
> 
> Pulled this.
> > 


Konrad, no, there was a follow-up email on this thread to discard this.
Please discard it. I resent it yesterday with proper fixes. I realize
now I should've given yesterday's one a version number. My bad, this head
cold is crippling my brain :).. 

Sorry for the confusion.

Mukesh

Following is the latest patch I emailed yesterday :


This patch fixes a fixme in Linux to use alloc_xenballooned_pages() to
allocate pfns for grant table pages instead of kmalloc. This also
simplifies add to physmap on the xen side a bit.

Signed-off-by: Mukesh Rathor 

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 9c0019d..fdb1d88 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1026,10 +1027,22 @@ static void gnttab_unmap_frames_v2(void)
arch_gnttab_unmap(grstatus, nr_status_frames(nr_grant_frames));
 }
 
+static xen_pfn_t pvh_get_grant_pfn(int grant_idx)
+{
+   unsigned long vaddr;
+   unsigned int level;
+   pte_t *pte;
+
+   vaddr = (unsigned long)(gnttab_shared.addr) + grant_idx * PAGE_SIZE;
+   pte = lookup_address(vaddr, &level);
+   BUG_ON(pte == NULL);
+   return pte_mfn(*pte);
+}
+
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
struct gnttab_setup_table setup;
-   unsigned long *frames, start_gpfn;
+   unsigned long *frames, start_gpfn = 0;
unsigned int nr_gframes = end_idx + 1;
int rc;
 
@@ -1040,8 +1053,6 @@ static int gnttab_map(unsigned int start_idx, unsigned 
int end_idx)
 
if (xen_hvm_domain())
start_gpfn = xen_hvm_resume_frames >> PAGE_SHIFT;
-   else
-   start_gpfn = virt_to_pfn(gnttab_shared.addr);
/*
 * Loop backwards, so that the first hypercall has the largest
 * index, ensuring that the table will grow only once.
@@ -1050,7 +1061,11 @@ static int gnttab_map(unsigned int start_idx, unsigned 
int end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
-   xatp.gpfn = start_gpfn + i;
+   if (xen_hvm_domain())
+   xatp.gpfn = start_gpfn + i;
+   else
+   xatp.gpfn = pvh_get_grant_pfn(i);
+
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
printk(KERN_WARNING
@@ -1138,27 +1153,51 @@ static void gnttab_request_version(void)
grant_table_version);
 }
 
+/*
+ * PVH: we need three things: virtual address, pfns, and mfns. The pfns
+ * are allocated via ballooning, then we call arch_gnttab_map_shared to
+ * allocate the VA and put pfn's in the pte's for the VA. The mfn's are
+ * finally allocated in gnttab_map() by xen which also populates the P2M.
+ */
+static int xlated_setup_gnttab_pages(unsigned long numpages, void **addr)
+{
+   int i, rc;
+   unsigned long pfns[numpages];
+   struct page *pages[numpages];
+
+   rc = alloc_xenballooned_pages(numpages, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+   numpages, rc);
+   return rc;
+   }
+   for (i = 0; i < numpages; i++)
+   pfns[i] = page_to_pfn(pages[i]);
+
+   rc = arch_gnttab_map_shared(pfns, numpages, numpages, addr);
+   if (rc != 0)
+   free_xenballooned_pages(numpages, pages);
+
+   return rc;
+}
+
 int gnttab_resume(void)
 {
+   int rc;
unsigned int max_nr_gframes;
-   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
 
-   /* PVH note: xen will free existing kmalloc'd mfn in
-* XENMEM_add_to_physmap. TBD/FIXME: use xen ballooning instead of
-* kmalloc(). */
if (xen_pv_domain() && xen_feature(XENFEAT_auto_translated_physmap) &&
!gnttab_shared.addr) {
-   gnttab_shared.addr =
-   kmalloc(max_nr_gframes * PAGE_SIZE, GFP_KERNEL);
-   if (!gnttab_shared.addr) {
-   

[PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-05 Thread Mukesh Rathor
This patch fixes a fixme in Linux to use alloc_xenballooned_pages() to
allocate pfns for grant table pages instead of kmalloc. This also
simplifies add to physmap on the xen side a bit.

Signed-off-by: Mukesh Rathor 

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 9c0019d..fdb1d88 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1026,10 +1027,22 @@ static void gnttab_unmap_frames_v2(void)
arch_gnttab_unmap(grstatus, nr_status_frames(nr_grant_frames));
 }
 
+static xen_pfn_t pvh_get_grant_pfn(int grant_idx)
+{
+   unsigned long vaddr;
+   unsigned int level;
+   pte_t *pte;
+
+   vaddr = (unsigned long)(gnttab_shared.addr) + grant_idx * PAGE_SIZE;
+   pte = lookup_address(vaddr, &level);
+   BUG_ON(pte == NULL);
+   return pte_mfn(*pte);
+}
+
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
struct gnttab_setup_table setup;
-   unsigned long *frames, start_gpfn;
+   unsigned long *frames, start_gpfn = 0;
unsigned int nr_gframes = end_idx + 1;
int rc;
 
@@ -1040,8 +1053,6 @@ static int gnttab_map(unsigned int start_idx, unsigned 
int end_idx)
 
if (xen_hvm_domain())
start_gpfn = xen_hvm_resume_frames >> PAGE_SHIFT;
-   else
-   start_gpfn = virt_to_pfn(gnttab_shared.addr);
/*
 * Loop backwards, so that the first hypercall has the largest
 * index, ensuring that the table will grow only once.
@@ -1050,7 +1061,11 @@ static int gnttab_map(unsigned int start_idx, unsigned 
int end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
-   xatp.gpfn = start_gpfn + i;
+   if (xen_hvm_domain())
+   xatp.gpfn = start_gpfn + i;
+   else
+   xatp.gpfn = pvh_get_grant_pfn(i);
+
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
printk(KERN_WARNING
@@ -1138,27 +1153,51 @@ static void gnttab_request_version(void)
grant_table_version);
 }
 
+/*
+ * PVH: we need three things: virtual address, pfns, and mfns. The pfns
+ * are allocated via ballooning, then we call arch_gnttab_map_shared to
+ * allocate the VA and put pfn's in the pte's for the VA. The mfn's are
+ * finally allocated in gnttab_map() by xen which also populates the P2M.
+ */
+static int xlated_setup_gnttab_pages(unsigned long numpages, void **addr)
+{
+   int i, rc;
+   unsigned long pfns[numpages];
+   struct page *pages[numpages];
+
+   rc = alloc_xenballooned_pages(numpages, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+   numpages, rc);
+   return rc;
+   }
+   for (i = 0; i < numpages; i++)
+   pfns[i] = page_to_pfn(pages[i]);
+
+   rc = arch_gnttab_map_shared(pfns, numpages, numpages, addr);
+   if (rc != 0)
+   free_xenballooned_pages(numpages, pages);
+
+   return rc;
+}
+
 int gnttab_resume(void)
 {
+   int rc;
unsigned int max_nr_gframes;
-   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
 
-   /* PVH note: xen will free existing kmalloc'd mfn in
-* XENMEM_add_to_physmap. TBD/FIXME: use xen ballooning instead of
-* kmalloc(). */
if (xen_pv_domain() && xen_feature(XENFEAT_auto_translated_physmap) &&
!gnttab_shared.addr) {
-   gnttab_shared.addr =
-   kmalloc(max_nr_gframes * PAGE_SIZE, GFP_KERNEL);
-   if (!gnttab_shared.addr) {
-   pr_warn("%s", kmsg);
-   return -ENOMEM;
-   }
+
+   rc = xlated_setup_gnttab_pages((unsigned long)max_nr_gframes,
+  &gnttab_shared.addr);
+   if (rc != 0)
+   return rc;
}
if (xen_pv_domain())
return gnttab_map(0, nr_grant_frames - 1);


Re: [Xen-devel] [PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-01 Thread Mukesh Rathor
On Fri, 1 Feb 2013 14:00:58 -0800
Mukesh Rathor  wrote:

> On Thu, 31 Jan 2013 18:44:46 -0800
> Mukesh Rathor  wrote:
> 
> > On Thu, 31 Jan 2013 18:30:15 -0800
> > Mukesh Rathor  wrote:
> > 
> > > This patch fixes a fixme in Linux to use
> > > alloc_xenballooned_pages() to allocate pfns for grant table pages
> > > instead of kmalloc. This also simplifies add to physmap on the
> > > xen side a bit.
> > 
> > Looking at it again, I realized rc should be signed in
> > gnttab_resume(). Below again. Thanks.
> 
> Konrad, Please hold off on this patch. I discovered an issue on the 
> domU side with this change. I'm currently investigating if it's 
> related.

Ah right, I forgot the pfns from the balloon may not always be contiguous.
Besides, these are special pfns so to speak, so in gnttab_map()
virt_to_pfn() doesn't work.

I'm gonna have to create a separate gnttab map routine for the pvh case, it
appears. Shouldn't be too bad though.
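The separate routine would basically just look the mfn up per grant frame
via its pte instead of assuming contiguity, a rough untested sketch:

	vaddr = (unsigned long)gnttab_shared.addr + grant_idx * PAGE_SIZE;
	pte = lookup_address(vaddr, &level);
	BUG_ON(pte == NULL);
	mfn = pte_mfn(*pte);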

thanks,
Mukesh


Re: [Xen-devel] [PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-02-01 Thread Mukesh Rathor
On Thu, 31 Jan 2013 18:44:46 -0800
Mukesh Rathor  wrote:

> On Thu, 31 Jan 2013 18:30:15 -0800
> Mukesh Rathor  wrote:
> 
> > This patch fixes a fixme in Linux to use alloc_xenballooned_pages()
> > to allocate pfns for grant table pages instead of kmalloc. This also
> > simplifies add to physmap on the xen side a bit.
> 
> Looking at it again, I realized rc should be signed in
> gnttab_resume(). Below again. Thanks.

Konrad, Please hold off on this patch. I discovered an issue on the 
domU side with this change. I'm currently investigating if it's 
related.

thanks,
Mukesh


Re: [Xen-devel] [PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-01-31 Thread Mukesh Rathor
On Thu, 31 Jan 2013 18:30:15 -0800
Mukesh Rathor  wrote:

> This patch fixes a fixme in Linux to use alloc_xenballooned_pages() to
> allocate pfns for grant table pages instead of kmalloc. This also
> simplifies add to physmap on the xen side a bit.

Looking at it again, I realized rc should be signed in gnttab_resume().
Below again. Thanks.

Signed-off-by: Mukesh Rathor 

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 9c0019d..7a630bb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1138,27 +1139,42 @@ static void gnttab_request_version(void)
grant_table_version);
 }
 
+static int xlated_setup_gnttab_pages(int numpages, void **addr)
+{
+   int i, rc;
+   unsigned long pfns[numpages];
+   struct page *pages[numpages];
+
+   rc = alloc_xenballooned_pages(numpages, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Could not balloon alloc %d pfns rc:%d\n", __func__,
+   numpages, rc);
+   return -ENOMEM;
+   }
+   for (i = 0; i < numpages; i++)
+   pfns[i] = page_to_pfn(pages[i]);
+
+   rc = arch_gnttab_map_shared(pfns, numpages, numpages, addr);
+   return rc;
+}
+
 int gnttab_resume(void)
 {
+   int rc;
unsigned int max_nr_gframes;
-   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
 
-   /* PVH note: xen will free existing kmalloc'd mfn in
-* XENMEM_add_to_physmap. TBD/FIXME: use xen ballooning instead of
-* kmalloc(). */
if (xen_pv_domain() && xen_feature(XENFEAT_auto_translated_physmap) &&
!gnttab_shared.addr) {
-   gnttab_shared.addr =
-   kmalloc(max_nr_gframes * PAGE_SIZE, GFP_KERNEL);
-   if (!gnttab_shared.addr) {
-   pr_warn("%s", kmsg);
-   return -ENOMEM;
-   }
+
+   rc = xlated_setup_gnttab_pages(max_nr_gframes,
+  &gnttab_shared.addr);
+   if (rc != 0)
+   return rc;
}
if (xen_pv_domain())
return gnttab_map(0, nr_grant_frames - 1);





[PATCH] PVH linux: Use ballooning to allocate grant table pages

2013-01-31 Thread Mukesh Rathor
This patch fixes a fixme in Linux to use alloc_xenballooned_pages() to
allocate pfns for grant table pages instead of kmalloc. This also
simplifies add to physmap on the xen side a bit.

Signed-off-by: Mukesh Rathor 

---
 drivers/xen/grant-table.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 9c0019d..d731f39 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1138,27 +1139,41 @@ static void gnttab_request_version(void)
grant_table_version);
 }
 
+static int xlated_setup_gnttab_pages(int numpages, void **addr)
+{
+   int i, rc;
+   unsigned long pfns[numpages];
+   struct page *pages[numpages];
+
+   rc = alloc_xenballooned_pages(numpages, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Could not balloon alloc %d pfns rc:%d\n", __func__,
+   numpages, rc);
+   return -ENOMEM;
+   }
+   for (i = 0; i < numpages; i++)
+   pfns[i] = page_to_pfn(pages[i]);
+
+   rc = arch_gnttab_map_shared(pfns, numpages, numpages, addr);
+   return rc;
+}
+
 int gnttab_resume(void)
 {
-   unsigned int max_nr_gframes;
-   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
+   unsigned int rc, max_nr_gframes;
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
 
-   /* PVH note: xen will free existing kmalloc'd mfn in
-* XENMEM_add_to_physmap. TBD/FIXME: use xen ballooning instead of
-* kmalloc(). */
if (xen_pv_domain() && xen_feature(XENFEAT_auto_translated_physmap) &&
!gnttab_shared.addr) {
-   gnttab_shared.addr =
-   kmalloc(max_nr_gframes * PAGE_SIZE, GFP_KERNEL);
-   if (!gnttab_shared.addr) {
-   pr_warn("%s", kmsg);
-   return -ENOMEM;
-   }
+
+   rc = xlated_setup_gnttab_pages(max_nr_gframes,
+  &gnttab_shared.addr);
+   if (rc != 0)
+   return rc;
}
if (xen_pv_domain())
return gnttab_map(0, nr_grant_frames - 1);
-- 
1.7.2.3



[PATCH] PVH: remove code to map iomem from guest

2013-01-30 Thread Mukesh Rathor
It was decided during xen patch review that xen map the iomem
transparently, so remove xen_set_clr_mmio_pvh_pte() and the sub
hypercall PHYSDEVOP_map_iomem.

---
 arch/x86/xen/mmu.c  |   14 --
 arch/x86/xen/setup.c|   16 
 include/xen/interface/physdev.h |   10 --
 3 files changed, 4 insertions(+), 36 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b4be4c9..fbf6a63 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -333,20 +333,6 @@ static void xen_set_pte(pte_t *ptep, pte_t pteval)
__xen_set_pte(ptep, pteval);
 }
 
-void xen_set_clr_mmio_pvh_pte(unsigned long pfn, unsigned long mfn,
- int nr_mfns, int add_mapping)
-{
-   struct physdev_map_iomem iomem;
-
-   iomem.first_gfn = pfn;
-   iomem.first_mfn = mfn;
-   iomem.nr_mfns = nr_mfns;
-   iomem.add_mapping = add_mapping;
-
-   if (HYPERVISOR_physdev_op(PHYSDEVOP_map_iomem, &iomem))
-   BUG();
-}
-
 static void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval)
 {
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7e93ec9..6532172 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -235,20 +235,12 @@ static void __init xen_set_identity_and_release_chunk(
*identity += set_phys_range_identity(start_pfn, end_pfn);
 }
 
-/* For PVH, the pfns [0..MAX] are mapped to mfn's in the EPT/NPT. The mfns
- * are released as part of this 1:1 mapping hypercall back to the dom heap.
- * Also, we map the entire IO space, ie, beyond max_pfn_mapped.
- */
-static void __init xen_pvh_identity_map_chunk(unsigned long start_pfn,
+/* PVH: xen has already mapped the IO space in the EPT/NPT for us, so we
+ * just need to adjust the released and identity count */
+static void __init xen_pvh_adjust_stats(unsigned long start_pfn,
unsigned long end_pfn, unsigned long max_pfn,
unsigned long *released, unsigned long *identity)
 {
-   unsigned long pfn;
-   int numpfns = 1, add_mapping = 1;
-
-   for (pfn = start_pfn; pfn < end_pfn; pfn++)
-   xen_set_clr_mmio_pvh_pte(pfn, pfn, numpfns, add_mapping);
-
if (start_pfn <= max_pfn) {
unsigned long end = min(max_pfn_mapped, end_pfn);
*released += end - start_pfn;
@@ -288,7 +280,7 @@ static unsigned long __init xen_set_identity_and_release(
 
if (start_pfn < end_pfn) {
if (xlated_phys) {
-   xen_pvh_identity_map_chunk(start_pfn,
+   xen_pvh_adjust_stats(start_pfn,
end_pfn, nr_pages, &released, 
&identity);
} else {
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index 83050d3..1844d31 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -274,16 +274,6 @@ struct physdev_dbgp_op {
 } u;
 };
 
-#define PHYSDEVOP_map_iomem30
-struct physdev_map_iomem {
-/* IN */
-uint64_t first_gfn;
-uint64_t first_mfn;
-uint32_t nr_mfns;
-uint32_t add_mapping; /* 1 == add mapping;  0 == unmap */
-
-};
-
 /*
  * Notify that some PIRQ-bound event channels have been unmasked.
  * ** This command is obsolete since interface version 0x00030202 and is **
-- 
1.7.2.3



Re: [PATCH]: PVH: remove FEATURES_PVH macro

2012-11-27 Thread Mukesh Rathor
On Mon, 26 Nov 2012 14:54:00 -0500
Konrad Rzeszutek Wilk  wrote:

> On Wed, Nov 14, 2012 at 06:19:33PM -0800, Mukesh Rathor wrote:
> > PVH: remove macro FEATURES_PVH and put PVH strings in the ELFNOTE
> > line, because there's a null char before FEATURES_PVH and in the
> > FEATURES_PVH strings since this is not C file
> 
> ping? Jan had a comment about it to keep the #ifdef at the top
> and combine ascii and asiz?

Ok, if you'd like that too, I'll look into it. I need to RTFM.

thanks
Mukesh


[PATCH]: PVH: remove FEATURES_PVH macro

2012-11-14 Thread Mukesh Rathor
PVH: remove the FEATURES_PVH macro and put the PVH strings directly in the
ELFNOTE line, because there's a null char before FEATURES_PVH and within the
FEATURES_PVH strings, since this is not a C file.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/xen-head.S |   14 +-
 1 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 1a6bca1..340fd4e 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -13,14 +13,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_XEN_X86_PVH
-#define FEATURES_PVH "|writable_descriptor_tables" \
-"|auto_translated_physmap" \
-"|supervisor_mode_kernel" \
-"|hvm_callback_vector"
-#else
-#define FEATURES_PVH /* Not supported */
-#endif
 
__INIT
 ENTRY(startup_xen)
@@ -104,7 +96,11 @@ NEXT_HYPERCALL(arch_6)
 #endif
ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,  _ASM_PTR startup_xen)
ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
-   ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz 
"!writable_page_tables|pae_pgdir_above_4gb"FEATURES_PVH)
+#ifdef CONFIG_XEN_X86_PVH
+ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz 
"!writable_page_tables|pae_pgdir_above_4gb|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
+#else
+ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz 
"!writable_page_tables|pae_pgdir_above_4gb")
+#endif
ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,   .asciz "yes")
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz "generic")
ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
-- 
1.7.2.3



Re: [Xen-devel] [PATCH 4/5] xen: arm: implement remap interfaces needed for privcmd mappings.

2012-10-25 Thread Mukesh Rathor
On Thu, 25 Oct 2012 08:46:59 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-25 at 01:07 +0100, Mukesh Rathor wrote:
> > On Wed, 24 Oct 2012 16:44:11 -0700
> > Mukesh Rathor  wrote:
> > 
> > > >  
> > > > +/* Indexes into space being mapped. */
> > > > +GUEST_HANDLE(xen_ulong_t) idxs;
> > > > +
> > > > +/* GPFN in domid where the source mapping page should
> > > > appear. */
> > > > +GUEST_HANDLE(xen_pfn_t) gpfns;
> > > 
> > > 
> > > Looking at your arm implementation in xen, doesn't look like you
> > > are expecting idxs and gpfns to be contigous. In that case,
> > > shouldn't idxs and gpfns be pointers, ie, they are sent down as
> > > arrays? Or does GUEST_HANDLE do that, I can't seem to find where
> > > it's defined quickly.
> > 
> > Never mind, I see it got corrected to XEN_GUEST_HANDLE in staging
> > tree.
> 
> The macro is called XEN_GUEST_HANDLE in Xen and just GUEST_HANDLE in
> Linux.
> 
> > Still doesn't compile tho:
> > 
> > public/memory.h:246: error: expected specifier-qualifier-list before
> > ‘__guest_handle_xen_ulong_t’
> > 
> > I'll figure it out.
> 
> Looks like you've got it all sorted?

Yup. I made the change on the xen side and added this patch to my tree
and got it working after reverting Konrad's setup.c changes. Not sure
if you need an ack from the x86 side, but if you do:

Acked-by: Mukesh Rathor 

thanks
Mukesh



Re: [Xen-devel] [PATCH 5/5] xen: x86 pvh: use XENMEM_add_to_physmap_range for foreign gmfn mappings

2012-10-25 Thread Mukesh Rathor
On Wed, 24 Oct 2012 14:19:37 +0100
Ian Campbell  wrote:

> Squeezing the necessary fields into the existing XENMEM_add_to_physmap
> interface was proving to be a bit tricky so we have decided to go with
> a new interface upstream (the XENMAPSPACE_gmfn_foreign interface using
> XENMEM_add_to_physmap was never committed anywhere). This interface
> also allows for batching which was impossible to support at the same
> time as foreign mfns in the old interface.
> 
> This reverts the relevant parts of "PVH: basic and header changes,
> elfnote changes, ..." and followups and trivially converts
> pvh_add_to_xen_p2m over.
> 
> Signed-off-by: Ian Campbell 
> Acked-by: Stefano Stabellini 

Ok, I made the change on the xen side for x86 and tested it out. Works
fine. Second ack.

thanks, 
Mukesh



Re: [Xen-devel] [PATCH 4/5] xen: arm: implement remap interfaces needed for privcmd mappings.

2012-10-24 Thread Mukesh Rathor
On Wed, 24 Oct 2012 17:07:46 -0700
Mukesh Rathor  wrote:

> On Wed, 24 Oct 2012 16:44:11 -0700
> Mukesh Rathor  wrote:
> 
> > >  
> > >  #ifndef HYPERVISOR_VIRT_START
> > > diff --git a/include/xen/interface/memory.h
> > > b/include/xen/interface/memory.h index ad0dff5..5de2b36 100644
> > > --- a/include/xen/interface/memory.h
> > > +++ b/include/xen/interface/memory.h
> > > @@ -188,6 +188,24 @@
> > > DEFINE_GUEST_HANDLE_STRUCT(xen_add_to_physmap); /*** REMOVED ***/
> > >  /*#define XENMEM_translate_gpfn_list  8*/
> > >  
> > > +#define XENMEM_add_to_physmap_range 23
> > > +struct xen_add_to_physmap_range {
> > > +/* Which domain to change the mapping for. */
> > > +domid_t domid;
> > > +uint16_t space; /* => enum phys_map_space */
> > > +
> > > +/* Number of pages to go through */
> > > +uint16_t size;
> > > +domid_t foreign_domid; /* IFF gmfn_foreign */
> > > +
> > > +/* Indexes into space being mapped. */
> > > +GUEST_HANDLE(xen_ulong_t) idxs;
> > > +
> > > +/* GPFN in domid where the source mapping page should appear.
> > > */
> > > +GUEST_HANDLE(xen_pfn_t) gpfns;
> > 
> > 
> > Looking at your arm implementation in xen, doesn't look like you are
> > expecting idxs and gpfns to be contiguous. In that case, shouldn't
> > idxs and gpfns be pointers, ie, they are sent down as arrays? Or
> > does GUEST_HANDLE do that, I can't seem to find where it's defined
> > quickly.
> 
> Never mind, I see it got corrected to XEN_GUEST_HANDLE in staging
> tree. Still doesn't compile tho:
> 
> public/memory.h:246: error: expected specifier-qualifier-list before
> ‘__guest_handle_xen_ulong_t’
> 
> I'll figure it out.

Oh, yeah, missed:
+DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);

compiles now.
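
For reference, the question earlier in the thread ("shouldn't idxs and
gpfns be pointers, i.e., sent down as arrays?") is answered by the handle
macros themselves: a guest handle is essentially a wrapped guest pointer,
so the hypervisor receives the arrays by reference. The sketch below is an
illustrative approximation only, with made-up macro and struct names rather
than the real Xen headers:

#include <stdint.h>

/* Illustrative approximation -- NOT the real GUEST_HANDLE expansion. */
#define EXAMPLE_GUEST_HANDLE(type) struct { type *p; }

/* Rough shape of the hypercall argument being discussed: idxs and gpfns
 * are handles (guest pointers) to arrays of 'size' elements each. */
struct example_add_to_physmap_range {
	uint16_t domid;
	uint16_t space;
	uint16_t size;           /* number of entries in idxs[]/gpfns[] */
	uint16_t foreign_domid;
	EXAMPLE_GUEST_HANDLE(uint64_t) idxs;   /* indexes into the source space */
	EXAMPLE_GUEST_HANDLE(uint64_t) gpfns;  /* target gpfns in 'domid' */
};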



Re: [Xen-devel] [PATCH 4/5] xen: arm: implement remap interfaces needed for privcmd mappings.

2012-10-24 Thread Mukesh Rathor
On Wed, 24 Oct 2012 16:44:11 -0700
Mukesh Rathor  wrote:

> >  
> >  #ifndef HYPERVISOR_VIRT_START
> > diff --git a/include/xen/interface/memory.h
> > b/include/xen/interface/memory.h index ad0dff5..5de2b36 100644
> > --- a/include/xen/interface/memory.h
> > +++ b/include/xen/interface/memory.h
> > @@ -188,6 +188,24 @@ DEFINE_GUEST_HANDLE_STRUCT(xen_add_to_physmap);
> >  /*** REMOVED ***/
> >  /*#define XENMEM_translate_gpfn_list  8*/
> >  
> > +#define XENMEM_add_to_physmap_range 23
> > +struct xen_add_to_physmap_range {
> > +/* Which domain to change the mapping for. */
> > +domid_t domid;
> > +uint16_t space; /* => enum phys_map_space */
> > +
> > +/* Number of pages to go through */
> > +uint16_t size;
> > +domid_t foreign_domid; /* IFF gmfn_foreign */
> > +
> > +/* Indexes into space being mapped. */
> > +GUEST_HANDLE(xen_ulong_t) idxs;
> > +
> > +/* GPFN in domid where the source mapping page should appear.
> > */
> > +GUEST_HANDLE(xen_pfn_t) gpfns;
> 
> 
> Looking at your arm implementation in xen, doesn't look like you are
> expecting idxs and gpfns to be contiguous. In that case, shouldn't idxs
> and gpfns be pointers, ie, they are sent down as arrays? Or does
> GUEST_HANDLE do that, I can't seem to find where it's defined quickly.

Never mind, I see it got corrected to XEN_GUEST_HANDLE in staging tree.
Still doesn't compile tho:

public/memory.h:246: error: expected specifier-qualifier-list before
‘__guest_handle_xen_ulong_t’

I'll figure it out.

thanks,
Mukesh


Re: [Xen-devel] [PATCH 4/5] xen: arm: implement remap interfaces needed for privcmd mappings.

2012-10-24 Thread Mukesh Rathor
>  
>  #ifndef HYPERVISOR_VIRT_START
> diff --git a/include/xen/interface/memory.h
> b/include/xen/interface/memory.h index ad0dff5..5de2b36 100644
> --- a/include/xen/interface/memory.h
> +++ b/include/xen/interface/memory.h
> @@ -188,6 +188,24 @@ DEFINE_GUEST_HANDLE_STRUCT(xen_add_to_physmap);
>  /*** REMOVED ***/
>  /*#define XENMEM_translate_gpfn_list  8*/
>  
> +#define XENMEM_add_to_physmap_range 23
> +struct xen_add_to_physmap_range {
> +/* Which domain to change the mapping for. */
> +domid_t domid;
> +uint16_t space; /* => enum phys_map_space */
> +
> +/* Number of pages to go through */
> +uint16_t size;
> +domid_t foreign_domid; /* IFF gmfn_foreign */
> +
> +/* Indexes into space being mapped. */
> +GUEST_HANDLE(xen_ulong_t) idxs;
> +
> +/* GPFN in domid where the source mapping page should appear. */
> +GUEST_HANDLE(xen_pfn_t) gpfns;


Looking at your arm implementation in xen, doesn't look like you are
expecting idxs and gpfns to be contiguous. In that case, shouldn't idxs
and gpfns be pointers, ie, they are sent down as arrays? Or does
GUEST_HANDLE do that, I can't seem to find where it's defined quickly.

thanks
Mukesh



Re: [PATCH 4/6] xen/pvh: bootup and setup related changes.

2012-10-23 Thread Mukesh Rathor
On Tue, 23 Oct 2012 14:07:06 +0100
Stefano Stabellini  wrote:

> On Mon, 22 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > On Mon, Oct 22, 2012 at 02:34:44PM +0100, Stefano Stabellini wrote:
> > > On Sat, 20 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > > > From: Mukesh Rathor 
> > > > 
> > > > for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn;
> > > > pfn++) { unsigned long mfn = pfn_to_mfn(pfn);
> > > > @@ -100,6 +104,7 @@ static unsigned long __init
> > > > xen_do_chunk(unsigned long start, .domid= DOMID_SELF
> > > > };
> > > > unsigned long len = 0;
> > > > +   int xlated_phys =
> > > > xen_feature(XENFEAT_auto_translated_physmap); unsigned long pfn;
> > > > int ret;
> > > >  
> > > > @@ -113,7 +118,7 @@ static unsigned long __init
> > > > xen_do_chunk(unsigned long start, continue;
> > > > frame = mfn;
> > > > } else {
> > > > -   if (mfn != INVALID_P2M_ENTRY)
> > > > +   if (!xlated_phys && mfn !=
> > > > INVALID_P2M_ENTRY) continue;
> > > > frame = pfn;
> > > > }
> > > 
> > > Shouldn't we add a "!xlated_phys &&" to the other case as well?
> > 
> > No. That is handled in xen_pvh_identity_map_chunk which
> > just does a xen_set_clr_mmio_pvh_pte call for the "released"
> > pages. But that is a bit of ... well, extra logic. I think
> > if we did this it would work and look much nicer:
> 
> Yes, I think that this version looks better

But doesn't boot:

(XEN) vmx_hybrid.c:710:d0 Dom:0 EPT violation 0x181 (r--/---), gpa 
0x00bf421e1c, mfn 0x, type 4.
(XEN) p2m-ept.c:642:d0 Walking EPT tables for domain 0 gfn bf421
(XEN) p2m-ept.c:648:d0  gfn exceeds max_mapped_pfn 4b062
(XEN) vmx_hybrid.c:717:d0  --- GLA 0xff477e1c


I'll have to debug it, or we can go back to the prev version, which
I don't think is that un-pretty :).

Mukesh




Re: [Xen-devel] [PATCH 4/6] xen/pvh: bootup and setup related changes.

2012-10-23 Thread Mukesh Rathor
On Tue, 23 Oct 2012 17:03:10 -0700
Mukesh Rathor  wrote:

> On Tue, 23 Oct 2012 16:47:29 -0700
> Mukesh Rathor  wrote:
> 
> > On Tue, 23 Oct 2012 14:07:06 +0100
> > Stefano Stabellini  wrote:
> > 
> > > On Mon, 22 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > > > On Mon, Oct 22, 2012 at 02:34:44PM +0100, Stefano Stabellini
> > > > wrote:
> > > > > On Sat, 20 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > > > > > From: Mukesh Rathor 
> > > > > > 
> > > > > > for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn;
> > > > > > pfn++) { unsigned long mfn = pfn_to_mfn(pfn);
> > > > > > @@ -100,6 +104,7 @@ static unsigned long __init
> > > > > > xen_do_chunk(unsigned long start, .domid= DOMID_SELF
> > > > > > };
> > > > > > unsigned long len = 0;
> > > > > > +   int xlated_phys =
> > > > > > xen_feature(XENFEAT_auto_translated_physmap); unsigned long
> > > > > > pfn; int ret;
> > > > > >  
> > > > > > @@ -113,7 +118,7 @@ static unsigned long __init
> > > > > > xen_do_chunk(unsigned long start, continue;
> > > > > > frame = mfn;
> > > > > > } else {
> > > > > > -   if (mfn != INVALID_P2M_ENTRY)
> > > > > > +   if (!xlated_phys && mfn !=
> > > > > > INVALID_P2M_ENTRY) continue;
> > > > > > frame = pfn;
> > > > > > }
> > > > > 
> > > > > Shouldn't we add a "!xlated_phys &&" to the other case as
> > > > > well?
> > > > 
> > > > No. That is handled in xen_pvh_identity_map_chunk which
> > > > just does a xen_set_clr_mmio_pvh_pte call for the "released"
> > > > pages. But that is a bit of ... well, extra logic. I think
> > > > if we did this it would work and look much nicer:
> > > 
> > > Yes, I think that this version looks better
> > 
> > But doesn't boot:
> > 
> > (XEN) vmx_hybrid.c:710:d0 Dom:0 EPT violation 0x181 (r--/---), gpa
> > 0x00bf421e1c, mfn 0x, type 4. (XEN)
> > p2m-ept.c:642:d0 Walking EPT tables for domain 0 gfn bf421 (XEN)
> > p2m-ept.c:648:d0  gfn exceeds max_mapped_pfn 4b062 (XEN)
> > vmx_hybrid.c:717:d0  --- GLA 0xff477e1c
> > 
> > 
> > I'll have to debug it, or we can go back to the prev version, which
> > I don't think is that un-pretty :).
> > 
> 
> The reason being:
> xen_set_identity_and_release_chunk():
> NEW : > for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn;
> pfn++) {
> 
> xen_pvh_identity_map_chunk():
> OLD: for (pfn = start_pfn; pfn < end_pfn; pfn++)
> 
> IOW, for PVH we need to avoid testing for max_pfn_mapped, as we are
> mapping the entire IO space. 

Also, now your counts for released and identity are off. Can we please
go back to the way it was? 

thanks
Mukesh


Re: [Xen-devel] [PATCH 4/6] xen/pvh: bootup and setup related changes.

2012-10-23 Thread Mukesh Rathor
On Tue, 23 Oct 2012 16:47:29 -0700
Mukesh Rathor  wrote:

> On Tue, 23 Oct 2012 14:07:06 +0100
> Stefano Stabellini  wrote:
> 
> > On Mon, 22 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > > On Mon, Oct 22, 2012 at 02:34:44PM +0100, Stefano Stabellini
> > > wrote:
> > > > On Sat, 20 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > > > > From: Mukesh Rathor 
> > > > > 
> > > > >   for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn;
> > > > > pfn++) { unsigned long mfn = pfn_to_mfn(pfn);
> > > > > @@ -100,6 +104,7 @@ static unsigned long __init
> > > > > xen_do_chunk(unsigned long start, .domid= DOMID_SELF
> > > > >   };
> > > > >   unsigned long len = 0;
> > > > > + int xlated_phys =
> > > > > xen_feature(XENFEAT_auto_translated_physmap); unsigned long
> > > > > pfn; int ret;
> > > > >  
> > > > > @@ -113,7 +118,7 @@ static unsigned long __init
> > > > > xen_do_chunk(unsigned long start, continue;
> > > > >   frame = mfn;
> > > > >   } else {
> > > > > - if (mfn != INVALID_P2M_ENTRY)
> > > > > + if (!xlated_phys && mfn !=
> > > > > INVALID_P2M_ENTRY) continue;
> > > > >   frame = pfn;
> > > > >   }
> > > > 
> > > > Shouldn't we add a "!xlated_phys &&" to the other case as well?
> > > 
> > > No. That is handled in xen_pvh_identity_map_chunk which
> > > just does a xen_set_clr_mmio_pvh_pte call for the "released"
> > > pages. But that is a bit of ... well, extra logic. I think
> > > if we did this it would work and look much nicer:
> > 
> > Yes, I think that this version looks better
> 
> But doesn't boot:
> 
> (XEN) vmx_hybrid.c:710:d0 Dom:0 EPT violation 0x181 (r--/---), gpa
> 0x00bf421e1c, mfn 0x, type 4. (XEN)
> p2m-ept.c:642:d0 Walking EPT tables for domain 0 gfn bf421 (XEN)
> p2m-ept.c:648:d0  gfn exceeds max_mapped_pfn 4b062 (XEN)
> vmx_hybrid.c:717:d0  --- GLA 0xff477e1c
> 
> 
> I'll have to debug it, or we can go back to the prev version, which
> I don't think is that un-pretty :).
> 

The reason being:
xen_set_identity_and_release_chunk():
NEW : > for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {

xen_pvh_identity_map_chunk():
OLD: for (pfn = start_pfn; pfn < end_pfn; pfn++)

IOW, for PVH we need to avoid testing for max_pfn_mapped, as we are
mapping the entire IO space. 
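
For reference, a minimal sketch of what is being described, assuming the two
cases are folded into one loop with an auto-translated check (an illustration,
not the actual series):

#include <xen/features.h>	/* xen_feature(), XENFEAT_auto_translated_physmap */

/* Sketch only: classic PV stops at max_pfn_mapped, while an auto-translated
 * (PVH) guest walks the whole range, since it identity-maps the entire I/O
 * space. max_pfn_mapped is passed in to keep the sketch self-contained. */
static void example_identity_map_chunk(unsigned long start_pfn,
				       unsigned long end_pfn,
				       unsigned long max_pfn_mapped)
{
	unsigned long pfn;
	int xlated = xen_feature(XENFEAT_auto_translated_physmap);

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!xlated && pfn > max_pfn_mapped)
			break;
		/* ... establish the 1:1 (identity) mapping for pfn ... */
	}
}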

thanks
mukesh



Re: [PATCH 2/6] xen/pvh: Extend vcpu_guest_context, p2m, event, and xenbus to support PVH.

2012-10-22 Thread Mukesh Rathor
On Mon, 22 Oct 2012 16:14:51 -0400
Konrad Rzeszutek Wilk  wrote:

> On Mon, Oct 22, 2012 at 11:31:54AM -0700, Mukesh Rathor wrote:
> > On Mon, 22 Oct 2012 14:44:40 +0100
> > Stefano Stabellini  wrote:
> > 
> > > On Sat, 20 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > > > From: Mukesh Rathor 
> > > > 
> > > > make gdt_frames[]/gdt_ents into a union with {gdtaddr, gdtsz},
> > > > as PVH only needs to send down gdtaddr and gdtsz.
> > > > 
> > > > For interrupts, PVH uses native_irq_ops.
> > > > vcpu hotplug is currently not available for PVH.
> > > > 
> > > > For events we follow what PVHVM does - to use callback vector.
> > > > Lastly, also use HVM path to setup XenBus.
> > > > 
> > > > Signed-off-by: Mukesh Rathor 
> > > > Signed-off-by: Konrad Rzeszutek Wilk 
> > > > ---
> > > > return true;
> > > > }
> > > > -   xen_copy_trap_info(ctxt->trap_ctxt);
> > > > +   /* check for autoxlated to get it right for 32bit
> > > > kernel */
> > > 
> > > I am not sure what this comment means, considering that in another
> > > comment below you say that we don't support 32bit PVH kernels.
> > 
> > Function is common to both 32bit and 64bit kernels. We need to
> > check for auto xlated also in the if statement in addition to
> > supervisor mode kernel, so 32 bit doesn't go down the wrong path.
> 
> Can one just make it #ifdef CONFIG_X86_64 for the whole thing?
> You are either way during bootup doing a 'BUG' when booting as 32-bit?

32bit pure pv, i.e., the pv mmu path. BUG() is for 32bit PVH.
Sure, we could ifdef the whole thing, but then we'd have to add more ifdefs
around the else, the closing else, etc. I'm ok with whatever works for you.

thanks
mukesh





Re: [PATCH 2/6] xen/pvh: Extend vcpu_guest_context, p2m, event, and xenbus to support PVH.

2012-10-22 Thread Mukesh Rathor
On Mon, 22 Oct 2012 14:44:40 +0100
Stefano Stabellini  wrote:

> On Sat, 20 Oct 2012, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor 
> > 
> > make gdt_frames[]/gdt_ents into a union with {gdtaddr, gdtsz}, as
> > PVH only needs to send down gdtaddr and gdtsz.
> > 
> > For interrupts, PVH uses native_irq_ops.
> > vcpu hotplug is currently not available for PVH.
> > 
> > For events we follow what PVHVM does - to use callback vector.
> > Lastly, also use HVM path to setup XenBus.
> > 
> > Signed-off-by: Mukesh Rathor 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> > return true;
> > }
> > -   xen_copy_trap_info(ctxt->trap_ctxt);
> > +   /* check for autoxlated to get it right for 32bit kernel */
> 
> I am not sure what this comment means, considering that in another
> comment below you say that we don't support 32bit PVH kernels.

Function is common to both 32bit and 64bit kernels. We need to check 
for auto xlated also in the if statement in addition to supervisor 
mode kernel, so 32 bit doesn't go down the wrong path.

PVH is not supported for 32bit kernels, and gs_base_user doesn't exist
in the structure for 32bit, so it needs to be ifdef'd for 64bit, which is
ok because PVH is not supported on 32bit kernels.
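
To make the shape of that check concrete, a rough sketch (an assumption about
how the guard reads, not the literal hunk) would be:

	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
		/* classic PV, 32bit or 64bit: emulated trap table,
		 * ring-1 eflags, etc. */
		xen_copy_trap_info(ctxt->trap_ctxt);
		ctxt->user_regs.eflags = 0x1000;	/* IOPL_RING1 */
	} else {
		/* PVH: 64bit only, native IDT, so nothing to copy here;
		 * gs_base_user is only present under CONFIG_X86_64 */
	}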

> > +   (unsigned
> > long)xen_hypervisor_callback;
> > +   ctxt->failsafe_callback_eip =
> > +   (unsigned
> > long)xen_failsafe_callback;
> > +   }
> > +   ctxt->user_regs.cs = __KERNEL_CS;
> > +   ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct
> > pt_regs); 
> > per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
> > ctxt->ctrlreg[3] =
> > xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
> 
> The tradional path looks the same as before, however it is hard to
> tell whether the PVH path is correct without the Xen side. For
> example, what is gdtsz?

gdtsz is GUEST_GDTR_LIMIT and gdtaddr is GUEST_GDTR_BASE in the vmcs.
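
As a rough illustration of how the two halves of that union would be filled
in on the Linux side (a sketch pieced together from the discussion, not code
quoted from the series):

	struct desc_struct *gdt = get_cpu_gdt_table(cpu);

	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
		/* classic PV: GDT handed over as machine frames + entry count */
		ctxt->u.pv.gdt_frames[0] = arbitrary_virt_to_mfn(gdt);
		ctxt->u.pv.gdt_ents      = GDT_ENTRIES;
	} else {
		/* PVH (sketch): linear GDTR base/size, ending up in the vmcs
		 * as GUEST_GDTR_BASE / GUEST_GDTR_LIMIT */
		ctxt->u.pvh.gdtaddr = (unsigned long)gdt;
		ctxt->u.pvh.gdtsz   = GDT_ENTRIES * sizeof(struct desc_struct);
	}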

thanks,
Mukesh


Re: [Xen-devel] [PATCH V3 6/6]: PVH:privcmd changes.

2012-10-18 Thread Mukesh Rathor
On Thu, 18 Oct 2012 11:35:53 +0100
Ian Campbell  wrote:

> 
> > @@ -439,6 +490,19 @@ static long privcmd_ioctl(struct file *file,
> > return ret;
> >  }
> >  
> > +static void privcmd_close(struct vm_area_struct *vma)
> > +{
> > +   struct page **pages = vma ? vma->vm_private_data : NULL;
> 
> Can VMA really be NULL?...

Good programming I thought!

> > +   int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> 
> ...I assume not since you unconditionally dereference it here.

Added this later, and just didn't change the earlier part.

> > +   if (!pages || !numpgs
> > || !xen_feature(XENFEAT_auto_translated_physmap))
> 
> In the non-xlat case pages will (or should!) be 1 here which will pass
> the first clause of the test.
> 
> Although the later clauses will catch this I think it would be worth
> ordering the checks such that they are each valid, perhaps by pulling
> the feature check to the front or by separating the !xlat case from
> the other two which are valid iff xlat == True.
> 



Re: [Xen-devel] [PATCH V3 4/6]: PVH:bootup and setup related changes.

2012-10-18 Thread Mukesh Rathor
On Thu, 18 Oct 2012 11:49:31 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-18 at 01:32 +0100, Mukesh Rathor wrote:
> > 
> > +   /* PVH TBD/FIXME: vcpu info placement in phase 2 */
> > +   if (xen_pvh_domain())
> > +   return; 
> 
> Do you have a list of future work before PVH is "feature complete"
> somewhere?
> 
> I wouldn't count vcpu placement specifically against completeness but
> the TBD triggered the thought.
> 
> For example I was reviewing a migration patch this morning and was
> wondering if that sort of thing had been investigated yet.
> 
> Ian.

Yes, I was gonna post it in a separate email, so one can easily
find/track it. 

No, haven't thought of migration, phase II or III.

thanks
Mukesh


Re: [Xen-devel] [PATCH V3 2/6]: (RESEND) PVH: use native irq, enable callback, use HVM ring ops, smp, ...

2012-10-18 Thread Mukesh Rathor
PVH: make gdt_frames[]/gdt_ents into a union with {gdtaddr, gdtsz}, PVH
only needs to send down gdtaddr and gdtsz. irq.c: PVH uses
native_irq_ops. vcpu hotplug is currently not available for PVH.
events.c: setup callback vector for PVH. Finally, PVH ring ops uses HVM
paths for xenbus.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/include/asm/xen/interface.h |   11 +-
 arch/x86/xen/irq.c   |5 ++-
 arch/x86/xen/p2m.c   |2 +-
 arch/x86/xen/smp.c   |   75 ++
 drivers/xen/cpu_hotplug.c|4 +-
 drivers/xen/events.c |9 -
 drivers/xen/xenbus/xenbus_client.c   |3 +-
 7 files changed, 77 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/xen/interface.h 
b/arch/x86/include/asm/xen/interface.h
index 555f94d..ac5ef76 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -143,7 +143,16 @@ struct vcpu_guest_context {
 struct cpu_user_regs user_regs; /* User-level CPU registers */
 struct trap_info trap_ctxt[256];/* Virtual IDT  */
 unsigned long ldt_base, ldt_ents;   /* LDT (linear address, # ents) */
-unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
+union {
+   struct {
+   /* PV: GDT (machine frames, # ents).*/
+   unsigned long gdt_frames[16], gdt_ents;
+   } pv;
+   struct {
+   /* PVH: GDTR addr and size */
+   unsigned long gdtaddr, gdtsz;
+   } pvh;
+} u;
 unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1)   */
 /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
 unsigned long ctrlreg[8];   /* CR0-CR7 (control registers)  */
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 1573376..31959a7 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
 
 void __init xen_init_irq_ops(void)
 {
-   pv_irq_ops = xen_irq_ops;
+   /* For PVH we use default pv_irq_ops settings */
+   if (!xen_feature(XENFEAT_hvm_callback_vector))
+   pv_irq_ops = xen_irq_ops;
x86_init.irqs.intr_init = xen_init_IRQ;
 }
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 95fb2aa..ea553c8 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -798,7 +798,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long 
mfn)
 {
unsigned topidx, mididx, idx;
 
-   if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
return true;
}
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index f58dca7..cda1907 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -68,9 +68,11 @@ static void __cpuinit cpu_bringup(void)
touch_softlockup_watchdog();
preempt_disable();
 
-   xen_enable_sysenter();
-   xen_enable_syscall();
-
+   /* PVH runs in ring 0 and allows us to do native syscalls. Yay! */
+   if (!xen_feature(XENFEAT_supervisor_mode_kernel)) {
+   xen_enable_sysenter();
+   xen_enable_syscall();
+   }
cpu = smp_processor_id();
smp_store_cpu_info(cpu);
cpu_data(cpu).x86_max_cores = 1;
@@ -230,10 +232,11 @@ static void __init xen_smp_prepare_boot_cpu(void)
BUG_ON(smp_processor_id() != 0);
native_smp_prepare_boot_cpu();
 
-   /* We've switched to the "real" per-cpu gdt, so make sure the
-  old memory can be recycled */
-   make_lowmem_page_readwrite(xen_initial_gdt);
-
+   if (!xen_feature(XENFEAT_writable_page_tables)) {
+   /* We've switched to the "real" per-cpu gdt, so make sure the
+* old memory can be recycled */
+   make_lowmem_page_readwrite(xen_initial_gdt);
+   }
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
 }
@@ -300,8 +303,6 @@ cpu_initialize_context(unsigned int cpu, struct task_struct 
*idle)
gdt = get_cpu_gdt_table(cpu);
 
ctxt->flags = VGCF_IN_KERNEL;
-   ctxt->user_regs.ds = __USER_DS;
-   ctxt->user_regs.es = __USER_DS;
ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
ctxt->user_regs.fs = __KERNEL_PERCPU;
@@ -310,35 +311,57 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-   ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 
memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-   xen_copy_tra

Re: [Xen-devel] [PATCH V3 2/6]: PVH: use native irq, enable callback, use HVM ring ops, smp, ...

2012-10-18 Thread Mukesh Rathor
On Thu, 18 Oct 2012 11:44:37 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-18 at 01:30 +0100, Mukesh Rathor wrote:
> > PVH: make gdt_frames[]/gdt_ents into a union with {gdtaddr, gdtsz},
> > PVH only needs to send down gdtaddr and gdtsz. irq.c: PVH uses
> > native_irq_ops. vcpu hotplug is currently not available for PVH.
> > events.c: setup callback vector for PVH. smp.c: This pertains to
> > bringing up smp vcpus. PVH runs in ring 0, so syscalls are native.
> > Also, the vcpu context is sent down via the hcall to be set in the
> > vmcs. gdtaddr and gdtsz are unionized as PVH only needs to send
> > these two to be set in the vmcs. Finally, PVH ring ops uses HVM
> > paths for xenbus.
> > 
> > Signed-off-by: Mukesh Rathor 
> > ---
> >  arch/x86/include/asm/xen/interface.h |   11 +-
> >  arch/x86/xen/irq.c   |5 ++-
> >  arch/x86/xen/p2m.c   |2 +-
> >  arch/x86/xen/smp.c   |   75
> > ++ drivers/xen/cpu_hotplug.c
> > |4 +- drivers/xen/events.c |9 -
> >  drivers/xen/xenbus/xenbus_client.c   |3 +-
> 
> This patch seems to have been horribly whitespace damaged.

Hmm.. not sure what happened. Resending. See another email.

> Have you seen "git send-email" ? It's very useful for avoiding this
> sort of thing and also takes a lot of the grunt work out of reposting
> a series.

> It also chains the patches as replies to the introductory zero-th mail
> -- which is something I've been meaning to ask you to do for a while.
> It's useful because it joins the series together in a thread which
> makes it easier to keep track of in my INBOX.

I prefer different thread-sets for different versions. Otherwise too
many emails make it much harder to manage and follow. Besides, as
comments come in, patch x of y may contain different files/changes in
a subsequent version.

thanks
Mukesh


Re: [PATCH V3 3/6]: PVH: mmu related changes.

2012-10-18 Thread Mukesh Rathor
On Thu, 18 Oct 2012 12:31:08 +0100
Stefano Stabellini  wrote:

> On Thu, 18 Oct 2012, Mukesh Rathor wrote:
> > PVH: This patch implements mmu changes for PVH. First the set/clear
> > mmio pte function makes a hypercall to update the p2m in xen with
> > 1:1 mapping. PVH uses mostly native mmu ops. Two local functions
> > are introduced to add to xen physmap for xen remap interface. xen
> > unmap interface is introduced so the privcmd pte entries can be
> > cleared in xen p2m table.
> > 
> > Signed-off-by: Mukesh Rathor 
> > ---
> >  arch/x86/xen/mmu.c|  174
> > ++---
> > arch/x86/xen/mmu.h|2 + drivers/xen/privcmd.c |5 +-
> >  include/xen/xen-ops.h |5 +-
> >  4 files changed, 174 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index 5a16824..5ed3b3e 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -73,6 +73,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include "multicalls.h"
> >  #include "mmu.h"
> > @@ -331,6 +332,20 @@ static void xen_set_pte(pte_t *ptep, pte_t
> > pteval) __xen_set_pte(ptep, pteval);
> >  }
> >  
> > +void xen_set_clr_mmio_pvh_pte(unsigned long pfn, unsigned long mfn,
> > + int nr_mfns, int add_mapping)
> > +{
> > +   struct physdev_map_iomem iomem;
> > +
> > +   iomem.first_gfn = pfn;
> > +   iomem.first_mfn = mfn;
> > +   iomem.nr_mfns = nr_mfns;
> > +   iomem.add_mapping = add_mapping;
> > +
> > +   if (HYPERVISOR_physdev_op(PHYSDEVOP_pvh_map_iomem, &iomem))
> > +   BUG();
> > +}
> 
> You introduce this function here but it is unused. It is not clear
> from the patch description why you are introducing it.
> 
> 
> >  static void xen_set_pte_at(struct mm_struct *mm, unsigned long
> > addr, pte_t *ptep, pte_t pteval)
> >  {
> > @@ -1220,6 +1235,8 @@ static void __init xen_pagetable_init(void)
> >  #endif
> > paging_init();
> > xen_setup_shared_info();
> > +   if (xen_feature(XENFEAT_auto_translated_physmap))
> > +   return;
> >  #ifdef CONFIG_X86_64
> > if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> > unsigned long new_mfn_list;
> > @@ -1527,6 +1544,10 @@ static void __init xen_set_pte_init(pte_t
> > *ptep, pte_t pte) static void pin_pagetable_pfn(unsigned cmd,
> > unsigned long pfn) {
> > struct mmuext_op op;
> > +
> > +   if (xen_feature(XENFEAT_writable_page_tables))
> > +   return;
> > +
> > op.cmd = cmd;
> > op.arg1.mfn = pfn_to_mfn(pfn);
> > if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
> > @@ -1724,6 +1745,10 @@ static void set_page_prot(void *addr,
> > pgprot_t prot) unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
> > pte_t pte = pfn_pte(pfn, prot);
> >  
> > +   /* recall for PVH, page tables are native. */
> > +   if (xen_feature(XENFEAT_auto_translated_physmap))
> > +   return;
> > +
> > if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte,
> > 0)) BUG();
> >  }
> > @@ -1801,6 +1826,9 @@ static void convert_pfn_mfn(void *v)
> > pte_t *pte = v;
> > int i;
> >  
> > +   if (xen_feature(XENFEAT_auto_translated_physmap))
> > +   return;
> > +
> > /* All levels are converted the same way, so just treat
> > them as ptes. */
> > for (i = 0; i < PTRS_PER_PTE; i++)
> > @@ -1820,6 +1848,7 @@ static void __init check_pt_base(unsigned
> > long *pt_base, unsigned long *pt_end, (*pt_end)--;
> > }
> >  }
> > +
> >  /*
> >   * Set up the initial kernel pagetable.
> >   *
> > @@ -1830,6 +1859,7 @@ static void __init check_pt_base(unsigned
> > long *pt_base, unsigned long *pt_end,
> >   * but that's enough to get __va working.  We need to fill in the
> > rest
> >   * of the physical mapping once some sort of allocator has been set
> >   * up.
> > + * NOTE: for PVH, the page tables are native.
> >   */
> >  void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long
> > max_pfn) {
> > @@ -1907,10 +1937,13 @@ void __init
> > xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
> >  * structure to attach it to, so make sure we just set
> > kernel
> >  * pgd.
> >  */
> > -   xen_mc_batch();
> > 

[PATCH V3 6/6]: PVH:privcmd changes.

2012-10-17 Thread Mukesh Rathor
PVH: privcmd changes. PVH only supports the batch interface. To map a foreign 
page to a process, pfn must be allocated. PVH path uses ballooning for that 
purpose. The returned pfn is then mapped to the foreign page. 
xen_unmap_domain_mfn_range() is introduced to unmap these pages via the privcmd 
close call.

Signed-off-by: Mukesh Rathor 
---
 drivers/xen/privcmd.c |   69 +++-
 1 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 63d9ee8..835166a 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -33,11 +33,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "privcmd.h"
 
 MODULE_LICENSE("GPL");
 
+#define PRIV_VMA_LOCKED ((void *)1)
+
 #ifndef HAVE_ARCH_PRIVCMD_MMAP
 static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma);
 #endif
@@ -199,6 +202,10 @@ static long privcmd_ioctl_mmap(void __user *udata)
if (!xen_initial_domain())
return -EPERM;
 
+   /* We only support privcmd_ioctl_mmap_batch for auto translated. */
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return -ENOSYS;
+
if (copy_from_user(&mmapcmd, udata, sizeof(mmapcmd)))
return -EFAULT;
 
@@ -246,6 +253,7 @@ struct mmap_batch_state {
domid_t domain;
unsigned long va;
struct vm_area_struct *vma;
+   int index;
/* A tristate:
 *  0 for no errors
 *  1 if at least one error has happened (and no
@@ -260,15 +268,24 @@ struct mmap_batch_state {
xen_pfn_t __user *user_mfn;
 };
 
+/* auto translated dom0 note: if domU being created is PV, then mfn is
+ * mfn(addr on bus). If it's auto xlated, then mfn is pfn (input to HAP).
+ */
 static int mmap_batch_fn(void *data, void *state)
 {
xen_pfn_t *mfnp = data;
struct mmap_batch_state *st = state;
+   struct vm_area_struct *vma = st->vma;
+   struct page **pages = vma->vm_private_data;
+   struct page *cur_page = NULL;
int ret;
 
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   cur_page = pages[st->index++];
+
ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
 st->vma->vm_page_prot, st->domain,
-NULL);
+&cur_page);
 
/* Store error code for second pass. */
*(st->err++) = ret;
@@ -304,6 +321,32 @@ static int mmap_return_errors_v1(void *data, void *state)
return __put_user(*mfnp, st->user_mfn++);
 }
 
+/* Allocate pfns that are then mapped with gmfns from foreign domid. Update
+ * the vma with the page info to use later.
+ * Returns: 0 if success, otherwise -errno
+ */
+static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
+{
+   int rc;
+   struct page **pages;
+
+   pages = kcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);
+   if (pages == NULL)
+   return -ENOMEM;
+
+   rc = alloc_xenballooned_pages(numpgs, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Could not alloc %d pfns rc:%d\n", __func__,
+   numpgs, rc);
+   kfree(pages);
+   return -ENOMEM;
+   }
+   BUG_ON(vma->vm_private_data != PRIV_VMA_LOCKED);
+   vma->vm_private_data = pages;
+
+   return 0;
+}
+
 static struct vm_operations_struct privcmd_vm_ops;
 
 static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
@@ -371,10 +414,18 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
up_write(&mm->mmap_sem);
goto out;
}
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   ret = alloc_empty_pages(vma, m.num);
+   if (ret < 0) {
+   up_write(&mm->mmap_sem);
+   goto out;
+   }
+   }
 
state.domain= m.dom;
state.vma   = vma;
state.va= m.addr;
+   state.index = 0;
state.global_error  = 0;
state.err   = err_array;
 
@@ -439,6 +490,19 @@ static long privcmd_ioctl(struct file *file,
return ret;
 }
 
+static void privcmd_close(struct vm_area_struct *vma)
+{
+   struct page **pages = vma ? vma->vm_private_data : NULL;
+   int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+
+   if (!pages || !numpgs || !xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
+   xen_unmap_domain_mfn_range(vma, numpgs, pages);
+   free_xenballooned_pages(numpgs, pages);
+   kfree(pages);
+}
+
 static int privcmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
printk(KERN_DEBUG "privcmd_fau

[PATCH V3 5/6]: PVH:balloon and grant changes

2012-10-17 Thread Mukesh Rathor
PVH: balloon and grant changes. For balloon changes we skip setting of local 
p2m as it's updated in xen. For grant, the shared grant frame is the pfn and 
not mfn, hence it's mapped via the same code path as HVM.

Signed-off-by: Mukesh Rathor 
---
 drivers/xen/balloon.c |   15 +--
 drivers/xen/gntdev.c  |3 ++-
 drivers/xen/grant-table.c |   26 ++
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 31ab82f..c825b63 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -361,7 +361,9 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
set_phys_to_machine(pfn, frame_list[i]);
 
/* Link back into the page tables if not highmem. */
-   if (xen_pv_domain() && !PageHighMem(page)) {
+   if (xen_pv_domain() && !PageHighMem(page) &&
+   !xen_feature(XENFEAT_auto_translated_physmap)) {
+
int ret;
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
@@ -418,12 +420,13 @@ static enum bp_state decrease_reservation(unsigned long 
nr_pages, gfp_t gfp)
scrub_page(page);
 
if (xen_pv_domain() && !PageHighMem(page)) {
-   ret = HYPERVISOR_update_va_mapping(
-   (unsigned long)__va(pfn << PAGE_SHIFT),
-   __pte_ma(0), 0);
-   BUG_ON(ret);
+   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ret = HYPERVISOR_update_va_mapping(
+   (unsigned long)__va(pfn << PAGE_SHIFT),
+   __pte_ma(0), 0);
+   BUG_ON(ret);
+   }
}
-
}
 
/* Ensure that ballooned highmem pages don't have kmaps. */
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 5df9fd8..36ec380 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -803,7 +803,8 @@ static int __init gntdev_init(void)
if (!xen_domain())
return -ENODEV;
 
-   use_ptemod = xen_pv_domain();
+   use_ptemod = xen_pv_domain() &&
+!xen_feature(XENFEAT_auto_translated_physmap);
 
err = misc_register(&gntdev_miscdev);
if (err != 0) {
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index f37faf6..1b851fa 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -976,14 +976,19 @@ static void gnttab_unmap_frames_v2(void)
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
struct gnttab_setup_table setup;
-   unsigned long *frames;
+   unsigned long *frames, start_gpfn;
unsigned int nr_gframes = end_idx + 1;
int rc;
 
-   if (xen_hvm_domain()) {
+   if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap)) {
struct xen_add_to_physmap xatp;
unsigned int i = end_idx;
rc = 0;
+
+   if (xen_hvm_domain())
+   start_gpfn = xen_hvm_resume_frames >> PAGE_SHIFT;
+   else
+   start_gpfn = virt_to_pfn(gnttab_shared.addr);
/*
 * Loop backwards, so that the first hypercall has the largest
 * index, ensuring that the table will grow only once.
@@ -992,7 +997,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int 
end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
-   xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
+   xatp.gpfn = start_gpfn + i;
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
printk(KERN_WARNING
@@ -1055,7 +1060,7 @@ static void gnttab_request_version(void)
int rc;
struct gnttab_set_version gsv;
 
-   if (xen_hvm_domain())
+   if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap))
gsv.version = 1;
else
gsv.version = 2;
@@ -1083,12 +1088,25 @@ static void gnttab_request_version(void)
 int gnttab_resume(void)
 {
unsigned int max_nr_gframes;
+   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
 
+   /* PVH note: xen will free existing kmalloc'd mfn in
+* XENMEM_a

[PATCH V3 4/6]: PVH:bootup and setup related changes.

2012-10-17 Thread Mukesh Rathor
PVH: bootup and setup related changes. enlighten.c: for PVH we can trap cpuid 
via vmexit, so don't need to use emulated prefix call. Check for vector 
callback early on, as it is a required feature. PVH runs at default kernel 
iopl. setup.c: in xen_add_extra_mem() we can skip updating p2m as it's managed 
by xen. PVH maps the entire IO space, but only RAM pages need to be 
repopulated. Finally, pure PV settings are moved to a separate function that is 
only called for pure pv, ie, pv with pvmmu.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/enlighten.c |   77 ++---
 arch/x86/xen/setup.c |   64 +++---
 2 files changed, 110 insertions(+), 31 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 2d932c3..18f5514 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -106,6 +107,9 @@ RESERVE_BRK(shared_info_page_brk, PAGE_SIZE);
 __read_mostly int xen_have_vector_callback;
 EXPORT_SYMBOL_GPL(xen_have_vector_callback);
 
+#define xen_pvh_domain() (xen_pv_domain() && \
+ xen_feature(XENFEAT_auto_translated_physmap) && \
+ xen_have_vector_callback)
 /*
  * Point at some empty memory to start with. We map the real shared_info
  * page as soon as fixmap is up and running.
@@ -218,8 +222,9 @@ static void __init xen_banner(void)
struct xen_extraversion extra;
HYPERVISOR_xen_version(XENVER_extraversion, &extra);
 
-   printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
-  pv_info.name);
+   pr_info("Booting paravirtualized kernel %son %s\n",
+   xen_feature(XENFEAT_auto_translated_physmap) ?
+   "with PVH extensions " : "", pv_info.name);
printk(KERN_INFO "Xen version: %d.%d%s%s\n",
   version >> 16, version & 0x, extra.extraversion,
   xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " 
(preserve-AD)" : "");
@@ -272,12 +277,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
break;
}
 
-   asm(XEN_EMULATE_PREFIX "cpuid"
-   : "=a" (*ax),
- "=b" (*bx),
- "=c" (*cx),
- "=d" (*dx)
-   : "0" (*ax), "2" (*cx));
+   if (xen_pvh_domain())
+   native_cpuid(ax, bx, cx, dx);
+   else
+   asm(XEN_EMULATE_PREFIX "cpuid"
+   : "=a" (*ax),
+   "=b" (*bx),
+   "=c" (*cx),
+   "=d" (*dx)
+   : "0" (*ax), "2" (*cx));
 
*bx &= maskebx;
*cx &= maskecx;
@@ -1045,6 +1053,10 @@ void xen_setup_shared_info(void)
HYPERVISOR_shared_info =
(struct shared_info *)__va(xen_start_info->shared_info);
 
+   /* PVH TBD/FIXME: vcpu info placement in phase 2 */
+   if (xen_pvh_domain())
+   return;
+
 #ifndef CONFIG_SMP
/* In UP this is as good a place as any to set up shared info */
xen_setup_vcpu_info_placement();
@@ -1275,6 +1287,11 @@ static const struct machine_ops xen_machine_ops 
__initconst = {
  */
 static void __init xen_setup_stackprotector(void)
 {
+   /* PVH TBD/FIXME: investigate setup_stack_canary_segment */
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   switch_to_new_gdt(0);
+   return;
+   }
pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
pv_cpu_ops.load_gdt = xen_load_gdt_boot;
 
@@ -1285,6 +1302,19 @@ static void __init xen_setup_stackprotector(void)
pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+static void __init xen_pvh_early_guest_init(void)
+{
+   if (xen_feature(XENFEAT_hvm_callback_vector))
+   xen_have_vector_callback = 1;
+
+#ifdef CONFIG_X86_32
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   xen_raw_printk("ERROR: 32bit PVH guests are not supported\n");
+   BUG();
+   }
+#endif
+}
+
 /* First C function to be called on Xen boot */
 asmlinkage void __init xen_start_kernel(void)
 {
@@ -1296,13 +1326,18 @@ asmlinkage void __init xen_start_kernel(void)
 
xen_domain_type = XEN_PV_DOMAIN;
 
+   xen_setup_features();
+   xen_pvh_early_guest_init();
xen_setup_machphys_mapping();
 
/* Install Xen paravirt ops */
pv_info = xen_info;
pv_init_ops = xen_init_ops;
-   pv_cpu_ops = xen_cpu_ops;
pv_apic_ops = xen_apic_ops;
+   if (xen_pvh_domain())
+   pv_cpu_ops.cpuid

[PATCH V3 3/6]: PVH: mmu related changes.

2012-10-17 Thread Mukesh Rathor
PVH: This patch implements mmu changes for PVH. First the set/clear mmio pte 
function makes a hypercall to update the p2m in xen with 1:1 mapping. PVH uses 
mostly native mmu ops. Two local functions are introduced to add to xen physmap 
for xen remap interface. xen unmap interface is introduced so the privcmd pte 
entries can be cleared in xen p2m table.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/xen/mmu.c|  174 ++---
 arch/x86/xen/mmu.h|2 +
 drivers/xen/privcmd.c |5 +-
 include/xen/xen-ops.h |5 +-
 4 files changed, 174 insertions(+), 12 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5a16824..5ed3b3e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -73,6 +73,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "multicalls.h"
 #include "mmu.h"
@@ -331,6 +332,20 @@ static void xen_set_pte(pte_t *ptep, pte_t pteval)
__xen_set_pte(ptep, pteval);
 }
 
+void xen_set_clr_mmio_pvh_pte(unsigned long pfn, unsigned long mfn,
+ int nr_mfns, int add_mapping)
+{
+   struct physdev_map_iomem iomem;
+
+   iomem.first_gfn = pfn;
+   iomem.first_mfn = mfn;
+   iomem.nr_mfns = nr_mfns;
+   iomem.add_mapping = add_mapping;
+
+   if (HYPERVISOR_physdev_op(PHYSDEVOP_pvh_map_iomem, &iomem))
+   BUG();
+}
+
 static void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval)
 {
@@ -1220,6 +1235,8 @@ static void __init xen_pagetable_init(void)
 #endif
paging_init();
xen_setup_shared_info();
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
 #ifdef CONFIG_X86_64
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
unsigned long new_mfn_list;
@@ -1527,6 +1544,10 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t 
pte)
 static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
 {
struct mmuext_op op;
+
+   if (xen_feature(XENFEAT_writable_page_tables))
+   return;
+
op.cmd = cmd;
op.arg1.mfn = pfn_to_mfn(pfn);
if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
@@ -1724,6 +1745,10 @@ static void set_page_prot(void *addr, pgprot_t prot)
unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
pte_t pte = pfn_pte(pfn, prot);
 
+   /* recall for PVH, page tables are native. */
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, 0))
BUG();
 }
@@ -1801,6 +1826,9 @@ static void convert_pfn_mfn(void *v)
pte_t *pte = v;
int i;
 
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
/* All levels are converted the same way, so just treat them
   as ptes. */
for (i = 0; i < PTRS_PER_PTE; i++)
@@ -1820,6 +1848,7 @@ static void __init check_pt_base(unsigned long *pt_base, 
unsigned long *pt_end,
(*pt_end)--;
}
 }
+
 /*
  * Set up the initial kernel pagetable.
  *
@@ -1830,6 +1859,7 @@ static void __init check_pt_base(unsigned long *pt_base, 
unsigned long *pt_end,
  * but that's enough to get __va working.  We need to fill in the rest
  * of the physical mapping once some sort of allocator has been set
  * up.
+ * NOTE: for PVH, the page tables are native.
  */
 void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
@@ -1907,10 +1937,13 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, 
unsigned long max_pfn)
 * structure to attach it to, so make sure we just set kernel
 * pgd.
 */
-   xen_mc_batch();
-   __xen_write_cr3(true, __pa(init_level4_pgt));
-   xen_mc_issue(PARAVIRT_LAZY_CPU);
-
+   if (xen_feature(XENFEAT_writable_page_tables)) {
+   native_write_cr3(__pa(init_level4_pgt));
+   } else {
+   xen_mc_batch();
+   __xen_write_cr3(true, __pa(init_level4_pgt));
+   xen_mc_issue(PARAVIRT_LAZY_CPU);
+   }
/* We can't that easily rip out L3 and L2, as the Xen pagetables are
 * set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
 * the initial domain. For guests using the toolstack, they are in:
@@ -2177,8 +2210,20 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = 
{
 
 void __init xen_init_mmu_ops(void)
 {
-   x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
x86_init.paging.pagetable_init = xen_pagetable_init;
+
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
+#if 0
+   /* For PCI devices to map iomem. */
+   if (xen_initial_domain()) {
+   pv_mmu_ops.set_pte = native_set_pte;
+ 

[PATCH V3 2/6]: PVH: use native irq, enable callback, use HVM ring ops, smp, ...

2012-10-17 Thread Mukesh Rathor
PVH: make gdt_frames[]/gdt_ents into a union with {gdtaddr, gdtsz}, PVH
only needs to send down gdtaddr and gdtsz. irq.c: PVH uses
native_irq_ops. vcpu hotplug is currently not available for PVH.
events.c: setup callback vector for PVH. smp.c: This pertains to
bringing up smp vcpus. PVH runs in ring 0, so syscalls are native.
Also, the vcpu context is sent down via the hcall to be set in the
vmcs. gdtaddr and gdtsz are unionized as PVH only needs to send
these two to be set in the vmcs. Finally, PVH ring ops uses HVM paths
for xenbus.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/include/asm/xen/interface.h |   11 +-
 arch/x86/xen/irq.c   |5 ++-
 arch/x86/xen/p2m.c   |2 +-
 arch/x86/xen/smp.c   |   75
++ drivers/xen/cpu_hotplug.c
|4 +- drivers/xen/events.c |9 -
 drivers/xen/xenbus/xenbus_client.c   |3 +-
 7 files changed, 77 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/xen/interface.h
b/arch/x86/include/asm/xen/interface.h index 555f94d..ac5ef76 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -143,7 +143,16 @@ struct vcpu_guest_context {
 struct cpu_user_regs user_regs; /* User-level CPU
registers */ struct trap_info trap_ctxt[256];/* Virtual
IDT  */ unsigned long ldt_base, ldt_ents;   /* LDT
(linear address, # ents) */
-unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, #
ents) */
+union {
+   struct {
+   /* PV: GDT (machine frames, # ents).*/
+   unsigned long gdt_frames[16], gdt_ents;
+   } pv;
+   struct {
+   /* PVH: GDTR addr and size */
+   unsigned long gdtaddr, gdtsz;
+   } pvh;
+} u;
 unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only
SS1/SP1)   */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1].
*/ unsigned long ctrlreg[8];   /* CR0-CR7 (control
registers)  */ diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 1573376..31959a7 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops
__initconst = { 
 void __init xen_init_irq_ops(void)
 {
-   pv_irq_ops = xen_irq_ops;
+   /* For PVH we use default pv_irq_ops settings */
+   if (!xen_feature(XENFEAT_hvm_callback_vector))
+   pv_irq_ops = xen_irq_ops;
x86_init.irqs.intr_init = xen_init_IRQ;
 }
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 95fb2aa..ea553c8 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -798,7 +798,7 @@ bool __set_phys_to_machine(unsigned long pfn,
unsigned long mfn) {
unsigned topidx, mididx, idx;
 
-   if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
return true;
}
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index f58dca7..cda1907 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -68,9 +68,11 @@ static void __cpuinit cpu_bringup(void)
touch_softlockup_watchdog();
preempt_disable();
 
-   xen_enable_sysenter();
-   xen_enable_syscall();
-
+   /* PVH runs in ring 0 and allows us to do native syscalls.
Yay! */
+   if (!xen_feature(XENFEAT_supervisor_mode_kernel)) {
+   xen_enable_sysenter();
+   xen_enable_syscall();
+   }
cpu = smp_processor_id();
smp_store_cpu_info(cpu);
cpu_data(cpu).x86_max_cores = 1;
@@ -230,10 +232,11 @@ static void __init xen_smp_prepare_boot_cpu(void)
BUG_ON(smp_processor_id() != 0);
native_smp_prepare_boot_cpu();
 
-   /* We've switched to the "real" per-cpu gdt, so make sure the
-  old memory can be recycled */
-   make_lowmem_page_readwrite(xen_initial_gdt);
-
+   if (!xen_feature(XENFEAT_writable_page_tables)) {
+   /* We've switched to the "real" per-cpu gdt, so make
sure the
+* old memory can be recycled */
+   make_lowmem_page_readwrite(xen_initial_gdt);
+   }
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
 }
@@ -300,8 +303,6 @@ cpu_initialize_context(unsigned int cpu, struct
task_struct *idle) gdt = get_cpu_gdt_table(cpu);
 
ctxt->flags = VGCF_IN_KERNEL;
-   ctxt->user_regs.ds = __USER_DS;
-   ctxt->user_regs.es = __USER_DS;
ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
ctxt->user_regs.fs = __KERNEL_PERCPU;
@@ -310,35 +311,57 @@ cpu_initialize_context(unsigned int cpu, struct
task_struct *idle) ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
ctxt->user_r

[PATCH V3 1/6]: PVH: basic and header changes, elfnote changes, ...

2012-10-17 Thread Mukesh Rathor
[PATCH 1/6] PVH: is a PV linux guest that has extended capabilities. This patch 
allows it to be configured and enabled. Also, basic header file changes to add 
new subcalls to physmap hypercall. Lastly, mfn_to_local_pfn must return mfn for 
paging mode translate.

Signed-off-by: Mukesh Rathor 
---
 arch/x86/include/asm/xen/page.h |3 +++
 arch/x86/xen/Kconfig|   10 ++
 arch/x86/xen/xen-head.S |   11 ++-
 include/xen/interface/memory.h  |   24 +++-
 include/xen/interface/physdev.h |   10 ++
 5 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 472b9b7..6af440d 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -159,6 +159,9 @@ static inline xpaddr_t machine_to_phys(xmaddr_t machine)
 static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
 {
unsigned long pfn = mfn_to_pfn(mfn);
+
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return mfn;
if (get_phys_to_machine(pfn) != mfn)
return -1; /* force !pfn_valid() */
return pfn;
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index fdce49c..822c5a0 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -50,3 +50,13 @@ config XEN_DEBUG_FS
  Enable statistics output and various tuning options in debugfs.
  Enabling this option may incur a significant performance overhead.
 
+config XEN_X86_PVH
+   bool "Support for running as a PVH guest (EXPERIMENTAL)"
+   depends on X86_64 && XEN && EXPERIMENTAL
+   default n
+   help
+  This option enables support for running as a PVH guest (PV guest
+  using hardware extensions) under a suitably capable hypervisor.
+  This option is EXPERIMENTAL because the hypervisor interfaces
+  which it uses are not yet considered stable therefore backwards and
+  forwards compatibility is not yet guaranteed.  If unsure, say N.
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7faed58..1a6bca1 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -13,6 +13,15 @@
 #include 
 #include 
 
+#ifdef CONFIG_XEN_X86_PVH
+#define FEATURES_PVH "|writable_descriptor_tables" \
+"|auto_translated_physmap" \
+"|supervisor_mode_kernel" \
+"|hvm_callback_vector"
+#else
+#define FEATURES_PVH /* Not supported */
+#endif
+
__INIT
 ENTRY(startup_xen)
cld
@@ -95,7 +104,7 @@ NEXT_HYPERCALL(arch_6)
 #endif
ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,  _ASM_PTR startup_xen)
ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
-   ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz 
"!writable_page_tables|pae_pgdir_above_4gb")
+   ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz 
"!writable_page_tables|pae_pgdir_above_4gb"FEATURES_PVH)
ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,   .asciz "yes")
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz "generic")
ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index d8e33a9..425911f 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -169,7 +169,13 @@ struct xen_add_to_physmap {
 /* Source mapping space. */
 #define XENMAPSPACE_shared_info 0 /* shared info page */
 #define XENMAPSPACE_grant_table 1 /* grant table page */
-unsigned int space;
+#define XENMAPSPACE_gmfn2 /* GMFN */
+#define XENMAPSPACE_gmfn_range  3 /* GMFN range */
+#define XENMAPSPACE_gmfn_foreign 4 /* GMFN from another guest */
+uint16_t space;
+domid_t foreign_domid; /* IFF XENMAPSPACE_gmfn_foreign */
+
+#define XENMAPIDX_grant_table_status 0x8000
 
 /* Index into source mapping space. */
 unsigned long idx;
@@ -237,4 +243,20 @@ DEFINE_GUEST_HANDLE_STRUCT(xen_memory_map);
  * during a driver critical region.
  */
 extern spinlock_t xen_reservation_lock;
+
+/*
+ * Unmaps the page appearing at a particular GPFN from the specified guest's
+ * pseudophysical address space.
+ * arg == addr of xen_remove_from_physmap_t.
+ */
+#define XENMEM_remove_from_physmap  15
+struct xen_remove_from_physmap {
+/* Which domain to change the mapping for. */
+domid_t domid;
+
+/* GPFN of the current mapping of the page. */
+xen_pfn_t gpfn;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
+
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index 9ce788d..3b9d5b6 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -258,6 +258,16 @@ struct physdev_pci_device {
 uint8_t devfn;
 };
 
+#define PHY

[PATCH V3 0/6]: PVH: PV guest with extensions

2012-10-17 Thread Mukesh Rathor
Hi guys,

Ok, I've made the changes from the previous V2 patch submission comments. Tested
all the combinations. I am building a xen patch just for the
corresponding header file changes. Following that I'll refresh the xen
tree, debug, test, and send patches.

As an introduction for the linux kernel mailing list: PVH is a PV guest
that can run in an HVM container; it uses native pagetables, a callback
vector, a native IDT, and native syscalls.
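
Concretely, the working definition used in patch 4/6 of this series (the
enlighten.c hunk found earlier in this archive) treats a domain as PVH when
all three of these conditions hold:

#define xen_pvh_domain() (xen_pv_domain() && \
			  xen_feature(XENFEAT_auto_translated_physmap) && \
			  xen_have_vector_callback)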

They were built on top of 89d0307af2b9957d59bfb2a86aaa57464ff921de
commit.

thanks,
Mukesh


Re: [PATCH 6/6] xen: x86 pvh: use XENMEM_add_to_physmap_range for foreign gmfn mappings

2012-10-17 Thread Mukesh Rathor
On Wed, 17 Oct 2012 12:32:12 +0100
Ian Campbell  wrote:

> Squeezing the necessary fields into the existing XENMEM_add_to_physmap
> interface was proving to be a bit tricky so we have decided to go with
> a new interface upstream (the XENMAPSPACE_gmfn_foreign interface using
> XENMEM_add_to_physmap was never committed anywhere). This interface
> also allows for batching which was impossible to support at the same
> time as foreign mfns in the old interface.
> 
> This reverts the relevant parts of "PVH: basic and header changes,
> elfnote changes, ..." and followups and trivially converts
> pvh_add_to_xen_p2m over.


Hmm... I don't see a Xen-side implementation of XENMEM_add_to_physmap_range,
and since I've already got my patches tested and cut, I'm going to send
them now. We can apply this change easily in Konrad's tree.
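
For context, mapping a single foreign gmfn with the existing (non-batched)
interface looks roughly like the sketch below, following the
xen_add_to_physmap fields shown in the interface header near the top of
this archive; the batched XENMEM_add_to_physmap_range simply extends this
to many mappings per hypercall. Treat the field names and includes as
assumptions for illustration, not the committed code.

#include <xen/interface/xen.h>      /* domid_t, DOMID_SELF */
#include <xen/interface/memory.h>   /* struct xen_add_to_physmap, XENMAPSPACE_* */
#include <asm/xen/hypercall.h>      /* HYPERVISOR_memory_op() */

/* Illustrative: map one gmfn of domain 'fdom' at local pfn 'gpfn'. */
static int map_one_foreign_gmfn(domid_t fdom, unsigned long gmfn,
				unsigned long gpfn)
{
	struct xen_add_to_physmap xatp = {
		.domid         = DOMID_SELF,
		.space         = XENMAPSPACE_gmfn_foreign,
		.foreign_domid = fdom,	/* only used for gmfn_foreign */
		.idx           = gmfn,	/* frame in the foreign guest */
		.gpfn          = gpfn,	/* where it appears locally */
	};

	return HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
}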

thanks,
mukesh



Re: [Xen-devel] [PATCH V2 6/7]: PVH: balloon and grant changes

2012-10-16 Thread Mukesh Rathor
On Fri, 12 Oct 2012 10:06:57 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-11 at 23:01 +0100, Mukesh Rathor wrote:
> > PVH: balloon and grant changes. For balloon changes we skip setting
> > of local p2m as it's updated in xen. For grant, the shared grant
> > frame is the pfn and not mfn, hence its mapped via the same code
> > path as HVM
> > 
> > Signed-off-by: Mukesh R 
> > ---
> >  drivers/xen/balloon.c |   18 +++---
> >  drivers/xen/gntdev.c  |3 ++-
> >  drivers/xen/grant-table.c |   25 +
> >  3 files changed, 34 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> > index 31ab82f..9b895ad 100644
> > --- a/drivers/xen/balloon.c
> > +++ b/drivers/xen/balloon.c
> > @@ -358,10 +358,13 @@ static enum bp_state
> > increase_reservation(unsigned long nr_pages)
> > BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
> > phys_to_machine_mapping_valid(pfn)); 
> > -   set_phys_to_machine(pfn, frame_list[i]);
> > +   if (!xen_feature(XENFEAT_auto_translated_physmap))
> > +   set_phys_to_machine(pfn, frame_list[i]);
> 
> set_phys_to_machine is a NOP if XENFEAT_auto_translated_physmap but it
> includes some useful sanity checks (BUG_ON(pfn != mfn && mfn !=
> INVALID_P2M_ENTRY)) so it would be useful to keep calling it I think.

ok, fine, done.
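
So the loop body ends up keeping the unconditional call and only guarding
the PV-specific pagetable linking, roughly as below. This is a sketch of
the agreed direction, not the committed hunk; mfn_pte()/PAGE_KERNEL and
the exact includes are assumptions for illustration.

#include <linux/mm.h>           /* struct page, page_to_pfn(), PageHighMem() */
#include <asm/pgtable.h>        /* PAGE_KERNEL */
#include <xen/xen.h>            /* xen_pv_domain() */
#include <xen/features.h>       /* xen_feature(), XENFEAT_* */
#include <asm/xen/page.h>       /* set_phys_to_machine(), mfn_pte() */
#include <asm/xen/hypercall.h>  /* HYPERVISOR_update_va_mapping() */

/* Sketch of the increase_reservation() per-page work after this review:
 * set_phys_to_machine() is called unconditionally (a checked no-op for
 * auto-translated guests); the va-mapping update stays PV-only. */
static void balloon_account_frame_sketch(struct page *page, unsigned long mfn)
{
	unsigned long pfn = page_to_pfn(page);

	set_phys_to_machine(pfn, mfn);

	if (xen_pv_domain() && !PageHighMem(page) &&
	    !xen_feature(XENFEAT_auto_translated_physmap)) {
		int ret = HYPERVISOR_update_va_mapping(
				(unsigned long)__va(pfn << PAGE_SHIFT),
				mfn_pte(mfn, PAGE_KERNEL), 0);
		BUG_ON(ret);
	}
}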

> > +!xen_feature(XENFEAT_auto_translated_physmap);
> >  
> > err = misc_register(&gntdev_miscdev);
> > if (err != 0) {
> > @@ -1055,7 +1060,7 @@ static void gnttab_request_version(void)
> > int rc;
> > struct gnttab_set_version gsv;
> >  
> > -   if (xen_hvm_domain())
> > +   if (xen_hvm_domain() ||
> > xen_feature(XENFEAT_auto_translated_physmap))
> 
> This is just a case of not yet implemented rather than a fundamental
> problem with v2 grant tables, correct? (i.e. future work)

Right. 

> > gsv.version = 1;
> > else
> > gsv.version = 2;
> > @@ -1083,12 +1088,24 @@ static void gnttab_request_version(void)
> >  int gnttab_resume(void)
> >  {
> > unsigned int max_nr_gframes;
> > +   char *kmsg = "Failed to kmalloc pages for pv in hvm grant
> > frames\n"; 
> > gnttab_request_version();
> > max_nr_gframes = gnttab_max_grant_frames();
> > if (max_nr_gframes < nr_grant_frames)
> > return -ENOSYS;
> >  
> > +   /* PVH note: xen will free existing kmalloc'd mfn in
> > +* XENMEM_add_to_physmap */
> 
> It doesn't leak it?
> We should consider using xenballoon_alloc_pages here.

Well, it's a one-time allocation of a dozen (iirc) pages, which are
returned to the dom heap. We could use xenballoon, but for that we
need to create the vma first so we get contiguous VA, then map each pte
since xenballoon_ doesn't return contiguous pages, and then populate the
physmap. I'm gonna punt it to phase II.  Let's get some working baseline
in asap.

thanks
mukesh



Re: [Xen-devel] [PATCH V2 2/7]: PVH: use native irq, enable callback, use HVM ring ops, ...

2012-10-16 Thread Mukesh Rathor
On Mon, 15 Oct 2012 09:58:17 +0100
Ian Campbell  wrote:

> On Fri, 2012-10-12 at 20:06 +0100, Mukesh Rathor wrote:
> > On Fri, 12 Oct 2012 09:52:17 +0100
> > Ian Campbell  wrote:
> > 
> > > >  drivers/xen/cpu_hotplug.c|4 +++-
> > > >  drivers/xen/events.c |9 -
> > > >  drivers/xen/xenbus/xenbus_client.c   |3 ++-
> > > >  7 files changed, 27 insertions(+), 8 deletions(-)
> > > > 
> > > union {
> > >   struct {
> > >   unsigned long gdt_frames[16], gdt_ents;
> > >   } pv;
> > >   struct {
> > >   unsigned long gdtaddr, gdtsz;
> > >   } pvh;
> > > } gdt;
> > > 
> > > (I've gone with naming the union gdt instead of u. You might want
> > > therefore to also drop the gdt prefix from the members?)
> > 
> > Is it worth it, I mean, making it a union? Would you be OK if I just
> > used gdt_frames[0] and gdt_ents for gdtaddr and size?
> 
> What's the problem with making it a union? Seems like you are 80% of
> the way there.

No problem. It results in a patch on the xen side too. I'll send that too.
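
For anyone following along, the point of the union-of-structs layout is
that in the current patch gdtaddr and gdtsz are two members of the union
itself, so they alias each other (and gdt_frames[0]); wrapping each view
in its own struct keeps the fields distinct. A mock-up of the proposed
shape, illustrative only:

/* Illustrative mock-up of the proposed layout (not the committed header). */
union vcpu_gdt_sketch {
	struct {
		/* PV: GDT machine frames and number of entries. */
		unsigned long gdt_frames[16], gdt_ents;
	} pv;
	struct {
		/* PVH: GDTR base address and size. */
		unsigned long gdtaddr, gdtsz;
	} pvh;
};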

> units AFAICT and so can be combined.
> 
> How come you don't need the same stuff for ldt*?

Happens natively. Isn't PVH great!

thanks
mukesh


Re: [Xen-devel] [PATCH V2 3/7]: PVH: mmu related changes.

2012-10-16 Thread Mukesh Rathor
On Tue, 16 Oct 2012 17:27:01 +0100
Ian Campbell  wrote:

> On Fri, 2012-10-12 at 09:57 +0100, Ian Campbell wrote:
> > > +int xen_unmap_domain_mfn_range(struct vm_area_struct *vma)
> > > +{
> > > +   int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> > > +   struct page **pages = vma ? vma->vm_private_data : NULL;
> > 
> > I thought we agreed to keep uses of vm_private_data in the privcmd
> > driver?
> > 
> > I think you should just add pages and nr as direct parameters to
> > this function, which is symmetric with the map call.
> 
> I had to look at this while rebasing my arm patches, turned out to be
> fairly simple. Feel free to either fold in or badger me for a proper
> commit message.


I made a similar change in my tree, except I am not passing vma as it's
not needed. I guess you just wanna be consistent with remap, or future
use?
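
For the record, the symmetric shape being discussed is roughly the
hypothetical declaration below; passing the pages and count explicitly
mirrors xen_remap_domain_mfn_range() and keeps the vm_private_data
bookkeeping inside privcmd. Illustrative only; the committed signature
may differ.

#include <linux/mm_types.h>	/* struct vm_area_struct, struct page */

/* Hypothetical sketch: the caller supplies the ballooned pages and count
 * instead of the helper digging them out of vma->vm_private_data. */
int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
			       int numpgs, struct page **pages);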

thanks
mukesh




Re: [Xen-devel] [PATCH V2 0/7]: PVH: PV guest with extensions

2012-10-15 Thread Mukesh Rathor
On Mon, 15 Oct 2012 17:15:43 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-11 at 22:49 +0100, Mukesh Rathor wrote:
> > Hi guys,
> > 
> > Ok, I've made all the changes from prev RFC patch submissions.
> > Tested all the combinations. The patches are organized slightly
> > differently from prev version because of the nature of changes
> > after last review. I am building xen patch just for the
> > corresponding header file changes. Following that I'll refresh xen
> > tree, debug, test, and send patches.
> > 
> > For linux kernel mailing list introduction, PVH is a PV guest that
> > can run in an HVM container, uses native pagetables, uses callback
> > vector, native IDT, and native syscalls.
> > 
> > They were built on top of 89d0307af2b9957d59bfb2a86aaa57464ff921de
> > commit.
> 
> Are they in a git branch anywhere?
> 

not in a publicly accessible location. We are very close to getting this
done (it appears ;)), and once Konrad puts it in his tree it should be
publicly available.

thanks
mukesh


Re: [Xen-devel] [PATCH V2 3/7]: PVH: mmu related changes.

2012-10-12 Thread Mukesh Rathor
On Fri, 12 Oct 2012 09:57:56 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-11 at 22:58 +0100, Mukesh Rathor wrote:
> > @@ -2177,8 +2210,19 @@ static const struct pv_mmu_ops xen_mmu_ops
> > __initconst = {
> > 
> >  void __init xen_init_mmu_ops(void)
> >  {
> > -   x86_init.mapping.pagetable_reserve =
> > xen_mapping_pagetable_reserve; x86_init.paging.pagetable_init =
> > xen_pagetable_init; +
> > +   if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > +   pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
> > +
> > +   /* For PCI devices to map iomem. */
> > +   if (xen_initial_domain()) {
> > +   pv_mmu_ops.set_pte = native_set_pte;
> > +   pv_mmu_ops.set_pte_at = native_set_pte_at;
> 
> What do these end up being for the !xen_initial_domain case? I'd have
> expected native_FOO.

Yeah, right, we kept on changing the functions that they were set
to, until it came down to just native_*. I just didn't think about them
not being set. Too much too fast... ok, time to slow down... :) :)..

thanks
Mukesh



Re: [Xen-devel] [PATCH V2 1/7]: PVH: basic and header changes, elfnote changes, ...

2012-10-12 Thread Mukesh Rathor
On Fri, 12 Oct 2012 09:48:48 +0100
Ian Campbell  wrote:

> > index fdce49c..9323b8c 100644
> > --- a/arch/x86/xen/Kconfig
> > +++ b/arch/x86/xen/Kconfig
> > @@ -50,3 +50,13 @@ config XEN_DEBUG_FS
> >   Enable statistics output and various tuning options in
> > debugfs. Enabling this option may incur a significant performance
> > overhead. 
> > +config XEN_X86_PVH
> > +   bool "Support for running as a PVH guest (EXPERIMENTAL)"
> > +   depends on X86_64 && XEN && INTEL_IOMMU && EXPERIMENTAL
> 
> OOI why does the kernel side require an INTEL_IOMMU? I can see why the
> hypervisor would need it but the guests (including dom0) can't
> actually see the underlying IOMMU, can they?

Well, the kernel requires the hypervisor to have it, but I guess
that's not what this is referring to. The tools can decide that.
I'll take it out.



Re: [Xen-devel] [PATCH V2 0/7]: PVH: PV guest with extensions

2012-10-12 Thread Mukesh Rathor
On Fri, 12 Oct 2012 10:18:31 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-11 at 22:49 +0100, Mukesh Rathor wrote:
> > Hi guys,
> > 
> > Ok, I've made all the changes from prev RFC patch submissions.
> > Tested all the combinations. The patches are organized slightly
> > differently from prev version because of the nature of changes
> > after last review. I am building xen patch just for the
> > corresponding header file changes. Following that I'll refresh xen
> > tree, debug, test, and send patches.
> > 
> > For linux kernel mailing list introduction, PVH is a PV guest that
> > can run in an HVM container, uses native pagetables, uses callback
> > vector, native IDT, and native syscalls.
> > 
> > They were built on top of 89d0307af2b9957d59bfb2a86aaa57464ff921de
> > commit.
> 
> I took a (fairly quick) look. I had a few comments but overall looks
> pretty good, thanks!
> 
> I'm constantly amazed by how small this patchset is. I suspect you are
> going to make up for it in the hypervisor side ;-)

Hehe... yes! The patch has shrunk to almost half its original size since
I first got it working. Each review trims it down :).

thanks
Mukesh


Re: [Xen-devel] [PATCH V2 2/7]: PVH: use native irq, enable callback, use HVM ring ops, ...

2012-10-12 Thread Mukesh Rathor
On Fri, 12 Oct 2012 09:52:17 +0100
Ian Campbell  wrote:

> >  drivers/xen/cpu_hotplug.c|4 +++-
> >  drivers/xen/events.c |9 -
> >  drivers/xen/xenbus/xenbus_client.c   |3 ++-
> >  7 files changed, 27 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/xen/interface.h
> > b/arch/x86/include/asm/xen/interface.h index 555f94d..f11edb0 100644
> > --- a/arch/x86/include/asm/xen/interface.h
> > +++ b/arch/x86/include/asm/xen/interface.h
> > @@ -143,7 +143,13 @@ struct vcpu_guest_context {
> >  struct cpu_user_regs user_regs; /* User-level CPU
> > registers */ struct trap_info trap_ctxt[256];/* Virtual
> > IDT  */ unsigned long ldt_base, ldt_ents;   /*
> > LDT (linear address, # ents) */
> > -unsigned long gdt_frames[16], gdt_ents; /* GDT (machine
> > frames, # ents) */
> > +union {
> > +   struct {
> > +   /* PV: GDT (machine frames, # ents).*/
> > +   unsigned long gdt_frames[16], gdt_ents;
> > +   } s;
> > +   unsigned long gdtaddr, gdtsz;   /* PVH: GDTR addr
> > and size */
> 
> I've pointed out a few times that I think this is wrong -- gdtaddr and
> gdtsz will overlap each other in the union. I'm not sure how it even
> works, unless the hypervisor is ignoring one or the other. You need:
> 
> union {
>   struct {
>   unsigned long gdt_frames[16], gdt_ents;
>   } pv;
>   struct {
>   unsigned long gdtaddr, gdtsz;
>   } pvh;
> } gdt;
> 
> (I've gone with naming the union gdt instead of u. You might want
> therefore to also drop the gdt prefix from the members?)

Is it worth it, I mean, making it a union? Would you be OK if I just
used gdt_frames[0] and gdt_ents for gdtaddr and size?

thanks
Mukesh




[PATCH V2 7/7]: PVH: privcmd changes.

2012-10-11 Thread Mukesh Rathor
PVH: privcmd changes. PVH only supports the batch interface. To map a
foreign page to a process, a pfn must be allocated; the PVH path uses
ballooning for that purpose. The returned pfn is then mapped to the
foreign page. xen_unmap_domain_mfn_range() is introduced to unmap these
pages via the privcmd close call.

Signed-off-by: Mukesh R 
---
 drivers/xen/privcmd.c |   70 +++-
 1 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 63d9ee8..b76d33c 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -33,11 +33,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "privcmd.h"
 
 MODULE_LICENSE("GPL");
 
+#define PRIV_VMA_LOCKED ((void *)1)
+
 #ifndef HAVE_ARCH_PRIVCMD_MMAP
 static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma);
 #endif
@@ -199,6 +202,10 @@ static long privcmd_ioctl_mmap(void __user *udata)
if (!xen_initial_domain())
return -EPERM;
 
+   /* We only support privcmd_ioctl_mmap_batch for auto translated. */
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return -ENOSYS;
+
if (copy_from_user(&mmapcmd, udata, sizeof(mmapcmd)))
return -EFAULT;
 
@@ -246,6 +253,7 @@ struct mmap_batch_state {
domid_t domain;
unsigned long va;
struct vm_area_struct *vma;
+   int index;
/* A tristate:
 *  0 for no errors
 *  1 if at least one error has happened (and no
@@ -260,15 +268,24 @@ struct mmap_batch_state {
xen_pfn_t __user *user_mfn;
 };
 
+/* auto translated dom0 note: if domU being created is PV, then mfn is
+ * mfn(addr on bus). If it's auto xlated, then mfn is pfn (input to HAP).
+ */
 static int mmap_batch_fn(void *data, void *state)
 {
xen_pfn_t *mfnp = data;
struct mmap_batch_state *st = state;
+   struct vm_area_struct *vma = st->vma;
+   struct page **pages = vma->vm_private_data;
+   struct page *cur_page = NULL;
int ret;
 
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   cur_page = pages[st->index++];
+
ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
 st->vma->vm_page_prot, st->domain,
-NULL);
+&cur_page);
 
/* Store error code for second pass. */
*(st->err++) = ret;
@@ -304,6 +321,32 @@ static int mmap_return_errors_v1(void *data, void *state)
return __put_user(*mfnp, st->user_mfn++);
 }
 
+/* Allocate pfns that are then mapped with gmfns from foreign domid. Update
+ * the vma with the page info to use later.
+ * Returns: 0 if success, otherwise -errno
+ */
+static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
+{
+   int rc;
+   struct page **pages;
+
+   pages = kcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);
+   if (pages == NULL)
+   return -ENOMEM;
+
+   rc = alloc_xenballooned_pages(numpgs, pages, 0);
+   if (rc != 0) {
+   pr_warn("%s Could not alloc %d pfns rc:%d\n", __func__,
+   numpgs, rc);
+   kfree(pages);
+   return -ENOMEM;
+   }
+   BUG_ON(vma->vm_private_data != PRIV_VMA_LOCKED);
+   vma->vm_private_data = pages;
+
+   return 0;
+}
+
 static struct vm_operations_struct privcmd_vm_ops;
 
 static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
@@ -371,10 +414,18 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
up_write(&mm->mmap_sem);
goto out;
}
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   ret = alloc_empty_pages(vma, m.num);
+   if (ret < 0) {
+   up_write(&mm->mmap_sem);
+   goto out;
+   }
+   }
 
state.domain= m.dom;
state.vma   = vma;
state.va= m.addr;
+   state.index = 0;
state.global_error  = 0;
state.err   = err_array;
 
@@ -439,6 +490,20 @@ static long privcmd_ioctl(struct file *file,
return ret;
 }
 
+static void privcmd_close(struct vm_area_struct *vma)
+{
+   struct page **pages = vma ? vma->vm_private_data : NULL;
+   int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+
+   if (!pages || !numpgs || !xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
+   xen_unmap_domain_mfn_range(vma);
+   while (numpgs--)
+   free_xenballooned_pages(1, &pages[numpgs]);
+   kfree(pages);
+}
+
 static int privcmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
printk(KERN_DEBUG "privcmd_fault: vma=%p %lx-%lx, pgoff=%lx, uv=%p\n",
@@ -449,6 +514,7 @@ static int privcmd_fault(struct vm_area_struct *vma, 

[PATCH V2 6/7]: PVH: balloon and grant changes

2012-10-11 Thread Mukesh Rathor
PVH: balloon and grant changes. For the balloon changes we skip setting
the local p2m as it's updated in xen. For grants, the shared grant frame
is a pfn and not an mfn, hence it's mapped via the same code path as HVM.

Signed-off-by: Mukesh R 
---
 drivers/xen/balloon.c |   18 +++---
 drivers/xen/gntdev.c  |3 ++-
 drivers/xen/grant-table.c |   25 +
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 31ab82f..9b895ad 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -358,10 +358,13 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
   phys_to_machine_mapping_valid(pfn));
 
-   set_phys_to_machine(pfn, frame_list[i]);
+   if (!xen_feature(XENFEAT_auto_translated_physmap))
+   set_phys_to_machine(pfn, frame_list[i]);
 
/* Link back into the page tables if not highmem. */
-   if (xen_pv_domain() && !PageHighMem(page)) {
+   if (xen_pv_domain() && !PageHighMem(page) &&
+   !xen_feature(XENFEAT_auto_translated_physmap)) {
+
int ret;
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
@@ -418,12 +421,13 @@ static enum bp_state decrease_reservation(unsigned long 
nr_pages, gfp_t gfp)
scrub_page(page);
 
if (xen_pv_domain() && !PageHighMem(page)) {
-   ret = HYPERVISOR_update_va_mapping(
-   (unsigned long)__va(pfn << PAGE_SHIFT),
-   __pte_ma(0), 0);
-   BUG_ON(ret);
+   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   ret = HYPERVISOR_update_va_mapping(
+   (unsigned long)__va(pfn << PAGE_SHIFT),
+   __pte_ma(0), 0);
+   BUG_ON(ret);
+   }
}
-
}
 
/* Ensure that ballooned highmem pages don't have kmaps. */
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 5df9fd8..36ec380 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -803,7 +803,8 @@ static int __init gntdev_init(void)
if (!xen_domain())
return -ENODEV;
 
-   use_ptemod = xen_pv_domain();
+   use_ptemod = xen_pv_domain() &&
+!xen_feature(XENFEAT_auto_translated_physmap);
 
err = misc_register(&gntdev_miscdev);
if (err != 0) {
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index f37faf6..feaebeb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -976,14 +976,19 @@ static void gnttab_unmap_frames_v2(void)
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
struct gnttab_setup_table setup;
-   unsigned long *frames;
+   unsigned long *frames, start_gpfn;
unsigned int nr_gframes = end_idx + 1;
int rc;
 
-   if (xen_hvm_domain()) {
+   if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap)) {
struct xen_add_to_physmap xatp;
unsigned int i = end_idx;
rc = 0;
+
+   if (xen_hvm_domain())
+   start_gpfn = xen_hvm_resume_frames >> PAGE_SHIFT;
+   else
+   start_gpfn = virt_to_pfn(gnttab_shared.addr);
/*
 * Loop backwards, so that the first hypercall has the largest
 * index, ensuring that the table will grow only once.
@@ -992,7 +997,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int 
end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
-   xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
+   xatp.gpfn = start_gpfn + i;
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
printk(KERN_WARNING
@@ -1055,7 +1060,7 @@ static void gnttab_request_version(void)
int rc;
struct gnttab_set_version gsv;
 
-   if (xen_hvm_domain())
+   if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap))
gsv.version = 1;
else
gsv.version = 2;
@@ -1083,12 +1088,24 @@ static void gnttab_request_version(void)
 int gnttab_resume(void)
 {
unsigned int max_nr_gframes;
+   char *kmsg = "Failed to kmalloc pages for pv in hvm grant frames\n";
 
gnttab_request_version();
max_nr_gframes = gnttab_max_grant_frames();
  

[PATCH V2 5/7]: PVH: smp changes.

2012-10-11 Thread Mukesh Rathor
PVH: smp changes. This pertains to bringing up smp vcpus. PVH runs in
ring 0, so syscalls are native. Also, the vcpu context is sent down via
the hcall to be set in the vmcs. gdtaddr and gdtsz are made a union as
PVH only needs to send these two to be set in the vmcs.

Signed-off-by: Mukesh R 
---
 arch/x86/xen/smp.c |   75 ++--
 1 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index bd92698..63a0bfb 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -68,9 +68,11 @@ static void __cpuinit cpu_bringup(void)
touch_softlockup_watchdog();
preempt_disable();
 
-   xen_enable_sysenter();
-   xen_enable_syscall();
-
+   /* PVH runs in ring 0 and allows us to do native syscalls. Yay! */
+   if (!xen_feature(XENFEAT_supervisor_mode_kernel)) {
+   xen_enable_sysenter();
+   xen_enable_syscall();
+   }
cpu = smp_processor_id();
smp_store_cpu_info(cpu);
cpu_data(cpu).x86_max_cores = 1;
@@ -230,10 +232,11 @@ static void __init xen_smp_prepare_boot_cpu(void)
BUG_ON(smp_processor_id() != 0);
native_smp_prepare_boot_cpu();
 
-   /* We've switched to the "real" per-cpu gdt, so make sure the
-  old memory can be recycled */
-   make_lowmem_page_readwrite(xen_initial_gdt);
-
+   if (!xen_feature(XENFEAT_writable_page_tables)) {
+   /* We've switched to the "real" per-cpu gdt, so make sure the
+* old memory can be recycled */
+   make_lowmem_page_readwrite(xen_initial_gdt);
+   }
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
 }
@@ -300,8 +303,6 @@ cpu_initialize_context(unsigned int cpu, struct task_struct 
*idle)
gdt = get_cpu_gdt_table(cpu);
 
ctxt->flags = VGCF_IN_KERNEL;
-   ctxt->user_regs.ds = __USER_DS;
-   ctxt->user_regs.es = __USER_DS;
ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
ctxt->user_regs.fs = __KERNEL_PERCPU;
@@ -310,35 +311,57 @@ cpu_initialize_context(unsigned int cpu, struct 
task_struct *idle)
ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-   ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 
memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-   xen_copy_trap_info(ctxt->trap_ctxt);
+   /* check for autoxlated to get it right for 32bit kernel */
+   if (xen_feature(XENFEAT_auto_translated_physmap) &&
+   xen_feature(XENFEAT_supervisor_mode_kernel)) {
 
-   ctxt->ldt_ents = 0;
+   ctxt->user_regs.ds = __KERNEL_DS;
+   ctxt->user_regs.es = 0;
+   ctxt->user_regs.gs = 0;
 
-   BUG_ON((unsigned long)gdt & ~PAGE_MASK);
+   ctxt->u.gdtaddr = (unsigned long)gdt;
+   ctxt->u.gdtsz = (unsigned long)(GDT_SIZE - 1);
 
-   gdt_mfn = arbitrary_virt_to_mfn(gdt);
-   make_lowmem_page_readonly(gdt);
-   make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
+#ifdef CONFIG_X86_64
+   /* Note: PVH is not supported on x86_32. */
+   ctxt->gs_base_user = (unsigned long)
+   per_cpu(irq_stack_union.gs_base, cpu);
+#endif
+   } else {
+   ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
+   ctxt->user_regs.ds = __USER_DS;
+   ctxt->user_regs.es = __USER_DS;
 
-   ctxt->u.s.gdt_frames[0] = gdt_mfn;
-   ctxt->u.s.gdt_ents  = GDT_ENTRIES;
+   xen_copy_trap_info(ctxt->trap_ctxt);
 
-   ctxt->user_regs.cs = __KERNEL_CS;
-   ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
+   ctxt->ldt_ents = 0;
 
-   ctxt->kernel_ss = __KERNEL_DS;
-   ctxt->kernel_sp = idle->thread.sp0;
+   BUG_ON((unsigned long)gdt & ~PAGE_MASK);
+
+   gdt_mfn = arbitrary_virt_to_mfn(gdt);
+   make_lowmem_page_readonly(gdt);
+   make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
+
+   ctxt->u.s.gdt_frames[0] = gdt_mfn;
+   ctxt->u.s.gdt_ents  = GDT_ENTRIES;
+
+   ctxt->kernel_ss = __KERNEL_DS;
+   ctxt->kernel_sp = idle->thread.sp0;
 
 #ifdef CONFIG_X86_32
-   ctxt->event_callback_cs = __KERNEL_CS;
-   ctxt->failsafe_callback_cs  = __KERNEL_CS;
+   ctxt->event_callback_cs = __KERNEL_CS;
+   ctxt->failsafe_callback_cs  = __KERNEL_CS;
 #endif
-   ctxt->event_callback_eip= (unsigned long)xen_hypervisor_callback;
-   ctxt->failsafe_callback_eip = (unsigned long)xen_failsafe_callback;
+   ctxt->event_callback_eip=
+   (unsigned long)xen_hypervisor_callback;
+   ctxt->failsafe_callback_eip =
+   (unsigned long)xen_failsafe_c

[PATCH V2 4/7]: PVH: bootup and setup related changes.

2012-10-11 Thread Mukesh Rathor
PVH: bootup and setup related changes. enlighten.c: for PVH we can trap
cpuid via vmexit, so we don't need to use the emulated prefix call. Check
for the vector callback early on, as it is a required feature. PVH runs
at the default kernel iopl. setup.c: in xen_add_extra_mem() we can skip
updating the p2m as it's managed by xen. PVH maps the entire IO space, but
only RAM pages need to be repopulated. Finally, pure PV settings are
moved to a separate function that is only called for pure PV, i.e., PV
with pvmmu.

Signed-off-by: Mukesh R 
---
 arch/x86/xen/enlighten.c |   78 ++
 arch/x86/xen/setup.c |   64 ++---
 2 files changed, 110 insertions(+), 32 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 2d932c3..5199258 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -106,6 +107,9 @@ RESERVE_BRK(shared_info_page_brk, PAGE_SIZE);
 __read_mostly int xen_have_vector_callback;
 EXPORT_SYMBOL_GPL(xen_have_vector_callback);
 
+#define xen_pvh_domain() (xen_pv_domain() && \
+ xen_feature(XENFEAT_auto_translated_physmap) && \
+ xen_have_vector_callback)
 /*
  * Point at some empty memory to start with. We map the real shared_info
  * page as soon as fixmap is up and running.
@@ -218,8 +222,9 @@ static void __init xen_banner(void)
struct xen_extraversion extra;
HYPERVISOR_xen_version(XENVER_extraversion, &extra);
 
-   printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
-  pv_info.name);
+   pr_info("Booting paravirtualized kernel %son %s\n",
+   xen_feature(XENFEAT_auto_translated_physmap) ?
+   "with PVH extensions " : "", pv_info.name);
printk(KERN_INFO "Xen version: %d.%d%s%s\n",
   version >> 16, version & 0x, extra.extraversion,
   xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " 
(preserve-AD)" : "");
@@ -272,12 +277,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
break;
}
 
-   asm(XEN_EMULATE_PREFIX "cpuid"
-   : "=a" (*ax),
- "=b" (*bx),
- "=c" (*cx),
- "=d" (*dx)
-   : "0" (*ax), "2" (*cx));
+   if (xen_pvh_domain())
+   native_cpuid(ax, bx, cx, dx);
+   else
+   asm(XEN_EMULATE_PREFIX "cpuid"
+   : "=a" (*ax),
+   "=b" (*bx),
+   "=c" (*cx),
+   "=d" (*dx)
+   : "0" (*ax), "2" (*cx));
 
*bx &= maskebx;
*cx &= maskecx;
@@ -1045,6 +1053,10 @@ void xen_setup_shared_info(void)
HYPERVISOR_shared_info =
(struct shared_info *)__va(xen_start_info->shared_info);
 
+   /* PVH TBD/FIXME: vcpu info placement in phase 2 */
+   if (xen_pvh_domain())
+   return;
+
 #ifndef CONFIG_SMP
/* In UP this is as good a place as any to set up shared info */
xen_setup_vcpu_info_placement();
@@ -1275,6 +1287,11 @@ static const struct machine_ops xen_machine_ops 
__initconst = {
  */
 static void __init xen_setup_stackprotector(void)
 {
+   /* PVH TBD/FIXME: investigate setup_stack_canary_segment */
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   switch_to_new_gdt(0);
+   return;
+   }
pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
pv_cpu_ops.load_gdt = xen_load_gdt_boot;
 
@@ -1285,6 +1302,19 @@ static void __init xen_setup_stackprotector(void)
pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+static void __init xen_pvh_early_guest_init(void)
+{
+   if (xen_feature(XENFEAT_hvm_callback_vector))
+   xen_have_vector_callback = 1;
+
+#ifdef CONFIG_X86_32
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   xen_raw_printk("ERROR: 32bit PVH guests are not supported\n");
+   BUG();
+   }
+#endif
+}
+
 /* First C function to be called on Xen boot */
 asmlinkage void __init xen_start_kernel(void)
 {
@@ -1296,13 +1326,18 @@ asmlinkage void __init xen_start_kernel(void)
 
xen_domain_type = XEN_PV_DOMAIN;
 
+   xen_setup_features();
+   xen_pvh_early_guest_init();
xen_setup_machphys_mapping();
 
/* Install Xen paravirt ops */
pv_info = xen_info;
pv_init_ops = xen_init_ops;
-   pv_cpu_ops = xen_cpu_ops;
pv_apic_ops = xen_apic_ops;
+   if (xen_pvh_domain())
+   pv_cpu_ops.cpuid = xen_cpuid;
+   else
+   pv_cpu_ops = xen_cpu_ops;
 
x86_init.resources.memory_setup = xen_memory_setup;
x86_init.oem.arch_setup = xen_arch_setup;
@@ -1334,8 +1369,6 @@ asmlinkage void __init xen_start_kernel(void)
/* Work out if we support NX

[PATCH V2 3/7]: PVH: mmu related changes.

2012-10-11 Thread Mukesh Rathor
PVH: This patch implements mmu changes for PVH. First, the set/clear
mmio pte function makes a hypercall to update the p2m in xen with a 1:1
mapping. PVH uses mostly native mmu ops. Two local functions are
introduced to add to the xen physmap for the xen remap interface. A xen
unmap interface is introduced so the privcmd pte entries can be cleared
in the xen p2m table.

Signed-off-by: Mukesh R 
---
 arch/x86/xen/mmu.c|  172 ++---
 arch/x86/xen/mmu.h|2 +
 drivers/xen/privcmd.c |5 +-
 include/xen/xen-ops.h |4 +-
 4 files changed, 171 insertions(+), 12 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5a16824..12b56a0 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -73,6 +73,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "multicalls.h"
 #include "mmu.h"
@@ -331,6 +332,20 @@ static void xen_set_pte(pte_t *ptep, pte_t pteval)
__xen_set_pte(ptep, pteval);
 }
 
+void xen_set_clr_mmio_pvh_pte(unsigned long pfn, unsigned long mfn,
+ int nr_mfns, int add_mapping)
+{
+   struct physdev_map_iomem iomem;
+
+   iomem.first_gfn = pfn;
+   iomem.first_mfn = mfn;
+   iomem.nr_mfns = nr_mfns;
+   iomem.add_mapping = add_mapping;
+
+   if (HYPERVISOR_physdev_op(PHYSDEVOP_pvh_map_iomem, &iomem))
+   BUG();
+}
+
 static void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval)
 {
@@ -1220,6 +1235,8 @@ static void __init xen_pagetable_init(void)
 #endif
paging_init();
xen_setup_shared_info();
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
 #ifdef CONFIG_X86_64
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
unsigned long new_mfn_list;
@@ -1527,6 +1544,10 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t 
pte)
 static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
 {
struct mmuext_op op;
+
+   if (xen_feature(XENFEAT_writable_page_tables))
+   return;
+
op.cmd = cmd;
op.arg1.mfn = pfn_to_mfn(pfn);
if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
@@ -1724,6 +1745,10 @@ static void set_page_prot(void *addr, pgprot_t prot)
unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
pte_t pte = pfn_pte(pfn, prot);
 
+   /* recall for PVH, page tables are native. */
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, 0))
BUG();
 }
@@ -1801,6 +1826,9 @@ static void convert_pfn_mfn(void *v)
pte_t *pte = v;
int i;
 
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
/* All levels are converted the same way, so just treat them
   as ptes. */
for (i = 0; i < PTRS_PER_PTE; i++)
@@ -1820,6 +1848,7 @@ static void __init check_pt_base(unsigned long *pt_base, 
unsigned long *pt_end,
(*pt_end)--;
}
 }
+
 /*
  * Set up the initial kernel pagetable.
  *
@@ -1830,6 +1859,7 @@ static void __init check_pt_base(unsigned long *pt_base, 
unsigned long *pt_end,
  * but that's enough to get __va working.  We need to fill in the rest
  * of the physical mapping once some sort of allocator has been set
  * up.
+ * NOTE: for PVH, the page tables are native.
  */
 void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
@@ -1907,10 +1937,13 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, 
unsigned long max_pfn)
 * structure to attach it to, so make sure we just set kernel
 * pgd.
 */
-   xen_mc_batch();
-   __xen_write_cr3(true, __pa(init_level4_pgt));
-   xen_mc_issue(PARAVIRT_LAZY_CPU);
-
+   if (xen_feature(XENFEAT_writable_page_tables)) {
+   native_write_cr3(__pa(init_level4_pgt));
+   } else {
+   xen_mc_batch();
+   __xen_write_cr3(true, __pa(init_level4_pgt));
+   xen_mc_issue(PARAVIRT_LAZY_CPU);
+   }
/* We can't that easily rip out L3 and L2, as the Xen pagetables are
 * set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
 * the initial domain. For guests using the toolstack, they are in:
@@ -2177,8 +2210,19 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = 
{
 
 void __init xen_init_mmu_ops(void)
 {
-   x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
x86_init.paging.pagetable_init = xen_pagetable_init;
+
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
+   pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
+
+   /* For PCI devices to map iomem. */
+   if (xen_initial_domain()) {
+   pv_mmu_ops.set_pte = native_set_pte;
+   pv_mmu_ops.set_pte_at = native_set_pte_at;
+   }
+ 

[PATCH V2 2/7]: PVH: use native irq, enable callback, use HVM ring ops,...

2012-10-11 Thread Mukesh Rathor
PVH: make gdt_frames[]/gdt_ents into a union with {gdtaddr, gdtsz}; PVH
only needs to send down gdtaddr and gdtsz. irq.c: PVH uses
native_irq_ops. vcpu hotplug is currently not available for PVH.
events.c: set up the callback vector for PVH. Finally, PVH ring ops use
the HVM paths for xenbus.

Signed-off-by: Mukesh R 
---
 arch/x86/include/asm/xen/interface.h |8 +++-
 arch/x86/xen/irq.c   |5 -
 arch/x86/xen/p2m.c   |2 +-
 arch/x86/xen/smp.c   |4 ++--
 drivers/xen/cpu_hotplug.c|4 +++-
 drivers/xen/events.c |9 -
 drivers/xen/xenbus/xenbus_client.c   |3 ++-
 7 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/xen/interface.h 
b/arch/x86/include/asm/xen/interface.h
index 555f94d..f11edb0 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -143,7 +143,13 @@ struct vcpu_guest_context {
 struct cpu_user_regs user_regs; /* User-level CPU registers */
 struct trap_info trap_ctxt[256];/* Virtual IDT  */
 unsigned long ldt_base, ldt_ents;   /* LDT (linear address, # ents) */
-unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
+union {
+   struct {
+   /* PV: GDT (machine frames, # ents).*/
+   unsigned long gdt_frames[16], gdt_ents;
+   } s;
+   unsigned long gdtaddr, gdtsz;   /* PVH: GDTR addr and size */
+} u;
 unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1)   */
 /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
 unsigned long ctrlreg[8];   /* CR0-CR7 (control registers)  */
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 1573376..31959a7 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
 
 void __init xen_init_irq_ops(void)
 {
-   pv_irq_ops = xen_irq_ops;
+   /* For PVH we use default pv_irq_ops settings */
+   if (!xen_feature(XENFEAT_hvm_callback_vector))
+   pv_irq_ops = xen_irq_ops;
x86_init.irqs.intr_init = xen_init_IRQ;
 }
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 95fb2aa..ea553c8 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -798,7 +798,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long 
mfn)
 {
unsigned topidx, mididx, idx;
 
-   if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+   if (xen_feature(XENFEAT_auto_translated_physmap)) {
BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
return true;
}
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index f58dca7..bd92698 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -324,8 +324,8 @@ cpu_initialize_context(unsigned int cpu, struct task_struct 
*idle)
make_lowmem_page_readonly(gdt);
make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
 
-   ctxt->gdt_frames[0] = gdt_mfn;
-   ctxt->gdt_ents  = GDT_ENTRIES;
+   ctxt->u.s.gdt_frames[0] = gdt_mfn;
+   ctxt->u.s.gdt_ents  = GDT_ENTRIES;
 
ctxt->user_regs.cs = __KERNEL_CS;
ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
index 4dcfced..de6bcf9 100644
--- a/drivers/xen/cpu_hotplug.c
+++ b/drivers/xen/cpu_hotplug.c
@@ -2,6 +2,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -100,7 +101,8 @@ static int __init setup_vcpu_hotplug_event(void)
static struct notifier_block xsn_cpu = {
.notifier_call = setup_cpu_watcher };
 
-   if (!xen_pv_domain())
+   /* PVH TBD/FIXME: future work */
+   if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
return -ENODEV;
 
register_xenstore_notifier(&xsn_cpu);
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index c60d162..a977612 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1767,7 +1767,7 @@ int xen_set_callback_via(uint64_t via)
 }
 EXPORT_SYMBOL_GPL(xen_set_callback_via);
 
-#ifdef CONFIG_XEN_PVHVM
+#ifdef CONFIG_X86
 /* Vector callbacks are better than PCI interrupts to receive event
  * channel notifications because we can receive vector callbacks on any
  * vcpu and we don't need PCI support or APIC interactions. */
@@ -1826,6 +1826,13 @@ void __init xen_init_IRQ(void)
if (xen_initial_domain())
pci_xen_initial_domain();
 
+   if (xen_feature(XENFEAT_hvm_callback_vector)) {
+   xen_callback_vector();
+   return;
+   }
+
+   /* PVH: TBD/FIXME: debug and fix eio map to work with pvh */
+
pirq_eoi_map

[PATCH V2 1/7]: PVH: basic and header changes, elfnote changes, ...

2012-10-11 Thread Mukesh Rathor

PVH is a PV Linux guest that has extended capabilities. This patch allows it
to be configured and enabled. Also, basic header file changes to add new
subcalls to the physmap hypercall. Lastly, mfn_to_local_pfn must return the
mfn for paging-mode-translate guests.

Signed-off-by: Mukesh R 
---
 arch/x86/include/asm/xen/page.h |3 +++
 arch/x86/xen/Kconfig|   10 ++
 arch/x86/xen/xen-head.S |   11 ++-
 include/xen/interface/memory.h  |   30 +++---
 include/xen/interface/physdev.h |   10 ++
 5 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 472b9b7..6af440d 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -159,6 +159,9 @@ static inline xpaddr_t machine_to_phys(xmaddr_t machine)
 static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
 {
unsigned long pfn = mfn_to_pfn(mfn);
+
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return mfn;
if (get_phys_to_machine(pfn) != mfn)
return -1; /* force !pfn_valid() */
return pfn;
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index fdce49c..9323b8c 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -50,3 +50,13 @@ config XEN_DEBUG_FS
  Enable statistics output and various tuning options in debugfs.
  Enabling this option may incur a significant performance overhead.
 
+config XEN_X86_PVH
+   bool "Support for running as a PVH guest (EXPERIMENTAL)"
+   depends on X86_64 && XEN && INTEL_IOMMU && EXPERIMENTAL
+   default n
+   help
+  This option enables support for running as a PVH guest (PV guest
+  using hardware extensions) under a suitably capable hypervisor.
+  This option is EXPERIMENTAL because the hypervisor interfaces
+  which it uses are not yet considered stable therefore backwards and
+  forwards compatibility is not yet guaranteed.  If unsure, say N.
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7faed58..3e65ece 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -13,6 +13,15 @@
 #include 
 #include 
 
+#ifdef CONFIG_XEN_X86_PVH
+#define FEATURES_PVH "| writable_descriptor_tables" \
+"| auto_translated_physmap" \
+"| supervisor_mode_kernel" \
+"| hvm_callback_vector"
+#else
+#define FEATURES_PVH /* Not supported */
+#endif
+
__INIT
 ENTRY(startup_xen)
cld
@@ -95,7 +104,7 @@ NEXT_HYPERCALL(arch_6)
 #endif
ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,  _ASM_PTR startup_xen)
ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
-   ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz 
"!writable_page_tables|pae_pgdir_above_4gb")
+   ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz 
"!writable_page_tables|pae_pgdir_above_4gb"FEATURES_PVH)
ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,   .asciz "yes")
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz "generic")
ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index d8e33a9..dbf4c6b 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -163,14 +163,22 @@ struct xen_add_to_physmap {
 /* Which domain to change the mapping for. */
 domid_t domid;
 
-/* Number of pages to go through for gmfn_range */
-uint16_tsize;
-
+union {
+   /* Number of pages to go through for gmfn_range */
+   uint16_tsize;
+   /* IFF XENMAPSPACE_gmfn_foreign */
+   domid_t foreign_domid;
+} u;
 /* Source mapping space. */
 #define XENMAPSPACE_shared_info 0 /* shared info page */
 #define XENMAPSPACE_grant_table 1 /* grant table page */
+#define XENMAPSPACE_gmfn2 /* GMFN */
+#define XENMAPSPACE_gmfn_range  3 /* GMFN range */
+#define XENMAPSPACE_gmfn_foreign 4 /* GMFN from another guest */
 unsigned int space;
 
+#define XENMAPIDX_grant_table_status 0x8000
+
 /* Index into source mapping space. */
 unsigned long idx;
 
@@ -237,4 +245,20 @@ DEFINE_GUEST_HANDLE_STRUCT(xen_memory_map);
  * during a driver critical region.
  */
 extern spinlock_t xen_reservation_lock;
+
+/*
+ * Unmaps the page appearing at a particular GPFN from the specified guest's
+ * pseudophysical address space.
+ * arg == addr of xen_remove_from_physmap_t.
+ */
+#define XENMEM_remove_from_physmap  15
+struct xen_remove_from_physmap {
+/* Which domain to change the mapping for. */
+domid_t domid;
+
+/* GPFN of the current mapping of the page. */
+xen_pfn_t gpfn;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
+
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index 9ce788d..3b9d5b6 100644
--- a/include/xen/interface/p

[PATCH V2 0/7]: PVH: PV guest with extensions

2012-10-11 Thread Mukesh Rathor
Hi guys,

Ok, I've made all the changes from prev RFC patch submissions. Tested
all the combinations. The patches are organized slightly differently
from prev version because of the nature of changes after last review. I
am building xen patch just for the corresponding header file changes.
Following that I'll refresh xen tree, debug, test, and send patches.

For linux kernel mailing list introduction, PVH is a PV guest that can
run in an HVM container, uses native pagetables, uses callback vector,
native IDT, and native syscalls.

They were built on top of 89d0307af2b9957d59bfb2a86aaa57464ff921de
commit.

thanks,
Mukesh


Re: [PATCH 20/24] xen: update xen_add_to_physmap interface

2012-08-01 Thread Mukesh Rathor
On Wed, 1 Aug 2012 10:52:15 -0400
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jul 26, 2012 at 04:34:02PM +0100, Stefano Stabellini wrote:
> > Update struct xen_add_to_physmap to be in sync with Xen's version
> > of the structure.
> > The size field was introduced by:
> > 
> > changeset:   24164:707d27fe03e7
> > user:Jean Guyader 
> > date:Fri Nov 18 13:42:08 2011 +
> > summary: mm: New XENMEM space, XENMAPSPACE_gmfn_range
> > 
> > According to the comment:
> > 
> > "This new field .size is located in the 16 bits padding
> > between .domid and .space in struct xen_add_to_physmap to stay
> > compatible with older versions."
> > 
> > This is not true on ARM where there is not padding, but it is valid
> > on X86, so introducing size is safe on X86 and it is going to fix
> > the interace for ARM.
> 
> Has this been checked actually for backwards compatibility? It sounds
> like it should work just fine with Xen 4.0 right?
> 
> I believe this also helps Mukesh's patches, so CC-ing him here for
> his Ack.

Yup, I already had that change in my tree.
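
For anyone reading along, the compatibility argument is purely about
struct layout: on x86 the new 16-bit field occupies what used to be
alignment padding between domid and space, so old and new definitions
stay binary compatible, while the changeset comment notes that 32-bit ARM
had no such padding. An illustrative mock-up (not the interface header
itself):

#include <linux/types.h>	/* uint16_t */

/* Illustrative layout only; see include/xen/interface/memory.h for the
 * real definition. */
struct xen_add_to_physmap_layout_sketch {
	uint16_t      domid;	/* domid_t */
	uint16_t      size;	/* new field, sits in the former padding */
	unsigned int  space;	/* XENMAPSPACE_* */
	unsigned long idx;	/* index into the source mapping space */
	unsigned long gpfn;	/* GPFN where the page should appear */
};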

thanks,
Mukesh

