Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-17 Thread Olaf Hering
On Tue, Jul 17, Olaf Hering wrote:

To make this robust against allocation errors I will change it to

   do {

> + /* Allocate new mfn for previous pfn */
> + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, );

   } while (rc == 0);

> +
> + /* Make sure the previous pfn is really connected to a (new) mfn */
> + BUG_ON(rc != 1);

Olaf
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-17 Thread Olaf Hering
On Mon, Jul 16, Konrad Rzeszutek Wilk wrote:

> On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:
> > -void __ref xen_hvm_init_shared_info(void)
> > +static struct shared_info *hvm_shared_info;
> > +static unsigned long hvm_shared_info_pfn;
> > +
> 
> Please include a big comment explaining why this machination is OK
> to do multiple times. If you can, include the juicy snippets of the
> email conversation in this comment so that in 6 months if somebody
> looks at this they have a good understanding of it.

I have added a comment to the new patch, and I will extend it further in
the commit message.

> > +   /* Move pfn, disconnects previous pfn from mfn */
> > +   set_shared_info(sip, pfn);
> > +
> > +   /* Allocate new mfn for previous pfn */
> > + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> > +   WARN_ON(rc != 1);
> 
> Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn
> back to the shared page?

I moved the whole reassignment into the shutdown path because there is
apparently no sane way to make the move atomic. Shortly before reboot
all vcpus are still online, but hopefully the other vcpus have no work
left.

See below, the change is on top of the two other patches I sent out
today.

Olaf

xen PVonHVM: move shared_info to MMIO before kexec

Signed-off-by: Olaf Hering 
---
 arch/x86/xen/enlighten.c   |  114 +++-
 arch/x86/xen/suspend.c |2 +-
 arch/x86/xen/xen-ops.h |2 +-
 drivers/xen/platform-pci.c |   15 ++
 include/xen/events.h   |2 +
 5 files changed, 122 insertions(+), 13 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 184fa9c..3cf233b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,6 +31,7 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/syscore_ops.h>
 
 #include <xen/xen.h>
 #include <xen/interface/xen.h>
@@ -1439,38 +1440,126 @@ asmlinkage void __init xen_start_kernel(void)
 #endif
 }
 
-void __ref xen_hvm_init_shared_info(void)
+#ifdef CONFIG_XEN_PVHVM
+/*
+ * The pfn containing the shared_info is located somewhere in RAM. This
+ * will cause trouble if the current kernel is doing a kexec boot into a
+ * new kernel. The new kernel (and its startup code) can not know where
+ * the pfn is, so it can not reserve the page. The hypervisor will
+ * continue to update the pfn, and as a result memory corruption occurs
+ * in the new kernel.
+ *
+ * One way to work around this issue is to allocate a page in the
+ * xen-platform pci device's BAR memory range. But pci init is done very
+ * late and the shared_info page is already in use very early to read
+ * the pvclock. So moving the pfn from RAM to MMIO is racy because some
+ * code paths on other vcpus could access the pfn during the small
+ * window when the old pfn is moved to the new pfn. There is even a
+ * small window where the old pfn is not backed by a mfn, and during that
+ * time all reads return -1.
+ *
+ * Because it is not known upfront where the MMIO region is located it
+ * can not be used right from the start in xen_hvm_init_shared_info.
+ *
+ * To minimise trouble the move of the pfn is done shortly before kexec.
+ * This does not eliminate the race because all vcpus are still online
+ * when the syscore_ops will be called. But hopefully there is no work
+ * pending at this point in time. Also the syscore_op is run last which
+ * reduces the risk further.
+ */
+
+static struct shared_info *xen_hvm_shared_info;
+
+static void xen_hvm_connect_shared_info(unsigned long pfn)
 {
-   int cpu;
struct xen_add_to_physmap xatp;
-   static struct shared_info *shared_info_page = 0;
 
-   if (!shared_info_page)
-   shared_info_page = (struct shared_info *)
-   extend_brk(PAGE_SIZE, PAGE_SIZE);
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+   xatp.gpfn = pfn;
	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
BUG();
 
-   HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+}
+static void xen_hvm_set_shared_info(struct shared_info *sip)
+{
+   int cpu;
+
+   HYPERVISOR_shared_info = sip;
 
/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 * page, we use it in the event channel upcall and in some pvclock
 * related functions. We don't need the vcpu_info placement
 * optimizations because we don't use any pv_mmu or pv_irq op on
 * HVM.
-* When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
-* online but xen_hvm_init_shared_info is run at resume time too and
+* When xen_hvm_set_shared_info is run at boot time only vcpu 0 is
+* online but xen_hvm_set_shared_info is run at resume time too and
 * in that case multiple vcpus might be online. */
for_each_online_cpu(cpu) {


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-16 Thread Konrad Rzeszutek Wilk
On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Keir Fraser wrote:
> 
> > Best thing to do, if possible, is map the shared-info page in the
> > xen-platform pci device's BAR memory range. Then it will not conflict with
> > any RAM.
> 
> This patch does that. I did a kexec boot and a save/restore.
> > It does not deal with the possible race where the old pfn is not backed
> by a mfn.
> 
> Olaf
> 
> 
> commit a9d5567c67a2317c9db174a4deef6c5690220749
> Author: Olaf Hering 
> Date:   Thu Jul 12 19:38:39 2012 +0200
> 
> xen PVonHVM: move shared info page from RAM to MMIO
> 
> Signed-off-by: Olaf Hering 
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..8a743af 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
>   return 0;
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +static struct shared_info *hvm_shared_info;
> +static unsigned long hvm_shared_info_pfn;
> +

Please include a big comment explaining why this machination is OK
to do multiple times. If you can, include the juicy snippets of the
email conversation in this comment so that in 6 months if somebody
looks at this they have a good understanding of it.

> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
>  {
>   int cpu;
>   struct xen_add_to_physmap xatp;
> - static struct shared_info *shared_info_page = 0;
>  
> - if (!shared_info_page)
> - shared_info_page = (struct shared_info *)
> - extend_brk(PAGE_SIZE, PAGE_SIZE);
>   xatp.domid = DOMID_SELF;
>   xatp.idx = 0;
>   xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = pfn;
>   if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>   BUG();
>  
> - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> + HYPERVISOR_shared_info = sip;
>  
>   /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>* page, we use it in the event channel upcall and in some pvclock
> @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
>   for_each_online_cpu(cpu) {
>   per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
>   }
> + mb();
>  }
>  
> +/* Reconnect the pfn to a mfn */
> +void __ref xen_hvm_resume_shared_info(void)
> +{
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +/* Move the pfn in RAM to a pfn in MMIO space */
> +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
> +{
> + struct xen_memory_reservation reservation = {
> + .domid = DOMID_SELF,
> + .nr_extents = 1,
> + };
> + int rc;
> +
> + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
> +
> + /* Move pfn, disconnects previous pfn from mfn */
> + set_shared_info(sip, pfn);
> +
> + /* Allocate new mfn for previous pfn */
> + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> + WARN_ON(rc != 1);

Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn
back to the shared page?

> +
> + /* Remember final pfn and pointer for resume */
> + hvm_shared_info_pfn = pfn;
> + hvm_shared_info = sip;
> +}
> +
> +/* Use a pfn in RAM until PCI init is done. */
> +static void __ref xen_hvm_initial_shared_info(void)
> +{
> + /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */

You could just have a page in the .bss that has __init_data. During
bootup it would be recycled - and since the platform PCI driver is always
built in, it would not be a problem I think?

Though if somebody does not compile with CONFIG_XEN_PVHVM then
it would make sense to keep the existing allocation from the .brk.

> + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +
>  #ifdef CONFIG_XEN_PVHVM
>  static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
>   unsigned long action, void *hcpu)
> @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
>   if (r < 0)
>   return;
>  
> - xen_hvm_init_shared_info();

Put here a comment saying:
/* Initializing a RAM PFN that is later going to be replaced with
 * an MMIO PFN. */
> + xen_hvm_initial_shared_info();
>  
>   if (xen_feature(XENFEAT_hvm_callback_vector))
>   xen_have_vector_callback = 1;
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 45329c8..ae8a00c 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-15 Thread Keir Fraser
On 15/07/2012 17:06, "Olaf Hering"  wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> Best thing to do, if possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
> 
> This patch does that. I did a kexec boot and a save/restore.
> It does not deal with the possible race where the old pfn is not backed
> by a mfn.

Looks good to me.

 -- Keir

> Olaf
> 
> 
> commit a9d5567c67a2317c9db174a4deef6c5690220749
> Author: Olaf Hering 
> Date:   Thu Jul 12 19:38:39 2012 +0200
> 
> xen PVonHVM: move shared info page from RAM to MMIO
> 
> Signed-off-by: Olaf Hering 
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..8a743af 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
> return 0;
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +static struct shared_info *hvm_shared_info;
> +static unsigned long hvm_shared_info_pfn;
> +
> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
>  {
> int cpu;
> struct xen_add_to_physmap xatp;
> - static struct shared_info *shared_info_page = 0;
>  
> - if (!shared_info_page)
> -  shared_info_page = (struct shared_info *)
> -   extend_brk(PAGE_SIZE, PAGE_SIZE);
> xatp.domid = DOMID_SELF;
> xatp.idx = 0;
> xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = pfn;
> if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> BUG();
>  
> - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> + HYPERVISOR_shared_info = sip;
>  
> /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
> * page, we use it in the event channel upcall and in some pvclock
> @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
> for_each_online_cpu(cpu) {
> per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
> }
> + mb();
>  }
>  
> +/* Reconnect the pfn to a mfn */
> +void __ref xen_hvm_resume_shared_info(void)
> +{
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +/* Move the pfn in RAM to a pfn in MMIO space */
> +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
> +{
> + struct xen_memory_reservation reservation = {
> +  .domid = DOMID_SELF,
> +  .nr_extents = 1,
> + };
> + int rc;
> +
> + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
> +
> + /* Move pfn, disconnects previous pfn from mfn */
> + set_shared_info(sip, pfn);
> +
> + /* Allocate new mfn for previous pfn */
> + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> + WARN_ON(rc != 1);
> +
> + /* Remember final pfn and pointer for resume */
> + hvm_shared_info_pfn = pfn;
> + hvm_shared_info = sip;
> +}
> +
> +/* Use a pfn in RAM until PCI init is done. */
> +static void __ref xen_hvm_initial_shared_info(void)
> +{
> + /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */
> + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +
>  #ifdef CONFIG_XEN_PVHVM
>  static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
>unsigned long action, void *hcpu)
> @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
> if (r < 0)
> return;
>  
> - xen_hvm_init_shared_info();
> + xen_hvm_initial_shared_info();
>  
> if (xen_feature(XENFEAT_hvm_callback_vector))
> xen_have_vector_callback = 1;
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 45329c8..ae8a00c 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
> int cpu;
> - xen_hvm_init_shared_info();
> + xen_hvm_resume_shared_info();
> xen_callback_vector();
> xen_unplug_emulated_devices();
> if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 202d4c1..1e4329e 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -41,7 +41,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 97ca359..cbe866b 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev
> *pdev,
> long ioaddr;
> long mmio_addr, mmio_len;
> unsigned int max_nr_gframes;
> + unsigned long addr;
> + struct shared_info *hvm_shared_info;
>  
> if (!xen_domain())
> return -ENODEV;
> @@ -138,6 +140,11 @@ static 

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-15 Thread Olaf Hering
On Tue, Jul 10, Keir Fraser wrote:

> Best thing to do, if possible, is map the shared-info page in the
> xen-platform pci device's BAR memory range. Then it will not conflict with
> any RAM.

This patch does that. I did a kexec boot and a save/restore.
It does not deal with the possible race where the old pfn is not backed
by a mfn.

Olaf


commit a9d5567c67a2317c9db174a4deef6c5690220749
Author: Olaf Hering 
Date:   Thu Jul 12 19:38:39 2012 +0200

xen PVonHVM: move shared info page from RAM to MMIO

Signed-off-by: Olaf Hering 

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..8a743af 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
return 0;
 }
 
-void __ref xen_hvm_init_shared_info(void)
+static struct shared_info *hvm_shared_info;
+static unsigned long hvm_shared_info_pfn;
+
+static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
 {
int cpu;
struct xen_add_to_physmap xatp;
-   static struct shared_info *shared_info_page = 0;
 
-   if (!shared_info_page)
-   shared_info_page = (struct shared_info *)
-   extend_brk(PAGE_SIZE, PAGE_SIZE);
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+   xatp.gpfn = pfn;
	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
BUG();
 
-   HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+   HYPERVISOR_shared_info = sip;
 
/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 * page, we use it in the event channel upcall and in some pvclock
@@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
for_each_online_cpu(cpu) {
	per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
}
+   mb();
 }
 
+/* Reconnect the pfn to a mfn */
+void __ref xen_hvm_resume_shared_info(void)
+{
+   set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+/* Move the pfn in RAM to a pfn in MMIO space */
+void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
+{
+   struct xen_memory_reservation reservation = {
+   .domid = DOMID_SELF,
+   .nr_extents = 1,
+   };
+   int rc;
+
+   set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
+
+   /* Move pfn, disconnects previous pfn from mfn */
+   set_shared_info(sip, pfn);
+
+   /* Allocate new mfn for previous pfn */
+   rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+   WARN_ON(rc != 1);
+
+   /* Remember final pfn and pointer for resume */
+   hvm_shared_info_pfn = pfn;
+   hvm_shared_info = sip;
+}
+
+/* Use a pfn in RAM until PCI init is done. */
+static void __ref xen_hvm_initial_shared_info(void)
+{
+   /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */
+   hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
+   set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+
 #ifdef CONFIG_XEN_PVHVM
 static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
unsigned long action, void *hcpu)
@@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
if (r < 0)
return;
 
-   xen_hvm_init_shared_info();
+   xen_hvm_initial_shared_info();
 
if (xen_feature(XENFEAT_hvm_callback_vector))
xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
int cpu;
-   xen_hvm_init_shared_info();
+   xen_hvm_resume_shared_info();
xen_callback_vector();
xen_unplug_emulated_devices();
if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..1e4329e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 97ca359..cbe866b 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
long ioaddr;
long mmio_addr, mmio_len;
unsigned int max_nr_gframes;
+   unsigned long addr;

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-15 Thread Olaf Hering
On Tue, Jul 10, Keir Fraser wrote:

 Best thing to do, is possible, is map the shared-info page in the
 xen-platform pci device's BAR memory range. Then it will not conflict with
 any RAM.

This patch does that. I did a kexec boot and a save/restore.
It does not deal with the possible race were the old pfn is not backed
by a mfn.

Olaf


commit a9d5567c67a2317c9db174a4deef6c5690220749
Author: Olaf Hering o...@aepfle.de
Date:   Thu Jul 12 19:38:39 2012 +0200

xen PVonHVM: move shared info page from RAM to MMIO

Signed-off-by: Olaf Hering oher...@suse.de

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..8a743af 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
return 0;
 }
 
-void __ref xen_hvm_init_shared_info(void)
+static struct shared_info *hvm_shared_info;
+static unsigned long hvm_shared_info_pfn;
+
+static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
 {
int cpu;
struct xen_add_to_physmap xatp;
-   static struct shared_info *shared_info_page = 0;
 
-   if (!shared_info_page)
-   shared_info_page = (struct shared_info *)
-   extend_brk(PAGE_SIZE, PAGE_SIZE);
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page)  PAGE_SHIFT;
+   xatp.gpfn = pfn;
if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, xatp))
BUG();
 
-   HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+   HYPERVISOR_shared_info = sip;
 
/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 * page, we use it in the event channel upcall and in some pvclock
@@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
for_each_online_cpu(cpu) {
per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
}
+   mb();
 }
 
+/* Reconnect the pfn to a mfn */
+void __ref xen_hvm_resume_shared_info(void)
+{
+   set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+/* Move the pfn in RAM to a pfn in MMIO space */
+void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
+{
+   struct xen_memory_reservation reservation = {
+   .domid = DOMID_SELF,
+   .nr_extents = 1,
+   };
+   int rc;
+
+   set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
+
+   /* Move pfn, disconnects previous pfn from mfn */
+   set_shared_info(sip, pfn);
+
+   /* Allocate new mfn for previous pfn */
+   rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+   WARN_ON(rc != 1);
+
+   /* Remember final pfn and pointer for resume */
+   hvm_shared_info_pfn = pfn;
+   hvm_shared_info = sip;
+}
+
+/* Use a pfn in RAM until PCI init is done. */
+static void __ref xen_hvm_initial_shared_info(void)
+{
+   /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */
+   hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+   hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
+   set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+
 #ifdef CONFIG_XEN_PVHVM
 static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
unsigned long action, void *hcpu)
@@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
if (r < 0)
return;
 
-   xen_hvm_init_shared_info();
+   xen_hvm_initial_shared_info();
 
if (xen_feature(XENFEAT_hvm_callback_vector))
xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
int cpu;
-   xen_hvm_init_shared_info();
+   xen_hvm_resume_shared_info();
xen_callback_vector();
xen_unplug_emulated_devices();
if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..1e4329e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 97ca359..cbe866b 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
long ioaddr;
long mmio_addr, mmio_len;

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-15 Thread Keir Fraser
On 15/07/2012 17:06, Olaf Hering o...@aepfle.de wrote:

 On Tue, Jul 10, Keir Fraser wrote:
 
 Best thing to do, if possible, is map the shared-info page in the
 xen-platform pci device's BAR memory range. Then it will not conflict with
 any RAM.
 
 This patch does that. I did a kexec boot and a save/restore.
 It does not deal with the possible race where the old pfn is not backed
 by a mfn.

Looks good to me.

 -- Keir


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-13 Thread Keir Fraser
On 13/07/2012 21:20, "Olaf Hering"  wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> On 10/07/2012 19:09, "Olaf Hering"  wrote:
>>> I'm not sure, most likely the gfn will just disappear from the guest,
>>> like a ballooned page disappears. Accessing it will likely cause a
>>> crash.
>> 
>> Best thing to do, if possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
>> 
>> If you do map it over the top of an existing RAM page, you will have to
>> repopulate that RAM page before kexec, using populate_physmap hypercall. The
>> good news is that the populate_physmap hypercall will have the side effect
>> of unmapping the shared-info page, reayd to be mapped wherever the new
>> kernel would like it to reside :)
> 
> Keir,
> 
> is this a safe thing to do in a SMP guest?
> If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page
> (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and
> *xen_vcpu then everything will reference these pointers.

So pfn A now points at shared_info, and mfn M is lost (freed back to Xen).
Xen_vcpu doesn't come into it, you'd have that mapped at yet another pfn.

> If drivers/xen/platform-pci.c:platform_pci_init would also do a
> XENMAPSPACE_shared_info call with pfn B, isn't there a small window where
> pfn A is not backed by a mfn because mfn M is now connected to pfn C? As
> a result other code paths which access *HYPERVISOR_shared_info and
> *xen_vcpu between the hypercall and the update of the pointers will read
> 0xff.

Don't really understand this. After the XENMAPSPACE_shared_info call:
 * PFN B points at shared_info, mfn M_B it previously mapped is lost (freed
back to Xen).
 * PFN A maps nothing, reads return all-1s.

Yes, obviously you can't atomically update the mapping of shinfo from A->B,
and update your pointer in the kernel at exactly the same time. Presumably
you do this early during boot, or late during kexec, or otherwise at a time
when other processors are not expected to touch shinfo.

> 
> If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn
> backing *HYPERVISOR_shared_info will remain the same, so there is no need
> to copy data from the old to the new *HYPERVISOR_shared_info.

That is correct.

> What do you think, is that race real?

I suppose it is. I didn't imagine it would be a troublesome one though.

 -- Keir

> Olaf


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-13 Thread Olaf Hering
On Tue, Jul 10, Keir Fraser wrote:

> On 10/07/2012 19:09, "Olaf Hering"  wrote:
> > I'm not sure, most likely the gfn will just disappear from the guest,
> > like a ballooned page disappears. Accessing it will likely cause a
> > crash.
> 
> Best thing to do, if possible, is map the shared-info page in the
> xen-platform pci device's BAR memory range. Then it will not conflict with
> any RAM.
> 
> If you do map it over the top of an existing RAM page, you will have to
> repopulate that RAM page before kexec, using populate_physmap hypercall. The
> good news is that the populate_physmap hypercall will have the side effect
 of unmapping the shared-info page, ready to be mapped wherever the new
> kernel would like it to reside :)

Keir,

is this a safe thing to do in a SMP guest?
If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page
(backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and
*xen_vcpu then everything will reference these pointers.

If drivers/xen/platform-pci.c:platform_pci_init would also do a
XENMAPSPACE_shared_info call with pfn B, isn't there a small window where
pfn A is not backed by a mfn because mfn M is now connected to pfn C? As
a result other code paths which access *HYPERVISOR_shared_info and
*xen_vcpu between the hypercall and the update of the pointers will read
0xff.


If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn
backing *HYPERVISOR_shared_info will remain the same, so there is no need
to copy data from the old to the new *HYPERVISOR_shared_info.

What do you think, is that race real?

Olaf




Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Keir Fraser
On 10/07/2012 19:09, "Olaf Hering"  wrote:

>>> Are there more shared areas or is it just the shared info page?
>>> 
 And I am kind of worried that moving it to the .data section won't
 be completely safe - as the decompressor might blow away that part too.
>>> 
>>> The decompressor may just clear the area, but since there is no way to
>>> tell where the shared pages are it's always a risk to allocate them at
>>> compile time.
>> 
>> Yeah, and with the hypervisor potentially still updating the "old"
>> MFN before the new kernel has registered the new MFN, we can end up
>> corrupting the new kernel. Ouch.
>> 
>> Would all of these issues disappear if the hypervisor had a hypercall
>> that would stop updating the shared info? or just deregister the MFN?
>> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
>> Would that eliminate the pesky GMFN?
> 
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

Best thing to do, if possible, is map the shared-info page in the
xen-platform pci device's BAR memory range. Then it will not conflict with
any RAM.

If you do map it over the top of an existing RAM page, you will have to
repopulate that RAM page before kexec, using populate_physmap hypercall. The
good news is that the populate_physmap hypercall will have the side effect
of unmapping the shared-info page, ready to be mapped wherever the new
kernel would like it to reside :)

Hope this clears up some of the confusion. ;)

 -- Keir




Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 08:09:53PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> 
> > On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > > I was not thinking of statically allocated pages but some new concept of
> > > allocating such shared pages. Shouldn't there be some dedicated area in
> > > the E820 table which has to be used during the whole life time of the
> > > guest?
> > 
> > Not that I can see. But I don't see why that could not be added? Perhaps
> > the HVM loader can make it happen? But then how would it tell the kernel
> > that this E820_RESERVED is the shared_info one. Not the other ones..
> 
> Maybe just use a new E820 type for this sort of thing? It's just the

Ewww.
> question whether some other OS can cope with an unknown type. From my
> reading of the e820 related code a region with an unknown type is just
> ignored.

Sure. And we could scan it.. but scanning E820_UNKNOWN for some magic
header seems .. hacky.

> 
> > > Are there more shared areas or is it just the shared info page?
> > > 
> > > > And I am kind of worried that moving it to the .data section won't
> > > > be completely safe - as the decompressor might blow away that part too.
> > > 
> > > The decompressor may just clear the area, but since there is no way to
> > > tell where the shared pages are it's always a risk to allocate them at
> > > compile time.
> > 
> > Yeah, and with the hypervisor potentially still updating the "old"
> > MFN before the new kernel has registered the new MFN, we can end up
> > corrupting the new kernel. Ouch.
> > 
> > Would all of these issues disappear if the hypervisor had a hypercall
> > that would stop updating the shared info? or just deregister the MFN?
> > What if you ripped the GMFN out using 'decrease_reservation' hypercall?
> > Would that eliminate the pesky GMFN?
> 
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

What about an populate_physmap right afterwards to stick a newly
minted GMFN in its place? I don't really know whether this dance
of balloon out/balloon in the same GMFN will break the shared_info
relationship. Perhaps not?

What we are going for is to stop the hypervisor from using the shared_info
MFN... perhaps there are other ways to do this?


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

> On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > I was not thinking of statically allocated pages but some new concept of
> > allocating such shared pages. Shouldn't there be some dedicated area in
> > the E820 table which has to be used during the whole life time of the
> > guest?
> 
> Not that I can see. But I don't see why that could not be added? Perhaps
> the HVM loader can make it happen? But then how would it tell the kernel
> that this E820_RESERVED is the shared_info one. Not the other ones..

Maybe just use a new E820 type for this sort of thing? It's just the
question whether some other OS can cope with an unknown type. From my
reading of the e820 related code a region with an unknown type is just
ignored.

> > Are there more shared areas or is it just the shared info page?
> > 
> > > And I am kind of worried that moving it to the .data section won't
> > > be completely safe - as the decompressor might blow away that part too.
> > 
> > The decompressor may just clear the area, but since there is no way to
> > tell where the shared pages are it's always a risk to allocate them at
> > compile time.
> 
> Yeah, and with the hypervisor potentially still updating the "old"
> MFN before the new kernel has registered the new MFN, we can end up
> corrupting the new kernel. Ouch.
> 
> Would all of these issues disappear if the hypervisor had a hypercall
> that would stop updating the shared info? or just deregister the MFN?
> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
> Would that eliminate the pesky GMFN?

I'm not sure, most likely the gfn will just disappear from the guest,
like a ballooned page disappears. Accessing it will likely cause a
crash.

Olaf


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> 
> > On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> > > On Fri, Jul 06, Olaf Hering wrote:
> > > 
> > > > On Fri, Jul 06, Jan Beulich wrote:
> > > > 
> > > > > > Could it be that some code tweaks the stack content used by 
> > > > > > decompress()
> > > > > > in some odd way? But that would most likely lead to a crash, not to
> > > > > > unexpected uncompressing results.
> > > > > 
> > > > > Especially if the old and new kernel are using the exact same
> > > > > image, how about the decompression writing over the shared
> > > > > info page causing all this? As the decompressor wouldn't
> > > > > expect Xen to possibly write stuff there itself, it could easily be
> > > > > that some repeat count gets altered, thus breaking the
> > > > > decompressed data without the decompression code necessarily
> > > > > noticing.
> > > > 
> > > > In my case the gfn of the shared info page is 1f54.
> > > > Is it possible to move the page at runtime? It looks like it can be done
> > > > because the hvm loader configures f initially.
> > > > 
> > > > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > > > some dedicated unused page during the process of booting into the new
> > > > kernel.
> > > 
> > > The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> > > pfn32080 KiB   <= ok, so the old .bss
> > 
> > > _head  29360 KiB
> > > _text  33826 KiB
> > > _end   33924 KiB
> > 
> > So _head is at 1CAC and _text starts at  2108h?
> > Ugh, and 1F54 gets overridden. And with your patch, the data gets
> > stuck in between _text and _end? No wait, where would the shared_info
> > be located in the old kernel? Somewhere below the 1CACh?
> 
> The 3 symbols above are from bzImage, which contains the gzipped vmlinux
> and some code to decompress and actually start the uncompressed vmlinux.
> 
> > I presume 1F54 is the _brk_end for the old kernel as well?
> 
> It's in the .brk section of the old kernel.
> 
> > Could you tell me how the decompress code works? Is the new kernel
> > put at PFN 1000h and the decompressor code is put below it?
> 
> I'm not too familiar with the details of the events that happen during
> kexec -e. This is what happens from my understanding:
> kexec -l loads the new kernel and some helper code into some allocated
> memory. kexec -e starts the helper code which relocates the new bzImage
> to its new location (_head) and starts it. The new bzImage uncompresses
> the vmlinux to its final location and finally starts the new vmlinux.
> 
> 
> Now that I think about it, during the relocation of the new bzImage the
> part which contains the compressed vmlinux will already be corrupted
> because the shared info page will be modified by Xen (I don't know in
> what intervals the page gets modified).
> 
> > And the decompressor code uses the .bss section of the "new" kernel
> > to do its deed - since it assumes that the carcass of the old kernel
> > is truly dead and it is not raising a hand saying: "I am not dead yet!".
> 
> The decompressor uses its own .bss.
> 
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernel's .data section. Won't
> > we run into this problem again?
> 
> The old kernel is dead at this point. If both kernels have the same
> memory layout then the decompressor will clear the page. If they have a
> different layout the .data section (or whatever happens to be there) of
> the new kernel will be corrupted.
> 
> > And what about the new kernel? It will try to register at a new
> > MFN location the VCPU structure. Is that something that the hypervisor
> > is OK with? Does that work with more than one VCPU? Or is it stuck with
> > just one VCPU (b/c it couldn't actually do the hypercall?).
> 
> So far I haven't seen an issue because my guest uses a single cpu.
> 
> > Or is the registration OK, since the new kernel has the same layout
> > so it registers at the same MFN as the "dead" kernel and it works
> > peachy? What if the kernel version used in the kexec is different
> > from the old one (say it has less built in things)? That would mean
> > the .text and .data section are different than the "dead" kernel?
> 
> Yes, the layout would differ. During decompression corruption may
> occur.
> 
> > > In the patch below the pfn is moved from the bss to the data section. As
> > > a result the new location is now at 28680 KiB, which is outside of the
> > > bzImage.
> > > 
> > > Maybe there is a way to use another dedicated page as shared info page.
> > 
> > That would do it, but it has the negative consequence that we end up
> > consuming an extra PAGE_SIZE that on baremetal kernels won't be used.
> 
> I was not thinking of statically allocated pages but some new concept of
> allocating such shared pages. Shouldn't there be some dedicated area in
> the E820 table which has 

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Tue, Jul 10, Ian Campbell wrote:

> On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> > > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > > > Which brings me to another question - say we do use this patch, what
> > > > if the decompressor overwrites the old kernel's .data section. Won't
> > > > we run into this problem again?
> > > 
> > > I've not really been following this thread that closely but wouldn't the
> > > right answer be for the original kernel to unmap the shared info on
> > > kexec? Or maybe remap it up to some high/reserved address? Can it read
> > 
> > That would be the right answer I think, but I don't see a VCPU_deregister
> > call (only VCPU_register).
> 
> Is the issue here vcpuinfo or the shared info (or both)?

shared info is the issue in PVonHVM.

Olaf


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Ian Campbell
On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > > Which brings me to another question - say we do use this patch, what
> > > if the decompressor overwrites the old kernel's .data section. Won't
> > > we run into this problem again?
> > 
> > I've not really been following this thread that closely but wouldn't the
> > right answer be for the original kernel to unmap the shared info on
> > kexec? Or maybe remap it up to some high/reserved address? Can it read
> 
> That would be the right answer I think, but I don't see a VCPU_deregister
> call (only VCPU_register).

Is the issue here vcpuinfo or the shared info (or both)?

> But perhaps the XENMEM_decrease_reservation for the particular MFN is the
> answer to do a VCPU "de-register" ?
> 
> > the original address used by hvmloader at start of day and reuse that?
> 
> Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
> could only do one registration.

I think you can only have one mapping but you can move it by registering
it again.

Ian.




Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

> On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> > On Fri, Jul 06, Olaf Hering wrote:
> > 
> > > On Fri, Jul 06, Jan Beulich wrote:
> > > 
> > > > > Could it be that some code tweaks the stack content used by 
> > > > > decompress()
> > > > > in some odd way? But that would most likely lead to a crash, not to
> > > > > unexpected uncompressing results.
> > > > 
> > > > Especially if the old and new kernel are using the exact same
> > > > image, how about the decompression writing over the shared
> > > > info page causing all this? As the decompressor wouldn't
> > > > expect Xen to possibly write stuff there itself, it could easily be
> > > > that some repeat count gets altered, thus breaking the
> > > > decompressed data without the decompression code necessarily
> > > > noticing.
> > > 
> > > In my case the gfn of the shared info page is 1f54.
> > > Is it possible to move the page at runtime? It looks like it can be done
> > > because the hvm loader configures f initially.
> > > 
> > > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > > some dedicated unused page during the process of booting into the new
> > > kernel.
> > 
> > The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> > pfn    32080 KiB   <= ok, so the old .bss
> 
> > _head  29360 KiB
> > _text  33826 KiB
> > _end   33924 KiB
> 
> So _head is at 1CAC and _text starts at  2108h?
> Ugh, and 1F54 gets overridden. And with your patch, the data gets
> stuck in between _text and _end? No wait, where would the shared_info
> be located in the old kernel? Somewhere below the 1CACh?

The 3 symbols above are from bzImage, which contains the gzipped vmlinux
and some code to decompress and actually start the uncompressed vmlinux.

> I presume 1F54 is the _brk_end for the old kernel as well?

It's in the .brk section of the old kernel.

> Could you tell me how the decompress code works? Is the new kernel
> put at PFN 1000h and the decompressor code is put below it?

I'm not too familiar with the details of the events that happen during
kexec -e. This is what happens from my understanding:
kexec -l loads the new kernel and some helper code into some allocated
memory. kexec -e starts the helper code which relocates the new bzImage
to its new location (_head) and starts it. The new bzImage uncompresses
the vmlinux to its final location and finally starts the new vmlinux.


Now that I think about it, during the relocation of the new bzImage the
part which contains the compressed vmlinux will already be corrupted
because the shared info page will be modified by Xen (I don't know at
what intervals the page gets modified).

> And the decompressor code uses the .bss section of the "new" kernel
> to do its deed - since it assumes that the carcass of the old kernel
> is truly dead and it is not raising a hand saying: "I am not dead yet!".

The decompressor uses its own .bss.

> Which brings me to another question - say we do use this patch, what
> if the decompressor overwrites the old kernels .data section. Won't
> we run into this problem again?

The old kernel is dead at this point. If both kernels have the same
memory layout then the decompressor will clear the page. If they have a
different layout the .data section (or whatever happens to be there) of
the new kernel will be corrupted.

> And what about the new kernel? It will try to register at a new
> MFN location the VCPU structure. Is that something that the hypervisor
> is OK with? Does that work with more than one VCPU? Or is it stuck with
> just one VCPU (b/c it couldn't actually do the hypercall?).

So far I haven't seen an issue because my guest uses a single CPU.

> Or is the registration OK, since the new kernel has the same layout
> so it registers at the same MFN as the "dead" kernel and it works
> peachy? What if the kernel version used in the kexec is different
> from the old one (say it has less built in things)? That would mean
> the .text and .data section are different than the "dead" kernel?

Yes, the layout would differ. During decompression corruption may
occur.

> > In the patch below the pfn is moved from the bss to the data section. As
> > a result the new location is now at 28680 KiB, which is outside of the
> > bzImage.
> > 
> > Maybe there is a way to use another dedicated page as shared info page.
> 
> That would do it, but it has the negative consequence that we end up
> consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

I was not thinking of statically allocated pages but some new concept of
allocating such shared pages. Shouldn't there be some dedicated area in
the E820 table which has to be used during the whole life time of the
guest?
Are there more shared areas or is it just the shared info page?

> And I am kind of worried that moving it to the .data section won't
> be completely safe - as the decompressor might blow away that part too.

The decompressor may just clear the area, but since there is no way to
tell where the shared pages are it's always a risk to allocate them at
compile time.

Olaf

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernels .data section. Won't
> > we run into this problem again?
> 
> I've not really been following this thread that closely but wouldn't the
> right answer be for the original kernel to unmap the shared info on
> kexec? Or maybe remap it up to some high/reserved address? Can it read

That would be the right answer I think, but I don't see a VCPU_deregister
call (only VCPU_register).

But perhaps the XENMEM_decrease_reservation for the particular MFN is the
answer to do a VCPU "de-register" ?

> the original address used by hvmloader at start of day and reuse that?

Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
could only do one registration.

> 
> Ian.
> 


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Ian Campbell
On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> Which brings me to another question - say we do use this patch, what
> if the decompressor overwrites the old kernels .data section. Won't
> we run into this problem again?

I've not really been following this thread that closely but wouldn't the
right answer be for the original kernel to unmap the shared info on
kexec? Or maybe remap it up to some high/reserved address? Can it read
the original address used by hvmloader at start of day and reuse that?

Ian.




Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> On Fri, Jul 06, Olaf Hering wrote:
> 
> > On Fri, Jul 06, Jan Beulich wrote:
> > 
> > > > Could it be that some code tweaks the stack content used by decompress()
> > > > in some odd way? But that would most likely lead to a crash, not to
> > > > unexpected uncompressing results.
> > > 
> > > Especially if the old and new kernel are using the exact same
> > > image, how about the decompression writing over the shared
> > > info page causing all this? As the decompressor wouldn't
> > > expect Xen to possibly write stuff there itself, it could easily be
> > > that some repeat count gets altered, thus breaking the
> > > decompressed data without the decompression code necessarily
> > > noticing.
> > 
> > In my case the gfn of the shared info page is 1f54.
> > Is it possible to move the page at runtime? It looks like it can be done
> > because the hvm loader configures f initially.
> > 
> > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > some dedicated unused page during the process of booting into the new
> > kernel.
> 
> The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> pfn    32080 KiB   <= ok, so the old .bss

> _head  29360 KiB
> _text  33826 KiB
> _end   33924 KiB

So _head is at 1CAC and _text starts at  2108h?
Ugh, and 1F54 gets overridden. And with your patch, the data gets
stuck in between _text and _end? No wait, where would the shared_info
be located in the old kernel? Somewhere below the 1CACh?

I presume 1F54 is the _brk_end for the old kernel as well?

Could you tell me how the decompress code works? Is the new kernel
put at PFN 1000h and the decompressor code is put below it?

And the decompressor code uses the .bss section of the "new" kernel
to do its deed - since it assumes that the carcass of the old kernel
is truly dead and it is not raising a hand saying: "I am not dead yet!".

Which brings me to another question - say we do use this patch, what
if the decompressor overwrites the old kernels .data section. Won't
we run into this problem again?

And what about the new kernel? It will try to register at a new
MFN location the VCPU structure. Is that something that the hypervisor
is OK with? Does that work with more than one VCPU? Or is it stuck with
just one VCPU (b/c it couldn't actually do the hypercall?).

Or is the registration OK, since the new kernel has the same layout
so it registers at the same MFN as the "dead" kernel and it works
peachy? What if the kernel version used in the kexec is different
from the old one (say it has less built in things)? That would mean
the .text and .data section are different than the "dead" kernel?

> 
> In other words, the compressed vmlinux.bin gets slightly modified during
> uncompression. For some reason this does not lead to decompression
> errors.

Hm
> 
> In the patch below the pfn is moved from the bss to the data section. As
> a result the new location is now at 28680 KiB, which is outside of the
> bzImage.
> 
> Maybe there is a way to use another dedicated page as shared info page.

That would do it, but it has the negative consequence that we end up
consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

Also, I think there are other issues lurking around. The other
users of the .bss (or rather of the extend_brk) call would also be
trashed. On PVHVM that would be the DMI. On PV guests (Daniel
has some thoughts on how to make kexec work under PV guests) that
means the P2M tree gets trashed.

And I am kind of worried that moving it to the .data section won't
be completely safe - as the decompressor might blow away that part too.


> 
> Olaf
> 
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..258e8f3 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
>  {
>   int cpu;
>   struct xen_add_to_physmap xatp;
> - static struct shared_info *shared_info_page = 0;
> + static struct shared_info shared_info_page __page_aligned_data = {  };
>  
> - if (!shared_info_page)
> - shared_info_page = (struct shared_info *)
> - extend_brk(PAGE_SIZE, PAGE_SIZE);
>   xatp.domid = DOMID_SELF;
>   xatp.idx = 0;
>   xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT;
>   if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>   BUG();
>  
> - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> + HYPERVISOR_shared_info = &shared_info_page;
>  
>   /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>* page, we use it in the event channel upcall and in some pvclock
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Fri, Jul 06, Olaf Hering wrote:

> On Fri, Jul 06, Jan Beulich wrote:
> 
> > > Could it be that some code tweaks the stack content used by decompress()
> > > in some odd way? But that would most likely lead to a crash, not to
> > > unexpected uncompressing results.
> > 
> > Especially if the old and new kernel are using the exact same
> > image, how about the decompression writing over the shared
> > info page causing all this? As the decompressor wouldn't
> > expect Xen to possibly write stuff there itself, it could easily be
> > that some repeat count gets altered, thus breaking the
> > decompressed data without the decompression code necessarily
> > noticing.
> 
> In my case the gfn of the shared info page is 1f54.
> Is it possible to move the page at runtime? It looks like it can be done
> because the hvm loader configures f initially.
> 
> Perhaps the PVonHVM code has to give up the shared pages or move them to
> some dedicated unused page during the process of booting into the new
> kernel.

The pfn 1f54 of the shared info page is in the middle of the new bzImage:
pfn    32080 KiB
_head  29360 KiB
_text  33826 KiB
_end   33924 KiB

In other words, the compressed vmlinux.bin gets slightly modified during
uncompression. For some reason this does not lead to decompression
errors.

In the patch below the pfn is moved from the bss to the data section. As
a result the new location is now at 28680 KiB, which is outside of the
bzImage.

Maybe there is a way to use another dedicated page as shared info page.

Olaf


diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..258e8f3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
 {
int cpu;
struct xen_add_to_physmap xatp;
-   static struct shared_info *shared_info_page = 0;
+   static struct shared_info shared_info_page __page_aligned_data = {  };
 
-   if (!shared_info_page)
-   shared_info_page = (struct shared_info *)
-   extend_brk(PAGE_SIZE, PAGE_SIZE);
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+   xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT;
	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
		BUG();
 
-   HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+   HYPERVISOR_shared_info = &shared_info_page;
 
/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 * page, we use it in the event channel upcall and in some pvclock


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Fri, Jul 06, Olaf Hering wrote:

 On Fri, Jul 06, Jan Beulich wrote:
 
   Could it be that some code tweaks the stack content used by decompress()
   in some odd way? But that would most likely lead to a crash, not to
   unexpected uncompressing results.
  
  Especially if the old and new kernel are using the exact same
  image, how about the decompression writing over the shared
  info page causing all this? As the decompressor wouldn't
  expect Xen to possibly write stuff there itself, it could easily be
  that some repeat count gets altered, thus breaking the
  decompressed data without the decompression code necessarily
  noticing.
 
 In my case the gfn of the shared info page is 1f54.
 Is it possible to move the page at runtime? It looks like it can be done
 because the hvm loader configures f initially.
 
 Perhaps the PVonHVM code has to give up the shared pages or move them to
 some dedicated unused page during the process of booting into the new
 kernel.

The pfn 1f54 of the shared info page is in the middle of the new bzImage:
pfn32080 KiB
_head  29360 KiB
_text  33826 KiB
_end   33924 KiB

In other words, the compressed vmlinux.bin gets slightly modified during
uncompression. For some reason this does not lead to decompression
errors.

In the patch below the pfn is moved from the bss to the data section. As
a result the new location is now at 28680 KiB, which is outside of the
bzImage.

Maybe there is a way to use another dedicated page as shared info page.

Olaf


diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..258e8f3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
 {
int cpu;
struct xen_add_to_physmap xatp;
-   static struct shared_info *shared_info_page = 0;
+   static struct shared_info shared_info_page __page_aligned_data = {  };
 
-   if (!shared_info_page)
-   shared_info_page = (struct shared_info *)
-   extend_brk(PAGE_SIZE, PAGE_SIZE);
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page)  PAGE_SHIFT;
+   xatp.gpfn = __pa(shared_info_page)  PAGE_SHIFT;
if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, xatp))
BUG();
 
-   HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+   HYPERVISOR_shared_info = shared_info_page;
 
/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 * page, we use it in the event channel upcall and in some pvclock
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
 On Fri, Jul 06, Olaf Hering wrote:
 
  On Fri, Jul 06, Jan Beulich wrote:
  
Could it be that some code tweaks the stack content used by decompress()
in some odd way? But that would most likely lead to a crash, not to
unexpected uncompressing results.
   
   Especially if the old and new kernel are using the exact same
   image, how about the decompression writing over the shared
   info page causing all this? As the decompressor wouldn't
   expect Xen to possibly write stuff there itself, it could easily be
   that some repeat count gets altered, thus breaking the
   decompressed data without the decompression code necessarily
   noticing.
  
  In my case the gfn of the shared info page is 1f54.
  Is it possible to move the page at runtime? It looks like it can be done
  because the hvm loader configures f initially.
  
  Perhaps the PVonHVM code has to give up the shared pages or move them to
  some dedicated unused page during the process of booting into the new
  kernel.
 
 The pfn 1f54 of the shared info page is in the middle of the new bzImage:
 pfn32080 KiB   = ok, so the old .bss

 _head  29360 KiB
 _text  33826 KiB
 _end   33924 KiB

So _head is at 1CAC and _text starts at  2108h?
Ugh, and 1F54 gets overriden. And with your patch, the data gets
stuck in between _text and _end? No wait, where would the shared_info
be located in the old kernel? Somewhere below the 1CACh?

I presume 1F54 is the _brk_end for the old kernel as well?

Could you tell me how the decompress code works? Is the new kernel
put at PFN 1000h and the decompressor code is put below it?

And the decompressor code uses the .bss section of the new kernel
to do its deed - since it assumes that the carcass of the old kernel
is truly dead and it is not raising a hand saying: I am not dead yet!.

Which brings me to another question - say we do use this patch, what
if the decompressor overwrites the old kernels .data section. Won't
we run into this problem again?

And what about the new kernel? It will try to register at a new
MFN location the VCPU structure. Is that something that the hypervisor
is OK with? Does that work with more than VCPU? Or is is stuck with
just one VCPU (b/c it couldn't actually do the hypercall?).

Or is the registration OK, since the new kernel has the same layout
so it registers at the same MFN as the dead kernel and it works
peachy? What if the kernel version used in the kexec is different
from the old one (say it has less built in things)? That would mean
the .text and .data section are different than the dead kernel?

 
 In other words, the compressed vmlinux.bin gets slightly modified during
 uncompression. For some reason this does not lead to decompression
 errors.

Hm
 
 In the patch below the pfn is moved from the bss to the data section. As
 a result the new location is now at 28680 KiB, which is outside of the
 bzImage.
 
 Maybe there is a way to use another dedicated page as shared info page.

That would do it, but it has the negative consequence that we end up
consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

Also, I think there are other issues lurking around. The other
users of the .bss (or rather of the extend_brk) call would also be
trashed. On PVHVM that would be the DMI. On PV guests (Daniel
has some thoughts on how to make kexec work under PV guests) that
means the P2M tree gets trashed.

And I am kind of worried that moving it to the .data section won't
be completly safe - as the decompressor might blow away that part too.


 
 Olaf
 
 
 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
 index ff962d4..258e8f3 100644
 --- a/arch/x86/xen/enlighten.c
 +++ b/arch/x86/xen/enlighten.c
 @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
  {
   int cpu;
   struct xen_add_to_physmap xatp;
 - static struct shared_info *shared_info_page = 0;
 + static struct shared_info shared_info_page __page_aligned_data = {  };
  
 - if (!shared_info_page)
 - shared_info_page = (struct shared_info *)
 - extend_brk(PAGE_SIZE, PAGE_SIZE);
   xatp.domid = DOMID_SELF;
   xatp.idx = 0;
   xatp.space = XENMAPSPACE_shared_info;
 - xatp.gpfn = __pa(shared_info_page)  PAGE_SHIFT;
 + xatp.gpfn = __pa(shared_info_page)  PAGE_SHIFT;
   if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, xatp))
   BUG();
  
 - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
 + HYPERVISOR_shared_info = shared_info_page;
  
   /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
* page, we use it in the event channel upcall and in some pvclock
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to 

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Ian Campbell
On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
 Which brings me to another question - say we do use this patch, what
 if the decompressor overwrites the old kernels .data section. Won't
 we run into this problem again?

I've not really been following this thread that closely but wouldn't the
right answer be for the original kernel to unmap the shared info on
kexec? Or maybe remap it up to some high/reserved address? Can it read
the original address used by hvmloader at start of day and reuse that?

Ian.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
 On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
  Which brings me to another question - say we do use this patch, what
  if the decompressor overwrites the old kernels .data section. Won't
  we run into this problem again?
 
 I've not really been following this thread that closely but wouldn't the
 right answer be for the original kernel to unmap the shared info on
 kexec? Or maybe remap it up to some high/reserved address? Can it read

That would be the right answer I think, but I don't see the a VCPU_deregister
call (only VCPU_register).

But perhaps the XENMEM_decrease_reservation for the particular MFN is the
answer to do a VCPU de-register ?

 the original address used by hvmloader at start of day and reuse that?

Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
could only do one registration.

 
 Ian.
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

 On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
  On Fri, Jul 06, Olaf Hering wrote:
  
   On Fri, Jul 06, Jan Beulich wrote:
   
 Could it be that some code tweaks the stack content used by 
 decompress()
 in some odd way? But that would most likely lead to a crash, not to
 unexpected uncompressing results.

Especially if the old and new kernel are using the exact same
image, how about the decompression writing over the shared
info page causing all this? As the decompressor wouldn't
expect Xen to possibly write stuff there itself, it could easily be
that some repeat count gets altered, thus breaking the
decompressed data without the decompression code necessarily
noticing.
   
   In my case the gfn of the shared info page is 1f54.
   Is it possible to move the page at runtime? It looks like it can be done
   because the hvm loader configures f initially.
   
   Perhaps the PVonHVM code has to give up the shared pages or move them to
   some dedicated unused page during the process of booting into the new
   kernel.
  
  The pfn 1f54 of the shared info page is in the middle of the new bzImage:
  pfn32080 KiB   = ok, so the old .bss
 
  _head  29360 KiB
  _text  33826 KiB
  _end   33924 KiB
 
 So _head is at 1CAC and _text starts at  2108h?
 Ugh, and 1F54 gets overriden. And with your patch, the data gets
 stuck in between _text and _end? No wait, where would the shared_info
 be located in the old kernel? Somewhere below the 1CACh?

The 3 symbols above are from bzImage, which contains the gzipped vmlinux
and some code to decompress and actually start the uncompressed vmlinux.

 I presume 1F54 is the _brk_end for the old kernel as well?

Its in the .brk section of the old kernel.

 Could you tell me how the decompress code works? Is the new kernel
 put at PFN 1000h and the decompressor code is put below it?

I'm not too familiar with the details of the events that happen during
kexec -e. This is what happens from my understanding:
kexec -l loads the new kernel and some helper code into some allocated
memory. kexec -e starts the helper code which relocates the new bzImage
to its new location (_head) and starts it. The new bzImage uncompresses
the vmlinux to its final location and finally starts the new vmlinux.


Now that I think about it, during the relocation of the new bzImage the
part which contains the compressed vmlinux will already be corrupted
because the shared info page will be modified by Xen (I dont know in
what intervals the page gets modified).

 And the decompressor code uses the .bss section of the new kernel
 to do its deed - since it assumes that the carcass of the old kernel
 is truly dead and it is not raising a hand saying: I am not dead yet!.

The decompressor uses its own .bss.

 Which brings me to another question - say we do use this patch, what
 if the decompressor overwrites the old kernels .data section. Won't
 we run into this problem again?

The old kernel is dead at this point. If both kernels have the same
memory layout then the decompressor will clear the page. If they have a
different layout the .data section (or whatever happens to be there) of
the new kernel will be corrupted.

 And what about the new kernel? It will try to register at a new
 MFN location the VCPU structure. Is that something that the hypervisor
 is OK with? Does that work with more than VCPU? Or is is stuck with
 just one VCPU (b/c it couldn't actually do the hypercall?).

So far I havent seen an issue because my guest uses a single cpu.

 Or is the registration OK, since the new kernel has the same layout
 so it registers at the same MFN as the dead kernel and it works
 peachy? What if the kernel version used in the kexec is different
 from the old one (say it has less built in things)? That would mean
 the .text and .data section are different than the dead kernel?

Yes, the layout would differ. During decompression corruption may
occour.

  In the patch below the pfn is moved from the bss to the data section. As
  a result the new location is now at 28680 KiB, which is outside of the
  bzImage.
  
  Maybe there is a way to use another dedicated page as shared info page.
 
 That would do it, but it has the negative consequence that we end up
 consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

I was not thinking of statically allocated pages but some new concept of
allocating such shared pages. Shouldnt there be some dedicated area in
the E820 table which has to be used during the whole life time of the
guest?
Are there more shared areas or is it just the shared info page?

 And I am kind of worried that moving it to the .data section won't
 be completly safe - as the decompressor might blow away that part too.

The decompressor may just clear the area, but since there is no way to
tell where the shared pages are its always a risk to allocate them at
compile time.

Olaf

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Ian Campbell
On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
 On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
  On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
   Which brings me to another question - say we do use this patch, what
   if the decompressor overwrites the old kernels .data section. Won't
   we run into this problem again?
  
  I've not really been following this thread that closely but wouldn't the
  right answer be for the original kernel to unmap the shared info on
  kexec? Or maybe remap it up to some high/reserved address? Can it read
 
 That would be the right answer I think, but I don't see the a VCPU_deregister
 call (only VCPU_register).

Is the issue here vcpuinfo or the shared info (or both)?

 But perhaps the XENMEM_decrease_reservation for the particular MFN is the
 answer to do a VCPU de-register ?
 
  the original address used by hvmloader at start of day and reuse that?
 
 Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
 could only do one registration.

I think you can only have one mapping but you can move it by registering
it again.

Ian.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Tue, Jul 10, Ian Campbell wrote:

 On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
  On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
   On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
Which brings me to another question - say we do use this patch, what
if the decompressor overwrites the old kernel's .data section. Won't
we run into this problem again?
   
   I've not really been following this thread that closely but wouldn't the
   right answer be for the original kernel to unmap the shared info on
   kexec? Or maybe remap it up to some high/reserved address? Can it read
  
  That would be the right answer I think, but I don't see a VCPU_deregister
  call (only VCPU_register).
 
 Is the issue here vcpuinfo or the shared info (or both)?

shared info is the issue in PVonHVM.

Olaf


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
 On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
 
  On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
   On Fri, Jul 06, Olaf Hering wrote:
   
On Fri, Jul 06, Jan Beulich wrote:

  Could it be that some code tweaks the stack content used by 
  decompress()
  in some odd way? But that would most likely lead to a crash, not to
  unexpected uncompressing results.
 
 Especially if the old and new kernel are using the exact same
 image, how about the decompression writing over the shared
 info page causing all this? As the decompressor wouldn't
 expect Xen to possibly write stuff there itself, it could easily be
 that some repeat count gets altered, thus breaking the
 decompressed data without the decompression code necessarily
 noticing.

In my case the gfn of the shared info page is 1f54.
Is it possible to move the page at runtime? It looks like it can be done
because the hvm loader configures f initially.

Perhaps the PVonHVM code has to give up the shared pages or move them to
some dedicated unused page during the process of booting into the new
kernel.
   
   The pfn 1f54 of the shared info page is in the middle of the new bzImage:
    pfn 1f54   32080 KiB   = ok, so the old .bss
  
   _head  29360 KiB
   _text  33826 KiB
   _end   33924 KiB
  
  So _head is at 1CAC and _text starts at  2108h?
  Ugh, and 1F54 gets overwritten. And with your patch, the data gets
  stuck in between _text and _end? No wait, where would the shared_info
  be located in the old kernel? Somewhere below the 1CACh?
 
 The 3 symbols above are from bzImage, which contains the gzipped vmlinux
 and some code to decompress and actually start the uncompressed vmlinux.
 
  I presume 1F54 is the _brk_end for the old kernel as well?
 
  It's in the .brk section of the old kernel.
 
  Could you tell me how the decompress code works? Is the new kernel
  put at PFN 1000h and the decompressor code is put below it?
 
 I'm not too familiar with the details of the events that happen during
 kexec -e. This is what happens from my understanding:
 kexec -l loads the new kernel and some helper code into some allocated
 memory. kexec -e starts the helper code which relocates the new bzImage
 to its new location (_head) and starts it. The new bzImage uncompresses
 the vmlinux to its final location and finally starts the new vmlinux.
 
 
 Now that I think about it, during the relocation of the new bzImage the
 part which contains the compressed vmlinux will already be corrupted
  because the shared info page will be modified by Xen (I don't know in
 what intervals the page gets modified).
 
  And the decompressor code uses the .bss section of the new kernel
  to do its deed - since it assumes that the carcass of the old kernel
  is truly dead and it is not raising a hand saying: I am not dead yet!
 
 The decompressor uses its own .bss.
 
  Which brings me to another question - say we do use this patch, what
   if the decompressor overwrites the old kernel's .data section. Won't
  we run into this problem again?
 
 The old kernel is dead at this point. If both kernels have the same
 memory layout then the decompressor will clear the page. If they have a
 different layout the .data section (or whatever happens to be there) of
 the new kernel will be corrupted.
 
  And what about the new kernel? It will try to register at a new
  MFN location the VCPU structure. Is that something that the hypervisor
  is OK with? Does that work with more than one VCPU? Or is it stuck with
  just one VCPU (b/c it couldn't actually do the hypercall?).
 
 So far I haven't seen an issue because my guest uses a single CPU.
 
  Or is the registration OK, since the new kernel has the same layout
  so it registers at the same MFN as the dead kernel and it works
  peachy? What if the kernel version used in the kexec is different
  from the old one (say it has less built in things)? That would mean
  the .text and .data section are different than the dead kernel?
 
 Yes, the layout would differ. During decompression corruption may
 occur.
 
   In the patch below the pfn is moved from the bss to the data section. As
   a result the new location is now at 28680 KiB, which is outside of the
   bzImage.
   
   Maybe there is a way to use another dedicated page as shared info page.
  
  That would do it, but it has the negative consequence that we end up
  consuming an extra PAGE_SIZE that on baremetal kernels won't be used.
 
 I was not thinking of statically allocated pages but some new concept of
 allocating such shared pages. Shouldn't there be some dedicated area in
 the E820 table which has to be used during the whole life time of the
 guest?

Not that I can see. But I don't see why that could not be added? Perhaps
the HVM loader can make it happen? But then how would it tell the kernel
that this E820_RESERVED is the shared_info one. Not the other ones..

Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Olaf Hering
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

 On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
  I was not thinking of statically allocated pages but some new concept of
   allocating such shared pages. Shouldn't there be some dedicated area in
  the E820 table which has to be used during the whole life time of the
  guest?
 
 Not that I can see. But I don't see why that could not be added? Perhaps
 the HVM loader can make it happen? But then how would it tell the kernel
 that this E820_RESERVED is the shared_info one. Not the other ones..

Maybe just use a new E820 type for this sort of thing? It's just the
question whether some other OS can cope with an unknown type. From my
reading of the e820 related code a region with an unknown type is just
ignored.

  Are there more shared areas or is it just the shared info page?
  
   And I am kind of worried that moving it to the .data section won't
    be completely safe - as the decompressor might blow away that part too.
  
  The decompressor may just clear the area, but since there is no way to
   tell where the shared pages are it's always a risk to allocate them at
  compile time.
 
 Yeah, and with the hypervisor potentially still updating the old
 MFN before the new kernel has registered the new MFN, we can end up
 corrupting the new kernel. Ouch.
 
 Would all of these issues disappear if the hypervisor had a hypercall
 that would stop updating the shared info? or just deregister the MFN?
 What if you ripped the GMFN out using 'decrease_reservation' hypercall?
 Would that eliminate the pesky GMFN?

I'm not sure, most likely the gfn will just disappear from the guest,
like a ballooned page disappears. Accessing it will likely cause a
crash.

Olaf


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Konrad Rzeszutek Wilk
On Tue, Jul 10, 2012 at 08:09:53PM +0200, Olaf Hering wrote:
 On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
 
  On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
   I was not thinking of statically allocated pages but some new concept of
    allocating such shared pages. Shouldn't there be some dedicated area in
   the E820 table which has to be used during the whole life time of the
   guest?
  
  Not that I can see. But I don't see why that could not be added? Perhaps
  the HVM loader can make it happen? But then how would it tell the kernel
  that this E820_RESERVED is the shared_info one. Not the other ones..
 
 Maybe just use a new E820 type for this sort of thing? It's just the

Ewww.
 question whether some other OS can cope with an unknown type. From my
 reading of the e820 related code a region with an unknown type is just
 ignored.

Sure. And we could scan it.. but scanning E820_UNKNOWN for some magic
header seems .. hacky.

 
   Are there more shared areas or is it just the shared info page?
   
And I am kind of worried that moving it to the .data section won't
be completely safe - as the decompressor might blow away that part too.
   
   The decompressor may just clear the area, but since there is no way to
   tell where the shared pages are its always a risk to allocate them at
   compile time.
  
  Yeah, and with the hypervisor potentially still updating the old
  MFN before the new kernel has registered the new MFN, we can end up
  corrupting the new kernel. Ouch.
  
  Would all of these issues disappear if the hypervisor had a hypercall
  that would stop updating the shared info? or just deregister the MFN?
  What if you ripped the GMFN out using 'decrease_reservation' hypercall?
  Would that eliminate the pesky GMFN?
 
 I'm not sure, most likely the gfn will just disappear from the guest,
 like a ballooned page disappears. Accessing it will likely cause a
 crash.

What about a populate_physmap right afterwards to stick a newly
minted GMFN in its place? I don't really know whether this dance
of balloon out/balloon in the same GMFN will break the shared_info
relationship. Perhaps not?

What we are going for is to stop the hypervisor from using the shared_info
MFN... perhaps there are other ways to do this?


Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-10 Thread Keir Fraser
On 10/07/2012 19:09, Olaf Hering o...@aepfle.de wrote:

 Are there more shared areas or is it just the shared info page?
 
 And I am kind of worried that moving it to the .data section won't
 be completely safe - as the decompressor might blow away that part too.
 
 The decompressor may just clear the area, but since there is no way to
 tell where the shared pages are it's always a risk to allocate them at
 compile time.
 
 Yeah, and with the hypervisor potentially still updating the old
 MFN before the new kernel has registered the new MFN, we can end up
 corrupting the new kernel. Ouch.
 
 Would all of these issues disappear if the hypervisor had a hypercall
 that would stop updating the shared info? or just deregister the MFN?
 What if you ripped the GMFN out using 'decrease_reservation' hypercall?
 Would that eliminate the pesky GMFN?
 
 I'm not sure, most likely the gfn will just disappear from the guest,
 like a ballooned page disappears. Accessing it will likely cause a
 crash.

Best thing to do, if possible, is to map the shared-info page in the
xen-platform pci device's BAR memory range. Then it will not conflict with
any RAM.

If you do map it over the top of an existing RAM page, you will have to
repopulate that RAM page before kexec, using the populate_physmap hypercall. The
good news is that the populate_physmap hypercall will have the side effect
of unmapping the shared-info page, ready to be mapped wherever the new
kernel would like it to reside :)

Hope this clears up some of the confusion. ;)

 -- Keir




Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot

2012-07-06 Thread Olaf Hering
On Fri, Jul 06, Jan Beulich wrote:

> > Could it be that some code tweaks the stack content used by decompress()
> > in some odd way? But that would most likely lead to a crash, not to
> > unexpected uncompressing results.
> 
> Especially if the old and new kernel are using the exact same
> image, how about the decompression writing over the shared
> info page causing all this? As the decompressor wouldn't
> expect Xen to possibly write stuff there itself, it could easily be
> that some repeat count gets altered, thus breaking the
> decompressed data without the decompression code necessarily
> noticing.

In my case the gfn of the shared info page is 1f54.
Is it possible to move the page at runtime? It looks like it can be done
because the hvm loader configures f initially.

Perhaps the PVonHVM code has to give up the shared pages or move them to
some dedicated unused page during the process of booting into the new
kernel.

Konrad, any idea how that could be done?

> If that's the case, there would be a more general problem here
> (for kdump at least), as granted pages could also still get written
> to when the new kernel already is in the process of launching.

Maybe you meant to say kexec instead of kdump?
kdump runs in its own memory area, so I think the worst thing is that
some pages of the crashed kernel get modified.

Olaf

