Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 17, Olaf Hering wrote:

To make this robust against allocation errors I will change it to:

	do {
+		/* Allocate new mfn for previous pfn */
+		rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
	} while (rc == 0);
+
+	/* Make sure the previous pfn is really connected to a (new) mfn */
+	BUG_ON(rc != 1);

Olaf
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
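The retry pattern proposed above can be sketched in plain userspace C. `mock_populate_physmap` is a hypothetical stand-in for the real `HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation)` hypercall, which, like the mock, returns the number of extents actually populated (0 on a transient allocation failure, 1 on success):

```c
#include <assert.h>

/* Hypothetical stand-in for the populate_physmap hypercall: fails
 * transiently attempts_left times, then reports one populated extent. */
static int attempts_left;

static int mock_populate_physmap(void)
{
    return (attempts_left-- > 0) ? 0 : 1;
}

/* Keep retrying until exactly one extent is backed again, mirroring the
 * do/while loop in the mail above. */
static int repopulate_with_retry(int transient_failures)
{
    int rc;

    attempts_left = transient_failures;
    do {
        rc = mock_populate_physmap();
    } while (rc == 0);

    /* Make sure the previous pfn is really connected to a (new) mfn */
    assert(rc == 1);    /* BUG_ON(rc != 1) in the kernel version */
    return rc;
}
```

The loop only distinguishes "nothing populated yet" (retry) from "exactly one extent populated" (done); any other return value trips the final check, which is the BUG_ON condition in the patch.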
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Mon, Jul 16, Konrad Rzeszutek Wilk wrote:

> On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:
> > -void __ref xen_hvm_init_shared_info(void)
> > +static struct shared_info *hvm_shared_info;
> > +static unsigned long hvm_shared_info_pfn;
> > +
>
> Please include a big comment explaining why this machination is OK
> to do multiple times. If you can, include the juicy snippets of the
> email conversation in this comment so that in 6 months, if somebody
> looks at this, they have a good understanding of it.

I have added a comment to the new patch, and I will extend it further in
the commit message.

> > +	/* Move pfn, disconnects previous pfn from mfn */
> > +	set_shared_info(sip, pfn);
> > +
> > +	/* Allocate new mfn for previous pfn */
> > +	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> > +	WARN_ON(rc != 1);
>
> Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn
> back to the shared page?

I moved the whole reassignment into the shutdown path because there is
apparently no sane way to make the move atomic. Shortly before reboot
all vcpus are still online, but hopefully the other vcpus have no work
left. See below; the change is on top of the two other patches I sent
out today.

Olaf

xen PVonHVM: move shared_info to MMIO before kexec

Signed-off-by: Olaf Hering
---
 arch/x86/xen/enlighten.c   | 114 +++-
 arch/x86/xen/suspend.c     |   2 +-
 arch/x86/xen/xen-ops.h     |   2 +-
 drivers/xen/platform-pci.c |  15 ++
 include/xen/events.h       |   2 +
 5 files changed, 122 insertions(+), 13 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 184fa9c..3cf233b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,6 +31,7 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/syscore_ops.h>

 #include <xen/xen.h>
 #include <xen/interface/xen.h>
@@ -1439,38 +1440,126 @@ asmlinkage void __init xen_start_kernel(void)
 #endif
 }

-void __ref xen_hvm_init_shared_info(void)
+#ifdef CONFIG_XEN_PVHVM
+/*
+ * The pfn containing the shared_info is located somewhere in RAM. This
+ * will cause trouble if the current kernel is doing a kexec boot into a
+ * new kernel. The new kernel (and its startup code) can not know where
+ * the pfn is, so it can not reserve the page. The hypervisor will
+ * continue to update the pfn, and as a result memory corruption occurs
+ * in the new kernel.
+ *
+ * One way to work around this issue is to allocate a page in the
+ * xen-platform pci device's BAR memory range. But pci init is done very
+ * late and the shared_info page is already in use very early to read
+ * the pvclock. So moving the pfn from RAM to MMIO is racy because some
+ * code paths on other vcpus could access the pfn during the small
+ * window when the old pfn is moved to the new pfn. There is even a
+ * small window where the old pfn is not backed by a mfn, and during
+ * that time all reads return -1.
+ *
+ * Because it is not known upfront where the MMIO region is located it
+ * can not be used right from the start in xen_hvm_init_shared_info.
+ *
+ * To minimise trouble the move of the pfn is done shortly before kexec.
+ * This does not eliminate the race because all vcpus are still online
+ * when the syscore_ops will be called. But hopefully there is no work
+ * pending at this point in time. Also the syscore_op is run last which
+ * reduces the risk further.
+ */
+
+static struct shared_info *xen_hvm_shared_info;
+
+static void xen_hvm_connect_shared_info(unsigned long pfn)
 {
-	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;

-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
+}

-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+static void xen_hvm_set_shared_info(struct shared_info *sip)
+{
+	int cpu;
+
+	HYPERVISOR_shared_info = sip;

 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
 	 * related functions. We don't need the vcpu_info placement
 	 * optimizations because we don't use any pv_mmu or pv_irq op on
 	 * HVM.
-	 * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
-	 * online but xen_hvm_init_shared_info is run at resume time too and
+	 * When xen_hvm_set_shared_info is run at boot time only vcpu 0 is
+	 * online but xen_hvm_set_shared_info is run at resume time too and
 	 * in that case multiple vcpus might be online. */
 	for_each_online_cpu(cpu) {
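The reworked `xen_hvm_connect_shared_info` takes a raw guest page frame number (gpfn) instead of computing `__pa(page) >> PAGE_SHIFT` at the call site. The conversion itself is plain shift arithmetic; a minimal sketch, assuming x86's 4 KiB pages:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages on x86 */

/* pfn = physical address divided by the page size, as done by
 * __pa(hvm_shared_info) >> PAGE_SHIFT in the patch. */
static uint64_t pa_to_pfn(uint64_t pa)
{
    return pa >> PAGE_SHIFT;
}

/* The inverse, useful when handing a pfn back to code that wants an
 * address (the low 12 bits of the original address are lost). */
static uint64_t pfn_to_pa(uint64_t pfn)
{
    return pfn << PAGE_SHIFT;
}
```

Passing the pfn around (instead of a kernel virtual address) is what lets the same helper map the shared_info at a RAM pfn early in boot and at an MMIO pfn later, since an MMIO gpfn has no `__pa()`-translatable kernel address backing it.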
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote: > On Tue, Jul 10, Keir Fraser wrote: > > > Best thing to do, is possible, is map the shared-info page in the > > xen-platform pci device's BAR memory range. Then it will not conflict with > > any RAM. > > This patch does that. I did a kexec boot and a save/restore. > It does not deal with the possible race were the old pfn is not backed > by a mfn. > > Olaf > > > commit a9d5567c67a2317c9db174a4deef6c5690220749 > Author: Olaf Hering > Date: Thu Jul 12 19:38:39 2012 +0200 > > xen PVonHVM: move shared info page from RAM to MMIO > > Signed-off-by: Olaf Hering > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index ff962d4..8a743af 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor) > return 0; > } > > -void __ref xen_hvm_init_shared_info(void) > +static struct shared_info *hvm_shared_info; > +static unsigned long hvm_shared_info_pfn; > + Please include a big comment explainig what this machination is OK to do multiple times. If you can include the juicy snippets of the email converstation in this comment so that in 6 months if somebody looks at this they have a good understanding of it. 
> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn) > { > int cpu; > struct xen_add_to_physmap xatp; > - static struct shared_info *shared_info_page = 0; > > - if (!shared_info_page) > - shared_info_page = (struct shared_info *) > - extend_brk(PAGE_SIZE, PAGE_SIZE); > xatp.domid = DOMID_SELF; > xatp.idx = 0; > xatp.space = XENMAPSPACE_shared_info; > - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; > + xatp.gpfn = pfn; > if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, )) > BUG(); > > - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; > + HYPERVISOR_shared_info = sip; > > /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info >* page, we use it in the event channel upcall and in some pvclock > @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void) > for_each_online_cpu(cpu) { > per_cpu(xen_vcpu, cpu) = > _shared_info->vcpu_info[cpu]; > } > + mb(); > } > > +/* Reconnect the pfn to a mfn */ > +void __ref xen_hvm_resume_shared_info(void) > +{ > + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); > +} > + > +/* Move the pfn in RAM to a pfn in MMIO space */ > +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned > long pfn) > +{ > + struct xen_memory_reservation reservation = { > + .domid = DOMID_SELF, > + .nr_extents = 1, > + }; > + int rc; > + > + set_xen_guest_handle(reservation.extent_start, _shared_info_pfn); > + > + /* Move pfn, disconnects previous pfn from mfn */ > + set_shared_info(sip, pfn); > + > + /* Allocate new mfn for previous pfn */ > + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, ); > + WARN_ON(rc != 1); Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn back to the shared page? > + > + /* Remember final pfn and pointer for resume */ > + hvm_shared_info_pfn = pfn; > + hvm_shared_info = sip; > +} > + > +/* Use a pfn in RAM until PCI init is done. 
*/ > +static void __ref xen_hvm_initial_shared_info(void) > +{ > + /* FIXME simply allocate a page and release it after pfn move (How at > this stage?) */ You could just have an page on the .bss that has __init_data. During bootup it would be recycled - and since the platform PCI driver is always built in, it would not be a problem I think? Thought if somebody does not compile with CONFIG_XEN_PVHVM then it would make sense to keep the existing allocation from the .brk. > + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE); > + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT; > + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); > +} > + > + > #ifdef CONFIG_XEN_PVHVM > static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self, > unsigned long action, void *hcpu) > @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void) > if (r < 0) > return; > > - xen_hvm_init_shared_info(); Put here a comment saying: /* Initaliazing a RAM PFN that is later going to be replaced with * a MMIO PFN. */ > + xen_hvm_initial_shared_info(); > > if (xen_feature(XENFEAT_hvm_callback_vector)) > xen_have_vector_callback = 1; > diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c > index 45329c8..ae8a00c 100644 > --- a/arch/x86/xen/suspend.c > +++ b/arch/x86/xen/suspend.c > @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) > { > #ifdef
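The ordering inside `xen_hvm_finalize_shared_info` is the crux of Konrad's recovery question: mapping the shared_info onto the new gpfn implicitly disconnects the old gpfn from its mfn, so the old pfn must be re-populated afterwards or reads through it return -1. A toy userspace model of that sequence (the two-slot physmap and its states are assumptions for illustration, not real hypervisor behaviour):

```c
#include <assert.h>

/* Toy model of the guest physmap around the finalize step. Slot 0 is
 * the old RAM pfn, slot 1 the new MMIO pfn. */
enum backing { BACKED_NONE, BACKED_SHARED, BACKED_RAM };

static enum backing physmap[2];

/* Models XENMAPSPACE_shared_info: mapping shared_info at a new pfn
 * disconnects whatever pfn currently holds it from its mfn. */
static void map_shared_info(int pfn)
{
    int i;

    for (i = 0; i < 2; i++)
        if (physmap[i] == BACKED_SHARED)
            physmap[i] = BACKED_NONE;   /* old pfn loses its mfn */
    physmap[pfn] = BACKED_SHARED;
}

/* Models XENMEM_populate_physmap: back a pfn with a fresh mfn. */
static void populate_physmap(int pfn)
{
    physmap[pfn] = BACKED_RAM;
}

/* The order used by xen_hvm_finalize_shared_info: move first, then
 * repopulate the pfn that was just disconnected. */
static int finalize_ok(void)
{
    physmap[0] = BACKED_SHARED;         /* boot-time state: RAM pfn in use */
    physmap[1] = BACKED_NONE;

    map_shared_info(1);                 /* move pfn, disconnects pfn 0 */
    populate_physmap(0);                /* allocate new mfn for pfn 0 */

    return physmap[0] == BACKED_RAM && physmap[1] == BACKED_SHARED;
}
```

The window between `map_shared_info(1)` and `populate_physmap(0)` is exactly the race discussed in the thread, which is why the follow-up patch defers the whole move to a syscore shutdown hook rather than trying to make it atomic.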
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 15/07/2012 17:06, "Olaf Hering" wrote:

> On Tue, Jul 10, Keir Fraser wrote:
>
>> Best thing to do, if possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict
>> with any RAM.
>
> This patch does that. I did a kexec boot and a save/restore.
> It does not deal with the possible race where the old pfn is not backed
> by a mfn.

Looks good to me.

 -- Keir
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Keir Fraser wrote:

> Best thing to do, if possible, is map the shared-info page in the
> xen-platform pci device's BAR memory range. Then it will not conflict
> with any RAM.

This patch does that. I did a kexec boot and a save/restore.
It does not deal with the possible race where the old pfn is not backed
by a mfn.

Olaf

commit a9d5567c67a2317c9db174a4deef6c5690220749
Author: Olaf Hering
Date:   Thu Jul 12 19:38:39 2012 +0200

    xen PVonHVM: move shared info page from RAM to MMIO

    Signed-off-by: Olaf Hering

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..8a743af 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
 	return 0;
 }

-void __ref xen_hvm_init_shared_info(void)
+static struct shared_info *hvm_shared_info;
+static unsigned long hvm_shared_info_pfn;
+
+static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
 {
 	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;

-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();

-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+	HYPERVISOR_shared_info = sip;

 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
@@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
 	for_each_online_cpu(cpu) {
 		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
 	}
+	mb();
 }

+/* Reconnect the pfn to a mfn */
+void __ref xen_hvm_resume_shared_info(void)
+{
+	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+/* Move the pfn in RAM to a pfn in MMIO space */
+void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
+{
+	struct xen_memory_reservation reservation = {
+		.domid = DOMID_SELF,
+		.nr_extents = 1,
+	};
+	int rc;
+
+	set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
+
+	/* Move pfn, disconnects previous pfn from mfn */
+	set_shared_info(sip, pfn);
+
+	/* Allocate new mfn for previous pfn */
+	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+	WARN_ON(rc != 1);
+
+	/* Remember final pfn and pointer for resume */
+	hvm_shared_info_pfn = pfn;
+	hvm_shared_info = sip;
+}
+
+/* Use a pfn in RAM until PCI init is done. */
+static void __ref xen_hvm_initial_shared_info(void)
+{
+	/* FIXME simply allocate a page and release it after pfn move (How at this stage?) */
+	hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
+	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+
 #ifdef CONFIG_XEN_PVHVM
 static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
 					unsigned long action, void *hcpu)
@@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
 	if (r < 0)
 		return;

-	xen_hvm_init_shared_info();
+	xen_hvm_initial_shared_info();

 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..1e4329e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);

 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);

 void __init xen_build_dynamic_phys_to_machine(void);

diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 97ca359..cbe866b 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	long ioaddr;
 	long mmio_addr, mmio_len;
 	unsigned int max_nr_gframes;
+	unsigned long addr;
+	struct shared_info *hvm_shared_info;

 	if (!xen_domain())
 		return -ENODEV;

@@ -138,6 +140,11 @@ static
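The truncated `platform_pci_init` hunk reserves a page for shared_info inside the xen-platform device's MMIO BAR, described by `mmio_addr` and `mmio_len` above. A minimal sketch of carving one page out of such a range and turning it into a gpfn; the helper name and the choice of the last page are assumptions for illustration, not the patch's actual bookkeeping:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical helper: shrink the BAR range [mmio_addr, mmio_addr + *mmio_len)
 * by one page and return the gpfn of the page that was carved off the end.
 * That gpfn can then be handed to XENMAPSPACE_shared_info; since it lies in
 * MMIO space it can never collide with RAM that a kexec'd kernel will use. */
static uint64_t take_page_from_bar(uint64_t mmio_addr, uint64_t *mmio_len)
{
    if (*mmio_len < PAGE_SIZE)
        return 0;                       /* no room left in the BAR */
    *mmio_len -= PAGE_SIZE;
    return (mmio_addr + *mmio_len) >> PAGE_SHIFT;   /* gpfn of last page */
}

/* Worked example with made-up BAR values: a 64 KiB BAR at 0xf2000000. */
static uint64_t demo_gpfn(void)
{
    uint64_t len = 0x10000;
    return take_page_from_bar(0xf2000000ULL, &len);
}
```

Shrinking `mmio_len` afterwards keeps the reserved page out of any later grant-table or ioremap use of the same BAR, which is the point of doing the carve-out during `platform_pci_init`.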
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Keir Fraser wrote: Best thing to do, is possible, is map the shared-info page in the xen-platform pci device's BAR memory range. Then it will not conflict with any RAM. This patch does that. I did a kexec boot and a save/restore. It does not deal with the possible race were the old pfn is not backed by a mfn. Olaf commit a9d5567c67a2317c9db174a4deef6c5690220749 Author: Olaf Hering o...@aepfle.de Date: Thu Jul 12 19:38:39 2012 +0200 xen PVonHVM: move shared info page from RAM to MMIO Signed-off-by: Olaf Hering oher...@suse.de diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index ff962d4..8a743af 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor) return 0; } -void __ref xen_hvm_init_shared_info(void) +static struct shared_info *hvm_shared_info; +static unsigned long hvm_shared_info_pfn; + +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn) { int cpu; struct xen_add_to_physmap xatp; - static struct shared_info *shared_info_page = 0; - if (!shared_info_page) - shared_info_page = (struct shared_info *) - extend_brk(PAGE_SIZE, PAGE_SIZE); xatp.domid = DOMID_SELF; xatp.idx = 0; xatp.space = XENMAPSPACE_shared_info; - xatp.gpfn = __pa(shared_info_page) PAGE_SHIFT; + xatp.gpfn = pfn; if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, xatp)) BUG(); - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; + HYPERVISOR_shared_info = sip; /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info * page, we use it in the event channel upcall and in some pvclock @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void) for_each_online_cpu(cpu) { per_cpu(xen_vcpu, cpu) = HYPERVISOR_shared_info-vcpu_info[cpu]; } + mb(); } +/* Reconnect the pfn to a mfn */ +void __ref xen_hvm_resume_shared_info(void) +{ + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); +} + +/* Move the pfn in RAM to a pfn in MMIO 
space */ +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn) +{ + struct xen_memory_reservation reservation = { + .domid = DOMID_SELF, + .nr_extents = 1, + }; + int rc; + + set_xen_guest_handle(reservation.extent_start, hvm_shared_info_pfn); + + /* Move pfn, disconnects previous pfn from mfn */ + set_shared_info(sip, pfn); + + /* Allocate new mfn for previous pfn */ + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, reservation); + WARN_ON(rc != 1); + + /* Remember final pfn and pointer for resume */ + hvm_shared_info_pfn = pfn; + hvm_shared_info = sip; +} + +/* Use a pfn in RAM until PCI init is done. */ +static void __ref xen_hvm_initial_shared_info(void) +{ + /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */ + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE); + hvm_shared_info_pfn = __pa(hvm_shared_info) PAGE_SHIFT; + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); +} + + #ifdef CONFIG_XEN_PVHVM static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void) if (r 0) return; - xen_hvm_init_shared_info(); + xen_hvm_initial_shared_info(); if (xen_feature(XENFEAT_hvm_callback_vector)) xen_have_vector_callback = 1; diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 45329c8..ae8a00c 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) { #ifdef CONFIG_XEN_PVHVM int cpu; - xen_hvm_init_shared_info(); + xen_hvm_resume_shared_info(); xen_callback_vector(); xen_unplug_emulated_devices(); if (xen_feature(XENFEAT_hvm_safe_pvclock)) { diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index 202d4c1..1e4329e 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -41,7 +41,7 @@ void xen_enable_syscall(void); void xen_vcpu_restore(void); void xen_callback_vector(void); -void 
xen_hvm_init_shared_info(void); +void xen_hvm_resume_shared_info(void); void xen_unplug_emulated_devices(void); void __init xen_build_dynamic_phys_to_machine(void); diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c index 97ca359..cbe866b 100644 --- a/drivers/xen/platform-pci.c +++ b/drivers/xen/platform-pci.c @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, long ioaddr; long mmio_addr, mmio_len;
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 15/07/2012 17:06, Olaf Hering o...@aepfle.de wrote: On Tue, Jul 10, Keir Fraser wrote: Best thing to do, if possible, is map the shared-info page in the xen-platform pci device's BAR memory range. Then it will not conflict with any RAM. This patch does that. I did a kexec boot and a save/restore. It does not deal with the possible race where the old pfn is not backed by a mfn. Looks good to me. -- Keir Olaf commit a9d5567c67a2317c9db174a4deef6c5690220749 Author: Olaf Hering o...@aepfle.de Date: Thu Jul 12 19:38:39 2012 +0200 xen PVonHVM: move shared info page from RAM to MMIO Signed-off-by: Olaf Hering oher...@suse.de diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index ff962d4..8a743af 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor) return 0; } -void __ref xen_hvm_init_shared_info(void) +static struct shared_info *hvm_shared_info; +static unsigned long hvm_shared_info_pfn; + +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn) { int cpu; struct xen_add_to_physmap xatp; - static struct shared_info *shared_info_page = 0; - if (!shared_info_page) - shared_info_page = (struct shared_info *) - extend_brk(PAGE_SIZE, PAGE_SIZE); xatp.domid = DOMID_SELF; xatp.idx = 0; xatp.space = XENMAPSPACE_shared_info; - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; + xatp.gpfn = pfn; if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) BUG(); - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; + HYPERVISOR_shared_info = sip; /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info * page, we use it in the event channel upcall and in some pvclock @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void) for_each_online_cpu(cpu) { per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; } + mb(); } +/* Reconnect the pfn to a mfn */ +void __ref xen_hvm_resume_shared_info(void) +{ +
set_shared_info(hvm_shared_info, hvm_shared_info_pfn); +} + +/* Move the pfn in RAM to a pfn in MMIO space */ +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn) +{ + struct xen_memory_reservation reservation = { + .domid = DOMID_SELF, + .nr_extents = 1, + }; + int rc; + + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn); + + /* Move pfn, disconnects previous pfn from mfn */ + set_shared_info(sip, pfn); + + /* Allocate new mfn for previous pfn */ + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); + WARN_ON(rc != 1); + + /* Remember final pfn and pointer for resume */ + hvm_shared_info_pfn = pfn; + hvm_shared_info = sip; +} + +/* Use a pfn in RAM until PCI init is done. */ +static void __ref xen_hvm_initial_shared_info(void) +{ + /* FIXME simply allocate a page and release it after pfn move (How at this stage?) */ + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE); + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT; + set_shared_info(hvm_shared_info, hvm_shared_info_pfn); +} + + #ifdef CONFIG_XEN_PVHVM static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void) if (r < 0) return; - xen_hvm_init_shared_info(); + xen_hvm_initial_shared_info(); if (xen_feature(XENFEAT_hvm_callback_vector)) xen_have_vector_callback = 1; diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 45329c8..ae8a00c 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled) { #ifdef CONFIG_XEN_PVHVM int cpu; - xen_hvm_init_shared_info(); + xen_hvm_resume_shared_info(); xen_callback_vector(); xen_unplug_emulated_devices(); if (xen_feature(XENFEAT_hvm_safe_pvclock)) { diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index 202d4c1..1e4329e 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -41,7 +41,7
@@ void xen_enable_syscall(void); void xen_vcpu_restore(void); void xen_callback_vector(void); -void xen_hvm_init_shared_info(void); +void xen_hvm_resume_shared_info(void); void xen_unplug_emulated_devices(void); void __init xen_build_dynamic_phys_to_machine(void); diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c index 97ca359..cbe866b 100644 --- a/drivers/xen/platform-pci.c +++ b/drivers/xen/platform-pci.c @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, long ioaddr; long mmio_addr, mmio_len; unsigned int max_nr_gframes; + unsigned long addr; + struct shared_info *hvm_shared_info; if (!xen_domain()) return -ENODEV; @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev *pdev, platform_mmio = mmio_addr;
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 13/07/2012 21:20, "Olaf Hering" wrote: > On Tue, Jul 10, Keir Fraser wrote: > >> On 10/07/2012 19:09, "Olaf Hering" wrote: >>> I'm not sure, most likely the gfn will just disappear from the guest, >>> like a ballooned page disappears. Accessing it will likely cause a >>> crash. >> >> Best thing to do, if possible, is map the shared-info page in the >> xen-platform pci device's BAR memory range. Then it will not conflict with >> any RAM. >> >> If you do map it over the top of an existing RAM page, you will have to >> repopulate that RAM page before kexec, using populate_physmap hypercall. The >> good news is that the populate_physmap hypercall will have the side effect >> of unmapping the shared-info page, ready to be mapped wherever the new >> kernel would like it to reside :) > > Keir, > > is this a safe thing to do in a SMP guest? > If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page > (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and > *xen_vcpu then everything will reference these pointers. So pfn A now points at shared_info, and mfn M is lost (freed back to Xen). Xen_vcpu doesn't come into it, you'd have that mapped at yet another pfn. > If drivers/xen/platform-pci.c:platform_pci_init would also do a > XENMAPSPACE_shared_info call with pfn B, isn't there a small window where > pfn A is not backed by a mfn because mfn M is now connected to pfn C? As > a result other code paths which access *HYPERVISOR_shared_info and > *xen_vcpu between the hypercall and the update of the pointers will read > 0xff. Don't really understand this. After the XENMAPSPACE_shared_info call: * PFN B points at shared_info, mfn M_B it previously mapped is lost (freed back to Xen). * PFN A maps nothing, reads return all-1s. Yes, obviously you can't atomically update the mapping of shinfo from A->B, and update your pointer in the kernel at exactly the same time. 
Presumably you do this early during boot, or late during kexec, or otherwise at a time when other processors are not expected to touch shinfo. > > If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn > backing *HYPERVISOR_shared_info will remain the same, so there is no need > to copy data from the old to the new *HYPERVISOR_shared_info. That is correct. > What do you think, is that race real? I suppose it is. I didn't imagine it would be a troublesome one though. -- Keir > Olaf -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Keir Fraser wrote: > On 10/07/2012 19:09, "Olaf Hering" wrote: > > I'm not sure, most likely the gfn will just disappear from the guest, > > like a ballooned page disappears. Accessing it will likely cause a > > crash. > > Best thing to do, if possible, is map the shared-info page in the > xen-platform pci device's BAR memory range. Then it will not conflict with > any RAM. > > If you do map it over the top of an existing RAM page, you will have to > repopulate that RAM page before kexec, using populate_physmap hypercall. The > good news is that the populate_physmap hypercall will have the side effect > of unmapping the shared-info page, ready to be mapped wherever the new > kernel would like it to reside :) Keir, is this a safe thing to do in a SMP guest? If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and *xen_vcpu then everything will reference these pointers. If drivers/xen/platform-pci.c:platform_pci_init would also do a XENMAPSPACE_shared_info call with pfn B, isn't there a small window where pfn A is not backed by a mfn because mfn M is now connected to pfn C? As a result other code paths which access *HYPERVISOR_shared_info and *xen_vcpu between the hypercall and the update of the pointers will read 0xff. If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn backing *HYPERVISOR_shared_info will remain the same, so there is no need to copy data from the old to the new *HYPERVISOR_shared_info. What do you think, is that race real? Olaf
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 10/07/2012 19:09, "Olaf Hering" wrote: >>> Are there more shared areas or is it just the shared info page? >>> And I am kind of worried that moving it to the .data section won't be completely safe - as the decompressor might blow away that part too. >>> >>> The decompressor may just clear the area, but since there is no way to >>> tell where the shared pages are its always a risk to allocate them at >>> compile time. >> >> Yeah, and with the hypervisor potentially still updating the "old" >> MFN before the new kernel has registered the new MFN, we can end up >> corrupting the new kernel. Ouch. >> >> Would all of these issues disappear if the hypervisor had a hypercall >> that would stop updating the shared info? or just deregister the MFN? >> What if you ripped the GMFN out using 'decrease_reservation' hypercall? >> Would that eliminate the pesky GMFN? > > I'm not sure, most likely the gfn will just disappear from the guest, > like a ballooned page disappears. Accessing it will likely cause a > crash. Best thing to do, if possible, is map the shared-info page in the xen-platform pci device's BAR memory range. Then it will not conflict with any RAM. If you do map it over the top of an existing RAM page, you will have to repopulate that RAM page before kexec, using populate_physmap hypercall. The good news is that the populate_physmap hypercall will have the side effect of unmapping the shared-info page, ready to be mapped wherever the new kernel would like it to reside :) Hope this clears up some of the confusion. ;) -- Keir
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 08:09:53PM +0200, Olaf Hering wrote: > On Tue, Jul 10, Konrad Rzeszutek Wilk wrote: > > > On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote: > > > I was not thinking of statically allocated pages but some new concept of > > > allocating such shared pages. Shouldn't there be some dedicated area in > > > the E820 table which has to be used during the whole life time of the > > > guest? > > > > Not that I can see. But I don't see why that could not be added? Perhaps > > the HVM loader can make it happen? But then how would it tell the kernel > > that this E820_RESERVED is the shared_info one. Not the other ones.. > > Maybe just use a new E820 type for this sort of thing? Ewww. > It's just the question whether some other OS can cope with an unknown type. From my > reading of the e820 related code a region with an unknown type is just > ignored. Sure. And we could scan it.. but scanning E820_UNKNOWN for some magic header seems .. hacky. > > > > Are there more shared areas or is it just the shared info page? > > > > > > > And I am kind of worried that moving it to the .data section won't > > > > be completely safe - as the decompressor might blow away that part too. > > > > > > The decompressor may just clear the area, but since there is no way to > > > tell where the shared pages are its always a risk to allocate them at > > > compile time. > > > > Yeah, and with the hypervisor potentially still updating the "old" > > MFN before the new kernel has registered the new MFN, we can end up > > corrupting the new kernel. Ouch. > > > > Would all of these issues disappear if the hypervisor had a hypercall > > that would stop updating the shared info? or just deregister the MFN? > > What if you ripped the GMFN out using 'decrease_reservation' hypercall? > > Would that eliminate the pesky GMFN? > > I'm not sure, most likely the gfn will just disappear from the guest, > like a ballooned page disappears. Accessing it will likely cause a > crash. 
What about a populate_physmap right afterwards to stick a newly minted GMFN in its place? I don't really know whether this dance of balloon out/balloon in the same GMFN will break the shared_info relationship. Perhaps not? What we are going for is to stop the hypervisor from using the shared_info MFN... perhaps there are other ways to do this?
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote: > On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote: > > I was not thinking of statically allocated pages but some new concept of > > allocating such shared pages. Shouldnt there be some dedicated area in > > the E820 table which has to be used during the whole life time of the > > guest? > > Not that I can see. But I don't see why that could not be added? Perhaps > the HVM loader can make it happen? But then how would it tell the kernel > that this E820_RESERVED is the shared_info one. Not the other ones.. Maybe just use a new E820 type for this sort of thing? Its just the question wether some other OS can cope with an unknown type. From my reading of the e820 related code a region with an unknown type is just ignored. > > Are there more shared areas or is it just the shared info page? > > > > > And I am kind of worried that moving it to the .data section won't > > > be completly safe - as the decompressor might blow away that part too. > > > > The decompressor may just clear the area, but since there is no way to > > tell where the shared pages are its always a risk to allocate them at > > compile time. > > Yeah, and with the hypervisor potentially still updating the "old" > MFN before the new kernel has registered the new MFN, we can end up > corrupting the new kernel. Ouch. > > Would all of these issues disappear if the hypervisor had a hypercall > that would stop updating the shared info? or just deregister the MFN? > What if you ripped the GMFN out using 'decrease_reservation' hypercall? > Would that eliminate the pesky GMFN? I'm not sure, most likely the gfn will just disappear from the guest, like a ballooned page disappears. Accessing it will likely cause a crash. 
Olaf
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Ian Campbell wrote: > On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote: > > On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote: > > > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: > > > > Which brings me to another question - say we do use this patch, what > > > > if the decompressor overwrites the old kernels .data section. Won't > > > > we run into this problem again? > > > > > > I've not really been following this thread that closely but wouldn't the > > > right answer be for the original kernel to unmap the shared info on > > > kexec? Or maybe remap it up to some high/reserved address? Can it read > > > > That would be the right answer I think, but I don't see a > > VCPU_deregister > > call (only VCPU_register). > > Is the issue here vcpuinfo or the shared info (or both)? shared info is the issue in PVonHVM. Olaf
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote: > > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: > > > Which brings me to another question - say we do use this patch, what > > > if the decompressor overwrites the old kernels .data section. Won't > > > we run into this problem again? > > > > I've not really been following this thread that closely but wouldn't the > > right answer be for the original kernel to unmap the shared info on > > kexec? Or maybe remap it up to some high/reserved address? Can it read > > That would be the right answer I think, but I don't see a VCPU_deregister > call (only VCPU_register). Is the issue here vcpuinfo or the shared info (or both)? > But perhaps the XENMEM_decrease_reservation for the particular MFN is the > answer to do a VCPU "de-register" ? > > > the original address used by hvmloader at start of day and reuse that? > > Wait, we can map multiple shared_info? Ooh, somehow I thought the guest > could only do one registration. I think you can only have one mapping but you can move it by registering it again. Ian.
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote: > On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote: > > On Fri, Jul 06, Olaf Hering wrote: > > > > > On Fri, Jul 06, Jan Beulich wrote: > > > > > > > > Could it be that some code tweaks the stack content used by > > > > > decompress() > > > > > in some odd way? But that would most likely lead to a crash, not to > > > > > unexpected uncompressing results. > > > > > > > > Especially if the old and new kernel are using the exact same > > > > image, how about the decompression writing over the shared > > > > info page causing all this? As the decompressor wouldn't > > > > expect Xen to possibly write stuff there itself, it could easily be > > > > that some repeat count gets altered, thus breaking the > > > > decompressed data without the decompression code necessarily > > > > noticing. > > > > > > In my case the gfn of the shared info page is 1f54. > > > Is it possible to move the page at runtime? It looks like it can be done > > > because the hvm loader configures f initially. > > > > > > Perhaps the PVonHVM code has to give up the shared pages or move them to > > > some dedicated unused page during the process of booting into the new > > > kernel. > > > > The pfn 1f54 of the shared info page is in the middle of the new bzImage: > > pfn32080 KiB <= ok, so the old .bss > > > _head 29360 KiB > > _text 33826 KiB > > _end 33924 KiB > > So _head is at 1CAC and _text starts at 2108h? > Ugh, and 1F54 gets overriden. And with your patch, the data gets > stuck in between _text and _end? No wait, where would the shared_info > be located in the old kernel? Somewhere below the 1CACh? The 3 symbols above are from bzImage, which contains the gzipped vmlinux and some code to decompress and actually start the uncompressed vmlinux. > I presume 1F54 is the _brk_end for the old kernel as well? Its in the .brk section of the old kernel. > Could you tell me how the decompress code works? 
Is the new kernel > put at PFN 1000h and the decompressor code is put below it? I'm not too familiar with the details of the events that happen during kexec -e. This is what happens from my understanding: kexec -l loads the new kernel and some helper code into some allocated memory. kexec -e starts the helper code which relocates the new bzImage to its new location (_head) and starts it. The new bzImage uncompresses the vmlinux to its final location and finally starts the new vmlinux. Now that I think about it, during the relocation of the new bzImage the part which contains the compressed vmlinux will already be corrupted because the shared info page will be modified by Xen (I dont know in what intervals the page gets modified). > And the decompressor code uses the .bss section of the "new" kernel > to do its deed - since it assumes that the carcass of the old kernel > is truly dead and it is not raising a hand saying: "I am not dead yet!". The decompressor uses its own .bss. > Which brings me to another question - say we do use this patch, what > if the decompressor overwrites the old kernels .data section. Won't > we run into this problem again? The old kernel is dead at this point. If both kernels have the same memory layout then the decompressor will clear the page. If they have a different layout the .data section (or whatever happens to be there) of the new kernel will be corrupted. > And what about the new kernel? It will try to register at a new > MFN location the VCPU structure. Is that something that the hypervisor > is OK with? Does that work with more than VCPU? Or is is stuck with > just one VCPU (b/c it couldn't actually do the hypercall?). So far I havent seen an issue because my guest uses a single cpu. > Or is the registration OK, since the new kernel has the same layout > so it registers at the same MFN as the "dead" kernel and it works > peachy? 
What if the kernel version used in the kexec is different > from the old one (say it has less built in things)? That would mean > the .text and .data section are different than the "dead" kernel? Yes, the layout would differ. During decompression corruption may occur. > > In the patch below the pfn is moved from the bss to the data section. As > > a result the new location is now at 28680 KiB, which is outside of the > > bzImage. > > > > Maybe there is a way to use another dedicated page as shared info page. > > That would do it, but it has the negative consequence that we end up > consuming an extra PAGE_SIZE that on baremetal kernels won't be used. I was not thinking of statically allocated pages but some new concept of allocating such shared pages. Shouldn't there be some dedicated area in the E820 table which has to be used during the whole life time of the guest? Are there more shared areas or is it just the shared info page? > And I am kind of worried that moving it to the .data section won't > be completely safe - as the decompressor might blow away that part too. The decompressor may just clear the area, but since there is no way to tell where the shared pages are it's always a risk to allocate them at compile time. Olaf
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote: > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: > > Which brings me to another question - say we do use this patch, what > > if the decompressor overwrites the old kernel's .data section. Won't > > we run into this problem again? > > I've not really been following this thread that closely but wouldn't the > right answer be for the original kernel to unmap the shared info on > kexec? Or maybe remap it up to some high/reserved address? Can it read That would be the right answer I think, but I don't see a VCPU_deregister call (only VCPU_register). But perhaps the XENMEM_decrease_reservation for the particular MFN is the answer to do a VCPU "de-register" ? > the original address used by hvmloader at start of day and reuse that? Wait, we can map multiple shared_info? Ooh, somehow I thought the guest could only do one registration. > > Ian. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: > Which brings me to another question - say we do use this patch, what > if the decompressor overwrites the old kernel's .data section. Won't > we run into this problem again? I've not really been following this thread that closely but wouldn't the right answer be for the original kernel to unmap the shared info on kexec? Or maybe remap it up to some high/reserved address? Can it read the original address used by hvmloader at start of day and reuse that? Ian.
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote: > On Fri, Jul 06, Olaf Hering wrote: > > > On Fri, Jul 06, Jan Beulich wrote: > > > > > > Could it be that some code tweaks the stack content used by decompress() > > > > in some odd way? But that would most likely lead to a crash, not to > > > > unexpected uncompressing results. > > > > > > Especially if the old and new kernel are using the exact same > > > image, how about the decompression writing over the shared > > > info page causing all this? As the decompressor wouldn't > > > expect Xen to possibly write stuff there itself, it could easily be > > > that some repeat count gets altered, thus breaking the > > > decompressed data without the decompression code necessarily > > > noticing. > > > > In my case the gfn of the shared info page is 1f54. > > Is it possible to move the page at runtime? It looks like it can be done > > because the hvm loader configures f initially. > > > > Perhaps the PVonHVM code has to give up the shared pages or move them to > > some dedicated unused page during the process of booting into the new > > kernel. > > The pfn 1f54 of the shared info page is in the middle of the new bzImage: > pfn 32080 KiB <= ok, so the old .bss > _head 29360 KiB > _text 33826 KiB > _end 33924 KiB So _head is at 1CAC and _text starts at 2108h? Ugh, and 1F54 gets overridden. And with your patch, the data gets stuck in between _text and _end? No wait, where would the shared_info be located in the old kernel? Somewhere below the 1CACh? I presume 1F54 is the _brk_end for the old kernel as well? Could you tell me how the decompress code works? Is the new kernel put at PFN 1000h and the decompressor code is put below it? And the decompressor code uses the .bss section of the "new" kernel to do its deed - since it assumes that the carcass of the old kernel is truly dead and it is not raising a hand saying: "I am not dead yet!".
Which brings me to another question - say we do use this patch, what if the decompressor overwrites the old kernel's .data section. Won't we run into this problem again? And what about the new kernel? It will try to register at a new MFN location the VCPU structure. Is that something that the hypervisor is OK with? Does that work with more than one VCPU? Or is it stuck with just one VCPU (b/c it couldn't actually do the hypercall?). Or is the registration OK, since the new kernel has the same layout so it registers at the same MFN as the "dead" kernel and it works peachy? What if the kernel version used in the kexec is different from the old one (say it has less built in things)? That would mean the .text and .data section are different than the "dead" kernel? > > In other words, the compressed vmlinux.bin gets slightly modified during > uncompression. For some reason this does not lead to decompression > errors. Hm > > In the patch below the pfn is moved from the bss to the data section. As > a result the new location is now at 28680 KiB, which is outside of the > bzImage. > > Maybe there is a way to use another dedicated page as shared info page. That would do it, but it has the negative consequence that we end up consuming an extra PAGE_SIZE that on baremetal kernels won't be used. Also, I think there are other issues lurking around. The other users of the .bss (or rather of the extend_brk) call would also be trashed. On PVHVM that would be the DMI. On PV guests (Daniel has some thoughts on how to make kexec work under PV guests) that means the P2M tree gets trashed. And I am kind of worried that moving it to the .data section won't be completely safe - as the decompressor might blow away that part too.
> > Olaf > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index ff962d4..258e8f3 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void) > { > int cpu; > struct xen_add_to_physmap xatp; > - static struct shared_info *shared_info_page = 0; > + static struct shared_info shared_info_page __page_aligned_data = { }; > > - if (!shared_info_page) > - shared_info_page = (struct shared_info *) > - extend_brk(PAGE_SIZE, PAGE_SIZE); > xatp.domid = DOMID_SELF; > xatp.idx = 0; > xatp.space = XENMAPSPACE_shared_info; > - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; > + xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT; > if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) > BUG(); > > - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; > + HYPERVISOR_shared_info = &shared_info_page; > > /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info >* page, we use it in the event channel upcall and in some pvclock > > ___ > Xen-devel mailing list > xen-de...@lists.xen.org > http://lists.xen.org/xen-devel --
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, Olaf Hering wrote: > On Fri, Jul 06, Jan Beulich wrote: > > > Could it be that some code tweaks the stack content used by decompress() > > in some odd way? But that would most likely lead to a crash, not to > > unexpected uncompressing results. > > Especially if the old and new kernel are using the exact same > image, how about the decompression writing over the shared > info page causing all this? As the decompressor wouldn't > expect Xen to possibly write stuff there itself, it could easily be > that some repeat count gets altered, thus breaking the > decompressed data without the decompression code necessarily > noticing. In my case the gfn of the shared info page is 1f54. Is it possible to move the page at runtime? It looks like it can be done because the hvm loader configures f initially. Perhaps the PVonHVM code has to give up the shared pages or move them to some dedicated unused page during the process of booting into the new kernel. The pfn 1f54 of the shared info page is in the middle of the new bzImage: pfn 32080 KiB _head 29360 KiB _text 33826 KiB _end 33924 KiB In other words, the compressed vmlinux.bin gets slightly modified during uncompression. For some reason this does not lead to decompression errors. In the patch below the pfn is moved from the bss to the data section. As a result the new location is now at 28680 KiB, which is outside of the bzImage. Maybe there is a way to use another dedicated page as shared info page.
Olaf diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index ff962d4..258e8f3 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void) { int cpu; struct xen_add_to_physmap xatp; - static struct shared_info *shared_info_page = 0; + static struct shared_info shared_info_page __page_aligned_data = { }; - if (!shared_info_page) - shared_info_page = (struct shared_info *) - extend_brk(PAGE_SIZE, PAGE_SIZE); xatp.domid = DOMID_SELF; xatp.idx = 0; xatp.space = XENMAPSPACE_shared_info; - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT; + xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT; if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) BUG(); - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; + HYPERVISOR_shared_info = &shared_info_page; /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info * page, we use it in the event channel upcall and in some pvclock --
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote: On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote: On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: Which brings me to another question - say we do use this patch, what if the decompressor overwrites the old kernel's .data section. Won't we run into this problem again? I've not really been following this thread that closely but wouldn't the right answer be for the original kernel to unmap the shared info on kexec? Or maybe remap it up to some high/reserved address? Can it read That would be the right answer I think, but I don't see a VCPU_deregister call (only VCPU_register). Is the issue here vcpuinfo or the shared info (or both)? But perhaps the XENMEM_decrease_reservation for the particular MFN is the answer to do a VCPU de-register ? the original address used by hvmloader at start of day and reuse that? Wait, we can map multiple shared_info? Ooh, somehow I thought the guest could only do one registration. I think you can only have one mapping but you can move it by registering it again. Ian.
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Ian Campbell wrote: On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote: On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote: On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote: Which brings me to another question - say we do use this patch, what if the decompressor overwrites the old kernel's .data section. Won't we run into this problem again? I've not really been following this thread that closely but wouldn't the right answer be for the original kernel to unmap the shared info on kexec? Or maybe remap it up to some high/reserved address? Can it read That would be the right answer I think, but I don't see a VCPU_deregister call (only VCPU_register). Is the issue here vcpuinfo or the shared info (or both)? shared info is the issue in PVonHVM. Olaf
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
In the patch below the pfn is moved from the bss to the data section. As a result the new location is now at 28680 KiB, which is outside of the bzImage. Maybe there is a way to use another dedicated page as shared info page. That would do it, but it has the negative consequence that we end up consuming an extra PAGE_SIZE that on baremetal kernels won't be used. I was not thinking of statically allocated pages but some new concept of allocating such shared pages. Shouldn't there be some dedicated area in the E820 table which has to be used during the whole life time of the guest? Not that I can see. But I don't see why that could not be added? Perhaps the HVM loader can make it happen? But then how would it tell the kernel that this E820_RESERVED is the shared_info one. Not the other ones..
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > I was not thinking of statically allocated pages but some new concept
> > of allocating such shared pages. Shouldn't there be some dedicated
> > area in the E820 table which has to be used during the whole life
> > time of the guest?
>
> Not that I can see. But I don't see why that could not be added?
> Perhaps the HVM loader can make it happen? But then how would it tell
> the kernel that this E820_RESERVED is the shared_info one. Not the
> other ones..

Maybe just use a new E820 type for this sort of thing? It's just the
question whether some other OS can cope with an unknown type. From my
reading of the e820 related code a region with an unknown type is just
ignored.

> > > Are there more shared areas or is it just the shared info page? And
> > > I am kind of worried that moving it to the .data section won't be
> > > completely safe - as the decompressor might blow away that part
> > > too.
> >
> > The decompressor may just clear the area, but since there is no way
> > to tell where the shared pages are it's always a risk to allocate
> > them at compile time.
>
> Yeah, and with the hypervisor potentially still updating the old MFN
> before the new kernel has registered the new MFN, we can end up
> corrupting the new kernel. Ouch.
>
> Would all of these issues disappear if the hypervisor had a hypercall
> that would stop updating the shared info? or just deregister the MFN?
> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
> Would that eliminate the pesky GMFN?

I'm not sure, most likely the gfn will just disappear from the guest,
like a ballooned page disappears. Accessing it will likely cause a
crash.

Olaf
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Tue, Jul 10, 2012 at 08:09:53PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > > I was not thinking of statically allocated pages but some new
> > > concept of allocating such shared pages. Shouldn't there be some
> > > dedicated area in the E820 table which has to be used during the
> > > whole life time of the guest?
> >
> > Not that I can see. But I don't see why that could not be added?
> > Perhaps the HVM loader can make it happen? But then how would it
> > tell the kernel that this E820_RESERVED is the shared_info one. Not
> > the other ones..
>
> Maybe just use a new E820 type for this sort of thing?

Ewww.

> It's just the question whether some other OS can cope with an unknown
> type. From my reading of the e820 related code a region with an
> unknown type is just ignored.

Sure. And we could scan it.. but scanning E820_UNKNOWN for some magic
header seems .. hacky.

> > > > Are there more shared areas or is it just the shared info page?
> > > > And I am kind of worried that moving it to the .data section
> > > > won't be completely safe - as the decompressor might blow away
> > > > that part too.
> > >
> > > The decompressor may just clear the area, but since there is no
> > > way to tell where the shared pages are it's always a risk to
> > > allocate them at compile time.
> >
> > Yeah, and with the hypervisor potentially still updating the old MFN
> > before the new kernel has registered the new MFN, we can end up
> > corrupting the new kernel. Ouch.
> >
> > Would all of these issues disappear if the hypervisor had a
> > hypercall that would stop updating the shared info? or just
> > deregister the MFN? What if you ripped the GMFN out using
> > 'decrease_reservation' hypercall? Would that eliminate the pesky
> > GMFN?
>
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

What about a populate_physmap right afterwards to stick a newly minted
GMFN in its place?
I don't really know whether this dance of balloon out/balloon in the
same GMFN will break the shared_info relationship. Perhaps not?

What we are going for is to stop the hypervisor from using the
shared_info MFN... perhaps there are other ways to do this?
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On 10/07/2012 19:09, Olaf Hering <o...@aepfle.de> wrote:
> > > > Are there more shared areas or is it just the shared info page?
> > > > And I am kind of worried that moving it to the .data section
> > > > won't be completely safe - as the decompressor might blow away
> > > > that part too.
> > >
> > > The decompressor may just clear the area, but since there is no
> > > way to tell where the shared pages are it's always a risk to
> > > allocate them at compile time.
> >
> > Yeah, and with the hypervisor potentially still updating the old MFN
> > before the new kernel has registered the new MFN, we can end up
> > corrupting the new kernel. Ouch.
> >
> > Would all of these issues disappear if the hypervisor had a
> > hypercall that would stop updating the shared info? or just
> > deregister the MFN? What if you ripped the GMFN out using
> > 'decrease_reservation' hypercall? Would that eliminate the pesky
> > GMFN?
>
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

Best thing to do, if possible, is to map the shared-info page in the
xen-platform pci device's BAR memory range. Then it will not conflict
with any RAM.

If you do map it over the top of an existing RAM page, you will have to
repopulate that RAM page before kexec, using the populate_physmap
hypercall. The good news is that the populate_physmap hypercall will
have the side effect of unmapping the shared-info page, ready to be
mapped wherever the new kernel would like it to reside :)

Hope this clears up some of the confusion. ;)

 -- Keir
Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
On Fri, Jul 06, Jan Beulich wrote:
> > Could it be that some code tweaks the stack content used by
> > decompress() in some odd way? But that would most likely lead to a
> > crash, not to unexpected uncompressing results.
>
> Especially if the old and new kernel are using the exact same image,
> how about the decompression writing over the shared info page causing
> all this? As the decompressor wouldn't expect Xen to possibly write
> stuff there itself, it could easily be that some repeat count gets
> altered, thus breaking the decompressed data without the decompression
> code necessarily noticing.

In my case the gfn of the shared info page is 1f54. Is it possible to
move the page at runtime? It looks like it can be done because the hvm
loader configures f initially.

Perhaps the PVonHVM code has to give up the shared pages or move them to
some dedicated unused page during the process of booting into the new
kernel. Konrad, any idea how that could be done?

> If that's the case, there would be a more general problem here (for
> kdump at least), as granted pages could also still get written to when
> the new kernel already is in the process of launching.

Maybe you meant to say kexec instead of kdump? kdump runs in its own
memory area, so I think the worst thing is that some pages of the
crashed kernel get modified.

Olaf