Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
>>> On 20.04.17 at 12:42,wrote: > On Thu, Apr 20, 2017 at 03:34:21AM -0600, Jan Beulich wrote: >> >>> On 19.04.17 at 19:16, wrote: >> > On Wed, Apr 19, 2017 at 10:19:44AM -0600, Jan Beulich wrote: >> >> >>> On 19.04.17 at 17:54, wrote: >> >> > On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote: >> >> >> @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op, >> >> >> if ( ret ) >> >> >> return ret; >> >> >> >> >> >> +if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, _flags) ) >> >> >> +return hypercall_create_continuation(__HYPERVISOR_kexec_op, >> >> >> "lh", op, uarg); >> >> >> + >> >> > >> >> > I would suggest here: >> >> >ASSERT(test_bit(KEXEC_FLAG_IN_HYPERCALL, _flags)); >> >> >> >> You're kidding? The flag was set just in the line above. Or do you >> >> really mean we need to consider test_and_set_bit() not doing what >> >> it is supposed to do? >> > >> > Yep, it looks ridiculous. However, ASSERT() in kexec_swap_images() looks >> > almost >> > the same for me. So, TBH, I still do not understand need for it at all. >> > Could >> > you enlighten me? >> >> Can't be that difficult to understand: There was a lock there before, >> and the addition of the ASSERT() could help document that the >> serialization requirements aren't being broken. I'm not saying there > > Ahhh... OK, so, you treat this more as a comment than anything else. > So, why not use regular comment here then? Hmmm... Do you treat an > ASSERT() here as kind of fuse too? Well, in a way - other than a plain comment, the ASSERT() serves both documentation purposes _and_ enforces what the comment would otherwise only state. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
On Thu, Apr 20, 2017 at 03:34:21AM -0600, Jan Beulich wrote: > >>> On 19.04.17 at 19:16,wrote: > > On Wed, Apr 19, 2017 at 10:19:44AM -0600, Jan Beulich wrote: > >> >>> On 19.04.17 at 17:54, wrote: > >> > On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote: > >> >> @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op, > >> >> if ( ret ) > >> >> return ret; > >> >> > >> >> +if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, _flags) ) > >> >> +return hypercall_create_continuation(__HYPERVISOR_kexec_op, > >> >> "lh", op, uarg); > >> >> + > >> > > >> > I would suggest here: > >> >ASSERT(test_bit(KEXEC_FLAG_IN_HYPERCALL, _flags)); > >> > >> You're kidding? The flag was set just in the line above. Or do you > >> really mean we need to consider test_and_set_bit() not doing what > >> it is supposed to do? > > > > Yep, it looks ridiculous. However, ASSERT() in kexec_swap_images() looks > > almost > > the same for me. So, TBH, I still do not understand need for it at all. > > Could > > you enlighten me? > > Can't be that difficult to understand: There was a lock there before, > and the addition of the ASSERT() could help document that the > serialization requirements aren't being broken. I'm not saying there Ahhh... OK, so, you treat this more as a comment than anything else. So, why not use regular comment here then? Hmmm... Do you treat an ASSERT() here as kind of fuse too? > might not be other places to _also_ add ASSERT()s, but not in _that > other_ patch. OK, so, I would put ASSERT() at least at the beginning of kexec_load() and kexec_unload() (I am not sure about others). However, then ASSERT() is not needed in kexec_swap_images(). Though if you wish to leave it I will not object any longer. Daniel ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
>>> On 19.04.17 at 19:16,wrote: > On Wed, Apr 19, 2017 at 10:19:44AM -0600, Jan Beulich wrote: >> >>> On 19.04.17 at 17:54, wrote: >> > On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote: >> >> @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op, >> >> if ( ret ) >> >> return ret; >> >> >> >> +if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, _flags) ) >> >> +return hypercall_create_continuation(__HYPERVISOR_kexec_op, >> >> "lh", op, uarg); >> >> + >> > >> > I would suggest here: >> >ASSERT(test_bit(KEXEC_FLAG_IN_HYPERCALL, _flags)); >> >> You're kidding? The flag was set just in the line above. Or do you >> really mean we need to consider test_and_set_bit() not doing what >> it is supposed to do? > > Yep, it looks ridiculous. However, ASSERT() in kexec_swap_images() looks > almost > the same for me. So, TBH, I still do not understand need for it at all. Could > you enlighten me? Can't be that difficult to understand: There was a lock there before, and the addition of the ASSERT() could help document that the serialization requirements aren't being broken. I'm not saying there might not be other places to _also_ add ASSERT()s, but not in _that other_ patch. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
On Wed, Apr 19, 2017 at 10:19:44AM -0600, Jan Beulich wrote: > >>> On 19.04.17 at 17:54,wrote: > > On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote: > >> @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op, > >> if ( ret ) > >> return ret; > >> > >> +if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, _flags) ) > >> +return hypercall_create_continuation(__HYPERVISOR_kexec_op, "lh", > >> op, uarg); > >> + > > > > I would suggest here: > >ASSERT(test_bit(KEXEC_FLAG_IN_HYPERCALL, _flags)); > > You're kidding? The flag was set just in the line above. Or do you > really mean we need to consider test_and_set_bit() not doing what > it is supposed to do? Yep, it looks ridiculous. However, ASSERT() in kexec_swap_images() looks almost the same for me. So, TBH, I still do not understand need for it at all. Could you enlighten me? If we really need it I would put it at the beginning of every function called from switch() in do_kexec_op_internal(). Just in case. Though I still do not see the point for it. At least in that form. Daniel ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
>>> On 19.04.17 at 17:54,wrote: > On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote: >> @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op, >> if ( ret ) >> return ret; >> >> +if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, _flags) ) >> +return hypercall_create_continuation(__HYPERVISOR_kexec_op, "lh", >> op, uarg); >> + > > I would suggest here: >ASSERT(test_bit(KEXEC_FLAG_IN_HYPERCALL, _flags)); You're kidding? The flag was set just in the line above. Or do you really mean we need to consider test_and_set_bit() not doing what it is supposed to do? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote: > When we concurrently try to unload and load crash > images we eventually get: > > Xen call trace: > [] machine_kexec_add_page+0x3a0/0x3fa > [] machine_kexec_load+0xdb/0x107 > [] kexec.c#kexec_load_slot+0x11/0x42 > [] kexec.c#kexec_load+0x119/0x150 > [] kexec.c#do_kexec_op_internal+0xab/0xcf > [] do_kexec_op+0xe/0x1e > [] pv_hypercall+0x20a/0x44a > [] cpufreq.c#test_all_events+0/0x30 > > Pagetable walk from 820040088320: > L4[0x104] = 0002979d1063 > L3[0x001] = 0002979d0063 > L2[0x000] = 0002979c7063 > L1[0x088] = 80037a91ede97063 > > The interesting thing is that the page bits (063) look legit. > > The operation on which we blow up is us trying to write > in the L1 and finding that the L2 entry points to some > bizzare MFN. It stinks of a race, and it looks like > the issue is due to no concurrency locks when dealing > with the crash kernel space. > > Specifically we concurrently call kimage_alloc_crash_control_page > which iterates over the kexec_crash_area.start -> kexec_crash_area.size > and once found: > > if ( page ) > { > image->next_crash_page = hole_end; > clear_domain_page(_mfn(page_to_mfn(page))); > } > > clears. Since the parameters of what MFN to use are provided > by the callers (and the area to search is bounded) the the 'page' > is probably the same. So #1 we concurrently clear the > 'control_code_page'. > > The next step is us passing this 'control_code_page' to > machine_kexec_add_page. This function requires the MFNs: > page_to_maddr(image->control_code_page). > > And this would always return the same virtual address, as > the MFN of the control_code_page is inside of the > kexec_crash_area.start -> kexec_crash_area.size area. > > Then machine_kexec_add_page updates the L1 .. which can be done > concurrently and on subsequent calls we mangle it up. > > This is all a theory at this time, but testing reveals > that adding the hypercall_create_continuation() at the > kexec hypercall fixes the crash. > > NOTE: This patch follows 5c5216 (kexec: clear kexec_image slot > when unloading kexec image) to prevent crashes during > simultaneous load/unloads. > > NOTE: Consideration was given to using the existing flag > KEXEC_FLAG_IN_PROGRESS to denote a kexec hypercall in > progress. This, however, overloads the original intent of > the flag which is to denote that we are about-to/have made > the jump to the crash path. The overloading would lead to > failures in existing checks on this flag as the flag would > always be set at the top level in do_kexec_op_internal(). > For this reason, the new flag KEXEC_FLAG_HC_IN_PROGRESS > was introduced. > > While at it, fixed the #define mismatched spacing > > Signed-off-by: Eric DeVolder> Reviewed-by: Bhavesh Davda > Reviewed-by: Konrad Rzeszutek Wilk > Reviewed-by: Jan Beulich > Reviewed-by: Andrew Cooper > Reviewed-by: Daniel Kiper > --- > v3: > - Incorporated feedback from Jan Beulich to change the >name of the new flag to KEXEC_FLAG_IN_HYPERCALL > > v2: 04/17/2017 > - Patch titled 'kexec: use hypercall_create_continuation to protect KEXEC > ops' > - Jan Beulich directed me to use a continuation method instead >of spinlock. > - Incorporated feedback from Daniel Kiper > - Incorporated feedback from Konrad Wilk > > v1: 04/10/2017 > - Patch titled 'kexec: Add spinlock for the whole hypercall' > - Used spinlock in do_kexec_op_internal() > --- > xen/common/kexec.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/xen/common/kexec.c b/xen/common/kexec.c > index 072cc8e..253c204 100644 > --- a/xen/common/kexec.c > +++ b/xen/common/kexec.c > @@ -50,9 +50,10 @@ static cpumask_t crash_saved_cpus; > > static struct kexec_image *kexec_image[KEXEC_IMAGE_NR]; > > -#define KEXEC_FLAG_DEFAULT_POS (KEXEC_IMAGE_NR + 0) > -#define KEXEC_FLAG_CRASH_POS (KEXEC_IMAGE_NR + 1) > -#define KEXEC_FLAG_IN_PROGRESS (KEXEC_IMAGE_NR + 2) > +#define KEXEC_FLAG_DEFAULT_POS (KEXEC_IMAGE_NR + 0) > +#define KEXEC_FLAG_CRASH_POS(KEXEC_IMAGE_NR + 1) > +#define KEXEC_FLAG_IN_PROGRESS (KEXEC_IMAGE_NR + 2) > +#define KEXEC_FLAG_IN_HYPERCALL (KEXEC_IMAGE_NR + 3) I do not think that you need change all of this right now. Just add one space after KEXEC_FLAG_IN_HYPERCALL and you are set. > static unsigned long kexec_flags = 0; /* the lowest bits are for > KEXEC_IMAGE... */ > > @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op, > if ( ret ) > return ret; > > +if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, _flags) ) > +return hypercall_create_continuation(__HYPERVISOR_kexec_op,
[Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
When we concurrently try to unload and load crash images we eventually get: Xen call trace: [] machine_kexec_add_page+0x3a0/0x3fa [] machine_kexec_load+0xdb/0x107 [] kexec.c#kexec_load_slot+0x11/0x42 [] kexec.c#kexec_load+0x119/0x150 [] kexec.c#do_kexec_op_internal+0xab/0xcf [] do_kexec_op+0xe/0x1e [] pv_hypercall+0x20a/0x44a [] cpufreq.c#test_all_events+0/0x30 Pagetable walk from 820040088320: L4[0x104] = 0002979d1063 L3[0x001] = 0002979d0063 L2[0x000] = 0002979c7063 L1[0x088] = 80037a91ede97063 The interesting thing is that the page bits (063) look legit. The operation on which we blow up is us trying to write in the L1 and finding that the L2 entry points to some bizzare MFN. It stinks of a race, and it looks like the issue is due to no concurrency locks when dealing with the crash kernel space. Specifically we concurrently call kimage_alloc_crash_control_page which iterates over the kexec_crash_area.start -> kexec_crash_area.size and once found: if ( page ) { image->next_crash_page = hole_end; clear_domain_page(_mfn(page_to_mfn(page))); } clears. Since the parameters of what MFN to use are provided by the callers (and the area to search is bounded) the the 'page' is probably the same. So #1 we concurrently clear the 'control_code_page'. The next step is us passing this 'control_code_page' to machine_kexec_add_page. This function requires the MFNs: page_to_maddr(image->control_code_page). And this would always return the same virtual address, as the MFN of the control_code_page is inside of the kexec_crash_area.start -> kexec_crash_area.size area. Then machine_kexec_add_page updates the L1 .. which can be done concurrently and on subsequent calls we mangle it up. This is all a theory at this time, but testing reveals that adding the hypercall_create_continuation() at the kexec hypercall fixes the crash. NOTE: This patch follows 5c5216 (kexec: clear kexec_image slot when unloading kexec image) to prevent crashes during simultaneous load/unloads. NOTE: Consideration was given to using the existing flag KEXEC_FLAG_IN_PROGRESS to denote a kexec hypercall in progress. This, however, overloads the original intent of the flag which is to denote that we are about-to/have made the jump to the crash path. The overloading would lead to failures in existing checks on this flag as the flag would always be set at the top level in do_kexec_op_internal(). For this reason, the new flag KEXEC_FLAG_HC_IN_PROGRESS was introduced. While at it, fixed the #define mismatched spacing Signed-off-by: Eric DeVolderReviewed-by: Bhavesh Davda Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Jan Beulich Reviewed-by: Andrew Cooper Reviewed-by: Daniel Kiper --- v3: - Incorporated feedback from Jan Beulich to change the name of the new flag to KEXEC_FLAG_IN_HYPERCALL v2: 04/17/2017 - Patch titled 'kexec: use hypercall_create_continuation to protect KEXEC ops' - Jan Beulich directed me to use a continuation method instead of spinlock. - Incorporated feedback from Daniel Kiper - Incorporated feedback from Konrad Wilk v1: 04/10/2017 - Patch titled 'kexec: Add spinlock for the whole hypercall' - Used spinlock in do_kexec_op_internal() --- xen/common/kexec.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/xen/common/kexec.c b/xen/common/kexec.c index 072cc8e..253c204 100644 --- a/xen/common/kexec.c +++ b/xen/common/kexec.c @@ -50,9 +50,10 @@ static cpumask_t crash_saved_cpus; static struct kexec_image *kexec_image[KEXEC_IMAGE_NR]; -#define KEXEC_FLAG_DEFAULT_POS (KEXEC_IMAGE_NR + 0) -#define KEXEC_FLAG_CRASH_POS (KEXEC_IMAGE_NR + 1) -#define KEXEC_FLAG_IN_PROGRESS (KEXEC_IMAGE_NR + 2) +#define KEXEC_FLAG_DEFAULT_POS (KEXEC_IMAGE_NR + 0) +#define KEXEC_FLAG_CRASH_POS(KEXEC_IMAGE_NR + 1) +#define KEXEC_FLAG_IN_PROGRESS (KEXEC_IMAGE_NR + 2) +#define KEXEC_FLAG_IN_HYPERCALL (KEXEC_IMAGE_NR + 3) static unsigned long kexec_flags = 0; /* the lowest bits are for KEXEC_IMAGE... */ @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op, if ( ret ) return ret; +if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, _flags) ) +return hypercall_create_continuation(__HYPERVISOR_kexec_op, "lh", op, uarg); + switch ( op ) { case KEXEC_CMD_kexec_get_range: @@ -1227,6 +1231,8 @@ static int do_kexec_op_internal(unsigned long op, break; } +clear_bit(KEXEC_FLAG_IN_HYPERCALL, _flags); + return ret; } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel