Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2015-03-05 Thread H. Peter Anvin
On 06/23/2014 06:08 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 19, 2014 at 08:02:22AM -0700, H. Peter Anvin wrote:
>> On 03/19/2014 06:21 AM, Konrad Rzeszutek Wilk wrote:

 The following patch does the always eager allocation.  It's a fixup of
 Suresh's original patch.

>>>
>>> Hey Peter,
>>>
>>> I think this is the solution you were looking for?
>>>
>>> Or are there some other subtle issues that you think lurk around?
>>>
>>
>> Ah, I managed to miss it (mostly because it was buried *inside* another
>> email and didn't change the subject line... I really dislike that mode
>> of delivering a patch.
> 
> Let me roll up some of these patchset and send them as git send-email.
> 
>>
>> Let me see if the issues have been fixed.  Still wondering if there is a
>> way we can get away without the boot_func hack...
> 
> I have to confesss I don't even remember what the 'if the issues have been
> fixed' is referring to?
> 

Hi Konrad... it looks like this got left waiting for you and got forgotten?

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2015-03-05 Thread H. Peter Anvin
On 06/23/2014 06:08 AM, Konrad Rzeszutek Wilk wrote:
 On Wed, Mar 19, 2014 at 08:02:22AM -0700, H. Peter Anvin wrote:
 On 03/19/2014 06:21 AM, Konrad Rzeszutek Wilk wrote:

 The following patch does the always eager allocation.  It's a fixup of
 Suresh's original patch.


 Hey Peter,

 I think this is the solution you were looking for?

 Or are there some other subtle issues that you think lurk around?


 Ah, I managed to miss it (mostly because it was buried *inside* another
 email and didn't change the subject line... I really dislike that mode
 of delivering a patch.
 
 Let me roll up some of these patchset and send them as git send-email.
 

 Let me see if the issues have been fixed.  Still wondering if there is a
 way we can get away without the boot_func hack...
 
 I have to confesss I don't even remember what the 'if the issues have been
 fixed' is referring to?
 

Hi Konrad... it looks like this got left waiting for you and got forgotten?

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-06-23 Thread Konrad Rzeszutek Wilk
On Wed, Mar 19, 2014 at 08:02:22AM -0700, H. Peter Anvin wrote:
> On 03/19/2014 06:21 AM, Konrad Rzeszutek Wilk wrote:
> >>
> >> The following patch does the always eager allocation.  It's a fixup of
> >> Suresh's original patch.
> >>
> > 
> > Hey Peter,
> > 
> > I think this is the solution you were looking for?
> > 
> > Or are there some other subtle issues that you think lurk around?
> > 
> 
> Ah, I managed to miss it (mostly because it was buried *inside* another
> email and didn't change the subject line... I really dislike that mode
> of delivering a patch.

Let me roll up some of these patchset and send them as git send-email.

> 
> Let me see if the issues have been fixed.  Still wondering if there is a
> way we can get away without the boot_func hack...

I have to confesss I don't even remember what the 'if the issues have been
fixed' is referring to?

> 
>   -hpa
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-06-23 Thread Konrad Rzeszutek Wilk
On Wed, Mar 19, 2014 at 08:02:22AM -0700, H. Peter Anvin wrote:
 On 03/19/2014 06:21 AM, Konrad Rzeszutek Wilk wrote:
 
  The following patch does the always eager allocation.  It's a fixup of
  Suresh's original patch.
 
  
  Hey Peter,
  
  I think this is the solution you were looking for?
  
  Or are there some other subtle issues that you think lurk around?
  
 
 Ah, I managed to miss it (mostly because it was buried *inside* another
 email and didn't change the subject line... I really dislike that mode
 of delivering a patch.

Let me roll up some of these patchset and send them as git send-email.

 
 Let me see if the issues have been fixed.  Still wondering if there is a
 way we can get away without the boot_func hack...

I have to confesss I don't even remember what the 'if the issues have been
fixed' is referring to?

 
   -hpa
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread H. Peter Anvin
Sounds good.

On March 19, 2014 5:00:11 PM PDT, Greg Kroah-Hartman 
 wrote:
>On Sun, Mar 16, 2014 at 09:23:01PM -0700, H. Peter Anvin wrote:
>> On 03/16/2014 09:12 PM, Sarah Newman wrote:
>> > 
>> > Unconditional eager allocation works. Can xen users count on this
>being included and applied to the
>> > stable kernels?
>> > 
>> 
>> I don't know.  If we state that it is a bug fix for Xen it might be
>> possible, but it would be up to Greg (Cc:'d) and the rest of the
>stable
>> team.
>
>If someone sends the git id of the commit in Linus's tree to the
>sta...@vger.kernel.org address, we will be glad to review it at that
>point in time.
>
>thanks,
>
>greg k-h

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread Greg Kroah-Hartman
On Sun, Mar 16, 2014 at 09:23:01PM -0700, H. Peter Anvin wrote:
> On 03/16/2014 09:12 PM, Sarah Newman wrote:
> > 
> > Unconditional eager allocation works. Can xen users count on this being 
> > included and applied to the
> > stable kernels?
> > 
> 
> I don't know.  If we state that it is a bug fix for Xen it might be
> possible, but it would be up to Greg (Cc:'d) and the rest of the stable
> team.

If someone sends the git id of the commit in Linus's tree to the
sta...@vger.kernel.org address, we will be glad to review it at that
point in time.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread H. Peter Anvin
On 03/19/2014 06:21 AM, Konrad Rzeszutek Wilk wrote:
>>
>> The following patch does the always eager allocation.  It's a fixup of
>> Suresh's original patch.
>>
> 
> Hey Peter,
> 
> I think this is the solution you were looking for?
> 
> Or are there some other subtle issues that you think lurk around?
> 

Ah, I managed to miss it (mostly because it was buried *inside* another
email and didn't change the subject line... I really dislike that mode
of delivering a patch.

Let me see if the issues have been fixed.  Still wondering if there is a
way we can get away without the boot_func hack...

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread Konrad Rzeszutek Wilk
On Mon, Mar 17, 2014 at 01:29:03PM +, David Vrabel wrote:
> On 17/03/14 03:43, H. Peter Anvin wrote:
> > On 03/16/2014 08:35 PM, Sarah Newman wrote:
> >> Can you please review my patch first?  It's only enabled when absolutely 
> >> required.
> > 
> > It doesn't help.  It means you're running on Xen, and you will have
> > processes subjected to random SIGKILL because they happen to touch the
> > FPU when the atomic pool is low.
> > 
> > However, there is probably a happy medium: you don't actually need eager
> > FPU restore, you just need eager FPU *allocation*.  We have been
> > intending to allocate the FPU state at task creation time for eagerfpu,
> > and Suresh Siddha has already produced such a patch; it just needs some
> > minor fixups due to an __init failure.
> > 
> > http://lkml.kernel.org/r/1391325599.6481.5.camel@europa
> > 
> > In the Xen case we could turn on eager allocation but not eager fpu.  In
> > fact, it might be justified to *always* do eager allocation...
> 
> The following patch does the always eager allocation.  It's a fixup of
> Suresh's original patch.
> 

Hey Peter,

I think this is the solution you were looking for?

Or are there some other subtle issues that you think lurk around?

Thanks!
> v2:
> - Allocate initial fpu state after xsave_init().
> - Only allocate initial FPU state on boot cpu.
> 
> 8<---
> x86, fpu: remove the logic of non-eager fpu mem allocation at the first usage
> 
> From: Suresh Siddha 
> 
> For non-eager fpu mode, thread's fpu state is allocated during the first
> fpu usage (in the context of device not available exception). This can be
> a blocking call and hence we enable interrupts (which were originally
> disabled when the exception happened), allocate memory and disable
> interrupts etc. While this saves 512 bytes or so per-thread, there
> are some issues in general.
> 
> a.  Most of the future cases will be anyway using eager
> FPU (because of processor features like xsaveopt, LWP, MPX etc) and
> they do the allocation at the thread creation itself. Nice to have
> one common mechanism as all the state save/restore code is
> shared. Avoids the confusion and minimizes the subtle bugs
> in the core piece involved with context-switch.
> 
> b. If a parent thread uses FPU, during fork() we allocate
> the FPU state in the child and copy the state etc. Shortly after this,
> during exec() we free it up, so that we can later allocate during
> the first usage of FPU. So this free/allocate might be slower
> for some workloads.
> 
> c. math_state_restore() is called from multiple places
> and it is error pone if the caller expects interrupts to be disabled
> throughout the execution of math_state_restore(). Can lead to subtle
> bugs like Ubuntu bug #1265841.
> 
> Memory savings will be small anyways and the code complexity
> introducing subtle bugs is not worth it. So remove
> the logic of non-eager fpu mem allocation at the first usage.
> 
> Signed-off-by: Suresh Siddha 
> Signed-off-by: David Vrabel 
> ---
>  arch/x86/kernel/i387.c|   22 +-
>  arch/x86/kernel/process.c |6 --
>  arch/x86/kernel/traps.c   |   16 ++--
>  arch/x86/kernel/xsave.c   |2 --
>  4 files changed, 15 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
> index e8368c6..05aeae2 100644
> --- a/arch/x86/kernel/i387.c
> +++ b/arch/x86/kernel/i387.c
> @@ -5,6 +5,7 @@
>   *  General FPU state handling cleanups
>   *   Gareth Hughes , May 2000
>   */
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -153,8 +154,15 @@ static void init_thread_xstate(void)
>   * into all processes.
>   */
>  
> +static void __init fpu_init_boot_cpu(void)
> +{
> + current->thread.fpu.state =
> + alloc_bootmem_align(xstate_size, __alignof__(struct 
> xsave_struct));
> +}
> +
>  void fpu_init(void)
>  {
> + static __refdata void (*boot_func)(void) = fpu_init_boot_cpu;
>   unsigned long cr0;
>   unsigned long cr4_mask = 0;
>  
> @@ -189,6 +197,11 @@ void fpu_init(void)
>   mxcsr_feature_mask_init();
>   xsave_init();
>   eager_fpu_init();
> +
> + if (boot_func) {
> + boot_func();
> + boot_func = NULL;
> + }
>  }
>  
>  void fpu_finit(struct fpu *fpu)
> @@ -219,8 +232,6 @@ EXPORT_SYMBOL_GPL(fpu_finit);
>   */
>  int init_fpu(struct task_struct *tsk)
>  {
> - int ret;
> -
>   if (tsk_used_math(tsk)) {
>   if (cpu_has_fpu && tsk == current)
>   unlazy_fpu(tsk);
> @@ -228,13 +239,6 @@ int init_fpu(struct task_struct *tsk)
>   return 0;
>   }
>  
> - /*
> -  * Memory allocation at the first usage of the FPU and other state.
> -  */
> - ret = fpu_alloc(>thread.fpu);
> - if (ret)
> - return ret;
> -
>   fpu_finit(>thread.fpu);
>  
>   set_stopped_child_used_math(tsk);
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> 

Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread Konrad Rzeszutek Wilk
On Mon, Mar 17, 2014 at 01:29:03PM +, David Vrabel wrote:
 On 17/03/14 03:43, H. Peter Anvin wrote:
  On 03/16/2014 08:35 PM, Sarah Newman wrote:
  Can you please review my patch first?  It's only enabled when absolutely 
  required.
  
  It doesn't help.  It means you're running on Xen, and you will have
  processes subjected to random SIGKILL because they happen to touch the
  FPU when the atomic pool is low.
  
  However, there is probably a happy medium: you don't actually need eager
  FPU restore, you just need eager FPU *allocation*.  We have been
  intending to allocate the FPU state at task creation time for eagerfpu,
  and Suresh Siddha has already produced such a patch; it just needs some
  minor fixups due to an __init failure.
  
  http://lkml.kernel.org/r/1391325599.6481.5.camel@europa
  
  In the Xen case we could turn on eager allocation but not eager fpu.  In
  fact, it might be justified to *always* do eager allocation...
 
 The following patch does the always eager allocation.  It's a fixup of
 Suresh's original patch.
 

Hey Peter,

I think this is the solution you were looking for?

Or are there some other subtle issues that you think lurk around?

Thanks!
 v2:
 - Allocate initial fpu state after xsave_init().
 - Only allocate initial FPU state on boot cpu.
 
 8---
 x86, fpu: remove the logic of non-eager fpu mem allocation at the first usage
 
 From: Suresh Siddha sbsid...@gmail.com
 
 For non-eager fpu mode, thread's fpu state is allocated during the first
 fpu usage (in the context of device not available exception). This can be
 a blocking call and hence we enable interrupts (which were originally
 disabled when the exception happened), allocate memory and disable
 interrupts etc. While this saves 512 bytes or so per-thread, there
 are some issues in general.
 
 a.  Most of the future cases will be anyway using eager
 FPU (because of processor features like xsaveopt, LWP, MPX etc) and
 they do the allocation at the thread creation itself. Nice to have
 one common mechanism as all the state save/restore code is
 shared. Avoids the confusion and minimizes the subtle bugs
 in the core piece involved with context-switch.
 
 b. If a parent thread uses FPU, during fork() we allocate
 the FPU state in the child and copy the state etc. Shortly after this,
 during exec() we free it up, so that we can later allocate during
 the first usage of FPU. So this free/allocate might be slower
 for some workloads.
 
 c. math_state_restore() is called from multiple places
 and it is error pone if the caller expects interrupts to be disabled
 throughout the execution of math_state_restore(). Can lead to subtle
 bugs like Ubuntu bug #1265841.
 
 Memory savings will be small anyways and the code complexity
 introducing subtle bugs is not worth it. So remove
 the logic of non-eager fpu mem allocation at the first usage.
 
 Signed-off-by: Suresh Siddha sbsid...@gmail.com
 Signed-off-by: David Vrabel david.vra...@citrix.com
 ---
  arch/x86/kernel/i387.c|   22 +-
  arch/x86/kernel/process.c |6 --
  arch/x86/kernel/traps.c   |   16 ++--
  arch/x86/kernel/xsave.c   |2 --
  4 files changed, 15 insertions(+), 31 deletions(-)
 
 diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
 index e8368c6..05aeae2 100644
 --- a/arch/x86/kernel/i387.c
 +++ b/arch/x86/kernel/i387.c
 @@ -5,6 +5,7 @@
   *  General FPU state handling cleanups
   *   Gareth Hughes gar...@valinux.com, May 2000
   */
 +#include linux/bootmem.h
  #include linux/module.h
  #include linux/regset.h
  #include linux/sched.h
 @@ -153,8 +154,15 @@ static void init_thread_xstate(void)
   * into all processes.
   */
  
 +static void __init fpu_init_boot_cpu(void)
 +{
 + current-thread.fpu.state =
 + alloc_bootmem_align(xstate_size, __alignof__(struct 
 xsave_struct));
 +}
 +
  void fpu_init(void)
  {
 + static __refdata void (*boot_func)(void) = fpu_init_boot_cpu;
   unsigned long cr0;
   unsigned long cr4_mask = 0;
  
 @@ -189,6 +197,11 @@ void fpu_init(void)
   mxcsr_feature_mask_init();
   xsave_init();
   eager_fpu_init();
 +
 + if (boot_func) {
 + boot_func();
 + boot_func = NULL;
 + }
  }
  
  void fpu_finit(struct fpu *fpu)
 @@ -219,8 +232,6 @@ EXPORT_SYMBOL_GPL(fpu_finit);
   */
  int init_fpu(struct task_struct *tsk)
  {
 - int ret;
 -
   if (tsk_used_math(tsk)) {
   if (cpu_has_fpu  tsk == current)
   unlazy_fpu(tsk);
 @@ -228,13 +239,6 @@ int init_fpu(struct task_struct *tsk)
   return 0;
   }
  
 - /*
 -  * Memory allocation at the first usage of the FPU and other state.
 -  */
 - ret = fpu_alloc(tsk-thread.fpu);
 - if (ret)
 - return ret;
 -
   fpu_finit(tsk-thread.fpu);
  
   set_stopped_child_used_math(tsk);
 diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
 index 3fb8d95..cd9c190 

Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread H. Peter Anvin
On 03/19/2014 06:21 AM, Konrad Rzeszutek Wilk wrote:

 The following patch does the always eager allocation.  It's a fixup of
 Suresh's original patch.

 
 Hey Peter,
 
 I think this is the solution you were looking for?
 
 Or are there some other subtle issues that you think lurk around?
 

Ah, I managed to miss it (mostly because it was buried *inside* another
email and didn't change the subject line... I really dislike that mode
of delivering a patch.

Let me see if the issues have been fixed.  Still wondering if there is a
way we can get away without the boot_func hack...

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread Greg Kroah-Hartman
On Sun, Mar 16, 2014 at 09:23:01PM -0700, H. Peter Anvin wrote:
 On 03/16/2014 09:12 PM, Sarah Newman wrote:
  
  Unconditional eager allocation works. Can xen users count on this being 
  included and applied to the
  stable kernels?
  
 
 I don't know.  If we state that it is a bug fix for Xen it might be
 possible, but it would be up to Greg (Cc:'d) and the rest of the stable
 team.

If someone sends the git id of the commit in Linus's tree to the
sta...@vger.kernel.org address, we will be glad to review it at that
point in time.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-19 Thread H. Peter Anvin
Sounds good.

On March 19, 2014 5:00:11 PM PDT, Greg Kroah-Hartman 
gre...@linuxfoundation.org wrote:
On Sun, Mar 16, 2014 at 09:23:01PM -0700, H. Peter Anvin wrote:
 On 03/16/2014 09:12 PM, Sarah Newman wrote:
  
  Unconditional eager allocation works. Can xen users count on this
being included and applied to the
  stable kernels?
  
 
 I don't know.  If we state that it is a bug fix for Xen it might be
 possible, but it would be up to Greg (Cc:'d) and the rest of the
stable
 team.

If someone sends the git id of the commit in Linus's tree to the
sta...@vger.kernel.org address, we will be glad to review it at that
point in time.

thanks,

greg k-h

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-18 Thread H. Peter Anvin
On 03/18/2014 11:17 AM, Sarah Newman wrote:
> 
> Should or has there been a review of the current xen PVABI to look for any 
> other such deviations?
> 

It would be a very good thing to do.  First of all, the PVABI needs to
be **documented** because without that there is no hope.  I would like
to specifically call out the efforts of Konrad Wilk in this general
area, but he obviously doesn't scale, either...

The other aspect is to get rid of as much of the PVABI as possible.  HPV
is a big step in that direction, getting rid of some of the most
invasive hooks, but I think we can minimize further.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-18 Thread Sarah Newman
On 03/17/2014 10:14 AM, George Dunlap wrote:
> On 03/17/2014 05:05 PM, Jan Beulich wrote:
> On 17.03.14 at 17:55, "H. Peter Anvin"  wrote:
>>> So if this interface wasn't an accident it was active negligence and
>>> incompetence.
>> I don't think so - while it (as we now see) disallows certain things
>> inside the guest, back at the time when this was designed there was
>> no sign of any sort of allocation/scheduling being done inside the
>> #NM handler. And furthermore, a PV specification is by its nature
>> allowed to define deviations from real hardware behavior, or else it
>> wouldn't be needed in the first place.
> 
> But it's certainly the case that deviating from the hardware in *this* way by 
> default was always
> very likely to case the exact kind of bug we've seen here.  It is an 
> "interface trap" that was bound
> to be tripped over (much like Intel's infamous sysret vulnerability).
> 
> Making it opt-in would have been a much better idea.  But the people who made 
> that decision are long
> gone, and we now need to deal with the situation as we have it.

Should or has there been a review of the current xen PVABI to look for any 
other such deviations?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-18 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> On 03/17/2014 10:05 AM, Jan Beulich wrote:
> > 
> > I don't think so - while it (as we now see) disallows certain things
> > inside the guest, back at the time when this was designed there was
> > no sign of any sort of allocation/scheduling being done inside the
> > #NM handler. And furthermore, a PV specification is by its nature
> > allowed to define deviations from real hardware behavior, or else it
> > wouldn't be needed in the first place.
> > 
> 
> And this is exactly the sort of thing about Xen that make me want to 
> go on murderous rampage.  You think you can just take the current 
> Linux implementation at whatever time you implement the code and 
> later come back and say "don't change that, we hard-coded it in 
> Xen."

And the solution is that we just ignore that kind of crap in the 
native kernel and let Xen sort it out as best as it can.

When Xen (and PV) was merged it was promised that a PV interface can 
always adopt to whatever Linux does, without restricting the kernel on 
the native side in any fashion - time to check on that promise.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-18 Thread Ingo Molnar

* H. Peter Anvin h...@zytor.com wrote:

 On 03/17/2014 10:05 AM, Jan Beulich wrote:
  
  I don't think so - while it (as we now see) disallows certain things
  inside the guest, back at the time when this was designed there was
  no sign of any sort of allocation/scheduling being done inside the
  #NM handler. And furthermore, a PV specification is by its nature
  allowed to define deviations from real hardware behavior, or else it
  wouldn't be needed in the first place.
  
 
 And this is exactly the sort of thing about Xen that make me want to 
 go on murderous rampage.  You think you can just take the current 
 Linux implementation at whatever time you implement the code and 
 later come back and say don't change that, we hard-coded it in 
 Xen.

And the solution is that we just ignore that kind of crap in the 
native kernel and let Xen sort it out as best as it can.

When Xen (and PV) was merged it was promised that a PV interface can 
always adopt to whatever Linux does, without restricting the kernel on 
the native side in any fashion - time to check on that promise.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-18 Thread Sarah Newman
On 03/17/2014 10:14 AM, George Dunlap wrote:
 On 03/17/2014 05:05 PM, Jan Beulich wrote:
 On 17.03.14 at 17:55, H. Peter Anvin h...@zytor.com wrote:
 So if this interface wasn't an accident it was active negligence and
 incompetence.
 I don't think so - while it (as we now see) disallows certain things
 inside the guest, back at the time when this was designed there was
 no sign of any sort of allocation/scheduling being done inside the
 #NM handler. And furthermore, a PV specification is by its nature
 allowed to define deviations from real hardware behavior, or else it
 wouldn't be needed in the first place.
 
 But it's certainly the case that deviating from the hardware in *this* way by 
 default was always
 very likely to case the exact kind of bug we've seen here.  It is an 
 interface trap that was bound
 to be tripped over (much like Intel's infamous sysret vulnerability).
 
 Making it opt-in would have been a much better idea.  But the people who made 
 that decision are long
 gone, and we now need to deal with the situation as we have it.

Should or has there been a review of the current xen PVABI to look for any 
other such deviations?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-18 Thread H. Peter Anvin
On 03/18/2014 11:17 AM, Sarah Newman wrote:
 
 Should or has there been a review of the current xen PVABI to look for any 
 other such deviations?
 

It would be a very good thing to do.  First of all, the PVABI needs to
be **documented** because without that there is no hope.  I would like
to specifically call out the efforts of Konrad Wilk in this general
area, but he obviously doesn't scale, either...

The other aspect is to get rid of as much of the PVABI as possible.  HPV
is a big step in that direction, getting rid of some of the most
invasive hooks, but I think we can minimize further.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread George Dunlap

On 03/17/2014 05:05 PM, Jan Beulich wrote:

On 17.03.14 at 17:55, "H. Peter Anvin"  wrote:

On 03/17/2014 05:19 AM, George Dunlap wrote:

On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin  wrote:

No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a

workaround for the legacy hypervisor versions.

The interface wasn't an accident.  In the most common case you'll want
to clear the bit anyway. In PV mode clearing it would require an extra
trip up into the hypervisor.  So this saves one trip up into the
hypervisor on every context switch which involves an FPU, at the
expense of not being able to context-switch away when handling the
trap.

The interface was a complete faceplant, because it caused failures.
You're not infinitely unconstrained since you want to play in the same
sandbox as the native architecture, and if you want to have a hope of
avoiding these kinds of failures you really need to avoid making random
"improvements", certainly not without an explicit guest opt-in (the same
we do for the native CPU architecture when adding new features.)

So if this interface wasn't an accident it was active negligence and
incompetence.

I don't think so - while it (as we now see) disallows certain things
inside the guest, back at the time when this was designed there was
no sign of any sort of allocation/scheduling being done inside the
#NM handler. And furthermore, a PV specification is by its nature
allowed to define deviations from real hardware behavior, or else it
wouldn't be needed in the first place.


But it's certainly the case that deviating from the hardware in *this* 
way by default was always very likely to case the exact kind of bug 
we've seen here.  It is an "interface trap" that was bound to be tripped 
over (much like Intel's infamous sysret vulnerability).


Making it opt-in would have been a much better idea.  But the people who 
made that decision are long gone, and we now need to deal with the 
situation as we have it.


 -George
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread H. Peter Anvin
On 03/17/2014 10:05 AM, Jan Beulich wrote:
> 
> I don't think so - while it (as we now see) disallows certain things
> inside the guest, back at the time when this was designed there was
> no sign of any sort of allocation/scheduling being done inside the
> #NM handler. And furthermore, a PV specification is by its nature
> allowed to define deviations from real hardware behavior, or else it
> wouldn't be needed in the first place.
> 

And this is exactly the sort of thing about Xen that make me want to go
on murderous rampage.  You think you can just take the current Linux
implementation at whatever time you implement the code and later come
back and say "don't change that, we hard-coded it in Xen."

Calling that "active negligence and incompetence" is probably on the
mild side.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread Jan Beulich
>>> On 17.03.14 at 17:55, "H. Peter Anvin"  wrote:
> On 03/17/2014 05:19 AM, George Dunlap wrote:
>> On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin  wrote:
>>> No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
> workaround for the legacy hypervisor versions.
>> 
>> The interface wasn't an accident.  In the most common case you'll want
>> to clear the bit anyway. In PV mode clearing it would require an extra
>> trip up into the hypervisor.  So this saves one trip up into the
>> hypervisor on every context switch which involves an FPU, at the
>> expense of not being able to context-switch away when handling the
>> trap.
> 
> The interface was a complete faceplant, because it caused failures.
> You're not infinitely unconstrained since you want to play in the same
> sandbox as the native architecture, and if you want to have a hope of
> avoiding these kinds of failures you really need to avoid making random
> "improvements", certainly not without an explicit guest opt-in (the same
> we do for the native CPU architecture when adding new features.)
> 
> So if this interface wasn't an accident it was active negligence and
> incompetence.

I don't think so - while it (as we now see) disallows certain things
inside the guest, back at the time when this was designed there was
no sign of any sort of allocation/scheduling being done inside the
#NM handler. And furthermore, a PV specification is by its nature
allowed to define deviations from real hardware behavior, or else it
wouldn't be needed in the first place.

Jan


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread H. Peter Anvin
On 03/17/2014 05:19 AM, George Dunlap wrote:
> On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin  wrote:
>> No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
>> workaround for the legacy hypervisor versions.
> 
> The interface wasn't an accident.  In the most common case you'll want
> to clear the bit anyway. In PV mode clearing it would require an extra
> trip up into the hypervisor.  So this saves one trip up into the
> hypervisor on every context switch which involves an FPU, at the
> expense of not being able to context-switch away when handling the
> trap.
> 
>  -George
> 

The interface was a complete faceplant, because it caused failures.
You're not infinitely unconstrained since you want to play in the same
sandbox as the native architecture, and if you want to have a hope of
avoiding these kinds of failures you really need to avoid making random
"improvements", certainly not without an explicit guest opt-in (the same
we do for the native CPU architecture when adding new features.)

So if this interface wasn't an accident it was active negligence and
incompetence.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread David Vrabel
On 17/03/14 03:43, H. Peter Anvin wrote:
> On 03/16/2014 08:35 PM, Sarah Newman wrote:
>> Can you please review my patch first?  It's only enabled when absolutely 
>> required.
> 
> It doesn't help.  It means you're running on Xen, and you will have
> processes subjected to random SIGKILL because they happen to touch the
> FPU when the atomic pool is low.
> 
> However, there is probably a happy medium: you don't actually need eager
> FPU restore, you just need eager FPU *allocation*.  We have been
> intending to allocate the FPU state at task creation time for eagerfpu,
> and Suresh Siddha has already produced such a patch; it just needs some
> minor fixups due to an __init failure.
> 
> http://lkml.kernel.org/r/1391325599.6481.5.camel@europa
> 
> In the Xen case we could turn on eager allocation but not eager fpu.  In
> fact, it might be justified to *always* do eager allocation...

The following patch does the always eager allocation.  It's a fixup of
Suresh's original patch.

v2:
- Allocate initial fpu state after xsave_init().
- Only allocate initial FPU state on boot cpu.

8<---
x86, fpu: remove the logic of non-eager fpu mem allocation at the first usage

From: Suresh Siddha 

For non-eager fpu mode, thread's fpu state is allocated during the first
fpu usage (in the context of device not available exception). This can be
a blocking call and hence we enable interrupts (which were originally
disabled when the exception happened), allocate memory and disable
interrupts etc. While this saves 512 bytes or so per-thread, there
are some issues in general.

a.  Most of the future cases will be anyway using eager
FPU (because of processor features like xsaveopt, LWP, MPX etc) and
they do the allocation at the thread creation itself. Nice to have
one common mechanism as all the state save/restore code is
shared. Avoids the confusion and minimizes the subtle bugs
in the core piece involved with context-switch.

b. If a parent thread uses FPU, during fork() we allocate
the FPU state in the child and copy the state etc. Shortly after this,
during exec() we free it up, so that we can later allocate during
the first usage of FPU. So this free/allocate might be slower
for some workloads.

c. math_state_restore() is called from multiple places
and it is error pone if the caller expects interrupts to be disabled
throughout the execution of math_state_restore(). Can lead to subtle
bugs like Ubuntu bug #1265841.

Memory savings will be small anyways and the code complexity
introducing subtle bugs is not worth it. So remove
the logic of non-eager fpu mem allocation at the first usage.

Signed-off-by: Suresh Siddha 
Signed-off-by: David Vrabel 
---
 arch/x86/kernel/i387.c|   22 +-
 arch/x86/kernel/process.c |6 --
 arch/x86/kernel/traps.c   |   16 ++--
 arch/x86/kernel/xsave.c   |2 --
 4 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index e8368c6..05aeae2 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -5,6 +5,7 @@
  *  General FPU state handling cleanups
  * Gareth Hughes , May 2000
  */
+#include 
 #include 
 #include 
 #include 
@@ -153,8 +154,15 @@ static void init_thread_xstate(void)
  * into all processes.
  */
 
+static void __init fpu_init_boot_cpu(void)
+{
+   current->thread.fpu.state =
+   alloc_bootmem_align(xstate_size, __alignof__(struct 
xsave_struct));
+}
+
 void fpu_init(void)
 {
+   static __refdata void (*boot_func)(void) = fpu_init_boot_cpu;
unsigned long cr0;
unsigned long cr4_mask = 0;
 
@@ -189,6 +197,11 @@ void fpu_init(void)
mxcsr_feature_mask_init();
xsave_init();
eager_fpu_init();
+
+   if (boot_func) {
+   boot_func();
+   boot_func = NULL;
+   }
 }
 
 void fpu_finit(struct fpu *fpu)
@@ -219,8 +232,6 @@ EXPORT_SYMBOL_GPL(fpu_finit);
  */
 int init_fpu(struct task_struct *tsk)
 {
-   int ret;
-
if (tsk_used_math(tsk)) {
if (cpu_has_fpu && tsk == current)
unlazy_fpu(tsk);
@@ -228,13 +239,6 @@ int init_fpu(struct task_struct *tsk)
return 0;
}
 
-   /*
-* Memory allocation at the first usage of the FPU and other state.
-*/
-   ret = fpu_alloc(>thread.fpu);
-   if (ret)
-   return ret;
-
fpu_finit(>thread.fpu);
 
set_stopped_child_used_math(tsk);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95..cd9c190 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -128,12 +128,6 @@ void flush_thread(void)
flush_ptrace_hw_breakpoint(tsk);
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
drop_init_fpu(tsk);
-   /*
-* Free the FPU state for non xsave platforms. They get reallocated
-* lazily at the first use.
-*/
- 

Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread George Dunlap
On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin  wrote:
> No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
> workaround for the legacy hypervisor versions.

The interface wasn't an accident.  In the most common case you'll want
to clear the bit anyway. In PV mode clearing it would require an extra
trip up into the hypervisor.  So this saves one trip up into the
hypervisor on every context switch which involves an FPU, at the
expense of not being able to context-switch away when handling the
trap.

 -George
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread George Dunlap
On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin h...@zytor.com wrote:
 No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
 workaround for the legacy hypervisor versions.

The interface wasn't an accident.  In the most common case you'll want
to clear the bit anyway. In PV mode clearing it would require an extra
trip up into the hypervisor.  So this saves one trip up into the
hypervisor on every context switch which involves an FPU, at the
expense of not being able to context-switch away when handling the
trap.

 -George
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread David Vrabel
On 17/03/14 03:43, H. Peter Anvin wrote:
 On 03/16/2014 08:35 PM, Sarah Newman wrote:
 Can you please review my patch first?  It's only enabled when absolutely 
 required.
 
 It doesn't help.  It means you're running on Xen, and you will have
 processes subjected to random SIGKILL because they happen to touch the
 FPU when the atomic pool is low.
 
 However, there is probably a happy medium: you don't actually need eager
 FPU restore, you just need eager FPU *allocation*.  We have been
 intending to allocate the FPU state at task creation time for eagerfpu,
 and Suresh Siddha has already produced such a patch; it just needs some
 minor fixups due to an __init failure.
 
 http://lkml.kernel.org/r/1391325599.6481.5.camel@europa
 
 In the Xen case we could turn on eager allocation but not eager fpu.  In
 fact, it might be justified to *always* do eager allocation...

The following patch does the always eager allocation.  It's a fixup of
Suresh's original patch.

v2:
- Allocate initial fpu state after xsave_init().
- Only allocate initial FPU state on boot cpu.

8---
x86, fpu: remove the logic of non-eager fpu mem allocation at the first usage

From: Suresh Siddha sbsid...@gmail.com

For non-eager fpu mode, thread's fpu state is allocated during the first
fpu usage (in the context of device not available exception). This can be
a blocking call and hence we enable interrupts (which were originally
disabled when the exception happened), allocate memory and disable
interrupts etc. While this saves 512 bytes or so per-thread, there
are some issues in general.

a.  Most of the future cases will be anyway using eager
FPU (because of processor features like xsaveopt, LWP, MPX etc) and
they do the allocation at the thread creation itself. Nice to have
one common mechanism as all the state save/restore code is
shared. Avoids the confusion and minimizes the subtle bugs
in the core piece involved with context-switch.

b. If a parent thread uses FPU, during fork() we allocate
the FPU state in the child and copy the state etc. Shortly after this,
during exec() we free it up, so that we can later allocate during
the first usage of FPU. So this free/allocate might be slower
for some workloads.

c. math_state_restore() is called from multiple places
and it is error pone if the caller expects interrupts to be disabled
throughout the execution of math_state_restore(). Can lead to subtle
bugs like Ubuntu bug #1265841.

Memory savings will be small anyways and the code complexity
introducing subtle bugs is not worth it. So remove
the logic of non-eager fpu mem allocation at the first usage.

Signed-off-by: Suresh Siddha sbsid...@gmail.com
Signed-off-by: David Vrabel david.vra...@citrix.com
---
 arch/x86/kernel/i387.c|   22 +-
 arch/x86/kernel/process.c |6 --
 arch/x86/kernel/traps.c   |   16 ++--
 arch/x86/kernel/xsave.c   |2 --
 4 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index e8368c6..05aeae2 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -5,6 +5,7 @@
  *  General FPU state handling cleanups
  * Gareth Hughes gar...@valinux.com, May 2000
  */
+#include linux/bootmem.h
 #include linux/module.h
 #include linux/regset.h
 #include linux/sched.h
@@ -153,8 +154,15 @@ static void init_thread_xstate(void)
  * into all processes.
  */
 
+static void __init fpu_init_boot_cpu(void)
+{
+   current-thread.fpu.state =
+   alloc_bootmem_align(xstate_size, __alignof__(struct 
xsave_struct));
+}
+
 void fpu_init(void)
 {
+   static __refdata void (*boot_func)(void) = fpu_init_boot_cpu;
unsigned long cr0;
unsigned long cr4_mask = 0;
 
@@ -189,6 +197,11 @@ void fpu_init(void)
mxcsr_feature_mask_init();
xsave_init();
eager_fpu_init();
+
+   if (boot_func) {
+   boot_func();
+   boot_func = NULL;
+   }
 }
 
 void fpu_finit(struct fpu *fpu)
@@ -219,8 +232,6 @@ EXPORT_SYMBOL_GPL(fpu_finit);
  */
 int init_fpu(struct task_struct *tsk)
 {
-   int ret;
-
if (tsk_used_math(tsk)) {
if (cpu_has_fpu  tsk == current)
unlazy_fpu(tsk);
@@ -228,13 +239,6 @@ int init_fpu(struct task_struct *tsk)
return 0;
}
 
-   /*
-* Memory allocation at the first usage of the FPU and other state.
-*/
-   ret = fpu_alloc(tsk-thread.fpu);
-   if (ret)
-   return ret;
-
fpu_finit(tsk-thread.fpu);
 
set_stopped_child_used_math(tsk);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95..cd9c190 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -128,12 +128,6 @@ void flush_thread(void)
flush_ptrace_hw_breakpoint(tsk);
memset(tsk-thread.tls_array, 0, sizeof(tsk-thread.tls_array));
drop_init_fpu(tsk);
-   /*
-* Free 

Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread H. Peter Anvin
On 03/17/2014 05:19 AM, George Dunlap wrote:
 On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin h...@zytor.com wrote:
 No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
 workaround for the legacy hypervisor versions.
 
 The interface wasn't an accident.  In the most common case you'll want
 to clear the bit anyway. In PV mode clearing it would require an extra
 trip up into the hypervisor.  So this saves one trip up into the
 hypervisor on every context switch which involves an FPU, at the
 expense of not being able to context-switch away when handling the
 trap.
 
  -George
 

The interface was a complete faceplant, because it caused failures.
You're not infinitely unconstrained since you want to play in the same
sandbox as the native architecture, and if you want to have a hope of
avoiding these kinds of failures you really need to avoid making random
improvements, certainly not without an explicit guest opt-in (the same
we do for the native CPU architecture when adding new features.)

So if this interface wasn't an accident it was active negligence and
incompetence.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread Jan Beulich
 On 17.03.14 at 17:55, H. Peter Anvin h...@zytor.com wrote:
 On 03/17/2014 05:19 AM, George Dunlap wrote:
 On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin h...@zytor.com wrote:
 No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
 workaround for the legacy hypervisor versions.
 
 The interface wasn't an accident.  In the most common case you'll want
 to clear the bit anyway. In PV mode clearing it would require an extra
 trip up into the hypervisor.  So this saves one trip up into the
 hypervisor on every context switch which involves an FPU, at the
 expense of not being able to context-switch away when handling the
 trap.
 
 The interface was a complete faceplant, because it caused failures.
 You're not infinitely unconstrained since you want to play in the same
 sandbox as the native architecture, and if you want to have a hope of
 avoiding these kinds of failures you really need to avoid making random
 improvements, certainly not without an explicit guest opt-in (the same
 we do for the native CPU architecture when adding new features.)
 
 So if this interface wasn't an accident it was active negligence and
 incompetence.

I don't think so - while it (as we now see) disallows certain things
inside the guest, back at the time when this was designed there was
no sign of any sort of allocation/scheduling being done inside the
#NM handler. And furthermore, a PV specification is by its nature
allowed to define deviations from real hardware behavior, or else it
wouldn't be needed in the first place.

Jan


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread H. Peter Anvin
On 03/17/2014 10:05 AM, Jan Beulich wrote:
 
 I don't think so - while it (as we now see) disallows certain things
 inside the guest, back at the time when this was designed there was
 no sign of any sort of allocation/scheduling being done inside the
 #NM handler. And furthermore, a PV specification is by its nature
 allowed to define deviations from real hardware behavior, or else it
 wouldn't be needed in the first place.
 

And this is exactly the sort of thing about Xen that make me want to go
on murderous rampage.  You think you can just take the current Linux
implementation at whatever time you implement the code and later come
back and say don't change that, we hard-coded it in Xen.

Calling that active negligence and incompetence is probably on the
mild side.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-17 Thread George Dunlap

On 03/17/2014 05:05 PM, Jan Beulich wrote:

On 17.03.14 at 17:55, H. Peter Anvin h...@zytor.com wrote:

On 03/17/2014 05:19 AM, George Dunlap wrote:

On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin h...@zytor.com wrote:

No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a

workaround for the legacy hypervisor versions.

The interface wasn't an accident.  In the most common case you'll want
to clear the bit anyway. In PV mode clearing it would require an extra
trip up into the hypervisor.  So this saves one trip up into the
hypervisor on every context switch which involves an FPU, at the
expense of not being able to context-switch away when handling the
trap.

The interface was a complete faceplant, because it caused failures.
You're not infinitely unconstrained since you want to play in the same
sandbox as the native architecture, and if you want to have a hope of
avoiding these kinds of failures you really need to avoid making random
improvements, certainly not without an explicit guest opt-in (the same
we do for the native CPU architecture when adding new features.)

So if this interface wasn't an accident it was active negligence and
incompetence.

I don't think so - while it (as we now see) disallows certain things
inside the guest, back at the time when this was designed there was
no sign of any sort of allocation/scheduling being done inside the
#NM handler. And furthermore, a PV specification is by its nature
allowed to define deviations from real hardware behavior, or else it
wouldn't be needed in the first place.


But it's certainly the case that deviating from the hardware in *this* 
way by default was always very likely to case the exact kind of bug 
we've seen here.  It is an interface trap that was bound to be tripped 
over (much like Intel's infamous sysret vulnerability).


Making it opt-in would have been a much better idea.  But the people who 
made that decision are long gone, and we now need to deal with the 
situation as we have it.


 -George
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread H. Peter Anvin
On 03/16/2014 09:12 PM, Sarah Newman wrote:
> 
> Unconditional eager allocation works. Can xen users count on this being 
> included and applied to the
> stable kernels?
> 

I don't know.  If we state that it is a bug fix for Xen it might be
possible, but it would be up to Greg (Cc:'d) and the rest of the stable
team.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread Sarah Newman
On 03/16/2014 08:43 PM, H. Peter Anvin wrote:
> On 03/16/2014 08:35 PM, Sarah Newman wrote:
>> Can you please review my patch first?  It's only enabled when absolutely 
>> required.
> 
> It doesn't help.  It means you're running on Xen, and you will have
> processes subjected to random SIGKILL because they happen to touch the
> FPU when the atomic pool is low.
> 
> However, there is probably a happy medium: you don't actually need eager
> FPU restore, you just need eager FPU *allocation*.  We have been
> intending to allocate the FPU state at task creation time for eagerfpu,
> and Suresh Siddha has already produced such a patch; it just needs some
> minor fixups due to an __init failure.
> 
> http://lkml.kernel.org/r/1391325599.6481.5.camel@europa
> 
> In the Xen case we could turn on eager allocation but not eager fpu.  In
> fact, it might be justified to *always* do eager allocation...

Unconditional eager allocation works. Can xen users count on this being 
included and applied to the
stable kernels?

Thanks, Sarah
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread H. Peter Anvin
On 03/16/2014 08:35 PM, Sarah Newman wrote:
> Can you please review my patch first?  It's only enabled when absolutely 
> required.

It doesn't help.  It means you're running on Xen, and you will have
processes subjected to random SIGKILL because they happen to touch the
FPU when the atomic pool is low.

However, there is probably a happy medium: you don't actually need eager
FPU restore, you just need eager FPU *allocation*.  We have been
intending to allocate the FPU state at task creation time for eagerfpu,
and Suresh Siddha has already produced such a patch; it just needs some
minor fixups due to an __init failure.

http://lkml.kernel.org/r/1391325599.6481.5.camel@europa

In the Xen case we could turn on eager allocation but not eager fpu.  In
fact, it might be justified to *always* do eager allocation...

-hpa



Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread Sarah Newman
Can you please review my patch first?  It's only enabled when absolutely 
required.

On 03/16/2014 08:33 PM, H. Peter Anvin wrote:
> No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
> workaround for the legacy hypervisor versions.
> 
> GFP_ATOMIC -> SIGKILL is definitely a NAK.
> 
> On March 16, 2014 8:13:05 PM PDT, Sarah Newman s...@prgmr.com wrote:
>> On 03/10/2014 10:15 AM, David Vrabel wrote:
>>> On 10/03/14 16:40, H. Peter Anvin wrote:
>>>> On 03/10/2014 09:17 AM, David Vrabel wrote:
>>>>> math_state_restore() is called from the #NM exception handler.  It may
>>>>> do a GFP_KERNEL allocation (in init_fpu()) which may schedule.
>>>>>
>>>>> Change this allocation to GFP_ATOMIC, but leave all the other callers
>>>>> of init_fpu() or fpu_alloc() using GFP_KERNEL.
>>>>
>>>> And what the [Finnish] do you do if GFP_ATOMIC fails?
>>>
>>> The same thing it used to do -- kill the task with SIGKILL.  I haven't
>>> changed this behaviour.
>>>
>>>> Sarah's patchset switches Xen PV to use eagerfpu unconditionally, which
>>>> removes the dependency on #NM and is the right thing to do.
>>>
>>> Ok. I'll wait for this series and not pursue this patch any further.
>>
>> Sorry, this got swallowed by my mail filter.
>>
>> I did some more testing and I think eagerfpu is going to noticeably
>> slow things down. When I ran
>> "time sysbench --num-threads=64 --test=threads run" I saw on the order
>> of 15% more time spent in
>> system mode and this seemed consistent over different runs.
>>
>> As for GFP_ATOMIC, unfortunately I don't know a sanctioned test here so
>> I rolled my own. This test
>> sequentially allocated math-using processes in the background until it
>> could not any more.  On a
>> 64MB instance, I saw 10% fewer processes allocated with GFP_ATOMIC
>> compared to GFP_KERNEL when I
>> continually allocated new processes up to OOM conditions (256 vs 228).
>> A similar test on a
>> different RFS and a kernel using GFP_NOWAIT showed pretty much no
>> difference in how many processes I
>> could allocate. This doesn't seem too bad unless there is some kind of
>> fragmentation over time which
>> would cause worse performance.
>>
>> Since performance degradation applies at all times and not just under
>> extreme conditions, I think
>> the lesser evil will actually be GFP_ATOMIC.  But it's not necessary to
>> always use GFP_ATOMIC, only
>> under certain conditions, i.e. when the Xen PV ABI forces us to.
>>
>> Patches will be supplied shortly.
> 
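
For reference, the allocation being argued over sits in init_fpu(),
reached via #NM -> math_state_restore().  A sketch of the 3.x code
(arch/x86/kernel/i387.c) with David's change applied; the gfp argument
is illustrative, since the stock fpu_alloc() takes no flags and always
allocates with GFP_KERNEL:

int init_fpu(struct task_struct *tsk)
{
        int ret;

        if (tsk_used_math(tsk)) {
                if (cpu_has_fpu && tsk == current)
                        unlazy_fpu(tsk);
                tsk->thread.fpu.last_cpu = ~0;
                return 0;
        }

        /*
         * First FPU use: allocate the save area.  GFP_ATOMIC cannot
         * sleep, so it is safe in the trap handler, but it can fail
         * under memory pressure -- and math_state_restore() answers
         * an allocation failure with do_group_exit(SIGKILL), which
         * is the objection raised above.
         */
        ret = fpu_alloc(&tsk->thread.fpu, GFP_ATOMIC);  /* illustrative flag */
        if (ret)
                return ret;

        fpu_finit(&tsk->thread.fpu);
        set_stopped_child_used_math(tsk);
        return 0;
}

Either gfp choice loses something: GFP_KERNEL can schedule from the
exception path (the original bug), while GFP_ATOMIC trades that for
spurious SIGKILLs -- which is why eager allocation at task creation
sidesteps the dilemma entirely.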
