Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-05 Thread Luis R. Rodriguez
On Wed, Dec 03, 2014 at 08:39:47PM +0100, Luis R. Rodriguez wrote:
> On Wed, Dec 03, 2014 at 05:37:51AM +0100, Juergen Gross wrote:
> > On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote:
> >> On Tue, Dec 02, 2014 at 11:11:18AM +, David Vrabel wrote:
> >>> On 01/12/14 22:36, Luis R. Rodriguez wrote:
> 
>  Then I do agree its a fair analogy (and find this obviously odd that how
>  widespread cond_resched() is), we just don't have an equivalent for IRQ
>  context, why not avoid the special check then and use this all the time in the
>  middle of a hypercall on the return from an interrupt (e.g., the timer
>  interrupt)?
> >>>
> >>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html
> >>
> >> OK thanks! That explains why we need some asm code but in that submission you
> >> still also had used is_preemptible_hypercall(regs) and in the new
> >> implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
> >> preempt_schedule_irq(). I believe you added the CPU variable because
> >> preempt_schedule_irq() will preempt first without any checks if it should, I'm
> >> asking why not do something like cond_resched_irq() where we check with
> >> should_resched() prior to preempting and that way we can avoid having to use
> >> the CPU variable?
> >
> > Because that could preempt at any asynchronous interrupt making the
> > no-preempt kernel fully preemptive. 
> 
> OK yeah I see. That still doesn't negate the value of using something
> like cond_resched_irq() with a should_resched() on only critical hypercalls.
> The current implementation (patch by David) forces preemption without
> checking for should_resched() so it would preempt unnecessarily at least
> once.
> 
> > How would you know you are just
> > doing a critical hypercall which should be preempted?
> 
> You would not, you're right. I was just trying to see if we could generalize
> an API for this to avoid having users having to create their own CPU variables
> but this all seems very specialized as we want to use this on the timer
> so if we do generalize a cond_resched_irq() perhaps the documentation can
> warn about this type of case or abuse.

David's patch had the check, only it was x86 based. If we use cond_resched_irq()
we can leave that aspect to be done through asm inlines, or it will use the
generic should_resched(); that should save some code in the asm implementations.

I have some patches now which generalize this. I also have more information
about how exactly this can happen, and a way to trigger it on small systems with
some hacks to emulate possible backend behaviour on larger systems. In the worst
case this can be a dangerous situation to be in. I'll send some new RFTs.
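
For illustration only, a minimal sketch of the generalization described above,
assuming the cond_resched_irq() helper proposed later in this thread together
with the per-CPU marker from David's patch; the hook name
xen_maybe_preempt_hcall() is made up for this sketch:

#include <linux/percpu.h>
#include <linux/sched.h>

/* Set while a preemptible (privcmd) hypercall is in flight on this CPU. */
DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);

/*
 * Hypothetical hook on the interrupt-return path (e.g. from the timer
 * interrupt that landed in the middle of the hypercall).  The arch asm no
 * longer needs its own check: cond_resched_irq() only preempts when
 * should_resched() says the scheduler actually wants to reschedule.
 */
void xen_maybe_preempt_hcall(void)
{
        if (__this_cpu_read(xen_in_preemptible_hcall))
                cond_resched_irq();
}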

 Luis

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-03 Thread Luis R. Rodriguez
On Wed, Dec 03, 2014 at 05:37:51AM +0100, Juergen Gross wrote:
> On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote:
>> On Tue, Dec 02, 2014 at 11:11:18AM +, David Vrabel wrote:
>>> On 01/12/14 22:36, Luis R. Rodriguez wrote:

 Then I do agree its a fair analogy (and find this obviously odd that how
 widespread cond_resched() is), we just don't have an equivalent for IRQ
 context, why not avoid the special check then and use this all the time in the
 middle of a hypercall on the return from an interrupt (e.g., the timer
 interrupt)?
>>>
>>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html
>>
>> OK thanks! That explains why we need some asm code but in that submission you
>> still also had used is_preemptible_hypercall(regs) and in the new
>> implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
>> preempt_schedule_irq(). I believe you added the CPU variable because
>> preempt_schedule_irq() will preempt first without any checks if it should, I'm
>> asking why not do something like cond_resched_irq() where we check with
>> should_resched() prior to preempting and that way we can avoid having to use
>> the CPU variable?
>
> Because that could preempt at any asynchronous interrupt making the
> no-preempt kernel fully preemptive. 

OK yeah I see. That still doesn't negate the value of using something
like cond_resched_irq() with a should_resched() on only critical hypercalls.
The current implementation (patch by David) forces preemption without
checking for should_resched() so it would preempt unnecessarily at least
once.

> How would you know you are just
> doing a critical hypercall which should be preempted?

You would not, you're right. I was just trying to see if we could generalize
an API for this to avoid having users having to create their own CPU variables
but this all seems very specialized as we want to use this on the timer
so if we do generalize a cond_resched_irq() perhaps the documentation can
warn about this type of case or abuse.

  Luis

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-02 Thread Juergen Gross

On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote:
> On Tue, Dec 02, 2014 at 11:11:18AM +, David Vrabel wrote:
>> On 01/12/14 22:36, Luis R. Rodriguez wrote:
>>>
>>> Then I do agree its a fair analogy (and find this obviously odd that how
>>> widespread cond_resched() is), we just don't have an equivalent for IRQ
>>> context, why not avoid the special check then and use this all the time in the
>>> middle of a hypercall on the return from an interrupt (e.g., the timer
>>> interrupt)?
>>
>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html
>
> OK thanks! That explains why we need some asm code but in that submission you
> still also had used is_preemptible_hypercall(regs) and in the new
> implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
> preempt_schedule_irq(). I believe you added the CPU variable because
> preempt_schedule_irq() will preempt first without any checks if it should, I'm
> asking why not do something like cond_resched_irq() where we check with
> should_resched() prior to preempting and that way we can avoid having to use
> the CPU variable?

Because that could preempt at any asynchronous interrupt making the
no-preempt kernel fully preemptive. How would you know you are just
doing a critical hypercall which should be preempted?
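
To make the distinction concrete, a rough sketch (helper names follow the
discussion, e.g. xen_preemptible_hcall_begin(); this is not a posted patch):
the privcmd path has to mark the critical hypercall explicitly, because the
interrupt-return code on its own cannot tell such a hypercall apart from any
other interrupted kernel code:

#include <linux/percpu.h>

static DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);

static inline void xen_preemptible_hcall_begin(void)
{
        __this_cpu_write(xen_in_preemptible_hcall, true);
}

static inline void xen_preemptible_hcall_end(void)
{
        __this_cpu_write(xen_in_preemptible_hcall, false);
}

/*
 * In privcmd_ioctl_hypercall(), around the potentially long-running call
 * (arguments as in the privcmd patch under discussion):
 *
 *      xen_preemptible_hcall_begin();
 *      ret = privcmd_call(hypercall.op,
 *                         hypercall.arg[0], hypercall.arg[1],
 *                         hypercall.arg[2], hypercall.arg[3],
 *                         hypercall.arg[4]);
 *      xen_preemptible_hcall_end();
 */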

Juergen


Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-02 Thread Luis R. Rodriguez
On Tue, Dec 02, 2014 at 11:11:18AM +, David Vrabel wrote:
> On 01/12/14 22:36, Luis R. Rodriguez wrote:
> > 
> > Then I do agree its a fair analogy (and find this obviously odd that how
> > widespread cond_resched() is), we just don't have an equivalent for IRQ
> > context, why not avoid the special check then and use this all the time in the
> > middle of a hypercall on the return from an interrupt (e.g., the timer
> > interrupt)?
> 
> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html

OK thanks! That explains why we need some asm code but in that submission you
still also had used is_preemptible_hypercall(regs) and in the new
implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
preempt_schedule_irq(). I believe you added the CPU variable because
preempt_schedule_irq() will preempt first without any checks if it should, I'm
asking why not do something like cond_resched_irq() where we check with
should_resched() prior to preempting and that way we can avoid having to use
the CPU variable?

  Luis

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-02 Thread David Vrabel
On 01/12/14 22:36, Luis R. Rodriguez wrote:
> 
> Then I do agree its a fair analogy (and find this obviously odd that how
> widespread cond_resched() is), we just don't have an equivalent for IRQ
> context, why not avoid the special check then and use this all the time in the
> middle of a hypercall on the return from an interrupt (e.g., the timer
> interrupt)?

http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html

David

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-01 Thread Luis R. Rodriguez
On Mon, Dec 01, 2014 at 06:16:28PM +, David Vrabel wrote:
> On 01/12/14 16:19, Luis R. Rodriguez wrote:
> > On Mon, Dec 01, 2014 at 03:54:24PM +, David Vrabel wrote:
> >> On 01/12/14 15:44, Luis R. Rodriguez wrote:
> >>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel  wrote:
>  On 01/12/14 15:05, Luis R. Rodriguez wrote:
> > On Mon, Dec 01, 2014 at 11:11:43AM +, David Vrabel wrote:
> >> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> >>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>  On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" 
> >
> > Some folks had reported that some xen hypercalls take a long time
> > to complete when issued from the userspace private ioctl mechanism,
> > this can happen for instance with some hypercalls that have many
> > sub-operations, this can happen for instance on hypercalls that use
> >> [...]
> > --- a/drivers/xen/privcmd.c
> > +++ b/drivers/xen/privcmd.c
> > @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >  hypercall.arg[0], hypercall.arg[1],
> >  hypercall.arg[2], hypercall.arg[3],
> >  hypercall.arg[4]);
> > +#ifndef CONFIG_PREEMPT
> > + schedule();
> > +#endif
> >>
> >> As Juergen points out, this does nothing.  You need to schedule while in
> >> the middle of the hypercall.
> >>
> >> Remember that Xen's hypercall preemption only preempts the hypercall to
> >> run interrupts in the guest.
> >
> > How is it ensured that when the kernel preempts on this code path on
> > CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
> 
>  Sorry, I really didn't describe this very well.
> 
>  If a hypercall needs a continuation, Xen returns to the guest with the
>  IP set to the hypercall instruction, and on the way back to the guest
>  Xen may schedule a different VCPU or it will do any upcalls (as per normal).
> 
>  The guest is free to return from the upcall to the original task
>  (continuing the hypercall) or to a different one.
> >>>
> >>> OK so that addresses what Xen will do when using continuation and
> >>> hypercall preemption, my concern here was that using
> >>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
> >>> hypercall on the return from an interrupt (e.g., the timer interrupt)
> >>> would still let the kernel preempt to tasks other than those related
> >>> to Xen.
> >>
> >> Um.  Why would that be a problem?  We do want to switch to any task the
> >> Linux scheduler thinks is best.
> > 
> > Its safe but -- it technically is doing kernel preemption, unless we want
> > to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
> > was my original concern with the use of preempt_schedule_irq() to do this.
> > I am afraid of setting precedents without being clear or wider review and
> > acceptance.
> 
> It's voluntary preemption at a well defined point. 

It's voluntarily preempting the kernel even for CONFIG_PREEMPT=n kernels...

> It's no different to a cond_resched() call.

Then I do agree its a fair analogy (and find this obviously odd that how
widespread cond_resched() is), we just don't have an equivalent for IRQ
context, why not avoid the special check then and use this all the time in the
middle of a hypercall on the return from an interrupt (e.g., the timer
interrupt)?

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bb..e60b5a1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2759,6 +2759,12 @@ static inline int signal_pending_state(long state, struct task_struct *p)
  */
 extern int _cond_resched(void);
 
+/*
+ * Voluntarily preempting the kernel even for CONFIG_PREEMPT=n kernels
+ * on very special circumstances.
+ */
+extern int cond_resched_irq(void);
+
 #define cond_resched() ({  \
__might_sleep(__FILE__, __LINE__, 0);   \
_cond_resched();\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240157c..1c4d443 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4264,6 +4264,16 @@ int __sched _cond_resched(void)
 }
 EXPORT_SYMBOL(_cond_resched);
 
+int __sched cond_resched_irq(void)
+{
+   if (should_resched()) {
+   preempt_schedule_irq();
+   return 1;
+   }
+   return 0;
+}
+EXPORT_SYMBOL_GPL(cond_resched_irq);
+
 /*
  * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
  * call schedule, and on return reacquire the lock.
  Luis

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-01 Thread David Vrabel
On 01/12/14 16:19, Luis R. Rodriguez wrote:
> On Mon, Dec 01, 2014 at 03:54:24PM +, David Vrabel wrote:
>> On 01/12/14 15:44, Luis R. Rodriguez wrote:
>>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel  wrote:
 On 01/12/14 15:05, Luis R. Rodriguez wrote:
> On Mon, Dec 01, 2014 at 11:11:43AM +, David Vrabel wrote:
>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
 On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" 
>
> Some folks had reported that some xen hypercalls take a long time
> to complete when issued from the userspace private ioctl mechanism,
> this can happen for instance with some hypercalls that have many
> sub-operations, this can happen for instance on hypercalls that use
>> [...]
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>  hypercall.arg[0], hypercall.arg[1],
>  hypercall.arg[2], hypercall.arg[3],
>  hypercall.arg[4]);
> +#ifndef CONFIG_PREEMPT
> + schedule();
> +#endif
>>
>> As Juergen points out, this does nothing.  You need to schedule while in
>> the middle of the hypercall.
>>
>> Remember that Xen's hypercall preemption only preempts the hypercall to
>> run interrupts in the guest.
>
> How is it ensured that when the kernel preempts on this code path on
> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?

 Sorry, I really didn't describe this very well.

 If a hypercall needs a continuation, Xen returns to the guest with the
 IP set to the hypercall instruction, and on the way back to the guest
 Xen may schedule a different VCPU or it will do any upcalls (as per normal).

 The guest is free to return from the upcall to the original task
 (continuing the hypercall) or to a different one.
>>>
>>> OK so that addresses what Xen will do when using continuation and
>>> hypercall preemption, my concern here was that using
>>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
>>> hypercall on the return from an interrupt (e.g., the timer interrupt)
>>> would still let the kernel preempt to tasks other than those related
>>> to Xen.
>>
>> Um.  Why would that be a problem?  We do want to switch to any task the
>> Linux scheduler thinks is best.
> 
> Its safe but -- it technically is doing kernel preemption, unless we want
> to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
> was my original concern with the use of preempt_schedule_irq() to do this.
> I am afraid of setting precedents without being clear or wider review and
> acceptance.

It's voluntary preemption at a well defined point.  It's no different to
a cond_resched() call.

Note that we're not trying to fix this for the non-voluntary-preempt
kernels.

David

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-01 Thread Luis R. Rodriguez
On Mon, Dec 01, 2014 at 03:42:33PM +0100, Juergen Gross wrote:
> On 12/01/2014 02:32 PM, Luis R. Rodriguez wrote:
>> On Mon, Dec 01, 2014 at 11:01:18AM +, David Vrabel wrote:
>>> On 28/11/14 04:49, Juergen Gross wrote:
 On 11/27/2014 07:50 PM, Andrew Cooper wrote:
>
> XenServer uses
>
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>
>
> to deal with these issues.  That patch is based on 3.10.

 Clever. :-)

>
> I can't remember whether this has been submitted upstream before (and
> there were outstanding issues), or whether it fell at an inconvenient
> time with our development cycles.

 I found

 http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html

 and nothing else.
>>>
>>> I dropped it because it copy-and-paste a bunch of otherwise generic x86
>>> assembler and looked unlikely to get an x86 maintainer ack.  If you
>>> think otherwise, feel free to pick it up and run with it.
>>
>> I was trying to run with it, but my biggest gripe with this was
>> the use of preempt_schedule_irq(), but we can review that on the
>> other thread.

So much for the other thread :)

> I think this can be handled completely inside xen_evtchn_do_upcall().
>
> xen_preemptible_hcall_begin() (possibly with another name) could take
> the pointer of a function as parameter which is used as continuation
> point after an asynchronous interrupt in the critical section.

Oh so we'd only preempt to one specific task?

> Parameter for that function would be the exception frame of the
> original interrupt to be able to continue at the interrupted position
> after e.g. calling schedule().

Interesting... if reasonable I wonder if this should be generalized.
Are there other CONFIG_PREEMPT=n use cases for such exemptions?
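
For illustration, a rough sketch of that proposal (the callback type, the
begin/end helper signatures, and the call site are assumptions, not a posted
patch):

#include <linux/percpu.h>
#include <linux/ptrace.h>

/* Continuation invoked with the exception frame of the asynchronous
 * interrupt that arrived inside the critical section. */
typedef void (*xen_hcall_cont_t)(struct pt_regs *regs);

static DEFINE_PER_CPU(xen_hcall_cont_t, xen_hcall_cont);

void xen_preemptible_hcall_begin(xen_hcall_cont_t cont)
{
        this_cpu_write(xen_hcall_cont, cont);
}

void xen_preemptible_hcall_end(void)
{
        this_cpu_write(xen_hcall_cont, NULL);
}

/*
 * At the end of xen_evtchn_do_upcall(), once pending events are handled,
 * the registered continuation could be invoked; it may call schedule()
 * and later resume at the position described by the saved frame:
 *
 *      xen_hcall_cont_t cont = this_cpu_read(xen_hcall_cont);
 *      if (cont)
 *              cont(regs);
 */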

  Luis

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-01 Thread Juergen Gross

On 12/01/2014 02:32 PM, Luis R. Rodriguez wrote:
> On Mon, Dec 01, 2014 at 11:01:18AM +, David Vrabel wrote:
>> On 28/11/14 04:49, Juergen Gross wrote:
>>> On 11/27/2014 07:50 PM, Andrew Cooper wrote:
>>>>
>>>> XenServer uses
>>>>
>>>> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>>>
>>>> to deal with these issues.  That patch is based on 3.10.
>>>
>>> Clever. :-)
>>>
>>>> I can't remember whether this has been submitted upstream before (and
>>>> there were outstanding issues), or whether it fell at an inconvenient
>>>> time with our development cycles.
>>>
>>> I found
>>>
>>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html
>>>
>>> and nothing else.
>>
>> I dropped it because it copy-and-paste a bunch of otherwise generic x86
>> assembler and looked unlikely to get an x86 maintainer ack.  If you
>> think otherwise, feel free to pick it up and run with it.
>
> I was trying to run with it, but my biggest gripe with this was
> the use of preempt_schedule_irq(), but we can review that on the
> other thread.

I think this can be handled completely inside xen_evtchn_do_upcall().

xen_preemptible_hcall_begin() (possibly with another name) could take
the pointer of a function as parameter which is used as continuation
point after an asynchronous interrupt in the critical section.
Parameter for that function would be the exception frame of the
original interrupt to be able to continue at the interrupted position
after e.g. calling schedule().


Juergen

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-01 Thread Luis R. Rodriguez
On Mon, Dec 01, 2014 at 11:01:18AM +, David Vrabel wrote:
> On 28/11/14 04:49, Juergen Gross wrote:
> > On 11/27/2014 07:50 PM, Andrew Cooper wrote:
> >> 
> >> XenServer uses
> >>
> >> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
> >>
> >>
> >> to deal with these issues.  That patch is based on 3.10.
> > 
> > Clever. :-)
> > 
> >>
> >> I can't remember whether this has been submitted upstream before (and
> >> there were outstanding issues), or whether it fell at an inconvenient
> >> time with our development cycles.
> > 
> > I found
> > 
> > http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html
> > 
> > and nothing else.
> 
> I dropped it because it copy-and-paste a bunch of otherwise generic x86
> assembler and looked unlikely to get an x86 maintainer ack.  If you
> think otherwise, feel free to pick it up and run with it.

I was trying to run with it, but my biggest gripe with this was
the use of preempt_schedule_irq(), but we can review that on the
other thread.

  Luis

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-01 Thread David Vrabel
On 28/11/14 04:49, Juergen Gross wrote:
> On 11/27/2014 07:50 PM, Andrew Cooper wrote:
>> 
>> XenServer uses
>>
>> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>
>>
>> to deal with these issues.  That patch is based on 3.10.
> 
> Clever. :-)
> 
>>
>> I can't remember whether this has been submitted upstream before (and
>> there were outstanding issues), or whether it fell at an inconvenient
>> time with our development cycles.
> 
> I found
> 
> http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html
> 
> and nothing else.

I dropped it because it copy-and-paste a bunch of otherwise generic x86
assembler and looked unlikely to get an x86 maintainer ack.  If you
think otherwise, feel free to pick it up and run with it.

David

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-11-28 Thread Luis R. Rodriguez
On Thu, Nov 27, 2014 at 06:50:31PM +, Andrew Cooper wrote:
> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> > On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>> From: "Luis R. Rodriguez" 
> >>>
> >>> Some folks had reported that some xen hypercalls take a long time
> >>> to complete when issued from the userspace private ioctl mechanism,
> >>> this can happen for instance with some hypercalls that have many
> >>> sub-operations, this can happen for instance on hypercalls that use
> >>> multi-call feature whereby Xen lets one hypercall batch out a series
> >>> of other hypercalls on the hypervisor. At times such hypercalls can
> >>> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
> >>> 120 seconds), this is a non-issue on preemptible kernels though as
> >>> the kernel may deschedule such long running tasks. Xen for instance
> >>> supports multicalls to be preempted as well, this is what Xen calls
> >>> continuation (see xen commit 42217cbc5b which introduced this [0]).
> >>> On systems without CONFIG_PREEMPT though -- a kernel with voluntary
> >>> or no preemption -- a long running hypercall will not be descheduled
> >>> until the hypercall is complete and the ioctl returns to user space.
> >>>
> >>> To help with this David had originally implemented support for use
> >>> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
> >>> solution never went upstream though and upon review to help refactor
> >>> this I've concluded that usage of preempt_schedule_irq() would be
> >>> a bit abusive of existing APIs -- for a few reasons:
> >>>
> >>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
> >>>
> >>> 1) we want to try to consider solutions that might work for other
> >>> hypervisors for this same problem, and identify if it's an issue
> >>> even present on other hypervisors or if this is a self
> >>> inflicted architectural issue caused by use of multicalls
> >>>
> >>> 2) there is no documentation or profiling of the exact hypercalls
> >>> that were causing these issues, nor do we have any context
> >>> to help evaluate this any further
> >>>
> >>> I at least checked with kvm folks and it seems hypercall preemption
> >>> is not needed there. We can survey other hypervisors...
> >>>
> >>> If 'something like preemption' is needed then CONFIG_PREEMPT
> >>> should just be enabled and encouraged, it seems we want to
> >>> encourage CONFIG_PREEMPT on xen, especially when multicalls are
> >>> used. In the meantime this tries to address a solution to help
> >>> xen on non CONFIG_PREEMPT kernels.
> >>>
> >>> One option tested and evaluated was to put private hypercalls in
> >>> process context, however this would introduce complexities such as
> >>> originating hypercalls from different contexts. Current xen
> >>> hypercall callback handlers would need to be changed per architecture,
> >>> for instance, we'd also incur the cost of switching states from
> >>> user / kernel (this cost is also present if preempt_schedule_irq()
> >>> is used). There may be other issues which could be introduced with
> >>> this strategy as well. The simplest *shared* alternative is instead
> >>> to just explicitly schedule() at the end of a private hypercall on non
> >>> preempt kernels. This forces our private hypercall call mechanism
> >>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
> >>> more context switch but keeps the private hypercall context intact.
> >>>
> >>> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
> >>> [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
> >>>
> >>> Cc: Davidlohr Bueso 
> >>> Cc: Joerg Roedel 
> >>> Cc: Borislav Petkov 
> >>> Cc: Konrad Rzeszutek Wilk 
> >>> Cc: Jan Beulich 
> >>> Cc: Juergen Gross 
> >>> Cc: Olaf Hering 
> >>> Cc: David Vrabel 
> >>> Signed-off-by: Luis R. Rodriguez 
> >>> ---
> >>>   drivers/xen/privcmd.c | 3 +++
> >>>   1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> >>> index 569a13b..e29edba 100644
> >>> --- a/drivers/xen/privcmd.c
> >>> +++ b/drivers/xen/privcmd.c
> >>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>>  hypercall.arg[0], hypercall.arg[1],
> >>>  hypercall.arg[2], hypercall.arg[3],
> >>>  hypercall.arg[4]);
> >>> +#ifndef CONFIG_PREEMPT
> >>> + schedule();
> >>> +#endif
> >>>
> >>>   return ret;
> >>>   }
> >>>
> >> Sorry, I don't think this will solve anything. You're calling schedule()
> >> right after the long running hypercall just nanoseconds before returning
> >> to the user.
> > Yeah, well that is what [1] tried as well only it tried using
> > preempt_schedule_irq() on the hypercall callback...
> >
> >> I suppose you were mislead by the int 0x82 in [0]. This is the
> >> hypercall from the kernel into the hypervisor, e.g. inside of
> >> privcmd_call().
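
To make the objection above concrete, an annotated sketch of the control flow
in question (privcmd_call() and the hypercall argument layout are as in the
quoted diff; the struct name, headers and wrapper function here are
assumptions for this sketch, not the posted patch):

#include <linux/sched.h>
#include <xen/privcmd.h>                /* struct privcmd_hypercall */
#include <asm/xen/hypercall.h>          /* privcmd_call() */

/*
 * privcmd_call() issues the hypercall into the hypervisor (the "int 0x82"
 * referred to above, on 32-bit PV); all of the long-running work happens
 * inside that call, so a schedule() placed after it can only run once the
 * hypercall is already complete.
 */
static long privcmd_hypercall_sketch(struct privcmd_hypercall *hc)
{
        long ret;

        ret = privcmd_call(hc->op, hc->arg[0], hc->arg[1],
                           hc->arg[2], hc->arg[3], hc->arg[4]);
#ifndef CONFIG_PREEMPT
        schedule();     /* reached only nanoseconds before returning to userspace */
#endif
        return ret;
}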

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-11-28 Thread Luis R. Rodriguez
On Thu, Nov 27, 2014 at 06:50:31PM +, Andrew Cooper wrote:
 On 27/11/14 18:36, Luis R. Rodriguez wrote:
  On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
  On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
  From: Luis R. Rodriguez mcg...@suse.com
 
  Some folks had reported that some xen hypercalls take a long time
  to complete when issued from the userspace private ioctl mechanism,
  this can happen for instance with some hypercalls that have many
  sub-operations, this can happen for instance on hypercalls that use
  multi-call feature whereby Xen lets one hypercall batch out a series
  of other hypercalls on the hypervisor. At times such hypercalls can
  even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
  120 seconds), this a non-issue issue on preemptible kernels though as
  the kernel may deschedule such long running tasks. Xen for instance
  supports multicalls to be preempted as well, this is what Xen calls
  continuation (see xen commit 42217cbc5b which introduced this [0]).
  On systems without CONFIG_PREEMPT though -- a kernel with voluntary
  or no preemption -- a long running hypercall will not be descheduled
  until the hypercall is complete and the ioctl returns to user space.
 
  To help with this David had originally implemented support for use
  of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
  solution never went upstream though and upon review to help refactor
  this I've concluded that usage of preempt_schedule_irq() would be
  a bit abussive of existing APIs -- for a few reasons:
 
  0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
 
  1) we want try to consider solutions that might work for other
  hypervisors for this same problem, and identify it its an issue
  even present on other hypervisors or if this is a self
  inflicted architectural issue caused by use of multicalls
 
  2) there is no documentation or profiling of the exact hypercalls
  that were causing these issues, nor do we have any context
  to help evaluate this any further
 
  I at least checked with kvm folks and it seems hypercall preemption
  is not needed there. We can survey other hypervisors...
 
  If 'something like preemption' is needed then CONFIG_PREEMPT
  should just be enabled and encouraged, it seems we want to
  encourage CONFIG_PREEMPT on xen, specially when multicalls are
  used. In the meantime this tries to address a solution to help
  xen on non CONFIG_PREEMPT kernels.
 
  One option tested and evaluated was to put private hypercalls in
  process context, however this would introduce complexities such
  originating hypercalls from different contexts. Current xen
  hypercall callback handlers would need to be changed per architecture,
  for instance, we'd also incur the cost of switching states from
  user / kernel (this cost is also present if preempt_schedule_irq()
  is used). There may be other issues which could be introduced with
  this strategy as well. The simplest *shared* alternative is instead
  to just explicitly schedule() at the end of a private hypercall on non
  preempt kernels. This forces our private hypercall call mechanism
  to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
  more context switch but keeps the private hypercall context intact.
 
  [0] 
  http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
  [1] 
  http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
 
  Cc: Davidlohr Bueso dbu...@suse.de
  Cc: Joerg Roedel jroe...@suse.de
  Cc: Borislav Petkov b...@suse.de
  Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  Cc: Jan Beulich jbeul...@suse.com
  Cc: Juergen Gross jgr...@suse.com
  Cc: Olaf Hering oher...@suse.de
  Cc: David Vrabel david.vra...@citrix.com
  Signed-off-by: Luis R. Rodriguez mcg...@suse.com
  ---
drivers/xen/privcmd.c | 3 +++
1 file changed, 3 insertions(+)
 
  diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
  index 569a13b..e29edba 100644
  --- a/drivers/xen/privcmd.c
  +++ b/drivers/xen/privcmd.c
  @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
   hypercall.arg[0], hypercall.arg[1],
   hypercall.arg[2], hypercall.arg[3],
   hypercall.arg[4]);
  +#ifndef CONFIG_PREEMPT
  + schedule();
  +#endif
 
return ret;
}
 
  Sorry, I don't think this will solve anything. You're calling schedule()
  right after the long running hypercall just nanoseconds before returning
  to the user.
  Yeah, well that is what [1] tried as well only it tried using
  preempt_schedule_irq() on the hypercall callback...
 
  I suppose you were misled by the "int 0x82" in [0]. This is the
  hypercall from the kernel into the hypervisor, e.g. inside of
  privcmd_call().
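
  As an aside, a minimal sketch of a variant that only yields when a
  reschedule is actually pending (cond_resched() rather than an
  unconditional schedule()) would look roughly like the fragment below
  in privcmd_ioctl_hypercall(). It is an illustration of the idea, not
  the submitted patch, and it is open to the same objection raised
  above: it still runs only after the hypercall has already completed.

	ret = privcmd_call(hypercall.op,
			   hypercall.arg[0], hypercall.arg[1],
			   hypercall.arg[2], hypercall.arg[3],
			   hypercall.arg[4]);

	/*
	 * Sketch only: cond_resched() checks need_resched() first, so the
	 * extra context switch only happens when another task is waiting.
	 */
	if (!IS_ENABLED(CONFIG_PREEMPT))
		cond_resched();

	return ret;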
  

Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-11-27 Thread Juergen Gross

On 11/27/2014 07:50 PM, Andrew Cooper wrote:

On 27/11/14 18:36, Luis R. Rodriguez wrote:

On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:

On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:

From: "Luis R. Rodriguez" 

Some folks had reported that some xen hypercalls take a long time
to complete when issued from the userspace private ioctl mechanism.
This can happen for instance with hypercalls that have many
sub-operations, such as hypercalls that use the multicall feature
whereby Xen lets one hypercall batch out a series of other hypercalls
on the hypervisor. At times such hypercalls can even end up triggering
the TASK_UNINTERRUPTIBLE hung task check (default 120 seconds); this is
a non-issue on preemptible kernels, as the kernel may deschedule such
long-running tasks. Xen itself supports preempting multicalls as well,
via what Xen calls a continuation (see xen commit 42217cbc5b which
introduced this [0]). On systems without CONFIG_PREEMPT though -- a
kernel with voluntary or no preemption -- a long-running hypercall will
not be descheduled until the hypercall is complete and the ioctl
returns to user space.

To help with this David had originally implemented support for use
of preempt_schedule_irq() [1] on non CONFIG_PREEMPT kernels. This
solution never went upstream though, and upon reviewing it to help
refactor it I've concluded that usage of preempt_schedule_irq() would
be a bit abusive of existing APIs -- for a few reasons:

0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels

1) we want to try to consider solutions that might work for other
 hypervisors for this same problem, and identify if it's an issue
 even present on other hypervisors or if this is a self-inflicted
 architectural issue caused by use of multicalls

2) there is no documentation or profiling of the exact hypercalls
 that were causing these issues, nor do we have any context
 to help evaluate this any further

I at least checked with kvm folks and it seems hypercall preemption
is not needed there. We can survey other hypervisors...

If 'something like preemption' is needed then CONFIG_PREEMPT
should just be enabled and encouraged; it seems we want to
encourage CONFIG_PREEMPT on xen, especially when multicalls are
used. In the meantime this tries to provide a solution to help
xen on non CONFIG_PREEMPT kernels.

One option tested and evaluated was to put private hypercalls in
process context; however, this would introduce complexities such as
originating hypercalls from different contexts. Current xen
hypercall callback handlers would need to be changed per architecture,
for instance, and we'd also incur the cost of switching states from
user / kernel (this cost is also present if preempt_schedule_irq()
is used). There may be other issues which could be introduced with
this strategy as well. The simplest *shared* alternative is instead
to just explicitly schedule() at the end of a private hypercall on non
preempt kernels. This forces our private hypercall mechanism
to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
more context switches but keeps the private hypercall context intact.

[0] 
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
[1] 
http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch

Cc: Davidlohr Bueso 
Cc: Joerg Roedel 
Cc: Borislav Petkov 
Cc: Konrad Rzeszutek Wilk 
Cc: Jan Beulich 
Cc: Juergen Gross 
Cc: Olaf Hering 
Cc: David Vrabel 
Signed-off-by: Luis R. Rodriguez 
---
   drivers/xen/privcmd.c | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 569a13b..e29edba 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
   hypercall.arg[0], hypercall.arg[1],
   hypercall.arg[2], hypercall.arg[3],
   hypercall.arg[4]);
+#ifndef CONFIG_PREEMPT
+   schedule();
+#endif

return ret;
   }


Sorry, I don't think this will solve anything. You're calling schedule()
right after the long running hypercall just nanoseconds before returning
to the user.

Yeah, well that is what [1] tried as well only it tried using
preempt_schedule_irq() on the hypercall callback...


I suppose you were misled by the "int 0x82" in [0]. This is the
hypercall from the kernel into the hypervisor, e.g. inside of
privcmd_call().

Nope, you have to consider what was done in [1], I was trying to
do something similar but less complex that didn't involve mucking
with the callbacks but also not abusing APIs.

I'm afraid we don't have much leg room.


XenServer uses


Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-11-27 Thread Andrew Cooper
On 27/11/14 18:36, Luis R. Rodriguez wrote:
> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>> From: "Luis R. Rodriguez" 
>>>
>>> Some folks had reported that some xen hypercalls take a long time
>>> to complete when issued from the userspace private ioctl mechanism.
>>> This can happen for instance with hypercalls that have many
>>> sub-operations, such as hypercalls that use the multicall feature
>>> whereby Xen lets one hypercall batch out a series of other hypercalls
>>> on the hypervisor. At times such hypercalls can even end up triggering
>>> the TASK_UNINTERRUPTIBLE hung task check (default 120 seconds); this is
>>> a non-issue on preemptible kernels, as the kernel may deschedule such
>>> long-running tasks. Xen itself supports preempting multicalls as well,
>>> via what Xen calls a continuation (see xen commit 42217cbc5b which
>>> introduced this [0]). On systems without CONFIG_PREEMPT though -- a
>>> kernel with voluntary or no preemption -- a long-running hypercall will
>>> not be descheduled until the hypercall is complete and the ioctl
>>> returns to user space.
>>>
>>> To help with this David had originally implemented support for use
>>> of preempt_schedule_irq() [1] on non CONFIG_PREEMPT kernels. This
>>> solution never went upstream though, and upon reviewing it to help
>>> refactor it I've concluded that usage of preempt_schedule_irq() would
>>> be a bit abusive of existing APIs -- for a few reasons:
>>>
>>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
>>>
>>> 1) we want to try to consider solutions that might work for other
>>> hypervisors for this same problem, and identify if it's an issue
>>> even present on other hypervisors or if this is a self-inflicted
>>> architectural issue caused by use of multicalls
>>>
>>> 2) there is no documentation or profiling of the exact hypercalls
>>> that were causing these issues, nor do we have any context
>>> to help evaluate this any further
>>>
>>> I at least checked with kvm folks and it seems hypercall preemption
>>> is not needed there. We can survey other hypervisors...
>>>
>>> If 'something like preemption' is needed then CONFIG_PREEMPT
>>> should just be enabled and encouraged; it seems we want to
>>> encourage CONFIG_PREEMPT on xen, especially when multicalls are
>>> used. In the meantime this tries to provide a solution to help
>>> xen on non CONFIG_PREEMPT kernels.
>>>
>>> One option tested and evaluated was to put private hypercalls in
>>> process context; however, this would introduce complexities such as
>>> originating hypercalls from different contexts. Current xen
>>> hypercall callback handlers would need to be changed per architecture,
>>> for instance, and we'd also incur the cost of switching states from
>>> user / kernel (this cost is also present if preempt_schedule_irq()
>>> is used). There may be other issues which could be introduced with
>>> this strategy as well. The simplest *shared* alternative is instead
>>> to just explicitly schedule() at the end of a private hypercall on non
>>> preempt kernels. This forces our private hypercall mechanism
>>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
>>> more context switches but keeps the private hypercall context intact.
>>>
>>> [0] 
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
>>> [1] 
>>> http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>>
>>> Cc: Davidlohr Bueso 
>>> Cc: Joerg Roedel 
>>> Cc: Borislav Petkov 
>>> Cc: Konrad Rzeszutek Wilk 
>>> Cc: Jan Beulich 
>>> Cc: Juergen Gross 
>>> Cc: Olaf Hering 
>>> Cc: David Vrabel 
>>> Signed-off-by: Luis R. Rodriguez 
>>> ---
>>>   drivers/xen/privcmd.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>>> index 569a13b..e29edba 100644
>>> --- a/drivers/xen/privcmd.c
>>> +++ b/drivers/xen/privcmd.c
>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>hypercall.arg[0], hypercall.arg[1],
>>>hypercall.arg[2], hypercall.arg[3],
>>>hypercall.arg[4]);
>>> +#ifndef CONFIG_PREEMPT
>>> +   schedule();
>>> +#endif
>>>
>>> return ret;
>>>   }
>>>
>> Sorry, I don't think this will solve anything. You're calling schedule()
>> right after the long running hypercall just nanoseconds before returning
>> to the user.
> Yeah, well that is what [1] tried as well only it tried using
> preempt_schedule_irq() on the hypercall callback...
>
>> I suppose you were misled by the "int 0x82" in [0]. This is the
>> hypercall from the kernel into the hypervisor, e.g. inside of
>> privcmd_call().
> Nope, you have to consider what was done in [1], I was trying 
