Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-12 Thread Waiman Long

On 07/12/2016 12:16 AM, Juergen Gross wrote:

On 11/07/16 17:10, Waiman Long wrote:

On 07/06/2016 02:52 AM, Peter Zijlstra wrote:

On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:

changes from v1:
 a simpler definition of the default vcpu_is_preempted
 skip machine type check on ppc, and add config. remove dedicated macro.
 add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
 add more comments
 thanks to boqun's and Peter's suggestions.

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
   5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
   3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
   3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
   3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
   2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_owner variants also caused RCU stalls before we applied this patch set.


Paolo, could you help out with an (x86) KVM interface for this?

Waiman, could you see if you can utilize this to get rid of the
SPIN_THRESHOLD in qspinlock_paravirt?

That API is certainly useful to make the paravirt spinlock perform
better. However, I am not sure if we can completely get rid of the
SPIN_THRESHOLD at this point. It is not just KVM; the Xen code needs
to be modified as well.

This should be rather easy. The relevant information is included in the
runstate data mapped into kernel memory. I can provide a patch for Xen
if needed.


Juergen


Thanks for the offer. We will wait until Xinhui's patch comes through 
before working on the next step.


As for the elimination of SPIN_THRESHOLD, the queue head may not always 
know the CPU number of the lock holder. So I don't think we can 
eliminate it for the queue head spinning. I think we can eliminate 
the SPIN_THRESHOLD spinning for the other queue node vCPUs.


Cheers,
Longman
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-11 Thread Juergen Gross
On 11/07/16 17:10, Waiman Long wrote:
> On 07/06/2016 02:52 AM, Peter Zijlstra wrote:
>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>> changes from v1:
>>> a simpler definition of the default vcpu_is_preempted
>>> skip machine type check on ppc, and add config. remove dedicated macro.
>>> add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
>>> add more comments
>>> thanks to boqun's and Peter's suggestions.
>>>
>>> This patch set aims to fix lock holder preemption issues.
>>>
>>> test-case:
>>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>>
>>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>>   5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>>   3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>>   3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>>   3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>>   2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>>
>>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
>>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>> These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>>>
>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Waiman, could you see if you can utilize this to get rid of the
>> SPIN_THRESHOLD in qspinlock_paravirt?
> 
> That API is certainly useful to make the paravirt spinlock perform
> better. However, I am not sure if we can completely get rid of the
> SPIN_THRESHOLD at this point. It is not just KVM; the Xen code needs
> to be modified as well.

This should be rather easy. The relevant information is included in the
runstate data mapped into kernel memory. I can provide a patch for Xen
if needed.


Juergen

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-11 Thread Waiman Long

On 07/06/2016 02:52 AM, Peter Zijlstra wrote:

On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:

changes from v1:
a simpler definition of the default vcpu_is_preempted
skip machine type check on ppc, and add config. remove dedicated macro.
add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
add more comments
thanks to boqun's and Peter's suggestions.

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_owner variants also caused RCU stalls before we applied this patch set.


Paolo, could you help out with an (x86) KVM interface for this?

Waiman, could you see if you can utilize this to get rid of the
SPIN_THRESHOLD in qspinlock_paravirt?


That API is certainly useful to make the paravirt spinlock perform 
better. However, I am not sure if we can completely get rid of the 
SPIN_THRESHOLD at this point. It is not just KVM; the Xen code needs 
to be modified as well.


Cheers,
Longman

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Peter Zijlstra
On Thu, Jul 07, 2016 at 11:42:15AM +0200, Peter Zijlstra wrote:
> +static void update_steal_time_preempt(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_steal_time *st;
> +
> +	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
> +		return;
> +
> +	if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> +		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
> +		return;
> +
> +	st = &vcpu->arch.st.steal;
> +
> +	st->pad[KVM_ST_PAD_PREEMPT] = 1; /* we've stopped running */

So maybe have this be:

... = kvm_vcpu_running();

That avoids marking the vcpu preempted when we do hlt and such.

> +	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> +		st, sizeof(struct kvm_steal_time));
> +}

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Peter Zijlstra
On Thu, Jul 07, 2016 at 06:27:26PM +0800, Wanpeng Li wrote:
> In addition, I see xen's vcpu_runstate_info::state is updated during
> schedule, so I think I can do this similarly through kvm preemption
> notifier. IIUC, the xen hypervisor has a VCPUOP_get_runstate_info hypercall
> implementation, so the desired interface can be implemented if they add a
> hypercall callsite in domU. I can add a hypercall to kvm similarly.

So I suspect Xen has the page it's writing to pinned in memory, so that a
write to it is guaranteed not to fault.

Otherwise I cannot see this working.

That is part of the larger surgery required for KVM steal time to get
'fixed'. Currently that steal time stuff uses kvm_write_guest_cached()
which appears to be able to fault in guest pages.

Or I'm not reading this stuff right; which is entirely possible.

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Peter Zijlstra
On Thu, Jul 07, 2016 at 11:42:15AM +0200, Peter Zijlstra wrote:
> I suspect you want something like so; except this has holes in.
> 
> We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
> after enabling it, this means that if we get preempted in between, the
> vcpu is reported as running even though it very much is not.
> 
> Fixing that requires much larger surgery.

Note that this same hole is already a 'problem' for steal time
accounting. The thread can accrue further delays (iow steal time) after
we've called record_steal_time(). These delays will go unaccounted.

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Peter Zijlstra
On Thu, Jul 07, 2016 at 06:12:51PM +0800, Wanpeng Li wrote:

> Btw, doing this in the preemption
> notifier means that the vCPU is really preempted on the host; however,
> depending on a vmexit has different semantics, I think.

Not sure; suppose the vcpu is about to reenter, e.g. we're in
vcpu_enter_guest() but before the preempt_disable(), and the thread gets
preempted. Are we then not preempted? The vcpu might still very much be
in the running state but had to service a vmexit due to an MSR or IO port
or whatnot.



Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Wanpeng Li
2016-07-07 18:12 GMT+08:00 Wanpeng Li :
> 2016-07-07 17:42 GMT+08:00 Peter Zijlstra :
>> On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
>>> 2016-07-06 20:28 GMT+08:00 Paolo Bonzini :
>>> > Hmm, you're right.  We can use bit 0 of struct kvm_steal_time's flags to
>>> > indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
>>> > VCPU has been scheduled out since the last time the guest reset the bit.
>>> >  The guest can use an xchg to test-and-clear it.  The bit can be
>>> > accessed at any time, independent of the version field.
>>>
>>> If one vCPU is preempted, and the guest checks it several times before this
>>> vCPU is scheduled in, then the first time we can get "vCPU is
>>> preempted"; however, since the field is cleared, the second time we
>>> will get "vCPU is running".
>>>
>>> Do you mean we should call record_steal_time() in both kvm_sched_in()
>>> and kvm_sched_out() to record this field? Btw, should we keep both
>>> vcpu->preempted and kvm_steal_time's "vCPU preempted" field present
>>> simultaneously?
>>
>> I suspect you want something like so; except this has holes in.
>>
>> We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
>> after enabling it, this means that if we get preempted in between, the
>> vcpu is reported as running even though it very much is not.
>
> Paolo also pointed this out to me offline yesterday: "Please change
> pad[12] to "__u32 preempted; __u32 pad[11];" too, and remember to
> update Documentation/virtual/kvm/msr.txt!". Btw, doing this in the
> preemption notifier means that the vCPU is really preempted on the host;
> however, depending on a vmexit has different semantics, I think.

In addition, I see xen's vcpu_runstate_info::state is updated during
schedule, so I think I can do this similarly through a kvm preemption
notifier. IIUC, the xen hypervisor has a VCPUOP_get_runstate_info hypercall
implementation, so the desired interface can be implemented if they add a
hypercall callsite in domU. I can add a hypercall to kvm similarly.

Paolo, thoughts?

Regards,
Wanpeng Li

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Wanpeng Li
2016-07-07 17:42 GMT+08:00 Peter Zijlstra :
> On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
>> 2016-07-06 20:28 GMT+08:00 Paolo Bonzini :
>> > Hmm, you're right.  We can use bit 0 of struct kvm_steal_time's flags to
>> > indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
>> > VCPU has been scheduled out since the last time the guest reset the bit.
>> >  The guest can use an xchg to test-and-clear it.  The bit can be
>> > accessed at any time, independent of the version field.
>>
>> If one vCPU is preempted, and the guest checks it several times before this
>> vCPU is scheduled in, then the first time we can get "vCPU is
>> preempted"; however, since the field is cleared, the second time we
>> will get "vCPU is running".
>>
>> Do you mean we should call record_steal_time() in both kvm_sched_in()
>> and kvm_sched_out() to record this field? Btw, should we keep both
>> vcpu->preempted and kvm_steal_time's "vCPU preempted" field present
>> simultaneously?
>
> I suspect you want something like so; except this has holes in.
>
> We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
> after enabling it, this means that if we get preempted in between, the
> vcpu is reported as running even though it very much is not.

Paolo also pointed this out to me offline yesterday: "Please change
pad[12] to "__u32 preempted; __u32 pad[11];" too, and remember to
update Documentation/virtual/kvm/msr.txt!". Btw, doing this in the
preemption notifier means that the vCPU is really preempted on the host;
however, depending on a vmexit has different semantics, I think.

Regards,
Wanpeng Li

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Peter Zijlstra
On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
> 2016-07-06 20:28 GMT+08:00 Paolo Bonzini :
> > Hmm, you're right.  We can use bit 0 of struct kvm_steal_time's flags to
> > indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
> > VCPU has been scheduled out since the last time the guest reset the bit.
> >  The guest can use an xchg to test-and-clear it.  The bit can be
> > accessed at any time, independent of the version field.
> 
> If one vCPU is preempted, and the guest checks it several times before this
> vCPU is scheduled in, then the first time we can get "vCPU is
> preempted"; however, since the field is cleared, the second time we
> will get "vCPU is running".
> 
> Do you mean we should call record_steal_time() in both kvm_sched_in()
> and kvm_sched_out() to record this field? Btw, should we keep both
> vcpu->preempted and kvm_steal_time's "vCPU preempted" field present
> simultaneously?

I suspect you want something like so; except this has holes in.

We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
after enabling it, this means that if we get preempted in between, the
vcpu is reported as running even though it very much is not.

Fixing that requires much larger surgery.

---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2766723c951..117270df43b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1997,8 +1997,29 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.pv_time_enabled = false;
 }
 
+static void update_steal_time_preempt(struct kvm_vcpu *vcpu)
+{
+	struct kvm_steal_time *st;
+
+	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+		return;
+
+	if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+		return;
+
+	st = &vcpu->arch.st.steal;
+
+	st->pad[KVM_ST_PAD_PREEMPT] = 1; /* we've stopped running */
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		st, sizeof(struct kvm_steal_time));
+}
+
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
+	struct kvm_steal_time *st;
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -2006,29 +2027,34 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
-	if (vcpu->arch.st.steal.version & 1)
-		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
+	st = &vcpu->arch.st.steal;
+
+	if (st->version & 1) {
+		st->flags = KVM_ST_FLAG_PREEMPT;
+		st->version += 1;  /* first time write, random junk */
+	}
 
-	vcpu->arch.st.steal.version += 1;
+	st->version += 1;
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 
 	smp_wmb();
 
-	vcpu->arch.st.steal.steal += current->sched_info.run_delay -
+	st->steal += current->sched_info.run_delay -
 		vcpu->arch.st.last_steal;
 	vcpu->arch.st.last_steal = current->sched_info.run_delay;
+	st->pad[KVM_ST_PAD_PREEMPT] = 0; /* we're about to start running */
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 
 	smp_wmb();
 
-	vcpu->arch.st.steal.version += 1;
+	st->version += 1;
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -6693,6 +6719,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	preempt_enable();
 
+	update_steal_time_preempt(vcpu);
+
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
 	/*

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-07 Thread Wanpeng Li
2016-07-06 20:28 GMT+08:00 Paolo Bonzini :
>
>
> On 06/07/2016 14:08, Wanpeng Li wrote:
>> 2016-07-06 18:44 GMT+08:00 Paolo Bonzini :
>>>
>>>
>>> On 06/07/2016 08:52, Peter Zijlstra wrote:
 On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
> changes from v1:
>  a simpler definition of the default vcpu_is_preempted
>  skip machine type check on ppc, and add config. remove dedicated macro.
>  add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
>  add more comments
>  thanks to boqun's and Peter's suggestions.
>
> This patch set aims to fix lock holder preemption issues.
>
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
>
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>
> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_owner variants also caused RCU stalls before we applied this patch set.

 Paolo, could you help out with an (x86) KVM interface for this?
>>>
>>> If it's just for spin loops, you can check if the version field in the
>>> steal time structure has changed.
>>
>> Steal time will not be updated until just before the next vmentry, except
>> on a wrmsr to MSR_KVM_STEAL_TIME. So it can't indicate that the vCPU is
>> currently preempted, right?
>
> Hmm, you're right.  We can use bit 0 of struct kvm_steal_time's flags to
> indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
> VCPU has been scheduled out since the last time the guest reset the bit.
>  The guest can use an xchg to test-and-clear it.  The bit can be
> accessed at any time, independent of the version field.

If one vCPU is preempted, and the guest checks it several times before this
vCPU is scheduled in, then the first time we can get "vCPU is
preempted"; however, since the field is cleared, the second time we
will get "vCPU is running".

Do you mean we should call record_steal_time() in both kvm_sched_in()
and kvm_sched_out() to record this field? Btw, should we keep both
vcpu->preempted and kvm_steal_time's "vCPU preempted" field present
simultaneously?

Regards,
Wanpeng Li

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Christian Borntraeger
On 07/06/2016 10:19 AM, Peter Zijlstra wrote:
> On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote:
>> On 06/07/16 08:52, Peter Zijlstra wrote:
> 
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Xen support of this interface should be rather easy. Could you please
>> Cc: xen-devel-requ...@lists.xenproject.org in the next version?
> 
> So meta question; aren't all you virt people looking at the regular
> virtualization list? Or should we really dig out all the various
> hypervisor lists and Cc them?
> 

Some of the kvm on s390 team reads this, but I would assume that the base
s390 team does not (Martin, can you confirm?), as the main focus was z/VM
and LPAR. So maybe adding linux-s390@vger for generic things does make sense.





Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Wanpeng Li
2016-07-06 20:28 GMT+08:00 Paolo Bonzini :
>
>
> On 06/07/2016 14:08, Wanpeng Li wrote:
>> 2016-07-06 18:44 GMT+08:00 Paolo Bonzini :
>>>
>>>
>>> On 06/07/2016 08:52, Peter Zijlstra wrote:
 On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
> changes from v1:
>  a simpler definition of the default vcpu_is_preempted
>  skip machine type check on ppc, and add config. remove dedicated macro.
>  add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
>  add more comments
>  thanks to boqun's and Peter's suggestions.
>
> This patch set aims to fix lock holder preemption issues.
>
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
>
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>
> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_owner variants also caused RCU stalls before we applied this patch set.

 Paolo, could you help out with an (x86) KVM interface for this?
>>>
>>> If it's just for spin loops, you can check if the version field in the
>>> steal time structure has changed.
>>
>> Steal time will not be updated until just before the next vmentry, except
>> on a wrmsr to MSR_KVM_STEAL_TIME. So it can't indicate that the vCPU is
>> currently preempted, right?
>
> Hmm, you're right.  We can use bit 0 of struct kvm_steal_time's flags to
> indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
> VCPU has been scheduled out since the last time the guest reset the bit.
>  The guest can use an xchg to test-and-clear it.  The bit can be
> accessed at any time, independent of the version field.

I will try to implement it tomorrow, thanks for your proposal. :)

Regards,
Wanpeng Li

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Paolo Bonzini


On 06/07/2016 10:19, Peter Zijlstra wrote:
>>> Paolo, could you help out with an (x86) KVM interface for this?
>> > 
>> > Xen support of this interface should be rather easy. Could you please
>> > Cc: xen-devel-requ...@lists.xenproject.org in the next version?
> So meta question; aren't all you virt people looking at the regular
> virtualization list? Or should we really dig out all the various
> hypervisor lists and Cc them?

I at least skim the subjects.

Paolo

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Paolo Bonzini


On 06/07/2016 14:08, Wanpeng Li wrote:
> 2016-07-06 18:44 GMT+08:00 Paolo Bonzini :
>>
>>
>> On 06/07/2016 08:52, Peter Zijlstra wrote:
>>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
 changes from v1:
  a simpler definition of the default vcpu_is_preempted
  skip machine type check on ppc, and add config. remove dedicated macro.
  add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
  add more comments
  thanks to boqun's and Peter's suggestions.

 This patch set aims to fix lock holder preemption issues.

 test-case:
 perf record -a perf bench sched messaging -g 400 -p && perf report

 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

 We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
 loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
 These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>>>
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> If it's just for spin loops, you can check if the version field in the
>> steal time structure has changed.
> 
> Steal time will not be updated until just before the next vmentry, except
> on a wrmsr to MSR_KVM_STEAL_TIME. So it can't indicate that the vCPU is
> currently preempted, right?

Hmm, you're right.  We can use bit 0 of struct kvm_steal_time's flags to
indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
VCPU has been scheduled out since the last time the guest reset the bit.
 The guest can use an xchg to test-and-clear it.  The bit can be
accessed at any time, independent of the version field.

Paolo

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Wanpeng Li
2016-07-06 18:44 GMT+08:00 Paolo Bonzini :
>
>
> On 06/07/2016 08:52, Peter Zijlstra wrote:
>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>> changes from v1:
>>>  a simpler definition of the default vcpu_is_preempted
>>>  skip machine type check on ppc, and add config. remove dedicated macro.
>>>  add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
>>>  add more comments
>>>  thanks to boqun's and Peter's suggestions.
>>>
>>> This patch set aims to fix lock holder preemption issues.
>>>
>>> test-case:
>>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>>
>>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>>
>>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
>>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>> These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>>>
>>
>> Paolo, could you help out with an (x86) KVM interface for this?
>
> If it's just for spin loops, you can check if the version field in the
> steal time structure has changed.

Steal time will not be updated until just before the next vmentry, except
on a wrmsr to MSR_KVM_STEAL_TIME. So it can't indicate that the vCPU is
currently preempted, right?

Regards,
Wanpeng Li

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Peter Zijlstra
On Wed, Jul 06, 2016 at 12:44:58PM +0200, Paolo Bonzini wrote:
> > Paolo, could you help out with an (x86) KVM interface for this?
> 
> If it's just for spin loops, you can check if the version field in the
> steal time structure has changed.

That would require remembering the old value, no?

That would work with a previous interface proposal, see:

  
http://lkml.kernel.org/r/1466937715-6683-2-git-send-email-xinhui@linux.vnet.ibm.com

the vcpu_get_yield_count() thing would match that I think.

However the current proposal:

  
http://lkml.kernel.org/r/1467124991-13164-2-git-send-email-xinhui@linux.vnet.ibm.com

dropped that in favour of only vcpu_is_preempted(), which requires being
able to tell if a (remote) vcpu is currently running or not, which iirc,
isn't possible with the steal time sequence count.



Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Paolo Bonzini


On 06/07/2016 08:52, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> changes from v1:
>>  a simpler definition of the default vcpu_is_preempted
>>  skip machine type check on ppc, and add config. remove dedicated macro.
>>  add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
>>  add more comments
>>  thanks to boqun's and Peter's suggestions.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>
>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>>
> 
> Paolo, could you help out with an (x86) KVM interface for this?

If it's just for spin loops, you can check if the version field in the
steal time structure has changed.

Paolo

> Waiman, could you see if you can utilize this to get rid of the
> SPIN_THRESHOLD in qspinlock_paravirt?
> 

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread xinhui



On 2016-07-06 14:52, Peter Zijlstra wrote:

On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:

changes from v1:
a simpler definition of the default vcpu_is_preempted
skip machine type check on ppc, and add config. remove dedicated macro.
add one patch to drop the overloads of rwsem_spin_on_owner and mutex_spin_on_owner.
add more comments
thanks to boqun's and Peter's suggestions.

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_owner variants also caused rcu stalls before we applied this
patch set.
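A minimal user-space model of how such an interface plugs into an owner-spin loop. The fallback below mirrors the series' stated default (no preemption reported on architectures without support); spin_on_owner() and the demo_* helpers are simplified stand-ins for this sketch, not the kernel's actual code.

```c
#include <stdbool.h>

/*
 * Fallback when the architecture provides no implementation: never
 * report preemption, preserving the old always-spin behaviour.
 */
#ifndef vcpu_is_preempted
#define vcpu_is_preempted(cpu) ((void)(cpu), false)
#endif

/* Demo stand-in for "is the lock owner still running?" */
static int demo_spins_left;
static bool demo_owner_running(void)
{
	return --demo_spins_left > 0;
}

/*
 * Spin while the owner runs, but give up early if the owner's vCPU
 * has lost its physical CPU: spinning on a preempted owner only
 * burns cycles the hypervisor could give to the owner itself.
 */
static bool spin_on_owner(int owner_cpu, bool (*owner_running)(void))
{
	while (owner_running()) {
		if (vcpu_is_preempted(owner_cpu))
			return false; /* owner preempted: stop spinning */
		/* cpu_relax() here in kernel code */
	}
	return true; /* owner released or stopped running */
}
```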



Paolo, could you help out with an (x86) KVM interface for this?

Waiman, could you see if you can utilize this to get rid of the
SPIN_THRESHOLD in qspinlock_paravirt?


hmm. maybe something like below. The wait_node can go into pv_wait() earlier,
as soon as the prev cpu is preempted.
But for the wait_head, as qspinlock does not record the lock holder correctly
(thanks to lock stealing), the vcpu preemption check might get wrong results.

Waiman, I have used a hash table to keep the lock holder in my ppc
implementation patch. I think we could do something similar in generic code?
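A toy sketch of such a lock-address-to-holder map, in the spirit of the pv_hash() table qspinlock_paravirt.h already uses for MCS nodes. Everything here (bucket count, helper names, eviction-on-collision policy) is hypothetical; a real kernel version would need atomics, a per-bucket cmpxchg protocol, and clearing on unlock.

```c
#include <stddef.h>
#include <stdint.h>

#define HOLDER_BUCKETS 64 /* power of two, sized arbitrarily here */

struct holder_entry {
	uintptr_t lock; /* 0 means empty */
	int cpu;
};

static struct holder_entry holder_tab[HOLDER_BUCKETS];

static size_t holder_hash(uintptr_t lock)
{
	/* drop low alignment bits, then mask into the table */
	return (lock >> 4) & (HOLDER_BUCKETS - 1);
}

/* Record that @cpu now holds @lock (called on lock acquisition);
 * a colliding entry is simply overwritten in this toy version. */
static void holder_set(uintptr_t lock, int cpu)
{
	size_t i = holder_hash(lock);
	holder_tab[i].lock = lock;
	holder_tab[i].cpu = cpu;
}

/* Look up the holder, or -1 if unknown (never set, or evicted). */
static int holder_get(uintptr_t lock)
{
	size_t i = holder_hash(lock);
	return holder_tab[i].lock == lock ? holder_tab[i].cpu : -1;
}
```

With such a map, the wait_head spinner could look up the holder's cpu and feed it to vcpu_is_preempted(); a miss (-1) would simply fall back to the existing spin behaviour.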

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 74c4a86..40560e8 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -312,7 +312,8 @@ pv_wait_early(struct pv_node *prev, int loop)
if ((loop & PV_PREV_CHECK_MASK) != 0)
return false;
 
-   return READ_ONCE(prev->state) != vcpu_running;
+   return READ_ONCE(prev->state) != vcpu_running ||
+   vcpu_is_preempted(prev->cpu);
 }
 
 /*



Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Juergen Gross
On 06/07/16 10:19, Peter Zijlstra wrote:
> On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote:
>> On 06/07/16 08:52, Peter Zijlstra wrote:
> 
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Xen support of this interface should be rather easy. Could you please
>> Cc: xen-devel-requ...@lists.xenproject.org in the next version?
> 
> So meta question; aren't all you virt people looking at the regular
> virtualization list? Or should we really dig out all the various
> hypervisor lists and Cc them?

Hmm, good question. Up to now I didn't look at the virtualization list,
just changed that. :-)

Can't speak for the other virt people.


Juergen


Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Peter Zijlstra
On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote:
> On 06/07/16 08:52, Peter Zijlstra wrote:

> > Paolo, could you help out with an (x86) KVM interface for this?
> 
> Xen support of this interface should be rather easy. Could you please
> Cc: xen-devel-requ...@lists.xenproject.org in the next version?

So meta question; aren't all you virt people looking at the regular
virtualization list? Or should we really dig out all the various
hypervisor lists and Cc them?



Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Juergen Gross
On 06/07/16 08:52, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> change from v1:
>>  a simpler definition of default vcpu_is_preempted
>>  skip machine type check on ppc, and add config. remove dedicated macro.
>>  add one patch to drop overload of rwsem_spin_on_owner and
>> mutex_spin_on_owner.
>>  add more comments
>>  thanks to boqun's and Peter's suggestions.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>
>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some
>> spin loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused rcu stalls before we applied this
>> patch set.
>>
> 
> Paolo, could you help out with an (x86) KVM interface for this?

Xen support of this interface should be rather easy. Could you please
Cc: xen-devel-requ...@lists.xenproject.org in the next version?


Juergen

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Peter Zijlstra
On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
> change from v1:
>   a simpler definition of default vcpu_is_preempted
>   skip machine type check on ppc, and add config. remove dedicated macro.
>   add one patch to drop overload of rwsem_spin_on_owner and
> mutex_spin_on_owner.
>   add more comments
>   thanks to boqun's and Peter's suggestions.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_owner variants also caused rcu stalls before we applied this
> patch set.
> 

Paolo, could you help out with an (x86) KVM interface for this?

Waiman, could you see if you can utilize this to get rid of the
SPIN_THRESHOLD in qspinlock_paravirt?