> Thanks to the feedback from Oleg, Peter, Mike, and Frederic,
> I seem to have a patch series that manages to do times()
> locklessly, and apparently correctly.
>
> Oleg points out that monotonicity alone is not enough of a
> guarantee, but that should probably be attacked separately.
On Tue, 2013-07-30 at 13:18 +0530, Srikar Dronamraju wrote:
> Here is an approach that looks to consolidate workloads across nodes.
> This results in much improved performance. Again I would assume this work
> is complementary to Mel's work with numa faulting.
>
> Here are the advantages of this
On Wed, 2013-06-26 at 15:52 +0300, Gleb Natapov wrote:
> On Wed, Jun 26, 2013 at 01:37:45PM +0200, Andrew Jones wrote:
> > On Wed, Jun 26, 2013 at 02:15:26PM +0530, Raghavendra K T wrote:
> > > On 06/25/2013 08:20 PM, Andrew Theurer wrote:
On Sun, 2013-06-02 at 00:51 +0530, Raghavendra K T wrote:
> This series replaces the existing paravirtualized spinlock mechanism
> with a paravirtualized ticketlock mechanism. The series provides
> implementation for both Xen and KVM.
>
> Changes in V9:
> - Changed spin_threshold to 32k to avoid excess
> 14111.5600   754.4525    884.9051    24.4723   -93.72922
>  2481.6270    71.2665   2383.5700   333.2435    -3.95132
>  1510.2483    31.8634   1477.7358    50.5126    -2.15279
>  1029.4875    16.9166   1075.9225    13.9911     4.51050
> +-----------+----------+-----------+----------+-----------+
and share the patches I tried.
-Andrew Theurer
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
? __audit_syscall_exit+0x246/0x2f0
> [ 2144.673541] [] ? __audit_syscall_entry+0x8c/0xf0
> [ 2144.673543] [] system_call_fastpath+0x16/0x1b
This is on a 40 core / 80 thread Westmere-EX with 16 VMs, each VM having
20 vCPUs (so 4x over-commit). All VMs run dbench in tmpfs, which is a
pretty good test for spinlock preempt problems. I had PLE enabled for
the test.
When you re-base your patches I will try it again.
Thanks,
-Andrew Theurer
; schedule();
> > > > >
> > > > > return yielded;
> > > > >
> > > >
> > > > Acked-by: Andrew Jones
> > > >
> > >
> > > Thank you Drew.
> > >
> > > Marcelo Gle
the latest throttled yield_to() patch (the one Vinod tested).
Signed-off-by: Andrew Theurer haban...@linux.vnet.ibm.com
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ecc5543..61d12ea 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -192,6 +192,7 @@ struct
On Tue, 2012-11-27 at 16:00 +0530, Raghavendra K T wrote:
> On 11/26/2012 07:05 PM, Andrew Jones wrote:
> > On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:
> >> From: Peter Zijlstra
> >>
> >> In case of undercommitted scenarios, especially in large guests
> >> yield_to overhead is
On Wed, 2012-11-21 at 11:52 +, Mel Gorman wrote:
> On Tue, Nov 20, 2012 at 07:54:13PM -0600, Andrew Theurer wrote:
> > On Tue, 2012-11-20 at 18:56 +0100, Ingo Molnar wrote:
> > > * Ingo Molnar wrote:
> > >
> > > > ( The 4x JVM regression is still an open bug I think - I'll re-check
On Tue, 2012-11-20 at 20:10 -0800, Hugh Dickins wrote:
> On Tue, 20 Nov 2012, Rik van Riel wrote:
> > On 11/20/2012 08:54 PM, Andrew Theurer wrote:
> >
> > > I can confirm single JVM JBB is working well for me. I see a 30%
> > > improvement over autoNUMA. What I can't make sense of is some perf stats
The percentages for
autoNUMA still seem a little high (but at least lower then numa/core).
I need to take a manually pinned measurement to compare.
> Those of you who would like to test all the latest patches are
> welcome to pick up latest bits at tip:master:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
I've been running on numa/core, but I'll switch to master and try these
again.
Thanks,
-Andrew Theurer
>
> [ In AutoNUMA speak, this patch deals with the effective sampling
> rate of the 'hinting page fault'. AutoNUMA's scanning is
space, but mark
only every Nth page. N is adjusted each period to target a rolling
average of X faults per MB per execution time period. This per task N
would also be an interesting value to rank memory access frequency among
tasks and help prioritize scheduling decisions.
-Andrew Theurer
On Wed, 2012-11-14 at 18:28 +, Mel Gorman wrote:
On Wed, Nov 14, 2012 at 11:24:42AM -0600, Andrew Theurer wrote:
From: Peter Zijlstra a.p.zijls...@chello.nl
Note: The scan period is much larger than it was in the original patch.
The reason was because the system CPU usage
> Check system load and handle different commit cases accordingly
>
> Please let me know your comments and suggestions.
>
> Link for V1:
> https://lkml.org/lkml/2012/9/21/168
>
> kernel/sched/core.c | 25 +++--
> virt/kvm/kvm_main.c | 56 ++--
2 files changed, 65 insertions(+), 16 deletions(-)
-Andrew Theurer
On Fri, 2012-10-19 at 14:00 +0530, Raghavendra K T wrote:
> On 10/15/2012 08:04 PM, Andrew Theurer wrote:
> > On Mon, 2012-10-15 at 17:40 +0530, Raghavendra K T wrote:
> >> On 10/11/2012 01:06 AM, Andrew Theurer wrote:
> >>> On Wed, 2012-10-10 at 23:24 +0530, Raghaven
On Mon, 2012-10-15 at 17:40 +0530, Raghavendra K T wrote:
> On 10/11/2012 01:06 AM, Andrew Theurer wrote:
> > On Wed, 2012-10-10 at 23:24 +0530, Raghavendra K T wrote:
> >> On 10/10/2012 08:29 AM, Andrew Theurer wrote:
> >>> On Wed, 2012-10-10 at 00:21 +0530, Ragha
On Wed, 2012-10-10 at 23:13 +0530, Raghavendra K T wrote:
> On 10/10/2012 07:54 PM, Andrew Theurer wrote:
> > I ran 'perf sched map' on the dbench workload for medium and large VMs,
> > and I thought I would share some of the results. I think it helps to
> > visualize what
I ran 'perf sched map' on the dbench workload for medium and large VMs,
and I thought I would share some of the results. I think it helps to
visualize what's going on regarding the yielding.
These files are png bitmaps, generated from processing output from 'perf
sched map' (and perf data
On Wed, 2012-10-10 at 23:24 +0530, Raghavendra K T wrote:
On 10/10/2012 08:29 AM, Andrew Theurer wrote:
On Wed, 2012-10-10 at 00:21 +0530, Raghavendra K T wrote:
* Avi Kivity a...@redhat.com [2012-10-04 17:00:28]:
On 10/04/2012 03:07 PM, Peter Zijlstra wrote:
On Thu, 2012-10-04 at 14:41
On Wed, 2012-10-10 at 00:21 +0530, Raghavendra K T wrote:
> * Avi Kivity [2012-10-04 17:00:28]:
>
> > On 10/04/2012 03:07 PM, Peter Zijlstra wrote:
> > > On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
> > >>
> > >> Again the numbers are ridiculously high for arch_local_irq_restore.
> > >>
On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
On 10/04/2012 12:49 PM, Raghavendra K T wrote:
On 10/03/2012 10:35 PM, Avi Kivity wrote:
On 10/03/2012 02:22 PM, Raghavendra K T wrote:
So I think it's worth trying again with ple_window of 2-4.
Hi Avi,
I ran different
On Fri, 2012-09-28 at 11:08 +0530, Raghavendra K T wrote:
On 09/27/2012 05:33 PM, Avi Kivity wrote:
On 09/27/2012 01:23 PM, Raghavendra K T wrote:
This gives us a good case for tracking preemption on a per-vm basis. As
long as we aren't preempted, we can keep the PLE window high, and
and then others.
Or were you referring to something else?
So looking back at threads/ discussions so far, I am trying to
summarize, the discussions so far. I feel, at least here are the few
potential candidates to go in:
1) Avoiding double runqueue lock overhead (Andrew Theurer
On Sun, 2012-09-16 at 11:55 +0300, Avi Kivity wrote:
On 09/14/2012 12:30 AM, Andrew Theurer wrote:
The concern I have is that even though we have gone through changes to
help reduce the candidate vcpus we yield to, we still have a very poor
idea of which vcpu really needs to run
On Thu, 2012-09-13 at 17:18 +0530, Raghavendra K T wrote:
* Andrew Theurer haban...@linux.vnet.ibm.com [2012-09-11 13:27:41]:
On Tue, 2012-09-11 at 11:38 +0530, Raghavendra K T wrote:
On 09/11/2012 01:42 AM, Andrew Theurer wrote:
On Mon, 2012-09-10 at 19:12 +0200, Peter Zijlstra wrote
On Tue, 2012-09-11 at 11:38 +0530, Raghavendra K T wrote:
> On 09/11/2012 01:42 AM, Andrew Theurer wrote:
> > On Mon, 2012-09-10 at 19:12 +0200, Peter Zijlstra wrote:
> >> On Mon, 2012-09-10 at 22:26 +0530, Srikar Dronamraju wrote:
> >>>> +static bool __yield_to_c
On Mon, 2012-09-10 at 19:12 +0200, Peter Zijlstra wrote:
> On Mon, 2012-09-10 at 22:26 +0530, Srikar Dronamraju wrote:
> > > +static bool __yield_to_candidate(struct task_struct *curr, struct
> > > task_struct *p)
> > > +{
> > > + if (!curr->sched_class->yield_to_task)
> > > +
On Sat, 2012-09-08 at 14:13 +0530, Srikar Dronamraju wrote:
Signed-off-by: Andrew Theurer haban...@linux.vnet.ibm.com
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbf1fd0..c767915 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4844,6 +4844,9
On Mon, 2012-09-10 at 19:12 +0200, Peter Zijlstra wrote:
On Mon, 2012-09-10 at 22:26 +0530, Srikar Dronamraju wrote:
+static bool __yield_to_candidate(struct task_struct *curr, struct
task_struct *p)
+{
+ if (!curr->sched_class->yield_to_task)
+ return false;
+
. So, my question is: given a runqueue, what's the best
way to check if that corresponding phys cpu is not in guest mode?
Here's the changes so far (schedstat changes not included here):
Signed-off-by: Andrew Theurer
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbf1fd0..f8eff
On Fri, 2012-09-07 at 23:36 +0530, Raghavendra K T wrote:
CCing PeterZ also.
On 09/07/2012 06:41 PM, Andrew Theurer wrote:
I have noticed recently that PLE/yield_to() is still not that scalable
for really large guests, sometimes even with no CPU over-commit. I have
a small change
On Tue, 2012-07-10 at 17:24 +0530, Raghavendra K T wrote:
On 07/10/2012 03:17 AM, Andrew Theurer wrote:
On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering
lem will improve the
ebizzy score. That workload is so erratic for me, that I do not trust
the results at all. I have however seen consistent improvements in
disabling PLE for a http guest workload and a very high IOPS guest
workload, both with much time spent in host in the double runqueue lock
for yield_to(), so that's why I still gravitate toward that issue.
-Andrew Theurer
n level which
has task_hot_time=0, up to a shared cache by default. Anything above
that could require a numactl like preference from userspace.
-Andrew Theurer