Hi Tejun,
On 4 May 2017 at 19:43, Tejun Heo wrote:
> Hello,
>
> On Thu, May 04, 2017 at 10:19:46AM +0200, Vincent Guittot wrote:
>> > schbench inside a cgroup and have some base load, it is actually
>> > expected to show worse latency. You need to give higher weight to the
>> > cgroup matching
Hello, Vincent.
On Thu, May 04, 2017 at 09:02:39PM +0200, Vincent Guittot wrote:
> In the trace I have uploaded, you will see that regressions happen
> even though there are no other runnable threads around, so it's not a
> matter of background activity disturbing schbench
Understood, yeah, I'm
Hello,
On Thu, May 04, 2017 at 10:19:46AM +0200, Vincent Guittot wrote:
> > schbench inside a cgroup and have some base load, it is actually
> > expected to show worse latency. You need to give higher weight to the
> > cgroup matching the number of active threads (to be accurate, scaled
> > by
On 3 May 2017 at 23:49, Tejun Heo wrote:
> On Wed, May 03, 2017 at 03:09:38PM +0200, Peter Zijlstra wrote:
>> On Wed, May 03, 2017 at 12:37:37PM +0200, Vincent Guittot wrote:
>> > On 3 May 2017 at 11:37, Peter Zijlstra wrote:
>>
>> > > Of course, it could be I overlooked something, in which
On Wed, May 03, 2017 at 03:09:38PM +0200, Peter Zijlstra wrote:
> On Wed, May 03, 2017 at 12:37:37PM +0200, Vincent Guittot wrote:
> > On 3 May 2017 at 11:37, Peter Zijlstra wrote:
>
> > > Of course, it could be I overlooked something, in which case, please
> > > tell :-)
> >
> > That's mainly
On Wed, May 03, 2017 at 12:37:37PM +0200, Vincent Guittot wrote:
> On 3 May 2017 at 11:37, Peter Zijlstra wrote:
> > Of course, it could be I overlooked something, in which case, please
> > tell :-)
>
> That's mainly based on the regression I see on my platform. I haven't
> found the root cause
On 3 May 2017 at 11:37, Peter Zijlstra wrote:
>
> On Wed, May 03, 2017 at 09:34:51AM +0200, Vincent Guittot wrote:
>
> > We use load_avg for calculating a stable share and we want to use it
> > more and more. So breaking it because it's easier doesn't seem to be
> > the right way to go, IMHO
>
>
On Wed, May 03, 2017 at 09:34:51AM +0200, Vincent Guittot wrote:
> We use load_avg for calculating a stable share and we want to use it
> more and more. So breaking it because it's easier doesn't seem to be
> the right way to go, IMHO
So afaict we calculate group se->load.weight (aka shares,
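The truncated snippet above refers to the group-entity weight ("shares") calculation. As a rough illustration of the idea being discussed, here is a Python sketch of the approximate formula: a group se's weight is the task group's shares scaled by this cfs_rq's fraction of the group's total load. The function and constant names are made up for the example; the kernel's fixed-point C code clamps and updates these values differently.

```python
# Illustrative sketch only, not kernel code: a group scheduling entity's
# weight ("shares") is the task group's configured shares scaled by this
# cfs_rq's share of the group's total load.
MIN_SHARES = 2  # hypothetical lower clamp, mirroring the kernel's minimum

def group_se_weight(tg_shares, cfs_rq_load, tg_load):
    if tg_load == 0:
        return tg_shares  # no load anywhere: the group keeps full shares
    return max(MIN_SHARES, tg_shares * cfs_rq_load // tg_load)

# A cfs_rq carrying half of the group's load gets half the shares:
print(group_se_weight(1024, 512, 1024))  # 512
```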
On 3 May 2017 at 09:25, Vincent Guittot wrote:
> On 2 May 2017 at 22:56, Tejun Heo wrote:
>> Hello, Vincent.
>>
>> On Tue, May 02, 2017 at 08:56:52AM +0200, Vincent Guittot wrote:
>>> On 28 April 2017 at 18:14, Tejun Heo wrote:
>>> > I'll follow up in the other subthread but there really is
Hi Tejun,
On 2 May 2017 at 23:50, Tejun Heo wrote:
> Hello,
>
> On Tue, May 02, 2017 at 09:18:53AM +0200, Vincent Guittot wrote:
>> > dbg_odd: odd: dst=28 idle=2 brk=32 lbtgt=0-31 type=2
>> > dbg_odd_dump: A: grp=1,17 w=2 avg=7.247 grp=8.337 sum=8.337 pertask=2.779
>> > dbg_odd_dump: A:
On 2 May 2017 at 22:56, Tejun Heo wrote:
> Hello, Vincent.
>
> On Tue, May 02, 2017 at 08:56:52AM +0200, Vincent Guittot wrote:
>> On 28 April 2017 at 18:14, Tejun Heo wrote:
>> > I'll follow up in the other subthread but there really is fundamental
>> > difference in how we calculate
Hello, Vincent.
On Tue, May 02, 2017 at 03:26:12PM +0200, Vincent Guittot wrote:
> > IMHO, we should better improve load balance selection. I'm going to
> > add smarter group selection in load_balance. That's something we
> > should have already done but it was difficult without load/util_avg
> >
Hello,
On Mon, May 01, 2017 at 05:56:13PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 28, 2017 at 04:33:47PM -0400, Tejun Heo wrote:
> > I'm attaching the debug patch. With your change (avg instead of
> > runnable_avg), the following trace shows why it's wrong.
>
> Ah, OK. So you really want
Hello,
On Tue, May 02, 2017 at 09:18:53AM +0200, Vincent Guittot wrote:
> > dbg_odd: odd: dst=28 idle=2 brk=32 lbtgt=0-31 type=2
> > dbg_odd_dump: A: grp=1,17 w=2 avg=7.247 grp=8.337 sum=8.337 pertask=2.779
> > dbg_odd_dump: A: gcap=1.150 gutil=1.095 run=3 idle=0 gwt=2 type=2 nocap=1
> >
Hello, Vincent.
On Tue, May 02, 2017 at 08:56:52AM +0200, Vincent Guittot wrote:
> On 28 April 2017 at 18:14, Tejun Heo wrote:
> > I'll follow up in the other subthread but there really is fundamental
> > difference in how we calculate runnable_avg w/ and w/o cgroups.
> > Independent of whether
Hi Tejun,
On Tuesday 02 May 2017 at 09:18:53 (+0200), Vincent Guittot wrote:
> On 28 April 2017 at 22:33, Tejun Heo wrote:
> > Hello, Vincent.
> >
> > On Thu, Apr 27, 2017 at 10:29:10AM +0200, Vincent Guittot wrote:
> >> On 27 April 2017 at 00:52, Tejun Heo wrote:
> >> > Hello,
> >> >
> >> >
On 28 April 2017 at 22:33, Tejun Heo wrote:
> Hello, Vincent.
>
> On Thu, Apr 27, 2017 at 10:29:10AM +0200, Vincent Guittot wrote:
>> On 27 April 2017 at 00:52, Tejun Heo wrote:
>> > Hello,
>> >
>> > On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
>> >> On 24 April 2017 at
On 28 April 2017 at 18:14, Tejun Heo wrote:
> Hello, Vincent.
>
>>
>> The only point of runnable_load_avg is to be null when a cfs_rq is
>> idle (whereas load_avg is not), not to be higher than load_avg. The
>> root cause is that load_balance only looks at "load" but not number of
>> task
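The distinction Vincent draws here can be shown with a toy model (illustrative Python, not kernel code): runnable_load_avg only sums the tasks currently on the runqueue, so it drops to zero when the cfs_rq goes idle, while load_avg also retains contributions from blocked tasks.

```python
# Toy model of the two signals discussed above. Each task is a
# (load, runnable) pair attached to the cfs_rq; names are illustrative.
def load_avg(tasks):
    # keeps contributions from blocked tasks too
    return sum(load for load, _ in tasks)

def runnable_load_avg(tasks):
    # only counts tasks currently runnable on the queue
    return sum(load for load, runnable in tasks if runnable)

tasks = [(512, False), (512, False)]  # both tasks just blocked
print(load_avg(tasks))           # 1024: the cfs_rq still looks loaded
print(runnable_load_avg(tasks))  # 0: the cfs_rq is actually idle
```

This is why a load balancer looking only at load_avg can pick a queue that is in fact idle over one that has runnable work.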
On Fri, Apr 28, 2017 at 04:33:47PM -0400, Tejun Heo wrote:
> I'm attaching the debug patch. With your change (avg instead of
> runnable_avg), the following trace shows why it's wrong.
Ah, OK. So you really want runnable_avg (and I understand why), which is
rather unfortunate, since we have
Here's the debug patch.
The debug condition triggers when the load balancer picks a group without
more than one schbench thread on a CPU over one with.
/sys/module/fair/parameters/dbg_odd_cnt: resettable counter
/sys/module/fair/parameters/dbg_odd_nth: dump group states on Nth
Hello, Vincent.
On Thu, Apr 27, 2017 at 10:29:10AM +0200, Vincent Guittot wrote:
> On 27 April 2017 at 00:52, Tejun Heo wrote:
> > Hello,
> >
> > On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
> >> On 24 April 2017 at 22:14, Tejun Heo wrote:
> >> Can the problem be on the load
Hello, Vincent.
On Thu, Apr 27, 2017 at 10:28:01AM +0200, Vincent Guittot wrote:
> On 27 April 2017 at 02:30, Tejun Heo wrote:
> > Hello, Vincent.
> >
> > On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
> >> > This is from the follow-up patch. I was confused. Because we don't
On 27 April 2017 at 00:52, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
>> On 24 April 2017 at 22:14, Tejun Heo wrote:
>> Can the problem be on the load balance side instead, and more
>> precisely in the wakeup path?
>> After looking at the
On 27 April 2017 at 02:30, Tejun Heo wrote:
> Hello, Vincent.
>
> On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
>> > This is from the follow-up patch. I was confused. Because we don't
>> > propagate decays, we still should decay the runnable_load_avg;
>> > otherwise, we end
Hello, Vincent.
On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
> > This is from the follow-up patch. I was confused. Because we don't
> > propagate decays, we still should decay the runnable_load_avg;
> > otherwise, we end up accumulating errors in the counter. I'll drop
> >
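For context, the decay discussed in this exchange is PELT's geometric decay: each roughly 1ms period's contribution is scaled by a constant y chosen so that y^32 = 1/2, i.e. a contribution halves after about 32ms. A quick numeric check of that property (illustrative floating-point Python; the kernel uses precomputed fixed-point tables):

```python
# PELT-style geometric decay: y is chosen so that y**32 == 0.5,
# meaning a load contribution halves after 32 periods (~32ms).
y = 0.5 ** (1 / 32)

def decay_load(load, periods):
    # decay a contribution that is `periods` periods old
    return load * y ** periods

print(round(decay_load(1024, 32)))  # 512: halved after 32 periods
```

Skipping this decay for runnable_load_avg is what would let stale contributions accumulate as the error Tejun describes.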
Hello,
On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
> On 24 April 2017 at 22:14, Tejun Heo wrote:
> Can the problem be on the load balance side instead, and more
> precisely in the wakeup path?
> After looking at the trace, it seems that task placement happens at
> wake up
On 24 April 2017 at 22:14, Tejun Heo wrote:
> We noticed that with cgroup CPU controller in use, the scheduling
>
> Note the drastic increase in p99 scheduling latency. After
> investigation, it turned out that the update_sd_lb_stats(), which is
> used by load_balance() to pick the most loaded
On 25 April 2017 at 23:08, Tejun Heo wrote:
> On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
>> > I have run a quick test with your patches and schbench on my platform.
>> > I haven't been able to reproduce your regression but my platform is
>> > quite different from yours (only 8
On 04/25/2017 04:49 PM, Tejun Heo wrote:
On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
Will try that too. I can't see why HT would change it because I see
single CPU queues misevaluated. Just in case, you need to tune the
test params so that it doesn't load the machine too much
On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
> > I have run a quick test with your patches and schbench on my platform.
> > I haven't been able to reproduce your regression but my platform is
> > quite different from yours (only 8 cores without SMT)
> > But most importantly, the
On Tue, Apr 25, 2017 at 11:49:41AM -0700, Tejun Heo wrote:
> Will try that too. I can't see why HT would change it because I see
> single CPU queues misevaluated. Just in case, you need to tune the
> test params so that it doesn't load the machine too much and that
> there are some non-CPU
Hello,
On Tue, Apr 25, 2017 at 02:59:18PM +0200, Vincent Guittot wrote:
> >> So you are changing the purpose of propagate_entity_load_avg which
> >> aims to propagate load_avg/util_avg changes only when a task migrates
> >> and you also want to propagate the enqueue/dequeue in the parent
> >>
On 25 April 2017 at 11:05, Vincent Guittot wrote:
> On 25 April 2017 at 10:46, Vincent Guittot wrote:
>> On 24 April 2017 at 22:14, Tejun Heo wrote:
>>> We noticed that with cgroup CPU controller in use, the scheduling
>>> latency gets wonky regardless of nesting level or weight
>>>
On 25 April 2017 at 10:46, Vincent Guittot wrote:
> On 24 April 2017 at 22:14, Tejun Heo wrote:
>> We noticed that with cgroup CPU controller in use, the scheduling
>> latency gets wonky regardless of nesting level or weight
>> configuration. This is easily reproducible with Chris Mason's
>>
On 24 April 2017 at 22:14, Tejun Heo wrote:
> We noticed that with cgroup CPU controller in use, the scheduling
> latency gets wonky regardless of nesting level or weight
> configuration. This is easily reproducible with Chris Mason's
> schbench[1].
>
> All tests are run on a single socket, 16
We noticed that with cgroup CPU controller in use, the scheduling
latency gets wonky regardless of nesting level or weight
configuration. This is easily reproducible with Chris Mason's
schbench[1].
All tests are run on a single socket, 16 cores, 32 threads machine.
While the machine is mostly