Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Stephane Eranian
On Fri, Apr 20, 2018 at 9:51 AM Vince Weaver 
wrote:

> On Fri, 20 Apr 2018, Vince Weaver wrote:

> > > AFAICT it works on Power and possibly ARM.
> >
> > at least some ARMs are a bit more honest about it than x86
> >
> > ivybridge:
> >   Performance counter stats for '/bin/ls':
> >   1,368,162  instructions
> >   1,368,162  instructions:I
> >
> > pi2/ARM cortex-A7
> >   Performance counter stats for '/bin/ls':
> >   1,910,083  instructions
> > instructions:I
> >
> > I'd fire up my Power8 machine to see but not sure it's worth the hassle
> > and/or having to get out the ear protection.

> I did power up the Power8 machine in the end:

> power8:
>  perf stat -e cycles,cycles:I sleep 5
>  Performance counter stats for 'sleep 5':
>  14,271,273  cycles
>  14,271,273  cycles:I

> ???

> But then if I try again on power8

>  perf stat -a -e cycles,cycles:I sleep 5
>   Performance counter stats for 'system wide':
>  1,238,772,322,327  cycles
>  1,238,674,771,713  cycles:I

> there is a difference.

> But then on ivybridge

>  perf stat -a -e cycles,cycles:I sleep 5
>  Performance counter stats for 'system wide':
>  589,598,104  cycles
>  589,537,190  cycles:I

This may be noise.

The way the flag is named leads me to believe its goal is to not count when
executing in the context of the idle task (pid 0 on each CPU).
However, it does not seem to be implemented that way. If you were to
implement this, then in system wide mode you'd have to check
the incoming task on ctxsw, very much like we do in cgroup monitoring. So
it would not be totally free. One can argue, in sampling mode
you can eliminate the samples coming from PID=0 in the tool. But there
would be nothing to cover counting mode.

Interestingly, there is also code in perf tool to exclude known idle
routines from reporting. But this is targeted to only some routines that the
idle task may end up executing. so it is not quite the same.


> So maybe exclude_idle does do something on x86?  Or am I completely
> misunderstanding what the flag is supposed to be indicating?

> Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Stephane Eranian
On Fri, Apr 20, 2018 at 9:51 AM Vince Weaver 
wrote:

> On Fri, 20 Apr 2018, Vince Weaver wrote:

> > > AFAICT it works on Power and possibly ARM.
> >
> > at least some ARMs are a bit more honest about it than x86
> >
> > ivybridge:
> >   Performance counter stats for '/bin/ls':
> >   1,368,162  instructions
> >   1,368,162  instructions:I
> >
> > pi2/ARM cortex-A7
> >   Performance counter stats for '/bin/ls':
> >   1,910,083  instructions
> > instructions:I
> >
> > I'd fire up my Power8 machine to see but not sure it's worth the hassle
> > and/or having to get out the ear protection.

> I did power up the Power8 machine in the end:

> power8:
>  perf stat -e cycles,cycles:I sleep 5
>  Performance counter stats for 'sleep 5':
>  14,271,273  cycles
>  14,271,273  cycles:I

> ???

> But then if I try again on power8

>  perf stat -a -e cycles,cycles:I sleep 5
>   Performance counter stats for 'system wide':
>  1,238,772,322,327  cycles
>  1,238,674,771,713  cycles:I

> there is a difference.

> But then on ivybridge

>  perf stat -a -e cycles,cycles:I sleep 5
>  Performance counter stats for 'system wide':
>  589,598,104  cycles
>  589,537,190  cycles:I

This may be noise.

The way the flag is named leads me to believe its goal is to not count when
executing in the context of the idle task (pid 0 on each CPU).
However, it does not seem to be implemented that way. If you were to
implement this, then in system wide mode you'd have to check
the incoming task on ctxsw, very much like we do in cgroup monitoring. So
it would not be totally free. One can argue, in sampling mode
you can eliminate the samples coming from PID=0 in the tool. But there
would be nothing to cover counting mode.

Interestingly, there is also code in perf tool to exclude known idle
routines from reporting. But this is targeted to only some routines that the
idle task may end up executing. so it is not quite the same.


> So maybe exclude_idle does do something on x86?  Or am I completely
> misunderstanding what the flag is supposed to be indicating?

> Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Vince Weaver
On Fri, 20 Apr 2018, Vince Weaver wrote:

> > AFAICT it works on Power and possibly ARM.
> 
> at least some ARMs are a bit more honest about it than x86
> 
> ivybridge:
>   Performance counter stats for '/bin/ls':
>   1,368,162  instructions
>   1,368,162  instructions:I
> 
> pi2/ARM cortex-A7
>   Performance counter stats for '/bin/ls':
>   1,910,083  instructions
> instructions:I
> 
> I'd fire up my Power8 machine to see but not sure it's worth the hassle 
> and/or having to get out the ear protection.

I did power up the Power8 machine in the end:

power8:
perf stat -e cycles,cycles:I sleep 5
Performance counter stats for 'sleep 5':
14,271,273  cycles
14,271,273  cycles:I

???

But then if I try again on power8

perf stat -a -e cycles,cycles:I sleep 5
 Performance counter stats for 'system wide':
1,238,772,322,327  cycles
1,238,674,771,713  cycles:I   

there is a difference.

But then on ivybridge

perf stat -a -e cycles,cycles:I sleep 5
Performance counter stats for 'system wide':
589,598,104  cycles
589,537,190  cycles:I

there is also a different in system wide mode.

So maybe exclude_idle does do something on x86?  Or am I completely 
misunderstanding what the flag is supposed to be indicating?

Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Vince Weaver
On Fri, 20 Apr 2018, Vince Weaver wrote:

> > AFAICT it works on Power and possibly ARM.
> 
> at least some ARMs are a bit more honest about it than x86
> 
> ivybridge:
>   Performance counter stats for '/bin/ls':
>   1,368,162  instructions
>   1,368,162  instructions:I
> 
> pi2/ARM cortex-A7
>   Performance counter stats for '/bin/ls':
>   1,910,083  instructions
> instructions:I
> 
> I'd fire up my Power8 machine to see but not sure it's worth the hassle 
> and/or having to get out the ear protection.

I did power up the Power8 machine in the end:

power8:
perf stat -e cycles,cycles:I sleep 5
Performance counter stats for 'sleep 5':
14,271,273  cycles
14,271,273  cycles:I

???

But then if I try again on power8

perf stat -a -e cycles,cycles:I sleep 5
 Performance counter stats for 'system wide':
1,238,772,322,327  cycles
1,238,674,771,713  cycles:I   

there is a difference.

But then on ivybridge

perf stat -a -e cycles,cycles:I sleep 5
Performance counter stats for 'system wide':
589,598,104  cycles
589,537,190  cycles:I

there is also a different in system wide mode.

So maybe exclude_idle does do something on x86?  Or am I completely 
misunderstanding what the flag is supposed to be indicating?

Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Vince Weaver
On Fri, 20 Apr 2018, Peter Zijlstra wrote:

> On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote:
> > On Tue, 17 Apr 2018, Jiri Olsa wrote:
> > 
> > > On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> > > > Hi,
> > > > 
> > > > I am trying to understand what the exclude_idle event attribute is 
> > > > supposed
> > > > to accomplish.
> > > > As per the definition in the header file:
> > > > 
> > > > exclude_idle   :  1, /* don't count when idle */
> > > 
> > > AFAICS it's not implemented
> > 
> > so just to be completely clear hear, we're saying that the "exclude_idle" 
> > modifier has never done anything useful and still doesn't?
> 
> AFAICT it works on Power and possibly ARM.

at least some ARMs are a bit more honest about it than x86

ivybridge:
Performance counter stats for '/bin/ls':
1,368,162  instructions
1,368,162  instructions:I

pi2/ARM cortex-A7
Performance counter stats for '/bin/ls':
1,910,083  instructions
  instructions:I

I'd fire up my Power8 machine to see but not sure it's worth the hassle 
and/or having to get out the ear protection.

Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Vince Weaver
On Fri, 20 Apr 2018, Peter Zijlstra wrote:

> On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote:
> > On Tue, 17 Apr 2018, Jiri Olsa wrote:
> > 
> > > On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> > > > Hi,
> > > > 
> > > > I am trying to understand what the exclude_idle event attribute is 
> > > > supposed
> > > > to accomplish.
> > > > As per the definition in the header file:
> > > > 
> > > > exclude_idle   :  1, /* don't count when idle */
> > > 
> > > AFAICS it's not implemented
> > 
> > so just to be completely clear hear, we're saying that the "exclude_idle" 
> > modifier has never done anything useful and still doesn't?
> 
> AFAICT it works on Power and possibly ARM.

at least some ARMs are a bit more honest about it than x86

ivybridge:
Performance counter stats for '/bin/ls':
1,368,162  instructions
1,368,162  instructions:I

pi2/ARM cortex-A7
Performance counter stats for '/bin/ls':
1,910,083  instructions
  instructions:I

I'd fire up my Power8 machine to see but not sure it's worth the hassle 
and/or having to get out the ear protection.

Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Peter Zijlstra
On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote:
> On Tue, 17 Apr 2018, Jiri Olsa wrote:
> 
> > On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> > > Hi,
> > > 
> > > I am trying to understand what the exclude_idle event attribute is 
> > > supposed
> > > to accomplish.
> > > As per the definition in the header file:
> > > 
> > > exclude_idle   :  1, /* don't count when idle */
> > 
> > AFAICS it's not implemented
> 
> so just to be completely clear hear, we're saying that the "exclude_idle" 
> modifier has never done anything useful and still doesn't?

AFAICT it works on Power and possibly ARM.


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Peter Zijlstra
On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote:
> On Tue, 17 Apr 2018, Jiri Olsa wrote:
> 
> > On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> > > Hi,
> > > 
> > > I am trying to understand what the exclude_idle event attribute is 
> > > supposed
> > > to accomplish.
> > > As per the definition in the header file:
> > > 
> > > exclude_idle   :  1, /* don't count when idle */
> > 
> > AFAICS it's not implemented
> 
> so just to be completely clear hear, we're saying that the "exclude_idle" 
> modifier has never done anything useful and still doesn't?

AFAICT it works on Power and possibly ARM.


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Peter Zijlstra
On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
> exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
> sample_type  IP|TID|TIME|CPU|PERIOD
>read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>exclude_idle 1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>   swapper 0 [000] 1132634.287442:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>   swapper 0 [001] 1132634.287498:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.
> 
> I am left wondering what is the goal of exclude_idle?

A "git grep exclude_idle" seems to suggest powerpc/arm have it
immplemented for their PMU. If we then look at commit:

  2743a5b0fa6f ("perfcounters: provide expansion room in the ABI")

It was Paul who introduced the bit.

So I'm thinking that if x86 doesn't implement it, we should at least
error out on it. Of course, so far we've allowed it, so who knows what
all will break if we start asserting that :/




Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-20 Thread Peter Zijlstra
On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
> exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
> sample_type  IP|TID|TIME|CPU|PERIOD
>read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>exclude_idle 1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>   swapper 0 [000] 1132634.287442:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>   swapper 0 [001] 1132634.287498:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.
> 
> I am left wondering what is the goal of exclude_idle?

A "git grep exclude_idle" seems to suggest powerpc/arm have it
immplemented for their PMU. If we then look at commit:

  2743a5b0fa6f ("perfcounters: provide expansion room in the ABI")

It was Paul who introduced the bit.

So I'm thinking that if x86 doesn't implement it, we should at least
error out on it. Of course, so far we've allowed it, so who knows what
all will break if we start asserting that :/




Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-18 Thread Vince Weaver
On Tue, 17 Apr 2018, Jiri Olsa wrote:

> On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> > Hi,
> > 
> > I am trying to understand what the exclude_idle event attribute is supposed
> > to accomplish.
> > As per the definition in the header file:
> > 
> > exclude_idle   :  1, /* don't count when idle */
> 
> AFAICS it's not implemented

so just to be completely clear hear, we're saying that the "exclude_idle" 
modifier has never done anything useful and still doesn't?

If so I should update the perf_event_open manpage to spell this out.

Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-18 Thread Vince Weaver
On Tue, 17 Apr 2018, Jiri Olsa wrote:

> On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> > Hi,
> > 
> > I am trying to understand what the exclude_idle event attribute is supposed
> > to accomplish.
> > As per the definition in the header file:
> > 
> > exclude_idle   :  1, /* don't count when idle */
> 
> AFAICS it's not implemented

so just to be completely clear hear, we're saying that the "exclude_idle" 
modifier has never done anything useful and still doesn't?

If so I should update the perf_event_open manpage to spell this out.

Vince


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-17 Thread Arnaldo Carvalho de Melo
Em Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian escreveu:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
> exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
> sample_type  IP|TID|TIME|CPU|PERIOD
>read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>exclude_idle 1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>   swapper 0 [000] 1132634.287442:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>   swapper 0 [001] 1132634.287498:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.
> 
> I am left wondering what is the goal of exclude_idle?

To do something it is not doing, i.e. to do what you expected it to do.

There were messages exchanged recently where PeterZ said that this is
just that needs fixing.

- Arnaldo


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-17 Thread Arnaldo Carvalho de Melo
Em Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian escreveu:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
> exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
> sample_type  IP|TID|TIME|CPU|PERIOD
>read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>exclude_idle 1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>   swapper 0 [000] 1132634.287442:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>   swapper 0 [001] 1132634.287498:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.
> 
> I am left wondering what is the goal of exclude_idle?

To do something it is not doing, i.e. to do what you expected it to do.

There were messages exchanged recently where PeterZ said that this is
just that needs fixing.

- Arnaldo


Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-17 Thread Jiri Olsa
On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
> exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
> sample_type  IP|TID|TIME|CPU|PERIOD
>read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>exclude_idle 1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>   swapper 0 [000] 1132634.287442:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>   swapper 0 [001] 1132634.287498:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.

AFAICS it's not implemented

Peter suggested some time ago change for cpu-clock (attached)
I still have it in my queue, because it gives funny numbers
sometimes.. and I haven't figured it out so far

the idea so far was to use cpu-clock,cpu-clock:I events to
count and display idle % in perf top/record and stat metrics

jirka


---
>From 7f20b047cf56f8086d79007175592633798656ce Mon Sep 17 00:00:00 2001
From: Peter Zijlstra 
Date: Thu, 23 Nov 2017 16:15:36 +0100
Subject: [PATCH] perf: Add support to monitor idle time on cpu-clock

Adding support to use perf_event_attr::exclude_idle
(the :I modifierr in perf tools)to monitor idle time
on cpu-clock event.

Link: http://lkml.kernel.org/n/tip-jw45kl6ydrhzqx4uxws0q...@git.kernel.org
Signed-off-by: Jiri Olsa 
---
 kernel/events/core.c | 22 ++
 kernel/sched/idle_task.c | 17 +
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2989e3b0fe8b..62c0dc75ed11 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9391,19 +9391,33 @@ static void perf_swevent_init_hrtimer(struct perf_event 
*event)
  * Software event: cpu wall time clock
  */
 
+static u64 cpu_clock_count(struct perf_event *event)
+{
+   u64 now = local_clock();
+
+   if (event->attr.exclude_idle)
+   now -= idle_task(event->oncpu)->se.sum_exec_runtime;
+
+   return now;
+}
+
 static void cpu_clock_event_update(struct perf_event *event)
 {
-   s64 prev;
+   s64 prev, delta;
u64 now;
 
-   now = local_clock();
+   now = cpu_clock_count(event);
prev = local64_xchg(>hw.prev_count, now);
-   local64_add(now - prev, >count);
+   delta = now - prev;
+   if (delta > 0)
+   local64_add(delta, >count);
 }
 
 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
-   local64_set(>hw.prev_count, local_clock());
+   u64 now = cpu_clock_count(event);
+
+   local64_set(>hw.prev_count, now);
perf_swevent_start_hrtimer(event);
 }
 
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664cce4f..e360ce5dd9a2 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -27,9 +27,14 @@ static void check_preempt_curr_idle(struct rq *rq, struct 
task_struct *p, int fl
 static struct task_struct *
 pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags 
*rf)
 {
+   struct task_struct *idle = rq->idle;
+
put_prev_task(rq, prev);
update_idle_core(rq);
schedstat_inc(rq->sched_goidle);
+
+   idle->se.exec_start = rq_clock_task(rq);
+
return rq->idle;
 }
 
@@ -48,6 +53,15 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int 
flags)
 
 static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
 {
+   struct task_struct *idle = rq->idle;
+   u64 delta, now;
+
+   now = rq_clock_task(rq);
+   delta = now - idle->se.exec_start;
+   if (likely((s64)delta > 0))
+   idle->se.sum_exec_runtime += delta;
+   idle->se.exec_start = now;
+
rq_last_tick_reset(rq);
 }
 
@@ -57,6 +71,9 @@ static void task_tick_idle(struct rq *rq, struct task_struct 
*curr, int queued)
 
 static void set_curr_task_idle(struct rq *rq)
 {
+   struct task_struct *idle = rq->idle;
+
+   idle->se.exec_start = rq_clock_task(rq);
 }
 
 static void switched_to_idle(struct rq *rq, struct task_struct *p)
-- 
2.13.6



Re: [RFC] perf/core: what is exclude_idle supposed to do

2018-04-17 Thread Jiri Olsa
On Mon, Apr 16, 2018 at 10:04:53PM +, Stephane Eranian wrote:
> Hi,
> 
> I am trying to understand what the exclude_idle event attribute is supposed
> to accomplish.
> As per the definition in the header file:
> 
> exclude_idle   :  1, /* don't count when idle */
> 
> Naively, I thought it would simply stop the event when running in the
> context of the idle task (swapper, pid 0) on any CPU. That would seem to
> match the succinct description.
> 
> However, running a simple:
> 
> $ perf record -a -e cycles:I sleep 5
> perf_event_attr:
> sample_type  IP|TID|TIME|CPU|PERIOD
>read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
>exclude_idle 1
> 
> on an idle machine, showed me that this is not what is actually happening:
> $ perf  script
>   swapper 0 [000] 1132634.287442:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
>   swapper 0 [001] 1132634.287498:  1 cycles:I:
>   8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
> 
> 
> After looking at the code, it all made sense, it seems to current
> implementation is only relevant for sw events. I don't understand why.

AFAICS it's not implemented

Peter suggested some time ago change for cpu-clock (attached)
I still have it in my queue, because it gives funny numbers
sometimes.. and I haven't figured it out so far

the idea so far was to use cpu-clock,cpu-clock:I events to
count and display idle % in perf top/record and stat metrics

jirka


---
>From 7f20b047cf56f8086d79007175592633798656ce Mon Sep 17 00:00:00 2001
From: Peter Zijlstra 
Date: Thu, 23 Nov 2017 16:15:36 +0100
Subject: [PATCH] perf: Add support to monitor idle time on cpu-clock

Adding support to use perf_event_attr::exclude_idle
(the :I modifierr in perf tools)to monitor idle time
on cpu-clock event.

Link: http://lkml.kernel.org/n/tip-jw45kl6ydrhzqx4uxws0q...@git.kernel.org
Signed-off-by: Jiri Olsa 
---
 kernel/events/core.c | 22 ++
 kernel/sched/idle_task.c | 17 +
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2989e3b0fe8b..62c0dc75ed11 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9391,19 +9391,33 @@ static void perf_swevent_init_hrtimer(struct perf_event 
*event)
  * Software event: cpu wall time clock
  */
 
+static u64 cpu_clock_count(struct perf_event *event)
+{
+   u64 now = local_clock();
+
+   if (event->attr.exclude_idle)
+   now -= idle_task(event->oncpu)->se.sum_exec_runtime;
+
+   return now;
+}
+
 static void cpu_clock_event_update(struct perf_event *event)
 {
-   s64 prev;
+   s64 prev, delta;
u64 now;
 
-   now = local_clock();
+   now = cpu_clock_count(event);
prev = local64_xchg(>hw.prev_count, now);
-   local64_add(now - prev, >count);
+   delta = now - prev;
+   if (delta > 0)
+   local64_add(delta, >count);
 }
 
 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
-   local64_set(>hw.prev_count, local_clock());
+   u64 now = cpu_clock_count(event);
+
+   local64_set(>hw.prev_count, now);
perf_swevent_start_hrtimer(event);
 }
 
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664cce4f..e360ce5dd9a2 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -27,9 +27,14 @@ static void check_preempt_curr_idle(struct rq *rq, struct 
task_struct *p, int fl
 static struct task_struct *
 pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags 
*rf)
 {
+   struct task_struct *idle = rq->idle;
+
put_prev_task(rq, prev);
update_idle_core(rq);
schedstat_inc(rq->sched_goidle);
+
+   idle->se.exec_start = rq_clock_task(rq);
+
return rq->idle;
 }
 
@@ -48,6 +53,15 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int 
flags)
 
 static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
 {
+   struct task_struct *idle = rq->idle;
+   u64 delta, now;
+
+   now = rq_clock_task(rq);
+   delta = now - idle->se.exec_start;
+   if (likely((s64)delta > 0))
+   idle->se.sum_exec_runtime += delta;
+   idle->se.exec_start = now;
+
rq_last_tick_reset(rq);
 }
 
@@ -57,6 +71,9 @@ static void task_tick_idle(struct rq *rq, struct task_struct 
*curr, int queued)
 
 static void set_curr_task_idle(struct rq *rq)
 {
+   struct task_struct *idle = rq->idle;
+
+   idle->se.exec_start = rq_clock_task(rq);
 }
 
 static void switched_to_idle(struct rq *rq, struct task_struct *p)
-- 
2.13.6



[RFC] perf/core: what is exclude_idle supposed to do

2018-04-16 Thread Stephane Eranian
Hi,

I am trying to understand what the exclude_idle event attribute is supposed
to accomplish.
As per the definition in the header file:

exclude_idle   :  1, /* don't count when idle */

Naively, I thought it would simply stop the event when running in the
context of the idle task (swapper, pid 0) on any CPU. That would seem to
match the succinct description.

However, running a simple:

$ perf record -a -e cycles:I sleep 5
perf_event_attr:
sample_type  IP|TID|TIME|CPU|PERIOD
   read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
   exclude_idle 1

on an idle machine, showed me that this is not what is actually happening:
$ perf  script
  swapper 0 [000] 1132634.287442:  1 cycles:I:
  8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
  swapper 0 [001] 1132634.287498:  1 cycles:I:
  8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])


After looking at the code, it all made sense, it seems to current
implementation is only relevant for sw events. I don't understand why.

I am left wondering what is the goal of exclude_idle?


[RFC] perf/core: what is exclude_idle supposed to do

2018-04-16 Thread Stephane Eranian
Hi,

I am trying to understand what the exclude_idle event attribute is supposed
to accomplish.
As per the definition in the header file:

exclude_idle   :  1, /* don't count when idle */

Naively, I thought it would simply stop the event when running in the
context of the idle task (swapper, pid 0) on any CPU. That would seem to
match the succinct description.

However, running a simple:

$ perf record -a -e cycles:I sleep 5
perf_event_attr:
sample_type  IP|TID|TIME|CPU|PERIOD
   read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID
   exclude_idle 1

on an idle machine, showed me that this is not what is actually happening:
$ perf  script
  swapper 0 [000] 1132634.287442:  1 cycles:I:
  8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])
  swapper 0 [001] 1132634.287498:  1 cycles:I:
  8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms])


After looking at the code, it all made sense, it seems to current
implementation is only relevant for sw events. I don't understand why.

I am left wondering what is the goal of exclude_idle?