Re: [ANNOUNCE] (Resend) Tools to analyse PM and scheduling behaviour

2014-08-30 Thread Sundar
Hi Amit,

On Tue, Aug 26, 2014 at 11:02 AM, Amit Kucheria wrote:

> Consider the following examples:
>
> *On a given platform*, we see the same benchmark scores with and
> without patchset ABC, but including patchset ABC leads to better "power
> behaviour" i.e. requests of deeper idle states and/or lower frequencies.
>
> Consider another example where the benchmark score dramatically improves
> with patchset XYZ while the idle and frequency requests are marginally
> worse (shallower idle, reduced residency or increased frequency requests).
>
> In both cases, it is left to platforms to do real measurements to confirm that
> this is indeed the case. The latter example might not even be possible
> on some platforms, given some platform constraints e.g. the platform
> thermal envelope.
>
> Idlestat is not a replacement for real measurements. It is a tool to
> allow maintainers (scheduler, PM) to judge if any further investigation
> is needed and request such numbers from people running the code on
> various architectures before merging the patches.

As I mentioned, it is quite possible for a workload to preserve the CPU
C/P states but damage some other system metric, such as memory/SoC bandwidth
or cache characteristics, because the scheduler was doing more aggressive
task placements. I agree that no tool (within room for errors/approximations)
can replace a physical measurement; my only question is whether C/P-state
correlation is the direct or primary metric for scheduler behavior (as
opposed to PM behavior).

> First, idlestat is designed to be architecture-independent. It only
> depends on what the kernel knows.
> Second, it is created with benchmarking in mind - non-interactive and
> minimal overhead.
> Third, it was designed for maintainers to be able to quickly tell if a
> patchset changes OS behaviour dramatically and request deeper
> analysis on various architectures.
> Fourth, it has the prediction logic which calculates the intersection of
> C-state requests by several cpus in a cluster to determine the cluster
> state.
>
> On top of this, we have two WIP additions:
>  - an experimental "energy model" patch for idlestat that lets a SoC
>  vendor provide the cost of various states as input and idlestat will
>  output the "energy cost" of a workload.
>  - a 'diff mode' to show the diff between two traces

I see this as no different from powertop; would it not be easier to add
the prediction logic to powertop and investigate energy-model integration
there? I don't mind a different tool doing almost the same things, but is
there really a need for one?

> Correct. At the moment, idlestat can only provide an indication if
> something might be wrong.

And that's where I see immense value in idlestat sticking to scheduler
details beyond the traditional C/P state statistics.

> These would show up as regressions in benchmark results. Fengguang's
> excellent benchmark report[1] already captures such "changes". Does it
> make sense to recapture that in a tool?
> [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg703826.html

I am yet to digest that report, so apologies :)

> We're open to tracking more metrics if it is felt they are useful.
>
> One of the tenets of energy-aware scheduling is "improving energy
> efficiency with little or no performance regression". idlestat tells us
> about possible regressions on the energy front and benchmarks should
> tell us if we are regressing on performance. Hence the focus on
> C/P-states for now.

I would like to know your views on adding additional scheduler metrics,
like task thrashing, irregular placements and increased load balancing,
to the tool so that it can zero in on the scheduler for efficiency losses.
There might be more critical metrics that I am missing...
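
As a rough illustration of one such metric, here is a minimal Python sketch
that counts cross-CPU task migrations from the text output of the kernel's
sched_migrate_task tracepoint. It is not part of idlestat; the function name
count_migrations and the sample lines are hypothetical, and the exact trace
line layout is an assumption that may vary between kernel versions.

```python
import re
from collections import Counter

# Assumed layout of the sched_migrate_task event payload in ftrace text output.
MIGRATE_RE = re.compile(
    r"sched_migrate_task:\s+comm=(?P<comm>.+?)\s+pid=(?P<pid>\d+)\s+prio=\d+\s+"
    r"orig_cpu=(?P<orig>\d+)\s+dest_cpu=(?P<dest>\d+)"
)


def count_migrations(trace_lines):
    """Count migrations per (comm, pid) where the task actually changed CPU."""
    per_task = Counter()
    for line in trace_lines:
        match = MIGRATE_RE.search(line)
        if match and match.group("orig") != match.group("dest"):
            per_task[(match.group("comm"), int(match.group("pid")))] += 1
    return per_task


# Hypothetical trace excerpt, for illustration only.
sample = [
    "<idle>-0 [001] d..3 100.000001: sched_migrate_task: comm=worker pid=42 prio=120 orig_cpu=0 dest_cpu=1",
    "<idle>-0 [000] d..3 100.000450: sched_migrate_task: comm=worker pid=42 prio=120 orig_cpu=1 dest_cpu=0",
    "<idle>-0 [002] d..3 100.001000: sched_migrate_task: comm=other pid=7 prio=120 orig_cpu=2 dest_cpu=3",
    "<idle>-0 [002] d..3 100.002000: cpu_idle: state=1 cpu_id=2",
]
counts = count_migrations(sample)
for (comm, pid), n in sorted(counts.items()):
    print(f"{comm}/{pid}: {n} migrations")
```

A task that bounces between CPUs many times within a short window would
stand out in such a count even when the C/P-state statistics look unchanged.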

Cheers!
-- 
-
The views expressed in this email are personal and do not necessarily
echo my employers.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] (Resend) Tools to analyse PM and scheduling behaviour

2014-08-25 Thread Amit Kucheria
On Sat, 23 Aug 2014 at 07:44 +0530, Sundar wrote:
> Hi Amit,
>
> On Tue, Aug 19, 2014 at 11:11 AM, Amit Kucheria
>  wrote:
>>
>> We’re soliciting early feedback from community on the direction of idlestat
>
> Nice :)
>
>> Idlestat Details
>> 
>> Idlestat uses FTRACE to capture traces related to C-state and P-state
>> transitions of the CPU and wakeups (IRQ, IPI) on the system and then
>> post-processes the data to print statistics. It is designed to be used
>> non-interactively. Idlestat can deduce the idle time for a cluster as an
>> intersection between the idle times of all the cpus belonging to the same
>> cluster. This data is useful to analyse and optimise scheduling behaviour.
>> The tool will also list how many times the menu governor mis-predicts
>> target residency in a C-state.
>
> We discussed this in the energy aware scheduling workshop this week @
> the Kernel Summit. A few notes:
>
> 1. We need to really understand the correlation of this tool w.r.t
> actual hardware states. It is often the case that the software "thinks"
> it is in a low power state while the actual hardware is not. What is
> the coverage for these kinds of cases here?

You are right, it does not represent the actual state of the HW, only
the 'requested' state.

There are various platform-dependent ways of knowing the actual HW
state. Some examples are:
 - Through an external HW signal (e.g. a GPIO that is toggled when clock
 to the CPU is cut off)
 - Measuring power on the power rails and correlating those well-known
 values (CPU ON, retention, OFF) to the traces
 - Reading some register (like MSR on x86)

This is not the main focus of the tool.

> 2. I understand that C/P states are a direct metric of how well the
> workload behaved w.r.t power, but I am not sure that they are a direct
> measure of how the scheduler performed.

Consider the following examples:

*On a given platform*, we see the same benchmark scores with and
without patchset ABC, but including patchset ABC leads to better "power
behaviour" i.e. requests of deeper idle states and/or lower frequencies.

Consider another example where the benchmark score dramatically improves
with patchset XYZ while the idle and frequency requests are marginally
worse (shallower idle, reduced residency or increased frequency requests).

In both cases, it is left to platforms to do real measurements to confirm that
this is indeed the case. The latter example might not even be possible
on some platforms, given some platform constraints e.g. the platform
thermal envelope.

Idlestat is not a replacement for real measurements. It is a tool to
allow maintainers (scheduler, PM) to judge if any further investigation
is needed and request such numbers from people running the code on
various architectures before merging the patches.

> The C/P states could be maintained whilst giving away performance or
> power at the expense of additional components on the SoC and platform,
> like DDR IOs, fabric states etc.

True.

> Quick Summary of what I discussed with Daniel @ the workshop about idlestat:
>
> 1. There are usually platform-specific tools to get residencies for
> P/C states; PowerTop & Turbostat are two that first come to mind. Apart
> from the prediction logic, is there any specific way in which idlestat
> differs from these two?

First, idlestat is designed to be architecture-independent. It only
depends on what the kernel knows.
Second, it is created with benchmarking in mind - non-interactive and
minimal overhead.
Third, it was designed for maintainers to be able to quickly tell if a
patchset changes OS behaviour dramatically and request deeper
analysis on various architectures.
Fourth, it has the prediction logic which calculates the intersection of
C-state requests by several cpus in a cluster to determine the cluster
state.
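
The intersection idea can be sketched in a few lines of Python. This is a
minimal illustration, not idlestat's actual code: it assumes each CPU's idle
periods are already available as sorted, non-overlapping (start, end) pairs
in seconds, and the function name cluster_idle_intervals is hypothetical.

```python
def cluster_idle_intervals(per_cpu_idle):
    """Return the intervals during which every CPU in the cluster is idle.

    per_cpu_idle: one list per CPU of sorted, non-overlapping
    (start, end) idle intervals.
    """
    if not per_cpu_idle:
        return []
    common = list(per_cpu_idle[0])
    for intervals in per_cpu_idle[1:]:
        merged = []
        i = j = 0
        while i < len(common) and j < len(intervals):
            start = max(common[i][0], intervals[j][0])
            end = min(common[i][1], intervals[j][1])
            if start < end:
                merged.append((start, end))
            # Advance whichever interval finishes first.
            if common[i][1] < intervals[j][1]:
                i += 1
            else:
                j += 1
        common = merged
    return common


# Made-up idle periods for a two-CPU cluster.
cpu0 = [(0.0, 4.0), (6.0, 10.0)]
cpu1 = [(1.0, 5.0), (7.0, 9.0)]
cluster = cluster_idle_intervals([cpu0, cpu1])
total = sum(end - start for start, end in cluster)
print(cluster, total)
```

The cluster can only be considered idle while all of its CPUs are idle at
the same time, so the result shrinks as more CPUs are taken into account.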

On top of this, we have two WIP additions:
 - an experimental "energy model" patch for idlestat that lets a SoC
 vendor provide the cost of various states as input and idlestat will
 output the "energy cost" of a workload.
 - a 'diff mode' to show the diff between two traces
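
To make the energy-model idea concrete, here is a minimal Python sketch
under stated assumptions: the per-state power figures stand in for the
vendor-provided costs, the residencies stand in for what a trace would yield,
and all names and numbers are made up for illustration. It is not the format
or the arithmetic used by the actual patch.

```python
def energy_cost(residency_s, power_w):
    """Sum residency (seconds) times power (watts) over all states, in joules."""
    missing = set(residency_s) - set(power_w)
    if missing:
        raise KeyError(f"no power figure for states: {sorted(missing)}")
    return sum(residency_s[state] * power_w[state] for state in residency_s)


# Hypothetical per-state power figures supplied as input (watts).
power_w = {"C0": 0.5, "C1": 0.25, "C2": 0.0625}

# Hypothetical residencies (seconds) from two traces of the same workload.
baseline = {"C0": 2.0, "C1": 4.0, "C2": 8.0}
patched = {"C0": 1.0, "C1": 3.0, "C2": 10.0}

baseline_j = energy_cost(baseline, power_w)
patched_j = energy_cost(patched, power_w)
delta_j = patched_j - baseline_j
print(f"baseline={baseline_j} J patched={patched_j} J delta={delta_j} J")
```

A negative delta here only says that the requested states became cheaper
under the assumed cost table; real measurements are still needed to confirm it.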

> 2. To me, when debugging performance or power, C/P states provide the
> direction that something is wrong.
>
> But they still don't tell me "what" is wrong "if" the issue is somehow
> in the kernel as opposed to more

Correct. At the moment, idlestat can only provide an indication if
something might be wrong.

> easily fixable software code (traceable at hardware/software level for
> best optimizations). How do I conclude that my scheduler is the culprit,
> apart from the points where it took a decision to select the right idle
> states based on predicted sleep times? In my opinion, that would boil
> down to whether the scheduler was invoking too many load balancing calls,
> moving my threads across cores too much, thrashing data across caches
> and cores too much, etc.

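These would show up as regressions in benchmark results. Fengguang's
excellent benchmark report[1] already captures such "changes". Does it
make sense to recapture that in a tool?

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg703826.html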

Re: [ANNOUNCE] (Resend) Tools to analyse PM and scheduling behaviour

2014-08-22 Thread Sundar
Hi Amit,

On Tue, Aug 19, 2014 at 11:11 AM, Amit Kucheria wrote:
>
> We’re soliciting early feedback from community on the direction of idlestat

Nice :)

> Idlestat Details
> 
> Idlestat uses FTRACE to capture traces related to C-state and P-state
> transitions of the CPU and wakeups (IRQ, IPI) on the system and then
> post-processes the data to print statistics. It is designed to be used
> non-interactively. Idlestat can deduce the idle time for a cluster as an
> intersection between the idle times of all the cpus belonging to the same
> cluster. This data is useful to analyse and optimise scheduling behaviour.
> The tool will also list how many times the menu governor mis-predicts
> target residency in a C-state.

We discussed this in the energy aware scheduling workshop this week @
the Kernel Summit. A few notes:

1. We need to really understand the correlation of this tool w.r.t
actual hardware states. It is often the case that the software "thinks"
it is in a low power state while the actual hardware is not. What is
the coverage for these kinds of cases here?

2. I understand that C/P states are a direct metric of how well the
workload behaved w.r.t power, but I am not sure that they are a direct
measure of how the scheduler performed. The C/P states could be maintained
whilst giving away performance or power at the expense of additional
components on the SoC and platform, like DDR IOs, fabric states etc.

Quick Summary of what I discussed with Daniel @ the workshop about idlestat:

1. There are usually platform-specific tools to get residencies for
P/C states; PowerTop & Turbostat are two that first come to mind. Apart
from the prediction logic, is there any specific way in which idlestat
differs from these two?

2. To me, when debugging performance or power, C/P states provide the
direction that something is wrong. But they still don't tell me "what" is
wrong "if" the issue is somehow in the kernel as opposed to more easily
fixable software code (traceable at hardware/software level for best
optimizations). How do I conclude that my scheduler is the culprit, apart
from the points where it took a decision to select the right idle states
based on predicted sleep times? In my opinion, that would boil down to
whether the scheduler was invoking too many load balancing calls, moving
my threads across cores too much, thrashing data across caches and cores
too much, etc.

I think a tool for scheduler metrics must be based on more inner details
like the above, finally culminating in C/P states, as opposed to C/P states
being the metric to be relied on.

Let me know your thoughts.

Cheers!

-- these are my personal thoughts and do not represent my employers'

