Re: [PATCH v9 8/8] perf: ARM DynamIQ Shared Unit PMU support

2017-11-03 Thread Suzuki K Poulose

On 03/11/17 12:20, Mark Rutland wrote:

> Hi Suzuki,
>
> This looks good, but there are a couple of edge cases I think that we
> need to handle, as noted below.
>
> On Tue, Oct 31, 2017 at 05:23:18PM +, Suzuki K Poulose wrote:
> > Changes since V8:

> >  - Fill in the "module" field for the PMU to prevent the module unload
> >    when the PMU is active.

> Huh. For some reason I thought that was done automatically, but having
> looked, I see that it is not.
>
> It looks like this is missing from the SPE PMU, and the CCN PMU. Would
> you mind fixing those up?
>
> The only other PMU that I see affected is the AMD power PMU; I've pinged
> the maintainer separately.

> [...]

> > +The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".

> Just to check, is there a user of this?
>
> I agree that it could be useful, but AFAICT the perf tool won't look at
> this, so it seems odd to expose it. I'd feel happier punting on exposing
> that so that we can settle on a common name for this across
> uncore/system PMUs.


It allows the user to identify which DSU instance to profile when there are
multiple DSUs on the system. This information can also be used to work out
the "cpu" list to pass to the -C option.
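
As a rough illustration of the second point, a tool could turn the contents
of "associated_cpus" into a CPU mask and pick a CPU for -C. The sketch below
is userspace C and not part of the patch. It assumes the file uses the
kernel's usual CPU list format (for example "0-3,6"); parse_cpulist() and
first_cpu() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,6" into a bitmask (bit N set => CPU N).
 * Returns 0 on success, -1 on malformed input or CPU numbers >= 64.
 * Assumption: the text is in the kernel's usual "list" format, optionally
 * terminated by a newline.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s && mask);

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;

		/* Optional "-hi" part of a range */
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi >= 64)
			return -1;

		for (unsigned long c = lo; c <= hi; c++)
			m |= UINT64_C(1) << c;

		/* A comma separates entries; anything else must end the list */
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}

	*mask = m;
	return 0;
}

/* Lowest-numbered CPU in the mask, or -1 if the mask is empty. */
int first_cpu(uint64_t mask)
{
	for (int c = 0; c < 64; c++)
		if ((mask >> c) & 1)
			return c;
	return -1;
}
```

For a DSU whose file reads "4-7", first_cpu() yields 4, so an invocation
along the lines of "perf stat -C 4 -e <that DSU's PMU>/cycles/" would count
on that cluster.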



> [...]

> > +static void dsu_pmu_probe_pmu(void *data)
> > +{

> > +	/* We can only support upto 31 independent counters */

> Nit: s/upto/up to/

> [...]

> > +static void dsu_pmu_init_pmu(struct dsu_pmu *dsu_pmu)
> > +{
> > +	int cpu, rc;
> > +
> > +	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
> > +	/* Defer, if we don't have any active CPUs in the DSU */
> > +	if (cpu >= nr_cpu_ids)
> > +		return;
> > +	rc = smp_call_function_single(cpu, dsu_pmu_probe_pmu, dsu_pmu, 1);
> > +	if (rc)
> > +		return;
> > +	/* Reset the interrupt overflow mask */
> > +	dsu_pmu_get_reset_overflow();
> > +	dsu_pmu_set_active_cpu(cpu, dsu_pmu);
> > +}

> I think this can be simplified by only calling this in the hotplug
> callback, and not doing the cross-call at all at driver init time. That
> way, we can do:
>
> static void dsu_pmu_init_pmu(struct dsu_pmu *dsu_pmu)
> {
> 	if (dsu_pmu->num_counters == -1)
> 		dsu_pmu_probe_pmu(dsu_pmu);
>
> 	dsu_pmu_get_reset_overflow();
> }
>
> ... which also means we can simplify the prototype of
> dsu_pmu_probe_pmu().
>
> Note that the dsu_pmu_set_active_cpu() can be factored out to the
> caller, which is a little clearer, as I suggest below.


> > +static int dsu_pmu_device_probe(struct platform_device *pdev)
> > +{

> > +	/*
> > +	 * We could defer probing the PMU details from the registers until
> > +	 * an associated CPU is online.
> > +	 */
> > +	dsu_pmu_init_pmu(dsu_pmu);

> ... then we can drop this line ...

> > +	platform_set_drvdata(pdev, dsu_pmu);
> > +	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
> > +				      &dsu_pmu->cpuhp_node);

> ... as this should set things up if a CPU is already online.

> [...]

Got it, that's a good idea. I will change it.




> > +static int dsu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> > +{
> > +	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
> > +						   cpuhp_node);
> > +
> > +	if (!cpumask_test_cpu(cpu, &dsu_pmu->associated_cpus))
> > +		return 0;
> > +
> > +	/* Initialise the PMU if necessary */
> > +	if (dsu_pmu->num_counters < 0)
> > +		dsu_pmu_init_pmu(dsu_pmu);
> > +	/* Set the active CPU if we don't have one */
> > +	if (cpumask_empty(&dsu_pmu->active_cpu))
> > +		dsu_pmu_set_active_cpu(cpu, dsu_pmu);
> > +	return 0;
> > +}

> I don't think this is quite right, as if we've offlined all the
> associated CPUs, the DSU itself may have been powered down, and we'll
> want to reset it when it's brought online.
>
> I think we want this to be:
>
> static int dsu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> {
> 	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
> 						   cpuhp_node);
>
> 	if (!cpumask_test_cpu(cpu, &dsu_pmu->associated_cpus))
> 		return 0;
>
> 	/* If the PMU is already managed, there's nothing to do */
> 	if (!cpumask_empty(&dsu_pmu->active_cpu))
> 		return 0;
>
> 	/* Reset the PMU, and take ownership */
> 	dsu_pmu_init_pmu(dsu_pmu);
> 	dsu_pmu_set_active_cpu(cpu, dsu_pmu);
>
> 	return 0;
> }
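
The reasoning above can be checked with a small userspace model. This is
only a sketch, not driver code: struct model_pmu and the model_*() functions
are hypothetical stand-ins, the counter count of 31 is a placeholder rather
than a value read from hardware, and the teardown path is an assumption
since the thread elides the real one. It shows that the suggested online
callback resets the PMU again when the first associated CPU returns after
all of them have gone offline, while probing only once.

```c
#include <assert.h>

#define NO_CPU (-1)

/* Toy stand-in for struct dsu_pmu; every field here is illustrative. */
struct model_pmu {
	unsigned int associated;	/* bitmask of CPUs in this cluster */
	unsigned int online;		/* associated CPUs currently online */
	int active_cpu;			/* CPU that owns the PMU, or NO_CPU */
	int num_counters;		/* -1 until probed */
	int probes;			/* times the PMU was probed */
	int resets;			/* times the overflow state was reset */
};

void model_setup(struct model_pmu *p, unsigned int associated)
{
	p->associated = associated;
	p->online = 0;
	p->active_cpu = NO_CPU;
	p->num_counters = -1;
	p->probes = 0;
	p->resets = 0;
}

/* Mirrors the suggested dsu_pmu_init_pmu(): probe once, reset every time. */
void model_init_pmu(struct model_pmu *p)
{
	if (p->num_counters == -1) {
		p->num_counters = 31;	/* placeholder, not a hardware value */
		p->probes++;
	}
	p->resets++;
}

/* Mirrors the suggested dsu_pmu_cpu_online(). CPU numbers must be 0..31. */
void model_cpu_online(struct model_pmu *p, int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	if (!(p->associated & (1u << cpu)))
		return;
	p->online |= 1u << cpu;

	/* If the PMU is already managed, there's nothing to do */
	if (p->active_cpu != NO_CPU)
		return;

	/* Reset the PMU, and take ownership */
	model_init_pmu(p);
	p->active_cpu = cpu;
}

/* Assumed teardown: hand the PMU to another online associated CPU, if any. */
void model_cpu_teardown(struct model_pmu *p, int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	if (!(p->associated & (1u << cpu)))
		return;
	p->online &= ~(1u << cpu);
	if (p->active_cpu != cpu)
		return;

	p->active_cpu = NO_CPU;
	for (int dst = 0; dst < 32; dst++) {
		if (p->online & (1u << dst)) {
			/* A real driver would migrate the perf context here */
			p->active_cpu = dst;
			break;
		}
	}
}
```

With the v9 callback, the second reset would be skipped because
num_counters is already non-negative by the time the cluster comes back.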

> [...]

> > +static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> > +{

> > +	dsu_pmu_set_active_cpu(dst, dsu_pmu);
> > +	perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);

> In other PMU drivers, we do the migrate, then set the active CPU. That
> shouldn't matter, but for consistency, could we flip these around?

OK, will flip it.

> Otherwise, this looks good to me.
>
> With the above changes:
>
> Reviewed-by: Mark Rutland


Thanks for the review. I will post the updated version.

Suzuki


Re: [PATCH v9 8/8] perf: ARM DynamIQ Shared Unit PMU support

2017-11-03 Thread Mark Rutland
Hi Suzuki,

This looks good, but there are a couple of edge cases I think that we
need to handle, as noted below.

On Tue, Oct 31, 2017 at 05:23:18PM +, Suzuki K Poulose wrote:
> Changes since V8:

>  - Fill in the "module" field for the PMU to prevent the module unload
>    when the PMU is active.

Huh. For some reason I thought that was done automatically, but having
looked, I see that it is not.

It looks like this is missing from the SPE PMU, and the CCN PMU. Would
you mind fixing those up?

The only other PMU that I see affected is the AMD power PMU; I've pinged
the maintainer separately.

[...]

> +The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".

Just to check, is there a user of this?

I agree that it could be useful, but AFAICT the perf tool won't look at
this, so it seems odd to expose it. I'd feel happier punting on exposing
that so that we can settle on a common name for this across
uncore/system PMUs.

[...]

> +static void dsu_pmu_probe_pmu(void *data)
> +{

> + /* We can only support upto 31 independent counters */

Nit: s/upto/up to/

[...]

> +static void dsu_pmu_init_pmu(struct dsu_pmu *dsu_pmu)
> +{
> +	int cpu, rc;
> +
> +	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
> +	/* Defer, if we don't have any active CPUs in the DSU */
> +	if (cpu >= nr_cpu_ids)
> +		return;
> +	rc = smp_call_function_single(cpu, dsu_pmu_probe_pmu, dsu_pmu, 1);
> +	if (rc)
> +		return;
> +	/* Reset the interrupt overflow mask */
> +	dsu_pmu_get_reset_overflow();
> +	dsu_pmu_set_active_cpu(cpu, dsu_pmu);
> +}

I think this can be simplified by only calling this in the hotplug
callback, and not doing the cross-call at all at driver init time. That
way, we can do:

static void dsu_pmu_init_pmu(struct dsu_pmu *dsu_pmu)
{
	if (dsu_pmu->num_counters == -1)
		dsu_pmu_probe_pmu(dsu_pmu);

	dsu_pmu_get_reset_overflow();
}

... which also means we can simplify the prototype of
dsu_pmu_probe_pmu().

Note that the dsu_pmu_set_active_cpu() can be factored out to the
caller, which is a little clearer, as I suggest below.

> +static int dsu_pmu_device_probe(struct platform_device *pdev)
> +{

> +	/*
> +	 * We could defer probing the PMU details from the registers until
> +	 * an associated CPU is online.
> +	 */
> +	dsu_pmu_init_pmu(dsu_pmu);

... then we can drop this line ...

> +	platform_set_drvdata(pdev, dsu_pmu);
> +	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
> +				      &dsu_pmu->cpuhp_node);

... as this should set things up if a CPU is already online.

[...]

> +static int dsu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
> +						   cpuhp_node);
> +
> +	if (!cpumask_test_cpu(cpu, &dsu_pmu->associated_cpus))
> +		return 0;
> +
> +	/* Initialise the PMU if necessary */
> +	if (dsu_pmu->num_counters < 0)
> +		dsu_pmu_init_pmu(dsu_pmu);
> +	/* Set the active CPU if we don't have one */
> +	if (cpumask_empty(&dsu_pmu->active_cpu))
> +		dsu_pmu_set_active_cpu(cpu, dsu_pmu);
> +	return 0;
> +}

I don't think this is quite right, as if we've offlined all the
associated CPUs, the DSU itself may have been powered down, and we'll
want to reset it when it's brought online.

I think we want this to be:

static int dsu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
						   cpuhp_node);

	if (!cpumask_test_cpu(cpu, &dsu_pmu->associated_cpus))
		return 0;

	/* If the PMU is already managed, there's nothing to do */
	if (!cpumask_empty(&dsu_pmu->active_cpu))
		return 0;

	/* Reset the PMU, and take ownership */
	dsu_pmu_init_pmu(dsu_pmu);
	dsu_pmu_set_active_cpu(cpu, dsu_pmu);

	return 0;
}

[...]

> +static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{

> +	dsu_pmu_set_active_cpu(dst, dsu_pmu);
> +	perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);

In other PMU drivers, we do the migrate, then set the active CPU. That
shouldn't matter, but for consistency, could we flip these around?

Otherwise, this looks good to me.

With the above changes:

Reviewed-by: Mark Rutland 

Thanks,
Mark.


[PATCH v9 8/8] perf: ARM DynamIQ Shared Unit PMU support

2017-10-31 Thread Suzuki K Poulose
Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
counts various events related to the L3 cache, the SCU, etc., and also
provides a cycle counter.

The PMU is accessed via system registers, which are common to the cores
in the same cluster. The PMU registers mostly follow the semantics of
the ARMv8 PMU, except that the counters record cluster-wide events.

This driver is mostly based on the ARMv8 and CCI PMU drivers.
The driver only supports ARM64 at the moment. It can be extended
to support ARM32 by providing register accessors like those in
arch/arm64/include/asm/arm_dsu_pmu.h.

Cc: Mark Rutland 
Cc: Will Deacon 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Suzuki K Poulose 
---
Changes since V8:
 - Include required header files (Mark Rutland)
 - Remove Kconfig dependency on PERF_EVENTS (Mark Rutland)
 - Fix typo in event name, bus_acesss => bus_access (Mark Rutland)
 - Use find_first_zero_bit instead of find_next_zero_bit (Mark Rutland)
 - Change order of checks in dsu_pmu_event_init (Mark Rutland)
 - Allow lazy initialisation of DSU PMU to handle cases where CPUs
   may be brought up later (e.g., maxcpus=N) (Mark Rutland)
 - Clear the interrupt overflow status upon initialisation (Mark Rutland)
 - Change the CPU check to "associated_cpus" from "active_cpus",
   as when we migrate the perf context we will access the DSU
   from two different CPUs (source and destination).
 - Fill in the "module" field for the PMU to prevent the module unload
   when the PMU is active.
Changes since V6:
 - Address comments from Jonathan
 - Add Reviewed-by tags from Jonathan
Changes since V5:
 - Address comments on V5 by Mark.
 - Use IRQ_NOBALANCING for IRQ handler
 - Don't expose events which could be unimplemented.
 - Get rid of dsu_pmu_event_supported and allow raw event
   code to be used without validating whether it is supported.
 - Rename "supported_cpus" mask to "associated_cpus"
 - Add Documentation for the PMU driver
 - Don't disable IRQ for dsu_pmu_{enable/disable}_counters
 - Use consistent return codes for validate_event/group calls.
 - Check PERF_ATTACH_TASK flag in event_init.
 - Allow missing CPUs in dsu_pmu_dt_get_cpus, to handle cases
   where kernel could have capped nr_cpus.
 - Cleanup sanity checking for the CPU before accessing DSU
 - Reject events with counting CPU not associated with the DSU.
Changes since V4:
 - Reflect the changed generic helper for mapping CPU id
Changes since V2:
 - Cleanup dsu_pmu_device_probe error handling.
 - Fix event validate_group to invert the result check of validate_event
 - Return errors if we failed to parse CPUs in the DSU.
 - Add MODULE_DEVICE_TABLE entry
 - Use hlist_entry_safe for converting cpuhp_node to dsu_pmu.
---
 Documentation/perf/arm_dsu_pmu.txt   |  28 ++
 arch/arm64/include/asm/arm_dsu_pmu.h | 129 ++
 drivers/perf/Kconfig |   9 +
 drivers/perf/Makefile|   1 +
 drivers/perf/arm_dsu_pmu.c   | 861 +++
 5 files changed, 1028 insertions(+)
 create mode 100644 Documentation/perf/arm_dsu_pmu.txt
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c

diff --git a/Documentation/perf/arm_dsu_pmu.txt b/Documentation/perf/arm_dsu_pmu.txt
new file mode 100644
index ..d611e15f5add
--- /dev/null
+++ b/Documentation/perf/arm_dsu_pmu.txt
@@ -0,0 +1,28 @@
+ARM DynamIQ Shared Unit (DSU) PMU
+==
+
+ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
+control logic and external interfaces to form a multicore cluster. The PMU
+allows counting the various events related to the L3 cache, Snoop Control Unit
+etc, using 32bit independent counters. It also provides a 64bit cycle counter.
+
+The PMU can only be accessed via CPU system registers and are common to the
+cores connected to the same DSU. Like most of the other uncore PMUs, DSU
+PMU doesn't support process specific events and cannot be used in sampling mode.
+
+The DSU provides a bitmap for a subset of implemented events via hardware
+registers. There is no way for the driver to determine if the other events
+are available or not. Hence the driver exposes only those events advertised
+by the DSU, in "events" directory under :
+
+  /sys/bus/event_sources/devices/arm_dsu_/
+
+The user should refer to the TRM of the product to figure out the supported events
+and use the raw event code for the unlisted events.
+
+The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".
+
+
+e.g usage :
+
+   perf stat -a -e arm_dsu_0/cycles/
diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
new file mode 100644
index ..82e5cc3356bf
--- /dev/null
+++