Re: [RFC 00/11] perf: Enhancing perf to export processor hazard information

2020-03-26 Thread maddy




On 3/18/20 11:05 PM, Kim Phillips wrote:

Hi Maddy,

On 3/17/20 1:50 AM, maddy wrote:

On 3/13/20 4:08 AM, Kim Phillips wrote:

On 3/11/20 11:00 AM, Ravi Bangoria wrote:

On 3/6/20 3:36 AM, Kim Phillips wrote:

On 3/3/20 3:55 AM, Kim Phillips wrote:

On 3/2/20 2:21 PM, Stephane Eranian wrote:

On Mon, Mar 2, 2020 at 2:13 AM Peter Zijlstra  wrote:

On Mon, Mar 02, 2020 at 10:53:44AM +0530, Ravi Bangoria wrote:

Modern processors export such hazard data in Performance
Monitoring Unit (PMU) registers. E.g., the 'Sampled Instruction Event
Register' on IBM PowerPC[1][2] and 'Instruction-Based Sampling' on
AMD[3] provide similar information.

Implementation detail:

A new sample_type called PERF_SAMPLE_PIPELINE_HAZ is introduced.
If it's set, kernel converts arch specific hazard information
into generic format:

  struct perf_pipeline_haz_data {
     /* Instruction/Opcode type: Load, Store, Branch  */
     __u8    itype;
     /* Instruction Cache source */
     __u8    icache;
     /* Instruction suffered hazard in pipeline stage */
     __u8    hazard_stage;
     /* Hazard reason */
     __u8    hazard_reason;
     /* Instruction suffered stall in pipeline stage */
     __u8    stall_stage;
     /* Stall reason */
     __u8    stall_reason;
     __u16   pad;
  };
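A minimal userspace sketch (illustrative only, assuming the uapi
additions from this series; the perf_event_open() plumbing is elided)
of how a tool would request the proposed sample type:

	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size          = sizeof(attr);
	attr.type          = PERF_TYPE_HARDWARE;
	attr.config        = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 10000;
	/* each sample record would then carry a perf_pipeline_haz_data */
	attr.sample_type   = PERF_SAMPLE_IP | PERF_SAMPLE_PIPELINE_HAZ;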

Kim, does this format indeed work for AMD IBS?

It's not really 1:1: we don't have these separations of stages
and reasons; we have, for example, 'missed in L2 cache'.
So IBS output is flatter, with more cycle latency figures than
IBM's, AFAICT.

AMD IBS captures pipeline latency data in case of Fetch sampling, like the
fetch latency, tag-to-retire latency, completion-to-retire latency and
so on. Yes, Ops sampling does provide more data on load/store-centric
information. But it also captures more detailed data for branch instructions.
And we also looked at ARM SPE, which also captures more detailed pipeline
data and latency information.


Personally, I don't like the term hazard. This is too IBM Power
specific. We need to find a better term, maybe stall or penalty.

Right, IBS doesn't have a filter to only count stalled or otherwise
bad events.  IBS' PPR description has one occurrence of the
word 'stall', and none of 'penalty'.  The way I read IBS is that it's just
reporting more sample data than just the precise IP: things like
hits, misses, cycle latencies, addresses, types, etc., so words
like 'extended', or the 'auxiliary' already used today,
are more appropriate for IBS, although I'm the last person to
bikeshed.

We are thinking of using "pipeline" word instead of Hazard.

Hm, the word 'pipeline' occurs 0 times in IBS documentation.

NP. We thought 'pipeline' is a generic hw term, so we proposed the
"pipeline" word. We are open to any term that is generic enough.


I realize there are a couple of core pipeline-specific pieces
of information coming out of it, but the vast majority
are addresses, latencies of various components in the memory
hierarchy, and various component hit/miss bits.

Yes, we should capture core pipeline-specific details. For example,
IBS generates branch unit information (IbsOpData1) and Icache-related
data (IbsFetchCtl), which is something that shouldn't be extended as
part of perf-mem, IMO.

Sure, IBS Op-side output is more 'perf mem' friendly, and so it
should populate perf_mem_data_src fields, just like POWER9 can:

union perf_mem_data_src {
...
  __u64   mem_rsvd:24,
          mem_snoopx:2,   /* snoop mode, ext */
          mem_remote:1,   /* remote */
          mem_lvl_num:4,  /* memory hierarchy level number */
          mem_dtlb:7,     /* tlb access */
          mem_lock:2,     /* lock instr */
          mem_snoop:5,    /* snoop mode */
          mem_lvl:14,     /* memory hierarchy level */
          mem_op:5;       /* type of opcode */


E.g., SIER[LDST] and SIER[A_XLATE_SRC] can be used to populate
mem_lvl[_num]; SIER_TYPE can be used to populate 'mem_op' and
'mem_lock'; and the Reload Bus Source Encoding bits can
be used to populate mem_snoop, right?
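A rough sketch of that kind of mapping; the shift/mask values below are
placeholders, not the real SIER bit positions (the authoritative
mapping is the arch code itself, e.g. isa207_find_source()):

	/* Placeholder SIER_TYPE field position, for illustration only. */
	#define SIER_TYPE_SHIFT	15
	#define SIER_TYPE_MASK	0x7ULL

	static u64 sier_to_mem_op(u64 sier)
	{
		switch ((sier >> SIER_TYPE_SHIFT) & SIER_TYPE_MASK) {
		case 1:  return PERF_MEM_OP_LOAD;
		case 2:  return PERF_MEM_OP_STORE;
		default: return PERF_MEM_OP_NA;
		}
	}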

Hi Kim,

Yes. We do expose these data as part of perf-mem for POWER.

OK, I see relevant PERF_MEM_S bits in arch/powerpc/perf/isa207-common.c:
isa207_find_source now, thanks.


For IBS, I see PERF_SAMPLE_ADDR and PERF_SAMPLE_PHYS_ADDR can be
used for the ld/st target addresses, too.


What's needed here is vendor-specific extended
sample information that all these technologies gather,
of which things like e.g. 'L1 TLB cycle latency' we
should all have in common.

Yes. We will include fields to capture the latency cycles (like issue
latency, instruction completion latency, etc.) along with other pipeline
details in the proposed structure.

Latency figures are just an example, and from what I
can tell, struct perf_sample_data already has a member to
call latency: 'weight'.

Re: [RFC 00/11] perf: Enhancing perf to export processor hazard information

2020-03-17 Thread maddy
[...] struct perf_sample_data already has a member to call latency:
'weight'.

I didn't see any latency figures coming out of POWER9,
and do not expect this patch series to implement those
of other vendors, e.g., AMD's IBS; leave each vendor
to amend perf to suit their own h/w output, please.


The reference structure proposed in this patchset did not have members
to capture latency info for that exact reason. But the idea here is to
abstract away as much vendor-specific detail as possible. So if we
include a u16 array, then this format can also capture data from IBS,
since it provides a few latency details.
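As a sketch only (field names and array length are illustrative, not a
frozen proposal), the structure could grow such a generic latency
array:

	#define PERF_HAZ_MAX_LATENCY	4	/* illustrative */

	struct perf_pipeline_haz_data {
		__u8	itype;
		__u8	icache;
		__u8	hazard_stage;
		__u8	hazard_reason;
		__u8	stall_stage;
		__u8	stall_reason;
		__u16	pad;
		/* e.g. issue, fetch, tag-to-retire, completion-to-retire */
		__u16	latency[PERF_HAZ_MAX_LATENCY];
	};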




My main point there, however, was that each vendor should
use streamlined record-level code to just copy the data
in the proprietary format that their hardware produces,
and then perf tooling can synthesize the events
from the raw data at report/script/etc. time.


I'm not sure why a new PERF_SAMPLE_PIPELINE_HAZ is needed
either.  Can we use PERF_SAMPLE_AUX instead?

We took a look at PERF_SAMPLE_AUX. IIUC, PERF_SAMPLE_AUX is intended for when
a large volume of data needs to be captured as part of perf.data without
frequent PMIs. But the proposed type is to address the capture of pipeline

SAMPLE_AUX shouldn't care whether the volume is large, or how frequent
PMIs are, even though it may be used in those environments.


information on each sample, using a PMI at periodic intervals. Hence
proposing PERF_SAMPLE_PIPELINE_HAZ.

And that's fine for any extra bits that POWER9 has to convey
to its users beyond things already represented by other sample
types like PERF_SAMPLE_DATA_SRC, but the capturing of both POWER9
and other vendors' data, e.g., AMD IBS, can be made vendor-independent
at record time by using SAMPLE_AUX, or even SAMPLE_RAW, which is
what IBS currently uses.


My bad. Not sure what you mean by this. We are trying to abstract
as much vendor specific data as possible with this (like perf-mem).


Maddy



Take a look at
commit 98dcf14d7f9c "perf tools: Add kernel AUX area sampling
definitions".  The sample identifier can be used to determine
which vendor's sampling IP's data is in it, and events can
be recorded just by copying the content of the SIER, etc.
registers, and then events get synthesized from the aux
sample at report/inject/annotate etc. time.  This allows
for less sample recording overhead, and moves all the vendor-specific
decoding and common event conversions out to userspace
to figure out.
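For reference, with the interface added by that commit, asking for AUX
data in each sample looks roughly like this (the AUX-area event is
opened as the group leader; the size here is illustrative):

	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size            = sizeof(attr);
	attr.sample_type     = PERF_SAMPLE_IP | PERF_SAMPLE_AUX;
	attr.aux_sample_size = 4096;	/* bytes of AUX data per sample */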

When the AUX buffer data is structured, the tool-side changes added to
present the pipeline data can be re-used.

Not sure I understand: AUX data would be structured according to
each vendor's raw h/w register formats.

Thanks,

Kim


Also worth considering is the support of ARM SPE (Statistical
Profiling Extension), which is their version of IBS.
Whatever gets added needs to cover all three with no limitations.

I thought Intel's various LBR, PEBS, and PT supported providing
similar sample data in perf already, like with perf mem/c2c?

perf-mem is more data-centric, in my opinion; it is geared more towards
memory profiling. So the proposal here is to expose pipeline-related
details like stalls and latencies.

Like I said, I don't see it that way; I see it as 'any particular
vendor's event's extended details', and these pipeline details
have overlap with existing infrastructure within perf, e.g., L2
cache misses.

Kim





Re: [RFC 00/11] perf: Enhancing perf to export processor hazard information

2020-03-04 Thread maddy




On 3/3/20 1:51 AM, Stephane Eranian wrote:

On Mon, Mar 2, 2020 at 2:13 AM Peter Zijlstra  wrote:

On Mon, Mar 02, 2020 at 10:53:44AM +0530, Ravi Bangoria wrote:

Modern processors export such hazard data in Performance
Monitoring Unit (PMU) registers. E.g., the 'Sampled Instruction Event
Register' on IBM PowerPC[1][2] and 'Instruction-Based Sampling' on
AMD[3] provide similar information.

Implementation detail:

A new sample_type called PERF_SAMPLE_PIPELINE_HAZ is introduced.
If it's set, kernel converts arch specific hazard information
into generic format:

   struct perf_pipeline_haz_data {
      /* Instruction/Opcode type: Load, Store, Branch  */
      __u8    itype;
      /* Instruction Cache source */
      __u8    icache;
      /* Instruction suffered hazard in pipeline stage */
      __u8    hazard_stage;
      /* Hazard reason */
      __u8    hazard_reason;
      /* Instruction suffered stall in pipeline stage */
      __u8    stall_stage;
      /* Stall reason */
      __u8    stall_reason;
      __u16   pad;
   };

Kim, does this format indeed work for AMD IBS?


Personally, I don't like the term hazard. This is too IBM Power
specific. We need to find a better term, maybe stall or penalty.


Yes, names can be reworked, and thinking more on it, how about presenting
these as "pipeline" data instead of "hazard" data?


Also worth considering is the support of ARM SPE (Statistical
Profiling Extension), which is their version of IBS.
Whatever gets added needs to cover all three with no limitations.


Thanks for pointing this out. We looked at the ARM SPE spec and it does
provide information like issue latency, translation latency and so on.
And AMD IBS provides data like fetch latency, tag-to-retire latency,
completion-to-retire latency and so on when using Fetch sampling.
So yes, will rework the struct definition to include data from ARM SPE
and AMD IBS also. Will post out a newer version soon.

Thanks for the comments
Maddy



Re: [RFC 02/11] perf/core: Data structure to present hazard data

2020-03-02 Thread maddy




On 3/2/20 3:25 PM, Peter Zijlstra wrote:

On Mon, Mar 02, 2020 at 10:53:46AM +0530, Ravi Bangoria wrote:

From: Madhavan Srinivasan 

Introduce a new perf sample_type PERF_SAMPLE_PIPELINE_HAZ to request the
kernel to provide cpu pipeline hazard data. Also, introduce an arch-independent
structure 'perf_pipeline_haz_data' to pass hazard data to userspace. This
is a generic structure, and arch-specific data needs to be converted to
this format.

Signed-off-by: Madhavan Srinivasan 
Signed-off-by: Ravi Bangoria 
---
  include/linux/perf_event.h|  7 ++
  include/uapi/linux/perf_event.h   | 32 ++-
  kernel/events/core.c  |  6 +
  tools/include/uapi/linux/perf_event.h | 32 ++-
  4 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 547773f5894e..d5b606e3c57d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1001,6 +1001,7 @@ struct perf_sample_data {
u64 stack_user_size;
  
  	u64phys_addr;

+   struct perf_pipeline_haz_data   pipeline_haz;
  } cacheline_aligned;
  
  /* default value for data source */

@@ -1021,6 +1022,12 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->weight = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
+   data->pipeline_haz.itype = PERF_HAZ__ITYPE_NA;
+   data->pipeline_haz.icache = PERF_HAZ__ICACHE_NA;
+   data->pipeline_haz.hazard_stage = PERF_HAZ__PIPE_STAGE_NA;
+   data->pipeline_haz.hazard_reason = PERF_HAZ__HREASON_NA;
+   data->pipeline_haz.stall_stage = PERF_HAZ__PIPE_STAGE_NA;
+   data->pipeline_haz.stall_reason = PERF_HAZ__SREASON_NA;
  }

NAK, Don't touch anything outside of the first cacheline here.


My bad, should have looked at the comment in "struct perf_sample_data {".
Will move it to perf_prepare_sample().
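A sketch of the suggested move, i.e., initializing the hazard fields in
perf_prepare_sample() only when the sample type asks for them, so that
perf_sample_data_init() stays within the first cacheline:

	if (sample_type & PERF_SAMPLE_PIPELINE_HAZ) {
		data->pipeline_haz.itype         = PERF_HAZ__ITYPE_NA;
		data->pipeline_haz.icache        = PERF_HAZ__ICACHE_NA;
		data->pipeline_haz.hazard_stage  = PERF_HAZ__PIPE_STAGE_NA;
		data->pipeline_haz.hazard_reason = PERF_HAZ__HREASON_NA;
		data->pipeline_haz.stall_stage   = PERF_HAZ__PIPE_STAGE_NA;
		data->pipeline_haz.stall_reason  = PERF_HAZ__SREASON_NA;
	}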

Thanks for comments.
Maddy



Re: [PATCH 8/8] perf/tools/pmu-events/powerpc: Add hv_24x7 socket/chip level metric events

2020-02-20 Thread maddy




On 2/14/20 4:33 PM, Kajol Jain wrote:

The hv_24x7 feature in IBM® POWER9™ processor-based servers provides the
facility to continuously collect large numbers of hardware performance
metrics efficiently and accurately.
This patch adds an hv_24x7 json metric file for different socket/chip
resources.

Result:

power9 platform:

command:# ./perf stat --metric-only -M Memory_RD_BW_Chip -C 0
-I 1000 sleep 1

            time  MB  Memory_RD_BW_Chip_0  MB  Memory_RD_BW_Chip_1
     1.000192635                      0.4                      0.0
     1.001695883                      0.0                      0.0

Signed-off-by: Kajol Jain 
---
  .../arch/powerpc/power9/hv_24x7_metrics.json  | 19 +++
  1 file changed, 19 insertions(+)
  create mode 100644 tools/perf/pmu-events/arch/powerpc/power9/hv_24x7_metrics.json

diff --git a/tools/perf/pmu-events/arch/powerpc/power9/hv_24x7_metrics.json b/tools/perf/pmu-events/arch/powerpc/power9/hv_24x7_metrics.json
new file mode 100644
index ..ac38f5540ac6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power9/hv_24x7_metrics.json


Better to have it as nest_metrics.json instead.  Rest looks fine.

Reviewed-by: Madhavan Srinivasan 


@@ -0,0 +1,19 @@
+[
+{
+"MetricExpr": "(hv_24x7@PM_MCS01_128B_RD_DISP_PORT01\\,chip\\=?@ + 
hv_24x7@PM_MCS01_128B_RD_DISP_PORT23\\,chip\\=?@ + hv_24x7@PM_MCS23_128B_RD_DISP_PORT01\\,chip\\=?@ 
+ hv_24x7@PM_MCS23_128B_RD_DISP_PORT23\\,chip\\=?@)",
+"MetricName": "Memory_RD_BW_Chip",
+"MetricGroup": "Memory_BW",
+"ScaleUnit": "1.6e-2MB"
+},
+{
+"MetricExpr": "(hv_24x7@PM_MCS01_128B_WR_DISP_PORT01\\,chip\\=?@ + 
hv_24x7@PM_MCS01_128B_WR_DISP_PORT23\\,chip\\=?@ + hv_24x7@PM_MCS23_128B_WR_DISP_PORT01\\,chip\\=?@ 
+ hv_24x7@PM_MCS23_128B_WR_DISP_PORT23\\,chip\\=?@ )",
+"MetricName": "Memory_WR_BW_Chip",
+"MetricGroup": "Memory_BW",
+"ScaleUnit": "1.6e-2MB"
+},
+{
+"MetricExpr": "(hv_24x7@PM_PB_CYC\\,chip\\=?@ )",
+"MetricName": "PowerBUS_Frequency",
+"ScaleUnit": "2.5e-7GHz"
+}
+]




Re: [PATCH v2 2/5] powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.

2020-02-06 Thread maddy




On 1/21/20 3:47 PM, Anju T Sudhakar wrote:

IMC (In-Memory Collection Counters) does performance monitoring in
two different modes, i.e. accumulation mode (core-imc and thread-imc events)
and trace mode (trace-imc events). A cpu thread can be in either
accumulation mode or trace mode at a time, and this is set via the LDBAR
register in the POWER architecture. The current design does not address
the races between thread-imc and trace-imc events.

The patch implements a global id and lock to avoid the races between
core, trace and thread imc events. With this global id-lock
implementation, the system can run only one of core, thread or trace imc
events at a time, i.e. to run any core-imc events, thread/trace imc events
should not be enabled/monitored.


Changes look fine to me.

Reviewed-by: Madhavan Srinivasan 


Signed-off-by: Anju T Sudhakar 
---
  arch/powerpc/perf/imc-pmu.c | 177 +++-
  1 file changed, 153 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d..2e220f199530 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -44,6 +44,16 @@ static DEFINE_PER_CPU(u64 *, trace_imc_mem);
  static struct imc_pmu_ref *trace_imc_refc;
  static int trace_imc_mem_size;

+/*
+ * Global data structure used to avoid races between thread,
+ * core and trace-imc
+ */
+static struct imc_pmu_ref imc_global_refc = {
+   .lock = __MUTEX_INITIALIZER(imc_global_refc.lock),
+   .id = 0,
+   .refc = 0,
+};
+
  static struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
  {
return container_of(event->pmu, struct imc_pmu, pmu);
@@ -759,6 +769,20 @@ static void core_imc_counters_release(struct perf_event *event)
ref->refc = 0;
}
	mutex_unlock(&ref->lock);
+
+   mutex_lock(&imc_global_refc.lock);
+   if (imc_global_refc.id == IMC_DOMAIN_CORE) {
+   imc_global_refc.refc--;
+   /*
+* If no other thread is running any core-imc
+* event, set the global id to zero.
+*/
+   if (imc_global_refc.refc <= 0) {
+   imc_global_refc.refc = 0;
+   imc_global_refc.id = 0;
+   }
+   }
+   mutex_unlock(&imc_global_refc.lock);
  }

  static int core_imc_event_init(struct perf_event *event)
@@ -779,6 +803,22 @@ static int core_imc_event_init(struct perf_event *event)
if (event->cpu < 0)
return -EINVAL;

+   /*
+* Take the global lock, and make sure
+* no other thread is running any trace OR thread imc event
+*/
+   mutex_lock(&imc_global_refc.lock);
+   if (imc_global_refc.id == 0) {
+   imc_global_refc.id = IMC_DOMAIN_CORE;
+   imc_global_refc.refc++;
+   } else if (imc_global_refc.id == IMC_DOMAIN_CORE) {
+   imc_global_refc.refc++;
+   } else {
+   mutex_unlock(&imc_global_refc.lock);
+   return -EBUSY;
+   }
+   mutex_unlock(&imc_global_refc.lock);
+
event->hw.idx = -1;
pmu = imc_event_to_pmu(event);

@@ -877,7 +917,16 @@ static int ppc_thread_imc_cpu_online(unsigned int cpu)

  static int ppc_thread_imc_cpu_offline(unsigned int cpu)
  {
-   mtspr(SPRN_LDBAR, 0);
+   /*
+* Toggle the bit 0 of LDBAR.
+*
+* If bit 0 of LDBAR is unset, it will stop posting
+* the counter data to memory.
+* For thread-imc, bit 0 of LDBAR will be set to 1 in the
+* event_add function. So toggle this bit here, to stop the updates
+* to memory in the cpu_offline path.
+*/
+   mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) ^ (1UL << 63)));
return 0;
  }

@@ -889,6 +938,24 @@ static int thread_imc_cpu_init(void)
  ppc_thread_imc_cpu_offline);
  }

+static void thread_imc_counters_release(struct perf_event *event)
+{
+
+   mutex_lock(&imc_global_refc.lock);
+   if (imc_global_refc.id == IMC_DOMAIN_THREAD) {
+   imc_global_refc.refc--;
+   /*
+* If no other thread is running any thread-imc
+* event, set the global id to zero.
+*/
+   if (imc_global_refc.refc <= 0) {
+   imc_global_refc.refc = 0;
+   imc_global_refc.id = 0;
+   }
+   }
+   mutex_unlock(&imc_global_refc.lock);
+}
+
  static int thread_imc_event_init(struct perf_event *event)
  {
u32 config = event->attr.config;
@@ -905,6 +972,27 @@ static int thread_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;

+   mutex_lock(&imc_global_refc.lock);
+   /*
+* Check if any other thread is running
+* core-engine, if not set the global id to
+* thread-imc.
+*/
+   if (imc_global_refc.id == 0) {
+   imc_global_refc.id = IMC_DOMAIN_THREAD;
Re: [PATCH v2 1/5] powerpc/powernv: Re-enable imc trace-mode in kernel

2020-02-06 Thread maddy




On 1/21/20 3:47 PM, Anju T Sudhakar wrote:

commit 249fad734a25 ("powerpc/perf: Disable trace_imc pmu")
disables IMC (In-Memory Collection) trace-mode in the kernel, since frequent
mode switching between accumulation mode and trace mode via the SPR LDBAR
in the hardware can trigger a checkstop (system crash).

Patch to re-enable imc trace-mode in the kernel.

The following patch in this series will address the mode-switching issue
by implementing a global lock, and will restrict the usage of
accumulation and trace mode to one at a time.


Reviewed-by: Madhavan Srinivasan 



Signed-off-by: Anju T Sudhakar 
---
  arch/powerpc/platforms/powernv/opal-imc.c | 9 +
  1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 000b350d4060..3b4518f4b643 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -278,14 +278,7 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
domain = IMC_DOMAIN_THREAD;
break;
case IMC_TYPE_TRACE:
-   /*
-* FIXME. Using trace_imc events to monitor application
-* or KVM thread performance can cause a checkstop
-* (system crash).
-* Disable it for now.
-*/
-   pr_info_once("IMC: disabling trace_imc PMU\n");
-   domain = -1;
+   domain = IMC_DOMAIN_TRACE;
break;
default:
pr_warn("IMC Unknown Device type \n");




Re: [RFC] per-CPU usage in perf core-book3s

2020-02-04 Thread maddy




On 1/27/20 8:36 PM, Sebastian Andrzej Siewior wrote:

I've been looking at usage of per-CPU variable cpu_hw_events in
arch/powerpc/perf/core-book3s.c.

power_pmu_enable() and power_pmu_disable() (pmu::pmu_enable() and
pmu::pmu_disable()) are accessing the variable and the callbacks are
invoked always with disabled interrupts.

power_pmu_event_init() (pmu::event_init()) is invoked from preemptible
context and uses get_cpu_var() to obtain a stable pointer (by disabling
preemption).

pmu::pmu_enable() and pmu::pmu_disable() can be invoked via a hrtimer
(perf_mux_hrtimer_handler()) and it invokes pmu::pmu_enable() and
pmu::pmu_disable() as part of the callback.

Is there anything that prevents the timer callback to interrupt
pmu::event_init() while it is accessing per-CPU data?


Sorry for the delayed response.

Yes, currently we don't have anything that prevents the timer
callback from interrupting pmu::event_init. Nice catch. Thanks for
pointing this out.

Looking at the code, the per-cpu variable accesses are made to
check for constraints and for Branch Stack (BHRB). So we could
wrap this block of pmu::event_init with local_irq_save/restore.
Will send a patch to fix it.
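A sketch of the shape of that fix (assuming the per-CPU access in
power_pmu_event_init() is bracketed like so):

	unsigned long flags;
	struct cpu_hw_events *cpuhw;

	local_irq_save(flags);
	cpuhw = this_cpu_ptr(&cpu_hw_events);
	/* ... constraint checks and BHRB setup on cpuhw ... */
	local_irq_restore(flags);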


Maddy



Sebastian




Re: [PATCH 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

2020-01-06 Thread maddy




On 12/27/19 10:59 AM, Sukadev Bhattiprolu wrote:

Sukadev Bhattiprolu [suka...@linux.ibm.com] wrote:

Ultravisor disables some CPU features like BHRB, EBB and PMU in
secure virtual machines (SVMs). Skip accessing those registers
in SVMs to avoid getting a Program Interrupt.

Here is an updated patch that explicitly includes the required header
in some files to fix the reported build errors.
---

From: Sukadev Bhattiprolu 
Date: Thu, 16 May 2019 20:57:12 -0500
Subject: [PATCH 2/2] powerpc/pseries/svm: Disable BHRB/EBB/PMU access

Ultravisor disables some CPU features like BHRB, EBB and PMU in
secure virtual machines (SVMs). Skip accessing those registers
in SVMs to avoid getting a Program Interrupt.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]
- [Michael Ellerman] Optimize the code using FW_FEATURE_SVM
- Merged EBB/BHRB and PMU patches into one and reorganized code.
- Fix some build errors reported by 
---
  arch/powerpc/kernel/cpu_setup_power.S   | 21 
  arch/powerpc/kernel/process.c   | 23 ++---
  arch/powerpc/kvm/book3s_hv.c| 33 -
  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 32 +++-
  arch/powerpc/kvm/book3s_hv_tm_builtin.c | 21 ++--
  arch/powerpc/perf/core-book3s.c |  6 +
  arch/powerpc/xmon/xmon.c| 30 +-
  7 files changed, 114 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index a460298c7ddb..9e895d8db468 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -206,14 +206,35 @@ __init_PMU_HV_ISA207:
blr

  __init_PMU:
+#ifdef CONFIG_PPC_SVM
+   /*
+    * SVMs are restricted from accessing the PMU, so skip.
+    */
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip1


I know all the MMCR* are loaded with 0. But
it would be better if the PEF code loaded MMCR0
with the freeze bits on. I will send a separate
patch to handle the non-svm case.
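Roughly, the idea being (MMCR0_FC is the freeze-counters bit; a sketch,
not the actual patch):

	mtspr(SPRN_MMCR0, MMCR0_FC);	/* freeze counters instead of 0 */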

Rest looks good.
Acked-by: Madhavan Srinivasan 


+#endif
li  r5,0
mtspr   SPRN_MMCRA,r5
mtspr   SPRN_MMCR0,r5
mtspr   SPRN_MMCR1,r5
mtspr   SPRN_MMCR2,r5
+skip1:
blr

  __init_PMU_ISA207:
+
+#ifdef CONFIG_PPC_SVM
+   /*
+    * SVMs are restricted from accessing the PMU, so skip.
+    */
+   mfmsr   r5
+   rldicl  r5, r5, 64-MSR_S_LG, 62
+   cmpwi   r5,1
+   beq skip2
+#endif
li  r5,0
mtspr   SPRN_MMCRS,r5
+skip2:
blr
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 639ceae7da9d..83c7c4119305 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -64,6 +64,7 @@
  #include 
  #include 
  #include 
+#include 

  #include 
  #include 
@@ -1059,9 +1060,11 @@ static inline void save_sprs(struct thread_struct *t)
t->dscr = mfspr(SPRN_DSCR);

if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   t->bescr = mfspr(SPRN_BESCR);
-   t->ebbhr = mfspr(SPRN_EBBHR);
-   t->ebbrr = mfspr(SPRN_EBBRR);
+   if (!is_secure_guest()) {
+   t->bescr = mfspr(SPRN_BESCR);
+   t->ebbhr = mfspr(SPRN_EBBHR);
+   t->ebbrr = mfspr(SPRN_EBBRR);
+   }

t->fscr = mfspr(SPRN_FSCR);

@@ -1097,12 +1100,14 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
}

if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   if (old_thread->bescr != new_thread->bescr)
-   mtspr(SPRN_BESCR, new_thread->bescr);
-   if (old_thread->ebbhr != new_thread->ebbhr)
-   mtspr(SPRN_EBBHR, new_thread->ebbhr);
-   if (old_thread->ebbrr != new_thread->ebbrr)
-   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   if (!is_secure_guest()) {
+   if (old_thread->bescr != new_thread->bescr)
+   mtspr(SPRN_BESCR, new_thread->bescr);
+   if (old_thread->ebbhr != new_thread->ebbhr)
+   mtspr(SPRN_EBBHR, new_thread->ebbhr);
+   if (old_thread->ebbrr != new_thread->ebbrr)
+   mtspr(SPRN_EBBRR, new_thread->ebbrr);
+   }

if (old_thread->fscr != new_thread->fscr)
mtspr(SPRN_FSCR, new_thread->fscr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 709cf1fd4cf4..29a2640108d1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -42,6 +42,7 @@
  #include 
  #include 
  #include 
+#include 

  #include 
  #include 
@@ -3568,9 +3569,11 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,

Re: [PATCH v2] powerpc/kernel/sysfs: Add PMU_SYSFS config option to enable PMU SPRs sysfs file creation

2019-11-26 Thread maddy



On 11/13/19 9:40 PM, Kajol Jain wrote:

Many of the performance monitoring unit (PMU) SPRs are
exposed in sysfs. The "perf" API is the primary interface to program
the PMU and collect counter data in the system. So expose these
PMU SPRs only in the absence of CONFIG_PERF_EVENTS.

The patch adds a new CONFIG option, 'CONFIG_PMU_SYSFS'. The new config
option is used in kernel/sysfs.c for PMU SPR sysfs file creation, and
this new option is enabled only if the 'CONFIG_PERF_EVENTS' option is
disabled.

Tested this patch with the CONFIG_PERF_EVENTS option enabled and disabled
on powernv and pseries machines.
Also did compilation testing for different architectures, including
x86, mips, mips64, alpha and arm, and with the book3s_32.config option.


Reviewed-by: Madhavan Srinivasan 



Signed-off-by: Kajol Jain 
---
  arch/powerpc/kernel/sysfs.c| 21 +
  arch/powerpc/platforms/Kconfig.cputype |  8 
  2 files changed, 29 insertions(+)

---
Changelog:
v1 -> v2
- Added a new config option 'PMU_SYSFS' for PMU SPRs file creation
   rather than using the PERF_EVENTS config option directly, and made
   sure the SPR files are created only if 'CONFIG_PERF_EVENTS' is disabled.
---
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 80a676da11cb..b7c01f1ef236 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -457,16 +457,21 @@ static ssize_t __used \

  #if defined(CONFIG_PPC64)
  #define HAS_PPC_PMC_CLASSIC   1
+#ifdef CONFIG_PMU_SYSFS
  #define HAS_PPC_PMC_IBM   1
+#endif
  #define HAS_PPC_PMC_PA6T  1
  #elif defined(CONFIG_PPC_BOOK3S_32)
  #define HAS_PPC_PMC_CLASSIC   1
+#ifdef CONFIG_PMU_SYSFS
  #define HAS_PPC_PMC_IBM   1
  #define HAS_PPC_PMC_G41
  #endif
+#endif


  #ifdef HAS_PPC_PMC_CLASSIC
+#ifdef CONFIG_PMU_SYSFS
  SYSFS_PMCSETUP(mmcr0, SPRN_MMCR0);
  SYSFS_PMCSETUP(mmcr1, SPRN_MMCR1);
  SYSFS_PMCSETUP(pmc1, SPRN_PMC1);
@@ -485,6 +490,10 @@ SYSFS_PMCSETUP(pmc7, SPRN_PMC7);
  SYSFS_PMCSETUP(pmc8, SPRN_PMC8);

  SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
+#endif /* CONFIG_PPC64 */
+#endif /* CONFIG_PMU_SYSFS */
+
+#ifdef CONFIG_PPC64
  SYSFS_SPRSETUP(purr, SPRN_PURR);
  SYSFS_SPRSETUP(spurr, SPRN_SPURR);
  SYSFS_SPRSETUP(pir, SPRN_PIR);
@@ -495,7 +504,9 @@ SYSFS_SPRSETUP(tscr, SPRN_TSCR);
enable write when needed with a separate function.
Lets be conservative and default to pseries.
  */
+#ifdef CONFIG_PMU_SYSFS
  static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
+#endif /* CONFIG_PMU_SYSFS */
  static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
  static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
  static DEVICE_ATTR(pir, 0400, show_pir, NULL);
@@ -606,12 +617,14 @@ static void sysfs_create_dscr_default(void)
  #endif /* CONFIG_PPC64 */

  #ifdef HAS_PPC_PMC_PA6T
+#ifdef CONFIG_PMU_SYSFS
  SYSFS_PMCSETUP(pa6t_pmc0, SPRN_PA6T_PMC0);
  SYSFS_PMCSETUP(pa6t_pmc1, SPRN_PA6T_PMC1);
  SYSFS_PMCSETUP(pa6t_pmc2, SPRN_PA6T_PMC2);
  SYSFS_PMCSETUP(pa6t_pmc3, SPRN_PA6T_PMC3);
  SYSFS_PMCSETUP(pa6t_pmc4, SPRN_PA6T_PMC4);
  SYSFS_PMCSETUP(pa6t_pmc5, SPRN_PA6T_PMC5);
+#endif /* CONFIG_PMU_SYSFS */
  #ifdef CONFIG_DEBUG_MISC
  SYSFS_SPRSETUP(hid0, SPRN_HID0);
  SYSFS_SPRSETUP(hid1, SPRN_HID1);
@@ -644,6 +657,7 @@ SYSFS_SPRSETUP(tsr3, SPRN_PA6T_TSR3);
  #endif /* CONFIG_DEBUG_MISC */
  #endif /* HAS_PPC_PMC_PA6T */

+#ifdef CONFIG_PMU_SYSFS
  #ifdef HAS_PPC_PMC_IBM
  static struct device_attribute ibm_common_attrs[] = {
__ATTR(mmcr0, 0600, show_mmcr0, store_mmcr0),
@@ -671,9 +685,11 @@ static struct device_attribute classic_pmc_attrs[] = {
__ATTR(pmc8, 0600, show_pmc8, store_pmc8),
  #endif
  };
+#endif /* CONFIG_PMU_SYSFS */

  #ifdef HAS_PPC_PMC_PA6T
  static struct device_attribute pa6t_attrs[] = {
+#ifdef CONFIG_PMU_SYSFS
__ATTR(mmcr0, 0600, show_mmcr0, store_mmcr0),
__ATTR(mmcr1, 0600, show_mmcr1, store_mmcr1),
__ATTR(pmc0, 0600, show_pa6t_pmc0, store_pa6t_pmc0),
@@ -682,6 +698,7 @@ static struct device_attribute pa6t_attrs[] = {
__ATTR(pmc3, 0600, show_pa6t_pmc3, store_pa6t_pmc3),
__ATTR(pmc4, 0600, show_pa6t_pmc4, store_pa6t_pmc4),
__ATTR(pmc5, 0600, show_pa6t_pmc5, store_pa6t_pmc5),
+#endif /* CONFIG_PMU_SYSFS */
  #ifdef CONFIG_DEBUG_MISC
__ATTR(hid0, 0600, show_hid0, store_hid0),
__ATTR(hid1, 0600, show_hid1, store_hid1),
@@ -787,8 +804,10 @@ static int register_cpu_online(unsigned int cpu)
device_create_file(s, &attrs[i]);

  #ifdef CONFIG_PPC64
+#ifdef CONFIG_PMU_SYSFS
if (cpu_has_feature(CPU_FTR_MMCRA))
device_create_file(s, &dev_attr_mmcra);
+#endif /* CONFIG_PMU_SYSFS */

if (cpu_has_feature(CPU_FTR_PURR)) {
if (!firmware_has_feature(FW_FEATURE_LPAR))
@@ -876,8 +895,10 @@ static int unregister_cpu_online(unsigned int cpu)
device_remove_file(s, &attrs[i]);

  #ifdef CONFIG_PPC64
+#ifdef CONFIG_PMU_SYSFS
if (cpu_has_feature(CPU_FTR_MMCRA))
device_remove_file(s, &dev_attr_mmcra);

Re: [PATCH] powerpc/perf: Disable trace_imc pmu

2019-11-13 Thread maddy



On 11/14/19 12:50 PM, Oliver O'Halloran wrote:

On Thu, Nov 14, 2019 at 6:19 PM Madhavan Srinivasan
 wrote:

When a root user or a user with CAP_SYS_ADMIN
privilege uses trace_imc performance monitoring
unit events to monitor application or KVM threads,
it may result in a checkstop (system crash), the
reason being frequent switching of the
"trace/accumulation" mode of the In-Memory
Collection hardware. This patch disables the
trace_imc pmu unit; it will be re-enabled at a
later stage with a fix patchset.
---
  arch/powerpc/platforms/powernv/opal-imc.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index e04b20625cb9..5790f078771f 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -285,7 +285,12 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
 domain = IMC_DOMAIN_THREAD;
 break;
 case IMC_TYPE_TRACE:
-   domain = IMC_DOMAIN_TRACE;
+   /* Using trace_imc events to monitor
+* application or KVM thread performance
+* may result in a checkstop (system crash).
+* So disabling it for now.
+*/
+   domain = -1;
 break;
 default:
 pr_warn("IMC Unknown Device type \n");
--
2.21.0


Does this need a Fixes: tag?
I was thinking of adding this commit to the Fixes: tag of the
follow-up fix patchset. But if that's not right, I can add the
Fixes: tag along with a request to send it to stable, and post a v2.


Maddy



Re: [PATCH] powerpc/perf: Add functionality to check PERF_EVENTS config option

2019-10-03 Thread maddy



On 10/1/19 2:51 PM, Kajol Jain wrote:

Perf is the primary interface to program the performance monitoring
unit (PMU) and collect counter data in the system.
But currently the PMU register files are created in
/sys/devices/system/cpu/cpu* without checking the CONFIG_PERF_EVENTS
option. These include the PMC* and MMCR* SPRs.
The patch ties sysfs PMU SPR file creation to the CONFIG_PERF_EVENTS option.

Tested this patch with the CONFIG_PERF_EVENTS option enabled and disabled
on powernv and pseries machines.
Also did compilation testing with book3s_32.config.



Title of the patch should be

powerpc/kernel/sysfs: Add PERF_EVENT config option check to PMU SPRs

Rest looks fine.

Reviewed-by: Madhavan Srinivasan 



Signed-off-by: Kajol Jain 
---
  arch/powerpc/kernel/sysfs.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index e2147d7c9e72..263023cc6308 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -456,16 +456,21 @@ static ssize_t __used \

  #if defined(CONFIG_PPC64)
  #define HAS_PPC_PMC_CLASSIC   1
+#if defined(CONFIG_PERF_EVENTS)
  #define HAS_PPC_PMC_IBM   1
+#endif
  #define HAS_PPC_PMC_PA6T  1
  #elif defined(CONFIG_PPC_BOOK3S_32)
  #define HAS_PPC_PMC_CLASSIC   1
+#if defined(CONFIG_PERF_EVENTS)
  #define HAS_PPC_PMC_IBM   1
  #define HAS_PPC_PMC_G41
  #endif
+#endif


  #ifdef HAS_PPC_PMC_CLASSIC
+#ifdef HAS_PPC_PMC_IBM
  SYSFS_PMCSETUP(mmcr0, SPRN_MMCR0);
  SYSFS_PMCSETUP(mmcr1, SPRN_MMCR1);
  SYSFS_PMCSETUP(pmc1, SPRN_PMC1);
@@ -484,6 +489,10 @@ SYSFS_PMCSETUP(pmc7, SPRN_PMC7);
  SYSFS_PMCSETUP(pmc8, SPRN_PMC8);

  SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
+#endif /* CONFIG_PPC64 */
+#endif /* HAS_PPC_PMC_IBM */
+
+#ifdef CONFIG_PPC64
  SYSFS_SPRSETUP(purr, SPRN_PURR);
  SYSFS_SPRSETUP(spurr, SPRN_SPURR);
  SYSFS_SPRSETUP(pir, SPRN_PIR);
@@ -494,7 +503,9 @@ SYSFS_SPRSETUP(tscr, SPRN_TSCR);
enable write when needed with a separate function.
Lets be conservative and default to pseries.
  */
+#ifdef HAS_PPC_PMC_IBM
  static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
+#endif /* HAS_PPC_PMC_IBM */
  static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
  static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
  static DEVICE_ATTR(pir, 0400, show_pir, NULL);
@@ -605,12 +616,14 @@ static void sysfs_create_dscr_default(void)
  #endif /* CONFIG_PPC64 */

  #ifdef HAS_PPC_PMC_PA6T
+#ifdef HAS_PPC_PMC_IBM
  SYSFS_PMCSETUP(pa6t_pmc0, SPRN_PA6T_PMC0);
  SYSFS_PMCSETUP(pa6t_pmc1, SPRN_PA6T_PMC1);
  SYSFS_PMCSETUP(pa6t_pmc2, SPRN_PA6T_PMC2);
  SYSFS_PMCSETUP(pa6t_pmc3, SPRN_PA6T_PMC3);
  SYSFS_PMCSETUP(pa6t_pmc4, SPRN_PA6T_PMC4);
  SYSFS_PMCSETUP(pa6t_pmc5, SPRN_PA6T_PMC5);
+#endif /* HAS_PPC_PMC_IBM */
  #ifdef CONFIG_DEBUG_MISC
  SYSFS_SPRSETUP(hid0, SPRN_HID0);
  SYSFS_SPRSETUP(hid1, SPRN_HID1);
@@ -648,7 +661,6 @@ static struct device_attribute ibm_common_attrs[] = {
__ATTR(mmcr0, 0600, show_mmcr0, store_mmcr0),
__ATTR(mmcr1, 0600, show_mmcr1, store_mmcr1),
  };
-#endif /* HAS_PPC_PMC_G4 */

  #ifdef HAS_PPC_PMC_G4
  static struct device_attribute g4_common_attrs[] = {
@@ -670,9 +682,11 @@ static struct device_attribute classic_pmc_attrs[] = {
__ATTR(pmc8, 0600, show_pmc8, store_pmc8),
  #endif
  };
+#endif /* HAS_PPC_PMC_IBM */

  #ifdef HAS_PPC_PMC_PA6T
  static struct device_attribute pa6t_attrs[] = {
+#ifdef HAS_PPC_PMC_IBM
__ATTR(mmcr0, 0600, show_mmcr0, store_mmcr0),
__ATTR(mmcr1, 0600, show_mmcr1, store_mmcr1),
__ATTR(pmc0, 0600, show_pa6t_pmc0, store_pa6t_pmc0),
@@ -681,6 +695,7 @@ static struct device_attribute pa6t_attrs[] = {
__ATTR(pmc3, 0600, show_pa6t_pmc3, store_pa6t_pmc3),
__ATTR(pmc4, 0600, show_pa6t_pmc4, store_pa6t_pmc4),
__ATTR(pmc5, 0600, show_pa6t_pmc5, store_pa6t_pmc5),
+#endif /* HAS_PPC_PMC_IBM */
  #ifdef CONFIG_DEBUG_MISC
__ATTR(hid0, 0600, show_hid0, store_hid0),
__ATTR(hid1, 0600, show_hid1, store_hid1),
@@ -769,8 +784,10 @@ static int register_cpu_online(unsigned int cpu)
device_create_file(s, &attrs[i]);

  #ifdef CONFIG_PPC64
+#ifdef HAS_PPC_PMC_IBM
if (cpu_has_feature(CPU_FTR_MMCRA))
device_create_file(s, &dev_attr_mmcra);
+#endif

if (cpu_has_feature(CPU_FTR_PURR)) {
if (!firmware_has_feature(FW_FEATURE_LPAR))
@@ -858,8 +875,10 @@ static int unregister_cpu_online(unsigned int cpu)
device_remove_file(s, &attrs[i]);

  #ifdef CONFIG_PPC64
+#ifdef HAS_PPC_PMC_IBM
if (cpu_has_feature(CPU_FTR_MMCRA))
device_remove_file(s, &dev_attr_mmcra);
+#endif

if (cpu_has_feature(CPU_FTR_PURR))
device_remove_file(s, &dev_attr_purr);




Re: [PATCH v2] powerpc/imc: Dont create debugfs files for cpu-less nodes

2019-07-15 Thread maddy



On 7/11/19 10:23 AM, Michael Ellerman wrote:

Hi Maddy,

Madhavan Srinivasan  writes:

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 186109bdd41b..e04b20625cb9 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -69,20 +69,20 @@ static void export_imc_mode_and_cmd(struct device_node 
*node,
if (of_property_read_u32(node, "cb_offset", _offset))
cb_offset = IMC_CNTL_BLK_OFFSET;
  
-	for_each_node(nid) {
-		loc = (u64)(pmu_ptr->mem_info[chip].vbase) + cb_offset;
+	while (ptr->vbase != NULL) {

This means you'll bail out as soon as you find a node with no vbase, but
it's possible we could have a CPU-less node intermingled with other
nodes.

Nice catch. Thanks for the review, will fix it.

Maddy



So I think you want to keep the for loop, but continue if you see a NULL
vbase?



+   loc = (u64)(ptr->vbase) + cb_offset;
imc_mode_addr = (u64 *)(loc + IMC_CNTL_BLK_MODE_OFFSET);
-   sprintf(mode, "imc_mode_%d", nid);
+   sprintf(mode, "imc_mode_%d", (u32)(ptr->id));
if (!imc_debugfs_create_x64(mode, 0600, imc_debugfs_parent,
imc_mode_addr))
goto err;
  
  		imc_cmd_addr = (u64 *)(loc + IMC_CNTL_BLK_CMD_OFFSET);

-   sprintf(cmd, "imc_cmd_%d", nid);
+   sprintf(cmd, "imc_cmd_%d", (u32)(ptr->id));
if (!imc_debugfs_create_x64(cmd, 0600, imc_debugfs_parent,
imc_cmd_addr))
goto err;
-   chip++;
+   ptr++;
}
return;
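i.e., something of this shape (a sketch only; the loop bound name is
illustrative):

	for (i = 0; i < nr_chips; i++) {
		if (!ptr[i].vbase)
			continue;	/* CPU-less node, no counter memory */
		loc = (u64)(ptr[i].vbase) + cb_offset;
		/* ... create the imc_mode_<id> / imc_cmd_<id> files ... */
	}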

cheers





Re: [PATCH v4 6/8] KVM: PPC: Ultravisor: Restrict LDBAR access

2019-07-01 Thread maddy



On 01/07/19 11:24 AM, Alexey Kardashevskiy wrote:


On 29/06/2019 06:08, Claudio Carvalho wrote:

When the ultravisor firmware is available, it takes control over the
LDBAR register. In this case, thread-imc updates and save/restore
operations on the LDBAR register are handled by ultravisor.

What does LDBAR do? "Power ISA™ Version 3.0 B" or "User’s Manual POWER9
Processor" do not tell.
LDBAR is a per-thread SPR used by the thread-imc pmu to dump the counter
data into memory. LDBAR contains a memory address along with a few other
configuration bits (it is populated by the thread-imc pmu driver). It is
populated and enabled only when any of the thread-imc pmu events are
monitored.
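Roughly how the thread-imc driver programs it in event_add (constants
from asm/imc-pmu.h; treat the exact mask/bit names here as
illustrative):

	u64 ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) |
			  THREAD_IMC_ENABLE;
	mtspr(SPRN_LDBAR, ldbar_value);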

Maddy




Signed-off-by: Claudio Carvalho 
Reviewed-by: Ram Pai 
Reviewed-by: Ryan Grimm 
Acked-by: Madhavan Srinivasan 
Acked-by: Paul Mackerras 
---
  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 2 ++
  arch/powerpc/platforms/powernv/idle.c | 6 --
  arch/powerpc/platforms/powernv/opal-imc.c | 4 
  3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f9b2620fbecd..cffb365d9d02 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -375,8 +375,10 @@ BEGIN_FTR_SECTION
mtspr   SPRN_RPR, r0
ld  r0, KVM_SPLIT_PMMAR(r6)
mtspr   SPRN_PMMAR, r0
+BEGIN_FW_FTR_SECTION_NESTED(70)
ld  r0, KVM_SPLIT_LDBAR(r6)
mtspr   SPRN_LDBAR, r0
+END_FW_FTR_SECTION_NESTED(FW_FEATURE_ULTRAVISOR, 0, 70)
isync
  FTR_SECTION_ELSE
/* On P9 we use the split_info for coordinating LPCR changes */
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 77f2e0a4ee37..5593a2d55959 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -679,7 +679,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
sprs.ptcr   = mfspr(SPRN_PTCR);
sprs.rpr= mfspr(SPRN_RPR);
sprs.tscr   = mfspr(SPRN_TSCR);
-   sprs.ldbar  = mfspr(SPRN_LDBAR);
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   sprs.ldbar  = mfspr(SPRN_LDBAR);
  
  		sprs_saved = true;
  
@@ -762,7 +763,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)

mtspr(SPRN_PTCR,sprs.ptcr);
mtspr(SPRN_RPR, sprs.rpr);
mtspr(SPRN_TSCR,sprs.tscr);
-   mtspr(SPRN_LDBAR,   sprs.ldbar);
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   mtspr(SPRN_LDBAR,   sprs.ldbar);
  
  	if (pls >= pnv_first_tb_loss_level) {

/* TB loss */
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 1b6932890a73..5fe2d4526cbc 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -254,6 +254,10 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
bool core_imc_reg = false, thread_imc_reg = false;
u32 type;
  
+	/* Disable IMC devices, when Ultravisor is enabled. */

+   if (firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   return -EACCES;
+
/*
 * Check whether this is kdump kernel. If yes, force the engines to
 * stop and return.





Re: [RFC PATCH 4/7]powerpc/powernv: Add OPAL support for Nest pmu

2015-03-12 Thread maddy



On Thursday 12 March 2015 04:27 AM, Stewart Smith wrote:

Madhavan Srinivasan ma...@linux.vnet.ibm.com writes:

Nest counters can be configured via the PORE engine, and OPAL
provides an interface call to it. The PORE engine also does the
work of moving the counter data to memory.

Do you have the associated skiboot patch that implements this firmware
call? I haven't seen it on the skiboot list yet :)

Hi Stewart,

OPAL side code is under development. Will post the patches soon to
the skiboot mailing list.


Regards
Maddy
