Re: [RFC PATCH] perf/kvm: Guest Symbol Resolution for powerpc

2015-06-17 Thread Hemant Kumar

Hi Arnaldo,

On 06/16/2015 09:08 PM, Arnaldo Carvalho de Melo wrote:

Em Tue, Jun 16, 2015 at 08:20:53AM +0530, Hemant Kumar escreveu:

"perf kvm {record|report}" is used to record and report the performance
profile of any workload on a guest. From the host, we can collect
guest kernel statistics, which are useful for finding contention
in guest kernel symbols for a certain workload.

This feature is not available on powerpc because "perf" relies on the
"cycles" event (a PMU event) to profile the guest. However, for powerpc,
this can't be used from the host because the PMUs are controlled by the
guest rather than the host.

Due to this problem, we need a different approach to profile the
workload in the guest. There exists a tracepoint "kvm_hv:kvm_guest_exit"
in powerpc which is hit whenever any of the threads exit the guest
context. The guest instruction pointer dumped along with this
tracepoint data in the field "pc", can be used as guest instruction
pointer while postprocessing the trace data to map this IP to symbol
from guest.kallsyms.

However, to have some kind of periodicity, we can't use all the kvm
exits, rather exits which are bound to happen in certain intervals.
HV_DECREMENTER Interrupt forces the threads to exit after an interval
of 10 ms.

This patch makes use of the "kvm_guest_exit" tracepoint and checks the
exit reason for any kvm exit. If it is HV_DECREMENTER, then the
instruction pointer dumped along with this tracepoint is retrieved and
mapped with the guest kallsyms.

This patch is a prototype asking for suggestions/comments on whether
the approach is right, or whether there is a better way (like using
a different event to profile for, etc.) to profile the guest from the
host.

Thank You.

Signed-off-by: Hemant Kumar 
---
  tools/perf/arch/powerpc/Makefile|  1 +
  tools/perf/arch/powerpc/util/parse-tp.c | 55 +
  tools/perf/builtin-report.c |  9 ++
  tools/perf/util/event.c |  7 -
  tools/perf/util/evsel.c |  7 +
  tools/perf/util/evsel.h |  4 +++
  tools/perf/util/session.c   |  7 +++--
  7 files changed, 86 insertions(+), 4 deletions(-)
  create mode 100644 tools/perf/arch/powerpc/util/parse-tp.c

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 6f7782b..992a0d5 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -4,3 +4,4 @@ LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/skip-callchain-idx.o
  endif
  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/parse-tp.o
diff --git a/tools/perf/arch/powerpc/util/parse-tp.c b/tools/perf/arch/powerpc/util/parse-tp.c
new file mode 100644
index 0000000..4c6e49c
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/parse-tp.c
@@ -0,0 +1,55 @@
+#include "../../util/evsel.h"
+#include "../../util/trace-event.h"
+#include "../../util/session.h"
+
+#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit"
+#define HV_DECREMENTER 2432
+#define HV_BIT 3
+#define PR_BIT 49
+#define PPC_MAX 63
+
+/*
+ * Get the instruction pointer from the tracepoint data
+ */
+u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *data)
+{
+   u64 tp_ip = data->ip;
+   int trap;
+
+   if (!strcmp(KVMPPC_EXIT, evsel->name)) {

Can't you cache this somewhere? I.e. something like
  
	static int kvmppc_exit = -1;

	if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
		goto out;

	if (unlikely(kvmppc_exit == -1)) {
		if (strcmp(KVMPPC_EXIT, evsel->name))
			goto out;

		kvmppc_exit = evsel->attr.config;
	} else if (kvmppc_exit != evsel->attr.config)
		goto out;


Will try this.




+   trap = raw_field_value(evsel->tp_format, "trap", data->raw_data);
+
+   if (trap == HV_DECREMENTER)
+       tp_ip = raw_field_value(evsel->tp_format, "pc",
+                               data->raw_data);

out:


+   return tp_ip;
+}


Also we have:

u64 perf_evsel__intval(struct perf_evsel *evsel,
   struct perf_sample *sample, const char *name);

So:

trap = perf_evsel__intval(evsel, sample, "trap");

And:

tp_ip = perf_evsel__intval(evsel, sample, "pc");

That makes it a bit shorter and lets any optimization of the lookup of
that field by name be done in the evsel code.


Thanks, missed perf_evsel__intval, will use this in the next iteration.
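
Putting the two suggestions above together, the next iteration of
arch__get_ip() might look roughly like the sketch below. This is not code
from the thread: struct mock_evsel, struct mock_sample, mock_intval() and
sketch_get_ip() are hypothetical stand-ins for struct perf_evsel,
struct perf_sample, perf_evsel__intval() and arch__get_ip(), so that the
logic can be compiled outside the perf tree. It also assumes that only one
kvm_hv:kvm_guest_exit event is present, since a single id is cached.

```c
#include <stdint.h>
#include <string.h>

#define KVMPPC_EXIT     "kvm_hv:kvm_guest_exit"
#define HV_DECREMENTER  2432	/* 0x980, value taken from the patch */
#define TYPE_TRACEPOINT 2	/* stand-in for PERF_TYPE_TRACEPOINT */

/* Hypothetical stand-ins for struct perf_evsel and struct perf_sample. */
struct mock_evsel {
	uint32_t type;		/* models evsel->attr.type */
	uint64_t config;	/* models evsel->attr.config (tracepoint id) */
	const char *name;
};

struct mock_sample {
	uint64_t ip;		/* host IP recorded with the sample */
	uint64_t trap;		/* "trap" field of the tracepoint */
	uint64_t pc;		/* "pc" field of the tracepoint */
};

/* Stand-in for perf_evsel__intval(): fetch a tracepoint field by name. */
static uint64_t mock_intval(const struct mock_evsel *evsel,
			    const struct mock_sample *sample, const char *name)
{
	(void)evsel;
	if (!strcmp(name, "trap"))
		return sample->trap;
	if (!strcmp(name, "pc"))
		return sample->pc;
	return 0;
}

/* Return the guest pc for HV_DECREMENTER exits, else the sample's own ip. */
static uint64_t sketch_get_ip(const struct mock_evsel *evsel,
			      const struct mock_sample *sample)
{
	static int have_id;		/* set once the name has matched */
	static uint64_t kvmppc_exit_id;	/* cached tracepoint id */

	if (evsel->type != TYPE_TRACEPOINT)
		return sample->ip;

	if (!have_id) {
		/* Slow path: compare names until the exit event is seen. */
		if (strcmp(KVMPPC_EXIT, evsel->name))
			return sample->ip;
		kvmppc_exit_id = evsel->config;
		have_id = 1;
	} else if (evsel->config != kvmppc_exit_id) {
		/* Fast path: an integer compare replaces strcmp(). */
		return sample->ip;
	}

	if (mock_intval(evsel, sample, "trap") == HV_DECREMENTER)
		return mock_intval(evsel, sample, "pc");

	return sample->ip;
}
```

After the first matching sample, every later sample costs one integer
comparison instead of a string comparison.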


- Arnaldo


+
+/*
+ * Get the HV and PR bits and accordingly, determine the cpumode
+ */
+u8 arch__get_cpumode(union perf_event *event, struct perf_evsel *evsel,
+struct perf_sample *data)
+{
+   unsigned long hv, pr, msr;
+   u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+   if (strcmp(KVMPPC_EXIT, evsel->name))
+       goto ret;
+
+   if
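
The HV_BIT, PR_BIT and PPC_MAX defines in the patch follow IBM bit
numbering, where bit 0 is the most significant bit of the 64-bit MSR, so
bit n is found by shifting right by (PPC_MAX - n). A minimal sketch of that
extraction follows. msr_bit() and sketch_guest_cpumode() are hypothetical
names, the two MISC_* constants are stand-ins for
PERF_RECORD_MISC_GUEST_KERNEL and PERF_RECORD_MISC_GUEST_USER, and the
mapping from PR to cpumode is an assumption about the intent, not the code
from the patch.

```c
#include <stdint.h>

#define HV_BIT  3	/* MSR[HV] in IBM numbering, from the patch */
#define PR_BIT  49	/* MSR[PR] in IBM numbering, from the patch */
#define PPC_MAX 63

/* Stand-ins for the PERF_RECORD_MISC_GUEST_* values. */
#define MISC_GUEST_KERNEL 4
#define MISC_GUEST_USER   5

/* Extract one MSR bit given its IBM (MSB = 0) bit number. */
static unsigned int msr_bit(uint64_t msr, unsigned int ibm_bit)
{
	return (unsigned int)((msr >> (PPC_MAX - ibm_bit)) & 1);
}

/* Assumption: PR = 1 means guest user space, PR = 0 means guest kernel. */
static uint8_t sketch_guest_cpumode(uint64_t msr)
{
	return msr_bit(msr, PR_BIT) ? MISC_GUEST_USER : MISC_GUEST_KERNEL;
}
```

With these numbers, HV lands on bit 60 and PR on bit 14 when counted from
the least significant bit.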

Re: [PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

2015-06-17 Thread Laurent Vivier
[I am resending my message because the MLs rejected the first one, which was in HTML]

On 28/05/2015 07:17, Paul Mackerras wrote:
> This patch series provides a way to use more of the capacity of each
> processor core when running guests configured with threads=1, 2 or 4
> on a POWER8 host with HV KVM, without having to change the static
> micro-threading (the official name for split-core) mode for the whole
> machine.  The problem with setting the machine to static 2-way or
> 4-way micro-threading mode is that (a) then you can't run guests with
> threads=8 and (b) selecting the right mode can be tricky and requires
> knowledge of what guests you will be running.
>
> Instead, with these two patches, we can now run more than one virtual
> core (vcore) on a given physical core if possible, and if that means
> we need to switch the core to 2-way or 4-way micro-threading mode,
> then we do that on entry to the guests and switch back to whole-core
> mode on exit (and we only switch the one core, not the whole machine).
> The core mode switching is only done if the machine is in static
> whole-core mode.
>
> All of this only comes into effect when a core is over-committed.
> When the machine is lightly loaded everything operates the same with
> these patches as without.  Only when some core has a vcore that is
> able to run while there is also another vcore that was wanting to run
> on that core but got preempted does the logic kick in to try to run
> both vcores at once.
>
> Paul.
> ---
>
>  arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
>  arch/powerpc/include/asm/kvm_host.h   |  22 +-
>  arch/powerpc/kernel/asm-offsets.c |   9 +
>  arch/powerpc/kvm/book3s_hv.c  | 648 ++
>  arch/powerpc/kvm/book3s_hv_builtin.c  |  32 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c  |   4 +-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 111 -
>  7 files changed, 740 insertions(+), 106 deletions(-)

Tested-by: Laurent Vivier 

Performance is better, but Paul, could you explain why it is even better
when I disable dynamic micro-threading?
Did I miss something?

My test system is an IBM Power S822L.

I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1) both
attached on the same core (with pinning option of virt-manager). Then, I
measure the time needed to compile a kernel in parallel in both guests
with "make -j 16".

My kernel without micro-threading:

real    37m23.424s     real    37m24.959s
user    167m31.474s    user    165m44.142s
sys     113m26.195s    sys     113m45.072s

With micro-threading patches (PATCH 1+2):

target_smt_mode 0 [in fact it was 8 here, but it should behave like 0, as it is
max threads/sub-core]
dynamic_mt_modes 6

real    32m13.338s     real    32m26.652s
user    139m21.181s    user    140m20.994s
sys     77m35.339s     sys     78m16.599s

It's better, but if I disable dynamic micro-threading (still with PATCH 1+2 applied):

target_smt_mode 0
dynamic_mt_modes 0

real    30m49.100s     real    30m48.161s
user    144m22.989s    user    142m53.886s
sys     65m4.942s      sys     66m8.159s

it's even better.

without dynamic micro-threading patch (with PATCH1 but not PATCH2):

target_smt_mode 0

real    33m57.279s     real    34m19.524s
user    158m43.064s    user    156m19.863s
sys     74m25.442s     sys     76m42.994s


Laurent

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] powerpc: implement barrier primitives

2015-06-17 Thread Alexander Graf


On 17.06.15 12:15, Will Deacon wrote:
> On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote:
>> Instead of referring to the Linux header including the barrier
>> macros, copy over the rather simple implementation for the PowerPC
>> barrier instructions kvmtool uses. This fixes build for powerpc.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>> Hi,
>>
>> I just took what kvmtool seems to have used before, I actually have
>> no idea if "sync" is the right instruction or "lwsync" would do.
>> Would be nice if some people with PowerPC knowledge could comment.
> 
> I *think* we can use lwsync for rmb and wmb, but would want confirmation
> from a ppc guy before making that change!

Also I'd prefer to play safe for now :)


Alex


Re: [PATCH 1/3] powerpc: implement barrier primitives

2015-06-17 Thread Will Deacon
On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote:
> Instead of referring to the Linux header including the barrier
> macros, copy over the rather simple implementation for the PowerPC
> barrier instructions kvmtool uses. This fixes build for powerpc.
> 
> Signed-off-by: Andre Przywara 
> ---
> Hi,
> 
> I just took what kvmtool seems to have used before, I actually have
> no idea if "sync" is the right instruction or "lwsync" would do.
> Would be nice if some people with PowerPC knowledge could comment.

I *think* we can use lwsync for rmb and wmb, but would want confirmation
from a ppc guy before making that change!
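
To make the suggestion concrete, here is a minimal sketch of what the
lwsync variant could look like. It is an illustration only and has not been
confirmed by anyone in this thread. It assumes the barriers are only used
to order accesses to ordinary cacheable memory (for example rings shared
with the guest); lwsync does not order a store followed by a load, so mb()
stays on sync. The non-PowerPC fallback, struct mailbox, publish() and
consume() are hypothetical and exist only so the example builds and runs
elsewhere.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
/* Full barrier stays "sync"; read/write barriers relaxed to "lwsync". */
#define mb()	__asm__ __volatile__ ("sync"   : : : "memory")
#define rmb()	__asm__ __volatile__ ("lwsync" : : : "memory")
#define wmb()	__asm__ __volatile__ ("lwsync" : : : "memory")
#else
/* Fallback for other hosts: a full barrier via the compiler builtin. */
#define mb()	__sync_synchronize()
#define rmb()	__sync_synchronize()
#define wmb()	__sync_synchronize()
#endif

/* Hypothetical producer/consumer slot in cacheable shared memory. */
struct mailbox {
	uint32_t payload;
	uint32_t ready;
};

/* Producer: the payload must be visible before the ready flag. */
static void publish(volatile struct mailbox *m, uint32_t value)
{
	m->payload = value;
	wmb();
	m->ready = 1;
}

/* Consumer: read the flag first, then the payload. Returns 1 on success. */
static int consume(volatile struct mailbox *m, uint32_t *out)
{
	if (!m->ready)
		return 0;
	rmb();
	*out = m->payload;
	return 1;
}
```

If any of the barriers also has to order accesses to device memory, sync
remains the safe choice.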

Will


Re: [PATCH 3/3] powerpc: add hvcall.h header from Linux

2015-06-17 Thread Will Deacon
On Wed, Jun 17, 2015 at 10:43:50AM +0100, Andre Przywara wrote:
> The powerpc code uses some PAPR hypercalls, of which we need the
> hypercall number. Copy the macro definition parts from the kernel's
> (private) hvcall.h file and remove the extra tricks formerly used
> to be able to include this header file directly.
> 
> Signed-off-by: Andre Przywara 
> ---
> Hi,
> 
> I copied most of the Linux header, without removing
> definitions that kvmtool doesn't use. That should make updates
> easier. If people would prefer a bespoke header, let me know.

I'd rather just #define the stuff we need now that we're outside of the
kernel source tree.

Will


[PATCH 2/3] powerpc: use default endianness for converting guest/init

2015-06-17 Thread Andre Przywara
For converting the guest/init binary into an object file, we call
the linker binary, setting the endianness to big endian explicitly
when compiling kvmtool for powerpc.
This breaks if the compiler is actually targeting little endian
(which is true for the Debian port, for instance).
Remove the explicit big endianness switch from the linker call to
allow linking on little endian PowerPC builds again.

Signed-off-by: Andre Przywara 
---
Hi,

this fixed the powerpc64le build for me, while still compiling fine
for big endian. Admittedly this whole init->guest_init.o conversion
has its issues (with MIPS, for instance), which deserve proper fixing,
but let's just fix that build for now.

Andre.

 Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6110b8e..c118e1a 100644
--- a/Makefile
+++ b/Makefile
@@ -149,7 +149,6 @@ ifeq ($(ARCH), powerpc)
OBJS+= powerpc/xics.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
-   LDFLAGS += -m elf64ppc
 
ARCH_WANT_LIBFDT := y
 endif
-- 
2.3.5



[PATCH 3/3] powerpc: add hvcall.h header from Linux

2015-06-17 Thread Andre Przywara
The powerpc code uses some PAPR hypercalls, of which we need the
hypercall number. Copy the macro definition parts from the kernel's
(private) hvcall.h file and remove the extra tricks formerly used
to be able to include this header file directly.

Signed-off-by: Andre Przywara 
---
Hi,

I copied most of the Linux header, without removing
definitions that kvmtool doesn't use. That should make updates
easier. If people would prefer a bespoke header, let me know.
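
As an illustration of how the return codes in this header are meant to be
consumed, here is a minimal sketch that turns a long-busy return value into
a retry delay. It relies only on the hints in the header comments (9900 is
1 msec, and each following code is ten times longer). is_long_busy() and
long_busy_hint_msecs() are hypothetical helper names and are not part of
this patch.

```c
/* Values copied from the header added by this patch. */
#define H_LONG_BUSY_START_RANGE		9900
#define H_LONG_BUSY_ORDER_1_MSEC	9900
#define H_LONG_BUSY_ORDER_100_SEC	9905
#define H_LONG_BUSY_END_RANGE		9905

/* Non-zero if rc falls inside the long-busy range. */
static int is_long_busy(long rc)
{
	return rc >= H_LONG_BUSY_START_RANGE && rc <= H_LONG_BUSY_END_RANGE;
}

/*
 * Suggested retry delay in milliseconds for a long-busy return code,
 * or 0 if rc is not a long-busy code.
 */
static unsigned long long_busy_hint_msecs(long rc)
{
	unsigned long ms = 1;
	long order;

	if (!is_long_busy(rc))
		return 0;

	/* Each step above the start of the range multiplies the hint by 10. */
	for (order = rc - H_LONG_BUSY_START_RANGE; order > 0; order--)
		ms *= 10;

	return ms;
}
```

A caller would typically sleep for the returned time and reissue the
hypercall, treating 0 as "not a long-busy condition".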

Andre.

 powerpc/include/asm/hvcall.h | 287 +++
 powerpc/spapr.h  |   3 -
 2 files changed, 287 insertions(+), 3 deletions(-)
 create mode 100644 powerpc/include/asm/hvcall.h

diff --git a/powerpc/include/asm/hvcall.h b/powerpc/include/asm/hvcall.h
new file mode 100644
index 0000000..b6dc250
--- /dev/null
+++ b/powerpc/include/asm/hvcall.h
@@ -0,0 +1,287 @@
+#ifndef _ASM_POWERPC_HVCALL_H
+#define _ASM_POWERPC_HVCALL_H
+
+#define HVSC   .long 0x44000022
+
+#define H_SUCCESS  0
+#define H_BUSY 1   /* Hardware busy -- retry later */
+#define H_CLOSED   2   /* Resource closed */
+#define H_NOT_AVAILABLE 3
+#define H_CONSTRAINED  4   /* Resource request constrained to max allowed */
+#define H_PARTIAL   5
+#define H_IN_PROGRESS  14  /* Kind of like busy */
+#define H_PAGE_REGISTERED 15
+#define H_PARTIAL_STORE   16
+#define H_PENDING  17  /* returned from H_POLL_PENDING */
+#define H_CONTINUE 18  /* Returned from H_Join on success */
+#define H_LONG_BUSY_START_RANGE    9900  /* Start of long busy range */
+#define H_LONG_BUSY_ORDER_1_MSEC   9900  /* Long busy, hint that 1msec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_MSEC  9901  /* Long busy, hint that 10msec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_MSEC 9902  /* Long busy, hint that 100msec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_1_SEC    9903  /* Long busy, hint that 1sec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_SEC   9904  /* Long busy, hint that 10sec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_SEC  9905  /* Long busy, hint that 100sec \
+is a good time to retry */
+#define H_LONG_BUSY_END_RANGE  9905  /* End of long busy range */
+
+/* Internal value used in book3s_hv kvm support; not returned to guests */
+#define H_TOO_HARD 9999
+
+#define H_HARDWARE -1  /* Hardware error */
+#define H_FUNCTION -2  /* Function not supported */
+#define H_PRIVILEGE -3  /* Caller not privileged */
+#define H_PARAMETER -4  /* Parameter invalid, out-of-range or conflicting */
+#define H_BAD_MODE -5  /* Illegal msr value */
+#define H_PTEG_FULL -6  /* PTEG is full */
+#define H_NOT_FOUND -7  /* PTE was not found" */
+#define H_RESERVED_DABR -8  /* DABR address is reserved by the hypervisor on this processor" */
+#define H_NO_MEM   -9
+#define H_AUTHORITY -10
+#define H_PERMISSION   -11
+#define H_DROPPED  -12
+#define H_SOURCE_PARM  -13
+#define H_DEST_PARM-14
+#define H_REMOTE_PARM  -15
+#define H_RESOURCE -16
+#define H_ADAPTER_PARM  -17
+#define H_RH_PARM   -18
+#define H_RCQ_PARM  -19
+#define H_SCQ_PARM  -20
+#define H_EQ_PARM   -21
+#define H_RT_PARM   -22
+#define H_ST_PARM   -23
+#define H_SIGT_PARM -24
+#define H_TOKEN_PARM -25
+#define H_MLENGTH_PARM  -27
+#define H_MEM_PARM  -28
+#define H_MEM_ACCESS_PARM -29
+#define H_ATTR_PARM -30
+#define H_PORT_PARM -31
+#define H_MCG_PARM  -32
+#define H_VL_PARM   -33
+#define H_TSIZE_PARM -34
+#define H_TRACE_PARM -35
+
+#define H_MASK_PARM -37
+#define H_MCG_FULL  -38
+#define H_ALIAS_EXIST   -39
+#define H_P_COUNTER -40
+#define H_TABLE_FULL -41
+#define H_ALT_TABLE -42
+#define H_MR_CONDITION  -43
+#define H_NOT_ENOUGH_RESOURCES -44
+#define H_R_STATE   -45
+#define H_RESCINDED -46
+#define H_P2   -55
+#define H_P3   -56
+#define H_P4   -57
+#define H_P5   -58
+#define H_P6   -59
+#define H_P7   -60
+#define H_P8   -61
+#define H_P9   -62
+#define H_TOO_BIG  -64
+#define H_OVERLAP  -68
+#define H_INTERRUPT -69
+#define H_BAD_DATA -70
+#define H_NOT_ACTIVE   -71
+#define H_SG_LIST  -72
+#define H_OP_MODE  -73
+#define H_COP_HW   -74
+#define H_UNSUPPORTED_FLAG_START   -256
+#define H_UNSUPPORTED_FLAG_END -511
+#define H_MULTI_THREADS_ACTIVE -9005
+#define H_OUTSTANDING_COP_OPS  -9006
+
+
+/* Long Busy is a condition that can be returned by the firmware

[PATCH 0/3] kvmtool: fixes for PowerPC

2015-06-17 Thread Andre Przywara
Hello,

some patches to fix at least the build of the new kvmtool for
PowerPC. I could only compile test it so far, so I'd be grateful
if people more familiar with that architecture can have a look
and maybe even test it on actual machines.

Cheers,
Andre.

Andre Przywara (3):
  powerpc: implement barrier primitives
  powerpc: use default endianness for converting guest/init
  powerpc: add hvcall.h header from Linux

 Makefile  |   1 -
 powerpc/include/asm/hvcall.h  | 287 ++
 powerpc/include/kvm/barrier.h |   4 +-
 powerpc/spapr.h   |   3 -
 4 files changed, 290 insertions(+), 5 deletions(-)
 create mode 100644 powerpc/include/asm/hvcall.h

-- 
2.3.5



[PATCH 1/3] powerpc: implement barrier primitives

2015-06-17 Thread Andre Przywara
Instead of referring to the Linux header including the barrier
macros, copy over the rather simple implementation for the PowerPC
barrier instructions kvmtool uses. This fixes build for powerpc.

Signed-off-by: Andre Przywara 
---
Hi,

I just took what kvmtool seems to have used before, I actually have
no idea if "sync" is the right instruction or "lwsync" would do.
Would be nice if some people with PowerPC knowledge could comment.

Cheers,
Andre.

 powerpc/include/kvm/barrier.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/powerpc/include/kvm/barrier.h b/powerpc/include/kvm/barrier.h
index dd5115a..4b708ae 100644
--- a/powerpc/include/kvm/barrier.h
+++ b/powerpc/include/kvm/barrier.h
@@ -1,6 +1,8 @@
 #ifndef _KVM_BARRIER_H_
 #define _KVM_BARRIER_H_
 
-#include 
+#define mb()   asm volatile ("sync" : : : "memory")
+#define rmb()  asm volatile ("sync" : : : "memory")
+#define wmb()  asm volatile ("sync" : : : "memory")
 
 #endif /* _KVM_BARRIER_H_ */
-- 
2.3.5
