Re: [PATCH 2/2] uprobes/powerpc: Make use of generic routines to enable single step

2012-11-26 Thread Suzuki K. Poulose

On 11/26/2012 10:31 PM, Oleg Nesterov wrote:

On 11/26, Suzuki K. Poulose wrote:


@@ -121,8 +125,11 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 * to be executed.
 */
regs->nip = utask->vaddr + MAX_UINSN_BYTES;
+   regs->msr = utask->autask.saved_msr;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   mtspr(SPRN_DBCR0, utask->autask.saved_dbcr);
+#endif

-   user_disable_single_step(current);


Don't we need the same change in arch_uprobe_abort_xol()?

Yes, we do. Thanks for catching that. I will fix it.

Thanks for the review.

Suzuki

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] powerpc: Move the single step enable code to a generic path

2012-11-27 Thread Suzuki K. Poulose

On 11/26/2012 11:40 PM, Sebastian Andrzej Siewior wrote:

On 11/26/2012 12:05 PM, Suzuki K. Poulose wrote:

diff --git a/arch/powerpc/include/asm/probes.h b/arch/powerpc/include/asm/probes.h
index 5f1e15b..836e9b9 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -38,5 +38,34 @@ typedef u32 ppc_opcode_t;
  #define is_trap(instr)	(IS_TW(instr) || IS_TWI(instr))
  #endif /* CONFIG_PPC64 */

+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+#define MSR_SINGLESTEP	(MSR_DE)
+#else
+#define MSR_SINGLESTEP	(MSR_SE)
+#endif
+
+/* Enable single stepping for the current task */
+static inline void enable_single_step(struct pt_regs *regs)
+{
+
+/*
+ * We turn off async exceptions to ensure that the single step will
+ * be for the instruction we have the kprobe on, if we dont its

it is


+ * possible we'd get the single step reported for an exception handler
+ * like Decrementer or External Interrupt
+ */


Hmmm. The TRM for e500 says

|5.11.1 e500 Exception Priorities
|The following is a prioritized listing of e500 exceptions:
|   4. Critical input
|   5. Debug interrupt
|   6. External input
|   22. Decrementer

The list has been cut down a little. That means the debug interrupt
comes before the external interrupt and before the decrementer fires.

And if single step is what wakes you up then DBSR[ICMP] is set. Am I
missing something, or is this FSL-only and not Book-E?



You are right. The same priority applies to Book3S as well. The above
code/comment was initially written for kprobes, so I didn't double-check
it when I moved it to the common place.

I will send a v2 with the required changes.

Thanks for the question!

Cheers
Suzuki



[PATCH] [perf] Remove the node from rblist in strlist__remove

2012-08-28 Thread Suzuki K. Poulose
The following commit:

author    David Ahern
Tue, 31 Jul 2012 04:31:33 +0000 (22:31 -0600)
committer Arnaldo Carvalho de Melo
Fri, 3 Aug 2012 13:39:51 +0000 (10:39 -0300)
commit    ee8dd3ca43f151d9fbe1edeef68fb8a77eb9f047

causes a double free during probe deletion: strlist__remove() frees
the node but never removes it from the rblist. When we later do
strlist__delete(), the node is already freed but still present in the
rblist, so it gets freed again.

[suzukikp@suzukikp perf]$ sudo ./perf probe -a do_fork
Added new event:
  probe:do_fork(on do_fork)

You can now use it in all perf tools, such as:

perf record -e probe:do_fork -aR sleep 1

[suzukikp@suzukikp perf]$ sudo ./perf probe -d do_fork
Removed event: probe:do_fork
*** glibc detected *** ./perf: double free or corruption (fasttop): 0x0133d600 ***
======= Backtrace: =========
/lib64/libc.so.6[0x38eec7dda6]
./perf(rblist__delete+0x5c)[0x47d3dc]
./perf(del_perf_probe_events+0xb6)[0x47b826]
./perf(cmd_probe+0x471)[0x42c8d1]
./perf[0x4150b3]
./perf(main+0x501)[0x4148e1]
/lib64/libc.so.6(__libc_start_main+0xed)[0x38eec2169d]
./perf[0x414a61]


Make sure we remove the node from the rblist before we delete the
node. rblist__remove_node() invokes rblist->node_delete, which takes
care of deleting the node with the suitable function provided by the
user.

Reported-by: Ananth N. Mavinakayanahalli 
Signed-off-by: Suzuki K. Poulose 
Cc: David Ahern 
Cc: Arnaldo Carvalho de Melo 
---

 tools/perf/util/strlist.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/strlist.c b/tools/perf/util/strlist.c
index 95856ff..155d8b7 100644
--- a/tools/perf/util/strlist.c
+++ b/tools/perf/util/strlist.c
@@ -93,7 +93,7 @@ out:
 
 void strlist__remove(struct strlist *slist, struct str_node *snode)
 {
-   str_node__delete(snode, slist->dupstr);
+   rblist__remove_node(&slist->rblist, &snode->rb_node);
 }
 
 struct str_node *strlist__find(struct strlist *slist, const char *entry)



Re: [PATCH] [perf] Remove the node from rblist in strlist__remove

2012-08-28 Thread Suzuki K. Poulose

On 08/29/2012 11:59 AM, David Ahern wrote:

On 8/29/12 12:00 AM, Suzuki K. Poulose wrote:

The following commit:

authorDavid Ahern 
Tue, 31 Jul 2012 04:31:33 + (22:31 -0600)
committerArnaldo Carvalho de Melo 
Fri, 3 Aug 2012 13:39:51 + (10:39 -0300)
commitee8dd3ca43f151d9fbe1edeef68fb8a77eb9f047

causes a double free during a probe deletion as the node is
never removed from the list via strlist__remove(), even though
it gets 'deleted' (read free()'d). This causes a double
free when we do strlist__delete() as the node is already deleted
but present in the rblist.

[suzukikp@suzukikp perf]$ sudo ./perf probe -a do_fork
Added new event:
   probe:do_fork(on do_fork)

You can now use it in all perf tools, such as:

perf record -e probe:do_fork -aR sleep 1

[suzukikp@suzukikp perf]$ sudo ./perf probe -d do_fork
Removed event: probe:do_fork
*** glibc detected *** ./perf: double free or corruption (fasttop):
0x0133d600 ***
=== Backtrace: =
/lib64/libc.so.6[0x38eec7dda6]
./perf(rblist__delete+0x5c)[0x47d3dc]
./perf(del_perf_probe_events+0xb6)[0x47b826]
./perf(cmd_probe+0x471)[0x42c8d1]
./perf[0x4150b3]
./perf(main+0x501)[0x4148e1]
/lib64/libc.so.6(__libc_start_main+0xed)[0x38eec2169d]
./perf[0x414a61]


Make sure we remove the node from the rblist before we delete the
node. The rblist__remove_node() will invoke rblist->node_delete,
which  will take care of deleting the node with the suitable function
provided by the user.

Reported-by: Ananth N. Mavinakayanahalli 
Signed-off-by: Suzuki K. Poulose 
Cc: David Ahern 
Cc: Arnaldo Carvalho de Melo 


Acked-by: David Ahern 

Same type of change is needed for util/intlist.c if you want to submit
one, otherwise I will take care of it.


I can send it.

Thanks
Suzuki



[PATCH] [perf] Fix intlist node removal

2012-08-30 Thread Suzuki K. Poulose
Similar to the fix in:
https://lkml.org/lkml/2012/8/29/27


Make sure we remove the node from the rblist before we delete the
node. rblist__remove_node() invokes rblist->node_delete, which takes
care of deleting the node with the suitable function provided by the
user.

Signed-off-by: Suzuki K Poulose 
Cc: David Ahern 
---

 tools/perf/util/intlist.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/intlist.c b/tools/perf/util/intlist.c
index fd530dc..77c504f 100644
--- a/tools/perf/util/intlist.c
+++ b/tools/perf/util/intlist.c
@@ -52,9 +52,9 @@ int intlist__add(struct intlist *ilist, int i)
return rblist__add_node(&ilist->rblist, (void *)((long)i));
 }
 
-void intlist__remove(struct intlist *ilist __used, struct int_node *node)
+void intlist__remove(struct intlist *ilist, struct int_node *node)
 {
-   int_node__delete(node);
+   rblist__remove_node(&ilist->rblist, &node->rb_node);
 }
 
 struct int_node *intlist__find(struct intlist *ilist, int i)



[PATCH] [perf] Account the nr_entries in rblist properly

2012-08-31 Thread Suzuki K. Poulose
The nr_entries in rblist is never decremented when an element is
deleted. Decrement it in rblist__remove_node(), and use that routine to
delete nodes in rblist__delete() as well. This keeps nr_entries sane.

Signed-off-by: Suzuki K. Poulose 
Cc: David S. Ahern 
---

 tools/perf/util/rblist.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c
index 0171fb6..a16cdd2 100644
--- a/tools/perf/util/rblist.c
+++ b/tools/perf/util/rblist.c
@@ -44,6 +44,7 @@ int rblist__add_node(struct rblist *rblist, const void *new_entry)
 void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node)
 {
rb_erase(rb_node, &rblist->entries);
+   --rblist->nr_entries;
rblist->node_delete(rblist, rb_node);
 }
 
@@ -87,8 +88,7 @@ void rblist__delete(struct rblist *rblist)
while (next) {
pos = next;
next = rb_next(pos);
-   rb_erase(pos, &rblist->entries);
-   rblist->node_delete(rblist, pos);
+   rblist__remove_node(rblist, pos);
}
free(rblist);
}



Re: [RFC] [PATCH 00/19] Non disruptive application core dump infrastructure using task_work_add()

2013-10-06 Thread Suzuki K. Poulose
On 10/04/2013 07:14 PM, Andi Kleen wrote:
> On Fri, Oct 04, 2013 at 04:00:12PM +0530, Janani Venkataraman wrote:
>> Hi all,
>>
>> The following series implements an infrastructure for capturing the core of
>> an application without disrupting its process.
> 
> The problem is that gcore et.al. have to stop the process briefly
> to attach and then use the pid mmap ptrace interfaces, right?
> 
Correct.

> Couldn't they just use the new process_vm_readv() syscalls instead?
> AFAIK those do not require ptrace.
> 
We need the register set and hence would need ptrace.

> Then this could be all done in user space.
> 
> Or are there some specific races with this approach?
> 
Cheers
Suzuki



Re: RFD: Non-Disruptive Core Dump Infrastructure

2013-09-11 Thread Suzuki K. Poulose
On 09/12/2013 12:57 AM, KOSAKI Motohiro wrote:
> (9/3/13 4:39 AM), Janani Venkataraman wrote:
>> Hello,
>>
>> We are working on an infrastructure to create a system core file of a
>> specific
>> process at run-time, non-disruptively. It can also be extended to a
>> case where
>> a process is able to take a self-core dump.
>>
>> gcore, an existing utility creates a core image of the specified
>> process. It
>> attaches to the process using gdb and runs the gdb gcore command and then
>> detaches. In gcore the dump cannot be issued from a signal handler
>> context as
>> fork() is not signal safe and moreover it is disruptive in nature as
>> the gdb
>> attaches using ptrace which sends a SIGSTOP signal. Hence the gcore
>> method
>> cannot be used if the process wants to initiate a self dump.
> 
> Maybe I'm missing something, but why does gcore use the C-level fork()?
> Does gcore need to call pthread at-fork handlers? No. Does gcore need
> to flush stdio buffers? No.
> 
Let me clarify. If an application wants to dump itself, it has to
fork() and then exec gcore with the pid of the application to generate
the dump.

So, if the application wants to initiate the dump from a signal handler
context, it may lead to trouble.

Thanks
Suzuki



Re: [PATCH v2 1/2] [powerpc] Change memory_limit from phys_addr_t to unsigned long long

2012-09-07 Thread Suzuki K. Poulose

On 09/07/2012 07:05 AM, Benjamin Herrenschmidt wrote:

On Tue, 2012-08-21 at 17:12 +0530, Suzuki K. Poulose wrote:

There are some device-tree nodes whose values are of type phys_addr_t.
phys_addr_t is variable-sized based on CONFIG_PHYS_ADDR_T_64BIT.

Change these to a fixed unsigned long long for consistency.

This patch does the change only for memory_limit.

The following is a list of such variables which need the change:

  1) kernel_end, crashk_size - in arch/powerpc/kernel/machine_kexec.c

  2) (struct resource *)crashk_res.start - We could export a local static
 variable from machine_kexec.c.

Changing the above values might break kexec-tools. So, I will fix
kexec-tools first to handle the different sized values and then change
the above.

Suggested-by: Benjamin Herrenschmidt 
Signed-off-by: Suzuki K. Poulose 
---


Breaks the build on some configs (with 32-bit phys_addr_t):


Sorry for that.


/home/benh/linux-powerpc-test/arch/powerpc/kernel/prom.c: In function
'early_init_devtree':
/home/benh/linux-powerpc-test/arch/powerpc/kernel/prom.c:664:25: error:
comparison of distinct pointer types lacks a cast

I'm fixing that myself this time but please be more careful.

Sure. Thanks Ben for fixing that.

Suzuki



[PATCH v2 0/4] uprobes/powerpc: Replace ptrace helpers for single stepping

2012-12-03 Thread Suzuki K. Poulose
The following series replaces the ptrace helpers used for single step
enable/disable for uprobes on powerpc, with uprobe specific code.

We reuse the kprobe code to enable single stepping by making it generic
and save/restore the MSR (and DBCR for BookE) across the single step.

This series applies on top of the patches posted by Oleg at :
https://lkml.org/lkml/2012/10/28/92 


Patches have been verified on Power6 and PPC440 (BookE).

Changes since V1: 

 * Don't disable external interrupts. (Sebastian)
 * Introduced routines for saving/restoring the context for sstep.
 * Restore the context in arch_uprobe_abort_xol() (Oleg)


---

Suzuki K. Poulose (4):
  kprobes/powerpc: Do not disable External interrupts during single step
  powerpc: Move the single step enable code to a generic path
  uprobes/powerpc: Introduce routines for save/restore context
  uprobes/powerpc: Make use of generic routines to enable single step


 arch/powerpc/include/asm/probes.h  |   25 +
 arch/powerpc/include/asm/uprobes.h |4 
 arch/powerpc/kernel/kprobes.c  |   21 +
 arch/powerpc/kernel/uprobes.c  |   32 +---
 4 files changed, 55 insertions(+), 27 deletions(-)

-- 
Suzuki



[PATCH v2 1/4] kprobes/powerpc: Do not disable External interrupts during single step

2012-12-03 Thread Suzuki K. Poulose
From: Suzuki K. Poulose 

External/Decrementer exceptions have lower priority than the Debug
Exception, so we don't have to disable external interrupts before a
single step. However, on BookE, the Critical Input Exception (CE) has
higher priority than the Debug Exception, hence we mask it.

Signed-off-by:  Suzuki K. Poulose 
Cc: Sebastian Andrzej Siewior 
Cc: Ananth N Mavinakaynahalli 
Cc: Kumar Gala 
Cc: linuxppc-...@ozlabs.org
---
 arch/powerpc/kernel/kprobes.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index e88c643..4901b34 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -104,13 +104,13 @@ void __kprobes arch_remove_kprobe(struct kprobe *p)
 
static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
 {
-   /* We turn off async exceptions to ensure that the single step will
-* be for the instruction we have the kprobe on, if we dont its
-* possible we'd get the single step reported for an exception handler
-* like Decrementer or External Interrupt */
-   regs->msr &= ~MSR_EE;
regs->msr |= MSR_SINGLESTEP;
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   /* 
+* We turn off Critical Input Exception(CE) to ensure that the single
+* step will be for the instruction we have the probe on; if we don't,
+* it is possible we'd get the single step reported for CE.
+*/
regs->msr &= ~MSR_CE;
mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
 #ifdef CONFIG_PPC_47x



[PATCH v2 2/4] powerpc: Move the single step enable code to a generic path

2012-12-03 Thread Suzuki K. Poulose
From: Suzuki K. Poulose 

This patch moves the single-step enable code used by kprobes to a
generic routine in a header so that it can be re-used by other code, in
this case uprobes. No functional changes.

Signed-off-by: Suzuki K. Poulose 
Cc: Ananth N Mavinakaynahalli 
Cc: Kumar Gala 
Cc: linuxppc-...@ozlabs.org
---
 arch/powerpc/include/asm/probes.h |   25 +
 arch/powerpc/kernel/kprobes.c |   21 +
 2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/probes.h b/arch/powerpc/include/asm/probes.h
index 5f1e15b..f94a44f 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -38,5 +38,30 @@ typedef u32 ppc_opcode_t;
 #define is_trap(instr) (IS_TW(instr) || IS_TWI(instr))
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+#define MSR_SINGLESTEP (MSR_DE)
+#else
+#define MSR_SINGLESTEP (MSR_SE)
+#endif
+
+/* Enable single stepping for the current task */
+static inline void enable_single_step(struct pt_regs *regs)
+{
+   regs->msr |= MSR_SINGLESTEP;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   /* 
+* We turn off Critical Input Exception(CE) to ensure that the single
+* step will be for the instruction we have the probe on; if we don't,
+* it is possible we'd get the single step reported for CE.
+*/
+   regs->msr &= ~MSR_CE;
+   mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
+#ifdef CONFIG_PPC_47x
+   isync();
+#endif
+#endif
+}
+
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_PROBES_H */
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 4901b34..92f1be7 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -36,12 +36,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC_ADV_DEBUG_REGS
-#define MSR_SINGLESTEP (MSR_DE)
-#else
-#define MSR_SINGLESTEP (MSR_SE)
-#endif
-
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
@@ -104,20 +98,7 @@ void __kprobes arch_remove_kprobe(struct kprobe *p)
 
static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
 {
-   regs->msr |= MSR_SINGLESTEP;
-#ifdef CONFIG_PPC_ADV_DEBUG_REGS
-   /* 
-* We turn off Critical Input Exception(CE) to ensure that the single
-* step will be for the instruction we have the probe on; if we don't,
-* it is possible we'd get the single step reported for CE.
-*/
-   regs->msr &= ~MSR_CE;
-   mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
-#ifdef CONFIG_PPC_47x
-   isync();
-#endif
-#endif
-
+   enable_single_step(regs);
/*
 * On powerpc we should single step on the original
 * instruction even if the probed insn is a trap



[PATCH v2 3/4] uprobes/powerpc: Introduce routines for save/restore context

2012-12-03 Thread Suzuki K. Poulose
From: Suzuki K. Poulose 

Introduce routines for saving and restoring the context
before/after the single step. No functional changes involved.

These will be extended later to save/restore more info about
the process once we replace the ptrace helpers.

Signed-off-by: Suzuki K. Poulose 
---
 arch/powerpc/kernel/uprobes.c |   16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index bc77834..1a62353 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -52,6 +52,16 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
return 0;
 }
 
+static void uprobe_save_context_sstep(struct arch_uprobe_task *autask)
+{
+   autask->saved_trap_nr = current->thread.trap_nr;
+}
+
+static void uprobe_restore_context_sstep(struct arch_uprobe_task *autask)
+{
+   current->thread.trap_nr = autask->saved_trap_nr;
+}
+
 /*
  * arch_uprobe_pre_xol - prepare to execute out of line.
  * @auprobe: the probepoint information.
@@ -61,7 +71,7 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
struct arch_uprobe_task *autask = ¤t->utask->autask;
 
-   autask->saved_trap_nr = current->thread.trap_nr;
+   uprobe_save_context_sstep(autask);
current->thread.trap_nr = UPROBE_TRAP_NR;
regs->nip = current->utask->xol_vaddr;
 
@@ -111,7 +121,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 
WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
 
-   current->thread.trap_nr = utask->autask.saved_trap_nr;
+   uprobe_restore_context_sstep(&utask->autask);
 
/*
 * On powerpc, except for loads and stores, most instructions
@@ -164,7 +174,7 @@ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
struct uprobe_task *utask = current->utask;
 
-   current->thread.trap_nr = utask->autask.saved_trap_nr;
+   uprobe_restore_context_sstep(&utask->autask);
instruction_pointer_set(regs, utask->vaddr);
 
user_disable_single_step(current);



[PATCH v2 4/4] uprobes/powerpc: Make use of generic routines to enable single step

2012-12-03 Thread Suzuki K. Poulose
From: Suzuki K. Poulose 

Replace the ptrace helpers with the powerpc generic routines to
enable/disable single step. We save/restore the MSR (and DBCR for BookE)
across the operation. We don't have to explicitly disable single
stepping, as restoring the MSR/DBCR restores the previous state.

Signed-off-by: Suzuki K. Poulose 
---
 arch/powerpc/include/asm/uprobes.h |4 
 arch/powerpc/kernel/uprobes.c  |   26 +-
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/uprobes.h b/arch/powerpc/include/asm/uprobes.h
index b532060..10a521c 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -43,6 +43,10 @@ struct arch_uprobe {
 
 struct arch_uprobe_task {
unsigned long   saved_trap_nr;
+   unsigned long   saved_msr;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   unsigned long   saved_dbcr0;
+#endif
 };
 
extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 1a62353..6af55c4 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -52,14 +52,25 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
return 0;
 }
 
-static void uprobe_save_context_sstep(struct arch_uprobe_task *autask)
+static void uprobe_save_context_sstep(struct arch_uprobe_task *autask,
+   struct pt_regs *regs)
 {
autask->saved_trap_nr = current->thread.trap_nr;
+   autask->saved_msr = regs->msr;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   autask->saved_dbcr0 = mfspr(SPRN_DBCR0);
+#endif
 }
 
-static void uprobe_restore_context_sstep(struct arch_uprobe_task *autask)
+static void uprobe_restore_context_sstep(struct arch_uprobe_task *autask,
+   struct pt_regs *regs)
 {
current->thread.trap_nr = autask->saved_trap_nr;
+
+   regs->msr = autask->saved_msr;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   mtspr(SPRN_DBCR0, autask->saved_dbcr0);
+#endif
 }
 
 /*
@@ -71,11 +82,11 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
struct arch_uprobe_task *autask = ¤t->utask->autask;
 
-   uprobe_save_context_sstep(autask);
+   uprobe_save_context_sstep(autask, regs);
current->thread.trap_nr = UPROBE_TRAP_NR;
regs->nip = current->utask->xol_vaddr;
 
-   user_enable_single_step(current);
+   enable_single_step(regs);
return 0;
 }
 
@@ -121,7 +132,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 
WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
 
-   uprobe_restore_context_sstep(&utask->autask);
+   uprobe_restore_context_sstep(&utask->autask, regs);
 
/*
 * On powerpc, except for loads and stores, most instructions
@@ -132,7 +143,6 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 */
regs->nip = utask->vaddr + MAX_UINSN_BYTES;
 
-   user_disable_single_step(current);
return 0;
 }
 
@@ -174,10 +184,8 @@ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
struct uprobe_task *utask = current->utask;
 
-   uprobe_restore_context_sstep(&utask->autask);
+   uprobe_restore_context_sstep(&utask->autask, regs);
instruction_pointer_set(regs, utask->vaddr);
-
-   user_disable_single_step(current);
 }
 
 /*



Re: [PATCH v2 3/4] uprobes/powerpc: Introduce routines for save/restore context

2012-12-03 Thread Suzuki K. Poulose

On 12/03/2012 08:45 PM, Ananth N Mavinakayanahalli wrote:

On Mon, Dec 03, 2012 at 08:39:35PM +0530, Suzuki K. Poulose wrote:

From: Suzuki K. Poulose 

Introduce routines for saving and restoring the context
befre/after the single step. No functional changes involved.

These will be extended later to save/restore more info about
the process once we replace the ptrace helpers.

Signed-off-by: Suzuki K. Poulose 
---
  arch/powerpc/kernel/uprobes.c |   16 +---
  1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index bc77834..1a62353 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -52,6 +52,16 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
return 0;
  }

+static void uprobe_save_context_sstep(struct arch_uprobe_task *autask)
+{
+   autask->saved_trap_nr = current->thread.trap_nr;
+}
+
+static void uprobe_restore_context_sstep(struct arch_uprobe_task *autask)
+{
+   current->thread.trap_nr = autask->saved_trap_nr;
+}


Can't the two above be inline?

I had this discussion with Srikar, and he was of the opinion that we
should leave them as plain static and let the compiler do the
optimization.


Thanks
Suzuki



Re: [PATCH v2 1/4] kprobes/powerpc: Do not disable External interrupts during single step

2012-12-10 Thread Suzuki K. Poulose

On 12/03/2012 08:37 PM, Suzuki K. Poulose wrote:

From: Suzuki K. Poulose 

External/Decrement exceptions have lower priority than the Debug Exception.
So, we don't have to disable the External interrupts before a single step.
However, on BookE, Critical Input Exception(CE) has higher priority than a
Debug Exception. Hence we mask them.

Signed-off-by:  Suzuki K. Poulose 
Cc: Sebastian Andrzej Siewior 
Cc: Ananth N Mavinakaynahalli 
Cc: Kumar Gala 
Cc: linuxppc-...@ozlabs.org
---
  arch/powerpc/kernel/kprobes.c |   10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index e88c643..4901b34 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -104,13 +104,13 @@ void __kprobes arch_remove_kprobe(struct kprobe *p)

static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
  {
-   /* We turn off async exceptions to ensure that the single step will
-* be for the instruction we have the kprobe on, if we dont its
-* possible we'd get the single step reported for an exception handler
-* like Decrementer or External Interrupt */
-   regs->msr &= ~MSR_EE;
regs->msr |= MSR_SINGLESTEP;
  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   /*
+* We turn off Critical Input Exception(CE) to ensure that the single
+* step will be for the instruction we have the probe on; if we don't,
+* it is possible we'd get the single step reported for CE.
+*/
regs->msr &= ~MSR_CE;
mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
  #ifdef CONFIG_PPC_47x



Ben, Kumar,

Could you please review this patch ?


Thanks
Suzuki



Re: [PATCH v2 4/4] uprobes/powerpc: Make use of generic routines to enable single step

2012-12-17 Thread Suzuki K. Poulose

On 12/15/2012 01:32 AM, Oleg Nesterov wrote:

On 12/03, Suzuki K. Poulose wrote:


Replace the ptrace helpers with the powerpc generic routines to
enable/disable single step. We save/restore the MSR (and DCBR for BookE)
across for the operation. We don't have to disable the single step,
as restoring the MSR/DBCR would restore the previous state.


Obviously I can't review this series (although it looks fine to me).

Just one note,


@@ -121,7 +132,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)

WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);

-   uprobe_restore_context_sstep(&utask->autask);
+   uprobe_restore_context_sstep(&utask->autask, regs);


I am not sure ppc needs this, but note that x86 does a bit more.

Not only do we need to restore the "single-step" state, we also need
to send SIGTRAP if it was not set by us. The same goes for _skip_sstep.


Ok. I will investigate that part and make the necessary changes.

Thanks
Suzuki



Re: [PATCH 1/4] perf/powerpc: Use uapi/unistd.h to fix build error

2012-11-20 Thread Suzuki K. Poulose

On 11/08/2012 12:48 AM, Sukadev Bhattiprolu wrote:


 From b8beef080260c1625c8f801105504a82005295e5 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Wed, 31 Oct 2012 11:21:28 -0700
Subject: [PATCH 1/4] perf/powerpc: Use uapi/unistd.h to fix build error

Use the 'unistd.h' from arch/powerpc/include/uapi to build the perf tool.

Signed-off-by: Sukadev Bhattiprolu 

Without this patch, I couldn't build perf on powerpc, with 3.7.0-rc2

Tested-by: Suzuki K. Poulose 

Thanks
Suzuki

---
  tools/perf/perf.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 054182e..f4952da 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -26,7 +26,7 @@ void get_term_dimensions(struct winsize *ws);
  #endif

  #ifdef __powerpc__
-#include "../../arch/powerpc/include/asm/unistd.h"
+#include "../../arch/powerpc/include/uapi/asm/unistd.h"
  #define rmb() asm volatile ("sync" ::: "memory")
  #define cpu_relax()   asm volatile ("" ::: "memory");
  #define CPUINFO_PROC  "cpu"





Re: [PATCH 3/5] uprobes: remove check for uprobe variable in handle_swbp()

2012-08-08 Thread Suzuki K. Poulose

On 08/07/2012 09:42 PM, Sebastian Andrzej Siewior wrote:

By the time we get here (after we pass cleanup_ret), uprobe is always
set. If it is NULL, we leave very early in the code.

Signed-off-by: Sebastian Andrzej Siewior 
---
  kernel/events/uprobes.c |   16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 41a2555..c8e5204 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1528,17 +1528,15 @@ cleanup_ret:
utask->active_uprobe = NULL;
utask->state = UTASK_RUNNING;
}
-   if (uprobe) {
-   if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+   if (!(uprobe->flags & UPROBE_SKIP_SSTEP))


Shouldn't we check uprobe != NULL before we check uprobe->flags?
i.e., shouldn't the above line be:

   if (uprobe && !(uprobe->flags & UPROBE_SKIP_SSTEP)) ?

-   /*
-* cannot singlestep; cannot skip instruction;
-* re-execute the instruction.
-*/
-   instruction_pointer_set(regs, bp_vaddr);
+   /*
+* cannot singlestep; cannot skip instruction;
+* re-execute the instruction.
+*/
+   instruction_pointer_set(regs, bp_vaddr);

-   put_uprobe(uprobe);
-   }
+   put_uprobe(uprobe);
  }


Thanks
Suzuki



Re: [PATCH 3/5] uprobes: remove check for uprobe variable in handle_swbp()

2012-08-09 Thread Suzuki K. Poulose

On 08/08/2012 03:05 PM, Sebastian Andrzej Siewior wrote:

On 08/08/2012 11:10 AM, Suzuki K. Poulose wrote:

--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1528,17 +1528,15 @@ cleanup_ret:
utask->active_uprobe = NULL;
utask->state = UTASK_RUNNING;
}
- if (uprobe) {
- if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+ if (!(uprobe->flags & UPROBE_SKIP_SSTEP))


Shouldn't we check uprobe != NULL before we check uprobe->flags?
i.e., shouldn't the above line be:

if (uprobe && !(uprobe->flags & UPROBE_SKIP_SSTEP)) ?


The function starts like this:

  if (!uprobe) {
  if (is_swbp > 0) {
  send_sig(SIGTRAP, current, 0);
  } else {
  instruction_pointer_set(regs, bp_vaddr);
  }
  return;
  }

Which makes uprobe != NULL by the time we get there, no?


My bad, I was looking at an older version of the function. Also,
the removal of the if (uprobe) check triggered the above question.

Thanks
Suzuki



[PATCH v2 0/2][powerpc] Export memory_limit via device tree

2012-08-21 Thread Suzuki K. Poulose
The following series exports the linux memory_limit set by
the mem= parameter via device-tree, so that kexec-tools
can limit the crash regions to the actual memory used by
the kernel.

Change since V1:

 * Added a patch to change the type of memory_limit to a
   fixed size (unsigned long long) from 'phys_addr_t' (which
   is 32-bit on some ppc32 configs, and 64-bit on ppc64 and other ppc32 configs)

 * Rebased the patch to use recently fixed prom_update_property()
   which would add the property if it didn't exist.

---

Suzuki K. Poulose (2):
  [powerpc] Change memory_limit from phys_addr_t to unsigned long long
  [powerpc] Export memory limit via device tree


 arch/powerpc/include/asm/setup.h|2 +-
 arch/powerpc/kernel/fadump.c|3 +--
 arch/powerpc/kernel/machine_kexec.c |   14 +-
 arch/powerpc/kernel/prom.c  |2 +-
 arch/powerpc/mm/mem.c   |2 +-
 5 files changed, 17 insertions(+), 6 deletions(-)

-- 
Suzuki



[PATCH v2 1/2] [powerpc] Change memory_limit from phys_addr_t to unsigned long long

2012-08-21 Thread Suzuki K. Poulose
There are some device-tree nodes, whose values are of type phys_addr_t.
phys_addr_t is variably sized, based on CONFIG_PHYS_ADDR_T_64BIT.

Change these to a fixed unsigned long long for consistency.

This patch does the change only for memory_limit.

The following is a list of such variables which need the change:

 1) kernel_end, crashk_size - in arch/powerpc/kernel/machine_kexec.c

 2) (struct resource *)crashk_res.start - We could export a local static
variable from machine_kexec.c.

Changing the above values might break kexec-tools. So, I will
fix kexec-tools first to handle the different-sized values and then
change the above.

Suggested-by: Benjamin Herrenschmidt 
Signed-off-by: Suzuki K. Poulose 
---

 arch/powerpc/include/asm/setup.h|2 +-
 arch/powerpc/kernel/fadump.c|3 +--
 arch/powerpc/kernel/machine_kexec.c |2 +-
 arch/powerpc/kernel/prom.c  |2 +-
 arch/powerpc/mm/mem.c   |2 +-
 5 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index d084ce1..8b9a306 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -9,7 +9,7 @@ extern void ppc_printk_progress(char *s, unsigned short hex);
 extern unsigned int rtas_data;
 extern int mem_init_done;  /* set on boot once kmalloc can be called */
 extern int init_bootmem_done;  /* set once bootmem is available */
-extern phys_addr_t memory_limit;
+extern unsigned long long memory_limit;
 extern unsigned long klimit;
 extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
 
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 18bdf74..06c8202 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -289,8 +289,7 @@ int __init fadump_reserve_mem(void)
else
memory_limit = memblock_end_of_DRAM();
printk(KERN_INFO "Adjusted memory_limit for firmware-assisted"
-   " dump, now %#016llx\n",
-   (unsigned long long)memory_limit);
+   " dump, now %#016llx\n", memory_limit);
}
if (memory_limit)
memory_boundary = memory_limit;
diff --git a/arch/powerpc/kernel/machine_kexec.c 
b/arch/powerpc/kernel/machine_kexec.c
index 5df..4074eff 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -165,7 +165,7 @@ void __init reserve_crashkernel(void)
if (memory_limit && memory_limit <= crashk_res.end) {
memory_limit = crashk_res.end + 1;
printk("Adjusted memory limit for crashkernel, now 0x%llx\n",
-  (unsigned long long)memory_limit);
+  memory_limit);
}
 
printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index f191bf0..c82c77d 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -78,7 +78,7 @@ static int __init early_parse_mem(char *p)
return 1;
 
memory_limit = PAGE_ALIGN(memparse(p, &p));
-   DBG("memory limit = 0x%llx\n", (unsigned long long)memory_limit);
+   DBG("memory limit = 0x%llx\n", memory_limit);
 
return 0;
 }
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index baaafde..0a8f353 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -62,7 +62,7 @@
 
 int init_bootmem_done;
 int mem_init_done;
-phys_addr_t memory_limit;
+unsigned long long memory_limit;
 
 #ifdef CONFIG_HIGHMEM
 pte_t *kmap_pte;



[PATCH v2 2/2] [powerpc] Export memory limit via device tree

2012-08-21 Thread Suzuki K. Poulose
The powerpc kernel doesn't export the memory limit enforced by 'mem='
kernel parameter. This is required for building the ELF header in
kexec-tools to limit the vmcore to capture only the used memory. On
powerpc the kexec-tools depends on the device-tree for memory related
information, unlike /proc/iomem on the x86.

Without this information, kexec-tools assumes the entire System RAM is
in use, and the vmcore ends up as an unnecessarily large dump.

This patch exports the memory limit, if present, via
chosen/linux,memory-limit
property, so that the vmcore can be limited to the memory limit.

prom_init is supposed to export this value in the same node, but it
doesn't actually appear there. Also, memory_limit gets adjusted during
the processing of the crashkernel= parameter. This patch makes sure we
export the actual limit.

The kexec-tools will use the value to limit the 'end' of the memory
regions.

Tested this patch on ppc64 and ppc32(ppc440) with a kexec-tools
patch by Mahesh.

Signed-off-by: Suzuki K. Poulose 
Tested-by: Mahesh J. Salgaonkar 
---

 arch/powerpc/kernel/machine_kexec.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec.c 
b/arch/powerpc/kernel/machine_kexec.c
index 4074eff..fa9f6c7 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -204,6 +204,12 @@ static struct property crashk_size_prop = {
.value = &crashk_size,
 };
 
+static struct property memory_limit_prop = {
+   .name = "linux,memory-limit",
+   .length = sizeof(unsigned long long),
+   .value = &memory_limit,
+};
+
 static void __init export_crashk_values(struct device_node *node)
 {
struct property *prop;
@@ -223,6 +229,12 @@ static void __init export_crashk_values(struct device_node 
*node)
crashk_size = resource_size(&crashk_res);
prom_add_property(node, &crashk_size_prop);
}
+
+   /*
+* memory_limit is required by the kexec-tools to limit the
+* crash regions to the actual memory used.
+*/
+   prom_update_property(node, &memory_limit_prop);
 }
 
 static int __init kexec_setup(void)



[PATCH 0/2] uprobes/powerpc: Replace ptrace single step helpers

2012-11-26 Thread Suzuki K. Poulose
The following series replaces the ptrace helpers used for single step
enable/disable for uprobes on powerpc, with uprobe specific code.

We reuse the kprobe code to enable single stepping by making it generic
and save/restore the MSR (and DBCR for BookE) across the single step.

This series applies on top of the patches posted by Oleg at:
https://lkml.org/lkml/2012/10/28/92


Patches have been verified on Power6 and PPC440 (BookE).

---

Suzuki K. Poulose (2):
  powerpc: Move the single step enable code to a generic path
  uprobes/powerpc: Make use of generic routines to enable single step


 arch/powerpc/include/asm/probes.h  |   29 +
 arch/powerpc/include/asm/uprobes.h |4 
 arch/powerpc/kernel/kprobes.c  |   21 +
 arch/powerpc/kernel/uprobes.c  |   11 +--
 4 files changed, 43 insertions(+), 22 deletions(-)

-- 
Suzuki



[PATCH 1/2] powerpc: Move the single step enable code to a generic path

2012-11-26 Thread Suzuki K. Poulose
From: Suzuki K. Poulose 

This patch moves the single step enable code used by kprobe to a generic
routine so that, it can be re-used by other code, in this case,
uprobes.


Signed-off-by: Suzuki K. Poulose 
Cc: linuxppc-...@ozlabs.org
---
 arch/powerpc/include/asm/probes.h |   29 +
 arch/powerpc/kernel/kprobes.c |   21 +
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/probes.h 
b/arch/powerpc/include/asm/probes.h
index 5f1e15b..836e9b9 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -38,5 +38,34 @@ typedef u32 ppc_opcode_t;
 #define is_trap(instr) (IS_TW(instr) || IS_TWI(instr))
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+#define MSR_SINGLESTEP (MSR_DE)
+#else
+#define MSR_SINGLESTEP (MSR_SE)
+#endif
+
+/* Enable single stepping for the current task */
+static inline void enable_single_step(struct pt_regs *regs)
+{
+
+   /* 
+* We turn off async exceptions to ensure that the single step will
+* be for the instruction we have the kprobe on; if we don't, it is
+* possible we'd get the single step reported for an exception handler
+* like Decrementer or External Interrupt
+*/
+   regs->msr &= ~MSR_EE;
+   regs->msr |= MSR_SINGLESTEP;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   regs->msr &= ~MSR_CE;
+   mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
+#ifdef CONFIG_PPC_47x
+   isync();
+#endif
+#endif
+
+}
+
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_PROBES_H */
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index e88c643..92f1be7 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -36,12 +36,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC_ADV_DEBUG_REGS
-#define MSR_SINGLESTEP (MSR_DE)
-#else
-#define MSR_SINGLESTEP (MSR_SE)
-#endif
-
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
@@ -104,20 +98,7 @@ void __kprobes arch_remove_kprobe(struct kprobe *p)
 
 static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs 
*regs)
 {
-   /* We turn off async exceptions to ensure that the single step will
-* be for the instruction we have the kprobe on, if we dont its
-* possible we'd get the single step reported for an exception handler
-* like Decrementer or External Interrupt */
-   regs->msr &= ~MSR_EE;
-   regs->msr |= MSR_SINGLESTEP;
-#ifdef CONFIG_PPC_ADV_DEBUG_REGS
-   regs->msr &= ~MSR_CE;
-   mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
-#ifdef CONFIG_PPC_47x
-   isync();
-#endif
-#endif
-
+   enable_single_step(regs);
/*
 * On powerpc we should single step on the original
 * instruction even if the probed insn is a trap



[PATCH 2/2] uprobes/powerpc: Make use of generic routines to enable single step

2012-11-26 Thread Suzuki K. Poulose
From: Suzuki K. Poulose 

Replace the ptrace helpers with the powerpc generic routines to
enable/disable single step. We save/restore the MSR (and DBCR for BookE)
across the operation.


Signed-off-by: Suzuki K. Poulose 
---
 arch/powerpc/include/asm/uprobes.h |4 
 arch/powerpc/kernel/uprobes.c  |   11 +--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/uprobes.h 
b/arch/powerpc/include/asm/uprobes.h
index b532060..884be93 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -43,6 +43,10 @@ struct arch_uprobe {
 
 struct arch_uprobe_task {
unsigned long   saved_trap_nr;
+   unsigned long   saved_msr;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   unsigned long   saved_dbcr;
+#endif
 };
 
 extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct 
*mm, unsigned long addr);
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index bc77834..c407c07 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -62,10 +62,14 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct 
pt_regs *regs)
	struct arch_uprobe_task *autask = &current->utask->autask;
 
autask->saved_trap_nr = current->thread.trap_nr;
+   autask->saved_msr = regs->msr;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   autask->saved_dbcr = mfspr(SPRN_DBCR0);
+#endif
current->thread.trap_nr = UPROBE_TRAP_NR;
regs->nip = current->utask->xol_vaddr;
 
-   user_enable_single_step(current);
+   enable_single_step(regs);
return 0;
 }
 
@@ -121,8 +125,11 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 * to be executed.
 */
regs->nip = utask->vaddr + MAX_UINSN_BYTES;
+   regs->msr = utask->autask.saved_msr;
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+   mtspr(SPRN_DBCR0, utask->autask.saved_dbcr);
+#endif
 
-   user_disable_single_step(current);
return 0;
 }
 



[PATCH] uprobes/powerpc: Add dependency on single step emulation

2013-01-07 Thread Suzuki K. Poulose
From: Suzuki K. Poulose 

Uprobes uses emulate_step in sstep.c, but we haven't explicitly specified
the dependency. On pseries HAVE_HW_BREAKPOINT protects us, but 44x has no
such luxury.

Consolidate other users that depend on sstep and create a new config option.

Signed-off-by: Ananth N Mavinakayanahalli 
Signed-off-by: Suzuki K. Poulose 
Cc: linuxppc-...@ozlabs.org
Cc: sta...@vger.kernel.org
---
 arch/powerpc/Kconfig  |4 
 arch/powerpc/lib/Makefile |4 +---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 17903f1..dabe429 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -275,6 +275,10 @@ config PPC_ADV_DEBUG_DAC_RANGE
depends on PPC_ADV_DEBUG_REGS && 44x
default y
 
+config PPC_EMULATE_SSTEP
+   bool
+   default y if KPROBES || UPROBES || XMON || HAVE_HW_BREAKPOINT
+
 source "init/Kconfig"
 
 source "kernel/Kconfig.freezer"
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 746e0c8..35baad9 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -19,9 +19,7 @@ obj-$(CONFIG_PPC64)   += copypage_64.o copyuser_64.o \
   checksum_wrappers_64.o hweight_64.o \
   copyuser_power7.o string_64.o copypage_power7.o \
   memcpy_power7.o
-obj-$(CONFIG_XMON) += sstep.o ldstfp.o
-obj-$(CONFIG_KPROBES)  += sstep.o ldstfp.o
-obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += sstep.o ldstfp.o
+obj-$(CONFIG_PPC_EMULATE_SSTEP)+= sstep.o ldstfp.o
 
 ifeq ($(CONFIG_PPC64),y)
 obj-$(CONFIG_SMP)  += locks.o



Re: [PATCH 01/10] coresight: etm-perf: pass struct perf_event to source::enable/disable()

2016-07-20 Thread Suzuki K Poulose

On 18/07/16 20:51, Mathieu Poirier wrote:

With this commit [1] address range filter information is now found
in the struct hw_perf_event::addr_filters.  As such pass the event
itself to the coresight_source::enable/disable() functions so that
both event attribute and filter can be accessible for configuration.

[1] 'commit 375637bc5249 ("perf/core: Introduce address range filtering")'



diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index 385d62e64abb..2a5982c37dfb 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -232,8 +232,9 @@ struct coresight_ops_source {
int (*cpu_id)(struct coresight_device *csdev);
int (*trace_id)(struct coresight_device *csdev);
int (*enable)(struct coresight_device *csdev,
- struct perf_event_attr *attr,  u32 mode);
-   void (*disable)(struct coresight_device *csdev);
+ struct perf_event *event,  u32 mode);
+   void (*disable)(struct coresight_device *csdev,
+   struct perf_event *event);


nit:

Should we make this a bit more generic API rather than hard-coding
the perf stuff in there? i.e.,

how about :

int (*enable)(struct coresight_device *csdev, void *data, u32 mode)

void (*disable)(struct coresight_device *csdev, void *data, u32 mode)

where data is specific to the mode of operation. That way the API is
cleaner and each mode can pass its own data (even though sysfs
doesn't use any at the moment).

Suzuki


Re: [PATCH 03/10] coresight: etm-perf: configuring filters from perf core

2016-07-20 Thread Suzuki K Poulose

On 18/07/16 20:51, Mathieu Poirier wrote:

This patch implements the required API needed to access
and retrieve range and start/stop filters from the perf core.

Signed-off-by: Mathieu Poirier 
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 146 ---
 drivers/hwtracing/coresight/coresight-etm-perf.h |  32 +
 2 files changed, 162 insertions(+), 16 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c 
b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 78a1bc0013a2..fde7f42149c5 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -29,6 +29,7 @@
 #include 

 #include "coresight-priv.h"
+#include "coresight-etm-perf.h"

 static struct pmu etm_pmu;
 static bool etm_perf_up;
@@ -83,12 +84,44 @@ static const struct attribute_group *etm_pmu_attr_groups[] 
= {

 static void etm_event_read(struct perf_event *event) {}

+static int etm_addr_filters_alloc(struct perf_event *event)
+{


...


+   return 0;
+}
+




+
 static int etm_event_init(struct perf_event *event)
 {
+   int ret;
+
if (event->attr.type != etm_pmu.type)
return -ENOENT;

-   return 0;
+   ret = etm_addr_filters_alloc(event);




 }

 static void free_event_data(struct work_struct *work)
@@ -456,6 +489,85 @@ static void etm_free_drv_configs(struct perf_event *event)
}
 }

+static int etm_addr_filters_validate(struct list_head *filters)
+{



+
+   return 0;
+}
+
+static void etm_addr_filters_sync(struct perf_event *event)
+{
+   struct perf_addr_filters_head *head = perf_event_addr_filters(event);
+   unsigned long start, stop, *offs = event->addr_filters_offs;
+   struct etm_filters *filters = event->hw.addr_filters;
+   struct perf_addr_filter *filter;
+   int i = 0;


Is it possible to delay the etm_addr_filters_alloc() until this point?
I understand that this function cannot report back failures if we fail
to allocate memory. Or maybe do a lazy allocation from addr_filters_validate(),
when we get the first filter added.

Of course this could be done as a follow up patch to improve things once
we get the initial framework in.




+
+   list_for_each_entry(filter, &head->list, entry) {
+   start = filter->offset + offs[i];
+   stop = start + filter->size;
+
+   if (filter->range == 1) {
+   filters->filter[i].start_addr = start;
+   filters->filter[i].stop_addr = stop;
+   filters->filter[i].type = ETM_ADDR_TYPE_RANGE;
+   } else {
+   if (filter->filter == 1) {
+   filters->filter[i].start_addr = start;
+   filters->filter[i].type = ETM_ADDR_TYPE_START;
+   } else {
+   filters->filter[i].stop_addr = stop;
+   filters->filter[i].type = ETM_ADDR_TYPE_STOP;
+   }
+   }
+   i++;
+   }
+
+   filters->nr_filters = i;
+/**
+ * struct etm_filters - set of filters for a session
+ * @etm_filter:	All the filters for this session.
+ * @nr_filters:	Number of filters.
+ * @ssstatus:	Status of the start/stop logic.
+ */
+struct etm_filters {
+   struct etm_filter   filter[ETM_ADDR_CMP_MAX];


nit: renaming the variable to etm_filter would make the code a bit more
readable where we populate/validate the filters.

Otherwise looks good

Suzuki


[PATCH] arm64: Fix incorrect per-cpu usage for boot CPU

2016-07-21 Thread Suzuki K Poulose
In smp_prepare_boot_cpu(), we invoke cpuinfo_store_boot_cpu to store
the cpuinfo in a per-cpu ptr, before initialising the per-cpu offset for
the boot CPU. This patch reorders the sequence to make sure we initialise
the per-cpu offset before accessing the per-cpu area.

Commit 4b998ff1885eec ("arm64: Delay cpuinfo_store_boot_cpu") fixed the
issue where we modified the per-cpu area even before the kernel initialises
the per-cpu areas, but failed to wait until the boot CPU updated its
offset.

Fixes: commit 4b998ff1885eec ("arm64: Delay cpuinfo_store_boot_cpu")
Cc: 
Cc: Will Deacon 
Cc: Catalin Marinas 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 62ff3c0..d242e81 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -437,9 +437,9 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 void __init smp_prepare_boot_cpu(void)
 {
+   set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
cpuinfo_store_boot_cpu();
save_boot_cpu_run_el();
-   set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
-- 
2.7.4



[PATCH] [4.7] arm64: Honor nosmp kernel command line option

2016-07-21 Thread Suzuki K Poulose
Passing "nosmp" should boot the kernel with a single processor, without
provision to enable secondary CPUs even if they are present. "nosmp" is
implemented by setting maxcpus=0. At the moment we still mark the secondary
CPUs present even with nosmp, which allows userspace to bring them
up. This patch corrects smp_prepare_cpus() to honor maxcpus == 0.

Commit 44dbcc93ab67145 ("arm64: Fix behavior of maxcpus=N") fixed the
behavior for maxcpus >= 1, but broke maxcpus = 0.

Fixes: commit 44dbcc93ab67145 ("arm64: Fix behavior of maxcpus=N")
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: Mark Rutland 
Cc: James Morse 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/kernel/smp.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d242e81..ec08b7a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -694,6 +694,13 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
smp_store_cpu_info(smp_processor_id());
 
/*
+* If UP is mandated by "nosmp"(implies maxcpus=0), don't bother about
+* secondary CPUs.
+*/
+   if (max_cpus == 0)
+   return;
+
+   /*
 * Initialise the present map (which describes the set of CPUs
 * actually populated at the present time) and release the
 * secondaries from the bootloader.
-- 
2.7.4



Re: [PATCH V2 5/6] coresight: adding sink parameter to function coresight_build_path()

2016-07-21 Thread Suzuki K Poulose

On 20/07/16 21:38, Mathieu Poirier wrote:

Up to now function coresight_build_path() was counting on a sink to
have been selected (from sysFS) prior to being called.  This patch
adds a string argument so that a sink matching the argument can be
selected.




 static int _coresight_build_path(struct coresight_device *csdev,
-struct list_head *path)
+struct list_head *path, const char *sink)
 {
int i;
bool found = false;
struct coresight_node *node;

-   /* An activated sink has been found.  Enqueue the element */
-   if ((csdev->type == CORESIGHT_DEV_TYPE_SINK ||
-csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) && csdev->activated)
-   goto out;
+   /*
+* First see if we are dealing with a sink.  If we have one check if
+* it was selected via sysFS or the perf cmd line.
+*/
+   if (csdev->type == CORESIGHT_DEV_TYPE_SINK ||
+   csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) {
+   /* Activated via perf cmd line */
+   if (sink && !strcmp(dev_name(&csdev->dev), sink))
+   goto out;
+   /* Activated via sysFS */
+   if (csdev->activated)


When a sink is specified, should we skip an activated sink and continue to
find the specified one? Or at least fail with an error, as we may not be
using the sink specified by the user? i.e., maybe:
if (!sink && csdev->activated)
goto out;

Suzuki


Re: [PATCH] coresight: tmc: Cleanup operation mode handling

2016-09-19 Thread Suzuki K Poulose

On 16/09/16 18:07, Mathieu Poirier wrote:

On 14 September 2016 at 07:53, Suzuki K Poulose  wrote:

The mode of operation of the TMC tracked in drvdata->mode is defined
as a local_t type. This is always checked and modified under the
drvdata->spinlock, hence we don't need local_t for it or the
unnecessary synchronisation instructions that come with it. This
change makes the code a bit cleaner.

Also fixes the order in which we update the drvdata->mode to
CS_MODE_DISABLED; i.e., in tmc_disable_etX_sink we change the
mode to CS_MODE_DISABLED before invoking tmc_disable_etX_hw()
which in turn depends on the mode to decide whether to dump the
trace to a buffer.


Thank you for the patch - just a few comments below.





@@ -194,17 +192,17 @@ static int tmc_enable_etf_sink_perf(struct 
coresight_device *csdev, u32 mode)
goto out;
}

-   val = local_xchg(&drvdata->mode, mode);
/*
 * In Perf mode there can be only one writer per sink.  There
 * is also no need to continue if the ETB/ETR is already operated
 * from sysFS.
 */
-   if (val != CS_MODE_DISABLED) {
+   if (drvdata->mode != CS_MODE_DISABLED) {
ret = -EINVAL;
goto out;
}

+   drvdata->mode = mode;


Given the way tmc_enable_etf_sink_perf() is called in
tmc_enable_etf_sink(), I think it is time to get rid of the 'mode'
parameter - it doesn't do anything nowadays.  Same thing for
tmc_enable_etf_sink_sysfs() and ETR.


Sure, makes sense. I will clean it up.


@@ -279,8 +277,8 @@ static void tmc_disable_etf_link(struct coresight_device 
*csdev,
return;
}

+   drvdata->mode = CS_MODE_DISABLED;
tmc_etf_disable_hw(drvdata);
-   local_set(&drvdata->mode, CS_MODE_DISABLED);


I think setting the mode should come after tmc_etf_disable_hw(), as it
was before.


You're right, I will change it.


Thanks for the review. Will send the updated series soon.

Cheers
Suzuki
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.



Re: [PATCH] coresight: tmc: Cleanup operation mode handling

2016-09-19 Thread Suzuki K Poulose

On 19/09/16 17:59, Suzuki K Poulose wrote:

On 16/09/16 18:07, Mathieu Poirier wrote:

On 14 September 2016 at 07:53, Suzuki K Poulose  wrote:



Cheers
Suzuki
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.



Bah, sorry about that. You are fine. This email is intended for the public list 
discussions.

Suzuki


[PATCH v6] arm64: cpuinfo: Expose MIDR_EL1 and REVIDR_EL1 to sysfs

2016-06-21 Thread Suzuki K Poulose
From: Steve Capper 

It can be useful for JIT software to be aware of MIDR_EL1 and
REVIDR_EL1 to ascertain the presence of any core errata that could
affect code generation.

This patch exposes these registers through sysfs:

/sys/devices/system/cpu/cpu$ID/identification/midr
/sys/devices/system/cpu/cpu$ID/identification/revidr

where $ID is the cpu number. For big.LITTLE systems, one can have a
mixture of cores (e.g. Cortex A53 and Cortex A57), thus all CPUs need
to be enumerated.

If the kernel does not have valid information to populate these entries
with, an empty string is returned to userspace.
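As a consumer-side sketch of what JIT software could do with the exposed value: the architected fields of MIDR_EL1 can be unpacked as below (bit layout per the ARM ARM; the struct and helper names are made up for illustration).

```c
#include <stdint.h>

/* Decoded MIDR_EL1 fields: [31:24] implementer, [23:20] variant,
 * [19:16] architecture, [15:4] part number, [3:0] revision. */
struct midr_fields {
	unsigned int implementer;
	unsigned int variant;
	unsigned int arch;
	unsigned int partnum;
	unsigned int revision;
};

static struct midr_fields midr_decode(uint64_t midr)
{
	struct midr_fields f = {
		.implementer = (midr >> 24) & 0xff,
		.variant     = (midr >> 20) & 0xf,
		.arch        = (midr >> 16) & 0xf,
		.partnum     = (midr >> 4)  & 0xfff,
		.revision    = midr & 0xf,
	};
	return f;
}
```

For example, a Cortex-A53 r0p4 typically reads back as 0x410fd034: implementer 0x41 (ARM), part number 0xd03, revision 4.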

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Rutland 
Signed-off-by: Steve Capper 
[ ABI documentation updates, hotplug notifiers ]
Signed-off-by: Suzuki K Poulose 
---
Changes since V5:
  - Add hotplug notifier to {add/remove} the attributes when the CPU is brought
{online/offline}.
  - Replace cpu_hotplug_{disable,enable} => cpu_notifier_register_{begin/done}
  - Remove the redundant check for CPU presence, as the sysfs infrastructure
    already performs that check, returning -ENODEV if the CPU goes offline
    between open() and read().
Changes since V4:
  - Update comment as suggested by Mark Rutland
Changes since V3:
  - Disable cpu hotplug while we initialise
  - Added a comment to explain why expose 64bit value
  - Update Document/ABI/testing/sysfs-devices-system-cpu
Changes since V2:
  - Fix errno for failures (Spotted-by: Russell King)
  - Roll back, if we encounter a missing cpu device
  - Return error for access to registers of CPUs not present.
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  13 +++
 arch/arm64/include/asm/cpu.h   |   1 +
 arch/arm64/kernel/cpuinfo.c| 106 +
 3 files changed, 120 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1650133..8c4607d 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -340,3 +340,16 @@ Description:   POWERNV CPUFreq driver's frequency 
throttle stats directory and
'policyX/throttle_stats' directory and all the attributes are 
same as
the /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats 
directory and
attributes which give the frequency throttle information of the 
chip.
+
+What:  /sys/devices/system/cpu/cpuX/identification/
+   /sys/devices/system/cpu/cpuX/identification/midr
+   /sys/devices/system/cpu/cpuX/identification/revidr
+Date:  June 2016
+Contact:   Linux ARM Kernel Mailing list 

+   Linux Kernel mailing list 
+Description:   ARM64 CPU identification registers
+   'identification' directory exposes the CPU ID registers for
+identifying model and revision of the CPU.
+   - midr : This file gives contents of Main ID Register 
(MIDR_EL1).
+   - revidr : This file gives contents of the Revision ID register
+(REVIDR_EL1).
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 13a6103..116a382 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -29,6 +29,7 @@ struct cpuinfo_arm64 {
u32 reg_cntfrq;
u32 reg_dczid;
u32 reg_midr;
+   u32 reg_revidr;
 
u64 reg_id_aa64dfr0;
u64 reg_id_aa64dfr1;
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index c173d32..44ec263 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -212,6 +212,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
info->reg_ctr = read_cpuid_cachetype();
info->reg_dczid = read_cpuid(DCZID_EL0);
info->reg_midr = read_cpuid_id();
+   info->reg_revidr = read_cpuid(REVIDR_EL1);
 
info->reg_id_aa64dfr0 = read_cpuid(ID_AA64DFR0_EL1);
info->reg_id_aa64dfr1 = read_cpuid(ID_AA64DFR1_EL1);
@@ -264,3 +265,108 @@ void __init cpuinfo_store_boot_cpu(void)
boot_cpu_data = *info;
init_cpu_features(&boot_cpu_data);
 }
+
+/*
+ * The ARM ARM uses the phrase "32-bit register" to describe a register
+ * whose upper 32 bits are RES0 (per C5.1.1, ARM DDI 0487A.i), however
+ * no statement is made as to whether the upper 32 bits will or will not
+ * be made use of in future, and between ARM DDI 0487A.c and ARM DDI
+ * 0487A.d CLIDR_EL1 was expanded from 32-bit to 64-bit.
+ *
+ * Thus, while both MIDR_EL1 and REVIDR_EL1 are described as 32-bit
+ * registers, we expose them both as 64 bit values to cater for possible
+ * future expansion without an ABI break.
+ */
+#define CPUINFO_ATTR_RO(_name)  

[PATCH v3 1/7] coresight: Remove erroneous dma_free_coherent in tmc_probe

2016-06-21 Thread Suzuki K Poulose
commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed")
removed the static allocation of the buffer for trace data in ETR mode in
tmc_probe. However, it failed to remove the dma_free_coherent() call in
tmc_probe's error path, which is taken when the probe fails for other
reasons. This patch gets rid of the incorrect dma_free_coherent() call.

Fixes: commit de5461970b3e9e194 ("coresight: tmc: allocating memory when 
needed")
Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc.c 
b/drivers/hwtracing/coresight/coresight-tmc.c
index 9e02ac9..3978cbb 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -388,9 +388,6 @@ static int tmc_probe(struct amba_device *adev, const struct 
amba_id *id)
 err_misc_register:
coresight_unregister(drvdata->csdev);
 err_devm_kzalloc:
-   if (drvdata->config_type == TMC_CONFIG_TYPE_ETR)
-   dma_free_coherent(dev, drvdata->size,
-   drvdata->vaddr, drvdata->paddr);
return ret;
 }
 
-- 
1.9.1



[PATCH v3 7/7] coresight: Add better messages for coresight_timeout

2016-06-21 Thread Suzuki K Poulose
When we encounter a timeout waiting for a status change via
coresight_timeout, the caller always prints the offset which
was tried. This is pretty much useless, as it doesn't specify
the bit position we wait for, and one needs to look up the
TRM to figure out what went wrong. This patch changes all
such error messages to print something more meaningful.
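For context, coresight_timeout() is essentially a bounded poll on one bit of a status register; below is a rough userspace model of that behaviour (names and the retry scheme are simplified, not the real implementation, which also sleeps between polls):

```c
#include <stdint.h>

/*
 * Spin until bit 'position' of *reg reads 'value', giving up after
 * 'tries' polls. Returns 0 on success, -1 on timeout -- the case
 * where the callers above print their diagnostic message.
 */
static int poll_bit(const volatile uint32_t *reg, int position,
		    int value, int tries)
{
	while (tries-- > 0) {
		if (((*reg >> position) & 1u) == (uint32_t)value)
			return 0;
	}
	return -1;
}
```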

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-etb10.c | 6 ++
 drivers/hwtracing/coresight/coresight-etm4x.c | 6 ++
 drivers/hwtracing/coresight/coresight-tmc.c   | 6 ++
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etb10.c 
b/drivers/hwtracing/coresight/coresight-etb10.c
index 4d20b0b..3b483e3 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -184,8 +184,7 @@ static void etb_disable_hw(struct etb_drvdata *drvdata)
 
if (coresight_timeout(drvdata->base, ETB_FFCR, ETB_FFCR_BIT, 0)) {
dev_err(drvdata->dev,
-   "timeout observed when probing at offset %#x\n",
-   ETB_FFCR);
+   "timeout while waiting for completion of Manual Flush\n");
}
 
/* disable trace capture */
@@ -193,8 +192,7 @@ static void etb_disable_hw(struct etb_drvdata *drvdata)
 
if (coresight_timeout(drvdata->base, ETB_FFSR, ETB_FFSR_BIT, 1)) {
dev_err(drvdata->dev,
-   "timeout observed when probing at offset %#x\n",
-   ETB_FFCR);
+   "timeout while waiting for Formatter to Stop\n");
}
 
CS_LOCK(drvdata->base);
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c 
b/drivers/hwtracing/coresight/coresight-etm4x.c
index 88947f3..c8b44c6 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -111,8 +111,7 @@ static void etm4_enable_hw(void *info)
/* wait for TRCSTATR.IDLE to go up */
if (coresight_timeout(drvdata->base, TRCSTATR, TRCSTATR_IDLE_BIT, 1))
dev_err(drvdata->dev,
-   "timeout observed when probing at offset %#x\n",
-   TRCSTATR);
+   "timeout while waiting for Idle Trace Status\n");
 
writel_relaxed(config->pe_sel, drvdata->base + TRCPROCSELR);
writel_relaxed(config->cfg, drvdata->base + TRCCONFIGR);
@@ -184,8 +183,7 @@ static void etm4_enable_hw(void *info)
/* wait for TRCSTATR.IDLE to go back down to '0' */
if (coresight_timeout(drvdata->base, TRCSTATR, TRCSTATR_IDLE_BIT, 0))
dev_err(drvdata->dev,
-   "timeout observed when probing at offset %#x\n",
-   TRCSTATR);
+   "timeout while waiting for Idle Trace Status\n");
 
CS_LOCK(drvdata->base);
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c 
b/drivers/hwtracing/coresight/coresight-tmc.c
index b3275bb..84052c7 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -38,8 +38,7 @@ void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata)
if (coresight_timeout(drvdata->base,
  TMC_STS, TMC_STS_TMCREADY_BIT, 1)) {
dev_err(drvdata->dev,
-   "timeout observed when probing at offset %#x\n",
-   TMC_STS);
+   "timeout while waiting for TMC to be Ready\n");
}
 }
 
@@ -56,8 +55,7 @@ void tmc_flush_and_stop(struct tmc_drvdata *drvdata)
if (coresight_timeout(drvdata->base,
  TMC_FFCR, TMC_FFCR_FLUSHMAN_BIT, 0)) {
dev_err(drvdata->dev,
-   "timeout observed when probing at offset %#x\n",
-   TMC_FFCR);
+   "timeout while waiting for completion of Manual Flush\n");
}
 
tmc_wait_for_tmcready(drvdata);
-- 
1.9.1



[PATCH v3 3/7] coresight: Fix csdev connections initialisation

2016-06-21 Thread Suzuki K Poulose
This is a cleanup patch.

coresight_device->conns holds an array describing the devices
connected to the OUT ports of a component. Sinks, e.g. ETR, do not
have an OUT port (nr_outport = 0), as they stream the trace to
memory via AXI.

At coresight_register() we do :

conns = kcalloc(csdev->nr_outport, sizeof(*conns), GFP_KERNEL);
if (!conns) {
ret = -ENOMEM;
goto err_kzalloc_conns;
}

For ETR, since the total size requested from kcalloc is zero, the return
value is ZERO_SIZE_PTR (!= NULL). Hence csdev->conns = ZERO_SIZE_PTR,
which cannot later be verified to be a valid pointer. The code which
accesses csdev->conns is bounded by the csdev->nr_outport check,
hence we don't try to dereference the ZERO_SIZE_PTR. This patch cleans
up the csdev->conns initialisation to make sure we initialise it
properly (i.e., either NULL or a valid conns array).
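The pitfall has a userspace analogue: with glibc, malloc(0)/calloc(0, n) may return a unique non-NULL pointer, so "non-NULL" alone does not prove there is usable storage. A sketch of the guarded-allocation shape the patch adopts (the struct and helper names are illustrative, not from the driver):

```c
#include <stdlib.h>

struct connection {
	int outport;
};

/*
 * Allocate 'n' connection slots. With the count guard, a component
 * with no outports keeps a plain NULL instead of a zero-size
 * allocation, so the pointer can be tested meaningfully later.
 */
static struct connection *alloc_conns(size_t n)
{
	struct connection *conns = NULL;

	if (n)
		conns = calloc(n, sizeof(*conns));

	return conns;	/* NULL when n == 0 or on allocation failure */
}
```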

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight.c 
b/drivers/hwtracing/coresight/coresight.c
index d08d1ab..7ba9561 100644
--- a/drivers/hwtracing/coresight/coresight.c
+++ b/drivers/hwtracing/coresight/coresight.c
@@ -893,7 +893,7 @@ struct coresight_device *coresight_register(struct 
coresight_desc *desc)
int nr_refcnts = 1;
atomic_t *refcnts = NULL;
struct coresight_device *csdev;
-   struct coresight_connection *conns;
+   struct coresight_connection *conns = NULL;
 
csdev = kzalloc(sizeof(*csdev), GFP_KERNEL);
if (!csdev) {
@@ -921,16 +921,20 @@ struct coresight_device *coresight_register(struct 
coresight_desc *desc)
 
csdev->nr_inport = desc->pdata->nr_inport;
csdev->nr_outport = desc->pdata->nr_outport;
-   conns = kcalloc(csdev->nr_outport, sizeof(*conns), GFP_KERNEL);
-   if (!conns) {
-   ret = -ENOMEM;
-   goto err_kzalloc_conns;
-   }
 
-   for (i = 0; i < csdev->nr_outport; i++) {
-   conns[i].outport = desc->pdata->outports[i];
-   conns[i].child_name = desc->pdata->child_names[i];
-   conns[i].child_port = desc->pdata->child_ports[i];
+   /* Initialise connections if there is at least one outport */
+   if (csdev->nr_outport) {
+   conns = kcalloc(csdev->nr_outport, sizeof(*conns), GFP_KERNEL);
+   if (!conns) {
+   ret = -ENOMEM;
+   goto err_kzalloc_conns;
+   }
+
+   for (i = 0; i < csdev->nr_outport; i++) {
+   conns[i].outport = desc->pdata->outports[i];
+   conns[i].child_name = desc->pdata->child_names[i];
+   conns[i].child_port = desc->pdata->child_ports[i];
+   }
}
 
csdev->conns = conns;
-- 
1.9.1



[PATCH v3 4/7] coresight: tmc: Limit the trace to available data

2016-06-21 Thread Suzuki K Poulose
At present the ETF or ETR gives out the entire device buffer, even
if little or even no trace data is available. This patch limits the
trace data given out to the actual trace data collected.
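The ETR side of this fix boils down to computing the window of valid data from the write pointer. A simplified model (plain integers stand in for DMA addresses; the struct and function names are hypothetical):

```c
#include <stdint.h>

struct etr_state {
	uint64_t paddr;	/* DMA start address of the buffer    */
	uint64_t rwp;	/* RAM write pointer                  */
	uint64_t size;	/* total buffer size                  */
	int full;	/* TMC_STS 'full' bit: buffer wrapped */
};

/*
 * Mirror of the fixed tmc_etr_dump_hw() logic: a wrapped buffer
 * holds 'size' bytes of trace with the oldest byte at rwp, while
 * an unwrapped one holds rwp - paddr bytes from the start.
 */
static void etr_trace_window(const struct etr_state *s,
			     uint64_t *start_off, uint64_t *len)
{
	if (s->full) {
		*start_off = s->rwp - s->paddr;
		*len = s->size;
	} else {
		*start_off = 0;
		*len = s->rwp - s->paddr;
	}
}
```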

Cc: mathieu.poir...@linaro.org
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c |  2 ++
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 12 +---
 drivers/hwtracing/coresight/coresight-tmc.c |  6 +++---
 drivers/hwtracing/coresight/coresight-tmc.h |  4 +++-
 4 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 466af86..e68289b 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -48,6 +48,7 @@ static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)
int i;
 
bufp = drvdata->buf;
+   drvdata->len = 0;
while (1) {
for (i = 0; i < drvdata->memwidth; i++) {
read_data = readl_relaxed(drvdata->base + TMC_RRD);
@@ -55,6 +56,7 @@ static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)
return;
memcpy(bufp, &read_data, 4);
bufp += 4;
+   drvdata->len += 4;
}
}
 }
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 688be9e..03f36cb 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -64,11 +64,17 @@ static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
rwp = readl_relaxed(drvdata->base + TMC_RWP);
val = readl_relaxed(drvdata->base + TMC_STS);
 
-   /* How much memory do we still have */
-   if (val & BIT(0))
+   /*
+* Adjust the buffer to point to the beginning of the trace data
+* and update the available trace data.
+*/
+   if (val & BIT(0)) {
drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
-   else
+   drvdata->len = drvdata->size;
+   } else {
drvdata->buf = drvdata->vaddr;
+   drvdata->len = rwp - drvdata->paddr;
+   }
 }
 
 static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c 
b/drivers/hwtracing/coresight/coresight-tmc.c
index 1b47258..b3275bb 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -140,8 +140,8 @@ static ssize_t tmc_read(struct file *file, char __user 
*data, size_t len,
   struct tmc_drvdata, miscdev);
char *bufp = drvdata->buf + *ppos;
 
-   if (*ppos + len > drvdata->size)
-   len = drvdata->size - *ppos;
+   if (*ppos + len > drvdata->len)
+   len = drvdata->len - *ppos;
 
if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) {
if (bufp == (char *)(drvdata->vaddr + drvdata->size))
@@ -160,7 +160,7 @@ static ssize_t tmc_read(struct file *file, char __user 
*data, size_t len,
*ppos += len;
 
dev_dbg(drvdata->dev, "%s: %zu bytes copied, %d bytes left\n",
-   __func__, len, (int)(drvdata->size - *ppos));
+   __func__, len, (int)(drvdata->len - *ppos));
return len;
 }
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h 
b/drivers/hwtracing/coresight/coresight-tmc.h
index 5c5fe2a..44b3ae3 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -98,7 +98,8 @@ enum tmc_mem_intf_width {
  * @buf:   area of memory where trace data get sent.
  * @paddr: DMA start location in RAM.
  * @vaddr: virtual representation of @paddr.
- * @size:  @buf size.
+ * @size:  trace buffer size.
+ * @len:   size of the available trace.
  * @mode:  how this TMC is being used.
  * @config_type: TMC variant, must be of type @tmc_config_type.
  * @memwidth:  width of the memory interface databus, in bytes.
@@ -115,6 +116,7 @@ struct tmc_drvdata {
dma_addr_t  paddr;
void __iomem*vaddr;
u32 size;
+   u32 len;
local_t mode;
enum tmc_config_typeconfig_type;
enum tmc_mem_intf_width memwidth;
-- 
1.9.1



[PATCH v3 5/7] coresight: etmv4: Fix ETMv4x peripheral ID table

2016-06-21 Thread Suzuki K Poulose
This patch cleans up the peripheral id table for different ETMv4
implementations.

As per the Cortex-A53 TRM, the ETM has the following ID register values:

Register        Value   Offset
Peripheral ID0  0x5D    0xFE0
Peripheral ID1  0xB9    0xFE4
Peripheral ID2  0x4B    0xFE8
Peripheral ID3  0x00    0xFEC

where PID2 has the following format:

[7:4]   Revision
[3] JEDEC   0b1 res1. Indicates a JEP106 identity code is used
[2:0]   DES_1   0b011   ARM Limited. This is bits[6:4] of JEP106 ID code

The existing table entry checks only bits [1:0], which is not
sufficient. Fix it to match bits [3:0], just like the other
entries do. While at it, correct the comments for the A57 and A53 entries.
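The effect of widening the mask can be seen by modelling AMBA ID matching, which is roughly (pid & mask) == id. In the sketch below the PID word is assembled from the four bytes listed above; the mask values are illustrative (widened per the commit message to cover PID2's low nibble, i.e. the JEDEC and DES_1 bits), and the "other vendor" PID is hypothetical. The real matching lives in the AMBA bus driver.

```c
#include <stdint.h>

struct amba_id_entry {
	uint32_t id;
	uint32_t mask;
};

static int amba_match(const struct amba_id_entry *e, uint32_t pid)
{
	return (pid & e->mask) == e->id;
}

/* PID3..PID0 packed into one word, as in the table above. */
static uint32_t make_pid(uint8_t p3, uint8_t p2, uint8_t p1, uint8_t p0)
{
	return ((uint32_t)p3 << 24) | ((uint32_t)p2 << 16) |
	       ((uint32_t)p1 << 8) | p0;
}
```

A loose entry matching only bits [1:0] of PID2's nibble would accept a part with a different designer code; the widened mask rejects it.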

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-etm4x.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c 
b/drivers/hwtracing/coresight/coresight-etm4x.c
index 462f0dc..88947f3 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -815,12 +815,12 @@ err_arch_supported:
 }
 
 static struct amba_id etm4_ids[] = {
-   {   /* ETM 4.0 - Qualcomm */
-   .id = 0x0003b95d,
-   .mask   = 0x0003,
+   {   /* ETM 4.0 - Cortex-A53  */
+   .id = 0x000bb95d,
+   .mask   = 0x000f,
.data   = "ETM 4.0",
},
-   {   /* ETM 4.0 - Juno board */
+   {   /* ETM 4.0 - Cortex-A57 */
.id = 0x000bb95e,
.mask   = 0x000f,
.data   = "ETM 4.0",
-- 
1.9.1



[PATCH v3 6/7] coresight: Cleanup TMC status check

2016-06-21 Thread Suzuki K Poulose
Use the defined symbol rather than hardcoding the value to
check whether the TMC buffer is full.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 03f36cb..6d7de03 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -68,7 +68,7 @@ static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
 * Adjust the buffer to point to the beginning of the trace data
 * and update the available trace data.
 */
-   if (val & BIT(0)) {
+   if (val & TMC_STS_FULL) {
drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
drvdata->len = drvdata->size;
} else {
-- 
1.9.1



[PATCH v3 2/7] coresight: Consolidate error handling path for tmc_probe

2016-06-21 Thread Suzuki K Poulose
This patch cleans up the error handling path for tmc_probe
as a side effect of the removal of the spurious dma_free_coherent().
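The single-exit shape the rework converges on is the usual kernel goto-cleanup pattern. A userspace sketch with hypothetical resources (the function and parameter names are made up; `fail_at` just simulates a failure at a given step):

```c
#include <stdlib.h>

/*
 * Every failure path jumps to one 'out' label, so the cleanup code
 * is written once -- the shape tmc_probe() takes after this patch.
 */
static int setup(int fail_at)
{
	int ret = -1;
	char *a = NULL, *b = NULL;

	a = malloc(16);
	if (!a || fail_at == 1)
		goto out;

	b = malloc(16);
	if (!b || fail_at == 2)
		goto out;

	ret = 0;	/* success */
out:
	free(b);	/* free(NULL) is a no-op */
	free(a);
	return ret;
}
```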

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc.c | 36 ++---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc.c 
b/drivers/hwtracing/coresight/coresight-tmc.c
index 3978cbb..1b47258 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -309,22 +309,31 @@ static int tmc_probe(struct amba_device *adev, const 
struct amba_id *id)
 
if (np) {
pdata = of_get_coresight_platform_data(dev, np);
-   if (IS_ERR(pdata))
-   return PTR_ERR(pdata);
+   if (IS_ERR(pdata)) {
+   ret = PTR_ERR(pdata);
+   goto out;
+   }
adev->dev.platform_data = pdata;
}
 
+   ret = -ENOMEM;
drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
if (!drvdata)
-   return -ENOMEM;
+   goto out;
+
+   desc = devm_kzalloc(dev, sizeof(*desc), GFP_KERNEL);
+   if (!desc)
+   goto out;
 
drvdata->dev = &adev->dev;
dev_set_drvdata(dev, drvdata);
 
/* Validity for the resource is already checked by the AMBA core */
base = devm_ioremap_resource(dev, res);
-   if (IS_ERR(base))
-   return PTR_ERR(base);
+   if (IS_ERR(base)) {
+   ret = PTR_ERR(base);
+   goto out;
+   }
 
drvdata->base = base;
 
@@ -347,12 +356,6 @@ static int tmc_probe(struct amba_device *adev, const 
struct amba_id *id)
 
pm_runtime_put(&adev->dev);
 
-   desc = devm_kzalloc(dev, sizeof(*desc), GFP_KERNEL);
-   if (!desc) {
-   ret = -ENOMEM;
-   goto err_devm_kzalloc;
-   }
-
desc->pdata = pdata;
desc->dev = dev;
desc->subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_BUFFER;
@@ -373,7 +376,7 @@ static int tmc_probe(struct amba_device *adev, const struct 
amba_id *id)
drvdata->csdev = coresight_register(desc);
if (IS_ERR(drvdata->csdev)) {
ret = PTR_ERR(drvdata->csdev);
-   goto err_devm_kzalloc;
+   goto out;
}
 
drvdata->miscdev.name = pdata->name;
@@ -381,13 +384,8 @@ static int tmc_probe(struct amba_device *adev, const 
struct amba_id *id)
drvdata->miscdev.fops = &tmc_fops;
ret = misc_register(&drvdata->miscdev);
if (ret)
-   goto err_misc_register;
-
-   return 0;
-
-err_misc_register:
-   coresight_unregister(drvdata->csdev);
-err_devm_kzalloc:
+   coresight_unregister(drvdata->csdev);
+out:
return ret;
 }
 
-- 
1.9.1



[PATCH v3 0/7] coresight: Miscellaneous fixes

2016-06-21 Thread Suzuki K Poulose
This is a collection of cleanups and minor enhancements to the
coresight driver. Applies on v4.7-rc4

Changes since V2:
 - Removed patches already queued as fixes for 4.7
 - Addressed comments on V2.

Changes since V1:
 - Added a patch to limit the trace data
 - Added a patch to fix tmc_read_unprepare_etr (another crash)
 - Split the patch to remove the erroneous dma_free_coherent.
 - Use a consistent error message format for the coresight_timeout cleanup.
 - Fixed checkpatch warnings on the commit description; there are
   some errors reported on the "crash output" in the commit description.
   Maybe checkpatch needs to be fixed?


Suzuki K Poulose (7):
  coresight: Remove erroneous dma_free_coherent in tmc_probe
  coresight: Consolidate error handling path for tmc_probe
  coresight: Fix csdev connections initialisation
  coresight: tmc: Limit the trace to available data
  coresight: etmv4: Fix ETMv4x peripheral ID table
  coresight: Cleanup TMC status check
  coresight: Add better messages for coresight_timeout

 drivers/hwtracing/coresight/coresight-etb10.c   |  6 +--
 drivers/hwtracing/coresight/coresight-etm4x.c   | 14 +++
 drivers/hwtracing/coresight/coresight-tmc-etf.c |  2 +
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 12 --
 drivers/hwtracing/coresight/coresight-tmc.c | 51 +++--
 drivers/hwtracing/coresight/coresight-tmc.h |  4 +-
 drivers/hwtracing/coresight/coresight.c | 24 +++-
 7 files changed, 58 insertions(+), 55 deletions(-)

-- 
1.9.1



[PATCH 1/3] coresight: tmc: Cleanup operation mode handling

2016-09-27 Thread Suzuki K Poulose
The mode of operation of the TMC tracked in drvdata->mode is defined
as a local_t type. It is always checked and modified under
drvdata->spinlock, hence we don't need local_t for it or the
unnecessary synchronisation instructions that come with it. This
change makes the code a bit cleaner.

Also fixes the order in which we update drvdata->mode to
CS_MODE_DISABLED: in tmc_disable_etX_sink we were changing the
mode to CS_MODE_DISABLED before invoking tmc_disable_etX_hw(),
which in turn depends on the mode to decide whether to dump the
trace to a buffer.
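The ordering bug described above can be reduced to a few lines: if the mode is cleared before the hardware-disable routine runs, that routine's own mode check skips the dump. A minimal model (all names made up; locking omitted for brevity):

```c
enum cs_mode { CS_MODE_DISABLED, CS_MODE_SYSFS };

struct sink {
	enum cs_mode mode;
	int dumped;
};

/* Mirrors tmc_etb_disable_hw(): dumps only while mode is sysFS. */
static void disable_hw(struct sink *s)
{
	if (s->mode == CS_MODE_SYSFS)
		s->dumped = 1;
}

/* Old order: mode cleared first, so the dump never happens. */
static void disable_sink_buggy(struct sink *s)
{
	s->mode = CS_MODE_DISABLED;
	disable_hw(s);
}

/* Fixed order: disable (and dump) before clearing the mode. */
static void disable_sink_fixed(struct sink *s)
{
	if (s->mode != CS_MODE_DISABLED) {
		disable_hw(s);
		s->mode = CS_MODE_DISABLED;
	}
}
```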

Applies on mathieu's coresight/next tree [1]

https://git.linaro.org/kernel/coresight.git next

Reported-by: Venkatesh Vivekanandan 
Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 32 +++--
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 26 +---
 drivers/hwtracing/coresight/coresight-tmc.h |  2 +-
 3 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index d6941ea..e80a8f4 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -70,7 +70,7 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 * When operating in sysFS mode the content of the buffer needs to be
 * read before the TMC is disabled.
 */
-   if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
tmc_disable_hw(drvdata);
 
@@ -108,7 +108,6 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
int ret = 0;
bool used = false;
char *buf = NULL;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -138,13 +137,12 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In sysFS mode we can have multiple writers per sink.  Since this
 * sink is already enabled no memory is needed and the HW need not be
 * touched.
 */
-   if (val == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
goto out;
 
/*
@@ -163,6 +161,7 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
drvdata->buf = buf;
}
 
+   drvdata->mode = CS_MODE_SYSFS;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -180,7 +179,6 @@ out:
 static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
 {
int ret = 0;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -194,17 +192,17 @@ static int tmc_enable_etf_sink_perf(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In Perf mode there can be only one writer per sink.  There
 * is also no need to continue if the ETB/ETR is already operated
 * from sysFS.
 */
-   if (val != CS_MODE_DISABLED) {
+   if (drvdata->mode != CS_MODE_DISABLED) {
ret = -EINVAL;
goto out;
}
 
+   drvdata->mode = mode;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -227,7 +225,6 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)
 
 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -237,10 +234,11 @@ static void tmc_disable_etf_sink(struct coresight_device 
*csdev)
return;
}
 
-   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
-   if (val != CS_MODE_DISABLED)
+   if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etb_disable_hw(drvdata);
+   drvdata->mode = CS_MODE_DISABLED;
+   }
 
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
@@ -260,7 +258,7 @@ static int tmc_enable_etf_link(struct coresight_device 
*csdev,
}
 
tmc_etf_enable_hw(drvdata);
-   local_set(&drvdata->mode, CS_MODE_SYSFS);
+   drvdata->mode = CS_MODE_SYSFS;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
dev_info(drvdata->dev, "TMC-ETF enabled\n");
@@ -280,7 +278,7 @@ static void tmc_disable_etf_link(struct coresight_device 

[PATCH 2/3] coresight: tmc: Get rid of mode parameter for helper routines

2016-09-27 Thread Suzuki K Poulose
Get rid of the superfluous mode parameter and the check for
the mode in tmc_etX_enable_sink_{perf/sysfs}. While at it, also
remove the unnecessary WARN_ON() checks.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 18 +-
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 15 ---
 2 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index e80a8f4..1549436 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -103,7 +103,7 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
 {
int ret = 0;
bool used = false;
@@ -111,10 +111,6 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_SYSFS))
-   return -EINVAL;
-
/*
 * If we don't have a buffer release the lock and allocate memory.
 * Otherwise keep the lock and move along.
@@ -176,16 +172,12 @@ out:
return ret;
 }
 
-static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_perf(struct coresight_device *csdev)
 {
int ret = 0;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_PERF))
-   return -EINVAL;
-
spin_lock_irqsave(&drvdata->spinlock, flags);
if (drvdata->reading) {
ret = -EINVAL;
@@ -202,7 +194,7 @@ static int tmc_enable_etf_sink_perf(struct coresight_device 
*csdev, u32 mode)
goto out;
}
 
-   drvdata->mode = mode;
+   drvdata->mode = CS_MODE_PERF;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -214,9 +206,9 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)
 {
switch (mode) {
case CS_MODE_SYSFS:
-   return tmc_enable_etf_sink_sysfs(csdev, mode);
+   return tmc_enable_etf_sink_sysfs(csdev);
case CS_MODE_PERF:
-   return tmc_enable_etf_sink_perf(csdev, mode);
+   return tmc_enable_etf_sink_perf(csdev);
}
 
/* We shouldn't be here */
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index f23ef0c..3b84d0d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -93,7 +93,7 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 {
int ret = 0;
bool used = false;
@@ -102,9 +102,6 @@ static int tmc_enable_etr_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
dma_addr_t paddr;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_SYSFS))
-   return -EINVAL;
 
/*
 * If we don't have a buffer release the lock and allocate memory.
@@ -170,16 +167,12 @@ out:
return ret;
 }
 
-static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
 {
int ret = 0;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_PERF))
-   return -EINVAL;
-
spin_lock_irqsave(&drvdata->spinlock, flags);
if (drvdata->reading) {
ret = -EINVAL;
@@ -208,9 +201,9 @@ static int tmc_enable_etr_sink(struct coresight_device 
*csdev, u32 mode)
 {
switch (mode) {
case CS_MODE_SYSFS:
-   return tmc_enable_etr_sink_sysfs(csdev, mode);
+   return tmc_enable_etr_sink_sysfs(csdev);
case CS_MODE_PERF:
-   return tmc_enable_etr_sink_perf(csdev, mode);
+   return tmc_enable_etr_sink_perf(csdev);
}
 
/* We shouldn't be here */
-- 
2.7.4



[PATCH 3/3] coresight: tmc: Remove duplicate memset

2016-09-27 Thread Suzuki K Poulose
tmc_etr_enable_hw() fills the buffer with zeroes before enabling
the hardware, so we don't need an explicit memset() in
tmc_enable_etr_sink_sysfs() before calling tmc_etr_enable_hw().
This patch removes the explicit memset() from tmc_enable_etr_sink_sysfs().

Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 3b84d0d..5d31269 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -150,8 +150,6 @@ static int tmc_enable_etr_sink_sysfs(struct 
coresight_device *csdev)
drvdata->buf = drvdata->vaddr;
}
 
-   memset(drvdata->vaddr, 0, drvdata->size);
-
drvdata->mode = CS_MODE_SYSFS;
tmc_etr_enable_hw(drvdata);
 out:
-- 
2.7.4



Re: [PATCH v11] acpi, apei, arm64: APEI initial support for aarch64.

2016-07-28 Thread Suzuki K Poulose

On 27/07/16 18:29, fu@linaro.org wrote:

From: Tomasz Nowicki 

This commit provides APEI arch-specific bits for aarch64

Meanwhile,
(1) add a new subfunction "hest_ia32_init" for
"acpi_disable_cmcff", which is used by IA-32 Architecture
Corrected Machine Check (CMC);
(2) move HEST type (ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) checking to
a generic place;
(3) select HAVE_ACPI_APEI when EFI and ACPI are set on ARM64,
because arch_apei_get_mem_attribute uses efi_mem_attributes on ARM64.

[Fu Wei: improve && upstream]





diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 5420cb0..d3d02dc 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -17,6 +17,7 @@

 #include 
 #include 
+#include 

 /* Macros for consistency checks of the GICC subtable of MADT */
 #define ACPI_MADT_GICC_LENGTH  \
@@ -110,8 +111,21 @@ static inline const char *acpi_get_enable_method(int cpu)
 }

 #ifdef CONFIG_ACPI_APEI
+#define acpi_disable_cmcff 1
 pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
-#endif
+
+/*
+ * Despite its name, this function must still broadcast the TLB
+ * invalidation in order to ensure other CPUs don't up with with junk


Please fix the comment above ^^^.


Suzuki


Re: [PATCH] coresight: tmc: fix for trace collection bug in sysFS mode

2016-09-14 Thread Suzuki K Poulose

On 13/09/16 16:41, Mathieu Poirier wrote:

On 13 September 2016 at 06:20, Venkatesh Vivekanandan
 wrote:

tmc_etb_dump_hw is never called in sysFS mode to collect trace from
hardware, because drvdata->mode is set to CS_MODE_DISABLED at
tmc_disable_etf/etr_sink

static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
{
.
.
if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
.
.
}

static void tmc_disable_etf_sink(struct coresight_device *csdev)
{
   .
   .
val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
if (val != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata);


You are correct.


   .
   .
}

Signed-off-by: Venkatesh Vivekanandan 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 9 +
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 466af86..c7fb7f7 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -61,6 +61,8 @@ static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)

 static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 {
+   long val;
+
CS_UNLOCK(drvdata->base);

tmc_flush_and_stop(drvdata);
@@ -68,7 +70,8 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 * When operating in sysFS mode the content of the buffer needs to be
 * read before the TMC is disabled.
 */
-   if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
+   if (val == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
tmc_disable_hw(drvdata);

@@ -225,7 +228,6 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)

 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);

@@ -235,9 +237,8 @@ static void tmc_disable_etf_sink(struct coresight_device 
*csdev)
return;
}

-   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
-   if (val != CS_MODE_DISABLED)
+   if (local_read(&drvdata->mode) != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata);


This would work but tmc_enable_etf_sink() and tmc_disable_etf_sink()
are no longer balanced.  Another approach would be to add a "mode"
parameter to tmc_etb_disable_hw() and do something like:

if (val != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata, val);

In tmc_etb_disable_hw(), if mode == CS_MODE_SYSFS then we can move
ahead with the dump operation.  The same applies to the ETR.


I think we should :

1) First switch drvdata->mode to a normal type from local_t. Using an
atomic type for mode is completely unnecessary and comes with the overhead
of barriers/synchronisation instructions, while all accesses, including
reads and writes, are performed under the drvdata->spinlock. I have a patch
for this already, which I plan to send soon.

and

2) Do something like :

void  tmc_disable_etX_sink()
{
if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etX_disable_hw(drvdata);
drvdata->mode = CS_MODE_DISABLED;
}
}

Leaving the tmc_etX_disable_hw() untouched.


Suzuki





Re: [PATCH] coresight: tmc: fix for trace collection bug in sysFS mode

2016-09-14 Thread Suzuki K Poulose

On 14/09/16 12:30, Venkatesh Vivekanandan wrote:



On Wed, Sep 14, 2016 at 3:26 PM, Suzuki K Poulose mailto:suzuki.poul...@arm.com>> wrote:

On 13/09/16 16:41, Mathieu Poirier wrote:

On 13 September 2016 at 06:20, Venkatesh Vivekanandan
mailto:venkatesh.vivekanan...@broadcom.com>> wrote:

tmc_etb_dump_hw is never called in sysFS mode to collect trace from
hardware, because drvdata->mode is set to CS_MODE_DISABLED at
tmc_disable_etf/etr_sink

static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
{
.
.
if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
.
.
}

static void tmc_disable_etf_sink(struct coresight_device *csdev)
{
   .
   .
val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
if (val != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata);


You are correct.

   .
   .
}

 

I think we should :

1) First switch drvdata->mode to a normal type from local_t. Using an
atomic type for mode is completely unnecessary and comes with the overhead
of barriers/synchronisation instructions, while all accesses, including
reads and writes, are performed under the drvdata->spinlock. I have a patch
for this already, which I plan to send soon.

and

2) Do something like :

void  tmc_disable_etX_sink()
{
if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etX_disable_hw(drvdata);
drvdata->mode = CS_MODE_DISABLED;
}
}

You will fix this along with above changes?


Yes.

nit: Please fix your mail client. Do not use HTML-formatted emails on the
mailing list.

Cheers
Suzuki


[PATCH] coresight: tmc: Cleanup operation mode handling

2016-09-14 Thread Suzuki K Poulose
The mode of operation of the TMC, tracked in drvdata->mode, is defined
as a local_t type. This is always checked and modified under the
drvdata->spinlock, so we don't need local_t for it or the
unnecessary synchronisation instructions that come with it. This
change makes the code a bit cleaner.

Also fixes the order in which we update drvdata->mode to
CS_MODE_DISABLED: in tmc_disable_etX_sink we now change the mode to
CS_MODE_DISABLED only after invoking tmc_etX_disable_hw(), which in
turn depends on the mode to decide whether to dump the trace to a
buffer.

Applies on Mathieu's coresight/next tree [1]

https://git.linaro.org/kernel/coresight.git next

Reported-by: Venkatesh Vivekanandan 
Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 32 +++--
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 30 ++-
 drivers/hwtracing/coresight/coresight-tmc.h |  2 +-
 3 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index d6941ea..c51ce45 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -70,7 +70,7 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 * When operating in sysFS mode the content of the buffer needs to be
 * read before the TMC is disabled.
 */
-   if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
tmc_disable_hw(drvdata);
 
@@ -108,7 +108,6 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
int ret = 0;
bool used = false;
char *buf = NULL;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -138,13 +137,12 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In sysFS mode we can have multiple writers per sink.  Since this
 * sink is already enabled no memory is needed and the HW need not be
 * touched.
 */
-   if (val == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
goto out;
 
/*
@@ -163,6 +161,7 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
drvdata->buf = buf;
}
 
+   drvdata->mode = CS_MODE_SYSFS;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -180,7 +179,6 @@ out:
 static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
 {
int ret = 0;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -194,17 +192,17 @@ static int tmc_enable_etf_sink_perf(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In Perf mode there can be only one writer per sink.  There
 * is also no need to continue if the ETB/ETR is already operated
 * from sysFS.
 */
-   if (val != CS_MODE_DISABLED) {
+   if (drvdata->mode != CS_MODE_DISABLED) {
ret = -EINVAL;
goto out;
}
 
+   drvdata->mode = mode;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -227,7 +225,6 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)
 
 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -237,10 +234,11 @@ static void tmc_disable_etf_sink(struct coresight_device 
*csdev)
return;
}
 
-   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
-   if (val != CS_MODE_DISABLED)
+   if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etb_disable_hw(drvdata);
+   drvdata->mode = CS_MODE_DISABLED;
+   }
 
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
@@ -260,7 +258,7 @@ static int tmc_enable_etf_link(struct coresight_device 
*csdev,
}
 
tmc_etf_enable_hw(drvdata);
-   local_set(&drvdata->mode, CS_MODE_SYSFS);
+   drvdata->mode = CS_MODE_SYSFS;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
dev_info(drvdata->dev, "TMC-ETF enabled\n");
@@ -279,8 +277,8 @@ static void tmc_disable_etf_link(struct coresig

Re: [PATCH V7 5/5] perf tools: adding sink configuration for cs_etm PMU

2016-08-31 Thread Suzuki K Poulose

On 30/08/16 17:19, Mathieu Poirier wrote:

Using the PMU::set_drv_config() callback to enable the CoreSight
sink that will be used for the trace session.



+int cs_etm_set_drv_config(struct perf_evsel_config_term *term)
+{
+   int ret;
+   char enable_sink[ENABLE_SINK_MAX];
+
+   snprintf(enable_sink, ENABLE_SINK_MAX, "%s/%s",
+term->val.drv_cfg, "enable_sink");
+
+   ret = cs_device__print_file(enable_sink, "%d", 1);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}



Don't we have to disable the sink at the end of the session? How is that
taken care of? Did I miss that?
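For reference, the sysfs knob that set_drv_config() writes can be exercised by hand. This is a hedged sketch: a temp directory stands in for sysfs so it can run anywhere, and the device name tmc_etr0 is an assumption, not taken from the patch.

```shell
# A temp dir stands in for /sys/bus/coresight/devices.
SINK=$(mktemp -d)/tmc_etr0
mkdir -p "$SINK"

echo 1 > "$SINK/enable_sink"   # what cs_device__print_file(..., 1) effectively does
cat "$SINK/enable_sink"        # prints 1

echo 0 > "$SINK/enable_sink"   # the disable step this review asks about
cat "$SINK/enable_sink"        # prints 0
```

On real hardware the path would be something like /sys/bus/coresight/devices/<sink>/enable_sink.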

Suzuki


Re: [PATCH v3] Added perf functionality to mmdc driver

2016-08-31 Thread Suzuki K Poulose

On 17/08/16 20:42, Zhengyu Shen wrote:

MMDC is a multi-mode DDR controller that supports DDR3/DDR3L x16/x32/x64
and LPDDR2 two channel x16/x32 memory types. MMDC is configurable, high
performance, and optimized. MMDC is present on i.MX6 Quad and i.MX6
QuadPlus devices, but this driver only supports i.MX6 Quad at the moment.
MMDC provides registers for performance counters which are read via this
driver to help debug memory throughput and similar issues.

$ perf stat -a -e 
mmdc/busy-cycles/,mmdc/read-accesses/,mmdc/read-bytes/,mmdc/total-cycles/,mmdc/write-accesses/,mmdc/write-bytes/
 dd if=/dev/zero of=/dev/null bs=1M count=5000
Performance counter stats for 'dd if=/dev/zero of=/dev/null bs=1M count=5000':

 898021787  mmdc/busy-cycles/
  14819600  mmdc/read-accesses/
471.30 MB   mmdc/read-bytes/
2815419216  mmdc/total-cycles/
  13367354  mmdc/write-accesses/
427.76 MB   mmdc/write-bytes/

   5.334757334 seconds time elapsed

Signed-off-by: Zhengyu Shen 
Signed-off-by: Frank Li 
---
Changes from v2 to v3:
Use WARN_ONCE instead of returning generic error values
Replace CPU Notifiers with newer state machine hotplug
Added additional checks on event_init for grouping and sampling
Remove useless mmdc_enable_profiling function
Added comments
Moved start index of events from 0x01 to 0x00
Added a counter to pmu_mmdc to only stop hrtimer after all events are 
finished
Replace readl_relaxed and writel_relaxed with readl and writel
Removed duplicate update function
Used devm_kasprintf when naming mmdcs probed

Changes from v1 to v2:
Added cpumask and migration handling support to driver
Validated event during event_init
Added code to properly stop counters
Used perf_invalid_context instead of perf_sw_context
Added hrtimer to poll for overflow
Added better description
Added support for multiple mmdcs



Should we move all the PMU-specific code under at least CONFIG_PERF_EVENTS?

Suzuki


Re: [PATCH v3] Added perf functionality to mmdc driver

2016-08-31 Thread Suzuki K Poulose

On 17/08/16 20:42, Zhengyu Shen wrote:

MMDC is a multi-mode DDR controller that supports DDR3/DDR3L x16/x32/x64
and LPDDR2 two channel x16/x32 memory types. MMDC is configurable, high
performance, and optimized. MMDC is present on i.MX6 Quad and i.MX6
QuadPlus devices, but this driver only supports i.MX6 Quad at the moment.
MMDC provides registers for performance counters which are read via this
driver to help debug memory throughput and similar issues.

$ perf stat -a -e 
mmdc/busy-cycles/,mmdc/read-accesses/,mmdc/read-bytes/,mmdc/total-cycles/,mmdc/write-accesses/,mmdc/write-bytes/
 dd if=/dev/zero of=/dev/null bs=1M count=5000
Performance counter stats for 'dd if=/dev/zero of=/dev/null bs=1M count=5000':

 898021787  mmdc/busy-cycles/
  14819600  mmdc/read-accesses/
471.30 MB   mmdc/read-bytes/
2815419216  mmdc/total-cycles/
  13367354  mmdc/write-accesses/
427.76 MB   mmdc/write-bytes/

   5.334757334 seconds time elapsed

Signed-off-by: Zhengyu Shen 
Signed-off-by: Frank Li 




+
+static int mmdc_pmu_init(struct mmdc_pmu *pmu_mmdc,
+   void __iomem *mmdc_base, struct device *dev)
+{
+   int mmdc_num;
+
+   *pmu_mmdc = (struct mmdc_pmu) {
+   .pmu = (struct pmu) {
+   .task_ctx_nr= perf_invalid_context,
+   .attr_groups= attr_groups,
+   .event_init = mmdc_event_init,
+   .add= mmdc_event_add,
+   .del= mmdc_event_del,
+   .start  = mmdc_event_start,
+   .stop   = mmdc_event_stop,
+   .read   = mmdc_event_update,
+   },
+   .mmdc_base = mmdc_base,
+   };
+
+   mmdc_num = ida_simple_get(&mmdc_ida, 0, 0, GFP_KERNEL);
+
+   cpumask_set_cpu(smp_processor_id(), &pmu_mmdc->cpu);
+
+   pmu_mmdc->dev = dev;
+   pmu_mmdc->active_events = 0;
+   spin_lock_init(&pmu_mmdc->mmdc_active_events_lock);
+
+   cpuhp_mmdc_pmu = pmu_mmdc;
+   cpuhp_setup_state(CPUHP_ONLINE,
+   "PERF_MMDC_ONLINE", NULL,
+   mmdc_pmu_offline_cpu);


You may want cpuhp_setup_state_nocalls() instead here?


Cheers
Suzuki


Re: [PATCH V7 5/5] perf tools: adding sink configuration for cs_etm PMU

2016-09-01 Thread Suzuki K Poulose

On 31/08/16 15:14, Mathieu Poirier wrote:

On 31 August 2016 at 03:37, Suzuki K Poulose  wrote:

On 30/08/16 17:19, Mathieu Poirier wrote:


Using the PMU::set_drv_config() callback to enable the CoreSight
sink that will be used for the trace session.




+int cs_etm_set_drv_config(struct perf_evsel_config_term *term)
+{
+   int ret;
+   char enable_sink[ENABLE_SINK_MAX];
+
+   snprintf(enable_sink, ENABLE_SINK_MAX, "%s/%s",
+term->val.drv_cfg, "enable_sink");
+
+   ret = cs_device__print_file(enable_sink, "%d", 1);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}




Don't we have to disable the sink at the end of the session? How is that
taken care of? Did I miss that?



Correct - the sink has to be disabled once it is no longer needed.  It
is a little tricky to do and I haven't decided on the best way to
proceed.  Fortunately that aspect doesn't affect this patchset.


Well, this patchset, when used, could leave a sink enabled. If we then
choose a different sink from perf (say an ETF) which sits before the
previously enabled sink (say an ETR) in the CoreSight path, perf wouldn't
get any trace data, with no clue as to why.

Maybe we could register an atexit() handler to clear the sink? That way it
is guaranteed to be cleared irrespective of the path perf takes to exit.

Cheers
Suzuki



Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-09-02 Thread Suzuki K Poulose

On 26/08/16 18:00, Catalin Marinas wrote:

On Fri, Aug 26, 2016 at 05:16:27PM +0100, Will Deacon wrote:

On Fri, Aug 26, 2016 at 02:08:01PM +0100, Suzuki K Poulose wrote:

On 26/08/16 14:04, Suzuki K Poulose wrote:



It might be worth looking to see if we can pass the ctr as an extra
parameter to the assembly routines that need it. Then you can access it
easily from C code, and if you pass it as 0 that could result in the asm
code reading it from the h/w register, removing the need for the _raw
stuff you add.


How often do we need to access a sanitised sysreg from assembly? AFAICT,
CTR_EL0 is the first. If we only need it to infer the minimum cache line
size, we could as well store the latter in a global variable and access
it directly. If we feel brave, we could patch a "mov \reg, #x"
instruction in the ?cache_line_size macros (starting with 32 by default,
though to make it less cumbersome we'd have to improve the run-time
patching code a bit).



With Ard's patches [1] to refactor the feature array, we can refer to named
CTR_EL0 feature register cleanly. I can rebase this series on top of that
if nobody has any objection.

[1] http://marc.info/?l=linux-arm-kernel&m=147263959504998&w=2


Suzuki



Re: [PATCH 2/2] arm64: Use static keys for CPU features

2016-09-02 Thread Suzuki K Poulose

On 02/09/16 16:52, Catalin Marinas wrote:

On Fri, Aug 26, 2016 at 10:22:13AM +0100, Suzuki K. Poulose wrote:

On 25/08/16 18:26, Catalin Marinas wrote:




Just a heads up. I have a patch [1] which moves the "check_local_cpu_errata()"
around to smp_prepare_boot_cpu(). This patch should still work fine in that
case. It's just that we could perhaps move the jump_label_init() to
smp_prepare_boot_cpu(), before we call "update_cpu_errata_work_arounds()"
for the boot CPU.


IIUC, we wouldn't call update_cpu_errata_work_arounds() until the CPU
feature infrastructure is initialised via cpuinfo_store_boot_cpu(). So
I don't think moving the jump_label_init() call above is necessary.


Right, as I said, your patch should work fine even with that change. It's
just that jump_label_init() (a generic piece of kernel setup) could be
called from a more visible place (smp_prepare_boot_cpu()) than from a less
obvious one, with the patch below.

Cheers
Suzuki





[1] 
https://lkml.kernel.org/r/1471525832-21209-4-git-send-email-suzuki.poul...@arm.com






Re: [PATCH V4 6/6] coresight: etm-perf: incorporating sink definition from cmd line

2016-08-05 Thread Suzuki K Poulose

On 04/08/16 17:53, Mathieu Poirier wrote:

Now that PMU specific configuration is available as part of the event,
lookup the sink identified by users from the perf command line and build
a path from source to sink.

With this functionality it is no longer required to select a sink in a
separate step (from sysFS) before a perf trace session can be started.

Signed-off-by: Mathieu Poirier 




+static const match_table_t drv_cfg_tokens = {
+   {ETM_TOKEN_SINK_CPU, "sink=cpu%d:%s"},
+   {ETM_TOKEN_SINK, "sink=%s"},
+   {ETM_TOKEN_ERR, NULL},
+};
+


Is the above format documented somewhere for perf users? If not, could we
please add it?

Thanks
Suzuki



Re: [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes

2016-08-24 Thread Suzuki K Poulose

On 22/08/16 14:02, Will Deacon wrote:

On Thu, Aug 18, 2016 at 02:10:32PM +0100, Suzuki K Poulose wrote:

Systems with differing CPU i-cache/d-cache line sizes can cause
problems with the cache management by software when the execution
is migrated from one to another. Usually, the application reads
the cache size on a CPU and then uses that length to perform cache
operations. However, if it gets migrated to another CPU with a smaller
cache line size, things could go completely wrong. To prevent such
cases, always use the smallest cache line size among the CPUs. The
kernel CPU feature infrastructure already keeps track of the safe
value for all CPUID registers including CTR. This patch works around
the problem by :

For kernel, dynamically patch the kernel to read the cache size
from the system wide copy of CTR_EL0.


Is it only CTR that is mismatched in practice, or do we need to worry
about DCZID_EL0 too?


A mismatched DCZID_EL0 is quite possible. However, there is no way to
trap accesses to DCZID_EL0. Rather, we can trap DC ZVA if we clear
SCTLR_EL1.DZE. But with SCTLR_EL1.DZE cleared, reading DCZID_EL0.DZP
returns 1, indicating DC ZVA is not supported. So if a well-behaved
application checks DZP before issuing a DC ZVA, we may never get to
emulate it. In other words, if there is a mismatch, the workaround is to
disable DC ZVA operations altogether (which could possibly affect existing
(incorrect) userspace applications that assume DC ZVA is supported without
checking the DZP bit).
 

 static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 93c5287..db2d6cb 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, 
struct pt_regs *regs)
regs->pc += 4;
 }

+static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
+{
+   int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> 
ESR_ELx_SYS64_ISS_RT_SHIFT;
+
+   regs->regs[rt] = sys_ctr_ftr->sys_val;
+   regs->pc += 4;
+}


Whilst this is correct, I wonder if there's any advantage in reporting a
*larger* size to userspace and avoid incurring additional trap overhead?


Combined with the trapping of user-space DC operations for the clean-cache
errata workaround, we could possibly report a larger size and emulate the
operations properly in the kernel. But I think that can be an enhancement
on top of this series.



Any idea what sort of size typical JITs are using?


I have no clue about that. I have Cc-ed Rodolph and Stuart, who may have a
better idea about JIT usage.

Suzuki



Re: [PATCH 2/2] arm64: Use static keys for CPU features

2016-08-26 Thread Suzuki K Poulose

On 25/08/16 18:26, Catalin Marinas wrote:

This patch adds static keys transparently for all the cpu_hwcaps
features by implementing an array of default-false static keys and
enabling them when detected. The cpus_have_cap() check uses the static
keys if the feature being checked is a constant, otherwise the compiler
generates the bitmap test.

Because of the early call to static_branch_enable() via
check_local_cpu_errata() -> update_cpu_capabilities(), the jump labels
are initialised in cpuinfo_store_boot_cpu().

Cc: Will Deacon 
Cc: Suzuki K. Poulose 
Signed-off-by: Catalin Marinas 
---




 static inline int __attribute_const__
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62272eac1352..919b2d0d68ae 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -46,6 +46,9 @@ unsigned int compat_elf_hwcap2 __read_mostly;

 DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);

+DEFINE_STATIC_KEY_ARRAY_FALSE(cpu_hwcap_keys, ARM64_NCAPS);
+EXPORT_SYMBOL(cpu_hwcap_keys);
+
 #define __ARM64_FTR_BITS(SIGNED, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
{   \
.sign = SIGNED, \
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84fe6925..6a141e399daf 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -377,6 +377,12 @@ void cpuinfo_store_cpu(void)
 void __init cpuinfo_store_boot_cpu(void)
 {
struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
+
+   /*
+* Initialise the static keys early as they may be enabled by
+* check_local_cpu_errata() -> update_cpu_capabilities().
+*/
+   jump_label_init();
__cpuinfo_store_cpu(info);


Catalin,

Just a heads up. I have a patch [1] which moves the "check_local_cpu_errata()"
around to smp_prepare_boot_cpu(). This patch should still work fine in that
case. It's just that we could perhaps move the jump_label_init() to
smp_prepare_boot_cpu(), before we call "update_cpu_errata_work_arounds()"
for the boot CPU.

Either way, this will be useful for some of the other feature checks.

Thanks
Suzuki

[1] 
https://lkml.kernel.org/r/1471525832-21209-4-git-send-email-suzuki.poul...@arm.com



[PATCH v2 5/9] arm64: insn: Add helpers for adrp offsets

2016-08-26 Thread Suzuki K Poulose
Adds helpers for decoding/encoding the PC relative addresses for adrp.
This will be used for handling dynamic patching of 'adrp' instructions
in alternative code patching.

Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Catalin Marinas 
Signed-off-by: Suzuki K Poulose 
---
Changes since V1:
 - Replace adr_adrp with separate handlers for adr and adrp (Marc Zyngier)
---
 arch/arm64/include/asm/insn.h | 11 ++-
 arch/arm64/kernel/insn.c  | 13 +
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..bc85366 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -246,7 +246,8 @@ static __always_inline bool aarch64_insn_is_##abbr(u32 
code) \
 static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
 { return (val); }
 
-__AARCH64_INSN_FUNCS(adr_adrp, 0x1F00, 0x1000)
+__AARCH64_INSN_FUNCS(adr,  0x9F00, 0x1000)
+__AARCH64_INSN_FUNCS(adrp, 0x9F00, 0x9000)
 __AARCH64_INSN_FUNCS(prfm_lit, 0xFF00, 0xD800)
 __AARCH64_INSN_FUNCS(str_reg,  0x3FE0EC00, 0x38206800)
 __AARCH64_INSN_FUNCS(ldr_reg,  0x3FE0EC00, 0x38606800)
@@ -318,6 +319,11 @@ __AARCH64_INSN_FUNCS(msr_reg,  0xFFF0, 0xD510)
 bool aarch64_insn_is_nop(u32 insn);
 bool aarch64_insn_is_branch_imm(u32 insn);
 
+static inline bool aarch64_insn_is_adr_adrp(u32 insn)
+{
+   return aarch64_insn_is_adr(insn) || aarch64_insn_is_adrp(insn);
+}
+
 int aarch64_insn_read(void *addr, u32 *insnp);
 int aarch64_insn_write(void *addr, u32 insn);
 enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
@@ -398,6 +404,9 @@ int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+s32 aarch64_insn_adrp_get_offset(u32 insn);
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset);
+
 bool aarch32_insn_is_wide(u32 insn);
 
 #define A32_RN_OFFSET  16
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 63f9432..f022af4 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1202,6 +1202,19 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
BUG();
 }
 
+s32 aarch64_insn_adrp_get_offset(u32 insn)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+}
+
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+   offset >> 12);
+}
+
 /*
  * Extract the Op/CR data from a msr/mrs instruction.
  */
-- 
2.7.4



[PATCH v2 6/9] arm64: alternative: Add support for patching adrp instructions

2016-08-26 Thread Suzuki K Poulose
adrp uses a PC-relative offset to the (4K) page of a symbol. If it
appears in an alternative sequence that gets patched in, we should
adjust the offset to reflect the address the instruction will run
from. This patch adds support for fixing up the offset of adrp
instructions.

Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Andre Przywara 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 

---
Changes since V1:
 - Add align_down macro. Couldn't find the best place to add it.
   Didn't want to add this to uapi/ headers where the kernel's generic
   ALIGN helpers are really defined. For the time being left it here.
---
 arch/arm64/kernel/alternative.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 6b269a9..d681498 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -59,6 +59,8 @@ static bool branch_insn_requires_update(struct alt_instr 
*alt, unsigned long pc)
BUG();
 }
 
+#define align_down(x, a)   ((unsigned long)(x) & ~(((unsigned long)(a)) - 
1))
+
 static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 {
u32 insn;
@@ -80,6 +82,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, 
u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+   /*
+* If we're replacing an adrp instruction, which uses 
PC-relative
+* immediate addressing, adjust the offset to reflect the new
+* PC. adrp operates on 4K aligned addresses.
+*/
+   orig_offset  = aarch64_insn_adrp_get_offset(insn);
+   target = align_down(altinsnptr, SZ_4K) + orig_offset;
+   new_offset = target - align_down(insnptr, SZ_4K);
+   insn = aarch64_insn_adrp_set_offset(insn, new_offset);
} else if (aarch64_insn_uses_literal(insn)) {
/*
 * Disallow patching unhandled instructions using PC relative
-- 
2.7.4



[PATCH v2 8/9] arm64: Refactor sysinstr exception handling

2016-08-26 Thread Suzuki K Poulose
Right now we trap some of the user space data cache operations
based on a few Errata (ARM 819472, 826319, 827319 and 824069).
We need to trap userspace access to CTR_EL0, if we detect mismatched
cache line size. Since both these traps share the same EC, refactor
the handler a little to make it more reader-friendly.

Cc: Andre Przywara 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Catalin Marinas 
Signed-off-by: Suzuki K Poulose 

---
Changes since V1:
 - Add comments for iss field definitions for other exceptions.
---
 arch/arm64/include/asm/esr.h | 76 ++--
 arch/arm64/kernel/traps.c| 73 +++---
 2 files changed, 114 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..9875b32 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -78,6 +78,23 @@
 
 #define ESR_ELx_IL (UL(1) << 25)
 #define ESR_ELx_ISS_MASK   (ESR_ELx_IL - 1)
+
+/* ISS field definitions shared by different classes */
+#define ESR_ELx_WNR(UL(1) << 6)
+
+/* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_EA (UL(1) << 9)
+#define ESR_ELx_S1PTW  (UL(1) << 7)
+
+/* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
+#define ESR_ELx_FSC(0x3F)
+#define ESR_ELx_FSC_TYPE   (0x3C)
+#define ESR_ELx_FSC_EXTABT (0x10)
+#define ESR_ELx_FSC_ACCESS (0x08)
+#define ESR_ELx_FSC_FAULT  (0x04)
+#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV(UL(1) << 24)
 #define ESR_ELx_SAS_SHIFT  (22)
 #define ESR_ELx_SAS(UL(3) << ESR_ELx_SAS_SHIFT)
@@ -86,16 +103,9 @@
 #define ESR_ELx_SRT_MASK   (UL(0x1F) << ESR_ELx_SRT_SHIFT)
 #define ESR_ELx_SF (UL(1) << 15)
 #define ESR_ELx_AR (UL(1) << 14)
-#define ESR_ELx_EA (UL(1) << 9)
 #define ESR_ELx_CM (UL(1) << 8)
-#define ESR_ELx_S1PTW  (UL(1) << 7)
-#define ESR_ELx_WNR(UL(1) << 6)
-#define ESR_ELx_FSC(0x3F)
-#define ESR_ELx_FSC_TYPE   (0x3C)
-#define ESR_ELx_FSC_EXTABT (0x10)
-#define ESR_ELx_FSC_ACCESS (0x08)
-#define ESR_ELx_FSC_FAULT  (0x04)
-#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT (20)
 #define ESR_ELx_COND_MASK  (UL(0xF) << ESR_ELx_COND_SHIFT)
@@ -109,6 +119,54 @@
((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |  \
 ((imm) & 0x))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT   22
+#define ESR_ELx_SYS64_ISS_RES0_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK 0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ 0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT 5
+#define ESR_ELx_SYS64_ISS_RT_MASK  (UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRM_SHIFT1
+#define ESR_ELx_SYS64_ISS_CRM_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRM_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRN_SHIFT10
+#define ESR_ELx_SYS64_ISS_CRN_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRN_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP1_SHIFT14
+#define ESR_ELx_SYS64_ISS_OP1_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP1_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP2_SHIFT17
+#define ESR_ELx_SYS64_ISS_OP2_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP2_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP0_SHIFT20
+#define ESR_ELx_SYS64_ISS_OP0_MASK (UL(0x3) << ESR_ELx_SYS64_ISS_OP0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK (ESR_ELx_SYS64_ISS_OP0_MASK | \
+ESR_ELx_SYS64_ISS_OP1_MASK | \
+ESR_ELx_SYS64_ISS_OP2_MASK | \
+ESR_ELx_SYS64_ISS_CRN_MASK | \
+ESR_ELx_SYS64_ISS_CRM_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(op0, op1, op2, crn, crm) \
+   (((op0) << ESR_ELx_SYS64_ISS_OP0_SHIFT) | \
+    ((op1) << ESR_ELx_SYS64_ISS_OP1_SHIFT) | \
+    ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
+    ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
+    ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+/*
+ * User space cache operations have the following sysreg encoding
+ * in System instructions.
+ * op0=1, op1=3, op2=1, crn=7, crm={ 5, 10, 11, 14 }, WRITE (L=0)
+ */
+#define ESR_ELx_SYS64_ISS_CRM_DC_CIVAC 14
+#

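The ISS field layout introduced in the patch above can be exercised in a small user-space sketch. The ISS_* constants below mirror the ESR_ELx_SYS64_ISS_* shifts from the patch, and iss_sys_val()/iss_crm() are illustrative helpers, not kernel API:

```c
#include <stdint.h>

/* Stand-alone mirrors of the ESR_ELx_SYS64_ISS_* shifts from the patch. */
#define ISS_OP0_SHIFT 20
#define ISS_OP2_SHIFT 17
#define ISS_OP1_SHIFT 14
#define ISS_CRN_SHIFT 10
#define ISS_CRM_SHIFT  1

/* Build the ISS sysreg-encoding bits, as ESR_ELx_SYS64_ISS_SYS_VAL does. */
uint32_t iss_sys_val(uint32_t op0, uint32_t op1, uint32_t op2,
                     uint32_t crn, uint32_t crm)
{
        return (op0 << ISS_OP0_SHIFT) | (op1 << ISS_OP1_SHIFT) |
               (op2 << ISS_OP2_SHIFT) | (crn << ISS_CRN_SHIFT) |
               (crm << ISS_CRM_SHIFT);
}

/* Pull CRm back out of a trapped-instruction ISS, as a DC-op
 * dispatcher distinguishing cvau/civac/etc. would. */
uint32_t iss_crm(uint32_t iss)
{
        return (iss >> ISS_CRM_SHIFT) & 0xf;
}
```

For a userspace "dc civac" (op0=1, op1=3, op2=1, crn=7, crm=14), iss_crm() recovers 14 from the encoded value.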
[PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-08-26 Thread Suzuki K Poulose
Systems with differing CPU i-cache/d-cache line sizes can cause
problems with software cache management when execution
is migrated from one CPU to another. Usually, the application reads
the cache size on a CPU and then uses that length to perform cache
operations. However, if it gets migrated to another CPU with a smaller
cache line size, things could go completely wrong. To prevent such
cases, always use the smallest cache line size among the CPUs. The
kernel CPU feature infrastructure already keeps track of the safe
value for all CPUID registers including CTR. This patch works around
the problem by:

For kernel, dynamically patch the kernel to read the cache size
from the system wide copy of CTR_EL0.

For applications, trap read accesses to CTR_EL0 (by clearing the SCTLR.UCT)
and emulate the mrs instruction to return the system wide safe value
of CTR_EL0.

For faster access (i.e, avoiding to lookup the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to table entry for
CTR_EL0 in the CPU feature infrastructure.
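A minimal user-space sketch of the trap-and-emulate path described above: fake_regs, emulate_mrs_ctr() and the safe_ctr parameter are illustrative stand-ins (the kernel works on pt_regs and looks the value up via the stashed CTR_EL0 table entry); ISS.Rt sits at bits [9:5], per the ESR definitions from the earlier patch.

```c
#include <stdint.h>

#define ISS_RT_SHIFT 5
#define ISS_RT_MASK  (0x1fu << ISS_RT_SHIFT)

/* Minimal stand-in for the trapped task's register file. */
struct fake_regs {
        uint64_t regs[31];
        uint64_t pc;
};

/* Emulate "mrs xT, ctr_el0": place the system wide safe CTR value in
 * the register named by ISS.Rt, then step over the trapped instruction. */
void emulate_mrs_ctr(struct fake_regs *regs, uint32_t iss, uint64_t safe_ctr)
{
        unsigned int rt = (iss & ISS_RT_MASK) >> ISS_RT_SHIFT;

        if (rt != 31)                  /* Rt == 31 names xzr: discard */
                regs->regs[rt] = safe_ctr;
        regs->pc += 4;                 /* all AArch64 insns are 4 bytes */
}
```

With an ISS naming x3 as the destination, the safe value lands in regs[3] and the PC advances past the emulated mrs.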

Cc: Mark Rutland 
Cc: Andre Przywara 
Cc: Will Deacon 
Cc: Catalin Marinas 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/assembler.h  | 25 +++--
 arch/arm64/include/asm/cpufeature.h |  4 +++-
 arch/arm64/include/asm/esr.h|  8 
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/asm-offsets.c |  2 ++
 arch/arm64/kernel/cpu_errata.c  | 22 ++
 arch/arm64/kernel/cpufeature.c  |  9 +
 arch/arm64/kernel/traps.c   | 14 ++
 8 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index a4bb3f5..66bd268 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,21 @@ lr .reqx30 // link register
.macro  mmid, rd, rn
ldr \rd, [\rn, #MM_CONTEXT_ID]
.endm
+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value.
+ */
+   .macro  read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+   mrs \reg, ctr_el0   // read CTR
+   nop
+   nop
+alternative_else
   ldr_l   \reg, sys_ctr_ftr   // Read system wide safe CTR value
+   ldr \reg, [\reg, #ARM64_FTR_SYSVAL] // from sys_ctr_ftr->sys_val
+alternative_endif
+   .endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +247,10 @@ lr .reqx30 // link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
.macro  dcache_line_size, reg, tmp
-   raw_dcache_line_size\reg, \tmp
+   read_ctr\tmp
+   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
@@ -250,7 +268,10 @@ lr .reqx30 // link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
.macro  icache_line_size, reg, tmp
-   raw_icache_line_size\reg, \tmp
+   read_ctr\tmp
+   and \tmp, \tmp, #0xf// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 692b8d3..e99f2af 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -37,8 +37,9 @@
 #define ARM64_WORKAROUND_CAVIUM_27456  12
 #define ARM64_HAS_32BIT_EL013
 #define ARM64_HYP_OFFSET_LOW   14
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE   15
 
-#define ARM64_NCAPS15
+#define ARM64_NCAPS16
 
 #ifndef __ASSEMBLY__
 
@@ -109,6 +110,7 @@ struct arm64_cpu_capabilities {
 };
 
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
+extern struct arm64_ftr_reg *sys_ctr_ftr;
 
 bool this_cpu_has_cap(unsigned int cap);
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 9875b32..d14c478 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -149,6 +149,9 @@
 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+
+#define ESR_ELx_SYS64_ISS_SYS_OP_MASK  (ESR_ELx_SYS64_ISS_SYS_MASK | \
+ESR_ELx_SYS64_ISS_DIR_MASK)
 /*
  * User space cache operations have the following s

[PATCH v2 7/9] arm64: Introduce raw_{d,i}cache_line_size

2016-08-26 Thread Suzuki K Poulose
On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from the CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line
size of the current CPU (rather than the system wide one), when either the
system wide value may not be accessible or the caller is guaranteed
to execute without being migrated.
Provide another helper which will fetch cache line size on the current CPU.

Acked-by: James Morse 
Reviewed-by: Geoff Levand 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/assembler.h  | 24 
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr .reqx30 // link register
.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-   .macro  dcache_line_size, reg, tmp
+   .macro  raw_dcache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
ubfm\tmp, \tmp, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
@@ -228,9 +229,17 @@ lr .reqx30 // link register
.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
+   .macro  dcache_line_size, reg, tmp
+   raw_dcache_line_size\reg, \tmp
+   .endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+   .macro  raw_icache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
and \tmp, \tmp, #0xf// cache line size encoding
mov \reg, #4// bytes per word
@@ -238,6 +247,13 @@ lr .reqx30 // link register
.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+   .macro  icache_line_size, reg, tmp
+   raw_icache_line_size\reg, \tmp
+   .endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
.macro  tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S b/arch/arm64/kernel/hibernate-asm.S
index 46f29b6..4ebc6a1 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
add x1, x10, #PAGE_SIZE
/* Clean the copied page to PoU - based on flush_icache_range() */
-   dcache_line_size x2, x3
+   raw_dcache_line_size x2, x3
sub x3, x2, #1
bic x4, x10, x3
 2: dc  cvau, x4/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
/* Setup the list loop variables. */
mov x17, x1 /* x17 = kimage_start */
mov x16, x0 /* x16 = kimage_head */
-   dcache_line_size x15, x0/* x15 = dcache line size */
+   raw_dcache_line_size x15, x0/* x15 = dcache line size */
mov x14, xzr/* x14 = entry ptr */
mov x13, xzr/* x13 = copy dest */
 
-- 
2.7.4



[PATCH v2 0/9] arm64: Work around for mismatched cache line size

2016-08-26 Thread Suzuki K Poulose
This series adds a work around for systems with mismatched {I,D}-cache
line sizes. When a thread of execution gets migrated to a different CPU,
the cache line size it had cached could be larger than that of the new
CPU. This could cause data corruption issues. We work around this by

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping the userspace access to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating it with the system wide safe value of CTR.
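The line-size arithmetic behind the safe value can be sketched in plain C. The field positions follow the architecture (IminLine in CTR_EL0[3:0], DminLine in CTR_EL0[19:16], both encoding log2 of the line size in words); ctr_safe_lines() only illustrates keeping the smaller encoding per field, and is not the kernel's ftr-register code.

```c
#include <stdint.h>

/* DminLine lives in CTR_EL0[19:16], IminLine in CTR_EL0[3:0]; both
 * encode log2(line size in words), so bytes = 4 << field. */
unsigned int ctr_dminline_bytes(uint64_t ctr)
{
        return 4u << ((ctr >> 16) & 0xf);
}

unsigned int ctr_iminline_bytes(uint64_t ctr)
{
        return 4u << (ctr & 0xf);
}

/* Keep the smaller of two line-size encodings, per field: the safe
 * direction for by-line cache maintenance across mismatched CPUs. */
uint64_t ctr_safe_lines(uint64_t a, uint64_t b)
{
        uint64_t dmin_a = (a >> 16) & 0xf, dmin_b = (b >> 16) & 0xf;
        uint64_t imin_a = a & 0xf, imin_b = b & 0xf;

        return ((dmin_a < dmin_b ? dmin_a : dmin_b) << 16) |
               (imin_a < imin_b ? imin_a : imin_b);
}
```

Given one CPU reporting 64-byte D/I lines and another reporting 32-byte D and 16-byte I lines, the safe encoding decodes to 32 and 16 bytes respectively.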

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.

The series has been tested on Juno with a hack to forced enabling
of the capability.

Applies on v4.8-rc3. The tree is available at:
  git://linux-arm.org/linux-skp.git ctr-v2

Changes since V1:

 - Replace adr_adrp insn helper with separate helpers for adr and adrp.
 - Add/use align_down() macro for adjusting the page address for adrp offsets.
 - Add comments for existing ISS field definitions.
 - Added a patch to disallow silent patching of unhandled pc relative
   instructions in alternative code patching.

Suzuki K Poulose (9):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64: alternative: Disallow patching instructions using literals
  arm64: insn: Add helpers for adrp offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 45 +--
 arch/arm64/include/asm/cpufeature.h | 14 +++---
 arch/arm64/include/asm/esr.h| 84 +++
 arch/arm64/include/asm/insn.h   | 11 -
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/alternative.c | 21 +
 arch/arm64/kernel/asm-offsets.c |  2 +
 arch/arm64/kernel/cpu_errata.c  | 26 ++-
 arch/arm64/kernel/cpufeature.c  | 44 ++-
 arch/arm64/kernel/cpuinfo.c |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c| 13 ++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c |  8 +++-
 arch/arm64/kernel/traps.c   | 87 ++---
 15 files changed, 297 insertions(+), 65 deletions(-)

-- 
2.7.4



[PATCH v2 3/9] arm64: Rearrange CPU errata workaround checks

2016-08-26 Thread Suzuki K Poulose
Right now we run through the work around checks on a CPU
from __cpuinfo_store_cpu. There are some problems with that:

1) We initialise the system wide CPU feature registers only after the
Boot CPU updates its cpuinfo. Now, if a work around depends on the
variance of a CPU ID feature (e.g, check for Cache Line size mismatch),
we have no way of performing it cleanly for the boot CPU.

2) It is out of place, invoked from __cpuinfo_store_cpu() in cpuinfo.c. It
is not an obvious place for that.

This patch rearranges the CPU specific capability(aka work around) checks.

1) At the moment we use verify_local_cpu_capabilities() to check if a new
CPU has all the system advertised features. Use this for the secondary CPUs
to perform the work around check. For that we rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system wide capabilities haven't been initialised (i.e., the CPU
   is activated at boot), update the system wide detected work arounds.

   Otherwise (i.e., a CPU hotplugged in later), verify that this CPU conforms
   to the system wide capabilities.

2) Boot CPU updates the work arounds from smp_prepare_boot_cpu() after we have
initialised the system wide CPU feature values.

Cc: Mark Rutland 
Cc: Andre Przywara 
Cc: Will Deacon 
Cc: Catalin Marinas 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c  | 30 --
 arch/arm64/kernel/cpuinfo.c |  2 --
 arch/arm64/kernel/smp.c |  8 +++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index aadd946..692b8d3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,11 +193,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a591c35..fcf87ca 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1005,23 +1005,33 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+   verify_local_cpu_errata_workarounds();
+   verify_local_cpu_features(arm64_features);
+   verify_local_elf_hwcaps(arm64_elf_hwcaps);
+   if (system_supports_32bit_el0())
+   verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+   /*
+* All secondary CPUs should conform to the early CPU features
+* in use by the kernel based on boot CPU.
+*/
check_early_cpu_features();
 
/*
-* If we haven't computed the system capabilities, there is nothing
-* to verify.
+* If we haven't finalised the system capabilities, this CPU gets
+* a chance to update the errata work arounds.
+* Otherwise, this CPU should verify that it has all the system
+* advertised capabilities.
 */
if (!sys_caps_initialised)
-   return;
-
-   verify_local_cpu_errata_workarounds();
-   verify_local_cpu_features(arm64_features);
-   verify_local_elf_hwcaps(arm64_elf_hwcaps);
-   if (system_supports_32bit_el0())
-   verify_local_elf_hwcaps(compat_elf_hwcaps);
+   update_cpu_errata_workarounds();
+   else
+   verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 4fa7b73..b3d5b3e 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -363,8 +363,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
}
 
cpuinfo_detect_icache_policy(info);
-
-   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..99d8cc3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 * this CPU ticks all of those. If it doesn't, the CPU will
 * fail to come online.
 */
-   verify_local_cpu_capabilities();
+   check_local_cpu_capabilities();
 
if (cpu_ops[cpu]->cpu_postboot)
 

[PATCH v2 2/9] arm64: Use consistent naming for errata handling

2016-08-26 Thread Suzuki K Poulose
This is a cosmetic change to rename the functions dealing with
the errata work arounds to be more consistent with their naming.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata work
arounds. So rename it to reflect the same.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata.

Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c  | 4 ++--
 arch/arm64/kernel/cpufeature.c  | 2 +-
 arch/arm64/kernel/cpuinfo.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f6f5e49..aadd946 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,10 +193,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 82b0fc2..5836b3d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -116,7 +116,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -131,7 +131,7 @@ void verify_local_cpu_errata(void)
}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 43f73e0..a591c35 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1017,7 +1017,7 @@ void verify_local_cpu_capabilities(void)
if (!sys_caps_initialised)
return;
 
-   verify_local_cpu_errata();
+   verify_local_cpu_errata_workarounds();
verify_local_cpu_features(arm64_features);
verify_local_elf_hwcaps(arm64_elf_hwcaps);
if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84f..4fa7b73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -364,7 +364,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
cpuinfo_detect_icache_policy(info);
 
-   check_local_cpu_errata();
+   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4



[PATCH v2 1/9] arm64: Set the safe value for L1 icache policy

2016-08-26 Thread Suzuki K Poulose
Right now we use 0 as the safe value for CTR_EL0:L1Ip, which is
not defined at the moment. The safer value for the L1Ip should be
the weakest of the policies, which happens to be AIVIVT. While at it,
fix the comment about safe_val.

Cc: Mark Rutland 
Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7099f26..f6f5e49 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,7 +63,7 @@ struct arm64_ftr_bits {
enum ftr_type   type;
u8  shift;
u8  width;
-   s64 safe_val; /* safe value for discrete features */
+   s64 safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62272ea..43f73e0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),   /* DminLine */
/*
 * Linux can handle differing I-cache policies. Userspace JITs will
-* make use of *minLine
+* make use of *minLine.
+* If we have differing I-cache policies, report it as the weakest - AIVIVT.
 */
-   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0), /* L1Ip */
+   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT), /* L1Ip */
ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),/* RAZ */
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),/* IminLine */
ARM64_FTR_END,
-- 
2.7.4



[PATCH v2 4/9] arm64: alternative: Disallow patching instructions using literals

2016-08-26 Thread Suzuki K Poulose
The alternative code patching doesn't check if the replaced instruction
uses a pc relative literal. This could cause silent corruption in the
instruction stream as the instruction will be executed from a different
address than what it was compiled for. Catch all such cases.
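The hazard can be illustrated with the offset arithmetic the patcher already applies to plain branches (the names below are illustrative, not kernel API): an offset encoded relative to the alternative section only reaches the right target after being rebased to the run-site PC, and instructions the patcher cannot rebase this way have to be rejected.

```c
#include <stdint.h>

/* A branch encodes (target - pc). If an instruction assembled at
 * alt_pc is copied to orig_pc, the encoded offset must be rebased,
 * or the branch silently lands (alt_pc - orig_pc) bytes away. */
int64_t rebase_offset(int64_t encoded, uint64_t alt_pc, uint64_t orig_pc)
{
        uint64_t target = alt_pc + encoded;   /* absolute branch target */

        return (int64_t)(target - orig_pc);   /* new offset to encode  */
}
```

For a branch 8 bytes forward assembled at 0x2000 but executed from 0x3000, the offset must become -0xff8 to keep hitting 0x2008.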

Cc: Marc Zyngier 
Cc: Andre Przywara 
Cc: Mark Rutland 
Cc: Catalin Marinas 
Suggested-by: Will Deacon 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/kernel/alternative.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,12 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_uses_literal(insn)) {
+   /*
+* Disallow patching unhandled instructions using PC relative
+* literal addresses
+*/
+   BUG();
}
 
return insn;
-- 
2.7.4



Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-08-26 Thread Suzuki K Poulose

On 26/08/16 14:04, Suzuki K Poulose wrote:

On 26/08/16 12:03, Ard Biesheuvel wrote:

Hello Suzuki,

For faster access (i.e, avoiding to lookup the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to table entry for
CTR_EL0 in the CPU feature infrastructure.



IIUC it is the runtime sorting of the arm64_ftr_reg array that
requires you to stash a pointer to CTR_EL0's entry somewhere, so that
you can dereference it without doing the bsearch.


Correct.

IMO, this is a pattern that we should avoid: you are introducing one
instance now, which will make it hard to say no to the next one in the
future. Isn't there a better way to organize the arm64_ftr_reg array
that allows us to reference entries directly? Ideally, a way that gets
rid of the runtime sorting, since I don't think that is a good
replacement for developer discipline anyway (although I should have
spoken up when that was first introduced) Or am I missing something
here?


I had some form of direct access to the feature register in one of
the versions [0], but was dropped based on Catalin's suggestion at [1].


Forgot to add: [0] wouldn't solve this issue cleanly either. It would simply
speed up read_system_reg(). So we would still need a call to read_system_reg()
from assembly code, which makes it a little tricky.

Suzuki




[0] https://lkml.org/lkml/2015/10/5/504
[1] https://lkml.org/lkml/2015/10/7/558


Suzuki






Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-08-26 Thread Suzuki K Poulose

On 26/08/16 12:03, Ard Biesheuvel wrote:

Hello Suzuki,





For faster access (i.e, avoiding to lookup the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to table entry for
CTR_EL0 in the CPU feature infrastructure.



IIUC it is the runtime sorting of the arm64_ftr_reg array that
requires you to stash a pointer to CTR_EL0's entry somewhere, so that
you can dereference it without doing the bsearch.


Correct.



IMO, this is a pattern that we should avoid: you are introducing one
instance now, which will make it hard to say no to the next one in the
future. Isn't there a better way to organize the arm64_ftr_reg array
that allows us to reference entries directly? Ideally, a way that gets
rid of the runtime sorting, since I don't think that is a good
replacement for developer discipline anyway (although I should have
spoken up when that was first introduced) Or am I missing something
here?


I had some form of direct access to the feature register in one of
the versions [0], but was dropped based on Catalin's suggestion at [1].


[0] https://lkml.org/lkml/2015/10/5/504
[1] https://lkml.org/lkml/2015/10/7/558


Suzuki




Re: [PATCH v5 6/6] perf: ARM DynamIQ Shared Unit PMU support

2017-08-17 Thread Suzuki K Poulose

Hi Mark,

On 16/08/17 15:10, Mark Rutland wrote:

On Tue, Aug 08, 2017 at 12:37:26PM +0100, Suzuki K Poulose wrote:

+/*
+ * struct dsu_pmu  - DSU PMU descriptor
+ *
+ * @pmu_lock   : Protects accesses to DSU PMU register from multiple
+ *   CPUs.
+ * @hw_events  : Holds the event counter state.
+ * @supported_cpus : CPUs attached to the DSU.
+ * @active_cpu : CPU to which the PMU is bound for accesses.
+ * @cpuhp_node : Node for CPU hotplug notifier link.
+ * @num_counters   : Number of event counters implemented by the PMU,
+ *   excluding the cycle counter.
+ * @irq: Interrupt line for counter overflow.
+ * @cpmceid_bitmap : Bitmap for the availability of architected common
+ *   events (event_code < 0x40).
+ */
+struct dsu_pmu {
+   struct pmu  pmu;
+   struct device   *dev;
+   raw_spinlock_t  pmu_lock;


I'm in two minds about pmu_lock. The core has to take the ctx lock for
most (all?) operations, and we only allow events to be opened on a
particular CPU, so there shouldn't be concurrent accesses from other
CPUs.

We do need to disable interrupts in order to serialise against the IRQ
handler, and disabling IRQs doesn't make it clear that we're protecting
a particular resource.

Could we update the description to make it clear that the pmu_lock is
used to serialise against the IRQ handler? It also happens to protect
cross-cpu accesses, but those would be a bug.


Sure, I can.



Other PMU drivers have locks which may not be necessary; I can try to
clean those up later.

[...]


+static struct attribute *dsu_pmu_event_attrs[] = {
+   DSU_EVENT_ATTR(cycles, 0x11),
+   DSU_EVENT_ATTR(bus_acecss, 0x19),
+   DSU_EVENT_ATTR(memory_error, 0x1a),
+   DSU_EVENT_ATTR(bus_cycles, 0x1d),
+   DSU_EVENT_ATTR(l3d_cache_allocate, 0x29),
+   DSU_EVENT_ATTR(l3d_cache_refill, 0x2a),
+   DSU_EVENT_ATTR(l3d_cache, 0x2b),
+   DSU_EVENT_ATTR(l3d_cache_wb, 0x2c),


MAX_COMMON_EVENTS seems to be 0x40, so are we just assuming the below
are implemented?

If so, why bother exposing them at all? We can't know that they're going
to work...



That's right. The only defending argument is that the event codes are specific
to the DynamIQ, unlike the COMMON_EVENTS, which match the ARMv8 PMU event codes.
So, someone would need to carefully find the event code for a particular event.
Having these entries would make it easier for the user to do the profiling.

Also, future revisions of the DSU could potentially expose more events. So there
wouldn't be any way to tell the user (provided there aren't any changes to the
programming model and we decide to reuse the same compatible string) what we
*could* potentially support. In short, this is not a problem at the moment and
we could do something about it as and when required.


+   DSU_EVENT_ATTR(bus_access_rd, 0x60),
+   DSU_EVENT_ATTR(bus_access_wr, 0x61),
+   DSU_EVENT_ATTR(bus_access_shared, 0x62),
+   DSU_EVENT_ATTR(bus_access_not_shared, 0x63),
+   DSU_EVENT_ATTR(bus_access_normal, 0x64),


...


+static bool dsu_pmu_event_supported(struct dsu_pmu *dsu_pmu, unsigned long evt)
+{
+   /*
+* DSU PMU provides a bit map for events with
+*   id < DSU_PMU_MAX_COMMON_EVENTS.
+* Events above the range are reported as supported, as
+* tracking the support needs per-chip tables and makes
+* it difficult to track. If an event is not supported,
+* it won't be counted.
+*/
+   if (evt >= DSU_PMU_MAX_COMMON_EVENTS)
+   return true;
+   /* The PMU driver doesn't support chain mode */
+   if (evt == DSU_PMU_EVT_CHAIN)
+   return false;
+   return test_bit(evt, dsu_pmu->cpmceid_bitmap);
+}


I don't think we need to use this for event_init() (more on that below),
so I think this can be simplified.

[...]


+static struct attribute *dsu_pmu_cpumask_attrs[] = {
+   DSU_CPUMASK_ATTR(cpumask, DSU_ACTIVE_CPU_MASK),
+   DSU_CPUMASK_ATTR(supported_cpus, DSU_SUPPORTED_CPU_MASK),
+   NULL,
+};


Normally we only expose one mask.

Why do we need the supported cpu mask? What is the intended use-case?

[...]



That's just to let the user know which CPUs are bound to this PMU instance.


+static inline u64 dsu_pmu_read_counter(struct perf_event *event)
+{
+   u64 val = 0;
+   unsigned long flags;
+   struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+   int idx = event->hw.idx;
+
+   if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+&dsu_pmu->supported_cpus)))
+   return 0;


It would be better to check that this is the active CPU, since reading
on another supported CPU would still be wrong.

Likewise for the check in dsu

Re: [PATCH v5 6/6] perf: ARM DynamIQ Shared Unit PMU support

2017-08-18 Thread Suzuki K Poulose

On 17/08/17 16:57, Mark Rutland wrote:

On Thu, Aug 17, 2017 at 03:52:24PM +0100, Suzuki K Poulose wrote:

On 16/08/17 15:10, Mark Rutland wrote:

On Tue, Aug 08, 2017 at 12:37:26PM +0100, Suzuki K Poulose wrote:



+static struct attribute *dsu_pmu_event_attrs[] = {
+   DSU_EVENT_ATTR(cycles, 0x11),
+   DSU_EVENT_ATTR(bus_acecss, 0x19),
+   DSU_EVENT_ATTR(memory_error, 0x1a),
+   DSU_EVENT_ATTR(bus_cycles, 0x1d),
+   DSU_EVENT_ATTR(l3d_cache_allocate, 0x29),
+   DSU_EVENT_ATTR(l3d_cache_refill, 0x2a),
+   DSU_EVENT_ATTR(l3d_cache, 0x2b),
+   DSU_EVENT_ATTR(l3d_cache_wb, 0x2c),


MAX_COMMON_EVENTS seems to be 0x40, so are we just assuming the below
are implemented?

If so, why bother exposing them at all? We can't know that they're going
to work...


That's right. The only defending argument is that the event codes are specific
to the DynamIQ, unlike the COMMON_EVENTS, which match the ARMv8 PMU event codes.
So, someone would need to carefully find the event code for a particular event.
Having these entries would make it easier for the user to do the profiling.

Also, future revisions of the DSU could potentially expose more events. So there
wouldn't be any way to tell the user (provided there aren't any changes to the
programming model and we decide to reuse the same compatible string) what we
*could* potentially support. In short, this is not a problem at the moment and
we could do something about it as and when required.


I'd rather that we only describes those that we can probe from the
PMCEID* registers, and for the rest, left the user to consult a manual.

I can well imagine future variants of this IP supporting different
events, and I'd prefer to avoid proliferating tables for those.

[...]


Fair enough. I will trim the list.



+static struct attribute *dsu_pmu_cpumask_attrs[] = {
+   DSU_CPUMASK_ATTR(cpumask, DSU_ACTIVE_CPU_MASK),
+   DSU_CPUMASK_ATTR(supported_cpus, DSU_SUPPORTED_CPU_MASK),
+   NULL,
+};


Normally we only expose one mask.

Why do we need the supported cpu mask? What is the intended use-case?


That's just to let the user know the CPUs bound to this PMU instance.


I guess that can be useful, though the cpumasks we expose today are
confusing as-is, and this is another point of confusion.

We could drop this for now, and add it when requested, or we should try
to make the naming clearer somehow -- "supported" can be read in a
number of ways.


How about dsu_cpus or connected_cpus ?



Further, it would be worth documenting this PMU under
Documentation/perf/.

[...]



OK



+static int dsu_pmu_add(struct perf_event *event, int flags)
+{
+   struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+   struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+   struct hw_perf_event *hwc = &event->hw;
+   int idx;
+
+   if (!cpumask_test_cpu(smp_processor_id(), &dsu_pmu->supported_cpus))
+   return -ENOENT;


This shouldn't ever happen, and we can check against the active cpumask,
with a WARN_ON_ONCE(). We have to do this for CPU PMUs since they
support events which can migrate with tasks, but that's not the case
here.

[...]


But, we have to make sure we are reading the events from a CPU within this
DSU in case we have multiple DSU units.


Regardless of how many instances there are, the core should *never*
add() a CPU-bound event (which DSU events are) on another CPU. To do so
would be a major bug.

So if this is just a paranoid check, we should WARN_ON_ONCE().
Otherwise, it's unnecessary.


OK




+/**
+ * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
+ */
+static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
+{
+   int i = 0, n, cpu;
+   struct device_node *cpu_node;
+
+   n = of_count_phandle_with_args(dev, "cpus", NULL);
+   if (n <= 0)
+   return -ENODEV;
+   for (; i < n; i++) {
+   cpu_node = of_parse_phandle(dev, "cpus", i);
+   if (!cpu_node)
+   break;
+   cpu = of_cpu_node_to_id(cpu_node);
+   of_node_put(cpu_node);
+   if (cpu < 0)
+   break;


I believe this can happen if the kernel's nr_cpus were capped below the
number of CPUs in the system. So we need to iterate all entries of the
cpus list regardless.



Good point. The initial version of the driver used to ignore the failures, but
that was changed later. I will roll it back.


Thanks. If you could add a comment as to why, that'll hopefully avoid
anyone trying to "fix" the logic later.

[...]



Sure


+   cpmcr = __dsu_pmu_read_pmcr();
+   dsu_pmu->num_counters = ((cpmcr >> CLUSTERPMCR_N_SHIFT) &
+   CLUSTERPMCR_N_MASK);
+   if (!dsu_pmu->num_counters)
+   return;


Is that valid? What range

Re: [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing

2017-10-17 Thread Suzuki K Poulose

On 17/10/17 17:11, Will Deacon wrote:

On Tue, Oct 17, 2017 at 04:24:23PM +0100, Mark Rutland wrote:

On Tue, Oct 10, 2017 at 11:33:00AM +0100, Suzuki K Poulose wrote:

Make use of the new generic helper to convert an of_node of a CPU
to the logical CPU id in parsing the topology.

Cc: Catalin Marinas 
Cc: Leo Yan 
Cc: Will Deacon 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 


This looks sane to me, but it will need an ack from Will or Catalin.

FWIW:

Acked-by: Mark Rutland 

Thanks,
Mark.


---
  arch/arm64/kernel/topology.c | 16 ++--
  1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..21868530018e 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
if (!cpu_node)
return -1;
  
-	for_each_possible_cpu(cpu) {

-   if (of_get_cpu_node(cpu, NULL) == cpu_node) {
-   topology_parse_cpu_capacity(cpu_node, cpu);
-   of_node_put(cpu_node);
-   return cpu;
-   }
-   }
-
-   pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
+   cpu = of_cpu_node_to_id(cpu_node);
+   if (cpu >= 0)
+   topology_parse_cpu_capacity(cpu_node, cpu);
+   else
+   pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
  
  	of_node_put(cpu_node);


This of_node_put is confusing me. Since of_cpu_node_to_id appears to be
balanced with its use of the node refcount, is this one intended to pair
with the earlier call to of_parse_phandle?


Yes.

 If so, does that mean mainline is

currently broken here because it doesn't drop the refcount twice for the
matching node?


No. This of_node_put is for the failure case where we couldn't match a CPU.
In the success case, it is dropped just before we return the result within
the loop.

Cheers
Suzuki


 Or do we need to return with that held?


Will





Re: [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing

2017-10-17 Thread Suzuki K Poulose

On 17/10/17 17:20, Suzuki K Poulose wrote:

On 17/10/17 17:11, Will Deacon wrote:

On Tue, Oct 17, 2017 at 04:24:23PM +0100, Mark Rutland wrote:

On Tue, Oct 10, 2017 at 11:33:00AM +0100, Suzuki K Poulose wrote:

Make use of the new generic helper to convert an of_node of a CPU
to the logical CPU id in parsing the topology.

Cc: Catalin Marinas 
Cc: Leo Yan 
Cc: Will Deacon 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 


This looks sane to me, but it will need an ack from Will or Catalin.

FWIW:

Acked-by: Mark Rutland 

Thanks,
Mark.


---
  arch/arm64/kernel/topology.c | 16 ++--
  1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..21868530018e 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
  if (!cpu_node)
  return -1;
-    for_each_possible_cpu(cpu) {
-    if (of_get_cpu_node(cpu, NULL) == cpu_node) {
-    topology_parse_cpu_capacity(cpu_node, cpu);
-    of_node_put(cpu_node);
-    return cpu;
-    }
-    }
-
-    pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
+    cpu = of_cpu_node_to_id(cpu_node);
+    if (cpu >= 0)
+    topology_parse_cpu_capacity(cpu_node, cpu);
+    else
+    pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
  of_node_put(cpu_node);


This of_node_put is confusing me. Since of_cpu_node_to_id appears to be
balanced with its use of the node refcount, is this one intended to pair
with the earlier call to of_parse_phandle?


Yes.

  If so, does that mean mainline is

currently broken here because it doesn't drop the refcount twice for the
matching node?


No. This of_node_put is for the failure case where we couldn't match a CPU.
In the success case, it is dropped just before we return the result within
the loop.


As we discussed offline, there is indeed a missing of_node_put() for all the
nodes we loop through to match. But with this change, we have fixed that.
And I don't think it is worth sending to stable.

Cheers
Suzuki


[PATCH v9 7/8] dt-bindings: Document devicetree binding for ARM DSU PMU

2017-10-31 Thread Suzuki K Poulose
This patch documents the devicetree bindings for ARM DSU PMU.

Cc: Mark Rutland 
Cc: Will Deacon 
Cc: devicet...@vger.kernel.org
Cc: frowand.l...@gmail.com
Acked-by: Rob Herring 
Signed-off-by: Suzuki K Poulose 
---
Changes since V3:
 - Fixed node name in the example, suggested by Rob
---
 .../devicetree/bindings/arm/arm-dsu-pmu.txt| 27 ++
 1 file changed, 27 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt

diff --git a/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt 
b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
new file mode 100644
index ..6efabba530f1
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
@@ -0,0 +1,27 @@
+* ARM DynamIQ Shared Unit (DSU) Performance Monitor Unit (PMU)
+
+ARM DynamIQ Shared Unit (DSU) integrates one or more CPU cores
+with a shared L3 memory system, control logic and external interfaces to
+form a multicore cluster. The PMU enables gathering various statistics on
+the operations of the DSU. The PMU provides independent 32bit counters that
+can count any of the supported events, along with a 64bit cycle counter.
+The PMU is accessed via CPU system registers and has no MMIO component.
+
+** DSU PMU required properties:
+
+- compatible   : should be one of :
+
+   "arm,dsu-pmu"
+
+- interrupts   : Exactly 1 SPI must be listed.
+
+- cpus : List of phandles for the CPUs connected to this DSU instance.
+
+
+** Example:
+
+dsu-pmu-0 {
+   compatible = "arm,dsu-pmu";
+   interrupts = ;
+   cpus = <&cpu_0>, <&cpu_1>;
+};
-- 
2.13.6



[PATCH v9 4/8] irqchip: gic-v3: Use of_cpu_node_to_id helper

2017-10-31 Thread Suzuki K Poulose
Use the new generic helper of_cpu_node_to_id() instead
of using our own version to map a device node to logical CPU
number.

Acked-by: Marc Zyngier 
Signed-off-by: Suzuki K Poulose 
---
Changes since V3:
 - Reflect the change in the helper name and return value.
---
 drivers/irqchip/irq-gic-v3.c | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index b5df99c6f680..04cecab71597 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1038,31 +1038,6 @@ static int __init gic_validate_dist_version(void __iomem 
*dist_base)
return 0;
 }
 
-static int get_cpu_number(struct device_node *dn)
-{
-   const __be32 *cell;
-   u64 hwid;
-   int cpu;
-
-   cell = of_get_property(dn, "reg", NULL);
-   if (!cell)
-   return -1;
-
-   hwid = of_read_number(cell, of_n_addr_cells(dn));
-
-   /*
-* Non affinity bits must be set to 0 in the DT
-*/
-   if (hwid & ~MPIDR_HWID_BITMASK)
-   return -1;
-
-   for_each_possible_cpu(cpu)
-   if (cpu_logical_map(cpu) == hwid)
-   return cpu;
-
-   return -1;
-}
-
 /* Create all possible partitions at boot time */
 static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
 {
@@ -1113,8 +1088,8 @@ static void __init gic_populate_ppi_partitions(struct 
device_node *gic_node)
if (WARN_ON(!cpu_node))
continue;
 
-   cpu = get_cpu_number(cpu_node);
-   if (WARN_ON(cpu == -1))
+   cpu = of_cpu_node_to_id(cpu_node);
+   if (WARN_ON(cpu < 0))
continue;
 
pr_cont("%pOF[%d] ", cpu_node, cpu);
-- 
2.13.6



[PATCH v9 3/8] coresight: of: Use of_cpu_node_to_id helper

2017-10-31 Thread Suzuki K Poulose
Reuse the new generic helper, of_cpu_node_to_id() to map a
given CPU phandle to a logical CPU number.

Acked-by: Mathieu Poirier 
Tested-by: Leo Yan 
Signed-off-by: Suzuki K Poulose 
---
Changes since V4:
 - Fix a regression introduced in v4, reported by bugrobot
Changes since V3:
 - Reflect the renaming of the helper and return value changes
---
 drivers/hwtracing/coresight/of_coresight.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/coresight/of_coresight.c 
b/drivers/hwtracing/coresight/of_coresight.c
index a18794128bf8..7c375443ede6 100644
--- a/drivers/hwtracing/coresight/of_coresight.c
+++ b/drivers/hwtracing/coresight/of_coresight.c
@@ -104,26 +104,17 @@ static int of_coresight_alloc_memory(struct device *dev,
 int of_coresight_get_cpu(const struct device_node *node)
 {
int cpu;
-   bool found;
-   struct device_node *dn, *np;
+   struct device_node *dn;
 
dn = of_parse_phandle(node, "cpu", 0);
-
/* Affinity defaults to CPU0 */
if (!dn)
return 0;
-
-   for_each_possible_cpu(cpu) {
-   np = of_cpu_device_node_get(cpu);
-   found = (dn == np);
-   of_node_put(np);
-   if (found)
-   break;
-   }
+   cpu = of_cpu_node_to_id(dn);
of_node_put(dn);
 
/* Affinity to CPU0 if no cpu nodes are found */
-   return found ? cpu : 0;
+   return (cpu < 0) ? 0 : cpu;
 }
 EXPORT_SYMBOL_GPL(of_coresight_get_cpu);
 
-- 
2.13.6



[PATCH v9 1/8] perf: Export perf_event_update_userpage

2017-10-31 Thread Suzuki K Poulose
Export perf_event_update_userpage() so that PMU drivers using it
can be built as modules.

Cc: Peter Zijlstra 
Signed-off-by: Suzuki K Poulose 
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9d93db81fa36..550015829db8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4982,6 +4982,7 @@ void perf_event_update_userpage(struct perf_event *event)
 unlock:
rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(perf_event_update_userpage);
 
 static int perf_mmap_fault(struct vm_fault *vmf)
 {
-- 
2.13.6



[PATCH v9 2/8] of: Add helper for mapping device node to logical CPU number

2017-10-31 Thread Suzuki K Poulose
Add a helper to map a device node to a logical CPU number to avoid
duplication. Currently this is open coded in different places (e.g.,
gic-v3, coresight). The helper tries to map a device node to a "possible"
logical CPU id, which may not be online yet. It is the responsibility
of the user to make sure that the CPU is online. The helper uses
of_cpu_device_node_get() to retrieve the device node for a given CPU
(which uses per_cpu data if available else falls back to slower
of_get_cpu_node()).

Cc: devicet...@vger.kernel.org
Cc: Frank Rowand 
Cc: Mark Rutland 
Cc: Sudeep Holla 
Reviewed-by: Marc Zyngier 
Reviewed-by: Rob Herring 
Signed-off-by: Suzuki K Poulose 
---
Changes since V6:
 - Use faster of_cpu_device_node_get() instead of of_get_cpu_node(),
   which now falls back to the latter if called early.
Changes since V3:
 - Renamed the helper to of_cpu_node_to_id(), suggested by Rob
 - Return -ENODEV on failure than nr_cpus_id
---
 drivers/of/base.c  | 26 ++
 include/linux/of.h |  7 +++
 2 files changed, 33 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 63897531cd75..753c8160a092 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -418,6 +418,32 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int 
*thread)
 EXPORT_SYMBOL(of_get_cpu_node);
 
 /**
+ * of_cpu_node_to_id: Get the logical CPU number for a given device_node
+ *
+ * @cpu_node: Pointer to the device_node for CPU.
+ *
+ * Returns the logical CPU number of the given CPU device_node.
+ * Returns -ENODEV if the CPU is not found.
+ */
+int of_cpu_node_to_id(struct device_node *cpu_node)
+{
+   int cpu;
+   bool found = false;
+   struct device_node *np;
+
+   for_each_possible_cpu(cpu) {
+   np = of_cpu_device_node_get(cpu);
+   found = (cpu_node == np);
+   of_node_put(np);
+   if (found)
+   return cpu;
+   }
+
+   return -ENODEV;
+}
+EXPORT_SYMBOL(of_cpu_node_to_id);
+
+/**
  * __of_device_is_compatible() - Check if the node matches given constraints
  * @device: pointer to node
  * @compat: required compatible string, NULL or "" for any match
diff --git a/include/linux/of.h b/include/linux/of.h
index b240ed69dc96..13eb8bc92c10 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -538,6 +538,8 @@ const char *of_prop_next_string(struct property *prop, 
const char *cur);
 
 bool of_console_check(struct device_node *dn, char *name, int index);
 
+extern int of_cpu_node_to_id(struct device_node *np);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -872,6 +874,11 @@ static inline void of_property_clear_flag(struct property 
*p, unsigned long flag
 {
 }
 
+static inline int of_cpu_node_to_id(struct device_node *np)
+{
+   return -ENODEV;
+}
+
 #define of_match_ptr(_ptr) NULL
 #define of_match_node(_matches, _node) NULL
 #endif /* CONFIG_OF */
-- 
2.13.6



[PATCH v9 0/8] perf: Support for ARM DynamIQ Shared Unit

2017-10-31 Thread Suzuki K Poulose
This series adds support for the PMU in ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
allows counting the various events related to L3, SCU etc, using 32bit
independent counters along with a 64bit cycle counter.

The PMU can only be accessed via CPU system registers, which are common
to the cores connected to the same DSU. The PMU registers follow the
semantics of the ARMv8 PMU, mostly, with the exception that
the counters record the cluster wide events.

Tested on a Fast model with DSU. The driver only supports ARM64 at the
moment. It can be extended to support ARM32 by providing register
accessors like we do in arch/arm64/include/arm_dsu_pmu.h.

The firmware should setup appropriate bits in the ACTLR_EL3/EL2 to
allow EL1 access to the PMU registers.

Series applies on v4.14-rc6 and is also available at:

  git://linux-arm.org/linux-skp.git 4.14/dsu-v9

Changes since V8:
 - Include required header files (Mark Rutland)
 - Remove Kconfig dependency on PERF_EVENTS (Mark Rutland)
 - Fix typo in event name, bus_acesss => bus_access (Mark Rutland)
 - Use find_first_zero_bit instead of find_next_zero_bit (Mark Rutland)
 - Change order of checks in dsu_pmu_event_init (Mark Rutland)
 - Allow lazy initialisation of DSU PMU to handle cases where CPUs
   may be brought up later (e.g, maxcpus=N)- Mark Rutland.
 - Change the CPU check to "associated_cpus" from "active_cpus",
   as when we migrate the perf context we will access the DSU
   from two different CPUs (source and destination).
 - Fill in the "module" field for the PMU to prevent the module unload
   when the PMU is active.

Changes since V7:
 - No changes to the Core DSU PMU code.
 - Rebased to v4.14-rc4
 - Added Tested-by from Leo
 - Convert arm64 CPU topology parsing, ARM PMU irq_affinity parsing
   to use the new helper.

Changes since V6
 - Rebased to v4.14-rc3
 - Use of_cpu_device_node_get() instead of the slower of_get_cpu_node();
   the former uses per_cpu data when available and falls back to the
   latter otherwise.
 - Added Reviewed-by tags from Jonathan

Changes since V5:
 - Pickedup ack from Rob
 - Address comments on V5 by Mark.
 - Use IRQ_NOBALANCING for IRQ handler
 - Don't expose events which could be unimplemented.
 - Get rid of dsu_pmu_event_supported and allow raw event
   code to be used without validating whether it is supported.
 - Rename "supported_cpus" mask to "associated_cpus"
 - Add Documentation for the PMU driver
 - Don't disable IRQ for dsu_pmu_{enable/disable}_counters
 - Use consistent return codes for validate_event/group calls.
 - Check PERF_ATTACH_TASK flag in event_init.
 - Allow missing CPUs in dsu_pmu_dt_get_cpus, to handle cases
   where kernel could have capped nr_cpus.
 - Cleanup sanity checking for the CPU before accessing DSU
 - Reject events with counting CPU not associated with DSU

Changes since V4:
 - Fix regressions introduced by v4, with the rename of generic
   helper.
 - Added reviewed-by tag from Rob

Changes since V3:
 - Rename the of generic helper to of_cpu_node_to_id(), and return
   -ENODEV upon failure than nr_cpus_id
 - Fix node name in device tree node example.

Suzuki K Poulose (8):
  perf: Export perf_event_update_userpage
  of: Add helper for mapping device node to logical CPU number
  coresight: of: Use of_cpu_node_to_id helper
  irqchip: gic-v3: Use of_cpu_node_to_id helper
  arm64: Use of_cpu_node_to_id helper for CPU topology parsing
  arm_pmu: Use of_cpu_node_to_id helper
  dt-bindings: Document devicetree binding for ARM DSU PMU
  perf: ARM DynamIQ Shared Unit PMU support

 .../devicetree/bindings/arm/arm-dsu-pmu.txt|  27 +
 Documentation/perf/arm_dsu_pmu.txt |  28 +
 arch/arm64/include/asm/arm_dsu_pmu.h   | 129 +++
 arch/arm64/kernel/topology.c   |  16 +-
 drivers/hwtracing/coresight/of_coresight.c |  15 +-
 drivers/irqchip/irq-gic-v3.c   |  29 +-
 drivers/of/base.c  |  26 +
 drivers/perf/Kconfig   |   9 +
 drivers/perf/Makefile  |   1 +
 drivers/perf/arm_dsu_pmu.c | 861 +
 drivers/perf/arm_pmu_platform.c|  15 +-
 include/linux/of.h |   7 +
 kernel/events/core.c   |   1 +
 13 files changed, 1103 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
 create mode 100644 Documentation/perf/arm_dsu_pmu.txt
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c

-- 
2.13.6



[PATCH v9 8/8] perf: ARM DynamIQ Shared Unit PMU support

2017-10-31 Thread Suzuki K Poulose
Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
allows counting the various events related to L3, SCU etc, along with
providing a cycle counter.

The PMU can be accessed via system registers, which are common
to the cores in the same cluster. The PMU registers follow the
semantics of the ARMv8 PMU, mostly, with the exception that
the counters record the cluster wide events.

This driver is mostly based on the ARMv8 and CCI PMU drivers.
The driver only supports ARM64 at the moment. It can be extended
to support ARM32 by providing register accessors like we do in
arch/arm64/include/arm_dsu_pmu.h.

Cc: Mark Rutland 
Cc: Will Deacon 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Suzuki K Poulose 
---
Changes since V8:
 - Include required header files (Mark Rutland)
 - Remove Kconfig dependency on PERF_EVENTS (Mark Rutland)
 - Fix typo in event name, bus_acesss => bus_access (Mark Rutland)
 - Use find_first_zero_bit instead of find_next_zero_bit (Mark Rutland)
 - Change order of checks in dsu_pmu_event_init (Mark Rutland)
 - Allow lazy initialisation of DSU PMU to handle cases where CPUs
   may be brought up later (e.g, maxcpus=N)- Mark Rutland.
 - Clear the interrupt overflow status upon initialisation (Mark Rutland)
 - Change the CPU check to "associated_cpus" from "active_cpus",
   as when we migrate the perf context we will access the DSU
   from two different CPUs (source and destination).
 - Fill in the "module" field for the PMU to prevent the module unload
   when the PMU is active.
Changes since V6:
 - Address comments from Jonathan
 - Add Reviewed-by tags from Jonathan
Changes since V5:
 - Address comments on V5 by Mark.
 - Use IRQ_NOBALANCING for IRQ handler
 - Don't expose events which could be unimplemented.
 - Get rid of dsu_pmu_event_supported and allow raw event
   code to be used without validating whether it is supported.
 - Rename "supported_cpus" mask to "associated_cpus"
 - Add Documentation for the PMU driver
 - Don't disable IRQ for dsu_pmu_{enable/disable}_counters
 - Use consistent return codes for validate_event/group calls.
 - Check PERF_ATTACH_TASK flag in event_init.
 - Allow missing CPUs in dsu_pmu_dt_get_cpus, to handle cases
   where kernel could have capped nr_cpus.
 - Cleanup sanity checking for the CPU before accessing DSU
 - Reject events with counting CPU not associated with the DSU.
Changes since V4:
 - Reflect the changed generic helper for mapping CPU id
Changes since V2:
 - Cleanup dsu_pmu_device_probe error handling.
 - Fix event validate_group to invert the result check of validate_event
 - Return errors if we failed to parse CPUs in the DSU.
 - Add MODULE_DEVICE_TABLE entry
 - Use hlist_entry_safe for converting cpuhp_node to dsu_pmu.
---
 Documentation/perf/arm_dsu_pmu.txt   |  28 ++
 arch/arm64/include/asm/arm_dsu_pmu.h | 129 ++
 drivers/perf/Kconfig |   9 +
 drivers/perf/Makefile|   1 +
 drivers/perf/arm_dsu_pmu.c   | 861 +++
 5 files changed, 1028 insertions(+)
 create mode 100644 Documentation/perf/arm_dsu_pmu.txt
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c

diff --git a/Documentation/perf/arm_dsu_pmu.txt 
b/Documentation/perf/arm_dsu_pmu.txt
new file mode 100644
index ..d611e15f5add
--- /dev/null
+++ b/Documentation/perf/arm_dsu_pmu.txt
@@ -0,0 +1,28 @@
+ARM DynamIQ Shared Unit (DSU) PMU
+=================================
+
+ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
+control logic and external interfaces to form a multicore cluster. The PMU
+allows counting the various events related to the L3 cache, Snoop Control Unit
+etc, using 32bit independent counters. It also provides a 64bit cycle counter.
+
+The PMU can only be accessed via CPU system registers, which are common to the
+cores connected to the same DSU. Like most of the other uncore PMUs, DSU
+PMU doesn't support process specific events and cannot be used in sampling 
mode.
+
+The DSU provides a bitmap for a subset of implemented events via hardware
+registers. There is no way for the driver to determine if the other events
+are available or not. Hence the driver exposes only those events advertised
+by the DSU, in the "events" directory under :
+
+  /sys/bus/event_sources/devices/arm_dsu_/
+
+The user should refer to the TRM of the product to figure out the supported 
events
+and use the raw event code for the unlisted events.
+
+The driver also exposes the CPUs connected to the DSU instance in 
"associated_cpus".
+
+
+e.g. usage :
+
+   perf stat -a -e arm_dsu_0/cycles/
diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h 
b/arch/arm64/include/asm/arm_dsu_pmu.h
new file

[PATCH v9 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing

2017-10-31 Thread Suzuki K Poulose
Make use of the new generic helper to convert an of_node of a CPU
to the logical CPU id in parsing the topology.

Cc: Catalin Marinas 
Cc: Leo Yan 
Cc: Will Deacon 
Acked-by: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/kernel/topology.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..21868530018e 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
if (!cpu_node)
return -1;
 
-   for_each_possible_cpu(cpu) {
-   if (of_get_cpu_node(cpu, NULL) == cpu_node) {
-   topology_parse_cpu_capacity(cpu_node, cpu);
-   of_node_put(cpu_node);
-   return cpu;
-   }
-   }
-
-   pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
+   cpu = of_cpu_node_to_id(cpu_node);
+   if (cpu >= 0)
+   topology_parse_cpu_capacity(cpu_node, cpu);
+   else
+   pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
 
of_node_put(cpu_node);
-   return -1;
+   return cpu;
 }
 
 static int __init parse_core(struct device_node *core, int cluster_id,
-- 
2.13.6



[PATCH v9 6/8] arm_pmu: Use of_cpu_node_to_id helper

2017-10-31 Thread Suzuki K Poulose
Use the new generic helper, of_cpu_node_to_id(), to map a
phandle to the logical CPU number while parsing the
PMU irq affinity.

Cc: Will Deacon 
Acked-by: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
---
 drivers/perf/arm_pmu_platform.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index 4eafa7a42e52..a96884e37eaf 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -81,19 +81,10 @@ static int pmu_parse_irq_affinity(struct device_node *node, 
int i)
return -EINVAL;
}
 
-   /* Now look up the logical CPU number */
-   for_each_possible_cpu(cpu) {
-   struct device_node *cpu_dn;
-
-   cpu_dn = of_cpu_device_node_get(cpu);
-   of_node_put(cpu_dn);
-
-   if (dn == cpu_dn)
-   break;
-   }
-
-   if (cpu >= nr_cpu_ids) {
+   cpu = of_cpu_node_to_id(dn);
+   if (cpu < 0) {
pr_warn("failed to find logical CPU for %s\n", dn->name);
+   cpu = nr_cpu_ids;
}
 
of_node_put(dn);
-- 
2.13.6



Re: [PATCH 02/17] coresight tmc: Hide trace buffer handling for file read

2017-11-01 Thread Suzuki K Poulose

On 20/10/17 13:34, Julien Thierry wrote:

Hi Suzuki,

On 19/10/17 18:15, Suzuki K Poulose wrote:

At the moment we adjust the buffer pointers for reading the trace
data via misc device in the common code for ETF/ETB and ETR. Since
we are going to change how we manage the buffer for ETR, let us
move the buffer manipulation to the respective driver files, hiding
it from the common code. We do so by adding type specific helpers
for finding the length of data and the pointer to the buffer,
for a given length at a file position.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
  drivers/hwtracing/coresight/coresight-tmc-etf.c | 16 
  drivers/hwtracing/coresight/coresight-tmc-etr.c | 33 
  drivers/hwtracing/coresight/coresight-tmc.c | 34 ++---
  drivers/hwtracing/coresight/coresight-tmc.h |  4 +++
  4 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index e2513b786242..0b6f1eb746de 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -120,6 +120,22 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
  CS_LOCK(drvdata->base);
  }
+/*
+ * Return the available trace data in the buffer from @pos, with
+ * a maximum limit of @len, updating the @bufpp on where to
+ * find it.
+ */
+ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata,
+  loff_t pos, size_t len, char **bufpp)
+{
+    /* Adjust the len to available size @pos */
+    if (pos + len > drvdata->len)
+    len = drvdata->len - pos;
+    if (len > 0)


Do we have some guarantee that "pos <= drvdata->len"? Because since len is
unsigned, this check only covers the case where len is 0.

Maybe it would be better to use a signed variable to store the result of the 
difference.


+    *bufpp = drvdata->buf + pos;




+ * Return the available trace data in the buffer @pos, with a maximum
+ * limit of @len, also updating the @bufpp on where to find it.
+ */
+ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
+    loff_t pos, size_t len, char **bufpp)
+{
+    char *bufp = drvdata->buf + pos;
+    char *bufend = (char *)(drvdata->vaddr + drvdata->size);
+
+    /* Adjust the len to available size @pos */
+    if (pos + len > drvdata->len)
+    len = drvdata->len - pos;
+
+    if (len <= 0)
+    return len;


Similar issue here.


Thanks for spotting. I will fix it.

Cheers


Re: [PATCH 03/17] coresight: Add helper for inserting synchronization packets

2017-11-01 Thread Suzuki K Poulose

On 30/10/17 21:44, Mathieu Poirier wrote:

On Thu, Oct 19, 2017 at 06:15:39PM +0100, Suzuki K Poulose wrote:

Right now we open code filling the trace buffer with synchronization
packets when the circular buffer wraps around in different drivers.
Move this to a common place.

Cc: Mathieu Poirier 
Cc: Mike Leach 
Signed-off-by: Suzuki K Poulose 
---
  drivers/hwtracing/coresight/coresight-etb10.c   | 10 +++--
  drivers/hwtracing/coresight/coresight-priv.h|  8 
  drivers/hwtracing/coresight/coresight-tmc-etf.c | 27 -
  drivers/hwtracing/coresight/coresight-tmc-etr.c | 13 +---
  4 files changed, 20 insertions(+), 38 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etb10.c 
b/drivers/hwtracing/coresight/coresight-etb10.c
index 56ecd7aff5eb..d7164ab8e229 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -203,7 +203,6 @@ static void etb_dump_hw(struct etb_drvdata *drvdata)
bool lost = false;
int i;
u8 *buf_ptr;
-   const u32 *barrier;
u32 read_data, depth;
u32 read_ptr, write_ptr;
u32 frame_off, frame_endoff;
@@ -234,19 +233,16 @@ static void etb_dump_hw(struct etb_drvdata *drvdata)
  
  	depth = drvdata->buffer_depth;

buf_ptr = drvdata->buf;
-   barrier = barrier_pkt;
for (i = 0; i < depth; i++) {
read_data = readl_relaxed(drvdata->base +
  ETB_RAM_READ_DATA_REG);
-   if (lost && *barrier) {
-   read_data = *barrier;
-   barrier++;
-   }
-
*(u32 *)buf_ptr = read_data;
buf_ptr += 4;
}
  
+	if (lost)

+   coresight_insert_barrier_packet(drvdata->buf);
+
if (frame_off) {
buf_ptr -= (frame_endoff * 4);
for (i = 0; i < frame_endoff; i++) {
diff --git a/drivers/hwtracing/coresight/coresight-priv.h 
b/drivers/hwtracing/coresight/coresight-priv.h
index f1d0e21d8cab..d12f64928c00 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -65,6 +65,7 @@ static DEVICE_ATTR_RO(name)
__coresight_simple_func(type, NULL, name, lo_off, hi_off)
  
  extern const u32 barrier_pkt[5];

+#define CORESIGHT_BARRIER_PKT_SIZE (sizeof(barrier_pkt) - sizeof(u32))


When using a memcpy() there is no need to have a 0x0 at the end of the
barrier_pkt array.  As such I suggest you remove it and simply use sizeof()
in coresight_insert_barrier_packet().


There is one place where we can't simply do a memcpy(), in tmc_update_etf_buffer(),
where we could potentially move over to the next PAGE while filling the barrier
packets. This is why I didn't trim it off. However, I believe this shouldn't
trigger, as the trace data should always be aligned to the frame size of the TMC
and the perf buffer size is page aligned. So we should be able to use memcpy()
in that case too. I will fix it in the next version.

Thanks
Suzuki



Re: [PATCH 04/17] coresight: Add generic TMC sg table framework

2017-11-01 Thread Suzuki K Poulose

On 31/10/17 22:13, Mathieu Poirier wrote:

On Thu, Oct 19, 2017 at 06:15:40PM +0100, Suzuki K Poulose wrote:

This patch introduces a generic sg table data structure and
associated operations. An SG table can be used to map a set
of Data pages where the trace data could be stored by the TMC
ETR. The information about the data pages could be stored in
different formats, depending on the type of the underlying
SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic
structure provides book keeping of the pages used for the data
as well as the table contents. The table should be filled by
the user of the infrastructure.

A table can be created by specifying the number of data pages
as well as the number of table pages required to hold the
pointers, where the latter could be different for different
types of tables. The pages are mapped in the appropriate dma
data direction mode (i.e, DMA_TO_DEVICE for table pages
and DMA_FROM_DEVICE for data pages).  The framework can optionally
accept a set of allocated data pages (e.g, perf ring buffer) and
map them accordingly. The table and data pages are vmap'ed to allow
easier access by the drivers. The framework also provides helpers to
sync the data written to the pages with appropriate directions.

This will be later used by the TMC ETR SG unit.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
---
  drivers/hwtracing/coresight/coresight-tmc-etr.c | 289 +++-
  drivers/hwtracing/coresight/coresight-tmc.h |  44 
  2 files changed, 332 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 41535fa6b6cf..4b9e2b276122 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -16,10 +16,297 @@
   */
  
  #include 

-#include 
+#include 
  #include "coresight-priv.h"
  #include "coresight-tmc.h"
  
+/*

+ * tmc_pages_get_offset:  Go through all the pages in the tmc_pages
+ * and map @phys_addr to an offset within the buffer.


Did you mean "... map @addr"?  It might also be worth it to explicitly mention
that it maps a physical address to an offset in the contiguous range.


Yes, definitely. I will fix it.


...


+/*
+ * Alloc pages for the table. Since this will be used by the device,
+ * allocate the pages closer to the device (i.e, dev_to_node(dev)
+ * rather than the CPU node).
+ */
+static int tmc_alloc_table_pages(struct tmc_sg_table *sg_table)
+{
+   int rc;
+   struct tmc_pages *table_pages = &sg_table->table_pages;
+
+   rc = tmc_pages_alloc(table_pages, sg_table->dev,
+dev_to_node(sg_table->dev),
+DMA_TO_DEVICE, NULL);
+   if (rc)
+   return rc;
+   sg_table->table_vaddr = vmap(table_pages->pages,
+table_pages->nr_pages,
+VM_MAP,
+PAGE_KERNEL);
+   if (!sg_table->table_vaddr)
+   rc = -ENOMEM;
+   else
+   sg_table->table_daddr = table_pages->daddrs[0];
+   return rc;
+}
+
+static int tmc_alloc_data_pages(struct tmc_sg_table *sg_table, void **pages)
+{
+   int rc;
+
+   rc = tmc_pages_alloc(&sg_table->data_pages,
+sg_table->dev, sg_table->node,


Am I missing something very subtle here or sg_table->node should be the same as
dev_to_node(sg_table->dev)?  If the same both tmc_alloc_table_pages() and
tmc_alloc_data_pages() should be using the same construct.  Otherwise please add
a comment to justify the difference.


Yes, it was a last-minute change: the table now uses dev_to_node(), while the
data pages are allocated on the node requested by the user. Eventually the user
will consume the data pages (even though the device produces them). However,
the table pages are solely for the consumption of the device, hence the
dev_to_node().

I will add a comment to make that explicit.


u32 axictl, sts;
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h 
b/drivers/hwtracing/coresight/coresight-tmc.h
index 6deb3afe9db8..5e49c035a1ac 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -19,6 +19,7 @@
  #define _CORESIGHT_TMC_H
  
  #include 

+#include 
  
  #define TMC_RSZ			0x004

  #define TMC_STS   0x00c
@@ -171,6 +172,38 @@ struct tmc_drvdata {
u32 etr_caps;
  };
  
+/**

+ * struct tmc_pages - Collection of pages used for SG.
+ * @nr_pages:  Number of pages in the list.
+ * @daddr: DMA'able page address returned by dma_map_page().
+ * @vaddr: Virtual address returned by page_address().


This isn't accurate.



Yes, I will clean that up. It kind of shows the number of re

Re: [PATCH 05/17] coresight: Add support for TMC ETR SG unit

2017-11-01 Thread Suzuki K Poulose



diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 4b9e2b276122..4424eb67a54c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -21,6 +21,89 @@
  #include "coresight-tmc.h"
  /*
+ * The TMC ETR SG has a page size of 4K. The SG table contains pointers
+ * to 4KB buffers. However, the OS may be use PAGE_SIZE different from


nit:
"the OS may use a PAGE_SIZE different from".





+#define ETR_SG_PAGE_SHIFT    12
+#define ETR_SG_PAGE_SIZE    (1UL << ETR_SG_PAGE_SHIFT)
+#define ETR_SG_PAGES_PER_SYSPAGE    (1UL << \
+ (PAGE_SHIFT - ETR_SG_PAGE_SHIFT))


I think this would be slightly easier to understand if defined as:
"(PAGE_SIZE / ETR_SG_PAGE_SIZE)".




+/* Dump the given sg_table */
+static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table)
+{
+    sgte_t *ptr;
+    int i = 0;
+    dma_addr_t addr;
+    struct tmc_sg_table *sg_table = etr_table->sg_table;
+
+    ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
+  etr_table->hwaddr, true);
+    while (ptr) {
+    addr = ETR_SG_ADDR(*ptr);
+    switch (ETR_SG_ET(*ptr)) {
+    case ETR_SG_ET_NORMAL:
+    pr_debug("%05d: %p\t:[N] 0x%llx\n", i, ptr, addr);
+    ptr++;
+    break;
+    case ETR_SG_ET_LINK:
+    pr_debug("%05d: *** %p\t:{L} 0x%llx ***\n",
+ i, ptr, addr);
+    ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
+  addr, true);
+    break;
+    case ETR_SG_ET_LAST:
+    pr_debug("%05d: ### %p\t:[L] 0x%llx ###\n",
+ i, ptr, addr);
+    return;


I get this is debug code, but it seems like if ETR_SG_ET(*ptr) is 0 we get 
stuck in an infinite loop. I guess it is something that supposedly doesn't 
happen, still I'd prefer having a default case saying the table might be 
corrupted and either incrementing ptr to try and get more info or breaking out 
of the loop.




+    }
+    i++;
+    }
+    pr_debug("*** End of Table *\n");
+}
+#endif
+
+/*
+ * Populate the SG Table page table entries from table/data
+ * pages allocated. Each Data page has ETR_SG_PAGES_PER_SYSPAGE SG pages.
+ * So does a Table page. So we keep track of indices of the tables
+ * in each system page and move the pointers accordingly.
+ */
+#define INC_IDX_ROUND(idx, size) (idx = (idx + 1) % size)


Needs more parentheses around idx and size.




+static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
+{
+    dma_addr_t paddr;
+    int i, type, nr_entries;
+    int tpidx = 0; /* index to the current system table_page */
+    int sgtidx = 0;    /* index to the sg_table within the current syspage */
+    int sgtoffset = 0; /* offset to the next entry within the sg_table */


That's misleading, this seems to be the index of an entry within an ETR_SG_PAGE 
rather than an offset in bytes.

Maybe ptridx or entryidx would be a better name.


You're right, I have chosen sgtentry for now.

Thanks for the detailed look, I will fix all of them.

Cheers
Suzuki


Re: [PATCH 06/17] coresight: tmc: Make ETR SG table circular

2017-11-01 Thread Suzuki K Poulose

On 20/10/17 18:11, Julien Thierry wrote:


+static int __maybe_unused
+tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table, u64 base_offset)
+{
+    u32 last_entry, first_entry;
+    u64 last_offset;
+    struct tmc_sg_table *sg_table = etr_table->sg_table;
+    sgte_t *table_ptr = sg_table->table_vaddr;
+    ssize_t buf_size = tmc_sg_table_buf_size(sg_table);
+
+    /* Offset should always be SG PAGE_SIZE aligned */
+    if (base_offset & (ETR_SG_PAGE_SIZE - 1)) {
+    pr_debug("unaligned base offset %llx\n", base_offset);
+    return -EINVAL;
+    }
+    /* Make sure the offset is within the range */
+    if (base_offset < 0 || base_offset > buf_size) {


base_offset is unsigned, so the left operand of the '||' is useless (would've 
expected the compiler to emit a warning for this).


+    base_offset = (base_offset + buf_size) % buf_size;
+    pr_debug("Resetting offset to %llx\n", base_offset);
+    }
+    first_entry = tmc_etr_sg_offset_to_table_index(base_offset);
+    if (first_entry == etr_table->first_entry) {
+    pr_debug("Head is already at %llx, skipping\n", base_offset);
+    return 0;
+    }
+
+    /* Last entry should be the previous one to the new "base" */
+    last_offset = ((base_offset - ETR_SG_PAGE_SIZE) + buf_size) % buf_size;
+    last_entry = tmc_etr_sg_offset_to_table_index(last_offset);
+
+    /* Reset the current Last page to Normal and new Last page to NORMAL */


Current Last page to NORMAL and new Last page to LAST?


Thanks again, will fix them

Cheers
Suzuki


Re: [PATCH 06/17] coresight: tmc: Make ETR SG table circular

2017-11-02 Thread Suzuki K Poulose

On 01/11/17 23:47, Mathieu Poirier wrote:

On Thu, Oct 19, 2017 at 06:15:42PM +0100, Suzuki K Poulose wrote:

Make the ETR SG table a circular buffer so that we can start
at any of the SG pages and use the entire buffer for tracing.
This can be achieved by :

1) Keeping an additional LINK pointer at the very end of the
SG table, i.e, after the LAST buffer entry, to point back to
the beginning of the first table. This will allow us to use
the buffer normally when we start the trace at offset 0 of
the buffer, as the LAST buffer entry hints the TMC-ETR and
it automatically wraps to the offset 0.

2) If we want to start at any other ETR SG page aligned offset,
we could :
  a) Make the preceding page entry as LAST entry.
  b) Make the original LAST entry a normal entry.
  c) Use the table pointer to the "new" start offset as the
 base of the table address.
This works as the TMC doesn't mandate that the page table
base address should be 4K page aligned.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---



+static int __maybe_unused
+tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table, u64 base_offset)
+{
+   u32 last_entry, first_entry;
+   u64 last_offset;
+   struct tmc_sg_table *sg_table = etr_table->sg_table;
+   sgte_t *table_ptr = sg_table->table_vaddr;
+   ssize_t buf_size = tmc_sg_table_buf_size(sg_table);
+
+   /* Offset should always be SG PAGE_SIZE aligned */
+   if (base_offset & (ETR_SG_PAGE_SIZE - 1)) {
+   pr_debug("unaligned base offset %llx\n", base_offset);
+   return -EINVAL;
+   }
+   /* Make sure the offset is within the range */
+   if (base_offset < 0 || base_offset > buf_size) {
+   base_offset = (base_offset + buf_size) % buf_size;
+   pr_debug("Resetting offset to %llx\n", base_offset);
+   }
+   first_entry = tmc_etr_sg_offset_to_table_index(base_offset);
+   if (first_entry == etr_table->first_entry) {
+   pr_debug("Head is already at %llx, skipping\n", base_offset);
+   return 0;
+   }
+
+   /* Last entry should be the previous one to the new "base" */
+   last_offset = ((base_offset - ETR_SG_PAGE_SIZE) + buf_size) % buf_size;
+   last_entry = tmc_etr_sg_offset_to_table_index(last_offset);
+
+   /* Reset the current Last page to Normal and new Last page to NORMAL */
+   tmc_etr_sg_update_type(&table_ptr[etr_table->last_entry],
+ETR_SG_ET_NORMAL);
+   tmc_etr_sg_update_type(&table_ptr[last_entry], ETR_SG_ET_LAST);
+   etr_table->hwaddr = tmc_etr_sg_table_index_to_daddr(sg_table,
+   first_entry);
+   etr_table->first_entry = first_entry;
+   etr_table->last_entry = last_entry;
+   pr_debug("table rotated to offset %llx-%llx, entries (%d - %d), dba: %llx\n",
+   base_offset, last_offset, first_entry, last_entry,
+   etr_table->hwaddr);


The above line generates a warning when compiling for ARMv7.


Were you running with LPAE off? That is probably the case: hwaddr
could be 32-bit or 64-bit depending on whether LPAE is enabled.
I will fix it.

I have fixed some other warnings with ARMv7 with LPAE.

Suzuki


Re: [PATCH] arm64: mm: Set MAX_PHYSMEM_BITS based on ARM64_VA_BITS

2017-11-13 Thread Suzuki K Poulose

On 12/11/17 17:55, Jerome Glisse wrote:

On Fri, Nov 10, 2017 at 03:11:15PM +, Robin Murphy wrote:

On 09/11/17 22:58, Krishna Reddy wrote:

MAX_PHYSMEM_BITS greater than ARM64_VA_BITS is causing memory
access fault, when HMM_DMIRROR test is enabled.
In the failing case, ARM64_VA_BITS=39 and MAX_PHYSMEM_BITS=48.
HMM_DMIRROR test selects phys memory range from end based on
MAX_PHYSMEM_BITS and gets mapped into VA space linearly.
As VA space is 39-bit and phys space is 48-bit, this has caused
incorrect mapping and leads to memory access fault.

Limiting the MAX_PHYSMEM_BITS to ARM64_VA_BITS fixes the issue and is
the right thing instead of hard coding it as 48-bit always.

[3.378655] Unable to handle kernel paging request at virtual address 
3befd00
[3.378662] pgd = ff800a04b000
[3.378900] [3befd00] *pgd=81fa3003, *pud=81fa3003, 
*pmd=006268200711
[3.378933] Internal error: Oops: 9644 [#1] PREEMPT SMP
[3.378938] Modules linked in:
[3.378948] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.9.52-tegra-g91402fdc013b-dirty #51
[3.378950] Hardware name: quill (DT)
[3.378954] task: ffc1ebac task.stack: ffc1eba64000
[3.378967] PC is at __memset+0x1ac/0x1d0
[3.378976] LR is at sparse_add_one_section+0xf8/0x174
[3.378981] pc : [] lr : [] pstate: 
404000c5
[3.378983] sp : ffc1eba67a40
[3.378993] x29: ffc1eba67a40 x28: 
[3.378999] x27: 0003 x26: 0040
[3.379005] x25: 03ff x24: ffc1e9f6cf80
[3.379010] x23: ff8009ecb2d4 x22: 03befd00
[3.379015] x21: ffc1e9923ff0 x20: 0003
[3.379020] x19: ffef x18: 
[3.379025] x17: 24d7 x16: 
[3.379030] x15: ff8009cd8690 x14: ffc1e9f6c70c
[3.379035] x13: ffc1e9f6c70b x12: 0030
[3.379039] x11: 0040 x10: 0101010101010101
[3.379044] x9 :  x8 : 03befd00
[3.379049] x7 :  x6 : 003f
[3.379053] x5 : 0040 x4 : 
[3.379058] x3 : 0004 x2 : 00c0
[3.379063] x1 :  x0 : 03befd00
[3.379064]
[3.379069] Process swapper/0 (pid: 1, stack limit = 0xffc1eba64028)
[3.379071] Call trace:
[3.379079] [] __memset+0x1ac/0x1d0


What's the deal with this memset? AFAICS we're in __add_pages() from
hmm_devmem_pages_create() calling add_pages() for private memory which it
does not expect to be in the linear map anyway :/

There appears to be a more fundamental problem being papered over here.

Robin.


Yes, I think the dummy driver is being used badly; if you want to test CDM
memory with the dummy driver you need to steal regular memory to act as CDM
memory. You can take a look at the following two patches:

https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=fcc1e94027dbee9525f75b2a9ad88b2e6279558a
https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=84204c5be742186236b371ea2f7ad39bf1770fe6

Note that this applies only if your device has its own memory that is not
reported as regular RAM in the kernel resource tree, and if that memory is
accessible by the CPU in a cache-coherent way.

For the Tegra platform I don't think you have any such memory, so you do
not need to register any memory to use HMM. But we can talk about
your platform in private mail under NDA if that is not the case.

Note that, no matter what, I still think it makes sense to properly define
MAX_PHYSMEM_BITS like on x86 or powerpc.


It's a bit tricky on arm64. Essentially, the vmemmap can cover a region
of (1 << (VA_BITS - 1)) bytes, the size of the linear map. However, we do
allow the physical memory to sit above VA_BITS; e.g., one can boot a
39-bit VA kernel on a system where the RAM starts at a 40-bit physical
address. We adjust the "vmemmap" to make sure that it points to the first
"valid" pfn (based on the start address of the memory).

If we reduce MAX_PHYSMEM_BITS to VA_BITS, that could break the kernel
booting on platforms where the memory is above VA_BITS. I am wondering
if we should do an additional check, pfn_valid(), before we go ahead
and assume that the PFN can be mapped in cases like the above.

Suzuki






[3.379085] [] __add_pages+0x130/0x2e0
[3.379093] [] hmm_devmem_pages_create+0x20c/0x310
[3.379100] [] hmm_devmem_add+0x1d4/0x270
[3.379128] [] dmirror_probe+0x50/0x158
[3.379137] [] platform_drv_probe+0x60/0xc8
[3.379143] [] driver_probe_device+0x26c/0x420
[3.379149] [] __driver_attach+0x124/0x128
[3.379155] [] bus_for_each_dev+0x88/0xe8
[3.379166] [] driver_attach+0x30/0x40
[3.379171] [] bus_add_driver+0x1f8/0x2b0
[3.379177] [] driver_register+0x68/0x100
[3.379183] [] __platform_driver_register+0x5c/0x68
[3.379192] [] hmm_dmirror_init+0x88/0xc4
[3.379200] [] do_one_initcall+0x5c/0x170
[3.379208] [] kernel_init_freeable+0x1b8/0x258
[ 

Re: [PATCH] arm64: mm: Set MAX_PHYSMEM_BITS based on ARM64_VA_BITS

2017-11-13 Thread Suzuki K Poulose

On 13/11/17 12:56, Robin Murphy wrote:

On 13/11/17 10:32, Suzuki K Poulose wrote:

On 12/11/17 17:55, Jerome Glisse wrote:

On Fri, Nov 10, 2017 at 03:11:15PM +, Robin Murphy wrote:

On 09/11/17 22:58, Krishna Reddy wrote:

MAX_PHYSMEM_BITS greater than ARM64_VA_BITS is causing memory
access fault, when HMM_DMIRROR test is enabled.
In the failing case, ARM64_VA_BITS=39 and MAX_PHYSMEM_BITS=48.
HMM_DMIRROR test selects phys memory range from end based on
MAX_PHYSMEM_BITS and gets mapped into VA space linearly.
As VA space is 39-bit and phys space is 48-bit, this has caused
incorrect mapping and leads to memory access fault.

Limiting the MAX_PHYSMEM_BITS to ARM64_VA_BITS fixes the issue and is
the right thing instead of hard coding it as 48-bit always.

[    3.378655] Unable to handle kernel paging request at virtual address 
3befd00
[    3.378662] pgd = ff800a04b000
[    3.378900] [3befd00] *pgd=81fa3003, *pud=81fa3003, 
*pmd=006268200711
[    3.378933] Internal error: Oops: 9644 [#1] PREEMPT SMP
[    3.378938] Modules linked in:
[    3.378948] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.9.52-tegra-g91402fdc013b-dirty #51
[    3.378950] Hardware name: quill (DT)
[    3.378954] task: ffc1ebac task.stack: ffc1eba64000
[    3.378967] PC is at __memset+0x1ac/0x1d0
[    3.378976] LR is at sparse_add_one_section+0xf8/0x174
[    3.378981] pc : [] lr : [] pstate: 
404000c5
[    3.378983] sp : ffc1eba67a40
[    3.378993] x29: ffc1eba67a40 x28: 
[    3.378999] x27: 0003 x26: 0040
[    3.379005] x25: 03ff x24: ffc1e9f6cf80
[    3.379010] x23: ff8009ecb2d4 x22: 03befd00
[    3.379015] x21: ffc1e9923ff0 x20: 0003
[    3.379020] x19: ffef x18: 
[    3.379025] x17: 24d7 x16: 
[    3.379030] x15: ff8009cd8690 x14: ffc1e9f6c70c
[    3.379035] x13: ffc1e9f6c70b x12: 0030
[    3.379039] x11: 0040 x10: 0101010101010101
[    3.379044] x9 :  x8 : 03befd00
[    3.379049] x7 :  x6 : 003f
[    3.379053] x5 : 0040 x4 : 
[    3.379058] x3 : 0004 x2 : 00c0
[    3.379063] x1 :  x0 : 03befd00
[    3.379064]
[    3.379069] Process swapper/0 (pid: 1, stack limit = 0xffc1eba64028)
[    3.379071] Call trace:
[    3.379079] [] __memset+0x1ac/0x1d0


What's the deal with this memset? AFAICS we're in __add_pages() from
hmm_devmem_pages_create() calling add_pages() for private memory which it
does not expect to be in the linear map anyway :/


FWIW I did keep looking, and I now see that, thanks to confusing inlining, this 
is probably the clearing of the vmemmap section in sparse_add_one_section(), 
rather than any touching of the new memory itself. The fact that the commit 
message doesn't even try to explain the real problem (seemingly that the index 
has overflowed the vmemmap area and wrapped past the top of the address space) 
only emphasises my concern that this is a bit of a hack, though.


There appears to be a more fundamental problem being papered over here.


Following some discussion with Suzuki and Catalin, there does seem to be a more 
general issue of the interaction between vmemmap and memory hotplug. Of course, 
arm64 doesn't support memory hotplug in mainline (it's something I've started 
looking at), nor other HMM dependencies, so there's already more going on here 
than meets the eye.


Yes i think the dummy driver is use badly, if you want to test CDM memory
with dummy driver you need to steal regular memory to act as CDM memory.
You can take a look at following 2 patches:

https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=fcc1e94027dbee9525f75b2a9ad88b2e6279558a
https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=84204c5be742186236b371ea2f7ad39bf1770fe6

Note that this is only if your device have its own memory that is not
reported as regular ram to kernel resource and if that memory is
accessible by CPU in cache coherent way.

For tegra platform i don't think you have any such memory. Thus you
do not need to register any memory to use HMM. But we can talk about
your platform in private mail under NDA if it is not the case.

Note that no matter what i still think it make sense to properly define
MAX_PHYSMEM_BITS like on x86 or powerpc.


Its a bit tricky on arm64. Essentially the VMEMMAP can cover a region
of (1 << (VA_BITS - 1)), the size of the linear map. However, it does
allow the physical memory to be above VA_BITS. e.g, one can boot a
39bit VA kernel on a system where the RAM is at 40bits. We adjust the
"vmemmap" to make sure that it points to the first "valid" pfn (based
on the start address of the memory).

If we reduce the MAX_PHYSMEM_BITS to VA_BITS, that could break the kernel
booting on platforms where the memory is above VA_BITS.

Re: [PATCH 07/17] coresight: tmc etr: Add transparent buffer management

2017-11-03 Thread Suzuki K Poulose

On 02/11/17 17:48, Mathieu Poirier wrote:

On Thu, Oct 19, 2017 at 06:15:43PM +0100, Suzuki K Poulose wrote:

At the moment we always use contiguous memory for TMC ETR tracing
when used from sysfs. The size of the buffer is fixed at boot time
and can only be changed by modifying the DT. With the introduction
of SG support we could support really large buffers in that mode.
This patch abstracts the buffer used for ETR to switch between a
contiguous buffer or a SG table depending on the availability of
the memory.

This also enables the sysfs mode to use the ETR in SG mode depending
on the configured trace buffer size. Also, since ETR will use the new
new infrastructure to manage the buffer, we can get rid of some
of the members in the tmc_drvdata and clean up the fields a bit.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
  drivers/hwtracing/coresight/coresight-tmc-etr.c | 433 +++-
  drivers/hwtracing/coresight/coresight-tmc.h |  60 +++-
  2 files changed, 403 insertions(+), 90 deletions(-)



[..]
  

+
+static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)



+   w_offset = tmc_sg_get_data_page_offset(table, rwp);
+   if (w_offset < 0) {
+   dev_warn(table->dev, "Unable to map RWP %llx to offset\n",
+   rwp);


 dev_warn(table->dev,
  "Unable to map RWP %llx to offset\n", rwq);

It looks a little better and we respect indentation rules.  Same for r_offset.




+static inline int tmc_etr_mode_alloc_buf(int mode,
+ struct tmc_drvdata *drvdata,
+ struct etr_buf *etr_buf, int node,
+ void **pages)


static inline int
tmc_etr_mode_alloc_buf(int mode,
struct tmc_drvdata *drvdata,
struct etr_buf *etr_buf, int node,
void **pages)



+ * tmc_alloc_etr_buf: Allocate a buffer used by the ETR.
+ * @drvdata: ETR device details.
+ * @size   : size of the requested buffer.
+ * @flags  : Required properties of the type of buffer.
+ * @node   : Node for memory allocations.
+ * @pages  : An optional list of pages.
+ */
+static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
+ ssize_t size, int flags,
+ int node, void **pages)


Please fix indentation.  Also @flags isn't used.



Yep, flags is only used later; I can move it to the patch where we use it.


+{
+   int rc = -ENOMEM;
+   bool has_etr_sg, has_iommu;
+   struct etr_buf *etr_buf;
+
+   has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
+   has_iommu = iommu_get_domain_for_dev(drvdata->dev);
+
+   etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
+   if (!etr_buf)
+   return ERR_PTR(-ENOMEM);
+
+   etr_buf->size = size;
+
+   /*
+* If we have to use an existing list of pages, we cannot reliably
+* use a contiguous DMA memory (even if we have an IOMMU). Otherwise,
+* we use the contiguous DMA memory if :
+*  a) The ETR cannot use Scatter-Gather.
+*  b) if not a, we have an IOMMU backup


Please rework the above sentence.


How about:
   b) if (a) is not true and we have an IOMMU connected to the ETR.

I will address the other comments on indentation.

Thanks for the detailed look

Cheers
Suzuki


Re: [PATCH 10/17] coresight: etr: Track if the device is coherent

2017-11-03 Thread Suzuki K Poulose

On 02/11/17 19:40, Mathieu Poirier wrote:

On Thu, Oct 19, 2017 at 06:15:46PM +0100, Suzuki K Poulose wrote:

Track if the ETR is dma-coherent or not. This will be useful
in deciding if we should use software buffering for perf.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
  drivers/hwtracing/coresight/coresight-tmc.c | 5 -
  drivers/hwtracing/coresight/coresight-tmc.h | 1 +
  2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc.c 
b/drivers/hwtracing/coresight/coresight-tmc.c
index 4939333cc6c7..5a8c41130f96 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -347,6 +347,9 @@ static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
if (!(devid & TMC_DEVID_NOSCAT))
tmc_etr_set_cap(drvdata, TMC_ETR_SG);
  
+	if (device_get_dma_attr(drvdata->dev) == DEV_DMA_COHERENT)

+   tmc_etr_set_cap(drvdata, TMC_ETR_COHERENT);
+
/* Check if the AXI address width is available */
if (devid & TMC_DEVID_AXIAW_VALID)
dma_mask = ((devid >> TMC_DEVID_AXIAW_SHIFT) &
@@ -397,7 +400,7 @@ static int tmc_probe(struct amba_device *adev, const struct 
amba_id *id)
if (!drvdata)
goto out;
  
-	drvdata->dev = &adev->dev;

+   drvdata->dev = dev;


What is that one for?



Oops, that was a minor cleanup and need not be part of this patch. I will
leave things as they are; it is not worth a separate patch.

Cheers
Suzuki


Re: [PATCH 11/17] coresight etr: Handle driver mode specific ETR buffers

2017-11-03 Thread Suzuki K Poulose

On 02/11/17 20:26, Mathieu Poirier wrote:

On Thu, Oct 19, 2017 at 06:15:47PM +0100, Suzuki K Poulose wrote:

Since the ETR could be driven either by SYSFS or by perf, it
becomes complicated how we deal with the buffers used for each
of these modes. The ETR driver cannot simply free the current
attached buffer without knowing the provider (i.e, sysfs vs perf).

To solve this issue, we provide:
1) the driver-mode specific etr buffer to be retained in the drvdata
2) the etr_buf for a session should be passed on when enabling the
hardware, which will be stored in drvdata->etr_buf. This will be
replaced (not free'd) as soon as the hardware is disabled, after
necessary sync operation.


If I get you right the problem you're trying to solve is what to do with a sysFS
buffer that hasn't been read (and freed) when a perf session is requested.  In
my opinion it should simply be freed.  Indeed the user probably doesn't care
much about that sysFS buffer, if it did the data would have been harvested.


Not only that. If we simply use drvdata->etr_buf, we cannot track the mode
which uses it. And if we keep the etr_buf around, how does the new mode's
user decide how to free the existing one? (E.g., the perf etr_buf could be
associated with other perf data structures.) This change allows us to leave
the handling of the etr_buf to the respective modes.

And whether to keep the sysfs etr_buf around is a separate decision from the
above.


Cheers
Suzuki


Re: [PATCH 13/17] coresight etr: Do not clean ETR trace buffer

2017-11-03 Thread Suzuki K Poulose

On 02/11/17 20:36, Mathieu Poirier wrote:

On Thu, Oct 19, 2017 at 06:15:49PM +0100, Suzuki K Poulose wrote:

We zero out the entire trace buffer used for ETR before it
is enabled, for helping with debugging. Since we could be
restoring a session in perf mode, this could destroy the data.


I'm not sure I follow you with "... restoring a session in perf mode ...".
When operating from the perf interface, all the memory allocated for a session
is cleaned up afterwards; there is no re-use of memory as in sysFS.


We could directly use the perf ring buffer for the ETR. In that case, the perf
ring buffer could contain trace data collected from the previous "schedule"
which the userspace hasn't collected yet. So, doing a memset here would
destroy that data.

Cheers
Suzuki




Get rid of this step, if someone wants to debug, they can always
add it as and when needed.

Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 
---
  drivers/hwtracing/coresight/coresight-tmc-etr.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 31353fc34b53..849684f85443 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -971,8 +971,6 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata,
return;
  
  	drvdata->etr_buf = etr_buf;

-   /* Zero out the memory to help with debug */
-   memset(etr_buf->vaddr, 0, etr_buf->size);


I agree, this can be costly when dealing with large areas of memory.

  
  	CS_UNLOCK(drvdata->base);
  
@@ -1267,9 +1265,8 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)

if (drvdata->mode == CS_MODE_SYSFS) {
/*
 * The trace run will continue with the same allocated trace
-* buffer. The trace buffer is cleared in tmc_etr_enable_hw(),
-* so we don't have to explicitly clear it. Also, since the
-* tracer is still enabled drvdata::buf can't be NULL.
+* buffer. Since the tracer is still enabled drvdata::buf can't
+* be NULL.
 */
tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
} else {
--
2.13.6





[PATCH 1/2] perf: arm_spe: Prevent module unload while the PMU is in use

2017-11-03 Thread Suzuki K Poulose
When the PMU driver is built as a module, the perf core expects
pmu->module to be valid, so that the driver is prevented from
being unloaded while it is in use. Fix the SPE PMU driver to
fill in this field.

Cc: Will Deacon 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
---
 drivers/perf/arm_spe_pmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 50511b13fd35..8ce262fc2561 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -889,6 +889,7 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu 
*spe_pmu)
struct device *dev = &spe_pmu->pdev->dev;
 
spe_pmu->pmu = (struct pmu) {
+   .module = THIS_MODULE,
.capabilities   = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
.attr_groups= arm_spe_pmu_attr_groups,
/*
-- 
2.13.6



[PATCH 2/2] arm-ccn: perf: Prevent module unload while PMU is in use

2017-11-03 Thread Suzuki K Poulose
When the PMU driver is built as a module, the perf core expects
pmu->module to be valid, so that the driver is prevented from
being unloaded while it is in use. Fix the CCN PMU driver to
fill in this field.

Fixes: commit a33b0daab73a0 ("bus: ARM CCN PMU driver")
Cc: Pawel Moll 
Cc: Will Deacon 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
---
 drivers/bus/arm-ccn.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c
index e8c6946fed9d..3063f5312397 100644
--- a/drivers/bus/arm-ccn.c
+++ b/drivers/bus/arm-ccn.c
@@ -1276,6 +1276,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
 
/* Perf driver registration */
ccn->dt.pmu = (struct pmu) {
+   .module = THIS_MODULE,
.attr_groups = arm_ccn_pmu_attr_groups,
.task_ctx_nr = perf_invalid_context,
.event_init = arm_ccn_pmu_event_init,
-- 
2.13.6



Re: [PATCH v9 8/8] perf: ARM DynamIQ Shared Unit PMU support

2017-11-03 Thread Suzuki K Poulose

On 03/11/17 12:20, Mark Rutland wrote:

Hi Suzuki,

This looks good, but there are a couple of edge cases I think that we
need to handle, as noted below.

On Tue, Oct 31, 2017 at 05:23:18PM +, Suzuki K Poulose wrote:

Changes since V8:



  - Fill in the "module" field for the PMU to prevent the module unload
when the PMU is active.


Huh. For some reason I thought that was done automatically, but having
looked, I see that it is not.

It looks like this is missing from the SPE PMU, and the CCN PMU. Would
you mind fixing those up?

The only other PMU that I see affected is the AMD power PMU; I've pinged
the maintainer separately.

[...]


+The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".


Just to check, is there a user of this?

I agree that it could be useful, but AFAICT the perf tool won't look at
this, so it seems odd to expose it. I'd feel happier punting on exposing
that so that we can settle on a common name for this across
uncore/system PMUs.


It allows the user to identify the DSU instance to profile if there are
multiple DSUs on the system. Also, this information can be used to identify
the "cpu" list that can be passed to the -C option.



[...]


+static void dsu_pmu_probe_pmu(void *data)
+{



+   /* We can only support upto 31 independent counters */


Nit: s/upto/up to/

[...]


+static void dsu_pmu_init_pmu(struct dsu_pmu *dsu_pmu)
+{
+   int cpu, rc;
+
+   cpu = dsu_pmu_get_online_cpu(dsu_pmu);
+   /* Defer, if we don't have any active CPUs in the DSU */
+   if (cpu >= nr_cpu_ids)
+   return;
+   rc = smp_call_function_single(cpu, dsu_pmu_probe_pmu, dsu_pmu, 1);
+   if (rc)
+   return;
+   /* Reset the interrupt overflow mask */
+   dsu_pmu_get_reset_overflow();
+   dsu_pmu_set_active_cpu(cpu, dsu_pmu);
+}


I think this can be simplified by only calling this in the hotplug
callback, and not doing the cross-call at all at driver init time. That
way, we can do:

static void dsu_pmu_init_pmu(struct dsu_pmu *dsu_pmu)
{
if (dsu_pmu->num_counters == -1)
dsu_pmu_probe_pmu(dsu_pmu);

dsu_pmu_get_reset_overflow();
}

... which also means we can simplify the prototype of
dsu_pmu_probe_pmu().

Note that the dsu_pmu_set_active_cpu() can be factored out to the
caller, which is a little clearer, as I suggest below.


+static int dsu_pmu_device_probe(struct platform_device *pdev)
+{



+   /*
+* We could defer probing the PMU details from the registers until
+* an associated CPU is online.
+*/
+   dsu_pmu_init_pmu(dsu_pmu);


... then we can drop this line ...


+   platform_set_drvdata(pdev, dsu_pmu);
+   rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
+   &dsu_pmu->cpuhp_node);


... as this should set things up if a CPU is already online.

[...]


Got it, that's a good idea. I will change it.




+static int dsu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+   struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
+  cpuhp_node);
+
+   if (!cpumask_test_cpu(cpu, &dsu_pmu->associated_cpus))
+   return 0;
+
+   /* Initialise the PMU if necessary */
+   if (dsu_pmu->num_counters < 0)
+   dsu_pmu_init_pmu(dsu_pmu);
+   /* Set the active CPU if we don't have one */
+   if (cpumask_empty(&dsu_pmu->active_cpu))
+   dsu_pmu_set_active_cpu(cpu, dsu_pmu);
+   return 0;
+}


I don't think this is quite right, as if we've offlined all the
associated CPUs, the DSU itself may have been powered down, and we'll
want to reset it when it's brought online.

I think we want this to be:

static int dsu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
{
struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
   cpuhp_node);

if (!cpumask_test_cpu(cpu, &dsu_pmu->associated_cpus))
return 0;

/* If the PMU is already managed, there's nothing to do */
if (!cpumask_empty(&dsu_pmu->active_cpu))
return 0;

/* Reset the PMU, and take ownership */
dsu_pmu_init_pmu(dsu_pmu);
dsu_pmu_set_active_cpu(cpu, dsu_pmu);

return 0;
}

[...]


+static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{



+   dsu_pmu_set_active_cpu(dst, dsu_pmu);
+   perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);


In other PMU drivers, we do the migrate, then set the active CPU. That
shouldn't matter, but for consistency, could we flip these around?


OK, will flip it.



Otherwise, this looks good to me.

With the above changes:

Reviewed-by: Mark Rutland 


Thanks for the review. I will post the updated version.

Suzuki


[PATCH v10 1/8] perf: Export perf_event_update_userpage

2017-11-03 Thread Suzuki K Poulose
Export perf_event_update_userpage() so that PMU drivers using it
can be built as modules.

Cc: Peter Zijlstra 
Signed-off-by: Suzuki K Poulose 
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9d93db81fa36..550015829db8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4982,6 +4982,7 @@ void perf_event_update_userpage(struct perf_event *event)
 unlock:
rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(perf_event_update_userpage);
 
 static int perf_mmap_fault(struct vm_fault *vmf)
 {
-- 
2.13.6


