RE: [PATCH v3] drivers:soc:fsl:qbman:qman.c: Sleep instead of stuck hacking jiffies.

2017-05-04 Thread Karim Eshapa
Use msleep() instead of busy-waiting on jiffies; sleeping through
the delay is more efficient than spinning.

Signed-off-by: Karim Eshapa 
---
 drivers/soc/fsl/qbman/qman.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 3d891db..18d391e 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1084,11 +1084,7 @@ static int drain_mr_fqrni(struct qm_portal *p)
 * entries well before the ring has been fully consumed, so
 * we're being *really* paranoid here.
 */
-   u64 now, then = jiffies;
-
-   do {
-   now = jiffies;
-   } while ((then + 1) > now);
+   msleep(1);
msg = qm_mr_current(p);
if (!msg)
return 0;
-- 
2.7.4



[PATCH] cxl: Unlock on error in probe

2017-05-04 Thread Dan Carpenter
We should unlock if get_cxl_adapter() fails.

Fixes: 594ff7d067ca ("cxl: Support to flash a new image on the adapter from a guest")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/misc/cxl/flash.c b/drivers/misc/cxl/flash.c
index 7c61c70ba3f6..37475abea3e6 100644
--- a/drivers/misc/cxl/flash.c
+++ b/drivers/misc/cxl/flash.c
@@ -401,8 +401,10 @@ static int device_open(struct inode *inode, struct file *file)
if (down_interruptible() != 0)
return -EPERM;
 
-   if (!(adapter = get_cxl_adapter(adapter_num)))
-   return -ENODEV;
+   if (!(adapter = get_cxl_adapter(adapter_num))) {
+   rc = -ENODEV;
+   goto err_unlock;
+   }
 
file->private_data = adapter;
continue_token = 0;
@@ -446,6 +448,8 @@ static int device_open(struct inode *inode, struct file *file)
free_page((unsigned long) le);
 err:
put_device(&adapter->dev);
+err_unlock:
+   up();
 
return rc;
 }
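
The unlock-on-error pattern in this patch can be sketched in user space as follows. This is only an illustration: mock_sem, mock_down(), mock_up(), mock_get_adapter() and mock_open() are hypothetical stand-ins, not the driver's API, and the choice to keep the lock held on success is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's semaphore and adapter lookup. */
struct mock_sem { int held; };

static int mock_down(struct mock_sem *s) { s->held++; return 0; }
static void mock_up(struct mock_sem *s) { s->held--; }

static void *mock_get_adapter(int num)
{
	static int adapter;

	return num == 0 ? &adapter : NULL;	/* only adapter 0 exists */
}

/* Every failure after the lock is taken leaves through err_unlock. */
static int mock_open(struct mock_sem *s, int adapter_num)
{
	int rc;

	if (mock_down(s) != 0)
		return -EPERM;

	if (!mock_get_adapter(adapter_num)) {
		rc = -ENODEV;
		goto err_unlock;
	}

	/* Success: this sketch assumes the lock is dropped by a later close. */
	return 0;

err_unlock:
	mock_up(s);
	return rc;
}
```

With this shape, a failed lookup returns -ENODEV and leaves the lock count where it was before the call, so a later open is not blocked.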


Re: [RFC 0/2] powerpc/mm: Mark memory contexts requiring global TLBIs

2017-05-04 Thread Alistair Popple
Hi Frederic,

On Wed, 3 May 2017 04:29:04 PM Frederic Barrat wrote:
> capi2 and opencapi require the TLB invalidations being sent for
> addresses used on the cxl adapter or opencapi device to be global, as
> there's a translation cache in the PSL (for capi2) or NPU (for
> opencapi). The CAPP (for PSL) and NPU snoop the power bus.
>
> This is not new: for the hash memory model, as soon as the cxl driver
> is active, all local TLBIs become global. We need a similar mechanism
> for the radix memory model. This patch tries to improve things a bit
> by flagging the contexts requiring global TLBIs, therefore limiting
> the "upgrade" and not affecting contexts not used by the card.
>
> Alistair: for nvlink2, it is my understanding that all the required
> invalidations are already in place through software mmio/ATSD, i.e. this
> patch is not useful for you.

Not quite true. I would like to drop the global TLBI from the MMU
notifier, so I will need this to invalidate entries in the NMMU cache.

- Alistair

> Submitting as an RFC, since I don't get to touch mmu.h everyday and
> would like to probe people's reaction.
>
>
>
> Frederic Barrat (2):
>   powerpc/mm: Add marker for contexts requiring global TLB invalidations
>   cxl: Mark context requiring global TLBIs
>
>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +
>  arch/powerpc/include/asm/tlb.h   | 10 --
>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>  drivers/misc/cxl/api.c   |  5 -
>  drivers/misc/cxl/file.c  |  5 -
>  5 files changed, 26 insertions(+), 4 deletions(-)
>
>



Re: [linux-next][bisected 1945bc45] build brakes for PowerPC BE configuration on LPAR

2017-05-04 Thread Abdul Haleem
On Thu, 2017-05-04 at 20:41 +1000, Nicholas Piggin wrote:
> On Thu, 04 May 2017 14:54:19 +0530
> Abdul Haleem  wrote:
> 
> > Hi,
> > 
> > linux-next build fails on BE config with next-20170424 onwards
> > 
> > the patch https://lkml.org/lkml/2017/4/20/994  fixes a similar issue
> > with kvm guest build failure.
> > 
> > arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
> > arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
> > (0x8280 is not between 0x and
> > 0x)
> > make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
> > 
> > Bisection resulted with the below bad commit.
> > 
> > commit 1945bc4549e5cb1f9aa873ec29191aa54dc851d
> > Author: Nicholas Piggin 
> > Date:   Wed Apr 19 23:05:47 2017 +1000
> > 
> > powerpc/64s: Fix POWER9 machine check handler from stop state
> > 
> > Reviewed-by: Gautham R. Shenoy 
> > Reviewed-by: Mahesh J Salgaonkar 
> > Signed-off-by: Nicholas Piggin 
> > Signed-off-by: Michael Ellerman 
> > 
> >  arch/powerpc/include/asm/reg.h   |  1 +
> >  arch/powerpc/kernel/exceptions-64s.S | 79 
> > ---
> >  arch/powerpc/kernel/idle_book3s.S| 25 +
> >  3 files changed, 70 insertions(+), 35 deletions(-)
> > 
> > the BE configuration file is attached.
> > 
> 
> Thanks for the report. I couldn't reproduce it with this config. I
> suspect the following patch should fix it; can you test?
> 
> powerpc/64s: Fix unnecessary machine check handler relocation branch
> 
> Similarly to 2563a70c3b ("powerpc/64s: Remove unnecessary relocation
> branch from idle handler"), the machine check handler has a BRANCH_TO
> from relocated to relocated code, which is unnecessary.
> 
> It has also caused build errors with some toolchains:
> 
>   arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
>   arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
>   (0x8280 is not between 0x and
>   0x)
> 
> Fixes: 1945bc4549 ("powerpc/64s: Fix POWER9 machine check handler from stop state")
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 3840a7700285..ef72065f684c 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -391,9 +391,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
>*/
>   BEGIN_FTR_SECTION
>   rlwinm. r11,r12,47-31,30,31
> - beq-4f
> - BRANCH_TO_COMMON(r10, machine_check_idle_common)
> -4:
> + bne machine_check_idle_common
>   END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
>  #endif
> 

The machine builds and boots fine. Thanks for the patch :-)

Reported-and-tested-by: Abdul Haleem 

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





Re: [PATCH v2] drivers:soc:fsl:qbman:qman.c: Sleep instead of stuck hacking jiffies.

2017-05-04 Thread Roy Pledge
On 5/4/2017 5:07 PM, Scott Wood wrote:
> On Thu, 2017-05-04 at 06:58 +0200, Karim Eshapa wrote:
>> +stop = jiffies + 1;
>> +/*
>> + * if MR was full and h/w had other FQRNI entries to produce, we
>> + * need to allow it time to produce those entries once the
>> + * existing entries are consumed. A worst-case situation
>> + * (fully-loaded system) means h/w sequencers may have to do 3-4
>> + * other things before servicing the portal's MR pump, each of
>> + * which (if slow) may take ~50 qman cycles (which is ~200
>> + * processor cycles). So rounding up and then multiplying this
>> + * worst-case estimate by a factor of 10, just to be
>> + * ultra-paranoid, goes as high as 10,000 cycles. NB, we consume
>> + * one entry at a time, so h/w has an opportunity to produce new
>> + * entries well before the ring has been fully consumed, so
>> + * we're being *really* paranoid here.
>> + */
> OK, upon reading this more closely it seems the intent was to delay for 10,000
> *processor cycles* and somehow that got turned into 10,000 jiffies (which is
> 40 seconds at the default Hz!).  We could just replace this whole thing with
> msleep(1) and still be far more paranoid than was originally intended.
>
> Claudiu and Roy, any comments?
Yes, the timing here is certainly off; the code changed a few times since
the comment was originally written.
An msleep(1) seems reasonable here to me.

Roy
>
> -Scott
>
>



Re: [PATCH v2] drivers:soc:fsl:qbman:qman.c: Sleep instead of stuck hacking jiffies.

2017-05-04 Thread Scott Wood
On Thu, 2017-05-04 at 06:58 +0200, Karim Eshapa wrote:
> + stop = jiffies + 1;
> + /*
> +  * if MR was full and h/w had other FQRNI entries to produce, we
> +  * need to allow it time to produce those entries once the
> +  * existing entries are consumed. A worst-case situation
> +  * (fully-loaded system) means h/w sequencers may have to do 3-4
> +  * other things before servicing the portal's MR pump, each of
> +  * which (if slow) may take ~50 qman cycles (which is ~200
> +  * processor cycles). So rounding up and then multiplying this
> +  * worst-case estimate by a factor of 10, just to be
> +  * ultra-paranoid, goes as high as 10,000 cycles. NB, we consume
> +  * one entry at a time, so h/w has an opportunity to produce new
> +  * entries well before the ring has been fully consumed, so
> +  * we're being *really* paranoid here.
> +  */

OK, upon reading this more closely it seems the intent was to delay for 10,000
*processor cycles* and somehow that got turned into 10,000 jiffies (which is
40 seconds at the default Hz!).  We could just replace this whole thing with
msleep(1) and still be far more paranoid than was originally intended.

Claudiu and Roy, any comments?

-Scott
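
The arithmetic behind the comment and the reply above can be checked with a small sketch. The helper names are hypothetical, and the 1 GHz clock and HZ=250 used in the usage note are assumptions chosen for illustration (HZ=250 is the tick rate consistent with the 40 seconds quoted above).

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case estimate from the comment: up to 4 other things, each
 * ~200 processor cycles, rounded up and then multiplied by 10. */
static uint64_t paranoid_cycles(void)
{
	uint64_t worst = 4 * 200;	/* 800 cycles */
	uint64_t rounded = 1000;	/* rounded up, as the comment describes */

	assert(worst <= rounded);
	return rounded * 10;		/* 10,000 cycles */
}

/* Convert a cycle count to nanoseconds at an assumed clock rate. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_hz)
{
	assert(clock_hz > 0);
	return cycles * 1000000000ULL / clock_hz;
}

/* Convert a jiffies count to milliseconds at an assumed tick rate. */
static uint64_t sketch_jiffies_to_ms(uint64_t j, uint64_t hz)
{
	assert(hz > 0);
	return j * 1000 / hz;
}
```

Under these assumptions, 10,000 cycles at 1 GHz is 10 microseconds, so msleep(1) waits roughly 100 times longer than the estimate, while 10,000 jiffies at HZ=250 is 40,000 ms.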



Re: Freescale mpc8315 IRQ0 setup

2017-05-04 Thread Scott Wood
On Thu, 2017-05-04 at 17:06 +0200, Juergen Schindele wrote:
> Am Dienstag, 2. Mai 2017, 22:29:34 schrieb Scott Wood:
> > On Tue, 2017-05-02 at 14:43 +0200, Juergen Schindele wrote:
> > > Dear Scott,
> > > sorry for the delay, but I am not very familiar with the formatting.
> > > I passed the patch through checkpatch.pl and there were no more errors.
> > > Please find the patch in the attached file.
> > > Thanks
> > 
> > > Documentation/process/submitting-patches.rst explains the way to format
> > > and submit kernel patches.
> > 
> > Also, why the unrelated change to a print statement in
> > ipic_set_irq_type()?
> > 
> > -Scott
> 
> The second diff is not completely unrelated: when I was investigating
> the problem, I saw only the message "edge sense not supported", which
> does not say which interrupt it is complaining about. So I added the
> interrupt number to find out which one is the suspect.

That's fine but it's still fixing a different problem than "irq0 setup" and
should be a separate patch.

> Corrected patch

Again, please read Documentation/process/submitting-patches.rst.  Patches
should be inline, not attached.  The subject line should be something like
"powerpc/ipic: Configure "EDGE" capabilities for IRQ0 (like IRQ1-7)" and there
should be more description in the body of the changelog.

-Scott



[PATCH 2/2] powerpc/fadump: avoid holes in boot memory area when fadump is registered

2017-05-04 Thread Hari Bathini
To register fadump, the boot memory area (the low memory chunk that
is required for a kernel to boot successfully when booted with restricted
memory) is assumed to have no holes. But this memory area is currently
not protected from hot-remove operations. So, fadump could fail to
re-register after a memory hot-remove operation, if memory is removed
from the boot memory area. To avoid this, ensure that memory from the boot
memory area is not hot-removed when fadump is registered.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump.h   |1 +
 arch/powerpc/kernel/fadump.c|   12 
 arch/powerpc/platforms/pseries/hotplug-memory.c |7 +++
 3 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 0031806..609fccc 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -198,6 +198,7 @@ struct fad_crash_memory_ranges {
unsigned long long  size;
 };
 
+extern int is_fadump_boot_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 03563c6..ea7dfdc 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -114,6 +114,18 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
return 1;
 }
 
+/*
+ * If fadump is registered, check if the memory provided
+ * falls within boot memory area.
+ */
+int is_fadump_boot_memory_area(u64 addr, ulong size)
+{
+   if (!fw_dump.dump_registered)
+   return 0;
+
+   return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
+}
+
 int is_fadump_active(void)
 {
return fw_dump.dump_active;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index e104c71..a186b8e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pseries.h"
 
 static bool rtas_hp_event;
@@ -406,6 +407,12 @@ static bool lmb_is_removable(struct of_drconf_cell *lmb)
scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
phys_addr = lmb->base_addr;
 
+#ifdef CONFIG_FA_DUMP
+   /* Don't hot-remove memory that falls in fadump boot memory area */
+   if (is_fadump_boot_memory_area(phys_addr, block_sz))
+   return false;
+#endif
+
for (i = 0; i < scns_per_block; i++) {
pfn = PFN_DOWN(phys_addr);
if (!pfn_present(pfn))
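
The check added in is_fadump_boot_memory_area() is an interval-overlap test. A minimal user-space sketch of that idea follows; range_overlaps() is a hypothetical helper, and it treats both ranges as half-open, which is slightly stricter at the upper boundary than the `<=` comparison used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does [addr, addr + size) intersect
 * [area_start, area_end)? Both ranges are half-open. */
static bool range_overlaps(uint64_t addr, uint64_t size,
			   uint64_t area_start, uint64_t area_end)
{
	assert(addr + size >= addr);	/* no wrap-around in this sketch */
	return addr + size > area_start && addr < area_end;
}
```

For a boot memory area of [0, 0x40000000), a block that straddles the end of the area counts as overlapping, while a block that starts exactly at 0x40000000 does not.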



[PATCH 1/2] powerpc/fadump: avoid duplicates in crash memory ranges

2017-05-04 Thread Hari Bathini
fadump sets up crash memory ranges to be used for creating PT_LOAD
program headers in elfcore header. Memory chunk RMA_START through
boot memory area size is added as the first memory range because
firmware, at the time of crash, moves this memory chunk to different
location specified during fadump registration making it necessary to
create a separate program header for it with the correct offset.
This memory chunk is skipped while setting up the remaining memory
ranges. But currently, there is a possibility that some of this memory
may have duplicate entries, for example when it is hot-removed and added
again. Ensure that no two memory ranges represent the same memory.

When 5 lmbs are hot-removed and then hot-plugged before registering
fadump, here is how the program headers in /proc/vmcore exported by
fadump look

without this change:

  Program Headers:
Type   Offset VirtAddr   PhysAddr
   FileSizMemSiz  Flags  Align
NOTE   0x0001 0x 0x
   0x1894 0x1894 0
LOAD   0x00021020 0xc000 0x
   0x4000 0x4000  RWE0
LOAD   0x40031020 0xc000 0x
   0x1000 0x1000  RWE0
LOAD   0x5004 0xc0001000 0x1000
   0x5000 0x5000  RWE0
LOAD   0xa004 0xc0006000 0x6000
   0x00019ffe 0x00019ffe  RWE0

and with this change:

  Program Headers:
Type   Offset VirtAddr   PhysAddr
   FileSizMemSiz  Flags  Align
NOTE   0x0001 0x 0x
   0x1894 0x1894 0
LOAD   0x00021020 0xc000 0x
   0x4000 0x4000  RWE0
LOAD   0x4003 0xc0004000 0x4000
   0x2000 0x2000  RWE0
LOAD   0x6003 0xc0006000 0x6000
   0x00019ffe 0x00019ffe  RWE0

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/fadump.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 8ff0dd4..03563c6 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -844,8 +844,17 @@ static void fadump_setup_crash_memory_ranges(void)
for_each_memblock(memory, reg) {
start = (unsigned long long)reg->base;
end = start + (unsigned long long)reg->size;
-   if (start == RMA_START && end >= fw_dump.boot_memory_size)
-   start = fw_dump.boot_memory_size;
+
+   /*
+* skip the first memory chunk (RMA_START through
+* boot_memory_size) that is already added.
+*/
+   if (start < fw_dump.boot_memory_size && start >= RMA_START) {
+   if (end > fw_dump.boot_memory_size)
+   start = fw_dump.boot_memory_size;
+   else
+   continue;
+   }
 
/* add this range excluding the reserved dump area. */
fadump_exclude_reserved_area(start, end);
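
The new skip-or-clip logic in the loop can be exercised on its own with a sketch like the one below. clip_to_boot_area() is a hypothetical helper that mirrors the condition in the hunk above; rma_start and boot_size are passed as parameters here rather than read from RMA_START and fw_dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns false when the range lies entirely inside the boot memory
 * area and should be skipped; otherwise moves *start past that area
 * if the range begins inside it. */
static bool clip_to_boot_area(uint64_t *start, uint64_t end,
			      uint64_t rma_start, uint64_t boot_size)
{
	if (*start < boot_size && *start >= rma_start) {
		if (end > boot_size)
			*start = boot_size;
		else
			return false;
	}
	return true;
}
```

A range fully inside the boot area is dropped, a range that begins inside it and extends beyond it is trimmed to start at boot_size, and a range entirely above it is left untouched.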



Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations

2017-05-04 Thread Frederic Barrat



On 04/05/2017 at 08:41, Aneesh Kumar K.V wrote:
> Frederic Barrat  writes:
>
>> Introduce a new 'flags' attribute per context and define its first bit
>> to be a marker requiring all TLBIs for that context to be broadcast
>> globally. Once that marker is set on a context, it cannot be removed.
>>
>> Such a marker is useful for memory contexts used by devices behind the
>> NPU and CAPP/PSL. The NPU and the PSL keep their own
>> translation cache so they need to see all the TLBIs for those
>> contexts.
>
> Can we also switch existing cxl_ctx_in_use() to this?



That was my initial intent. But in the hash code, when calling the
tlbie, it seems that we no longer have the related context handy. Or did
I miss it? So it would require quite a few changes.

So I've just focused on fixing the problem for radix for the time being. That
being said, we'll have to update what we do for hash if we ever want to
support opencapi with hash/powervm (which seems like a strong
possibility for next year), as we could have more than one opencapi driver.


  Fred
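
The "set once, never cleared" marker described in the quoted patch description can be sketched in user space with a C11 atomic. The structure, flag name and function names below are hypothetical and are not the ones used in the RFC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CTX_FLAG_GLOBAL_TLBI 0x1u	/* hypothetical name for the first bit */

struct sketch_context {
	atomic_uint flags;
};

/* Set the marker; there is deliberately no function to clear it. */
static void ctx_require_global_tlbi(struct sketch_context *ctx)
{
	atomic_fetch_or(&ctx->flags, CTX_FLAG_GLOBAL_TLBI);
}

/* A local invalidation would only be allowed while this returns false. */
static bool ctx_needs_global_tlbi(struct sketch_context *ctx)
{
	return atomic_load(&ctx->flags) & CTX_FLAG_GLOBAL_TLBI;
}
```

Because the flag can only be set, a reader that sees it set can rely on it staying set for the lifetime of the context.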



Re: [PATCH v2] powerpc/kprobes: refactor kprobe_lookup_name for safer string operations

2017-05-04 Thread 'Naveen N. Rao'
On 2017/05/04 12:45PM, David Laight wrote:
> From: Naveen N. Rao [mailto:naveen.n@linux.vnet.ibm.com]
> > Sent: 04 May 2017 11:25
> > Use safer string manipulation functions when dealing with a
> > user-provided string in kprobe_lookup_name().
> > 
> > Reported-by: David Laight 
> > Signed-off-by: Naveen N. Rao 
> > ---
> > Changed to ignore return value of 0 from strscpy(), as suggested by
> > Masami.
> 
> Let's see what this code looks like:
> 
> > char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
> > bool dot_appended = false;
> > +   const char *c;
> > +   ssize_t ret = 0;
> > +   int len = 0;
> > +
> > +   if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
> 
> I don't like unnecessary assignments in conditionals.
> 
> > +   c++;
> > +   len = c - name;
> > +   memcpy(dot_name, name, len);
> > +   } else
> > +   c = name;
> > +
> > +   if (*c != '\0' && *c != '.') {
> > +   dot_name[len++] = '.';
> > dot_appended = true;
> 
> If you don't append a dot, then you can always just lookup
> the original string.
> 
> > }
> > +   ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
> > +   if (ret > 0)
> > +   addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
> 
> I'm not sure you need 'ret' here at all.
> 
> > +   /* Fallback to the original non-dot symbol lookup */
> > +   if (!addr && dot_appended)
> > addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
> 
> We can bikeshed this function to death:

Sure, though that was never the intent. For me, this was all about 
ensuring safe string manipulation. And I suppose my patch and your 
version both achieve that.

> 
>   /* The function name must start with a '.'.
>* If it doesn't then we insert one. */
>   c = strnchr(name, MODULE_NAME_LEN, ':');
>   if (c && c[1] && c[1] != '.') {
 ^^^
 while we're here, I might as well point out that that's 
 un-necessary and probably a few more things below... 
 ;-)

- Naveen

>   /* Insert a '.' after the ':' */
>   c++;
>   len = c - name;
>   memcpy(dot_name, name, len);
>   } else {
>   if (name[0] == '.')
>   goto check_name;
>   /* Insert a '.' before name */
>   c = name;
>   len = 0;
>   }
> 
>   dot_name[len++] = '.';
>   if (strscpy(dot_name + len, c, KSYM_NAME_LEN) > 0) {
>   addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
>   if (addr)
>   return addr;
>   }
>   /* Symbol with extra '.' not found, fallback to original name */
> 
> check_name:
>   return (kprobe_opcode_t *)kallsyms_lookup_name(name);
> 
>   David
> 
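
The dot-insertion rule discussed in this thread can be tried out in user space with a sketch like the following. make_dot_name() is a hypothetical helper; it uses strchr() and snprintf() because strnchr() and strscpy() are kernel-only, so it does not bound the module-name search the way the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space helper: copy "name" into "out", inserting a
 * '.' before the symbol part unless that part is empty or already
 * starts with one. Handles both "symbol" and "module:symbol".
 * Returns 0 on success, -1 if the result does not fit in "out". */
static int make_dot_name(char *out, size_t out_len, const char *name)
{
	const char *sym = strchr(name, ':');
	size_t prefix;
	int n;

	sym = sym ? sym + 1 : name;		/* symbol part after "module:" */
	prefix = (size_t)(sym - name);		/* length of "module:" or 0 */

	if (*sym != '\0' && *sym != '.')
		n = snprintf(out, out_len, "%.*s.%s", (int)prefix, name, sym);
	else
		n = snprintf(out, out_len, "%s", name);

	return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

A caller would look up the dotted name first and fall back to the original name if that lookup fails, which is the order both versions in the thread follow.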



RE: [PATCH v2] powerpc/kprobes: refactor kprobe_lookup_name for safer string operations

2017-05-04 Thread David Laight
From: Paul Clarke
> Sent: 04 May 2017 16:07
...
> > +   if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
> 
> Shouldn't this be MODULE_NAME_LEN + 1, since the ':' can come after a module
> name of length MODULE_NAME_LEN?

No, because MODULE_NAME_LEN includes the terminating '\0'.

David



Re: [PATCH v2] powerpc/kprobes: refactor kprobe_lookup_name for safer string operations

2017-05-04 Thread Paul Clarke
On 05/04/2017 05:24 AM, Naveen N. Rao wrote:
> Use safer string manipulation functions when dealing with a
> user-provided string in kprobe_lookup_name().
> 
> Reported-by: David Laight 
> Signed-off-by: Naveen N. Rao 
> ---
> Changed to ignore return value of 0 from strscpy(), as suggested by
> Masami.
> 
> - Naveen
> 
>  arch/powerpc/kernel/kprobes.c | 47 
> ++-
>  1 file changed, 20 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 160ae0fa7d0d..255d28d31ca1 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -53,7 +53,7 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
> 
>  kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
>  {
> - kprobe_opcode_t *addr;
> + kprobe_opcode_t *addr = NULL;
> 
>  #ifdef PPC64_ELF_ABI_v2
>   /* PPC64 ABIv2 needs local entry point */
> @@ -85,36 +85,29 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
>* Also handle  format.
>*/
>   char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
> - const char *modsym;
>   bool dot_appended = false;
> - if ((modsym = strchr(name, ':')) != NULL) {
> - modsym++;
> - if (*modsym != '\0' && *modsym != '.') {
> - /* Convert to  */
> - strncpy(dot_name, name, modsym - name);
> - dot_name[modsym - name] = '.';
> - dot_name[modsym - name + 1] = '\0';
> - strncat(dot_name, modsym,
> - sizeof(dot_name) - (modsym - name) - 2);
> - dot_appended = true;
> - } else {
> - dot_name[0] = '\0';
> - strncat(dot_name, name, sizeof(dot_name) - 1);
> - }
> - } else if (name[0] != '.') {
> - dot_name[0] = '.';
> - dot_name[1] = '\0';
> - strncat(dot_name, name, KSYM_NAME_LEN - 2);
> + const char *c;
> + ssize_t ret = 0;
> + int len = 0;
> +
> + if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {

Shouldn't this be MODULE_NAME_LEN + 1, since the ':' can come after a module 
name of length MODULE_NAME_LEN?

> + c++;
> + len = c - name;
> + memcpy(dot_name, name, len);
> + } else
> + c = name;
> +
> + if (*c != '\0' && *c != '.') {
> + dot_name[len++] = '.';
>   dot_appended = true;
> - } else {
> - dot_name[0] = '\0';
> - strncat(dot_name, name, KSYM_NAME_LEN - 1);
>   }

PC



Re: Freescale mpc8315 IRQ0 setup

2017-05-04 Thread Juergen Schindele
Am Dienstag, 2. Mai 2017, 22:29:34 schrieb Scott Wood:
> On Tue, 2017-05-02 at 14:43 +0200, Juergen Schindele wrote:
> > Dear Scott,
> > sorry for the delay, but I am not very familiar with the formatting.
> > I passed the patch through checkpatch.pl and there were no more errors.
> > Please find the patch in the attached file.
> > Thanks
> 
> Documentation/process/submitting-patches.rst explains the way to format and
> submit kernel patches.
> 
> Also, why the unrelated change to a print statement in ipic_set_irq_type()?
> 
> -Scott
The second diff is not completely unrelated: when I was investigating
the problem, I saw only the message "edge sense not supported", which
does not say which interrupt it is complaining about. So I added the
interrupt number to find out which one is the suspect.

Corrected patch
-- 
i. A.
Jürgen Schindele
Software Development

PSI Nentec GmbH
Greschbachstraße 12
76229 Karlsruhe
Germany
Phone: +49 721 94249-51
Fax: +49 721 94249-10
schind...@nentec.de
www.nentec.de

Management: Klaus Becker, Wolfgang Fischer
Registered office: Karlsruhe
Commercial register: Amtsgericht Mannheim HRB 107658

The information contained in this message is confidential or protected by law. 
If you are not the intended recipient, please contact the sender and delete 
this message. Any unauthorised copying of this message or unauthorised 
distribution of the information contained herein is prohibited.

[PATCH] configure "EDGE" capabilities for IRQ0 (like IRQ1-7)
Signed-off-by: Jurgen Schindele 

--- a/arch/powerpc/sysdev/ipic.c	2016-12-11 20:17:54.0 +0100
+++ b/arch/powerpc/sysdev/ipic.c	2017-04-04 15:28:11.201308780 +0200
@@ -315,6 +315,7 @@ static struct ipic_info ipic_info[] = {
 		.prio_mask = 7,
 	},
 	[48] = {
+		.ack	= IPIC_SEPNR,
 		.mask	= IPIC_SEMSR,
 		.prio	= IPIC_SMPRR_A,
 		.force	= IPIC_SEFCR,
@@ -617,7 +618,7 @@ static int ipic_set_irq_type(struct irq_
 	/* ipic supports only edge mode on external interrupts */
 	if ((flow_type & IRQ_TYPE_EDGE_FALLING) && !ipic_info[src].ack) {
 		printk(KERN_ERR "ipic: edge sense not supported on internal "
-"interrupts\n");
+"interrupts %d\n", src);
 		return -EINVAL;

 	}


[PATCH v8 09/10] powerpc/perf: Thread IMC PMU functions

2017-05-04 Thread Anju T Sudhakar
This patch adds the PMU functions required for event initialization,
read, update, add, del etc. for thread IMC PMU. Thread IMC PMUs are used
for per-task monitoring. 

For each CPU, a page of memory is allocated and kept static, i.e.,
these pages will exist until the machine shuts down. The base address of
this page is assigned to the ldbar of that cpu. As soon as we do that,
the thread IMC counters start running for that cpu and the data of these
counters is written to the allocated page. But we use this for
per-task monitoring. Whenever we start monitoring a task, the event is
added onto the task. At that point, we read the initial value of the
event. Whenever we stop monitoring the task, the final value is taken
and the difference is the event data.

Now, a task can move to a different cpu. Suppose a task X is moving from
cpu A to cpu B. When the task is scheduled out of A, we get an
event_del for A, and hence the event data is updated, and we stop
updating X's event data. As soon as X moves on to B, event_add is
called for B, and we again update the event data. This is how the
event data keeps being updated even when the task is scheduled on to
different cpus.
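
The per-task accounting described above can be illustrated with a small user-space sketch. Everything here is hypothetical: a plain array stands in for the per-CPU counter pages, and sketch_event_add() and sketch_event_del() only model the snapshot-and-delta bookkeeping, not the perf callbacks themselves.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* assumed CPU count for this sketch */

/* One free-running counter per CPU stands in for the per-CPU page. */
static uint64_t cpu_counter[SKETCH_NR_CPUS];

struct sketch_event {
	uint64_t prev_count;	/* counter value at the last snapshot */
	uint64_t count;		/* accumulated event data */
};

/* event_add: the task is scheduled in on "cpu"; take the initial value. */
static void sketch_event_add(struct sketch_event *ev, int cpu)
{
	ev->prev_count = cpu_counter[cpu];
}

/* event_del: the task is scheduled out of "cpu"; accumulate the delta. */
static void sketch_event_del(struct sketch_event *ev, int cpu)
{
	uint64_t now = cpu_counter[cpu];

	ev->count += now - ev->prev_count;
	ev->prev_count = now;
}
```

Because the snapshot is retaken on every add, counts that accumulated on cpu B while the task was still running on cpu A are not attributed to the task.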

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|   5 +
 arch/powerpc/perf/imc-pmu.c   | 209 +-
 arch/powerpc/platforms/powernv/opal-imc.c |   3 +
 3 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 6260e61..cc04712 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -42,6 +42,7 @@
  * IMC Core engine expects 8K bytes of memory for counter collection.
  */
 #define IMC_CORE_COUNTER_MEM   8192
+#define IMC_THREAD_COUNTER_MEM 8192
 
 /*
  *Compatbility macros for IMC devices
@@ -51,6 +52,9 @@
 #define IMC_DTB_CORE_COMPAT"ibm,imc-counters-core"
 #define IMC_DTB_THREAD_COMPAT  "ibm,imc-counters-thread"
 
+#define THREAD_IMC_LDBAR_MASK   0x0003e000
+#define THREAD_IMC_ENABLE   0x8000
+
 /*
  * Structure to hold per chip specific memory address
  * information for nest pmus. Nest Counter data are exported
@@ -110,4 +114,5 @@ extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
 extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
 void core_imc_disable(void);
+void thread_imc_disable(void);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 9767714..cfd112e 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -38,6 +38,9 @@ static u64 per_core_pdbar_add[IMC_MAX_CHIPS][IMC_MAX_CORES];
 static cpumask_t core_imc_cpumask;
 struct imc_pmu *core_imc_pmu;
 
+/* Maintains base address for all the cpus */
+static u64 per_cpu_add[NR_CPUS];
+
 /* Needed for sanity check */
 extern u64 nest_max_offset;
 extern u64 core_max_offset;
@@ -480,6 +483,56 @@ static int core_imc_event_init(struct perf_event *event)
return 0;
 }
 
+static int thread_imc_event_init(struct perf_event *event)
+{
+   struct task_struct *target;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   event->hw.idx = -1;
+
+   /* Sanity check for config (event offset) */
+   if (event->attr.config > thread_max_offset)
+   return -EINVAL;
+
+   target = event->hw.target;
+
+   if (!target)
+   return -EINVAL;
+
+   event->pmu->task_ctx_nr = perf_sw_context;
+   return 0;
+}
+
+static void thread_imc_read_counter(struct perf_event *event)
+{
+   u64 *addr, data;
+   int cpu_id = smp_processor_id();
+
+   addr = (u64 *)(per_cpu_add[cpu_id] + event->attr.config);
+   data = __be64_to_cpu(READ_ONCE(*addr));
+   local64_set(&event->hw.prev_count, data);
+}
+
+static void thread_imc_perf_event_update(struct perf_event *event)
+{
+   u64 counter_prev, counter_new, final_count, *addr;
+   int cpu_id = smp_processor_id();
+
+   addr = (u64 *)(per_cpu_add[cpu_id] + event->attr.config);
+   counter_prev = local64_read(&event->hw.prev_count);
+   counter_new = __be64_to_cpu(READ_ONCE(*addr));
+   final_count = counter_new - counter_prev;
+
+   local64_set(&event->hw.prev_count, counter_new);
+   local64_add(final_count, &event->count);
+}
+
 static void imc_read_counter(struct perf_event *event)
 {
u64 *addr, data;
@@ -720,6 +773,84 @@ static int core_imc_event_add(struct perf_event *event, int flags)
 }
 
 
+static void thread_imc_event_start(struct perf_event *event, 

[PATCH v8 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging

2017-05-04 Thread Anju T Sudhakar
From: Hemant Kumar 

This patch adds the PMU function to initialize a core IMC event. It also
adds a cpumask initialization function for the core IMC PMU. For
initialization, 8KB of memory is allocated per core where the data
for core IMC counters will be accumulated. The base address for this
page is sent to OPAL via an OPAL call which initializes various SCOMs
related to Core IMC initialization. Upon any errors, the pages are
freed and core IMC counters are disabled using the same OPAL call.

For CPU hotplugging, a cpumask is initialized which contains an online
CPU from each core. If a cpu goes offline, we check whether that cpu
belongs to the core imc cpumask; if it does, we migrate the PMU
context to any other online cpu (if available) in that core. If a cpu
comes back online, then this cpu will be added to the core imc cpumask
only if there was no other cpu from that core in the previous cpumask.
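
The hotplug rule above (one designated CPU per core) can be modelled with plain bitmasks. This is a sketch under stated assumptions: four threads per core, at most 32 CPUs, and hypothetical function names. The patch itself uses the kernel cpumask API and also migrates the PMU context, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define THREADS_PER_CORE 4	/* assumed SMT width for this sketch */

/* Bit n of each mask represents CPU n (at most 32 CPUs here). */
static uint32_t core_mask_of(int cpu)
{
	int first = cpu - cpu % THREADS_PER_CORE;

	return ((1u << THREADS_PER_CORE) - 1) << first;
}

/* CPU going offline: if it was the designated CPU of its core, hand the
 * role to another online CPU of the same core, if there is one. */
static void sketch_cpu_offline(uint32_t *online, uint32_t *designated, int cpu)
{
	uint32_t bit = 1u << cpu, others;

	*online &= ~bit;
	if (!(*designated & bit))
		return;
	*designated &= ~bit;
	others = *online & core_mask_of(cpu);
	if (others)
		*designated |= others & -others;	/* lowest set bit */
}

/* CPU coming online: designate it only if its core has no designated CPU. */
static void sketch_cpu_online(uint32_t *online, uint32_t *designated, int cpu)
{
	uint32_t bit = 1u << cpu;

	*online |= bit;
	if (!(*designated & core_mask_of(cpu)))
		*designated |= bit;
}
```

Picking the lowest-numbered remaining CPU is an arbitrary choice made for the sketch; the description above only requires "any other online cpu" in the core.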

To register the hotplug functions for core_imc, a new state
CPUHP_AP_PERF_POWERPC_COREIMC_ONLINE is added to the list of existing
states.

The patch also adds an OPAL device shutdown callback, which is needed to
disable the IMC core engine when handling kexec.

Signed-off-by: Hemant Kumar 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|   7 +
 arch/powerpc/perf/imc-pmu.c   | 380 +-
 arch/powerpc/platforms/powernv/opal-imc.c |   7 +
 include/linux/cpuhotplug.h|   1 +
 4 files changed, 384 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 37fdd79..bf5fb7c 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -24,6 +24,7 @@
  */
 #define IMC_MAX_CHIPS  32
 #define IMC_MAX_PMUS   32
+#define IMC_MAX_CORES  32
 
 /*
  * This macro is used for memory buffer allocation of
@@ -38,6 +39,11 @@
 #define IMC_NEST_MAX_PAGES 64
 
 /*
+ * IMC Core engine expects 8K bytes of memory for counter collection.
+ */
+#define IMC_CORE_COUNTER_MEM   8192
+
+/*
  *Compatbility macros for IMC devices
  */
 #define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
@@ -101,4 +107,5 @@ extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
 extern int __init init_imc_pmu(struct imc_events *events,int idx, struct 
imc_pmu *pmu_ptr);
+void core_imc_disable(void);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index c132df2..fb71825 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1,5 +1,5 @@
 /*
- * Nest Performance Monitor counter support.
+ * IMC Performance Monitor counter support.
  *
  * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
  *   (C) 2017 Anju T Sudhakar, IBM Corporation.
@@ -21,9 +21,21 @@ struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 static cpumask_t nest_imc_cpumask;
 
 static atomic_t nest_events;
+static atomic_t core_events;
 /* Used to avoid races in calling enable/disable nest-pmu units*/
 static DEFINE_MUTEX(imc_nest_reserve);
+/* Used to avoid races in calling enable/disable core-pmu units */
+static DEFINE_MUTEX(imc_core_reserve);
 
+/*
+ * Maintains base addresses for all the cores.
+ * MAX chip and core are defined as 32. So we
+ * statically allocate 8K for this structure.
+ *
+ * TODO -- Could be made dynamic
+ */
+static u64 per_core_pdbar_add[IMC_MAX_CHIPS][IMC_MAX_CORES];
+static cpumask_t core_imc_cpumask;
 struct imc_pmu *core_imc_pmu;
 
 /* Needed for sanity check */
@@ -46,9 +58,15 @@ static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
struct device_attribute *attr,
char *buf)
 {
+   struct pmu *pmu = dev_get_drvdata(dev);
cpumask_t *active_mask;
 
-   active_mask = &nest_imc_cpumask;
+   if (!strncmp(pmu->name, "nest_", strlen("nest_")))
+   active_mask = &nest_imc_cpumask;
+   else if (!strncmp(pmu->name, "core_", strlen("core_")))
+   active_mask = &core_imc_cpumask;
+   else
+   return 0;
return cpumap_print_to_pagebuf(true, buf, active_mask);
 }
 
@@ -64,6 +82,100 @@ static struct attribute_group imc_pmu_cpumask_attr_group = {
 };
 
 /*
+ * core_imc_mem_init : Initializes memory for the current core.
+ *
+ * Uses alloc_pages_exact_nid() and uses the returned address as an argument to
+ * an opal call to configure the pdbar. The address sent as an argument is
+ * converted to physical address before the opal call is made. This is the
+ * base address at which the core imc counters are populated.
+ */
+static int __meminit 

[PATCH v8 10/10] powerpc/perf: Thread imc cpuhotplug support

2017-05-04 Thread Anju T Sudhakar
This patch adds support for thread IMC on cpuhotplug.

When a cpu goes offline, the LDBAR for that cpu is disabled, and when it comes
back online the previous ldbar value is written back to the LDBAR for that cpu.
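
A rough userspace model of this save/restore behaviour is shown below.
It is a sketch under assumptions: the array sketch_ldbar[] stands in for
the per-CPU LDBAR register, and the mask and enable-bit values are
placeholders chosen for illustration, since the definitions of
THREAD_IMC_LDBAR_MASK and THREAD_IMC_ENABLE are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS      8
/* Placeholder values, assumed for illustration only. */
#define SKETCH_LDBAR_MASK   0x0003ffffffffe000ULL
#define SKETCH_LDBAR_ENABLE 0x8000000000000000ULL

/* Physical address of each CPU's counter buffer (kept across offline). */
static uint64_t sketch_counter_phys[SKETCH_NR_CPUS];
/* Simulated per-CPU LDBAR register. */
static uint64_t sketch_ldbar[SKETCH_NR_CPUS];

/* Combine the masked buffer address with the enable bit. */
static uint64_t sketch_ldbar_value(uint64_t phys)
{
	return (phys & SKETCH_LDBAR_MASK) | SKETCH_LDBAR_ENABLE;
}

/* Online: rewrite LDBAR from the address saved at allocation time. */
static int sketch_thread_imc_online(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	sketch_ldbar[cpu] = sketch_ldbar_value(sketch_counter_phys[cpu]);
	return 0;
}

/* Offline: clear LDBAR, which disables collection for that CPU. */
static int sketch_thread_imc_offline(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	sketch_ldbar[cpu] = 0;
	return 0;
}
```

The point of the model is that the buffer address survives the offline
step, so the online callback can rebuild the same register value without
allocating memory again.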

To register the hotplug functions for thread_imc, a new state
CPUHP_AP_PERF_POWERPC_THREADIMC_ONLINE is added to the list of existing
states.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/imc-pmu.c | 32 +++-
 include/linux/cpuhotplug.h  |  1 +
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cfd112e..f10489f 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -982,6 +982,16 @@ static void cleanup_all_thread_imc_memory(void)
on_each_cpu(cleanup_thread_imc_memory, NULL, 1);
 }
 
+static void thread_imc_update_ldbar(unsigned int cpu_id)
+{
+   u64 ldbar_addr, ldbar_value;
+
+   ldbar_addr = (u64)virt_to_phys((void *)per_cpu_add[cpu_id]);
+   ldbar_value = (ldbar_addr & (u64)THREAD_IMC_LDBAR_MASK) |
+   (u64)THREAD_IMC_ENABLE;
+   mtspr(SPRN_LDBAR, ldbar_value);
+}
+
 /*
  * Allocates a page of memory for each of the online cpus, and, writes the
  * physical base address of that page to the LDBAR for that cpu. This starts
@@ -989,21 +999,33 @@ static void cleanup_all_thread_imc_memory(void)
  */
 static void thread_imc_mem_alloc(void *dummy)
 {
-   u64 ldbar_addr, ldbar_value;
int cpu_id = smp_processor_id();
int phys_id = topology_physical_package_id(smp_processor_id());
 
per_cpu_add[cpu_id] = (u64)alloc_pages_exact_nid(phys_id,
(size_t)IMC_THREAD_COUNTER_MEM, GFP_KERNEL | 
__GFP_ZERO);
-   ldbar_addr = (u64)virt_to_phys((void *)per_cpu_add[cpu_id]);
-   ldbar_value = (ldbar_addr & (u64)THREAD_IMC_LDBAR_MASK) |
-   (u64)THREAD_IMC_ENABLE;
-   mtspr(SPRN_LDBAR, ldbar_value);
+   thread_imc_update_ldbar(cpu_id);
+}
+
+static int ppc_thread_imc_cpu_online(unsigned int cpu)
+{
+   thread_imc_update_ldbar(cpu);
+   return 0;
 }
 
+static int ppc_thread_imc_cpu_offline(unsigned int cpu)
+{
+   mtspr(SPRN_LDBAR, 0);
+   return 0;
+ }
+
 void thread_imc_cpu_init(void)
 {
on_each_cpu(thread_imc_mem_alloc, NULL, 1);
+   cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_THREADIMC_ONLINE,
+   "POWER_THREAD_IMC_ONLINE",
+   ppc_thread_imc_cpu_online,
+   ppc_thread_imc_cpu_offline);
 }
 
 /*
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e7b7712..bbec927 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -139,6 +139,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
CPUHP_AP_PERF_POWERPC_COREIMC_ONLINE,
+   CPUHP_AP_PERF_POWERPC_THREADIMC_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
CPUHP_AP_ONLINE_DYN,
-- 
2.7.4



[PATCH v8 06/10] powerpc/powernv: Core IMC events detection

2017-05-04 Thread Anju T Sudhakar
From: Hemant Kumar 

This patch adds support for detection of core IMC events along with the
Nest IMC events. It adds a new domain, IMC_DOMAIN_CORE, which is
determined with the help of the compatibility string
"ibm,imc-counters-core" in the IMC device tree.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|  4 +++-
 arch/powerpc/perf/imc-pmu.c   |  3 +++
 arch/powerpc/platforms/powernv/opal-imc.c | 28 +---
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 1478d0f..37fdd79 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -42,6 +42,7 @@
  */
 #define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
 #define IMC_DTB_NEST_COMPAT"ibm,imc-counters-nest"
+#define IMC_DTB_CORE_COMPAT"ibm,imc-counters-core"
 
 /*
  * Structure to hold per chip specific memory address
@@ -90,13 +91,14 @@ struct imc_pmu {
  * Domains for IMC PMUs
  */
 #define IMC_DOMAIN_NEST1
+#define IMC_DOMAIN_CORE2
 #define IMC_DOMAIN_UNKNOWN -1
 
 #define IMC_COUNTER_ENABLE 1
 #define IMC_COUNTER_DISABLE0
 
-
 extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern struct imc_pmu *core_imc_pmu;
 extern int __init init_imc_pmu(struct imc_events *events,int idx, struct 
imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 40792424..c132df2 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -24,8 +24,11 @@ static atomic_t nest_events;
 /* Used to avoid races in calling enable/disable nest-pmu units*/
 static DEFINE_MUTEX(imc_nest_reserve);
 
+struct imc_pmu *core_imc_pmu;
+
 /* Needed for sanity check */
 extern u64 nest_max_offset;
+extern u64 core_max_offset;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 static struct attribute *imc_format_attrs[] = {
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 61f6d67..d712ef3 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -34,6 +34,7 @@
 #include 
 
 u64 nest_max_offset;
+u64 core_max_offset;
 
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
@@ -114,6 +115,10 @@ static void update_max_value(u32 value, int pmu_domain)
if (nest_max_offset < value)
nest_max_offset = value;
break;
+   case IMC_DOMAIN_CORE:
+   if (core_max_offset < value)
+   core_max_offset = value;
+   break;
default:
/* Unknown domain, return */
return;
@@ -357,7 +362,7 @@ static struct imc_events *imc_events_setup(struct 
device_node *parent,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit and a
  *  pmu_index as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu. Calls imc_events_node_parser()
  * to setup the individual events.
@@ -386,7 +391,10 @@ static int imc_pmu_create(struct device_node *parent, int 
pmu_index, int domain)
goto free_pmu;
 
/* Needed for hotplug/migration */
-   per_nest_pmu_arr[pmu_index] = pmu_ptr;
+   if (pmu_ptr->domain == IMC_DOMAIN_CORE)
+   core_imc_pmu = pmu_ptr;
+   else if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+   per_nest_pmu_arr[pmu_index] = pmu_ptr;
 
pp = of_find_property(parent, "name", NULL);
if (!pp) {
@@ -407,7 +415,10 @@ static int imc_pmu_create(struct device_node *parent, int 
pmu_index, int domain)
goto free_pmu;
}
/* Save the name to register it later */
-   sprintf(buf, "nest_%s", (char *)pp->value);
+   if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+   sprintf(buf, "nest_%s", (char *)pp->value);
+   else
+   sprintf(buf, "%s_imc", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
 
/*
@@ -461,6 +472,17 @@ static void __init imc_pmu_setup(struct device_node 
*parent)
return;
pmu_count++;
}
+   /*
+* Loop through the imc-counters tree for each compatible
+* "ibm,imc-counters-core", and update "struct imc_pmu".
+*/
+   for_each_compatible_node(child, NULL, IMC_DTB_CORE_COMPAT) {
+   domain = 

[PATCH v8 08/10] powerpc/powernv: Thread IMC events detection

2017-05-04 Thread Anju T Sudhakar
This patch adds support for detection of thread IMC events. It adds a
new domain, IMC_DOMAIN_THREAD, which is determined with the help of the
compatibility string "ibm,imc-counters-thread" in the IMC device
tree.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|  2 ++
 arch/powerpc/perf/imc-pmu.c   |  1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 18 +-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index bf5fb7c..6260e61 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -49,6 +49,7 @@
 #define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
 #define IMC_DTB_NEST_COMPAT"ibm,imc-counters-nest"
 #define IMC_DTB_CORE_COMPAT"ibm,imc-counters-core"
+#define IMC_DTB_THREAD_COMPAT  "ibm,imc-counters-thread"
 
 /*
  * Structure to hold per chip specific memory address
@@ -98,6 +99,7 @@ struct imc_pmu {
  */
 #define IMC_DOMAIN_NEST1
 #define IMC_DOMAIN_CORE2
+#define IMC_DOMAIN_THREAD  3
 #define IMC_DOMAIN_UNKNOWN -1
 
 #define IMC_COUNTER_ENABLE 1
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index fb71825..9767714 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -41,6 +41,7 @@ struct imc_pmu *core_imc_pmu;
 /* Needed for sanity check */
 extern u64 nest_max_offset;
 extern u64 core_max_offset;
+extern u64 thread_max_offset;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 static struct attribute *imc_format_attrs[] = {
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 23507d7..940f6b9 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -35,6 +35,7 @@
 
 u64 nest_max_offset;
 u64 core_max_offset;
+u64 thread_max_offset;
 
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
@@ -119,6 +120,10 @@ static void update_max_value(u32 value, int pmu_domain)
if (core_max_offset < value)
core_max_offset = value;
break;
+   case IMC_DOMAIN_THREAD:
+   if (thread_max_offset < value)
+   thread_max_offset = value;
+   break;
default:
/* Unknown domain, return */
return;
@@ -362,7 +367,7 @@ static struct imc_events *imc_events_setup(struct 
device_node *parent,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit and a
  *  pmu_index as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE/THREAD), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu. Calls imc_events_node_parser()
  * to setup the individual events.
@@ -483,6 +488,17 @@ static void __init imc_pmu_setup(struct device_node 
*parent)
return;
pmu_count++;
}
+   /*
+* Loop through the imc-counters tree for each compatible
+* "ibm,imc-counters-thread", and update "struct imc_pmu".
+*/
+   for_each_compatible_node(child, NULL, IMC_DTB_THREAD_COMPAT) {
+   domain = IMC_DOMAIN_THREAD;
+   rc = imc_pmu_create(child, pmu_count, domain);
+   if (rc)
+   return;
+   pmu_count++;
+   }
 }
 
 static int opal_imc_counters_probe(struct platform_device *pdev)
-- 
2.7.4



[PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-04 Thread Anju T Sudhakar
Adds a cpumask attribute to be used by each IMC pmu. For nest PMUs, only
one CPU (any online CPU) from each chip is designated to read counters.

On CPU hotplug, the dying CPU is checked to see whether it is one of the
designated CPUs; if it is, the next online CPU from the same chip (for
nest units) is designated as the new CPU to read counters. For this
purpose, we introduce a new state: CPUHP_AP_PERF_POWERPC_NEST_ONLINE.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h |   4 +
 arch/powerpc/include/asm/opal-api.h|  12 +-
 arch/powerpc/include/asm/opal.h|   4 +
 arch/powerpc/perf/imc-pmu.c| 248 -
 arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
 include/linux/cpuhotplug.h |   1 +
 6 files changed, 266 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 6bbe184..1478d0f 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -92,6 +92,10 @@ struct imc_pmu {
 #define IMC_DOMAIN_NEST1
 #define IMC_DOMAIN_UNKNOWN -1
 
+#define IMC_COUNTER_ENABLE 1
+#define IMC_COUNTER_DISABLE0
+
+
 extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern int __init init_imc_pmu(struct imc_events *events,int idx, struct 
imc_pmu *pmu_ptr);
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index a0aa285..ce863d9 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -168,7 +168,10 @@
 #define OPAL_INT_SET_MFRR  125
 #define OPAL_PCI_TCE_KILL  126
 #define OPAL_NMMU_SET_PTCR 127
-#define OPAL_LAST  127
+#define OPAL_IMC_COUNTERS_INIT 149
+#define OPAL_IMC_COUNTERS_START150
+#define OPAL_IMC_COUNTERS_STOP 151
+#define OPAL_LAST  151
 
 /* Device tree flags */
 
@@ -928,6 +931,13 @@ enum {
OPAL_PCI_TCE_KILL_ALL,
 };
 
+/* Argument to OPAL_IMC_COUNTERS_*  */
+enum {
+   OPAL_IMC_COUNTERS_NEST = 1,
+   OPAL_IMC_COUNTERS_CORE = 2,
+   OPAL_IMC_COUNTERS_THREAD = 3,
+};
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_API_H */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 1ff03a6..9c16ec6 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t 
kill_type,
  uint64_t dma_addr, uint32_t npages);
 int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
 
+int64_t opal_imc_counters_init(uint32_t type, uint64_t address);
+int64_t opal_imc_counters_start(uint32_t type);
+int64_t opal_imc_counters_stop(uint32_t type);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f09a37a..40792424 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -18,6 +18,11 @@
 
 struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static cpumask_t nest_imc_cpumask;
+
+static atomic_t nest_events;
+/* Used to avoid races in calling enable/disable nest-pmu units*/
+static DEFINE_MUTEX(imc_nest_reserve);
 
 /* Needed for sanity check */
 extern u64 nest_max_offset;
@@ -33,6 +38,160 @@ static struct attribute_group imc_format_group = {
.attrs = imc_format_attrs,
 };
 
+/* Get the cpumask printed to a buffer "buf" */
+static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   cpumask_t *active_mask;
+
+   active_mask = &nest_imc_cpumask;
+   return cpumap_print_to_pagebuf(true, buf, active_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *imc_pmu_cpumask_attrs[] = {
+   &dev_attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group imc_pmu_cpumask_attr_group = {
+   .attrs = imc_pmu_cpumask_attrs,
+};
+
+/*
+ * nest_init : Initializes the nest imc engine for the current chip.
+ * by default the nest engine is disabled.
+ */
+static void nest_init(int *cpu_opal_rc)
+{
+   int rc;
+
+   /*
+* OPAL figures out which CPU to start based on the CPU that is
+* currently running when we call into OPAL
+*/
+   rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
+   

[PATCH v8 04/10] powerpc/perf: Add generic IMC pmu group and event functions

2017-05-04 Thread Anju T Sudhakar
From: Hemant Kumar 

Device tree IMC driver code parses the IMC units and their events. It
passes the information to IMC pmu code which is placed in powerpc/perf
as "imc-pmu.c".

Patch adds a set of generic imc pmu related event functions to be
used by each imc pmu unit. Add code to set up the format attribute and to
register imc pmus. Add an event_init function for nest_imc events.

Since the IMC counters' data are periodically fed to a memory location,
the functions to read/update, start/stop, add/del can be generic and can
be used by all IMC PMU units.
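
Because the counters are free running and live in memory, the generic
read/update path reduces to "snapshot on start, accumulate the delta on
update". The following userspace sketch shows that pattern only; the
struct and all sketch_* names are hypothetical, and it ignores details
the kernel code has to handle, such as byte order of the counter data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-event state kept by perf. */
struct sketch_event {
	const volatile uint64_t *counter; /* memory the hardware updates */
	uint64_t prev;                    /* last snapshot of the counter */
	uint64_t count;                   /* accumulated delta for the user */
};

/* Start: remember the current counter value as the baseline. */
static void sketch_event_start(struct sketch_event *ev)
{
	assert(ev && ev->counter);
	ev->prev = *ev->counter;
}

/* Update: add what elapsed since the last snapshot, then re-snapshot. */
static void sketch_event_update(struct sketch_event *ev)
{
	uint64_t now;

	assert(ev && ev->counter);
	now = *ev->counter;
	ev->count += now - ev->prev;
	ev->prev = now;
}

/* Stop: fold in the final delta; the hardware counter keeps running. */
static void sketch_event_stop(struct sketch_event *ev)
{
	sketch_event_update(ev);
}
```

Nothing in this pattern depends on which unit produced the counter,
which is why the same functions can serve every IMC PMU.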

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h|   3 +
 arch/powerpc/perf/Makefile|   3 +
 arch/powerpc/perf/imc-pmu.c   | 269 ++
 arch/powerpc/platforms/powernv/opal-imc.c |  10 +-
 4 files changed, 283 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/perf/imc-pmu.c

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index d0193c8..6bbe184 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -92,4 +92,7 @@ struct imc_pmu {
 #define IMC_DOMAIN_NEST1
 #define IMC_DOMAIN_UNKNOWN -1
 
+extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern int __init init_imc_pmu(struct imc_events *events,int idx, struct 
imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 4d606b9..b29d918 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -6,6 +6,9 @@ obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += power4-pmu.o ppc970-pmu.o power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
   isa207-common.o power8-pmu.o power9-pmu.o
+
+obj-$(CONFIG_HV_PERF_IMC_CTRS) += imc-pmu.o
+
 obj32-$(CONFIG_PPC_PERF_CTRS)  += mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
new file mode 100644
index 000..f09a37a
--- /dev/null
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -0,0 +1,269 @@
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *   (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *   (C) 2017 Hemant K Shaw, IBM Corporation.
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+/* Needed for sanity check */
+extern u64 nest_max_offset;
+
+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *imc_format_attrs[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+static struct attribute_group imc_format_group = {
+   .name = "format",
+   .attrs = imc_format_attrs,
+};
+
+static int nest_imc_event_init(struct perf_event *event)
+{
+   int chip_id;
+   u32 config = event->attr.config;
+   struct perchip_nest_info *pcni;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest)
+   return -EINVAL;
+
+   if (event->cpu < 0)
+   return -EINVAL;
+
+   /* Sanity check for config (event offset) */
+   if (config > nest_max_offset)
+   return -EINVAL;
+
+   chip_id = topology_physical_package_id(event->cpu);
+   pcni = &nest_perchip_info[chip_id];
+
+   /*
+* Memory for Nest HW counter data could be in multiple pages.
+* Hence check and pick the right event base page for chip with
+* "chip_id" and add "config" to it.
+*/
+   event->hw.event_base = pcni->vbase[config/PAGE_SIZE] +
+   (config & ~PAGE_MASK);
+
+   return 0;
+}
+
+static void imc_read_counter(struct perf_event *event)
+{
+   u64 *addr, data;
+
+   /*
+* In-Memory Collection (IMC) counters are free flowing counters.
+* So we take a snapshot of the counter value on enable and save it
+   

[PATCH v8 03/10] powerpc/powernv: Detect supported IMC units and its events

2017-05-04 Thread Anju T Sudhakar
Parse device tree to detect IMC units. Traverse through each IMC unit
node to find supported events and corresponding unit/scale files (if any).

Here is the DTS file for reference:


https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts

The device tree for IMC counters starts at the node "imc-counters".
This node contains all the IMC PMU nodes and event nodes
for these IMC PMUs. The PMU nodes have an "events" property which has a
phandle value for the actual events node. The events are separated from
the PMU nodes to abstract out the common events. For example, PMU nodes
"mcs0", "mcs1" etc. will contain a pointer to "nest-mcs-events", since
the events are common between these PMUs. These events have a different
prefix based on their relation to different PMUs, and hence the PMU
nodes themselves contain an "events-prefix" property. The value of this
property, concatenated with the event name, forms the actual event
name. Also, each PMU has a "reg" field as the base offset for the events
which belong to this PMU. This "reg" field is added to the event's "reg"
field in the "events" node, which gives us the location of the counter
data. Kernel code uses this offset as the event configuration value.
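
The two combinations described above (prefix plus event name, and PMU
"reg" plus event "reg") can be sketched as follows. This is a userspace
illustration with hypothetical helper names and made-up example values;
only the "event=0x%x" format and the 96-byte buffer size are taken from
the patches in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_LEN 96 /* mirrors IMC_MAX_NAME_VAL_LEN */

/*
 * Build the final event name as "events-prefix" followed by the event
 * name. Returns 0 on success, -1 if the result does not fit.
 */
static int sketch_event_name(char *out, size_t len,
			     const char *prefix, const char *event)
{
	int n;

	assert(out && prefix && event);
	n = snprintf(out, len, "%s%s", prefix, event);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Build the event value string from the PMU base offset plus the
 * event's own offset. Returns 0 on success, -1 if it does not fit.
 */
static int sketch_event_value(char *out, size_t len,
			      uint32_t pmu_reg, uint32_t event_reg)
{
	int n;

	assert(out);
	n = snprintf(out, len, "event=0x%x",
		     (unsigned int)(pmu_reg + event_reg));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For instance, with a made-up prefix "PM_MCS0_", event name "READ", PMU
offset 0x118 and event offset 0x8, the helpers would produce the name
"PM_MCS0_READ" and the value "event=0x120".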

Device tree parser code also looks for the scale/unit property in the
event node and passes on the value as an event attr for the perf
interface to use in post processing by the perf tool. Some PMUs may have
common scale and unit properties, which implies that all events supported
by such a PMU inherit the scale and unit properties of the PMU itself.
For those events, we need to set the common unit and scale values.

On failure to initialize any unit or event, disable that unit and
continue setting up the rest of them.

Signed-off-by: Hemant Kumar 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/platforms/powernv/opal-imc.c | 413 ++
 1 file changed, 413 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 3a87000..0ddaf7d 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -33,15 +33,428 @@
 #include 
 #include 
 
+u64 nest_max_offset;
 struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+static int imc_event_prop_update(char *name, struct imc_events *events)
+{
+   char *buf;
+
+   if (!events || !name)
+   return -EINVAL;
+
+   /* memory for content */
+   buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   events->ev_name = name;
+   events->ev_value = buf;
+   return 0;
+}
+
+static int imc_event_prop_str(struct property *pp, char *name,
+ struct imc_events *events)
+{
+   int ret;
+
+   ret = imc_event_prop_update(name, events);
+   if (ret)
+   return ret;
+
+   if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+  (pp->length > IMC_MAX_NAME_VAL_LEN))
+   return -EINVAL;
+   strncpy(events->ev_value, (const char *)pp->value, pp->length);
+
+   return 0;
+}
+
+static int imc_event_prop_val(char *name, u32 val,
+ struct imc_events *events)
+{
+   int ret;
+
+   ret = imc_event_prop_update(name, events);
+   if (ret)
+   return ret;
+   snprintf(events->ev_value, IMC_MAX_NAME_VAL_LEN, "event=0x%x", val);
+
+   return 0;
+}
+
+static int set_event_property(struct property *pp, char *event_prop,
+ struct imc_events *events, char *ev_name)
+{
+   char *buf;
+   int ret;
+
+   buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   sprintf(buf, "%s.%s", ev_name, event_prop);
+   ret = imc_event_prop_str(pp, buf, events);
+   if (ret) {
+   if (events->ev_name)
+   kfree(events->ev_name);
+   if (events->ev_value)
+   kfree(events->ev_value);
+   }
+   return ret;
+}
+
+/*
+ * Updates the maximum offset for an event in the pmu with domain
+ * "pmu_domain".
+ */
+static void update_max_value(u32 value, int pmu_domain)
+{
+   switch (pmu_domain) {
+   case IMC_DOMAIN_NEST:
+   if (nest_max_offset < value)
+   nest_max_offset = value;
+   break;
+   default:
+   /* Unknown domain, return */
+   return;
+   }
+}
+
+/*
+ * imc_events_node_parser: Parse the event node "dev" and assign the parsed
+ * information to event "events".
+ *
+ * Parses the "reg", "scale" and "unit" properties of this event.
+ * "reg" gives us the event offset in the counter memory.
+ */

[PATCH v8 01/10] powerpc/powernv: Data structure and macros definitions for IMC

2017-05-04 Thread Anju T Sudhakar
From: Hemant Kumar 

Create a new header file to add the data structures and
macros needed for In-Memory Collection (IMC) counter support.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/imc-pmu.h | 95 ++
 1 file changed, 95 insertions(+)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index 000..d0193c8
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,95 @@
+#ifndef PPC_POWERNV_IMC_PMU_DEF_H
+#define PPC_POWERNV_IMC_PMU_DEF_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *   (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *   (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * For static allocation of some of the structures.
+ */
+#define IMC_MAX_CHIPS  32
+#define IMC_MAX_PMUS   32
+
+/*
+ * This macro is used for memory buffer allocation of
+ * event names and event string
+ */
+#define IMC_MAX_NAME_VAL_LEN   96
+
+/*
+ * Currently Microcode supports a max of 256KB of counter memory
+ * in the reserved memory region. Max pages to mmap (considering 4K PAGESIZE).
+ */
+#define IMC_NEST_MAX_PAGES 64
+
+/*
+ *Compatbility macros for IMC devices
+ */
+#define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
+#define IMC_DTB_NEST_COMPAT"ibm,imc-counters-nest"
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+   u32 chip_id;
+   u64 pbase;
+   u64 vbase[IMC_NEST_MAX_PAGES];
+   u64 size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+   char *ev_name;
+   char *ev_value;
+};
+
+#define IMC_FORMAT_ATTR0
+#define IMC_CPUMASK_ATTR   1
+#define IMC_EVENT_ATTR 2
+#define IMC_NULL_ATTR  3
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will
+ * hold the pmu functions and attrs for each imc pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct imc_pmu {
+   struct pmu pmu;
+   int domain;
+   /*
+* Attribute groups for the PMU. Slot 0 used for
+* format attribute, slot 1 used for cpusmask attribute,
+* slot 2 used for event attribute. Slot 3 keep as
+* NULL.
+*/
+   const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST1
+#define IMC_DOMAIN_UNKNOWN -1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
-- 
2.7.4



[PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module

2017-05-04 Thread Anju T Sudhakar
This patch does three things:
 - Enables "opal.c" to create a platform device for the IMC interface
   according to the appropriate compatibility string.
 - Finds the reserved-memory region details from the system device tree
   and gets the base address of the HOMER (reserved memory) region for
   each chip.
 - Gets the Nest PMU counter data offsets (in the HOMER region)
   and their sizes. The offsets for the counters' data are fixed and
   won't change from chip to chip.

The device tree parsing logic is separated from the PMU creation
functions (which are added in subsequent patches).

The patch also adds a CONFIG_HV_PERF_IMC_CTRS option for the IMC driver.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/platforms/powernv/Kconfig|  10 +++
 arch/powerpc/platforms/powernv/Makefile   |   1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 140 ++
 arch/powerpc/platforms/powernv/opal.c |  18 
 4 files changed, 169 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 3a07e4d..1b90a98 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -27,3 +27,13 @@ config OPAL_PRD
help
  This enables the opal-prd driver, a facility to run processor
  recovery diagnostics on OpenPower machines
+
+config HV_PERF_IMC_CTRS
+   bool "Hypervisor supplied In Memory Collection PMU events (Nest & Core)"
+   default y
+   depends on PERF_EVENTS && PPC_POWERNV
+   help
+ Enable access to hypervisor supplied in-memory collection counters
+ in perf. IMC counters are available from Power9 systems.
+
+  If unsure, select Y.
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..715e531 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)   += opal-memory-errors.o
 obj-$(CONFIG_TRACEPOINTS)  += opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
+obj-$(CONFIG_HV_PERF_IMC_CTRS) += opal-imc.o
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
new file mode 100644
index 000..3a87000
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -0,0 +1,140 @@
+/*
+ * OPAL IMC interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright   (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ * (C) 2017 Anju T Sudhakar, IBM Corporation.
+ * (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+
+/*
+ * imc_pmu_setup : Setup the IMC PMUs (children of "parent").
+ */
+static void __init imc_pmu_setup(struct device_node *parent)
+{
+   if (!parent)
+   return;
+}
+
+static int opal_imc_counters_probe(struct platform_device *pdev)
+{
+   struct device_node *imc_dev, *dn, *rm_node = NULL;
+   struct perchip_nest_info *pcni;
+   u32 pages, nest_offset, nest_size, chip_id;
+   int i = 0;
+   const __be32 *addrp;
+   u64 reg_addr, reg_size;
+
+   if (!pdev || !pdev->dev.of_node)
+   return -ENODEV;
+
+   /*
+* Check whether this is kdump kernel. If yes, just return.
+*/
+   if (is_kdump_kernel())
+   return -ENODEV;
+
+   imc_dev = pdev->dev.of_node;
+
+   /*
+* Nest counter data are saved in a reserved memory called HOMER.
+* "imc-nest-offset" identifies the counter data location within HOMER.
+* size : size of the entire nest-counters region
+*/
+   if (of_property_read_u32(imc_dev, "imc-nest-offset", &nest_offset))
+   goto err;
+
+   if (of_property_read_u32(imc_dev, "imc-nest-size", &nest_size))
+   goto err;
+
+   /* Sanity check */
+   if ((nest_size/PAGE_SIZE) > IMC_NEST_MAX_PAGES)
+   goto err;
+
+   /* Find the "HOMER region" for each chip */
+   rm_node = 

[PATCH v8 00/10] IMC Instrumentation Support

2017-05-04 Thread Anju T Sudhakar
Power9 has In-Memory-Collection (IMC) infrastructure which contains
various Performance Monitoring Units (PMUs) at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by a Nest IMC microcode which runs
in the OCC (On-Chip Controller) complex. The microcode collects the
counter data and moves the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core
level PMU counters give us the IMC counters' data per core and thread
level PMU counters give us the IMC counters' data per CPU thread.

This patchset enables the nest IMC, core IMC and thread IMC
PMUs and is based on the initial work done by Madhavan Srinivasan.
"Nest Instrumentation Support" :
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html

v1 for this patchset can be found here :
https://lwn.net/Articles/705475/

Nest events:
Per-chip nest instrumentation provides various per-chip metrics
such as memory, powerbus, Xlink and Alink bandwidth.

Core events:
Per-core IMC instrumentation provides various per-core metrics
such as non-idle cycles, non-idle instructions, various cache and
memory related metrics etc.

Thread events:
Thread-level events are the same as core-level events; only the domain
differs. These are per-CPU metrics.

PMU Events' Information:
OPAL obtains the IMC PMU and event information from the IMC Catalog
and passes it on to the kernel via the device tree. The events' information
contains :
 - Event name
 - Event Offset
 - Event description
and, maybe :
 - Event scale
 - Event unit

Some PMUs may have common scale and unit values for all their
supported events. In those cases, the events inherit the scale and unit
properties from the PMU.

The event offset in the memory is where the counter data gets
accumulated.
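
To make the inheritance rule concrete, a parser could resolve an event's
scale and unit roughly as below. This is only a user-space sketch under
assumed names: imc_pmu_desc, imc_event_desc, imc_event_scale() and
imc_event_unit() are hypothetical and do not come from this patchset.

```c
#include <stddef.h>

/* Hypothetical descriptors; not the structures used by the patchset. */
struct imc_pmu_desc {
	const char *scale;	/* PMU-wide scale, or NULL if absent */
	const char *unit;	/* PMU-wide unit, or NULL if absent */
};

struct imc_event_desc {
	const char *name;
	unsigned int offset;	/* where the counter data accumulates */
	const char *scale;	/* per-event scale, or NULL if absent */
	const char *unit;	/* per-event unit, or NULL if absent */
};

/* An event's own scale wins; otherwise it is inherited from the PMU. */
static const char *imc_event_scale(const struct imc_pmu_desc *pmu,
				   const struct imc_event_desc *ev)
{
	return ev->scale ? ev->scale : pmu->scale;
}

/* Same rule for the unit. */
static const char *imc_event_unit(const struct imc_pmu_desc *pmu,
				  const struct imc_event_desc *ev)
{
	return ev->unit ? ev->unit : pmu->unit;
}
```

An event that carries neither property simply reports whatever the PMU
node provides, which may itself be absent (NULL in this sketch).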

The OPAL-side patches are posted upstream :
https://lists.ozlabs.org/pipermail/skiboot/2017-May/007167.html

The kernel discovers the IMC counters information in the device tree
at the "imc-counters" device node which has a compatible field
"ibm,opal-in-memory-counters".

Parsing of the Events' information:
To parse the IMC PMUs and events information, the kernel has to
discover the "imc-counters" node and walk through the pmu and event
nodes.

Here is an excerpt of the dt showing the imc-counters with
mcs0 (nest), core and thread node:
https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts


/dts-v1/;

[...]

/dts-v1/;

/ {
name = "";
compatible = "ibm,opal-in-memory-counters";
#address-cells = <0x1>;
#size-cells = <0x1>;
imc-nest-offset = <0x32>;
imc-nest-size = <0x3>;
version-id = "";

NEST_MCS: nest-mcs-events {
#address-cells = <0x1>;
#size-cells = <0x1>;

event@0 {
event-name = "RRTO_QFULL_NO_DISP" ;
reg = <0x0 0x8>;
desc = "RRTO not dispatched in MCS0 due to capacity - 
pulses once for each time a valid RRTO op is not dispatched due to a command 
list full condition" ;
};
event@8 {
event-name = "WRTO_QFULL_NO_DISP" ;
reg = <0x8 0x8>;
desc = "WRTO not dispatched in MCS0 due to capacity - 
pulses once for each time a valid WRTO op is not dispatched due to a command 
list full condition" ;
};
[...]
mcs0 {
compatible = "ibm,imc-counters-nest";
events-prefix = "PM_MCS0_";
unit = "";
scale = "";
reg = <0x118 0x8>;
events = < &NEST_MCS >;
};

mcs1 {
compatible = "ibm,imc-counters-nest";
events-prefix = "PM_MCS1_";
unit = "";
scale = "";
reg = <0x198 0x8>;
events = < &NEST_MCS >;
};
[...]
CORE_EVENTS: core-events {
#address-cells = <0x1>;
#size-cells = <0x1>;

event@e0 {
event-name = "0THRD_NON_IDLE_PCYC" ;
reg = <0xe0 0x8>;
desc = "The number of processor cycles when all threads 
are idle" ;
};
event@120 {
event-name = "1THRD_NON_IDLE_PCYC" ;
reg = <0x120 0x8>;
desc = "The number of processor cycles when exactly one 
SMT thread is executing non-idle code" ;
};
[...]
   core {
compatible = "ibm,imc-counters-core";
events-prefix = "CPM_";
unit = "";
scale = "";
reg = <0x0 0x8>;
events = < &CORE_EVENTS >;
};

thread {
compatible = 

RE: [PATCH v2] powerpc/kprobes: refactor kprobe_lookup_name for safer string operations

2017-05-04 Thread David Laight
From: Naveen N. Rao [mailto:naveen.n@linux.vnet.ibm.com]
> Sent: 04 May 2017 11:25
> Use safer string manipulation functions when dealing with a
> user-provided string in kprobe_lookup_name().
> 
> Reported-by: David Laight 
> Signed-off-by: Naveen N. Rao 
> ---
> Changed to ignore return value of 0 from strscpy(), as suggested by
> Masami.

let's see what this code looks like;

>   char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
>   bool dot_appended = false;
> + const char *c;
> + ssize_t ret = 0;
> + int len = 0;
> +
> + if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {

I don't like unnecessary assignments in conditionals.

> + c++;
> + len = c - name;
> + memcpy(dot_name, name, len);
> + } else
> + c = name;
> +
> + if (*c != '\0' && *c != '.') {
> + dot_name[len++] = '.';
>   dot_appended = true;

If you don't append a dot, then you can always just lookup
the original string.

>   }
> + ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
> + if (ret > 0)
> + addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);

I'm not sure you need 'ret' here at all.

> + /* Fallback to the original non-dot symbol lookup */
> + if (!addr && dot_appended)
>   addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);

We can bikeshed this function to death:

/* The function name must start with a '.'.
 * If it doesn't then we insert one. */
c = strnchr(name, MODULE_NAME_LEN, ':');
if (c && c[1] && c[1] != '.') {
/* Insert a '.' after the ':' */
c++;
len = c - name;
memcpy(dot_name, name, len);
} else {
if (name[0] == '.')
goto check_name;
/* Insert a '.' before name */
c = name;
len = 0;
}

dot_name[len++] = '.';
if (strscpy(dot_name + len, c, KSYM_NAME_LEN) > 0) {
addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
if (addr)
return addr;
}
/* Symbol with extra '.' not found, fallback to original name */

check_name:
return (kprobe_opcode_t *)kallsyms_lookup_name(name);
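
For anyone who wants to poke at the string handling outside the kernel,
the name construction above can be sketched in user space as follows.
The sk_* helpers are stand-ins written for this sketch (they only
approximate strnchr() and strscpy()), and the two length constants are
assumed values, not taken from the kernel headers.

```c
#include <stddef.h>
#include <string.h>

#define SK_MODULE_NAME_LEN 56	/* assumed value, for the sketch only */
#define SK_KSYM_NAME_LEN 128	/* assumed value, for the sketch only */

/* Stand-in for strnchr(): look for c within the first count bytes of s. */
static const char *sk_strnchr(const char *s, size_t count, int c)
{
	for (; count-- && *s != '\0'; s++)
		if (*s == (char)c)
			return s;
	return NULL;
}

/*
 * Stand-in for strscpy(): always NUL-terminates, returns the number of
 * characters copied, or -1 if src did not fit in size bytes.
 */
static long sk_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;
	len = strlen(src);
	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}
	memcpy(dst, src, len + 1);
	return (long)len;
}

/*
 * Build the dot variant of name in dot_name, which must hold at least
 * SK_MODULE_NAME_LEN + 1 + SK_KSYM_NAME_LEN bytes.
 * Returns 1 if dot_name should be looked up first, 0 if the caller
 * should go straight to the original name.
 */
static int sk_build_dot_name(char *dot_name, const char *name)
{
	const char *c = sk_strnchr(name, SK_MODULE_NAME_LEN, ':');
	size_t len;

	if (c && c[1] && c[1] != '.') {
		/* Insert a '.' after the ':' */
		c++;
		len = (size_t)(c - name);
		memcpy(dot_name, name, len);
	} else {
		if (name[0] == '.')
			return 0;
		/* Insert a '.' before name */
		c = name;
		len = 0;
	}

	dot_name[len++] = '.';
	return sk_strscpy(dot_name + len, c, SK_KSYM_NAME_LEN) > 0;
}
```

A caller would try the dot_name lookup when this returns 1 and fall back
to the original name otherwise, or when that first lookup fails.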

David



Re: [linux-next][bisected 1945bc45] build brakes for PowerPC BE configuration on LPAR

2017-05-04 Thread Abdul Haleem
On Thu, 2017-05-04 at 20:29 +1000, Michael Ellerman wrote:
> Abdul Haleem  writes:
> 
> > Hi,
> >
> > linux-next build fails on BE config with next-20170424 onwards
> >
> > the patch https://lkml.org/lkml/2017/4/20/994  fixes a similar issue
> > with kvm guest build failure.
> >
> > arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
> > arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
> > (0x8280 is not between 0x and
> > 0x)
> > make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
> 
> I couldn't reproduce that.
> 
> What toolchain are you using?

I have gcc-4.8.2-1 on Power 6 LPAR running Fedora 20.

I can also recreate it on a Power 8 BE LPAR with gcc-4.8.5-11.

> 
> cheers
> 


-- 
Regards

Abdul Haleem
IBM Linux Technology Centre





Re: [linux-next][bisected 1945bc45] build brakes for PowerPC BE configuration on LPAR

2017-05-04 Thread Nicholas Piggin
On Thu, 04 May 2017 14:54:19 +0530
Abdul Haleem  wrote:

> Hi,
> 
> linux-next build fails on BE config with next-20170424 onwards
> 
> the patch https://lkml.org/lkml/2017/4/20/994  fixes a similar issue
> with kvm guest build failure.
> 
> arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
> arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
> (0x8280 is not between 0x and
> 0x)
> make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
> 
> Bisection resulted with the below bad commit.
> 
> commit 1945bc4549e5cb1f9aa873ec29191aa54dc851d
> Author: Nicholas Piggin 
> Date:   Wed Apr 19 23:05:47 2017 +1000
> 
> powerpc/64s: Fix POWER9 machine check handler from stop state
> 
> Reviewed-by: Gautham R. Shenoy 
> Reviewed-by: Mahesh J Salgaonkar 
> Signed-off-by: Nicholas Piggin 
> Signed-off-by: Michael Ellerman 
> 
>  arch/powerpc/include/asm/reg.h   |  1 +
>  arch/powerpc/kernel/exceptions-64s.S | 79 
> ---
>  arch/powerpc/kernel/idle_book3s.S| 25 +
>  3 files changed, 70 insertions(+), 35 deletions(-)
> 
> the BE configuration file is attached.
> 

Thanks for the report. I couldn't reproduce it with this config. I
suspect the following patch should fix it, can you test?

powerpc/64s: Fix unnecessary machine check handler relocation branch

Similarly to 2563a70c3b ("powerpc/64s: Remove unnecessary relocation
branch from idle handler"), the machine check handler has a BRANCH_TO
from relocated to relocated code, which is unnecessary.

It has also caused build errors with some toolchains:

  arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
  arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
  (0x8280 is not between 0x and
  0x)

Fixes: 1945bc4549 ("powerpc/64s: Fix POWER9 machine check handler from stop 
state")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 3840a7700285..ef72065f684c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -391,9 +391,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
 */
BEGIN_FTR_SECTION
rlwinm. r11,r12,47-31,30,31
-   beq-4f
-   BRANCH_TO_COMMON(r10, machine_check_idle_common)
-4:
+   bne machine_check_idle_common
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #endif
 
-- 
2.11.0



Re: [linux-next][bisected 1945bc45] build brakes for PowerPC BE configuration on LPAR

2017-05-04 Thread Michael Ellerman
Abdul Haleem  writes:

> Hi,
>
> linux-next build fails on BE config with next-20170424 onwards
>
> the patch https://lkml.org/lkml/2017/4/20/994  fixes a similar issue
> with kvm guest build failure.
>
> arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
> arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
> (0x8280 is not between 0x and
> 0x)
> make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1

I couldn't reproduce that.

What toolchain are you using?

cheers


[PATCH v2] powerpc/kprobes: refactor kprobe_lookup_name for safer string operations

2017-05-04 Thread Naveen N. Rao
Use safer string manipulation functions when dealing with a
user-provided string in kprobe_lookup_name().

Reported-by: David Laight 
Signed-off-by: Naveen N. Rao 
---
Changed to ignore return value of 0 from strscpy(), as suggested by
Masami.

- Naveen

 arch/powerpc/kernel/kprobes.c | 47 ++-
 1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 160ae0fa7d0d..255d28d31ca1 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -53,7 +53,7 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
 
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 {
-   kprobe_opcode_t *addr;
+   kprobe_opcode_t *addr = NULL;
 
 #ifdef PPC64_ELF_ABI_v2
/* PPC64 ABIv2 needs local entry point */
@@ -85,36 +85,29 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
 * Also handle <module:symbol> format.
 */
char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
-   const char *modsym;
bool dot_appended = false;
-   if ((modsym = strchr(name, ':')) != NULL) {
-   modsym++;
-   if (*modsym != '\0' && *modsym != '.') {
-   /* Convert to <module:.symbol> */
-   strncpy(dot_name, name, modsym - name);
-   dot_name[modsym - name] = '.';
-   dot_name[modsym - name + 1] = '\0';
-   strncat(dot_name, modsym,
-   sizeof(dot_name) - (modsym - name) - 2);
-   dot_appended = true;
-   } else {
-   dot_name[0] = '\0';
-   strncat(dot_name, name, sizeof(dot_name) - 1);
-   }
-   } else if (name[0] != '.') {
-   dot_name[0] = '.';
-   dot_name[1] = '\0';
-   strncat(dot_name, name, KSYM_NAME_LEN - 2);
+   const char *c;
+   ssize_t ret = 0;
+   int len = 0;
+
+   if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
+   c++;
+   len = c - name;
+   memcpy(dot_name, name, len);
+   } else
+   c = name;
+
+   if (*c != '\0' && *c != '.') {
+   dot_name[len++] = '.';
dot_appended = true;
-   } else {
-   dot_name[0] = '\0';
-   strncat(dot_name, name, KSYM_NAME_LEN - 1);
}
-   addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
-   if (!addr && dot_appended) {
-   /* Let's try the original non-dot symbol lookup */
+   ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
+   if (ret > 0)
+   addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
+
+   /* Fallback to the original non-dot symbol lookup */
+   if (!addr && dot_appended)
addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-   }
 #else
addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
 #endif
-- 
2.12.2



Re: WARNING: CPU: 0 PID: 1 at /build/linux-dp17Ba/linux-4.9.18/arch/powerpc/lib/feature-fixups.c:208 check_features+0x38/0x7c

2017-05-04 Thread Michael Ellerman
Mathieu Malaterre  writes:

> Hi all,
>
> Does this dmesg output speak to anyone here (smp kernel):
>
>
> [4.767389] [ cut here ]
> [4.774668] WARNING: CPU: 0 PID: 1 at
> /build/linux-dp17Ba/linux-4.9.18/arch/powerpc/lib/feature-fixups.c:208
> check_features+0x38/0x7c

Is there anything prior to the "cut here" line?

cheers


[PATCH] powerpc/64s: ibm,powerpc-cpu-features dt implementation

2017-05-04 Thread Nicholas Piggin
The ibm,powerpc-cpu-features dt binding describes CPU features with
ascii names and extensible compatibility, privilege, and enablement
metadata that allows improved flexibility and compatibility with new
hardware.

Design overview and specification of features is available in the OPAL
source.

Signed-off-by: Nicholas Piggin 

Since last post:
- Check the "compatible" property.
- Update documentation, don't specify it has to be a child node of /cpus/
- POWER9 CPU_FTR_ICSWX bit is now clear upstream.
- Fix POWER9 DD1 feature getting applied.
- PPC_FEATURE2_EBB is left to PMU init for the moment (matching existing
  init).
- Slightly change how LPCR was being set in cpu_restore, to match existing
  code for idle reinit.
- Add init time TLB flushes to match existing boot init.

Mambo still has a few issues with POWER8:
- HFSCR bit 54 and 57 are now clear (mambo sets at init)
- PMAO_BUG is set. This is due to mambo setting architected POWER8 mode
  and POWER8E PVR. Current kernels lose PMAO_BUG bit.
- CI_LARGE_PAGE is now set (mambo boot does not set it for some reason,
  haven't looked at why).
---
 .../bindings/powerpc/ibm,powerpc-cpu-features.txt  | 246 +++
 arch/powerpc/Kconfig   |  16 +
 arch/powerpc/include/asm/cpu_has_feature.h |   4 +-
 arch/powerpc/include/asm/cpufeatures.h |  55 ++
 arch/powerpc/include/asm/cputable.h|   2 +
 arch/powerpc/include/asm/reg.h |   1 +
 arch/powerpc/include/uapi/asm/cputable.h   |   6 +
 arch/powerpc/kernel/Makefile   |   1 +
 arch/powerpc/kernel/cpufeatures.c  | 724 +
 arch/powerpc/kernel/cputable.c |  37 +-
 arch/powerpc/kernel/prom.c | 338 +-
 arch/powerpc/kernel/setup-common.c |   2 +-
 arch/powerpc/kernel/setup_64.c |  15 +-
 13 files changed, 1430 insertions(+), 17 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/powerpc/ibm,powerpc-cpu-features.txt
 create mode 100644 arch/powerpc/include/asm/cpufeatures.h
 create mode 100644 arch/powerpc/kernel/cpufeatures.c

diff --git 
a/Documentation/devicetree/bindings/powerpc/ibm,powerpc-cpu-features.txt 
b/Documentation/devicetree/bindings/powerpc/ibm,powerpc-cpu-features.txt
new file mode 100644
index ..2cabfc4d7e18
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/ibm,powerpc-cpu-features.txt
@@ -0,0 +1,246 @@
+*** NOTE ***
+This document is copied from OPAL firmware
+(skiboot/doc/device-tree/ibm,powerpc-cpu-features/binding.txt)
+
+There is more complete overview and documentation of features in that
+source tree.  All patches and modifications should go there.
+
+
+ibm,powerpc-cpu-features binding
+
+
+This device tree binding describes CPU features available to software, with
+enablement, privilege, and compatibility metadata.
+
+More general description of design and implementation of this binding is
+found in design.txt, which also points to documentation of specific features.
+
+
+/cpus/ibm,powerpc-cpu-features node binding
+---
+
+Node: ibm,powerpc-cpu-features
+
+Description: Container of CPU feature nodes.
+
+The node name must be "ibm,powerpc-cpu-features".
+
+It is implemented as a child of the node "/cpus", but this must not be
+assumed by parsers.
+
+The node is optional but should be provided by new OPAL firmware.
+
+Properties:
+
+- device_type
+  Usage: required
+  Value type: string
+  Definition: "cpu-features"
+
+- compatible
+  Usage: required
+  Value type: string
+  Definition: "ibm,powerpc-cpu-features"
+
+  This compatibility refers to backwards compatibility of the overall
+  design with parsers that behave according to these guidelines. This can
+  be extended in a backward compatible manner which would not warrant a
+  revision of the compatible property.
+
+- isa
+  Usage: required
+  Value type: 
+  Definition:
+
+  isa that the CPU is currently running in. This provides instruction set
+  compatibility, less the individual feature nodes. For example, an ISA v3.0
+  implementation that lacks the "transactional-memory" cpufeature node
+  should not use transactional memory facilities.
+
+  Value corresponds to the "Power ISA Version" multiplied by 1000.
+  For example, <3000> corresponds to Version 3.0, <2070> to Version 2.07.
+  The minor digit is available for revisions.
+
+/cpus/ibm,powerpc-cpu-features/example-feature node bindings
+
+
+Each child node of cpu-features represents a CPU feature / capability.
+
+Node: A string describing an architected CPU feature, e.g., "floating-point".
+
+Description: A feature or capability supported by the CPUs.
+
+The name of the node is a human readable string that forms the interface
+used to describe features 

Re: [PATCH v2 2/3] powerpc/kprobes: un-blacklist system_call() from kprobes

2017-05-04 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> On 2017/05/04 04:03PM, Michael Ellerman wrote:
>> Would this work?
>> 
>> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> index 767ef6d68c9e..8d0fa4a2262a 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -207,6 +207,7 @@ system_call: /* label this so stack 
>> traces look sane */
>>  mtmsrd  r11,1
>>  #endif /* CONFIG_PPC_BOOK3E */
>> 
>> +syscall_exit:
>>  ld  r9,TI_FLAGS(r12)
>>  li  r11,-MAX_ERRNO
>>  andi.   
>> r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>
> Ah, nice. I previously incorrectly assumed that syscall_exit was not 
> desirable throughout this function. Your earlier patch was only about 
> what label showed up while _inside_ a syscall.

Yep. When you're somewhere in a syscall the LR on the stack points to
the instruction following the bctrl that called the syscall handler, so
as long as the label preceding that is system_call then the backtrace
should look good.

We could even just have a nop after the bctrl and then the label, but
that would be a bit gross.

> So, syscall_exit post handling of a syscall is fine.
>
> This patch looks fine to me. I will test with this change and get back.

Thanks.

cheers


WARNING: CPU: 0 PID: 1 at /build/linux-dp17Ba/linux-4.9.18/arch/powerpc/lib/feature-fixups.c:208 check_features+0x38/0x7c

2017-05-04 Thread Mathieu Malaterre
Hi all,

Does this dmesg output speak to anyone here (smp kernel):


[4.767389] [ cut here ]
[4.774668] WARNING: CPU: 0 PID: 1 at
/build/linux-dp17Ba/linux-4.9.18/arch/powerpc/lib/feature-fixups.c:208
check_features+0x38/0x7c
[4.782256] Modules linked in:
[4.789766] CPU: 0 PID: 1 Comm: swapper Not tainted 4.9.0-2-powerpc
#1 Debian 4.9.18-1
[4.797441] task: df4db1a0 task.stack: df4dc000
[4.805089] NIP: c0776484 LR: c0776484 CTR: 
[4.812736] REGS: df40 TRAP: 0700   Not tainted
(4.9.0-2-powerpc Debian 4.9.18-1)
[4.820485] MSR: 00029032 
[4.820622]   CR: 242ff422  XER: 
[ 4.828326]
[4.828326] GPR00: c0776484 df4dde80 df4db1a0 002c 0004
00ff  00ff
[4.828326] GPR08: 00ff c085271c c085271c  222ff444
 c00047ec 
[4.828326] GPR16:     
  c086
[4.828326] GPR24: c076ecdc c07d2e80 c0766720 c07b2310 
c085f5b0 c086 c07b
NIP [c0776484] check_features+0x38/0x7c
[4.875135] LR [c0776484] check_features+0x38/0x7c
[4.882958] Call Trace:
[4.890751] [df4dde80] [c0776484] check_features+0x38/0x7c (unreliable)
[4.898650] [df4dde90] [c0004154] do_one_initcall+0x4c/0x188
[4.906477] [df4ddf00] [c076f578] kernel_init_freeable+0x164/0x200
[4.914269] [df4ddf30] [c0004810] kernel_init+0x24/0x134
[4.921942] [df4ddf40] [c0016500] ret_from_kernel_thread+0x5c/0x64
[4.929536] Instruction dump:
[4.937046] bfc10008 90010014 3fc0c086 811ef028 3fe0c07b 80ff349c
8108000c 7f883800
41be0014 3c60c06b 38632f34 4be5f601 <0fe0> 815ef028 3bff349c 813f0004
[4.952413] ---[ end trace de3271b83777a44e ]---


From a non-SMP kernel:


[0.00] Total memory = 512MB; using 1024kB for hash table (at cff0)
[0.00] Linux version 4.9.0-2-powerpc
(debian-ker...@lists.debian.org) (gcc version 6.3.0 20170321 (Debian
6.3.0-11) ) #1 Debian 4.9.18-1 (2017-03-30)
[0.00] Found initrd at 0xc0c4:0xc19744bc
[0.00] Found UniNorth memory controller & host bridge @
0xf800 revision: 0x11
[0.00] Mapped at 0xff7c
[0.00] Found a Keylargo mac-io controller, rev: 3, mapped at 0xff74
[0.00] PowerMac motherboard: PowerMac G4 Silver
[0.00] Using PowerMac machine description


Thanks


Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations

2017-05-04 Thread Michael Ellerman
Frederic Barrat  writes:

> Introduce a new 'flags' attribute per context and define its first bit
> to be a marker requiring all TLBIs for that context to be broadcasted
> globally. Once that marker is set on a context, it cannot be removed.
>
> Such a marker is useful for memory contexts used by devices behind the
> NPU and CAPP/PSL. The NPU and the PSL keep their own
> translation cache so they need to see all the TLBIs for those
> contexts.
>
> Signed-off-by: Frederic Barrat 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +
>  arch/powerpc/include/asm/tlb.h   | 10 --
>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>  3 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
> b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 77529a3e3811..7b640ab1cbeb 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -78,8 +78,12 @@ struct spinlock;
>  /* Maximum possible number of NPUs in a system. */
>  #define NV_MAX_NPUS 8
>  
> +/* Bits definition for the context flags */
> +#define MM_CONTEXT_GLOBAL_TLBI   1   /* TLBI must be global */

I think I'd prefer MM_GLOBAL_TLBIE, it's shorter and tlbie is the name
of the instruction so is something people can search for.

> @@ -164,5 +168,10 @@ extern void radix_init_pseries(void);
>  static inline void radix_init_pseries(void) { };
>  #endif
>  
> +static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
> +{
> + set_bit(MM_CONTEXT_GLOBAL_TLBI, &ctx->flags);
> +}

set_bit() is atomic but, like test_bit(), is unordered vs other loads
and stores.

So the caller will need to be careful they have a barrier between this
and whatever it is they do that creates mappings that might need to be
invalidated.

Similarly on the read side we should have a barrier between the store
that makes the PTE invalid and the load of the flag.

Which makes me think cxl_ctx_in_use() is buggy :/, hmm. But it's late so
hopefully I'm wrong :D

> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 609557569f65..bd18ed083011 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -71,8 +71,14 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>  
>  static inline int mm_is_thread_local(struct mm_struct *mm)
>  {
> - return cpumask_equal(mm_cpumask(mm),
> -   cpumask_of(smp_processor_id()));
> + int rc;
> +
> + rc = cpumask_equal(mm_cpumask(mm),
> + cpumask_of(smp_processor_id()));
> +#ifdef CONFIG_PPC_BOOK3S_64
> + rc = rc && !test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
> +#endif

The ifdef's a bit ugly, but I guess it's not worth putting it in a
static inline.

I'd be interested to see the generated code for this, and for the
reverse, ie. putting the test_bit() first, and doing an early return if
it's true. That way once the bit is set we can just skip the cpumask
comparison.
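
The reversed ordering would look something like the sketch below. It is
user-space C showing the control flow only: the sk_* names, the plain
bitmask standing in for the cpumask, and the flag mask are all
hypothetical, and nothing here addresses the barrier question raised
above.

```c
#include <stdbool.h>

#define SK_GLOBAL_TLBI 1UL	/* hypothetical flag mask */

/* Hypothetical stand-in for the relevant parts of mm_struct. */
struct sk_mm {
	unsigned long flags;
	unsigned long cpumask;	/* one bit per CPU that has used this mm */
};

/* Counts mask comparisons, purely so the early return is observable. */
static unsigned int sk_cpumask_compares;

static bool sk_cpumask_is_only(const struct sk_mm *mm, unsigned int cpu)
{
	sk_cpumask_compares++;
	return mm->cpumask == (1UL << cpu);
}

static bool sk_mm_is_thread_local(const struct sk_mm *mm, unsigned int cpu)
{
	/* Flag first: once it is set, the cpumask comparison is skipped. */
	if (mm->flags & SK_GLOBAL_TLBI)
		return false;
	return sk_cpumask_is_only(mm, cpu);
}
```

Whether this actually generates better code than the "rc = rc && ..."
form is exactly what would need checking in the real build.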

cheers


[linux-next][bisected 1945bc45] build brakes for PowerPC BE configuration on LPAR

2017-05-04 Thread Abdul Haleem
Hi,

linux-next build fails on BE config with next-20170424 onwards

the patch https://lkml.org/lkml/2017/4/20/994  fixes a similar issue
with kvm guest build failure.

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
(0x8280 is not between 0x and
0x)
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1

Bisection pointed to the bad commit below.

commit 1945bc4549e5cb1f9aa873ec29191aa54dc851d
Author: Nicholas Piggin 
Date:   Wed Apr 19 23:05:47 2017 +1000

powerpc/64s: Fix POWER9 machine check handler from stop state

Reviewed-by: Gautham R. Shenoy 
Reviewed-by: Mahesh J Salgaonkar 
Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 

 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/kernel/exceptions-64s.S | 79 
---
 arch/powerpc/kernel/idle_book3s.S| 25 +
 3 files changed, 70 insertions(+), 35 deletions(-)

the BE configuration file is attached.

-- 
Regards

Abdul Haleem
IBM Linux Technology Centre


#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.10.0-rc3 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
CONFIG_GENERIC_CPU=y
# CONFIG_CELL_CPU is not set
# CONFIG_POWER4_CPU is not set
# CONFIG_POWER5_CPU is not set
# CONFIG_POWER6_CPU is not set
# CONFIG_POWER7_CPU is not set
# CONFIG_POWER8_CPU is not set
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
# CONFIG_PPC_ICSWX is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_SMP=y
CONFIG_NR_CPUS=1024
CONFIG_PPC_DOORBELL=y
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
# CONFIG_CPU_LITTLE_ENDIAN is not set
CONFIG_64BIT=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_MMU=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_ARCH_HAS_ILOG2_U64=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
# CONFIG_GENERIC_CSUM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=180
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_UDBG_16550=y
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_EPAPR_BOOT=y
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
# CONFIG_PPC_OF_PLATFORM_PCI is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PPC_EMULATE_SSTEP=y
CONFIG_ZONE_DMA32=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_XZ is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_TIME_VSYSCALL_OLD=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_ARCH_HAS_TICK_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_BUILD_BIN2C is not set
# 

Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations

2017-05-04 Thread Michael Ellerman
Balbir Singh  writes:

> On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:
>> Introduce a new 'flags' attribute per context and define its first bit
>> to be a marker requiring all TLBIs for that context to be broadcasted
>> globally. Once that marker is set on a context, it cannot be removed.
>> 
>> Such a marker is useful for memory contexts used by devices behind the
>> NPU and CAPP/PSL. The NPU and the PSL keep their own
>> translation cache so they need to see all the TLBIs for those
>> contexts.
>> 
>> Signed-off-by: Frederic Barrat 
>> ---
>>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +
>>  arch/powerpc/include/asm/tlb.h   | 10 --
>>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>>  3 files changed, 18 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index 77529a3e3811..7b640ab1cbeb 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -78,8 +78,12 @@ struct spinlock;
>>  /* Maximum possible number of NPUs in a system. */
>>  #define NV_MAX_NPUS 8
>>  
>> +/* Bits definition for the context flags */
>> +#define MM_CONTEXT_GLOBAL_TLBI  1   /* TLBI must be global */
>> +
>>  typedef struct {
>>  mm_context_id_t id;
>> +unsigned long flags;
>
> Should these flags be under #ifdef PPC_BOOK3S_64 as well? Not sure.

They shouldn't need to be, the whole file is Book3s 64 only.

cheers
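
For readers following the thread, here is a minimal user-space sketch of the idea in the quoted patch: a per-context flags word with a set-only marker that forces `mm_is_thread_local()` to report "not local". The names `ctx_sketch`, `ctx_set_global_tlbi()` and `ctx_is_thread_local()` are hypothetical stand-ins, the CPU mask is reduced to a single `unsigned long`, and plain (non-atomic) bit operations replace the kernel's `set_bit()`/`test_bit()`. The bit index mirrors the patch, which passes the value 1 to `set_bit()`, a function that takes a bit number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the context; not the real mm_context_t. */
#define CTX_GLOBAL_TLBI_BIT 1UL   /* bit number, as set_bit() expects */

struct ctx_sketch {
	unsigned long flags;
};

/* Set-only marker: there is deliberately no helper to clear it. */
static void ctx_set_global_tlbi(struct ctx_sketch *ctx)
{
	assert(ctx != NULL);
	ctx->flags |= 1UL << CTX_GLOBAL_TLBI_BIT;
}

/*
 * A local invalidation is enough only if the context has run on this CPU
 * alone and has not been marked as needing global invalidations.
 */
static bool ctx_is_thread_local(const struct ctx_sketch *ctx,
				unsigned long cpumask, unsigned int this_cpu)
{
	bool rc = (cpumask == (1UL << this_cpu));

	return rc && !(ctx->flags & (1UL << CTX_GLOBAL_TLBI_BIT));
}
```

Once `ctx_set_global_tlbi()` has been called, `ctx_is_thread_local()` returns false for that context regardless of the CPU mask, which is the behaviour the patch description asks for.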


[PATCH v3 2/3] powerpc/kprobes: un-blacklist system_call() from kprobes

2017-05-04 Thread Naveen N. Rao
It is actually safe to probe system_call() in entry_64.S, but only till
we unset MSR_RI. To allow this, add a new label system_call_exit after
the mtmsrd and blacklist that. Though the mtmsrd instruction itself is
now whitelisted, we won't be allowed to probe on it as we don't allow
probing on rfi and mtmsr instructions (checked for in arch_prepare_kprobe).

Suggested-by: Michael Ellerman 
Signed-off-by: Naveen N. Rao 
---
Michael,
I have named the new label system_call_exit so as to follow the
existing labels (system_call and system_call_common) and to not
conflict with the syscall_exit private label.

- Naveen


 arch/powerpc/kernel/entry_64.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 380361c0bb6a..e255221b0ec0 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -204,6 +204,7 @@ system_call: /* label this so stack traces look sane */
mtmsrd  r11,1
 #endif /* CONFIG_PPC_BOOK3E */
 
+system_call_exit:
ld  r9,TI_FLAGS(r12)
li  r11,-MAX_ERRNO
andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
@@ -388,7 +389,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
b   .   /* prevent speculative execution */
 #endif
 _ASM_NOKPROBE_SYMBOL(system_call_common);
-_ASM_NOKPROBE_SYMBOL(system_call);
+_ASM_NOKPROBE_SYMBOL(system_call_exit);
 
 /* Save non-volatile GPRs, if not already saved. */
 _GLOBAL(save_nvgprs)
-- 
2.12.2
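
A rough user-space sketch of why the extra label helps, assuming (as discussed in the v2 thread) that the kprobe blacklist works at the granularity of a function or assembly label. The symbol table, the addresses and the helper `addr_is_blacklisted()` are made up for illustration and are not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sym_sketch {
	const char *name;
	unsigned long start;	/* address of the label */
	unsigned long end;	/* address of the next non-local label */
	bool nokprobe;		/* marked with _ASM_NOKPROBE_SYMBOL() */
};

/* Hypothetical layout after the patch; addresses are invented. */
static const struct sym_sketch syms[] = {
	{ "system_call_common", 0x1000, 0x1100, true  },
	{ "system_call",        0x1100, 0x11c0, false },
	{ "system_call_exit",   0x11c0, 0x1400, true  },
};

/* An address is rejected if it falls inside any blacklisted label's range. */
static bool addr_is_blacklisted(unsigned long addr)
{
	for (size_t i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		assert(syms[i].start < syms[i].end);
		if (addr >= syms[i].start && addr < syms[i].end)
			return syms[i].nokprobe;
	}
	return false;
}
```

With the new label in place, everything from `system_call` up to `system_call_exit` becomes probeable, while the part that runs with MSR_RI cleared stays blacklisted. The mtmsrd itself falls in the probeable range in this sketch; per the message above it is rejected separately by the instruction check in arch_prepare_kprobe.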



Re: [RFC 2/2] cxl: Mark context requiring global TLBIs

2017-05-04 Thread Balbir Singh
On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:
> The PSL needs to see all TLBI pertinent to the memory contexts used on
> the cxl adapter. For the hash memory model, it was done by making all
> TLBIs global as soon as the cxl driver is in use. For radix, we need
> something similar, but we can refine and only convert to global the
> invalidations for contexts actually used by the device.
> 
> So mark the contexts being attached to the cxl adapter as requiring
> global TLBIs.
>
 
> +#ifdef CONFIG_PPC_BOOK3S_64
> + if (ctx->mm)
> + mm_context_set_global_tlbi(&ctx->mm->context);

Just curious:

Could we call mm_context_set_global_tlbi() before ->attach_process()? That
way we wouldn't need the atomic bit operations (set_bit() and test_bit());
maybe a memory barrier would suffice. I'm not 100% sure, hence the question.

Balbir Singh.
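
To make the ordering question concrete, here is a generic "mark, then publish" sketch using C11 atomics rather than kernel primitives. Everything in it is an assumption for illustration: `attached` is a hypothetical slot standing in for the effect of ->attach_process(), and `mark_then_attach()` and `observer_sees_global()` are invented names. It shows only that an observer who sees the attachment through an acquire load also sees the earlier plain store to the flag. It does not show that this is sufficient in the kernel, where the TLB flush paths test the flag without necessarily observing the attachment, which is the open point in the question above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define GLOBAL_TLBI_MASK (1UL << 1)	/* hypothetical flag bit */

struct ctx_sketch {
	unsigned long flags;
};

/* Hypothetical publication slot standing in for ->attach_process(). */
static _Atomic(struct ctx_sketch *) attached;

/* Writer: plain store to the flag, then publish with release semantics. */
static void mark_then_attach(struct ctx_sketch *ctx)
{
	assert(ctx != NULL);
	ctx->flags |= GLOBAL_TLBI_MASK;
	atomic_store_explicit(&attached, ctx, memory_order_release);
}

/* Observer: an acquire load of the slot orders the later flag read. */
static bool observer_sees_global(void)
{
	struct ctx_sketch *ctx =
		atomic_load_explicit(&attached, memory_order_acquire);

	return ctx != NULL && (ctx->flags & GLOBAL_TLBI_MASK) != 0;
}
```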


Re: [PATCH v2 2/3] powerpc/kprobes: un-blacklist system_call() from kprobes

2017-05-04 Thread Naveen N. Rao
On 2017/05/04 04:03PM, Michael Ellerman wrote:
> "Naveen N. Rao"  writes:
> 
> > On 2017/04/27 08:19PM, Michael Ellerman wrote:
> >> "Naveen N. Rao"  writes:
> >> 
> >> > It is actually safe to probe system_call() in entry_64.S, but only till
> >> > .Lsyscall_exit. To allow this, convert .Lsyscall_exit to a non-local
> >> > symbol __system_call() and blacklist that symbol, rather than
> >> > system_call().
> >> 
> >> I'm not sure I like this. The reason we made it a local symbol in the
> >> first place is because it made backtraces look odd:
> >> 
> >>   commit 4c3b21686111e0ac6018469dacbc5549f9915cf8
> >>   Author: Michael Ellerman 
> >>   AuthorDate: Fri Dec 5 21:16:59 2014 +1100
> >>   
> >>   powerpc/kernel: Make syscall_exit a local label
> >>   
> >>   Currently when we back trace something that is in a syscall we see
> >>   something like this:
> >>   
> >>   [c000] [c000] SyS_read+0x6c/0x110
> >>   [c000] [c000] syscall_exit+0x0/0x98
> >>   
> >>   Although it's entirely correct, seeing syscall_exit at the bottom can be
> >>   confusing - we were exiting from a syscall and then called SyS_read() ?
> >>   
> >>   If we instead change syscall_exit to be a local label we get something
> >>   more intuitive:
> >>   
> >>   [c001fa46fde0] [c026719c] SyS_read+0x6c/0x110
> >>   [c001fa46fe30] [c0009264] system_call+0x38/0xd0
> >>   
> >>   ie. we were handling a system call, and it was SyS_read().
> >> 
> >> 
> >> I think you know that, although you didn't mention it in the change log,
> >> because you've called the new symbol __system_call. But that is not a
> >> great name either because that's not what it does.
> >
> > Yes, you're right. I used __system_call since I felt that it won't cause 
> > confusion like syscall_exit did. I agree it's not a great name, but we 
> > need _some_ label other than system_call if we want to allow probing at 
> > this point.
> >
> > Also, if I'm reading this right, there is no other place to probe if we 
> > want to capture all system call entries.
> >
> > So, I felt this would be good to have.
> >
> >> 
> >> > diff --git a/arch/powerpc/kernel/entry_64.S 
> >> > b/arch/powerpc/kernel/entry_64.S
> >> > index 380361c0bb6a..e030ce34dd66 100644
> >> > --- a/arch/powerpc/kernel/entry_64.S
> >> > +++ b/arch/powerpc/kernel/entry_64.S
> >> > @@ -176,7 +176,7 @@ system_call: /* label this so stack 
> >> > traces look sane */
> >> >  mtctr   r12
> >> >  bctrl   /* Call handler */
> >> >  
> >> > -.Lsyscall_exit:
> >> > +__system_call:
> >> >  std r3,RESULT(r1)
> >> >  CURRENT_THREAD_INFO(r12, r1)
> >>   
> >> Why can't we kprobe the std and the rotate to current thread info?
> >> 
> >> Is the real no-probe point just here, prior to the clearing of MSR_RI ?
> >> 
> >>ld  r8,_MSR(r1)
> >> #ifdef CONFIG_PPC_BOOK3S
> >>/* No MSR:RI on BookE */
> >
> > We can probe at all those places, just not once MSR_RI is unset. So, the 
> > no-probe point is just *after* the mtmsrd.
> >
> > However, for kprobe blacklisting, the granularity is at a function level 
> > (or ASM labels). As such, we will have to blacklist all of 
> > syscall_exit/__system_call.
> 
> Would this work?
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 767ef6d68c9e..8d0fa4a2262a 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -207,6 +207,7 @@ system_call:  /* label this so stack 
> traces look sane */
>   mtmsrd  r11,1
>  #endif /* CONFIG_PPC_BOOK3E */
> 
> +syscall_exit:
>   ld  r9,TI_FLAGS(r12)
>   li  r11,-MAX_ERRNO
>   andi.   
> r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)

Ah, nice. I previously incorrectly assumed that syscall_exit was not 
desirable throughout this function. Your earlier patch was only about 
what label showed up while _inside_ a syscall. So, syscall_exit post 
handling of a syscall is fine.

This patch looks fine to me. I will test with this change and get back.

Thanks,
Naveen



Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations

2017-05-04 Thread Balbir Singh
On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:
> Introduce a new 'flags' attribute per context and define its first bit
> to be a marker requiring all TLBIs for that context to be broadcasted
> globally. Once that marker is set on a context, it cannot be removed.
> 
> Such a marker is useful for memory contexts used by devices behind the
> NPU and CAPP/PSL. The NPU and the PSL keep their own
> translation cache so they need to see all the TLBIs for those
> contexts.
> 
> Signed-off-by: Frederic Barrat 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |  9 +
>  arch/powerpc/include/asm/tlb.h   | 10 --
>  arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
> b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 77529a3e3811..7b640ab1cbeb 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -78,8 +78,12 @@ struct spinlock;
>  /* Maximum possible number of NPUs in a system. */
>  #define NV_MAX_NPUS 8
>  
> +/* Bits definition for the context flags */
> +#define MM_CONTEXT_GLOBAL_TLBI   1   /* TLBI must be global */
> +
>  typedef struct {
>   mm_context_id_t id;
> + unsigned long flags;

Should these flags be under #ifdef PPC_BOOK3S_64 as well? Not sure.

>   u16 user_psize; /* page size index */
>  
>   /* NPU NMMU context */
> @@ -164,5 +168,10 @@ extern void radix_init_pseries(void);
>  static inline void radix_init_pseries(void) { };
>  #endif
>  
> +static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
> +{
> + set_bit(MM_CONTEXT_GLOBAL_TLBI, &ctx->flags);
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 609557569f65..bd18ed083011 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -71,8 +71,14 @@ static inline int mm_is_core_local(struct mm_struct *mm)
>  
>  static inline int mm_is_thread_local(struct mm_struct *mm)
>  {
> - return cpumask_equal(mm_cpumask(mm),
> -   cpumask_of(smp_processor_id()));
> + int rc;
> +
> + rc = cpumask_equal(mm_cpumask(mm),
> + cpumask_of(smp_processor_id()));
> +#ifdef CONFIG_PPC_BOOK3S_64
> + rc = rc && !test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
> +#endif

Acked-by: Balbir Singh 


Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations

2017-05-04 Thread Aneesh Kumar K.V
Frederic Barrat  writes:

> Introduce a new 'flags' attribute per context and define its first bit
> to be a marker requiring all TLBIs for that context to be broadcasted
> globally. Once that marker is set on a context, it cannot be removed.
>
> Such a marker is useful for memory contexts used by devices behind the
> NPU and CAPP/PSL. The NPU and the PSL keep their own
> translation cache so they need to see all the TLBIs for those
> contexts.

Can we also switch the existing cxl_ctx_in_use() to this?

-aneesh



Re: [PATCH v2 2/3] powerpc/kprobes: un-blacklist system_call() from kprobes

2017-05-04 Thread Michael Ellerman
"Naveen N. Rao"  writes:

> On 2017/04/27 08:19PM, Michael Ellerman wrote:
>> "Naveen N. Rao"  writes:
>> 
>> > It is actually safe to probe system_call() in entry_64.S, but only till
>> > .Lsyscall_exit. To allow this, convert .Lsyscall_exit to a non-local
>> > symbol __system_call() and blacklist that symbol, rather than
>> > system_call().
>> 
>> I'm not sure I like this. The reason we made it a local symbol in the
>> first place is because it made backtraces look odd:
>> 
>>   commit 4c3b21686111e0ac6018469dacbc5549f9915cf8
>>   Author: Michael Ellerman 
>>   AuthorDate: Fri Dec 5 21:16:59 2014 +1100
>>   
>>   powerpc/kernel: Make syscall_exit a local label
>>   
>>   Currently when we back trace something that is in a syscall we see
>>   something like this:
>>   
>>   [c000] [c000] SyS_read+0x6c/0x110
>>   [c000] [c000] syscall_exit+0x0/0x98
>>   
>>   Although it's entirely correct, seeing syscall_exit at the bottom can be
>>   confusing - we were exiting from a syscall and then called SyS_read() ?
>>   
>>   If we instead change syscall_exit to be a local label we get something
>>   more intuitive:
>>   
>>   [c001fa46fde0] [c026719c] SyS_read+0x6c/0x110
>>   [c001fa46fe30] [c0009264] system_call+0x38/0xd0
>>   
>>   ie. we were handling a system call, and it was SyS_read().
>> 
>> 
>> I think you know that, although you didn't mention it in the change log,
>> because you've called the new symbol __system_call. But that is not a
>> great name either because that's not what it does.
>
> Yes, you're right. I used __system_call since I felt that it won't cause 
> confusion like syscall_exit did. I agree it's not a great name, but we 
> need _some_ label other than system_call if we want to allow probing at 
> this point.
>
> Also, if I'm reading this right, there is no other place to probe if we 
> want to capture all system call entries.
>
> So, I felt this would be good to have.
>
>> 
>> > diff --git a/arch/powerpc/kernel/entry_64.S 
>> > b/arch/powerpc/kernel/entry_64.S
>> > index 380361c0bb6a..e030ce34dd66 100644
>> > --- a/arch/powerpc/kernel/entry_64.S
>> > +++ b/arch/powerpc/kernel/entry_64.S
>> > @@ -176,7 +176,7 @@ system_call:   /* label this so stack 
>> > traces look sane */
>> >mtctr   r12
>> >bctrl   /* Call handler */
>> >  
>> > -.Lsyscall_exit:
>> > +__system_call:
>> >std r3,RESULT(r1)
>> >CURRENT_THREAD_INFO(r12, r1)
>>   
>> Why can't we kprobe the std and the rotate to current thread info?
>> 
>> Is the real no-probe point just here, prior to the clearing of MSR_RI ?
>> 
>>  ld  r8,_MSR(r1)
>> #ifdef CONFIG_PPC_BOOK3S
>>  /* No MSR:RI on BookE */
>
> We can probe at all those places, just not once MSR_RI is unset. So, the 
> no-probe point is just *after* the mtmsrd.
>
> However, for kprobe blacklisting, the granularity is at a function level 
> (or ASM labels). As such, we will have to blacklist all of 
> syscall_exit/__system_call.

Would this work?

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 767ef6d68c9e..8d0fa4a2262a 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -207,6 +207,7 @@ system_call: /* label this so stack traces look sane */
mtmsrd  r11,1
 #endif /* CONFIG_PPC_BOOK3E */
 
+syscall_exit:
ld  r9,TI_FLAGS(r12)
li  r11,-MAX_ERRNO
andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)


cheers
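
The backtrace behaviour described in the quoted 2014 commit can be illustrated with a small sketch, assuming a return address is named after the nearest non-local label at or below it. The `resolve()` helper, the two tables and the addresses are hypothetical; the addresses only borrow the low bits of the trace quoted above so that the offsets match.

```c
#include <assert.h>
#include <stddef.h>

struct label_sketch {
	const char *name;
	unsigned long addr;
};

/* Return the last label at or below addr; the table is sorted by address. */
static const struct label_sketch *resolve(const struct label_sketch *tbl,
					  size_t n, unsigned long addr,
					  unsigned long *offset)
{
	const struct label_sketch *best = NULL;

	assert(offset != NULL);
	for (size_t i = 0; i < n && tbl[i].addr <= addr; i++)
		best = &tbl[i];
	if (best)
		*offset = addr - best->addr;
	return best;
}

/* A global label placed right after the call to the syscall handler. */
static const struct label_sketch with_global_exit[] = {
	{ "system_call",  0x922c },
	{ "syscall_exit", 0x9264 },
};

/* The same code with the exit label made local, so it is not in the table. */
static const struct label_sketch with_local_exit[] = {
	{ "system_call", 0x922c },
};
```

Resolving the return address 0x9264 against the first table gives syscall_exit+0x0, and against the second gives system_call+0x38. A label placed after the mtmsrd, as proposed above, sits beyond the handler's return address, so by this reasoning it would not change what a backtrace taken inside a syscall shows.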