[PATCH v3 1/7] perf probe: Improve detection of file/function name in the probe pattern

2015-04-28 Thread Naveen N. Rao
Currently, perf probe considers patterns including a '.' to be a file.
However, this causes problems on powerpc ABIv1 where all functions have
a leading '.':

  $ perf probe -F | grep schedule_timeout_interruptible
  .schedule_timeout_interruptible
  $ perf probe .schedule_timeout_interruptible
  Semantic error :File always requires line number or lazy pattern.
  Error: Command Parse Error.

Fix this:
- by checking the probe pattern in more detail, and
- skipping leading dot if one exists when creating/deleting events.

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
---
Changes:
- Folded initial part of patch 04 [now 03] into this per Masami.
- Moved to using a bool file_spec and some whitespace changes.


 tools/perf/util/probe-event.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index d8bb616..a8c19d5 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1077,6 +1077,7 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
 	struct perf_probe_point *pp = &pev->point;
 	char *ptr, *tmp;
 	char c, nc = 0;
+	bool file_spec = false;
 	/*
 	 * Syntax
 	 * perf probe [EVENT=]SRC[:LN|;PTN]
@@ -1105,6 +1106,23 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
 		arg = tmp;
 	}
 
+	/*
+	 * Check arg is function or file name and copy it.
+	 *
+	 * We consider arg to be a file spec if and only if it satisfies
+	 * all of the below criteria::
+	 * - it does not include any of "+@%",
+	 * - it includes one of ":;", and
+	 * - it has a period '.' in the name.
+	 *
+	 * Otherwise, we consider arg to be a function specification.
+	 */
+	if (!strpbrk(arg, "+@%") && (ptr = strpbrk(arg, ";:")) != NULL) {
+		/* This is a file spec if it includes a '.' before ; or : */
+		if (memchr(arg, '.', ptr - arg))
+			file_spec = true;
+	}
+
 	ptr = strpbrk(arg, ";:+@%");
 	if (ptr) {
 		nc = *ptr;
@@ -1115,10 +1133,9 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
 	if (tmp == NULL)
 		return -ENOMEM;
 
-	/* Check arg is function or file and copy it */
-	if (strchr(tmp, '.'))	/* File */
+	if (file_spec)
 		pp->file = tmp;
-	else			/* Function */
+	else
 		pp->function = tmp;
 
 	/* Parse other options */
@@ -2265,6 +2282,9 @@ static int get_new_event_name(char *buf, size_t len, const char *base,
 {
 	int i, ret;
 
+	if (*base == '.')
+		base++;
+
 	/* Try no suffix */
 	ret = e_snprintf(buf, len, "%s", base);
 	if (ret < 0) {
@@ -2751,6 +2771,9 @@ int del_perf_probe_events(struct strlist *dellist)
 			event = str;
 		}
 
+		if (event && *event == '.')
+			event++;
+
 		ret = e_snprintf(buf, 128, "%s:%s", group, event);
 		if (ret < 0) {
 			pr_err("Failed to copy event.");
-- 
2.3.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
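
The file/function rule in the comment added by this patch can be tried out in isolation. The following is a minimal sketch, not the perf implementation: `is_file_spec()` is a hypothetical helper name, and it only encodes the three criteria listed in the patch comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the rule in the patch comment:
 * arg is a file spec only if it contains none of "+@%", contains
 * one of ";:", and has a '.' before the first ';' or ':'.
 */
bool is_file_spec(const char *arg)
{
	const char *sep;

	if (strpbrk(arg, "+@%"))
		return false;

	sep = strpbrk(arg, ";:");
	if (sep == NULL)
		return false;

	/* Only look for the '.' in the part before the separator. */
	return memchr(arg, '.', (size_t)(sep - arg)) != NULL;
}
```

Under this rule `probe-event.c:1077` is a file spec, while `.schedule_timeout_interruptible` and `.do_fork+4` are function specs.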

Re: Build regressions/improvements in v4.1-rc1

2015-04-28 Thread Rusty Russell
Geert Uytterhoeven ge...@linux-m68k.org writes:
>> Can't see that one with a simple grep: can you post warning?
>
> /home/kisskb/slave/src/arch/tile/kernel/setup.c: In function
> 'zone_sizes_init':
> /home/kisskb/slave/src/arch/tile/kernel/setup.c:777:3: warning:
> passing argument 2 of 'cpumask_test_cpu' from incompatible pointer
> type [enabled by default]
> /home/kisskb/slave/src/include/linux/cpumask.h:294:19: note: expected
> 'const struct cpumask *' but argument is of type 'struct nodemask_t *'

Um, I turned the cpu_isset() into cpumask_test_cpu(), but that just
showed this bug up.  The tile maintainers need to fix this one.

Thanks,
Rusty.

Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-28 Thread Vasant Hegde
On 04/27/2015 07:17 PM, Vasant Hegde wrote:
> On 04/27/2015 04:45 PM, Jacek Anaszewski wrote:
>> On 04/27/2015 11:53 AM, Benjamin Herrenschmidt wrote:
>>> On Mon, 2015-04-27 at 09:24 +0200, Jacek Anaszewski wrote:
>> I was not aware that some other entity than the driver could be
>> interested in the information provided by DT node. I will no longer
>> object, provided that we will get an ack from DT maintainer.
>
>
> Jacket,

Oops, sorry. That was a typo; I meant Jacek.


-Vasant


[PATCH v3 4/7] perf probe/ppc: Enable matching against dot symbols automatically

2015-04-28 Thread Naveen N. Rao
Allow perf probe to work on ppc ABIv1 without the need to specify the
leading dot '.' for functions. 'perf probe do_fork' works with this
patch.

We do this by changing how symbol name comparison works on ppc ABIv1 -
we simply ignore and skip over the initial dot, if one exists, during
symbol name comparison.

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
---
Changes:
This patch has been redone since upstream changed quite a bit. We now implement
arch-specific helper during symbol name comparison to ignore leading dot on ppc
symbols.

 tools/perf/arch/powerpc/util/sym-handling.c | 13 +
 tools/perf/util/map.c   |  5 +
 tools/perf/util/map.h   |  3 ++-
 tools/perf/util/symbol.c|  4 ++--
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index 5522a40..2de2cc4 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -8,6 +8,7 @@
 
 #include "debug.h"
 #include "symbol.h"
+#include "map.h"
 
 #ifdef HAVE_LIBELF_SUPPORT
 bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
@@ -36,4 +37,16 @@ int arch__choose_best_symbol(struct symbol *syma,
 
 	return SYMBOL_A;
 }
+
+/* Allow matching against dot variants */
+int arch__compare_symbol_names(const char *namea, const char *nameb)
+{
+	/* Skip over initial dot */
+	if (*namea == '.')
+		namea++;
+	if (*nameb == '.')
+		nameb++;
+
+	return strcmp(namea, nameb);
+}
 #endif
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index a14f08f..cd0e335 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -292,6 +292,11 @@ int map__load(struct map *map, symbol_filter_t filter)
 	return 0;
 }
 
+int __weak arch__compare_symbol_names(const char *namea, const char *nameb)
+{
+	return strcmp(namea, nameb);
+}
+
 struct symbol *map__find_symbol(struct map *map, u64 addr,
 				symbol_filter_t filter)
 {
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index ec19c59..4e0c729 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -124,7 +124,7 @@ struct thread;
  */
 #define __map__for_each_symbol_by_name(map, sym_name, pos, filter)	\
 	for (pos = map__find_symbol_by_name(map, sym_name, filter);	\
-	     pos && strcmp(pos->name, sym_name) == 0;		\
+	     pos && arch__compare_symbol_names(pos->name, sym_name) == 0;	\
 	     pos = symbol__next_by_name(pos))
 
 #define map__for_each_symbol_by_name(map, sym_name, pos)		\
@@ -132,6 +132,7 @@ struct thread;
 
 typedef int (*symbol_filter_t)(struct map *map, struct symbol *sym);
 
+int arch__compare_symbol_names(const char *namea, const char *nameb);
 void map__init(struct map *map, enum map_type type,
 	       u64 start, u64 end, u64 pgoff, struct dso *dso);
 struct map *map__new(struct machine *machine, u64 start, u64 len,
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index f805757..45ba48a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -411,7 +411,7 @@ static struct symbol *symbols__find_by_name(struct rb_root *symbols,
 		int cmp;
 
 		s = rb_entry(n, struct symbol_name_rb_node, rb_node);
-		cmp = strcmp(name, s->sym.name);
+		cmp = arch__compare_symbol_names(name, s->sym.name);
 
 		if (cmp < 0)
 			n = n->rb_left;
@@ -429,7 +429,7 @@ static struct symbol *symbols__find_by_name(struct rb_root *symbols,
 		struct symbol_name_rb_node *tmp;
 
 		tmp = rb_entry(n, struct symbol_name_rb_node, rb_node);
-		if (strcmp(tmp->sym.name, s->sym.name))
+		if (arch__compare_symbol_names(tmp->sym.name, s->sym.name))
 			break;
 
 		s = tmp;
-- 
2.3.5
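
To see why `perf probe do_fork` can match `.do_fork` once the comparison skips a leading dot, here is a small sketch. It is not the perf code: `compare_ignoring_dot()` restates the comparison from the patch, and `find_symbol()` is a hypothetical linear lookup standing in for the rb-tree search.

```c
#include <stddef.h>
#include <string.h>

/* Compare two symbol names, ignoring one leading '.' on either side. */
int compare_ignoring_dot(const char *namea, const char *nameb)
{
	if (*namea == '.')
		namea++;
	if (*nameb == '.')
		nameb++;
	return strcmp(namea, nameb);
}

/* Hypothetical linear lookup; returns the index of the match or -1. */
int find_symbol(const char *const *names, size_t count, const char *wanted)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (compare_ignoring_dot(names[i], wanted) == 0)
			return (int)i;
	}
	return -1;
}
```

With a table holding `.do_exit`, `.do_fork` and `.schedule`, both `do_fork` and `.do_fork` resolve to the same entry. Only one dot is skipped, so `..x` and `.x` still compare as different names.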


Re: [PATCH v3 1/7] perf probe: Improve detection of file/function name in the probe pattern

2015-04-28 Thread Arnaldo Carvalho de Melo
On Tue, Apr 28, 2015 at 05:35:34PM +0530, Naveen N. Rao wrote:
> Currently, perf probe considers patterns including a '.' to be a file.
> However, this causes problems on powerpc ABIv1 where all functions have
> a leading '.':
> 
>   $ perf probe -F | grep schedule_timeout_interruptible
>   .schedule_timeout_interruptible
>   $ perf probe .schedule_timeout_interruptible
>   Semantic error :File always requires line number or lazy pattern.
>   Error: Command Parse Error.

Just gave this a real quick look, saw no problems, but I'll defer this
to Masami, a reviewed-by tag from him, ok Masami?

- Arnaldo
 

Re: [PATCH v3 0/7] Fixes for perf probe issues on ppc

2015-04-28 Thread Arnaldo Carvalho de Melo
On Tue, Apr 28, 2015 at 05:35:33PM +0530, Naveen N. Rao wrote:
> This patchset fixes various issues with perf probe on powerpc across ABIv1 and
> ABIv2:
> - in the presence of DWARF debug-info,
> - in the absence of DWARF, but with the symbol table, and
> - in the absence of debug-info, but with kallsyms.
> 
> Arnaldo,
> I have moved all patches to use __weak functions. Kindly take a look and let me
> know if this is what you had in mind.

Ok, I applied all but the first, for which I am waiting for Masami's
reaction, I kept Srikar's reviewed-by for the other patches, but would
as well like to get his word that he keeps it after the __weak changes.

So, for now, I'll leave it for a while sitting in my local tree, to give
time to Masami and Srikar, ok?

- Arnaldo

Re: [PATCH v3 2/7] perf probe/ppc: Fix symbol fixup issues due to ELF type

2015-04-28 Thread Arnaldo Carvalho de Melo
On Tue, Apr 28, 2015 at 05:35:35PM +0530, Naveen N. Rao wrote:
> If using the symbol table, symbol addresses are not being fixed up
> properly, resulting in probes being placed at wrong addresses:
> 
>   # perf probe do_fork
>   Added new event:
>     probe:do_fork        (on do_fork)
> 
>   You can now use it in all perf tools, such as:
> 
>       perf record -e probe:do_fork -aR sleep 1
> 
>   # cat /sys/kernel/debug/tracing/kprobe_events
>   p:probe/do_fork _text+635952
>   # printf %x 635952
>   9b430
>   # grep do_fork /boot/System.map
>   c00ab430 T .do_fork
> 
> Fix by checking for ELF type ET_DYN used by ppc64 kernels.
> 
> Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
> Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com

This one looks great. Are you keeping the Reviewed-by from a previous
patch, or has Srikar reissued it? Srikar?

- Arnaldo
> ---
> Changes:
> - Introduce arch helper to limit change to ppc.
> 
>  tools/perf/arch/powerpc/util/Build          |  1 +
>  tools/perf/arch/powerpc/util/sym-handling.c | 19 +++
>  tools/perf/util/symbol-elf.c                |  8 ++--
>  tools/perf/util/symbol.h                    |  4 
>  4 files changed, 30 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/arch/powerpc/util/sym-handling.c
> 
> diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
> index 0af6e9b..7b8b0d1 100644
> --- a/tools/perf/arch/powerpc/util/Build
> +++ b/tools/perf/arch/powerpc/util/Build
> @@ -1,4 +1,5 @@
>  libperf-y += header.o
> +libperf-y += sym-handling.o
>  
>  libperf-$(CONFIG_DWARF) += dwarf-regs.o
>  libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
> diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
> new file mode 100644
> index 0000000..c9de001
> --- /dev/null
> +++ b/tools/perf/arch/powerpc/util/sym-handling.c
> @@ -0,0 +1,19 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * Copyright (C) 2015 Naveen N. Rao, IBM Corporation
> + */
> +
> +#include "debug.h"
> +#include "symbol.h"
> +
> +#ifdef HAVE_LIBELF_SUPPORT
> +bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
> +{
> +	return ehdr.e_type == ET_EXEC ||
> +	       ehdr.e_type == ET_REL ||
> +	       ehdr.e_type == ET_DYN;
> +}
> +#endif
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index ac09c60..7f5d237 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -630,6 +630,11 @@ void symsrc__destroy(struct symsrc *ss)
>  	close(ss->fd);
>  }
>  
> +bool __weak elf__needs_adjust_symbols(GElf_Ehdr ehdr)
> +{
> +	return ehdr.e_type == ET_EXEC || ehdr.e_type == ET_REL;
> +}
> +
>  int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
>  		 enum dso_binary_type type)
>  {
> @@ -712,8 +717,7 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
>  						     ".gnu.prelink_undo",
>  						     NULL) != NULL);
>  	} else {
> -		ss->adjust_symbols = ehdr.e_type == ET_EXEC ||
> -				     ehdr.e_type == ET_REL;
> +		ss->adjust_symbols = elf__needs_adjust_symbols(ehdr);
>  	}
>  
>  	ss->name   = strdup(name);
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index 0956150..8cb0af4 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -303,4 +303,8 @@ int setup_list(struct strlist **list, const char *list_str,
>  int setup_intlist(struct intlist **list, const char *list_str,
>  		  const char *list_name);
>  
> +#ifdef HAVE_LIBELF_SUPPORT
> +bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);
> +#endif
> +
>  #endif /* __PERF_SYMBOL */
> -- 
> 2.3.5
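
The difference between the generic and the powerpc check in the quoted patch comes down to one extra ELF type. The following is a minimal sketch with hypothetical function names; the `SK_ET_*` constants repeat the standard `e_type` values from the ELF specification so the snippet does not depend on `<elf.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ELF e_type values, repeated here under sketch-local names. */
enum {
	SK_ET_REL  = 1,	/* relocatable object */
	SK_ET_EXEC = 2,	/* executable */
	SK_ET_DYN  = 3,	/* shared object / position-independent image */
};

/* Generic rule: adjust symbols for executables and relocatable files. */
bool needs_adjust_generic(uint16_t e_type)
{
	return e_type == SK_ET_EXEC || e_type == SK_ET_REL;
}

/* powerpc rule from the patch: additionally accept ET_DYN. */
bool needs_adjust_ppc(uint16_t e_type)
{
	return needs_adjust_generic(e_type) || e_type == SK_ET_DYN;
}
```

For an `ET_DYN` image, the type the commit message says ppc64 kernels use, only the powerpc variant reports that symbol addresses need adjusting.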

Re: Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-28 Thread songxium...@inspur.com
While testing the CPU and memory hotplug features on an x86 server running
kernel 4.0-rc4, we hit a similar problem.

When the memory in node 0 is offline, the system crashes during boot.

The oops output follows:

[0.335176] BUG: unable to handle kernel paging request at 1b08
[0.342164] IP: [81182587] __alloc_pages_nodemask+0xb7/0x940
[0.348706] PGD 0 
[0.350735] Oops:  [#1] SMP 
[0.353993] Modules linked in:
[0.357063] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc4 #1
[0.363232] Hardware name: Inspur TS860/TS860, BIOS TS860_2.0.0 2015/03/24
[0.370095] task: 88085b1e ti: 88085b1e8000 task.ti: 
88085b1e8000
[0.377564] RIP: 0010:[81182587]  [81182587] 
__alloc_pages_nodemask+0xb7/0x940
[0.386524] RSP: :88085b1ebac8  EFLAGS: 00010246
[0.391828] RAX: 1b00 RBX: 0010 RCX: 
[0.398953] RDX:  RSI:  RDI: 002052d0
[0.406075] RBP: 88085b1ebbb8 R08: 88085b13fec0 R09: 5b13fe01
[0.413198] R10: 88085e807300 R11: 810d4bc1 R12: 0001002a
[0.420321] R13: 002052d0 R14: 0001 R15: 40d0
[0.427446] FS:  () GS:88085ee0() 
knlGS:
[0.435522] CS:  0010 DS:  ES:  CR0: 80050033
[0.441259] CR2: 1b08 CR3: 019ae000 CR4: 001406f0
[0.448382] Stack:
[0.450392]  88085b1e 0400 88085b1e 
88085b1ebb68
[0.457846]  007b 88085b12d140 88085b249000 
007b
[0.465298]  88085b1ebb28 81af2900  
002052d05b12d140
[0.472750] Call Trace:
[0.475206]  [811d27b3] ? deactivate_slab+0x383/0x400
[0.481123]  [811d3947] new_slab+0xa7/0x460
[0.486174]  [816789e5] __slab_alloc+0x310/0x470
[0.491655]  [8105304f] ? dmar_msi_set_affinity+0x8f/0xc0
[0.497921]  [810d4bc1] ? __irq_domain_add+0x41/0x100
[0.503838]  [810d0fee] ? irq_do_set_affinity+0x5e/0x70
[0.509920]  [811d571d] __kmalloc_node+0xad/0x2e0
[0.515483]  [810d4bc1] ? __irq_domain_add+0x41/0x100
[0.521392]  [810d4bc1] __irq_domain_add+0x41/0x100
[0.527133]  [8105102e] mp_irqdomain_create+0x9e/0x120
[0.533140]  [81b2fb14] setup_IO_APIC+0x64/0x1be
[0.538622]  [81b2e226] apic_bsp_setup+0xa2/0xae
[0.544099]  [81b2bc70] native_smp_prepare_cpus+0x267/0x2b2
[0.550531]  [81b1927b] kernel_init_freeable+0xf2/0x253
[0.556625]  [8166b960] ? rest_init+0x80/0x80
[0.561845]  [8166b96e] kernel_init+0xe/0xf0
[0.566979]  [81681bd8] ret_from_fork+0x58/0x90
[0.572374]  [8166b960] ? rest_init+0x80/0x80
[0.577591] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 
e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 48 83 
78 08 00 0f 84 51 01 00 00 b8 01 00 00 00 44 89 f1 d3 e0 
[0.597537] RIP  [81182587] __alloc_pages_nodemask+0xb7/0x940
[0.604158]  RSP 88085b1ebac8
[0.607643] CR2: 1b08
[0.610962] ---[ end trace 0a600c0841386992 ]---
[0.615573] Kernel panic - not syncing: Fatal exception
[0.620792] ---[ end Kernel panic - not syncing: Fatal exception




From: Rob Herring
Date: 2015-04-14 00:49
To: Konstantin Khlebnikov
CC: Grant Likely; devicet...@vger.kernel.org; Rob Herring; 
linux-ker...@vger.kernel.org; sparcli...@vger.kernel.org; linux...@kvack.org; 
linuxppc-dev
Subject: Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()
On Mon, Apr 13, 2015 at 8:38 AM, Konstantin Khlebnikov
khlebni...@yandex-team.ru wrote:
> On 13.04.2015 16:22, Rob Herring wrote:
>>
>> On Wed, Apr 8, 2015 at 11:59 AM, Konstantin Khlebnikov
>> khlebni...@yandex-team.ru wrote:
>>>
>>> Node 0 might be offline as well as any other numa node,
>>> in this case kernel cannot handle memory allocation and crashes.
>>>
>>> Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
>>> Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
>>> ---
>>>   drivers/of/base.c  |    2 +-
>>>   include/linux/of.h |    5 ++++-
>>>   2 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index 8f165b112e03..51f4bd16e613 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
>>>   #ifdef CONFIG_NUMA
>>>   int __weak of_node_to_nid(struct device_node *np)
>>>   {
>>> -       return numa_node_id();
>>> +       return NUMA_NO_NODE;
>>
>>
>> This is going to break any NUMA machine that enables OF and expects
>> the weak function to work.
>
>
> Why? NUMA_NO_NODE == -1 -- this's standard no-affinity signal.
> As I see powerpc/sparc versions of of_node_to_nid returns -1 if they
> cannot find 

Re: Re: [PATCH v3 0/7] Fixes for perf probe issues on ppc

2015-04-28 Thread Masami Hiramatsu
On 2015/04/28 22:54, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 28, 2015 at 05:35:33PM +0530, Naveen N. Rao wrote:
>> This patchset fixes various issues with perf probe on powerpc across ABIv1 and
>> ABIv2:
>> - in the presence of DWARF debug-info,
>> - in the absence of DWARF, but with the symbol table, and
>> - in the absence of debug-info, but with kallsyms.
>>
>> Arnaldo,
>> I have moved all patches to use __weak functions. Kindly take a look and let me
>> know if this is what you had in mind.
> 
> Ok, I applied all but the first, for which I am waiting for Masami's
> reaction, I kept Srikar's reviewed-by for the other patches, but would
> as well like to get his word that he keeps it after the __weak changes.
> 
> So, for now, I'll leave it for a while sitting in my local tree, to give
> time to Masami and Srikar, ok?

OK, I reviewed all the patches in this series :)

Acked-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
for this series.

Thank you!

-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu...@hitachi.com

[PATCH v3 3/7] perf probe/ppc: Use the right prefix when ignoring SyS symbols on ppc

2015-04-28 Thread Naveen N. Rao
Use the proper prefix when ignoring SyS symbols on ppc ABIv1. While at
it, generalize symbol selection so architectures can implement their
own logic.

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
---
Changes:
- Move logic to arch helper.

 tools/perf/arch/powerpc/util/sym-handling.c | 20 
 tools/perf/util/symbol.c| 21 -
 tools/perf/util/symbol.h|  5 +
 3 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index c9de001..5522a40 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -17,3 +17,23 @@ bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
 	       ehdr.e_type == ET_DYN;
 }
 #endif
+
+#if !defined(_CALL_ELF) || _CALL_ELF != 2
+int arch__choose_best_symbol(struct symbol *syma,
+			     struct symbol *symb __maybe_unused)
+{
+	char *sym = syma->name;
+
+	/* Skip over any initial dot */
+	if (*sym == '.')
+		sym++;
+
+	/* Avoid "SyS" kernel syscall aliases */
+	if (strlen(sym) >= 3 && !strncmp(sym, "SyS", 3))
+		return SYMBOL_B;
+	if (strlen(sym) >= 10 && !strncmp(sym, "compat_SyS", 10))
+		return SYMBOL_B;
+
+	return SYMBOL_A;
+}
+#endif
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 201f6c4c..f805757 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -85,8 +85,17 @@ static int prefix_underscores_count(const char *str)
 	return tail - str;
 }
 
-#define SYMBOL_A 0
-#define SYMBOL_B 1
+int __weak arch__choose_best_symbol(struct symbol *syma,
+				    struct symbol *symb __maybe_unused)
+{
+	/* Avoid "SyS" kernel syscall aliases */
+	if (strlen(syma->name) >= 3 && !strncmp(syma->name, "SyS", 3))
+		return SYMBOL_B;
+	if (strlen(syma->name) >= 10 && !strncmp(syma->name, "compat_SyS", 10))
+		return SYMBOL_B;
+
+	return SYMBOL_A;
+}
 
 static int choose_best_symbol(struct symbol *syma, struct symbol *symb)
 {
@@ -134,13 +143,7 @@ static int choose_best_symbol(struct symbol *syma, struct symbol *symb)
 	else if (na < nb)
 		return SYMBOL_B;
 
-	/* Avoid "SyS" kernel syscall aliases */
-	if (na >= 3 && !strncmp(syma->name, "SyS", 3))
-		return SYMBOL_B;
-	if (na >= 10 && !strncmp(syma->name, "compat_SyS", 10))
-		return SYMBOL_B;
-
-	return SYMBOL_A;
+	return arch__choose_best_symbol(syma, symb);
 }
 
 void symbols__fixup_duplicate(struct rb_root *symbols)
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 8cb0af4..bd50ba0 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -307,4 +307,9 @@ int setup_intlist(struct intlist **list, const char *list_str,
 bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);
 #endif
 
+#define SYMBOL_A 0
+#define SYMBOL_B 1
+
+int arch__choose_best_symbol(struct symbol *syma, struct symbol *symb);
+
 #endif /* __PERF_SYMBOL */
-- 
2.3.5
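
The prefix test used to deprioritise syscall aliases can be reduced to a single predicate. This is a minimal sketch under the assumption that only the two prefixes named in the patch matter; `is_syscall_alias()` is a hypothetical name, not a perf function. The explicit `strlen()` checks from the patch are left out here because `strncmp()` stops at the terminating NUL and so already rejects shorter names.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical predicate: true if name, after skipping one optional
 * leading '.', starts with "SyS" or "compat_SyS".
 */
bool is_syscall_alias(const char *name)
{
	if (*name == '.')
		name++;

	return strncmp(name, "SyS", 3) == 0 ||
	       strncmp(name, "compat_SyS", 10) == 0;
}
```

On ABIv1, `.SyS_read` is recognised as an alias only because the dot is skipped first; without that step the prefix comparison would start at the dot and fail.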


[PATCH v3 0/7] Fixes for perf probe issues on ppc

2015-04-28 Thread Naveen N. Rao
This patchset fixes various issues with perf probe on powerpc across ABIv1 and
ABIv2:
- in the presence of DWARF debug-info,
- in the absence of DWARF, but with the symbol table, and
- in the absence of debug-info, but with kallsyms.

Arnaldo,
I have moved all patches to use __weak functions. Kindly take a look and let me
know if this is what you had in mind.

Thanks!
- Naveen


Changes since v2:
The primary change is with the use of __weak functions instead of compiler
macros. Along with that, I have also addressed all previous review comments
(*). Individual patches have the detailed changelog. Please note that I have
dropped the first (kernel) patch from the last series since it has been pushed
upstream. This series now only includes the rest of the patches.

(*) http://thread.gmane.org/gmane.linux.kernel/1851128



Naveen N. Rao (7):
  perf probe: Improve detection of file/function name in the probe
pattern
  perf probe/ppc: Fix symbol fixup issues due to ELF type
  perf probe/ppc: Use the right prefix when ignoring SyS symbols on ppc
  perf probe/ppc: Enable matching against dot symbols automatically
  perf probe/ppc64le: Fix ppc64 ABIv2 symbol decoding
  perf probe/ppc64le: Prefer symbol table lookup over DWARF
  perf probe/ppc64le: Fixup function entry if using kallsyms lookup

 tools/perf/arch/powerpc/util/Build  |  1 +
 tools/perf/arch/powerpc/util/sym-handling.c | 80 +
 tools/perf/util/map.c   |  5 ++
 tools/perf/util/map.h   |  3 +-
 tools/perf/util/probe-event.c   | 42 +--
 tools/perf/util/probe-event.h   |  3 ++
 tools/perf/util/symbol-elf.c| 12 -
 tools/perf/util/symbol.c| 25 +
 tools/perf/util/symbol.h| 10 
 9 files changed, 164 insertions(+), 17 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/sym-handling.c

-- 
2.3.5
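
For readers unfamiliar with the `__weak` approach this series moved to, the mechanism can be sketched as follows. The names are hypothetical and the snippet assumes GCC or Clang on an ELF target; in the kernel tree `__weak` expands to the same `weak` attribute.

```c
/* Sketch-local stand-in for the kernel's __weak annotation. */
#define SKETCH_WEAK __attribute__((weak))

/*
 * Generic default. If another object file linked into the program
 * provides a non-weak sketch_arch_name(), the linker picks that one
 * and this definition is silently dropped.
 */
SKETCH_WEAK const char *sketch_arch_name(void)
{
	return "generic";
}

/* Callers do not need #ifdefs; they just call the function. */
const char *sketch_describe(void)
{
	return sketch_arch_name();
}
```

When no arch-specific object overrides the function, as in this single-file sketch, the weak default is what runs. This is the property that lets the generic perf code carry a fallback while `arch/powerpc/util/sym-handling.c` supplies the override.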


[PATCH v3 6/7] perf probe/ppc64le: Prefer symbol table lookup over DWARF

2015-04-28 Thread Naveen N. Rao
Use symbol table lookups by default if DWARF is not necessary, since
powerpc ABIv2 encodes local entry points in the symbol table and the
function entry address in DWARF may not be appropriate for kprobes, as
described here:
https://sourceware.org/bugzilla/show_bug.cgi?id=17638

The DWARF address ranges deliberately include the *whole* function,
both global and local entry points.
...
If you want to set probes on a local entry point, you should look up
the symbol in the main symbol table (not DWARF), and check the st_other
bits; they will indicate whether the function has a local entry point,
and what its offset from the global entry point is.  Note that GDB does
the same when setting a breakpoint on a function entry.

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
---
Changes:
- No changes other than move to use __weak functions

 tools/perf/arch/powerpc/util/sym-handling.c | 8 
 tools/perf/util/probe-event.c   | 8 
 tools/perf/util/probe-event.h   | 1 +
 3 files changed, 17 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index 012a0f8..a170060 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -9,6 +9,7 @@
 #include "debug.h"
 #include "symbol.h"
 #include "map.h"
+#include "probe-event.h"
 
 #ifdef HAVE_LIBELF_SUPPORT
 bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
@@ -57,3 +58,10 @@ int arch__compare_symbol_names(const char *namea, const char *nameb)
 	return strcmp(namea, nameb);
 }
 #endif
+
+#if defined(_CALL_ELF) && _CALL_ELF == 2
+bool arch__prefers_symtab(void)
+{
+	return true;
+}
+#endif
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index a8c19d5..5e62a07 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2587,6 +2587,8 @@ err_out:
 	goto out;
 }
 
+bool __weak arch__prefers_symtab(void) { return false; }
+
 static int convert_to_probe_trace_events(struct perf_probe_event *pev,
 					  struct probe_trace_event **tevs,
 					  int max_tevs, const char *target)
@@ -2602,6 +2604,12 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
 		}
 	}
 
+	if (arch__prefers_symtab() && !perf_probe_event_need_dwarf(pev)) {
+		ret = find_probe_trace_events_from_map(pev, tevs, max_tevs, target);
+		if (ret > 0)
+			return ret; /* Found in symbol table */
+	}
+
 	/* Convert perf_probe_event with debuginfo */
 	ret = try_to_find_probe_trace_events(pev, tevs, max_tevs, target);
 	if (ret != 0)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index d6b7834..52bca4b 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -135,6 +135,7 @@ extern int show_available_vars(struct perf_probe_event *pevs, int npevs,
 			       struct strfilter *filter, bool externs);
 extern int show_available_funcs(const char *module, struct strfilter *filter,
 				bool user);
+bool arch__prefers_symtab(void);
 
 /* Maximum index number of event-name postfix */
 #define MAX_EVENT_INDEX	1024
-- 
2.3.5
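
The lookup order this patch introduces (symbol table first when the architecture prefers it and DWARF is not required, debuginfo otherwise) can be sketched with stand-in lookups. Everything here is hypothetical: `resolve_probe()`, `lookup_fn` and the two stubs are illustration only, and the return convention (positive means events found) is taken from the `ret > 0` check in the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lookup: >0 events found, 0 nothing found, <0 error. */
typedef int (*lookup_fn)(const char *symbol);

/* Stand-in symbol table that only knows about do_fork. */
int stub_symtab_lookup(const char *symbol)
{
	return strcmp(symbol, "do_fork") == 0 ? 1 : 0;
}

/* Stand-in debuginfo lookup; returns 2 so callers can tell it apart. */
int stub_debuginfo_lookup(const char *symbol)
{
	(void)symbol;
	return 2;
}

int resolve_probe(const char *symbol, bool prefers_symtab, bool needs_dwarf,
		  lookup_fn from_symtab, lookup_fn from_debuginfo)
{
	if (prefers_symtab && !needs_dwarf) {
		int ret = from_symtab(symbol);

		if (ret > 0)
			return ret;	/* found in the symbol table */
	}

	/* Either symtab was skipped or it found nothing: use debuginfo. */
	return from_debuginfo(symbol);
}
```

Note that a miss in the symbol table is not an error in this scheme; the code simply falls through to the debuginfo path, which is also what happens on architectures that keep the weak default.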


[PATCH v3 5/7] perf probe/ppc64le: Fix ppc64 ABIv2 symbol decoding

2015-04-28 Thread Naveen N. Rao
From: Ananth N Mavinakayanahalli ana...@in.ibm.com

ppc64 ELF ABIv2 has a Global Entry Point (GEP) and a Local Entry Point
(LEP). For purposes of probing, we need the LEP - the offset to which is
encoded in st_other.

Signed-off-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
---
Changes:
- No changes other than move to use __weak functions

 tools/perf/arch/powerpc/util/sym-handling.c | 7 +++
 tools/perf/util/symbol-elf.c| 4 
 tools/perf/util/symbol.h| 1 +
 3 files changed, 12 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c 
b/tools/perf/arch/powerpc/util/sym-handling.c
index 2de2cc4..012a0f8 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -17,6 +17,13 @@ bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
   ehdr.e_type == ET_REL ||
   ehdr.e_type == ET_DYN;
 }
+
+#if defined(_CALL_ELF)  _CALL_ELF == 2
+void arch__elf_sym_adjust(GElf_Sym *sym)
+{
+   sym-st_value += PPC64_LOCAL_ENTRY_OFFSET(sym-st_other);
+}
+#endif
 #endif
 
 #if !defined(_CALL_ELF) || _CALL_ELF != 2
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7f5d237..9d526a5 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -776,6 +776,8 @@ static bool want_demangle(bool is_kernel_sym)
return is_kernel_sym ? symbol_conf.demangle_kernel : symbol_conf.demangle;
 }
 
+void __weak arch__elf_sym_adjust(GElf_Sym *sym __maybe_unused) { }
+
 int dso__load_sym(struct dso *dso, struct map *map,
  struct symsrc *syms_ss, struct symsrc *runtime_ss,
  symbol_filter_t filter, int kmodule)
@@ -940,6 +942,8 @@ int dso__load_sym(struct dso *dso, struct map *map,
(sym.st_value & 1))
--sym.st_value;
 
+   arch__elf_sym_adjust(&sym);
+
if (dso->kernel || kmodule) {
char dso_name[PATH_MAX];
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd50ba0..9096529 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -305,6 +305,7 @@ int setup_intlist(struct intlist **list, const char *list_str,
 
 #ifdef HAVE_LIBELF_SUPPORT
 bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);
+void arch__elf_sym_adjust(GElf_Sym *sym);
 #endif
 
 #define SYMBOL_A 0
-- 
2.3.5

[PATCH v3 2/7] perf probe/ppc: Fix symbol fixup issues due to ELF type

2015-04-28 Thread Naveen N. Rao
If using the symbol table, symbol addresses are not being fixed up
properly, resulting in probes being placed at wrong addresses:

  # perf probe do_fork
  Added new event:
probe:do_fork(on do_fork)

  You can now use it in all perf tools, such as:

  perf record -e probe:do_fork -aR sleep 1

  # cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/do_fork _text+635952
  # printf %x 635952
  9b430
  # grep do_fork /boot/System.map
  c00ab430 T .do_fork

Fix by checking for ELF type ET_DYN used by ppc64 kernels.
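
To make the rule change concrete, here is a minimal sketch of the two
predicates. In the example above the probe lands at _text+0x9b430 (635952
decimal) while System.map places .do_fork at an address ending in ab430, so
the symbol was not adjusted. The function names and the SK_ET_* constants are
hypothetical stand-ins; the numeric e_type values are the ones from the ELF
specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* ELF e_type values (hypothetical local names for ET_REL/ET_EXEC/ET_DYN). */
enum { SK_ET_REL = 1, SK_ET_EXEC = 2, SK_ET_DYN = 3 };

/* Generic rule: only ET_EXEC and ET_REL objects get symbol adjustment. */
static inline bool needs_adjust_generic(uint16_t e_type)
{
	return e_type == SK_ET_EXEC || e_type == SK_ET_REL;
}

/* powerpc rule from this patch: ET_DYN (ppc64 kernels) is adjusted too. */
static inline bool needs_adjust_ppc(uint16_t e_type)
{
	return needs_adjust_generic(e_type) || e_type == SK_ET_DYN;
}
```

In the patch itself the generic rule is a __weak default and the powerpc rule
overrides it at link time, so only ppc builds see the new behaviour.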

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
---
Changes:
- Introduce arch helper to limit change to ppc.

 tools/perf/arch/powerpc/util/Build          |  1 +
 tools/perf/arch/powerpc/util/sym-handling.c | 19 +++++++++++++++++++
 tools/perf/util/symbol-elf.c                |  8 ++++++--
 tools/perf/util/symbol.h                    |  4 ++++
 4 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/sym-handling.c

diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index 0af6e9b..7b8b0d1 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,4 +1,5 @@
 libperf-y += header.o
+libperf-y += sym-handling.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
new file mode 100644
index 000..c9de001
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -0,0 +1,19 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2015 Naveen N. Rao, IBM Corporation
+ */
+
+#include "debug.h"
+#include "symbol.h"
+
+#ifdef HAVE_LIBELF_SUPPORT
+bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
+{
+   return ehdr.e_type == ET_EXEC ||
+  ehdr.e_type == ET_REL ||
+  ehdr.e_type == ET_DYN;
+}
+#endif
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index ac09c60..7f5d237 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -630,6 +630,11 @@ void symsrc__destroy(struct symsrc *ss)
close(ss->fd);
 }
 
+bool __weak elf__needs_adjust_symbols(GElf_Ehdr ehdr)
+{
+   return ehdr.e_type == ET_EXEC || ehdr.e_type == ET_REL;
+}
+
 int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 enum dso_binary_type type)
 {
@@ -712,8 +717,7 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 ".gnu.prelink_undo",
 NULL) != NULL);
} else {
-   ss->adjust_symbols = ehdr.e_type == ET_EXEC ||
-ehdr.e_type == ET_REL;
+   ss->adjust_symbols = elf__needs_adjust_symbols(ehdr);
}
 
ss->name   = strdup(name);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 0956150..8cb0af4 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -303,4 +303,8 @@ int setup_list(struct strlist **list, const char *list_str,
 int setup_intlist(struct intlist **list, const char *list_str,
  const char *list_name);
 
+#ifdef HAVE_LIBELF_SUPPORT
+bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);
+#endif
+
 #endif /* __PERF_SYMBOL */
-- 
2.3.5

[PATCH v3 7/7] perf probe/ppc64le: Fixup function entry if using kallsyms lookup

2015-04-28 Thread Naveen N. Rao
On powerpc ABIv2, if no debug-info is found and we use kallsyms, we need
to fix up the function entry to point to the local entry point. Use an
offset of 8 since current toolchains always generate 2 instructions (8
bytes).
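
The fix-up described above amounts to the following small sketch. The struct
and function names are hypothetical stand-ins for the probe trace point in
perf; only the constant 8 and the two conditions (kernel probe, kallsyms as
the symbol source) come from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define LEP_OFFSET 8	/* two 4-byte instructions after the global entry */

/* Hypothetical stand-in for the address/offset pair of a trace point. */
struct probe_point_sketch {
	uint64_t address;
	uint64_t offset;
};

/* Move a kernel probe from the global to the local entry point. */
static inline void fix_entry_for_kallsyms(struct probe_point_sketch *pt,
					  bool is_uprobe, bool from_kallsyms)
{
	if (!is_uprobe && from_kallsyms) {
		pt->address += LEP_OFFSET;
		pt->offset += LEP_OFFSET;
	}
}
```

Because the offset is hard-coded, this only holds as long as the toolchain
keeps emitting a two-instruction global entry sequence.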

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
---
Changes:
- No changes other than move to use __weak functions

 tools/perf/arch/powerpc/util/sym-handling.c | 15 +++++++++++++++
 tools/perf/util/probe-event.c               |  5 +++++
 tools/perf/util/probe-event.h               |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index a170060..bbc1a50 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -64,4 +64,19 @@ bool arch__prefers_symtab(void)
 {
return true;
 }
+
+#define PPC64LE_LEP_OFFSET 8
+
+void arch__fix_tev_from_maps(struct perf_probe_event *pev,
+struct probe_trace_event *tev, struct map *map)
+{
+   /*
+* ppc64 ABIv2 local entry point is currently always 2 instructions
+* (8 bytes) after the global entry point.
+*/
+   if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS) {
+       tev->point.address += PPC64LE_LEP_OFFSET;
+       tev->point.offset += PPC64LE_LEP_OFFSET;
+   }
+}
 #endif
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 5e62a07..afbb346 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2467,6 +2467,10 @@ static int find_probe_functions(struct map *map, char *name)
 #define strdup_or_goto(str, label) \
({ char *__p = strdup(str); if (!__p) goto label; __p; })
 
+void __weak arch__fix_tev_from_maps(struct perf_probe_event *pev __maybe_unused,
+   struct probe_trace_event *tev __maybe_unused,
+   struct map *map __maybe_unused) { }
+
 /*
  * Find probe function addresses from map.
  * Return an error or the number of found probe_trace_event
@@ -2573,6 +2577,7 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
strdup_or_goto(pev->args[i].type,
nomem_out);
}
+   arch__fix_tev_from_maps(pev, tev, map);
}
 
 out:
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 52bca4b..180f142 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -136,6 +136,8 @@ extern int show_available_vars(struct perf_probe_event *pevs, int npevs,
 extern int show_available_funcs(const char *module, struct strfilter *filter,
bool user);
 bool arch__prefers_symtab(void);
+void arch__fix_tev_from_maps(struct perf_probe_event *pev,
+struct probe_trace_event *tev, struct map *map);
 
 /* Maximum index number of event-name postfix */
 #define MAX_EVENT_INDEX	1024
-- 
2.3.5

Re: Build regressions/improvements in v4.1-rc1

2015-04-28 Thread Geert Uytterhoeven
cc Chris

On Tue, Apr 28, 2015 at 2:24 PM, Rusty Russell ru...@rustcorp.com.au wrote:
 Geert Uytterhoeven ge...@linux-m68k.org writes:
 Can't see that one with a simple grep: can you post warning?

 /home/kisskb/slave/src/arch/tile/kernel/setup.c: In function 
 'zone_sizes_init':
 /home/kisskb/slave/src/arch/tile/kernel/setup.c:777:3: warning:
 passing argument 2 of 'cpumask_test_cpu' from incompatible pointer
 type [enabled by default]
 /home/kisskb/slave/src/include/linux/cpumask.h:294:19: note: expected
 'const struct cpumask *' but argument is of type 'struct nodemask_t *'

 Um, I turned the cpu_isset() into cpumask_test_cpu(), but that just
 showed this bug up.  The tile maintainers need to fix this one.

 Thanks,
 Rusty.

Re: [PATCH v2 0/4] PCI: Generic fixes and moving enumeration back to QEMU

2015-04-28 Thread Alexey Kardashevskiy

On 04/27/2015 06:02 PM, Nikunj A Dadhania wrote:

patch 1:   Fixes a bug in the pci-to-pci bridge code while programming the
   limit registers during probe
patch 2,3: Support 64-bit address translation, and allow USB devices to
   have 64-bit PCI memory BARs
patch 4:   Use QEMU-created device tree nodes and program the BARs

I taught my git send-email wrapper script to add "kernel"/"qemu"/"slof" to
the subject; otherwise I believe it is hard for busy people to tell quickly
which patchset is for what. I'd recommend doing the same, especially
when you mention QEMU in a subject  :)

Nikunj A Dadhania (4):
   pci: program correct bridge limit registers during probe
   pci: Support 64-bit address translation
   usb: support 64-bit pci bars
   pci: Use QEMU created PCI device nodes

  board-qemu/slof/pci-phb.fs  | 44 -
  slof/fs/devices/pci-class_0c.fs | 10 --
  slof/fs/pci-properties.fs   |  6 +-
  slof/fs/pci-scan.fs |  6 +++---
  slof/fs/translate.fs|  6 ++
  5 files changed, 61 insertions(+), 11 deletions(-)

--
Alexey

Re: [PATCH kernel v9 01/32] powerpc/iommu: Split iommu_free_table into 2 helpers

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:25PM +1000, Alexey Kardashevskiy wrote:
 The iommu_free_table helper releases the memory it is using (the TCE table
 and @it_map) and releases the iommu_table struct as well. We might not want
 the very last step as we store iommu_table in parent structures.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [PATCH kernel v9 08/32] vfio: powerpc/spapr: Moving pinning/unpinning to helpers

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:32PM +1000, Alexey Kardashevskiy wrote:
 This is a pretty mechanical patch to make next patches simpler.
 
 The new tce_iommu_unuse_page() helper does put_page() now but it might skip
 that after the memory registering patch is applied.
 
 As we are here, this removes unnecessary checks for a value returned
 by pfn_to_page() as it cannot possibly return NULL.
 
 This moves tce_iommu_disable() later to let tce_iommu_clear() know if
 the container has been enabled because if it has not been, then
 put_page() must not be called on TCEs from the TCE table. This situation
 is not yet possible but it will be after the KVM acceleration patchset is
 applied.
 
 This changes code to work with physical addresses rather than linear
 mapping addresses for better code readability. Following patches will
 add an xchg() callback for an IOMMU table which will accept/return
 physical addresses (unlike current tce_build()) which will eliminate
 redundant conversions.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [PATCH v2 0/4] PCI: Generic fixes and moving enumeration back to QEMU

2015-04-28 Thread Nikunj A Dadhania
Alexey Kardashevskiy a...@ozlabs.ru writes:

 On 04/27/2015 06:02 PM, Nikunj A Dadhania wrote:
 patch 1:   Fixes a bug in the pci-to-pci bridge code while programming the
 limit registers during probe
 patch 2,3: Support 64-bit address translation, and allow USB devices to
 have 64-bit PCI memory BARs
 patch 4:   Use QEMU-created device tree nodes and program the BARs



 I taught my git send-email wrapper script to add "kernel"/"qemu"/"slof" to
 the subject; otherwise I believe it is hard for busy people to tell quickly
 which patchset is for what. I'd recommend doing the same, especially
 when you mention QEMU in a subject  :)

Adding it to the Subject would put "slof" in the commit title, which is
what we do not want. So I will add SLOF inside the [ ] from now on,
something like this: [PATCH SLOF version]

Regards
Nikunj

Re: [PATCH kernel v9 13/32] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:37PM +1000, Alexey Kardashevskiy wrote:
 This adds tce_iommu_take_ownership() and tce_iommu_release_ownership(),
 which call iommu_take_ownership()/iommu_release_ownership() in a loop
 for every table in the group. As there is just one now, no change in
 behaviour is expected.
 
 At the moment the iommu_table struct has a set_bypass() callback which
 enables/disables DMA bypass on an IODA2 PHB. This is exposed to the POWERPC
 IOMMU code, which calls this callback when external IOMMU users such as
 VFIO are about to take over a PHB.
 
 The set_bypass() callback is not really an iommu_table function but
 an IOMMU/PE function. This introduces an iommu_table_group_ops struct and
 adds take_ownership()/release_ownership() callbacks to it which are
 called when an external user takes/releases control over the IOMMU.
 
 This replaces set_bypass() with ownership callbacks as it is not
 necessarily just bypass enabling, it can be something else/more,
 so let's give it a more generic name.
 
 The callbacks are implemented for IODA2 only. Other platforms (P5IOC2,
 IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
 The following patches will replace iommu_take_ownership/
 iommu_release_ownership calls in IODA2 with full IOMMU table release/
 create.
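
The callback arrangement described above could look roughly like the sketch
below. All names ending in _sketch, plus group_take_ownership() and
group_release_ownership(), are hypothetical and exist only for illustration.
The -1 return models platforms without group callbacks (P5IOC2, IODA1), where
the caller would fall back to the per-table
iommu_take_ownership()/iommu_release_ownership() path.

```c
#include <stddef.h>

struct table_group_sketch;

/*
 * take:    core kernel -> external user (VFIO or similar)
 * release: external user -> core kernel
 */
struct table_group_ops_sketch {
	void (*take_ownership)(struct table_group_sketch *grp);
	void (*release_ownership)(struct table_group_sketch *grp);
};

struct table_group_sketch {
	const struct table_group_ops_sketch *ops; /* NULL: no group callbacks */
	int externally_owned;
};

/* Example platform callbacks; a real one would also switch DMA bypass. */
static void sketch_take(struct table_group_sketch *grp)
{
	grp->externally_owned = 1;
}

static void sketch_release(struct table_group_sketch *grp)
{
	grp->externally_owned = 0;
}

static const struct table_group_ops_sketch sketch_ops = {
	.take_ownership = sketch_take,
	.release_ownership = sketch_release,
};

/* Return 0 if the group callback ran, -1 if the caller must fall back. */
static int group_take_ownership(struct table_group_sketch *grp)
{
	if (!grp->ops || !grp->ops->take_ownership)
		return -1;
	grp->ops->take_ownership(grp);
	return 0;
}

static int group_release_ownership(struct table_group_sketch *grp)
{
	if (!grp->ops || !grp->ops->release_ownership)
		return -1;
	grp->ops->release_ownership(grp);
	return 0;
}
```

Keeping the callbacks on the group rather than on each table matches the
point made above that bypass is a property of the IOMMU/PE, not of a single
table.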
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com
 ---
 Changes:
 v9:
 * squashed vfio: powerpc/spapr: powerpc/iommu: Rework IOMMU ownership 
 control
 and vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework IOMMU ownership 
 control
 into a single patch
 * moved helpers with a loop through tables in a group
 to vfio_iommu_spapr_tce.c to keep the platform code free of IOMMU table
 groups as much as possible
 * added missing tce_iommu_clear() to tce_iommu_release_ownership()
 * replaced the set_ownership(enable) callback with take_ownership() and
 release_ownership()
 ---
  arch/powerpc/include/asm/iommu.h  | 13 +-
  arch/powerpc/kernel/iommu.c   | 11 --
  arch/powerpc/platforms/powernv/pci-ioda.c | 40 +++
  drivers/vfio/vfio_iommu_spapr_tce.c   | 66 
 +++
  4 files changed, 103 insertions(+), 27 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index fa37519..e63419e 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -93,7 +93,6 @@ struct iommu_table {
   unsigned long  it_page_shift;/* table iommu page size */
   struct iommu_table_group *it_table_group;
   struct iommu_table_ops *it_ops;
 - void (*set_bypass)(struct iommu_table *tbl, bool enable);
  };
  
  /* Pure 2^n version of get_order */
 @@ -128,11 +127,23 @@ extern struct iommu_table *iommu_init_table(struct 
 iommu_table * tbl,
  
  #define IOMMU_TABLE_GROUP_MAX_TABLES 1
  
 +struct iommu_table_group;
 +
 +struct iommu_table_group_ops {
 + /*
 +  * Switches ownership from the kernel itself to an external
 +  * user. While onwership is taken, the kernel cannot use IOMMU itself.

Typo in "onwership".  I'd also like to see this be even more explicit
that "take" is the "core kernel -> vfio/whatever" transition and
"release" is the reverse.

 +  */
 + void (*take_ownership)(struct iommu_table_group *table_group);
 + void (*release_ownership)(struct iommu_table_group *table_group);
 +};
 +
  struct iommu_table_group {
  #ifdef CONFIG_IOMMU_API
   struct iommu_group *group;
  #endif
   struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
 + struct iommu_table_group_ops *ops;
  };
  
  #ifdef CONFIG_IOMMU_API
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index 005146b..2856d27 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -1057,13 +1057,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
  
   memset(tbl->it_map, 0xff, sz);
  
 - /*
 -  * Disable iommu bypass, otherwise the user can DMA to all of
 -  * our physical memory via the bypass window instead of just
 -  * the pages that has been explicitly mapped into the iommu
 -  */
 - if (tbl->set_bypass)
 - 	tbl->set_bypass(tbl, false);
  
   return 0;
  }
 @@ -1078,10 +1071,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
   /* Restore bit#0 set by iommu_init_table() */
   if (tbl->it_offset == 0)
   	set_bit(0, tbl->it_map);
 -
 - /* The kernel owns the device now, we can restore the iommu bypass */
 - if (tbl->set_bypass)
 - 	tbl->set_bypass(tbl, true);
  }
  EXPORT_SYMBOL_GPL(iommu_release_ownership);
  
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index 88472cb..718d5cc 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -1870,10 +1870,8 @@ static void 

Re: [PATCH kernel v9 09/32] vfio: powerpc/spapr: Rework groups attaching

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:33PM +1000, Alexey Kardashevskiy wrote:
 This is to make extended ownership and multiple groups support patches
 simpler for review.
 
 This should cause no behavioural change.

Um.. this doesn't appear to be true.  Previously removing a group from
an enabled container would fail with EBUSY, now it forces a disable.

 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com
 Reviewed-by: David Gibson da...@gibson.dropbear.id.au
 ---
  drivers/vfio/vfio_iommu_spapr_tce.c | 40 
 ++---
  1 file changed, 24 insertions(+), 16 deletions(-)
 
 diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
 b/drivers/vfio/vfio_iommu_spapr_tce.c
 index 115d5e6..0fbe03e 100644
 --- a/drivers/vfio/vfio_iommu_spapr_tce.c
 +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
 @@ -460,16 +460,21 @@ static int tce_iommu_attach_group(void *iommu_data,
   			iommu_group_id(container->tbl->it_group),
   			iommu_group_id(iommu_group));
   	ret = -EBUSY;
 - } else if (container->enabled) {
 + 	goto unlock_exit;
 + }
 +
 + if (container->enabled) {
   	pr_err("tce_vfio: attaching group #%u to enabled container\n",
   			iommu_group_id(iommu_group));
   	ret = -EBUSY;
 - } else {
 - 	ret = iommu_take_ownership(tbl);
 - 	if (!ret)
 - 		container->tbl = tbl;
 + 	goto unlock_exit;
   }
  
 + ret = iommu_take_ownership(tbl);
 + if (!ret)
 + 	container->tbl = tbl;
 +
 +unlock_exit:
   mutex_unlock(&container->lock);
  
   return ret;
 @@ -487,19 +492,22 @@ static void tce_iommu_detach_group(void *iommu_data,
   	pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
   			iommu_group_id(iommu_group),
   			iommu_group_id(tbl->it_group));
 - } else {
 - 	if (container->enabled) {
 - 		pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
 - 				iommu_group_id(tbl->it_group));
 - 		tce_iommu_disable(container);
 - 	}
 + 	goto unlock_exit;
 + }
  
 - 	/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
 - 			iommu_group_id(iommu_group), iommu_group); */
 - 	container->tbl = NULL;
 - 	tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
 - 	iommu_release_ownership(tbl);
 + if (container->enabled) {
 + 	pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
 + 			iommu_group_id(tbl->it_group));
 + 	tce_iommu_disable(container);
   }
 +
 + /* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
 +    iommu_group_id(iommu_group), iommu_group); */
 + container->tbl = NULL;
 + tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
 + iommu_release_ownership(tbl);
 +
 +unlock_exit:
   mutex_unlock(&container->lock);
  }
  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [PATCH v3 0/7] Fixes for perf probe issues on ppc

2015-04-28 Thread Naveen N. Rao
On 2015/04/29 07:17AM, Masami Hiramatsu wrote:
 On 2015/04/28 22:54, Arnaldo Carvalho de Melo wrote:
  Em Tue, Apr 28, 2015 at 05:35:33PM +0530, Naveen N. Rao escreveu:
  This patchset fixes various issues with perf probe on powerpc across ABIv1 
  and
  ABIv2:
  - in the presence of DWARF debug-info,
  - in the absence of DWARF, but with the symbol table, and
  - in the absence of debug-info, but with kallsyms.
 
  Arnaldo,
  I have moved all patches to use __weak functions. Kindly take a look and 
  let me
  know if this is what you had in mind.
  
  Ok, I applied all but the first, for which I am waiting for Masami's
  reaction, I kept Srikar's reviewed-by for the other patches, but would
  as well like to get his word that he keeps it after the __weak changes.
  
  So, for now, I'll leave it for a while sitting in my local tree, to give
  time to Masami and Srikar, ok?
 
 OK, I reviewed all the patches in this series :)

Thanks for the review!
- Naveen

 
 Acked-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
 for this series.
 
 Thank you!
 
 -- 
 Masami HIRAMATSU
 Linux Technology Research Center, System Productivity Research Dept.
 Center for Technology Innovation - Systems Engineering
 Hitachi, Ltd., Research & Development Group
 E-mail: masami.hiramatsu...@hitachi.com
 

Re: [PATCH kernel v9 22/32] powerpc/powernv: Implement multilevel TCE tables

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:46PM +1000, Alexey Kardashevskiy wrote:
 TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
 on huge guests (hundreds of GB of RAM), so the kernel might be unable to
 allocate a contiguous chunk of physical memory to store the TCE table.
 
 To address this, the POWER8 CPU (actually, IODA2) supports multi-level TCE
 tables of up to 5 levels, which split the table into a tree of smaller
 subtables.
 
 This adds multi-level TCE tables support to pnv_pci_create_table()
 and pnv_pci_free_table() helpers.
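
To put numbers on the problem and on the table walk, here is a small sketch.
Both helper names are hypothetical. The 8-byte entry size follows from the
"size << 3" and __be64 usage in the diff below; the index split assumes every
level has the same power-of-two number of entries, which is how I read the
it_level_size field.

```c
#include <stdint.h>

#define TCE_ENTRY_BYTES 8	/* each TCE is one 64-bit word */

/* Bytes needed for a flat, single-level table covering window_bytes. */
static inline uint64_t tce_flat_table_bytes(uint64_t window_bytes,
					    unsigned int page_shift)
{
	return (window_bytes >> page_shift) * TCE_ENTRY_BYTES;
}

/*
 * Index into the subtable at a given level for TCE number idx.
 * level 0 is the leaf; level_shift is log2(entries per level).
 */
static inline uint64_t tce_level_index(uint64_t idx, unsigned int level,
				       unsigned int level_shift)
{
	uint64_t mask = ((uint64_t)1 << level_shift) - 1;

	return (idx >> (level * level_shift)) & mask;
}
```

For example, a 256GB window with 4K pages needs a 512MB physically contiguous
flat table, whereas with 512-entry levels each subtable fits in a single 4K
page.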
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v9:
 * moved from ioda2 to common powernv pci code
 * fixed cleanup if allocation fails in the middle
 * removed check for the size - all boundary checks happen in the calling code
 anyway
 ---
  arch/powerpc/include/asm/iommu.h  |  2 +
  arch/powerpc/platforms/powernv/pci-ioda.c | 15 +++--
  arch/powerpc/platforms/powernv/pci.c  | 94 
 +--
  arch/powerpc/platforms/powernv/pci.h  |  4 +-
  4 files changed, 104 insertions(+), 11 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index 7e7ca0a..0f50ee2 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -96,6 +96,8 @@ struct iommu_pool {
  struct iommu_table {
   unsigned long  it_busno; /* Bus number this table belongs to */
   unsigned long  it_size;  /* Size of iommu table in entries */
 + unsigned long  it_indirect_levels;
 + unsigned long  it_level_size;
   unsigned long  it_offset;/* Offset into global table */
   unsigned long  it_base;  /* mapped address of tce table */
   unsigned long  it_index; /* which iommu table this is */
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index 59baa15..cc1d09c 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -1967,13 +1967,17 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
   		table_group);
   struct pnv_phb *phb = pe->phb;
   int64_t rc;
 + const unsigned long size = tbl->it_indirect_levels ?
 + 	tbl->it_level_size : tbl->it_size;
   const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
   const __u64 win_size = tbl->it_size << tbl->it_page_shift;
  
   pe_info(pe, "Setting up window at %llx..%llx "
 - 		"pgsize=0x%x tablesize=0x%lx\n",
 + 		"pgsize=0x%x tablesize=0x%lx "
 + 		"levels=%d levelsize=%x\n",
   		start_addr, start_addr + win_size - 1,
 - 		1UL << tbl->it_page_shift, tbl->it_size << 3);
 + 		1UL << tbl->it_page_shift, tbl->it_size << 3,
 + 		tbl->it_indirect_levels + 1, tbl->it_level_size << 3);
  
   tbl->it_table_group = &pe->table_group;
  
 @@ -1984,9 +1988,9 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
   rc = opal_pci_map_pe_dma_window(phb->opal_id,
   		pe->pe_number,
   		pe->pe_number << 1,
 - 		1,
 + 		tbl->it_indirect_levels + 1,
   		__pa(tbl->it_base),
 - 		tbl->it_size << 3,
 + 		size << 3,
   		1ULL << tbl->it_page_shift);
   if (rc) {
   	pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
 @@ -2099,7 +2103,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
   		phb->ioda.m32_pci_base);
  
   rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
 - 		0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
 + 		0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base,
 + 		POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
   if (rc) {
   	pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
   return;
 diff --git a/arch/powerpc/platforms/powernv/pci.c 
 b/arch/powerpc/platforms/powernv/pci.c
 index 6bcfad5..fc129c4 100644
 --- a/arch/powerpc/platforms/powernv/pci.c
 +++ b/arch/powerpc/platforms/powernv/pci.c
 @@ -46,6 +46,8 @@
  #define cfg_dbg(fmt...)  do { } while(0)
  //#define cfg_dbg(fmt...)printk(fmt)
  
 +#define ROUND_UP(x, n) (((x) + (n) - 1ULL) & ~((n) - 1ULL))

Use the existing ALIGN_UP macro instead of creating a new one.

  #ifdef CONFIG_PCI_MSI
  static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
  {
 @@ -577,6 +579,19 @@ struct pci_ops pnv_pci_ops = {
  static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
  {
   __be64 *tmp = ((__be64 *)tbl->it_base);
 + int  level = tbl->it_indirect_levels;
 + const long shift = ilog2(tbl->it_level_size);
 + unsigned long mask = (tbl->it_level_size - 1) << (level * shift);
 +
 + while (level) {
 + int n = (idx  

Re: [PATCH kernel v9 19/32] powerpc/powernv/ioda2: Rework iommu_table creation

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:43PM +1000, Alexey Kardashevskiy wrote:
 This moves iommu_table creation to the beginning to make following changes
 easier to review. This starts using table parameters from the iommu_table
 struct.
 
 This should cause no behavioural change.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [PATCH kernel v9 16/32] powerpc/powernv/ioda: Move TCE kill register address to PE

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:40PM +1000, Alexey Kardashevskiy wrote:
 At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property
 which contains the TCE kill register address. Writes to this register
 invalidate the TCE cache on the IODA/IODA2 hub.
 
 This moves the register address from iommu_table to pnv_ioda_pe as
 later there will be 2 tables per PE and it will be used for both tables.
 
 This moves the property reading/remapping code to a helper to reduce
 code duplication.
 
 This adds a new pnv_pci_ioda2_tvt_invalidate() helper which invalidates
 the entire table. It should be called after every call to
 opal_pci_map_pe_dma_window(). It was not required before because
 there is just a single TCE table and 64bit DMA is handled via bypass
 window (which has no table so no cache is used) but this is going
 to change with Dynamic DMA windows (DDW).
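
A minimal sketch of the value that the new helper writes to the TCE kill
register is shown below. The function name tce_kill_pe_value() is
hypothetical, and the shift and mask operators are my reading of the quoted
code further down (they were lost in this archive copy), so treat the exact
bit positions as an assumption.

```c
#include <stdint.h>

/*
 * Build the register value for "invalidate all TCEs that match PE#".
 * Assumed layout: 0x4 in the top nibble, PE number in the low 8 bits.
 */
static inline uint64_t tce_kill_pe_value(uint64_t pe_number)
{
	return (UINT64_C(0x4) << 60) | (pe_number & 0xFF);
}
```

The result is data for the register, not an address; the real helper also
converts it to big-endian before the raw MMIO write.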
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v9:
 * new in the series
 ---
  arch/powerpc/platforms/powernv/pci-ioda.c | 69 
 +++
  arch/powerpc/platforms/powernv/pci.h  |  1 +
  2 files changed, 44 insertions(+), 26 deletions(-)
 
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index f070c44..b22b3ca 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -1672,7 +1672,7 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
   		struct pnv_ioda_pe, table_group);
   __be64 __iomem *invalidate = rm ?
   	(__be64 __iomem *)pe->tce_inval_reg_phys :
 - 	(__be64 __iomem *)tbl->it_index;
 + 	pe->tce_inval_reg;
   unsigned long start, end, inc;
   const unsigned shift = tbl-it_page_shift;
  
 @@ -1743,6 +1743,18 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
   .get = pnv_tce_get,
  };
  
 +static inline void pnv_pci_ioda2_tvt_invalidate(struct pnv_ioda_pe *pe)
 +{
 + /* 01xb - invalidate TCEs that match the specified PE# */
 + unsigned long addr = (0x4ull << 60) | (pe->pe_number & 0xFF);

This doesn't really look like an address, but rather the data you're
writing to the register.

 + if (!pe->tce_inval_reg)
 + 	return;
 +
 + mb(); /* Ensure above stores are visible */
 + __raw_writeq(cpu_to_be64(addr), pe->tce_inval_reg);
 +}
 +
  static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
   unsigned long index, unsigned long npages, bool rm)
  {
 @@ -1751,7 +1763,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
   unsigned long start, end, inc;
   __be64 __iomem *invalidate = rm ?
   	(__be64 __iomem *)pe->tce_inval_reg_phys :
 - 	(__be64 __iomem *)tbl->it_index;
 + 	pe->tce_inval_reg;
   const unsigned shift = tbl-it_page_shift;
  
   /* We'll invalidate DMA address in PE scope */
 @@ -1803,13 +1815,31 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
   .get = pnv_tce_get,
  };
  
 +static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
 + struct pnv_ioda_pe *pe)
 +{
 + const __be64 *swinvp;
 +
 + /* OPAL variant of PHB3 invalidated TCEs */
 + swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
 + if (!swinvp)
 + return;
 +
 + /* We need a couple more fields -- an address and a data
 +  * to or.  Since the bus is only printed out on table free
 +  * errors, and on the first pass the data will be a relative
 +  * bus number, print that out instead.
 +  */

The comment above appears to have nothing to do with the surrounding code.

 + pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
 + pe->tce_inval_reg = ioremap(pe->tce_inval_reg_phys, 8);
 +}
 +
  static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 struct pnv_ioda_pe *pe, unsigned int base,
 unsigned int segs)
  {
  
   struct page *tce_mem = NULL;
 - const __be64 *swinvp;
   struct iommu_table *tbl;
   unsigned int i;
   int64_t rc;
 @@ -1823,6 +1853,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
   if (WARN_ON(pe->tce32_seg >= 0))
   return;
  
 + pnv_pci_ioda_setup_opal_tce_kill(phb, pe);
 +
   /* Grab a 32-bit TCE table */
   pe->tce32_seg = base;
   pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
 @@ -1865,20 +1897,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 base << 28, IOMMU_PAGE_SHIFT_4K);
  
   /* OPAL variant of P7IOC SW invalidated TCEs */
 - swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
 - if (swinvp) {
 - /* We need a couple more fields -- an address and a data
 -  * to or.  Since the bus is only printed out on table free
 -  * errors, and on the first 

Re: [PATCH kernel v9 12/32] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:36PM +1000, Alexey Kardashevskiy wrote:
 Modern IBM POWERPC systems support multiple (currently two) TCE tables
 per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
 for TCE tables. Right now just one table is supported.
 
 For P5IOC2 and IODA, iommu_table_group is embedded into PE struct
 (pnv_ioda_pe and pnv_phb) and does not require iommu_free_table(), only
 iommu_reset_table().
 
 For pSeries, this replaces multiple calls of kzalloc_node() with a new
 iommu_pseries_group_alloc() helper and stores the table group struct
 pointer into the pci_dn struct. For release, an iommu_table_group_free()
 helper is added.
 
 This should cause no behavioural change.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com

I'm not particularly fond of the table_group name, but I can't
really think of a better name for now.  So,

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel v9 25/32] vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:49PM +1000, Alexey Kardashevskiy wrote:
 Before the IOMMU user (VFIO) would take control over the IOMMU table
 belonging to a specific IOMMU group. This approach did not allow sharing
 tables between IOMMU groups attached to the same container.
 
 This introduces a new IOMMU ownership flavour when the user can not
 just control the existing IOMMU table but remove/create tables on demand.
 If an IOMMU implements take/release_ownership() callbacks, this lets
 the user have full control over the IOMMU group. When the ownership is taken,
 the platform code removes all the windows so the caller must create them.
 Before returning the ownership back to the platform code, VFIO
 unprograms and removes all the tables it created.
 
 This changes IODA2's onwership handler to remove the existing table

onwership

 rather than manipulating with the existing one. From now on,
 iommu_take_ownership() and iommu_release_ownership() are only called
 from the vfio_iommu_spapr_tce driver.
 
 In tce_iommu_detach_group(), this copies an iommu_table descriptor on stack
 as IODA2's unset_window() will clear the descriptor embedded into PE
 and we will not be able to free the table afterwards.
 This is a transitional hack and following patches will replace this code
 anyway.
 
 Old-style ownership is still supported allowing VFIO to run on older
 P5IOC2 and IODA IO controllers.
 
 No change in userspace-visible behaviour is expected. Since it recreates
 TCE tables on each ownership change, related kernel traces will appear
 more often.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com
 ---
 Changes:
 v9:
 * fixed crash in tce_iommu_detach_group() on tbl->it_ops->free as
 tce_iommu_attach_group() used to initialize the table from a descriptor
 on stack (it does not matter for the series as this bit is changed later
 anyway but it ruins bisectability)
 
 v6:
 * fixed commit log that VFIO removes tables before passing ownership
 back to the platform code, not userspace
 
 1
 ---
  arch/powerpc/platforms/powernv/pci-ioda.c | 27 +++--
  drivers/vfio/vfio_iommu_spapr_tce.c   | 33 
 +--
  2 files changed, 56 insertions(+), 4 deletions(-)
 
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index 2a4b2b2..45bc131 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -2105,16 +2105,39 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
   struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
   table_group);
  
 - iommu_take_ownership(&table_group->tables[0]);
   pnv_pci_ioda2_set_bypass(pe, false);
 + pnv_pci_ioda2_unset_window(&pe->table_group, 0);
 + pnv_pci_free_table(&pe->table_group.tables[0]);
  }
  
  static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
  {
   struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
   table_group);
 + struct iommu_table *tbl = &pe->table_group.tables[0];
 + int64_t rc;
 +
 + rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
 + IOMMU_PAGE_SHIFT_4K,
 + pe->phb->ioda.m32_pci_base,
 + POWERNV_IOMMU_DEFAULT_LEVELS, tbl);
 + if (rc) {
 + pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
 + rc);
 + return;
 + }
 +
 + tbl->it_table_group = &pe->table_group;
 + iommu_init_table(tbl, pe->phb->hose->node);
 +
 + rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
 + if (rc) {
 + pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
 + rc);
 + pnv_pci_free_table(tbl);
 + return;
 + }

It seems like you want a helper function called both here and in the
initial PE setup.  Otherwise you encourage future bugs where the
initial PE setup changes, but taking and releasing IOMMU ownership
from VFIO no longer sets up exactly the same thing again.
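
Roughly what I have in mind, as a userspace sketch only. Every name here (struct demo_pe, demo_setup_default_window() and friends) is made up for illustration and none of it is the real powernv code. The point is just that one function defines the default window and both paths call it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct pnv_ioda_pe. */
struct demo_pe {
	bool table_created;
	bool window_set;
	unsigned int page_shift;
};

/* The single place that defines what the default 32-bit window looks like. */
static int demo_setup_default_window(struct demo_pe *pe)
{
	assert(pe != NULL);

	pe->page_shift = 12;		/* 4K IOMMU pages */
	pe->table_created = true;
	pe->window_set = true;
	return 0;
}

/* Initial PE setup path. */
static int demo_initial_pe_setup(struct demo_pe *pe)
{
	return demo_setup_default_window(pe);
}

/* VFIO takes ownership: the platform removes its window and table. */
static void demo_take_ownership(struct demo_pe *pe)
{
	pe->window_set = false;
	pe->table_created = false;
}

/* VFIO returns ownership: recreate exactly what initial setup created. */
static int demo_release_ownership(struct demo_pe *pe)
{
	return demo_setup_default_window(pe);
}
```

With that shape, a later change to the default window only has to be made in one function.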

 - iommu_release_ownership(&table_group->tables[0]);
   pnv_pci_ioda2_set_bypass(pe, true);
  }
  
 diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
 index 2d51bbf..892a584 100644
 --- a/drivers/vfio/vfio_iommu_spapr_tce.c
 +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
 @@ -569,6 +569,10 @@ static int tce_iommu_attach_group(void *iommu_data,
   if (!table_group->ops || !table_group->ops->take_ownership ||
   !table_group->ops->release_ownership) {
   ret = tce_iommu_take_ownership(table_group);
 + } else if (!table_group->ops->create_table ||
 + !table_group->ops->set_window) {
 + 

Re: [PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:45PM +1000, Alexey Kardashevskiy wrote:
 This is a part of moving DMA window programming to an iommu_ops
 callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
 a first parameter (not pnv_ioda_pe) as it is going to be used as
 a callback for VFIO DDW code.
 
 This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is
 a good thing to do.

What's the TVT and why is invalidating it a good thing?

Also, it looks like this patch doesn't add it, it just moves it.

 It does not have immediate effect now as the table
 is never recreated after reboot but it will in the following patches.
 
 This should cause no behavioural change.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Reviewed-by: David Gibson da...@gibson.dropbear.id.au

Really?  I don't remember this one.

 ---
 Changes:
 v9:
 * initialize pe->table_group.tables[0] at the very end when
 tbl is fully initialized
 * moved pnv_pci_ioda2_tvt_invalidate() from earlier patch
 ---
  arch/powerpc/platforms/powernv/pci-ioda.c | 67 
 +++
  1 file changed, 51 insertions(+), 16 deletions(-)
 
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index b9b3773..59baa15 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -1960,6 +1960,52 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
   __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
  }
  
 +static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
 + struct iommu_table *tbl)
 +{
 + struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 + table_group);
 + struct pnv_phb *phb = pe->phb;
 + int64_t rc;
 + const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
 + const __u64 win_size = tbl->it_size << tbl->it_page_shift;
 +
 + pe_info(pe, "Setting up window at %llx..%llx "
 + "pgsize=0x%x tablesize=0x%lx\n",
 + start_addr, start_addr + win_size - 1,
 + 1UL << tbl->it_page_shift, tbl->it_size << 3);
 +
 + tbl->it_table_group = &pe->table_group;
 +
 + /*
 +  * Map TCE table through TVT. The TVE index is the PE number
 +  * shifted by 1 bit for 32-bits DMA space.
 +  */
 + rc = opal_pci_map_pe_dma_window(phb->opal_id,
 + pe->pe_number,
 + pe->pe_number << 1,
 + 1,
 + __pa(tbl->it_base),
 + tbl->it_size << 3,
 + 1ULL << tbl->it_page_shift);
 + if (rc) {
 + pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
 + goto fail;
 + }
 +
 + pnv_pci_ioda2_tvt_invalidate(pe);
 +
 + /* Store fully initialized *tbl (may be external) in PE */
 + pe->table_group.tables[0] = *tbl;

Hrm, a non-atomic copy of a whole structure into the array.  Is that
really what you want?

 + return 0;
 +fail:
 + if (pe->tce32_seg >= 0)
 + pe->tce32_seg = -1;
 +
 + return rc;
 +}
 +
  static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
  {
   uint16_t window_id = (pe->pe_number << 1 ) + 1;
 @@ -2068,21 +2114,16 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
   pe->table_group.ops = &pnv_pci_ioda2_ops;
  #endif
  
 - /*
 -  * Map TCE table through TVT. The TVE index is the PE number
 -  * shifted by 1 bit for 32-bits DMA space.
 -  */
 - rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
 - pe->pe_number << 1, 1, __pa(tbl->it_base),
 - tbl->it_size << 3, 1ULL << tbl->it_page_shift);
 + rc = pnv_pci_ioda2_set_window(&pe->table_group, tbl);
   if (rc) {
   pe_err(pe, "Failed to configure 32-bit TCE table,"
   " err %ld\n", rc);
 - goto fail;
 + pnv_pci_free_table(tbl);
 + if (pe->tce32_seg >= 0)
 + pe->tce32_seg = -1;
 + return;
   }
  
 - pnv_pci_ioda2_tvt_invalidate(pe);
 -
   /* OPAL variant of PHB3 invalidated TCEs */
   if (pe->tce_inval_reg)
   tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
 @@ -2103,12 +2144,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
   /* Also create a bypass window */
   if (!pnv_iommu_bypass_disabled)
   pnv_pci_ioda2_setup_bypass_pe(phb, pe);
 -
 - return;
 -fail:
 - if (pe->tce32_seg >= 0)
 - pe->tce32_seg = -1;
 - pnv_pci_free_table(tbl);
  }
  }
  
  static void pnv_ioda_setup_dma(struct pnv_phb *phb)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH kernel v9 14/32] powerpc/iommu: Fix IOMMU ownership control functions

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:38PM +1000, Alexey Kardashevskiy wrote:
 This adds missing locks in iommu_take_ownership()/
 iommu_release_ownership().
 
 This marks all pages busy in iommu_table::it_map in order to catch
 errors if there is an attempt to use this table while ownership over it
 is taken.
 
 This only clears TCE content if there is no page marked busy in it_map.
 Clearing must be done outside of the table locks as iommu_clear_tce()
 called from iommu_clear_tces_and_put_pages() does this.
 
 In order to use bitmap_empty(), the existing code clears bit#0 which
 is set even in an empty table if it is bus-mapped at 0 as
 iommu_init_table() reserves page#0 to prevent buggy drivers
 from crashing when allocated page is bus-mapped at zero
 (which is correct). This restores the bit in the case of failure
 to bring the it_map to the state it was in when we called
 iommu_take_ownership().

Ah! I finally understand what all this bit#0 stuff is about.  Thanks
for the explanation.
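
For anyone else following along, here is how I read the bit#0 handling, as a small userspace model. The struct and function names (demo_table, demo_take_ownership) and the fixed 64-entry byte-array bitmap are assumptions for illustration; locking is left out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_TABLE_ENTRIES 64

/* Hypothetical, simplified stand-in for struct iommu_table. */
struct demo_table {
	unsigned long it_offset;
	unsigned char it_map[DEMO_TABLE_ENTRIES / 8];
};

static bool demo_map_empty(const struct demo_table *t)
{
	for (size_t i = 0; i < sizeof(t->it_map); i++)
		if (t->it_map[i])
			return false;
	return true;
}

static int demo_take_ownership(struct demo_table *t)
{
	int ret = 0;

	assert(t != NULL);

	/* Bit#0 is reserved at init time for zero-based tables; ignore it. */
	if (t->it_offset == 0)
		t->it_map[0] &= (unsigned char)~1u;

	if (!demo_map_empty(t)) {
		ret = -EBUSY;
		/* Failure: put the map back the way we found it. */
		if (t->it_offset == 0)
			t->it_map[0] |= 1u;
	} else {
		/* Success: mark every page busy to catch stray users. */
		memset(t->it_map, 0xff, sizeof(t->it_map));
	}

	return ret;
}
```

So a table with only the reserved bit set counts as empty, and a failed attempt leaves it_map unchanged.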

 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

With one small comment..


 ---
 Changes:
 v9:
 * iommu_table_take_ownership() did not return @ret (and ignored EBUSY),
 now it does return correct error.
 * updated commit log about setting bit#0 in the case of failure
 
 v5:
 * do not store bit#0 value, it has to be set for zero-based table
 anyway
 * removed test_and_clear_bit
 ---
  arch/powerpc/kernel/iommu.c | 31 +--
  1 file changed, 25 insertions(+), 6 deletions(-)
 
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index 2856d27..ea2c8ba 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -1045,32 +1045,51 @@ EXPORT_SYMBOL_GPL(iommu_tce_build);
  
  int iommu_take_ownership(struct iommu_table *tbl)
  {
 - unsigned long sz = (tbl->it_size + 7) >> 3;
 + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 + int ret = 0;
 +
 + spin_lock_irqsave(&tbl->large_pool.lock, flags);
 + for (i = 0; i < tbl->nr_pools; i++)
 + spin_lock(&tbl->pools[i].lock);

   if (tbl->it_offset == 0)
   clear_bit(0, tbl->it_map);
  
   if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
   pr_err("iommu_tce: it_map is not empty");
 - return -EBUSY;
 + ret = -EBUSY;
 + /* Restore bit#0 set by iommu_init_table() */
 + if (tbl->it_offset == 0)
 + set_bit(0, tbl->it_map);
 + } else {
 + memset(tbl->it_map, 0xff, sz);
   }
  
 - memset(tbl->it_map, 0xff, sz);
 + for (i = 0; i < tbl->nr_pools; i++)
 + spin_unlock(&tbl->pools[i].lock);

I *think* it's safe in this case, but releasing locks not in the
reverse order you acquired them makes me a bit nervous.
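
What I would find easier to reason about is a strict last-in, first-out release. A sketch of the ordering only, using toy "locks" that just record when they were released; demo_lock, demo_table and the helpers are made-up names, and real code would use the spinlock API instead:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_POOLS 4

/* Toy lock: records whether it is held and when it was released. */
struct demo_lock {
	int held;
	int released_at;
};

struct demo_table {
	struct demo_lock large_pool;
	struct demo_lock pools[DEMO_MAX_POOLS];
	size_t nr_pools;
	int release_count;
};

/* Acquire: large pool first, then pools 0..nr_pools-1. */
static void demo_lock_all(struct demo_table *t)
{
	assert(t->nr_pools <= DEMO_MAX_POOLS);

	t->large_pool.held = 1;
	for (size_t i = 0; i < t->nr_pools; i++)
		t->pools[i].held = 1;
}

static void demo_release(struct demo_table *t, struct demo_lock *l)
{
	l->held = 0;
	l->released_at = ++t->release_count;
}

/* Release in exactly the reverse order: last pool first, large pool last. */
static void demo_unlock_all(struct demo_table *t)
{
	for (size_t i = t->nr_pools; i-- > 0; )
		demo_release(t, &t->pools[i]);
	demo_release(t, &t->large_pool);
}
```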

 + spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
  
 -
 - return 0;
 + return ret;
  }
  EXPORT_SYMBOL_GPL(iommu_take_ownership);
  
  void iommu_release_ownership(struct iommu_table *tbl)
  {
 - unsigned long sz = (tbl->it_size + 7) >> 3;
 + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 +
 + spin_lock_irqsave(&tbl->large_pool.lock, flags);
 + for (i = 0; i < tbl->nr_pools; i++)
 + spin_lock(&tbl->pools[i].lock);
  
   memset(tbl->it_map, 0, sz);
  
   /* Restore bit#0 set by iommu_init_table() */
   if (tbl->it_offset == 0)
   set_bit(0, tbl->it_map);
 +
 + for (i = 0; i < tbl->nr_pools; i++)
 + spin_unlock(&tbl->pools[i].lock);
 + spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
  }
  EXPORT_SYMBOL_GPL(iommu_release_ownership);
  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH] Correct cpu affinity for dlpar added cpus

2015-04-28 Thread Michael Ellerman
Subject should be powerpc/pseries: ... please.

On Tue, 2015-04-28 at 10:37 -0500, Nathan Fontenot wrote:
 The incorrect ordering of operations during cpu dlpar causes the affinity
 of cpus being added to be invalid. Phyp does not assign affinity information
 for a cpu until the rtas set-indicator calls are made to set the isolation
 and allocation state. In the current code we call rtas configure-connector
 before making the set-indicator calls which results in invalid data in the
 ibm,associativity property for the cpu we're adding.

Invalid and benign? Or invalid and causes an oops or ..?

 This patch corrects the order of operations to make the set-indicator
 calls (done in acquire_drc) before calling configure-connector.

Which commit added the code and/or caused it to be wrong?

  
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches#n187


While looking at the code I notice it looks like we leak a reference if
dlpar_configure_connector() fails:

	parent = of_find_node_by_path("/cpus");
	if (!parent)
		return -ENODEV;

	dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
	if (!dn)
		return -EINVAL;

	of_node_put(parent);


Please send a separate patch to fix that.
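
One possible shape for the fix, shown as a userspace model with a plain counter standing in for the device node refcount. All demo_* names are hypothetical and this is not the pseries code; the idea is simply to drop the parent reference before checking the result:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy device node: only tracks how many references are held. */
struct demo_node {
	int refcount;
};

/* Models a lookup that returns the node with a reference taken. */
static struct demo_node *demo_node_get(struct demo_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void demo_node_put(struct demo_node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/* Stand-in for the configure step; returns NULL to simulate failure. */
static struct demo_node *demo_configure(struct demo_node *parent,
					struct demo_node *result)
{
	(void)parent;
	return result;
}

static int demo_add_cpu(struct demo_node *cpus, struct demo_node *result)
{
	struct demo_node *parent, *dn;

	parent = demo_node_get(cpus);
	if (!parent)
		return -ENODEV;

	dn = demo_configure(parent, result);
	/* Drop the reference on both the success and the failure path. */
	demo_node_put(parent);
	if (!dn)
		return -EINVAL;

	return 0;
}
```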

cheers



Re: [PATCH kernel v9 17/32] powerpc/powernv: Implement accessor to TCE entry

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:41PM +1000, Alexey Kardashevskiy wrote:
 This replaces direct accesses to the TCE table with a helper which
 returns a TCE entry address. This does not make a difference now but will
 when multi-level TCE tables get introduced.
 
 No change in behavior is expected.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: David Gibson da...@gibson.dropbear.id.au
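
For reference, the pattern being introduced, reduced to a userspace sketch. The names (demo_table, demo_tce() and so on) are illustrative assumptions, a flat array stands in for the table, and the endianness conversions of the real code are omitted:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct iommu_table. */
struct demo_table {
	uint64_t *it_base;	/* backing storage, flat for now */
	long it_offset;		/* first TCE index covered by this table */
	long it_size;		/* number of entries */
};

/* Single accessor: the only place that knows the table layout. */
static uint64_t *demo_tce(struct demo_table *tbl, long idx)
{
	assert(idx >= 0 && idx < tbl->it_size);
	return tbl->it_base + idx;
}

/* Fill npages entries starting at index with first, first + step, ... */
static void demo_tce_set(struct demo_table *tbl, long index, long npages,
			 uint64_t first, uint64_t step)
{
	for (long i = 0; i < npages; i++)
		*demo_tce(tbl, index - tbl->it_offset + i) =
			first + (uint64_t)i * step;
}

static void demo_tce_clear(struct demo_table *tbl, long index, long npages)
{
	for (long i = 0; i < npages; i++)
		*demo_tce(tbl, index - tbl->it_offset + i) = 0;
}

static uint64_t demo_tce_get(struct demo_table *tbl, long index)
{
	return *demo_tce(tbl, index - tbl->it_offset);
}
```

Once the table becomes multi-level, only the accessor needs to learn how to walk it.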


 ---
 Changes:
 v9:
 * new patch in the series to separate this mechanical change from
 functional changes; this is not right before
 "powerpc/powernv: Implement multilevel TCE tables" but here in order
 to let the next patch - "powerpc/iommu/powernv: Release replaced TCE" -
 use pnv_tce() and avoid changing the same code twice
 ---
  arch/powerpc/platforms/powernv/pci.c | 34 +-
  1 file changed, 21 insertions(+), 13 deletions(-)
 
 diff --git a/arch/powerpc/platforms/powernv/pci.c 
 b/arch/powerpc/platforms/powernv/pci.c
 index 84b4ea4..ba75aa5 100644
 --- a/arch/powerpc/platforms/powernv/pci.c
 +++ b/arch/powerpc/platforms/powernv/pci.c
 @@ -572,38 +572,46 @@ struct pci_ops pnv_pci_ops = {
   .write = pnv_pci_write_config,
  };
  
 +static __be64 *pnv_tce(struct iommu_table *tbl, long idx)
 +{
 + __be64 *tmp = ((__be64 *)tbl->it_base);
 +
 + return tmp + idx;
 +}
 +
  int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
   unsigned long uaddr, enum dma_data_direction direction,
   struct dma_attrs *attrs)
  {
   u64 proto_tce = iommu_direction_to_tce_perm(direction);
 - __be64 *tcep;
 - u64 rpn;
 + u64 rpn = __pa(uaddr) >> tbl->it_page_shift;

I guess this was a problem in the existing code, not this patch.  But
uaddr is a really bad name (and unsigned long is a bad type) for
what must actually be a kernel linear mapping address.

 + long i;
  
 - tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
 - rpn = __pa(uaddr) >> tbl->it_page_shift;
 -
 - while (npages--)
 - *(tcep++) = cpu_to_be64(proto_tce |
 - (rpn++ << tbl->it_page_shift));
 + for (i = 0; i < npages; i++) {
 + unsigned long newtce = proto_tce |
 + ((rpn + i) << tbl->it_page_shift);
 + unsigned long idx = index - tbl->it_offset + i;
  
 + *(pnv_tce(tbl, idx)) = cpu_to_be64(newtce);
 + }
  
   return 0;
  }
  
  void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
  {
 - __be64 *tcep;
 + long i;
  
 - tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
 + for (i = 0; i < npages; i++) {
 + unsigned long idx = index - tbl->it_offset + i;
  
 - while (npages--)
 - *(tcep++) = cpu_to_be64(0);
 + *(pnv_tce(tbl, idx)) = cpu_to_be64(0);
 + }
  }
  
  unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
  {
 - return ((u64 *)tbl->it_base)[index - tbl->it_offset];
 + return *(pnv_tce(tbl, index - tbl->it_offset));
  }
  
  void pnv_pci_setup_iommu_table(struct iommu_table *tbl,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH kernel v9 15/32] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free()

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:39PM +1000, Alexey Kardashevskiy wrote:
 The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
 supposed to be called on IODA1/2 and not called on p5ioc2. It receives
 start and end host addresses of TCE table.
 
 IODA2 actually needs PCI addresses to invalidate the cache. Those
 can be calculated from host addresses but since we are going
 to implement multi-level TCE tables, calculating PCI address from
 a host address might get either tricky or ugly as TCE table remains flat
 on PCI bus but not in RAM.
 
 This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/
 pnv_tce_free and defines IODA1/2-specific callbacks which call generic
 ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps
 using generic callbacks as before.
 
 This changes pnv_pci_ioda2_tce_invalidate() to receive TCE index and
 number of pages which are PCI addresses shifted by IOMMU page shift.
 
 No change in behaviour is expected.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v9:
 * removed confusing comment from commit log about unintentional calling of
 pnv_pci_ioda_tce_invalidate()
 * moved mechanical changes away to "powerpc/iommu: Move tce_xxx callbacks
 from ppc_md to iommu_table"
 * fixed bug with broken invalidation in pnv_pci_ioda2_tce_invalidate -
 @index includes @tbl->it_offset but old code added it anyway which later broke
 DDW
 ---
  arch/powerpc/platforms/powernv/pci-ioda.c | 86 
 +--
  arch/powerpc/platforms/powernv/pci.c  | 17 ++
  2 files changed, 64 insertions(+), 39 deletions(-)
 
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index 718d5cc..f070c44 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -1665,18 +1665,20 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
   }
  }
  
 -static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
 -  struct iommu_table *tbl,
 -  __be64 *startp, __be64 *endp, bool rm)
 +static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
 + unsigned long index, unsigned long npages, bool rm)
  {
 + struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
 + struct pnv_ioda_pe, table_group);
   __be64 __iomem *invalidate = rm ?
   (__be64 __iomem *)pe->tce_inval_reg_phys :
   (__be64 __iomem *)tbl->it_index;
   unsigned long start, end, inc;
   const unsigned shift = tbl->it_page_shift;
  
 - start = __pa(startp);
 - end = __pa(endp);
 + start = __pa((__be64 *)tbl->it_base + index - tbl->it_offset);
 + end = __pa((__be64 *)tbl->it_base + index - tbl->it_offset +
 + npages - 1);

This doesn't look right.  The arguments to __pa don't appear to be
addresses (since index and it_offset are in units of (TCE) pages, not
bytes).

  
   /* BML uses this case for p6/p7/galaxy2: Shift addr and put in node */
   if (tbl->it_busno) {
 @@ -1712,16 +1714,40 @@ static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
   */
  }
  
 +static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
 + long npages, unsigned long uaddr,
 + enum dma_data_direction direction,
 + struct dma_attrs *attrs)
 +{
 + long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
 + attrs);
 +
 + if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
 + pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
 +
 + return ret;
 +}
 +
 +static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
 + long npages)
 +{
 + pnv_tce_free(tbl, index, npages);
 +
 + if (tbl->it_type & TCE_PCI_SWINV_FREE)
 + pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
 +}
 +
  static struct iommu_table_ops pnv_ioda1_iommu_ops = {
 - .set = pnv_tce_build,
 - .clear = pnv_tce_free,
 + .set = pnv_ioda1_tce_build,
 + .clear = pnv_ioda1_tce_free,
   .get = pnv_tce_get,
  };
  
 -static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 -  struct iommu_table *tbl,
 -  __be64 *startp, __be64 *endp, bool rm)
 +static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
 + unsigned long index, unsigned long npages, bool rm)
  {
 + struct pnv_ioda_pe *pe = container_of(tbl->it_table_group,
 + struct pnv_ioda_pe, table_group);
   unsigned long start, end, inc;
   __be64 __iomem *invalidate = rm ?
   (__be64 __iomem *)pe->tce_inval_reg_phys :
 @@ -1734,10 +1760,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
   end = start;
  
   /* Figure out the start, end and step */
 - inc 

Re: [PATCH kernel v9 02/32] Revert powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:26PM +1000, Alexey Kardashevskiy wrote:
 This reverts commit 9e8d4a19ab66ec9e132d405357b9108a4f26efd3 as
 tce32_table has exactly the same life time as the whole PE.
 
 This makes use of a new iommu_reset_table() helper instead.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH v3 0/7] Fixes for perf probe issues on ppc

2015-04-28 Thread Srikar Dronamraju
* Arnaldo Carvalho de Melo a...@kernel.org [2015-04-28 10:54:53]:

 Em Tue, Apr 28, 2015 at 05:35:33PM +0530, Naveen N. Rao escreveu:
  This patchset fixes various issues with perf probe on powerpc across ABIv1
  and ABIv2:
  - in the presence of DWARF debug-info,
  - in the absence of DWARF, but with the symbol table, and
  - in the absence of debug-info, but with kallsyms.
  
  Arnaldo,
  I have moved all patches to use __weak functions. Kindly take a look and
  let me know if this is what you had in mind.
 
 Ok, I applied all but the first, for which I am waiting for Masami's
 reaction, I kept Srikar's reviewed-by for the other patches, but would
 as well like to get his word that he keeps it after the __weak changes.
 
 So, for now, I'll leave it for a while sitting in my local tree, to give
 time to Masami and Srikar, ok?
 

Yes Arnaldo, I have looked at the patches after the __weak changes and
they look good to me.

 - Arnaldo
 

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote:
 This extends iommu_table_group_ops by a set of callbacks to support
 dynamic DMA windows management.
 
 create_table() creates a TCE table with specific parameters.
 it receives iommu_table_group to know nodeid in order to allocate
 TCE table memory closer to the PHB. The exact format of allocated
 multi-level table might be also specific to the PHB model (not
 the case now though).
 This callback calculated the DMA window offset on a PCI bus from @num
 and stores it in a just created table.
 
 set_window() sets the window at specified TVT index + @num on PHB.
 
 unset_window() unsets the window from specified TVT.
 
 This adds a free() callback to iommu_table_ops to free the memory
 (potentially a tree of tables) allocated for the TCE table.

Doesn't the free callback belong with the previous patch introducing
multi-level tables?

 create_table() and free() are supposed to be called once per
 VFIO container and set_window()/unset_window() are supposed to be
 called for every group in a container.
 
 This adds IOMMU capabilities to iommu_table_group such as default
 32bit window parameters and others.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/include/asm/iommu.h| 19 
  arch/powerpc/platforms/powernv/pci-ioda.c   | 75 
 ++---
  arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
  3 files changed, 96 insertions(+), 10 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index 0f50ee2..7694546 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -70,6 +70,7 @@ struct iommu_table_ops {
   /* get() returns a physical address */
   unsigned long (*get)(struct iommu_table *tbl, long index);
   void (*flush)(struct iommu_table *tbl);
 + void (*free)(struct iommu_table *tbl);
  };
  
  /* These are used by VIO */
 @@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct 
 iommu_table * tbl,
  struct iommu_table_group;
  
  struct iommu_table_group_ops {
 + long (*create_table)(struct iommu_table_group *table_group,
 + int num,
 + __u32 page_shift,
 + __u64 window_size,
 + __u32 levels,
 + struct iommu_table *tbl);
 + long (*set_window)(struct iommu_table_group *table_group,
 + int num,
 + struct iommu_table *tblnew);
 + long (*unset_window)(struct iommu_table_group *table_group,
 + int num);
   /*
* Switches ownership from the kernel itself to an external
* user. While onwership is taken, the kernel cannot use IOMMU itself.
 @@ -160,6 +172,13 @@ struct iommu_table_group {
  #ifdef CONFIG_IOMMU_API
   struct iommu_group *group;
  #endif
 + /* Some key properties of IOMMU */
 + __u32 tce32_start;
 + __u32 tce32_size;
 + __u64 pgsizes; /* Bitmap of supported page sizes */
 + __u32 max_dynamic_windows_supported;
 + __u32 max_levels;

With this information, table_group seems even more like a bad name.
iommu_state maybe?

   struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
   struct iommu_table_group_ops *ops;
  };
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index cc1d09c..4828837 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -24,6 +24,7 @@
  #include <linux/msi.h>
  #include <linux/memblock.h>
  #include <linux/iommu.h>
 +#include <linux/sizes.h>
  
  #include <asm/sections.h>
  #include <asm/io.h>
 @@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
  #endif
   .clear = pnv_ioda2_tce_free,
   .get = pnv_tce_get,
 + .free = pnv_pci_free_table,
  };
  
  static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
 @@ -1936,6 +1938,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
   TCE_PCI_SWINV_PAIR);
  
   tbl->it_ops = &pnv_ioda1_iommu_ops;
 + pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
 + pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
   iommu_init_table(tbl, phb->hose->node);
  
   if (pe->flags & PNV_IODA_PE_DEV) {
 @@ -1961,7 +1965,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
  }
  
  static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
 - struct iommu_table *tbl)
 + int num, struct iommu_table *tbl)
  {
   struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
   table_group);
 @@ -1972,9 +1976,10 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
   const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
   const __u64 win_size = 

Re: [PATCH kernel v9 20/32] powerpc/powernv/ioda2: Introduce pnv_pci_create_table/pnv_pci_free_table

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:44PM +1000, Alexey Kardashevskiy wrote:
 This is a part of moving TCE table allocation into an iommu_ops
 callback to support multiple IOMMU groups per one VFIO container.
 
 This moves a table creation window to the file with common powernv-pci
 helpers as it does not do anything IODA2-specific.
 
 This adds pnv_pci_free_table() helper to release the actual TCE table.
 
 This enforces window size to be a power of two.
 
 This should cause no behavioural change.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Reviewed-by: David Gibson da...@gibson.dropbear.id.au
 ---
 Changes:
 v9:
 * moved helpers to the common powernv pci.c file from pci-ioda.c
 * moved bits from pnv_pci_create_table() to pnv_alloc_tce_table_pages()
 ---
  arch/powerpc/platforms/powernv/pci-ioda.c | 36 ++
  arch/powerpc/platforms/powernv/pci.c  | 61 
 +++
  arch/powerpc/platforms/powernv/pci.h  |  4 ++
  3 files changed, 76 insertions(+), 25 deletions(-)
 
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index a80be34..b9b3773 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -1307,8 +1307,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev 
 *dev, struct pnv_ioda_pe
   if (rc)
   pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
  
 - iommu_reset_table(tbl, of_node_full_name(dev->dev.of_node));
 - free_pages(addr, get_order(TCE32_TABLE_SIZE));
 + pnv_pci_free_table(tbl);
  }
  
  static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
 @@ -2039,10 +2038,7 @@ static struct iommu_table_group_ops pnv_pci_ioda2_ops 
 = {
  static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
  struct pnv_ioda_pe *pe)
  {
 - struct page *tce_mem = NULL;
 - void *addr;
   struct iommu_table *tbl = &pe->table_group.tables[0];
 - unsigned int tce_table_size, end;
   int64_t rc;
  
   /* We shouldn't already have a 32-bit DMA associated */
 @@ -2053,29 +2049,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
 *phb,
  
   /* The PE will reserve all possible 32-bits space */
   pe->tce32_seg = 0;
 - end = (1 << ilog2(phb->ioda.m32_pci_base));
 - tce_table_size = (end / 0x1000) * 8;
   pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 - end);
 + phb->ioda.m32_pci_base);
  
 - /* Allocate TCE table */
 - tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
 -get_order(tce_table_size));
 - if (!tce_mem) {
 - pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
 - goto fail;
 + rc = pnv_pci_create_table(&pe->table_group, pe->phb->hose->node,
 + 0, IOMMU_PAGE_SHIFT_4K, phb->ioda.m32_pci_base, tbl);
 + if (rc) {
 + pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
 + return;
   }
 - addr = page_address(tce_mem);
 - memset(addr, 0, tce_table_size);
 -
 - /* Setup iommu */
 - tbl->it_table_group = &pe->table_group;
 -
 - /* Setup linux iommu table */
 - pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
 - IOMMU_PAGE_SHIFT_4K);
  
   tbl->it_ops = &pnv_ioda2_iommu_ops;
 +
 + /* Setup iommu */
 + tbl->it_table_group = &pe->table_group;
   iommu_init_table(tbl, phb->hose->node);
  #ifdef CONFIG_IOMMU_API
   pe->table_group.ops = &pnv_pci_ioda2_ops;
 @@ -2121,8 +2108,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
 *phb,
  fail:
   if (pe->tce32_seg >= 0)
   pe->tce32_seg = -1;
 - if (tce_mem)
 - __free_pages(tce_mem, get_order(tce_table_size));
 + pnv_pci_free_table(tbl);
  }
  
  static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 diff --git a/arch/powerpc/platforms/powernv/pci.c 
 b/arch/powerpc/platforms/powernv/pci.c
 index e8802ac..6bcfad5 100644
 --- a/arch/powerpc/platforms/powernv/pci.c
 +++ b/arch/powerpc/platforms/powernv/pci.c
 @@ -20,7 +20,9 @@
  #include <linux/io.h>
  #include <linux/msi.h>
  #include <linux/iommu.h>
 +#include <linux/memblock.h>
  
 +#include <asm/mmzone.h>
  #include <asm/sections.h>
  #include <asm/io.h>
  #include <asm/prom.h>
 @@ -645,6 +647,65 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
   tbl->it_type = TCE_PCI;
  }
  
 +static __be64 *pnv_alloc_tce_table_pages(int nid, unsigned shift,
 + unsigned long *tce_table_allocated)

I'm a bit confused by the tce_table_allocated parameter.  What's the
circumstance where more memory is requested than required, and why
does it matter to the caller?
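
One possible reading, offered as a sketch rather than as the author's answer: the helper clamps the allocation order to at least one page, so a table smaller than PAGE_SIZE still consumes a whole page and the bytes allocated can exceed the bytes requested. The user-space model below assumes that `shift` is log2 of the requested table size in bytes; `tce_alloc_bytes()` and its `page_shift` parameter are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

/*
 * User-space model (hypothetical helper, not kernel code) of the size
 * computed at the top of the quoted pnv_alloc_tce_table_pages().
 * Assumption: 'shift' is log2 of the requested table size in bytes.
 */
static unsigned long tce_alloc_bytes(unsigned shift, unsigned page_shift)
{
	/* Same clamp as max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT. */
	unsigned order = (shift > page_shift ? shift : page_shift) - page_shift;

	/* Keep the shift below the width of unsigned long. */
	assert(order + page_shift < sizeof(unsigned long) * 8);
	return 1UL << (order + page_shift);
}
```

With 64K pages (page_shift 16), a 4K request (shift 12) still yields 65536 bytes, which may be the situation the extra out-parameter is meant to report.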

 +{
 + struct page *tce_mem = NULL;
 + __be64 *addr;
 + unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
 + unsigned long local_allocated = 1UL << (order + PAGE_SHIFT);
 +
 + tce_mem = alloc_pages_node(nid, 

Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:
 At the moment writing new TCE value to the IOMMU table fails with EBUSY
 if there is a valid entry already. However PAPR specification allows
 the guest to write new TCE value without clearing it first.
 
 Another problem this patch is addressing is the use of pool locks for
 external IOMMU users such as VFIO. The pool locks are to protect
 DMA page allocator rather than entries and since the host kernel does
 not control what pages are in use, there is no point in pool locks and
 exchange()+put_page(oldtce) is sufficient to avoid possible races.
 
 This adds an exchange() callback to iommu_table_ops which does the same
 thing as set() plus it returns replaced TCE and DMA direction so
 the caller can release the pages afterwards. The exchange() receives
 a physical address unlike set() which receives linear mapping address;
 and returns a physical address as the clear() does.
 
 This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
 for a platform to have exchange() implemented in order to support VFIO.
 
 This replaces iommu_tce_build() and iommu_clear_tce() with
 a single iommu_tce_xchg().
 
 This makes sure that TCE permission bits are not set in TCE passed to
 IOMMU API as those are to be calculated by platform code from DMA direction.
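
The exchange() semantics described above can be modelled in user space. This is a minimal sketch under stated assumptions, not the powernv implementation: the table entry is a plain 64-bit word, the two low bits stand for read and write permission, and `table_xchg()`, `dir_to_perm()`, `perm_to_dir()` and the `PERM_*` and `DIR_*` names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical permission bits kept in the low bits of an entry. */
#define PERM_READ  0x1ULL
#define PERM_WRITE 0x2ULL
#define PERM_MASK  (PERM_READ | PERM_WRITE)

enum dir { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_NONE };

/* Platform side: derive permission bits from the DMA direction. */
static uint64_t dir_to_perm(enum dir d)
{
	switch (d) {
	case DIR_BIDIRECTIONAL: return PERM_READ | PERM_WRITE;
	case DIR_TO_DEVICE:     return PERM_READ;
	case DIR_FROM_DEVICE:   return PERM_WRITE;
	default:                return 0;
	}
}

static enum dir perm_to_dir(uint64_t perm)
{
	switch (perm & PERM_MASK) {
	case PERM_READ | PERM_WRITE: return DIR_BIDIRECTIONAL;
	case PERM_READ:              return DIR_TO_DEVICE;
	case PERM_WRITE:             return DIR_FROM_DEVICE;
	default:                     return DIR_NONE;
	}
}

/*
 * Swap in a new entry and hand the old address and direction back
 * through the same pointers, so the caller can release the old page.
 * Returns -1 if the caller passed permission bits in *paddr.
 */
static int table_xchg(_Atomic uint64_t *entry, uint64_t *paddr, enum dir *d)
{
	uint64_t newval, oldval;

	if (*paddr & PERM_MASK)
		return -1;

	newval = *paddr | dir_to_perm(*d);
	oldval = atomic_exchange(entry, newval);

	*paddr = oldval & ~PERM_MASK;
	*d = perm_to_dir(oldval);
	return 0;
}
```

Passing direction DIR_NONE with address 0 acts as a clear that still returns the replaced entry.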
 
 This moves SetPageDirty() to the IOMMU code to make it work for both
 the VFIO ioctl interface and in-kernel TCE acceleration (when it becomes
 available later).
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com

This looks mostly good, but there are a couple of details that need fixing.

 ---
 Changes:
 v9:
 * changed exchange() to work with physical addresses as these addresses
 are never accessed by the code and physical addresses are actual values
 we put into the IOMMU table
 ---
  arch/powerpc/include/asm/iommu.h| 22 +--
  arch/powerpc/kernel/iommu.c | 57 +---
  arch/powerpc/platforms/powernv/pci-ioda.c   | 34 +
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |  3 ++
  arch/powerpc/platforms/powernv/pci.c| 17 +
  arch/powerpc/platforms/powernv/pci.h|  2 +
  drivers/vfio/vfio_iommu_spapr_tce.c | 58 
 ++---
  7 files changed, 128 insertions(+), 65 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index e63419e..7e7ca0a 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -45,13 +45,29 @@ extern int iommu_is_off;
  extern int iommu_force_on;
  
  struct iommu_table_ops {
 + /*
 +  * When called with direction==DMA_NONE, it is equal to clear().
 +  * uaddr is a linear map address.
 +  */
   int (*set)(struct iommu_table *tbl,
   long index, long npages,
   unsigned long uaddr,
   enum dma_data_direction direction,
   struct dma_attrs *attrs);
 +#ifdef CONFIG_IOMMU_API
 + /*
 +  * Exchanges existing TCE with new TCE plus direction bits;
 +  * returns old TCE and DMA direction mask.
 +  * @tce is a physical address.
 +  */
 + int (*exchange)(struct iommu_table *tbl,
 + long index,
 + unsigned long *tce,

I'd prefer to call this address or paddr or something, since it's
not a full TCE entry (which would contain permission bits).

 + enum dma_data_direction *direction);
 +#endif
   void (*clear)(struct iommu_table *tbl,
   long index, long npages);
 + /* get() returns a physical address */
   unsigned long (*get)(struct iommu_table *tbl, long index);
   void (*flush)(struct iommu_table *tbl);
  };
 @@ -152,6 +168,8 @@ extern void iommu_register_group(struct iommu_table_group 
 *table_group,
  extern int iommu_add_device(struct device *dev);
  extern void iommu_del_device(struct device *dev);
  extern int __init tce_iommu_bus_notifier_init(void);
 +extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
 + unsigned long *tce, enum dma_data_direction *direction);
  #else
  static inline void iommu_register_group(struct iommu_table_group 
 *table_group,
   int pci_domain_number,
 @@ -231,10 +249,6 @@ extern int iommu_tce_clear_param_check(struct 
 iommu_table *tbl,
   unsigned long npages);
  extern int iommu_tce_put_param_check(struct iommu_table *tbl,
   unsigned long ioba, unsigned long tce);
 -extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
 - unsigned long hwaddr, enum dma_data_direction direction);
 -extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
 - unsigned long entry);
  
  extern void iommu_flush_tce(struct 

Re: [PATCH v3 0/7] Fixes for perf probe issues on ppc

2015-04-28 Thread Arnaldo Carvalho de Melo
On Tue, Apr 28, 2015 at 09:42:17PM +0530, Naveen N. Rao wrote:
 On 2015/04/28 10:54AM, Arnaldo Carvalho de Melo wrote:
  On Tue, Apr 28, 2015 at 05:35:33PM +0530, Naveen N. Rao wrote:
   This patchset fixes various issues with perf probe on powerpc across 
   ABIv1 and
   ABIv2:
   - in the presence of DWARF debug-info,
   - in the absence of DWARF, but with the symbol table, and
   - in the absence of debug-info, but with kallsyms.
   
   Arnaldo,
   I have moved all patches to use __weak functions. Kindly take a look and 
   let me
   know if this is what you had in mind.
  
  Ok, I applied all but the first, for which I am waiting for Masami's
  reaction, I kept Srikar's reviewed-by for the other patches, but would
  as well like to get his word that he keeps it after the __weak changes.
  
  So, for now, I'll leave it for a while sitting in my local tree, to give
  time to Masami and Srikar, ok?
 
 Sure. We will wait for Masami and Srikar to review/confirm, but Srikar's 
 reviewed-by is for the current revision.

Well, then I can go ahead and push it, later today, as I have applied
all of the patches that Srikar has reviewed, the one I'm waiting for
Masami can go when he reacts.

- Arnaldo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 0/7] Fixes for perf probe issues on ppc

2015-04-28 Thread Naveen N. Rao
On 2015/04/28 10:54AM, Arnaldo Carvalho de Melo wrote:
 On Tue, Apr 28, 2015 at 05:35:33PM +0530, Naveen N. Rao wrote:
  This patchset fixes various issues with perf probe on powerpc across ABIv1 
  and
  ABIv2:
  - in the presence of DWARF debug-info,
  - in the absence of DWARF, but with the symbol table, and
  - in the absence of debug-info, but with kallsyms.
  
  Arnaldo,
  I have moved all patches to use __weak functions. Kindly take a look and 
  let me
  know if this is what you had in mind.
 
 Ok, I applied all but the first, for which I am waiting for Masami's
 reaction, I kept Srikar's reviewed-by for the other patches, but would
 as well like to get his word that he keeps it after the __weak changes.
 
 So, for now, I'll leave it for a while sitting in my local tree, to give
 time to Masami and Srikar, ok?

Sure. We will wait for Masami and Srikar to review/confirm, but Srikar's 
reviewed-by is for the current revision.

Thanks!
- Naveen


Re: [PATCH v2 RESEND 2/2] mmc: host: Add some quirks to be read from fdt in sdhci-pltm.c

2015-04-28 Thread Suman Tripathi
On Monday 27 April 2015 21:25:20 Suman Tripathi wrote:
  On Monday 27 April 2015 20:33:25 Suman Tripathi wrote:
On Tuesday 21 April 2015 21:12:39 Suman Tripathi wrote:
 +   host->quirks |= SDHCI_QUIRK_BROKEN_DMA;
 +
 +   if (of_get_property(np, "no-cmd23", NULL))
 +   host->quirks2 |= SDHCI_QUIRK2_HOST_NO_CMD23;

 if (of_get_property(np, "no-1-8-v", NULL))

 host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
   
Any property you add needs to be documented in the DT binding.
If possible, add generic properties for each bug you have mmc.txt
rather than the driver specific sdhci.txt, and implement the
  
   I will add the binding in mmc.txt. I thought it was already present, but it is not.
  
parsing in a common function that is used for all mmc hosts.
  
   As per my understanding, sdhci_get_of_property is a common
   parsing function. Am I wrong?


A small side note: please fix your email client to use proper attribution
of the citations. The way you reply, nobody knows what you are saying
compared to what you quote. Also, reduce the quotation to the parts you
are replying to.

OK, sorry about that.

  No, this is only used for sdhci, not for the other controllers.

 But ours is an SDHCI variant, so I added it in this file.

That's my point: a lot of the bugs are independent of the specific
host controller and could happen with any one of them. We want to
ensure that nobody tries to add another property with similar
semantics and a different name just because they are using a
different driver.
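
The shared-parser idea can be illustrated with a table that maps property names to quirk bits, so that every host driver agrees on one name per bug. This is a user-space sketch only: the `QUIRK_*` values are hypothetical and are not the real SDHCI quirk constants, "broken-dma" is an invented name, and `parse_quirks()` stands in for whatever common function the mmc core would provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk bits; not the real SDHCI_QUIRK* values. */
#define QUIRK_BROKEN_DMA   (1u << 0)
#define QUIRK_BROKEN_ADMA  (1u << 1)
#define QUIRK_NO_CMD23     (1u << 2)
#define QUIRK_NO_1_8_V     (1u << 3)

struct quirk_prop {
	const char *name;   /* device-tree property name */
	unsigned int quirk; /* bit to set when the property is present */
};

/* "broken-dma" is invented here; the other names appear in this thread. */
static const struct quirk_prop quirk_props[] = {
	{ "broken-dma",  QUIRK_BROKEN_DMA  },
	{ "broken-adma", QUIRK_BROKEN_ADMA },
	{ "no-cmd23",    QUIRK_NO_CMD23    },
	{ "no-1-8-v",    QUIRK_NO_1_8_V    },
};

/* Return the OR of the quirk bits for every known property on the node. */
static unsigned int parse_quirks(const char *const *props, size_t nprops)
{
	unsigned int quirks = 0;
	size_t nknown = sizeof(quirk_props) / sizeof(quirk_props[0]);

	assert(props != NULL || nprops == 0);
	for (size_t i = 0; i < nprops; i++)
		for (size_t j = 0; j < nknown; j++)
			if (strcmp(props[i], quirk_props[j].name) == 0)
				quirks |= quirk_props[j].quirk;
	return quirks;
}
```

Unknown properties are ignored, so a node can carry unrelated properties without affecting the result.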

Then I do not see why we have the sdhci_get_of_property function at all.
I added generic names like broken-adma so that everyone can reuse them. My
mistake was not adding them to the binding.

For example, broken-cd was not added by me, but I can use it. So I added
something like broken-adma, as it was not present.

   An alternative would be to set all these bits based on the compatible
   string of your host, if that is the only one that has all these bugs.
 
  The host driver (arasan) is reused, but these quirks are needed due to
  board issues, so the dtb is the only place where I can fix this.

 What is the nature of the bug on that board? Is there a different
 way to describe that without introducing six new properties?

 Sorry, these are board, IP, and SoC errata:

 1. Delay after power is required due to some voltage issues that will
 be fixed in next board revision

This is clearly not sdhci-specific, so make that a generic property
for all mmc.
okay.

 2. We need to support PIO mode as of now because DMA or ADMA requires
 some kind of translation driver that I am working on.

But this does not describe the hardware properties. Don't add properties
that describe the lack of a kernel driver. If you can't do DMA yet,
use a dma-ranges property that lists one empty range to prevent
dma_set_mask() from working, so it will fall back to PIO mode. You
may have to fix the driver if that doesn't already work.

The generic sdhc framework doesn't have this capability. It uses the quirks
to identify the broken DMA and ADMA modes even
if the controller is capable of them.

What kind of driver do you need here?
For DMA and ADMA we need some 32-bit to 64-bit translation driver. The
existing arasan driver only supports 32-bit.

 3. The version of arasan variant we have in our SoC doesn't have the
 HISPD  bit field in HI-SPEED SD card. So this makes HI-SPEED sdcard
 work.

 4. NO_CMD23 is required for eMMC cards.

 These are not new properties.  Only the fact is I am using it for our
 SoC from dtb. These quirks are already there in mmc common framework.
 Nothing is new.

Are you sure that you have version 8.9a of the Arasan SDHCI?
Yes
This sounds
like version specific quirks, so they are probably present in each
SoC that uses the same version.
Not sure.

On Tue, Apr 28, 2015 at 1:19 AM, Arnd Bergmann a...@arndb.de wrote:

 On Monday 27 April 2015 21:25:20 Suman Tripathi wrote:
   On Monday 27 April 2015 20:33:25 Suman Tripathi wrote:
 On Tuesday 21 April 2015 21:12:39 Suman Tripathi wrote:
  +   host->quirks |= SDHCI_QUIRK_BROKEN_DMA;
  +
  +   if (of_get_property(np, "no-cmd23", NULL))
  +   host->quirks2 |= SDHCI_QUIRK2_HOST_NO_CMD23;
 
  if (of_get_property(np, "no-1-8-v", NULL))
 
  host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;

 Any property you add needs to be documented in the DT binding.
 If possible, add generic properties for each bug you have mmc.txt
 rather than the driver specific sdhci.txt, and implement the
   
I will add the binding in mmc.txt. I thought this was present but
 not.
   
 parsing in a common function that is used for all mmc hosts.
   
As per mine understanding the sdhci_get_of_porperty is a common
parsing function  . Am I wrong ??
 

 A 

Re: [v2,2/2] powerpc32: add support for csum_add()

2015-04-28 Thread christophe leroy



On 25/03/2015 02:30, Scott Wood wrote:

On Tue, Feb 03, 2015 at 12:39:27PM +0100, LEROY Christophe wrote:

The C version of csum_add() as defined in include/net/checksum.h gives the
following assembly:
0:   7c 04 1a 14 add r0,r4,r3
4:   7c 64 00 10 subfc   r3,r4,r0
8:   7c 63 19 10 subfe   r3,r3,r3
c:   7c 63 00 50 subf    r3,r3,r0

include/net/checksum.h also offers the possibility to define an arch specific
function.
This patch provides a ppc32 specific csum_add() inline function.

What makes it 32-bit specific?


As far as I understand, the 64-bit version will do a 64-bit addition, so the
carry will have to be handled differently; it can't just be an addze as on 32-bit.


The generated code is most likely different on ppc64. I have no ppc64 
compiler so I can't check what gcc generates for the following code:


__wsum csum_add(__wsum csum, __wsum addend)
{
	u32 res = (__force u32)csum;
	res += (__force u32)addend;
	/* res < addend means the addition wrapped: add the carry back in */
	return (__force __wsum)(res + (res < (__force u32)addend));
}


Can someone with a ppc64 compiler tell what we get ?
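
Independently of what gcc emits, the arithmetic can be checked in user space. This is a sketch with hypothetical names (`csum_add32()`, `csum_add_via64()`), using plain `uint32_t` instead of `__wsum`; it shows that the same end-around carry result can be reached through a 64-bit sum, which is one way a 64-bit target could compute it.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit ones' complement add: the carry out of bit 31 is added back in. */
static uint32_t csum_add32(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Same result through a 64-bit sum: fold the high word into the low word. */
static uint32_t csum_add_via64(uint32_t csum, uint32_t addend)
{
	uint64_t sum = (uint64_t)csum + addend;

	return (uint32_t)sum + (uint32_t)(sum >> 32);
}
```

The fold in the second function cannot overflow, because the sum of two 32-bit values is at most 0x1fffffffe.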

Christophe



[PATCH] tile: properly use node_isset() on a nodemask_t

2015-04-28 Thread Chris Metcalf
The code accidentally used cpu_isset() previously in one place
(though properly node_isset() elsewhere).

Signed-off-by: Chris Metcalf cmetc...@ezchip.com
---
 arch/tile/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 6873f006f7d0..d366675e4bf8 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -774,7 +774,7 @@ static void __init zone_sizes_init(void)
 * though, there'll be no lowmem, so we just alloc_bootmem
 * the memmap.  There will be no percpu memory either.
 */
-   if (i != 0 && cpumask_test_cpu(i, &isolnodes)) {
+   if (i != 0 && node_isset(i, isolnodes)) {
node_memmap_pfn[i] =
alloc_bootmem_pfn(0, memmap_size, 0);
BUG_ON(node_percpu[i] != 0);
-- 
2.1.2


[PATCH] tick-broadcast: Fix the printing of broadcast masks

2015-04-28 Thread Preeti U Murthy
Today only the first unsigned long word of each broadcast mask is output
into /proc/timer_list. This means that on machines with more CPUs than
there are bits in an unsigned long, the mask bits of CPUs beyond this
range do not appear.

Fix this by using bitmap printing through %*pb instead, so as to
output the broadcast masks for the range of nr_cpu_ids into
/proc/timer_list.
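
The difference between printing word [0] and printing the whole mask can be shown in user space. This is a sketch only: `bitmap_to_hex()` is a hypothetical helper that prints comma-separated 32-bit chunks, most significant first, in the spirit of a bitmap format specifier. The kernel's own formatter may differ in details such as the width of the top chunk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format 'nbits' of a bitmap stored as 32-bit words (word 0 holds
 * bits 0..31) as comma-separated hex chunks, highest word first.
 */
static void bitmap_to_hex(char *buf, size_t len, const uint32_t *words,
			  unsigned nbits)
{
	unsigned nwords = (nbits + 31) / 32;
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (unsigned i = nwords; i-- > 0 && off < len; ) {
		uint32_t w = words[i];

		/* Drop bits beyond nbits in the top word. */
		if (i == nwords - 1 && nbits % 32)
			w &= (1u << (nbits % 32)) - 1;
		off += (size_t)snprintf(buf + off, len - off, "%08x%s",
					(unsigned)w, i ? "," : "");
	}
}
```

For a mask with CPU 0 and CPU 40 set, formatting only the first 32 bits shows just CPU 0, while formatting all 64 bits shows both, which is the behaviour the patch is after.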

Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com
---

 kernel/time/timer_list.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index c82b03c..1afc726 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -269,11 +269,11 @@ static void timer_list_show_tickdevices_header(struct 
seq_file *m)
 {
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
print_tickdevice(m, tick_get_broadcast_device(), -1);
-   SEQ_printf(m, "tick_broadcast_mask: %08lx\n",
-  cpumask_bits(tick_get_broadcast_mask())[0]);
+   SEQ_printf(m, "tick_broadcast_mask: %*pb\n",
+  cpumask_pr_args(tick_get_broadcast_mask()));
 #ifdef CONFIG_TICK_ONESHOT
-   SEQ_printf(m, "tick_broadcast_oneshot_mask: %08lx\n",
-  cpumask_bits(tick_get_broadcast_oneshot_mask())[0]);
+   SEQ_printf(m, "tick_broadcast_oneshot_mask: %*pb\n",
+  cpumask_pr_args(tick_get_broadcast_oneshot_mask()));
 #endif
   SEQ_printf(m, "\n");
 #endif


Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification

2015-04-28 Thread Viresh Kumar
On 28 April 2015 at 13:48, Shilpasri G Bhat
shilpa.b...@linux.vnet.ibm.com wrote:
 My bad I haven't added explicit comment to state reason behind this change.

 I modified the definition of *throttle_check() to match the function 
 definition
 to be called via smp_call() instead of adding an additional wrapper around
 *throttle_check().

 OCC is a chip entity and any local throttle state changes should be associated
 to cpus belonging to that chip. The *throttle_check() will read the core
 register PMSR to verify throttling. All the cores in a chip will have the same
 throttled state as they are managed by the same OCC in that chip.

 smp_call() is required to ensure *throttle_check() is called on a cpu 
 belonging
 to the chip for which we have received throttled/unthrottled notification. We
 could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so 
 do
 an smp_call() on 'chip1'.

Okay. Let's talk about the code that is already present in mainline. Isn't that
suffering from this issue ? If yes, then you need to bugfix that separately.

 We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
 Thus the use of kworker to do an smp_call and restore policy->cur.

 OCC_RESET is global event it affects frequency of all chips. Pmax capping is
 local event, it affects the frequency of a chip.


 That's a lot. I am not an expert here and so really can't comment on
 the internals of ppc. But is this patch solving a single problem? I don't
 know, I somehow got the impression that it can be split into multiple
 (smaller & review-able) patches. Only if it makes sense. Your call.

 All the changes introduced in this patch is centered around opal_message
 notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
 patches but it all will be relevant only to solve the above problem.

And that's what I meant here. Yes, this all is solving a central problem, but
a patch must be divided into separate, independently working, entities.

Re: [PATCH] powerpc/kvm: Fix SMP=n build error in book3s_xics.c

2015-04-28 Thread Paul Mackerras
On Tue, Apr 28, 2015 at 10:42:47AM +1000, Michael Ellerman wrote:
 Commit 34cb7954c0aa "Convert ICS mutex lock to spin lock" added an
 include of <asm/spinlock.h>, which does not work in the SMP=n case.
 
 It should instead include <linux/spinlock.h>.
 
 Fixes: 34cb7954c0aa ("KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock")
 Signed-off-by: Michael Ellerman m...@ellerman.id.au

Acked-by: Paul Mackerras pau...@samba.org

Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification

2015-04-28 Thread Shilpasri G Bhat
Hi Viresh,

On 04/28/2015 12:18 PM, Viresh Kumar wrote:
 On 28 April 2015 at 11:53, Shilpasri G Bhat
 shilpa.b...@linux.vnet.ibm.com wrote:
 
 Changes from v1:
 - Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
 - Define a structure to store chip id, chip mask which has bits set
   for cpus present in the chip, throttled state and a work_struct.
 - Modify powernv_cpufreq_throttle_check() to be called via smp_call()
 
 Why ? I might have missed it but there should be some reasoning behind
 what you are changing.

My bad I haven't added explicit comment to state reason behind this change.

I modified the definition of *throttle_check() to match the function definition
to be called via smp_call() instead of adding an additional wrapper around
*throttle_check().

OCC is a chip entity and any local throttle state changes should be associated
to cpus belonging to that chip. The *throttle_check() will read the core
register PMSR to verify throttling. All the cores in a chip will have the same
throttled state as they are managed by the same OCC in that chip.

smp_call() is required to ensure *throttle_check() is called on a cpu belonging
to the chip for which we have received throttled/unthrottled notification. We
could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so do
an smp_call() on 'chip1'.

We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
Thus the use of kworker to do an smp_call and restore policy->cur.

OCC_RESET is global event it affects frequency of all chips. Pmax capping is
local event, it affects the frequency of a chip.

 
 - On Pmax throttling/unthrottling update 'chip.throttled' and not the
   global 'throttled' as Pmax capping is local to the chip.
 - Remove the condition which checks if local pstate is less than Pmin
   while checking for Psafe frequency. When OCC becomes active after
   reset we update 'throttled' to false and when the cpufreq governor
   initiates a pstate change, the local pstate will be in Psafe and we
   will be reporting a false positive when we are not throttled.
 - Schedule a kworker on receiving throttling/unthrottling OCC message
   for that chip and schedule on all chips after receiving active.
 - After an OCC reset all the cpus will be in Psafe frequency. So call
   target() and restore the frequency to policy->cur after OCC_ACTIVE
   and Pmax unthrottling
 - Taken care of Viresh and Preeti's comments.
 
 That's a lot. I am not an expert here and so really can't comment on
 the internals of ppc. But is this patch solving a single problem? I don't
 know, I somehow got the impression that it can be split into multiple
 (smaller & review-able) patches. Only if it makes sense. Your call.

All the changes introduced in this patch are centered around the opal_message
notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
patches, but they will all be relevant only to solving the above problem.

 
 diff --git a/drivers/cpufreq/powernv-cpufreq.c 
 b/drivers/cpufreq/powernv-cpufreq.c
 
 +void powernv_cpufreq_work_fn(struct work_struct *work)
 +{
 +   struct chip *c = container_of(work, struct chip, throttle);
 +   unsigned int cpu;
 +
 +   smp_call_function_any(&c->mask,
 + powernv_cpufreq_throttle_check, NULL, 0);
 +
 +   for_each_cpu(cpu, &c->mask) {
 
 for_each_online_cpu ?

I want to iterate on all the cpus in a chip stored in 'struct chip.mask'.
If you were intending me to avoid 'if(!cpu_online(cpu))' then will the 
following do:

for_each_cpu_and(cpu, &c->mask, cpu_online_mask)
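
What iterating over the intersection of the chip mask and the online mask amounts to can be modelled in user space. This is a sketch under assumptions: each mask is a single 64-bit word, and `first_cpu_and()` and `count_cpus_and()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Return the lowest CPU set in both masks, or -1 if they do not intersect. */
static int first_cpu_and(uint64_t chip_mask, uint64_t online_mask)
{
	uint64_t both = chip_mask & online_mask;

	for (int cpu = 0; cpu < 64; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Count the CPUs that a loop over the intersection would visit. */
static int count_cpus_and(uint64_t chip_mask, uint64_t online_mask)
{
	int n = 0;

	/* Clear the lowest set bit on each pass. */
	for (uint64_t both = chip_mask & online_mask; both; both &= both - 1)
		n++;
	return n;
}
```

Offline CPUs never appear in the intersection, so no separate cpu_online() test is needed inside the loop body.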

 
 +   int index;
 +   struct cpufreq_frequency_table *freq_table;
 +   struct cpufreq_policy cpu_policy;
 
 Name it policy.

Okay.

 
 +
 +   if (!cpu_online(cpu))
 +   continue;
 
 And you can kill this..
 
 +   cpufreq_get_policy(&cpu_policy, cpu);
 +   freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);
 
 Just do, policy->freq_table.

Okay.

 
 
 +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 +  unsigned long msg_type, void *msg)
 +{
 
 +   if (reason && reason <= 5)
 +   pr_info("OCC: Chip %d Pmax reduced due to %s\n",
 +   (int)chip_id, throttle_reason[reason]);
 +   else
 +   pr_info("OCC: Chip %d %s\n", (int)chip_id,
 +   throttle_reason[reason]);
 
 Blank line here. They are better for readability after blocks and loops.

Yes will do.

 
 +   for (i = 0; i < nr_chips; i++)
 +   if (chips[i].id == (int)chip_id)
 
 Why isn't .id 64 bit ?

I guess 6 bits are sufficient to store chip id given that max number of chips
can be 256. I don't have good reason for defining .id 32 bit.

Yeah 64-bit .id will avoid the typecast.

 
 +   schedule_work(&chips[i].throttle);
 +   }
 

Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification

2015-04-28 Thread Shilpasri G Bhat


On 04/28/2015 02:23 PM, Viresh Kumar wrote:
 On 28 April 2015 at 13:48, Shilpasri G Bhat
 shilpa.b...@linux.vnet.ibm.com wrote:
 My bad I haven't added explicit comment to state reason behind this change.

 I modified the definition of *throttle_check() to match the function 
 definition
 to be called via smp_call() instead of adding an additional wrapper around
 *throttle_check().

 OCC is a chip entity and any local throttle state changes should be 
 associated
 to cpus belonging to that chip. The *throttle_check() will read the core
 register PMSR to verify throttling. All the cores in a chip will have the 
 same
 throttled state as they are managed by the same OCC in that chip.

 smp_call() is required to ensure *throttle_check() is called on a cpu 
 belonging
 to the chip for which we have received throttled/unthrottled notification. We
 could be handling throttled/unthrottled notification of 'chip1' in 'chip2' 
 so do
 an smp_call() on 'chip1'.
 
 Okay. Let's talk about the code that is already present in mainline. Isn't that
 suffering from this issue ? If yes, then you need to bugfix that separately.

Nope. The upstream code does not have this issue as it does not have checks to
detect unthrottling state. The unthrottling i.e, 'throttled=false' is being
handled only in this patchset.

Yes this can be fixed separately.

 
 We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
 Thus the use of kworker to do an smp_call and restore policy->cur.

 OCC_RESET is global event it affects frequency of all chips. Pmax capping is
 local event, it affects the frequency of a chip.

 
 That's a lot. I am not an expert here and so really can't comment on
 the internals of ppc. But is this patch solving a single problem? I don't
 know, I somehow got the impression that it can be split into multiple
 (smaller & review-able) patches. Only if it makes sense. Your call.

 All the changes introduced in this patch is centered around opal_message
 notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
 patches but it all will be relevant only to solve the above problem.
 
 And that's what I meant here. Yes, this all is solving a central problem, but
 a patch must be divided into separate, independently working, entities.
 

Yup agree. Will do.


Re: powerpc32: rearrange instructions order in ip_fast_csum()

2015-04-28 Thread christophe leroy



On 25/03/2015 02:22, Scott Wood wrote:

On Tue, Feb 03, 2015 at 12:39:27PM +0100, LEROY Christophe wrote:

Signed-off-by: Christophe Leroy christophe.le...@c-s.fr
---
  arch/powerpc/lib/checksum_32.S | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 6d67e05..5500704 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -26,13 +26,17 @@
  _GLOBAL(ip_fast_csum)
        lwz     r0,0(r3)
        lwzu    r5,4(r3)
-       addic.  r4,r4,-2
+       addic.  r4,r4,-4
        addc    r0,r0,r5
        mtctr   r4
        blelr-
-1:     lwzu    r4,4(r3)
-       adde    r0,r0,r4
+       lwzu    r5,4(r3)
+       lwzu    r4,4(r3)

The blelr is pointless since len is guaranteed to be >= 5 (assuming that
comment is accurate), but now it's both pointless and in the wrong place,
since you haven't yet finished the four words that you subtracted from
r4.

The blelr is just there to protect the function against a negative value
of r4, hence ctr.
In any case, the returned result in that case is not correct, as we do
not touch r3.


How about keeping the blelr, without the -, moving it after the initial
words, and changing the number of initial words to 5?

We can't just do blelr, we would need to fold the result first.
But indeed, this would be useless because I quickly checked and it seems
that all functions calling ip_fast_csum() check that the length is not
lower than 5.
So I will just remove the blelr.
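
For reference, what the assembly computes can be written in portable C. This is a user-space sketch, not the kernel routine: `ip_header_csum()` is a hypothetical name, it reads the header as big-endian 32-bit words, accumulates them in a 64-bit sum, and folds at the end instead of chaining carries as the assembly does. The assert encodes the assumption discussed above that callers never pass fewer than 5 words.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ones' complement checksum over an IPv4 header of 'ihl' 32-bit words.
 * The result is returned as a host-order number; a header that already
 * contains a correct checksum yields 0.
 */
static uint16_t ip_header_csum(const uint8_t *hdr, unsigned ihl)
{
	uint64_t sum = 0;

	assert(ihl >= 5);
	for (unsigned i = 0; i < ihl; i++) {
		uint32_t w = ((uint32_t)hdr[4 * i] << 24) |
			     ((uint32_t)hdr[4 * i + 1] << 16) |
			     ((uint32_t)hdr[4 * i + 2] << 8) |
			     (uint32_t)hdr[4 * i + 3];
		sum += w;
	}

	/* Fold 64 -> 32 -> 16 bits, adding each carry back in. */
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Computing the checksum with the checksum field zeroed gives the value to store; recomputing it over the completed header gives 0.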

Also maybe do all
the loads up front, since many PPC chips have a three cycle load latency
rather than two.

ok

Christophe


Re: [PATCH v2 1/2] powerpc32: put csum_tcpudp_magic inline

2015-04-28 Thread christophe leroy



On 25/03/2015 03:10, Scott Wood wrote:

On Tue, 2015-02-03 at 12:39 +0100, Christophe Leroy wrote:

csum_tcpudp_magic() is only a few instructions, and does not modify any
register other than the one holding the returned result. So it is not worth
having it as a separate function and suffering the function call branch and
the saving of volatile registers. This patch makes it inline by using the
already existing csum_tcpudp_nofold() function.

Signed-off-by: Christophe Leroy christophe.le...@c-s.fr

---
v2: no change

  arch/powerpc/include/asm/checksum.h | 15 +++
  arch/powerpc/lib/checksum_32.S  | 16 
  2 files changed, 15 insertions(+), 16 deletions(-)

The 64-bit version is pretty similar to the 32-bit -- why only use
csum_tcpudp_nofold() on 32-bit?


I did it only on 32-bit because I have no way to test it on 64-bits, but 
I can do it for 64 bits as well, no problem.
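
As a rough illustration of the idea (a sketch, not the kernel code): the
"magic" variant is just the unfolded pseudo-header sum followed by a fold.
The version below treats all operands as plain host-order integers and uses
made-up *_ref names, whereas the kernel helpers work on __be32/__wsum
types.

```c
#include <stdint.h>

/* Unfolded pseudo-header sum (hypothetical stand-in for csum_tcpudp_nofold). */
static inline uint32_t csum_tcpudp_nofold_ref(uint32_t saddr, uint32_t daddr,
					      uint16_t len, uint16_t proto,
					      uint32_t sum)
{
	uint64_t s = sum;

	s += saddr;
	s += daddr;
	s += (uint32_t)proto + len;

	/* End-around carry: fold 64 bits into 32, twice. */
	s = (s & 0xffffffffu) + (s >> 32);
	s = (s & 0xffffffffu) + (s >> 32);
	return (uint32_t)s;
}

/* Fold 32 bits to 16 and invert (hypothetical stand-in for csum_fold). */
static inline uint16_t csum_fold_ref(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* The inline "magic" variant is then a one-line composition. */
static inline uint16_t csum_tcpudp_magic_ref(uint32_t saddr, uint32_t daddr,
					     uint16_t len, uint16_t proto,
					     uint32_t sum)
{
	return csum_fold_ref(csum_tcpudp_nofold_ref(saddr, daddr, len, proto, sum));
}
```

Since the composition has no architecture-specific part, the same shape
should apply to 64-bit once the nofold helper is available there.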


Christophe


Re: Build regressions/improvements in v4.1-rc1

2015-04-28 Thread Rusty Russell
Geert Uytterhoeven ge...@linux-m68k.org writes:
 On Mon, Apr 27, 2015 at 11:51 AM, Geert Uytterhoeven
 ge...@linux-m68k.org wrote:
 Below is the list of build error/warning regressions/improvements in
 v4.1-rc1[1] compared to v4.0[2].

 Summarized:
   - build errors: +34/-11
   - build warnings: +135/-163

 As I haven't mastered kup yet, there's no verbose summary at
 http://www.kernel.org/pub/linux/kernel/people/geert/linux-log/v4.1-rc1.summary.gz

 Happy fixing! ;-)

 Thanks to the linux-next team for providing the build service.

 [1] http://kisskb.ellerman.id.au/kisskb/head/8779/ (254 out of 257 configs)
 [2] http://kisskb.ellerman.id.au/kisskb/head/8710/ (254 out of 257 configs)


 *** ERRORS ***

 34 regressions:

 The quiet days are over...

   + /home/kisskb/slave/src/arch/mips/cavium-octeon/smp.c: error: passing 
 argument 2 of 'cpumask_clear_cpu' discards 'volatile' qualifier from pointer 
 target type [-Werror]:  = 242:2
   + /home/kisskb/slave/src/arch/mips/kernel/process.c: error: passing 
 argument 2 of 'cpumask_test_cpu' discards 'volatile' qualifier from pointer 
 target type [-Werror]:  = 52:2
   + /home/kisskb/slave/src/arch/mips/kernel/smp.c: error: passing argument 2 
 of 'cpumask_set_cpu' discards 'volatile' qualifier from pointer target type 
 [-Werror]:  = 149:2, 211:2
   + /home/kisskb/slave/src/arch/mips/kernel/smp.c: error: passing argument 2 
 of 'cpumask_test_cpu' discards 'volatile' qualifier from pointer target type 
 [-Werror]:  = 221:2

 mips/bigsur_defconfig
 mips/malta_defconfig
 mips/cavium_octeon_defconfig
 mips/ip27_defconfig

Already fixed in other thread...

 and related warnings due to lack of -Werror on
 ia64-defconfig

That fix is fairly obvious, I'll post separately.

 tilegx_defconfig

Can't see that one with a simple grep: can you post warning?

 m32r/m32700ut.smp_defconfig

Will post fix for this too.

 cpumask also gives fishy warnings:

 lib/cpumask.c:167:25: warning: the address of 'cpu_all_bits' will
 always evaluate as 'true' [-Waddress]

 on sparc (e.g. sparc64/sparc64-allmodconfig) and powerpc (e.g.
 powerpc/ppc64_defconfig), which seem to have been reported 6 months
 ago...

Hmm, this is cpumask_of_node?  That's... Oh my, that requires
a separate post.

 Can we throw some bitcoins at the cpumasks? ;-)

I think I should be throwing bitcoins at you, instead!

Thanks,
Rusty.

[PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification

2015-04-28 Thread Shilpasri G Bhat
OCC is an On-Chip-Controller which takes care of power and thermal
safety of the chip. During runtime due to power failure or
overtemperature the OCC may throttle the frequencies of the CPUs to
remain within the power budget.

We want the cpufreq driver to be aware of such situations to be able
to report it to the user. We register to opal_message_notifier to
receive OCC messages from opal.

powernv_cpufreq_throttle_check() reports any frequency throttling and
this patch will report the reason or event that caused throttling. We
can be throttled if OCC is reset or OCC limits Pmax due to power or
thermal reasons. We are also notified of unthrottling after an OCC
reset or if OCC restores Pmax on the chip.

Signed-off-by: Shilpasri G Bhat shilpa.b...@linux.vnet.ibm.com
CC: Rafael J. Wysocki r...@rjwysocki.net
CC: Viresh Kumar viresh.ku...@linaro.org
CC: Preeti U Murthy pre...@linux.vnet.ibm.com
CC: linux...@vger.kernel.org
---
Changes from v1:
- Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
- Define a structure to store chip id, chip mask which has bits set
  for cpus present in the chip, throttled state and a work_struct.
- Modify powernv_cpufreq_throttle_check() to be called via smp_call()
- On Pmax throttling/unthrottling update 'chip.throttled' and not the
  global 'throttled' as Pmax capping is local to the chip.
- Remove the condition which checks if local pstate is less than Pmin
  while checking for Psafe frequency. When OCC becomes active after
  reset we update 'throttled' to false and when the cpufreq governor
  initiates a pstate change, the local pstate will be in Psafe and we
  will be reporting a false positive when we are not throttled.
- Schedule a kworker on receiving throttling/unthrottling OCC message
  for that chip and schedule on all chips after receiving active.
- After an OCC reset all the cpus will be in Psafe frequency. So call
  target() and restore the frequency to policy->cur after OCC_ACTIVE
  and Pmax unthrottling
- Taken care of Viresh and Preeti's comments.

 drivers/cpufreq/powernv-cpufreq.c | 181 ++
 1 file changed, 166 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index ebef0d8..b356c9d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -27,20 +27,33 @@
 #include <linux/smp.h>
 #include <linux/of.h>
 #include <linux/reboot.h>
+#include <linux/slab.h>
 
 #include <asm/cputhreads.h>
 #include <asm/firmware.h>
 #include <asm/reg.h>
 #include <asm/smp.h> /* Required for cpu_sibling_mask() in UP configs */
+#include <asm/opal.h>
 
 #define POWERNV_MAX_PSTATES	256
 #define PMSR_PSAFE_ENABLE	(1UL << 30)
 #define PMSR_SPR_EM_DISABLE	(1UL << 31)
 #define PMSR_MAX(x)		((x >> 32) & 0xFF)
-#define PMSR_LP(x)		((x >> 48) & 0xFF)
+#define OCC_RESET		0
+#define OCC_LOAD		1
+#define OCC_THROTTLE		2
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-static bool rebooting, throttled;
+static bool rebooting, throttled, occ_reset;
+
+static struct chip {
+   int id;
+   bool throttled;
+   cpumask_t mask;
+   struct work_struct throttle;
+} *chips;
+
+static int nr_chips;
 
 /*
  * Note: The set of pstates consists of contiguous integers, the
@@ -298,28 +311,33 @@ static inline unsigned int get_nominal_index(void)
return powernv_pstate_info.max - powernv_pstate_info.nominal;
 }
 
-static void powernv_cpufreq_throttle_check(unsigned int cpu)
+static void powernv_cpufreq_throttle_check(void *data)
 {
+	unsigned int cpu = smp_processor_id();
 	unsigned long pmsr;
-	int pmsr_pmax, pmsr_lp;
+	int pmsr_pmax, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
+	for (i = 0; i < nr_chips; i++)
+		if (chips[i].id == cpu_to_chip_id(cpu))
+			break;
+
 	/* Check for Pmax Capping */
 	pmsr_pmax = (s8)PMSR_MAX(pmsr);
 	if (pmsr_pmax != powernv_pstate_info.max) {
-		throttled = true;
-		pr_info("CPU %d Pmax is reduced to %d\n", cpu, pmsr_pmax);
-		pr_info("Max allowed Pstate is capped\n");
+		if (chips[i].throttled)
+			goto next;
+		chips[i].throttled = true;
+		pr_info("CPU %d on chip %d Pmax is reduced to %d\n", cpu,
+			chips[i].id, pmsr_pmax);
+	} else {
+		chips[i].throttled = false;
 	}
 
-	/*
-	 * Check for Psafe by reading LocalPstate
-	 * or check if Psafe_mode_active is set in PMSR.
-	 */
-	pmsr_lp = (s8)PMSR_LP(pmsr);
-	if ((pmsr_lp < powernv_pstate_info.min) ||
-				(pmsr & PMSR_PSAFE_ENABLE)) {
+	/* Check if Psafe_mode_active is set in PMSR. */
+next:
+	if (pmsr & PMSR_PSAFE_ENABLE) {
 		throttled = true;
 		pr_info("Pstate set to safe frequency\n");
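
To make the two checks in the hunk above easier to follow, here is a small
user-space sketch of the PMSR decoding. It assumes the field layout given by
the PMSR_* macros in this patch (Pmax in bits 32..39 as a signed 8-bit
pstate, Psafe flag at bit 30). struct pmsr_status and pmsr_decode() are
hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout mirrored from the PMSR_* macros in the patch. */
#define PMSR_PSAFE_ENABLE	(1ULL << 30)
#define PMSR_MAX(x)		(((x) >> 32) & 0xFF)

/* Hypothetical container for the decoded state. */
struct pmsr_status {
	int pmax;		/* signed pstate currently allowed as maximum */
	bool pmax_capped;	/* true if pmax differs from the expected maximum */
	bool psafe_active;	/* true if the Psafe bit is set */
};

static struct pmsr_status pmsr_decode(uint64_t pmsr, int expected_pmax)
{
	struct pmsr_status st;

	/*
	 * Pstates are signed, so the 8-bit field is sign-extended,
	 * like the (s8) cast in the driver.
	 */
	st.pmax = (int8_t)PMSR_MAX(pmsr);
	st.pmax_capped = (st.pmax != expected_pmax);
	st.psafe_active = (pmsr & PMSR_PSAFE_ENABLE) != 0;
	return st;
}
```

In the patch, the capped state is tracked per chip in chips[i].throttled,
while the Psafe condition still sets the global 'throttled' flag.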
 

Re: Build regressions/improvements in v4.1-rc1

2015-04-28 Thread Geert Uytterhoeven
On Tue, Apr 28, 2015 at 6:39 AM, Rusty Russell ru...@rustcorp.com.au wrote:
   + /home/kisskb/slave/src/arch/mips/cavium-octeon/smp.c: error: passing 
 argument 2 of 'cpumask_clear_cpu' discards 'volatile' qualifier from 
 pointer target type [-Werror]:  = 242:2
   + /home/kisskb/slave/src/arch/mips/kernel/process.c: error: passing 
 argument 2 of 'cpumask_test_cpu' discards 'volatile' qualifier from pointer 
 target type [-Werror]:  = 52:2
   + /home/kisskb/slave/src/arch/mips/kernel/smp.c: error: passing argument 
 2 of 'cpumask_set_cpu' discards 'volatile' qualifier from pointer target 
 type [-Werror]:  = 149:2, 211:2
   + /home/kisskb/slave/src/arch/mips/kernel/smp.c: error: passing argument 
 2 of 'cpumask_test_cpu' discards 'volatile' qualifier from pointer target 
 type [-Werror]:  = 221:2

 and related warnings due to lack of -Werror on

 tilegx_defconfig

 Can't see that one with a simple grep: can you post warning?

/home/kisskb/slave/src/arch/tile/kernel/setup.c: In function 'zone_sizes_init':
/home/kisskb/slave/src/arch/tile/kernel/setup.c:777:3: warning:
passing argument 2 of 'cpumask_test_cpu' from incompatible pointer
type [enabled by default]
/home/kisskb/slave/src/include/linux/cpumask.h:294:19: note: expected
'const struct cpumask *' but argument is of type 'struct nodemask_t *'

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds

Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification

2015-04-28 Thread Viresh Kumar
On 28 April 2015 at 11:53, Shilpasri G Bhat
shilpa.b...@linux.vnet.ibm.com wrote:

 Changes from v1:
 - Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
 - Define a structure to store chip id, chip mask which has bits set
   for cpus present in the chip, throttled state and a work_struct.
 - Modify powernv_cpufreq_throttle_check() to be called via smp_call()

Why ? I might have missed it but there should be some reasoning behind
what you are changing.

 - On Pmax throttling/unthrottling update 'chip.throttled' and not the
   global 'throttled' as Pmax capping is local to the chip.
 - Remove the condition which checks if local pstate is less than Pmin
   while checking for Psafe frequency. When OCC becomes active after
   reset we update 'thottled' to false and when the cpufreq governor
   initiates a pstate change, the local pstate will be in Psafe and we
   will be reporting a false positive when we are not throttled.
 - Schedule a kworker on receiving throttling/unthrottling OCC message
   for that chip and schedule on all chips after receiving active.
 - After an OCC reset all the cpus will be in Psafe frequency. So call
   target() and restore the frequency to policy-cur after OCC_ACTIVE
   and Pmax unthrottling
 - Taken care of Viresh and Preeti's comments.

That's a lot. I am not an expert here and so really can't comment on
the internals of ppc. But, is this patch solving a single problem ? I don't
know, I somehow got the impression that it can be split into multiple
(smaller & review-able) patches. Only if it makes sense. Your call.

 diff --git a/drivers/cpufreq/powernv-cpufreq.c 
 b/drivers/cpufreq/powernv-cpufreq.c

 +void powernv_cpufreq_work_fn(struct work_struct *work)
 +{
 +   struct chip *c = container_of(work, struct chip, throttle);
 +   unsigned int cpu;
 +
 +   smp_call_function_any(&c->mask,
 + powernv_cpufreq_throttle_check, NULL, 0);
 +
 +   for_each_cpu(cpu, &c->mask) {

for_each_online_cpu ?

 +   int index;
 +   struct cpufreq_frequency_table *freq_table;
 +   struct cpufreq_policy cpu_policy;

Name it policy.

 +
 +   if (!cpu_online(cpu))
 +   continue;

And you can kill this..

 +   cpufreq_get_policy(&cpu_policy, cpu);
 +   freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);

Just do, policy->freq_table.


 +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 +  unsigned long msg_type, void *msg)
 +{

 +   if (reason && reason <= 5)
 +   pr_info("OCC: Chip %d Pmax reduced due to %s\n",
 +   (int)chip_id, throttle_reason[reason]);
 +   else
 +   pr_info("OCC: Chip %d %s\n", (int)chip_id,
 +   throttle_reason[reason]);

Blank line here. They are better for readability after blocks and loops.

 +   for (i = 0; i < nr_chips; i++)
 +   if (chips[i].id == (int)chip_id)

Why isn't .id 64 bit ?

 +   schedule_work(&chips[i].throttle);
 +   }
 +   return 0;
 +}
 +
 +static struct notifier_block powernv_cpufreq_opal_nb = {
 +   .notifier_call  = powernv_cpufreq_occ_msg,
 +   .next   = NULL,
 +   .priority   = 0,
 +};
 +
  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
  {
 struct powernv_smp_call_data freq_data;
 @@ -414,6 +530,35 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 .attr   = powernv_cpu_freq_attr,
  };

 +static int init_chip_info(void)
 +{
 +   int chip[256], i = 0, cpu;
 +   int prev_chip_id = INT_MAX;
 +
 +   for_each_possible_cpu(cpu) {
 +   int c = cpu_to_chip_id(cpu);

Does 'c' refer to id here ? Name it so then.

 +
 +   if (prev_chip_id != c) {
 +   prev_chip_id = c;
 +   chip[nr_chips++] = c;
 +   }
 +   }
 +
 +   chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
 +

A blank line isn't preferred much here :). Sorry about these blank lines.

 +   if (!chips)
 +   return -ENOMEM;
 +
 +   for (i = 0; i < nr_chips; i++) {
 +   chips[i].id = chip[i];
 +   cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
 +   chips[i].throttled = false;
 +   INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
 +   }
 +
 +   return 0;
 +}
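
The first loop of init_chip_info() quoted above collects one id per run of
CPUs sharing a chip. A stand-alone sketch of that step is shown below,
assuming (as the patch appears to) that CPUs of the same chip are numbered
contiguously. collect_chip_ids() is a hypothetical helper; the bound check
on the output array is an addition for the sketch and is not part of the
quoted code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: walk the per-CPU chip ids in CPU order and record
 * each id once per contiguous run. Returns the number of ids stored.
 */
static size_t collect_chip_ids(const int *cpu_chip_id, size_t nr_cpus,
			       int *ids, size_t max_ids)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		/* Same chip as the previous CPU: nothing new to record. */
		if (n > 0 && ids[n - 1] == cpu_chip_id[i])
			continue;
		/* Stop instead of writing past the end of ids[]. */
		if (n == max_ids)
			break;
		ids[n++] = cpu_chip_id[i];
	}
	return n;
}
```

Comparing against the last stored id plays the role of prev_chip_id in the
patch, without needing a sentinel such as INT_MAX.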
 +
  static int __init powernv_cpufreq_init(void)
  {
 int rc = 0;
 @@ -429,7 +574,13 @@ static int __init powernv_cpufreq_init(void)
 return rc;
 }

 +   /* Populate chip info */
 +   rc = init_chip_info();
 +   if (rc)
 +   return rc;
 +
         register_reboot_notifier(&powernv_cpufreq_reboot_nb);
 +   opal_message_notifier_register(OPAL_MSG_OCC, 
 

[PATCH v4 1/3] powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states

2015-04-28 Thread Vasant Hegde
From: Anshuman Khandual khand...@linux.vnet.ibm.com

This patch registers the following two new OPAL interface calls
for the platform LED subsystem. With the help of these new OPAL calls,
the kernel will be able to get or set the state of individual LEDs on
the system at any given location code, which is passed through the LED
specific device tree nodes.

(1) OPAL_LEDS_GET_INDICATOR opal_leds_get_ind
(2) OPAL_LEDS_SET_INDICATOR opal_leds_set_ind

Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
Signed-off-by: Vasant Hegde hegdevas...@linux.vnet.ibm.com
Acked-by: Stewart Smith stew...@linux.vnet.ibm.com
Tested-by: Stewart Smith stew...@linux.vnet.ibm.com

---
Changes in v4:
  - Updated macros to reflect platform.

 arch/powerpc/include/asm/opal-api.h|   29 +++-
 arch/powerpc/include/asm/opal.h|5 
 arch/powerpc/platforms/powernv/opal-wrappers.S |2 ++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 0321a90..ff85397 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,9 @@
 #define OPAL_FLASH_READ				110
 #define OPAL_FLASH_WRITE			111
 #define OPAL_FLASH_ERASE			112
-#define OPAL_LAST				112
+#define OPAL_LEDS_GET_INDICATOR			114
+#define OPAL_LEDS_SET_INDICATOR			115
+#define OPAL_LAST				115
 
 /* Device tree flags */
 
@@ -730,6 +732,31 @@ struct opal_i2c_request {
__be64 buffer_ra;   /* Buffer real address */
 };
 
+/* LED Mode */
+#define POWERNV_LED_MODE_LIGHT_PATH	"lightpath"
+#define POWERNV_LED_MODE_GUIDING_LIGHT	"guidinglight"
+
+/* LED type */
+#define POWERNV_LED_TYPE_IDENTIFY	"identify"
+#define POWERNV_LED_TYPE_FAULT		"fault"
+#define POWERNV_LED_TYPE_ATTENTION	"attention"
+
+/* LED location */
+#define POWERNV_LED_LOC_ENCLOSURE	"enclosure"
+#define POWERNV_LED_LOC_DESCENDENT	"descendent"
+
+enum OpalSlotLedType {
+   OPAL_SLOT_LED_TYPE_ID = 0,  /* IDENTIFY LED */
+   OPAL_SLOT_LED_TYPE_FAULT = 1,   /* FAULT LED */
+   OPAL_SLOT_LED_TYPE_ATTN = 2,/* System Attention LED */
+   OPAL_SLOT_LED_TYPE_MAX = 3
+};
+
+enum OpalSlotLedState {
+   OPAL_SLOT_LED_STATE_OFF = 0,/* LED is OFF */
+   OPAL_SLOT_LED_STATE_ON = 1  /* LED is ON */
+};
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_API_H */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..e06dc7e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -193,6 +193,11 @@ int64_t opal_ipmi_recv(uint64_t interface, struct 
opal_ipmi_msg *msg,
uint64_t *msg_len);
 int64_t opal_i2c_request(uint64_t async_token, uint32_t bus_id,
 struct opal_i2c_request *oreq);
+int64_t opal_leds_get_ind(char *loc_code, u64 *led_mask,
+ u64 *led_value, u64 *max_led_type);
+int64_t opal_leds_set_ind(uint64_t token, char *loc_code, const u64 led_mask,
+ const u64 led_value, u64 *max_led_type);
+
 
 int64_t opal_flash_read(uint64_t id, uint64_t offset, uint64_t buf,
uint64_t size, uint64_t token);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 4e74037..20d4da4 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,5 @@ OPAL_CALL(opal_i2c_request, 
OPAL_I2C_REQUEST);
 OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write,OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase,OPAL_FLASH_ERASE);
+OPAL_CALL(opal_leds_get_ind,   OPAL_LEDS_GET_INDICATOR);
+OPAL_CALL(opal_leds_set_ind,   OPAL_LEDS_SET_INDICATOR);
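
The led_mask/led_value pair in the prototypes above suggests a bitmap with
one bit per LED type. Under that assumption (the firmware contract itself is
not spelled out in this patch), a caller could build a request as sketched
below. led_build_request() is a hypothetical helper; the enum is copied from
the hunk above so the sketch is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the opal-api.h hunk in this patch. */
enum OpalSlotLedType {
	OPAL_SLOT_LED_TYPE_ID = 0,	/* IDENTIFY LED */
	OPAL_SLOT_LED_TYPE_FAULT = 1,	/* FAULT LED */
	OPAL_SLOT_LED_TYPE_ATTN = 2,	/* System Attention LED */
	OPAL_SLOT_LED_TYPE_MAX = 3
};

/*
 * Hypothetical helper: select one LED type in the mask and put the
 * requested state in the matching bit of the value.
 * Assumption: bit N of mask/value corresponds to LED type N.
 */
static void led_build_request(enum OpalSlotLedType type, bool on,
			      uint64_t *led_mask, uint64_t *led_value)
{
	*led_mask = 1ULL << type;
	*led_value = on ? *led_mask : 0;
}
```

Bits outside the mask are left at zero, so a single request only touches the
LED type it names.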


[PATCH v4 0/3] LED interface for PowerNV platform

2015-04-28 Thread Vasant Hegde
The following series implements LED driver for PowerNV platform.

PowerNV platform has below type of LEDs:
  - System attention
  Indicates there is a problem with the system that needs attention.
  - Identify
  Helps the user locate/identify a particular FRU or resource in the
  system.
  - Fault
  Indicates there is a problem with the FRU or resource at the
  location with which the indicator is associated.

On PowerNV (Non Virtualized) platform OPAL firmware provides LED information
to host via device tree (location code and LED type). During init we check
for 'ibm,opal/led' node in device tree to enable LED driver. And we use
OPAL API's to get/set LEDs.

Note that on PowerNV platform firmware can activate fault LED, if it can isolate
the problem. Also one can modify the LEDs using service processor interface.
None of these involves the kernel. Hence we retain LED state in unload path.

Sample LED device tree output:
--
led {
	compatible = "ibm,opal-v3-led";
	phandle = <0x106b>;
	linux,phandle = <0x106b>;
	led-mode = "lightpath";

	U78C9.001.RST0027-P1-C1 {
		led-types = "identify", "fault";
		led-loc = "descendent";
		phandle = <0x106f>;
		linux,phandle = <0x106f>;
	};
	...
	...
}

Sample sysfs output:

.
├── U78CB.001.WZS008R-A1:fault
│   ├── brightness
│   ├── device - ../../../opal_led
│   ├── max_brightness
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── subsystem - ../../../../../class/leds
│   ├── trigger
│   └── uevent
├── U78CB.001.WZS008R-A1:identify
│   ├── brightness
│   ├── device - ../../../opal_led
│   ├── max_brightness
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── subsystem - ../../../../../class/leds
│   ├── trigger
│   └── uevent




patch 1/3: PowerNV architecture specific code. This adds necessary
           OPAL APIs.
patch 2/3: Create LED platform device and export OPAL symbols
patch 3/3: Actual LED driver implementation for PowerNV platform.

This patchset is based on top of mpe's next branch:
  https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next

Previous patchset:
  v3: https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-April/127702.html
  v2: https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-March/126301.html
  v1: https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-March/125705.html

Changes in v4:
  - Updated macros to reflect platform.
  - s/u64/__be64/g for big endian data we get from firmware
  - Addressed review comments from Jacek. Major ones are:
Removed list in powernv_led_data structure
s/kzalloc/devm_kzalloc/
Removed compatible property from documentation
s/powernv_led_set_queue/powernv_brightness_set/
  - Removed LED specific brightness_set/get function. Instead this version
uses single function to queue all LED set/get requests. Later we use
LED name to detect LED type and value.
  - Removed hardcoded LED type used in previous version. Instead we use
led-types property to form LED classdev.

Changes in v3:
  - Addressed review comments from Jacek. Major ones are:
Replaced spin lock and mutex and removed redundant structures
Replaced pr_* with dev_*
Moved OPAL platform specific part to separate patch
Moved repeated code to common function
Added device tree documentation for LEDs

Changes in v2:
  - Rebased patches on top of mpe's next branch
https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next
  - Added System Attention Indicator support
  - Removed redundant code in leds-powernv.c file


---

Anshuman Khandual (1):
  powerpc/powernv: Add OPAL interfaces for accessing and modifying system 
LED states

Vasant Hegde (2):
  powerpc/powernv: Create LED platform device
  leds/powernv: Add driver for PowerNV platform


 .../devicetree/bindings/leds/leds-powernv.txt  |   29 +
 arch/powerpc/include/asm/opal-api.h|   29 +
 arch/powerpc/include/asm/opal.h|5 
 arch/powerpc/platforms/powernv/opal-wrappers.S |2 
 arch/powerpc/platforms/powernv/opal.c  |   12 -
 drivers/leds/Kconfig   |   11 
 drivers/leds/Makefile  |1 
 drivers/leds/leds-powernv.c|  472 
 8 files changed, 559 insertions(+), 2 deletions(-)
 create mode 100644 

[PATCH v4 3/3] leds/powernv: Add driver for PowerNV platform

2015-04-28 Thread Vasant Hegde
This patch implements LED driver for PowerNV platform using the existing
generic LED class framework.

PowerNV platform has below type of LEDs:
  - System attention
  Indicates there is a problem with the system that needs attention.
  - Identify
  Helps the user locate/identify a particular FRU or resource in the
  system.
  - Fault
  Indicates there is a problem with the FRU or resource at the
  location with which the indicator is associated.

We register classdev structures for all individual LEDs detected on the
system through LED specific device tree nodes. Each device tree node
specifies which kinds of LEDs are present at its location code, and the
driver registers a LED classdev structure for each of them.

All the system LEDs can be found in the same regular path /sys/class/leds/.
We don't use LED colors. We use LED node and led-types property to form
LED classdev. Our LEDs have names in this format.

location_code:attention|identify|fault

Any positive brightness value would turn on the LED and a zero value would
turn off the LED. The driver will return LED_FULL (255) for any turned on
LED and LED_OFF (0) for any turned off LED.
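
The naming and brightness rules described above can be summed up in a few
lines of C. This is only a sketch of the stated behaviour, not the driver
code: the helper names are made up, and LED_OFF/LED_FULL are defined locally
with the values quoted in this description.

```c
#include <stddef.h>
#include <stdio.h>

/* Values as quoted in the description above. */
#define LED_OFF		0
#define LED_FULL	255

/* Any positive brightness request turns the LED on, zero turns it off. */
static int led_request_is_on(int brightness)
{
	return brightness > 0;
}

/* The driver reports only two levels back to user space. */
static int led_state_to_brightness(int is_on)
{
	return is_on ? LED_FULL : LED_OFF;
}

/*
 * Build a classdev name of the form "location_code:type".
 * Returns the length snprintf() would have written.
 */
static int led_format_name(char *buf, size_t len,
			   const char *loc_code, const char *type)
{
	return snprintf(buf, len, "%s:%s", loc_code, type);
}
```

For example, location code U78C9.001.RST0027-P1-C1 with type "fault" gives
the name U78C9.001.RST0027-P1-C1:fault.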

As per the LED class framework, the 'brightness_set' function should not
sleep. Hence these functions have been implemented through global work
queue tasks which might sleep on OPAL async call completion.

The platform level implementation of LED get and set state has been achieved
through OPAL calls. These calls are made available for the driver by
exporting from architecture specific codes.

Signed-off-by: Vasant Hegde hegdevas...@linux.vnet.ibm.com
Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
Acked-by: Stewart Smith stew...@linux.vnet.ibm.com
Tested-by: Stewart Smith stew...@linux.vnet.ibm.com

---
Changes in v4:
  - s/u64/__be64/g for big endian data we get from firmware
  - Addressed review comments from Jacek. Major ones are:
Removed list in powernv_led_data structure
s/kzalloc/devm_kzalloc/
Removed compatible property from documentation
s/powernv_led_set_queue/powernv_brightness_set/
  - Removed LED specific brightness_set/get function. Instead this version
uses single function to queue all LED set/get requests. Later we use
LED name to detect LED type and value.
  - Removed hardcoded LED type used in previous version. Instead we use
led-types property to form LED classdev.


Changes in v3:
  - Addressed review comments from Jacek. Major ones are:
Replaced spin lock and mutex and removed redundant structures
Replaced pr_* with dev_*
Moved OPAL platform specific part to separate patch
Moved repeated code to common function
Added device tree documentation for LEDs


Changes in v2:
  - Added System Attention indicator support
  - Moved common code to powernv_led_set_queue()


 .../devicetree/bindings/leds/leds-powernv.txt  |   29 +
 drivers/leds/Kconfig   |   11 
 drivers/leds/Makefile  |1 
 drivers/leds/leds-powernv.c|  472 
 4 files changed, 513 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/leds/leds-powernv.txt
 create mode 100644 drivers/leds/leds-powernv.c

diff --git a/Documentation/devicetree/bindings/leds/leds-powernv.txt 
b/Documentation/devicetree/bindings/leds/leds-powernv.txt
new file mode 100644
index 000..6bb0e7e
--- /dev/null
+++ b/Documentation/devicetree/bindings/leds/leds-powernv.txt
@@ -0,0 +1,29 @@
+Device Tree binding for LEDs on IBM Power Systems
+-
+
+The 'led' node under '/ibm,opal' lists service indicators available in the
+system and their capabilities.
+
+led {
+	compatible = "ibm,opal-v3-led";
+	phandle = <0x106b>;
+	linux,phandle = <0x106b>;
+	led-mode = "lightpath";
+
+	U78C9.001.RST0027-P1-C1 {
+		led-types = "identify", "fault";
+		led-loc = "descendent";
+		phandle = <0x106f>;
+		linux,phandle = <0x106f>;
+	};
+	...
+	...
+};
+
+Each node under 'led' node describes location code of FRU/Enclosure.
+
+The properties under each node:
+
+  led-types : Supported LED types (attention/identify/fault).
+
+  led-loc   : enclosure/descendent(FRU) location code.
diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 25b320d..2ea0849 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -508,6 +508,17 @@ config LEDS_BLINKM
  This option enables support for the BlinkM RGB LED connected
  through I2C. Say Y to enable support for the BlinkM LED.
 
+config LEDS_POWERNV
+	tristate "LED support for PowerNV Platform"
+   depends on LEDS_CLASS
+   depends on PPC_POWERNV
+   depends on OF
+   help
+ This option enables support for the system LEDs present on
+ PowerNV platforms. Say 'y' to enable this support in kernel.
+ To 

[PATCH v4 2/3] powerpc/powernv: Create LED platform device

2015-04-28 Thread Vasant Hegde
This patch adds platform devices for LEDs. It also exports the LED related
OPAL APIs so that the LED driver can use them.

Signed-off-by: Vasant Hegde hegdevas...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/opal.c |   12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 2241565..b1951aa 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -784,7 +784,7 @@ static void opal_init_heartbeat(void)
 
 static int __init opal_init(void)
 {
-   struct device_node *np, *consoles;
+   struct device_node *np, *consoles, *led;
int rc;
 
 	opal_node = of_find_node_by_path("/ibm,opal");
@@ -813,6 +813,13 @@ static int __init opal_init(void)
/* Setup a heatbeat thread if requested by OPAL */
opal_init_heartbeat();
 
+	/* Create led platform devices */
+	led = of_find_node_by_path("/ibm,opal/led");
+	if (led) {
+		of_platform_device_create(led, "opal_led", NULL);
+		of_node_put(led);
+	}
+
/* Find all OPAL interrupts and request them */
opal_irq_init(opal_node);
 
@@ -970,3 +977,6 @@ EXPORT_SYMBOL_GPL(opal_rtc_write);
 EXPORT_SYMBOL_GPL(opal_tpo_read);
 EXPORT_SYMBOL_GPL(opal_tpo_write);
 EXPORT_SYMBOL_GPL(opal_i2c_request);
+/* Export these symbols for PowerNV LED class driver */
+EXPORT_SYMBOL_GPL(opal_leds_get_ind);
+EXPORT_SYMBOL_GPL(opal_leds_set_ind);


Re: [PATCH v4 3/3] leds/powernv: Add driver for PowerNV platform

2015-04-28 Thread Arnd Bergmann
On Tuesday 28 April 2015 15:40:35 Vasant Hegde wrote:
 +++ b/Documentation/devicetree/bindings/leds/leds-powernv.txt
 @@ -0,0 +1,29 @@
 +Device Tree binding for LEDs on IBM Power Systems
 +-
 +
 +The 'led' node under '/ibm,opal' lists service indicators available in the
 +system and their capabilities.
 +
 +led {
 +   compatible = "ibm,opal-v3-led";
 +   phandle = <0x106b>;
 +   linux,phandle = <0x106b>;
 +   led-mode = "lightpath";
 +
 +   U78C9.001.RST0027-P1-C1 {
 +   led-types = "identify", "fault";
 +   led-loc = "descendent";
 +   phandle = <0x106f>;
 +   linux,phandle = <0x106f>;
 +   };
 +   ...
 +   ...
 +};

We normally don't list the 'phandle' or 'linux,phandle' properties in the
binding description.

 +
 +Each node under 'led' node describes location code of FRU/Enclosure.
 +
 +The properties under each node:
 +
 +  led-types : Supported LED types (attention/identify/fault).
 +
 +  led-loc   : enclosure/descendent(FRU) location code.
 

Could you use the standard 'label' property for this?

Arnd

Re: [PATCH kernel] commit 4fbdf9cb (lpfc: Fix for lun discovery issue with saturn adapter.)

2015-04-28 Thread Alexey Kardashevskiy

On 04/28/2015 07:18 PM, Sebastian Herbszt wrote:

Alexey Kardashevskiy wrote:

This reverts 4fbdf9cb as it breaks LPFC on a POWER7 machine with a big endian kernel.

This is the hardware used for verification:
0005:01:00.0 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse 
Fibre Channel Host Adapter [10df:f100] (rev 03)
0005:01:00.1 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse 
Fibre Channel Host Adapter [10df:f100] (rev 03)

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru


This issue is not specific to POWER7. I hit it on x86 [1] and James
promised to look at it.

[1] http://marc.info/?l=linux-scsi&m=142938432414173

Sebastian


Well, I hope so, I just wanted to be more specific and the fault looks much 
different (and much cooler! :) ) on my hardware (it actually enters an 
infinite loop of oops'es):




Welcome to Fedora 20 (Heisenbug)!

INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched self-detected stall on CPU


1: (2100 ticks this GP) idle=981/141/0 softirq=234/234 fqs=2083
2: (2100 ticks this GP) idle=c3d/141/0 softirq=259/259 fqs=2083

 (t=2100 jiffies g=-7 c=-8 q=11820)
 (t=2100 jiffies g=-7 c=-8 q=11820)
Task dump for CPU 0:
kworker/u97:0   R  running task 8192 7  2 0x0804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c00ffa29ef80] [c00ffa29f060] 0xc00ffa29f060 (unreliable)
Task dump for CPU 1:
kworker/u97:2   R  running task10304  1636  2 0x0804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c00ff2fd2f80] [c00ff2fd3060] 0xc00ff2fd3060 (unreliable)
Task dump for CPU 2:
kworker/u97:1   R  running task 8288  1633  2 0x0804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c00ff2f92eb0] [c00cf610] .sched_show_task+0xf0/0x180 (unreliable)
[c00ff2f92f30] [c01041d8] .rcu_dump_cpu_stacks+0xd8/0x150
[c00ff2f92fd0] [c0108794] .rcu_check_callbacks+0x674/0x990
[c00ff2f93110] [c010e994] .update_process_times+0x44/0x90
[c00ff2f93190] [c01223f0] .tick_sched_handle.isra.16+0x20/0xa0
[c00ff2f93210] [c01224cc] .tick_sched_timer+0x5c/0xb0
[c00ff2f932b0] [c010f108] .__run_hrtimer+0x98/0x260
[c00ff2f93350] [c010fff8] .hrtimer_interrupt+0x138/0x2f0
[c00ff2f93460] [c001be1c] .__timer_interrupt+0x8c/0x230
[c00ff2f93500] [c001c488] .timer_interrupt+0x98/0xd0
[c00ff2f93580] [c00025d0] decrementer_common+0x150/0x180
--- interrupt: 901 at .string_get_size+0x120/0x250
LR = .sd_revalidate_disk+0x57c/0x1c10
[c00ff2f93870] [c048f84c] .string_get_size+0x18c/0x250 (unreliable)
[c00ff2f93940] [c05e7c1c] .sd_revalidate_disk+0x57c/0x1c10
[c00ff2f93a70] [c05e951c] .sd_probe_async+0xac/0x230
[c00ff2f93b00] [c00c28ec] .async_run_entry_fn+0x6c/0x180
[c00ff2f93ba0] [c00b7b78] .process_one_work+0x1a8/0x4a0
[c00ff2f93c40] [c00b7ff0] .worker_thread+0x180/0x5a0
[c00ff2f93d30] [c00bee08] .kthread+0x108/0x130
[c00ff2f93e30] [c0009590] .ret_from_kernel_thread+0x58/0xc8
Task dump for CPU 0:
kworker/u97:0   R  running task 8192 7  2 0x0804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c00ffa29ef80] [c00ffa29f060] 0xc00ffa29f060 (unreliable)
Task dump for CPU 1:
kworker/u97:2   R  running task 9488  1636  2 0x0804
Workqueue: events_unbound .async_run_entry_fn
Call Trace:
[c00ff2fd2eb0] [c00cf610] .sched_show_task+0xf0/0x180 (unreliable)
[c00ff2fd2f30] [c01041d8] .rcu_dump_cpu_stacks+0xd8/0x150
[c00ff2fd2fd0] [c0108794] .rcu_check_callbacks+0x674/0x990
[c00ff2fd3110] [c010e994] .update_process_times+0x44/0x90
[c00ff2fd3190] [c01223f0] .tick_sched_handle.isra.16+0x20/0xa0
[c00ff2fd3210] [c01224cc] .tick_sched_timer+0x5c/0xb0
[c00ff2fd32b0] [c010f108] .__run_hrtimer+0x98/0x260
[c00ff2fd3350] [c010fff8] .hrtimer_interrupt+0x138/0x2f0
[c00ff2fd3460] [c001be1c] .__timer_interrupt+0x8c/0x230
[c00ff2fd3500] [c001c488] .timer_interrupt+0x98/0xd0
[c00ff2fd3580] [c00025d0] decrementer_common+0x150/0x180
--- interrupt: 901 at .string_get_size+0x110/0x250
LR = .sd_revalidate_disk+0x57c/0x1c10
[c00ff2fd3870] [c048f84c] .string_get_size+0x18c/0x250 (unreliable)
[c00ff2fd3940] [c05e7c1c] .sd_revalidate_disk+0x57c/0x1c10
[c00ff2fd3a70] [c05e951c] .sd_probe_async+0xac/0x230
[c00ff2fd3b00] [c00c28ec] .async_run_entry_fn+0x6c/0x180
[c00ff2fd3ba0] [c00b7b78] .process_one_work+0x1a8/0x4a0
[c00ff2fd3c40] [c00b7ff0] .worker_thread+0x180/0x5a0
[c00ff2fd3d30] [c00bee08] .kthread+0x108/0x130
[c00ff2fd3e30] [c0009590] 

Re: [PATCH kernel] commit 4fbdf9cb (lpfc: Fix for lun discovery issue with saturn adapter.)

2015-04-28 Thread Sebastian Herbszt
Alexey Kardashevskiy wrote:
 This reverts 4fbdf9cb, as it breaks LPFC on a POWER7 machine with a big-endian kernel.
 
 This is the hardware used for verification:
 0005:01:00.0 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse 
 Fibre Channel Host Adapter [10df:f100] (rev 03)
 0005:01:00.1 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse 
 Fibre Channel Host Adapter [10df:f100] (rev 03)
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

This issue is not specific to POWER7. I hit it on x86 [1] and James
promised to look at it.

[1] http://marc.info/?l=linux-scsi&m=142938432414173

Sebastian

Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-28 Thread Jacek Anaszewski

On 04/28/2015 08:59 AM, Stewart Smith wrote:

Jacek Anaszewski j.anaszewsk...@gmail.com writes:

Is the DT node we are discussing used by some other drivers than the
LED class driver? Or is it required in this form by other components of
your platform?


OS kernels are the chief consumers, Linux being the overwhelmingly major
one here.

But this is what firmware currently produces.

Changing the DT representation at this stage is perhaps *possible*
without creating a bunch of pain (I'd have to audit a bunch of things to
see if we have GA shipping systems with this functionality for instance,
and then evaluate the impact to partners and our various labs) which is
a lot of work I don't particularly want to do and is well below urgent
item 248 on my TODO list, especially for what seems to be largely a
cosmetic suggestion?

That being said, more and better review of things we're putting in the
device tree in firmware is probably a good thing. After all, once we
release we do kind of have to live with it essentially forever. If
people are able to aid in that kind of code review, I'd very much
welcome it.




As we agreed with Ben, DT bindings are platform specific and they will
have to be acked by the powerpc maintainers. My only DT-related
experience is from ARM-based platforms, and it does not seem to be
applicable to powerpc.

--
Best Regards,
Jacek Anaszewski

[PATCH v2 1/2] powerpc/powernv: Add definition of OPAL_MSG_OCC message type

2015-04-28 Thread Shilpasri G Bhat
Add the OPAL_MSG_OCC message definition to enum opal_msg_type to receive
OCC events such as reset, load and throttle. Host performance can be
affected when the OCC is reset or when the OCC throttles the max Pstate.
We can register with opal_message_notifier to receive messages of type
OPAL_MSG_OCC and report them to userspace, so as to keep the user
informed about the reason for a performance drop in workloads.

The reset and load OCC events are reported to the kernel when the FSP
sends the OCC_RESET and OCC_LOAD commands. The reset and load messages
are sent to the kernel on successful completion of the reset and load
operations respectively.

The throttle OCC event indicates that the Pmax of the chip is reduced.
The chip_id and the throttle reason for reducing Pmax are also queued
along with the message.

An additional opal message type, OPAL_MSG_PRD, is added to keep the
kernel definition of enum opal_msg_type compatible with the opal one.

Signed-off-by: Shilpasri G Bhat shilpa.b...@linux.vnet.ibm.com
Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
---
Changes from v1:
- Update the commit changelog

 arch/powerpc/include/asm/opal-api.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..50053b7 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -352,6 +352,14 @@ enum opal_msg_type {
OPAL_MSG_SHUTDOWN,  /* params[0] = 1 reboot, 0 shutdown */
OPAL_MSG_HMI_EVT,
OPAL_MSG_DPO,
+   OPAL_MSG_PRD,
+   OPAL_MSG_OCC,   /*
+* params[0] = 0 reset,
+* 1 load,
+* 2 throttle
+* params[1] = chip_id
+* params[2] = throttle_status
+*/
OPAL_MSG_TYPE_MAX,
 };
 
-- 
1.9.3
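
As an illustration of the params[] layout documented in the enum comment
above, here is a minimal userspace sketch of how a consumer might decode
an OPAL_MSG_OCC payload. It is not the driver code from patch 2/2. The
names occ_event, occ_msg and occ_describe are hypothetical, and the
sketch assumes the parameters have already been converted to host byte
order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for the params[0] values listed in the patch comment. */
enum occ_event {
	OCC_EVT_RESET = 0,
	OCC_EVT_LOAD = 1,
	OCC_EVT_THROTTLE = 2,
};

/* Simplified stand-in for the message payload; values are host-endian here. */
struct occ_msg {
	uint64_t params[3];	/* [0] event, [1] chip_id, [2] throttle_status */
};

/* Write a one-line description into buf; return 0, or -1 if unknown. */
static int occ_describe(const struct occ_msg *msg, char *buf, size_t len)
{
	switch (msg->params[0]) {
	case OCC_EVT_RESET:
		snprintf(buf, len, "OCC reset");
		return 0;
	case OCC_EVT_LOAD:
		snprintf(buf, len, "OCC load");
		return 0;
	case OCC_EVT_THROTTLE:
		/* chip_id and throttle_status only accompany throttle events */
		snprintf(buf, len, "OCC throttle: chip %llu status %llu",
			 (unsigned long long)msg->params[1],
			 (unsigned long long)msg->params[2]);
		return 0;
	default:
		snprintf(buf, len, "unknown OCC event");
		return -1;
	}
}
```

A caller would fill params[] from the received message and print the
resulting string. Rejecting unknown event codes keeps the decoder safe
if firmware later adds new values.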


[PATCH v2 0/2] powernv: cpufreq: Report frequency throttle by OCC

2015-04-28 Thread Shilpasri G Bhat
This patchset adds a frequency throttle reporting mechanism to the
powernv-cpufreq driver for when the OCC throttles the frequency. The OCC
is an On-Chip-Controller which takes care of the power and thermal safety
of the chip. The CPU frequency can be throttled during an OCC reset or
when the OCC tries to limit the max allowed frequency. The patchset
reports such conditions so as to keep the user informed about the reason
for the drop in performance of workloads when the frequency is throttled.

Shilpasri G Bhat (2):
  powerpc/powernv: Add definition of OPAL_MSG_OCC message type
  cpufreq: powernv: Register for OCC related opal_message notification

 arch/powerpc/include/asm/opal-api.h |   8 ++
 drivers/cpufreq/powernv-cpufreq.c   | 181 +---
 2 files changed, 174 insertions(+), 15 deletions(-)

-- 
1.9.3


Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-28 Thread Stewart Smith
Jacek Anaszewski j.anaszewsk...@gmail.com writes:
 Is the DT node we are discussing used by some other drivers than the
 LED class driver? Or is it required in this form by other components of
 your platform?

OS kernels are the chief consumers, Linux being the overwhelmingly major
one here.

But this is what firmware currently produces.

Changing the DT representation at this stage is perhaps *possible*
without creating a bunch of pain (I'd have to audit a bunch of things to
see if we have GA shipping systems with this functionality for instance,
and then evaluate the impact to partners and our various labs) which is
a lot of work I don't particularly want to do and is well below urgent
item 248 on my TODO list, especially for what seems to be largely a
cosmetic suggestion?

That being said, more and better review of things we're putting in the
device tree in firmware is probably a good thing. After all, once we
release we do kind of have to live with it essentially forever. If
people are able to aid in that kind of code review, I'd very much
welcome it.


RE: [PATCH] QorIQ/TMU: add thermal management support based on TMU

2015-04-28 Thread Hongtao Jia
Hello Rui Zhang,

Please help to review this Thermal Management driver.

Thanks.

---
Best Regards,
Hongtao

 -----Original Message-----
 From: Jia Hongtao [mailto:hongtao@freescale.com]
 Sent: Friday, April 03, 2015 3:11 PM
 To: rui.zh...@intel.com
 Cc: linux...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-
 B07421; Jia Hongtao-B38951
 Subject: [PATCH] QorIQ/TMU: add thermal management support based on TMU
 
 It supports one critical trip point and one passive trip point.
 The cpufreq is used as the cooling device to throttle CPUs when
 the passive trip is crossed.
 
 Signed-off-by: Jia Hongtao hongtao@freescale.com
 ---
  drivers/thermal/Kconfig |  11 ++
  drivers/thermal/Makefile|   1 +
  drivers/thermal/qoriq_thermal.c | 405
 
  3 files changed, 417 insertions(+)
  create mode 100644 drivers/thermal/qoriq_thermal.c
 
 diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
 index af40db0..c0a8bd1 100644
 --- a/drivers/thermal/Kconfig
 +++ b/drivers/thermal/Kconfig
 @@ -147,6 +147,17 @@ config IMX_THERMAL
 cpufreq is used as the cooling device to throttle CPUs when the
 passive trip is crossed.
 
 +config QORIQ_THERMAL
 + tristate "Freescale QorIQ Thermal Monitoring Unit"
 + depends on CPU_THERMAL
 + depends on OF
 + default n
 + help
 +   Enable thermal management based on Freescale QorIQ Thermal Monitoring
 +   Unit (TMU). It supports one critical trip point and one passive trip
 +   point. The cpufreq is used as the cooling device to throttle CPUs when
 +   the passive trip is crossed.
 +
  config SPEAR_THERMAL
   bool "SPEAr thermal sensor driver"
   depends on PLAT_SPEAR
 diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
 index fa0dc48..7de4847 100644
 --- a/drivers/thermal/Makefile
 +++ b/drivers/thermal/Makefile
 @@ -31,6 +31,7 @@ obj-$(CONFIG_DOVE_THERMAL)  += dove_thermal.o
  obj-$(CONFIG_DB8500_THERMAL) += db8500_thermal.o
  obj-$(CONFIG_ARMADA_THERMAL) += armada_thermal.o
  obj-$(CONFIG_IMX_THERMAL)+= imx_thermal.o
 +obj-$(CONFIG_QORIQ_THERMAL)  += qoriq_thermal.o
  obj-$(CONFIG_DB8500_CPUFREQ_COOLING) += db8500_cpufreq_cooling.o
  obj-$(CONFIG_INTEL_POWERCLAMP)   += intel_powerclamp.o
  obj-$(CONFIG_X86_PKG_TEMP_THERMAL)   += x86_pkg_temp_thermal.o
 diff --git a/drivers/thermal/qoriq_thermal.c
 b/drivers/thermal/qoriq_thermal.c
 new file mode 100644
 index 000..f5d3a2c
 --- /dev/null
 +++ b/drivers/thermal/qoriq_thermal.c
 @@ -0,0 +1,405 @@
 +/*
 + * Copyright 2015 Freescale Semiconductor, Inc.
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + */
 +
 +/*
 + * Based on Freescale QorIQ Thermal Monitoring Unit (TMU)
 + */
 +#include <linux/cpufreq.h>
 +#include <linux/cpu_cooling.h>
 +#include <linux/module.h>
 +#include <linux/platform_device.h>
 +#include <linux/err.h>
 +#include <linux/io.h>
 +#include <linux/of.h>
 +#include <linux/thermal.h>
 +
 +#define SITES_MAX 16
 +
 +#define TMU_TEMP_PASSIVE 85000
 +#define TMU_TEMP_CRITICAL 95000
 +
 +#define TMU_PASSIVE_DELAY 1000 /* Milliseconds */
 +#define TMU_POLLING_DELAY 5000
 +
 +/* The driver supports 1 passive trip point and 1 critical trip point */
 +enum tmu_thermal_trip {
 + TMU_TRIP_PASSIVE,
 + TMU_TRIP_CRITICAL,
 + TMU_TRIP_NUM,
 +};
 +
 +/*
 + * QorIQ TMU Registers
 + */
 +struct qoriq_tmu_site_regs {
 + __be32 tritsr;  /* Immediate Temperature Site Register */
 + __be32 tratsr;  /* Average Temperature Site Register */
 + u8 res0[0x8];
 +} __packed;
 +
 +struct qoriq_tmu_regs {
 + __be32 tmr; /* Mode Register */
 +#define TMR_DISABLE  0x0
 +#define TMR_ME   0x8000
 +#define TMR_ALPF 0x0c00
 +#define TMR_MSITE 0x8000
 +#define TMR_ALL  (TMR_ME | TMR_ALPF | TMR_MSITE)
 + __be32 tsr; /* Status Register */
 + __be32 tmtmir;  /* Temperature measurement interval Register */
 +#define TMTMIR_DEFAULT   0x0007
 + u8 res0[0x14];
 + __be32 tier;/* Interrupt Enable Register */
 +#define TIER_DISABLE 0x0
 + __be32 tidr;/* Interrupt Detect Register */
 + __be32 tiscr;   /* Interrupt Site Capture Register */
 + __be32 ticscr;  /* Interrupt Critical Site Capture Register */
 + u8 res1[0x10];
 + __be32 tmhtcrh; /* High Temperature Capture Register */
 + __be32 tmhtcrl; /* Low Temperature Capture Register */
 + u8 
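
The quoted patch is cut off here in the archive. To illustrate the two
trip points it describes, here is a minimal userspace sketch (not taken
from the driver) that classifies a temperature reading against the
TMU_TEMP_PASSIVE and TMU_TEMP_CRITICAL values defined above. The enum
tmu_action and the function tmu_classify are hypothetical names. The
sketch assumes temperatures in millidegrees Celsius, and that a trip
counts as crossed once the reading reaches the threshold.

```c
/* Trip temperatures from the quoted patch, in millidegrees Celsius. */
#define TMU_TEMP_PASSIVE	85000
#define TMU_TEMP_CRITICAL	95000

/* Hypothetical result type: what a governor would do for a reading. */
enum tmu_action {
	TMU_ACT_NONE,		/* below both trip points */
	TMU_ACT_THROTTLE,	/* passive trip crossed: throttle via cpufreq */
	TMU_ACT_SHUTDOWN,	/* critical trip crossed */
};

/* Check the critical trip first, since it implies the passive one. */
static enum tmu_action tmu_classify(int temp_mc)
{
	if (temp_mc >= TMU_TEMP_CRITICAL)
		return TMU_ACT_SHUTDOWN;
	if (temp_mc >= TMU_TEMP_PASSIVE)
		return TMU_ACT_THROTTLE;
	return TMU_ACT_NONE;
}
```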

Re: [PATCH] powerpc/kvm: Fix SMP=n build error in book3s_xics.c

2015-04-28 Thread Alexander Graf


On 28.04.15 02:42, Michael Ellerman wrote:
 Commit 34cb7954c0aa ("Convert ICS mutex lock to spin lock") added an
 include of <asm/spinlock.h>, which does not work in the SMP=n case.
 
 It should instead include <linux/spinlock.h>.
 
 Fixes: 34cb7954c0aa ("KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock")
 Signed-off-by: Michael Ellerman m...@ellerman.id.au

Reviewed-by: Alexander Graf ag...@suse.de


Alex

[PATCH kernel] commit 4fbdf9cb (lpfc: Fix for lun discovery issue with saturn adapter.)

2015-04-28 Thread Alexey Kardashevskiy
This reverts 4fbdf9cb, as it breaks LPFC on a POWER7 machine with a big-endian kernel.

This is the hardware used for verification:
0005:01:00.0 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse 
Fibre Channel Host Adapter [10df:f100] (rev 03)
0005:01:00.1 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse 
Fibre Channel Host Adapter [10df:f100] (rev 03)

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 drivers/scsi/lpfc/lpfc_scsi.c | 41 +
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index cb73cf9..c140f99 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -1130,25 +1130,6 @@ lpfc_release_scsi_buf(struct lpfc_hba *phba, struct 
lpfc_scsi_buf *psb)
 }
 
 /**
- * lpfc_fcpcmd_to_iocb - copy the fcp_cmd data into the IOCB
- * @data: A pointer to the immediate command data portion of the IOCB.
- * @fcp_cmnd: The FCP Command that is provided by the SCSI layer.
- *
- * The routine copies the entire FCP command from @fcp_cmnd to @data while
- * byte swapping the data to big endian format for transmission on the wire.
- **/
-static void
-lpfc_fcpcmd_to_iocb(uint8_t *data, struct fcp_cmnd *fcp_cmnd)
-{
-   int i, j;
-
-   for (i = 0, j = 0; i < sizeof(struct fcp_cmnd);
-i += sizeof(uint32_t), j++) {
-   ((uint32_t *)data)[j] = cpu_to_be32(((uint32_t *)fcp_cmnd)[j]);
-   }
-}
-
-/**
  * lpfc_scsi_prep_dma_buf_s3 - DMA mapping for scsi buffer to SLI3 IF spec
  * @phba: The Hba for which this call is being executed.
  * @lpfc_cmd: The scsi buffer which is going to be mapped.
@@ -1283,7 +1264,6 @@ lpfc_scsi_prep_dma_buf_s3(struct lpfc_hba *phba, struct 
lpfc_scsi_buf *lpfc_cmd)
 * we need to set word 4 of IOCB here
 */
iocb_cmd->un.fcpi.fcpi_parm = scsi_bufflen(scsi_cmnd);
-   lpfc_fcpcmd_to_iocb(iocb_cmd->unsli3.fcp_ext.icd, fcp_cmnd);
return 0;
 }
 
@@ -4147,6 +4127,24 @@ lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct 
lpfc_iocbq *pIocbIn,
 }
 
 /**
+ * lpfc_fcpcmd_to_iocb - copy the fcp_cmd data into the IOCB
+ * @data: A pointer to the immediate command data portion of the IOCB.
+ * @fcp_cmnd: The FCP Command that is provided by the SCSI layer.
+ *
+ * The routine copies the entire FCP command from @fcp_cmnd to @data while
+ * byte swapping the data to big endian format for transmission on the wire.
+ **/
+static void
+lpfc_fcpcmd_to_iocb(uint8_t *data, struct fcp_cmnd *fcp_cmnd)
+{
+   int i, j;
+   for (i = 0, j = 0; i < sizeof(struct fcp_cmnd);
+i += sizeof(uint32_t), j++) {
+   ((uint32_t *)data)[j] = cpu_to_be32(((uint32_t *)fcp_cmnd)[j]);
+   }
+}
+
+/**
  * lpfc_scsi_prep_cmnd - Wrapper func for convert scsi cmnd to FCP info unit
  * @vport: The virtual port for which this call is being executed.
  * @lpfc_cmd: The scsi command which needs to send.
@@ -4225,6 +4223,9 @@ lpfc_scsi_prep_cmnd(struct lpfc_vport *vport, struct 
lpfc_scsi_buf *lpfc_cmd,
fcp_cmnd->fcpCntl3 = 0;
phba->fc4ControlRequests++;
}
+   if (phba->sli_rev == 3 &&
+   !(phba->sli3_options & LPFC_SLI3_BG_ENABLED))
+   lpfc_fcpcmd_to_iocb(iocb_cmd->unsli3.fcp_ext.icd, fcp_cmnd);
/*
 * Finish initializing those IOCB fields that are independent
 * of the scsi_cmnd request_buffer
-- 
2.0.0
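
For reference, the word-by-word copy done by lpfc_fcpcmd_to_iocb() can
be sketched in userspace as follows. This is only an illustration: it
uses htonl() as a stand-in for the kernel's cpu_to_be32(), and
copy_words_be32 is a hypothetical name. It also goes through memcpy()
instead of casting the buffers to uint32_t pointers, which is a choice
of this sketch and not something the driver does.

```c
#include <arpa/inet.h>	/* htonl(): userspace stand-in for cpu_to_be32() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes from src to dst one 32-bit word at a time, storing each
 * word in big-endian byte order. Trailing bytes that do not fill a whole
 * word are not copied.
 */
static void copy_words_be32(uint8_t *dst, const void *src, size_t len)
{
	size_t off;

	for (off = 0; off + sizeof(uint32_t) <= len; off += sizeof(uint32_t)) {
		uint32_t word;

		memcpy(&word, (const uint8_t *)src + off, sizeof(word));
		word = htonl(word);	/* no-op on a big-endian host */
		memcpy(dst + off, &word, sizeof(word));
	}
}
```

On a big-endian host the conversion does nothing and the routine
degenerates to a plain copy. On a little-endian host the four bytes of
each word are reversed.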


[PATCH] Correct cpu affinity for dlpar added cpus

2015-04-28 Thread Nathan Fontenot
The incorrect ordering of operations during cpu dlpar causes the affinity
of cpus being added to be invalid. Phyp does not assign affinity information
for a cpu until the rtas set-indicator calls are made to set the isolation
and allocation state. In the current code we call rtas configure-connector
before making the set-indicator calls, which results in invalid data in the
ibm,associativity property for the cpu we're adding.

This patch corrects the order of operations to make the set-indicator
calls (done in acquire_drc) before calling configure-connector.

Signed-off-by: Nathan Fontenot nf...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/dlpar.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index b4b1109..019d34a 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -412,6 +412,10 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t 
count)
if (rc)
return -EINVAL;
 
+   rc = dlpar_acquire_drc(drc_index);
+   if (rc)
+   return -EINVAL;
+
parent = of_find_node_by_path("/cpus");
if (!parent)
return -ENODEV;
@@ -422,12 +426,6 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t 
count)
 
of_node_put(parent);
 
-   rc = dlpar_acquire_drc(drc_index);
-   if (rc) {
-   dlpar_free_cc_nodes(dn);
-   return -EINVAL;
-   }
-
rc = dlpar_attach_node(dn);
if (rc) {
dlpar_release_drc(drc_index);
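
The corrected ordering can be sketched in userspace as below. Every
function here is a hypothetical stub that only records that it ran; none
of them is the kernel's dlpar code. The stubs stand in for acquiring the
DRC (the set-indicator calls), configure-connector, and attaching the
node. Releasing the DRC when configure-connector fails is an assumption
of this sketch and is not something the diff above shows.

```c
#include <stddef.h>

/* Hypothetical stubs: each one only logs a letter when it is called. */
static char call_log[16];
static size_t call_len;
static int fail_configure;	/* set to 1 to simulate a configure failure */

static void log_call(char c)
{
	if (call_len < sizeof(call_log) - 1)
		call_log[call_len++] = c;
	call_log[call_len] = '\0';
}

static int acquire_drc(void)         { log_call('A'); return 0; }
static int configure_connector(void) { log_call('C'); return fail_configure ? -1 : 0; }
static int attach_node(void)         { log_call('N'); return 0; }
static void release_drc(void)        { log_call('R'); }

/* Acquire first, so affinity data is valid when the node is configured. */
static int cpu_probe_sketch(void)
{
	if (acquire_drc())
		return -1;

	if (configure_connector()) {
		release_drc();	/* assumption of this sketch, see lead-in */
		return -1;
	}

	if (attach_node()) {
		release_drc();
		return -1;
	}
	return 0;
}
```

After a successful call, call_log reads "ACN": acquire, then configure,
then attach.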
