Re: [V2] powerpc/Kconfig: Update config option based on page size.

2016-09-15 Thread Michael Ellerman
santhosh  writes:

>> Michael Ellerman  writes:
>>
>>> On Fri, 2016-02-19 at 05:38:47 UTC, Rashmica Gupta wrote:
 Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
 FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.

>>> ...
 So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 8-9 for 64K pages
 and from 13-64 to 9-13 for 4K pages.
>>>
>>> https://git.kernel.org/powerpc/c/a7ee539584acf4a565b7439cea
>>>
>> HPAGE_PMD_ORDER is not something we should check w.r.t 4k linux page
>> size. We do have the below constraint w.r.t hugetlb pages
>>
>> static inline bool hstate_is_gigantic(struct hstate *h)
>> {
>>  return huge_page_order(h) >= MAX_ORDER;
>> }
>>
>> That requires MAX_ORDER to be greater than 12.

So have you tried that fix?

cheers


[PATCH v21 19/19] perf, tools: Allow period= in perf stat CPU event descriptions.

2016-09-15 Thread Sukadev Bhattiprolu
This avoids the JSON PMU events parser having to know whether its aliases
are for perf stat or perf record.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Ingo Molnar 
---
 tools/perf/util/parse-events.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1abda10..b675273 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -923,6 +923,7 @@ config_term_avail(int term_type, struct parse_events_error *err)
case PARSE_EVENTS__TERM_TYPE_CONFIG1:
case PARSE_EVENTS__TERM_TYPE_CONFIG2:
case PARSE_EVENTS__TERM_TYPE_NAME:
+   case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
return true;
default:
if (!err)
-- 
1.8.3.1



[PATCH v21 18/19] perf, tools, pmu-events: Add Skylake frontend MSR support

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add support for the "frontend" extra MSR on Skylake in the JSON
conversion.

Signed-off-by: Andi Kleen 
Acked-by: Ingo Molnar 
---
 tools/perf/pmu-events/jevents.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index e8e2a87..5846057 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -126,6 +126,7 @@ static struct msrmap {
{ "0x3F6", "ldlat=" },
{ "0x1A6", "offcore_rsp=" },
{ "0x1A7", "offcore_rsp=" },
+   { "0x3F7", "frontend=" },
{ NULL, NULL }
 };
 
-- 
1.8.3.1



[PATCH v21 17/19] perf, tools, pmu-events: Fix fixed counters on Intel

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

The JSON event lists use a different encoding for fixed counters
than perf for instructions and cycles (ref-cycles is ok).

This led to some common events like inst_retired.any
or cpu_clk_unhalted.thread not counting when specified with their
JSON name.

Special case these events in the jevents conversion process.
I prefer to not touch the JSON files for this, as it's intended
that standard JSON files can be just dropped into the perf
build without changes.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
[Fix minor compile error]
Acked-by: Ingo Molnar 
---
Changelog[v21]:
Fix minor conflict in tools/perf/pmu-events/jevents.c
---
 tools/perf/pmu-events/jevents.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index 9cdfbaa..e8e2a87 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -305,6 +305,29 @@ static void print_events_table_suffix(FILE *outfp)
close_table = 0;
 }
 
+static struct fixed {
+   const char *name;
+   const char *event;
+} fixed[] = {
+   { "inst_retired.any", "event=0xc0" },
+   { "cpu_clk_unhalted.thread", "event=0x3c" },
+   { "cpu_clk_unhalted.thread_any", "event=0x3c,any=1" },
+   { NULL, NULL},
+};
+
+/*
+ * Handle different fixed counter encodings between JSON and perf.
+ */
+static char *real_event(const char *name, char *event)
+{
+   int i;
+
+   for (i = 0; fixed[i].name; i++)
+   if (!strcasecmp(name, fixed[i].name))
+   return (char *)fixed[i].event;
+   return event;
+}
+
 /* Call func with each event in the json file */
 int json_events(const char *fn,
  int (*func)(void *data, char *name, char *event, char *desc,
@@ -391,7 +414,7 @@ int json_events(const char *fn,
			addfield(map, &event, ",", msr->pname, msrval);
fixname(name);
 
-   err = func(data, name, event, desc, long_desc);
+		err = func(data, name, real_event(name, event), desc, long_desc);
free(event);
free(desc);
free(name);
-- 
1.8.3.1



[PATCH v21 16/19] perf, tools: Make alias matching case-insensitive

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Make alias matching in the events parser case-insensitive. This is
useful with the JSON events: perf uses lower-case event names, but the
CPU manuals generally use upper case, and the JSON files default to
lower case too. Searching case-insensitively lets users cut-and-paste
the upper-case event names.

So the following works:

% perf stat -e BR_INST_EXEC.TAKEN_INDIRECT_NEAR_CALL true

 Performance counter stats for 'true':

   305  BR_INST_EXEC.TAKEN_INDIRECT_NEAR_CALL

   0.000492799 seconds time elapsed

Signed-off-by: Andi Kleen 
Acked-by: Ingo Molnar 
---
 tools/perf/util/parse-events.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 9abd60d..1abda10 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1453,7 +1453,7 @@ comp_pmu(const void *p1, const void *p2)
	struct perf_pmu_event_symbol *pmu1 = (struct perf_pmu_event_symbol *) p1;
	struct perf_pmu_event_symbol *pmu2 = (struct perf_pmu_event_symbol *) p2;
 
-   return strcmp(pmu1->symbol, pmu2->symbol);
+   return strcasecmp(pmu1->symbol, pmu2->symbol);
 }
 
 static void perf_pmu__parse_cleanup(void)
-- 
1.8.3.1



[PATCH v21 15/19] perf, tools: Add README for info on parsing JSON/map files

2016-09-15 Thread Sukadev Bhattiprolu
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---
Changelog[v21]
- Update README to reflect the Topics.json directory tree layout.
---
 tools/perf/pmu-events/README | 148 +++
 1 file changed, 148 insertions(+)
 create mode 100644 tools/perf/pmu-events/README

diff --git a/tools/perf/pmu-events/README b/tools/perf/pmu-events/README
new file mode 100644
index 000..c5ee208e
--- /dev/null
+++ b/tools/perf/pmu-events/README
@@ -0,0 +1,148 @@
+
+The contents of this directory allow users to specify PMU events in their
+CPUs by their symbolic names rather than raw event codes (see example below).
+
+The main program in this directory, 'jevents', is built and
+executed _BEFORE_ the perf binary itself is built.
+
+The 'jevents' program tries to locate and process JSON files in the directory
+tree tools/perf/pmu-events/arch/foo.
+
+   - Regular files with '.json' extension in the name are assumed to be
+ JSON files, each of which describes a set of PMU events.
+
+   - Regular files with basename starting with 'mapfile.csv' are assumed
+     to be CSV files, each mapping a specific CPU to its set of PMU events.
+     (see below for mapfile format)
+
+   - Directories are traversed, but all other files are ignored.
+
+The PMU events supported by a CPU model are expected to be grouped into topics
+such as Pipelining, Cache, Memory, Floating-point etc. All events for a topic
+should be placed in a separate JSON file - where the file name identifies
+the topic. Eg: "Floating-point.json".
+
+All the topic JSON files for a CPU model/family should be in a separate
+subdirectory. Thus for the Silvermont X86 CPU:
+
+   $ ls tools/perf/pmu-events/arch/x86/Silvermont_core
+   Cache.json  Memory.json Virtual-Memory.json
+   Frontend.json   Pipeline.json
+
+Using the JSON files and the mapfile, 'jevents' generates the C source file,
+'pmu-events.c', which encodes the two sets of tables:
+
+   - Set of 'PMU events tables' for all known CPUs in the architecture,
+ (one table like the following, per JSON file; table name 'pme_power8'
+ is derived from JSON file name, 'power8.json').
+
+   struct pmu_event pme_power8[] = {
+
+   ...
+
+   {
+   .name = "pm_1plus_ppc_cmpl",
+   .event = "event=0x100f2",
+   .desc = "1 or more ppc insts finished,",
+   },
+
+   ...
+   }
+
+   - A 'mapping table' that maps each CPU of the architecture, to its
+ 'PMU events table'
+
+   struct pmu_events_map pmu_events_map[] = {
+   {
+   .cpuid = "004b",
+   .version = "1",
+   .type = "core",
+   .table = pme_power8
+   },
+   ...
+
+   };
+
+After the 'pmu-events.c' is generated, it is compiled and the resulting
+'pmu-events.o' is added to 'libperf.a' which is then used to build perf.
+
+NOTES:
+   1. Several CPUs can support the same set of events and hence use a common
+      JSON file. Hence several entries in the pmu_events_map[] could map
+      to a single 'PMU events table'.
+
+   2. The 'pmu-events.h' has an extern declaration for the mapping table
+  and the generated 'pmu-events.c' defines this table.
+
+   3. _All_ known CPU tables for the architecture are included in the perf
+  binary.
+
+At run time, perf determines the actual CPU it is running on, finds the
+matching events table and builds aliases for those events. This allows
+users to specify events by their name:
+
+   $ perf stat -e pm_1plus_ppc_cmpl sleep 1
+
+where 'pm_1plus_ppc_cmpl' is a Power8 PMU event.
+
+In case of errors when processing files in the tools/perf/pmu-events/arch
+directory, 'jevents' tries to create an empty mapping file to allow the perf
+build to succeed even if the PMU event aliases cannot be used.
+
+However some errors in processing may cause the perf build to fail.
+
+Mapfile format
+==============
+
+The mapfile enables multiple CPU models to share a single set of PMU events.
+It is required even if such mapping is 1:1.
+
+The mapfile.csv format is expected to be:
+
+   Header line
+   CPUID,Version,Dir/path/name,Type
+
+where:
+
+   Comma:
+   is the required field delimiter (i.e. other fields cannot
+   have commas within them).
+
+   Comments:
+   Lines in which the first character is either '\n' or '#'
+   are ignored.
+
+   Header line
+   The header line is the first line in the file, which is
+   always _IGNORED_. It can be empty.
+
+   CPUID:
+  

[PATCH v21 14/19] perf, tools, jevents: Handle header line in mapfile

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

To work with existing mapfiles, assume that the first line in
'mapfile.csv' is a header line and skip over it.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v2]
Not all architectures use the "Family" field for identification, so
assume the first line is a header.
---
 tools/perf/pmu-events/jevents.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index f550cad..9cdfbaa 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -492,7 +492,12 @@ static int process_mapfile(FILE *outfp, char *fpath)
 
print_mapping_table_prefix(outfp);
 
-   line_num = 0;
+   /* Skip first line (header) */
+   p = fgets(line, n, mapfp);
+   if (!p)
+   goto out;
+
+   line_num = 1;
while (1) {
char *cpuid, *version, *type, *fname;
 
@@ -536,8 +541,8 @@ static int process_mapfile(FILE *outfp, char *fpath)
fprintf(outfp, "},\n");
}
 
+out:
print_mapping_table_suffix(outfp);
-
return 0;
 }
 
-- 
1.8.3.1



[PATCH v21 13/19] perf, tools: Add support for event list topics

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add support to group the output of perf list by the Topic field
in the JSON file.

Example output:

% perf list
...
Cache:
  l1d.replacement
   [L1D data line replacements]
  l1d_pend_miss.pending
   [L1D miss oustandings duration in cycles]
  l1d_pend_miss.pending_cycles
   [Cycles with L1D load Misses outstanding]
  l2_l1d_wb_rqsts.all
   [Not rejected writebacks from L1D to L2 cache lines in any state]
  l2_l1d_wb_rqsts.hit_e
   [Not rejected writebacks from L1D to L2 cache lines in E state]
  l2_l1d_wb_rqsts.hit_m
   [Not rejected writebacks from L1D to L2 cache lines in M state]

...
Pipeline:
  arith.fpu_div
   [Divide operations executed]
  arith.fpu_div_active
   [Cycles when divider is busy executing divide operations]
  baclears.any
   [Counts the total number when the front end is resteered, mainly
   when the BPU cannot provide a correct prediction and this is
   corrected by other branch handling mechanisms at the front end]
  br_inst_exec.all_branches
   [Speculative and retired branches]
  br_inst_exec.all_conditional
   [Speculative and retired macro-conditional branches]
  br_inst_exec.all_direct_jmp
   [Speculative and retired macro-unconditional branches excluding
   calls and indirects]
  br_inst_exec.all_direct_near_call
   [Speculative and retired direct near calls]
  br_inst_exec.all_indirect_jump_non_call_ret

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v14]
- [Jiri Olsa] Move jevents support for Topic to a separate patch.
---
 tools/perf/util/pmu.c | 37 +++--
 tools/perf/util/pmu.h |  1 +
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 43838b3..ac097fc 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -223,7 +223,8 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias 
*alias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-char *desc, char *val, char *long_desc)
+char *desc, char *val, char *long_desc,
+char *topic)
 {
struct perf_pmu_alias *alias;
int ret;
@@ -259,6 +260,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
alias->desc = desc ? strdup(desc) : NULL;
alias->long_desc = long_desc ? strdup(long_desc) :
desc ? strdup(desc) : NULL;
+   alias->topic = topic ? strdup(topic) : NULL;
 
	list_add_tail(&alias->list, list);
 
@@ -276,7 +278,7 @@ static int perf_pmu__new_alias(struct list_head *list, char 
*dir, char *name, FI
 
buf[ret] = 0;
 
-   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL);
+   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL);
 }
 
 static inline bool pmu_alias_info_file(char *name)
@@ -535,7 +537,7 @@ static int pmu_add_cpu_aliases(struct list_head *head)
/* need type casts to override 'const' */
__perf_pmu__new_alias(head, NULL, (char *)pe->name,
(char *)pe->desc, (char *)pe->event,
-   (char *)pe->long_desc);
+   (char *)pe->long_desc, (char *)pe->topic);
}
 
 out:
@@ -1056,19 +1058,26 @@ static char *format_alias_or(char *buf, int len, struct 
perf_pmu *pmu,
return buf;
 }
 
-struct pair {
+struct sevent {
char *name;
char *desc;
+   char *topic;
 };
 
-static int cmp_pair(const void *a, const void *b)
+static int cmp_sevent(const void *a, const void *b)
 {
-   const struct pair *as = a;
-   const struct pair *bs = b;
+   const struct sevent *as = a;
+   const struct sevent *bs = b;
 
/* Put extra events last */
if (!!as->desc != !!bs->desc)
return !!as->desc - !!bs->desc;
+   if (as->topic && bs->topic) {
+   int n = strcmp(as->topic, bs->topic);
+
+   if (n)
+   return n;
+   }
return strcmp(as->name, bs->name);
 }
 
@@ -1102,9 +1111,10 @@ void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag,
char buf[1024];
int printed = 0;
int len, j;
-   struct pair *aliases;
+   struct sevent *aliases;
int numdesc = 0;
int columns = pager_get_columns();
+   char *topic = NULL;
 
pmu = NULL;
len = 0;
@@ -1114,7 +1124,7 @@ void print_pmu_events(const char *event_glob, bool 
name_only, bool quiet_flag,
if (pmu->selectable)
len++;
}
-   aliases = zalloc(sizeof(struct pair) * len);
+  

[PATCH v21 11/19] perf, tools: Add alias support for long descriptions

2016-09-15 Thread Sukadev Bhattiprolu
Previously we were dropping the useful longer descriptions that some
events have in the event list completely. Now that jevents provides
support for longer descriptions (see previous patch), add support for
parsing the long descriptions.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v14]
- [Jiri Olsa] Break up independent parts of the patch into
  separate patches.
---
 tools/perf/util/parse-events.c |  5 +++--
 tools/perf/util/parse-events.h |  3 ++-
 tools/perf/util/pmu.c  | 15 ++-
 tools/perf/util/pmu.h  |  4 +++-
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a3c7739..9abd60d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2229,7 +2229,8 @@ out_enomem:
 /*
  * Print the help text for the event symbols:
  */
-void print_events(const char *event_glob, bool name_only, bool quiet_flag)
+void print_events(const char *event_glob, bool name_only, bool quiet_flag,
+   bool long_desc)
 {
print_symbol_events(event_glob, PERF_TYPE_HARDWARE,
event_symbols_hw, PERF_COUNT_HW_MAX, name_only);
@@ -2239,7 +2240,7 @@ void print_events(const char *event_glob, bool name_only, 
bool quiet_flag)
 
print_hwcache_events(event_glob, name_only);
 
-   print_pmu_events(event_glob, name_only, quiet_flag);
+   print_pmu_events(event_glob, name_only, quiet_flag, long_desc);
 
if (event_glob != NULL)
return;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 795f2579..7efde4a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -171,7 +171,8 @@ void parse_events_update_lists(struct list_head *list_event,
 void parse_events_evlist_error(struct parse_events_evlist *data,
   int idx, const char *str);
 
-void print_events(const char *event_glob, bool name_only, bool quiet);
+void print_events(const char *event_glob, bool name_only, bool quiet,
+ bool long_desc);
 
 struct event_symbol {
const char  *symbol;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 2291d2a..43838b3 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -223,7 +223,7 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias 
*alias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-char *desc, char *val)
+char *desc, char *val, char *long_desc)
 {
struct perf_pmu_alias *alias;
int ret;
@@ -257,6 +257,8 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
}
 
alias->desc = desc ? strdup(desc) : NULL;
+   alias->long_desc = long_desc ? strdup(long_desc) :
+   desc ? strdup(desc) : NULL;
 
	list_add_tail(&alias->list, list);
 
@@ -274,7 +276,7 @@ static int perf_pmu__new_alias(struct list_head *list, char 
*dir, char *name, FI
 
buf[ret] = 0;
 
-   return __perf_pmu__new_alias(list, dir, name, NULL, buf);
+   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL);
 }
 
 static inline bool pmu_alias_info_file(char *name)
@@ -532,7 +534,8 @@ static int pmu_add_cpu_aliases(struct list_head *head)
 
/* need type casts to override 'const' */
__perf_pmu__new_alias(head, NULL, (char *)pe->name,
-   (char *)pe->desc, (char *)pe->event);
+   (char *)pe->desc, (char *)pe->event,
+   (char *)pe->long_desc);
}
 
 out:
@@ -1091,7 +1094,8 @@ static void wordwrap(char *s, int start, int max, int 
corr)
}
 }
 
-void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag)
+void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag,
+   bool long_desc)
 {
struct perf_pmu *pmu;
struct perf_pmu_alias *alias;
@@ -1139,7 +1143,8 @@ void print_pmu_events(const char *event_glob, bool 
name_only, bool quiet_flag)
if (!aliases[j].name)
goto out_enomem;
 
-   aliases[j].desc = alias->desc;
+   aliases[j].desc = long_desc ? alias->long_desc :
+   alias->desc;
j++;
}
if (pmu->selectable &&
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 42999c7..1aa614e 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -39,6 +39,7 @@ struct perf_pmu_info {
 struct perf_pmu_alias {
char *name;

[PATCH v21 12/19] perf, tools: Support long descriptions with perf list

2016-09-15 Thread Sukadev Bhattiprolu
Previously we were dropping the useful longer descriptions that some
events have in the event list completely. This patch makes them appear with
perf list.

Old perf list:

baclears:
  baclears.all
   [Counts the number of baclears]

vs new:

perf list -v:
...
baclears:
  baclears.all
   [The BACLEARS event counts the number of times the front end is
resteered, mainly when the Branch Prediction Unit cannot provide
a correct prediction and this is corrected by the Branch Address
Calculator at the front end. The BACLEARS.ANY event counts the
number of baclears for any type of branch]

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v15]
- [Jiri Olsa, Andi Kleen] Fix usage strings; update man page.

Changelog[v14]
- [Jiri Olsa] Break up independent parts of the patch into
  separate patches.

Changelog[v18]:
- Fix minor conflict in tools/perf/builtin-list.c; add long_desc_flag
  parameter to new print_pmu_events() call site.

Changelog[v21]
- Fix minor conflicts in tools/perf/builtin-list.c
---
 tools/perf/Documentation/perf-list.txt |  6 +-
 tools/perf/builtin-list.c  | 16 +++-
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt 
b/tools/perf/Documentation/perf-list.txt
index 72209bc..41857cc 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -8,7 +8,7 @@ perf-list - List all symbolic event types
 SYNOPSIS
 
 [verse]
-'perf list' [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob]
+'perf list' [--no-desc] [--long-desc] [hw|sw|cache|tracepoint|pmu|event_glob]
 
 DESCRIPTION
 ---
@@ -20,6 +20,10 @@ OPTIONS
 --no-desc::
 Don't print descriptions.
 
+-v::
+--long-desc::
+Print longer event descriptions.
+
 
 [[EVENT_MODIFIERS]]
 EVENT MODIFIERS
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index b14cb16..ba9322f 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -22,14 +22,17 @@ int cmd_list(int argc, const char **argv, const char 
*prefix __maybe_unused)
 {
int i;
bool raw_dump = false;
+   bool long_desc_flag = false;
struct option list_options[] = {
OPT_BOOLEAN(0, "raw-dump", _dump, "Dump raw events"),
		OPT_BOOLEAN('d', "desc", &desc_flag,
			    "Print extra event descriptions. --no-desc to not print."),
+		OPT_BOOLEAN('v', "long-desc", &long_desc_flag,
+   "Print longer event descriptions."),
OPT_END()
};
const char * const list_usage[] = {
-   "perf list [--no-desc] 
[hw|sw|cache|tracepoint|pmu|sdt|event_glob]",
+   "perf list [] 
[hw|sw|cache|tracepoint|pmu|sdt|event_glob]",
NULL
};
 
@@ -44,7 +47,7 @@ int cmd_list(int argc, const char **argv, const char *prefix 
__maybe_unused)
printf("\nList of pre-defined events (to be used in -e):\n\n");
 
if (argc == 0) {
-   print_events(NULL, raw_dump, !desc_flag);
+   print_events(NULL, raw_dump, !desc_flag, long_desc_flag);
return 0;
}
 
@@ -65,14 +68,16 @@ int cmd_list(int argc, const char **argv, const char 
*prefix __maybe_unused)
 strcmp(argv[i], "hwcache") == 0)
print_hwcache_events(NULL, raw_dump);
else if (strcmp(argv[i], "pmu") == 0)
-   print_pmu_events(NULL, raw_dump, !desc_flag);
+   print_pmu_events(NULL, raw_dump, !desc_flag,
+   long_desc_flag);
else if (strcmp(argv[i], "sdt") == 0)
print_sdt_events(NULL, NULL, raw_dump);
else if ((sep = strchr(argv[i], ':')) != NULL) {
int sep_idx;
 
if (sep == NULL) {
-   print_events(argv[i], raw_dump, !desc_flag);
+   print_events(argv[i], raw_dump, !desc_flag,
+   long_desc_flag);
continue;
}
sep_idx = sep - argv[i];
@@ -94,7 +99,8 @@ int cmd_list(int argc, const char **argv, const char *prefix 
__maybe_unused)
print_symbol_events(s, PERF_TYPE_SOFTWARE,
event_symbols_sw, 
PERF_COUNT_SW_MAX, raw_dump);
print_hwcache_events(s, raw_dump);
-   print_pmu_events(s, raw_dump, !desc_flag);
+   print_pmu_events(s, raw_dump, !desc_flag,
+

[PATCH v21 10/19] perf, tools, jevents: Add support for long descriptions

2016-09-15 Thread Sukadev Bhattiprolu
Implement support in jevents to parse long descriptions for events
that may have them in the JSON files. A follow-on patch will make this
long description available to the user through the 'perf list' command.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v14]
- [Jiri Olsa] Break up independent parts of the patch into
  separate patches.

Changelog[v21]
- Fix minor conflicts in tools/perf/pmu-events/jevents.c and
  tools/perf/pmu-events/pmu-events.h
---
 tools/perf/pmu-events/jevents.c| 32 
 tools/perf/pmu-events/jevents.h|  3 ++-
 tools/perf/pmu-events/pmu-events.h |  1 +
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index a9ca86d..f550cad 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -268,7 +268,7 @@ static void print_events_table_prefix(FILE *fp, const char 
*tblname)
 }
 
 static int print_events_table_entry(void *data, char *name, char *event,
-   char *desc)
+   char *desc, char *long_desc)
 {
struct perf_entry_data *pd = data;
FILE *outfp = pd->outfp;
@@ -284,6 +284,8 @@ static int print_events_table_entry(void *data, char *name, 
char *event,
fprintf(outfp, "\t.event = \"%s\",\n", event);
fprintf(outfp, "\t.desc = \"%s\",\n", desc);
fprintf(outfp, "\t.topic = \"%s\",\n", topic);
+   if (long_desc && long_desc[0])
+   fprintf(outfp, "\t.long_desc = \"%s\",\n", long_desc);
 
fprintf(outfp, "},\n");
 
@@ -305,7 +307,8 @@ static void print_events_table_suffix(FILE *outfp)
 
 /* Call func with each event in the json file */
 int json_events(const char *fn,
- int (*func)(void *data, char *name, char *event, char *desc),
+ int (*func)(void *data, char *name, char *event, char *desc,
+ char *long_desc),
  void *data)
 {
int err = -EIO;
@@ -324,6 +327,8 @@ int json_events(const char *fn,
tok = tokens + 1;
for (i = 0; i < tokens->size; i++) {
char *event = NULL, *desc = NULL, *name = NULL;
+   char *long_desc = NULL;
+   char *extra_desc = NULL;
struct msrmap *msr = NULL;
jsmntok_t *msrval = NULL;
jsmntok_t *precise = NULL;
@@ -349,6 +354,10 @@ int json_events(const char *fn,
} else if (json_streq(map, field, "BriefDescription")) {
				addfield(map, &desc, "", "", val);
fixdesc(desc);
+   } else if (json_streq(map, field,
+"PublicDescription")) {
+				addfield(map, &long_desc, "", "", val);
+   fixdesc(long_desc);
} else if (json_streq(map, field, "PEBS") && nz) {
precise = val;
} else if (json_streq(map, field, "MSRIndex") && nz) {
@@ -357,10 +366,10 @@ int json_events(const char *fn,
msrval = val;
} else if (json_streq(map, field, "Errata") &&
   !json_streq(map, val, "null")) {
-				addfield(map, &desc, ". ",
+				addfield(map, &extra_desc, ". ",
" Spec update: ", val);
} else if (json_streq(map, field, "Data_LA") && nz) {
-				addfield(map, &desc, ". ",
+				addfield(map, &extra_desc, ". ",
" Supports address when precise",
NULL);
}
@@ -368,19 +377,26 @@ int json_events(const char *fn,
}
if (precise && desc && !strstr(desc, "(Precise Event)")) {
if (json_streq(map, precise, "2"))
-			addfield(map, &desc, " ", "(Must be precise)",
-				NULL);
+			addfield(map, &extra_desc, " ",
+				"(Must be precise)", NULL);
else
-			addfield(map, &desc, " ",
+			addfield(map, &extra_desc, " ",
"(Precise event)", NULL);
}
+		if (desc && extra_desc)
+			addfield(map, &desc, " ", extra_desc, NULL);
+		if (long_desc && extra_desc)
+			addfield(map, &long_desc, " ", extra_desc, NULL);
if (msr != NULL)
   

[PATCH v21 09/19] perf, tools: Add override support for event list CPUID

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add a PERF_CPUID variable to override the CPUID of the current CPU (within
the current architecture). This is useful for testing, so that all event
lists can be tested on a single system.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

v2: Fix double free in earlier version.
Print actual CPUID being used with verbose option.
---
 tools/perf/util/pmu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index cb4c215..2291d2a 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -501,10 +501,16 @@ static int pmu_add_cpu_aliases(struct list_head *head)
struct pmu_event *pe;
char *cpuid;
 
-   cpuid = get_cpuid_str();
+   cpuid = getenv("PERF_CPUID");
+   if (cpuid)
+   cpuid = strdup(cpuid);
+   if (!cpuid)
+   cpuid = get_cpuid_str();
if (!cpuid)
return 0;
 
+   pr_debug("Using CPUID %s\n", cpuid);
+
i = 0;
while (1) {
map = _events_map[i++];
-- 
1.8.3.1



[PATCH v21 08/19] perf, tools: Add a --no-desc flag to perf list

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add a --no-desc flag to perf list to not print the event descriptions
that were earlier added for JSON events. This may be useful to
get a less crowded listing.

Printing descriptions remains the default, as that is the more
useful behaviour for most users.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

v2: Rename --quiet to --no-desc. Add option to man page.

v18: Fix minor conflict in tools/perf/builtin-list.c; Add !desc_flag
to the newly introduced print_pmu_events() call site.

v21: Fix minor conflicts in tools/perf/builtin-list.c
---
 tools/perf/Documentation/perf-list.txt |  8 +++-
 tools/perf/builtin-list.c  | 14 +-
 tools/perf/util/parse-events.c |  4 ++--
 tools/perf/util/parse-events.h |  2 +-
 tools/perf/util/pmu.c  |  4 ++--
 tools/perf/util/pmu.h  |  2 +-
 6 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt 
b/tools/perf/Documentation/perf-list.txt
index a126e97..72209bc 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -8,13 +8,19 @@ perf-list - List all symbolic event types
 SYNOPSIS
 
 [verse]
-'perf list' [hw|sw|cache|tracepoint|pmu|event_glob]
+'perf list' [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob]
 
 DESCRIPTION
 ---
 This command displays the symbolic event types which can be selected in the
 various perf commands with the -e option.
 
+OPTIONS
+---
+--no-desc::
+Don't print descriptions.
+
+
 [[EVENT_MODIFIERS]]
 EVENT MODIFIERS
 ---
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 88ee419..b14cb16 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -16,16 +16,20 @@
 #include "util/pmu.h"
 #include 
 
+static bool desc_flag = true;
+
 int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 {
int i;
bool raw_dump = false;
struct option list_options[] = {
OPT_BOOLEAN(0, "raw-dump", _dump, "Dump raw events"),
+   OPT_BOOLEAN('d', "desc", _flag,
+   "Print extra event descriptions. --no-desc to not 
print."),
OPT_END()
};
const char * const list_usage[] = {
-   "perf list [hw|sw|cache|tracepoint|pmu|sdt|event_glob]",
+   "perf list [--no-desc] 
[hw|sw|cache|tracepoint|pmu|sdt|event_glob]",
NULL
};
 
@@ -40,7 +44,7 @@ int cmd_list(int argc, const char **argv, const char *prefix 
__maybe_unused)
printf("\nList of pre-defined events (to be used in -e):\n\n");
 
if (argc == 0) {
-   print_events(NULL, raw_dump);
+   print_events(NULL, raw_dump, !desc_flag);
return 0;
}
 
@@ -61,14 +65,14 @@ int cmd_list(int argc, const char **argv, const char 
*prefix __maybe_unused)
 strcmp(argv[i], "hwcache") == 0)
print_hwcache_events(NULL, raw_dump);
else if (strcmp(argv[i], "pmu") == 0)
-   print_pmu_events(NULL, raw_dump);
+   print_pmu_events(NULL, raw_dump, !desc_flag);
else if (strcmp(argv[i], "sdt") == 0)
print_sdt_events(NULL, NULL, raw_dump);
else if ((sep = strchr(argv[i], ':')) != NULL) {
int sep_idx;
 
if (sep == NULL) {
-   print_events(argv[i], raw_dump);
+   print_events(argv[i], raw_dump, !desc_flag);
continue;
}
sep_idx = sep - argv[i];
@@ -90,7 +94,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
print_symbol_events(s, PERF_TYPE_SOFTWARE,
event_symbols_sw, 
PERF_COUNT_SW_MAX, raw_dump);
print_hwcache_events(s, raw_dump);
-   print_pmu_events(s, raw_dump);
+   print_pmu_events(s, raw_dump, !desc_flag);
print_tracepoint_events(NULL, s, raw_dump);
print_sdt_events(NULL, s, raw_dump);
free(s);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6c913c3..a3c7739 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2229,7 +2229,7 @@ out_enomem:
 /*
  * Print the help text for the event symbols:
  */
-void print_events(const char *event_glob, bool name_only)
+void print_events(const char *event_glob, bool name_only, bool quiet_flag)
 {

[PATCH v21 07/19] perf, tools: Query terminal width and use in perf list

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Automatically adapt the now wider and word-wrapped perf list
output to wider terminals. This requires querying the terminal
before the auto pager takes over, and exporting this
information from the pager subsystem.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Namhyung Kim 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v20]
- Minor reorg since helpers like setup_pager() are now in
  tools/lib/subcmd/pager.c
---
 tools/lib/subcmd/pager.c | 16 
 tools/lib/subcmd/pager.h |  1 +
 tools/perf/util/pmu.c|  3 ++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/lib/subcmd/pager.c b/tools/lib/subcmd/pager.c
index d50f3b58..6518bea 100644
--- a/tools/lib/subcmd/pager.c
+++ b/tools/lib/subcmd/pager.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include <sys/ioctl.h>
 #include "pager.h"
 #include "run-command.h"
 #include "sigchain.h"
@@ -14,6 +15,7 @@
  */
 
 static int spawned_pager;
+static int pager_columns;
 
 void pager_init(const char *pager_env)
 {
@@ -58,9 +60,12 @@ static void wait_for_pager_signal(int signo)
 void setup_pager(void)
 {
const char *pager = getenv(subcmd_config.pager_env);
+   struct winsize sz;
 
if (!isatty(1))
return;
+   if (ioctl(1, TIOCGWINSZ, &sz) == 0)
+   pager_columns = sz.ws_col;
if (!pager)
pager = getenv("PAGER");
if (!(pager || access("/usr/bin/pager", X_OK)))
@@ -98,3 +103,14 @@ int pager_in_use(void)
 {
return spawned_pager;
 }
+
+int pager_get_columns(void)
+{
+   char *s;
+
+   s = getenv("COLUMNS");
+   if (s)
+   return atoi(s);
+
+   return (pager_columns ? pager_columns : 80) - 2;
+}
diff --git a/tools/lib/subcmd/pager.h b/tools/lib/subcmd/pager.h
index 8b83714..623f554 100644
--- a/tools/lib/subcmd/pager.h
+++ b/tools/lib/subcmd/pager.h
@@ -5,5 +5,6 @@ extern void pager_init(const char *pager_env);
 
 extern void setup_pager(void);
 extern int pager_in_use(void);
+extern int pager_get_columns(void);
 
 #endif /* __SUBCMD_PAGER_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index af1a612..ef9de3e 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -14,6 +14,7 @@
 #include "cpumap.h"
 #include "header.h"
 #include "pmu-events/pmu-events.h"
+#include "cache.h"
 
 struct perf_pmu_format {
char *name;
@@ -1093,7 +1094,7 @@ void print_pmu_events(const char *event_glob, bool name_only)
int len, j;
struct pair *aliases;
int numdesc = 0;
-   int columns = 78;
+   int columns = pager_get_columns();
 
pmu = NULL;
len = 0;
-- 
1.8.3.1
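The column logic in the patch above combines two sources: an explicit COLUMNS environment variable, and the TIOCGWINSZ ioctl result cached when setup_pager() runs. The fallback arithmetic can be sketched standalone (hypothetical helper name, not the in-tree API):

```c
#include <stdlib.h>

/* Mirror of pager_get_columns()'s fallback chain: an explicit COLUMNS
 * value wins; otherwise use the cached terminal width, defaulting to 80,
 * and reserve two columns of slack for wrapping. Hypothetical helper. */
static int columns_from(const char *columns_env, int tty_columns)
{
	if (columns_env)
		return atoi(columns_env);
	return (tty_columns ? tty_columns : 80) - 2;
}
```

The width must be queried before the pager is spawned, because once stdout is a pipe to the pager, TIOCGWINSZ on fd 1 no longer reflects the terminal.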



[PATCH v21 06/19] perf, tools: Support alias descriptions

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Add support to print alias descriptions in perf list, which
are taken from the generated event files.

The sorting code is changed to put the events with descriptions
at the end. The descriptions are printed as possibly multiple word
wrapped lines.

Example output:

% perf list
...
  arith.fpu_div
   [Divide operations executed]
  arith.fpu_div_active
   [Cycles when divider is busy executing divide operations]

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog
- Delete a redundant free()

Changelog[v14]
- [Jiri Olsa] Fail, rather than continue if strdup() returns NULL;
  remove unnecessary __maybe_unused.
---
 tools/perf/util/pmu.c | 83 +--
 tools/perf/util/pmu.h |  1 +
 2 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index c842886..af1a612 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -222,7 +222,7 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-char *desc __maybe_unused, char *val)
+char *desc, char *val)
 {
struct perf_pmu_alias *alias;
int ret;
@@ -255,6 +255,8 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
perf_pmu__parse_snapshot(alias, dir, name);
}
 
+   alias->desc = desc ? strdup(desc) : NULL;
+
+   list_add_tail(&alias->list, list);
 
return 0;
@@ -1044,11 +1046,42 @@ static char *format_alias_or(char *buf, int len, struct perf_pmu *pmu,
return buf;
 }
 
-static int cmp_string(const void *a, const void *b)
+struct pair {
+   char *name;
+   char *desc;
+};
+
+static int cmp_pair(const void *a, const void *b)
+{
+   const struct pair *as = a;
+   const struct pair *bs = b;
+
+   /* Put extra events last */
+   if (!!as->desc != !!bs->desc)
+   return !!as->desc - !!bs->desc;
+   return strcmp(as->name, bs->name);
+}
+
+static void wordwrap(char *s, int start, int max, int corr)
 {
-   const char * const *as = a;
-   const char * const *bs = b;
-   return strcmp(*as, *bs);
+   int column = start;
+   int n;
+
+   while (*s) {
+   int wlen = strcspn(s, " \t");
+
+   if (column + wlen >= max && column > start) {
+   printf("\n%*s", start, "");
+   column = start + corr;
+   }
+   n = printf("%s%.*s", column > start ? " " : "", wlen, s);
+   if (n <= 0)
+   break;
+   s += wlen;
+   column += n;
+   while (isspace(*s))
+   s++;
+   }
 }
 
 void print_pmu_events(const char *event_glob, bool name_only)
@@ -1058,7 +1091,9 @@ void print_pmu_events(const char *event_glob, bool name_only)
char buf[1024];
int printed = 0;
int len, j;
-   char **aliases;
+   struct pair *aliases;
+   int numdesc = 0;
+   int columns = 78;
 
pmu = NULL;
len = 0;
@@ -1068,14 +1103,15 @@ void print_pmu_events(const char *event_glob, bool name_only)
if (pmu->selectable)
len++;
}
-   aliases = zalloc(sizeof(char *) * len);
+   aliases = zalloc(sizeof(struct pair) * len);
if (!aliases)
goto out_enomem;
pmu = NULL;
j = 0;
while ((pmu = perf_pmu__scan(pmu)) != NULL) {
list_for_each_entry(alias, >aliases, list) {
-   char *name = format_alias(buf, sizeof(buf), pmu, alias);
+   char *name = alias->desc ? alias->name :
+   format_alias(buf, sizeof(buf), pmu, alias);
bool is_cpu = !strcmp(pmu->name, "cpu");
 
if (event_glob != NULL &&
@@ -1084,12 +1120,19 @@ void print_pmu_events(const char *event_glob, bool name_only)
   event_glob
continue;
 
-   if (is_cpu && !name_only)
+   if (is_cpu && !name_only && !alias->desc)
name = format_alias_or(buf, sizeof(buf), pmu, alias);
 
-   aliases[j] = strdup(name);
-   if (aliases[j] == NULL)
+   aliases[j].name = name;
+   if (is_cpu && !name_only && !alias->desc)
+   aliases[j].name = format_alias_or(buf,
+ sizeof(buf),

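The wordwrap() helper in the patch above prints words greedily, breaking the line once the next word would reach the column limit and indenting continuations back to the start column. A sketch of the same loop that writes into a buffer instead of printing, so the behaviour is easy to inspect (simplified: the `corr` correction is dropped, and the names are hypothetical, not the in-tree ones):

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Greedy word wrap into 'out': words are separated by spaces/tabs; a
 * newline plus 'start' spaces of indent is emitted whenever the next
 * word would reach column 'max'. */
static void wordwrap_to(const char *s, int start, int max,
			char *out, size_t outsz)
{
	int column = start;
	size_t used = 0;

	out[0] = '\0';
	while (*s && used + 1 < outsz) {
		int wlen = (int)strcspn(s, " \t");
		int sep;

		if (column + wlen >= max && column > start) {
			used += snprintf(out + used, outsz - used,
					 "\n%*s", start, "");
			column = start;
		}
		sep = column > start;	/* need a separating space? */
		used += snprintf(out + used, outsz - used, "%s%.*s",
				 sep ? " " : "", wlen, s);
		column += wlen + sep;
		s += wlen;
		while (isspace((unsigned char)*s))
			s++;
	}
}

static int wordwrap_demo(void)
{
	char buf[64];

	/* "ccc" would end at column 12 >= 9, so it wraps and indents. */
	wordwrap_to("aaa bbb ccc", 2, 9, buf, sizeof(buf));
	return strcmp(buf, "aaa bbb\n  ccc") == 0;
}
```

The `column > start` guard matters: a single word longer than the limit is still printed rather than looping forever on an empty line.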
[PATCH v21 05/19] perf, tools: Support CPU id matching for x86 v2

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

Implement the code to match CPU types to mapfile types for x86
based on CPUID. This extends an existing similar function,
but changes it to use the x86 mapfile cpu description.
This allows event lists generated by jevents to be resolved.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

v2: Update to new get_cpuid_str() interface
---
 tools/perf/arch/x86/util/header.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c
index 146d12a..a74a48d 100644
--- a/tools/perf/arch/x86/util/header.c
+++ b/tools/perf/arch/x86/util/header.c
@@ -19,8 +19,8 @@ cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c,
: "a" (op));
 }
 
-int
-get_cpuid(char *buffer, size_t sz)
+static int
+__get_cpuid(char *buffer, size_t sz, const char *fmt)
 {
unsigned int a, b, c, d, lvl;
int family = -1, model = -1, step = -1;
@@ -48,7 +48,7 @@ get_cpuid(char *buffer, size_t sz)
if (family >= 0x6)
model += ((a >> 16) & 0xf) << 4;
}
-   nb = scnprintf(buffer, sz, "%s,%u,%u,%u$", vendor, family, model, step);
+   nb = scnprintf(buffer, sz, fmt, vendor, family, model, step);
 
/* look for end marker to ensure the entire data fit */
if (strchr(buffer, '$')) {
@@ -57,3 +57,21 @@ get_cpuid(char *buffer, size_t sz)
}
return -1;
 }
+
+int
+get_cpuid(char *buffer, size_t sz)
+{
+   return __get_cpuid(buffer, sz, "%s,%u,%u,%u$");
+}
+
+char *
+get_cpuid_str(void)
+{
+   char *buf = malloc(128);
+
+   if (__get_cpuid(buf, 128, "%s-%u-%X$") < 0) {
+   free(buf);
+   return NULL;
+   }
+   return buf;
+}
-- 
1.8.3.1
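The `$` terminator in the format strings above is how __get_cpuid() detects truncation: if the marker survives the bounded print, the whole id fitted and the marker is replaced by a NUL. A hedged standalone sketch of that scheme (hypothetical helper and sample values):

```c
#include <stdio.h>
#include <string.h>

/* Format "vendor-family-model$" and verify the '$' end marker survived,
 * proving the buffer was large enough; strip it on success. */
static int format_cpuid_str(char *buf, size_t sz, const char *vendor,
			    unsigned int family, unsigned int model)
{
	char *end;

	snprintf(buf, sz, "%s-%u-%X$", vendor, family, model);
	end = strchr(buf, '$');
	if (end) {
		*end = '\0';
		return 0;
	}
	return -1;	/* marker truncated away: id did not fit */
}

static int cpuid_demo(void)
{
	char ok[32], tiny[8];

	return format_cpuid_str(ok, sizeof(ok), "GenuineIntel", 6, 0x4D) == 0 &&
	       strcmp(ok, "GenuineIntel-6-4D") == 0 &&
	       format_cpuid_str(tiny, sizeof(tiny), "GenuineIntel", 6, 0x4D) == -1;
}
```

The resulting "GenuineIntel-6-4D" form is exactly what mapfile.csv entries are matched against.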



[PATCH v21 04/19] perf, tools: Support CPU ID matching for Powerpc

2016-09-15 Thread Sukadev Bhattiprolu
Implement code that returns the generic CPU ID string for Powerpc.
This will be used to identify the specific table of PMU events to
parse/compare user specified events against.

Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v14]
- [Jiri Olsa] Move this independent code off into a separate patch.
---
 tools/perf/arch/powerpc/util/header.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
index f8ccee1..9aaa6f5 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -32,3 +32,14 @@ get_cpuid(char *buffer, size_t sz)
}
return -1;
 }
+
+char *
+get_cpuid_str(void)
+{
+   char *bufp;
+
+   if (asprintf(&bufp, "%.8lx", mfspr(SPRN_PVR)) < 0)
+   bufp = NULL;
+
+   return bufp;
+}
-- 
1.8.3.1
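The string produced above is simply the 32-bit PVR printed as eight lower-case hex digits. A small sketch of the formatting with a fixed sample value (0x004d0200 is used purely for illustration, and snprintf stands in for asprintf so the sketch stays portable; mfspr() obviously cannot run outside the kernel's target):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a PVR value the way the powerpc get_cpuid_str() does:
 * zero-padded, eight hex digits. Hypothetical helper. */
static char *pvr_to_cpuid_str(unsigned long pvr)
{
	char *bufp = malloc(16);

	if (bufp)
		snprintf(bufp, 16, "%.8lx", pvr);
	return bufp;
}

static int pvr_demo(void)
{
	char *s = pvr_to_cpuid_str(0x004d0200UL);	/* illustrative PVR */
	int ok = s && strcmp(s, "004d0200") == 0;

	free(s);
	return ok;
}
```

That eight-digit string is what the powerpc mapfile.csv "cpuid" column is compared against.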



[PATCH v21 03/19] perf, tools: Use pmu_events table to create aliases

2016-09-15 Thread Sukadev Bhattiprolu
At run time (when 'perf' is starting up), locate the specific table
of PMU events that corresponds to the current CPU. Using that table,
create aliases for each of the PMU events in the CPU. Then use
these aliases to parse the user-specified perf event.

In short this would allow the user to specify events using their
aliases rather than raw event codes.

Based on input and some earlier patches from Andi Kleen, Jiri Olsa.

Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---

Changelog[v4]
- Split off unrelated code into separate patches.
Changelog[v3]
- [Jiri Olsa] Fix memory leak in cpuid
Changelog[v2]
- [Andi Kleen] Replace pmu_events_map->vfm with a generic "cpuid".
---
 tools/perf/util/header.h |  1 +
 tools/perf/util/pmu.c| 61 
 2 files changed, 62 insertions(+)

diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index d306ca1..d30109b 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -151,4 +151,5 @@ int write_padded(int fd, const void *bf, size_t count, 
size_t count_aligned);
  */
 int get_cpuid(char *buffer, size_t sz);
 
+char *get_cpuid_str(void);
 #endif /* __PERF_HEADER_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 2babcdf..c842886 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -12,6 +12,8 @@
 #include "pmu.h"
 #include "parse-events.h"
 #include "cpumap.h"
+#include "header.h"
+#include "pmu-events/pmu-events.h"
 
 struct perf_pmu_format {
char *name;
@@ -473,6 +475,62 @@ static struct cpu_map *pmu_cpumask(const char *name)
return cpus;
 }
 
+/*
+ * Return the CPU id as a raw string.
+ *
+ * Each architecture should provide a more precise id string that
+ * can be use to match the architecture's "mapfile".
+ */
+char * __weak get_cpuid_str(void)
+{
+   return NULL;
+}
+
+/*
+ * From the pmu_events_map, find the table of PMU events that corresponds
+ * to the current running CPU. Then, add all PMU events from that table
+ * as aliases.
+ */
+static int pmu_add_cpu_aliases(struct list_head *head)
+{
+   int i;
+   struct pmu_events_map *map;
+   struct pmu_event *pe;
+   char *cpuid;
+
+   cpuid = get_cpuid_str();
+   if (!cpuid)
+   return 0;
+
+   i = 0;
+   while (1) {
+   map = &pmu_events_map[i++];
+   if (!map->table)
+   goto out;
+
+   if (!strcmp(map->cpuid, cpuid))
+   break;
+   }
+
+   /*
+* Found a matching PMU events table. Create aliases
+*/
+   i = 0;
+   while (1) {
+   pe = &map->table[i++];
+   if (!pe->name)
+   break;
+
+   /* need type casts to override 'const' */
+   __perf_pmu__new_alias(head, NULL, (char *)pe->name,
+   (char *)pe->desc, (char *)pe->event);
+   }
+
+out:
+   free(cpuid);
+   return 0;
+}
+
 struct perf_event_attr * __weak
 perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
 {
@@ -497,6 +555,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
if (pmu_aliases(name, &aliases))
return NULL;
 
+   if (!strcmp(name, "cpu"))
+   (void)pmu_add_cpu_aliases(&aliases);
+
if (pmu_type(name, &type))
return NULL;
 
-- 
1.8.3.1
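The two scan loops in pmu_add_cpu_aliases() above amount to a sentinel-terminated table lookup: walk pmu_events_map[] until the cpuid matches (or the empty terminating entry is reached), then walk that table's events. A reduced sketch with made-up table data (the struct shapes are simplified stand-ins, not the generated ones):

```c
#include <stddef.h>
#include <string.h>

struct events_map {		/* shaped like pmu_events_map, simplified */
	const char *cpuid;
	const char *table;	/* stand-in for the event-table pointer */
};

static const struct events_map demo_map[] = {
	{ "004d0100",          "power8-events"     },
	{ "GenuineIntel-6-4D", "silvermont-events" },
	{ NULL,                NULL                },	/* terminating entry */
};

/* Return the event table for a cpuid, or NULL if the CPU is unknown,
 * in which case no aliases are added and perf still works with raw codes. */
static const char *find_events_table(const char *cpuid)
{
	int i;

	for (i = 0; demo_map[i].cpuid; i++)
		if (!strcmp(demo_map[i].cpuid, cpuid))
			return demo_map[i].table;
	return NULL;
}
```

Returning 0 even on a failed lookup mirrors the patch's behaviour: a missing table is not an error, it just means no named aliases for this CPU.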



[PATCH v21 02/19] perf, tools, jevents: Program to convert JSON file to C style file

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

This is a modified version of an earlier patch by Andi Kleen.

We expect architectures to create JSON files describing the performance
monitoring (PMU) events that each CPU model/family of the architecture
supports.

Following is an example of the JSON file entry for an x86 event:

[
...
{
"EventCode": "0x00",
"UMask": "0x01",
"EventName": "INST_RETIRED.ANY",
"BriefDescription": "Instructions retired from execution.",
"PublicDescription": "Instructions retired from execution.",
"Counter": "Fixed counter 1",
"CounterHTOff": "Fixed counter 1",
"SampleAfterValue": "203",
"SampleAfterValue": "203",
"MSRIndex": "0",
"MSRValue": "0",
"TakenAlone": "0",
"CounterMask": "0",
"Invert": "0",
"AnyThread": "0",
"EdgeDetect": "0",
"PEBS": "0",
"PRECISE_STORE": "0",
"Errata": "null",
"Offcore": "0"
},
...

]

All the PMU events supported by a CPU model/family must be grouped into
"topics" such as "Piplelining", "Floating-point", "Virtual-memory" etc.

All events belonging to a topic must be placed in a separate JSON file
(eg: "Pipeling.json") and all the topic JSON files for a CPU model must
be in a separate directory.

Eg: for the CPU model "Silvermont_core":

$ ls tools/perf/pmu-events/arch/x86/Silvermont_core
Floating-point.json
Memory.json
Other.json
Pipelining.json
Virtualmemory.json

Finally, to allow multiple CPU models to share a single set of JSON files,
architectures must provide a mapping between a model and its set of events:

$ grep Silvermont tools/perf/pmu-events/arch/x86/mapfile.csv
GenuineIntel-6-4D,V13,Silvermont_core,core
GenuineIntel-6-4C,V13,Silvermont_core,core

which maps each CPU, identified by [vendor, family, model, version, type]
to a directory of JSON files. Thus two (or more) CPU models support the
set of PMU events listed in the directory.

tools/perf/pmu-events/arch/x86/Silvermont_core/

Given this organization of files, the program, jevents:

- locates all JSON files for each CPU-model of the architecture,

- parses all JSON files for the CPU-model and generates a C-style
  "PMU-events table" (pmu-events.c) for the model

- locates a mapfile for the architecture

- builds a global table, mapping each model of CPU to the corresponding
  PMU-events table.

The 'pmu-events.c' is generated when building perf and added to libperf.a.
The global table pmu_events_map[] table in this pmu-events.c will be used
in perf in a follow-on patch.

If the architecture does not have any JSON files or there is an error in
processing them, an empty mapping file is created. This would allow the
build of perf to proceed even if we are not able to provide aliases for
events.

The parser for JSON files allows parsing Intel style JSON event files. This
allows to use an Intel event list directly with perf. The Intel event lists
can be quite large and are too big to store in unswappable kernel memory.

The conversion from JSON to C-style is straightforward. The parser knows
(very little) Intel-specific information, and can be easily extended to
handle fields for other CPUs.

The parser code is partially shared with an independent parsing library,
which is 2-clause BSD licenced. To avoid any conflicts I marked those
files as BSD licenced too. As part of perf they become GPLv2.

Signed-off-by: Andi Kleen 
Signed-off-by: Jiri Olsa 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Ingo Molnar 
---
v2: Address review feedback. Rename option to --event-files
v3: Add JSON example
v4: Update manpages.
v5: Don't remove dot in fixname. Fix compile error. Add include
protection. Comment realloc.
v6: Include debug/util.h
v7: (Sukadev Bhattiprolu)
Rebase to 4.0 and fix some conflicts.
v8: (Sukadev Bhattiprolu)
Move jevents.[hc] to tools/perf/pmu-events/
Rewrite to locate and process arch specific JSON and "map" files;
and generate a C file.
(Removed acked-by Namhyung Kim due to modest changes to patch)
Compile the generated pmu-events.c and add the pmu-events.o to
libperf.a
v9: [Sukadev Bhattiprolu/Andi Kleen] Rename ->vfm to ->cpuid and use
that field to encode the PVR in Power.
Allow blank lines in mapfile.
[Jiri Olsa] Pass ARCH as a parameter to jevents so we don't have
to detect it.
[Jiri Olsa] Use the infrastrastructure to build pmu-events/perf
(Makefile changes from Jiri included in this patch).
[Jiri Olsa, Andi Kleen] Detect changes to JSON files and rebuild
pmu-events.o only if necessary.

v11:- [Andi Kleen] Add mapfile, 

[PATCH v21 01/19] perf, tools: Add jsmn `jasmine' JSON parser

2016-09-15 Thread Sukadev Bhattiprolu
From: Andi Kleen 

I need a JSON parser. This adds the simplest JSON
parser I could find -- Serge Zaitsev's jsmn `jasmine' --
to the perf library. I merely converted it to (mostly)
Linux style and added support for non 0 terminated input.

The parser is quite straightforward and does not
copy any data, just returns tokens with offsets
into the input buffer. So it's relatively efficient
and simple to use.

The code is not fully checkpatch clean, but I didn't
want to completely fork the upstream code.

Original source: http://zserge.bitbucket.org/jsmn.html

In addition I added a simple wrapper that mmaps a json
file and provides some straight forward access functions.

Used in follow-on patches to parse event files.

Signed-off-by: Andi Kleen 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Namhyung Kim 
Acked-by: Jiri Olsa 
Acked-by: Ingo Molnar 
---
v2: Address review feedback.
v3: Minor checkpatch fixes.
v4 (by Sukadev Bhattiprolu)
- Rebase to 4.0 and fix minor conflicts in tools/perf/Makefile.perf
- Report error if specified events file is invalid.
v5 (Sukadev Bhattiprolu)
- Move files to tools/perf/pmu-events/ since parsing of JSON file
now occurs when _building_ rather than running perf.
---
 tools/perf/pmu-events/jsmn.c | 313 +++
 tools/perf/pmu-events/jsmn.h |  67 +
 tools/perf/pmu-events/json.c | 162 ++
 tools/perf/pmu-events/json.h |  36 +
 4 files changed, 578 insertions(+)
 create mode 100644 tools/perf/pmu-events/jsmn.c
 create mode 100644 tools/perf/pmu-events/jsmn.h
 create mode 100644 tools/perf/pmu-events/json.c
 create mode 100644 tools/perf/pmu-events/json.h

diff --git a/tools/perf/pmu-events/jsmn.c b/tools/perf/pmu-events/jsmn.c
new file mode 100644
index 000..11d1fa1
--- /dev/null
+++ b/tools/perf/pmu-events/jsmn.c
@@ -0,0 +1,313 @@
+/*
+ * Copyright (c) 2010 Serge A. Zaitsev
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ * Slightly modified by AK to not assume 0 terminated input.
+ */
+
+#include <stdlib.h>
+#include "jsmn.h"
+
+/*
+ * Allocates a fresh unused token from the token pool.
+ */
+static jsmntok_t *jsmn_alloc_token(jsmn_parser *parser,
+  jsmntok_t *tokens, size_t num_tokens)
+{
+   jsmntok_t *tok;
+
+   if ((unsigned)parser->toknext >= num_tokens)
+   return NULL;
+   tok = &tokens[parser->toknext++];
+   tok->start = tok->end = -1;
+   tok->size = 0;
+   return tok;
+}
+
+/*
+ * Fills token type and boundaries.
+ */
+static void jsmn_fill_token(jsmntok_t *token, jsmntype_t type,
+   int start, int end)
+{
+   token->type = type;
+   token->start = start;
+   token->end = end;
+   token->size = 0;
+}
+
+/*
+ * Fills next available token with JSON primitive.
+ */
+static jsmnerr_t jsmn_parse_primitive(jsmn_parser *parser, const char *js,
+ size_t len,
+ jsmntok_t *tokens, size_t num_tokens)
+{
+   jsmntok_t *token;
+   int start;
+
+   start = parser->pos;
+
+   for (; parser->pos < len; parser->pos++) {
+   switch (js[parser->pos]) {
+#ifndef JSMN_STRICT
+   /*
+* In strict mode primitive must be followed by ","
+* or "}" or "]"
+*/
+   case ':':
+#endif
+   case '\t':
+   case '\r':
+   case '\n':
+   case ' ':
+   case ',':
+   case ']':
+   case '}':
+   goto found;
+   default:
+   break;
+   }
+   if (js[parser->pos] < 32 || js[parser->pos] >= 127) {
+   parser->pos = 

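The key property described in the patch above — jsmn returns tokens as (start, end) offsets into the input buffer rather than copying data — can be illustrated without the parser itself. A hypothetical mini-demo of consuming such a token:

```c
#include <string.h>

struct tok {			/* shaped like jsmntok_t's offsets */
	int start, end;		/* [start, end) into the input buffer */
};

/* Compare a token's slice of the input against a C string, without
 * ever copying or NUL-terminating the slice. */
static int tok_streq(const char *js, struct tok t, const char *s)
{
	int len = t.end - t.start;

	return (int)strlen(s) == len && !strncmp(js + t.start, s, len);
}

static int jsmn_style_demo(void)
{
	const char *js = "{\"EventName\":\"INST_RETIRED.ANY\"}";
	struct tok key = { 2, 11 };	/* offsets a parser would fill in */

	return tok_streq(js, key, "EventName");
}
```

Because nothing is copied, a large Intel event list can be walked with a single mmap of the file and a flat token array.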
[PATCH v21 00/20] perf, tools: Add support for PMU events in JSON format

2016-09-15 Thread Sukadev Bhattiprolu
CPUs support a large number of performance monitoring events (PMU events)
and often these events are very specific to an architecture/model of the
CPU. To use most of these PMU events with perf, we currently have to identify
them by their raw codes:

perf stat -e r100f2 sleep 1

This patchset allows architectures to specify these PMU events in JSON
files located in 'tools/perf/pmu-events/arch/' of the mainline tree.
The events from the JSON files for the architecture are then built into
the perf binary.

At run time, perf identifies the specific set of events for the CPU and
creates "event aliases". These aliases allow users to specify events by
"name" as:

perf stat -e pm_1plus_ppc_cmpl sleep 1

The file, 'tools/perf/pmu-events/README' in [PATCH 15/19] gives more
details.

Note:
- All known events tables for the architecture are included in the
  perf binary.

- For architectures that don't have any JSON files, an empty mapping
  table is created and they should continue to build.

Thanks to input from Andi Kleen, Jiri Olsa, Namhyung Kim and Ingo Molnar.

These patches are available from:

https://github.com/sukadev/linux.git 

Branch  Description
--
json-code-v21   Source Code only 
json-code+data-v21  Both code and data (for build/test/pull)

NOTE:   Only "source code" patches (i.e those in json-code-v21) are being
emailed. Please pull the json-code+data-v21 branch for build/test.

Changelog[v21]
- Rebase to recent perf/core
- Group the PMU events supported by a CPU model into topics and
  create a separate JSON file for each topic for each CPU (code
  and input from Jiri Olsa).

Changelog[v20]
- Rebase to recent perf/core
- Add Patch 20/20 to allow perf-stat to work with the period= field

Changelog[v19]
Rebase to recent perf/core; fix couple lines >80 chars.

Changelog[v18]
Rebase to recent perf/core; fix minor merge conflicts.

Changelog[v17]
Rebase to recent perf/core; couple of small fixes to processing Intel
JSON files; allow case-insensitive PMU event names.

Changelog[v16]
Rebase to recent perf/core; fix minor merge conflicts; drop 3 patches
that were merged into perf/core.

Changelog[v15]
Code changes:
- Fix 'perf list' usage string and update man page.
- Remove a redundant __maybe_unused tag.
- Rebase to recent perf/core branch.

Data files updates: json-files-5 branch
- Rebase to perf/intel-json-files-5 from Andi Kleen
- Add patch from Madhavan Srinivasan for couple more Powerpc models

Changelog[v14]
Comments from Jiri Olsa:
- Change parameter name/type for pmu_add_cpu_aliases (from void *data
  to list_head *head)
- Use asprintf() in file_name_to_tablename() and simplify/reorg code.
- Use __weak definition from 
- Use fopen() with mode "w" and eliminate unlink()
- Remove minor TODO.
- Add error check for return value from strdup() in print_pmu_events().
- Move independent changes from patches 3,11,12 .. to separate patches
  for easier review/backport.
- Clarify mapfile's "header line support" in patch description.
- Fix build failure with DEBUG=1

Comment from Andi Kleen:
- In tools/perf/pmu-events/Build, check for 'mapfile.csv' rather than
  'mapfile*'

Misc:
- Minor changes/clarifications to tools/perf/pmu-events/README.


Changelog[v13]
Version: Individual patches have their own history :-) that I am
preserving. Patchset version (v13) is for overall patchset and is
somewhat arbitrary.

- Added support for "categories" of events to perf
- Add mapfile, jevents build dependency on pmu-events.c
- Silence jevents when parsing JSON files unless V=1 is specified
- Cleanup error messages
- Fix memory leak with ->cpuid
- Rebase to Arnaldo's tree
- Allow overriding CPUID via environment variable
- Support long descriptions for events
- Handle header line in mapfile.csv
- Cleanup JSON files (trim PublicDescription if identical to/prefix of
  BriefDescription field)


Andi Kleen (12):
  perf, tools: Add jsmn `jasmine' JSON parser
  perf, tools, jevents: Program to convert JSON file to C style file
  perf, tools: Support CPU id matching for x86 v2
  perf, tools: Support alias descriptions
  perf, tools: Query terminal width and use in perf list
  perf, tools: Add a --no-desc flag to perf list
  perf, tools: Add override support for event list CPUID
  perf, tools: Add support for event list topics
  perf, tools, jevents: Handle header line in mapfile
  perf, tools: Make alias matching case-insensitive
  perf, tools, 

Re: powerpc/powernv: Fix the state of root PE

2016-09-15 Thread Michael Ellerman
On Tue, 2016-13-09 at 06:40:24 UTC, Gavin Shan wrote:
> The PE for root bus (root PE) can be removed because of PCI hot
> remove in EEH recovery path for fenced PHB error. We need update
> @phb->root_pe_populated accordingly so that the root PE can be
> populated again in forthcoming PCI hot add path. Also, the PE
> shouldn't be destroyed as it's global and reserved resource.
> 
> Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
> Reported-by: Frederic Barrat 
> Signed-off-by: Gavin Shan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/6eaed1665fc6864fbdbffcc6f4

cheers


Re: powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL

2016-09-15 Thread Michael Ellerman
On Thu, 2016-15-09 at 08:39:34 UTC, Michael Ellerman wrote:
> In commit f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE
> invalidations"), we added logic to fallback to OPAL for doing TCE
> invalidations if we can't do it in Linux.
> 
> Ben sent a v2 of the patch, containing these additional call sites, but
> I had already applied v1 and didn't notice. So fix them now.
> 
> Fixes: f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE 
> invalidations")
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/ed7d9a1d7da6fe7b1c7477dc70

cheers


Re: powerpc/powernv: Detach from PE on releasing PCI device

2016-09-15 Thread Michael Ellerman
On Tue, 2016-06-09 at 06:34:01 UTC, Gavin Shan wrote:
> The PCI hotplug can be part of EEH error recovery. The @pdn and
> the device's PE number aren't removed and added afterwords. The
> PE number in @pdn should be set to an invalid one. Otherwise, the
> PE's device count is decreased on removing devices while failing
> to be increased on adding devices. It leads to unbalanced PE's
> device count and make normal PCI hotplug path broken.
> 
> Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
> Signed-off-by: Gavin Shan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/29bf282dec94f6015a675c0076

cheers


Re: + powerpc-add-purgatory-for-kexec_file_load-implementation-fix.patch added to -mm tree

2016-09-15 Thread Thiago Jung Bauermann
Hello Andrew and Stephen,

On Tuesday, 06 September 2016 at 12:17:09, a...@linux-foundation.org wrote:
> The patch titled
>  Subject: powerpc-add-purgatory-for-kexec_file_load-implementation-fix
> has been added to the -mm tree.  Its filename is
>  powerpc-add-purgatory-for-kexec_file_load-implementation-fix.patch
> 
> This patch should soon appear at
>
> http://ozlabs.org/~akpm/mmots/broken-out/powerpc-add-purgatory-for-kexec_file_load-implementation-fix.patch and later at
>
> http://ozlabs.org/~akpm/mmotm/broken-out/powerpc-add-purgatory-for-kexec_file_load-implementation-fix.patch
> 
> Before you just go and hit "reply", please:
>a) Consider who else should be cc'ed
>b) Prefer to cc a suitable mailing list as well
>c) Ideally: find the original patch on the mailing list and do a
>   reply-to-all to that, adding suitable additional cc's
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code
> ***
> 
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days

I noticed that both the kexec_file_load implementation for powerpc and the 
kexec hand-over buffer patches were removed from -mm and linux-next.

kexec_file_load is desired even without the kexec hand-over buffer feature 
because it enables one to only allow loading of signed kernels, and also to 
measure the loaded kernel and initramfs. There are users interested in both 
of those features on powerpc and since IMA already supports them via its 
hooks in kexec_file_load, the only missing piece in the puzzle is the 
kexec_file_load implementation itself.

The latest version of the kexec_file_load series is v8:

https://lists.infradead.org/pipermail/kexec/2016-September/017123.html

which is also part of the next-kexec-restore branch at 
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git

I'm not aware of any issue with v8, so could you consider it for -mm and 
linux-next independently of the kexec buffer hand-over series?
 
> --
> From: Thiago Jung Bauermann 
> Subject: powerpc-add-purgatory-for-kexec_file_load-implementation-fix
> 
> The change below to arch/powerpc/purgatory/Makefile fixes the cross build
> from ppc64 BE to ppc64 LE.
> 
> I also noticed that building the purgatory during archprepare is not
> necessary, so I also made the change below to arch/powerpc/Makefile.
> 
> I'm preparing a v8 of the kexec_file_load implementation for powerpc
> series with those changes and will send it shortly.
> 
> Reported-by: Stephen Rothwell 
> Signed-off-by: Andrew Morton 
> ---
> 
>  arch/powerpc/Makefile   |3 ---
>  arch/powerpc/purgatory/Makefile |8 +---
>  2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff -puN arch/powerpc/Makefile~powerpc-add-purgatory-for-kexec_file_load-implementation-fix arch/powerpc/Makefile
> --- a/arch/powerpc/Makefile~powerpc-add-purgatory-for-kexec_file_load-implementation-fix
> +++ a/arch/powerpc/Makefile
> @@ -378,9 +378,6 @@ archclean:
>   $(Q)$(MAKE) $(clean)=$(boot)
> 
>  archprepare: checkbin
> -ifeq ($(CONFIG_KEXEC_FILE),y)
> - $(Q)$(MAKE) $(build)=arch/powerpc/purgatory
> arch/powerpc/purgatory/kexec-purgatory.c -endif
> 
>  # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
>  # to stdout and these checks are run even on install targets.
> diff -puN
> arch/powerpc/purgatory/Makefile~powerpc-add-purgatory-for-kexec_file_load
> -implementation-fix arch/powerpc/purgatory/Makefile ---
> a/arch/powerpc/purgatory/Makefile~powerpc-add-purgatory-for-kexec_file_lo
> ad-implementation-fix +++ a/arch/powerpc/purgatory/Makefile
> @@ -23,10 +23,12 @@ KBUILD_AFLAGS := -fno-exceptions -msoft-
>   -D__ASSEMBLY__
> 
>  ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
> -KBUILD_CFLAGS += $(call cc-option,-mabi=elfv2,$(call
> cc-option,-mcall-aixdesc)) -KBUILD_AFLAGS += $(call
> cc-option,-mabi=elfv2)
> +KBUILD_CFLAGS += $(call cc-option,-mabi=elfv2,$(call
> cc-option,-mcall-aixdesc)) \ +-mlittle-endian
> +KBUILD_AFLAGS += $(call cc-option,-mabi=elfv2) -mlittle-endian
>  else
> -KBUILD_CFLAGS += $(call cc-option,-mcall-aixdesc)
> +KBUILD_CFLAGS += $(call cc-option,-mcall-aixdesc) -mbig-endian
> +KBUILD_AFLAGS += -mbig-endian
>  endif
> 
>  $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> _
> 
> Patches currently in -mm which might be from bauer...@linux.vnet.ibm.com
> are
> 
> kexec_file-allow-arch-specific-memory-walking-for-kexec_add_buffer.patch
> kexec_file-change-kexec_add_buffer-to-take-kexec_buf-as-argument.patch
> kexec_file-factor-out-kexec_locate_mem_hole-from-kexec_add_buffer.patch
> powerpc-change-places-using-config_kexec-to-use-config_kexec_core-instead.
> patch
> powerpc-factor-out-relocation-code-from-module_64c-to-elf_util_64c.patch
> powerpc-generalize-elf64_apply_relocate_add.patch
> 

Re: [PATCH 13/13] powerpc: rewrite local_t using soft_irq

2016-09-15 Thread kbuild test robot
Hi Madhavan,

[auto build test ERROR on v4.8-rc6]
[cannot apply to powerpc/next kvm-ppc/kvm-ppc-next mpe/next next-20160915]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Madhavan-Srinivasan/powerpc-paca-soft_enabled-based-local-atomic-operation-implementation/20160915-215652
config: powerpc-allnoconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   In file included from include/linux/perf_event.h:57:0,
from include/linux/trace_events.h:9,
from include/trace/syscall.h:6,
from include/linux/syscalls.h:81,
from init/main.c:18:
   arch/powerpc/include/asm/local.h: In function 'local_add':
>> arch/powerpc/include/asm/local.h:25:2: error: implicit declaration of 
>> function 'local_irq_pmu_save' [-Werror=implicit-function-declaration]
 local_irq_pmu_save(flags);
 ^
>> arch/powerpc/include/asm/local.h:32:2: error: implicit declaration of 
>> function 'local_irq_pmu_restore' [-Werror=implicit-function-declaration]
 local_irq_pmu_restore(flags);
 ^
   cc1: some warnings being treated as errors

vim +/local_irq_pmu_save +25 arch/powerpc/include/asm/local.h

19  
20  static __inline__ void local_add(long i, local_t *l)
21  {
22  long t;
23  unsigned long flags;
24  
  > 25  local_irq_pmu_save(flags);
26  __asm__ __volatile__(
27  PPC_LL" %0,0(%2)\n\
28  add %0,%1,%0\n"
29  PPC_STL" %0,0(%2)\n"
30  : "=&r" (t)
31  : "r" (i), "r" (&(l->a.counter)));
  > 32  local_irq_pmu_restore(flags);
33  }
34  
35  static __inline__ void local_sub(long i, local_t *l)

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v5 1/5] kexec_file: Include the purgatory segment in the kexec image checksum.

2016-09-15 Thread Thiago Jung Bauermann
Hello Stephen,

Am Donnerstag, 15 September 2016, 11:43:08 schrieb Stephen Rothwell:
> Hi Thiago,
> 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 2a1f0ce7c59a..dcd1679f3005 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1792,6 +1792,11 @@ config SECCOMP
> > 
> >  source kernel/Kconfig.hz
> > 
> > +# x86 needs to relocate the purgatory after the checksum is calculated,
> > +# therefore the purgatory cannot be part of the kexec image checksum.
> > +config ARCH_MODIFIES_KEXEC_PURGATORY
> > +   bool
> > +
> 
> The above should probably be in arch/Kconfig (with an appropriately
> changed comment) since it is used in generic code.

Here is the new version, with that change.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Subject: [PATCH v5 1/5] kexec_file: Include the purgatory segment in the kexec
 image checksum.

Currently, the purgatory segment is skipped from the kexec image checksum
because it is modified to include the calculated digest.

By putting the digest in a separate kexec segment, we can include the
purgatory segment in the kexec image verification since it won't need
to be modified anymore.

With this change, the only part of the kexec image that is not covered
by the checksum is the digest itself.

Even with the digest stored separately, x86 needs to leave the purgatory
segment out of the checksum calculation because it modifies the purgatory
code in relocate_kernel. We use CONFIG_ARCH_MODIFIES_KEXEC_PURGATORY to
allow the powerpc purgatory to be protected by the checksum while still
preserving x86 behavior.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/Kconfig   |   8 +++
 arch/powerpc/purgatory/purgatory.c |   4 +-
 arch/x86/Kconfig   |   1 +
 arch/x86/purgatory/purgatory.c |   2 +-
 include/linux/kexec.h  |   6 +++
 kernel/kexec_file.c| 100 +
 6 files changed, 87 insertions(+), 34 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index fd6e9712af81..b386d6c1d463 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -5,6 +5,14 @@
 config KEXEC_CORE
bool
 
+#
+# Architectures that need to modify the purgatory segment after the
+# checksum is calculated need to select this option so that it won't
+# be part of the kexec image checksum.
+#
+config ARCH_MODIFIES_KEXEC_PURGATORY
+   bool
+
 config OPROFILE
tristate "OProfile system profiling"
depends on PROFILING
diff --git a/arch/powerpc/purgatory/purgatory.c 
b/arch/powerpc/purgatory/purgatory.c
index 5b006d685cf2..f19ac3d5a7d5 100644
--- a/arch/powerpc/purgatory/purgatory.c
+++ b/arch/powerpc/purgatory/purgatory.c
@@ -17,7 +17,7 @@
 #include "kexec-sha256.h"
 
 struct kexec_sha_region sha_regions[SHA256_REGIONS] = {};
-u8 sha256_digest[SHA256_DIGEST_SIZE] = { 0 };
+u8 *sha256_digest = NULL;
 
 int verify_sha256_digest(void)
 {
@@ -40,7 +40,7 @@ int verify_sha256_digest(void)
printf("\n");
 
printf("sha256_digest: ");
-   for (i = 0; i < sizeof(sha256_digest); i++)
+   for (i = 0; i < SHA256_DIGEST_SIZE; i++)
printf("%hhx ", sha256_digest[i]);
 
printf("\n");
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2a1f0ce7c59a..4d34019f7479 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1812,6 +1812,7 @@ config KEXEC
 config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
+   select ARCH_MODIFIES_KEXEC_PURGATORY
select BUILD_BIN2C
depends on X86_64
depends on CRYPTO=y
diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
index 25e068ba3382..391c6a66cb03 100644
--- a/arch/x86/purgatory/purgatory.c
+++ b/arch/x86/purgatory/purgatory.c
@@ -22,7 +22,7 @@ unsigned long backup_dest = 0;
 unsigned long backup_src = 0;
 unsigned long backup_sz = 0;
 
-u8 sha256_digest[SHA256_DIGEST_SIZE] = { 0 };
+u8 *sha256_digest = NULL;
 
 struct sha_region sha_regions[16] = {};
 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d419d0e51fe5..2a96292ee544 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -124,8 +124,14 @@ struct purgatory_info {
 */
void *purgatory_buf;
 
+   /* Digest of the contents of segments. */
+   void *digest_buf;
+
/* Address where purgatory is finally loaded and is executed from */
unsigned long purgatory_load_addr;
+
+   /* Address where the digest is loaded. */
+   unsigned long digest_load_addr;
 };
 
 typedef int (kexec_probe_t)(const char *kernel_buf, unsigned long kernel_size);
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 0c2df7f73792..6f7fa8901171 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -98,6 +98,9 @@ void kimage_file_post_load_cleanup(struct kimage *image)
vfree(pi->purgatory_buf);
pi->purgatory_buf = 

Re: [PATHC v2 0/9] ima: carry the measurement list across kexec

2016-09-15 Thread Mimi Zohar
Hi Andrew,

On Wed, 2016-08-31 at 18:38 -0400, Mimi Zohar wrote:
> On Wed, 2016-08-31 at 13:50 -0700, Andrew Morton wrote:
> > On Tue, 30 Aug 2016 18:40:02 -0400 Mimi Zohar  
> > wrote:
> > 
> > > The TPM PCRs are only reset on a hard reboot.  In order to validate a
> > > TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
> > > of the running kernel must be saved and then restored on the subsequent
> > > boot, possibly of a different architecture.
> > > 
> > > The existing securityfs binary_runtime_measurements file conveniently
> > > provides a serialized format of the IMA measurement list. This patch
> > > set serializes the measurement list in this format and restores it.
> > > 
> > > Up to now, the binary_runtime_measurements was defined as architecture
> > > native format.  The assumption being that userspace could and would
> > > handle any architecture conversions.  With the ability of carrying the
> > > measurement list across kexec, possibly from one architecture to a
> > > different one, the per boot architecture information is lost and with it
> > > the ability of recalculating the template digest hash.  To resolve this
> > > problem, without breaking the existing ABI, this patch set introduces
> > > the boot command line option "ima_canonical_fmt", which is arbitrarily
> > > defined as little endian.
> > > 
> > > The need for this boot command line option will be limited to the
> > > existing version 1 format of the binary_runtime_measurements.
> > > Subsequent formats will be defined as canonical format (eg. TPM 2.0
> > > support for larger digests).
> > > 
> > > This patch set pre-req's Thiago Bauermann's "kexec_file: Add buffer
> > > hand-over for the next kernel" patch set. 
> > > 
> > > These patches can also be found in the next-kexec-restore branch of:
> > > git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
> > 
> > I'll merge these into -mm to get some linux-next exposure.  I don't
> > know what your upstream merge plans will be?
> 
> Sounds good.  I'm hoping to get some review/comments on this patch set
> as well.  At the moment, I'm chasing down a kernel test robot report
> from this afternoon.

My concern about changing the canonical format as originally defined in
patch 9/9 from big endian to little endian never materialized.  Andreas
Steffan, the patch author, is happy either way.

We proposed two methods of addressing Eric Biederman's concerns of not
including the IMA measurement list segment in the kexec hash as
described in  https://lkml.org/lkml/2016/9/9/355.

- defer calculating and verifying the serialized IMA measurement list
buffer hash to IMA
- calculate the kexec hash on load, verify it on the kexec execute,
before re-calculating and updating it.

We implemented both options, which can be found in the
next-kexec-restore.with-ima-checksum and next-kexec-restore branches of:
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git

Thiago's patches posted yesterday and mine posted this morning are based
on kexec doing the hashing (the second option).

Lastly, we've addressed the automated kernel build and runtime bug
reports.

Since the IMA patches are dependent on the kexec ones, the two patch
sets need to be upstreamed together.  How should we proceed?   Should
they be upstreamed through your tree?

Mimi



Re: [PATCH v2 1/3] drivers/of: recognize status property of dt memory nodes

2016-09-15 Thread Reza Arbab

On Thu, Sep 15, 2016 at 08:43:08AM -0500, Rob Herring wrote:

On Wed, Sep 14, 2016 at 3:06 PM, Reza Arbab  wrote:

+   status = of_get_flat_dt_prop(node, "status", NULL);
+   add_memory = !status || !strcmp(status, "okay");


Move this into its own function to mirror the unflattened version
(of_device_is_available). Also, make sure the logic is the same. IIRC,
"ok" is also allowed.


Will do. 


@@ -1057,6 +1062,9 @@ int __init early_init_dt_scan_memory(unsigned long node, 
const char *uname,
pr_debug(" - %llx ,  %llx\n", (unsigned long long)base,
(unsigned long long)size);

+   if (!add_memory)
+   continue;


There's no point in checking this in the loop. status applies to the
whole node. Just return up above.


I was trying to preserve that pr_debug output for these nodes, but I'm 
also fine with skipping it.


Thanks for your feedback! I'll spin a v3 of this patchset soon.

--
Reza Arbab



[PATCH] powerpc: do not use kprobe section to exempt exception handlers

2016-09-15 Thread Nicholas Piggin
Use the blacklist macros instead. This allows the linker to move
exception handler functions close to callers and avoids trampolines in
larger kernels.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/asm-prototypes.h |  6 +++---
 arch/powerpc/kernel/hw_breakpoint.c   |  9 ++---
 arch/powerpc/kernel/traps.c   | 21 ++---
 arch/powerpc/mm/fault.c   |  4 ++--
 4 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index e71b909..b1c3abe 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -52,8 +52,8 @@ void SMIException(struct pt_regs *regs);
 void handle_hmi_exception(struct pt_regs *regs);
 void instruction_breakpoint_exception(struct pt_regs *regs);
 void RunModeException(struct pt_regs *regs);
-void __kprobes single_step_exception(struct pt_regs *regs);
-void __kprobes program_check_exception(struct pt_regs *regs);
+void single_step_exception(struct pt_regs *regs);
+void program_check_exception(struct pt_regs *regs);
 void alignment_exception(struct pt_regs *regs);
 void StackOverflow(struct pt_regs *regs);
 void nonrecoverable_exception(struct pt_regs *regs);
@@ -70,6 +70,6 @@ void unrecoverable_exception(struct pt_regs *regs);
 void kernel_bad_stack(struct pt_regs *regs);
 void system_reset_exception(struct pt_regs *regs);
 void machine_check_exception(struct pt_regs *regs);
-void __kprobes emulation_assist_interrupt(struct pt_regs *regs);
+void emulation_assist_interrupt(struct pt_regs *regs);
 
 #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index aec9a1b..9781c69 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -206,7 +206,7 @@ void thread_change_pc(struct task_struct *tsk, struct 
pt_regs *regs)
 /*
  * Handle debug exception notifications.
  */
-int __kprobes hw_breakpoint_handler(struct die_args *args)
+int hw_breakpoint_handler(struct die_args *args)
 {
int rc = NOTIFY_STOP;
struct perf_event *bp;
@@ -290,11 +290,12 @@ out:
rcu_read_unlock();
return rc;
 }
+NOKPROBE_SYMBOL(hw_breakpoint_handler);
 
 /*
  * Handle single-step exceptions following a DABR hit.
  */
-static int __kprobes single_step_dabr_instruction(struct die_args *args)
+static int single_step_dabr_instruction(struct die_args *args)
 {
struct pt_regs *regs = args->regs;
struct perf_event *bp = NULL;
@@ -329,11 +330,12 @@ static int __kprobes single_step_dabr_instruction(struct 
die_args *args)
 
return NOTIFY_STOP;
 }
+NOKPROBE_SYMBOL(single_step_dabr_instruction);
 
 /*
  * Handle debug exception notifications.
  */
-int __kprobes hw_breakpoint_exceptions_notify(
+int hw_breakpoint_exceptions_notify(
struct notifier_block *unused, unsigned long val, void *data)
 {
int ret = NOTIFY_DONE;
@@ -349,6 +351,7 @@ int __kprobes hw_breakpoint_exceptions_notify(
 
return ret;
 }
+NOKPROBE_SYMBOL(hw_breakpoint_exceptions_notify);
 
 /*
  * Release the user breakpoints used by ptrace
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 62859eb..7cbd6cf 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -117,7 +117,7 @@ static int die_owner = -1;
 static unsigned int die_nest_count;
 static int die_counter;
 
-static unsigned __kprobes long oops_begin(struct pt_regs *regs)
+static unsigned long oops_begin(struct pt_regs *regs)
 {
int cpu;
unsigned long flags;
@@ -144,8 +144,9 @@ static unsigned __kprobes long oops_begin(struct pt_regs 
*regs)
pmac_backlight_unblank();
return flags;
 }
+NOKPROBE_SYMBOL(oops_begin);
 
-static void __kprobes oops_end(unsigned long flags, struct pt_regs *regs,
+static void oops_end(unsigned long flags, struct pt_regs *regs,
   int signr)
 {
bust_spinlocks(0);
@@ -196,8 +197,9 @@ static void __kprobes oops_end(unsigned long flags, struct 
pt_regs *regs,
panic("Fatal exception");
do_exit(signr);
 }
+NOKPROBE_SYMBOL(oops_end);
 
-static int __kprobes __die(const char *str, struct pt_regs *regs, long err)
+static int __die(const char *str, struct pt_regs *regs, long err)
 {
printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
 #ifdef CONFIG_PREEMPT
@@ -221,6 +223,7 @@ static int __kprobes __die(const char *str, struct pt_regs 
*regs, long err)
 
return 0;
 }
+NOKPROBE_SYMBOL(__die);
 
 void die(const char *str, struct pt_regs *regs, long err)
 {
@@ -802,7 +805,7 @@ void RunModeException(struct pt_regs *regs)
_exception(SIGTRAP, regs, 0, 0);
 }
 
-void __kprobes single_step_exception(struct pt_regs *regs)
+void single_step_exception(struct pt_regs *regs)
 {
enum ctx_state prev_state = 

Re: [PATCH v2 1/3] drivers/of: recognize status property of dt memory nodes

2016-09-15 Thread Rob Herring
On Wed, Sep 14, 2016 at 3:06 PM, Reza Arbab  wrote:
> Respect the standard dt "status" property when scanning memory nodes in
> early_init_dt_scan_memory(), so that if the property is present and not
> "okay", no memory will be added.
>
> The use case at hand is accelerator or device memory, which may be
> unusable until post-boot initialization of the memory link. Such a node
> can be described in the dt as any other, given its status is "disabled".
> Per the device tree specification,
>
> "disabled"
> Indicates that the device is not presently operational, but it
> might become operational in the future (for example, something
> is not plugged in, or switched off).
>
> Once such memory is made operational, it can then be hotplugged.
>
> Signed-off-by: Reza Arbab 
> ---
>  drivers/of/fdt.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 085c638..fc19590 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1022,8 +1022,10 @@ int __init early_init_dt_scan_memory(unsigned long 
> node, const char *uname,
>  int depth, void *data)
>  {
> const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +   const char *status;
> const __be32 *reg, *endp;
> int l;
> +   bool add_memory;
>
> /* We are scanning "memory" nodes only */
> if (type == NULL) {
> @@ -1044,6 +1046,9 @@ int __init early_init_dt_scan_memory(unsigned long 
> node, const char *uname,
>
> endp = reg + (l / sizeof(__be32));
>
> +   status = of_get_flat_dt_prop(node, "status", NULL);
> +   add_memory = !status || !strcmp(status, "okay");

Move this into its own function to mirror the unflattened version
(of_device_is_available). Also, make sure the logic is the same. IIRC,
"ok" is also allowed.

> +
> pr_debug("memory scan node %s, reg size %d,\n", uname, l);
>
> while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
> @@ -1057,6 +1062,9 @@ int __init early_init_dt_scan_memory(unsigned long 
> node, const char *uname,
> pr_debug(" - %llx ,  %llx\n", (unsigned long long)base,
> (unsigned long long)size);
>
> +   if (!add_memory)
> +   continue;

There's no point in checking this in the loop. status applies to the
whole node. Just return up above.

Rob


[PATCH 13/13] powerpc: rewrite local_t using soft_irq

2016-09-15 Thread Madhavan Srinivasan
Local atomic operations are fast and highly reentrant per-CPU counters,
used for per-cpu variable updates. Local atomic operations only
guarantee variable modification atomicity with respect to the CPU which
owns the data, and they need to be executed in a preemption-safe way.

Here is the design of this patch. Since local_* operations only need to
be atomic with respect to interrupts (IIUC), we have two options: either
replay the "op" if interrupted, or replay the interrupt after the "op".
The initial patchset posted was based on implementing the local_*
operations using CR5, which replays the "op". That patchset had issues
when rewinding an address pointer into an array, which made the slow
path really slow. Since the CR5-based implementation proposed using
__ex_table to find the rewind address, it also raised concerns about the
size of __ex_table and vmlinux.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123115.html

This patch instead uses local_irq_pmu_save() to soft-disable interrupts
(including PMIs). After finishing the "op", local_irq_pmu_restore() is
called and any interrupts that occurred in the meantime are replayed.

The patch rewrites the current local_* functions to use
arch_local_irq_disable. The base flow for each function is:

{
local_irq_pmu_save(flags)
load
..
store
local_irq_pmu_restore(flags)
}

The reason for this approach is that the l[w/d]arx/st[w/d]cx.
instruction pair currently used for local_* operations is heavy on
cycle count, and these instructions don't support a local variant. To
see whether the new implementation helps, a modified version of Rusty's
benchmark code was run on local_t.

https://lkml.org/lkml/2008/12/16/450

Modifications to Rusty's benchmark code:
- Executed only local_t test

Here are the values with the patch.

Time in ns per iteration

Local_t		Without Patch	With Patch

_inc		28		8
_add		28		8
_read		3		3
_add_return	28		7

Currently only asm/local.h has been rewritten, and the entire change
has been tested only on a PPC64 (pseries) guest and a PPC64 (LE) host.

TODO:
- local_cmpxchg and local_xchg needs modification.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/local.h | 94 
 1 file changed, 66 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index b8da91363864..fb5728abb4e9 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -3,6 +3,9 @@
 
 #include 
 #include 
+#include 
+
+#include 
 
 typedef struct
 {
@@ -14,24 +17,50 @@ typedef struct
 #define local_read(l)  atomic_long_read(&(l)->a)
 #define local_set(l,i) atomic_long_set(&(l)->a, (i))
 
-#define local_add(i,l) atomic_long_add((i),(&(l)->a))
-#define local_sub(i,l) atomic_long_sub((i),(&(l)->a))
-#define local_inc(l)   atomic_long_inc(&(l)->a)
-#define local_dec(l)   atomic_long_dec(&(l)->a)
+static __inline__ void local_add(long i, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   local_irq_pmu_save(flags);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   add %0,%1,%0\n"
+   PPC_STL" %0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (i), "r" (&(l->a.counter)));
+   local_irq_pmu_restore(flags);
+}
+
+static __inline__ void local_sub(long i, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   local_irq_pmu_save(flags);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   subf%0,%1,%0\n"
+   PPC_STL" %0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (i), "r" (&(l->a.counter)));
+   local_irq_pmu_restore(flags);
+}
 
 static __inline__ long local_add_return(long a, local_t *l)
 {
long t;
+   unsigned long flags;
 
+   local_irq_pmu_save(flags);
__asm__ __volatile__(
-"1:"   PPC_LLARX(%0,0,%2,0) "  # local_add_return\n\
+   PPC_LL" %0,0(%2)\n\
add %0,%1,%0\n"
-   PPC405_ERR77(0,%2)
-   PPC_STLCX   "%0,0,%2 \n\
-   bne-1b"
+   PPC_STL "%0,0(%2)\n"
: "=&r" (t)
: "r" (a), "r" (&(l->a.counter))
: "cc", "memory");
+   local_irq_pmu_restore(flags);
 
return t;
 }
@@ -41,16 +70,18 @@ static __inline__ long local_add_return(long a, local_t *l)
 static __inline__ long local_sub_return(long a, local_t *l)
 {
long t;
+   unsigned long flags;
+
+   local_irq_pmu_save(flags);
 
__asm__ __volatile__(
-"1:"   PPC_LLARX(%0,0,%2,0) "  # local_sub_return\n\
+"1:"   PPC_LL" %0,0(%2)\n\
subf%0,%1,%0\n"
-   PPC405_ERR77(0,%2)
-   PPC_STLCX   "%0,0,%2 \n\
-   bne-1b"
+   PPC_STL "%0,0(%2)\n"
: "=&r" (t)
: "r" (a), "r" (&(l->a.counter))
: "cc", "memory");
+   local_irq_pmu_restore(flags);
 
return t;
 }
@@ 

[PATCH 12/13] powerpc: Add a Kconfig and a functions to set new soft_enabled mask

2016-09-15 Thread Madhavan Srinivasan
A new Kconfig option, CONFIG_IRQ_DEBUG_SUPPORT, is added to enable
WARN_ONs that flag invalid soft_enabled transitions. The code in
arch_local_irq_restore() that was under CONFIG_TRACE_IRQFLAGS has also
been moved under the new Kconfig option, as suggested.

To support disabling and enabling of irqs together with PMIs, a set of
new powerpc_local_irq_pmu_save() and powerpc_local_irq_pmu_restore()
functions is added. powerpc_local_irq_pmu_save() is implemented by
adding a new soft_enabled manipulation function,
soft_enabled_or_return(). local_irq_pmu_* macros are provided to access
these powerpc_local_irq_pmu_* functions; they include
trace_hardirqs_on|off() to match what we have in
include/linux/irqflags.h.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/Kconfig  |  4 +++
 arch/powerpc/include/asm/hw_irq.h | 69 ++-
 arch/powerpc/kernel/irq.c |  8 +++--
 3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 927d2ab2ce08..878f05925340 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -51,6 +51,10 @@ config TRACE_IRQFLAGS_SUPPORT
bool
default y
 
+config IRQ_DEBUG_SUPPORT
+   bool
+   default n
+
 config LOCKDEP_SUPPORT
bool
default y
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 850fdffb59eb..86f9736fdbb1 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -81,6 +81,20 @@ static inline notrace unsigned long 
soft_enabled_set_return(unsigned long enable
return flags;
 }
 
+static inline notrace unsigned long soft_enabled_or_return(unsigned long 
enable)
+{
+   unsigned long flags, zero;
+
+   asm volatile(
+   "mr %1,%3; lbz %0,%2(13); or %1,%0,%1; stb %1,%2(13)"
+   : "=r" (flags), "=&r"(zero)
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+ "r" (enable)
+   : "memory");
+
+   return flags;
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
return soft_enabled_return();
@@ -105,7 +119,7 @@ static inline unsigned long arch_local_irq_save(void)
 
 static inline bool arch_irqs_disabled_flags(unsigned long flags)
 {
-   return flags == IRQ_DISABLE_MASK_LINUX;
+   return (flags);
 }
 
 static inline bool arch_irqs_disabled(void)
@@ -113,6 +127,59 @@ static inline bool arch_irqs_disabled(void)
return arch_irqs_disabled_flags(arch_local_save_flags());
 }
 
+static inline void powerpc_local_irq_pmu_restore(unsigned long flags)
+{
+   arch_local_irq_restore(flags);
+}
+
+static inline unsigned long powerpc_local_irq_pmu_disable(void)
+{
+   return soft_enabled_or_return(IRQ_DISABLE_MASK_LINUX | 
IRQ_DISABLE_MASK_PMU);
+}
+
+static inline unsigned long powerpc_local_irq_pmu_save(void)
+{
+   return powerpc_local_irq_pmu_disable();
+}
+
+#define raw_local_irq_pmu_save(flags)  \
+   do {\
+   typecheck(unsigned long, flags);\
+   flags = powerpc_local_irq_pmu_save();   \
+   } while(0)
+
+#define raw_local_irq_pmu_restore(flags)   \
+   do {\
+   typecheck(unsigned long, flags);\
+   powerpc_local_irq_pmu_restore(flags);   \
+   } while(0)
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+#define local_irq_pmu_save(flags)  \
+   do {\
+   raw_local_irq_pmu_save(flags);  \
+   trace_hardirqs_off();   \
+   } while(0)
+#define local_irq_pmu_restore(flags)   \
+   do {\
+   if (raw_irqs_disabled_flags(flags)) {   \
+   raw_local_irq_pmu_restore(flags);\
+   trace_hardirqs_off();   \
+   } else {\
+   trace_hardirqs_on();\
+   raw_local_irq_pmu_restore(flags);\
+   }   \
+   } while(0)
+#else
+#define local_irq_pmu_save(flags)  \
+   do {\
+   raw_local_irq_pmu_save(flags);  \
+   } while(0)
+#define local_irq_pmu_restore(flags)   \
+   do { raw_local_irq_pmu_restore(flags); } while (0)
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
+
 #ifdef CONFIG_PPC_BOOK3E
 #define __hard_irq_enable()asm volatile("wrteei 1" : : : "memory")
 #define __hard_irq_disable()   asm volatile("wrteei 0" : : : "memory")
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 9e5e9a6d4147..ae31b1e85fdb 100644
--- a/arch/powerpc/kernel/irq.c

[PATCH 11/13] powerpc: Add support to mask perf interrupts and replay them

2016-09-15 Thread Madhavan Srinivasan
To support masking of PMI interrupts, a couple of new interrupt handler
macros are added: MASKABLE_EXCEPTION_PSERIES_OOL and
MASKABLE_RELON_EXCEPTION_PSERIES_OOL.

A couple of new irq #defines, "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*",
are added for use in the exception code to check for PMI interrupts.

In the masked_interrupt handler, for PMIs we clear MSR[EE] and return.
In __check_irq_replay(), the PMI interrupt is replayed by calling the
performance_monitor_common handler.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 13 +
 arch/powerpc/include/asm/hw_irq.h|  1 +
 arch/powerpc/kernel/entry_64.S   |  5 +
 arch/powerpc/kernel/exceptions-64s.S |  6 --
 arch/powerpc/kernel/irq.c| 12 +++-
 5 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 41be0c2d7658..ca40b5c59869 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -427,6 +427,7 @@ label##_relon_hv:   
\
 #define SOFTEN_VALUE_0xe62 PACA_IRQ_HMI
 #define SOFTEN_VALUE_0xea0 PACA_IRQ_EE
 #define SOFTEN_VALUE_0xea2 PACA_IRQ_EE
+#define SOFTEN_VALUE_0xf00 PACA_IRQ_PMI
 
 #define __SOFTEN_TEST(h, vec, bitmask) \
lbz r10,PACASOFTIRQEN(r13); \
@@ -462,6 +463,12 @@ label##_pSeries:   
\
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
EXC_STD, SOFTEN_TEST_PR, bitmask)
 
+#define MASKABLE_EXCEPTION_PSERIES_OOL(vec, label, bitmask)\
+   .globl label##_pSeries; \
+label##_pSeries:   \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, vec, bitmask); \
+   EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD);
+
 #define MASKABLE_EXCEPTION_HV(loc, vec, label, bitmask)
\
. = loc;\
.globl label##_hv;  \
@@ -490,6 +497,12 @@ label##_relon_pSeries: 
\
_MASKABLE_RELON_EXCEPTION_PSERIES(vec, label,   \
  EXC_STD, SOFTEN_NOTEST_PR, bitmask)
 
+#define MASKABLE_RELON_EXCEPTION_PSERIES_OOL(vec, label, bitmask)  \
+   .globl label##_relon_pSeries;   \
+label##_relon_pSeries: \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_PR, vec, bitmask);\
+   EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD);
+
 #define MASKABLE_RELON_EXCEPTION_HV(loc, vec, label, bitmask)  \
. = loc;\
.globl label##_relon_hv;\
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 245262c02bab..850fdffb59eb 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -26,6 +26,7 @@
 #define PACA_IRQ_DEC   0x08 /* Or FIT */
 #define PACA_IRQ_EE_EDGE   0x10 /* BookE only */
 #define PACA_IRQ_HMI   0x20
+#define PACA_IRQ_PMI   0x40
 
 /*
  * flags for paca->soft_enabled
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 533e363914a9..e3baf9c24d0e 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -933,6 +933,11 @@ restore_check_irq_replay:
	addi	r3,r1,STACK_FRAME_OVERHEAD;
bl  do_IRQ
b   ret_from_except
+1: cmpwi   cr0,r3,0xf00
+   bne 1f
+   addi	r3,r1,STACK_FRAME_OVERHEAD;
+   bl  performance_monitor_exception
+   b   ret_from_except
 1: cmpwi   cr0,r3,0xe60
bne 1f
	addi	r3,r1,STACK_FRAME_OVERHEAD;
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 581a10bdb34a..19138a411700 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -596,7 +596,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xea2)
 
/* moved from 0xf00 */
-   STD_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
+   MASKABLE_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor, 
IRQ_DISABLE_MASK_PMU)
KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xf00)
STD_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xf20)
@@ -671,6 +671,8 @@ _GLOBAL(__replay_interrupt)
beq decrementer_common
cmpwi   r3,0x500
beq 

[PATCH 10/13] powerpc: Add "bitmask" paramater to MASKABLE_* macros

2016-09-15 Thread Madhavan Srinivasan
Make explicit the interrupt masking supported
by a given interrupt handler. The patch correspondingly
extends the MASKABLE_* macros with an additional parameter.
The "bitmask" parameter is passed to the SOFTEN_TEST macro to
decide on masking the interrupt.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 62 
 arch/powerpc/kernel/exceptions-64s.S | 36 ---
 2 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 1eea4ab75607..41be0c2d7658 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -179,9 +179,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
  * checking of the interrupt maskable level in the SOFTEN_TEST.
  * Intended to be used in MASKABLE_EXCPETION_* macros.
  */
-#define __EXCEPTION_PROLOG_1(area, extra, vec) \
+#define __EXCEPTION_PROLOG_1(area, extra, vec, bitmask)
\
__EXCEPTION_PROLOG_1_PRE(area); \
-   extra(vec); \
+   extra(vec, bitmask);\
__EXCEPTION_PROLOG_1_POST(area);
 
 /*
@@ -428,79 +428,79 @@ label##_relon_hv: 
\
 #define SOFTEN_VALUE_0xea0 PACA_IRQ_EE
 #define SOFTEN_VALUE_0xea2 PACA_IRQ_EE
 
-#define __SOFTEN_TEST(h, vec)  \
+#define __SOFTEN_TEST(h, vec, bitmask) \
lbz r10,PACASOFTIRQEN(r13); \
-   andi.   r10,r10,IRQ_DISABLE_MASK_LINUX; \
+   andi.   r10,r10,bitmask;\
li  r10,SOFTEN_VALUE_##vec; \
bne masked_##h##interrupt
-#define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
+#define _SOFTEN_TEST(h, vec, bitmask)  __SOFTEN_TEST(h, vec, bitmask)
 
-#define SOFTEN_TEST_PR(vec)\
+#define SOFTEN_TEST_PR(vec, bitmask)   \
KVMTEST(vec);   \
-   _SOFTEN_TEST(EXC_STD, vec)
+   _SOFTEN_TEST(EXC_STD, vec, bitmask)
 
-#define SOFTEN_TEST_HV(vec)\
+#define SOFTEN_TEST_HV(vec, bitmask)   \
KVMTEST(vec);   \
-   _SOFTEN_TEST(EXC_HV, vec)
+   _SOFTEN_TEST(EXC_HV, vec, bitmask)
 
-#define SOFTEN_NOTEST_PR(vec)  _SOFTEN_TEST(EXC_STD, vec)
-#define SOFTEN_NOTEST_HV(vec)  _SOFTEN_TEST(EXC_HV, vec)
+#define SOFTEN_NOTEST_PR(vec, bitmask) _SOFTEN_TEST(EXC_STD, vec, 
bitmask)
+#define SOFTEN_NOTEST_HV(vec, bitmask) _SOFTEN_TEST(EXC_HV, vec, 
bitmask)
 
-#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra) \
+#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask)\
SET_SCRATCH0(r13);/* save r13 */\
EXCEPTION_PROLOG_0(PACA_EXGEN); \
-   __EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec);   \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec, bitmask);  \
EXCEPTION_PROLOG_PSERIES_1(label##_common, h);
 
-#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)  \
-   __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)
+#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask) \
+   __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask)
 
-#define MASKABLE_EXCEPTION_PSERIES(loc, vec, label)\
+#define MASKABLE_EXCEPTION_PSERIES(loc, vec, label, bitmask)   \
. = loc;\
.globl label##_pSeries; \
 label##_pSeries:   \
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
-   EXC_STD, SOFTEN_TEST_PR)
+   EXC_STD, SOFTEN_TEST_PR, bitmask)
 
-#define MASKABLE_EXCEPTION_HV(loc, vec, label) \
+#define MASKABLE_EXCEPTION_HV(loc, vec, label, bitmask)
\
. = loc;\
.globl label##_hv;  \
 label##_hv:\
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
-   EXC_HV, SOFTEN_TEST_HV)
+   EXC_HV, SOFTEN_TEST_HV, bitmask)
 
-#define 

[PATCH 09/13] powerpc: Introduce new mask bit for soft_enabled

2016-09-15 Thread Madhavan Srinivasan
Currently soft_enabled is used as a flag to determine
the interrupt state. This patch extends soft_enabled
to be used as a mask instead of a flag.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 4 ++--
 arch/powerpc/include/asm/hw_irq.h| 1 +
 arch/powerpc/include/asm/irqflags.h  | 4 ++--
 arch/powerpc/kernel/entry_64.S   | 4 ++--
 arch/powerpc/kernel/exceptions-64e.S | 6 +++---
 5 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index dd3253bd0d8e..1eea4ab75607 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -430,9 +430,9 @@ label##_relon_hv:   
\
 
 #define __SOFTEN_TEST(h, vec)  \
lbz r10,PACASOFTIRQEN(r13); \
-   cmpwi   r10,IRQ_DISABLE_MASK_LINUX; 
\
+   andi.   r10,r10,IRQ_DISABLE_MASK_LINUX; \
li  r10,SOFTEN_VALUE_##vec; \
-   beq masked_##h##interrupt
+   bne masked_##h##interrupt
 #define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
 
 #define SOFTEN_TEST_PR(vec)\
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index fd9b421f9020..245262c02bab 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
  */
 #define IRQ_DISABLE_MASK_NONE  0
 #define IRQ_DISABLE_MASK_LINUX 1
+#define IRQ_DISABLE_MASK_PMU   2
 
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/include/asm/irqflags.h 
b/arch/powerpc/include/asm/irqflags.h
index d0ed2a7d7d10..9ff09747a226 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -48,11 +48,11 @@
 #define RECONCILE_IRQ_STATE(__rA, __rB)\
lbz __rA,PACASOFTIRQEN(r13);\
lbz __rB,PACAIRQHAPPENED(r13);  \
-   cmpwi   cr0,__rA,IRQ_DISABLE_MASK_LINUX;\
+   andi.   __rA,__rA,IRQ_DISABLE_MASK_LINUX;\
li  __rA,IRQ_DISABLE_MASK_LINUX;\
ori __rB,__rB,PACA_IRQ_HARD_DIS;\
stb __rB,PACAIRQHAPPENED(r13);  \
-   beq 44f;\
+   bne 44f;\
stb __rA,PACASOFTIRQEN(r13);\
TRACE_DISABLE_INTS; \
 44:
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 879aeb11ad29..533e363914a9 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -764,8 +764,8 @@ restore:
 */
ld  r5,SOFTE(r1)
lbz r6,PACASOFTIRQEN(r13)
-   cmpwi   cr0,r5,IRQ_DISABLE_MASK_LINUX
-   beq restore_irq_off
+   andi.   r5,r5,IRQ_DISABLE_MASK_LINUX
+   bne restore_irq_off
 
/* We are enabling, were we already enabled ? Yes, just return */
cmpwi   cr0,r6,IRQ_DISABLE_MASK_NONE
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 5c628b5696f6..8e40df2c2f30 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -212,8 +212,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
/* Interrupts had better not already be enabled... */
twnei   r6,IRQ_DISABLE_MASK_LINUX
 
-   cmpwi   cr0,r5,IRQ_DISABLE_MASK_LINUX
-   beq 1f
+   andi.   r5,r5,IRQ_DISABLE_MASK_LINUX
+   bne 1f
 
TRACE_ENABLE_INTS
stb r5,PACASOFTIRQEN(r13)
@@ -352,7 +352,7 @@ ret_from_mc_except:
 
 #define PROLOG_ADDITION_MASKABLE_GEN(n)
\
lbz r10,PACASOFTIRQEN(r13); /* are irqs soft-disabled ? */  \
-   cmpwi   cr0,r10,IRQ_DISABLE_MASK_LINUX;/* yes -> go out of line */ \
+   andi.   r10,r10,IRQ_DISABLE_MASK_LINUX;/* yes -> go out of line */ \
beq masked_interrupt_book3e_##n
 
 #define PROLOG_ADDITION_2REGS_GEN(n)   \
-- 
2.7.4



[PATCH 08/13] powerpc: Add new _EXCEPTION_PROLOG_1 macro

2016-09-15 Thread Madhavan Srinivasan
To support the addition of "bitmask" to the MASKABLE_* macros,
factor out the EXCEPTION_PROLOG_1 macro.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 75e262466b85..dd3253bd0d8e 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -161,18 +161,40 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,area+EX_R10(r13);   /* save r10 - r12 */\
OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
 
-#define __EXCEPTION_PROLOG_1(area, extra, vec) \
+#define __EXCEPTION_PROLOG_1_PRE(area) \
OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR); \
OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR);  \
SAVE_CTR(r10, area);\
	mfcr	r9; \
-   extra(vec); \
+   mfcr	r9;
+
+#define __EXCEPTION_PROLOG_1_POST(area)
\
std r11,area+EX_R11(r13);   \
std r12,area+EX_R12(r13);   \
GET_SCRATCH0(r10);  \
std r10,area+EX_R13(r13)
+
+/*
+ * This version of the EXCEPTION_PROLOG_1 will carry
+ * addition parameter called "bitmask" to support
+ * checking of the interrupt maskable level in the SOFTEN_TEST.
+ * Intended to be used in MASKABLE_EXCPETION_* macros.
+ */
+#define __EXCEPTION_PROLOG_1(area, extra, vec) \
+   __EXCEPTION_PROLOG_1_PRE(area); \
+   extra(vec); \
+   __EXCEPTION_PROLOG_1_POST(area);
+
+/*
+ * This version of the EXCEPTION_PROLOG_1 is intended
+ * to be used in STD_EXCEPTION* macros
+ */
+#define _EXCEPTION_PROLOG_1(area, extra, vec)  \
+   __EXCEPTION_PROLOG_1_PRE(area); \
+   extra(vec); \
+   __EXCEPTION_PROLOG_1_POST(area);
+
 #define EXCEPTION_PROLOG_1(area, extra, vec)   \
-   __EXCEPTION_PROLOG_1(area, extra, vec)
+   _EXCEPTION_PROLOG_1(area, extra, vec)
 
 #define __EXCEPTION_PROLOG_PSERIES_1(label, h) \
ld  r12,PACAKBASE(r13); /* get high part of  */   \
-- 
2.7.4



[PATCH 07/13] powerpc: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_*

2016-09-15 Thread Madhavan Srinivasan
Currently we use both EXCEPTION_PROLOG_1 and __EXCEPTION_PROLOG_1
in the MASKABLE_* macros. As a cleanup, this patch makes MASKABLE_*
use only __EXCEPTION_PROLOG_1. There is no logic change.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 38272fe8a757..75e262466b85 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -450,7 +450,7 @@ label##_hv: 
\
 #define MASKABLE_EXCEPTION_HV_OOL(vec, label)  \
.globl label##_hv;  \
 label##_hv:\
-   EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);\
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);  \
EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_HV);
 
 #define __MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, h, extra)   \
@@ -478,7 +478,7 @@ label##_relon_hv:   
\
 #define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label)\
.globl label##_relon_hv;\
 label##_relon_hv:  \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);  \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);
\
EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_HV);
 
 /*
-- 
2.7.4



[PATCH 06/13] powerpc: reverse the soft_enable logic

2016-09-15 Thread Madhavan Srinivasan
"paca->soft_enabled" is used as a flag to mask some of interrupts.
Currently supported flags values and their details:

soft_enabledMSR[EE]

0   0   Disabled (PMI and HMI not masked)
1   1   Enabled

"paca->soft_enabled" is initialized to 1 to make the interripts as
enabled. arch_local_irq_disable() will toggle the value when interrupts
needs to disbled. At this point, the interrupts are not actually disabled,
instead, interrupt vector has code to check for the flag and mask it when it 
occurs.
By "mask it", it update interrupt paca->irq_happened and return.
arch_local_irq_restore() is called to re-enable interrupts, which checks and
replays interrupts if any occured.

Now, as mentioned, current logic doesnot mask "performance monitoring 
interrupts"
and PMIs are implemented as NMI. But this patchset depends on local_irq_*
for a successful local_* update. Meaning, mask all possible interrupts during
local_* update and replay them after the update.

So the idea here is to reserve the "paca->soft_enabled" logic. New values and
details:

soft_enabledMSR[EE]

1   0   Disabled  (PMI and HMI not masked)
0   1   Enabled

Reason for the this change is to create foundation for a third flag value "2"
for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is
set to a value "2", PMI interrupts are mask and when set to a value
of "1", PMI are not mask.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 4 ++--
 arch/powerpc/kernel/entry_64.S| 5 ++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index dc3c248f9244..fd9b421f9020 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -30,8 +30,8 @@
 /*
  * flags for paca->soft_enabled
  */
-#define IRQ_DISABLE_MASK_NONE  1
-#define IRQ_DISABLE_MASK_LINUX 0
+#define IRQ_DISABLE_MASK_NONE  0
+#define IRQ_DISABLE_MASK_LINUX 1
 
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index aef7b64cbbeb..879aeb11ad29 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -131,8 +131,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 */
 #if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_BUG)
lbz r10,PACASOFTIRQEN(r13)
-   xori	r10,r10,IRQ_DISABLE_MASK_NONE
-1: tdnei   r10,0
+1: tdnei   r10,IRQ_DISABLE_MASK_NONE
EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 
@@ -1012,7 +1011,7 @@ _GLOBAL(enter_rtas)
 * check it with the asm equivalent of WARN_ON
 */
lbz r0,PACASOFTIRQEN(r13)
-1: tdnei   r0,IRQ_DISABLE_MASK_LINUX
+1: tdeqi   r0,IRQ_DISABLE_MASK_NONE
EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif

-- 
2.7.4



[PATCH 04/13] powerpc: Use soft_enabled_set api to update paca->soft_enabled

2016-09-15 Thread Madhavan Srinivasan
Force use of the soft_enabled_set() wrapper to update paca->soft_enabled
wherever possible. Also add a new wrapper function, soft_enabled_set_return(),
to force the paca->soft_enabled updates.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h  | 14 ++
 arch/powerpc/include/asm/kvm_ppc.h |  2 +-
 arch/powerpc/kernel/irq.c  |  2 +-
 arch/powerpc/kernel/setup_64.c |  4 ++--
 arch/powerpc/kernel/time.c |  6 +++---
 5 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 8fad8c24760b..f828b8f8df02 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -53,6 +53,20 @@ static inline notrace void soft_enabled_set(unsigned long 
enable)
: : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled)));
 }
 
+static inline notrace unsigned long soft_enabled_set_return(unsigned long 
enable)
+{
+   unsigned long flags;
+
+   asm volatile(
+   "lbz %0,%1(13); stb %2,%1(13)"
+   : "=r" (flags)
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+ "r" (enable)
+   : "memory");
+
+   return flags;
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
unsigned long flags;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 740ee309cea8..07f6a51ae99f 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -707,7 +707,7 @@ static inline void kvmppc_fix_ee_before_entry(void)
 
/* Only need to enable IRQs by hard enabling them after this */
local_paca->irq_happened = 0;
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_NONE;
+   soft_enabled_set(IRQ_DISABLE_MASK_NONE);
 #endif
 }
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5a926ea5bd0b..58462ce186fa 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -332,7 +332,7 @@ bool prep_irq_for_idle(void)
 * of entering the low power state.
 */
local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_NONE;
+   soft_enabled_set(IRQ_DISABLE_MASK_NONE);
 
/* Tell the caller to enter the low power state */
return true;
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f31930b9bfc1..f0f882166dcc 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -197,7 +197,7 @@ static void __init fixup_boot_paca(void)
/* Allow percpu accesses to work until we setup percpu data */
get_paca()->data_offset = 0;
/* Mark interrupts disabled in PACA */
-   get_paca()->soft_enabled = IRQ_DISABLE_MASK_LINUX;
+   soft_enabled_set(IRQ_DISABLE_MASK_LINUX);
 }
 
 static void __init configure_exceptions(void)
@@ -334,7 +334,7 @@ void __init early_setup(unsigned long dt_ptr)
 void early_setup_secondary(void)
 {
/* Mark interrupts disabled in PACA */
-   get_paca()->soft_enabled = 0;
+   soft_enabled_set(IRQ_DISABLE_MASK_LINUX);
 
/* Initialize the hash table or TLB handling */
early_init_mmu_secondary();
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 7105757cdb90..483313aa311f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -259,7 +259,7 @@ static u64 scan_dispatch_log(u64 stop_tb)
 void accumulate_stolen_time(void)
 {
u64 sst, ust;
-   u8 save_soft_enabled = local_paca->soft_enabled;
+   unsigned long save_soft_enabled;
	struct cpu_accounting_data *acct = &local_paca->accounting;
 
/* We are called early in the exception entry, before
@@ -268,7 +268,7 @@ void accumulate_stolen_time(void)
 * needs to reflect that so various debug stuff doesn't
 * complain
 */
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_LINUX;
+   save_soft_enabled = soft_enabled_set_return(IRQ_DISABLE_MASK_LINUX);
 
sst = scan_dispatch_log(acct->starttime_user);
ust = scan_dispatch_log(acct->starttime);
@@ -276,7 +276,7 @@ void accumulate_stolen_time(void)
acct->user_time -= ust;
local_paca->stolen_time += ust + sst;
 
-   local_paca->soft_enabled = save_soft_enabled;
+   soft_enabled_set(save_soft_enabled);
 }
 
 static inline u64 calculate_stolen_time(u64 stop_tb)
-- 
2.7.4



[PATCH 05/13] powerpc: Add soft_enabled manipulation functions

2016-09-15 Thread Madhavan Srinivasan
Add new soft_enabled_* manipulation functions and implement
arch_local_* using the soft_enabled_* wrappers.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 32 ++--
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index f828b8f8df02..dc3c248f9244 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -53,21 +53,7 @@ static inline notrace void soft_enabled_set(unsigned long 
enable)
: : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled)));
 }
 
-static inline notrace unsigned long soft_enabled_set_return(unsigned long 
enable)
-{
-   unsigned long flags;
-
-   asm volatile(
-   "lbz %0,%1(13); stb %2,%1(13)"
-   : "=r" (flags)
-   : "i" (offsetof(struct paca_struct, soft_enabled)),\
- "r" (enable)
-   : "memory");
-
-   return flags;
-}
-
-static inline unsigned long arch_local_save_flags(void)
+static inline notrace unsigned long soft_enabled_return(void)
 {
unsigned long flags;
 
@@ -79,20 +65,30 @@ static inline unsigned long arch_local_save_flags(void)
return flags;
 }
 
-static inline unsigned long arch_local_irq_disable(void)
+static inline notrace unsigned long soft_enabled_set_return(unsigned long 
enable)
 {
unsigned long flags, zero;
 
asm volatile(
-   "li %1,%3; lbz %0,%2(13); stb %1,%2(13)"
+   "mr %1,%3; lbz %0,%2(13); stb %1,%2(13)"
: "=r" (flags), "=" (zero)
: "i" (offsetof(struct paca_struct, soft_enabled)),\
- "i" (IRQ_DISABLE_MASK_LINUX)
+ "r" (enable)
: "memory");
 
return flags;
 }
 
+static inline unsigned long arch_local_save_flags(void)
+{
+   return soft_enabled_return();
+}
+
+static inline unsigned long arch_local_irq_disable(void)
+{
+   return soft_enabled_set_return(IRQ_DISABLE_MASK_LINUX);
+}
+
 extern void arch_local_irq_restore(unsigned long);
 
 static inline void arch_local_irq_enable(void)
-- 
2.7.4



[PATCH 03/13] powerpc: move set_soft_enabled() and rename

2016-09-15 Thread Madhavan Srinivasan
Move set_soft_enabled() from arch/powerpc/kernel/irq.c to
asm/hw_irq.h, and rename it soft_enabled_set().
This way paca->soft_enabled updates can be forced.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h |  6 ++
 arch/powerpc/kernel/irq.c | 12 +++-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 1fcc2fd7275a..8fad8c24760b 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -47,6 +47,12 @@ extern void unknown_exception(struct pt_regs *regs);
 #ifdef CONFIG_PPC64
 #include 
 
+static inline notrace void soft_enabled_set(unsigned long enable)
+{
+   __asm__ __volatile__("stb %0,%1(13)"
+   : : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled)));
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
unsigned long flags;
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ed1123125063..5a926ea5bd0b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -107,12 +107,6 @@ static inline notrace unsigned long get_irq_happened(void)
return happened;
 }
 
-static inline notrace void set_soft_enabled(unsigned long enable)
-{
-   __asm__ __volatile__("stb %0,%1(13)"
-   : : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled)));
-}
-
 static inline notrace int decrementer_check_overflow(void)
 {
u64 now = get_tb_or_rtc();
@@ -208,7 +202,7 @@ notrace void arch_local_irq_restore(unsigned long en)
unsigned int replay;
 
/* Write the new soft-enabled value */
-   set_soft_enabled(en);
+   soft_enabled_set(en);
if (en == IRQ_DISABLE_MASK_LINUX)
return;
/*
@@ -254,7 +248,7 @@ notrace void arch_local_irq_restore(unsigned long en)
}
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
-   set_soft_enabled(IRQ_DISABLE_MASK_LINUX);
+   soft_enabled_set(IRQ_DISABLE_MASK_LINUX);
 
/*
 * Check if anything needs to be re-emitted. We haven't
@@ -264,7 +258,7 @@ notrace void arch_local_irq_restore(unsigned long en)
replay = __check_irq_replay();
 
/* We can soft-enable now */
-   set_soft_enabled(IRQ_DISABLE_MASK_NONE);
+   soft_enabled_set(IRQ_DISABLE_MASK_NONE);
 
/*
 * And replay if we have to. This will return with interrupts
-- 
2.7.4



[PATCH 02/13] powerpc: Cleanup to use IRQ_DISABLE_MASK_* macros for paca->soft_enabled update

2016-09-15 Thread Madhavan Srinivasan
Replace the hardcoded values used when updating
paca->soft_enabled with the IRQ_DISABLE_MASK_* #defines.
No logic change.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h |  2 +-
 arch/powerpc/include/asm/hw_irq.h| 15 ---
 arch/powerpc/include/asm/irqflags.h  |  6 +++---
 arch/powerpc/include/asm/kvm_ppc.h   |  2 +-
 arch/powerpc/kernel/entry_64.S   | 16 
 arch/powerpc/kernel/exceptions-64e.S |  6 +++---
 arch/powerpc/kernel/head_64.S|  5 +++--
 arch/powerpc/kernel/idle_book3e.S|  3 ++-
 arch/powerpc/kernel/idle_power4.S|  3 ++-
 arch/powerpc/kernel/irq.c|  9 +
 arch/powerpc/kernel/process.c|  3 ++-
 arch/powerpc/kernel/setup_64.c   |  3 +++
 arch/powerpc/kernel/time.c   |  2 +-
 arch/powerpc/mm/hugetlbpage.c|  2 +-
 arch/powerpc/perf/core-book3s.c  |  2 +-
 15 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index bed66e5743b3..38272fe8a757 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -408,7 +408,7 @@ label##_relon_hv:   
\
 
 #define __SOFTEN_TEST(h, vec)  \
lbz r10,PACASOFTIRQEN(r13); \
-   cmpwi   r10,0;  \
+   cmpwi   r10,IRQ_DISABLE_MASK_LINUX; 
\
li  r10,SOFTEN_VALUE_##vec; \
beq masked_##h##interrupt
 #define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index df5def1f635a..1fcc2fd7275a 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -64,9 +64,10 @@ static inline unsigned long arch_local_irq_disable(void)
unsigned long flags, zero;
 
asm volatile(
-   "li %1,0; lbz %0,%2(13); stb %1,%2(13)"
+   "li %1,%3; lbz %0,%2(13); stb %1,%2(13)"
: "=r" (flags), "=" (zero)
-   : "i" (offsetof(struct paca_struct, soft_enabled))
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+ "i" (IRQ_DISABLE_MASK_LINUX)
: "memory");
 
return flags;
@@ -76,7 +77,7 @@ extern void arch_local_irq_restore(unsigned long);
 
 static inline void arch_local_irq_enable(void)
 {
-   arch_local_irq_restore(1);
+   arch_local_irq_restore(IRQ_DISABLE_MASK_NONE);
 }
 
 static inline unsigned long arch_local_irq_save(void)
@@ -86,7 +87,7 @@ static inline unsigned long arch_local_irq_save(void)
 
 static inline bool arch_irqs_disabled_flags(unsigned long flags)
 {
-   return flags == 0;
+   return flags == IRQ_DISABLE_MASK_LINUX;
 }
 
 static inline bool arch_irqs_disabled(void)
@@ -106,9 +107,9 @@ static inline bool arch_irqs_disabled(void)
u8 _was_enabled;\
__hard_irq_disable();   \
_was_enabled = local_paca->soft_enabled;\
-   local_paca->soft_enabled = 0;   \
+   local_paca->soft_enabled = IRQ_DISABLE_MASK_LINUX;\
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;  \
-   if (_was_enabled)   \
+   if (_was_enabled == IRQ_DISABLE_MASK_NONE)  \
trace_hardirqs_off();   \
 } while(0)
 
@@ -131,7 +132,7 @@ static inline void may_hard_irq_enable(void)
 
 static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 {
-   return !regs->softe;
+   return (regs->softe == IRQ_DISABLE_MASK_LINUX);
 }
 
 extern bool prep_irq_for_idle(void);
diff --git a/arch/powerpc/include/asm/irqflags.h 
b/arch/powerpc/include/asm/irqflags.h
index f2149066fe5d..d0ed2a7d7d10 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -48,8 +48,8 @@
 #define RECONCILE_IRQ_STATE(__rA, __rB)\
lbz __rA,PACASOFTIRQEN(r13);\
lbz __rB,PACAIRQHAPPENED(r13);  \
-   cmpwi   cr0,__rA,0; \
-   li  __rA,0; \
+   cmpwi   cr0,__rA,IRQ_DISABLE_MASK_LINUX;\
+   li  __rA,IRQ_DISABLE_MASK_LINUX;\
ori __rB,__rB,PACA_IRQ_HARD_DIS;\
stb __rB,PACAIRQHAPPENED(r13);  \
beq 44f;\
@@ -63,7 +63,7 @@
 
 #define RECONCILE_IRQ_STATE(__rA, __rB)\
lbz __rA,PACAIRQHAPPENED(r13);  \
-   li  __rB,0; \
+   li  __rB,IRQ_DISABLE_MASK_LINUX;\
ori __rA,__rA,PACA_IRQ_HARD_DIS;\

[PATCH 01/13] powerpc: Add #defs for paca->soft_enabled flags

2016-09-15 Thread Madhavan Srinivasan
Two #defines, IRQ_DISABLE_MASK_NONE and IRQ_DISABLE_MASK_LINUX,
are added to be used when updating paca->soft_enabled.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index c7d82ff62a33..df5def1f635a 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -27,6 +27,12 @@
 #define PACA_IRQ_EE_EDGE   0x10 /* BookE only */
 #define PACA_IRQ_HMI   0x20
 
+/*
+ * flags for paca->soft_enabled
+ */
+#define IRQ_DISABLE_MASK_NONE  1
+#define IRQ_DISABLE_MASK_LINUX 0
+
 #endif /* CONFIG_PPC64 */
 
 #ifndef __ASSEMBLY__
-- 
2.7.4



[PATCH 00/13] powerpc: "paca->soft_enabled" based local atomic operation implementation

2016-09-15 Thread Madhavan Srinivasan
Local atomic operations are fast and highly reentrant per-CPU counters,
used for per-cpu variable updates. Local atomic operations only guarantee
variable modification atomicity wrt the CPU which owns the data, and
they need to be executed in a preemption-safe way.

Here is the design of the patchset. Since local_* operations
only need to be atomic with respect to interrupts (IIUC), we have two options:
either replay the "op" if interrupted, or replay the interrupt after
the "op". The initial patchset posted was based on implementing local_*
operations using CR5, which replays the "op". That patchset had issues when
rewinding the address pointer from an array, which made the slow path
really slow. Since the CR5-based implementation proposed using __ex_table to
find the rewind address, this raised concerns about the size of __ex_table
and vmlinux.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123115.html

But this patchset uses Benjamin Herrenschmidt's suggestion of using
arch_local_irq_disable() to soft-disable interrupts (including PMIs).
After finishing the "op", arch_local_irq_restore() is called, and any
interrupts that occurred in between are replayed.

The current paca->soft_enabled logic is reversed, and the MASKABLE_EXCEPTION_*
macros are extended to support this feature.

The patches rewrite the current local_* functions to use
arch_local_irq_disable(). The base flow for each function is:

 {
local_irq_pmu_save(flags)
load
..
store
local_irq_pmu_restore(flags)
 }

The reason for this approach is that currently the l[w/d]arx/st[w/d]cx.
instruction pair is used for local_* operations, which is heavy
on cycle count and does not support a local variant. So to
see whether the new implementation helps, a modified
version of Rusty's benchmark code on local_t was used.

https://lkml.org/lkml/2008/12/16/450

Modifications to Rusty's benchmark code:
 - Executed only local_t test

Here are the values with the patch.

Time in ns per iteration

Local_t Without Patch   With Patch

_inc28  8
_add28  8
_read   3   3
_add_return 28  7

Currently only asm/local.h has been rewritten, and the entire change is
tested only on PPC64 (pseries guest) and a PPC64 LE host. ppc64e_* has
only been compile tested.

The first five are the cleanup patches which lay the foundation
to make things easier. The sixth patch in the patchset reverses the
current soft_enabled logic, and its commit message details the reason and
need for this change. The seventh and eighth patches refactor the
__EXCEPTION_PROLOG_1
code to support the addition of a new parameter to the MASKABLE_* macros.
The new parameter gives the possible mask for the interrupt. The rest of the
patches add support for maskable PMIs and the implementation of local_t
using local_irq_pmu_*().

Since the patchset is experimental, testing was done only on the pseries and
powernv platforms. The patchset has only been compile tested for Book3e.

Changelog RFC v5:
1)Implemented new set of soft_enabled manipulation functions
2)rewritten arch_local_irq_* functions to use the new soft_enabled_*()
3)Add WARN_ON to identify invalid soft_enabled transitions
4)Added powerpc_local_irq_pmu_save() and powerpc_local_irq_pmu_restore() to
  support masking of irqs (with PMI).
5)Added local_irq_pmu_*()s macros with trace_hardirqs_on|off() to match
  include/linux/irqflags.h

Changelog RFC v4:
1)Fix build breaks in in ppc64e_defconfig compilation
2)Merged PMI replay code with the exception vector changes patch
3)Renamed the new API to set PMI mask bit as suggested
4)Modified the current arch_local_save and the new API function call to
  "OR" and store the value to ->soft_enabled instead of just storing it.
5)Updated the check in arch_local_irq_restore() to always check for
  greater than or zero against the _LINUX mask bit.
6)Updated the commit messages.

Changelog RFC v3:
1)Squashed PMI masked interrupt patch and replay patch together
2)Have created a new patch which includes a new Kconfig and set_irq_set_mask()
3)Fixed the compilation issue with IRQ_DISABLE_MASK_* macros in book3e_*

Changelog RFC v2:
1)Renamed IRQ_DISABLE_LEVEL_* to IRQ_DISABLE_MASK_* and made logic changes
  to treat soft_enabled as a mask and not a flag or level.
2)Added a new Kconfig variable to support a WARN_ON
3)Refactored the patchset for easier review.
4)Made changes to commit messages.
5)Made changes for BOOK3E version

Changelog RFC v1:

1)Commit messages are improved.
2)Renamed the arch_local_irq_disable_var to soft_irq_set_level as suggested
3)Renamed the LAZY_INTERRUPT* macro to IRQ_DISABLE_LEVEL_* as suggested
4)Extended the MASKABLE_EXCEPTION* macros to support additional parameter.
5)Each MASKABLE_EXCEPTION_* macro will carry a "mask_level"
6)Logic to decide on jump to maskable_handler in SOFTEN_TEST is now based on
  "mask_level"
7)__EXCEPTION_PROLOG_1 is factored out to support "mask_level" parameter.
  This reduced the code 

[PATCH v4 9/9] ima: platform-independent hash value

2016-09-15 Thread Mimi Zohar
From: Andreas Steffen 

For remote attestation it is important for the IMA measurement values
to be platform-independent. Therefore integer fields to be hashed
must be converted to canonical format.
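
As a sketch of what the canonical conversion means for a 32-bit length
field, the helper below produces the little-endian byte image of a value
regardless of host byte order; it is a hypothetical user-space stand-in
for the kernel's cpu_to_le32() used in the diff below, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_le32(): build the little-endian byte
 * image of a 32-bit value.  On a little-endian host this is the
 * identity; on a big-endian host it byte-swaps, so hashing the
 * converted value yields identical digests on both. */
static uint32_t to_le32(uint32_t v)
{
	uint8_t b[4] = {
		v & 0xff, (v >> 8) & 0xff, (v >> 16) & 0xff, v >> 24
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}
```

Hashing &out (rather than the native value) is exactly what the
datalen_to_hash variable in the patch below achieves when
ima_canonical_fmt is set.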

Changelog:
- Define canonical format as little endian (Mimi)

Signed-off-by: Andreas Steffen 
Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_crypto.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/ima_crypto.c 
b/security/integrity/ima/ima_crypto.c
index 38f2ed8..802d5d2 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -477,11 +477,13 @@ static int ima_calc_field_array_hash_tfm(struct 
ima_field_data *field_data,
u8 buffer[IMA_EVENT_NAME_LEN_MAX + 1] = { 0 };
u8 *data_to_hash = field_data[i].data;
u32 datalen = field_data[i].len;
+   u32 datalen_to_hash =
+   !ima_canonical_fmt ? datalen : cpu_to_le32(datalen);
 
if (strcmp(td->name, IMA_TEMPLATE_IMA_NAME) != 0) {
rc = crypto_shash_update(shash,
-   (const u8 *) &field_data[i].len,
-   sizeof(field_data[i].len));
+   (const u8 *) &datalen_to_hash,
+   sizeof(datalen_to_hash));
if (rc)
break;
} else if (strcmp(td->fields[i]->field_id, "n") == 0) {
-- 
2.1.0



[PATCH v4 8/9] ima: define a canonical binary_runtime_measurements list format

2016-09-15 Thread Mimi Zohar
The IMA binary_runtime_measurements list is currently in platform native
format.

To allow restoring a measurement list carried across kexec with a
different endianness than the targeted kernel, this patch defines
little-endian as the canonical format.  For big endian systems wanting
to save/restore the measurement list from a system with a different
endianness, a new boot command line parameter named "ima_canonical_fmt"
is defined.

Considerations: use of the "ima_canonical_fmt" boot command line
option will break existing userspace applications on big endian systems
expecting the binary_runtime_measurements list to be in platform native
format.

Changelog v3:
- restore PCR value properly

Signed-off-by: Mimi Zohar 
---
 Documentation/kernel-parameters.txt   |  4 
 security/integrity/ima/ima.h  |  6 ++
 security/integrity/ima/ima_fs.c   | 28 +---
 security/integrity/ima/ima_kexec.c| 11 +--
 security/integrity/ima/ima_template.c | 24 ++--
 security/integrity/ima/ima_template_lib.c |  7 +--
 6 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 46c030a..5e8037fc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1580,6 +1580,10 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
The builtin appraise policy appraises all files
owned by uid=0.
 
+   ima_canonical_fmt [IMA]
+   Use the canonical format for the binary runtime
+   measurements, instead of host native format.
+
ima_hash=   [IMA]
Format: { md5 | sha1 | rmd160 | sha256 | sha384
   | sha512 | ... }
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e8303c9..eb0f4dd 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -112,6 +112,12 @@ struct ima_kexec_hdr {
u64 count;
 };
 
+/*
+ * The default binary_runtime_measurements list format is defined as the
+ * platform native format.  The canonical format is defined as little-endian.
+ */
+extern bool ima_canonical_fmt;
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index 66e5dd5..2bcad99 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -28,6 +28,16 @@
 
 static DEFINE_MUTEX(ima_write_mutex);
 
+bool ima_canonical_fmt;
+static int __init default_canonical_fmt_setup(char *str)
+{
+#ifdef __BIG_ENDIAN
+   ima_canonical_fmt = 1;
+#endif
+   return 1;
+}
+__setup("ima_canonical_fmt", default_canonical_fmt_setup);
+
 static int valid_policy = 1;
 #define TMPBUFLEN 12
 static ssize_t ima_show_htable_value(char __user *buf, size_t count,
@@ -122,7 +132,7 @@ int ima_measurements_show(struct seq_file *m, void *v)
struct ima_queue_entry *qe = v;
struct ima_template_entry *e;
char *template_name;
-   int namelen;
+   u32 pcr, namelen, template_data_len; /* temporary fields */
bool is_ima_template = false;
int i;
 
@@ -139,25 +149,29 @@ int ima_measurements_show(struct seq_file *m, void *v)
 * PCR used defaults to the same (config option) in
 * little-endian format, unless set in policy
 */
-   ima_putc(m, &e->pcr, sizeof(e->pcr));
+   pcr = !ima_canonical_fmt ? e->pcr : cpu_to_le32(e->pcr);
+   ima_putc(m, &pcr, sizeof(e->pcr));
 
/* 2nd: template digest */
ima_putc(m, e->digest, TPM_DIGEST_SIZE);
 
/* 3rd: template name size */
-   namelen = strlen(template_name);
+   namelen = !ima_canonical_fmt ? strlen(template_name) :
+   cpu_to_le32(strlen(template_name));
	ima_putc(m, &namelen, sizeof(namelen));
 
/* 4th:  template name */
-   ima_putc(m, template_name, namelen);
+   ima_putc(m, template_name, strlen(template_name));
 
/* 5th:  template length (except for 'ima' template) */
if (strcmp(template_name, IMA_TEMPLATE_IMA_NAME) == 0)
is_ima_template = true;
 
-   if (!is_ima_template)
-   ima_putc(m, &e->template_data_len,
-sizeof(e->template_data_len));
+   if (!is_ima_template) {
+   template_data_len = !ima_canonical_fmt ? e->template_data_len :
+   cpu_to_le32(e->template_data_len);
+   ima_putc(m, &template_data_len, sizeof(e->template_data_len));
+   }
 
/* 6th:  template specific data */
for (i = 0; i < e->template_desc->num_fields; i++) {
diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index 0abbc8d..878c062 100644
--- 

[PATCH v4 7/9] ima: support restoring multiple template formats

2016-09-15 Thread Mimi Zohar
The configured IMA measurement list template format can be replaced at
runtime on the boot command line, including a custom template format.
This patch adds support for restoring a measurement list containing
multiple builtin/custom template formats.

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_template.c | 53 +--
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index 804bb95..7b15baa 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -155,9 +155,14 @@ static int template_desc_init_fields(const char 
*template_fmt,
 {
const char *template_fmt_ptr;
struct ima_template_field *found_fields[IMA_TEMPLATE_NUM_FIELDS_MAX];
-   int template_num_fields = template_fmt_size(template_fmt);
+   int template_num_fields;
int i, len;
 
+   if (num_fields && *num_fields > 0) /* already initialized? */
+   return 0;
+
+   template_num_fields = template_fmt_size(template_fmt);
+
if (template_num_fields > IMA_TEMPLATE_NUM_FIELDS_MAX) {
pr_err("format string '%s' contains too many fields\n",
   template_fmt);
@@ -237,6 +242,35 @@ int __init ima_init_template(void)
return result;
 }
 
+static struct ima_template_desc *restore_template_fmt(char *template_name)
+{
+   struct ima_template_desc *template_desc = NULL;
+   int ret;
+
+   ret = template_desc_init_fields(template_name, NULL, NULL);
+   if (ret < 0) {
+   pr_err("attempting to initialize the template \"%s\" failed\n",
+   template_name);
+   goto out;
+   }
+
+   template_desc = kzalloc(sizeof(*template_desc), GFP_KERNEL);
+   if (!template_desc)
+   goto out;
+
+   template_desc->name = "";
+   template_desc->fmt = kstrdup(template_name, GFP_KERNEL);
+   if (!template_desc->fmt)
+   goto out;
+
+   spin_lock(&template_list);
+   list_add_tail_rcu(&template_desc->list, &defined_templates);
+   spin_unlock(&template_list);
+   synchronize_rcu();
+out:
+   return template_desc;
+}
+
 static int ima_restore_template_data(struct ima_template_desc *template_desc,
 void *template_data,
 int template_data_size,
@@ -367,10 +401,23 @@ int ima_restore_measurement_list(loff_t size, void *buf)
}
data_v1 = bufp += (u_int8_t)hdr_v1->template_name_len;
 
-   /* get template format */
template_desc = lookup_template_desc(template_name);
if (!template_desc) {
-   pr_err("template \"%s\" not found\n", template_name);
+   template_desc = restore_template_fmt(template_name);
+   if (!template_desc)
+   break;
+   }
+
+   /*
+* Only the running system's template format is initialized
+* on boot.  As needed, initialize the other template formats.
+*/
+   ret = template_desc_init_fields(template_desc->fmt,
+   &(template_desc->fields),
+   &(template_desc->num_fields));
+   if (ret < 0) {
+   pr_err("attempting to restore the template fmt \"%s\" \
+   failed\n", template_desc->fmt);
ret = -EINVAL;
break;
}
-- 
2.1.0



[PATCH v4 6/9] ima: store the builtin/custom template definitions in a list

2016-09-15 Thread Mimi Zohar
The builtin and single custom templates are currently stored in an
array.  In preparation for being able to restore a measurement list
containing multiple builtin/custom templates, this patch stores the
builtin and custom templates as a linked list.  This will permit
defining more than one custom template per boot.

Changelog v4:
- fix "spinlock bad magic" BUG - reported by Dmitry Vyukov

Changelog v3:
- initialize template format list in ima_template_desc_current(), as it
might be called during __setup before normal initialization. (kernel
test robot)
- remove __init annotation of ima_init_template_list()

Changelog v2:
- fix lookup_template_desc() preemption imbalance (kernel test robot)

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima.h  |  2 ++
 security/integrity/ima/ima_main.c |  1 +
 security/integrity/ima/ima_template.c | 52 +++
 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 634d140..e8303c9 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -81,6 +81,7 @@ struct ima_template_field {
 
 /* IMA template descriptor definition */
 struct ima_template_desc {
+   struct list_head list;
char *name;
char *fmt;
int num_fields;
@@ -136,6 +137,7 @@ int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_measurements_show(struct seq_file *m, void *v);
 unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
+void ima_init_template_list(void);
 
 #ifdef CONFIG_KEXEC_FILE
 void ima_load_kexec_buffer(void);
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 596ef61..592f318 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -418,6 +418,7 @@ static int __init init_ima(void)
 {
int error;
 
+   ima_init_template_list();
hash_setup(CONFIG_IMA_DEFAULT_HASH);
error = ima_init();
if (!error) {
diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index 7c90075..804bb95 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -15,16 +15,20 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include "ima.h"
 #include "ima_template_lib.h"
 
-static struct ima_template_desc defined_templates[] = {
+static struct ima_template_desc builtin_templates[] = {
{.name = IMA_TEMPLATE_IMA_NAME, .fmt = IMA_TEMPLATE_IMA_FMT},
{.name = "ima-ng", .fmt = "d-ng|n-ng"},
{.name = "ima-sig", .fmt = "d-ng|n-ng|sig"},
{.name = "", .fmt = ""},/* placeholder for a custom format */
 };
 
+static LIST_HEAD(defined_templates);
+static DEFINE_SPINLOCK(template_list);
+
 static struct ima_template_field supported_fields[] = {
{.field_id = "d", .field_init = ima_eventdigest_init,
 .field_show = ima_show_template_digest},
@@ -53,6 +57,8 @@ static int __init ima_template_setup(char *str)
if (ima_template)
return 1;
 
+   ima_init_template_list();
+
/*
 * Verify that a template with the supplied name exists.
 * If not, use CONFIG_IMA_DEFAULT_TEMPLATE.
@@ -81,7 +87,7 @@ __setup("ima_template=", ima_template_setup);
 
 static int __init ima_template_fmt_setup(char *str)
 {
-   int num_templates = ARRAY_SIZE(defined_templates);
+   int num_templates = ARRAY_SIZE(builtin_templates);
 
if (ima_template)
return 1;
@@ -92,22 +98,28 @@ static int __init ima_template_fmt_setup(char *str)
return 1;
}
 
-   defined_templates[num_templates - 1].fmt = str;
-   ima_template = defined_templates + num_templates - 1;
+   builtin_templates[num_templates - 1].fmt = str;
+   ima_template = builtin_templates + num_templates - 1;
+
return 1;
 }
 __setup("ima_template_fmt=", ima_template_fmt_setup);
 
 static struct ima_template_desc *lookup_template_desc(const char *name)
 {
-   int i;
+   struct ima_template_desc *template_desc;
+   int found = 0;
 
-   for (i = 0; i < ARRAY_SIZE(defined_templates); i++) {
-   if (strcmp(defined_templates[i].name, name) == 0)
-   return defined_templates + i;
+   rcu_read_lock();
+   list_for_each_entry_rcu(template_desc, &defined_templates, list) {
+   if ((strcmp(template_desc->name, name) == 0) ||
+   (strcmp(template_desc->fmt, name) == 0)) {
+   found = 1;
+   break;
+   }
}
-
-   return NULL;
+   rcu_read_unlock();
+   return found ? template_desc : NULL;
 }
 
 static struct ima_template_field *lookup_template_field(const char *field_id)
@@ -183,11 +195,29 @@ static int template_desc_init_fields(const char 
*template_fmt,
return 0;
 }
 

[PATCH v4 5/9] ima: on soft reboot, save the measurement list

2016-09-15 Thread Mimi Zohar
From: Thiago Jung Bauermann 

This patch uses the kexec buffer passing mechanism to pass the
serialized IMA binary_runtime_measurements to the next kernel.

Changelog v4:
- Revert the skip_checksum change.  Instead, calculate the checksum
with the measurement list segment; on update, validate the existing
checksum before re-calculating a new checksum with the updated
measurement list.

Changelog v3:
- Request a kexec segment for storing the measurement list a half page,
not a full page, more than needed for additional measurements.
- Added binary_runtime_size overflow test
- Limit maximum number of pages needed for kexec_segment_size to half
of totalram_pages. (Dave Young)

Changelog v2:
- Fix build issue by defining a stub ima_add_kexec_buffer and stub
  struct kimage when CONFIG_IMA=n and CONFIG_IMA_KEXEC=n. (Fenguang Wu)
- removed kexec_add_handover_buffer() checksum argument.
- added skip_checksum member to kexec_buf
- only register reboot notifier once

Changelog v1:
- updated to call IMA functions  (Mimi)
- move code from ima_template.c to ima_kexec.c (Mimi)

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Mimi Zohar 
---
 include/linux/ima.h| 12 +
 kernel/kexec_file.c|  4 ++
 security/integrity/ima/ima_kexec.c | 96 ++
 3 files changed, 112 insertions(+)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 0eb7c2e..7f6952f 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -11,6 +11,7 @@
 #define _LINUX_IMA_H
 
 #include 
+#include 
 struct linux_binprm;
 
 #ifdef CONFIG_IMA
@@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void *buf, 
loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
 
+#ifdef CONFIG_IMA_KEXEC
+extern void ima_add_kexec_buffer(struct kimage *image);
+#endif
+
 #else
 static inline int ima_bprm_check(struct linux_binprm *bprm)
 {
@@ -62,6 +67,13 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
 
 #endif /* CONFIG_IMA */
 
+#ifndef CONFIG_IMA_KEXEC
+struct kimage;
+
+static inline void ima_add_kexec_buffer(struct kimage *image)
+{}
+#endif
+
 #ifdef CONFIG_IMA_APPRAISE
 extern void ima_inode_post_setattr(struct dentry *dentry);
 extern int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 3acd386..ccdcbfd 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -204,6 +205,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
return ret;
image->kernel_buf_len = size;
 
+   /* IMA needs to pass the measurement list to the next kernel. */
+   ima_add_kexec_buffer(image);
+
/* Call arch image probe handlers */
ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
image->kernel_buf_len);
diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index e77ca9d..0abbc8d 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -23,6 +23,11 @@
 
 #include "ima.h"
 
+#ifdef CONFIG_IMA_KEXEC
+/* Physical address of the measurement buffer in the next kernel. */
+static unsigned long kexec_buffer_load_addr;
+static size_t kexec_segment_size;
+
 static int ima_dump_measurement_list(unsigned long *buffer_size, void **buffer,
 unsigned long segment_size)
 {
@@ -75,6 +80,97 @@ out:
 }
 
 /*
+ * Called during kexec execute so that IMA can save the measurement list.
+ */
+static int ima_update_kexec_buffer(struct notifier_block *self,
+  unsigned long action, void *data)
+{
+   void *kexec_buffer = NULL;
+   size_t kexec_buffer_size;
+   int ret;
+
+   if (!kexec_in_progress)
+   return NOTIFY_OK;
+
+   kexec_buffer_size = ima_get_binary_runtime_size();
+   if (kexec_buffer_size > kexec_segment_size) {
+   pr_err("Binary measurement list grew too large.\n");
+   goto out;
+   }
+
+   ima_dump_measurement_list(&kexec_buffer_size, &kexec_buffer,
+ kexec_segment_size);
+   if (!kexec_buffer) {
+   pr_err("Not enough memory for the kexec measurement buffer.\n");
+   goto out;
+   }
+   ret = kexec_update_segment(kexec_buffer, kexec_buffer_size,
+  kexec_buffer_load_addr, kexec_segment_size);
+   if (ret)
+   pr_err("Error updating kexec buffer: %d\n", ret);
+out:
+   return NOTIFY_OK;
+}
+
+struct notifier_block update_buffer_nb = {
+   .notifier_call = ima_update_kexec_buffer,
+};
+
+/*
+ * Called during 

[PATCH v4 4/9] ima: serialize the binary_runtime_measurements

2016-09-15 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and restored on boot.  This patch
serializes the IMA measurement list in the binary_runtime_measurements
format.
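
For illustration only, here is a minimal user-space sketch of serializing
one record in the binary_runtime_measurements field order described
below (pcr, template digest, name length, name, template data length,
data). serialize_entry() is a hypothetical helper, not the kernel's
ima_measurements_show(), and the canonical-format endianness handling
from the related patch is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_SIZE 20	/* TPM_DIGEST_SIZE for a SHA-1 PCR bank */

/* Write one measurement record into buf in the field order:
 * pcr | template digest | name length | name | data length | data.
 * Returns the number of bytes written.  Illustrative sketch only. */
static size_t serialize_entry(uint8_t *buf, uint32_t pcr,
			      const uint8_t digest[DIGEST_SIZE],
			      const char *name,
			      const uint8_t *data, uint32_t datalen)
{
	uint8_t *p = buf;
	uint32_t namelen = (uint32_t)strlen(name);

	memcpy(p, &pcr, sizeof(pcr));		p += sizeof(pcr);
	memcpy(p, digest, DIGEST_SIZE);		p += DIGEST_SIZE;
	memcpy(p, &namelen, sizeof(namelen));	p += sizeof(namelen);
	memcpy(p, name, namelen);		p += namelen;
	memcpy(p, &datalen, sizeof(datalen));	p += sizeof(datalen);
	memcpy(p, data, datalen);		p += datalen;
	return (size_t)(p - buf);
}
```

Summing such per-entry sizes is also how the earlier patch in this
series maintains binary_runtime_size as entries are added.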

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima.h   |  1 +
 security/integrity/ima/ima_fs.c|  2 +-
 security/integrity/ima/ima_kexec.c | 51 ++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index f9cd08e..634d140 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -133,6 +133,7 @@ void ima_print_digest(struct seq_file *m, u8 *digest, u32 
size);
 struct ima_template_desc *ima_template_desc_current(void);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
+int ima_measurements_show(struct seq_file *m, void *v);
 unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
 
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index c07a384..66e5dd5 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -116,7 +116,7 @@ void ima_putc(struct seq_file *m, void *data, int datalen)
  *   [eventdata length]
  *   eventdata[n]=template specific data
  */
-static int ima_measurements_show(struct seq_file *m, void *v)
+int ima_measurements_show(struct seq_file *m, void *v)
 {
/* the list never shrinks, so we don't need a lock here */
struct ima_queue_entry *qe = v;
diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index 6a046ad..e77ca9d 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -23,6 +23,57 @@
 
 #include "ima.h"
 
+static int ima_dump_measurement_list(unsigned long *buffer_size, void **buffer,
+unsigned long segment_size)
+{
+   struct ima_queue_entry *qe;
+   struct seq_file file;
+   struct ima_kexec_hdr khdr = {
+   .version = 1, .buffer_size = 0, .count = 0};
+   int ret = 0;
+
+   /* segment size can't change between kexec load and execute */
+   file.buf = vmalloc(segment_size);
+   if (!file.buf) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   file.size = segment_size;
+   file.read_pos = 0;
+   file.count = sizeof(khdr);  /* reserved space */
+
+   list_for_each_entry_rcu(qe, &ima_measurements, later) {
+   if (file.count < file.size) {
+   khdr.count++;
+   ima_measurements_show(&file, qe);
+   } else {
+   ret = -EINVAL;
+   break;
+   }
+   }
+
+   if (ret < 0)
+   goto out;
+
+   /*
+* fill in reserved space with some buffer details
+* (eg. version, buffer size, number of measurements)
+*/
+   khdr.buffer_size = file.count;
+   memcpy(file.buf, &khdr, sizeof(khdr));
+   print_hex_dump(KERN_DEBUG, "ima dump: ", DUMP_PREFIX_NONE,
+   16, 1, file.buf,
+   file.count < 100 ? file.count : 100, true);
+
+   *buffer_size = file.count;
+   *buffer = file.buf;
+out:
+   if (ret == -EINVAL)
+   vfree(file.buf);
+   return ret;
+}
+
 /*
  * Restore the measurement list from the previous kernel.
  */
-- 
2.1.0



[PATCH v4 3/9] ima: maintain memory size needed for serializing the measurement list

2016-09-15 Thread Mimi Zohar
In preparation for serializing the binary_runtime_measurements, this patch
maintains the amount of memory required.

Changelog v3:
- include the ima_kexec_hdr size in the binary_runtime_measurement size.

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/Kconfig | 12 +
 security/integrity/ima/ima.h   |  1 +
 security/integrity/ima/ima_queue.c | 53 --
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 5487827..1c5a1c2 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -27,6 +27,18 @@ config IMA
  to learn more about IMA.
  If unsure, say N.
 
+config IMA_KEXEC
+   bool "Enable carrying the IMA measurement list across a soft boot"
+   depends on IMA && TCG_TPM && KEXEC_FILE
+   default n
+   help
+  TPM PCRs are only reset on a hard reboot.  In order to validate
+  a TPM's quote after a soft boot, the IMA measurement list of the
+  running kernel must be saved and restored on boot.
+
+  Depending on the IMA policy, the measurement list can grow to
+  be very large.
+
 config IMA_MEASURE_PCR_IDX
int
depends on IMA
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e7b3755..f9cd08e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -133,6 +133,7 @@ void ima_print_digest(struct seq_file *m, u8 *digest, u32 
size);
 struct ima_template_desc *ima_template_desc_current(void);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
+unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
 
 #ifdef CONFIG_KEXEC_FILE
diff --git a/security/integrity/ima/ima_queue.c 
b/security/integrity/ima/ima_queue.c
index 12d1b04..3a3cc2a 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c
@@ -29,6 +29,11 @@
 #define AUDIT_CAUSE_LEN_MAX 32
 
 LIST_HEAD(ima_measurements);   /* list of all measurements */
+#ifdef CONFIG_IMA_KEXEC
+static unsigned long binary_runtime_size;
+#else
+static unsigned long binary_runtime_size = ULONG_MAX;
+#endif
 
 /* key: inode (before secure-hashing a file) */
 struct ima_h_table ima_htable = {
@@ -64,6 +69,24 @@ static struct ima_queue_entry *ima_lookup_digest_entry(u8 
*digest_value,
return ret;
 }
 
+/*
+ * Calculate the memory required for serializing a single
+ * binary_runtime_measurement list entry, which contains a
+ * couple of variable length fields (e.g template name and data).
+ */
+static int get_binary_runtime_size(struct ima_template_entry *entry)
+{
+   int size = 0;
+
+   size += sizeof(u32);/* pcr */
+   size += sizeof(entry->digest);
+   size += sizeof(int);/* template name size field */
+   size += strlen(entry->template_desc->name);
+   size += sizeof(entry->template_data_len);
+   size += entry->template_data_len;
+   return size;
+}
+
 /* ima_add_template_entry helper function:
  * - Add template entry to the measurement list and hash table, for
  *   all entries except those carried across kexec.
@@ -90,9 +113,30 @@ static int ima_add_digest_entry(struct ima_template_entry 
*entry, int flags)
key = ima_hash_key(entry->digest);
	hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
}
+
+   if (binary_runtime_size != ULONG_MAX) {
+   int size;
+
+   size = get_binary_runtime_size(entry);
+   binary_runtime_size = (binary_runtime_size < ULONG_MAX - size) ?
+binary_runtime_size + size : ULONG_MAX;
+   }
return 0;
 }
 
+/*
+ * Return the amount of memory required for serializing the
+ * entire binary_runtime_measurement list, including the ima_kexec_hdr
+ * structure.
+ */
+unsigned long ima_get_binary_runtime_size(void)
+{
+   if (binary_runtime_size >= (ULONG_MAX - sizeof(struct ima_kexec_hdr)))
+   return ULONG_MAX;
+   else
+   return binary_runtime_size + sizeof(struct ima_kexec_hdr);
+};
+
 static int ima_pcr_extend(const u8 *hash, int pcr)
 {
int result = 0;
@@ -106,8 +150,13 @@ static int ima_pcr_extend(const u8 *hash, int pcr)
return result;
 }
 
-/* Add template entry to the measurement list and hash table,
- * and extend the pcr.
+/*
+ * Add template entry to the measurement list and hash table, and
+ * extend the pcr.
+ *
+ * On systems which support carrying the IMA measurement list across
+ * kexec, maintain the total memory size required for serializing the
+ * binary_runtime_measurements.
  */
 int ima_add_template_entry(struct ima_template_entry *entry, int violation,
   const char *op, struct inode *inode,
-- 
2.1.0



[PATCH v4 2/9] ima: permit duplicate measurement list entries

2016-09-15 Thread Mimi Zohar
Measurements carried across kexec need to be added to the IMA
measurement list, but should not prevent measurements of the newly
booted kernel from being added to the measurement list. This patch
adds support for allowing duplicate measurements.

The "boot_aggregate" measurement entry is the delimiter between soft
boots.

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_queue.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/security/integrity/ima/ima_queue.c 
b/security/integrity/ima/ima_queue.c
index 4b1bb77..12d1b04 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c
@@ -65,11 +65,12 @@ static struct ima_queue_entry *ima_lookup_digest_entry(u8 
*digest_value,
 }
 
 /* ima_add_template_entry helper function:
- * - Add template entry to measurement list and hash table.
+ * - Add template entry to the measurement list and hash table, for
+ *   all entries except those carried across kexec.
  *
  * (Called with ima_extend_list_mutex held.)
  */
-static int ima_add_digest_entry(struct ima_template_entry *entry)
+static int ima_add_digest_entry(struct ima_template_entry *entry, int flags)
 {
struct ima_queue_entry *qe;
unsigned int key;
@@ -85,8 +86,10 @@ static int ima_add_digest_entry(struct ima_template_entry 
*entry)
	list_add_tail_rcu(&qe->later, &ima_measurements);
 
	atomic_long_inc(&ima_htable.len);
-   key = ima_hash_key(entry->digest);
-   hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
+   if (flags) {
+   key = ima_hash_key(entry->digest);
+   hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
+   }
return 0;
 }
 
@@ -126,7 +129,7 @@ int ima_add_template_entry(struct ima_template_entry 
*entry, int violation,
}
}
 
-   result = ima_add_digest_entry(entry);
+   result = ima_add_digest_entry(entry, 1);
if (result < 0) {
audit_cause = "ENOMEM";
audit_info = 0;
@@ -155,7 +158,7 @@ int ima_restore_measurement_entry(struct ima_template_entry 
*entry)
int result = 0;
 
	mutex_lock(&ima_extend_list_mutex);
-   result = ima_add_digest_entry(entry);
+   result = ima_add_digest_entry(entry, 0);
	mutex_unlock(&ima_extend_list_mutex);
return result;
 }
-- 
2.1.0



[PATCH v4 1/9] ima: on soft reboot, restore the measurement list

2016-09-15 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and restored on boot.  This patch
restores the measurement list.

Changelog v2:
- redefined ima_kexec_hdr to use types with well defined sizes (M. Ellerman)
- defined missing ima_load_kexec_buffer() stub function

Changelog v1:
- call ima_load_kexec_buffer() (Thiago)

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/Makefile   |   1 +
 security/integrity/ima/ima.h  |  18 
 security/integrity/ima/ima_init.c |   2 +
 security/integrity/ima/ima_kexec.c|  55 +++
 security/integrity/ima/ima_queue.c|  10 ++
 security/integrity/ima/ima_template.c | 170 ++
 6 files changed, 256 insertions(+)
 create mode 100644 security/integrity/ima/ima_kexec.c

diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index 9aeaeda..56093be 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -8,4 +8,5 @@ obj-$(CONFIG_IMA) += ima.o
 ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
 ima_policy.o ima_template.o ima_template_lib.o
 ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
+ima-$(CONFIG_KEXEC_FILE) += ima_kexec.o
 obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index db25f54..e7b3755 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -102,6 +102,15 @@ struct ima_queue_entry {
 };
 extern struct list_head ima_measurements;  /* list of all measurements */
 
+/* Some details preceding the binary serialized measurement list */
+struct ima_kexec_hdr {
+   u16 version;
+   u16 _reserved0;
+   u32 _reserved1;
+   u64 buffer_size;
+   u64 count;
+};
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
@@ -122,8 +131,17 @@ int ima_init_crypto(void);
 void ima_putc(struct seq_file *m, void *data, int datalen);
 void ima_print_digest(struct seq_file *m, u8 *digest, u32 size);
 struct ima_template_desc *ima_template_desc_current(void);
+int ima_restore_measurement_entry(struct ima_template_entry *entry);
+int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_init_template(void);
 
+#ifdef CONFIG_KEXEC_FILE
+void ima_load_kexec_buffer(void);
+#else
+static inline void ima_load_kexec_buffer(void)
+{}
+#endif
+
 /*
  * used to protect h_table and sha_table
  */
diff --git a/security/integrity/ima/ima_init.c 
b/security/integrity/ima/ima_init.c
index 32912bd..3ba0ca4 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -128,6 +128,8 @@ int __init ima_init(void)
if (rc != 0)
return rc;
 
+   ima_load_kexec_buffer();
+
rc = ima_add_boot_aggregate();  /* boot aggregate must be first entry */
if (rc != 0)
return rc;
diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
new file mode 100644
index 000..6a046ad
--- /dev/null
+++ b/security/integrity/ima/ima_kexec.c
@@ -0,0 +1,55 @@
+/*
+ * Copyright (C) 2016 IBM Corporation
+ *
+ * Authors:
+ * Thiago Jung Bauermann 
+ * Mimi Zohar 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ima.h"
+
+/*
+ * Restore the measurement list from the previous kernel.
+ */
+void ima_load_kexec_buffer(void)
+{
+   void *kexec_buffer = NULL;
+   size_t kexec_buffer_size = 0;
+   int rc;
+
+   rc = kexec_get_handover_buffer(&kexec_buffer, &kexec_buffer_size);
+   switch (rc) {
+   case 0:
+   rc = ima_restore_measurement_list(kexec_buffer_size,
+ kexec_buffer);
+   if (rc != 0)
+   pr_err("Failed to restore the measurement list: %d\n",
+   rc);
+
+   kexec_free_handover_buffer();
+   break;
+   case -ENOTSUPP:
+   pr_debug("Restoring the measurement list not supported\n");
+   break;
+   case -ENOENT:
+   pr_debug("No measurement list to restore\n");
+   break;
+   default:
+   pr_debug("Error restoring the measurement list: %d\n", rc);
+   }
+}
diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
index 32f6ac0..4b1bb77 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c

[PATCH v4 0/9] ima: carry the measurement list across kexec

2016-09-15 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and then restored on the subsequent
boot, possibly of a different architecture.

The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list. This patch
set serializes the measurement list in this format and restores it.
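To make the serialized shape concrete, the following is a minimal C sketch of a reader for one record of the v1 binary measurement list. The field order here (u32 PCR index, 20-byte SHA1 template digest, u32 template-name length plus name, u32 template-data length plus data) is an assumption based on the documented v1 format, not something taken from this patch set, and native (or already converted) endianness is assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One parsed record; pointers reference into the caller's buffer. */
struct ima_record {
    uint32_t pcr;
    uint8_t digest[20];
    uint32_t name_len;
    const char *name;
    uint32_t data_len;
    const uint8_t *data;
};

/* Parse one record from buf; returns bytes consumed, or 0 if the
 * buffer is truncated.  Illustration only, not kernel code. */
static size_t parse_record(const uint8_t *buf, size_t len,
                           struct ima_record *r)
{
    size_t off = 0;

    if (len < 4 + 20 + 4)
        return 0;
    memcpy(&r->pcr, buf + off, 4); off += 4;
    memcpy(r->digest, buf + off, 20); off += 20;
    memcpy(&r->name_len, buf + off, 4); off += 4;
    if (len < off + r->name_len + 4)
        return 0;
    r->name = (const char *)(buf + off); off += r->name_len;
    memcpy(&r->data_len, buf + off, 4); off += 4;
    if (len < off + r->data_len)
        return 0;
    r->data = buf + off; off += r->data_len;
    return off;
}
```

A restore pass would simply loop this over the hand-over buffer until it is exhausted, which is roughly what ima_restore_measurement_list() has to do.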

Up to now, the binary_runtime_measurements was defined as architecture
native format.  The assumption being that userspace could and would
handle any architecture conversions.  With the ability of carrying the
measurement list across kexec, possibly from one architecture to a
different one, the per boot architecture information is lost and with it
the ability of recalculating the template digest hash.  To resolve this
problem, without breaking the existing ABI, this patch set introduces
the boot command line option "ima_canonical_fmt", which is arbitrarily
defined as little endian.

The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg. TPM 2.0
support for larger digests).
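As a sketch of what "canonical, arbitrarily defined as little endian" means for the hand-over header, here is the ima_kexec_hdr from this patch set (u16 version, u16 + u32 reserved, u64 buffer_size, u64 count) emitted byte-by-byte in little-endian order, independent of host endianness. The kernel side would presumably use the cpu_to_le*() helpers instead; this is illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v into out[0..bytes-1], least-significant byte first. */
static void put_le(uint8_t *out, uint64_t v, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Pack the 24-byte ima_kexec_hdr in canonical (little-endian) form. */
static void pack_ima_kexec_hdr(uint8_t out[24], uint16_t version,
                               uint64_t buffer_size, uint64_t count)
{
    put_le(out + 0, version, 2);
    put_le(out + 2, 0, 2);           /* _reserved0 */
    put_le(out + 4, 0, 4);           /* _reserved1 */
    put_le(out + 8, buffer_size, 8);
    put_le(out + 16, count, 8);
}
```

A big-endian ppc64 kernel and a little-endian consumer then agree on the layout without any per-boot architecture information.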

This patch set pre-req's Thiago Bauermann's "kexec_file: Add buffer
hand-over for the next kernel" patch set. 

These patches can also be found in the next-kexec-restore branch of:
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git

Changelog v4:
- Fixed "spinlock bad magic" BUG - reported by Dmitry Vyukov
- Rebased on Thiago Bauermann's v5 patch set
- Removed the skip_checksum initialization  

Changelog v3:
- Cleaned up the code for calculating the requested kexec segment size
needed for the IMA measurement list, limiting the segment size to half
of the totalram_pages.
- Fixed kernel test robot reports as enumerated in the respective
patch changelog.

Changelog v2:
- Canonical measurement list support added
- Redefined the ima_kexec_hdr struct to use well defined sizes

Mimi

Andreas Steffen (1):
  ima: platform-independent hash value

Mimi Zohar (7):
  ima: on soft reboot, restore the measurement list
  ima: permit duplicate measurement list entries
  ima: maintain memory size needed for serializing the measurement list
  ima: serialize the binary_runtime_measurements
  ima: store the builtin/custom template definitions in a list
  ima: support restoring multiple template formats
  ima: define a canonical binary_runtime_measurements list format

Thiago Jung Bauermann (1):
  ima: on soft reboot, save the measurement list

 Documentation/kernel-parameters.txt   |   4 +
 include/linux/ima.h   |  12 ++
 kernel/kexec_file.c   |   4 +
 security/integrity/ima/Kconfig|  12 ++
 security/integrity/ima/Makefile   |   1 +
 security/integrity/ima/ima.h  |  28 +++
 security/integrity/ima/ima_crypto.c   |   6 +-
 security/integrity/ima/ima_fs.c   |  30 ++-
 security/integrity/ima/ima_init.c |   2 +
 security/integrity/ima/ima_kexec.c| 209 +
 security/integrity/ima/ima_main.c |   1 +
 security/integrity/ima/ima_queue.c|  76 +++-
 security/integrity/ima/ima_template.c | 293 --
 security/integrity/ima/ima_template_lib.c |   7 +-
 14 files changed, 653 insertions(+), 32 deletions(-)
 create mode 100644 security/integrity/ima/ima_kexec.c

-- 
2.1.0



Re: Linux 4.8: Reported regressions as of Sunday, 2016-08-28

2016-09-15 Thread Pavel Machek
Hi!

> Hi! Here is my second regression report for Linux 4.8. It lists 11
> regressions. 5 of them are new; 5 mentioned in the last report two 
> weeks ago got fixed.
> 
> FWIW: A small detail: I did not include "Regression - SATA disks behind 
> USB ones on v4.8-rc1, breaking boot. [Re: Who reordered my disks]" 
> (http://www.spinics.net/lists/linux-usb/msg144871.html ) in below list 
> report. The discussion mentions that device names like /dev/sd? are not 
> considered stable as they might change depending on various factors -- 
> like the order in which modules are loaded or other timing issues (like 
> in this case). That is how it is afaik (even if it's not well known), 
> and that's why I didn't include the issue; let me know if you think it 
> should be on the list.
> 
> OTOH I included "Commit cb4f71c429 deliberately changes order of 
> network interfaces" (http://www.spinics.net/lists/kernel/msg2325600.html )
> for now, as I think traditional network interface names (eth0, eth1, ...)
> might be considered stable -- but I'm not sure, that's why I raise it
> here.
> 
> Anyway, you know the drill: Are you aware of any other regressions?
> Then please let me know. And tell me if there is anything in the
> report that shouldn't be there.
> 
> Ciao, Thorsten
> 
> P.S.: Thanks to Aaro Koskinen, Hans de Goede, and Pavel Machek for 
> CCing me when reporting regressions. Much appreciated! Ohh, and thx 
> to all those that replied when I asked them for status updates
> when things look stuck.

Hmm, and there's one more apparently. See

Date: Tue, 13 Sep 2016 22:38:45 +0200
From: Martin Steigerwald 
To: Pavel Machek 
Cc: kernel list , daniel.vet...@intel.com,
jani.nik...@linux.intel.com, intel-...@lists.freedesktop.org,
dri-de...@lists.freedesktop.org, "Rafael J. Wysocki"

Subject: Re: 4.8-rc1: it is now common that machine needs re-run of xrandr
after resume
User-Agent: KMail/5.2.3 (Linux/4.8.0-rc6-tp520-btrfstrim+; KDE/5.25.0; x86_64;  
; )

I'm glad I'm not the only one seeing it, but I don't have idea how to
actually debug it.

Thanks and best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[PATCH][RFC] powerpc/64: exception exit do not reload SRR[01] if unchanged

2016-09-15 Thread Nicholas Piggin
This is not ready yet, but it's a proof of concept for the approach to
speed up exception exit: avoid the mtspr instructions if they are not
required. This saves 20 cycles per call on a getppid syscall
microbenchmark, so it seems worth looking into.

A few issues to be solved.

Firstly, realmode exceptions use an rfid to switch on relocation and
branch to common handler. This trashes SRR[01], so realmode exceptions
will always miss and be pessimised with this patch. We can avoid that
by doing a bctr to 0xc000... from realmode to enter the handler, and
then use mtmsrd to switch on relocation. The ISA actually suggests
this might be faster in some implementations, and on POWER8 it does
seem to be faster by about 6 cycles.

Secondly, avoiding the mfsprs would be nice if possible, and should
give a couple more cycles. We could use a byte in the paca to track
whether the SPRs are valid for the current exception. Anything
modifying SPRs including nested exceptions would clear the bit when
they're done. This is a bit more intrusive.

Finally, we should gather some statistics for success vs failure.
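The core idea of the patch, modelled in plain C for illustration (the SPR globals and the write counter are stand-ins invented here, not kernel code): skip the slow SPR write whenever the saved value already matches what the register holds.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t srr0, srr1;          /* stand-ins for the SPRs      */
static unsigned long mtspr_writes;   /* how many mtspr were issued  */

/* The cmpld/beq/mtspr pattern from the patch: only issue the
 * (expensive) write when the value actually changed. */
static void mtspr_if_changed(uint64_t *spr, uint64_t val)
{
    if (*spr != val) {
        *spr = val;
        mtspr_writes++;
    }
}
```

On an exception return to the same NIP/MSR both writes are skipped; a syscall returning to a new address rewrites only SRR0, which is where the measured 20-cycle saving would come from.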
---
 arch/powerpc/kernel/entry_64.S | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 585b9ca..c836967 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -250,12 +250,21 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
ld  r13,GPR13(r1)   /* only restore r13 if returning to usermode */
-1: ld  r2,GPR2(r1)
+1:
+   mfspr   r11,SPRN_SRR0
+   mfspr   r12,SPRN_SRR1
+   cmpld   r7,r11
+   beq 5f
+   mtspr   SPRN_SRR0,r7
+5:
+   cmpld   r8,r12
+   beq 6f
+   mtspr   SPRN_SRR1,r8
+6:
+   ld  r2,GPR2(r1)
ld  r1,GPR1(r1)
mtlrr4
mtcrr5
-   mtspr   SPRN_SRR0,r7
-   mtspr   SPRN_SRR1,r8
RFI
b   .   /* prevent speculative execution */
 
@@ -859,12 +868,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ACCOUNT_CPU_USER_EXIT(r13, r2, r4)
REST_GPR(13, r1)
 1:
+   mfspr   r0,SPRN_SRR0
+   mfspr   r2,SPRN_SRR1
+   ld  r4,_NIP(r1)
+
+   cmpld   r0,r4
+   beq 5f
+   mtspr   SPRN_SRR0,r4
+5:
+   cmpld   r2,r3
+   beq 6f
mtspr   SPRN_SRR1,r3
+6:
 
ld  r2,_CCR(r1)
mtcrf   0xFF,r2
-   ld  r2,_NIP(r1)
-   mtspr   SPRN_SRR0,r2
 
ld  r0,GPR0(r1)
ld  r2,GPR2(r1)
-- 
2.9.3



[PATCH] powerpc/64s: exception optimise MSR handling

2016-09-15 Thread Nicholas Piggin
mtmsrd with L=1 only affects MSR_EE and MSR_RI bits, and we always
know what state those bits are, so the kernel MSR does not need to be
loaded when modifying them.

mtmsrd is often in the critical execution path, so avoiding a dependency
on even an L1 load is noticeable. On a POWER8 this saves about 3 cycles
from the syscall path, and possibly a few from other exception returns
(not measured).
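For reference, the L=1 semantics the patch relies on can be modelled in C as follows. The bit values are the 64-bit MSR positions (EE = 0x8000, RI = 0x2); the function itself is an illustrative model of the ISA behaviour, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define MSR_EE 0x8000ULL  /* external interrupt enable */
#define MSR_RI 0x0002ULL  /* recoverable interrupt     */

/* Model of mtmsrd rS,1: with L=1 only the EE and RI bits are taken
 * from the source register and every other MSR bit is preserved.
 * This is why `li r11,MSR_RI; ori r11,r11,MSR_EE` suffices and the
 * full kernel MSR need not be loaded from the PACA. */
static uint64_t mtmsrd_l1(uint64_t msr, uint64_t rs)
{
    return (msr & ~(MSR_EE | MSR_RI)) | (rs & (MSR_EE | MSR_RI));
}
```

Since the kernel always knows the intended EE/RI state at each of these sites, the immediate form is exact, and the `ld rX,PACAKMSR(r13)` it replaces was pure overhead.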

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6b8bc0d..585b9ca 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -139,7 +139,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #ifdef CONFIG_PPC_BOOK3E
wrteei  1
 #else
-   ld  r11,PACAKMSR(r13)
+   li  r11,MSR_RI
ori r11,r11,MSR_EE
mtmsrd  r11,1
 #endif /* CONFIG_PPC_BOOK3E */
@@ -195,7 +195,6 @@ system_call:   /* label this so stack traces look sane */
 #ifdef CONFIG_PPC_BOOK3E
wrteei  0
 #else
-   ld  r10,PACAKMSR(r13)
/*
 * For performance reasons we clear RI the same time that we
 * clear EE. We only need to clear RI just before we restore r13
@@ -203,8 +202,7 @@ system_call:   /* label this so stack traces look sane */
 * We have to be careful to restore RI if we branch anywhere from
 * here (eg syscall_exit_work).
 */
-   li  r9,MSR_RI
-   andcr11,r10,r9
+   li  r11,0
mtmsrd  r11,1
 #endif /* CONFIG_PPC_BOOK3E */
 
@@ -221,13 +219,12 @@ system_call:   /* label this so stack traces look sane */
 #endif
 2: addir3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_BOOK3S
+   li  r10,MSR_RI
mtmsrd  r10,1   /* Restore RI */
 #endif
bl  restore_math
 #ifdef CONFIG_PPC_BOOK3S
-   ld  r10,PACAKMSR(r13)
-   li  r9,MSR_RI
-   andcr11,r10,r9 /* Re-clear RI */
+   li  r11,0
mtmsrd  r11,1
 #endif
ld  r8,_MSR(r1)
@@ -308,6 +305,7 @@ syscall_enosys:

 syscall_exit_work:
 #ifdef CONFIG_PPC_BOOK3S
+   li  r10,MSR_RI
mtmsrd  r10,1   /* Restore RI */
 #endif
/* If TIF_RESTOREALL is set, don't scribble on either r3 or ccr.
@@ -354,7 +352,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 #ifdef CONFIG_PPC_BOOK3E
wrteei  1
 #else
-   ld  r10,PACAKMSR(r13)
+   li  r10,MSR_RI
ori r10,r10,MSR_EE
mtmsrd  r10,1
 #endif /* CONFIG_PPC_BOOK3E */
@@ -619,7 +617,7 @@ _GLOBAL(ret_from_except_lite)
 #ifdef CONFIG_PPC_BOOK3E
wrteei  0
 #else
-   ld  r10,PACAKMSR(r13) /* Get kernel MSR without EE */
+   li  r10,MSR_RI
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
@@ -751,7 +749,7 @@ resume_kernel:
 #ifdef CONFIG_PPC_BOOK3E
wrteei  0
 #else
-   ld  r10,PACAKMSR(r13) /* Get kernel MSR without EE */
+   li  r10,MSR_RI
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 #endif /* CONFIG_PREEMPT */
@@ -841,8 +839,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 * userspace and we take an exception after restoring r13,
 * we end up corrupting the userspace r13 value.
 */
-   ld  r4,PACAKMSR(r13) /* Get kernel MSR without EE */
-   andcr4,r4,r0 /* r0 contains MSR_RI here */
+   li  r4,0
mtmsrd  r4,1
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-- 
2.9.3



[PATCH] powerpc/64s: optimise syscall entry for virtual, relocatable case

2016-09-15 Thread Nicholas Piggin
The mflr r10 instruction was left over from when the code used lr to
branch to system_call_entry from the exception handler. That was
changed by 6a404806d to use the count register. The value is never used
now, so mflr can be removed, and r10 can be used for storage rather than
spilling to the SPR scratch register.

The scratch register spill causes a long pipeline stall due to the SPR
read after write. This change brings getppid syscall cost from 406 to
376 cycles on POWER8. getppid for non-relocatable case is 371 cycles.

Signed-off-by: Nicholas Piggin 
---

 arch/powerpc/kernel/exceptions-64s.S | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index df6d45e..2cdd64f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -63,15 +63,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)   \
 * is volatile across system calls.
 */
 #define SYSCALL_PSERIES_2_DIRECT   \
-   mflrr10 ;   \
ld  r12,PACAKBASE(r13) ;\
LOAD_HANDLER(r12, system_call_entry) ;  \
mtctr   r12 ;   \
mfspr   r12,SPRN_SRR1 ; \
-   /* Re-use of r13... No spare regs to do this */ \
-   li  r13,MSR_RI ;\
-   mtmsrd  r13,1 ; \
-   GET_PACA(r13) ; /* get r13 back */  \
+   li  r10,MSR_RI ;\
+   mtmsrd  r10,1 ; \
bctr ;
 #else
/* We can branch directly */
-- 
2.9.3



[PATCH] powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL

2016-09-15 Thread Michael Ellerman
In commit f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE
invalidations"), we added logic to fallback to OPAL for doing TCE
invalidations if we can't do it in Linux.

Ben sent a v2 of the patch, containing these additional call sites, but
I had already applied v1 and didn't notice. So fix them now.

Fixes: f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations")
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index da5da11a6223..bc0c91e84ca0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2217,7 +2217,7 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
 
pnv_pci_link_table_and_group(phb->hose->node, num,
tbl, &pe->table_group);
-   pnv_pci_phb3_tce_invalidate_pe(pe);
+   pnv_pci_ioda2_tce_invalidate_pe(pe);
 
return 0;
 }
@@ -2355,7 +2355,7 @@ static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
if (ret)
pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
else
-   pnv_pci_phb3_tce_invalidate_pe(pe);
+   pnv_pci_ioda2_tce_invalidate_pe(pe);
 
pnv_pci_unlink_table_and_group(table_group->tables[num], table_group);
 
-- 
2.7.4



Re: [PATCH 3/3] Cyrus: create a defconfig

2016-09-15 Thread Andy Fleming


> On Sep 12, 2016, at 18:54, Scott Wood  wrote:
> 
>> On 09/10/2016 05:12 PM, Andy Fleming wrote:
>> 
>> 
>> On Tuesday, September 6, 2016, Scott Wood wrote:
>> 
>>>On 09/06/2016 02:12 PM, Andy Fleming wrote:
>>> This sets up the proper config elements for Power and Reset to work
>>> properly (using the gpio pins).
>>> 
>>> Signed-off-by: Andy Fleming
>>> ---
>>> arch/powerpc/Makefile  | 5 +
>>> arch/powerpc/configs/cyrus_basic_defconfig | 9 +
>>> 2 files changed, 14 insertions(+)
>>> create mode 100644 arch/powerpc/configs/cyrus_basic_defconfig
>> 
>>Why does cyrus need its own defconfig?  Just enable the power/reset
>>stuff in 85xx-hw.config.
>> 
>> 
>> 
>>Ok.
> 
> Please send non-HTML mail with proper quote markers.  It's hard to read
> when it looks like someone is talking to themself.

Argh, sorry. gmail iPhone client did that very subtly. I've flipped it over to 
Mail, and hopefully that fixes it...

Andy

Re: [PATCH 2/3] corenet: Support gpio power/reset for corenet

2016-09-15 Thread Andy Fleming


> On Sep 12, 2016, at 23:47, Scott Wood  wrote:
> 
>> On 09/10/2016 05:05 PM, Andy Fleming wrote:
>> 
>> 
>> On Tuesday, September 6, 2016, Scott Wood wrote:
>> 
>>>On 09/06/2016 02:12 PM, Andy Fleming wrote:
>>> Boards can implement power and reset functionality over gpio using
>>> these drivers:
>>>  drivers/power/reset/gpio-poweroff.c
>>>  drivers/power/reset/gpio-restart.c
>>> 
>>> While not all corenet boards use gpio for power/reset, this
>>> support can be added without interfering with boards that do not
>>> use this functionality.
>>> 
>>> If a board's device tree has the related nodes, they are now probed.
>>> Also, gpio-poweroff uses the global pm_power_off callback to implement
>>> the shutdown. However, pm_power_off was not invoked when the kernel
>>> halted, although that is usually the desired behavior. If the board
>>> provides gpio power and reset support, it is reasonable to assume that
>>> halting should also power down the system, unless it has chosen to
>>> pass those calls on to hypervisor.
>> 
>>Halt and poweroff are not the same thing.  If userspace requests a
>>poweroff, then kernel_power_off() will call machine_power_off() which
>>will call pm_power_off().
>> 
>>Why do we need anything corenet-specific here?
>> 
>> 
>> 
>>We don't, but then the board will halt instead of power off when you
>>type shutdown -h now.
> 
> Isn't that what it's supposed to do?  If you want poweroff then ask for
> poweroff.
> 
>>Or if you type poweroff without a high enough
>>run level, apparently.
> 
> Hmm?
> 
> In any case, run levels have nothing to do with the kernel.  The kernel
> implements LINUX_REBOOT_CMD_HALT and LINUX_REBOOT_CMD_POWER_OFF, and
> they should do what they're advertised to do.
> 
>>I'm amenable to removing the halt code, but
>>there are concerns that this will cause the systems to behave
>>unintentionally as intended, in that most systems that users
>>interact with shut down when you call shutdown -h now. There may be
>>scripts that depend on that behavior (or at least assume it).
> 
> When did the behavior you're seeking ever exist?

Well, for one, on that Servergy board. I agree that halt and power off mean and 
have always meant different things to the kernel. The problem is that most 
desktop systems, having halted, pass control to the BIOS which--usually--shuts 
off the power. Am I wrong about this? I've been using shutdown -h now to turn 
off my Linux systems for nearly 2 decades now, but I admit that I don't do it 
often, and I tend to stick with whatever works.


> 
>>I don't see any other platforms doing this.  How do the nodes get probed
>>for them?
>> 
>> 
>>The answer is I don't know, but this is a common issue with adding
>>new devices to the device tree in embedded powerpc. The only other
>>platforms which have gpio-poweroff nodes in their trees are in
>>arch/arm, and none of those platforms call the probing
>>function of_platform_bus_probe. I suspect they either probe every
>>root node, or they somehow construct the match_id. As noted in the
>>above-referenced commit, putting the nodes under the gpio bus does
>>not cause them to get probed. This seemed like the best way under
>>the current corenet code.
> 
> Well, let's figure out what it is that PPC should be doing to have
> things work the way it does on ARM.

For all of the devices? Or just these two?

Andy