Re: [PATCH] devpts: remove DEVPTS_MULTIPLE_INSTANCES from all configs
On Monday 20 June 2016 02:44 PM, Alexandru Moise wrote:
> As each mount of devpts is now an independent filesystem,
> the DEVPTS_MULTIPLE_INSTANCES config option no longer exists.
> So remove it.
>
> Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>

For arch/arc

Acked-by: Vineet Gupta
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v6, 1/2] cxl: Add mechanism for delivering AFU driver specific events
Excerpts from Vaibhav Jain's message of 2016-06-20 14:20:16 +0530:
> > +int cxl_unset_driver_ops(struct cxl_context *ctx)
> > +{
> > +	if (atomic_read(&ctx->afu_driver_events))
> > +		return -EBUSY;
> > +
> > +	ctx->afu_driver_ops = NULL;
> Need a write memory barrier so that afu_driver_ops isn't possibly called
> after this store.

What situation do you think this will help? I haven't looked closely at
the last few iterations of this patch set, but if you're in a situation
where you might be racing with some code doing e.g.

	if (ctx->afu_driver_ops)
		ctx->afu_driver_ops->something();

You have a race with or without a memory barrier. Ideally you would just
have the caller guarantee that it will only call cxl_unset_driver_ops if
no further calls to afu_driver_ops are possible, otherwise you may need
locking here which would be far from ideal.

What exactly is the use case for this API? I'd vote to drop it if we can
do without it.

-Ian
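To make the race described in this reply concrete, here is a minimal user-space sketch in C11. It is not the cxl code: `struct driver_ops`, `struct context`, `sample_something`, `call_something` and `unset_ops` are all hypothetical names invented for illustration. It shows that loading the pointer once avoids checking one value and calling through another, but that clearing the pointer still does not wait for a caller that has already loaded it, which is why the caller has to guarantee quiescence (or take a lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver ops and context; not the cxl types. */
struct driver_ops {
	int (*something)(void);
};

struct context {
	_Atomic(struct driver_ops *) ops;
};

/* Example callback so the sketch can be exercised. */
int sample_something(void)
{
	return 42;
}

struct driver_ops sample_ops = { .something = sample_something };

/*
 * Load the pointer once and use only the local copy, so the NULL check
 * and the call see the same value. Returns -1 when no ops are registered.
 */
int call_something(struct context *ctx)
{
	struct driver_ops *ops;

	assert(ctx != NULL);
	ops = atomic_load(&ctx->ops);
	if (!ops)
		return -1;
	return ops->something();
}

/*
 * Clearing the pointer does not wait for callers that already loaded
 * the old value; the caller must guarantee none are still running.
 */
void unset_ops(struct context *ctx)
{
	assert(ctx != NULL);
	atomic_store(&ctx->ops, (struct driver_ops *)NULL);
}
```

Even in this form, a thread that loaded `ops` just before `unset_ops()` can still run the callback afterwards, so a barrier alone would not change the outcome.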
Re: powerpc/powernv: Exclude MSI region in extended bridge window
On Tue, 2016-21-06 at 02:41:05 UTC, Gavin Shan wrote:
> The windows of root port and bridge behind that are extended to
> the PHB's windows to accommodate the PCI hotplug happening in
> future. The PHB's 64KB 32-bits MSI region is included in bridge's
> M32 windows (in hardware) though it's excluded in the corresponding
> resource, as the bridge's M32 windows have 1MB as their minimal
> alignment. We observed EEH error during system boot when the MSI
> region is included in bridge's M32 window.
>
> This excludes top 1MB (including 64KB 32-bits MSI region) region
> from bridge's M32 windows when extending them.

AFAICS you added that code in "powerpc/powernv: Extend PCI bridge
resources", so I'll squash it into that. That way there is no window of
breakage.

cheers
Re: [PATCH] powerpc/mm: Prevent unlikely crash in copro_calculate_slb()
Acked-by: Ian Munsie
Re: unrecoverable exception on G5 with CONFIG_PPC_EARLY_DEBUG enabled
> How about this? Denis does this work? > > cheers > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 4c9440629128..8bcc1b457115 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX) > lwz r9,PACA_EXSLB+EX_CCR(r13) /* get saved CR */ > > mtlrr10 > -BEGIN_MMU_FTR_SECTION > - b 2f > -END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX) > andi. r10,r12,MSR_RI /* check for unrecoverable exception */ > +BEGIN_MMU_FTR_SECTION > beq-2f > +FTR_SECTION_ELSE > + b 2f > +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX) > > .machine push > .machine "power4" > Yeah, it works. Thanks > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v20 20/20] Allow period= in perf stat CPU event descriptions.
This avoids the JSON PMU events parser having to know whether its
aliases are for perf stat or perf record.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
---
 tools/perf/util/parse-events.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index c599077..e1b393d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -920,6 +920,7 @@ config_term_avail(int term_type, struct parse_events_error *err)
 	case PARSE_EVENTS__TERM_TYPE_CONFIG1:
 	case PARSE_EVENTS__TERM_TYPE_CONFIG2:
 	case PARSE_EVENTS__TERM_TYPE_NAME:
+	case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
 		return true;
 	default:
 		if (!err)
-- 
2.5.3
[PATCH v20 19/20] perf, tools, pmu-events: Add Skylake frontend MSR support
From: Andi Kleen

Add support for the "frontend" extra MSR on Skylake in the JSON
conversion.

Signed-off-by: Andi Kleen
---
 tools/perf/pmu-events/jevents.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index c8d8e4a..0e43dce 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -126,6 +126,7 @@ static struct msrmap {
 	{ "0x3F6", "ldlat=" },
 	{ "0x1A6", "offcore_rsp=" },
 	{ "0x1A7", "offcore_rsp=" },
+	{ "0x3F7", "frontend=" },
 	{ NULL, NULL }
 };
 
-- 
2.5.3
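As a rough illustration of how a NULL-terminated table like the one in this patch can be consulted, the sketch below maps an MSR index string to the event term prefix. The table entries are copied from the diff; `struct msr_entry`, `msr_table` and `msr_term_prefix` are hypothetical names, and the case-insensitive match is an assumption made here, not necessarily what jevents does.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical mirror of the table in the patch: MSR index -> term prefix. */
struct msr_entry {
	const char *num;
	const char *pname;
};

static const struct msr_entry msr_table[] = {
	{ "0x3F6", "ldlat=" },
	{ "0x1A6", "offcore_rsp=" },
	{ "0x1A7", "offcore_rsp=" },
	{ "0x3F7", "frontend=" },
	{ NULL, NULL }
};

/*
 * Return the term prefix for an MSR index string, or NULL if the index
 * is not in the table. Matching ignores case (an assumption of this
 * sketch), so "0x3f7" and "0x3F7" are treated the same.
 */
const char *msr_term_prefix(const char *msr_index)
{
	size_t i;

	assert(msr_index != NULL);
	for (i = 0; msr_table[i].num; i++)
		if (!strcasecmp(msr_index, msr_table[i].num))
			return msr_table[i].pname;
	return NULL;
}
```

Supporting a new extra MSR is then a one-line table change, which matches the shape of the patch.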
[PATCH v20 18/20] perf, tools, pmu-events: Fix fixed counters on Intel
From: Andi Kleen

The JSON event lists use a different encoding for fixed counters than
perf for instructions and cycles (ref-cycles is ok).

This led to some common events like inst_retired.any
or cpu_clk_unhalted.thread not counting when specified with their
JSON name.

Special case these events in the jevents conversion process.
I prefer to not touch the JSON files for this, as it's intended
that standard JSON files can be just dropped into the perf
build without changes.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
[Fix minor compile error]
---
 tools/perf/pmu-events/jevents.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index b701d77..c8d8e4a 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -237,6 +237,29 @@ static void print_events_table_suffix(FILE *outfp)
 	fprintf(outfp, "};\n");
 }
 
+static struct fixed {
+	const char *name;
+	const char *event;
+} fixed[] = {
+	{ "inst_retired.any", "event=0xc0" },
+	{ "cpu_clk_unhalted.thread", "event=0x3c" },
+	{ "cpu_clk_unhalted.thread_any", "event=0x3c,any=1" },
+	{ NULL, NULL},
+};
+
+/*
+ * Handle different fixed counter encodings between JSON and perf.
+ */
+static char *real_event(const char *name, char *event)
+{
+	int i;
+
+	for (i = 0; fixed[i].name; i++)
+		if (!strcasecmp(name, fixed[i].name))
+			return (char *)fixed[i].event;
+	return event;
+}
+
 /* Call func with each event in the json file */
 int json_events(const char *fn,
 	  int (*func)(void *data, char *name, char *event, char *desc,
@@ -325,8 +348,8 @@ int json_events(const char *fn,
 		if (msr != NULL)
 			addfield(map, &event, ",", msr->pname, msrval);
 		fixname(name);
-
-		err = func(data, name, event, desc, long_desc, topic);
+		err = func(data, name, real_event(name, event), desc,
+				long_desc, topic);
 		free(event);
 		free(desc);
 		free(name);
-- 
2.5.3
[PATCH v20 10/20] perf, tools, jevents: Add support for long descriptions
Implement support in jevents to parse long descriptions for events that may have them in the JSON files. A follow on patch will make this long description available to user through the 'perf list' command. Signed-off-by: Andi KleenSigned-off-by: Sukadev Bhattiprolu Acked-by: Jiri Olsa --- Changelog[v14] - [Jiri Olsa] Break up independent parts of the patch into separate patches. --- tools/perf/pmu-events/jevents.c| 32 tools/perf/pmu-events/jevents.h| 3 ++- tools/perf/pmu-events/pmu-events.h | 1 + 3 files changed, 27 insertions(+), 9 deletions(-) diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c index aeafa82..51b864a 100644 --- a/tools/perf/pmu-events/jevents.c +++ b/tools/perf/pmu-events/jevents.c @@ -203,7 +203,7 @@ static void print_events_table_prefix(FILE *fp, const char *tblname) } static int print_events_table_entry(void *data, char *name, char *event, - char *desc) + char *desc, char *long_desc) { FILE *outfp = data; /* @@ -215,6 +215,8 @@ static int print_events_table_entry(void *data, char *name, char *event, fprintf(outfp, "\t.name = \"%s\",\n", name); fprintf(outfp, "\t.event = \"%s\",\n", event); fprintf(outfp, "\t.desc = \"%s\",\n", desc); + if (long_desc && long_desc[0]) + fprintf(outfp, "\t.long_desc = \"%s\",\n", long_desc); fprintf(outfp, "},\n"); @@ -235,7 +237,8 @@ static void print_events_table_suffix(FILE *outfp) /* Call func with each event in the json file */ int json_events(const char *fn, - int (*func)(void *data, char *name, char *event, char *desc), + int (*func)(void *data, char *name, char *event, char *desc, + char *long_desc), void *data) { int err = -EIO; @@ -254,6 +257,8 @@ int json_events(const char *fn, tok = tokens + 1; for (i = 0; i < tokens->size; i++) { char *event = NULL, *desc = NULL, *name = NULL; + char *long_desc = NULL; + char *extra_desc = NULL; struct msrmap *msr = NULL; jsmntok_t *msrval = NULL; jsmntok_t *precise = NULL; @@ -279,6 +284,10 @@ int json_events(const char *fn, } else if 
(json_streq(map, field, "BriefDescription")) { addfield(map, , "", "", val); fixdesc(desc); + } else if (json_streq(map, field, +"PublicDescription")) { + addfield(map, _desc, "", "", val); + fixdesc(long_desc); } else if (json_streq(map, field, "PEBS") && nz) { precise = val; } else if (json_streq(map, field, "MSRIndex") && nz) { @@ -287,10 +296,10 @@ int json_events(const char *fn, msrval = val; } else if (json_streq(map, field, "Errata") && !json_streq(map, val, "null")) { - addfield(map, , ". ", + addfield(map, _desc, ". ", " Spec update: ", val); } else if (json_streq(map, field, "Data_LA") && nz) { - addfield(map, , ". ", + addfield(map, _desc, ". ", " Supports address when precise", NULL); } @@ -298,19 +307,26 @@ int json_events(const char *fn, } if (precise && !strstr(desc, "(Precise Event)")) { if (json_streq(map, precise, "2")) - addfield(map, , " ", "(Must be precise)", - NULL); + addfield(map, _desc, " ", + "(Must be precise)", NULL); else - addfield(map, , " ", + addfield(map, _desc, " ", "(Precise event)", NULL); } + if (desc && extra_desc) + addfield(map, , " ", extra_desc, NULL); + if (long_desc && extra_desc) + addfield(map, _desc, " ", extra_desc, NULL); if (msr != NULL) addfield(map, , ",", msr->pname, msrval); fixname(name); - err = func(data, name, event, desc); + + err = func(data, name, event, desc, long_desc);
[PATCH v20 16/20] perf, tools: Add README for info on parsing JSON/map files
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
 tools/perf/pmu-events/README | 122 +++
 1 file changed, 122 insertions(+)
 create mode 100644 tools/perf/pmu-events/README

diff --git a/tools/perf/pmu-events/README b/tools/perf/pmu-events/README
new file mode 100644
index 0000000..da57cb5
--- /dev/null
+++ b/tools/perf/pmu-events/README
@@ -0,0 +1,122 @@
+
+The contents of this directory allow users to specify PMU events in
+their CPUs by their symbolic names rather than raw event codes (see
+example below).
+
+The main program in this directory is 'jevents', which is built and
+executed _before_ the perf binary itself is built.
+
+The 'jevents' program tries to locate and process JSON files in the directory
+tree tools/perf/pmu-events/arch/foo.
+
+	- Regular files with '.json' extension in the name are assumed to be
+	  JSON files, each of which describes a set of PMU events.
+
+	- Regular files with basename starting with 'mapfile.csv' are assumed
+	  to be a CSV file that maps a specific CPU to its set of PMU events.
+	  (see below for mapfile format)
+
+	- Directories are traversed, but all other files are ignored.
+
+Using the JSON files and the mapfile, 'jevents' generates the C source file,
+'pmu-events.c', which encodes the two sets of tables:
+
+	- Set of 'PMU events tables' for all known CPUs in the architecture,
+	  (one table like the following, per JSON file; table name 'pme_power8'
+	  is derived from JSON file name, 'power8.json').
+
+		struct pmu_event pme_power8[] = {
+
+			...
+
+			{
+				.name = "pm_1plus_ppc_cmpl",
+				.event = "event=0x100f2",
+				.desc = "1 or more ppc insts finished,",
+			},
+
+			...
+		}
+
+	- A 'mapping table' that maps each CPU of the architecture, to its
+	  'PMU events table'
+
+		struct pmu_events_map pmu_events_map[] = {
+		{
+			.cpuid = "004b",
+			.version = "1",
+			.type = "core",
+			.table = pme_power8
+		},
+			...
+
+		};
+
+After the 'pmu-events.c' is generated, it is compiled and the resulting
+'pmu-events.o' is added to 'libperf.a' which is then used to build perf.
+
+NOTES:
+	1. Several CPUs can support the same set of events and hence use a common
+	   JSON file. Hence several entries in the pmu_events_map[] could map
+	   to a single 'PMU events table'.
+
+	2. The 'pmu-events.h' has an extern declaration for the mapping table
+	   and the generated 'pmu-events.c' defines this table.
+
+	3. _All_ known CPU tables for the architecture are included in the perf
+	   binary.
+
+At run time, perf determines the actual CPU it is running on, finds the
+matching events table and builds aliases for those events. This allows
+users to specify events by their name:
+
+	$ perf stat -e pm_1plus_ppc_cmpl sleep 1
+
+where 'pm_1plus_ppc_cmpl' is a Power8 PMU event.
+
+In case of errors when processing files in the tools/perf/pmu-events/arch
+directory, 'jevents' tries to create an empty mapping file to allow the perf
+build to succeed even if the PMU event aliases cannot be used.
+
+However some errors in processing may cause the perf build to fail.
+
+Mapfile format
+==============
+
+The mapfile.csv format is expected to be:
+
+	Header line
+	CPUID,Version,File/path/name.json,Type
+
+where:
+
+	Comma:
+		is the required field delimiter (i.e. other fields cannot
+		have commas within them).
+
+	Comments:
+		Lines in which the first character is either '\n' or '#'
+		are ignored.
+
+	Header line
+		The header line is the first line in the file, which is
+		_IGNORED_. It can be a comment (begin with '#') or empty.
+
+	CPUID:
+		CPUID is an arch-specific char string, that can be used
+		to identify CPU (and associate it with a set of PMU events
+		it supports). Multiple CPUIDS can point to the same
+		File/path/name.json.
+
+		Example:
+			CPUID == 'GenuineIntel-6-2E' (on x86).
+			CPUID == '004b0100' (PVR value in Powerpc)
+	Version:
+		is the Version of the mapfile.
+
+	File/path/name.json:
+		is the pathname for the JSON file, relative to the directory
+		containing the mapfile.csv
+
+	Type:
+		indicates whether the events are "core" or "uncore" events.
-- 
2.5.3
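The mapfile rules in the README (comma-delimited, four fields, lines starting with '#' or '\n' ignored) can be sketched as a small line parser. This is an illustration only, not the jevents implementation: `struct map_entry` and `parse_mapfile_line` are hypothetical names, and skipping the header line is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One mapfile row: CPUID,Version,File/path/name.json,Type */
struct map_entry {
	char *cpuid;
	char *version;
	char *fname;
	char *type;
};

/*
 * Split one mapfile line in place. Returns true and fills *e for a
 * data row with exactly four comma-separated fields; returns false for
 * comments, blank lines, and rows with too few or too many fields.
 */
bool parse_mapfile_line(char *line, struct map_entry *e)
{
	char *fields[4];
	char *p = line;
	int n = 0;

	assert(line != NULL && e != NULL);

	/* Comment and empty lines are ignored, as the README states. */
	if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
		return false;

	/* Drop the trailing newline, if any. */
	line[strcspn(line, "\n")] = '\0';

	while (n < 4) {
		fields[n++] = p;
		p = strchr(p, ',');
		if (!p)
			break;
		*p++ = '\0';
	}

	/* p != NULL here means a fifth field was present. */
	if (n != 4 || p != NULL)
		return false;

	e->cpuid = fields[0];
	e->version = fields[1];
	e->fname = fields[2];
	e->type = fields[3];
	return true;
}
```

Because fields cannot contain commas, a plain split on ',' is sufficient and no quoting rules are needed.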
[PATCH v20 17/20] perf, tools: Make alias matching case-insensitive
From: Andi Kleen

Make alias matching in the events parser case-insensitive. This is
useful with the JSON events. perf uses lower case events, but the CPU
manuals generally use upper case event names. The JSON files use lower
case by default too. But if we search case insensitively then users can
cut-n-paste the upper case event names.

So the following works:

% perf stat -e BR_INST_EXEC.TAKEN_INDIRECT_NEAR_CALL true

 Performance counter stats for 'true':

               305      BR_INST_EXEC.TAKEN_INDIRECT_NEAR_CALL

       0.000492799 seconds time elapsed

Signed-off-by: Andi Kleen
---
 tools/perf/util/parse-events.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 77bb7ac..c599077 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1436,7 +1436,7 @@ comp_pmu(const void *p1, const void *p2)
 	struct perf_pmu_event_symbol *pmu1 = (struct perf_pmu_event_symbol *) p1;
 	struct perf_pmu_event_symbol *pmu2 = (struct perf_pmu_event_symbol *) p2;
 
-	return strcmp(pmu1->symbol, pmu2->symbol);
+	return strcasecmp(pmu1->symbol, pmu2->symbol);
 }
 
 static void perf_pmu__parse_cleanup(void)
-- 
2.5.3
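The effect of switching the comparator to strcasecmp() can be seen in a small stand-alone sketch. It assumes a table that is sorted and searched with the same comparator via the standard qsort() and bsearch(); `struct event_symbol`, `comp_symbol` and `find_symbol` are hypothetical names used only here, not perf's own code.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical, simplified stand-in for an event symbol table entry. */
struct event_symbol {
	const char *symbol;
};

/* Case-insensitive ordering, used for both sorting and searching. */
int comp_symbol(const void *p1, const void *p2)
{
	const struct event_symbol *a = p1;
	const struct event_symbol *b = p2;

	return strcasecmp(a->symbol, b->symbol);
}

/*
 * Look up a name in a table previously sorted with comp_symbol().
 * Returns the matching entry or NULL.
 */
const struct event_symbol *find_symbol(const struct event_symbol *tab,
					size_t n, const char *name)
{
	struct event_symbol key = { name };

	assert(tab != NULL && name != NULL);
	return bsearch(&key, tab, n, sizeof(*tab), comp_symbol);
}
```

The sort and the search must use the same comparator; mixing strcmp() ordering with a strcasecmp() lookup would break the binary search.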
[PATCH v20 14/20] perf, tools: Add support for event list topics
From: Andi KleenAdd support to group the output of perf list by the Topic field in the JSON file. Example output: % perf list ... Cache: l1d.replacement [L1D data line replacements] l1d_pend_miss.pending [L1D miss oustandings duration in cycles] l1d_pend_miss.pending_cycles [Cycles with L1D load Misses outstanding] l2_l1d_wb_rqsts.all [Not rejected writebacks from L1D to L2 cache lines in any state] l2_l1d_wb_rqsts.hit_e [Not rejected writebacks from L1D to L2 cache lines in E state] l2_l1d_wb_rqsts.hit_m [Not rejected writebacks from L1D to L2 cache lines in M state] ... Pipeline: arith.fpu_div [Divide operations executed] arith.fpu_div_active [Cycles when divider is busy executing divide operations] baclears.any [Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end] br_inst_exec.all_branches [Speculative and retired branches] br_inst_exec.all_conditional [Speculative and retired macro-conditional branches] br_inst_exec.all_direct_jmp [Speculative and retired macro-unconditional branches excluding calls and indirects] br_inst_exec.all_direct_near_call [Speculative and retired direct near calls] br_inst_exec.all_indirect_jump_non_call_ret Signed-off-by: Andi Kleen Signed-off-by: Sukadev Bhattiprolu Acked-by: Jiri Olsa --- Changelog[v14] - [Jiri Olsa] Move jevents support for Topic to a separate patch. 
--- tools/perf/util/pmu.c | 37 +++-- tools/perf/util/pmu.h | 1 + 2 files changed, 28 insertions(+), 10 deletions(-) diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index c606f6a..2d526ad 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -223,7 +223,8 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias, } static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name, -char *desc, char *val, char *long_desc) +char *desc, char *val, char *long_desc, +char *topic) { struct perf_pmu_alias *alias; int ret; @@ -259,6 +260,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name, alias->desc = desc ? strdup(desc) : NULL; alias->long_desc = long_desc ? strdup(long_desc) : desc ? strdup(desc) : NULL; + alias->topic = topic ? strdup(topic) : NULL; list_add_tail(>list, list); @@ -276,7 +278,7 @@ static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FI buf[ret] = 0; - return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL); + return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL); } static inline bool pmu_alias_info_file(char *name) @@ -526,7 +528,7 @@ static int pmu_add_cpu_aliases(struct list_head *head) /* need type casts to override 'const' */ __perf_pmu__new_alias(head, NULL, (char *)pe->name, (char *)pe->desc, (char *)pe->event, - (char *)pe->long_desc); + (char *)pe->long_desc, (char *)pe->topic); } out: @@ -1047,19 +1049,26 @@ static char *format_alias_or(char *buf, int len, struct perf_pmu *pmu, return buf; } -struct pair { +struct sevent { char *name; char *desc; + char *topic; }; -static int cmp_pair(const void *a, const void *b) +static int cmp_sevent(const void *a, const void *b) { - const struct pair *as = a; - const struct pair *bs = b; + const struct sevent *as = a; + const struct sevent *bs = b; /* Put extra events last */ if (!!as->desc != !!bs->desc) return !!as->desc - !!bs->desc; + if (as->topic && bs->topic) { + int n = 
strcmp(as->topic, bs->topic); + + if (n) + return n; + } return strcmp(as->name, bs->name); } @@ -1093,9 +1102,10 @@ void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag, char buf[1024]; int printed = 0; int len, j; - struct pair *aliases; + struct sevent *aliases; int numdesc = 0; int columns = pager_get_columns(); + char *topic = NULL; pmu = NULL; len = 0; @@ -1105,7 +1115,7 @@ void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag, if (pmu->selectable) len++; } - aliases = zalloc(sizeof(struct pair) * len); + aliases = zalloc(sizeof(struct sevent)
[PATCH v20 15/20] perf, tools: Handle header line in mapfile
From: Andi Kleen

To work with existing mapfiles, assume that the first line in
'mapfile.csv' is a header line and skip over it.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
Changelog[v2]
	All architectures may not use the "Family" to identify. So,
	assume first line is header.
---
 tools/perf/pmu-events/jevents.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index a4f117c..b701d77 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -463,7 +463,12 @@ static int process_mapfile(FILE *outfp, char *fpath)
 
 	print_mapping_table_prefix(outfp);
 
-	line_num = 0;
+	/* Skip first line (header) */
+	p = fgets(line, n, mapfp);
+	if (!p)
+		goto out;
+
+	line_num = 1;
 	while (1) {
 		char *cpuid, *version, *type, *fname;
 
@@ -507,8 +512,8 @@ static int process_mapfile(FILE *outfp, char *fpath)
 		fprintf(outfp, "},\n");
 	}
 
+out:
 	print_mapping_table_suffix(outfp);
-
 	return 0;
 }
 
-- 
2.5.3
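The header-skipping behaviour can be tried outside jevents with a short sketch. It reads a stream, unconditionally discards the first line, and counts the remaining rows that are neither comments nor blank. `count_mapfile_rows` is a hypothetical helper, and returning -1 for a file with no header line is a choice made for this sketch (the patch itself jumps to the table suffix and returns 0).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Count data rows in a mapfile-like stream. The first line is always
 * treated as a header and skipped, whatever it contains. Returns -1 if
 * the stream has no header line at all (a choice made for this sketch).
 */
int count_mapfile_rows(FILE *mapfp)
{
	char line[4096];
	int rows = 0;

	assert(mapfp != NULL);

	/* Skip first line (header). */
	if (!fgets(line, sizeof(line), mapfp))
		return -1;

	while (fgets(line, sizeof(line), mapfp)) {
		/* Comment and empty lines carry no mapping. */
		if (line[0] == '#' || line[0] == '\n')
			continue;
		rows++;
	}
	return rows;
}
```

Treating the first line as a header regardless of its content avoids depending on a particular column name such as "Family", which is the motivation given in the changelog.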
[PATCH v20 13/20] perf, tools, jevents: Add support for event topics
Allow assigning categories to the PMU events via the "Topic" field,
i.e. process the Topic field from the JSON file and add a corresponding
topic field to the generated C events tables.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
Changelog[v14]
	[Jiri Olsa] Move this independent code off into a separate patch.
---
 tools/perf/pmu-events/jevents.c    | 12 +++++++++---
 tools/perf/pmu-events/jevents.h    |  2 +-
 tools/perf/pmu-events/pmu-events.h |  1 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index 51b864a..a4f117c 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -203,7 +203,7 @@ static void print_events_table_prefix(FILE *fp, const char *tblname)
 }
 
 static int print_events_table_entry(void *data, char *name, char *event,
-				    char *desc, char *long_desc)
+				    char *desc, char *long_desc, char *topic)
 {
 	FILE *outfp = data;
 	/*
@@ -217,6 +217,8 @@ static int print_events_table_entry(void *data, char *name, char *event,
 	fprintf(outfp, "\t.desc = \"%s\",\n", desc);
 	if (long_desc && long_desc[0])
 		fprintf(outfp, "\t.long_desc = \"%s\",\n", long_desc);
+	if (topic)
+		fprintf(outfp, "\t.topic = \"%s\",\n", topic);
 
 	fprintf(outfp, "},\n");
 
@@ -238,7 +240,7 @@ static void print_events_table_suffix(FILE *outfp)
 /* Call func with each event in the json file */
 int json_events(const char *fn,
 	  int (*func)(void *data, char *name, char *event, char *desc,
-		      char *long_desc),
+		      char *long_desc, char *topic),
 	  void *data)
 {
 	int err = -EIO;
@@ -259,6 +261,7 @@ int json_events(const char *fn,
 		char *event = NULL, *desc = NULL, *name = NULL;
 		char *long_desc = NULL;
 		char *extra_desc = NULL;
+		char *topic = NULL;
 		struct msrmap *msr = NULL;
 		jsmntok_t *msrval = NULL;
 		jsmntok_t *precise = NULL;
@@ -298,6 +301,8 @@ int json_events(const char *fn,
 				   !json_streq(map, val, "null")) {
 				addfield(map, &extra_desc, ". ",
 					" Spec update: ", val);
+			} else if (json_streq(map, field, "Topic")) {
+				addfield(map, &topic, "", "", val);
 			} else if (json_streq(map, field, "Data_LA") && nz) {
 				addfield(map, &extra_desc, ". ",
 					" Supports address when precise",
@@ -321,12 +326,13 @@ int json_events(const char *fn,
 			addfield(map, &event, ",", msr->pname, msrval);
 		fixname(name);
 
-		err = func(data, name, event, desc, long_desc);
+		err = func(data, name, event, desc, long_desc, topic);
 		free(event);
 		free(desc);
 		free(name);
 		free(long_desc);
 		free(extra_desc);
+		free(topic);
 		if (err)
 			break;
 		tok += j;
diff --git a/tools/perf/pmu-events/jevents.h b/tools/perf/pmu-events/jevents.h
index b0eb274..9ffcb89 100644
--- a/tools/perf/pmu-events/jevents.h
+++ b/tools/perf/pmu-events/jevents.h
@@ -3,7 +3,7 @@
 
 int json_events(const char *fn,
 		int (*func)(void *data, char *name, char *event, char *desc,
-				char *long_desc),
+				char *long_desc, char *topic),
 		void *data);
 char *get_cpu_str(void);
 
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 711f049..6b69f4b 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -9,6 +9,7 @@ struct pmu_event {
 	const char *event;
 	const char *desc;
 	const char *long_desc;
+	const char *topic;
 };
 
 /*
-- 
2.5.3
[PATCH v20 12/20] perf, tools: Support long descriptions with perf list
Previously we were dropping the useful longer descriptions that some events have in the event list completely. This patch makes them appear with perf list. Old perf list: baclears: baclears.all [Counts the number of baclears] vs new: perf list -v: ... baclears: baclears.all [The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end. The BACLEARS.ANY event counts the number of baclears for any type of branch] Signed-off-by: Andi KleenSigned-off-by: Sukadev Bhattiprolu Acked-by: Jiri Olsa --- Changelog[v15] - [Jir Olsa, Andi Kleen] Fix usage strings; update man page. Changelog[v14] - [Jiri Olsa] Break up independent parts of the patch into separate patches. Changelog[v18]: - Fix minor conflict in tools/perf/builtin-list.c; add long_desc_flag parameter to new print_pmu_events() call site. --- tools/perf/Documentation/perf-list.txt | 6 +- tools/perf/builtin-list.c | 16 +++- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index 72209bc..41857cc 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -8,7 +8,7 @@ perf-list - List all symbolic event types SYNOPSIS [verse] -'perf list' [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob] +'perf list' [--no-desc] [--long-desc] [hw|sw|cache|tracepoint|pmu|event_glob] DESCRIPTION --- @@ -20,6 +20,10 @@ OPTIONS --no-desc:: Don't print descriptions. +-v:: +--long-desc:: +Print longer event descriptions. 
+ [[EVENT_MODIFIERS]] EVENT MODIFIERS diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c index 6be195b..1fdf770 100644 --- a/tools/perf/builtin-list.c +++ b/tools/perf/builtin-list.c @@ -22,14 +22,17 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused) { int i; bool raw_dump = false; + bool long_desc_flag = false; struct option list_options[] = { OPT_BOOLEAN(0, "raw-dump", _dump, "Dump raw events"), OPT_BOOLEAN('d', "desc", _flag, "Print extra event descriptions. --no-desc to not print."), + OPT_BOOLEAN('v', "long-desc", _desc_flag, + "Print longer event descriptions."), OPT_END() }; const char * const list_usage[] = { - "perf list [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob]", + "perf list [] [hw|sw|cache|tracepoint|pmu|event_glob]", NULL }; @@ -44,7 +47,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused) printf("\nList of pre-defined events (to be used in -e):\n\n"); if (argc == 0) { - print_events(NULL, raw_dump, !desc_flag); + print_events(NULL, raw_dump, !desc_flag, long_desc_flag); return 0; } @@ -65,12 +68,14 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused) strcmp(argv[i], "hwcache") == 0) print_hwcache_events(NULL, raw_dump); else if (strcmp(argv[i], "pmu") == 0) - print_pmu_events(NULL, raw_dump, !desc_flag); + print_pmu_events(NULL, raw_dump, !desc_flag, + long_desc_flag); else if ((sep = strchr(argv[i], ':')) != NULL) { int sep_idx; if (sep == NULL) { - print_events(argv[i], raw_dump, !desc_flag); + print_events(argv[i], raw_dump, !desc_flag, + long_desc_flag); continue; } sep_idx = sep - argv[i]; @@ -91,7 +96,8 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused) print_symbol_events(s, PERF_TYPE_SOFTWARE, event_symbols_sw, PERF_COUNT_SW_MAX, raw_dump); print_hwcache_events(s, raw_dump); - print_pmu_events(s, raw_dump, !desc_flag); + print_pmu_events(s, raw_dump, !desc_flag, + long_desc_flag); 
print_tracepoint_events(NULL, s, raw_dump); free(s); } -- 2.5.3 ___ Linuxppc-dev mailing list
[PATCH v20 11/20] perf, tools: Add alias support for long descriptions
Previously we were dropping the useful longer descriptions that some
events have in the event list completely. Now that jevents provides
support for longer descriptions (see previous patch), add support for
parsing the long descriptions.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
Changelog[v14]
	- [Jiri Olsa] Break up independent parts of the patch into
	  separate patches.
---
 tools/perf/util/parse-events.c |  5 +++--
 tools/perf/util/parse-events.h |  3 ++-
 tools/perf/util/pmu.c          | 15 ++-
 tools/perf/util/pmu.h          |  4 +++-
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 72803e3..77bb7ac 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2133,7 +2133,8 @@ out_enomem:
 /*
  * Print the help text for the event symbols:
  */
-void print_events(const char *event_glob, bool name_only, bool quiet_flag)
+void print_events(const char *event_glob, bool name_only, bool quiet_flag,
+		  bool long_desc)
 {
 	print_symbol_events(event_glob, PERF_TYPE_HARDWARE,
 			    event_symbols_hw, PERF_COUNT_HW_MAX, name_only);
@@ -2143,7 +2144,7 @@ void print_events(const char *event_glob, bool name_only, bool quiet_flag)
 
 	print_hwcache_events(event_glob, name_only);
 
-	print_pmu_events(event_glob, name_only, quiet_flag);
+	print_pmu_events(event_glob, name_only, quiet_flag, long_desc);
 
 	if (event_glob != NULL)
 		return;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index dee5022..68794e6 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -169,7 +169,8 @@ void parse_events_update_lists(struct list_head *list_event,
 void parse_events_evlist_error(struct parse_events_evlist *data,
 			       int idx, const char *str);
 
-void print_events(const char *event_glob, bool name_only, bool quiet);
+void print_events(const char *event_glob, bool name_only, bool quiet,
+		  bool long_desc);
 
 struct event_symbol {
 	const char	*symbol;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b94712a..c606f6a 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -223,7 +223,7 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-				 char *desc, char *val)
+				 char *desc, char *val, char *long_desc)
 {
 	struct perf_pmu_alias *alias;
 	int ret;
@@ -257,6 +257,8 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 	}
 
 	alias->desc = desc ? strdup(desc) : NULL;
+	alias->long_desc = long_desc ? strdup(long_desc) :
+				desc ? strdup(desc) : NULL;
 
 	list_add_tail(&alias->list, list);
 
@@ -274,7 +276,7 @@ static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FI
 
 	buf[ret] = 0;
 
-	return __perf_pmu__new_alias(list, dir, name, NULL, buf);
+	return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL);
 }
 
 static inline bool pmu_alias_info_file(char *name)
@@ -523,7 +525,8 @@ static int pmu_add_cpu_aliases(struct list_head *head)
 
 		/* need type casts to override 'const' */
 		__perf_pmu__new_alias(head, NULL, (char *)pe->name,
-				(char *)pe->desc, (char *)pe->event);
+				(char *)pe->desc, (char *)pe->event,
+				(char *)pe->long_desc);
 	}
 
 out:
@@ -1082,7 +1085,8 @@ static void wordwrap(char *s, int start, int max, int corr)
 	}
 }
 
-void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag)
+void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag,
+			bool long_desc)
 {
 	struct perf_pmu *pmu;
 	struct perf_pmu_alias *alias;
@@ -1130,7 +1134,8 @@ void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag)
 			if (!aliases[j].name)
 				goto out_enomem;
 
-			aliases[j].desc = alias->desc;
+			aliases[j].desc = long_desc ? alias->long_desc :
+						alias->desc;
 			j++;
 		}
 		if (pmu->selectable &&
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 42999c7..1aa614e 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -39,6 +39,7 @@ struct perf_pmu_info {
 struct perf_pmu_alias {
 	char *name;
 	char *desc;
+	char *long_desc;
[PATCH v20 09/20] perf, tools: Add override support for event list CPUID
From: Andi Kleen

Add a PERF_CPUID variable to override the CPUID of the current CPU (within
the current architecture). This is useful for testing, so that all event
lists can be tested on a single system.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
v2: Fix double free in earlier version.
    Print actual CPUID being used with verbose option.
---
 tools/perf/util/pmu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index fe1dec3..b94712a 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -492,10 +492,16 @@ static int pmu_add_cpu_aliases(struct list_head *head)
 	struct pmu_event *pe;
 	char *cpuid;
 
-	cpuid = get_cpuid_str();
+	cpuid = getenv("PERF_CPUID");
+	if (cpuid)
+		cpuid = strdup(cpuid);
+	if (!cpuid)
+		cpuid = get_cpuid_str();
 	if (!cpuid)
 		return 0;
 
+	pr_debug("Using CPUID %s\n", cpuid);
+
 	i = 0;
 	while (1) {
 		map = &pmu_events_map[i++];
-- 
2.5.3
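For readers following along, here is a minimal standalone sketch of the override order described above: PERF_CPUID first, detection second. It is not the perf code itself. The helper name effective_cpuid_str() and the injected detect callback are hypothetical and stand in for get_cpuid_str(); the CPUID string used by the fake detector is the one from the mapfile example elsewhere in this series.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-architecture get_cpuid_str(). */
static char *fake_detect(void)
{
	return strdup("GenuineIntel-6-1E");
}

/*
 * Return a malloc()ed CPUID string that the caller must free().
 * A PERF_CPUID environment variable, when set, takes precedence over
 * whatever the detect callback would report.
 */
static char *effective_cpuid_str(char *(*detect)(void))
{
	const char *env = getenv("PERF_CPUID");
	char *cpuid = env ? strdup(env) : NULL;

	assert(detect != NULL);
	if (!cpuid)
		cpuid = detect();
	return cpuid;
}
```

Copying the environment value with strdup() keeps ownership uniform, so a single free() works on both paths; that appears to be the point of the "double free" fix noted in the v2 changelog, though that reading is an inference.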
[PATCH v20 08/20] perf, tools: Add a --no-desc flag to perf list
From: Andi Kleen

Add a --no-desc flag to perf list to not print the event descriptions
that were earlier added for JSON events. This may be useful to get a
less crowded listing.

It's still default to print descriptions as that is the more useful
default for most users.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
v2: Rename --quiet to --no-desc. Add option to man page.
v18: Fix minor conflict in tools/perf/builtin-list.c; Add !desc_flag
     to the newly introduced print_pmu_events() call site.
---
 tools/perf/Documentation/perf-list.txt |  8 +++-
 tools/perf/builtin-list.c              | 14 +-
 tools/perf/util/parse-events.c         |  4 ++--
 tools/perf/util/parse-events.h         |  2 +-
 tools/perf/util/pmu.c                  |  4 ++--
 tools/perf/util/pmu.h                  |  2 +-
 6 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index a126e97..72209bc 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -8,13 +8,19 @@ perf-list - List all symbolic event types
 SYNOPSIS
 [verse]
-'perf list' [hw|sw|cache|tracepoint|pmu|event_glob]
+'perf list' [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob]
 
 DESCRIPTION
 ---
 This command displays the symbolic event types which can be selected in the
 various perf commands with the -e option.
 
+OPTIONS
+---
+--no-desc::
+Don't print descriptions.
+
+
 [[EVENT_MODIFIERS]]
 EVENT MODIFIERS
 ---
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 5e22db4..6be195b 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -16,16 +16,20 @@
 #include "util/pmu.h"
 #include
 
+static bool desc_flag = true;
+
 int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 {
 	int i;
 	bool raw_dump = false;
 	struct option list_options[] = {
 		OPT_BOOLEAN(0, "raw-dump", &raw_dump, "Dump raw events"),
+		OPT_BOOLEAN('d', "desc", &desc_flag,
+			    "Print extra event descriptions. --no-desc to not print."),
 		OPT_END()
 	};
 	const char * const list_usage[] = {
-		"perf list [hw|sw|cache|tracepoint|pmu|event_glob]",
+		"perf list [--no-desc] [hw|sw|cache|tracepoint|pmu|event_glob]",
 		NULL
 	};
 
@@ -40,7 +44,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 	printf("\nList of pre-defined events (to be used in -e):\n\n");
 
 	if (argc == 0) {
-		print_events(NULL, raw_dump);
+		print_events(NULL, raw_dump, !desc_flag);
 		return 0;
 	}
 
@@ -61,12 +65,12 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 			 strcmp(argv[i], "hwcache") == 0)
 			print_hwcache_events(NULL, raw_dump);
 		else if (strcmp(argv[i], "pmu") == 0)
-			print_pmu_events(NULL, raw_dump);
+			print_pmu_events(NULL, raw_dump, !desc_flag);
 		else if ((sep = strchr(argv[i], ':')) != NULL) {
 			int sep_idx;
 
 			if (sep == NULL) {
-				print_events(argv[i], raw_dump);
+				print_events(argv[i], raw_dump, !desc_flag);
 				continue;
 			}
 			sep_idx = sep - argv[i];
@@ -87,7 +91,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 			print_symbol_events(s, PERF_TYPE_SOFTWARE,
 					    event_symbols_sw, PERF_COUNT_SW_MAX, raw_dump);
 			print_hwcache_events(s, raw_dump);
-			print_pmu_events(s, raw_dump);
+			print_pmu_events(s, raw_dump, !desc_flag);
 			print_tracepoint_events(NULL, s, raw_dump);
 			free(s);
 		}
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d15e335..72803e3 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2133,7 +2133,7 @@ out_enomem:
 /*
  * Print the help text for the event symbols:
  */
-void print_events(const char *event_glob, bool name_only)
+void print_events(const char *event_glob, bool name_only, bool quiet_flag)
 {
 	print_symbol_events(event_glob, PERF_TYPE_HARDWARE,
 			    event_symbols_hw, PERF_COUNT_HW_MAX, name_only);
@@ -2143,7 +2143,7 @@ void print_events(const char *event_glob, bool name_only)
 
 	print_hwcache_events(event_glob, name_only);
 
-
[PATCH v20 07/20] perf, tools: Query terminal width and use in perf list
From: Andi Kleen

Automatically adapt the now wider and word wrapped perf list output to
wider terminals. This requires querying the terminal before the auto
pager takes over, and exporting this information from the pager
subsystem.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Namhyung Kim
Acked-by: Jiri Olsa
---
Changelog[v20]
	- Minor reorg since helpers like setup_pager() are now in
	  tools/lib/subcmd/pager.c
---
 tools/lib/subcmd/pager.c | 15 +++
 tools/lib/subcmd/pager.h |  1 +
 tools/perf/util/pmu.c    |  3 ++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/lib/subcmd/pager.c b/tools/lib/subcmd/pager.c
index d50f3b5..409b582 100644
--- a/tools/lib/subcmd/pager.c
+++ b/tools/lib/subcmd/pager.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include <sys/ioctl.h>
 #include "pager.h"
 #include "run-command.h"
 #include "sigchain.h"
@@ -14,6 +15,7 @@
  */
 
 static int spawned_pager;
+static int pager_columns;
 
 void pager_init(const char *pager_env)
 {
@@ -58,9 +60,12 @@ static void wait_for_pager_signal(int signo)
 void setup_pager(void)
 {
 	const char *pager = getenv(subcmd_config.pager_env);
+	struct winsize sz;
 
 	if (!isatty(1))
 		return;
+	if (ioctl(1, TIOCGWINSZ, &sz) == 0)
+		pager_columns = sz.ws_col;
 	if (!pager)
 		pager = getenv("PAGER");
 	if (!(pager || access("/usr/bin/pager", X_OK)))
@@ -98,3 +103,13 @@ int pager_in_use(void)
 {
 	return spawned_pager;
 }
+
+int pager_get_columns(void)
+{
+	char *s;
+
+	s = getenv("COLUMNS");
+	if (s)
+		return atoi(s);
+	return (pager_columns ? pager_columns : 80) - 2;
+}
diff --git a/tools/lib/subcmd/pager.h b/tools/lib/subcmd/pager.h
index 8b83714..623f554 100644
--- a/tools/lib/subcmd/pager.h
+++ b/tools/lib/subcmd/pager.h
@@ -5,5 +5,6 @@
 extern void pager_init(const char *pager_env);
 extern void setup_pager(void);
 extern int pager_in_use(void);
+extern int pager_get_columns(void);
 
 #endif /* __SUBCMD_PAGER_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 496c02e..10a95e7 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -14,6 +14,7 @@
 #include "cpumap.h"
 #include "header.h"
 #include "pmu-events/pmu-events.h"
+#include "cache.h"
 
 struct perf_pmu_format {
 	char *name;
@@ -1084,7 +1085,7 @@ void print_pmu_events(const char *event_glob, bool name_only)
 	int len, j;
 	struct pair *aliases;
 	int numdesc = 0;
-	int columns = 78;
+	int columns = pager_get_columns();
 
 	pmu = NULL;
 	len = 0;
-- 
2.5.3
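As a rough illustration of the width selection in this patch, the sketch below separates the terminal query from the precedence rule so the rule can be checked without a tty. The function names query_tty_columns() and pick_columns() are made up for this example; only the COLUMNS / TIOCGWINSZ / 80-minus-2 ordering is taken from the diff above.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the terminal on fd for its width; 0 means "unknown". */
static int query_tty_columns(int fd)
{
	struct winsize sz;

	if (isatty(fd) && ioctl(fd, TIOCGWINSZ, &sz) == 0)
		return sz.ws_col;
	return 0;
}

/*
 * Same precedence as pager_get_columns() in the patch: an explicit
 * COLUMNS value wins and is used as is; otherwise take the cached
 * terminal width, or 80 when it is unknown, minus a margin of 2.
 */
static int pick_columns(const char *columns_env, int tty_columns)
{
	assert(tty_columns >= 0);
	if (columns_env)
		return atoi(columns_env);
	return (tty_columns ? tty_columns : 80) - 2;
}
```

A caller would presumably do pick_columns(getenv("COLUMNS"), query_tty_columns(1)), and would need to run the query before stdout is redirected into the pager, which is why the patch caches the value in setup_pager().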
[PATCH v20 06/20] perf, tools: Support alias descriptions
From: Andi Kleen

Add support to print alias descriptions in perf list, which are taken
from the generated event files.

The sorting code is changed to put the events with descriptions at the
end. The descriptions are printed as possibly multiple word wrapped
lines.

Example output:

% perf list
...
  arith.fpu_div
       [Divide operations executed]
  arith.fpu_div_active
       [Cycles when divider is busy executing divide operations]

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
Changelog
	- Delete a redundant free()

Changelog[v14]
	- [Jiri Olsa] Fail, rather than continue if strdup() returns NULL;
	  remove unnecessary __maybe_unused.
---
 tools/perf/util/pmu.c | 83 +--
 tools/perf/util/pmu.h |  1 +
 2 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index eae0de1..496c02e 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -222,7 +222,7 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-				 char *desc __maybe_unused, char *val)
+				 char *desc, char *val)
 {
 	struct perf_pmu_alias *alias;
 	int ret;
@@ -255,6 +255,8 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 		perf_pmu__parse_snapshot(alias, dir, name);
 	}
 
+	alias->desc = desc ? strdup(desc) : NULL;
+
 	list_add_tail(&alias->list, list);
 
 	return 0;
@@ -1035,11 +1037,42 @@ static char *format_alias_or(char *buf, int len, struct perf_pmu *pmu,
 	return buf;
 }
 
-static int cmp_string(const void *a, const void *b)
+struct pair {
+	char *name;
+	char *desc;
+};
+
+static int cmp_pair(const void *a, const void *b)
+{
+	const struct pair *as = a;
+	const struct pair *bs = b;
+
+	/* Put extra events last */
+	if (!!as->desc != !!bs->desc)
+		return !!as->desc - !!bs->desc;
+	return strcmp(as->name, bs->name);
+}
+
+static void wordwrap(char *s, int start, int max, int corr)
 {
-	const char * const *as = a;
-	const char * const *bs = b;
-	return strcmp(*as, *bs);
+	int column = start;
+	int n;
+
+	while (*s) {
+		int wlen = strcspn(s, " \t");
+
+		if (column + wlen >= max && column > start) {
+			printf("\n%*s", start, "");
+			column = start + corr;
+		}
+		n = printf("%s%.*s", column > start ? " " : "", wlen, s);
+		if (n <= 0)
+			break;
+		s += wlen;
+		column += n;
+		while (isspace(*s))
+			s++;
+	}
 }
 
 void print_pmu_events(const char *event_glob, bool name_only)
@@ -1049,7 +1082,9 @@ void print_pmu_events(const char *event_glob, bool name_only)
 	char buf[1024];
 	int printed = 0;
 	int len, j;
-	char **aliases;
+	struct pair *aliases;
+	int numdesc = 0;
+	int columns = 78;
 
 	pmu = NULL;
 	len = 0;
@@ -1059,14 +1094,15 @@ void print_pmu_events(const char *event_glob, bool name_only)
 		if (pmu->selectable)
 			len++;
 	}
-	aliases = zalloc(sizeof(char *) * len);
+	aliases = zalloc(sizeof(struct pair) * len);
 	if (!aliases)
 		goto out_enomem;
 	pmu = NULL;
 	j = 0;
 	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
 		list_for_each_entry(alias, &pmu->aliases, list) {
-			char *name = format_alias(buf, sizeof(buf), pmu, alias);
+			char *name = alias->desc ? alias->name :
+				format_alias(buf, sizeof(buf), pmu, alias);
 			bool is_cpu = !strcmp(pmu->name, "cpu");
 
 			if (event_glob != NULL &&
@@ -1075,12 +,19 @@ void print_pmu_events(const char *event_glob, bool name_only)
 						       event_glob
 				continue;
 
-			if (is_cpu && !name_only)
+			if (is_cpu && !name_only && !alias->desc)
 				name = format_alias_or(buf, sizeof(buf), pmu, alias);
 
-			aliases[j] = strdup(name);
-			if (aliases[j] == NULL)
+			aliases[j].name = name;
+			if (is_cpu && !name_only && !alias->desc)
+				aliases[j].name = format_alias_or(buf,
+								  sizeof(buf),
+
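To make the new sort order concrete, here is a small self-contained sketch of the comparator idea from this patch: entries without a description sort first, entries with one sort last, and names break ties. The struct layout follows the diff, but the const qualifiers, the added assertion and the sort_pairs() wrapper are assumptions added for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pair {
	const char *name;
	const char *desc;	/* NULL when the event has no description */
};

/* Undescribed events first, described ("extra") events last. */
static int cmp_pair(const void *a, const void *b)
{
	const struct pair *as = a;
	const struct pair *bs = b;

	assert(as->name && bs->name);
	if (!!as->desc != !!bs->desc)
		return !!as->desc - !!bs->desc;
	return strcmp(as->name, bs->name);
}

/* Hypothetical convenience wrapper around qsort(). */
static void sort_pairs(struct pair *p, size_t n)
{
	qsort(p, n, sizeof(*p), cmp_pair);
}
```

The double negation turns each pointer into 0 or 1, so the subtraction yields -1, 0 or 1 regardless of pointer width.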
[PATCH v20 05/20] perf, tools: Support CPU id matching for x86 v2
From: Andi Kleen

Implement the code to match CPU types to mapfile types for x86
based on CPUID. This extends an existing similar function,
but changes it to use the x86 mapfile cpu description.
This allows to resolve event lists generated by jevents.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
v2: Update to new get_cpuid_str() interface
---
 tools/perf/arch/x86/util/header.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c
index 146d12a..a74a48d 100644
--- a/tools/perf/arch/x86/util/header.c
+++ b/tools/perf/arch/x86/util/header.c
@@ -19,8 +19,8 @@ cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c,
 			: "a" (op));
 }
 
-int
-get_cpuid(char *buffer, size_t sz)
+static int
+__get_cpuid(char *buffer, size_t sz, const char *fmt)
 {
 	unsigned int a, b, c, d, lvl;
 	int family = -1, model = -1, step = -1;
@@ -48,7 +48,7 @@ get_cpuid(char *buffer, size_t sz)
 		if (family >= 0x6)
 			model += ((a >> 16) & 0xf) << 4;
 	}
-	nb = scnprintf(buffer, sz, "%s,%u,%u,%u$", vendor, family, model, step);
+	nb = scnprintf(buffer, sz, fmt, vendor, family, model, step);
 
 	/* look for end marker to ensure the entire data fit */
 	if (strchr(buffer, '$')) {
@@ -57,3 +57,21 @@ get_cpuid(char *buffer, size_t sz)
 	}
 	return -1;
 }
+
+int
+get_cpuid(char *buffer, size_t sz)
+{
+	return __get_cpuid(buffer, sz, "%s,%u,%u,%u$");
+}
+
+char *
+get_cpuid_str(void)
+{
+	char *buf = malloc(128);
+
+	if (__get_cpuid(buf, 128, "%s-%u-%X$") < 0) {
+		free(buf);
+		return NULL;
+	}
+	return buf;
+}
-- 
2.5.3
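A short sketch of the end-marker trick used by the "%s-%u-%X$" format may help here: the trailing '$' only survives if the whole string fit in the buffer, so its presence doubles as a truncation check. This is an illustration only. It uses plain snprintf() rather than the kernel-style scnprintf(), and format_cpuid() is a hypothetical name; the vendor, family and model are passed in rather than read from the CPUID instruction.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<vendor>-<family>-<MODEL>" into buf. Returns 0 on success,
 * or -1 if buf was too small to hold the complete string.
 */
static int format_cpuid(char *buf, size_t sz, const char *vendor,
			unsigned int family, unsigned int model)
{
	char *end;

	assert(buf != NULL && vendor != NULL);
	if (sz == 0)
		return -1;

	snprintf(buf, sz, "%s-%u-%X$", vendor, family, model);

	/* The '$' marker is only present when nothing was cut off. */
	end = strchr(buf, '$');
	if (!end)
		return -1;
	*end = '\0';
	return 0;
}
```

With vendor "GenuineIntel", family 6 and model 0x1E this produces the same "GenuineIntel-6-1E" key that the mapfile example in the jevents patch uses.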
[PATCH v20 04/20] perf, tools: Support CPU ID matching for Powerpc
Implement code that returns the generic CPU ID string for Powerpc.
This will be used to identify the specific table of PMU events to
parse/compare user specified events against.

Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
Changelog[v14]
	- [Jiri Olsa] Move this independent code off into a separate patch.
---
 tools/perf/arch/powerpc/util/header.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
index f8ccee1..9aaa6f5 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -32,3 +32,14 @@ get_cpuid(char *buffer, size_t sz)
 	}
 	return -1;
 }
+
+char *
+get_cpuid_str(void)
+{
+	char *bufp;
+
+	if (asprintf(&bufp, "%.8lx", mfspr(SPRN_PVR)) < 0)
+		bufp = NULL;
+
+	return bufp;
+}
-- 
2.5.3
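For anyone without a Power machine at hand, the string format can be tried in isolation. The sketch below takes the PVR as a parameter instead of reading it with mfspr(), and uses malloc() plus snprintf() instead of asprintf() to stay portable; pvr_to_cpuid_str() is a hypothetical name and the value in the usage note is arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Render a PVR value the way the patch does ("%.8lx"): lower-case hex,
 * zero-padded to at least 8 digits. Returns a malloc()ed string that
 * the caller must free(), or NULL on allocation failure.
 */
static char *pvr_to_cpuid_str(unsigned long pvr)
{
	size_t sz = 2 * sizeof(pvr) + 1;	/* two hex digits per byte + NUL */
	char *buf = malloc(sz);
	int n;

	if (!buf)
		return NULL;
	n = snprintf(buf, sz, "%.8lx", pvr);
	assert(n >= 8 && (size_t)n < sz);
	return buf;
}
```

For example, pvr_to_cpuid_str(0x4b0201UL) yields "004b0201". The precision in "%.8lx" is what supplies the leading zeros; a mapfile key would have to use the same padded form to match.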
[PATCH v20 03/20] perf, tools: Use pmu_events table to create aliases
At run time (when 'perf' is starting up), locate the specific table
of PMU events that corresponds to the current CPU. Using that table,
create aliases for each of the PMU events in the CPU. Then use these
aliases to parse the user specified perf event.

In short this would allow the user to specify events using their
aliases rather than raw event codes.

Based on input and some earlier patches from Andi Kleen, Jiri Olsa.

Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
Changelog[v4]
	- Split off unrelated code into separate patches.
Changelog[v3]
	- [Jiri Olsa] Fix memory leak in cpuid
Changelog[v2]
	- [Andi Kleen] Replace pmu_events_map->vfm with a generic "cpuid".
---
 tools/perf/util/header.h |  1 +
 tools/perf/util/pmu.c    | 61
 2 files changed, 62 insertions(+)

diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index d306ca1..d30109b 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -151,4 +151,5 @@ int write_padded(int fd, const void *bf, size_t count, size_t count_aligned);
  */
 int get_cpuid(char *buffer, size_t sz);
 
+char *get_cpuid_str(void);
 #endif /* __PERF_HEADER_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ddb0261..eae0de1 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -12,6 +12,8 @@
 #include "pmu.h"
 #include "parse-events.h"
 #include "cpumap.h"
+#include "header.h"
+#include "pmu-events/pmu-events.h"
 
 struct perf_pmu_format {
 	char *name;
@@ -464,6 +466,62 @@ static struct cpu_map *pmu_cpumask(const char *name)
 	return cpus;
 }
 
+/*
+ * Return the CPU id as a raw string.
+ *
+ * Each architecture should provide a more precise id string that
+ * can be use to match the architecture's "mapfile".
+ */
+char * __weak get_cpuid_str(void)
+{
+	return NULL;
+}
+
+/*
+ * From the pmu_events_map, find the table of PMU events that corresponds
+ * to the current running CPU. Then, add all PMU events from that table
+ * as aliases.
+ */
+static int pmu_add_cpu_aliases(struct list_head *head)
+{
+	int i;
+	struct pmu_events_map *map;
+	struct pmu_event *pe;
+	char *cpuid;
+
+	cpuid = get_cpuid_str();
+	if (!cpuid)
+		return 0;
+
+	i = 0;
+	while (1) {
+		map = &pmu_events_map[i++];
+		if (!map->table)
+			goto out;
+
+		if (!strcmp(map->cpuid, cpuid))
+			break;
+	}
+
+	/*
+	 * Found a matching PMU events table. Create aliases
+	 */
+	i = 0;
+	while (1) {
+		pe = &map->table[i++];
+		if (!pe->name)
+			break;
+
+		/* need type casts to override 'const' */
+		__perf_pmu__new_alias(head, NULL, (char *)pe->name,
+				(char *)pe->desc, (char *)pe->event);
+	}
+
+out:
+	free(cpuid);
+	return 0;
+}
+
 struct perf_event_attr * __weak
 perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
 {
@@ -488,6 +546,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	if (pmu_aliases(name, &aliases))
 		return NULL;
 
+	if (!strcmp(name, "cpu"))
+		(void)pmu_add_cpu_aliases(&aliases);
+
 	if (pmu_type(name, &type))
 		return NULL;
 
-- 
2.5.3
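The two-level lookup in this patch (CPUID string to table, then a walk over the table) can be tried outside perf with a toy table. In the sketch below the struct names and the sentinel conventions (table == NULL ends the map, name == NULL ends a table) follow the diff, but the exact field layout, the demo data, the event encoding strings and the helper names find_events_table() and count_events() are all assumptions made for the example; real tables are generated by jevents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pmu_event {
	const char *name;
	const char *event;
	const char *desc;
};

struct pmu_events_map {
	const char *cpuid;
	const struct pmu_event *table;
};

/* Illustrative data only. */
static const struct pmu_event demo_events[] = {
	{ "inst_retired.any", "event=0x00,umask=0x01",
	  "Instructions retired from execution." },
	{ "demo.placeholder", "event=0x01", "Placeholder entry" },
	{ NULL, NULL, NULL },		/* name == NULL ends the table */
};

static const struct pmu_events_map demo_map[] = {
	{ "GenuineIntel-6-1E", demo_events },
	{ NULL, NULL },			/* table == NULL ends the map */
};

/* Return the events table registered for cpuid, or NULL if none. */
static const struct pmu_event *
find_events_table(const struct pmu_events_map *map, const char *cpuid)
{
	assert(map != NULL && cpuid != NULL);
	for (; map->table; map++)
		if (!strcmp(map->cpuid, cpuid))
			return map->table;
	return NULL;
}

/* Count entries up to the NULL-name sentinel. */
static size_t count_events(const struct pmu_event *pe)
{
	size_t n = 0;

	while (pe && pe[n].name)
		n++;
	return n;
}
```

An unknown CPUID simply yields no table, which mirrors the patch returning 0 without adding aliases, so perf keeps working on CPUs that have no event list.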
[PATCH v20 02/20] perf, tools, jevents: Program to convert JSON file to C style file
From: Andi Kleen

This is a modified version of an earlier patch by Andi Kleen.

We expect architectures to describe the performance monitoring events
for each CPU in a corresponding JSON file, which looks like:

	[
	{
	"EventCode": "0x00",
	"UMask": "0x01",
	"EventName": "INST_RETIRED.ANY",
	"BriefDescription": "Instructions retired from execution.",
	"PublicDescription": "Instructions retired from execution.",
	"Counter": "Fixed counter 1",
	"CounterHTOff": "Fixed counter 1",
	"SampleAfterValue": "203",
	"SampleAfterValue": "203",
	"MSRIndex": "0",
	"MSRValue": "0",
	"TakenAlone": "0",
	"CounterMask": "0",
	"Invert": "0",
	"AnyThread": "0",
	"EdgeDetect": "0",
	"PEBS": "0",
	"PRECISE_STORE": "0",
	"Errata": "null",
	"Offcore": "0"
	}
	]

We also expect the architectures to provide a mapping between individual
CPUs to their JSON files. Eg:

	GenuineIntel-6-1E,V1,/NHM-EP/NehalemEP_core_V1.json,core

which maps each CPU, identified by [vendor, family, model, version, type]
to a JSON file.

Given these files, the program, jevents::
	- locates all JSON files for the architecture,
	- parses each JSON file and generates a C-style "PMU-events table"
	  (pmu-events.c)
	- locates a mapfile for the architecture
	- builds a global table, mapping each model of CPU to the
	  corresponding PMU-events table.

The 'pmu-events.c' is generated when building perf and added to libperf.a.
The global table pmu_events_map[] table in this pmu-events.c will be used
in perf in a follow-on patch.

If the architecture does not have any JSON files or there is an error in
processing them, an empty mapping file is created. This would allow the
build of perf to proceed even if we are not able to provide aliases for
events.

The parser for JSON files allows parsing Intel style JSON event files. This
allows to use an Intel event list directly with perf. The Intel event lists
can be quite large and are too big to store in unswappable kernel memory.

The conversion from JSON to C-style is straightforward. The parser knows
(very little) Intel specific information, and can be easily extended to
handle fields for other CPUs.

The parser code is partially shared with an independent parsing library,
which is 2-clause BSD licenced. To avoid any conflicts I marked those
files as BSD licenced too. As part of perf they become GPLv2.

Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Jiri Olsa
---
v2: Address review feedback. Rename option to --event-files
v3: Add JSON example
v4: Update manpages.
v5: Don't remove dot in fixname. Fix compile error. Add include
    protection. Comment realloc.
v6: Include debug/util.h
v7: (Sukadev Bhattiprolu)
    Rebase to 4.0 and fix some conflicts.
v8: (Sukadev Bhattiprolu)
    Move jevents.[hc] to tools/perf/pmu-events/
    Rewrite to locate and process arch specific JSON and "map" files;
    and generate a C file.
    (Removed acked-by Namhyung Kim due to modest changes to patch)
    Compile the generated pmu-events.c and add the pmu-events.o to
    libperf.a
v9: [Sukadev Bhattiprolu/Andi Kleen] Rename ->vfm to ->cpuid and use
    that field to encode the PVR in Power.
    Allow blank lines in mapfile.
    [Jiri Olsa] Pass ARCH as a parameter to jevents so we don't have
    to detect it.
    [Jiri Olsa] Use the infrastructure to build pmu-events/perf
    (Makefile changes from Jiri included in this patch).
    [Jiri Olsa, Andi Kleen] Detect changes to JSON files and rebuild
    pmu-events.o only if necessary.
v11:- [Andi Kleen] Add mapfile, jevents dependency on pmu-events.c
    - [Jiri Olsa] Be silent if arch doesn't have JSON files
    - Also silence 'jevents' when parsing JSON files unless V=1 is
      specified during build. Cleanup error messages.
v14:- - [Jiri Olsa] Fix compile error with DEBUG=1; drop unlink() and
      use "w" mode with fopen(); simplify file_name_to_table_name()
v15:- Fix minor conflict in tools/perf/Makefile.perf when rebasing
      to recent perf/core.
v16:- Rebase to upstream; fix conflicts in tools/perf/Makefile.perf
v18:- Rebase to upstream; fix conflicts in tools/perf/Makefile.perf
v20: Rebase to upstream; rename a local variable to 'ldirname' to
     avoid collision with the dirname().
---
 tools/perf/Makefile.perf           |  26 +-
 tools/perf/pmu-events/Build        |  11 +
 tools/perf/pmu-events/jevents.c    | 686 +
 tools/perf/pmu-events/jevents.h    |  17 +
 tools/perf/pmu-events/json.h       |   3 +
 tools/perf/pmu-events/pmu-events.h |  35 ++
 6 files changed, 775 insertions(+), 3
[PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format
CPUs support a large number of performance monitoring events (PMU events)
and often these events are very specific to an architecture/model of the
CPU. To use most of these PMU events with perf, we currently have to
identify them by their raw codes:

	perf stat -e r100f2 sleep 1

This patchset allows architectures to specify these PMU events in JSON
files located in 'tools/perf/pmu-events/arch/' of the mainline tree.
The events from the JSON files for the architecture are then built into
the perf binary.

At run time, perf identifies the specific set of events for the CPU and
creates "event aliases". These aliases allow users to specify events by
"name" as:

	perf stat -e pm_1plus_ppc_cmpl sleep 1

The file, 'tools/perf/pmu-events/README' in [PATCH 16/16] gives more
details.

Note:
	- All known events tables for the architecture are included in the
	  perf binary.

	- For architectures that don't have any JSON files, an empty mapping
	  table is created and they should continue to build.

Thanks to input from Andi Kleen, Jiri Olsa, Namhyung Kim and Ingo Molnar.

These patches are available from:

	https://github.com/sukadev/linux.git

	Branch			Description
	--
	json-code-v20		Source Code only
	json-data-v20		x86 and Powerpc datafiles only
	json-code+data-v20	Both code and data (for build/test)

NOTE:	Only "source code" patches (i.e those in json-code-v20) are being
	emailed. Please pull the "data files" from the json-data-v20 branch.

Changelog[v20]
	- Rebase to recent perf/core
	- Add Patch 20/20 to allow perf-stat to work with the period= field

Changelog[v19]
	Rebase to recent perf/core; fix couple lines >80 chars.

Changelog[v18]
	Rebase to recent perf/core; fix minor merge conflicts.

Changelog[v17]
	Rebase to recent perf/core; couple of small fixes to processing
	Intel JSON files; allow case-insensitive PMU event names.

Changelog[v16]
	Rebase to recent perf/core; fix minor merge conflicts; drop 3
	patches that were merged into perf/core.

Changelog[v15]
	Code changes:
	- Fix 'perf list' usage string and update man page.
	- Remove a redundant __maybe_unused tag.
	- Rebase to recent perf/core branch.

	Data files updates: json-files-5 branch
	- Rebase to perf/intel-json-files-5 from Andi Kleen
	- Add patch from Madhavan Srinivasan for couple more Powerpc models

Changelog[v14]
	Comments from Jiri Olsa:
	- Change parameter name/type for pmu_add_cpu_aliases (from void *data
	  to list_head *head)
	- Use asprintf() in file_name_to_tablename() and simplify/reorg code.
	- Use __weak definition from
	- Use fopen() with mode "w" and eliminate unlink()
	- Remove minor TODO.
	- Add error check for return value from strdup() in print_pmu_events().
	- Move independent changes from patches 3,11,12 .. to separate patches
	  for easier review/backport.
	- Clarify mapfile's "header line support" in patch description.
	- Fix build failure with DEBUG=1

	Comment from Andi Kleen:
	- In tools/perf/pmu-events/Build, check for 'mapfile.csv' rather than
	  'mapfile*'

	Misc:
	- Minor changes/clarifications to tools/perf/pmu-events/README.

Changelog[v13]
	Version: Individual patches have their own history :-) that I am
	preserving. Patchset version (v13) is for overall patchset and is
	somewhat arbitrary.

	- Added support for "categories" of events to perf
	- Add mapfile, jevents build dependency on pmu-events.c
	- Silence jevents when parsing JSON files unless V=1 is specified
	- Cleanup error messages
	- Fix memory leak with ->cpuid
	- Rebase to Arnaldo's tree
	- Allow overriding CPUID via environment variable
	- Support long descriptions for events
	- Handle header line in mapfile.csv
	- Cleanup JSON files (trim PublicDescription if identical to/prefix of
	  BriefDescription field)

Andi Kleen (12):
  perf, tools: Add jsmn `jasmine' JSON parser
  perf, tools, jevents: Program to convert JSON file to C style file
  perf, tools: Support CPU id matching for x86 v2
  perf, tools: Support alias descriptions
  perf, tools: Query terminal width and use in perf list
  perf, tools: Add a --no-desc flag to perf list
  perf, tools: Add override support for event list CPUID
  perf, tools: Add support for event list topics
  perf, tools: Handle header line in mapfile
  perf, tools: Make alias matching case-insensitive
  perf, tools, pmu-events: Fix fixed counters on Intel
  perf, tools, pmu-events: Add Skylake frontend MSR support

Sukadev Bhattiprolu (8):
  perf, tools: Use pmu_events table to create aliases
[PATCH v20 01/20] perf, tools: Add jsmn `jasmine' JSON parser
From: Andi Kleen

I need a JSON parser. This adds the simplest JSON
parser I could find -- Serge Zaitsev's jsmn `jasmine' --
to the perf library. I merely converted it to (mostly)
Linux style and added support for non 0 terminated input.

The parser is quite straight forward and does not
copy any data, just returns tokens with offsets
into the input buffer. So it's relatively efficient
and simple to use.

The code is not fully checkpatch clean, but I didn't
want to completely fork the upstream code.

Original source: http://zserge.bitbucket.org/jsmn.html

In addition I added a simple wrapper that mmaps a json
file and provides some straight forward access functions.

Used in follow-on patches to parse event files.

Acked-by: Namhyung Kim
Acked-by: Jiri Olsa
Signed-off-by: Andi Kleen
Signed-off-by: Sukadev Bhattiprolu
---
v2: Address review feedback.
v3: Minor checkpatch fixes.
v4 (by Sukadev Bhattiprolu)
	- Rebase to 4.0 and fix minor conflicts in tools/perf/Makefile.perf
	- Report error if specified events file is invalid.
v5 (Sukadev Bhattiprolu)
	- Move files to tools/perf/pmu-events/ since parsing of JSON file
	  now occurs when _building_ rather than running perf.
---
 tools/perf/pmu-events/jsmn.c | 313 +++
 tools/perf/pmu-events/jsmn.h |  67 +
 tools/perf/pmu-events/json.c | 162 ++
 tools/perf/pmu-events/json.h |  36 +
 4 files changed, 578 insertions(+)
 create mode 100644 tools/perf/pmu-events/jsmn.c
 create mode 100644 tools/perf/pmu-events/jsmn.h
 create mode 100644 tools/perf/pmu-events/json.c
 create mode 100644 tools/perf/pmu-events/json.h

diff --git a/tools/perf/pmu-events/jsmn.c b/tools/perf/pmu-events/jsmn.c
new file mode 100644
index 000..11d1fa1
--- /dev/null
+++ b/tools/perf/pmu-events/jsmn.c
@@ -0,0 +1,313 @@
+/*
+ * Copyright (c) 2010 Serge A. Zaitsev
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ * Slightly modified by AK to not assume 0 terminated input.
+ */
+
+#include
+#include "jsmn.h"
+
+/*
+ * Allocates a fresh unused token from the token pool.
+ */
+static jsmntok_t *jsmn_alloc_token(jsmn_parser *parser,
+				   jsmntok_t *tokens, size_t num_tokens)
+{
+	jsmntok_t *tok;
+
+	if ((unsigned)parser->toknext >= num_tokens)
+		return NULL;
+	tok = &tokens[parser->toknext++];
+	tok->start = tok->end = -1;
+	tok->size = 0;
+	return tok;
+}
+
+/*
+ * Fills token type and boundaries.
+ */
+static void jsmn_fill_token(jsmntok_t *token, jsmntype_t type,
+			    int start, int end)
+{
+	token->type = type;
+	token->start = start;
+	token->end = end;
+	token->size = 0;
+}
+
+/*
+ * Fills next available token with JSON primitive.
+ */
+static jsmnerr_t jsmn_parse_primitive(jsmn_parser *parser, const char *js,
+				      size_t len,
+				      jsmntok_t *tokens, size_t num_tokens)
+{
+	jsmntok_t *token;
+	int start;
+
+	start = parser->pos;
+
+	for (; parser->pos < len; parser->pos++) {
+		switch (js[parser->pos]) {
+#ifndef JSMN_STRICT
+		/*
+		 * In strict mode primitive must be followed by ","
+		 * or "}" or "]"
+		 */
+		case ':':
+#endif
+		case '\t':
+		case '\r':
+		case '\n':
+		case ' ':
+		case ',':
+		case ']':
+		case '}':
+			goto found;
+		default:
+			break;
+		}
+		if (js[parser->pos] < 32 || js[parser->pos] >= 127) {
+			parser->pos = start;
+			return
Re: [RESEND PATCH v2 3/4] PCI: Add a new option for resource_alignment to reassign alignment
On 2016/6/21 9:57, Bjorn Helgaas wrote: On Thu, Jun 02, 2016 at 01:46:50PM +0800, Yongji Xie wrote: When using resource_alignment kernel parameter, the current implement reassigns the alignment by changing resources' size which can potentially break some drivers. For example, the driver uses the size to locate some register whose length is related to the size. This patch adds a new option "noresize" for the parameter to solve this problem. Why do we ever want to change the resource's size? I understand that you want to change the resource's *alignment*, and that part makes sense. But why change the *size*? Changing the resource size doesn't change the hardware BAR size; it just means the struct resource will describe a region larger than what the BAR actually claims. That unnecessarily wastes space after the BAR. This was a problem with the code even before your patch; I'm suggesting that if you have a way to change the alignment without changing the resource size, maybe we should do that all the time. Then you wouldn't need to add the "noresize" option. Yes, changing resource's size seems not a good idea. But would it break some existing systems which are using this kernel parameter if we remove the "noresize" option and change the alignment all the time? Thanks, Yongji Signed-off-by: Yongji Xie--- Documentation/kernel-parameters.txt |5 - drivers/pci/pci.c | 35 +-- 2 files changed, 29 insertions(+), 11 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 82b42c9..c4802f5 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2997,13 +2997,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. window. The default value is 64 megabytes. resource_alignment= Format: - [@][:]:.[; ...] + [@][:]:. + [:noresize][; ...] Specifies alignment and device to reassign aligned memory resources. If is not specified, PAGE_SIZE is used as alignment. 
PCI-PCI bridge can be specified, if resource windows need to be expanded. + noresize: Don't change the resources' sizes when + reassigning alignment. ecrc= Enable/disable PCIe ECRC (transaction layer end-to-end CRC checking). bios: Use BIOS/firmware settings. This is the diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a259394..3ee13e5 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4748,11 +4748,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock); /** * pci_specified_resource_alignment - get resource alignment specified by user. * @dev: the PCI device to get + * @resize: whether or not to change resources' size when reassigning alignment * * RETURNS: Resource alignment if it is specified. * Zero if it is not specified. */ -static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) +static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, + bool *resize) { int seg, bus, slot, func, align_order, count; resource_size_t align = 0; @@ -4786,6 +4788,11 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) } } p += count; + if (!strncmp(p, ":noresize", 9)) { + *resize = false; + p += 9; + } else + *resize = true; if (seg == pci_domain_nr(dev->bus) && bus == dev->bus->number && slot == PCI_SLOT(dev->devfn) && @@ -4818,11 +4825,12 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) { int i; struct resource *r; + bool resize = true; resource_size_t align, size; u16 command; /* check if specified PCI is target device to reassign */ - align = pci_specified_resource_alignment(dev); + align = pci_specified_resource_alignment(dev, ); if (!align) return; @@ -4844,15 +4852,22 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) if (!(r->flags & IORESOURCE_MEM)) continue; size = resource_size(r); - if (size < align) { - size = align; - dev_info(>dev, - "Rounding
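For readers following the parsing change, the ":noresize" handling in the hunk above can be sketched in user space as below. This is only an illustration; `parse_resize_option()` is a hypothetical helper name, not a function in drivers/pci/pci.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Consume an optional ":noresize" suffix at *p.
 * Returns false if the suffix was present (sizes must be left alone),
 * true otherwise. On a match, *p is advanced past the suffix.
 * parse_resize_option() is a hypothetical name used for illustration.
 */
static bool parse_resize_option(const char **p)
{
	static const char opt[] = ":noresize";
	size_t len = sizeof(opt) - 1;	/* 9 characters, as in the patch */

	if (strncmp(*p, opt, len) == 0) {
		*p += len;
		return false;
	}
	return true;
}
```

Like the strncmp() call in the patch, this is a prefix match, so a string such as ":noresizeXYZ" would also be accepted; whether that matters is a question for the parser that runs next.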
Re: [RFC PATCH 1/2] KVM: PPC: divide the ics lock into smaller ones for each irq
On Jun 20, 2016, at 13:25, Paul Mackerras wrote:
> On Mon, May 16, 2016 at 02:58:18PM +0800, Li Zhong wrote:
>> This patch tries to use smaller locks for each irq in the ics, instead
>> of a lock at the ics level, to provide better scalability.
>
> This looks like a worth-while thing to do. Do you have any
> performance measurements to justify the change? This will increase
> the size of struct kvmppc_ics by 4kB, so it would be useful to show
> the performance increase that justifies it.

Actually, I saw some “improvement” because the vcpus were not bound, and the io jobs and irqs on the guest were not bound either. After I fixed those random factors, the result became stable, but I couldn’t see any obvious improvement from the patches... Maybe I need to find some other test cases that could support this change.

> Also, when you resend the patch, please make the patch description
> more definite - say "With this patch, we use" rather than "this patch
> tries to use", for instance.

OK, I will change that when doing a resend, if I can find some workload that could benefit from this change.

Thanks, Zhong

> Regards,
> Paul.
Re: [PATCH 10/12] lguest: Only descend into lguest directory when CONFIG_LGUEST is set
"Andrew F. Davis"writes: > When CONFIG_LGUEST is not set make will still descend into the lguest > directory but nothing will be built. This produces unneeded build > artifacts and messages in addition to slowing the build. Fix this here. > > Signed-off-by: Andrew F. Davis > --- > drivers/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied, Thanks! Rusty. > > diff --git a/drivers/Makefile b/drivers/Makefile > index 19305e0..b178e2f 100644 > --- a/drivers/Makefile > +++ b/drivers/Makefile > @@ -122,7 +122,7 @@ obj-$(CONFIG_ACCESSIBILITY) += accessibility/ > obj-$(CONFIG_ISDN) += isdn/ > obj-$(CONFIG_EDAC) += edac/ > obj-$(CONFIG_EISA) += eisa/ > -obj-y+= lguest/ > +obj-$(CONFIG_LGUEST) += lguest/ > obj-$(CONFIG_CPU_FREQ) += cpufreq/ > obj-$(CONFIG_CPU_IDLE) += cpuidle/ > obj-y+= mmc/ > -- > 2.8.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RESEND PATCH v2 2/4] PCI: Do not Use IORESOURCE_STARTALIGN to identify bridge resources
On 2016/6/21 9:50, Bjorn Helgaas wrote: On Thu, Jun 02, 2016 at 01:46:49PM +0800, Yongji Xie wrote: Now we use the IORESOURCE_STARTALIGN to identify bridge resources in __assign_resources_sorted(). That's quite fragile. We can't make sure that the PCI devices' resources will not use IORESOURCE_STARTALIGN any more. Can you explain this a little more? I don't quite understand the problem. Maybe you can give an example of the problem? Now there are two kinds of additional resources in the list: bridge resource and SR-IOV resource. Here we just want to fix the additional alignment for bridge. But if SR-IOV resource get the flag IORESOURCE_STARTALIGN which is not only for bridge, the current check seems to be wrong. And kernel parameter "resource_alignment" could set IORESOURCE_STARTALIGN for SR-IOV resource with my next patch applied. Thanks, Yongji In this patch, we try to use a more robust way to identify bridge resources. Signed-off-by: Yongji Xie--- drivers/pci/setup-bus.c |9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 55641a3..216ddbc 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -390,6 +390,7 @@ static void __assign_resources_sorted(struct list_head *head, struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; unsigned long fail_type; resource_size_t add_align, align; + int index; /* Check if optional add_size is there */ if (!realloc_head || list_empty(realloc_head)) @@ -410,11 +411,13 @@ static void __assign_resources_sorted(struct list_head *head, /* * There are two kinds of additional resources in the list: -* 1. bridge resource -- IORESOURCE_STARTALIGN -* 2. SR-IOV resource -- IORESOURCE_SIZEALIGN +* 1. bridge resource +* 2. 
SR-IOV resource * Here just fix the additional alignment for bridge */ - if (!(dev_res->res->flags & IORESOURCE_STARTALIGN)) + index = dev_res->res - dev_res->dev->resource; + if (index < PCI_BRIDGE_RESOURCES || + index > PCI_BRIDGE_RESOURCE_END) I think the code looks OK; at least, it seems to match the comment. I'd just like to understand the problem a little better. continue; add_align = get_res_add_align(realloc_head, dev_res->res); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
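The index check in the hunk above relies on pointer subtraction within the device's resource array. A minimal user-space sketch of that idea follows; `mock_dev`, `mock_res` and the slot constants are hypothetical stand-ins chosen for illustration, not the kernel's definitions or values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot layout; the real kernel constants may differ. */
enum {
	MOCK_BRIDGE_RESOURCES = 7,	/* first bridge window slot (assumed) */
	MOCK_BRIDGE_RESOURCE_END = 10,	/* last bridge window slot (assumed) */
	MOCK_NUM_RESOURCES = 11
};

struct mock_res {
	unsigned long flags;
};

struct mock_dev {
	struct mock_res resource[MOCK_NUM_RESOURCES];
};

/*
 * Classify a resource by its position in the owning device's array
 * rather than by its flags, so that a flag such as STARTALIGN being
 * set on a non-bridge resource cannot misclassify it.
 */
static bool is_bridge_window(const struct mock_dev *dev,
			     const struct mock_res *res)
{
	ptrdiff_t index = res - dev->resource;

	return index >= MOCK_BRIDGE_RESOURCES &&
	       index <= MOCK_BRIDGE_RESOURCE_END;
}
```

The position of a resource in the array is fixed by the device layout, which is why it looks like a sturdier discriminator than a flag that a later patch may set on SR-IOV resources as well.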
Re: [RFC PATCH 2/2] KVM: PPC: Don't take lock when check irq's resend flag
> On Jun 20, 2016, at 13:27, Paul Mackerraswrote: > > On Mon, May 16, 2016 at 03:02:13PM +0800, Li Zhong wrote: >> It seems that we don't need to take the lock before evaluating irq's >> resend flag. What needed is to make sure when we clear the ics's bit >> in the icp's resend_map, we don't miss the resend flag of the irqs >> that set the bit. >> >> And seems this could be ordered through the barrier in test_and_clear_bit(), >> and an newly added wmb when setting irq's resend flag, and icp's resend_map. > > This looks fine to me. Is there a measurable performance improvement > from this? I understand it could be hard to measure. > > Also, you could make the patch description more definite - just say > that we don't need to take the lock, there's no need for "seems”. OK :) However, we may need to ignore this one for now. To implement the P/Q stuff, we probably need make sure the resend irqs to be resent only once. It’s easier to make sure that with the lock here, and the resend flag can be cleared inside the lock. Thanks, Zhong > > Paul. > ___ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/powernv: Exclude MSI region in extended bridge window
On Tue, Jun 21, 2016 at 12:41:05PM +1000, Gavin Shan wrote: >The windows of root port and bridge behind that are extended to >the PHB's windows to accomodate the PCI hotplug happening in >future. The PHB's 64KB 32-bits MSI region is included in bridge's >M32 windows (in hardware) though it's excluded in the corresponding >resource, as the bridge's M32 windows have 1MB as their minimal >alignment. We observed EEH error during system boot when the MSI >region is included in bridge's M32 window. > >This excludes top 1MB (including 64KB 32-bits MSI region) region >from bridge's M32 windows when extending them. > >Signed-off-by: Gavin Shan>--- > arch/powerpc/platforms/powernv/pci-ioda.c | 17 - > 1 file changed, 16 insertions(+), 1 deletion(-) > Michael, I saw the PCI hotplug patches have been merged to your "test" branch. This one is the fix for EEH error found on Garrison platform. Please apply it on top of that series. Thanks, Gavin >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >b/arch/powerpc/platforms/powernv/pci-ioda.c >index bde7f76..e0a8a92 100644 >--- a/arch/powerpc/platforms/powernv/pci-ioda.c >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >@@ -3239,6 +3239,7 @@ static void pnv_pci_fixup_bridge_resources(struct >pci_bus *bus, > struct pnv_phb *phb = hose->private_data; > struct pci_dev *bridge = bus->self; > struct resource *r, *w; >+ bool msi_region = false; > int i; > > /* Check if we need apply fixup to the bridge's windows */ >@@ -3259,11 +3260,25 @@ static void pnv_pci_fixup_bridge_resources(struct >pci_bus *bus, >(type & IORESOURCE_PREFETCH) && >phb->ioda.m64_segsize) > w = >mem_resources[1]; >- else if (r->flags & type & IORESOURCE_MEM) >+ else if (r->flags & type & IORESOURCE_MEM) { > w = >mem_resources[0]; >+ msi_region = true; >+ } > > r->start = w->start; > r->end = w->end; >+ >+ /* The 64KB 32-bits MSI region shouldn't be included in >+ * the 32-bits bridge window. Otherwise, we can see strange >+ * issues. 
One of them is EEH error observed on Garrison. >+ * >+ * Exclude top 1MB region which is the minimal alignment of >+ * 32-bits bridge window. >+ */ >+ if (msi_region) { >+ r->end += 0x1; >+ r->end -= 0x10; >+ } > } > } > >-- >2.1.0 > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc/powernv: Exclude MSI region in extended bridge window
The windows of root port and bridge behind that are extended to the PHB's windows to accomodate the PCI hotplug happening in future. The PHB's 64KB 32-bits MSI region is included in bridge's M32 windows (in hardware) though it's excluded in the corresponding resource, as the bridge's M32 windows have 1MB as their minimal alignment. We observed EEH error during system boot when the MSI region is included in bridge's M32 window. This excludes top 1MB (including 64KB 32-bits MSI region) region from bridge's M32 windows when extending them. Signed-off-by: Gavin Shan--- arch/powerpc/platforms/powernv/pci-ioda.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index bde7f76..e0a8a92 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -3239,6 +3239,7 @@ static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus, struct pnv_phb *phb = hose->private_data; struct pci_dev *bridge = bus->self; struct resource *r, *w; + bool msi_region = false; int i; /* Check if we need apply fixup to the bridge's windows */ @@ -3259,11 +3260,25 @@ static void pnv_pci_fixup_bridge_resources(struct pci_bus *bus, (type & IORESOURCE_PREFETCH) && phb->ioda.m64_segsize) w = >mem_resources[1]; - else if (r->flags & type & IORESOURCE_MEM) + else if (r->flags & type & IORESOURCE_MEM) { w = >mem_resources[0]; + msi_region = true; + } r->start = w->start; r->end = w->end; + + /* The 64KB 32-bits MSI region shouldn't be included in +* the 32-bits bridge window. Otherwise, we can see strange +* issues. One of them is EEH error observed on Garrison. +* +* Exclude top 1MB region which is the minimal alignment of +* 32-bits bridge window. +*/ + if (msi_region) { + r->end += 0x1; + r->end -= 0x10; + } } } -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
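The window adjustment described in this patch comes down to two steps on the inclusive end address. Below is a sketch of the arithmetic, assuming the sizes stated in the description (64KB MSI region, 1MB minimal bridge alignment). The constants in the archived diff appear truncated, so the values here are taken from the prose rather than from the diff, and `struct window` and `exclude_msi_region()` are hypothetical names.

```c
#include <stdint.h>

#define MSI_REGION_SIZE		0x10000ULL	/* 64KB, from the description */
#define BRIDGE_MIN_ALIGN	0x100000ULL	/* 1MB, from the description */

struct window {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * The PHB resource stops 64KB short of the hardware window because the
 * MSI region is carved out of it. Add that back to reach the hardware
 * end, then drop the whole top 1MB so the bridge window stays 1MB
 * aligned and no longer covers the MSI region.
 */
static void exclude_msi_region(struct window *w)
{
	w->end += MSI_REGION_SIZE;
	w->end -= BRIDGE_MIN_ALIGN;
}
```

For a PHB resource ending at 0xFFFEFFFF, this yields a bridge window ending at 0xFFEFFFFF, which is one byte below a 1MB boundary.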
[PATCH] powerpc/powernv: Print correct PHB type names
We're initializing "IODA1" and "IODA2" PHBs though they are IODA2 and NPU PHBs as below kernel log indicates. Initializing IODA1 OPAL PHB /pciex@3fffe4070 Initializing IODA2 OPAL PHB /pciex@3fff00040 This fixes the PHB names. After it's applied, we get: Initializing IODA2 PHB (/pciex@3fffe4070) Initializing NPU PHB (/pciex@3fff00040) Signed-off-by: Gavin Shan--- arch/powerpc/platforms/powernv/pci-ioda.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index e0a8a92..7f952a6 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -55,6 +55,7 @@ #define POWERNV_IOMMU_DEFAULT_LEVELS 1 #define POWERNV_IOMMU_MAX_LEVELS 5 +static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU" }; static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl); void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, @@ -3626,7 +3627,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, void *aux; long rc; - pr_info("Initializing IODA%d OPAL PHB %s\n", ioda_type, np->full_name); + pr_info("Initializing %s PHB (%s)\n", + pnv_phb_names[ioda_type], of_node_full_name(np)); prop64 = of_get_property(np, "ibm,opal-phbid", NULL); if (!prop64) { -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
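The name table in this patch is indexed directly by the PHB type. A stand-alone sketch of the same lookup is shown below; the enum and the `phb_type_name()` helper are hypothetical, and the range check is an addition for illustration that the patch itself does not contain.

```c
#include <stddef.h>

/* Hypothetical enum; its ordering must match the name table below. */
enum phb_type {
	PHB_IODA1 = 0,
	PHB_IODA2 = 1,
	PHB_NPU = 2
};

static const char * const phb_names[] = { "IODA1", "IODA2", "NPU" };

/* Return the printable name for a PHB type, or "unknown" if out of range. */
static const char *phb_type_name(unsigned int type)
{
	if (type >= sizeof(phb_names) / sizeof(phb_names[0]))
		return "unknown";
	return phb_names[type];
}
```

A table lookup keeps the printed name tied to the type value, instead of formatting the numeric value into an "IODA%d" string.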
Re: [RESEND PATCH v2 4/4] PCI: Add support for enforcing all MMIO BARs to be page aligned
On Thu, Jun 02, 2016 at 01:46:51PM +0800, Yongji Xie wrote: > When vfio passthrough a PCI device of which MMIO BARs are > smaller than PAGE_SIZE, guest will not handle the mmio > accesses to the BARs which leads to mmio emulations in host. > > This is because vfio will not allow to passthrough one BAR's > mmio page which may be shared with other BARs. Otherwise, > there will be a backdoor that guest can use to access BARs > of other guest. > > To solve this issue, this patch modifies resource_alignment > to support syntax where multiple devices get the same > alignment. So we can use something like > "pci=resource_alignment=*:*:*.*:noresize" to enforce the > alignment of all MMIO BARs to be at least PAGE_SIZE so that > one BAR's mmio page would not be shared with other BARs. > > And we also define a macro PCIBIOS_MIN_ALIGNMENT to enable this > automatically on PPC64 platform which can easily hit this issue > because its PAGE_SIZE is 64KB. > > Note that this would not be applied to VFs whose BARs are always > page aligned and should be never reassigned according to SRIOV > spec. I see that SR-IOV spec r1.1, sec 3.3.13 requires that all VF BAR resources be aligned on System Page Size, and must be sized to consume an integral number of pages. Where does it say VF BARs can't be reassigned? I thought they *could* be reassigned, as long as VFs are disabled when you do it. > Signed-off-by: Yongji Xie> --- > Documentation/kernel-parameters.txt |2 ++ > arch/powerpc/include/asm/pci.h |2 ++ > drivers/pci/pci.c | 68 > +-- > 3 files changed, 61 insertions(+), 11 deletions(-) > > diff --git a/Documentation/kernel-parameters.txt > b/Documentation/kernel-parameters.txt > index c4802f5..cb09503 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -3003,6 +3003,8 @@ bytes respectively. Such letter suffixes can also be > entirely omitted. > aligned memory resources. > If is not specified, > PAGE_SIZE is used as alignment. 
> + , , and can be set to > + "*" which means match all values. > PCI-PCI bridge can be specified, if resource > windows need to be expanded. > noresize: Don't change the resources' sizes when > diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h > index a6f3ac0..742fd34 100644 > --- a/arch/powerpc/include/asm/pci.h > +++ b/arch/powerpc/include/asm/pci.h > @@ -28,6 +28,8 @@ > #define PCIBIOS_MIN_IO 0x1000 > #define PCIBIOS_MIN_MEM 0x1000 > > +#define PCIBIOS_MIN_ALIGNMENT PAGE_SIZE > + > struct pci_dev; > > /* Values for the `which' argument to sys_pciconfig_iobase syscall. */ > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 3ee13e5..664f295 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4759,7 +4759,12 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev, > int seg, bus, slot, func, align_order, count; > resource_size_t align = 0; > char *p; > + bool invalid = false; > > +#ifdef PCIBIOS_MIN_ALIGNMENT > + align = PCIBIOS_MIN_ALIGNMENT; > + *resize = false; > +#endif This PCIBIOS_MIN_ALIGNMENT part should be a separate patch by itself. If you have PCIBIOS_MIN_ALIGNMENT enabled automatically for powerpc, do you still need the command-line argument? 
> spin_lock(_alignment_lock); > p = resource_alignment_param; > while (*p) { > @@ -4776,16 +4781,49 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev, > } else { > align_order = -1; > } > - if (sscanf(p, "%x:%x:%x.%x%n", > - , , , , ) != 4) { > + if (p[0] == '*' && p[1] == ':') { > + seg = -1; > + count = 1; > + } else if (sscanf(p, "%x%n", , ) != 1 || > + p[count] != ':') { > + invalid = true; > + break; > + } > + p += count + 1; > + if (*p == '*') { > + bus = -1; > + count = 1; > + } else if (sscanf(p, "%x%n", , ) != 1) { > + invalid = true; > + break; > + } > + p += count; > + if (*p == '.') { > + slot = bus; > + bus = seg; > seg = 0; > - if (sscanf(p, "%x:%x.%x%n", > - , , , ) != 3) { > - /* Invalid format */ > -
Re: [RESEND PATCH v2 1/4] PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set
On 2016/6/21 9:43, Bjorn Helgaas wrote:
> On Thu, Jun 02, 2016 at 01:46:48PM +0800, Yongji Xie wrote:
>> The resource_alignment will releases memory resources allocated
>> by firmware so that kernel can reassign new resources later on.
>> But this will cause the problem that no resources can be
>> allocated by kernel if PCI_PROBE_ONLY was set, e.g. on pSeries
>> platform because PCI_PROBE_ONLY force kernel to use firmware
>> setup and not to reassign any resources.
>>
>> To solve this problem, this patch ignores resource_alignment
>> if PCI_PROBE_ONLY was set.
>>
>> Signed-off-by: Yongji Xie
>> ---
>>  drivers/pci/pci.c |6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index c8b4dbd..a259394 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4761,6 +4761,12 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
>>  	spin_lock(&resource_alignment_lock);
>>  	p = resource_alignment_param;
>>  	while (*p) {
>> +		if (pci_has_flag(PCI_PROBE_ONLY)) {
>> +			printk(KERN_ERR "PCI: Ignore resource_alignment parameter: %s with PCI_PROBE_ONLY set\n",
>> +				p);
>> +			*p = 0;
>> +			break;
>
> Wouldn't it be simpler to make pci_set_resource_alignment_param() fail
> if PCI_PROBE_ONLY is set?

I added the check here because I want to print a log message, so that users can see why resource_alignment has no effect when they pass this parameter.

Thanks,
Yongji

>> +		}
>>  		count = 0;
>>  		if (sscanf(p, "%d%n", &align_order, &count) == 1 &&
>>  		    p[count] == '@') {
>> --
>> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
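The two positions in this exchange are not mutually exclusive: the setter can fail and still log the reason. A user-space sketch of that combination is below. The `probe_only` flag and `set_alignment_param()` are hypothetical stand-ins, not the kernel's interface, and the choice of -EPERM is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PARAM_MAX 64

static bool probe_only;			/* stand-in for PCI_PROBE_ONLY (assumed) */
static char alignment_param[PARAM_MAX];

/*
 * Store the user's resource_alignment string unless the firmware setup
 * must be preserved. Returns 0 on success or a negative errno value.
 */
static int set_alignment_param(const char *buf)
{
	size_t len = strlen(buf);

	if (probe_only) {
		/* Tell the user why the parameter has no effect. */
		fprintf(stderr,
			"PCI: ignoring resource_alignment=%s: probe-only mode\n",
			buf);
		return -EPERM;
	}
	if (len >= PARAM_MAX)
		return -EINVAL;

	memcpy(alignment_param, buf, len + 1);
	return 0;
}
```

Rejecting the string at set time would mean the parse loop never has to clear it later, while the message still reaches the log.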
Re: [PATCH] powerpc/pseries: Auto online hotplugged memory
On 06/20/2016 07:57 PM, Michael Ellerman wrote:
> On Mon, 2016-06-20 at 08:51 -0500, Nathan Fontenot wrote:
>
>> Auto online hotplugged memory
>>
>> A recent update (commit id 31bc3858ea3) to the core mm hotplug code
>> introduced the memhp_auto_online variable to allow for automatically
>> onlining memory that is added.
>>
>> This patch update the pseries memory hotplug code to enable this so that
>> any memory DLPAR added to the system is automatically onlined. The code
>> to add the memory block for memory added from add_memory() is removed as
>> this is not needed, the memory_add code does this.
>
> Is this a bug fix, or just a cleanup?
>

Hmmm.. some cleanup and some new feature. The removal of the memblock_add() call is a cleanup and the setting of the memhp_auto_online variable is taking advantage of a feature I was not previously aware of.

None of this is a bug fix.

-Nathan
Re: [RESEND PATCH v2 3/4] PCI: Add a new option for resource_alignment to reassign alignment
On Thu, Jun 02, 2016 at 01:46:50PM +0800, Yongji Xie wrote: > When using resource_alignment kernel parameter, the current > implement reassigns the alignment by changing resources' size > which can potentially break some drivers. For example, the driver > uses the size to locate some register whose length is related > to the size. > > This patch adds a new option "noresize" for the parameter to > solve this problem. Why do we ever want to change the resource's size? I understand that you want to change the resource's *alignment*, and that part makes sense. But why change the *size*? Changing the resource size doesn't change the hardware BAR size; it just means the struct resource will describe a region larger than what the BAR actually claims. That unnecessarily wastes space after the BAR. This was a problem with the code even before your patch; I'm suggesting that if you have a way to change the alignment without changing the resource size, maybe we should do that all the time. Then you wouldn't need to add the "noresize" option. > Signed-off-by: Yongji Xie> --- > Documentation/kernel-parameters.txt |5 - > drivers/pci/pci.c | 35 > +-- > 2 files changed, 29 insertions(+), 11 deletions(-) > > diff --git a/Documentation/kernel-parameters.txt > b/Documentation/kernel-parameters.txt > index 82b42c9..c4802f5 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -2997,13 +2997,16 @@ bytes respectively. Such letter suffixes can also be > entirely omitted. > window. The default value is 64 megabytes. > resource_alignment= > Format: > - [ align>@][:]:.[; ...] > + [ align>@][:]:. > + [:noresize][; ...] > Specifies alignment and device to reassign > aligned memory resources. > If is not specified, > PAGE_SIZE is used as alignment. > PCI-PCI bridge can be specified, if resource > windows need to be expanded. > + noresize: Don't change the resources' sizes when > + reassigning alignment. 
> ecrc= Enable/disable PCIe ECRC (transaction layer > end-to-end CRC checking). > bios: Use BIOS/firmware settings. This is the > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index a259394..3ee13e5 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4748,11 +4748,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock); > /** > * pci_specified_resource_alignment - get resource alignment specified by > user. > * @dev: the PCI device to get > + * @resize: whether or not to change resources' size when reassigning > alignment > * > * RETURNS: Resource alignment if it is specified. > * Zero if it is not specified. > */ > -static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) > +static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, > + bool *resize) > { > int seg, bus, slot, func, align_order, count; > resource_size_t align = 0; > @@ -4786,6 +4788,11 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev) > } > } > p += count; > + if (!strncmp(p, ":noresize", 9)) { > + *resize = false; > + p += 9; > + } else > + *resize = true; > if (seg == pci_domain_nr(dev->bus) && > bus == dev->bus->number && > slot == PCI_SLOT(dev->devfn) && > @@ -4818,11 +4825,12 @@ void pci_reassigndev_resource_alignment(struct > pci_dev *dev) > { > int i; > struct resource *r; > + bool resize = true; > resource_size_t align, size; > u16 command; > > /* check if specified PCI is target device to reassign */ > - align = pci_specified_resource_alignment(dev); > + align = pci_specified_resource_alignment(dev, ); > if (!align) > return; > > @@ -4844,15 +4852,22 @@ void pci_reassigndev_resource_alignment(struct > pci_dev *dev) > if (!(r->flags & IORESOURCE_MEM)) > continue; > size = resource_size(r); > - if (size < align) { > - size = align; > - dev_info(>dev, > - "Rounding up size of resource #%d to %#llx.\n", > - i, (unsigned long long)size); > + if (resize) { > + if (size <
Re: [RESEND PATCH v2 2/4] PCI: Do not Use IORESOURCE_STARTALIGN to identify bridge resources
On Thu, Jun 02, 2016 at 01:46:49PM +0800, Yongji Xie wrote: > Now we use the IORESOURCE_STARTALIGN to identify bridge resources > in __assign_resources_sorted(). That's quite fragile. We can't > make sure that the PCI devices' resources will not use > IORESOURCE_STARTALIGN any more. Can you explain this a little more? I don't quite understand the problem. Maybe you can give an example of the problem? > In this patch, we try to use a more robust way to identify > bridge resources. > > Signed-off-by: Yongji Xie> --- > drivers/pci/setup-bus.c |9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c > index 55641a3..216ddbc 100644 > --- a/drivers/pci/setup-bus.c > +++ b/drivers/pci/setup-bus.c > @@ -390,6 +390,7 @@ static void __assign_resources_sorted(struct list_head > *head, > struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; > unsigned long fail_type; > resource_size_t add_align, align; > + int index; > > /* Check if optional add_size is there */ > if (!realloc_head || list_empty(realloc_head)) > @@ -410,11 +411,13 @@ static void __assign_resources_sorted(struct list_head > *head, > > /* >* There are two kinds of additional resources in the list: > - * 1. bridge resource -- IORESOURCE_STARTALIGN > - * 2. SR-IOV resource -- IORESOURCE_SIZEALIGN > + * 1. bridge resource > + * 2. SR-IOV resource >* Here just fix the additional alignment for bridge >*/ > - if (!(dev_res->res->flags & IORESOURCE_STARTALIGN)) > + index = dev_res->res - dev_res->dev->resource; > + if (index < PCI_BRIDGE_RESOURCES || > + index > PCI_BRIDGE_RESOURCE_END) I think the code looks OK; at least, it seems to match the comment. I'd just like to understand the problem a little better. > continue; > > add_align = get_res_add_align(realloc_head, dev_res->res); > -- > 1.7.9.5 > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [V3, 2/2] powerpc/drivers: Add driver for operator panel on FSP machines
On Thu, 16 Jun 2016 20:22:39 +1000 (AEST) Michael Ellermanwrote: > On Thu, 2016-28-04 at 07:02:38 UTC, Suraj Jitindar Singh wrote: > > Implement new character device driver to allow access from user > > space to the 2x16 character operator panel display present on IBM > > Power Systems machines with FSPs. > > I looked at this previously and somehow convinced myself it depended > on skiboot changes, but it seems it doesn't. > > Some comments below ... > > > This will allow status information to be presented on the display > > which is visible to a user. > > > > The driver implements a 32 character buffer which a user can > > read/write > > It looks like "32" is actually just one possible size, it comes from > the device tree no? Correct, although it is kind of hard coded into skiboot at the moment. I will change the commit message to omit this. > > > by accessing the device (/dev/oppanel). This buffer is then > > displayed on > > Are we sure "op_panel" wouldn't be better? Seems like that will cause less confusion, will change it. > > > diff --git a/arch/powerpc/configs/powernv_defconfig > > b/arch/powerpc/configs/powernv_defconfig index 0450310..8f9f4ce > > 100644 --- a/arch/powerpc/configs/powernv_defconfig > > +++ b/arch/powerpc/configs/powernv_defconfig > > @@ -181,6 +181,7 @@ CONFIG_SERIAL_8250=y > > CONFIG_SERIAL_8250_CONSOLE=y > > CONFIG_SERIAL_JSM=m > > CONFIG_VIRTIO_CONSOLE=m > > +CONFIG_IBM_OP_PANEL=m > > I think CONFIG_POWERNV_OP_PANEL would be a better name. I agree. 
> > > diff --git a/arch/powerpc/include/asm/opal.h > > b/arch/powerpc/include/asm/opal.h index 9d86c66..b33e349 100644 > > --- a/arch/powerpc/include/asm/opal.h > > +++ b/arch/powerpc/include/asm/opal.h > > @@ -178,6 +178,8 @@ int64_t opal_dump_ack(uint32_t dump_id); > > int64_t opal_dump_resend_notification(void); > > > > int64_t opal_get_msg(uint64_t buffer, uint64_t size); > > +int64_t opal_write_oppanel_async(uint64_t token, oppanel_line_t > > *lines, > > + uint64_t num_lines); > > I realise you're just following the skiboot code which uses > oppanel_line_t, but please don't do that in the kernel. Just use > struct oppanel_line directly. Struct oppanel_line is typedefed to oppanel_line_t in opal-api.h, so this should be oppanel_line_t or struct oppanel_line? > > > diff --git a/drivers/char/Makefile b/drivers/char/Makefile > > index d8a7579..a02c61b 100644 > > --- a/drivers/char/Makefile > > +++ b/drivers/char/Makefile > > @@ -60,3 +60,4 @@ js-rtc-y = rtc.o > > > > obj-$(CONFIG_TILE_SROM)+= tile-srom.o > > obj-$(CONFIG_XILLYBUS) += xillybus/ > > +obj-$(CONFIG_IBM_OP_PANEL) += op-panel-powernv.o > > I'd prefer powernv-op-panel.c, but up to you. This will align to the name of the config option, so will change to your recommendation > > > diff --git a/drivers/char/op-panel-powernv.c > > b/drivers/char/op-panel-powernv.c new file mode 100644 > > index 000..90b74b7 > > --- /dev/null > > +++ b/drivers/char/op-panel-powernv.c > > @@ -0,0 +1,247 @@ > > +/* > > + * OPAL Operator Panel Display Driver > > + * > > + * (C) Copyright IBM Corp. 2016 > > + * > > + * Author: Suraj Jitindar Singh > > I'm not a fan of email addresses in C files, they just bit rot. > > The preferred format is: > > * Copyright 2016, Suraj Jitindar Singh, IBM Corporation. 
> > > + * > > + * This program is free software; you can redistribute it and/or > > modify > > + * it under the terms of the GNU General Public License as > > published by > > + * the Free Software Foundation; either version 2 of the License, > > or > > + * (at your option) any later version. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > We don't need that paragraph in every file. > Will update and remove these sections. > > + */ > > + > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > opal-api.h is sort of an implementation detail, you should just > include opal.h > > > +/* > > + * This driver creates a character device (/dev/oppanel) which > > exposes the > > + * operator panel (2x16 character LCD display) on IBM Power > > Systems machines > > + * with FSPs. > > + * A 32 character buffer written to the device will be displayed > > on the > > + * operator panel. > > + */ > > + > > +static DEFINE_MUTEX(oppanel_mutex); > > + > > +static oppanel_line_t *oppanel_lines; > > +static char*oppanel_data; > > +static u32 line_length, num_lines; > > You calculate (num_lines *
[PATCH 6/6] IMA: Demonstration code for kexec buffer passing.
This shows how kernel code can use the kexec buffer passing mechanism to pass information to the next kernel. This patch is not intended to be committed. Signed-off-by: Thiago Jung Bauermann--- include/linux/ima.h | 11 + kernel/kexec_file.c | 4 ++ security/integrity/ima/ima.h | 5 +++ security/integrity/ima/ima_init.c | 26 security/integrity/ima/ima_template.c | 79 +++ 5 files changed, 125 insertions(+) diff --git a/include/linux/ima.h b/include/linux/ima.h index 0eb7c2e7f0d6..96528d007139 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -11,6 +11,7 @@ #define _LINUX_IMA_H #include +#include struct linux_binprm; #ifdef CONFIG_IMA @@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void *buf, loff_t size, enum kernel_read_file_id id); extern void ima_post_path_mknod(struct dentry *dentry); +#ifdef CONFIG_KEXEC_FILE +extern void ima_add_kexec_buffer(struct kimage *image); +#endif + #else static inline int ima_bprm_check(struct linux_binprm *bprm) { @@ -60,6 +65,12 @@ static inline void ima_post_path_mknod(struct dentry *dentry) return; } +#ifdef CONFIG_KEXEC_FILE +static inline void ima_add_kexec_buffer(struct kimage *image) +{ +} +#endif + #endif /* CONFIG_IMA */ #ifdef CONFIG_IMA_APPRAISE diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 79d09a7784d8..143c70d2ef1c 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -261,6 +262,9 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, } } + /* IMA needs to pass the measurement list to the next kernel. 
*/ + ima_add_kexec_buffer(image); + /* Call arch image load handlers */ ldata = arch_kexec_kernel_image_load(image); diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index d3a939bf2781..940f68f3ccc9 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -101,6 +101,11 @@ struct ima_queue_entry { }; extern struct list_head ima_measurements; /* list of all measurements */ +#ifdef CONFIG_KEXEC_FILE +extern void *kexec_buffer; +extern size_t kexec_buffer_size; +#endif + /* Internal IMA function definitions */ int ima_init(void); int ima_fs_init(void); diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c index 5d679a685616..aaa2fc536ca4 100644 --- a/security/integrity/ima/ima_init.c +++ b/security/integrity/ima/ima_init.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "ima.h" @@ -103,6 +104,29 @@ void __init ima_load_x509(void) } #endif +#ifdef CONFIG_KEXEC_FILE +static void ima_load_kexec_buffer(void) +{ + int rc; + + /* Fetch the buffer from the previous kernel, if any. */ + rc = kexec_get_handover_buffer(_buffer, _buffer_size); + if (rc == 0) { + /* Demonstrate that buffer handover works. 
*/ + pr_err("kexec buffer contents: %s\n", (char *) kexec_buffer); + pr_err("kexec buffer contents after update: %s\n", + (char *) kexec_buffer + 4 * PAGE_SIZE + 10); + + kexec_free_handover_buffer(); + } else if (rc == -ENOENT) + pr_debug("No kexec buffer from the previous kernel.\n"); + else + pr_debug("Error restoring kexec buffer: %d\n", rc); +} +#else +static void ima_load_kexec_buffer(void) { } +#endif + int __init ima_init(void) { u8 pcr_i[TPM_DIGEST_SIZE]; @@ -133,5 +157,7 @@ int __init ima_init(void) ima_init_policy(); + ima_load_kexec_buffer(); + return ima_fs_init(); } diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c index febd12ed9b55..c5e81af8cb9c 100644 --- a/security/integrity/ima/ima_template.c +++ b/security/integrity/ima/ima_template.c @@ -15,6 +15,8 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include #include "ima.h" #include "ima_template_lib.h" @@ -182,6 +184,83 @@ static int template_desc_init_fields(const char *template_fmt, return 0; } +#ifdef CONFIG_KEXEC_FILE +void *kexec_buffer = NULL; +size_t kexec_buffer_size = 0; + +/* Physical address of the measurement buffer in the next kernel. */ +unsigned long kexec_buffer_load_addr = 0; + +/* + * Called during reboot. IMA can add here new events that were generated after + * the kexec image was loaded. + */ +static int ima_update_kexec_buffer(struct notifier_block *self, + unsigned long action, void *data) +{ + int ret; + + if (!kexec_in_progress) + return NOTIFY_OK; + + /* +* Add content deep in the buffer to show that we can update +
[PATCH 5/6] kexec: Share logic to copy segment page contents.
Make kimage_load_normal_segment and kexec_update_segment share code which they currently duplicate. Signed-off-by: Thiago Jung Bauermann--- kernel/kexec_core.c | 159 +++- 1 file changed, 95 insertions(+), 64 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 8781d3e4479d..281d8b961fb4 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -700,6 +700,65 @@ static struct page *kimage_alloc_page(struct kimage *image, return page; } +struct kimage_update_buffer_state { + /* Destination memory address currently being copied to. */ + unsigned long maddr; + + /* Bytes in buffer still left to copy. */ + size_t ubytes; + + /* Bytes in memory still left to copy. */ + size_t mbytes; + + /* If true, copy from kbuf. */ + bool from_kernel; + + /* Clear pages before copying? */ + bool clear_pages; + + /* Buffer position to continue copying from. */ + const unsigned char *kbuf; + const unsigned char __user *buf; +}; + +static int kimage_update_page(struct page *page, + struct kimage_update_buffer_state *state) +{ + char *ptr; + int result = 0; + size_t uchunk, mchunk; + + ptr = kmap(page); + + /* Start with a clear page */ + if (state->clear_pages) + clear_page(ptr); + + ptr += state->maddr & ~PAGE_MASK; + mchunk = min_t(size_t, state->mbytes, + PAGE_SIZE - (state->maddr & ~PAGE_MASK)); + uchunk = min(state->ubytes, mchunk); + + if (state->from_kernel) + memcpy(ptr, state->kbuf, uchunk); + else + result = copy_from_user(ptr, state->buf, uchunk); + + kunmap(page); + if (result) + return -EFAULT; + + state->ubytes -= uchunk; + state->maddr += mchunk; + if (state->from_kernel) + state->kbuf += mchunk; + else + state->buf += mchunk; + state->mbytes -= mchunk; + + return 0; +} + /** * kexec_update_segment - update the contents of a kimage segment * @buffer:New contents of the segment. 
@@ -718,6 +777,7 @@ int kexec_update_segment(const char *buffer, unsigned long bufsz, unsigned long entry; unsigned long *ptr = NULL; void *dest = NULL; + struct kimage_update_buffer_state state; for (i = 0; i < kexec_image->nr_segments; i++) /* We only support updating whole segments. */ @@ -736,8 +796,15 @@ int kexec_update_segment(const char *buffer, unsigned long bufsz, return -EINVAL; } - for (entry = kexec_image->head; !(entry & IND_DONE) && memsz; -entry = *ptr++) { + state.maddr = load_addr; + state.ubytes = bufsz; + state.mbytes = memsz; + state.kbuf = buffer; + state.from_kernel = true; + state.clear_pages = false; + + for (entry = kexec_image->head; !(entry & IND_DONE) && + state.mbytes; entry = *ptr++) { void *addr = (void *) (entry & PAGE_MASK); switch (entry & IND_FLAGS) { @@ -754,26 +821,13 @@ int kexec_update_segment(const char *buffer, unsigned long bufsz, return -EINVAL; } - if (dest == (void *) load_addr) { - struct page *page; - char *ptr; - size_t uchunk, mchunk; - - page = kmap_to_page(addr); - - ptr = kmap(page); - ptr += load_addr & ~PAGE_MASK; - mchunk = min_t(size_t, memsz, - PAGE_SIZE - (load_addr & ~PAGE_MASK)); - uchunk = min(bufsz, mchunk); - memcpy(ptr, buffer, uchunk); - - kunmap(page); + if (dest == (void *) state.maddr) { + int ret; - bufsz -= uchunk; - load_addr += mchunk; - buffer += mchunk; - memsz -= mchunk; + ret = kimage_update_page(kmap_to_page(addr), +); + if (ret) + return ret; } dest += PAGE_SIZE; } @@ -791,31 +845,30 @@ int kexec_update_segment(const char *buffer, unsigned long bufsz, static int kimage_load_normal_segment(struct kimage *image, struct kexec_segment *segment) { - unsigned
[PATCH 4/6] kexec_file: Add mechanism to update kexec segments.
kexec_update_segment allows a given segment in kexec_image to have
its contents updated. This is useful if the current kernel wants to
send information to the next kernel that is up-to-date at the time of
reboot.

Signed-off-by: Thiago Jung Bauermann
---
 include/linux/kexec.h |  2 ++
 kernel/kexec_core.c   | 88 +++
 kernel/kexec_file.c   |  1 +
 3 files changed, 91 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 131b1fc7820e..14d4ac070a8c 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -222,6 +222,8 @@ extern int kexec_add_buffer(struct kimage *image, char *buffer,
 			    unsigned long buf_align, unsigned long buf_min,
 			    unsigned long buf_max, bool top_down,
 			    bool checksum, unsigned long *load_addr);
+int kexec_update_segment(const char *buffer, unsigned long bufsz,
+			 unsigned long load_addr, unsigned long memsz);
 extern struct page *kimage_alloc_control_pages(struct kimage *image,
 					       unsigned int order);
 extern int kexec_load_purgatory(struct kimage *image, unsigned long min,
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 56b3ed0927b0..8781d3e4479d 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -700,6 +700,94 @@ static struct page *kimage_alloc_page(struct kimage *image,
 	return page;
 }
 
+/**
+ * kexec_update_segment - update the contents of a kimage segment
+ * @buffer:	New contents of the segment.
+ * @bufsz:	@buffer size.
+ * @load_addr:	Segment's physical address in the next kernel.
+ * @memsz:	Segment size.
+ *
+ * This function assumes kexec_mutex is held.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_update_segment(const char *buffer, unsigned long bufsz,
+			 unsigned long load_addr, unsigned long memsz)
+{
+	int i;
+	unsigned long entry;
+	unsigned long *ptr = NULL;
+	void *dest = NULL;
+
+	for (i = 0; i < kexec_image->nr_segments; i++)
+		/* We only support updating whole segments. */
+		if (load_addr == kexec_image->segment[i].mem &&
+		    memsz == kexec_image->segment[i].memsz) {
+			if (kexec_image->segment[i].do_checksum) {
+				pr_err("Trying to update non-modifiable segment.\n");
+				return -EINVAL;
+			}
+
+			break;
+		}
+	if (i == kexec_image->nr_segments) {
+		pr_err("Couldn't find segment to update: 0x%lx, size 0x%lx\n",
+		       load_addr, memsz);
+		return -EINVAL;
+	}
+
+	for (entry = kexec_image->head; !(entry & IND_DONE) && memsz;
+	     entry = *ptr++) {
+		void *addr = (void *) (entry & PAGE_MASK);
+
+		switch (entry & IND_FLAGS) {
+		case IND_DESTINATION:
+			dest = addr;
+			break;
+		case IND_INDIRECTION:
+			ptr = __va(addr);
+			break;
+		case IND_SOURCE:
+			/* Shouldn't happen, but verify just to be safe. */
+			if (dest == NULL) {
+				pr_err("Invalid kexec entries list.");
+				return -EINVAL;
+			}
+
+			if (dest == (void *) load_addr) {
+				struct page *page;
+				char *ptr;
+				size_t uchunk, mchunk;
+
+				page = kmap_to_page(addr);
+
+				ptr = kmap(page);
+				ptr += load_addr & ~PAGE_MASK;
+				mchunk = min_t(size_t, memsz,
+					       PAGE_SIZE - (load_addr & ~PAGE_MASK));
+				uchunk = min(bufsz, mchunk);
+				memcpy(ptr, buffer, uchunk);
+
+				kunmap(page);
+
+				bufsz -= uchunk;
+				load_addr += mchunk;
+				buffer += mchunk;
+				memsz -= mchunk;
+			}
+			dest += PAGE_SIZE;
+		}
+
+		/* Shouldn't happen, but verify just to be safe. */
+		if (ptr == NULL) {
+			pr_err("Invalid kexec entries list.");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int kimage_load_normal_segment(struct kimage *image,
 					 struct kexec_segment *segment)
 {
diff --git a/kernel/kexec_file.c
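Editorial sketch, not part of the patch: the per-page chunk arithmetic
that kexec_update_segment performs (mchunk limited by what is left of the
page and of the segment, uchunk limited by what is left of the source
buffer) can be tried out in userspace. Everything named sketch_* or
SKETCH_* below is hypothetical, the 4096-byte page size is an assumption,
and a plain array stands in for the kmap()ed page. Unlike the patch, the
sketch advances the source pointer only by the bytes actually copied.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Hypothetical stand-in for the values the patch updates while copying. */
struct sketch_copy_state {
	unsigned long maddr;		/* destination address being written */
	size_t ubytes;			/* bytes of source buffer still to copy */
	size_t mbytes;			/* bytes of segment memory still to cover */
	const unsigned char *buf;	/* current position in the source buffer */
};

/*
 * Copy at most one page worth of data into 'page', which models the page
 * that contains st->maddr. Returns the number of segment bytes covered.
 */
static size_t sketch_update_page(unsigned char *page,
				 struct sketch_copy_state *st)
{
	size_t offset = st->maddr & (SKETCH_PAGE_SIZE - 1);
	size_t mchunk = SKETCH_PAGE_SIZE - offset;	/* room left in page */
	size_t uchunk;

	if (mchunk > st->mbytes)
		mchunk = st->mbytes;
	uchunk = st->ubytes < mchunk ? st->ubytes : mchunk;

	memcpy(page + offset, st->buf, uchunk);

	st->ubytes -= uchunk;
	st->mbytes -= mchunk;
	st->maddr += mchunk;
	st->buf += uchunk;	/* the patch advances by mchunk instead */

	return mchunk;
}
```

Calling sketch_update_page() once per destination page until st->mbytes
reaches zero mirrors the loop over IND_SOURCE entries, under the
assumptions stated above.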
[PATCH 3/6] kexec_file: Allow skipping checksum calculation for some segments.
Adds checksum argument to kexec_add_buffer specifying whether the given segment should be part of the checksum calculation. The next patch will add a way to update segments after a kimage is loaded. Segments that will be updated in this way should not be checksummed, otherwise they will cause the purgatory checksum verification to fail when the machine is rebooted. As a bonus, we don't need to special-case the purgatory segment anymore to avoid checksumming it. Adjust call sites for the new argument. Signed-off-by: Thiago Jung Bauermann--- arch/powerpc/kernel/kexec_elf_64.c | 6 +++--- arch/x86/kernel/crash.c| 4 ++-- arch/x86/kernel/kexec-bzimage64.c | 6 +++--- include/linux/kexec.h | 7 +-- kernel/kexec_file.c| 22 +++--- 5 files changed, 24 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c index 5d2b7036fee7..abbad484d7b2 100644 --- a/arch/powerpc/kernel/kexec_elf_64.c +++ b/arch/powerpc/kernel/kexec_elf_64.c @@ -311,7 +311,7 @@ static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr, (char *) elf_info->buffer + phdr->p_offset, size, phdr->p_memsz, phdr->p_align, phdr->p_paddr + base, ppc64_rma_size, - false, _addr); + false, true, _addr); if (ret) goto out; @@ -487,7 +487,7 @@ void *elf64_load(struct kimage *image, char *kernel_buf, if (initrd != NULL) { ret = kexec_add_buffer(image, initrd, initrd_len, initrd_len, PAGE_SIZE, 0, ppc64_rma_size, false, - _load_addr); + true, _load_addr); if (ret) goto out; @@ -564,7 +564,7 @@ void *elf64_load(struct kimage *image, char *kernel_buf, fdt_pack(fdt); ret = kexec_add_buffer(image, fdt, fdt_size, fdt_size, PAGE_SIZE, 0, - ppc64_rma_size, true, _load_addr); + ppc64_rma_size, true, true, _load_addr); if (ret) goto out; diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 9ef978d69c22..c8b16f2ca321 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -643,7 +643,7 @@ int crash_load_segments(struct kimage *image) */ 
ret = kexec_add_buffer(image, (char *)_zero_bytes, sizeof(crash_zero_bytes), src_sz, - PAGE_SIZE, 0, -1, 0, + PAGE_SIZE, 0, -1, false, true, >arch.backup_load_addr); if (ret) return ret; @@ -660,7 +660,7 @@ int crash_load_segments(struct kimage *image) image->arch.elf_headers_sz = elf_sz; ret = kexec_add_buffer(image, (char *)elf_addr, elf_sz, elf_sz, - ELF_CORE_HEADER_ALIGN, 0, -1, 0, + ELF_CORE_HEADER_ALIGN, 0, -1, false, true, >arch.elf_load_addr); if (ret) { vfree((void *)image->arch.elf_headers); diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index f2356bda2b05..f9016be44da6 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -420,7 +420,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel, ret = kexec_add_buffer(image, (char *)params, params_misc_sz, params_misc_sz, 16, MIN_BOOTPARAM_ADDR, - ULONG_MAX, 1, _load_addr); + ULONG_MAX, true, true, _load_addr); if (ret) goto out_free_params; pr_debug("Loaded boot_param, command line and misc at 0x%lx bufsz=0x%lx memsz=0x%lx\n", @@ -434,7 +434,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel, ret = kexec_add_buffer(image, kernel_buf, kernel_bufsz, kernel_memsz, kernel_align, - MIN_KERNEL_LOAD_ADDR, ULONG_MAX, 1, + MIN_KERNEL_LOAD_ADDR, ULONG_MAX, true, true, _load_addr); if (ret) goto out_free_params; @@ -446,7 +446,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel, if (initrd) { ret = kexec_add_buffer(image, initrd, initrd_len, initrd_len, PAGE_SIZE, MIN_INITRD_LOAD_ADDR, - ULONG_MAX, 1, _load_addr); + ULONG_MAX,
[PATCH 2/6] powerpc: kexec_file: Add buffer hand-over support for the next kernel
The buffer hand-over mechanism allows the currently running kernel to pass data to kernel that will be kexec'd via a kexec segment. The second kernel can check whether the previous kernel sent data and retrieve it. This is the architecture-specific part. Signed-off-by: Thiago Jung Bauermann--- arch/powerpc/include/asm/kexec.h | 9 + arch/powerpc/kernel/kexec_elf_64.c | 44 +++ arch/powerpc/kernel/machine_kexec_64.c | 64 ++ 3 files changed, 117 insertions(+) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index a46f5f45570c..9b1ff59bc188 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -55,6 +55,15 @@ typedef void (*crash_shutdown_t)(void); #ifdef CONFIG_KEXEC +#ifdef CONFIG_KEXEC_FILE +#define ARCH_HAS_KIMAGE_ARCH + +struct kimage_arch { + phys_addr_t handover_buffer_addr; + unsigned long handover_buffer_size; +}; +#endif + /* * This function is responsible for capturing register states if coming * via panic or invoking dump using sysrq-trigger. diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c index 4e71595300ed..5d2b7036fee7 100644 --- a/arch/powerpc/kernel/kexec_elf_64.c +++ b/arch/powerpc/kernel/kexec_elf_64.c @@ -96,6 +96,46 @@ static int elf64_probe(const char *buf, unsigned long len) return elf_check_arch()? 0 : -ENOEXEC; } +static int setup_handover_buffer(struct kimage *image, void *fdt, +int chosen_node) +{ + int ret; + + if (image->arch.handover_buffer_addr) { + ret = fdt_setprop_u64(fdt, chosen_node, + "linux,kexec-handover-buffer-start", + image->arch.handover_buffer_addr); + if (ret < 0) { + pr_err("Error setting up the new device tree.\n"); + return -EINVAL; + } + + /* -end is the first address after the buffer. 
*/ + ret = fdt_setprop_u64(fdt, chosen_node, + "linux,kexec-handover-buffer-end", + image->arch.handover_buffer_addr + + image->arch.handover_buffer_size); + if (ret < 0) { + pr_err("Error setting up the new device tree.\n"); + return -EINVAL; + } + + ret = fdt_add_mem_rsv(fdt, image->arch.handover_buffer_addr, + image->arch.handover_buffer_size); + if (ret) { + pr_err("Error reserving kexec handover buffer: %s\n", + fdt_strerror(ret)); + return -EINVAL; + } + + pr_debug("kexec handover buffer at 0x%llx, size = 0x%lx\n", +image->arch.handover_buffer_addr, +image->arch.handover_buffer_size); + } + + return 0; +} + static bool find_debug_console(void *fdt, int chosen_node) { int len; @@ -494,6 +534,10 @@ void *elf64_load(struct kimage *image, char *kernel_buf, } } + ret = setup_handover_buffer(image, fdt, chosen_node); + if (ret) + goto out; + ret = fdt_setprop(fdt, chosen_node, "linux,booted-from-kexec", NULL, 0); if (ret) { pr_err("Error setting up the new device tree.\n"); diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 43e8185ab6f7..c582abf726f5 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -481,6 +482,69 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image) return image->fops->cleanup(image->image_loader_data); } +bool kexec_can_hand_over_buffer(void) +{ + return true; +} + +int arch_kexec_add_handover_buffer(struct kimage *image, + unsigned long load_addr, unsigned long size) +{ + image->arch.handover_buffer_addr = load_addr; + image->arch.handover_buffer_size = size; + + return 0; +} + +int kexec_get_handover_buffer(void **addr, unsigned long *size) +{ + int chosen_node; + int startsz, endsz; + const void *startp, *endp; + unsigned long start_addr, end_addr; + + chosen_node = fdt_path_offset(initial_boot_params, "/chosen"); + if (chosen_node < 0) { + pr_err("Malformed device 
tree: /chosen not found.\n"); + return -EINVAL; + } + + startp = of_get_flat_dt_prop(chosen_node, +"linux,kexec-handover-buffer-start", +);
[PATCH 1/6] kexec_file: Add buffer hand-over support for the next kernel
The buffer hand-over mechanism allows the currently running kernel to pass data to kernel that will be kexec'd via a kexec segment. The second kernel can check whether the previous kernel sent data and retrieve it. This is the architecture-independent part of the feature. Signed-off-by: Thiago Jung Bauermann--- include/linux/kexec.h | 40 ++ kernel/kexec_file.c | 79 +++ 2 files changed, 119 insertions(+) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index a08cd986b5a1..72db95c623b3 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -325,6 +325,46 @@ int __weak arch_kexec_walk_mem(unsigned int image_type, unsigned long start, void arch_kexec_protect_crashkres(void); void arch_kexec_unprotect_crashkres(void); +#ifdef CONFIG_KEXEC_FILE +bool __weak kexec_can_hand_over_buffer(void); +int __weak arch_kexec_add_handover_buffer(struct kimage *image, + unsigned long load_addr, + unsigned long size); +int kexec_add_handover_buffer(struct kimage *image, void *buffer, + unsigned long bufsz, unsigned long memsz, + unsigned long buf_align, unsigned long buf_min, + unsigned long buf_max, bool top_down, + unsigned long *load_addr); +int __weak kexec_get_handover_buffer(void **addr, unsigned long *size); +int __weak kexec_free_handover_buffer(void); +#else +static inline bool kexec_can_hand_over_buffer(void) +{ + return false; +} + +static inline int kexec_add_handover_buffer(struct kimage *image, void *buffer, + unsigned long bufsz, + unsigned long memsz, + unsigned long buf_align, + unsigned long buf_min, + unsigned long buf_max, + bool top_down, bool checksum, + unsigned long *load_addr) +{ + return -ENOTSUPP; +} + +static inline int kexec_get_handover_buffer(void **addr, unsigned long *size) +{ + return -ENOTSUPP; +} + +static inline int kexec_free_handover_buffer(void) +{ + return -ENOTSUPP; +} +#endif /* CONFIG_KEXEC_FILE */ #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; struct task_struct; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c 
index 3e494261d32a..d6ba702654f5 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -113,6 +113,85 @@ void kimage_file_post_load_cleanup(struct kimage *image) image->image_loader_data = NULL; } +/** + * kexec_can_hand_over_buffer - can we pass data to the kexec'd kernel? + */ +bool __weak kexec_can_hand_over_buffer(void) +{ + return false; +} + +/** + * arch_kexec_add_handover_buffer - do arch-specific steps to handover buffer + * + * Architectures should use this function to pass on the handover buffer + * information to the next kernel. + * + * Return: 0 on success, negative errno on error. + */ +int __weak arch_kexec_add_handover_buffer(struct kimage *image, + unsigned long load_addr, + unsigned long size) +{ + return -ENOTSUPP; +} + +/** + * kexec_add_handover_buffer - add buffer to be used by the next kernel + * @image: kexec image to add buffer to. + * @buffer:Contents of the handover buffer. + * @bufsz: @buffer size. + * @memsz: Handover buffer size in memory. + * @buf_align: Buffer alignment restriction. + * @buf_min: Minimum address where buffer can be placed. + * @buf_max: Maximum address where buffer can be placed. + * @top_down: Find the highest available memory position for the buffer? + * @load_addr: On successful return, set to the physical memory address of the + * buffer in the next kernel. + * + * This function assumes that kexec_mutex is held. + * + * Return: 0 on success, negative errno on error. 
+ */ +int kexec_add_handover_buffer(struct kimage *image, void *buffer, + unsigned long bufsz, unsigned long memsz, + unsigned long buf_align, unsigned long buf_min, + unsigned long buf_max, bool top_down, + unsigned long *load_addr) +{ + int ret; + + if (!kexec_can_hand_over_buffer()) + return -ENOTSUPP; + + ret = kexec_add_buffer(image, buffer, bufsz, memsz, buf_align, buf_min, + buf_max, top_down, load_addr); + if (ret) + return ret; + + return arch_kexec_add_handover_buffer(image, *load_addr, memsz); +} + +/** + * kexec_get_handover_buffer - get the handover buffer from the
[PATCH 0/6] kexec_file: Add buffer hand-over for the next kernel
Hello,

This patch series implements a mechanism which allows the kernel to pass on
a buffer to the kernel that will be kexec'd. This buffer is passed as a
segment which is added to the kimage when it is being prepared by
kexec_file_load.

How the second kernel is informed of this buffer is architecture-specific.
On PowerPC, this is done via the device tree, by checking the properties
/chosen/linux,kexec-handover-buffer-start and
/chosen/linux,kexec-handover-buffer-end, which is analogous to how the
kernel finds the initrd.

This feature was implemented because the Integrity Measurement Architecture
subsystem needs to preserve its measurement list across the kexec reboot.
This is so that IMA can implement trusted boot support on the OpenPower
platform, because on such systems an intermediary Linux instance running as
part of the firmware is used to boot the target operating system via kexec.
Using this mechanism, IMA on this intermediary instance can hand over to the
target OS the measurements of the components that were used to boot it.

Because there could be additional measurement events between the
kexec_file_load call and the actual reboot, IMA needs a way to update the
buffer with those additional events before rebooting. One can minimize
the interval between the kexec_file_load and the reboot syscalls, but as
small as it can be, there is always the possibility that the measurement
list will be out of date at the time of reboot.

To address this issue, this patch series also introduces
kexec_update_segment, which allows a reboot notifier to change the
contents of the image segment during the reboot process.

There's one patch which makes kimage_load_normal_segment and
kexec_update_segment share code. It's not much code that they can share
though, so I'm not sure if it's worth including this patch.

The last patch is not intended to be merged, it just demonstrates how
this feature can be used.

This series applies on top of v2 of the "kexec_file_load implementation
for PowerPC" patch series at:

http://lists.infradead.org/pipermail/kexec/2016-June/016078.html

Thiago Jung Bauermann (6):
  kexec_file: Add buffer hand-over support for the next kernel
  powerpc: kexec_file: Add buffer hand-over support for the next kernel
  kexec_file: Allow skipping checksum calculation for some segments.
  kexec_file: Add mechanism to update kexec segments.
  kexec: Share logic to copy segment page contents.
  IMA: Demonstration code for kexec buffer passing.

 arch/powerpc/include/asm/kexec.h       |   9 ++
 arch/powerpc/kernel/kexec_elf_64.c     |  50 +++-
 arch/powerpc/kernel/machine_kexec_64.c |  64 ++
 arch/x86/kernel/crash.c                |   4 +-
 arch/x86/kernel/kexec-bzimage64.c      |   6 +-
 include/linux/ima.h                    |  11 ++
 include/linux/kexec.h                  |  47 +++-
 kernel/kexec_core.c                    | 205 ++---
 kernel/kexec_file.c                    | 102 ++--
 security/integrity/ima/ima.h           |   5 +
 security/integrity/ima/ima_init.c      |  26 +
 security/integrity/ima/ima_template.c  |  79 +
 12 files changed, 547 insertions(+), 61 deletions(-)

-- 
1.9.1
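Editorial sketch, not part of the series: the cover letter says the next
kernel learns about the buffer from the -start and -end properties under
/chosen, and patch 2/6 notes that -end is the first address after the
buffer. Assuming both values have already been read as 64-bit numbers,
turning them into an address and a size might look like the following.
The helper name and the choice of -EINVAL for an empty or inverted range
are assumptions made here; -ENOENT for a missing property follows the
way the demonstration patch treats "no buffer from the previous kernel".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: startp/endp point at the values of the two /chosen
 * properties, or are NULL when the property is absent.
 */
static int sketch_get_handover_range(const uint64_t *startp,
				     const uint64_t *endp,
				     uint64_t *addr, uint64_t *size)
{
	/* No properties means the previous kernel passed nothing. */
	if (startp == NULL || endp == NULL)
		return -ENOENT;

	/* -end is the first address after the buffer, so it must be larger. */
	if (*endp <= *startp)
		return -EINVAL;

	*addr = *startp;
	*size = *endp - *startp;

	return 0;
}
```

A caller would treat -ENOENT as the normal "nothing was handed over" case
and anything else non-zero as a malformed device tree.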
Re: [PATCH 0/3] cxlflash: Shutdown notification and reset patch
> "Uma" == Uma Krishnan writes:

Uma> This patch set contains support to notify CXL Flash devices of an
Uma> impending shutdown and a fix to drain operations prior to a reset.
Uma> This series is intended for 4.8 and is bisectable.

Applied to 4.8/scsi-queue.

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
Hi Arnd,

> Something like the (untested) patch below, similar to how we
> already handle the word size and how some other architectures
> handle setting __BIG_ENDIAN__.

I tested this by reverting Michael's patch and applying yours. Not only
does it successfully fix the errors that patch fixes, it manages to
clean up the following four errors as well:

+/scratch/dja/linux/arch/powerpc/lib/sstep.c:371:32: error: cast from unknown type
+/scratch/dja/linux/arch/powerpc/lib/sstep.c:371:59: error: using member 'word' in incomplete struct
+/scratch/dja/linux/arch/powerpc/lib/sstep.c:411:32: error: cast from unknown type
+/scratch/dja/linux/arch/powerpc/lib/sstep.c:411:59: error: using member 'word' in incomplete struct

So:
Tested-by: Daniel Axtens

(I think the patch also needs your sign-off.)

Regards,
Daniel
Re: unrecoverable exception on G5 with CONFIG_PPC_EARLY_DEBUG enabled
On Mon, 2016-06-20 at 15:51 +0530, Aneesh Kumar K.V wrote:
> Michael Ellerman writes:
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> > index 4c9440629128..8bcc1b457115 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
> >  	lwz	r9,PACA_EXSLB+EX_CCR(r13)	/* get saved CR */
> >
> >  	mtlr	r10
> > -BEGIN_MMU_FTR_SECTION
> > -	b	2f
> > -END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
> >  	andi.	r10,r12,MSR_RI	/* check for unrecoverable exception */
> > +BEGIN_MMU_FTR_SECTION
> >  	beq-	2f
> > +FTR_SECTION_ELSE
> > +	b	2f
> > +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX)
> >
> >  	.machine	push
> >  	.machine	"power4"
>
> I sent a patch which should get this problem fixed.
>
> http://mid.gmane.org/1466274479-5650-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com

Well s/fixed/avoided/.

I'd rather we fixed the root cause, which is that the SLB miss handler is
broken until code patching happens.

When possible we should write feature sections so that the unpatched code
is functional, to avoid problems like this.

cheers
Re: [PATCH] powerpc/pseries: Auto online hotplugged memory
On Mon, 2016-06-20 at 08:51 -0500, Nathan Fontenot wrote:
> Auto online hotplugged memory
>
> A recent update (commit id 31bc3858ea3) to the core mm hotplug code
> introduced the memhp_auto_online variable to allow for automatically
> onlining memory that is added.
>
> This patch updates the pseries memory hotplug code to enable this so that
> any memory DLPAR added to the system is automatically onlined. The code
> to add the memory block for memory added from add_memory() is removed as
> this is not needed, the memory_add code does this.

Is this a bug fix, or just a cleanup?

cheers
Re: [PATCH] powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
On Fri, 2016-06-17 at 12:46 +0200, Arnd Bergmann wrote:
> On Friday, June 17, 2016 1:35:35 PM CEST Daniel Axtens wrote:
> > > It would be better to fix the sparse compilation so the same endianness
> > > is set that you get when calling gcc.
> >
> > I will definitely work on a patch to sparse! I'd still like this or
> > something like it to go in though, so we can keep working on reducing
> > the sparse warning count while the sparse patch is in the works.
>
> I think you just need to fix the Makefile so it sets the right
> arguments when calling sparse.
>
> Something like the (untested) patch below, similar to how we
> already handle the word size and how some other architectures
> handle setting __BIG_ENDIAN__.

Yep that's clearly better. I didn't know we had separate CHECKER_FLAGS.

Daniel can you test that?

Arnd we'll add Suggested-by: you, or send a SOB if you like?

cheers
Re: powerpc/asm: Remove unused symbols in asm-offsets.c
On Wed, 2016-01-06 at 22:56:47 UTC, Rashmica Gupta wrote:
> THREAD_DSCR: Added in commit efcac6589a27 ("powerpc: Per process DSCR + ...
>
> Signed-off-by: Rashmica Gupta

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aac6a91fea93e6bdd7ac20365d

cheers
Re: powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
On Thu, 2016-16-06 at 12:33:41 UTC, Michael Ellerman wrote:
> From: Daniel Axtens
>
> Sparse complains that it doesn't know what REG_BYTE is:
>
> arch/powerpc/kernel/align.c:313:29: error: undefined identifier 'REG_BYTE'
>
> REG_BYTE is defined differently based on whether we're compiling for
> LE, BE32 or BE64. Sparse apparently doesn't provide __BIG_ENDIAN__ or
> __LITTLE_ENDIAN__, which means we get no definition.
>
> Rather than check for __BIG_ENDIAN__ and then separately for
> __LITTLE_ENDIAN__, just switch the #ifdef to check for __BIG_ENDIAN__
> and then #else we define the little endian version. Technically that's
> dicey because PDP_ENDIAN is also a possibility, but we already do it in
> a lot of places so one more hardly matters.
>
> Signed-off-by: Daniel Axtens
> Signed-off-by: Michael Ellerman

Applied to powerpc next.

https://git.kernel.org/powerpc/c/a9650e9bc53239c30c39f77d9d

cheers
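Editorial sketch, not the kernel's REG_BYTE definition: the
"#ifdef __BIG_ENDIAN__ ... #else" shape described in the commit message
can be shown with a hypothetical SKETCH_LSB_INDEX macro that gives the
position of the least significant byte of a 32-bit word in memory. As
the message itself points out, the #else branch silently assumes little
endian, so this only holds when the compiler defines __BIG_ENDIAN__ on
every big-endian target it is used for.

```c
#include <stdint.h>
#include <string.h>

/*
 * Position of the least significant byte of a 32-bit word in memory.
 * Anything that is not big endian falls through to the little-endian
 * value, which is the simplification the commit message calls dicey.
 */
#ifdef __BIG_ENDIAN__
#define SKETCH_LSB_INDEX 3
#else
#define SKETCH_LSB_INDEX 0
#endif

/* Read the low byte of 'word' by looking at its in-memory layout. */
static unsigned char sketch_low_byte(uint32_t word)
{
	unsigned char bytes[sizeof(word)];

	memcpy(bytes, &word, sizeof(word));

	return bytes[SKETCH_LSB_INDEX];
}
```

A checker that defines neither __BIG_ENDIAN__ nor __LITTLE_ENDIAN__ still
sees a definition with this layout, which is the point of the change.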
Re: [v4] powerpc: spinlock: Fix spin_unlock_wait()
On Fri, 2016-10-06 at 03:51:28 UTC, Boqun Feng wrote:
> There is an ordering issue with spin_unlock_wait() on powerpc, because
> the spin_lock primitive is an ACQUIRE and an ACQUIRE is only ordering
> the load part of the operation with memory operations following it.
...
>
> Suggested-by: "Paul E. McKenney"
> Signed-off-by: Boqun Feng
> Reviewed-by: "Paul E. McKenney"
> [mpe: Inline the "nop" ll/sc loop and set EH=0, munge change log]
> Signed-off-by: Michael Ellerman

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6262db7c088bbfc26480d10144

cheers
Re: powerpc: Add array bounds checking to crash_shutdown_handlers
On Wed, 2016-11-05 at 00:57:32 UTC, Suraj Jitindar Singh wrote:
> The array crash_shutdown_handles is an array of size CRASH_HANDLER_MAX+1
> containing up to CRASH_HANDLER_MAX shutdown_handlers. It is assumed to
> be NULL terminated, which it is under normal circumstances. Array
> accesses in the functions crash_shutdown_unregister() and
> default_machine_crash_shutdown() rely on this NULL termination property
> when traversing this list and don't protect against out of bounds accesses.
> If the NULL terminator were somehow overwritten these functions could
> potentially access out of the bounds of the array.
>
> Shrink the array to size CRASH_HANDLER_MAX and implement explicit array
> bounds checking when accessing the elements of the
> crash_shutdown_handles[] array in crash_shutdown_unregister() and
> default_machine_crash_shutdown().
>
> Signed-off-by: Suraj Jitindar Singh

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1d1451655bad9a6a5fd7a42de6

cheers
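A rough userspace sketch of the approach the change log describes: the table has exactly HANDLER_MAX slots and every loop is bounded by that size instead of trusting a NULL terminator. All names here (HANDLER_MAX, handler_register() and so on) are hypothetical and only illustrate the idea; this is not the kernel code.

```c
#include <stddef.h>

#define HANDLER_MAX 3	/* hypothetical; stands in for CRASH_HANDLER_MAX */

typedef void (*handler_t)(void);

/* Exactly HANDLER_MAX slots: no spare slot reserved for a terminator. */
static handler_t handlers[HANDLER_MAX];

/* Sample handler used only for demonstration. */
static void noop_handler(void)
{
}

/* Returns 0 on success, 1 if the handler is NULL or the table is full. */
static int handler_register(handler_t h)
{
	size_t i;

	if (!h)
		return 1;
	for (i = 0; i < HANDLER_MAX; i++) {
		if (!handlers[i]) {
			handlers[i] = h;
			return 0;
		}
	}
	return 1;
}

/* Returns 0 on success, 1 if h is not registered. */
static int handler_unregister(handler_t h)
{
	size_t i;

	if (!h)
		return 1;
	for (i = 0; i < HANDLER_MAX; i++)
		if (handlers[i] == h)
			break;
	if (i == HANDLER_MAX)
		return 1;
	/* Shift later entries down without reading past the last slot. */
	for (; i + 1 < HANDLER_MAX; i++)
		handlers[i] = handlers[i + 1];
	handlers[HANDLER_MAX - 1] = NULL;
	return 0;
}

/* Calls each registered handler in order; returns how many ran. */
static size_t handlers_run(void)
{
	size_t i, n = 0;

	for (i = 0; i < HANDLER_MAX && handlers[i]; i++) {
		handlers[i]();
		n++;
	}
	return n;
}
```

The point of the bound in handlers_run() is that a corrupted or completely full table can never push the index past the last slot.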
Re: [v2] powerpc/mm: Ensure "special" zones are empty
On Wed, 2016-11-05 at 09:22:18 UTC, Oliver O'Halloran wrote:
> The mm zone mechanism was traditionally used by arch specific code to
> partition memory into allocation zones. However there are several zones
> that are managed by the mm subsystem rather than the architecture. Most
> architectures set the max PFN of these special zones to zero, however on
> powerpc we set them to ~0ul. This, in conjunction with a bug in
> free_area_init_nodes() results in all of system memory being placed in
> ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel
> memory allocations so this will cause a kernel panic at boot. Given the
> planned addition of more mm managed zones (ZONE_CMA) we should aim to be
> consistent with every other architecture and set the max PFN for these
> zones to zero.
>
> Signed-off-by: Oliver O'Halloran
> Reviewed-by: Balbir Singh
> Cc: linux...@kvack.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3079abe11031e2ba5d1e21

cheers
Re: cxl: Update process element after allocating interrupts
On Mon, 2016-23-05 at 16:14:05 UTC, Ian Munsie wrote:
> From: Ian Munsie
>
> In the kernel API, it is possible to attempt to allocate AFU interrupts
> after already starting a context. Since the process element structure
> used by the hardware is only filled out at the time the context is
> started, it will not be updated with the interrupt numbers that have
> just been allocated and therefore AFU interrupts will not work unless
> they were allocated prior to starting the context.
>
> This can present some difficulties as each CAPI enabled PCI device in
> the kernel API has a default context, which may need to be started very
> early to enable translations, potentially before interrupts can easily
> be set up.
>
> This patch makes the API more flexible to allow interrupts to be
> allocated after a context has already been started and takes care of
> updating the PE structure used by the hardware and notifying it to
> discard any cached copy it may have.
>
> The update is currently performed via a terminate/remove/add sequence.
> This is necessary on some hardware such as the XSL that does not
> properly support the update LLCMD.
>
> Note that this is only supported on powernv at present - attempting to
> perform this ordering on PowerVM will raise a warning.
>
> Signed-off-by: Ian Munsie
> Reviewed-by: Frederic Barrat

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/292841b09648ce7aee5df16ab7

cheers
Re: cxl: static-ify variables to fix sparse warnings
On Mon, 2016-18-04 at 05:03:50 UTC, Andrew Donnellan wrote:
> Make a couple more variables static. Found by sparse.
>
> Signed-off-by: Andrew Donnellan
> Reviewed-by: fbar...@linux.vnet.ibm.com
> Reviewed-by: Matthew R. Ochs
> Acked-by: Ian Munsie

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/64417a398973d964139306c0b1

cheers
Re: cxl: Make vPHB device node match adapter's
On Wed, 2016-15-06 at 14:42:16 UTC, Frederic Barrat wrote:
> Tested by cxlflash on bare-metal and powerVM.
>
> Signed-off-by: Frederic Barrat
> Reviewed-by: Matthew R. Ochs
> Acked-by: Ian Munsie
> Signed-off-by: Frederic Barrat

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a430739009384ba2c4804f3a42

cheers
Re: [RFC] cxl: Add support for CAPP DMA mode
On Wed, 2016-08-06 at 05:09:54 UTC, Ian Munsie wrote:
> From: Ian Munsie
>
> This adds support for using CAPP DMA mode, which is required for XSL
> based cards such as the Mellanox CX4 to function.
>
> This is currently an RFC as it depends on the corresponding support to
> be merged into skiboot first, which was submitted here:
> http://patchwork.ozlabs.org/patch/625582/
>
> In the event that the skiboot on the system does not have the above
> support, it will indicate as such in the kernel log and abort the init
> process.
>
> Signed-off-by: Ian Munsie

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b385c9e971468eb8816b267424

cheers
Re: [3/4] powerpc/sparse: Include headers containing prototypes
On Wed, 2016-18-05 at 01:16:51 UTC, Daniel Axtens wrote:
> Sometimes headers that provide prototypes for functions are
> accidentally omitted from the files that define the functions.
>
> Fix a couple of times that occurs.
>
> Signed-off-by: Daniel Axtens

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/665e87ffe1c400c525c3a4cd6f

cheers
Re: cxl: Abstract the differences between the PSL and XSL
On Mon, 2016-23-05 at 17:39:18 UTC, Ian Munsie wrote:
> From: Frederic Barrat
>
> The XSL (Translation Service Layer) is a stripped down version of the
> PSL (Power Service Layer) used in some cards such as the Mellanox CX4.
>
> Like the PSL, it implements the CAIA architecture, but has a number of
> differences, mostly in its implementation dependent registers. This
> adds an ops structure to abstract these differences to bring initial
> support for XSL CAPI devices.
>
> The XSL does not implement the optional architected SERR register,
> however while it treats it as a reserved register and should work with
> no special treatment, attempting to access it will cause the XSL_FEC
> (First Error Capture) register to be filled out, preventing it from
> capturing any subsequent errors. Therefore, this patch also prevents the
> kernel from trying to set up the SERR register so that the FEC register
> may still be useful, and to save one interrupt.
>
> The XSL also uses a special DMA cxl mode, which uses a slightly
> different init sequence for the CAPP and PHB. The kernel support for
> this will be in a future patch once the corresponding support has been
> merged into skiboot.
>
> Co-authored-by: Ian Munsie
> Signed-off-by: Ian Munsie

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6d382616ac2283ed65c7a6a52d

cheers
Re: [2/4] powerpc: Introduce asm-prototypes.h
On Wed, 2016-18-05 at 01:16:50 UTC, Daniel Axtens wrote:
> Sparse picked up a number of functions that are implemented in C and
> then only referred to in asm code.
>
> This introduces asm-prototypes.h, which provides a place for
> prototypes of these functions.
>
> This silences some sparse warnings.
>
> Signed-off-by: Daniel Axtens

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/42f5b4cacd783faf05e3ff8bf8

cheers
Re: [FIX,v2,2/2] powerpc,numa: Fix memory_hotplug_max()
On Thu, 2016-12-05 at 13:34:15 UTC, Bharata B Rao wrote:
> memory_hotplug_max() uses hot_add_drconf_memory_max() to get maximum
> addressable memory by referring to ibm,dynamic-memory property. There
> are three problems with the current approach:
>
> 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
>   all the LMBs of the guest, but that is not true for PowerKVM which
>   populates only DR LMBs (LMBs that can be hotplugged/removed) in that
>   property.
> 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
>   at the max possible address. Since ibm,dynamic-memory doesn't include
>   RMA LMBs, the address thus obtained will be less than the actual max
>   address. For example, if max possible memory size is 32G, with lmb-size
>   of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
>   which won't be present here). hot_add_drconf_memory_max() would then
>   return the max addressable memory as 127 * 256MB = 31.75GB, the max
>   address should have been 32G which is what ibm,lrdr-capacity shows.
> 3 In PowerKVM, there can be a gap between the end of boot time RAM and
>   beginning of hotplug RAM area. So just multiplying lmb-count with
>   lmb-size will not provide the correct max possible address for PowerKVM.
>
> This patch fixes 1 by using ibm,lrdr-capacity property to return the max
> addressable memory whenever the property is present. Then it fixes 2 & 3
> by fetching the address of the last LMB in ibm,dynamic-memory property.
>
> Signed-off-by: Bharata B Rao
> Reviewed-by: David Gibson

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/45b64ee64970dee9392229302e

cheers
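The arithmetic in problem 2 can be checked with a small sketch. The helper names and the assumed layout (one 256MB RMA LMB at address 0 followed by 127 contiguous hotpluggable LMBs) are illustrative assumptions, not taken from the patch itself.

```c
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Old approach: count * size, which misses LMBs absent from the property. */
static uint64_t max_by_count(uint64_t lmb_count, uint64_t lmb_size)
{
	return lmb_count * lmb_size;
}

/* Approach described in the change log: end address of the last LMB. */
static uint64_t max_by_last_lmb(uint64_t last_lmb_base, uint64_t lmb_size)
{
	return last_lmb_base + lmb_size;
}
```

With 127 LMBs of 256MB, max_by_count() gives 32512MB (31.75GB), while max_by_last_lmb() with the last LMB based at 32G - 256MB gives the full 32G. The second form also stays correct when there is a gap before the hotplug area, which is problem 3.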
Re: [1/4] powerpc/sparse: make some things static
On Wed, 2016-18-05 at 01:16:49 UTC, Daniel Axtens wrote:
> This is just a smattering of things picked up by sparse that should
> be made static.
>
> Signed-off-by: Daniel Axtens

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/34852ed5511ec5d07897f22d56

cheers
Re: [FIX, v2, 1/2] powerpc, numa: Fix whitespace in hot_add_drconf_memory_max()
On Thu, 2016-12-05 at 13:34:14 UTC, Bharata B Rao wrote:
> Signed-off-by: Bharata B Rao
> Reviewed-by: David Gibson

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e70bd3ae914ec40d8505ed842d

cheers
Re: [Patch v2 1/2] powerpc: Send SIGBUS on unaligned copy and paste
On Thu, Jun 16, 2016 at 11:04:12PM -0500, Segher Boessenkool wrote:
> On Fri, Jun 17, 2016 at 09:33:45AM +1000, Chris Smart wrote:
>> +#define PPC_INST_COPY			0x7c00060c
>> +#define PPC_INST_COPY_FIRST		0x7c20060c
>> +#define PPC_INST_PASTE			0x7c00070c
>> +#define PPC_INST_PASTE_LAST		0x7c20070d
>
> That's not quite right I think.

Hi Segher,

Thanks for checking that for me, it's good to make sure it's correct.

Just to be sure, I've gone back and compared them all with the ISA. I
think that the only one that differs is the paste_last. Am I missing
something?

> copy is 7c00060c mask fc2007fe (or ffe007fe)

COPY = copy RA,RB,L (L=0)

31      ///L   RA     RB     774         /
011111  00000  00000  00000  1100000110  0  = 0x7c00060c (instruction)
111111  00001  00000  00000  1111111111  0  = 0xfc2007fe (specific mask)

> copy_first is 7c20060c mask fc2007fe

COPY_FIRST = copy RA,RB,L (L=1)
If L=1, the instruction identifies the beginning of a move group.

31      ///L   RA     RB     774         /
011111  00001  00000  00000  1100000110  0  = 0x7c20060c (instruction)
111111  00001  00000  00000  1111111111  0  = 0xfc2007fe (specific mask)

> paste is 7c00070c mask fc2007fe

PASTE = paste RA,RB,L (L=0 Rc=0)

31      ///L   RA     RB     902         Rc
011111  00000  00000  00000  1110000110  0  = 0x7c00070c (instruction)
111111  00001  00000  00000  1111111111  1  = 0xfc2007ff (specific mask)

> paste_last is 7c20070c mask fc2007fe

PASTE_LAST = paste. RA,RB,L (L=1 Rc=1)
If L=1, the instruction identifies the end of a move group.
If L≠Rc, the instruction form is invalid.

31      ///L   RA     RB     902         Rc
011111  00001  00000  00000  1110000110  1  = 0x7c20070d (instruction)
111111  00001  00000  00000  1111111111  1  = 0xfc2007ff (specific mask)

> (this includes record form for paste; the low bit).

To make the test simple I use a combined copy, copy_first, paste and
paste_last mask to compare just against copy.

So that excluded:
- L
- bit 24 of 32
- Rc

111111  00000  00000  00000  1101111111  0  = 0xfc0006fe

Would it be better and more clear to check each instruction with its
mask? Something like:

#define PPC_INST_COPY_MASK	0xfc2007fe
#define PPC_INST_PASTE_MASK	0xfc2007ff

if (cpu_has_feature(CPU_FTR_ARCH_300)) {
	unsigned int masked_instruction = instruction & PPC_INST_COPY_MASK;

	if (masked_instruction == PPC_INST_COPY || \
	    masked_instruction == PPC_INST_COPY_FIRST)
		return -EIO;

	masked_instruction = instruction & PPC_INST_PASTE_MASK;
	if (masked_instruction == PPC_INST_PASTE || \
	    masked_instruction == PPC_INST_PASTE_LAST)
		return -EIO;
}

Thanks!
-c
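The per-instruction mask check proposed above can be tried outside the kernel. The following is a userspace sketch only: is_copy_paste() is a hypothetical helper, the constants are the ones quoted in the message, and the CPU feature test is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encodings and masks as quoted in the message above. */
#define PPC_INST_COPY		0x7c00060cu
#define PPC_INST_COPY_FIRST	0x7c20060cu
#define PPC_INST_PASTE		0x7c00070cu
#define PPC_INST_PASTE_LAST	0x7c20070du
#define PPC_INST_COPY_MASK	0xfc2007feu	/* opcode, L, extended opcode */
#define PPC_INST_PASTE_MASK	0xfc2007ffu	/* as above, plus Rc */

/* Hypothetical helper: true if insn is any valid copy/paste form. */
static bool is_copy_paste(uint32_t insn)
{
	uint32_t masked = insn & PPC_INST_COPY_MASK;

	if (masked == PPC_INST_COPY || masked == PPC_INST_COPY_FIRST)
		return true;

	masked = insn & PPC_INST_PASTE_MASK;
	return masked == PPC_INST_PASTE || masked == PPC_INST_PASTE_LAST;
}
```

Because the RA and RB fields are masked out, the check matches any register operands, while a paste with L=0 and Rc=1 (an invalid form according to the text above) is not matched.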
Re: [6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF
On Sun, 2016-06-19 at 23:06 +0530, Naveen N. Rao wrote:
> On 2016/06/17 10:53PM, Michael Ellerman wrote:
> > On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> > > new file mode 100644
> > > index 000..954ff53
> > > --- /dev/null
> > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > @@ -0,0 +1,956 @@
> > ...
> > > +
> > > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > > +{
> > > +	int *p = area;
> > > +
> > > +	/* Fill whole space with trap instructions */
> > > +	while (p < (int *)((char *)area + size))
> > > +		*p++ = BREAKPOINT_INSTRUCTION;
> > > +}
> >
> > This breaks the build for some configs, presumably you're missing a header:
> >
> > arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)
> >
> > http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
>
> Oops. Yes, I should have caught that. I need to add:
>
> #include
>
> in bpf_jit_comp64.c
>
> Can you please check if it resolves the build error?

Can you? :D

cheers
[PATCH] leds: Add no-op gpio_led_register_device when LED subsystem is disabled
Some systems use 'gpio_led_register_device' to make an in-memory copy of
their LED device table so the original can be removed as .init.rodata.
When the LED subsystem is not enabled source in the led directory is not
built and so this function may be undefined. Fix this here.

Signed-off-by: Andrew F. Davis
---
 include/linux/leds.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/leds.h b/include/linux/leds.h
index d2b1306..a4a3da6 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -386,8 +386,16 @@ struct gpio_led_platform_data {
 					unsigned long *delay_off);
 };
 
+#ifdef CONFIG_NEW_LEDS
 struct platform_device *gpio_led_register_device(
 		int id, const struct gpio_led_platform_data *pdata);
+#else
+static inline struct platform_device *gpio_led_register_device(
+		int id, const struct gpio_led_platform_data *pdata)
+{
+	return 0;
+}
+#endif
 
 enum cpu_led_event {
 	CPU_LED_IDLE_START,	/* CPU enters idle */
--
2.9.0
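The stub pattern used by this patch, reduced to a standalone sketch. CONFIG_WIDGETS, struct widget and widget_register() are hypothetical names chosen for illustration; only the shape (real prototype when the option is set, static inline no-op otherwise) mirrors the patch.

```c
#include <stddef.h>

struct widget;	/* hypothetical opaque type */

#ifdef CONFIG_WIDGETS
/* Real implementation lives in a file built only when the option is set. */
struct widget *widget_register(int id);
#else
/* No-op stub so callers still compile and link when it is disabled. */
static inline struct widget *widget_register(int id)
{
	(void)id;
	return NULL;
}
#endif
```

Callers need no #ifdef of their own: with the option off they get a NULL pointer back and the compiler can drop the call entirely.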
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > On Fri, Jun 17, 2016 at 10:53:21PM +1000, Michael Ellerman wrote:
> > > On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> > > > new file mode 100644
> > > > index 000..954ff53
> > > > --- /dev/null
> > > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > > @@ -0,0 +1,956 @@
> > > ...
> > > > +
> > > > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > > > +{
> > > > +	int *p = area;
> > > > +
> > > > +	/* Fill whole space with trap instructions */
> > > > +	while (p < (int *)((char *)area + size))
> > > > +		*p++ = BREAKPOINT_INSTRUCTION;
> > > > +}
> > >
> > > This breaks the build for some configs, presumably you're missing a header:
> > >
> > > arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)
> > >
> > > http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> > >
> > > cheers
> >
> > Hi, Michael and Naveen.
> >
> > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > worked out the patch below before I noticed Naveen's patchset and the latest
> > changes in ppc tree for a better way to check for ABI versions.
> >
> > However, since the issue described below affect mainline and stable kernels,
> > would you consider applying it before merging your two patchsets, so that we can
> > more easily backport the fix?
>
> Hi Cascardo,
> Given that this has been broken on ABIv2 since forever, I didn't bother
> fixing it. But, I can see why this would be a good thing to have for
> -stable and existing distros. However, while your patch below may fix
> the crash you're seeing on ppc64le, it is not sufficient -- you'll need
> changes in bpf_jit_asm.S as well.

Hi, Naveen.

Any tips on how to exercise possible issues there? Or what changes you think
would be sufficient?

I will see what I can find by myself, but would appreciate any help.

Regards.
Cascardo.

>
> Regards,
> Naveen
>
Re: [PATCH 3/3] cxlflash: Shutdown notify support for CXL Flash cards
> On Jun 15, 2016, at 6:49 PM, Uma Krishnan wrote:
>
> Some CXL Flash cards need notification of device shutdown
> in order to flush pending I/Os.
>
> A PCI notification hook for shutdown has been added where
> the driver notifies the card and returns. When the device
> is removed in the PCI remove path, notification code will
> wait for shutdown processing to complete.
>
> Signed-off-by: Uma Krishnan

Acked-by: Matthew R. Ochs
Re: [PATCH 2/3] cxlflash: Add device dependent flags
> On Jun 15, 2016, at 6:49 PM, Uma Krishnan wrote:
>
> Device dependent flags are needed to support functions that are
> specific to a particular device.
>
> One such case is - some CXL Flash cards need to be notified of
> device shutdown. For other CXL devices, this feature does not prove
> to be useful yet. Such distinct features need to be identified in
> the driver to bypass or invoke specific functionality.
>
> In this patch, a member 'flags' has been added to device dependent
> values. These flags will be used and expanded in the future to
> support various device specific functions.
>
> Signed-off-by: Uma Krishnan

Acked-by: Matthew R. Ochs
Re: [PATCH v2 2/9] kexec_file: Generalize kexec_add_buffer.
On Monday, 20 June 2016, 10:26:05, Dave Young wrote:
> kexec_buf should go within #ifdef for kexec file like struct
> purgatory_info
>
> Other than that it looks good.

Great! Here it is.

--
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


kexec_file: Generalize kexec_add_buffer.

Allow architectures to specify different memory walking functions for
kexec_add_buffer. Intel uses iomem to track reserved memory ranges,
but PowerPC uses the memblock subsystem.

Signed-off-by: Thiago Jung Bauermann
Cc: Eric Biederman
Cc: Dave Young
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e8acb2b43dd9..3d91bcfc180d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -146,7 +146,24 @@ struct kexec_file_ops {
 	kexec_verify_sig_t *verify_sig;
 #endif
 };
-#endif
+
+/*
+ * Keeps track of buffer parameters as provided by caller for requesting
+ * memory placement of buffer.
+ */
+struct kexec_buf {
+	struct kimage *image;
+	unsigned long mem;
+	unsigned long memsz;
+	unsigned long buf_align;
+	unsigned long buf_min;
+	unsigned long buf_max;
+	bool top_down;		/* allocate from top of memory hole */
+};
+
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+			       int (*func)(u64, u64, void *));
+#endif /* CONFIG_KEXEC_FILE */
 
 struct kimage {
 	kimage_entry_t head;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b6eec7527e9f..b1f1f6402518 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -428,6 +428,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
 	return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+/**
+ * arch_kexec_walk_mem - call func(data) on free memory regions
+ * @kbuf:	Context info for the search. Also passed to @func.
+ * @func:	Function to call for each memory region.
+ *
+ * Return: The memory walk will stop when func returns a non-zero value
+ * and that value will be returned. If all free regions are visited without
+ * func returning non-zero, then zero will be returned.
+ */
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+			       int (*func)(u64, u64, void *))
+{
+	if (kbuf->image->type == KEXEC_TYPE_CRASH)
+		return walk_iomem_res_desc(crashk_res.desc,
+					   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+					   crashk_res.start, crashk_res.end,
+					   kbuf, func);
+	else
+		return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -472,14 +493,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
 	kbuf->top_down = top_down;
 
 	/* Walk the RAM ranges and allocate a suitable range for the buffer */
-	if (image->type == KEXEC_TYPE_CRASH)
-		ret = walk_iomem_res_desc(crashk_res.desc,
-				IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-				crashk_res.start, crashk_res.end, kbuf,
-				locate_mem_hole_callback);
-	else
-		ret = walk_system_ram_res(0, -1, kbuf,
-					  locate_mem_hole_callback);
+	ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
 	if (ret != 1) {
 		/* A suitable memory range could not be found for buffer */
 		return -EADDRNOTAVAIL;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index eefd5bf960c2..4cef7e4706b0 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -20,20 +20,6 @@ struct kexec_sha_region {
 	unsigned long len;
 };
 
-/*
- * Keeps track of buffer parameters as provided by caller for requesting
- * memory placement of buffer.
- */
-struct kexec_buf {
-	struct kimage *image;
-	unsigned long mem;
-	unsigned long memsz;
-	unsigned long buf_align;
-	unsigned long buf_min;
-	unsigned long buf_max;
-	bool top_down;		/* allocate from top of memory hole */
-};
-
 void kimage_file_post_load_cleanup(struct kimage *image);
 #else /* CONFIG_KEXEC_FILE */
 static inline void kimage_file_post_load_cleanup(struct kimage *image) { }
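The walk contract documented for arch_kexec_walk_mem() (call func on each free region, stop at the first non-zero return and pass that value back) can be sketched in userspace as follows. The region table, walk_regions(), find_hole() and locate() are all hypothetical names for illustration; an architecture would supply its own walker over its real memory map.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Hypothetical free-region table standing in for the arch's memory map. */
static const struct region free_regions[] = {
	{ 0x1000, 0x1fff },
	{ 0x8000, 0xffff },
};

typedef int (*walk_fn)(uint64_t start, uint64_t end, void *arg);

/* Calls func on each region; stops at and returns the first non-zero result. */
static int walk_regions(void *arg, walk_fn func)
{
	size_t i;

	for (i = 0; i < sizeof(free_regions) / sizeof(free_regions[0]); i++) {
		int ret = func(free_regions[i].start, free_regions[i].end, arg);

		if (ret)
			return ret;
	}
	return 0;
}

struct hole_request {
	uint64_t size;	/* bytes wanted */
	uint64_t found;	/* start address of the hole, set on success */
};

/* Callback: returns 1 when the region is large enough, 0 to keep walking. */
static int find_hole(uint64_t start, uint64_t end, void *arg)
{
	struct hole_request *req = arg;

	if (end - start + 1 < req->size)
		return 0;
	req->found = start;
	return 1;
}

/* Returns the start of the first fitting region, or UINT64_MAX if none. */
static uint64_t locate(uint64_t size)
{
	struct hole_request req = { size, 0 };

	return walk_regions(&req, find_hole) == 1 ? req.found : UINT64_MAX;
}
```

Keeping the callback signature fixed is what lets the caller stay unchanged while the walker underneath is swapped per architecture.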
Re: [PATCH] ibmvnic: fix to use list_for_each_safe() when delete items
On 06/17/2016 09:53 PM, weiyj...@163.com wrote:
> From: Wei Yongjun
>
> Since we will remove items off the list using list_del() we need
> to use a safe version of the list_for_each() macro aptly named
> list_for_each_safe().
>
> Signed-off-by: Wei Yongjun
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 864cb21..0b6a922 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union ibmvnic_crq *crq,
>
>  static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
>  {
> -	struct ibmvnic_inflight_cmd *inflight_cmd;
> +	struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1;
>  	struct device *dev = &adapter->vdev->dev;
> -	struct ibmvnic_error_buff *error_buff;
> +	struct ibmvnic_error_buff *error_buff, *tmp2;
>  	unsigned long flags;
>  	unsigned long flags2;
>
>  	spin_lock_irqsave(&adapter->inflight_lock, flags);
> -	list_for_each_entry(inflight_cmd, &adapter->inflight, list) {
> +	list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) {
>  		switch (inflight_cmd->crq.generic.cmd) {
>  		case LOGIN:
>  			dma_unmap_single(dev, adapter->login_buf_token,
> @@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
>  			break;
>  		case REQUEST_ERROR_INFO:
>  			spin_lock_irqsave(&adapter->error_list_lock, flags2);
> -			list_for_each_entry(error_buff, &adapter->errors,
> -					    list) {
> +			list_for_each_entry_safe(error_buff, tmp2,
> +						 &adapter->errors, list) {
>  				dma_unmap_single(dev, error_buff->dma,
>  						 error_buff->len,
>  						 DMA_FROM_DEVICE);
>

Thanks!

Acked-by: Thomas Falcon
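Why the "safe" iterator matters, shown on a plain singly linked list instead of the kernel's list_head. The struct and function names are hypothetical; the idea is the same as the tmp1/tmp2 cursors in the patch: remember the next element before the current one is freed.

```c
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

/* Prepends a node; returns the new head (or the old head if malloc fails). */
static struct node *push(struct node *head, int val)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->val = val;
	n->next = head;
	return n;
}

/* Frees every node and returns how many were freed. */
static size_t free_all(struct node *head)
{
	struct node *cur, *tmp;
	size_t n = 0;

	for (cur = head; cur; cur = tmp) {
		tmp = cur->next;	/* save next before cur is freed */
		free(cur);
		n++;
	}
	return n;
}
```

A loop that instead advanced with cur = cur->next after free(cur) would read freed memory, which is the bug class the patch removes.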
Re: [PATCH 3/4] kvm/stats: Add provisioning for 64-bit vcpu statistics
On 20/06/2016 02:08, Paul Mackerras wrote:
> Paolo,
>
> Can I have an ack for Suraj's patch below? If it's OK with you,
> I'll take his series through my tree.

Yes, please do.

Paolo

> Thanks,
> Paul.
>
> On Wed, Jun 15, 2016 at 07:21:07PM +1000, Suraj Jitindar Singh wrote:
>> vcpus have statistics associated with them which can be viewed within the
>> debugfs. Currently it is assumed within the vcpu_stat_get() and
>> vcpu_stat_get_per_vm() functions that all of these statistics are
>> represented as 32-bit numbers. The next patch adds some 64-bit statistics,
>> so add provisioning for the display of 64-bit vcpu statistics.
>>
>> Signed-off-by: Suraj Jitindar Singh
>> ---
>>  arch/powerpc/kvm/book3s.c |  1 +
>>  include/linux/kvm_host.h  |  1 +
>>  virt/kvm/kvm_main.c       | 60 +++
>>  3 files changed, 58 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index 47018fc..ed9132b 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -40,6 +40,7 @@
>>  #include "trace.h"
>>
>>  #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
>> +#define VCPU_STAT_U64(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU_U64
>>
>>  /* #define EXIT_DEBUG */
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 1c9c973..667b30e 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -991,6 +991,7 @@ static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
>>  enum kvm_stat_kind {
>>  	KVM_STAT_VM,
>>  	KVM_STAT_VCPU,
>> +	KVM_STAT_VCPU_U64,
>>  };
>>
>>  struct kvm_stat_data {
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 02e98f3..ac47ffb 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3566,6 +3566,20 @@ static int vcpu_stat_get_per_vm(void *data, u64 *val)
>>  	return 0;
>>  }
>>
>> +static int vcpu_stat_u64_get_per_vm(void *data, u64 *val)
>> +{
>> +	int i;
>> +	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
>> +	struct kvm_vcpu *vcpu;
>> +
>> +	*val = 0;
>> +
>> +	kvm_for_each_vcpu(i, vcpu, stat_data->kvm)
>> +		*val += *(u64 *)((void *)vcpu + stat_data->offset);
>> +
>> +	return 0;
>> +}
>> +
>>  static int vcpu_stat_get_per_vm_open(struct inode *inode, struct file *file)
>>  {
>>  	__simple_attr_check_format("%llu\n", 0ull);
>> @@ -3573,6 +3587,13 @@ static int vcpu_stat_get_per_vm_open(struct inode *inode, struct file *file)
>>  				NULL, "%llu\n");
>>  }
>>
>> +static int vcpu_stat_u64_get_per_vm_open(struct inode *inode, struct file *file)
>> +{
>> +	__simple_attr_check_format("%llu\n", 0ull);
>> +	return kvm_debugfs_open(inode, file, vcpu_stat_u64_get_per_vm,
>> +				NULL, "%llu\n");
>> +}
>> +
>>  static const struct file_operations vcpu_stat_get_per_vm_fops = {
>>  	.owner   = THIS_MODULE,
>>  	.open    = vcpu_stat_get_per_vm_open,
>> @@ -3582,9 +3603,19 @@ static const struct file_operations vcpu_stat_get_per_vm_fops = {
>>  	.llseek  = generic_file_llseek,
>>  };
>>
>> +static const struct file_operations vcpu_stat_u64_get_per_vm_fops = {
>> +	.owner   = THIS_MODULE,
>> +	.open    = vcpu_stat_u64_get_per_vm_open,
>> +	.release = kvm_debugfs_release,
>> +	.read    = simple_attr_read,
>> +	.write   = simple_attr_write,
>> +	.llseek  = generic_file_llseek,
>> +};
>> +
>>  static const struct file_operations *stat_fops_per_vm[] = {
>> -	[KVM_STAT_VCPU] = &vcpu_stat_get_per_vm_fops,
>> -	[KVM_STAT_VM]   = &vm_stat_get_per_vm_fops,
>> +	[KVM_STAT_VCPU]		= &vcpu_stat_get_per_vm_fops,
>> +	[KVM_STAT_VCPU_U64]	= &vcpu_stat_u64_get_per_vm_fops,
>> +	[KVM_STAT_VM]		= &vm_stat_get_per_vm_fops,
>>  };
>>
>>  static int vm_stat_get(void *_offset, u64 *val)
>> @@ -3627,9 +3658,30 @@ static int vcpu_stat_get(void *_offset, u64 *val)
>>
>>  DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, NULL, "%llu\n");
>>
>> +static int vcpu_stat_u64_get(void *_offset, u64 *val)
>> +{
>> +	unsigned offset = (long)_offset;
>> +	struct kvm *kvm;
>> +	struct kvm_stat_data stat_tmp = {.offset = offset};
>> +	u64 tmp_val;
>> +
>> +	*val = 0;
>> +	spin_lock(&kvm_lock);
>> +	list_for_each_entry(kvm, &vm_list, vm_list) {
>> +		stat_tmp.kvm = kvm;
>> +		vcpu_stat_u64_get_per_vm((void *)&stat_tmp, &tmp_val);
>> +		*val += tmp_val;
>> +	}
>> +	spin_unlock(&kvm_lock);
>> +	return 0;
>> +}
>> +
>> +DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_u64_fops, vcpu_stat_u64_get, NULL, "%llu\n");
>> +
>>  static const struct file_operations *stat_fops[] = {
>> -	[KVM_STAT_VCPU] = &vcpu_stat_fops,
>> -	[KVM_STAT_VM]   = &vm_stat_fops,
>> +	[KVM_STAT_VCPU]		= &vcpu_stat_fops,
>> +	[KVM_STAT_VCPU_U64]	= &vcpu_stat_u64_fops,
>> +	[KVM_STAT_VM]
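The quoted patch reads each statistic through a byte offset into the vcpu structure, so the reader has to know the field width. A small userspace sketch of that idea follows; struct vcpu_stats and both helpers are hypothetical, and memcpy() is used here instead of a pointer cast purely to keep the example free of alignment and aliasing concerns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-vcpu record; field names are illustrative only. */
struct vcpu_stats {
	uint32_t exits;		/* 32-bit counter */
	uint64_t wait_ns;	/* 64-bit counter */
};

/* Sums a 32-bit field located `offset` bytes into each record. */
static uint64_t sum_u32(const struct vcpu_stats *v, size_t n, size_t offset)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t field;

		memcpy(&field, (const char *)&v[i] + offset, sizeof(field));
		total += field;
	}
	return total;
}

/* Sums a 64-bit field located `offset` bytes into each record. */
static uint64_t sum_u64(const struct vcpu_stats *v, size_t n, size_t offset)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t field;

		memcpy(&field, (const char *)&v[i] + offset, sizeof(field));
		total += field;
	}
	return total;
}
```

Reading a 64-bit counter with the 32-bit helper would return only part of the value, which is why the patch adds a separate KVM_STAT_VCPU_U64 kind with its own accessors.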
[PATCH 2/2] powerpc/pseries: Dynamic add entires to associativity lookup array
Dynamically add entries to the associativity lookup array The ibm,associativity-lookup-arrays property may only contain associativity arrays for LMBs present at boot time. When hotplug adding a LMB its associativity array may not be in the associativity lookup array, this patch adds the ability to add new entries to the associativity lookup array. Signed-off-by: Nathan Fontenot--- arch/powerpc/platforms/pseries/hotplug-memory.c | 93 --- 1 file changed, 66 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index b10f2ef..f62eef652 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -191,14 +191,72 @@ static int dlpar_update_device_tree_lmb(struct of_drconf_cell *lmb) return 0; } +static u32 find_aa_index(struct device_node *dr_node, +struct property *ala_prop, const u32 *lmb_assoc) +{ + u32 *assoc_arrays; + u32 aa_index; + int aa_arrays, aa_array_entries, aa_array_sz; + int i, index; + + /* The ibm,associativity-lookup-arrays property is defined to be +* a 32-bit value specifying the number of associativity arrays +* followed by a 32-bitvalue specifying the number of entries per +* array, followed by the associativity arrays. 
+*/ + assoc_arrays = ala_prop->value; + + aa_arrays = be32_to_cpu(assoc_arrays[0]); + aa_array_entries = be32_to_cpu(assoc_arrays[1]); + aa_array_sz = aa_array_entries * sizeof(u32); + + aa_index = -1; + for (i = 0; i < aa_arrays; i++) { + index = (i * aa_array_entries) + 2; + + if (memcmp(&assoc_arrays[index], &lmb_assoc[1], aa_array_sz)) + continue; + + aa_index = i; + break; + } + + if (aa_index == -1) { + struct property *new_prop; + u32 new_prop_size; + + new_prop_size = ala_prop->length + aa_array_sz; + new_prop = dlpar_clone_property(ala_prop, new_prop_size); + if (!new_prop) + return -1; + + assoc_arrays = new_prop->value; + + /* increment the number of entries in the lookup array */ + assoc_arrays[0] = cpu_to_be32(aa_arrays + 1); + + /* copy the new associativity into the lookup array */ + index = aa_arrays * aa_array_entries + 2; + memcpy(&assoc_arrays[index], &lmb_assoc[1], aa_array_sz); + + of_update_property(dr_node, new_prop); + + /* The associativity lookup array index for this lmb is +* number of entries - 1 since we added its associativity +* to the end of the lookup array. 
+*/ + aa_index = be32_to_cpu(assoc_arrays[0]) - 1; + } + + return aa_index; +} + static u32 lookup_lmb_associativity_index(struct of_drconf_cell *lmb) { struct device_node *parent, *lmb_node, *dr_node; + struct property *ala_prop; const u32 *lmb_assoc; - const u32 *assoc_arrays; u32 aa_index; - int aa_arrays, aa_array_entries, aa_array_sz; - int i; parent = of_find_node_by_path("/"); if (!parent) @@ -222,34 +280,15 @@ static u32 lookup_lmb_associativity_index(struct of_drconf_cell *lmb) return -ENODEV; } - assoc_arrays = of_get_property(dr_node, - "ibm,associativity-lookup-arrays", - NULL); - of_node_put(dr_node); - if (!assoc_arrays) { + ala_prop = of_find_property(dr_node, "ibm,associativity-lookup-arrays", + NULL); + if (!ala_prop) { + of_node_put(dr_node); dlpar_free_cc_nodes(lmb_node); return -ENODEV; } - /* The ibm,associativity-lookup-arrays property is defined to be -* a 32-bit value specifying the number of associativity arrays -* followed by a 32-bitvalue specifying the number of entries per -* array, followed by the associativity arrays. -*/ - aa_arrays = be32_to_cpu(assoc_arrays[0]); - aa_array_entries = be32_to_cpu(assoc_arrays[1]); - aa_array_sz = aa_array_entries * sizeof(u32); - - aa_index = -1; - for (i = 0; i < aa_arrays; i++) { - int indx = (i * aa_array_entries) + 2; - - if (memcmp(&assoc_arrays[indx], &lmb_assoc[1], aa_array_sz)) - continue; - - aa_index = i; - break; - } + aa_index = find_aa_index(dr_node, ala_prop, lmb_assoc); dlpar_free_cc_nodes(lmb_node); return aa_index; ___ Linuxppc-dev mailing list
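For readers following the lookup-array layout described in the patch comment (array count, entries per array, then the arrays back to back, all as big-endian 32-bit cells), here is a minimal user-space sketch of the same find-or-append idea. It is not the kernel code: `struct ala` and `ala_find_or_add()` are hypothetical names, `realloc()` stands in for cloning and updating the device tree property, and `htonl()`/`ntohl()` stand in for `cpu_to_be32()`/`be32_to_cpu()`.

```c
#include <arpa/inet.h>	/* htonl/ntohl stand in for cpu_to_be32/be32_to_cpu */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space model of the lookup array:
 *   cell 0:   number of associativity arrays (big endian)
 *   cell 1:   number of entries per array    (big endian)
 *   cell 2..: the arrays themselves, back to back
 */
struct ala {
	uint32_t *cells;	/* heap buffer holding the whole property value */
	size_t length;		/* size of the buffer in bytes */
};

/*
 * Return the index of 'assoc' in the lookup array, appending it first if it
 * is not present. 'assoc' points at entries-per-array big-endian cells.
 * Returns -1 if the buffer cannot be grown.
 */
static int ala_find_or_add(struct ala *ala, const uint32_t *assoc)
{
	uint32_t arrays = ntohl(ala->cells[0]);
	uint32_t entries = ntohl(ala->cells[1]);
	size_t array_sz = entries * sizeof(uint32_t);
	uint32_t *grown;
	uint32_t i;

	/* Linear search over the existing arrays. */
	for (i = 0; i < arrays; i++) {
		if (!memcmp(&ala->cells[2 + i * entries], assoc, array_sz))
			return (int)i;
	}

	/* Not found: grow the buffer by one array and append. */
	grown = realloc(ala->cells, ala->length + array_sz);
	if (!grown)
		return -1;

	memcpy(&grown[2 + arrays * entries], assoc, array_sz);
	grown[0] = htonl(arrays + 1);	/* one more array in the table */
	ala->cells = grown;
	ala->length += array_sz;

	/* The new array sits at the end, so its index is the old count. */
	return (int)arrays;
}
```

As in the patch, the index handed back for a newly added array is the old array count, which equals the new count minus one.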
[PATCH 1/2] powerpc/pseries: Move property cloning into its own routine
Move property cloning code into its own routine Split the pieces of dlpar_clone_drconf_property() that create a copy of the property struct into its own routine. This allows for creating clones of more than just the ibm,dynamic-memory property used in memory hotplug. Signed-off-by: Nathan Fontenot--- arch/powerpc/platforms/pseries/hotplug-memory.c | 38 --- 1 file changed, 26 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 03f6169..b10f2ef 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -69,13 +69,36 @@ unsigned long pseries_memory_block_size(void) return memblock_size; } -static void dlpar_free_drconf_property(struct property *prop) +static void dlpar_free_property(struct property *prop) { kfree(prop->name); kfree(prop->value); kfree(prop); } +static struct property *dlpar_clone_property(struct property *prop, +u32 prop_size) +{ + struct property *new_prop; + + new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL); + if (!new_prop) + return NULL; + + new_prop->name = kstrdup(prop->name, GFP_KERNEL); + new_prop->value = kzalloc(prop_size, GFP_KERNEL); + if (!new_prop->name || !new_prop->value) { + dlpar_free_property(new_prop); + return NULL; + } + + memcpy(new_prop->value, prop->value, prop->length); + new_prop->length = prop_size; + + of_property_set_flag(new_prop, OF_DYNAMIC); + return new_prop; +} + static struct property *dlpar_clone_drconf_property(struct device_node *dn) { struct property *prop, *new_prop; @@ -87,19 +110,10 @@ static struct property *dlpar_clone_drconf_property(struct device_node *dn) if (!prop) return NULL; - new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL); + new_prop = dlpar_clone_property(prop, prop->length); if (!new_prop) return NULL; - new_prop->name = kstrdup(prop->name, GFP_KERNEL); - new_prop->value = kmemdup(prop->value, prop->length, GFP_KERNEL); - if 
(!new_prop->name || !new_prop->value) { - dlpar_free_drconf_property(new_prop); - return NULL; - } - - new_prop->length = prop->length; - /* Convert the property to cpu endian-ness */ p = new_prop->value; *p = be32_to_cpu(*p); @@ -718,7 +732,7 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog) break; } - dlpar_free_drconf_property(prop); + dlpar_free_property(prop); dlpar_memory_out: of_node_put(dn); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
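The reason the clone routine takes a separate size is that the caller may want a copy with room to grow. A rough user-space sketch of that pattern follows, under stated assumptions: `struct prop`, `prop_clone()` and `prop_free()` are hypothetical stand-ins for the kernel's `struct property` and the dlpar helpers, and `calloc()` stands in for `kzalloc()`. Unlike the patch, the sketch refuses a new size smaller than the source length, since the copy would otherwise overrun the new buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct property. */
struct prop {
	char *name;
	void *value;
	size_t length;
};

static void prop_free(struct prop *p)
{
	if (!p)
		return;
	free(p->name);
	free(p->value);
	free(p);
}

/*
 * Clone 'src' into a new property whose value buffer is 'new_size' bytes.
 * The first src->length bytes are copied, the remainder is zero filled.
 * Returns NULL on allocation failure or if new_size is too small.
 */
static struct prop *prop_clone(const struct prop *src, size_t new_size)
{
	size_t name_len = strlen(src->name) + 1;
	struct prop *p;

	if (new_size < src->length)
		return NULL;

	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;

	p->name = malloc(name_len);
	p->value = calloc(1, new_size ? new_size : 1);
	if (!p->name || !p->value) {
		prop_free(p);
		return NULL;
	}

	memcpy(p->name, src->name, name_len);
	if (src->length)
		memcpy(p->value, src->value, src->length);
	p->length = new_size;

	return p;
}
```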
[PATCH 0/2] powerpc/pseries: Dynamic associativity-lookup-arrays updating
The ibm,dynamic-reconfiguration-memory/ibm,associativity-lookup-arrays property used to track the associativity for LMBs assigned to a system may not contain all of the possible associativity arrays for the system at boot time. When a LMB is added to the system and its associativity array is not present in the lookup array we need to update the lookup array to contain the new associativity array. The first patch splits the code that creates a clone of a property into its own routine so this can be used for cloning any of the properties used during memory hotplug. The second patch updates the associativity lookup code to dynamically add new associativity arrays to the lookup array if they are not present. -Nathan hotplug-memory.c | 131 ++- 1 file changed, 92 insertions(+), 39 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] tracing: Expose CPU physical addresses (resource values) for PCI devices
On Sat, 18 Jun 2016 08:23:23 +1000 Benjamin Herrenschmidt wrote: > On Fri, 2016-06-17 at 17:59 -0400, Steven Rostedt wrote: > > Sorry for the late reply, this patch got pushed down in my INBOX. > > > > Could I get someone from PPC to review this patch, just to be safe? > > The patch makes sense, I can try getting somebody onto porting > mmiotrace one of these days. > OK, thanks! I'll pull this into my for-next repo then. -- Steve ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc/pseries: Auto online hotplugged memory
Auto online hotplugged memory A recent update (commit id 31bc3858ea3) to the core mm hotplug code introduced the memhp_auto_online variable to allow for automatically onlining memory that is added. This patch updates the pseries memory hotplug code to enable this so that any memory DLPAR added to the system is automatically onlined. The code to add the memory block for memory added from add_memory() is removed as it is not needed; the memory_add code does this. Signed-off-by: Nathan Fontenot --- arch/powerpc/platforms/pseries/hotplug-memory.c | 52 +-- 1 file changed, 11 insertions(+), 41 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 2ce1385..03f6169 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -533,50 +533,11 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop) #endif /* CONFIG_MEMORY_HOTREMOVE */ -static int dlpar_add_lmb_memory(struct of_drconf_cell *lmb) +static int dlpar_add_lmb(struct of_drconf_cell *lmb) { - struct memory_block *mem_block; unsigned long block_sz; int nid, rc; - block_sz = memory_block_size_bytes(); - - /* Find the node id for this address */ - nid = memory_add_physaddr_to_nid(lmb->base_addr); - - /* Add the memory */ - rc = add_memory(nid, lmb->base_addr, block_sz); - if (rc) - return rc; - - /* Register this block of memory */ - rc = memblock_add(lmb->base_addr, block_sz); - if (rc) { - remove_memory(nid, lmb->base_addr, block_sz); - return rc; - } - - mem_block = lmb_to_memblock(lmb); - if (!mem_block) { - remove_memory(nid, lmb->base_addr, block_sz); - return -EINVAL; - } - - rc = device_online(&mem_block->dev); - put_device(&mem_block->dev); - if (rc) { - remove_memory(nid, lmb->base_addr, block_sz); - return rc; - } - - lmb->flags |= DRCONF_MEM_ASSIGNED; - return 0; -} - -static int dlpar_add_lmb(struct of_drconf_cell *lmb) -{ - int rc; - if (lmb->flags & 
DRCONF_MEM_ASSIGNED) return -EINVAL; @@ -592,10 +553,19 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb) return rc; } - rc = dlpar_add_lmb_memory(lmb); + block_sz = memory_block_size_bytes(); + + /* Find the node id for this address */ + nid = memory_add_physaddr_to_nid(lmb->base_addr); + + /* Add and online memory */ + memhp_auto_online = true; + rc = add_memory(nid, lmb->base_addr, block_sz); if (rc) { dlpar_remove_device_tree_lmb(lmb); dlpar_release_drc(lmb->drc_index); + } else { + lmb->flags |= DRCONF_MEM_ASSIGNED; } return rc; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] cxlflash: Fix to drain operations from previous reset
> On Jun 15, 2016, at 6:49 PM, Uma Krishnan wrote:
>
> From: "Manoj N. Kumar"
>
> While running 'sg_reset -H' in a loop with a user-space application active,
> hit the following exception:
>
> cpu 0x2: Vector: 300 (Data Access)
>    pc: : afu_attach+0x50/0x240 [cxlflash]
>    lr: : cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash]
>    pid = 20365, comm = run_block_fvt
>
> Linux version 4.5.0-491-26f710d+
>
> cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash]
> cxlflash_ioctl+0x5a8/0x6f0 [cxlflash]
> scsi_ioctl+0x3b0/0x4c0
> sd_ioctl+0x110/0x190
> blkdev_ioctl+0x28c/0xc20
> block_ioctl+0xa4/0xd0
> do_vfs_ioctl+0xd8/0x8c0
> SyS_ioctl+0xd4/0xf0
> system_call+0x38/0xb4
>
> The problem here is that the problem space area is unmapped while the
> application issues the DK_CXLFLASH_RECOVER_AFU ioctl.
>
> This is the order I observe:
>
> proc1                     proc2
> 1) sg_reset
>                           2) ioctl(DK_CXLFLASH_RECOVER_AFU)
> 3) sg_reset again
>    causing a PSA unmap
>                           4) continues RECOVER_AFU processing
>
> The resolution to this problem is to have the reset handler drain all
> outstanding user space initiated ioctls before proceeding. It is safe
> to drain after the state has been changed to STATE_RESET. Also since
> drain_ioctls() was static, it had to be moved up a bit to be before
> cxlflash_eh_host_reset_handler().
>
> Signed-off-by: Manoj N. Kumar

Acked-by: Matthew R. Ochs
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: unrecoverable exception on G5 with CONFIG_PPC_EARLY_DEBUG enabled
Michael Ellerman writes: > On Sat, 2016-06-18 at 22:47 +0530, Aneesh Kumar K.V wrote: >> Michael Ellerman writes: >> > On Fri, 2016-06-17 at 08:33 +1000, Benjamin Herrenschmidt wrote: >> > > On Thu, 2016-06-16 at 22:49 +0300, Denis Kirjanov wrote: >> > > > - >> > > > +BEGIN_MMU_FTR_SECTION >> > > > + b 2f >> > > > +END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX) >> > > > andi. r10,r12,MSR_RI /* check for unrecoverable exception >> > > > */ >> > > > beq- 2f >> > > >> > > Are we taking an SLB miss before we do the fixup maybe ? >> > >> > Yeah that's the only explanation that makes any sense. >> > >> > I think instead of patching down this low we should instead be redirecting >> > SLB >> > misses to unknown_exception() when radix is enabled. Aneesh? >> >> The 2f branch ends up doing unrecoverable exception. Or are you >> suggesting something else ? > > I meant more like diverting to unknown_exception() higher up the call stack, > but > that's complicated. > > How about this? Denis does this work? > > cheers > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 4c9440629128..8bcc1b457115 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX) > lwz r9,PACA_EXSLB+EX_CCR(r13) /* get saved CR */ > > mtlr r10 > -BEGIN_MMU_FTR_SECTION > - b 2f > -END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX) > andi. r10,r12,MSR_RI /* check for unrecoverable exception */ > +BEGIN_MMU_FTR_SECTION > beq- 2f > +FTR_SECTION_ELSE > + b 2f > +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX) > > .machine push > .machine "power4" I sent a patch which should get this problem fixed. http://mid.gmane.org/1466274479-5650-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com -aneesh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] tools/perf: Fix the mask in regs_dump__printf and
On Monday 20 June 2016 03:10 PM, Jiri Olsa wrote: > On Mon, Jun 20, 2016 at 05:27:25PM +0800, Wangnan (F) wrote: >> >> On 2016/6/20 17:18, Jiri Olsa wrote: >>> On Mon, Jun 20, 2016 at 02:14:01PM +0530, Madhavan Srinivasan wrote: When decoding the perf_regs mask in regs_dump__printf(), we loop through the mask using find_first_bit and find_next_bit functions. "mask" is of type "u64", but sent as a "unsigned long *" to lib functions along with sizeof(). While the exisitng code works fine in most of the case, the logic is broken when using a 32bit perf on a 64bit kernel (Big Endian). We end up reading the wrong word of the u64 first in the lib functions. >>> hum, I still don't see why this happens.. why do we read the >>> wrong word in this case? >> If you read a u64 using (u32 *)()[0] and (u32 *)()[1] >> you can get wrong value. This is what _find_next_bit() is doing. Also in find_first_bit(). >> >> In a big endian environment where 'unsigned long' is 32 bits >> long, "(u32 *)()[0]" gets upper 32 bits, but without this patch >> perf assumes it gets lower 32 bits. The root cause is wrongly convert >> u64 value to bitmap. > i see, could you please put this into comment in the code? > > also we could have common function for that, to keep it on > one place only, like bitmap_from_u64 or so Sure will do. > > thanks, > jirka > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] tools/perf: Fix the mask in regs_dump__printf and
On Mon, Jun 20, 2016 at 05:27:25PM +0800, Wangnan (F) wrote: > > > On 2016/6/20 17:18, Jiri Olsa wrote: > > On Mon, Jun 20, 2016 at 02:14:01PM +0530, Madhavan Srinivasan wrote: > > > When decoding the perf_regs mask in regs_dump__printf(), > > > we loop through the mask using find_first_bit and find_next_bit functions. > > > "mask" is of type "u64", but sent as a "unsigned long *" to > > > lib functions along with sizeof(). While the exisitng code works fine in > > > most of the case, the logic is broken when using a 32bit perf on a > > > 64bit kernel (Big Endian). We end up reading the wrong word of the u64 > > > first in the lib functions. > > hum, I still don't see why this happens.. why do we read the > > wrong word in this case? > > If you read a u64 using (u32 *)()[0] and (u32 *)()[1] > you can get wrong value. This is what _find_next_bit() is doing. > > In a big endian environment where 'unsigned long' is 32 bits > long, "(u32 *)()[0]" gets upper 32 bits, but without this patch > perf assumes it gets lower 32 bits. The root cause is wrongly convert > u64 value to bitmap. i see, could you please put this into comment in the code? also we could have common function for that, to keep it on one place only, like bitmap_from_u64 or so thanks, jirka ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] tools/perf: Fix the mask in regs_dump__printf and
On 2016/6/20 17:18, Jiri Olsa wrote:
> On Mon, Jun 20, 2016 at 02:14:01PM +0530, Madhavan Srinivasan wrote:
>> When decoding the perf_regs mask in regs_dump__printf(),
>> we loop through the mask using find_first_bit and find_next_bit functions.
>> "mask" is of type "u64", but sent as a "unsigned long *" to
>> lib functions along with sizeof(). While the exisitng code works fine in
>> most of the case, the logic is broken when using a 32bit perf on a
>> 64bit kernel (Big Endian). We end up reading the wrong word of the u64
>> first in the lib functions.
> hum, I still don't see why this happens.. why do we read the
> wrong word in this case?

If you read a u64 using (u32 *)()[0] and (u32 *)()[1]
you can get wrong value. This is what _find_next_bit() is doing.

In a big endian environment where 'unsigned long' is 32 bits
long, "(u32 *)()[0]" gets upper 32 bits, but without this patch
perf assumes it gets lower 32 bits. The root cause is wrongly convert
u64 value to bitmap.

Thank you.
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
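To make the explanation above concrete, here is a small sketch of the kind of common helper suggested in this thread. The name `bitmap_from_u64` follows the suggestion; the body mirrors the two assignments in the posted patch. `bitmap_test_bit()` and the two macros are hypothetical additions used only to show that bit N of the u64 ends up as bit N of the bitmap, whatever the width or byte order of `unsigned long`.

```c
#include <limits.h>
#include <stdint.h>

#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)
#define U64_ULONGS	(sizeof(uint64_t) / sizeof(unsigned long))

/*
 * Copy a u64 mask into an array of unsigned long so that word 0 always
 * holds the low-order bits, regardless of host endianness. 'dst' must
 * have room for U64_ULONGS words.
 */
static void bitmap_from_u64(unsigned long *dst, uint64_t mask)
{
	dst[0] = (unsigned long)(mask & ULONG_MAX);

	/* Only taken when unsigned long is 32 bits wide. */
	if (sizeof(mask) > sizeof(unsigned long))
		dst[1] = (unsigned long)(mask >> 32);
}

/* Test bit 'nr' in a bitmap laid out as above. */
static int bitmap_test_bit(const unsigned long *map, unsigned int nr)
{
	return (map[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1UL;
}
```

Casting the address of the u64 to `unsigned long *` instead only gives the same layout when `unsigned long` is 64 bits wide or the host is little endian, which is why the original code appeared to work in most configurations.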
Re: unrecoverable exception on G5 with CONFIG_PPC_EARLY_DEBUG enabled
On Sat, 2016-06-18 at 22:47 +0530, Aneesh Kumar K.V wrote: > Michael Ellerman writes: > > On Fri, 2016-06-17 at 08:33 +1000, Benjamin Herrenschmidt wrote: > > > On Thu, 2016-06-16 at 22:49 +0300, Denis Kirjanov wrote: > > > > - > > > > +BEGIN_MMU_FTR_SECTION > > > > + b 2f > > > > +END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX) > > > > andi. r10,r12,MSR_RI /* check for unrecoverable exception > > > > */ > > > > beq- 2f > > > > > > Are we taking an SLB miss before we do the fixup maybe ? > > > > Yeah that's the only explanation that makes any sense. > > > > I think instead of patching down this low we should instead be redirecting > > SLB > > misses to unknown_exception() when radix is enabled. Aneesh? > > The 2f branch ends up doing unrecoverable exception. Or are you > suggesting something else ? I meant more like diverting to unknown_exception() higher up the call stack, but that's complicated. How about this? Denis does this work? cheers diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 4c9440629128..8bcc1b457115 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX) lwz r9,PACA_EXSLB+EX_CCR(r13) /* get saved CR */ mtlr r10 -BEGIN_MMU_FTR_SECTION - b 2f -END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX) andi. r10,r12,MSR_RI /* check for unrecoverable exception */ +BEGIN_MMU_FTR_SECTION beq- 2f +FTR_SECTION_ELSE + b 2f +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX) .machine push .machine "power4" ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] tools/perf: Fix the mask in regs_dump__printf and
On Mon, Jun 20, 2016 at 02:14:01PM +0530, Madhavan Srinivasan wrote: SNIP > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > index e3ce2f34d3ad..76d5006ebcc3 100644 > --- a/tools/perf/builtin-script.c > +++ b/tools/perf/builtin-script.c > @@ -412,11 +412,16 @@ static void print_sample_iregs(struct perf_sample > *sample, > struct regs_dump *regs = &sample->intr_regs; > uint64_t mask = attr->sample_regs_intr; > unsigned i = 0, r; > + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; > > if (!regs) > return; > > - for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) { > + _mask[0] = mask & ULONG_MAX; > + if (sizeof(mask) > sizeof(unsigned long)) > + _mask[1] = mask >> 32; > + > + for_each_set_bit(r, _mask, sizeof(mask) * 8) { > u64 val = regs->regs[i++]; > printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val); > } > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index 5214974e841a..2eaa42a4832a 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -940,8 +940,13 @@ static void branch_stack__printf(struct perf_sample > *sample) > static void regs_dump__printf(u64 mask, u64 *regs) > { > unsigned rid, i = 0; > + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; > > - for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) { > + _mask[0] = mask & ULONG_MAX; > + if (sizeof(mask) > sizeof(unsigned long)) > + _mask[1] = mask >> 32; > + > + for_each_set_bit(rid, _mask, sizeof(mask) * 8) { > u64 val = regs[i++]; could you please move the common code into the function? thanks, jirka ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] tools/perf: Fix the mask in regs_dump__printf and
On Mon, Jun 20, 2016 at 02:14:01PM +0530, Madhavan Srinivasan wrote: > When decoding the perf_regs mask in regs_dump__printf(), > we loop through the mask using find_first_bit and find_next_bit functions. > "mask" is of type "u64", but sent as a "unsigned long *" to > lib functions along with sizeof(). While the exisitng code works fine in > most of the case, the logic is broken when using a 32bit perf on a > 64bit kernel (Big Endian). We end up reading the wrong word of the u64 > first in the lib functions. hum, I still don't see why this happens.. why do we read the wrong word in this case? thanks, jirka > first in the lib functions. Proposed fix is to swap the words of the > u64 to handle this case. This is not endianess swap. > SNIP ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] devpts: remove DEVPTS_MULTIPLE_INSTANCES from all configs
As each mount of devpts is now an independent filesystem, the DEVPTS_MULTIPLE_INSTANCES config option no longer exists. So remove it. Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com> --- arch/arc/configs/tb10x_defconfig| 1 - arch/arm/configs/mxs_defconfig | 1 - arch/mips/configs/ip27_defconfig| 1 - arch/mips/configs/nlm_xlp_defconfig | 1 - arch/mips/configs/nlm_xlr_defconfig | 1 - arch/parisc/configs/generic-64bit_defconfig | 1 - arch/powerpc/configs/powernv_defconfig | 1 - arch/powerpc/configs/pseries_defconfig | 1 - arch/s390/configs/default_defconfig | 1 - arch/s390/configs/gcov_defconfig| 1 - arch/s390/configs/performance_defconfig | 1 - arch/sh/configs/sh7785lcr_32bit_defconfig | 1 - arch/xtensa/configs/iss_defconfig | 1 - tools/testing/selftests/mount/config| 1 - 14 files changed, 14 deletions(-) diff --git a/arch/arc/configs/tb10x_defconfig b/arch/arc/configs/tb10x_defconfig index 4c51183..be0b4fb 100644 --- a/arch/arc/configs/tb10x_defconfig +++ b/arch/arc/configs/tb10x_defconfig @@ -58,7 +58,6 @@ CONFIG_STMMAC_ETH=y # CONFIG_INPUT is not set # CONFIG_SERIO is not set # CONFIG_VT is not set -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y # CONFIG_LEGACY_PTYS is not set # CONFIG_DEVKMEM is not set CONFIG_SERIAL_8250=y diff --git a/arch/arm/configs/mxs_defconfig b/arch/arm/configs/mxs_defconfig index 6e0f751..65a84b4 100644 --- a/arch/arm/configs/mxs_defconfig +++ b/arch/arm/configs/mxs_defconfig @@ -77,7 +77,6 @@ CONFIG_INPUT_EVDEV=y CONFIG_INPUT_TOUCHSCREEN=y CONFIG_TOUCHSCREEN_TSC2007=m # CONFIG_SERIO is not set -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y # CONFIG_LEGACY_PTYS is not set # CONFIG_DEVKMEM is not set CONFIG_SERIAL_AMBA_PL011=y diff --git a/arch/mips/configs/ip27_defconfig b/arch/mips/configs/ip27_defconfig index 2b74aee..df11563 100644 --- a/arch/mips/configs/ip27_defconfig +++ b/arch/mips/configs/ip27_defconfig @@ -266,7 +266,6 @@ CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_EXTENDED=y CONFIG_SERIAL_8250_MANY_PORTS=y 
CONFIG_SERIAL_8250_SHARE_IRQ=y -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y CONFIG_HW_RANDOM_TIMERIOMEM=m CONFIG_I2C_CHARDEV=m CONFIG_I2C_ALI1535=m diff --git a/arch/mips/configs/nlm_xlp_defconfig b/arch/mips/configs/nlm_xlp_defconfig index b496c25..5c40b48 100644 --- a/arch/mips/configs/nlm_xlp_defconfig +++ b/arch/mips/configs/nlm_xlp_defconfig @@ -409,7 +409,6 @@ CONFIG_SERIO_SERPORT=m CONFIG_SERIO_LIBPS2=y CONFIG_SERIO_RAW=m CONFIG_VT_HW_CONSOLE_BINDING=y -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y CONFIG_LEGACY_PTY_COUNT=0 CONFIG_SERIAL_NONSTANDARD=y CONFIG_N_HDLC=m diff --git a/arch/mips/configs/nlm_xlr_defconfig b/arch/mips/configs/nlm_xlr_defconfig index 8e99ad8..47a2756 100644 --- a/arch/mips/configs/nlm_xlr_defconfig +++ b/arch/mips/configs/nlm_xlr_defconfig @@ -347,7 +347,6 @@ CONFIG_SERIO_SERPORT=m CONFIG_SERIO_LIBPS2=y CONFIG_SERIO_RAW=m CONFIG_VT_HW_CONSOLE_BINDING=y -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y CONFIG_LEGACY_PTY_COUNT=0 CONFIG_SERIAL_NONSTANDARD=y CONFIG_N_HDLC=m diff --git a/arch/parisc/configs/generic-64bit_defconfig b/arch/parisc/configs/generic-64bit_defconfig index e945c08..69aa66c 100644 --- a/arch/parisc/configs/generic-64bit_defconfig +++ b/arch/parisc/configs/generic-64bit_defconfig @@ -166,7 +166,6 @@ CONFIG_INPUT_MISC=y CONFIG_SERIO_SERPORT=m # CONFIG_HP_SDC is not set CONFIG_SERIO_RAW=m -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y # CONFIG_LEGACY_PTYS is not set CONFIG_NOZOMI=m # CONFIG_DEVKMEM is not set diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig index 0450310..2fd6bbe 100644 --- a/arch/powerpc/configs/powernv_defconfig +++ b/arch/powerpc/configs/powernv_defconfig @@ -176,7 +176,6 @@ CONFIG_PPP_SYNC_TTY=m CONFIG_INPUT_EVDEV=m CONFIG_INPUT_MISC=y # CONFIG_SERIO_SERPORT is not set -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_JSM=m diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 36871a4..3c325ba 100644 --- 
a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -180,7 +180,6 @@ CONFIG_INPUT_EVDEV=m CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=m # CONFIG_SERIO_SERPORT is not set -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_ICOM=m diff --git a/arch/s390/configs/default_defconfig b/arch/s390/configs/default_defconfig index d5ec71b..501e93a 100644 --- a/arch/s390/configs/default_defconfig +++ b/arch/s390/configs/default_defconfig @@ -453,7 +453,6 @@ CONFIG_PPP_SYNC_TTY=m # CONFIG_INPUT_KEYBOARD is not set # CONFIG_INPUT_MOUSE is not set # CONFIG_SERIO is not set -CONFIG_DEVPTS_MULTIPLE_INSTANCES=y CONFIG_LEGACY_PTY_COUNT=0 CONFIG_HW_RANDOM_VIRTIO=m CONFIG_RAW_DRIVER=m diff --git a/arch/s390/configs/gcov_defconfig b/arch/s390/configs/gcov_defconfig index
Re: [v6, 1/2] cxl: Add mechanism for delivering AFU driver specific events
Hi Philippe,

A few points:

Philippe Bergheaud writes:
> +int cxl_unset_driver_ops(struct cxl_context *ctx)
> +{
> +	if (atomic_read(&ctx->afu_driver_events))
> +		return -EBUSY;
> +
> +	ctx->afu_driver_ops = NULL;

This needs a write memory barrier so that afu_driver_ops cannot be called after this store.

>
> -static inline int ctx_event_pending(struct cxl_context *ctx)
> +static ssize_t afu_driver_event_copy(struct cxl_context *ctx,
> +				     char __user *buf,
> +				     struct cxl_event *event,
> +				     struct cxl_event_afu_driver_reserved *pl)
> {
> -	return (ctx->pending_irq || ctx->pending_fault ||
> -		ctx->pending_afu_err || (ctx->status == CLOSED));
> +	/* Check data size */
> +	if (!pl || (pl->data_size > CXL_MAX_EVENT_DATA_SIZE)) {
> +		ctx->afu_driver_ops->event_delivered(ctx, pl, -EINVAL);
> +		return -EFAULT;
> +	}

This should also check against the length of the user buffer (count) provided in the read call. Ideally this check should be moved to the read call, where you have access to the count variable. Right now libcxl uses a hardcoded value of CXL_READ_MIN_SIZE to issue the read call, and in kernel code we have a check to ensure that the read buffer is at least CXL_READ_MIN_SIZE in size. But it might be a good idea to decouple the driver from CXL_MAX_EVENT_DATA_SIZE. Ideally the maximum event size that we can support should depend on the amount of user buffer we receive in the read call. That way a future libcxl can support larger event_data without needing a change to cxl.h.

> diff --git a/include/uapi/misc/cxl.h b/include/uapi/misc/cxl.h
> index 8cd334f..4fa36e4 100644
> --- a/include/uapi/misc/cxl.h
> +++ b/include/uapi/misc/cxl.h
> @@ -93,6 +93,7 @@ enum cxl_event_type {
> 	CXL_EVENT_AFU_INTERRUPT = 1,
> 	CXL_EVENT_DATA_STORAGE = 2,
> 	CXL_EVENT_AFU_ERROR = 3,
> +	CXL_EVENT_AFU_DRIVER = 4,
> };
>
> struct cxl_event_header {
> @@ -124,12 +125,32 @@ struct cxl_event_afu_error {
> 	__u64 error;
> };
>
> +#define CXL_MAX_EVENT_DATA_SIZE 128
> +

I agree with Matt's earlier comments. 128 is very small and I would prefer a limit of at least a page size (4K/64K).

Cheers,
~ Vaibhav
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
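As an illustration of the suggestion to bound the event by the size of the buffer passed to read() rather than by a fixed constant, a check along the following lines could be used. This is only a sketch: `event_read_size()` and its parameters are hypothetical and do not correspond to any function in the cxl driver.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide how many bytes a read() with a buffer of
 * 'count' bytes would receive for an event made of a fixed header plus a
 * variable payload. Returns the total size, or -EINVAL if the event does
 * not fit in the caller's buffer.
 */
static long event_read_size(size_t count, size_t header_size, size_t data_size)
{
	/* The buffer must at least hold the fixed header. */
	if (count < header_size)
		return -EINVAL;

	/* Compare against the remaining space to avoid overflow in the sum. */
	if (data_size > count - header_size)
		return -EINVAL;

	return (long)(header_size + data_size);
}
```

With a check of this shape the largest deliverable payload grows with the buffer the caller supplies, so no maximum needs to be fixed in the uapi header.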
[PATCH v2] tools/perf: Fix the mask in regs_dump__printf and
When decoding the perf_regs mask in regs_dump__printf(), we loop through the mask using find_first_bit and find_next_bit functions. "mask" is of type "u64", but sent as a "unsigned long *" to lib functions along with sizeof(). While the existing code works fine in most cases, the logic is broken when using a 32bit perf on a 64bit kernel (Big Endian). We end up reading the wrong word of the u64 first in the lib functions. Proposed fix is to swap the words of the u64 to handle this case. This is not an endianness swap. Suggested-by: Yury Norov Cc: Yury Norov Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Adrian Hunter Cc: Kan Liang Cc: Wang Nan Cc: Michael Ellerman Signed-off-by: Madhavan Srinivasan --- Changelog v1: 1) updated commit message and patch subject 2) Add the fix to print_sample_iregs() in builtin-script.c tools/perf/builtin-script.c | 7 ++- tools/perf/util/session.c | 7 ++- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index e3ce2f34d3ad..76d5006ebcc3 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -412,11 +412,16 @@ static void print_sample_iregs(struct perf_sample *sample, struct regs_dump *regs = &sample->intr_regs; uint64_t mask = attr->sample_regs_intr; unsigned i = 0, r; + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; if (!regs) return; - for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) { + _mask[0] = mask & ULONG_MAX; + if (sizeof(mask) > sizeof(unsigned long)) + _mask[1] = mask >> 32; + + for_each_set_bit(r, _mask, sizeof(mask) * 8) { u64 val = regs->regs[i++]; printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val); } diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 5214974e841a..2eaa42a4832a 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -940,8 +940,13 @@ static void branch_stack__printf(struct perf_sample *sample) static void 
regs_dump__printf(u64 mask, u64 *regs) { unsigned rid, i = 0; + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; - for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) { + _mask[0] = mask & ULONG_MAX; + if (sizeof(mask) > sizeof(unsigned long)) + _mask[1] = mask >> 32; + + for_each_set_bit(rid, _mask, sizeof(mask) * 8) { u64 val = regs[i++]; printf(" %-5s 0x%" PRIx64 "\n", -- 1.9.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 12/12] leds: Only descend into leds directory when CONFIG_NEW_LEDS is set
On 06/18/2016 12:46 AM, Andrew F. Davis wrote:
> On 06/15/2016 01:48 AM, Jacek Anaszewski wrote:
>> Hi Andrew,
>>
>> Thanks for the patch.
>>
>> Please address the issue [1] raised by test bot and resubmit.
>>
>> Thanks,
>> Jacek Anaszewski
>>
>> [1] https://lkml.org/lkml/2016/6/13/1091
>
> It looks like some systems use 'gpio_led_register_device' to make an
> in-memory copy of their LED device table so the original can be removed
> as .init.rodata. This doesn't necessarily depend on the LED subsystem
> but it kind of seems useless when the rest of the subsystem is disabled.
>
> One solution could be to use a dummy 'gpio_led_register_device' when the
> subsystem is not enabled.

It sounds good. Please add a no-op version of gpio_led_register_device()
to include/leds.h, in a separate patch.

Thanks,
Jacek Anaszewski

> Another is just to remove the five or so uses of
> 'gpio_led_register_device' and have those systems register LED device
> tables like other systems do.
>
> If neither of these is acceptable then this patch can be dropped from
> this series for now.
>
> Thanks,
> Andrew
>
>> On 06/13/2016 10:02 PM, Andrew F. Davis wrote:
>>> When CONFIG_NEW_LEDS is not set make will still descend into the leds
>>> directory but nothing will be built. This produces unneeded build
>>> artifacts and messages in addition to slowing the build. Fix this here.
>>>
>>> Signed-off-by: Andrew F. Davis
>>> ---
>>>  drivers/Makefile | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/Makefile b/drivers/Makefile
>>> index 567e32c..fa514d5 100644
>>> --- a/drivers/Makefile
>>> +++ b/drivers/Makefile
>>> @@ -127,7 +127,7 @@ obj-$(CONFIG_CPU_FREQ)	+= cpufreq/
>>>  obj-$(CONFIG_CPU_IDLE)	+= cpuidle/
>>>  obj-$(CONFIG_MMC)	+= mmc/
>>>  obj-$(CONFIG_MEMSTICK)	+= memstick/
>>> -obj-y	+= leds/
>>> +obj-$(CONFIG_NEW_LEDS)	+= leds/
>>>  obj-$(CONFIG_INFINIBAND)	+= infiniband/
>>>  obj-$(CONFIG_SGI_SN)	+= sn/
>>>  obj-y	+= firmware/
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
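The no-op stub discussed above usually takes the following shape in a header. This is a minimal sketch, not the patch that was eventually posted: the signature of gpio_led_register_device() is assumed here, the two struct types are left opaque as stand-ins for the kernel definitions, and returning NULL from the stub is an illustrative choice.

```c
#include <stddef.h>

/* Opaque stand-ins; the real definitions live in the kernel headers. */
struct platform_device;
struct gpio_led_platform_data;

#ifdef CONFIG_NEW_LEDS
/* Real implementation, built only when the LED subsystem is enabled. */
struct platform_device *gpio_led_register_device(
		int id, const struct gpio_led_platform_data *pdata);
#else
/* No-op stub: board code still compiles and simply registers nothing. */
static inline struct platform_device *gpio_led_register_device(
		int id, const struct gpio_led_platform_data *pdata)
{
	(void)id;
	(void)pdata;
	return NULL;
}
#endif
```

Board files can then keep calling gpio_led_register_device() unconditionally, and drivers/Makefile no longer needs to descend into leds/ when the subsystem is disabled.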
Re: [PATCH V2] powerpc/mm: Don't do debug print before we do feature fixup
On 6/19/16, Benjamin Herrenschmidt wrote: > On Sat, 2016-06-18 at 23:57 +0530, Aneesh Kumar K.V wrote: >> We use feature fixup in segment and page fault path and hence we should >> not call any function that can cause either of these before we finish >> feature fixup. >> >> Calling into early debug routine can result in segment fault as updated >> in >> https://lkml.kernel.org/r/CAOJe8K2L8D1M_yZPmyiZ11JvJ+HAS0aFHOnsqaY=xlonqta...@mail.gmail.com >> >> Avoid the segment fault by removing the debug print. Also remove the >> matching closing debug print at the end of the function. > > Please add a comment at the beginning of setup_system() explaining > the situation. IE, that until the fixups are done, not even debug > printfs can be used. > Right, otherwise we'll hit the same issue in the future. > It used to work though, the fact that we used 256M segment in that > case wasn't an issue, in part because btext only really existed on > machines that didn't have 1T segments. It was only ever going to be > a segment fault, the rest was bolted (early ioremap). > > Ben. > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev