[RESEND PATCH] x86/boot/KASLR: Extend movable_node option for KASLR
The movable_node option is a boot-time switch that makes sure physical NUMA
nodes can be hot-added/removed when the ACPI table can't be parsed to provide
the memory hotplug information. There is always one node, called the "home
node", which can't be made movable and in which the kernel image resides. With
the movable_node option, Linux allocates new early memory near the kernel
image to avoid using the other movable nodes.

However, because KASLR also can't get the memory hotplug information, it may
randomize the kernel image into a movable node, which breaks the rule of the
movable_node option and makes the physical hot-add/remove operation fail.

The ideal solution is to provide the memory hotplug information to KASLR, but
that needs effort from both hardware engineers and software engineers.

Here is an alternative method: extend the movable_node option with a parameter
that restricts the kernel to being randomized within the home node. This
parameter sets the boundary between the home node and the other nodes.

Reported-by: Chao Fan
Signed-off-by: Dou Liyang
Reviewed-by: Kees Cook
---
Changelog:
 - Rewrite the commit log and document.

 Documentation/admin-guide/kernel-parameters.txt | 12 ++--
 arch/x86/boot/compressed/kaslr.c                | 19 ---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1d1d53f85ddd..0cfc0b10a117 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2353,7 +2353,8 @@
 	mousedev.yres=	[MOUSE] Vertical screen resolution, used for devices
 			reporting absolute coordinates, such as tablets
 
-	movablecore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
+	movablecore=nn[KMG]
+			[KNL,X86,IA-64,PPC] This parameter
 			is similar to kernelcore except it specifies the
 			amount of memory used for migratable allocations.
 			If both kernelcore and movablecore is specified,
@@ -2363,12 +2364,19 @@
 			that the amount of memory usable for all allocations
 			is not too small.
 
-	movable_node	[KNL] Boot-time switch to make hotplugable memory
+	movable_node	[KNL] Boot-time switch to make hot-pluggable memory
 			NUMA nodes to be movable. This means that the memory
 			of such nodes will be usable only for movable
 			allocations which rules out almost all kernel
 			allocations. Use with caution!
 
+	movable_node=nn[KMG]
+			[KNL] Extend movable_node to make it work well with KASLR.
+			This parameter is the boundaries between the "home node" and
+			the other nodes. The "home node" is an immovable node and is
+			defined by BIOS. Set the 'nn' to the memory size of "home
+			node", the kernel image will be extracted in immovable nodes.
+
 	MTD_Partition=	[MTD]
 			Format: ,,,
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8199a6187251..f906d7890e69 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -92,7 +92,10 @@ struct mem_vector {
 
 static bool memmap_too_large;
 
-/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
+/*
+ * Store memory limit specified by the following situations:
+ * "mem=nn[KMG]" or "memmap=nn[KMG]" or "movable_node=nn[KMG]"
+ */
 unsigned long long mem_limit = ULLONG_MAX;
 
 
@@ -214,7 +217,8 @@ static int handle_mem_memmap(void)
 	char *param, *val;
 	u64 mem_size;
 
-	if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+	if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+	    !strstr(args, "movable_node="))
 		return 0;
 
 	tmp_cmdline = malloc(len + 1);
@@ -249,7 +253,16 @@ static int handle_mem_memmap(void)
 				free(tmp_cmdline);
 				return -EINVAL;
 			}
-			mem_limit = mem_size;
+			mem_limit = mem_limit > mem_size ? mem_size : mem_limit;
+		} else if (!strcmp(param, "movable_node")) {
+			char *p = val;
+
+			mem_size = memparse(p, &p);
+			if (mem_size == 0) {
+				free(tmp_cmdline);
+				return -EINVAL;
+			}
+			mem_limit = mem_limit > mem_size ? mem_size : mem_limit;
 		}
 	}
 
-- 
2.14.3
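
For readers who want to see the effect of the "mem_limit > mem_size ? mem_size : mem_limit" change in isolation, here is a minimal user-space sketch. It is not the code from kaslr.c: parse_size() is a made-up stand-in for the kernel's memparse(), clamp_limit() is a made-up wrapper around the ?: expression, and -1 stands in for -EINVAL.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the kernel's memparse(): parse a number with
 * an optional K/M/G suffix. This is not the kernel implementation.
 */
static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Keep the smallest limit seen so far, which is what the patch does with
 * "mem_limit > mem_size ? mem_size : mem_limit". A size of zero is
 * rejected, as in the patch (which returns -EINVAL).
 */
static int clamp_limit(unsigned long long *limit, const char *val)
{
	unsigned long long size = parse_size(val);

	if (size == 0)
		return -1;
	if (size < *limit)
		*limit = size;
	return 0;
}
```

Starting from ULLONG_MAX and feeding "4G", then "16G", then "512M" leaves the limit at 512M, so the order of mem=, memmap= and movable_node= on the command line no longer matters.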
Re: [PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V
On Mon, 02 Apr 2018 05:31:22 PDT (-0700), alan...@andestech.com wrote:
> This implements the baseline PMU for RISC-V platforms.
>
> To ease future PMU portings, a guide is also written, containing perf
> concepts, arch porting practices and some hints.
>
> Changes in v2:
>  - Fix the bug reported by Alex, which was caused by not sufficient
>    initialization. Check https://lkml.org/lkml/2018/3/31/251 for the
>    discussion.
>
> Alan Kao (2):
>   perf: riscv: preliminary RISC-V support
>   perf: riscv: Add Document for Future Porting Guide
>
>  Documentation/riscv/pmu.txt         | 249 +++
>  arch/riscv/Kconfig                  |  12 +
>  arch/riscv/include/asm/perf_event.h |  76 +-
>  arch/riscv/kernel/Makefile          |   1 +
>  arch/riscv/kernel/perf_event.c      | 468
>  5 files changed, 802 insertions(+), 4 deletions(-)
>  create mode 100644 Documentation/riscv/pmu.txt
>  create mode 100644 arch/riscv/kernel/perf_event.c

I'm having some trouble pulling this into my tree. I think you might have
another patch floating around somewhere, as I don't have any
arch/riscv/include/asm/perf_event.h right now.

Do you mind rebasing this on top of linux-4.16 so I can look properly?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] perf: riscv: preliminary RISC-V support
This works for cycle and instruction counts.

Alex

On Mon, Apr 2, 2018 at 5:31 AM, Alan Kao wrote:
>
> This patch provide a basic PMU, riscv_base_pmu, which supports two
> general hardware event, instructions and cycles. Furthermore, this
> PMU serves as a reference implementation to ease the portings in
> the future.
>
> riscv_base_pmu should be able to run on any RISC-V machine that
> conforms to the Priv-Spec. Note that the latest qemu model hasn't
> fully support a proper behavior of Priv-Spec 1.10 yet, but work
> around should be easy with very small fixes. Please check
> https://github.com/riscv/riscv-qemu/pull/115 for future updates.
>
> Cc: Nick Hu
> Cc: Greentime Hu
> Signed-off-by: Alan Kao
> ---
>  arch/riscv/Kconfig                  |  12 +
>  arch/riscv/include/asm/perf_event.h |  76 +-
>  arch/riscv/kernel/Makefile          |   1 +
>  arch/riscv/kernel/perf_event.c      | 468
>  4 files changed, 553 insertions(+), 4 deletions(-)
>  create mode 100644 arch/riscv/kernel/perf_event.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index c22ebe08e902..3fbf19456c9a 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -203,6 +203,18 @@ config RISCV_ISA_C
>  config RISCV_ISA_A
>  	def_bool y
>
> +menu "PMU type"
> +	depends on PERF_EVENTS
> +
> +config RISCV_BASE_PMU
> +	bool "Base Performance Monitoring Unit"
> +	def_bool y
> +	help
> +	  A base PMU that serves as a reference implementation and has limited
> +	  feature of perf.
> +
> +endmenu
> +
>  endmenu
>
>  menu "Kernel type"
> diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
> index e13d2ff29e83..98e2efb02d25 100644
> --- a/arch/riscv/include/asm/perf_event.h
> +++ b/arch/riscv/include/asm/perf_event.h
> @@ -1,13 +1,81 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
>  /*
>   * Copyright (C) 2018 SiFive
> + * Copyright (C) 2018 Andes Technology Corporation
>   *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public Licence
> - * as published by the Free Software Foundation; either version
> - * 2 of the Licence, or (at your option) any later version.
>   */
>
>  #ifndef _ASM_RISCV_PERF_EVENT_H
>  #define _ASM_RISCV_PERF_EVENT_H
>
> +#include
> +#include
> +
> +#define RISCV_BASE_COUNTERS	2
> +
> +/*
> + * The RISCV_MAX_COUNTERS parameter should be specified.
> + */
> +
> +#ifdef CONFIG_RISCV_BASE_PMU
> +#define RISCV_MAX_COUNTERS	2
> +#endif
> +
> +#ifndef RISCV_MAX_COUNTERS
> +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
> +#endif
> +
> +/*
> + * These are the indexes of bits in counteren register *minus* 1,
> + * except for cycle. It would be coherent if it can directly mapped
> + * to counteren bit definition, but there is a *time* register at
> + * counteren[1]. Per-cpu structure is scarce resource here.
> + *
> + * According to the spec, an implementation can support counter up to
> + * mhpmcounter31, but many high-end processors has at most 6 general
> + * PMCs, we give the definition to MHPMCOUNTER8 here.
> + */
> +#define RISCV_PMU_CYCLE		0
> +#define RISCV_PMU_INSTRET	1
> +#define RISCV_PMU_MHPMCOUNTER3	2
> +#define RISCV_PMU_MHPMCOUNTER4	3
> +#define RISCV_PMU_MHPMCOUNTER5	4
> +#define RISCV_PMU_MHPMCOUNTER6	5
> +#define RISCV_PMU_MHPMCOUNTER7	6
> +#define RISCV_PMU_MHPMCOUNTER8	7
> +
> +#define RISCV_OP_UNSUPP		(-EOPNOTSUPP)
> +
> +struct cpu_hw_events {
> +	/* # currently enabled events*/
> +	int			n_events;
> +	/* currently enabled events */
> +	struct perf_event	*events[RISCV_MAX_COUNTERS];
> +	/* vendor-defined PMU data */
> +	void			*platform;
> +};
> +
> +struct riscv_pmu {
> +	struct pmu	*pmu;
> +
> +	/* generic hw/cache events table */
> +	const int	*hw_events;
> +	const int	(*cache_events)[PERF_COUNT_HW_CACHE_MAX]
> +				       [PERF_COUNT_HW_CACHE_OP_MAX]
> +				       [PERF_COUNT_HW_CACHE_RESULT_MAX];
> +	/* method used to map hw/cache events */
> +	int		(*map_hw_event)(u64 config);
> +	int		(*map_cache_event)(u64 config);
> +
> +	/* max generic hw events in map */
> +	int		max_events;
> +	/* number total counters, 2(base) + x(general) */
> +	int		num_counters;
> +	/* the width of the counter */
> +	int		counter_width;
> +
> +	/* vendor-defined PMU features */
> +	void		*platform;
> +};
> +
>  #endif /* _ASM_RISCV_PERF_EVENT_H */
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index ffa439d4a364..f50d19816757 100644
>
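
The comment in the quoted header says the RISCV_PMU_* indexes are the counteren bit positions minus one, except for cycle, because bit 1 of counteren belongs to the time register. A small sketch of that mapping, assuming nothing beyond the comment itself, could look like the following. The helper name pmu_idx_to_counteren_bit() is hypothetical and does not appear in the patch.

```c
/* Index values copied from the quoted header. */
#define RISCV_PMU_CYCLE		0
#define RISCV_PMU_INSTRET	1
#define RISCV_PMU_MHPMCOUNTER3	2
#define RISCV_PMU_MHPMCOUNTER8	7

/*
 * Hypothetical helper (not part of the patch): translate a PMU index into
 * its bit position in the counteren register. Bit 1 is the time register,
 * so every index except cycle is shifted up by one.
 */
static inline unsigned int pmu_idx_to_counteren_bit(unsigned int idx)
{
	return idx == RISCV_PMU_CYCLE ? 0 : idx + 1;
}
```

Skipping the time bit this way keeps the per-CPU events[] array dense, which appears to be what the remark about per-cpu structures being a scarce resource is getting at.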
Re: [PATCH v4 6/6] coresight: etm4x: Support panic kdump
On Fri, Mar 30, 2018 at 11:15:24AM +0800, Leo Yan wrote:
> ETMv4 hardware information and configuration needs to be saved as
> metadata; the metadata format should be compatible with 'perf' tool and
> finally is used by tracing data decoder. ETMv4 works as tracer per CPU,
> we cannot wait for gathering ETM info after CPU panic has happened in
> case there have CPU is locked up and can't response inter-processor
> interrupt for execution dump operations; so it's more reliable to gather
> tracer metadata when all of the CPUs are alive.
>
> This patch saves ETMv4 metadata but with the different method for
> different registers. Since values in TRCIDR{0, 1, 2, 8} and
> TRCAUTHSTATUS are read-only and won't change afterward, thus those
> registers values are filled into metadata structure when tracers are
> instantiated. The configuration and control registers TRCCONFIGR and
> TRCTRACEIDR are dynamically configured, their values are recorded during
> tracer enabling phase.
>
> To avoid unnecessary overload introduced by set/clear operations for
> updating kdump node, we only set ETMv4 metadata info for the
> corresponding kdump node at initialization and won't be cleared anymore.
>
> Suggested-by: Mathieu Poirier
> Signed-off-by: Leo Yan
> ---
>  drivers/hwtracing/coresight/coresight-etm4x.c | 27 +++
>  drivers/hwtracing/coresight/coresight-etm4x.h | 15 +++
>  2 files changed, 42 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
> index cf364a5..88b1e19 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.c
> @@ -288,6 +288,8 @@ static int etm4_enable(struct coresight_device *csdev,
>  	int ret;
>  	u32 val;
>  	struct etmv4_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etmv4_config *config = &drvdata->config;
> +	struct etmv4_metadata *metadata = &drvdata->metadata;
>
>  	val = local_cmpxchg(&drvdata->mode, CS_MODE_DISABLED, mode);
>
> @@ -306,6 +308,10 @@ static int etm4_enable(struct coresight_device *csdev,
>  		ret = -EINVAL;
>  	}
>
> +	/* Update tracer meta data after tracer configuration */
> +	metadata->trcconfigr = config->cfg;
> +	metadata->trctraceidr = drvdata->trcid;
> +
>  	/* The tracer didn't start */
>  	if (ret)
>  		local_set(&drvdata->mode, CS_MODE_DISABLED);
> @@ -438,6 +444,7 @@ static void etm4_init_arch_data(void *info)
>  	u32 etmidr4;
>  	u32 etmidr5;
>  	struct etmv4_drvdata *drvdata = info;
> +	struct etmv4_metadata *metadata = &drvdata->metadata;
>
>  	/* Make sure all registers are accessible */
>  	etm4_os_unlock(drvdata);
> @@ -590,6 +597,16 @@ static void etm4_init_arch_data(void *info)
>  	drvdata->nrseqstate = BMVAL(etmidr5, 25, 27);
>  	/* NUMCNTR, bits[30:28] number of counters available for tracing */
>  	drvdata->nr_cntr = BMVAL(etmidr5, 28, 30);
> +
> +	/* Update metadata */
> +	metadata->magic = ETM4_METADATA_MAGIC;
> +	metadata->cpu = drvdata->cpu;
> +	metadata->trcidr0 = readl_relaxed(drvdata->base + TRCIDR0);
> +	metadata->trcidr1 = readl_relaxed(drvdata->base + TRCIDR1);
> +	metadata->trcidr2 = readl_relaxed(drvdata->base + TRCIDR2);
> +	metadata->trcidr8 = readl_relaxed(drvdata->base + TRCIDR8);
> +	metadata->trcauthstatus = readl_relaxed(drvdata->base + TRCAUTHSTATUS);
> +
>  	CS_LOCK(drvdata->base);
>  }
>
> @@ -957,6 +974,7 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
>  	struct device *dev = &adev->dev;
>  	struct coresight_platform_data *pdata = NULL;
>  	struct etmv4_drvdata *drvdata;
> +	struct etmv4_metadata *metadata;
>  	struct resource *res = &adev->res;
>  	struct coresight_desc desc = { 0 };
>  	struct device_node *np = adev->dev.of_node;
> @@ -1027,6 +1045,15 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
>  		goto err_arch_supported;
>  	}
>
> +	/* Set source device handler and metadata into kdump node */
> +	metadata = &drvdata->metadata;
> +	ret = coresight_kdump_source(drvdata->cpu, drvdata->csdev,
> +				     (char *)metadata, sizeof(*metadata));
> +	if (ret) {
> +		coresight_unregister(drvdata->csdev);
> +		goto err_arch_supported;
> +	}
> +
>  	ret = etm_perf_symlink(drvdata->csdev, true);
>  	if (ret) {
>  		coresight_unregister(drvdata->csdev);
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
> index b3b5ea7..08dc8b7 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> @@ -198,6 +198,20 @@
>  #define ETM_EXLEVEL_NS_HYP		BIT(14)
>  #define ETM_EXLEVEL_NS_NA		BIT(15)
>
Re: [PATCH v4 5/6] coresight: Set and clear sink device handler for kdump node
On Fri, Mar 30, 2018 at 11:15:23AM +0800, Leo Yan wrote:
> If Coresight path is enabled for specific CPU, the sink device handler
> need to be set to kdump node; on the other hand we also need to clear
> sink device handler when path is disabled.
>
> This patch sets sink devices handler for kdump node for two separate
> Coresight enabling modes: CS_MODE_SYSFS and CS_MODE_PERF; and clear the
> handler when Coresight is disabled.
>
> Signed-off-by: Leo Yan
> ---
>  drivers/hwtracing/coresight/coresight-etm-perf.c |  5 +
>  drivers/hwtracing/coresight/coresight.c          | 16 ++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 8a0ad77..f8b159c 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -139,6 +139,8 @@ static void free_event_data(struct work_struct *work)
>  	for_each_cpu(cpu, mask) {
>  		if (!(IS_ERR_OR_NULL(event_data->path[cpu])))
>  			coresight_release_path(event_data->path[cpu]);
> +
> +		coresight_kdump_sink(cpu, NULL);
>  	}
>
>  	kfree(event_data->path);
> @@ -238,6 +240,9 @@ static void *etm_setup_aux(int event_cpu, void **pages,
>  		event_data->path[cpu] = coresight_build_path(csdev, sink);
>  		if (IS_ERR(event_data->path[cpu]))
>  			goto err;
> +
> +		if (coresight_kdump_sink(cpu, sink))
> +			goto err;

I remember telling you to use free_event_data() and etm_setup_aux(). _Maybe_
it made sense in the previous patchset but in this one it won't work. We need
to reflect the current trace context, as such use etm_event_start() and
etm_event_stop().

In etm_event_start() call coresight_kdump_sink(cpu, sink) just before
source_ops(csdev)->enable(). Similarly call coresight_kdump_sink(cpu, NULL)
right after source_ops(csdev)->disable() in etm_event_stop().

Find me on IRC if you want more information on this.

>  	}
>
>  	if (!sink_ops(sink)->alloc_buffer)
> diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
> index 389c4ba..483a1f7 100644
> --- a/drivers/hwtracing/coresight/coresight.c
> +++ b/drivers/hwtracing/coresight/coresight.c
> @@ -272,6 +272,7 @@ static int coresight_enable_source(struct coresight_device *csdev, u32 mode)
>  static bool coresight_disable_source(struct coresight_device *csdev)
>  {
>  	if (atomic_dec_return(csdev->refcnt) == 0) {
> +

This newline shouldn't be part of this set.

>  		if (source_ops(csdev)->disable)
>  			source_ops(csdev)->disable(csdev, NULL);
>  		csdev->enable = false;
> @@ -612,6 +613,13 @@ int coresight_enable(struct coresight_device *csdev)
>  	if (ret)
>  		goto err_source;
>
> +	cpu = source_ops(csdev)->cpu_id(csdev);
> +
> +	/* Set sink device handler into kdump node */
> +	ret = coresight_kdump_sink(cpu, sink);
> +	if (ret)
> +		goto err_kdump;
> +

Call coresight_kdump_sink() just before coresight_enable_source(). That way
if there is a dump just after coresight_enable_source() is called we get the
chance of getting some traces in the dump file.

>  	switch (subtype) {
>  	case CORESIGHT_DEV_SUBTYPE_SOURCE_PROC:
>  		/*
> @@ -621,7 +629,6 @@ int coresight_enable(struct coresight_device *csdev)
>  		 * be a single session per tracer (when working from sysFS)
>  		 * a per-cpu variable will do just fine.
>  		 */
> -		cpu = source_ops(csdev)->cpu_id(csdev);
>  		per_cpu(tracer_path, cpu) = path;
>  		break;
>  	case CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE:
> @@ -636,6 +643,9 @@ int coresight_enable(struct coresight_device *csdev)
>  	mutex_unlock(&coresight_mutex);
>  	return ret;
>
> +err_kdump:
> +	coresight_disable_source(csdev);
> +
>  err_source:
>  	coresight_disable_path(path);
>
> @@ -659,9 +669,10 @@ void coresight_disable(struct coresight_device *csdev)
>  	if (!csdev->enable || !coresight_disable_source(csdev))
>  		goto out;
>
> +	cpu = source_ops(csdev)->cpu_id(csdev);
> +
>  	switch (csdev->subtype.source_subtype) {
>  	case CORESIGHT_DEV_SUBTYPE_SOURCE_PROC:
> -		cpu = source_ops(csdev)->cpu_id(csdev);
>  		path = per_cpu(tracer_path, cpu);
>  		per_cpu(tracer_path, cpu) = NULL;
>  		break;
> @@ -674,6 +685,7 @@ void coresight_disable(struct coresight_device *csdev)
>  		break;
>  	}
>
> +	coresight_kdump_sink(cpu, NULL);
>  	coresight_disable_path(path);
>  	coresight_release_path(path);
>
> --
> 2.7.4
>
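
To make the ordering asked for in this review concrete (record the sink for kdump first, then enable the source, and undo the registration if enabling fails), here is a small user-space sketch. Everything in it is illustrative: kdump_set_sink(), enable_source() and trace_enable() are stand-ins invented for this example, not the CoreSight functions, and the fixed array of four CPUs is an assumption.

```c
#include <stddef.h>

/* Stubs standing in for the CoreSight calls; all names are illustrative. */
static const void *kdump_sink[4];	/* per-CPU sink recorded for kdump */
static int source_enabled[4];
static int fail_source;			/* set to make enabling the source fail */

static int kdump_set_sink(int cpu, const void *sink)
{
	kdump_sink[cpu] = sink;
	return 0;
}

static int enable_source(int cpu)
{
	if (fail_source)
		return -1;
	source_enabled[cpu] = 1;
	return 0;
}

/*
 * Record the sink first, then start the source, so that a panic right
 * after the source starts already finds a sink to dump from.
 */
static int trace_enable(int cpu, const void *sink)
{
	int ret;

	ret = kdump_set_sink(cpu, sink);
	if (ret)
		return ret;

	ret = enable_source(cpu);
	if (ret)
		goto err_source;

	return 0;

err_source:
	kdump_set_sink(cpu, NULL);	/* undo the registration */
	return ret;
}
```

With this ordering the error path only has to clear the kdump entry, and there is no window in which the source is running without a registered sink.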
Re: [PATCH v4 4/6] coresight: tmc: Hook callback for panic kdump
On Fri, Mar 30, 2018 at 11:15:22AM +0800, Leo Yan wrote:
> Since Coresight panic kdump functionality has been ready, this patch is
> to hook panic callback function for ETB/ETF driver. The driver data
> structure has allocated a buffer when the session started, so simply
> save tracing data into this buffer when panic happens and update buffer
> related info for kdump.
>
> Signed-off-by: Leo Yan
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etf.c | 30 +
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> index e2513b7..d20d546 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> @@ -504,6 +504,35 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
>  	CS_LOCK(drvdata->base);
>  }
>
> +static void tmc_panic_cb(void *data)

I would call the function tmc_kdump_panic_cb()... That way there is
absolutely no confusion as to what it does.

> +{
> +	struct coresight_device *csdev = (struct coresight_device *)data;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	unsigned long flags;
> +
> +	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETB &&
> +			 drvdata->config_type != TMC_CONFIG_TYPE_ETF))
> +		return;
> +
> +	if (drvdata->mode == CS_MODE_DISABLED)
> +		return;

This is racy - between the check and acquiring the spinlock someone may beat
you to it.

> +
> +	spin_lock_irqsave(&drvdata->spinlock, flags);

        if (drvdata->mode == CS_MODE_DISABLED)
                goto out;

        drvdata->mode = CS_MODE_DISABLED;

> +
> +	CS_UNLOCK(drvdata->base);
> +
> +	tmc_flush_and_stop(drvdata);
> +	tmc_etb_dump_hw(drvdata);
> +
> +	CS_LOCK(drvdata->base);
> +
> +	/* Update buffer info for panic dump */
> +	csdev->kdump_buf = drvdata->buf;
> +	csdev->kdump_buf_sz = drvdata->len;

out:

> +
> +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +}
> +
>  static const struct coresight_ops_sink tmc_etf_sink_ops = {
>  	.enable		= tmc_enable_etf_sink,
>  	.disable	= tmc_disable_etf_sink,
> @@ -512,6 +541,7 @@ static const struct coresight_ops_sink tmc_etf_sink_ops = {
>  	.set_buffer	= tmc_set_etf_buffer,
>  	.reset_buffer	= tmc_reset_etf_buffer,
>  	.update_buffer	= tmc_update_etf_buffer,
> +	.panic_cb	= tmc_panic_cb,
>  };
>
>  static const struct coresight_ops_link tmc_etf_link_ops = {
> --
> 2.7.4
>
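
The race pointed out in this review is the usual check-then-lock problem, and the suggested fix is to test and update the mode only while the lock is held. A user-space sketch of that pattern follows, under clearly stated assumptions: a pthread mutex plays the role of drvdata->spinlock, struct demo_drvdata is a made-up reduction of the driver data, and the "dumps" counter stands in for the flush/stop/dump sequence.

```c
#include <pthread.h>

enum cs_mode { CS_MODE_DISABLED, CS_MODE_SYSFS };

/* Illustrative stand-in for the driver data; not the real tmc_drvdata. */
struct demo_drvdata {
	pthread_mutex_t lock;	/* plays the role of drvdata->spinlock */
	enum cs_mode mode;
	int dumps;		/* counts how often the dump path ran */
};

static void demo_panic_cb(struct demo_drvdata *d)
{
	pthread_mutex_lock(&d->lock);

	/* Test the mode only once the lock is held, so it cannot change in between. */
	if (d->mode == CS_MODE_DISABLED)
		goto out;
	d->mode = CS_MODE_DISABLED;

	d->dumps++;		/* flush, stop and dump would happen here */
out:
	pthread_mutex_unlock(&d->lock);
}
```

Because the mode is flipped to CS_MODE_DISABLED under the lock, a second caller that arrives later sees the updated mode and skips the dump instead of touching the hardware twice.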
Re: [PATCH v4 3/6] coresight: Support panic kdump functionality
On Fri, Mar 30, 2018 at 11:15:21AM +0800, Leo Yan wrote:
> After kernel panic happens, Coresight tracing data has much useful info
> which can be used for analysis. For example, the trace info from ETB
> RAM can be used to check the CPU execution flows before the crash. So
> we can save the tracing data from sink devices, and rely on kdump to
> save DDR content and uses "crash" tool to extract Coresight dumping
> from the vmcore file.
>
> This patch is to add a simple framework to support panic dump
> functionality; it registers panic notifier, and provide the helper
> functions coresight_kdump_source()/coresight_kdump_sink() so Coresight
> source and sink devices can be recorded into Coresight kdump node for
> kernel panic kdump.
>
> When kernel panic happens, the notifier iterates dump array and invoke
> callback function to dump tracing data. Later the tracing data can be
> used to reverse execution flow before the kernel panic.
>
> Signed-off-by: Leo Yan
> ---
>  drivers/hwtracing/coresight/Kconfig                |   9 +
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  .../hwtracing/coresight/coresight-panic-kdump.c    | 199 +
>  drivers/hwtracing/coresight/coresight-priv.h       |  12 ++
>  include/linux/coresight.h                          |   4 +
>  5 files changed, 225 insertions(+)
>  create mode 100644 drivers/hwtracing/coresight/coresight-panic-kdump.c
>
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index ef9cb3c..3089abf 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -103,4 +103,13 @@ config CORESIGHT_CPU_DEBUG
>  	  properly, please refer Documentation/trace/coresight-cpu-debug.txt
>  	  for detailed description and the example for usage.
>
> +config CORESIGHT_PANIC_KDUMP
> +	bool "CoreSight Panic Kdump driver"
> +	depends on ARM || ARM64
> +	help
> +	  This driver provides panic kdump functionality for CoreSight devices.
> +	  When kernel panic happen Coresight device supplied callback function

s/Coresight/CoreSight

> +	  is to dump trace data to memory. From then on, kdump can be used to
> +	  extract the trace data from kernel dump file.
> +
>  endif
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index 61db9dd..946fe19 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -18,3 +18,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
>  obj-$(CONFIG_CORESIGHT_DYNAMIC_REPLICATOR) += coresight-dynamic-replicator.o
>  obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>  obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
> +obj-$(CONFIG_CORESIGHT_PANIC_KDUMP) += coresight-panic-kdump.o
> diff --git a/drivers/hwtracing/coresight/coresight-panic-kdump.c b/drivers/hwtracing/coresight/coresight-panic-kdump.c
> new file mode 100644
> index 000..f4589e9
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-panic-kdump.c
> @@ -0,0 +1,199 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2017~2018 Linaro Limited.

I don't remember if I commented on this before but the above line (not the
SPDX) should be enclosed with C style comments (/* */) rather than C++ (//).
I would also add a new line between the copyright statement and the header
file listing.

> +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "coresight-priv.h" > + > +/** > + * struct coresight_kdump_node - Node information for dump > + * @source_csdev:Handler for source coresight device > + * @sink_csdev: Handler for sink coresight device > + */ > +struct coresight_kdump_node { > + struct coresight_device *source_csdev; > + struct coresight_device *sink_csdev; > +}; > + > +static DEFINE_SPINLOCK(coresight_kdump_lock); > +static struct coresight_kdump_node *coresight_kdump_nodes; > +static struct notifier_block coresight_kdump_nb; > + > +/** > + * coresight_kdump_source - Set source dump info for specific CPU > + * @cpu: CPU ID > + * @csdev: Source device structure handler > + * @data:Pointer for source device metadata buffer > + * @data_sz: Size of source device metadata buffer > + * > + * This function is a helper function which is used to set/clear source > device > + * handler and metadata when the tracer is enabled; and it can be used to > clear > + * source device related info when the tracer is disabled. > + * > + * Returns: 0 on success, negative errno otherwise. > + */ > +int coresight_kdump_source(int cpu, struct coresight_device *csdev, > +char *data, unsigned int data_sz) > +{ > + struct coresight_kdump_node *node; > + unsigned long flags; > + > + if (!coresight_kdump_nodes) > + return -EPROBE_DEFER; Before
Re: [PATCH v3 5/6] Initialize the mapping of KASan shadow memory
On Mon, 2 Apr 2018, Russell King - ARM Linux wrote:

> On Mon, Apr 02, 2018 at 02:08:13PM -0400, Nicolas Pitre wrote:
> > On Mon, 2 Apr 2018, Abbott Liu wrote:
> >
> > > index c79b829..20161e2 100644
> > > --- a/arch/arm/kernel/head-common.S
> > > +++ b/arch/arm/kernel/head-common.S
> > > @@ -115,6 +115,9 @@ __mmap_switched:
> > >  	str	r8, [r2]			@ Save atags pointer
> > >  	cmp	r3, #0
> > >  	strne	r10, [r3]			@ Save control register values
> > > +#ifdef CONFIG_KASAN
> > > +	bl	kasan_early_init
> > > +#endif
> > >  	mov	lr, #0
> > >  	b	start_kernel
> > >  ENDPROC(__mmap_switched)
> >
> > Would be better if lr was cleared before calling kasan_early_init.
>
> No. The code is correct - please remember that "bl" writes to LR.

You're right of course.

/me giving up on patch review and going back to bed

Nicolas
Re: [PATCH v3 5/6] Initialize the mapping of KASan shadow memory
On Mon, Apr 02, 2018 at 02:08:13PM -0400, Nicolas Pitre wrote:
> On Mon, 2 Apr 2018, Abbott Liu wrote:
>
> > index c79b829..20161e2 100644
> > --- a/arch/arm/kernel/head-common.S
> > +++ b/arch/arm/kernel/head-common.S
> > @@ -115,6 +115,9 @@ __mmap_switched:
> >  	str	r8, [r2]			@ Save atags pointer
> >  	cmp	r3, #0
> >  	strne	r10, [r3]			@ Save control register values
> > +#ifdef CONFIG_KASAN
> > +	bl	kasan_early_init
> > +#endif
> >  	mov	lr, #0
> >  	b	start_kernel
> >  ENDPROC(__mmap_switched)
>
> Would be better if lr was cleared before calling kasan_early_init.

No. The code is correct - please remember that "bl" writes to LR.

The point of clearing LR here is to ensure that start_kernel is called
with a zero link register, which it won't be if kasan_early_init is
moved after it.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Re: [PATCH v3 5/6] Initialize the mapping of KASan shadow memory
On Mon, 2 Apr 2018, Abbott Liu wrote:

> index c79b829..20161e2 100644
> --- a/arch/arm/kernel/head-common.S
> +++ b/arch/arm/kernel/head-common.S
> @@ -115,6 +115,9 @@ __mmap_switched:
>  	str	r8, [r2]			@ Save atags pointer
>  	cmp	r3, #0
>  	strne	r10, [r3]			@ Save control register values
> +#ifdef CONFIG_KASAN
> +	bl	kasan_early_init
> +#endif
>  	mov	lr, #0
>  	b	start_kernel
>  ENDPROC(__mmap_switched)

Would be better if lr was cleared before calling kasan_early_init.

Nicolas
Re: [PATCH v4 2/6] doc: Add documentation for Coresight panic kdump
On Fri, Mar 30, 2018 at 11:15:20AM +0800, Leo Yan wrote:
> Add detailed documentation for Coresight panic kdump, which contains
> the idea for why need Coresight panic kdump and introduce the
> implementation of Coresight panic kdump framework; the last section is
> to explain what's usage.
>
> Credits to Mathieu Poirier for many suggestions since the first version
> patch reviewing. The suggestions include using an array to manage dump
> related info, this makes code scalable for more CPUs; the Coresight
> kdump driver and integration kdump flow with other Coresight devices
> also have many ideas from Mathieu.

Please remove the above paragraph.

>
> Suggested-by: Mathieu Poirier

And the above line too.

> Signed-off-by: Leo Yan
> ---
>  .../trace/coresight/coresight-panic-kdump.txt | 130 +
>  MAINTAINERS                                    |   1 +
>  2 files changed, 131 insertions(+)
>  create mode 100644 Documentation/trace/coresight/coresight-panic-kdump.txt
>
> diff --git a/Documentation/trace/coresight/coresight-panic-kdump.txt b/Documentation/trace/coresight/coresight-panic-kdump.txt
> new file mode 100644
> index 000..c02e520
> --- /dev/null
> +++ b/Documentation/trace/coresight/coresight-panic-kdump.txt
> @@ -0,0 +1,130 @@
> +		Coresight Panic Kdump
> +		=====================
> +
> +   Author:   Leo Yan
> +   Date:     March 29th, 2018
> +
> +Introduction
> +------------
> +
> +Coresight has different sinks for trace data, the trace data is quite useful
> +for postmortem debugging. Embedded Trace Buffer (ETB) is one type sink which
> +provides on-chip storage of trace data, usually uses SRAM as the buffer with
> +several KBs size; if the SoC designs to support 'Local ETF' (ARM DDI 0461B,
> +chapter 1.2.7), every CPU has one local ETB buffer so the per CPU trace data
> +can avoid being overwritten by each other. Trace Memory Controller (TMC) is
> +another kind sink designed as a successor to the CoreSight ETB to capture trace
> +into DRAM.

I don't think details about the sinks themselves are worth adding here. In my
opinion we can simply stick with the abstract notion of a sink and achieve the
same result.

> +
> +After Linux kernel panic has occurred, the trace data keeps the last execution
> +flow before issues happen. We could consider the trace data is quite useful for
> +postmortem debugging, especially when we can save trace data into DRAM and rely on

Even in documentation files please keep lines wrapped to 80 characters (note
that checkpatch won't complain). Console text from the command line (as added
below) is exempt from this rule.

> +kdump to preserve them into vmcore file; at the end, we can retrieve trace data
> +from vmcore file and "offline" to analyze the execution flow.
> +
> +
> +Implementation
> +--------------
> +
> +Coresight panic kdump is a simple framework to support Coresight dump
> +functionality when panic happens, it maintains an array for the dump, every array
> +item is dedicated to one specific CPU by using CPU number as an index. For
> +'offline' recovery and analysis Coresight tracing data, except should to recovery

This paragraph as a whole is hard to read and the usage of the word 'except'
above does not work in this context. Please consider reviewing and/or get in
touch with me if you want to work on it together.

> +tracing data for sinks, we also need to know CPU tracer configurations; for this
> +purpose, the array item is a structure which combines source and sink device
> +handlers, the device handler points to Coresight device structure which contains
> +dump info: dump buffer base address and buffer size.
Below diagram is to > +present data structures relationship: > + > + array: coresight_kdump_nodes > + +--+--+--+ > + | CPU0 | CPU1 | ...| > + +--+--+--+ > + | > + V > + coresight_kdump_node coresight_device > + +---+ +---+ > + |source_csdev | --> |kdump_buf | > + +---+ / +---+ > + |sink_csdev | ' |kdump_buf_sz | > + +---+ +---+ > + > +Every CPU has its own tracer, we need save tracer registers for tracer ID and > +configuration related information as metadata, the metadata is used by > tracing > +decoder. But the tracer has the different configuration at the different > phase, > +below diagram explains tracer configurations for different time points: at > the > +system boot phase, the tracer is disabled so its registers have not been set; > +after tracer has been enabled or when panic happens, tracer registers have > been > +configured, but we need to consider if there has CPU is locked up at
Re: [PATCH v4 0/6] Coresight: Support panic kdump
Hi Leo,

Please see below (and in upcoming patches) my comments related to your latest
work.

Thanks,
Mathieu

On Fri, Mar 30, 2018 at 11:15:18AM +0800, Leo Yan wrote:
> This patch set is to explore Coresight tracing data for postmortem
> debugging.  When kernel panic happens, the Coresight panic kdump can
> help to save on-chip tracing data and tracer metadata into DRAM, later
> relies on kdump and crash/perf tools to recovery tracing data for
> "offline" analysis.
>
> The documentation is important to understand the purpose of Coresight
> panic kdump, the implementation of framework and usage. Patches 0001
> and patch 0002 are used for creating new sub directory for placing
> Coresight docs and add a new doc for Coresight panic kdump.
>
> Patch 0003 introduces the simple panic kdump framework which provides
> helper functions can be used by Coresight devices, and it registers
> panic notifier for dump tracing data.
>
> Patches 0004/0005 support panic kdump for ETB; Patch 0006 supports the
> kdump for ETMv4.
>
> This patch set has been reworked by following suggestions at Linaro
> HKG18 connect (mainly suggestions from Mathieu, thanks a lot!), and
> it's rebased on acme git tree [1] with last commit 109d59b900e7 ('perf
> vendor events s390: Add JSON files for IBM z14').
>
> Due Coresight kdump data structure has been changed significantly, the
> corresponding crash extension program also has been updated for this
> reason [2]; furthermore the crash extension program is updated to
> dynamically generate kernel buildid according to vmlinux elf info [3],
> this is a fixing for the old code which uses hard-coded buildid value.
>
> This patch set has been verified on 96boards Hikey620 with Coresight
> enabling by the sysFS interface.  Also the updated crash extension
> program has been verified to cowork with Coresight panic kdump and it
> successfully extracts tracing data from the vmcore and finally can be
> decoded by perf tool.
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git > [2] https://git.linaro.org/people/leo.yan/crash.git/tree/extensions/csdump.c > [3] > https://git.linaro.org/people/leo.yan/crash.git/tree/extensions/csdump_buildid.c > > Changes from v3: > * Following Mathieu suggestion, reworked the panic kdump framework, > used kdump array to maintain source and sink device handlers; > * According to Mathieu suggestion, optimized panic notifier to > firstly dump panic CPU tracing data and then dump other CPUs tracing > data; > * Refined doc to reflect these implementation changes; > * Changed ETMv4 driver to add source device handler at probe phase; > * Refactored crash extension program to reflect kernel changes. > > Changes from v2: > * Add the two patches for documentation. > * Following Mathieu suggestion, reworked the panic kdump framework, > removed the useless flag "PRE_PANIC". > * According to comment, changed to add and delete kdump node operations > in sink enable/disable functions; > * According to Mathieu suggestion, handle kdump node > addition/deletion/updating separately for sysFS interface and perf > method. > > Changes from v1: > * Add support to dump ETMv4 meta data. > * Wrote 'crash' extension csdump.so so rely on it to generate 'perf' > format compatible file. > * Refactored panic dump driver to support pre & post panic dump. > > Changes from RFC: > * Follow Mathieu's suggestion, use general framework to support dump > functionality. > * Changed to use perf to analyse trace data. 
> > Leo Yan (6): > doc: Add Coresight documentation directory > doc: Add documentation for Coresight panic kdump > coresight: Support panic kdump functionality > coresight: tmc: Hook callback for panic kdump > coresight: Set and clear sink device handler for kdump node > coresight: etm4x: Support panic kdump > > Documentation/trace/coresight-cpu-debug.txt| 187 -- > Documentation/trace/coresight.txt | 383 > - > .../trace/coresight/coresight-cpu-debug.txt| 187 ++ > .../trace/coresight/coresight-panic-kdump.txt | 130 +++ > Documentation/trace/coresight/coresight.txt| 383 > + Please use the -M option with git format-patch in order to prevent the metrics associated with the renaming of files to be tallied. > MAINTAINERS| 5 +- > drivers/hwtracing/coresight/Kconfig| 9 + > drivers/hwtracing/coresight/Makefile | 1 + > drivers/hwtracing/coresight/coresight-etm-perf.c | 5 + > drivers/hwtracing/coresight/coresight-etm4x.c | 27 ++ > drivers/hwtracing/coresight/coresight-etm4x.h | 15 + > .../hwtracing/coresight/coresight-panic-kdump.c| 199 +++ > drivers/hwtracing/coresight/coresight-priv.h | 12 + > drivers/hwtracing/coresight/coresight-tmc-etf.c| 30 ++ > drivers/hwtracing/coresight/coresight.c| 16 +-
[PULL] Documentation for 4.17
The following changes since commit 7928b2cbe55b2a410a0f5c1f154610059c57b1b2:

  Linux 4.16-rc1 (2018-02-11 15:04:29 -0800)

are available in the Git repository at:

  git://git.lwn.net/linux.git tags/docs-4.17

for you to fetch changes up to 86afad7d87f535ebb1a0e978bc32a8c58ac99268:

  Documentation/process: update FUSE project website (2018-03-29 15:49:18 -0600)

There's been a fair amount of activity in Documentation/ this time around:

 - Lots of work aligning Documentation/ABI with reality, done by
   Aishwarya Pant.

 - The trace documentation has been converted to RST by Changbin Du

 - I thrashed up kernel-doc to deal with a parsing issue and to try to
   make the code more readable.  It's still a 20+-year-old Perl hack,
   though.

 - Lots of other updates, typo fixes, and more.

Expect some annoying merge conflicts with ftrace - changes in
Documentation/trace were made independently of the RST conversion.
Probably the conversion should have gone through that tree as well, in
retrospect.  The resolution in linux-next seems good.

Aaro Koskinen (1): documentation: add my name to kernel driver statement Aishwarya Pant (13): Documentation/ABI: clean up sysfs-class-pktcdvd Documentation/ABI: add sysfs interface for s6e63m0 lcd driver aoe: document sysfs interface Documentation/ABI: update infiniband sysfs interfaces block/loop: add documentation for sysfs interface backlight: lm3639: document sysfs attributes backlight: adp5520: document sysfs attributes backlight: adp8860: document sysfs attributes Documentation: rapidio: move sysfs interface to ABI block: rbd: update sysfs interface acpi: nfit: document sysfs interface char/bsr: add sysfs interface documentation Input: trackpoint: document sysfs interface Andy Shevchenko (3): dmaengine: Add note to dmatest documentation about supported channels dmaengine: Make dmatest.rst indeed reST compatible dmaengine: Fix spelling for parenthesis in dmatest documentation Changbin Du (17): Documentation: add Linux tracing to Sphinx TOC tree trace doc: convert trace/ftrace-design.txt to rst format trace doc: add ftrace-uses.rst to doc tree trace doc: convert trace/tracepoint-analysis.txt to rst format trace doc: convert trace/ftrace.txt to rst format trace doc: convert trace/kprobetrace.txt to rst format trace doc: convert trace/uprobetracer.txt to rst format trace doc: convert trace/tracepoints.txt to rst format trace doc: convert trace/events.txt to rst format trace doc: convert trace/events-kmem.txt to rst format trace doc: convert trace/events-power.txt to rst format trace doc: convert trace/events-nmi.txt to rst format trace doc: convert trace/events-msr.txt to rst format trace doc: convert trace/mmiotrace.txt to rst format trace doc: convert trace/hwlat_detector.txt to rst fromat trace doc: convert trace/intel_th.txt to rst format trace doc: convert trace/stm.txt to rst format Dave Hansen (1): docs: clarify security-bugs disclosure policy Dominik Brodowski (1): Documentation/process: Co-developed-by instead of Co-Developed-by Eric Engestrom (1): 
Documentation/sparse: fix typo Gary R Hook (1): Documentation/CodingStyle: Add an example for braces Joel Stanley (1): Documentation: Mention why %p prints ptrval Jonathan Corbet (12): docs: kernel-doc: Get rid of xml_escape() and friends docs: kernel-doc: Rename and split STATE_FIELD docs: kernel-doc: Move STATE_NORMAL processing into its own function docs: kernel-doc: Move STATE_NAME processing into its own function docs: kernel-doc: Move STATE_BODY processing to a separate function docs: kernel-doc: Move STATE_PROTO processing into its own function docs: kernel-doc: Finish moving STATE_* code out of process_file() docs: kernel-doc: Don't mangle literal code blocks in comments docs: Add an SPDX header to kernel-doc Merge branch 'kerneldoc2' into docs-next docs: ftrace: fix a few formatting issues Docs: Added a pointer to the formatted docs to README Jonathan Neuschäfer (2): Documentation/process/howto: Remove outdated info about bugzilla mailing lists admin-guide: Fix list formatting in tained-kernels.html Martin Kepplinger (4): README: Improve documentation descriptions Documentation: admin-guide: add kvmconfig, xenconfig and tinyconfig commands Documentation: magic-numbers: Fix typo Documentation/process: update FUSE project website Masanari Iida (2): linux-next: SLIMbus: doc: Fix a warning "Title underline too short" xfs: Change URL for the project in xfs.txt Matthew Wilcox (7): Add documentation for Context section
[PATCH 1/2] Input: mk712: update documentation web link
Nothing can be found at the address mentioned in the driver any more.
Searching still turns up information on the controller chip, so update the
link to the resulting page.

Signed-off-by: Martin Kepplinger
---
 drivers/input/touchscreen/mk712.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/input/touchscreen/mk712.c b/drivers/input/touchscreen/mk712.c
index bd5352824f77..c179060525ae 100644
--- a/drivers/input/touchscreen/mk712.c
+++ b/drivers/input/touchscreen/mk712.c
@@ -17,7 +17,7 @@
  * found in Gateway AOL Connected Touchpad computers.
  *
  * Documentation for ICS MK712 can be found at:
- *	http://www.idt.com/products/getDoc.cfm?docID=18713923
+ *	https://www.idt.com/general-parts/mk712-touch-screen-controller
  */
 
 /*
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Documentation: devices.txt: remove the mk712 touchscreen device from the list
The input/touchscreen/mk712.c driver was rewritten for the common input
event system in 2005, so a dedicated device node should no longer be
created for it.

Signed-off-by: Martin Kepplinger
---

Please review this by looking at the driver too.

Thanks,

martin

 Documentation/admin-guide/devices.txt | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
index 4ec843123cc3..fb39bbf0789a 100644
--- a/Documentation/admin-guide/devices.txt
+++ b/Documentation/admin-guide/devices.txt
@@ -259,7 +259,6 @@
 		 11 = /dev/vrtpanel	Vr41xx embedded touch panel
 		 13 = /dev/vpcmouse	Connectix Virtual PC Mouse
 		 14 = /dev/touchscreen/ucb1x00  UCB 1x00 touchscreen
-		 15 = /dev/touchscreen/mk712	MK712 touchscreen
 		128 = /dev/beep		Fancy beep device
 		129 =
 		130 = /dev/watchdog	Watchdog timer port
-- 
2.16.2
[PATCH v2 2/2] perf: riscv: Add Document for Future Porting Guide
Cc: Nick HuCc: Greentime Hu Signed-off-by: Alan Kao --- Documentation/riscv/pmu.txt | 249 1 file changed, 249 insertions(+) create mode 100644 Documentation/riscv/pmu.txt diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt new file mode 100644 index ..a3e930ed5141 --- /dev/null +++ b/Documentation/riscv/pmu.txt @@ -0,0 +1,249 @@ +Supporting PMUs on RISC-V platforms +== +Alan Kao , Mar 2018 + +Introduction + + +As of this writing, perf_event-related features mentioned in The RISC-V ISA +Privileged Version 1.10 are as follows: +(please check the manual for more details) + +* [m|s]counteren +* mcycle[h], cycle[h] +* minstret[h], instret[h] +* mhpeventx, mhpcounterx[h] + +With such function set only, porting perf would require a lot of work, due to +the lack of the following general architectural performance monitoring features: + +* Enabling/Disabling counters + Counters are just free-running all the time in our case. +* Interrupt caused by counter overflow + No such design in the spec. +* Interrupt indicator + It is not possible to have many interrupt ports for all counters, so an + interrupt indicator is required for software to tell which counter has + just overflowed. +* Writing to counters + There will be an SBI to support this since the kernel cannot modify the + counters [1]. Alternatively, some vendor considers to implement + hardware-extension for M-S-U model machines to write counters directly. + +This document aims to provide developers a quick guide on supporting their +PMUs in the kernel. The following sections briefly explain perf' mechanism +and todos. + +You may check previous discussions here [1][2]. Also, it might be helpful +to check the appendix for related kernel structures. + + +1. Initialization +- + +*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains +various methods according to perf's internal convention and PMU-specific +parameters. One should declare such instance to represent the PMU. 
By default,
+*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very
+basic support to a baseline QEMU model.
+
+Then he/she can either assign the instance's pointer to *riscv_pmu* so that
+the minimal and already-implemented logic can be leveraged, or invent his/her
+own *riscv_init_platform_pmu* implementation.
+
+In other words, existing sources of *riscv_base_pmu* merely provide a
+reference implementation.  Developers can flexibly decide how many parts they
+can leverage, and in the most extreme case, they can customize every function
+according to their needs.
+
+
+2. Event Initialization
+-----------------------
+
+When a user launches a perf command to monitor some events, it is first
+interpreted by the userspace perf tool into multiple *perf_event_open*
+system calls, and then each of them calls to the body of *event_init*
+member function that was assigned in the previous step.  In *riscv_base_pmu*'s
+case, it is *riscv_event_init*.
+
+The main purpose of this function is to translate the event provided by user
+into bitmap, so that HW-related control registers or counters can directly be
+manipulated.  The translation is based on the mappings and methods provided in
+*riscv_pmu*.
+
+Note that some features can be done in this stage as well:
+
+(1) interrupt setting, which is stated in the next section;
+(2) privilege level setting (user space only, kernel space only, both);
+(3) destructor setting.  Normally it is sufficient to apply *riscv_destroy_event*;
+(4) tweaks for non-sampling events, which will be utilized by functions such as
+*perf_adjust_period*, usually something like the following:
+
+        if (!is_sampling_event(event)) {
+                hwc->sample_period = x86_pmu.max_period;
+                hwc->last_period = hwc->sample_period;
+                local64_set(&hwc->period_left, hwc->sample_period);
+        }
+
+In the case of *riscv_base_pmu*, only (3) is provided for now.
+
+
+3. Interrupt
+------------
+
+3.1. Interrupt Initialization
+
+This often occurs at the beginning of the *event_init* method.  In common
+practice, this should be a code segment like
+
+        int x86_reserve_hardware(void)
+        {
+                int err = 0;
+
+                if (!atomic_inc_not_zero(&pmc_refcount)) {
+                        mutex_lock(&pmc_reserve_mutex);
+                        if (atomic_read(&pmc_refcount) == 0) {
+                                if (!reserve_pmc_hardware())
+                                        err = -EBUSY;
+                                else
+                                        reserve_ds_buffers();
+                        }
+                        if (!err)
+                                atomic_inc(&pmc_refcount);
+                        mutex_unlock(&pmc_reserve_mutex);
+                }
+
+                return err;
+        }
+
+And the magic is in *reserve_pmc_hardware*, which usually does atomic
+operations to make
[PATCH v2 1/2] perf: riscv: preliminary RISC-V support
This patch provide a basic PMU, riscv_base_pmu, which supports two general hardware event, instructions and cycles. Furthermore, this PMU serves as a reference implementation to ease the portings in the future. riscv_base_pmu should be able to run on any RISC-V machine that conforms to the Priv-Spec. Note that the latest qemu model hasn't fully support a proper behavior of Priv-Spec 1.10 yet, but work around should be easy with very small fixes. Please check https://github.com/riscv/riscv-qemu/pull/115 for future updates. Cc: Nick HuCc: Greentime Hu Signed-off-by: Alan Kao --- arch/riscv/Kconfig | 12 + arch/riscv/include/asm/perf_event.h | 76 +- arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/perf_event.c | 468 4 files changed, 553 insertions(+), 4 deletions(-) create mode 100644 arch/riscv/kernel/perf_event.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index c22ebe08e902..3fbf19456c9a 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -203,6 +203,18 @@ config RISCV_ISA_C config RISCV_ISA_A def_bool y +menu "PMU type" + depends on PERF_EVENTS + +config RISCV_BASE_PMU + bool "Base Performance Monitoring Unit" + def_bool y + help + A base PMU that serves as a reference implementation and has limited + feature of perf. + +endmenu + endmenu menu "Kernel type" diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h index e13d2ff29e83..98e2efb02d25 100644 --- a/arch/riscv/include/asm/perf_event.h +++ b/arch/riscv/include/asm/perf_event.h @@ -1,13 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 */ /* * Copyright (C) 2018 SiFive + * Copyright (C) 2018 Andes Technology Corporation * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public Licence - * as published by the Free Software Foundation; either version - * 2 of the Licence, or (at your option) any later version. 
*/ #ifndef _ASM_RISCV_PERF_EVENT_H #define _ASM_RISCV_PERF_EVENT_H +#include +#include + +#define RISCV_BASE_COUNTERS2 + +/* + * The RISCV_MAX_COUNTERS parameter should be specified. + */ + +#ifdef CONFIG_RISCV_BASE_PMU +#define RISCV_MAX_COUNTERS 2 +#endif + +#ifndef RISCV_MAX_COUNTERS +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU." +#endif + +/* + * These are the indexes of bits in counteren register *minus* 1, + * except for cycle. It would be coherent if it can directly mapped + * to counteren bit definition, but there is a *time* register at + * counteren[1]. Per-cpu structure is scarce resource here. + * + * According to the spec, an implementation can support counter up to + * mhpmcounter31, but many high-end processors has at most 6 general + * PMCs, we give the definition to MHPMCOUNTER8 here. + */ +#define RISCV_PMU_CYCLE0 +#define RISCV_PMU_INSTRET 1 +#define RISCV_PMU_MHPMCOUNTER3 2 +#define RISCV_PMU_MHPMCOUNTER4 3 +#define RISCV_PMU_MHPMCOUNTER5 4 +#define RISCV_PMU_MHPMCOUNTER6 5 +#define RISCV_PMU_MHPMCOUNTER7 6 +#define RISCV_PMU_MHPMCOUNTER8 7 + +#define RISCV_OP_UNSUPP(-EOPNOTSUPP) + +struct cpu_hw_events { + /* # currently enabled events*/ + int n_events; + /* currently enabled events */ + struct perf_event *events[RISCV_MAX_COUNTERS]; + /* vendor-defined PMU data */ + void*platform; +}; + +struct riscv_pmu { + struct pmu *pmu; + + /* generic hw/cache events table */ + const int *hw_events; + const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX]; + /* method used to map hw/cache events */ + int (*map_hw_event)(u64 config); + int (*map_cache_event)(u64 config); + + /* max generic hw events in map */ + int max_events; + /* number total counters, 2(base) + x(general) */ + int num_counters; + /* the width of the counter */ + int counter_width; + + /* vendor-defined PMU features */ + void*platform; +}; + #endif /* _ASM_RISCV_PERF_EVENT_H */ diff --git 
a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index ffa439d4a364..f50d19816757 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o obj-$(CONFIG_FUNCTION_GRAPH_TRACER)+= ftrace.o +obj-$(CONFIG_PERF_EVENTS) += perf_event.o clean: diff --git a/arch/riscv/kernel/perf_event.c
[PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V
This implements the baseline PMU for RISC-V platforms.

To ease future PMU portings, a guide is also written, containing perf concepts,
arch porting practices and some hints.

Changes in v2:
 - Fix the bug reported by Alex, which was caused by insufficient
   initialization.  Check https://lkml.org/lkml/2018/3/31/251 for the
   discussion.

Alan Kao (2):
  perf: riscv: preliminary RISC-V support
  perf: riscv: Add Document for Future Porting Guide

 Documentation/riscv/pmu.txt         | 249 +++
 arch/riscv/Kconfig                  |  12 +
 arch/riscv/include/asm/perf_event.h |  76 +-
 arch/riscv/kernel/Makefile          |   1 +
 arch/riscv/kernel/perf_event.c      | 468 
 5 files changed, 802 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/riscv/pmu.txt
 create mode 100644 arch/riscv/kernel/perf_event.c

-- 
2.16.2
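
The header in patch 1/2 describes its counter indexes as "the indexes of
bits in counteren register *minus* 1, except for cycle", because counteren[1]
is taken by the *time* register. A minimal sketch of that mapping follows.
The helper name counteren_bit_to_pmu_index is hypothetical and not part of
the series; only the RISCV_PMU_* values are copied from the posted header.

```c
/* Index values copied from the posted arch/riscv/include/asm/perf_event.h. */
#define RISCV_PMU_CYCLE		0
#define RISCV_PMU_INSTRET	1
#define RISCV_PMU_MHPMCOUNTER3	2
#define RISCV_PMU_MHPMCOUNTER8	7

/*
 * Hypothetical helper (an assumption for illustration, not in the patch):
 * translate a counteren bit position into the PMU index used by the header.
 * Returns -1 for bits that have no index.
 */
static int counteren_bit_to_pmu_index(unsigned int bit)
{
	if (bit == 0)
		return RISCV_PMU_CYCLE;	/* cycle keeps its bit number */
	if (bit == 1)
		return -1;		/* counteren[1] is *time*, not a PMU counter */
	if (bit > 8)
		return -1;		/* the header only defines up to MHPMCOUNTER8 */
	return (int)bit - 1;		/* instret and mhpmcounter3..8: bit minus 1 */
}
```

Under this sketch, counteren bit 2 (instret) yields RISCV_PMU_INSTRET and bit
3 yields RISCV_PMU_MHPMCOUNTER3, which matches the defines in the header.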
[PATCH v3 5/6] Initialize the mapping of KASan shadow memory
From: Andrey Ryabinin

This patch initializes the page tables and memory of the KASan shadow
region. KASan is initialized in two stages:

1. At the early boot stage the whole shadow region is mapped to just
   one physical page (kasan_zero_page). This is done by the function
   kasan_early_init, which is called by __mmap_switched
   (arch/arm/kernel/head-common.S).
             ---Andrey Ryabinin

2. After paging_init has been called, kasan_zero_page is used as the
   zero shadow for memory that KASan does not need to track, and new
   shadow space is allocated for the memory that KASan does need to
   track. This is done by the function kasan_init, which is called by
   setup_arch.
             ---Andrey Ryabinin

3. Add support for arm LPAE.
   If LPAE is enabled, the mapping table of the KASan shadow region
   needs to be copied in the pgd_alloc function.
             ---Abbott Liu

4. On a 64-bit machine size_t is unsigned long, but on a 32-bit
   machine size_t is unsigned int, so a type conversion is needed in
   kasan_cache_create.
             ---Abbott Liu

5. Change kasan_pte_populate, kasan_pmd_populate, kasan_pud_populate and
   kasan_pgd_populate from the .meminit.text section to the .init.text
   section.
---Reported by: Florian Fainelli ---Signed off by: Abbott Liu Cc: Andrey Ryabinin Co-Developed-by: Abbott Liu Reviewed-by: Russell King - ARM Linux Reviewed-by: Florian Fainelli Reported-by: Florian Fainelli Tested-by: Florian Fainelli Tested-by: Joel Stanley Tested-by: Abbott Liu Signed-off-by: Abbott Liu --- arch/arm/include/asm/kasan.h | 35 + arch/arm/include/asm/pgalloc.h | 7 +- arch/arm/include/asm/thread_info.h | 4 + arch/arm/kernel/head-common.S | 3 + arch/arm/kernel/setup.c| 2 + arch/arm/mm/Makefile | 3 + arch/arm/mm/kasan_init.c | 302 + arch/arm/mm/pgd.c | 14 ++ mm/kasan/kasan.c | 5 +- 9 files changed, 371 insertions(+), 4 deletions(-) create mode 100644 arch/arm/include/asm/kasan.h create mode 100644 arch/arm/mm/kasan_init.c diff --git a/arch/arm/include/asm/kasan.h b/arch/arm/include/asm/kasan.h new file mode 100644 index 000..1801f4d --- /dev/null +++ b/arch/arm/include/asm/kasan.h @@ -0,0 +1,35 @@ +/* + * arch/arm/include/asm/kasan.h + * + * Copyright (c) 2015 Samsung Electronics Co., Ltd. + * Author: Andrey Ryabinin + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#ifndef __ASM_KASAN_H +#define __ASM_KASAN_H + +#ifdef CONFIG_KASAN + +#include + +#define KASAN_SHADOW_SCALE_SHIFT 3 + +/* + * Compiler uses shadow offset assuming that addresses start + * from 0. 
Kernel addresses don't start from 0, so shadow + * for kernel really starts from 'compiler's shadow offset' + + * ('kernel address space start' >> KASAN_SHADOW_SCALE_SHIFT) + */ + +extern void kasan_init(void); + +#else +static inline void kasan_init(void) { } +#endif + +#endif diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h index 2d7344f..f170659 100644 --- a/arch/arm/include/asm/pgalloc.h +++ b/arch/arm/include/asm/pgalloc.h @@ -50,8 +50,11 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) */ #define pmd_alloc_one(mm,addr) ({ BUG(); ((pmd_t *)2); }) #define pmd_free(mm, pmd) do { } while (0) -#define pud_populate(mm,pmd,pte) BUG() - +#ifndef CONFIG_KASAN +#define pud_populate(mm, pmd, pte) BUG() +#else +#define pud_populate(mm, pmd, pte) do { } while (0) +#endif #endif /* CONFIG_ARM_LPAE */ extern pgd_t *pgd_alloc(struct mm_struct *mm); diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h index e71cc35..bc681a0 100644 --- a/arch/arm/include/asm/thread_info.h +++ b/arch/arm/include/asm/thread_info.h @@ -16,7 +16,11 @@ #include #include +#ifdef CONFIG_KASAN +#define THREAD_SIZE_ORDER 2 +#else #define THREAD_SIZE_ORDER 1 +#endif #define THREAD_SIZE(PAGE_SIZE << THREAD_SIZE_ORDER) #define THREAD_START_SP(THREAD_SIZE - 8) diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S index c79b829..20161e2 100644 --- a/arch/arm/kernel/head-common.S +++ b/arch/arm/kernel/head-common.S @@ -115,6 +115,9 @@ __mmap_switched: str r8, [r2]@ Save atags pointer cmp r3, #0 strne r10, [r3]
[PATCH v3 2/6] Disable instrumentation for some code
From: Andrey Ryabinin

Disable instrumentation for arch/arm/boot/compressed/*,
arch/arm/kvm/hyp/* and arch/arm/vdso/* because that code is not linked
with the kernel image.

Disable the kasan check in unwind_pop_register() because it does not
matter if kasan checks fail when unwind_pop_register() reads the stack
memory of a task.

Reviewed-by: Russell King - ARM Linux
Reviewed-by: Florian Fainelli
Reviewed-by: Marc Zyngier
Tested-by: Joel Stanley
Tested-by: Florian Fainelli
Tested-by: Abbott Liu
Signed-off-by: Abbott Liu
---
 arch/arm/boot/compressed/Makefile | 1 +
 arch/arm/kernel/unwind.c          | 3 ++-
 arch/arm/kvm/hyp/Makefile         | 4 ++++
 arch/arm/vdso/Makefile            | 2 ++
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 45a6b9b..966103e 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS		+= hyp-stub.o
 endif
 
 GCOV_PROFILE		:= n
+KASAN_SANITIZE		:= n
 
 #
 # Architecture dependencies
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 0bee233..2e55c7d 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -249,7 +249,8 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
 	if (*vsp >= (unsigned long *)ctrl->sp_high)
 		return -URC_FAILURE;
 
-	ctrl->vrs[reg] = *(*vsp)++;
+	ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
+	(*vsp)++;
 	return URC_OK;
 }
 
diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index 63d6b40..0a8b500 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -24,3 +24,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
 obj-$(CONFIG_KVM_ARM_HOST) += switch.o
 CFLAGS_switch.o		   += $(CFLAGS_ARMV7VE)
 obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
+
+GCOV_PROFILE	:= n
+KASAN_SANITIZE	:= n
+UBSAN_SANITIZE	:= n
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index bb411821..87abbb7 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -30,6 +30,8 @@ CFLAGS_vgettimeofday.o = -O2
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 
+KASAN_SANITIZE := n
+
 # Force dependency
 $(obj)/vdso.o : $(obj)/vdso.so
 
-- 
2.9.0
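
The unwind.c hunk in this patch splits "*(*vsp)++" into a load and a
separate pointer increment so that only the load goes through
READ_ONCE_NOCHECK(). The userspace sketch below suggests why the two forms
should behave the same. READ_ONCE_NOCHECK_SKETCH, pop_combined and pop_split
are hypothetical names; the stand-in macro is just a volatile load, whereas
the kernel macro also keeps KASAN from checking the access.

```c
/*
 * Stand-in for the kernel's READ_ONCE_NOCHECK(): a plain volatile load.
 * This is an assumption for illustration; the real macro additionally
 * suppresses KASAN instrumentation of the access.
 */
#define READ_ONCE_NOCHECK_SKETCH(x) (*(const volatile unsigned long *)&(x))

/* Original form: load the word *vsp points at, then advance *vsp. */
static unsigned long pop_combined(unsigned long **vsp)
{
	return *(*vsp)++;
}

/* Patched form: the same load wrapped in the macro, increment done after. */
static unsigned long pop_split(unsigned long **vsp)
{
	unsigned long val = READ_ONCE_NOCHECK_SKETCH(*(*vsp));

	(*vsp)++;
	return val;
}
```

Both helpers return the word at the old stack pointer and leave the pointer
one word further on, so the split only changes how the load is instrumented.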
[PATCH v3 1/6] Add TTBR operator for kasan_init
The purpose of this patch is to provide set_ttbr0/get_ttbr0 to kasan_init function. The definitions of cp15 registers should be in arch/arm/include/asm/cp15.h rather than arch/arm/include/asm/kvm_hyp.h, so move them. Cc: Andrey RyabininReviewed-by: Marc Zyngier Reviewed-by: Russell King - ARM Linux Reviewed-by: Christoffer Dall Acked-by: Mark Rutland Tested-by: Florian Fainelli Tested-by: Joel Stanley Tested-by: Abbott Liu Signed-off-by: Abbott Liu --- arch/arm/include/asm/cp15.h| 104 + arch/arm/include/asm/kvm_hyp.h | 52 - arch/arm/kvm/hyp/cp15-sr.c | 12 ++--- arch/arm/kvm/hyp/switch.c | 6 +-- 4 files changed, 113 insertions(+), 61 deletions(-) diff --git a/arch/arm/include/asm/cp15.h b/arch/arm/include/asm/cp15.h index 4c9fa72..99ebb31 100644 --- a/arch/arm/include/asm/cp15.h +++ b/arch/arm/include/asm/cp15.h @@ -3,6 +3,7 @@ #define __ASM_ARM_CP15_H #include +#include /* * CR1 bits (CP#15 CR1) @@ -65,8 +66,111 @@ #define __write_sysreg(v, r, w, c, t) asm volatile(w " " c : : "r" ((t)(v))) #define write_sysreg(v, ...) 
__write_sysreg(v, __VA_ARGS__) +#define TTBR0_32 __ACCESS_CP15(c2, 0, c0, 0) +#define TTBR1_32 __ACCESS_CP15(c2, 0, c0, 1) +#define PAR_32 __ACCESS_CP15(c7, 0, c4, 0) +#define TTBR0_64 __ACCESS_CP15_64(0, c2) +#define TTBR1_64 __ACCESS_CP15_64(1, c2) +#define PAR_64 __ACCESS_CP15_64(0, c7) +#define VTTBR __ACCESS_CP15_64(6, c2) +#define CNTV_CVAL __ACCESS_CP15_64(3, c14) +#define CNTVOFF__ACCESS_CP15_64(4, c14) + +#define MIDR __ACCESS_CP15(c0, 0, c0, 0) +#define CSSELR __ACCESS_CP15(c0, 2, c0, 0) +#define VPIDR __ACCESS_CP15(c0, 4, c0, 0) +#define VMPIDR __ACCESS_CP15(c0, 4, c0, 5) +#define SCTLR __ACCESS_CP15(c1, 0, c0, 0) +#define CPACR __ACCESS_CP15(c1, 0, c0, 2) +#define HCR__ACCESS_CP15(c1, 4, c1, 0) +#define HDCR __ACCESS_CP15(c1, 4, c1, 1) +#define HCPTR __ACCESS_CP15(c1, 4, c1, 2) +#define HSTR __ACCESS_CP15(c1, 4, c1, 3) +#define TTBCR __ACCESS_CP15(c2, 0, c0, 2) +#define HTCR __ACCESS_CP15(c2, 4, c0, 2) +#define VTCR __ACCESS_CP15(c2, 4, c1, 2) +#define DACR __ACCESS_CP15(c3, 0, c0, 0) +#define DFSR __ACCESS_CP15(c5, 0, c0, 0) +#define IFSR __ACCESS_CP15(c5, 0, c0, 1) +#define ADFSR __ACCESS_CP15(c5, 0, c1, 0) +#define AIFSR __ACCESS_CP15(c5, 0, c1, 1) +#define HSR__ACCESS_CP15(c5, 4, c2, 0) +#define DFAR __ACCESS_CP15(c6, 0, c0, 0) +#define IFAR __ACCESS_CP15(c6, 0, c0, 2) +#define HDFAR __ACCESS_CP15(c6, 4, c0, 0) +#define HIFAR __ACCESS_CP15(c6, 4, c0, 2) +#define HPFAR __ACCESS_CP15(c6, 4, c0, 4) +#define ICIALLUIS __ACCESS_CP15(c7, 0, c1, 0) +#define BPIALLIS __ACCESS_CP15(c7, 0, c1, 6) +#define ICIMVAU__ACCESS_CP15(c7, 0, c5, 1) +#define ATS1CPR__ACCESS_CP15(c7, 0, c8, 0) +#define TLBIALLIS __ACCESS_CP15(c8, 0, c3, 0) +#define TLBIALL__ACCESS_CP15(c8, 0, c7, 0) +#define TLBIALLNSNHIS __ACCESS_CP15(c8, 4, c3, 4) +#define PRRR __ACCESS_CP15(c10, 0, c2, 0) +#define NMRR __ACCESS_CP15(c10, 0, c2, 1) +#define AMAIR0 __ACCESS_CP15(c10, 0, c3, 0) +#define AMAIR1 __ACCESS_CP15(c10, 0, c3, 1) +#define VBAR __ACCESS_CP15(c12, 0, c0, 0) +#define 
CID__ACCESS_CP15(c13, 0, c0, 1) +#define TID_URW__ACCESS_CP15(c13, 0, c0, 2) +#define TID_URO__ACCESS_CP15(c13, 0, c0, 3) +#define TID_PRIV __ACCESS_CP15(c13, 0, c0, 4) +#define HTPIDR __ACCESS_CP15(c13, 4, c0, 2) +#define CNTKCTL__ACCESS_CP15(c14, 0, c1, 0) +#define CNTV_CTL __ACCESS_CP15(c14, 0, c3, 1) +#define CNTHCTL__ACCESS_CP15(c14, 4, c1, 0) + extern unsigned long cr_alignment; /* defined in entry-armv.S */ +static inline void set_par(u64 val) +{ + if (IS_ENABLED(CONFIG_ARM_LPAE)) + write_sysreg(val, PAR_64); + else + write_sysreg(val, PAR_32); +} + +static inline u64 get_par(void) +{ + if (IS_ENABLED(CONFIG_ARM_LPAE)) + return read_sysreg(PAR_64); + else + return read_sysreg(PAR_32); +} + +static inline void set_ttbr0(u64 val) +{ + if (IS_ENABLED(CONFIG_ARM_LPAE)) + write_sysreg(val, TTBR0_64); + else + write_sysreg(val, TTBR0_32); +} + +static inline u64 get_ttbr0(void) +{ + if (IS_ENABLED(CONFIG_ARM_LPAE)) + return read_sysreg(TTBR0_64); + else + return
[PATCH v3 6/6] Enable KASan for arm
From: Andrey Ryabinin

This patch enables the kernel address sanitizer for arm.

Cc: Andrey Ryabinin
Acked-by: Dmitry Vyukov
Tested-by: Joel Stanley
Tested-by: Florian Fainelli
Tested-by: Abbott Liu
Signed-off-by: Abbott Liu
---
 Documentation/dev-tools/kasan.rst | 2 +-
 arch/arm/Kconfig                  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index f7a18f2..d92120d 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -12,7 +12,7 @@ KASAN uses compile-time instrumentation for checking every memory access,
 therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
 required for detection of out-of-bounds accesses to stack or global variables.
 
-Currently KASAN is supported only for the x86_64 and arm64 architectures.
+Currently KASAN is supported only for the x86_64, arm64 and arm architectures.
 
 Usage
 -----
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7e3d535..ac2287b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -49,6 +49,7 @@ config ARM
 	select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
 	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
+	select HAVE_ARCH_KASAN if MMU
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
 	select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/6] KASan for arm
From: Andrey Ryabinin

Changelog:
v3 - v2
- Removed the patch "2 1-byte checks more safer for memory_is_poisoned_16",
  because an unaligned load/store of 16 bytes is rare on arm, and that
  patch is very likely to affect the performance of modern CPUs.
  ---Acked by: Russell King - ARM Linux
- Fixed a link error: kasan_pmd_populate, kasan_pte_populate and
  kasan_pud_populate were in section .meminit.text, but kasan_alloc_block,
  which all three call, is in section .init.text. So kasan_pmd_populate,
  kasan_pte_populate and kasan_pud_populate were moved into section
  .init.text.
  ---Reported by: Florian Fainelli
- Fixed a compile error caused by the wrong access instruction in
  arch/arm/kernel/entry-common.S.
  ---Reported by: kbuild test robot
- Disabled instrumentation for arch/arm/kvm/hyp/*.
  ---Acked by: Marc Zyngier
- Updated the set of supported architectures in
  Documentation/dev-tools/kasan.rst.
  ---Acked by: Dmitry Vyukov
- Version 2 was tested by:
  Florian Fainelli (compile test)
  kbuild test robot (compile test)
  Joel Stanley (on ASPEED ast2500 (ARMv5))

v2 - v1
- Fixed some compile errors that happen when changing the kernel
  compression mode to lzma/xz/lzo/lz4.
  ---Reported by: Florian Fainelli, Russell King - ARM Linux
- Fixed a compile error, reported by kbuild, caused by older arm
  instruction sets (armv4t) that don't support movw/movt.
- Changed the pte flag from _L_PTE_DEFAULT | L_PTE_DIRTY | L_PTE_XN to
  pgprot_val(PAGE_KERNEL).
  ---Reported by: Russell King - ARM Linux
- Moved the "Enable KASan" patch to be the last one.
  ---Reported by: Florian Fainelli, Russell King - ARM Linux
- Moved the definitions of cp15 registers from
  arch/arm/include/asm/kvm_hyp.h to arch/arm/include/asm/cp15.h.
  ---Asked by: Mark Rutland
- Merged the following commits into the commit
  "Define the virtual space of KASan's shadow region":
  1) Define the virtual space of KASan's shadow region;
  2) Avoid cleaning the KASan shadow area's mapping table;
  3) Add KASan layout;
- Merged the following commits into the commit
  "Initialize the mapping of KASan shadow memory":
  1) Initialize the mapping of KASan shadow memory;
  2) Add support arm LPAE;
  3) Don't need to map the shadow of KASan's shadow memory;
     ---Reported by: Russell King - ARM Linux
  4) Change mapping of kasan_zero_page into readonly.
- Version 1 was tested by Florian Fainelli on a Cortex-A5 (no LPAE).

Hi, all:

These patches add arch-specific code for the kernel address sanitizer
(see Documentation/kasan.txt).

1/8 of the kernel addresses is reserved for shadow memory. There was no
big enough hole for this, so the virtual addresses for the shadow were
stolen from user space.

At the early boot stage the whole shadow region is populated with just
one physical page (kasan_zero_page). Later, this page is reused as a
readonly zero shadow for memory that KASan currently doesn't track
(vmalloc).

After mapping the physical memory, pages for shadow memory are allocated
and mapped.

KASan's stack instrumentation significantly increases stack consumption,
so CONFIG_KASAN doubles THREAD_SIZE.

Functions like memset/memmove/memcpy do a lot of memory accesses. If a
bad pointer is passed to one of these functions, it is important to catch
this. The compiler's instrumentation cannot do this since these functions
are written in assembly.

KASan replaces the memory functions with manually instrumented variants.
The original functions are declared as weak symbols so that the strong
definitions in mm/kasan/kasan.c can replace them. The original functions
have aliases with a '__' prefix in the name, so we can call the
non-instrumented variant if needed.

Some files are built without kasan instrumentation (e.g. mm/slub.c).
In such files the original mem* functions are replaced (via #define) with
the prefixed variants to disable memory access checks.

On arm LPAE architecture, the mapping table of KASan shadow memory (if
PAGE_OFFSET is 0xc0000000, the KASan shadow memory's virtual space is
0xb6e00000~0xbf000000) can't be filled in the do_translation_fault
function, because kasan instrumentation may cause do_translation_fault
to access KASan shadow memory. Accessing KASan shadow memory in
do_translation_fault may cause an endless loop. So the mapping table of
KASan shadow memory needs to be copied in the pgd_alloc function.

Most of the code comes from:
https://github.com/aryabinin/linux/commit/0b54f17e70ff50a902c4af05bb92716eb95acefe

These patches are tested on vexpress-ca15, vexpress-ca9

Cc:
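
As a rough illustration of two points from the cover letter (one shadow byte per 8 bytes of memory, and memory functions that consult the shadow before touching memory), here is a minimal user-space sketch. It is not code from this series: the toy_* names, the 64-byte toy heap and the 0xff poison value are all made up for the example, and a real KASan would report the bad access rather than return false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_GRANULE   8u   /* one shadow byte covers 8 bytes, as in KASan */
#define TOY_HEAP_SIZE 64u  /* hypothetical arena size, for illustration only */

static uint8_t toy_heap[TOY_HEAP_SIZE];
/* 0 = accessible, nonzero = poisoned (simplified: no partial granules) */
static uint8_t toy_shadow[TOY_HEAP_SIZE / TOY_GRANULE];

/* Poison or unpoison [off, off + len); off and len must be multiples of 8. */
static void toy_poison(size_t off, size_t len, bool poisoned)
{
	for (size_t i = off / TOY_GRANULE; i < (off + len) / TOY_GRANULE; i++)
		toy_shadow[i] = poisoned ? 0xff : 0;
}

/* True if every byte of [off, off + len) lies in the arena and is accessible. */
static bool toy_range_ok(size_t off, size_t len)
{
	if (off > TOY_HEAP_SIZE || len > TOY_HEAP_SIZE - off)
		return false;
	for (size_t i = off; i < off + len; i++)
		if (toy_shadow[i / TOY_GRANULE])
			return false;
	return true;
}

/* "Instrumented" memset: check the shadow first, then call the plain routine. */
static bool toy_memset(size_t off, int c, size_t len)
{
	if (!toy_range_ok(off, len))
		return false;	/* a real KASan would print a report here */
	memset(toy_heap + off, c, len);
	return true;
}
```

The split between toy_memset and the plain memset mirrors, very loosely, the split between the instrumented functions and their '__'-prefixed originals described above.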
[PATCH v3 4/6] Define the virtual space of KASan's shadow region
Define KASAN_SHADOW_OFFSET, KASAN_SHADOW_START and KASAN_SHADOW_END for
the arm kernel address sanitizer.

     +----+ 0xffffffff
     |    |
     |    |
     |    |
     +----+ CONFIG_PAGE_OFFSET
     |    |\
     |    | |-> module virtual address space area.
     |    |/
     +----+ MODULE_VADDR = KASAN_SHADOW_END
     |    |\
     |    | |-> the shadow area of kernel virtual address.
     |    |/
     +----+ TASK_SIZE(start of kernel space) = KASAN_SHADOW_START  the
     |    |\  shadow address of MODULE_VADDR
     |    | ---------------------+
     |    |                      |
     +----+ KASAN_SHADOW_OFFSET  |-> the user space area. Kernel address
     |    |                      |    sanitizer does not use this space.
     |    | ---------------------+
     |    |/
     ------ 0

1) KASAN_SHADOW_OFFSET:
   This value is used to map an address to the corresponding shadow
   address by the following formula:
       shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;

2) KASAN_SHADOW_START
   This value is the MODULE_VADDR's shadow address. It is the start
   of kernel virtual space.

3) KASAN_SHADOW_END
   This value is the 0x100000000's shadow address. It is the end of
   the kernel address sanitizer's shadow area. It is also the start of
   the module area.

When kasan is enabled, the definition of TASK_SIZE is not an 8-bit
rotated constant, so we need to modify the TASK_SIZE access code in the
*.s file.

Cc: Andrey Ryabinin
Reviewed-by: Ard Biesheuvel
Reviewed-by: Russell King - ARM Linux
Tested-by: Joel Stanley
Tested-by: Florian Fainelli
Tested-by: Abbott Liu
Signed-off-by: Abbott Liu
---
 arch/arm/include/asm/kasan_def.h | 64
 arch/arm/include/asm/memory.h    |  5
 arch/arm/kernel/entry-armv.S     |  5 ++--
 arch/arm/kernel/entry-common.S   |  9 --
 arch/arm/mm/init.c               |  6
 arch/arm/mm/mmu.c                |  7 -
 6 files changed, 90 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm/include/asm/kasan_def.h

diff --git a/arch/arm/include/asm/kasan_def.h b/arch/arm/include/asm/kasan_def.h
new file mode 100644
index 0000000..7b7f424
--- /dev/null
+++ b/arch/arm/include/asm/kasan_def.h
@@ -0,0 +1,64 @@
+/*
+ * arch/arm/include/asm/kasan_def.h
+ *
+ * Copyright (c) 2018 Huawei Technologies Co., Ltd.
+ *
+ * Author: Abbott Liu
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_KASAN_DEF_H
+#define __ASM_KASAN_DEF_H
+
+#ifdef CONFIG_KASAN
+
+/*
+ *    +----+ 0xffffffff
+ *    |    |
+ *    |    |
+ *    |    |
+ *    +----+ CONFIG_PAGE_OFFSET
+ *    |    |\
+ *    |    | |-> module virtual address space area.
+ *    |    |/
+ *    +----+ MODULE_VADDR = KASAN_SHADOW_END
+ *    |    |\
+ *    |    | |-> the shadow area of kernel virtual address.
+ *    |    |/
+ *    +----+ TASK_SIZE(start of kernel space) = KASAN_SHADOW_START  the
+ *    |    |\  shadow address of MODULE_VADDR
+ *    |    | ---------------------+
+ *    |    |                      |
+ *    +----+ KASAN_SHADOW_OFFSET  |-> the user space area. Kernel address
+ *    |    |                      |    sanitizer does not use this space.
+ *    |    | ---------------------+
+ *    |    |/
+ *    ------ 0
+ *
+ * 1) KASAN_SHADOW_OFFSET:
+ *    This value is used to map an address to the corresponding shadow
+ *    address by the following formula:
+ *        shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;
+ *
+ * 2) KASAN_SHADOW_START
+ *    This value is the MODULE_VADDR's shadow address. It is the start
+ *    of kernel virtual space.
+ *
+ * 3) KASAN_SHADOW_END
+ *    This value is the 0x100000000's shadow address. It is the end of
+ *    the kernel address sanitizer's shadow area. It is also the start of
+ *    the module area.
+ *
+ */
+
+#define KASAN_SHADOW_OFFSET	(KASAN_SHADOW_END - (1<<29))
+
+#define KASAN_SHADOW_START	((KASAN_SHADOW_END >> 3) + KASAN_SHADOW_OFFSET)
+
+#define KASAN_SHADOW_END	(UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
+
+#endif
+#endif
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 4966677..3ce1a9a 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -21,6 +21,7 @@
 #ifdef CONFIG_NEED_MACH_MEMORY_H
 #include <mach/memory.h>
 #endif
+#include <asm/kasan_def.h>
 
 /*
  * Allow for constants defined here to be used from assembly code
@@ -37,7 +38,11 @@
  * TASK_SIZE - the maximum size of a user space task.
  * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area
  */
+#ifndef CONFIG_KASAN
 #define TASK_SIZE		(UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
+#else
+#define TASK_SIZE		(KASAN_SHADOW_START)
+#endif
 #define TASK_UNMAPPED_BASE	ALIGN(TASK_SIZE / 3, SZ_16M)
 
 /*
diff --git
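
To make the three macros in kasan_def.h concrete, the arithmetic can be worked through in user space. This is only a sketch: EXAMPLE_PAGE_OFFSET and shadow_of() are names invented here, and 0xC0000000 is an assumed 3G/1G split, while the real value comes from CONFIG_PAGE_OFFSET. The macro bodies are copied from the patch, widened to 64 bits so that 0x100000000 can be represented.

```c
#include <stdint.h>

#define SZ_16M 0x01000000ull

/* Assumption for this example only: a 3G/1G user/kernel split. */
#define EXAMPLE_PAGE_OFFSET 0xC0000000ull

/* Same expressions as in the patch, with the example value plugged in. */
#define KASAN_SHADOW_END    (EXAMPLE_PAGE_OFFSET - SZ_16M)
#define KASAN_SHADOW_OFFSET (KASAN_SHADOW_END - (1ull << 29))
#define KASAN_SHADOW_START  ((KASAN_SHADOW_END >> 3) + KASAN_SHADOW_OFFSET)

/* shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET */
static inline uint64_t shadow_of(uint64_t addr)
{
	return (addr >> 3) + KASAN_SHADOW_OFFSET;
}
```

Under that assumption KASAN_SHADOW_END is 0xBF000000, KASAN_SHADOW_OFFSET is 0x9F000000 and KASAN_SHADOW_START is 0xB6E00000. The shadow of 0x100000000 (one past the top of the 32-bit address space) lands exactly on KASAN_SHADOW_END, and the shadow of MODULE_VADDR lands on KASAN_SHADOW_START, which is what the comment in the header describes.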
[PATCH V5] thermal: Add cooling device's statistics in sysfs
This extends the sysfs interface for thermal cooling devices and exposes
some pretty useful statistics. These statistics have proven to be quite
useful, especially while doing benchmarks related to the task scheduler,
where we want to make sure that nothing has disrupted the test,
especially the cooling device, which may have put constraints on the
CPUs. The information exposed here tells us to what extent the CPUs were
constrained by the thermal framework.

The write-only "reset" file is used to reset the statistics.

The read-only "time_in_state_ms" file shows the time (in msec) spent by
the device in the respective cooling states, and it prints one line per
cooling state.

The read-only "total_trans" file shows a single positive integer value:
the total number of cooling state transitions the device has gone
through since the cooling device was registered or since the statistics
were last reset.

The read-only "trans_table" file shows a two dimensional matrix, where
an entry (row i, column j) represents the number of transitions from
State_i to State_j.

This is what the directory structure looks like for a single cooling
device:

$ ls -R /sys/class/thermal/cooling_device0/
/sys/class/thermal/cooling_device0/:
cur_state  max_state  power  stats  subsystem  type  uevent

/sys/class/thermal/cooling_device0/power:
autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
control               runtime_status

/sys/class/thermal/cooling_device0/stats:
reset  time_in_state_ms  total_trans  trans_table

This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and
ARM 64-bit Hisilicon hikey960 board running Android.

Signed-off-by: Viresh Kumar
---
V4->V5:
- time_in_state's unit is msec now instead of clock_t.
- Remove double setting of ->stats pointer.

V3->V4:
- Added CONFIG_THERMAL_STATISTICS
- Added transition table file in sysfs
- Updated documentation for new sysfs files
- The unit of time in time_in_state is clock_t now
- Separate routines for cooling device stat setup/destroy

V2->V3:
- Total number of states is max_level + 1. The earlier version didn't
  take that into account and so the stats for the highest state were
  missing.

V1->V2:
- Move to sysfs from debugfs

 Documentation/thermal/sysfs-api.txt | 31 +
 drivers/thermal/Kconfig             | 7 ++
 drivers/thermal/thermal_core.c      | 3 +-
 drivers/thermal/thermal_core.h      | 10 ++
 drivers/thermal/thermal_helpers.c   | 5 +-
 drivers/thermal/thermal_sysfs.c     | 225
 include/linux/thermal.h             | 1 +
 7 files changed, 280 insertions(+), 2 deletions(-)

diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index bb9a0a53e76b..911399730c1c 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -255,6 +255,7 @@ temperature) and throttle appropriate devices.
 2. sysfs attributes structure
 
 RO	read only value
+WO	write only value
 RW	read/write value
 
 Thermal sysfs attributes will be represented under /sys/class/thermal.
@@ -286,6 +287,11 @@ if hwmon is compiled in or built as a module.
     |---type:			Type of the cooling device(processor/fan/...)
     |---max_state:		Maximum cooling state of the cooling device
     |---cur_state:		Current cooling state of the cooling device
+    |---stats:			Directory containing cooling device's statistics
+    |---stats/reset:		Writing any value resets the statistics
+    |---stats/time_in_state_ms:	Time (msec) spent in various cooling states
+    |---stats/total_trans:	Total number of times cooling state is changed
+    |---stats/trans_table:	Cooling state transition table
 
 
 Then next two dynamic attributes are created/removed in pairs. They represent
@@ -490,6 +496,31 @@ cur_state
 	- cur_state == max_state means the maximum cooling.
 	RW, Required
 
+stats/reset
+	Writing any value resets the cooling device's statistics.
+	WO, Required
+
+stats/time_in_state_ms:
+	The amount of time spent by the cooling device in various cooling
+	states. The output will have "<state> <time>" pair in each line, which
+	will mean this cooling device spent <time> msec of time at <state>.
+	Output will have one line for each of the supported states. usertime
+	units here is 10mS (similar to other time exported in /proc).
+	RO, Required
+
+stats/total_trans:
+	A single positive value showing the total number of times the state of a
+	cooling device is changed.
+	RO, Required
+
+stats/trans_table:
+	This gives fine grained information about all the cooling state
+	transitions. The cat output here is a two dimensional matrix, where an
+	entry (row i, column j) represents the number of transitions from
+	State_i to State_j. If the transition table is bigger than
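
The bookkeeping behind the four stats files can be sketched in a few lines of C. This is not the code in drivers/thermal/thermal_sysfs.c (that part of the patch is not shown in this message). The struct layout, the toy_* names and the fixed TOY_MAX_STATES are assumptions made for illustration; the only things taken from the patch description are the quantities tracked: time per state in msec, a total transition count, an (i, j) transition matrix, and a reset.

```c
#include <stdint.h>
#include <string.h>

#define TOY_MAX_STATES 4u	/* hypothetical: max_state + 1 */

struct toy_cooling_stats {
	unsigned int state;		/* current cooling state */
	uint64_t last_time_ms;		/* time of the last accounting update */
	uint64_t total_trans;		/* what "total_trans" would print */
	uint64_t time_in_state_ms[TOY_MAX_STATES];
	uint64_t trans_table[TOY_MAX_STATES][TOY_MAX_STATES];
};

/* What a write to "reset" would do: clear counters, keep the current state. */
static void toy_stats_reset(struct toy_cooling_stats *s, uint64_t now_ms)
{
	unsigned int cur = s->state;

	memset(s, 0, sizeof(*s));
	s->state = cur;
	s->last_time_ms = now_ms;
}

/* Account the time spent in the old state, then record the transition. */
static void toy_stats_update(struct toy_cooling_stats *s,
			     unsigned int new_state, uint64_t now_ms)
{
	s->time_in_state_ms[s->state] += now_ms - s->last_time_ms;
	s->last_time_ms = now_ms;

	if (new_state == s->state || new_state >= TOY_MAX_STATES)
		return;

	s->trans_table[s->state][new_state]++;	/* row = from, column = to */
	s->total_trans++;
	s->state = new_state;
}
```

Calling toy_stats_update() with an unchanged state only refreshes the time accounting, which is roughly what a read of time_in_state_ms needs before printing.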
Re: [PATCH 1/2] perf: riscv: preliminary RISC-V support
Hi Alex,

On Mon, Apr 02, 2018 at 03:36:12PM +0800, Alan Kao wrote:
> On Sat, Mar 31, 2018 at 03:47:10PM -0700, Alex Solomatnikov wrote:
>
> The original guess was that maybe a counter value on a hart is picked
> as the minuend, and an old counter value on another hart was recorded
> as the subtrahend but numerically larger. Then, the overflow is caused
> by that subtraction. Please let me name this guess as
> "cross-hart subtraction."
>
> > You can add a skew between cores in qemu, something like this:
> >
> > case CSR_INSTRET:
> > core_id()*return cpu_get_host_ticks()/10;
> > break;
> > case CSR_CYCLE:
> > return cpu_get_host_ticks();
> > break;
> >
>
> However, I tried similar stuff to reproduce the phenomenon but in vain.
> It seems that the
>
> ***cross-hart subtraction doesn't even happen, because generic
> code handles them.

...

I am sorry that this observation is wrong. With an appropriate tweak, we
successfully reproduced the behavior and located the bug.

This will be fixed in v2.

Thanks for the help.
Alan
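
For readers following the thread, the "cross-hart subtraction" effect can be reproduced with plain integer arithmetic. The sketch below is not the driver code from the patch: toy_counter_delta() and the TOY_* macros are invented names. The 63-bit width is the mask Alan mentions setting in his code, and the sample values in the usage note are idx=1 readings from the read_counter trace quoted further down in the thread.

```c
#include <stdint.h>

#define TOY_COUNTER_WIDTH 63	/* mask width mentioned in the thread */
#define TOY_COUNTER_MASK  ((UINT64_C(1) << TOY_COUNTER_WIDTH) - 1)

/* Delta between two raw counter reads, truncated to the counter width. */
static uint64_t toy_counter_delta(uint64_t prev, uint64_t now)
{
	return (now - prev) & TOY_COUNTER_MASK;
}
```

With two reads from the same hart (CPU 3: 53892244920, then 53906699665) the delta is a small positive number. If prev comes from CPU 3 (53906699665) and now from CPU 1 (51986369142), the subtraction goes negative and the mask turns it into a value just below 2^63, the same order of magnitude as the 9223372034948899840 instructions reported by perf stat.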
Re: [PATCH 1/2] perf: riscv: preliminary RISC-V support
On Sat, Mar 31, 2018 at 03:47:10PM -0700, Alex Solomatnikov wrote:

The original guess was that maybe a counter value on a hart is picked
as the minuend, and an old counter value on another hart was recorded
as the subtrahend but numerically larger. Then, the overflow is caused
by that subtraction. Please let me name this guess as
"cross-hart subtraction."

> You can add a skew between cores in qemu, something like this:
>
> case CSR_INSTRET:
> core_id()*return cpu_get_host_ticks()/10;
> break;
> case CSR_CYCLE:
> return cpu_get_host_ticks();
> break;
>

However, I tried similar stuff to reproduce the phenomenon but in vain.
It seems that the cross-hart subtraction doesn't even happen, because
generic code handles them. While I am still looking for the proof of
it, I would like to have more information on this first:

* What is the frequency of that "funny number" event? Was that often?
* If you monitor only one hart, will the event disappear?
* What will happen if you change the counter_width to fit U54's counter
  width?
* Is the test program you used open-sourced?

> Alex
>

Many thanks,
Alan

> On Wed, Mar 28, 2018 at 7:30 PM, Alan Kao wrote:
> > Hi Alex,
> >
> > I appreciate your reply and tests.
> >
> > On Wed, Mar 28, 2018 at 03:58:41PM -0700, Alex Solomatnikov wrote:
> >> Did you test this code?
> >
> > I did test this patch on QEMU's virt model with multi-hart, which is
> > the only RISC-V machine I have for now. But as I mentioned in
> > https://github.com/riscv/riscv-qemu/pull/115 , the hardware counter
> > support in QEMU does not fully conform to the 1.10 Priv-Spec, so I had
> > to slightly tweak the code to make reading work.
> >
> > Specifically, the read to cycle and instret in QEMU looks like this:
> > ...
> > case CSR_INSTRET:
> > case CSR_CYCLE:
> > // if (ctr_ok) {
> > return cpu_get_host_ticks();
> > // }
> > break;
> > ...
> > and the two commented-out lines were the tweak.
> >
> > In such an environment, I did not get anything unexpected. No matter
> > which of them is requested, QEMU returns the host's tick.
> >
> >>
> >> I got funny numbers when I tried to run it on HiFive Unleashed:
> >>
> >> perf stat mem-latency
> >> ...
> >>
> >> Performance counter stats for 'mem-latency':
> >>
> >>          157.907000      task-clock (msec)   #    0.940 CPUs utilized
> >>                   1      context-switches    #    0.006 K/sec
> >>                   1      cpu-migrations      #    0.006 K/sec
> >>                4102      page-faults         #    0.026 M/sec
> >>           157923752      cycles              #    1.000 GHz
> >> 9223372034948899840      instructions        # 58403957087.78 insn per cycle
> >>                          branches
> >>                          branch-misses
> >>
> >>        0.168046000 seconds time elapsed
> >>
> >>
> >> Tracing read_counter(), I see this:
> >>
> >> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.058809] CPU 3: read_counter idx=0 val=2528358954912
> >> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.063339] CPU 3: read_counter idx=1 val=53892244920
> >> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.118160] CPU 3: read_counter idx=0 val=2528418303035
> >> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.122694] CPU 3: read_counter idx=1 val=53906699665
> >> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.216736] CPU 1: read_counter idx=0 val=2528516878664
> >> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.221270] CPU 1: read_counter idx=1 val=51986369142
> >>
> >> It looks like the counter values from different cores are subtracted and
> >> wraparound occurs.
> >>
> >
> > Thanks for the hint. It makes sense. 9223372034948899840 is
> > 0x7fffffff8e66a400, which should be a wraparound with the mask I set
> > (63-bit) in the code.
> >
> > I will try this direction. Ideally, we can solve it by explicitly
> > syncing the hwc->prev_count when a cpu migration event happens.
> >
> >>
> >> Also, core IDs and socket IDs are wrong in perf report:
> >>
> >
> > As Palmer has replied to this, I have no comment here.
> >
> >> perf report --header -I
> >> Error:
> >> The perf.data file has no samples!
> >> #
> >> # captured on: Thu Jan 1 02:52:07 1970
> >> # hostname : buildroot
> >> # os release : 4.15.0-00045-g0d7c030-dirty
> >> # perf version : 4.15.0
> >> # arch : riscv64
> >> # nrcpus online : 4
> >> # nrcpus avail : 5
> >> # total memory : 8188340 kB
> >> # cmdline : /usr/bin/perf record -F 1000 lat_mem_rd -P 1 -W 1 -N 1 -t 10
> >> # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } =
> >> 1000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap =
> >> 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3,
> >> sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
> >> # sibling cores : 1
> >> #
Re: [PATCH] Documentation/thermal: Check links and convert to https
On 28-03-18, 22:59, Sanjeev Gupta wrote:
> All links working.

And why is it important to convert them to https ?

> Signed-off-by: Sanjeev Gupta
> ---
>  Documentation/thermal/cpu-cooling-api.txt | 2 +-
>  Documentation/thermal/nouveau_thermal     | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/thermal/cpu-cooling-api.txt b/Documentation/thermal/cpu-cooling-api.txt
> index 7df567eaea1a..32917d178c51 100644
> --- a/Documentation/thermal/cpu-cooling-api.txt
> +++ b/Documentation/thermal/cpu-cooling-api.txt
> @@ -5,7 +5,7 @@ Written by Amit Daniel Kachhap
>
>  Updated: 6 Jan 2015
>
> -Copyright (c) 2012 Samsung Electronics Co., Ltd(http://www.samsung.com)
> +Copyright (c) 2012 Samsung Electronics Co., Ltd (https://www.samsung.com)
>
>  0. Introduction
>
> diff --git a/Documentation/thermal/nouveau_thermal b/Documentation/thermal/nouveau_thermal
> index 6e17a11efcb0..502b0b95c2e2 100644
> --- a/Documentation/thermal/nouveau_thermal
> +++ b/Documentation/thermal/nouveau_thermal
> @@ -79,4 +79,4 @@ Thermal management on Nouveau is new and may not work on all cards. If you have
>  inquiries, please ping mupuf on IRC (#nouveau, freenode).
>
>  Bug reports should be filled on Freedesktop's bug tracker. Please follow
> -http://nouveau.freedesktop.org/wiki/Bugs
> +https://nouveau.freedesktop.org/wiki/Bugs
> --
> 2.15.1

-- 
viresh