RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of channels to two workqueues
> -Original Message- > From: Dexuan Cui > Sent: Thursday, November 29, 2018 12:17 AM > To: gre...@linuxfoundation.org > Cc: KY Srinivasan ; Haiyang Zhang > ; Stephen Hemminger > ; linux-kernel@vger.kernel.org; > de...@linuxdriverproject.org; a...@canonical.com; vkuznets > ; o...@aepfle.de; jasow...@redhat.com; Michael > Kelley > Subject: RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of > channels to two workqueues > > > From: gre...@linuxfoundation.org > > Sent: Wednesday, November 28, 2018 11:45 PM > > > > > > There is no change in this repost. I just rebased this patch to today's > > > char-misc's char-misc-next branch. Previously KY posted the patch with > his > > > Signed-off-by (which is kept in this repost), but there was a conflict > > > issue. > > > > > > Note: the patch can't be cleanly applied to char-misc's char-misc-linus > branch > > -- > > > to do that, we need to cherry-pick the supporting patch first: > > > 4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API > > vmbus_get_outgoing_channel()") > > > > That is not going to work for the obvious reason that this dependant > > patch is not going to be merged into 4.20-final. > > It looks the dependent patch (4d3c5c69191f) is going to miss the v4.20 > release. > This is not a big issue, as the dependent patch isn't really important. > > > So, what do you expect us to do here? The only way this can be accepted > > is to have it go into my -next branch, which means it will show up in > > 4.21-rc1, is that ok? > > Is there any chance for this patch ("Drivers: hv: vmbus: Offload the handling > ...") to > go into v4.20? > > If yes, I can quickly do a rebase to char-misc's char-misc-linus branch, > because actually the conflict can be very easily fixed. And I can help to fix > any > conflict when the dependent patch is backported to v4.20.1. This patch fixes an important bug while the patch this depends on is not critical. 
I suggest we revert the patch that this patch depends on; we can then submit a new version of this patch that can go in now, into the 4.20 release. K. Y
Re: [RFC PATCH v3] ftrace: support very early function tracing
On Wed, 24 Oct 2018 19:22:30 + Abderrahmane Benbachir wrote: > --- a/include/linux/ftrace.h > +++ b/include/linux/ftrace.h > @@ -239,6 +239,16 @@ static inline void ftrace_free_init_mem(void) { } > static inline void ftrace_free_mem(struct module *mod, void *start, > void *end) { } > #endif /* CONFIG_FUNCTION_TRACER */ > > +#ifdef CONFIG_VERY_EARLY_FUNCTION_TRACER > +extern void ftrace_early_init(char *command_line); > +extern void ftrace_early_shutdown(void); > +extern void ftrace_early_fill_ringbuffer(void *data); > +#else > +static inline void ftrace_early_init(char *command_line) { } > +static inline void ftrace_early_shutdown(void) { } > +static inline void ftrace_early_fill_ringbuffer(void *data) { } > +#endif > + > #ifdef CONFIG_STACK_TRACER > > #define STACK_TRACE_ENTRIES 500 > @@ -443,6 +453,10 @@ unsigned long ftrace_get_addr_curr(struct > dyn_ftrace *rec); > > extern ftrace_func_t ftrace_trace_function; > > +#if defined(CONFIG_VERY_EARLY_FUNCTION_TRACER) && > defined(CONFIG_DYNAMIC_FTRACE) Seems the patch has some formatting issue. Can you resend with better email client options. 
> +extern ftrace_func_t ftrace_vearly_trace_function; > +#endif > + > int ftrace_regex_open(struct ftrace_ops *ops, int flag, > struct inode *inode, struct file *file); > ssize_t ftrace_filter_write(struct file *file, const char __user *ubuf, > @@ -716,7 +730,7 @@ static inline unsigned long get_lock_parent_ip(void) > #ifdef CONFIG_FTRACE_MCOUNT_RECORD > extern void ftrace_init(void); > #else > -static inline void ftrace_init(void) { } > +static inline void ftrace_init(void) { ftrace_early_shutdown(); } > #endif > > /* > diff --git a/init/main.c b/init/main.c > index 18f8f0140fa0..1b289325223f 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -533,6 +533,7 @@ asmlinkage __visible void __init start_kernel(void) > char *command_line; > char *after_dashes; > > + ftrace_early_init(boot_command_line); > set_task_stack_end_magic(_task); > smp_setup_processor_id(); > debug_objects_early_init(); > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig > index 5e3de28c7677..4b358bf6abb0 100644 > --- a/kernel/trace/Kconfig > +++ b/kernel/trace/Kconfig > @@ -19,6 +19,11 @@ config HAVE_FUNCTION_TRACER > help > See Documentation/trace/ftrace-design.rst > > +config HAVE_VERY_EARLY_FTRACE > + bool > + help > + See Documentation/trace/ftrace-design.txt > + > config HAVE_FUNCTION_GRAPH_TRACER > bool > help > @@ -155,6 +160,52 @@ config FUNCTION_TRACER > (the bootup default), then the overhead of the instructions is very > small and not measurable even in micro-benchmarks. > > +config VERY_EARLY_FUNCTION_TRACER > + bool "Very Early Kernel Function Tracer" > + depends on FUNCTION_TRACER > + depends on HAVE_VERY_EARLY_FTRACE > + help > + Normally, function tracing can only start after memory has been > + initialized early in boot. If "ftrace=function" is added to the > + command line, then function tracing will start after memory setup. 
> + In order to trace functions before that, this option will > + have function tracing starts before memory setup is complete, by s/starts/start/ > + placing the trace in a temporary buffer, which will be copied to > + the trace buffer after memory setup. The size of this temporary > + buffer is defined by VERY_EARLY_FTRACE_BUF_SHIFT. I'm also thinking that we should allocate a sub buffer for this. That is, hold off on writing to ring buffer until after file systems are initialized. Create a sub buffer "early_boot" (will be in tracefs/instances/early_boot)" and copy the data there. The reason I say this is that having ftrace=function will continue to fill the buffer and you will most likely lose your data from overwriting. > + > +config VERY_EARLY_FTRACE_BUF_SHIFT > + int "Temporary buffer size (17 => 128 KB, 24 => 16 MB)" > + depends on VERY_EARLY_FUNCTION_TRACER > + range 8 24 > + default 19 > + help > + Select the size of the buffer to be used for storing function calls at > + very early stage. > + The value defines the size as a power of 2. > + Examples: > + 20 => 1 MB > + 19 => 512 KB > + 17 => 128 KB Should state the allowable range in the help text as well. > + > +config VERY_EARLY_FTRACE_FILTER_SHIFT > + int "Temporary filter size (filter/notrace) (17 => 128 KB, 19 => 512 > KB)" > + depends on VERY_EARLY_FUNCTION_TRACER > + depends on FTRACE_MCOUNT_RECORD > + range 0 19 > + default 17 > + help > + Select the size of the filter buffer to be used for filtering (trace/ > + no trace) functions at very early stage. > + Two buffers (trace/no_trace) will be created using by this option. > + These following kernel parameters control filtering during bootup : > +
[GIT PULL] gcc-plugins fix for v4.20-rc5
Hi Linus, Please pull this gcc-plugin fix for v4.20-rc5. Thanks! -Kees The following changes since commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad: Linux 4.20-rc2 (2018-11-11 17:12:31 -0600) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/gcc-plugins-v4.20-rc5 for you to fetch changes up to ef1a8409348966f0b25ff97a170d6d0367710ea9: stackleak: Disable function tracing and kprobes for stackleak_erase() (2018-11-30 09:05:07 -0800) stackleak plugin fix - Fix crash by not allowing kprobing of stackleak_erase() (Alexander Popov) Alexander Popov (1): stackleak: Disable function tracing and kprobes for stackleak_erase() kernel/stackleak.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- Kees Cook
[PATCH v2 3/7] arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003
Remove duplicate entries for Qualcomm erratum 1003. Since the entries are not purely based on generic MIDR checks, use the multi_cap_entry type to merge the entries. Cc: Christopher Covington Cc: Will Deacon Reviewed-by: Vladimiri Murzin Tested-by: Vladimiri Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/kernel/cpu_errata.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index bff8fa8..0ef4bb9 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -592,6 +592,19 @@ static const struct midr_range cavium_erratum_30115_cpus[] = { }; #endif +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003 +static const struct arm64_cpu_capabilities qcom_erratum_1003_list[] = { + { + ERRATA_MIDR_REV(MIDR_QCOM_FALKOR_V1, 0, 0), + }, + { + .midr_range.model = MIDR_QCOM_KRYO, + .matches = is_kryo_midr, + }, + {}, +}; +#endif + #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE static const struct midr_range workaround_clean_cache[] = { #ifdefined(CONFIG_ARM64_ERRATUM_826319) || \ @@ -685,16 +698,10 @@ const struct arm64_cpu_capabilities arm64_errata[] = { }, #ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003 { - .desc = "Qualcomm Technologies Falkor erratum 1003", + .desc = "Qualcomm Technologies Falkor/Kryo erratum 1003", .capability = ARM64_WORKAROUND_QCOM_FALKOR_E1003, - ERRATA_MIDR_REV(MIDR_QCOM_FALKOR_V1, 0, 0), - }, - { - .desc = "Qualcomm Technologies Kryo erratum 1003", - .capability = ARM64_WORKAROUND_QCOM_FALKOR_E1003, - .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, - .midr_range.model = MIDR_QCOM_KRYO, - .matches = is_kryo_midr, + .matches = multi_entry_cap_matches, + .match_list = qcom_erratum_1003_list, }, #endif #ifdef CONFIG_QCOM_FALKOR_ERRATUM_1009 -- 2.7.4
[PATCH v2 6/7] arm64: capabilities: Use linear array for detection and verification
Use the sorted list of capability entries for the detection and verification. Reviewed-by: Vladimiri Murzin Tested-by: Vladimiri Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 42 +- 1 file changed, 17 insertions(+), 25 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 618857f..13d729c 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1512,28 +1512,25 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps) cap_set_elf_hwcap(hwcaps); } -static void __update_cpu_capabilities(const struct arm64_cpu_capabilities *caps, - u16 scope_mask, const char *info) +static void update_cpu_capabilities(u16 scope_mask) { + int i; + const struct arm64_cpu_capabilities *caps; + scope_mask &= ARM64_CPUCAP_SCOPE_MASK; - for (; caps->matches; caps++) { - if (!(caps->type & scope_mask) || + for (i = 0; i < ARM64_NCAPS; i++) { + caps = cpu_hwcaps_ptrs[i]; + if (!caps || !(caps->type & scope_mask) || + cpus_have_cap(caps->capability) || !caps->matches(caps, cpucap_default_scope(caps))) continue; - if (!cpus_have_cap(caps->capability) && caps->desc) - pr_info("%s %s\n", info, caps->desc); + if (caps->desc) + pr_info("detected: %s\n", caps->desc); cpus_set_cap(caps->capability); } } -static void update_cpu_capabilities(u16 scope_mask) -{ - __update_cpu_capabilities(arm64_errata, scope_mask, - "enabling workaround for"); - __update_cpu_capabilities(arm64_features, scope_mask, "detected:"); -} - static int __enable_cpu_capability(void *arg) { const struct arm64_cpu_capabilities *cap = arg; @@ -1597,16 +1594,17 @@ static void __init enable_cpu_capabilities(u16 scope_mask) * * Returns "false" on conflicts. 
*/ -static bool -__verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps, - u16 scope_mask) +static bool verify_local_cpu_caps(u16 scope_mask) { + int i; bool cpu_has_cap, system_has_cap; + const struct arm64_cpu_capabilities *caps; scope_mask &= ARM64_CPUCAP_SCOPE_MASK; - for (; caps->matches; caps++) { - if (!(caps->type & scope_mask)) + for (i = 0; i < ARM64_NCAPS; i++) { + caps = cpu_hwcaps_ptrs[i]; + if (!caps || !(caps->type & scope_mask)) continue; cpu_has_cap = caps->matches(caps, SCOPE_LOCAL_CPU); @@ -1637,7 +1635,7 @@ __verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps, } } - if (caps->matches) { + if (i < ARM64_NCAPS) { pr_crit("CPU%d: Detected conflict for capability %d (%s), System: %d, CPU: %d\n", smp_processor_id(), caps->capability, caps->desc, system_has_cap, cpu_has_cap); @@ -1647,12 +1645,6 @@ __verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps, return true; } -static bool verify_local_cpu_caps(u16 scope_mask) -{ - return __verify_local_cpu_caps(arm64_errata, scope_mask) && - __verify_local_cpu_caps(arm64_features, scope_mask); -} - /* * Check for CPU features that are used in early boot * based on the Boot CPU value. -- 2.7.4
[PATCH v2 1/7] arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE
We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability : 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012] 2) ARM Errata 819472 on A53 r0p[01] Both have the same work around. Merge these entries to avoid duplicate entries for a single capability. Add a new Kconfig entry to control the "capability" entry to make it easier to handle combinations of the CONFIGs. Cc: Will Deacon Cc: Andre Przywara Cc: Mark Rutland Signed-off-by: Suzuki K Poulose --- arch/arm64/Kconfig | 7 +++ arch/arm64/include/asm/cputype.h | 1 + arch/arm64/kernel/cpu_errata.c | 28 3 files changed, 24 insertions(+), 12 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 787d785..ad69e15 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -313,9 +313,13 @@ menu "Kernel Features" menu "ARM errata workarounds via the alternatives framework" +config ARM64_WORKAROUND_CLEAN_CACHE + def_bool n + config ARM64_ERRATUM_826319 bool "Cortex-A53: 826319: System might deadlock if a write cannot complete until read data is accepted" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 826319 on Cortex-A53 parts up to r0p2 with an AMBA 4 ACE or @@ -337,6 +341,7 @@ config ARM64_ERRATUM_826319 config ARM64_ERRATUM_827319 bool "Cortex-A53: 827319: Data cache clean instructions might cause overlapping transactions to the interconnect" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 827319 on Cortex-A53 parts up to r0p2 with an AMBA 5 CHI @@ -358,6 +363,7 @@ config ARM64_ERRATUM_827319 config ARM64_ERRATUM_824069 bool "Cortex-A53: 824069: Cache line might not be marked as clean after a CleanShared snoop" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 824069 on Cortex-A53 parts up to r0p2 when it is connected @@ -380,6 +386,7 @@ config 
ARM64_ERRATUM_824069 config ARM64_ERRATUM_819472 bool "Cortex-A53: 819472: Store exclusive instructions might cause data corruption" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 819472 on Cortex-A53 parts up to r0p1 with an L2 cache diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 12f93e4d..2e26375 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -151,6 +151,7 @@ struct midr_range { .rv_max = MIDR_CPU_VAR_REV(v_max, r_max), \ } +#define MIDR_REV_RANGE(m, v, r_min, r_max) MIDR_RANGE(m, v, r_min, v, r_max) #define MIDR_ALL_VERSIONS(m) MIDR_RANGE(m, 0, 0, 0xf, 0xf) static inline bool is_midr_in_range(u32 midr, struct midr_range const *range) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index a509e351..be1a8bc 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -570,24 +570,28 @@ static const struct midr_range arm64_harden_el2_vectors[] = { #endif -const struct arm64_cpu_capabilities arm64_errata[] = { +#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE +static const struct midr_range workaround_clean_cache[] = { #ifdefined(CONFIG_ARM64_ERRATUM_826319) || \ defined(CONFIG_ARM64_ERRATUM_827319) || \ defined(CONFIG_ARM64_ERRATUM_824069) - { - /* Cortex-A53 r0p[012] */ - .desc = "ARM errata 826319, 827319, 824069", - .capability = ARM64_WORKAROUND_CLEAN_CACHE, - ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2), - .cpu_enable = cpu_enable_cache_maint_trap, - }, + /* Cortex-A53 r0p[012]: ARM errata 826319, 827319, 824069 */ + MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2), +#endif +#ifdef CONFIG_ARM64_ERRATUM_819472 + /* Cortex-A53 r0p[01] : ARM errata 819472 */ + MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 1), +#endif + {}, +}; #endif -#ifdef CONFIG_ARM64_ERRATUM_819472 + +const struct arm64_cpu_capabilities arm64_errata[] = { +#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE { - 
/* Cortex-A53 r0p[01] */ - .desc = "ARM errata 819472", + .desc = "ARM errata 826319, 827319, 824069, 819472", .capability = ARM64_WORKAROUND_CLEAN_CACHE, - ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 1), + ERRATA_MIDR_RANGE_LIST(workaround_clean_cache), .cpu_enable = cpu_enable_cache_maint_trap, }, #endif -- 2.7.4
[PATCH v2 4/7] arm64: capabilities: Speed up capability lookup
We maintain two separate tables of capabilities, errata and features, which decide the system capabilities. We iterate over each of these tables for various operations (e.g., detection, verification, etc.). We do not have a way to map a system "capability" to its entry (i.e., cap -> struct arm64_cpu_capabilities), which is needed for this_cpu_has_cap(). So we iterate over the table one by one to find the entry and then do the operation. Also, this prevents us from optimizing the way we "enable" the capabilities on the CPUs, where we now issue a stop_machine() for each available capability. One solution is to merge the two tables into a single table, sorted by the capability. But this has the following disadvantages:
 - We lose the "classification" of an erratum vs. a feature.
 - It is quite easy to make a mistake when adding an entry, unless we sort the table at runtime.
So we maintain a separate array of pointers to the capability entries, indexed by the "cap number", initialized at boot time. The only restriction is that we can have only one "entry" per capability. While at it, remove the duplicate declaration of the arm64_errata table.
Reviewed-by: Vladimiri Murzin Tested-by: Vladimiri Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index aec5ecb..6a2563b 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -52,6 +52,7 @@ unsigned int compat_elf_hwcap2 __read_mostly; DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS); EXPORT_SYMBOL(cpu_hwcaps); +static struct arm64_cpu_capabilities const __ro_after_init *cpu_hwcaps_ptrs[ARM64_NCAPS]; /* * Flag to indicate if we have computed the system wide @@ -518,6 +519,29 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) } extern const struct arm64_cpu_capabilities arm64_errata[]; +static const struct arm64_cpu_capabilities arm64_features[]; + +static void __init +init_cpu_hwcaps_indirect_list_from_array(const struct arm64_cpu_capabilities *caps) +{ + for (; caps->matches; caps++) { + if (WARN(caps->capability > ARM64_NCAPS, + "Invalid capability %d\n", caps->capability)) + continue; + if (WARN(cpu_hwcaps_ptrs[caps->capability], + "Duplicate entry for capability %d\n", + caps->capability)) + continue; + cpu_hwcaps_ptrs[caps->capability] = caps; + } +} + +static void __init init_cpu_hwcaps_indirect_list(void) +{ + init_cpu_hwcaps_indirect_list_from_array(arm64_features); + init_cpu_hwcaps_indirect_list_from_array(arm64_errata); +} + static void __init setup_boot_cpu_capabilities(void); void __init init_cpu_features(struct cpuinfo_arm64 *info) @@ -564,6 +588,12 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) } /* +* Initialize the indirect array of CPU hwcaps capabilities pointers +* before we handle the boot CPU below. +*/ + init_cpu_hwcaps_indirect_list(); + + /* * Detect and enable early CPU capabilities based on the boot CPU, * after we have initialised the CPU feature infrastructure. 
*/ @@ -1750,8 +1780,6 @@ static void __init mark_const_caps_ready(void) static_branch_enable(_const_caps_ready); } -extern const struct arm64_cpu_capabilities arm64_errata[]; - bool this_cpu_has_cap(unsigned int cap) { return (__this_cpu_has_cap(arm64_features, cap) || -- 2.7.4
[PATCH v2 7/7] arm64: capabilities: Batch cpu_enable callbacks
We use a stop_machine call for each available capability to enable it on all the CPUs available at boot time. Instead we could batch the cpu_enable callbacks to a single stop_machine() call to save us some time. Reviewed-by: Vladimiri Murzin Tested-by: Vladimiri Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/include/asm/cpufeature.h | 3 ++ arch/arm64/kernel/cpufeature.c | 70 +++-- 2 files changed, 47 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 7e2ec64..0a15e2c 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -357,6 +357,9 @@ extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS); extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS]; extern struct static_key_false arm64_const_caps_ready; +#define for_each_available_cap(cap)\ + for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS) + bool this_cpu_has_cap(unsigned int cap); static inline bool cpu_have_feature(unsigned int num) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 13d729c..b12b9933 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1531,11 +1531,27 @@ static void update_cpu_capabilities(u16 scope_mask) } } -static int __enable_cpu_capability(void *arg) +/* + * Enable all the available capabilities on this CPU. The capabilities + * with BOOT_CPU scope are handled separately and hence skipped here. 
+ */ +static int cpu_enable_non_boot_scope_capabilities(void *__unused) { - const struct arm64_cpu_capabilities *cap = arg; + int i; + u16 non_boot_scope = SCOPE_ALL & ~SCOPE_BOOT_CPU; - cap->cpu_enable(cap); + for_each_available_cap(i) { + const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[i]; + + if (WARN_ON(!cap)) + continue; + + if (!(cap->type & non_boot_scope)) + continue; + + if (cap->cpu_enable) + cap->cpu_enable(cap); + } return 0; } @@ -1543,21 +1559,29 @@ static int __enable_cpu_capability(void *arg) * Run through the enabled capabilities and enable() it on all active * CPUs */ -static void __init -__enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps, - u16 scope_mask) +static void __init enable_cpu_capabilities(u16 scope_mask) { + int i; + const struct arm64_cpu_capabilities *caps; + bool boot_scope; + scope_mask &= ARM64_CPUCAP_SCOPE_MASK; - for (; caps->matches; caps++) { - unsigned int num = caps->capability; + boot_scope = !!(scope_mask & SCOPE_BOOT_CPU); - if (!(caps->type & scope_mask) || !cpus_have_cap(num)) + for (i = 0; i < ARM64_NCAPS; i++) { + unsigned int num; + + caps = cpu_hwcaps_ptrs[i]; + if (!caps || !(caps->type & scope_mask)) + continue; + num = caps->capability; + if (!cpus_have_cap(num)) continue; /* Ensure cpus_have_const_cap(num) works */ static_branch_enable(_hwcap_keys[num]); - if (caps->cpu_enable) { + if (boot_scope && caps->cpu_enable) /* * Capabilities with SCOPE_BOOT_CPU scope are finalised * before any secondary CPU boots. Thus, each secondary @@ -1566,25 +1590,19 @@ __enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps, * the boot CPU, for which the capability must be * enabled here. This approach avoids costly * stop_machine() calls for this case. -* -* Otherwise, use stop_machine() as it schedules the -* work allowing us to modify PSTATE, instead of -* on_each_cpu() which uses an IPI, giving us a PSTATE -* that disappears when we return. 
*/ - if (scope_mask & SCOPE_BOOT_CPU) - caps->cpu_enable(caps); - else - stop_machine(__enable_cpu_capability, -(void *)caps, cpu_online_mask); - } + caps->cpu_enable(caps); } -} -static void __init enable_cpu_capabilities(u16 scope_mask) -{ - __enable_cpu_capabilities(arm64_errata, scope_mask); - __enable_cpu_capabilities(arm64_features, scope_mask); + /* +* For all non-boot scope capabilities, use stop_machine() +* as it schedules the work allowing us to modify PSTATE, +* instead of on_each_cpu()
[PATCH v2 3/7] arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003
Remove duplicate entries for Qualcomm erratum 1003. Since the entries are not purely based on generic MIDR checks, use the multi_cap_entry type to merge the entries. Cc: Christopher Covington Cc: Will Deacon Reviewed-by: Vladimir Murzin Tested-by: Vladimir Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/kernel/cpu_errata.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index bff8fa8..0ef4bb9 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -592,6 +592,19 @@ static const struct midr_range cavium_erratum_30115_cpus[] = { }; #endif +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003 +static const struct arm64_cpu_capabilities qcom_erratum_1003_list[] = { + { + ERRATA_MIDR_REV(MIDR_QCOM_FALKOR_V1, 0, 0), + }, + { + .midr_range.model = MIDR_QCOM_KRYO, + .matches = is_kryo_midr, + }, + {}, +}; +#endif + #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE static const struct midr_range workaround_clean_cache[] = { #if defined(CONFIG_ARM64_ERRATUM_826319) || \ @@ -685,16 +698,10 @@ const struct arm64_cpu_capabilities arm64_errata[] = { }, #ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003 { - .desc = "Qualcomm Technologies Falkor erratum 1003", + .desc = "Qualcomm Technologies Falkor/Kryo erratum 1003", .capability = ARM64_WORKAROUND_QCOM_FALKOR_E1003, - ERRATA_MIDR_REV(MIDR_QCOM_FALKOR_V1, 0, 0), - }, - { - .desc = "Qualcomm Technologies Kryo erratum 1003", - .capability = ARM64_WORKAROUND_QCOM_FALKOR_E1003, - .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, - .midr_range.model = MIDR_QCOM_KRYO, - .matches = is_kryo_midr, + .matches = multi_entry_cap_matches, + .match_list = qcom_erratum_1003_list, }, #endif #ifdef CONFIG_QCOM_FALKOR_ERRATUM_1009 -- 2.7.4
[PATCH v2 6/7] arm64: capabilities: Use linear array for detection and verification
Use the sorted list of capability entries for the detection and verification. Reviewed-by: Vladimir Murzin Tested-by: Vladimir Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 42 +- 1 file changed, 17 insertions(+), 25 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 618857f..13d729c 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1512,28 +1512,25 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps) cap_set_elf_hwcap(hwcaps); } -static void __update_cpu_capabilities(const struct arm64_cpu_capabilities *caps, - u16 scope_mask, const char *info) +static void update_cpu_capabilities(u16 scope_mask) { + int i; + const struct arm64_cpu_capabilities *caps; + scope_mask &= ARM64_CPUCAP_SCOPE_MASK; - for (; caps->matches; caps++) { - if (!(caps->type & scope_mask) || + for (i = 0; i < ARM64_NCAPS; i++) { + caps = cpu_hwcaps_ptrs[i]; + if (!caps || !(caps->type & scope_mask) || + cpus_have_cap(caps->capability) || !caps->matches(caps, cpucap_default_scope(caps))) continue; - if (!cpus_have_cap(caps->capability) && caps->desc) - pr_info("%s %s\n", info, caps->desc); + if (caps->desc) + pr_info("detected: %s\n", caps->desc); cpus_set_cap(caps->capability); } } -static void update_cpu_capabilities(u16 scope_mask) -{ - __update_cpu_capabilities(arm64_errata, scope_mask, - "enabling workaround for"); - __update_cpu_capabilities(arm64_features, scope_mask, "detected:"); -} - static int __enable_cpu_capability(void *arg) { const struct arm64_cpu_capabilities *cap = arg; @@ -1597,16 +1594,17 @@ static void __init enable_cpu_capabilities(u16 scope_mask) * * Returns "false" on conflicts.
*/ -static bool -__verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps, - u16 scope_mask) +static bool verify_local_cpu_caps(u16 scope_mask) { + int i; bool cpu_has_cap, system_has_cap; + const struct arm64_cpu_capabilities *caps; scope_mask &= ARM64_CPUCAP_SCOPE_MASK; - for (; caps->matches; caps++) { - if (!(caps->type & scope_mask)) + for (i = 0; i < ARM64_NCAPS; i++) { + caps = cpu_hwcaps_ptrs[i]; + if (!caps || !(caps->type & scope_mask)) continue; cpu_has_cap = caps->matches(caps, SCOPE_LOCAL_CPU); @@ -1637,7 +1635,7 @@ __verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps, } } - if (caps->matches) { + if (i < ARM64_NCAPS) { pr_crit("CPU%d: Detected conflict for capability %d (%s), System: %d, CPU: %d\n", smp_processor_id(), caps->capability, caps->desc, system_has_cap, cpu_has_cap); @@ -1647,12 +1645,6 @@ __verify_local_cpu_caps(const struct arm64_cpu_capabilities *caps, return true; } -static bool verify_local_cpu_caps(u16 scope_mask) -{ - return __verify_local_cpu_caps(arm64_errata, scope_mask) && - __verify_local_cpu_caps(arm64_features, scope_mask); -} - /* * Check for CPU features that are used in early boot * based on the Boot CPU value. -- 2.7.4
[PATCH v2 1/7] arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE
We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability : 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012] 2) ARM Errata 819472 on A53 r0p[01] Both have the same workaround. Merge these entries to avoid duplicate entries for a single capability. Add a new Kconfig entry to control the "capability" entry to make it easier to handle combinations of the CONFIGs. Cc: Will Deacon Cc: Andre Przywara Cc: Mark Rutland Signed-off-by: Suzuki K Poulose --- arch/arm64/Kconfig | 7 +++ arch/arm64/include/asm/cputype.h | 1 + arch/arm64/kernel/cpu_errata.c | 28 3 files changed, 24 insertions(+), 12 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 787d785..ad69e15 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -313,9 +313,13 @@ menu "Kernel Features" menu "ARM errata workarounds via the alternatives framework" +config ARM64_WORKAROUND_CLEAN_CACHE + def_bool n + config ARM64_ERRATUM_826319 bool "Cortex-A53: 826319: System might deadlock if a write cannot complete until read data is accepted" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 826319 on Cortex-A53 parts up to r0p2 with an AMBA 4 ACE or @@ -337,6 +341,7 @@ config ARM64_ERRATUM_826319 config ARM64_ERRATUM_827319 bool "Cortex-A53: 827319: Data cache clean instructions might cause overlapping transactions to the interconnect" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 827319 on Cortex-A53 parts up to r0p2 with an AMBA 5 CHI @@ -358,6 +363,7 @@ config ARM64_ERRATUM_824069 bool "Cortex-A53: 824069: Cache line might not be marked as clean after a CleanShared snoop" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 824069 on Cortex-A53 parts up to r0p2 when it is connected @@ -380,6 +386,7 @@ config
ARM64_ERRATUM_824069 config ARM64_ERRATUM_819472 bool "Cortex-A53: 819472: Store exclusive instructions might cause data corruption" default y + select ARM64_WORKAROUND_CLEAN_CACHE help This option adds an alternative code sequence to work around ARM erratum 819472 on Cortex-A53 parts up to r0p1 with an L2 cache diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 12f93e4d..2e26375 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -151,6 +151,7 @@ struct midr_range { .rv_max = MIDR_CPU_VAR_REV(v_max, r_max), \ } +#define MIDR_REV_RANGE(m, v, r_min, r_max) MIDR_RANGE(m, v, r_min, v, r_max) #define MIDR_ALL_VERSIONS(m) MIDR_RANGE(m, 0, 0, 0xf, 0xf) static inline bool is_midr_in_range(u32 midr, struct midr_range const *range) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index a509e351..be1a8bc 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -570,24 +570,28 @@ static const struct midr_range arm64_harden_el2_vectors[] = { #endif -const struct arm64_cpu_capabilities arm64_errata[] = { +#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE +static const struct midr_range workaround_clean_cache[] = { #if defined(CONFIG_ARM64_ERRATUM_826319) || \ defined(CONFIG_ARM64_ERRATUM_827319) || \ defined(CONFIG_ARM64_ERRATUM_824069) - { - /* Cortex-A53 r0p[012] */ - .desc = "ARM errata 826319, 827319, 824069", - .capability = ARM64_WORKAROUND_CLEAN_CACHE, - ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2), - .cpu_enable = cpu_enable_cache_maint_trap, - }, + /* Cortex-A53 r0p[012]: ARM errata 826319, 827319, 824069 */ + MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 2), +#endif +#ifdef CONFIG_ARM64_ERRATUM_819472 + /* Cortex-A53 r0p[01] : ARM errata 819472 */ + MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 1), +#endif + {}, +}; #endif -#ifdef CONFIG_ARM64_ERRATUM_819472 + +const struct arm64_cpu_capabilities arm64_errata[] = { +#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE { -
/* Cortex-A53 r0p[01] */ - .desc = "ARM errata 819472", + .desc = "ARM errata 826319, 827319, 824069, 819472", .capability = ARM64_WORKAROUND_CLEAN_CACHE, - ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 1), + ERRATA_MIDR_RANGE_LIST(workaround_clean_cache), .cpu_enable = cpu_enable_cache_maint_trap, }, #endif -- 2.7.4
[PATCH v2 4/7] arm64: capabilities: Speed up capability lookup
We maintain two separate tables of capabilities, errata and features, which decide the system capabilities. We iterate over each of these tables for various operations (e.g, detection, verification etc.). We do not have a way to map a system "capability" to its entry, (i.e, cap -> struct arm64_cpu_capabilities) which is needed for this_cpu_has_cap(). So we iterate over the table one by one to find the entry and then do the operation. Also, this prevents us from optimizing the way we "enable" the capabilities on the CPUs, where we now issue a stop_machine() for each available capability. One solution is to merge the two tables into a single table, sorted by the capability. But this has the following disadvantages: - We lose the "classification" of an errata vs. feature - It is quite easy to make a mistake when adding an entry, unless we sort the table at runtime. So we maintain a list of pointers to the capability entry, sorted by the "cap number" in a separate array, initialized at boot time. The only restriction is that we can have only one "entry" per capability. While at it, remove the duplicate declaration of arm64_errata table.
Reviewed-by: Vladimir Murzin Tested-by: Vladimir Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index aec5ecb..6a2563b 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -52,6 +52,7 @@ unsigned int compat_elf_hwcap2 __read_mostly; DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS); EXPORT_SYMBOL(cpu_hwcaps); +static struct arm64_cpu_capabilities const __ro_after_init *cpu_hwcaps_ptrs[ARM64_NCAPS]; /* * Flag to indicate if we have computed the system wide @@ -518,6 +519,29 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) } extern const struct arm64_cpu_capabilities arm64_errata[]; +static const struct arm64_cpu_capabilities arm64_features[]; + +static void __init +init_cpu_hwcaps_indirect_list_from_array(const struct arm64_cpu_capabilities *caps) +{ + for (; caps->matches; caps++) { + if (WARN(caps->capability >= ARM64_NCAPS, + "Invalid capability %d\n", caps->capability)) + continue; + if (WARN(cpu_hwcaps_ptrs[caps->capability], + "Duplicate entry for capability %d\n", + caps->capability)) + continue; + cpu_hwcaps_ptrs[caps->capability] = caps; + } +} + +static void __init init_cpu_hwcaps_indirect_list(void) +{ + init_cpu_hwcaps_indirect_list_from_array(arm64_features); + init_cpu_hwcaps_indirect_list_from_array(arm64_errata); +} + static void __init setup_boot_cpu_capabilities(void); void __init init_cpu_features(struct cpuinfo_arm64 *info) @@ -564,6 +588,12 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) } /* +* Initialize the indirect array of CPU hwcaps capabilities pointers +* before we handle the boot CPU below. +*/ + init_cpu_hwcaps_indirect_list(); + + /* * Detect and enable early CPU capabilities based on the boot CPU, * after we have initialised the CPU feature infrastructure.
*/ @@ -1750,8 +1780,6 @@ static void __init mark_const_caps_ready(void) static_branch_enable(&arm64_const_caps_ready); } -extern const struct arm64_cpu_capabilities arm64_errata[]; - bool this_cpu_has_cap(unsigned int cap) { return (__this_cpu_has_cap(arm64_features, cap) || -- 2.7.4
[PATCH v2 7/7] arm64: capabilities: Batch cpu_enable callbacks
We use a stop_machine call for each available capability to enable it on all the CPUs available at boot time. Instead we could batch the cpu_enable callbacks to a single stop_machine() call to save us some time. Reviewed-by: Vladimir Murzin Tested-by: Vladimir Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/include/asm/cpufeature.h | 3 ++ arch/arm64/kernel/cpufeature.c | 70 +++-- 2 files changed, 47 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 7e2ec64..0a15e2c 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -357,6 +357,9 @@ extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS); extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS]; extern struct static_key_false arm64_const_caps_ready; +#define for_each_available_cap(cap)\ + for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS) + bool this_cpu_has_cap(unsigned int cap); static inline bool cpu_have_feature(unsigned int num) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 13d729c..b12b9933 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1531,11 +1531,27 @@ static void update_cpu_capabilities(u16 scope_mask) } } -static int __enable_cpu_capability(void *arg) +/* + * Enable all the available capabilities on this CPU. The capabilities + * with BOOT_CPU scope are handled separately and hence skipped here.
+ */ +static int cpu_enable_non_boot_scope_capabilities(void *__unused) { - const struct arm64_cpu_capabilities *cap = arg; + int i; + u16 non_boot_scope = SCOPE_ALL & ~SCOPE_BOOT_CPU; - cap->cpu_enable(cap); + for_each_available_cap(i) { + const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[i]; + + if (WARN_ON(!cap)) + continue; + + if (!(cap->type & non_boot_scope)) + continue; + + if (cap->cpu_enable) + cap->cpu_enable(cap); + } return 0; } @@ -1543,21 +1559,29 @@ static int __enable_cpu_capability(void *arg) * Run through the enabled capabilities and enable() it on all active * CPUs */ -static void __init -__enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps, - u16 scope_mask) +static void __init enable_cpu_capabilities(u16 scope_mask) { + int i; + const struct arm64_cpu_capabilities *caps; + bool boot_scope; + scope_mask &= ARM64_CPUCAP_SCOPE_MASK; - for (; caps->matches; caps++) { - unsigned int num = caps->capability; + boot_scope = !!(scope_mask & SCOPE_BOOT_CPU); - if (!(caps->type & scope_mask) || !cpus_have_cap(num)) + for (i = 0; i < ARM64_NCAPS; i++) { + unsigned int num; + + caps = cpu_hwcaps_ptrs[i]; + if (!caps || !(caps->type & scope_mask)) + continue; + num = caps->capability; + if (!cpus_have_cap(num)) continue; /* Ensure cpus_have_const_cap(num) works */ static_branch_enable(&cpu_hwcap_keys[num]); - if (caps->cpu_enable) { + if (boot_scope && caps->cpu_enable) /* * Capabilities with SCOPE_BOOT_CPU scope are finalised * before any secondary CPU boots. Thus, each secondary @@ -1566,25 +1590,19 @@ __enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps, * the boot CPU, for which the capability must be * enabled here. This approach avoids costly * stop_machine() calls for this case. -* -* Otherwise, use stop_machine() as it schedules the -* work allowing us to modify PSTATE, instead of -* on_each_cpu() which uses an IPI, giving us a PSTATE -* that disappears when we return.
*/ - if (scope_mask & SCOPE_BOOT_CPU) - caps->cpu_enable(caps); - else - stop_machine(__enable_cpu_capability, -(void *)caps, cpu_online_mask); - } + caps->cpu_enable(caps); } -} -static void __init enable_cpu_capabilities(u16 scope_mask) -{ - __enable_cpu_capabilities(arm64_errata, scope_mask); - __enable_cpu_capabilities(arm64_features, scope_mask); + /* +* For all non-boot scope capabilities, use stop_machine() +* as it schedules the work allowing us to modify PSTATE, +* instead of on_each_cpu() which uses an IPI, giving us a +* PSTATE that disappears when we return. +*/ + if (!boot_scope) + stop_machine(cpu_enable_non_boot_scope_capabilities, + NULL, cpu_online_mask); } -- 2.7.4
[PATCH v2 2/7] arm64: capabilities: Merge duplicate Cavium erratum entries
Merge duplicate entries for a single capability using the midr range list for Cavium errata 30115 and 27456. Cc: Andrew Pinski Cc: David Daney Cc: Will Deacon Cc: Catalin Marinas Reviewed-by: Vladimir Murzin Tested-by: Vladimir Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/include/asm/cputype.h | 1 + arch/arm64/kernel/cpu_errata.c | 50 +++- 2 files changed, 25 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 2e26375..951ed1a 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -152,6 +152,7 @@ struct midr_range { } #define MIDR_REV_RANGE(m, v, r_min, r_max) MIDR_RANGE(m, v, r_min, v, r_max) +#define MIDR_REV(m, v, r) MIDR_RANGE(m, v, r, v, r) #define MIDR_ALL_VERSIONS(m) MIDR_RANGE(m, 0, 0, 0xf, 0xf) static inline bool is_midr_in_range(u32 midr, struct midr_range const *range) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index be1a8bc..bff8fa8 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -570,6 +570,28 @@ static const struct midr_range arm64_harden_el2_vectors[] = { #endif +#ifdef CONFIG_CAVIUM_ERRATUM_27456 +static const struct midr_range cavium_erratum_27456_cpus[] = { + /* Cavium ThunderX, T88 pass 1.x - 2.1 */ + MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1), + /* Cavium ThunderX, T81 pass 1.0 */ + MIDR_REV(MIDR_THUNDERX_81XX, 0, 0), + {}, +}; +#endif + +#ifdef CONFIG_CAVIUM_ERRATUM_30115 +static const struct midr_range cavium_erratum_30115_cpus[] = { + /* Cavium ThunderX, T88 pass 1.x - 2.2 */ + MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 2), + /* Cavium ThunderX, T81 pass 1.0 - 1.2 */ + MIDR_REV_RANGE(MIDR_THUNDERX_81XX, 0, 0, 2), + /* Cavium ThunderX, T83 pass 1.0 */ + MIDR_REV(MIDR_THUNDERX_83XX, 0, 0), + {}, +}; +#endif + #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE static const struct midr_range workaround_clean_cache[] = { #if defined(CONFIG_ARM64_ERRATUM_826319) || \ @@ -642,40 +664,16 @@
const struct arm64_cpu_capabilities arm64_errata[] = { #endif #ifdef CONFIG_CAVIUM_ERRATUM_27456 { - /* Cavium ThunderX, T88 pass 1.x - 2.1 */ .desc = "Cavium erratum 27456", .capability = ARM64_WORKAROUND_CAVIUM_27456, - ERRATA_MIDR_RANGE(MIDR_THUNDERX, - 0, 0, - 1, 1), - }, - { - /* Cavium ThunderX, T81 pass 1.0 */ - .desc = "Cavium erratum 27456", - .capability = ARM64_WORKAROUND_CAVIUM_27456, - ERRATA_MIDR_REV(MIDR_THUNDERX_81XX, 0, 0), + ERRATA_MIDR_RANGE_LIST(cavium_erratum_27456_cpus), }, #endif #ifdef CONFIG_CAVIUM_ERRATUM_30115 { - /* Cavium ThunderX, T88 pass 1.x - 2.2 */ .desc = "Cavium erratum 30115", .capability = ARM64_WORKAROUND_CAVIUM_30115, - ERRATA_MIDR_RANGE(MIDR_THUNDERX, - 0, 0, - 1, 2), - }, - { - /* Cavium ThunderX, T81 pass 1.0 - 1.2 */ - .desc = "Cavium erratum 30115", - .capability = ARM64_WORKAROUND_CAVIUM_30115, - ERRATA_MIDR_REV_RANGE(MIDR_THUNDERX_81XX, 0, 0, 2), - }, - { - /* Cavium ThunderX, T83 pass 1.0 */ - .desc = "Cavium erratum 30115", - .capability = ARM64_WORKAROUND_CAVIUM_30115, - ERRATA_MIDR_REV(MIDR_THUNDERX_83XX, 0, 0), + ERRATA_MIDR_RANGE_LIST(cavium_erratum_30115_cpus), }, #endif { -- 2.7.4
[PATCH v2 5/7] arm64: capabilities: Optimize this_cpu_has_cap
Make use of the sorted capability list to access the capability entry in this_cpu_has_cap() to avoid iterating over the two tables. Reviewed-by: Vladimir Murzin Tested-by: Vladimir Murzin Signed-off-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 31 +-- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 6a2563b..618857f 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1512,25 +1512,6 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps) cap_set_elf_hwcap(hwcaps); } -/* - * Check if the current CPU has a given feature capability. - * Should be called from non-preemptible context. - */ -static bool __this_cpu_has_cap(const struct arm64_cpu_capabilities *cap_array, - unsigned int cap) -{ - const struct arm64_cpu_capabilities *caps; - - if (WARN_ON(preemptible())) - return false; - - for (caps = cap_array; caps->matches; caps++) - if (caps->capability == cap) - return caps->matches(caps, SCOPE_LOCAL_CPU); - - return false; -} - static void __update_cpu_capabilities(const struct arm64_cpu_capabilities *caps, u16 scope_mask, const char *info) { @@ -1780,10 +1761,16 @@ static void __init mark_const_caps_ready(void) static_branch_enable(&arm64_const_caps_ready); } -bool this_cpu_has_cap(unsigned int cap) +bool this_cpu_has_cap(unsigned int n) { - return (__this_cpu_has_cap(arm64_features, cap) || - __this_cpu_has_cap(arm64_errata, cap)); + if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) { + const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[n];  + + if (cap) + return cap->matches(cap, SCOPE_LOCAL_CPU); + } + + return false; } static void __init setup_system_capabilities(void) -- 2.7.4
[PATCH v2 0/7] arm64: capabilities: Optimize checking and enabling
We maintain two separate tables (i.e, arm64_features and arm64_errata) of struct arm64_cpu_capabilities which decide the capabilities of the system. We iterate over the two tables for detecting/verifying/enabling the capabilities. e.g, this_cpu_has_cap() needs to iterate over the two tables one by one to find the "capability" structure corresponding to the cap number and then check it on the CPU. Also, we enable all the non-boot scoped capabilities after all the SMP cpus are brought up by the kernel, using stop_machine() for each available capability. We could batch all the "enabling" activity to a single stop_machine() callback. But that implies you need a way to map a given capability number to the corresponding capability entry to finish the operation quickly. So we need a quicker way to access the entry for a given capability. We have two choices : 1) Unify both the tables to a static/dynamic sorted entry based on the capability number. This has the following drawbacks : a) The entries must be unique. i.e, no duplicate entries for a capability. b) Loses the separation of "features" vs. "errata" classification c) Statically sorting the list is error prone. Runtime sorting the array means more time for booting. 2) Keep an array of pointers to the capability sorted at boot time based on the capability. a) As for (1), the entries must be unique for a capability. This series implements (2) and uses the new list for optimizing the operations on the entries. As a preparatory step, we remove the duplicate entries for the same capabilities (Patch 1-3). Changes since v1: - Fix merging workarounds for clean cache errata. Added a Kconfig symbol for covering the capability to make the handling better.
- Make the WARNINGs meaningful on duplicate entries - Added tags from Vladimir Suzuki K Poulose (7): arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE arm64: capabilities: Merge duplicate Cavium erratum entries arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003 arm64: capabilities: Speed up capability lookup arm64: capabilities: Optimize this_cpu_has_cap arm64: capabilities: Use linear array for detection and verification arm64: capabilities: Batch cpu_enable callbacks arch/arm64/Kconfig | 7 ++ arch/arm64/include/asm/cpufeature.h | 3 + arch/arm64/include/asm/cputype.h| 2 + arch/arm64/kernel/cpu_errata.c | 103 -- arch/arm64/kernel/cpufeature.c | 169 +--- 5 files changed, 165 insertions(+), 119 deletions(-) -- 2.7.4
[PATCH 1/2] bus: imx-weim: support multiple address ranges per child node
From: Sven Van Asbroeck Ensure that timing values for the child node are applied to all chip selects in the child's address ranges. Note that this does not support multiple timing settings per child; this can be added in the future if required. Example: { acme@0 { compatible = "acme,whatever"; reg = <0 0 0x100>, <0 0x40 0x800>, <1 0x40 0x800>; fsl,weim-cs-timing = <0x024400b1 0x1010 0x20081100 0x 0xa240 0x>; }; }; Signed-off-by: Sven Van Asbroeck --- drivers/bus/imx-weim.c | 36 +--- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/bus/imx-weim.c b/drivers/bus/imx-weim.c index f01308172de9..24f22285395d 100644 --- a/drivers/bus/imx-weim.c +++ b/drivers/bus/imx-weim.c @@ -46,6 +46,7 @@ static const struct imx_weim_devtype imx51_weim_devtype = { }; #define MAX_CS_REGS_COUNT 6 +#define OF_REG_SIZE 3 static const struct of_device_id weim_id_table[] = { /* i.MX1/21 */ @@ -115,27 +116,40 @@ static int __init weim_timing_setup(struct device_node *np, void __iomem *base, const struct imx_weim_devtype *devtype) { u32 cs_idx, value[MAX_CS_REGS_COUNT]; - int i, ret; + int i, ret, reg_idx, num_regs; if (WARN_ON(devtype->cs_regs_count > MAX_CS_REGS_COUNT)) return -EINVAL; - /* get the CS index from this child node's "reg" property. */ - ret = of_property_read_u32(np, "reg", &cs_idx); + ret = of_property_read_u32_array(np, "fsl,weim-cs-timing", +value, devtype->cs_regs_count); if (ret) return ret; - if (cs_idx >= devtype->cs_count) + /* +* the child node's "reg" property may contain multiple address ranges, +* extract the chip select for each. +*/ + num_regs = of_property_count_elems_of_size(np, "reg", OF_REG_SIZE); + if (num_regs < 0) + return num_regs; + if (!num_regs) return -EINVAL; + for (reg_idx = 0; reg_idx < num_regs; reg_idx++) { + /* get the CS index from this child node's "reg" property.
*/ + ret = of_property_read_u32_index(np, "reg", + reg_idx*OF_REG_SIZE, _idx); + if (ret) + break; - ret = of_property_read_u32_array(np, "fsl,weim-cs-timing", -value, devtype->cs_regs_count); - if (ret) - return ret; + if (cs_idx >= devtype->cs_count) + return -EINVAL; - /* set the timing for WEIM */ - for (i = 0; i < devtype->cs_regs_count; i++) - writel(value[i], base + cs_idx * devtype->cs_stride + i * 4); + /* set the timing for WEIM */ + for (i = 0; i < devtype->cs_regs_count; i++) + writel(value[i], + base + cs_idx * devtype->cs_stride + i * 4); + } return 0; } -- 2.17.1
[PATCH 0/2] bus: imx-weim
From: Sven Van Asbroeck

Support multiple address ranges per child node on imx-weim. While
we're at it, insert some code which guards against common config
conflicts.

Sven Van Asbroeck (2):
  bus: imx-weim: support multiple address ranges per child node
  bus: imx-weim: guard against timing configuration conflicts

 drivers/bus/imx-weim.c | 69 +-
 1 file changed, 55 insertions(+), 14 deletions(-)

-- 
2.17.1
[PATCH 2/2] bus: imx-weim: guard against timing configuration conflicts
From: Sven Van Asbroeck

When adding weim child devices, there is a risk that the developer
will (by mistake) specify more than one timing setting for the same
chip select. The driver cannot support such a configuration.

In case of conflict, this patch will print a warning to the log, and
will ignore the child node in question.

In this example, node acme@1 will be ignored, as it tries to modify
timing settings for CS0:

&weim {
	acme@0 {
		compatible = "acme,whatever";
		reg = <0 0 0x100>;
		fsl,weim-cs-timing = ;
	};
	acme@1 {
		compatible = "acme,whatnot";
		reg = <0 0x500 0x100>;
		fsl,weim-cs-timing = ;
	};
};

However in this example, the driver will be happy:

&weim {
	acme@0 {
		compatible = "acme,whatever";
		reg = <0 0 0x100>;
		fsl,weim-cs-timing = ;
	};
	acme@1 {
		compatible = "acme,whatnot";
		reg = <0 0x500 0x100>;
		fsl,weim-cs-timing = ;
	};
};

Signed-off-by: Sven Van Asbroeck
---
 drivers/bus/imx-weim.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/bus/imx-weim.c b/drivers/bus/imx-weim.c
index 24f22285395d..114e503ec785 100644
--- a/drivers/bus/imx-weim.c
+++ b/drivers/bus/imx-weim.c
@@ -46,8 +46,18 @@ static const struct imx_weim_devtype imx51_weim_devtype = {
 };
 
 #define MAX_CS_REGS_COUNT	6
+#define MAX_CS_REGS		6
 #define OF_REG_SIZE		3
 
+struct cs_timing {
+	bool is_applied;
+	u32 regs[MAX_CS_REGS_COUNT];
+};
+
+struct cs_timing_state {
+	struct cs_timing cs[MAX_CS_REGS];
+};
+
 static const struct of_device_id weim_id_table[] = {
 	/* i.MX1/21 */
 	{ .compatible = "fsl,imx1-weim", .data = &imx1_weim_devtype, },
@@ -112,11 +122,14 @@ static int __init imx_weim_gpr_setup(struct platform_device *pdev)
 }
 
 /* Parse and set the timing for this device. */
-static int __init weim_timing_setup(struct device_node *np, void __iomem *base,
-				    const struct imx_weim_devtype *devtype)
+static int __init weim_timing_setup(struct device *dev,
+				    struct device_node *np, void __iomem *base,
+				    const struct imx_weim_devtype *devtype,
+				    struct cs_timing_state *ts)
 {
 	u32 cs_idx, value[MAX_CS_REGS_COUNT];
 	int i, ret, reg_idx, num_regs;
+	struct cs_timing *cst;
 
 	if (WARN_ON(devtype->cs_regs_count > MAX_CS_REGS_COUNT))
 		return -EINVAL;
@@ -145,10 +158,23 @@ static int __init weim_timing_setup(struct device_node *np, void __iomem *base,
 		if (cs_idx >= devtype->cs_count)
 			return -EINVAL;
 
+		/* prevent re-configuring a CS that's already been configured */
+		cst = &ts->cs[cs_idx];
+		if (cst->is_applied && memcmp(value, cst->regs,
+					devtype->cs_regs_count * sizeof(u32))) {
+			dev_err(dev, "fsl,weim-cs-timing conflict on %pOF", np);
+			return -EINVAL;
+		}
+
 		/* set the timing for WEIM */
 		for (i = 0; i < devtype->cs_regs_count; i++)
 			writel(value[i],
 			       base + cs_idx * devtype->cs_stride + i * 4);
+
+		if (!cst->is_applied) {
+			cst->is_applied = true;
+			memcpy(cst->regs, value,
+				devtype->cs_regs_count * sizeof(u32));
+		}
 	}
 
 	return 0;
@@ -165,6 +191,7 @@ static int __init weim_parse_dt(struct platform_device *pdev,
 	const struct imx_weim_devtype *devtype = of_id->data;
 	struct device_node *child;
 	int ret, have_child = 0;
+	struct cs_timing_state ts = {};
 
 	if (devtype == &imx50_weim_devtype) {
 		ret = imx_weim_gpr_setup(pdev);
@@ -179,7 +206,7 @@ static int __init weim_parse_dt(struct platform_device *pdev,
 	}
 
 	for_each_available_child_of_node(pdev->dev.of_node, child) {
-		ret = weim_timing_setup(child, base, devtype);
+		ret = weim_timing_setup(&pdev->dev, child, base, devtype, &ts);
 		if (ret)
 			dev_warn(&pdev->dev, "%pOF set timing failed.\n",
 				 child);
-- 
2.17.1
[PATCH] ACPI, APEI, EINJ: Change to use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li
---
 drivers/acpi/apei/einj.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index b38737c83a24..fcccbfdbdd1a 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -607,17 +607,7 @@ static int available_error_type_show(struct seq_file *m, void *v)
 	return 0;
 }
 
-static int available_error_type_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, available_error_type_show, NULL);
-}
-
-static const struct file_operations available_error_type_fops = {
-	.open		= available_error_type_open,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= single_release,
-};
+DEFINE_SHOW_ATTRIBUTE(available_error_type);
 
 static int error_type_get(void *data, u64 *val)
 {
-- 
2.17.0
Re: RFC: script to convert vsprintf uses of %pf to %ps and %pF to %pS
On Sun 2018-11-25 01:13:51, Joe Perches wrote:
> commit 04b8eb7a4ccd ("symbol lookup: introduce
> dereference_symbol_descriptor()")
>
> deprecated vsprintf extension %pf and %pF.
>
> so a script to convert all the %pf uses to %ps and %pF uses to %pS
> could be useful.
>
> There are a few files that appear should not be converted.
>
> $ git grep -w --name-only -i '%pf' | \
>   grep -vP '^(?:Documentation|tools|include/linux/freezer.h)' | \
>   xargs sed -i -e 's/%pf/%ps/g' -e 's/%pF/%pS/g'
>
> If that script above is run, it leaves the following patch
> to be applied:
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 37a54a6dd594..393002bf5298 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -1872,8 +1870,6 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
>  	}
>
>  	switch (*fmt) {
> -	case 'F':
> -	case 'f':

Please, keep handling these modifiers so that they do not get reused
anytime soon.

The pointer modifiers are evil. Any misuse can easily lead to a crash.
Any mistakes with upstreaming 3rd party patches or backporting stable
fixes would be hard to notice.

Well, it is perfectly fine to just explicitly fall back to the default
%p behavior. I mean something like:

	case 'F':
	case 'f':
		/* Replaced by %ps and %pS and removed in 4.21 */
		break;

Best Regards,
Petr
Re: Regression: very quiet speakers on Thinkpad T570s
On 11/30/18 11:00 AM, Takashi Iwai wrote:
> On Fri, 30 Nov 2018 15:49:17 +0100,
> Jeremy Cline wrote:
>>
>> Hi,
>>
>> Some folks have reported on the Fedora bug tracker[0] that the laptop
>> speaker volume is very low on the Thinkpad T570 when running a kernel
>> that includes commit 61fcf8ece9b6 ("ALSA: hda/realtek - Enable Thinkpad
>> Dock device for ALC298 platform").
>>
>> alsa-info.sh from v4.15.4 (just before commit 61fcf8ece9b6 arrived in
>> stable) and v4.19.4 with the issue present are attached to the bugzilla.
>> I've also Cc'd Tim, who uploaded them and has the laptop in question.
>>
>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1554304
>
> Could you pinpoint which kernel version started showing the
> regression, at least? The diffs are fairly wide between 4.15 and
> 4.19.

Ah, sorry for not being more clear. The regression appears to be
introduced by commit 61fcf8ece9b6, which got backported to v4.15.5
because it addressed a bug with the dock[0]. v4.19.4 with that commit
reverted works, according to the bug reporter.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=195161

Regards,
Jeremy
Re: rcu_preempt caused oom
On Fri, Nov 30, 2018 at 03:18:58PM +0000, He, Bo wrote:
> Here is the kernel cmdline:

Thank you!

> Kernel command line: androidboot.acpio_idx=0
> androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_03-userdebug
> androidboot.diskbus=00.0 androidboot.verifiedbootstate=green
> androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb
> g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves reboot_panic=p,w
> i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti ftrace_dump_on_oops
> trace_buf_size=1024K intel_iommu=off gpt loglevel=4
> androidboot.hardware=gordon_peak firmware_class.path=/vendor/firmware
> relative_sleep_states=1 enforcing=0 androidboot.selinux=permissive
> cpu_init_udelay=10
> androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/properties/android/
> pstore.backend=ramoops memmap=0x140$0x5000
> ramoops.mem_address=0x5000 ramoops.mem_size=0x140
> ramoops.record_size=0x4000 ramoops.console_size=0x100
> ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current
> i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 drm.vblankoffdelay=

And no sign of any suppression of RCU CPU stall warnings. Hmmm...

It does take more than 21 seconds to OOM? Or do things happen faster
than that? If they do happen faster than that, then one approach would
be to add something like this to the kernel command line:

	rcupdate.rcu_cpu_stall_timeout=7

This would set the stall timeout to seven seconds. Note that timeouts
less than three seconds are silently interpreted as three seconds.

							Thanx, Paul

> -----Original Message-----
> From: Steven Rostedt
> Sent: Friday, November 30, 2018 11:17 PM
> To: Paul E. McKenney
> Cc: He, Bo; linux-kernel@vger.kernel.org; j...@joshtriplett.org;
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun;
> Xiao, Jin; Zhang, Yanmin
> Subject: Re: rcu_preempt caused oom
>
> On Fri, 30 Nov 2018 06:43:17 -0800
> "Paul E. McKenney" wrote:
>
> > Could you please send me your list of kernel boot parameters? They
> > usually appear near the start of your console output.
>
> Or just: cat /proc/cmdline
>
> -- Steve
Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64
On Thu, Nov 29, 2018 at 12:24 PM Josh Poimboeuf wrote:
>
> > Alternatively, we could actually emulate call instructions like this:
> >
> > void __noreturn jump_to_kernel_pt_regs(struct pt_regs *regs, ...)
> > {
> >	struct pt_regs ptregs_copy = *regs;
> >	barrier();
> >	*(unsigned long *)(regs->sp - 8) = whatever; /* may clobber old
> >	regs, but so what? */
> >	asm volatile ("jmp return_to_alternate_ptregs");
> > }
> >
> > where return_to_alternate_ptregs points rsp to the ptregs and goes
> > through the normal return path. It's ugly, but we could have a test
> > case for it, and it should work fine.
>
> Is that really any better than my patch to create a gap in the stack
> (modified for kernel space #BP only)?

I tend to prefer a nice local hack like mine over a hack that further
complicates the entry in general. This is not to say I'm thrilled by
my idea either.
Re: [RFC PATCH 3/6] char: fastrpc: Add support for context Invoke method
On 30/11/18 16:19, Arnd Bergmann wrote:
On Fri, Nov 30, 2018 at 5:03 PM Srinivas Kandagatla wrote:
On 30/11/18 15:08, Arnd Bergmann wrote:
On Fri, Nov 30, 2018 at 4:01 PM Srinivas Kandagatla wrote:

Thanks Arnd for the review comments!

On 30/11/18 13:41, Arnd Bergmann wrote:
On Fri, Nov 30, 2018 at 11:48 AM Srinivas Kandagatla wrote:

+static long fastrpc_device_ioctl(struct file *file, unsigned int cmd,
+				 unsigned long arg)
+{
+	struct fastrpc_user *fl = (struct fastrpc_user *)file->private_data;
+	struct fastrpc_channel_ctx *cctx = fl->cctx;
+	char __user *argp = (char __user *)arg;
+	int err;
+
+	if (!fl->sctx) {
+		fl->sctx = fastrpc_session_alloc(cctx, 0);
+		if (!fl->sctx)
+			return -ENOENT;
+	}

Shouldn't that session be allocated during open()?

Yes, and no. We do not need a context in all the cases, e.g. in cases where we just want to allocate a dmabuf.

Can you give an example what that would be good for?

Currently the instance which does not need a session is used as a simple memory allocator (rpcmem). TBH, this is the side effect of trying to fit in with the downstream application infrastructure, which uses ion for android usecases.

That does not sound like enough of a reason then; user space is easy to change here to just allocate the memory from the device itself. The only reason that I can see for needing a dmabuf would be if you have to share a buffer between two instances, and then you can use either of them.

I agree, I will try to rework this and remove the instances that do not require sessions! Sharing buffers is also a reason for dmabuf here.

+static void fastrpc_notify_users(struct fastrpc_user *user)
+{
+	struct fastrpc_invoke_ctx *ctx, *n;
+
+	spin_lock(&user->lock);
+	list_for_each_entry_safe(ctx, n, &user->pending, node)
+		complete(&ctx->work);
+	spin_unlock(&user->lock);
+}

Can you explain here what it means to have multiple 'users' of a 'fastrpc_user' structure? Why are they all done at the same time?

A user is allocated on every open(). Having multiple users means that there is more than one compute session running on a given DSP. The reason why all the users are notified here is that the DSP is going down, so all the compute sessions associated with it will not see any response from the DSP, so any pending/waiting compute contexts are explicitly notified.

I don't get it yet. What are 'compute sessions'? Do you have multiple threads running on a single instance at the same time?

Compute sessions are "compute context-bank" instances in the DSP. The DSP supports multiple context banks: ideally 12 context banks, 4 of which are reserved for other purposes and 8 of which are used for compute, one for each process. So ideally we can run 8 parallel computes.

I would have expected to only ever see one thread in the 'wait_for_completion()' above, and others possibly waiting for a chance to get to it but not already running.

struct fastrpc_remote_crc {
	__u64 crc;
	__u64 reserved1;
	__u64 reserved2;
	__u64 reserved3;
};

I don't see a need to add extra reserved fields for structures that are already naturally aligned here, e.g. in fastrpc_remote_arg we need the 'reserved1' but not the 'reserved2'.

Yes, I see, I overdid it! Another idea is, maybe I can try to combine these into a single structure, something like:

struct fastrpc_invoke_arg {
	__u64 ptr;
	__u64 len;
	__u32 fd;
	__u32 reserved1;
	__u64 attr;
	__u64 crc;
};

struct fastrpc_ioctl_invoke {
	__u32 handle;
	__u32 sc;
	/* The minimum size is scalar_length * 32 */
	struct fastrpc_invoke_args *args;
};

That is still two structures, not one ;-)

struct fastrpc_ioctl_invoke {
	__u32 handle;
	__u32 sc;
	/* The minimum size is scalar_length * 32 */
	struct fastrpc_remote_args *rargs;
	struct fastrpc_remote_fd *fds;
	struct fastrpc_remote_attr *attrs;
	struct fastrpc_remote_crc *crc;
};

Do these really have to be indirect then? Are they all lists of variable length? How do you know how long?

Yes, they are variable length and will be scalar-length long. The scalar length is derived from the sc variable in this structure.

Do you mean we have a variable number 'sc', but each array always has the same length as the other ones? In that case: yes, combining them seems sensible.

Yes, that's what I meant!

The other question this raises is: what is 'handle'? Why is the file descriptor not enough to identify the instance we want to talk to?

This is a remote handle to an opened interface on which this method has to be invoked. For example, if we are running a calculator application, the calculator will have a unique handle on which the calculate() method needs to be invoked.

thanks,
srini

Arnd
Re: [RFC PATCH 3/6] char: fastrpc: Add support for context Invoke method
On 30/11/18 16:19, Arnd Bergmann wrote: On Fri, Nov 30, 2018 at 5:03 PM Srinivas Kandagatla wrote: On 30/11/18 15:08, Arnd Bergmann wrote: On Fri, Nov 30, 2018 at 4:01 PM Srinivas Kandagatla wrote: Thanks Arnd for the review comments! On 30/11/18 13:41, Arnd Bergmann wrote: On Fri, Nov 30, 2018 at 11:48 AM Srinivas Kandagatla wrote: +static long fastrpc_device_ioctl(struct file *file, unsigned int cmd, +unsigned long arg) +{ + struct fastrpc_user *fl = (struct fastrpc_user *)file->private_data; + struct fastrpc_channel_ctx *cctx = fl->cctx; + char __user *argp = (char __user *)arg; + int err; + + if (!fl->sctx) { + fl->sctx = fastrpc_session_alloc(cctx, 0); + if (!fl->sctx) + return -ENOENT; + } Shouldn't that session be allocated during open()? Yes, and no, we do not need context in all the cases. In cases like we just want to allocate dmabuf. Can you give an example what that would be good for? Currently the instance which does not need session is used as simple memory allocator (rpcmem), TBH, this is the side effect of trying to fit in with downstream application infrastructure which uses ion for andriod usecases. That does not sound like enough of a reason then, user space is easy to change here to just allocate the memory from the device itself. The only reason that I can see for needing a dmabuf would be if you have to share a buffer between two instances, and then you can use either of them. I agree, I will try rework this and remove the instances that does not require sessions! Sharing buffer is also a reason for dmabuf here. +static void fastrpc_notify_users(struct fastrpc_user *user) +{ + struct fastrpc_invoke_ctx *ctx, *n;will go + + spin_lock(>lock); + list_for_each_entry_safe(ctx, n, >pending, node) + complete(>work); + spin_unlock(>lock); +} Can you explain here what it means to have multiple 'users' a 'fastrpc_user' structure? Why are they all done at the same time? user is allocated on every open(). 
Having multiple users means that there are more than one compute sessions running on a given dsp. They reason why all the users are notified here is because the dsp is going down, so all the compute sessions associated with it will not see any response from dsp, so any pending/waiting compute contexts are explicitly notified. I don't get it yet. What are 'compute sessions'? Do you have multiple threads running on a single instance at the same time? compute sessions are "compute context-banks" instances in DSP. DSP supports multiple compute banks, Ideally 12 context banks, 4 which are reserved for other purposes and 8 of them are used for compute, one for each process. So ideally we can run 8 parallel computes. I would have expected to only ever see one thread in the 'wait_for_completion()' above, and others possibly waiting for a chance to get to but not already running. struct fastrpc_remote_crc { __u64 crc; __u64 reserved1 __u64 reserved2 __u64 reserved3 }; I don't see a need to add extra served fields for structures that are already naturally aligned here, e.g. in fastrpc_remote_arg we need the 'reserved1' but not the 'reserved2'. Yes, I see, I overdone it! Other idea, is, may be I can try to combine these into single structure something like: struct fastrpc_invoke_arg { __u64 ptr; __u64 len; __u32 fd; __u32 reserved1 __u64 attr; __u64 crc; }; struct fastrpc_ioctl_invoke { __u32 handle; __u32 sc; /* The minimum size is scalar_length * 32*/ struct fastrpc_invoke_args *args; }; That is still two structure, not one ;-) struct fastrpc_ioctl_invoke { __u32 handle; __u32 sc; /* The minimum size is scalar_length * 32 */ struct fastrpc_remote_args *rargs; struct fastrpc_remote_fd *fds; struct fastrpc_remote_attr *attrs; struct fastrpc_remote_crc *crc; }; Do these really have to be indirect then? Are they all lists of variable length? How do you know how long? Yes, they are variable length and will be scalar length long. 
The scalar length is derived from the 'sc' variable in this structure.

Do you mean we have a variable number 'sc', but each array always has the same length as the other ones? In that case: yes, combining them seems sensible.

Yes, that's what I meant!

The other question this raises is: what is 'handle'? Why is the file descriptor not enough to identify the instance we want to talk to?

This is the remote handle to the opened interface on which this method has to be invoked. For example, if we are running a calculator application, the calculator will have a unique handle on which the calculate() method needs to be invoked.

thanks, srini

Arnd
[PATCH] pinctrl: Change to use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li
---
 drivers/pinctrl/pinconf.c | 29 ++++-------------------------
 drivers/pinctrl/pinmux.c  | 29 ++++-------------------------
 2 files changed, 8 insertions(+), 50 deletions(-)

diff --git a/drivers/pinctrl/pinconf.c b/drivers/pinctrl/pinconf.c
index d3fe14394b73..2c7229380f08 100644
--- a/drivers/pinctrl/pinconf.c
+++ b/drivers/pinctrl/pinconf.c
@@ -366,29 +366,8 @@ static int pinconf_groups_show(struct seq_file *s, void *what)
 	return 0;
 }
 
-static int pinconf_pins_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, pinconf_pins_show, inode->i_private);
-}
-
-static int pinconf_groups_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, pinconf_groups_show, inode->i_private);
-}
-
-static const struct file_operations pinconf_pins_ops = {
-	.open = pinconf_pins_open,
-	.read = seq_read,
-	.llseek = seq_lseek,
-	.release = single_release,
-};
-
-static const struct file_operations pinconf_groups_ops = {
-	.open = pinconf_groups_open,
-	.read = seq_read,
-	.llseek = seq_lseek,
-	.release = single_release,
-};
+DEFINE_SHOW_ATTRIBUTE(pinconf_pins);
+DEFINE_SHOW_ATTRIBUTE(pinconf_groups);
 
 #define MAX_NAME_LEN 15
 
@@ -613,9 +592,9 @@ void pinconf_init_device_debugfs(struct dentry *devroot,
 				 struct pinctrl_dev *pctldev)
 {
 	debugfs_create_file("pinconf-pins", S_IFREG | S_IRUGO,
-			    devroot, pctldev, &pinconf_pins_ops);
+			    devroot, pctldev, &pinconf_pins_fops);
 	debugfs_create_file("pinconf-groups", S_IFREG | S_IRUGO,
-			    devroot, pctldev, &pinconf_groups_ops);
+			    devroot, pctldev, &pinconf_groups_fops);
 	debugfs_create_file("pinconf-config", (S_IRUGO | S_IWUSR | S_IWGRP),
 			    devroot, pctldev, &pinconf_dbg_pinconfig_fops);
 }

diff --git a/drivers/pinctrl/pinmux.c b/drivers/pinctrl/pinmux.c
index 5780442c068b..4d0cc1889dd9 100644
--- a/drivers/pinctrl/pinmux.c
+++ b/drivers/pinctrl/pinmux.c
@@ -644,37 +644,16 @@ void pinmux_show_setting(struct seq_file *s,
 		   setting->data.mux.func);
 }
 
-static int pinmux_functions_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, pinmux_functions_show, inode->i_private);
-}
-
-static int pinmux_pins_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, pinmux_pins_show, inode->i_private);
-}
-
-static const struct file_operations pinmux_functions_ops = {
-	.open = pinmux_functions_open,
-	.read = seq_read,
-	.llseek = seq_lseek,
-	.release = single_release,
-};
-
-static const struct file_operations pinmux_pins_ops = {
-	.open = pinmux_pins_open,
-	.read = seq_read,
-	.llseek = seq_lseek,
-	.release = single_release,
-};
+DEFINE_SHOW_ATTRIBUTE(pinmux_functions);
+DEFINE_SHOW_ATTRIBUTE(pinmux_pins);
 
 void pinmux_init_device_debugfs(struct dentry *devroot,
 				struct pinctrl_dev *pctldev)
 {
 	debugfs_create_file("pinmux-functions", S_IFREG | S_IRUGO,
-			    devroot, pctldev, &pinmux_functions_ops);
+			    devroot, pctldev, &pinmux_functions_fops);
 	debugfs_create_file("pinmux-pins", S_IFREG | S_IRUGO,
-			    devroot, pctldev, &pinmux_pins_ops);
+			    devroot, pctldev, &pinmux_pins_fops);
 }
 
 #endif /* CONFIG_DEBUG_FS */
-- 
2.17.0
Re: [PATCH v2] signal: add procfd_signal() syscall
On Fri, Nov 30, 2018 at 3:41 AM Arnd Bergmann wrote:
> siginfo_t as it is now still has a number of other downsides, and Andy in
> particular didn't like the idea of having three new variants on x86
> (depending on how you count). His alternative suggestion of having
> a single syscall entry point that takes a 'siginfo_t __user *' but interprets
> it as compat_siginfo depending on in_compat_syscall()/in_x32_syscall()
> should work correctly, but feels wrong to me, or at least inconsistent
> with how we do this elsewhere.

If everyone else is okay with it, I can get on board with three variants on x86. What I can't get on board with is *five* variants on x86, which would be:

procfd_signal via int80 / the 32-bit vDSO: the ia32 structure
syscall64 with nr == 335 (presumably): 64-bit
syscall64 with nr == 548 | 0x40000000: x32
syscall64 with nr == 548: 64-bit entry but in_compat_syscall() == true, behavior is arbitrary
syscall64 with nr == 335 | 0x40000000: x32 entry, but in_compat_syscall() == false, behavior is arbitrary

This mess isn't really Christian's fault -- it's been there for a while, but it's awful and I don't think we want to perpetuate it. Obviously, I'd prefer a variant where the structure that's passed in is always the same.

BTW, do we consider siginfo_t to be extensible? If so, and if we pass in a pointer, presumably we should pass a length as well.
Re: [PATCH 00/12 v6] fs/locks: avoid thundering-herd wake-ups
On Fri, 2018-11-30 at 10:04 +1100, NeilBrown wrote: > This series has the fixes for the recently reported performance > regressions merged into the patches which caused them. > It also has a couple of little fixes that have been mentioned on the > list, and that Jeff had merged into his copy. > > Thanks, > NeilBrown > > --- > > NeilBrown (12): > fs/locks: rename some lists and pointers. > fs/locks: split out __locks_wake_up_blocks(). > NFS: use locks_copy_lock() to copy locks. > gfs2: properly initial file_lock used for unlock. > ocfs2: properly initial file_lock used for unlock. > fs/locks: use properly initialized file_lock when unlocking. > fs/locks: allow a lock request to block other requests. > fs/locks: always delete_block after waiting. > fs/locks: change all *_conflict() functions to return bool. > fs/locks: create a tree of dependent requests. > fs/locks: merge posix_unblock_lock() and locks_delete_block() > fs/locks: locks: remove unnecessary white space. > > > fs/cifs/file.c |4 > fs/gfs2/file.c | 10 + > fs/lockd/svclock.c |2 > fs/locks.c | 342 > +-- > fs/nfs/nfs4proc.c |6 - > fs/nfsd/nfs4state.c |6 - > fs/ocfs2/locks.c| 10 + > include/linux/fs.h | 13 + > include/trace/events/filelock.h | 16 +- > 9 files changed, 252 insertions(+), 157 deletions(-) > > -- > Signature > Thanks Neil, This looks great. I've gone ahead and replaced the set in locks-next with this one. Unless we hear of any other problems, I'll plan to send a PR to Linus when the merge window opens. Cheers, -- Jeff Layton
Re: dcache_readdir NULL inode oops
On Fri, Nov 30, 2018 at 04:08:52PM +, Al Viro wrote: > On Fri, Nov 30, 2018 at 09:16:49AM -0600, Eric W. Biederman wrote: > > >> > + inode_lock(parent->d_inode); > > >> > dentry->d_fsdata = NULL; > > >> > drop_nlink(dentry->d_inode); > > >> > d_delete(dentry); > > >> > + inode_unlock(parent->d_inode); > > >> > + > > >> > dput(dentry); /* d_alloc_name() in devpts_pty_new() */ > > >> > } > > > > > > This feels right but getting some feedback from others would be good. > > > > This is going to be special at least because we are not coming through > > the normal unlink path and we are manipulating the dcache. > > > > This looks plausible. If this is whats going on then we have had this > > bug for a very long time. I will see if I can make some time. > > > > It looks like in the general case everything is serialized by the > > devpts_mutex. I wonder if just changing the order of operations > > here would be enough. > > > > AKA: drop_nlink d_delete then dentry->d_fsdata. Ugh d_fsdata is not > > implicated so that won't help here. > > It certainly won't. The thing is, this > if (!dir_emit(ctx, next->d_name.name, next->d_name.len, > d_inode(next)->i_ino, dt_type(d_inode(next > in dcache_readdir() obviously can block, so all we can hold over it is > blocking locks. Which we do - specifically, ->i_rwsem on our directory. > > It's actually worse than missing inode_lock() - consider the effects > of mount --bind /mnt/foo /dev/pts/42. What happens when that thing > goes away? Right, a lost mount... Ha, I hadn't even considered that scenario. Urgh! > I'll resurrect the "kernel-internal rm -rf done right" series and > post it; devpts is not the only place suffering such problem (binfmt_misc, > etc.) Thanks. I'm happy to test that it solves this issue if you throw me on cc. Will
Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64
On Thu, Nov 29, 2018 at 03:04:20PM -0800, Linus Torvalds wrote: > On Thu, Nov 29, 2018 at 12:25 PM Josh Poimboeuf wrote: > > > > On Thu, Nov 29, 2018 at 11:27:00AM -0800, Andy Lutomirski wrote: > > > > > > I propose a different solution: > > > > > > As in this patch set, we have a direct and an indirect version. The > > > indirect version remains exactly the same as in this patch set. The > > > direct version just only does the patching when all seems well: the > > > call instruction needs to be 0xe8, and we only do it when the thing > > > doesn't cross a cache line. Does that work? In the rare case where > > > the compiler generates something other than 0xe8 or crosses a cache > > > line, then the thing just remains as a call to the out of line jmp > > > trampoline. Does that seem reasonable? It's a very minor change to > > > the patch set. > > > > Maybe that would be ok. If my math is right, we would use the > > out-of-line version almost 5% of the time due to cache misalignment of > > the address. > > Note that I don't think cache-line alignment is necessarily sufficient. > > The I$ fetch from the cacheline can happen in smaller chunks, because > the bus between the I$ and the instruction decode isn't a full > cacheline (well, it is _now_ in modern big cores, but it hasn't always > been). > > So even if the cacheline is updated atomically, I could imagine seeing > a partial fetch from the I$ (old values) and then a second partial > fetch (new values). > > It would be interesting to know what the exact fetch rules are. I've been doing some cross-modifying code experiments on Nehalem, with one CPU writing call destinations while the other CPUs are executing them. Reliably, one of the readers goes off into the weeds within a few seconds. The writing was done with just text_poke(), no #BP. I wasn't able to figure out the pattern in the addresses of the corrupted call sites. It wasn't cache line. That was on Nehalem. Skylake didn't crash at all. -- Josh
[PATCH] gpio: ks8695: Change to use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li
---
 drivers/gpio/gpio-ks8695.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/gpio/gpio-ks8695.c b/drivers/gpio/gpio-ks8695.c
index 55d562e1278e..d6d6140ffc40 100644
--- a/drivers/gpio/gpio-ks8695.c
+++ b/drivers/gpio/gpio-ks8695.c
@@ -282,22 +282,13 @@ static int ks8695_gpio_show(struct seq_file *s, void *unused)
 	return 0;
 }
 
-static int ks8695_gpio_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, ks8695_gpio_show, NULL);
-}
-
-static const struct file_operations ks8695_gpio_operations = {
-	.open = ks8695_gpio_open,
-	.read = seq_read,
-	.llseek = seq_lseek,
-	.release = single_release,
-};
+DEFINE_SHOW_ATTRIBUTE(ks8695_gpio);
 
 static int __init ks8695_gpio_debugfs_init(void)
 {
 	/* /sys/kernel/debug/ks8695_gpio */
-	(void) debugfs_create_file("ks8695_gpio", S_IFREG | S_IRUGO, NULL, NULL, &ks8695_gpio_operations);
+	debugfs_create_file("ks8695_gpio", S_IFREG | S_IRUGO, NULL, NULL,
+			    &ks8695_gpio_fops);
 	return 0;
 }
 postcore_initcall(ks8695_gpio_debugfs_init);
-- 
2.17.0
Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation
On 2018-11-30 10:31 a.m., Peter Zijlstra wrote: > On Fri, Nov 30, 2018 at 04:23:47PM +0100, Peter Zijlstra wrote: > >> Hurm.. no. They apply cleanly to Linus' tree here. >> >> linux-2.6$ git describe >> v4.20-rc4-156-g94f371cb7394 >> linux-2.6$ quilt push 4 >> Applying patch patches/peterz-cpa-addr.patch >> patching file arch/x86/mm/pageattr.c >> Applying patch patches/peterz-cpa-fix-flush_array.patch >> patching file arch/x86/mm/mm_internal.h >> patching file arch/x86/mm/pageattr.c >> patching file arch/x86/mm/tlb.c >> Applying patch patches/peterz-cpa-fold-cpa_flush.patch >> patching file arch/x86/mm/pageattr.c >> Applying patch patches/peterz-cpa-clflush_opt.patch >> patching file arch/x86/mm/pageattr.c >> Now at patch patches/peterz-cpa-clflush_opt.patch >> >> Weird. > > I pushed them out to: > >git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm > > I hope that works; I'm out for a few hours, but should check on email > again tonight. > NAK I get a failure in TTM on init with your x86/mm branch (see attached dmesg). This builds an RC2 kernel btw whereas we were building an RC3 kernel which is about 974 commits behind the tip of our drm-next and about 850 commits behind the last drm-next merge from Dave. Tom [0.00] Linux version 4.20.0-rc2+ (root@carrizo) (gcc version 8.2.1 20181105 (Red Hat 8.2.1-5) (GCC)) #1 SMP Fri Nov 30 10:44:38 EST 2018 [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.20.0-rc2+ root=UUID=d7f3aad3-6017-4563-9cec-9b3c99467214 ro rhgb quiet LANG=en_CA.UTF-8 rdblacklist=amdgpu,radeon [0.00] [Firmware Info]: CPU: Re-enabling disabled Topology Extensions Support. [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. 
[0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009] usable [0.00] BIOS-e820: [mem 0x000a-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xd46cefff] usable [0.00] BIOS-e820: [mem 0xd46cf000-0xd46e8fff] ACPI data [0.00] BIOS-e820: [mem 0xd46e9000-0xda5c0fff] usable [0.00] BIOS-e820: [mem 0xda5c1000-0xda6fcfff] reserved [0.00] BIOS-e820: [mem 0xda6fd000-0xda712fff] ACPI data [0.00] BIOS-e820: [mem 0xda713000-0xda80bfff] usable [0.00] BIOS-e820: [mem 0xda80c000-0xdabbdfff] ACPI NVS [0.00] BIOS-e820: [mem 0xdabbe000-0xdb53] reserved [0.00] BIOS-e820: [mem 0xdb54-0xddff] usable [0.00] BIOS-e820: [mem 0xde00-0xdfff] reserved [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfe50-0xfe5f] reserved [0.00] BIOS-e820: [mem 0xfea0-0xfea0] reserved [0.00] BIOS-e820: [mem 0xfeb8-0xfec01fff] reserved [0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved [0.00] BIOS-e820: [mem 0xfec3-0xfec30fff] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved [0.00] BIOS-e820: [mem 0xfed4-0xfed44fff] reserved [0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved [0.00] BIOS-e820: [mem 0xfedc-0xfedc0fff] reserved [0.00] BIOS-e820: [mem 0xfedc2000-0xfedc8fff] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved [0.00] BIOS-e820: [mem 0xff00-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x0001feff] usable [0.00] BIOS-e820: [mem 0x0001ff00-0x00021eff] reserved [0.00] NX (Execute Disable) protection: active [0.00] e820: update [mem 0x9f205018-0x9f214e57] usable ==> usable [0.00] e820: update [mem 0x9f205018-0x9f214e57] usable ==> usable [0.00] e820: update [mem 0x9f1f6018-0x9f204057] usable ==> usable [0.00] e820: update [mem 0x9f1f6018-0x9f204057] usable ==> usable [0.00] extended physical RAM map: [0.00] reserve setup_data: [mem 0x-0x0009] usable [0.00] reserve setup_data: [mem 0x000a-0x000f] reserved [0.00] reserve setup_data: [mem 0x0010-0x9f1f6017] usable [0.00] reserve setup_data: [mem 0x9f1f6018-0x9f204057] usable [0.00]
[PATCH v5] arm64: implement ftrace with regs
Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning of
each function. Replace the first NOP thus generated with a quick LR
saver (move it to scratch reg x9), so the 2nd replacement insn, the call
to ftrace, does not clobber the value. Ftrace will then generate the
standard stack frames.

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers such
as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, together
with ftrace_trampoline, and double the size of this special section if
.text.ftrace_trampoline is present in the module.

Signed-off-by: Torsten Duwe

---

As reliable stack traces are still being discussed, here is dynamic
ftrace with regs only, which has a value of its own. I can see Mark has
broken down an earlier version into smaller patches; just let me know if
you prefer that.

[Changes from v4]
 * include Ard's feedback and pending changes:
   - ADR/ADRP veneers
   - ftrace_trampolines[2]
   - add a .req for fp, just in case
   - comment on the pt_regs.stackframe[2] mapping
 * include Mark's ftrace cleanup
   - GLOBAL()
   - prepare_ftrace_return(pc, , fp)

---
 arch/arm64/Kconfig                    |    2
 arch/arm64/Makefile                   |    4 +
 arch/arm64/include/asm/assembler.h    |    1
 arch/arm64/include/asm/ftrace.h       |   13 +++
 arch/arm64/include/asm/module.h       |    3
 arch/arm64/kernel/Makefile            |    6 -
 arch/arm64/kernel/entry-ftrace.S      |  130 ++
 arch/arm64/kernel/ftrace.c            |  125 +---
 arch/arm64/kernel/module-plts.c       |    3
 arch/arm64/kernel/module.c            |    2
 drivers/firmware/efi/libstub/Makefile |    3
 include/asm-generic/vmlinux.lds.h     |    1
 include/linux/compiler_types.h        |    4 +
 13 files changed, 261 insertions(+), 36 deletions(-)

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -125,6 +125,8 @@ config ARM64
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
+	select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+		if $(cc-option,-fpatchable-function-entry=2)
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_FTRACE_MCOUNT_RECORD
 	select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -79,6 +79,10 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE	+= -T $(srctree)/arch/arm64/kernel/module.lds
 endif

+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y		:= arch/arm64/kernel/head.o
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -17,6 +17,19 @@
 #define MCOUNT_ADDR		((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE	AARCH64_INSN_SIZE

+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. In contrary
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds	:= -DTEXT_OFFSET=$(
 AFLAGS_head.o		:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)

-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)

 # Object file lists.
 arm64-obj-y	:= debug-monitors.o entry.o irq.o fpsimd.o \
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -13,7 +13,8 @@ cflags-$(CONFIG_X86)		+= -m$(BITS) -D__K
 # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
 # disable the stackleak plugin
-cflags-$(CONFIG_ARM64)		:= $(subst -pg,,$(KBUILD_CFLAGS)) -fpie \
+cflags-$(CONFIG_ARM64)		:= $(filter-out -pg $(CC_FLAGS_FTRACE)\
+				   ,$(KBUILD_CFLAGS)) -fpie \
 				   $(DISABLE_STACKLEAK_PLUGIN)
 cflags-$(CONFIG_ARM)		:= $(subst -pg,,$(KBUILD_CFLAGS)) \
 				   -fno-builtin -fpic -mno-single-pic-base
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include
 #include
 #include
+#include
+#include

 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -122,6 +124,7 @@ skip_ftrace_call:			// }
 ENDPROC(_mcount)

 #else
Re: [RFC PATCH 3/6] char: fastrpc: Add support for context Invoke method
On Fri, Nov 30, 2018 at 5:03 PM Srinivas Kandagatla wrote:
> On 30/11/18 15:08, Arnd Bergmann wrote:
> > On Fri, Nov 30, 2018 at 4:01 PM Srinivas Kandagatla wrote:
> >> Thanks Arnd for the review comments!
> >> On 30/11/18 13:41, Arnd Bergmann wrote:
> >>> On Fri, Nov 30, 2018 at 11:48 AM Srinivas Kandagatla wrote:
> >>>> +static long fastrpc_device_ioctl(struct file *file, unsigned int cmd,
> >>>> +				 unsigned long arg)
> >>>> +{
> >>>> +	struct fastrpc_user *fl = (struct fastrpc_user *)file->private_data;
> >>>> +	struct fastrpc_channel_ctx *cctx = fl->cctx;
> >>>> +	char __user *argp = (char __user *)arg;
> >>>> +	int err;
> >>>> +
> >>>> +	if (!fl->sctx) {
> >>>> +		fl->sctx = fastrpc_session_alloc(cctx, 0);
> >>>> +		if (!fl->sctx)
> >>>> +			return -ENOENT;
> >>>> +	}
> >>>
> >>> Shouldn't that session be allocated during open()?
> >>>
> >> Yes, and no, we do not need context in all the cases. In cases like we
> >> just want to allocate dmabuf.
> >
> > Can you give an example what that would be good for?
> >
> Currently the instance which does not need a session is used as a simple
> memory allocator (rpcmem), TBH, this is the side effect of trying to fit
> in with downstream application infrastructure which uses ion for android
> usecases.

That does not sound like enough of a reason then, user space is easy to
change here to just allocate the memory from the device itself. The only
reason that I can see for needing a dmabuf would be if you have to share
a buffer between two instances, and then you can use either of them.

> >>>> +static void fastrpc_notify_users(struct fastrpc_user *user)
> >>>> +{
> >>>> +	struct fastrpc_invoke_ctx *ctx, *n;
> >>>> +
> >>>> +	spin_lock(&user->lock);
> >>>> +	list_for_each_entry_safe(ctx, n, &user->pending, node)
> >>>> +		complete(&ctx->work);
> >>>> +	spin_unlock(&user->lock);
> >>>> +}
> >>>
> >>> Can you explain here what it means to have multiple 'users' of
> >>> a 'fastrpc_user' structure? Why are they all done at the same time?
>
> user is allocated on every open().
> Having multiple users means that there are more than one compute
> sessions running on a given dsp.
>
> The reason why all the users are notified here is because the dsp is
> going down, so all the compute sessions associated with it will not see
> any response from the dsp, so any pending/waiting compute contexts are
> explicitly notified.

I don't get it yet. What are 'compute sessions'? Do you have multiple
threads running on a single instance at the same time? I would have
expected to only ever see one thread in the 'wait_for_completion()'
above, and others possibly waiting for a chance to get to but not
already running.

> >> struct fastrpc_remote_crc {
> >>	__u64 crc;
> >>	__u64 reserved1;
> >>	__u64 reserved2;
> >>	__u64 reserved3;
> >> };
> >
> > I don't see a need to add extra reserved fields for structures
> > that are already naturally aligned here, e.g. in
> > fastrpc_remote_arg we need the 'reserved1' but not
> > the 'reserved2'.
>
> Yes, I see, I overdid it!
> Other idea is, maybe I can try to combine these into a single
> structure, something like:
>
> struct fastrpc_invoke_arg {
>	__u64 ptr;
>	__u64 len;
>	__u32 fd;
>	__u32 reserved1;
>	__u64 attr;
>	__u64 crc;
> };
>
> struct fastrpc_ioctl_invoke {
>	__u32 handle;
>	__u32 sc;
>	/* The minimum size is scalar_length * 32 */
>	struct fastrpc_invoke_args *args;
> };

That is still two structures, not one ;-)

> >> struct fastrpc_ioctl_invoke {
> >>	__u32 handle;
> >>	__u32 sc;
> >>	/* The minimum size is scalar_length * 32 */
> >>	struct fastrpc_remote_args *rargs;
> >>	struct fastrpc_remote_fd *fds;
> >>	struct fastrpc_remote_attr *attrs;
> >>	struct fastrpc_remote_crc *crc;
> >> };
> >
> > Do these really have to be indirect then? Are they all
> > lists of variable length? How do you know how long?
>
> Yes, they are variable length and will be scalar length long.
> Scalar length is derived from the sc variable in this structure.
Do you mean we have a variable number 'sc', but each array always has the same length as the other ones? In that case: yes, combining them seems sensible. The other question this raises is: what is 'handle'? Why is the file descriptor not enough to identify the instance we want to talk to? Arnd
Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
On Fri, 30 Nov 2018 06:24:16 + Bharat Bhushan wrote: > Hi Alex, > > > -Original Message- > > From: Alex Williamson > > Sent: Friday, November 30, 2018 11:26 AM > > To: Bharat Bhushan > > Cc: Bjorn Helgaas ; Bjorn Helgaas > > ; linux-...@vger.kernel.org; Linux Kernel Mailing List > > ; bharatb.ya...@gmail.com; David Daney > > ; jglau...@cavium.com; > > mbroe...@libmpq.org; chrisrblak...@gmail.com > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus > > > > On Fri, 30 Nov 2018 05:29:47 + > > Bharat Bhushan wrote: > > > > > Hi, > > > > > > > -Original Message- > > > > From: Bjorn Helgaas > > > > Sent: Thursday, November 29, 2018 1:46 AM > > > > To: Bharat Bhushan > > > > Cc: alex.william...@redhat.com; Bjorn Helgaas ; > > > > linux- p...@vger.kernel.org; Linux Kernel Mailing List > > > ker...@vger.kernel.org>; bharatb.ya...@gmail.com; David Daney > > > > ; jglau...@cavium.com; > > mbroe...@libmpq.org; > > > > chrisrblak...@gmail.com > > > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus > > > > > > > > On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan > > > > wrote: > > > > > > > > > > -Original Message- > > > > > > From: Alex Williamson > > > > > > Sent: Tuesday, November 27, 2018 9:39 PM > > > > > > To: Bjorn Helgaas > > > > > > Cc: Bharat Bhushan ; > > > > > > linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; > > > > > > bharatb.ya...@gmail.com; David Daney > > ; > > > > Jan > > > > > > Glauber ; Maik Broemme > > > > ; > > > > > > Chris Blake > > > > > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus > > > > > > > > > > > > On Tue, 27 Nov 2018 09:33:56 -0600 Bjorn Helgaas > > > > > > wrote: > > > > > > > > > > > 4) Is there a hardware erratum for this? If so, please > > > > > > > include the URL here. > > > > > > > > > > No h/w errata as of now. 
> > > > > > > > Does that mean (a) the HW folks agree this is a hardware problem but > > > > they haven't written an erratum, (b) there is an erratum but it > > > > isn't public, (c) we don't have any concrete evidence of a hardware > > > > problem, but things just don't work if we do a bus reset, (d) something > > > > > > else? > > > > > > I will say it is (c) - not concluded to be hardware h/w issue. > > > > > > > > > > > > In pci_reset_secondary_bus() I have tried to increase the delay > > > > > after reset > > > > but not helped. > > > > > Do I need to add delay at some other place as well? > > > > > > > > No, I think the place you tried should be enough. > > > > > > > > You should also be able to exercise this from user-space by using > > > > "setpci" to set and clear the Secondary Bus Reset bit in the Bridge > > > > Control register. Then you can also use setpci to read/write config > > > > space of the NIC. The kernel would normally read the Vendor and > > > > Device IDs as the first access to the device during enumeration. > > > > You also might be able to learn something by using "lspci -vv" on > > > > the bridge before and after the reset to see if it logs any AER bits > > > > (if it > > supports AER) or the other standard error logging bits. 
> > > > > > I tried below sequence for Secondary bus reset and device config space > > > show 0xff > > > > > > root@localhost:~# lspci -x > > > 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev > > > 10) > > > 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00 > > > 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00 > > > 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00 > > > 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00 > > > > > > 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit > > > Network Connection > > > 00: 86 80 d3 10 06 04 10 00 00 00 00 02 10 00 00 00 > > > 10: 00 00 0c 40 00 00 00 40 01 00 00 00 00 00 0e 40 > > > 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 1f a0 > > > 30: 00 00 24 40 c8 00 00 00 00 00 00 00 63 01 00 00 > > > > > > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40 > > > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00 > > > > > > root@localhost:~# lspci -x > > > 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev > > > 10) > > > 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00 > > > 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00 > > > 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00 > > > 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00 > > > > Just for curiosity sake, what if you re-write the secondary and subordinate > > bus registers here: > > > > # setpci -s 0002:00:00.0 0x19.b=0x01 > > # setpci -s 0002:00:00.0 0x1a.b=0xff > > Result is same, here are logs > > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40 > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00 > root@localhost:~# setpci -s 0002:00:00.0 0x19.b=0x01 > root@localhost:~# setpci -s 0002:00:00.0 0x1a.b=0xff > root@localhost:~# lspci -x > 0002:00:00.0 PCI bridge:
Re: [RFC PATCH 2/6] char: fastrpc: Add Qualcomm fastrpc basic driver model
Thanks for the comments!

On 30/11/18 16:13, Greg KH wrote:
> > +						"sdsp", "cdsp"};
> > +static dev_t fastrpc_major;
>
> Why do you need a whole major number for this?

Not really!

> Why not just use the misc interface instead?

Sure, I will give that a go!

--srini
Re: [RFC PATCH 2/6] char: fastrpc: Add Qualcomm fastrpc basic driver model
On Fri, Nov 30, 2018 at 10:46:53AM +, Srinivas Kandagatla wrote:
> This patch adds basic driver model for qualcomm fastrpc.
> Each DSP rpmsg channel is represented as fastrpc channel context and
> is exposed as a character driver for userspace interface.
> Each compute context bank is represented as fastrpc-session-context,
> which are dynamically managed by the channel context char device.
>
> Signed-off-by: Srinivas Kandagatla
> ---
>  drivers/char/Kconfig   |  10 ++
>  drivers/char/Makefile  |   1 +
>  drivers/char/fastrpc.c | 337 +
>  3 files changed, 348 insertions(+)
>  create mode 100644 drivers/char/fastrpc.c
>
> diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
> index 9d03b2ff5df6..75fd274c67df 100644
> --- a/drivers/char/Kconfig
> +++ b/drivers/char/Kconfig
> @@ -552,6 +552,16 @@ config ADI
>  	  and SSM (Silicon Secured Memory). Intended consumers of this
>  	  driver include crash and makedumpfile.
>
> +config QCOM_FASTRPC
> +	tristate "Qualcomm FastRPC"
> +	depends on ARCH_QCOM || COMPILE_TEST
> +	depends on RPMSG
> +	help
> +	  Provides a communication mechanism that allows for clients to
> +	  make remote method invocations across processor boundary to
> +	  applications DSP processor. Say M if you want to enable this
> +	  module.
> +
>  endmenu
>
>  config RANDOM_TRUST_CPU
> diff --git a/drivers/char/Makefile b/drivers/char/Makefile
> index b8d42b4e979b..30ec9187350e 100644
> --- a/drivers/char/Makefile
> +++ b/drivers/char/Makefile
> @@ -58,3 +58,4 @@ js-rtc-y = rtc.o
>  obj-$(CONFIG_XILLYBUS)		+= xillybus/
>  obj-$(CONFIG_POWERNV_OP_PANEL)	+= powernv-op-panel.o
>  obj-$(CONFIG_ADI)		+= adi.o
> +obj-$(CONFIG_QCOM_FASTRPC)	+= fastrpc.o
> diff --git a/drivers/char/fastrpc.c b/drivers/char/fastrpc.c
> new file mode 100644
> index ..97d8062eb3e1
> --- /dev/null
> +++ b/drivers/char/fastrpc.c
> @@ -0,0 +1,337 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2011-2018, The Linux Foundation. All rights reserved.
> +// Copyright (c) 2018, Linaro Limited
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define ADSP_DOMAIN_ID (0)
> +#define MDSP_DOMAIN_ID (1)
> +#define SDSP_DOMAIN_ID (2)
> +#define CDSP_DOMAIN_ID (3)
> +#define FASTRPC_DEV_MAX		4 /* adsp, mdsp, slpi, cdsp*/
> +#define FASTRPC_MAX_SESSIONS	9 /*8 compute, 1 cpz*/
> +#define FASTRPC_CTX_MAX (256)
> +#define FASTRPC_CTXID_MASK (0xFF0)
> +#define FASTRPC_DEVICE_NAME	"fastrpc"
> +
> +#define cdev_to_cctx(d) container_of(d, struct fastrpc_channel_ctx, cdev)
> +
> +static const char *domains[FASTRPC_DEV_MAX] = { "adsp", "mdsp",
> +						"sdsp", "cdsp"};
> +static dev_t fastrpc_major;

Why do you need a whole major number for this?  Why not just use the
misc interface instead?
Re: dcache_readdir NULL inode oops
On Fri, Nov 30, 2018 at 09:16:49AM -0600, Eric W. Biederman wrote:
> >> > +	inode_lock(parent->d_inode);
> >> >  	dentry->d_fsdata = NULL;
> >> >  	drop_nlink(dentry->d_inode);
> >> >  	d_delete(dentry);
> >> > +	inode_unlock(parent->d_inode);
> >> > +
> >> >  	dput(dentry);	/* d_alloc_name() in devpts_pty_new() */
> >> >  }
> >
> > This feels right but getting some feedback from others would be good.
>
> This is going to be special at least because we are not coming through
> the normal unlink path and we are manipulating the dcache.
>
> This looks plausible. If this is whats going on then we have had this
> bug for a very long time. I will see if I can make some time.
>
> It looks like in the general case everything is serialized by the
> devpts_mutex. I wonder if just changing the order of operations
> here would be enough.
>
> AKA: drop_nlink d_delete then dentry->d_fsdata. Ugh d_fsdata is not
> implicated so that won't help here.

It certainly won't. The thing is, this

	if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
		      d_inode(next)->i_ino, dt_type(d_inode(next))))

in dcache_readdir() obviously can block, so all we can hold over it is
blocking locks. Which we do - specifically, ->i_rwsem on our directory.

It's actually worse than missing inode_lock() - consider the effects of
mount --bind /mnt/foo /dev/pts/42. What happens when that thing goes
away? Right, a lost mount...

I'll resurrect the "kernel-internal rm -rf done right" series and post
it; devpts is not the only place suffering such problem (binfmt_misc,
etc.)
[GIT PULL] Driver core fix for 4.20-rc5
The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a:

  Linux 4.20-rc1 (2018-11-04 15:37:52 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git tags/driver-core-4.20-rc5

for you to fetch changes up to a66d972465d15b1d89281258805eb8b47d66bd36:

  devres: Align data[] to ARCH_KMALLOC_MINALIGN (2018-11-11 11:40:04 -0800)

Driver core fix for 4.20-rc5

Here is a single driver core fix for 4.20-rc5.

It resolves an issue with the data alignment in 'struct devres' for the
ARC platform.  The full details are in the commit changelog, but the
short summary is that the change is a single line:

  -	unsigned long long		data[]; /* guarantee ull alignment */
  +	u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];

This has been in linux-next for a while with no reported issues.

Signed-off-by: Greg Kroah-Hartman

Alexey Brodkin (1):
      devres: Align data[] to ARCH_KMALLOC_MINALIGN

 drivers/base/devres.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)
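The one-line change swaps an alignment guarantee implied by the member's type for an explicit attribute. A hedged userspace sketch of the idea (the structs here are reduced stand-ins for 'struct devres', and 8 is an assumed value for ARCH_KMALLOC_MINALIGN):

```c
#include <stddef.h>

#define KMALLOC_MINALIGN 8  /* stand-in for ARCH_KMALLOC_MINALIGN */

/* Old layout: 'unsigned long long data[]' only guarantees the alignment of
 * the element type, which need not match what kmalloc() callers assume on
 * every architecture (the ARC issue this pull fixes). */
struct devres_old {
    void *node[2];               /* stand-in for the list bookkeeping */
    unsigned long long data[];
};

/* New layout: the alignment is requested explicitly, independent of the
 * element type, matching what a kmalloc()'d buffer would provide. */
struct devres_new {
    void *node[2];
    unsigned char __attribute__((aligned(KMALLOC_MINALIGN))) data[];
};

/* The payload must start at a KMALLOC_MINALIGN-multiple offset. */
static int data_offset_aligned(void)
{
    return offsetof(struct devres_new, data) % KMALLOC_MINALIGN == 0;
}
```

The `__attribute__((aligned(...)))` form is what the kernel's `__aligned()` macro expands to.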
[GIT PULL] Char/Misc driver fixes for 4.20-rc5
The following changes since commit 2e6e902d185027f8e3cb8b7305238f7e35d6a436:

  Linux 4.20-rc4 (2018-11-25 14:19:31 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git tags/char-misc-4.20-rc5

for you to fetch changes up to 6484a677294aa5d08c0210f2f387ebb9be646115:

  misc: mic/scif: fix copy-paste error in scif_create_remote_lookup (2018-11-27 09:00:38 +0100)

Char/Misc fixes for 4.20-rc5

Here are a few small char/misc driver fixes for 4.20-rc5 that resolve a
number of reported issues.  The "largest" here is the thunderbolt patch,
which resolves an issue with NVM upgrade, the smallest being some fsi
driver fixes.  There's also a hyperv bugfix, and the usual binder
bugfixes.

All of these have been in linux-next with no reported issues.

Signed-off-by: Greg Kroah-Hartman

Arnd Bergmann (1):
      fsi: master-ast-cf: select GENERIC_ALLOCATOR

Brajeswar Ghosh (1):
      fsi: fsi-scom.c: Remove duplicate header

Dexuan Cui (1):
      Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl()

Greg Kroah-Hartman (1):
      Merge tag 'fsi-updates-2018-11-26' of git://git.kernel.org/.../benh/linux-fsi into char-misc-linus

Mika Westerberg (1):
      thunderbolt: Prevent root port runtime suspend during NVM upgrade

Todd Kjos (1):
      binder: fix race that allows malicious free of live buffer

YueHaibing (1):
      misc: mic/scif: fix copy-paste error in scif_create_remote_lookup

 drivers/android/binder.c         | 21 -
 drivers/android/binder_alloc.c   | 16 ++--
 drivers/android/binder_alloc.h   |  3 +--
 drivers/fsi/Kconfig              |  1 +
 drivers/fsi/fsi-scom.c           |  1 -
 drivers/hv/channel.c             |  8
 drivers/misc/mic/scif/scif_rma.c |  2 +-
 drivers/thunderbolt/switch.c     | 40 ++--
 8 files changed, 67 insertions(+), 25 deletions(-)
[GIT PULL] Staging/IIO driver fixes for 4.20-rc5
The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a:

  Linux 4.20-rc1 (2018-11-04 15:37:52 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git tags/staging-4.20-rc5

for you to fetch changes up to c648284f6c9606f1854816086593eeae5556845a:

  Merge tag 'iio-fixes-for-4.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus (2018-11-22 09:37:36 +0100)

Staging and IIO driver fixes for 4.20-rc5

Here are some small IIO and Staging driver fixes for 4.20-rc5.

Nothing major, the IIO fix ended up touching the HID drivers at the same
time, but the HID maintainer acked it.  The staging fixes are all minor
patches for reported issues and regressions, full details are in the
shortlog.

All of these have been in linux-next for a while with no reported issues.

Signed-off-by: Greg Kroah-Hartman

Ben Wolsieffer (1):
      staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION

Christophe JAILLET (1):
      staging: rtl8723bs: Fix the return value in case of error in 'rtw_wx_read32()'

Colin Ian King (3):
      drivers: staging: cedrus: find ctx before dereferencing it ctx
      staging: most: use format specifier "%s" in snprintf
      staging: mt7621-pinctrl: fix uninitialized variable ngroups

Greg Kroah-Hartman (1):
      Merge tag 'iio-fixes-for-4.20a' of git://git.kernel.org/.../jic23/iio into staging-linus

Hans de Goede (1):
      iio/hid-sensors: Fix IIO_CHAN_INFO_RAW returning wrong values for signed numbers

Larry Finger (2):
      staging: rtl8723bs: Fix incorrect sense of ether_addr_equal
      staging: rtl8723bs: Add missing return for cfg80211_rtw_get_station

Martin Kelly (1):
      iio:st_magn: Fix enable device after trigger

Sergio Paracuellos (1):
      staging: mt7621-dma: fix potentially dereferencing uninitialized 'tx_desc'

Spencer E. Olson (2):
      staging: comedi: ni_mio_common: scale ao INSN_CONFIG_GET_CMD_TIMING_CONSTRAINTS
      staging: comedi: clarify/unify macros for NI macro-defined terminals

 drivers/hid/hid-sensor-custom.c                    |  2 +-
 drivers/hid/hid-sensor-hub.c                       | 13 ++--
 drivers/iio/accel/hid-sensor-accel-3d.c            |  5 ++-
 drivers/iio/gyro/hid-sensor-gyro-3d.c              |  5 ++-
 drivers/iio/humidity/hid-sensor-humidity.c         |  3 +-
 drivers/iio/light/hid-sensor-als.c                 |  8 +++--
 drivers/iio/light/hid-sensor-prox.c                |  8 +++--
 drivers/iio/magnetometer/hid-sensor-magn-3d.c      |  8 +++--
 drivers/iio/magnetometer/st_magn_buffer.c          | 12 ++-
 drivers/iio/orientation/hid-sensor-incl-3d.c       |  8 +++--
 drivers/iio/pressure/hid-sensor-press.c            |  8 +++--
 drivers/iio/temperature/hid-sensor-temperature.c   |  3 +-
 drivers/rtc/rtc-hid-sensor-time.c                  |  2 +-
 drivers/staging/comedi/comedi.h                    | 39 --
 drivers/staging/comedi/drivers/ni_mio_common.c     |  3 +-
 drivers/staging/media/sunxi/cedrus/cedrus.c        | 22 ++--
 drivers/staging/most/core.c                        |  2 +-
 drivers/staging/mt7621-dma/mtk-hsdma.c             |  3 +-
 drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c    |  2 +-
 drivers/staging/rtl8723bs/hal/rtl8723bs_recv.c     |  4 +--
 drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c  |  2 +-
 drivers/staging/rtl8723bs/os_dep/ioctl_linux.c     |  2 +-
 .../vc04_services/interface/vchiq_arm/vchiq_arm.c  |  7 +++-
 include/linux/hid-sensor-hub.h                     |  4 ++-
 24 files changed, 103 insertions(+), 72 deletions(-)
Re: [RFC PATCH 3/6] char: fastrpc: Add support for context Invoke method
On 30/11/18 15:08, Arnd Bergmann wrote:

On Fri, Nov 30, 2018 at 4:01 PM Srinivas Kandagatla wrote:

Thanks Arnd for the review comments!

On 30/11/18 13:41, Arnd Bergmann wrote:

On Fri, Nov 30, 2018 at 11:48 AM Srinivas Kandagatla wrote:

+static long fastrpc_device_ioctl(struct file *file, unsigned int cmd,
+				 unsigned long arg)
+{
+	struct fastrpc_user *fl = (struct fastrpc_user *)file->private_data;
+	struct fastrpc_channel_ctx *cctx = fl->cctx;
+	char __user *argp = (char __user *)arg;
+	int err;
+
+	if (!fl->sctx) {
+		fl->sctx = fastrpc_session_alloc(cctx, 0);
+		if (!fl->sctx)
+			return -ENOENT;
+	}

Shouldn't that session be allocated during open()?

Yes, and no; we do not need a context in all the cases, e.g. in cases where we just want to allocate a dmabuf.

Can you give an example what that would be good for?

Currently the instance which does not need a session is used as a simple memory allocator (rpcmem). TBH, this is a side effect of trying to fit in with the downstream application infrastructure, which uses ion for android usecases.

+static void fastrpc_notify_users(struct fastrpc_user *user)
+{
+	struct fastrpc_invoke_ctx *ctx, *n;
+
+	spin_lock(&user->lock);
+	list_for_each_entry_safe(ctx, n, &user->pending, node)
+		complete(&ctx->work);
+	spin_unlock(&user->lock);
+}

Can you explain here what it means to have multiple 'users' of a 'fastrpc_user' structure? Why are they all done at the same time?

A user is allocated on every open(). Having multiple users means that there is more than one compute session running on a given dsp. The reason why all the users are notified here is that the dsp is going down, so all the compute sessions associated with it will not see any response from the dsp, and any pending/waiting compute contexts are explicitly notified. This is the case where users need to be notified if the dsp goes down due to a crash or shutdown!

What is a 'user' then? My point is that it seems to refer to two different things here. I assume 'fastrpc_user' is whoever has opened the file descriptor.

+struct fastrpc_ioctl_invoke {
+	uint32_t handle;	/* remote handle */
+	uint32_t sc;		/* scalars describing the data */
+	remote_arg_t *pra;	/* remote arguments list */
+	int *fds;		/* fd list */
+	unsigned int *attrs;	/* attribute list */
+	unsigned int *crc;
+};

This seems too complex for an ioctl argument, with multiple levels of pointer indirection. I'd normally try to make each ioctl argument either a scalar, or a structure with only fixed-length members.

I totally agree with you, and many thanks for your expert inputs. Maybe something like below, with fixed-length members, would work?

struct fastrpc_remote_arg {
	__u64 ptr;	/* buffer ptr */
	__u64 len;	/* length */
	__u32 fd;	/* dmabuf fd */
	__u32 reserved1;
	__u64 reserved2;
};

struct fastrpc_remote_fd {
	__u64 fd;
	__u64 reserved1;
	__u64 reserved2;
	__u64 reserved3;
};

struct fastrpc_remote_attr {
	__u64 attr;
	__u64 reserved1;
	__u64 reserved2;
	__u64 reserved3;
};

struct fastrpc_remote_crc {
	__u64 crc;
	__u64 reserved1;
	__u64 reserved2;
	__u64 reserved3;
};

I don't see a need to add extra reserved fields for structures that are already naturally aligned here, e.g. in fastrpc_remote_arg we need the 'reserved1' but not the 'reserved2'.

Yes, I see, I overdid it! Another idea: maybe I can try to combine these into a single structure, something like:

struct fastrpc_invoke_arg {
	__u64 ptr;
	__u64 len;
	__u32 fd;
	__u32 reserved1;
	__u64 attr;
	__u64 crc;
};

struct fastrpc_ioctl_invoke {
	__u32 handle;
	__u32 sc;
	/* The minimum size is scalar_length * 32 */
	struct fastrpc_invoke_args *args;
};

instead of:

struct fastrpc_ioctl_invoke {
	__u32 handle;
	__u32 sc;
	/* The minimum size is scalar_length * 32 */
	struct fastrpc_remote_args *rargs;
	struct fastrpc_remote_fd *fds;
	struct fastrpc_remote_attr *attrs;
	struct fastrpc_remote_crc *crc;
};

Do these really have to be indirect then? Are they all lists of variable length? How do you know how long?

Yes, they are variable length and will be scalar-length long. The scalar length is derived from the sc variable in this structure.

--srini

Arnd
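The combined per-argument layout proposed in this thread can be checked for exactly the property Arnd is after -- fixed size, no compiler-inserted padding, identical on 32-bit and 64-bit user space. A quick sketch (field names follow the proposal; this is not a final ABI):

```c
#include <stdint.h>

/* Userspace mock-up of the proposed structure, with the kernel's __u64/__u32
 * replaced by their stdint equivalents. Every member is naturally aligned,
 * so the compiler inserts no hidden padding and sizeof() is identical
 * regardless of the pointer width of the calling process. */
struct fastrpc_invoke_args {
    uint64_t ptr;       /* buffer pointer, carried as u64 to stay 32/64-bit clean */
    uint64_t len;       /* buffer length */
    uint32_t fd;        /* dmabuf fd */
    uint32_t reserved1; /* explicit padding instead of implicit compiler padding */
    uint64_t attr;      /* attribute bits */
    uint64_t crc;       /* optional crc */
};

static int layout_is_fixed(void)
{
    /* 40 bytes with no hidden holes: 8 + 8 + 4 + 4 + 8 + 8 */
    return sizeof(struct fastrpc_invoke_args) == 40;
}
```

Keeping the layout the same on both ABIs is what lets the driver avoid a compat ioctl translation layer.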
Re: [PATCH] ARM: dts: socfpga: remove incorrect 0x leading
On 11/29/18 6:56 AM, Clément Péron wrote: > unit-address does not have a leading "0x" (the number is assumed to be > hexadecimal). > > Signed-off-by: Clément Péron > --- > arch/arm/boot/dts/socfpga.dtsi | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Applied! Thanks, Dinh
Re: [PATCH 3/3] lockdep: Use line-buffered printk() for lockdep messages.
On Thu 2018-11-29 19:09:26, Tetsuo Handa wrote:
> On 2018/11/28 22:29, David Laight wrote:
> > I also spent a week trying to work out why a customer kernel was
> > locking up - only to finally find out that the distro they were
> > using set 'panic on oops' - making it almost impossible to find
> > out what was happening.

Did the machine reboot before the messages reached the console, or did it produce a crash dump, or did it freeze?

panic() tries relatively hard to flush the messages to the console, see printk_safe_flush_on_panic() and console_flush_on_panic(). It is less aggressive when crashdump is called.

It might deadlock in console drivers. Hmm, it might also fail when another CPU is still running and steals console_lock. We might want to disable console_trylock_spinning() if the caller is not panic_cpu.

> On 2018/11/26 13:34, Sergey Senozhatsky wrote:
> > Or... Instead.
> > We can just leave pr_cont() alone for now. And make it possible to
> > reconstruct messages - IOW, inject some info to printk messages. We
> > do this at Samsung (inject CPU number at the beginning of every
> > message. `cat serial.0 | grep "\[1\]"` to grep for all messages from
> > CPU1). Probably this would be the simplest thing.
>
> Yes, I sent a patch which helps reconstructing messages at
> http://lkml.kernel.org/r/1543045075-3008-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp

All the buffering approaches have problems that cannot be solved easily. The prefix-based approach looks like the best alternative at the moment.

Best Regards,
Petr
Re: Regression: very quiet speakers on Thinkpad T570s
On Fri, 30 Nov 2018 15:49:17 +0100, Jeremy Cline wrote: > > Hi, > > Some folks have reported on the Fedora bug tracker[0] that the laptop > speaker volume is very low on the Thinkpad T570 when running a kernel > that includes commit 61fcf8ece9b6 ("ALSA: hda/realtek - Enable Thinkpad > Dock device for ALC298 platform"). > > alsa-info.sh from v4.15.4 (just before commit 61fcf8ece9b6 arrived in > stable) and v4.19.4 with the issue present are attached to the bugzilla. > I've also Cc'd Tim, who uploaded them and has the laptop in question. > > [0] https://bugzilla.redhat.com/show_bug.cgi?id=1554304 Could you pinpoint which kernel version started showing the regression, at least? The diffs are fairly wide between 4.15 and 4.19. thanks, Takashi
Re: [PATCH] dmaengine: remove DBGFS_FUNC_DECL()
On 11/30/2018 10:42 AM, Yangtao Li wrote:

We already have DEFINE_SHOW_ATTRIBUTE, so there is no need to define such a macro; remove DBGFS_FUNC_DECL. Also use the macro to simplify some code.

Signed-off-by: Yangtao Li
---
 drivers/dma/amba-pl08x.c     | 14 ++
 drivers/dma/mic_x100_dma.c   | 22 +++---
 drivers/dma/pxa_dma.c        | 24 
 drivers/dma/qcom/hidma_dbg.c | 33 ++---
 4 files changed, 23 insertions(+), 70 deletions(-)

Acked-by: Sinan Kaya for drivers/dma/qcom/hidma_dbg.c
Re: [PATCH v2] mm: page_mapped: don't assume compound page is huge or THP
On 30.11.18 13:06, Jan Stancek wrote:
> LTP proc01 testcase has been observed to rarely trigger crashes
> on arm64:
>         page_mapped+0x78/0xb4
>         stable_page_flags+0x27c/0x338
>         kpageflags_read+0xfc/0x164
>         proc_reg_read+0x7c/0xb8
>         __vfs_read+0x58/0x178
>         vfs_read+0x90/0x14c
>         SyS_read+0x60/0xc0
>
> Issue is that page_mapped() assumes that if compound page is not
> huge, then it must be THP. But if this is 'normal' compound page
> (COMPOUND_PAGE_DTOR), then following loop can keep running
> (for HPAGE_PMD_NR iterations) until it tries to read from memory
> that isn't mapped and triggers a panic:
>
>         for (i = 0; i < hpage_nr_pages(page); i++) {
>                 if (atomic_read(&page[i]._mapcount) >= 0)
>                         return true;
>         }
>
> I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
> with a custom kernel module [1] which:
>  - allocates compound page (PAGEC) of order 1
>  - allocates 2 normal pages (COPY), which are initialized to 0xff
>    (to satisfy _mapcount >= 0)
>  - 2 PAGEC page structs are copied to address of first COPY page
>  - second page of COPY is marked as not present
>  - call to page_mapped(COPY) now triggers fault on access to 2nd
>    COPY page at offset 0x30 (_mapcount)
>
> [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c
>
> Fix the loop to iterate for "1 << compound_order" pages.
>
> Debugged-by: Laszlo Ersek
> Suggested-by: "Kirill A. Shutemov"
> Signed-off-by: Jan Stancek
> ---
>  mm/util.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Changes in v2:
> - change the loop instead so we check also mapcount of subpages
>
> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..5c9c7359ee8a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -478,7 +478,7 @@ bool page_mapped(struct page *page)
>  		return true;
>  	if (PageHuge(page))
>  		return false;
> -	for (i = 0; i < hpage_nr_pages(page); i++) {
> +	for (i = 0; i < (1 << compound_order(page)); i++) {
>  		if (atomic_read(&page[i]._mapcount) >= 0)
>  			return true;
>  	}

Reviewed-by: David Hildenbrand

--
Thanks,

David / dhildenb
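The crux of the fix is the difference between the two loop bounds: hpage_nr_pages() reports the THP size for any compound head page, while 1 << compound_order(page) is the page's true size. A userspace sketch of the two bounds (constants assumed for x86-64 with 4K pages; the kernel structs themselves are not modeled here):

```c
/* HPAGE_PMD_NR on x86-64 with 4K pages: 2^9 = 512 subpages per THP. */
#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_NR    (1 << HPAGE_PMD_ORDER)

/* Old bound: hpage_nr_pages() effectively returned HPAGE_PMD_NR for any
 * compound page, because PageTransHuge() only checks for a head page --
 * the actual allocation order was ignored. */
static int old_loop_bound(unsigned int order)
{
    (void)order;
    return HPAGE_PMD_NR;
}

/* New bound: the real number of subpages in this compound page. */
static int new_loop_bound(unsigned int order)
{
    return 1 << order;
}
```

For the order-1 compound page in the reproducer, the old bound walks 512 `struct page` entries where only 2 exist; the new bound walks exactly 2.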
[PATCH v5 2/4] perf: add arm64 smmuv3 pmu driver
From: Neil Leeder

Adds a new driver to support the SMMUv3 PMU and add it into the
perf events framework.

Each SMMU node may have multiple PMUs associated with it, each of
which may support different events.

SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
<phys_addr_page> is the physical page address of the SMMU PMCG wrapped
to 4K boundary. For example, the PMCG at 0xff88840000 is named
smmuv3_pmcg_ff88840

Filtering by stream id is done by specifying filtering parameters
with the event. Options are:
   filter_enable    - 0 = no filtering, 1 = filtering enabled
   filter_span      - 0 = exact match, 1 = pattern match
   filter_stream_id - pattern to filter against

Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
                      filter_span=1,filter_stream_id=0x42/ -a netperf

Applies filter pattern 0x42 to transaction events, which means events
matching stream ids 0x42 and 0x43 are counted, as only the upper
StreamID bits are required to match the given filter. Further filtering
information is available in the SMMU documentation.

SMMU events are not attributable to a CPU, so task mode and sampling
are not supported.

Signed-off-by: Neil Leeder
Signed-off-by: Shameer Kolothum
---
 drivers/perf/Kconfig          |   9 +
 drivers/perf/Makefile         |   1 +
 drivers/perf/arm_smmuv3_pmu.c | 778 ++++++++++++++++++++++++++++++++++
 3 files changed, 788 insertions(+)
 create mode 100644 drivers/perf/arm_smmuv3_pmu.c

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 08ebaf7..92be6a3 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -52,6 +52,15 @@ config ARM_PMU_ACPI
 	depends on ARM_PMU && ACPI
 	def_bool y
 
+config ARM_SMMU_V3_PMU
+	bool "ARM SMMUv3 Performance Monitors Extension"
+	depends on ARM64 && ACPI && ARM_SMMU_V3
+	help
+	  Provides support for the SMMU version 3 performance monitor unit (PMU)
+	  on ARM-based systems.
+	  Adds the SMMU PMU into the perf events subsystem for
+	  monitoring SMMU performance events.
+
 config ARM_DSU_PMU
 	tristate "ARM DynamIQ Shared Unit (DSU) PMU"
 	depends on ARM64

diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index b3902bd..f10a932 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -4,6 +4,7 @@
 obj-$(CONFIG_ARM_CCN) += arm-ccn.o
 obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
 obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
 obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
+obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o
 obj-$(CONFIG_HISI_PMU) += hisilicon/
 obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
new file mode 100644
index 000..fb9dcd8
--- /dev/null
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -0,0 +1,778 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * This driver adds support for perf events to use the Performance
+ * Monitor Counter Groups (PMCG) associated with an SMMUv3 node
+ * to monitor that node.
+ *
+ * SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
+ * <phys_addr_page> is the physical page address of the SMMU PMCG wrapped
+ * to 4K boundary. For example, the PMCG at 0xff88840000 is named
+ * smmuv3_pmcg_ff88840
+ *
+ * Filtering by stream id is done by specifying filtering parameters
+ * with the event. Options are:
+ *   filter_enable    - 0 = no filtering, 1 = filtering enabled
+ *   filter_span      - 0 = exact match, 1 = pattern match
+ *   filter_stream_id - pattern to filter against
+ *
+ * To match a partial StreamID where the X most-significant bits must match
+ * but the Y least-significant bits might differ, STREAMID is programmed
+ * with a value that contains:
+ *  STREAMID[Y - 1] == 0.
+ *  STREAMID[Y - 2:0] == 1 (where Y > 1).
+ * The remainder of implemented bits of STREAMID (X bits, from bit Y upwards)
+ * contain a value to match from the corresponding bits of event StreamID.
+ *
+ * Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
+ *                       filter_span=1,filter_stream_id=0x42/ -a netperf
+ * Applies filter pattern 0x42 to transaction events, which means events
+ * matching stream ids 0x42 and 0x43 are counted. Further filtering
+ * information is available in the SMMU documentation.
+ *
+ * SMMU events are not attributable to a CPU, so task mode and sampling
+ * are not supported.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define SMMU_PMCG_EVCNTR0               0x0
+#define SMMU_PMCG_EVCNTR(n, stride)     (SMMU_PMCG_EVCNTR0 + (n) * (stride))
+#define SMMU_PMCG_EVTYPER0              0x400
+#define SMMU_PMCG_EVTYPER(n)            (SMMU_PMCG_EVTYPER0 + (n) * 4)
+#define SMMU_PMCG_SID_SPAN_SHIFT        29
+#define SMMU_PMCG_SID_SPAN_MASK         GENMASK(29, 29)
+#define SMMU_PMCG_SMR0
[PATCH v5 1/4] acpi: arm64: add iort support for PMCG
From: Neil Leeder

Add support for the SMMU Performance Monitor Counter Group
information from ACPI. This is in preparation for its use in the
SMMUv3 PMU driver.

Signed-off-by: Neil Leeder
Signed-off-by: Hanjun Guo
Signed-off-by: Shameer Kolothum
---
 drivers/acpi/arm64/iort.c | 97 +++++++++++++++++++++++++++++++++----------
 1 file changed, 76 insertions(+), 21 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 2a361e2..2da08e1 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -356,7 +356,8 @@ static struct acpi_iort_node *iort_node_get_id(struct acpi_iort_node *node,
 	if (map->flags & ACPI_IORT_ID_SINGLE_MAPPING) {
 		if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
 		    node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX ||
-		    node->type == ACPI_IORT_NODE_SMMU_V3) {
+		    node->type == ACPI_IORT_NODE_SMMU_V3 ||
+		    node->type == ACPI_IORT_NODE_PMCG) {
 			*id_out = map->output_base;
 			return parent;
 		}
@@ -394,6 +395,8 @@ static int iort_get_id_mapping_index(struct acpi_iort_node *node)
 		}
 
 		return smmu->id_mapping_index;
+	case ACPI_IORT_NODE_PMCG:
+		return 0;
 	default:
 		return -EINVAL;
 	}
@@ -1216,14 +1219,23 @@ static void __init arm_smmu_v3_init_resources(struct resource *res,
 	}
 }
 
-static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
+static void __init arm_smmu_v3_dma_configure(struct device *dev,
+					     struct acpi_iort_node *node)
 {
 	struct acpi_iort_smmu_v3 *smmu;
+	enum dev_dma_attr attr;
 
 	/* Retrieve SMMUv3 specific data */
 	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 
-	return smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE;
+	attr = (smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE) ?
+			DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT;
+
+	/* We expect the dma masks to be equivalent for all SMMUv3 set-ups */
+	dev->dma_mask = &dev->coherent_dma_mask;
+
+	/* Configure DMA for the page table walker */
+	acpi_dma_configure(dev, attr);
 }
 
 #if defined(CONFIG_ACPI_NUMA)
@@ -1299,20 +1311,64 @@ static void __init arm_smmu_init_resources(struct resource *res,
 	}
 }
 
-static bool __init arm_smmu_is_coherent(struct acpi_iort_node *node)
+static void __init arm_smmu_dma_configure(struct device *dev,
+					  struct acpi_iort_node *node)
 {
 	struct acpi_iort_smmu *smmu;
+	enum dev_dma_attr attr;
 
 	/* Retrieve SMMU specific data */
 	smmu = (struct acpi_iort_smmu *)node->node_data;
 
-	return smmu->flags & ACPI_IORT_SMMU_COHERENT_WALK;
+	attr = (smmu->flags & ACPI_IORT_SMMU_COHERENT_WALK) ?
+			DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT;
+
+	/* We expect the dma masks to be equivalent for SMMU set-ups */
+	dev->dma_mask = &dev->coherent_dma_mask;
+
+	/* Configure DMA for the page table walker */
+	acpi_dma_configure(dev, attr);
+}
+
+static int __init arm_smmu_v3_pmcg_count_resources(struct acpi_iort_node *node)
+{
+	struct acpi_iort_pmcg *pmcg;
+
+	/* Retrieve PMCG specific data */
+	pmcg = (struct acpi_iort_pmcg *)node->node_data;
+
+	/*
+	 * There are always 2 memory resources.
+	 * If the overflow_gsiv is present then add that for a total of 3.
+	 */
+	return pmcg->overflow_gsiv ? 3 : 2;
+}
+
+static void __init arm_smmu_v3_pmcg_init_resources(struct resource *res,
+						   struct acpi_iort_node *node)
+{
+	struct acpi_iort_pmcg *pmcg;
+
+	/* Retrieve PMCG specific data */
+	pmcg = (struct acpi_iort_pmcg *)node->node_data;
+
+	res[0].start = pmcg->page0_base_address;
+	res[0].end = pmcg->page0_base_address + SZ_4K - 1;
+	res[0].flags = IORESOURCE_MEM;
+	res[1].start = pmcg->page1_base_address;
+	res[1].end = pmcg->page1_base_address + SZ_4K - 1;
+	res[1].flags = IORESOURCE_MEM;
+
+	if (pmcg->overflow_gsiv)
+		acpi_iort_register_irq(pmcg->overflow_gsiv, "overflow",
+				       ACPI_EDGE_SENSITIVE, &res[2]);
 }
 
 struct iort_dev_config {
 	const char *name;
 	int (*dev_init)(struct acpi_iort_node *node);
-	bool (*dev_is_coherent)(struct acpi_iort_node *node);
+	void (*dev_dma_configure)(struct device *dev,
+				  struct acpi_iort_node *node);
 	int (*dev_count_resources)(struct acpi_iort_node *node);
 	void (*dev_init_resources)(struct resource *res,
 				   struct acpi_iort_node *node);
@@ -1322,7 +1378,7 @@ struct iort_dev_config {
 
 static const struct iort_dev_config
[PATCH v5 0/4] arm64 SMMUv3 PMU driver with IORT support
This adds a driver for the SMMUv3 PMU into the perf framework. It
includes an IORT update to support PM Counter Groups. This is based
on the initial work done by Neil Leeder [1].

SMMUv3 PMCG devices are named as smmuv3_pmcg_<phys_addr_page> where
<phys_addr_page> is the physical page address of the SMMU PMCG.
For example, the PMCG at 0xff88840000 is named smmuv3_pmcg_ff88840

Usage example:

For common arch supported events:
perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
   filter_span=1,filter_stream_id=0x42/ -a netperf

For IMP DEF events:
perf stat -e smmuv3_pmcg_ff88840/event=id/ -a netperf

This is sanity tested on a HiSilicon platform that requires a quirk
to run it properly. As per HiSilicon erratum #162001800, PMCG event
counter registers (SMMU_PMCG_EVCNTRn) on HiSilicon Hip08 platforms
are read only and this prevents the software from setting the initial
period on event start. Unfortunately we were a bit late in the cycle
to detect this issue and now require a software workaround for this.
Patch #4 is added to this series to provide a workaround for this
issue.

Further testing on supported platforms is very much welcome.

v4 --> v5
 - IORT code is modified to pass the option/quirk flags to the driver
   through platform_data (patch #4), based on Robin's comments.
 - Removed COMPILE_TEST (patch #2).

v3 --> v4
 - Addressed comments from Jean and Robin.
 - Merged dma config callbacks as per Lorenzo's comments (patch #1).
 - Added handling of Global(Counter0) filter settings mode (patch #2).
 - Added patch #4 to address HiSilicon erratum #162001800.

v2 --> v3
 - Addressed comments from Robin.
 - Removed iort helper function to retrieve the PMCG reference smmu.
 - PMCG devices are now named using the base address.

v1 --> v2
 - Addressed comments from Robin.
 - Added a helper to retrieve the associated smmu dev and named PMUs
   to make the association visible to user.
 - Added MSI support for overflow irq

[1] https://www.spinics.net/lists/arm-kernel/msg598591.html

Neil Leeder (2):
  acpi: arm64: add iort support for PMCG
  perf: add arm64 smmuv3 pmu driver

Shameer Kolothum (2):
  perf/smmuv3: Add MSI irq support
  perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk

 drivers/acpi/arm64/iort.c     | 127 ++++++-
 drivers/perf/Kconfig          |   9 +
 drivers/perf/Makefile         |   1 +
 drivers/perf/arm_smmuv3_pmu.c | 859 ++++++++++++++++++++++++++++++++++
 include/linux/acpi_iort.h     |   3 +
 5 files changed, 975 insertions(+), 24 deletions(-)
 create mode 100644 drivers/perf/arm_smmuv3_pmu.c

--
2.7.4
[PATCH v5 3/4] perf/smmuv3: Add MSI irq support
This adds support for MSI-based counter overflow interrupt.

Signed-off-by: Shameer Kolothum
---
 drivers/perf/arm_smmuv3_pmu.c | 58 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index fb9dcd8..71d10a0 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -68,6 +68,7 @@
 #define SMMU_PMCG_OVSSET0               0xCC0
 #define SMMU_PMCG_CFGR                  0xE00
 #define SMMU_PMCG_CFGR_RELOC_CTRS       BIT(20)
+#define SMMU_PMCG_CFGR_MSI              BIT(21)
 #define SMMU_PMCG_CFGR_SID_FILTER_TYPE  BIT(23)
 #define SMMU_PMCG_CFGR_SIZE_MASK        GENMASK(13, 8)
 #define SMMU_PMCG_CFGR_NCTR_MASK        GENMASK(5, 0)
@@ -78,6 +79,12 @@
 #define SMMU_PMCG_IRQ_CTRL              0xE50
 #define SMMU_PMCG_IRQ_CTRL_IRQEN        BIT(0)
 #define SMMU_PMCG_IRQ_CFG0              0xE58
+#define SMMU_PMCG_IRQ_CFG1              0xE60
+#define SMMU_PMCG_IRQ_CFG2              0xE64
+
+/* MSI config fields */
+#define MSI_CFG0_ADDR_MASK              GENMASK_ULL(51, 2)
+#define MSI_CFG2_MEMATTR_DEVICE_nGnRE   0x1
 
 #define SMMU_DEFAULT_FILTER_SPAN        1
 #define SMMU_DEFAULT_FILTER_STREAM_ID   GENMASK(31, 0)
@@ -587,11 +594,62 @@ static irqreturn_t smmu_pmu_handle_irq(int irq_num, void *data)
 	return IRQ_HANDLED;
 }
 
+static void smmu_pmu_free_msis(void *data)
+{
+	struct device *dev = data;
+
+	platform_msi_domain_free_irqs(dev);
+}
+
+static void smmu_pmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
+{
+	phys_addr_t doorbell;
+	struct device *dev = msi_desc_to_dev(desc);
+	struct smmu_pmu *pmu = dev_get_drvdata(dev);
+
+	doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
+	doorbell &= MSI_CFG0_ADDR_MASK;
+
+	writeq_relaxed(doorbell, pmu->reg_base + SMMU_PMCG_IRQ_CFG0);
+	writel_relaxed(msg->data, pmu->reg_base + SMMU_PMCG_IRQ_CFG1);
+	writel_relaxed(MSI_CFG2_MEMATTR_DEVICE_nGnRE,
+		       pmu->reg_base + SMMU_PMCG_IRQ_CFG2);
+}
+
+static void smmu_pmu_setup_msi(struct smmu_pmu *pmu)
+{
+	struct msi_desc *desc;
+	struct device *dev = pmu->dev;
+	int ret;
+
+	/* Clear MSI address reg */
+	writeq_relaxed(0, pmu->reg_base + SMMU_PMCG_IRQ_CFG0);
+
+	/* MSI supported or not */
+	if (!(readl(pmu->reg_base + SMMU_PMCG_CFGR) & SMMU_PMCG_CFGR_MSI))
+		return;
+
+	ret = platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg);
+	if (ret) {
+		dev_warn(dev, "failed to allocate MSIs\n");
+		return;
+	}
+
+	desc = first_msi_entry(dev);
+	if (desc)
+		pmu->irq = desc->irq;
+
+	/* Add callback to free MSIs on teardown */
+	devm_add_action(dev, smmu_pmu_free_msis, dev);
+}
+
 static int smmu_pmu_setup_irq(struct smmu_pmu *pmu)
 {
 	unsigned long flags = IRQF_NOBALANCING | IRQF_SHARED | IRQF_NO_THREAD;
 	int irq, ret = -ENXIO;
 
+	smmu_pmu_setup_msi(pmu);
+
 	irq = pmu->irq;
 	if (irq)
 		ret = devm_request_irq(pmu->dev, irq, smmu_pmu_handle_irq,
--
2.7.4
[PATCH v5 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk
HiSilicon erratum 162001800 describes the limitation of the SMMUv3
PMCG implementation on HiSilicon Hip08 platforms.

On these platforms, the PMCG event counter registers
(SMMU_PMCG_EVCNTRn) are read only and as a result it is not possible
to set the initial counter period value on event monitor start.

To work around this, the current value of the counter is read and
used for delta calculations. OEM information from the ACPI header is
used to identify the affected hardware platforms.

Signed-off-by: Shameer Kolothum
---
 drivers/acpi/arm64/iort.c     | 30 ++++++++++++++++++------
 drivers/perf/arm_smmuv3_pmu.c | 35 ++++++++++++++++++++++------
 include/linux/acpi_iort.h     |  3 +++
 3 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 2da08e1..d174379 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1364,6 +1364,22 @@ static void __init arm_smmu_v3_pmcg_init_resources(struct resource *res,
 			       ACPI_EDGE_SENSITIVE, &res[2]);
 }
 
+static struct acpi_platform_list pmcg_evcntr_rdonly_list[] __initdata = {
+	/* HiSilicon Erratum 162001800 */
+	{"HISI  ", "HIP08   ", 0, ACPI_SIG_IORT, greater_than_or_equal},
+	{ }
+};
+
+static int __init arm_smmu_v3_pmcg_add_platdata(struct platform_device *pdev)
+{
+	u32 options = 0;
+
+	if (acpi_match_platform_list(pmcg_evcntr_rdonly_list) >= 0)
+		options |= IORT_PMCG_EVCNTR_RDONLY;
+
+	return platform_device_add_data(pdev, &options, sizeof(options));
+}
+
 struct iort_dev_config {
 	const char *name;
 	int (*dev_init)(struct acpi_iort_node *node);
@@ -1374,6 +1390,7 @@ struct iort_dev_config {
 				   struct acpi_iort_node *node);
 	void (*dev_set_proximity)(struct device *dev,
 				  struct acpi_iort_node *node);
+	int (*dev_add_platdata)(struct platform_device *pdev);
 };
 
 static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
@@ -1395,6 +1412,7 @@ static const struct iort_dev_config iort_arm_smmu_v3_pmcg_cfg __initconst = {
 	.name = "arm-smmu-v3-pmu",
 	.dev_count_resources = arm_smmu_v3_pmcg_count_resources,
 	.dev_init_resources = arm_smmu_v3_pmcg_init_resources,
+	.dev_add_platdata = arm_smmu_v3_pmcg_add_platdata,
 };
 
 static __init const struct iort_dev_config *iort_get_dev_cfg(
@@ -1455,10 +1473,16 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 		goto dev_put;
 
 	/*
-	 * Add a copy of IORT node pointer to platform_data to
-	 * be used to retrieve IORT data information.
+	 * Platform devices based on PMCG nodes uses platform_data to
+	 * pass quirk flags to the driver. For others, add a copy of
+	 * IORT node pointer to platform_data to be used to retrieve
+	 * IORT data information.
 	 */
-	ret = platform_device_add_data(pdev, &node, sizeof(node));
+	if (ops->dev_add_platdata)
+		ret = ops->dev_add_platdata(pdev);
+	else
+		ret = platform_device_add_data(pdev, &node, sizeof(node));
+
 	if (ret)
 		goto dev_put;
 
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 71d10a0..02107a1 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -35,6 +35,7 @@
  */
 
 #include
+#include
 #include
 #include
@@ -111,6 +112,7 @@ struct smmu_pmu {
 	struct device *dev;
 	void __iomem *reg_base;
 	void __iomem *reloc_base;
+	u32 options;
 	u64 counter_present_mask;
 	u64 counter_mask;
 };
@@ -224,12 +226,25 @@ static void smmu_pmu_set_period(struct smmu_pmu *smmu_pmu,
 	u32 idx = hwc->idx;
 	u64 new;
 
-	/*
-	 * We limit the max period to half the max counter value of the counter
-	 * size, so that even in the case of extreme interrupt latency the
-	 * counter will (hopefully) not wrap past its initial value.
-	 */
-	new = smmu_pmu->counter_mask >> 1;
+	if (smmu_pmu->options & IORT_PMCG_EVCNTR_RDONLY) {
+		/*
+		 * On platforms that require this quirk, if the counter starts
+		 * at < half_counter value and wraps, the current logic of
+		 * handling the overflow may not work. It is expected that,
+		 * those platforms will have full 64 counter bits implemented
+		 * so that such a possibility is remote (eg: HiSilicon HIP08).
+		 */
+		new = smmu_pmu_counter_get_value(smmu_pmu, idx);
+	} else {
+		/*
+		 * We limit the max period to half the max counter value
+		 * of the counter size, so that even in the case of extreme
+		 * interrupt latency the counter will (hopefully) not wrap
+		 * past its initial value.
[PATCH v5 3/4] perf/smmuv3: Add MSI irq support
This adds support for MSI-based counter overflow interrupt. Signed-off-by: Shameer Kolothum --- drivers/perf/arm_smmuv3_pmu.c | 58 +++ 1 file changed, 58 insertions(+) diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index fb9dcd8..71d10a0 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -68,6 +68,7 @@ #define SMMU_PMCG_OVSSET0 0xCC0 #define SMMU_PMCG_CFGR 0xE00 #define SMMU_PMCG_CFGR_RELOC_CTRS BIT(20) +#define SMMU_PMCG_CFGR_MSI BIT(21) #define SMMU_PMCG_CFGR_SID_FILTER_TYPE BIT(23) #define SMMU_PMCG_CFGR_SIZE_MASKGENMASK(13, 8) #define SMMU_PMCG_CFGR_NCTR_MASKGENMASK(5, 0) @@ -78,6 +79,12 @@ #define SMMU_PMCG_IRQ_CTRL 0xE50 #define SMMU_PMCG_IRQ_CTRL_IRQENBIT(0) #define SMMU_PMCG_IRQ_CFG0 0xE58 +#define SMMU_PMCG_IRQ_CFG1 0xE60 +#define SMMU_PMCG_IRQ_CFG2 0xE64 + +/* MSI config fields */ +#define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2) +#define MSI_CFG2_MEMATTR_DEVICE_nGnRE 0x1 #define SMMU_DEFAULT_FILTER_SPAN1 #define SMMU_DEFAULT_FILTER_STREAM_ID GENMASK(31, 0) @@ -587,11 +594,62 @@ static irqreturn_t smmu_pmu_handle_irq(int irq_num, void *data) return IRQ_HANDLED; } +static void smmu_pmu_free_msis(void *data) +{ + struct device *dev = data; + + platform_msi_domain_free_irqs(dev); +} + +static void smmu_pmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) +{ + phys_addr_t doorbell; + struct device *dev = msi_desc_to_dev(desc); + struct smmu_pmu *pmu = dev_get_drvdata(dev); + + doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo; + doorbell &= MSI_CFG0_ADDR_MASK; + + writeq_relaxed(doorbell, pmu->reg_base + SMMU_PMCG_IRQ_CFG0); + writel_relaxed(msg->data, pmu->reg_base + SMMU_PMCG_IRQ_CFG1); + writel_relaxed(MSI_CFG2_MEMATTR_DEVICE_nGnRE, + pmu->reg_base + SMMU_PMCG_IRQ_CFG2); +} + +static void smmu_pmu_setup_msi(struct smmu_pmu *pmu) +{ + struct msi_desc *desc; + struct device *dev = pmu->dev; + int ret; + + /* Clear MSI address reg */ + writeq_relaxed(0, pmu->reg_base + 
SMMU_PMCG_IRQ_CFG0); + + /* MSI supported or not */ + if (!(readl(pmu->reg_base + SMMU_PMCG_CFGR) & SMMU_PMCG_CFGR_MSI)) + return; + + ret = platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg); + if (ret) { + dev_warn(dev, "failed to allocate MSIs\n"); + return; + } + + desc = first_msi_entry(dev); + if (desc) + pmu->irq = desc->irq; + + /* Add callback to free MSIs on teardown */ + devm_add_action(dev, smmu_pmu_free_msis, dev); +} + static int smmu_pmu_setup_irq(struct smmu_pmu *pmu) { unsigned long flags = IRQF_NOBALANCING | IRQF_SHARED | IRQF_NO_THREAD; int irq, ret = -ENXIO; + smmu_pmu_setup_msi(pmu); + irq = pmu->irq; if (irq) ret = devm_request_irq(pmu->dev, irq, smmu_pmu_handle_irq, -- 2.7.4
[PATCH v5 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk
HiSilicon erratum 162001800 describes the limitation of SMMUv3 PMCG implementation on HiSilicon Hip08 platforms.

On these platforms, the PMCG event counter registers (SMMU_PMCG_EVCNTRn) are read only and as a result it is not possible to set the initial counter period value on event monitor start.

To work around this, the current value of the counter is read and used for delta calculations. OEM information from ACPI header is used to identify the affected hardware platforms.

Signed-off-by: Shameer Kolothum
---
 drivers/acpi/arm64/iort.c     | 30 +++---
 drivers/perf/arm_smmuv3_pmu.c | 35 +--
 include/linux/acpi_iort.h     |  3 +++
 3 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 2da08e1..d174379 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1364,6 +1364,22 @@ static void __init arm_smmu_v3_pmcg_init_resources(struct resource *res,
 			       ACPI_EDGE_SENSITIVE, &res[2]);
 }
 
+static struct acpi_platform_list pmcg_evcntr_rdonly_list[] __initdata = {
+	/* HiSilicon Erratum 162001800 */
+	{"HISI ", "HIP08 ", 0, ACPI_SIG_IORT, greater_than_or_equal},
+	{ }
+};
+
+static int __init arm_smmu_v3_pmcg_add_platdata(struct platform_device *pdev)
+{
+	u32 options = 0;
+
+	if (acpi_match_platform_list(pmcg_evcntr_rdonly_list) >= 0)
+		options |= IORT_PMCG_EVCNTR_RDONLY;
+
+	return platform_device_add_data(pdev, &options, sizeof(options));
+}
+
 struct iort_dev_config {
 	const char *name;
 	int (*dev_init)(struct acpi_iort_node *node);
@@ -1374,6 +1390,7 @@ struct iort_dev_config {
 				     struct acpi_iort_node *node);
 	void (*dev_set_proximity)(struct device *dev,
 				  struct acpi_iort_node *node);
+	int (*dev_add_platdata)(struct platform_device *pdev);
 };
 
 static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
@@ -1395,6 +1412,7 @@ static const struct iort_dev_config iort_arm_smmu_v3_pmcg_cfg __initconst = {
 	.name = "arm-smmu-v3-pmu",
 	.dev_count_resources = arm_smmu_v3_pmcg_count_resources,
 	.dev_init_resources = arm_smmu_v3_pmcg_init_resources,
+	.dev_add_platdata = arm_smmu_v3_pmcg_add_platdata,
 };
 
 static __init const struct iort_dev_config *iort_get_dev_cfg(
@@ -1455,10 +1473,16 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 		goto dev_put;
 
 	/*
-	 * Add a copy of IORT node pointer to platform_data to
-	 * be used to retrieve IORT data information.
+	 * Platform devices based on PMCG nodes uses platform_data to
+	 * pass quirk flags to the driver. For others, add a copy of
+	 * IORT node pointer to platform_data to be used to retrieve
+	 * IORT data information.
 	 */
-	ret = platform_device_add_data(pdev, &node, sizeof(node));
+	if (ops->dev_add_platdata)
+		ret = ops->dev_add_platdata(pdev);
+	else
+		ret = platform_device_add_data(pdev, &node, sizeof(node));
+
 	if (ret)
 		goto dev_put;
 
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 71d10a0..02107a1 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -35,6 +35,7 @@
  */
 
 #include
+#include <linux/acpi_iort.h>
 #include
 #include
 #include
@@ -111,6 +112,7 @@ struct smmu_pmu {
 	struct device *dev;
 	void __iomem *reg_base;
 	void __iomem *reloc_base;
+	u32 options;
 	u64 counter_present_mask;
 	u64 counter_mask;
 };
@@ -224,12 +226,25 @@ static void smmu_pmu_set_period(struct smmu_pmu *smmu_pmu,
 	u32 idx = hwc->idx;
 	u64 new;
 
-	/*
-	 * We limit the max period to half the max counter value of the counter
-	 * size, so that even in the case of extreme interrupt latency the
-	 * counter will (hopefully) not wrap past its initial value.
-	 */
-	new = smmu_pmu->counter_mask >> 1;
+	if (smmu_pmu->options & IORT_PMCG_EVCNTR_RDONLY) {
+		/*
+		 * On platforms that require this quirk, if the counter starts
+		 * at < half_counter value and wraps, the current logic of
+		 * handling the overflow may not work. It is expected that,
+		 * those platforms will have full 64 counter bits implemented
+		 * so that such a possibility is remote(eg: HiSilicon HIP08).
+		 */
+		new = smmu_pmu_counter_get_value(smmu_pmu, idx);
+	} else {
+		/*
+		 * We limit the max period to half the max counter value
+		 * of the counter size, so that even in the case of extreme
+		 * interrupt latency the counter will (hopefully) not wrap
+		 * past its initial value.
+		 */
+		new = smmu_pmu->counter_mask >> 1;
+	}
[PATCH 1/9] tools/lib/traceevent: Implemented new API tep_get_ref()
From: Tzvetomir Stoyanov

This patch implements a new API of the traceevent library:

 int tep_get_ref(struct tep_handle *tep);

The API returns the reference counter "ref_count" of the tep handle. As "struct tep_handle" is internal only, its members cannot be accessed by library users, so this API is used to get the reference counter.

Signed-off-by: Tzvetomir Stoyanov
Signed-off-by: Steven Rostedt (VMware)
---
 tools/lib/traceevent/event-parse.c | 7 +++
 tools/lib/traceevent/event-parse.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 3692f29fee46..a5f3e37f81b5 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -6730,6 +6730,13 @@ void tep_ref(struct tep_handle *pevent)
 	pevent->ref_count++;
 }
 
+int tep_get_ref(struct tep_handle *tep)
+{
+	if (tep)
+		return tep->ref_count;
+	return 0;
+}
+
 void tep_free_format_field(struct tep_format_field *field)
 {
 	free(field->type);
diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
index 16bf4c890b6f..44ec26c72c2e 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -581,6 +581,7 @@ struct tep_handle *tep_alloc(void);
 void tep_free(struct tep_handle *pevent);
 void tep_ref(struct tep_handle *pevent);
 void tep_unref(struct tep_handle *pevent);
+int tep_get_ref(struct tep_handle *tep);
 
 /* access to the internal parser */
 void tep_buffer_init(const char *buf, unsigned long long size);
-- 
2.19.1
[PATCH 5/9] tools/lib/traceevent: Rename tep_free_format() to tep_free_event()
From: Tzvetomir Stoyanov

In order to make libtraceevent into a proper library, variables, data structures and functions require a unique prefix to prevent name space conflicts. This renames tep_free_format() to tep_free_event(), which describes more closely the purpose of the function.

Signed-off-by: Tzvetomir Stoyanov
Signed-off-by: Steven Rostedt (VMware)
---
 tools/lib/traceevent/event-parse.c | 6 +++---
 tools/lib/traceevent/event-parse.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index bacd86c41563..848cd76b91a7 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -6154,7 +6154,7 @@ __parse_event(struct tep_handle *pevent,
 	return 0;
 
 event_add_failed:
-	tep_free_format(event);
+	tep_free_event(event);
 	return ret;
 }
 
@@ -6763,7 +6763,7 @@ static void free_formats(struct tep_format *format)
 	free_format_fields(format->fields);
 }
 
-void tep_free_format(struct tep_event *event)
+void tep_free_event(struct tep_event *event)
 {
 	free(event->name);
 	free(event->system);
@@ -6849,7 +6849,7 @@ void tep_free(struct tep_handle *pevent)
 	}
 
 	for (i = 0; i < pevent->nr_events; i++)
-		tep_free_format(pevent->events[i]);
+		tep_free_event(pevent->events[i]);
 
 	while (pevent->handlers) {
 		handle = pevent->handlers;
diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
index 2a1a644c5ec8..950ad185a5c4 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -475,7 +475,7 @@ enum tep_errno tep_parse_format(struct tep_handle *pevent,
 				struct tep_event **eventp,
 				const char *buf, unsigned long size,
 				const char *sys);
-void tep_free_format(struct tep_event *event);
+void tep_free_event(struct tep_event *event);
 void tep_free_format_field(struct tep_format_field *field);
 
 void *tep_get_field_raw(struct trace_seq *s, struct tep_event *event,
-- 
2.19.1
[PATCH 2/9] tools/lib/traceevent: Added support for pkg-config
From: Tzvetomir Stoyanov

This patch implements integration with the pkg-config framework. pkg-config can be used by library users to determine the required CFLAGS and LDFLAGS in order to use the library.

Signed-off-by: Tzvetomir Stoyanov
Signed-off-by: Steven Rostedt (VMware)
---
 tools/lib/traceevent/Makefile                  | 26 ---
 .../lib/traceevent/libtraceevent.pc.template   | 10 +++
 2 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 tools/lib/traceevent/libtraceevent.pc.template

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index 0b4e833088a4..adb16f845ab3 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -25,6 +25,7 @@ endef
 $(call allow-override,CC,$(CROSS_COMPILE)gcc)
 $(call allow-override,AR,$(CROSS_COMPILE)ar)
 $(call allow-override,NM,$(CROSS_COMPILE)nm)
+$(call allow-override,PKG_CONFIG,pkg-config)
 
 EXT = -std=gnu99
 INSTALL = install
@@ -47,6 +48,8 @@ prefix ?= /usr/local
 libdir = $(prefix)/$(libdir_relative)
 man_dir = $(prefix)/share/man
 man_dir_SQ = '$(subst ','\'',$(man_dir))'
+pkgconfig_dir ?= $(word 1,$(shell $(PKG_CONFIG) \
+			--variable pc_path pkg-config | tr ":" " "))
 
 export man_dir man_dir_SQ INSTALL
 export DESTDIR DESTDIR_SQ
@@ -270,7 +273,19 @@ define do_generate_dynamic_list_file
 	fi
 endef
 
-install_lib: all_cmd install_plugins
+PKG_CONFIG_FILE = libtraceevent.pc
+define do_install_pkgconfig_file
+	if [ -n "${pkgconfig_dir}" ]; then \
+		cp -f ${PKG_CONFIG_FILE}.template ${PKG_CONFIG_FILE}; \
+		sed -i "s|INSTALL_PREFIX|${1}|g" ${PKG_CONFIG_FILE}; \
+		sed -i "s|LIB_VERSION|${EVENT_PARSE_VERSION}|g" ${PKG_CONFIG_FILE}; \
+		$(call do_install,$(PKG_CONFIG_FILE),$(pkgconfig_dir),644); \
+	else \
+		(echo Failed to locate pkg-config directory) 1>&2; \
+	fi
+endef
+
+install_lib: all_cmd install_plugins install_pkgconfig
 	$(call QUIET_INSTALL, $(LIB_TARGET)) \
 		$(call do_install_mkdir,$(libdir_SQ)); \
 		cp -fpR $(LIB_INSTALL) $(DESTDIR)$(libdir_SQ)
@@ -279,6 +294,10 @@ install_plugins: $(PLUGINS)
 	$(call QUIET_INSTALL, trace_plugins) \
 		$(call do_install_plugins, $(PLUGINS))
 
+install_pkgconfig:
+	$(call QUIET_INSTALL, $(PKG_CONFIG_FILE)) \
+		$(call do_install_pkgconfig_file,$(prefix))
+
 install_headers:
 	$(call QUIET_INSTALL, headers) \
 		$(call do_install,event-parse.h,$(prefix)/include/traceevent,644); \
@@ -289,8 +308,9 @@ install: install_lib
 
 clean:
 	$(call QUIET_CLEAN, libtraceevent) \
-		$(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d .*.cmd \
-		$(RM) TRACEEVENT-CFLAGS tags TAGS
+		$(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d .*.cmd; \
+		$(RM) TRACEEVENT-CFLAGS tags TAGS; \
+		$(RM) $(PKG_CONFIG_FILE)
 
 PHONY += force plugins
 force:
diff --git a/tools/lib/traceevent/libtraceevent.pc.template b/tools/lib/traceevent/libtraceevent.pc.template
new file mode 100644
index 000000000000..42e4d6cb6b9e
--- /dev/null
+++ b/tools/lib/traceevent/libtraceevent.pc.template
@@ -0,0 +1,10 @@
+prefix=INSTALL_PREFIX
+libdir=${prefix}/lib64
+includedir=${prefix}/include/traceevent
+
+Name: libtraceevent
+URL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
+Description: Linux kernel trace event library
+Version: LIB_VERSION
+Cflags: -I${includedir}
+Libs: -L${libdir} -ltraceevent
-- 
2.19.1
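The do_install_pkgconfig_file helper boils down to two sed substitutions on the template before installing it. The same transformation can be reproduced outside make; the prefix and version values below are illustrative stand-ins for $(prefix) and $(EVENT_PARSE_VERSION).

```shell
# Recreate the template substitution that do_install_pkgconfig_file performs.
tmpdir=$(mktemp -d)
cat > "$tmpdir/libtraceevent.pc.template" <<'EOF'
prefix=INSTALL_PREFIX
libdir=${prefix}/lib64
includedir=${prefix}/include/traceevent

Name: libtraceevent
Description: Linux kernel trace event library
Version: LIB_VERSION
Cflags: -I${includedir}
Libs: -L${libdir} -ltraceevent
EOF

# INSTALL_PREFIX <- $(prefix), LIB_VERSION <- $(EVENT_PARSE_VERSION)
cp -f "$tmpdir/libtraceevent.pc.template" "$tmpdir/libtraceevent.pc"
sed -i "s|INSTALL_PREFIX|/usr/local|g" "$tmpdir/libtraceevent.pc"
sed -i "s|LIB_VERSION|1.1.0|g" "$tmpdir/libtraceevent.pc"

grep -E '^(prefix|Version)' "$tmpdir/libtraceevent.pc"
```

Once the generated .pc file is installed into pkg-config's search path, consumers would typically build with something like `cc $(pkg-config --cflags --libs libtraceevent) mytool.c`.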
[PATCH 0/9] tools/lib/traceevent: More updates to make libtraceevent into a library
Arnaldo and Jiri,

Here's more patches to get us a step closer to having a legitimate standalone library for libtraceevent. I'm currently reviewing man pages, which I want finished before we call it done.

Please pull this tree (based on current tip/perf/core) or apply the patches.

Thanks!

-- Steve

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/core

Head SHA1: 1a1dbb61ee77226e9097bfe307219abf5df8e4cd

Tzvetomir Stoyanov (9):
      tools/lib/traceevent: Implemented new API tep_get_ref()
      tools/lib/traceevent: Added support for pkg-config
      tools/lib/traceevent: Install trace-seq.h API header file
      tools/lib/traceevent, tools/perf: Rename struct tep_event_format to struct tep_event
      tools/lib/traceevent: Rename tep_free_format() to tep_free_event()
      tools/perf: traceevent API cleanup, remove __tep_data2host*()
      tools/lib/traceevent: traceevent API cleanup
      tools/lib/traceevent: Introduce new libtracevent API: tep_override_comm()
      tools/lib/traceevent: Add sanity check to is_timestamp_in_us()

 tools/lib/traceevent/Makefile                      |  27 +-
 tools/lib/traceevent/event-parse-api.c             |   8 +-
 tools/lib/traceevent/event-parse-local.h           |  13 +-
 tools/lib/traceevent/event-parse.c                 | 283 -
 tools/lib/traceevent/event-parse.h                 |  78 +++---
 tools/lib/traceevent/libtraceevent.pc.template     |  10 +
 tools/lib/traceevent/parse-filter.c                |  42 +--
 tools/lib/traceevent/plugin_function.c             |   2 +-
 tools/lib/traceevent/plugin_hrtimer.c              |   4 +-
 tools/lib/traceevent/plugin_kmem.c                 |   2 +-
 tools/lib/traceevent/plugin_kvm.c                  |  14 +-
 tools/lib/traceevent/plugin_mac80211.c             |   4 +-
 tools/lib/traceevent/plugin_sched_switch.c         |   4 +-
 tools/perf/builtin-trace.c                         |   2 +-
 tools/perf/util/evsel.h                            |   4 +-
 tools/perf/util/header.c                           |   2 +-
 tools/perf/util/python.c                           |   4 +-
 .../perf/util/scripting-engines/trace-event-perl.c |   6 +-
 .../util/scripting-engines/trace-event-python.c    |   8 +-
 tools/perf/util/trace-event-parse.c                |  16 +-
 tools/perf/util/trace-event-read.c                 |   4 +-
 tools/perf/util/trace-event.c                      |   8 +-
 tools/perf/util/trace-event.h                      |  16 +-
 23 files changed, 317 insertions(+), 244 deletions(-)
 create mode 100644 tools/lib/traceevent/libtraceevent.pc.template