[PATCH 1/1] tracing/tools: fix a couple of spelling mistakes
From: Colin Ian King

There is a spelling mistake in the -g help option, I believe it should be "graph". There is also a spelling mistake in a warning message. Fix both mistakes.

Signed-off-by: Colin Ian King
Signed-off-by: Viktor Rosendahl
---
 tools/tracing/latency/latency-collector.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/tracing/latency/latency-collector.c b/tools/tracing/latency/latency-collector.c
index 57b20802e71b..b69de9263ee6 100644
--- a/tools/tracing/latency/latency-collector.c
+++ b/tools/tracing/latency/latency-collector.c
@@ -1650,7 +1650,7 @@ static void start_printthread(void)
 	if (ufd < 0 || read(ufd, seed, sizeof(*seed)) != sizeof(*seed)) {
 		printf(
-"Warning! Using trivial random nummer seed, since %s not available\n",
+"Warning! Using trivial random number seed, since %s not available\n",
 			DEV_URANDOM);
 		fflush(stdout);
 		*seed = i;
@@ -1711,8 +1711,8 @@ static void show_usage(void)
 "\t\t\tbeginning, end, and backtrace.\n\n"
 "-g, --graph\t\tEnable the display-graph option in trace_option. This\n"
-"\t\t\toption causes ftrace to show the functionph of how\n"
-"\t\t\tfunctions are calling other functions.\n\n"
+"\t\t\toption causes ftrace to show the graph of how functions\n"
+"\t\t\tare calling other functions.\n\n"
 "-c, --policy POL\tRun the program with scheduling policy POL. POL can be\n"
 "\t\t\tother, batch, idle, rr or fifo. The default is rr. When\n"
-- 
2.25.1
[PATCH 0/1] Fix for the latency-collector
Hi Steve,

I am not sure what the correct approach is to handle this latency-collector fix. Can you take it into your queue?

best regards,

Viktor

Colin Ian King (1):
  tracing/tools: fix a couple of spelling mistakes

 tools/tracing/latency/latency-collector.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.25.1
[PATCH v12 0/1] Add the latency-collector to tools
Changes since v11:

* I added a .gitignore in the tools/tracing/latency directory.

Changes since v10:

* Using libtracefs to control ftrace and find /sys/kernel/tracing
* New options to list available tracers, select the tracer and other ftrace parameters
* A signal handler to restore the ftrace state when the program terminates
* Cleaned up the commit message a bit, removed example output etc.

Viktor Rosendahl (1):
  Add the latency-collector to tools

 tools/Makefile                            |   14 +-
 tools/tracing/Makefile                    |   19 +
 tools/tracing/latency/.gitignore          |    2 +
 tools/tracing/latency/Makefile            |   24 +
 tools/tracing/latency/latency-collector.c | 2108 +
 5 files changed, 2161 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency/.gitignore
 create mode 100644 tools/tracing/latency/Makefile
 create mode 100644 tools/tracing/latency/latency-collector.c

-- 
2.25.1
[PATCH v12 1/1] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured. If the same burst of latencies is repeated, then sooner or later we will have captured all the latencies.

It also works with the wakeup_dl, wakeup_rt, and wakeup tracers. However, in that case it is probably not useful to use the random sleep functionality.

The reason why it may be desirable to catch all latencies with a long test campaign is that for some organizations it's necessary to test the kernel in the field, and it is not practical for developers to work iteratively with field testers. Because of cost and project schedules, it is not possible to start a new test campaign every time a latency problem has been fixed.

The tool uses inotify to detect changes to /sys/kernel/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss.

If immediate print is chosen, it means that we open /sys/kernel/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies.

If sleep is chosen, it means that we wait before opening /sys/kernel/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one.

The probability for the unfair coin toss is chosen so that there is an equal probability of obtaining any of the latencies in a burst. However, this requires an assumption about how many latencies there can be in a burst.
By default the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be:

1/2 and 1

Thus, the probability of getting each of the two latencies will be 1/2.

If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to:

1/3, 1/2, and 1

Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is, we see more than two closely occurring latencies, then the table will again be extended, and so on.

On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 300 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has until the end of the next latency to react, that is, to open /sys/kernel/tracing/trace. Thus, if we trace latencies that are >300 us, then we have at least 300 us to react. The minimum latency will of course not be 300 us on all systems; it will depend on the hardware, kernel version, workload and configuration.

Example usage:

In one shell, give the following command:

sudo latency-collector -rvv -t preemptirqsoff -s 2000 -a 3

This will trace latencies > 2000us with the preemptirqsoff tracer, using random sleep with maximum verbosity, with a probability table initialized to a size of 3.

In another shell, generate a few bursts of latencies:

root@host:~# modprobe preemptirq_delay_test delay=3000 test_mode=alternate burst_size=3
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger

If all goes well, you should be getting stack traces that show all the different latencies, i.e. you should see all three functions preemptirqtest_0, preemptirqtest_1, preemptirqtest_2 in the stack traces.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile                            |   14 +-
 tools/tracing/Makefile                    |   19 +
 tools/tracing/latency/.gitignore          |    2 +
 tools/tracing/latency/Makefile            |   24 +
 tools/tracing/latency/latency-collector.c | 2108 +
 5 files changed, 2161 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency/.gitignore
 create mode 100644 tools/tracing/latency/Makefile
 create mode 100644 tools/tracing/latency/latency-collector.c

diff --git a/tools/Makefile b/tools/Makefile
index 85af6ebbce91..7e9d34ddd74c 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -31,6 +31,7 @@ help:
 	@echo '  bootconfig - boot config tool'
 	@echo '  spi        - spi tools'
 	@echo '  tmon       - thermal monitoring and tuning tool'
+	@echo '  tracing    - misc tracing tools'
 	@echo '  turbostat  - Intel CPU i
[PATCH v11] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured. If the same burst of latencies is repeated, then sooner or later we will have captured all the latencies.

It also works with the wakeup_dl, wakeup_rt, and wakeup tracers. However, in that case it is probably not useful to use the random sleep functionality.

The reason why it may be desirable to catch all latencies with a long test campaign is that for some organizations it's necessary to test the kernel in the field, and it is not practical for developers to work iteratively with field testers. Because of cost and project schedules, it is not possible to start a new test campaign every time a latency problem has been fixed.

The tool uses inotify to detect changes to /sys/kernel/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss.

If immediate print is chosen, it means that we open /sys/kernel/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies.

If sleep is chosen, it means that we wait before opening /sys/kernel/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one.

The probability for the unfair coin toss is chosen so that there is an equal probability of obtaining any of the latencies in a burst. However, this requires an assumption about how many latencies there can be in a burst.
By default the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be:

1/2 and 1

Thus, the probability of getting each of the two latencies will be 1/2.

If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to:

1/3, 1/2, and 1

Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is, we see more than two closely occurring latencies, then the table will again be extended, and so on.

On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 300 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has until the end of the next latency to react, that is, to open /sys/kernel/tracing/trace. Thus, if we trace latencies that are >300 us, then we have at least 300 us to react. The minimum latency will of course not be 300 us on all systems; it will depend on the hardware, kernel version, workload and configuration.

Example usage:

In one shell, give the following command:

sudo latency-collector -rvv -t preemptirqsoff -s 2000 -a 3

This will trace latencies > 2000us with the preemptirqsoff tracer, using random sleep with maximum verbosity, with a probability table initialized to a size of 3.

In another shell, generate a few bursts of latencies:

root@host:~# modprobe preemptirq_delay_test delay=3000 test_mode=alternate burst_size=3
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger

If all goes well, you should be getting stack traces that show all the different latencies, i.e. you should see all three functions preemptirqtest_0, preemptirqtest_1, preemptirqtest_2 in the stack traces.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile                            |   14 +-
 tools/tracing/Makefile                    |   19 +
 tools/tracing/latency/Makefile            |   24 +
 tools/tracing/latency/latency-collector.c | 2108 +
 4 files changed, 2159 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency/Makefile
 create mode 100644 tools/tracing/latency/latency-collector.c

diff --git a/tools/Makefile b/tools/Makefile
index 85af6ebbce91..7e9d34ddd74c 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -31,6 +31,7 @@ help:
 	@echo '  bootconfig - boot config tool'
 	@echo '  spi        - spi tools'
 	@echo '  tmon       - thermal monitoring and tuning tool'
+	@echo '  tracing    - misc tracing tools'
 	@echo '  turbostat  - Intel CPU idle stats and freq reporting tool'
 	@echo '  usb        - USB testing tools'
[PATCH] Add the latency-collector to tools
all objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install bootconfig_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install tracing_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -137,7 +138,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + tracing_install acpi_clean: $(call descend,power/acpi,clean) @@ -145,7 +147,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean tracing_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -184,6 +186,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean tracing_clean .PHONY: FORCE diff --git a/tools/tracing/Makefile b/tools/tracing/Makefile new file mode 100644 index ..f53859765154 --- /dev/null +++ b/tools/tracing/Makefile @@ -0,0 +1,23 @@ +# 
SPDX-License-Identifier: GPL-2.0 +# Makefile for vm tools +# +VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null) +VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null) + +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS) +LDFLAGS = -lpthread $(VAR_LDLIBS) + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: + $(RM) latency-collector + +sbindir ?= /usr/sbin + +install: all + install -d $(DESTDIR)$(sbindir) + install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir) diff --git a/tools/tracing/latency-collector.c b/tools/tracing/latency-collector.c new file mode 100644 index ..e9aa7a47a8a3 --- /dev/null +++ b/tools/tracing/latency-collector.c @@ -0,0 +1,2102 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2017, 2018, 2019, 2021 BMW Car IT GmbH + * Author: Viktor Rosendahl (viktor.rosend...@bmw.de) + */ + +#define _GNU_SOURCE +#define _POSIX_C_SOURCE 200809L + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static const char *prg_name; +static const char *prg_unknown = "unknown program name"; + +static int fd_stdout; + +static int sched_policy; +static bool sched_policy_set; + +static int sched_pri; +static bool sched_pri_set; + +static bool trace_enable = true; +static bool setup_ftrace = true; +static bool use_random_sleep; + +enum traceopt { + OPTIDX_FUNC_TR = 0, + OPTIDX_DISP_GR, + OPTIDX_NR +}; + +/* Should be in the order of enum traceopt */ +static const char * const optstr[] = { + "function-trace", /* OPTIDX_FUNC_TR */ + "display-graph", /* OPTIDX_DISP_GR */ + NULL +}; + +enum errhandling { + ERR_EXIT = 0, + ERR_WARN, + ERR_CLEANUP, +}; + +static bool use_options[OPTIDX_NR]; + +static char inotify_buffer[655360]; + +#define likely(x) __builtin_expect(!!(x), 1) +#define unlikely(x)__builtin_expect(!!(x), 0) +#define bool2str(x)(x ? 
"true":"false") + +#define DEFAULT_NR_PRINTER_THREADS (3) +static unsigned int nr_threads = DEFAULT_NR_PRINTER_THREADS; + +#define DEFAULT_TABLE_SIZE (2) +static unsigned int table_startsize = DEFAULT_TABLE_SIZE; + +static int verbosity; + +#define verbose_sizechange() (verbosity >= 1) +#define verbose_lostevent() (verbosity >= 2) +#define verbose_ftrace() (verbosity >= 1) + +#define was_changed(ORIG, CUR) (strcmp(ORIG, CUR) != 0) +#define needs_change(CUR, WANTED) (strcmp(CUR, WANTED) != 0) + +static const char *debug_tracefile; +static const char *debug_tracefile_dflt; +static const char *debu
[RFC PATCH 1/2] Use pause-on-trace with the latency tracers
Earlier, tracing was disabled when reading the trace file. This behavior was changed with commit 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file"). This doesn't seem to work with the latency tracers.

The above mentioned commit did not only change the behavior but also added an option to emulate the old behavior. The idea with this patch is to enable this pause-on-trace option when the latency tracers are used.

This is a workaround; perhaps it would be better to make the latency tracers work without pausing, but I am not sure how to do that, or even how feasible it is without significant rework.

Signed-off-by: Viktor Rosendahl
---
 kernel/trace/trace_irqsoff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index d06aab4dcbb8..6756379b661f 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -562,6 +562,8 @@ static int __irqsoff_tracer_init(struct trace_array *tr)
 	/* non overwrite screws up the latency tracers */
 	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1);
 	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1);
+	/* without pause, we will produce garbage if another latency occurs */
+	set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, 1);

 	tr->max_latency = 0;
 	irqsoff_trace = tr;
@@ -583,11 +585,13 @@ static void __irqsoff_tracer_reset(struct trace_array *tr)
 {
 	int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT;
 	int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE;
+	int pause_flag = save_flags & TRACE_ITER_PAUSE_ON_TRACE;

 	stop_irqsoff_tracer(tr, is_graph(tr));
 	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag);
 	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag);
+	set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, pause_flag);

 	ftrace_reset_array_ops(tr);
 	irqsoff_busy = false;
-- 
2.25.1
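For reference, the flag this patch sets in the kernel corresponds to the tracefs option introduced by the commit mentioned above, so the same effect can be reproduced by hand. This is a usage sketch, assuming tracefs is mounted at /sys/kernel/tracing and the commands are run as root:

```shell
# Select a latency tracer, then enable pause-on-trace so that reading
# the trace file pauses tracing, as it did before commit 06e0a548bad0.
echo irqsoff > /sys/kernel/tracing/current_tracer
echo 1 > /sys/kernel/tracing/options/pause-on-trace
cat /sys/kernel/tracing/trace   # tracing is paused while this read runs
```

The patch merely automates this for the irqsoff family of tracers, so that a plain `cat` of the trace file does not get garbled by a new latency arriving mid-read.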
[RFC PATCH 2/2] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10.

Sometimes, there will be lost latencies, so that we get:

3054.078294 Latency 44 printout skipped due to inotify loop

..or:

3211.957685 Latency 112 printout skipped due to print loop

In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +
 3 files changed, 1240 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

diff --git a/tools/Makefile b/tools/Makefile
index 85af6ebbce91..7e9d34ddd74c 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -31,6 +31,7 @@ help:
 	@echo '  bootconfig - boot config tool'
 	@echo '  spi        - spi tools'
 	@echo '  tmon       - thermal monitoring and tuning tool'
+	@echo '  tracing    - misc tracing tools'
 	@echo '  turbostat  - Intel CPU idle stats and freq reporting tool'
 	@echo '  usb        - USB testing tools'
 	@echo '  virtio     - vhost test module'
@@ -64,7 +65,7 @@ acpi: FORCE
 cpupower: FORCE
 	$(call descend,power/$@)

-cgroup firewire hv guest bootconfig spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE
+cgroup firewire hv guest bootconfig spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging tracing: FORCE
 	$(call descend,$@)

 bpf/%: FORCE
@@ -103,7 +104,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \
 	perf selftests bootconfig spi turbostat usb \
 	virtio vm bpf x86_energy_perf_policy \
 	tmon freefall iio objtool kvm_stat wmi \
-	pci debugging
+	pci debugging tracing

 acpi_install:
 	$(call descend,power/$(@:_install=),install)
@@ -111,7 +112,7 @@ acpi_install:
 cpupower_install:
 	$(call descend,power/$(@:_install=),install)

-cgroup_install firewire_install gpio_install hv_install iio_install perf_install bootconfig_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install:
+cgroup_install firewire_install gpio_install hv_install iio_install perf_install bootconfig_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install tracing_install:
 	$(call descend,$(@:_install=),install)

 liblockdep_install:
@@ -137,7 +138,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \
 	perf_install selftests_install turbostat_install usb_install \
 	virtio_install vm_install bpf_install x86_energy_perf_policy_install \
 	tmon_install freefall_install objtool_install kvm_stat_install \
-	wmi_install pci_install debugging_install intel-speed-select_install
+	wmi_install pci_install debugging_install intel-speed-select_install \
+	tracing_install

 acpi_clean:
 	$(call descend,power/acpi,clean)
@@ -145,7 +147,7 @@ acpi_clean:
 cpupower_clean:
 	$(call descend,power/cpupower,clean)

-cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean:
+cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean tracing_clean:
 	$(call descend,$(@:_clean=),clean)

 liblockdep_clean:
@@ -184,6 +186,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \
 	vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \
 	freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \
 	gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \
-	intel-speed-select_clean
+	intel-speed-select_clean tracing_clean

 .PHONY: FORCE

diff --git a/tools/tracing/Makefile b/tools/tracing/Makefile
new file mode 100644
index ..59cd483ab01f
--- /dev/null
+++ b/tools/tracing/Makefile
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for tracing tools
+#
+TARGETS = latency-collector
+CFLAGS = -Wall -Wextra -g -O2
+LD
[RFC PATCH 0/2] Tracing bursts of latencies
Hello all,

This series contains two things:

1. A fix for a bug in the Ftrace latency tracers that appeared with Linux 5.7.

2. The latency-collector, a tool that is designed to work around the limitations in the ftrace latency tracers. It needs the bug fix in order to work properly.

I have sent a patch series with the latency-collector before. I never got any comments on it and I stopped pushing it because I thought that BPF tracing would be the wave of the future and that it would solve the problem in a cleaner and more elegant way.

Recently, I tried out the criticalstat script from bcc tools, but it did not fulfill all of my hopes and dreams. On the bright side, it was able to capture all latencies in a burst. The main problems that I encountered were:

1. The system became unstable and froze now and then. The man page of criticalstat has a mention of it being unstable, so I assume that this is a known problem.

2. Sometimes the stack traces were incorrect, but not in an obvious way. After it happened once, all subsequent ones were bad.

3. If two instances were run simultaneously (to capture both preemptoff and irq off), there seemed to be a quite large performance hit, but I did not measure this exactly.

4. The filesystem footprint seemed quite large. The size of libbcc seemed excessive for a small embedded system.

For these reasons, I take the liberty of resending the latency-collector again. I would hope to get some comments regarding it, or some suggestion of an alternative approach for how to solve the problem of being able to capture latencies that systematically occur close to each other.

Admittedly, it may from a developer's perspective be somewhat of a niche problem, since removing one latency will reveal the next; but when one is doing validation with a fleet of devices being tested in a long and expensive test campaign, then it is quite desirable to not lose any latencies.
best regards,

Viktor

Viktor Rosendahl (2):
  Use pause-on-trace with the latency tracers
  Add the latency-collector to tools

 kernel/trace/trace_irqsoff.c      |    4 +
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +
 4 files changed, 1244 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

-- 
2.25.1
[PATCH v10 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements a feature whereby the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency, will receive notifications through the fsnotify framework when a new latency is available.

One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode.

This facility works with the preempt/irqsoff, and wakeup tracers.

The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work().

This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases.

If we use:

queue_work(system_highpri_wq, &tr->fsnotify_work);

then the work will (almost) always execute on the same CPU, but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU, but it seems difficult to select the right CPU.
Signed-off-by: Viktor Rosendahl (BMW)
---
 kernel/trace/trace.c | 73 ++--
 kernel/trace/trace.h |  7 +
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6a0ee9178365..2124646dfb2a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -45,6 +45,9 @@
 #include
 #include
 #include
+#include
+#include
+#include

 #include "trace.h"
 #include "trace_output.h"
@@ -1497,6 +1500,70 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
 }

 unsigned long __read_mostly	tracing_thresh;
+static const struct file_operations tracing_max_lat_fops;
+
+#if defined(CONFIG_TRACER_MAX_TRACE) && defined(CONFIG_FSNOTIFY)
+
+static struct workqueue_struct *fsnotify_wq;
+
+static void latency_fsnotify_workfn(struct work_struct *work)
+{
+	struct trace_array *tr = container_of(work, struct trace_array,
+					      fsnotify_work);
+	fsnotify(tr->d_max_latency->d_inode, FS_MODIFY,
+		 tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+}
+
+static void latency_fsnotify_workfn_irq(struct irq_work *iwork)
+{
+	struct trace_array *tr = container_of(iwork, struct trace_array,
+					      fsnotify_irqwork);
+	queue_work(fsnotify_wq, &tr->fsnotify_work);
+}
+
+static void trace_create_maxlat_file(struct trace_array *tr,
+				     struct dentry *d_tracer)
+{
+	INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn);
+	init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq);
+	tr->d_max_latency = trace_create_file("tracing_max_latency", 0644,
+					      d_tracer, &tr->max_latency,
+					      &tracing_max_lat_fops);
+}
+
+__init static int latency_fsnotify_init(void)
+{
+	fsnotify_wq = alloc_workqueue("tr_max_lat_wq",
+				      WQ_UNBOUND | WQ_HIGHPRI, 0);
+	if (!fsnotify_wq) {
+		pr_err("Unable to allocate tr_max_lat_wq\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+late_initcall_sync(latency_fsnotify_init);
+
+static void latency_fsnotify(struct trace_array *tr)
+{
+	if (!fsnotify_wq)
+		return;
+	/*
+	 * We cannot call queue_work(&tr->fsnotify_work) from here because it's
+	 * possible that we are called from __schedule() or do_idle(), which
+	 * could cause a deadlock.
+	 */
+	irq_work_queue(&tr->fsnotify_irqwork);
+}
+
+
+#else /* defined(CONFIG_TRACER_MAX_TRACE) && defined(CONFIG_FSNOTIFY) */
+
+#define trace_create_maxlat_file(tr, d_tracer)				\
+	trace_create_file("tracing_max_latency", 0644, d_tracer,	\
+			  &tr->max_latency, &tracing_max_lat_fops)
+
+#endif

 #ifdef CONFIG_TRACER_MAX_TRACE
 /*
@@ -1536,6 +1603,9 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)

 	/* record this tasks comm */
 	tracing_record_cmdline(tsk);
+#ifdef CONFIG_FSNOTIFY
+	latency_fsnotify(tr);
+#endif
 }

 /**
@@ -8585,8 +8655,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_t
[PATCH v10 4/4] ftrace: Add an option for tracing console latencies
This new kernel parameter "trace_console_latency" will enable the latency tracers to trace the console latencies. Previously, this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console.

Signed-off-by: Viktor Rosendahl (BMW)
---
 .../admin-guide/kernel-parameters.txt |  4 +++
 include/linux/irqflags.h              | 22 +
 kernel/printk/printk.c                |  6 ++--
 kernel/trace/trace_irqsoff.c          | 32 +++
 4 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f8881e..3f96a380c528 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4768,6 +4768,10 @@
 	trace_buf_size=nn[KMG]
 			[FTRACE] will set tracing buffer size on each cpu.

+	trace_console_latency=[0|1]
+			[FTRACE] Trace the console latencies if 1 is given. Do
+			not trace the latencies if 0 or no parameter is given.
+
 	trace_event=[event-list]
 			[FTRACE] Set and start specified trace events in order
 			to facilitate early boot debugging.
The event-list is a diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ca65327a6de8..f27e96273453 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2369,6 +2369,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2469,9 +2470,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git 
a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..84876192cc73 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include "trace.h" @@ -456,6 +457,37 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +static bool no_console_latency __read_mostly = true; + +static int __init trace_console_latency(char *str) +{ + int param; + + get_option(, ); + no_console_latency = !(param == 1); + + return 0; +} + +early_param("trace_console_latency", trace_console_latency); + +bool console_tracing_disabled(void) +{
[PATCH v10 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of telling which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirq
[PATCH v10 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
[PATCH v10 0/4] Some new features for the preempt/irqsoff tracers
Hello all, Changes in v10: - [PATCH 1/4]: * Changed the #if statements to not depend on CONFIG_HWLAT_TRACER * Changed latency_fsnotify() to static and removed the declaration from kernel/trace/trace.h. - [PATCH 2/4]: * No change. - [PATCH 3/4]: * No change. - [PATCH 4/4]: * No change. This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to be notified when a new latency has been detected. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others. [PATCH 4/4] This adds the option trace_console_latency=1 to the kernel parameters. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies .../admin-guide/kernel-parameters.txt |4 + include/linux/irqflags.h | 22 + kernel/printk/printk.c|6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 +- kernel/trace/trace.c | 73 +- kernel/trace/trace.h |7 + kernel/trace/trace_irqsoff.c | 32 + tools/Makefile| 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 + 11 files changed, 1508 insertions(+), 32 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v9 1/4] ftrace: Implement fs notification for tracing_max_latency
> Something bothers me. If you dropped support for HWLAT_TRACER as you > mentioned in the cover letter, then why does this #if look for the CONFIG > option? You are right. I forgot to change those #if statements; I also forgot to pass the -i option to grep when I searched for remaining references to hwlat. >(and similar comment on the rest of the patch..) Below is a new version of the patch. I will not send out v10 yet, as I expect that there might be more comments. I did not define a new macro, since it is now much simpler after removing the HWLAT_TRACER stuff from those #if statements. I made latency_fsnotify() static and removed its declaration from kernel/trace/trace.h, since it's only used in kernel/trace/trace.c. best regards, Viktor This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency, will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the preempt/irqsoff and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. 
If we use: queue_work(system_highpri_wq, >fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 73 ++-- kernel/trace/trace.h | 7 + 2 files changed, 78 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 6a0ee9178365..2124646dfb2a 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -45,6 +45,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1497,6 +1500,70 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) } unsigned long __read_mostlytracing_thresh; +static const struct file_operations tracing_max_lat_fops; + +#if defined(CONFIG_TRACER_MAX_TRACE) && defined(CONFIG_FSNOTIFY) + +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, >fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(>fsnotify_work, latency_fsnotify_workfn); + init_irq_work(>fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, >max_latency, + _max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = 
alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +static void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call queue_work(>fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which
[PATCH v9 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
[PATCH v9 0/4] Some new features for the preempt/irqsoff tracers
Hello all, Changes in v9: - [PATCH 1/4]: * Rebased the patch for v5.4-rc3+ * Dropped support for the hwlat tracer. It seems to me, both from testing and looking at the code, that hwlat doesn't actually update /sys/kernel/debug/tracing/tracing_max_latency. I think it would be a bit weird to send fsnotify for changes to a file that are not visible to userspace. Also, the hwlat use case is not very interesting from my perspective because hwlat works in non-overwrite mode by default. For me, the effort needed to make it work properly would be more than what it's worth. * Because of the changes mentioned above, I removed the Reviewed-by tag. - [PATCH 2/4]: * No change. - [PATCH 3/4]: * Removed the mention of hwlat from the commit and help messages. - [PATCH 4/4]: * Changed the trace-option to a kernel parameter. This would hopefully make it easier to remove it if it becomes obsolete. This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to be notified when a new latency has been detected. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others. [PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies .../admin-guide/kernel-parameters.txt |4 + include/linux/irqflags.h | 22 + kernel/printk/printk.c|6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 +- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_irqsoff.c | 32 + tools/Makefile| 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 + 11 files changed, 1521 insertions(+), 32 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
[PATCH v9 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of telling which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirq
[PATCH v9 4/4] ftrace: Add an option for tracing console latencies
This new kernel parameter "trace_console_latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- .../admin-guide/kernel-parameters.txt | 4 +++ include/linux/irqflags.h | 22 + kernel/printk/printk.c| 6 ++-- kernel/trace/trace_irqsoff.c | 32 +++ 4 files changed, 62 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index a84a83f8881e..3f96a380c528 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4768,6 +4768,10 @@ trace_buf_size=nn[KMG] [FTRACE] will set tracing buffer size on each cpu. + trace_console_latency=[0|1] + [FTRACE] Trace the console latencies if 1 is given. Do + not trace the latencies if 0 or no parameter is given. + trace_event=[event-list] [FTRACE] Set and start specified trace events in order to facilitate early boot debugging. 
The event-list is a diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ca65327a6de8..f27e96273453 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2369,6 +2369,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2469,9 +2470,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git 
a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..84876192cc73 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include "trace.h" @@ -456,6 +457,37 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +static bool no_console_latency __read_mostly = true; + +static int __init trace_console_latency(char *str) +{ + int param; + + get_option(, ); + no_console_latency = !(param == 1); + + return 0; +} + +early_param("trace_console_latency", trace_console_latency); + +bool console_tracing_disabled(void) +{
[PATCH v9 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency, will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the preempt/irqsoff and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. 
Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 75 ++-- kernel/trace/trace.h | 18 +++ 2 files changed, 91 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 6a0ee9178365..523e3e57f1d4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -45,6 +45,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1497,6 +1500,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) } unsigned long __read_mostly tracing_thresh; +static const struct file_operations tracing_max_lat_fops; + +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, &tr->fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn); + init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, &tr->max_latency, + &tracing_max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call queue_work(&tr->fsnotify_work) 
from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(&tr->fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer,\ + &tr->max_latency, &tracing_max_lat_fops) + +#endif #ifdef CONFIG_TRACER_MAX_TRACE /* @@ -1536,6 +1607,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) /* record this tasks comm */ tracing_record_cmdline(tsk); + latency_fsnotify(tr); } /** @@ -8585,8 +8657,7 @@ init_tr
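On the user-space side, the FS_MODIFY event generated by the patch surfaces as an inotify IN_MODIFY event on the tracing_max_latency file. A minimal sketch of a watcher (hypothetical helper names; the real latency-collector is considerably more elaborate):

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching 'path' for modifications; returns a nonblocking inotify
 * fd, or -1 on error. */
static int watch_latency_file(const char *path)
{
	int ifd = inotify_init1(IN_NONBLOCK);

	if (ifd < 0)
		return -1;
	if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
		close(ifd);
		return -1;
	}
	return ifd;
}

/* Return the mask of the next queued event, or 0 if none is pending.
 * A real collector would poll() on ifd and fetch a new trace here. */
static unsigned int next_event_mask(int ifd)
{
	/* aligned as recommended by inotify(7) for reading events */
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	ssize_t n = read(ifd, buf, sizeof(buf));

	if (n < (ssize_t)sizeof(struct inotify_event))
		return 0;
	return ((struct inotify_event *)buf)->mask;
}
```

With this in place, the collector's main loop is simply "wait for IN_MODIFY, then read the trace and reset tracing_max_latency".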
Re: [PATCH v8 4/4] ftrace: Add an option for tracing console latencies
Den ons 9 okt. 2019 kl 20:08 skrev Steven Rostedt : > > On Wed, 9 Oct 2019 16:51:07 +0200 > Viktor Rosendahl wrote: > > > Apologies, I should have replied there but I have been a bit busy the > > last few days with other topics, and I felt a bit indecisive about > > that point, so I just sloppily addressed the issue in the cover letter > > of this new series: > > > > "I have retained the fourth patch, although it was suggested that is > > becoming > > obsolete soon. I have retained it only because I do not know the status of > > the code that will make it obsolete. It's the last patch of the series and > > if there indeed is some code that will remove the latency issues from the > > printk code, then of course it makes sense to drop it. The first three > > patches > > will work without it." > > > > I thought that, since it's the last in the series, it would be > > possible for maintainers to just take the first three if the last one > > is not wanted. > > > > For me it solves a rather important problem though, so if the code > > that will make it obsolete isn't available for some time, then perhaps > > it should be considered as a temporary solution. > > > > Of course, if there is a commitment to never remove any knobs from > > /sys/kernel/debug/tracing/trace_options, then I can easily understand > > that it's not wanted as a temporary fix. > > There isn't quite a commitment to not remove knobs, but if a tool > starts relying on a knob, then, the answer is yes there is :-p > > As this will hopefully become something we don't worry about in the > future, I would rather have this be a kernel command line option. This > way, it wont be something that tools can really check for. > > If you add a trace_console_latency option to the kernel command line, > we can enable it that way. And then when it becomes obsolete, we can > simply remove it, without breaking any tools. > > Would that work for you? > Sounds good to me. 
I will try to adjust the patch accordingly within a few days. best regards, Viktor > -- Steve
Re: [PATCH v8 4/4] ftrace: Add an option for tracing console latencies
Den ons 9 okt. 2019 kl 16:11 skrev Joel Fernandes : > > On Wed, Oct 09, 2019 at 12:08:24AM +0200, Viktor Rosendahl (BMW) wrote: > > This new trace option "console-latency" will enable the latency > > tracers to trace the console latencies. Previously this has always been > > implicitely disabled. I guess this is because they are considered > > to be well known and unavoidable. > > > > However, for some organizations it may nevertheless be desirable to > > trace them. Basically, we want to be able to tell that there are > > latencies in the system under test because someone has incorrectly > > enabled the serial console. > > > > Signed-off-by: Viktor Rosendahl (BMW) > > --- > > include/linux/irqflags.h | 22 ++ > > kernel/printk/printk.c | 6 -- > > kernel/trace/trace.h | 1 + > > kernel/trace/trace_irqsoff.c | 18 ++ > > 4 files changed, 45 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h > > index 21619c92c377..3de891723331 100644 > > --- a/include/linux/irqflags.h > > +++ b/include/linux/irqflags.h > > @@ -13,6 +13,7 @@ > > #define _LINUX_TRACE_IRQFLAGS_H > > > > #include > > +#include > > #include > > > > /* Currently trace_softirqs_on/off is used only by lockdep */ > > @@ -68,9 +69,30 @@ do { \ > > defined(CONFIG_PREEMPT_TRACER) > > extern void stop_critical_timings(void); > > extern void start_critical_timings(void); > > + extern bool console_tracing_disabled(void); > > + > > +# define console_stop_critical_timings(flag) \ > > + do {\ > > + typecheck(bool, flag); \ > > + flag = console_tracing_disabled(); \ > > + if (flag) \ > > + stop_critical_timings();\ > > + } while (0) > > + > > +# define console_start_critical_timings(flag) \ > > + do { \ > > + typecheck(bool, flag); \ > > + if (flag)\ > > + start_critical_timings();\ > > + } while (0) > > + > > #else > > # define stop_critical_timings() do { } while (0) > > # define start_critical_timings() do { } while (0) > > +# define 
console_stop_critical_timings(flag) \ > > + do { typecheck(bool, flag); } while (0) > > +# define console_start_critical_timings(flag)\ > > + do { typecheck(bool, flag); } while (0) > > #endif > > > > /* > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index ca65327a6de8..f27e96273453 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -2369,6 +2369,7 @@ void console_unlock(void) > > static char ext_text[CONSOLE_EXT_LOG_MAX]; > > static char text[LOG_LINE_MAX + PREFIX_MAX]; > > unsigned long flags; > > + bool cflag; > > bool do_cond_resched, retry; > > > > if (console_suspended) { > > @@ -2469,9 +2470,10 @@ void console_unlock(void) > >*/ > > console_lock_spinning_enable(); > > > > - stop_critical_timings();/* don't trace print latency > > */ > > + /* don't trace print latency if it's disabled */ > > + console_stop_critical_timings(cflag); > > call_console_drivers(ext_text, ext_len, text, len); > > - start_critical_timings(); > > + console_start_critical_timings(cflag); > > > > if (console_lock_spinning_disable_and_check()) { > > printk_safe_exit_irqrestore(flags); > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h > > index 591c7a873235..10d12c8f7f77 100644 > > --- a/kernel/trace/trace.h > > +++ b/kernel/trace/trace.h > > @@ -1261,6 +1261,7 @@ extern int trace_get_user(struct trace_parser > > *parser, const char __user *ubuf, > > C(PRINTK_MSGONLY, "printk-msg-only"), \ > > C(CONTEXT_INFO, "context-info"), /* Print > > pid/cpu/time */ \ > > C(LATENCY_FMT, "latency-format"), \ > > + C(CONSOLE_LATENCY, "console-latency"),
[PATCH v8 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 18 ++ 4 files changed, 45 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ca65327a6de8..f27e96273453 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2369,6 +2369,7 @@ void 
console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2469,9 +2470,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 591c7a873235..10d12c8f7f77 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1261,6 +1261,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..576e2162114e 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,24 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + /* +* If tracing is disabled, then the question of whether to trace console +* latencies is moot. By always returning false here we save the caller +* the calls to start/stop_critical_timings(). These calls would not do +* anything anyway. +*/ + if (!preempt_trace(pc) && !irq_trace()) +
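The reason the macro pair threads a flag through, instead of calling console_tracing_disabled() twice, is to keep the stop/start calls balanced even if the option flips while the console drivers are running. A userspace mock of the pattern (counter stubs stand in for the real kernel functions, and the typecheck() step is noted but simplified away):

```c
#include <assert.h>

static int stops, starts;
static int console_tracing_off = 1;	/* stands in for console_tracing_disabled() */

static void stop_critical_timings(void)  { stops++; }
static void start_critical_timings(void) { starts++; }

/*
 * Mock of the patch's macro pair: the decision is sampled exactly once into
 * 'flag', so start() runs if and only if stop() ran. The kernel version
 * additionally does typecheck(bool, flag) to reject wrongly typed arguments
 * at compile time.
 */
#define console_stop_critical_timings(flag)		\
	do {						\
		(flag) = console_tracing_off;		\
		if (flag)				\
			stop_critical_timings();	\
	} while (0)

#define console_start_critical_timings(flag)		\
	do {						\
		if (flag)				\
			start_critical_timings();	\
	} while (0)
```

If the decision were re-evaluated in the second macro, a concurrent toggle of the trace option could leave the critical-timings state permanently stopped or started; snapshotting into the caller's flag makes the pair self-balancing.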
[PATCH v8 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirq
[PATCH v8 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ..or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install)
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
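The commit message above says the collector prints any one latency out of a burst with approximately equal probability, even though it cannot know the burst size in advance. One standard way to obtain that property is reservoir sampling with a reservoir of one: keep event i (0-based) with probability 1/(i+1). The sketch below uses a small LCG in place of whatever random source the tool actually uses, so treat the details as illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Numerical-Recipes-style LCG; the top bits are the better-distributed ones. */
static uint32_t lcg_next(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return *state >> 16;
}

/*
 * Reservoir sampling, k = 1: after seeing n events, each event has been
 * retained with probability 1/n, without knowing n up front. Returns the
 * index of the event that ends up selected.
 */
static int reservoir_pick(int n_events, uint32_t *state)
{
	int i, chosen = 0;

	for (i = 1; i < n_events; i++)
		if (lcg_next(state) % (uint32_t)(i + 1) == 0)
			chosen = i;
	return chosen;
}
```

This matches the overwrite-mode constraint nicely: only one trace from a burst survives anyway, so the goal is merely that each latency in the burst has the same chance of being the survivor.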
[PATCH v8 0/4] Some new features for the preempt/irqsoff tracers
Hello all, I have retained the fourth patch, although it was suggested that it is becoming obsolete soon. I have retained it only because I do not know the status of the code that will make it obsolete. It's the last patch of the series and if there indeed is some code that will remove the latency issues from the printk code, then of course it makes sense to drop it. The first three patches will work without it. Changes in v8: - [PATCH 1/4]: * Moved a forward declaration in order to make the code a bit more robust. * Converted a NOP macro to a static function so that the type is checked. - [PATCH 2/4]: * No change. - [PATCH 3/4]: * No change. - [PATCH 4/4]: * Added a comment to explain an optimization in console_tracing_disabled(). This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been recorded. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others. [PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 22 + kernel/printk/printk.c |6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 ++- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 19 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 18 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 ++ 11 files changed, 1507 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
[PATCH v8 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. 
Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_hwlat.c | 4 +- 3 files changed, 94 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 252f79c435f8..f8ca307f66cf 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1479,6 +1482,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) } unsigned long __read_mostly tracing_thresh; +static const struct file_operations tracing_max_lat_fops; + +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, &tr->fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn); + init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, &tr->max_latency, + &tracing_max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if 
(!fsnotify_wq) + return; + /* +* We cannot call queue_work(&tr->fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(&tr->fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer, \ + &tr->max_latency, &tracing_max_lat_fops) + +#endif #ifdef CONFIG_TRACER_MAX_TRACE /* @@ -1518,6 +1589,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) /* record this tasks comm */ tracing_record
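The constraint in latency_fsnotify() above — queue_work() may deadlock when called from __schedule() or do_idle(), so the patch defers through irq_work — has a close userspace analogue: a signal handler may only use async-signal-safe calls, so it hands work to the main loop through a pipe (the self-pipe trick). A sketch of that pattern, by way of analogy only:

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

static int defer_pipe[2];

/*
 * Restricted context: like irq_work_queue() in the patch, do only the
 * minimal safe operation (a one-byte async-signal-safe write) and leave
 * the heavy lifting (here: whatever reads the pipe; in the patch: the
 * workqueue calling fsnotify()) for a context where it is allowed.
 */
static void restricted_context_handler(int sig)
{
	char b = 'w';

	(void)sig;
	if (write(defer_pipe[1], &b, 1) != 1)
		_exit(1);	/* cannot report errors from a handler */
}
```

The two-hop chain in the patch (irq_work -> workqueue -> fsnotify) exists for the same reason the handler only writes one byte: each hop moves the work into a context with fewer restrictions.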
Re: [PATCH v7 1/4] ftrace: Implement fs notification for tracing_max_latency
On 10/2/19 2:32 AM, Joel Fernandes wrote: Some nits, but others looks good: On Fri, Sep 20, 2019 at 05:22:16PM +0200, Viktor Rosendahl (BMW) wrote: This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call the latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. 
Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_hwlat.c | 4 +- 3 files changed, 94 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 947ba433865f..2d2150bf0d42 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1480,6 +1483,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) unsigned long __read_mostly tracing_thresh; +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static const struct file_operations tracing_max_lat_fops; +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, &tr->fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn); + init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, &tr->max_latency, + &tracing_max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call 
queue_work(&tr->fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(&tr->fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer, \ + &tr->max_latency, &tracing_max_lat_fops) I would forward declare tracing_max_lat_fops here as well like you are doing in the
Re: [PATCH v7 4/4] ftrace: Add an option for tracing console latencies
On 10/2/19 2:52 AM, Joel Fernandes wrote: On Fri, Sep 20, 2019 at 05:22:19PM +0200, Viktor Rosendahl (BMW) wrote: This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitely disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 39 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 
1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings(); /* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4913ed1138aa..93a8b82c27e4 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1262,6 +1262,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"),\ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"),\ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; Why this condition here? Why not just make the call to stop_critical_timings() dependent on the new flag you are adding? Please explain it more and add some comments.
Re: [PATCH v7 1/4] ftrace: Implement fs notification for tracing_max_latency
On 10/2/19 5:13 PM, Steven Rostedt wrote: On Fri, 20 Sep 2019 17:22:16 +0200 "Viktor Rosendahl (BMW)" wrote: This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call the latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). Can fsnotify() be called from irq context? If so, why have the work queue at all? Just do the work from the irq_work handler. fsnotify() might sleep. It calls send_to_group(), which calls inotify_handle_event() through a function pointer. inotify_handle_event() calls kmalloc() without the GFP_ATOMIC flag. There might be other reasons as well but the above is one that I have seen a warning for, when enabling CONFIG_DEBUG_ATOMIC_SLEEP and trying to call fsnotify() from an atomic context. best regards, Viktor
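The exchange above covers the kernel side; the userspace half of the scheme is plain inotify(7). A minimal sketch is below — the helper names are mine, not from the patch, and a real consumer such as the latency-collector would watch /sys/kernel/debug/tracing/tracing_max_latency and re-read the trace after each event:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching 'path' for modification events; returns the inotify
 * fd to read events from, or -1 on error. */
int watch_modify(const char *path)
{
	int fd = inotify_init();

	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, path, IN_MODIFY) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Block until one modify event arrives on the watched file; returns 0
 * on success. For tracing_max_latency, each event means the kernel
 * has recorded a new maximum latency. */
int wait_modify(int fd)
{
	char buf[4096];

	return read(fd, buf, sizeof(buf)) > 0 ? 0 : -1;
}
```

Because inotify queues events on the fd, the consumer cannot miss a notification that fires between adding the watch and calling read().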
[PATCH v7 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU.
Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_hwlat.c | 4 +- 3 files changed, 94 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 947ba433865f..2d2150bf0d42 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1480,6 +1483,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) unsigned long __read_mostlytracing_thresh; +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static const struct file_operations tracing_max_lat_fops; +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, >fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(>fsnotify_work, latency_fsnotify_workfn); + init_irq_work(>fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, >max_latency, + _max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call 
queue_work(>fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(>fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer,\ + >max_latency, _max_lat_fops) + +#endif + #ifdef CONFIG_TRACER_MAX_TRACE /* * Copy the new maximum trace into the separate maximum-trace @@ -1518,6 +1589,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) /* record
[PATCH v7 3/4] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
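The "approximately equal probability" selection that the commit message describes can be sketched as single-item reservoir sampling: the i-th event replaces the current pick with probability 1/(i+1), so after the burst every event has had an equal chance of being the one printed. This is an illustrative simplification of the idea; the latency-collector's actual randomized algorithm may differ in detail:

```c
#include <assert.h>
#include <stdlib.h>

/* Choose one of n events with equal probability, without needing to
 * know n in advance: event i (0-based) replaces the current pick with
 * probability 1/(i + 1) — classic reservoir sampling with k = 1. */
int pick_one(int n)
{
	int chosen = -1;
	int i;

	for (i = 0; i < n; i++)
		if (rand() % (i + 1) == 0)
			chosen = i;
	return chosen;
}
```

Not knowing n in advance matters here, because the size of a latency burst is only known once it is over.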
[PATCH v7 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirqtest_5, + preemptirqtest_6, +
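The DECLARE_TESTFN() trick in the patch above — stamping out otherwise identical functions so that each latency in a burst produces a distinguishable backtrace — boils down to the following pattern. This is a simplified, userspace-testable model with a stand-in body, not the module code itself:

```c
#include <assert.h>

/* One shared body; each macro-generated wrapper below appears as its
 * own symbol in a backtrace, which is the whole point of the trick. */
static int execute_test(int idx)
{
	return idx;	/* stand-in for the real critical-section code */
}

#define DECLARE_TESTFN(n) \
	static int testfn_##n(void) { return execute_test(n); }

DECLARE_TESTFN(0)
DECLARE_TESTFN(1)
DECLARE_TESTFN(2)

/* Dispatch table, mirroring the module's testfuncs[] array. */
static int (*testfuncs[])(void) = { testfn_0, testfn_1, testfn_2 };
```

Calling through the table, e.g. testfuncs[i](), places a unique function name on the stack for each i, so a captured trace reveals which latency in the burst it belongs to.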
[PATCH v7 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 39 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void 
console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4913ed1138aa..93a8b82c27e4 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1262,6 +1262,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v7 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

There are not many changes this time. I mainly changed my email address, tweaked a commit message and removed the cluttering of a copyright message.

Changes in v7:
- [PATCH 1/4]: * I have added some reasons to the end of the commit message about why I believe that it makes sense to create a new workqueue.
- [PATCH 2/4]: * Removed the cluttering of the copyright message.
- [PATCH 3/4]: * No change
- [PATCH 4/4]: * No change

This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode.

[PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been detected.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others.

[PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 22 + kernel/printk/printk.c |6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 ++- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 19 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 ++ 11 files changed, 1501 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v6 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/8/19 1:38 AM, Joel Fernandes wrote:> On Sat, Sep 07, 2019 at 11:12:59PM +0200, Viktor Rosendahl wrote: >> On 9/6/19 4:17 PM, Joel Fernandes wrote: >>> On Thu, Sep 05, 2019 at 03:25:45PM +0200, Viktor Rosendahl wrote: >> >>>> + >>>> +__init static int latency_fsnotify_init(void) >>>> +{ >>>> + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", >>>> +WQ_UNBOUND | WQ_HIGHPRI, 0); >>>> + if (!fsnotify_wq) { >>>> + pr_err("Unable to allocate tr_max_lat_wq\n"); >>>> + return -ENOMEM; >>>> + } >>> >>> Why not just use the system workqueue instead of adding another workqueue? >>> >> >> For the the latency-collector to work properly in the worst case, when a >> new latency occurs immediately, the fsnotify must be received in less >> time than what the threshold is set to. If we always are slower we will >> always lose certain latencies. >> >> My intention was to minimize latency in some important cases, so that >> user space receives the notification sooner rather than later. >> >> There doesn't seem to be any system workqueue with WQ_UNBOUND and >> WQ_HIGHPRI. My thinking was that WQ_UNBOUND might help with the latency >> in some important cases. >> >> If we use: >> >> queue_work(system_highpri_wq, >fsnotify_work); >> >> then the work will (almost) always execute on the same CPU but if we are >> unlucky that CPU could be too busy while there could be another CPU in >> the system that would be able to process the work soon enough. >> >> queue_work_on() could be used to queue the work on another CPU but it >> seems difficult to select the right CPU. > > Ok, a separate WQ is fine with me as such since the preempt/irq events are on > a debug kernel anyway. Keep in mind that this feature is also enabled by the wakeup tracers and by hwlat. These are often enabled by production kernels. 
I guess it would be possible to add some ifdefs so that we create a new workqueue only if preempt/irqsoff tracing is enabled in the kernel config and use system_highpri_wq if we only have the wakeup and hwlat tracers in the config. However, I don't really like adding yet some more ifdefs to the code. Since a new workqueue will not necessarily create a new worker thread nowadays, I thought that it would be OK with a new unbound workqueue, which should not add much to the tendency to create more worker threads. > > I'll keep reviewing your patches next few days, I am at the LPC conference so might be a bit slow. Overall I think the series look like its maturing and getting close. > Ok, thanks. Could you let me know when you have looked through it all so that I know when it makes sense to make another reroll of the series? best regards, Viktor
Re: [PATCH v6 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/6/19 4:17 PM, Joel Fernandes wrote: On Thu, Sep 05, 2019 at 03:25:45PM +0200, Viktor Rosendahl wrote: + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } Why not just use the system workqueue instead of adding another workqueue? For the latency-collector to work properly in the worst case, when a new latency occurs immediately, the fsnotify must be received in less time than what the threshold is set to. If we are always slower, we will always lose certain latencies. My intention was to minimize latency in some important cases, so that user space receives the notification sooner rather than later. There doesn't seem to be any system workqueue with WQ_UNBOUND and WQ_HIGHPRI. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. best regards, Viktor
Re: [PATCH v5 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
On 9/5/19 6:52 PM, Steven Rostedt wrote: On Wed, 4 Sep 2019 21:10:38 +0200 Viktor Rosendahl wrote: On 9/4/19 1:42 PM, Steven Rostedt wrote: I am not so keen to clutter source files with a new copyright message. My problem is that git-send-email doesn't work well with my work email address, so I am forced to use my private gmail, which may create a false impression that I as a private individual would be the copyright holder. Then do what I do: Steven Rostedt (VMware) I did "(Red Hat)" when I worked for Red Hat. -- Steve Ok, that seems simple and effective. I will update accordingly for v7 of the patches. best regards, Viktor
[PATCH v6 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call the latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). 
Signed-off-by: Viktor Rosendahl
---
 kernel/trace/trace.c       | 75 +-
 kernel/trace/trace.h       | 18 +
 kernel/trace/trace_hwlat.c |  4 +-
 3 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 563e80f9006a..72ac20c4aaa1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -44,6 +44,9 @@
 #include <linux/trace.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/rt.h>
+#include <linux/fsnotify.h>
+#include <linux/irq_work.h>
+#include <linux/workqueue.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -1480,6 +1483,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
 
 unsigned long __read_mostly	tracing_thresh;
 
+#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
+	defined(CONFIG_FSNOTIFY)
+
+static const struct file_operations tracing_max_lat_fops;
+static struct workqueue_struct *fsnotify_wq;
+
+static void latency_fsnotify_workfn(struct work_struct *work)
+{
+	struct trace_array *tr = container_of(work, struct trace_array,
+					      fsnotify_work);
+	fsnotify(tr->d_max_latency->d_inode, FS_MODIFY,
+		 tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+}
+
+static void latency_fsnotify_workfn_irq(struct irq_work *iwork)
+{
+	struct trace_array *tr = container_of(iwork, struct trace_array,
+					      fsnotify_irqwork);
+	queue_work(fsnotify_wq, &tr->fsnotify_work);
+}
+
+static void trace_create_maxlat_file(struct trace_array *tr,
+				     struct dentry *d_tracer)
+{
+	INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn);
+	init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq);
+	tr->d_max_latency = trace_create_file("tracing_max_latency", 0644,
+					      d_tracer, &tr->max_latency,
+					      &tracing_max_lat_fops);
+}
+
+__init static int latency_fsnotify_init(void)
+{
+	fsnotify_wq = alloc_workqueue("tr_max_lat_wq",
+				      WQ_UNBOUND | WQ_HIGHPRI, 0);
+	if (!fsnotify_wq) {
+		pr_err("Unable to allocate tr_max_lat_wq\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+late_initcall_sync(latency_fsnotify_init);
+
+void latency_fsnotify(struct trace_array *tr)
+{
+	if (!fsnotify_wq)
+		return;
+	/*
+	 * We cannot call queue_work(&tr->fsnotify_work) from here because it's
+	 * possible that we are called from __schedule() or do_idle(), which
+	 * could cause a deadlock.
+	 */
+	irq_work_queue(&tr->fsnotify_irqwork);
+}
+
+/*
+ * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
+ * defined(CONFIG_FSNOTIFY)
+ */
+#else
+
+#define trace_create_maxlat_file(tr, d_tracer)				\
+	trace_create_file("tracing_max_latency", 0644, d_tracer,	\
+			  &tr->max_latency, &tracing_max_lat_fops)
+
+#endif
+
 #ifdef CONFIG_TRACER_MAX_TRACE
 /*
  * Copy the new maximum trace into the separate maximum-trace
@@ -1518,6 +1589,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
 
 	/* record this tasks comm */
 	tracing_record_cmdline(tsk);
+	latency_fsnotify(tr);
 }
 
 /**
@@ -8550,8 +8622,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	create_trace_options_dir(tr);
 
 #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)
-	trace_create_file("tracing_max_latency", 0644, d_tracer,
-			&tr->max_latency, &tracing_max_lat_fops);
+	trace_create_maxlat_file(tr, d_tracer);
 #endif
 
 	if (ftrace_create_function_files(tr, d_tracer))
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 005f08629b8b
[PATCH v6 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously, this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 39 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void console_unlock(void) 
static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4913ed1138aa..93a8b82c27e4 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1262,6 +1262,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v6 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) diff --git a/tools/Makefile b/tools/Makefile index 68defd7ecf5d..2b3738e66280 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
[PATCH v6 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 147 +++ 2 files changed, 131 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 98da8998c25c..5ff573d844e4 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -751,9 +751,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..96bf6b4858bc 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,9 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH + * Author: Viktor Rosendahl (viktor.rosend...@bmw.de) + * - Added the burst feature and the sysfs trigger */ #include @@ -10,18 +13,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +44,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +D
[PATCH v6 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

Changes in v6:

- [PATCH 1/4]:
  * Using the irq_work mechanism instead of the games with linked lists,
    percpu variables and modifications to the scheduling and idle code.
- [PATCH 2/4]:
  * Updated the copyright message, adding my own name and what I did.
- [PATCH 3/4]:
  * Added my own name as author to the copyright message.
- [PATCH 4/4]:
  * Improved the build on the sh architecture with gcc 7.4.0 by including
    linux/types.h. The build on sh still fails for me later in the build
    process with an ICE from the compiler, but the same failure also happens
    without any of my patches.

This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other, and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode.

[PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for user space to detect when a new latency has been recorded.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others.

[PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies.
best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 22 + kernel/printk/printk.c |6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 147 +++- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 19 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 ++ 11 files changed, 1504 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v5 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
On 9/4/19 1:42 PM, Steven Rostedt wrote: On Tue, 3 Sep 2019 15:26:00 +0200 Viktor Rosendahl wrote: diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH A name and what you did should also be attached here. Ideally, we leave these out as git history is usually enough. I am not so keen to clutter source files with a new copyright message. My problem is that git-send-email doesn't work well with my work email address, so I am forced to use my private gmail, which may create a false impression that I as a private individual would be the copyright holder. best regards, Viktor
Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/4/19 1:39 PM, Steven Rostedt wrote: On Tue, 3 Sep 2019 15:25:59 +0200 Viktor Rosendahl wrote: +void latency_fsnotify_stop(void) +{ + /* Make sure all CPUs see caller's previous actions to stop tracer */ + smp_wmb(); These memory barriers just look wrong. What exactly are you trying to protect here? Where's the matching rmbs? Thanks for reviewing. However, since these functions will disappear when I take the irq_work facility into use, we should perhaps not spend too much time discussing what would have been. There are no matching rmbs; I was thinking that the smp_wmb() would merely enforce the order of the memory writes, as seen by other CPUs, so that the tracer would be stopped before the latency fsnotify is disabled. E.g. in the case of the preemptirqsoff tracer, the idea was that it doesn't matter exactly when a CPU sees the "tracer_enabled = 0;" in stop_irqsoff_tracer(), but that it needs to be seen before the writes in latency_fsnotify_stop() are seen. best regards, Viktor
Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/4/19 10:19 AM, Peter Zijlstra wrote: Adding Paul since RCU faces similar situations, i.e. raising softirq risks scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to avoid raising the softirq and instead use irq_work. Which is right. Thanks Joel and Peter for suggesting the irq_work facility. I was ignorant of the existence of this facility. It looks like it is the Holy Grail of deferred work, for this particular type of problem :) It looks like it will be possible to both avoid deadlock and also avoid the ugly additions to the sched and idle code. In addition, the rest of the code will become a lot simpler. best regards, Viktor
[PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements fsnotify notifications for the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency: user space is notified through the fsnotify framework whenever a new latency is available. One particularly interesting use of this facility is to enable threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other, in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers.

This patch also adds some unfortunate calls from __schedule() and do_idle(). The calls to latency_fsnotify_disable/enable() are needed because we cannot wake up the workqueue from these critical sections without risking a deadlock. Similar problems would also arise if we tried to schedule a tasklet, raise a softirq, or wake up a kernel thread. If a notification event happens in one of the forbidden sections, we schedule the fsnotify work as soon as we have exited them.

There was a suggestion to remove this latency_fsnotify_enable/disable() gunk, or at least to combine it with start_critical_timings() and stop_critical_timings(). I have, however, not been able to come up with a way to do it. It seems like it would be possible to simply replace the calls to latency_fsnotify_enable/disable() with calls to start/stop_critical_timings(). However, the main problem is that it would not work for the wakeup tracer: the wakeup tracer needs a facility that postpones the notifications, not one that prevents the measurements, because all its measurements take place in the middle of __schedule(). On the other hand, in some places, like in idle and the console code, we need start/stop functions that prevent the measurements from being made.
Signed-off-by: Viktor Rosendahl --- include/linux/ftrace.h| 31 + kernel/sched/core.c | 3 + kernel/sched/idle.c | 3 + kernel/sched/sched.h | 1 + kernel/trace/trace.c | 112 +- kernel/trace/trace.h | 22 ++ kernel/trace/trace_hwlat.c| 4 +- kernel/trace/trace_irqsoff.c | 4 ++ kernel/trace/trace_sched_wakeup.c | 4 ++ 9 files changed, 181 insertions(+), 3 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 8a8cb3c401b2..b4d9700ef917 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -907,4 +907,35 @@ unsigned long arch_syscall_addr(int nr); #endif /* CONFIG_FTRACE_SYSCALLS */ +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +DECLARE_PER_CPU(int, latency_notify_disable); +DECLARE_STATIC_KEY_FALSE(latency_notify_key); + +void latency_fsnotify_process(void); + +/* + * Disable/enable fsnotify while in scheduler and idle code. Trying to wake + * anything up from there, such as calling queue_work() is prone to deadlock. 
+ */ +static inline void latency_fsnotify_disable(void) +{ + this_cpu_inc(latency_notify_disable); +} + +static inline void latency_fsnotify_enable(void) +{ + this_cpu_dec(latency_notify_disable); + if (static_branch_unlikely(_notify_key)) + latency_fsnotify_process(); +} + +#else + +#define latency_fsnotify_disable() do { } while (0) +#define latency_fsnotify_enable() do { } while (0) + +#endif + #endif /* _LINUX_FTRACE_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 010d578118d6..e3c1dc801073 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3198,6 +3198,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev) */ rq = finish_task_switch(prev); + latency_fsnotify_enable(); balance_callback(rq); preempt_enable(); @@ -3820,6 +3821,7 @@ static void __sched notrace __schedule(bool preempt) local_irq_disable(); rcu_note_context_switch(preempt); + latency_fsnotify_disable(); /* * Make sure that signal_pending_state()->signal_pending() below @@ -3883,6 +3885,7 @@ static void __sched notrace __schedule(bool preempt) rq_unlock_irq(rq, ); } + latency_fsnotify_enable(); balance_callback(rq); } diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 80940939b733..5fc87d99a407 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -236,6 +236,7 @@ static void do_idle(void) __current_set_polling(); tick_nohz_idle_enter(); + latency_fsnotify_disable(); while (!need_resched()) { check_pgt_cache(); @@ -265,6 +266,8 @@ static void do_idle(void) arch_cpu_idle_exit(); } + latency_fsnotify_enable(); + /* * Since
[PATCH v5 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 98da8998c25c..5ff573d844e4 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -751,9 +751,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
[PATCH v5 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously, this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 21 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 38 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..34b0424a32dc 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool 
do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index d9f83b2aaa71..8e4c9736a801 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1266,6 +1266,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index 29403a83a5f0..c8d17c4c1c91 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v5 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

Changes in v5:

- [PATCH 1/4]:
  * trace_disable_fsnotify() => latency_fsnotify_disable(), in order to avoid
    naming conflicts with trace points.
  * Using more appropriate operations for the percpu variables.
  * Moved around the calls to latency_fsnotify_enable/disable() a bit.
  * Added static branches to reduce overhead when the facility is not in use.
  * Does not add any trace points anymore. Instead using explicit function
    calls from the sched and idle code.
  * Unfortunately still adds some gunk to the sched and idle code. I do not
    know how to avoid this.

This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other, and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode.

[PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for user space to detect when a new latency has been recorded.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others.

[PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies.
best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/ftrace.h | 31 + include/linux/irqflags.h | 21 + kernel/printk/printk.c |6 +- kernel/sched/core.c |3 + kernel/sched/idle.c |3 + kernel/sched/sched.h |1 + kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 145 ++- kernel/trace/trace.c | 112 ++- kernel/trace/trace.h | 23 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 16 + kernel/trace/trace_sched_wakeup.c|4 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1211 ++ 16 files changed, 1587 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
[PATCH v5 3/4] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1211 +++ 3 files changed, 1239 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 68defd7ecf5d..2b3738e66280 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for vm tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean:
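With the Makefile hunks above applied, the tool should be buildable either directly or through the top-level tools/ entry points. A sketch of the expected invocations, assuming the targets added by this patch:

```shell
# build the collector directly
make -C tools/trace
# or through the top-level tools/ Makefile target added above
make -C tools trace
# matching install/clean targets follow the usual tools/ pattern
make -C tools trace_install
make -C tools trace_clean
```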
Re: [PATCH v4 1/4] ftrace: Implement fs notification for tracing_max_latency
On 5/21/19 6:01 PM, Steven Rostedt wrote: > > > Note, you need to add the scheduling and power management maintainers > when adding trace events to their code. > My bad. > As these trace events become visible to user space, and you only need a > callback to disable fsnotify, it may be better to just have a hard > coded call to disable fsnotify (and have it be a nop when not > configured). My thinking was that it would not be defensible to clutter the idle and scheduling code with calls to slightly obscure tracing code and that perhaps some users would like to have these tracepoints for perf/ftrace but I don't insist on it. Below is a patch with hard coded calls. > We can use a static branch if you want to keep the > overhead down when not enabled, and enable the static branch when you > start these latency tracers. > I noticed that it's possible to slightly simplify the enable/disable functions, so I guess that this would not be necessary, especially since one would need to handle the case when some CPUs are in the forbidden sections when the tracers are enabled. I can try to add the static branch if you want it though. 
best regards, Viktor --- include/linux/ftrace.h | 13 + kernel/sched/core.c| 2 + kernel/sched/idle.c| 2 + kernel/sched/sched.h | 1 + kernel/trace/trace.c | 111 - kernel/trace/trace.h | 12 kernel/trace/trace_hwlat.c | 4 +- 7 files changed, 142 insertions(+), 3 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 25e2995d4a4c..88c76f23277c 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -907,4 +907,17 @@ unsigned long arch_syscall_addr(int nr); #endif /* CONFIG_FTRACE_SYSCALLS */ +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +void trace_disable_fsnotify(void); +void trace_enable_fsnotify(void); + +#else + +#define trace_disable_fsnotify() do { } while (0) +#define trace_enable_fsnotify() do { } while (0) + +#endif + #endif /* _LINUX_FTRACE_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 874c427742a9..440cd1a62722 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3374,6 +3374,7 @@ static void __sched notrace __schedule(bool preempt) struct rq *rq; int cpu; + trace_disable_fsnotify(); cpu = smp_processor_id(); rq = cpu_rq(cpu); prev = rq->curr; @@ -3449,6 +3450,7 @@ static void __sched notrace __schedule(bool preempt) } balance_callback(rq); + trace_enable_fsnotify(); } void __noreturn do_task_dead(void) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 80940939b733..1a38bcdb3652 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -225,6 +225,7 @@ static void cpuidle_idle_call(void) static void do_idle(void) { int cpu = smp_processor_id(); + trace_disable_fsnotify(); /* * If the arch has a polling bit, we maintain an invariant: * @@ -284,6 +285,7 @@ static void do_idle(void) smp_mb__after_atomic(); sched_ttwu_pending(); + /* schedule_idle() will call trace_enable_fsnotify() */ schedule_idle(); if (unlikely(klp_patch_pending(current))) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 
b52ed1ada0be..e22871d489a5 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2c92b3d9ea30..5dcc1147cbcc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1481,6 +1483,111 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) unsigned long __read_mostlytracing_thresh; +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static const struct file_operations tracing_max_lat_fops; +static struct workqueue_struct *fsnotify_wq; +static DEFINE_PER_CPU(atomic_t, notify_disabled) = ATOMIC_INIT(0); +static DEFINE_PER_CPU(struct llist_head, notify_list); + +static void trace_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void trace_create_maxlat_file(struct trace_array *tr, + struct dentry *d_tracer) +{ + INIT_WORK(>fsnotify_work, trace_fsnotify_workfn); + atomic_set(>notify_pending, 0); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, +
[PATCH v4 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 21 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 38 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..34b0424a32dc 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 17102fd4c136..46ebba6e5fb9 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2358,6 +2358,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool 
do_cond_resched, retry; if (console_suspended) { @@ -2458,9 +2459,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index f95027813296..31cdc2e4c382 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1257,6 +1257,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
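With a patch like the one above applied, the new flag would be toggled like any other trace option. A hedged example of how it might be combined with the irqsoff tracer — the paths assume tracefs mounted at /sys/kernel/tracing:

```shell
cd /sys/kernel/tracing
echo irqsoff > current_tracer
# include console/printk sections in the measured critical sections
echo 1 > options/console-latency
echo 0 > tracing_max_latency
echo 1 > tracing_on
```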
[PATCH v4 3/4] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1211 +++ 3 files changed, 1239 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 3dfd72ae6c1a..f7465abf3a0a 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -29,6 +29,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -62,7 +63,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -98,7 +99,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -106,7 +107,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -132,7 +133,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install + wmi_install pci_install debugging_install trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -140,7 +141,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -178,6 +179,7 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ - gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean + gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean \ + debugging_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile 
b/tools/trace/Makefile new file mode 100644 index ..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for vm tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c +
[PATCH v4 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 5d965cef6c77..aab8909dfbff 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -740,9 +740,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
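Putting the module parameters and the sysfs trigger together, a test run might look like the following. This is a hedged sketch; the trigger path is the one named in the commit message, and the exact write value accepted by the trigger is an assumption:

```shell
# arm the irqsoff tracer first
echo irqsoff > /sys/kernel/tracing/current_tracer
echo 0 > /sys/kernel/tracing/tracing_max_latency
# generate a burst of 10 irq-disabled sections of 500 us each
modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=10
# repeat the burst without reloading the module
echo 1 > /sys/kernel/preemptirq_delay_test/trigger
```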
[PATCH v4 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. This patch also adds four new tracepoints, for entering/exiting __schedule() and do_idle(). Those tracepoints are needed to disable and enable the notification facility because we cannot wake up the workqueue from these critical sections without risking a deadlock. Similar problems would also arise if we try to schedule a tasklet, raise a softirq, or wake up a kernel thread. If a notification event would happen in the forbidden sections, we schedule the fsnotify work as soon as we have exited do_idle()/__schedule(). 
Signed-off-by: Viktor Rosendahl --- include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/sched/core.c | 2 + kernel/sched/idle.c | 2 + kernel/trace/trace.c | 168 ++- kernel/trace/trace.h | 12 +++ kernel/trace/trace_hwlat.c | 4 +- 7 files changed, 265 insertions(+), 3 deletions(-) diff --git a/include/trace/events/power.h b/include/trace/events/power.h index f7aece721aed..40ab747274bd 100644 --- a/include/trace/events/power.h +++ b/include/trace/events/power.h @@ -40,6 +40,46 @@ DEFINE_EVENT(cpu, cpu_idle, TP_ARGS(state, cpu_id) ); +/* + * Tracepoint for entering do_idle(): + */ +TRACE_EVENT(do_idle_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting do_idle(): + */ +TRACE_EVENT(do_idle_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + TRACE_EVENT(powernv_throttle, TP_PROTO(int chip_id, const char *reason, int pmax), diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index c8c7c7efb487..a1a1befea1c1 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -183,6 +183,46 @@ TRACE_EVENT(sched_switch, __entry->next_comm, __entry->next_pid, __entry->next_prio) ); +/* + * Tracepoint for entering __schedule(): + */ +TRACE_EVENT(sched_schedule_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting __schedule(): + */ +TRACE_EVENT(sched_schedule_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + /* * Tracepoint for a task 
being migrated: */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 102dfcf0a29a..c9d86fcc48f5 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3374,6 +3374,7 @@ static void __sched notrace __schedule(bool preempt) int cpu; cpu = smp_processor_id(); + trace_sched_schedule_enter(cpu); rq = cpu_rq(cpu); prev = rq->curr; @@ -3448,6 +3449,7 @@ static void __sched notrace __schedule(bool preempt) } balance_callback(rq); + trace_sched_schedule_exit(cpu); } void __noreturn do_task_dead(void) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index f5516bae0c1b..e328e045c6e8 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -224,6 +224,7 @@ static void cpuidle_idle_call(void) static void do_idle(void) { int cpu = smp_processor_id(); + trace_do_idle_enter(cpu); /* * If the arch has a polling bit, we maintain an invariant: * @@ -287,6 +288,7 @@ static void do_idle(void) if (unlikely(klp_patch_pending(current))) klp_update_patch_state(current); + trace_do_idle_exit(cpu); } bool cpu_in_idle(unsigned long pc) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c ind
[PATCH v4 0/4] Some new features for the latency tracers
Hello all, Changes in v4: - [PATCH 1/4]: * Improved support for multiple active trace instances by storing entries that cannot be notified immediately in a linked list instead of a single pointer. With multiple instances this could happen. * Register trace_maxlat_fsnotify_init() with late_initcall_sync(). Previously it was piggybacked on tracer_init_tracefs() and it got called more than once. * Added some comments. - [PATCH 3/4]: * Some of the code was incorrectly added in the next patch. Improved commit message. - [PATCH 4/4]: * Some code from the previous patch was incorrectly included here. This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been recorded. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected but doesn't apply cleanly on its own: [PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 21 + include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/printk/printk.c |6 +- kernel/sched/core.c |2 + kernel/sched/idle.c |2 + kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 145 ++- kernel/trace/trace.c | 168 +++- kernel/trace/trace.h | 13 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1211 ++ 15 files changed, 1671 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v3 0/4] Some new features for the latency tracers
On 5/14/19 6:51 PM, Steven Rostedt wrote: Hi Viktor, Note, as the merge window is open and I'm also currently at a conference, it may be a while before I can take a look at this. If you don't hear something from me in 10 days, feel free to send me a ping. -- Steve Hi Steve, Ok, good to know. I discovered a couple of annoying issues with this v3 of the series, so I will probably send out v4 soon. best regards, Viktor
[PATCH v3 3/4] Add the latency-collector to tools
bove, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 3dfd72ae6c1a..f7465abf3a0a 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -29,6 +29,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -62,7 +63,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -98,7 +99,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -106,7 +107,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) -cgroup_install firewire_install gpio_install hv_install 
iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -132,7 +133,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install + wmi_install pci_install debugging_install trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -140,7 +141,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -178,6 +179,7 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ - gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean + gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean \ + debugging_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: + $(RM) latency-collector + +sbindir ?= /usr/sbin + +install: all + install -d
[PATCH v3 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h| 21 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h| 1 + kernel/trace/trace_irqsoff.c| 12 tools/trace/latency-collector.c | 11 +-- 5 files changed, 43 insertions(+), 8 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..34b0424a32dc 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 02ca827b8fac..3a18b7208399 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2348,6 +2348,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long 
flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2448,9 +2449,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 3456d6a47a47..ca10ad40f2f8 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1250,6 +1250,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; diff --git a/tools/trace/latency-collector.c b/tools/trace/latency-collector.c index dfc364d7836a..8cf3c74d998f 100644 --- a/tools/trace/latency-collector.c +++ b/tools/trace/latency-collector.c @@
[PATCH v3 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. This patch also adds four new tracepoints, for entering/exiting __schedule() and do_idle(). Those tracepoints are needed to disable and enable the notification facility because we cannot wake up the workqueue from these critical sections without risking a deadlock. Similar problems would also arise if we try to schedule a tasklet, raise a softirq, or wake up a kernel thread. If a notification event occurs in the forbidden sections, we schedule the fsnotify work as soon as we have exited do_idle()/__schedule(). 
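[Editorial note: the deferral described above can be modeled in a few lines. The sketch below is purely illustrative and is not code from this patch; all names (queue_fsnotify_work(), the two flags) are stand-ins, and the kernel implementation would use per-CPU state and the real workqueue API, driven from the exit tracepoints added here.]

```c
#include <assert.h>

/*
 * Userspace model of the deferral: a notification that fires inside
 * __schedule()/do_idle() must not call queue_work(), so it is only
 * flagged as pending and issued later from the exit tracepoint.
 */
static int work_queued;		/* stand-in counter for queue_work() calls */
static int notify_pending;
static int in_forbidden_section;

static void queue_fsnotify_work(void)
{
	work_queued++;
}

static void latency_fsnotify(void)
{
	if (in_forbidden_section) {
		notify_pending = 1;	/* defer: cannot wake anything here */
		return;
	}
	queue_fsnotify_work();
}

/* models the sched_schedule_exit/do_idle_exit tracepoint handlers */
static void forbidden_section_exit(void)
{
	in_forbidden_section = 0;
	if (notify_pending) {
		notify_pending = 0;
		queue_fsnotify_work();
	}
}
```

The point of the two-flag dance is that the notification is never lost: it is merely delayed until the first safe point after the forbidden section.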
Signed-off-by: Viktor Rosendahl --- include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/sched/core.c | 2 + kernel/sched/idle.c | 2 + kernel/trace/trace.c | 160 ++- kernel/trace/trace.h | 10 +++ kernel/trace/trace_hwlat.c | 4 +- 7 files changed, 255 insertions(+), 3 deletions(-) diff --git a/include/trace/events/power.h b/include/trace/events/power.h index f7aece721aed..40ab747274bd 100644 --- a/include/trace/events/power.h +++ b/include/trace/events/power.h @@ -40,6 +40,46 @@ DEFINE_EVENT(cpu, cpu_idle, TP_ARGS(state, cpu_id) ); +/* + * Tracepoint for entering do_idle(): + */ +TRACE_EVENT(do_idle_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting do_idle(): + */ +TRACE_EVENT(do_idle_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + TRACE_EVENT(powernv_throttle, TP_PROTO(int chip_id, const char *reason, int pmax), diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 9a4bdfadab07..6a3dae7b3249 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -183,6 +183,46 @@ TRACE_EVENT(sched_switch, __entry->next_comm, __entry->next_pid, __entry->next_prio) ); +/* + * Tracepoint for entering __schedule(): + */ +TRACE_EVENT(sched_schedule_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting __schedule(): + */ +TRACE_EVENT(sched_schedule_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + /* * Tracepoint for a task 
being migrated: */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 102dfcf0a29a..c9d86fcc48f5 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3374,6 +3374,7 @@ static void __sched notrace __schedule(bool preempt) int cpu; cpu = smp_processor_id(); + trace_sched_schedule_enter(cpu); rq = cpu_rq(cpu); prev = rq->curr; @@ -3448,6 +3449,7 @@ static void __sched notrace __schedule(bool preempt) } balance_callback(rq); + trace_sched_schedule_exit(cpu); } void __noreturn do_task_dead(void) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index f5516bae0c1b..e328e045c6e8 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -224,6 +224,7 @@ static void cpuidle_idle_call(void) static void do_idle(void) { int cpu = smp_processor_id(); + trace_do_idle_enter(cpu); /* * If the arch has a polling bit, we maintain an invariant: * @@ -287,6 +288,7 @@ static void do_idle(void) if (unlikely(klp_patch_pending(current))) klp_update_patch_state(current); + trace_do_idle_exit(cpu); } bool cpu_in_idle(unsigned long pc) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c ind
[PATCH v3 0/4] Some new features for the latency tracers
Hello all, Changes in v3: - [PATCH 1/4]: * I have generalized this to send the fsnotify() notifications for all tracers that use tracing_max_latency. * There are large additions of code to prevent queue_work() from being called while we are in __schedule() or do_idle(). This was a bug also in the previous version but had gone unnoticed. It became very visible when I tried to make it work with the wakeup tracers because they are always invoked from the sched_switch tracepoint. * The fsnotify notification is issued for tracing_max_latency, not trace. * I got rid of the kernel config option. The facility is always compiled when CONFIG_FSNOTIFY is enabled and any of the latency tracers are enabled. - [PATCH 2/4]: * No significant changes. - [PATCH 3/4]: * The latency-collector help messages have been tuned to the fact that it can be used also with wakeup and hwlat tracers. * I increased the size of the buffer for reading from /sys/kernel/debug/tracing/trace. * Adapted it to monitor tracing_max_latency instead of trace - [PATCH 4/4]: * I converted this from a kernel config option to a trace config option that can be changed at runtime. This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers always work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been recorded. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. 
The last patch is not directly connected but doesn't apply cleanly on its own: [PATCH 4/4] This adds the option console-latency to the trace options. best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 21 + include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/printk/printk.c |6 +- kernel/sched/core.c |2 + kernel/sched/idle.c |2 + kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 145 ++- kernel/trace/trace.c | 160 +++- kernel/trace/trace.h | 11 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1211 ++ 15 files changed, 1661 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
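[Editorial note: the randomized printout scheme that [PATCH 3/4] relies on can be sketched as follows. This is an illustrative model, not the actual latency-collector code; the names init_table() and toss_coin() are assumptions. With per-slot probabilities 1/size, ..., 1/2, 1, the i-th latency of a burst is printed immediately with probability table[i] given that all earlier ones were skipped, which gives each latency in a burst of at most size an equal 1/size overall chance of being captured.]

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TABLE 64

/* prob_table[i]: probability of printing the (i+1)-th latency of a
 * burst immediately, assuming at most table_size latencies per burst */
static double prob_table[MAX_TABLE];
static int table_size;

static void init_table(int size)
{
	int i;

	table_size = size;
	for (i = 0; i < size && i < MAX_TABLE; i++)
		prob_table[i] = 1.0 / (size - i);	/* 1/size, ..., 1/2, 1 */
}

/* unfair coin toss for the i-th latency: 1 = print now, 0 = sleep.
 * The real tool also grows the table when the last slot (prob. 1)
 * is ever reached, i.e. when a burst is longer than assumed. */
static int toss_coin(int i)
{
	return drand48() < prob_table[i];
}
```

For size 2 this yields the 1/2-and-1 table from the commit message; for size 3, the 1/3, 1/2, 1 table, and so on.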
[PATCH v3 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 5d965cef6c77..aab8909dfbff 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -740,9 +740,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
Re: [PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
On 5/6/19 1:01 AM, Steven Rostedt wrote: > On Mon, 6 May 2019 00:39:15 +0200 > Viktor Rosendahl wrote: > >> Can you explain more precisely what you agree with? >> >> The general idea of being able to trace bursts of latencies? > > One thing I have an issue with the current approach is the use of the > trace file for this. You mean that using fsnotify is kind of okayish but that it's confusing for the user because only a subset of tracers would send the fsnotify event when the trace file is updated? > > Hmm, what about adding a notifier to tracing_max_latency instead? And > do it not as a config option, but have it always enabled. It would send a > notification when it changes, and that only happens when there's a new > max latency. Would that work for you? Yes, it seems to be OK since the tracing_max_latency is updated also with the latest latency that exceeds the threshold when we are using tracing_thresh. I will try to send a new version of the patch series soon, with the modifications that have been discussed so far. best regards, Viktor
Re: [PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
On 5/4/19 6:47 PM, Joel Fernandes wrote: > On Wed, May 01, 2019 at 10:36:47PM +0200, Viktor Rosendahl wrote: >> immediately after each other in spite of the fact that the >> preempt/irqsoff tracers operate in overwrite mode. >> >> Signed-off-by: Viktor Rosendahl > > I agree with the general idea, but I don't really like how it is done in the > patch. > Can you explain more precisely what you agree with? The general idea of being able to trace bursts of latencies? Or the slightly more specific idea of notifying user space when /sys/kernel/debug/tracing/trace has new data, and letting user space do the rest? > We do have a notification mechanism already in the form of trace_pipe. Can we > not improve that in some way to be notified of new trace data? In theory, > the trace_pipe does fit into the description in the documentation: "Reads > from this file will block until new data is retrieved" > I am not quite sure what kind of solution you are after here. Could you be more specific? I think that it would be weird if we used trace_pipe to send a message with the meaning "new data is available in the trace file". To me this would seem like (re)inventing a special purpose alternative to inotify. Another option would be to allow the user to consume the latency traces directly from trace_pipe, without using the trace file. This would make sense to me and would indeed be a much better solution. If somebody is able to make such an implementation I am all for it. However, I do not know how to do this. I believe that it would lead towards a significant rewrite of the latency tracers, and probably also of how the ftrace buffering works. My solution is clearly a hack but it has the benefit of requiring quite small and straightforward changes to the kernel and the really ugly things are then done in user space. >> >> +config PREEMPTIRQ_FSNOTIFY > Does this have to be a CONFIG option? I prefer if the code automatically > does the notification and it is always enabled. 
I don't see any drawbacks of that. > I was just thinking that sending fsnotify events to a sysfs file is a bit off the beaten track and I figured that some people would like to opt out. It can of course be changed so that it is always enabled when a tracer that requires it is enabled, if there is general agreement on this point. >> + >> config SCHED_TRACER >> bool "Scheduling Latency Tracer" >> select GENERIC_TRACER >> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c >> index ca1ee656d6d8..ebefb8d4e072 100644 >> --- a/kernel/trace/trace.c >> +++ b/kernel/trace/trace.c >> @@ -44,6 +44,8 @@ >> #include >> #include >> #include >> +#include >> +#include >> >> #include "trace.h" >> #include "trace_output.h" >> @@ -8191,6 +8193,32 @@ static __init void create_trace_instances(struct >> dentry *d_tracer) >> return; >> } >> >> +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY >> + >> +static void trace_notify_workfn(struct work_struct *work) > [snip] > > I prefer if this facility is available to other tracers as well such as > the wakeup tracer which is similar in output (check > Documentation/trace/ftrace.txt). I believe this should be a generic trace > facility, and not tracer specific. > This should be possible. If we stick with the fsnotify idea, it's possible to move the functions for it to trace.c and to use it also from other tracers. It is also possible to do a few small adjustments to the latency-collector and to rename it to the trace-collector, so that it can work for the wakeup case also. The random sleep games probably don't make sense for the wakeup case but it's already a command line option. Below is a suggestion for how to make the facility more generic and usable by both preempt/irqsoff and wakeup. 
best regards, Viktor --- kernel/trace/Kconfig | 11 +++ kernel/trace/trace.c | 50 +-- kernel/trace/trace.h | 15 ++ kernel/trace/trace_irqsoff.c | 5 kernel/trace/trace_sched_wakeup.c | 7 - 5 files changed, 85 insertions(+), 3 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 8bd1d6d001d7..ca6a63004adf 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -244,6 +244,17 @@ config SCHED_TRACER This tracer tracks the latency of the highest priority task to be scheduled in, starting from the point it has woken up. +config TRACE_FSNOTIFY + bool "Generate fsnotify events for the trace file" + default n + depends on (IRQSOFF_TRACER || PREEMPT_TRACER || SCHED_TRACER) + depends on FSNOTIFY + help + This opt
Re: [PATCH v2 4/4] ftrace: Add an option for tracing console latencies
> Instead of this being turned into a nop, don't have a kconfig option > but instead have this call into the trace_irqsoff.c code, and depending > on what the options are, it should stop it. Of course, this would need > to be smart enough to pair it. Perhaps return the result of > console_stop_critical_timings() and have that passed to > console_start_critical_timings(), and only have start do something if > stop did something. This way the option only needs to disable the stop > part. Something like this? I have only compile tested it so far. best regards, Viktor --- include/linux/irqflags.h | 17 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 13 + 4 files changed, 35 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..e4f7ebc67a53 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,26 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +#define console_stop_critical_timings(FLAG)\ + do {\ + FLAG = console_tracing_disabled(); \ + if (FLAG) \ + stop_critical_timings();\ + } while(0) + +#define console_start_critical_timings(FLAG)\ + do { \ + if (FLAG)\ + start_critical_timings();\ + } while(0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(FLAG) do { } while (0) +# define console_start_critical_timings(FLAG) do { } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 02ca827b8fac..3a18b7208399 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2348,6 +2348,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ 
-2448,9 +2449,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 59dc01ac52fd..3eed4756dba3 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1253,6 +1253,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index 07a391e845de..659a82209c1e 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -490,6 +490,19 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & CONSOLE_LATENCY); +} + +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v2 4/4] ftrace: Add an option for tracing console latencies
This new option CONFIG_TRACE_CONSOLE_LATENCY will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 13 + kernel/printk/printk.c | 5 +++-- kernel/trace/Kconfig | 11 +++ 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..791bee718448 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -73,6 +73,19 @@ do { \ # define start_critical_timings() do { } while (0) #endif +#ifdef CONFIG_TRACE_CONSOLE_LATENCY + +#define console_stop_critical_timings() do {} while (0) +#define console_start_critical_timings() do {} while (0) + +#else /* !CONFIG_TRACE_CONSOLE_LATENCY */ + +/* don't trace print latency */ +#define console_stop_critical_timings() stop_critical_timings() +#define console_start_critical_timings() start_critical_timings() + +#endif /* !CONFIG_TRACE_CONSOLE_LATENCY */ + /* * Wrap the arch provided IRQ routines to provide appropriate checks. 
*/ diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 02ca827b8fac..710e87f61158 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2448,9 +2448,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e5e8f2a0199e..f168d100d4fb 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -244,6 +244,17 @@ config PREEMPT_TRACER modification of /sys/kernel/debug/tracing/trace through the inotify interface. + config TRACE_CONSOLE_LATENCY + bool "Do not turn off latency tracing for the console" + default n + depends on IRQSOFF_TRACER || PREEMPT_TRACER + help + Some of the console drivers will cause long unavoidable + latencies because they are slow and need to print immediately + in a serialized manner. Because of this their latencies are not + traced by default. This option will change behavior so that + they are traced. + config SCHED_TRACER bool "Scheduling Latency Tracer" select GENERIC_TRACER -- 2.17.1
[PATCH v2 3/4] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured. If the same burst of latencies is repeated, then sooner or later we will have captured all the latencies. The reason why it may be desirable to catch all latencies with a long test campaign is that for some organizations, it's necessary to test the kernel in the field and not practical for developers to work iteratively with field testers. Because of cost and project schedules it is not possible to start a new test campaign every time a latency problem has been fixed. It uses inotify to detect changes to /sys/kernel/debug/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss. If immediate print is chosen, it means that we open /sys/kernel/debug/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies. If sleep is chosen, it means that we wait before opening /sys/kernel/debug/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one. The probability for the unfair coin toss is chosen so that there is equal probability to obtain any of the latencies in a burst. However, this requires an assumption about how many latencies there can be. By default the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be: 1/2 and 1 Thus, the probability of getting each of the two latencies will be 1/2. 
If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to: 1/3, 1/2, and 1 Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is we see more than two closely occurring latencies, then the table will again be extended, and so on. On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 100 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has time until the end of the next latency to react, that is to open /sys/kernel/debug/tracing/trace. Thus, if we trace latencies that are >100 us, then we have at least 100 us to react. Example output: root@myhost:~# echo 100 > /sys/kernel/debug/tracing/tracing_thresh root@myhost:~# echo preemptirqsoff > /sys/kernel/debug/tracing/current_tracer root@myhost:~# latency-collector -rvv -a 11 & [1] 4915 root@myhost:~# Running with policy rr and priority 99. Using 3 print threads. /sys/kernel/debug/tracing/trace will be printed with random delay Start size of the probability table:11 Print a message when prob. 
table changes size:true Print a warning when an event has been lost:true Sleep time is: 1000 ms Initializing probability table to 11 Thread 4916 runs with policy rr and priority 99 Thread 4917 runs with policy rr and priority 99 Thread 4918 runs with policy rr and priority 99 Thread 4919 runs with policy rr and priority 99 root@myhost:~# modprobe preemptirq_delay_test delay=105 test_mode=alternate burst_size=10 2850.638308 Latency 1 printout skipped due to random delay 2850.638383 Latency 2 printout skipped due to random delay 2850.638497 Latency 3 immediate print >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> BEGIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< alternat-49226d...0us!: execute_preemptirqtest <-preemptirqtest_2 alternat-49226d... 106us : execute_preemptirqtest <-preemptirqtest_2 alternat-49226d... 106us : tracer_hardirqs_on <-preemptirqtest_2 alternat-49226d... 108us : => preemptirqtest_2 => preemptirq_delay_run => kthread => ret_from_fork >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> END <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometim
[PATCH v2 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 35e5fd3224f6..e5e8f2a0199e 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -750,9 +750,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
[PATCH v2 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

Changes in v2:
- I have tried to improve some of the commit messages by adding additional explanations.
- [PATCH 2/4]: We use burst size checking instead of the confusing modulo game. The example given in the Kconfig file is corrected and extended. I was not able to find a way to use the module parameters as a trigger, so I have left the sysfs trigger file as is.

This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers always work in overwrite mode.

[PATCH 1/4] This implements fs notification for preempt/irqsoff. It makes it possible for userspace to detect when a new latency has been detected.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected but doesn't apply cleanly on its own:

[PATCH 4/4] This adds the option CONFIG_TRACE_CONSOLE_LATENCY to decide whether we want to trace prints to the console or not.
best regards,
Viktor Rosendahl

Viktor Rosendahl (4):
  ftrace: Implement fs notification for preempt/irqsoff tracers
  preemptirq_delay_test: Add the burst feature and a sysfs trigger
  Add the latency-collector to tools
  ftrace: Add an option for tracing console latencies

 include/linux/irqflags.h | 13 +
 kernel/printk/printk.c | 5 +-
 kernel/trace/Kconfig | 27 +-
 kernel/trace/preemptirq_delay_test.c | 145 +++-
 kernel/trace/trace.c | 31 +-
 kernel/trace/trace.h | 5 +
 kernel/trace/trace_irqsoff.c | 35 +
 tools/Makefile | 14 +-
 tools/trace/Makefile | 20 +
 tools/trace/latency-collector.c | 1190 ++
 10 files changed, 1453 insertions(+), 32 deletions(-)
 create mode 100644 tools/trace/Makefile
 create mode 100644 tools/trace/latency-collector.c
--
2.17.1
[PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
This patch implements the feature that the trace file, e.g. /sys/kernel/debug/tracing/trace will receive notifications through the fsnotify framework when a new trace is available. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 10 ++ kernel/trace/trace.c | 31 +-- kernel/trace/trace.h | 5 + kernel/trace/trace_irqsoff.c | 35 +++ 4 files changed, 79 insertions(+), 2 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 8bd1d6d001d7..35e5fd3224f6 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -234,6 +234,16 @@ config PREEMPT_TRACER enabled. This option and the irqs-off timing option can be used together or separately.) + config PREEMPTIRQ_FSNOTIFY + bool "Generate fsnotify events for the latency tracers" + default n + depends on (IRQSOFF_TRACER || PREEMPT_TRACER) && FSNOTIFY + help + This option will enable the generation of fsnotify events for the + trace file. This makes it possible for userspace to be notified about + modification of /sys/kernel/debug/tracing/trace through the inotify + interface. 
+ config SCHED_TRACER bool "Scheduling Latency Tracer" select GENERIC_TRACER diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index ca1ee656d6d8..ebefb8d4e072 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include "trace.h" #include "trace_output.h" @@ -8191,6 +8193,32 @@ static __init void create_trace_instances(struct dentry *d_tracer) return; } +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static void trace_notify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + notify_work); + fsnotify(tr->d_trace->d_inode, FS_MODIFY, tr->d_trace->d_inode, +FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void trace_create_trace_file(struct trace_array *tr, + struct dentry *d_tracer) +{ + /* For notify we need to init the work structure and save the pointer */ + INIT_WORK(>notify_work, trace_notify_workfn); + tr->d_trace = trace_create_file("trace", 0644, d_tracer, tr, + _fops); +} + +#else /* !CONFIG_PREEMPTIRQ_FSNOTIFY */ + +#define trace_create_trace_file(tr, d_tracer) \ + trace_create_file("trace", 0644, d_tracer, tr, _fops) + +#endif + static void init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) { @@ -8209,8 +8237,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) trace_create_file("trace_options", 0644, d_tracer, tr, _iter_fops); - trace_create_file("trace", 0644, d_tracer, - tr, _fops); + trace_create_trace_file(tr, d_tracer); trace_create_file("trace_pipe", 0444, d_tracer, tr, _pipe_fops); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index d80cee49e0eb..59dc01ac52fd 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -17,6 +17,7 @@ #include #include #include +#include #ifdef CONFIG_FTRACE_SYSCALLS #include /* For NR_SYSCALLS */ @@ -302,6 +303,10 @@ struct trace_array { struct dentry *options; struct dentry *percpu_dir; struct dentry *event_dir; +#ifdef 
CONFIG_PREEMPTIRQ_FSNOTIFY + struct dentry *d_trace; + struct work_struct notify_work; +#endif struct trace_options*topts; struct list_headsystems; struct list_headevents; diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..07a391e845de 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -82,6 +82,31 @@ static inline int irqsoff_display_graph(struct trace_array *tr, int set) */ static __cacheline_aligned_in_smp unsigned long max_sequence; +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static struct workqueue_struct *notify_wq; + +static __init void trace_file_notify_init(void) +{ + notify_wq = alloc_workqueue("irqsoff_notify_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!notify_wq) + pr_err("Unable to allocate irqsoff_notify_wq"); +} + +static inline void trac
[PATCH 3/4] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured.

It uses inotify to detect changes to /sys/kernel/debug/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss.

If immediate print is chosen, it means that we open /sys/kernel/debug/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies.

If sleep is chosen, it means that we wait before opening /sys/kernel/debug/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one.

The probability for the unfair coin toss is chosen so that there is equal probability to obtain any of the latencies in a burst. However, this requires an assumption about how many latencies there can be. By default, the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be: 1/2 and 1. Thus, the probability of getting each of the two latencies will be 1/2.

If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to: 1/3, 1/2, and 1. Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is, we see more than two closely occurring latencies, then the table will again be extended, and so on.
On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 100 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has time until the end of the next latency to react, that is, to open /sys/kernel/debug/tracing/trace. Thus, if we trace latencies that are >100 us, then we have at least 100 us to react.

Example output:

root@myhost:~# echo 100 > /sys/kernel/debug/tracing/tracing_thresh
root@myhost:~# echo preemptirqsoff > /sys/kernel/debug/tracing/current_tracer
root@myhost:~# latency-collector -rvv -a 11 &
[1] 4915
root@myhost:~# Running with policy rr and priority 99. Using 3 print threads.
/sys/kernel/debug/tracing/trace will be printed with random delay
Start size of the probability table: 11
Print a message when prob. table changes size: true
Print a warning when an event has been lost: true
Sleep time is: 1000 ms
Initializing probability table to 11
Thread 4916 runs with policy rr and priority 99
Thread 4917 runs with policy rr and priority 99
Thread 4918 runs with policy rr and priority 99
Thread 4919 runs with policy rr and priority 99
root@myhost:~# modprobe preemptirq_delay_test delay=105 test_mode=alternate burst_size=10
2850.638308 Latency 1 printout skipped due to random delay
2850.638383 Latency 2 printout skipped due to random delay
2850.638497 Latency 3 immediate print
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> BEGIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
alternat-4922 6d... 0us!: execute_preemptirqtest <-preemptirqtest_2
alternat-4922 6d... 106us : execute_preemptirqtest <-preemptirqtest_2
alternat-4922 6d... 106us : tracer_hardirqs_on <-preemptirqtest_2
alternat-4922 6d... 108us :
=> preemptirqtest_2
=> preemptirq_delay_run
=> kthread
=> ret_from_fork
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> END <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

In the example above, we randomly happened to get the third latency in a burst of 10.
If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get:

3054.078294 Latency 44 printout skipped due to inotify loop

...or:

3211.957685 Latency 112 printout skipped due to print loop

In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile | 14 +-
 tools/trac
[PATCH 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/preemptirq_delay_test.c | 139 +++ 1 file changed, 120 insertions(+), 19 deletions(-) diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..1fc3cdbdd293 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018 BMW Car IT GmbH */ #include @@ -10,18 +11,23 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); static void busy_wait(ulong time) { @@ -34,37 +40,132 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static 
__always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} + +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ + +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirqtest_5, + preemptirqtest_6, + preemptirqtest_7, + preemptirqtest_8, + preemptirqtest_9, +}; + +#define NR_TEST_FUNCS ARRAY_SIZE(testfuncs) + +static int preemptirq_delay_run(void *data) +{ + + int i; + + for (i = 0; i < burst_size; i++) + (testfuncs[i % NR_TEST_FUNCS])(i); return 0; } -static int __init preemptirq_delay_init(void) +static struct task_struct *preemptirq_start_test(void) { char task_name[50]; - struct task_struct *test_task; snprintf(task_name, sizeof(task_name), "%s_test", test_mode); + return kthread_run(preemptirq_delay_run, NULL, task_name); +} + + +static ssize_t trigger_store(struct kobject *kobj, struct kobj_attribute *attr, +const char *buf, size_t count) +{ + preemptirq_start_test(); + return count; +} + +static struct kobj_attribute 
trigger_attribute = + __ATTR(trigger, 0200, NULL, trigger_store); + +static struct attribute *attrs[] = { + _attribute.attr, + NULL, +}; + +static struct attribute_group att
[PATCH 4/4] ftrace: Add an option for tracing console latencies
This new option CONFIG_TRACE_CONSOLE_LATENCY will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console.

Signed-off-by: Viktor Rosendahl
---
 include/linux/irqflags.h | 13 +
 kernel/printk/printk.c | 5 +++--
 kernel/trace/Kconfig | 11 +++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c92c377..791bee718448 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -73,6 +73,19 @@ do { \
 # define start_critical_timings() do { } while (0)
 #endif
+
+#ifdef CONFIG_TRACE_CONSOLE_LATENCY
+
+#define console_stop_critical_timings() do {} while (0)
+#define console_start_critical_timings() do {} while (0)
+
+#else /* !CONFIG_TRACE_CONSOLE_LATENCY */
+
+/* don't trace print latency */
+#define console_stop_critical_timings() stop_critical_timings()
+#define console_start_critical_timings() start_critical_timings()
+
+#endif /* !CONFIG_TRACE_CONSOLE_LATENCY */
+
 /*
  * Wrap the arch provided IRQ routines to provide appropriate checks.
 */

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d3d170374ceb..303b8f656009 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2404,9 +2404,10 @@ void console_unlock(void)
 		 */
 		console_lock_spinning_enable();
-		stop_critical_timings();	/* don't trace print latency */
+		/* don't trace print latency if it's disabled */
+		console_stop_critical_timings();
 		call_console_drivers(ext_text, ext_len, text, len);
-		start_critical_timings();
+		console_start_critical_timings();
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index ca80cb08bd39..5063b32de66c 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -244,6 +244,17 @@ config PREEMPT_TRACER
 	  modification of /sys/kernel/debug/tracing/trace through the inotify
 	  interface.
+
+config TRACE_CONSOLE_LATENCY
+	bool "Do not turn off latency tracing for the console"
+	default n
+	depends on IRQSOFF_TRACER || PREEMPT_TRACER
+	help
+	  Some of the console drivers will cause long unavoidable
+	  latencies because they are slow and need to print immediately
+	  in a serialized manner. Because of this their latencies are not
+	  traced by default. This option will change behavior so that
+	  they are traced.
+
 config SCHED_TRACER
 	bool "Scheduling Latency Tracer"
 	select GENERIC_TRACER
--
2.17.1
[PATCH 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
This patch implements the feature that the trace file, e.g. /sys/kernel/debug/tracing/trace will receive notifications through the fsnotify framework when a new trace is available. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 10 ++ kernel/trace/trace.c | 31 +-- kernel/trace/trace.h | 5 + kernel/trace/trace_irqsoff.c | 35 +++ 4 files changed, 79 insertions(+), 2 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index fa8b1fe824f3..ca80cb08bd39 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -234,6 +234,16 @@ config PREEMPT_TRACER enabled. This option and the irqs-off timing option can be used together or separately.) + config PREEMPTIRQ_FSNOTIFY + bool "Generate fsnotify events for the latency tracers" + default n + depends on (IRQSOFF_TRACER || PREEMPT_TRACER) && FSNOTIFY + help + This option will enable the generation of fsnotify events for the + trace file. This makes it possible for userspace to be notified about + modification of /sys/kernel/debug/tracing/trace through the inotify + interface. 
+ config SCHED_TRACER bool "Scheduling Latency Tracer" select GENERIC_TRACER diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index c521b7347482..c7db01101ef2 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include "trace.h" #include "trace_output.h" @@ -7977,6 +7979,32 @@ static __init void create_trace_instances(struct dentry *d_tracer) return; } +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static void trace_notify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + notify_work); + fsnotify(tr->d_trace->d_inode, FS_MODIFY, tr->d_trace->d_inode, +FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void trace_create_trace_file(struct trace_array *tr, + struct dentry *d_tracer) +{ + /* For notify we need to init the work structure and save the pointer */ + INIT_WORK(>notify_work, trace_notify_workfn); + tr->d_trace = trace_create_file("trace", 0644, d_tracer, tr, + _fops); +} + +#else /* !CONFIG_PREEMPTIRQ_FSNOTIFY */ + +#define trace_create_trace_file(tr, d_tracer) \ + trace_create_file("trace", 0644, d_tracer, tr, _fops) + +#endif + static void init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) { @@ -7995,8 +8023,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) trace_create_file("trace_options", 0644, d_tracer, tr, _iter_fops); - trace_create_file("trace", 0644, d_tracer, - tr, _fops); + trace_create_trace_file(tr, d_tracer); trace_create_file("trace_pipe", 0444, d_tracer, tr, _pipe_fops); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 08900828d282..a6769438a809 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -17,6 +17,7 @@ #include #include #include +#include #ifdef CONFIG_FTRACE_SYSCALLS #include /* For NR_SYSCALLS */ @@ -257,6 +258,10 @@ struct trace_array { struct dentry *options; struct dentry *percpu_dir; struct dentry *event_dir; +#ifdef 
CONFIG_PREEMPTIRQ_FSNOTIFY + struct dentry *d_trace; + struct work_struct notify_work; +#endif struct trace_options*topts; struct list_headsystems; struct list_headevents; diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index d3294721f119..676413504566 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -81,6 +81,31 @@ static inline int irqsoff_display_graph(struct trace_array *tr, int set) */ static __cacheline_aligned_in_smp unsigned long max_sequence; +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static struct workqueue_struct *notify_wq; + +static __init void trace_file_notify_init(void) +{ + notify_wq = alloc_workqueue("irqsoff_notify_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!notify_wq) + pr_err("Unable to allocate irqsoff_notify_wq"); +} + +static inline void trac
[PATCH 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers always work in overwrite mode.

[PATCH 1/4] This implements fs notification for preempt/irqsoff. It makes it possible for userspace to detect when a new latency has been detected.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected but doesn't apply cleanly on its own:

[PATCH 4/4] This adds the option CONFIG_TRACE_CONSOLE_LATENCY to decide whether we want to trace prints to the console or not.

best regards,
Viktor Rosendahl

Viktor Rosendahl (4):
  ftrace: Implement fs notification for preempt/irqsoff tracers
  preemptirq_delay_test: Add the burst feature and a sysfs trigger
  Add the latency-collector to tools
  ftrace: Add an option for tracing console latencies

 include/linux/irqflags.h | 13 +
 kernel/printk/printk.c | 5 +-
 kernel/trace/Kconfig | 21 +
 kernel/trace/preemptirq_delay_test.c | 139 ++-
 kernel/trace/trace.c | 31 +-
 kernel/trace/trace.h | 5 +
 kernel/trace/trace_irqsoff.c | 35 +
 tools/Makefile | 14 +-
 tools/trace/Makefile | 20 +
 tools/trace/latency-collector.c | 1190 ++
 10 files changed, 1444 insertions(+), 29 deletions(-)
 create mode 100644 tools/trace/Makefile
 create mode 100644 tools/trace/latency-collector.c
--
2.17.1