[PATCH 1/1] tracing/tools: fix a couple of spelling mistakes
From: Colin Ian King

There is a spelling mistake in the -g help option, I believe it should be "graph". There is also a spelling mistake in a warning message. Fix both mistakes.

Signed-off-by: Colin Ian King
Signed-off-by: Viktor Rosendahl
---
 tools/tracing/latency/latency-collector.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/tracing/latency/latency-collector.c b/tools/tracing/latency/latency-collector.c
index 57b20802e71b..b69de9263ee6 100644
--- a/tools/tracing/latency/latency-collector.c
+++ b/tools/tracing/latency/latency-collector.c
@@ -1650,7 +1650,7 @@ static void start_printthread(void)
 	if (ufd < 0 || read(ufd, seed, sizeof(*seed)) != sizeof(*seed)) {
 		printf(
-"Warning! Using trivial random nummer seed, since %s not available\n",
+"Warning! Using trivial random number seed, since %s not available\n",
 			DEV_URANDOM);
 		fflush(stdout);
 		*seed = i;
@@ -1711,8 +1711,8 @@ static void show_usage(void)
 "\t\t\tbeginning, end, and backtrace.\n\n"
 "-g, --graph\t\tEnable the display-graph option in trace_option. This\n"
-"\t\t\toption causes ftrace to show the functionph of how\n"
-"\t\t\tfunctions are calling other functions.\n\n"
+"\t\t\toption causes ftrace to show the graph of how functions\n"
+"\t\t\tare calling other functions.\n\n"
 "-c, --policy POL\tRun the program with scheduling policy POL. POL can be\n"
 "\t\t\tother, batch, idle, rr or fifo. The default is rr. When\n"
-- 
2.25.1
[PATCH 0/1] Fix for the latency-collector
Hi Steve,

I am not sure what the correct approach is to handle this latency-collector fix. Can you take it into your queue?

best regards,

Viktor

Colin Ian King (1):
  tracing/tools: fix a couple of spelling mistakes

 tools/tracing/latency/latency-collector.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.25.1
[PATCH v12 0/1] Add the latency-collector to tools
Changes since v11:

* I added a .gitignore in the tools/tracing/latency directory.

Changes since v10:

* Using libtracefs to control ftrace and find /sys/kernel/tracing
* New options to list available tracers, select the tracer and other ftrace parameters
* A signal handler to restore the ftrace state when the program terminates
* Cleaned up the commit message a bit, removed example output etc.

Viktor Rosendahl (1):
  Add the latency-collector to tools

 tools/Makefile                            |   14 +-
 tools/tracing/Makefile                    |   19 +
 tools/tracing/latency/.gitignore          |    2 +
 tools/tracing/latency/Makefile            |   24 +
 tools/tracing/latency/latency-collector.c | 2108 +
 5 files changed, 2161 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency/.gitignore
 create mode 100644 tools/tracing/latency/Makefile
 create mode 100644 tools/tracing/latency/latency-collector.c

-- 
2.25.1
[PATCH v12 1/1] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured. If the same burst of latencies is repeated, then sooner or later we will have captured all the latencies.

It also works with the wakeup_dl, wakeup_rt, and wakeup tracers. However, in that case it is probably not useful to use the random sleep functionality.

The reason why it may be desirable to catch all latencies with a long test campaign is that for some organizations it's necessary to test the kernel in the field, and it is not practical for developers to work iteratively with field testers. Because of cost and project schedules, it is not possible to start a new test campaign every time a latency problem has been fixed.

The tool uses inotify to detect changes to /sys/kernel/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss.

If immediate print is chosen, it means that we open /sys/kernel/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies.

If sleep is chosen, it means that we wait before opening /sys/kernel/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one.

The probability for the unfair coin toss is chosen so that there is an equal probability of obtaining any of the latencies in a burst. However, this requires an assumption about how many latencies there can be in a burst.
By default the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be:

1/2 and 1

Thus, the probability of getting each of the two latencies will be 1/2.

If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to:

1/3, 1/2, and 1

Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is, we see more than two closely occurring latencies, then the table will again be extended, and so on.

On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 300 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has until the end of the next latency to react, that is, to open /sys/kernel/tracing/trace. Thus, if we trace latencies that are >300 us, then we have at least 300 us to react. The minimum latency will of course not be 300 us on all systems; it will depend on the hardware, kernel version, workload and configuration.

Example usage:

In one shell, give the following command:

sudo latency-collector -rvv -t preemptirqsoff -s 2000 -a 3

This will trace latencies > 2000us with the preemptirqsoff tracer, using random sleep with maximum verbosity, with a probability table initialized to a size of 3.

In another shell, generate a few bursts of latencies:

root@host:~# modprobe preemptirq_delay_test delay=3000 test_mode=alternate burst_size=3
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger

If all goes well, you should be getting stack traces that show all the different latencies, i.e. you should see all three functions preemptirqtest_0, preemptirqtest_1, preemptirqtest_2 in the stack traces.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile                            |   14 +-
 tools/tracing/Makefile                    |   19 +
 tools/tracing/latency/.gitignore          |    2 +
 tools/tracing/latency/Makefile            |   24 +
 tools/tracing/latency/latency-collector.c | 2108 +
 5 files changed, 2161 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency/.gitignore
 create mode 100644 tools/tracing/latency/Makefile
 create mode 100644 tools/tracing/latency/latency-collector.c

diff --git a/tools/Makefile b/tools/Makefile
index 85af6ebbce91..7e9d34ddd74c 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -31,6 +31,7 @@ help:
 	@echo '  bootconfig - boot config tool'
 	@echo '  spi        - spi tools'
 	@echo '  tmon       - thermal monitoring and tuning tool'
+	@echo '  tracing    - misc tracing tools'
 	@echo '  turbostat  - Intel CPU i
[PATCH v11] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured. If the same burst of latencies is repeated, then sooner or later we will have captured all the latencies.

It also works with the wakeup_dl, wakeup_rt, and wakeup tracers. However, in that case it is probably not useful to use the random sleep functionality.

The reason why it may be desirable to catch all latencies with a long test campaign is that for some organizations it's necessary to test the kernel in the field, and it is not practical for developers to work iteratively with field testers. Because of cost and project schedules, it is not possible to start a new test campaign every time a latency problem has been fixed.

The tool uses inotify to detect changes to /sys/kernel/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss.

If immediate print is chosen, it means that we open /sys/kernel/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies.

If sleep is chosen, it means that we wait before opening /sys/kernel/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one.

The probability for the unfair coin toss is chosen so that there is an equal probability of obtaining any of the latencies in a burst. However, this requires an assumption about how many latencies there can be in a burst.
By default the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be:

1/2 and 1

Thus, the probability of getting each of the two latencies will be 1/2.

If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to:

1/3, 1/2, and 1

Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is, we see more than two closely occurring latencies, then the table will again be extended, and so on.

On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 300 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has until the end of the next latency to react, that is, to open /sys/kernel/tracing/trace. Thus, if we trace latencies that are >300 us, then we have at least 300 us to react. The minimum latency will of course not be 300 us on all systems; it will depend on the hardware, kernel version, workload and configuration.

Example usage:

In one shell, give the following command:

sudo latency-collector -rvv -t preemptirqsoff -s 2000 -a 3

This will trace latencies > 2000us with the preemptirqsoff tracer, using random sleep with maximum verbosity, with a probability table initialized to a size of 3.

In another shell, generate a few bursts of latencies:

root@host:~# modprobe preemptirq_delay_test delay=3000 test_mode=alternate burst_size=3
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger
root@host:~# echo 1 > /sys/kernel/preemptirq_delay_test/trigger

If all goes well, you should be getting stack traces that show all the different latencies, i.e. you should see all three functions preemptirqtest_0, preemptirqtest_1, preemptirqtest_2 in the stack traces.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile                            |   14 +-
 tools/tracing/Makefile                    |   19 +
 tools/tracing/latency/Makefile            |   24 +
 tools/tracing/latency/latency-collector.c | 2108 +
 4 files changed, 2159 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency/Makefile
 create mode 100644 tools/tracing/latency/latency-collector.c

diff --git a/tools/Makefile b/tools/Makefile
index 85af6ebbce91..7e9d34ddd74c 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -31,6 +31,7 @@ help:
 	@echo '  bootconfig - boot config tool'
 	@echo '  spi        - spi tools'
 	@echo '  tmon       - thermal monitoring and tuning tool'
+	@echo '  tracing    - misc tracing tools'
 	@echo '  turbostat  - Intel CPU idle stats and freq reporting tool'
 	@echo '  usb        - USB testing tools'
[PATCH] Add the latency-collector to tools
all objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install bootconfig_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install tracing_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -137,7 +138,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + tracing_install acpi_clean: $(call descend,power/acpi,clean) @@ -145,7 +147,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean tracing_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -184,6 +186,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean tracing_clean .PHONY: FORCE diff --git a/tools/tracing/Makefile b/tools/tracing/Makefile new file mode 100644 index ..f53859765154 --- /dev/null +++ b/tools/tracing/Makefile @@ -0,0 +1,23 @@ +# 
SPDX-License-Identifier: GPL-2.0 +# Makefile for vm tools +# +VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null) +VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null) + +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS) +LDFLAGS = -lpthread $(VAR_LDLIBS) + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: + $(RM) latency-collector + +sbindir ?= /usr/sbin + +install: all + install -d $(DESTDIR)$(sbindir) + install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir) diff --git a/tools/tracing/latency-collector.c b/tools/tracing/latency-collector.c new file mode 100644 index ..e9aa7a47a8a3 --- /dev/null +++ b/tools/tracing/latency-collector.c @@ -0,0 +1,2102 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2017, 2018, 2019, 2021 BMW Car IT GmbH + * Author: Viktor Rosendahl (viktor.rosend...@bmw.de) + */ + +#define _GNU_SOURCE +#define _POSIX_C_SOURCE 200809L + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static const char *prg_name; +static const char *prg_unknown = "unknown program name"; + +static int fd_stdout; + +static int sched_policy; +static bool sched_policy_set; + +static int sched_pri; +static bool sched_pri_set; + +static bool trace_enable = true; +static bool setup_ftrace = true; +static bool use_random_sleep; + +enum traceopt { + OPTIDX_FUNC_TR = 0, + OPTIDX_DISP_GR, + OPTIDX_NR +}; + +/* Should be in the order of enum traceopt */ +static const char * const optstr[] = { + "function-trace", /* OPTIDX_FUNC_TR */ + "display-graph", /* OPTIDX_DISP_GR */ + NULL +}; + +enum errhandling { + ERR_EXIT = 0, + ERR_WARN, + ERR_CLEANUP, +}; + +static bool use_options[OPTIDX_NR]; + +static char inotify_buffer[655360]; + +#define likely(x) __builtin_expect(!!(x), 1) +#define unlikely(x)__builtin_expect(!!(x), 0) +#define bool2str(x)(x ? 
"true":"false") + +#define DEFAULT_NR_PRINTER_THREADS (3) +static unsigned int nr_threads = DEFAULT_NR_PRINTER_THREADS; + +#define DEFAULT_TABLE_SIZE (2) +static unsigned int table_startsize = DEFAULT_TABLE_SIZE; + +static int verbosity; + +#define verbose_sizechange() (verbosity >= 1) +#define verbose_lostevent() (verbosity >= 2) +#define verbose_ftrace() (verbosity >= 1) + +#define was_changed(ORIG, CUR) (strcmp(ORIG, CUR) != 0) +#define needs_change(CUR, WANTED) (strcmp(CUR, WANTED) != 0) + +static const char *debug_tracefile; +static const char *debug_tracefile_dflt; +static const char *debu
[RFC PATCH 1/2] Use pause-on-trace with the latency tracers
Earlier, tracing was disabled when reading the trace file. This behavior was changed with commit 06e0a548bad0 ("tracing: Do not disable tracing when reading the trace file"). This doesn't seem to work with the latency tracers.

The above mentioned commit did not only change the behavior but also added an option to emulate the old behavior. The idea with this patch is to enable this pause-on-trace option when the latency tracers are used.

This is a workaround; perhaps it would be better to make the latency tracers work without pausing, but I am not sure how to do that, or even how feasible it is without significant rework.

Signed-off-by: Viktor Rosendahl
---
 kernel/trace/trace_irqsoff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index d06aab4dcbb8..6756379b661f 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -562,6 +562,8 @@ static int __irqsoff_tracer_init(struct trace_array *tr)
 	/* non overwrite screws up the latency tracers */
 	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1);
 	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1);
+	/* without pause, we will produce garbage if another latency occurs */
+	set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, 1);

 	tr->max_latency = 0;
 	irqsoff_trace = tr;
@@ -583,11 +585,13 @@ static void __irqsoff_tracer_reset(struct trace_array *tr)
 {
 	int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT;
 	int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE;
+	int pause_flag = save_flags & TRACE_ITER_PAUSE_ON_TRACE;

 	stop_irqsoff_tracer(tr, is_graph(tr));
 	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag);
 	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag);
+	set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, pause_flag);

 	ftrace_reset_array_ops(tr);
 	irqsoff_busy = false;
-- 
2.25.1
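For reference, the flag this patch sets in the kernel corresponds to the tracefs option introduced by the commit mentioned above, so the same effect can be reproduced by hand. This is a usage sketch, assuming tracefs is mounted at /sys/kernel/tracing and the commands are run as root:

```shell
# Select a latency tracer, then enable pause-on-trace so that reading
# the trace file pauses tracing, as it did before commit 06e0a548bad0.
echo irqsoff > /sys/kernel/tracing/current_tracer
echo 1 > /sys/kernel/tracing/options/pause-on-trace
cat /sys/kernel/tracing/trace   # tracing is paused while this read runs
```

The patch merely automates this for the irqsoff family of tracers, so that a plain `cat` of the trace file does not get garbled by a new latency arriving mid-read.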
[RFC PATCH 2/2] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10.

Sometimes, there will be lost latencies, so that we get:

3054.078294 Latency 44 printout skipped due to inotify loop

..or:

3211.957685 Latency 112 printout skipped due to print loop

In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +
 3 files changed, 1240 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

diff --git a/tools/Makefile b/tools/Makefile
index 85af6ebbce91..7e9d34ddd74c 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -31,6 +31,7 @@ help:
 	@echo '  bootconfig - boot config tool'
 	@echo '  spi        - spi tools'
 	@echo '  tmon       - thermal monitoring and tuning tool'
+	@echo '  tracing    - misc tracing tools'
 	@echo '  turbostat  - Intel CPU idle stats and freq reporting tool'
 	@echo '  usb        - USB testing tools'
 	@echo '  virtio     - vhost test module'
@@ -64,7 +65,7 @@ acpi: FORCE
 cpupower: FORCE
 	$(call descend,power/$@)

-cgroup firewire hv guest bootconfig spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE
+cgroup firewire hv guest bootconfig spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging tracing: FORCE
 	$(call descend,$@)

 bpf/%: FORCE
@@ -103,7 +104,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \
 	perf selftests bootconfig spi turbostat usb \
 	virtio vm bpf x86_energy_perf_policy \
 	tmon freefall iio objtool kvm_stat wmi \
-	pci debugging
+	pci debugging tracing

 acpi_install:
 	$(call descend,power/$(@:_install=),install)
@@ -111,7 +112,7 @@ acpi_install:
 cpupower_install:
 	$(call descend,power/$(@:_install=),install)

-cgroup_install firewire_install gpio_install hv_install iio_install perf_install bootconfig_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install:
+cgroup_install firewire_install gpio_install hv_install iio_install perf_install bootconfig_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install tracing_install:
 	$(call descend,$(@:_install=),install)

 liblockdep_install:
@@ -137,7 +138,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \
 	perf_install selftests_install turbostat_install usb_install \
 	virtio_install vm_install bpf_install x86_energy_perf_policy_install \
 	tmon_install freefall_install objtool_install kvm_stat_install \
-	wmi_install pci_install debugging_install intel-speed-select_install
+	wmi_install pci_install debugging_install intel-speed-select_install \
+	tracing_install

 acpi_clean:
 	$(call descend,power/acpi,clean)
@@ -145,7 +147,7 @@ acpi_clean:
 cpupower_clean:
 	$(call descend,power/cpupower,clean)

-cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean:
+cgroup_clean hv_clean firewire_clean bootconfig_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean tracing_clean:
 	$(call descend,$(@:_clean=),clean)

 liblockdep_clean:
@@ -184,6 +186,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \
 	vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \
 	freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \
 	gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \
-	intel-speed-select_clean
+	intel-speed-select_clean tracing_clean

 .PHONY: FORCE

diff --git a/tools/tracing/Makefile b/tools/tracing/Makefile
new file mode 100644
index ..59cd483ab01f
--- /dev/null
+++ b/tools/tracing/Makefile
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for tracing tools
+#
+TARGETS = latency-collector
+CFLAGS = -Wall -Wextra -g -O2
+LD
[RFC PATCH 0/2] Tracing bursts of latencies
Hello all,

This series contains two things:

1. A fix for a bug in the Ftrace latency tracers that appeared with Linux 5.7.

2. The latency-collector, a tool that is designed to work around the limitations in the ftrace latency tracers. It needs the bug fix in order to work properly.

I have sent a patch series with the latency-collector before. I never got any comments on it and I stopped pushing it because I thought that BPF tracing would be the wave of the future and that it would solve the problem in a cleaner and more elegant way.

Recently, I tried out the criticalstat script from bcc tools, but it did not fulfill all of my hopes and dreams. On the bright side, it was able to capture all latencies in a burst. The main problems that I encountered were:

1. The system became unstable and froze now and then. The man page of criticalstat has a mention of it being unstable, so I assume that this is a known problem.

2. Sometimes the stack traces were incorrect, but not in an obvious way. After it happened once, all subsequent ones were bad.

3. If two instances were run simultaneously (to capture both preemptoff and irq off), there seemed to be a quite large performance hit, but I did not measure this exactly.

4. The filesystem footprint seemed quite large. The size of libbcc seemed excessive for a small embedded system.

For these reasons, I take the liberty of resending the latency-collector again. I would hope to get some comments regarding it, or some suggestion of an alternative approach for how to solve the problem of being able to capture latencies that systematically occur close to each other.

Admittedly, it may from a developer's perspective be somewhat of a niche problem, since removing one latency will reveal the next; but when one is doing validation with a fleet of devices being tested in a long and expensive test campaign, then it is quite desirable to not lose any latencies.
best regards,

Viktor

Viktor Rosendahl (2):
  Use pause-on-trace with the latency tracers
  Add the latency-collector to tools

 kernel/trace/trace_irqsoff.c      |    4 +
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +
 4 files changed, 1244 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

-- 
2.25.1
[PATCH v10 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements a feature whereby the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency, will receive notifications through the fsnotify framework when a new latency is available.

One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode.

This facility works with the preempt/irqsoff, and wakeup tracers.

The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work().

This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases.

If we use:

queue_work(system_highpri_wq, &tr->fsnotify_work);

then the work will (almost) always execute on the same CPU, but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU, but it seems difficult to select the right CPU.
Signed-off-by: Viktor Rosendahl (BMW)
---
 kernel/trace/trace.c | 73 ++--
 kernel/trace/trace.h |  7 +
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6a0ee9178365..2124646dfb2a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -45,6 +45,9 @@
 #include
 #include
 #include
+#include
+#include
+#include

 #include "trace.h"
 #include "trace_output.h"
@@ -1497,6 +1500,70 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
 }

 unsigned long __read_mostly	tracing_thresh;
+static const struct file_operations tracing_max_lat_fops;
+
+#if defined(CONFIG_TRACER_MAX_TRACE) && defined(CONFIG_FSNOTIFY)
+
+static struct workqueue_struct *fsnotify_wq;
+
+static void latency_fsnotify_workfn(struct work_struct *work)
+{
+	struct trace_array *tr = container_of(work, struct trace_array,
+					      fsnotify_work);
+	fsnotify(tr->d_max_latency->d_inode, FS_MODIFY,
+		 tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+}
+
+static void latency_fsnotify_workfn_irq(struct irq_work *iwork)
+{
+	struct trace_array *tr = container_of(iwork, struct trace_array,
+					      fsnotify_irqwork);
+	queue_work(fsnotify_wq, &tr->fsnotify_work);
+}
+
+static void trace_create_maxlat_file(struct trace_array *tr,
+				     struct dentry *d_tracer)
+{
+	INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn);
+	init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq);
+	tr->d_max_latency = trace_create_file("tracing_max_latency", 0644,
+					      d_tracer, &tr->max_latency,
+					      &tracing_max_lat_fops);
+}
+
+__init static int latency_fsnotify_init(void)
+{
+	fsnotify_wq = alloc_workqueue("tr_max_lat_wq",
+				      WQ_UNBOUND | WQ_HIGHPRI, 0);
+	if (!fsnotify_wq) {
+		pr_err("Unable to allocate tr_max_lat_wq\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+late_initcall_sync(latency_fsnotify_init);
+
+static void latency_fsnotify(struct trace_array *tr)
+{
+	if (!fsnotify_wq)
+		return;
+	/*
+	 * We cannot call queue_work(&tr->fsnotify_work) from here because it's
+	 * possible that we are called from __schedule() or do_idle(), which
+	 * could cause a deadlock.
+	 */
+	irq_work_queue(&tr->fsnotify_irqwork);
+}
+
+
+#else /* defined(CONFIG_TRACER_MAX_TRACE) && defined(CONFIG_FSNOTIFY) */
+
+#define trace_create_maxlat_file(tr, d_tracer)				\
+	trace_create_file("tracing_max_latency", 0644, d_tracer,	\
+			  &tr->max_latency, &tracing_max_lat_fops)
+
+#endif

 #ifdef CONFIG_TRACER_MAX_TRACE
 /*
@@ -1536,6 +1603,9 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)

 	/* record this tasks comm */
 	tracing_record_cmdline(tsk);
+#ifdef CONFIG_FSNOTIFY
+	latency_fsnotify(tr);
+#endif
 }

 /**
@@ -8585,8 +8655,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_t
[PATCH v10 4/4] ftrace: Add an option for tracing console latencies
This new kernel parameter "trace_console_latency" will enable the latency tracers to trace the console latencies. Previously, this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console.

Signed-off-by: Viktor Rosendahl (BMW)
---
 .../admin-guide/kernel-parameters.txt |  4 +++
 include/linux/irqflags.h              | 22 +
 kernel/printk/printk.c                |  6 ++--
 kernel/trace/trace_irqsoff.c          | 32 +++
 4 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f8881e..3f96a380c528 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4768,6 +4768,10 @@
 	trace_buf_size=nn[KMG]
 			[FTRACE] will set tracing buffer size on each cpu.

+	trace_console_latency=[0|1]
+			[FTRACE] Trace the console latencies if 1 is given. Do
+			not trace the latencies if 0 or no parameter is given.
+
 	trace_event=[event-list]
 			[FTRACE] Set and start specified trace events in order
 			to facilitate early boot debugging.
The event-list is a diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ca65327a6de8..f27e96273453 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2369,6 +2369,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2469,9 +2470,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git 
a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..84876192cc73 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include "trace.h" @@ -456,6 +457,37 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +static bool no_console_latency __read_mostly = true; + +static int __init trace_console_latency(char *str) +{ + int param; + + get_option(, ); + no_console_latency = !(param == 1); + + return 0; +} + +early_param("trace_console_latency", trace_console_latency); + +bool console_tracing_disabled(void) +{
[PATCH v10 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of telling which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirq
[PATCH v10 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
[PATCH v10 0/4] Some new features for the preempt/irqsoff tracers
Hello all, Changes in v10: - [PATCH 1/4]: * Changed the #if statements to not depend on CONFIG_HWLAT_TRACER * Changed latency_fsnotify() to static and removed the declaration from kernel/trace/trace.h. - [PATCH 2/4]: * No change. - [PATCH 3/4]: * No change. - [PATCH 4/4]: * No change. This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to be notified when a new latency has been detected. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others. [PATCH 4/4] This adds the option trace_console_latency=1 to the kernel parameters. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies .../admin-guide/kernel-parameters.txt |4 + include/linux/irqflags.h | 22 + kernel/printk/printk.c|6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 +- kernel/trace/trace.c | 73 +- kernel/trace/trace.h |7 + kernel/trace/trace_irqsoff.c | 32 + tools/Makefile| 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 + 11 files changed, 1508 insertions(+), 32 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v9 1/4] ftrace: Implement fs notification for tracing_max_latency
> Something bothers me. If you dropped support for HWLAT_TRACER as you > mentioned in the cover letter, then why does this #if look for the CONFIG > option? You are right. I forgot to change those #if statements; I also forgot to pass the -i option to grep when I searched for remaining references to hwlat. >(and similar comment on the rest of the patch..) Below is a new version of the patch. I will not send out v10 yet, as I expect that there might be more comments. I did not define a new macro, since it is now much simpler after removing the HWLAT_TRACER stuff from those #if statements. I made latency_fsnotify() static and removed its declaration from kernel/trace/trace.h, since it's only used in kernel/trace/trace.c. best regards, Viktor This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency, will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the preempt/irqsoff and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. 
If we use: queue_work(system_highpri_wq, >fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 73 ++-- kernel/trace/trace.h | 7 + 2 files changed, 78 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 6a0ee9178365..2124646dfb2a 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -45,6 +45,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1497,6 +1500,70 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) } unsigned long __read_mostlytracing_thresh; +static const struct file_operations tracing_max_lat_fops; + +#if defined(CONFIG_TRACER_MAX_TRACE) && defined(CONFIG_FSNOTIFY) + +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, >fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(>fsnotify_work, latency_fsnotify_workfn); + init_irq_work(>fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, >max_latency, + _max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = 
alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +static void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call queue_work(>fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which
[PATCH v9 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
[PATCH v9 0/4] Some new features for the preempt/irqsoff tracers
Hello all, Changes in v9: - [PATCH 1/4]: * Rebased the patch for v5.4-rc3+ * Dropped support for the hwlat tracer. It seems to me, both from testing and looking at the code, that hwlat doesn't actually update /sys/kernel/debug/tracing/tracing_max_latency. I think it would be a bit weird to send fsnotify for changes to a file that are not visible to userspace. Also, the hwlat use case is not very interesting from my perspective because hwlat works in non-overwrite mode by default. For me, the effort needed to make it work properly would be more than what it's worth. * Because of the changes mentioned above, I removed the Reviewed-by tag. - [PATCH 2/4]: * No change. - [PATCH 3/4]: * Removed the mention of hwlat from the commit and help messages. - [PATCH 4/4]: * Changed the trace-option to a kernel parameter. This would hopefully make it easier to remove it if it becomes obsolete. This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to be notified when a new latency has been detected. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others. [PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies .../admin-guide/kernel-parameters.txt |4 + include/linux/irqflags.h | 22 + kernel/printk/printk.c|6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 +- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_irqsoff.c | 32 + tools/Makefile| 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 + 11 files changed, 1521 insertions(+), 32 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
[PATCH v9 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of telling which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirq
[PATCH v9 4/4] ftrace: Add an option for tracing console latencies
This new kernel parameter "trace_console_latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- .../admin-guide/kernel-parameters.txt | 4 +++ include/linux/irqflags.h | 22 + kernel/printk/printk.c| 6 ++-- kernel/trace/trace_irqsoff.c | 32 +++ 4 files changed, 62 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index a84a83f8881e..3f96a380c528 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4768,6 +4768,10 @@ trace_buf_size=nn[KMG] [FTRACE] will set tracing buffer size on each cpu. + trace_console_latency=[0|1] + [FTRACE] Trace the console latencies if 1 is given. Do + not trace the latencies if 0 or no parameter is given. + trace_event=[event-list] [FTRACE] Set and start specified trace events in order to facilitate early boot debugging. 
The event-list is a diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ca65327a6de8..f27e96273453 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2369,6 +2369,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2469,9 +2470,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git 
a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..84876192cc73 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include "trace.h" @@ -456,6 +457,37 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +static bool no_console_latency __read_mostly = true; + +static int __init trace_console_latency(char *str) +{ + int param; + + get_option(, ); + no_console_latency = !(param == 1); + + return 0; +} + +early_param("trace_console_latency", trace_console_latency); + +bool console_tracing_disabled(void) +{
[PATCH v9 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency, will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the preempt/irqsoff and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. 
Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 75 ++-- kernel/trace/trace.h | 18 +++ 2 files changed, 91 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 6a0ee9178365..523e3e57f1d4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -45,6 +45,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1497,6 +1500,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) } unsigned long __read_mostly tracing_thresh; +static const struct file_operations tracing_max_lat_fops; + +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, &tr->fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn); + init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, &tr->max_latency, + &tracing_max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call queue_work(&tr->fsnotify_work) 
from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(&tr->fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer,\ + &tr->max_latency, &tracing_max_lat_fops) + +#endif #ifdef CONFIG_TRACER_MAX_TRACE /* @@ -1536,6 +1607,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) /* record this tasks comm */ tracing_record_cmdline(tsk); + latency_fsnotify(tr); } /** @@ -8585,8 +8657,7 @@ init_tr
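On the user-space side, the FS_MODIFY event generated by the patch surfaces as an inotify IN_MODIFY event on the tracing_max_latency file. A minimal sketch of a watcher (hypothetical helper names; the real latency-collector is considerably more elaborate):

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching 'path' for modifications; returns a nonblocking inotify
 * fd, or -1 on error. */
static int watch_latency_file(const char *path)
{
	int ifd = inotify_init1(IN_NONBLOCK);

	if (ifd < 0)
		return -1;
	if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
		close(ifd);
		return -1;
	}
	return ifd;
}

/* Return the mask of the next queued event, or 0 if none is pending.
 * A real collector would poll() on ifd and fetch a new trace here. */
static unsigned int next_event_mask(int ifd)
{
	/* aligned as recommended by inotify(7) for reading events */
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	ssize_t n = read(ifd, buf, sizeof(buf));

	if (n < (ssize_t)sizeof(struct inotify_event))
		return 0;
	return ((struct inotify_event *)buf)->mask;
}
```

With this in place, the collector's main loop is simply "wait for IN_MODIFY, then read the trace and reset tracing_max_latency".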
Re: [PATCH v8 4/4] ftrace: Add an option for tracing console latencies
Den ons 9 okt. 2019 kl 20:08 skrev Steven Rostedt : > > On Wed, 9 Oct 2019 16:51:07 +0200 > Viktor Rosendahl wrote: > > > Apologies, I should have replied there but I have been a bit busy the > > last few days with other topics, and I felt a bit indecisive about > > that point, so I just sloppily addressed the issue in the cover letter > > of this new series: > > > > "I have retained the fourth patch, although it was suggested that is > > becoming > > obsolete soon. I have retained it only because I do not know the status of > > the code that will make it obsolete. It's the last patch of the series and > > if there indeed is some code that will remove the latency issues from the > > printk code, then of course it makes sense to drop it. The first three > > patches > > will work without it." > > > > I thought that, since it's the last in the series, it would be > > possible for maintainers to just take the first three if the last one > > is not wanted. > > > > For me it solves a rather important problem though, so if the code > > that will make it obsolete isn't available for some time, then perhaps > > it should be considered as a temporary solution. > > > > Of course, if there is a commitment to never remove any knobs from > > /sys/kernel/debug/tracing/trace_options, then I can easily understand > > that it's not wanted as a temporary fix. > > There isn't quite a commitment to not remove knobs, but if a tool > starts relying on a knob, then, the answer is yes there is :-p > > As this will hopefully become something we don't worry about in the > future, I would rather have this be a kernel command line option. This > way, it wont be something that tools can really check for. > > If you add a trace_console_latency option to the kernel command line, > we can enable it that way. And then when it becomes obsolete, we can > simply remove it, without breaking any tools. > > Would that work for you? > Sounds good to me. 
I will try to adjust the patch accordingly within a few days. best regards, Viktor > -- Steve
Re: [PATCH v8 4/4] ftrace: Add an option for tracing console latencies
Den ons 9 okt. 2019 kl 16:11 skrev Joel Fernandes : > > On Wed, Oct 09, 2019 at 12:08:24AM +0200, Viktor Rosendahl (BMW) wrote: > > This new trace option "console-latency" will enable the latency > > tracers to trace the console latencies. Previously this has always been > > implicitely disabled. I guess this is because they are considered > > to be well known and unavoidable. > > > > However, for some organizations it may nevertheless be desirable to > > trace them. Basically, we want to be able to tell that there are > > latencies in the system under test because someone has incorrectly > > enabled the serial console. > > > > Signed-off-by: Viktor Rosendahl (BMW) > > --- > > include/linux/irqflags.h | 22 ++ > > kernel/printk/printk.c | 6 -- > > kernel/trace/trace.h | 1 + > > kernel/trace/trace_irqsoff.c | 18 ++ > > 4 files changed, 45 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h > > index 21619c92c377..3de891723331 100644 > > --- a/include/linux/irqflags.h > > +++ b/include/linux/irqflags.h > > @@ -13,6 +13,7 @@ > > #define _LINUX_TRACE_IRQFLAGS_H > > > > #include > > +#include > > #include > > > > /* Currently trace_softirqs_on/off is used only by lockdep */ > > @@ -68,9 +69,30 @@ do { \ > > defined(CONFIG_PREEMPT_TRACER) > > extern void stop_critical_timings(void); > > extern void start_critical_timings(void); > > + extern bool console_tracing_disabled(void); > > + > > +# define console_stop_critical_timings(flag) \ > > + do {\ > > + typecheck(bool, flag); \ > > + flag = console_tracing_disabled(); \ > > + if (flag) \ > > + stop_critical_timings();\ > > + } while (0) > > + > > +# define console_start_critical_timings(flag) \ > > + do { \ > > + typecheck(bool, flag); \ > > + if (flag)\ > > + start_critical_timings();\ > > + } while (0) > > + > > #else > > # define stop_critical_timings() do { } while (0) > > # define start_critical_timings() do { } while (0) > > +# define 
console_stop_critical_timings(flag) \ > > + do { typecheck(bool, flag); } while (0) > > +# define console_start_critical_timings(flag)\ > > + do { typecheck(bool, flag); } while (0) > > #endif > > > > /* > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index ca65327a6de8..f27e96273453 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -2369,6 +2369,7 @@ void console_unlock(void) > > static char ext_text[CONSOLE_EXT_LOG_MAX]; > > static char text[LOG_LINE_MAX + PREFIX_MAX]; > > unsigned long flags; > > + bool cflag; > > bool do_cond_resched, retry; > > > > if (console_suspended) { > > @@ -2469,9 +2470,10 @@ void console_unlock(void) > >*/ > > console_lock_spinning_enable(); > > > > - stop_critical_timings();/* don't trace print latency > > */ > > + /* don't trace print latency if it's disabled */ > > + console_stop_critical_timings(cflag); > > call_console_drivers(ext_text, ext_len, text, len); > > - start_critical_timings(); > > + console_start_critical_timings(cflag); > > > > if (console_lock_spinning_disable_and_check()) { > > printk_safe_exit_irqrestore(flags); > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h > > index 591c7a873235..10d12c8f7f77 100644 > > --- a/kernel/trace/trace.h > > +++ b/kernel/trace/trace.h > > @@ -1261,6 +1261,7 @@ extern int trace_get_user(struct trace_parser > > *parser, const char __user *ubuf, > > C(PRINTK_MSGONLY, "printk-msg-only"), \ > > C(CONTEXT_INFO, "context-info"), /* Print > > pid/cpu/time */ \ > > C(LATENCY_FMT, "latency-format"), \ > > + C(CONSOLE_LATENCY, "console-latency"),
[PATCH v8 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 18 ++ 4 files changed, 45 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ca65327a6de8..f27e96273453 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2369,6 +2369,7 @@ void 
console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2469,9 +2470,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 591c7a873235..10d12c8f7f77 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1261,6 +1261,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..576e2162114e 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,24 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + /* +* If tracing is disabled, then the question of whether to trace console +* latencies is moot. By always returning false here we save the caller +* the calls to start/stop_critical_timings(). These calls would not do +* anything anyway. +*/ + if (!preempt_trace(pc) && !irq_trace()) +
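The reason the macro pair threads a flag through, instead of calling console_tracing_disabled() twice, is to keep the stop/start calls balanced even if the option flips while the console drivers are running. A userspace mock of the pattern (counter stubs stand in for the real kernel functions, and the typecheck() step is noted but simplified away):

```c
#include <assert.h>

static int stops, starts;
static int console_tracing_off = 1;	/* stands in for console_tracing_disabled() */

static void stop_critical_timings(void)  { stops++; }
static void start_critical_timings(void) { starts++; }

/*
 * Mock of the patch's macro pair: the decision is sampled exactly once into
 * 'flag', so start() runs if and only if stop() ran. The kernel version
 * additionally does typecheck(bool, flag) to reject wrongly typed arguments
 * at compile time.
 */
#define console_stop_critical_timings(flag)		\
	do {						\
		(flag) = console_tracing_off;		\
		if (flag)				\
			stop_critical_timings();	\
	} while (0)

#define console_start_critical_timings(flag)		\
	do {						\
		if (flag)				\
			start_critical_timings();	\
	} while (0)
```

If the decision were re-evaluated in the second macro, a concurrent toggle of the trace option could leave the critical-timings state permanently stopped or started; snapshotting into the caller's flag makes the pair self-balancing.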
[PATCH v8 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirq
[PATCH v8 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ..or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install)
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
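The commit message above says the collector prints any one latency out of a burst with approximately equal probability, even though it cannot know the burst size in advance. One standard way to obtain that property is reservoir sampling with a reservoir of one: keep event i (0-based) with probability 1/(i+1). The sketch below uses a small LCG in place of whatever random source the tool actually uses, so treat the details as illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Numerical-Recipes-style LCG; the top bits are the better-distributed ones. */
static uint32_t lcg_next(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return *state >> 16;
}

/*
 * Reservoir sampling, k = 1: after seeing n events, each event has been
 * retained with probability 1/n, without knowing n up front. Returns the
 * index of the event that ends up selected.
 */
static int reservoir_pick(int n_events, uint32_t *state)
{
	int i, chosen = 0;

	for (i = 1; i < n_events; i++)
		if (lcg_next(state) % (uint32_t)(i + 1) == 0)
			chosen = i;
	return chosen;
}
```

This matches the overwrite-mode constraint nicely: only one trace from a burst survives anyway, so the goal is merely that each latency in the burst has the same chance of being the survivor.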
[PATCH v8 0/4] Some new features for the preempt/irqsoff tracers
Hello all, I have retained the fourth patch, although it was suggested that it is becoming obsolete soon. I have retained it only because I do not know the status of the code that will make it obsolete. It's the last patch of the series and if there indeed is some code that will remove the latency issues from the printk code, then of course it makes sense to drop it. The first three patches will work without it. Changes in v8: - [PATCH 1/4]: * Moved a forward declaration in order to make the code a bit more robust. * Converted a NOP macro to a static function so that the type is checked. - [PATCH 2/4]: * No change. - [PATCH 3/4]: * No change. - [PATCH 4/4]: * Added a comment to explain an optimization in console_tracing_disabled(). This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been recorded. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others. [PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 22 + kernel/printk/printk.c |6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 ++- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 19 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 18 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 ++ 11 files changed, 1507 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
[PATCH v8 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. 
Signed-off-by: Viktor Rosendahl (BMW) Reviewed-by: Joel Fernandes (Google) --- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_hwlat.c | 4 +- 3 files changed, 94 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 252f79c435f8..f8ca307f66cf 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1479,6 +1482,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) } unsigned long __read_mostly tracing_thresh; +static const struct file_operations tracing_max_lat_fops; + +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, &tr->fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn); + init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, &tr->max_latency, + &tracing_max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if 
(!fsnotify_wq) + return; + /* +* We cannot call queue_work(&tr->fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(&tr->fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer, \ + &tr->max_latency, &tracing_max_lat_fops) + +#endif #ifdef CONFIG_TRACER_MAX_TRACE /* @@ -1518,6 +1589,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) /* record this tasks comm */ tracing_record
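The constraint in latency_fsnotify() above — queue_work() may deadlock when called from __schedule() or do_idle(), so the patch defers through irq_work — has a close userspace analogue: a signal handler may only use async-signal-safe calls, so it hands work to the main loop through a pipe (the self-pipe trick). A sketch of that pattern, by way of analogy only:

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

static int defer_pipe[2];

/*
 * Restricted context: like irq_work_queue() in the patch, do only the
 * minimal safe operation (a one-byte async-signal-safe write) and leave
 * the heavy lifting (here: whatever reads the pipe; in the patch: the
 * workqueue calling fsnotify()) for a context where it is allowed.
 */
static void restricted_context_handler(int sig)
{
	char b = 'w';

	(void)sig;
	if (write(defer_pipe[1], &b, 1) != 1)
		_exit(1);	/* cannot report errors from a handler */
}
```

The two-hop chain in the patch (irq_work -> workqueue -> fsnotify) exists for the same reason the handler only writes one byte: each hop moves the work into a context with fewer restrictions.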
Re: [PATCH v7 1/4] ftrace: Implement fs notification for tracing_max_latency
On 10/2/19 2:32 AM, Joel Fernandes wrote: Some nits, but others looks good: On Fri, Sep 20, 2019 at 05:22:16PM +0200, Viktor Rosendahl (BMW) wrote: This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call the latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. 
Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_hwlat.c | 4 +- 3 files changed, 94 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 947ba433865f..2d2150bf0d42 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1480,6 +1483,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) unsigned long __read_mostly tracing_thresh; +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static const struct file_operations tracing_max_lat_fops; +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, &tr->fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn); + init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, &tr->max_latency, + &tracing_max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call 
queue_work(&tr->fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(&tr->fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer, \ + &tr->max_latency, &tracing_max_lat_fops) I would forward declare tracing_max_lat_fops here as well like you are doing in the
Re: [PATCH v7 4/4] ftrace: Add an option for tracing console latencies
On 10/2/19 2:52 AM, Joel Fernandes wrote: On Fri, Sep 20, 2019 at 05:22:19PM +0200, Viktor Rosendahl (BMW) wrote: This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitely disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 39 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 
1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings(); /* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4913ed1138aa..93a8b82c27e4 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1262,6 +1262,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"),\ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"),\ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; Why this condition here? Why not just make the call to stop_critical_timings() dependent on the new flag you are adding? Please explain it more and add some comments.
Re: [PATCH v7 1/4] ftrace: Implement fs notification for tracing_max_latency
On 10/2/19 5:13 PM, Steven Rostedt wrote: On Fri, 20 Sep 2019 17:22:16 +0200 "Viktor Rosendahl (BMW)" wrote: This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call the latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). Can fsnotify() be called from irq context? If so, why have the work queue at all? Just do the work from the irq_work handler. fsnotify() might sleep. It calls send_to_group(), which calls inotify_handle_event() through a function pointer. inotify_handle_event() calls kmalloc() without the GFP_ATOMIC flag. There might be other reasons as well but the above is one that I have seen a warning for, when enabling CONFIG_DEBUG_ATOMIC_SLEEP and trying to call fsnotify() from an atomic context. best regards, Viktor
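The exchange above covers the kernel side; the userspace half of the scheme is plain inotify(7). A minimal sketch is below — the helper names are mine, not from the patch, and a real consumer such as the latency-collector would watch /sys/kernel/debug/tracing/tracing_max_latency and re-read the trace after each event:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching 'path' for modification events; returns the inotify
 * fd to read events from, or -1 on error. */
int watch_modify(const char *path)
{
	int fd = inotify_init();

	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, path, IN_MODIFY) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Block until one modify event arrives on the watched file; returns 0
 * on success. For tracing_max_latency, each event means the kernel
 * has recorded a new maximum latency. */
int wait_modify(int fd)
{
	char buf[4096];

	return read(fd, buf, sizeof(buf)) > 0 ? 0 : -1;
}
```

Because inotify queues events on the fd, the consumer cannot miss a notification that fires between adding the watch and calling read().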
[PATCH v7 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). This patch creates a new workqueue. The reason for doing this is that I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU.
Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 18 + kernel/trace/trace_hwlat.c | 4 +- 3 files changed, 94 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 947ba433865f..2d2150bf0d42 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,9 @@ #include #include #include +#include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1480,6 +1483,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) unsigned long __read_mostlytracing_thresh; +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static const struct file_operations tracing_max_lat_fops; +static struct workqueue_struct *fsnotify_wq; + +static void latency_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void latency_fsnotify_workfn_irq(struct irq_work *iwork) +{ + struct trace_array *tr = container_of(iwork, struct trace_array, + fsnotify_irqwork); + queue_work(fsnotify_wq, >fsnotify_work); +} + +static void trace_create_maxlat_file(struct trace_array *tr, +struct dentry *d_tracer) +{ + INIT_WORK(>fsnotify_work, latency_fsnotify_workfn); + init_irq_work(>fsnotify_irqwork, latency_fsnotify_workfn_irq); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, + d_tracer, >max_latency, + _max_lat_fops); +} + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } + return 0; +} + +late_initcall_sync(latency_fsnotify_init); + +void latency_fsnotify(struct trace_array *tr) +{ + if (!fsnotify_wq) + return; + /* +* We cannot call 
queue_work(>fsnotify_work) from here because it's +* possible that we are called from __schedule() or do_idle(), which +* could cause a deadlock. +*/ + irq_work_queue(>fsnotify_irqwork); +} + +/* + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + * defined(CONFIG_FSNOTIFY) + */ +#else + +#define trace_create_maxlat_file(tr, d_tracer) \ + trace_create_file("tracing_max_latency", 0644, d_tracer,\ + >max_latency, _max_lat_fops) + +#endif + #ifdef CONFIG_TRACER_MAX_TRACE /* * Copy the new maximum trace into the separate maximum-trace @@ -1518,6 +1589,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) /* record
[PATCH v7 3/4] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases; it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl (BMW) --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 7e42f7b8bfa7..957762a8b811 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' selftests - various kernel selftests' @echo ' spi- spi tools' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
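The "approximately equal probability" selection that the commit message describes can be sketched as single-item reservoir sampling: the i-th event replaces the current pick with probability 1/(i+1), so after the burst every event has had an equal chance of being the one printed. This is an illustrative simplification of the idea; the latency-collector's actual randomized algorithm may differ in detail:

```c
#include <assert.h>
#include <stdlib.h>

/* Choose one of n events with equal probability, without needing to
 * know n in advance: event i (0-based) replaces the current pick with
 * probability 1/(i + 1) — classic reservoir sampling with k = 1. */
int pick_one(int n)
{
	int chosen = -1;
	int i;

	for (i = 0; i < n; i++)
		if (rand() % (i + 1) == 0)
			chosen = i;
	return chosen;
}
```

Not knowing n in advance matters here, because the size of a latency burst is only known once it is over.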
[PATCH v7 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl (BMW) --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 144 +++ 2 files changed, 128 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e08527f50d2a..2a58380ea310 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -752,9 +752,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..31c0fad4cb9e 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -10,18 +10,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +41,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirqtest_5, + preemptirqtest_6, +
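The DECLARE_TESTFN() trick in the patch above — stamping out otherwise identical functions so that each latency in a burst produces a distinguishable backtrace — boils down to the following pattern. This is a simplified, userspace-testable model with a stand-in body, not the module code itself:

```c
#include <assert.h>

/* One shared body; each macro-generated wrapper below appears as its
 * own symbol in a backtrace, which is the whole point of the trick. */
static int execute_test(int idx)
{
	return idx;	/* stand-in for the real critical-section code */
}

#define DECLARE_TESTFN(n) \
	static int testfn_##n(void) { return execute_test(n); }

DECLARE_TESTFN(0)
DECLARE_TESTFN(1)
DECLARE_TESTFN(2)

/* Dispatch table, mirroring the module's testfuncs[] array. */
static int (*testfuncs[])(void) = { testfn_0, testfn_1, testfn_2 };
```

Calling through the table, e.g. testfuncs[i](), places a unique function name on the stack for each i, so a captured trace reveals which latency in the burst it belongs to.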
[PATCH v7 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl (BMW) --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 39 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void 
console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4913ed1138aa..93a8b82c27e4 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1262,6 +1262,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v7 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

There are not many changes this time. I mainly changed my email address, tweaked a commit message and removed the cluttering of a copyright message.

Changes in v7:
- [PATCH 1/4]: * I have added some reasons to the end of the commit message about why I believe that it makes sense to create a new workqueue.
- [PATCH 2/4]: * Removed the cluttering of the copyright message.
- [PATCH 3/4]: * No change
- [PATCH 4/4]: * No change

This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode.

[PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been detected.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others.

[PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (BMW) (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 22 + kernel/printk/printk.c |6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 144 ++- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 19 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 ++ 11 files changed, 1501 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v6 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/8/19 1:38 AM, Joel Fernandes wrote:> On Sat, Sep 07, 2019 at 11:12:59PM +0200, Viktor Rosendahl wrote: >> On 9/6/19 4:17 PM, Joel Fernandes wrote: >>> On Thu, Sep 05, 2019 at 03:25:45PM +0200, Viktor Rosendahl wrote: >> >>>> + >>>> +__init static int latency_fsnotify_init(void) >>>> +{ >>>> + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", >>>> +WQ_UNBOUND | WQ_HIGHPRI, 0); >>>> + if (!fsnotify_wq) { >>>> + pr_err("Unable to allocate tr_max_lat_wq\n"); >>>> + return -ENOMEM; >>>> + } >>> >>> Why not just use the system workqueue instead of adding another workqueue? >>> >> >> For the the latency-collector to work properly in the worst case, when a >> new latency occurs immediately, the fsnotify must be received in less >> time than what the threshold is set to. If we always are slower we will >> always lose certain latencies. >> >> My intention was to minimize latency in some important cases, so that >> user space receives the notification sooner rather than later. >> >> There doesn't seem to be any system workqueue with WQ_UNBOUND and >> WQ_HIGHPRI. My thinking was that WQ_UNBOUND might help with the latency >> in some important cases. >> >> If we use: >> >> queue_work(system_highpri_wq, >fsnotify_work); >> >> then the work will (almost) always execute on the same CPU but if we are >> unlucky that CPU could be too busy while there could be another CPU in >> the system that would be able to process the work soon enough. >> >> queue_work_on() could be used to queue the work on another CPU but it >> seems difficult to select the right CPU. > > Ok, a separate WQ is fine with me as such since the preempt/irq events are on > a debug kernel anyway. Keep in mind that this feature is also enabled by the wakeup tracers and by hwlat. These are often enabled by production kernels. 
I guess it would be possible to add some ifdefs so that we create a new workqueue only if preempt/irqsoff tracing is enabled in the kernel config and use system_highpri_wq if we only have the wakeup and hwlat tracers in the config. However, I don't really like adding yet some more ifdefs to the code. Since a new workqueue will not necessarily create a new worker thread nowadays, I thought that it would be OK with a new unbound workqueue, which should not add much to the tendency to create more worker threads. > > I'll keep reviewing your patches next few days, I am at the LPC conference so might be a bit slow. Overall I think the series look like its maturing and getting close. > Ok, thanks. Could you let me know when you have looked through it all so that I know when it makes sense to make another reroll of the series? best regards, Viktor
Re: [PATCH v6 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/6/19 4:17 PM, Joel Fernandes wrote: On Thu, Sep 05, 2019 at 03:25:45PM +0200, Viktor Rosendahl wrote: + +__init static int latency_fsnotify_init(void) +{ + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!fsnotify_wq) { + pr_err("Unable to allocate tr_max_lat_wq\n"); + return -ENOMEM; + } Why not just use the system workqueue instead of adding another workqueue? For the latency-collector to work properly in the worst case, when a new latency occurs immediately, the fsnotify must be received in less time than what the threshold is set to. If we are always slower, we will always lose certain latencies. My intention was to minimize latency in some important cases, so that user space receives the notification sooner rather than later. There doesn't seem to be any system workqueue with WQ_UNBOUND and WQ_HIGHPRI. My thinking was that WQ_UNBOUND might help with the latency in some important cases. If we use: queue_work(system_highpri_wq, &tr->fsnotify_work); then the work will (almost) always execute on the same CPU but if we are unlucky that CPU could be too busy while there could be another CPU in the system that would be able to process the work soon enough. queue_work_on() could be used to queue the work on another CPU but it seems difficult to select the right CPU. best regards, Viktor
Re: [PATCH v5 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
On 9/5/19 6:52 PM, Steven Rostedt wrote: On Wed, 4 Sep 2019 21:10:38 +0200 Viktor Rosendahl wrote: On 9/4/19 1:42 PM, Steven Rostedt wrote: I am not so keen to clutter source files with a new copyright message. My problem is that git-send-email doesn't work well with my work email address, so I am forced to use my private gmail, which may create a false impression that I as a private individual would be the copyright holder. Then do what I do: Steven Rostedt (VMware) I did "(Red Hat)" when I worked for Red Hat. -- Steve Ok, that seems simple and effective. I will update accordingly for v7 of the patches. best regards, Viktor
[PATCH v6 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. The tracers may call the latency_fsnotify() from places such as __schedule() or do_idle(); this makes it impossible to call queue_work() directly without risking a deadlock. The same would happen with a softirq, kernel thread or tasklet. For this reason we use the irq_work mechanism to call queue_work(). 
Signed-off-by: Viktor Rosendahl
---
 kernel/trace/trace.c       | 75 +-
 kernel/trace/trace.h       | 18 +
 kernel/trace/trace_hwlat.c |  4 +-
 3 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 563e80f9006a..72ac20c4aaa1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -44,6 +44,9 @@
 #include <linux/trace.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/rt.h>
+#include <linux/fsnotify.h>
+#include <linux/irq_work.h>
+#include <linux/workqueue.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -1480,6 +1483,74 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
 
 unsigned long __read_mostly	tracing_thresh;
 
+#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
+	defined(CONFIG_FSNOTIFY)
+
+static const struct file_operations tracing_max_lat_fops;
+static struct workqueue_struct *fsnotify_wq;
+
+static void latency_fsnotify_workfn(struct work_struct *work)
+{
+	struct trace_array *tr = container_of(work, struct trace_array,
+					      fsnotify_work);
+	fsnotify(tr->d_max_latency->d_inode, FS_MODIFY,
+		 tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+}
+
+static void latency_fsnotify_workfn_irq(struct irq_work *iwork)
+{
+	struct trace_array *tr = container_of(iwork, struct trace_array,
+					      fsnotify_irqwork);
+	queue_work(fsnotify_wq, &tr->fsnotify_work);
+}
+
+static void trace_create_maxlat_file(struct trace_array *tr,
+				     struct dentry *d_tracer)
+{
+	INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn);
+	init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq);
+	tr->d_max_latency = trace_create_file("tracing_max_latency", 0644,
+					      d_tracer, &tr->max_latency,
+					      &tracing_max_lat_fops);
+}
+
+__init static int latency_fsnotify_init(void)
+{
+	fsnotify_wq = alloc_workqueue("tr_max_lat_wq",
+				      WQ_UNBOUND | WQ_HIGHPRI, 0);
+	if (!fsnotify_wq) {
+		pr_err("Unable to allocate tr_max_lat_wq\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+late_initcall_sync(latency_fsnotify_init);
+
+void latency_fsnotify(struct trace_array *tr)
+{
+	if (!fsnotify_wq)
+		return;
+	/*
+	 * We cannot call queue_work(&tr->fsnotify_work) from here because it's
+	 * possible that we are called from __schedule() or do_idle(), which
+	 * could cause a deadlock.
+	 */
+	irq_work_queue(&tr->fsnotify_irqwork);
+}
+
+/*
+ * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \
+ * defined(CONFIG_FSNOTIFY)
+ */
+#else
+
+#define trace_create_maxlat_file(tr, d_tracer)				\
+	trace_create_file("tracing_max_latency", 0644, d_tracer,	\
+			  &tr->max_latency, &tracing_max_lat_fops)
+
+#endif
+
 #ifdef CONFIG_TRACER_MAX_TRACE
 /*
  * Copy the new maximum trace into the separate maximum-trace
@@ -1518,6 +1589,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
 
 	/* record this tasks comm */
 	tracing_record_cmdline(tsk);
+	latency_fsnotify(tr);
 }
 
 /**
@@ -8550,8 +8622,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	create_trace_options_dir(tr);
 
 #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)
-	trace_create_file("tracing_max_latency", 0644, d_tracer,
-			&tr->max_latency, &tracing_max_lat_fops);
+	trace_create_maxlat_file(tr, d_tracer);
 #endif
 
 	if (ftrace_create_function_files(tr, d_tracer))
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 005f08629b8b
[PATCH v6 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously, this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 22 ++ kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 39 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..3de891723331 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -13,6 +13,7 @@ #define _LINUX_TRACE_IRQFLAGS_H #include +#include #include /* Currently trace_softirqs_on/off is used only by lockdep */ @@ -68,9 +69,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void console_unlock(void) 
static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4913ed1138aa..93a8b82c27e4 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1262,6 +1262,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v6 3/4] Add the latency-collector to tools
In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) diff --git a/tools/Makefile b/tools/Makefile index 68defd7ecf5d..2b3738e66280 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: +
[PATCH v6 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 147 +++ 2 files changed, 131 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 98da8998c25c..5ff573d844e4 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -751,9 +751,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..96bf6b4858bc 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,9 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH + * Author: Viktor Rosendahl (viktor.rosend...@bmw.de) + * - Added the burst feature and the sysfs trigger */ #include @@ -10,18 +13,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +44,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +D
[PATCH v6 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

Changes in v6:

- [PATCH 1/4]:
  * Using the irq_work mechanism instead of the games with linked lists,
    percpu variables and modifications to the scheduling and idle code.
- [PATCH 2/4]:
  * Updated the copyright message, adding my own name and what I did.
- [PATCH 3/4]:
  * Added my own name as author to the copyright message.
- [PATCH 4/4]:
  * Improved the build on the sh architecture with gcc 7.4.0 by including
    linux/types.h. The build on sh still fails for me later in the build
    process with an ICE from the compiler, but the same failure also happens
    without any of my patches.

This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other, and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode.

[PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for user space to detect when a new latency has been recorded.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others.

[PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies.
best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 22 + kernel/printk/printk.c |6 +- kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 147 +++- kernel/trace/trace.c | 75 +- kernel/trace/trace.h | 19 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1212 ++ 11 files changed, 1504 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v5 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
On 9/4/19 1:42 PM, Steven Rostedt wrote: On Tue, 3 Sep 2019 15:26:00 +0200 Viktor Rosendahl wrote: diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH A name and what you did should also be attached here. Ideally, we leave these out as git history is usually enough. I am not so keen to clutter source files with a new copyright message. My problem is that git-send-email doesn't work well with my work email address, so I am forced to use my private gmail, which may create a false impression that I as a private individual would be the copyright holder. best regards, Viktor
Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/4/19 1:39 PM, Steven Rostedt wrote: On Tue, 3 Sep 2019 15:25:59 +0200 Viktor Rosendahl wrote: +void latency_fsnotify_stop(void) +{ + /* Make sure all CPUs see caller's previous actions to stop tracer */ + smp_wmb(); These memory barriers just look wrong. What exactly are you trying to protect here? Where's the matching rmbs? Thanks for reviewing. However, since these functions will disappear when I take the irq_work facility into use, we should perhaps not spend too much time discussing what would have been. There are no matching rmbs; I was thinking that the smp_wmb() would merely enforce the order of the memory writes, as seen by other CPUs, so that the tracer would be stopped before the latency fsnotify is disabled. E.g. in the case of the preemptirqsoff tracer, the idea was that it doesn't matter exactly when a CPU sees the "tracer_enabled = 0;" in stop_irqsoff_tracer(), but that it needs to be seen before the writes in latency_fsnotify_stop() are seen. best regards, Viktor
Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
On 9/4/19 10:19 AM, Peter Zijlstra wrote: Adding Paul since RCU faces similar situations, i.e. raising softirq risks scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to avoid raising the softirq and instead use irq_work. Which is right. Thanks Joel and Peter for suggesting the irq_work facility. I was ignorant of the existence of this facility. It looks like it is the Holy Grail of deferred work, for this particular type of problem :) It looks like it will be possible to both avoid deadlock and also avoid the ugly additions to the sched and idle code. In addition, the rest of the code will become a lot simpler. best regards, Viktor
[PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements fsnotify notifications for the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency: user space is notified through the fsnotify framework whenever a new latency is available. One particularly interesting use of this facility is to enable threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other, in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers.

This patch also adds some unfortunate calls from __schedule() and do_idle(). The calls to latency_fsnotify_disable/enable() are needed because we cannot wake up the workqueue from these critical sections without risking a deadlock. Similar problems would also arise if we tried to schedule a tasklet, raise a softirq, or wake up a kernel thread. If a notification event happens in one of the forbidden sections, we schedule the fsnotify work as soon as we have exited them.

There was a suggestion to remove this latency_fsnotify_enable/disable() gunk, or at least to combine it with start_critical_timings() and stop_critical_timings(). I have, however, not been able to come up with a way to do it. It seems like it would be possible to simply replace the calls to latency_fsnotify_enable/disable() with calls to start/stop_critical_timings(). However, the main problem is that it would not work for the wakeup tracer: the wakeup tracer needs a facility that postpones the notifications, not one that prevents the measurements, because all its measurements take place in the middle of __schedule(). On the other hand, in some places, like in idle and the console code, we need start/stop functions that prevent the measurements from being made.
Signed-off-by: Viktor Rosendahl --- include/linux/ftrace.h| 31 + kernel/sched/core.c | 3 + kernel/sched/idle.c | 3 + kernel/sched/sched.h | 1 + kernel/trace/trace.c | 112 +- kernel/trace/trace.h | 22 ++ kernel/trace/trace_hwlat.c| 4 +- kernel/trace/trace_irqsoff.c | 4 ++ kernel/trace/trace_sched_wakeup.c | 4 ++ 9 files changed, 181 insertions(+), 3 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 8a8cb3c401b2..b4d9700ef917 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -907,4 +907,35 @@ unsigned long arch_syscall_addr(int nr); #endif /* CONFIG_FTRACE_SYSCALLS */ +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +DECLARE_PER_CPU(int, latency_notify_disable); +DECLARE_STATIC_KEY_FALSE(latency_notify_key); + +void latency_fsnotify_process(void); + +/* + * Disable/enable fsnotify while in scheduler and idle code. Trying to wake + * anything up from there, such as calling queue_work() is prone to deadlock. 
+ */ +static inline void latency_fsnotify_disable(void) +{ + this_cpu_inc(latency_notify_disable); +} + +static inline void latency_fsnotify_enable(void) +{ + this_cpu_dec(latency_notify_disable); + if (static_branch_unlikely(_notify_key)) + latency_fsnotify_process(); +} + +#else + +#define latency_fsnotify_disable() do { } while (0) +#define latency_fsnotify_enable() do { } while (0) + +#endif + #endif /* _LINUX_FTRACE_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 010d578118d6..e3c1dc801073 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3198,6 +3198,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev) */ rq = finish_task_switch(prev); + latency_fsnotify_enable(); balance_callback(rq); preempt_enable(); @@ -3820,6 +3821,7 @@ static void __sched notrace __schedule(bool preempt) local_irq_disable(); rcu_note_context_switch(preempt); + latency_fsnotify_disable(); /* * Make sure that signal_pending_state()->signal_pending() below @@ -3883,6 +3885,7 @@ static void __sched notrace __schedule(bool preempt) rq_unlock_irq(rq, ); } + latency_fsnotify_enable(); balance_callback(rq); } diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 80940939b733..5fc87d99a407 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -236,6 +236,7 @@ static void do_idle(void) __current_set_polling(); tick_nohz_idle_enter(); + latency_fsnotify_disable(); while (!need_resched()) { check_pgt_cache(); @@ -265,6 +266,8 @@ static void do_idle(void) arch_cpu_idle_exit(); } + latency_fsnotify_enable(); + /* * Since
[PATCH v5 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 98da8998c25c..5ff573d844e4 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -751,9 +751,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
[PATCH v5 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously, this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 21 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 38 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..34b0424a32dc 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1888f6a3b694..036460a7e4cd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2359,6 +2359,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool 
do_cond_resched, retry; if (console_suspended) { @@ -2459,9 +2460,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index d9f83b2aaa71..8e4c9736a801 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1266,6 +1266,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index 29403a83a5f0..c8d17c4c1c91 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v5 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

Changes in v5:

- [PATCH 1/4]:
  * trace_disable_fsnotify() => latency_fsnotify_disable(), in order to avoid
    naming conflicts with trace points.
  * Using more appropriate operations for the percpu variables.
  * Moved around the calls to latency_fsnotify_enable/disable() a bit.
  * Added static branches to reduce overhead when the facility is not in use.
  * Does not add any trace points anymore. Instead using explicit function
    calls from the sched and idle code.
  * Unfortunately still adds some gunk to the sched and idle code. I do not
    know how to avoid this.

This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other, and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode.

[PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for user space to detect when a new latency has been recorded.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected, but earlier it didn't apply cleanly on its own. However, now it does, so in principle it could be applied separately from the others.

[PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies.
best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/ftrace.h | 31 + include/linux/irqflags.h | 21 + kernel/printk/printk.c |6 +- kernel/sched/core.c |3 + kernel/sched/idle.c |3 + kernel/sched/sched.h |1 + kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 145 ++- kernel/trace/trace.c | 112 ++- kernel/trace/trace.h | 23 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 16 + kernel/trace/trace_sched_wakeup.c|4 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1211 ++ 16 files changed, 1587 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
[PATCH v5 3/4] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1211 +++ 3 files changed, 1239 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 68defd7ecf5d..2b3738e66280 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -30,6 +30,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -63,7 +64,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -99,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -107,7 +108,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -133,7 +134,8 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install intel-speed-select_install + wmi_install pci_install debugging_install intel-speed-select_install \ + trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -141,7 +143,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -180,6 +182,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean \ - intel-speed-select_clean + intel-speed-select_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for vm tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean:
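With the Makefile hunks above applied, the tool should be buildable either directly or through the top-level tools/ entry points. A sketch of the expected invocations, assuming the targets added by this patch:

```shell
# build the collector directly
make -C tools/trace
# or through the top-level tools/ Makefile target added above
make -C tools trace
# matching install/clean targets follow the usual tools/ pattern
make -C tools trace_install
make -C tools trace_clean
```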
Re: [PATCH v4 1/4] ftrace: Implement fs notification for tracing_max_latency
On 5/21/19 6:01 PM, Steven Rostedt wrote: > > > Note, you need to add the scheduling and power management maintainers > when adding trace events to their code. > My bad. > As these trace events become visible to user space, and you only need a > callback to disable fsnotify, it may be better to just have a hard > coded call to disable fsnotify (and have it be a nop when not > configured). My thinking was that it would not be defensible to clutter the idle and scheduling code with calls to slightly obscure tracing code and that perhaps some users would like to have these tracepoints for perf/ftrace but I don't insist on it. Below is a patch with hard coded calls. > We can use a static branch if you want to keep the > overhead down when not enabled, and enable the static branch when you > start these latency tracers. > I noticed that it's possible to slightly simplify the enable/disable functions, so I guess that this would not be necessary, especially since one would need to handle the case when some CPUs are in the forbidden sections when the tracers are enabled. I can try to add the static branch if you want it though. 
best regards, Viktor --- include/linux/ftrace.h | 13 + kernel/sched/core.c| 2 + kernel/sched/idle.c| 2 + kernel/sched/sched.h | 1 + kernel/trace/trace.c | 111 - kernel/trace/trace.h | 12 kernel/trace/trace_hwlat.c | 4 +- 7 files changed, 142 insertions(+), 3 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 25e2995d4a4c..88c76f23277c 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -907,4 +907,17 @@ unsigned long arch_syscall_addr(int nr); #endif /* CONFIG_FTRACE_SYSCALLS */ +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +void trace_disable_fsnotify(void); +void trace_enable_fsnotify(void); + +#else + +#define trace_disable_fsnotify() do { } while (0) +#define trace_enable_fsnotify() do { } while (0) + +#endif + #endif /* _LINUX_FTRACE_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 874c427742a9..440cd1a62722 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3374,6 +3374,7 @@ static void __sched notrace __schedule(bool preempt) struct rq *rq; int cpu; + trace_disable_fsnotify(); cpu = smp_processor_id(); rq = cpu_rq(cpu); prev = rq->curr; @@ -3449,6 +3450,7 @@ static void __sched notrace __schedule(bool preempt) } balance_callback(rq); + trace_enable_fsnotify(); } void __noreturn do_task_dead(void) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 80940939b733..1a38bcdb3652 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -225,6 +225,7 @@ static void cpuidle_idle_call(void) static void do_idle(void) { int cpu = smp_processor_id(); + trace_disable_fsnotify(); /* * If the arch has a polling bit, we maintain an invariant: * @@ -284,6 +285,7 @@ static void do_idle(void) smp_mb__after_atomic(); sched_ttwu_pending(); + /* schedule_idle() will call trace_enable_fsnotify() */ schedule_idle(); if (unlikely(klp_patch_pending(current))) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 
b52ed1ada0be..e22871d489a5 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2c92b3d9ea30..5dcc1147cbcc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include "trace.h" #include "trace_output.h" @@ -1481,6 +1483,111 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) unsigned long __read_mostlytracing_thresh; +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ + defined(CONFIG_FSNOTIFY) + +static const struct file_operations tracing_max_lat_fops; +static struct workqueue_struct *fsnotify_wq; +static DEFINE_PER_CPU(atomic_t, notify_disabled) = ATOMIC_INIT(0); +static DEFINE_PER_CPU(struct llist_head, notify_list); + +static void trace_fsnotify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + fsnotify_work); + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, +tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void trace_create_maxlat_file(struct trace_array *tr, + struct dentry *d_tracer) +{ + INIT_WORK(>fsnotify_work, trace_fsnotify_workfn); + atomic_set(>notify_pending, 0); + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, +
[PATCH v4 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 21 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 12 4 files changed, 38 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..34b0424a32dc 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 17102fd4c136..46ebba6e5fb9 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2358,6 +2358,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool 
do_cond_resched, retry; if (console_suspended) { @@ -2458,9 +2459,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index f95027813296..31cdc2e4c382 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1257,6 +1257,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
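With a patch like the one above applied, the new flag would be toggled like any other trace option. A hedged example of how it might be combined with the irqsoff tracer — the paths assume tracefs mounted at /sys/kernel/tracing:

```shell
cd /sys/kernel/tracing
echo irqsoff > current_tracer
# include console/printk sections in the measured critical sections
echo 1 > options/console-latency
echo 0 > tracing_max_latency
echo 1 > tracing_on
```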
[PATCH v4 3/4] Add the latency-collector to tools
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1211 +++ 3 files changed, 1239 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 3dfd72ae6c1a..f7465abf3a0a 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -29,6 +29,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -62,7 +63,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -98,7 +99,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -106,7 +107,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -132,7 +133,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install + wmi_install pci_install debugging_install trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -140,7 +141,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -178,6 +179,7 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ - gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean + gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean \ + debugging_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile 
b/tools/trace/Makefile new file mode 100644 index ..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for vm tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c +
[PATCH v4 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 5d965cef6c77..aab8909dfbff 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -740,9 +740,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
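Putting the module parameters and the sysfs trigger together, a test run might look like the following. This is a hedged sketch; the trigger path is the one named in the commit message, and the exact write value accepted by the trigger is an assumption:

```shell
# arm the irqsoff tracer first
echo irqsoff > /sys/kernel/tracing/current_tracer
echo 0 > /sys/kernel/tracing/tracing_max_latency
# generate a burst of 10 irq-disabled sections of 500 us each
modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=10
# repeat the burst without reloading the module
echo 1 > /sys/kernel/preemptirq_delay_test/trigger
```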
[PATCH v4 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. This patch also adds four new tracepoints, for entering/exiting __schedule() and do_idle(). Those tracepoints are needed to disable and enable the notification facility because we cannot wake up the workqueue from these critical sections without risking a deadlock. Similar problems would also arise if we try to schedule a tasklet, raise a softirq, or wake up a kernel thread. If a notification event would happen in the forbidden sections, we schedule the fsnotify work as soon as we have exited do_idle()/__schedule(). 
Signed-off-by: Viktor Rosendahl --- include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/sched/core.c | 2 + kernel/sched/idle.c | 2 + kernel/trace/trace.c | 168 ++- kernel/trace/trace.h | 12 +++ kernel/trace/trace_hwlat.c | 4 +- 7 files changed, 265 insertions(+), 3 deletions(-) diff --git a/include/trace/events/power.h b/include/trace/events/power.h index f7aece721aed..40ab747274bd 100644 --- a/include/trace/events/power.h +++ b/include/trace/events/power.h @@ -40,6 +40,46 @@ DEFINE_EVENT(cpu, cpu_idle, TP_ARGS(state, cpu_id) ); +/* + * Tracepoint for entering do_idle(): + */ +TRACE_EVENT(do_idle_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting do_idle(): + */ +TRACE_EVENT(do_idle_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + TRACE_EVENT(powernv_throttle, TP_PROTO(int chip_id, const char *reason, int pmax), diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index c8c7c7efb487..a1a1befea1c1 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -183,6 +183,46 @@ TRACE_EVENT(sched_switch, __entry->next_comm, __entry->next_pid, __entry->next_prio) ); +/* + * Tracepoint for entering __schedule(): + */ +TRACE_EVENT(sched_schedule_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting __schedule(): + */ +TRACE_EVENT(sched_schedule_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + /* * Tracepoint for a task 
being migrated: */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 102dfcf0a29a..c9d86fcc48f5 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3374,6 +3374,7 @@ static void __sched notrace __schedule(bool preempt) int cpu; cpu = smp_processor_id(); + trace_sched_schedule_enter(cpu); rq = cpu_rq(cpu); prev = rq->curr; @@ -3448,6 +3449,7 @@ static void __sched notrace __schedule(bool preempt) } balance_callback(rq); + trace_sched_schedule_exit(cpu); } void __noreturn do_task_dead(void) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index f5516bae0c1b..e328e045c6e8 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -224,6 +224,7 @@ static void cpuidle_idle_call(void) static void do_idle(void) { int cpu = smp_processor_id(); + trace_do_idle_enter(cpu); /* * If the arch has a polling bit, we maintain an invariant: * @@ -287,6 +288,7 @@ static void do_idle(void) if (unlikely(klp_patch_pending(current))) klp_update_patch_state(current); + trace_do_idle_exit(cpu); } bool cpu_in_idle(unsigned long pc) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c ind
[PATCH v4 0/4] Some new features for the latency tracers
Hello all, Changes in v4: - [PATCH 1/4]: * Improved support for multiple active trace instances by storing entries that cannot be notified immediately in a linked list instead of a single pointer. With multiple instances this could happen. * Register trace_maxlat_fsnotify_init() with late_initcall_sync(). Previously it was piggybacked on tracer_init_tracefs() and it got called more than once. * Added some comments. - [PATCH 3/4]: * Some of the code was incorrectly added in the next patch. Improved commit message. - [PATCH 4/4]: * Some code from the previous patch was incorrectly included here. This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been recorded. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. The last patch is not directly connected but doesn't apply cleanly on its own: [PATCH 4/4] This adds the option console-latency to the trace options. This makes it possible to enable tracing of console latencies. 
best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 21 + include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/printk/printk.c |6 +- kernel/sched/core.c |2 + kernel/sched/idle.c |2 + kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 145 ++- kernel/trace/trace.c | 168 +++- kernel/trace/trace.h | 13 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1211 ++ 15 files changed, 1671 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
Re: [PATCH v3 0/4] Some new features for the latency tracers
On 5/14/19 6:51 PM, Steven Rostedt wrote: Hi Viktor, Note, as the merge window is open and I'm also currently at a conference, it may be a while before I can take a look at this. If you don't hear something from me in 10 days, feel free to send me a ping. -- Steve Hi Steve, Ok, good to know. I discovered a couple of annoying issues with this v3 of the series, so I will probably send out v4 soon. best regards, Viktor
[PATCH v3 3/4] Add the latency-collector to tools
bove, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get: 3054.078294 Latency 44 printout skipped due to inotify loop ...or: 3211.957685 Latency 112 printout skipped due to print loop In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design. Signed-off-by: Viktor Rosendahl --- tools/Makefile | 14 +- tools/trace/Makefile| 20 + tools/trace/latency-collector.c | 1212 +++ 3 files changed, 1240 insertions(+), 6 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c diff --git a/tools/Makefile b/tools/Makefile index 3dfd72ae6c1a..f7465abf3a0a 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -29,6 +29,7 @@ help: @echo ' spi- spi tools' @echo ' objtool- an ELF object analysis tool' @echo ' tmon - thermal monitoring and tuning tool' + @echo ' trace - misc tracing tools' @echo ' turbostat - Intel CPU idle stats and freq reporting tool' @echo ' usb- USB testing tools' @echo ' virtio - vhost test module' @@ -62,7 +63,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging trace: FORCE $(call descend,$@) liblockdep: FORCE @@ -98,7 +99,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ tmon freefall iio objtool kvm_stat wmi \ - pci debugging + pci debugging trace acpi_install: $(call descend,power/$(@:_install=),install) @@ -106,7 +107,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) -cgroup_install firewire_install gpio_install hv_install 
iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install trace_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -132,7 +133,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install debugging_install + wmi_install pci_install debugging_install trace_install acpi_clean: $(call descend,power/acpi,clean) @@ -140,7 +141,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean trace_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -178,6 +179,7 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ - gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean + gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean \ + debugging_clean trace_clean .PHONY: FORCE diff --git a/tools/trace/Makefile b/tools/trace/Makefile new file mode 100644 index 
..59cd483ab01f --- /dev/null +++ b/tools/trace/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for trace tools +# +TARGETS = latency-collector +CFLAGS = -Wall -Wextra -g -O2 +LDFLAGS = -lpthread + +all: $(TARGETS) + +%: %.c + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) + +clean: + $(RM) latency-collector + +sbindir ?= /usr/sbin + +install: all + install -d
[PATCH v3 4/4] ftrace: Add an option for tracing console latencies
This new trace option "console-latency" will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h| 21 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h| 1 + kernel/trace/trace_irqsoff.c| 12 tools/trace/latency-collector.c | 11 +-- 5 files changed, 43 insertions(+), 8 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..34b0424a32dc 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,30 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +# define console_stop_critical_timings(flag) \ + do {\ + typecheck(bool, flag); \ + flag = console_tracing_disabled(); \ + if (flag) \ + stop_critical_timings();\ + } while (0) + +# define console_start_critical_timings(flag) \ + do { \ + typecheck(bool, flag); \ + if (flag)\ + start_critical_timings();\ + } while (0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) +# define console_start_critical_timings(flag) \ + do { typecheck(bool, flag); } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 02ca827b8fac..3a18b7208399 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2348,6 +2348,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long 
flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ -2448,9 +2449,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 3456d6a47a47..ca10ad40f2f8 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1250,6 +1250,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..bc02be207a7a 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -456,6 +456,18 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & TRACE_ITER_CONSOLE_LATENCY); +} +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; diff --git a/tools/trace/latency-collector.c b/tools/trace/latency-collector.c index dfc364d7836a..8cf3c74d998f 100644 --- a/tools/trace/latency-collector.c +++ b/tools/trace/latency-collector.c @@
[PATCH v3 1/4] ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file, e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive notifications through the fsnotify framework when a new latency is available. One particularly interesting use of this facility is when enabling threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, together with the preempt/irqsoff tracers. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. This facility works with the hwlat, preempt/irqsoff, and wakeup tracers. This patch also adds four new tracepoints, for entering/exiting __schedule() and do_idle(). Those tracepoints are needed to disable and enable the notification facility because we cannot wake up the workqueue from these critical sections without risking a deadlock. Similar problems would also arise if we try to schedule a tasklet, raise a softirq, or wake up a kernel thread. If a notification event occurs in the forbidden sections, we schedule the fsnotify work as soon as we have exited do_idle()/__schedule(). 
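[Editorial note: the deferral described above can be modeled in a few lines. The sketch below is purely illustrative and is not code from this patch; all names (queue_fsnotify_work(), the two flags) are stand-ins, and the kernel implementation would use per-CPU state and the real workqueue API, driven from the exit tracepoints added here.]

```c
#include <assert.h>

/*
 * Userspace model of the deferral: a notification that fires inside
 * __schedule()/do_idle() must not call queue_work(), so it is only
 * flagged as pending and issued later from the exit tracepoint.
 */
static int work_queued;		/* stand-in counter for queue_work() calls */
static int notify_pending;
static int in_forbidden_section;

static void queue_fsnotify_work(void)
{
	work_queued++;
}

static void latency_fsnotify(void)
{
	if (in_forbidden_section) {
		notify_pending = 1;	/* defer: cannot wake anything here */
		return;
	}
	queue_fsnotify_work();
}

/* models the sched_schedule_exit/do_idle_exit tracepoint handlers */
static void forbidden_section_exit(void)
{
	in_forbidden_section = 0;
	if (notify_pending) {
		notify_pending = 0;
		queue_fsnotify_work();
	}
}
```

The point of the two-flag dance is that the notification is never lost: it is merely delayed until the first safe point after the forbidden section.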
Signed-off-by: Viktor Rosendahl --- include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/sched/core.c | 2 + kernel/sched/idle.c | 2 + kernel/trace/trace.c | 160 ++- kernel/trace/trace.h | 10 +++ kernel/trace/trace_hwlat.c | 4 +- 7 files changed, 255 insertions(+), 3 deletions(-) diff --git a/include/trace/events/power.h b/include/trace/events/power.h index f7aece721aed..40ab747274bd 100644 --- a/include/trace/events/power.h +++ b/include/trace/events/power.h @@ -40,6 +40,46 @@ DEFINE_EVENT(cpu, cpu_idle, TP_ARGS(state, cpu_id) ); +/* + * Tracepoint for entering do_idle(): + */ +TRACE_EVENT(do_idle_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting do_idle(): + */ +TRACE_EVENT(do_idle_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + TRACE_EVENT(powernv_throttle, TP_PROTO(int chip_id, const char *reason, int pmax), diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 9a4bdfadab07..6a3dae7b3249 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -183,6 +183,46 @@ TRACE_EVENT(sched_switch, __entry->next_comm, __entry->next_pid, __entry->next_prio) ); +/* + * Tracepoint for entering __schedule(): + */ +TRACE_EVENT(sched_schedule_enter, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + +/* + * Tracepoint for exiting __schedule(): + */ +TRACE_EVENT(sched_schedule_exit, + + TP_PROTO(int cpu), + + TP_ARGS(cpu), + + TP_STRUCT__entry( + __field(int, cpu) + ), + + TP_fast_assign( + __entry->cpu = cpu; + ), + + TP_printk("cpu=%d", __entry->cpu) +); + /* * Tracepoint for a task 
being migrated: */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 102dfcf0a29a..c9d86fcc48f5 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3374,6 +3374,7 @@ static void __sched notrace __schedule(bool preempt) int cpu; cpu = smp_processor_id(); + trace_sched_schedule_enter(cpu); rq = cpu_rq(cpu); prev = rq->curr; @@ -3448,6 +3449,7 @@ static void __sched notrace __schedule(bool preempt) } balance_callback(rq); + trace_sched_schedule_exit(cpu); } void __noreturn do_task_dead(void) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index f5516bae0c1b..e328e045c6e8 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -224,6 +224,7 @@ static void cpuidle_idle_call(void) static void do_idle(void) { int cpu = smp_processor_id(); + trace_do_idle_enter(cpu); /* * If the arch has a polling bit, we maintain an invariant: * @@ -287,6 +288,7 @@ static void do_idle(void) if (unlikely(klp_patch_pending(current))) klp_update_patch_state(current); + trace_do_idle_exit(cpu); } bool cpu_in_idle(unsigned long pc) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c ind
[PATCH v3 0/4] Some new features for the latency tracers
Hello all, Changes in v3: - [PATCH 1/4]: * I have generalized this to send the fsnotify() notifications for all tracers that use tracing_max_latency. * There are large additions of code to prevent queue_work() from being called while we are in __schedule() or do_idle(). This was a bug also in the previous version but had gone unnoticed. It became very visible when I tried to make it work with the wakeup tracers because they are always invoked from the sched_switch tracepoint. * The fsnotify notification is issued for tracing_max_latency, not trace. * I got rid of the kernel config option. The facility is always compiled when CONFIG_FSNOTIFY is enabled and any of the latency tracers are enabled. - [PATCH 2/4]: * No significant changes. - [PATCH 3/4]: * The latency-collector help messages have been tuned to the fact that it can be used also with wakeup and hwlat tracers. * I increased the size of the buffer for reading from /sys/kernel/debug/tracing/trace. * Adapted it to monitor tracing_max_latency instead of trace - [PATCH 4/4]: * I converted this from a kernel config option to a trace config option that can be changed at runtime. This series is meant to address two issues with latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers always work in overwrite mode. [PATCH 1/4] This implements fs notification for tracing_max_latency. It makes it possible for userspace to detect when a new latency has been recorded. [PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies. [PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability. 
The last patch is not directly connected but doesn't apply cleanly on its own: [PATCH 4/4] This adds the option console-latency to the trace options. best regards, Viktor Viktor Rosendahl (4): ftrace: Implement fs notification for tracing_max_latency preemptirq_delay_test: Add the burst feature and a sysfs trigger Add the latency-collector to tools ftrace: Add an option for tracing console latencies include/linux/irqflags.h | 21 + include/trace/events/power.h | 40 + include/trace/events/sched.h | 40 + kernel/printk/printk.c |6 +- kernel/sched/core.c |2 + kernel/sched/idle.c |2 + kernel/trace/Kconfig |6 +- kernel/trace/preemptirq_delay_test.c | 145 ++- kernel/trace/trace.c | 160 +++- kernel/trace/trace.h | 11 + kernel/trace/trace_hwlat.c |4 +- kernel/trace/trace_irqsoff.c | 12 + tools/Makefile | 14 +- tools/trace/Makefile | 20 + tools/trace/latency-collector.c | 1211 ++ 15 files changed, 1661 insertions(+), 33 deletions(-) create mode 100644 tools/trace/Makefile create mode 100644 tools/trace/latency-collector.c -- 2.17.1
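[Editorial note: the randomized printout scheme that [PATCH 3/4] relies on can be sketched as follows. This is an illustrative model, not the actual latency-collector code; the names init_table() and toss_coin() are assumptions. With per-slot probabilities 1/size, ..., 1/2, 1, the i-th latency of a burst is printed immediately with probability table[i] given that all earlier ones were skipped, which gives each latency in a burst of at most size an equal 1/size overall chance of being captured.]

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TABLE 64

/* prob_table[i]: probability of printing the (i+1)-th latency of a
 * burst immediately, assuming at most table_size latencies per burst */
static double prob_table[MAX_TABLE];
static int table_size;

static void init_table(int size)
{
	int i;

	table_size = size;
	for (i = 0; i < size && i < MAX_TABLE; i++)
		prob_table[i] = 1.0 / (size - i);	/* 1/size, ..., 1/2, 1 */
}

/* unfair coin toss for the i-th latency: 1 = print now, 0 = sleep.
 * The real tool also grows the table when the last slot (prob. 1)
 * is ever reached, i.e. when a burst is longer than assumed. */
static int toss_coin(int i)
{
	return drand48() < prob_table[i];
}
```

For size 2 this yields the 1/2-and-1 table from the commit message; for size 3, the 1/3, 1/2, 1 table, and so on.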
[PATCH v3 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 5d965cef6c77..aab8909dfbff 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -740,9 +740,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
Re: [PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
On 5/6/19 1:01 AM, Steven Rostedt wrote: > On Mon, 6 May 2019 00:39:15 +0200 > Viktor Rosendahl wrote: > >> Can you explain more precisely what you agree with? >> >> The general idea of being able to trace bursts of latencies? > > One thing I have an issue with the current approach is the use of the > trace file for this. You mean that using fsnotify is kind of okayish but that it's confusing for the user because only a subset of tracers would send the fsnotify event when the trace file is updated? > > Hmm, what about adding a notifier to tracing_max_latency instead? And > do it not as a config option, but have it always enabled. It would send a > notification when it changes, and that only happens when there's a new > max latency. Would that work for you? Yes, it seems to be OK since the tracing_max_latency is updated also with the latest latency that exceeds the threshold when we are using tracing_thresh. I will try to send a new version of the patch series soon, with the modifications that have been discussed so far. best regards, Viktor
Re: [PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
On 5/4/19 6:47 PM, Joel Fernandes wrote: > On Wed, May 01, 2019 at 10:36:47PM +0200, Viktor Rosendahl wrote: >> immediately after each other in spite of the fact that the >> preempt/irqsoff tracers operate in overwrite mode. >> >> Signed-off-by: Viktor Rosendahl > > I agree with the general idea, but I don't really like how it is done in the > patch. > Can you explain more precisely what you agree with? The general idea of being able to trace bursts of latencies? Or the slightly more specific idea of notifying user space when /sys/kernel/debug/tracing/trace has new data, and letting user space do the rest? > We do have a notification mechanism already in the form of trace_pipe. Can we > not improve that in some way to be notified of new trace data? In theory, > the trace_pipe does fit into the description in the documentation: "Reads > from this file will block until new data is retrieved" > I am not quite sure what kind of solution you are after here. Could you be more specific? I think that it would be weird if we used trace_pipe to send a message with the meaning "new data is available in the trace file". To me this would seem like (re)inventing a special purpose alternative to inotify. Another option would be to allow the user to consume the latency traces directly from trace_pipe, without using the trace file. This would make sense to me and would indeed be a much better solution. If somebody is able to make such an implementation I am all for it. However, I do not know how to do this. I believe that it would lead towards a significant rewrite of the latency tracers, and probably also of how the ftrace buffering works. My solution is clearly a hack but it has the benefit of requiring quite small and straightforward changes to the kernel and the really ugly things are then done in user space. >> >> +config PREEMPTIRQ_FSNOTIFY > Does this have to be a CONFIG option? I prefer if the code automatically > does the notification and it is always enabled. 
I don't see any drawbacks of that. > I was just thinking that sending fsnotify events to a sysfs file is a bit off the beaten track and I figured that some people would like to opt out. It can of course be changed so that it is always enabled when a tracer that requires it is enabled, if there is general agreement on this point. >> + >> config SCHED_TRACER >> bool "Scheduling Latency Tracer" >> select GENERIC_TRACER >> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c >> index ca1ee656d6d8..ebefb8d4e072 100644 >> --- a/kernel/trace/trace.c >> +++ b/kernel/trace/trace.c >> @@ -44,6 +44,8 @@ >> #include >> #include >> #include >> +#include >> +#include >> >> #include "trace.h" >> #include "trace_output.h" >> @@ -8191,6 +8193,32 @@ static __init void create_trace_instances(struct >> dentry *d_tracer) >> return; >> } >> >> +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY >> + >> +static void trace_notify_workfn(struct work_struct *work) > [snip] > > I prefer if this facility is available to other tracers as well such as > the wakeup tracer which is similar in output (check > Documentation/trace/ftrace.txt). I believe this should be a generic trace > facility, and not tracer specific. > This should be possible. If we stick with the fsnotify idea, it's possible to move the functions for it to trace.c and to use it also from other tracers. It is also possible to do a few small adjustments to the latency-collector and to rename it to the trace-collector, so that it can work for the wakeup case also. The random sleep games probably don't make sense for the wakeup case but it's already a command line option. Below is a suggestion for how to make the facility more generic and usable by both preempt/irqsoff and wakeup. 
best regards, Viktor --- kernel/trace/Kconfig | 11 +++ kernel/trace/trace.c | 50 +-- kernel/trace/trace.h | 15 ++ kernel/trace/trace_irqsoff.c | 5 kernel/trace/trace_sched_wakeup.c | 7 - 5 files changed, 85 insertions(+), 3 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 8bd1d6d001d7..ca6a63004adf 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -244,6 +244,17 @@ config SCHED_TRACER This tracer tracks the latency of the highest priority task to be scheduled in, starting from the point it has woken up. +config TRACE_FSNOTIFY + bool "Generate fsnotify events for the trace file" + default n + depends on (IRQSOFF_TRACER || PREEMPT_TRACER || SCHED_TRACER) + depends on FSNOTIFY + help + This opt
Re: [PATCH v2 4/4] ftrace: Add an option for tracing console latencies
> Instead of this being turned into a nop, don't have a kconfig option > but instead have this call into the trace_irqsoff.c code, and depending > on what the options are, it should stop it. Of course, this would need > to be smart enough to pair it. Perhaps return the result of > console_stop_critical_timings() and have that passed to > console_start_critical_timings(), and only have start do something if > stop did something. This way the option only needs to disable the stop > part. Something like this? I have only compile tested it so far. best regards, Viktor --- include/linux/irqflags.h | 17 + kernel/printk/printk.c | 6 -- kernel/trace/trace.h | 1 + kernel/trace/trace_irqsoff.c | 13 + 4 files changed, 35 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..e4f7ebc67a53 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -68,9 +68,26 @@ do { \ defined(CONFIG_PREEMPT_TRACER) extern void stop_critical_timings(void); extern void start_critical_timings(void); + extern bool console_tracing_disabled(void); + +#define console_stop_critical_timings(FLAG)\ + do {\ + FLAG = console_tracing_disabled(); \ + if (FLAG) \ + stop_critical_timings();\ + } while(0) + +#define console_start_critical_timings(FLAG)\ + do { \ + if (FLAG)\ + start_critical_timings();\ + } while(0) + #else # define stop_critical_timings() do { } while (0) # define start_critical_timings() do { } while (0) +# define console_stop_critical_timings(FLAG) do { } while (0) +# define console_start_critical_timings(FLAG) do { } while (0) #endif /* diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 02ca827b8fac..3a18b7208399 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2348,6 +2348,7 @@ void console_unlock(void) static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; unsigned long flags; + bool cflag; bool do_cond_resched, retry; if (console_suspended) { @@ 
-2448,9 +2449,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(cflag); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(cflag); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 59dc01ac52fd..3eed4756dba3 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1253,6 +1253,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(PRINTK_MSGONLY, "printk-msg-only"), \ C(CONTEXT_INFO, "context-info"), /* Print pid/cpu/time */ \ C(LATENCY_FMT, "latency-format"), \ + C(CONSOLE_LATENCY, "console-latency"), \ C(RECORD_CMD, "record-cmd"), \ C(RECORD_TGID, "record-tgid"), \ C(OVERWRITE,"overwrite"), \ diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index 07a391e845de..659a82209c1e 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -490,6 +490,19 @@ void stop_critical_timings(void) EXPORT_SYMBOL_GPL(stop_critical_timings); NOKPROBE_SYMBOL(stop_critical_timings); +bool console_tracing_disabled(void) +{ + struct trace_array *tr = irqsoff_trace; + int pc = preempt_count(); + + if (!preempt_trace(pc) && !irq_trace()) + return false; + + return !(tr->trace_flags & CONSOLE_LATENCY); +} + +EXPORT_SYMBOL_GPL(console_tracing_disabled); + #ifdef CONFIG_FUNCTION_TRACER static bool function_enabled; -- 2.17.1
[PATCH v2 4/4] ftrace: Add an option for tracing console latencies
This new option CONFIG_TRACE_CONSOLE_LATENCY will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console. Signed-off-by: Viktor Rosendahl --- include/linux/irqflags.h | 13 + kernel/printk/printk.c | 5 +++-- kernel/trace/Kconfig | 11 +++ 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 21619c92c377..791bee718448 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -73,6 +73,19 @@ do { \ # define start_critical_timings() do { } while (0) #endif +#ifdef CONFIG_TRACE_CONSOLE_LATENCY + +#define console_stop_critical_timings() do {} while (0) +#define console_start_critical_timings() do {} while (0) + +#else /* !CONFIG_TRACE_CONSOLE_LATENCY */ + +/* don't trace print latency */ +#define console_stop_critical_timings() stop_critical_timings() +#define console_start_critical_timings() start_critical_timings() + +#endif /* !CONFIG_TRACE_CONSOLE_LATENCY */ + /* * Wrap the arch provided IRQ routines to provide appropriate checks. 
*/ diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 02ca827b8fac..710e87f61158 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2448,9 +2448,10 @@ void console_unlock(void) */ console_lock_spinning_enable(); - stop_critical_timings();/* don't trace print latency */ + /* don't trace print latency if it's disabled */ + console_stop_critical_timings(); call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); + console_start_critical_timings(); if (console_lock_spinning_disable_and_check()) { printk_safe_exit_irqrestore(flags); diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e5e8f2a0199e..f168d100d4fb 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -244,6 +244,17 @@ config PREEMPT_TRACER modification of /sys/kernel/debug/tracing/trace through the inotify interface. + config TRACE_CONSOLE_LATENCY + bool "Do not turn off latency tracing for the console" + default n + depends on IRQSOFF_TRACER || PREEMPT_TRACER + help + Some of the console drivers will cause long unavoidable + latencies because they are slow and need to print immediately + in a serialized manner. Because of this their latencies are not + traced by default. This option will change behavior so that + they are traced. + config SCHED_TRACER bool "Scheduling Latency Tracer" select GENERIC_TRACER -- 2.17.1
[PATCH v2 3/4] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured. If the same burst of latencies is repeated, then sooner or later we will have captured all the latencies. The reason why it may be desirable to catch all latencies with a long test campaign is that for some organizations, it's necessary to test the kernel in the field and not practical for developers to work iteratively with field testers. Because of cost and project schedules it is not possible to start a new test campaign every time a latency problem has been fixed. It uses inotify to detect changes to /sys/kernel/debug/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss. If immediate print is chosen, it means that we open /sys/kernel/debug/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies. If sleep is chosen, it means that we wait before opening /sys/kernel/debug/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one. The probability for the unfair coin toss is chosen so that there is equal probability to obtain any of the latencies in a burst. However, this requires an assumption about how many latencies there can be. By default the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be: 1/2 and 1 Thus, the probability of getting each of the two latencies will be 1/2. 
If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to: 1/3, 1/2, and 1 Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is we see more than two closely occurring latencies, then the table will again be extended, and so on. On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 100 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has time until the end of the next latency to react, that is to open /sys/kernel/debug/tracing/trace. Thus, if we trace latencies that are >100 us, then we have at least 100 us to react. Example output: root@myhost:~# echo 100 > /sys/kernel/debug/tracing/tracing_thresh root@myhost:~# echo preemptirqsoff > /sys/kernel/debug/tracing/current_tracer root@myhost:~# latency-collector -rvv -a 11 & [1] 4915 root@myhost:~# Running with policy rr and priority 99. Using 3 print threads. /sys/kernel/debug/tracing/trace will be printed with random delay Start size of the probability table:11 Print a message when prob. 
table changes size:true Print a warning when an event has been lost:true Sleep time is: 1000 ms Initializing probability table to 11 Thread 4916 runs with policy rr and priority 99 Thread 4917 runs with policy rr and priority 99 Thread 4918 runs with policy rr and priority 99 Thread 4919 runs with policy rr and priority 99 root@myhost:~# modprobe preemptirq_delay_test delay=105 test_mode=alternate burst_size=10 2850.638308 Latency 1 printout skipped due to random delay 2850.638383 Latency 2 printout skipped due to random delay 2850.638497 Latency 3 immediate print >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> BEGIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< alternat-49226d...0us!: execute_preemptirqtest <-preemptirqtest_2 alternat-49226d... 106us : execute_preemptirqtest <-preemptirqtest_2 alternat-49226d... 106us : tracer_hardirqs_on <-preemptirqtest_2 alternat-49226d... 108us : => preemptirqtest_2 => preemptirq_delay_run => kthread => ret_from_fork >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> END <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In the example above, we randomly happened to get the third latency in a burst of 10. If the experiment is repeated enough times, we will get all 10. Sometim
[PATCH v2 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. The maximum burst size is 10. We also create 10 identical test functions, so that we get 10 different backtraces; this is useful when we want to test whether we can detect all the latencies in a burst. Otherwise, there would be no easy way of differentiating between which latency in a burst was captured by the tracer. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 6 +- kernel/trace/preemptirq_delay_test.c | 145 +++ 2 files changed, 129 insertions(+), 22 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 35e5fd3224f6..e5e8f2a0199e 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -750,9 +750,9 @@ config PREEMPTIRQ_DELAY_TEST configurable delay. The module busy waits for the duration of the critical section. 
- For example, the following invocation forces a one-time irq-disabled - critical section for 500us: - modprobe preemptirq_delay_test test_mode=irq delay=50 + For example, the following invocation generates a burst of three + irq-disabled critical sections for 500us: + modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3 If unsure, say N diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..dc281fa75198 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018, 2019 BMW Car IT GmbH */ #include @@ -10,18 +11,25 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); + +#define MIN(x, y) ((x) < (y) ? 
(x) : (y)) static void busy_wait(ulong time) { @@ -34,37 +42,136 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static __always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} + +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ +/* + * We create 10 different functions, so that we can get 10 different + * backtraces. + */ +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) + +static void (*testfuncs[])(int) = { +
[PATCH v2 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

Changes in v2:
- I have tried to improve some of the commit messages by adding additional explanations.
- [PATCH 2/4]: We use burst size checking instead of the confusing modulo game. The example given in the Kconfig file is corrected and extended. I was not able to find a way to use the module parameters as a trigger, so I have left the sysfs trigger file as is.

This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers always work in overwrite mode.

[PATCH 1/4] This implements fs notification for preempt/irqsoff. It makes it possible for userspace to detect when a new latency has been detected.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected but doesn't apply cleanly on its own:

[PATCH 4/4] This adds the option CONFIG_TRACE_CONSOLE_LATENCY to decide whether we want to trace prints to the console or not.
best regards,
Viktor Rosendahl

Viktor Rosendahl (4):
  ftrace: Implement fs notification for preempt/irqsoff tracers
  preemptirq_delay_test: Add the burst feature and a sysfs trigger
  Add the latency-collector to tools
  ftrace: Add an option for tracing console latencies

 include/linux/irqflags.h | 13 +
 kernel/printk/printk.c | 5 +-
 kernel/trace/Kconfig | 27 +-
 kernel/trace/preemptirq_delay_test.c | 145 +++-
 kernel/trace/trace.c | 31 +-
 kernel/trace/trace.h | 5 +
 kernel/trace/trace_irqsoff.c | 35 +
 tools/Makefile | 14 +-
 tools/trace/Makefile | 20 +
 tools/trace/latency-collector.c | 1190 ++
 10 files changed, 1453 insertions(+), 32 deletions(-)
 create mode 100644 tools/trace/Makefile
 create mode 100644 tools/trace/latency-collector.c
--
2.17.1
[PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
This patch implements the feature that the trace file, e.g. /sys/kernel/debug/tracing/trace will receive notifications through the fsnotify framework when a new trace is available. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 10 ++ kernel/trace/trace.c | 31 +-- kernel/trace/trace.h | 5 + kernel/trace/trace_irqsoff.c | 35 +++ 4 files changed, 79 insertions(+), 2 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 8bd1d6d001d7..35e5fd3224f6 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -234,6 +234,16 @@ config PREEMPT_TRACER enabled. This option and the irqs-off timing option can be used together or separately.) + config PREEMPTIRQ_FSNOTIFY + bool "Generate fsnotify events for the latency tracers" + default n + depends on (IRQSOFF_TRACER || PREEMPT_TRACER) && FSNOTIFY + help + This option will enable the generation of fsnotify events for the + trace file. This makes it possible for userspace to be notified about + modification of /sys/kernel/debug/tracing/trace through the inotify + interface. 
+ config SCHED_TRACER bool "Scheduling Latency Tracer" select GENERIC_TRACER diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index ca1ee656d6d8..ebefb8d4e072 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include "trace.h" #include "trace_output.h" @@ -8191,6 +8193,32 @@ static __init void create_trace_instances(struct dentry *d_tracer) return; } +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static void trace_notify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + notify_work); + fsnotify(tr->d_trace->d_inode, FS_MODIFY, tr->d_trace->d_inode, +FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void trace_create_trace_file(struct trace_array *tr, + struct dentry *d_tracer) +{ + /* For notify we need to init the work structure and save the pointer */ + INIT_WORK(>notify_work, trace_notify_workfn); + tr->d_trace = trace_create_file("trace", 0644, d_tracer, tr, + _fops); +} + +#else /* !CONFIG_PREEMPTIRQ_FSNOTIFY */ + +#define trace_create_trace_file(tr, d_tracer) \ + trace_create_file("trace", 0644, d_tracer, tr, _fops) + +#endif + static void init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) { @@ -8209,8 +8237,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) trace_create_file("trace_options", 0644, d_tracer, tr, _iter_fops); - trace_create_file("trace", 0644, d_tracer, - tr, _fops); + trace_create_trace_file(tr, d_tracer); trace_create_file("trace_pipe", 0444, d_tracer, tr, _pipe_fops); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index d80cee49e0eb..59dc01ac52fd 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -17,6 +17,7 @@ #include #include #include +#include #ifdef CONFIG_FTRACE_SYSCALLS #include /* For NR_SYSCALLS */ @@ -302,6 +303,10 @@ struct trace_array { struct dentry *options; struct dentry *percpu_dir; struct dentry *event_dir; +#ifdef 
CONFIG_PREEMPTIRQ_FSNOTIFY + struct dentry *d_trace; + struct work_struct notify_work; +#endif struct trace_options*topts; struct list_headsystems; struct list_headevents; diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index a745b0cee5d3..07a391e845de 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -82,6 +82,31 @@ static inline int irqsoff_display_graph(struct trace_array *tr, int set) */ static __cacheline_aligned_in_smp unsigned long max_sequence; +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static struct workqueue_struct *notify_wq; + +static __init void trace_file_notify_init(void) +{ + notify_wq = alloc_workqueue("irqsoff_notify_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!notify_wq) + pr_err("Unable to allocate irqsoff_notify_wq"); +} + +static inline void trac
[PATCH 3/4] Add the latency-collector to tools
This is a tool that is intended to work around the fact that the preemptoff, irqsoff, and preemptirqsoff tracers only work in overwrite mode. The idea is to act randomly in such a way that we do not systematically lose any latencies, so that if enough testing is done, all latencies will be captured.

It uses inotify to detect changes to /sys/kernel/debug/tracing/trace. When a latency is detected, it will either sleep or print immediately, depending on a function that acts as an unfair coin toss.

If immediate print is chosen, it means that we open /sys/kernel/debug/tracing/trace and thereby cause a blackout period that will hide any subsequent latencies.

If sleep is chosen, it means that we wait before opening /sys/kernel/debug/tracing/trace, by default for 1000 ms, to see if there is another latency during this period. If there is, then we will lose the previous latency. The coin will be tossed again with a different probability, and we will either print the new latency, or possibly a subsequent one.

The probability for the unfair coin toss is chosen so that there is equal probability to obtain any of the latencies in a burst. However, this requires an assumption about how many latencies there can be. By default, the program assumes that there are no more than 2 latencies in a burst, so the probabilities of immediate printout will be: 1/2 and 1. Thus, the probability of getting each of the two latencies will be 1/2.

If we ever find that there is more than one latency in a series, meaning that we reach the probability of 1, then the table will be expanded to: 1/3, 1/2, and 1. Thus, we assume that there are no more than three latencies, each with a probability of 1/3 of being captured. If the probability of 1 is reached in the new table, that is, we see more than two closely occurring latencies, then the table will again be extended, and so on.
On my systems, it seems like this scheme works fairly well, as long as the latencies we trace are long enough; 100 us seems to be enough. This userspace program receives the inotify event at the end of a latency, and it has time until the end of the next latency to react, that is, to open /sys/kernel/debug/tracing/trace. Thus, if we trace latencies that are >100 us, then we have at least 100 us to react.

Example output:

root@myhost:~# echo 100 > /sys/kernel/debug/tracing/tracing_thresh
root@myhost:~# echo preemptirqsoff > /sys/kernel/debug/tracing/current_tracer
root@myhost:~# latency-collector -rvv -a 11 &
[1] 4915
root@myhost:~# Running with policy rr and priority 99. Using 3 print threads.
/sys/kernel/debug/tracing/trace will be printed with random delay
Start size of the probability table: 11
Print a message when prob. table changes size: true
Print a warning when an event has been lost: true
Sleep time is: 1000 ms
Initializing probability table to 11
Thread 4916 runs with policy rr and priority 99
Thread 4917 runs with policy rr and priority 99
Thread 4918 runs with policy rr and priority 99
Thread 4919 runs with policy rr and priority 99
root@myhost:~# modprobe preemptirq_delay_test delay=105 test_mode=alternate burst_size=10
2850.638308 Latency 1 printout skipped due to random delay
2850.638383 Latency 2 printout skipped due to random delay
2850.638497 Latency 3 immediate print
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> BEGIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
alternat-4922 6d... 0us!: execute_preemptirqtest <-preemptirqtest_2
alternat-4922 6d... 106us : execute_preemptirqtest <-preemptirqtest_2
alternat-4922 6d... 106us : tracer_hardirqs_on <-preemptirqtest_2
alternat-4922 6d... 108us :
=> preemptirqtest_2
=> preemptirq_delay_run
=> kthread
=> ret_from_fork
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> END <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

In the example above, we randomly happened to get the third latency in a burst of 10.
If the experiment is repeated enough times, we will get all 10. Sometimes, there will be lost latencies, so that we get:

3054.078294 Latency 44 printout skipped due to inotify loop

...or:

3211.957685 Latency 112 printout skipped due to print loop

In my experience, these are not frequent enough to be a problem. Also, we do not really lose any latency in these cases, it's more like we get another latency than what was intended by the design.

Signed-off-by: Viktor Rosendahl
---
 tools/Makefile | 14 +-
 tools/trac
[PATCH 2/4] preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of preempt/irqsoff latencies. This makes it possible to test whether we are able to detect latencies that systematically occur very close to each other. In addition, there is a sysfs trigger, so that it's not necessary to reload the module to repeat the test. The trigger will appear as /sys/kernel/preemptirq_delay_test/trigger in sysfs. Signed-off-by: Viktor Rosendahl --- kernel/trace/preemptirq_delay_test.c | 139 +++ 1 file changed, 120 insertions(+), 19 deletions(-) diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index d8765c952fab..1fc3cdbdd293 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -3,6 +3,7 @@ * Preempt / IRQ disable delay thread to test latency tracers * * Copyright (C) 2018 Joel Fernandes (Google) + * Copyright (C) 2018 BMW Car IT GmbH */ #include @@ -10,18 +11,23 @@ #include #include #include +#include #include #include #include #include +#include static ulong delay = 100; -static char test_mode[10] = "irq"; +static char test_mode[12] = "irq"; +static uint burst_size = 1; -module_param_named(delay, delay, ulong, S_IRUGO); -module_param_string(test_mode, test_mode, 10, S_IRUGO); -MODULE_PARM_DESC(delay, "Period in microseconds (100 uS default)"); -MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt or irq (default irq)"); +module_param_named(delay, delay, ulong, 0444); +module_param_string(test_mode, test_mode, 12, 0444); +module_param_named(burst_size, burst_size, uint, 0444); +MODULE_PARM_DESC(delay, "Period in microseconds (100 us default)"); +MODULE_PARM_DESC(test_mode, "Mode of the test such as preempt, irq, or alternate (default irq)"); +MODULE_PARM_DESC(burst_size, "The size of a burst (default 1)"); static void busy_wait(ulong time) { @@ -34,37 +40,132 @@ static void busy_wait(ulong time) } while ((end - start) < (time * 1000)); } -static int preemptirq_delay_run(void *data) +static 
__always_inline void irqoff_test(void) { unsigned long flags; + local_irq_save(flags); + busy_wait(delay); + local_irq_restore(flags); +} + +static __always_inline void preemptoff_test(void) +{ + preempt_disable(); + busy_wait(delay); + preempt_enable(); +} - if (!strcmp(test_mode, "irq")) { - local_irq_save(flags); - busy_wait(delay); - local_irq_restore(flags); - } else if (!strcmp(test_mode, "preempt")) { - preempt_disable(); - busy_wait(delay); - preempt_enable(); +static void execute_preemptirqtest(int idx) +{ + if (!strcmp(test_mode, "irq")) + irqoff_test(); + else if (!strcmp(test_mode, "preempt")) + preemptoff_test(); + else if (!strcmp(test_mode, "alternate")) { + if (idx % 2 == 0) + irqoff_test(); + else + preemptoff_test(); } +} + +#define DECLARE_TESTFN(POSTFIX)\ + static void preemptirqtest_##POSTFIX(int idx) \ + { \ + execute_preemptirqtest(idx);\ + } \ + +DECLARE_TESTFN(0) +DECLARE_TESTFN(1) +DECLARE_TESTFN(2) +DECLARE_TESTFN(3) +DECLARE_TESTFN(4) +DECLARE_TESTFN(5) +DECLARE_TESTFN(6) +DECLARE_TESTFN(7) +DECLARE_TESTFN(8) +DECLARE_TESTFN(9) +static void (*testfuncs[])(int) = { + preemptirqtest_0, + preemptirqtest_1, + preemptirqtest_2, + preemptirqtest_3, + preemptirqtest_4, + preemptirqtest_5, + preemptirqtest_6, + preemptirqtest_7, + preemptirqtest_8, + preemptirqtest_9, +}; + +#define NR_TEST_FUNCS ARRAY_SIZE(testfuncs) + +static int preemptirq_delay_run(void *data) +{ + + int i; + + for (i = 0; i < burst_size; i++) + (testfuncs[i % NR_TEST_FUNCS])(i); return 0; } -static int __init preemptirq_delay_init(void) +static struct task_struct *preemptirq_start_test(void) { char task_name[50]; - struct task_struct *test_task; snprintf(task_name, sizeof(task_name), "%s_test", test_mode); + return kthread_run(preemptirq_delay_run, NULL, task_name); +} + + +static ssize_t trigger_store(struct kobject *kobj, struct kobj_attribute *attr, +const char *buf, size_t count) +{ + preemptirq_start_test(); + return count; +} + +static struct kobj_attribute 
trigger_attribute = + __ATTR(trigger, 0200, NULL, trigger_store); + +static struct attribute *attrs[] = { + _attribute.attr, + NULL, +}; + +static struct attribute_group att
[PATCH 4/4] ftrace: Add an option for tracing console latencies
This new option CONFIG_TRACE_CONSOLE_LATENCY will enable the latency tracers to trace the console latencies. Previously this has always been implicitly disabled. I guess this is because they are considered to be well known and unavoidable. However, for some organizations it may nevertheless be desirable to trace them. Basically, we want to be able to tell that there are latencies in the system under test because someone has incorrectly enabled the serial console.

Signed-off-by: Viktor Rosendahl
---
 include/linux/irqflags.h | 13 +
 kernel/printk/printk.c | 5 +++--
 kernel/trace/Kconfig | 11 +++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c92c377..791bee718448 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -73,6 +73,19 @@ do { \
 # define start_critical_timings() do { } while (0)
 #endif
+
+#ifdef CONFIG_TRACE_CONSOLE_LATENCY
+
+#define console_stop_critical_timings() do {} while (0)
+#define console_start_critical_timings() do {} while (0)
+
+#else /* !CONFIG_TRACE_CONSOLE_LATENCY */
+
+/* don't trace print latency */
+#define console_stop_critical_timings() stop_critical_timings()
+#define console_start_critical_timings() start_critical_timings()
+
+#endif /* !CONFIG_TRACE_CONSOLE_LATENCY */
+
 /*
  * Wrap the arch provided IRQ routines to provide appropriate checks.
 */

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d3d170374ceb..303b8f656009 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2404,9 +2404,10 @@ void console_unlock(void)
 		 */
 		console_lock_spinning_enable();
-		stop_critical_timings();	/* don't trace print latency */
+		/* don't trace print latency if it's disabled */
+		console_stop_critical_timings();
 		call_console_drivers(ext_text, ext_len, text, len);
-		start_critical_timings();
+		console_start_critical_timings();
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index ca80cb08bd39..5063b32de66c 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -244,6 +244,17 @@ config PREEMPT_TRACER
 	  modification of /sys/kernel/debug/tracing/trace through the inotify
 	  interface.
+
+config TRACE_CONSOLE_LATENCY
+	bool "Do not turn off latency tracing for the console"
+	default n
+	depends on IRQSOFF_TRACER || PREEMPT_TRACER
+	help
+	  Some of the console drivers will cause long unavoidable
+	  latencies because they are slow and need to print immediately
+	  in a serialized manner. Because of this their latencies are not
+	  traced by default. This option will change behavior so that
+	  they are traced.
+
 config SCHED_TRACER
 	bool "Scheduling Latency Tracer"
 	select GENERIC_TRACER
--
2.17.1
[PATCH 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers
This patch implements the feature that the trace file, e.g. /sys/kernel/debug/tracing/trace will receive notifications through the fsnotify framework when a new trace is available. This makes it possible to implement a user space program that can, with equal probability, obtain traces of latencies that occur immediately after each other in spite of the fact that the preempt/irqsoff tracers operate in overwrite mode. Signed-off-by: Viktor Rosendahl --- kernel/trace/Kconfig | 10 ++ kernel/trace/trace.c | 31 +-- kernel/trace/trace.h | 5 + kernel/trace/trace_irqsoff.c | 35 +++ 4 files changed, 79 insertions(+), 2 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index fa8b1fe824f3..ca80cb08bd39 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -234,6 +234,16 @@ config PREEMPT_TRACER enabled. This option and the irqs-off timing option can be used together or separately.) + config PREEMPTIRQ_FSNOTIFY + bool "Generate fsnotify events for the latency tracers" + default n + depends on (IRQSOFF_TRACER || PREEMPT_TRACER) && FSNOTIFY + help + This option will enable the generation of fsnotify events for the + trace file. This makes it possible for userspace to be notified about + modification of /sys/kernel/debug/tracing/trace through the inotify + interface. 
+ config SCHED_TRACER bool "Scheduling Latency Tracer" select GENERIC_TRACER diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index c521b7347482..c7db01101ef2 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include "trace.h" #include "trace_output.h" @@ -7977,6 +7979,32 @@ static __init void create_trace_instances(struct dentry *d_tracer) return; } +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static void trace_notify_workfn(struct work_struct *work) +{ + struct trace_array *tr = container_of(work, struct trace_array, + notify_work); + fsnotify(tr->d_trace->d_inode, FS_MODIFY, tr->d_trace->d_inode, +FSNOTIFY_EVENT_INODE, NULL, 0); +} + +static void trace_create_trace_file(struct trace_array *tr, + struct dentry *d_tracer) +{ + /* For notify we need to init the work structure and save the pointer */ + INIT_WORK(>notify_work, trace_notify_workfn); + tr->d_trace = trace_create_file("trace", 0644, d_tracer, tr, + _fops); +} + +#else /* !CONFIG_PREEMPTIRQ_FSNOTIFY */ + +#define trace_create_trace_file(tr, d_tracer) \ + trace_create_file("trace", 0644, d_tracer, tr, _fops) + +#endif + static void init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) { @@ -7995,8 +8023,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) trace_create_file("trace_options", 0644, d_tracer, tr, _iter_fops); - trace_create_file("trace", 0644, d_tracer, - tr, _fops); + trace_create_trace_file(tr, d_tracer); trace_create_file("trace_pipe", 0444, d_tracer, tr, _pipe_fops); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 08900828d282..a6769438a809 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -17,6 +17,7 @@ #include #include #include +#include #ifdef CONFIG_FTRACE_SYSCALLS #include /* For NR_SYSCALLS */ @@ -257,6 +258,10 @@ struct trace_array { struct dentry *options; struct dentry *percpu_dir; struct dentry *event_dir; +#ifdef 
CONFIG_PREEMPTIRQ_FSNOTIFY + struct dentry *d_trace; + struct work_struct notify_work; +#endif struct trace_options*topts; struct list_headsystems; struct list_headevents; diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index d3294721f119..676413504566 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -81,6 +81,31 @@ static inline int irqsoff_display_graph(struct trace_array *tr, int set) */ static __cacheline_aligned_in_smp unsigned long max_sequence; +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY + +static struct workqueue_struct *notify_wq; + +static __init void trace_file_notify_init(void) +{ + notify_wq = alloc_workqueue("irqsoff_notify_wq", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!notify_wq) + pr_err("Unable to allocate irqsoff_notify_wq"); +} + +static inline void trac
[PATCH 0/4] Some new features for the preempt/irqsoff tracers
Hello all,

This series is meant to address two issues with the latency tracing. The first three patches provide a method to trace latencies that always occur very close to each other and to differentiate between them, in spite of the fact that the latency tracers always work in overwrite mode.

[PATCH 1/4] This implements fs notification for preempt/irqsoff. It makes it possible for userspace to detect when a new latency has been detected.

[PATCH 2/4] This extends the preemptirq_delay_test module so that it can be used to generate a burst of closely occurring latencies.

[PATCH 3/4] This adds a user space program to the tools directory that utilizes the fs notification feature and a randomized algorithm to print out any of the latencies in a burst with approximately equal probability.

The last patch is not directly connected but doesn't apply cleanly on its own:

[PATCH 4/4] This adds the option CONFIG_TRACE_CONSOLE_LATENCY to decide whether we want to trace prints to the console or not.

best regards,
Viktor Rosendahl

Viktor Rosendahl (4):
  ftrace: Implement fs notification for preempt/irqsoff tracers
  preemptirq_delay_test: Add the burst feature and a sysfs trigger
  Add the latency-collector to tools
  ftrace: Add an option for tracing console latencies

 include/linux/irqflags.h | 13 +
 kernel/printk/printk.c | 5 +-
 kernel/trace/Kconfig | 21 +
 kernel/trace/preemptirq_delay_test.c | 139 ++-
 kernel/trace/trace.c | 31 +-
 kernel/trace/trace.h | 5 +
 kernel/trace/trace_irqsoff.c | 35 +
 tools/Makefile | 14 +-
 tools/trace/Makefile | 20 +
 tools/trace/latency-collector.c | 1190 ++
 10 files changed, 1444 insertions(+), 29 deletions(-)
 create mode 100644 tools/trace/Makefile
 create mode 100644 tools/trace/latency-collector.c
--
2.17.1