[tip: locking/core] locking/csd_lock: Add more data to CSD lock debugging

2021-03-06  tip-bot2 for Juergen Gross
The following commit has been merged into the locking/core branch of tip:

Commit-ID:     a5aabace5fb8abf2adcfcf0fe54c089b20d71755
Gitweb:        https://git.kernel.org/tip/a5aabace5fb8abf2adcfcf0fe54c089b20d71755
Author:        Juergen Gross
AuthorDate:    Mon, 01 Mar 2021 11:13:36 +01:00
Committer:     Ingo Molnar
CommitterDate: Sat, 06 Mar 2021 12:49:48 +01:00

locking/csd_lock: Add more data to CSD lock debugging

In order to help identify problems with IPI handling and remote
function execution, add some more data to the IPI debugging code.

There have been multiple reports of CPUs looping for long times (many
seconds) in smp_call_function_many(), waiting for another CPU to execute
a function such as a TLB flush. Most of these reports have been for
cases where the kernel was running as a guest on top of KVM or Xen
(there are rumours of it happening under VMware, too, and even on
bare metal).

The root cause hasn't been found yet, even after more than two years
of different developers chasing this bug.

Commit:

  35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout diagnostics")

tried to address this by adding some debug code and by issuing another
IPI when a hang was detected. This helped mitigate the problem
(the repeated IPI unlocks the hang), but the root cause is still unknown.

Currently available data suggests that either an IPI wasn't sent when it
should have been, or that the IPI didn't result in the target CPU
executing the queued function (due to the IPI not reaching the CPU,
the IPI handler not being called, or the handler not seeing the queued
request).

Try to add more diagnostic data by introducing a global atomic counter
which is incremented when performing critical operations (before and
after queueing a new request, when sending an IPI, and when dequeueing
a request). The counter value is stored in percpu variables which can
be printed out when a hang is detected.

The data of the last event (consisting of sequence counter, source
CPU, target CPU, and event type) is stored in a global variable. When
a new event is to be traced, the data of the last event is stored in
the event-related percpu location and the global data is updated with
the new event's data. This allows tracking two events in one data
location: one by the value of the event data (the event before the
current one), and one by the location itself (the current event).
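
As a rough illustration of this scheme, here is a minimal user-space C
sketch, simplified to one event slot per CPU; all identifiers
(csd_record(), struct csd_event, csd_slot, NR_SLOTS, ...) are invented
for this example and are not the symbols used by the actual kernel/smp.c
changes, which rely on the kernel's percpu and atomic helpers:

/*
 * Minimal user-space sketch of the bookkeeping described above.
 * All identifiers are illustrative; the real patch uses the kernel's
 * percpu variables and atomic helpers instead of this array.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NR_SLOTS 8			/* stand-in for the percpu storage */

enum csd_type {				/* event types seen in the printout below */
	CSD_QUEUE, CSD_IPI, CSD_NOIPI, CSD_PING, CSD_PINGED,
	CSD_GOTIPI, CSD_HANDLE, CSD_DEQUEUE, CSD_IDLE
};

struct csd_event {
	uint64_t cnt;			/* global sequence counter value */
	unsigned int src;		/* source CPU of the event */
	unsigned int dst;		/* target CPU of the event */
	enum csd_type type;
};

static _Atomic uint64_t csd_cnt;		/* global sequence counter */
static struct csd_event csd_last;		/* data of the last event  */
static struct csd_event csd_slot[NR_SLOTS];	/* "percpu" event storage  */

/*
 * Record a new event: push the previous global event into this CPU's
 * slot, then overwrite the global variable with the new event.  Each
 * slot therefore tracks two events: the stored one (the event before
 * the current one) and, by virtue of being the most recently written
 * location, the current one.
 */
static void csd_record(unsigned int cpu, unsigned int src, unsigned int dst,
		       enum csd_type type)
{
	struct csd_event ev = {
		.cnt  = atomic_fetch_add(&csd_cnt, 1),
		.src  = src,
		.dst  = dst,
		.type = type,
	};

	csd_slot[cpu] = csd_last;	/* keep the prior event */
	csd_last = ev;			/* publish the new one  */
}

int main(void)
{
	csd_record(1, 1, 6, CSD_QUEUE);	/* CPU 1 queues a request for CPU 6 */
	csd_record(1, 1, 6, CSD_IPI);	/* ... and sends the IPI            */

	/* slot 1 now holds the "queue" event, csd_last the "ipi" event */
	printf("cnt(%07llx): %04x->%04x queue\n",
	       (unsigned long long)csd_slot[1].cnt,
	       csd_slot[1].src, csd_slot[1].dst);
	return 0;
}

A slot's stored data thus names the event before the current one, while
the fact that this particular location was written last identifies the
current event itself.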

A typical printout with a detected hang will look like this:

csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
csd: cnt(00008cc): 0000->0000 dequeue (src cpu 0 == empty)
csd: cnt(00008cd): 0000->0006 idle
csd: cnt(0003668): 0001->0006 queue
csd: cnt(0003669): 0001->0006 ipi
csd: cnt(0003e0f): 0007->000a queue
csd: cnt(0003e10): 0001->ffff ping
csd: cnt(0003e71): 0003->0000 ping
csd: cnt(0003e72): ffff->0006 gotipi
csd: cnt(0003e73): ffff->0006 handle
csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
csd: cnt(0003e7f): 0004->0006 ping
csd: cnt(0003e80): 0001->ffff pinged
csd: cnt(0003eb2): 0005->0001 noipi
csd: cnt(0003eb3): 0001->0006 queue
csd: cnt(0003eb4): 0001->0006 noipi
csd: cnt now: 0003f00

The idea is to print only the relevant entries: all events associated
with the hang (that is, sender-side events for the source CPU of the
hanging request, and receiver-side events for the target CPU), plus the
related events just before those (to provide the data needed to
identify a possible race). Printing all available data would be
possible, but on larger configurations it would add a large amount of
printed output.
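
As a sketch of that selection rule (illustrative code only, not taken
from the patch; the function name and parameters are invented for this
example):

#include <stdbool.h>

/*
 * Decide whether the events recorded by @recording_cpu should be part
 * of the diagnostic printout for a hang whose request was sent by
 * @hang_src and should have been handled by @hang_dst.
 */
bool csd_slot_relevant(unsigned int recording_cpu,
		       unsigned int hang_src, unsigned int hang_dst)
{
	/*
	 * Sender-side events: recorded by the CPU that issued the
	 * hanging request.  Receiver-side events: recorded by the CPU
	 * that should have executed it.  Everything else is omitted to
	 * keep the output manageable on larger configurations.
	 */
	return recording_cpu == hang_src || recording_cpu == hang_dst;
}

The "events just before those" come along automatically, since each
selected slot also stores the event preceding its current one.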

Signed-off-by: Juergen Gross 
[ Minor readability edits. Breaks col80 but is far more readable. ]
Signed-off-by: Ingo Molnar 
Tested-by: Paul E. McKenney 
Link: https://lore.kernel.org/r/20210301101336.7797-4-jgr...@suse.com
---
 Documentation/admin-guide/kernel-parameters.txt |   4 +-
 kernel/smp.c                                    | 226 ++-
 2 files changed, 226 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 98dbffa..1fe9d38 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -789,6 +789,10 @@
printed to the console in case a hanging CPU is
detected, and that CPU is pinged again in order to try
to resolve the hang situation.
+   0: disable csdlock debugging (default)
+   1: enable basic csdlock debugging (minor impact)
+   ext: enable extended csdlock debugging (more impact,
+     but more data)
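
For reference, the extended tracing added here is selected at boot time
through the csdlock_debug= parameter documented in the hunk above; a
typical kernel command line fragment (the exact syntax depends on the
bootloader) would be:

  csdlock_debug=ext

csdlock_debug=1 selects only the basic debugging ("minor impact" per the
documentation), and csdlock_debug=0 (the default) disables it.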
