There is an ABBA deadlock between time_init() and seq_wait(): Thread 1: poll_block() time_poll() time_init() pthread_once() <-- lock A do_time_init() seq_create() pthread_mutex_lock(seq_mutex) <-- lock B
Thread 2: seq_wait(different seqno) pthread_mutex_lock(seq_mutex) <-- lock B poll_immediate_wake() poll_timer_wait() time_msec() time_init() pthread_once() <-- lock A This is likely the same deadlock Intel CI saw last year before the lab was shut down. The issue should not happen with normal applications as those would normally have the time module initialized early in the process before waiting on any sequence numbers, but it happens in the test-barrier application from time to time causing the test suite to hang. Fix that by making sure we're not calling poll_immediate_wake() under the seq_mutex. The time and seq modules are independent and it's hard to ensure the dependency without exporting some of their internals. Instead re-defining the prototype of the poll_immediate_wake_at(), adding the thread safety annotation, so we have some basic protection from this deadlock if the code ever changes. Compiler will warn on the prototype mismatch as well if it ever happens, so it's not a big problem. Having this prototype also gives us a spot in the code where we can place a comment explaining the locking order. Reportde-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415436.html Reported-at: https://issues.redhat.com/browse/FDP-1493 Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> --- lib/seq.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/lib/seq.c b/lib/seq.c index 7c2fa0c69..b0e6d5ce4 100644 --- a/lib/seq.c +++ b/lib/seq.c @@ -163,9 +163,14 @@ seq_read(const struct seq *seq) return value; } +/* poll_immediate_wake_at() must not be called while holding seq_mutex + * in order to avoid potential deadlock with time_init() that calls + * seq_create() if the timeval module is not initialized yet. */ +void poll_immediate_wake_at(const char *where) OVS_EXCLUDED(seq_mutex); + static void seq_wait__(struct seq *seq, uint64_t value, const char *where) - OVS_REQUIRES(seq_mutex) + OVS_RELEASES(seq_mutex) { unsigned int id = ovsthread_id_self(); uint32_t hash = hash_int(id, 0); @@ -176,9 +181,11 @@ seq_wait__(struct seq *seq, uint64_t value, const char *where) if (waiter->value != value) { /* The current value is different from the value we've already * waited for, */ + ovs_mutex_unlock(&seq_mutex); poll_immediate_wake_at(where); } else { /* Already waiting on 'value', nothing more to do. */ + ovs_mutex_unlock(&seq_mutex); } return; } @@ -196,6 +203,7 @@ seq_wait__(struct seq *seq, uint64_t value, const char *where) latch_wait_at(&waiter->thread->latch, where); waiter->thread->waiting = true; } + ovs_mutex_unlock(&seq_mutex); } /* Causes the following poll_block() to wake up when 'seq''s sequence number @@ -219,9 +227,9 @@ seq_wait_at(const struct seq *seq_, uint64_t value, const char *where) if (value == seq_read(seq_)) { seq_wait__(seq, value, where); } else { + ovs_mutex_unlock(&seq_mutex); poll_immediate_wake_at(where); } - ovs_mutex_unlock(&seq_mutex); } /* Called by poll_block() just before it returns, this function destroys any -- 2.49.0 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev