** Description changed:

  [SRU Justification]
  
  [Impact]
  
  The current epoll implementation in the 5.15 kernel uses a read-write lock
  (rwlock_t) to protect the ready event list: producers take the read lock to
  add items, and the consumer takes the write lock to remove them. While this
  allows multiple producers to add items concurrently, it introduces a
  scheduling priority inversion vulnerability.
  
  If a high-priority consumer (such as a real-time thread calling epoll_wait) is
  blocked waiting for the exclusive write lock, it can be indefinitely stalled by
  a low-priority producer holding the read lock, because rwlock_t does not
  support priority boosting of readers, even with CONFIG_PREEMPT_RT=y. This
  results in non-deterministic system stalls and latency spikes.
  
- The fix involves replacing rwlock_t with a standard spinlock_t one-to-one, and
- removing the now-redundant lockless helper functions (list_add_tail_lockless
- and chain_epi_lockless). This ensures that under real-time configurations,
- priority inheritance works correctly across the epoll subsystem, eliminating
- priority inversion.
+ The fix involves replacing rwlock_t with spinlock_t, and removing the
+ now-redundant lockless helper functions (list_add_tail_lockless and
+ chain_epi_lockless). This ensures that under real-time configurations,
+ priority inheritance works correctly across the epoll subsystem, eliminating
+ priority inversion.
  
  [Fix]
  
  Backport upstream commit:
  0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock")
  
  [Test Plan]
  
  Because priority inversion depends on narrow, non-deterministic scheduling
  windows, this bug cannot be reproduced reliably on demand, and no
  deterministic reproduction script can be provided.
  
  Therefore, validation relies on verifying that the replacement locking
  mechanism functions correctly, introduces no regressions, and scales safely
  under synthetic load.
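
  As a rough illustration of such synthetic load (a minimal sketch, not the
  validation performed for this SRU), the program below has several threads
  write to their own eventfd while one thread drains them with epoll_wait().
  Each eventfd write runs epoll's wakeup callback, which takes the ready-list
  lock changed by this fix, so both the producer and consumer sides of that
  lock are exercised. epoll_stress(), NWRITERS and WRITES_PER_WRITER are
  hypothetical names and values; the check is simply that no event is lost.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define NWRITERS 8              /* hypothetical load parameters */
#define WRITES_PER_WRITER 10000

/* Producer: each write makes the eventfd readable, which runs epoll's
 * wakeup callback and queues the fd on the ready list. */
static void *writer(void *arg)
{
    int fd = *(const int *)arg;
    uint64_t one = 1;

    for (int i = 0; i < WRITES_PER_WRITER; i++)
        if (write(fd, &one, sizeof one) != (ssize_t)sizeof one)
            break;
    return NULL;
}

/* Consumer: drains events with epoll_wait() until every write is counted.
 * Returns the total observed, or -1 if setup failed. */
long long epoll_stress(void)
{
    int efds[NWRITERS];
    pthread_t tids[NWRITERS];
    struct epoll_event out[NWRITERS];
    int nfds = 0, nthreads = 0;
    long long total = -1;
    int ep = epoll_create1(EPOLL_CLOEXEC);

    if (ep < 0)
        return -1;
    for (; nfds < NWRITERS; nfds++) {
        struct epoll_event ev = { .events = EPOLLIN };

        efds[nfds] = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
        if (efds[nfds] < 0)
            goto out;
        ev.data.fd = efds[nfds];
        if (epoll_ctl(ep, EPOLL_CTL_ADD, efds[nfds], &ev) < 0) {
            close(efds[nfds]);
            goto out;
        }
    }
    for (; nthreads < NWRITERS; nthreads++)
        if (pthread_create(&tids[nthreads], NULL, writer, &efds[nthreads]) != 0)
            break;

    total = 0;
    while (total < (long long)nthreads * WRITES_PER_WRITER) {
        int n = epoll_wait(ep, out, NWRITERS, 1000 /* ms */);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;      /* error, or 1 s without events: give up rather than hang */
        for (int i = 0; i < n; i++) {
            uint64_t v;

            /* Reading an eventfd returns its counter and resets it to 0. */
            if (read(out[i].data.fd, &v, sizeof v) == (ssize_t)sizeof v)
                total += (long long)v;
        }
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    /* The consumer can never observe more increments than were written. */
    assert(total <= (long long)nthreads * WRITES_PER_WRITER);
out:
    for (int i = 0; i < nfds; i++)
        close(efds[i]);
    close(ep);
    return total;
}
```

  Raising NWRITERS or running several instances in parallel increases
  contention on the ready-list lock; perf bench epoll wait remains the more
  standard benchmark for measuring throughput.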
  
  There is a test kernel available in the following PPA:
  https://launchpad.net/~munirsid/+archive/ubuntu/lp2154194
  
  [Where Problems Could Occur]
  
  There could be a performance degradation with highly specific, synthetic
  workloads on the GA kernel. As seen in the upstream commit description [0],
  in artificial benchmarks where hundreds of threads continuously generate epoll
  events, throughput can drop due to serialization around the new spinlock.
  
  However, testing with realistic workloads (via perf bench epoll wait) actually
  demonstrates a performance improvement on x86 architectures.
  
  The regression potential for real-world production environments is low, as
  typical workloads do not exhibit continuous, uninterrupted event-spamming
  behavior. Moreover, the fix is strictly isolated to fs/eventpoll.c and alters
  no external kernel APIs.
  
  [Other Info]
  
  This bug was addressed upstream and has already been integrated into Noble and
  subsequent releases.
  
  [0] - https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0c43094f8cc9d3d99d835c0ac9c4fe1ccc62babd

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2154194

Title:
  [Jammy] Priority inversion problem in epoll for rt kernel

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2154194/+subscriptions


-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
