Hey Daniel, I got a freeze with !5020 after about 50 minutes, so I'll try to get this NVIDIA bug fixed; otherwise I'll have to look for an alternative. NVIDIA is consuming so much of my time, not only with this bug but with many others over the last year or so.

AI Summary:

I tested MR !5020 (never-fail-swap) against a compositor freeze I've been seeing on my system. Wanted to share the results and explain why I believe this is a different issue from what !5020 addresses.

Setup:
Single-GPU system: RTX 3060 Ti, driver 595.58.03 proprietary, Ubuntu 26.04, Wayland, dual 4K monitors (EIZO EV2785 via DisplayPort/KVM).

What I built:
Took the 50.0-0ubuntu4 source package, reversed the two !5008-based patches, applied both commits from your never-fail-swap branch (9713674437 + 4fd35080d1) to meta-onscreen-native.c. Built as 50.0-0ubuntu4+mr5020. Did not include the secondary GPU buffer refactor (single-GPU system).

Result:
Compositor froze after ~50 minutes. Same deadlock as with the 0ubuntu3 patches.

Why !5020 can't fix this specific deadlock:
Your fix operates on the mutter side: it ensures correct EGL/GBM swap pairing and that pending frames are consumed before a new swap. This controls what happens before and after eglSwapBuffers is called.

The deadlock I'm hitting is inside a single eglSwapBuffers call. The backtrace shows mutter correctly initiating a swap, but the driver never returns:

  #17 cogl_onscreen_swap_buffers_with_damage()    ← mutter calls swap
  #15 ??? () at libmutter-cogl-18.so.0            ← cogl calls eglSwapBuffers
  #14 ??? () at libEGL_nvidia.so.0                ← driver code from here
  ...
  #7  ??? () at libEGL_nvidia.so.0
  #6  pthread_cond_wait (mutex=0x57119e076570)    ← hangs forever

8 additional threads are stuck in pthread_cond_wait on another mutex (0x7579524ed160).

The root cause is a TOCTOU race on an unsynchronized needs_signal byte flag (offset 0x1f8) inside libEGL_nvidia.so:

1. Waiter thread is about to call pthread_cond_wait, but hasn't set needs_signal = 1 yet
2. Signaler thread checks needs_signal, sees 0, skips pthread_cond_broadcast
3. Waiter sets needs_signal = 1 and enters pthread_cond_wait, but the signal already came and won't come again

No matter how correctly mutter sequences its swaps, once eglSwapBuffers is called, the driver's internal synchronization is solely responsible.

I have a minimal reproducer (191 lines, no compositor, single output, just eglSwapBuffers in a loop) that deadlocks in 2 iterations, confirming the bug is entirely inside the driver.

Bug report: NVIDIA: Developer Forum #366254

This is a different bug from the swap-pairing issues that !5020 and LP #2146782 address. Full backtrace available if useful.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2147648

Title:
  gnome-shell freeze: NVIDIA EGL deadlock in eglSwapBuffers triggered by notification damage rects

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mutter/+bug/2147648/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
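
Illustration of the lost-wakeup pattern from the three numbered steps in the summary. This is only a sketch under stated assumptions, not NVIDIA's code: the class names RacyEvent and SafeEvent are made up, Python's threading.Condition stands in for the pthread condition variable, and a timeout replaces the infinite hang so the script terminates. The interleaving is replayed deterministically in one thread (signaler first, then waiter) rather than raced for real.

```python
import threading


class RacyEvent:
    """Hypothetical model: the flag is read and written without the lock."""

    def __init__(self):
        self.cond = threading.Condition()
        self.needs_signal = False  # stand-in for the unsynchronized byte flag

    def signal(self):
        # Signaler checks the flag without holding the lock and skips the
        # broadcast when it reads 0 (step 2 in the summary).
        if self.needs_signal:
            with self.cond:
                self.cond.notify_all()

    def wait(self, timeout):
        # Waiter sets the flag and only then blocks (step 3). A signal()
        # that ran before this point is lost for good.
        self.needs_signal = True
        with self.cond:
            return self.cond.wait(timeout)  # False means the wait timed out


class SafeEvent:
    """Conventional pattern: predicate is guarded by the condition's lock."""

    def __init__(self):
        self.cond = threading.Condition()
        self.done = False

    def signal(self):
        with self.cond:
            self.done = True  # state change is recorded even with no waiter
            self.cond.notify_all()

    def wait(self, timeout):
        with self.cond:
            # Re-checks the predicate under the lock before blocking.
            return self.cond.wait_for(lambda: self.done, timeout)


# Replay the bad interleaving: the signaler runs before the waiter has set
# its flag, then the waiter blocks.
racy = RacyEvent()
racy.signal()
racy_woke = racy.wait(0.2)

# Same ordering with the guarded predicate: the waiter sees done == True
# and returns without blocking.
safe = SafeEvent()
safe.signal()
safe_woke = safe.wait(0.2)

print("racy waiter woke:", racy_woke)
print("safe waiter woke:", safe_woke)
```

In the racy variant the waiter only gets out because of the timeout; with a plain pthread_cond_wait and no timeout, as in the backtrace, it would block indefinitely.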
