Public bug reported:

Release: Ubuntu 24.04 LTS (noble)
Installed version: xdg-desktop-portal-gnome 46.2-0ubuntu1

== Bug reference ==

This is a request to backport upstream commit 44814b8 into the noble package.
The resulting hang is tracked in Launchpad bug #2156739 (filed against
gnome-shell, where the memory accumulates, although the triggering defect is
in xdg-desktop-portal-gnome).

Upstream issue filed against xdg-desktop-portal-gnome:
https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/work_items/218
(closed by maintainers as "version too old", directing fix to the distro)

== [Impact] ==

On Ubuntu 24.04 with GNOME 46, a laptop used in clamshell mode (lid closed,
external HDMI monitor only, Intel xe / Arrow Lake-U GPU) hangs after 2–3 hours
of idle time and requires a hard reboot. Nine hang events have been confirmed.

Root cause chain:
1. Screen blank (DPMS off after 5-min idle) causes Mutter to emit
   MonitorsChanged on org.gnome.Mutter.DisplayConfig
2. xdg-desktop-portal-gnome's DisplayStateTracker responds with an async
   GetCurrentState call
3. In the async callback (get_current_state_cb), the code passes tracker->proxy
   to the _finish call instead of the proxy supplied as the callback's
   source_object parameter. If tracker->proxy has become stale or mismatched
   by the time the callback fires (a race in the display state cycle), the
   _finish call fails.
4. The failed call triggers an error log ("Monitor 'Built-in display' has no
   configuration which is-current!") and appears to re-schedule another
   GetCurrentState call, producing a retry loop that fires every ~8–9 seconds
   for the entire idle period.
5. Each iteration of this loop causes gnome-shell to allocate ~2 × 32 MB
   DMA-BUF framebuffer objects that are never released.
6. Leak rate: ~200 MB/min. After 2–3 hours idle: ~24 GB accumulated → hang.

This was confirmed with dbus-monitor and a custom fdinfo logger tracking
gnome-shell's drm-total-gtt and exported DMA-BUF fd count. The DMA-BUF count
rose steadily (e.g. 22 → 41 fds over 3 minutes) during idle with MonitorsChanged
firing, and stopped rising when the user returned. The leaked DMA-BUFs were
never freed.
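As a cross-check, the fd growth above can be converted into the leak rate quoted in step 6 of the root cause chain. This assumes each leaked fd corresponds to one of the ~32 MB buffers from step 5, and takes the 2-hour end of the observed 2–3 hour range:

```latex
\[
\frac{(41 - 22)\ \text{fds}}{3\ \text{min}} \approx 6.3\ \text{fds/min},
\qquad
6.3\ \text{fds/min} \times 32\ \text{MB/fd} \approx 200\ \text{MB/min},
\qquad
200\ \text{MB/min} \times 120\ \text{min} \approx 24\ \text{GB}.
\]
```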

Screen lock is NOT required: DPMS off alone triggers the leak (confirmed by
setting lock-delay=3600 and observing locked=no throughout the leak period).

== [Test Case] ==

1. ThinkPad T16 Gen 4 (or similar) with Intel Arrow Lake-U / xe driver
2. Close laptop lid, connect only external HDMI monitor (clamshell mode)
3. Set idle-delay to 300 seconds (5 minutes):
     gsettings set org.gnome.desktop.session idle-delay 300
4. Before going idle, start this logger in a terminal. Every 15 seconds it
   counts gnome-shell's exported DMA-BUF fds:
     while true; do
       pid=$(pgrep -x gnome-shell | head -1)
       fds=0
       # DMA-BUF fds carry an "exp_name:" line in fdinfo; count DRM-exported ones
       for f in /proc/"$pid"/fdinfo/*; do
         grep -q "^exp_name:.*drm" "$f" 2>/dev/null && fds=$((fds+1))
       done
       echo "$(date '+%H:%M:%S') dmabuf_fds=$fds"
       sleep 15
     done
5. Leave the system completely idle for 5+ minutes (allow the screen to blank).
6. After returning from idle, check whether dmabuf_fds grew steadily during
   the blank period. On an affected system it rises by ~6 per minute; on a
   fixed system it stays flat.
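To turn the logger output into the per-minute rate used as the pass/fail criterion, the samples can be reduced to a single slope. The helpers below are a hypothetical sketch (`parse_sample` and `fds_per_minute` are names introduced here, not part of any package). They assume the logger's `HH:MM:SS dmabuf_fds=N` line format and use a least-squares slope rather than last-minus-first, so a single noisy sample does not dominate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: parse one logger line such as
 * "14:03:15 dmabuf_fds=27" into seconds since midnight and an fd count.
 * Returns 1 on success, 0 if the line does not match the logger format. */
static int parse_sample(const char *line, double *t_out, int *fds_out)
{
    int h, m, s, fds;
    if (sscanf(line, "%d:%d:%d dmabuf_fds=%d", &h, &m, &s, &fds) != 4)
        return 0;
    *t_out = h * 3600.0 + m * 60.0 + s;
    *fds_out = fds;
    return 1;
}

/* Least-squares slope of fd count against time, returned in fds per minute.
 * Timestamps must be increasing; a log that crosses midnight needs 86400 s
 * added to the later samples first. */
static double fds_per_minute(const double *t, const int *fds, size_t n)
{
    assert(n >= 2);

    double mean_t = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_t += t[i];
        mean_y += fds[i];
    }
    mean_t /= (double)n;
    mean_y /= (double)n;

    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dt = t[i] - mean_t;
        num += dt * (fds[i] - mean_y);
        den += dt * dt;
    }
    assert(den > 0.0); /* needs at least two distinct timestamps */
    return num / den * 60.0; /* fds per second -> fds per minute */
}
```

A caller would read the logger output line by line, keep the lines that `parse_sample` accepts, and pass the arrays to `fds_per_minute`. Given the rise reported in [Impact], the slope should come out near 6 per minute on an affected system and near zero on a fixed one.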

Affected: Ubuntu 24.04 noble, xdg-desktop-portal-gnome 46.2-0ubuntu1
Likely fixed: GNOME 47+ (Ubuntu 24.10+), based on inspection of current
upstream source which no longer shows retry behavior in this path.

== [Fix] ==

Upstream commit (merged during the GNOME 47 development cycle, Sept 2024):

  44814b8 "display-state-tracker: Use proxy from source object in callback"
  https://github.com/GNOME/xdg-desktop-portal-gnome/commit/44814b8

Diff summary (src/displaystatetracker.c, get_current_state_cb function):

  Before:
    if (!org_gnome_mutter_display_config_call_get_current_state_finish (
          tracker->proxy, ...))
      {
        g_warning ("Failed to get current display state: %s", error->message);
        return;
      }

  After:
    OrgGnomeMutterDisplayConfig *proxy =
      ORG_GNOME_MUTTER_DISPLAY_CONFIG (source_object);
    ...
    if (!org_gnome_mutter_display_config_call_get_current_state_finish (
          proxy, ...))
      {
        if (!g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED))
          g_warning ("Failed to get current display state: %s", error->message);
        return;
      }

The fix uses the proxy provided by the async framework (source_object) instead
of the potentially stale tracker->proxy, eliminating the race condition that
causes the callback to fail and re-schedule indefinitely. It also suppresses
spurious warnings for legitimately cancelled operations.
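The pattern the patch relies on can be illustrated without GLib. The toy model below is plain C and entirely hypothetical: `ToyProxy`, `ToyCall`, `toy_call_finish` and `toy_get_current_state_cb` are invented for illustration and do not exist in GLib or in xdg-desktop-portal-gnome. It assumes only the property that GTask-based `_finish` functions typically check with `g_task_is_valid()`: a result is accepted only when finished on the object that started the call. It shows a stale tracker field being rejected, `source_object` succeeding, and cancellation staying quiet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these types exist in GLib. */
typedef struct { int generation; } ToyProxy;       /* plays a D-Bus proxy */
typedef enum { TOY_OK, TOY_ERR_CANCELLED, TOY_ERR_OTHER } ToyStatus;
typedef struct {
    ToyProxy *source_object; /* object the call was started on */
    ToyStatus status;        /* outcome of the remote call */
} ToyCall;                                         /* plays an async result */
typedef struct { ToyProxy *proxy; } ToyTracker;    /* field may be reassigned */
typedef enum { TOY_OUTCOME_OK, TOY_OUTCOME_WARNED, TOY_OUTCOME_CANCELLED } ToyOutcome;

/* Like a GTask-based _finish(): the result is only accepted when finished
 * on the same object that started the call. */
static bool toy_call_finish(ToyProxy *proxy, const ToyCall *res, ToyStatus *status_out)
{
    assert(res != NULL && status_out != NULL);
    if (res->source_object != proxy) {
        *status_out = TOY_ERR_OTHER; /* mismatched object: result rejected */
        return false;
    }
    *status_out = res->status;
    return res->status == TOY_OK;
}

/* Patched-style callback: finish on source_object, stay quiet on cancel. */
static ToyOutcome toy_get_current_state_cb(ToyProxy *source_object,
                                           const ToyCall *res,
                                           ToyTracker *tracker)
{
    (void)tracker; /* tracker->proxy is deliberately not used here */
    ToyStatus status;
    if (!toy_call_finish(source_object, res, &status)) {
        if (status == TOY_ERR_CANCELLED)
            return TOY_OUTCOME_CANCELLED;
        fprintf(stderr, "Failed to get current display state\n");
        return TOY_OUTCOME_WARNED;
    }
    return TOY_OUTCOME_OK;
}
```

The design point mirrors the real patch: the callback trusts the object handed to it by the async machinery, not a field of the tracker that other code may have reassigned while the call was in flight.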

== [Regression Potential] ==

Low. The change is confined to a single async callback in displaystatetracker.c.
It replaces a potentially stale object reference with the canonical one provided
by the GLib async framework (source_object is always valid at callback time by
GLib contract). The added G_IO_ERROR_CANCELLED check is purely cosmetic (reduces
log noise). No functional change to the happy path.

** Affects: xdg-desktop-portal-gnome (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2156892

Title:
  [SRU] xdg-desktop-portal-gnome: async callback uses stale proxy,
  causing DMA-BUF leak and system hang in clamshell mode
