Public bug reported:

This may not strictly be a linux-gcp bug, but it follows on from the
work on bug 2039732, and so far I could not reproduce it locally.

The setup is an up-to-date 22.04 on GCP n2-standard instances, without a
GPU attached and thus relying on vkms. I have recreated a similar setup
locally, but on a KVM host.

We rely on generic-worker
(https://github.com/taskcluster/taskcluster/tree/main/workers/generic-worker#readme)
to run tasks on CI. In particular, generic-worker will:
 - create a new task_XXX user
 - make it autologin in the gdm3 config
 - probe for the existence of the GNOME Wayland session before
   launching the task

We relied on the wl-clipboard package being installed to verify the
status of Wayland.

On top of that setup, here is the issue.

We issue a TC task with payload:
> export WAYLAND_DISPLAY=wayland-0
> export XDG_RUNTIME_DIR=/run/user/$(id -u)
> wl-paste -l -p

We expect that payload to report "No selection", but on GCP instances we
almost always end up with "This seat has no keyboard". There were also
cases where the session would not be Wayland at all but rather X11. I
think this suggests something around the availability of /dev/dri/card0,
but forcing the gdm3 service to wait for its availability, and adding
extra waiting time after card0 is present, still did not get us
anywhere.
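
For anyone trying to script around this, the two checks described above
could be sketched roughly as follows. This is only an illustration, not
the code generic-worker or our images actually use: the helper names
wait_for_path and classify_probe are hypothetical, and the matched
strings are simply the two messages quoted in this report.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of generic-worker.

# wait_for_path PATH TIMEOUT_SECONDS
# Polls once per second until PATH exists. Returns 0 as soon as it is
# present, 1 if it is still missing after TIMEOUT_SECONDS.
wait_for_path() {
    wfp_path=$1
    wfp_remaining=$2
    while [ ! -e "$wfp_path" ]; do
        if [ "$wfp_remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        wfp_remaining=$((wfp_remaining - 1))
    done
    return 0
}

# classify_probe TEXT
# Maps the combined stdout/stderr of "wl-paste -l -p" to a coarse state,
# based only on the two messages seen in this report.
classify_probe() {
    case $1 in
        *"No selection"*)              echo ready ;;
        *"This seat has no keyboard"*) echo no-keyboard ;;
        *)                             echo unknown ;;
    esac
}

# Possible usage, with WAYLAND_DISPLAY and XDG_RUNTIME_DIR exported as in
# the payload above:
#   wait_for_path /dev/dri/card0 60 && classify_probe "$(wl-paste -l -p 2>&1)"
```

Note that, as said above, waiting for card0 alone was not sufficient in
our case; the sketch only makes the symptom easier to detect.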

We enabled gdm3 as well as mutter debugging but never found anything
that would be a good lead on why it was not yet ready.

At some point, the seat0 session of our user was shown as inactive and
the active one was tied to gdm so we suspected this was the reason, but
both forcing the session to be active and terminating the gdm session
would still not unblock us.
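
The session state mentioned above can be inspected with loginctl. A
minimal sketch, assuming the "KEY=value" output format of
"loginctl show-session" with the Type and Active properties; the helper
names session_prop and session_is_active_wayland are hypothetical, and
SESSION_ID is a placeholder to be taken from "loginctl list-sessions":

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# session_prop KEY
# Reads "KEY=value" lines on stdin and prints the value for KEY
# (prints nothing if KEY is absent).
session_prop() {
    sed -n "s/^$1=//p"
}

# session_is_active_wayland
# Reads show-session output on stdin; succeeds only if the session is
# both a Wayland session and the active one on its seat.
session_is_active_wayland() {
    siaw_props=$(cat)
    [ "$(printf '%s\n' "$siaw_props" | session_prop Type)" = "wayland" ] &&
        [ "$(printf '%s\n' "$siaw_props" | session_prop Active)" = "yes" ]
}

# Possible usage:
#   loginctl show-session "$SESSION_ID" -p Type -p Active \
#       | session_is_active_wayland && echo "session looks usable"
```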

We also suspected the desktop was locking itself, so we disabled locking
with the following, but it did not help much:
> cat > /etc/dconf/profile/user << EOF
> user-db:user
> system-db:local
> EOF
> 
> mkdir /etc/dconf/db/local.d/
> # dconf user settings
> cat > /etc/dconf/db/local.d/00-tc-gnome-settings << EOF
> # /org/gnome/desktop/session/idle-delay
> [org/gnome/desktop/session]
> idle-delay=uint32 0
> # /org/gnome/desktop/lockdown/disable-lock-screen
> [org/gnome/desktop/lockdown]
> disable-lock-screen=true
> EOF
> 
> sudo dconf update


In the end, the only viable and reliable fix (verified over hundreds of
runs now) was to add a "/bin/sleep 30" call to the gdm3 startup:
> mkdir -p /etc/systemd/system/gdm.service.d/
> cat > /etc/systemd/system/gdm.service.d/gdm-wait.conf << EOF
> [Unit]
> Description=Extra 30s wait
> [Service]
> ExecStartPre=/bin/sleep 30
> EOF

** Affects: linux-gcp (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2062534

Title:
  GDM3 autologin might be racy on GCP resulting in inconsistent state of
  the wayland setup of seat0

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-gcp/+bug/2062534/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
