Source: gnome-shell
Version: 44.3-3
Severity: normal

Steps to reproduce:

1. Get a source tree and build-dependencies

2. Remove this workaround from d/rules if you are using mips(64)el:

ifneq ($(filter mips%,$(DEB_HOST_ARCH_CPU)),)
# gnome-shell on mips(64)el works on a real GPU (in practice usually an
# AMD GPU), but crashes when using llvmpipe or softpipe, which is all that
# is available on the buildds, so we only run the unit tests at build time
# and skip the tests that would run the whole Shell. See discussion in
# https://salsa.debian.org/gnome-team/gnome-shell/-/merge_requests/71
meson_test_options += --no-suite shell
endif

3. Add this instead:

export GALLIUM_DRIVER=softpipe
export LIBGL_ALWAYS_SOFTWARE=true

4. debuild
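
For quicker iteration than a full debuild, the same environment could be
applied to just the "shell" suite in an already-configured build tree.
The Python sketch below only assembles the environment and the meson
command. The build directory is a placeholder, the helper names are made
up for illustration, and d/rules may wrap the tests in extra helpers that
this omits, so whether the crash reproduces this way is unverified.

```python
import os
import subprocess

# The two variables from step 3, which select Mesa's softpipe software renderer.
SOFTPIPE_ENV = {
    "GALLIUM_DRIVER": "softpipe",
    "LIBGL_ALWAYS_SOFTWARE": "true",
}


def softpipe_env(base=None):
    """Return a copy of the base environment with the softpipe overrides."""
    env = dict(os.environ if base is None else base)
    env.update(SOFTPIPE_ENV)
    return env


def shell_suite_command(builddir):
    """Build a meson invocation that runs only the 'shell' test suite.

    builddir is a placeholder for wherever the tree was configured.
    """
    return ["meson", "test", "-C", builddir, "--suite", "shell"]


def run_shell_suite(builddir):
    """Run the shell suite under softpipe and return meson's exit status."""
    result = subprocess.run(shell_suite_command(builddir), env=softpipe_env())
    return result.returncode
```

run_shell_suite() takes the path of the configured build tree and returns
a non-zero status if any test in the suite fails.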

Expected result: tests pass.

Actual result: tests fail with a gnome-shell
crash. According to mips porter YunQiang Su on
<https://salsa.debian.org/gnome-team/gnome-shell/-/merge_requests/71>,
this is also reproducible on arm64 (I have not verified this).

Impact: nobody intentionally uses softpipe in practice, but this
prevents us from using it as a workaround when llvmpipe has issues
(such as #1049404).

YunQiang Su writes:
> The reason is that the in gjs/gi/function.cpp(Function::invoke),
> the value of ffi_arg_pointers.get() has no TOPLEVEL Stage, so
> shell_wm_completed_map segfault.
...
> On my ARM64 machine, if no breakpoint is set, segfault will always
> happen. If 2 breakpoints is set on both: b function.cpp:1050 if
> function=shell_wm_completed_map shell_wm_completed_map The test will
> always pass.
> 
> So I guess some other thread change the data to shell_wm_completed_map.
...
> nano sleep some time (1<<23 ns for my arm64 server) before the ffi_call
> can pass the test.
>
> and taskset also helps the possibility of test pass.

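This suggests that a timing or multi-threading issue triggers the crash
when using softpipe: setting breakpoints, sleeping before the ffi_call
and pinning CPUs with taskset all change the timing, and all of them
change the outcome.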
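
As a generic illustration of why a delay or a breakpoint can hide this
class of bug, here is a minimal Python sketch of a reader racing with a
thread that is still initialising shared state. It is not the gjs or
gnome-shell code, and every name in it is hypothetical.

```python
import threading
import time


def read_after(delay, init_time=0.05):
    """Start a worker that publishes a value after init_time seconds,
    wait `delay` seconds, then read the value without synchronising."""
    shared = {"stage": None}

    def worker():
        time.sleep(init_time)      # stands in for slow initialisation
        shared["stage"] = "ready"  # hypothetical published value

    thread = threading.Thread(target=worker)
    thread.start()
    time.sleep(delay)              # stands in for a nanosleep or a breakpoint
    seen = shared["stage"]         # unsynchronised read of shared state
    thread.join()
    return seen
```

With no delay the reader normally sees the uninitialised value, while a
delay longer than init_time lets it see the published one. The delay only
masks the race; a real fix would need proper synchronisation.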

shell_wm_completed_map() is a gnome-shell function, nothing to do with
Mesa or LLVM, and similarly gjs/gi/function.cpp is part of gjs, so I
think this is more likely to be a bug in gobject-introspection, gjs,
gnome-shell or mutter than a bug in LLVM or Mesa.

My guess would be that there's some fallback rendering path that is
rarely tested and therefore contains bugs, because all real-world GNOME
Shell users are using either a hardware GPU or llvmpipe, and nobody uses
softpipe in practice.

    smcv
