On Wed, May 05, 2021 at 01:20:02AM -0700, Anindya Mukherjee wrote:
> Hi,
> 
> I have been investigating the crash on exit problem with mpv in ports
> with vo=gpu. I think I made a little bit of progress and thought I'd
> share my findings.
> 
> The crash (SIGSEGV) happens when thread local destructors
> are called from /usr/src/lib/libc/thread/rthread_tls.c:182 in
> _rthread_tls_destructors after the gpu thread exits: vo_thread in
> video/out/vo.c:1067. The crashing call stack looks like this:
> 
> #0  0x00000176ffdc9680 in ?? ()
> #1  0x0000017748d347b5 in _rthread_tls_destructors (thread=0x17798917840) at 
> /usr/src/lib/libc/thread/rthread_tls.c:182
> #2  0x0000017748d98623 in _libc_pthread_exit (retval=<error reading variable: 
> Unhandled dwarf expression opcode 0xa3>) at 
> /usr/src/lib/libc/thread/rthread.c:150
> #3  0x0000017795b22189 in _rthread_start (v=<error reading variable: 
> Unhandled dwarf expression opcode 0xa3>) at 
> /usr/src/lib/librthread/rthread.c:97
> #4  0x0000017748d0c5ba in __tfork_thread () at 
> /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> 
> Note that some of the traces were taken from different runs so there
> might be some mismatch between the handles/addresses.
> 
> It crashes because the destructor is dangling. This mystified me because
> if I look at the mpv source, there is no thread local data for the gpu
> thread. Indeed, right after the gpu thread starts running if we look
> inside the thread structure, local_storage is null. However, if we look
> at the same thread at the point of the crash, its local_storage is
> populated:
> 
> (gdb) p *(*thread).local_storage
> $3 = {
>   keyid = 7,
>   next = 0x177353442e0,
>   data = 0x1770c276000
> }
> 
> The keys are indexed by the keyid in the rkeys array, from where the
> destructor is fetched in _rthread_tls_destructors:
> 
> (gdb) p rkeys[7]
> $6 = {
>   used = 1,
>   destructor = 0x43dd33e2680
> }
> 
> This destructor now points to invalid memory. It turns out the thread
> local storage is being initialised here:
> 
> #0  _libc_pthread_key_create (key=0x43dd380da08, destructor=0x43dd33e2680) at 
> /usr/src/lib/libc/thread/rthread_tls.c:42
> #1  0x0000043dd33e2667 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #2  0x0000043e793f82f7 in pthread_once (once_control=0x43dd380d9f8, 
> init_routine=0x43e793db3c0 <_libc_pthread_key_create>) at 
> /usr/src/lib/libc/thread/rthread_once.c:26
> #3  0x0000043dd33e24bd in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #4  0x0000043dd305475f in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #5  0x0000043dd3036c70 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #6  0x0000043dd30e3ca3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #7  0x0000043dd30e4b96 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #8  0x0000043dd302d89a in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #9  0x0000043dd3031162 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #10 0x0000043dd3031ec6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #11 0x0000043dd30f7a8b in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #12 0x0000043dd311a94e in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #13 0x0000043dd311addf in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #14 0x0000043dd33ae4a6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #15 0x0000043dd33ad6a2 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #16 0x0000043dd276e1d3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #17 0x0000043b8f0816c6 in gl_clear (ra=0x43e50a61a50, dst=0x43e399aa950, 
> color=0x43e0e382370, scissor=0x43e0e382390) at 
> ../mpv-0.33.1/video/out/opengl/ra_gl.c:684
> #18 0x0000043b8f061db8 in gl_video_render_frame (p=0x43db938c050, 
> frame=0x43e399bb350, fbo=..., flags=3) at 
> ../mpv-0.33.1/video/out/gpu/video.c:3251
> #19 0x0000043b8f089a8f in draw_frame (vo=0x43e32b9f450, frame=0x43e399bb350) 
> at ../mpv-0.33.1/video/out/vo_gpu.c:87
> #20 0x0000043b8f0882a4 in render_frame (vo=0x43e32b9f450) at 
> ../mpv-0.33.1/video/out/vo.c:957
> #21 0x0000043b8f087735 in vo_thread (ptr=0x43e32b9f450) at 
> ../mpv-0.33.1/video/out/vo.c:1095
> #22 0x0000043da1682181 in _rthread_start (v=<error reading variable: 
> Unhandled dwarf expression opcode 0xa3>) at 
> /usr/src/lib/librthread/rthread.c:96
> #23 0x0000043e793b35ba in __tfork_thread () at 
> /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> 
> So the mpv code is not directly aware of the TLS, and it is being
> allocated in iris_dri.so. This trace was taken at the start of video
> playback, and at this point iris_dri.so is loaded (using dlopen) and the
> destructor is valid.
> 
> #0  dlopen (libname=0xbbddfb93320 "/usr/X11R6/lib/modules/dri/iris_dri.so", 
> flags=258) at /usr/src/libexec/ld.so/dlfcn.c:51
> #1  0x00000bbd76e126e6 in loader_open_driver (driver_name=0xbbd52b277e0 
> "iris", out_driver_handle=0xbbd417eda28, search_path_vars=<optimized out>) at 
> /usr/xenocara/lib/mesa/mk/libloader/../../src/loader/loader.c:579
> #2  0x00000bbd76e0a7a8 in dri2_open_driver (disp=<optimized out>) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:771
> #3  dri2_load_driver_common (disp=<optimized out>, 
> driver_extensions=<optimized out>) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:783
> #4  dri2_load_driver_dri3 (disp=<error reading variable: Unhandled dwarf 
> expression opcode 0xa3>) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:808
> #5  0x00000bbd76e0331c in dri2_initialize_x11_dri3 (drv=<optimized out>, 
> disp=0xbbd4180c000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1393
> #6  dri2_initialize_x11 (drv=<error reading variable: Unhandled dwarf 
> expression opcode 0xa3>, disp=0xbbd4180c000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1554
> #7  0x00000bbd76e0c352 in dri2_initialize (drv=0xbbd417fd200, 
> disp=0xbbd4180c000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1143
> #8  0x00000bbd76e0649e in _eglMatchAndInitialize (disp=0xbbd4180c000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:75
> #9  _eglMatchDriver (disp=0xbbd4180c000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:98
> #10 0x00000bbd76dfa1c1 in eglInitialize (dpy=<optimized out>, major=0x0, 
> minor=0x0) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:617
> #11 0x00000bbb39afbca0 in mpegl_init (ctx=0xbbd41809350) at 
> ../mpv-0.33.1/video/out/opengl/context_x11egl.c:109
> #12 0x00000bbb39ad0c96 in ra_ctx_create (vo=0xbbd5b2df050, context_type=0x0, 
> context_name=0x0, opts=...) at ../mpv-0.33.1/video/out/gpu/context.c:185
> #13 0x00000bbb39b083a7 in preinit (vo=0xbbd5b2df050) at 
> ../mpv-0.33.1/video/out/vo_gpu.c:298
> #14 0x00000bbb39b06679 in vo_thread (ptr=0xbbd5b2df050) at 
> ../mpv-0.33.1/video/out/vo.c:1080
> #15 0x00000bbe30c09181 in _rthread_start (v=<error reading variable: 
> Unhandled dwarf expression opcode 0xa3>) at 
> /usr/src/lib/librthread/rthread.c:96
> #16 0x00000bbdb38aa5ba in __tfork_thread () at 
> /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> 
> However, when mpv is shutting down, iris_dri.so is
> unloaded here:
> 
> #0  dlclose (handle=0xb79c1fa5800) at /usr/src/libexec/ld.so/dlfcn.c:274
> #1  0x00000b7a51b541e0 in dri2_display_destroy (disp=0xb7969481000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1204
> #2  0x00000b7a51b55407 in dri2_display_release (disp=0xb7969481000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1188
> #3  dri2_terminate (drv=<error reading variable: Unhandled dwarf expression 
> opcode 0xa3>, disp=0xb7969481000) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1285
> #4  0x00000b7a51b43db7 in eglTerminate (dpy=<optimized out>) at 
> /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:675
> #5  0x00000b775b50a174 in mpegl_uninit (ctx=0xb796948f550) at 
> ../mpv-0.33.1/video/out/opengl/context_x11egl.c:51
> #6  0x00000b775b4dedc8 in ra_ctx_destroy (ctx_ptr=0xb796faf1158) at 
> ../mpv-0.33.1/video/out/gpu/context.c:211
> #7  0x00000b775b516d8d in uninit (vo=0xb796fabb650) at 
> ../mpv-0.33.1/video/out/vo_gpu.c:286
> #8  0x00000b775b514994 in vo_thread (ptr=0xb796fabb650) at 
> ../mpv-0.33.1/video/out/vo.c:1136
> #9  0x00000b796775c181 in _rthread_start (v=<error reading variable: 
> Unhandled dwarf expression opcode 0xa3>) at 
> /usr/src/lib/librthread/rthread.c:96
> #10 0x00000b7a3e8ef5ba in __tfork_thread () at 
> /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> 
> So iris_dri.so is unloaded before the _rthread_tls_destructors function
> gets to the destructor. This is the cause of the crash. I did a quick
> and dirty test by doing this:
> LD_PRELOAD=/usr/X11R6/lib/modules/dri/iris_dri.so ./mpv -v file.mp4
> and indeed now mpv does not crash on exit (vo=gpu is being used by
> default), presumably because the preloaded copy of the library stays
> mapped after dlclose, so the destructor address remains valid.
> 
> I intend to look at this on Linux to see why it does not crash there,
> but haven't gotten to it yet. In the meantime, I wonder if we can
> patch the OpenBSD port in some way to prevent the dangling TLS
> destructor. If anyone has a clean solution based on the above
> information please feel free to chime in. I'd love to get this fixed.
> 
> Regards,
> Anindya
> 

Hi,

I can (still) reproduce this issue. I don't use Iris though.

With LD_PRELOAD it still crashes for me. Did you make other changes to the
mpv code?

Thanks for looking into this issue,

-- 
Kind regards,
Hiltjo
