On Wed, May 05, 2021 at 01:20:02AM -0700, Anindya Mukherjee wrote: > Hi, > > I have been investigating the crash on exit problem with mpv in ports > with vo=gpu. I think I made a little bit of progress and thought I'd > share my findings. > > The crash (SIGSEGV) happens when thread local destructors > are called from /usr/src/lib/libc/thread/rthread_tls.c:182 in > _rthread_tls_destructors after the gpu thread exits: vo_thread in > video/out/vo.c:1067. The crashing call stack looks like this: > > #0 0x00000176ffdc9680 in ?? () > #1 0x0000017748d347b5 in _rthread_tls_destructors (thread=0x17798917840) at > /usr/src/lib/libc/thread/rthread_tls.c:182 > #2 0x0000017748d98623 in _libc_pthread_exit (retval=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/libc/thread/rthread.c:150 > #3 0x0000017795b22189 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:97 > #4 0x0000017748d0c5ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > Note that some of the traces were taken from different runs so there > might be some mismatch between the handles/addresses. > > It crashes because the destructor is dangling. This mystified me because > if I look at the mpv source, there is no thread local data for the gpu > thread. Indeed, right after the gpu thread starts running if we look > inside the thread structure, local_storage is null. However, if we look > at the same thread at the point of the crash, its local_storage is > populated: > > (gdb) p *(*thread).local_storage > $3 = { > keyid = 7, > next = 0x177353442e0, > data = 0x1770c276000 > } > > The keys are indexed by the keyid in the rkeys array, from where the > destructor is fetched in _rthread_tls_destructors: > > (gdb) p rkeys[7] > $6 = { > used = 1, > destructor = 0x43dd33e2680 > } > > This destructor now points to invalid memory. It turns out the thread > local storage is being initialised here: > > #0 _libc_pthread_key_create (key=0x43dd380da08, destructor=0x43dd33e2680) at > /usr/src/lib/libc/thread/rthread_tls.c:42 > #1 0x0000043dd33e2667 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #2 0x0000043e793f82f7 in pthread_once (once_control=0x43dd380d9f8, > init_routine=0x43e793db3c0 <_libc_pthread_key_create>) at > /usr/src/lib/libc/thread/rthread_once.c:26 > #3 0x0000043dd33e24bd in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #4 0x0000043dd305475f in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #5 0x0000043dd3036c70 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #6 0x0000043dd30e3ca3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #7 0x0000043dd30e4b96 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #8 0x0000043dd302d89a in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #9 0x0000043dd3031162 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #10 0x0000043dd3031ec6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #11 0x0000043dd30f7a8b in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #12 0x0000043dd311a94e in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #13 0x0000043dd311addf in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #14 0x0000043dd33ae4a6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #15 0x0000043dd33ad6a2 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #16 0x0000043dd276e1d3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > #17 0x0000043b8f0816c6 in gl_clear (ra=0x43e50a61a50, dst=0x43e399aa950, > color=0x43e0e382370, scissor=0x43e0e382390) at > ../mpv-0.33.1/video/out/opengl/ra_gl.c:684 > #18 0x0000043b8f061db8 in gl_video_render_frame (p=0x43db938c050, > frame=0x43e399bb350, fbo=..., flags=3) at > ../mpv-0.33.1/video/out/gpu/video.c:3251 > #19 0x0000043b8f089a8f in draw_frame (vo=0x43e32b9f450, frame=0x43e399bb350) > at ../mpv-0.33.1/video/out/vo_gpu.c:87 > #20 0x0000043b8f0882a4 in render_frame (vo=0x43e32b9f450) at > ../mpv-0.33.1/video/out/vo.c:957 > #21 0x0000043b8f087735 in vo_thread (ptr=0x43e32b9f450) at > ../mpv-0.33.1/video/out/vo.c:1095 > #22 0x0000043da1682181 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:96 > #23 0x0000043e793b35ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > So the mpv code is not directly aware of the TLS, and it is being > allocated in iris_dri.so. This trace was taken at the start of video > playback, and at this point iris_dri.so is loaded (using dlopen) and the > destructor is valid. > > #0 dlopen (libname=0xbbddfb93320 "/usr/X11R6/lib/modules/dri/iris_dri.so", > flags=258) at /usr/src/libexec/ld.so/dlfcn.c:51 > #1 0x00000bbd76e126e6 in loader_open_driver (driver_name=0xbbd52b277e0 > "iris", out_driver_handle=0xbbd417eda28, search_path_vars=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libloader/../../src/loader/loader.c:579 > #2 0x00000bbd76e0a7a8 in dri2_open_driver (disp=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:771 > #3 dri2_load_driver_common (disp=<optimized out>, > driver_extensions=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:783 > #4 dri2_load_driver_dri3 (disp=<error reading variable: Unhandled dwarf > expression opcode 0xa3>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:808 > #5 0x00000bbd76e0331c in dri2_initialize_x11_dri3 (drv=<optimized out>, > disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1393 > #6 dri2_initialize_x11 (drv=<error reading variable: Unhandled dwarf > expression opcode 0xa3>, disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1554 > #7 0x00000bbd76e0c352 in dri2_initialize (drv=0xbbd417fd200, > disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1143 > #8 0x00000bbd76e0649e in _eglMatchAndInitialize (disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:75 > #9 _eglMatchDriver (disp=0xbbd4180c000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:98 > #10 0x00000bbd76dfa1c1 in eglInitialize (dpy=<optimized out>, major=0x0, > minor=0x0) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:617 > #11 0x00000bbb39afbca0 in mpegl_init (ctx=0xbbd41809350) at > ../mpv-0.33.1/video/out/opengl/context_x11egl.c:109 > #12 0x00000bbb39ad0c96 in ra_ctx_create (vo=0xbbd5b2df050, context_type=0x0, > context_name=0x0, opts=...) at ../mpv-0.33.1/video/out/gpu/context.c:185 > #13 0x00000bbb39b083a7 in preinit (vo=0xbbd5b2df050) at > ../mpv-0.33.1/video/out/vo_gpu.c:298 > #14 0x00000bbb39b06679 in vo_thread (ptr=0xbbd5b2df050) at > ../mpv-0.33.1/video/out/vo.c:1080 > #15 0x00000bbe30c09181 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:96 > #16 0x00000bbdb38aa5ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > However, when mpv is shutting down, iris_dri.so is > unloaded here: > > #0 dlclose (handle=0xb79c1fa5800) at /usr/src/libexec/ld.so/dlfcn.c:274 > #1 0x00000b7a51b541e0 in dri2_display_destroy (disp=0xb7969481000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1204 > #2 0x00000b7a51b55407 in dri2_display_release (disp=0xb7969481000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1188 > #3 dri2_terminate (drv=<error reading variable: Unhandled dwarf expression > opcode 0xa3>, disp=0xb7969481000) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1285 > #4 0x00000b7a51b43db7 in eglTerminate (dpy=<optimized out>) at > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:675 > #5 0x00000b775b50a174 in mpegl_uninit (ctx=0xb796948f550) at > ../mpv-0.33.1/video/out/opengl/context_x11egl.c:51 > #6 0x00000b775b4dedc8 in ra_ctx_destroy (ctx_ptr=0xb796faf1158) at > ../mpv-0.33.1/video/out/gpu/context.c:211 > #7 0x00000b775b516d8d in uninit (vo=0xb796fabb650) at > ../mpv-0.33.1/video/out/vo_gpu.c:286 > #8 0x00000b775b514994 in vo_thread (ptr=0xb796fabb650) at > ../mpv-0.33.1/video/out/vo.c:1136 > #9 0x00000b796775c181 in _rthread_start (v=<error reading variable: > Unhandled dwarf expression opcode 0xa3>) at > /usr/src/lib/librthread/rthread.c:96 > #10 0x00000b7a3e8ef5ba in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > So iris_dri.so is unloaded before the _rthread_tls_destructors function > gets to the destructor. This is the cause of the crash. I did a quick > and dirty test by doing this: > LD_PRELOAD=/usr/X11R6/lib/modules/dri/iris_dri.so ./mpv -v file.mp4 > and indeed now mpv does not crash on exit (vo=gpu is being used by > default), because the destructor is being resolved from LD_PRELOAD. > > I intend to look at this on Linux to see why it does not crash there, > but haven't gotten to it yet. In the meanwhile, I wonder if we can > patch the OpenBSD port in some way to prevent the dangling TLS > destructor. If anyone has a clean solution based on the above > information please feel free to chime in. I'd love to get this fixed. > > Regards, > Anindya >
Hi, I can (still) reproduce this issue. I don't use Iris though. Using LD_PRELOAD still crashes for me too. Did you make other changes to the mpv code? Thanks for looking into this issue, -- Kind regards, Hiltjo