On Thu, May 06, 2021 at 09:32:28AM +0200, Sebastien Marie wrote: > Hi, > > Anindya, did a good analysis of the problem with mpv using gpu video > output backend (it is using EGL and mesa if I correctly followed). > > > For people not reading ports@ here a resume: the destructor function > used in pthread_key_create() needs to be present in memory until > _rthread_tls_destructors() is called. > > in the case of mesa, eglInitialize() function could load, via > dlopen(), code which will use pthread_key_create() with destructor. > > once dlclose() is called, the object is unloaded from memory, but a > reference to destructor is kept, leading to segfault when > _rthread_tls_destructors() run and use the destructor (because > pointing to unloaded code). > > We already take care of such situation with __cxa_thread_atexit_impl > (in libc/stdlib/thread_atexit.c), by keeping an additionnal reference > on object loaded (it makes ld.so aware that it is still used and so > dlclose() doesn't unload it). > > I used the same idiom for pthread_key_create() and used dlctl(3) in > the same way with the destructor address. > > With the following diff, I am not able to reproduce the segfault > anymore with mpv. > > diff 393e7b397988bb6abe46729de1794883d2b9d5cf /home/semarie/repos/openbsd/src > blob - 58b39bb7df70f54c708d2e2b11a3978806a86005 > file + lib/libc/thread/rthread_tls.c > --- lib/libc/thread/rthread_tls.c > +++ lib/libc/thread/rthread_tls.c > @@ -22,10 +22,8 @@ > #include <errno.h> > #include <pthread.h> > #include <stdlib.h> > +#include <dlfcn.h> > > -#include <pthread.h> > -#include <stdlib.h> > - > #include "rthread.h" > > > @@ -58,6 +56,9 @@ pthread_key_create(pthread_key_t *key, void (*destruct > rkeys[hint].used = 1; > rkeys[hint].destructor = destructor; > > + if (destructor != NULL) > + dlctl(NULL, DL_REFERENCE, destructor); > + > *key = hint++; > if (hint >= PTHREAD_KEYS_MAX) > hint = 0; > > > I am still unsure if we want to solve this problem this way, and if > this diff would introduce more problems than it tries to solve. > > At first place, it might be better to not register a destructor when > using dlopen()... > > Comments would be appreciated. > > Thanks. > -- > Sebastien Marie > > > On Wed, May 05, 2021 at 01:20:02AM -0700, Anindya Mukherjee wrote: > > Hi, > > > > I have been investigating the crash on exit problem with mpv in ports > > with vo=gpu. I think I made a little bit of progress and thought I'd > > share my findings. > > > > The crash (SIGSEGV) happens when thread local destructors > > are called from /usr/src/lib/libc/thread/rthread_tls.c:182 in > > _rthread_tls_destructors after the gpu thread exits: vo_thread in > > video/out/vo.c:1067. The crashing call stack looks like this: > > > > #0 0x00000176ffdc9680 in ?? () > > #1 0x0000017748d347b5 in _rthread_tls_destructors (thread=0x17798917840) > > at /usr/src/lib/libc/thread/rthread_tls.c:182 > > #2 0x0000017748d98623 in _libc_pthread_exit (retval=<error reading > > variable: Unhandled dwarf expression opcode 0xa3>) at > > /usr/src/lib/libc/thread/rthread.c:150 > > #3 0x0000017795b22189 in _rthread_start (v=<error reading variable: > > Unhandled dwarf expression opcode 0xa3>) at > > /usr/src/lib/librthread/rthread.c:97 > > #4 0x0000017748d0c5ba in __tfork_thread () at > > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > > > Note that some of the traces were taken from different runs so there > > might be some mismatch between the handles/addresses. > > > > It crashes because the destructor is dangling. This mystified me because > > if I look at the mpv source, there is no thread local data for the gpu > > thread. Indeed, right after the gpu thread starts running if we look > > inside the thread structure, local_storage is null. However, if we look > > at the same thread at the point of the crash, its local_storage is > > populated: > > > > (gdb) p *(*thread).local_storage > > $3 = { > > keyid = 7, > > next = 0x177353442e0, > > data = 0x1770c276000 > > } > > > > The keys are indexed by the keyid in the rkeys array, from where the > > destructor is fetched in _rthread_tls_destructors: > > > > (gdb) p rkeys[7] > > $6 = { > > used = 1, > > destructor = 0x43dd33e2680 > > } > > > > This destructor now points to invalid memory. It turns out the thread > > local storage is being initialised here: > > > > #0 _libc_pthread_key_create (key=0x43dd380da08, destructor=0x43dd33e2680) > > at /usr/src/lib/libc/thread/rthread_tls.c:42 > > #1 0x0000043dd33e2667 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #2 0x0000043e793f82f7 in pthread_once (once_control=0x43dd380d9f8, > > init_routine=0x43e793db3c0 <_libc_pthread_key_create>) at > > /usr/src/lib/libc/thread/rthread_once.c:26 > > #3 0x0000043dd33e24bd in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #4 0x0000043dd305475f in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #5 0x0000043dd3036c70 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #6 0x0000043dd30e3ca3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #7 0x0000043dd30e4b96 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #8 0x0000043dd302d89a in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #9 0x0000043dd3031162 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #10 0x0000043dd3031ec6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #11 0x0000043dd30f7a8b in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #12 0x0000043dd311a94e in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #13 0x0000043dd311addf in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #14 0x0000043dd33ae4a6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #15 0x0000043dd33ad6a2 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #16 0x0000043dd276e1d3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so > > #17 0x0000043b8f0816c6 in gl_clear (ra=0x43e50a61a50, dst=0x43e399aa950, > > color=0x43e0e382370, scissor=0x43e0e382390) at > > ../mpv-0.33.1/video/out/opengl/ra_gl.c:684 > > #18 0x0000043b8f061db8 in gl_video_render_frame (p=0x43db938c050, > > frame=0x43e399bb350, fbo=..., flags=3) at > > ../mpv-0.33.1/video/out/gpu/video.c:3251 > > #19 0x0000043b8f089a8f in draw_frame (vo=0x43e32b9f450, > > frame=0x43e399bb350) at ../mpv-0.33.1/video/out/vo_gpu.c:87 > > #20 0x0000043b8f0882a4 in render_frame (vo=0x43e32b9f450) at > > ../mpv-0.33.1/video/out/vo.c:957 > > #21 0x0000043b8f087735 in vo_thread (ptr=0x43e32b9f450) at > > ../mpv-0.33.1/video/out/vo.c:1095 > > #22 0x0000043da1682181 in _rthread_start (v=<error reading variable: > > Unhandled dwarf expression opcode 0xa3>) at > > /usr/src/lib/librthread/rthread.c:96 > > #23 0x0000043e793b35ba in __tfork_thread () at > > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > > > So the mpv code is not directly aware of the TLS, and it is being > > allocated in iris_dri.so. This trace was taken at the start of video > > playback, and at this point iris_dri.so is loaded (using dlopen) and the > > destructor is valid. > > > > #0 dlopen (libname=0xbbddfb93320 "/usr/X11R6/lib/modules/dri/iris_dri.so", > > flags=258) at /usr/src/libexec/ld.so/dlfcn.c:51 > > #1 0x00000bbd76e126e6 in loader_open_driver (driver_name=0xbbd52b277e0 > > "iris", out_driver_handle=0xbbd417eda28, search_path_vars=<optimized out>) > > at /usr/xenocara/lib/mesa/mk/libloader/../../src/loader/loader.c:579 > > #2 0x00000bbd76e0a7a8 in dri2_open_driver (disp=<optimized out>) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:771 > > #3 dri2_load_driver_common (disp=<optimized out>, > > driver_extensions=<optimized out>) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:783 > > #4 dri2_load_driver_dri3 (disp=<error reading variable: Unhandled dwarf > > expression opcode 0xa3>) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:808 > > #5 0x00000bbd76e0331c in dri2_initialize_x11_dri3 (drv=<optimized out>, > > disp=0xbbd4180c000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1393 > > #6 dri2_initialize_x11 (drv=<error reading variable: Unhandled dwarf > > expression opcode 0xa3>, disp=0xbbd4180c000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1554 > > #7 0x00000bbd76e0c352 in dri2_initialize (drv=0xbbd417fd200, > > disp=0xbbd4180c000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1143 > > #8 0x00000bbd76e0649e in _eglMatchAndInitialize (disp=0xbbd4180c000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:75 > > #9 _eglMatchDriver (disp=0xbbd4180c000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:98 > > #10 0x00000bbd76dfa1c1 in eglInitialize (dpy=<optimized out>, major=0x0, > > minor=0x0) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:617 > > #11 0x00000bbb39afbca0 in mpegl_init (ctx=0xbbd41809350) at > > ../mpv-0.33.1/video/out/opengl/context_x11egl.c:109 > > #12 0x00000bbb39ad0c96 in ra_ctx_create (vo=0xbbd5b2df050, > > context_type=0x0, context_name=0x0, opts=...) at > > ../mpv-0.33.1/video/out/gpu/context.c:185 > > #13 0x00000bbb39b083a7 in preinit (vo=0xbbd5b2df050) at > > ../mpv-0.33.1/video/out/vo_gpu.c:298 > > #14 0x00000bbb39b06679 in vo_thread (ptr=0xbbd5b2df050) at > > ../mpv-0.33.1/video/out/vo.c:1080 > > #15 0x00000bbe30c09181 in _rthread_start (v=<error reading variable: > > Unhandled dwarf expression opcode 0xa3>) at > > /usr/src/lib/librthread/rthread.c:96 > > #16 0x00000bbdb38aa5ba in __tfork_thread () at > > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > > > However, when mpv is shutting down, iris_dri.so is > > unloaded here: > > > > #0 dlclose (handle=0xb79c1fa5800) at /usr/src/libexec/ld.so/dlfcn.c:274 > > #1 0x00000b7a51b541e0 in dri2_display_destroy (disp=0xb7969481000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1204 > > #2 0x00000b7a51b55407 in dri2_display_release (disp=0xb7969481000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1188 > > #3 dri2_terminate (drv=<error reading variable: Unhandled dwarf expression > > opcode 0xa3>, disp=0xb7969481000) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1285 > > #4 0x00000b7a51b43db7 in eglTerminate (dpy=<optimized out>) at > > /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:675 > > #5 0x00000b775b50a174 in mpegl_uninit (ctx=0xb796948f550) at > > ../mpv-0.33.1/video/out/opengl/context_x11egl.c:51 > > #6 0x00000b775b4dedc8 in ra_ctx_destroy (ctx_ptr=0xb796faf1158) at > > ../mpv-0.33.1/video/out/gpu/context.c:211 > > #7 0x00000b775b516d8d in uninit (vo=0xb796fabb650) at > > ../mpv-0.33.1/video/out/vo_gpu.c:286 > > #8 0x00000b775b514994 in vo_thread (ptr=0xb796fabb650) at > > ../mpv-0.33.1/video/out/vo.c:1136 > > #9 0x00000b796775c181 in _rthread_start (v=<error reading variable: > > Unhandled dwarf expression opcode 0xa3>) at > > /usr/src/lib/librthread/rthread.c:96 > > #10 0x00000b7a3e8ef5ba in __tfork_thread () at > > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > > > > So iris_dri.so is unloaded before the _rthread_tls_destructors function > > gets to the destructor. This is the cause of the crash. I did a quick > > and dirty test by doing this: > > LD_PRELOAD=/usr/X11R6/lib/modules/dri/iris_dri.so ./mpv -v file.mp4 > > and indeed now mpv does not crash on exit (vo=gpu is being used by > > default), because the destructor is being resolved from LD_PRELOAD. > > > > I intend to look at this on Linux to see why it does not crash there, > > but haven't gotten to it yet. In the meanwhile, I wonder if we can > > patch the OpenBSD port in some way to prevent the dangling TLS > > destructor. If anyone has a clean solution based on the above > > information please feel free to chime in. I'd love to get this fixed. > > > > Regards, > > Anindya > > >
Hi Sebastien, Thanks for the patch! Just briefly tested this and it works (no more crash). I'll run this for a while and report if there are issues. -- Kind regards, Hiltjo