Bug#903514: gimp won't launch

2018-08-07 Thread Alexis Murzeau
Hi,

On Fri, 3 Aug 2018 22:53:08 -0400 James Van Zandt wrote:
> Thanks, Benedict - the same solution worked for me.
> 
> Specifically:
> 
>    sudo apt-get install libopenblas-base- libopenblas-dev- \
>      libblas3 liblapack3 libblas-dev liblapack-dev
> 
> Unfortunately julia and libjulia0.6 were also removed here, since they
> depend on libopenblas-base.  I intend to report this as a bug, and request
> that they depend instead on the virtual packages libblas.so.3 and
> liblapack.so.3 (which can also be provided by liblapack3 and libblas3,
> resp.).

After checking what could cause the gimp issues, I found that on my machine
gimp almost always hangs without showing anything (no splash screen) when
libopenblas-base is installed.

Using gdb to find where it hung (gimp-gdb.txt) shows the openblas threads
waiting on a lock while doing thread-local storage work, while the main
thread is in the process of dl_close-ing openblas and is waiting in
pthread_join for those threads to exit.

It seems that the lock used in `tls_get_addr_tail` [0] is the same as
the one locked by `_dl_close` [1].
A recursive lock is used, but it does not help here because the threads
calling `tls_get_addr_tail` and `_dl_close` are not the same.
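
To illustrate why a recursive lock does not help across threads, here is a minimal Python sketch. It is only a model, not glibc code: `recursive_lock_demo` is a hypothetical name and `threading.RLock` merely stands in for the loader's recursive lock.

```python
import threading


def recursive_lock_demo():
    """Show that a recursive lock is re-entrant only for its owning thread."""
    lock = threading.RLock()  # stand-in for the loader's recursive lock (assumption)
    results = {}

    def other_thread():
        # A different thread tries to take the lock without blocking.
        got = lock.acquire(blocking=False)
        results["other"] = got
        if got:
            lock.release()

    with lock:
        # The owning thread can take the lock a second time.
        results["owner"] = lock.acquire(blocking=False)
        if results["owner"]:
            lock.release()

        # While the owner still holds the lock, another thread cannot get it.
        t = threading.Thread(target=other_thread)
        t.start()
        t.join()

    return results


if __name__ == "__main__":
    print(recursive_lock_demo())
```

The owner's second acquire succeeds, while the other thread's attempt fails. With a blocking acquire instead, that other thread would simply wait, which is what the openblas threads do in the backtrace.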

This deadlock may not happen every time; in my case, the openblas threads
are still initializing when dl_close is called.
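
The timing dependence can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not glibc or openblas code: `simulate_close`, `loader_lock` and the events are hypothetical names, and timeouts are used only so the model terminates (the real code blocks without any timeout).

```python
import threading


def simulate_close(tls_ready_before_close, timeout=0.2):
    """Return True if the modeled close would hang, False otherwise."""
    loader_lock = threading.RLock()      # stand-in for the shared loader lock
    tls_done = threading.Event()         # worker finished its first TLS access
    close_started = threading.Event()    # closing thread now holds the lock

    def worker():
        if not tls_ready_before_close:
            # Model a worker that is still initializing when the close starts.
            close_started.wait()
        # The first TLS access needs the loader lock. The real code would
        # block forever; the model gives up after a generous timeout.
        if loader_lock.acquire(timeout=10 * timeout):
            loader_lock.release()
        tls_done.set()

    t = threading.Thread(target=worker)
    t.start()
    if tls_ready_before_close:
        # Let the worker finish its TLS setup before closing.
        tls_done.wait()

    with loader_lock:
        close_started.set()
        # The finalizer joins the worker while the lock is held.
        t.join(timeout)
        hung = t.is_alive()

    # The lock is released here, so the worker can finish in the model.
    t.join()
    return hung


if __name__ == "__main__":
    print("worker initialized first:", simulate_close(True))
    print("worker still initializing:", simulate_close(False))
```

When the worker has already done its first TLS access, the join returns immediately. When it is still initializing, the join cannot complete while the lock is held, which matches the hang seen here.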

Given this, I think the offending commit in openblas is bf40f806 [2],
which adds TLS variables to avoid locking. But many changes have been made
since then.

One related bug report is [3], which seems to indicate that lock
handling inside glibc is not easy.

There was an attempt to fix deadlocks between tls_get_addr and a
dlclose of a module whose finalizer joins with that thread [4].

So I see these possible solutions:
 * Add a Breaks between gimp and openblas
 * Disable TLS in the openblas build (if possible, but this would cause a
   performance loss for users that use openblas without gimp)
 * Patch glibc to not deadlock (but this does not seem easy to do at all)

Also, this deadlock might not be the only cause of issues encountered in
this bug report.

Reassigning to glibc with affects on openblas and gimp as this is caused
by a deadlock inside glibc.

[0] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-tls.c#L761
[1] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-close.c#L812

[2] https://github.com/xianyi/OpenBLAS/commit/bf40f806efa55c7a7c7ec57535919598eaeb569d#diff-31f8d4e8863583d95bf2f9529f83844e
[4] https://sourceware.org/ml/libc-alpha/2015-06/msg00062.html

-- 
Alexis Murzeau
PGP: B7E6 0EBB 9293 7B06 BDBC  2787 E7BD 1904 F480 937F
(gdb) thr a a bt

Thread 4 (Thread 0x7f727a990700 (LWP 26238)):
#0  0x7f7283ad711c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f7283ad06c6 in __GI___pthread_mutex_lock (mutex=0x7f7287775968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:113
#2  0x7f728775e5b7 in tls_get_addr_tail (ti=0x7f7278c2fc70, dtv=0x55edf85706b0, the_map=0x55edf8567980) at ../elf/dl-tls.c:761
#3  0x7f7287764288 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#4  0x7f7276d86400 in get_memory_table () at memory.c:1147
#5  0x7f7276d86400 in blas_memory_alloc (procpos=procpos@entry=2) at memory.c:1147
#6  0x7f7276d86bbb in blas_thread_server (arg=0x2) at blas_server.c:297
#7  0x7f7283acdf2a in start_thread (arg=0x7f727a990700) at pthread_create.c:463
#8  0x7f7283a00edf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f727b191700 (LWP 26237)):
#0  0x7f7283ad711c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f7283ad06c6 in __GI___pthread_mutex_lock (mutex=0x7f7287775968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:113
#2  0x7f728775e5b7 in tls_get_addr_tail (ti=0x7f7278c2fc70, dtv=0x55edf85704b0, the_map=0x55edf8567980) at ../elf/dl-tls.c:761
#3  0x7f7287764288 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#4  0x7f7276d86400 in get_memory_table () at memory.c:1147
#5  0x7f7276d86400 in blas_memory_alloc (procpos=procpos@entry=2) at memory.c:1147
#6  0x7f7276d86bbb in blas_thread_server (arg=0x1) at blas_server.c:297
#7  0x7f7283acdf2a in start_thread (arg=0x7f727b191700) at pthread_create.c:463
#8  0x7f7283a00edf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f727b992700 (LWP 26236)):
#0  0x7f7283ad711c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f7283ad06c6 in __GI___pthread_mutex_lock (mutex=0x7f7287775968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:113
#2  0x7f728775e5b7 in tls_get_addr_tail (ti=0x7f7278c2fc70, dtv=0x55edf8556c10, the_map=0x55edf8567980) at ../elf/dl-tls.c:761
#3  0x7f7287764288 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#4  0x7f7276d86400 in get_memory_table () at memory.c:1147
#5  0x7f7276d86400 in 

Bug#903514: gimp won't launch

2018-08-03 Thread James Van Zandt
Thanks, Benedict - the same solution worked for me.

Specifically:

   sudo apt-get install libopenblas-base- libopenblas-dev- \
 libblas3 liblapack3 libblas-dev liblapack-dev

Unfortunately julia and libjulia0.6 were also removed here, since they
depend on libopenblas-base.  I intend to report this as a bug, and request
that they depend instead on the virtual packages libblas.so.3 and
liblapack.so.3 (which can also be provided by liblapack3 and libblas3,
resp.).


Bug#903514: GIMP won't launch

2018-07-18 Thread James Van Zandt
I note that, according to the strace log, gimp successfully read in 138
plugins, but failed on the very first plug-in that was a Python script.
That can't be a coincidence.

 - Jim Van Zandt