[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-04-25 Thread Remy Noel


Remy Noel  added the comment:

closing the ticket for now.

--
resolution:  -> later
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue36469>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-04-11 Thread Remy Noel


Remy Noel  added the comment:

@eric.snow Unless you confirm there is no way to figure out which thread is/was 
holding the GIL from a debugging session on the running process, I'll get rid 
of it at the end of the week.

Should i close the ticket then or will you do it ?

Thanks.

--

___
Python tracker 
<https://bugs.python.org/issue36469>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-03-30 Thread Remy Noel


Remy Noel  added the comment:

Thanks for the advicesand thorough analysis. I'll try to force threads shutdown 
from the cleanup callback but i'd like to dig to the root of this isssue if 
possible.

This is what the thread 7 python backtrace looks like:

(gdb) py-bt
Traceback (most recent call first):
  
  File "/usr/lib/python3.5/threading.py", line 293, in wait
waiter.acquire()
  File "/usr/lib/python3.5/threading.py", line 549, in wait
signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.5/threading.py", line 849, in start
self._started.wait()
  File "...", line 44, in __init__
thr.start()

So we are basically spawning a thread and waiting for it to start (which will 
likely never happen). That seems like a "normal" behaviour for me (from a 
programming standpoint, that is), but this may be another cause of 
never-terminating threads. (unless this is also caused by the headlock and the 
thread is expected to spawn/release the lock even after finalizing.)

Also, i have access to the process that i kept running. Is there any way to me 
to figure out which thread is currently holding the GIL ? I just want to be 
sure i can't get this info myself before we close this ticket (at which point i 
will get rid of the culprit process).

--

___
Python tracker 
<https://bugs.python.org/issue36469>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-03-29 Thread Remy Noel


Remy Noel  added the comment:

Oh, also, i do not use any C extension (apart from the one i mentionned), so i 
do not acquire/release the GIL directly (a component of the standard library 
would do so). 

The demon threads, mainly spend their time listening to sockets and running 
short subprocesses (all in pure python).

--

___
Python tracker 
<https://bugs.python.org/issue36469>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-03-29 Thread Remy Noel


Remy Noel  added the comment:

Thank you a lot for this detailed answer.

Does the "causes of exit" may terminate the thread without releasing the GIL ?
Because as far as i can tell, none of the threads seems to own the GIL (i only 
checked _PyThreadState_Current though there might be a better way to find the 
GIL owner).
Therefore, the question is whether thread B is still alive after tB2. and, if 
so, whether we can find it. (Or whether we can understand why it left without 
releasing the GIL).

Is there any variable i may check to dig this further ?

--

___
Python tracker 
<https://bugs.python.org/issue36469>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-03-29 Thread Remy Noel


Remy Noel  added the comment:

std modules: atexit, json, os, signal, socket, ssl, subprocess, sys, time, 
threading, xmlrpc.client
The non-standard extension: psutil, PIL (only for image encoding though).

The amount of threads may vary, but i may have around 20 threads at all time. 
Most of them are indeed demon threads.

I have one atexit handler: i executes a few subprocess and pereform a few 
xmlrpc calls then exit. In this case, the handler go fully executed.

There are signal handlers, but none of them got called.

No monkeypatching is involved :)

I only browsed the patch up until the 3.5 head. (i guess il lacked to courage 
to go up to 3.7).

I tried to write a reproduction case, but i failed to run into the error. Of 
course, i will try to improve it if i get a clue about a way to increase the 
likelyness of the problem.
Sadly, all i have right now is a process i can attach to.

--
type: behavior -> 

___
Python tracker 
<https://bugs.python.org/issue36469>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-03-29 Thread Remy Noel


Remy Noel  added the comment:

The bug happens about once every two weeks on a script that is fired more than 
10K times a day.

Sadly, i can't update the whole production environment to try it on the latest. 
(and i was unable to trigger the bug by myself).

I was hoping we could find inconsistencies in the hanging process that could 
lead to clues about the origin of the error.

--

___
Python tracker 
<https://bugs.python.org/issue36469>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36469] Stuck during interpreter exit, attempting to take the GIL

2019-03-29 Thread Remy Noel


New submission from Remy Noel :

I have a script (sadly, I can't publish it) spawning multiple threads that, in 
rare occurences, does not manage to exit properly and get stuck forever.

More precisely, this seems to happen during Interpreter exit: The atexit 
callbacks are called sucessfully, and we then have multiple threads that are 
all atempting to get the GIL why None seems to owns it (_PyThreadState_Current 
is always '{_value = 0}' while gil_locked is '{_value = 1}').

The main thread stack looks like this:

#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at 
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x55d00997ce6a in PyCOND_TIMEDWAIT (cond=0x55d009e8ddc0 , 
mut=0x55d009e8dd80 , us=) at ../Python/condvar.h:103
#2  take_gil () at ../Python/ceval_gil.h:224
#3  0x55d00998580b in PyEval_EvalFrameEx () at ../Python/ceval.c:1273
#4  0x55d00996f16f in _PyEval_EvalCodeWithName.lto_priv.1929 (qualname=0x0, 
name=, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, 
kws=, argcount=, 
args=, locals=, globals=, 
_co=) at ../Python/ceval.c:4033
#5  PyEval_EvalCodeEx () at ../Python/ceval.c:4054
#6  0x55d0099b90e3 in function_call.lto_priv () at 
../Objects/funcobject.c:627
#7  0x55d009a02e17 in PyObject_Call () at ../Objects/abstract.c:2166
#8  0x55d00992034e in method_call.lto_priv () at 
../Objects/classobject.c:330
#9  0x55d009a02e17 in PyObject_Call () at ../Objects/abstract.c:2166
#10 0x55d00996df7d in PyEval_CallObjectWithKeywords () at 
../Python/ceval.c:4595
#11 0x55d009a5d05d in slot_tp_repr () at ../Objects/typeobject.c:5992
#12 0x55d0099c9685 in PyObject_Repr () at ../Objects/object.c:482
#13 0x55d0099aa6be in unicode_fromformat_arg (vargs=0x7ffc2ca81110, 
f=0x55d009a8a837 "R", writer=0x7ffc2ca810b0) at ../Objects/unicodeobject.c:2645
#14 PyUnicode_FromFormatV () at ../Objects/unicodeobject.c:2710
#15 0x55d009a572bc in PyErr_WarnFormat () at ../Python/_warnings.c:895
#16 0x55d0098840bb in sock_dealloc (s=0x7f43000fc528) at 
../Modules/socketmodule.c:4177
#17 0x55d0099d031d in subtype_dealloc.lto_priv () at 
../Objects/typeobject.c:1209
#18 0x55d0099b68f7 in frame_dealloc.lto_priv () at 
../Objects/frameobject.c:431
#19 0x55d0098ab7b1 in PyThreadState_Clear (tstate=0x55d00bee8a70) at 
../Python/pystate.c:386
#20 0x55d009a4d08a in PyInterpreterState_Clear () at ../Python/pystate.c:118
#21 0x55d009a4e1d2 in Py_Finalize () at ../Python/pylifecycle.c:633
#22 0x55d009a4e2a8 in Py_Exit (sts=sts@entry=0) at 
../Python/pylifecycle.c:1465
#23 0x55d009a4e38e in handle_system_exit () at ../Python/pythonrun.c:602
#24 0x55d009a4e3f6 in PyErr_PrintEx () at ../Python/pythonrun.c:612
#25 0x55d009a4f667 in PyErr_Print () at ../Python/pythonrun.c:508
#26 PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:401
#27 0x55d009a7c2e7 in run_file (p_cf=0x7ffc2ca814fc, 
filename=0x55d00bb01140 L"...", fp=0x55d00bb62e60) at ../Modules/main.c:318
#28 Py_Main () at ../Modules/main.c:768
#29 0x55d00990bd71 in main () at ../Programs/python.c:65
#30 0x7f430b7cd2e1 in __libc_start_main (main=0x55d00990bc90 , 
argc=11, argv=0x7ffc2ca81708, init=, fini=, 
rtld_fini=, stack_end=0x7ffc2ca816f8)
at ../csu/libc-start.c:291
#31 0x55d009a12a7a in _start ()

We can see it is trying to get the GIL while finalizing (as it is emitting a 
warning when destroying a socket). However, this prevents any other thread to 
get deleted since the first thread holds the head_lock. For instance we have 
thread 18 trying to get the head lock:

Thread 18 (Thread 0x7f4302ffd700 (LWP 21117)):
#0  0x7f430c6aa536 in futex_abstimed_wait_cancelable (private=0, 
abstime=0x0, expected=0, futex_word=0x55d00bb014c0) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55d00bb014c0, abstime=0x0) at 
sem_waitcommon.c:111
#2  0x7f430c6aa5e4 in __new_sem_wait_slow (sem=0x55d00bb014c0, abstime=0x0) 
at sem_waitcommon.c:181
#3  0x55d00994d2d5 in PyThread_acquire_lock_timed () at 
../Python/thread_pthread.h:352
#4  0x55d009a4dcec in tstate_delete_common () at ../Python/pystate.c:418
#5  0x55d009a4dd88 in PyThreadState_DeleteCurrent () at 
../Python/pystate.c:457
#6  0x55d009a482a4 in t_bootstrap () at ../Modules/_threadmodule.c:1027
#7  0x7f430c6a2494 in start_thread (arg=0x7f4302ffd700) at 
pthread_create.c:333
#8  0x7f430b895acf in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97


I attached the full stacktrace of the 18 threads.

I a not sure wether we either shouldn't try to lock the GIL while finalizing or 
if i somehow just happen to have run into a thread aqcuiring the GIL without 
releasing it.

python version is 3.5.3.

I kept the problematic process running and can extract any information you may 
want from it.

--
components: Interpreter Core, Library (Lib)
files: stacktraces
messages: 339103
nosy: mocramis
priority: nor