Antoine Pitrou <pit...@free.fr> added the comment:

It is actually quite an intricate problem.  What happens is that the child 
process's *main thread* ends, but its background sleeping thread (the `lambda: 
time.sleep(3600)`) does not.
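
The original threadforkmodel.py is not reproduced here, but the general shape 
of the situation looks roughly like the sketch below (illustrative only, not 
the actual script; the exact exit/signal handling that leaves the child's main 
thread defunct is specific to the original reproducer):
```
import os
import threading
import time

def run_fork():
    pid = os.fork()
    if pid == 0:
        # Child: start a non-daemon background thread that sleeps for an hour.
        threading.Thread(target=lambda: time.sleep(3600)).start()
        # The child's own code ends here, but the child process cannot go
        # away (and cannot be reaped by the parent) while the sleeper lives.
        return
    # Parent: blocks until the child has fully exited.
    os.waitpid(pid, 0)

if __name__ == "__main__":
    run_fork()
```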

To diagnose it, you can display the process tree:
```
$ ps fu
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
antoine  12634  0.0  0.0  28308  9208 pts/0    Ss   15:21   0:00 bash
antoine   2520  0.0  0.0 179072 10684 pts/0    Sl+  15:29   0:00  \_ ./python threadforkmodel.py
antoine   2522  0.0  0.0      0     0 pts/0    Zl+  15:29   0:00      \_ [python] <defunct>
```

Then you can display all threads for the child process (here with pid 2522):
```
$ ps -T -p 2522
  PID  SPID TTY          TIME CMD
 2522  2522 pts/0    00:00:00 python <defunct>
 2522  2525 pts/0    00:00:00 python
```

The main thread is marked zombie ("defunct") but thread 2525 is still 
running... What is it doing?  Let's attach gdb:
```
$ gdb ./python --pid 2525
```

And display the call stack:
```
(gdb) bt
#0  0x00007f1fb3ca503f in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
    exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7f1fb23553c0) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x000055e6fc4fcf7e in pysleep (secs=<optimized out>) at ./Modules/timemodule.c:1864
#2  0x000055e6fc4fd022 in time_sleep (self=self@entry=<module at remote 0x7f1fb4a03398>, obj=<optimized out>)
    at ./Modules/timemodule.c:366
#3  0x000055e6fc3a02e7 in _PyMethodDef_RawFastCallKeywords (method=0x55e6fc887ee0 <time_methods+288>,
    self=<module at remote 0x7f1fb4a03398>, args=args@entry=0x7f1fb336a8f8, nargs=nargs@entry=1, kwnames=0x0) at Objects/call.c:646
#4  0x000055e6fc3a04c7 in _PyCFunction_FastCallKeywords (
    func=func@entry=<built-in method sleep of module object at remote 0x7f1fb4a03398>, args=args@entry=0x7f1fb336a8f8,
    nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Objects/call.c:732
#5  0x000055e6fc4506e9 in call_function (pp_stack=pp_stack@entry=0x7f1fb2355570, oparg=oparg@entry=1, kwnames=kwnames@entry=0x0)
    at Python/ceval.c:4607
#6  0x000055e6fc45c678 in _PyEval_EvalFrameDefault (f=Frame 0x7f1fb336a770, for file threadforkmodel.py, line 36, in <lambda> (),
    throwflag=<optimized out>) at Python/ceval.c:3195
#7  0x000055e6fc451110 in PyEval_EvalFrameEx (f=f@entry=Frame 0x7f1fb336a770, for file threadforkmodel.py, line 36, in <lambda> (),
    throwflag=throwflag@entry=0) at Python/ceval.c:581
#8  0x000055e6fc451d21 in _PyEval_EvalCodeWithName (_co=_co@entry=<code at remote 0x7f1fb4989700>,
    globals=globals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='threadforkmodel.py') at remote 0x7f1fb49d4710>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7f1fb4adf8c0>, '__file__': 'threadforkmodel.py', '__cached__': None, 'threading': <module at remote 0x7f1fb36ca668>, 'time': <module at remote 0x7f1fb4a03398>, 'os': <module at remote 0x7f1fb49e5050>, 'atexit': <module at remote 0x7f1fb36d3aa0>, 'signal': <module at remote 0x7f1fb36cc500>, 'run': <function at remote 0x7f1fb4a93e10>, 'start': <function at remote 0x7f1fb33699f0>, 'join': <function at remote 0x7f1fb3369aa0>, 'runFork': <function at remote 0x7f1fb3369b50>, 'handleExit': <function at remote 0x7f1fb3369c00>, 'handleChildExit': <function at remote 0x7f1fb3369cb0>, 'main': <function at remote 0x7f1fb3369d60>}, locals=locals@entry=0x0,
    args=args@entry=0x7f1fb4aec078, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2,
    defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name='<lambda>', qualname='runFork.<locals>.<lambda>') at Python/ceval.c:3969

[...]
```

So basically the sleep() call wasn't interrupted by the main thread's death, 
even though we might have expected it to be.  This is indeed a weird 
interaction between threads and processes.  The only reference I could find is 
a single comment on a StackOverflow question:
"""
Be aware that infinite waits on semaphores, handles etc can cause your process 
to become a zombie in both Windows and Linux.
"""

The reason I'm posting this detailed explanation is that I hit the exact same 
issue when trying to debug the PEP 556 implementation, and it took me quite 
some time (and Pablo's help) to finally understand and work around it.


In the end, I would recommend not using fork() directly, but instead using 
multiprocessing with the "forkserver" start method, which eliminates such 
problems:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
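
For example (a minimal sketch; the worker function here is just a placeholder):
```
import multiprocessing as mp

def work(x):
    return x * x

if __name__ == "__main__":
    # Children started by the forkserver process do not inherit the
    # threads (or most other state) of the main program.
    ctx = mp.get_context("forkserver")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(work, range(4)))
```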

----------
nosy: +pitrou

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35902>
_______________________________________