[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-12-09 Thread STINNER Victor

STINNER Victor  added the comment:

(Oops, closing the issue was the intent of my previous comment, but I forgot
to do it. Thanks, Berker.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-12-08 Thread Berker Peksag

Berker Peksag  added the comment:

This has been fixed in all active branches (2.7, 3.6 and master) so I think we 
can close it as 'fixed'. Thanks, Nir!

--
nosy: +berker.peksag
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
type:  -> behavior




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-12-07 Thread STINNER Victor

STINNER Victor  added the comment:

The bug has been fixed in Python 2.7, 3.6 and the master branch.

Thank you Nir Soffer for the bug report and the fix!

--
versions:  -Python 3.5, Python 3.8




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-12-07 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 830daae1c82ed33deef0086b7b6323e5be0b0cc8 by Victor Stinner (Nir 
Soffer) in branch '2.7':
[2.7] bpo-32186: Release the GIL during fstat and lseek calls (#4651)
https://github.com/python/cpython/commit/830daae1c82ed33deef0086b7b6323e5be0b0cc8


--




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 8bcd41040a5f1f9b48a86d0e21f196e4b1f90e4b by Victor Stinner (Miss 
Islington (bot)) in branch '3.6':
bpo-32186: Release the GIL during lseek and fstat (GH-4652) (#4661)
https://github.com/python/cpython/commit/8bcd41040a5f1f9b48a86d0e21f196e4b1f90e4b


--




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread Roundup Robot

Change by Roundup Robot :


--
pull_requests: +4571




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 6a89481680b921e7b317c29877bdda9a6031e5ad by Victor Stinner (Nir 
Soffer) in branch 'master':
bpo-32186: Release the GIL during lseek and fstat (#4652)
https://github.com/python/cpython/commit/6a89481680b921e7b317c29877bdda9a6031e5ad


--




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread STINNER Victor

STINNER Victor  added the comment:

We already release the GIL when calling lseek() in fileio.c, in the 
portable_lseek() function. So it makes sense to also do it in 
_io_FileIO_readall_impl() in the same file. os.lseek() also releases the GIL. 

I found other functions which call lseek() without releasing the GIL:

* the Windows implementation of new_mmap_object()
* _Py_DisplaySourceLine()
* fp_setreadl() of Parser/tokenizer.c

I'm not sure that these 3 functions should be modified. In case of doubt, I 
prefer to not touch the code.

--




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread Nir Soffer

Change by Nir Soffer :


--
keywords: +patch
pull_requests: +4563
stage:  -> patch review




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread Nir Soffer

Change by Nir Soffer :


--
pull_requests: +4564




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread Nir Soffer

Nir Soffer  added the comment:

Forgot to mention: this is reproducible with Python 2.7.

Similar issues exist in Python 3, but I did not try to reproduce them since we
are using Python 2.7.

I posted patches for both 2.7 and master:
- https://github.com/python/cpython/pull/4651
- https://github.com/python/cpython/pull/4652

--
nosy: +benjamin.peterson, stutzbach, vstinner




[issue32186] io.FileIO hang all threads if fstat blocks on inaccessible NFS server

2017-11-30 Thread Nir Soffer

New submission from Nir Soffer :

Using io.FileIO can hang all threads when accessing an inaccessible NFS
server.

To reproduce this, you need to open the file like this:

fd = os.open(filename, ...)
fio = io.FileIO(fd, "r+", closefd=True)

Inside fileio_init, there is a checkfd call, calling fstat without releasing
the GIL. This will hang all threads.
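
The open pattern above can be tried on a local file with a minimal sketch
like the following. The helper name open_fileio_from_fd and the os.open
flags and mode are assumptions for illustration only; the report elides the
actual flags. On a local filesystem this returns immediately. The hang needs
a descriptor on an unreachable NFS mount.

```python
import io
import os


def open_fileio_from_fd(path):
    """Open path with os.open(), then wrap the descriptor in io.FileIO."""
    # Assumed flags and mode: the original report does not show them.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Per the report, the FileIO constructor calls fstat() on the
        # descriptor; this is the call that blocked on the dead NFS mount.
        return io.FileIO(fd, "r+", closefd=True)
    except Exception:
        # FileIO did not take ownership of fd, so close it ourselves.
        os.close(fd)
        raise
```

Because closefd=True, closing the returned FileIO object also closes the
underlying descriptor.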

The expected behavior is to block only the thread that is blocked on the
system call, so the system stays responsive and can serve other tasks.

Here is the log showing this issue, created with the attached reproducer
script (fileio_nfs_test.py).

# python fileio_nfs_test.py mnt/fileio.out dumbo.tlv.redhat.com
2017-11-30 18:41:49,159 - (MainThread) - pid=3436
2017-11-30 18:41:49,159 - (MainThread) - Opening mnt/fileio.out
2017-11-30 18:41:49,160 - (MainThread) - OK fd=3
2017-11-30 18:41:49,161 - (MainThread) - Starting canary thread
2017-11-30 18:41:49,161 - (Canary) - Blocking access to storage
2017-11-30 18:41:49,169 - (Canary) - If this test is hang, please run: iptables 
-D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP
2017-11-30 18:41:49,169 - (Canary) - check 0
2017-11-30 18:41:49,169 - (MainThread) - Waiting until storage is blocked...
2017-11-30 18:41:50,170 - (Canary) - check 1
2017-11-30 18:41:51,170 - (Canary) - check 2
2017-11-30 18:41:52,171 - (Canary) - check 3
2017-11-30 18:41:53,171 - (Canary) - check 4
2017-11-30 18:41:54,172 - (Canary) - check 5
2017-11-30 18:41:55,172 - (Canary) - check 6
2017-11-30 18:41:56,172 - (Canary) - check 7
2017-11-30 18:41:57,173 - (Canary) - check 8
2017-11-30 18:41:58,173 - (Canary) - check 9
2017-11-30 18:41:59,174 - (MainThread) - Opening io.FileIO

Everything is hung now!

After some time I ran this from another shell:
 iptables -D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP

And now the script is unblocked and finishes.

2017-11-30 18:45:29,683 - (MainThread) - OK
2017-11-30 18:45:29,684 - (MainThread) - Creating mmap
2017-11-30 18:45:29,684 - (Canary) - check 10
2017-11-30 18:45:29,684 - (MainThread) - OK
2017-11-30 18:45:29,685 - (MainThread) - Filling mmap
2017-11-30 18:45:29,685 - (MainThread) - OK
2017-11-30 18:45:29,685 - (MainThread) - Writing mmap to storage
2017-11-30 18:45:29,719 - (MainThread) - OK
2017-11-30 18:45:29,719 - (MainThread) - Syncing
2017-11-30 18:45:29,719 - (MainThread) - OK
2017-11-30 18:45:29,720 - (MainThread) - Done

We have a canary thread logging every second. Once we tried to open
the FileIO, the canary thread stopped - this is possible only if the io
extension module was holding the GIL during a blocking call.
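
The canary technique can be sketched roughly as below. This is not the
attached fileio_nfs_test.py. The function name longest_canary_pause and the
tick interval are hypothetical, and the sketch assumes Python 3 for
time.monotonic().

```python
import threading
import time


def longest_canary_pause(blocking_call, interval=0.05):
    """Run blocking_call() in this thread while a canary thread ticks.

    Returns the longest gap, in seconds, between consecutive canary ticks,
    including the gaps to the start and the end of the call.
    """
    stop = threading.Event()
    ticks = []

    def canary():
        # Record a timestamp every `interval` seconds until asked to stop.
        while not stop.is_set():
            ticks.append(time.monotonic())
            stop.wait(interval)

    start = time.monotonic()
    thread = threading.Thread(target=canary, name="Canary")
    thread.start()
    try:
        blocking_call()
    finally:
        end = time.monotonic()
        stop.set()
        thread.join()

    points = [start] + ticks + [end]
    return max(b - a for a, b in zip(points, points[1:]))
```

A call that releases the GIL while it blocks, such as time.sleep(), should
leave the longest pause close to the tick interval. A C-level call that
blocks while holding the GIL would show a pause close to the full duration
of the call, which is what the log above shows between check 9 and check 10.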

And here is the backtrace of the hung process in the kernel:

# cat /proc/3436/stack
[] rpc_wait_bit_killable+0x24/0xb0 [sunrpc]
[] __rpc_execute+0x154/0x410 [sunrpc]
[] rpc_execute+0x68/0xb0 [sunrpc]
[] rpc_run_task+0xf6/0x150 [sunrpc]
[] nfs4_call_sync_sequence+0x63/0xa0 [nfsv4]
[] _nfs4_proc_getattr+0xcc/0xf0 [nfsv4]
[] nfs4_proc_getattr+0x72/0xf0 [nfsv4]
[] __nfs_revalidate_inode+0xbf/0x310 [nfs]
[] nfs_getattr+0x95/0x250 [nfs]
[] vfs_getattr+0x46/0x80
[] vfs_fstat+0x45/0x80
[] SYSC_newfstat+0x24/0x60
[] SyS_newfstat+0xe/0x10
[] system_call_fastpath+0x16/0x1b
[] 0x

You cannot attach to the process with gdb, since it is in D state, but once
the process is unblocked, gdb takes control, and we see:

Thread 2 (Thread 0x7f97a2ea5700 (LWP 4799)):
#0  0x7f97ab925a0b in do_futex_wait.constprop.1 () from 
/lib64/libpthread.so.0
#1  0x7f97ab925a9f in __new_sem_wait_slow.constprop.0 () from 
/lib64/libpthread.so.0
#2  0x7f97ab925b3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x7f97abc455f5 in PyThread_acquire_lock () from 
/lib64/libpython2.7.so.1.0
#4  0x7f97abc11156 in PyEval_RestoreThread () from 
/lib64/libpython2.7.so.1.0
#5  0x7f97a44f9086 in time_sleep () from 
/usr/lib64/python2.7/lib-dynload/timemodule.so
#6  0x7f97abc18bb0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7  0x7f97abc1aefd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8  0x7f97abc183fc in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9  0x7f97abc1aefd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#10 0x7f97abc183fc in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x7f97abc1aefd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x7f97abba494d in function_call () from /lib64/libpython2.7.so.1.0
#13 0x7f97abb7f9a3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#14 0x7f97abc135bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#15 0x7f97abc1857d in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#16 0x7f97abc1857d in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#17 0x7f97abc1aefd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#18 0x7f97abba4858 in function_call () from /lib64/libpython2.7.so.1.0
#19 0x7f97abb7f9a3 in PyObject_Call () from