Nir Soffer <nir...@gmail.com> added the comment:

Antoine, thanks for fixing this on master! But I don't think this issue can be closed yet.
First, the issue is not about performance but about reliability. I probably made a bad choice when I marked this as performance. When you call mmap.mmap() in one thread, the entire process hangs for an hour because the file descriptor is on a non-responsive NFS server. With the fix, only the thread accessing the file descriptor is affected. The rest of the system can function normally.

Second, the issue affects python 2.7, which is the production version on many servers, and will be for many years, e.g. on RHEL/CentOS 7. I think it is important to fix this issue for these users.

Here are examples of the issue, using the reproducer scripts I uploaded to the bug.

When mmap.mmap() blocks, the entire process hangs. I unblocked the process from another shell by removing the iptables rule.

# python bpo-33021/mmap_nfs_test.py mnt dumbo.tlv.redhat.com
2018-03-17 01:17:57,846 - (MainThread) - Starting canary thread
2018-03-17 01:17:57,846 - (Canary) - Blocking access to storage
2018-03-17 01:17:57,857 - (Canary) - If this test is hang, please run: iptables -D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP
2018-03-17 01:17:57,857 - (Canary) - check 0
2018-03-17 01:17:58,858 - (Canary) - check 1
2018-03-17 01:17:59,858 - (Canary) - check 2
2018-03-17 01:18:00,859 - (Canary) - check 3
2018-03-17 01:18:01,859 - (Canary) - check 4
2018-03-17 01:18:02,859 - (Canary) - check 5
2018-03-17 01:18:03,860 - (Canary) - check 6
2018-03-17 01:18:04,860 - (Canary) - check 7
2018-03-17 01:18:05,861 - (Canary) - check 8
2018-03-17 01:18:06,861 - (Canary) - check 9
2018-03-17 01:18:07,862 - (Canary) - check 10
2018-03-17 01:18:07,868 - (MainThread) - Calling mmap.mmap
(I removed the iptables rule here)
2018-03-17 01:18:57,683 - (MainThread) - OK
2018-03-17 01:18:57,683 - (MainThread) - Done
2018-03-17 01:18:57,683 - (Canary) - check 11

When mmapobject.size() was called, the entire process hung. I unblocked the process from another shell by removing the iptables rule.
# python bpo-33021/mmap_size_nfs_test.py mnt dumbo.tlv.redhat.com
2018-03-17 01:22:17,991 - (MainThread) - Starting canary thread
2018-03-17 01:22:17,992 - (Canary) - Blocking access to storage
2018-03-17 01:22:18,001 - (Canary) - If this test is hang, please run: iptables -D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP
2018-03-17 01:22:18,001 - (Canary) - check 0
2018-03-17 01:22:19,002 - (Canary) - check 1
2018-03-17 01:22:20,002 - (Canary) - check 2
2018-03-17 01:22:21,002 - (Canary) - check 3
2018-03-17 01:22:22,003 - (Canary) - check 4
2018-03-17 01:22:23,003 - (Canary) - check 5
2018-03-17 01:22:24,004 - (Canary) - check 6
2018-03-17 01:22:25,004 - (Canary) - check 7
2018-03-17 01:22:26,004 - (Canary) - check 8
2018-03-17 01:22:27,005 - (Canary) - check 9
2018-03-17 01:22:28,005 - (MainThread) - Calling mmapobject.size
(I removed the iptables rule here)
2018-03-17 01:23:38,701 - (MainThread) - OK
2018-03-17 01:23:38,701 - (MainThread) - Done
2018-03-17 01:23:38,701 - (Canary) - check 10

I found that the os.fdopen issue does not affect RHEL/CentOS 7, because they use python 2.7.5, and the issue was introduced in python 2.7.7, in:

commit 5c863bf93809cefeb4469512eadac291b7046051
Author: Benjamin Peterson <benja...@python.org>
Date:   Mon Apr 14 19:45:46 2014 -0400

    when an exception is raised in fdopen, never close the fd (changing on my mind on #21191)

This issue affects Fedora (python 2.7.14) and probably other distros using the latest python 2.7.
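For readers who do not have the attached scripts at hand, the canary check visible in these logs can be sketched roughly as below. This is only an illustration of the idea, not the code of the uploaded reproducers: the names Canary, max_gap and run_with_canary are made up for this sketch, and the NFS mount and iptables setup are left out entirely. A background thread records a timestamp at a fixed interval while the main thread runs one possibly blocking call; the longest pause between two consecutive timestamps shows whether the whole process stalled.

```python
import threading
import time


class Canary(threading.Thread):
    """Hypothetical helper: records a timestamp every `interval` seconds."""

    def __init__(self, interval=1.0):
        super(Canary, self).__init__(name="Canary")
        self.daemon = True
        self.interval = interval
        self.ticks = []
        self._stop_event = threading.Event()

    def run(self):
        # Each iteration corresponds to one "check N" line in the logs.
        while not self._stop_event.is_set():
            self.ticks.append(time.time())
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
        self.join()


def max_gap(ticks):
    """Return the longest pause between consecutive ticks, in seconds."""
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return max(gaps) if gaps else 0.0


def run_with_canary(blocking_call, interval=1.0):
    """Run blocking_call in this thread while a canary thread keeps ticking.

    Returns the longest pause observed between two canary ticks.
    """
    canary = Canary(interval)
    canary.start()
    try:
        blocking_call()
    finally:
        canary.stop()
    return max_gap(canary.ticks)


if __name__ == "__main__":
    # time.sleep() stands in for the call under test. It releases the GIL,
    # so the canary is expected to keep ticking close to its interval.
    gap = run_with_canary(lambda: time.sleep(3), interval=1.0)
    print("longest canary pause: %.3f seconds" % gap)
```

If the call under test blocks in the kernel without letting other threads run, the longest pause should be about as long as the stall, which is what the jump from the last check before the call to the next one shows in the runs above. If only the calling thread is blocked, the pause should stay close to the interval.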
Here is an example run showing how this affects Fedora 27:

# python fdopen_nfs_test.py mnt dumbo.tlv.redhat.com
2018-03-17 01:43:52,718 - (MainThread) - Starting canary thread
2018-03-17 01:43:52,718 - (Canary) - Blocking access to storage
2018-03-17 01:43:52,823 - (Canary) - If this test is hang, please run: iptables -D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP
2018-03-17 01:43:52,824 - (Canary) - check 0
2018-03-17 01:43:53,824 - (Canary) - check 1
2018-03-17 01:43:54,824 - (Canary) - check 2
2018-03-17 01:43:55,825 - (Canary) - check 3
2018-03-17 01:43:56,825 - (Canary) - check 4
2018-03-17 01:43:57,825 - (Canary) - check 5
2018-03-17 01:43:58,826 - (Canary) - check 6
2018-03-17 01:43:59,826 - (Canary) - check 7
2018-03-17 01:44:00,826 - (Canary) - check 8
2018-03-17 01:44:01,827 - (Canary) - check 9
2018-03-17 01:44:02,827 - (Canary) - check 10
2018-03-17 01:44:02,834 - (MainThread) - Calling os.fdopen
(I removed the iptables rule and force-unmounted here)
2018-03-17 01:50:25,853 - (MainThread) - OK
2018-03-17 01:50:25,854 - (Canary) - check 11
2018-03-17 01:50:25,895 - (MainThread) - Done
Traceback (most recent call last):
  File "fdopen_nfs_test.py", line 75, in <module>
    os.unlink(filename)
OSError: [Errno 2] No such file or directory: 'mnt/test'

So, I think we should:

- backport to 3.7, 3.6
- reconsider backport to 2.7, at least for mmap and os.fdopen.

I can prepare the backports and split the 2.7 patch if this helps.

----------
Added file: https://bugs.python.org/file47490/mmap_nfs_test.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33021>
_______________________________________