On Fri, 2 Dec 2016 11:26 am, DFS wrote:
> On 12/01/2016 06:48 PM, Ned Batchelder wrote:
>> On Thursday, December 1, 2016 at 2:31:11 PM UTC-5, DFS wrote:
>>> After a simple test below, I submit that the above scenario would never
>>> occur. Ever. The time gap between checking for the file's existence
>>> and then trying to open it is far too short for another process to sneak
>>> in and delete the file.
>> It doesn't matter how quickly the first operation is (usually) followed
>> by the second. Your process could be swapped out between the two
>> operations. On a heavily loaded machine, there could be a very long
>> time between them, even if on an average machine, they are executed very quickly.
> How is it possible that the 'if' portion runs, then 44/100,000ths of a
> second later my process yields to another process which deletes the
> file, then my process continues.
> Is that governed by the dreaded GIL?
No, that has nothing to do with the GIL. It is because the operating
system is a preemptive multitasking operating system, as all modern OSes
are: Linux, OS X, Windows.
Each program that runs, including the OS itself, is one or more processes.
Typically, even on a single-user desktop machine, you will have dozens of
processes running simultaneously.
Every so-many clock ticks, the OS pauses whatever process is running,
more-or-less interrupting whatever it was doing, passes control on to
another process, then the next, then the next, and so on. The application
doesn't have any control over this: it can be paused at any time,
normally for just a small fraction of a second, but potentially for
seconds or minutes at a time if the system is heavily loaded.
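To see why the size of the gap doesn't matter, here is a sketch (Python 3) that forces the unlucky interleaving every time. Two threading.Event objects stand in for the scheduler pausing the checker at the worst possible moment, and a thread stands in for the other process. The function names are made up for illustration; in real life nobody arranges the pause, the OS just picks it.

```python
import os
import tempfile
import threading


def check_then_read(path, paused, resume):
    """Look Before You Leap: check for the file, then open it."""
    if os.path.isfile(path):       # the check succeeds...
        paused.set()               # ...then this thread is "preempted"
        resume.wait()              # and only resumes after the deleter ran
        try:
            with open(path) as f:
                return f.read()
        except FileNotFoundError:  # the earlier check is already stale
            return "file vanished between check and open"
    return "file did not exist"


def deleter(path, paused, resume):
    paused.wait()                  # wait until the check has happened
    os.remove(path)                # sneak in and delete the file
    resume.set()                   # let the checker carry on


def demo():
    fd, path = tempfile.mkstemp()  # create a real, existing file
    os.close(fd)
    paused, resume = threading.Event(), threading.Event()
    t = threading.Thread(target=deleter, args=(path, paused, resume))
    t.start()
    result = check_then_read(path, paused, resume)
    t.join()
    return result


if __name__ == "__main__":
    print(demo())
```

The check was true when it ran; it simply stopped being true before the open.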
> "The mechanism used by the CPython interpreter to assure that only one
> thread executes Python bytecode at a time."
> But I see you posted a stack-overflow answer:
> "In the case of CPython's GIL, the granularity is a bytecode
> instruction, so execution can switch between threads at any bytecode."
> Does that mean "chars=f.read().lower()" could get interrupted between
> the read() and the lower()?
Yes, but don't think about Python threads. Think about the OS.
I'm not an expert on the low-level hardware details, so I welcome
correction, but I think that you can probably expect that the OS can
interrupt code execution between any two CPU instructions. Something like
str.lower() is likely to be thousands of CPU instructions, even for a small string.
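You can see the bytecode boundaries with the standard dis module. A sketch (Python 3; opcode names differ between CPython versions, so treat the printed listing as illustrative): compiling the statement from the question shows separate instructions for looking up read, calling it, looking up lower and calling that. A thread switch can happen between any of them, and the OS can pause the whole process even in the middle of one.

```python
import dis

# Compile (but don't run) the statement from the question. `f` is only
# a name here, so nothing is opened or read.
code = compile("chars = f.read().lower()", "<example>", "exec")
instructions = list(dis.get_instructions(code))

for ins in instructions:
    print(f"{ins.opname:<14} {ins.argrepr}")
```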
> With a 5ms window, it seems the following code would always protect the
> file from being deleted between lines 4 and 5.
> 1  import os,threading
> 2  f_lock=threading.Lock()
> 3  with f_lock:
> 4      if os.path.isfile(filename):
> 5          with open(filename,'w') as f:
> 6              process(f)
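Absolutely not. A threading.Lock only coordinates threads within a single
process; no other process even knows it exists. And at least on Linux,
even real file locks are advisory, not mandatory. Here is a pair of
scripts that demonstrates the problem. First, the well-behaved script
that takes out a lock: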
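# --- locker.py ---
import os, threading, time

filename = 'thefile.txt'
f_lock = threading.Lock()

with f_lock:
    print '\ntaking lock'
    if os.path.isfile(filename):
        print filename, 'exists and is a file'
        time.sleep(5)  # give bandit.py time to run (duration is illustrative)
        print 'lock still active'
        with open(filename,'w') as f:
            print f.read()  # 'w' mode is write-only, hence the IOError below
# --- end ---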
Now, a second script which naively, or maliciously, just deletes the file:
# --- bandit.py ---
import os, time
filename = 'thefile.txt'
time.sleep(2)  # let locker.py take its lock first (duration is illustrative)
print 'deleting file, mwahahahaha!!!'
os.remove(filename)
# --- end ---
Now, I run them both simultaneously:
[steve@ando thread-lock]$ touch thefile.txt # ensure file exists
[steve@ando thread-lock]$ (python locker.py &) ; (python bandit.py &)

taking lock
thefile.txt exists and is a file
deleting file, mwahahahaha!!!
lock still active
Traceback (most recent call last):
  File "locker.py", line 14, in <module>
    print f.read()
IOError: File not open for reading
This is on Linux. It's possible that Windows behaves differently, and I don't
know how to run a command in the background in command.com or cmd.exe or
whatever you use on Windows.
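Even a genuine operating-system file lock doesn't stop deletion on Linux. A sketch, assuming a Unix-like system where Python's fcntl module is available: it takes an exclusive flock() on a temporary file, then has a separate Python process (which never asks for the lock) delete the file anyway. The function name is made up for illustration.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def delete_locked_file():
    """Return True if the file survived deletion while we held a flock()."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        # Exclusive, non-blocking lock: nobody else holds it, so it succeeds.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        # A different process that ignores flock() removes the file.
        subprocess.run(
            [sys.executable, "-c", "import os, sys; os.remove(sys.argv[1])", path],
            check=True,
        )
        survived = os.path.exists(path)
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    return survived


if __name__ == "__main__":
    print("file survived:", delete_locked_file())
```

The lock only matters to other processes that also call flock() on the same file; unlink() never consults it.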
> Also, this is just theoretical (I hope). It would be terrible system
> design if all those dozens of processes were reading and writing and
> deleting the same file.
It is not theoretical. And it's not a terrible system design, in the sense
that the alternatives are *worse*.
- Turn the clock back to the 1970s and 80s with single-processing
  operating systems? Unacceptable -- even primitive OSes like DOS
  and Mac System 5 needed to include some basic multiprocessing.
  And what are servers supposed to do in this single-process world?
- Enforce mandatory locks? A great way for malware or hostile users
  to perform Denial Of Service attacks.
  Even locks being left around accidentally can be a real pain: Windows
  users can probably tell you about times that a file has been
  accidentally left open by buggy applications, and there's nothing you
  can do to unlock it short of rebooting. Unacceptable for a server, and
  a pain in the rear even for desktop users.
- Make every file access go through a single scheduling application
  which ensures there are no clashes? Probably very hard to write,
  and would probably kill performance. Imagine you cannot even check
  the existence of a 4GB file until it's finished copying onto a USB
  stick.
The cost of allowing two programs to run at the same time is that
sometimes they will both want to do something to the same file.
Fundamentally though, the solution here is quite simple: don't rely on
"Look Before You Leap" checks any time you have shared data, and the
file system is shared data. If you want *reliable* code, you MUST use a
try...except block to recover from file system errors.
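Applied to DFS's snippet, that advice might look like this sketch (Python 3; rewrite_if_exists and process are hypothetical names). In the original, the isfile() check exists because mode 'w' would happily create a missing file. Opening with 'r+' instead never creates the file, so the open itself becomes the existence check and there is no gap between looking and leaping.

```python
import os
import tempfile


def rewrite_if_exists(filename, process):
    """Overwrite `filename` via `process(f)` only if it already exists."""
    try:
        # 'r+' opens for reading and writing but, unlike 'w', raises
        # FileNotFoundError instead of creating a missing file.
        f = open(filename, "r+")
    except FileNotFoundError:
        return False           # the file wasn't there; nothing to do
    with f:
        f.truncate(0)          # emulate 'w': discard the old contents
        process(f)
    return True


def demo():
    """Try the function on a missing file, then on an existing one."""
    path = os.path.join(tempfile.mkdtemp(), "thefile.txt")
    missing = rewrite_if_exists(path, lambda f: f.write("new"))
    created = os.path.exists(path)
    with open(path, "w") as f:
        f.write("old contents")
    present = rewrite_if_exists(path, lambda f: f.write("new"))
    with open(path) as f:
        contents = f.read()
    return missing, created, present, contents


if __name__ == "__main__":
    print(demo())
```

Catching only FileNotFoundError is a deliberate choice here; if permission problems or a directory in the way should also be handled quietly, catching OSError covers them. On Linux, once the open has succeeded, another process deleting the file no longer affects the handle you already hold.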
“Cheer up,” they said, “things could be worse.” So I cheered up,
and sure enough, things got worse.