On Oct 22, 2:54 pm, Mike Kent <[EMAIL PROTECTED]> wrote:
> Before I file a bug report against Python 2.5.2, I want to run this by
> the newsgroup to make sure I'm not being stupid.
>
> I have a text file of fixed-length records I want to read in random
> order. That file is being changed in real-time by another process,
> and my process wants to see the changes to the file. What I'm seeing
> is that, once I've opened the file and read a record, all subsequent
> seeks to and reads of that same record will return the same data as
> the first read of the record, so long as I don't close and reopen the
> file. This indicates some sort of buffering and caching is going on.
>
> Consider the following:
>
> $ echo "hi" >foo.txt # Create my test file
> $ python2.5 # Run Python
> Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
> [GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> f = open('foo.txt') # Open my test file
> >>> f.seek(0) # Seek to the beginning of the file
> >>> f.readline() # Read the line, I get the data I expected
> 'hi\n'
> >>> # At this point, in another shell I execute 'echo "bye" >foo.txt'.
> >>> # 'foo.txt' now has been changed

I thought this might be a case where the shell unlinks foo.txt and
creates a new file... but it doesn't for me, and I still get the same
behavior as you. It is indeed the buffering that's causing this.

> >>> # on the disk, and now contains 'bye\n'.
> >>> f.seek(0) # Seek to the beginning of the still-open file
> >>> f.readline() # Read the line, I don't get 'bye\n', I get the
> >>> # original data, which is no longer there.
> 'hi\n'
> >>> f.close() # Now I close the file...
> >>> f = open('foo.txt') # ... and reopen it
> >>> f.seek(0) # Seek to the beginning of the file
> >>> f.readline() # Read the line, I get the expected 'bye\n'
> 'bye\n'
>
> It seems pretty clear to me that this is wrong. If there is any
> caching going on, it should clearly be discarded if I do a seek.

I totally disagree. If you need to discard the buffers, there's a way
to do it: flush(). If you force seek() to discard perfectly good
buffers, you will hurt performance when not dealing with volatile data.

Anyway, in Python 2.x, the behavior of the various file methods is
documented as reflecting the underlying C stdio library. In fact, the
documentation for seek() specifically says it sets the file's current
position "like stdio's fseek()". Whatever stdio does is what Python
does. So even if this behavior were a bug, it would be a bug in stdio,
not in Python.

> Note
> that it's not just readline() that's returning me the wrong, cached
> data, as I've also tried this with read(), and I get the same
> results. It's not acceptable that I have to close and reopen the file
> before every read when I'm doing random record access.

You can call f.flush() to force it to discard the cache. Or use
unbuffered I/O. Better yet, get rid of file I/O altogether and use a
memory-mapped file.

> So, is this a bug, or am I being stupid?

Well, it's not a bug, so....

Seriously, I advise you not to submit a bug report. That doesn't mean
you're stupid; maybe you didn't know about unbuffered I/O or the
flush() method.
That just means you're uneducated. :) But please leave seek() out of it.


Carl Banks
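
P.S. In case it helps, here is a rough sketch of the unbuffered approach for random access to fixed-length records. It is only an illustration under assumptions: RECORD_SIZE, open_unbuffered(), read_record() and demo() are names I made up, the record length of 4 bytes is arbitrary, and the in-place overwrite inside demo() merely stands in for your other process. It is written for a current Python; in 2.x, passing 0 as the third argument to open() likewise asks for an unbuffered file.

```python
import os
import tempfile

RECORD_SIZE = 4  # hypothetical fixed record length in bytes, newline included


def open_unbuffered(path):
    # A third argument of 0 turns buffering off, so reads go to the OS
    # each time. Python 3 only allows this in binary mode.
    return open(path, "rb", 0)


def read_record(f, index):
    # Seek to the start of the record, then read exactly one record.
    # With buffering off there is no stale copy to be handed back.
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE)


def demo():
    # Build a small two-record test file.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as w:
            w.write(b"hi!\nyo!\n")
        f = open_unbuffered(path)
        try:
            before = read_record(f, 0)
            # Stand-in for the other process: overwrite record 0 in place.
            with open(path, "r+b") as w:
                w.write(b"bye\n")
            after = read_record(f, 0)
        finally:
            f.close()
        return before, after
    finally:
        os.remove(path)
```

Calling demo() reads record 0, changes it behind the reader's back, and reads it again through the same still-open file object, with no close and reopen in between. The price is a system call per read, which is the performance trade-off I mentioned above.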