> Suppose we have a very large file and want to remove 'n' bytes from
> the middle of it. My thought is:
> 1, read() until we reach the bytes that should be removed, and mark
> that position as 'pos'.
> 2, seek(tell() + n) to skip over those bytes
> 3, read() until we reach the end of the file, into a variable, say 'a'
> 4, seek(pos) to go back to 'pos'
> 5, write(a)
> 6, truncate()
> 
> If the file is really large, the performance may be a problem.
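
For reference, those six steps translate fairly directly into
something like the sketch below. The name remove_bytes_naive and the
BytesIO test data are mine, made up for illustration; the logic is
just the quoted list.

```python
import io

def remove_bytes_naive(f, pos, n):
    """Delete n bytes at pos by reading the whole tail into memory."""
    f.seek(pos + n)   # steps 1-2: skip to just past the bytes to delete
    a = f.read()      # step 3: the entire remainder, held in memory at once
    f.seek(pos)       # step 4: back to where the deleted bytes began
    f.write(a)        # step 5: overwrite from there
    f.truncate()      # step 6: cut the file at the current position

# Hypothetical test data: remove "3456" from a ten-byte buffer.
f = io.BytesIO(b"0123456789")
remove_bytes_naive(f, 3, 4)
print(f.getvalue())
```

It is correct, but 'a' holds everything after the deleted region, so
memory use grows with the size of the file.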

The biggest problem I see is step #3, which reads everything after
the deleted region into memory in one go.  If you're dealing with a
multi-gigabyte file, and you want to delete 5 bytes beginning at 20
bytes into the file, step #3 involves reading file_size-(20+5) bytes
into memory, and then spewing them all back out.  A better way is to
read a fixed-size chunk each time and write it back at its proper
offset, n bytes earlier than where it was read, so that memory use
stays bounded by the chunk size.

def shift(f, offset, size, buffer_size=1024*1024):
    """Delete "size" bytes from file "f", starting at "offset",
    shifting the remainder of the file down to fill the gap.

    The buffer_size can be tweaked for performance preferences,
    defaulting to 1 megabyte.
    """
    f.seek(offset + size)
    while True:
        buffer = f.read(buffer_size)
        if not buffer:
            break
        f.seek(offset)
        f.write(buffer)
        # advance by what was actually read; the last chunk may be short
        offset += len(buffer)
        # jump back over the gap to the next unread byte
        f.seek(offset + size)
    # offset is now the new end of the data; cut off the stale tail
    f.truncate(offset)

if __name__ == '__main__':
    from io import BytesIO

    offset = ord('p')
    size = 5
    buffer_size = 30

    # 256 bytes, 0x00 through 0xff, so any misplaced byte is easy to spot
    f = BytesIO(bytes(range(256)))
    print(repr(f.read()))
    print('=' * 50)
    f.seek(0)
    shift(f, offset, size, buffer_size)
    f.seek(0)
    print(repr(f.read()))
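
To try the same idea on a real file rather than an in-memory one, the
file has to be opened for reading and writing in binary mode ('r+b').
The sketch below is self-contained, so it repeats the chunked loop
instead of importing it. The name delete_range and the temporary file
are assumptions of mine for the demo, not anything from the original
question.

```python
import os
import tempfile

def delete_range(path, offset, size, buffer_size=1024 * 1024):
    """Remove `size` bytes starting at `offset` from the file at `path`."""
    # 'r+b' allows reads and writes without truncating the file on open
    with open(path, 'r+b') as f:
        read_pos = offset + size   # next byte that still has to be moved
        write_pos = offset         # where that byte should end up
        while True:
            f.seek(read_pos)
            chunk = f.read(buffer_size)
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        # everything past write_pos is a stale copy of the old tail
        f.truncate(write_pos)

# Demo on a throwaway file holding the bytes 0x00 through 0xff.
original = bytes(range(256))
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as out:
        out.write(original)
    delete_range(path, ord('p'), 5, buffer_size=30)
    with open(path, 'rb') as check:
        result = check.read()
finally:
    os.remove(path)

expected = original[:ord('p')] + original[ord('p') + 5:]
print(result == expected)
```

Only one chunk is in memory at a time, but every byte after the
deleted region still gets rewritten, so the running time still scales
with the size of the tail.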


> Is there a clever way to do this? Could mmap() help? Thanks.

No idea regarding mmap().

-tkc

-- 
http://mail.python.org/mailman/listinfo/python-list
