Greg Lindstrom wrote:
> I have a file with varying-length records. All but the first record,
> that is; it's always 107 bytes long. What I would like to do is strip
> out all linefeeds from the file, read the character in position 107
> (the end-of-segment delimiter) and then replace all of the
> end-of-segment characters with linefeeds, making a file where each
> segment is on its own line.
Hmmmm... here's one way of doing it:
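import mmap
import sys

DELIMITER_OFFSET = 107  # 0-based index of the end-of-segment delimiter

# "r+b": open for reading and writing, in binary mode
with open(sys.argv[1], "r+b") as data_file:
    # A length of 0 maps the whole file
    data = mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE)
    delimiter = data[DELIMITER_OFFSET]  # indexing an mmap gives an int
    for index in range(len(data)):
        if data[index] == delimiter:
            data[index] = ord("\n")  # item assignment also takes an int
    data.flush()
    data.close()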
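The loop above covers only the last two steps of the request: any linefeeds already in the file stay there, and the delimiter is read before they would have been stripped. If the file fits in memory, all three steps map onto plain bytes operations. What follows is a minimal sketch, not a definitive implementation. It assumes position 107 is a 0-based offset, as in the loop (if Greg counts from 1, the offset would be 106). It also writes to a new file so the input survives. `resegment` and `resegment_file` are hypothetical names.

```python
def resegment(raw, delimiter_offset=107):
    """Strip linefeeds, read the delimiter byte at delimiter_offset,
    then turn every delimiter into a linefeed (one segment per line)."""
    # Step 1: remove all existing linefeeds.
    stripped = raw.replace(b"\n", b"")
    # Step 2: the end-of-segment delimiter, taken after stripping.
    delimiter = stripped[delimiter_offset:delimiter_offset + 1]
    if not delimiter:
        raise ValueError("input is shorter than the delimiter offset")
    # Step 3: replace each delimiter with a linefeed.
    return stripped.replace(delimiter, b"\n")


def resegment_file(src_path, dst_path, delimiter_offset=107):
    """Write the resegmented contents of src_path to dst_path.
    The source file is only read, never modified."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(resegment(raw, delimiter_offset))
```

Stripping the linefeeds shrinks the data, so this version cannot edit the file in place through an mmap. Writing a second file sidesteps that problem and keeps the original as a backup.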
There are doubtless more efficient ways, such as using mmap.mmap.find()
instead of iterating over every character, but that's an exercise for
the reader. And personally I would make extra copies ANYWAY; editing
your only copy of the file in place is asking for trouble.
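Here is one way to take up both suggestions. The sketch copies the file first, then jumps from delimiter to delimiter with find(), which does the scanning in C instead of testing each byte in a Python loop. Like the loop above, it does not strip existing linefeeds. Swapping one byte for another keeps the file size fixed, and that is what makes an in-place mmap edit possible. `replace_all` and `replace_delimiters_in_copy` are hypothetical helper names, and the 0-based offset of 107 is carried over from the loop.

```python
import mmap
import shutil


def replace_all(buf, delimiter, replacement=b"\n"):
    """Replace every occurrence of a one-byte delimiter in a writable
    buffer (an mmap or a bytearray) in place; return the count."""
    if len(delimiter) != 1 or len(replacement) != 1:
        raise ValueError("delimiter and replacement must be single bytes")
    count = 0
    pos = buf.find(delimiter)
    while pos != -1:
        buf[pos:pos + 1] = replacement  # same length: size never changes
        count += 1
        pos = buf.find(delimiter, pos + 1)
    return count


def replace_delimiters_in_copy(src_path, dst_path, delimiter_offset=107):
    """Copy src_path to dst_path, then rewrite only the copy in place."""
    shutil.copyfile(src_path, dst_path)  # the original is never touched
    with open(dst_path, "r+b") as f:
        # A length of 0 maps the whole file (which must not be empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            delimiter = data[delimiter_offset:delimiter_offset + 1]
            count = replace_all(data, delimiter)
            data.flush()
    return count
```

`replace_all` only needs `find()` and slice assignment, so it works the same way on a `bytearray`. That makes it easy to try out without touching a file.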
--
Michael Hoffman
--
http://mail.python.org/mailman/listinfo/python-list