"Mike Meyer" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > "Paul Watson" <[EMAIL PROTECTED]> writes: ... > Did you do timings on it vs. mmap? Having to copy the data multiple > times to deal with the overlap - thanks to strings being immutable - > would seem to be a lose, and makes me wonder how it could be faster > than mmap in general.
The only thing copied for each block is a string one byte shorter than
the search string. I did not do due diligence with respect to timings.
Here is a small dataset, read sequentially and then using mmap.

$ ls -lgG t.dat
-rw-r--r--  1 16777216 Oct 28 16:32 t.dat
$ time ./scanfile.py
1048576
    0.80s real     0.64s user     0.15s system
$ time ./scanfilemmap.py
1048576
   20.33s real     6.09s user    14.24s system

With a larger file, the system time skyrockets. I assume that to be the
paging mechanism in the OS. This is Cygwin on Windows XP.

$ ls -lgG t2.dat
-rw-r--r--  1 268435456 Oct 28 16:33 t2.dat
$ time ./scanfile.py
16777216
   28.85s real    16.37s user     0.93s system
$ time ./scanfilemmap.py
16777216
  323.45s real    94.45s user   227.74s system

-- 
http://mail.python.org/mailman/listinfo/python-list
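
For readers following the thread, the block-with-overlap idea described
above can be sketched roughly as follows. This is not the scanfile.py or
scanfilemmap.py used for the timings (neither script is shown in this
message); the function names, the default block size, and the choice to
count overlapping matches are all assumptions made for illustration. It
is written for Python 3 bytes objects rather than the Python 2 strings
under discussion.

```python
import mmap


def count_overlapping(buf, needle):
    """Count occurrences of needle in buf, including overlapping ones.

    Works on anything with a bytes-style find(), such as bytes or mmap.
    """
    count = 0
    pos = buf.find(needle)
    while pos != -1:
        count += 1
        pos = buf.find(needle, pos + 1)
    return count


def count_blockwise(path, needle, blocksize=1 << 20):
    """Hypothetical sequential scan: read fixed-size blocks, carry a tail.

    Only the last len(needle) - 1 bytes of each buffer are carried into
    the next one, so a match that straddles a block boundary is still
    seen, and no match is counted twice (the tail alone is too short to
    hold a whole match).
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    keep = len(needle) - 1
    count = 0
    tail = b""
    with open(path, "rb") as f:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            buf = tail + block
            count += count_overlapping(buf, needle)
            # buf[-0:] would be the whole buffer, so special-case keep == 0.
            tail = buf[-keep:] if keep else b""
    return count


def count_mmap(path, needle):
    """Hypothetical mmap variant: map the whole file and search it.

    Note that mmap raises ValueError for an empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return count_overlapping(m, needle)
```

Both functions should return the same count for the same file; only the
I/O strategy differs, which is the thing the timings above compare.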