Re: Scanning a file

Paul Watson Sat, 29 Oct 2005 12:20:46 -0700

"Mike Meyer" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> "Paul Watson" <[EMAIL PROTECTED]> writes:
...
> Did you do timings on it vs. mmap? Having to copy the data multiple
> times to deal with the overlap - thanks to strings being immutable -
> would seem to be a lose, and makes me wonder how it could be faster
> than mmap in general.


The only thing copied for each block is a string one byte shorter than the 
search string.

I did not do due diligence with respect to timings.  Here is a small 
dataset read sequentially and then scanned using mmap.

$ ls -lgG t.dat
-rw-r--r--  1 16777216 Oct 28 16:32 t.dat
$ time  ./scanfile.py
1048576
    0.80s real     0.64s user     0.15s system
$ time  ./scanfilemmap.py
1048576
   20.33s real     6.09s user    14.24s system

With a larger file, the system time skyrockets. I assume that is due to the 
paging mechanism in the OS.  This is Cygwin on Windows XP.

$ ls -lgG t2.dat
-rw-r--r--  1 268435456 Oct 28 16:33 t2.dat
$ time  ./scanfile.py
16777216
   28.85s real    16.37s user     0.93s system
$ time  ./scanfilemmap.py
16777216
  323.45s real    94.45s user   227.74s system
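
For readers wondering what an mmap-based scan looks like, here is a minimal
sketch. It is only an assumption about the general shape of such a script,
not the actual scanfilemmap.py, and count_with_mmap is a made-up name. It
maps the whole file read-only and lets the OS page the data in while find()
walks through it.

```python
import mmap


def count_with_mmap(path, pattern):
    """Count every occurrence of pattern in the file at path using mmap."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    total = 0
    with open(path, "rb") as f:
        # Length 0 maps the entire file; mmap raises ValueError if the
        # file is empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = m.find(pattern)
            while pos != -1:
                total += 1
                pos = m.find(pattern, pos + 1)
    return total
```

No overlap handling is needed here because the mapping behaves like one
contiguous buffer. The cost moves into the OS paging layer instead.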


-- 
http://mail.python.org/mailman/listinfo/python-list
