Re: Iterating through a file significantly slower when file has big buffer

2009-01-27 Thread Roger Binns

pyt...@bdurham.com wrote:
 The following tests were run on a Windows XP system using Python 2.6.1

Unless you changed the defaults, the Windows XP system cache size is
10MB.  When you use a larger read size, chances are it blows out
that cache and causes metadata (file block locations) to have to be
reread on your next read.

You are also funnelling all the data through your CPU cache, with a
similar though less noticeable effect.
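
If you want to see where the degradation sets in, you could time the
same loop at a few intermediate buffer sizes. A minimal sketch (not
from the original post; it reuses Malcolm's file path, which is of
course specific to his machine):

import time

# Hypothetical path standing in for the 2G log file from the quoted post.
myFileName = r'C:\logs\jan2009.dat'

# Sweep a few buffer sizes: 64K, 1M, and the 16M that showed the slowdown.
for bufferSize in (2 ** 16, 2 ** 20, 2 ** 24):
    start = time.time()
    myFile = open(myFileName, 'rt', bufferSize)
    for line in myFile:
        pass
    myFile.close()
    print 'Buffer %6sk: %6.2f sec' % (bufferSize / 1024, time.time() - start)

If the timings jump sharply somewhere between 1M and 16M, that would be
consistent with the read size outgrowing the system cache.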

To change XP cache sizes, see:

 http://marc.info/?l=sqlite-users&m=116743785223905&w=2
 http://support.microsoft.com/kb/895932
 http://www.techspot.com/tweaks/memory-winxp/

Roger



Iterating through a file significantly slower when file has big buffer

2009-01-26 Thread python
I'm working with very large text files and am always looking for
ways to optimize the performance of our scripts.
While reviewing our code, I wondered if changing the size of our
file buffers to a very large value might speed up our file
I/O. Intuitively, I thought that bigger buffers would improve
performance by reducing the number of reads. Instead I observed
just the opposite: performance was 7x slower (~500 sec vs. 70
sec) and memory use was 3x higher (24MB vs. 8MB) due to the larger
buffer.
The following tests were run on a Windows XP system using Python
2.6.1
SOURCE:
import time

# timer class
class timer( object ):
    def __init__( self, message='' ):
        self.message = message

    def start( self ):
        self.starttime = time.time()
        print 'Start:  %s' % ( self.message )

    def stop( self ):
        print 'Finish: %s %6.2f' % ( self.message, time.time() - self.starttime )

# myFileName points to a 2G text file.
myFileName = r'C:\logs\jan2009.dat'

# default buffering
myFile = open( myFileName )
for line in myFile:
    pass
myFile.close()
strategy1.stop()

# setting the buffer size to 16M
bufferSize = 2 ** 24
strategy2 = timer( 'Large buffer (%sk)' % (bufferSize/1024) )
strategy2.start()
myFile = open( myFileName, 'rt', bufferSize )
for line in myFile:
    pass
myFile.close()
strategy2.stop()
OUTPUT:
Start:  Default buffer
Finish: Default buffer  69.98
Start:  Large buffer (16384k)
Finish: Large buffer (16384k) 493.88  --- 7x slower
Any comments regarding this massive slowdown?
Thanks,
Malcolm


CORRECTION: Re: Iterating through a file significantly slower when file has big buffer

2009-01-26 Thread python
Added the following lines missing from my original post:

strategy1 = timer( 'Default buffer' )
strategy1.start()

Code below is now complete.

Malcolm
SOURCE:
import time

# timer class
class timer( object ):
    def __init__( self, message='' ):
        self.message = message

    def start( self ):
        self.starttime = time.time()
        print 'Start:  %s' % ( self.message )

    def stop( self ):
        print 'Finish: %s %6.2f' % ( self.message, time.time() - self.starttime )

# myFileName points to a 2G text file.
myFileName = r'C:\logs\jan2009.dat'

# default buffering
strategy1 = timer( 'Default buffer' )
strategy1.start()
myFile = open( myFileName )
for line in myFile:
    pass
myFile.close()
strategy1.stop()

# setting the buffer size to 16M
bufferSize = 2 ** 24
strategy2 = timer( 'Large buffer (%sk)' % (bufferSize/1024) )
strategy2.start()
myFile = open( myFileName, 'rt', bufferSize )
for line in myFile:
    pass
myFile.close()
strategy2.stop()
OUTPUT:
Start:  Default buffer
Finish: Default buffer  69.98
Start:  Large buffer (16384k)
Finish: Large buffer (16384k) 493.88  --- 7x slower
Any comments regarding this massive slowdown?
Thanks,
Malcolm