Re: Iterating through a file significantly slower when file has big buffer
pyt...@bdurham.com wrote:
> The following tests were run on a Windows XP system using Python 2.6.1

Unless you changed the defaults, the Windows XP system cache size is 10MB.
When you use a larger read size, chances are it is blowing out that cache,
causing metadata (file block locations) to have to be reread on your next
read. You are also funnelling all the data through your CPU cache, with a
similar though less noticeable effect.

To change XP cache sizes, see:

http://marc.info/?l=sqlite-users&m=116743785223905&w=2
http://support.microsoft.com/kb/895932
http://www.techspot.com/tweaks/memory-winxp/
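As a rough illustration (an untested sketch; the chunk sizes are made up and
the file path is borrowed from the original post), you can see the cache
effect by timing raw reads with a chunk size that stays well inside the
system cache versus one that blows past it:

import time

def time_raw_reads( path, chunkSize ):
    # Read the whole file in fixed-size chunks; return elapsed seconds.
    start = time.time()
    f = open( path, 'rb' )
    try:
        while True:
            if not f.read( chunkSize ):
                break
    finally:
        f.close()
    return time.time() - start

# 64KB stays far below a ~10MB system cache; 16MB blows past it.
for chunkSize in ( 64 * 1024, 2 ** 24 ):
    elapsed = time_raw_reads( r'C:\logs\jan2009.dat', chunkSize )
    print 'chunk size %8sk: %6.2f sec' % ( chunkSize / 1024, elapsed )

Roger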
Iterating through a file significantly slower when file has big buffer
I'm working with very large text files and am always looking for ways to
optimize the performance of our scripts. While reviewing our code, I
wondered whether changing our file buffers to a very large buffer size
might speed up our file I/O. Intuitively, I thought that bigger buffers
might improve performance by reducing the number of reads. Instead I
observed just the opposite: performance was 7x slower (~500 sec vs. 70 sec)
and memory use was 3x higher (24M vs. 8M) due to the larger buffer.

The following tests were run on a Windows XP system using Python 2.6.1.

SOURCE:

import time

# timer class
class timer( object ):

    def __init__( self, message='' ):
        self.message = message

    def start( self ):
        self.starttime = time.time()
        print 'Start: %s' % ( self.message )

    def stop( self ):
        print 'Finish: %s %6.2f' % ( self.message, time.time() - self.starttime )

# myFileName points to a 2G text file.
myFileName = r'C:\logs\jan2009.dat'

# default buffering
myFile = open( myFileName )
for line in myFile:
    pass
myFile.close()
strategy1.stop()

# setting the buffer size to 16M
bufferSize = 2 ** 24
strategy2 = timer( 'Large buffer (%sk)' % ( bufferSize / 1024 ) )
strategy2.start()
myFile = open( myFileName, 'rt', bufferSize )
for line in myFile:
    pass
myFile.close()
strategy2.stop()

OUTPUT:

Start: Default buffer
Finish: Default buffer  69.98
Start: Large buffer (16384k)
Finish: Large buffer (16384k) 493.88   --- 7x slower

Any comments regarding this massive slowdown?

Thanks,
Malcolm
CORRECTION: Re: Iterating through a file significantly slower when file has big buffer
Added the following lines, which were missing from my original post:

strategy1 = timer( 'Default buffer' )
strategy1.start()

The code below is now complete.

Malcolm

SOURCE:

import time

# timer class
class timer( object ):

    def __init__( self, message='' ):
        self.message = message

    def start( self ):
        self.starttime = time.time()
        print 'Start: %s' % ( self.message )

    def stop( self ):
        print 'Finish: %s %6.2f' % ( self.message, time.time() - self.starttime )

# myFileName points to a 2G text file.
myFileName = r'C:\logs\jan2009.dat'

# default buffering
strategy1 = timer( 'Default buffer' )
strategy1.start()
myFile = open( myFileName )
for line in myFile:
    pass
myFile.close()
strategy1.stop()

# setting the buffer size to 16M
bufferSize = 2 ** 24
strategy2 = timer( 'Large buffer (%sk)' % ( bufferSize / 1024 ) )
strategy2.start()
myFile = open( myFileName, 'rt', bufferSize )
for line in myFile:
    pass
myFile.close()
strategy2.stop()

OUTPUT:

Start: Default buffer
Finish: Default buffer  69.98
Start: Large buffer (16384k)
Finish: Large buffer (16384k) 493.88   --- 7x slower

Any comments regarding this massive slowdown?
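In case it helps anyone rerun the comparison, here is a minimal sketch of
the same measurement (Python 2.6; untested on the same data, and the loop
and labels are illustrative) that folds the timer into a context manager,
so a strategy can't lose its start() call the way my original post did:

import time
from contextlib import contextmanager

@contextmanager
def timed( message ):
    # Prints the same Start/Finish lines as the timer class above.
    print 'Start: %s' % message
    starttime = time.time()
    try:
        yield
    finally:
        print 'Finish: %s %6.2f' % ( message, time.time() - starttime )

myFileName = r'C:\logs\jan2009.dat'

# -1 asks open() for the platform default buffering; 2 ** 24 is 16M.
for bufferSize in ( -1, 2 ** 24 ):
    if bufferSize < 0:
        label = 'Default buffer'
    else:
        label = 'Large buffer (%sk)' % ( bufferSize / 1024 )
    with timed( label ):
        myFile = open( myFileName, 'rt', bufferSize )
        for line in myFile:
            pass
        myFile.close()

Passing -1 for the default keeps the two runs structurally identical, so
only the buffer size differs between them.

Thanks,
Malcolm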