Andy Watson wrote:
> I have an application that scans and processes a bunch of text files.
> The content I'm pulling out and holding in memory is at least 200MB.
>
> I'd love to be able to tell the CPython virtual machine that I need a
> heap of, say, 300MB up front rather than have it grow as needed. I've
> had a scan through the archives of comp.lang.python and the Python
> docs but cannot find a way to do this. Is it possible to configure
> the PVM this way?
>
> Much appreciated,
> Andy
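[Editorial note: CPython exposes no switch to pre-size its heap; memory is requested from the OS as objects are created. One workaround, my own suggestion rather than anything from this thread, is to preallocate a single large buffer and fill it in place, so the accumulated data is never reallocated as it grows. A minimal sketch in modern Python 3 syntax (the thread's code is Python 2); the 1 MB size and the sample chunks are illustrative only:]

```python
# Reserve the space up front in one allocation; this buffer, not the
# interpreter heap as a whole, is what gets pre-sized.
buf = bytearray(1024 * 1024)

pos = 0
for chunk in (b'first file contents ', b'second file contents'):
    buf[pos:pos + len(chunk)] = chunk   # write into the reserved buffer
    pos += len(chunk)

data = bytes(buf[:pos])   # the filled portion, as immutable bytes
print(data.decode())
```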
Others have already suggested swap as a possible cause of slowness. I've
been playing with my portable (dual Intel T2300 @ 1.66 GHz; 1 GB of
memory; Win XP; PyScripter IDE) using the following code:

#=======================
import datetime

'''
# Create 10 files with sizes 1 MB, ..., 10 MB
for i in range(1, 11):
    print 'Writing: ' + 'Bytes_' + str(i*1000000)
    f = open('Bytes_' + str(i*1000000), 'w')
    f.write(str(i-1)*i*1000000)
    f.close()
'''

# Read the files 5 times, concatenating the contents
# into one HUGE string
now_1 = datetime.datetime.now()
s = ''
for count in range(5):
    for i in range(1, 11):
        print 'Reading: ' + 'Bytes_' + str(i*1000000)
        f = open('Bytes_' + str(i*1000000), 'r')
        s = s + f.read()
        f.close()
        print 'Size of s is', len(s)
print 's[274999999] = ' + s[274999999]
now_2 = datetime.datetime.now()
print now_1
print now_2
raw_input('???')
#=======================

The part at the start that is commented out is the part I used to create
the 10 files. The second part prints the following output (abbreviated):

Reading: Bytes_1000000
Size of s is 1000000
Reading: Bytes_2000000
Size of s is 3000000
Reading: Bytes_3000000
Size of s is 6000000
Reading: Bytes_4000000
Size of s is 10000000
Reading: Bytes_5000000
Size of s is 15000000
Reading: Bytes_6000000
Size of s is 21000000
Reading: Bytes_7000000
Size of s is 28000000
Reading: Bytes_8000000
Size of s is 36000000
Reading: Bytes_9000000
Size of s is 45000000
Reading: Bytes_10000000
Size of s is 55000000
<snip>
Reading: Bytes_9000000
Size of s is 265000000
Reading: Bytes_10000000
Size of s is 275000000
s[274999999] = 9
2007-02-22 20:23:09.984000
2007-02-22 20:23:21.515000

As can be seen, creating a 275 MB string by reading the parts from the
files took less than 12 seconds. I think this is fast enough, but others
might disagree!
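[Editorial note: the `s = s + f.read()` pattern above copies the entire accumulated string on every iteration, so its total cost grows quadratically with the number of pieces; it happens to be fast here, but the usual idiom is to collect the pieces in a list and join once. A scaled-down sketch in Python 3 syntax, using i*10-byte strings in place of the benchmark's 1-10 MB files:]

```python
# Each piece stands in for one f.read(); the content pattern matches the
# files above (file i is the digit i-1 repeated), scaled down to i*10 bytes.
parts = []
for i in range(1, 11):
    data = str(i - 1) * (i * 10)
    parts.append(data)

# One O(total-length) concatenation instead of repeated full copies.
s = ''.join(parts)
print('Size of s is', len(s))   # 10 + 20 + ... + 100 = 550
```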
Using the Win Task Manager I can see the process grow to a little less
than 282 MB by the time it reaches the raw_input call, and drop to less
than 13 MB a little after I've given some input, apparently as a result
of PyScripter doing a GC.

Your situation (hardware, file sizes, etc.) may differ so that my
experiment does not correspond to it, but this was my 2 cents' worth!

HTH,
Jussi
--
http://mail.python.org/mailman/listinfo/python-list