On Mon, Nov 4, 2013 at 11:26 AM, Amal Thomas <amalthomas...@gmail.com> wrote:
> @Dave: Thanks. By the way, I am running my code on a server with about
> 100 GB of RAM, but I can't afford to have my code use 4-5 times the
> size of the text file. Now I am using read() / readlines(); these seem
> to be more efficient in memory usage than io.StringIO(f.read()).
f.read() creates a string to initialize a StringIO object. You could
instead initialize a BytesIO object with a mapped file; that should cut
the peak RSS roughly in half. If you need decoded text, add a
TextIOWrapper:

    import io
    import mmap

    with open('output.txt') as f:
        # map the file read-only; BytesIO makes one copy of the raw
        # bytes, and TextIOWrapper decodes them incrementally
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            content = io.TextIOWrapper(io.BytesIO(mf))
            for line in content:
                pass  # process each line

However, before you do something extreme (like, say, loading a 50 GiB
file into RAM), try tweaking the TextIOWrapper object's readline() by
increasing _CHUNK_SIZE. This can be up to 2**63 - 1 in a 64-bit
process:

    with open('output.txt') as content:
        content._CHUNK_SIZE = 65536  # private TextIOWrapper attribute
        for line in content:
            pass  # process each line

Check content.buffer.tell() to confirm that the file pointer is
increasing in steps of the given chunk size.

Built-in open() also lets you set the "buffering" size for the
BufferedReader, content.buffer. However, in this case I don't think you
need to worry about it: content.readline() calls content.buffer.read1()
to read directly from the FileIO object, content.buffer.raw.
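To make the tell() check above concrete, here's a minimal sketch
('output.txt' and the 64 KiB figure are just the sample values from
the example above):

    with open('output.txt') as content:
        content._CHUNK_SIZE = 65536
        pos = content.buffer.tell()
        for line in content:
            new_pos = content.buffer.tell()
            if new_pos != pos:
                # the buffer position should jump in steps of roughly
                # 65536 bytes (the final chunk may be smaller), not
                # once per line
                print('buffer advanced by', new_pos - pos)
                pos = new_pos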