I tried this on a different PC with 12 GB of RAM. As expected, reading the
data was no problem this time. I noticed that for large files, Python uses
roughly 2.5x the on-disk size in memory when each line of the file is kept as
a string in a Python list. As an anecdote, the comparable overhead in MATLAB
is about 2x, slightly lower than Python, with each line of the file kept as a
string in a MATLAB cell array. I'm curious: has anyone compared the in-memory
overhead of data in other languages, for instance Ruby?
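
For anyone who wants to reproduce the comparison, here is a rough sketch of
how the ratio could be measured. It assumes CPython 3, and the helper names
(lines_in_memory_size, memory_to_disk_ratio) and the sample data are made up
for illustration. sys.getsizeof only reports the size of each object itself,
so this is an estimate, not what the OS reports for the whole process.

```python
import os
import sys
import tempfile


def lines_in_memory_size(lines):
    """Approximate bytes used by a list of strings: the list plus each string."""
    return sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)


def memory_to_disk_ratio(path):
    """Read every line of `path` into a list and compare with the size on disk."""
    with open(path, "r") as f:
        lines = f.readlines()
    return lines_in_memory_size(lines) / float(os.path.getsize(path))


if __name__ == "__main__":
    # Hypothetical sample data: 10,000 short ASCII lines.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        for i in range(10000):
            tmp.write("row %d,some,short,fields\n" % i)
    try:
        print("memory/disk ratio: %.2f" % memory_to_disk_ratio(tmp.name))
    finally:
        os.remove(tmp.name)
```

The ratio depends heavily on line length: the per-string overhead is fixed, so
files with many short lines will show a much larger ratio than files with a
few long ones.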


-----Original Message-----
From: Python-list 
[mailto:python-list-bounces+pradipto.banerjee=adainvestments....@python.org] On 
Behalf Of Steven D'Aprano
Sent: Friday, October 19, 2012 6:12 PM
To: python-list@python.org
Subject: Re: Python does not take up available physical memory

On Fri, 19 Oct 2012 14:03:37 -0500, Pradipto Banerjee wrote:

> Thanks, I tried that. Still got MemoryError, but at least this time
> python tried to use the physical memory. What I noticed is that before
> it gave me the error it used up to 1.5GB (of the 2.23 GB originally
> showed as available) - so in general, python takes up more memory than
> the size of the file itself.

Well of course it does. Once you read the data into memory, it has its
own overhead for the object structure.

You haven't told us what the file is or how you are reading it. I'm going
to assume it is ASCII text and you are using Python 2.

py> import os, sys
py> open("test file", "w").write("abcde")
py> os.stat("test file").st_size
5L
py> text = open("test file", "r").read()
py> len(text)
5
py> sys.getsizeof(text)
26

So that confirms that a five-byte ASCII string takes up five bytes on
disk but 26 bytes in memory as an object.

That overhead will depend on what sort of object, whether Unicode or not,
the version of Python, and how you read the data.
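
As a rough illustration of how the object type changes the overhead, the
sketch below compares a bytes object, an ASCII str, and a str containing a
non-Latin-1 character. It assumes CPython 3 (not the Python 2 session above),
and the overhead() helper and sample values are made up for the example. The
exact numbers vary by version and platform, so none are quoted here.

```python
import sys

# Three five-character values that would look similar in a text file.
samples = {
    "bytes, ASCII": b"abcde",
    "str, ASCII": "abcde",
    "str, non-ASCII": "abcd\u20ac",  # the last character is the euro sign
}


def overhead(obj):
    """Bytes reported by sys.getsizeof beyond one byte per item in `obj`."""
    return sys.getsizeof(obj) - len(obj)


for label, value in samples.items():
    print("%-15s len=%d getsizeof=%d overhead=%d"
          % (label, len(value), sys.getsizeof(value), overhead(value)))
```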

In general, if you have a huge amount of data to work with, you should
try to work with it one line at a time:

for line in open("some file"):
    process(line)


rather than reading the whole file into memory at once:

lines = open("some file").readlines()
for line in lines:
    process(line)
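
A slightly fuller sketch of the line-at-a-time approach, keeping only running
totals so memory use stays flat however large the file is. The function name
count_lines_and_chars and the sample file are hypothetical; the with statement
is there so the file gets closed even if processing raises.

```python
import os
import tempfile


def count_lines_and_chars(path):
    """Stream through `path` one line at a time, keeping only running totals."""
    n_lines = 0
    n_chars = 0
    with open(path, "r") as f:   # closed automatically when the block exits
        for line in f:           # the file object yields one line per iteration
            n_lines += 1
            n_chars += len(line)
    return n_lines, n_chars


if __name__ == "__main__":
    # Hypothetical sample file, written only for this demonstration.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("a\nbb\nccc\n")
    try:
        print(count_lines_and_chars(tmp.name))
    finally:
        os.remove(tmp.name)
```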



--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
