[Numpy-discussion] memory usage (Emil Sidky)

2008-10-15 Thread emil

 Huang-Wen Chen wrote:
 Robert Kern wrote:
 from numpy import *
 for i in range(1000):
   a = random.randn(512**2)
   b = a.argsort(kind='quick')
 Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0
 on OS X, the memory usage is stable.
   
 I tried the code fragment on two platforms and the memory usage is also 
 normal.

 1. numpy 1.1.1, python 2.5.1 on Vista 32bit
 2. numpy 1.2.0, python 2.6 on RedHat 64bit
 
 If I recall correctly, there were some major improvements in python's 
 memory management/garbage collection from version 2.4 to 2.5. If you 
 could try to upgrade your python to 2.5 (and possibly also your numpy to 
 1.2.0), you'd probably see some better behaviour.
 
 Regards,
 Vincent.
 

Problem fixed. Thanks.

But it turns out there were two things going on:
(1) Upgrading to numpy 1.2 (even with python 2.4) fixed the memory usage
for the loop with argsort in it.
(2) Unfortunately, when I went back to my original program and ran it
with the upgraded numpy, it was still chewing up tons of memory. I
finally found the problem:
Consider the following two code snippets (an extension of my previous example).
from numpy import *
d = []
for i in range(1000):
   a = random.randn(512**2)
   b = a.argsort(kind='quick')
   c = b[-100:]
   d.append(c)

and

from numpy import *
d = []
for i in range(1000):
   a = random.randn(512**2)
   b = a.argsort(kind='quick')
   c = b[-100:].copy()
   d.append(c)

The difference is that c is a reference to (a view of) the last 100 elements
of b in the first example, while c is a copy of those 100 elements in the
second example.
Both examples yield identical results (provided randn is run with the
same seed value), but the former chews up tons of memory and the latter
doesn't.
I don't know if this explanation makes sense, but it is as if Python
has to keep all the generated b arrays around in the first example because
each c is only a reference into its b.
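
For what it's worth, here is a rough back-of-the-envelope estimate of the
scale of the difference (assuming argsort returns 8-byte integer indices,
as on a 64-bit build):

from numpy import *
a = random.randn(512**2)
b = a.argsort(kind='quick')
per_b = b.nbytes                    # one index array, roughly 2 MiB
kept_by_views = 1000 * per_b        # first loop: every b stays alive, roughly 2 GiB
kept_by_copies = 1000 * b[-100:].copy().nbytes   # second loop: roughly 800 KiB total
print per_b, kept_by_views, kept_by_copies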

Anyway, bottom line is that my problem is solved.
Thanks,
Emil
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] memory usage (Emil Sidky)

2008-10-15 Thread Perry Greenfield
When you slice an array, the original array is kept in memory until
the slice is deleted: the slice uses the original array's memory and is
not a copy. The second example explicitly makes a copy.
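
You can see this by checking a slice's .base attribute, which points back
at the array whose memory it shares. For example:

from numpy import *
a = random.randn(512**2)
b = a.argsort(kind='quick')
view = b[-100:]               # shares b's memory, so b cannot be freed while view exists
full_copy = b[-100:].copy()   # independent buffer; b can be garbage collected
print view.base is b          # True: the slice keeps b alive
print full_copy.base is None  # True: the copy holds no reference to b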

Perry


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion