Bugs item #2814892, was opened at 2009-06-30 22:42
Message generated for change (Settings changed) made by sf-robot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=453021&aid=2814892&group_id=48422

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Source
Group: rpy2
>Status: Closed
Resolution: None
Priority: 5
Private: Yes
Submitted By: batripler (batripler)
Assigned to: lgautier (lgautier)
Summary: memory leak in SexpVector

Initial Comment:
Hi Laurent,

Thank you for creating a wonderfully useful piece of software. I have been 
using it for a few weeks now, and I think I have uncovered a relatively 
serious problem in rpy2.rinterface.SexpVector, which is at the heart of the 
system. Here is a manifestation of the problem. Perhaps I am doing something 
wrong.

Start a new python session and run:
{{{
import numpy; x = numpy.zeros(int(2e7))  # the shape must be an integer
}}}

You can modify the size of the array. Also, depending on the numpy defaults on 
your machine, the memory consumption will vary. On my machine a double is 8 
bytes, times 2e7 = ~150MB.  I see the process at ~162MB due to the Python 
interpreter footprint.
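
For reference, the expected size of the raw buffer can be worked out by hand. 
This is a minimal sketch that assumes 8-byte doubles stored contiguously, as 
described above; expected_buffer_bytes is a hypothetical helper name, not part 
of numpy or rpy2.

```python
import struct


def expected_buffer_bytes(n_items, item_size=struct.calcsize("d")):
    """Bytes needed for n_items C doubles stored contiguously."""
    return n_items * item_size


n = int(2e7)
size = expected_buffer_bytes(n)
# Report the same quantity in decimal megabytes and in binary mebibytes.
print("%d bytes = %.1f MB = %.1f MiB" % (size, size / 1e6, size / 2.0 ** 20))
```

That is 160,000,000 bytes, or roughly 153 MiB, which is consistent with the 
~150MB figure quoted above.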

Now, kill this session, start a new one, and run the following:
{{{
import rpy2.robjects, rpy2.rinterface as rint
reval=rint.baseNameSpaceEnv['eval']
rparse=rint.baseNameSpaceEnv['parse']
x=reval(rparse(text=rint.StrSexpVector(["numeric(2e7)"])))
}}}

In this case, we are just creating a REALSXP vector on the R side. I see the 
process coming in at 203MB, which is reasonable given that both the Python and 
R interpreters are now running. Again, this is assuming that every element of 
the REALSXP vector is 8 bytes.

Now, finally, in a new Python process, let's create an array on the Python side 
and copy it over to the R side:
{{{
import numpy, rpy2.robjects, rpy2.rinterface as rint
x = numpy.zeros(int(2e7)); y = rint.SexpVector(x, rint.REALSXP)
}}}

I would expect the max size of this process to be the sum of the previous two. 
It contains both the Python object and an equally large R object. It 
comes in at a whopping 950MB!!

Incidentally, if I run the following code:
{{{
import numpy; x = numpy.zeros(int(2e7)); y = list(x)
}}}
... it weighs in at a hefty 985MB. 

Now, I had a look at your code in rinterface.c:newSEXP and noticed that you 
are iterating using the Python sequence protocol and creating intermediate 
objects. I don't mind the temporary memory bloat -- though it would be much 
faster and leaner to special-case numpy arrays and avoid the move to object 
space and back -- but somehow these intermediaries are also hanging around. 
Either that, or the allocator is, for some reason, not returning space back 
to the OS. A few of these conversions and our process is toast.
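
To give a rough sense of how much the move to object space can cost, here is 
a back-of-the-envelope sketch. It assumes that each element is boxed as a 
plain Python float and referenced through one pointer, as in a list; numpy 
scalar objects may be larger, so treat the result as a lower bound. It also 
assumes an interpreter that provides sys.getsizeof (Python 2.6 or later).

```python
import struct
import sys

n = int(2e7)
double_size = struct.calcsize("d")    # raw C double
pointer_size = struct.calcsize("P")   # one slot in a list's item array
float_obj_size = sys.getsizeof(1.0)   # boxed Python float (build-dependent)

raw_bytes = n * double_size
boxed_bytes = n * (float_obj_size + pointer_size)

print("raw buffer : %.0f MB" % (raw_bytes / 1e6))
print("boxed list : %.0f MB" % (boxed_bytes / 1e6))
print("overhead   : %.1fx" % (boxed_bytes / float(raw_bytes)))
```

On a typical 64-bit CPython build the boxed form comes out several times 
larger than the raw buffer, which would account for much of the temporary 
growth, though not for memory that stays allocated afterwards.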

Also, for proper 64-bit compatibility, the index variable "i" should be 
Py_ssize_t.
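
The difference between the two index types can be illustrated from Python 
with ctypes. This is only a sketch of the general issue, not the rpy2 code 
itself; it assumes that ctypes.c_ssize_t has the same width as the C-level 
Py_ssize_t on the platform in question.

```python
import ctypes

int_bits = 8 * ctypes.sizeof(ctypes.c_int)
ssize_bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)

# Largest index each signed type can hold.
int_max = 2 ** (int_bits - 1) - 1
ssize_max = 2 ** (ssize_bits - 1) - 1

print("C int      : %d bits, max index %d" % (int_bits, int_max))
print("Py_ssize_t : %d bits, max index %d" % (ssize_bits, ssize_max))

# ctypes does no overflow checking, so a value one past int_max wraps
# around, just as it would when assigned to a C int.
too_long = int_max + 1
print("stored in a C int: %d" % ctypes.c_int(too_long).value)
```

On a 64-bit Linux box, a sequence longer than the maximum C int could not be 
indexed correctly with an int loop variable, while Py_ssize_t covers the full 
range of sequence lengths.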

FYI - I'm compiling from source on a 64-bit Linux box. Running Python 2.5.4, 
with numpy 1.3.0, and rpy2 2.0.5.

Thanks again for a great tool.


----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2009-07-16 02:22

Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: lgautier (lgautier)
Date: 2009-07-01 10:22

Message:
Thanks for the bug report.

I have not been using the numpy capability as extensively as I have been
using other features.

Let me re-write one of your code snippets for the sake of simplicity:
{{{
import rpy2.robjects as ro
r_numeric = ro.baseNameSpaceEnv['numeric']
x = r_numeric(2e7)
}}}

Second, the constructor you are using is designed to work on any
iterable sequence.
There are numpy-specific features:
http://rpy.sourceforge.net/rpy2/doc/html/numpy.html#from-numpy-to-rpy2

Last, did you try calling either the Python or the R garbage collector?
It has been observed in the past that this could improve things.
If either of the following improves the memory usage, you are experiencing
a previously observed behaviour (where Python's garbage collector is not
triggered even though the process has grown in size - a suspected cause is
that it does not know about objects created by the embedded R).

{{{
import gc
import rpy2.robjects as ro

gc.collect()                   # Python garbage collector
ro.baseNameSpaceEnv['gc']()    # R garbage collector
}}}
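
To check whether a collection actually shrinks the process, the resident set
size can be read before and after. This is a minimal, Linux-only sketch that
reads /proc/self/statm; current_rss_mb is a hypothetical helper, and the list
of floats merely stands in for the intermediate objects discussed above.

```python
import gc
import os


def current_rss_mb():
    """Resident set size of this process in MiB (Linux only, via /proc)."""
    with open("/proc/self/statm") as statm:
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2.0 ** 20


before = current_rss_mb()
data = [float(i) for i in range(10 ** 6)]   # many small boxed objects
grown = current_rss_mb()
del data
gc.collect()
after = current_rss_mb()

print("before: %.1f MiB" % before)
print("grown : %.1f MiB" % grown)
print("after : %.1f MiB" % after)
```

If the last figure stays close to the second one, the memory was freed by
Python but not handed back to the OS, which is a different situation from
objects that are still referenced.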


Note: thanks for pointing out that the index is an int instead of
Py_ssize_t. I just fixed it in the trunk.

----------------------------------------------------------------------
