Bugs item #2814892, was opened at 2009-06-30 22:42
Message generated for change (Settings changed) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=453021&aid=2814892&group_id=48422
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Source
Group: rpy2
>Status: Closed
Resolution: None
Priority: 5
Private: Yes
Submitted By: batripler (batripler)
Assigned to: lgautier (lgautier)
Summary: memory leak in SexpVector
Initial Comment:
Hi Laurent,
Thank you for creating a wonderfully useful piece of software. I've been
using it for a few weeks now, and I think I have uncovered a relatively
serious problem in rpy2.rinterface.SexpVector, which is at the heart of the
system. Here is a manifestation of the problem. Perhaps I am doing something
wrong.
Start a new python session and run:
{{{
import numpy; x=numpy.zeros(2e7)
}}}
You can modify the size of the array. Also, depending on the numpy defaults on
your machine, the memory consumption will vary. On my machine a double is 8
bytes, so 2e7 elements take 1.6e8 bytes (~153MiB). I see the process at ~162MB
due to the Python interpreter footprint.
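The estimate above can be checked with a few lines of standard-library Python. This is only a sketch: it assumes a native C double is 8 bytes, as the report states, and the variable names are illustrative.

```python
import struct

n_elements = int(2e7)                     # same length as the array above
bytes_per_double = struct.calcsize("d")   # size of a native C double
payload_bytes = n_elements * bytes_per_double

# Report the raw payload in both decimal MB and binary MiB.
print("payload: %.1f MB (%.1f MiB)"
      % (payload_bytes / 1e6, payload_bytes / 2.0 ** 20))
```

The gap between this payload and the observed process size is then the interpreter's own footprint.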
Now, kill this session, start a new one, and run the following:
{{{
import rpy2.robjects, rpy2.rinterface as rint
reval=rint.baseNameSpaceEnv['eval']
rparse=rint.baseNameSpaceEnv['parse']
x=reval(rparse(text=rint.StrSexpVector(["numeric(2e7)"])))
}}}
In this case, we are just creating a REALSXP vector on the R side. I see the
process coming in at 203MB, which is reasonable given that both Python and R
interpreters are now running. Again, this is assuming that every element of
the REALSXP vector is 8 bytes.
Now, finally, in a new Python process, let's create an array on the Python side
and copy it over to the R side:
{{{
import numpy, rpy2.robjects, rpy2.rinterface as rint;
x=numpy.zeros(2e7); y=rint.SexpVector(x, rint.REALSXP)
}}}
I would expect the max size of this process to be the sum of the previous two.
It contains both the Python object, as well as an equivalently large R object. It
comes in at a whopping 950MB!!
Incidentally, if I run the following code:
{{{
import numpy; x=numpy.zeros(2e7); y=list(x)
}}}
... it weighs in at a hefty 985MB.
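One plausible reading of this figure is that `list(x)` boxes every element as a separate Python object and the list keeps a pointer to each one. A rough estimate, assuming CPython and using a plain `float` as a stand-in for the numpy scalar (which may well be larger):

```python
import struct
import sys

n_elements = int(2e7)
packed_per_item = struct.calcsize("d")    # raw double stored inside the array
boxed_per_item = sys.getsizeof(1.0)       # one Python float object (CPython-specific)
pointer_per_item = struct.calcsize("P")   # the list's reference to that object

packed_total = n_elements * packed_per_item
boxed_total = n_elements * (boxed_per_item + pointer_per_item)

print("packed array : %.0f MiB" % (packed_total / 2.0 ** 20))
print("list of boxes: %.0f MiB" % (boxed_total / 2.0 ** 20))
```

The exact numbers depend on the interpreter build, so this only indicates the order of magnitude of the per-element overhead.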
Now, I had a look at your code rinterface.c:newSEXP, and noticed that you are
iterating using the Python sequence protocol, and creating object
intermediaries. I don't mind the temporary memory bloat -- though it would be
much faster and leaner to special-case numpy arrays and avoid the move to
object space and back, -- but somehow these intermediaries are also hanging
around. Either that, or the allocator is, for some reason, not returning space
back to the OS. A few of these conversions and our process is toast.
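The cost of going through object intermediaries can be illustrated without rpy2. The sketch below is not the `newSEXP` code path: it uses `array.array` as a stand-in for a numpy array and `tracemalloc` (Python 3.4+) to compare the peak allocation of an element-by-element copy with that of a single contiguous copy. The helper `peak_bytes` is hypothetical.

```python
import tracemalloc
from array import array

def peak_bytes(make_copy, source):
    """Return the peak traced allocation while make_copy(source) runs."""
    tracemalloc.start()
    try:
        result = make_copy(source)   # keep a reference so nothing is freed early
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del result
    return peak

n_elements = 200000                       # small stand-in for 2e7
source = array("d", [0.0]) * n_elements   # packed doubles, like a numpy array

# Sequence protocol: every element becomes a separate Python float object.
boxed_peak = peak_bytes(list, source)
# Contiguous copy: one block of raw doubles, no per-element objects.
packed_peak = peak_bytes(lambda a: a[:], source)

print("boxed peak : %d bytes" % boxed_peak)
print("packed peak: %d bytes" % packed_peak)
```

This only shows the temporary bloat from boxing. Whether the freed memory is later returned to the OS is a separate question that the sketch does not answer.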
Also, for proper 64-bit compatibility, the index variable "i" should be
Py_ssize_t.
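To see why the index type matters, the widths of the two C types can be compared from Python with `ctypes`. This is a small illustration only, assuming `ctypes.c_ssize_t` matches the interpreter's `Py_ssize_t`, as it does on common builds:

```python
import ctypes

int_bits = 8 * ctypes.sizeof(ctypes.c_int)
ssize_bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)

# Largest non-negative index each signed type can hold.
max_int_index = 2 ** (int_bits - 1) - 1
max_ssize_index = 2 ** (ssize_bits - 1) - 1

print("C int      : %d bits, max index %d" % (int_bits, max_int_index))
print("Py_ssize_t : %d bits, max index %d" % (ssize_bits, max_ssize_index))
```

On a typical 64-bit Linux build the first type is narrower than the second, so a loop counter declared as `int` cannot address every element of a sufficiently long sequence.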
FYI - I'm compiling from source on a 64-bit Linux box. Running Python 2.5.4,
with numpy 1.3.0, and rpy2 2.0.5.
Thanks again for a great tool.
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2009-07-16 02:22
Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: lgautier (lgautier)
Date: 2009-07-01 10:22
Message:
Thanks for the bug report.
I have not been using the numpy capability as extensively as I have been
using other features.
Let me re-write one of your code snippets for the sake of simplicity:
{{{
import rpy2.robjects as ro
r_numeric = ro.baseNameSpaceEnv['numeric']
x = r_numeric(2e7)
}}}
Second, the constructor you are using is designed to work on any
iterable sequence.
There are numpy-specific features:
http://rpy.sourceforge.net/rpy2/doc/html/numpy.html#from-numpy-to-rpy2
Last, did you try calling either the Python or the R garbage collector?
It has been observed in the past that this could improve things.
If any of the following improves the memory usage, you are experiencing a
previously observed behaviour (where Python's garbage collector is not
triggered while the process has grown in size - a suspected cause is that it
does not know about objects created by the embedded R).
{{{
import gc
import rpy2.robjects as ro
gc.collect()                  # run Python's garbage collector
ro.baseNameSpaceEnv['gc']()   # run R's garbage collector
}}}
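To tell whether a collection actually shrinks the process, the resident set size can be read before and after. The helper below is a hypothetical sketch that relies on `/proc/self/status`, so it only works on Linux and returns `None` elsewhere. It exercises only Python's collector; R's `gc()` would be called separately.

```python
import gc

def current_rss_kib():
    """Return the resident set size in KiB from /proc (Linux only), else None."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])   # the kernel reports this in kB
    except (IOError, OSError):
        pass
    return None

before = current_rss_kib()
big = [float(i) for i in range(500000)]   # allocate something sizeable
del big
gc.collect()                              # Python side only
after = current_rss_kib()

print("RSS before: %s KiB, after collect: %s KiB" % (before, after))
```

If the figure stays high after both collectors have run, the memory is either still referenced or has been freed but not returned to the OS by the allocator.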
Note: thanks for pointing out that the index was an int instead of
Py_ssize_t. I just fixed it in the trunk.
----------------------------------------------------------------------
_______________________________________________
rpy-list mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rpy-list