Bugs item #2814892, was opened at 2009-06-30 22:42
Message generated for change (Settings changed) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=453021&aid=2814892&group_id=48422
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Source
Group: rpy2
>Status: Closed
Resolution: None
Priority: 5
Private: Yes
Submitted By: batripler (batripler)
Assigned to: lgautier (lgautier)
Summary: memory leak in SexpVector

Initial Comment:
Hi Laurent,

Thank you for creating a wonderfully useful piece of software. I've been using it for a few weeks now, and I think I have uncovered a relatively serious problem in rpy2.rinterface.SexpVector, which is at the heart of the system. Here is a manifestation of the problem. Perhaps I am doing something wrong.

Start a new python session and run:

{{{
import numpy
x = numpy.zeros(2e7)
}}}

You can modify the size of the array. Also, depending on the numpy defaults on your machine, the memory consumption will vary. On my machine a double is 8 bytes, times 2e7 = ~150MB. I see the process at ~162MB due to the Python interpreter footprint.

Now, kill this session, start a new one, and run the following:

{{{
import rpy2.robjects, rpy2.rinterface as rint
reval = rint.baseNameSpaceEnv['eval']
rparse = rint.baseNameSpaceEnv['parse']
x = reval(rparse(text=rint.StrSexpVector(["numeric(2e7)"])))
}}}

In this case, we are just creating a REALSXP vector on the R side. I see the process coming in at 203MB, which is reasonable given that both the Python and R interpreters are now running. Again, this is assuming that every element of the REALSXP vector is 8 bytes.

Now, finally, in a new Python process, let's create an array on the Python side and copy it over to the R side:

{{{
import numpy, rpy2.robjects, rpy2.rinterface as rint
x = numpy.zeros(2e7)
y = rint.SexpVector(x, rint.REALSXP)
}}}

I would expect the max size of this process to be the sum of the previous two. It contains both the Python object and an equivalently large R object. It comes in at a whopping 950MB!!
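As a rough check of the arithmetic above, the following sketch compares the size of 2e7 doubles stored contiguously (as in a numpy array or the payload of a REALSXP vector) with the approximate size of the same values held as separate Python float objects in a list. It is written for Python 3 with the standard library only; the function names are hypothetical, and the boxed estimate is an assumption about one possible contributor to the gap, not a diagnosis confirmed in this thread.

```python
import struct
import sys

N_ELEMENTS = 20000000                    # 2e7, the vector length used in the report
DOUBLE_BYTES = struct.calcsize("d")      # size of a C double on this platform
POINTER_BYTES = struct.calcsize("P")     # size of one list slot (a pointer)
FLOAT_OBJECT_BYTES = sys.getsizeof(1.0)  # size of one boxed Python float


def mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024.0 * 1024.0)


def packed_bytes(n):
    """Bytes needed for n doubles stored contiguously."""
    return n * DOUBLE_BYTES


def boxed_bytes(n):
    """Approximate bytes for n doubles held as separate float objects in a list."""
    return n * (FLOAT_OBJECT_BYTES + POINTER_BYTES)


if __name__ == "__main__":
    print("packed: %.0f MiB" % mib(packed_bytes(N_ELEMENTS)))
    print("boxed:  %.0f MiB" % mib(boxed_bytes(N_ELEMENTS)))
```

With 8-byte doubles the packed figure is about 153 MiB, in line with the ~150MB quoted above. The boxed figure depends on the interpreter build, and numpy scalar objects may be larger than plain Python floats, so it should be read as an order-of-magnitude estimate only.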
Incidentally, if I run the following code:

{{{
import numpy
x = numpy.zeros(2e7)
y = list(x)
}}}

... it weighs in at a hefty 985MB.

Now, I had a look at your code in rinterface.c:newSEXP, and noticed that you are iterating using the Python sequence protocol and creating object intermediaries. I don't mind the temporary memory bloat (though it would be much faster and leaner to special-case numpy arrays and avoid the move to object space and back), but somehow these intermediaries are also hanging around. Either that, or the allocator is, for some reason, not returning space to the OS. A few of these conversions and our process is toast.

Also, for proper 64-bit compatibility, the index variable "i" should be Py_ssize_t.

FYI - I'm compiling from source on a 64-bit Linux box, running Python 2.5.4 with numpy 1.3.0 and rpy2 2.0.5.

Thanks again for a great tool.

----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2009-07-16 02:22

Message:
This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: lgautier (lgautier)
Date: 2009-07-01 10:22

Message:
Thanks for the bug report. I have not been using the numpy capability as extensively as I have been using other features.

Let me rewrite one of your code snippets for the sake of simplicity:

{{{
import rpy2.robjects as ro
r_numeric = ro.baseNameSpaceEnv['numeric']
x = r_numeric(2e7)
}}}

Second, the constructor you are using is designed to work on any iterable sequence. There are numpy-specific features:
http://rpy.sourceforge.net/rpy2/doc/html/numpy.html#from-numpy-to-rpy2

Last, did you try calling either the Python or the R garbage collector?
It has been observed in the past that this could improve things. If any of the following improves the memory usage, you are experiencing a previously observed behaviour (where Python's garbage collector does not run even though the process has grown in size; a suspected cause is that it does not know about objects created by the embedded R).

{{{
import gc
import rpy2.robjects as robjects
gc.collect()
robjects.baseNameSpaceEnv['gc']()
}}}

Note: thanks for pointing out that the index is an int instead of Py_ssize_t. I just fixed it in the trunk.

----------------------------------------------------------------------

_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list
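The garbage-collection check suggested in the thread can be turned into a measurement rather than an impression. The sketch below, written for Python 3 with the standard library only, records the resident set size of the process before allocating a large list of boxed floats, after allocating it, and after dropping it and calling gc.collect(). The helper names are hypothetical, and reading /proc/self/statm is a Linux-only assumption (the helper returns None elsewhere). The R-side call to gc through rpy2 is left out because it needs rpy2 installed.

```python
import gc
import os


def resident_bytes():
    """Current resident set size in bytes, or None when /proc is unavailable."""
    try:
        with open("/proc/self/statm") as fh:
            # Second field of statm is the resident size, counted in pages.
            resident_pages = int(fh.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def measure_release(n=1000000):
    """Allocate n boxed floats, drop them, collect, and report RSS at each step."""
    before = resident_bytes()
    boxed = [float(i) for i in range(n)]
    peak = resident_bytes()
    # Dropping the last reference frees the list by reference counting;
    # gc.collect() additionally reclaims any objects caught in reference cycles.
    del boxed
    unreachable = gc.collect()
    after = resident_bytes()
    return {"before": before, "peak": peak, "after": after,
            "unreachable": unreachable}


if __name__ == "__main__":
    for key, value in sorted(measure_release().items()):
        print(key, value)
```

If "after" stays close to "peak", the memory was freed inside the process but not returned to the OS, which is one of the two explanations raised in the initial comment. If it falls back toward "before", the allocator did release the pages.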