On 4 February 2008 at 20:45, Doran, Harold wrote:
| I have a program which reads in a very large data set, performs some
| analyses, and then repeats this process with another data set. As soon as the
| first set of analyses are complete, I remove the very large object and clean up
| to try and make memory available in order to run the second set of analyses.
| The process looks something like this:
| 
| 1) read in data set 1 and perform analyses
| rm(list=ls())
| gc()
| 2) read in data set 2 and perform analyses
| rm(list=ls())
| gc()
| ...
| 
| But, it appears that I am not making the memory that was consumed in step 1
| available back to the OS as R complains that it cannot allocate a vector of
| size X as the process tries to repeat in step 2.
| 
| So, I close and reopen R and then drop in the code to run the second
| analysis. When this is done, I close and reopen R and run the third analysis.
| 
| This is terribly inefficient. Instead I would rather just source in the R
| code and let the analyses run over night.
| 
| Is there a way that I can use gc() or some other function more efficiently
| rather than having to close and reopen R at each iteration?

I haven't found one. 

Every (trading) day I process batches of data with R, and the only reliable
way I have found is to use fresh R sessions.  Otherwise, the fragmented memory
will eventually result in the all-too-familiar 'cannot allocate X mb' for
rather small values of X relative to my total RAM. C'est la vie.

gc() seems to help somewhat, yet not 'sufficiently', so fresh sessions are
the alternative, and Rscript starts faster than the main R. Now, I happen to
be partial to littler [1], which starts even faster because we embed R
directly, so I use that (on Linux; I am not sure whether it can be built on
Windows).  Either one should help you with some batch files, giving you a way
to run overnight.  And once you start batching things, it is only a small
step to regain efficiency by parallel execution using something like MPI or
NWS.
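
As a rough illustration of the fresh-session approach, here is a minimal
sketch of a driver that starts one Rscript process per data set, so each
analysis begins with a clean heap and hands its memory back to the OS when
it exits. The function names and the file names in the usage comment are
hypothetical placeholders, not taken from the original program; it also
assumes your analysis script reads its input path from
commandArgs(trailingOnly = TRUE).

```r
# Hypothetical helper: run one analysis script in its own R process.
# 'script' and 'data_file' are placeholders for your own files.
run_in_fresh_session <- function(script, data_file,
                                 rscript = file.path(R.home("bin"), "Rscript")) {
  # system2() starts a separate process and waits for it to finish;
  # the memory that process used is released when it exits.
  status <- system2(rscript,
                    args = c("--vanilla", shQuote(script), shQuote(data_file)))
  as.integer(status)
}

# Hypothetical helper: run the same script once per data set, sequentially,
# and return the exit status of each run (0 means success).
run_batch <- function(script, data_files) {
  vapply(data_files,
         function(f) run_in_fresh_session(script, f),
         integer(1))
}

# Example call (file names are assumptions):
# statuses <- run_batch("analysis.R", c("data1.csv", "data2.csv"))
# failed   <- names(statuses)[statuses != 0]
```

The driver itself stays small in memory because the large objects only ever
exist in the child processes. A non-zero status flags a run that failed, so
an overnight batch can be checked in the morning.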

Hth, Dirk

[1] littler, by Jeff and myself, is the predecessor to Rscript. See either
        http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/LittleR
    or
        http://dirk.eddelbuettel.com/code/littler.html
    for more on littler, and feel free to email us.

-- 
Three out of two people have difficulties with fractions.

