Hi -

Admittedly, this may not be the most sophisticated memory profiling ever performed, but when using unix's top command, I'm noticing a notable memory leak when using R with a large matrix that has dimnames set. To allow people to reproduce the problem I'm seeing, I've added a small (< 50 lines) code snippet at the end of this email.

I'm seeing this problem on both a MacOS box using R v.2.5.1 and a Unix box (x86_64) running R v.2.5.0. The output from sessionInfo() for both machines is below.

What I'm seeing is this: when I create a 20k x 2k matrix that does not have any dimnames set, and I call a function (the f() function below) that makes a couple of local copies of subsets of the matrix and then returns the result of some statistical massaging, R works mostly fine (more on this below).

However, if I set the dimnames (currently commented out in the code snippet below), and then call from the R command interpreter:

    res <- sapply( 1:10, function(i) { cat(i, "\n"); f() } )
    gc()
    rm( list=ls() )
    gc()

unix's top command reports that R has a memory footprint of roughly 2 gig (1.2 gig on the MacOS box), although R's gc() command reports the following for this 'empty' instance of R:

    > gc()
               used (Mb) gc trigger  (Mb)  max used   (Mb)
    Ncells   236823 12.7     467875  25.0    467875   25.0
    Vcells   120446  1.0  109363282 834.4 155806232 1188.8
    >

As I said, if the matrix does not have the dimnames set, the same procedure produces the same output from R's gc() command, though unix's top command reports that R's memory footprint is actually >270 meg. I'm not sure if that's just a basal level of R's memory needs.

I see this on both OS's I'm using and both versions of R (v.2.5.x).

If I'm doing something wrong in my code below which is causing this issue, please let me know, but it's fairly vanilla code, so I'm not sure.

Thanks,

Peter Waltman

SessionInfo output:

Mac:

    > sessionInfo()
    R version 2.5.1 (2007-06-27)
    powerpc-apple-darwin8.9.1

    locale:
    C

    attached base packages:
    [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
    [7] "base"
    >

Unix:

    > sessionInfo()
    R version 2.5.0 (2007-04-23)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
    LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;
    LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
    LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;
    LC_IDENTIFICATION=C

    attached base packages:
    [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
    [7] "base"
    >

test.R:

    f <- function() {
        # pick 750 random columns, then 15 random rows within those columns
        my.cols <- sample( ncol( val ), 750 )
        my.r <- val[ sample( nrow( val ), 15 ), my.cols ]
        avg.rows <- apply( my.r, 2, mean, na.rm=TRUE )
        rm( my.r )
        gc()

        # subtract the column averages from every row of the selected columns
        my.r.all <- val[ , my.cols ]
        devs <- apply( my.r.all, 1, "-", avg.rows )
        rm( my.r.all )
        gc()

        apply( devs, 2, var, na.rm=TRUE )
    }

    val <- matrix( rnorm( (20000*2000) ), 20000, 2000 )#, dimnames= list( paste( "AT2G", 1:20000,sep="" ), paste( "AT2Gcol", 1:2000,sep="" ) ) )
    gc()
    #res <- sapply(1:10, function(i) f())  # --- works fine if dimnames aren't set
    # rm( list=ls() )
    #gc()

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
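
For anyone who wants a quicker check than the full 20k x 2k run, the comparison above could be scaled down roughly as follows. This is only a sketch under assumptions: the helper names make.val(), f2(), vcells.mb() and run.once() are hypothetical and not part of test.R, the matrix is 2000 x 200 rather than 20000 x 2000, and f2() takes the matrix as an argument instead of reading the global val. It only compares what gc() reports before and after; the process size shown by top still has to be checked separately, and a matrix this small may not show the effect at all.

```r
# Scaled-down sketch of the test.R experiment (assumed sizes: 2000 x 200).
# All helper names below are hypothetical and do not appear in test.R.

# Build the test matrix, optionally with the same style of dimnames as test.R.
make.val <- function(nr = 2000, nc = 200, with.dimnames = FALSE) {
  dn <- NULL
  if (with.dimnames) {
    dn <- list(paste("AT2G", 1:nr, sep = ""),
               paste("AT2Gcol", 1:nc, sep = ""))
  }
  matrix(rnorm(nr * nc), nr, nc, dimnames = dn)
}

# Same steps as f(), but with the matrix passed in explicitly.
f2 <- function(val, n.cols = 75, n.rows = 15) {
  my.cols <- sample(ncol(val), n.cols)
  my.r <- val[sample(nrow(val), n.rows), my.cols]
  avg.rows <- apply(my.r, 2, mean, na.rm = TRUE)
  devs <- apply(val[, my.cols], 1, "-", avg.rows)
  apply(devs, 2, var, na.rm = TRUE)
}

# Mb of Vcells currently in use, taken from the second column of gc().
vcells.mb <- function() gc()["Vcells", 2]

# Run the experiment once and report the change in gc()'s Vcells figure.
run.once <- function(with.dimnames) {
  before <- vcells.mb()
  val <- make.val(with.dimnames = with.dimnames)
  res <- sapply(1:3, function(i) f2(val))
  rm(val)
  after <- vcells.mb()
  list(dim = dim(res), delta.mb = after - before)
}
```

Calling run.once(FALSE) and then run.once(TRUE) in a fresh session, with top running alongside, gives the two cases to compare. If gc()'s figures match in both cases while top's do not, that would be consistent with what is described above.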