Re: [R] memory problems when combining randomForests
Dear Eleni,

> But if every time you remove a variable you pass some test data (i.e., data not used to train the model) and base the performance of the new, reduced model on the error rate in the confusion matrix for the test data, then this overfitting should not be an issue, right? (Unless of course you were referring to unsupervised learning.)

Yes and no. The problem can arise if you do this iteratively and then use the minimum value you obtain along the way as your estimate of the error rate. In such a case you should, instead, do a double cross-validation or bootstrap (i.e., estimate, via cross-validation or the bootstrap, the error rate of your complete procedure, variable selection included). Both Andy and collaborators on the one hand, and myself on the other, have done some further work on these issues:

Svetnik V, Liaw A, Tong C, Wang T: Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules. Multiple Classifier Systems, Fifth International Workshop, MCS 2004, Proceedings, 9–11 June 2004, Cagliari, Italy. Lecture Notes in Computer Science, Springer 2004, 3077:334-343.

Díaz-Uriarte R, Alvarez de Andrés S: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006, 7:3. http://www.biomedcentral.com/1471-2105/7/3

Best, R.

On Monday 31 July 2006 18:45, Eleni Rapsomaniki wrote:
> Hi Andy, I get a different order of importance for my variables depending on their order in the training data. Perhaps answering my own question, the change in importance rankings could be attributed to the fact that before passing my data to randomForest I impute the missing values randomly (using the combined distributions of pos+neg), so the data seen by RF are slightly different. Combining this with the fact that RF samples the data randomly, it makes sense to see different rankings. In a previous thread regarding simplifying variables, http://thread.gmane.org/gmane.comp.lang.r.general/6989/focus=6993, you say: "The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate becomes biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that the OOB error rate will keep decreasing while the error rate on an independent test set will be flat or increase)." But if every time you remove a variable you pass some test data (i.e., data not used to train the model) and base the performance of the new, reduced model on the error rate in the confusion matrix for the test data, then this overfitting should not be an issue, right? (Unless of course you were referring to unsupervised learning.) Best regards, Eleni Rapsomaniki, Birkbeck College, UK
-- Ramón Díaz-Uriarte Bioinformatics Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
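Ramón's suggestion of estimating the error of the complete procedure can be sketched as a small nested ("double") cross-validation around the RF variable selection. Everything below is an illustration only: the fold count, the "keep the top 10 variables" rule and the helper name nested.cv.error are assumptions, not code from the thread.

library(randomForest)

nested.cv.error <- function(x, y, k = 5, n.keep = 10) {
  folds <- sample(rep(1:k, length.out = nrow(x)))   # random outer folds
  err <- numeric(k)
  for (i in 1:k) {
    train <- folds != i
    ## variable selection is done inside the training fold only
    rf1  <- randomForest(x[train, , drop = FALSE], y[train])
    keep <- order(importance(rf1)[, 1], decreasing = TRUE)[1:n.keep]
    ## refit on the selected variables, still using only the training fold
    rf2  <- randomForest(x[train, keep, drop = FALSE], y[train])
    pred <- predict(rf2, x[!train, keep, drop = FALSE])
    err[i] <- mean(pred != y[!train])               # error on the held-out fold
  }
  mean(err)   # an honest estimate of the whole selection-plus-fit procedure
}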
Re: [R] memory problems when combining randomForests
Hello, I've just realised attachments are not allowed, so the data for the example in my previous message are:

pos.df=read.table("http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090", header=T)
neg.df=read.table("http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779", header=T)

And my last two questions (promise!): The first is related to the order of columns (i.e. explanatory variables). I get a different order of importance for my variables depending on their order in the training data. Is there a parameter I could fiddle with (e.g. ntree) to get a more stable importance order? And finally, since interactions are not implemented, is there another method I could use in R to find dependencies among categorical variables? (lm doesn't accept categorical variables.)

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK
Re: [R] memory problems when combining randomForests
Hi, Andy: What's the Jerry Friedman's ISLE? I googled it and did not find the paper on it. Could you give me a link, please? Thanks, Weiwei On 7/31/06, Eleni Rapsomaniki [EMAIL PROTECTED] wrote: Hello I've just realised attachments are not allowed, so the data for the example in my previous message is: pos.df=read.table( http://www.savefile.com/projects3.php?fid=6240314pid=847249key=119090;, header=T) neg.df=read.table( http://fs07.savefile.com/download.php?pid=847249fid=9829834key=362779;, header=T) And my last two questions (promise!): The first is related to the order of columns (ie. explanatory variables). I get different order of importance for my variables depending on their order in the training data. Is there a parameter I could fiddle with (e.g. ntree) to get a more stable importance order? And finally, since interactions are not implemented, is there another method I could use in R to find dependencies among categorical variables? (lm doesn't accept categorical variables). Many thanks Eleni Rapsomaniki Birkbeck College, UK __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory problems when combining randomForests
Hi Andy, I get different order of importance for my variables depending on their order in the training data. Perhaps answering my own question, the change in importance rankings could be attributed to the fact that before passing my data to randomForest I impute the missing values randomly (using the combined distributions of pos+neg), so the data seen by RF is slightly different. Then combining this with the fact that RF chooses data randomly it makes sense to see different rankings. In a previous thread regarding simplifying variables: http://thread.gmane.org/gmane.comp.lang.r.general/6989/focus=6993 you say: The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate become biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that, the OOB error rate will keep decreasing while error rate on an independent test set will be flat or increases) But if every time you remove a variable you pass some test data (ie data not used to train the model) and base the performance of the new, reduced model on the error rate on the confusion matrix for the test data, then this overfitting should not be an issue, right? (unless of course you were referring to unsupervised learning). Best regards Eleni Rapsomaniki Birkbeck College, UK __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory problems when combining randomForests [Broadcast]
It's the 5th paper on his web page. http://www-stat.stanford.edu/~jhf/ftp/isle.pdf http://www-stat.stanford.edu/~jhf/ftp/isle.pdf Cheers, Andy _ From: Weiwei Shi [mailto:[EMAIL PROTECTED] Sent: Monday, July 31, 2006 11:38 AM To: Eleni Rapsomaniki Cc: Liaw, Andy; r-help@stat.math.ethz.ch Subject: Re: [R] memory problems when combining randomForests [Broadcast] Hi, Andy: What's the Jerry Friedman's ISLE? I googled it and did not find the paper on it. Could you give me a link, please? Thanks, Weiwei On 7/31/06, Eleni Rapsomaniki [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Hello I've just realised attachments are not allowed, so the data for the example in my previous message is: pos.df=read.table( http://www.savefile.com/projects3.php?fid=6240314pid=847249key=119090 http://www.savefile.com/projects3.php?fid=6240314pid=847249key=119090;, header=T) neg.df=read.table( http://fs07.savefile.com/download.php?pid=847249fid=9829834key=362779 http://fs07.savefile.com/download.php?pid=847249fid=9829834key=362779;, header=T) And my last two questions (promise!): The first is related to the order of columns (ie. explanatory variables). I get different order of importance for my variables depending on their order in the training data. Is there a parameter I could fiddle with (e.g. ntree) to get a more stable importance order? And finally, since interactions are not implemented, is there another method I could use in R to find dependencies among categorical variables? (lm doesn't accept categorical variables). Many thanks Eleni Rapsomaniki Birkbeck College, UK __ R-help@stat.math.ethz.ch mailto:R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III -- -- [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory problems when combining randomForests
Found it from another paper: importance sample learning ensemble (ISLE) which originates from Friedman and Popescu (2003). On 7/31/06, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, Andy: What's the Jerry Friedman's ISLE? I googled it and did not find the paper on it. Could you give me a link, please? Thanks, Weiwei On 7/31/06, Eleni Rapsomaniki [EMAIL PROTECTED] wrote: Hello I've just realised attachments are not allowed, so the data for the example in my previous message is: pos.df=read.table(http://www.savefile.com/projects3.php?fid=6240314pid=847249key=119090 , header=T) neg.df=read.table(http://fs07.savefile.com/download.php?pid=847249fid=9829834key=362779 , header=T) And my last two questions (promise!): The first is related to the order of columns (ie. explanatory variables). I get different order of importance for my variables depending on their order in the training data. Is there a parameter I could fiddle with (e.g. ntree) to get a more stable importance order? And finally, since interactions are not implemented, is there another method I could use in R to find dependencies among categorical variables? (lm doesn't accept categorical variables). Many thanks Eleni Rapsomaniki Birkbeck College, UK __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory problems when combining randomForests
Hello again, The reason why I thought the order in which rows are passed to randomForest affects the error rate is because I get different results for different ways of splitting my positive/negative data. First get the data (attached with this email):

pos.df=read.table("C:/Program Files/R/rw2011/pos.df", header=T)
neg.df=read.table("C:/Program Files/R/rw2011/neg.df", header=T)
library(randomForest)
# The first 2 columns are explanatory variables (which incidentally are not
# discriminative at all if one looks at their distributions), the 3rd is the
# class (pos or neg)
train2test.ratio=8/10
min_len=min(nrow(pos.df), nrow(neg.df))
class_index=which(names(pos.df)=="class")  # is the same for neg.df
train_size=as.integer(min_len*train2test.ratio)

## Way 1
train.indicesP=sample(seq(1:nrow(pos.df)), size=train_size, replace=FALSE)
train.indicesN=sample(seq(1:nrow(neg.df)), size=train_size, replace=FALSE)
trainP=pos.df[train.indicesP,]
trainN=neg.df[train.indicesN,]
testP=pos.df[-train.indicesP,]
testN=neg.df[-train.indicesN,]
mydata.rf <- randomForest(x=rbind(trainP, trainN)[,-class_index],
  y=rbind(trainP, trainN)[,class_index],
  xtest=rbind(testP, testN)[,-class_index],
  ytest=rbind(testP, testN)[,class_index],
  importance=TRUE, proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

## Way 2
ind <- sample(2, min(nrow(pos.df), nrow(neg.df)), replace=TRUE,
  prob=c(train2test.ratio, (1-train2test.ratio)))
trainP=pos.df[ind==1,]
trainN=neg.df[ind==1,]
testP=pos.df[ind==2,]
testN=neg.df[ind==2,]
mydata.rf <- randomForest(x=rbind(trainP, trainN)[,-class_index],
  y=rbind(trainP, trainN)[,class_index],
  xtest=rbind(testP, testN)[,-class_index],
  ytest=rbind(testP, testN)[,class_index],
  importance=TRUE, proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

## Way 3
subset_start=1
subset_end=subset_start+train_size
train_index=seq(subset_start:subset_end)
trainP=pos.df[train_index,]
trainN=neg.df[train_index,]
testP=pos.df[-train_index,]
testN=neg.df[-train_index,]
mydata.rf <- randomForest(x=rbind(trainP, trainN)[,-class_index],
  y=rbind(trainP, trainN)[,class_index],
  xtest=rbind(testP, testN)[,-class_index],
  ytest=rbind(testP, testN)[,class_index],
  importance=TRUE, proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion
## end

The first 2 methods give me an abnormally low error rate (compared to what I get using the same data on a naiveBayes method) while the last one seems more realistic, but the difference in error rates is very significant. I need to use the last method to cross-validate subsets of my data sequentially (the first two methods use random rows throughout the length of the data), unless there is a better way to do it (?). Something must be very different between the first 2 methods and the last, but which is the correct one? I would greatly appreciate any suggestions on this!

Many Thanks
Eleni Rapsomaniki
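One way to see what separates the three "Ways" is to draw the training rows at random within each class, so that the split does not depend on how the file happens to be ordered. The sketch below is illustrative only: the helper name split.one and the seed are not from the thread, and it reuses pos.df, neg.df and class_index from the code above.

set.seed(1)   # only so the illustration is reproducible
split.one <- function(df, ratio = 8/10) {
  idx <- sample(nrow(df), size = floor(nrow(df) * ratio))   # random rows per class
  list(train = df[idx, ], test = df[-idx, ])
}
p <- split.one(pos.df)
n <- split.one(neg.df)
fit <- randomForest(x = rbind(p$train, n$train)[, -class_index],
  y = rbind(p$train, n$train)[, class_index],
  xtest = rbind(p$test, n$test)[, -class_index],
  ytest = rbind(p$test, n$test)[, class_index],
  keep.forest = FALSE)
fit$test$confusion

If the rows in pos.df or neg.df are sorted by anything related to the class structure, then taking the first train_size rows (Way 3) trains and tests on systematically different subsets, which by itself can explain a large gap in test error rates.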
Re: [R] memory problems when combining randomForests
Hi Andy, I'm using R (windows) version 2.1.1, randomForest version 4.15. ^ Never seen such a version... Ooops! I meant 4.5-15 I then save each tree to a file so I can combine them all afterwards. There are no memory issues when keep.forest=FALSE. But I think that's the bit I need for future predictions (right?). Yes, but what is your question? (Do you mean each *forest*, instead of each *tree*?) I mean the component of the object that is created from randomForest that has the name forest (and takes up all the memory!). A bit off the subject, but should the order at which at rows (ie. sets of explanatory variables) are passed to the randomForest function affect the result? I have noticed that if I pick a random unordered sample from my control data for training the error rate is much lower than if I a take an ordered sample. This remains true for all my cross-validation results. I'm not sure I understand. In randomForest() (as in other functions) variables are in columns, rather than rows, so are you talking about variables (columns) in different order or data (rows) in different order? Yes, sorry I confused you. I mean the order at which data (rows) is passed, not columns. Finally, I see from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter that there is a component in Breiman's implementation of randomForest that computes interactions between parameters. Has this been implemented in R yet? Many thanks for your time and help. Eleni Rapsomaniki __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory problems when combining randomForests
From: Eleni Rapsomaniki Hi Andy, I'm using R (windows) version 2.1.1, randomForest version 4.15. ^ Never seen such a version... Ooops! I meant 4.5-15 I then save each tree to a file so I can combine them all afterwards. There are no memory issues when keep.forest=FALSE. But I think that's the bit I need for future predictions (right?). Yes, but what is your question? (Do you mean each *forest*, instead of each *tree*?) I mean the component of the object that is created from randomForest that has the name forest (and takes up all the memory!). Yes, the forest can take up quite a bit of space. You might consider setting nodesize larger and see if that gives you sufficient space saving w/o compromising prediction performance. A bit off the subject, but should the order at which at rows (ie. sets of explanatory variables) are passed to the randomForest function affect the result? I have noticed that if I pick a random unordered sample from my control data for training the error rate is much lower than if I a take an ordered sample. This remains true for all my cross-validation results. I'm not sure I understand. In randomForest() (as in other functions) variables are in columns, rather than rows, so are you talking about variables (columns) in different order or data (rows) in different order? Yes, sorry I confused you. I mean the order at which data (rows) is passed, not columns. Then I'm not sure what you mean by difference in performance, even in cross-validation. Perhaps you can show some example? Each tree in the forest is grown on a random sample from the data, so the order of the row can not matter. Finally, I see from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter that there is a component in Breiman's implementation of randomForest that computes interactions between parameters. Has this been implemented in R yet? No. Prof. Breiman told me that is very experimental, and he wouldn't mind if that doesn't make it into the R package. Since I have other priorities for the package, that naturally went to the backburner. Cheers, Andy Many thanks for your time and help. Eleni Rapsomaniki This message was sent using IMP, the Internet Messaging Program. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
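A minimal way to check Andy's nodesize suggestion is to compare the size of the stored $forest component and the OOB error for two settings. The snippet below is only an illustration; iris is a stand-in for the real data and the nodesize values are arbitrary.

library(randomForest)
data(iris)                                                     # stand-in data set
f1 <- randomForest(Species ~ ., iris, nodesize = 1,  keep.forest = TRUE)
f2 <- randomForest(Species ~ ., iris, nodesize = 25, keep.forest = TRUE)
object.size(f1$forest)                                         # deep trees, larger forest
object.size(f2$forest)                                         # larger nodesize -> smaller trees
print(f1); print(f2)                                           # check the OOB error is still acceptable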
Re: [R] memory problems when combining randomForests [Broadcast]
I'm using R (windows) version 2.1.1, randomForest version 4.15. I call randomForest like this: my.rf=randomForest(x=train.df[,-response_index], y=train.df[,response_index], xtest=test.df[,-response_index], ytest=test.df[,response_index], importance=TRUE,proximity=FALSE, keep.forest=TRUE) (where train.df and test.df are my train and test data.frames and response_index is the column number specifiying the class) I then save each tree to a file so I can combine them all afterwards. There are no memory issues when keep.forest=FALSE. But I think that's the bit I need for future predictions (right?). I did check previous messages on memory issues, and thought that combining the trees afterwards would solve the problem. Since my cross-validation subsets give me a fairly stable error-rate, I suppose I could just use a randomForest trained on just a subset of my data. But would I not be wasting data this way? A bit off the subject, but should the order at which at rows (ie. sets of explanatory variables) are passed to the randomForest function affect the result? I have noticed that if I pick a random unordered sample from my control data for training the error rate is much lower than if I a take an ordered sample. This remains true for all my cross-validation results. I'm sorry for my many questions. Many Thanks Eleni Rapsomaniki __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory problems when combining randomForests
From: Eleni Rapsomaniki I'm using R (windows) version 2.1.1, randomForest version 4.15. ^ Never seen such a version... I call randomForest like this: my.rf=randomForest(x=train.df[,-response_index], y=train.df[,response_index], xtest=test.df[,-response_index], ytest=test.df[,response_index], importance=TRUE,proximity=FALSE, keep.forest=TRUE) (where train.df and test.df are my train and test data.frames and response_index is the column number specifiying the class) I then save each tree to a file so I can combine them all afterwards. There are no memory issues when keep.forest=FALSE. But I think that's the bit I need for future predictions (right?). Yes, but what is your question? (Do you mean each *forest*, instead of each *tree*?) I did check previous messages on memory issues, and thought that combining the trees afterwards would solve the problem. Since my cross-validation subsets give me a fairly stable error-rate, I suppose I could just use a randomForest trained on just a subset of my data. But would I not be wasting data this way? Perhaps, but see Jerry Friedman's ISLE, where he argued that RF with very small trees grown on small random samples can give even better results some of the times. A bit off the subject, but should the order at which at rows (ie. sets of explanatory variables) are passed to the randomForest function affect the result? I have noticed that if I pick a random unordered sample from my control data for training the error rate is much lower than if I a take an ordered sample. This remains true for all my cross-validation results. I'm not sure I understand. In randomForest() (as in other functions) variables are in columns, rather than rows, so are you talking about variables (columns) in different order or data (rows) in different order? Andy I'm sorry for my many questions. Many Thanks Eleni Rapsomaniki This message was sent using IMP, the Internet Messaging Program. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
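The ISLE idea Andy refers to (many small trees grown on small random samples) can be roughly approximated in randomForest with the sampsize and nodesize arguments. The values below are made up, and train.x/train.y are placeholders rather than objects from the thread.

rf.small <- randomForest(x = train.x, y = train.y,
  ntree = 500,
  sampsize = 1000,   # rows drawn for each tree (assumed value)
  nodesize = 50)     # larger terminal nodes, hence much smaller trees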
Re: [R] memory problems when combining randomForests [Broadcast]
You need to give us more details, like how you call randomForest, versions of the package and R itself, etc. Also, see if this helps you: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html Andy From: Eleni Rapsomaniki Dear all, I am trying to train a randomForest using all my control data (12,000 cases, ~ 20 explanatory variables, 2 classes). Because of memory constraints, I have split my data into 7 subsets and trained a randomForest for each, hoping that using combine() afterwards would solve the memory issue. Unfortunately, combine() still runs out of memory. Is there anything else I can do? (I am not using the formula version) Many Thanks Eleni Rapsomaniki __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
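For the original question (growing forests on subsets and merging them), the intended pattern with combine() looks roughly like this. It is a sketch only: "chunks" is an assumed list of data-frame subsets, "class" and new.data are placeholder names, and each forest must be grown with keep.forest=TRUE or there is nothing to combine.

library(randomForest)
forests <- lapply(chunks, function(d)
  randomForest(class ~ ., data = d, ntree = 100, keep.forest = TRUE))
big.rf <- do.call(combine, forests)    # e.g. 7 chunks of 100 trees -> one forest of 700 trees
pred <- predict(big.rf, newdata = new.data)

Note that the combined object no longer has a meaningful OOB error, so performance has to be judged on a separate test set.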
Re: [R] Memory problems with large dataset in rpart
Looks like you have missed the section in the rw-FAQ entitled "2.9 There seems to be a limit on the memory it uses!" and not set --max-mem-size (which defaults to 1Gb on your system). However, it looks like your problem is memory fragmentation, and trying to run 1Gb tasks in a 2Gb address space is intrinsically a problem to which the only solution is a 64-bit version of R. BTW, /3GB has nothing whatsoever to do with `swap files': if both OS and application are configured correctly it increases the user address space *for that process* to 3GB (whereas swap space is shared between processes).

On Tue, 18 Oct 2005 [EMAIL PROTECTED] wrote:

Dear helpers, I am a Dutch student from the Erasmus University. For my Bachelor thesis I have written a script in R using boosting by means of classification and regression trees. This script uses the predefined function rpart. My input file consists of about 4000 vectors, each having 2210 dimensions. In the third iteration R complains of a lack of memory, although in each iteration every variable is removed from memory; the first two iterations run without any problems. My computer runs on Windows XP and has 1 gigabyte of internal memory. I tried to let R use more memory by reconfiguring the swap files as mentioned in the FAQ (/3GB), but I didn't succeed in making this work. The command round(memory.limit()/1048576.0, 2) gives 1023.48. If such an increase of memory cannot succeed, perhaps the size of the rpart object could be reduced by not storing unnecessary information. The rpart function call is (the FALSE arguments are an attempt to reduce the size of the fit object):

fit <- rpart(price ~ ., data = trainingset,
  control = rpart.control(maxdepth = 2, cp = 0.001),
  model = FALSE, x = FALSE, y = FALSE)

This fit object is later called in 2 predict functions, for example:

predict(fit, newdata = sample)

Can anybody please help me by letting R use more memory (for example swap), or can anybody help me reduce the size of the fit object?

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
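In practice the two suggestions translate into something like the following sketch; the path and the 1800M value are illustrative, and the rm()/gc() step only makes the clean-up between boosting iterations explicit.

## start R with a larger limit (rw-FAQ 2.9), e.g.
##   Rgui.exe --max-mem-size=1800M
## then keep each rpart fit as small as possible and free it when done:
library(rpart)
fit <- rpart(price ~ ., data = trainingset,
  control = rpart.control(maxdepth = 2, cp = 0.001),
  model = FALSE, x = FALSE, y = FALSE)
pred <- predict(fit, newdata = sample)
rm(fit); gc()   # release the tree before growing the next one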
Re: [R] Memory problems
R does use virtual memory, and memory.size() (Windows only) is documented to report the usage, not change the limit. Please read that help page more carefully. If you are not already doing so, try arima0 rather than arima. And do see the posting guide! On Tue, 22 Feb 2005, Konstantinos Kleisouris wrote: I use R to do some ARIMA forecasting and R runs out of memory. The problem is that I have 20160 samples(which are quite alot) and when I try to fit the model it runs out of memory. I tried with memory.size() to change the limit, but it wouldn't work. Is there anything you can suggest? Is it possible R can use virtual memory? PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Memory problems
Looking at the posting guide will increase the chance of getting a helpful response from this list. No one knows what kind of operating system you are running: is it Windows, MacOS or Linux (32 or 64 bit)? Memory-related problems are reported daily, so it would be well worth browsing the help archive (it has an efficient search facility); some recent suggestions on memory limitations can be found there. Thomas

> I use R to do some ARIMA forecasting and R runs out of memory. The problem is that I have 20160 samples (which is quite a lot) and when I try to fit the model it runs out of memory. I tried with memory.size() to change the limit, but it wouldn't work. Is there anything you can suggest? Is it possible R can use virtual memory?
Re: [R] Memory Problems in R
There is a limit on how long a single vector can be, and I think it's 2GB (even on 64-bit platforms). Not sure on how the gc trigger is set -roger Scott Gilpin wrote: Hello everyone - I have a couple of questions about memory management of large objects. Thanks in advance for your response. I'm running R version 1.9.1 on solaris 8, compiled as a 32 bit app. My system has 12.0 GB of memory, with usually ~ 11GB free. I checked system limits using ulimit, and there is nothing set that would limit the maximum amount of memory for a process (with the exception of an 8MB stack size). I've also checked the amount of memory available to R using mem.limits(), and there is no limit set. I'm running into two problems. The first is the error cannot allocate vector of size X - I know this has been discussed several times on this mailing list, but it usually seems the user does not have enough memory on their system, or does not have the memory limits set correctly. I don't believe this is the case in this situation. I verified that I don't have any objects in memory when R starts up, and that memory limits are set to NA. Here is some output: ls() character(0) mem.limits() nsize vsize NANA gc() used (Mb) gc trigger (Mb) Ncells 432197 11.6 531268 14.2 Vcells 116586 0.9 786432 6.0 v-rep(0,268435431) Error: cannot allocate vector of size 2097151 Kb v-rep(0,268435430) object.size(v) [1] 2147483468 gc() used (Mb) gc trigger (Mb) Ncells432214 11.6 741108 19.8 Vcells 268552029 2048.9 268939773 2051.9 Does R have a limit set on the size of an object that it will allocate? I know that the entire application will only be able to use 4GB of memory (because it's only 32bit), but I haven't found anything in the R documentation or the help lists that indicates there is maximum on the size of an object. I understand there will be problems if an object is greater than 2GB and needs to be copied - but will R limit the creation of such an object? It's also my understanding that the garbage collector won't move objects and this may cause memory to become fragmented - but I'm seeing these issues on startup when there are no objects in memory. My second problem is with matrices and the garbage collector, and the limits it sets for gc trigger after a matrix is created. When I create a vector of approximately 500MB, R sets the gc trigger to be slightly above this amount. The gc trigger also seems to correspond to the process size (as output by top). When I create a matrix of approximately 500MB, R sets the gc trigger to be roughly 3 times the size of the matrix (and the process size is ~ 1.5GB). Therefor, when I try to create larger matrices, where 3x the size of the matrix is greater than 4GB, R gives me an error. Is there anything I can do to create large matrices? Or do I have to manipulate large objects as a vector? 
Output from the 3 different scenarios is below:

1) Can't create a matrix, but can create a vector of the same length:

[Previously saved workspace restored]
> m <- matrix(rep(0, 25000*10000), nrow=10000)
Error: cannot allocate vector of size 1953125 Kb
> v <- rep(0, 25000*10000)
> object.size(v)/1024
[1] 1953125

2) gc trigger is set slightly higher than the size of the vector:

> ls()
character(0)
> mem.limits()
nsize vsize
   NA    NA
> gc()
         used (Mb) gc trigger (Mb)
Ncells 432197 11.6     531268 14.2
Vcells 116586  0.9     786432  6.0
> v <- rep(0, (2510)*(25000))
> object.size(v)
[1] 5.02e+08
> gc()
           used  (Mb) gc trigger  (Mb)
Ncells   432210  11.6     667722  17.9
Vcells 62866589 479.7   63247172 482.6

3) gc trigger is set ~ 3x the size of the matrix:

> ls()
character(0)
> mem.limits()
nsize vsize
   NA    NA
> gc()
         used (Mb) gc trigger (Mb)
Ncells 432197 11.6     531268 14.2
Vcells 116586  0.9     786432  6.0
> A <- matrix(rep(0, (2510)*(25000)), nrow=(2510), ncol=(25000))
> object.size(A)
[1] 502000120
> gc()
           used  (Mb) gc trigger   (Mb)
Ncells   432213  11.6     741108   19.8
Vcells 62866590 479.7  188640940 1439.3
Re: [R] Memory Problems in R
On Wed, 18 Aug 2004, Roger D. Peng wrote: There is a limit on how long a single vector can be, and I think it's 2GB (even on 64-bit platforms). Not sure on how the gc trigger is set There is a limit of R_SIZE_T_MAX bytes, but that is defined as ULONG_MAX which should be 4GB-1 on a 32-bit platform, and much more on a 64-bit platform. The example works on a 64-bit platform, which demonstrates that there is no 2GB limit there. If you hit the length limit, the message is of the form cannot allocate vector of length ... Looking at the code in memory.c it seems that if (size = (LONG_MAX / sizeof(VECREC)) - sizeof(SEXPREC_ALIGN) || (s = malloc(sizeof(SEXPREC_ALIGN) + size * sizeof(VECREC))) == NULL) { /* reset the vector heap limit */ R_VSize = old_R_VSize; errorcall(R_NilValue, cannot allocate vector of size %lu Kb, (size * sizeof(VECREC))/1024); } has a limit of LONG_MAX bytes for a vector. I think that is unintentional, and you might like to try ULONG_MAX there and re-compile. But it really doesn't make much difference as there is very little you can do with an object taking up more than half the maximum memory size except access bits of it (and that is what DBMSes are for). A few comments: 1) Of course R does have objects in memory, 12.5Mb of them according to gc. You are not starting with a clean slate. Hopefully malloc has allocated them in a compact group. 2) Solaris has been a 64-bit OS for at least 7 years and you really should be using a 64-bit build of R if you plan on exceeding 1Gb. 3) To create a matrix efficiently, create a vector and assign a dim. I gave an example on R-help yesterday, so please check the archives. matrix() makes a copy of the data and so needs double the space you are thinking it does. Take a look at the source code: PROTECT(snr = allocMatrix(TYPEOF(vals), nr, nc)); if(lendat) { if (isVector(vals)) copyMatrix(snr, vals, byrow); else copyListMatrix(snr, vals, byrow); 4) The source code is the documentation here. I suspect no one person knows all the details. Scott Gilpin wrote: Hello everyone - I have a couple of questions about memory management of large objects. Thanks in advance for your response. I'm running R version 1.9.1 on solaris 8, compiled as a 32 bit app. My system has 12.0 GB of memory, with usually ~ 11GB free. I checked system limits using ulimit, and there is nothing set that would limit the maximum amount of memory for a process (with the exception of an 8MB stack size). I've also checked the amount of memory available to R using mem.limits(), and there is no limit set. I'm running into two problems. The first is the error cannot allocate vector of size X - I know this has been discussed several times on this mailing list, but it usually seems the user does not have enough memory on their system, or does not have the memory limits set correctly. I don't believe this is the case in this situation. I verified that I don't have any objects in memory when R starts up, and that memory limits are set to NA. Here is some output: ls() character(0) mem.limits() nsize vsize NANA gc() used (Mb) gc trigger (Mb) Ncells 432197 11.6 531268 14.2 Vcells 116586 0.9 786432 6.0 v-rep(0,268435431) Error: cannot allocate vector of size 2097151 Kb v-rep(0,268435430) object.size(v) [1] 2147483468 gc() used (Mb) gc trigger (Mb) Ncells432214 11.6 741108 19.8 Vcells 268552029 2048.9 268939773 2051.9 Does R have a limit set on the size of an object that it will allocate? 
I know that the entire application will only be able to use 4GB of memory (because it's only 32bit), but I haven't found anything in the R documentation or the help lists that indicates there is maximum on the size of an object. I understand there will be problems if an object is greater than 2GB and needs to be copied - but will R limit the creation of such an object? It's also my understanding that the garbage collector won't move objects and this may cause memory to become fragmented - but I'm seeing these issues on startup when there are no objects in memory. My second problem is with matrices and the garbage collector, and the limits it sets for gc trigger after a matrix is created. When I create a vector of approximately 500MB, R sets the gc trigger to be slightly above this amount. The gc trigger also seems to correspond to the process size (as output by top). When I create a matrix of approximately 500MB, R sets the gc trigger to be roughly 3 times the size of the matrix (and the process size is ~ 1.5GB). Therefor, when I try to create larger matrices, where 3x the
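Point 3 of the reply, in code: building the big matrix by allocating a vector once and then assigning its dim avoids the extra copy that matrix(rep(0, ...), ...) makes. The sizes below are the ones from the post; this is an illustration, not a claim about the exact peak memory use.

A <- numeric(2510 * 25000)     # a single ~502 MB allocation
dim(A) <- c(2510, 25000)       # adds the dim attribute; the data are not duplicated
## compare with
## A <- matrix(rep(0, 2510 * 25000), nrow = 2510, ncol = 25000)
## which first builds the 502 MB vector and then copies it inside matrix()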
RE: [R] memory problems with lm
Can you show us the output of str(eff.fro)? Do you have other things in the global environment or the search path that's taking up memory? What does gc() say? Andy From: Adrian Dragulescu Hello list, I've seen the recent discussions documenting problems with lm. I have encountered the following problem. I use WinXP Pro with service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM. eff.fro std.dev mean NSTRDSP 7.403749e-01 1.215686e-01 CPFGEP 9.056763e+00 1.815686e+00 WSWOLF 4.703588e+05 1.112832e+05 NPILGRIM 1.017640e+06 2.134335e+05 WSNMILE 1.367312e+07 1.892021e+06 WSHIDESL 1.830811e+07 1.892021e+06 reg - lm(log(mean) ~ log(std.dev), data=eff.fro) Error in model.matrix.default(mt, mf, contrasts) : cannot allocate vector of length 1074790452 log(eff.fro$mean) [1] -2.1072763 0.5964635 11.6198339 12.2710808 14.4531561 [6] 14.4531561 reg - lm(log(eff.fro$mean) ~ log(eff.fro$std.dev)) Error: cannot allocate vector of size 3360077 Kb lef - log(eff.fro) lef std.dev mean NSTRDSP -0.3005986 -2.1072763 CPFGEP2.2035117 0.5964635 WSWOLF 13.0612512 11.6198339 NPILGRIM 13.8329973 12.2710808 WSNMILE 16.4309427 14.4531561 WSHIDESL 16.7228546 14.4531561 lef - log(eff.fro) reg - lm(lef$mean ~ lef$std.dev) Here the my computer completely crashed. A window poped-up and said memory problem at address ..., and if I want to debug. I ran the same code one more time, and it worked but it did not work how I wanted (where is the slope?): reg - lm(lef$mean ~ lef$std.dev) reg Call: lm(formula = lef$mean ~ lef$std.dev) Coefficients: (Intercept) 8.548 summary(reg) Call: lm(formula = lef$mean ~ lef$std.dev) Residuals: 1 2 3 4 5 6 -10.655 -7.951 3.072 3.723 5.905 5.905 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept)8.548 2.9992.85 0.0358 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 7.346 on 5 degrees of freedom I ran again: reg - lm(log(mean) ~ log(std.dev), data=eff.fro) and I get the pop-up: The instruction at 0x6b4c45a5 referenced memory at 0x0032374a. The memory could not be read. Click OK to terminate the program. Any ideas? Thank you, Adrian __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Notice: This e-mail message, together with any attachments,...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] memory problems with lm
This may or may not be the same problem (which is already solved). But please read the section on BUGS in the R FAQ and set up a reproducible example. Then try out the current version of r-patched (one dated tomorrow or later, to be safe) and see if the problem recurs. If it does, please file a bug report. My guess is that eff.fro$std.dev is a 1D array (use dim or str to find out), and you did not intend that. On Thu, 29 Apr 2004, Adrian Dragulescu wrote: Hello list, I've seen the recent discussions documenting problems with lm. I have encountered the following problem. I use WinXP Pro with service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM. eff.fro std.dev mean NSTRDSP 7.403749e-01 1.215686e-01 CPFGEP 9.056763e+00 1.815686e+00 WSWOLF 4.703588e+05 1.112832e+05 NPILGRIM 1.017640e+06 2.134335e+05 WSNMILE 1.367312e+07 1.892021e+06 WSHIDESL 1.830811e+07 1.892021e+06 reg - lm(log(mean) ~ log(std.dev), data=eff.fro) Error in model.matrix.default(mt, mf, contrasts) : cannot allocate vector of length 1074790452 log(eff.fro$mean) [1] -2.1072763 0.5964635 11.6198339 12.2710808 14.4531561 [6] 14.4531561 reg - lm(log(eff.fro$mean) ~ log(eff.fro$std.dev)) Error: cannot allocate vector of size 3360077 Kb lef - log(eff.fro) lef std.dev mean NSTRDSP -0.3005986 -2.1072763 CPFGEP2.2035117 0.5964635 WSWOLF 13.0612512 11.6198339 NPILGRIM 13.8329973 12.2710808 WSNMILE 16.4309427 14.4531561 WSHIDESL 16.7228546 14.4531561 lef - log(eff.fro) reg - lm(lef$mean ~ lef$std.dev) Here the my computer completely crashed. A window poped-up and said memory problem at address ..., and if I want to debug. I ran the same code one more time, and it worked but it did not work how I wanted (where is the slope?): reg - lm(lef$mean ~ lef$std.dev) reg Call: lm(formula = lef$mean ~ lef$std.dev) Coefficients: (Intercept) 8.548 summary(reg) Call: lm(formula = lef$mean ~ lef$std.dev) Residuals: 1 2 3 4 5 6 -10.655 -7.951 3.072 3.723 5.905 5.905 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept)8.548 2.9992.85 0.0358 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 7.346 on 5 degrees of freedom I ran again: reg - lm(log(mean) ~ log(std.dev), data=eff.fro) and I get the pop-up: The instruction at 0x6b4c45a5 referenced memory at 0x0032374a. The memory could not be read. Click OK to terminate the program. Any ideas? Thank you, Adrian __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] memory problems with lm
If I enforce the variables to be numeric it works fine. str(eff.fro) `data.frame': 6 obs. of 2 variables: $ std.dev: num [, 1:6] 7.40e-01 9.06e+00 4.70e+05 1.02e+06 1.37e+07 ... ..- attr(*, dimnames)=List of 1 .. ..$ : chr NSTRDSP CPFGEP WSWOLF NPILGRIM ... $ mean : num 1.22e-01 1.82e+00 1.11e+05 2.13e+05 1.89e+06 ... gc() used (Mb) gc trigger (Mb) Ncells 578941 15.51166886 31.2 Vcells 589444 4.52377385 18.2 eff.fro std.dev mean NSTRDSP 7.403749e-01 1.215686e-01 CPFGEP 9.056763e+00 1.815686e+00 WSWOLF 4.703588e+05 1.112832e+05 NPILGRIM 1.017640e+06 2.134335e+05 WSNMILE 1.367312e+07 1.892021e+06 WSHIDESL 1.830811e+07 1.892021e+06 reg - lm(log(as.numeric(mean)) ~ log(as.numeric(std.dev)), data=eff.fro) reg Call: lm(formula = log(as.numeric(mean)) ~ log(as.numeric(std.dev)), data = eff.fro) Coefficients: (Intercept) log(as.numeric(std.dev)) -1.63680.9864 Adrian On Thu, 29 Apr 2004, Liaw, Andy wrote: Can you show us the output of str(eff.fro)? Do you have other things in the global environment or the search path that's taking up memory? What does gc() say? Andy From: Adrian Dragulescu Hello list, I've seen the recent discussions documenting problems with lm. I have encountered the following problem. I use WinXP Pro with service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM. eff.fro std.dev mean NSTRDSP 7.403749e-01 1.215686e-01 CPFGEP 9.056763e+00 1.815686e+00 WSWOLF 4.703588e+05 1.112832e+05 NPILGRIM 1.017640e+06 2.134335e+05 WSNMILE 1.367312e+07 1.892021e+06 WSHIDESL 1.830811e+07 1.892021e+06 reg - lm(log(mean) ~ log(std.dev), data=eff.fro) Error in model.matrix.default(mt, mf, contrasts) : cannot allocate vector of length 1074790452 log(eff.fro$mean) [1] -2.1072763 0.5964635 11.6198339 12.2710808 14.4531561 [6] 14.4531561 reg - lm(log(eff.fro$mean) ~ log(eff.fro$std.dev)) Error: cannot allocate vector of size 3360077 Kb lef - log(eff.fro) lef std.dev mean NSTRDSP -0.3005986 -2.1072763 CPFGEP2.2035117 0.5964635 WSWOLF 13.0612512 11.6198339 NPILGRIM 13.8329973 12.2710808 WSNMILE 16.4309427 14.4531561 WSHIDESL 16.7228546 14.4531561 lef - log(eff.fro) reg - lm(lef$mean ~ lef$std.dev) Here the my computer completely crashed. A window poped-up and said memory problem at address ..., and if I want to debug. I ran the same code one more time, and it worked but it did not work how I wanted (where is the slope?): reg - lm(lef$mean ~ lef$std.dev) reg Call: lm(formula = lef$mean ~ lef$std.dev) Coefficients: (Intercept) 8.548 summary(reg) Call: lm(formula = lef$mean ~ lef$std.dev) Residuals: 1 2 3 4 5 6 -10.655 -7.951 3.072 3.723 5.905 5.905 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept)8.548 2.9992.85 0.0358 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 7.346 on 5 degrees of freedom I ran again: reg - lm(log(mean) ~ log(std.dev), data=eff.fro) and I get the pop-up: The instruction at 0x6b4c45a5 referenced memory at 0x0032374a. The memory could not be read. Click OK to terminate the program. Any ideas? Thank you, Adrian __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Notice: This e-mail message, together with any attachments, contains information of Merck Co., Inc. 
RE: [R] memory problems with lm
I believe Prof. Ripley is right. The problem is $ std.dev: num [, 1:6] 7.40e-01 9.06e+00 4.70e+05 1.02e+06 1.37e+07 ... ..- attr(*, dimnames)=List of 1 which looks like an array, rather than a vector. Andy From: Adrian Dragulescu [mailto:[EMAIL PROTECTED] If I enforce the variables to be numeric it works fine. str(eff.fro) `data.frame': 6 obs. of 2 variables: $ std.dev: num [, 1:6] 7.40e-01 9.06e+00 4.70e+05 1.02e+06 1.37e+07 ... ..- attr(*, dimnames)=List of 1 .. ..$ : chr NSTRDSP CPFGEP WSWOLF NPILGRIM ... $ mean : num 1.22e-01 1.82e+00 1.11e+05 2.13e+05 1.89e+06 ... gc() used (Mb) gc trigger (Mb) Ncells 578941 15.51166886 31.2 Vcells 589444 4.52377385 18.2 eff.fro std.dev mean NSTRDSP 7.403749e-01 1.215686e-01 CPFGEP 9.056763e+00 1.815686e+00 WSWOLF 4.703588e+05 1.112832e+05 NPILGRIM 1.017640e+06 2.134335e+05 WSNMILE 1.367312e+07 1.892021e+06 WSHIDESL 1.830811e+07 1.892021e+06 reg - lm(log(as.numeric(mean)) ~ log(as.numeric(std.dev)), data=eff.fro) reg Call: lm(formula = log(as.numeric(mean)) ~ log(as.numeric(std.dev)), data = eff.fro) Coefficients: (Intercept) log(as.numeric(std.dev)) -1.63680.9864 Adrian On Thu, 29 Apr 2004, Liaw, Andy wrote: Can you show us the output of str(eff.fro)? Do you have other things in the global environment or the search path that's taking up memory? What does gc() say? Andy From: Adrian Dragulescu Hello list, I've seen the recent discussions documenting problems with lm. I have encountered the following problem. I use WinXP Pro with service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM. eff.fro std.dev mean NSTRDSP 7.403749e-01 1.215686e-01 CPFGEP 9.056763e+00 1.815686e+00 WSWOLF 4.703588e+05 1.112832e+05 NPILGRIM 1.017640e+06 2.134335e+05 WSNMILE 1.367312e+07 1.892021e+06 WSHIDESL 1.830811e+07 1.892021e+06 reg - lm(log(mean) ~ log(std.dev), data=eff.fro) Error in model.matrix.default(mt, mf, contrasts) : cannot allocate vector of length 1074790452 log(eff.fro$mean) [1] -2.1072763 0.5964635 11.6198339 12.2710808 14.4531561 [6] 14.4531561 reg - lm(log(eff.fro$mean) ~ log(eff.fro$std.dev)) Error: cannot allocate vector of size 3360077 Kb lef - log(eff.fro) lef std.dev mean NSTRDSP -0.3005986 -2.1072763 CPFGEP2.2035117 0.5964635 WSWOLF 13.0612512 11.6198339 NPILGRIM 13.8329973 12.2710808 WSNMILE 16.4309427 14.4531561 WSHIDESL 16.7228546 14.4531561 lef - log(eff.fro) reg - lm(lef$mean ~ lef$std.dev) Here the my computer completely crashed. A window poped-up and said memory problem at address ..., and if I want to debug. I ran the same code one more time, and it worked but it did not work how I wanted (where is the slope?): reg - lm(lef$mean ~ lef$std.dev) reg Call: lm(formula = lef$mean ~ lef$std.dev) Coefficients: (Intercept) 8.548 summary(reg) Call: lm(formula = lef$mean ~ lef$std.dev) Residuals: 1 2 3 4 5 6 -10.655 -7.951 3.072 3.723 5.905 5.905 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept)8.548 2.9992.85 0.0358 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 7.346 on 5 degrees of freedom I ran again: reg - lm(log(mean) ~ log(std.dev), data=eff.fro) and I get the pop-up: The instruction at 0x6b4c45a5 referenced memory at 0x0032374a. The memory could not be read. Click OK to terminate the program. Any ideas? Thank you, Adrian __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Notice: This e-mail message, together with any attachments, contains information of Merck Co., Inc. 
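The diagnosis in code: str() shows that std.dev is a one-dimensional array (num [, 1:6] with a dimnames attribute) rather than a plain numeric vector, and stripping that attribute is one way to make the original formula work without the as.numeric() workaround. A sketch, assuming eff.fro as printed in the post:

str(eff.fro$std.dev)                   # num [, 1:6] ...  i.e. a 1-d array, not a vector
eff.fro$std.dev <- c(eff.fro$std.dev)  # c() drops the dim/dimnames attributes
reg <- lm(log(mean) ~ log(std.dev), data = eff.fro)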
Re: [R] memory problems in NLME
Vumani Dlamini [EMAIL PROTECTED] writes: I am trying to fit a random coeficient model with about 30 covariates with a random intercept and one random slope. The data set has 65000 observations, and whenever I use LME I get the message that all memory has been used. Do you know what the number of columns in the model matrix for the fixed-effects will be? You say you have 30 covariates but if some of those are factors or if you take powers of continuous covariates or interactions between terms then the number of columns in the model matrix can be much larger than 30. Given the dimensions you mention it seems that the model matrix for the fixed effects is nearly 16 MB in size or larger. Evaluation of the log-likelihood requires 3 or 4 copies of matrices like this plus the original data frame and the memory being used by other R objects. I was wondering whether there is a more efficient way fo fitting the model. Saikat DebRoy and I are working on a new package for lme and related functions using S4 classes. That package, which we plan to release in a 'snapshot' form shortly after R-1.7.0 is released (scheduled for April 16, 2003), controls the number of copies of the model matrices being created. I can run this example on the new package and the old package and provide comparisons if you wish. I use a machine with 1 GB of memory which should be enough. Please contact me off-list. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
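Before fitting, it is easy to check how large the fixed-effects model matrix actually is, which is the number the reply asks about. The formula and data name below are placeholders, not the poster's model.

X <- model.matrix(~ x1 + x2 + factor(site) + poly(age, 2), data = mydata)
dim(X)                               # 65000 rows by however many columns the factors expand to
as.numeric(object.size(X)) / 2^20    # size in Mb; lme needs several working copies of this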
RE: [R] memory problems
Hi I'm using SuSE 8.0 and R 1.6.2. The mem.limits are nt set so it should go to the maximum the machine allows. My doubt is that I have 2GB and R is complainig about allocating less then 500MB. Regards EJ On Fri, 2003-01-24 at 22:22, Andrew C. Ward wrote: Ernesto, I can't tell what version of R you're using and for which platform. In any case, there are some start-up options relating to memory usage, and you will find discussions of these in the relevant FAQ. Under Windows, the amount of memory that R uses is set by the command-line flag --max-mem-size. An alternative is to perform your analysis on just a few random subsets of data and then aggregate the results. I don't know how big your data set actually is so it's hard to provide more specific guidance. Post again if you're still having trouble. Regards, Andrew C. Ward CAPE Centre Department of Chemical Engineering The University of Queensland Brisbane Qld 4072 Australia [EMAIL PROTECTED] On Friday, January 24, 2003 10:02 PM, Ernesto Jardim [SMTP:[EMAIL PROTECTED]] wrote: Hi I'm computing a bca interval using bca.ci from the boot package. When I try to use this I get an error library(boot) boot(logglm.data,boot.fishpower,2500,coef.vec=coeflm.vec)-blm8901 bca.ci(blm8901,index=29) Error: cannot allocate vector of size 456729 Kb However my machine has 2GB of memory and without R running I only have 112M of memory used. Is there something I can do to be able to perform this analysis ? (I can not by more memory ;-) Thanks EJ -- Ernesto Jardim [EMAIL PROTECTED] Marine Biologist IPIMAR - National Research Institute for Agriculture and Fisheries Av. Brasilia, 1400-006 Lisboa, Portugal Tel: +351 213 027 000 Fax: +351 213 015 948 http://ernesto.freezope.org __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ernesto Jardim [EMAIL PROTECTED] Marine Biologist Research Institute for Agriculture and Fisheries Lisboa, Portugal Tel: +351 213 027 000 Fax: +351 213 015 948 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] memory problems
[EMAIL PROTECTED] writes: However my machine has 2GB of memory and without R running I only have 112M of memory used. How much memory is it actually using? It is complaining about allocating an *additional* 450Mb. Look at top / Task Manager / whatever. It's not the first time we're seeing that type of question. I wonder if we could make the error message more informative. Not that it is going to make the problem go away, but it could help put the user in the picture. It's a bit tricky, because there are limits to what you can make the system do in an out of memory condition. One idea might be to keep tabs on the total amount of memory allocated and freed. However, there are issues of counter overruns to deal with, and you'd still have to explain to people that not all free memory is available for allocation; you really only need to allocate a handful of *bytes* sufficiently unfortunately spaced to make it impossible to find a 450MB contiguous block in a 2GB address space. -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help