Re: [R] memory problems when combining randomForests

2006-08-01 Thread Ramon Diaz-Uriarte
Dear Eleni,


 But if every time you remove a variable you pass some test data (ie data
 not used to train the model) and base the performance of the new, reduced
 model on the error rate on the confusion matrix for the test data, then
 this overfitting should not be an issue, right?  (unless of course you
 were referring to unsupervised learning).



Yes and no. A problem can still arise if you do this iteratively and then 
report the minimum error rate obtained along the way as your estimate of the 
error rate. In that case you should instead do a double (nested) 
cross-validation or bootstrap, i.e., estimate, via cross-validation or the 
bootstrap, the error rate of your complete procedure, variable elimination 
included.
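
To make that concrete, here is a minimal sketch (not code from either of the 
papers below) of such a double cross-validation with randomForest; the fold 
count, the fraction of variables dropped per step, and the data.frame and 
column names (d, "class") are arbitrary stand-ins for illustration:

library(randomForest)

nested_cv_error <- function(d, k = 5, drop_frac = 0.2) {
  folds <- sample(rep(seq_len(k), length.out = nrow(d)))
  errs <- numeric(k)
  for (i in seq_len(k)) {
    train <- d[folds != i, ]
    test  <- d[folds == i, ]
    vars <- setdiff(names(d), "class")
    best_err <- Inf; best_vars <- vars
    # inner loop: eliminate variables using the training fold only
    while (length(vars) > 2) {
      rf  <- randomForest(x = train[, vars], y = train$class, importance = TRUE)
      oob <- rf$err.rate[nrow(rf$err.rate), "OOB"]
      if (oob < best_err) { best_err <- oob; best_vars <- vars }
      imp  <- importance(rf, type = 1)
      keep <- length(vars) - ceiling(length(vars) * drop_frac)
      vars <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:keep]
    }
    # outer error: this fold never took part in the selection above
    rf <- randomForest(x = train[, best_vars], y = train$class)
    errs[i] <- mean(predict(rf, test[, best_vars]) != test$class)
  }
  mean(errs)
}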

Andy and his collaborators on the one hand, and I on the other, have done 
some further work on these issues:

Svetnik V, Liaw A, Tong C, Wang T: Application of Breiman's random forest to 
modeling structure-activity relationships of pharmaceutical molecules.
Multiple Classifier Systems, Fifth International Workshop, MCS 2004, 
Proceedings, 9–11 June 2004, Cagliari, Italy. Lecture Notes in Computer 
Science, Springer 2004, 3077:334-343.

Gene selection and classification of microarray data using random forest
Ramón Díaz-Uriarte and Sara Alvarez de Andrés. BMC Bioinformatics 2006, 7:3. 
http://www.biomedcentral.com/1471-2105/7/3


Best,

R.



On Monday 31 July 2006 18:45, Eleni Rapsomaniki wrote:
 Hi Andy,

   I get a different order of importance for my variables depending on their
 order in the training data.

 Perhaps answering my own question, the change in importance rankings could
 be attributed to the fact that before passing my data to randomForest I
 impute the missing values randomly (using the combined distributions of
 pos+neg), so the data seen by RF is slightly different. Then combining this
 with the fact that RF chooses data randomly it makes sense to see different
 rankings.

 In a previous thread regarding simplifying variables:
 http://thread.gmane.org/gmane.comp.lang.r.general/6989/focus=6993

 you say:
 The basic problem is that when you select important variables by RF and
 then re-run RF with those variables, the OOB error rate becomes biased
 downward. As you iterate more times, the overfitting becomes more and
 more severe (in the sense that the OOB error rate will keep decreasing
 while the error rate on an independent test set will be flat or increase).

 But if every time you remove a variable you pass some test data (ie data
 not used to train the model) and base the performance of the new, reduced
 model on the error rate on the confusion matrix for the test data, then
 this overfitting should not be an issue, right?  (unless of course you
 were referring to unsupervised learning).

 Best regards
 Eleni Rapsomaniki
 Birkbeck College, UK

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented, minimal,
 self-contained, reproducible code.

-- 
Ramón Díaz-Uriarte
Bioinformatics 
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-31 Thread Eleni Rapsomaniki
Hello

I've just realised attachments are not allowed, so the data for the example in
my previous message is:

pos.df=read.table("http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090",
header=T)

neg.df=read.table("http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779",
header=T)

And my last two questions (promise!): 
The first is related to the order of columns (i.e. explanatory variables). I get
a different order of importance for my variables depending on their order in the
training data. Is there a parameter I could fiddle with (e.g. ntree) to get a
more stable importance order?

And finally, since interactions are not implemented, is there another method I
could use in R to find dependencies among categorical variables? (lm doesn't
accept categorical variables).

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-31 Thread Weiwei Shi
Hi, Andy:

What's the Jerry Friedman's ISLE? I googled it and did not find the paper on
it. Could you give me a link, please?

Thanks,

Weiwei

On 7/31/06, Eleni Rapsomaniki [EMAIL PROTECTED] wrote:

 Hello

 I've just realised attachments are not allowed, so the data for the
 example in
 my previous message is:

 pos.df=read.table(
 "http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090",
 header=T)

 neg.df=read.table(
 "http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779",
 header=T)

 And my last two questions (promise!):
 The first is related to the order of columns (ie. explanatory variables).
 I get
 different order of importance for my variables depending on their order in
 the
 training data. Is there a parameter I could fiddle with (e.g. ntree) to
 get a
 more stable importance order?

 And finally, since interactions are not implemented, is there another
 method I
 could use in R to find dependencies among categorical variables? (lm
 doesn't
 accept categorical variables).

 Many thanks
 Eleni Rapsomaniki
 Birkbeck College, UK

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Weiwei Shi, Ph.D

Did you always know?
No, I did not. But I believed...
---Matrix III


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-31 Thread Eleni Rapsomaniki
Hi Andy,

  I get different order of importance for my variables depending on their
order in the training data.

Perhaps answering my own question: the change in importance rankings could be
attributed to the fact that before passing my data to randomForest I impute the
missing values randomly (using the combined distributions of pos+neg), so the data
seen by RF differs slightly from run to run. Combining this with the fact that RF
itself samples the data randomly, it makes sense to see different rankings.
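
For what it is worth, a hedged sketch of that point (the imputation helper below
and the column name "class" are just stand-ins for my own code): if both the
random imputation and the forest are run under one fixed seed, and ntree is
reasonably large, the importance ranking is reproducible from run to run.

library(randomForest)

impute_random <- function(x) {        # replace NAs by draws from the observed
  for (j in seq_along(x)) {           # values of the same column
    miss <- is.na(x[[j]])
    if (any(miss))
      x[[j]][miss] <- sample(x[[j]][!miss], sum(miss), replace = TRUE)
  }
  x
}

set.seed(1)                           # fixes the imputation *and* the forest
dat <- impute_random(rbind(pos.df, neg.df))
rf  <- randomForest(class ~ ., data = dat, ntree = 2000, importance = TRUE)
head(importance(rf, type = 1))        # same ranking every time for a fixed seed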

In a previous thread regarding simplifying variables:
http://thread.gmane.org/gmane.comp.lang.r.general/6989/focus=6993

you say:
The basic problem is that when you select important variables by RF and then
re-run RF with those variables, the OOB error rate becomes biased downward.
As you iterate more times, the overfitting becomes more and more severe
(in the sense that the OOB error rate will keep decreasing while the error rate
on an independent test set will be flat or increase).

But if every time you remove a variable you pass some test data (ie data not
used to train the model) and base the performance of the new, reduced model on
the error rate on the confusion matrix for the test data, then this
overfitting should not be an issue, right?  (unless of course you were
referring to unsupervised learning).

Best regards
Eleni Rapsomaniki
Birkbeck College, UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests [Broadcast]

2006-07-31 Thread Liaw, Andy
It's the 5th paper on his web page.
http://www-stat.stanford.edu/~jhf/ftp/isle.pdf
 
Cheers,
Andy


  _  

From: Weiwei Shi [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 31, 2006 11:38 AM
To: Eleni Rapsomaniki
Cc: Liaw, Andy; r-help@stat.math.ethz.ch
Subject: Re: [R] memory problems when combining randomForests [Broadcast]


Hi, Andy:

What's the Jerry Friedman's ISLE? I googled it and did not find the paper on
it. Could you give me a link, please?

Thanks,

Weiwei


On 7/31/06, Eleni Rapsomaniki [EMAIL PROTECTED] wrote: 

Hello

I've just realised attachments are not allowed, so the data for the example
in
my previous message is:

pos.df=read.table(
"http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090",
header=T)

neg.df=read.table(
"http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779",
header=T)

And my last two questions (promise!):
The first is related to the order of columns (ie. explanatory variables). I
get 
different order of importance for my variables depending on their order in
the
training data. Is there a parameter I could fiddle with (e.g. ntree) to get
a
more stable importance order?

And finally, since interactions are not implemented, is there another method
I 
could use in R to find dependencies among categorical variables? (lm doesn't
accept categorical variables).

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK

__ 
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





-- 
Weiwei Shi, Ph.D

Did you always know?
No, I did not. But I believed...
---Matrix III




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-31 Thread Weiwei Shi
Found it in another paper: importance sampled learning ensemble (ISLE),
which originates from Friedman and Popescu (2003).


On 7/31/06, Weiwei Shi [EMAIL PROTECTED] wrote:

 Hi, Andy:

 What's the Jerry Friedman's ISLE? I googled it and did not find the paper
 on it. Could you give me a link, please?

 Thanks,

 Weiwei


 On 7/31/06, Eleni Rapsomaniki [EMAIL PROTECTED] wrote:
 
  Hello
 
  I've just realised attachments are not allowed, so the data for the
  example in
  my previous message is:
 
  pos.df=read.table("http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090",
  header=T)
 
  neg.df=read.table("http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779",
  header=T)
 
  And my last two questions (promise!):
  The first is related to the order of columns (ie. explanatory
  variables). I get
  different order of importance for my variables depending on their order
  in the
  training data. Is there a parameter I could fiddle with (e.g. ntree) to
  get a
  more stable importance order?
 
  And finally, since interactions are not implemented, is there another
  method I
  could use in R to find dependencies among categorical variables? (lm
  doesn't
  accept categorical variables).
 
  Many thanks
  Eleni Rapsomaniki
  Birkbeck College, UK
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Weiwei Shi, Ph.D

 Did you always know?
 No, I did not. But I believed...
 ---Matrix III




-- 
Weiwei Shi, Ph.D

Did you always know?
No, I did not. But I believed...
---Matrix III


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-29 Thread Eleni Rapsomaniki
Hello again,

The reason I thought the order in which rows are passed to randomForest
affects the error rate is that I get different results for different ways of
splitting my positive/negative data. 

First get the data (attached with this email)
pos.df=read.table("C:/Program Files/R/rw2011/pos.df", header=T)
neg.df=read.table("C:/Program Files/R/rw2011/neg.df", header=T)
library(randomForest)
#The first 2 columns are explanatory variables (which incidentally are not
#discriminative at all if one looks at their distributions), the 3rd is the
#class (pos or neg)

train2test.ratio=8/10
min_len=min(nrow(pos.df), nrow(neg.df))
class_index=which(names(pos.df)=="class") #is the same for neg.df
train_size=as.integer(min_len*train2test.ratio)

####   Way 1
train.indicesP=sample(seq(1:nrow(pos.df)), size=train_size, replace=FALSE)
train.indicesN=sample(seq(1:nrow(neg.df)), size=train_size, replace=FALSE)

trainP=pos.df[train.indicesP,]
trainN=neg.df[train.indicesN,]
testP=pos.df[-train.indicesP,]
testN=neg.df[-train.indicesN,]

mydata.rf <- randomForest(x=rbind(trainP, trainN)[,-class_index],
y=rbind(trainP, trainN)[,class_index], xtest=rbind(testP,
testN)[,-class_index], ytest=rbind(testP, testN)[,class_index],
importance=TRUE, proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

##   Way 2
ind <- sample(2, min(nrow(pos.df), nrow(neg.df)), replace = TRUE,
prob=c(train2test.ratio, (1-train2test.ratio)))
trainP=pos.df[ind==1,]
trainN=neg.df[ind==1,]
testP=pos.df[ind==2,]
testN=neg.df[ind==2,]

mydata.rf <- randomForest(x=rbind(trainP, trainN)[,-dir_index], y=rbind(trainP,
trainN)[,dir_index], xtest=rbind(testP, testN)[,-dir_index], ytest=rbind(testP,
testN)[,dir_index], importance=TRUE,proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

### Way 3
subset_start=1
subset_end=subset_start+train_size
train_index=seq(subset_start:subset_end)
trainP=pos.df[train_index,]
trainN=neg.df[train_index,]
testP=pos.df[-train_index,]
testN=neg.df[-train_index,]

mydata.rf <- randomForest(x=rbind(trainP, trainN)[,-dir_index], y=rbind(trainP,
trainN)[,dir_index], xtest=rbind(testP, testN)[,-dir_index], ytest=rbind(testP,
testN)[,dir_index], importance=TRUE,proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

### end

The first 2 methods give me an abnormally low error rate (compared to what I
get using the same data with a naiveBayes method), while the last one seems more
realistic, but the difference in error rates is very significant. I need to use
the last method to cross-validate subsets of my data sequentially (the first two
methods use random rows throughout the length of the data), unless there is a
better way to do it (?). Something must be very different between the first 2
methods and the last, but which is the correct one?

I would greatly appreciate any suggestions on this!

Many Thanks
Eleni Rapsomaniki

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-28 Thread Eleni Rapsomaniki
Hi Andy, 

  I'm using R (windows) version 2.1.1, randomForest version 4.15. 
^ 
 Never seen such a version...
Ooops! I meant 4.5-15
 
  I then save each tree to a file so I can combine them all 
  afterwards. There are no memory issues when 
  keep.forest=FALSE. But I think that's the bit I need for 
  future predictions (right?). 
 
 Yes, but what is your question?  (Do you mean each *forest*,
 instead of each *tree*?)
I mean the component of the object that is created from randomForest that has
the name forest (and takes up all the memory!). 

  A bit off the subject, but should the order in which rows 
  (ie. sets of explanatory variables) are passed to the 
  randomForest function affect the result? I have noticed that 
  if I pick a random unordered sample from my control data for 
  training, the error rate is much lower than if I take an 
  ordered sample. This remains true for all my cross-validation 
  results. 
 
 I'm not sure I understand.  In randomForest() (as in other
 functions) variables are in columns, rather than rows, so
 are you talking about variables (columns) in different order 
 or data (rows) in different order?

Yes, sorry I confused you. I mean the order at which data (rows) is passed, not
columns.

Finally, I see from
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter

that there is a component in Breiman's implementation of randomForest that
computes interactions between parameters. Has this been implemented in R yet?

Many thanks for your time and help.
Eleni Rapsomaniki

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-28 Thread Liaw, Andy
From: Eleni Rapsomaniki
 
 Hi Andy, 
 
   I'm using R (windows) version 2.1.1, randomForest version 4.15. 
 ^ 
  Never seen such a version...
 Ooops! I meant 4.5-15
  
   I then save each tree to a file so I can combine them all 
   afterwards. There are no memory issues when 
   keep.forest=FALSE. But I think that's the bit I need for 
   future predictions (right?). 
  
  Yes, but what is your question?  (Do you mean each *forest*,
  instead of each *tree*?)
 I mean the component of the object that is created from 
 randomForest that has
 the name forest (and takes up all the memory!). 

Yes, the forest can take up quite a bit of space.  You might 
consider setting nodesize larger and see if that gives you 
sufficient space saving w/o compromising prediction performance.
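
For example (a hedged sketch on the built-in iris data, not your own files),
comparing the stored forest size and the OOB error for the default nodesize
and a coarser one:

library(randomForest)

set.seed(1)
rf_fine   <- randomForest(Species ~ ., data = iris)                 # nodesize = 1
rf_coarse <- randomForest(Species ~ ., data = iris, nodesize = 20)  # larger terminal nodes

c(object.size(rf_fine$forest), object.size(rf_coarse$forest))       # forest shrinks
c(tail(rf_fine$err.rate[, "OOB"], 1), tail(rf_coarse$err.rate[, "OOB"], 1))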
 
   A bit off the subject, but should the order in which rows 
   (ie. sets of explanatory variables) are passed to the 
   randomForest function affect the result? I have noticed that 
   if I pick a random unordered sample from my control data for 
   training, the error rate is much lower than if I take an 
   ordered sample. This remains true for all my cross-validation 
   results. 
  
  I'm not sure I understand.  In randomForest() (as in other
  functions) variables are in columns, rather than rows, so
  are you talking about variables (columns) in different order 
  or data (rows) in different order?
 
 Yes, sorry I confused you. I mean the order at which data 
 (rows) is passed, not
 columns.

Then I'm not sure what you mean by a difference in performance, even
in cross-validation.  Perhaps you can show some example?  Each 
tree in the forest is grown on a random sample from the data, so
the order of the rows cannot matter.
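
A quick way to check this (a sketch on the built-in iris data, since we do not
have your files): shuffle the rows and compare the final OOB errors; any
difference is just run-to-run Monte Carlo noise.

library(randomForest)

set.seed(1)
rf1 <- randomForest(Species ~ ., data = iris)
rf2 <- randomForest(Species ~ ., data = iris[sample(nrow(iris)), ])  # rows shuffled

c(original = tail(rf1$err.rate[, "OOB"], 1),
  shuffled = tail(rf2$err.rate[, "OOB"], 1))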


 Finally, I see from
 http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter
 
 that there is a component in Breiman's implementation of 
 randomForest that
 computes interactions between parameters. Has this been 
 implemented in R yet?

No.  Prof. Breiman told me that it is very experimental, and he
wouldn't mind if it doesn't make it into the R package.  
Since I have other priorities for the package, that naturally
went to the back burner.

Cheers,
Andy

 
 Many thanks for your time and help.
 Eleni Rapsomaniki
 
 
 
 This message was sent using IMP, the Internet Messaging Program.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests [Broadcast]

2006-07-27 Thread Eleni Rapsomaniki
I'm using R (windows) version 2.1.1, randomForest version 4.15. 
I call randomForest like this:

my.rf=randomForest(x=train.df[,-response_index], y=train.df[,response_index],
 xtest=test.df[,-response_index], ytest=test.df[,response_index],
 importance=TRUE,proximity=FALSE, keep.forest=TRUE)

 (where train.df and test.df are my train and test data.frames and
 response_index is the column number specifying the class)

I then save each tree to a file so I can combine them all afterwards. There are
no memory issues when keep.forest=FALSE. But I think that's the bit I need for
future predictions (right?). 

I did check previous messages on memory issues, and thought that
combining the trees afterwards would solve the problem. Since my
cross-validation subsets give me a fairly stable error-rate, I suppose I could
just use a randomForest trained on just a subset of my data. But would I not be
wasting data this way?

A bit off the subject, but should the order in which rows (ie. sets of
explanatory variables) are passed to the randomForest function affect the
result? I have noticed that if I pick a random unordered sample from my control
data for training, the error rate is much lower than if I take an ordered
sample. This remains true for all my cross-validation results. 

I'm sorry for my many questions.
Many Thanks
Eleni Rapsomaniki

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-27 Thread Liaw, Andy
From: Eleni Rapsomaniki
 
 I'm using R (windows) version 2.1.1, randomForest version 4.15. 
   ^

Never seen such a version...

 I call randomForest like this:
 
 my.rf=randomForest(x=train.df[,-response_index], 
 y=train.df[,response_index],  
 xtest=test.df[,-response_index], 
 ytest=test.df[,response_index],  
 importance=TRUE,proximity=FALSE, keep.forest=TRUE)
 
  (where train.df and test.df are my train and test 
 data.frames and  response_index is the column number 
 specifiying the class)
 
 I then save each tree to a file so I can combine them all 
 afterwards. There are no memory issues when 
 keep.forest=FALSE. But I think that's the bit I need for 
 future predictions (right?). 

Yes, but what is your question?  (Do you mean each *forest*,
instead of each *tree*?)
 
 I did check previous messages on memory issues, and thought 
 that combining the trees afterwards would solve the problem. 
 Since my cross-validation subsets give me a fairly stable 
 error-rate, I suppose I could just use a randomForest trained 
 on just a subset of my data. But would I not be wasting 
 data this way?

Perhaps, but see Jerry Friedman's ISLE, where he argued
that RF with very small trees grown on small random samples
can give even better results some of the time.
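
A hedged sketch of that idea (not Friedman's ISLE itself, just the
small-trees-on-small-samples flavour, via randomForest's sampsize and
maxnodes arguments, on the built-in iris data):

library(randomForest)

set.seed(1)
rf_small <- randomForest(Species ~ ., data = iris,
                         ntree    = 1000,
                         sampsize = 30,   # each tree sees only 30 rows
                         maxnodes = 4)    # and is kept very shallow
tail(rf_small$err.rate[, "OOB"], 1)       # often surprisingly competitive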
 
 A bit off the subject, but should the order in which rows 
 (ie. sets of explanatory variables) are passed to the 
 randomForest function affect the result? I have noticed that 
 if I pick a random unordered sample from my control data for 
 training, the error rate is much lower than if I take an 
 ordered sample. This remains true for all my cross-validation 
 results. 

I'm not sure I understand.  In randomForest() (as in other
functions) variables are in columns, rather than rows, so
are you talking about variables (columns) in different order 
or data (rows) in different order?

Andy
 
 I'm sorry for my many questions.
 Many Thanks
 Eleni Rapsomaniki
 
 
 
 
 
 This message was sent using IMP, the Internet Messaging Program.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests [Broadcast]

2006-07-26 Thread Liaw, Andy
You need to give us more details, like how you call randomForest, versions
of the package and R itself, etc.  Also, see if this helps you:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html
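
For the record, a minimal sketch of the chunk-and-combine approach the post
describes (big.df and the column name "class" are made-up placeholders); note
that combine() needs keep.forest=TRUE, and the combined object has no
meaningful OOB error, so assess it on held-out data.

library(randomForest)

chunks  <- split(seq_len(nrow(big.df)), rep(1:7, length.out = nrow(big.df)))
forests <- lapply(chunks, function(idx)
  randomForest(class ~ ., data = big.df[idx, ], ntree = 100, keep.forest = TRUE))
rf_all <- do.call(combine, forests)    # a single forest with 7 * 100 trees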

Andy
 
From: Eleni Rapsomaniki
 
 Dear all,
 
 I am trying to train a randomForest using all my control data 
 (12,000 cases, ~ 20 explanatory variables, 2 classes). 
 Because of memory constraints, I have split my data into 7 
 subsets and trained a randomForest for each, hoping that 
 using combine() afterwards would solve the memory issue. 
 Unfortunately,
 combine() still runs out of memory. Is there anything else I 
 can do? (I am not using the formula version)
 
 Many Thanks
 Eleni Rapsomaniki
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory problems with large dataset in rpart

2005-10-18 Thread Prof Brian Ripley
Looks like you have missed the section in the rw-FAQ entitled

 2.9 There seems to be a limit on the memory it uses!

and not set --max-mem-size (which defaults to 1Gb on your system).

However, it looks like your problem is memory fragmentation, and trying to 
run 1Gb tasks in a 2Gb address space is intrinsically a problem to which 
the only solution is a 64-bit version of R.

BTW, /3GB has nothing whatsoever to do with `swap files': if both OS and 
application are configured correctly it increases the user address space 
*for that process* to 3GB (whereas swap space is shared between 
processes).

On Tue, 18 Oct 2005 [EMAIL PROTECTED] wrote:

 Dear helpers,

 I am a Dutch student from the Erasmus University. For my Bachelor thesis I
 have written a script in R using boosting by means of classification and
 regression trees. This script uses the function the predefined function
 rpart. My input file consists of about 4000 vectors each having 2210
 dimensions. In the third iteration R complains of a lack of memory,
 although in each iteration every variable is removed from the memory. Thus
 the first two iterations run without any problems.

 My computer runs on Windows XP and has 1 gigabyte of internal memory.
 I tried to let R use more memory by reconfiguring the swap files as mentioned in
 the FAQ (/3GB), but I didn't succeed in making this work.
 The command round(memory.limit()/1048576.0, 2) gives 1023.48

 If such an increase of memory cannot be achieved, perhaps the size of the
 rpart object could be reduced by not storing unnecessary information.
 The rpart function call is (the FALSE arguments are there to try to reduce the
 size of the fit object):
 fit <- rpart(price ~ ., data = trainingset,
 control=rpart.control(maxdepth=2, cp=0.001), model=FALSE, x=FALSE, y=FALSE)

 This fit object is later called in 2 predict functions, for example:
 predict(fit,newdata=sample)

 Can anybody please help me by letting R use more memory (for example swap)
 or can anybody help me reduce the size of the fit object?

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Memory problems

2005-02-23 Thread Prof Brian Ripley
R does use virtual memory, and memory.size() (Windows only) is documented 
to report the usage, not change the limit.  Please read that help page 
more carefully.

If you are not already doing so, try arima0 rather than arima.
And do see the posting guide!
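
A minimal sketch of that suggestion (the ARMA order here is made up; x stands
for the 20160-point series mentioned in the post):

fit <- arima0(x, order = c(1, 0, 1))   # lighter-weight fitter than arima()
predict(fit, n.ahead = 24)
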
On Tue, 22 Feb 2005, Konstantinos Kleisouris wrote:
  I use R to do some ARIMA forecasting and R runs out
of memory. The problem is that I have 20160
samples(which are quite alot) and when I try to fit
the model it runs out of memory. I tried with
memory.size() to change the limit, but it wouldn't
work. Is there anything you can suggest? Is it
possible R can use virtual memory?

PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Memory problems

2005-02-22 Thread Thomas Schönhoff


Looking at the posting guide will increase the chance of getting a helpful 
response from this list.

No one knows what kind of operating system you are running: is it 
Windows, MacOS or Linux (32 or 64 bit)?
Memory-related problems are reported daily, so it could be very 
beneficial to browse the help archive (it has an efficient search 
facility). Some recent suggestions on memory limitations can be found 
there.

Thomas




 I use R to do some ARIMA forecasting and R runs out
 of memory. The problem is that I have 20160
 samples(which are quite alot) and when I try to fit
 the model it runs out of memory. I tried with
 memory.size() to change the limit, but it wouldn't
 work. Is there anything you can suggest? Is it
 possible R can use virtual memory?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Memory Problems in R

2004-08-18 Thread Roger D. Peng
There is a limit on how long a single vector can be, and I think it's 
2GB (even on 64-bit platforms).  Not sure how the gc trigger is set.

-roger
Scott Gilpin wrote:
Hello everyone -
I have a couple of questions about memory management of large objects.
Thanks in advance for your response.
I'm running R version 1.9.1 on solaris 8, compiled as a 32 bit app. 
My system has 12.0 GB of memory, with usually ~ 11GB free.  I checked
system limits using ulimit, and there is nothing set that would limit
the maximum amount of memory for a process (with the exception of an
8MB stack size).  I've also checked the amount of memory available to
R using mem.limits(), and there is no limit set.

I'm running into two problems.  The first is the error "cannot
allocate vector of size X" - I know this has been discussed
several times on this mailing list, but it usually seems the user does
not have enough memory on their system, or does not have the memory
limits set correctly.  I don't believe this is the case in this
situation.  I verified that I don't have any objects in memory when R
starts up, and that memory limits are set to NA.  Here is some output:

ls()
character(0)
mem.limits()
nsize vsize 
   NA    NA 

gc()
 used (Mb) gc trigger (Mb)
Ncells 432197 11.6 531268 14.2
Vcells 116586  0.9 786432  6.0
v <- rep(0,268435431)
Error: cannot allocate vector of size 2097151 Kb
v <- rep(0,268435430)
object.size(v)
[1] 2147483468
gc()
used   (Mb) gc trigger   (Mb)
Ncells432214   11.6 741108   19.8
Vcells 268552029 2048.9  268939773 2051.9
Does R have a limit set on the size of an object that it will
allocate?  I know that the entire application will only be able to use
4GB of memory (because it's only 32bit), but I haven't found anything
in the R documentation or the help lists that indicates there is
maximum on the size of an object.  I understand there will be problems
if an object is greater than 2GB and needs to be copied - but will R
limit the creation of such an object?  It's also my understanding that
the garbage collector won't move objects and this may cause memory to
become fragmented - but I'm seeing these issues on startup when there
are no objects in memory.
My second problem is with matrices and the garbage collector, and the
limits it sets for gc trigger after a matrix is created.  When I
create a vector of approximately 500MB, R sets the gc trigger to be
slightly above this amount.  The gc trigger also seems to correspond
to the process size (as output by top).  When I create a matrix of
approximately 500MB, R sets the gc trigger to be roughly 3 times the
size of the matrix (and the process size is ~ 1.5GB).  Therefore, when
I try to create larger matrices, where 3x the size of the matrix is
greater than 4GB, R gives me an error.  Is there anything I can do to
create large matrices?  Or do I have to manipulate large objects as a
vector?
Output from the 3 different scenarios is below:
1) - can't create a matrix, but can create a vector
[Previously saved workspace restored]

m <- matrix(rep(0,25000*10000),nrow=10000)
Error: cannot allocate vector of size 1953125 Kb
v <- rep(0,25000*10000)
object.size(v)/1024
[1] 1953125
2) gc trigger is set slightly higher than the size of the vector

ls()
character(0)
mem.limits()
nsize vsize 
   NA    NA 

gc()
 used (Mb) gc trigger (Mb)
Ncells 432197 11.6 531268 14.2
Vcells 116586  0.9 786432  6.0
v <- rep(0,(2510)*(25000))
object.size(v)
[1] 5.02e+08
gc()
   used  (Mb) gc trigger  (Mb)
Ncells   432210  11.6 667722  17.9
Vcells 62866589 479.7   63247172 482.6
3) gc trigger is set ~ 3x the size of the matrix

ls()
character(0)
mem.limits()
nsize vsize 
   NA    NA 

gc()
 used (Mb) gc trigger (Mb)
Ncells 432197 11.6 531268 14.2
Vcells 116586  0.9 786432  6.0
A <- matrix(rep(0,(2510)*(25000)),nrow=(2510),ncol=(25000))
object.size(A)
[1] 502000120
gc()
   used  (Mb) gc trigger   (Mb)
Ncells   432213  11.6 741108   19.8
Vcells 62866590 479.7  188640940 1439.3
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Memory Problems in R

2004-08-18 Thread Prof Brian Ripley
On Wed, 18 Aug 2004, Roger D. Peng wrote:

 There is a limit on how long a single vector can be, and I think it's 
 2GB (even on 64-bit platforms).  Not sure on how the gc trigger is set

There is a limit of R_SIZE_T_MAX bytes, but that is defined as ULONG_MAX
which should be 4GB-1 on a 32-bit platform, and much more on a 64-bit
platform.

The example works on a 64-bit platform, which demonstrates that there is
no 2GB limit there.

If you hit the length limit, the message is of the form

cannot allocate vector of length ...

Looking at the code in memory.c it seems that

if (size >= (LONG_MAX / sizeof(VECREC)) - sizeof(SEXPREC_ALIGN) ||
(s = malloc(sizeof(SEXPREC_ALIGN) + size * sizeof(VECREC)))
== NULL) {
/* reset the vector heap limit */
R_VSize = old_R_VSize;
errorcall(R_NilValue, "cannot allocate vector of size %lu Kb",
  (size * sizeof(VECREC))/1024);
}

has a limit of LONG_MAX bytes for a vector.  I think that is 
unintentional, and you might like to try ULONG_MAX there and re-compile.
But it really doesn't make much difference as there is very little you can 
do with an object taking up more than half the maximum memory size
except access bits of it (and that is what DBMSes are for).


A few comments:

1) Of course R does have objects in memory, 12.5Mb of them according to 
gc.  You are not starting with a clean slate.  Hopefully malloc has 
allocated them in a compact group.

2) Solaris has been a 64-bit OS for at least 7 years and you really should
be using a 64-bit build of R if you plan on exceeding 1Gb.

3) To create a matrix efficiently, create a vector and assign a dim.  I
gave an example on R-help yesterday, so please check the archives (a short
sketch also follows this list).

matrix() makes a copy of the data and so needs double the space you are
thinking it does.  Take a look at the source code:

PROTECT(snr = allocMatrix(TYPEOF(vals), nr, nc));
if(lendat) {
if (isVector(vals))
copyMatrix(snr, vals, byrow);
else
copyListMatrix(snr, vals, byrow);

4) The source code is the documentation here.  I suspect no one person 
knows all the details.
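
A short sketch of point 3 above (the dimensions are the ones from the post):

v <- numeric(2510 * 25000)     # one ~500MB allocation, no rep() copy
dim(v) <- c(2510, 25000)       # now a 2510 x 25000 matrix, no extra copy via matrix()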


 Scott Gilpin wrote:
  Hello everyone -
  
  I have a couple of questions about memory management of large objects.
  Thanks in advance for your response.
  
  I'm running R version 1.9.1 on solaris 8, compiled as a 32 bit app. 
  My system has 12.0 GB of memory, with usually ~ 11GB free.  I checked
  system limits using ulimit, and there is nothing set that would limit
  the maximum amount of memory for a process (with the exception of an
  8MB stack size).  I've also checked the amount of memory available to
  R using mem.limits(), and there is no limit set.
  
  I'm running into two problems.  The first is the error  cannot
  allocate vector of size X - I know this has been discussed
  several times on this mailing list, but it usually seems the user does
  not have enough memory on their system, or does not have the memory
  limits set correctly.  I don't believe this is the case in this
  situation.  I verified that I don't have any objects in memory when R
  starts up, and that memory limits are set to NA.  Here is some output:
  
  
 ls()
  
  character(0)
  
 mem.limits()
  
  nsize vsize 
 NANA 
  
 gc()
  
   used (Mb) gc trigger (Mb)
  Ncells 432197 11.6 531268 14.2
  Vcells 116586  0.9 786432  6.0
  
 v-rep(0,268435431)
  
  Error: cannot allocate vector of size 2097151 Kb
  
 v-rep(0,268435430)
 object.size(v)
  
  [1] 2147483468
  
 gc()
  
  used   (Mb) gc trigger   (Mb)
  Ncells432214   11.6 741108   19.8
  Vcells 268552029 2048.9  268939773 2051.9
  
  
  Does R have a limit set on the size of an object that it will
  allocate?  I know that the entire application will only be able to use
  4GB of memory (because it's only 32bit), but I haven't found anything
  in the R documentation or the help lists that indicates there is
  maximum on the size of an object.  I understand there will be problems
  if an object is greater than 2GB and needs to be copied - but will R
  limit the creation of such an object?  It's also my understanding that
  the garbage collector won't move objects and this may cause memory to
  become fragmented - but I'm seeing these issues on startup when there
  are no objects in memory.
  
  
  My second problem is with matrices and the garbage collector, and the
  limits it sets for gc trigger after a matrix is created.  When I
  create a vector of approximately 500MB, R sets the gc trigger to be
  slightly above this amount.  The gc trigger also seems to correspond
  to the process size (as output by top).  When I create a matrix of
  approximately 500MB, R sets the gc trigger to be roughly 3 times the
  size of the matrix (and the process size is ~ 1.5GB).  Therefor, when
  I try to create larger matrices, where 3x the 

RE: [R] memory problems with lm

2004-04-29 Thread Liaw, Andy
Can you show us the output of str(eff.fro)?  Do you have other things in the
global environment or the search path that's taking up memory?  What does
gc() say?

Andy

 From: Adrian Dragulescu
 
 Hello list,
 
 I've seen the recent discussions documenting problems with lm.
 
 I have encountered the following problem.  I use WinXP Pro with
 service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM.
 
  eff.fro
   std.dev mean
 NSTRDSP  7.403749e-01 1.215686e-01
 CPFGEP   9.056763e+00 1.815686e+00
 WSWOLF   4.703588e+05 1.112832e+05
 NPILGRIM 1.017640e+06 2.134335e+05
 WSNMILE  1.367312e+07 1.892021e+06
 WSHIDESL 1.830811e+07 1.892021e+06
 reg <- lm(log(mean) ~ log(std.dev), data=eff.fro)
 Error in model.matrix.default(mt, mf, contrasts) :
 cannot allocate vector of length 1074790452
  log(eff.fro$mean)
 [1] -2.1072763  0.5964635 11.6198339 12.2710808 14.4531561
 [6] 14.4531561
 reg <- lm(log(eff.fro$mean) ~ log(eff.fro$std.dev))
 Error: cannot allocate vector of size 3360077 Kb
 lef <- log(eff.fro)
  lef
 std.dev   mean
 NSTRDSP  -0.3005986 -2.1072763
 CPFGEP2.2035117  0.5964635
 WSWOLF   13.0612512 11.6198339
 NPILGRIM 13.8329973 12.2710808
 WSNMILE  16.4309427 14.4531561
 WSHIDESL 16.7228546 14.4531561
 lef <- log(eff.fro)
 reg <- lm(lef$mean ~ lef$std.dev)
 
 Here my computer completely crashed.  A window popped up and said
 memory problem at address ..., and asked if I wanted to debug.
 
 I ran the same code one more time, and it ran, but it did not work
 the way I wanted (where is the slope?):
  reg <- lm(lef$mean ~ lef$std.dev)
  reg
 
 Call:
 lm(formula = lef$mean ~ lef$std.dev)
 
 Coefficients:
 (Intercept)
   8.548
 
 
  summary(reg)
 
 Call:
 lm(formula = lef$mean ~ lef$std.dev)
 
 Residuals:
   1   2   3   4   5   6
 -10.655  -7.951   3.072   3.723   5.905   5.905
 
 Coefficients:
 Estimate Std. Error t value Pr(>|t|)
 (Intercept)8.548  2.9992.85   0.0358 *
 ---
 Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
 
 Residual standard error: 7.346 on 5 degrees of freedom
 
 I ran again:
 
   reg <- lm(log(mean) ~ log(std.dev), data=eff.fro)
 
 and I get the pop-up:
 The instruction at 0x6b4c45a5 referenced memory at 0x0032374a.
 The memory could not be read.  Click OK to terminate the program.
 
 
 Any ideas?  Thank you,
 Adrian
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 



__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] memory problems with lm

2004-04-29 Thread Prof Brian Ripley
This may or may not be the same problem (which is already solved). But 
please read the section on BUGS in the R FAQ and set up a reproducible 
example.  Then try out the current version of r-patched (one dated 
tomorrow or later, to be safe) and see if the problem recurs.  If it does, 
please file a bug report.

My guess is that eff.fro$std.dev is a 1D array (use dim or str to find 
out), and you did not intend that.
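
A hedged sketch of that check, and of the corresponding fix:

dim(eff.fro$std.dev)    # NULL for a plain vector; 6 for a 1D array
str(eff.fro$std.dev)
eff.fro$std.dev <- as.vector(eff.fro$std.dev)   # drop the dim/dimnames
reg <- lm(log(mean) ~ log(std.dev), data = eff.fro)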

On Thu, 29 Apr 2004, Adrian Dragulescu wrote:

 
 Hello list,
 
 I've seen the recent discussions documenting problems with lm.
 
 I have encountered the following problem.  I use WinXP Pro with
 service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM.
 
  eff.fro
   std.dev mean
 NSTRDSP  7.403749e-01 1.215686e-01
 CPFGEP   9.056763e+00 1.815686e+00
 WSWOLF   4.703588e+05 1.112832e+05
 NPILGRIM 1.017640e+06 2.134335e+05
 WSNMILE  1.367312e+07 1.892021e+06
 WSHIDESL 1.830811e+07 1.892021e+06
  reg - lm(log(mean) ~ log(std.dev), data=eff.fro)
 Error in model.matrix.default(mt, mf, contrasts) :
 cannot allocate vector of length 1074790452
  log(eff.fro$mean)
 [1] -2.1072763  0.5964635 11.6198339 12.2710808 14.4531561
 [6] 14.4531561
  reg - lm(log(eff.fro$mean) ~ log(eff.fro$std.dev))
 Error: cannot allocate vector of size 3360077 Kb
  lef - log(eff.fro)
  lef
 std.dev   mean
 NSTRDSP  -0.3005986 -2.1072763
 CPFGEP2.2035117  0.5964635
 WSWOLF   13.0612512 11.6198339
 NPILGRIM 13.8329973 12.2710808
 WSNMILE  16.4309427 14.4531561
 WSHIDESL 16.7228546 14.4531561
  lef - log(eff.fro)
  reg - lm(lef$mean ~ lef$std.dev)
 
 Here the my computer completely crashed.  A window poped-up and said
 memory problem at address ..., and if I want to debug.
 
 I ran the same code one more time, and it worked but it did not work
 how I wanted (where is the slope?):
  reg - lm(lef$mean ~ lef$std.dev)
  reg
 
 Call:
 lm(formula = lef$mean ~ lef$std.dev)
 
 Coefficients:
 (Intercept)
   8.548
 
 
  summary(reg)
 
 Call:
 lm(formula = lef$mean ~ lef$std.dev)
 
 Residuals:
   1   2   3   4   5   6
 -10.655  -7.951   3.072   3.723   5.905   5.905
 
 Coefficients:
 Estimate Std. Error t value Pr(|t|)
 (Intercept)8.548  2.9992.85   0.0358 *
 ---
 Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
 
 Residual standard error: 7.346 on 5 degrees of freedom
 
 I ran again:
 
   reg - lm(log(mean) ~ log(std.dev), data=eff.fro)
 
 and I get the pop-up:
 The instruction at 0x6b4c45a5 referenced memory at 0x0032374a.
 The memory could not be read.  Click OK to terminate the program.
 
 
 Any ideas?  Thank you,
 Adrian
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] memory problems with lm

2004-04-29 Thread Adrian Dragulescu

If I force the variables to be numeric, it works fine.

 str(eff.fro)
`data.frame':   6 obs. of  2 variables:
 $ std.dev: num [, 1:6] 7.40e-01 9.06e+00 4.70e+05 1.02e+06 1.37e+07 ...
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr  "NSTRDSP" "CPFGEP" "WSWOLF" "NPILGRIM" ...
 $ mean   : num  1.22e-01 1.82e+00 1.11e+05 2.13e+05 1.89e+06 ...
 gc()
         used (Mb) gc trigger (Mb)
Ncells 578941 15.5    1166886 31.2
Vcells 589444  4.5    2377385 18.2
 eff.fro
  std.dev mean
NSTRDSP  7.403749e-01 1.215686e-01
CPFGEP   9.056763e+00 1.815686e+00
WSWOLF   4.703588e+05 1.112832e+05
NPILGRIM 1.017640e+06 2.134335e+05
WSNMILE  1.367312e+07 1.892021e+06
WSHIDESL 1.830811e+07 1.892021e+06
 reg <- lm(log(as.numeric(mean)) ~ log(as.numeric(std.dev)),
data=eff.fro)
 reg

Call:
lm(formula = log(as.numeric(mean)) ~ log(as.numeric(std.dev)), data =
eff.fro)

Coefficients:
 (Intercept)  log(as.numeric(std.dev))
     -1.6368                    0.9864


Adrian


On Thu, 29 Apr 2004, Liaw, Andy wrote:

 Can you show us the output of str(eff.fro)?  Do you have other things in the
 global environment or the search path that's taking up memory?  What does
 gc() say?

 Andy

  From: Adrian Dragulescu
 
  Hello list,
 
  I've seen the recent discussions documenting problems with lm.
 
  I have encountered the following problem.  I use WinXP Pro with
  service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM.
 
   eff.fro
std.dev mean
  NSTRDSP  7.403749e-01 1.215686e-01
  CPFGEP   9.056763e+00 1.815686e+00
  WSWOLF   4.703588e+05 1.112832e+05
  NPILGRIM 1.017640e+06 2.134335e+05
  WSNMILE  1.367312e+07 1.892021e+06
  WSHIDESL 1.830811e+07 1.892021e+06
   reg - lm(log(mean) ~ log(std.dev), data=eff.fro)
  Error in model.matrix.default(mt, mf, contrasts) :
  cannot allocate vector of length 1074790452
   log(eff.fro$mean)
  [1] -2.1072763  0.5964635 11.6198339 12.2710808 14.4531561
  [6] 14.4531561
   reg - lm(log(eff.fro$mean) ~ log(eff.fro$std.dev))
  Error: cannot allocate vector of size 3360077 Kb
   lef - log(eff.fro)
   lef
  std.dev   mean
  NSTRDSP  -0.3005986 -2.1072763
  CPFGEP2.2035117  0.5964635
  WSWOLF   13.0612512 11.6198339
  NPILGRIM 13.8329973 12.2710808
  WSNMILE  16.4309427 14.4531561
  WSHIDESL 16.7228546 14.4531561
   lef - log(eff.fro)
   reg - lm(lef$mean ~ lef$std.dev)
 
  Here the my computer completely crashed.  A window poped-up and said
  memory problem at address ..., and if I want to debug.
 
  I ran the same code one more time, and it worked but it did not work
  how I wanted (where is the slope?):
   reg - lm(lef$mean ~ lef$std.dev)
   reg
 
  Call:
  lm(formula = lef$mean ~ lef$std.dev)
 
  Coefficients:
  (Intercept)
8.548
 
  
   summary(reg)
 
  Call:
  lm(formula = lef$mean ~ lef$std.dev)
 
  Residuals:
1   2   3   4   5   6
  -10.655  -7.951   3.072   3.723   5.905   5.905
 
  Coefficients:
  Estimate Std. Error t value Pr(|t|)
  (Intercept)8.548  2.9992.85   0.0358 *
  ---
  Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
 
  Residual standard error: 7.346 on 5 degrees of freedom
 
  I ran again:
 
reg - lm(log(mean) ~ log(std.dev), data=eff.fro)
 
  and I get the pop-up:
  The instruction at 0x6b4c45a5 referenced memory at 0x0032374a.
  The memory could not be read.  Click OK to terminate the program.
 
 
  Any ideas?  Thank you,
  Adrian
 
  __
  [EMAIL PROTECTED] mailing list
  https://www.stat.math.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
  http://www.R-project.org/posting-guide.html
 
 




__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] memory problems with lm

2004-04-29 Thread Liaw, Andy
I believe Prof. Ripley is right.  The problem is

  $ std.dev: num [, 1:6] 7.40e-01 9.06e+00 4.70e+05 1.02e+06 
 1.37e+07 ...
   ..- attr(*, dimnames)=List of 1

which looks like an array, rather than a vector.

Andy

 From: Adrian Dragulescu [mailto:[EMAIL PROTECTED] 
 
 If I enforce the variables to be numeric it works fine.
 
  str(eff.fro)
 `data.frame':   6 obs. of  2 variables:
  $ std.dev: num [, 1:6] 7.40e-01 9.06e+00 4.70e+05 1.02e+06 
 1.37e+07 ...
   ..- attr(*, dimnames)=List of 1
   .. ..$ : chr  NSTRDSP CPFGEP WSWOLF NPILGRIM ...
  $ mean   : num  1.22e-01 1.82e+00 1.11e+05 2.13e+05 1.89e+06 ...
  gc()
  used (Mb) gc trigger (Mb)
 Ncells 578941 15.51166886 31.2
 Vcells 589444  4.52377385 18.2
  eff.fro
   std.dev mean
 NSTRDSP  7.403749e-01 1.215686e-01
 CPFGEP   9.056763e+00 1.815686e+00
 WSWOLF   4.703588e+05 1.112832e+05
 NPILGRIM 1.017640e+06 2.134335e+05
 WSNMILE  1.367312e+07 1.892021e+06
 WSHIDESL 1.830811e+07 1.892021e+06
  reg - lm(log(as.numeric(mean)) ~ log(as.numeric(std.dev)),
 data=eff.fro)
  reg
 
 Call:
 lm(formula = log(as.numeric(mean)) ~ 
 log(as.numeric(std.dev)), data =
 eff.fro)
 
 Coefficients:
  (Intercept)  log(as.numeric(std.dev))
  -1.63680.9864
 
 
 Adrian
 
 
 On Thu, 29 Apr 2004, Liaw, Andy wrote:
 
  Can you show us the output of str(eff.fro)?  Do you have 
 other things in the
  global environment or the search path that's taking up 
 memory?  What does
  gc() say?
 
  Andy
 
   From: Adrian Dragulescu
  
   Hello list,
  
   I've seen the recent discussions documenting problems with lm.
  
   I have encountered the following problem.  I use WinXP Pro with
   service pack 1, and R 1.9.0, on a XEON 2GHz, with 1GB of RAM.
  
eff.fro
 std.dev mean
   NSTRDSP  7.403749e-01 1.215686e-01
   CPFGEP   9.056763e+00 1.815686e+00
   WSWOLF   4.703588e+05 1.112832e+05
   NPILGRIM 1.017640e+06 2.134335e+05
   WSNMILE  1.367312e+07 1.892021e+06
   WSHIDESL 1.830811e+07 1.892021e+06
reg - lm(log(mean) ~ log(std.dev), data=eff.fro)
   Error in model.matrix.default(mt, mf, contrasts) :
   cannot allocate vector of length 1074790452
log(eff.fro$mean)
   [1] -2.1072763  0.5964635 11.6198339 12.2710808 14.4531561
   [6] 14.4531561
reg - lm(log(eff.fro$mean) ~ log(eff.fro$std.dev))
   Error: cannot allocate vector of size 3360077 Kb
lef - log(eff.fro)
lef
   std.dev   mean
   NSTRDSP  -0.3005986 -2.1072763
   CPFGEP2.2035117  0.5964635
   WSWOLF   13.0612512 11.6198339
   NPILGRIM 13.8329973 12.2710808
   WSNMILE  16.4309427 14.4531561
   WSHIDESL 16.7228546 14.4531561
lef - log(eff.fro)
reg - lm(lef$mean ~ lef$std.dev)
  
   Here the my computer completely crashed.  A window 
 poped-up and said
   memory problem at address ..., and if I want to debug.
  
   I ran the same code one more time, and it worked but it 
 did not work
   how I wanted (where is the slope?):
reg - lm(lef$mean ~ lef$std.dev)
reg
  
   Call:
   lm(formula = lef$mean ~ lef$std.dev)
  
   Coefficients:
   (Intercept)
 8.548
  
   
summary(reg)
  
   Call:
   lm(formula = lef$mean ~ lef$std.dev)
  
   Residuals:
 1   2   3   4   5   6
   -10.655  -7.951   3.072   3.723   5.905   5.905
  
   Coefficients:
   Estimate Std. Error t value Pr(|t|)
   (Intercept)8.548  2.9992.85   0.0358 *
   ---
   Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
  
   Residual standard error: 7.346 on 5 degrees of freedom
  
   I ran again:
  
 reg - lm(log(mean) ~ log(std.dev), data=eff.fro)
  
   and I get the pop-up:
   The instruction at 0x6b4c45a5 referenced memory at 0x0032374a.
   The memory could not be read.  Click OK to terminate 
 the program.
  
  
   Any ideas?  Thank you,
   Adrian
  
   __
   [EMAIL PROTECTED] mailing list
   https://www.stat.math.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide!
   http://www.R-project.org/posting-guide.html
  
  
 
 
  
 
 
 



Re: [R] memory problems in NLME

2003-03-30 Thread Douglas Bates
Vumani Dlamini [EMAIL PROTECTED] writes:

 I am trying to fit a random coefficient model with about 30 covariates,
 with a random intercept and one random slope. The data set has 65000
 observations, and whenever I use lme I get the message that all memory
 has been used.

Do you know what the number of columns in the model matrix for the
fixed-effects will be?  You say you have 30 covariates but if some of
those are factors or if you take powers of continuous covariates or
interactions between terms then the number of columns in the model
matrix can be much larger than 30.  Given the dimensions you mention
it seems that the model matrix for the fixed effects is nearly 16 MB
in size or larger.  Evaluation of the log-likelihood requires 3 or 4
copies of matrices like this plus the original data frame and the
memory being used by other R objects.
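
The arithmetic behind that figure, assuming an intercept plus 30 numeric
covariates (factors or interactions would add more columns):

65000 * 31 * 8 / 2^20    # rows * columns * bytes per double, in MB
## [1] 15.37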

 I was wondering whether there is a more efficient way of fitting the model.

Saikat DebRoy and I are working on a new package for lme and related
functions using S4 classes.  That package, which we plan to release in
a 'snapshot' form shortly after R-1.7.0 is released (scheduled for April 16,
2003), controls the number of copies of the model matrices being
created.

I can run this example on the new package and the old package and
provide comparisons if you wish.  I use a machine with 1 GB of memory
which should be enough. Please contact me off-list.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] memory problems

2003-01-24 Thread Ernesto Jardim
Hi

I'm using SuSE 8.0 and R 1.6.2. The mem.limits are not set, so it should
go to the maximum the machine allows.

What puzzles me is that I have 2GB and R is complaining about allocating
less than 500MB.

Regards

EJ

On Fri, 2003-01-24 at 22:22, Andrew C. Ward wrote:
 Ernesto,
 
 I can't tell what version of R you're using and for which platform.
 In any case, there are some start-up options relating to memory
 usage, and you will find discussions of these in the relevant
 FAQ. Under Windows, the amount of memory that R uses is set by the
 command-line flag --max-mem-size.
 
 An alternative is to perform your analysis on just a few random
 subsets of data and then aggregate the results. I don't know how
 big your data set actually is so it's hard to provide more
 specific guidance.
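 
 A hedged sketch of that subset idea, using the object names from your post
 (the subset size is arbitrary, and you would still need to decide how to
 aggregate the results across several such subsets):
 
 library(boot)
 sub <- logglm.data[sample(nrow(logglm.data), 5000), ]
 blm_sub <- boot(sub, boot.fishpower, R = 2500, coef.vec = coeflm.vec)
 boot.ci(blm_sub, index = 29, type = "bca")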
 
 Post again if you're still having trouble.
 
 
 
 Regards,
 
 Andrew C. Ward
 
 CAPE Centre
 Department of Chemical Engineering
 The University of Queensland
 Brisbane Qld 4072 Australia
 [EMAIL PROTECTED]
 
 
 
 On Friday, January 24, 2003 10:02 PM, Ernesto Jardim [SMTP:[EMAIL PROTECTED]] wrote:
  Hi
  
  I'm computing a bca interval using bca.ci from the boot package.
  
  When I try to use this I get an error 
  
  
   library(boot)
   boot(logglm.data,boot.fishpower,2500,coef.vec=coeflm.vec) -> blm8901
   bca.ci(blm8901,index=29)
  Error: cannot allocate vector of size 456729 Kb
  
  However my machine has 2GB of memory and without R running I only have
  112M of memory used.
  
  Is there something I can do to be able to perform this analysis? (I cannot
  buy more memory ;-)
  
  Thanks
  
  EJ
  
  -- 
  Ernesto Jardim [EMAIL PROTECTED]
  Marine Biologist
  IPIMAR - National Research Institute for Agriculture and Fisheries
  Av. Brasilia, 1400-006
  Lisboa, Portugal
  Tel: +351 213 027 000
  Fax: +351 213 015 948
  http://ernesto.freezope.org
  
  __
  [EMAIL PROTECTED] mailing list
  http://www.stat.math.ethz.ch/mailman/listinfo/r-help
-- 
Ernesto Jardim [EMAIL PROTECTED]
Marine Biologist
Research Institute for Agriculture and Fisheries
Lisboa, Portugal
Tel: +351 213 027 000
Fax: +351 213 015 948

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] memory problems

2003-01-24 Thread Peter Dalgaard BSA
[EMAIL PROTECTED] writes:

  However my machine has 2GB of memory and without R running I only have
  112M of memory used.
 
 How much memory is it actually using?  It is complaining about allocating
 an *additional* 450Mb.  Look at top / Task Manager / whatever.

It's not the first time we're seeing that type of question. I wonder
if we could make the error message more informative. Not that it is
going to make the problem go away, but it could help put the user in
the picture. It's a bit tricky, because there are limits to what you
can make the system do in an out of memory condition. One idea might
be to keep tabs on the total amount of memory allocated and freed.
However, there are issues of counter overruns to deal with, and you'd
still have to explain to people that not all free memory is available
for allocation; you really only need to allocate a handful of *bytes*
sufficiently unfortunately spaced to make it impossible to find a
450MB contiguous block in a 2GB address space.

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help