Re: [R] loops sampling

2007-11-01 Thread Julian Burgos
Hi Garth,

Your code is really confusing! You should start by reading the help page 
for the for() loop and understanding what it does. Because 'for' is a 
reserved word, it has to be quoted:

?"for"

Your line
for(i in 1:nboot){

}

simply starts a loop over the variable 'i', which takes each value of the 
sequence 1:nboot in turn. By itself it does not remove or resample anything.
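
As a minimal illustration (the names 'nboot' and 'seen' below are made up for this example), the loop body runs once per value of the sequence and touches no data unless the body does so explicitly:

```r
nboot <- 3
seen <- integer(0)
for (i in 1:nboot) {
  # i takes the values 1, 2, 3 in turn; the loop itself does no sampling
  seen <- c(seen, i)
}
seen
```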

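It seems that the problem (or part of it) is that you are calling the 
sample() function with a variable 'n' that is not defined inside the 
function. It is picked up from the global environment, where length(abc2) 
gives the number of columns of the data frame, not the number of rows.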
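
To resample records it is usually easiest to sample row indices and then subset by row. Here is a rough sketch on the toy data from your message (the names 'n_cols', 'n_rows', 'idx' and 'resampled' and the seed are just for illustration). Note that length() on a data frame counts columns, while nrow() counts rows:

```r
abc <- data.frame(y  = c(5.5, 2.3, 8.5, 9.1, 8.6, 5.1),
                  x1 = c(5.2, 2.2, 8.6, 9.1, 8.8, 5.7),
                  x2 = c(5.0, 14.6, 8.9, 9.0, 9.1, 5.5))
n_cols <- length(abc)  # number of columns
n_rows <- nrow(abc)    # number of rows (records)

# Resampling records: draw row indices with replacement, then subset by row
set.seed(1)  # arbitrary seed, only so the example is repeatable
idx <- sample(n_rows, n_rows, replace = TRUE)
resampled <- abc[idx, ]
```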

Also, what is nboot supposed to be? The number of samples to be taken 
(10, 20, etc.) or the number of iterations (1000)? In your example, you 
are calling your function as

bt.cor <- npboot.function(nboot=10)

so in this case your function will loop 10 times.

Here is a function that will do what you want:

npboot.function <- function(data, nboot) {
  n <- nrow(data)  # 250 for the real dataset
  boot.cor <- numeric(1000)
  for (i in 1:1000) {
    abc2 <- data[-(1:nboot), ]  # Remove the first 'nboot' rows
    my.sample <- sample(1:(n - nboot), nboot, replace = TRUE)  # Sample rows
    # Add the sampled rows to the truncated dataset
    abc2 <- rbind(abc2, abc2[my.sample, ])
    model <- lm(asin(sqrt(y/100)) ~ x1 + x2, data = abc2)  # Fit the model
    boot.cor[i] <- cor(abc2$y, fitted(model))  # Get correlation
  }
  return(boot.cor)
}

# 'abc' here stands for the full 250-row dataset, not the 6-row toy example
bt.cor <- npboot.function(abc, nboot=120)
bootmean <- mean(bt.cor)
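
To repeat this for 10, 20, ... removed records, one option is to wrap the whole thing in sapply(). The sketch below is only a rough illustration under several assumptions: it uses simulated data in place of the real 250 records, the helper name 'mean_r2', the seed and the reduced number of iterations are made up, and it drops random rows rather than the first 'nboot' rows. It squares the correlation, since you asked for r²:

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable
# Simulated stand-in for the real 250-record dataset (hypothetical values)
sim <- data.frame(x1 = runif(250, 0, 10), x2 = runif(250, 0, 10))
sim$y <- pmin(pmax(5 * sim$x1 + 3 * sim$x2 + rnorm(250, sd = 5), 0), 100)

# Mean r^2 over 'niter' resamples after dropping 'nboot' random rows
mean_r2 <- function(data, nboot, niter = 200) {
  n <- nrow(data)
  r2 <- numeric(niter)
  for (i in seq_len(niter)) {
    kept <- data[-sample(n, nboot), ]  # drop nboot random rows
    # draw nboot copies (with replacement) from the rows that were kept
    extra <- kept[sample(nrow(kept), nboot, replace = TRUE), ]
    d <- rbind(kept, extra)            # back to n rows
    model <- lm(asin(sqrt(y / 100)) ~ x1 + x2, data = d)
    r2[i] <- cor(d$y, fitted(model))^2
  }
  mean(r2)
}

sizes <- seq(10, 50, by = 10)
result <- sapply(sizes, function(k) mean_r2(sim, k))
names(result) <- sizes
result
```

With the real data you would pass your own data frame instead of 'sim' and set niter = 1000.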





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] loops sampling

2007-10-31 Thread Garth.Warren
Hi,

I'm new to R (and statistics), and my boss has thrown me in at the deep end 
with the following task:

We want to evaluate the impact that sample size has on our ability to build 
a robust model, i.e. how robust the model is to sample size for the purpose 
of cross-validation. In our current project we have collected a series of 
independent data at 250 locations, from which we have built a predictive 
model. We want to know whether we could get away with collecting fewer 
samples and still build a decent model, for the obvious operational reasons 
of cost, time spent in the field, etc.

Our thinking was that we could apply a bootstrap-type procedure:

We would remove 10 records (samples) from the total n=250 and then replace 
those 10 with copies drawn from the remaining 240. With this new data frame 
we would fit our model and calculate an r². We would repeat this in a loop 
1000 times and then take the mean of the 1000 r² values. After that we would 
start the process again, removing 20 records from our data and replacing 
them with copies from the remaining 230 records, and so on...

Below is a simplified version of the real code which contains most of the 
basic elements. My main problem is that I'm not sure what the 
'for(i in 1:nboot)' line is doing. Originally I thought it removed 1 record 
from the data and replaced it with a copy of one of the remaining records, 
so that 'for(i in 10:nboot)', used in the context of the code below, would 
remove 10 records with replacement as described above. I'm almost positive 
that this isn't happening. If not, how can I make the code below do what we 
want?

library(utils)

#data
a <- c(5.5, 2.3, 8.5, 9.1, 8.6, 5.1)
b <- c(5.2, 2.2, 8.6, 9.1, 8.8, 5.7)
c <- c(5.0,14.6, 8.9, 9.0, 9.1, 5.5)

#join
abc <- data.frame(a,b,c)

#set column names
names(abc)[1] <- "y"
names(abc)[2] <- "x1"
names(abc)[3] <- "x2"
abc2 <- abc

#sample
abc3 <- as.data.frame(t(as.matrix(data.frame(abc2))))
n <- length(abc2)

npboot.function <- function(nboot)
{
boot.cor <- vector(length=nboot)
for(i in 1:nboot){
rdata <- sample(abc3,n,replace=T)
abc4 <- as.data.frame(t(as.matrix(data.frame(rdata))))
model <- lm(asin(sqrt(abc4$y/100)) ~ I(abc4$x1^2) + abc4$x2)
boot.cor[i] <- cor(abc4$y, model$fit)}
boot.cor
}

bt.cor <- npboot.function(nboot=10)
bootmean <- mean(bt.cor)

Any assistance would be greatly appreciated, and the sooner the better, as we 
are under pressure to reach a conclusion.

Cheers,

Garth

