Hi Garth,
Your code is really confusing! You should start by reading the help file
on the for() function and understanding what it does:
?"for"
Your line
for(i in 1:nboot){
}
is simply starting a loop around the variable 'i', which will change
values following the sequence 1:nboot.
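To see this for yourself, here is a minimal sketch (the value of nboot and the vector name 'seen' are made up purely for illustration). The body of the loop runs once for each value of 'i'; the loop by itself does not remove or replace anything in your data.

```r
nboot <- 5                      # hypothetical value, for illustration only
seen <- integer(0)              # collects the values that 'i' takes
for (i in 1:nboot) {
  seen <- c(seen, i)            # the body runs once per value of i
}
print(seen)                     # prints: [1] 1 2 3 4 5
```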
It seems that the problem (or part of it) is that you are calling the
sample() function using an 'n' variable that is not defined inside the function.
Also, what is nboot supposed to be? The number of samples to be taken
(10, 20, etc.) or the number of iterations (1000)? In your example, you
are calling your function as
bt.cor <- npboot.function(nboot=10)
so in this case your function will loop around 10 times.
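As an aside on sample(): it draws from a vector of values, so to resample observations you would normally sample row numbers and then use them to index the data frame. A small sketch, assuming a made-up data frame 'toy' (not your data):

```r
set.seed(1)                                   # only so the example is repeatable
toy <- data.frame(y = 1:6, x1 = (1:6) * 2)    # hypothetical data, 6 observations
# Draw 4 row numbers; replace = TRUE allows the same row to be picked twice
idx <- sample(seq_len(nrow(toy)), size = 4, replace = TRUE)
resampled <- toy[idx, ]                       # the rows picked by 'idx'
nrow(resampled)                               # 4
```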
Here is a function that will do what you want:
npboot.function <- function(data, nboot){
  boot.cor <- numeric(1000)
  for (i in 1:1000){
    abc2 <- data[-(1:nboot),]  # Remove the first 'nboot' rows
    # Sample row numbers from the truncated dataset (250-nboot rows in your case)
    my.sample <- sample(1:(nrow(data)-nboot), nboot, replace=TRUE)
    abc2 <- rbind(abc2, abc2[my.sample,])  # Add the sampled rows to the truncated dataset
    model <- lm(asin(sqrt(abc2$y/100)) ~ abc2$x1 + abc2$x2)  # Fit the model
    boot.cor[i] <- cor(abc2$y, fitted(model))  # Get correlation
  }
  return(boot.cor)
}
# Assumes 'abc' holds the full 250-row dataset, not the 6-row example
bt.cor <- npboot.function(abc, nboot=120)
bootmean <- mean(bt.cor)
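If you then want the mean correlation for several values of nboot (10, 20 and so on, as in your description), one option is to wrap the call in sapply(). This is only a sketch under assumptions: the data frame 'sim' is simulated to stand in for your 250 locations, and 'boot_cor', 'n_removed' and 'n_iter' are names I made up. It follows the same idea as npboot.function above, with the number of iterations as an argument (set n_iter to 1000 for the real analysis).

```r
set.seed(42)                                   # repeatable illustration only
# Simulated stand-in for the real 250-row data (an assumption, not your data)
sim <- data.frame(x1 = runif(250), x2 = runif(250))
sim$y <- 100 * plogis(-1 + 2 * sim$x1 + sim$x2 + rnorm(250, sd = 0.3))  # y stays in (0, 100)

boot_cor <- function(data, n_removed, n_iter = 200) {
  out <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    kept <- data[-seq_len(n_removed), ]                  # drop the first n_removed rows
    extra <- sample(seq_len(nrow(kept)), n_removed, replace = TRUE)
    d <- rbind(kept, kept[extra, ])                      # back to the full size
    fit <- lm(asin(sqrt(y / 100)) ~ x1 + x2, data = d)   # same transformation as above
    out[i] <- cor(d$y, fitted(fit))
  }
  out
}

n_removed <- seq(10, 50, by = 10)              # 10, 20, ..., 50 rows removed
mean_cor <- sapply(n_removed, function(k) mean(boot_cor(sim, k)))
names(mean_cor) <- n_removed                   # label each mean by rows removed
```

Plotting mean_cor against n_removed would then show how the fit changes as more of the original records are replaced by copies.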
[EMAIL PROTECTED] wrote:
Hi,
I'm new to R (and statistics) and my boss has thrown me in the deep-end with
the following task:
We want to evaluate the impact that sampling size has on our ability to
create a robust model, or evaluate how robust the model is to sample size for
the purpose of cross-validation i.e. in our current project we have collected
a series of independent data at 250 locations, from which we have built a
predictive model, we want to know whether we could get away with collecting
fewer samples and still build a decent model; for the obvious operational
reasons of cost, time spent in the field etc.
Our thinking was that we could apply a bootstrap type procedure:
We would remove 10 records or samples from the total n=250 and then replace
those 10 removed with replacements (or copies) from the remaining 240. With
this new data-frame we would apply our model and calculate an r², we would
then repeat through looping 1000 times before generating the mean r² from
those 1000 r² values. After that we would start the process again by
removing 20 samples from our data, with replacements from the remaining 230
records, and so on...
Below is a simplified version of the real code which contains most of the
basic elements. My main problem is that I'm not sure what the 'for(i in 1:nboot)'
line is doing. Originally I thought it meant that 1 sample or record was
removed from the data and replaced by a copy of one of the records from the
remaining n, so that 'for(i in 10:nboot)', when used in the context of the
code below, removed 10 samples with replacements as I have said above. I'm
almost positive that this isn't happening; if not, how can I make the code
below do what we want it to?
library(utils)
#data
a <- c(5.5, 2.3, 8.5, 9.1, 8.6, 5.1)
b <- c(5.2, 2.2, 8.6, 9.1, 8.8, 5.7)
c <- c(5.0,14.6, 8.9, 9.0, 9.1, 5.5)
#join
abc <- data.frame(a,b,c)
#set column names
names(abc)[1]<-"y"
names(abc)[2]<-"x1"
names(abc)[3]<-"x2"
abc2 <- abc
#sample
abc3 <- as.data.frame(t(as.matrix(data.frame(abc2))))
n <- length(abc2)
npboot.function <- function(nboot)
{
boot.cor <- vector(length=nboot)
for(i in 1:nboot){
rdata <- sample(abc3,n,replace=T)
abc4 <- as.data.frame(t(as.matrix(data.frame(rdata))))
model <- lm(asin(sqrt(abc4$y/100)) ~ I(abc4$x1^2) + abc4$x2)
boot.cor[i] <- cor(abc4$y, model$fit)}
boot.cor
}
bt.cor <- npboot.function(nboot=10)
bootmean <- mean(bt.cor)
Any assistance would be greatly appreciated, also the sooner the better as we
are under pressure to reach a conclusion.
Cheers,
Garth
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.