[R] Cross validation tidyLPA
Is there a cross-validation method available for tidyLPA objects? Linda __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cross validation multivariate kernel regression
> I am planning to implement Nadaraya-Watson regression model, with

I'm not sure what you mean by "implement". Write a package, fit a model, or something else... Reading your whole post, I get the impression you want mid-level "building blocks" so you can customize the model-fitting process in some way. But maybe I've got that wrong. If you want fine control over the model-fitting process (including the cross-validation), then you may have to write your own package, including your own building blocks. Otherwise, I think you should just use what's available. Also, I'm not familiar with every flavor of nonparametric regression available. If I wanted to fit a nonparametric regression model, I would start with the mgcv package, which is hard to beat.
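For anyone landing on this thread later, a minimal mgcv sketch (simulated data; x1, x2 and y are illustrative names, not from the original post) showing that smoothness selection is built in, so no hand-rolled bandwidth cross-validation is needed:

```r
# Minimal sketch: smooth multi-predictor regression with mgcv (ships with R).
# Data are simulated; variable names are illustrative only.
library(mgcv)

set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + x2^2 + rnorm(n, sd = 0.2)

# Smoothness of each s() term is selected automatically (REML here; GCV is
# the other common choice), which plays the role of CV-tuned bandwidths.
fit  <- gam(y ~ s(x1) + s(x2), method = "REML")
pred <- predict(fit, newdata = data.frame(x1 = 0.5, x2 = 0.5))
```

The same call scales to three or four predictors by adding further s() terms.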
[R] Cross validation multivariate kernel regression
Hi, This question is general. I have a data set of n observations, consisting of a single response variable y and p regressor variables (n ~ 50, p ~ 3 or 4). I am planning to implement the Nadaraya-Watson regression model, with bandwidths optimized via cross-validation. For cross-validation, I will need to choose 10 outsample/test data sets of a given size (= n/10) for each choice of the bandwidth vector, and then choose the optimum bandwidth vector (in terms of MSE or any reasonable loss function; we can take it to be MSE, as an example). The difficulty is I can't find any code to do this under: A) multiple regressors (p > 1) AND B) I get to choose the outsample datasets. Thanks for any help/insight you can provide. Regards, Preetam
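Since no ready-made code seems to cover both requirements, here is a self-contained sketch (simulated data, illustrative names, one shared bandwidth for all dimensions for brevity) of a product-kernel Nadaraya-Watson estimator with explicitly constructed 10-fold test sets; for production use, the np package (npregbw with bwmethod = "cv.ls") automates least-squares CV bandwidth selection for p > 1:

```r
# Sketch: Nadaraya-Watson with a product Gaussian kernel, bandwidth chosen
# by 10-fold CV over a small grid. All data and names are illustrative.
set.seed(1)
n <- 50; p <- 3
X <- matrix(runif(n * p), n, p)
y <- sin(2 * pi * X[, 1]) + X[, 2] + rnorm(n, sd = 0.1)

nw_predict <- function(Xtr, ytr, Xte, h) {
  # h: vector of p bandwidths; weights from a product Gaussian kernel
  apply(Xte, 1, function(x0) {
    w <- exp(-0.5 * colSums(((t(Xtr) - x0) / h)^2))
    sum(w * ytr) / sum(w)
  })
}

fold <- sample(rep(1:10, length.out = n))  # 10 disjoint test sets, size ~ n/10
grid <- c(0.05, 0.1, 0.2, 0.4)             # candidate common bandwidths
cv_mse <- sapply(grid, function(h) {
  mean(sapply(1:10, function(k) {
    te <- fold == k
    mean((y[te] - nw_predict(X[!te, , drop = FALSE], y[!te],
                             X[te, , drop = FALSE], rep(h, p)))^2)
  }))
})
h_best <- grid[which.min(cv_mse)]
```

Replacing `grid` with a grid of bandwidth vectors (e.g. via expand.grid) gives per-dimension bandwidth optimization with the same loop.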
[R] Cross-validation : can't get the predicted response on the testing data
Dear R-experts, I am doing cross-validation for 2 robust regressions (HBR and fast Tau) and can't get the 2 error rates (RMSE and MAPE). The problem is predicting the response on the testing data: I get 2 error messages. Below is a reproducible (fictional) example.

# install.packages("MLmetrics")
# install.packages("robustbase")
# install.packages("MASS")
# install.packages("quantreg")
# install.packages("RobPer")
# install.packages("scatterplot3d")
# install.packages("devtools")
# library("devtools")
# install_github("kloke/hbrfit")
# install.packages('http://www.stat.wmich.edu/mckean/Stat666/Pkgs/npsmReg2_0.1.1.tar.gz')
library(robustbase)
library(MASS)
library(quantreg)
library(RobPer)
library(scatterplot3d)
library(hbrfit)
library(MLmetrics)
# numeric variables
A <- c(2,3,4,3,2,6,5,6,4,3,5,55,6,5,4,5,6,6,7,52)
B <- c(45,43,23,47,65,21,12,7,18,29,56,45,34,23,12,65,4,34,54,23)
C <- c(21,54,34,12,4,56,74,3,12,71,14,15,63,34,35,23,24,21,69,32)
# Create a data frame
BIO <- data.frame(A, B, C)
# randomize sampling seed
set.seed(1)
n <- dim(BIO)[1]
p <- 0.667  # sample size fraction
sam <- sample(1:n, floor(p*n), replace=FALSE)
# Sample training data
Training <- BIO[sam, ]
# Sample testing data
Testing <- BIO[-sam, ]
# Build the 2 models
fit <- FastTau(model.matrix(~Training$A+Training$B), Training$C)
HBR <- hbrfit(C ~ A + B)
# Predict the response on the testing data (these two lines error)
ypred <- predict(fit, newdata=Testing)
ypred <- predict(HBR, newdata=Testing)
# Get the true response from testing data
y <- BIO[-sam, ]$C
# Get error rates
RMSE <- sqrt(mean((y-ypred)^2))
RMSE
MAPE <- mean(abs((y-ypred)/y))
MAPE
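One way around fitters that lack a predict() method for new data is to build the test-set design matrix yourself and multiply by the fitted coefficients. A sketch, with lm standing in for FastTau/hbrfit so it runs anywhere, and with simulated data:

```r
# Sketch: manual prediction when a fitted object has no usable predict().
# lm stands in for the robust fitters; the idea is identical: coefficients
# times the design matrix built on the test data with the training formula.
set.seed(1)
BIO <- data.frame(A = rnorm(20, 10), B = rnorm(20, 30), C = rnorm(20, 25))
sam <- sample(1:20, 13)
Training <- BIO[sam, ]
Testing  <- BIO[-sam, ]

fit   <- lm(C ~ A + B, data = Training)
Xte   <- model.matrix(~ A + B, data = Testing)  # intercept + A + B columns
ypred <- drop(Xte %*% coef(fit))

y    <- Testing$C
RMSE <- sqrt(mean((y - ypred)^2))
MAPE <- mean(abs((y - ypred) / y))              # note the parentheses
```

For the robust fits, coef() on the returned object (or the coefficient slot the package documents) plays the role of coef(fit) here.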
Re: [R] cross validation in random forest using rfcv function
> On Aug 23, 2017, at 10:59 AM, Elahe chalabi via R-help wrote: > > Any responses?!

When I look at the original post I see a question about a function named `rfcv` but do not see a `library` call to load such a function. I also see a reference to a help page or vignette, perhaps, from that unidentified package. So it appears to me that you expect the rest of us to go searching for that function if we do not use it on a regular basis. You also apparently expect us to construct a dataset for testing. I'm not inclined to make all that effort, and from the crashing silence of the last 24 hours on this venue, it appears I am not alone in thinking you presume too much. Read the Posting Guide and try to better understand why your behavior might not be eliciting the level of interest you were hoping for. -- David.

> On Wednesday, August 23, 2017 5:50 AM, Elahe chalabi via R-help wrote:
> Hi all, I would like to do cross validation in random forest using the rfcv function. As the documentation for this package says:
> rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5, mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)
> however I don't know how to build trainx and trainy for my data set, and I could not understand the way trainx is built in the package documentation example for the iris data set. Here is my data set; I want to do cross validation to see accuracy in classifying the Alzheimer and Control groups:
> str(data)
> 'data.frame': 499 obs. of 606 variables:
>  $ Gender        : int 0 0 0 0 0 1 1 1 1 1 ...
>  $ NumOfWords    : num 157 111 163 176 100 124 201 100 76 101 ...
>  $ NumofLivings  : int 6 6 9 4 3 5 3 3 4 3 ...
>  $ NumofStopWords: num 77 45 87 91 46 64 104 37 32 41 ...
>  .
>  .
>  $ Group         : Factor w/ 2 levels "Alzheimer","Control": ...
> So basically trainy should be data$Group, but how about trainx? Could anyone help me with this?
> Thanks for any help!
> > Elahe

David Winsemius, Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
[R] cross validation in random forest using rfcv function
Any responses?! On Wednesday, August 23, 2017 5:50 AM, Elahe chalabi via R-help wrote:

> Hi all, I would like to do cross validation in random forest using the rfcv function. As the documentation for this package says: rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5, mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...) however I don't know how to build trainx and trainy for my data set [...] So basically trainy should be data$Group, but how about trainx? Could anyone help me with this? Thanks for any help! Elahe
[R] cross validation in random forest using rfcv function
Hi all, I would like to do cross validation in random forest using the rfcv function. As the documentation for this package says:

rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5, mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)

however I don't know how to build trainx and trainy for my data set, and I could not understand the way trainx is built in the package documentation example for the iris data set. Here is my data set; I want to do cross validation to see accuracy in classifying the Alzheimer and Control groups:

str(data)
'data.frame': 499 obs. of 606 variables:
 $ Gender        : int 0 0 0 0 0 1 1 1 1 1 ...
 $ NumOfWords    : num 157 111 163 176 100 124 201 100 76 101 ...
 $ NumofLivings  : int 6 6 9 4 3 5 3 3 4 3 ...
 $ NumofStopWords: num 77 45 87 91 46 64 104 37 32 41 ...
 .
 .
 $ Group         : Factor w/ 2 levels "Alzheimer","Control": ...

So basically trainy should be data$Group, but how about trainx? Could anyone help me with this? Thanks for any help! Elahe
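For completeness, a sketch of the trainx/trainy split on a toy data frame standing in for the real 499 x 606 one (the rfcv call itself is guarded, since randomForest may not be installed):

```r
# Sketch: building trainx / trainy for randomForest::rfcv.
# trainx = all predictor columns, trainy = the class factor.
set.seed(1)
data <- data.frame(Gender     = rbinom(40, 1, 0.5),
                   NumOfWords = rpois(40, 120),
                   Group      = factor(sample(c("Alzheimer", "Control"),
                                              40, replace = TRUE)))

trainx <- data[, setdiff(names(data), "Group")]  # drop the response column
trainy <- data$Group

# Then, if randomForest is installed:
if (requireNamespace("randomForest", quietly = TRUE)) {
  res <- randomForest::rfcv(trainx, trainy, cv.fold = 5)
  res$error.cv  # CV error rate vs. number of predictors retained
}
```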
Re: [R] Cross-Validation for Zero-Inflated Models
1) Helpdesk implies people whose job it is to provide support. R-help is a mailing list in which users help each other when they have spare time. 2) You sent an email to the R-help mailing list, not to Lara, whoever that is. I suggest you figure out what her email address is and send your question to her directly, or read the Posting Guide mentioned below and then pose an entirely new question of your own to the list. There is a lot of existing research, and there are packages, related to cross-validation, but you are going to need to illustrate why you think the usual tools are not sufficient. Have you looked at the CRAN Task Views? 3) Email only has linkage to other email when it follows as a reply... you did not reply to her email, so no one reading your email (quite likely even Lara, if she is even still on the list) has any idea what question you are referring to. -- Sent from my phone. Please excuse my brevity.

On June 21, 2017 7:16:49 AM PDT, Eric Weine wrote:
>Lara:
>
>I see you sent this email to the R helpdesk a really long time ago, but I was just wondering if you ever got an answer to this question. I was just thinking that I would build my own cross validation function, but if you figured out a way to do this automatically, could you let me know?
>
>Thanks,
>
>Eric Weine.
Re: [R] Cross-Validation for Zero-Inflated Models
Lara: I see you sent this email to the R helpdesk a really long time ago, but I was just wondering if you ever got an answer to this question. I was just thinking that I would build my own cross validation function, but if you figured out a way to do this automatically, could you let me know? Thanks, Eric Weine.
[R] cross validation with variables which have one factor only
Dear R-team, I did a model selection by AIC which explains the habitat use of my animals in six different study sites (see attached files: cross_val_CORINE04032014.csv and cross_val_CORINE04032014.r). Sites were used as a random factor because they are distributed over the Alps and so are very different. I also removed variables which exist in one study area only before doing the model selection. Next, I tried to do a cross validation with the estimated best model for its prediction per site. That means I used the model of five sites together against the remaining site. In this step I received an error:

val_10_fold_minger <- cv.glm(data = minger, glmfit = best_model_year, K = 10)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

So for some of the variables used in the model formula below there are actually not two factor levels (example: C324F, where absence: 153 but presence: 0):

best_model_year <- glm(dung1_b ~ C231F + C324F + C332F, family = binomial(logit), minger)

Does somebody know of a cross-validation method which can deal with variables that have one factor level only? Kindly, Maik
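One common workaround is to drop single-level predictors from the training data before each fold's refit. A sketch on simulated data (drop_constant and the column names are illustrative, modelled on the ones in the post):

```r
# Sketch: remove predictors with a single observed level before fitting,
# so glm()/cv.glm() no longer hit the "contrasts" error.
drop_constant <- function(df, response) {
  keep <- vapply(df, function(x) length(unique(x[!is.na(x)])) > 1, logical(1))
  keep[response] <- TRUE  # never drop the response itself
  df[, keep, drop = FALSE]
}

set.seed(1)
minger <- data.frame(
  dung1_b = rbinom(30, 1, 0.5),
  C231F   = factor(sample(c("absence", "presence"), 30, replace = TRUE)),
  C324F   = factor(rep("absence", 30))  # one level only, as in the post
)

clean <- drop_constant(minger, "dung1_b")  # C324F is gone
fit   <- glm(dung1_b ~ ., family = binomial, data = clean)
```

Inside a leave-one-site-out loop, the same cleaning is applied to each training subset, so the retained variable set can legitimately differ between folds.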
[R] Cross validation in R
Guys, I select 70% of my data and keep 30% of it for model validation.

mydata <- read.csv(file.choose(), header=TRUE)
select <- sample(nrow(mydata), nrow(mydata) * .7)
data70 <- mydata[select,]  # training
data30 <- mydata[-select,] # testing
temp.glm <- glm(Death ~ Temperature, data=data70, family=binomial(link=logit))
library(ROCR)
# ROC curve and assessment of my prediction
data30$pred <- predict(temp.glm, newdata=data30, type="response")
pred <- prediction(data30$pred, data30$Death)
perf <- performance(pred, "tpr", "fpr")
plot(perf); abline(0, 1, col="red")
attributes(performance(pred, 'auc'))$y.values[[1]]  # area under the ROC

How do I make a loop so that the process can be repeated several times, producing random ROC curves and area-under-ROC values?
Re: [R] Cross validation in R
This code is untested, since you did not provide any example data. But it may help you get started. Jean

mydata <- read.csv(file.choose(), header=TRUE)
library(ROCR)
# ROC curve and assessment of my prediction
plot(0:1, 0:1, type="n", xlab="False positive rate", ylab="True positive rate")
abline(0, 1, col="red")
nsim <- 5
auc <- rep(NA, nsim)
for(i in 1:nsim) {
  select <- sample(nrow(mydata), round(nrow(mydata)*0.7))
  data70 <- mydata[select, ]  # train
  data30 <- mydata[-select, ] # test
  temp.glm <- glm(Death ~ Temperature, data=data70, family=binomial)
  data30$pred <- predict(temp.glm, newdata=data30, type="response")
  pred <- prediction(data30$pred, data30$Death)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf, add=TRUE)
  auc[i] <- attributes(performance(pred, "auc"))$y.values[[1]]  # area under the ROC
}
auc

On Tue, Jul 2, 2013 at 3:25 AM, Eddie Smith eddie...@gmail.com wrote:

> Guys, I select 70% of my data and keep 30% of it for model validation. [...] How do I make a loop so that the process can be repeated several times, producing random ROC curves and area-under-ROC values?
Re: [R] Cross validation in R
> How do I make a loop so that the process can be repeated several times, producing random ROC curves and area-under-ROC values?

Using the caret package http://caret.r-forge.r-project.org/ -- Max
Re: [R] Cross validation for Naive Bayes and Bayes Networks
Hi Guilherme, On Sun, Apr 14, 2013 at 11:48 PM, Guilherme Ferraz de Arruda gu...@yahoo.com.br wrote:

> Hi, I need to classify, using Naive Bayes and Bayes Networks, and estimate their performance using cross validation. How can I do this? I tried the bnlearn package for Bayes Networks, although I need to get more indexes, not only the error rate (precision, sensitivity, ...).

You can do that using the object returned by bn.cv(), because it contains the predicted values and the indexes of the corresponding observations in the original data, for each fold. It's just a matter of reassembling observed and predicted class labels and computing your metrics.

> I also tried the e1071 package, but I could not find a way to do cross-validation.

You might be able to trick the tune() function into doing it, but I am not sure. Marco -- Marco Scutari, Ph.D. Research Associate, Genetics Institute (UGI), University College London (UCL), United Kingdom
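Reassembling observed and predicted labels and computing the extra indexes is only a few lines of base R. A sketch with random labels (with bn.cv() output you would instead pull the observed/predicted pairs out of each fold and pool them):

```r
# Sketch: precision / sensitivity / specificity from pooled observed and
# predicted class labels. Labels here are random, purely for illustration.
set.seed(1)
obs  <- factor(sample(c("yes", "no"), 100, replace = TRUE))
pred <- factor(sample(c("yes", "no"), 100, replace = TRUE))

cm <- table(pred = pred, obs = obs)  # confusion matrix
TP <- cm["yes", "yes"]; FP <- cm["yes", "no"]
FN <- cm["no",  "yes"]; TN <- cm["no",  "no"]

precision   <- TP / (TP + FP)
sensitivity <- TP / (TP + FN)  # recall
specificity <- TN / (TN + FP)
```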
[R] Cross validation for Naive Bayes and Bayes Networks
Hi, I need to classify, using Naive Bayes and Bayes Networks, and estimate their performance using cross validation. How can I do this? I tried the bnlearn package for Bayes Networks, although I need to get more indexes, not only the error rate (precision, sensitivity, ...). I also tried the e1071 package, but I could not find a way to do cross-validation. Thanks, everyone. Guilherme.
[R] Cross Validation with SVM
Good morning. I am using package e1071 to develop an SVM model. My code is:

x <- subset(dataset, select = -Score)
y <- dataset$Score
model <- svm(x, y, cross=10)
print(model)
summary(model)

As 10-fold CV produces 10 models, I need two things: 1) to have access to each model from the 10-fold CV, and 2) to predict new instances with each model, to know which one performs best. Can anyone help me? Thanks!
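svm(..., cross = 10) only reports per-fold performance, not the fitted fold models, so an explicit loop is the usual route when you need the models themselves. A sketch on simulated data, with lm standing in for svm so it runs without e1071 (substitute svm(Score ~ ., data = train) one-for-one):

```r
# Sketch: explicit 10-fold CV that keeps every fold's model so each can be
# inspected and reused for prediction. Data and names are illustrative.
set.seed(1)
dataset <- data.frame(Score = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
fold <- sample(rep(1:10, length.out = nrow(dataset)))  # disjoint folds

models <- vector("list", 10)
mse    <- numeric(10)
for (k in 1:10) {
  train <- dataset[fold != k, ]
  test  <- dataset[fold == k, ]
  models[[k]] <- lm(Score ~ ., data = train)  # e1071: svm(Score ~ ., data = train)
  p <- predict(models[[k]], newdata = test)
  mse[k] <- mean((test$Score - p)^2)
}
best <- which.min(mse)  # fold whose model did best on its own held-out fold
```

Note that picking "the best fold model" is itself a selection step; the mean of mse is the honest CV performance estimate.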
[R] Cross validation for nls function
Hi, I've written a logistic function using nls and I'd like to do cross validation for this. Is there a package for that? Below is an example of my data and the function. The N terms are presence/absence data and the response is successful/failed data.

y1 <- sample(0:1, 100, replace=T)
N1 <- sample(0:1, 100, replace=T)
N2 <- sample(0:1, 100, replace=T)
N3 <- sample(0:1, 100, replace=T)
N4 <- sample(0:1, 100, replace=T)
Sw <- function(y1, N1, N2, N3, N4) {
  SA <- nls(y1 ~ exp(c+(a1*N1)+(a2*N2)+(a3*N3)+(a4*N4)) /
              (1+exp(c+(a1*N1)+(a2*N2)+(a3*N3)+(a4*N4))),
            start=list(a1=-0.2, a2=-0.2, a3=-0.2, a4=-0.2, c=0.2))
  SA
}
model <- Sw(y1, N1, N2, N3, N4)
summary(model)

Thanks for any help! /Anna
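A note in passing: that nls call is exactly the logistic GLM, so glm(family = binomial) fits the same model, and boot::cv.glm (boot ships with R) then gives the cross-validation directly. A sketch with two of the N terms, on simulated data:

```r
# Sketch: the logistic curve in the post equals glm(..., family = binomial);
# boot::cv.glm then supplies K-fold cross-validation out of the box.
library(boot)

set.seed(1)
d <- data.frame(y1 = sample(0:1, 100, replace = TRUE),
                N1 = sample(0:1, 100, replace = TRUE),
                N2 = sample(0:1, 100, replace = TRUE))

fit <- glm(y1 ~ N1 + N2, family = binomial, data = d)
cv  <- cv.glm(d, fit, K = 10)
cv$delta[1]  # 10-fold CV prediction error (mean squared error by default)
```

A different loss can be passed via the cost argument of cv.glm, e.g. misclassification rate at a 0.5 cutoff.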
[R] cross validation in glmnet
I am using cv.glmnet from the glmnet package for logistic regression. My dataset is very imbalanced: 5% of samples are from one group, the rest from the other. I'm wondering: when doing cv.glmnet to choose lambda, does every fold have the same ratio of the two groups (in my case, 5% of samples from one group and the rest from the other in every fold), or is the split just random? Many thanks, yan
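By default cv.glmnet assigns observations to folds at random, without stratification; if the 5% class must appear in every fold in the same ratio, you can build a stratified fold assignment yourself and pass it via the foldid argument. A sketch (only the fold construction runs here; x and the final cv.glmnet call are assumed):

```r
# Sketch: stratified fold ids for cv.glmnet, balancing a 5% / 95% outcome.
set.seed(1)
y <- rep(c(1, 0), times = c(50, 950))  # 5% minority class

foldid <- integer(length(y))
for (cls in unique(y)) {
  idx <- which(y == cls)
  # balanced 1..10 labels within each class, then shuffled
  foldid[idx] <- sample(rep(1:10, length.out = length(idx)))
}
table(y, foldid)  # every fold: 5 minority, 95 majority
# then: cv.glmnet(x, y, family = "binomial", foldid = foldid)
```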
[R] cross validation in rvm not working? (kernlab package)
Hi, according to ?rvm, the relevance vector machine function as implemented in the kernlab package has an argument 'cross' with which you can perform k-fold cross validation. However, when I try to add 10-fold cross validation I get the following error message:

Error in match.arg(type, c("C-svc", "nu-svc", "kbb-svc", "spoc-svc", "C-bsvc", :
  'arg' should be one of "C-svc", "nu-svc", "kbb-svc", "spoc-svc", "C-bsvc", "one-svc", "eps-svr", "eps-bsvr", "nu-svr"

Code example:

# create data
x <- seq(-20, 20, 0.1)
y <- sin(x)/x + rnorm(401, sd=0.05)
# train relevance vector machine
foo <- rvm(x, y, cross=10)

So, does that mean that cross-validation is not working for rvm at the moment? (Since the type argument only allows support vector regression or classification.)
Re: [R] cross validation in rvm not working? (kernlab package)
Please report bugs in packages to the corresponding package maintainer (perhaps suggesting a fix if you have an idea how to do that). Uwe Ligges

On 14.02.2012 12:42, Martin Batholdy wrote:

> Hi, according to ?rvm, the relevance vector machine function as implemented in the kernlab package has an argument 'cross' with which you can perform k-fold cross validation. However, when I try to add 10-fold cross validation I get the following error message [...] So, does that mean that cross-validation is not working for rvm at the moment?
[R] Cross-validation error with tune and with rpart
Hello list, I'm trying to generate classifiers for a certain task using several methods, one of them being decision trees. The doubts come when I want to estimate the cross-validation error of the generated tree:

tree <- rpart(y ~ ., data=data.frame(xsel, y), cp=0.1)
ptree <- prune(tree, cp=tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"])
ptree$cptable
           CP nsplit rel error xerror       xstd
1  0.33120000      0    1.0000 1.0000 0.02856022
2  0.08640000      1    0.6688 0.6704 0.02683544
3  0.02986667      2    0.5824 0.5856 0.02584564
4  0.02880000      5    0.4928 0.5760 0.02571738
5  0.01920000      6    0.4640 0.5168 0.02484761
6  0.01440000      8    0.4256 0.5056 0.02466708
7  0.00960000     12    0.3552 0.5024 0.02461452
8  0.00880000     15    0.3264 0.4944 0.02448120
9  0.00800000     17    0.3088 0.4768 0.02417800
10 0.00480000     25    0.2448 0.4672 0.02400673

If I got it right, xerror stands for the cross-validation error (using 10-fold by default); this is pretty high (0.4672 over 1). However, if I do something similar using tune from e1071 I get a much lower error:

treetune <- tune(rpart, y ~ ., data=data.frame(xsel, y), predict.func = treeClassPrediction, cp=0.0048)
treetune$best.performance
[1] 0.2243049

I'm also assuming that the performance returned by tune is the cross-validation error (also 10-fold by default). So where does this enormous difference come from? What am I missing? Also, is "rel error" the relative error on the training set? The documentation is not very descriptive: cptable - the table of optimal prunings based on a complexity parameter. Thanks and happy pre-new year, -- israel
Re: [R] Cross-validation error with tune and with rpart
On 31/12/2011 12:34, Israel Saeta Pérez wrote:

> Hello list, I'm trying to generate classifiers for a certain task using several methods, one of them being decision trees. The doubts come when I want to estimate the cross-validation error of the generated tree [...] If I got it right, xerror stands for the cross-validation error (using 10-fold by default); this is pretty high (0.4672 over 1). [...]

You didn't get it right. Please read the documentation, or contemplate why the first line is exactly one. In any case, that table is not about error rates for the final tree: it is part of the model selection step (to cross-validate the final tree you would need to include the choice of pruning inside the cross-validation). Did you look up the rpart technical report or one of the books explaining its output? Google 'rpart technical report' if you need to find it. [...] -- Brian D. Ripley, rip...@stats.ox.ac.uk, Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595, 1 South Parks Road, Oxford OX1 3TG, UK
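To make the two numbers comparable: rel error and xerror are scaled so the root node scores exactly 1, so multiplying by the root-node error rate gives absolute error rates. A sketch on iris (rpart ships with R; the original poster's data are not available):

```r
# Sketch: convert rpart's relative CV error column into absolute error rates.
library(rpart)

fit <- rpart(Species ~ ., data = iris)
cp  <- fit$cptable

# Root-node misclassification rate (majority-class baseline): 2/3 for iris.
root_error <- 1 - max(table(iris$Species)) / nrow(iris)

abs_xerror <- cp[, "xerror"] * root_error  # on the same scale as tune()'s error
```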
[R] cross-validation complex model AUC Nagelkerke R squared code
Hi there, I really tried hard to understand and find my own solution, but now I think I have to ask for your help. I have already developed some script code for my problem, but I doubt that it is correct. The problem is the following: imagine you develop a logistic regression model with a binary outcome Y (0/1) and possible predictors (X1, X2, X3, ...). The development of the final model is quite complex and involves several steps (stepwise forward selection with LR-test statistics, incorporating interaction effects, etc.). The final prediction at the end, however, is made through a glm object (called fit.glm). Then, I think, it is no problem to calculate a Nagelkerke R squared measure and an AUC value (for example with the pROC package) following this script:

BaseRate <- table(Data$Y)[[1]] / sum(table(Data$Y))
# L0 = log-likelihood of the null model
L0 <- (BaseRate*log(BaseRate) + (1-BaseRate)*log(1-BaseRate)) * sum(table(Data$Y))
LIKM <- predict(fit.glm, type="response")
# LM = log-likelihood of the fitted model
LM <- sum(Data$Y*log(LIKM) + (1-Data$Y)*log(1-LIKM))
R2 <- 1 - exp(2*(L0 - LM)/n)   # Cox-Snell
R2_max <- 1 - exp(2*L0/n)
R2_Nagelkerke <- R2/R2_max
library(pROC)
AUC <- auc(Data$Y, LIKM)

I checked this calculation of R2_Nagelkerke and the AUC value against the built-in calculation in package Design and got consistent results. Now I implement a cross-validation procedure, dividing the sample randomly into k subsamples of equal size. Afterwards I calculate the predicted probabilities for each k-th subsample with a model (fit.glm_s) developed by the same algorithm as for the whole-data model (stepwise forward selection etc.) but using all but the k-th subsample. I store the predicted probabilities and build up my LIKM vector (see above) in the following way:

LIKM[sub] <- predict(fit.glm_s, newdata=Data[sub, ], type="response")

Now I use the same formula/script as above; the only change therefore consists in the calculation of the LIKM vector.
BaseRate <- table(Data$Y)[[1]] / sum(table(Data$Y))
L0 <- (BaseRate*log(BaseRate) + (1-BaseRate)*log(1-BaseRate)) * sum(table(Data$Y))
# ... calculation of the cross-validated LIKM, see above ...
LM <- sum(Data$Y*log(LIKM) + (1-Data$Y)*log(1-LIKM))
R2 <- 1 - exp(2*(L0 - LM)/n)
R2_max <- 1 - exp(2*L0/n)
R2_Nagelkerke <- R2/R2_max
AUC <- auc(Data$Y, LIKM)

When I compare my results (using more simply developed models) with the validate method in package Design (method="cross", B=10), it seems to me that I consistently underestimate the true expected Nagelkerke R squared. Additionally, I'm very unsure about the way I try to calculate a cross-validated AUC. Do I have an error in my thinking about how to easily obtain a cross-validated AUC and R squared for a model developed to predict a binary outcome? I hope my problem is understandable and you can help me. Best regards, Jürgen -- Jürgen Biedermann, Bergmannstraße 3, 10961 Berlin-Kreuzberg, Mobil: +49 176 247 54 354, Home: +49 30 250 11 713, e-mail: juergen.biederm...@gmail.com
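For comparison, here is a self-contained sketch (simulated data; names illustrative) of the same calculation from pooled out-of-fold probabilities, with the AUC computed via the rank (Mann-Whitney) identity so no extra package is needed:

```r
# Sketch: cross-validated Nagelkerke R^2 and AUC from pooled out-of-fold
# predicted probabilities, base R only. Data are simulated with a real signal.
set.seed(1)
n <- 200
Data <- data.frame(X1 = rnorm(n))
Data$Y <- rbinom(n, 1, plogis(-0.5 + 1.2 * Data$X1))

fold <- sample(rep(1:10, length.out = n))
phat <- numeric(n)
for (k in 1:10) {
  fit <- glm(Y ~ X1, family = binomial, data = Data[fold != k, ])
  phat[fold == k] <- predict(fit, newdata = Data[fold == k, ], type = "response")
}

y  <- Data$Y
l0 <- sum(y) * log(mean(y)) + sum(1 - y) * log(1 - mean(y))  # null log-lik
lM <- sum(y * log(phat) + (1 - y) * log(1 - phat))           # out-of-fold log-lik
R2_CoxSnell   <- 1 - exp(2 * (l0 - lM) / n)
R2_Nagelkerke <- R2_CoxSnell / (1 - exp(2 * l0 / n))

# AUC = P(phat for a 1 exceeds phat for a 0), via the rank identity:
r   <- rank(phat)
n1  <- sum(y == 1); n0 <- sum(y == 0)
AUC <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```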
[R] cross-validation in rpart
I am trying to find out what type of sampling scheme is used to select the 10 subsets in the 10-fold cross-validation process used in rpart to choose the best tree. Is it simple random sampling? Is there any documentation available on this? Thanks, Penny.
Re: [R] cross-validation in rpart
I assume you mean rpart::xpred.rpart? The beauty of R means that you can look at the source. For the simple case (where xval is a single number) the code does indeed do simple random sampling:

xgroups <- sample(rep(1:xval, length = nobs), nobs, replace = FALSE)

If you want another sampling scheme, then you simply pass a vector as the xval parameter, as the documentation says: "This may also be an explicit list of integers that define the cross-validation groups". Hope this helps a little.

Allan

On 19/03/11 09:21, Penny B wrote:
> I am trying to find out what type of sampling scheme is used to select the 10 subsets in the 10-fold cross-validation process used in rpart to choose the best tree. Is it simple random sampling? Is there any documentation available on this? Thanks, Penny.
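Supplying an explicit group vector works as the documentation quoted above describes. A minimal sketch, using the kyphosis data shipped with rpart (the model formula and fold count are illustrative):

```r
## Sketch: explicit cross-validation groups for xpred.rpart.
library(rpart)

set.seed(1)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
nobs <- nrow(kyphosis)

## the same simple random partition the rpart source uses by default
xgroups <- sample(rep(1:10, length = nobs), nobs, replace = FALSE)

## pass the group vector instead of a fold count
xp <- xpred.rpart(fit, xval = xgroups)
dim(xp)   # one row per observation, one column per cp value
```

Replacing `xgroups` with any other integer vector of fold labels (e.g. a stratified assignment) changes the sampling scheme without touching rpart internals.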
Re: [R] cross-validation in rpart
On Sat, 19 Mar 2011, Penny B wrote:
> I am trying to find out what type of sampling scheme is used to select the 10 subsets in the 10-fold cross-validation process used in rpart to choose the best tree. Is it simple random sampling? Is there any documentation available on this?

Not SRS (at least in its conventional meaning), as it is partitioning: the 10 folds are disjoint. Note that this happens in two places, in rpart() and in xpred.rpart(), but the (default) method is the same. I presume you asked about the first, but it wasn't clear.

There is a lot of documentation on the meaning of '10-fold cross-validation', e.g. in my 1996 book. There are a few slightly different ways to do it, and you can read the rpart sources if you want to know the details.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] cross validation? when rlm, lmrob or lmRob
Dear community, I have fitted a model using the commands above (rlm, lmrob or lmRob). I don't have new data to validate the models obtained. I was wondering whether something similar to CVlm exists for robust regression. In case there isn't, any suggestion for validation would be appreciated. Thanks, u...@host.com
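In the absence of a ready-made CVlm analogue, a hand-rolled k-fold loop is straightforward. A minimal sketch for MASS::rlm, using the built-in stackloss data (formula, data set and k are illustrative; swapping in robustbase::lmrob is a one-line change):

```r
## Sketch: k-fold cross-validation for a robust regression fit (MASS::rlm).
library(MASS)

cv_rlm <- function(formula, data, k = 10) {
  n <- nrow(data)
  fold <- sample(rep(1:k, length.out = n))
  pred <- numeric(n)
  for (i in 1:k) {
    test <- which(fold == i)
    fit <- rlm(formula, data = data[-test, ])
    pred[test] <- predict(fit, newdata = data[test, ])
  }
  ## report a robust summary alongside the usual squared-error one
  resp <- model.response(model.frame(formula, data))
  c(MSE = mean((resp - pred)^2), MAE = median(abs(resp - pred)))
}

set.seed(1)
out <- cv_rlm(stack.loss ~ ., data = stackloss)
out
```

Since the fit is robust, a robust loss (here the median absolute error) is arguably the more natural cross-validation criterion than MSE.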
Re: [R] Cross validation for Ordinary Kriging
Pearl,

The error suggests that there is something wrong with x2, and that there is a difference between the row names of the coordinates and the data. If you call str(x2), see whether the first element of @coords is different from NULL, as this can cause some problems when cross-validating. If it is, try to figure out why. You can also set the row.names equal to NULL directly:

row.names(x...@coords) = NULL

although I don't think such manipulation of the slots of an object is usually recommended.

Cheers, Jon

BTW, you will usually get more responses to questions about spatial data handling on the r-sig-geo list (https://stat.ethz.ch/mailman/listinfo/r-sig-geo)

On 1/6/2011 4:00 PM, pearl may dela cruz wrote:
> Dear ALL, The last part of my thesis analysis is the cross validation. Right now I am having difficulty using the cross validation of gstat. [...]
Re: [R] Cross validation for Ordinary Kriging
On 1/7/2011 12:40 PM, Jon Olav Skoien wrote:
> The error suggests that there is something wrong with x2, and that there is a difference between the row names of the coordinates and the data. [...] You can also set the row.names equal to NULL directly: row.names(x...@coords) = NULL [...]

Pearl,

It seems the problem was caused by a recent change in sp without a corresponding update of gstat; the maintainer has fixed it and submitted a new version of gstat to CRAN. So you should be able to use your original script after downloading the new version, probably available in a couple of days. In the meantime the suggestion above should still work.

Cheers, Jon
[R] Cross validation for Ordinary Kriging
Dear ALL,

The last part of my thesis analysis is the cross validation. Right now I am having difficulty using the cross validation of gstat. Below are my commands, with tsport_ace as the variable:

nfold <- 3
part <- sample(1:nfold, 69, replace = TRUE)
sel <- (part != 1)
m.model <- x2[sel, ]
m.valid <- x2[!sel, ]
t <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
cv69 <- krige.cv(tsport_ace ~ 1, x2, t, nfold = nrow(x2))

The last line gives an error saying:

Error in SpatialPointsDataFrame(coordinates(data), data.frame(matrix(as.numeric(NA), : row.names of data and coords do not match

I don't know what is wrong. The x2 data is a SpatialPointsDataFrame, which is why I did not specify the location (it will be taken from the data). Here is the usage of the function krige.cv:

krige.cv(formula, locations, data, model = NULL, beta = NULL, nmax = Inf, nmin = 0, maxdist = Inf, nfold = nrow(data), verbose = TRUE, ...)

I hope you can help me on this. Thanks a lot.

Best regards, Pearl
Re: [R] cross validation using e1071:SVM
Thank you so much for your help. If I am not wrong, createDataPartition can be used to create stratified random splits of a data set. Is there another way to do that? Thank you
[R] cross validation using e1071:SVM
Hi everyone,

I am trying to do cross-validation (10-fold CV) using the e1071::svm method. I know that there is an option ("cross") for cross-validation, but I still wanted to make a function that generates cross-validation indices using pls::cvsegments. The code (at the end) is working fine, but sometimes caret::confusionMatrix gives the following error:

stat_result <- confusionMatrix(pred_true1, species_test)
Error in confusionMatrix.default(pred_true1, species_test) :
  The data and reference factors must have the same number of levels

My data: total number = 260, classes = 6.

Sorry if I missed some previous discussion about this problem. It would be nice if anyone could explain or point out the mistake I am making in the following code. Is there another way to do this? I want to check my results based on the Accuracy and Kappa values generated by caret::confusionMatrix.

## Code
x <- NULL
index <- cvsegments(nrow(data), 10)
for (i in 1:length(index)) {
  x <- matrix(index[i])
  testset <- data[x[[1]], ]
  trainset <- data[-x[[1]], ]
  species <- as.factor(trainset[, ncol(trainset)])
  train1 <- trainset[, -ncol(trainset)]
  train1 <- train1[, -(1)]
  test_t <- testset[, -ncol(testset)]
  species_test <- as.factor(testset[, ncol(testset)])
  test_t <- test_t[, -(1)]
  model_true1 <- svm(train1, species)
  pred_true1 <- predict(model_true1, test_t)
  stat_result <- confusionMatrix(pred_true1, species_test)
  stat_true[[i]] <- as.matrix(stat_result, what = "overall")
  kappa_true[i] <- stat_true[[i]][2, 1]
  accuracy_true[i] <- stat_true[[i]][1, 1]
}
Re: [R] cross validation using e1071:SVM
Hi everyone, can you help me to plot Gamma(x/h+1) and Beta(x/h+1, (1-x)/h+1)? I want to write x <- seq(0, 3, 0.1). Thanks.

2010/11/23 Neeti nikkiha...@gmail.com wrote:
> Hi everyone, I am trying to do cross validation (10 fold CV) by using the e1071::svm method. I know that there is an option ("cross") for cross validation but still I wanted to make a function to generate cross-validation indices using pls::cvsegments. [...]
--
Francial Giscard LIBENGUE
PhD student in Applied Mathematics; specialization: Statistics
Université de Franche-Comté - UFR Sciences et Techniques
Laboratoire de Mathématiques de Besançon UMR 6623 CNRS
16, route de Gray - 25030 Besançon cedex, France.
Tel. +333.81.66.63.98; Fax +33 381 666 623; Office B 328.
Re: [R] cross validation using e1071:SVM
@Francial Giscard LIBENGUE: please post your query again as a new message with a different subject.
Re: [R] cross validation using e1071:SVM
Could anyone help me with my last problem? If the question is not clear, please let me know. Thank you.

> Hi everyone, I am trying to do cross validation (10 fold CV) by using the e1071::svm method. I know that there is an option ("cross") for cross validation but still I wanted to make a function to generate cross-validation indices using pls::cvsegments. The code is working fine but sometimes caret::confusionMatrix gives the error "The data and reference factors must have the same number of levels". [...]
Re: [R] cross validation using e1071:SVM
Neeti,

I'm pretty sure that the error is related to the confusionMatrix call, which is in the caret package, not e1071. The error message is pretty clear: you need to pass in two factor objects that have the same levels. You can check by running the commands:

str(pred_true1)
str(species_test)

Also, caret can do the resampling for you instead of you writing the loop yourself.

Max
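The mismatch typically arises when a fold's test set happens to be missing some classes, so `as.factor()` on that subset produces fewer levels than the predictions carry. Building both factors over the full level set avoids it. A toy sketch (the labels and predictions below are stand-ins for the poster's species_test and pred_true1):

```r
## Sketch: forcing predictions and reference onto the same level set
## before calling caret::confusionMatrix.
library(caret)

all_levels <- c("a", "b", "c")

## "c" is absent from this fold's test set, but kept as a level anyway
truth <- factor(c("a", "a", "b"), levels = all_levels)
preds <- factor(c("a", "b", "b"), levels = all_levels)

cm <- confusionMatrix(preds, truth)
cm$overall[c("Accuracy", "Kappa")]
```

In the poster's loop, the equivalent fix is to compute the level set once from the whole data (before splitting) and pass it as the `levels` argument when constructing species_test, and to the factor wrapping of the predictions.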
Re: [R] cross-validation for choosing regression trees
Forgive me if I misunderstand your goals, but I have no idea what you are trying to determine or what your data is. I can say, however, that setting mindev to 0 has always overfit data for me, and that you are more than likely looking at a situation in which that 1-node tree is more accurate. Also, if you look at ?cv.tree, the default function to use is prune.tree(). Perhaps prune.tree() is trimming down to that terminal node? If you want alternative CART methods that may account for some of your issues, I would recommend the packages 'rpart' and 'party', as they may be more informative.

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it." - Jubal Early, Firefly

From: Shiyao Liu lsy...@iastate.edu
To: r-help@r-project.org
Date: 11/03/2010 09:04 PM
Subject: [R] cross-validation for choosing regression trees

> Dear All, We came across a problem when using the tree package to analyze our data set. First, in the tree function, if we use the default value mindev=0.01, the resulting regression tree has a single node. So, we set mindev=0, and obtain a tree with 931 terminal nodes. [...]
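For reference, the grow-large-then-cross-validate-then-prune workflow with the tree package runs cleanly when the grown tree has more than one node. A sketch on the cpus data from MASS (the data set and the small mindev are illustrative, not the poster's):

```r
## Sketch: cv.tree + prune.tree workflow with the tree package.
library(tree)

data(cpus, package = "MASS")
fit <- tree(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax,
            data = cpus, mindev = 0.001)   # deliberately large tree

set.seed(1)
cv <- cv.tree(fit)                 # 10-fold CV, default FUN = prune.tree
best <- cv$size[which.min(cv$dev)] # size minimizing CV deviance
pruned <- prune.tree(fit, best = best)
```

If the initial fit already has a single node (as with the poster's mindev=0.01), cv.tree has nothing to prune and fails with exactly the "can not prune singlenode tree" error quoted below.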
[R] cross-validation for choosing regression trees
Dear All,

We came across a problem when using the tree package to analyze our data set. First, in the tree function, if we use the default value mindev=0.01, the resulting regression tree has a single node. So we set mindev=0 and obtain a tree with 931 terminal nodes. However, when we then use the cv.tree function to run a 10-fold cross-validation, the error message is:

Error in prune.tree(list(frame = list(var = 1L, n = 6676, dev = 3.28220789569792, : can not prune singlenode tree.

Is the cv.tree function respecting the mindev chosen in the tree function, or what else might be wrong?

Thanks, Shiyao
Re: [R] cross validation of SVM
From ?svm:

cross: if an integer value k > 0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression

Uwe Ligges

On 15.06.2010 23:14, Amy Hessen wrote:
> hi, could you please tell me what kind of cross validation that SVM of e1071 uses? Cheers, Amy
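In practice the built-in option looks like this; the per-fold results are stored on the fitted object (the iris data and fold count are illustrative):

```r
## Sketch: built-in k-fold cross-validation in e1071::svm.
library(e1071)

set.seed(1)
fit <- svm(Species ~ ., data = iris, cross = 10)

fit$accuracies    # per-fold classification accuracy (length 10)
fit$tot.accuracy  # overall cross-validated accuracy
```

For a regression response the same call populates `fit$MSE` and `fit$tot.MSE` instead.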
[R] cross validation of SVM
Hi, could you please tell me what kind of cross-validation the SVM of e1071 uses? Cheers, Amy
[R] cross-validation
Hi, I want to do leave-one-out cross-validation for multinomial logistic regression in R. I did the multinomial logistic regression with the nnet package. How do I do the validation, and with which function? The response variable has 7 levels. Please help me. Thanks a lot, Azam
Re: [R] cross-validation
As far as my knowledge goes, nnet doesn't have a built-in function for cross-validation. Coding it yourself is not hard though. nnet is used in this book: http://www.stats.ox.ac.uk/pub/MASS4/ , which contains enough examples on how to do so. See also the crossval function in the bootstrap package: http://sekhon.berkeley.edu/library/bootstrap/html/crossval.html

Cheers, Joris

On Tue, Jun 8, 2010 at 11:34 AM, azam jaafari azamjaaf...@yahoo.com wrote:
> Hi, I want to do leave-one-out cross-validation for multinomial logistic regression in R. I did multinomial logistic regression with the nnet package. How do I do the validation, and with which function? The response variable has 7 levels. [...]

--
Joris Meys
Statistical consultant
Ghent University, Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
joris.m...@ugent.be
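Coding the leave-one-out loop by hand is indeed short. A sketch with nnet::multinom, using iris as a stand-in for the poster's 7-level outcome (the principle is identical for any number of levels):

```r
## Sketch: leave-one-out cross-validation for nnet::multinom.
library(nnet)

n <- nrow(iris)
pred <- character(n)
for (i in 1:n) {
  ## refit on all but observation i, predict observation i
  fit <- multinom(Species ~ ., data = iris[-i, ], trace = FALSE)
  pred[i] <- as.character(predict(fit, newdata = iris[i, ]))
}
acc <- mean(pred == as.character(iris$Species))  # LOO accuracy estimate
acc
```

With n refits this is the most expensive form of CV; for a larger data set, k-fold (the same loop over fold indices instead of single rows) is the usual compromise.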
Re: [R] cross-validation
Install the caret package and see ?train. There is also:

http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf
http://www.jstatsoft.org/v28/i05/paper

Max

On Tue, Jun 8, 2010 at 5:34 AM, azam jaafari azamjaaf...@yahoo.com wrote:
> Hi, I want to do leave-one-out cross-validation for multinomial logistic regression in R. [...]
[R] Cross-validation for parameter selection (glm/logit)
If my aim is to select a good subset of parameters for my final logit model built using glm(), what is the best way to cross-validate the results so that they are reliable?

Let's say that I have a large dataset of thousands of observations. I split this data into two groups, one that I use for training and another for validation. First I use the training set to build a model, and then stepAIC() with a forward-backward search. BUT, if I base my parameter selection purely on this result, I suppose it will be somewhat skewed due to the one-time data split (I use only one training dataset). What is the correct way to perform this variable selection, and are there readily available packages for this?

Similarly, when I have my final parameter set, how should I go about making the final assessment of the model's predictive ability? CV? Which package?

Thank you in advance, Jay
Re: [R] Cross-validation for parameter selection (glm/logit)
Jay,

Unless I have misunderstood some statistical subtleties, you can use the AIC in place of actual cross-validation, as the AIC is asymptotically equivalent to leave-one-out cross-validation under maximum likelihood estimation.

Joe

Stone, M. "An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion." Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39, 44-47. Abstract: A logarithmic assessment of the performance of a predicting density is found to lead to asymptotic equivalence of choice of model by cross-validation and Akaike's criterion, when maximum likelihood estimation is used within each model.

Jay josip.2...@gmail.com wrote:
> If my aim is to select a good subset of parameters for my final logit model built using glm(), what is the best way to cross-validate the results so that they are reliable? [...]
Re: [R] Cross-validation for parameter selection (glm/logit)
Hi,

On Fri, Apr 2, 2010 at 9:14 AM, Jay josip.2...@gmail.com wrote:
> If my aim is to select a good subset of parameters for my final logit model built using glm(), what is the best way to cross-validate the results so that they are reliable? [...]

Another approach would be to use penalized regression models. The glmnet package has lasso and elastic-net models for both logistic and normal regression. Intuitively: in addition to minimizing (say) the squared loss, the model has to pay some cost (lambda) for including a non-zero parameter, which in turn yields sparse models. You can use CV to fine-tune the value of lambda. If you're not familiar with these penalized models, the glmnet package has a few references to get you started.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
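The CV tuning of lambda is built in via cv.glmnet. A minimal sketch for the logistic case (the data here is simulated; alpha = 1 selects the lasso penalty):

```r
## Sketch: cross-validated lambda selection for penalized logistic regression.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(1000 * 20), ncol = 20)
y <- rbinom(1000, 1, plogis(x[, 1] - 2 * x[, 2]))

cvfit <- cv.glmnet(x, y, family = "binomial",
                   type.measure = "auc", alpha = 1)

plot(cvfit)                    # CV curve over log(lambda)
coef(cvfit, s = "lambda.1se")  # sparse coefficients at the chosen lambda
```

Because the penalty does the variable selection and CV picks its strength, this sidesteps the instability of a single train/validation split combined with stepwise search.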
Re: [R] Cross-validation for parameter selection (glm/logit)
Inline below:

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: Steve Lianoglou, on behalf of the thread started by Jay
Subject: Re: [R] Cross-validation for parameter selection (glm/logit)

> If my aim is to select a good subset of parameters for my final logit model built using glm().

-- Define "good".

> What is the best way to cross-validate the

-- Define "best".

> results so that they are reliable?

-- Define "reliable".

Answers depend on what you mean by these terms. I suggest you consult a statistician to work with you. These are huge issues for which you would profit by some guidance.

Cheers, Bert
Re: [R] cross-validation in plsr package
Peter Tillmann peter.tillm...@t-online.de writes:

> Can anyone give an example how to use cross-validation in the plsr package?

There are examples in the references cited on http://mevik.net/work/software/pls.html

> I am unable to find the number of factors proposed by cross-validation as optimum.

The cross-validation in the pls package does not propose a number of factors as the optimum; you have to select this yourself. (The reason for this is that there is AFAIK no theoretically founded and widely accepted way of doing this automatically. I'd be happy to learn otherwise.)

--
Regards, Bjørn-Helge Mevik
Re: [R] cross-validation in plsr package
Dear Bjørn-Helge,

> > Can anyone give an example how to use cross-validation in the plsr package?
> There are examples in the references cited on http://mevik.net/work/software/pls.html
> > I am unable to find the number of factors proposed by cross-validation as optimum.
> The cross-validation in the pls package does not propose a number of factors as the optimum; you have to select this yourself. (The reason for this is that there is AFAIK no theoretically founded and widely accepted way of doing this automatically. I'd be happy to learn otherwise.)

Thank you very much. In NIRS we use CV to determine the number of factors in PLS, which is why I was hoping for a suggestion from the CV. But of course we are just users, not statisticians, when it comes to PLS.

Regards, Peter

* Espenauer Str. 28, D-34246 Vellmar, Deutschland
Re: [R] cross-validation in plsr package
> The cross-validation in the pls package does not propose a number of factors as the optimum; you have to select this yourself. (The reason for this is that there is AFAIK no theoretically founded and widely accepted way of doing this automatically. I'd be happy to learn otherwise.)

The caret package has a wrapper for pls and multiple resampling methods (CV, bootstrap, repeated test/train splits, etc.). There are a few modules that can be used for automatically determining the optimal number of components. I agree that there is no uniformly best technique. The only thing that I know of that is widely accepted is the one-standard-error rule in CART. In this case, that would mean that you find the value of ncomp with the smallest error and choose the final ncomp as the smallest value whose error is within one standard error of that optimum. caret can do this, or use any other rule that you think is appropriate.

Thanks, Max
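A sketch of the caret approach described above, assuming caret's `selectionFunction = "oneSE"` option and simulated data (none of this is from the original thread):

```r
# Not from the original post: tuning ncomp for PLS with caret, applying the
# one-standard-error rule.
library(caret)

set.seed(42)
X <- matrix(rnorm(100 * 20), 100, 20)
y <- as.numeric(X %*% rnorm(20) + rnorm(100))

fit <- train(x = X, y = y,
             method = "pls",
             tuneLength = 10,    # try ncomp = 1..10
             trControl = trainControl(method = "cv", number = 10,
                                      selectionFunction = "oneSE"))
fit$bestTune  # smallest ncomp within one SE of the best CV error
```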
[R] cross-validation in plsr package
Dear readers,

Can anyone give an example how to use cross-validation in the plsr package? I am unable to find the number of factors proposed by cross-validation as the optimum.

Thank you, Peter
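The pls package's built-in cross-validation can be invoked like this (a minimal sketch on simulated data, not from the original thread); as the replies in this thread note, the package reports CV error per number of components but does not pick one for you:

```r
# Not from the original post: minimal cross-validated PLS fit with the pls
# package.
library(pls)

set.seed(1)
X <- matrix(rnorm(60 * 15), 60, 15)
y <- as.numeric(X %*% rnorm(15) + rnorm(60))
d <- data.frame(y = y, X = I(X))  # keep X as a matrix column

fit <- plsr(y ~ X, ncomp = 10, data = d, validation = "CV")  # 10-fold CV
RMSEP(fit)        # CV estimate of RMSEP for 0..10 components
plot(RMSEP(fit))  # inspect the curve and choose ncomp yourself
```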
[R] cross validation function translated from stata
Hi, everyone:

I am asking for help translating a Stata program into R. The program performs cross-validation, as follows:

#1. Randomly divide the data set into 10 sets of equal size, ensuring equal numbers of events in each set.
#2. Fit the model leaving out the 1st set.
#3. Apply the fitted model in (2) to the 1st set to obtain the predicted probability of a prostate cancer diagnosis.
#4. Repeat steps (2) to (3) leaving out and then applying the fitted model to the ith group, i = 2, 3, ... 10. Every subject now has a predicted probability of a prostate cancer diagnosis.
#5. Using the predicted probabilities, compute the net benefit at various threshold probabilities.
#6. Repeat steps (1) to (5) 200 times. The corrected net benefit for each threshold probability is the mean across the 200 replications.

=
First is the Stata code.

forvalues i=1(1)200 {
  local event=cancer
  local predictors1 = total_psa
  local predictors2 = total_psa free_psa
  local prediction1 = base
  local prediction2 = full
  g `prediction1'=.
  g `prediction2'=.
  quietly g u = uniform()
  sort `event' u
  g set = mod(_n, 10) + 1
  forvalues j=1(1)10{
    quietly logit `event' `predictors1' if set~=`j'
    quietly predict ptemp if set==`j'
    quietly replace `prediction1' = ptemp if set==`j'
    drop ptemp
    quietly logit `event' `predictors2' if set~=`j'
    quietly predict ptemp if set==`j'
    quietly replace `prediction2' = ptemp if set==`j'
    drop ptemp
  }
  tempfile dca`i'
  quietly dca `event' `prediction1' `prediction2', graphoff saving(`dca`i'')
  drop u set `prediction1' `prediction2'
}
use `dca1', clear
forvalues i=2(1)200 {
  append using `dca`i''
}
collapse all none modelp1 modelp2, by(threshold)
save "cross validation dca output.dta", replace
twoway (line none all modelp1 modelp2 threshold, sort)

=
Here is my draft of R code. cMain is my dataset.
predca <- rep(0, 200 * 200)
dim(predca) <- c(200, 200)
for (i in 1:200) {
  cvgroup <- rep(1:10, length = 110)
  cvgroup <- sample(cvgroup)
  cvpre <- rep(0, length = 110)
  cvMain <- cbind(cMain, cvgroup, cvpre)
  for (j in 1:10) {
    cvdev <- cvMain[cvMain$cvgroup != j, ]
    cvval <- cvMain[cvMain$cvgroup == j, ]
    cvfit <- lrm(Y ~ X, data = cvdev, x = TRUE, y = TRUE)
    cvprej <- predict(cvfit, cvval, type = "fitted")
    # put the fitted values back into the dataset
    cvMain[cvMain$cvgroup == j, ]$cvpre <- cvprej
  }
  cvdcaop <- dca(cvMain$Y, cvMain$cvpre, prob = "Y")
  cvnb <- 100 * (cvdcaop[, 1] - cvdcaop[, 2])
  cvtpthres <- cvdcaop[, 4] / (100 - cvdcaop[, 4])
  cvnr <- cvnb / cvtpthres
  predca[i, 1:99] <- cvnb
  predca[i, 101:199] <- cvnr
}

=
My questions are:

1. How do I ensure equal numbers of events in each set in R?
2. Part of the Stata code is:

forvalues j=1(1)10{
  quietly logit `event' `predictors1' if set~=`j'
  quietly predict ptemp if set==`j'
  quietly replace `prediction1' = ptemp if set==`j'
  drop ptemp
  quietly logit `event' `predictors2' if set~=`j'
  quietly predict ptemp if set==`j'
  quietly replace `prediction2' = ptemp if set==`j'
  drop ptemp
}

I don't understand the difference between prediction1 and prediction2.
3. Is my code right?

Thanks!

Yao Zhu
Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China
Re: [R] cross validation function translated from stata
Hi,

On Thu, Jan 21, 2010 at 8:55 AM, zhu yao mailzhu...@gmail.com wrote:
> Hi, everyone: I ask for help about translating a stata program into R.
> The program perform cross validation as it stated. [...]
> My questions are
> 1. How to ensure equal numbers of events in each set in R?

I just wanted to point you to the createFolds and createDataPartition functions in the caret package ... they try to do something similar, so perhaps you can see how others have tried to solve this problem: http://cran.r-project.org/web/packages/caret/index.html

For example, from their help page: "For other data splitting, the random sampling is done within the levels of y when y is a factor in an attempt to balance the class distributions within the splits. For numeric y, the sample is split into groups sections based on quantiles and sampling is done within these subgroups. Also, for very small class sizes (<= 3) the classes may not show up in both the training and test data."

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
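A minimal sketch of the createFolds suggestion (simulated data, not from the original thread):

```r
# Not from the original post: createFolds() stratifies folds by the outcome,
# which keeps the number of events roughly equal across folds (question 1).
library(caret)

set.seed(1)
y <- factor(rep(c("event", "no_event"), times = c(30, 80)))
folds <- createFolds(y, k = 10)  # list of 10 test-set index vectors

# events per fold -- roughly 3 in each
sapply(folds, function(idx) sum(y[idx] == "event"))
```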
Re: [R] cross validation function translated from stata
Take a look at the validate.lrm function in the rms package. Note that the use of threshold probabilities results in an improper scoring rule which will mislead you. Also note that you need to repeat 10-fold CV 50-100 times for precision, and that at each repeat you have to start from zero in analyzing associations.

Frank

zhu yao wrote:
> Hi, everyone: I ask for help about translating a stata program into R.
> The program perform cross validation as it stated. [...]
> Yao Zhu
> Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China

--
Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University
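A sketch of the validate.lrm suggestion above, on simulated data (the formula and fold count are illustrative, not from the thread):

```r
# Not from the original post: repeated cross-validation of an lrm() fit via
# rms::validate().
library(rms)

set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- rbinom(200, 1, plogis(d$x1))

fit <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)

# 10-fold cross-validation; B is the number of folds for this method.
# Repeat and average (e.g. 50-100 times) for precision, as advised above.
val <- validate(fit, method = "crossvalidation", B = 10)
val  # optimism-corrected indexes (Dxy, R2, calibration slope, ...)
```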
Re: [R] cross validation function translated from stata
Thanks Frank and Steve. I rewrote the R code as follows.

# m is the number of folds to split the sample; n is the number of cross-validation repeats
library(caret)
calcvnb <- function(formula, dat, m, n) {
  cvnb <- rep(0, 200 * 100)
  dim(cvnb) <- c(200, 100)
  for (i in 1:n) {
    group <- rep(0, length = 110)
    sg <- createFolds(dat$LN, k = m)
    for (k in 1:m) {
      group[sg[[k]]] <- k
    }
    pre <- rep(0, length = 110)
    data1 <- cbind(dat, group, pre)
    for (j in 1:m) {
      dev <- data1[data1$group != j, ]
      val <- data1[data1$group == j, ]
      fit <- lrm(formula, data = dev, x = TRUE, y = TRUE)
      pre1 <- predict(fit, val, type = "fitted")
      data1[data1$group == j, ]$pre <- pre1
    }
    dcaop <- dca(data1$LN, data1$pre, prob = "Y")
    nb <- 100 * (dcaop[, 1] - dcaop[, 2])
    cvnb[i, 1:99] <- nb
  }
  mcvnb <- colMeans(cvnb)
  return(mcvnb)
}

# apply the function in the main code
optnb1 <- calcvnb(formula = LN ~ factor(MTSTAGE) + factor(GRADE) + LVINVAS + P53, dat = cMain, m = 10, n = 200)

However, applied to my data, an error occurred after several loops:

Error in `contrasts<-`(`*tmp*`, value = contr.treatment) :
  contrasts can be applied only to factors with 2 or more levels

What's wrong with my code and how do I handle it?

Yao Zhu
Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China

2010/1/21 zhu yao mailzhu...@gmail.com
> Hi, everyone: I ask for help about translating a stata program into R. The program perform cross validation as it stated.
> #1. Randomly divide the data set into 10 sets of equal size, ensuring equal numbers of events in each set
> #2. Fit the model leaving out the 1st set
> #3. Apply the fitted model in (2) to the 1st set to obtain the predicted probability of a prostate cancer diagnosis.
> #4. Repeat steps (2) to (3) leaving out and then applying the fitted model to the ith group, i = 2, 3... 10. Every subject now has a predicted probability of a prostate cancer diagnosis.
> #5. Using the predicted probabilities, compute the net benefit at various threshold probabilities.
> #6. Repeat steps (1) to (5) 200 times.
> The corrected net benefit for each threshold probability is the mean across the 200 replications. [...]
>
> Yao Zhu
> Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China
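For archive readers, the "contrasts can be applied only to factors with 2 or more levels" error above is what base R produces when a factor predictor has only one observed level in a fitting subset; because the formula wraps predictors in factor(), the levels are recomputed within each training fold, so a fold that loses a rare level entirely can trigger it. A minimal reproduction (simulated data, not from the thread):

```r
# Not from the original post: reproducing the reported error. A CV training
# fold that contains only one level of a factor predictor cannot build
# contrasts for it.
set.seed(1)
d <- data.frame(y = rbinom(10, 1, 0.5),
                g = c(rep("A", 9), "B"))  # rare level "B"

train <- d[d$g == "A", ]  # a fold that misses level "B" entirely
res <- try(glm(y ~ factor(g), family = binomial, data = train),
           silent = TRUE)
# fails: contrasts can be applied only to factors with 2 or more levels.
# Stratifying folds on rare factor levels, or collapsing rare levels
# before cross-validation, avoids this.
```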
Re: [R] cross validation for species distribution
Dear Max,

Thanks for the warm help on New Year's Eve.

Cross-validation is used to validate the predictive quality of the training data against testing data. As for the amount, the cross-validation (CV) is supposed to be k-fold: k-1 parts for training and 1 for testing, repeated k times. Is it the same with the functions inside the caret, ipred, and e1071 packages?

Elaine

On Fri, Jan 1, 2010 at 4:02 AM, Max Kuhn mxk...@gmail.com wrote:
> You might want to be more specific about what you (exactly) intend to do. Reading the posting guide might help you get better answers. There are a few packages and functions to do what (I think) you desire. There is the train function in the caret package, the errorest function in ipred and a few in e1071.
> Max [...]
Re: [R] cross validation for species distribution
Elaine,

That's a fair answer, but completely not what I meant. I was hoping that you would elaborate on "the species data of species distribution models": what types of inputs and output for this particular modeling application, etc.

> Is it the same with the function inside caret, ipred, and e1071 package?

Yes, and there are other resampling options besides k-fold CV. For caret, you might start with this paper: www.jstatsoft.org/v28/i05/ That should tell you most of what you need to know.

Max
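A sketch of the train/trainControl interface referred to above, showing a few of caret's resampling options on simulated presence/absence data (illustrative only, not Elaine's species data):

```r
# Not from the original post: caret resampling options for a simple
# presence/absence classifier.
library(caret)

set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$present <- factor(ifelse(d$x1 + rnorm(100) > 0, "yes", "no"))

ctrl_cv    <- trainControl(method = "cv", number = 10)    # 10-fold CV
ctrl_boot  <- trainControl(method = "boot", number = 25)  # bootstrap
ctrl_split <- trainControl(method = "LGOCV", p = 0.8)     # repeated splits

fit <- train(present ~ x1 + x2, data = d, method = "glm",
             family = binomial, trControl = ctrl_cv)
fit$results  # resampled accuracy/kappa estimates
```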
Re: [R] cross validation for species distribution
You might want to be more specific about what you (exactly) intend to do. Reading the posting guide might help you get better answers. There are a few packages and functions to do what (I think) you desire. There is the train function in the caret package, the errorest function in ipred, and a few in e1071.

Max

On Dec 31, 2009, at 12:13 AM, elaine kuo elaine.kuo...@gmail.com wrote:
> Dear, I want to cross-validate the species data of species distribution models. Please kindly suggest any package containing cross-validation suiting the purpose. Thank you. Elaine
[R] cross validation for species distribution
Dear all,

I want to perform cross-validation for the species data of species distribution models. Please kindly suggest any package containing cross-validation suited to this purpose.

Thank you, Elaine
[R] cross validation/GAM/package Daim
Dear r-helpers,

I estimated a generalized additive model (GAM) using Hastie's package gam. Example:

gam1 <- gam(vegetation ~ s(slope), family = binomial, data = aufnahmen_0708, trace = TRUE)
pred <- predict(gam1, type = "response")

vegetation is a categorical variable, slope a numerical one. Now I want to assess the accuracy of the model using k-fold cross-validation. I found the package Daim, with function Daim for estimation of prediction error based on cross-validation (CV) or various bootstrap techniques, but I am not able to run it properly. I tried the following 3 versions:

1. accuracy <- Daim(vegetation ~ s(slope), model = gam1, data = aufnahmen_0708, labpos = "alpine mats")
   -- error: could not find function "model"
2. accuracy <- Daim(vegetation ~ s(slope), model = gam, data = aufnahmen_0708, labpos = "alpine mats")
   -- error in model(formula, train, test): 'family' not recognized
3. accuracy <- Daim(vegetation ~ s(slope), model = gam(family = binomial), data = aufnahmen_0708, labpos = "alpine mats")
   -- error in environment(formula): element 1 is empty; the part of the argument list '.Internal' that was evaluated was: (fun)

Can anybody help me? Any advice is greatly appreciated!

Thanks, Kim
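Judging from the second error message, Daim appears to call model(formula, train, test), i.e. it expects model to be a function that fits on the training fold and returns predicted probabilities for the test fold. A sketch under that assumption, on simulated stand-ins for the poster's data (untested against Daim; the data, formula, and label are placeholders):

```r
# Not from the original post: an assumed Daim model-function wrapper,
# inferred from the error "error in model(formula, train, test)".
library(gam)
library(Daim)

set.seed(1)
d <- data.frame(slope = runif(100, 0, 60))
d$vegetation <- factor(ifelse(plogis((d$slope - 30) / 10) > runif(100),
                              "alpine mats", "other"))

# fit the GAM on the training fold, return test-fold probabilities
fit_gam <- function(formula, train, test) {
  fit <- gam(formula, family = binomial, data = train)
  predict(fit, newdata = test, type = "response")
}

accuracy <- Daim(vegetation ~ s(slope), model = fit_gam,
                 data = d, labpos = "alpine mats")
```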
[R] Cross-Validation for Zero-Inflated Models
Hi all,

I have developed a zero-inflated negative binomial model using the zeroinfl function from the pscl package. I have carried out model selection based on AIC and used likelihood ratio tests (lrtest from the lmtest package) to compare the nested models. [My end model contains 2 factors and 4 continuous variables in the count model, plus one continuous variable in the zero-inflated portion.]

For model assessment I would like to carry out some form of internal cross-validation, along the lines of leave-one-out CV etc., to gauge the predictive ability of my final model. I am wondering if there is any technique within R for doing this with zero-inflated models/negative binomial models. N.B. my data set is not large enough to split at the start and fit the model to only a subset.

I am using R 2.8.1.

Many thanks in advance, Lara
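A hand-rolled leave-one-out loop is one way to do this when no packaged CV routine supports the model class. A sketch on simulated data (the formula and error measure are placeholders, not Lara's actual model):

```r
# Not from the original post: leave-one-out CV for a zero-inflated negative
# binomial model, refitting n times and predicting the held-out observation.
library(pscl)

set.seed(1)
n <- 80
d <- data.frame(x = rnorm(n), z = rnorm(n))
mu <- exp(0.5 + 0.6 * d$x)
d$y <- ifelse(runif(n) < 0.3, 0, rnbinom(n, size = 1, mu = mu))

pred <- numeric(n)
for (i in seq_len(n)) {
  fit <- zeroinfl(y ~ x | z, data = d[-i, ], dist = "negbin")
  pred[i] <- predict(fit, newdata = d[i, , drop = FALSE], type = "response")
}
mean((d$y - pred)^2)  # cross-validated mean squared prediction error
```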
[R] cross-validation
I have reviewed all the scripts that appear at http://cran.es.r-project.org/ and I can't find any suitable for cross-validation with a model of the form y = a*X^b * exp(c*Z). Please, can someone help me? Thanks a lot!
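For archive readers: one way to cross-validate such a model is to fit it with nls() inside a k-fold loop. A sketch on simulated data (the coefficients, starting values, and loss are illustrative assumptions):

```r
# Not from the original post: k-fold CV for the stated model
# y = a * X^b * exp(c * Z), fit by nls().
set.seed(1)
n <- 100
d <- data.frame(X = runif(n, 1, 10), Z = runif(n))
d$y <- 2 * d$X^0.7 * exp(0.5 * d$Z) * exp(rnorm(n, sd = 0.1))

k <- 10
fold <- sample(rep(1:k, length.out = n))
pred <- numeric(n)
for (j in 1:k) {
  fit <- nls(y ~ a * X^b * exp(c * Z), data = d[fold != j, ],
             start = list(a = 1, b = 1, c = 0))
  pred[fold == j] <- predict(fit, newdata = d[fold == j, ])
}
mean((d$y - pred)^2)  # cross-validated MSE
```

Alternatively, since log(y) = log(a) + b*log(X) + c*Z is linear in the parameters, the same scheme works with lm() on the log scale.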
Re: [R] Cross-validation - lift curve
This may be somewhat useful, but I might have more later. http://florence.acadiau.ca/collab/hugh_public/index.php?title=R:CheckBinFit (the code below is copied from the URL above)

CheckBinFit <- function(y, phat, nq = 20, new = TRUE, ...) {
  if (is.factor(y)) y <- as.double(y)
  y <- y - mean(y)
  y[y > 0] <- 1
  y[y <= 0] <- 0
  quants <- quantile(phat, probs = (1:nq) / (nq + 1))
  names(quants) <- NULL
  quants <- c(0, quants, 1)
  phatD <- rep(0, nq + 1)
  phatF <- rep(0, nq + 1)
  for (i in 1:(nq + 1)) {
    which <- ((phat <= quants[i + 1]) & (phat > quants[i]))
    phatF[i] <- mean(phat[which])
    phatD[i] <- mean(y[which])
  }
  if (new) plot(phatF, phatD, xlab = "phat", ylab = "data",
                main = paste("R^2=", cor(phatF, phatD)^2), ...)
  else points(phatF, phatD, ...)
  abline(0, 1)
  return(invisible(list(phat = phatF, data = phatD)))
}

On Thu, Mar 12, 2009 at 1:30 PM, Eric Siegel e...@predictionimpact.com wrote:
> Hi all, I'd like to do cross-validation on lm and get the resulting lift curve/table (or, alternatively, the estimates on 100% of my data with which I can get lift). If such a thing doesn't exist, could it be derived using cv.lm, or would we need to start from scratch? Thanks!
[R] Cross-validation - lift curve
Hi all,

I'd like to do cross-validation on lm and get the resulting lift curve/table (or, alternatively, the estimates on 100% of my data with which I can get lift). If such a thing doesn't exist, could it be derived using cv.lm, or would we need to start from scratch?

Thanks!

--
Eric Siegel, Ph.D.
President, Prediction Impact, Inc.
Predictive Analytics World Conference
More info: www.predictiveanalyticsworld.com
LinkedIn Group: www.linkedin.com/e/gis/1005097
[R] Cross-validation question
Hello everyone,

I have a data set that looks like the following:

Year  Days to the beginning of Year  Value
1     30                             100
1     60                             200
1     ...                            ...
1     360                            ...
2     30                             ...
2     60                             ...
2     ...                            ...
2     360                            ...
...

Then I used a linear regression to fit Value ~ Days to the beginning of the year with a polynomial. Now I want to use cross-validation to detect over-fitting, but I am not sure whether I should leave out 1/k of the random data points or leave out 1/k of the random years. What do you think?

Thanks, Geoffrey
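For archive readers, the "leave out years" option amounts to grouped cross-validation, which tests how the fitted curve generalizes to an unseen year and is usually the more honest check when observations within a year are correlated. A sketch on simulated data (the polynomial degree and data are illustrative):

```r
# Not from the original post: leave-one-year-out CV for a polynomial fit.
set.seed(1)
d <- expand.grid(year = 1:10, days = seq(30, 360, by = 30))
d$value <- 100 + 50 * sin(d$days / 60) + rnorm(nrow(d), sd = 10)

# hold out all rows of one year at a time
errs <- sapply(unique(d$year), function(yr) {
  fit <- lm(value ~ poly(days, 3), data = d[d$year != yr, ])
  mean((d$value[d$year == yr] - predict(fit, d[d$year == yr, ]))^2)
})
mean(errs)  # cross-validated MSE over held-out years
```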
[R] Cross-validation
Hi,

I was trying to do cross-validation using the crossval function (bootstrap package), with the following code:

-----
theta.fit <- function(x, y) {
  model <- svm(x, y, kernel = "linear")
}
theta.predict <- function(fit, x) {
  prediction <- predict(fit, x)
  return(prediction)
}
x <- matrix(rnorm(5100), 102, 50)
rownames(x) <- paste("a", 1:102, sep = "")
colnames(x) <- paste("b", 1:50, sep = "")
y <- factor(sample(1:2, 102, replace = TRUE))
results <- crossval(x, y, theta.fit, theta.predict)  # LOOCV
-----

I get the following error:

Error in scale(newdata[, object$scaled, drop = FALSE], center = object$x.scale$"scaled:center", :
  (subscript) logical subscript too long

It seems to work all right if I use 10-fold cross-validation (e.g. results <- crossval(x, y, theta.fit, theta.predict, ngroup = 10)), but gives the error for LOOCV. What am I doing wrong? Thanks!

My session info is:

sessionInfo()
R version 2.7.1 (2008-06-23)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rpart_3.1-41 lattice_0.17-8 ROCR_1.0-2 gplots_2.6.0
[5] gdata_2.4.2 gtools_2.5.0 e1071_1.5-18 class_7.2-42
[9] bootstrap_1.0-21

loaded via a namespace (and not attached):
[1] grid_2.7.1 tools_2.7.1
Re: [R] Cross Validation output
Good Day All, I have a negative binomial model that I created using the function glm.nb() from the MASS library, and I am performing a cross-validation using the function cv.glm() from the boot library. I am really interested in determining the performance of this model so I can have confidence (or not) when it might be applied elsewhere.

If I understand the cv.glm() procedure correctly, the default cost function is the average squared error, and by running cv.glm() in a loop many times I understand that I can calculate PRESS (PRedictive Error Sum of Squares) = 1/n * Sum(all PEs) from the default output. When I run a loop 10 times, my PRESS is ~25. I have a few questions:

1) I must now confess my ignorance: how does one interpret my PRESS of 25? Are there some internet resources someone could point me to that would help with the interpretation? I've spent most of yesterday studying up on things but feel like I am chasing my tail. Most of the resources are either so heavy in theory that I can't puzzle them out, or are a couple of paragraphs long and don't have an example with data. Is my PRESS in essence saying that my model performance is ~75%? (I suspect not, but I don't know, thus I ask.)

2) All my observations are spatial in nature, and thus I would like to plot out spatially where the model is performing well and where it is not. This would be somewhat akin to inspecting residuals in OLS. Is there a way to output from cv.glm() the PEs for individual data points?

3) My previous idea was to look at AIC, BIC, McFadden R2 and pseudo-R2 as goodness-of-fit measures for each subset model. It appears that I can modify the cost function of cv.glm(), but I am not too confident in my ability to write the correct cost function. Are there other valid measures of GOF for my negative binomial model that I can substitute into the cost function of cv.glm()? Would anyone care to recommend one (or many)? Thanks in advance for your patience!
-Don

PS - if you've seen my previous posts, I've abandoned my 80/20 split validation scheme.
--
Don Catanzaro, PhD, Landscape Ecologist, [EMAIL PROTECTED], 16144 Sigmond Lane, Lowell, AR 72745, 479-751-3616
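On questions 2) and 3): the cost argument of cv.glm() is just a function of the held-out responses and predictions, so swapping in another loss is a one-liner; but cv.glm() only returns the aggregate delta, so per-observation PEs need an explicit fold loop. A hedged sketch on simulated negative binomial data (the data and all names here are illustrative, not Don's):

```r
library(boot)
library(MASS)

set.seed(1)
# Illustrative negative binomial data
d <- data.frame(x = runif(300))
d$y <- rnbinom(300, mu = exp(1 + 2 * d$x), size = 2)
fit <- glm.nb(y ~ x, data = d)

# Custom cost: mean absolute error instead of the default average squared error
mae <- function(y, yhat) mean(abs(y - yhat))
cv.mae <- cv.glm(d, fit, cost = mae, K = 10)$delta[1]

# Per-observation prediction errors via an explicit fold loop
folds <- sample(rep(1:10, length.out = nrow(d)))
pe <- numeric(nrow(d))
for (f in 1:10) {
  g <- glm.nb(y ~ x, data = d[folds != f, ])
  pe[folds == f] <- d$y[folds == f] -
    predict(g, d[folds == f, ], type = "response")
}
# pe can now be mapped spatially, akin to inspecting residuals in OLS
```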
[R] cross validation for lme
Hello, We would like to perform a cross validation on a linear mixed model (lme) and wonder if anyone has found something analogous to cv.glm for such models? Thanks, Mark
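As far as I know there is no drop-in cv.glm analogue for lme objects, but a grouped cross-validation is short to hand-roll. A sketch using the Orthodont example data from nlme, leaving out whole subjects and scoring population-level (level = 0) predictions:

```r
library(nlme)

data(Orthodont)
subjects <- unique(Orthodont$Subject)
set.seed(1)
folds <- sample(rep(1:5, length.out = length(subjects)))

errs <- sapply(1:5, function(f) {
  hold  <- subjects[folds == f]
  train <- subset(Orthodont, !(Subject %in% hold))
  test  <- subset(Orthodont, Subject %in% hold)
  fit <- lme(distance ~ age, random = ~ 1 | Subject, data = train)
  # level = 0 uses only the fixed effects, which is what applies
  # to subjects the model has never seen
  mean((test$distance - predict(fit, test, level = 0))^2)
})
cv.err <- mean(errs)
```

Leaving out whole groups (rather than rows) respects the dependence structure the random effects were there to model in the first place.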
Re: [R] cross-validation in rpart
-- begin included message --
I'm having a problem with custom functions in rpart, and before I tear my hair out trying to fix it, I want to make sure it's actually a problem. It seems that, when you write custom functions for rpart (init, split and eval), rpart no longer cross-validates the resulting tree to return errors. A simple test is to use the usersplits.R function to get a simple, custom rpart function, and then change fit1 and fit2 so that they both have xvals of 10. The problem is that the cptable for fit1 doesn't have xerror or xstd, despite the fact that cross-validation is set to 10-fold. I guess I just need confirmation that cross-validation doesn't work with custom functions, and if someone could explain to me why that is the case it would be greatly appreciated. Thanks, Sam Stewart
-- end included message --

You are right: cross-validation does not happen automatically with user-written split functions. We can think of cross-validation as having two steps:

1. Get the predicted values for each observation when that observation (or a group) is left out of the data set. There is actually a vector of predicted values, one for each level of model complexity. This step can be done using xpred.rpart, which does work for user-defined splits. It returns a matrix with n rows (one per observation) and one column for each of the target cp values. Call this matrix yhat.

2. Summarize each column of the matrix yhat into a single goodness value. For anova fitting, for instance, this is just colMeans((y - yhat)^2). For classification models it is a bit more complex: we have to add up the expected loss L(y, yhat) for each column using the loss matrix and the priors.

The reason that rpart does not do this second step for a user-written function is that rpart does not know what summary is appropriate. For some splitting rules, e.g. survival data split using a log-rank test, I'm not sure that *I* know what summation is appropriate.
Terry Therneau
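For a standard anova tree the two steps described above read roughly as follows (a sketch; car.test.frame ships with rpart):

```r
library(rpart)

fit <- rpart(Mileage ~ Weight, data = car.test.frame, method = "anova")

# Step 1: cross-validated predictions, one column per target cp value
yhat <- xpred.rpart(fit, xval = 10)

# Step 2: summarize each column into a single goodness value (anova loss)
xerr <- colMeans((car.test.frame$Mileage - yhat)^2)
# xerr is the cross-validated MSE at each complexity level
```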
[R] cross-validation in rpart
Hello list, I'm having a problem with custom functions in rpart, and before I tear my hair out trying to fix it, I want to make sure it's actually a problem. It seems that, when you write custom functions for rpart (init, split and eval), rpart no longer cross-validates the resulting tree to return errors. A simple test is to use the usersplits.R function to get a simple, custom rpart function, and then change fit1 and fit2 so that they both have xvals of 10. The problem is that the cptable for fit1 doesn't have xerror or xstd, despite the fact that cross-validation is set to 10-fold. I guess I just need confirmation that cross-validation doesn't work with custom functions, and if someone could explain to me why that is the case it would be greatly appreciated. Thanks, Sam Stewart
--
Sam Stewart, MMath, Research Statistician, Diagnostic Imaging, Rm 3016, 3 South Victoria Building, VG Site, QEII Health Sciences Centre, 1278 Tower Rd, Halifax, NS, B3H 2Y9
Re: [R] Cross-validation in R
1) cv.glm is not 'in R', it is part of contributed package 'boot'. Please give credit where it is due.

2) There is nothing 'cross' about your 'home-made cross validation'. cv.glm is support software for a book, so please consult it for the definition used of cross-validation, or MASS (the book: see the posting guide) or another reputable source.

3) If you want to know how a function works please consult a) its help page and b) its code. Here a) answers at least your first question, and your fundamental misunderstanding of 'cross-validation' answers the other two.

On Mon, 9 Jun 2008, Luis Orlindo Tedeschi wrote:

Folks; I am having a problem with cv.glm and would appreciate someone shedding some light here. It seems obvious but I cannot get it. I did read the manual, but I could not get more insight. This is a database containing 3363 records and I am trying a cross-validation to understand the process. When using cv.glm, code below, I get a mean of perr1 of 0.2336 and SD of 0.000139. When using a home-made cross validation, code below, I get a mean of perr2 of 0.2338 and SD of 0.02184. The means are similar but the SDs are different.

You are comparing apples and oranges.

Questions are: (1) how is the $delta computed in cv.glm? In the home-made version, I simply use ((Yobs - Ypred)^2)/n. The equation might be correct because the mean is similar. (2) in cv.glm, I have the impression the system is using glm0.dmi, which was generated using all the data points, whereas in my home-made version I only use the test database. I am confused whether cv.glm generates new glm models for each simulation or uses the one provided. (3) does cv.glm sample with replacement (replace = TRUE) or not? Thanks in advance. LOT

* cv.glm method

glm0.dmi <- glm(DMI_kg ~ Sex + DOF + Avg_Nem + In_Wt)
# Simulation for 50 re-samplings...
perr1.vect <- vector()
for (j in 1:50) {
  print(j)
  cv.dmi <- cv.glm(data.dmi, glm0.dmi, K = 10)
  perr1 <- cv.dmi$delta[2]
  perr1.vect <- c(perr1.vect, perr1)
}
x11()
hist(perr1.vect)
mean(perr1.vect)
sd(perr1.vect)

* homemade method

# Brute-force cross-validation. This should be similar to the cv.glm
perr2.vect <- vector()
for (j in 1:50) {
  print(j)
  select.dmi <- sample(1:nrow(data.dmi), 0.9 * nrow(data.dmi))
  train.dmi <- data.dmi[select.dmi, ]   # 90% of the data for training
  test.dmi <- data.dmi[-select.dmi, ]   # remaining 10% for testing
  glm1.dmi <- glm(DMI_kg ~ Sex + DOF + Avg_Nem + In_Wt,
                  na.action = na.omit, data = train.dmi)
  # Create fitted values using test.dmi data
  dmi_pred <- predict.glm(glm1.dmi, test.dmi)
  dmi_obs <- test.dmi[, "DMI_kg"]
  # Get the prediction error = MSE
  perr2 <- t(dmi_obs - dmi_pred) %*% (dmi_obs - dmi_pred) / nrow(test.dmi)
  perr2.vect <- c(perr2.vect, perr2)
}
x11()
hist(perr2.vect)
mean(perr2.vect)
sd(perr2.vect)

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
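To see why the SDs differ by two orders of magnitude: each cv.glm run already averages over 10 disjoint folds (and refits the model on each training fold), while each homemade iteration scores a single random 90/10 split. A homemade loop that actually partitions the data behaves like cv.glm; a sketch, with an illustrative stand-in for data.dmi:

```r
set.seed(1)
# Illustrative stand-in for data.dmi
data.dmi <- data.frame(DMI_kg = rnorm(300), Sex = gl(2, 150),
                       DOF = runif(300), Avg_Nem = runif(300),
                       In_Wt = runif(300))

# A disjoint 10-fold partition: sampling WITHOUT replacement
folds <- sample(rep(1:10, length.out = nrow(data.dmi)))

fold.err <- sapply(1:10, function(f) {
  train <- data.dmi[folds != f, ]
  test  <- data.dmi[folds == f, ]
  g <- glm(DMI_kg ~ Sex + DOF + Avg_Nem + In_Wt, data = train)
  mean((test$DMI_kg - predict(g, test))^2)
})
cv.est <- mean(fold.err)  # comparable to cv.glm(..., K = 10)$delta
```

Note also that cv.glm refits a fresh model on each training fold rather than reusing the coefficients of the supplied fit, which answers question (2).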
[R] Cross-validation in R
Folks; I am having a problem with cv.glm and would appreciate someone shedding some light here. It seems obvious but I cannot get it. I did read the manual, but I could not get more insight. This is a database containing 3363 records and I am trying a cross-validation to understand the process. When using cv.glm, code below, I get a mean of perr1 of 0.2336 and SD of 0.000139. When using a home-made cross validation, code below, I get a mean of perr2 of 0.2338 and SD of 0.02184. The means are similar but the SDs are different. Questions are: (1) how is the $delta computed in cv.glm? In the home-made version, I simply use ((Yobs - Ypred)^2)/n. The equation might be correct because the mean is similar. (2) in cv.glm, I have the impression the system is using glm0.dmi, which was generated using all the data points, whereas in my home-made version I only use the test database. I am confused whether cv.glm generates new glm models for each simulation or uses the one provided. (3) does cv.glm sample with replacement (replace = TRUE) or not? Thanks in advance. LOT

* cv.glm method

glm0.dmi <- glm(DMI_kg ~ Sex + DOF + Avg_Nem + In_Wt)
# Simulation for 50 re-samplings...
perr1.vect <- vector()
for (j in 1:50) {
  print(j)
  cv.dmi <- cv.glm(data.dmi, glm0.dmi, K = 10)
  perr1 <- cv.dmi$delta[2]
  perr1.vect <- c(perr1.vect, perr1)
}
x11()
hist(perr1.vect)
mean(perr1.vect)
sd(perr1.vect)

* homemade method

# Brute-force cross-validation. This should be similar to the cv.glm
perr2.vect <- vector()
for (j in 1:50) {
  print(j)
  select.dmi <- sample(1:nrow(data.dmi), 0.9 * nrow(data.dmi))
  train.dmi <- data.dmi[select.dmi, ]   # 90% of the data for training
  test.dmi <- data.dmi[-select.dmi, ]   # remaining 10% for testing
  glm1.dmi <- glm(DMI_kg ~ Sex + DOF + Avg_Nem + In_Wt,
                  na.action = na.omit, data = train.dmi)
  # Create fitted values using test.dmi data
  dmi_pred <- predict.glm(glm1.dmi, test.dmi)
  dmi_obs <- test.dmi[, "DMI_kg"]
  # Get the prediction error = MSE
  perr2 <- t(dmi_obs - dmi_pred) %*% (dmi_obs - dmi_pred) / nrow(test.dmi)
  perr2.vect <- c(perr2.vect, perr2)
}
x11()
hist(perr2.vect)
mean(perr2.vect)
sd(perr2.vect)
[R] Cross Validation
Hi, I am trying to find out the best way to calculate the average LOOCV in R for several classifiers: KNN, centroid classification, DLDA and SVM. I have four types of diseases and 62 samples. Is there R code available to do this? -- View this message in context: http://www.nabble.com/Cross-Validation-tp15912818p15912818.html Sent from the R help mailing list archive at Nabble.com.
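For the KNN piece, class::knn.cv performs leave-one-out directly; the other classifiers can be handled with an explicit leave-one-out loop of the same shape. A sketch on illustrative data matching the description (62 samples, 4 classes; the features are made up):

```r
library(class)

set.seed(1)
x  <- matrix(rnorm(62 * 20), 62, 20)  # 62 samples, 20 illustrative features
cl <- factor(sample(paste0("disease", 1:4), 62, replace = TRUE))

pred <- knn.cv(x, cl, k = 3)   # leave-one-out is built in
loocv_acc <- mean(pred == cl)  # average LOOCV accuracy
```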
Re: [R] Cross Validation
An example from my R table, calculating the average LOOCV for two treatments, ALL and AML:

   ALL   AML
1  1.2   .3
2  .87   .3
3  1.1   .5
4  1.2   .7
5  3.2   1.2
6  1.1   1.1
7  .90   .99
8  1.1   .32
9  2.1   1.2

JStainer wrote: Hi, I am trying to find out the best way to calculate the average LOOCV in R for several classifiers: KNN, centroid classification, DLDA and SVM. I have four types of diseases and 62 samples. Is there R code available to do this?
[R] cross validation
Hi, I must have accidentally deleted my previous post. I am having a really difficult time calculating the LOOCV (leave-one-out cross-validation). The table in Excel:

genes  ALL  AML  p.value
1      1.2  .3   .01
2      .87  .3   .03
3      1.1  .5   .05
4      1.2  .7   .01
5      3.2  1.2  .02
6      1.1  1.1  .5

Do I need to import them into R as a matrix? Is there any script available where I can calculate the LOOCV? thanks, John
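A data frame is fine as the import target; exported from Excel as CSV the table could be read with read.csv (the file name below is hypothetical), or entered directly as here. The numeric columns then become a matrix ready for distance-based LOOCV:

```r
# The table from the post, entered directly; read.csv("genes.csv") on an
# Excel CSV export with the same columns would give the same data frame
dat <- data.frame(genes   = 1:6,
                  ALL     = c(1.2, 0.87, 1.1, 1.2, 3.2, 1.1),
                  AML     = c(0.3, 0.3, 0.5, 0.7, 1.2, 1.1),
                  p.value = c(0.01, 0.03, 0.05, 0.01, 0.02, 0.5))

# Numeric matrix of expression values, one row per gene
expr <- as.matrix(dat[, c("ALL", "AML")])
```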
[R] Cross Validation
Hello, How can I do a cross validation in R? Thank You!
Re: [R] Cross Validation
http://www.burns-stat.com/pages/Tutor/bootstrap_resampling.html may be of some use to you.

Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User)

Carla Rebelo wrote: Hello, How can I do a cross validation in R? Thank You!
[R] Cross Validation in rpart
Hello All, I'm writing a custom rpart function, and I'm wondering about cross-validation. Specifically, why isn't my splitting function being called more often when xval is increased? One would expect that, with xval=10 compared to xval=1, the former would call the splitting function more often, but both produce exactly the same thing. Is there something I'm missing about the cross-validation process for rpart? Thanks, Sam Stewart