[R] prediction error for test set-cross validation

Mehmet U Ayvaci Tue, 10 Mar 2009 23:06:23 -0700

Hi,


I have a data set of 2211 rows with 31 variables each, which I manually split
into 10 folds for cross-validation. I fit a logistic regression model as
follows:

 

> model <- glm(qual ~ AgGr + FaHx + PrHx + PrSr + PaLp + SvD + IndExam +
                 Rad + BrDn + BRDS + PrinFin + SkRtr + NpRtr + SkThck +
                 TrThkc + SkLes + AxAdnp + ArcDst + MaDen + CaDt + MaMG +
                 MaMrp + MaSh + SCTub + SCFoc + MaSz,
               data = trainSet, family = binomial(link = "logit"))

 

where the variables are taken from trainSet, which is of size 1989x31. The
test set is of size 222x31. My question is this: when I try to predict on the
test set, I get the following error:

 

> predict.glm(model, testSet, type = "response")
Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
  subscript out of bounds

 

Prediction works fine on trainSet, so the problem must be something about
testSet. I also noticed that some independent variables take different sets
of values in the two sets; for example, "MaSz" takes 3 distinct values in
trainSet but 4 in testSet. I am not sure whether this is the cause. If so,
what would be the remedy?
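
A minimal sketch of how the level mismatch could be checked, on made-up data:
the two small data frames below are hypothetical stand-ins for trainSet and
testSet (only the column names qual, MaSz and MaSh are borrowed from the
model), and factorCols and newLevels are illustrative names. It only reports
which factor levels appear in the test set but not in the training set; it
does not by itself show that this is the cause of the error.

```r
# Hypothetical stand-ins for trainSet and testSet (toy data, not the real set).
trainSet <- data.frame(
  qual = c(0, 1, 0, 1, 1, 0),
  MaSz = factor(c("a", "b", "c", "a", "b", "c")),
  MaSh = factor(c("x", "y", "x", "y", "x", "y"))
)
testSet <- data.frame(
  qual = c(1, 0, 1, 0),
  MaSz = factor(c("a", "b", "c", "d")),  # "d" never occurs in trainSet
  MaSh = factor(c("x", "y", "x", "y"))
)

# Names of the columns that are factors in the training set.
factorCols <- names(trainSet)[vapply(trainSet, is.factor, logical(1))]

# For each factor, the levels present in testSet but absent from trainSet.
newLevels <- lapply(factorCols, function(v) {
  setdiff(levels(testSet[[v]]), levels(trainSet[[v]]))
})
names(newLevels) <- factorCols

# Keep only the factors that actually have unseen levels.
newLevels <- newLevels[vapply(newLevels, length, integer(1)) > 0]
print(newLevels)
```

If the list is non-empty, the folds were split so that some level never
reaches the training data; a model fitted on that fold has no coefficient for
that level.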

 

Since I can retrieve the coefficients of the logistic regression, I could
manually calculate the response for each entry in testSet. That would solve
my problem, although it would be burdensome, and I don't know an easy way of
doing it, as my logistic regression has 80+ coefficients.
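
A sketch of what the manual calculation might look like without writing out
80+ terms by hand, assuming the test rows contain only factor levels seen in
training. The data here are simulated, and toy, trainToy, testToy, fit, X,
beta, manual and builtin are all hypothetical names. The idea is to build the
design matrix for the test rows with the training factor levels, multiply by
the coefficient vector, and apply the inverse logit.

```r
# Simulated toy data (hypothetical): one numeric predictor and one factor.
set.seed(1)
n <- 200
toy <- data.frame(
  x1 = rnorm(n),
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
toy$y <- rbinom(n, 1, plogis(0.5 * toy$x1 + (toy$g == "b")))
trainToy <- toy[1:150, ]
testToy  <- toy[151:200, ]

fit <- glm(y ~ x1 + g, data = trainToy, family = binomial(link = "logit"))

# Design matrix for the test rows, coded with the factor levels from training.
X <- model.matrix(delete.response(terms(fit)), data = testToy,
                  xlev = fit$xlevels)

# Linear predictor X %*% beta, then inverse logit to get probabilities.
beta   <- coef(fit)
manual <- plogis(drop(X[, names(beta), drop = FALSE] %*% beta))

# Compare with the built-in prediction on the same rows.
builtin <- predict(fit, newdata = testToy, type = "response")
print(all.equal(unname(manual), unname(builtin)))
```

Test rows that contain a level not seen in training still cannot be scored
this way, since the fitted model has no coefficient for that level.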

 

 

Could somebody advise?

 

 

Thanks, 

M



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
