Dear R-list,

I'm not sure what I've found about a function in DAAG package is a bug.

When I was using cv.lm(DAAG) , I found there might be something wrong with
it. The problem is that we can't use it to deal with a linear model with
more than one predictor variable. But the usage documentation
hasn't informed us about this.

The code illustrates my discovery:

> library(DAAG)
> xx=matrix(rnorm(20*3),ncol=3)
> bb=c(1,2,0)
> yy=xx%*%bb+rnorm(20,0,10)
>
> data=data.frame(y=yy,x=xx)
> myformula=formula("y ~ x.1 + x.2 + x.3")
> cv.lm(data,myformula,plotit=F, printit=TRUE)
Analysis of Variance Table

Response: yv
          Df Sum Sq Mean Sq F value Pr(>F)
xv         1     37      37    0.29    0.6
Residuals 18   2288     127


fold 1
Observations in test set: 4 6 7 9 10 19
                X1      X2     X3    X4      X5     X6
x.1        -0.0316  -0.342  -1.44  1.42  -0.446  0.042
Predicted  -1.6335  -0.990   1.29 -4.64  -0.773 -1.786
y         -16.7876 -25.954 -14.67 -2.29 -28.118  7.731
Residual  -15.1541 -24.964 -15.96  2.35 -27.344  9.517

Sum of squares = 1951    Mean square = 325    n = 6

fold 2
Observations in test set: 5 11 12 14 15 16 20
              X1     X2    X3    X4      X5      X6      X7
x.1        0.472  0.282  2.20  1.75   0.253 -0.0938  0.1543
Predicted -5.089 -5.385 -2.40 -3.10  -5.431 -5.9707 -5.5842
y         -5.894 -8.855 -7.32  2.88 -16.414 -3.0530  0.0434
Residual  -0.805 -3.470 -4.92  5.97 -10.983  2.9177  5.6276

Sum of squares = 233    Mean square = 33.3    n = 7

fold 3
Observations in test set: 1 2 3 8 13 17 18
              X1     X2    X3       X4     X5      X6        X7
x.1        0.429  1.925  0.31  -0.0194  -1.45  -0.836   0.00308
Predicted -8.592 -0.873 -9.20 -10.9030 -18.28 -15.117 -10.78682
y         11.045 -8.562  6.64 -14.6833   6.95   0.873   1.41586
Residual  19.637 -7.689 15.84  -3.7803  25.23  15.990  12.20268

Sum of squares = 1751    Mean square = 250    n = 7
Overall ms
       197 ######################################################## Note the
model ("y ~ x.1 + x.2 + x.3") produces an Overall ms, 197
> myformula=formula("y ~ x.1 + x.2")
> cv.lm(data,myformula,plotit=F, printit=TRUE)
Analysis of Variance Table

Response: yv
          Df Sum Sq Mean Sq F value Pr(>F)
xv         1     37      37    0.29    0.6
Residuals 18   2288     127


fold 1
Observations in test set: 4 6 7 9 10 19
                X1      X2     X3    X4      X5     X6
x.1        -0.0316  -0.342  -1.44  1.42  -0.446  0.042
Predicted  -1.6335  -0.990   1.29 -4.64  -0.773 -1.786
y         -16.7876 -25.954 -14.67 -2.29 -28.118  7.731
Residual  -15.1541 -24.964 -15.96  2.35 -27.344  9.517

Sum of squares = 1951    Mean square = 325    n = 6

fold 2
Observations in test set: 5 11 12 14 15 16 20
              X1     X2    X3    X4      X5      X6      X7
x.1        0.472  0.282  2.20  1.75   0.253 -0.0938  0.1543
Predicted -5.089 -5.385 -2.40 -3.10  -5.431 -5.9707 -5.5842
y         -5.894 -8.855 -7.32  2.88 -16.414 -3.0530  0.0434
Residual  -0.805 -3.470 -4.92  5.97 -10.983  2.9177  5.6276

Sum of squares = 233    Mean square = 33.3    n = 7

fold 3
Observations in test set: 1 2 3 8 13 17 18
              X1     X2    X3       X4     X5      X6        X7
x.1        0.429  1.925  0.31  -0.0194  -1.45  -0.836   0.00308
Predicted -8.592 -0.873 -9.20 -10.9030 -18.28 -15.117 -10.78682
y         11.045 -8.562  6.64 -14.6833   6.95   0.873   1.41586
Residual  19.637 -7.689 15.84  -3.7803  25.23  15.990  12.20268

Sum of squares = 1751    Mean square = 250    n = 7
Overall ms
       197 ######################################################## Note the
model ("y ~ x.1 + x.2 ") also produces an Overall ms, 197
> myformula=formula("y ~ x.1 ")
> cv.lm(data,myformula,plotit=F, printit=TRUE)
Analysis of Variance Table

Response: yv
          Df Sum Sq Mean Sq F value Pr(>F)
xv         1     37      37    0.29    0.6
Residuals 18   2288     127


fold 1
Observations in test set: 4 6 7 9 10 19
                X1      X2     X3    X4      X5     X6
x.1        -0.0316  -0.342  -1.44  1.42  -0.446  0.042
Predicted  -1.6335  -0.990   1.29 -4.64  -0.773 -1.786
y         -16.7876 -25.954 -14.67 -2.29 -28.118  7.731
Residual  -15.1541 -24.964 -15.96  2.35 -27.344  9.517

Sum of squares = 1951    Mean square = 325    n = 6

fold 2
Observations in test set: 5 11 12 14 15 16 20
              X1     X2    X3    X4      X5      X6      X7
x.1        0.472  0.282  2.20  1.75   0.253 -0.0938  0.1543
Predicted -5.089 -5.385 -2.40 -3.10  -5.431 -5.9707 -5.5842
y         -5.894 -8.855 -7.32  2.88 -16.414 -3.0530  0.0434
Residual  -0.805 -3.470 -4.92  5.97 -10.983  2.9177  5.6276

Sum of squares = 233    Mean square = 33.3    n = 7

fold 3
Observations in test set: 1 2 3 8 13 17 18
              X1     X2    X3       X4     X5      X6        X7
x.1        0.429  1.925  0.31  -0.0194  -1.45  -0.836   0.00308
Predicted -8.592 -0.873 -9.20 -10.9030 -18.28 -15.117 -10.78682
y         11.045 -8.562  6.64 -14.6833   6.95   0.873   1.41586
Residual  19.637 -7.689 15.84  -3.7803  25.23  15.990  12.20268

Sum of squares = 1751    Mean square = 250    n = 7
Overall ms
       197 ######################################################## Note the
model ("y ~ x.1 + x.2 ") ALSO produces an Overall ms, 197

3 different linear model give three equal mss(mean squared error)!?
Then I was eager to know why 3 different linear model gave three equal
mss(mean squared error). I checked the code of function cv.lm(DAAG), then
found the residues were all derived from a model with only one predictor.

Is this a bug? Or is it because of my misunderstanding of somthing about
cv.lm(DAAG)?

Li Junjie
-- 
Junjie Li,                  [EMAIL PROTECTED]
Undergranduate in DEP of Tsinghua University,

        [[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to