Re: [R] Multiple imputation, especially in rms/Hmisc packages

Frank Harrell Wed, 11 Aug 2010 05:59:21 -0700


Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University

On Tue, 10 Aug 2010, Mark Seeto wrote:

Hi Frank (and others),

Thank you for your comments in reply to my questions. I had not
encountered contrast tests before. I've looked in a few texts, but the
only place I could find anything about contrast tests was your
Regression Modeling Strategies book.

You wrote that the "leave some variables out" F-test and the contrast
tests are equivalent in ordinary regression, so I have tried to verify
this. However, I got different results for the two tests, so I'm hoping
you can point out what I'm doing incorrectly.

The null hypothesis is CB = 0, where C is the contrast matrix and B is
the vector of regression coefficients. I'm using the test statistic on
p. 189 of RMS:
W = (Cb)'(CVC')^{-1}(Cb), with b being the vector of coefficient
estimates and V being its covariance matrix.

The model is y ~ rcs(x1,3) + x2, and I want to test the null hypothesis
that the two x1 coefficients are both zero. This is equivalent to CB =
0, with C being the matrix
[0 1 0 0 (first row); 0 0 1 0 (second row)].

library(rms)
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y <- 1 + x1 + x2 + rnorm(50,0,3)

ols.full <- ols(y ~ rcs(x1,3) + x2) # full model
ols.red <- ols(y ~ x2) # reduced model

RSS.full <- sum((ols.full$fitted - y)^2) # residual sums of squares
RSS.red <- sum((ols.red$fitted - y)^2)

F <- ((RSS.red - RSS.full)/(48 - 46))/(RSS.full/46)  # test statistic
for F-test
F
[1] 2.576335

1 - pf(F, 2, 46)
[1] 0.08698798

anova(ols.full)
               Analysis of Variance          Response: y

Factor     d.f. Partial SS   MS          F    P
x1          2    39.95283222 19.97641611 2.58 0.0870
 Nonlinear  1     0.07360774  0.07360774 0.01 0.9228
x2          1    47.17322335 47.17322335 6.08 0.0174
REGRESSION  3    83.79617922 27.93205974 3.60 0.0203
ERROR      46   356.67539200  7.75381287

anova(ols.full)["x1", "F"]
[1] 2.576335    # This confirms that the result of anova is the same as
that of the "leave some variables out" F-test.

V <- ols.full$var
C <- matrix(c(0,1,0,0, 0,0,1,0), nrow=2, byrow=TRUE)
b <- ols.full$coef
W <- t(C %*% b) %*% solve(C %*% V %*% t(C)) %*% (C %*% b)

1 - pchisq(W, 2)
          [,1]
[1,] 0.07605226

Have I used the correct distribution for W? Should I expect the p-values
from the two tests to be exactly the same?

Mark - you can see the code for this at the bottom of anova.rms.Compute W, divide by numerator d.f., then compute P-value using F withnumerator and error d.f.


Frank

 >

Thanks in advance; I really appreciate any help you can give.

Mark

Frank Harrell wrote:


On Mon, 9 Aug 2010, Mark Seeto wrote:

Hello, I have a general question about combining imputations as well
as a
question specific to the rms and Hmisc packages.

The situation is multiple regression on a data set where multiple
imputation has been used to give M imputed data sets. I know how to get
the combined estimate of the covariance matrix of the estimated
coefficients (average the M covariance matrices from the individual
imputations and add (M+1)/M times the between-imputation covariance
matrix), and I know how to use this to get p-values and confidence
intervals for the individual coefficients.

What I don't know is how to combine imputations to obtain the multiple
degree-of-freedom tests to test whether a set of coefficients are all 0.
If there were no missing values, this would be done using the "general
linear test" which involves fitting the full model and the reduced
model and getting an F statistic based on the residual sums of
squares and
degrees of freedom. How does one correctly do this when there are
multiple
imputations? In the language of Hmisc, I'm asking how the values in
anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've
looked at the anova.rms code but couldn't understand it).


In ordinary regression, the "leave some variables out and compare
R^2"-based F-test and the quicker contrast tests are equivalent.   In
general we use the latter, which is expedient since it is easy to
estimate the imputation-corrected covariance matrix.  In the rms and
Hmisc packages, fit.mult.impute creates the needed fit object, then
anova, contrast, summary, etc. give the correct Wald tests.


>

The second question I have relates to rms/Hmisc specifically. When I use
fit.mult.impute, the fitted values don't appear to match the values
given
by the predict function on the original data. Instead, they match the
fitted values of the last imputation. For example,

library(rms)
set.seed(1)
x1 <- rnorm(40)
x2 <- 0.5*x1 + rnorm(40,0,3)

y <- x1^2 + x2 + rnorm(40,0,3)
x2[40] <- NA   # create 1 missing value in x2
a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3) # 2 imputations

x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1]  # first imputed x2
vector
x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2]  # second imputed x2
vector

ols.imp1 <- ols(y ~ rcs(x1,3) + x2.imp1)  # model on imputation 1
ols.imp2 <- ols(y ~ rcs(x1,3) + x2.imp2)  # model on imputation 2

d <- data.frame(y, x1, x2)
fmi <- fit.mult.impute(y ~ rcs(x1,3) + x2, ols, a, data=d)

fmi$fitted
predict(fmi, d)

I would have expected the results of fmi$fitted and predict(fmi,d) to be
the same except for the final NA, but they are not. Instead,
fmi$fitted is
the same as ols.imp2$fitted. Also,

anova(ols.imp2)
anova(fmi)

unexpectedly give the same result in the ERROR line. I would be most
grateful if anyone could explain this to me.


I need to add some more warnings.  For many of the calculations, the
last imputation is used.

Frank


Thanks,
Mark
--
Mark Seeto
Statistician

National Acoustic Laboratories <http://www.nal.gov.au/>
A Division of Australian Hearing

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple imputation, especially in rms/Hmisc packages

Reply via email to