Re: [R] Conservative ANOVA tables in lmer
On 9/12/06, Douglas Bates [EMAIL PROTECTED] wrote:

On 9/11/06, Manuel Morales [EMAIL PROTECTED] wrote:

[snip]

Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups - for example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)?

Hmm - I'm not sure what confidence interval and what number of levels you mean there, so I can't comment on that method.

Suppose we go back to Spencer's example and consider whether there is a significant effect for the Nozzle factor. That is equivalent to the hypothesis H_0: beta_2 = beta_3 = 0 versus the general alternative. A p-value could be formulated from an MCMC sample if we assume that the marginal distribution of the parameter estimates for beta_2 and beta_3 has roughly elliptical contours; you can evaluate that by, say, examining a hexbin plot of the values in the MCMC sample. One could take the ellipses as defined by the standard errors and estimated correlation or, probably better, by the observed standard deviations and correlations in the MCMC sample. Then determine the proportion of (beta_2, beta_3) pairs in the sample that fall outside the ellipse centered at the estimates, with that eccentricity and scaling, that passes through (0,0). That would be an empirical p-value for the test. I would recommend calculating this for a couple of samples to check on the reproducibility.

There was some follow-up on this thread, including some code and results that I find encouraging. I didn't notice that the R-help list had been dropped off the cc: in later exchanges. I enclose my contribution to the conversation so that others on the list will get to see it.

--- excerpt from previous private message

As soon as I described the idea I knew that someone would ask for a function to perform it.
Here's one:

mcmcpvalue <- function(samp) {
    ## elementary version that creates an empirical p-value for the
    ## hypothesis that the columns of samp have mean zero versus a
    ## general multivariate distribution with elliptical contours.

    ## differences from the mean standardized by the observed
    ## variance-covariance factor
    std <- backsolve(chol(var(samp)),
                     cbind(0, t(samp)) - colMeans(samp),
                     transpose = TRUE)
    sqdist <- colSums(std * std)
    sum(sqdist[-1] > sqdist[1])/nrow(samp)
}

At least I think I have the standardization by the Cholesky factor of the observed variance-covariance matrix correct. However, I always manage to confuse myself on that calculation, so please let me know if I have it wrong.

As an example, consider a model fit to the AvgDailyGain data from the SASmixed package.

> library(nlme)
> data(AvgDailyGain, package = "SASmixed")
> summary(fm1Adg <- lme(adg ~ InitWt*Treatment - 1, AvgDailyGain,
+                       random = ~1|Block))
Linear mixed-effects model fit by REML
 Data: AvgDailyGain
      AIC      BIC    logLik
 85.32685 97.10739 -32.66342

Random effects:
 Formula: ~1 | Block
        (Intercept)  Residual
StdDev:   0.5092266 0.2223268

Fixed effects: adg ~ InitWt * Treatment - 1
                        Value Std.Error DF    t-value p-value
InitWt              0.0022937 0.0017473 17  1.3126947  0.2067
Treatment0          0.4391370 0.7110881 17  0.6175564  0.5451
Treatment10         1.4261187 0.6375458 17  2.2368880  0.0390
Treatment20         0.4796285 0.5488867 17  0.8738206  0.3944
Treatment30         0.2001071 0.7751989 17  0.2581365  0.7994
InitWt:Treatment10 -0.0012108 0.0023326 17 -0.5190774  0.6104
InitWt:Treatment20  0.0010720 0.0021737 17  0.4931507  0.6282
InitWt:Treatment30  0.0021543 0.0027863 17  0.7731996  0.4500
 Correlation:
                   InitWt Trtmn0 Trtm10 Trtm20 Trtm30 IW:T10 IW:T20
Treatment0         -0.961
Treatment10         0.034  0.039
Treatment20         0.003  0.080  0.334
Treatment30         0.050  0.011  0.097  0.043
InitWt:Treatment10 -0.772  0.742 -0.631 -0.164 -0.059
InitWt:Treatment20 -0.806  0.775 -0.180 -0.555 -0.019  0.724
InitWt:Treatment30 -0.666  0.640 -0.046  0.024 -0.754  0.529  0.520

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-1.82903364 -0.44913967 -0.03023488  0.44738506  1.59877700

Number of Observations: 32
Number of Groups: 8

> anova(fm1Adg)
                 numDF denDF  F-value p-value
InitWt               1    17 91.68230  <.0001
Treatment            4    17  8.81312  0.0005
InitWt:Treatment     3    17  0.93118  0.4471

Fitting the same model in lmer, then generating an MCMC sample and testing for the three interaction coefficients being zero, would look like

> data(AvgDailyGain, package = "SASmixed")
> (fm1Adg <- lmer(adg ~ InitWt*Treatment - 1 + (1|Block), AvgDailyGain))
Linear mixed-effects model fit by REML
Formula: adg ~ InitWt * Treatment - 1 + (1 | Block)
   Data: AvgDailyGain
   AIC   BIC logLik MLdeviance REMLdeviance
 83.33
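A minimal sketch of how the interaction test might then proceed with mcmcpvalue(), assuming the lme4 API of that era, in which mcmcsamp(model, n) returns a coda-style sample with one column per parameter. The grep() column selection, the sample size, and the column naming are illustrative assumptions, not guaranteed behaviour:

```r
## Hedged sketch: empirical p-value for
## H_0: all three InitWt:Treatment coefficients are zero.
## Assumes mcmcsamp() columns are named after the fixed effects.
samp <- mcmcsamp(fm1Adg, 10000)                       # MCMC sample of the parameters
int.cols <- grep("InitWt:Treatment", colnames(samp))  # the three interaction columns
mcmcpvalue(as.matrix(samp)[, int.cols])               # proportion beyond the null point
```

Running this on a couple of independent samples, as suggested earlier in the thread, gives a check on the Monte Carlo error.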
Re: [R] Conservative ANOVA tables in lmer
- Original Message -
From: Manuel Morales [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Douglas Bates [EMAIL PROTECTED]; Manuel Morales [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Sent: Wednesday, September 13, 2006 1:04 PM
Subject: Re: [R] Conservative ANOVA tables in lmer

On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:

On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:

On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:

Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution.

Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups - for example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)?

I wonder whether the likelihood ratio test would be suitable here? That seems to be supported. It just takes a little longer.
require(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would then be:

1) Test the statistical significance of one or more fixed or random effects - fit a model with and a model without the terms, and use the LRT.

I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates' companion book to NLME.

You have this effect if you're using REML; for ML I don't think there is any problem in using the LRT between nested models with different fixed-effects structures.

Best,
Dimitris

2) Obtain confidence intervals for one or more fixed or random effects - use mcmcsamp.

Did I miss anything important? What else would people like to do?

Cheers
Andrew

Andrew Robinson
Senior Lecturer in Statistics                Tel: +61-3-8344-9763
Department of Mathematics and Statistics     Fax: +61-3-8344-4599
University of Melbourne, VIC 3010 Australia
Email: [EMAIL PROTECTED]  Website: http://www.ms.unimelb.edu.au

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
Re: [R] Conservative ANOVA tables in lmer
On 9/13/06, Dimitris Rizopoulos [EMAIL PROTECTED] wrote:

- Original Message -
From: Manuel Morales [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Douglas Bates [EMAIL PROTECTED]; Manuel Morales [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Sent: Wednesday, September 13, 2006 1:04 PM
Subject: Re: [R] Conservative ANOVA tables in lmer

On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:

On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:

On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:

Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution.

Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups - for example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)?

I wonder whether the likelihood ratio test would be suitable here? That seems to be supported. It just takes a little longer.
require(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would then be:

1) Test the statistical significance of one or more fixed or random effects - fit a model with and a model without the terms, and use the LRT.

I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates' companion book to NLME.

You have this effect if you're using REML; for ML I don't think there is any problem in using the LRT between nested models with different fixed-effects structures.

There are two issues here: how the test statistic is evaluated, and what reference distribution is used for the test statistic (i.e. how you convert a value of the test statistic to a p-value).

Manuel is addressing the latter issue. If you compare the difference in -2*logLik for the models to a chi-square with degrees of freedom determined by the difference in the number of parameters, the test will be anti-conservative when the number of observations is small. The use of the chi-square as a reference distribution is based on asymptotic properties.

The other question is how one evaluates the likelihood-ratio test statistic, and that is the issue that Dimitris is addressing. The REML criterion is a modified likelihood, and it is inappropriate to look at differences in the REML criterion when the models being compared have different fixed-effects specifications, or even a different parameterization of the fixed effects. However, the anova method for an lmer object does not use the REML criterion, even when the model has been estimated by REML. It uses the profiled log-likelihood evaluated at the REML estimates of the relative variances of the random effects. That's a complicated statement, so let me break it down.
The optimization in lmer is done with respect to the elements of the variance-covariance matrix of the random effects relative to $\sigma^2$. Given these values, the conditional estimates of the fixed-effects parameters and of $\sigma^2$ can be evaluated directly with some linear algebra.

In the summary or show output of an lmer model there are two quantities called the MLdeviance and the REMLdeviance. Those are based on the same relative variances but different conditional estimates of $\sigma^2$ (and hence different estimates of the elements of the variance-covariance matrix of the random effects).

It turns out that there is very little difference in the value of the profiled log-likelihood at the ML estimates and at the REML estimates. This is not to say that the log-likelihood is similar at the two (complete) sets of estimates - it is the profiled log-likelihoods that are similar, and these are what are used to create the likelihood-ratio test statistic, even when the models have been fit by REML.

As I said, this is complicated, and until I went to reply to this message I hadn't really sorted it out for myself. I knew the empirical result, which I had checked before releasing the code, but I hadn't worked out the details of why
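One way to sidestep the REML-criterion issue entirely is to refit both models by ML before comparing them. A minimal sketch using the sleepstudy models from earlier in the thread; note that the argument controlling the criterion has varied across lme4 versions (method = "ML" in releases of that era, REML = FALSE in later ones), so treat the exact argument name as an assumption:

```r
## Hedged sketch: LRT between nested fixed-effects specifications,
## with both models fit by ML so the log-likelihoods are directly comparable.
library(lme4)
data(sleepstudy)
fm1.ml <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
               REML = FALSE)   # method = "ML" in older lme4
fm2.ml <- lmer(Reaction ~ Days + I(Days^2) + (Days | Subject), sleepstudy,
               REML = FALSE)
anova(fm1.ml, fm2.ml)          # LRT on genuine (profiled) log-likelihoods
```

The chi-square reference distribution is still asymptotic, so the small-sample anti-conservatism discussed above applies regardless of the fitting criterion.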
Re: [R] Conservative ANOVA tables in lmer
On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:

On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:

On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:

Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution.

Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups - for example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)?

I wonder whether the likelihood ratio test would be suitable here? That seems to be supported. It just takes a little longer.

require(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would then be:

1) Test the statistical significance of one or more fixed or random effects - fit a model with and a model without the terms, and use the LRT.

I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates' companion book to NLME.

2) Obtain confidence intervals for one or more fixed or random effects - use mcmcsamp.

Did I miss anything important?
What else would people like to do?

Cheers
Andrew

Andrew Robinson
Senior Lecturer in Statistics                Tel: +61-3-8344-9763
Department of Mathematics and Statistics     Fax: +61-3-8344-4599
University of Melbourne, VIC 3010 Australia
Email: [EMAIL PROTECTED]  Website: http://www.ms.unimelb.edu.au

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Conservative ANOVA tables in lmer
[snip]

Douglas Bates wrote:

Hmm - I'm not sure what confidence interval and what number of levels you mean there, so I can't comment on that method. Suppose we go back to Spencer's example and consider whether there is a significant effect for the Nozzle factor. That is equivalent to the hypothesis H_0: beta_2 = beta_3 = 0 versus the general alternative. A p-value could be formulated from an MCMC sample if we assume that the marginal distribution of the parameter estimates for beta_2 and beta_3 has roughly elliptical contours; you can evaluate that by, say, examining a hexbin plot of the values in the MCMC sample. One could take the ellipses as defined by the standard errors and estimated correlation or, probably better, by the observed standard deviations and correlations in the MCMC sample. Then determine the proportion of (beta_2, beta_3) pairs in the sample that fall outside the ellipse centered at the estimates, with that eccentricity and scaling, that passes through (0,0). That would be an empirical p-value for the test. I would recommend calculating this for a couple of samples to check on the reproducibility.

Here is another thought for an empirical p-value that may be easier to compute and would require fewer assumptions: take the proportion of MCMC samples that fall into each quadrant (++, +-, -+, --) and use the smallest of these proportions as the p-value (or the smallest out of a subset of the quadrants for a one-sided-style test).

Think of it this way: if the smallest proportion is greater than alpha, then any closed curve (ellipse, polygon, even a concave polygon) that includes a 1-alpha proportion of the points would need to include points from all 4 quadrants, and therefore any convex curve would have to include (0,0), which is consistent with the null hypothesis.

On the other hand, if there is a quadrant that contains fewer than an alpha proportion of the points, then there exists at least one confidence region (possibly concave) that contains a 1-alpha proportion of the points and excludes (0,0) and that entire quadrant, which is consistent with the alternative that at least one of the betas differs from 0.

A more conservative p-value would be to take the minimum proportion and multiply it by 4 (or 2^p for p simultaneous tests), which is the same idea as multiplying by 2 for a two-sided univariate test, and assumes that the confidence regions would exclude similar proportions of points in each direction (central confidence regions rather than minimum-length or other confidence regions). This seems to me that it would be over-conservative in some cases (since all the proportions must sum to 1, we don't really have 4 degrees of freedom, and a smaller adjustment factor may still be correct and less conservative).

Some simulations would be a good idea to see if the plain minimum is too liberal and how conservative the other approach is for common situations. These are just my first thoughts on the matter; I have not tested anything, so any comments or other discussion of this idea are welcome.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
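The quadrant idea is easy to sketch for two coefficients. The following is an illustrative, untested implementation consistent with the description above; the function name quadpvalue is invented here, and 'samp' is assumed to be a two-column matrix of MCMC draws of (beta_2, beta_3):

```r
## Hedged sketch of the quadrant-based empirical p-value.
## The factor() calls force all four quadrants to appear in the
## table even when one of them is empty.
quadpvalue <- function(samp) {
    stopifnot(ncol(samp) == 2)
    quad <- table(factor(samp[, 1] > 0, levels = c(FALSE, TRUE)),
                  factor(samp[, 2] > 0, levels = c(FALSE, TRUE)))
    min(quad) / nrow(samp)     # smallest quadrant proportion
}
## The conservative variant discussed above would multiply the
## result by 2^ncol(samp), i.e. by 4 here.
```

As the message says, simulation would be needed to calibrate how liberal or conservative either variant is in practice.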
Re: [R] Conservative ANOVA tables in lmer
On Wed, Sep 13, 2006 at 07:04:17AM -0400, Manuel Morales wrote:

On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:

On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:

On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:

Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution.

Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups - for example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)?

I wonder whether the likelihood ratio test would be suitable here? That seems to be supported. It just takes a little longer.

require(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would then be:

1) Test the statistical significance of one or more fixed or random effects - fit a model with and a model without the terms, and use the LRT.

I believe that the LRT is anti-conservative for fixed effects, as described in Pinheiro and Bates' companion book to NLME.

Yes, you are right. I had forgotten that. Back to square one :).
Bert Gunter also kindly pointed this out to me.

Cheers
Andrew

2) Obtain confidence intervals for one or more fixed or random effects - use mcmcsamp.

Did I miss anything important? What else would people like to do?

Cheers
Andrew

Andrew Robinson
Senior Lecturer in Statistics                Tel: +61-3-8344-9763
Department of Mathematics and Statistics     Fax: +61-3-8344-4599
University of Melbourne, VIC 3010 Australia
Email: [EMAIL PROTECTED]  Website: http://www.ms.unimelb.edu.au

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Andrew Robinson
Department of Mathematics and Statistics     Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia  Fax: +61-3-8344-4599
Email: [EMAIL PROTECTED]  http://www.ms.unimelb.edu.au
Re: [R] Conservative ANOVA tables in lmer
On 9/11/06, Manuel Morales [EMAIL PROTECTED] wrote:

On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:

On 9/10/06, Andrew Robinson [EMAIL PROTECTED] wrote:

On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:

I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects, using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X, if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever" and "the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience.

This seems to be more than fair to me. I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this?

Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution.

Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups - for example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)?
Hmm - I'm not sure what confidence interval and what number of levels you mean there, so I can't comment on that method.

Suppose we go back to Spencer's example and consider whether there is a significant effect for the Nozzle factor. That is equivalent to the hypothesis H_0: beta_2 = beta_3 = 0 versus the general alternative. A p-value could be formulated from an MCMC sample if we assume that the marginal distribution of the parameter estimates for beta_2 and beta_3 has roughly elliptical contours; you can evaluate that by, say, examining a hexbin plot of the values in the MCMC sample. One could take the ellipses as defined by the standard errors and estimated correlation or, probably better, by the observed standard deviations and correlations in the MCMC sample. Then determine the proportion of (beta_2, beta_3) pairs in the sample that fall outside the ellipse centered at the estimates, with that eccentricity and scaling, that passes through (0,0). That would be an empirical p-value for the test. I would recommend calculating this for a couple of samples to check on the reproducibility.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Conservative ANOVA tables in lmer
On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:

On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:

Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution.

Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups - for example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)?

I wonder whether the likelihood ratio test would be suitable here? That seems to be supported. It just takes a little longer.

require(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would then be:

1) Test the statistical significance of one or more fixed or random effects - fit a model with and a model without the terms, and use the LRT.

2) Obtain confidence intervals for one or more fixed or random effects - use mcmcsamp.

Did I miss anything important? What else would people like to do?
Cheers Andrew

Andrew Robinson
Senior Lecturer in Statistics               Tel: +61-3-8344-9763
Department of Mathematics and Statistics    Fax: +61-3-8344-4599
University of Melbourne, VIC 3010 Australia
Email: [EMAIL PROTECTED]    Website: http://www.ms.unimelb.edu.au
Re: [R] Conservative ANOVA tables in lmer
Peter's example and Doug's different test reply sent me to Scheffé's discussion of the balanced and replicated mixed-effects 2-way layout. As I note below, the obvious F test for the fixed effect does not appear to be a likelihood ratio test for anything. Douglas Bates wrote: On 9/7/06, Douglas Bates [EMAIL PROTECTED] wrote: On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote: Martin Maechler [EMAIL PROTECTED] writes: DB == Douglas Bates [EMAIL PROTECTED] on Thu, 7 Sep 2006 07:59:58 -0500 writes: DB Thanks for your summary, Hank. DB On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote: Dear lmer-ers, My thanks for all of you who are sharing your trials and tribulations publicly. I was hoping to elicit some feedback on my thoughts on denominator degrees of freedom for F ratios in mixed models. These thoughts and practices result from my reading of previous postings by Doug Bates and others. - I start by assuming that the appropriate denominator degrees of freedom lies between n - p and n - q, where n = number of observations, p = number of fixed effects (rank of model matrix X), and q = rank of Z:X. DB I agree with this but the opinion is by no means universal. Initially DB I misread the statement because I usually write the number of columns DB of Z as q. DB It is not easy to assess the rank of Z:X numerically. In many cases one DB can reason what it should be from the form of the model but a general DB procedure to assess the rank of a matrix, especially a sparse matrix, DB is difficult. DB An alternative which can be easily calculated is n - t where t is the DB trace of the 'hat matrix'. The function 'hatTrace' applied to a DB fitted lmer model evaluates this trace (conditional on the estimates DB of the relative variances of the random effects). - I then conclude that good estimates of P values on the F ratios lie between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
-- I further surmise that the latter of these (1 - pf(F.ratio, numDF, n-q)) is the more conservative estimate. This assumes that the true distribution (under H0) of that F ratio *is* F_{n1,n2} for some (possibly non-integer) n1 and n2. But AFAIU, this is only approximately true at best, and AFAIU, the quality of this approximation has only been investigated empirically for some situations. Hence, even your conservative estimate of the P value could be wrong (I mean wrong on the wrong side instead of just conservatively wrong). Consequently, such a P-value is only ``approximately conservative'' ... I agree however that in some situations it might be a very useful descriptive statistic about the fitted model. I'm very wary of ANY attempt at guesswork in these matters. I may be understanding the post wrongly, but consider this case: Y_ij = mu + z_i + eps_ij, i = 1..3, j = 1..100. I get rank(X)=1, rank(X:Z)=3, n=300. It is well known that the test for mu=0 in this case is obtained by reducing the data to group means, xbar_i, and then doing a one-sample t test, the square of which is F(1, 2), but it seems to be suggested that F(1, 297) is a conservative test???! It's a different test, isn't it? Your test is based upon the between-group sum of squares with 2 df. I am proposing to use the within-group sum of squares or its generalization. On closer examination I see that you are indeed correct. I have heard that well-known result many times and finally sat down to prove it to myself. For a balanced design the standard error of the intercept using the REML estimates is the same as the standard error of the mean calculated from the group means.

data(Rail, package = 'nlme')
library(lme4)
summary(fm1 <- lmer(travel ~ 1 + (1|Rail), Rail))

Linear mixed-effects model fit by REML
Formula: travel ~ 1 + (1 | Rail)
   Data: Rail
   AIC   BIC logLik MLdeviance REMLdeviance
 126.2 128.0 -61.09      128.6        122.2
Random effects:
 Groups   Name        Variance Std.Dev.
 Rail     (Intercept) 615.286  24.8050
 Residual              16.167   4.0208
number of obs: 18, groups: Rail, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)    66.50      10.17   6.538

mns <- with(Rail, tapply(travel, Rail, mean)) # group means
sd(mns)/sqrt(length(mns)) # standard error matches that from lmer
[1] 10.17104

t.test(mns)

        One Sample t-test

data:  mns
t = 6.5382, df = 5, p-value = 0.001253
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 40.35452 92.64548
sample estimates:
mean of x
     66.5

ctab <- summary(fm1)@coefs # coefficient table
ctab[,1] + c(-1,1) * qt(0.975, 15) * ctab[,2] # 95% conf. int.
[1] 44.82139 88.17861
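[Editor's note: the equivalence verified above is easy to check by hand: for a balanced one-way random-effects design, the intercept's standard error from the REML fit equals sd(group means)/sqrt(J). A small sketch in Python rather than R, with made-up balanced data (six groups of three observations, standing in for the Rail data, which is not reproduced here):]

```python
import math

def group_means_t(groups):
    # the classical test for the intercept in a balanced one-way
    # random-effects design: a one-sample t test applied to the J
    # group means, with J - 1 denominator degrees of freedom
    mns = [sum(g) / len(g) for g in groups]
    j = len(mns)
    mbar = sum(mns) / j
    s2 = sum((m - mbar) ** 2 for m in mns) / (j - 1)   # variance of group means
    se = math.sqrt(s2 / j)   # equals the lmer intercept SE when balanced
    return mbar, se, mbar / se, j - 1

# made-up balanced data: 6 groups of 3 observations each
groups = [[55, 53, 54], [26, 37, 32], [78, 91, 85],
          [92, 100, 96], [49, 51, 50], [35, 34, 36]]
mean, se, t, df = group_means_t(groups)
print(mean, se, t, df)
```

For balanced data the mean of the group means equals the grand mean, which is why the group-means t test and the REML intercept estimate agree.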
Re: [R] Conservative ANOVA tables in lmer
Hi Spencer, I would like to make sure I understand Spencer's question and doubts, below. On Sep 9, 2006, at 11:36 PM, Spencer Graves wrote: [snip]
Re: [R] Conservative ANOVA tables in lmer
Martin Henry H. Stevens wrote: Hi Spencer, I would like to make sure I understand Spencer's question and doubts, below. On Sep 9, 2006, at 11:36 PM, Spencer Graves wrote: Peter's example and Doug's different test reply sent me to Scheffé's discussion of the balanced and replicated mixed-effects 2-way layout. As I note below, the obvious F test for the fixed effect does not appear to be a likelihood ratio test for anything. Douglas Bates wrote: On 9/7/06, Douglas Bates [EMAIL PROTECTED] wrote: On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote: snip I'm very wary of ANY attempt at guesswork in these matters. I may be understanding the post wrongly, but consider this case: Y_ij = mu + z_i + eps_ij, i = 1..3, j = 1..100. I get rank(X)=1, rank(X:Z)=3, n=300. It is well known that the test for mu=0 in this case is obtained by reducing the data to group means, xbar_i, and then doing a one-sample t test, the square of which is F(1, 2), but it seems to be suggested that F(1, 297) is a conservative test???! It's a different test, isn't it? Your test is based upon the between-group sum of squares with 2 df. I am proposing to use the within-group sum of squares or its generalization. snip For the traditional, balanced, replicated, 2-way mixed-effects analysis, Scheffé (1959, Table 8.1.1, p. 269) gives the expected mean squares for a two-way layout with I levels of a fixed effect A, J levels of a random effect B, and K replicates, as follows:

EMS(A: fixed)    = var(e) + K*var(A:B) + J*K*MeanSquareA
EMS(B: random)   = var(e) + I*K*var(B)
EMS(A:B: random) = var(e) + K*var(A:B)
EMSE             = var(e)

In this case, the obvious test for A is MS(A: fixed) / MS(A:B, random), because this gives us a standard F statistic to test MeanSquareA = 0. However, it doesn't make sense to me to test A without simultaneously assuming var(A:B) = 0. Does one want to test whether var(A:B)=0 because the F-test assumes it?
That is, that as var(A:B) increases, the variance ratio MS(A: fixed)/MS(A:B, random) decreases, artifactually reducing the significance of J*K*MeanSquareA? SG: Scheffé recommends testing var(A:B)=0 using MS(A:B)/MSE, which follows an F distribution under both null and alternative hypotheses (but scaled differently under the alternative). This is a monotonic transformation of the standard likelihood ratio test, and makes sense to me. SG: Scheffé's recommended test for MeanSquareA = 0 is MSA/MS(A:B). I haven't worked out the details, but it looks like this assumes that the A:B interaction may be real without an A effect, and this violates a basic principle of hierarchy, I think. There are situations where an A:B interaction can exist without the main effect, but that rests on a particular parameterization, and we have not assumed that in this context. SG: I think this is an argument for using the likelihood ratio over Scheffé's traditional answer. Similar to Doug's comment about different tests, we should be clear about what each procedure tests, and then select the procedure that seems to be the closest to the problem we really want to solve, rather than simply relying on a test statistic whose distribution is better understood. I'm not criticizing Scheffé: he didn't have the computer tools available in 1959 that we have today. The same argument applies to Peter's simpler case discussed above: with Y_ij = mu + z_i + eps_ij, it only rarely makes sense to test mu=0 while assuming var(z) != 0. In the balanced 2-way, mixed-effects analysis, the Neyman-Pearson thing to do, I would think, would be to test simultaneously MeanSquareA = 0 with var(A:B) = 0. In lmer, I might write this as follows: anova(lmer(y~A+(A|B)), lmer(y~1+(1|B))). However, this does NOT match the standard analysis associated with this design, does it? To check this, I considered problem 8.1 in Scheffé (p. 289), which compares 3 different nozzles (fixed effect) tested by 5 different operators (random effect).
The data are as follows:

y <- c(6,6,-15, 26,12,5, 11,4,4, 21,14,7, 25,18,25,
       13,6,13, 4,4,11, 17,10,17, -5,2,-5, 15,8,1,
       10,10,-11, -35,0,-14, 11,-10,-17, 12,-2,-16, -4,10,24)
Nozzle <- data.frame(Nozzle=rep(LETTERS[1:3], e=15),
                     Operator=rep(letters[1:5], e=3), flowRate=y)

The traditional analysis can be obtained from anova(lm(flowRate~Nozzle*Operator, ...)), but comparing MeanSq.Nozzle to MeanSq.Nozzle:Operator rather than MeanSquareResidual, as follows:

fitAB0 <- lm(flowRate~Nozzle*Operator, data=Nozzle)
(aov.AB0 <- anova(fitAB0))

Analysis of Variance Table

Response: flowRate
                Df  Sum Sq Mean Sq F value   Pr(>F)
Nozzle           2 1426.98  713.49  7.0456 0.003101 **
Operator         4  798.80  199.70  1.9720 0.124304
Nozzle:Operator  8 1821.47  227.68  2.2484 0.051640 .
Residuals       30 3038.00  101.27
---
Signif. codes: 0 '***' 0.001 '**'
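[Editor's note: the decomposition quoted above can be reproduced from scratch. Here is a plain Python (rather than R) version of the balanced two-way calculation; the data vector is exactly the flowRate data above, and the printed values recover the Sum Sq column of the anova(lm(...)) table together with both F ratios: MS(Nozzle)/MSE as printed by lm, and the mixed-model test MS(Nozzle)/MS(Nozzle:Operator).]

```python
# flowRate data quoted above: I = 3 nozzles x J = 5 operators x K = 3 reps,
# stored operator-within-nozzle, replicates innermost
y = [6, 6, -15, 26, 12, 5, 11, 4, 4, 21, 14, 7, 25, 18, 25,
     13, 6, 13, 4, 4, 11, 17, 10, 17, -5, 2, -5, 15, 8, 1,
     10, 10, -11, -35, 0, -14, 11, -10, -17, 12, -2, -16, -4, 10, 24]
I, J, K = 3, 5, 3
grand = sum(y) / (I * J * K)
# cell means: observations for nozzle i, operator j start at (i*J + j)*K
cell = [[sum(y[(i * J + j) * K:(i * J + j) * K + K]) / K
         for j in range(J)] for i in range(I)]
nozzle = [sum(cell[i]) / J for i in range(I)]                      # A means
oper = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]   # B means
ss_a = J * K * sum((m - grand) ** 2 for m in nozzle)
ss_b = I * K * sum((m - grand) ** 2 for m in oper)
ss_ab = K * sum((cell[i][j] - nozzle[i] - oper[j] + grand) ** 2
                for i in range(I) for j in range(J))
ss_e = sum((y[(i * J + j) * K + k] - cell[i][j]) ** 2
           for i in range(I) for j in range(J) for k in range(K))
ms_a = ss_a / (I - 1)
ms_ab = ss_ab / ((I - 1) * (J - 1))
ms_e = ss_e / (I * J * (K - 1))
f_fixed = ms_a / ms_e    # F(2, 30): what anova(lm(...)) prints for Nozzle
f_mixed = ms_a / ms_ab   # F(2, 8): the mixed-model test for the fixed effect
print(round(ss_a, 2), round(ss_b, 2), round(ss_ab, 2), round(ss_e, 2))
print(round(f_fixed, 4), round(f_mixed, 4))
```

Note how much the two F ratios differ (about 7.05 on 2 and 30 df versus about 3.13 on 2 and 8 df), which is the point of the discussion: the choice of denominator mean square and degrees of freedom changes the conclusion.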
Re: [R] Conservative ANOVA tables in lmer
On 9/10/06, Andrew Robinson [EMAIL PROTECTED] wrote: On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience. This seems to be more than fair to me. I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this? Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution.
Re: [R] Conservative ANOVA tables in lmer
On 9/10/06, John Maindonald [EMAIL PROTECTED] wrote: A Wiki entry is an excellent idea. I am happy to try to help. An account of mcmcsamp() might be a very useful part of the Wiki. My limited investigations suggest that once the data start to overwhelm the prior (maybe ~3 df for an effect that is of interest), the posterior distribution that it gives provides a very good approximation to the sampling distribution. I have been meaning to put aside time to try to work out, with the help of a colleague here at ANU, how the Kenward-Roger (Biometrics, 1997) approximation might be implemented in lmer, but it hasn't yet happened and is unlikely to do so for a while. I think it would be nontrivial to do this for a general case. A literal translation of the formulas in that paper may be suitable for simple cases but not for general cases. Like many papers in this literature, this one has the inverse of an n by n matrix (n being the number of observations) embedded in the middle of most of the formulas. Given that some users are seriously considering fitting models for which n is in the millions, forming and manipulating an n by n matrix in such cases is out of the question unless you can exploit special properties of the matrix. John Maindonald email: [EMAIL PROTECTED] phone: +61 2 (6125)3473 fax: +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27), Australian National University, Canberra ACT 0200. On 10 Sep 2006, at 8:00 PM, [EMAIL PROTECTED] wrote: From: Spencer Graves [EMAIL PROTECTED] Date: 10 September 2006 4:54:50 PM To: Andrew Robinson [EMAIL PROTECTED] Cc: Douglas Bates [EMAIL PROTECTED], R-Help [EMAIL PROTECTED] Subject: Re: [R] Conservative ANOVA tables in lmer Hi, Doug, et al.: I'll volunteer to do the same, which is an extension of much of what I've been doing for R Help for a while now. Regarding writing a FAQ, what about a Wiki entry (and maybe ultimately a vignette)?
This thread provides notes around which such could be built. Another piece might be an example from Scheffé (1959), which I sent as a reply to an earlier comment on this thread (foolishly sent without reducing the cc list, which means it awaits moderator approval). Each time a question of this nature arises, someone checks the Wiki, edits & adds something to it if necessary, then replies to the list with a reference to the appropriate Wiki entry. Spencer Graves Andrew Robinson wrote: On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience. This seems to be more than fair to me. I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this? Cheers Andrew
Re: [R] Conservative ANOVA tables in lmer
On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote: On 9/10/06, Andrew Robinson [EMAIL PROTECTED] wrote: On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience. This seems to be more than fair to me. I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this? Having made that offer I think I will now withdraw it. Peter's example has convinced me that this is the wrong thing to do. I am encouraged by the fact that the results from mcmcsamp correspond closely to the correct theoretical results in the case that Peter described. I appreciate that some users will find it difficult to work with an MCMC sample (or to convince editors to accept results based on such a sample) but I think that these results indicate that it is better to go after the marginal distribution of the fixed-effects estimates (which is what is being approximated by the MCMC sample - up to Bayesian/frequentist philosophical differences) than to use the conditional distribution and somehow try to adjust the reference distribution. Am I right that the MCMC sample cannot be used, however, to evaluate the significance of parameter groups? For example, to assess the significance of a three-level factor? Are there better alternatives than simply adjusting the CI for the number of factor levels (1-alpha/levels)? Thanks! Manuel = Manuel A. Morales Asst.
Prof., Biology Williams College http://mutualism.williams.edu
Re: [R] Conservative ANOVA tables in lmer
Hi, Doug, et al.: I'll volunteer to do the same, which is an extension of much of what I've been doing for R Help for a while now. Regarding writing a FAQ, what about a Wiki entry (and maybe ultimately a vignette)? This thread provides notes around which such could be built. Another piece might be an example from Scheffé (1959), which I sent as a reply to an earlier comment on this thread (foolishly sent without reducing the cc list, which means it awaits moderator approval). Each time a question of this nature arises, someone checks the Wiki, edits & adds something to it if necessary, then replies to the list with a reference to the appropriate Wiki entry. Spencer Graves

Andrew Robinson wrote: On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience. This seems to be more than fair to me. I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this? Cheers Andrew
Re: [R] Conservative ANOVA tables in lmer
Spencer, Would you add the reference to the Wiki? I think it would help round out this thread. John

John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC, University of Maryland School of Medicine Claude D. Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) [EMAIL PROTECTED]

Spencer Graves [EMAIL PROTECTED] 09/10/06 2:54 AM Hi, Doug, et al.: I'll volunteer to do the same, which is an extension of much of what I've been doing for R Help for a while now. Regarding writing a FAQ, what about a Wiki entry (and maybe ultimately a vignette)? This thread provides notes around which such could be built. Another piece might be an example from Scheffé (1959), which I sent as a reply to an earlier comment on this thread (foolishly sent without reducing the cc list, which means it awaits moderator approval). Each time a question of this nature arises, someone checks the Wiki, edits & adds something to it if necessary, then replies to the list with a reference to the appropriate Wiki entry. Spencer Graves Andrew Robinson wrote: On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience. This seems to be more than fair to me.
I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this? Cheers Andrew
Re: [R] Conservative ANOVA tables in lmer
A Wiki entry is an excellent idea. I am happy to try to help. An account of mcmcsamp() might be a very useful part of the Wiki. My limited investigations suggest that once the data start to overwhelm the prior (maybe ~3 df for an effect that is of interest), the posterior distribution that it gives provides a very good approximation to the sampling distribution. I have been meaning to put aside time to try to work out, with the help of a colleague here at ANU, how the Kenward-Roger (Biometrics, 1997) approximation might be implemented in lmer, but it hasn't yet happened and is unlikely to do so for a while. John Maindonald email: [EMAIL PROTECTED] phone: +61 2 (6125)3473 fax: +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27), Australian National University, Canberra ACT 0200. On 10 Sep 2006, at 8:00 PM, [EMAIL PROTECTED] wrote: From: Spencer Graves [EMAIL PROTECTED] Date: 10 September 2006 4:54:50 PM To: Andrew Robinson [EMAIL PROTECTED] Cc: Douglas Bates [EMAIL PROTECTED], R-Help [EMAIL PROTECTED] Subject: Re: [R] Conservative ANOVA tables in lmer Hi, Doug, et al.: I'll volunteer to do the same, which is an extension of much of what I've been doing for R Help for a while now. Regarding writing a FAQ, what about a Wiki entry (and maybe ultimately a vignette)? This thread provides notes around which such could be built. Another piece might be an example from Scheffé (1959), which I sent as a reply to an earlier comment on this thread (foolishly sent without reducing the cc list, which means it awaits moderator approval). Each time a question of this nature arises, someone checks the Wiki, edits & adds something to it if necessary, then replies to the list with a reference to the appropriate Wiki entry.
Spencer Graves Andrew Robinson wrote: On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience. This seems to be more than fair to me. I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this? Cheers Andrew
Re: [R] Conservative ANOVA tables in lmer
On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience. This seems to be more than fair to me. I'll volunteer to help explain why the anova.lmer() output doesn't match SAS, etc. Is it worth putting a caveat in the output and the help files? Is it even worth writing a FAQ about this? Cheers Andrew

-- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 Email: [EMAIL PROTECTED] http://www.ms.unimelb.edu.au
[R] Conservative ANOVA tables in lmer
Dear list, I have written functions to perform simulation-based tests of H0: Var(Random Effects) = 0, beta_foo = 0 in linear mixed models, based on the exact distribution of the LR and restricted LR test statistics (see the research presented in Crainiceanu, Ruppert: Likelihood ratio tests in linear mixed models with one variance component, J. R. Statist. Soc. B (2004), 66, Part 1, pp. 165-185). They are about 15-20 times faster than the parametric bootstrap. At the moment, the exact distributions are only easily simulated for the case of a single variance component/random effect and i.i.d. errors; feasible approximations for the multivariate case are currently being investigated and will be implemented soon. The syntax looks something like this:

#begin code:
data(sleepstudy)
summary(sleepstudy) # effect of sleep deprivation on reaction time
xyplot(Reaction~Days|Subject, data=sleepstudy)
m <- lmer(Reaction~Days+(Days-1|Subject), data=sleepstudy)
# random slopes, but no random intercept -
# doesn't make sense, but it's just an example
summary(m)

# test for individual heterogeneity based on the RLRT
# (no restrictions on fixed effects under H0)
# H0: lambda = Var(RandomSlopes)/Var(error) == 0  <=>  Var(RandomSlopes) == 0
t3 <- RLRT1SimTest(m, lambda0=0, seed=5, nsim=1)
# will produce output:
# H0: lambda = 0 ; p-value = 0
# observed lambda = 0.06259639

# test for influence of Days based on the LRT
# (restriction on fixed effects: beta_Days == 0)
m0 <- lm(Reaction~1, data=sleepstudy)
t4 <- LRT1SimTest(m, m0, seed=10, nsim=1)
# will produce output:
# Model under H0: Reaction ~ 1 ;
# Model under HA: Reaction ~ Days + (Days - 1 | Subject) ;
# p-value = 0
# observed lambda = 0.06259639
#end code

If you are interested in using these functions I'll be glad to send them to you - be aware, however, that you can only use them for testing 1 random effect vs. no random effect in a model with i.i.d. errors!! The plan is to put them in a package beginning next year and use them as a basis for an (exact) anova.lmer() method.
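[Editor's note: the functions above report simulation-based (empirical) p-values. The general mechanism, independent of the specific RLRT null distribution, can be sketched in a few lines of Python rather than R. Here `simulate_null` is a placeholder for a draw from the exact null distribution of the test statistic; the toy chi-squared(1) draw below is an invention for the sketch, not the Crainiceanu-Ruppert distribution, and the +1 terms are one common convention for keeping the estimate away from an exact zero.]

```python
import random

def monte_carlo_pvalue(observed, simulate_null, nsim=999, seed=5):
    # simulation-based p-value: the proportion of statistics simulated
    # under H0 that are at least as extreme as the observed one; the
    # +1 in numerator and denominator avoids reporting exactly 0
    rng = random.Random(seed)
    hits = sum(simulate_null(rng) >= observed for _ in range(nsim))
    return (hits + 1) / (nsim + 1)

# toy null distribution: a squared standard normal (chi-squared with 1 df)
null_draw = lambda rng: rng.gauss(0, 1) ** 2
p_extreme = monte_carlo_pvalue(15.0, null_draw)  # observed value far in the tail
p_typical = monte_carlo_pvalue(0.5, null_draw)   # unremarkable observed value
print(p_extreme, p_typical)
```

For a variance-component test, `simulate_null` would instead draw from the exact RLRT null distribution (or refit to parametrically bootstrapped data), which is exactly where the 15-20x speed difference mentioned above comes from.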
Greetings,
Fabian Scheipl
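For comparison, the usual large-sample approximation that such exact simulation-based tests improve on is the 50:50 mixture of chi-squared(0) and chi-squared(1) for a single variance component tested on the boundary of its parameter space; Crainiceanu and Ruppert's work cited above shows this approximation can be poor in finite samples. A minimal sketch of the mixture p-value, in Python for illustration only (the function name is mine):

```python
from scipy.stats import chi2

def mixture_pvalue(lrt_stat):
    """P-value for H0: Var(random effect) = 0 under the common
    0.5*chi2(0) + 0.5*chi2(1) boundary approximation.  chi2(0) is a
    point mass at zero, so only the chi2(1) half contributes mass
    beyond any positive statistic."""
    if lrt_stat <= 0:
        return 1.0
    return 0.5 * chi2.sf(lrt_stat, df=1)

# Roughly half the naive chi2(1) p-value:
print(mixture_pvalue(3.84))
```

The simulation-based tests described in the message replace this approximation with draws from the exact finite-sample null distribution, at a cost reported above to be well below that of a parametric bootstrap.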
Re: [R] Conservative ANOVA tables in lmer
On 9/7/06, Douglas Bates [EMAIL PROTECTED] wrote:

On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote:

Martin Maechler [EMAIL PROTECTED] writes:

DB == Douglas Bates [EMAIL PROTECTED] on Thu, 7 Sep 2006 07:59:58 -0500 writes:

DB: Thanks for your summary, Hank.

DB: On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:

Dear lmer-ers, My thanks to all of you who are sharing your trials and tribulations publicly. I was hoping to elicit some feedback on my thoughts on denominator degrees of freedom for F ratios in mixed models. These thoughts and practices result from my reading of previous postings by Doug Bates and others.

- I start by assuming that the appropriate denominator degrees of freedom lies between n - p and n - q, where n = number of observations, p = number of fixed effects (rank of model matrix X), and q = rank of Z:X.

DB: I agree with this but the opinion is by no means universal. Initially I misread the statement because I usually write the number of columns of Z as q.

DB: It is not easy to assess the rank of Z:X numerically. In many cases one can reason what it should be from the form of the model, but a general procedure to assess the rank of a matrix, especially a sparse matrix, is difficult.

DB: An alternative which can be easily calculated is n - t, where t is the trace of the 'hat matrix'. The function 'hatTrace' applied to a fitted lmer model evaluates this trace (conditional on the estimates of the relative variances of the random effects).

- I then conclude that good estimates of P values on the F ratios lie between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
- I further surmise that the latter of these (1 - pf(F.ratio, numDF, n-q)) is the more conservative estimate.

This assumes that the true distribution (under H0) of that F ratio *is* F_{n1,n2} for some (possibly non-integer) n1 and n2.
But AFAIU, this is only approximately true at best, and AFAIU, the quality of this approximation has only been investigated empirically for some situations. Hence, even your conservative estimate of the P value could be wrong (I mean wrong on the wrong side instead of just conservatively wrong). Consequently, such a P-value is only ``approximately conservative'' ... I agree however that in some situations it might be a very useful descriptive statistic about the fitted model.

I'm very wary of ANY attempt at guesswork in these matters. I may be understanding the post wrongly, but consider this case:

Y_ij = mu + z_i + eps_ij, i = 1..3, j = 1..100

I get rank(X) = 1, rank(X:Z) = 3, n = 300. It is well known that the test for mu = 0 in this case is obtained by reducing the data to group means, xbar_i, and then doing a one-sample t test, the square of which is F(1, 2), but it seems to be suggested that F(1, 297) is a conservative test???!

It's a different test, isn't it? Your test is based upon the between-group sum of squares with 2 df. I am proposing to use the within-group sum of squares or its generalization.

On closer examination I see that you are indeed correct. I have heard that well-known result many times and finally sat down to prove it to myself. For a balanced design the standard error of the intercept using the REML estimates is the same as the standard error of the mean calculated from the group means.

data(Rail, package = 'nlme')
library(lme4)
summary(fm1 <- lmer(travel ~ 1 + (1|Rail), Rail))

Linear mixed-effects model fit by REML
Formula: travel ~ 1 + (1 | Rail)
   Data: Rail
   AIC   BIC logLik MLdeviance REMLdeviance
 126.2 128.0 -61.09      128.6        122.2
Random effects:
 Groups   Name        Variance Std.Dev.
 Rail     (Intercept) 615.286  24.8050
 Residual              16.167   4.0208
number of obs: 18, groups: Rail, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)    66.50      10.17   6.538

mns <- with(Rail, tapply(travel, Rail, mean))  # group means
sd(mns)/sqrt(length(mns))  # standard error matches that from lmer
[1] 10.17104

t.test(mns)

        One Sample t-test

data:  mns
t = 6.5382, df = 5, p-value = 0.001253
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 40.35452 92.64548
sample estimates:
mean of x
     66.5

ctab <- summary(fm1)@coefs  # coefficient table
ctab[,1] + c(-1,1) * qt(0.975, 15) * ctab[,2]  # 95% conf. int.
[1] 44.82139 88.17861
## interval using df = # of obs - rank of [Z:X] is too narrow

So my proposal of using either the trace of the hat matrix or the rank of the combined model matrices as the degrees of freedom for the model is not conservative. However, look at the following

set.seed(123454321)
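The two intervals Bates compares can be reproduced from just the reported estimate and standard error, which makes the role of the denominator-df choice explicit. A quick check in Python (the numbers are copied from the output above):

```python
from scipy.stats import t

# (Intercept) estimate and its standard error from the Rail fit above
est, se = 66.5, 10.17104

def conf_int(df):
    """95% confidence interval for the intercept with the given df."""
    half = t.ppf(0.975, df) * se
    return est - half, est + half

print(conf_int(5))   # df = (number of rails) - 1: matches t.test(mns)
print(conf_int(15))  # df = n - rank([Z:X]) = 18 - 3: the too-narrow interval
```

The df = 5 interval reproduces (40.35, 92.65) from the group-means t test, while df = 15 reproduces the too-narrow (44.82, 88.18) from the coefficient table, illustrating why the n - rank([Z:X]) proposal is not conservative here.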
[R] Conservative ANOVA tables in lmer
Dear lmer-ers,

My thanks to all of you who are sharing your trials and tribulations publicly. I was hoping to elicit some feedback on my thoughts on denominator degrees of freedom for F ratios in mixed models. These thoughts and practices result from my reading of previous postings by Doug Bates and others.

- I start by assuming that the appropriate denominator degrees of freedom lies between n - p and n - q, where n = number of observations, p = number of fixed effects (rank of model matrix X), and q = rank of Z:X.
- I then conclude that good estimates of P values on the F ratios lie between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
- I further surmise that the latter of these (1 - pf(F.ratio, numDF, n-q)) is the more conservative estimate.

When I use these criteria and compare my ANOVA table to the results of analysis of Helmert contrasts using an MCMC sample with highest posterior density intervals, I find that my conclusions (e.g. factor A, with three levels, has a significant effect on the response variable) are qualitatively the same.

Comments?

Hank

Dr. Hank Stevens, Assistant Professor
338 Pearson Hall, Botany Department
Miami University, Oxford, OH 45056
Office: (513) 529-4206  Lab: (513) 529-4262  FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
"E Pluribus Unum"
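The bracketing in the bullets above can be checked numerically: for a fixed F ratio and numerator df, shrinking the denominator df fattens the tail of the F distribution, so 1 - pf(F, numDF, n-q) is the larger (more conservative) of the two p-values. A quick Python sketch (the F ratio and the values of n, p, q are invented for illustration):

```python
from scipy.stats import f

# Hypothetical numbers: F ratio, numerator df, n observations,
# p = rank(X), q = rank([Z:X])
F_ratio, num_df = 4.2, 2
n, p, q = 300, 3, 12

p_lower = f.sf(F_ratio, num_df, n - p)  # 1 - pf(F.ratio, numDF, n-p)
p_upper = f.sf(F_ratio, num_df, n - q)  # 1 - pf(F.ratio, numDF, n-q)

# The smaller denominator df gives the larger p-value:
print(p_lower, p_upper)
```

As Maechler notes later in the thread, this bracket is only meaningful if the statistic really is F-distributed with some (n1, n2) in that range, which is at best an approximation.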
Re: [R] Conservative ANOVA tables in lmer
Thanks for your summary, Hank.

On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:

Dear lmer-ers, My thanks to all of you who are sharing your trials and tribulations publicly. I was hoping to elicit some feedback on my thoughts on denominator degrees of freedom for F ratios in mixed models. These thoughts and practices result from my reading of previous postings by Doug Bates and others.

- I start by assuming that the appropriate denominator degrees of freedom lies between n - p and n - q, where n = number of observations, p = number of fixed effects (rank of model matrix X), and q = rank of Z:X.

I agree with this but the opinion is by no means universal. Initially I misread the statement because I usually write the number of columns of Z as q.

It is not easy to assess the rank of Z:X numerically. In many cases one can reason what it should be from the form of the model, but a general procedure to assess the rank of a matrix, especially a sparse matrix, is difficult.

An alternative which can be easily calculated is n - t, where t is the trace of the 'hat matrix'. The function 'hatTrace' applied to a fitted lmer model evaluates this trace (conditional on the estimates of the relative variances of the random effects).

- I then conclude that good estimates of P values on the F ratios lie between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
- I further surmise that the latter of these (1 - pf(F.ratio, numDF, n-q)) is the more conservative estimate.

When I use these criteria and compare my ANOVA table to the results of analysis of Helmert contrasts using an MCMC sample with highest posterior density intervals, I find that my conclusions (e.g. factor A, with three levels, has a significant effect on the response variable) are qualitatively the same. Comments?
I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects, using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X, if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever, and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience.
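The "n - trace(hat matrix)" idea is easiest to see in the ordinary least-squares special case, where the hat matrix is X(X'X)^{-1}X' and its trace equals rank(X), so n minus the trace recovers the usual residual df. A small Python sketch of that limiting case (for a mixed model, as Bates notes, the hat matrix and hence its trace depend on the estimated relative variances of the random effects, which is what lme4's hatTrace evaluates):

```python
import numpy as np

# OLS hat matrix: H = X (X'X)^{-1} X'; trace(H) = rank(X) = p
rng = np.random.default_rng(1)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)
print(np.trace(H))      # p = 4 for full-rank X
print(n - np.trace(H))  # residual degrees of freedom, n - p = 26
```

In the mixed-model case the trace falls strictly between p and rank([Z:X]) once shrinkage of the random effects kicks in, which is why it was proposed as a compromise denominator df.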
Re: [R] Conservative ANOVA tables in lmer
Dear Douglas,

I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X

Please do!

if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever, and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience.

I would try to take up my share of these types of questions.

Best regards,
Lorenz

-
Lorenz Gygax, Dr. sc. nat., postdoc
Centre for proper housing of ruminants and pigs
Swiss Federal Veterinary Office
Agroscope Reckenholz-Tänikon Research Station ART
Tänikon, CH-8356 Ettenhausen / Switzerland
Tel: +41 052 368 33 84  Fax: +41 052 365 11 90
[EMAIL PROTECTED]  www.art.admin.ch
Re: [R] Conservative ANOVA tables in lmer
DB == Douglas Bates [EMAIL PROTECTED] on Thu, 7 Sep 2006 07:59:58 -0500 writes:

DB: Thanks for your summary, Hank.

DB: On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:

Dear lmer-ers, My thanks to all of you who are sharing your trials and tribulations publicly. I was hoping to elicit some feedback on my thoughts on denominator degrees of freedom for F ratios in mixed models. These thoughts and practices result from my reading of previous postings by Doug Bates and others.

- I start by assuming that the appropriate denominator degrees of freedom lies between n - p and n - q, where n = number of observations, p = number of fixed effects (rank of model matrix X), and q = rank of Z:X.

DB: I agree with this but the opinion is by no means universal. Initially I misread the statement because I usually write the number of columns of Z as q.

DB: It is not easy to assess the rank of Z:X numerically. In many cases one can reason what it should be from the form of the model, but a general procedure to assess the rank of a matrix, especially a sparse matrix, is difficult.

DB: An alternative which can be easily calculated is n - t, where t is the trace of the 'hat matrix'. The function 'hatTrace' applied to a fitted lmer model evaluates this trace (conditional on the estimates of the relative variances of the random effects).

- I then conclude that good estimates of P values on the F ratios lie between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
- I further surmise that the latter of these (1 - pf(F.ratio, numDF, n-q)) is the more conservative estimate.

This assumes that the true distribution (under H0) of that F ratio *is* F_{n1,n2} for some (possibly non-integer) n1 and n2. But AFAIU, this is only approximately true at best, and AFAIU, the quality of this approximation has only been investigated empirically for some situations. Hence, even your conservative estimate of the P value could be wrong (I mean wrong on the wrong side instead of just conservatively wrong). Consequently, such a P-value is only ``approximately conservative'' ... I agree however that in some situations it might be a very useful descriptive statistic about the fitted model.

Martin

When I use these criteria and compare my ANOVA table to the results of analysis of Helmert contrasts using an MCMC sample with highest posterior density intervals, I find that my conclusions (e.g. factor A, with three levels, has a significant effect on the response variable) are qualitatively the same. Comments?

DB: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects, using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X, if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever, and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience.
Re: [R] Conservative ANOVA tables in lmer
On 9/7/06, Martin Maechler [EMAIL PROTECTED] wrote:

DB == Douglas Bates [EMAIL PROTECTED] on Thu, 7 Sep 2006 07:59:58 -0500 writes:

DB: Thanks for your summary, Hank.

DB: On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:

Dear lmer-ers, My thanks to all of you who are sharing your trials and tribulations publicly. I was hoping to elicit some feedback on my thoughts on denominator degrees of freedom for F ratios in mixed models. These thoughts and practices result from my reading of previous postings by Doug Bates and others.

- I start by assuming that the appropriate denominator degrees of freedom lies between n - p and n - q, where n = number of observations, p = number of fixed effects (rank of model matrix X), and q = rank of Z:X.

DB: I agree with this but the opinion is by no means universal. Initially I misread the statement because I usually write the number of columns of Z as q.

DB: It is not easy to assess the rank of Z:X numerically. In many cases one can reason what it should be from the form of the model, but a general procedure to assess the rank of a matrix, especially a sparse matrix, is difficult.

DB: An alternative which can be easily calculated is n - t, where t is the trace of the 'hat matrix'. The function 'hatTrace' applied to a fitted lmer model evaluates this trace (conditional on the estimates of the relative variances of the random effects).

- I then conclude that good estimates of P values on the F ratios lie between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
- I further surmise that the latter of these (1 - pf(F.ratio, numDF, n-q)) is the more conservative estimate.

This assumes that the true distribution (under H0) of that F ratio *is* F_{n1,n2} for some (possibly non-integer) n1 and n2. But AFAIU, this is only approximately true at best, and AFAIU, the quality of this approximation has only been investigated empirically for some situations. Hence, even your conservative estimate of the P value could be wrong (I mean wrong on the wrong side instead of just conservatively wrong). Consequently, such a P-value is only ``approximately conservative'' ... I agree however that in some situations it might be a very useful descriptive statistic about the fitted model.

Thank you for pointing that out, Martin. I agree. As I mentioned, a value of the denominator degrees of freedom based on the trace of the hat matrix is conditional on the estimates of the relative variances of the random effects. I think an argument could still be made for rank(Z:X) as an upper bound on the dimension of the model space, and hence n - rank(Z:X) as a lower bound on the dimension of the space in which the residuals lie. One possible approach would be to use the squared length of the projection of the data vector onto the orthogonal complement of Z:X as the sum of squares, with n - rank(Z:X) as the degrees of freedom, and base tests on that. Under the assumptions on the model I think an F ratio calculated in this way actually would have an F distribution.

Martin

When I use these criteria and compare my ANOVA table to the results of analysis of Helmert contrasts using an MCMC sample with highest posterior density intervals, I find that my conclusions (e.g. factor A, with three levels, has a significant effect on the response variable) are qualitatively the same. Comments?

DB: I would be happy to re-institute p-values for fixed effects in the summary and anova methods for lmer objects, using a denominator degrees of freedom based on the trace of the hat matrix or the rank of Z:X, if others will volunteer to respond to the "these answers are obviously wrong because they don't agree with whatever, and the idiot who wrote this software should be thrashed to within an inch of his life" messages. I don't have the patience.
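Bates's projection suggestion above can be sketched numerically: project y onto the orthogonal complement of [Z:X], take the squared length of the residual as an error sum of squares, and give it n - rank([Z:X]) degrees of freedom. A Python toy example (data and dimensions invented for illustration; this is only the residual-df bookkeeping, not a full test):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed-effects matrix
groups = np.repeat(np.arange(6), 10)
Z = (groups[:, None] == np.arange(6)).astype(float)    # random-effect indicators
ZX = np.column_stack([Z, X])                           # combined [Z:X] matrix
y = rng.normal(size=n)                                 # toy response

# rank([Z:X]) is 7 here: the 6 indicator columns span the intercept,
# so only the covariate column of X adds a dimension.
r = np.linalg.matrix_rank(ZX)

# Least-squares fit onto [Z:X]; the residual is the projection of y
# onto the orthogonal complement of [Z:X].
beta, *_ = np.linalg.lstsq(ZX, y, rcond=None)
resid = y - ZX @ beta
ss_err = resid @ resid   # squared length of the projection
df_err = n - r           # degrees of freedom for this sum of squares

print(r, df_err)
```

Under the model assumptions this sum of squares is what Bates proposes as the denominator of an F ratio that would have an exact F distribution.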
Re: [R] Conservative ANOVA tables in lmer
Martin Maechler [EMAIL PROTECTED] writes:

DB == Douglas Bates [EMAIL PROTECTED] on Thu, 7 Sep 2006 07:59:58 -0500 writes:

DB: Thanks for your summary, Hank.

DB: On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:

Dear lmer-ers, My thanks to all of you who are sharing your trials and tribulations publicly. I was hoping to elicit some feedback on my thoughts on denominator degrees of freedom for F ratios in mixed models. These thoughts and practices result from my reading of previous postings by Doug Bates and others.

- I start by assuming that the appropriate denominator degrees of freedom lies between n - p and n - q, where n = number of observations, p = number of fixed effects (rank of model matrix X), and q = rank of Z:X.

DB: I agree with this but the opinion is by no means universal. Initially I misread the statement because I usually write the number of columns of Z as q.

DB: It is not easy to assess the rank of Z:X numerically. In many cases one can reason what it should be from the form of the model, but a general procedure to assess the rank of a matrix, especially a sparse matrix, is difficult.

DB: An alternative which can be easily calculated is n - t, where t is the trace of the 'hat matrix'. The function 'hatTrace' applied to a fitted lmer model evaluates this trace (conditional on the estimates of the relative variances of the random effects).

- I then conclude that good estimates of P values on the F ratios lie between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
- I further surmise that the latter of these (1 - pf(F.ratio, numDF, n-q)) is the more conservative estimate.

This assumes that the true distribution (under H0) of that F ratio *is* F_{n1,n2} for some (possibly non-integer) n1 and n2. But AFAIU, this is only approximately true at best, and AFAIU, the quality of this approximation has only been investigated empirically for some situations. Hence, even your conservative estimate of the P value could be wrong (I mean wrong on the wrong side instead of just conservatively wrong). Consequently, such a P-value is only ``approximately conservative'' ... I agree however that in some situations it might be a very useful descriptive statistic about the fitted model.

I'm very wary of ANY attempt at guesswork in these matters. I may be understanding the post wrongly, but consider this case:

Y_ij = mu + z_i + eps_ij, i = 1..3, j = 1..100

I get rank(X) = 1, rank(X:Z) = 3, n = 300. It is well known that the test for mu = 0 in this case is obtained by reducing the data to group means, xbar_i, and then doing a one-sample t test, the square of which is F(1, 2), but it seems to be suggested that F(1, 297) is a conservative test???!

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
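Dalgaard's example is easy to check by simulation: generate Y_ij = mu + z_i + eps_ij with three groups of 100, collapse the data to the three group means, and note that the resulting one-sample t test has 2 degrees of freedom, not 297. A Python sketch (seed and variance scales are arbitrary):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
n_groups, n_per = 3, 100
z = rng.normal(scale=2.0, size=n_groups)        # random group effects z_i
y = np.repeat(z, n_per) + rng.normal(size=n_groups * n_per)  # mu = 0

# Reduce to group means and do the textbook one-sample t test.
group_means = y.reshape(n_groups, n_per).mean(axis=1)
xbar = group_means.mean()
se = group_means.std(ddof=1) / np.sqrt(n_groups)
t_stat = xbar / se
df = n_groups - 1                                # 2 df, not n - rank(X:Z) = 297
p_value = 2 * t.sf(abs(t_stat), df)

print(df, p_value)
```

The squared t statistic is F(1, 2)-distributed under H0, which is why a reference distribution with 297 denominator df is not merely "conservative" but a different test altogether, as the subsequent exchange with Bates establishes.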