Re: [R] Conservative ANOVA tables in lmer

2006-09-18 Thread Douglas Bates
On 9/12/06, Douglas Bates [EMAIL PROTECTED] wrote:
 On 9/11/06, Manuel Morales [EMAIL PROTECTED] wrote:
[snip]
  Am I right that the MCMC sample can not be used, however, to evaluate
  the significance of parameter groups. For example, to assess the
  significance of a three-level factor? Are there better alternatives than
  simply adjusting the CI for the number of factor levels
  (1-alpha/levels).

 Hmm - I'm not sure what confidence interval and what number of levels
 you mean there so I can't comment on that method.

 Suppose we go back to Spencer's example and consider if there is a
 significant effect for the Nozzle factor.  That is equivalent to the
 hypothesis H_0: beta_2 = beta_3 = 0 versus the general alternative.  A
 p-value could be formulated from an MCMC sample if we assume that
 the marginal distribution of the parameter estimates for beta_2 and
 beta_3 has roughly elliptical contours and you can evaluate that by,
 say, examining a hexbin plot of the values in the MCMC sample. One
 could take the ellipses as defined by the standard errors and
 estimated correlation or, probably better, by the observed standard
 deviations and correlations in the MCMC sample.  Then determine the
 proportion of (beta_2, beta_3) pairs in the sample that fall outside
 the ellipse centered at the estimates and with that eccentricity and
 scaling factors that passes through (0,0).  That would be an empirical
 p-value for the test.

 I would recommend calculating this for a couple of samples to check on
 the reproducibility.

There was some follow-up on this thread, including some code and
results that I find encouraging.  I didn't notice that the R-help list
had been dropped off the cc: in later exchanges. I enclose my
contribution to the conversation so that others on the list will get
to see it.

--- excerpt from previous private message ---

As soon as I described the idea I knew that someone would ask for a
function to perform it.  Here's one:

mcmcpvalue <- function(samp)
{
    ## elementary version that creates an empirical p-value for the
    ## hypothesis that the columns of samp have mean zero versus a
    ## general multivariate distribution with elliptical contours.

    ## differences from the mean standardized by the observed
    ## variance-covariance factor
    std <- backsolve(chol(var(samp)),
                     cbind(0, t(samp)) - colMeans(samp),
                     transpose = TRUE)
    sqdist <- colSums(std * std)
    sum(sqdist[-1] >= sqdist[1])/nrow(samp)
}

At least I think I have the standardization by the Cholesky factor of
the observed variance-covariance matrix correct.  However I always
manage to confuse myself on that calculation so please let me know if
I have it wrong.
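As a quick sanity check (my addition, not part of the original post), the function can be exercised on simulated bivariate normal draws where the right answer is known in advance:

```r
## Self-contained check of mcmcpvalue (reproduced from above) on
## simulated draws: a sample centered far from zero should give a
## tiny empirical p-value, one centered at zero a large p-value.
mcmcpvalue <- function(samp)
{
    std <- backsolve(chol(var(samp)),
                     cbind(0, t(samp)) - colMeans(samp),
                     transpose = TRUE)
    sqdist <- colSums(std * std)
    sum(sqdist[-1] >= sqdist[1])/nrow(samp)
}

set.seed(1)
## columns centered well away from zero: p-value should be near 0
far  <- cbind(rnorm(1000, mean = 5), rnorm(1000, mean = 5))
## columns centered at zero: p-value should be large
near <- cbind(rnorm(1000), rnorm(1000))
mcmcpvalue(far)   # near 0
mcmcpvalue(near)  # large
```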

As an example, consider a model fit to the AvgDailyGain data from the
SASmixed package.
 library(nlme)
 data(AvgDailyGain, package = "SASmixed")
 summary(fm1Adg <- lme(adg ~ InitWt*Treatment - 1, AvgDailyGain,
 random = ~1|Block))
Linear mixed-effects model fit by REML
 Data: AvgDailyGain
       AIC      BIC    logLik
  85.32685 97.10739 -32.66342

Random effects:
 Formula: ~1 | Block
   (Intercept)  Residual
StdDev:   0.5092266 0.2223268

Fixed effects: adg ~ InitWt * Treatment - 1
                        Value Std.Error DF    t-value p-value
InitWt              0.0022937 0.0017473 17  1.3126947  0.2067
Treatment0          0.4391370 0.7110881 17  0.6175564  0.5451
Treatment10         1.4261187 0.6375458 17  2.2368880  0.0390
Treatment20         0.4796285 0.5488867 17  0.8738206  0.3944
Treatment30         0.2001071 0.7751989 17  0.2581365  0.7994
InitWt:Treatment10 -0.0012108 0.0023326 17 -0.5190774  0.6104
InitWt:Treatment20  0.0010720 0.0021737 17  0.4931507  0.6282
InitWt:Treatment30  0.0021543 0.0027863 17  0.7731996  0.4500
 Correlation:
                   InitWt Trtmn0 Trtm10 Trtm20 Trtm30 IW:T10 IW:T20
Treatment0         -0.961
Treatment10         0.034  0.039
Treatment20         0.003  0.080  0.334
Treatment30         0.050  0.011  0.097  0.043
InitWt:Treatment10 -0.772  0.742 -0.631 -0.164 -0.059
InitWt:Treatment20 -0.806  0.775 -0.180 -0.555 -0.019  0.724
InitWt:Treatment30 -0.666  0.640 -0.046  0.024 -0.754  0.529  0.520

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-1.82903364 -0.44913967 -0.03023488  0.44738506  1.59877700

Number of Observations: 32
Number of Groups: 8
 anova(fm1Adg)
                 numDF denDF  F-value p-value
InitWt               1    17 91.68230  <.0001
Treatment            4    17  8.81312  0.0005
InitWt:Treatment     3    17  0.93118  0.4471

Fitting the same model in lmer then generating an MCMC sample and
testing for the three interaction coefficients being zero would look
like

 data(AvgDailyGain, package = "SASmixed")
 (fm1Adg <- lmer(adg ~ InitWt*Treatment - 1 + (1|Block), AvgDailyGain))
Linear mixed-effects model fit by REML
Formula: adg ~ InitWt * Treatment - 1 + (1 | Block)
  Data: AvgDailyGain
  AIC   BIC logLik MLdeviance REMLdeviance
 83.33 

Re: [R] Conservative ANOVA tables in lmer

2006-09-14 Thread Dimitris Rizopoulos

- Original Message - 
From: Manuel Morales [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Douglas Bates [EMAIL PROTECTED]; Manuel Morales 
[EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Sent: Wednesday, September 13, 2006 1:04 PM
Subject: Re: [R] Conservative ANOVA tables in lmer


 On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:
 On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:
  On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:
  Having made that offer I think I will now withdraw it.  Peter's
  example has convinced me that this is the wrong thing to do.
 
  I am encouraged by the fact that the results from mcmcsamp correspond
  closely to the correct theoretical results in the case that Peter
  described.  I appreciate that some users will find it difficult to
  work with a MCMC sample (or to convince editors to accept results
  based on such a sample) but I think that these results indicate that
  it is better to go after the marginal distribution of the fixed
  effects estimates (which is what is being approximated by the MCMC
  sample - up to Bayesian/frequentist philosophical differences) than to
  use the conditional distribution and somehow try to adjust the
  reference distribution.

  Am I right that the MCMC sample can not be used, however, to evaluate
  the significance of parameter groups. For example, to assess the
  significance of a three-level factor? Are there better alternatives than
  simply adjusting the CI for the number of factor levels
  (1-alpha/levels).

 I wonder whether the likelihood ratio test would be suitable here?  That
 seems to be supported.  It just takes a little longer.

  require(lme4)
  data(sleepstudy)
  fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
  fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
  anova(fm1, fm2)

 So, a brief overview of the popular inferential needs and solutions
 would then be:

 1) Test the statistical significance of one or more fixed or random
 effects - fit a model with and a model without the terms, and use the LRT.

 I believe that the LRT is anti-conservative for fixed effects, as
 described in Pinheiro and Bates companion book to NLME.


You see this effect if you're using REML; for ML I don't think there
is any problem using the LRT between nested models with different
fixed-effects structures.

Best,
Dimitris


 2) Obtain confidence intervals for one or more fixed or random effects -
 use mcmcsamp

 Did I miss anything important? - What else would people like to do?

 Cheers

 Andrew

 Andrew Robinson
 Senior Lecturer in Statistics   Tel: +61-3-8344-9763
 Department of Mathematics and Statistics    Fax: +61-3-8344-4599
 University of Melbourne, VIC 3010 Australia
 Email: [EMAIL PROTECTED]    Website: http://www.ms.unimelb.edu.au

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Conservative ANOVA tables in lmer

2006-09-14 Thread Douglas Bates
On 9/13/06, Dimitris Rizopoulos [EMAIL PROTECTED] wrote:

 - Original Message -
 From: Manuel Morales [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: Douglas Bates [EMAIL PROTECTED]; Manuel Morales
 [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
 Sent: Wednesday, September 13, 2006 1:04 PM
 Subject: Re: [R] Conservative ANOVA tables in lmer


  On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:
  On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:
   On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:
   Having made that offer I think I will now withdraw it.  Peter's
   example has convinced me that this is the wrong thing to do.
  
    I am encouraged by the fact that the results from mcmcsamp correspond
    closely to the correct theoretical results in the case that Peter
    described.  I appreciate that some users will find it difficult to
    work with a MCMC sample (or to convince editors to accept results
    based on such a sample) but I think that these results indicate that
    it is better to go after the marginal distribution of the fixed
    effects estimates (which is what is being approximated by the MCMC
    sample - up to Bayesian/frequentist philosophical differences) than to
    use the conditional distribution and somehow try to adjust the
    reference distribution.

    Am I right that the MCMC sample can not be used, however, to evaluate
    the significance of parameter groups. For example, to assess the
    significance of a three-level factor? Are there better alternatives than
    simply adjusting the CI for the number of factor levels
    (1-alpha/levels).

   I wonder whether the likelihood ratio test would be suitable here?  That
   seems to be supported.  It just takes a little longer.

    require(lme4)
    data(sleepstudy)
    fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
    fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
    anova(fm1, fm2)

   So, a brief overview of the popular inferential needs and solutions
   would then be:

   1) Test the statistical significance of one or more fixed or random
   effects - fit a model with and a model without the terms, and use
   the LRT.

   I believe that the LRT is anti-conservative for fixed effects, as
   described in Pinheiro and Bates companion book to NLME.


  You see this effect if you're using REML; for ML I don't think there
  is any problem using the LRT between nested models with different
  fixed-effects structures.

There are two issues here: how the test statistic is evaluated and
what reference distribution is used for the test statistic (i.e. how
do you convert a value of the test statistic to a p-value).  Manuel is
addressing the latter issue.  If you compare the difference of
-2*logLik for the models to a chi-square with degrees of freedom
determined by the difference in the number of parameters the test will
be anti-conservative when the number of observations is small.  The
use of the chi-square as a reference distribution is based on
asymptotic properties.
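The arithmetic in question can be sketched as follows (my illustration, using lm() fits so the example is self-contained; the function name is mine, and with lmer() fits the same computation applies but, as noted above, the chi-square reference is anti-conservative in small samples):

```r
## Sketch of referring -2*log(LR) for nested ML fits to a chi-square
## with df equal to the difference in parameter counts.
lrt_pvalue <- function(small, big) {
    stat <- as.numeric(2 * (logLik(big) - logLik(small)))
    df   <- attr(logLik(big), "df") - attr(logLik(small), "df")
    pchisq(stat, df = df, lower.tail = FALSE)
}

set.seed(2)
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)
lrt_pvalue(lm(y ~ 1), lm(y ~ x))   # small: the slope term is real
```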

The other question is how does one evaluate the likelihood-ratio test
statistic and that is the issue that Dimitris is addressing.  The REML
criterion is a modified likelihood and it is inappropriate to look at
differences in the REML criterion when the models being compared have
different fixed-effects specifications, or even a different
parameterization of the fixed effects.  However, the anova method for
an lmer object does not use the REML criterion even when the model has
been estimated by REML.  It uses the profiled log-likelihood evaluated
at the REML estimates of the relative variances of the random effects.
 That's a complicated statement so let me break it down.

The optimization in lmer is done with respect to the elements of the
variance-covariance matrix of the random effects relative to
$\sigma^2$.  Given these values the conditional estimates of the
fixed-effects parameters and of $\sigma^2$ can be evaluated directly
with some linear algebra.  In the summary or show output of an lmer
model there are two quantities called the MLdeviance and the
REMLdeviance.  Those are based on the same relative variances but
different conditional estimates of $\sigma^2$ (and hence different
estimates of the elements of the variance-covariance of the random
effects).  It turns out that there is very little difference in the
value of the profiled log-likelihood at the ML estimates and at the
REML estimates.  This is not to say that the log-likelihood is similar
at the two (complete) sets of estimates - it is the profiled
log-likelihoods that are similar and these are what are used to create
the likelihood ratio test statistic, even when the models have been
fit by REML.

As I said, this is complicated and until I went to reply to this
message I hadn't really sorted it out for myself.  I knew the
empirical result, which I had checked before releasing the code, but I
hadn't worked out the details of why

Re: [R] Conservative ANOVA tables in lmer

2006-09-13 Thread Manuel Morales
On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:
 On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:
  On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:
  Having made that offer I think I will now withdraw it.  Peter's
  example has convinced me that this is the wrong thing to do.
 
  I am encouraged by the fact that the results from mcmcsamp correspond
  closely to the correct theoretical results in the case that Peter
  described.  I appreciate that some users will find it difficult to
  work with a MCMC sample (or to convince editors to accept results
  based on such a sample) but I think that these results indicate that
  it is better to go after the marginal distribution of the fixed
  effects estimates (which is what is being approximated by the MCMC
  sample - up to Bayesian/frequentist philosophical differences) than to
  use the conditional distribution and somehow try to adjust the
  reference distribution.
 
  Am I right that the MCMC sample can not be used, however, to evaluate
  the significance of parameter groups. For example, to assess the
  significance of a three-level factor? Are there better alternatives than
  simply adjusting the CI for the number of factor levels
  (1-alpha/levels).
 
 I wonder whether the likelihood ratio test would be suitable here?  That
 seems to be supported.  It just takes a little longer.
 
  require(lme4)
  data(sleepstudy)
   fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
   fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
  anova(fm1, fm2)
 
 So, a brief overview of the popular inferential needs and solutions would
 then be:
 
 1) Test the statistical significance of one or more fixed or random
 effects - fit a model with and a model without the terms, and use the LRT.

I believe that the LRT is anti-conservative for fixed effects, as
described in Pinheiro and Bates companion book to NLME.

 2) Obtain confidence intervals for one or more fixed or random effects -
 use mcmcsamp
 
 Did I miss anything important? - What else would people like to do?
 
 Cheers
 
 Andrew
 
 Andrew Robinson
 Senior Lecturer in Statistics   Tel: +61-3-8344-9763
 Department of Mathematics and Statistics    Fax: +61-3-8344-4599
 University of Melbourne, VIC 3010 Australia
 Email: [EMAIL PROTECTED]    Website: http://www.ms.unimelb.edu.au
 


Re: [R] Conservative ANOVA tables in lmer

2006-09-13 Thread Greg Snow
[snip]

 Douglas Bates wrote:
 Hmm - I'm not sure what confidence interval and what number of levels
 you mean there so I can't comment on that method.
 
 Suppose we go back to Spencer's example and consider if there is a
  significant effect for the Nozzle factor.  That is equivalent to the
 hypothesis H_0: beta_2 = beta_3 = 0 versus the general alternative.  A
 p-value could be formulated from an MCMC sample if we assume that
 the marginal distribution of the parameter estimates for beta_2 and
 beta_3 has roughly elliptical contours and you can evaluate that by,
 say, examining a hexbin plot of the values in the MCMC sample. One
 could take the ellipses as defined by the standard errors and
 estimated correlation or, probably better, by the observed standard
 deviations and correlations in the MCMC sample.  Then determine the
 proportion of (beta_2, beta_3) pairs in the sample that fall outside
 the ellipse centered at the estimates and with that eccentricity and
 scaling factors that passes through (0,0).  That would be an empirical
 p-value for the test.
 
 I would recommend calculating this for a couple of samples to check on
 the reproducibility.

Here is another thought for an empirical p-value that may be easier to
compute and would require fewer assumptions:

Take the proportion of MCMC samples that fall into each quadrant (++,
+-, -+, --) and use the smallest of these proportions as the p-value
(or the smallest out of a subset of the quadrants for a one-sided
style test).

Think of it this way: if the smallest proportion is greater than
alpha, then any closed curve (ellipse, polygon, even a concave polygon)
that includes a 1-alpha proportion of the points would need to include
points from all 4 quadrants, and therefore any convex curve would have
to include (0,0), which is consistent with the null hypothesis.

On the other hand, if there is a quadrant that contains fewer than an
alpha proportion of the points, then there exists at least one confidence
region (possibly concave) that contains a 1-alpha proportion of the
points and excludes (0,0) and that entire quadrant, which is
consistent with the alternative that at least one of the betas differs
from 0.

A more conservative p-value would be to take the minimum proportion
and multiply it by 4 (or 2^p for p simultaneous tests), which is the
same idea as multiplying by 2 for a two-sided univariate test and assumes
that the confidence regions would exclude similar proportions of
points in each direction (central confidence regions rather than
minimum-length or other confidence regions).  It seems to me that this
would be overly conservative in some cases (since all the proportions
must sum to 1, we don't really have 4 degrees of freedom, and a smaller
adjustment factor may still be correct and less conservative).

Some simulations would be a good idea to see if the plain minimum is
too liberal and how conservative the other approach is for common
situations.

These are just my first thoughts on the matter; I have not tested
anything, so any comments or other discussion of this idea are welcome.
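Both variants of the quadrant idea can be sketched in a few lines (the function name and interface are mine, not from the post; simulated normal draws stand in for an MCMC sample):

```r
## Quadrant-proportion p-value for a bivariate sample of (beta_2,
## beta_3) values: the smallest of the four quadrant proportions,
## optionally multiplied by 2^2 = 4 for the conservative version.
quadrant_pvalue <- function(b2, b3, conservative = FALSE) {
    quad <- table(factor(paste0(ifelse(b2 >= 0, "+", "-"),
                                ifelse(b3 >= 0, "+", "-")),
                         levels = c("++", "+-", "-+", "--")))
    p <- min(quad) / length(b2)
    if (conservative) min(1, 4 * p) else p
}

set.seed(3)
## null-like sample centered at the origin: large p-value
quadrant_pvalue(rnorm(1000), rnorm(1000))
## sample far from the origin: p-value near 0
quadrant_pvalue(rnorm(1000, 5), rnorm(1000, 5))
```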


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111



Re: [R] Conservative ANOVA tables in lmer

2006-09-13 Thread Andrew Robinson
On Wed, Sep 13, 2006 at 07:04:17AM -0400, Manuel Morales wrote:
 On Wed, 2006-09-13 at 08:04 +1000, Andrew Robinson wrote:
  On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:
   On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:
   Having made that offer I think I will now withdraw it.  Peter's
   example has convinced me that this is the wrong thing to do.
  
   I am encouraged by the fact that the results from mcmcsamp correspond
   closely to the correct theoretical results in the case that Peter
   described.  I appreciate that some users will find it difficult to
   work with a MCMC sample (or to convince editors to accept results
   based on such a sample) but I think that these results indicate that
   it is better to go after the marginal distribution of the fixed
   effects estimates (which is what is being approximated by the MCMC
   sample - up to Bayesian/frequentist philosophical differences) than to
   use the conditional distribution and somehow try to adjust the
   reference distribution.
  
   Am I right that the MCMC sample can not be used, however, to evaluate
   the significance of parameter groups. For example, to assess the
   significance of a three-level factor? Are there better alternatives than
   simply adjusting the CI for the number of factor levels
   (1-alpha/levels).
  
  I wonder whether the likelihood ratio test would be suitable here?  That
  seems to be supported.  It just takes a little longer.
  
   require(lme4)
   data(sleepstudy)
    fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
    fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
   anova(fm1, fm2)
  
  So, a brief overview of the popular inferential needs and solutions would
  then be:
  
  1) Test the statistical significance of one or more fixed or random
  effects - fit a model with and a model without the terms, and use the LRT.
 
 I believe that the LRT is anti-conservative for fixed effects, as
 described in Pinheiro and Bates companion book to NLME.

Yes, you are right.  I had forgotten that.  Back to square one :).
Bert Gunter also kindly pointed this out to me.

Cheers

Andrew


 
  2) Obtain confidence intervals for one or more fixed or random effects -
  use mcmcsamp
  
  Did I miss anything important? - What else would people like to do?
  
  Cheers
  
  Andrew
  
  Andrew Robinson
  Senior Lecturer in Statistics   Tel: +61-3-8344-9763
   Department of Mathematics and Statistics    Fax: +61-3-8344-4599
   University of Melbourne, VIC 3010 Australia
   Email: [EMAIL PROTECTED]    Website: http://www.ms.unimelb.edu.au
  

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
Email: [EMAIL PROTECTED] http://www.ms.unimelb.edu.au



Re: [R] Conservative ANOVA tables in lmer

2006-09-12 Thread Douglas Bates
On 9/11/06, Manuel Morales [EMAIL PROTECTED] wrote:
 On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:
  On 9/10/06, Andrew Robinson [EMAIL PROTECTED] wrote:
   On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:
  
I would be happy to re-institute p-values for fixed effects in the
summary and anova methods for lmer objects using a denominator degrees
of freedom based on the trace of the hat matrix or the rank of Z:X if
others will volunteer to respond to the these answers are obviously
wrong because they don't agree with whatever and the idiot who wrote
this software should be thrashed to within an inch of his life
messages.  I don't have the patience.
  
   This seems to be more than fair to me.  I'll volunteer to help explain
   why the anova.lmer() output doesn't match SAS, etc.  Is it worth
   putting a caveat in the output and the help files?  Is it even worth
   writing a FAQ about this?
 
  Having made that offer I think I will now withdraw it.  Peter's
  example has convinced me that this is the wrong thing to do.
 
  I am encouraged by the fact that the results from mcmcsamp correspond
  closely to the correct theoretical results in the case that Peter
  described.  I appreciate that some users will find it difficult to
  work with a MCMC sample (or to convince editors to accept results
  based on such a sample) but I think that these results indicate that
  it is better to go after the marginal distribution of the fixed
  effects estimates (which is what is being approximated by the MCMC
  sample - up to Bayesian/frequentist philosophical differences) than to
  use the conditional distribution and somehow try to adjust the
  reference distribution.

 Am I right that the MCMC sample can not be used, however, to evaluate
 the significance of parameter groups. For example, to assess the
 significance of a three-level factor? Are there better alternatives than
 simply adjusting the CI for the number of factor levels
 (1-alpha/levels).

Hmm - I'm not sure what confidence interval and what number of levels
you mean there so I can't comment on that method.

Suppose we go back to Spencer's example and consider if there is a
significant effect for the Nozzle factor.  That is equivalent to the
hypothesis H_0: beta_2 = beta_3 = 0 versus the general alternative.  A
p-value could be formulated from an MCMC sample if we assume that
the marginal distribution of the parameter estimates for beta_2 and
beta_3 has roughly elliptical contours and you can evaluate that by,
say, examining a hexbin plot of the values in the MCMC sample. One
could take the ellipses as defined by the standard errors and
estimated correlation or, probably better, by the observed standard
deviations and correlations in the MCMC sample.  Then determine the
proportion of (beta_2, beta_3) pairs in the sample that fall outside
the ellipse centered at the estimates and with that eccentricity and
scaling factors that passes through (0,0).  That would be an empirical
p-value for the test.

I would recommend calculating this for a couple of samples to check on
the reproducibility.



Re: [R] Conservative ANOVA tables in lmer

2006-09-12 Thread Andrew Robinson

On Tue, September 12, 2006 7:34 am, Manuel Morales wrote:
 On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:
 Having made that offer I think I will now withdraw it.  Peter's
 example has convinced me that this is the wrong thing to do.

 I am encouraged by the fact that the results from mcmcsamp correspond
 closely to the correct theoretical results in the case that Peter
 described.  I appreciate that some users will find it difficult to
 work with a MCMC sample (or to convince editors to accept results
 based on such a sample) but I think that these results indicate that
 it is better to go after the marginal distribution of the fixed
 effects estimates (which is what is being approximated by the MCMC
 sample - up to Bayesian/frequentist philosophical differences) than to
 use the conditional distribution and somehow try to adjust the
 reference distribution.

 Am I right that the MCMC sample can not be used, however, to evaluate
 the significance of parameter groups. For example, to assess the
 significance of a three-level factor? Are there better alternatives than
 simply adjusting the CI for the number of factor levels
 (1-alpha/levels).

I wonder whether the likelihood ratio test would be suitable here?  That
seems to be supported.  It just takes a little longer.

 require(lme4)
 data(sleepstudy)
 fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
 fm2 <- lmer(Reaction ~ Days + I(Days^2) + (Days|Subject), sleepstudy)
 anova(fm1, fm2)

So, a brief overview of the popular inferential needs and solutions would
then be:

1) Test the statistical significance of one or more fixed or random
effects - fit a model with and a model without the terms, and use the LRT.

2) Obtain confidence intervals for one or more fixed or random effects -
use mcmcsamp
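A minimal sketch of (2), with the MCMC draws for one fixed effect
simulated here in place of a real mcmcsamp() result (the mean and sd are
made-up values):

```r
## Equal-tail 95% interval from a sample of posterior draws.
set.seed(42)
draws <- rnorm(5000, mean = 66.5, sd = 10.2)  # stand-in for mcmcsamp draws
ci <- quantile(draws, c(0.025, 0.975))        # equal-tail 95% interval
```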

Did I miss anything important? - What else would people like to do?

Cheers

Andrew

Andrew Robinson
Senior Lecturer in Statistics               Tel: +61-3-8344-9763
Department of Mathematics and Statistics    Fax: +61-3-8344-4599
University of Melbourne, VIC 3010 Australia
Email: [EMAIL PROTECTED]    Website: http://www.ms.unimelb.edu.au



Re: [R] Conservative ANOVA tables in lmer

2006-09-11 Thread Spencer Graves
  Peter's example and Doug's "different test" reply sent me back to 
Scheffé's discussion of the balanced and replicated mixed-effect 2-way 
layout.  As I note below, the obvious F test for the fixed effect does 
not appear to be a likelihood ratio test for anything. 

Douglas Bates wrote:
 On 9/7/06, Douglas Bates [EMAIL PROTECTED] wrote:
   
 On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote:
 
 Martin Maechler [EMAIL PROTECTED] writes:

   
 DB == Douglas Bates [EMAIL PROTECTED]
 on Thu, 7 Sep 2006 07:59:58 -0500 writes:
   
 DB Thanks for your summary, Hank.
 DB On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:
  Dear lmer-ers,
  My thanks for all of you who are sharing your trials and 
 tribulations
  publicly.

  I was hoping to elicit some feedback on my thoughts on denominator
  degrees of freedom for F ratios in mixed models. These thoughts and
  practices result from my reading of previous postings by Doug Bates
  and others.

  - I start by assuming that the appropriate denominator degrees lies
  between n - p and n - q, where n = number of observations, p = number
  of fixed effects (rank of model matrix X), and q = rank of Z:X.

 DB I agree with this but the opinion is by no means universal.  
 Initially
 DB I misread the statement because I usually write the number of 
 columns
 DB of Z as q.

 DB It is not easy to assess rank of Z:X numerically.  In many cases 
 one
 DB can reason what it should be from the form of the model but a 
 general
 DB procedure to assess the rank of a matrix, especially a sparse 
 matrix,
 DB is difficult.

 DB An alternative which can be easily calculated is n - t where t is 
 the
 DB trace of the 'hat matrix'.  The function 'hatTrace' applied to a
 DB fitted lmer model evaluates this trace (conditional on the 
 estimates
 DB of the relative variances of the random effects).

  - I then conclude that good estimates of P values on the F ratios lie
    between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
  -- I further surmise that the latter of these (1 - pf(F.ratio, numDF,
    n-q)) is the more conservative estimate.

 This assumes that the true distribution (under H0) of that F ratio
 *is*  F_{n1,n2}  for some (possibly non-integer)  n1 and n2.
 But AFAIU, this is only approximately true at best, and AFAIU,
 the quality of this approximation has only been investigated
 empirically for some situations.
 Hence, even your conservative estimate of the P value could be
 wrong (I mean wrong on the wrong side instead of just
 conservatively wrong).  Consequently, such a P-value is only
 ``approximately conservative'' ...
 I agree however that in some situations, it might be a very
 useful descriptive statistic about the fitted model.
 
 I'm very wary of ANY attempt at guesswork in these matters.

 I may be understanding the post wrongly, but consider this case: Y_ij
 = mu + z_i + eps_ij, i = 1..3, j=1..100

 I get rank(X)=1, rank(X:Z)=3,  n=300

 It is well known that the test for mu=0 in this case is obtained by
 reducing data to group means, xbar_i, and then do a one-sample t test,
 the square of which is F(1, 2), but it seems to be suggested that
 F(1, 297) is a conservative test???!
   
 It's a different test, isn't it?  Your test is based upon the between
 group sum of squares with 2 df.  I am proposing to use the within
 group sum of squares or its generalization.
 

 On closer examination I see that you are indeed correct.  I have heard
 that well-known result many times and finally sat down to prove it
 to myself.  For a balanced design the standard error of the intercept
 using the REML estimates is the same as the standard error of the mean
 calculated from the group means.

   
 data(Rail, package = 'nlme')
 library(lme4)
 summary(fm1 <- lmer(travel ~ 1 + (1|Rail), Rail))
 
 Linear mixed-effects model fit by REML
 Formula: travel ~ 1 + (1 | Rail)
Data: Rail
      AIC   BIC logLik MLdeviance REMLdeviance
    126.2 128.0 -61.09      128.6        122.2
 Random effects:
  Groups   NameVariance Std.Dev.
  Rail (Intercept) 615.286  24.8050
  Residual  16.167   4.0208
 number of obs: 18, groups: Rail, 6

 Fixed effects:
              Estimate Std. Error t value
  (Intercept)    66.50      10.17   6.538
   
 mns <- with(Rail, tapply(travel, Rail, mean)) # group means
 sd(mns)/sqrt(length(mns))  # standard error matches that from lmer
 
 [1] 10.17104
   
 t.test(mns)
 

   One Sample t-test

 data:  mns
 t = 6.5382, df = 5, p-value = 0.001253
 alternative hypothesis: true mean is not equal to 0
 95 percent confidence interval:
  40.35452 92.64548
 sample estimates:
 mean of x
  66.5

   
 ctab <- summary(fm1)@coefs  # coefficient table
 ctab[,1] + c(-1,1) * qt(0.975, 15) * ctab[,2] # 95% conf. int.
 
 [1] 44.82139 88.17861
   
 ## 

Re: [R] Conservative ANOVA tables in lmer

2006-09-11 Thread Martin Henry H. Stevens
Hi Spencer,
I would like to make sure I understand Spencer's question and  
doubts, below.
On Sep 9, 2006, at 11:36 PM, Spencer Graves wrote:

snip

Re: [R] Conservative ANOVA tables in lmer

2006-09-11 Thread Spencer Graves


Martin Henry H. Stevens wrote:
 Hi Spencer,
 I would like to make sure I understand Spencer's question and doubts, 
 below.
 On Sep 9, 2006, at 11:36 PM, Spencer Graves wrote:

  Peter's example and Doug's "different test" reply sent me back to 
 Scheffé's discussion of the balanced and replicated mixed-effect 
 2-way layout.  As I note below, the obvious F test for the fixed 
 effect does not appear to be a likelihood ratio test for anything.
 Douglas Bates wrote:
 On 9/7/06, Douglas Bates [EMAIL PROTECTED] wrote:

 On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard 
 [EMAIL PROTECTED] wrote:
snip

 I'm very wary of ANY attempt at guesswork in these matters.

 I may be understanding the post wrongly, but consider this case: Y_ij
 = mu + z_i + eps_ij, i = 1..3, j=1..100

 I get rank(X)=1, rank(X:Z)=3,  n=300

 It is well known that the test for mu=0 in this case is obtained by
 reducing data to group means, xbar_i, and then do a one-sample t 
 test,
 the square of which is F(1, 2), but it seems to be suggested that
 F(1, 297) is a conservative test???!

 It's a different test, isn't it?  Your test is based upon the between
 group sum of squares with 2 df.  I am proposing to use the within
 group sum of squares or its generalization.
snip
  For the traditional, balanced, replicated, 2-way mixed-effects 
 analysis, Scheffé (1959, Table 8.1.1, p. 269) gives the expected mean 
 squares for a two-way layout with I levels of a fixed effect A, J 
 levels of a random effect B, and K replicates, as follows:
 EMS(A: fixed) = var(e) + K*var(A:B) + J*K*MeanSquareA
 EMS(B: random) = var(e) + I*K*var(B)
 EMS(A:B: random) = var(e) + K*var(A:B)
 EMSE = var(e).
  In this case, the obvious test for A is MS(A: fixed) / MS(A:B, 
 random), because this gives us a standard F statistic to test 
 MeanSquareA = 0.  However, it doesn't make sense to me to test A 
 without simultaneously assuming var(A:B) = 0.

 Does one want to test whether var(A:B)=0 because the F-test assumes 
 it? That is, as var(A:B) increases, the variance ratio 
 MS(A: fixed)/MS(A:B, random) decreases, artifactually reducing the 
 significance of J*K*MeanSquareA?

SG:  Scheffe recommends testing var(A:B)=0 using MS(A:B)/MSE, which 
follows an F distribution under both null and alternative hypotheses 
(but scaled differently under the alternative).  This is a monotonic 
transformation of the standard likelihood ratio test, and makes sense to 
me. 

SG:  Scheffe's recommended test for MeanSquareA = 0 is MSA/MS(A:B).  I 
haven't worked out the details, but it looks like this assumes that the 
A:B interaction may be real without an A effect, and this violates a 
basic principle of hierarchy, I think.  There are situations where an 
A:B interaction can exist without the main effect, but that rests on a 
particular parameterization, and we have not assumed that in this context.

SG:  I think this is an argument for using likelihood ratio over 
Scheffe's traditional answer.  Similar to Doug's comment about different 
tests, we should be clear about what each procedure tests, and then 
select the procedure that seems to be the closest to the problem we 
really want to solve, rather than simply relying on a test statistic 
whose distribution is better understood.  I'm not criticizing Scheffe:  
He didn't have the computer tools available in 1959 that we have today. 

 The same argument applies to Peter's simpler case discussed above:  
 With Y_ij = mu + z_i + eps_ij, it only rarely makes sense to test 
 mu=0 while assuming var(z) != 0.  In the balanced 2-way, 
 mixed-effects analysis, the Neyman-Pearson thing to do, I would 
 think, would be to test simultaneously MeanSquareA = 0 with var(A:B) 
 = 0.  In lmer, I might write this as follows:
  anova(lmer(y~A+(A|B)), lmer(y~1+(1|B))).
  However, this does NOT match the standard analysis associated 
 with this design, does it?  To check this, I considered problem 8.1 
 in Scheffé (p. 289), which compares 3 different nozzles (fixed 
 effect) tested by 5 different operators (random effect).  The data 
 are as follows:
 y <- c(6,6,-15,   26,12,5,    11,4,4,     21,14,7,    25,18,25,
        13,6,13,   4,4,11,     17,10,17,   -5,2,-5,    15,8,1,
        10,10,-11, -35,0,-14,  11,-10,-17, 12,-2,-16,  -4,10,24)

 Nozzle <- data.frame(Nozzle=rep(LETTERS[1:3], e=15),
                      Operator=rep(letters[1:5], e=3), flowRate=y)

  The traditional analysis can be obtained from 
 anova(lm(flowRate~Nozzle*Operator, ...)), but comparing MeanSq.Nozzle 
 to MeanSq.Nozzle:Operator rather than MeanSquareResidual, as follows:
  fitAB0 <- lm(flowRate~Nozzle*Operator, data=Nozzle)
  (aov.AB0 <- anova(fitAB0))
 Analysis of Variance Table

 Response: flowRate
                  Df  Sum Sq Mean Sq F value   Pr(>F)
  Nozzle           2 1426.98  713.49  7.0456 0.003101 **
  Operator         4  798.80  199.70  1.9720 0.124304
  Nozzle:Operator  8 1821.47  227.68  2.2484 0.051640 .
  Residuals       30 3038.00  101.27
  ---
  Signif. codes:  0 '***' 0.001 '**' 
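The comparison described above (MeanSq.Nozzle against
MeanSq.Nozzle:Operator rather than the residual mean square) can be
finished by hand from the printed table; a short sketch, using the mean
squares as printed:

```r
## F test for Nozzle with the Nozzle:Operator mean square as denominator,
## on (2, 8) degrees of freedom, per the traditional mixed-model analysis.
MS.Nozzle   <- 713.49
MS.NozzleOp <- 227.68
F.Nozzle <- MS.Nozzle / MS.NozzleOp                 # about 3.13
p.Nozzle <- pf(F.Nozzle, 2, 8, lower.tail = FALSE)  # about 0.10
```

So Nozzle is no longer significant at the 0.05 level when tested against
the interaction, in contrast to the p = 0.003 obtained against the
residual mean square.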

Re: [R] Conservative ANOVA tables in lmer

2006-09-11 Thread Douglas Bates
On 9/10/06, Andrew Robinson [EMAIL PROTECTED] wrote:
 On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:

  I would be happy to re-institute p-values for fixed effects in the
  summary and anova methods for lmer objects using a denominator degrees
  of freedom based on the trace of the hat matrix or the rank of Z:X if
  others will volunteer to respond to the "these answers are obviously
  wrong because they don't agree with whatever" and "the idiot who wrote
  this software should be thrashed to within an inch of his life"
  messages.  I don't have the patience.

 This seems to be more than fair to me.  I'll volunteer to help explain
 why the anova.lmer() output doesn't match SAS, etc.  Is it worth
 putting a caveat in the output and the help files?  Is it even worth
 writing a FAQ about this?

Having made that offer I think I will now withdraw it.  Peter's
example has convinced me that this is the wrong thing to do.

I am encouraged by the fact that the results from mcmcsamp correspond
closely to the correct theoretical results in the case that Peter
described.  I appreciate that some users will find it difficult to
work with a MCMC sample (or to convince editors to accept results
based on such a sample) but I think that these results indicate that
it is better to go after the marginal distribution of the fixed
effects estimates (which is what is being approximated by the MCMC
sample - up to Bayesian/frequentist philosophical differences) than to
use the conditional distribution and somehow try to adjust the
reference distribution.



Re: [R] Conservative ANOVA tables in lmer

2006-09-11 Thread Douglas Bates
On 9/10/06, John Maindonald [EMAIL PROTECTED] wrote:
 A Wiki entry is an excellent idea.  I am happy to try to help.

 An account of mcmcsamp() might be very useful part of the Wiki.  My
 limited investigations suggest that once the data starts to overwhelm
 the prior (maybe ~3 df for an effect that is of interest), the
 posterior distribution that it gives provides a very good
 approximation to the sampling distribution.

  I have been meaning to put aside time to try to work out, with the
  help of a colleague here at ANU, how the Kenward & Roger (Biometrics,
  1997) approximation might be implemented in lmer, but it hasn't yet
  happened and is unlikely to do so for a while.

I think it would be nontrivial to do this for a general case.  A
literal translation of the formulas in that paper may be suitable for
simple cases but not for general cases.  Like many papers in this
literature this one has the inverse of an n by n matrix (n being the
number of observations) embedded in the middle of most of the
formulas. Given that some users are seriously considering fitting
models for which n is in the millions, forming and manipulating an n
by n matrix in such cases is out of the question unless you can
exploit special properties of the matrix.



 John Maindonald email: [EMAIL PROTECTED]
 phone : +61 2 (6125)3473fax  : +61 2(6125)5549
 Centre for Mathematics & Its Applications, Room 1194,
 John Dedman Mathematical Sciences Building (Building 27)
 Australian National University, Canberra ACT 0200.


 On 10 Sep 2006, at 8:00 PM, [EMAIL PROTECTED] wrote:

  From: Spencer Graves [EMAIL PROTECTED]
  Date: 10 September 2006 4:54:50 PM
  To: Andrew Robinson [EMAIL PROTECTED]
  Cc: Douglas Bates [EMAIL PROTECTED], R-Help r-
  [EMAIL PROTECTED]
  Subject: Re: [R] Conservative ANOVA tables in lmer
 
 
  Hi, Doug, et al.:
   I'll volunteer to do the same, which is an extension of much
  of what I've been doing for R Help for a while now.
   Regarding writing a FAQ, what about a Wiki entry (and maybe
  ultimately a vignette)?  This thread provides notes around which
  such could be built.  Another piece might be an example from
  Scheffé (1958), which I sent as a reply to an earlier comment on
  this thread, (foolishly sent without reducing the cc list, which
  means it awaits moderator approval).   Each time a question of
  this nature arises, someone checks the Wiki, edits adds something
  to it if necessary, then replies to the list with the reference to
  the appropriate Wiki entry.
   Spencer Graves
 
  Andrew Robinson wrote:
  On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:
 
 
  I would be happy to re-institute p-values for fixed effects in the
  summary and anova methods for lmer objects using a denominator
  degrees
  of freedom based on the trace of the hat matrix or the rank of
  Z:X if
  others will volunteer to respond to the these answers are obviously
  wrong because they don't agree with whatever and the idiot who
  wrote
  this software should be thrashed to within an inch of his life
  messages.  I don't have the patience.
 
 
  This seems to be more than fair to me.  I'll volunteer to help
  explain
  why the anova.lmer() output doesn't match SAS, etc.  Is it worth
  putting a caveat in the output and the help files?  Is it even worth
  writing a FAQ about this?
 
  Cheers
 
  Andrew


 [[alternative HTML version deleted]]









Re: [R] Conservative ANOVA tables in lmer

2006-09-11 Thread Manuel Morales
On Mon, 2006-09-11 at 11:43 -0500, Douglas Bates wrote:
 On 9/10/06, Andrew Robinson [EMAIL PROTECTED] wrote:
  On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:
 
   I would be happy to re-institute p-values for fixed effects in the
   summary and anova methods for lmer objects using a denominator degrees
   of freedom based on the trace of the hat matrix or the rank of Z:X if
   others will volunteer to respond to the these answers are obviously
   wrong because they don't agree with whatever and the idiot who wrote
   this software should be thrashed to within an inch of his life
   messages.  I don't have the patience.
 
  This seems to be more than fair to me.  I'll volunteer to help explain
  why the anova.lmer() output doesn't match SAS, etc.  Is it worth
  putting a caveat in the output and the help files?  Is it even worth
  writing a FAQ about this?
 
 Having made that offer I think I will now withdraw it.  Peter's
 example has convinced me that this is the wrong thing to do.
 
 I am encouraged by the fact that the results from mcmcsamp correspond
 closely to the correct theoretical results in the case that Peter
 described.  I appreciate that some users will find it difficult to
 work with a MCMC sample (or to convince editors to accept results
 based on such a sample) but I think that these results indicate that
 it is better to go after the marginal distribution of the fixed
 effects estimates (which is what is being approximated by the MCMC
 sample - up to Bayesian/frequentist philosophical differences) than to
 use the conditional distribution and somehow try to adjust the
 reference distribution.

Am I right that the MCMC sample cannot be used, however, to evaluate
the significance of parameter groups. For example, to assess the
significance of a three-level factor? Are there better alternatives than
simply adjusting the CI for the number of factor levels
(1-alpha/levels).

Thanks!

Manuel

=
Manuel A. Morales
Asst. Prof., Biology
Williams College
http://mutualism.williams.edu



Re: [R] Conservative ANOVA tables in lmer

2006-09-10 Thread Spencer Graves
Hi, Doug, et al.: 

  I'll volunteer to do the same, which is an extension of much of 
what I've been doing for R Help for a while now. 

  Regarding writing a FAQ, what about a Wiki entry (and maybe 
ultimately a vignette)?  This thread provides notes around which such 
could be built.  Another piece might be an example from Scheffé (1958), 
which I sent as a reply to an earlier comment on this thread, (foolishly 
sent without reducing the cc list, which means it awaits moderator 
approval).   Each time a question of this nature arises, someone checks 
the Wiki, edits or adds something to it if necessary, then replies to the 
list with the reference to the appropriate Wiki entry. 

  Spencer Graves

Andrew Robinson wrote:
 On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:

   
 I would be happy to re-institute p-values for fixed effects in the
 summary and anova methods for lmer objects using a denominator degrees
 of freedom based on the trace of the hat matrix or the rank of Z:X if
 others will volunteer to respond to the these answers are obviously
 wrong because they don't agree with whatever and the idiot who wrote
 this software should be thrashed to within an inch of his life
 messages.  I don't have the patience.
 

 This seems to be more than fair to me.  I'll volunteer to help explain
 why the anova.lmer() output doesn't match SAS, etc.  Is it worth
 putting a caveat in the output and the help files?  Is it even worth
 writing a FAQ about this?

 Cheers

 Andrew




Re: [R] Conservative ANOVA tables in lmer

2006-09-10 Thread John Sorkin
Spencer,
Would you add the reference to the WIKI? I think it would help round out this 
thread.
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
[EMAIL PROTECTED]

 Spencer Graves [EMAIL PROTECTED] 09/10/06 2:54 AM 
Hi, Doug, et al.: 

  I'll volunteer to do the same, which is an extension of much of 
what I've been doing for R Help for a while now. 

  Regarding writing a FAQ, what about a Wiki entry (and maybe 
ultimately a vignette)?  This thread provides notes around which such 
could be built.  Another piece might be an example from Scheffé (1958), 
which I sent as a reply to an earlier comment on this thread, (foolishly 
sent without reducing the cc list, which means it awaits moderator 
approval).   Each time a question of this nature arises, someone checks 
the Wiki, edits adds something to it if necessary, then replies to the 
list with the reference to the appropriate Wiki entry. 

  Spencer Graves

Andrew Robinson wrote:
 On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:

   
 I would be happy to re-institute p-values for fixed effects in the
 summary and anova methods for lmer objects using a denominator degrees
 of freedom based on the trace of the hat matrix or the rank of Z:X if
 others will volunteer to respond to the these answers are obviously
 wrong because they don't agree with whatever and the idiot who wrote
 this software should be thrashed to within an inch of his life
 messages.  I don't have the patience.
 

 This seems to be more than fair to me.  I'll volunteer to help explain
 why the anova.lmer() output doesn't match SAS, etc.  Is it worth
 putting a caveat in the output and the help files?  Is it even worth
 writing a FAQ about this?

 Cheers

 Andrew






Re: [R] Conservative ANOVA tables in lmer

2006-09-10 Thread John Maindonald
A Wiki entry is an excellent idea.  I am happy to try to help.

An account of mcmcsamp() might be very useful part of the Wiki.  My  
limited investigations suggest that once the data starts to overwhelm  
the prior (maybe ~3 df for an effect that is of interest), the  
posterior distribution that it gives provides a very good  
approximation to the sampling distribution.

I have been meaning to put aside time to try to work out, with the  
help of a colleague here at ANU, how the Kenward & Roger (Biometrics,  
1997) approximation might be implemented in lmer, but it hasn't yet  
happened and is unlikely to do so for a while.

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 10 Sep 2006, at 8:00 PM, [EMAIL PROTECTED] wrote:

 From: Spencer Graves [EMAIL PROTECTED]
 Date: 10 September 2006 4:54:50 PM
 To: Andrew Robinson [EMAIL PROTECTED]
 Cc: Douglas Bates [EMAIL PROTECTED], R-Help r- 
 [EMAIL PROTECTED]
 Subject: Re: [R] Conservative ANOVA tables in lmer


 Hi, Doug, et al.:
  I'll volunteer to do the same, which is an extension of much  
 of what I've been doing for R Help for a while now.
  Regarding writing a FAQ, what about a Wiki entry (and maybe  
 ultimately a vignette)?  This thread provides notes around which  
 such could be built.  Another piece might be an example from  
 Scheffé (1958), which I sent as a reply to an earlier comment on  
 this thread, (foolishly sent without reducing the cc list, which  
 means it awaits moderator approval).   Each time a question of  
 this nature arises, someone checks the Wiki, edits adds something  
 to it if necessary, then replies to the list with the reference to  
 the appropriate Wiki entry.
  Spencer Graves

 Andrew Robinson wrote:
 On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:


 I would be happy to re-institute p-values for fixed effects in the
 summary and anova methods for lmer objects using a denominator  
 degrees
 of freedom based on the trace of the hat matrix or the rank of  
 Z:X if
 others will volunteer to respond to the "these answers are obviously
 wrong because they don't agree with whatever and the idiot who wrote
 this software should be thrashed to within an inch of his life"
 messages.  I don't have the patience.


 This seems to be more than fair to me.  I'll volunteer to help  
 explain
 why the anova.lmer() output doesn't match SAS, etc.  Is it worth
 putting a caveat in the output and the help files?  Is it even worth
 writing a FAQ about this?

 Cheers

 Andrew



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conservative ANOVA tables in lmer

2006-09-09 Thread Andrew Robinson
On Thu, Sep 07, 2006 at 07:59:58AM -0500, Douglas Bates wrote:

 I would be happy to re-institute p-values for fixed effects in the
 summary and anova methods for lmer objects using a denominator degrees
 of freedom based on the trace of the hat matrix or the rank of Z:X if
 others will volunteer to respond to the "these answers are obviously
 wrong because they don't agree with whatever and the idiot who wrote
 this software should be thrashed to within an inch of his life"
 messages.  I don't have the patience.

This seems to be more than fair to me.  I'll volunteer to help explain
why the anova.lmer() output doesn't match SAS, etc.  Is it worth
putting a caveat in the output and the help files?  Is it even worth
writing a FAQ about this?

Cheers

Andrew
-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
Email: [EMAIL PROTECTED] http://www.ms.unimelb.edu.au



[R] Conservative ANOVA tables in lmer

2006-09-08 Thread Fabian Scheipl

Dear list,

I have written functions to perform simulation-based tests of
H0: Var(Random Effects) = 0, beta_foo = 0
in linear mixed models, based on the exact distribution of the LR and
restricted LR test statistics (see the research presented in
Crainiceanu and Ruppert, "Likelihood ratio tests in linear mixed models
with one variance component", J. R. Statist. Soc. B (2004), 66, Part 1,
pp. 165-185). They are about 15-20 times faster than the parametric
bootstrap.

At the moment, the exact distributions are easily simulated only for the
case of a single variance component/random effect and i.i.d. errors;
feasible approximations for the multivariate case are currently
being investigated and will be implemented soon.

The syntax looks something like this:

#begin code:

data(sleepstudy)
summary(sleepstudy)  #Effect of sleep deprivation on reaction time
xyplot(Reaction~Days|Subject, data=sleepstudy)
m <- lmer(Reaction ~ Days + (Days - 1 | Subject), data = sleepstudy)
#random slopes, but no random intercept
#doesn't make sense, but it's just an example
summary(m)

#test for individual heterogeneity based on RLRT
#(No restrictions on fixed effects under H0)
#H0: lambda = Var(RandomSlopes)/Var(error) == 0  <=>  Var(RandomSlopes) == 0

t3 <- RLRT1SimTest(m, lambda0=0, seed=5, nsim=1)

#will produce output:
#H0: lambda = 0 ; p-value = 0
# observed lambda = 0.06259639 

#test for influence of Days based on LRT 
#(restriction on fixed effects: beta_Days == 0)

m0 <- lm(Reaction ~ 1, data = sleepstudy)
t4 <- LRT1SimTest(m, m0, seed=10, nsim=1)

#will produce output:
#Model under H0:  Reaction ~ 1 ;
#Model under HA:  Reaction ~ Days + (Days - 1 | Subject) ;
# p-value = 0 
# observed lambda = 0.06259639 

#end code 
  
If you are interested in using these functions I'll be glad
to send them to you.
Be aware, however, that you can only use them for testing
one random effect vs. no random effect in a model with i.i.d. errors!

The plan is to put them in a package beginning next year and
use them as a basis for an (exact) anova.lmer() method.

Greetings,
Fabian Scheipl


--




Re: [R] Conservative ANOVA tables in lmer

2006-09-08 Thread Douglas Bates
On 9/7/06, Douglas Bates [EMAIL PROTECTED] wrote:
 On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote:
  Martin Maechler [EMAIL PROTECTED] writes:
 
DB == Douglas Bates [EMAIL PROTECTED]
on Thu, 7 Sep 2006 07:59:58 -0500 writes:
  
   DB Thanks for your summary, Hank.
   DB On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:
Dear lmer-ers,
My thanks for all of you who are sharing your trials and 
   tribulations
publicly.
  
I was hoping to elicit some feedback on my thoughts on denominator
degrees of freedom for F ratios in mixed models. These thoughts and
practices result from my reading of previous postings by Doug Bates
and others.
  
- I start by assuming that the appropriate denominator degrees lies
between n - p and n - q, where n=number of observations,
   p=number
of fixed effects (rank of model matrix X), and q=rank of Z:X.
  
   DB I agree with this but the opinion is by no means universal.  
   Initially
   DB I misread the statement because I usually write the number of 
   columns
   DB of Z as q.
  
   DB It is not easy to assess rank of Z:X numerically.  In many cases 
   one
   DB can reason what it should be from the form of the model but a 
   general
   DB procedure to assess the rank of a matrix, especially a sparse 
   matrix,
   DB is difficult.
  
   DB An alternative which can be easily calculated is n - t where t is 
   the
   DB trace of the 'hat matrix'.  The function 'hatTrace' applied to a
   DB fitted lmer model evaluates this trace (conditional on the 
   estimates
   DB of the relative variances of the random effects).
  
- I then conclude that good estimates of P values on the F ratios 
   lie
  between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, 
   n-q).
  -- I further surmise that the latter of these (1 - pf(F.ratio, 
   numDF,
  n-q)) is the more conservative estimate.
  
   This assumes that the true distribution (under H0) of that F ratio
   *is*  F_{n1,n2}  for some (possibly non-integer)  n1 and n2.
   But AFAIU, this is only approximately true at best, and AFAIU,
   the quality of this approximation has only been investigated
   empirically for some situations.
   Hence, even your conservative estimate of the P value could be
   wrong (I mean wrong on the wrong side instead of just
   conservatively wrong).  Consequently, such a P-value is only
   ``approximately conservative'' ...
   I agree, however, that in some situations, it might be a very
   useful descriptive statistic about the fitted model.
 
  I'm very wary of ANY attempt at guesswork in these matters.
 
  I may be understanding the post wrongly, but consider this case: Y_ij
  = mu + z_i + eps_ij, i = 1..3, j=1..100
 
  I get rank(X)=1, rank(X:Z)=3,  n=300
 
  It is well known that the test for mu=0 in this case is obtained by
  reducing data to group means, xbar_i, and then do a one-sample t test,
  the square of which is F(1, 2), but it seems to be suggested that
  F(1, 297) is a conservative test???!

 It's a different test, isn't it?  Your test is based upon the between
 group sum of squares with 2 df.  I am proposing to use the within
 group sum of squares or its generalization.

On closer examination I see that you are indeed correct.  I have heard
that well-known result many times and finally sat down to prove it
to myself.  For a balanced design the standard error of the intercept
using the REML estimates is the same as the standard error of the mean
calculated from the group means.

> data(Rail, package = 'nlme')
> library(lme4)
> summary(fm1 <- lmer(travel ~ 1 + (1|Rail), Rail))
Linear mixed-effects model fit by REML
Formula: travel ~ 1 + (1 | Rail)
   Data: Rail
        AIC    BIC  logLik  MLdeviance  REMLdeviance
      126.2  128.0  -61.09       128.6         122.2
Random effects:
 Groups   NameVariance Std.Dev.
 Rail (Intercept) 615.286  24.8050
 Residual  16.167   4.0208
number of obs: 18, groups: Rail, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)    66.50      10.17   6.538
> mns <- with(Rail, tapply(travel, Rail, mean)) # group means
> sd(mns)/sqrt(length(mns))  # standard error matches that from lmer
[1] 10.17104
> t.test(mns)

One Sample t-test

data:  mns
t = 6.5382, df = 5, p-value = 0.001253
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 40.35452 92.64548
sample estimates:
mean of x
 66.5

> ctab <- summary(fm1)@coefs  # coefficient table
> ctab[,1] + c(-1,1) * qt(0.975, 15) * ctab[,2] # 95% conf. int.
[1] 44.82139 88.17861
## interval using df = # of obs - rank of [Z:X] is too narrow

So my proposal of using either the trace of the hat matrix or the rank
of the combined model matrices as the degrees of freedom for the model
is not conservative.

However, look at the following

> set.seed(123454321)

[R] Conservative ANOVA tables in lmer

2006-09-07 Thread Martin Henry H. Stevens
Dear lmer-ers,
My thanks for all of you who are sharing your trials and tribulations  
publicly.

I was hoping to elicit some feedback on my thoughts on denominator  
degrees of freedom for F ratios in mixed models. These thoughts and  
practices result from my reading of previous postings by Doug Bates  
and others.

- I start by assuming that the appropriate denominator degrees lies  
between n - p and n - q, where n=number of observations, p=number
of fixed effects (rank of model matrix X), and q=rank of Z:X.
- I then conclude that good estimates of P values on the F ratios lie  
between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
- I further surmise that the latter of these (1 - pf(F.ratio, numDF,  
n-q)) is the more conservative estimate.
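The bracketing can be sketched numerically. In R one would compare 1 - pf(F.ratio, numDF, n-p) with 1 - pf(F.ratio, numDF, n-q) directly; the stdlib-Python sketch below approximates the same tail probabilities by Monte Carlo, using only the fact that an F variate is a ratio of chi-square variates scaled by their degrees of freedom. The values of n, p, q and the F ratio are invented for illustration.

```python
import random

def f_pvalue_mc(f_obs, df1, df2, nsim=40_000, seed=1):
    """Monte Carlo approximation of P(F_{df1, df2} > f_obs),
    i.e. 1 - pf(f_obs, df1, df2) in R notation.

    An F variate is (chi2_df1 / df1) / (chi2_df2 / df2) for two
    independent chi-square variates.
    """
    rng = random.Random(seed)

    def chisq(df):
        # chi-square as a sum of df squared standard normals
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    hits = sum(
        (chisq(df1) / df1) / (chisq(df2) / df2) > f_obs
        for _ in range(nsim)
    )
    return hits / nsim

# Hypothetical example: n = 30 observations, p = 2 fixed effects,
# rank(Z:X) = 15, observed F ratio 4.0 on numDF = 2.
n, p, q, f_obs, num_df = 30, 2, 15, 4.0, 2
p_liberal = f_pvalue_mc(f_obs, num_df, n - p)  # denominator df = n - p
p_conserv = f_pvalue_mc(f_obs, num_df, n - q)  # denominator df = n - q
# The smaller denominator df (n - q) yields the larger, i.e. more
# conservative, p-value.
```

For numDF = 2 the tail probability has the closed form (1 + 2F/denDF)^(-denDF/2), which puts the two bounds here at roughly 0.030 and 0.040.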

When I use these criteria and compare my ANOVA table to the results  
of analysis of Helmert contrasts using MCMC sample with highest  
posterior density intervals, I find that my conclusions (e.g. factor  
A, with three levels, has a significant effect on the response  
variable) are qualitatively the same.

Comments?

Hank


Dr. Hank Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
E Pluribus Unum



Re: [R] Conservative ANOVA tables in lmer

2006-09-07 Thread Douglas Bates
Thanks for your summary, Hank.

On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:
 Dear lmer-ers,
 My thanks for all of you who are sharing your trials and tribulations
 publicly.

 I was hoping to elicit some feedback on my thoughts on denominator
 degrees of freedom for F ratios in mixed models. These thoughts and
 practices result from my reading of previous postings by Doug Bates
 and others.

 - I start by assuming that the appropriate denominator degrees lies
 between n - p and n - q, where n=number of observations, p=number
 of fixed effects (rank of model matrix X), and q=rank of Z:X.

I agree with this but the opinion is by no means universal.  Initially
I misread the statement because I usually write the number of columns
of Z as q.

It is not easy to assess rank of Z:X numerically.  In many cases one
can reason what it should be from the form of the model but a general
procedure to assess the rank of a matrix, especially a sparse matrix,
is difficult.

An alternative which can be easily calculated is n - t where t is the
trace of the 'hat matrix'.  The function 'hatTrace' applied to a
fitted lmer model evaluates this trace (conditional on the estimates
of the relative variances of the random effects).
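As a point of reference for readers less familiar with the hat matrix: in ordinary least squares H = X (X'X)^{-1} X', and trace(H) = rank(X) = p exactly. The lmer hat matrix involves Z as well and its trace is conditional on the variance estimates, so the dependency-free sketch below covers only the OLS special case, with a made-up two-column model matrix.

```python
def hat_trace(X):
    """Trace of the OLS hat matrix H = X (X'X)^{-1} X'.

    Computed as the sum of leverages h_ii = x_i' (X'X)^{-1} x_i,
    without forming H.  Hard-coded for a two-column model matrix
    to stay dependency-free.
    """
    # entries of the 2x2 matrix X'X
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))  # (X'X)^{-1}
    tr = 0.0
    for x0, x1 in X:
        # leverage of observation i
        tr += (x0 * (inv[0][0] * x0 + inv[0][1] * x1)
               + x1 * (inv[1][0] * x0 + inv[1][1] * x1))
    return tr

# Simple regression: intercept plus one covariate, n = 6 observations.
X = [(1.0, x) for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
print(round(hat_trace(X), 6))  # prints 2.0: trace(H) = rank(X) = p
```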

 - I then conclude that good estimates of P values on the F ratios lie
 between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
 - I further surmise that the latter of these (1 - pf(F.ratio, numDF,
 n-q)) is the more conservative estimate.

 When I use these criteria and compare my ANOVA table to the results
 of analysis of Helmert contrasts using MCMC sample with highest
 posterior density intervals, I find that my conclusions (e.g. factor
 A, with three levels, has a significant effect on the response
 variable) are qualitatively the same.

 Comments?

I would be happy to re-institute p-values for fixed effects in the
summary and anova methods for lmer objects using a denominator degrees
of freedom based on the trace of the hat matrix or the rank of Z:X if
others will volunteer to respond to the "these answers are obviously
wrong because they don't agree with whatever and the idiot who wrote
this software should be thrashed to within an inch of his life"
messages.  I don't have the patience.



Re: [R] Conservative ANOVA tables in lmer

2006-09-07 Thread lorenz.gygax
Dear Douglas,

 I would be happy to re-institute p-values for fixed effects in the
 summary and anova methods for lmer objects using a denominator
 degrees of freedom based on the trace of the hat matrix or the rank
 of Z:X

Please do!

 if others will volunteer to respond to the "these answers are
 obviously wrong because they don't agree with whatever and the
 idiot who wrote this software should be thrashed to within an inch
 of his life" messages.  I don't have the patience.

I would try to take up my share of these types of questions.

Best regards, Lorenz
- 
Lorenz Gygax
Dr. sc. nat., postdoc

Centre for proper housing of ruminants and pigs
Swiss Federal Veterinary Office
Agroscope Reckenholz-Tänikon Research Station ART

Tänikon, CH-8356 Ettenhausen / Switzerland
Tel: +41 052 368 33 84
Fax: +41 052 365 11 90
[EMAIL PROTECTED]
www.art.admin.ch



Re: [R] Conservative ANOVA tables in lmer

2006-09-07 Thread Martin Maechler
 DB == Douglas Bates [EMAIL PROTECTED]
 on Thu, 7 Sep 2006 07:59:58 -0500 writes:

DB Thanks for your summary, Hank.
DB On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:
 Dear lmer-ers,
 My thanks for all of you who are sharing your trials and tribulations
 publicly.

 I was hoping to elicit some feedback on my thoughts on denominator
 degrees of freedom for F ratios in mixed models. These thoughts and
 practices result from my reading of previous postings by Doug Bates
 and others.

 - I start by assuming that the appropriate denominator degrees lies
 between n - p and n - q, where n=number of observations, p=number
 of fixed effects (rank of model matrix X), and q=rank of Z:X.

DB I agree with this but the opinion is by no means universal.  Initially
DB I misread the statement because I usually write the number of columns
DB of Z as q.

DB It is not easy to assess rank of Z:X numerically.  In many cases one
DB can reason what it should be from the form of the model but a general
DB procedure to assess the rank of a matrix, especially a sparse matrix,
DB is difficult.

DB An alternative which can be easily calculated is n - t where t is the
DB trace of the 'hat matrix'.  The function 'hatTrace' applied to a
DB fitted lmer model evaluates this trace (conditional on the estimates
DB of the relative variances of the random effects).

 - I then conclude that good estimates of P values on the F ratios lie
   between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
   -- I further surmise that the latter of these (1 - pf(F.ratio, numDF,
   n-q)) is the more conservative estimate.

This assumes that the true distribution (under H0) of that F ratio
*is*  F_{n1,n2}  for some (possibly non-integer)  n1 and n2.
But AFAIU, this is only approximately true at best, and AFAIU,
the quality of this approximation has only been investigated
empirically for some situations. 
Hence, even your conservative estimate of the P value could be
wrong (I mean wrong on the wrong side instead of just
conservatively wrong).  Consequently, such a P-value is only
``approximately conservative'' ...
I agree, however, that in some situations, it might be a very
useful descriptive statistic about the fitted model.

Martin

 When I use these criteria and compare my ANOVA table to the results
 of analysis of Helmert contrasts using MCMC sample with highest
 posterior density intervals, I find that my conclusions (e.g. factor
 A, with three levels, has a significant effect on the response
 variable) are qualitatively the same.

 Comments?

DB I would be happy to re-institute p-values for fixed effects in the
DB summary and anova methods for lmer objects using a denominator degrees
DB of freedom based on the trace of the hat matrix or the rank of Z:X if
DB others will volunteer to respond to the "these answers are obviously
DB wrong because they don't agree with whatever and the idiot who wrote
DB this software should be thrashed to within an inch of his life"
DB messages.  I don't have the patience.




Re: [R] Conservative ANOVA tables in lmer

2006-09-07 Thread Douglas Bates
On 9/7/06, Martin Maechler [EMAIL PROTECTED] wrote:
  DB == Douglas Bates [EMAIL PROTECTED]
  on Thu, 7 Sep 2006 07:59:58 -0500 writes:

 DB Thanks for your summary, Hank.
 DB On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:
  Dear lmer-ers,
  My thanks for all of you who are sharing your trials and tribulations
  publicly.

  I was hoping to elicit some feedback on my thoughts on denominator
  degrees of freedom for F ratios in mixed models. These thoughts and
  practices result from my reading of previous postings by Doug Bates
  and others.

  - I start by assuming that the appropriate denominator degrees lies
  between n - p and n - q, where n=number of observations, p=number
  of fixed effects (rank of model matrix X), and q=rank of Z:X.

 DB I agree with this but the opinion is by no means universal.  Initially
 DB I misread the statement because I usually write the number of columns
 DB of Z as q.

 DB It is not easy to assess rank of Z:X numerically.  In many cases one
 DB can reason what it should be from the form of the model but a general
 DB procedure to assess the rank of a matrix, especially a sparse matrix,
 DB is difficult.

 DB An alternative which can be easily calculated is n - t where t is the
 DB trace of the 'hat matrix'.  The function 'hatTrace' applied to a
 DB fitted lmer model evaluates this trace (conditional on the estimates
 DB of the relative variances of the random effects).

  - I then conclude that good estimates of P values on the F ratios lie
between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
-- I further surmise that the latter of these (1 - pf(F.ratio, numDF,
n-q)) is the more conservative estimate.

 This assumes that the true distribution (under H0) of that F ratio
 *is*  F_{n1,n2}  for some (possibly non-integer)  n1 and n2.
 But AFAIU, this is only approximately true at best, and AFAIU,
 the quality of this approximation has only been investigated
 empirically for some situations.
 Hence, even your conservative estimate of the P value could be
 wrong (I mean wrong on the wrong side instead of just
 conservatively wrong).  Consequently, such a P-value is only
 ``approximately conservative'' ...
  I agree, however, that in some situations, it might be a very
 useful descriptive statistic about the fitted model.

Thank you for pointing that out, Martin.  I agree.  As I mentioned, a
value of the denominator degrees of freedom based on the trace of the
hat matrix is conditional on the estimates of the relative variances
of the random effects.  I think an argument could still be made for
the upper bound on the dimension of the model space being rank of Z:X
and hence a lower bound on the dimension of the space in which the
residuals lie as being n - rank[Z:X].  One possible approach would be
to use the squared length of the projection of the data vector into
the orthogonal complement of Z:X as the sum of squares and n -
rank(Z:X) as the degrees of freedom and base tests on that.  Under the
assumptions on the model I think an F ratio calculated using that
actually would have an F distribution.
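The proposal in the preceding paragraph amounts to projecting the data vector onto the orthogonal complement of the column span of [Z:X]. Here is a dependency-free Python sketch of that computation on a toy balanced one-way layout; all numbers are invented and the helper name is hypothetical.

```python
def ortho_complement_ss(cols, y, tol=1e-10):
    """Squared length of y's projection onto the orthogonal
    complement of span(cols), plus the residual degrees of freedom.

    cols: list of columns (each of length n), e.g. the columns of
    the combined model matrix [Z:X].  Uses modified Gram-Schmidt;
    linearly dependent columns are dropped, so len(basis) = rank.
    """
    n = len(y)
    basis = []
    for c in cols:
        v = list(c)
        for q in basis:
            coef = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - coef * qi for qi, vi in zip(q, v)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > tol:                      # keep only independent columns
            basis.append([vi / norm for vi in v])
    r = list(y)
    for q in basis:
        coef = sum(qi * ri for qi, ri in zip(q, r))
        r = [ri - coef * qi for ri, qi in zip(r, q)]
    ss = sum(ri * ri for ri in r)
    return ss, n - len(basis)               # residual SS and n - rank

# Toy balanced one-way layout: 3 groups of 2, intercept + group indicators.
y = [1.0, 2.0, 4.0, 6.0, 3.0, 5.0]
ones = [1.0] * 6
g1 = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
g2 = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
g3 = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
ss, df = ortho_complement_ss([ones, g1, g2, g3], y)
# rank([1, g1, g2, g3]) = 3, so df = 6 - 3 = 3; ss is the within-group SS
```

For this layout the result is the usual within-group sum of squares (4.5 here) on n - rank = 3 degrees of freedom.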


 Martin

  When I use these criteria and compare my ANOVA table to the results
  of analysis of Helmert contrasts using MCMC sample with highest
  posterior density intervals, I find that my conclusions (e.g. factor
  A, with three levels, has a significant effect on the response
  variable) are qualitatively the same.

  Comments?

 DB I would be happy to re-institute p-values for fixed effects in the
 DB summary and anova methods for lmer objects using a denominator degrees
 DB of freedom based on the trace of the hat matrix or the rank of Z:X if
 DB others will volunteer to respond to the "these answers are obviously
 DB wrong because they don't agree with whatever and the idiot who wrote
 DB this software should be thrashed to within an inch of his life"
 DB messages.  I don't have the patience.





Re: [R] Conservative ANOVA tables in lmer

2006-09-07 Thread Peter Dalgaard
Martin Maechler [EMAIL PROTECTED] writes:

  DB == Douglas Bates [EMAIL PROTECTED]
  on Thu, 7 Sep 2006 07:59:58 -0500 writes:
 
 DB Thanks for your summary, Hank.
 DB On 9/7/06, Martin Henry H. Stevens [EMAIL PROTECTED] wrote:
  Dear lmer-ers,
  My thanks for all of you who are sharing your trials and tribulations
  publicly.
 
  I was hoping to elicit some feedback on my thoughts on denominator
  degrees of freedom for F ratios in mixed models. These thoughts and
  practices result from my reading of previous postings by Doug Bates
  and others.
 
  - I start by assuming that the appropriate denominator degrees lies
  between n - p and n - q, where n=number of observations, p=number
  of fixed effects (rank of model matrix X), and q=rank of Z:X.
 
 DB I agree with this but the opinion is by no means universal.  Initially
 DB I misread the statement because I usually write the number of columns
 DB of Z as q.
 
 DB It is not easy to assess rank of Z:X numerically.  In many cases one
 DB can reason what it should be from the form of the model but a general
 DB procedure to assess the rank of a matrix, especially a sparse matrix,
 DB is difficult.
 
 DB An alternative which can be easily calculated is n - t where t is the
 DB trace of the 'hat matrix'.  The function 'hatTrace' applied to a
 DB fitted lmer model evaluates this trace (conditional on the estimates
 DB of the relative variances of the random effects).
 
  - I then conclude that good estimates of P values on the F ratios lie
between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
-- I further surmise that the latter of these (1 - pf(F.ratio, numDF,
n-q)) is the more conservative estimate.
 
 This assumes that the true distribution (under H0) of that F ratio
 *is*  F_{n1,n2}  for some (possibly non-integer)  n1 and n2.
 But AFAIU, this is only approximately true at best, and AFAIU,
 the quality of this approximation has only been investigated
 empirically for some situations. 
 Hence, even your conservative estimate of the P value could be
 wrong (I mean wrong on the wrong side instead of just
 conservatively wrong).  Consequently, such a P-value is only
 ``approximately conservative'' ...
 I agree, however, that in some situations, it might be a very
 useful descriptive statistic about the fitted model.

I'm very wary of ANY attempt at guesswork in these matters. 

I may be understanding the post wrongly, but consider this case: Y_ij
= mu + z_i + eps_ij, i = 1..3, j=1..100

I get rank(X)=1, rank(X:Z)=3,  n=300

It is well known that the test for mu=0 in this case is obtained by
reducing data to group means, xbar_i, and then do a one-sample t test,
the square of which is F(1, 2), but it seems to be suggested that
F(1, 297) is a conservative test???!
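The reduction described above is easy to make concrete: average within each of the k groups, then run a one-sample t test on the k means, which gives k - 1 denominator degrees of freedom no matter how large n is. A stdlib-Python sketch on invented data (the helper name is hypothetical):

```python
import statistics

def group_means_t(groups):
    """One-sample t statistic for H0: mu = 0 after reducing the data
    Y_ij = mu + z_i + eps_ij to its group means xbar_i.

    Returns (t, df); df is k - 1, NOT n - rank(Z:X).
    """
    means = [statistics.fmean(g) for g in groups]
    k = len(means)
    se = statistics.stdev(means) / k ** 0.5   # SE of the mean of means
    return statistics.fmean(means) / se, k - 1

# Three groups (i = 1..3) of made-up observations.
groups = [[9.8, 10.2, 10.0], [12.1, 11.9, 12.0], [8.0, 8.1, 7.9]]
t, df = group_means_t(groups)
# df = 2 regardless of the 9 observations: the square of t is F(1, 2),
# not F(1, n - rank(Z:X)) as the n - q heuristic would suggest.
```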

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907
