I've been looking back over this discussion.
Another model that one can fit using lme is:

> lme(score~Machine, random=list(Worker=pdIdent(~0+Machine)),
+        weights=varIdent(form=~1|Machine), data=Machines)
Linear mixed-effects model fit by REML
 Data: Machines
 Log-restricted-likelihood: -108.9547
 Fixed: score ~ Machine
(Intercept)    MachineB    MachineC
 52.355556    7.966667   13.916667

Random effects:
Formula: ~0 + Machine | Worker
Structure: Multiple of an Identity
       MachineA MachineB MachineC Residual
StdDev:  6.06209  6.06209  6.06209 1.148581

Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Machine
Parameter estimates:
       A         B         C
1.0000000 0.8713263 0.5859709
Number of Observations: 54
Number of Groups: 6


This insists (I think) that conditional on the random effect for the worker,
the worker/machine random effects be independent,
but allows them to have different variances.  I am wondering whether
it is possible to fit such a model using lmer().

[In this example the large estimated correlations suggest that it is not
a sensible model.]

John Maindonald             email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 14 May 2008, at 2:49 AM, Douglas Bates wrote:

On Mon, May 12, 2008 at 5:39 PM, Rolf Turner <[EMAIL PROTECTED]> wrote:

On 13/05/2008, at 4:09 AM, Douglas Bates wrote:

I'm entering this discussion late so I may be discussing issues that
have already been addressed.

As I understand it, Federico, you began by describing a model for data in which two factors have a fixed set of levels and one factor has an
extensible, or "random", set of levels and you wanted to fit a model
that you described as

y ~ effect1 * effect2 * effect3

The problem is that this specification is not complete.

     At *last* (as Owl said to Rabbit) we're getting somewhere!!!

     I always knew that there was some basic fundamental point
     about this business that I (and I believe many others) were
     simply missing.  But I could not for the life of me get anyone
     to explain to me what that point was.  Or to put it another
     way, I was never able to frame a question that would illuminate
     just what it was that I wasn't getting.

     I now may be at a stage where I can start asking the right
     questions.

An interaction of factors with fixed levels and a factor with random
levels can mean, in the lmer specification,

lmer(y ~ effect1 * effect2 + (1| effect3) + (1| effect1:effect2:effect3),
...)

or

lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)

or other variations.  When you specify a random effect or an random
interaction term you must, either explicitly or implicitly, specify
the form of the variance-covariance matrix associated with those
random effects.

The "advantage" that other software may provide for you is that it
chooses the model for you but that, of course, means that you only
have the one choice.

     Now may I start asking what I hope are questions that will lift
     the fog a bit?

     Let us for specificity consider a three-way model with two
fixed effects and one random effect from the good old Rothamstead
style
     agricultural experiment context:  Suppose we have a number of
     species/breeds of wheat (say) and a number of fertilizers.
These are fixed effects. And we have a number of fields (blocks)
     --- a random effect.  Each breed-fertilizer combination is
     applied a number of times in each field.  We ***assume*** that
     that the field or block effect is homogeneous throughout.  This
     may or may not be a ``good'' assumption, but it's not completely
     ridiculous and would often be made in practice.  And probably
     *was* made at Rothamstead.  The response would be something like
     yield in bushels per acre.

     The way that I would write the ``full'' model for this setting,
     in mathematical style is:

Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik
                 + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl

     The alpha_i and beta_j are parameters corresponding to breed and
fertilizer
respectively; the C_k are random effects corresponding to fields or
blocks.
     Any effect ``involving'' C is also random.

The assumptions made by the Package-Which-Must-Not-Be-Named are (I
think)
     that

             C_k ~ N(0,sigma_C^2)
             (alpha.C)_ik ~ N(0,sigma_aC^2)
             (beta.C)jk ~ N(0,sigma_bC^2)
             (alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
             E_ijkl ~ N(0,sigma^2)

     and these random variables are *all independent*.

Ahhhhhhhh ... perhaps I'm on the way to answering my own question.
Is
it this assumption of ``all independent'' which is questionable? It
     seemed innocent enough when I first learned about this stuff, lo
these
     many years ago.  But .... mmmmmaybe not!

To start with: What would be the lmer syntax to fit the foregoing (possibly naive) model? I am sorry, but I really cannot get my head around the syntax of lmer model specification, and I've tried. I really have. Hard. I know I must be starting from the wrong place, but I haven't a clue as to what the *right* place to start from is. And if I'm in that boat, I will wager Euros to pretzels that there
     are others in it.  I know that I'm not the brightest bulb in the
     chandelier, but I'm not the dullest either.

Thanks for the questions, Rolf.  I completely agree that mixed model
specification can be an extremely confusing area.

Let's consider a set of models for the Machines data reproduced (from
Milliken and Johnson) in Pinheiro and Bates and available in the MEMSS
package.

library(lme4)
Loading required package: Matrix
Loading required package: lattice

Attaching package: 'Matrix'


        The following object(s) are masked from package:stats :

         xtabs
data("Machines", package = "MEMSS")
str(Machines)
'data.frame':   54 obs. of  3 variables:
$ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
$ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
$ score  : num  52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...

We consider the Machine factor to have a fixed set of levels in that
we only consider these three machines.  The levels of the Worker
factor represent a sample from the set of potential operators.  As you
might imagine from this description, I now think of the distinction
between "fixed" and "random" as being associated with the factor, not
necessarily the "effects".

If you plot these data
dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine, type = c("g", "p", "a"), xlab = "Efficiency score", ylab = "Worker", auto.key = list(columns = 3, lines = TRUE))

(resulting PDF enclosed) you will see evidence of an interaction.
That is, some workers have noticeably different score patterns on the
three machines than do others.  Worker 6 on machine B is the most
striking example.

One way to model this interaction is to say that there is a random
effect for each worker and a separate random effect for each
worker/machine combination.  If the random effects for the
worker/machine combinations are assumed to be independent with
constant variance then one expresses the model as

print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1| Machine:Worker), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)
 Data: Machines
 AIC   BIC logLik deviance REMLdev
227.7 239.6 -107.8    225.5   215.7
Random effects:
Groups         Name        Variance Std.Dev.
Machine:Worker (Intercept) 13.90963 3.72956
Worker         (Intercept) 22.85528 4.78072
Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
          Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393

An equivalent formulation is
print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker/Machine)
 Data: Machines
 AIC   BIC logLik deviance REMLdev
227.7 239.6 -107.8    225.5   215.7
Random effects:
Groups         Name        Variance Std.Dev.
Machine:Worker (Intercept) 13.90963 3.72956
Worker         (Intercept) 22.85528 4.78072
Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
          Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393

The expression (1|Worker/Machine) is just "syntactic sugar".  It is
expanded to (1|Worker) + (1|Machine:Worker) before the model matrices
are created.

If you want to start with the formula and see what that means for the
model then use these rules:

- a term including the '|' operator is a random effects term
- if the left-hand side of the '|' operator is 1 then the random
effects are scalar random effects, one for each level of the factor on
the right of the '|'
- random effects associated with different terms are independent
- random effects associated with different levels of the factor within
a term are independent
- the variance of the random effects within the same term is constant

However, there is another mixed-effects model that could make sense
for these data.  Suppose I consider the variations associated with
each worker as a vector of length 3 (Machines A, B and C) with a
symmetric, positive semidefinite 3 by 3 variance-covariance matrix.  I
fit that model as

print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (Machine | Worker)
 Data: Machines
 AIC   BIC logLik deviance REMLdev
228.3 248.2 -104.2    216.6   208.3
Random effects:
Groups   Name        Variance Std.Dev. Corr
Worker   (Intercept) 16.64051 4.07928
        MachineB    34.54670 5.87764   0.484
        MachineC    13.61398 3.68971  -0.365  0.297
Residual              0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
          Estimate Std. Error t value
(Intercept)   52.356      1.681  31.151
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037

It may be more meaningful to write it as

print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (0 + Machine | Worker)
 Data: Machines
 AIC   BIC logLik deviance REMLdev
228.3 248.2 -104.2    216.6   208.3
Random effects:
Groups   Name     Variance Std.Dev. Corr
Worker   MachineA 16.64097 4.07933
        MachineB 74.39557 8.62529  0.803
        MachineC 19.26646 4.38936  0.623 0.771
Residual           0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
          Estimate Std. Error t value
(Intercept)   52.356      1.681  31.150
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037

Now we are fitting 3 variances and 3 covariances for the random
effects instead of the previous models which had two variances.  The
difference in the models is exactly what made you pause - the simpler
model assumes that, conditional on the random effect for the worker,
the worker/machine random effects are independent and have constant
variance.  In the more general models the worker/machine interactions
are allowed to be correlated within worker.

It is more common to allow this kind of correlation within subject in
models for longitudinal data (the Laird-Ware formulation) where each
subject has a random effect for the intercept and a random effect for
the slope with respect to time and these can be correlated.  However,
this type of representation can make sense with a factor on the left
hand side of the '|' operator, like the Machine factor here.  If that
factor has a large number of levels then the model quickly becomes
unwieldy because the number of variance-covariance parameters to
estimate is quadratic in the number of levels of the factor on the
lhs.

I hope this helps.

Having got there: Presuming that I'm more-or-less on the right track
     in my foregoing conjecture that it's the over-simple dependence
structure
     that is the problem with what's delivered by the
Package-Which-Must-Not-Be-Named,
how might one go about being less simple-minded? I.e. what might be
some
more realistic dependence structures, and how would one specify these
in lmer?
And how would one assess whether the assumed dependence structure
gives
     a reasonable fit to the data?

If you can describe how many variance components you think should be
estimated in your model and what they would represent then I think it
will be easier to describe how to fit the model.

How does this fit in with my conjecture (above) about what I've been missing all these years? Does it fit? How many variance components
     are there in the ``naive'' model?  It looks like 5 to me ... but
maybe
I'm totally out to lunch in what I think I'm understanding at this
stage.
(And besides --- there are three sorts of statistician; those who can
count,
     and those who can't.)

     Thank you for your indulgence.

              cheers,

                     Rolf Turner

######################################################################
Attention:This e-mail message is privileged and confidential. If you are not theintended recipient please delete the message and notify the sender.Any
views or opinions presented are solely those of the author.

This e-mail has been scanned and cleared by
MailMarshalwww.marshalsoftware.com
######################################################################

<Machines.pdf>_______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to