Re: [R] Random structure of nested design in lme

Doran, Harold Wed, 23 Aug 2006 04:45:13 -0700

Why are the results not reliable?


________________________________

        From: ESCHEN Rene [mailto:[EMAIL PROTECTED] 
        Sent: Wednesday, August 23, 2006 3:48 AM
        To: Spencer Graves; r-help@stat.math.ethz.ch
        Cc: Doran, Harold
        Subject: RE: [R] Random structure of nested design in lme
        
        


        The output of the suggested lmer model looks very similar to the output 
of aov, also when I ran the model on the dataset I want to use. Thank you very 
much for the suggestion, this appears to solve my problem to a great extend.
        
        However, one of my response variables is survival of my plants, which 
is a binary variable (alive = 1; dead = 0). To analyze this case, I added 
family = "binomial" to the command line:
        
        fit.lme4 <- 
lmer(binary.response~soiltype*habitat+(1|destination)+(1|origin), Dat0, family 
= "binomial")
        
        > anova(fit.lme4)
        Analysis of Variance Table
                         Df Sum Sq Mean Sq  Denom F value Pr(>F)
        soiltype          1  0.029   0.029 32.000  0.0238 0.8784
        habitat           1  0.029   0.029 32.000  0.0238 0.8784
        soiltype:habitat  1  0.062   0.062 32.000  0.0504 0.8237
        
        It seems to me that the results are either suspiciously signficant (P < 
0.0001) or the other way aroud (all P > 0.75). I read in previous posts that I 
am not the first to encounter this problem, but I did not find a way around 
this so far.
        
        Below I added the data I used as an additional column in the sample 
dataset I used before.
        
        Does anyone have a suggestion how to get reliable output from lmer 
models if the response variable is binary?
        
        René.
        
        Additional column for sample dataset:
        ___
        binary.response
        0
        0
        0
        1
        0
        1
        0
        1
        1
        0
        0
        1
        0
        0
        0
        1
        0
        0
        1
        0
        1
        0
        1
        1
        0
        0
        0
        1
        0
        0
        0
        1
        1
        0
        0
        0
        
        ___
        
        
        -----Original Message-----
        From: Spencer Graves [mailto:[EMAIL PROTECTED]
        Sent: Fri 2006-08-04 01:35
        To: ESCHEN Rene
        Cc: Doran, Harold; r-help@stat.math.ethz.ch
        Subject: Re: [R] Random structure of nested design in lme
        
                  I'm not familiar with 'aov', but I have two observations that 
might
        help you:
        
        1.  UNESTIMABLE VARIANCE COMPONENT
        
                  The variance component 'soiltype' is not estimable in your 
'lme' model:
        
                lme(NA.1~soiltype*habitat,random=~1|destination/soiltype)
        
        That's because each level of 'soiltype' occurs only once within each
        level of 'destination' in the self-contained example you provided below.
        
                  To confirm this, I deleted 'soiltype' from this model:
        
        fit.lme <- lme(response~soiltype*habitat, random=~1|destination/origin)
        fit.lme0 <- lme(response~soiltype*habitat, random=~1|destination)
        
                  The answers seemed to be identical except for one thing:
        
         > VarCorr(fit.lme)
                       Variance     StdDev
        destination = pdLogChol(1)
        (Intercept)   0.004149471  0.06441639
        origin =      pdLogChol(1)
        (Intercept)   0.060968550  0.24691810
        Residual      0.007265180  0.08523603
         > VarCorr(fit.lme0)
        destination = pdLogChol(1)
                     Variance    StdDev
        (Intercept) 0.004149471 0.06441639
        Residual    0.068233730 0.26121587
        
                  The "Residual" variance in "fit.lme0" equals the sum of 
"origin" and
        "Residual" variances in "fit.lme".
        
                  It would help if 'lme' checked for situations like this and 
either
        refused to run or dropped inestimable variance components.  However,
        it's possible that there are so many ways that variance components can
        be inestimable that it's just not feasible to check for them all.  (The
        function 'varcomp' in S-Plus 6.2 has the same problem.)
        
        
        2.  CROSSED OR NESTED?
        
                  Are 'destination' and 'origin' crossed or nested in your 
'aov' model:
        
                  aov(response~soiltype*habitat+Error(destination+origin))
        
                  I have not used 'aov', and I don't think I should take the 
time now
        to try to figure this out.  However, this model specification suggests
        to me that 'destination' and 'origin' might be crossed not nested.  (The
        difference is the 'destination:origin' interaction:  If
        'destination+origin' is crossed, their interaction is used as the error
        term;  otherwise, it looks to me like you have a saturated model.)  By
        contrast, 'destination/origin' in lme is 'nested', which means that the
        variance component for 'origin' is in essence the crossed term and the
        interaction combined.
        
                  I believe there is a way to estimate crossed random effects 
using
        'lme', but I don't understand how.  Fortunately, we can do it using
        'lmer' in the 'lme4' and 'Matrix' packages.
        
                  Because of potential conflicts between 'nlme' and 'lme4', I 
always
        quit R and restart when I switch from one to another.  The following
        will then fit something using 'lmer' that looks like it might match your
        'aov' fit:
        
        library(lme4)
        fit.lme4 <- lmer(
           response~soiltype*habitat
              +(1|destination)+(1|origin), Dat0)
        
        where Dat0 is a data.frame with columns 'response', 'soiltype',
        'habitat', 'destination' and 'origin'.
               
                  I don't know 'aov' well enough to determine easily if the 
results
        from this 'lmer' fit match those from 'aov', but I hope this helps.
        
                  Spencer Graves       
        
        ESCHEN Rene wrote:
        > Spencer,
        >
        > Thank you for the kind and elaborate reply to my previous post.
        >
        > I did consider the option you suggested and many variations.
        Depending on the order of the random factors, lme will either
        give the same output as the aov model for soiltype or for habitat,
        but not both in the same model.
        >
        > The closest I came was
        >
        >           
anova(lme(NA.1~soiltype*habitat,random=~1|destination/soiltype))
        >
        > However, it apppears that in this case the interaction is tested at 
the same level as soiltype.
        >
        > In this post, a small sample dataset with a brief explanation of the 
meaning of the different column titles is included below. Also, I included both 
the aov model and the lme model.
        >
        > Hopefully, this will help to get closer to a solution to my problem.
        >
        > Best regards,
        >
        > René Eschen.
        >
        > ___
        >
        > #Small sample dataset
        > #
        > data=read.table("Sample dataset.csv",header=T)
        > require(nlme)
        > soiltype=factor(soiltype)
        > habitat=factor(habitat)
        > destination=factor(destination)
        > origin=factor(origin)
        > summary(aov(response~soiltype*habitat+Error(destination+origin)))
        > anova(lme(response~soiltype*habitat,random=~1|destination/origin))
        > #
        > #"habitat" type is either 'arable' or 'grassland'
        > #"destination" indicates what site the soil was transplanted into, 
and is considered a random factor within habitat type
        > #"soiltype" is either 'arable' or 'grassland'
        > #"origin" indicates what site the soil was taken from, and is 
considered a random factor within soil type
        > #"response" is the response variable, typically some plant parameter 
such as growth rate or number of leaves, but in this example it is a random 
number between 0 and 1.
        > #
        > "habitat"     "destination"   "soiltype"      "origin"        
"response"
        > 1     1       1       1       0.63
        > 1     2       1       1       0.76
        > 1     3       1       1       0.14
        > 2     4       1       1       0.27
        > 2     5       1       1       0.88
        > 2     6       1       1       0.41
        > 1     1       1       2       0.47
        > 1     2       1       2       0.48
        > 1     3       1       2       0.76
        > 2     4       1       2       0.83
        > 2     5       1       2       0.88
        > 2     6       1       2       0.57
        > 1     1       1       3       0.80
        > 1     2       1       3       0.31
        > 1     3       1       3       0.22
        > 2     4       1       3       0.53
        > 2     5       1       3       0.97
        > 2     6       1       3       0.30
        > 1     1       2       4       0.46
        > 1     2       2       4       0.99
        > 1     3       2       4       0.56
        > 2     4       2       4       0.32
        > 2     5       2       4       0.46
        > 2     6       2       4       0.64
        > 1     1       2       5       0.03
        > 1     2       2       5       0.41
        > 1     3       2       5       0.24
        > 2     4       2       5       0.60
        > 2     5       2       5       0.04
        > 2     6       2       5       0.30
        > 1     1       2       6       0.97
        > 1     2       2       6       0.60
        > 1     3       2       6       0.22
        > 2     4       2       6       0.16
        > 2     5       2       6       0.58
        > 2     6       2       6       0.21
        >
        >
        >
        > -----Original Message-----
        > From: Spencer Graves [mailto:[EMAIL PROTECTED]
        > Sent: Sat 2006-07-22 20:03
        > To: ESCHEN Rene
        > Cc: Doran, Harold; r-help@stat.math.ethz.ch
        > Subject: Re: [R] Random structure of nested design in lme
        > 
        >         Have you considered the following:
        >
        >         anova(lme(NA.1~soiltype*habitat,random=~1|destination/origin))
        >
        >         This seems more closely to match the 'aov' command in your 
original
        > post.  This model might be written in more detail as follows:
        >
        >         NA.1[s, h, i,j,k] = b0 + ST[s] + H[h] +
        >               ST.H[s[i],j[j] j] + d[i] + o[i,j] + e[i,j,k]
        >
        > where           b0 = a constant to be estimated,
        >
        >         s = the soil type for that particular sample,
        >
        >         h = the habitat for that sample,
        >
        >         ST = soil type coefficients to be estimated subject to a 
constraint
        > that they sum to 0,
        >
        >         H = habitat coefficients to be estimated subject to the 
constraint
        > that they sum to 0,
        >
        >         ST.H = soil type by habitat interaction coefficients to be 
estimated
        > subject to constraints that ST.H[s,.] sum to 0 and ST.H[., h] also sum
        > to 0,
        >
        >         d[i] = a random deviation associated with each destination, 
assuming
        > the d's are all normal, independent, with mean 0 and unknown but
        > constant variance s2.d
        >
        >         o[i, j] = a random deviation associated with each destination 
/
        > origin combination, assuming the o's are all normal, independent, with
        > mean 0 and unknown variance s2.o,
        >
        > and     e[i,j,j] = the standard unknown noise term, normal, 
independent
        > with mean 0 and unknown variance s2.e.
        >
        >         The model you wrote includes nested noise terms for soil type 
and
        > habitat as well.  These terms are not estimable, which makes the 
answers
        > garbage, but the 'lme' function does not check for replicates and
        > therefore sometimes gives garbage answers without warning.
        >
        >         To get more information from the fit, I suggest you first try
        > 'methods(class="lme")', and review help pages associated with what you
        > see listed there.
        >
        >         Have you looked at Pinheiro and Bates (2000) Mixed-Effects 
Models in
        > S and S-Plus (Springer)?  This is my all-time favorite reference on
        > Bates has been one of the leading original contributors in variance
        > components analysis and nonlinear estimation more generally for over 
25
        > years.  The 'nlme' package is the product of his work and the work of
        > many of his graduate students prior to 2000.  The book, at least from 
my
        > perspective, is very well written.  Moreover, the standard R
        > distribution includes files named "ch01.R", "ch02.R", ..., "ch06.R",
        > "ch08.R" with the R scripts accompanying each chapter in the book in
        > "~\library\nlme\scripts" under the R installation directory on your 
hard
        > drive, e.g. "D:\Program files\R\R-2.3.1\library\nlme\scripts", on my
        > computer.  There are minor changes in the syntax in a few places 
between
        > the book and the current R implementation that make it impossible to 
get
        > some of the published answers.  Using these script files increases the
        > likelihood that you will get essentially the book's answers and won't 
be
        > defeated by subtle typographical errors or by the difference between 
x^2
        > and I(x^2), for example.
        >
        >         If you would like further information from this listserver, 
please
        > submit another post, preferably including a "commented, minimal,
        > self-contained, reproducible code", as suggested in the posting guide
        > "www.R-project.org/posting-guide.html".
        >
        >         Hope this helps.
        >         Spencer Graves
        >
        > ESCHEN Rene wrote:
        >> Although I know it's not correct, this is what I tried in lme:
        >>
        >> 
anova(lme(NA.1~soiltype*habitat,random=~1|destination/habitat/origin/soiltype))
        >>
        >> #                 numDF denDF   F-value p-value
        >> #(Intercept)          1   130 12.136195  0.0007
        >> #soiltype             1   130 15.099792  0.0002
        >> #habitat              1    10  0.699045  0.4226
        >> #soiltype:habitat     1   130  2.123408  0.1475
        >>
        >> René.
        >>
        >> -----Original Message-----
        >> From: Doran, Harold [mailto:[EMAIL PROTECTED]
        >> Sent: Wed 2006-07-19 13:53
        >> To: ESCHEN Rene; r-help@stat.math.ethz.ch
        >> Subject: RE: [R] Random structure of nested design in lme
        >> 
        >> Can you provide an example of what you have done with lme so we 
might be able to evaluate the issue?
        >>
        >>> -----Original Message-----
        >>> From: [EMAIL PROTECTED]
        >>> [mailto:[EMAIL PROTECTED] On Behalf Of ESCHEN Rene
        >>> Sent: Wednesday, July 19, 2006 7:37 AM
        >>> To: r-help@stat.math.ethz.ch
        >>> Subject: [R] Random structure of nested design in lme
        >>>
        >>> All,
        >>>
        >>> I'm trying to analyze the results of a reciprocal transplant
        >>> experiment using lme(). While I get the error-term right in
        >>> aov(), in lme() it appears impossible to get as expected. I
        >>> would be greatful for any help.
        >>>
        >>> My experiment aimed to identify whether two fixed factors
        >>> (habitat type and soil type) affect the development of
        >>> plants. I took soil from six random sites each of two types
        >>> (arable and grassland) and transplanted them back into the
        >>> sites of origin in such way that in each of the sites there
        >>> were six pots containing arable soil and six pots of
        >>> grassland soil, each containing a seedling.
        >>>
        >>> With aov(), I got the analysis as I expected, with habitat
        >>> type tested against destination site, and soil type tested
        >>> against origin site:
        >>>
        >>> summary(aov(response~soiltype*habitat+Error(destination+origin)))
        >>> #
        >>> #Error: destination
        >>> #          Df  Sum Sq Mean Sq F value Pr(>F)
        >>> #habitat    1  1.0000  1.0000   0.699 0.4226
        >>> #Residuals 10 14.3056  1.4306              
        >>> #
        >>> #Error: origin
        >>> #          Df  Sum Sq Mean Sq F value   Pr(>F)  
        >>> #soiltype   1 1.77778 1.77778  11.636 0.006645 **
        >>> #Residuals 10 1.52778 0.15278                   
        >>> #---
        >>> #Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #
        >>> #Error: Within
        >>> #                  Df  Sum Sq Mean Sq F value Pr(>F)
        >>> #soiltype:habitat   1  0.2500  0.2500  2.1774 0.1427
        >>> #Residuals        120 13.7778  0.1148    
        >>>
        >>> However, when I try to replicate this analysis in lme, I am
        >>> unable to get the structure of the random factors (origin and
        >>> destination) correct. Does anyone have a suggestion how to
        >>> resolve this problem?
        >>>
        >>> Thanks in advance.
        >>>
        >>> René Eschen
        >>>
        >>> CABI Bioscience Centre Switzerland
        >>> Rue des Grillons 1
        >>> 2800 Delémont
        >>> Switzerland
        >>>
        >>>     [[alternative HTML version deleted]]
        >>>
        >>>
        >>
        >>      [[alternative HTML version deleted]]
        >>
        >>
        >>
        >> 
------------------------------------------------------------------------
        >>
        >> ______________________________________________
        >> R-help@stat.math.ethz.ch mailing list
        >> https://stat.ethz.ch/mailman/listinfo/r-help
        >> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
        >> and provide commented, minimal, self-contained, reproducible code.
        >
        >
        >       [[alternative HTML version deleted]]
        >
        >
        >
        > 
------------------------------------------------------------------------
        >
        > ______________________________________________
        > R-help@stat.math.ethz.ch mailing list
        > https://stat.ethz.ch/mailman/listinfo/r-help
        > PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
        > and provide commented, minimal, self-contained, reproducible code.
        
        


        [[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Random structure of nested design in lme

Reply via email to