Introduction to Mixed (Hierarchical) models for biologists using R
14 May 2018 - 18 May 2018
Delivered by Prof Subhash Lele.
This course will be held at Orford Musique, 3165 Chemin du Parc,
Orford, QC J1X 7A2, Canada and can be reached directly by Montreal
Mixed models, also known as hierarchical models and multilevel models,
are a useful class of models for many applied sciences, including
biology, ecology and evolution. The goal of this course is to give a
thorough introduction to the logic, theory and, most importantly,
implementation of these models to solve practical problems in ecology.
Participants are not expected to know mathematics beyond basic
algebra and calculus. Participants are expected to know some R
programming and to be familiar with linear and generalized linear
regression. We will use JAGS (Just Another Gibbs Sampler) to run
Markov Chain Monte Carlo (MCMC) simulations for analyzing mixed models.
The course will be conducted so that participants have substantial
Linear and Generalized linear models
To understand mixed models, the most important first step is to
thoroughly understand linear and generalized linear models. Also,
when analysing data, it is useful to fit a simpler fixed
effects model before trying to fit a more complex mixed effects model.
Hence, we will start with a very detailed review of these models. We
assume that participants are familiar with these models, and hence
we will emphasize some important, but not commonly covered, topics. This
will also give us an opportunity to unify the notation, review the basic
R commands and fill any gaps in knowledge and understanding of these
1. We will show the use of non-parametric exploratory techniques such as
classification and regression trees (CART) for learning about important
covariates and possible non-linearities in the relationships.
2. We will emphasize graphical and simulation-based methods (e.g. Gelman
and Hill, 2006) to understand and explore the implications of the fitted
3. We will discuss graphical tools such as marginal and conditional
plots that are useful for conveying the results of a multiple regression
model to a lay person.
4. We will emphasize the use of graphical tools to conduct regression
diagnostics and to assess the appropriateness of the model.
5. We will discuss the important concepts of confounding, effect
modification and interaction. These are particularly important for
drawing causal, not just correlational, inferences from observational
studies.
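The distinction between confounding and interaction in item 5 can be made concrete with a small simulation. The following is a minimal sketch in base R, not course material: the variable names (`z`, `x`, `g`, `y`, `y2`) and all effect sizes are illustrative assumptions.

```r
set.seed(1)
n <- 2000

# Confounding: z influences both the exposure x and the response y
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)
y <- 1 + 0.5 * x + 1.0 * z + rnorm(n)    # simulated effect of x on y is 0.5

naive    <- unname(coef(lm(y ~ x))["x"])      # omits z, so the slope is biased
adjusted <- unname(coef(lm(y ~ x + z))["x"])  # conditions on z

# Effect modification (interaction): the slope of x differs between groups g
g  <- rbinom(n, 1, 0.5)
y2 <- 1 + 0.5 * x + 1.0 * x * g + rnorm(n)
inter <- coef(lm(y2 ~ x * g))

print(round(c(naive = naive, adjusted = adjusted), 2))
print(round(inter, 2))
```

In this sketch the naive slope absorbs part of the effect of `z`, while the adjusted slope should land near the simulated value. The `x:g` coefficient measures how much the slope changes between the two groups, which is a different question from confounding.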
Many of the topics that will be covered involve the use of matrix
algebra and calculus. While these mathematical techniques are essential
tools for a mathematical statistician who is trying to understand the
theory behind the methods, they can be avoided in practice by using
simulation-based techniques. Built-in functions such as 'lm' and
'glm' fit regression models by maximum likelihood (for 'lm', least
squares, which coincides with maximum likelihood under normal errors)
to estimate the parameters and conduct statistical inference. We will
discuss the use of JAGS (Just Another Gibbs Sampler) and the R package
'dclone' to fit the same models. We will use a different statistical
philosophy, namely Bayesian inference, to fit these models. We will
show how the Bayesian approach can be tricked into giving frequentist
answers using data cloning (Lele et al. 2007, Ecology Letters). We will
also discuss the rudiments of frequentist and Bayesian inference,
although we will not go into their pros and cons at this time.
That will be covered during sessions 3 and 4 of the ﬁfth day (and, over
1. What makes an inference statistical inference?
2. What do we mean by probability of an event?
3. How do we quantify uncertainty in an inferential statement in the
4. How do we quantify uncertainty in an inferential statement in the
We will then discuss the simulation based methods to quantify
1. Parametric bootstrap to quantify frequentist uncertainty
2. Markov Chain Monte Carlo to quantify Bayesian uncertainty
3. Fitting LM and GLM using JAGS and the Bayesian approach
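As an illustration of the first item, here is a minimal sketch of a parametric bootstrap for a Poisson regression slope, using only base R. The data are simulated, and the sample size, coefficients and number of bootstrap replicates are arbitrary assumptions chosen for the example.

```r
set.seed(42)
n <- 100
x <- runif(n, -1, 1)
y <- rpois(n, lambda = exp(0.5 + 1.0 * x))   # simulated data (assumed model)

fit <- glm(y ~ x, family = poisson)

B <- 500                                     # number of bootstrap replicates
boot_slopes <- numeric(B)
for (b in seq_len(B)) {
  # Simulate a new response from the fitted model, then refit
  y_star <- rpois(n, lambda = fitted(fit))
  boot_slopes[b] <- coef(glm(y_star ~ x, family = poisson))["x"]
}

boot_ci <- quantile(boot_slopes, c(0.025, 0.975))  # percentile interval
wald_ci <- confint.default(fit)["x", ]             # asymptotic interval, for comparison
print(rbind(bootstrap = boot_ci, wald = wald_ci))
```

The spread of `boot_slopes` approximates the sampling variability of the estimator under the fitted model, which is the frequentist notion of uncertainty. No matrix algebra is needed beyond what `glm` does internally.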
Linear Mixed Models
Historically, linear mixed models arose in the study of quantitative
genetics and heritability. They were successfully applied in
animal breeding and led to the 'white' revolution, with an abundant milk
supply for the developing world. They were also used in horse racing
and other such fun areas. The other setting in which linear mixed effects
models were developed was growth curve analysis. We will
follow this historical trajectory of mixed models, paying tribute to the
great statisticians R. A. Fisher, C. R. Rao and Jerzy Neyman, and study
linear mixed models first. The problems they tried to solve included
deciding the genetic value of a sire and/or a dam, studying heritability
of traits, and studying co-evolution of traits. These can be answered
provided we assume that the sires and dams in our experiment or sample
are merely a sample from a super-population of sires and dams. In growth
curve analysis, we need to take into account that each individual is
unique in its own way but is also a part of a population. How do we
make both individual-level and population-level inferences? In modern
times, linear mixed effects models have arisen in the context of small
area estimation in survey sampling, where one is interested in making
inferences about a census tract based on county or state level data.
These models also arise in the context of combining remotely sensed data
of different resolutions and types. The main issues that we will discuss
are:
1. What is a random effect? What is a fixed effect? How do we decide if an
effect is random or fixed?
2. How do we modify a linear regression model to accommodate random
3. Why bother fitting a mixed effects model? What do we gain?
4. How do we modify the JAGS linear model program to fit a linear mixed
effects model?
5. What is the difference between Bayesian and frequentist inference?
6. What is a prior? What is a non-informative prior?
7. How do we interpret the results of a linear mixed effects model fit?
Graphical and simulation-based methods
8. How do we do model selection with mixed effects models?
9. How do we do model diagnostics in mixed effects models?
10. Parameter identifiability issues in linear mixed models
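Items 2 and 4 above can be sketched as follows. This is an illustrative example under stated assumptions, not the course's own program: the data are simulated, the priors are arbitrary vague choices, and the names `lmm_model`, `dat` and `post` are hypothetical. The JAGS step runs only if JAGS and the R packages 'rjags' and 'dclone' are installed.

```r
set.seed(7)
J <- 30                                   # number of groups (e.g. sires)
n_per <- 10                               # observations per group
group <- rep(seq_len(J), each = n_per)
x <- rnorm(J * n_per)
u <- rnorm(J, mean = 0, sd = 1.0)         # random intercepts, one per group
y <- 2 + 0.5 * x + u[group] + rnorm(J * n_per, sd = 0.5)

# Fixed-effects-only fit: group variation ends up in the residual
fixed_fit <- lm(y ~ x)
resid_sd  <- summary(fixed_fit)$sigma

# JAGS model written as an R function. Relative to a plain regression,
# the additions are the u[group[i]] term and the loop over u[j].
lmm_model <- function() {
  for (i in 1:n) {
    mu[i] <- b0 + b1 * x[i] + u[group[i]]
    y[i] ~ dnorm(mu[i], tau_e)
  }
  for (j in 1:J) {
    u[j] ~ dnorm(0, tau_u)
  }
  b0 ~ dnorm(0, 0.001)                    # vague priors (assumed)
  b1 ~ dnorm(0, 0.001)
  sigma_e ~ dunif(0, 10)
  sigma_u ~ dunif(0, 10)
  tau_e <- 1 / (sigma_e * sigma_e)        # JAGS uses precisions, not variances
  tau_u <- 1 / (sigma_u * sigma_u)
}

# Fit with JAGS only when the required software is available
if (requireNamespace("rjags", quietly = TRUE) &&
    requireNamespace("dclone", quietly = TRUE)) {
  dat  <- list(y = y, x = x, group = group, n = length(y), J = J)
  post <- dclone::jags.fit(dat, c("b0", "b1", "sigma_e", "sigma_u"),
                           lmm_model, n.chains = 3)
  print(summary(post))
}
```

In this simulation the residual standard deviation from `lm` should be noticeably larger than the within-group value of 0.5, because the ignored between-group variation is absorbed into the error term. The mixed model separates the two sources through `sigma_u` and `sigma_e`.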
As we discuss these applications, we will discuss some subtle
computational issues involved in using MCMC. In my recollection (which
may be biased as it has been about 25 years since the quote), Daryl
Pregibon said: MCMC is the crack cocaine of modern statistics; it is
addictive, seductive and destructive. Hence, it is important for a
practitioner to understand these issues in order not to misuse the MCMC
1. What is a Markov Chain Monte Carlo method? Why is it necessary for
2. What are the subtleties in implementing MCMC? Convergence of the
algorithm and mixing of the chains.
3. Pros and cons of using MCMC
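To give a feel for convergence and mixing, here is a minimal sketch of a random-walk Metropolis sampler in base R. It is a toy example, not what JAGS does internally: the model (normal data with known standard deviation), the prior, the proposal step size and the burn-in length are all assumptions chosen for illustration.

```r
set.seed(123)
y <- rnorm(50, mean = 3, sd = 1)          # simulated data

# Log posterior of the mean mu, up to an additive constant
log_post <- function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +   # log-likelihood
    dnorm(mu, mean = 0, sd = 10, log = TRUE)       # vague normal prior (assumed)
}

run_chain <- function(start, n_iter = 5000, step = 0.5) {
  draws <- numeric(n_iter)
  current <- start
  for (t in seq_len(n_iter)) {
    proposal <- current + rnorm(1, sd = step)
    # Accept with probability min(1, posterior ratio)
    if (log(runif(1)) < log_post(proposal) - log_post(current)) {
      current <- proposal
    }
    draws[t] <- current
  }
  draws
}

chains <- sapply(c(-10, 0, 10), run_chain)   # three chains, dispersed starts
kept   <- chains[-(1:1000), ]                # discard burn-in

# Convergence: chains started far apart should agree after burn-in
print(round(colMeans(kept), 2))
# Mixing: high lag-1 autocorrelation signals slow exploration
print(round(apply(kept, 2, function(d) acf(d, plot = FALSE)$acf[2]), 2))
```

Starting the chains from widely separated values is a simple way to check convergence: if they do not end up in the same region, the output should not be trusted. Changing `step` to a very small or very large value shows how poor mixing appears in the autocorrelation.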
Generalised Linear Mixed Models
We will again start the discussion of GLMMs in their historical context.
One of the initial uses of mixed models was in the context of
overdispersion in count data. Zero-inflated count data was another
important example. The example that drove the current revolution in the
use of GLMMs came from spatial epidemiology. Clayton and Kaldor
(1987, Biometrics) showed that one can use spatial correlation to
improve the prediction in mapping disease rates. This was also an
example of the application of Empirical Bayes methods, which allow one to
pool information from different spatial areas (or studies, or scales,
and so on).
1. Zero-inflated data. In many practical situations, we observe that there
are many locations with zero counts, far in excess of what
would be expected under the Poisson regression model. This can be
effectively modelled using a mixed model framework. The mixed models
framework allows us to use much more complex and realistic models.
2. Overdispersion in GLM, spatial GLM, spatio-temporal GLM. The Poisson
regression model assumes that the mean and variance are equal. This is
often not true in practice: generally the variance in the data exceeds
the mean. One can show that such overdispersion can be modelled using a
mixed effects model. These models also arise in the context of
capture-recapture sampling, where capture probabilities vary across space
or time or individuals.
3. Longitudinal or panel data with a discrete response variable. Often
we have data on different individuals where within the individual there
is temporal dependence but individuals are independent of each other.
Cluster sampling is another situation where we have dependence within a
cluster but independence between clusters. Analyses of such data need to
account for the innate variation between individuals before one can
discuss the effect of interesting covariates or risk factors. Such data
are effectively modelled as GLMMs.
4. Measurement error, missing data. Missing data and measurement error
are ubiquitous in ecological studies. Mixed models provide a convenient
way to take these difficulties into account and make inferences about the
underlying processes of interest. We will discuss these issues in the
context of population viability analysis, spatial population dynamics and
source-sink analysis, and occupancy and abundance surveys. These issues
also arise when fitting ordinary linear and generalized linear models if
the covariates are measured with error.
5. Additional topics depending on the interest of the participants.
These may include, for example, discussion of Species Distribution
Models, Resource Selection Functions and Animal movement models.
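The claims in items 1 and 2 about excess zeros and overdispersion can be checked by simulation. The following base R sketch is illustrative only: the mean count, the random effect standard deviation and the occupancy probability are assumed values, and the names `y_pois`, `y_mixed`, `y_zip` and `summarise_counts` are hypothetical.

```r
set.seed(2018)
n <- 5000
eta <- log(2)                              # log of the mean count (assumed)

# Plain Poisson: variance is approximately equal to the mean
y_pois <- rpois(n, exp(eta))

# Poisson with a normal random effect on the log scale (a simple GLMM)
u <- rnorm(n, sd = 0.7)                    # observation-level random effect
y_mixed <- rpois(n, exp(eta + u))

# Zero-inflated Poisson: a latent state switches counts off at some sites
occupied <- rbinom(n, 1, 0.6)              # 1 = site can produce counts
y_zip <- occupied * rpois(n, exp(eta))

summarise_counts <- function(y) {
  c(mean = mean(y), var = var(y), prop_zero = mean(y == 0))
}
print(round(rbind(poisson = summarise_counts(y_pois),
                  mixed   = summarise_counts(y_mixed),
                  zip     = summarise_counts(y_zip)), 2))
```

Both `u` and `occupied` are unobserved quantities that vary across observations, which is why these two departures from the Poisson model fit naturally into the mixed model framework.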
6 Computational issues: Advanced topics
Mixed Models in a Bayesian Framework
MCMC is not the only approach to analysing mixed models. We will briefly
discuss Laplace approximation based techniques (INLA, in particular)
along with approximate techniques such as composite likelihood and
Approximate Bayesian Computation. Because of its mathematical nature,
this discussion will be somewhat limited, only giving the basics and
hinting at the important issues.
7 Philosophical issues: Sophie’s choice
1. What are the philosophical problems with using the frequentist
quantification of uncertainty?
2. What are the philosophical problems with using the Bayesian
quantification of uncertainty?
3. Sophie’s choice?
Oliver Hooker PhD.