[R] conservative robust estimation in (nonlinear) mixed models

2006-03-24 Thread dave fournier
I believe that Bert's comments are a non sequitur.
I did not and do not propose identifying which components
of the model are contaminated by outliers. What I do propose
is the more or less routine use of conservative robust methods
to replace the normal theory estimators. By definition such estimators
are to be almost as efficient as the normal theory estimators in the 
case where the normal theory applies. One may argue that
conservative robust estimators do not exist for this class of
problems. I think they do, but the obvious way to establish this
claim is to carry out simulations.

Before such simulations can be carried out one must create the
software to do the analysis. So I am proposing to add that to our
R package glmmADMB. Then other R users can carry out their own
simulation analysis to investigate how the method performs.
I think that normal mixtures are better candidates for
conservative robust estimators than, say, Student's t distribution,
but I will try to include both (and perhaps any others that appear
useful).

  Dave


[R] conservative robust estimation in (nonlinear) mixed models

2006-03-23 Thread dave fournier

Conservative robust estimation methods do not appear to be
currently available in the standard mixed model methods for R,
where by conservative robust estimation I mean methods which
work almost as well as the methods based on assumptions of
normality when the assumption of normality *IS* satisfied.

We are considering adding such a conservative robust estimation option
for the random effects to our AD Model Builder mixed model package,
glmmADMB, for R, and perhaps extending it to do robust estimation for 
linear mixed models at the same time.

An obvious candidate is to assume something like a mixture of
normals. I have tested this in a simple linear mixed model
using 5% contamination with a normal with 3 times the standard
deviation, which seems to be a common assumption. Simulation
results indicate that when the random effects are normally
distributed this estimator is about 3% less efficient, while when
the random effects are contaminated with 5% outliers the estimator
is about 23% more efficient, where by 23% more efficient I mean
that one would have to use a sample size about 23% larger with the
normal theory estimator to obtain the same size confidence limits
for the parameters.
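
A minimal sketch of how an efficiency comparison of this kind can be set up, assuming a much simpler problem than the mixed model described above: a single location parameter with known scales. The 5% proportion and the factor of 3 come from the text; the function names, the sample size of 40, the number of replicates and the EM-style iteration are assumptions made for illustration. Python is used only for illustration; this is not the glmmADMB code.

```python
import math
import random
import statistics

EPS = 0.05    # contamination proportion quoted in the post
RATIO = 3.0   # contaminating sd relative to the central sd, as in the post


def draw(rng, n, eps):
    """n draws centred at 0: N(0, 1) with prob. 1 - eps, else N(0, RATIO^2)."""
    return [rng.gauss(0.0, RATIO if rng.random() < eps else 1.0)
            for _ in range(n)]


def robust_location(x, eps=EPS, ratio=RATIO, tol=1e-8, max_iter=200):
    """ML location estimate under the contaminated normal (scales known).

    Fixed-point (EM) iteration: each observation is weighted by its
    posterior-expected precision under the two components.
    """
    mu = statistics.median(x)
    for _ in range(max_iter):
        num = den = 0.0
        for xi in x:
            r = xi - mu
            core = (1.0 - eps) * math.exp(-0.5 * r * r)
            wide = (eps / ratio) * math.exp(-0.5 * (r / ratio) ** 2)
            w = (core + wide / ratio ** 2) / (core + wide)
            num += w * xi
            den += w
        new_mu = num / den
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def variance_ratio(eps_true, n=40, reps=1000, seed=1):
    """var(sample mean) / var(robust estimate) over simulated data sets."""
    rng = random.Random(seed)
    means, robust = [], []
    for _ in range(reps):
        x = draw(rng, n, eps_true)
        means.append(statistics.fmean(x))
        robust.append(robust_location(x))
    return statistics.variance(means) / statistics.variance(robust)


if __name__ == "__main__":
    print("clean data:       ", variance_ratio(0.0))
    print("5% contamination: ", variance_ratio(EPS))
```

A ratio a little below 1 on clean data would be the price of the robust fit, and a ratio above 1 on contaminated data would be the gain. The numbers from this toy problem should not be expected to reproduce the 3% and 23% figures, which refer to the mixed model.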

Question:

I wonder if there are other distributions besides a mixture of normals
that might be preferable. Three things to keep in mind are:

1.)  It should be likelihood based so that the standard likelihood
  based tests are applicable.

2.)  It should work well when the random effects are normally
 distributed so that things that are already fixed don't get
 broke.

3.)  In order to implement the method efficiently it is necessary to
 be able to produce code for calculating the inverse of the
 cumulative distribution function. This enables one to extend
 methods based on the Laplace approximation for the random
 effects (i.e. the Laplace approximation itself, adaptive
 Gaussian integration, adaptive importance sampling) to the new
 distribution.
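
For the normal mixture, the inverse CDF in point 3 has no closed form but is easy to compute numerically. The sketch below uses bisection, bracketing the mixture quantile between the quantiles of the two components. The function names are hypothetical, and the last function reflects only one possible reading of how the inverse CDF would be used (mapping a standard normal random effect onto the new distribution); the post does not spell this out.

```python
from statistics import NormalDist


def mixture_cdf(x, eps=0.05, ratio=3.0):
    """CDF of (1 - eps) * N(0, 1) + eps * N(0, ratio^2)."""
    core, wide = NormalDist(0.0, 1.0), NormalDist(0.0, ratio)
    return (1.0 - eps) * core.cdf(x) + eps * wide.cdf(x)


def mixture_inv_cdf(p, eps=0.05, ratio=3.0, tol=1e-12):
    """Quantile of the mixture, found by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # The mixture quantile lies between the two component quantiles.
    a = NormalDist(0.0, 1.0).inv_cdf(p)
    b = NormalDist(0.0, ratio).inv_cdf(p)
    lo, hi = min(a, b), max(a, b)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, eps, ratio) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_to_mixture(u, eps=0.05, ratio=3.0):
    """Map a standard normal value u to the mixture scale (assumed usage).

    Fails for very large |u|, where the normal CDF rounds to 0 or 1.
    """
    return mixture_inv_cdf(NormalDist().cdf(u), eps, ratio)


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.99):
        print(p, mixture_inv_cdf(p))
```

Bisection is slow but safe; a production version would more likely use Newton steps on the CDF, since the mixture density is available in closed form.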

  Dave

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] conservative robust estimation in (nonlinear) mixed models

2006-03-23 Thread Spencer Graves
  I know of two fairly common models for robust methods.  One is the 
contaminated normal that you mentioned.  The other is Student's t.  A 
normal plot of the data or of residuals will often indicate whether the 
assumption of normality is plausible or not;  when the plot indicates 
problems, it will often also indicate whether a contaminated normal or 
Student's t would be better.

  Using Student's t introduces one additional parameter.  A 
contaminated normal would introduce 2;  however, in many applications, 
the estimate of the contamination proportion (or its logit) will be 
highly correlated with the estimate of the ratio of the contamination 
standard deviation to that of the central portion of the distribution.  
Thus, in such cases, it's often wise to fix the ratio of the standard 
deviations and estimate only the contamination proportion.
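
A minimal sketch of that last suggestion: with the ratio of standard deviations fixed (here at 3, an assumed value), the contamination proportion can be estimated by maximizing a one-parameter likelihood. For simplicity the data are taken to be centred with unit central scale, and a plain grid search stands in for a proper optimizer. All function names are hypothetical.

```python
import math
import random


def draw_contaminated(rng, n, eps, ratio=3.0):
    """n draws from (1 - eps) * N(0, 1) + eps * N(0, ratio^2)."""
    return [rng.gauss(0.0, ratio if rng.random() < eps else 1.0)
            for _ in range(n)]


def mixture_loglik(x, eps, ratio=3.0):
    """Log-likelihood of centred, unit-scale data under the mixture."""
    const = 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for xi in x:
        core = (1.0 - eps) * math.exp(-0.5 * xi * xi)
        wide = (eps / ratio) * math.exp(-0.5 * (xi / ratio) ** 2)
        total += math.log(core + wide) - const
    return total


def estimate_eps(x, ratio=3.0, step=0.005, upper=0.3):
    """Grid-search ML estimate of the proportion, with the ratio held fixed."""
    grid = [i * step for i in range(int(round(upper / step)) + 1)]
    return max(grid, key=lambda e: mixture_loglik(x, e, ratio))


if __name__ == "__main__":
    rng = random.Random(7)
    x = draw_contaminated(rng, 4000, 0.05)
    print("estimated contamination proportion:", estimate_eps(x))
```

Estimating the ratio as well would turn this into a two-parameter search over a likelihood surface with a long ridge, which is the correlation problem described above.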

  hope this helps.
  spencer graves


Re: [R] conservative robust estimation in (nonlinear) mixed models

2006-03-23 Thread Berton Gunter
OK, since Spencer has dived in, I'll go public (I made some prior private
remarks to David because I didn't think they were worth wasting the list's
bandwidth on. Heck, they may still not be...)

My question: isn't the difficult issue which levels of the (co)variance
hierarchy get longer-tailed distributions rather than which distributions
are used to model long tails? Seems to me that there is an inherent
identifiability issue here, and even more so with nonlinear models. It's
easy to construct examples where it all essentially depends on your priors.
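
One such example, constructed here for illustration and not taken from the post: a random-intercept model y = b + e with a single observation per group, where either b or e (but not both) follows a contaminated normal. The marginal distribution of y is then a two-component normal mixture, and when the two standard deviations are equal the two placements of the long tail give the same marginal, so the data cannot tell them apart. The function name and the parameter values are assumptions.

```python
import math


def marginal_components(sd_b, sd_e, eps=0.05, ratio=3.0, heavy="b"):
    """(weight, sd) pairs for the marginal of y = b + e, one obs per group.

    heavy="b": b ~ (1 - eps) N(0, sd_b^2) + eps N(0, (ratio * sd_b)^2), e normal.
    heavy="e": the same contamination placed on e instead, b normal.
    Variances of independent normal pieces add, hence math.hypot.
    """
    clean_sd = math.hypot(sd_b, sd_e)
    if heavy == "b":
        wide_sd = math.hypot(ratio * sd_b, sd_e)
    elif heavy == "e":
        wide_sd = math.hypot(sd_b, ratio * sd_e)
    else:
        raise ValueError("heavy must be 'b' or 'e'")
    return [(1.0 - eps, clean_sd), (eps, wide_sd)]


if __name__ == "__main__":
    # Equal standard deviations: the two placements give the same marginal.
    print(marginal_components(1.0, 1.0, heavy="b"))
    print(marginal_components(1.0, 1.0, heavy="e"))
    # Unequal standard deviations: the marginals differ in the wide component.
    print(marginal_components(2.0, 1.0, heavy="b"))
    print(marginal_components(2.0, 1.0, heavy="e"))
```

With repeated observations per group the two placements do become distinguishable in principle, but how well depends on the design.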

Cheers,
Bert

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
  
 

Re: [R] conservative robust estimation in (nonlinear) mixed models

2006-03-23 Thread Spencer Graves
  Bert raised an issue I had overlooked.  Ideally, we would like to be 
able to specify a different family for the observations and for each 
random effect, with Student's t and contaminated normal as valid options 
in both places.

  If I were allowed to specify a family (or a robust family) for either 
observations or for random effects but not both, I think I'd pick the 
observations.  I don't know, but I wonder if misspecification of the 
observation distribution might create more problems with estimation and 
inference than misspecification of the distribution of a random effect.  
As Bert indicated, there may be identifiability issues here, and the 
choice of a model may depend on one's hypotheses about the situation 
being modeled.

  spencer graves
