Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-27 Thread Rubén Roa
-Original Message-
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Thursday, January 26, 2012 9:20 PM
To: Rubén Roa
CC: Ben Bolker; Frank Harrell
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and 
unique combinations?

On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa r...@azti.es wrote:
 I think we have gone through this before.
 First, the destruction of all aspects of statistical inference is not at 
 stake, Frank Harrell's post notwithstanding.
 Second, checking all pairs is a way to see, for _all pairs_, which model 
 outcompetes which in terms of predictive ability by 2 AIC units or more. Just 
 sorting them by AIC does not tell you that when models are separated from the 
 next best by less than 2 AIC units.
 Third, I was not implying that AIC differences play the role of significance 
 tests. I agree with you that model selection is better not understood as a 
 proxy for, or a relative of, significance-test procedures.
 Incidentally, when comparing many models the AIC is often inconclusive. If 
 one is bent on selecting just _the_ model, then I check numerical optimization 
 diagnostics such as the size of gradients, KKT criteria, and other issues such 
 as standard errors of parameter estimates and the correlation matrix of 
 parameter estimates.

-- And the mathematical basis for this claim is ... ??

--
Ruben: In my area of work (building/testing/applying mechanistic nonlinear 
models of natural and economic phenomena) the issue of numerical optimization 
is a very serious one. It is often the case that a really good-looking model 
does not converge properly (that's why ADMB is so popular among us). So if the 
AIC is inconclusive, but one AIC-tied model yields reasonable-looking standard 
errors and low pairwise correlations among parameter estimates, while the other 
wasn't even able to produce a positive definite Hessian matrix (though it was 
able to maximize the log-likelihood), I think I have good reasons to select the 
properly converged model. I assume here that the lack of convergence is a 
symptom of model inadequacy for the data that the AIC was not able to detect.
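
A minimal sketch, in R, of the kind of post-fit convergence checks described 
above, assuming a hypothetical negative log-likelihood function nll() and 
starting values p0 (not code from this thread):

fit <- optim(p0, nll, method = "BFGS", hessian = TRUE)
fit$convergence == 0        # 0 means optim reported successful convergence
H <- fit$hessian
all(eigen(H)$values > 0)    # is the Hessian positive definite at the optimum?
V <- solve(H)               # approximate covariance matrix of the estimates
sqrt(diag(V))               # standard errors: do they look reasonable?
cov2cor(V)                  # correlation matrix of the parameter estimates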
---
Ruben: For some reason I don't find model averaging appealing. I guess deep in 
my heart I expect more from my model than just the best predictive ability.

-- This is a religious, not a scientific statement, and has no place in any 
scientific discussion.

--
Ruben: Seriously, there is a wide and open place in scientific discussion for 
mechanistic model-building. I expect the models I build to be more than capable 
predictors; I want them to capture some aspect of nature, to teach me something 
about nature, so I refrain from model averaging, which is an open admission 
that you don't care too much about what's really going on.

-- The belief that certain data analysis practices -- standard or not -- 
somehow can be trusted to yield reliable scientific results in the face of 
clear theoretical (mathematical) and practical results to the contrary, while 
widespread, impedes and often thwarts the progress of science. Evidence-based 
medicine and clinical trials came about for a reason. I would encourage you to 
reexamine the basis of your scientific practice and the role that magical 
thinking plays in it.

Best,
Bert

--
Ruben: All right, Bert. I often re-examine the basis of my scientific praxis, 
but less often than I did before, I have to confess. I like to think it is 
because I am converging on the right praxis, so there are fewer issues to 
re-examine. But this problem of model selection is a tough one. Being a 
likelihoodist in inference naturally leads you to AIC-based model selection, 
Jim Lindsey being an influence too. Wanting your models to say something about 
nature is another strong conditioning factor. Put these two together and it 
becomes hard to do model averaging. And it has nothing to do with religion 
(yuck!).



Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-27 Thread Frank Harrell
Ruben, I'm not sure you are understanding the ramifications of what Bert
said.  In addition you are making several assumptions implicitly:

1. model selection is needed (vs. fitting the full model and using
shrinkage)
2. model selection works in the absence of shrinkage
3. model selection can find the right model and the features selected
would be the same features selected if the data were slightly perturbed or a
new sample taken
4. AIC tells you something that P-values don't (unless using structured
multiple degree of freedom tests)
5. parsimony is good

None of these assumptions is true.  Model selection without shrinkage
(penalization) seems to offer benefits but this is largely a mirage.
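
A minimal sketch of the full-model-plus-shrinkage alternative, using 
ridge-penalized logistic regression from the glmnet package with the variable 
names from the original post (an illustrative assumption, not Frank's own 
code):

library(glmnet)
X <- model.matrix(~ HTL + Veg + Aeventexhumed + Sector + Rayos,
                  data.to.analyze)[, -1]
y <- cbind(data.to.analyze$TotalEggs - data.to.analyze$Shells,
           data.to.analyze$Shells)     # two-column counts: (failures, successes)
cv <- cv.glmnet(X, y, family = "binomial", alpha = 0)  # alpha = 0 gives ridge
coef(cv, s = "lambda.min")   # all terms kept, coefficients shrunk toward zero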

Frank



-
Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-27 Thread Rubén Roa
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Frank Harrell
Sent: Friday, January 27, 2012 2:28 PM
To: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and 
unique combinations?

Ruben, I'm not sure you are understanding the ramifications of what Bert said.  
In addition you are making several assumptions implicitly:

--
Ruben: Frank, I guess we are going nowhere now.
But thanks anyway. See below if you want.

1. model selection is needed (vs. fitting the full model and using shrinkage)
Ruben: Nonlinear mechanistic models that are too complex often just don't 
converge; they crash. There is no shrinkage to apply to a model that failed to 
converge.

2. model selection works in the absence of shrinkage 
Ruben: I think you are assuming that shrinkage is necessary.

3. model selection can find the right model and the features selected would 
be the same features selected if the data were slightly perturbed or a new 
sample taken 
Ruben: I don't make this assumption. New data, new model.

4. AIC tells you something that P-values don't (unless using structured 
multiple degree of freedom tests)
Ruben: It does.

5. parsimony is good
Ruben: It is.

None of these assumptions is true.  Model selection without shrinkage
(penalization) seems to offer benefits but this is largely a mirage.

Ruben: Have a good weekend!

Ruben

Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-27 Thread Frank Harrell
Ruben, you are mistaken on every single point.  But I see it's not worth
continuing this discussion.
Frank


Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-27 Thread Greg Snow
What variables to consider adding and when to stop adding them depends greatly 
upon what question(s) you are trying to answer and the science behind your data.

Are you trying to create a model to predict your outcome from future predictor 
values? How precise do the predictions need to be?

Are you trying to understand how certain predictors relate to the response? How 
they relate after conditioning on other predictors?

Will humans be using your equation directly? Or will it be in a black box that 
the computer generates predictions from but people never need to look at the 
details?

What is the cost (money, time, difficulty, etc.) of collecting the different 
predictors?

Answers to the above questions will be much more valuable in choosing the 
"best" model than AIC or other values (though you should still look at the 
results from your analyses for information to combine with the other 
information).  R and its programmers (no matter how great and wonderful they 
are) cannot answer these questions for you.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Jhope
Thank you everyone for your dedication to improving R, its statistical 
analysis functionality, and for your comments. 

I now have 48 models (unique combinations of 1 to 6 variables), have put them 
into a list, and have obtained the results for all models. Below is a sample of 
my script and results:

m$model48 <- Model48.glm
sapply(m, extractAIC)

      model47  model48
[1,]    8.000   46.000
[2,] 6789.863 3549.543

Q
1) How do you interpret these results? What do rows [1,] and [2,] represent? 
2) It has been suggested that I try a quasibinomial, once responses are fixed.
Is this necessary? If so, is there a way I can do this considering all
these models?
3) Then a gaussian?

Much appreciated! 
Regards, J
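
On question 1: extractAIC() returns a length-two vector, c(edf, AIC), so in the 
sapply() output above row [1,] is each model's equivalent degrees of freedom 
and row [2,] is its AIC. A self-contained check on toy data (not the poster's):

fit <- glm(cbind(s, n - s) ~ x,
           data = data.frame(s = c(3, 5), n = c(10, 10), x = c(0, 1)),
           family = binomial)
extractAIC(fit)   # first element = edf, second = AIC: the two rows above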




Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Frank Harrell
To pretend that AIC solves this problem is to ignore that AIC is just a
restatement of P-values.
Frank


Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Jhope
I ask the question about when to stop adding another variable even though it
lowers the AIC, because each time I add a variable the AIC is lower. How do I
know when the model is a good fit? When should I stop adding variables, to
keep the model simple?

Thanks, J




Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Bert Gunter
Simple question; 8 million pages of answers in the statistical literature.
What, indeed, is the secret to life?

Post on a statistical help list (e.g. stats.stackexchange.com). This
has almost nothing to do with R. Be prepared for an onslaught of often
conflicting wisdom.

-- Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-25 Thread Jhope
Hi R-listers,

I have developed 47 GLM models with different combinations of interactions
from 1 variable to 5 variables. I have manually made each model separately
and put them into individual tables (organized by the number of variables)
showing the AIC score. I want to compare all of these models. 

1) What is the best way to compare various models with unique combinations
and different number of variables? 
2) I am trying to develop the simplest model, ideally. Even though
adding another variable would lower the AIC, how do I decide whether it is
worth including another variable in the model? How do I know when to stop? 

Definitions of Variables:
HTL - distance to high tide line (continuous)
Veg - distance to vegetation 
Aeventexhumed - Event of exhumation
Sector - number measurements along the beach
Rayos - major sections of beach (grouped sectors)
TotalEggs - nest egg density

Example of how all models were created: 
Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
data=data.to.analyze, family=binomial)
Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
binomial, data.to.analyze)
Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
data.to.analyze, family = binomial)
Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
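
A note on the formula syntax used above (general R behavior): a:b is the 
interaction term alone, while a*b expands to a + b + a:b, i.e. main effects 
plus interaction. For example (Model7b is a hypothetical name):

Model7b.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL*Veg,  # HTL + Veg + HTL:Veg
                   data = data.to.analyze, family = binomial)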

Please advise, thanks! 
J





Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-25 Thread Milan Bouchet-Valat
On Tuesday, January 24, 2012 at 20:41 -0800, Jhope wrote:
 Hi R-listers,
 
 I have developed 47 GLM models with different combinations of interactions
 from 1 variable to 5 variables. I have manually made each model separately
 and put them into individual tables (organized by the number of variables)
 showing the AIC score. I want to compare all of these models. 
 
 1) What is the best way to compare various models with unique combinations
 and different number of variables? 
See ?step or ?stepAIC (from package MASS) if you want an automated way
of doing this.

 2) I am trying to develop the simplest model, ideally. Even though
 adding another variable would lower the AIC, how do I decide whether it is
 worth including another variable in the model? How do I know when to stop? 
This is a general statistical question, not specific to R. As a general
rule, if adding a variable lowers the AIC by a significant margin, then
it's worth including it. You should only stop when a variable increases
the AIC. But this is assuming you consider it a good indicator and you
know what you're doing. There's plenty of literature on this subject.
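
A minimal sketch of that automated route, reusing the variable names from the 
question (the full model shown is an illustrative assumption):

full <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL + Veg + Aeventexhumed +
            Sector + Rayos, data = data.to.analyze, family = binomial)
step(full, direction = "both")          # stepwise search minimizing AIC
## or, from MASS: stepAIC(full, direction = "both")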

 Definitions of Variables:
 HTL - distance to high tide line (continuous)
 Veg - distance to vegetation 
 Aeventexhumed - Event of exhumation
 Sector - number measurements along the beach
 Rayos - major sections of beach (grouped sectors)
 TotalEggs - nest egg density
 
 Example of how all models were created: 
 Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
 data=data.to.analyze, family=binomial)
 Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
 binomial, data.to.analyze)
 Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
 data.to.analyze, family = binomial)
 Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
 HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
To extract the AICs of all these models, it's easier to put them in a
list and get their AICs like this:
m <- list()
m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
data=data.to.analyze, family=binomial)
m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
binomial, data.to.analyze)

sapply(m, extractAIC)


Cheers



Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-25 Thread Rubén Roa

A more 'manual' way to do it is this.
If you have all your models named properly and well organized, say Model1, 
Model2, ..., Model47, with a slot with the AIC (glm produces a list with one 
component named 'aic'), then something like this:

x <- matrix(0, 1081, 3)            # 1081 = choose(47, 2) pairs
x[, 1:2] <- t(combn(47, 2))        # all pairs of model indices
for (i in 1:1081) {
  x[i, 3] <- abs(eval(as.name(paste("Model", x[i, 1], sep = "")))$aic -
                 eval(as.name(paste("Model", x[i, 2], sep = "")))$aic)
}

will calculate all 1081 AIC differences for the paired comparisons of the 
47 models. It may not be pretty, but it works for me.
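
A tidier equivalent, assuming the 47 fitted models are stored in a list m 
(hypothetical name):

aics <- sapply(m, AIC)                         # AIC of every model
d <- abs(outer(aics, aics, "-"))               # all pairwise AIC differences
which(d <= 2 & upper.tri(d), arr.ind = TRUE)   # pairs that are effectively tied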

An AIC difference equal to or less than 2 is a tie; anything higher is evidence 
for the model with the lower AIC (Sakamoto et al., Akaike Information Criterion 
Statistics, KTK Scientific Publishers, Tokyo).

HTH

Rubén H. Roa-Ureta, Ph. D.
AZTI Tecnalia, Txatxarramendi Ugartea z/g,
Sukarrieta, Bizkaia, SPAIN






Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-25 Thread Frank Harrell
If you are trying to destroy all aspects of statistical inference this is a
good way to go.  This is also a good way to ignore the subject matter in
driving model selection.
Frank



-
Frank Harrell
Department of Biostatistics, Vanderbilt University



Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-25 Thread Ben Bolker
Rubén Roa rroa at azti.es writes:

 A more 'manual' way to do it is this.

 If you have all your models named properly and well organized, say
 Model1, Model2, ..., Model47, with a slot with the AIC (glm
 produces a list with one component named 'aic'), then something like
 this:
 
 x <- matrix(0, 1081, 3)
 x[, 1:2] <- t(combn(47, 2))
 for (i in 1:1081) {
   x[i, 3] <- abs(eval(as.name(paste("Model", x[i, 1], sep = "")))$aic -
                  eval(as.name(paste("Model", x[i, 2], sep = "")))$aic)
 }

 
 will calculate all 1081 AIC differences for the paired
 comparisons of the 47 models. It may not be pretty, but it works for me.

 
 An AIC difference equal to or less than 2 is a tie; anything higher is
 evidence for the model with the lower AIC (Sakamoto et al., Akaike
 Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
 

  I wouldn't go quite as far as Frank Harrell did in his answer, but

(1) it seems silly to do all pairwise comparisons of models; one of
the big advantages of information-theoretic approaches is that you can
just list the models in order of AIC ...

(2) the dredge() function from the MuMIn package may be useful for
automating this sort of thing.  There is also an AICtab function in
the emdbook package.
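
A sketch of the dredge() route; MuMIn requires na.action = na.fail on the
global model, and the model shown is an illustrative assumption reusing
names from the question:

library(MuMIn)
full <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL * Veg * Aeventexhumed,
            data = data.to.analyze, family = binomial, na.action = na.fail)
ms <- dredge(full)   # fits all submodels and ranks them (by AICc by default)
head(ms)             # best models first, with delta-AICc and Akaike weights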

(3) If you're really just interested in picking the single model with
the best expected predictive capability (and all of the other
assumptions of the AIC approach are met -- very large data set, good
fit to the data, etc.), then just picking the model with the best AIC
will work.  It's when you start to make inferences on the individual
parameters within that model, without taking account of the model
selection process, that you run into trouble.  If you are really
concerned about good predictions then it may be better to do
multi-model averaging *or* use some form of shrinkage estimator.

(4) The "delta-AIC < 2 means pretty much the same" rule of thumb is
fine, but it drives me nuts when people use it as a
quasi-significance-testing rule, as in "the simpler model is adequate
if dAIC < 2".  If you're going to work in the model selection arena,
then don't follow hypothesis-testing procedures!  A smaller AIC means
lower expected K-L distance, period.

For the record, Brian Ripley has often expressed the (minority)
opinion that AIC is *not* appropriate for comparing non-nested models
(e.g.  http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html).


 
 -Original Message-
 From: r-help-bounces at r-project.org on behalf of Milan Bouchet-Valat
 Sent: Wed 1/25/2012 10:32 AM
 To: Jhope
 Cc: r-help at r-project.org

 Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
  interactions and unique combinations?

 
 On Tuesday, January 24, 2012 at 20:41 -0800, Jhope wrote:
  Hi R-listers,
  
  I have developed 47 GLM models with different combinations of interactions
  from 1 variable to 5 variables. I have manually made each model separately
  and put them into individual tables (organized by the number of variables)
  showing the AIC score. I want to compare all of these models. 
  
  1) What is the best way to compare various models with unique combinations
  and different number of variables? 
 See ?step or ?stepAIC (from package MASS) if you want an automated way
 of doing this.
 
  2) I am trying to develop the simplest model, ideally. Even though
  adding another variable would lower the AIC,

  No, not necessarily.

 how do I decide whether it is
  worth including another variable in the model? How do I know when to stop? 

  When the AIC stops decreasing.

 This is a general statistical question, not specific to R. As a general
 rule, if adding a variable lowers the AIC by a significant margin, then
 it's worth including it. 

  If the variable lowers the AIC *at all* then your best estimate is that
you should include it.

 You should only stop when a variable increases
 the AIC. But this is assuming you consider it a good indicator and you
 know what you're doing. There's plenty of literature on this subject.
 
  Definitions of Variables:
  HTL - distance to high tide line (continuous)
  Veg - distance to vegetation 
  Aeventexhumed - Event of exhumation
  Sector - number measurements along the beach
  Rayos - major sections of beach (grouped sectors)
  TotalEggs - nest egg density
  
  Example of how all models were created: 
  Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
  data=data.to.analyze, family=binomial)
  Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
  binomial, data.to.analyze)
  Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
  data.to.analyze, family = binomial)
  Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
  HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
 To extract the AICs of all these models, it's easier to put them in a
 list and get their AICs like this:
 m <- list()
 m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
 data=data.to.analyze, family=binomial)
 m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
 binomial, data.to.analyze)

Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-25 Thread Rubén Roa
I think we have gone through this before.
First, the destruction of all aspects of statistical inference is not at stake, 
Frank Harrell's post notwithstanding.
Second, checking all pairs is a way to see, for _all pairs_, which model 
outcompetes which in terms of predictive ability by 2 AIC units or more. Just 
sorting them by AIC does not tell you that when models are separated from the 
next best by less than 2 AIC units.
Third, I was not implying that AIC differences play the role of significance 
tests. I agree with you that model selection is better not understood as a 
proxy for, or a relative of, significance-test procedures.
Incidentally, when comparing many models the AIC is often inconclusive. If one 
is bent on selecting just _the_ model, then I check numerical optimization 
diagnostics such as the size of gradients, KKT criteria, and other issues such 
as standard errors of parameter estimates and the correlation matrix of 
parameter estimates. For some reason I don't find model averaging appealing. I 
guess deep in my heart I expect more from my model than just the best 
predictive ability.

Rubén
