Re: [R] Using anova(f1, f2) to compare lmer models yields seemingly erroneous Chisq = 0, p = 1

2009-09-07 Thread rapton

Thank you all for your insight!  I am glad to hear, at least, that I am doing
something incorrectly (since the results do not make sense), and I am very
grateful for your attempts to remedy my very limited (and admittedly
self-taught) understanding of multilevel models and R.

As I mentioned in the problem statement, predictor.1 explains vastly more
variance in outcome than predictor.2 (R^2 = 15% vs. 5% in OLS regression,
with very large N), and the multilevel-model estimates are very similar to
the OLS estimates.  I am therefore quite confident that predictor.1 makes
for a much better model.
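
(For reference, the OLS comparison I mean is along these lines; an untested
sketch reusing the same column names as my lmer calls:)

# OLS fit of each predictor separately; r.squared is the usual R^2
summary(lm(outcome ~ predictor.1, data = i))$r.squared   # about 0.15
summary(lm(outcome ~ predictor.2, data = i))$r.squared   # about 0.05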

I understand that several of you are saying that anova() cannot be used to
compare these two multilevel models.  Is there *any* way to compare two
predictors to see which better predicts the outcome in a multilevel model? 
f1's lower AIC and BIC and higher logLik are concordant with the idea that
predictor.1 is superior to predictor.2, as best I understand it, but is
there any way to test whether that difference is statistically significant?
The only model-comparison function I can find online is anova(), but its
output is nonsensical and, as you are all saying, it does not apply to my
situation anyway.
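
For what it is worth, working backwards from the logLik values in the anova
table quoted below, the likelihood-ratio statistic seems to come out
negative (this is my own arithmetic, not R output):

# anova() lists the models in order of increasing Df, so the statistic is
# twice the logLik difference between f2 (25 Df) and f1 (6 Df):
2 * (-23633 - (-22715))   # = -1836, i.e. negative, which would presumably
                          # be reported as Chisq = 0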

Interestingly, anova() seems to work if I arbitrarily subset my
observations, but when I use all the observations anova() generates Chisq =
0.  This is probably a red herring but I thought I would mention it in case
it is not.

Also, I confess I am confused about what you mean when you say the two
models (f1 and f2) are not nested, and that anova() therefore cannot be
used.  What would be an example of a nested comparison: predictor.1 alone
versus a model with both predictor.1 and predictor.2?  Surely there must
also be a way to compare the predictive power of predictor.1 and
predictor.2 against each other in a zero-order sense, but I am at a loss
to identify it.
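
If it helps to clarify my question: is this the kind of nested comparison
you mean?  (An untested sketch reusing the formulas from my models:)

# f1 is nested in f.both, since f.both contains every term of f1 plus
# predictor.2, so anova() should give a valid likelihood-ratio test here
f.both <- lmer(outcome ~ predictor.1 + predictor.2 + (1 | person), data = i)
anova(f1, f.both)   # tests whether adding predictor.2 improves the fit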



Alain Zuur wrote:
 
 
 
 rapton wrote:
 
 When I run two models, the output of each model is generated correctly as
 far as I can tell (e.g. summary(f1) and summary(f2) for the multilevel
 model output look perfectly reasonable), and in this case (see below)
 predictor.1 explains vastly more variance in outcome than predictor.2
 (R^2 = 15% vs. 5% in OLS regression, with very large N).  What I am
 utterly puzzled by is that when I run an anova comparing the two
 multilevel model fits, the Chisq comes back as 0, with p = 1.  I am
 pretty sure that fit #1 (f1) is a much better predictor of the outcome
 than f2, which is reflected in the AIC, BIC, and logLik values.  Why
 might anova be giving me this curious output?  How can I fix it?  I am
 sure I am making a dumb error somewhere, but I cannot figure out what it
 is.  Any help or suggestions would be greatly appreciated!
 
 -Matt
 
 
 f1 <- lmer(outcome ~ predictor.1 + (1 | person), data = i)
 f2 <- lmer(outcome ~ predictor.2 + (1 | person), data = i)
 anova(f1, f2)
 
 Data: i
 Models:
 f1: outcome ~ predictor.1 + (1 | person)
 f2: outcome ~ predictor.2 + (1 | person)
     Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
 f1   6 45443 45489 -22715
 f2  25 47317 47511 -23633     0     19          1
 
 
 
 
 
 Your models are not nested... it doesn't make sense to do this comparison.
 
 
 Alain
 



[R] Using anova(f1, f2) to compare lmer models yields seemingly erroneous Chisq = 0, p = 1

2009-09-04 Thread rapton

Hello,

I am using R to analyze a large multilevel data set, using lmer() to model
my data and anova() to compare the fit of various models.  When I run two
models, the output of each model is generated correctly as far as I can
tell (e.g. summary(f1) and summary(f2) for the multilevel model output look
perfectly reasonable), and in this case (see below) predictor.1 explains
vastly more variance in outcome than predictor.2 (R^2 = 15% vs. 5% in OLS
regression, with very large N).  What I am utterly puzzled by is that when
I run an anova comparing the two multilevel model fits, the Chisq comes
back as 0, with p = 1.  I am pretty sure that fit #1 (f1) is a much better
predictor of the outcome than f2, which is reflected in the AIC, BIC, and
logLik values.  Why might anova be giving me this curious output?  How can
I fix it?  I am sure I am making a dumb error somewhere, but I cannot
figure out what it is.  Any help or suggestions would be greatly
appreciated!

-Matt


f1 <- lmer(outcome ~ predictor.1 + (1 | person), data = i)
f2 <- lmer(outcome ~ predictor.2 + (1 | person), data = i)
anova(f1, f2)

Data: i
Models:
f1: outcome ~ predictor.1 + (1 | person)
f2: outcome ~ predictor.2 + (1 | person)
    Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
f1   6 45443 45489 -22715
f2  25 47317 47511 -23633     0     19          1


[R] Using loops to run functions over a list of variables

2009-05-12 Thread rapton

Hello,

I have a data set with many variables, and often I want to run a given
function, like summary() or cor() or lmer(), on many combinations of one
or more of these variables.  For every combination I want to analyze I
have been writing out the code by hand, but given that I want to run many
different functions over dozens and dozens of variable combinations, this
takes a lot of time and makes for very inelegant code.  There *has* to be
a better way!  I have tried looking through numerous message boards, but
everything I have tried has failed.

It seems like loops would solve this problem nicely.
(1) Create list of variables of interest
(2) Iterate through the list, running a given function on each variable

I have a data frame which I have creatively called "data".  It has
variables named "focus" and "productive".

If I run the function summary(), for instance, it works fine:
summary(data$focus)
summary(data$productive)  

Both of these work.

If I try to use a loop like:

factors <- c("data$focus", "data$productive")
for(i in 1:2){
    summary(get(factors[i]))
}

It gives the following errors:
Error in get(factors[i]) : variable data$focus was not found
Error in summary(get(factors[i])) : 
  error in evaluating the argument 'object' in selecting a method for
function 'summary'

But data$focus *does* exist!  I could run summary(data$focus) and it works
perfectly.  

What am I doing wrong?

Even if I get this working, is there a better way to do this, especially if
I have dozens of variables to analyze?
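
For instance, I imagine something along these lines might be closer to
idiomatic R (an untested sketch using the same column names, and assuming
"data" is a data frame, as the $ indexing suggests):

vars <- c("focus", "productive")
for (v in vars) {
    print(summary(data[[v]]))   # data[[v]] looks the column up by name
}
# or, without an explicit loop:
lapply(data[vars], summary)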

Any ideas would be greatly appreciated!