On 5/4/07, ivo welch <[EMAIL PROTECTED]> wrote:
hi doug: yikes. could I have done better? Oh dear. I tried to make
my example clearer half-way through, but made it worse. I meant
set.seed(1);
fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm(100);
print(summary(lm( y ~ x + fe)))
<deleted>
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1128 0.3680 0.31 0.76
x 0.0232 0.0960 0.24 0.81
fe1 -0.6628 0.5467 -1.21 0.23
<deleted more fe's>
Residual standard error: 0.949 on 89 degrees of freedom
Multiple R-Squared: 0.0838, Adjusted R-squared: -0.0192
F-statistic: 0.814 on 10 and 89 DF, p-value: 0.616
I really am interested only in this linear specification, the
coefficient on x (0.0232) and the R^2 of 8.38% (adjusted -1.92%). If
I did not have so much data in my real application, I would never have
to look at nlme or nlme4. I really only want to be able to run this
specification through lm with far more observations (100,000) and
groups (10,000), and be done with my problem.
OK, I understand now. You really do want to treat the levels
of the factor as fixed effects, not as random effects. The lme and
lmer functions are fitting a more complicated model in which the
variance of the random effects is chosen to maximize the
log-likelihood or the restricted log-likelihood but they don't give
the results that are of interest to you.
As Roger indicated in another reply you should be able to obtain the
results you want by sweeping out the means of the groups from both x
and y. However, I tried Roger's function and a modified version that
I wrote and could not show this. I'm not sure what I am doing wrong.
I enclose a transcript that shows that I can reproduce the result from
Roger's function but it doesn't do what either of us think it should.
BTW, I realize that the estimate for the Intercept should be zero in
this case.
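For concreteness, a minimal sketch of that sweep-out (within) transformation on the simulated data from the original post; `demean` is an illustrative helper, not a function from any package:

```r
# Within (fixed-effects) transformation: subtract the group mean from both
# y and x, then regress the demeaned y on the demeaned x with plain lm().
set.seed(1)
fe <- as.factor(as.integer(runif(100) * 10))
y  <- rnorm(100)
x  <- rnorm(100)

# ave() returns each observation's group mean, expanded to full length
demean <- function(v, g) v - ave(v, g)

full   <- lm(y ~ x + fe)                     # dummy-variable (LSDV) fit
within <- lm(demean(y, fe) ~ demean(x, fe))  # demeaned fit

# By the Frisch-Waugh-Lovell theorem the slope on x is identical in both fits.
all.equal(unname(coef(full)["x"]), unname(coef(within)[2]))
```

The demeaned fit never forms the dummy-variable design matrix, which is why it scales to very many groups.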
now, with a few IQ points more, I would have looked at the lme
function instead of the nlme function in library(nlme). [then
again, I could understand stats a lot better with a few more IQ
points.] I am reading the lme description now, but I still don't
understand how to specify that I want to have dummies in my
specification, plus the x variable, and that's it. I think I am not
understanding the integration of fixed and random effects in the same
R functions.
thanks for pointing me at your lme4 library. on linux, version 2.5.0, I did
R CMD INSTALL Matrix*.tar.gz
R CMD INSTALL lme4*.tar.gz
and it installed painlessly. (I guess R install packages don't have
knowledge of what they rely on; lme4 requires Matrix, which the docs
state, but having gotten this wrong at first, I didn't get an error. no big
deal. I guess I am so used to automatic resolution of dependencies
by linux installers these days that I did not expect this.)
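As an aside (not in the original thread): installing from a repository inside R, rather than via R CMD INSTALL on local tarballs, does resolve declared dependencies automatically. A sketch, assuming network access to CRAN:

```r
# install.packages() reads the DESCRIPTION's Depends/Imports fields and
# pulls in Matrix automatically before installing lme4.
install.packages("lme4", dependencies = TRUE)
```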
I now tried your specification:
> library(lme4)
Loading required package: Matrix
Loading required package: lattice
> lmer(y~x+(1|fe))
Linear mixed-effects model fit by REML
Formula: y ~ x + (1 | fe)
AIC BIC logLik MLdeviance REMLdeviance
282 290 -138 270 276
Random effects:
Groups Name Variance Std.Dev.
fe (Intercept) 0.000000000445 0.0000211
Residual 0.889548532468 0.9431588
number of obs: 100, groups: fe, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.0188 0.0943 -0.199
x 0.0528 0.0904 0.585
Correlation of Fixed Effects:
(Intr)
x -0.022
Warning messages:
1: Estimated variance for factor 'fe' is effectively zero
in: `LMEoptimize<-`(`*tmp*`, value = list(maxIter = 200L, tolerance =
0.0000000149011611938477,
2: $ operator not defined for this S4 class, returning NULL in: x$symbolic.cor
Without being a statistician, I can still determine that this is not
the model I would like to work with. The coefficient is 0.0528, not
0.0232. (I am also not sure why I am getting these warning messages
on my system, but I don't think it matters.)
is there a simple way to get the equivalent specification for my simple
model, using lmer or lme, which does not choke on huge data sets?
regards,
/ivo
> set.seed(1)
> y <- rnorm(100); x <- rnorm(100)
> fe <- gl(10, 10)
> head(coef(summary(lm(y ~ x + fe))))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.12203179 0.2955477 0.41290050 0.6806726
x 0.02927071 0.1053909 0.27773478 0.7818601
fe2 0.13032049 0.4176603 0.31202505 0.7557513
fe3 -0.25552441 0.4164178 -0.61362502 0.5410283
fe4 0.01123732 0.4227301 0.02658272 0.9788521
fe5 0.02836583 0.4255255 0.06666071 0.9470013
> coef(summary(lm(diffid(y, fe) ~ diffid(x, fe))))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.802350e-18 0.08837912 1.109125e-16 1.0000000
diffid(x, fe) 2.927071e-02 0.10043495 2.914394e-01 0.7713312
> diffid1
function(h, id) {
id <- as.factor(id)[ , drop = TRUE]
apply(as.matrix(h), 2, function(x) x - tapply(x, id, mean)[id])
}
> coef(summary(lm(diffid1(y, fe) ~ diffid1(x, fe))))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.802350e-18 0.08837912 1.109125e-16 1.0000000
diffid1(x, fe) 2.927071e-02 0.10043495 2.914394e-01 0.7713312
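The transcript above does reproduce the slope exactly (0.02927071 in both fits); only the standard errors differ, and only through the residual degrees of freedom (89 in the dummy-variable fit versus 98 in the demeaned fit, since the demeaned fit doesn't know ten group means were estimated). A sketch of that reconciliation on the same simulated data; `demean` is an illustrative helper:

```r
# Reproduce the two fits from the transcript and reconcile their
# standard errors for the slope on x.
set.seed(1)
y  <- rnorm(100); x <- rnorm(100)
fe <- gl(10, 10)

demean <- function(v, g) v - ave(v, g)

full   <- lm(y ~ x + fe)                     # residual df = 100 - 11 = 89
within <- lm(demean(y, fe) ~ demean(x, fe))  # residual df = 100 -  2 = 98

se_full   <- coef(summary(full))["x", "Std. Error"]
se_within <- coef(summary(within))[2, "Std. Error"]

# The residual sums of squares are identical (Frisch-Waugh-Lovell), so the
# two standard errors differ only by the sigma-hat df correction:
all.equal(se_full, se_within * sqrt(98 / 89))
```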
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.