[R] summary vs anova

2011-12-19 Thread Brent Pedersen
Hi, I'm sure this is simple, but I haven't been able to find it in TFM.
Say I have some data in R like this:

  gender age smokes disease    Y
1 female  65   ever control 0.18
2 female  77  never control 0.12
3   male  40         state1 0.11
4 female  67   ever control 0.20
5   male  63   ever  state1 0.16
6 female  26  never  state1 0.13

where unique(disease) == c("control", "state1", "state2")
and unique(smokes) == c("ever", "never", "", "current")

I then fit a linear model like:

 model = lm(Y ~ smokes + disease + age + gender, data=df)

And I want to understand the difference between:

Call:
lm(formula = Y ~ smokes + disease + age + gender, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-0.22311 -0.08108 -0.03483  0.05604  0.46507

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.1206825  0.0521368   2.315   0.0211 *
smokescurrent  0.0150641      0.066   0.339   0.7348
smokesever     0.0498764  0.0326254   1.529   0.1271
smokesnever    0.0394109  0.0349142   1.129   0.2597
diseasestate1  0.0018739  0.0176817   0.106   0.9157
diseasestate2 -0.0009858  0.0178651  -0.055   0.9560
age            0.0002841  0.0006290   0.452   0.6518
gendermale     0.1164889  0.0128748   9.048   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1257 on 397 degrees of freedom
Multiple R-squared: 0.1933, Adjusted R-squared: 0.1791
F-statistic: 13.59 on 7 and 397 DF,  p-value: 8.975e-16


Analysis of Variance Table

Response: Y
           Df Sum Sq Mean Sq F value   Pr(>F)
smokes      3 0.1536 0.05120  3.2397  0.02215 *
disease     2 0.0129 0.00647  0.4096  0.66420
age         1 0.0431 0.04310  2.7270  0.09946 .
gender      1 1.2937 1.29373 81.8634   <2e-16 ***
Residuals 397 6.2740 0.01580
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I understand (hopefully correctly) that anova() tests each term by adding it
to the model in the order it is specified in the formula.
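
The sequential behaviour described above can be checked with a small sketch.
The data frame below is simulated and only mimics the column names in this
post; none of its values come from the real data. It fits the same model with
the terms in two different orders and compares the anova() tables. It also
checks one consequence of sequential testing: the last term is adjusted for
everything before it, so for a one-df term its F value should equal the
squared t value reported by summary().

```r
# Simulated stand-in for the poster's data; all values are made up.
set.seed(1)
n <- 200
df <- data.frame(
  smokes  = factor(sample(c("never", "ever", "current"), n, replace = TRUE)),
  disease = factor(sample(c("control", "state1", "state2"), n, replace = TRUE)),
  age     = round(runif(n, 20, 80)),
  gender  = factor(sample(c("female", "male"), n, replace = TRUE))
)
df$Y <- 0.12 + 0.1 * (df$gender == "male") + rnorm(n, sd = 0.12)

# Same model, terms listed in two different orders.
m1 <- lm(Y ~ smokes + disease + age + gender, data = df)
m2 <- lm(Y ~ gender + age + disease + smokes, data = df)

anova(m1)  # smokes is tested first, gender last
anova(m2)  # same fit, but the sequential rows generally differ

# gender is the last term in m1, so its sequential F test is adjusted for
# all other terms, just like the t test in summary(m1).
t_gender <- coef(summary(m1))["gendermale", "t value"]
F_gender <- anova(m1)["gender", "F value"]
all.equal(t_gender^2, F_gender)
```

The residual row is the same in both tables, because the fitted model is the
same; only the way the explained sum of squares is split between terms
depends on the order.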

More specific questions are:

1) How do the p-values for smokes* in summary(model) relate to the
   Pr(>F) for smokes in anova(model)?
2) What do the p-values for each of those smokes* coefficients mean exactly?
3) The summary above shows the values for diseasestate1 and diseasestate2;
   how can I get the p-value for diseasecontrol (or, e.g., genderfemale)?
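
One way to read the two tables, sketched on simulated data (the data frame
below is invented and only mimics the column names in this post): with R's
default treatment contrasts, each smokes* row of summary() is a t test of that
level against the reference level of the factor, given all other terms in the
model. The smokes row of anova() is instead a single F test of all the smokes
coefficients together, with smokes entered first. The reference level (control,
or female) has no row of its own because it is absorbed into the intercept;
relevel() changes which level plays that role. In the output posted above, the
smokes baseline appears to be the empty-string level, since current, ever and
never all have coefficients.

```r
# Simulated stand-in for the poster's data; all values are made up.
set.seed(2)
n <- 200
df <- data.frame(
  smokes  = factor(sample(c("never", "ever", "current"), n, replace = TRUE)),
  disease = factor(sample(c("control", "state1", "state2"), n, replace = TRUE)),
  gender  = factor(sample(c("female", "male"), n, replace = TRUE))
)
df$Y <- 0.12 + 0.1 * (df$gender == "male") + rnorm(n, sd = 0.12)

model <- lm(Y ~ smokes + disease + gender, data = df)

# The first level of each factor is the reference level and gets no row.
levels(df$disease)
coef(summary(model))

# F test for each whole term, adjusted for all the other terms
# (unlike anova(), this does not depend on the order in the formula).
drop1(model, test = "F")

# Make state1 the reference level to get a control-vs-state1 row directly.
df$disease2 <- relevel(df$disease, ref = "state1")
model2 <- lm(Y ~ smokes + disease2 + gender, data = df)
coef(summary(model2))[c("disease2control", "disease2state2"), ]
```

The disease2control estimate is the diseasestate1 estimate with its sign
flipped and the same p-value, since both rows describe the same pairwise
comparison.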


R-help@r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] repeated measures setup

2011-03-28 Thread Brent Pedersen
hi, i have some data, a subset of which is pasted at the end of this message.
i am trying to understand how to do repeated measures, as our study design
consists of a subject and up to 2 siblings.

thus far, my model looks like this--with family_id indicating a
sibling relationship:

 formula = y ~ concordant + age.proband + age.other + sex.proband + sex.other +
     Error(family_id)

i have seen a lot of resources where this is specified as
should that be the case here? or is the above formulation sufficient
to capture the repeated measures by family?

also, there seem to be a number of resources for doing this type of
analysis. is there a particular package that has more traction that i
should look into?
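
as a rough sketch of the two options, under stated assumptions: the data
below are simulated (40 made-up families with two sibling rows each) and only
reuse the column names from this message. the first fit uses aov() with an
Error(family_id) stratum as in the formula above; note that family_id needs to
be a factor for this, otherwise it is treated as a numeric covariate. the
second fit is a random-intercept model from the nlme package, which is an
assumption here rather than something named in the question; aov() is designed
for balanced designs, so a mixed model is often the more natural fit when
families have differing numbers of siblings.

```r
library(nlme)  # recommended package, normally installed alongside R

# Simulated stand-in data; all values are made up.
set.seed(3)
n_fam <- 40
d <- data.frame(family_id = factor(rep(seq_len(n_fam), each = 2)))
d$age.proband <- rep(sample(5:18, n_fam, replace = TRUE), each = 2)
d$sex.proband <- factor(rep(sample(c("F", "M"), n_fam, replace = TRUE), each = 2))
d$age.other   <- sample(5:20, nrow(d), replace = TRUE)
d$sex.other   <- factor(sample(c("F", "M"), nrow(d), replace = TRUE))
d$concordant  <- sample(c(TRUE, FALSE), nrow(d), replace = TRUE)

# Shared family effect plus a concordance effect plus noise.
fam_effect <- rnorm(n_fam, sd = 1)
d$y <- 3 + fam_effect[as.integer(d$family_id)] +
  0.5 * d$concordant + rnorm(nrow(d), sd = 0.5)

# Option 1: classical repeated-measures ANOVA with a family stratum.
fit_aov <- aov(y ~ concordant + age.proband + age.other + sex.proband +
                 sex.other + Error(family_id), data = d)
summary(fit_aov)

# Option 2: linear mixed model with a random intercept per family.
fit_lme <- lme(y ~ concordant + age.proband + age.other + sex.proband +
                 sex.other, random = ~ 1 | family_id, data = d)
summary(fit_lme)
```

in the aov() summary, covariates that are constant within a family (the
proband's age and sex) are tested in the family_id stratum, and covariates
that vary between siblings are tested in the within stratum.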


concordant family_id external_ref.proband external_ref.other sex.proband sex.other age.proband age.other y(fake)
T          58        80015550             80015543           M           F         15          19        1
F          58        80015550             80016946           M           F         15          8         2
T          54        80015499             80015338           F           F         5           7         3
F          54        80015499             80013112           F           M         5           13        4
F          22        80012269             80012252           F           F         12          10        5
F          22        80012269             80018691           F           M         12          8         5
