Re: [R] How are interaction terms computed in lm's result / problems with interaction terms in lm?

David Winsemius Sun, 18 Sep 2016 12:20:44 -0700

> On Sep 18, 2016, at 11:01 AM, mviljamaa <mvilja...@kapsi.fi> wrote:
> 
> Also if you, rather than doing what's done below, do:
> 
> fit3 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age + kidmomhsage$mom_hs + 
> kidmomhsage$mom_age * kidmomhsage$mom_hs)
> 
> Then this gives the result:
> 
> Call:
> lm(formula = kidmomhsage$kid_score ~ kidmomhsage$mom_age + kidmomhsage$mom_hs +
>     kidmomhsage$mom_age * kidmomhsage$mom_hs)
> 
> Coefficients:
>                           (Intercept)
>                               110.542
>                   kidmomhsage$mom_age
>                                -1.522
>                    kidmomhsage$mom_hs
>                               -41.287
> kidmomhsage$mom_age:kidmomhsage$mom_hs
>                                 2.391
> 
> Where the interaction term now seems properly interpretable. So perhaps this 
> is the way to use interaction terms with lm.
> 
> However, in the above, is the coefficient 2.391 of 
> kidmomhsage$mom_age:kidmomhsage$mom_hs actually only that for mom_hs == 1 in 
> which case for mom_hs == 0 one would simply ignore the last coefficient?

Yes.

In all of this it would be much clearer and safer if you supplied a data frame 
to the data argument of lm:

lm(formula = kid_score ~ mom_age + mom_hs + mom_age*mom_hs, data = kidmomhsage)
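
A minimal sketch of that advice, using simulated data because the real 
kidmomhsage data frame is not part of this message. The data frame `dat` and 
every number used to generate it are made up for illustration only. It also 
shows that `mom_age*mom_hs` already expands to both main effects plus the 
interaction, so spelling out the main effects as well changes nothing:

```r
# Hypothetical stand-in for kidmomhsage; all values are simulated.
set.seed(1)
n <- 200
dat <- data.frame(
  mom_age = round(runif(n, 17, 29)),
  mom_hs  = rbinom(n, 1, 0.8)
)
dat$kid_score <- 80 + 0.5 * dat$mom_age + 5 * dat$mom_hs + rnorm(n, sd = 20)

# Main effects written out explicitly, plus the `*` term
fit_long  <- lm(kid_score ~ mom_age + mom_hs + mom_age * mom_hs, data = dat)
# `*` alone: expands to mom_age + mom_hs + mom_age:mom_hs
fit_short <- lm(kid_score ~ mom_age * mom_hs, data = dat)

# Both calls fit the same model, with short coefficient names
same_fit <- isTRUE(all.equal(coef(fit_long), coef(fit_short)))
coef_names <- names(coef(fit_short))
print(same_fit)
print(coef_names)
```

With the data argument the coefficient names come out as mom_age, mom_hs and 
mom_age:mom_hs rather than carrying the `kidmomhsage$` prefix.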

> 
> And would one still need to perform summations of kidmomhsage$mom_age and 
> kidmomhsage$mom_age:kidmomhsage$mom_hs coefficients, i.e. the coefficient for 
> kidmomhsage$mom_age = -1.522 + 2.391?

Yes, at least if I'm understanding your terminology. That sum, -1.522 + 2.391 = 
0.869, is the net mom_age coefficient for those subjects with mom_hs values not 
at the base level (here, mom_hs == 1).
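
As a rough worked example, the group-specific intercepts and slopes can be 
assembled by hand from the coefficients printed in the quoted output. The 
vector `b` and the variable names below are introduced here purely for 
illustration; on a fitted model one would take the values from coef(fit3) 
instead of typing them in:

```r
# Coefficients copied from the quoted lm() output (rounded as printed there)
b <- c("(Intercept)"    = 110.542,
       "mom_age"        = -1.522,
       "mom_hs"         = -41.287,
       "mom_age:mom_hs" = 2.391)

# mom_hs == 0 (base level): the interaction term drops out
intercept_hs0 <- b[["(Intercept)"]]
slope_hs0     <- b[["mom_age"]]

# mom_hs == 1: add the mom_hs shift and the interaction to the base values
intercept_hs1 <- b[["(Intercept)"]] + b[["mom_hs"]]
slope_hs1     <- b[["mom_age"]] + b[["mom_age:mom_hs"]]

print(c(intercept_hs0 = intercept_hs0, slope_hs0 = slope_hs0,
        intercept_hs1 = intercept_hs1, slope_hs1 = slope_hs1))
```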

> 
> 
> On 2016-09-18 20:41, mviljamaa wrote:
>> I'm trying to use interaction terms in lm and for the following types of 
>> models:
>> fit3_hs <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
>> kidmomhsage$mom_hs + kidmomhsage$mom_age * 1)
>> fit3_nohs <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
>> kidmomhsage$mom_hs + kidmomhsage$mom_age * 0)
>> where you see the last term being the interaction term (it's
>> mom_age*mom_hs where mom_hs takes values 0 or 1), the results are
>> causing a bit of confusion.
>> fit3_hs returns:
>> Call:
>> lm(formula = kidmomhsage$kid_score ~ kidmomhsage$mom_age + kidmomhsage$mom_hs +
>>    kidmomhsage$mom_age * 1)
>> Coefficients:
>>        (Intercept)  kidmomhsage$mom_age
>>            70.4787               0.3261
>> kidmomhsage$mom_hs
>>            11.3112
>> fit3_nohs returns:
>> Call:
>> lm(formula = kidmomhsage$kid_score ~ kidmomhsage$mom_age + kidmomhsage$mom_hs +
>>    kidmomhsage$mom_age * 0)
>> Coefficients:
>> kidmomhsage$mom_age   kidmomhsage$mom_hs
>>              3.368               11.568
>> Now why is (Intercept) term missing from the second one?

In R, the formula terms `1` and `0` have special meaning: they request or 
suppress the intercept rather than acting as numbers. In the first model you 
"formula-added" mom_age to mom_age and got, not 2*mom_age, but rather just 
mom_age. In the second model you got the formula equivalent of `mom_age + 
mom_hs + 0`, which is an intercept-free specification. Read:

?formula
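
One way to check how a formula gets expanded, without fitting anything, is to 
inspect its terms object. This is only a sketch: `y`, `a` and `b` are 
placeholder names standing in for kid_score, mom_age and mom_hs.

```r
# Placeholder formulas mirroring fit3_hs and fit3_nohs
f_one  <- y ~ a + b + a * 1
f_zero <- y ~ a + b + a * 0

t_one  <- terms(f_one)
t_zero <- terms(f_zero)

# Neither formula contains an interaction term; only a and b survive
labels_one  <- attr(t_one, "term.labels")
labels_zero <- attr(t_zero, "term.labels")

# 1 means an intercept is fitted, 0 means it has been removed
int_one  <- attr(t_one, "intercept")
int_zero <- attr(t_zero, "intercept")

print(labels_one);  print(int_one)
print(labels_zero); print(int_zero)
```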

I thought there was a pithy summary by Greg Snow in the fortunes package of 
why one should almost never use intercept-free models, but I must have 
misremembered, because it's not showing up for me. Perhaps some of these Rhelp 
threads will be useful:

http://markmail.org/message/o7kbarfvpdobmdir?q=list:org%2Er-project%2Er-help+snow+intercept+0

You could easily substitute 'ripley', 'lumley' or several other names in that 
search strategy in Rhelp's archives and get equally credible material.

>> Also since in the first one the interaction term's coefficient should
>> be added to the coefficient of mom_age, then is the return value of
>> kidmomhsage$mom_age 0.3261 the sum of the coefficient of mom_age and
>> the coefficient of the interaction term? Or would I need to produce
>> the sum myself somehow?

In the first one the intercept is the predicted mean score for a mom_age of 
zero and mom_hs at its base value, so it is essentially a reference value to 
which the _age and _hs increments or decrements are added for cases with 
particular values of those covariates. Because `mom_age * 1` added no 
interaction term, 0.3261 is not a sum of anything: it is a single mom_age 
slope shared by the cases with both values of mom_hs.

These sound like questions whose answers are typically learned in a first 
course on regression. So the answers _should_ all be in whatever standard 
regression textbook you _should_ be reading. They are only borderline on-topic 
for rhelp. We don't advertise as a statistics tutoring service, so I think any 
follow-up questions on this matter of interpreting model output should be 
directed to CrossValidated.com

As the standard sig says: Please read the Posting Guide and the second line as 
well.
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
