# Re: [R] How are interaction terms computed in lm's result / problems with interaction terms in lm?

```
> On Sep 18, 2016, at 11:01 AM, mviljamaa <mvilja...@kapsi.fi> wrote:
>
> Also if you, rather than doing what's done below, do:
>
> fit3 <- lm(kidmomhsage\$kid_score ~ kidmomhsage\$mom_age + kidmomhsage\$mom_hs +
> kidmomhsage\$mom_age * kidmomhsage\$mom_hs)
>
> Then this gives the result:
>
> Call:
> lm(formula = kidmomhsage\$kid_score ~ kidmomhsage\$mom_age + kidmomhsage\$mom_hs
> +
>    kidmomhsage\$mom_age * kidmomhsage\$mom_hs)
>
> Coefficients:
>                           (Intercept)
>                               110.542
>                   kidmomhsage\$mom_age
>                                -1.522
>                    kidmomhsage\$mom_hs
>                               -41.287
> kidmomhsage\$mom_age:kidmomhsage\$mom_hs
>                                 2.391
>
> Where the interaction term now seems properly interpretable. So perhaps this
> is the way to use interaction terms with lm.
>
> However, in the above, is the coefficient 2.391 of
> kidmomhsage\$mom_age:kidmomhsage\$mom_hs actually only that for mom_hs == 1 in
> which case for mom_hs == 0 one would simply ignore the last coefficient?
```
Yes.

In all of this it would be much clearer and safer if you supplied a data frame
to the data argument of lm:

lm(formula = kid_score ~ mom_age + mom_hs + mom_age*mom_hs, data = kidmomhsage)
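
A minimal sketch of that call, with simulated data standing in for kidmomhsage
(the numbers are invented; only the variable names come from this thread). It
also shows that `mom_age*mom_hs` already expands to both main effects plus the
interaction, so the shorter formula should give the same fit:

```r
# Simulated stand-in for the real data; all values here are made up.
set.seed(1)
n <- 200
mom_age   <- sample(17:29, n, replace = TRUE)
mom_hs    <- rbinom(n, 1, 0.8)
kid_score <- 100 - 1.5 * mom_age - 40 * mom_hs +
  2.4 * mom_age * mom_hs + rnorm(n, sd = 15)
kidmomhsage <- data.frame(kid_score, mom_age, mom_hs)
rm(kid_score, mom_age, mom_hs)  # make sure lm() finds them in `data` only

# Long form from the thread and the equivalent short form.
fit_long  <- lm(kid_score ~ mom_age + mom_hs + mom_age * mom_hs,
                data = kidmomhsage)
fit_short <- lm(kid_score ~ mom_age * mom_hs, data = kidmomhsage)

names(coef(fit_short))                       # no data-frame prefix in the names
all.equal(coef(fit_long), coef(fit_short))   # same coefficients either way
```

With the data argument the coefficient names are just the column names, which
makes the printed output much easier to read.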

>
> And would one still need to perform summations of kidmomhsage\$mom_age and
> kidmomhsage\$mom_age:kidmomhsage\$mom_hs coefficients, i.e. the coefficient for
> kidmomhsage\$mom_age = -1.522 + 2.391?

Yes, at least if I'm understanding your terminology. That sum (-1.522 + 2.391 =
0.869) is the net mom_age coefficient for those subjects with mom_hs values not
at the base level.
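
As a small illustration of that arithmetic, here are the coefficients printed
for fit3 above, typed in by hand. The short names are my own shorthand for the
longer `kidmomhsage` labels in the output:

```r
# Coefficients as printed for fit3 earlier in this thread (short names assumed).
b <- c("(Intercept)" = 110.542, mom_age = -1.522,
       mom_hs = -41.287, "mom_age:mom_hs" = 2.391)

# Slope of kid_score on mom_age within each mom_hs group.
slope_no_hs <- b[["mom_age"]]                           # mom_hs == 0
slope_hs    <- b[["mom_age"]] + b[["mom_age:mom_hs"]]   # mom_hs == 1

# The same logic applies to the intercepts of the two fitted lines.
int_no_hs <- b[["(Intercept)"]]
int_hs    <- b[["(Intercept)"]] + b[["mom_hs"]]

round(c(slope_no_hs = slope_no_hs, slope_hs = slope_hs,
        int_no_hs = int_no_hs, int_hs = int_hs), 3)
```

So the interaction model amounts to two separate lines, one per mom_hs group.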

>
>
> On 2016-09-18 20:41, mviljamaa wrote:
>> I'm trying to use interaction terms in lm and for the following types of
>> models:
>> fit3_hs <- lm(kidmomhsage\$kid_score ~ kidmomhsage\$mom_age +
>> kidmomhsage\$mom_hs + kidmomhsage\$mom_age * 1)
>> fit3_nohs <- lm(kidmomhsage\$kid_score ~ kidmomhsage\$mom_age +
>> kidmomhsage\$mom_hs + kidmomhsage\$mom_age * 0)
>> where you see the last term being the interaction term (it's
>> mom_age*mom_hs where mom_hs takes values 0 or 1), the results are
>> causing a bit of confusion.
>> fit3_hs returns:
>> Call:
>> lm(formula = kidmomhsage\$kid_score ~ kidmomhsage\$mom_age +
>> kidmomhsage\$mom_hs +
>>    kidmomhsage\$mom_age * 1)
>> Coefficients:
>>        (Intercept)  kidmomhsage\$mom_age
>>            70.4787               0.3261
>> kidmomhsage\$mom_hs
>>            11.3112
>> fit3_nohs returns:
>> Call:
>> lm(formula = kidmomhsage\$kid_score ~ kidmomhsage\$mom_age +
>> kidmomhsage\$mom_hs +
>>    kidmomhsage\$mom_age * 0)
>> Coefficients:
>> kidmomhsage\$mom_age   kidmomhsage\$mom_hs
>>              3.368               11.568
>> Now why is (Intercept) term missing from the second one?

In R, the formula terms `1` and `0` have special meaning: they stand for the
intercept and its removal, not for numbers to multiply by. In the first model
you "formula-added" mom_age to mom_age and got, not 2*mom_age, but just
mom_age. In the second model you got the formula equivalent of `mom_age +
mom_hs + 0`, which is an intercept-free specification. Read:

?formula
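
One way to see this without fitting anything is to ask `terms()` what a formula
expands to. This is only a sketch; `y`, `x` and `z` are placeholder names and
need not exist as variables:

```r
# Placeholder formulas mirroring the two models in the question.
f1 <- y ~ x + z + x * 1
f0 <- y ~ x + z + x * 0

# Terms that survive formula expansion: no doubled or extra x term appears.
attr(terms(f1), "term.labels")
attr(terms(f0), "term.labels")

# Intercept flag: 1 means an intercept is fitted, 0 means it was removed.
attr(terms(f1), "intercept")
attr(terms(f0), "intercept")
```

Both formulas end up with the same two predictors; they differ only in whether
the intercept is kept, which matches the two printouts in the question.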

I thought I remembered a pithy summary of this topic by Greg Snow in the
fortunes package, on why one should almost never use intercept-free models, but
it's not showing up for me. Perhaps some of these Rhelp threads will be useful:

http://markmail.org/message/o7kbarfvpdobmdir?q=list:org%2Er-project%2Er-help+snow+intercept+0

You could easily substitute 'ripley', 'lumley' or several other names in that
search strategy in Rhelp's archives and get equally credible material.

>> Also since in the first one the interaction term's coefficient should
>> be added to the coefficient of mom_age, then is the return value of
>> kidmomhsage\$mom_age 0.3261 the sum of the coefficient of mom_age and
>> the coefficient of the interaction term? Or would I need to produce
>> the sum myself somehow?

In the first one there is no interaction term at all, so 0.3261 is not a sum of
anything. The intercept is the predicted mean score for a mom_age of zero and a
mom_hs at the base value, so it is essentially a reference value to which the
_age and _hs increments or decrements are added for cases with particular
values of those covariates. The mom_age coefficient is a single slope, averaged
over the cases with both values of mom_hs.
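
To make the "reference value plus increments" reading concrete, here is a
sketch using the fit3_hs coefficients printed above. `predict_score` is a
hypothetical helper written only for this illustration:

```r
# Coefficients as printed for fit3_hs (the model without an interaction).
b <- c(intercept = 70.4787, mom_age = 0.3261, mom_hs = 11.3112)

# Hypothetical helper: reference value plus the age and hs increments.
predict_score <- function(mom_age, mom_hs) {
  b[["intercept"]] + b[["mom_age"]] * mom_age + b[["mom_hs"]] * mom_hs
}

# One extra year of mom_age changes the prediction by the same amount
# in both mom_hs groups, because the model has a single shared slope.
step_no_hs <- predict_score(26, 0) - predict_score(25, 0)
step_hs    <- predict_score(26, 1) - predict_score(25, 1)

# The gap between the groups is the mom_hs coefficient at any age.
gap_at_25 <- predict_score(25, 1) - predict_score(25, 0)

c(step_no_hs = step_no_hs, step_hs = step_hs, gap_at_25 = gap_at_25)
```

The two fitted lines are parallel here; only a model with the interaction term
lets the slopes differ between groups.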

These sound like questions whose answers are typically learned in a first
course on regression. So the answers _should_ all be in whatever standard
regression textbook you _should_ be reading. They are only borderline on-topic
for rhelp. We don't advertise as a statistics tutoring service, so I think any
follow-up questions on interpreting model output should be directed to
CrossValidated.com.

As the standard sig says: Please read the Posting Guide and the second line as
well.