Hi,

I am using R to fit statistical models to data where the observations are
means of the original data.  R is used to calculate the means before fitting
the model.  My problem is: when R calculates the means using tapply, the
class of the means differs from the class of the original data, which gives
me trouble when I want to use the original data to calculate model
predictions.  Here is a simple example that demonstrates the problem:

> data.in<-read.table('example.dat',header=TRUE)
>
> #Here are the data:
> data.in
  location    x      y
1        A 17.2  28.46
2        A 91.7 143.33
3        A 93.6 148.05
4        B 95.8 150.28
5        B 54.9  89.49
6        B 51.1  82.51
7        C 53.9  88.46
8        C 40.3  63.62
9        C 38.5  64.46
>
> attach(data.in)
>
> #Calculate means by variable "location":
> data.mn<-data.frame(xm = tapply(x,location,mean),
+                     ym = tapply(y,location,mean))
> detach(data.in)
>
> #Here are the means:
> data.mn
        xm       ym
A 67.50000 106.6133
B 67.26667 107.4267
C 44.23333  72.1800
>
> #Fit the model:
> mod1<-lm(ym ~ xm, data.mn)
>
> mod1

Call:
lm(formula = ym ~ xm, data = data.mn)

Coefficients:
(Intercept)           xm
      5.633        1.505

> #R will make "predictions" using the data.mn data frame:
> predict(mod1,newdata =  data.mn)
        A         B         C
107.19260 106.84153  72.18587
>
> #But, even if new variables are created in the original data
> #with names that match those names used in the regression:
> data.in$xm<-data.in$x
> data.in$ym<-data.in$y
> data.in
  location    x      y   xm     ym
1        A 17.2  28.46 17.2  28.46
2        A 91.7 143.33 91.7 143.33
3        A 93.6 148.05 93.6 148.05
4        B 95.8 150.28 95.8 150.28
5        B 54.9  89.49 54.9  89.49
6        B 51.1  82.51 51.1  82.51
7        C 53.9  88.46 53.9  88.46
8        C 40.3  63.62 40.3  63.62
9        C 38.5  64.46 38.5  64.46
>
> #R will not use data.in to make predictions:
> predict(mod1,newdata = data.in)
Error: variable 'xm' was fitted with class "other" but class "numeric" was supplied
>
> data.in$xm
[1] 17.2 91.7 93.6 95.8 54.9 51.1 53.9 40.3 38.5
> data.mn$xm
       A        B        C
67.50000 67.26667 44.23333
>

Is there a way to make these variables have the same class?  Or, is there
something other than "tapply" that will work better for this?
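
For reference, here is a minimal sketch of one possible workaround. It assumes the mismatch arises because tapply returns a one-dimensional array rather than a plain numeric vector; whether predict() actually complains about this may depend on the R version. The example data are rebuilt inline instead of being read from example.dat, and the names xm.arr, data.ag, newdat and pred are made up for illustration.

```r
# Rebuild the example data inline so the sketch is self-contained
# (values copied from the listing above).
data.in <- data.frame(
  location = factor(rep(c("A", "B", "C"), each = 3)),
  x = c(17.2, 91.7, 93.6, 95.8, 54.9, 51.1, 53.9, 40.3, 38.5),
  y = c(28.46, 143.33, 148.05, 150.28, 89.49, 82.51, 88.46, 63.62, 64.46)
)

# tapply returns a one-dimensional array, not a plain numeric vector.
xm.arr <- tapply(data.in$x, data.in$location, mean)
is.array(xm.arr)   # TRUE
dim(xm.arr)        # 3

# Option 1: strip the array attributes with as.vector() before
# building the data frame of means.
data.mn <- data.frame(
  xm = as.vector(tapply(data.in$x, data.in$location, mean)),
  ym = as.vector(tapply(data.in$y, data.in$location, mean)),
  row.names = levels(data.in$location)
)

# Option 2: aggregate() computes the same group means and returns
# ordinary numeric columns (named x and y here, plus location).
data.ag <- aggregate(data.in[c("x", "y")],
                     by = list(location = data.in$location),
                     FUN = mean)

# Fit the model on the means, then predict from the original observations.
mod1 <- lm(ym ~ xm, data = data.mn)
newdat <- data.frame(xm = data.in$x)   # hypothetical helper data frame
pred <- predict(mod1, newdata = newdat)
```

With either option the columns holding the means are plain numeric vectors, so they should have the same class as data.in$x when passed to predict.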

Thanks!
