Dear R-users,

I have 1 remark and 1 question on the inclusion of interactions in the gam 
function from the gam package.

I need to fit quantitative predictors in interactions with factors. You can see 
an example of what I need in fig 9.13 p265  from Hastie and Tibshirani book 
(1990). 
It's clearly stated that in ?gam  "Interactions with nonparametric smooth terms 
are not fully supported".
I have found a trick in a former 
http://www.math.yorku.ca/Who/Faculty/Monette/S-news/2284.html, using NAs and 
na.gam.replace argument, but some points are still unclear for me.

First the prediction of new data (using predict function) is not so easy (see 
script below), and need a close reading from section 7.3.2 of the Chambers and 
Hastie (1992).

Second I need to have the same intercept for all levels of factor and this not 
achievable with this trick. My question is : why not replacing NA by 0 (or any 
other particular value) ?

Here is a quite long (sorry for that) script with a generated dataset to better 
undestand my question.
in this script the model to fit is (in a GLM-like writing) : y~s(x2):x1
the generated dataset follows this model and y(x2=0)=10 whatever x1.

########################
#start of script
########################

#data construction  (with deliberately very small noise)
data1=data.frame(x1=rep(NA,27),x2=NA,y=NA)

data1$x1=factor(c(rep(1,11),rep(2,11),rep(3,5)))
data1$x2=c(rep(0:10,2),0:4)

data1[data1$x1==1,"y"]=data1[data1$x1==1,"x2"]^4*5+rnorm(11)+10000
data1[data1$x1==2,"y"]=data1[data1$x1==2,"x2"]^4*(-3)+rnorm(11)+10000
data1[data1$x1==3,"y"]=10000*data1[data1$x1==3,"x2"]+rnorm(5)+10000

library(lattice)
xyplot(data1$y~data1$x2,groups=data1$x1)

#creation of dummy variables for interactions
data1$x2_1=ifelse(data1$x1=="1",data1$x2,NA)
data1$x2_2=ifelse(data1$x1=="2",data1$x2,NA)
data1$x2_3=ifelse(data1$x1=="3",data1$x2,NA)

#model fitting
library(gam)
model=gam(y~s(x2_1)+s(x2_2)+s(x2_3)+x1,data=data1,na=na.gam.replace)

#prediction fit well data :
summary(model)
plot(data1$x2,data1$y)
points(data1$x2,model$fitted.value,col="red",pch="+")

#trying to see prediction:
predict(model) #does work
predict(model,newdata=data1) #produce NA

#trying to replace NA in data1 by mean, to mimic na.gam.replace:
Ndata=data1
Ndata$x2_1=ifelse(data1$x1=="1",data1$x2,mean(data1$x2_1,na.rm=TRUE))
Ndata$x2_2=ifelse(data1$x1=="2",data1$x2,mean(data1$x2_2,na.rm=TRUE))
Ndata$x2_3=ifelse(data1$x1=="3",data1$x2,mean(data1$x2_3,na.rm=TRUE))

predict(model,Ndata)-predict(model) #as you can see there is a systematic biais

#correct way to predict (=returned 0 for terms with NA value):
p=predict(model,data1,type="term")
rowSums(cbind(p,attr(p,"constant")),na.rm=TRUE)-predict(model)

#alternative solution, 0 instead of NA
data1$v1=ifelse(data1$x1=="1",data1$x2,0)
data1$v2=ifelse(data1$x1=="2",data1$x2,0)
data1$v3=ifelse(data1$x1=="3",data1$x2,0)

model1=gam(y~s(v1)+s(v2)+s(v3),data=data1)
summary(model1)
points(data1$x2,predict(model1,data1),col="green",pch="X")
#no particular problem with predict function

#what's happened in x2=0 ?
predict(model)[data1$x2==0]
predict(model1)[data1$x2==0]

########################
#end of script
########################

thanks in advance
best regards
Laurent Beaulaton

---------------------------------------------
Laurent Beaulaton
###############################
# NEW !!!!                                       #
#  http://www.laurent-beaulaton.fr/    #
# Tel + 33 (0)5 57 89 27 17             #
###############################
---------------------------------------------
Cemagref (French Institute of Agricultural and Environmental Engineering 
Research )
Unité "Ecosystèmes estuariens et poissons migrateurs amphihalins"
(anciennement Unité "Ressources aquatiques continentales")
50 avenue de Verdun
F 33612 Cestas Cedex

Tel + 33 (0)5 57 89 27 17
Fax + 33 (0)5 57 89 08 01
mailto:[EMAIL PROTECTED]

http://www.laurent-beaulaton.fr/ 
http://www.bordeaux.cemagref.fr/rabx/

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to