Simon,
Many thanks as always for your help.
I see and appreciate the example that you cited, but I'm having a hard
time generalizing it to a multivariate case. A bit about my context:
my response variable is a response ratio, i.e. the log of a treatment
outcome over a control outcome.
One of my explanatory variables is treatment intensity. When intensity
goes to zero, the expectation of the response ratio should also go to
zero. Here is the model that I would like to fit:
model <- gam(ResponseRatio ~
    s(study, bs = "re", by = intensity)  # study coded as a factor
  + s(intensity)
  + s(x1, by = intensity)
  + s(x2, by = intensity)
  + te(x1, x2, by = intensity)
  + te(x1, intensity)
)
Here is the example that you gave:
library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4-1;x <- sort(x);
f <- exp(4*x)/(1+exp(4*x));y <- f+rnorm(100)*0.1;plot(x,y)
dat <- data.frame(x=x,y=y)
## Create a spline basis and penalty, making sure there is a knot
## at the constraint point, (0 here, but could be anywhere)
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]
## 3rd parameter is value of spline at knot location 0,
## set it to 0 by dropping...
X <- sm$X[,-3] ## spline basis
S <- sm$S[[1]][-3,-3] ## spline penalty
off <- y*0 + .6 ## offset term to force curve through (0, .6)
## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
lines(x,predict(b))
## compare to unconstrained fit...
b.u <- gam(y ~ s(x,k=9),data=dat,knots=knots)
lines(x,predict(b.u))
*My question*: how can I extend your example to more than one smooth
term, and to several smooth interactions, in the context of a model
where E[ResponseRatio] must equal 0 when intensity equals zero?
I am not sure that I understand exactly what is going on when you call
smoothCon to construct `sm`. I've written my own penalized spline code
from scratch, but it is far less sophisticated than mgcv: it is
basically a ridge regression in which I choose a lambda by optimization
after specifying a model with a lot of knots. mgcv clearly has a lot
more going on, and is far preferable to my rudimentary code for its
handling of tensor products and random effects.
(Also, for prediction, how can I do "by=dum" AND "by=intensity" at the
same time?)
Many thanks,
Andrew
PS: I am aware that interacting my model with an intensity variable
makes my model quite heteroskedastic. I am thinking of using a cluster
wild bootstrap to construct confidence intervals. If a better way
forward immediately comes to your mind -- especially if it's
computationally cheaper -- I'd greatly appreciate it if you could share it.
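By "cluster wild bootstrap" I mean something like the following sketch
(base R, with a plain lm() standing in for the actual GAM and `cluster`
standing in for `study`; Rademacher weights flip residuals at the
cluster level):

```r
## Cluster wild bootstrap sketch: perturb the response by flipping the
## sign of whole clusters' residuals, then refit and collect the slope.
set.seed(2)
n <- 300
cluster <- rep(1:30, each = 10)              # 30 clusters of 10 obs
x <- runif(n)
y <- 1 + 2 * x + rnorm(30)[cluster] + rnorm(n, sd = 0.5 * x)  # heteroskedastic

fit <- lm(y ~ x)                             # stand-in for the real model
res <- resid(fit)
fhat <- fitted(fit)

B <- 499
coefs <- replicate(B, {
  ## one Rademacher (+/-1) weight per cluster (clusters coded 1..30 here)
  w <- sample(c(-1, 1), 30, replace = TRUE)[cluster]
  coef(lm(fhat + w * res ~ x))["x"]          # refit on perturbed response
})
ci <- quantile(coefs, c(0.025, 0.975))       # percentile interval for slope
```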
On 04/17/2013 02:16 AM, Simon Wood wrote:
> hi Andrew.
>
> gam does suppress the intercept, it's just that this doesn't force the
> smooth through the intercept in the way that you would like. Basically
> for the parametric component of the model '-1' behaves exactly like
> it does in 'lm' (it's using the same code). The smooths are 'added on'
> to the parametric component of the model, with sum to zero constraints
> to force identifiability.
>
> There is a solution to forcing a spline through a particular point at
> http://r.789695.n4.nabble.com/Use-pcls-in-quot-mgcv-quot-package-to-achieve-constrained-cubic-spline-td4660966.html
>
>
> (i.e. the R help thread "Re: [R] Use pcls in "mgcv" package to achieve
> constrained cubic spline")
>
> best,
> Simon
>
> On 16/04/13 22:36, Andrew Crane-Droesch wrote:
>>> Dear List,
>>>
>>> I've just tried to specify a GAM without an intercept -- I've got one
>>> of the (rare) cases where it is appropriate for E(y) -> 0 as X -> 0.
>>> Naively running a GAM with "-1" appended to the formula and then
>>> calling "predict.gam", I see that the model isn't behaving as expected.
>>>
>>> I don't understand why this would be. Google turns up this old R help
>>> thread:
>>> http://r.789695.n4.nabble.com/GAM-without-intercept-td4645786.html
>>>
>>> Simon writes:
>>>
>>> *Smooth terms are constrained to sum to zero over the covariate
>>> values. This is an identifiability constraint designed to avoid
>>> confounding with the intercept (particularly important if you have
>>> more than one smooth).* If you remove the intercept from your model
>>> altogether (m2) then the smooth will still sum to zero over the
>>> covariate values, which in your case will mean that the smooth is
>>> quite a long way from the data. When you include the intercept (m1)
>>> then the intercept is effectively shifting the constrained curve up
>>> towards the data, and you get a nice fit.
>>>
>>> Why? I haven't read Simon's book in great detail, though I have read
>>> Ruppert et al.'s Semiparametric Regression. I don't see a reason why
>>> a penalized spline model shouldn't equal the intercept (or zero) when
>>> all of the regressors equal zero.
>>>
>>> Is anyone able to help with a bit of intuition? Or relevant passages
>>> from a good description of why this would be the case?
>>>
>>> Furthermore, why is the "-1" formula specification accepted if it
>>> doesn't work as it does in, for example, lm?
>>>
>>> Many thanks,
>>> Andrew
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>