Re: [R] Understanding why a GAM can't suppress an intercept

2013-04-17 Thread Simon Wood

hi Andrew.

gam does suppress the intercept; it's just that this doesn't force the 
smooth through the intercept in the way that you would like. Basically, 
for the parametric component of the model '-1' behaves exactly as it 
does in 'lm' (it uses the same code). The smooths are 'added on' to 
the parametric component of the model, with sum-to-zero constraints to 
force identifiability.
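
To see the centering effect outside mgcv, here is a base-R sketch (not mgcv itself: `poly` stands in for a spline basis, and I centre the columns by hand the way the sum-to-zero constraint does):

```r
set.seed(1)
x <- runif(50)
y <- 1 + x^2 + rnorm(50, sd = 0.1)

## a stand-in basis; centre each column so it sums to zero over the data,
## which is what the identifiability constraint does to a smooth
X  <- poly(x, 3, raw = TRUE)
Xc <- sweep(X, 2, colMeans(X))   ## columns now sum to zero

b <- lm(y ~ Xc - 1)              ## '-1': no intercept, centred basis

## the fitted values sum to zero over the data...
sum(fitted(b))                   ## ~ 0 (up to rounding)

## ...but the curve is not forced through the origin: the centred
## basis evaluated at x = 0 is -colMeans(X), generally giving a
## nonzero fitted value there
x0 <- -colMeans(X)
drop(x0 %*% coef(b))
```

Since the data here have mean around 1, a curve whose fitted values must sum to zero sits a long way below them: that is exactly the "smooth far from the data" effect.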


There is a solution to forcing a spline through a particular point at
http://r.789695.n4.nabble.com/Use-pcls-in-quot-mgcv-quot-package-to-achieve-constrained-cubic-spline-td4660966.html
(i.e. the R help thread Re: [R] Use pcls in mgcv package to achieve 
constrained cubic spline)


best,
Simon

On 16/04/13 22:36, Andrew Crane-Droesch wrote:

  Dear List,

I've just tried to specify a GAM without an intercept -- I've got one
of the (rare) cases where it is appropriate for E(y) -> 0 as X -> 0.
Naively running a GAM with -1 appended to the formula and then calling
predict.gam, I see that the model isn't behaving as expected.

I don't understand why this would be.  Google turns up this old R help
thread: http://r.789695.n4.nabble.com/GAM-without-intercept-td4645786.html

Simon writes:

 Smooth terms are constrained to sum to zero over the covariate
 values. This is an identifiability constraint designed to avoid
 confounding with the intercept (particularly important if you have
 more than one smooth).
 If you remove the intercept from your model altogether (m2) then the
 smooth will still sum to zero over the covariate values, which in
 your case will mean that the smooth is quite a long way from the
 data. When you include the intercept (m1) then the intercept is
 effectively shifting the constrained curve up towards the data, and
 you get a nice fit.

Why?  I haven't read Simon's book in great detail, though I have read
Ruppert et al.'s Semiparametric Regression.  I don't see a reason why
a penalized spline model shouldn't equal the intercept (or zero) when
all of the regressors equal zero.

Is anyone able to help with a bit of intuition?  Or relevant passages
from a good description of why this would be the case?

Furthermore, why is the -1 formula specification accepted at all if it
doesn't behave as it does in, for example, lm?

Many thanks,
Andrew







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283



Re: [R] Understanding why a GAM can't suppress an intercept

2013-04-17 Thread Andrew Crane-Droesch
Simon,

Many thanks as always for your help.

I see and appreciate the example that you cited, but I'm having a hard 
time generalizing it to a multivariate case.  A bit about my context -- 
my dataset consists of response ratios: the log of a treatment outcome 
over a control outcome.  One of my explanatory variables is treatment 
intensity.  When intensity goes to zero, the expectation of the 
response ratio should also go to zero.  Here is the model that I would 
like to fit:

model <- ResponseRatio ~
    s(as.factor(study), bs = "re", by = intensity) +
    s(intensity) +
    s(x1, by = intensity) +
    s(x2, by = intensity) +
    te(x1, x2, by = intensity) +
    te(x1, intensity)

Here is the example that you gave:

library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4 - 1; x <- sort(x)
f <- exp(4*x)/(1 + exp(4*x)); y <- f + rnorm(100)*0.1; plot(x, y)
dat <- data.frame(x = x, y = y)

## Create a spline basis and penalty, making sure there is a knot
## at the constraint point (0 here, but could be anywhere)
knots <- data.frame(x = seq(-1, 3, length = 9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x, k = 9, bs = "cr"), dat, knots = knots)[[1]]

## 3rd parameter is value of spline at knot location 0,
## set it to 0 by dropping...
X <- sm$X[, -3]        ## spline basis
S <- sm$S[[1]][-3, -3] ## spline penalty
off <- y*0 + .6        ## offset term to force curve through (0, .6)

## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off), paraPen = list(X = list(S)))
lines(x, predict(b))

## compare to unconstrained fit...
b.u <- gam(y ~ s(x, k = 9), data = dat, knots = knots)
lines(x, predict(b.u))

*My question*:  how can I extend your example to more than one smooth 
term, and to several smooth interactions, in the context of a model 
where E[ResponseRatio] must equal 0 when intensity equals zero?
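
One thing I do notice (a base-R sketch of the by= mechanics as I understand them, with `poly` again standing in for a spline basis): if every term in the model carries by=intensity, the linear predictor is a sum of pieces each scaled by intensity, and so is exactly zero wherever intensity is zero:

```r
set.seed(2)
n <- 20
intensity <- c(0, runif(n - 1))   ## first observation has zero intensity
x1 <- runif(n)

## stand-in basis for s(x1); by=intensity multiplies every basis
## column by the by variable, i.e. scales each row of X by intensity[i]
X   <- poly(x1, 3, raw = TRUE)
Xby <- X * intensity

beta <- rnorm(3)
eta  <- drop(Xby %*% beta)        ## linear predictor, no intercept

eta[1]                            ## exactly 0 where intensity == 0
```

So if I have read this right, a model built entirely from by=intensity terms (and no intercept) satisfies the E = 0 at intensity = 0 constraint by construction.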

I am not sure that I understand exactly what is going on when you call 
smoothCon to specify `sm`.  I've written my own penalized spline 
code from scratch, but it is far less sophisticated than mgcv: it is 
basically a ridge regression where I optimize to get a lambda after 
specifying a model with a lot of knots.  mgcv clearly has a lot more 
going on, and is far preferable to my rudimentary code for its handling 
of tensor product smooths and random effects.
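
From the documentation, my understanding is that smoothCon just sets up the basis and penalty for one smooth term without fitting anything, so the matrices can be edited by hand (as in your column drop) before being passed on to gam via paraPen, or to pcls:

```r
library(mgcv)
set.seed(0)
dat <- data.frame(x = runif(100))

## build the design matrix and penalty for s(x), no fitting involved
sm <- smoothCon(s(x, k = 9, bs = "cr"), data = dat)[[1]]

dim(sm$X)       ## 100 x 9: one column per basis function
dim(sm$S[[1]])  ## 9 x 9: the wiggliness penalty matrix
```

Is that a fair summary of what your example relies on?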

(Also, for prediction, how can I use by=dum and by=intensity at the 
same time?)

Many thanks,
Andrew

PS:  I am aware that interacting my model with an intensity variable 
makes my model quite heteroskedastic.  I am thinking of using a cluster 
wild bootstrap to construct confidence intervals.  If a better way 
forward immediately comes to your mind -- especially if it's 
computationally cheaper -- I'd greatly appreciate it if you could share it.


[R] Understanding why a GAM can't suppress an intercept

2013-04-16 Thread Andrew Crane-Droesch
  Dear List,

 I've just tried to specify a GAM without an intercept -- I've got one 
 of the (rare) cases where it is appropriate for E(y) -> 0 as X -> 0.  
 Naively running a GAM with -1 appended to the formula and then calling 
 predict.gam, I see that the model isn't behaving as expected.

 I don't understand why this would be.  Google turns up this old R help 
 thread: http://r.789695.n4.nabble.com/GAM-without-intercept-td4645786.html

 Simon writes:

 Smooth terms are constrained to sum to zero over the covariate
 values. This is an identifiability constraint designed to avoid
 confounding with the intercept (particularly important if you have
 more than one smooth).
 If you remove the intercept from your model altogether (m2) then the
 smooth will still sum to zero over the covariate values, which in
 your case will mean that the smooth is quite a long way from the
 data. When you include the intercept (m1) then the intercept is
 effectively shifting the constrained curve up towards the data, and
 you get a nice fit.

 Why?  I haven't read Simon's book in great detail, though I have read 
 Ruppert et al.'s Semiparametric Regression.  I don't see a reason why 
 a penalized spline model shouldn't equal the intercept (or zero) when 
 all of the regressors equal zero.

 Is anyone able to help with a bit of intuition?  Or relevant passages 
 from a good description of why this would be the case?

 Furthermore, why is the -1 formula specification accepted at all if it 
 doesn't behave as it does in, for example, lm?

 Many thanks,
 Andrew





