Re: [R] testing non-linear component in mgcv:gam

2005-10-24 Thread Simon Wood
In this case the models being compared are really identical, and the
P-value is meaningless numerical noise.

If your main focus is hypothesis testing and you really need near exact
p-values, then do this sort of testing using unpenalized models. i.e.
don't have mgcv::gam estimate the EDF of the smooth, just use
`s(...,fx=TRUE)' to estimate it unpenalized. This gives horrible point
estimates and excellent p-values.

best,
Simon

 Hi,

 I need further help with my GAMs. Most models I test are very
 obviously non-linear. Yet, to be on the safe side, I report the
 significance of the smooth (default output of mgcv's summary.gam) and
 confirm it deviates significantly from linearity.

 I do the latter by fitting a second model where the same predictor is
 entered without the s(), and then use anova.gam to compare the two. I
 thought this was the equivalent of the default output of anova.gam
 using package gam instead of mgcv.

 I wonder if this procedure is correct because one of my models
 appears to be linear. In fact mgcv estimates df to be exactly 1.0 so
 I could have stopped there. However I inadvertently repeated the
 procedure outlined above. I would have thought in this case the
 anova.gam comparing the smooth and the linear fit would for sure have
 been not significant. To my surprise, P was 6.18e-09!

 Am I doing something wrong when I attempt to confirm the non-
 parametric part a smoother is significant? Here is my example case
 where the relationship does appear to be linear:

 library(mgcv)
  This is mgcv 1.3-7
 Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 0.38, 0.62,
 0.88, 1.12,
 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
 3.62, 3.88,
 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
 6.38, 6.62, 6.88,
 7.12, 8.38, 13.62)
 N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,
 22, 26, 24, 23,
  15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,
 1, 1, 1)
 wm.sed - c(0.0, 0.016129032, 0.0, 0.062046512,
 0.396459596, 0.189082949,
  0.054757925, 0.142810440, 0.168005168, 0.180804428,
 0.111439628, 0.128799505,
  0.193707937, 0.105921610, 0.103497845, 0.028591837,
 0.217894389, 0.020535469,
  0.080389068, 0.105234450, 0.070213450, 0.050771363,
 0.042074434, 0.102348837,
  0.049748344, 0.019100478, 0.005203125, 0.101711864,
 0.0, 0.0,
  0.014808824, 0.0, 0.22200, 0.16700,
 0.0, 0.0,
  0.0)

 sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
 summary.gam(sed.gam)
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ s(Temp)
 
  Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.084030.01347   6.241 3.73e-07 ***
  ---
  Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
 
  Approximate significance of smooth terms:
  edf Est.rank F  p-value
  s(Temp)   11 13.95 0.000666 ***
  ---
  Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
 
  R-sq.(adj) =  0.554   Deviance explained = 28.5%
  GCV score = 0.09904   Scale est. = 0.093686  n = 37

 # testing non-linear contribution
 sed.lin - gam(wm.sed~Temp,weight=N.sets)
 summary.gam(sed.lin)
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ Temp
 
  Parametric coefficients:
   Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
  Temp-0.023792   0.006369  -3.736 0.000666 ***
  ---
  Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
 
 
  R-sq.(adj) =  0.554   Deviance explained = 28.5%
  GCV score = 0.09904   Scale est. = 0.093686  n = 37
 anova.gam(sed.lin, sed.gam, test=F)
  Analysis of Deviance Table
 
  Model 1: wm.sed ~ Temp
  Model 2: wm.sed ~ s(Temp)
 Resid. Df Resid. Dev Df  Deviance  F   Pr(F)
  1 3.5000e+01  3.279
  2 3.5000e+01  3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***


 Thanks in advance,


 Denis Chabot

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] testing non-linear component in mgcv:gam

2005-10-24 Thread Simon Wood
Looks like a bug in mgcv::summary.gam when the model is strictly
parametric... I'll take a look and fix it. thanks, Simon


 In my original message I mentioned a gam fit that turns out to be a
 linear fit. By curiosity I analysed it with a linear predictor only
 with mgcv package, and then as a linear model. The output was
 identical in both, but the r-sq (adj) was 0.55 in mgcv and 0.26 in
 lm. In lm I hope that my interpretation that 26% of the variance in y
 is explained by the linear relationship with x is valid. Then what
 does r2 mean in mgcv?

 Denis
   summary.gam(lin)

 Family: gaussian
 Link function: identity

 Formula:
 wm.sed ~ Temp

 Parametric coefficients:
   Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
 Temp-0.023792   0.006369  -3.736 0.000666 ***
 ---
 Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1


 R-sq.(adj) =  0.554   Deviance explained = 28.5%
 GCV score = 0.09904   Scale est. = 0.093686  n = 37


   summary(sed.true.lin)

 Call:
 lm(formula = wm.sed ~ Temp, weights = N.sets)

 Residuals:
  Min  1Q  Median  3Q Max
 -0.6138 -0.1312 -0.0325  0.1089  1.1449

 Coefficients:
   Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
 Temp-0.023792   0.006369  -3.736 0.000666 ***
 ---
 Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

 Residual standard error: 0.3061 on 35 degrees of freedom
 Multiple R-Squared: 0.285,Adjusted R-squared: 0.2646
 F-statistic: 13.95 on 1 and 35 DF,  p-value: 0.000666


 Le 05-10-05 à 09:45, John Fox a écrit :

  Dear Denis,
 
  Take a closer look at the anova table: The models provide identical
  fits to
  the data. The differences in degrees of freedom and deviance
  between the two
  models are essentially zero, 5.5554e-10 and 2.353e-11 respectively.
 
  I hope this helps,
   John
 
  
  John Fox
  Department of Sociology
  McMaster University
  Hamilton, Ontario
  Canada L8S 4M4
  905-525-9140x23604
  http://socserv.mcmaster.ca/jfox
  
 
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Denis Chabot
  Sent: Wednesday, October 05, 2005 8:22 AM
  To: r-help@stat.math.ethz.ch
  Subject: [R] testing non-linear component in mgcv:gam
 
  Hi,
 
  I need further help with my GAMs. Most models I test are very
  obviously non-linear. Yet, to be on the safe side, I report
  the significance of the smooth (default output of mgcv's
  summary.gam) and confirm it deviates significantly from linearity.
 
  I do the latter by fitting a second model where the same
  predictor is entered without the s(), and then use anova.gam
  to compare the two. I thought this was the equivalent of the
  default output of anova.gam using package gam instead of mgcv.
 
  I wonder if this procedure is correct because one of my
  models appears to be linear. In fact mgcv estimates df to be
  exactly 1.0 so I could have stopped there. However I
  inadvertently repeated the procedure outlined above. I would
  have thought in this case the anova.gam comparing the smooth
  and the linear fit would for sure have been not significant.
  To my surprise, P was 6.18e-09!
 
  Am I doing something wrong when I attempt to confirm the non-
  parametric part a smoother is significant? Here is my example
  case where the relationship does appear to be linear:
 
  library(mgcv)
 
  This is mgcv 1.3-7
 
  Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
  0.38, 0.62, 0.88, 1.12,
  1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12,
  3.38, 3.62, 3.88,
  4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88,
  6.12, 6.38, 6.62, 6.88,
  7.12, 8.38, 13.62)
  N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27,
  29, 31, 22, 26, 24, 23,
   15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3,
  1, 1, 1, 1, 1) wm.sed - c(0.0, 0.016129032,
  0.0, 0.062046512, 0.396459596, 0.189082949,
   0.054757925, 0.142810440, 0.168005168,
  0.180804428, 0.111439628, 0.128799505,
   0.193707937, 0.105921610, 0.103497845,
  0.028591837, 0.217894389, 0.020535469,
   0.080389068, 0.105234450, 0.070213450,
  0.050771363, 0.042074434, 0.102348837,
   0.049748344, 0.019100478, 0.005203125,
  0.101711864, 0.0, 0.0,
   0.014808824, 0.0, 0.22200,
  0.16700, 0.0, 0.0,
   0.0)
 
  sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
  summary.gam(sed.gam)
 
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ s(Temp)
 
  Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.084030.01347   6.241 3.73e-07 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  Approximate significance of smooth terms:
  

Re: [R] testing non-linear component in mgcv:gam

2005-10-05 Thread Liaw, Andy
I think you probably should state more clearly the goal of your
analysis.  In such situation, estimation and hypothesis testing are
quite different.  The procedure that gives you the `best' estimate is
not necessarily the `best' for testing linearity of components.  If your
goal is estimation/prediction, why test linearity of components?  If the
goal is testing linearity, then you probably need to find the procedure
that gives you a good test, rather than relying on what gam() gives you.

Just my $0.02...

Andy

 From: Denis Chabot
 
 Hi,
 
 I need further help with my GAMs. Most models I test are very  
 obviously non-linear. Yet, to be on the safe side, I report the  
 significance of the smooth (default output of mgcv's 
 summary.gam) and  
 confirm it deviates significantly from linearity.
 
 I do the latter by fitting a second model where the same 
 predictor is  
 entered without the s(), and then use anova.gam to compare 
 the two. I  
 thought this was the equivalent of the default output of anova.gam  
 using package gam instead of mgcv.
 
 I wonder if this procedure is correct because one of my models  
 appears to be linear. In fact mgcv estimates df to be exactly 1.0 so  
 I could have stopped there. However I inadvertently repeated the  
 procedure outlined above. I would have thought in this case the  
 anova.gam comparing the smooth and the linear fit would for 
 sure have  
 been not significant. To my surprise, P was 6.18e-09!
 
 Am I doing something wrong when I attempt to confirm the non- 
 parametric part a smoother is significant? Here is my example case  
 where the relationship does appear to be linear:
 
 library(mgcv)
  This is mgcv 1.3-7
 Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 
 0.38, 0.62,  
 0.88, 1.12,
 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,  
 3.62, 3.88,
 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,  
 6.38, 6.62, 6.88,
 7.12, 8.38, 13.62)
 N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,  
 22, 26, 24, 23,
  15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,  
 1, 1, 1)
 wm.sed - c(0.0, 0.016129032, 0.0, 0.062046512,  
 0.396459596, 0.189082949,
  0.054757925, 0.142810440, 0.168005168, 0.180804428,  
 0.111439628, 0.128799505,
  0.193707937, 0.105921610, 0.103497845, 0.028591837,  
 0.217894389, 0.020535469,
  0.080389068, 0.105234450, 0.070213450, 0.050771363,  
 0.042074434, 0.102348837,
  0.049748344, 0.019100478, 0.005203125, 0.101711864,  
 0.0, 0.0,
  0.014808824, 0.0, 0.22200, 0.16700,  
 0.0, 0.0,
  0.0)
 
 sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
 summary.gam(sed.gam)
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ s(Temp)
 
  Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.084030.01347   6.241 3.73e-07 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  Approximate significance of smooth terms:
  edf Est.rank F  p-value
  s(Temp)   11 13.95 0.000666 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  R-sq.(adj) =  0.554   Deviance explained = 28.5%
  GCV score = 0.09904   Scale est. = 0.093686  n = 37
 
 # testing non-linear contribution
 sed.lin - gam(wm.sed~Temp,weight=N.sets)
 summary.gam(sed.lin)
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ Temp
 
  Parametric coefficients:
   Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
  Temp-0.023792   0.006369  -3.736 0.000666 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 
  R-sq.(adj) =  0.554   Deviance explained = 28.5%
  GCV score = 0.09904   Scale est. = 0.093686  n = 37
 anova.gam(sed.lin, sed.gam, test=F)
  Analysis of Deviance Table
 
  Model 1: wm.sed ~ Temp
  Model 2: wm.sed ~ s(Temp)
 Resid. Df Resid. Dev Df  Deviance  F   Pr(F)
  1 3.5000e+01  3.279
  2 3.5000e+01  3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
 
 
 Thanks in advance,
 
 
 Denis Chabot
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] testing non-linear component in mgcv:gam

2005-10-05 Thread John Fox
Dear Denis,

Take a closer look at the anova table: The models provide identical fits to
the data. The differences in degrees of freedom and deviance between the two
models are essentially zero, 5.5554e-10 and 2.353e-11 respectively.

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Denis Chabot
 Sent: Wednesday, October 05, 2005 8:22 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] testing non-linear component in mgcv:gam
 
 Hi,
 
 I need further help with my GAMs. Most models I test are very 
 obviously non-linear. Yet, to be on the safe side, I report 
 the significance of the smooth (default output of mgcv's 
 summary.gam) and confirm it deviates significantly from linearity.
 
 I do the latter by fitting a second model where the same 
 predictor is entered without the s(), and then use anova.gam 
 to compare the two. I thought this was the equivalent of the 
 default output of anova.gam using package gam instead of mgcv.
 
 I wonder if this procedure is correct because one of my 
 models appears to be linear. In fact mgcv estimates df to be 
 exactly 1.0 so I could have stopped there. However I 
 inadvertently repeated the procedure outlined above. I would 
 have thought in this case the anova.gam comparing the smooth 
 and the linear fit would for sure have been not significant. 
 To my surprise, P was 6.18e-09!
 
 Am I doing something wrong when I attempt to confirm the non- 
 parametric part a smoother is significant? Here is my example 
 case where the relationship does appear to be linear:
 
 library(mgcv)
  This is mgcv 1.3-7
 Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 
 0.38, 0.62, 0.88, 1.12,
 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 
 3.38, 3.62, 3.88,
 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 
 6.12, 6.38, 6.62, 6.88,
 7.12, 8.38, 13.62)
 N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 
 29, 31, 22, 26, 24, 23,
  15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 
 1, 1, 1, 1, 1) wm.sed - c(0.0, 0.016129032, 
 0.0, 0.062046512, 0.396459596, 0.189082949,
  0.054757925, 0.142810440, 0.168005168, 
 0.180804428, 0.111439628, 0.128799505,
  0.193707937, 0.105921610, 0.103497845, 
 0.028591837, 0.217894389, 0.020535469,
  0.080389068, 0.105234450, 0.070213450, 
 0.050771363, 0.042074434, 0.102348837,
  0.049748344, 0.019100478, 0.005203125, 
 0.101711864, 0.0, 0.0,
  0.014808824, 0.0, 0.22200, 
 0.16700, 0.0, 0.0,
  0.0)
 
 sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
 summary.gam(sed.gam)
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ s(Temp)
 
  Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.084030.01347   6.241 3.73e-07 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  Approximate significance of smooth terms:
  edf Est.rank F  p-value
  s(Temp)   11 13.95 0.000666 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  R-sq.(adj) =  0.554   Deviance explained = 28.5%
  GCV score = 0.09904   Scale est. = 0.093686  n = 37
 
 # testing non-linear contribution
 sed.lin - gam(wm.sed~Temp,weight=N.sets)
 summary.gam(sed.lin)
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ Temp
 
  Parametric coefficients:
   Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
  Temp-0.023792   0.006369  -3.736 0.000666 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 
  R-sq.(adj) =  0.554   Deviance explained = 28.5%
  GCV score = 0.09904   Scale est. = 0.093686  n = 37
 anova.gam(sed.lin, sed.gam, test=F)
  Analysis of Deviance Table
 
  Model 1: wm.sed ~ Temp
  Model 2: wm.sed ~ s(Temp)
 Resid. Df Resid. Dev Df  Deviance  F   Pr(F)
  1 3.5000e+01  3.279
  2 3.5000e+01  3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
 
 
 Thanks in advance,
 
 
 Denis Chabot
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] testing non-linear component in mgcv:gam

2005-10-05 Thread Denis Chabot
Fair enough, Andy. I thought I was getting both predictive ability  
and confirmation that the phenomenon I was studying was not linear. I  
have two projects, in one prediction is the goal and I don't really  
need to test linearity. In the second I needed to confirm a cycle was  
taking place and I thought my procedure did this. There was no  
theoretical reason to anticipate the shape of the cycle, so GAM was  
an appealing methodology.

Denis

Le 05-10-05 à 09:38, Liaw, Andy a écrit :

 I think you probably should state more clearly the goal of your
 analysis.  In such situation, estimation and hypothesis testing are
 quite different.  The procedure that gives you the `best' estimate is
 not necessarily the `best' for testing linearity of components.  If  
 your
 goal is estimation/prediction, why test linearity of components?   
 If the
 goal is testing linearity, then you probably need to find the  
 procedure
 that gives you a good test, rather than relying on what gam() gives  
 you.

 Just my $0.02...

 Andy


 From: Denis Chabot

 Hi,

 I need further help with my GAMs. Most models I test are very
 obviously non-linear. Yet, to be on the safe side, I report the
 significance of the smooth (default output of mgcv's
 summary.gam) and
 confirm it deviates significantly from linearity.

 I do the latter by fitting a second model where the same
 predictor is
 entered without the s(), and then use anova.gam to compare
 the two. I
 thought this was the equivalent of the default output of anova.gam
 using package gam instead of mgcv.

 I wonder if this procedure is correct because one of my models
 appears to be linear. In fact mgcv estimates df to be exactly 1.0 so
 I could have stopped there. However I inadvertently repeated the
 procedure outlined above. I would have thought in this case the
 anova.gam comparing the smooth and the linear fit would for
 sure have
 been not significant. To my surprise, P was 6.18e-09!

 Am I doing something wrong when I attempt to confirm the non-
 parametric part a smoother is significant? Here is my example case
 where the relationship does appear to be linear:

 library(mgcv)

 This is mgcv 1.3-7

 Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
 0.38, 0.62,
 0.88, 1.12,
 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
 3.62, 3.88,
 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
 6.38, 6.62, 6.88,
 7.12, 8.38, 13.62)
 N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,
 22, 26, 24, 23,
  15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,
 1, 1, 1)
 wm.sed - c(0.0, 0.016129032, 0.0, 0.062046512,
 0.396459596, 0.189082949,
  0.054757925, 0.142810440, 0.168005168, 0.180804428,
 0.111439628, 0.128799505,
  0.193707937, 0.105921610, 0.103497845, 0.028591837,
 0.217894389, 0.020535469,
  0.080389068, 0.105234450, 0.070213450, 0.050771363,
 0.042074434, 0.102348837,
  0.049748344, 0.019100478, 0.005203125, 0.101711864,
 0.0, 0.0,
  0.014808824, 0.0, 0.22200, 0.16700,
 0.0, 0.0,
  0.0)

 sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
 summary.gam(sed.gam)

 Family: gaussian
 Link function: identity

 Formula:
 wm.sed ~ s(Temp)

 Parametric coefficients:
 Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.084030.01347   6.241 3.73e-07 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Approximate significance of smooth terms:
 edf Est.rank F  p-value
 s(Temp)   11 13.95 0.000666 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 R-sq.(adj) =  0.554   Deviance explained = 28.5%
 GCV score = 0.09904   Scale est. = 0.093686  n = 37


 # testing non-linear contribution
 sed.lin - gam(wm.sed~Temp,weight=N.sets)
 summary.gam(sed.lin)

 Family: gaussian
 Link function: identity

 Formula:
 wm.sed ~ Temp

 Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
 Temp-0.023792   0.006369  -3.736 0.000666 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


 R-sq.(adj) =  0.554   Deviance explained = 28.5%
 GCV score = 0.09904   Scale est. = 0.093686  n = 37

 anova.gam(sed.lin, sed.gam, test=F)

 Analysis of Deviance Table

 Model 1: wm.sed ~ Temp
 Model 2: wm.sed ~ s(Temp)
Resid. Df Resid. Dev Df  Deviance  F   Pr(F)
 1 3.5000e+01  3.279
 2 3.5000e+01  3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***



 Thanks in advance,


 Denis Chabot

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html





 -- 
 
 Notice:  This 

Re: [R] testing non-linear component in mgcv:gam

2005-10-05 Thread Denis Chabot
Hi John,

Le 05-10-05 à 09:45, John Fox a écrit :

 Dear Denis,

 Take a closer look at the anova table: The models provide identical  
 fits to
 the data. The differences in degrees of freedom and deviance  
 between the two
 models are essentially zero, 5.5554e-10 and 2.353e-11 respectively.

 I hope this helps,
  John
This is one of my difficulties. In some examples I found on the web,  
the difference in deviance is compared directly against the chi- 
squared distribution. But my y variable has a very small range  
(between 0 and 0.5, most of the time) so the difference in deviance  
is always very small and if I compared it against the chi-squared  
distribution as I have seen done in examples, the non-linear  
component would always be not significant. Yet it is (with one  
exception), tested with both mgcv:gam and gam:gam. I think the  
examples I have read were wrong in this regard, the scale factor  
seen in mgcv output seems to intervene. But exactly how is still  
mysterious to me and I hesitate to judge the size of the deviance  
difference myself.

I agree it is near zero in my example. I guess I need to have more  
experience with these models to better interpret the output...

Denis


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Denis Chabot
 Sent: Wednesday, October 05, 2005 8:22 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] testing non-linear component in mgcv:gam

 Hi,

 I need further help with my GAMs. Most models I test are very
 obviously non-linear. Yet, to be on the safe side, I report
 the significance of the smooth (default output of mgcv's
 summary.gam) and confirm it deviates significantly from linearity.

 I do the latter by fitting a second model where the same
 predictor is entered without the s(), and then use anova.gam
 to compare the two. I thought this was the equivalent of the
 default output of anova.gam using package gam instead of mgcv.

 I wonder if this procedure is correct because one of my
 models appears to be linear. In fact mgcv estimates df to be
 exactly 1.0 so I could have stopped there. However I
 inadvertently repeated the procedure outlined above. I would
 have thought in this case the anova.gam comparing the smooth
 and the linear fit would for sure have been not significant.
 To my surprise, P was 6.18e-09!

 Am I doing something wrong when I attempt to confirm the non-
 parametric part a smoother is significant? Here is my example
 case where the relationship does appear to be linear:

 library(mgcv)

 This is mgcv 1.3-7

 Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
 0.38, 0.62, 0.88, 1.12,
 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12,
 3.38, 3.62, 3.88,
 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88,
 6.12, 6.38, 6.62, 6.88,
 7.12, 8.38, 13.62)
 N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27,
 29, 31, 22, 26, 24, 23,
  15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3,
 1, 1, 1, 1, 1) wm.sed - c(0.0, 0.016129032,
 0.0, 0.062046512, 0.396459596, 0.189082949,
  0.054757925, 0.142810440, 0.168005168,
 0.180804428, 0.111439628, 0.128799505,
  0.193707937, 0.105921610, 0.103497845,
 0.028591837, 0.217894389, 0.020535469,
  0.080389068, 0.105234450, 0.070213450,
 0.050771363, 0.042074434, 0.102348837,
  0.049748344, 0.019100478, 0.005203125,
 0.101711864, 0.0, 0.0,
  0.014808824, 0.0, 0.22200,
 0.16700, 0.0, 0.0,
  0.0)

 sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
 summary.gam(sed.gam)

 Family: gaussian
 Link function: identity

 Formula:
 wm.sed ~ s(Temp)

 Parametric coefficients:
 Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.084030.01347   6.241 3.73e-07 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Approximate significance of smooth terms:
 edf Est.rank F  p-value
 s(Temp)   11 13.95 0.000666 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 R-sq.(adj) =  0.554   Deviance explained = 28.5%
 GCV score = 0.09904   Scale est. = 0.093686  n = 37


 # testing non-linear contribution
 sed.lin - gam(wm.sed~Temp,weight=N.sets)
 summary.gam(sed.lin)

 Family: gaussian
 Link function: identity

 Formula:
 wm.sed ~ Temp

 Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
 Temp-0.023792   0.006369  -3.736 0.000666 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


 R-sq.(adj) =  0.554   Deviance explained = 28.5%
 GCV score = 0.09904   Scale est. = 0.093686  n = 37

 anova.gam(sed.lin, sed.gam, test=F)

 Analysis of Deviance Table

 Model 1: wm.sed ~ Temp
 Model 2: wm.sed ~ s(Temp)
Resid. Df Resid. Dev Df  Deviance  F   Pr(F)
 1 3.5000e+01  3.279
 2 3.5000e+01  3.279 

Re: [R] testing non-linear component in mgcv:gam

2005-10-05 Thread John Fox
Dear Denis,

The chi-square test is formed in analogy to what's done for a GLM: The
difference in residual deviance for the nested models is divided by the
estimated scale parameter -- i.e., the estimated error variance for a model
with normal errors. Otherwise, as you point out, the test would be dependent
upon the scale of the response.

John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Denis Chabot
 Sent: Wednesday, October 05, 2005 9:04 AM
 To: John Fox
 Cc: R list
 Subject: Re: [R] testing non-linear component in mgcv:gam
 
 Hi John,
 
 Le 05-10-05 à 09:45, John Fox a écrit :
 
  Dear Denis,
 
  Take a closer look at the anova table: The models provide identical 
  fits to the data. The differences in degrees of freedom and 
 deviance 
  between the two models are essentially zero, 5.5554e-10 and 
 2.353e-11 
  respectively.
 
  I hope this helps,
   John
 This is one of my difficulties. In some examples I found on 
 the web, the difference in deviance is compared directly 
 against the chi- squared distribution. But my y variable has 
 a very small range (between 0 and 0.5, most of the time) so 
 the difference in deviance is always very small and if I 
 compared it against the chi-squared distribution as I have 
 seen done in examples, the non-linear component would always 
 be not significant. Yet it is (with one exception), tested 
 with both mgcv:gam and gam:gam. I think the examples I have 
 read were wrong in this regard, the scale factor seen in 
 mgcv output seems to intervene. But exactly how is still 
 mysterious to me and I hesitate to judge the size of the 
 deviance difference myself.
 
 I agree it is near zero in my example. I guess I need to have 
 more experience with these models to better interpret the output...
 
 Denis
 
 
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of Denis Chabot
  Sent: Wednesday, October 05, 2005 8:22 AM
  To: r-help@stat.math.ethz.ch
  Subject: [R] testing non-linear component in mgcv:gam
 
  Hi,
 
  I need further help with my GAMs. Most models I test are very 
  obviously non-linear. Yet, to be on the safe side, I report the 
  significance of the smooth (default output of mgcv's
  summary.gam) and confirm it deviates significantly from linearity.
 
  I do the latter by fitting a second model where the same 
 predictor is 
  entered without the s(), and then use anova.gam to compare 
 the two. I 
  thought this was the equivalent of the default output of anova.gam 
  using package gam instead of mgcv.
 
  I wonder if this procedure is correct because one of my models 
  appears to be linear. In fact mgcv estimates df to be 
 exactly 1.0 so 
  I could have stopped there. However I inadvertently repeated the 
  procedure outlined above. I would have thought in this case the 
  anova.gam comparing the smooth and the linear fit would 
 for sure have 
  been not significant.
  To my surprise, P was 6.18e-09!
 
  Am I doing something wrong when I attempt to confirm the non- 
  parametric part a smoother is significant? Here is my example case 
  where the relationship does appear to be linear:
 
  library(mgcv)
 
  This is mgcv 1.3-7
 
  Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 
 0.38, 0.62, 
  0.88, 1.12,
  1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38, 
  3.62, 3.88,
  4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12, 
  6.38, 6.62, 6.88,
  7.12, 8.38, 13.62)
  N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 
 27, 29, 31, 
  22, 26, 24, 23,
   15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 
 3, 1, 1, 
  1, 1, 1) wm.sed - c(0.0, 0.016129032, 0.0, 
  0.062046512, 0.396459596, 0.189082949,
   0.054757925, 0.142810440, 0.168005168, 0.180804428, 
  0.111439628, 0.128799505,
   0.193707937, 0.105921610, 0.103497845, 0.028591837, 
  0.217894389, 0.020535469,
   0.080389068, 0.105234450, 0.070213450, 0.050771363, 
  0.042074434, 0.102348837,
   0.049748344, 0.019100478, 0.005203125, 0.101711864, 
  0.0, 0.0,
   0.014808824, 0.0, 0.22200, 0.16700, 
  0.0, 0.0,
   0.0)
 
  sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
  summary.gam(sed.gam)
 
  Family: gaussian
  Link function: identity
 
  Formula:
  wm.sed ~ s(Temp)
 
  Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
  (Intercept)  0.084030.01347   6.241 3.73e-07 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  Approximate significance of smooth terms:
  edf Est.rank F  p-value
  s(Temp)   11 13.95 0.000666

Re: [R] testing non-linear component in mgcv:gam

2005-10-05 Thread Denis Chabot
Thank you everyone for your help, but my introduction to GAM is  
turning my brain to mush. I thought the one part of the output I  
understood the best was r-sq (adj), but now even this is becoming foggy.

In my original message I mentioned a gam fit that turns out to be a  
linear fit. By curiosity I analysed it with a linear predictor only  
with mgcv package, and then as a linear model. The output was  
identical in both, but the r-sq (adj) was 0.55 in mgcv and 0.26 in  
lm. In lm I hope that my interpretation that 26% of the variance in y  
is explained by the linear relationship with x is valid. Then what  
does r2 mean in mgcv?

Denis
  summary.gam(lin)

Family: gaussian
Link function: identity

Formula:
wm.sed ~ Temp

Parametric coefficients:
  Estimate Std. Error t value Pr(|t|)
(Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
Temp-0.023792   0.006369  -3.736 0.000666 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


R-sq.(adj) =  0.554   Deviance explained = 28.5%
GCV score = 0.09904   Scale est. = 0.093686  n = 37


  summary(sed.true.lin)

Call:
lm(formula = wm.sed ~ Temp, weights = N.sets)

Residuals:
 Min  1Q  Median  3Q Max
-0.6138 -0.1312 -0.0325  0.1089  1.1449

Coefficients:
  Estimate Std. Error t value Pr(|t|)
(Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
Temp-0.023792   0.006369  -3.736 0.000666 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3061 on 35 degrees of freedom
Multiple R-Squared: 0.285,Adjusted R-squared: 0.2646
F-statistic: 13.95 on 1 and 35 DF,  p-value: 0.000666


Le 05-10-05 à 09:45, John Fox a écrit :

 Dear Denis,

 Take a closer look at the anova table: The models provide identical  
 fits to
 the data. The differences in degrees of freedom and deviance  
 between the two
 models are essentially zero, 5.5554e-10 and 2.353e-11 respectively.

 I hope this helps,
  John

 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox
 


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Denis Chabot
 Sent: Wednesday, October 05, 2005 8:22 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] testing non-linear component in mgcv:gam

 Hi,

 I need further help with my GAMs. Most models I test are very
 obviously non-linear. Yet, to be on the safe side, I report
 the significance of the smooth (default output of mgcv's
 summary.gam) and confirm it deviates significantly from linearity.

 I do the latter by fitting a second model where the same
 predictor is entered without the s(), and then use anova.gam
 to compare the two. I thought this was the equivalent of the
 default output of anova.gam using package gam instead of mgcv.

 I wonder if this procedure is correct because one of my
 models appears to be linear. In fact mgcv estimates df to be
 exactly 1.0 so I could have stopped there. However I
 inadvertently repeated the procedure outlined above. I would
 have thought in this case the anova.gam comparing the smooth
 and the linear fit would for sure have been not significant.
 To my surprise, P was 6.18e-09!

 Am I doing something wrong when I attempt to confirm the non-
 parametric part a smoother is significant? Here is my example
 case where the relationship does appear to be linear:

 library(mgcv)

 This is mgcv 1.3-7

 Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
 0.38, 0.62, 0.88, 1.12,
 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12,
 3.38, 3.62, 3.88,
 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88,
 6.12, 6.38, 6.62, 6.88,
 7.12, 8.38, 13.62)
 N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27,
 29, 31, 22, 26, 24, 23,
  15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3,
 1, 1, 1, 1, 1) wm.sed - c(0.0, 0.016129032,
 0.0, 0.062046512, 0.396459596, 0.189082949,
  0.054757925, 0.142810440, 0.168005168,
 0.180804428, 0.111439628, 0.128799505,
  0.193707937, 0.105921610, 0.103497845,
 0.028591837, 0.217894389, 0.020535469,
  0.080389068, 0.105234450, 0.070213450,
 0.050771363, 0.042074434, 0.102348837,
  0.049748344, 0.019100478, 0.005203125,
 0.101711864, 0.0, 0.0,
  0.014808824, 0.0, 0.22200,
 0.16700, 0.0, 0.0,
  0.0)

 sed.gam - gam(wm.sed~s(Temp),weight=N.sets)
 summary.gam(sed.gam)

 Family: gaussian
 Link function: identity

 Formula:
 wm.sed ~ s(Temp)

 Parametric coefficients:
 Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.084030.01347   6.241 3.73e-07 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Approximate significance of smooth terms:
 edf Est.rank F  p-value

Re: [R] testing non-linear component in mgcv:gam

2005-10-05 Thread John Fox
Dear Denis,

You got me: I would have thought from ?summary.gam that this would be the
same as the adjusted R^2 for a linear model. Note, however, that the
percentage of deviance explained checks with the R^2 from the linear model,
as expected.

Maybe you should address this question to the package author.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: Denis Chabot [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 05, 2005 3:33 PM
 To: John Fox
 Cc: R list
 Subject: Re: [R] testing non-linear component in mgcv:gam
 
 Thank you everyone for your help, but my introduction to GAM 
 is turning my brain to mush. I thought the one part of the 
 output I understood the best was r-sq (adj), but now even 
 this is becoming foggy.
 
 In my original message I mentioned a gam fit that turns out 
 to be a linear fit. By curiosity I analysed it with a linear 
 predictor only with mgcv package, and then as a linear model. 
 The output was identical in both, but the r-sq (adj) was 0.55 
 in mgcv and 0.26 in lm. In lm I hope that my interpretation 
 that 26% of the variance in y is explained by the linear 
 relationship with x is valid. Then what does r2 mean in mgcv?
 
 Denis
   summary.gam(lin)
 
 Family: gaussian
 Link function: identity
 
 Formula:
 wm.sed ~ Temp
 
 Parametric coefficients:
   Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
 Temp-0.023792   0.006369  -3.736 0.000666 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 
 R-sq.(adj) =  0.554   Deviance explained = 28.5%
 GCV score = 0.09904   Scale est. = 0.093686  n = 37
 
 
   summary(sed.true.lin)
 
 Call:
 lm(formula = wm.sed ~ Temp, weights = N.sets)
 
 Residuals:
  Min  1Q  Median  3Q Max
 -0.6138 -0.1312 -0.0325  0.1089  1.1449
 
 Coefficients:
   Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
 Temp-0.023792   0.006369  -3.736 0.000666 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 0.3061 on 35 degrees of freedom
 Multiple R-Squared: 0.285,Adjusted R-squared: 0.2646
 F-statistic: 13.95 on 1 and 35 DF,  p-value: 0.000666
 
 
 Le 05-10-05 à 09:45, John Fox a écrit :
 
  Dear Denis,
 
  Take a closer look at the anova table: The models provide identical 
  fits to the data. The differences in degrees of freedom and 
 deviance 
  between the two models are essentially zero, 5.5554e-10 and 
 2.353e-11 
  respectively.
 
  I hope this helps,
   John
 
  
  John Fox
  Department of Sociology
  McMaster University
  Hamilton, Ontario
  Canada L8S 4M4
  905-525-9140x23604
  http://socserv.mcmaster.ca/jfox
  
 
 
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of Denis Chabot
  Sent: Wednesday, October 05, 2005 8:22 AM
  To: r-help@stat.math.ethz.ch
  Subject: [R] testing non-linear component in mgcv:gam
 
  Hi,
 
  I need further help with my GAMs. Most models I test are very 
  obviously non-linear. Yet, to be on the safe side, I report the 
  significance of the smooth (default output of mgcv's
  summary.gam) and confirm it deviates significantly from linearity.
 
  I do the latter by fitting a second model where the same 
 predictor is 
  entered without the s(), and then use anova.gam to compare 
 the two. I 
  thought this was the equivalent of the default output of anova.gam 
  using package gam instead of mgcv.
 
  I wonder if this procedure is correct because one of my models 
  appears to be linear. In fact mgcv estimates df to be 
 exactly 1.0 so 
  I could have stopped there. However I inadvertently repeated the 
  procedure outlined above. I would have thought in this case the 
  anova.gam comparing the smooth and the linear fit would 
 for sure have 
  been not significant.
  To my surprise, P was 6.18e-09!
 
  Am I doing something wrong when I attempt to confirm the non- 
  parametric part a smoother is significant? Here is my example case 
  where the relationship does appear to be linear:
 
  library(mgcv)
 
  This is mgcv 1.3-7
 
  Temp - c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 
 0.38, 0.62, 
  0.88, 1.12,
  1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38, 
  3.62, 3.88,
  4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12, 
  6.38, 6.62, 6.88,
  7.12, 8.38, 13.62)
  N.sets - c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 
 27, 29, 31, 
  22, 26, 24, 23,
   15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 
 3, 1, 1, 
  1, 1, 1) wm.sed - c(0.0, 0.016129032, 0.0, 
  0.062046512, 0.396459596, 0.189082949