Fair enough, Andy. I thought I was getting both predictive ability and confirmation that the phenomenon I was studying was not linear. I have two projects, in one prediction is the goal and I don't really need to test linearity. In the second I needed to confirm a cycle was taking place and I thought my procedure did this. There was no theoretical reason to anticipate the shape of the cycle, so GAM was an appealing methodology.
Denis Le 05-10-05 à 09:38, Liaw, Andy a écrit : > I think you probably should state more clearly the goal of your > analysis. In such situation, estimation and hypothesis testing are > quite different. The procedure that gives you the `best' estimate is > not necessarily the `best' for testing linearity of components. If > your > goal is estimation/prediction, why test linearity of components? > If the > goal is testing linearity, then you probably need to find the > procedure > that gives you a good test, rather than relying on what gam() gives > you. > > Just my $0.02... > > Andy > > >> From: Denis Chabot >> >> Hi, >> >> I need further help with my GAMs. Most models I test are very >> obviously non-linear. Yet, to be on the safe side, I report the >> significance of the smooth (default output of mgcv's >> summary.gam) and >> confirm it deviates significantly from linearity. >> >> I do the latter by fitting a second model where the same >> predictor is >> entered without the s(), and then use anova.gam to compare >> the two. I >> thought this was the equivalent of the default output of anova.gam >> using package gam instead of mgcv. >> >> I wonder if this procedure is correct because one of my models >> appears to be linear. In fact mgcv estimates df to be exactly 1.0 so >> I could have stopped there. However I inadvertently repeated the >> procedure outlined above. I would have thought in this case the >> anova.gam comparing the smooth and the linear fit would for >> sure have >> been not significant. To my surprise, P was 6.18e-09! >> >> Am I doing something wrong when I attempt to confirm the non- >> parametric part a smoother is significant? Here is my example case >> where the relationship does appear to be linear: >> >> library(mgcv) >> >>> This is mgcv 1.3-7 >>> >> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, >> 0.38, 0.62, >> 0.88, 1.12, >> 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38, >> 3.62, 3.88, >> 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12, >> 6.38, 6.62, 6.88, >> 7.12, 8.38, 13.62) >> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31, >> 22, 26, 24, 23, >> 15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1, >> 1, 1, 1) >> wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512, >> 0.396459596, 0.189082949, >> 0.054757925, 0.142810440, 0.168005168, 0.180804428, >> 0.111439628, 0.128799505, >> 0.193707937, 0.105921610, 0.103497845, 0.028591837, >> 0.217894389, 0.020535469, >> 0.080389068, 0.105234450, 0.070213450, 0.050771363, >> 0.042074434, 0.102348837, >> 0.049748344, 0.019100478, 0.005203125, 0.101711864, >> 0.000000000, 0.000000000, >> 0.014808824, 0.000000000, 0.222000000, 0.167000000, >> 0.000000000, 0.000000000, >> 0.000000000) >> >> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets) >> summary.gam(sed.gam) >> >>> Family: gaussian >>> Link function: identity >>> >>> Formula: >>> wm.sed ~ s(Temp) >>> >>> Parametric coefficients: >>> Estimate Std. Error t value Pr(>|t|) >>> (Intercept) 0.08403 0.01347 6.241 3.73e-07 *** >>> --- >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >>> >>> Approximate significance of smooth terms: >>> edf Est.rank F p-value >>> s(Temp) 1 1 13.95 0.000666 *** >>> --- >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >>> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5% >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37 >>> >> >> # testing non-linear contribution >> sed.lin <- gam(wm.sed~Temp,weight=N.sets) >> summary.gam(sed.lin) >> >>> Family: gaussian >>> Link function: identity >>> >>> Formula: >>> wm.sed ~ Temp >>> >>> Parametric coefficients: >>> Estimate Std. Error t value Pr(>|t|) >>> (Intercept) 0.162879 0.019847 8.207 1.14e-09 *** >>> Temp -0.023792 0.006369 -3.736 0.000666 *** >>> --- >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >>> >>> >>> R-sq.(adj) = 0.554 Deviance explained = 28.5% >>> GCV score = 0.09904 Scale est. = 0.093686 n = 37 >>> >> anova.gam(sed.lin, sed.gam, test="F") >> >>> Analysis of Deviance Table >>> >>> Model 1: wm.sed ~ Temp >>> Model 2: wm.sed ~ s(Temp) >>> Resid. Df Resid. Dev Df Deviance F Pr(>F) >>> 1 3.5000e+01 3.279 >>> 2 3.5000e+01 3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 *** >>> >> >> >> Thanks in advance, >> >> >> Denis Chabot >> >> ______________________________________________ >> R-help@stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide! >> http://www.R-project.org/posting-guide.html >> >> >> > > > ---------------------------------------------------------------------- > -------- > Notice: This e-mail message, together with any attachments, > contains information of Merck & Co., Inc. (One Merck Drive, > Whitehouse Station, New Jersey, USA 08889), and/or its affiliates > (which may be known outside the United States as Merck Frosst, > Merck Sharp & Dohme or MSD and in Japan, as Banyu) that may be > confidential, proprietary copyrighted and/or legally privileged. It > is intended solely for the use of the individual or entity named on > this message. If you are not the intended recipient, and have > received this message in error, please notify us immediately by > reply e-mail and then delete it from your system. > ---------------------------------------------------------------------- > -------- > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html