The second model has encountered numerical problems, but I guess you didn't need me to tell you that! Usually this results from model identifiability problems, for example if one predictor variable is a simple transformation of another (or, in the *generalized* case, if the linear predictor ceases to uniquely determine the fitted values, as can happen if the fitted values are essentially zero over a wide region of the covariate space and a log link is used). mgcv does some simple checks to try to catch the most common ways in which models run into these difficulties (for example, specifying a higher smoothing basis dimension than can be supported by the number of unique covariate combinations), but there's no way of catching all such problems.
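As a rough illustration of the two identifiability problems named above, here is a minimal base R sketch of a pre-fit screen. It is not what mgcv does internally. The data frame `dat` is a hypothetical stand-in (simulated, with one covariate deliberately built as a transformation of another and one deliberately given few unique values), and `k = 10` is assumed to be the basis dimension used for each smooth.

```r
# Hypothetical stand-in for the real data: 60 rows, five covariates.
set.seed(1)
n <- 60
dat <- data.frame(
  IGS = runif(n),
  L2E = runif(n),
  TED = runif(n),
  PSD = sample(1:6, n, replace = TRUE)  # deliberately few unique values
)
dat$OPD <- log(dat$TED + 1)             # deliberately a transformation of TED

k <- 10  # assumed basis dimension for each smooth

# Check 1: does each covariate have enough unique values to support
# a basis of dimension k?
n_unique <- sapply(dat, function(x) length(unique(x)))
too_few <- names(n_unique)[n_unique < k]

# Check 2: is any covariate a monotone transformation of another?
# Such pairs have a Spearman rank correlation of +1 or -1.
rho <- cor(dat, method = "spearman")
rho[upper.tri(rho, diag = TRUE)] <- 0   # keep each pair once, drop the diagonal
idx <- which(abs(rho) > 1 - 1e-8, arr.ind = TRUE)
redundant <- data.frame(
  a = rownames(rho)[idx[, "row"]],
  b = colnames(rho)[idx[, "col"]]
)

print(too_few)    # covariates with fewer than k unique values
print(redundant)  # covariate pairs that are monotone transforms of each other
```

A screen like this only catches pairwise monotone relationships and per-covariate shortages of unique values. A covariate that is a smooth function of several others, or a non-monotone transformation, would pass it and could still make the model unidentifiable.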
I assume that the second model came with warnings that the termwise edfs are unreliable. The calculation of the estimated degrees of freedom for each smooth is not as numerically stable as the actual model fitting, so models which are somewhat unstable can fit without problems but then cause problems when the diagnostics are calculated. More generally, I'd be a bit nervous about trying to estimate 5 or 6 smooth terms and their degrees of freedom from 60 observations (but I don't think that this is the cause of the numerical problems).

If you can't spot an obvious identifiability issue, please let me know in case it's a bug.

Simon Wood

> > Dear R experts,
> >
> > I'm hoping someone can help me to interpret the results of building
> > gam's with mgcv in R.
> >
> > Below are summaries of two gam's based on the same dataset. The first
> > gam (named "gam.mod") has six predictor variables. The second gam
> > (named "gam.mod2") is exactly the same except it is missing one of the
> > predictor variables. What is confusing me is the estimated degrees of
> > freedom for each of the splines in the second model....
> >
> > ________________
> >
> > > summary.gam(mod.gam)
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > INT ~ s(IGS) + s(L2E) + s(TED) + s(PSD) + s(OPD) + s(GED)
> >
> > Parametric coefficients:
> >            Estimate  std. err.   t ratio    Pr(>|t|)
> > constant     302.32      5.192     58.23  < 2.22e-16
> >
> > Approximate significance of smooth terms:
> >            edf    chi.sq      p-value
> > s(IGS)   4.254    58.308   9.5524e-12
> > s(L2E)       1    8.7673    0.0030668
> > s(TED)       1    8.3915    0.0037697
> > s(PSD)       1    6.0234     0.014118
> > s(OPD)   2.289    12.745    0.0024349
> > s(GED)   3.791    152.68   < 2.22e-16
> >
> > R-sq.(adj) = 0.885   Deviance explained = 91.1%
> > GCV score = 2124.9   Scale est. = 1617.3   n = 60
> >
> > ________________
> >
> > > summary.gam(mod.gam2)
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > INT ~ s(IGS) + s(L2E) + s(TED) + s(PSD) + s(OPD)
> >
> > Parametric coefficients:
> >            Estimate  std. err.    t ratio    Pr(>|t|)
> > constant     302.32  4.736e-14  6.384e+15  < 2.22e-16
> >
> > Approximate significance of smooth terms:
> >                edf      chi.sq     p-value
> > s(IGS)   1.757e-05  1.3524e+09  < 2.22e-16
> > s(L2E)    0.009991     0.21394      0.6437
> > s(TED)   2.945e-05  1.4913e+07  < 2.22e-16
> > s(PSD)   2.566e-05  6.5495e+06  < 2.22e-16
> > s(OPD)   5.023e-05  3.2332e+07  < 2.22e-16
> >
> > R-sq.(adj) = 0.645   Deviance explained = 64.5%
> > GCV score = 7489.7   Scale est. = 6069.7   n = 60
> >
> > ________________
> >
> > Any suggestions about either (1) what went wrong with the second model?
> > or (2) how the heck do I interpret these results?
> >
> > Thanks,
> >
> > Mike.
> >
> > ______________________________________________
> > [EMAIL PROTECTED] mailing list
> > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
