[R] gam function time trend splines
I've been doing a simple time-series analysis looking at the relationship between daily pneumonia hospitalizations and daily temperature. To mimic some of the literature, I've been including a time trend to try to account for normal cyclical trends in hospitalization. So I've been using a call that looks something like this:
  gam(pneucount ~ temp_f + s(day, bs = "cr", k = (4 * totalyears) + 1))
with day being the enumerated day in the analysis (1-365 for a 1-year analysis). This seems to work well enough.
What troubles me is when I think about doing an analysis focusing on winter days using more than one year of data. If I just delete the summer days from the dataset, the time trend spline is trying to anneal counts from the end of one winter with the beginning of another, which doesn't seem right to me. What's the route to a statistically defensible result? Is it as simple as using the subset option? Or would I need to create indicator variables for each winter I'm interested in and work in a by statement somehow (with an extra term for the levels of that indicator, I assume)?
Thanks in advance for helping an Epi student who's being exposed to all this for the first time.
Sincerely, Kevin Sorensen
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gam function time trend spline
If you're looking only at winter days then you probably don't need to remove seasonal trends, do you?
-roger
On 7/2/07, Kevin Sorensen wrote:
> What troubles me is when I think about doing an analysis focusing on winter days using more than one year of data. [...]
-- Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
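The "by statement" idea raised in the question can be sketched as follows. This is only an illustration under assumptions that are not in the thread: the data are simulated, `winter` and `wday` (day within winter) are hypothetical column names, and a Poisson family is assumed for the daily counts. Each winter gets its own smooth of within-winter day, so no spline has to bridge the deleted summer, and the `winter` main effect carries the level differences that the centred per-winter smooths cannot.

```r
library(mgcv)

set.seed(42)
# Simulated data (hypothetical): 3 winters of 90 days each
d <- expand.grid(wday = 1:90, winter = factor(1:3))
d$temp_f <- 35 + 10 * sin(pi * d$wday / 90) + rnorm(nrow(d), sd = 5)
mu <- exp(1.5 - 0.01 * d$temp_f + 0.3 * cos(pi * d$wday / 90))
d$pneucount <- rpois(nrow(d), mu)

# One smooth of within-winter day per winter, plus a winter main effect
fit <- gam(pneucount ~ temp_f + winter + s(wday, by = winter, bs = "cr", k = 5),
           family = poisson, data = d)

summary(fit)
```

Whether a within-winter trend is needed at all is a modelling question, as the reply above points out; the sketch only shows how the terms could be written.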
[R] GAM for censored data? (survival analysis)
First let me admit that I am no statistician... rather, an ecologist with just enough statistical knowledge to be dangerous.
I've got a dataset with percent ground cover values for species and other entities. The data are left censored at zero, in that percent ground cover cannot be negative. (My data rarely reach 100% cover so I haven't bothered with adding a right censoring at 100.) I've done some previous analyses using survival analysis methods to create a predictive model for an entity of particular interest:
  library(survival); survreg(Surv(Y) ~ X)
However, I know my data do not really match linear modeling and would like to work with some alternative methods, one of which is GAM. I noticed that Yee and Mitchell (1991, p. 589) stated that GAM is appropriate for certain types of survival data. How do I implement a survival data model in GAM with R? I've searched both R help and the R site search, but not found anything relevant. Would it be as simple as
  library(survival); library(mgcv); gam(Surv(Y) ~ X) ???
While I have your attention, I have a related second question. I'd like to model one entity (percent ground cover) as a function of another (also percent ground cover). Is there any way to deal with a censored predictor variable as well as the censored response?
Citation: Yee, T. W. & N. D. Mitchell. 1991. Generalized additive models in plant ecology. Journal of Vegetation Science 2: 587-602.
Thanks, -Eric Peterson, Vegetation Ecologist, Nevada Natural Heritage Program
[R] gam function in the mgcv library
I would like to fit a logistic regression using a smoothing spline, where the spline is a piecewise cubic polynomial. Is the knots option used to define the subintervals for each piece of the cubic spline? If yes and there are k knots, then why does the coefficients field in the returned object from gam only list k coefficients? Shouldn't there be 4k - 4 coefficients?
Sincerely, Bill
Re: [R] gam function in the mgcv library
On Monday 25 June 2007 13:26, Bill Wheeler wrote:
> I would like to fit a logistic regression using a smoothing spline, where the spline is a piecewise cubic polynomial. Is the knots option used to define the subintervals for each piece of the cubic spline?
- If you use something like
  gam(y ~ s(x, bs = "cr", k = 5), family = binomial, knots = list(x = c(0, .1, .3, .4, .8)))
then yes, k is the number of knots and the `knots' list specifies where they occur. If you use the default `bs = "tp"' then the spline basis functions are not really `knot' based, being instead an ordered set of eigenfunctions that are optimal in a defined sense (see Wood, 2003, JRSSB).
> If yes and there are k knots, then why does the coefficients field in the returned object from gam only list k coefficients? Shouldn't there be 4k - 4 coefficients?
A k-knot natural cubic spline only has k free coefficients, so that is all that mgcv::gam reports. If you are thinking about sections of cubic, then the other 3 coefficients of each section are determined by the spline continuity conditions plus the condition of zero second derivative at the end knots. Exact details of the `mgcv' cr basis are given in section 4.1.2 of my 2006 book (see ?gam), but all you really need to know is that it's a natural cubic spline basis parameterized in terms of function heights at the knots (although there is a gam identifiability constraint absorbed into the parameterization, which muddies this neat interpretability a little).
best, Simon
-- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK +44 1225 386603 www.maths.bath.ac.uk/~sw283
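The coefficient count described in the reply can be checked on a small simulated example. This is a minimal sketch, assuming made-up data with x confined to the knot range; only the model call follows the one quoted in the thread. With k = 5 the smooth itself carries k - 1 = 4 coefficients once the identifiability constraint is absorbed, and the intercept brings the total back to 5.

```r
library(mgcv)

set.seed(1)
n <- 400
x <- runif(n, 0, 0.8)                        # simulated covariate inside the knot range
y <- rbinom(n, 1, plogis(2 * sin(4 * pi * x)))

# Knot locations as in the reply; k must equal the number of supplied knots
kn <- list(x = c(0, 0.1, 0.3, 0.4, 0.8))
fit <- gam(y ~ s(x, bs = "cr", k = 5), family = binomial, knots = kn)

names(coef(fit))    # intercept plus the smooth's coefficients
length(coef(fit))   # total number of reported coefficients
```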
[R] GAM/GLM parameters
I think this might be a very basic question, but is there a simple way to characterise the relationships that a gam or lm model has identified? I am trying to create species distribution models based on climate, and want to know whether, for example, higher temperatures (one of the predictor variables) lead to a higher probability of species presence (dependent variable). Also, how can you quantify the relative contribution of each predictor variable to the final model?
Many thanks, James
[R] gam parameter predictions --Sorry for double posting
R-help,
Sorry for posting the same question again (dated 26-03-2007), but all my mails were sent to the recycle bin with no possibility of recovery, so I don't know whether anyone has answered my query. Here is the original message:
I'm applying a gam model (package mgcv) to predict relative abundances of a fish species. The covariates are year, month, vessel and statistical rectangle. The model looks like this:
  g1 <- gam(log(cpue) ~ s(rekt1) + s(year) + s(mon) + s(reg1), data = dataTest)
Once the model is fitted to the data I want to get the mean model estimates by year. I do the following:
  obsPred <- data.frame(year = dataTest$year, pred = predict(g1, type = "response"))
  gamFit <- tapply(obsPred$pred, list(year = obsPred$year), mean)
Is this correct? Thanks in advance
R version 2.4.1 (2006-12-18), platform i386-pc-mingw32
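The averaging step in the question can be tried on simulated data. This is a sketch only: `dataTest` here is made up, the model is cut down to two smooths, and the basis dimensions are chosen just to suit the small number of distinct years. Note that because the response is log(cpue) with the default Gaussian family, type = "response" still returns values on the log scale.

```r
library(mgcv)

set.seed(7)
# Simulated stand-in for dataTest (hypothetical values)
dataTest <- data.frame(year = rep(2000:2005, each = 40),
                       mon  = rep(1:12, length.out = 240))
dataTest$cpue <- exp(0.1 * (dataTest$year - 2000) +
                     0.3 * sin(2 * pi * dataTest$mon / 12) +
                     rnorm(240, sd = 0.2))

g1 <- gam(log(cpue) ~ s(year, k = 5) + s(mon, k = 6), data = dataTest)

# Fitted values (log scale), then their mean within each year
obsPred <- data.frame(year = dataTest$year,
                      pred = predict(g1, type = "response"))
gamFit <- tapply(obsPred$pred, list(year = obsPred$year), mean)
gamFit
```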
[R] GAM model selection and dropping terms based on GCV
Hello, I have a question regarding model selection and dropping of terms for GAMs fitted with package mgcv. I am following the approach suggested in Wood (2001), Wood and Augustin (2002). I fitted a saturated model, and I find from the plots that for two of the covariates,
1. The confidence interval includes 0 almost everywhere
2. The degrees of freedom are NOT close to 1
3. The partial residuals from plot.gam don’t show much pattern visually (to me)
4. When I drop either or both of the terms, the GCV score increases
This is my main problem: how much of an increase in GCV is ‘acceptable’ when terms are dropped? In the above case, the delta GCV scores are .03, .06 and .11 when I drop covariate A, covariate B and both respectively from the full model. I would be very grateful for any advice on this.
Thank you, Best Wishes, Aditya
Re: [R] GAM model selection and dropping terms based on GCV
On Monday 04 December 2006 12:30, aditya gangadharan wrote:
> I fitted a saturated model, and I find from the plots that for two of the covariates [...] 4. When I drop either or both of the terms, the GCV score increases; This is my main problem: how much of an increase in GCV is ‘acceptable’ when terms are dropped? In the above case, the delta GCV scores are .03, .06 and .11 when I drop covariate A, covariate B and both respectively from the full model.
- I'm not sure that there is really an answer to this. GCV is based on minimizing some approximation to the expected prediction error of the model. So to answer the question you'd need to do something like decide how much increase from `optimal' prediction error you would be prepared to tolerate. I think that it's not all that easy to come up with a nice way of blending prediction-error-based approaches to model selection with approaches based on finding a model that is somehow the simplest model consistent with the data (but perhaps other people will comment on this).
- That said, there is certainly an issue relating to the fact that the GCV score (or AIC, in fact) is rather asymmetric, so that random variability in the score tends to lead more readily to overfitting than to underfitting. This suggests that prediction error performance at finite sample sizes may be improved by shrinking the smoothing parameters themselves. With `mgcv::gam' you can do this by increasing the `gamma' parameter above its default value, which favours smoother models by making each model degree of freedom count as gamma degrees of freedom in the GCV score (or AIC/UBRE). It is possible to choose `gamma' by e.g. 10-fold cross-validation, but that requires some coding.
- There are more discussions of GAM model selection in various mgcv help files and my book. See help("mgcv-package") for details of which pages, and the reference.
My bottom line on model selection is to use things like GCV, AIC, confidence interval coverage and approximate p-values for guidance, but not as the basis for rules... modelling context has to play a part as well. Sorry if that's all a bit vague.
Simon
-- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK +44 1225 386603 www.maths.bath.ac.uk/~sw283
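The effect of `gamma' mentioned above can be seen on a toy fit. This is a sketch on simulated data; the value 1.4 is only an illustrative choice and is not taken from the thread, and method = "GCV.Cp" is spelled out so the example does not depend on the package default.

```r
library(mgcv)

set.seed(3)
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.5)   # simulated response

# Same model, default gamma (1) versus an inflated gamma
fit_default <- gam(y ~ s(x, k = 20), method = "GCV.Cp")
fit_gamma   <- gam(y ~ s(x, k = 20), method = "GCV.Cp", gamma = 1.4)

# Total effective degrees of freedom; the gamma > 1 fit should not be wigglier
c(default = sum(fit_default$edf), gamma_1.4 = sum(fit_gamma$edf))
```

Choosing gamma by cross-validation, as suggested in the reply, would wrap calls like these in a loop over candidate values and held-out folds.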
Re: [R] gam() question
> Hi everyone, I am fitting a bivariate smoothing model by using gam. But I got an error message like this: Error in eigen(hess1, symmetric = TRUE) : 0 x 0 matrix
- This is a known problem in mgcv 1.3-20 (an optimizer fails to cope with convergence in one step). It's fixed in 1.3-21, which I'll try and get uploaded to CRAN today.
Simon
-- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK +44 1225 386603 www.maths.bath.ac.uk/~sw283
[R] gam() question
Hi everyone, I am fitting a bivariate smoothing model by using gam. But I got an error message like this:
  Error in eigen(hess1, symmetric = TRUE) : 0 x 0 matrix
If anyone knows how to figure it out, please let me know. Thanks very much.
Re: [R] gam() question
Hello, it's really difficult for anyone to make a constructive response based on your message. The problem could be in:
1) the function you fit (which one is it, and which package?)
2) the arguments that you supplied (what did you tell it to do?)
3) the data that you gave it (what are they?)
Try the following: PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Good luck! Andrew
On Tue, Nov 14, 2006 at 04:40:09PM -0500, seeTigers wrote:
> Hi everyone, I am fitting a bivariate smoothing model by using gam. But I got an error message like this: Error in eigen(hess1, symmetric = TRUE) : 0 x 0 matrix [...]
-- Andrew Robinson, Department of Mathematics and Statistics, University of Melbourne, VIC 3010 Australia. Tel: +61-3-8344-9763 Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/
[R] GAM Package: preplot.gam taking a **long** time
Hi: I have a large data set that I'm testing and I'm finding that preplot.gam is taking a very long time to compute (more than 20 minutes). My machine is 32-bit, Debian unstable, 4GB memory, dual Xeon 3GHz. While the data set is very large, the gam() procedure is able to compute the model without any trouble, in a minute or so. Is there any reason why preplot.gam would be so slow? I am not using newdata in the preplot.gam function, so I am assuming that memory is not a problem. Or could it be? Is it normal for preplot.gam to take so long on large data sets? I have not had this experience with S-Plus, on a lower quality machine, with the same data.
Thanks, Paul
[R] GAM 2D-plotting
Hi, When I fit a GAM model (using mgcv) with overlapping terms, such as gam(y~s(x,z)+s(z,w)), and afterwards try to plot the component smooth functions that make it up using plot.gam, I get a couple of 2D plots. My question is: what is the meaning of those 2D plots in terms of y?
Regards, Nixon
Re: [R] GAM selection error msgs (mgcv gam packages)
> My question concerns 2 error messages; one in the gam package and one in the mgcv package (see below). I have read help files and Chambers and Hastie book but am failing to understand how I can solve this problem. Could you please tell me what I must adjust so that the command does not generate error message? I am trying to achieve model selection for a GAM which is required for prediction purposes, thus my focus is on AIC. My data set has 3038 records and 116 predictor variables and a binary response variable [0 or 1]. There is no current understanding of the predictors' relationship to response so I am relying on GAM for selection of appropriate predictors.
- I have some worries about using a GAM in this sort of situation. It seems like an odd model to start from to me: you don't know the relationship to the covariates, but do know that it should be additive? Is that really true? If it is, then it may still be a lot to ask of the model selection methods to find a good model. (I'd certainly consider upping the `gamma' parameter in mgcv::gam.)
- General uneasiness apart, the specific error message relates to the number of distinct covariate values that you have (or number of distinct x,y,z triplets). Do any of the covariates for single smooths have fewer than 10 distinct values? There are more than 50 distinct x,y,z triplets, I suppose? If you have fewer distinct covariate points for a smooth than the default k (10), then you need to reduce k to the number of distinct points, or fewer.
- Finally, for speed reasons, I'd use the cr basis (see ?s) if doing this.
best, Simon
- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY - +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/
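The point about distinct covariate values can be reproduced on a small simulated case. This is a sketch with a made-up predictor `feat` that takes only six distinct values; the data and names are hypothetical, not Savrina's. The default basis dimension (k = 10) fails, while a k no larger than the number of distinct values fits, here with the cr basis suggested in the reply.

```r
library(mgcv)

set.seed(11)
n <- 500
feat  <- sample(1:6, n, replace = TRUE)          # only 6 distinct values
label <- rbinom(n, 1, plogis(-1 + 0.4 * feat))   # simulated binary response
d <- data.frame(label = label, feat = feat)

# Default k = 10 exceeds the 6 distinct values, so the smooth cannot be set up
err <- tryCatch(gam(label ~ s(feat), family = binomial, data = d),
                error = function(e) e)
inherits(err, "error")

# Reducing k to at most the number of distinct values lets the model fit
fit <- gam(label ~ s(feat, bs = "cr", k = 5), family = binomial, data = d)
summary(fit)
```

With 113 single-covariate smooths, a quick check such as sapply(k.data, function(v) length(unique(v))) would show which columns need a smaller k.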
[R] GAM selection error msgs (mgcv gam packages)
Hi all,
My question concerns 2 error messages; one in the gam package and one in the mgcv package (see below). I have read the help files and the Chambers and Hastie book but am failing to understand how I can solve this problem. Could you please tell me what I must adjust so that the command does not generate an error message? I am trying to achieve model selection for a GAM which is required for prediction purposes, thus my focus is on AIC. My data set has 3038 records and 116 predictor variables and a binary response variable [0 or 1]. There is no current understanding of the predictors' relationship to the response so I am relying on GAM for selection of appropriate predictors.
Thanks, Savrina
* mgcv package 1.3-12:
# I start by specifying the full model with 116 predictors, including an isotropic smooth of the 3D location variables (when I specify only the first 14 predictors I get no error message)
  m0 <- gam(label ~ s(x, y, z, k = 50) + s(feature4) + s(feature5) + s(feature6) + ... + s(feature116), data = k.data, family = binomial)
  Error in smooth.construct.tp.smooth.spec(object, data, knots): A term has fewer unique covariate combinations than specified maximum degrees of freedom
# I was going to follow this with backwards selection by hypothesis testing (remove the highest p-value term one at a time) and also AIC comparison of all the models
From the help file entitled 'Generalised additive models with integrated smoothness estimation' I calculated the following. Where do I go from here?
A) k is the basis dimension of a given term... if k is not specified, k = 10*3^(d-1) where 'd' is the number of covariates for this term. My calculations: for all my terms but the first, d = 1, thus k = 10*3^0 = 10.
B) You must have more unique combinations of covariates than the model has total parameters. My calculations: total parameters = sum of basis dimensions (50 + 10*113) + sum of non-spline terms (0) - number of spline terms (114) = 1066
* gam package:
I think the stepwise selection provided by the gam package would be useful in finding the best predictive model. I follow the example on pg 283 of 'Statistical Models in S', Chambers and Hastie 1993.
# I start with a full model where all predictors enter linearly
  k.start <- gam(label ~ ., data = k.data, family = binomial)
# set up scope list with possibilities for each term, e.g. . ~ 1 + x + s(x)
# ignore the first column of the data set
  k.scope <- gam.scope(k.data[, -1])
# start stepwise selection
  k.step <- step(k.start, k.scope)
# condensed output
  Start: AIC=1549.48
  label~s+y+z+feature4+feature5+...+feature116
               Df Deviance    AIC
  <none>           1319.5 1549.5
  - feature54  -1  1319.2 1551.2
  - feature26  -1  1319.2 1551.2
  ...
  - feature12  -1  1357.4 1589.4
  There were 50 or more warnings (use warnings() to see the first 50)
# all 50 warnings are the same
  warnings()
  Warning messages:
  1: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x[, jj, drop = FALSE], y, wt, offset = object$offset, ...
# it seems not to get past the original linear model. It should show all the steps taken to the final model
  k.step$anova
    Step Df Deviance Resid. Df Resid. Dev      AIC
  1      NA       NA      2922   1317.599 1549.599
[R] gam problem
Hello, People. I have been trying to use mle for a normally distributed data set, but it always reports an error on the gam object. Is there a solution to this?
Victor
Re: [R] gam y-axis interpretation
All smooths in a GAM are `centred' in order to ensure model identifiability. This means that a smooth, s, is estimated subject to the constraint that \sum_i s(x_i) = 0, where the x_i are the covariate values. So you can't transform back to the response scale just by applying the inverse link, even if there is only one smooth. In the single smooth case, you would need to add on the model intercept before applying the inverse link. If you need plots on the response scale then it is best to use the `predict' method function and have it return results on the `response' scale...
best, Simon
- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY - +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/
On Thu, 23 Mar 2006, Bliese, Paul D LTC USAMH wrote:
> However, I'm having trouble interpreting the y-axis of the plot of the gam object. [...]
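The centring point can be illustrated on a single-smooth binomial model. This is a sketch on simulated data (the variable names and the true curve are made up): applying the inverse link to the plotted smooth alone gives the wrong probabilities, while adding the intercept first reproduces what `predict' returns on the response scale.

```r
library(mgcv)

set.seed(5)
n <- 500
x <- runif(n)
y <- rbinom(n, 1, plogis(-3 + 4 * x^2))     # simulated binary response

fit <- gam(y ~ s(x), family = binomial)

# The centred smooth, i.e. what plot(fit) draws on the linear predictor scale
s_x <- predict(fit, type = "terms")[, "s(x)"]

wrong  <- plogis(s_x)                                  # inverse link of the smooth alone
manual <- plogis(coef(fit)[["(Intercept)"]] + s_x)     # intercept added before the inverse link
direct <- predict(fit, type = "response")              # the recommended route

mean(s_x)                        # essentially zero: the smooth is centred
max(abs(manual - direct))        # essentially zero: the two routes agree
range(wrong); range(direct)      # the naive conversion is on a different level
```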
[R] gam y-axis interpretation
Sorry if this is an obvious question...
I'm estimating a simple binomial generalized additive model using the gam function in the package mgcv. The model makes sense given my data, and the predicted values also make sense given what I know about the data. However, I'm having trouble interpreting the y-axis of the plot of the gam object. The y-axis is labeled s(x,2.52), which I understand to basically mean a smoothing estimator with approximately 2.52 degrees of freedom. The y-axis in my case ranges from -2 to 6 and I thought that it would be possible to convert the y-axis estimate to a probability via exp(Y)/(1+exp(Y)). So for instance, my lowest y-axis estimate is -2, for a probability of:
  exp(-2)/(1+exp(-2))
  [1] 0.1192029
However, if I use the predict function my lowest estimate is -3.53862893, for a probability of 2.8%. The 2.8% estimate is a much better estimate than 11.9% given my specific data, so I'm clearly not interpreting the plot correctly. The help files say plot.gam provides "the component smooth functions that make it up, on the scale of the linear predictor". I'm just not sure what that description means. Does someone have another description that might help me grasp the plot? Similar plots are on page 286 of Venables and Ripley (3rd Edition)...
Thanks, Paul
Re: [R] GAM using R tutorials?
Interesting! Very interesting! You seem to assume that I haven't done any of what you've listed below. I even listed a book that I was aware of but could not find in our local library... I have done some research and have already started playing with gam, and everybody on earth knows that you can do ?gam and help.search() in R to find info, but after doing all that I still feel that I need some more info/help... I don't know how you reached your ungrounded conclusion. I think you are wasting everybody's bandwidth and time by not providing constructive suggestions but discouraging (and overly repetitive) comments, disregarding the effort a newbie has already made... I think you really need to consult the posting guide yourself... And remember, you can always remain silent. You don't need to show off that you are an experienced user, and for a newbie your first response is always something like scolding: why didn't you search? I strongly believe that your answer is inappropriate!!!
On 3/15/06, Gavin Simpson wrote:
> On Tue, 2006-03-14 at 23:52 -0800, Michael wrote:
> > Hi all, I am trying to use GAM to work on some data... Are there any resources providing hands-on tutorial/guide on how to do GAM on data in R? Specifically, I am not sure about which model to choose, and smooth models with which effective degree-of-freedom shall I use... I knew there is a book titled: GAM: an introduction using R. Unfortunately our local library does not have it... so that's not an option given time constraint. Thanks a lot for your pointers! Michael.
> Michael, Please learn to use the search tools provided for you! You have posted numerous emails to the list recently, many of which you could have solved for yourself if only you'd heeded people's advice and searched for yourself. For this problem:
> 1) I'd suggest to the local library that they might consider buying the book, but in the meantime...
> 2) ...in R, do RSiteSearch("GAM") and look at the list shown in your browser. The first hit is the help page for package mgcv. Look at the references included on that help page. Most are technical/statistical papers, but a starting point might be the R News article Simon Wood wrote. That should get you started. But if you'd done the search yourself, you wouldn't have had to wait for someone on the list to do it for you.
> Finally - Please read the posting guide - it is there for a reason.
> HTH
> G
> -- Gavin Simpson, ENSIS Research Fellow, ENSIS Ltd. ECRC, UCL Department of Geography, 26 Bedford Way, London. WC1H 0AP. [T] +44 (0)20 7679 5522 [F] +44 (0)20 7679 7565 [E] gavin.simpsonATNOSPAMucl.ac.uk [W] http://www.ucl.ac.uk/~ucfagls/cv/ [W] http://www.ucl.ac.uk/~ucfagls/
Re: [R] GAM using R tutorials?
Hi Templ,

That's very helpful! Indeed, I've printed out the gam portion and I am digesting it now. Thank you so much! I really appreciate your constructive and helpful advice!

Best,
Michael.

On 3/15/06, TEMPL Matthias [EMAIL PROTECTED] wrote:
> Have you looked at "An Introduction to R: Software for Statistical Modelling & Computing" by Petra Kuhnert and Bill Venables, which is available at http://cran.r-project.org/other-docs.html
>
> Hope this helps.
>
> Best,
> Matthias
Re: [R] GAM using R tutorials?
On Thu, 2006-03-16 at 00:25 -0800, Michael wrote:
> Interesting! Very interesting!
>
> You seem to assume that I haven't done any of the things you list below. I even mentioned a book that I was aware of but could not find in our local library. I have done some research and have already started playing with gam, and everybody on earth knows that you can do ?gam and help.search() in R to find info. But after doing all of that, I still feel that I need some more info/help.
>
> I don't know how you reached your ungrounded conclusion. I think you are wasting everybody's bandwidth and time by offering discouraging (and overly repetitive) comments instead of constructive suggestions, disregarding the effort a newbie has already made. I think you really need to consult the posting guide yourself. And remember, you can always remain silent: you don't need to show off that you are an experienced user whose first response to a newbie is always some kind of scolding ("why didn't you search?").
>
> I strongly believe that your answer is inappropriate!!!

What is inappropriate about pointing you to Simon Wood's article on gam() (mgcv) in R and how to use it? Was this not what you asked for?

My other comments related to the fact that this *was* in the References section of ?gam, which the posting guide does ask you to read. One can only assume you missed this reference when you looked at the page, or did not realise its significance.

And for the record, wherever possible I do try to reply constructively. If I didn't have an answer to your question I would not have replied to the list, but I did, and it was in the documentation provided...

G
Re: [R] GAM using R tutorials?
Dear Templ (& Others),

Thank you for pointing out the contributed papers. It is one of those things that I have been introduced to more than once, but I personally never remembered to check that section while in the acute phase of "problem, can't do it, what the heck is going on".

Dear Michael,

Regarding frustrating replies, those will happen. I lost sleep over such things in the past, only to learn that I didn't really have time to get caught up in them. It is normal, when investing time and effort in a reply, to want some kind of assurance that some legwork has been done. I think Gavin just didn't get the sought-after assurance from your message. The reply may not have been particularly useful to you, per se, but it did serve as a reminder to the wider audience of subscribers who read posts. Given that this is done over email, and given the huge diversity of subscribers around the globe with different language histories, the error rate in communication can only be higher than in your locale or mine. Hence the importance of the posting guide & FAQs. More often than not, however, I've later found replies that I previously considered terse or inappropriate to be quite accurate.

On the other side, I once walked through an R&D lab and pointed out a solution to a router configuration problem that the technician had been working on for a few days. Having been trained on that one system, I found it quite obvious what needed to be fixed. I expected the technician to be grateful, but he never spoke to me again!

Well, that's my 2 cents. I do acknowledge that it wasn't requested. My hope is that someone out there in listland will find it useful, even if not you. Yes, I guess I am having a conversation with the wind...

Rgds,
KeithC.
Re: [R] GAM using R tutorials?
Have you looked at "An Introduction to R: Software for Statistical Modelling & Computing" by Petra Kuhnert and Bill Venables, which is available at http://cran.r-project.org/other-docs.html

Hope this helps.

Best,
Matthias

> Hi all,
>
> I am trying to use GAM to work on some data. Are there any resources providing a hands-on tutorial/guide on how to do GAM on data in R? Specifically, I am not sure which model to choose, or which effective degrees of freedom to use for the smooth terms.
>
> I know there is a book titled "GAM: an introduction using R". Unfortunately our local library does not have it, so that's not an option given the time constraint.
>
> Thanks a lot for your pointers!
>
> Michael.
Re: [R] GAM using R tutorials?
On Tue, 2006-03-14 at 23:52 -0800, Michael wrote:
> Hi all,
>
> I am trying to use GAM to work on some data. Are there any resources providing a hands-on tutorial/guide on how to do GAM on data in R? Specifically, I am not sure which model to choose, or which effective degrees of freedom to use for the smooth terms.
>
> I know there is a book titled "GAM: an introduction using R". Unfortunately our local library does not have it, so that's not an option given the time constraint.
>
> Thanks a lot for your pointers!
>
> Michael.

Michael,

Please learn to use the search tools provided for you! You have posted numerous emails to the list recently, many of which you could have solved for yourself if only you'd heeded people's advice and searched for yourself.

For this problem:

1) I'd suggest to the local library that they might consider buying the book, but in the meantime...

2) ...in R, do RSiteSearch("GAM") and look at the list shown in your browser. The first hit is the help page for package mgcv. Look at the references included on that help page - most are technical/statistical papers, but a starting point might be the R News article Simon Wood wrote.

That should get you started. But if you'd done the search yourself, you wouldn't have had to wait for someone on the list to do it for you.

Finally - please read the posting guide - it is there for a reason.

HTH

G
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                 [T] +44 (0)20 7679 5522
ENSIS Research Fellow         [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC             [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                [W] http://www.ucl.ac.uk/~ucfagls/
London. WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
[R] GAM using R tutorials?
Hi all,

I am trying to use GAM to work on some data. Are there any resources providing a hands-on tutorial/guide on how to do GAM on data in R? Specifically, I am not sure which model to choose, or which effective degrees of freedom to use for the smooth terms.

I know there is a book titled "GAM: an introduction using R". Unfortunately our local library does not have it, so that's not an option given the time constraint.

Thanks a lot for your pointers!

Michael.
Re: [R] gam
> I'm new to both R and to this list and would like to get advice on how to build generalized additive models in R. Based on the description of gam, which I found on the R website, I specified the following model:
>
>     model1 <- gam(ST~s(MOWST1),family=binomial,data=strikes.S)
>
> in which ST is my binary response variable and MOWST1 is a categorical independent variable. I get the following error message:
>
>     Error in smooth.construct.tp.smooth.spec(object, data, knots) :
>       NA/NaN/Inf in foreign function call (arg 1)

- I guess this should maybe get trapped a bit earlier, so that you get a more informative warning.

- The basic problem is that gams are based around sums of smooth functions of covariates. For the notion of smooth to be meaningful, the covariates have to live in a space where you have at least a notion of distance between the covariates, since in some loose sense `smooth' means that f(x_1) must be close to f(x_2) if x_1 and x_2 are close. For factors you don't generally have any notion of distance between the levels of a factor. (E.g. if a factor has levels brick, sky and purple, how far is it from brick to purple?)

- Even if a factor is naturally ordered (e.g. small, medium, large), you would still have to decide how to measure the smoothness/wiggliness of a function of the factor. For this reason, I think that it is actually better to explicitly convert the levels of an ordered factor into numeric values on a scale that you think is appropriate, before using the ordered factor as the covariate in a gam. In this way it's usually fairly easy to get one of the mgcv built-in smoother classes to use the notion of smoothness that you think is appropriate. If not, then it's not too hard to add a smoother class, following the template provided in ?p.spline (actually you could use this template to write a smoother class for ordered categorical predictors).

best,
Simon
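The recoding suggested above can be sketched roughly as follows. This is only an illustration on simulated data: the five-level factor `size`, the equally spaced 1..5 scale, and the variable names are all assumptions made for the example, not anything from the original poster's data. The basis dimension k is kept below the number of distinct covariate values.

```r
library(mgcv)

set.seed(1)
n <- 300
lev <- c("very small", "small", "medium", "large", "very large")

# Hypothetical ordered factor standing in for a predictor such as MOWST1
size <- factor(sample(lev, n, replace = TRUE), levels = lev, ordered = TRUE)

# Assumed numeric scale for the levels: equally spaced 1, 2, ..., 5.
# Any other spacing judged appropriate could be substituted here.
size_num <- as.numeric(size)

# Simulated binary response whose success probability rises with size
st <- rbinom(n, 1, plogis(-2 + 0.7 * size_num))
dat <- data.frame(st = st, size = size, size_num = size_num)

# Smooth of the numeric recoding; k = 4 stays below the 5 distinct values
fit <- gam(st ~ s(size_num, k = 4), family = binomial, data = dat)
summary(fit)
```

The choice of scale is the modelling decision here: the smoother only sees the numbers, so unequal spacing between levels changes what "smooth" means.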
Re: [R] gam
I.Szentirmai said the following on 2006-01-19 19:43:
> Dear R users,
>
> I'm new to both R and to this list and would like to get advice on how to build generalized additive models in R. Based on the description of gam, which I found on the R

Which `gam'? Note that R ships with package `mgcv', which has a `gam' function, but package `gam' on CRAN also has a `gam' function. (Furthermore, several other packages exist with functions that I'd categorize as GAM-fitters, e.g. SemiPar, assist, gss, gamlss, ...)

> website, I specified the following model:
>
>     model1 <- gam(ST~s(MOWST1),family=binomial,data=strikes.S)
>
> in which ST is my binary response variable and MOWST1 is a categorical independent variable. I get the following error message:
>
>     Error in smooth.construct.tp.smooth.spec(object, data,

From this error message, I can however deduce that we're talking about the `mgcv::gam' function.

>     knots) : NA/NaN/Inf in foreign function call (arg 1)
>     In addition: Warning messages:
>     1: argument is not numeric or logical: returning NA in: mean.default(xx)
>     2: "-" not meaningful for factors in: Ops.factor(xx, shift[i])
>
> I would greatly appreciate if someone could tell me what I did wrong. Can I use categorical independents in gam at all?

It's not clear to me what you mean by this. Yes, you can use factors in gam:

    gam(ST ~ MOWST1, family = binomial, data = strikes.S)

would work. But you tried smoothing a factor, which isn't supported (and to me it doesn't make any sense to do so). Smoothing an ordered factor may make sense, but this is not supported by `mgcv' (and you didn't try it, according to the error message above). I was under the impression that the `gam' function in package `gam' should be able to do this, but I just tried it and was rewarded with the error message

    Error: 'codes' is defunct.

relating to the internals of `gam' using a defunct R function. I've e-mailed Prof Hastie, maintainer of package `gam', about this.

Even if it worked, the `gam' package won't allow estimation of the degree of smoothness of the model terms as part of the fitting process. So if this is what you want in combination with ordered factors, you're probably out of luck. (You can always send Prof Wood, the `mgcv' maintainer, a feature request.)

HTH,
Henric
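To illustrate the point that a factor can enter an mgcv gam as an ordinary parametric term, here is a minimal sketch on simulated data. The factor `grp`, the extra numeric covariate `x` and the data frame `dat` are hypothetical names made up for the example; `x` is included only to show a factor sitting alongside a smooth of a numeric covariate.

```r
library(mgcv)

set.seed(2)
n <- 200

# Hypothetical unordered factor standing in for a predictor such as MOWST1
grp <- factor(sample(c("brick", "sky", "purple"), n, replace = TRUE))

# Hypothetical numeric covariate, the kind of variable s() is meant for
x <- runif(n)

# Simulated binary response with a level effect plus a smooth effect of x
level_effect <- c(brick = -0.5, sky = 0.3, purple = 1)
eta <- unname(level_effect[as.character(grp)]) + sin(2 * pi * x)
st <- rbinom(n, 1, plogis(eta))
dat <- data.frame(st = st, grp = grp, x = x)

# Factor as a parametric term only (no smooths at all)
fit_factor <- gam(st ~ grp, family = binomial, data = dat)

# Factor as a parametric term next to a smooth of the numeric covariate
fit_mixed <- gam(st ~ grp + s(x), family = binomial, data = dat)
summary(fit_mixed)
```

Wrapping `grp` in s() would reproduce the kind of failure reported in this thread, since the smoother needs a numeric covariate.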
[R] gam
Dear R users,

I'm new to both R and to this list and would like to get advice on how to build generalized additive models in R. Based on the description of gam, which I found on the R website, I specified the following model:

    model1 <- gam(ST~s(MOWST1),family=binomial,data=strikes.S)

in which ST is my binary response variable and MOWST1 is a categorical independent variable. I get the following error message:

    Error in smooth.construct.tp.smooth.spec(object, data, knots) :
      NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning messages:
    1: argument is not numeric or logical: returning NA in: mean.default(xx)
    2: "-" not meaningful for factors in: Ops.factor(xx, shift[i])

I would greatly appreciate if someone could tell me what I did wrong. Can I use categorical independents in gam at all?

Many thanks,
Istvan
Re: [R] GAM and AIC: How can I do??? please
Please send R and mgcv version numbers, as I can't replicate this problem.

best,
Simon

> Hello,
>
> I'm a Korean researcher who has just started to learn R. I want to fit a gam model and get its AIC value, so that I can compare several models. I fitted the GAM, but there was an error for AIC. So what can I do? Please help me!
>
> I did the following:
>
>     a.fit <- gam(pi~ s(t1r), family = gaussian(link=log))
>     summary(a.fit)
>
>     Family: gaussian
>     Link function: log
>
>     Formula:
>     pi ~ s(t1r)
>
>     Parametric coefficients:
>               Estimate  std. err.  t ratio  Pr(>|t|)
>     constant  0.093105  0.005238   17.77    < 2.22e-16
>
>     Approximate significance of smooth terms:
>               edf    chi.sq  p-value
>     s(t1r)    1.833  24.153  0.00014213
>
>     R-sq.(adj) = 0.435   Deviance explained = 47.1%
>     GCV score = 0.0010938   Scale est. = 0.00099053   n = 30
>
>     AIC(a.fit)
>     Error in logLik(object) : no applicable method for "logLik"
Re: [R] GAM and AIC: How can I do??? please
>>>>> "ronggui" == ronggui [EMAIL PROTECTED]
>>>>>     on Mon, 24 Oct 2005 10:09:30 +0800 writes:

    ronggui> === 2005-10-24 09:55:32, you wrote in your message: ===
    >> Hello, I'm a Korean researcher who has just started to learn R. I want to fit a gam model and get its AIC value, so that I can compare several models. I fitted the GAM, but there was an error for AIC. So what can I do? Please help me!
    >>
    >> I did the following:
    >>
    >>     a.fit <- gam(pi~ s(t1r), family = gaussian(link=log))
    >>     summary(a.fit)
    >>
    >>     Family: gaussian
    >>     Link function: log
    >>
    >>     Formula:
    >>     pi ~ s(t1r)
    >>
    >>     Parametric coefficients:
    >>               Estimate  std. err.  t ratio  Pr(>|t|)
    >>     constant  0.093105  0.005238   17.77    < 2.22e-16
    >>
    >>     Approximate significance of smooth terms:
    >>               edf    chi.sq  p-value
    >>     s(t1r)    1.833  24.153  0.00014213
    >>
    >>     R-sq.(adj) = 0.435   Deviance explained = 47.1%
    >>     GCV score = 0.0010938   Scale est. = 0.00099053   n = 30

    ronggui> are you using the mgcv package? if you are, just
    ronggui> use a.fit$aic to get the aic.

Hmm, yes, and no. What you say is true, BUT it is not at all recommended in general: you should use the generic AIC() function rather than extracting components yourself.

This is a general principle: if possible, use `extractor functions' to work on objects rather than relying on internal representations. This is particularly relevant for fitted models: do use residuals(.), fitted(.), logLik(.), AIC(.), vcov(.) etc.!

Now back to this problem:

    >>     AIC(a.fit)
    >>     Error in logLik(object) : no applicable method for "logLik"

I can't reproduce this; Eun definitely needs to give more details, since the following works fine:

    library(mgcv)
    x <- 1:50
    set.seed(1)
    y <- 2^(sin(x/10) + rnorm(50))
    a.fit <- gam(y ~ s(x), family = gaussian(link=log))
    summary(a.fit)

    Family: gaussian
    Link function: log

    Formula:
    y ~ s(x)

    Parametric coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.3171     0.1251   2.535   0.0147 *
    ---
    Signif. codes: ..{UTF-8 code}

    Approximate significance of smooth terms:
           edf Est.rank    F p-value
    s(x) 2.858    9.000 3.07 0.00576 **
    ---
    Signif. codes:

    R-sq.(adj) = 0.4   Deviance explained = 43.5%
    GCV score = 0.94391   Scale est. = 0.87107   n = 50

    AIC(a.fit)
    [1] 140.6937
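As a small sketch of the extractor-function advice, the simulated data from the message above can be reused to compare two candidate models by AIC without touching the internals of the fitted objects. The second model (`fit_linear`) and both object names are additions for illustration only, and the numbers printed will depend on the installed R and mgcv versions.

```r
library(mgcv)

# Same simulated data as in the reply above
x <- 1:50
set.seed(1)
y <- 2^(sin(x / 10) + rnorm(50))

# Two candidate models: a smooth of x and a plain linear term in x
fit_smooth <- gam(y ~ s(x), family = gaussian(link = "log"))
fit_linear <- gam(y ~ x, family = gaussian(link = "log"))

# Extractor functions rather than components such as fit$aic
AIC(fit_smooth, fit_linear)   # one row per model; smaller AIC is preferred
logLik(fit_smooth)
head(residuals(fit_smooth))
head(fitted(fit_smooth))
dim(vcov(fit_smooth))
```

If AIC() fails with "no applicable method", as in the original report, checking that mgcv (and not another package providing `gam`) produced the object, and which version is installed, is a sensible first step.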
[R] GAM and AIC: How can I do??? please
Hello,

I'm a Korean researcher who has just started to learn R. I want to fit a gam model and get its AIC value, so that I can compare several models. I fitted the GAM, but there was an error for AIC. So what can I do? Please help me!

I did the following:

    a.fit <- gam(pi~ s(t1r), family = gaussian(link=log))
    summary(a.fit)

    Family: gaussian
    Link function: log

    Formula:
    pi ~ s(t1r)

    Parametric coefficients:
              Estimate  std. err.  t ratio  Pr(>|t|)
    constant  0.093105  0.005238   17.77    < 2.22e-16

    Approximate significance of smooth terms:
              edf    chi.sq  p-value
    s(t1r)    1.833  24.153  0.00014213

    R-sq.(adj) = 0.435   Deviance explained = 47.1%
    GCV score = 0.0010938   Scale est. = 0.00099053   n = 30

    AIC(a.fit)
    Error in logLik(object) : no applicable method for "logLik"

Eun A Kim, MD, MPH, Ph.D
Senior Researcher
Occupational Safety and Health Research Institute
Korea Occupational Safety and Health Agency
TEL: +82-32-510-0910, FAX: +82-32-518-0862
Address: 34-4 Gusan-dong, Bupyung-Gu, Incheon city, 430-711, Republic of Korea
Home Fax +82-(303)3111-0573
Re: [R] GAM and AIC: How can I do??? please
=== 2005-10-24 09:55:32, you wrote in your message: ===
> Hello,
>
> I'm a Korean researcher who has just started to learn R. I want to fit a gam model and get its AIC value, so that I can compare several models. I fitted the GAM, but there was an error for AIC. So what can I do? Please help me!
>
> I did the following:
>
>     a.fit <- gam(pi~ s(t1r), family = gaussian(link=log))
>     summary(a.fit)
>
>     Family: gaussian
>     Link function: log
>
>     Formula:
>     pi ~ s(t1r)
>
>     Parametric coefficients:
>               Estimate  std. err.  t ratio  Pr(>|t|)
>     constant  0.093105  0.005238   17.77    < 2.22e-16
>
>     Approximate significance of smooth terms:
>               edf    chi.sq  p-value
>     s(t1r)    1.833  24.153  0.00014213
>
>     R-sq.(adj) = 0.435   Deviance explained = 47.1%
>     GCV score = 0.0010938   Scale est. = 0.00099053   n = 30

Are you using the mgcv package? If you are, just use a.fit$aic to get the AIC.

>     AIC(a.fit)
>     Error in logLik(object) : no applicable method for "logLik"

= = = = = = = = = = = = = = = = = = = =
2005-10-24
--
Department of Sociology
Fudan University
My new mail address is [EMAIL PROTECTED]
Blog: http://sociology.yculblog.com
[R] GAM weights
Dear all,

We are trying to model some data from rare plants, so we always have fewer than 50 presences (1x1 km cells), while the total area is about 550,000 square km. This gives us a real problem when we fit a GAM using only as many absences as presences. We have thought about using a greater number of absences, but in that case we should downweight them. Does anybody know how to use the weight term?

Thank you in advance,
daniel
[R] GAM/MGCV: UBRE Score
In comparing two GAM models, are there guidelines for deciding which model is the better fit by comparing their UBRE scores? I assume that a larger value is indicative of a better fit? How large a difference counts as significant?

I imagine that one could also use the percentage of deviance explained. Is that a reliable statistic? Are there any other statistics that one would use for model selection?

Jean G. Orelien
Senior Biostatistician
***
SciMetrika, LLC
2 Davis Drive
RTP, NC 27709
Tel: (919)765-0017 (1210)
Fax: (919)990-8561
Email: [EMAIL PROTECTED]
Website: http://www.scimetrika.com
***
[R] gam in library(gam)
I know there are two versions of gam in R. One is in library(mgcv) and one is in library(gam). The one in mgcv can estimate the smoothing parameter automatically. The one in gam cannot, although it supports a larger variety of smoothers (besides splines).

Can anybody tell me whether there is a way to do smoothing parameter selection with gam from library(gam)? I know I can always program cross-validation myself, but it would be friendlier if the software could handle this automatically (like gam in library(mgcv)).
[R] gam(mgcv) starting values
Hi all!

I've got some problems with the function gam (library mgcv). For some models I get the error message:

    Error: no valid set of coefficients has been found:please supply starting values
    In addition: Warning message:
    NaNs produced in: log(x)

This is a shortened version of the code I used:

    gam(y ~ M1 + M3 + M4 + M5 + M6 + sex + M1*M3 + s(age), family=Gamma(link="identity"), weights=days)

If I add an additional variable, say M7, the error message occurs. If I add M7 in combination with, for example, M8, it works.

Does somebody know how to supply starting values, or how to handle this problem? I didn't succeed by adding control=gam.control(spIterType="outer"), or by sp=137722.1.

Thank you very much,
Björn
Re: [R] gam(mgcv) starting values
My guess is that your model is predicting a negative mean for some of your data. Since this is not possible for a Gamma r.v., the deviance calculation returns something non-finite, which triggers the error message. This can happen because you have used an identity link. Is it not possible to use a log link?

If you have to use an identity link, then I'd first check that

    y ~ M1 + M3 + M4 + M5 + M6 + M7 + sex + M1*M3 + age

works. If it does, then you could try starting with a very large min.sp argument when fitting the model with s(age), and slowly reducing it until the estimated smoothing parameter is non-zero. If this works, then you've succeeded in finding the best-fit model without any E(y) becoming negative in the process; if it doesn't, it probably means either that the model structure is wrong, or that some E(y) really is very close to zero.

I doubt that altering starting values is likely to help here (the starting values won't make any E(y) <= 0, after all).

best,
Simon

_____
> Simon Wood [EMAIL PROTECTED] www.stats.gla.ac.uk/~simon/
> Department of Statistics, University of Glasgow, Glasgow, G12 8QQ
> Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814

> Hi all!
>
> I've got some problems with the function gam (library mgcv). For some models I get the error message:
>
>     Error: no valid set of coefficients has been found:please supply starting values
>     In addition: Warning message:
>     NaNs produced in: log(x)
>
> This is a shortened version of the code I used:
>
>     gam(y ~ M1 + M3 + M4 + M5 + M6 + sex + M1*M3 + s(age), family=Gamma(link="identity"), weights=days)
>
> If I add an additional variable, say M7, the error message occurs. If I add M7 in combination with, for example, M8, it works.
>
> Does somebody know how to supply starting values, or how to handle this problem? I didn't succeed by adding control=gam.control(spIterType="outer"), or by sp=137722.1.
>
> Thank you very much,
> Björn
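The log-link suggestion in the reply above can be sketched like this. Everything here is simulated and hypothetical: the covariates `age` and `sex`, the exposure variable `days`, and the rescaling of the weights to mean 1 are assumptions for the example, and the model is far smaller than the one in the question. The point is only that with a log link the fitted means are exp(linear predictor) and so cannot go negative.

```r
library(mgcv)

set.seed(3)
n <- 200

# Hypothetical covariates loosely mirroring the question
age <- runif(n, 20, 70)
sex <- factor(sample(c("f", "m"), n, replace = TRUE))
days <- sample(30:365, n, replace = TRUE)

# Simulated positive response with a Gamma distribution and mean mu
mu <- exp(0.5 + 0.3 * (sex == "m") + sin(age / 10))
y <- rgamma(n, shape = 5, rate = 5 / mu)
dat <- data.frame(y = y, age = age, sex = sex, days = days)

# Prior weights rescaled to have mean 1 (an assumption of this sketch)
dat$w <- dat$days / mean(dat$days)

# Gamma model with a log link instead of the identity link
fit_log <- gam(y ~ sex + s(age), family = Gamma(link = "log"),
               weights = w, data = dat)

# Fitted means are on the response scale and are always positive
range(fitted(fit_log))
```

Whether a multiplicative (log-link) mean structure is acceptable is a subject-matter question; if an additive structure is really required, the identity-link route with min.sp described in the reply is the alternative.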
Re: [R] GAM: Remedial measures
Jean G. Orelien wrote: I fitted a GAM model with Poisson distribution to a data set with about 200 observations. I noticed that the plot of the residuals versus fitted values shows a trend. Residuals tend to be lower for higher fitted values. Because I'm dealing with count data, I'm thinking that this might be due to overdispersion. Is there a way to account for overdispersion in any of the packages MGCV or GAM?
You can `allow for' overdispersion in mgcv::gam by using the quasipoisson family, or by setting scale to -1 in the gam call. In a straight GLM this would make no difference to the residual plots, since the scale parameter does not change the coefficient estimates. However, things are different for a GAM with automatic smoothness estimation, since the scale parameter does influence the smoothing parameter estimation criterion. Another possibility is to use the negative binomial family from the MASS library, and a third is to use the quasi family.
Simon
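A rough sketch of these options on simulated overdispersed counts follows. The data are invented for illustration. The nb() family used in the last fit is an assumption about a reasonably recent mgcv; it is not the MASS family mentioned above, and it is fitted with REML here.

```r
library(mgcv)

set.seed(2)
n  <- 200
x  <- runif(n)
mu <- exp(1.5 + sin(2 * pi * x))

# Negative binomial counts: variance = mu + mu^2 / size, so clearly overdispersed
y <- rnbinom(n, mu = mu, size = 1)

# Option 1: quasipoisson family, scale parameter estimated from the data
fit_quasi <- gam(y ~ s(x), family = quasipoisson)

# Option 2: Poisson family with scale = -1, which signals an unknown scale
fit_scale <- gam(y ~ s(x), family = poisson, scale = -1)

# Option 3 (assumption: recent mgcv): negative binomial with theta estimated
fit_nb <- gam(y ~ s(x), family = nb(), method = "REML")

fit_quasi$sig2   # estimated scale; values well above 1 point to overdispersion
```

Comparing the estimated smooths from the three fits is a quick way to see how much the treatment of the scale parameter changes the chosen smoothness.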
[R] GAM: Remedial measures
I fitted a GAM model with Poisson distribution to a data set with about 200 observations. I noticed that the plot of the residuals versus fitted values shows a trend. Residuals tend to be lower for higher fitted values. Because I'm dealing with count data, I'm thinking that this might be due to overdispersion. Is there a way to account for overdispersion in any of the packages MGCV or GAM? I welcome any suggestions that one may have on this topic.
Jean
Re: [R] GAM: Remedial measures
Jean, The standard treatment of overdispersed data when using the Poisson distribution to model count data is to switch to the negative binomial distribution.
Hope this helps, Tim Liao
Original message, Date: Thu, 13 Jan 2005 18:22:29 -0500, From: Jean G. Orelien, Subject: [R] GAM: Remedial measures, To: r-help@stat.math.ethz.ch
I fitted a GAM model with Poisson distribution to a data set with about 200 observations. I noticed that the plot of the residuals versus fitted values shows a trend. Residuals tend to be lower for higher fitted values. Because I'm dealing with count data, I'm thinking that this might be due to overdispersion. Is there a way to account for overdispersion in any of the packages MGCV or GAM? I welcome any suggestions that one may have on this topic.
Jean
Re: [R] GAM: Getting standard errors from the parametric terms in a GAM model
summary.gam and anova.gam in package mgcv will report standard errors and p-values for parametric terms, as well as smooth terms, for a gam fitted by function gam from package mgcv.
Simon
Original message:
I am new to R. I'm using the function GAM and wanted to get standard errors and p-values for the parametric terms (I fitted a semi-parametric model). Using the function anova() on the object from GAM, I only get p-values for the nonparametric terms. Does anyone know if and how to get standard errors for the parametric terms? Thanks.
Jean G. Orelien
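As a small illustration of where those numbers live, the sketch below fits a semi-parametric model to simulated data (the variables x1, x2 and y are invented) and pulls the parametric and smooth tables out of the summary object.

```r
library(mgcv)

set.seed(3)
n  <- 200
x1 <- runif(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x2 + sin(2 * pi * x1) + rnorm(n, sd = 0.3)

# x2 enters parametrically, x1 as a smooth term
fit <- gam(y ~ x2 + s(x1))
sm  <- summary(fit)

sm$p.table   # estimates, standard errors, t values and p-values for parametric terms
sm$s.table   # edf and approximate p-values for smooth terms
```

anova(fit) reports the same kind of information in a form closer to an ANOVA table.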
Re: [R] GAM: Overfitting
Jean G. Orelien wrote: I am analyzing particulate matter data (PM10) on a small data set (147 observations). I fitted a semi-parametric model and am worried about overfitting. How can one check for model fit in GAM?
- Keeping a random subset of the data as a validation set, fitting to the remaining data and then comparing the R^2 / proportion deviance explained on the fit set and the validation set is usually quite diagnostic. If the fit data are much better predicted than the validation data, then you probably have over-fitting.
- If your response is treated as Poisson then scale parameter estimates below 1 are also diagnostic, but only if you are not expecting overdispersion, of course.
- If you use gam from package mgcv then, by default, model effective degrees of freedom are estimated from your data by GCV or an approximation to AIC. mgcv::gam allows you to increase the penalty on each model degree of freedom in these criteria, via the gam argument `gamma'. Some work by Kim and Gu (2004, J.Roy.Statist.Soc.B) suggests that gamma around 1.4 can be a sensible choice for suppressing overfitting, without much of a degradation in MSE performance.
best, Simon
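The first and third suggestions can be combined in a short sketch. Everything below is simulated for illustration (147 rows to mirror the sample size in the question), and the helper r2() is a hypothetical name defined only for this example.

```r
library(mgcv)

set.seed(4)
n   <- 147
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.5)

# Hold out a random validation subset
idx   <- sample(n, 100)
train <- dat[idx, ]
valid <- dat[-idx, ]

# gamma = 1.4 inflates the cost of each effective degree of freedom in GCV
fit <- gam(y ~ s(x), data = train, gamma = 1.4)

# Hypothetical helper: proportion of variance explained
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

r2_train <- r2(train$y, as.numeric(fitted(fit)))
r2_valid <- r2(valid$y, as.numeric(predict(fit, newdata = valid)))

c(train = r2_train, validation = r2_valid)
```

A validation value far below the training value would be the warning sign described above; with so few observations the comparison is noisy, so repeating the split a few times is probably sensible.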
Re: [R] GAM: Getting standard errors from the parametric terms in a GAM model
On Tue, 21 Dec 2004, Jean G. Orelien wrote:
I am new to R. I'm using the function GAM and wanted to get standard errors and p-values for the parametric terms (I fitted a semi-parametric model). Using the function anova() on the object from GAM, I only get p-values for the nonparametric terms. Does anyone know if and how to get standard errors for the parametric terms?
If you mean gam() in the gam package then, yes, someone does, but it hasn't been included in the package yet. It is described in the current issue of JASA. Code for S-PLUS is supposed to be at http://www.ihapss.jsph.edu/software/ but that is currently not working.
-thomas
[R] GAM: Getting standard errors from the parametric terms in a GAM model
I am new to R. I'm using the function GAM and wanted to get standard errors and p-values for the parametric terms (I fitted a semi-parametric model). Using the function anova() on the object from GAM, I only get p-values for the nonparametric terms. Does anyone know if and how to get standard errors for the parametric terms? Thanks.
Jean G. Orelien
[R] GAM: Overfitting
I am analyzing particulate matter data (PM10) on a small data set (147 observations). I fitted a semi-parametric model and am worried about overfitting. How can one check for model fit in GAM?
Jean G. Orelien
Re: [R] GAM: Overfitting
Jean G. Orelien wrote:
I am analyzing particulate matter data (PM10) on a small data set (147 observations). I fitted a semi-parametric model and am worried about overfitting. How can one check for model fit in GAM?
It's good to separate 'model fit' (or lack of fit) from 'overfitting'. Overfitting can cause the model fit to appear to be excellent, but there is still a huge problem.
-- Frank E Harrell Jr, Professor and Chair, School of Medicine, Department of Biostatistics, Vanderbilt University
Re: [R] Gam() function in R
On 6 Dec 2004, at 7:36, Janice Tse wrote:
Thanks for the email. I will check that out. However, when I was doing this: gam(y~s(x1)+s(x2,3), family=gaussian, data=mydata) it gives me the error:
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
What does it mean?
When Andy Liaw answered you (below), he asked you to specify which kind of 'gam' you used: the one in the standard package 'mgcv' or the one in package 'gam'. We need to know this in order to tell what your error message means. If you used mgcv::gam, it means that you didn't read its help pages, which say that you should specify your model as: gam(y ~ s(x1) + s(x2, k=3)). Further, it may be useful to read the help pages to understand what it means to specify k=3 and how it may influence your model. Simon Wood, the mgcv author, also has a very useful article in the R Newsletter: see the CRAN archive. It may be really difficult to understand what you are doing when you use mgcv::gam unless you read this paper (it is possible, but hard). Simon's article specifically answers your first question about deciding the smoothness, and explains how elegantly this is done in mgcv::gam (gam::gam has another set of tools and philosophy). If you happened to use gam::gam, then you have to look for another explanation.
cheers, jari oksanen
From: Liaw, Andy, Sent: Sunday, December 05, 2004 11:34 PM, To: 'Janice Tse', Subject: RE: [R] Gam() function in R
Unfortunately that's not really an R question. I recommend that you read up on the statistical methods underneath. One that I'd wholeheartedly recommend is Prof. Harrell's `Regression Modeling Strategies'. [BTW, there are now two implementations of gam() in R: one in `mgcv', which is fairly different from that in `gam'. I'm guessing you're referring to the one in `gam', but please remember to state which contributed package you're using, along with version of R and OS.]
Cheers, Andy
From: Janice Tse
Hi all, I'm a new user of the R gam() function. I am wondering how we decide on the smooth function to use. The general form is gam(y~s(x1,df=i)+s(x2,df=j)...); how do we decide on the degrees of freedom to use for each smoother, and whether we should apply a smoother to each attribute? Thanks!!
-- Jari Oksanen, Oulu, Finland
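For the mgcv case, the corrected call from the reply above can be tried on made-up data. The data frame mydata below is simulated purely for illustration; only the formula syntax is taken from the thread.

```r
library(mgcv)

set.seed(5)
mydata <- data.frame(x1 = runif(200), x2 = runif(200))
mydata$y <- sin(2 * pi * mydata$x1) + mydata$x2 + rnorm(200, sd = 0.2)

# In mgcv the basis dimension must be passed by name as k;
# a bare s(x2, 3) is read as a smooth of two variables, x2 and "3"
fit <- gam(y ~ s(x1) + s(x2, k = 3), family = gaussian, data = mydata)

# Basis dimension actually used for each smooth
sapply(fit$smooth, function(s) s$bs.dim)
```

Note that k sets an upper limit on the flexibility of the term rather than fixing its degrees of freedom, which is a different meaning from df in package gam.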
Re: [R] Gam() function in R
In reply to Jari Oksanen (Mon 06/12/2004, 09:09):
Hi all, this subject is very interesting for me. I'm using mgcv 0.8-9 with R version 1.7.1. I didn't know that there was another gam version in package gam (library(gam)). Can someone tell me the basic differences between them? I looked for a help page on Google but I only found mgcv help pages. Thanks!
Yves Magliulo, Paris.
Re: [R] Gam() function in R
Janice Tse wrote: I'm a new user of the R gam() function. I am wondering how we decide on the smooth function to use. The general form is gam(y~s(x1,df=i)+s(x2,df=j)...); how do we decide on the degrees of freedom to use for each smoother, and whether we should apply a smoother to each attribute?
I guess you are using gam() from package gam, in which case you probably need to look at the help file for step.gam. By default gam() in package mgcv estimates the appropriate degrees of freedom automatically as part of model estimation using generalized cross validation (although there is an adjustable upper limit on the range of degrees of freedom considered). Package gss also has routines for fitting GAMs where the choice of df is fully automatic.
best, Simon
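To make the automatic choice concrete, here is a minimal sketch on simulated data (x and y are invented). The upper limit is set through k, and the effective degrees of freedom actually selected can be read off the fitted object.

```r
library(mgcv)

set.seed(6)
x <- runif(300)
y <- sin(2 * pi * x) + rnorm(300, sd = 0.3)

# k = 20 only raises the upper limit; the smoothing parameter
# (and hence the effective degrees of freedom) is chosen by GCV
fit <- gam(y ~ s(x, k = 20))

summary(fit)$edf   # effective degrees of freedom of s(x), below the limit of 19
fit$sp             # the estimated smoothing parameter
```

If the estimated edf sits very close to k - 1, that is usually a hint to increase k and refit.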
RE: [R] Gam() function in R
In reply to Simon Wood (Monday, December 06, 2004 5:54 AM):
Thank you very much. I am actually using gam() from mgcv. You answered my question about degrees of freedom. One more question: if I were to compare the results from gam() and glm(), which numbers are of the greatest interest? What if my response variables are binary? Thanks!
-Janice
Re: [R] Gam() function in R
At 10:48 2004-12-06 +0100, Yves Magliulo wrote:
This subject is very interesting for me. I'm using mgcv 0.8-9 with R version 1.7.1.
You're in need of an update.
I didn't know that there was another gam version in package gam (library(gam)).
This is the 'classic' GAM implementation by Hastie & Tibshirani, discussed at length in Hastie & Tibshirani (1990) and in the White book. In fact, other implementations of the GAM concept also exist. Take a look at the gss and assist packages; both are at CRAN, and the former is support software for Gu's `Smoothing Spline ANOVA Models' book. There are also the VGAM package, http://www.stat.auckland.ac.nz/~yee/VGAM/, and the SemiPar package, http://web.maths.unsw.edu.au/~wand/webspr/rsplus.html; the latter is support software for the `Semiparametric Regression' book by Ruppert, Wand and Carroll. And there are probably more out there...
Can someone tell me the basic differences between them? I looked for a help page on Google but I only found mgcv help pages.
Simon Wood (author of the mgcv package) has written a brief but useful summary: http://www.stats.gla.ac.uk/~simon/simon/mgcv_overview.html
HTH, Henric
Re: [R] Gam() function in R
Yves Magliulo wrote: This subject is very interesting for me. I'm using mgcv 0.8-9 with R version 1.7.1. I didn't know that there was another gam version in package gam (library(gam)). Can someone tell me the basic differences between them? I looked for a help page on Google but I only found mgcv help pages.
- I think you'd need to move to a newer version of R in order to use package gam, but that would also let you use a much more recent version of package mgcv.
- Package gam is based very closely on the GAM approach presented in Hastie and Tibshirani's Generalized Additive Models book. Estimation is by back-fitting, and model selection is based on step-wise regression methods that rely on approximate distributional results. A particular strength of this approach is that local regression smoothers (`lo()' terms) can be included in GAM models.
- gam in package mgcv represents GAMs using penalized regression splines. Estimation is by direct penalized likelihood maximization with integrated smoothness estimation via GCV or related criteria (there is also an alternative `gamm' function based on a mixed model approach). Strengths of this approach are that s() terms can be functions of more than one variable and that tensor product smooths are available via te() terms. These are useful when different degrees of smoothness are appropriate relative to different arguments of a smooth.
Here's an attempt at a summary of the differences:
Estimation: gam::gam is based on backfitting; mgcv::gam is based on direct penalized likelihood maximization (with smoothness estimation integrated).
Model selection: package gam is based on stepwise regression methods; mgcv::gam is based on integrated GCV estimation of the degree of smoothness.
Smooth terms: gam::gam can represent smooth terms using a very wide range of scatterplot smoothers including loess, which is built in. mgcv::gam is restricted to smoothers that can be represented using basis functions and an associated ``wiggliness'' penalty, but these include low rank thin plate spline smoothers and tensor product smoothers for smooths of more than one variable. Both packages provide interfaces for adding new classes of smoother.
Uncertainty estimation: since mgcv GAMs explicitly estimate coefficients for each smooth term, it is fairly straightforward to obtain a covariance matrix for the model coefficients, which makes further variance calculations easy. For example, standard errors are easily obtained for predictions made with new prediction data. The backfitting approach makes variance calculation more difficult (e.g. at present s.e.s are not available from gam::predict.gam with new data).
Interface: both packages are based on Trevor Hastie's Chapter 7 of Chambers and Hastie. Since Trevor H. wrote package gam, it's a closer implementation than package mgcv.
Basically, if you want integrated smoothness selection, an underlying parametric representation, or smooth interactions in your models, then mgcv is probably worth a try (but I would say that). If you want to use local regression smoothers and/or prefer the stepwise selection approach, then package gam is for you.
Simon
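Two of the mgcv points in this summary, tensor product smooths and standard errors for predictions at new data, can be sketched together. The data frame d and its columns are simulated for illustration; x and z are deliberately on very different scales, which is the situation te() is meant for.

```r
library(mgcv)

set.seed(7)
n <- 400
d <- data.frame(x = runif(n), z = runif(n, 0, 100))
d$y <- sin(2 * pi * d$x) * d$z / 100 + rnorm(n, sd = 0.2)

# Tensor product smooth: a separate smoothing penalty for each margin
fit <- gam(y ~ te(x, z), data = d)

# Predictions with standard errors at new covariate values
newd <- data.frame(x = c(0.25, 0.5, 0.75), z = c(10, 50, 90))
pred <- predict(fit, newdata = newd, se.fit = TRUE)
data.frame(newd, fit = as.numeric(pred$fit), se = as.numeric(pred$se.fit))

# Covariance matrix of all model coefficients, used for the standard errors
dim(vcov(fit))
```

Because the smooth is just a set of basis coefficients with a covariance matrix, other variance calculations (for example for differences between predictions) follow the same route.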
Re: [R] Gam() function in R
In reply to Simon Wood (Mon 06/12/2004, 12:41):
So the mgcv package is the one I need! Indeed, I want integrated smoothness selection and smooth interactions rather than stepwise selection. I have a lot of predictors, and I use gam to select those that are effective and exclude the others (using p-values). Thanks a lot for this precious information.
Re: [R] Gam() function in R
Yves Magliulo wrote:
So the mgcv package is the one I need! Indeed, I want integrated smoothness selection and smooth interactions rather than stepwise selection. I have a lot of predictors, and I use gam to select those that are effective and exclude the others (using p-values).
It is interesting that you use P-values but do not care that the strategy you use (variable selection as opposed to pre-specifying models or just using shrinkage) does not preserve type I error or confidence interval coverage probabilities in subsequent analyses with mgcv.
-- Frank E Harrell Jr, Professor and Chair, School of Medicine, Department of Biostatistics, Vanderbilt University
[R] Gam() function in R
Hi all, I'm a new user of the R gam() function. I am wondering how we decide on the smooth function to use. The general form is gam(y~s(x1,df=i)+s(x2,df=j)...); how do we decide on the degrees of freedom to use for each smoother, and whether we should apply a smoother to each attribute? Thanks!!
RE: [R] Gam() function in R
Unfortunately that's not really an R question. I recommend that you read up on the statistical methods underneath. One that I'd wholeheartedly recommend is Prof. Harrell's `Regression Modeling Strategies'. [BTW, there are now two implementations of gam() in R: one in `mgcv', which is fairly different from that in `gam'. I'm guessing you're referring to the one in `gam', but please remember to state which contributed package you're using, along with version of R and OS.]
Cheers, Andy
From: Janice Tse
Hi all, I'm a new user of the R gam() function. I am wondering how we decide on the smooth function to use. The general form is gam(y~s(x1,df=i)+s(x2,df=j)...); how do we decide on the degrees of freedom to use for each smoother, and whether we should apply a smoother to each attribute? Thanks!!
RE: [R] Gam() function in R
In reply to Andy Liaw (Sunday, December 05, 2004 11:34 PM):
Thanks for the email. I will check that out. However, when I was doing this: gam(y~s(x1)+s(x2,3), family=gaussian, data=mydata) it gives me the error:
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
What does it mean? Thanks
-Janice
[R] gam plots
When smooths fitted by the gam package are plotted, what are the units of the vertical axis? Is there a simple way to change these units to units of the dependent variable? Thanks for any suggestions!
Paul von Hippel, Department of Sociology / Initiative in Population Research, Ohio State University, 300 Bricker Hall, 190 N. Oval Mall, Columbus OH 43210, 614 688-3768, Office hours M-Th 3-5pm
[R] gam
Hi, I'm working with the mgcv package and especially gam. My example is:
test <- gam(B ~ s(pred1) + s(pred2))
plot(test, pages=1)
When plotting test, you can view pred1 vs s(pred1, edf[1]) and pred2 vs s(pred2, edf[2]). I would like to know if there is a way to access those terms (s(pred1) and s(pred2)). Does someone know how? The purpose is to get at the equations of the smooth terms in order to have the equation of my additive model.
best regards,
-- Yves Magliulo, Climatology research department, Climpact
Re: [R] gam
Yves Magliulo wrote: I'm working with the mgcv package and especially gam. My example is: test <- gam(B ~ s(pred1) + s(pred2)); plot(test, pages=1). When plotting test, you can view pred1 vs s(pred1, edf[1]) and pred2 vs s(pred2, edf[2]). I would like to know if there is a way to access those terms (s(pred1) and s(pred2)). Does someone know how? The purpose is to get at the equations of the smooth terms in order to have the equation of my additive model.
Depends a bit on what sort of access you want...
- You can use predict.gam to obtain the estimated value of each smooth term at any set of pred1 or pred2 values you supply (along with standard errors). The underlying equations are somewhat unwieldy, but are given in Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114. If you need to evaluate the smooth in another program or something, then you'd probably need to transform the t.p.r.s. parameters back to thin plate spline parameters and use the t.p.s. basis.
- You can also change the smoothing basis to one which is easier to write down. The "cr" basis (see ?s), for example, parameterizes a 1-d cubic spline in terms of the function values at the knots.
- Or you can add a smoothing basis of your own design and hence specify the equations of the smooth yourself: ?p.spline gives an example.
best, Simon
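The first two suggestions can be sketched as follows. The data are simulated and only the names B, pred1, pred2 and test are borrowed from the question; the sketch assumes a Gaussian model with identity link, so the terms add up directly to the fitted values.

```r
library(mgcv)

set.seed(8)
n <- 200
d <- data.frame(pred1 = runif(n), pred2 = runif(n))
d$B <- sin(2 * pi * d$pred1) + (d$pred2 - 0.5)^2 + rnorm(n, sd = 0.2)

# "cr" basis: each 1-d smooth is parameterized via values at the knots
test <- gam(B ~ s(pred1, bs = "cr") + s(pred2, bs = "cr"), data = d)

# Matrix with one column per term: the estimated s(pred1) and s(pred2)
tm <- predict(test, type = "terms")
head(tm)

# The additive model is intercept + s(pred1) + s(pred2)
eta <- rowSums(tm) + coef(test)[["(Intercept)"]]
all.equal(as.numeric(eta), as.numeric(predict(test)))

coef(test)   # the basis coefficients behind each smooth
```

Supplying newdata to predict() evaluates the same terms at any other pred1 and pred2 values, and se.fit = TRUE adds standard errors for each term.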
Re: [R] GAM question
On Thu, 3 Jun 2004, HILLARY ROBISON wrote:
> Warning in eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
- One way of specifying a logistic regression model is to supply the observed proportion of successes as the response variable (e.g. y) and the binomial n as the weights. The warning is complaining that the implied number of successes, y*n, is non-integer. Depending on exactly why you are weighting, you might want to use the quasibinomial family in place of binomial.
> Error: cannot allocate vector of size 60865 Kb
- The gam fit may get a bit memory intensive given the amount of data you have. ?gam gives various approaches for dealing with large datasets, but you might want to change the smoothing basis to one that is computationally cheaper than the default, e.g. replace s(x) terms by s(x, bs="cr").
Simon
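The two suggestions in this reply can be sketched together as follows. This is only an illustration on simulated stand-in data: the column names are borrowed from the original post, but the values, the reduced set of two predictors, and the object name fit are assumptions. In current mgcv, bam() is a further option for very large datasets; it postdates this thread.

```r
library(mgcv)

# Simulated stand-in for the poster's data frame `topox` (values are made up)
set.seed(2)
n <- 2000
topox <- data.frame(SLOPE10 = runif(n, 100, 500), TRI = runif(n, 0, 51))
topox$w0 <- runif(n, 0.18, 0.98)  # non-integer observation weights
p <- plogis(-1 + 0.004 * (topox$SLOPE10 - 300) + 0.03 * (topox$TRI - 25))
topox$PA <- rbinom(n, 1, p)       # 0/1 response

# quasibinomial avoids the "non-integer #successes" warning;
# bs = "cr" is cheaper to set up than the default thin plate basis
fit <- gam(PA ~ s(SLOPE10, bs = "cr") + s(TRI, bs = "cr"),
           family = quasibinomial, data = topox, weights = w0)
summary(fit)
```

Whether quasibinomial is appropriate depends on why the observations are weighted, as the reply notes; the sketch only shows the mechanics.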
[R] GAM question
I am trying to use R to do a weighted GAM with PA (presence/random) as the response variable (Y, which is a 0 or a 1) and ASPECT (values go from 0-3340), DEM (from 1500-3300), HLI (from 0-5566), PLAN (from -3 to 3), PROF (from -3 to 3), SLOPE (from 100-500) and TRI (from 0-51) as predictor variables (Xs). I need to weight each observation by its WO value (from 0.18 to 0.98).
I have specified the following models in R (see below), but I can't figure out what the errors R reports actually mean. One of the errors seems to tell me my dataset is too big (it's 109,729 rows by 16 columns) - is this possible? Given what I am trying to accomplish (a weighted, logistic GAM with 7 variables), am I specifying my model correctly?
I would like to attach my dataset (it's 2,064 KB as a WinZip file), but I don't know if it'll go through to the list given the HTML attachment constraints of the list... I even tried a weighted, logistic GLM with the seven variables to see if that would work, in which case perhaps it was a GAM problem. I also tried a logistic, weighted GAM with one variable to see if that would work. My next step while I wait to hear back from the list is to try a small dummy dataset to see whether a weighted, logistic GAM with seven variables will work at all, or whether I am specifying the model incorrectly. Would anyone be willing to have my dataset sent so they can check it out if that would help solve the issue?
Thank you! Hillary ([EMAIL PROTECTED])
# trial, all, weighted
topo8 <- gam(PA ~ s(SLOPE10) + s(ASPECT10) + s(GYADEMPLUS) + s(TRI) + s(HLI) + s(PLAN10) + s(PROF10), family=binomial, data=topox, weights = w0)
Warning in eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Error: cannot allocate vector of size 60865 Kb
topo9 <- glm(PA ~ SLOPE10 + ASPECT10 + GYADEMPLUS + TRI + HLI + PLAN10 + PROF10, family=binomial, data=topox, weights = w0)
Warning in eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
# trial, weighted, slope only
topo10 <- gam(PA ~ s(SLOPE10), family=binomial, data=topox, weights = w0)
Warning in eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
RE: [R] GAM with Locfit components
From mgcv 1.0 you can add your own smoothers for use with gam, but only if they can be represented using basis functions and a quadratic penalty: so P-splines are OK, but loess is not, for example.
Simon
> Loader's book is referring to the gam() function in S-plus (or S from Bell Labs), not the one in mgcv. They are very different things. I don't know if it's possible to implement local regression type smoothers (or something other than splines) in gam() in mgcv, but even if it's possible, the one in locfit won't work without quite a bit of work, I'd imagine.
> Andy
[R] GAM with Locfit components
Hi, I'm trying to combine the Locfit Package with the Mgcv package (to use Generalized Additive Models with Locfit components). I read the book written by Clive Loader where it's said that, for the S language, you just have to load the locfit package using the command: Library(locfit, first=T) in order to use locfit components in an additive model. But I can't. I guess the C-command differs from the S-command. Thanks in advance for your help.
Regards, Vivian
RE: [R] GAM with Locfit components
Loader's book is referring to the gam() function in S-plus (or S from Bell Labs), not the one in mgcv. They are very different things. I don't know if it's possible to implement local regression type smoothers (or something other than splines) in gam() in mgcv, but even if it's possible, the one in locfit won't work without quite a bit of work, I'd imagine.
Andy
> From: Vivian Viallon
> Hi, I'm trying to combine the Locfit Package with the Mgcv package (to use Generalized Additive Models with Locfit components). I read the book written by Clive Loader where it's said that, for the S language, you just have to load the locfit package using the command: Library(locfit, first=T) in order to use locfit components in an additive model. But I can't. I guess the C-command differs from the S-command. Thanks in advance for your help.
> Regards, Vivian
Re: [R]gam and concurvity
> In the paper "Avoiding the effects of concurvity in GAM's .." by Figueiras et al. (2003) it is mentioned that in GLM collinearity is taken into account in the calculation of the s.e. but not in GAM (-> results in confidence intervals too narrow, p-values understated, GAM S-Plus version). I haven't found any references to GAM and concurvity or collinearity on the R page, and I wonder if the R version of gam differs on this point.
- The penalized regression spline representation means that it's easy to calculate the `correct' s.e.'s, and this is what is done. The covariance matrix used is based on a Bayesian model of smoothing, generalized from Silverman (1985), JRSSB (and less closely, Wahba, 1983, JRSSB), so the s.e.'s are generally a little larger than you'd get if you just pretended that the GAM was an un-penalized GLM (this widening generally improves CI performance).
As Thomas Lumley pointed out, the s.e.'s don't take into account smoothing parameter estimation uncertainty. In simulation studies this uncertainty seems to have very little effect on the realized coverage probabilities of confidence intervals that are in some sense `whole model' intervals, but the performance of CIs for component functions of the GAM can be quite a long way from nominal. There's a simple `not-very-computer-intensive' fix for this which removes the conditioning on the smoothing parameters and greatly improves component-wise coverage probabilities; implementation is on my `to-do' list (might wait to see what the referees say though!)
Simon
ps. mgcv 0.9 out now! (changes list linked to my www page)
_ Simon Wood [EMAIL PROTECTED] www.stats.gla.ac.uk/~simon/ Department of Statistics, University of Glasgow, Glasgow, G12 8QQ Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814
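For readers who want to see these quantities in practice, here is a hedged sketch against a current mgcv. The simulated data and the object names d, b, cc and p are assumptions for illustration. concurvity() is a function in later mgcv releases, not something this 2003 thread refers to; the standard errors returned by predict.gam are the Bayesian ones described above.

```r
library(mgcv)

# Simulated data in which x2 is nearly a smooth function of x1 (strong concurvity)
set.seed(7)
n <- 300
d <- data.frame(x1 = runif(n))
d$x2 <- d$x1^2 + rnorm(n, sd = 0.05)
d$y <- sin(2 * pi * d$x1) + d$x2 + rnorm(n, sd = 0.3)

b <- gam(y ~ s(x1) + s(x2), data = d)

# Concurvity measures per term: values near 1 indicate that a term
# can be closely approximated by the other terms in the model
cc <- concurvity(b, full = TRUE)
cc

# Term-wise estimates with Bayesian standard errors
p <- predict(b, type = "terms", se.fit = TRUE)
head(p$se.fit)
```

High concurvity here should show up as wide intervals on the individual smooths rather than as understated standard errors, in line with the reply.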
[R] gam and concurvity
Hello,
in the paper "Avoiding the effects of concurvity in GAM's .." by Figueiras et al. (2003) it is mentioned that in GLM collinearity is taken into account in the calculation of the s.e. but not in GAM (-> results in confidence intervals too narrow, p-values understated, GAM S-Plus version). I haven't found any references to GAM and concurvity or collinearity on the R page, and I wonder if the R version of gam differs on this point.
Another question would be what the best manual way of doing variable selection is, given the lack of a stepwise procedure for GAM. Start with the first variables, add var1; if GCV improves (what would be considered an improvement?) or the p-value is significant, keep it, otherwise drop it; then add var2, and so on?
thanks in advance, cheers Martin
R: [R] gam and concurvity
As someone (Simon Wood, for instance) could explain much better, and as is stressed in the help files of the mgcv package (the package including the gam() function), gam in R is not a clone of gam in S+. S+ uses backfitting while R uses penalized splines (see the references in the gam() help page). The approaches are quite different and can lead to substantial differences in particular cases, for instance with concurvity.
best, vito
PS Can you point out the exact reference for Figueiras et al. (2003)?
Re: R: [R] gam and concurvity
On Tuesday 16 September 2003 16:28, Vito Muggeo wrote:
> PS Can you point out the exact reference for Figueiras et al. (2003)?
I haven't found a journal name, but the PDF can be downloaded from http://isi-eh.usc.es/trabajos/110_70_fullpaper.pdf
[R] gam step in grasper
Hello,
some weeks ago I asked if there is an equivalent of step(lm) for gam, and Simon Wood informed me that there isn't but that it will eventually be done. Now I have found grasp.step.gam {grasper} and wonder if it would be possible to rewrite/extract this command for use outside GRASP-R (I don't know what this is called or how it works - I am not experienced in programming). Perhaps this way is less work intensive, but of course this command doesn't use GCV/UBRE scores but anova. Is there an argument against using this function inside GRASP-R even though its purpose is not spatial?
thanks for your help, cheers Martin
[R] gam and step
Hello,
I am looking for a step() function for GAMs. In the book Statistical Computing by Crawley, the removal of predictors is done by hand:
model <- gam(y ~ s(x1) + s(x2) + s(x3))
summary(model)
model2 <- gam(y ~ s(x2) + s(x3)) # removal of the non-significant variable
# then compare the two models to see whether a significant change occurs
anova(model, model2, test="F")
Isn't there a way to drop and add variables automatically until the best model is reached, like in step(lm(...))? Or as in grasp.step.gam() - but that didn't work when I tried it outside GRASP-R.
thanks for your help, cheers Martin
Re: [R] gam and step
There isn't a step.gam() in mgcv yet. It is one of the things that I'd like to do eventually, although it will probably be based on comparison of GCV/UBRE scores rather than H_0 testing.
best, Simon
> I am looking for a step() function for GAMs. In the book Statistical Computing by Crawley, the removal of predictors is done by hand:
> model <- gam(y ~ s(x1) + s(x2) + s(x3))
> summary(model)
> model2 <- gam(y ~ s(x2) + s(x3)) # removal of the non-significant variable
> # then compare the two models to see whether a significant change occurs
> anova(model, model2, test="F")
> Isn't there a way to drop and add variables automatically until the best model is reached, like in step(lm(...))? Or as in grasp.step.gam() - but that didn't work when I tried it outside GRASP-R.
_ Simon Wood [EMAIL PROTECTED] www.stats.gla.ac.uk/~simon/ Department of Statistics, University of Glasgow, Glasgow, G12 8QQ Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814
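The by-hand comparison discussed in this thread, using GCV scores as the reply suggests, might look like the sketch below with a current mgcv. The simulated data and the names d, scores and model_sel are illustrative assumptions. The select = TRUE argument is a later mgcv feature (penalties that can shrink a term out of the model) and is shown only as one possible alternative to stepwise selection, not as what the thread proposes.

```r
library(mgcv)

# Simulated data in which x1 has no effect on y
set.seed(3)
n <- 300
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- sin(2 * pi * d$x2) + 2 * d$x3 + rnorm(n, sd = 0.3)

model  <- gam(y ~ s(x1) + s(x2) + s(x3), data = d)
model2 <- gam(y ~ s(x2) + s(x3), data = d)   # x1 dropped by hand

# Compare GCV scores: lower is better
scores <- c(full = unname(model$gcv.ubre), reduced = unname(model2$gcv.ubre))
scores

# The hypothesis-testing comparison from the original post
anova(model, model2, test = "F")

# Alternative in current mgcv: let the penalties shrink terms towards zero
model_sel <- gam(y ~ s(x1) + s(x2) + s(x3), data = d, select = TRUE)
summary(model_sel)$edf   # effective degrees of freedom per smooth term
```

A term whose effective degrees of freedom end up close to zero under select = TRUE is a candidate for removal.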
Re: [R] gam()
Dear Henric,
At 05:01 PM 6/4/2003 +0200, Henric Nilsson wrote:
> 2. John Fox has modified anova.glm() into anova.gam() (http://www.socsci.mcmaster.ca/jfox/Books/Companion/nonparametric-regression.txt) for comparison of two or more fitted models based on the difference between residual deviances. Indiscriminate use of such a procedure shouldn't perhaps be encouraged, but I think that many users expect it to be part of the mgcv package since this model selection idea is covered in several texts and also implemented in S-plus (and may be OK for truly nested models). And even if it's been decided that this functionality is not wanted in mgcv, perhaps another function comparing several models by the GCV/UBRE score and other useful statistics can be implemented?
The problem with comparing two gams in R fit with mgcv is that, by default, the degree of smoothing for terms is selected independently for each model. Simon Wood previously posted a message to the R-help list discussing this issue and making some suggestions. The issue doesn't arise in the same way with models fit by the gam function in S-PLUS because the degree of smoothing there is instead selected by the user. I should update my appendix on nonparametric regression to discuss this question -- the current presentation isn't really adequate.
John
- John Fox Department of Sociology McMaster University Hamilton, Ontario, Canada L8S 4M4 email: [EMAIL PROTECTED] phone: 905-525-9140x23604 web: www.socsci.mcmaster.ca/jfox
Re: [R] gam()
At 11:12 2003-06-05 -0400, John Fox wrote:
> The problem with comparing two gams in R fit with mgcv is that, by default, the degree of smoothing for terms is selected independently for each model. Simon Wood previously posted a message to the R-help list discussing this issue and making some suggestions. The issue doesn't arise in the same way with models fit by the gam function in S-PLUS because the degree of smoothing there is instead selected by the user. I should update my appendix on nonparametric regression to discuss this question -- the current presentation isn't really adequate.
I'm aware of this difference between gam() in R and S-Plus, which is why I proposed a function listing relevant statistics for every fitted model so the analyst can use these to judge, without hypothesis testing, which model to prefer. Still, for models where the analyst has made sure that the models are truly nested, the use of your anova.gam can be justified by the simulation results reported by Hastie & Tibshirani (1990, p. 155); maybe I just want it for purely nostalgic reasons?! ;-)
Admittedly, I like the more attractive way of choosing the degrees of freedom that mgcv provides. However, I must admit that since most textbooks covering GAMs are more or less S-plus based, and the possibilities that mgcv offers are so vast, I'm feeling a bit lost at times; it's great to have new, more flexible tools, but on the downside that means more choices to be made. So, has anyone got any essential literature tips? I've read (and re-read, and read again) Simon Wood's articles in JRSS, R News and Ecological Modelling, and, of course, the mgcv manual.
//Henric
--- Henric Nilsson, Statistician Statisticon AB, Östra Ågatan 31, SE-753 22 UPPSALA Phone (Direct): +46 (0)18 18 22 37 Mobile: +46 (0)70 211 68 36 Fax: +46 (0)18 18 22 33 http://www.statisticon.se
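One way to obtain the "truly nested" situation discussed in these two messages with a current mgcv is to fix the degrees of freedom of every smooth, so that no smoothing parameter is re-estimated between models. The sketch below assumes simulated data and uses the s() arguments k and fx; the object names d, m_small, m_big and tab are illustrative, and this is not presented as the approach either author had in mind.

```r
library(mgcv)

set.seed(8)
n <- 300
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- sin(2 * pi * d$x1) + rnorm(n, sd = 0.3)   # x2 has no effect

# fx = TRUE gives unpenalized regression splines with k - 1 degrees of
# freedom each, so the smaller model is nested in the larger one
m_small <- gam(y ~ s(x1, k = 6, fx = TRUE), data = d)
m_big   <- gam(y ~ s(x1, k = 6, fx = TRUE) + s(x2, k = 6, fx = TRUE), data = d)

# Deviance-based comparison of the two nested fits
tab <- anova(m_small, m_big, test = "F")
tab
```

With penalized smooths (the default, fx = FALSE) the effective degrees of freedom differ between fits, which is the difficulty John Fox describes.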
[R] gam()
Dear all,
I've now spent a couple of days trying to learn R and, in particular, the gam() function, and I now have a few questions and reflections regarding the latter. Maybe these things are implemented in some way that I'm not yet aware of, or have perhaps been decided by the R community to not be what's wanted. Of course, my lack of complete theoretical understanding of what mgcv really does may also show...
1. When fitting models where a factor interacts with a smooth term, say y~a+s(x,by=a.1)+s(x,by=a.2), I noticed that the rug in the plot of each of the smooth terms is identical. I expected the rug in the plot of e.g. s(x,by=a.1) to only include those x for which a.1=1, to be able to judge if observations of x where a.1=1 are sparse in any region. Also, it would be really nice if the by=... was included in the output of plot.gam() and in the "Approximate significance of smooth terms:" part of summary.gam().
2. John Fox has modified anova.glm() into anova.gam() (http://www.socsci.mcmaster.ca/jfox/Books/Companion/nonparametric-regression.txt) for comparison of two or more fitted models based on the difference between residual deviances. Indiscriminate use of such a procedure shouldn't perhaps be encouraged, but I think that many users expect it to be part of the mgcv package since this model selection idea is covered in several texts and also implemented in S-plus (and may be OK for truly nested models). And even if it's been decided that this functionality is not wanted in mgcv, perhaps another function comparing several models by the GCV/UBRE score and other useful statistics can be implemented?
3. Some authors [1, 2] suggest pointwise estimation of odds ratios and corresponding confidence intervals based on the smooth terms in a GAM. Maybe something for mgcv?
4. For each purely parametric covariate a t-test is produced; I'd like to have something like S-plus' anova.gam() to get an overall test. (Perhaps with the addition of a choice between Type I and Type III tests, but I guess that may be controversial.) Is it possible?
[1] Figueiras, A. & Cadarso-Suárez, C. (2001) Application of Nonparametric Models for Calculating Odds Ratios and Their Confidence Intervals for Continuous Exposures, American Journal of Epidemiology, 154(3), 264-275.
[2] Saez, M., Cadarso-Suárez, C. & Figueiras, A. (2003) np.OR: an S-Plus function for pointwise nonparametric estimation of odds-ratios of continuous predictors, Computer Methods and Programs in Biomedicine, 71, 175-179.
//Henric
--- Henric Nilsson, Statistician Statisticon AB, Östra Ågatan 31, SE-753 22 UPPSALA Phone (Direct): +46 (0)18 18 22 37 Mobile: +46 (0)70 211 68 36 Fax: +46 (0)18 18 22 33 http://www.statisticon.se
[R] gam questions
Dear all,
I'm a fairly new R user with two questions regarding gam:
1. The prediction example on p. 38 in the mgcv manual. When I try to get predictions based on the original data set by leaving out the 'newdata' argument (newd in the example), I get an error message:
Warning message: the condition has length > 1 and only the first element will be used in: if (object$dim == 0) m <- 0 else m <- length(object$sp)
I suspected that it had something to do with the data not being attached, but when fitting a gam with an attached data set I got the same error. Why?
2. I've fitted a glm y~a+x+a:x, where a is a 3 level factor and x is a continuous covariate. If I want to fit a similar gam model, is it correct to fit y~a+s(x)+s(x,by=a.1)+s(x,by=a.2)+s(x,by=a.3), where a.1--a.3 are dummy variables representing each level of the factor? Or is the s(x) term redundant?
Any hints are greatly appreciated.
Best wishes, Henric
--- Henric Nilsson, Statistician Statisticon AB, Östra Ågatan 31, SE-753 22 UPPSALA Phone (Direct): +46 (0)18 18 22 37 Mobile: +46 (0)70 211 68 36 Fax: +46 (0)18 18 22 33 http://www.statisticon.se
Re: [R] gam questions
On Tue, 3 Jun 2003, Henric Nilsson wrote:
> I'm a fairly new R user having two questions regarding gam:
> 1. The prediction example on p. 38 in the mgcv manual.
That's not very helpful: pagination of manuals depends on the paper size, for example.
> In order to get predictions based on the original data set, by leaving out the 'newdata' argument (newd in the example), I get an error message
> Warning message: the condition has length > 1 and only the first element will be used in: if (object$dim == 0) m <- 0 else m <- length(object$sp)
> I suspected that it had something to do with the data not being attached, but when fitting a gam with an attached data set I got the same error. Why?
That is a warning not an error! It was a bug in mgcv a while back. Do you have the latest version of mgcv (and of R, for that matter)? I am not seeing this any more.
-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] gam questions
> 2. I've fitted a glm y~a+x+a:x, where a is a 3 level factor and x is a continuous covariate. If I want to fit a similar gam model, is it correct to fit y~a+s(x)+s(x,by=a.1)+s(x,by=a.2)+s(x,by=a.3), where a.1--a.3 are dummy variables representing each level of the factor? Or is the s(x) term redundant?
- Yes, the s(x) term is redundant, and in the current mgcv version it will likely cause spectacular nonsense as a result of lack of identifiability in the smooth part of the model (mgcv 0.9 will cope with this when released, but it's still much better to use an identifiable smooth model).
Simon
_ Simon Wood [EMAIL PROTECTED] www.stats.gla.ac.uk/~simon/ Department of Statistics, University of Glasgow, Glasgow, G12 8QQ Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814
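An identifiable version of the model in the question, without the redundant s(x), can be sketched as follows with a current mgcv. Passing the factor itself as the by variable is a feature of later mgcv versions than the one discussed here, and it stands in for the three dummy variables a.1--a.3; the simulated data and the names d and fit are assumptions.

```r
library(mgcv)

# Simulated data: a different smooth effect of x within each level of a
set.seed(4)
n <- 300
d <- data.frame(x = runif(n),
                a = factor(sample(c("l1", "l2", "l3"), n, replace = TRUE)))
f <- ifelse(d$a == "l1", sin(2 * pi * d$x),
     ifelse(d$a == "l2", 2 * d$x, 0))
d$y <- f + as.numeric(d$a) + rnorm(n, sd = 0.3)

# One centred smooth of x per level of a; the factor main effect
# carries the level means, and there is no separate s(x) term
fit <- gam(y ~ a + s(x, by = a), data = d)
summary(fit)
plot(fit, pages = 1)   # one panel per level
```

Because each by-level smooth is centred, the main effect a needs to stay in the formula.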
Re: [R] gam questions
At 10:07 2003-06-03 +0100, Prof Brian Ripley wrote:
>> 1. The prediction example on p. 38 in the mgcv manual.
> That's not very helpful: pagination of manuals depends on the paper size, for example.
It's the gam.predict example. Instead of pred <- predict.gam(b, newd) I tried pred <- predict.gam(b).
> Do you have the latest version of mgcv (and of R, for that matter)?
I'm running R 1.7.0 under Windows 2000 with mgcv 0.8-8.
Best, Henric
--- Henric Nilsson, Statistician Statisticon AB, Östra Ågatan 31, SE-753 22 UPPSALA Phone (Direct): +46 (0)18 18 22 37 Mobile: +46 (0)70 211 68 36 Fax: +46 (0)18 18 22 33 http://www.statisticon.se
Re: [R] gam questions
> It's the gam.predict example. Instead of pred <- predict.gam(b, newd) I tried pred <- predict.gam(b).
- Ok, thanks - this is a bug I missed, I'll fix it. The results should be unaffected, though.
Simon
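For reference, the call in question behaves as follows in a current mgcv. The simulated data and the names d, b and pred are assumptions; this only illustrates that omitting newdata returns predictions at the original observations.

```r
library(mgcv)

set.seed(5)
d <- data.frame(x = runif(100))
d$y <- sin(2 * pi * d$x) + rnorm(100, sd = 0.2)

b <- gam(y ~ s(x), data = d)

# No newdata supplied: predictions are made at the data used for fitting
pred <- predict(b)

# For a Gaussian model with identity link these equal the fitted values
all.equal(as.numeric(pred), as.numeric(fitted(b)))
```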
[R] GAM with Thin plate splines
Hello,
I'm a student at the University of Klagenfurt / Austria and I need some help! I have to predict 24 daily load values. For this I have a dataset with the following columns: 24 past daily load values and 6 past daily temperature values. My goal is to find a model (GAM with thin plate splines) in R. I found the function gam in the R library mgcv, but it only fits one-dimensional splines. So my question is whether it is possible to modify this function (and if so, how), or whether there is another function that gives me a solution. Please send me a mail if you can help me!
Thanks
Re: [R] GAM with Thin plate splines
> My goal is to find a model (GAM with thin plate splines) in R. I found the function gam in the R library mgcv, but it only fits one-dimensional splines.
- Unless you have an exceedingly ancient version of mgcv (0.6), it *does* allow spline smooths of more than one variable. ?gam contains a couple of examples, as does ?gam.side.conditions.
Simon
Re: [R] GAM with Thin plate splines
The default basis for smooth terms in mgcv is a truncated thin plate spline basis, and has been since the later part of 2001. Including terms like `s(x,z)', `s(x0,x1,x2)' in the model formula is the way to include such terms (see help files, and examples therein).
best, Simon
> Last time I worked with it (last year) there was no tps. And rkales is not finding it also.
> EJ
> On Thu, 2003-01-09 at 10:55, Peter Dalgaard BSA wrote:
>> Ernesto Jardim [EMAIL PROTECTED] writes:
>>> Hi, in package gss there are functions for tps, see ssanova.
>> gam in mgcv fits thin plate splines, where was the problem???
> -- Ernesto Jardim [EMAIL PROTECTED] Marine Biologist Research Institute for Agriculture and Fisheries Lisboa, Portugal Tel: +351 213 027 000 Fax: +351 213 015 948
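A short sketch of a two-dimensional thin plate regression spline term of the kind described in this reply, run against a current mgcv. The simulated data and the names d and fit are assumptions; the variables x and z simply stand in for whichever two covariates are to be smoothed jointly.

```r
library(mgcv)

set.seed(6)
n <- 400
d <- data.frame(x = runif(n), z = runif(n))
d$y <- sin(2 * pi * d$x) * cos(2 * pi * d$z) + rnorm(n, sd = 0.2)

# s(x, z) with the default basis is an isotropic thin plate regression spline
fit <- gam(y ~ s(x, z), data = d)

fit$smooth[[1]]$dim      # number of covariates in the smooth
class(fit$smooth[[1]])   # class of the smooth object

# Contour plot of the estimated surface
plot(fit, scheme = 2)
```

Thin plate smooths are isotropic, so they suit covariates measured on comparable scales; for covariates in different units (such as load and temperature) a tensor product smooth, te(x, z) in current mgcv, may be the better choice.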