[R] Summer student internship placement at University of York / YCCSA / SEI (paid)
Dear R-lings,

I did not know which list to post to: this is a studentship rather than a job, so it did not really fit the r-sig-jobs list, but it is about developing an extension package interfaced with R. I hope I have not upset anyone; if so, apologies.

The Centre for Complex Systems Analysis at the University of York (YCCSA), UK, in collaboration with the Stockholm Environment Institute, is looking for a highly motivated student in Computer Science, Applied Mathematics, Applied Statistics or a related field for a 10-week paid student internship over summer 2011, starting in July, to collaborate in the development of an R package.

The student will participate in research projects to develop prototype toolkits for the statistical prediction of diversity and dissimilarity and for the generation of spatial landscapes, with applications in the biological and environmental sciences.

We require excellent development skills, experience in CUDA/OpenCL, and a strong foundation in Computing, Statistics/Applied Mathematics and Computer Graphics. We need an excellent problem solver, able to innovate, find solutions and work independently.

For further information on the project, please contact ct...@york.ac.uk or go to http://www.york.ac.u...2011/201107.pdf

For further information on the studentship programme, please look at http://www.york.ac.u...olarships.html.

Please send your application no later than 13 May to scholarsh...@yccsa.org as a single PDF document including:

1. Your CV (max 2 pages)
2. A brief personal statement (max 1 page) including:
   * which project(s) you are interested in (as many as you like, in order of preference)
   * your reasons for applying
   * your academic interests
   * your future aspirations
3. A full written academic reference (not just contact details; max 1 page). Your application will not be accepted without this reference.
Best,

--
Corrado Topi
Stockholm Environment Institute
Mob: +44 (0) 7769 601784
Tel: +44 (0) 1904 322893
Skype: corrado-eeos
Website: sei-international.org
University of York, York YO10 5DD, UK
Fax: +44 (0) 1904 322898
EMAIL DISCLAIMER: http://www.york.ac.uk/docs/disclaimer/email.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error "singular gradient matrix at initial parameter estimates" in nls
Dear JN, Bert,

1) It is not a perfect fit. I do not think I have ever said that. I said that an external algorithm fits the model without any problems: with ~500,000 data points and 19 parameters (the k_i in the original equation), it fits the model in less than 1 second. The data are not artificial. The variables (the p_i in the original model) are independent. The solution is unique, and the speed of convergence is practically independent of the choice of starting values (at least for a reasonable choice). The resulting residuals are approximately normally distributed with mean 0 and sd ~4.23.

2) I agree with Bert's comment on over-parametrisation, but again, the model is not over-parametrised, and it is identifiable (partly answered already in (1)).

Regards

Prof. John C Nash wrote:
> If you have a perfect fit, you have zero residuals. But in the nls manual page we have:
>
> Warning: *Do not use ‘nls’ on artificial "zero-residual" data.*
>
> So this is a case of complaining that your diesel car is broken because you ignored the "Diesel fuel only" sign on the filler cap and put in gasoline.
>
> However I've not been happy with this choice in the code of nls -- it's been there a long time -- and my own codes from 1974 onwards have always handled zero residual cases. I do believe that the code could at least give a better diagnostic message.
>
> Zero residuals -- perfect fits -- arise when one is interested more or less in an interpolating function rather than doing statistics, and I can understand the reluctance of statisticians to countenance such a use of nls.
>
> And Bert's comment on overparametrization is almost certainly valid also.
>
> JN

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
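Because "singular gradient matrix" refers to the matrix of derivatives of the model with respect to the parameters, one way to probe the identifiability question above is to build that matrix yourself at the start values and check its rank. A minimal base-R sketch, assuming the model y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn)) and a design matrix X = cbind(1, p1, ..., pn); the function name model_jacobian and the simulated predictors are hypothetical. For this model the derivative with respect to k_j is p_j * exp(-eta), so the gradient matrix has exactly the same rank as X: exactly collinear predictors make it singular whatever start values are used. This only detects (near-)exact collinearity; it is not a full identifiability proof.

```r
# Jacobian of f(k) = 1 - exp(-(X %*% k)) with respect to k:
# d f_i / d k_j = X[i, j] * exp(-eta_i), where eta = X %*% k
model_jacobian <- function(k, X) {
  eta <- drop(X %*% k)
  X * exp(-eta)  # exp(-eta) recycles down the columns, scaling row i by exp(-eta_i)
}

set.seed(42)
n  <- 200
p1 <- runif(n)
p2 <- runif(n)
p3 <- p1 + p2  # hypothetical predictor: an exact linear combination of p1 and p2

X_ok  <- cbind(1, p1, p2)
X_bad <- cbind(1, p1, p2, p3)

J_ok  <- model_jacobian(rep(0.5, ncol(X_ok)),  X_ok)
J_bad <- model_jacobian(rep(0.5, ncol(X_bad)), X_bad)

# Numerical rank from a pivoted QR decomposition
qr(J_ok)$rank   # 3: full column rank
qr(J_bad)$rank  # 3, but J_bad has 4 columns, so it is singular
```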
Re: [R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Dear Ruben,

I am afraid not: the section title is a bit of a giveaway, "Proportion Data and Binomial Errors". The sentence reads: "... are dealt with by using a generalised linear model with a binomial error structure", with the example:

    glm(y~x,family=binomial)

You can check on pages 514/515.

Rubén Roa wrote:
> -----Original message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Corrado
> Sent: Tuesday, 30 March 2010 16:52
> To: r-help@r-project.org
> Subject: [R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
>
> Dear friends, I am testing glm as on pages 514/515 of THE R BOOK by M. Crawley, that is, on proportion data. [...]
Re: [R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Dear David,

David Winsemius wrote:
> A) It is not an error, only a warning. Wouldn't it seem reasonable to issue such a warning if you have data that violates the distributional assumptions?

I am not questioning the approach. I am only trying to understand why a (rather expensive) source of documentation and the behaviour of a function are not aligned.

> B) You did not include any of the data.

Data attached as an R object.

> C) Wouldn't this be more appropriate to the author of the book if this is "exactly what was suggested" there?

It will definitely be appropriate, but only once I am certain I am not doing anything wrong.

Regards
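The warning can be reproduced, and avoided, with simulated data. A minimal sketch with made-up numbers (the variables dose, n_trials and dead are hypothetical): a binomial GLM expects either a two-column response cbind(successes, failures) or a proportion together with weights equal to the number of trials. Given a proportion without weights, glm() treats each row as a single trial, so the implied number of successes is not an integer and it warns.

```r
set.seed(1)
dose     <- seq(0, 5, length.out = 20)   # hypothetical covariate
n_trials <- rep(30, 20)                  # hypothetical number of trials per row
dead     <- rbinom(20, size = n_trials, prob = plogis(-2 + 0.8 * dose))
prop     <- dead / n_trials

# 1. Two-column response: cbind(successes, failures)
fit_cbind <- glm(cbind(dead, n_trials - dead) ~ dose, family = binomial)

# 2. Proportion response with the number of trials as prior weights
fit_wts <- glm(prop ~ dose, family = binomial, weights = n_trials)

# Both forms describe the same data, so they give the same fit
all.equal(coef(fit_cbind), coef(fit_wts))  # TRUE

# 3. Proportion response without weights: each row is treated as a single
#    trial, so the implied number of successes (prop * 1) is not an integer
#    and glm() warns "non-integer #successes in a binomial glm!"
fit_nowts <- glm(prop ~ dose, family = binomial)
```

In other words, the book's glm(y~x,family=binomial) call fits without complaint when y is a two-column matrix of counts; with a bare proportion it still runs, but the warning signals that the number of trials is missing.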
Re: [R] Error "singular gradient matrix at initial parameter
Yes, of course. The problem persists.

Gabor Grothendieck wrote:
> Sorry, it's algorithm="brute-force".
>
> On Tue, Mar 30, 2010 at 10:29 AM, Corrado wrote:
>> Hi Gabor, same problem even using nls2 with method=brute-force to calculate the initial parameters.
>>
>> Best,
>>
>> Gabor Grothendieck wrote:
>>> You could try method="brute-force" in the nls2 package to find starting values.
>>>
>>> On Tue, Mar 30, 2010 at 7:03 AM, Corrado wrote:
>>>> I am using nls to fit a non-linear function to some data. [...] Alternatively, what do you suggest I should do? Shall I abandon nls in favour of optim?
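For readers without nls2, the general idea behind a brute-force search for starting values can be sketched in base R: evaluate the residual sum of squares on a grid of candidate parameter vectors and hand the best one to nls(). This is a hypothetical helper (grid_start) on simulated data with two predictors, not nls2's own code. The grid grows exponentially with the number of parameters, so with 19 parameters it would only be practical on a very coarse grid or a random subset of points.

```r
set.seed(123)
n  <- 500
p1 <- runif(n)
p2 <- runif(n)
y  <- 1 - exp(-(0.3 + 1.2 * p1 + 0.6 * p2)) + rnorm(n, sd = 0.02)  # simulated data

# Residual sum of squares of the model for k = c(k0, k1, k2)
rss <- function(k) sum((y - (1 - exp(-(k[1] + k[2] * p1 + k[3] * p2))))^2)

# Hypothetical helper: evaluate fn on every row of the grid, return the best row
grid_start <- function(grid, fn) {
  grid   <- as.matrix(grid)
  values <- apply(grid, 1, fn)
  grid[which.min(values), ]  # named vector: column names become parameter names
}

grid <- expand.grid(k0 = seq(0, 2, by = 0.25),
                    k1 = seq(0, 2, by = 0.25),
                    k2 = seq(0, 2, by = 0.25))
start <- grid_start(grid, rss)

# Refine the best grid point with the bounded "port" algorithm (all k >= 0)
fit <- nls(y ~ 1 - exp(-(k0 + k1 * p1 + k2 * p2)),
           start = as.list(start), algorithm = "port", lower = 0)
coef(fit)
```

Good starting values cannot cure a gradient matrix that is singular for every parameter value (for example with collinear predictors); they only help when the problem is a poor starting point.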
Re: [R] Error "singular gradient matrix at initial parameter
Hi Gabor,

Same problem, even using nls2 with method=brute-force to calculate the initial parameters.

Best,

Gabor Grothendieck wrote:
> You could try method="brute-force" in the nls2 package to find starting values.
>
> On Tue, Mar 30, 2010 at 7:03 AM, Corrado wrote:
>> I am using nls to fit a non-linear function to some data. [...] Alternatively, what do you suggest I should do? Shall I abandon nls in favour of optim?
[R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Dear friends,

I am testing glm as on pages 514/515 of THE R BOOK by M. Crawley, that is, on proportion data. I use

    glm(y~x1+...,family=binomial)

where y is a proportion in (0,1) and x is a real number. I get the error:

    In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

But that is exactly what was suggested in the book, where there is no mention of a similar warning. Where am I going wrong? Here is the output:

    > glm(response.prepared~x,data=,family=binomial)

    Call:  glm(formula = response.prepared ~ x, family = binomial, data = )

    Coefficients:
    (Intercept)            x
        -0.3603       0.4480

    Degrees of Freedom: 510554 Total (i.e. Null);  510553 Residual
    Null Deviance:      24420
    Residual Deviance: 23240        AIC: 700700
    Warning message:
    In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

Regards
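When the response is a proportion whose underlying counts are not available, one common alternative is the quasibinomial family: it uses the same logit link and variance function, but estimates a dispersion parameter instead of fixing it at 1, and it does not issue the integer-count warning. A minimal sketch on simulated data (x and y are made up); whether a quasi-likelihood model suits this particular response is a modelling decision that the code does not settle.

```r
set.seed(2)
x <- runif(300, -2, 2)
y <- plogis(-0.4 + 0.5 * x + rnorm(300, sd = 0.3))  # simulated proportions in (0, 1)

fit_bin   <- suppressWarnings(glm(y ~ x, family = binomial))  # warns as in the post
fit_quasi <- glm(y ~ x, family = quasibinomial)               # no integer-count warning

# Same point estimates; only the dispersion (and so the standard errors) differs
all.equal(coef(fit_bin), coef(fit_quasi))  # TRUE
summary(fit_quasi)$dispersion              # estimated, rather than fixed at 1
```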
[R] Error "singular gradient matrix at initial parameter estimates" in nls
I am using nls to fit a non-linear function to some data. The non-linear function is:

    y = 1 - exp(-(k0 + k1*p1 + ... + kn*pn))

I have chosen the "port" algorithm, with a lower bound of 0 for all of the k_i parameters, and I have tried many start values for the k_i (including generating them at random). If I fit the non-linear function to the same data using an external algorithm, it fits perfectly and finds the parameters. As soon as I come to my R installation (2.10.1 on Kubuntu Linux 9.10, 64-bit), I keep getting the error:

    Error in nlsModel(formula, mf, start, wts, upper) :
      singular gradient matrix at initial parameter estimates

I have read all the previous postings and the documentation, but to no avail: the error is there to stay. I am sure the problem is with nls, because the external fitting algorithm fits it perfectly in less than a second. Also, if my n is 4, then nls works perfectly (but that excludes all of k5 ... kn).

Can anyone help me with suggestions? Thanks in advance. Alternatively, what do you suggest I should do? Shall I abandon nls in favour of optim?

Regards
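Because 1 - y = exp(-(k0 + k1*p1 + ... + kn*pn)), taking -log(1 - y) makes the model linear in the k_i, so ordinary least squares on the transformed response can supply starting values for nls(). A minimal sketch on simulated data with two predictors (names and true values are made up). It assumes every y is strictly below 1, and it clips the lm() estimates at a small positive value so they respect the lower bound of 0 used with algorithm = "port".

```r
set.seed(10)
n  <- 1000
p1 <- runif(n, 0, 2)
p2 <- runif(n, 0, 2)
y  <- 1 - exp(-(0.2 + 0.7 * p1 + 0.4 * p2)) + rnorm(n, sd = 0.02)  # simulated data
stopifnot(all(y < 1))  # the transformation below needs y < 1

# 1 - y = exp(-eta), so -log(1 - y) is (approximately) linear in the k's
z     <- -log(1 - y)
start <- pmax(coef(lm(z ~ p1 + p2)), 1e-4)  # clip at a small positive value
names(start) <- c("k0", "k1", "k2")

# Non-linear least squares on the original scale, with all k >= 0
fit <- nls(y ~ 1 - exp(-(k0 + k1 * p1 + k2 * p2)),
           start = as.list(start), algorithm = "port", lower = 0)
round(coef(fit), 3)
```

The linearised fit is only used for starting values: the transformation distorts the error structure (errors near y = 1 are strongly inflated), so the final estimates should come from the non-linear fit.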
Re: [R] S4: Multiple inheritance
Dear Christophe,

Could you please post some example code of what you are trying to achieve?

Christophe Genolini wrote:
> Hi all,
>
> Working with S4 objects, I define two classes foo1 and foo2. I define '[' (resp. '[<-') for the two classes. Then I define a third class foo3 that inherits from both foo1 and foo2. Is there a way to make '[' (resp. '[<-') for foo3 inherit from '[' (resp. '[<-') for foo1 and foo2?
>
> Thanks
> Christophe
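Pending Christophe's own example, here is one possible pattern, as a minimal sketch: the class names foo1, foo2 and foo3 come from the question, but the slots a and b are hypothetical. Because foo3 inherits from both parents at the same distance, dispatch of '[' on a foo3 object would otherwise have to pick one parent's method. The sketch instead defines an explicit foo3 method that coerces the object to each parent with as(), reuses that parent's '[' method, and copies the results back. The same approach should carry over to '[<-'.

```r
library(methods)

setClass("foo1", slots = c(a = "numeric"))
setClass("foo2", slots = c(b = "character"))
setClass("foo3", contains = c("foo1", "foo2"))

# '[' for each parent class subsets that class's own slot
setMethod("[", "foo1", function(x, i, j, ..., drop = TRUE) {
  x@a <- x@a[i]
  x
})
setMethod("[", "foo2", function(x, i, j, ..., drop = TRUE) {
  x@b <- x@b[i]
  x
})

# '[' for foo3 delegates to both parent methods
setMethod("[", "foo3", function(x, i, j, ..., drop = TRUE) {
  part1 <- as(x, "foo1")[i]  # dispatches to the foo1 method
  part2 <- as(x, "foo2")[i]  # dispatches to the foo2 method
  x@a <- part1@a
  x@b <- part2@b
  x
})

obj <- new("foo3", a = c(1, 2, 3), b = c("x", "y", "z"))
sub <- obj[2:3]
sub@a  # 2 3
sub@b  # "y" "z"
```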
[R] Changing content of column in data.frame + efficient join extraction between 2 data.frames
Dear R users,

I have 2 SpatialPointsDataFrames, pcs and East. The column str_1 in the first (pcs) is:

    > pcs[0:4,]
           coordinates cat   str_1  int_1  int_2    dbl_1     dbl_2
    1 (101000, 263000)   1 "SM06B" 101000 263000 4.978915 -4.293668
    2 (101000, 265000)   2 "SM06C" 101000 265000 4.960478 -4.266742
    3 (101000, 267000)   3 "SM06D" 101000 267000 4.912984 -4.246849
    4 (101000, 269000)   4 "SM06E" 101000 269000 4.613309 -4.185405

The column str_1 in the second (East) is:

    > East[0:4,]
           coordinates str_1
    1 (489000, 215000) sp81x
    2 (489000, 217000) sp81y
    3 (493000, 209000) sp90j
    4 (495000, 209000) sp90p

I would like to do 2 things:

1) Change the format of the column str_1 in the first to match the second, that is, remove the inverted commas " and make it lower case.
2) Extract the rows of the first (pcs) where pcs$str_1 is the same as East$str_1.

I have even tried regexp, but I cannot modify the content of pcs$str_1 to remove the inverted commas " and change it to lower case. How do I do that?

Regards
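A minimal sketch of both steps on plain data frames (the values are made up, modelled on the printouts above). One assumption worth checking first: the inverted commas may only be how a character column is printed rather than part of the strings, in which case tolower() alone is enough; the gsub() call is harmless either way. Subsetting with a logical vector in the same way should also work on sp's SpatialPointsDataFrame objects, but that is not shown here.

```r
# Hypothetical data modelled on the pcs and East printouts in the post
pcs <- data.frame(cat   = 1:4,
                  str_1 = c('"SM06B"', '"SP81X"', '"SM06D"', '"SP90J"'),
                  stringsAsFactors = FALSE)
East <- data.frame(str_1 = c("sp81x", "sp81y", "sp90j", "sp90p"),
                   stringsAsFactors = TRUE)  # East$str_1 may well be a factor

# 1) Strip literal double quotes and convert to lower case
clean_code <- function(x) tolower(gsub('"', "", as.character(x), fixed = TRUE))
pcs$str_1 <- clean_code(pcs$str_1)

# 2) Keep the rows of pcs whose code also occurs in East
matched <- pcs[pcs$str_1 %in% as.character(East$str_1), ]
matched$cat  # 2 4

# merge() performs the same join and also brings in any other columns of East
joined <- merge(pcs, East, by = "str_1")
```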
Re: [R] Factor variables with GAM models
You can sometimes manually substitute a categorical variable with a set of continuous variables. For example, say you have a variable like "landcover.class" with 3 values "class A", "class B", "class C". You can transform it into 3 continuous variables landcover.class.A, landcover.class.B and landcover.class.C, and assign a value of 1 (or 100%) to elements belonging to that class and 0 to elements not belonging to it. That sometimes helps.

Regards

Noah Silverman wrote:
> Steve,
>
> I get that. What you wrote makes sense. My challenge is the data I'm attempting to model. Some of the variables are continuous, some are factors. Both linear and Poisson models work (Poisson doing a much more accurate job). However, some of the numerical variables are clearly non-linear. Hence my interest in GAM.
>
> I suppose one alternative would be to try some polynomial transformation on the variable as part of a Poisson model. Any other suggestions would be welcome.
>
> Thanks!
> -N
>
> On 3/19/10 8:37 PM, Steven McKinney wrote:
>> Hi Noah
>>
>> GAM models were developed to assess the functional form of the relationship of continuous predictor variables to the response, so weren't really meant to handle factor variables as predictor variables. GAMs are of the form
>>
>> E(Y | X1, X2, ...) = S0 + S(X1) + S(X2) + ...
>>
>> where S(X) is a smooth function of X. Hence you might want to rethink why you'd want a factor variable as a predictor variable in a GAM. This is why the gam machinery doesn't just do the factor conversion to indicator variables as is done in lm.
>>
>> HTH
>>
>> Steven McKinney
>>
>> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Noah Silverman [n...@smartmediacorp.com]
>> Sent: March 19, 2010 12:54 PM
>> To: r-help@r-project.org
>> Subject: [R] Factor variables with GAM models
>>
>> I'm just starting to learn about GAM models. When using the lm function in R, any factors I have in my data set are automatically converted into a series of binomial variables.
>>
>> For example, if I have a data.frame with a column named color and values "red", "green", "blue", the lm function automatically replaces it with 3 variables colorred, colorgreen, colorblue which are binomial {0,1}.
>>
>> When I use the gam function, R doesn't do this, so I get an error.
>>
>> 1) Is there a way to ask the gam function to do this conversion for me?
>> 2) If not, is there some other tool or utility to make this data transformation easy?
>> 3) Last option - can I use lm to transform the data and then extract it into a new data.frame to then pass to gam?
>>
>> Thanks!!!
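The manual substitution described above can be automated with model.matrix(): dropping the intercept (- 1) gives one 0/1 indicator column per level of the factor. A minimal sketch with a made-up landcover.class factor and response y. Separately, with the mgcv package's gam(), a factor can normally be included directly as a parametric term outside s(), which may be simpler than building the indicators by hand.

```r
df <- data.frame(landcover.class = factor(c("A", "B", "C", "A", "C")),
                 y = c(1.2, 0.7, 2.1, 1.0, 1.8))  # made-up data

# One indicator column per level: no intercept, so no level is used as a baseline
indicators <- model.matrix(~ landcover.class - 1, data = df)
colnames(indicators)  # "landcover.classA" "landcover.classB" "landcover.classC"

# Bind the indicators to the original data for use in another model
df_expanded <- cbind(df, indicators)
```

If the model also contains an intercept, use only two of the three columns: the three indicators always sum to 1, so including all of them with an intercept makes the design matrix singular.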
Re: [R] Constrained non linear regression using ML
Dear Gabor, Arne, Ravi, R users,

I am first trying the maximum likelihood approach, then I will try the Bayesian approach. The likelihood function, and the log-likelihood function, will depend on the pdf of the error e in the formula:

    y = f(theta*x) + e

Now let's say that e is Gaussian distributed; then I can use LS, which is the same as ML in this case, and the residuals would be Gaussian distributed. Is that right? If e is distributed differently (for example beta, in the continuous case, or binomial, in the discrete case), then I am better off using maximum likelihood. How would the residuals be distributed? Should they not be distributed the same as e?

Best,

Gabor Grothendieck wrote:
> For specific questions on the betareg package contact the maintainer. If the likelihood based approaches are giving too much difficulty try moving to a Bayesian framework (WinBUGS/R2WinBUGS, JAGS/r2jags, etc.)
>
> On Wed, Mar 17, 2010 at 10:03 AM, Corrado wrote:
>> Dear Arne, Gabor, I solved the problem with betareg (downloaded the package). [...]
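On the first question: with independent Gaussian errors of constant variance, the log-likelihood is -n/2 * log(2*pi*sigma^2) - RSS/(2*sigma^2), so for any fixed sigma, maximising it over the regression parameters is the same as minimising the residual sum of squares. A minimal numerical check on simulated data with a linear mean (names and values are made up): minimising the Gaussian negative log-likelihood with optim() recovers the lm() coefficients up to optimiser tolerance.

```r
set.seed(3)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)  # simulated data with Gaussian errors

# Negative Gaussian log-likelihood; theta = c(b0, b1, log_sigma)
negloglik <- function(theta) {
  mu <- theta[1] + theta[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(theta[3]), log = TRUE))
}

ml_fit <- optim(c(0, 0, 0), negloglik, method = "BFGS")
ls_fit <- lm(y ~ x)

cbind(ML = ml_fit$par[1:2], LS = coef(ls_fit))  # agree up to optimiser tolerance
```

Under a correctly specified model the residuals are estimates of the errors e, so they should roughly follow e's distribution; with non-Gaussian errors the least-squares fit is still available but is no longer the maximum likelihood fit.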
Re: [R] Constrained non linear regression using ML
Dear Arne, Gabor,

I solved the problem with betareg (downloaded the package). I ran it on my data and, unfortunately, the constraint is definitely active: if I remove the active variables, I remove the most significant variables!

Of course the error is what matters, not the distribution of the variable. In this case, one of the assumptions is that the error may be distributed ~ beta. I think that betareg makes this assumption; am I right?

I am finding it difficult to solve these problems:

1) writing the maximum likelihood function (what do you suggest?);
2) dealing with the fact that a few factors actually have values of y (the response) at the extremes, that is 0 and 1, which means that the link function returns infinite values in those cases;
3) the error is dependent on E(y).

PS: An additional silly question: what is the discrete equivalent of the beta? The binomial?

Arne Henningsen wrote:
> On 17 March 2010 14:22, Gabor Grothendieck wrote:
>> Contact the maintainer regarding problems with the package. Not sure if this is acceptable but if you get it to run you could consider just dropping the variables from your model that correspond to active constraints. Also try the maxLik package. You will have to define the likelihood yourself but it does support constraints.
>
> Yes. And specifying the likelihood function is probably (depending on your distributional assumptions) not too complicated.
>
> BTW: Even if your y follows a beta distribution, it does not mean that your error term also follows a beta distribution. And it is the distribution of the error term which is crucial for specifying the likelihood function.
>
> /Arne
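On point 2 (responses exactly at 0 or 1, where the logit link and the beta density break down), a transformation often used in beta regression, attributed to Smithson and Verkuilen (2006), compresses the data slightly into the open interval: y_new = (y * (n - 1) + 0.5) / n, where n is the sample size. A minimal sketch; the helper name squeeze is hypothetical, and since it alters the data, results near the boundaries should be interpreted with care.

```r
# Hypothetical helper: compress y from [0, 1] into (0, 1); n = sample size
squeeze <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y    <- c(0, 0.25, 0.5, 0.9, 1)
y_sq <- squeeze(y)
y_sq          # 0.10 0.30 0.50 0.82 0.90
qlogis(y_sq)  # the logit is now finite for every observation
```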
Re: [R] Constrained non linear regression using ML
Dear Gabor,

1) The constraints are active, at least from a formal point of view.

3) I have tried several times to run betareg.fit on the data, and the only thing I obtain is a very strange error:

    Error in dimnames(x) <- dn :
      length of 'dimnames' [2] not equal to array extent

The error is strange because the function dimnames is not called anywhere.

Regards

Gabor Grothendieck wrote:
> Try it anyway -- maybe none of your constraints are active.
>
> On Wed, Mar 17, 2010 at 6:01 AM, Corrado wrote:
>> Dear Gabor, dear R users, I had already read the betareg documentation. [...]
Re: [R] Constrained non linear regression using ML
Dear Gabor, dear R users,

I had already read the betareg documentation. As far as I can understand from the help, it does not allow for constrained regression.

Regards

Gabor Grothendieck wrote:
> Check out the betareg package.
>
> On Tue, Mar 16, 2010 at 2:58 PM, Corrado wrote:
>> Dear R users, I have to fit the non-linear regression y~1-exp(-(k0+k1*p1+k2*p2+...+kn*pn)) where ki>=0 [...]
[R] Constrained non linear regression using ML
Dear R users,

I have to fit the non-linear regression:

    y ~ 1 - exp(-(k0 + k1*p1 + k2*p2 + ... + kn*pn))

where k_i >= 0 for each i in 1, ..., n and the p_i are in R+. At the moment I am using nls, but I would rather use a maximum-likelihood-based algorithm. The error is not necessarily normally distributed: y is approximately beta distributed, and the volume of data is medium to large (y and the p_i may have ~40,000 elements).

I have studied the packages in the task views Optimisation and Robust Statistical Methods, but what I was looking for did not seem to be there. Maybe I am wrong. The nearest thing was nlrob, but even that does not allow for constraints, as far as I can understand.

Any suggestion?

Regards
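One way to combine the non-linear mean, the non-negativity constraints and a beta error model is to write the negative log-likelihood directly and minimise it with optim(method = "L-BFGS-B"), which supports box constraints. A minimal sketch, assuming y given the p_i follows Beta(mu * phi, (1 - mu) * phi) with mean mu = 1 - exp(-(k0 + k1*p1 + k2*p2)) and precision phi; this mean/precision parametrisation is the one betareg uses, but the function and the simulated data here are hypothetical. y must lie strictly inside (0, 1).

```r
set.seed(4)
n  <- 2000
p1 <- runif(n)
p2 <- runif(n)
mu <- 1 - exp(-(0.2 + 0.8 * p1 + 0.5 * p2))
y  <- rbeta(n, mu * 20, (1 - mu) * 20)   # simulated data, true precision phi = 20

# Negative log-likelihood; par = c(k0, k1, k2, log_phi)
beta_nll <- function(par) {
  eta <- par[1] + par[2] * p1 + par[3] * p2
  m   <- 1 - exp(-eta)
  m   <- pmin(pmax(m, 1e-10), 1 - 1e-10)  # keep the mean strictly inside (0, 1)
  phi <- exp(par[4])
  -sum(dbeta(y, m * phi, (1 - m) * phi, log = TRUE))
}

fit_beta <- optim(c(0.5, 0.5, 0.5, 1), beta_nll, method = "L-BFGS-B",
                  lower = c(0, 0, 0, -Inf))  # k_i >= 0; log_phi unconstrained
fit_beta$convergence  # 0 indicates successful convergence
fit_beta$par[1:3]     # estimates of k0, k1, k2
exp(fit_beta$par[4])  # estimate of phi
```

Approximate standard errors can be obtained from the Hessian (optim(..., hessian = TRUE)), although that approximation is unreliable for any estimate that ends up on a bound.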
Re: [R] Distance between sets of points in transformed environmental space
Thanks Mario! (Or should I say "grazie, Mario"?)

- Can those silhouette coefficients be used for distances between sets, or only for distances from a point to a set?
- Where did you find the other post you attached? It did not come up when I searched the mailing list!

Best,

On Tuesday 01 December 2009 10:31:47 Mario Valle wrote:
> Silhouette coefficients?
> They measure, for each point, how similar it is to the other points of its cluster and how dissimilar it is from the points of other clusters.
>
> P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison-Wesley, 2006, page 541
>
> Hope it helps.
> mario
>
> Charlotte Maia wrote:
> > Well, here's another naive post from me (hopefully better than the last one).
> >
> > Firstly, I'm not sure computing Euclidean distance is that simple. I would assume temperatures and precipitation would need to be standardised in some way.
> >
> > I think the notion of how far away something is, and how distinct location-wise something is, are quite different, so maybe two measures?
> >
> > For distance per se, I think your first idea is the best. Plus simple is always good...
> >
> > For distinctness, given one of two sets, for each point you could just compute the closest point to it. If the closest point is a member of the same set, we will call that a + point; if the closest point is a member of the other set, we will call it a - point. In principle the measure of distinctness would be the sum of the +'s, however there might need to be some scaling to take into account the number of points in each set.
> >
> > There are also a lot of fancy things out there, so someone will probably come up with a much fancier (and possibly better) idea than this.
> >
> > Well, that's just my rant, before I go to bed.
> >
> > kind regards

-- Corrado Topi Global Climate Change & Biodiversity Indicators Area 18, Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
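The two suggestions in this thread can be computed side by side. This is a sketch under assumptions: the two Gaussian point clouds are hypothetical, the coordinates are standardised first (as Charlotte suggests), and silhouette() comes from the cluster package that ships with R. The share of points whose nearest neighbour lies in the same set follows Charlotte's "+ point" idea scaled to [0, 1]; the average silhouette width over the two-set partition is one possible set-to-set summary, not necessarily the right one for this problem.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

set.seed(42)
# Two hypothetical point clouds in a 2-D space
X <- matrix(rnorm(100, mean = 0), ncol = 2)
Y <- matrix(rnorm(100, mean = 1.5), ncol = 2)
pts    <- scale(rbind(X, Y))           # standardise each coordinate
labels <- rep(1:2, each = nrow(X))     # set membership codes

D <- dist(pts)                         # Euclidean distances between all points

# Charlotte's idea: share of points whose nearest neighbour is in the same set
Dm <- as.matrix(D)
diag(Dm) <- Inf                        # a point is not its own neighbour
nn <- apply(Dm, 1, which.min)
same_set_share <- mean(labels[nn] == labels)

# Mario's idea: average silhouette width of the two-set partition
sil <- silhouette(labels, D)
avg_width <- summary(sil)$avg.width

c(same_set_share = same_set_share, avg_silhouette = avg_width)
```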
[R] Distance between sets of points in transformed environmental space
Dear friends,

I have several sets of points in a transformed environmental space. Each set of points can be represented as a cloud in the environmental space. This space is spanned by n coordinates, corresponding to the first n of the 36 PCs of some environmental variables (12 monthly minimum temperatures, 12 monthly maximum temperatures, 12 monthly precipitations).

I would like to calculate a "distance" or dissimilarity between each pair of sets of points. Let's label two of those sets X and Y, where x is in X and y is in Y. We are interested in defining a distance between X and Y. I have thought of using the following:

1) The Euclidean distance between the centroids of X and Y. Simple and effective, but it does not give much real information on the actual degree of overlap.
2) The median of all the distances between all pairs of points (x, y). Same problem as (1), partially resolved.
3) The proportion of points of the union of X and Y which fall outside the intersection of the convex or concave hulls (the latter defined with a smoothing parameter) of X and Y, i.e. C(X) intersect C(Y). Very complicated, and does not necessarily lead to [...]

What do you think? Are there any other approaches worth considering?

Kind Regards

-- Corrado Topi
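Approaches (1) and (2) are straightforward to compute once each set is a matrix with one row per point and one column per retained PC. A minimal sketch: set_distances() is a hypothetical helper, and the random matrix standing in for the 36 climate variables is an assumption.

```r
# Approaches (1) and (2): centroid distance and median cross-set distance
set_distances <- function(X, Y) {
  centroid <- sqrt(sum((colMeans(X) - colMeans(Y))^2))
  # all pairwise squared distances via |x|^2 + |y|^2 - 2 x.y
  sq <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * X %*% t(Y)
  cross <- sqrt(pmax(sq, 0))   # pmax guards against tiny negative round-off
  c(centroid = centroid, median_cross = median(cross))
}

# Hypothetical data standing in for the 36 environmental variables
set.seed(7)
env <- matrix(rnorm(200 * 36), ncol = 36)
pcs <- prcomp(env, scale. = TRUE)$x[, 1:3]   # keep the first n = 3 PCs
grp <- rep(c("X", "Y"), each = 100)           # hypothetical set membership
set_distances(pcs[grp == "X", ], pcs[grp == "Y", ])
```

The matrix form of the cross distances avoids an explicit double loop, which matters when each set has thousands of points.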
Re: [R] Concave hull
Dear David and other concave-hull-ists,

yes, I meant concave hulls indeed. I know about the algorithm mentioned (www.concavehull.com), but it is not open source, so you cannot integrate it in R, and it is apparently patented, so even if you find the description you cannot apply it to implement a solution (even if patenting algorithms is at least questionable and has rather patchy validity).

Some questions / comments which apply to David's approach, but also to convex hulls in general (question 2):

1) How do you extend it to n dimensions (in R)?
2) How do you do "set calculus" (a horrible expression meaning union, intersection, difference, and particularly membership, and so on) on these hulls (in R)?

Finally, at the moment I am using a GIS to do it, but I did not find any command for concave hulls in GRASS. There is a rather long and convoluted way of doing them, but it is nearly impossible to automate (see http://grass.osgeo.org/wiki/Create_concave_hull). Looking to a GIS for the n-dimensional case does not sound right either, because a GIS is designed to work in 2D/3D.
Best,

On Nov 26 2009, David Winsemius wrote:
On Nov 25, 2009, at 7:51 PM, David Winsemius wrote:
Drats; Forgot the plot:

library(MASS)  # kde2d() and bandwidth.nrd() come from MASS
xx <- runif(100, -1, 1)
yy <- abs(xx) + rnorm(100, 0, .2)
plot(xx, yy, xlim = c(min(xx) - sd(xx), max(xx) + sd(xx)),
     ylim = c(min(yy) - sd(yy), max(yy) + sd(yy)))
dens2 <- kde2d(xx, yy, lims = c(min(xx) - sd(xx), max(xx) + sd(xx),
                                min(yy) - sd(yy), max(yy) + sd(yy)))
contour(dens2, add = TRUE)
# You can pick a single contour if you like:
contour(dens2, levels = 0.05, col = "red", add = TRUE)
contour(dens2, levels = 0.10, col = "blue", add = TRUE)

And as a further note, you can drop the bandwidth and lower the density level to get a tighter fit:

n <- 1000  # assumed sample size: the value in the archived post is garbled (it reads runif(1, ...))
xx <- runif(n, -1, 1)
yy <- abs(xx) + rnorm(n, 0, .2)
plot(xx, yy, xlim = c(min(xx) - sd(xx), max(xx) + sd(xx)),
     ylim = c(min(yy) - sd(yy), max(yy) + sd(yy)), cex = .2)
dens2 <- kde2d(xx, yy, lims = c(min(xx) - sd(xx), max(xx) + sd(xx),
                                min(yy) - sd(yy), max(yy) + sd(yy)),
               h = c(bandwidth.nrd(xx)/4, bandwidth.nrd(xx)/4))
contour(dens2, add = TRUE)
# You can pick a single contour if you like:
contour(dens2, levels = 0.05, col = "red", add = TRUE)
contour(dens2, levels = 0.10, col = "blue", add = TRUE)
contour(dens2, levels = 0.005, col = "red", add = TRUE)

(More bat-like.)

-- Corrado Topi
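For the membership part of question 2, the convex case is easy in base R: grDevices::chull() returns the indices of the hull vertices, and a standard ray-casting test decides whether a point lies inside the resulting polygon. This is a 2-D sketch only; in_polygon() is a hypothetical helper, and a concave (alpha-shape) hull would need a different algorithm that is not shown here.

```r
# Ray-casting point-in-polygon test: count crossings of a horizontal ray
in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (vy[i] > py) != (vy[j] > py)
    if (crosses &&
        px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Same kind of point cloud as in David's example
set.seed(3)
xx <- runif(100, -1, 1)
yy <- abs(xx) + rnorm(100, 0, 0.2)
h  <- chull(xx, yy)                 # indices of the convex-hull vertices

in_polygon(0, 0.5, xx[h], yy[h])    # is (0, 0.5) inside the hull?
```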
[R] Concave hull
Dear friends,

Do you know how to calculate the CONCAVE hull of a set of points (2-dimensional or n-dimensional)? Is that possible in R? (With a "smoothing" parameter, of course.)

Best,

-- Corrado Topi
[R] IRLS or other iteratively re weighted optimization algorithms with constraints in R
Dear list,

is there an iteratively reweighted least squares (IRLS) algorithm, or any other iteratively reweighted optimisation algorithm, for non-linear (and possibly non-parametric) optimisation problems with constraints available in R?

Regards

-- Corrado
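One possible construction in base R, not an existing package function: an IRLS loop for a robust (Huber) linear regression in which each weighted least-squares step is solved under box constraints with optim(method = "L-BFGS-B"). The helper names cwls() and irls_huber(), the Huber tuning constant 1.345 and the simulated data are all assumptions for illustration.

```r
# Constrained weighted least-squares step: minimise sum(w * (y - X b)^2), b >= lower
cwls <- function(X, y, w, lower, start) {
  f  <- function(b) sum(w * (y - X %*% b)^2)
  gr <- function(b) as.vector(-2 * t(X) %*% (w * (y - X %*% b)))
  optim(start, f, gr, method = "L-BFGS-B", lower = lower)$par
}

# Hypothetical IRLS loop with Huber weights; every weighted step respects the bounds
irls_huber <- function(X, y, lower, k = 1.345, maxit = 50, tol = 1e-6) {
  b <- pmax(qr.solve(X, y), lower)   # least-squares start, moved into the feasible set
  for (it in seq_len(maxit)) {
    r <- as.vector(y - X %*% b)
    u <- abs(r) / mad(r)             # residuals scaled by a robust scale estimate
    w <- ifelse(u <= k, 1, k / u)    # Huber weights
    b_new <- cwls(X, y, w, lower, b)
    done <- max(abs(b_new - b)) < tol
    b <- b_new
    if (done) break
  }
  b
}

# Simulated data: the true third coefficient sits on the bound (0)
set.seed(10)
n <- 500
X <- cbind(1, runif(n), runif(n))
y <- as.vector(X %*% c(1, 2, 0) + rnorm(n, sd = 0.3))
y[1:10] <- y[1:10] + 10              # a few gross outliers
b <- irls_huber(X, y, lower = c(-Inf, 0, 0))
b
```

Solving each reweighted step exactly under the bounds, rather than clipping an unconstrained solution afterwards, keeps every iterate feasible.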
Re: [R] Fetch large sized file from SQL
I think you can specify the number of rows to be loaded at a time, although it was quite a while ago. Try reading ?sqlQuery and ?odbcConnect. I have loaded quite large tables.

On Friday 02 October 2009 14:59:59 Dr. Alireza Zolfaghari wrote:
> But the problem is that the data frame in SQL is large, therefore sqlQuery() cannot handle it.
>
> On Fri, Oct 2, 2009 at 2:14 PM, Corrado wrote:
> > You can try using RODBC; it allows you to connect to databases using the ODBC driver.
> >
> > I had some difficulties using it with the PostgreSQL driver in the past, because of some apparent incompatibility with the native PostgreSQL ODBC driver. I think the problems were solved by the new ODBC driver for PostgreSQL and the new revision of RODBC.
> >
> > On Friday 02 October 2009 13:56:02 Dr. Alireza Zolfaghari wrote:
> > > Hi List,
> > > Does anyone know what package I need to use in order to fetch/get a large data frame from SQL? I have already used the sqldf package, which is good for fetching large csv files.
> > >
> > > Thanks
> > > Alireza

-- Corrado Topi
Re: [R] Fetch large sized file from SQL
You can try using RODBC; it allows you to connect to databases using the ODBC driver.

I had some difficulties using it with the PostgreSQL driver in the past, because of some apparent incompatibility with the native PostgreSQL ODBC driver. I think the problems were solved by the new ODBC driver for PostgreSQL and the new revision of RODBC.

On Friday 02 October 2009 13:56:02 Dr. Alireza Zolfaghari wrote:
> Hi List,
> Does anyone know what package I need to use in order to fetch/get a large data frame from SQL? I have already used the sqldf package, which is good for fetching large csv files.
>
> Thanks
> Alireza

-- Corrado Topi
Re: [R] Problem with dist (bug?)
Dear list,

here is the code that generates the problem:

library(proxy)
scot24 <- read.csv("scot.csv", header = TRUE)
scot24_climate <- scot24[, 1105:1109]  # Scotland
dist_scot24_climate <- dist(scot24_climate, method = "correlation", diag = TRUE, upper = TRUE)
max(dist_scot24_climate)

The maximum is 1.9. I do not think it should be, because the value is usually the cos() of the angle between the 2 vectors. If you use method = "cosine" you get 1.8, which I think it should not be either. Is there a problem with the way I use it, or is there a bug? I have been able to reduce scot.csv to under 200MB, but I thought it better not to post it to the list.

> We need to see the data and the script that produced the error.
>
> On Fri, Oct 2, 2009 at 5:06 AM, Corrado wrote:
> > Dear list,
> >
> > using package "proxy".
> >
> > In one situation, the dissimilarity between two vectors based on method = "correlation" returns a value of 1.9. That should not happen, should it?
> >
> > The correlation is normally the cos() of the angle between the two vectors.
> >
> > Any clue?
> >
> > Package proxy 0.4-3 on R 2.9.2 on Kubuntu 9.04 64 bit.
> >
> > Regards

-- Corrado Topi
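One possible explanation, assuming proxy defines these dissimilarities as one minus the corresponding similarity (worth checking in the proxy documentation): a correlation r lies in [-1, 1], so 1 - r lies in [0, 2], and 1.9 simply means r = -0.9. Likewise, r is the cosine of the angle between the centred vectors, and a cosine dissimilarity 1 - cos reaches 2 for vectors pointing in opposite directions. A base-R check with hypothetical vectors:

```r
# Two hypothetical vectors that are almost perfectly anti-correlated
x <- c(1, 2, 3, 4, 5)
y <- c(5.1, 3.9, 3.2, 1.8, 1.0)

r <- cor(x, y)          # Pearson correlation, always in [-1, 1]
d_cor <- 1 - r          # so a 1 - r dissimilarity lies in [0, 2]

# r is the cosine of the angle between the *centred* vectors
xc <- x - mean(x)
yc <- y - mean(y)
cos_centred <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))

# A cosine dissimilarity 1 - cos also reaches 2 for opposite vectors
d_cos_opposite <- 1 - sum(x * -x) / sqrt(sum(x^2) * sum((-x)^2))

c(r = r, d_cor = d_cor, cos_centred = cos_centred, d_cos_opposite = d_cos_opposite)
```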
[R] Problem with dist (bug?)
Dear list,

using package "proxy". In one situation, the dissimilarity between two vectors based on method = "correlation" returns a value of 1.9. That should not happen, should it? The correlation is normally the cos() of the angle between the two vectors. That dissimilarity [...]

Any clue?

Package proxy 0.4-3 on R 2.9.2 on Kubuntu 9.04 64 bit.

Regards

-- Corrado Topi
[R] A point in a vector?
Dear list,

I have a strange requirement. I have a vector, for example v <- c(0,0,0,0,1,2,4,6,8,8,8,8), and a value, for example x <- 4.8. I would like to find in which sub-interval of v the value x lies. In this case, x would be in the sub-interval [4,6], that is, the sub-interval from element j = 7 to element j+1 = 8. Can we do that with an R command?

Regards

-- Corrado Topi
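Base R's findInterval() does this: for a non-decreasing vector v it returns the largest index j with v[j] <= x. A short sketch using the example values from the post:

```r
v <- c(0, 0, 0, 0, 1, 2, 4, 6, 8, 8, 8, 8)   # must be non-decreasing
x <- 4.8

j <- findInterval(x, v)        # largest j with v[j] <= x; here 7
c(j = j, lower = v[j], upper = v[j + 1])   # 7, 4, 6

# It is vectorised over x as well
findInterval(c(0.5, 7), v)     # 4 8
```

For x below v[1] it returns 0, and for x at or above the last element it returns length(v), so those edge cases need checking before indexing v[j + 1].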
Re: [R] Something wrong with my function Please Help
Have you run a debugger over your function? Load the debug package, and then run mtrace() on your function:

library(debug)
?mtrace

hth

On Tuesday 29 September 2009 04:29:37 Chunhao Tu wrote:
> Hi R users,
> I tried to build a function to compute the odds ratio and relative risk, however something is wrong. I have been stuck for many hours and I really don't know how to solve it. Would someone please give me a hint?
>
> > OR.RR<-function(x){
> + x <- as.matrix(any(dim(x)==2))
> + OR<-(x[1,1]*x[2,2])/(x[1,2]*x[2,1])
> + RR<-(x[1,1]/(sum(x[1,])))/(x[2,1]/(sum(x[2,])))
> + return(OR);return(RR)
> + }
> > tt<-matrix(data=1:4,nrow=2,ncol=2)
> > OR.RR(tt)
> Error in OR.RR(tt) : subscript out of bounds
>
> Many Thanks
> Tu

-- Corrado Topi
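The error comes from the first line of the function: as.matrix(any(dim(x) == 2)) replaces x with a 1 x 1 logical matrix, so x[2, 2] is out of bounds. Also, in return(OR); return(RR) the second return() is never reached. A possible corrected version, which checks the table shape and returns both values in one named vector:

```r
OR.RR <- function(x) {
  x <- as.matrix(x)
  stopifnot(all(dim(x) == 2))   # require a 2 x 2 table instead of overwriting x
  OR <- (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])
  RR <- (x[1, 1] / sum(x[1, ])) / (x[2, 1] / sum(x[2, ]))
  c(OR = OR, RR = RR)           # one return value holding both statistics
}

tt <- matrix(data = 1:4, nrow = 2, ncol = 2)
OR.RR(tt)                       # OR = 4/6, RR = (1/4)/(2/6) = 0.75
```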
Re: [R] SAS user now converting to R - Help with Transpose
I think you want to look at the function reshape(); it may solve your problem. Type ?reshape in the R console on your system.

On Monday 28 September 2009 18:35:25 Gabor Grothendieck wrote:
> > I have a dataset that looks like this:
> >
> > Chemical Well1 Well2 Well3 Well4
> > BOD      13.2  14.2  15.5  14.2
> > O2        7.8   2.6   3.5   2.4
> > TURB     10.2  14.6  18.5  17.3
> > and so on with more chemicals
> >
> > I would like to transpose my data so that it looks like this:
> >
> > Chemical WellID Value
> > BOD      Well1  13.2
> > BOD      Well2  14.2
> > BOD      Well3  15.5
> > BOD      Well4  14.2
> > O2       Well1   7.8
> > O2       Well2   2.6
> > and so on
> >
> > In SAS I would code it like this:
> >
> > proc sort data=ds1; by chemical; run;
> > Proc Transpose data=ds1 out=ds2;
> > by chemical;
> > var Well1 Well2 Well3 Well4;
> > run;
> > data ds3; set ds2;
> > rename _name_ = WellID;
> > rename col1 = value;
> > run;
> >
> > How can I do this in R?? Any help is much appreciated. Thanks!

-- Corrado Topi
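A sketch of the PROC TRANSPOSE step with base R's reshape(), using the three example chemicals shown in the question:

```r
ds1 <- data.frame(
  Chemical = c("BOD", "O2", "TURB"),
  Well1 = c(13.2, 7.8, 10.2),
  Well2 = c(14.2, 2.6, 14.6),
  Well3 = c(15.5, 3.5, 18.5),
  Well4 = c(14.2, 2.4, 17.3)
)

wells <- paste0("Well", 1:4)
ds3 <- reshape(ds1, direction = "long",
               varying = wells,      # wide columns to stack
               v.names = "Value",    # name of the stacked value column
               timevar = "WellID",   # records which column each value came from
               times = wells,
               idvar = "Chemical")

# Sort like the SAS output (by chemical, then well) and tidy the row names
ds3 <- ds3[order(ds3$Chemical, ds3$WellID), ]
rownames(ds3) <- NULL
ds3
```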
Re: [R] define a new family (and a new link function) for gam in gam package
Dear Simon,

I want a simple generalised additive model for regression which I can customise: force/define a certain basis, a certain order for the splines and the knot vector, generate new link functions, and force the regression through (0,0). What do you suggest?

On Friday 18 September 2009 16:06:31 Simon Wood wrote:
> > I am using gam in the gam package (not in mgcv). Is it possible to force gam in mgcv to behave like gam in the gam package?
>
> -- not *exactly*, no. But what do you want to do? (i.e. what feature of `gam' do you need?)

-- Corrado Topi
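If the penalised smoothing of a GAM is not essential, an unpenalised regression spline gives full control over basis, order and knots. The following swaps in a different technique from gam: lm() with a B-spline basis from splines::bs(). The data and knot positions are hypothetical. With the default intercept = FALSE, bs() drops the only basis function that is non-zero at the left boundary knot, so together with "- 1" the fit is forced through (0, 0) when that boundary is 0. The same formula can be passed to glm() with a chosen family and link.

```r
library(splines)  # ships with R; provides the B-spline basis bs()

# Hypothetical data from a curve with f(0) = 0
set.seed(5)
x <- runif(300, 0, 10)
y <- 3 * sin(x / 2) + rnorm(300, sd = 0.3)

inner_knots <- c(2.5, 5, 7.5)     # user-chosen knot vector
# degree = 3 gives cubic pieces; "- 1" removes the model intercept
fit <- lm(y ~ bs(x, knots = inner_knots, degree = 3,
                 Boundary.knots = c(0, 10)) - 1)

p0 <- predict(fit, newdata = data.frame(x = 0))
p0                                # 0: every basis column vanishes at x = 0
```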
Re: [R] define a new family (and a new link function) for gam in gam package
Dear David,

I am using gam from the gam package (not from mgcv). Is it possible to force gam in mgcv to behave like gam in the gam package?

On Thursday 17 September 2009 23:00:17 David Winsemius wrote:
> On Sep 17, 2009, at 1:39 PM, Topi, Corrado wrote:
> > Dear R list,
> >
> > is it possible to define a new family (and a new link function) for gam in the gam package? How?
> >
> > I read the help for gam, family, gam.model, make.link but I did not find a solution.
>
> Wood provides an example for negbin with alternate links in package mgcv:
>
> library(mgcv)
> ?negbin
> negbin # produces the code

-- Corrado Topi
[R] define a new family (and a new link function) for gam in gam package
Dear R list,

is it possible to define a new family (and a new link function) for gam in the gam package? How? I read the help for gam, family, gam.model and make.link, but I did not find a solution.

Regards

-- Corrado Topi
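In base R, glm() families accept a user-written link: an object of class "link-glm" with linkfun, linkinv, mu.eta and valideta components, as described in ?family. The sketch below defines the link mu = 1 - exp(-eta), chosen only as an example, and uses it with binomial(). The link name neglog1m and the simulated data are hypothetical, and whether gam::gam() accepts such a family object in the same way is an assumption not checked here.

```r
# Hypothetical custom link: mu = 1 - exp(-eta), i.e. eta = -log(1 - mu)
neglog1m <- function() {
  linkfun  <- function(mu)  -log1p(-mu)
  linkinv  <- function(eta) -expm1(-eta)
  mu.eta   <- function(eta) exp(-eta)          # d mu / d eta
  valideta <- function(eta) all(is.finite(eta)) && all(eta > 0)
  structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
                 valideta = valideta, name = "neglog1m"),
            class = "link-glm")
}

# Simulated binary data generated under that link
set.seed(11)
n <- 2000
x <- runif(n, 0, 2)
y <- rbinom(n, 1, 1 - exp(-(0.3 + 0.8 * x)))

fit <- glm(y ~ x, family = binomial(link = neglog1m()), start = c(0.5, 0.5))
coef(fit)
```

Supplying start matters here: the link is only valid for eta > 0, so the fit must begin inside that region.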
Re: [R] identical(length(x), 1) returns FALSE, but print(length(x)) is 1, length(x)==1 is TRUE, and is.integer(length(x)) is TRUE????
On Tuesday 15 September 2009 17:28:02 Gavin Simpson wrote:
> [note you don't give us your x so I'm making this up - This is what Duncan was going on about in an earlier thread, give us something we can just paste into R and it works]

Dear Gavin,

I do not understand what more information you need! Take any vector of length 1, for example x <- 1, plus all the commands that were in my previous email. What is the logic behind identical(length(x), 1) being FALSE?

Regards

-- Corrado Topi
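The logic is storage type: length() returns an integer, while the literal 1 is a double, and identical() compares type as well as value. A short demonstration:

```r
x <- 1
typeof(length(x))          # "integer": length() returns an integer
typeof(1)                  # "double": a bare numeric literal is a double
identical(length(x), 1)    # FALSE: same value, different storage type
identical(length(x), 1L)   # TRUE: 1L is an integer literal
length(x) == 1             # TRUE: == compares values after coercion
```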
[R] Function returns different results on the vector as a whole vs. on the values of the vector
Dear R friends,

I have developed the function attached below. Strangely, when I apply it to a vector it behaves very differently than when I apply it separately to each value of the vector. Is there any reason why?

Here is the function:

# TODO: Add comment
#
# Author: ct529, 3 Sep 2009, 08:42:50, mspline.R
###
mspline<-function(i=1,x=0,k=1,t=c(0,1)){
  # x is the variable
  # i is the index of the member of the Mspline family
  # t is the vector of knots. t[h] is the h-th knot.
  # k is the Mspline degree
  I<-i
  if(identical(k,1)){
    if( x=t[i] ){
      td<-t[i+1]-t[i]
      M<-1/td
    }else{
      M<-0
    }
  }else if (k>1) {
    kk<-(k-1)
    if (x>=t[i] && x=t[i+k]){
      M<-0
    }
  }
  return(M)
}

For example:

source("./functions/mspline.R")
X<-seq(0,1,0.1)
Q<-c(0,0,0,0.3,0.5,0.6,1,1,1)
II<-c(1,2,3,4,5,6)
plot(c(0,1),c(0,24),type="p",col="white",cex=".4",pch=".")
for (h in II) {
  y<-vector()
  for ( in X) {
    y<-append(y,mspline(i=h,x=,k=3,t=Q))
  }
  points(X,y,type="l",col="green")
}

works very differently from the vectorised approach, that is, substituting the inner for loop with the expression:

y<-mspline(i=h,x=X,k=3,t=Q)

Regards

-- Corrado Topi
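A likely reason is that mspline() uses if() and && on x, and both expect a single TRUE/FALSE: given a vector, only one element is examined, so the whole vector follows a single branch. A reduced example (ind() is a hypothetical stand-in, not code from the post) showing the problem and two ways round it:

```r
# A scalar-only indicator written with if() and &&, in the style of mspline():
# it is only correct when length(x) == 1
ind <- function(x, lo, hi) {
  if (x >= lo && x < hi) 1 else 0
}

X <- seq(0, 1, 0.1)
# ind(X, 0.3, 0.5) does NOT work element-wise: && and if() look at one element
# only (older R versions used the first element; recent versions signal an error)

# Fix 1: apply the scalar function element by element
ind_vec1 <- Vectorize(ind, vectorize.args = "x")
# Fix 2: rewrite with the vectorised operator & and arithmetic on logicals
ind_vec2 <- function(x, lo, hi) as.numeric(x >= lo & x < hi)

ind_vec1(X, 0.3, 0.5)
ind_vec2(X, 0.3, 0.5)
```

Vectorize() is the quickest change but still loops internally; rewriting with & and ifelse() is usually faster on long vectors.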
[R] identical(length(x), 1) returns FALSE, but print(length(x)) is 1, length(x)==1 is TRUE, and is.integer(length(x)) is TRUE????
Dear R list,

the condition identical(length(x),1) returns FALSE, but:

print(length(x)) returns 1
is.vector(x) is TRUE
is.integer(length(x)) is TRUE
length(x) == 1 is TRUE

I am puzzled.

Regards

-- Corrado Topi
Re: [R] Help with error on function: Error in .... attempt to apply non-function (Solution)
Dear friends,

the problem with the error has been solved (thanks to Peter). Line 41 in http://scsys.co.uk:8002/33852 should be rewritten as

M<-k*((x-t[i])*m0+(t[i+k]-x)*m1)/((k-1)*(t[i+k]-t[i]))

On Tuesday 15 September 2009 11:32:53 Corrado wrote:

-- Corrado Topi
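With the corrected line, the M-spline recursion can also be written so that x may be a whole vector, which sidesteps the element-wise problem of if(). A sketch under assumptions: the degree-1 base case 1/(t[i+1] - t[i]) on [t[i], t[i+1]) follows the original function, zero-width knot intervals are treated as contributing 0, and mspline_vec() is a hypothetical name. It tests k == 1 rather than identical(k, 1), because identical() would return FALSE for an integer 1L.

```r
# Vectorised M-spline via the recursion from the corrected line:
# M(i,k)(x) = k * ((x - t[i]) M(i,k-1)(x) + (t[i+k] - x) M(i+1,k-1)(x)) / ((k - 1) (t[i+k] - t[i]))
mspline_vec <- function(x, i, k, t) {
  if (k == 1) {
    td <- t[i + 1] - t[i]
    if (td <= 0) return(numeric(length(x)))   # empty interval: M is 0
    return(as.numeric(x >= t[i] & x < t[i + 1]) / td)
  }
  den <- (k - 1) * (t[i + k] - t[i])
  if (den <= 0) return(numeric(length(x)))
  m0 <- mspline_vec(x, i, k - 1, t)
  m1 <- mspline_vec(x, i + 1, k - 1, t)
  k * ((x - t[i]) * m0 + (t[i + k] - x) * m1) / den
}

Q <- c(0, 0, 0, 0.3, 0.5, 0.6, 1, 1, 1)
X <- seq(0, 1, 0.1)
mspline_vec(X, i = 3, k = 3, t = Q)          # the whole vector in one call

# M-splines integrate to 1 over their support
integrate(function(x) mspline_vec(x, 3, 3, Q), 0, 1)$value
```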
Re: [R] Help with error on function: Error in .... attempt to apply non-function
Dear Duncan,

this is a reproducible example: it is the function copied straight from my Eclipse editor. I found the mistake (thanks to Peter).

On Tuesday 15 September 2009 11:15:29 Duncan Murdoch wrote:
> Corrado wrote:
> > Dear R gurus,
> >
> > I wrote this function
> >
> > http://scsys.co.uk:8002/33852?ln=on&store=on&submit=Format+it!
> >
> > for a small package I am preparing.
> >
> > Whenever I run the function I get the error
> >
> > Error in Mspline(i = i, x = x, degree = kk, t = t) : attempt to apply non-function
> >
> > Could anyone point out what I am doing wrong?
>
> It would be a lot easier to do so if you gave us a reproducible example. But the usual cause for that is using () instead of [], or forgetting an operator. I think you've done the second: you have (k-1)(t[i+k]-t[i]) where you should have (k-1)*(t[i+k]-t[i]).
>
> Duncan Murdoch

-- Corrado Topi
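Duncan's diagnosis is easy to reproduce: without the *, R parses (k-1)(...) as a call whose "function" is the number k - 1. A minimal sketch with made-up values:

```r
k <- 3
t <- c(0, 0.5, 1)
i <- 1

# Missing "*": R evaluates (k - 1) to 2 and then tries to call 2 as a function
msg <- tryCatch((k - 1)(t[i + 2] - t[i]),
                error = function(e) conditionMessage(e))
msg                            # "attempt to apply non-function"

# With the multiplication operator the expression is valid
(k - 1) * (t[i + 2] - t[i])    # 2 * 1 = 2
```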
[R] Help with error on function: Error in .... attempt to apply non-function
Dear R gurus,

I wrote this function

http://scsys.co.uk:8002/33852?ln=on&store=on&submit=Format+it!

for a small package I am preparing. Whenever I run the function I get the error

Error in Mspline(i = i, x = x, degree = kk, t = t) : attempt to apply non-function

Could anyone point out what I am doing wrong?

Kubuntu 9.04 64 bit, R 2.9.2

Regards

-- Corrado Topi
[R] AIC and goodness of prediction - was: Re: goodness of "prediction" using a model (lm, glm, gam, brt,
Dear Kingsford,

I apologise for breaking the thread, but I thought there were some more people who would be interested.

What you propose is what I am using at the moment: the sum of the squares of the residuals, plus variance / standard deviation. I am not really satisfied. I have also tried using R^2, and it works well, but some people go a bit wild-eyed when they see a negative R^2 (which is perfectly reasonable when you use R^2 as a measure of goodness of fit of predictions on a dataset different from the training set).

I was then wondering whether it would make sense to use AIC: the K in the formula would still be the number of parameters of the trained model, the "sum of squared residuals" would be the sum of (predicted - observed)^2, and N would be the number of samples in the test dataset. I think it should work well. What do you / other R list members think?

Regards

On Thursday 03 September 2009 15:06:14 Kingsford Jones wrote:
> There are many ways to measure prediction quality, and what you choose depends on the data and your goals. A common measure for a quantitative response is mean squared error (i.e. 1/n * sum((observed - predicted)^2)) which incorporates bias and variance. Common terms for what you are looking for are "test error" and "generalization error".
>
> hth,
> Kingsford
>
> On Wed, Sep 2, 2009 at 11:56 PM, Corrado wrote:
> > Dear R-friends,
> >
> > How do you test the goodness of prediction of a model, when you predict on a set of data DIFFERENT from the training set?
> >
> > Let me explain: you train your model M (e.g. glm, gam, regression tree, brt) on a set of data A with a response variable Y. You then predict the value of that same response variable Y on a different set of data B (e.g. predict.glm, predict.gam and so on). Dataset A and dataset B are different in the sense that they contain the same variable, for example temperature, measured at different sites, or over a different interval (e.g. B is a subinterval of A for interpolation, or a different interval for extrapolation). If you have the measured values for Y on the new interval, i.e. B, how do you measure how good the prediction is, that is, how well the model fits Y on B (that is, how well does it predict)?
> >
> > In other words:
> >
> > Y~T,data=A for training
> > Y~T,data=B for predicting
> >
> > I have devised a couple of methods based around 1) standard deviation 2) R^2, but I am unhappy with them.
> >
> > Regards

-- Corrado Topi
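A sketch of the quantities discussed in this thread, computed on a hypothetical test set B that lies outside the training interval of A. The simulated data are assumptions, and aic_test applies the formula proposed in the post to test-set residuals; it is not a standard AIC.

```r
# Hypothetical training set A and a test set B on a different interval of T
set.seed(21)
A <- data.frame(T = runif(200, 0, 20))
A$Y <- 2 + 0.5 * A$T + rnorm(200)
B <- data.frame(T = runif(100, 20, 30))
B$Y <- 2 + 0.5 * B$T + 0.05 * (B$T - 20)^2 + rnorm(100)   # curvature unseen in A

fit  <- lm(Y ~ T, data = A)
pred <- predict(fit, newdata = B)
res  <- B$Y - pred

mse <- mean(res^2)                                 # test (generalisation) error
r2  <- 1 - sum(res^2) / sum((B$Y - mean(B$Y))^2)   # out-of-sample R^2, can be < 0
K   <- length(coef(fit))
N   <- nrow(B)
aic_test <- 2 * K + N * log(sum(res^2) / N)        # the test-set "AIC" from the post

c(mse = mse, r2 = r2, aic_test = aic_test)
```

Because the out-of-sample R^2 compares against the mean of B rather than of A, nothing prevents it from going below zero when the model extrapolates badly.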
Re: [R] Negative AIC
I think the problem is trying to compare different models trained on the same dataset.

1) If I compare, for example, gam (from the gam package) with and without an intercept, is that a valid comparison? For example, the model with an intercept has 24% explained deviance and an AIC of -2217146, while the model without an intercept has 85.5% explained deviance and an AIC of 217488.1. The results sound incredibly strange, but there is actually no difference between the models except the removal of the intercept :(. So which model is "better" at fitting the data?

2) If I compare, for example, gam from the gam package with, let's say, gam from mgcv (using tpsp), I get two very similar AIC values, but are they comparable?
gam from the mgcv package: -2195000
gam from the gam package: -2217000

3) I would like to compare those AICs to the AIC obtained by running BRT on the same dataset. I was thinking of simply recalculating the AIC manually using the formula

AIC = 2K + N*log(RSS/N)

where K is the number of parameters of the regression (i.e. the coefficients that are not zero, I would think) and N is the number of samples. What do you think? Would that be reasonable?

Regards

On Thursday 10 September 2009 16:39:32 Ben Bolker wrote:
> If all the models are fitted to the same data set, using the same
> modeling tools (you have to be careful e.g. comparing lmer models to
> glm models, because they use different additive constants), and
> everything seems to make sense (!!!), then yes. I would be a little
> surprised, and think that something was wrong, if you have some AIC
> values that are on the order of -20,000 (as below) and others that are
> +20,000 ...
>
> Ben Bolker
>
> Corrado wrote:
> > My worry is: can I compare negative AIC with positive AIC? does the
> > comparison still hold?
> >
> > On Thursday 10 September 2009 15:57:01 Ben Bolker wrote:
> >> Corrado-5 wrote:
> >>> Dear R list,
> >>>
> >>> I just obtained a negative AIC for two models (-221.7E+4
> >>> and -230.2E+4). Is that normal?
> >>
> >> It's not necessarily wrong. See <http://emdbolker.wikidot.com/faq>

--
Corrado Topi

Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: +44 (0) 1904 328645, E-mail: ct...@york.ac.uk
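A minimal sketch, assuming Gaussian least-squares fits, of how the hand-computed AIC = 2K + N*log(RSS/N) relates to R's AIC(). For an lm() fit the two differ by the constant N*(log(2*pi) + 1) + 2 (the extra 2 appears because AIC() also counts the residual variance as a parameter). That constant cancels in differences between models fitted to the same N observations, but not when a value from one formula is compared with a value from the other. The simulated data and the helper manual_aic() are invented for illustration; for penalized fits such as gam, or for BRT, counting non-zero coefficients is not the same as counting effective degrees of freedom, so this sketch does not settle that part of the question.

```r
# Hand-computed AIC for a least-squares fit: 2K + N * log(RSS / N)
manual_aic <- function(fit) {
  res <- residuals(fit)
  n   <- length(res)
  k   <- length(coef(fit))              # estimated regression coefficients only
  2 * k + n * log(sum(res^2) / n)
}

set.seed(1)
n <- 200
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.3)

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ poly(x, 3))

# Absolute values differ by a constant that depends only on n ...
AIC(fit1) - manual_aic(fit1)            # n * (log(2 * pi) + 1) + 2
AIC(fit2) - manual_aic(fit2)            # the same constant
# ... so differences between models agree
c(manual  = manual_aic(fit2) - manual_aic(fit1),
  builtin = AIC(fit2) - AIC(fit1))
```

In other words, a hand-computed value is only comparable with other hand-computed values on the same data.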
Re: [R] Negative AIC
My worry is: can I compare a negative AIC with a positive AIC? Does the comparison still hold?

On Thursday 10 September 2009 15:57:01 Ben Bolker wrote:
> Corrado-5 wrote:
> > Dear R list,
> >
> > I just obtained a negative AIC for two models (-221.7E+4
> > and -230.2E+4). Is that normal?
>
> It's not necessarily wrong. See <http://emdbolker.wikidot.com/faq>
[R] Intercept=0 in gam from gam package
Dear R list,

Is it possible to force the intercept to be 0 (that is, to fit no intercept) in gam from the gam package?

Regards
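In R's formula language the intercept is dropped with `- 1` or `+ 0`. A minimal sketch with lm() on simulated data; that gam() from the gam package accepts the same formula syntax is an assumption here, so it is worth checking the names of the fitted coefficients. The sketch also shows, for lm() at least, why explained variance can jump when the intercept is removed: R^2 is then measured against the model y = 0 rather than y = mean(y), so the two figures are not comparable even though the residual sum of squares gets worse.

```r
set.seed(2)
n <- 100
x <- runif(n, 0, 10)
y <- 50 + 0.2 * x + rnorm(n)            # large mean, weak slope

with_int <- lm(y ~ x)
no_int1  <- lm(y ~ x - 1)               # "- 1" drops the intercept
no_int2  <- lm(y ~ 0 + x)               # "+ 0" is equivalent

names(coef(with_int))                   # "(Intercept)" "x"
names(coef(no_int1))                    # "x"

# Without an intercept, R^2 is computed around 0 instead of mean(y):
# it goes up here even though the residual sum of squares is far larger
c(r2_with = summary(with_int)$r.squared, r2_without = summary(no_int1)$r.squared)
c(rss_with = deviance(with_int), rss_without = deviance(no_int1))
```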
[R] Negative AIC
Dear R list,

I just obtained a negative AIC for two models (-221.7E+4 and -230.2E+4). Is that normal?

Regards
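A sketch of why a negative AIC is not a problem in itself: AIC = -2 log L + 2K, and for a continuous response log L is built from probability densities, which can be larger than 1. Rescaling the response shifts the AIC of every model by the same constant, so only differences between models fitted to the same response carry information. The data are simulated, and lm() stands in for any likelihood-based fit.

```r
set.seed(3)
n <- 50
x <- runif(n)
y_small <- 0.001 * x + rnorm(n, sd = 0.0005)   # response in very small units
y_big   <- 1000 * y_small                       # the same data, rescaled

fit_small <- lm(y_small ~ x)
fit_big   <- lm(y_big ~ x)

AIC(fit_small)                 # negative: the Gaussian densities here are far above 1
AIC(fit_big)                   # positive
# The shift is the constant 2 * n * log(1000) for any model of y_big vs y_small,
# so comparisons between models fitted to the same response are unaffected
AIC(fit_big) - AIC(fit_small)
```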
[R] Matrix regression
Dear friends,

I would like to solve the following regression problem:

y = c1*x1 + c2*x2 + ... + cn*xn

where y and the xi are all matrices and the ci are constants to be determined. The y and xi are (symmetric) distance matrices, and the ci should be constrained to be non-negative (positive or zero). Any suggestions? I will be more than happy to share the results of my quest with the list or the developers.

Regards
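One way to set this up (a sketch, not the only approach): because y and the xi are symmetric distance matrices, each can be reduced to its lower triangle (as.vector() of a dist object does exactly that), which turns the problem into a regression without intercept on n(n-1)/2 rows. The non-negativity constraint can then be imposed with the box-constrained "L-BFGS-B" method of optim(). The data are simulated and fit_nonneg() is a hypothetical helper. It gives point estimates only: entries of a distance matrix are not independent, so ordinary regression standard errors do not apply, and permutation (Mantel-type) tests are the usual route to inference.

```r
# Non-negative least squares on the lower triangles of distance matrices
fit_nonneg <- function(y, xs) {
  yv <- as.vector(y)                         # lower triangle of a dist object
  X  <- sapply(xs, as.vector)                # one column per predictor matrix
  sse  <- function(b) sum((yv - X %*% b)^2)
  grad <- function(b) as.vector(-2 * crossprod(X, yv - X %*% b))
  res <- optim(rep(1, ncol(X)), sse, grad, method = "L-BFGS-B", lower = 0)
  setNames(res$par, names(xs))
}

set.seed(4)
x1 <- dist(matrix(runif(40), ncol = 2))      # 20 hypothetical sites
x2 <- dist(matrix(runif(20), ncol = 1))
x3 <- dist(matrix(runif(60), ncol = 3))
y  <- 2 * x1 + 0.5 * x3                      # true coefficients (2, 0, 0.5)

cf <- fit_nonneg(y, list(x1 = x1, x2 = x2, x3 = x3))
round(cf, 3)
```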
[R] goodness of "prediction" using a model (lm, glm, gam, brt, regression tree .... )
Dear R-friends,

How do you test the goodness of prediction of a model when you predict on a set of data DIFFERENT from the training set?

Let me explain: you train your model M (e.g. glm, gam, regression tree, BRT) on a dataset A with a response variable Y. You then predict the value of that same response variable Y on a different dataset B (e.g. with predict.glm, predict.gam and so on). Datasets A and B are different in the sense that they contain the same variable, for example temperature, measured at different sites or over a different interval (e.g. B is a subinterval of A for interpolation, or a different interval for extrapolation). If you have the measured values of Y on the new interval, i.e. B, how do you measure how good the prediction is, that is, how well the model fits Y on B (how well it predicts)?

In other words:
Y~T, data=A for training
Y~T, data=B for predicting

I have devised a couple of methods based on 1) the standard deviation and 2) R^2, but I am unhappy with them.

Regards
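A sketch of the usual hold-out summaries, assuming a numeric response: root mean squared error and mean absolute error of the predictions on B, plus a prediction R^2 of the form 1 - SSE/SST. The choice of baseline in SST matters; here it is the mean of Y in the training set A, so a negative value would mean the model predicts B worse than simply using A's mean. The simulated temperature data and the helper pred_metrics() are assumptions, and lm() stands in for any model with a predict() method.

```r
# Hold-out accuracy of predictions on a dataset B not used for training
pred_metrics <- function(obs, pred, train_mean) {
  err <- obs - pred
  c(rmse    = sqrt(mean(err^2)),
    mae     = mean(abs(err)),
    # 1 - SSE/SST with SST around the training mean: < 0 means the model
    # predicts B worse than simply using the mean of Y in A
    r2_pred = 1 - sum(err^2) / sum((obs - train_mean)^2))
}

set.seed(5)
A <- data.frame(T = runif(200, 0, 20))
A$Y <- 3 + 0.4 * A$T + rnorm(200)
B <- data.frame(T = runif(100, 20, 25))        # an interval outside A (extrapolation)
B$Y <- 3 + 0.4 * B$T + rnorm(100)

M  <- lm(Y ~ T, data = A)                      # train on A
pB <- predict(M, newdata = B)                  # predict on B
m  <- pred_metrics(B$Y, pB, train_mean = mean(A$Y))
round(m, 3)
```

RMSE penalizes large errors more heavily than MAE; reporting both shows whether a few large misses dominate.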
Re: [R] Strange error returned or bug in gam in mgcv????
Dear Gavin, Simon,

here is the result of str:

> str(dist_scot24_vector_with_climate)
'data.frame': 2265025 obs. of 14 variables:
 $ X : int 1 2 3 4 5 6 7 8 9 10 ...
 $ tetrad_i: Factor w/ 1505 levels "HP61A","HP61I",..: 1505 1504 1503 1502 1501 1500 1499 1498 1497 1496 ...
 $ tetrad_j: Factor w/ 1505 levels "HP61A","HP61I",..: 1505 1505 1505 1505 1505 1505 1505 1505 1505 1505 ...
 $ bray: num 0 0.566 0.251 0.407 0.45 ...
 $ PC1 : num -3.97 -3.14 -7.27 -5.77 -5.88 ...
 $ PC2 : num 3.26 2.87 3.19 2.96 2.97 ...
 $ PC3 : num -0.16511 -0.28601 -0.00362 -0.11685 -0.09695 ...
 $ PC4 : num -0.629 -0.696 -0.6 -0.683 -0.639 ...
 $ PC5 : num 0.2603 0.3818 -0.0148 0.0967 0.094 ...
 $ PC6 : num -3.97 -3.97 -3.97 -3.97 -3.97 ...
 $ PC7 : num 3.26 3.26 3.26 3.26 3.26 ...
 $ PC8 : num -0.165 -0.165 -0.165 -0.165 -0.165 ...
 $ PC9 : num -0.629 -0.629 -0.629 -0.629 -0.629 ...
 $ PC10 : num 0.26 0.26 0.26 0.26 0.26 ...

It looks OK to me. What do you think?

On Tuesday 01 September 2009 18:43:24 Gavin Simpson wrote:
> On Tue, 2009-09-01 at 17:55 +0100, Corrado wrote:
> > Dear Simon,
> >
> > I have stored all information at the link:
> >
> > http://scsys.co.uk:8002/33309?hl=on&submit=Format+it!
>
> You could have included that in your mail to the list - it is just plain
> text after all.
>
> > I have the same problem if I do
> > s(PC1) + ... + s(PC10) or
> > s(Pc1,PC2,PC3,PC4,PC5)+s(PC6,PC7,PC8,PC9,PC10) or
> > s(PC1,PC2,PC3,PC6,PC7,PC8) .
> >
> > I have renamed PC1.1,PC2.1,PC3.1,PC4.1,PC5.1 to PC6,PC7,PC8,PC9,PC10 for
> > simplicity.
>
> What does
>
> str(dist_scot24_vector_with_climate)
>
> show? I seem to recall getting similar errors when I'd done something
> silly in a data prep routine and had data in a data frame that wasn't
> numeric but looked like it was - a factor for example.
>
> If you can't do some quite simple things like the first of your three
> alternatives above, that suggests something amiss with the data. That'd
> be the first thing to check.
>
> HTH
>
> G
>
> > Regards
> >
> > On Tuesday 01 September 2009 17:31:04 Simon Wood wrote:
> > > The basic problem is that you have requested a 10 dimensional thin
> > > plate spline, with a basis dimension of 196830. In reality it will not
> > > be possible to compute this, even if you have more than 196830 data. In
> > > any case it would be unlikely to provide a very useful model --- the
> > > "simplest" function that it can theoretically represent will have 3003
> > > degrees of freedom.
> > >
> > > That said the error message is obviously rather unhelpful... Can you
> > > tell me how many data you are actually trying to fit, and I'll try and
> > > track down exactly where it's failing, and put in a more informative
> > > message.
> > >
> > > best,
> > > Simon
> > >
> > > On Tuesday 01 September 2009 14:51, Corrado wrote:
> > > > Dear friends,
> > > >
> > > > what is this error message in gam I cannot understand what it
> > > > means is it a bug?
> > > >
> > > > gam_bray_scot24_pc_0505
> > > PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)
> > > >
> > > > Error in if (length(data) != vl) { :
> > > >   missing value where TRUE/FALSE needed
> > > > Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec ->
> > > > array In addition: Warning message:
> > > > In array(0, n * k) : NAs introduced by coercion
> > > > Execution halted
> > > >
> > > > Thanks in advance,
> > > >
> > > > Best regards
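Gavin's suggested check can be scripted. A minimal sketch on a made-up data frame (the values are invented for illustration): list the model covariates that are not numeric, and note the classic pitfall when turning a factor back into numbers.

```r
# Made-up example: a numeric-looking covariate stored as a factor
df <- data.frame(PC1 = c(-3.97, -3.14, -7.27),
                 PC2 = factor(c("3.26", "2.87", "3.19")))

# Which of the model covariates are not numeric?
vars   <- c("PC1", "PC2")
is_num <- vapply(df[vars], is.numeric, logical(1))
names(is_num)[!is_num]                       # "PC2"

# Pitfall: as.numeric() on a factor returns the level codes, not the values
as.numeric(df$PC2)                           # 3 1 2
as.numeric(as.character(df$PC2))             # 3.26 2.87 3.19
```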
Re: [R] Strange error returned or bug in gam in mgcv????
Dear Simon,

I have stored all the information at this link: http://scsys.co.uk:8002/33309?hl=on&submit=Format+it!

I have the same problem if I use
s(PC1) + ... + s(PC10) or
s(PC1,PC2,PC3,PC4,PC5)+s(PC6,PC7,PC8,PC9,PC10) or
s(PC1,PC2,PC3,PC6,PC7,PC8).

I have renamed PC1.1,PC2.1,PC3.1,PC4.1,PC5.1 to PC6,PC7,PC8,PC9,PC10 for simplicity.

Regards

On Tuesday 01 September 2009 17:31:04 Simon Wood wrote:
> The basic problem is that you have requested a 10 dimensional thin plate
> spline, with a basis dimension of 196830. In reality it will not be
> possible to compute this, even if you have more than 196830 data. In any
> case it would be unlikely to provide a very useful model --- the "simplest"
> function that it can theoretically represent will have 3003 degrees of
> freedom.
>
> That said the error message is obviously rather unhelpful... Can you tell
> me how many data you are actually trying to fit, and I'll try and track
> down exactly where it's failing, and put in a more informative message.
>
> best,
> Simon
>
> On Tuesday 01 September 2009 14:51, Corrado wrote:
> > Dear friends,
> >
> > what is this error message in gam I cannot understand what it means
> > is it a bug?
> >
> > gam_bray_scot24_pc_0505
> PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)
> >
> > Error in if (length(data) != vl) { :
> >   missing value where TRUE/FALSE needed
> > Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec ->
> > array In addition: Warning message:
> > In array(0, n * k) : NAs introduced by coercion
> > Execution halted
> >
> > Thanks in advance,
> >
> > Best regards
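Simon's figure of 3003 can be reproduced from the standard thin plate spline result: a penalty of order m in d dimensions leaves the polynomials of degree less than m unpenalized, and there are choose(m + d - 1, d) of them. The rule "smallest m with 2m > d + 1" is, to my understanding, mgcv's default for "tp" smooths (see the help page for smooth.construct.tp.smooth.spec); treat it as an assumption. The sketch shows how fast this unpenalized part grows with d, including for the 6-dimensional alternative tried above; tps_null_space() is a hypothetical helper.

```r
# Null-space dimension M of a thin plate spline in d covariates, with the
# penalty order m taken as the smallest integer satisfying 2m > d + 1
tps_null_space <- function(d) {
  m <- floor((d + 1) / 2) + 1
  c(d = d, m = m, M = choose(m + d - 1, d))
}

# One row per dimension; for d = 10 this gives m = 6 and M = 3003
t(sapply(c(1, 2, 3, 5, 6, 10), tps_null_space))
```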
Re: [R] Strange error returned or bug in gam in mgcv???? - yet more additional information
I am using mgcv 1.4-1.1 on 64-bit Fedora 9, on an Opteron server with 8 GB of RAM.

On Tuesday 01 September 2009 15:19:28 Corrado wrote:
> Here I pasted the code from when I opened the R shell, so that it is possible
> to see what is going on:
>
> http://scsys.co.uk:8002/33309?hl=on&submit=Format+it!
>
> Thanks in advance
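One hedged reading of the warning "In array(0, n * k) : NAs introduced by coercion" (not confirmed in the thread): the model matrix for a smooth has n rows and k columns, and in R versions before 3.0.0 no single vector could have more than 2^31 - 1 elements, whatever the RAM or the 64-bit build. With the number of rows shown by str() above and the basis dimension Simon quoted, n * k is far beyond that limit, and converting it to an integer length gives NA.

```r
n <- 2265025                        # rows reported by str() in this thread
k <- 196830                         # basis dimension quoted by Simon Wood
int_max <- .Machine$integer.max     # 2147483647

n * k > int_max                     # TRUE: an n x k matrix exceeds the limit
suppressWarnings(as.integer(n * k)) # NA, as in the array() warning

# Largest basis dimension whose n x k model matrix would still fit
floor(int_max / n)                  # 948
```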
Re: [R] Strange error returned or bug in gam in mgcv???? - additional information
Here I pasted the code from when I opened the R shell, so that it is possible to see what is going on:

http://scsys.co.uk:8002/33309?hl=on&submit=Format+it!

Thanks in advance
Re: [R] Strange error returned or bug in gam in mgcv????
Nope. Of course, it was just a copy-and-paste problem.

On Tuesday 01 September 2009 15:00:34 David Winsemius wrote:
> On Sep 1, 2009, at 9:51 AM, Corrado wrote:
> > Dear friends,
> >
> > what is this error message in gam I cannot understand what it
> > means
> > is it a bug?
> >
> > gam_bray_scot24_pc_0505
> PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)
>
> If the code was as posted, you have entered "<" where you probably
> wanted "<-".
>
> > Error in if (length(data) != vl) { :
> >   missing value where TRUE/FALSE needed
> > Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec ->
> > array
> >
> > In addition: Warning message:
> > In array(0, n * k) : NAs introduced by coercion
> > Execution halted
> >
> > Thanks in advance,
> >
> > Best regards
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
[R] Strange error returned or bug in gam in mgcv????
Dear friends,

what is this error message in gam? I cannot understand what it means. Is it a bug?

gam_bray_scot24_pc_0505 [...]
PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)

Error in if (length(data) != vl) { :
  missing value where TRUE/FALSE needed
Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec -> array
In addition: Warning message:
In array(0, n * k) : NAs introduced by coercion
Execution halted

Thanks in advance,

Best regards
Re: [R] Google's R Style Guide
Thanks Duncan, Spencer,

To clarify, the situation is:
1) I have no reason to choose S3 over S4 or vice versa, or any particular coding convention.
2) Our group has not done any OO development in R and I would be the first, so I can set the standards.
3) I am starting a new package from scratch, so I do not have any code I need to reuse.
4) I am an R OO newbie, so anything that teaches me from the beginning what is better would be good for me.

So there are two questions:
1) What coding style guide should we / I follow? Is the Google style guide good, or is there something better / more prescriptive that would make our research group's life easier?
2) What class system should I use? From what you two say, I should use S3 because it is easier to use. What are the disadvantages? Is there a table of advantages / disadvantages of S3 and S4 classes?

Thanks
Re: [R] Google's R Style Guide
I do not understand why one should prefer an S3 class to an S4 class if S4 is more rigorous. (The premise is that I am a newbie to OO programming in R and would like to understand the "proper" way to do OO programming in R.)

Regards

On Saturday 29 August 2009 16:23:39 hadley wickham wrote:
> >> An opening curly brace should never go on its own line;
> >
> > I tend to do this:
> >
> > f <- function()
> > {
> >   if (TRUE)
> >   {
> >     cat("TRUE!!\n")
> >   } else {
> >     cat("FALSE!!\n")
> >   }
> > }
> >
> > (I don't usually put one-liners in if/else blocks; here I would have
> > used ifelse)
> >
> > I haven't seen many others format code in this way. Is there an
> > objective reason for this (such as the rule for the trailing "}") or
> > is this just aesthetics?
>
> It's probably just aesthetics. I don't like it because it increases
> the number of lines without much real benefit - indenting already
> gives you all the hints you need.
>
> Hadley
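A small comparison of the trade-off being discussed, with class and function names invented for illustration: an S3 object is just a list with a class attribute, so it is quick to write but accepts anything, whereas an S4 class declares typed slots and a validity check that new() enforces. Which is preferable depends on how much of that rigour a package needs; this is a sketch, not a recommendation from the thread.

```r
library(methods)

# S3: a list with a class attribute; no structural checks unless you write them
new_site_s3 <- function(name, richness) {
  structure(list(name = name, richness = richness), class = "site_s3")
}
print.site_s3 <- function(x, ...) {
  cat("S3 site", x$name, "with richness", x$richness, "\n")
  invisible(x)
}

# S4: typed slots, and a validity function that new() runs automatically
setClass("SiteS4",
         slots = c(name = "character", richness = "numeric"),
         validity = function(object) {
           if (any(object@richness < 0)) "richness must be non-negative" else TRUE
         })
setMethod("show", "SiteS4", function(object) {
  cat("S4 site", object@name, "with richness", object@richness, "\n")
})

s3 <- new_site_s3("HP61A", -1)      # accepted silently
s4 <- new("SiteS4", name = "HP61A", richness = 12)
s3
s4

bad <- tryCatch(new("SiteS4", name = "HP61A", richness = -1),
                error = function(e) conditionMessage(e))
bad                                 # the validity error message
```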
Re: [R] Best R text editors?
I am using Eclipse 3.4.2, not 3.5, so I would not know. But it is worth visiting the StatET mailing list archive and subscribing to the mailing list.

On Friday 28 August 2009 09:22:14 [Ricardo Rodriguez] Your XEN ICT Team wrote:
> Hi!
>
> Corrado wrote:
> > Eclipse + StatET (the R plugin) both on Linux and Windows
>
> Please, does it work with Eclipse 3.5 Galileo on a Mac OS X (10.5.8) box?
>
> Thanks!
Re: [R] Best R text editors?
Eclipse + StatET (the R plugin), on both Linux and Windows.

On Thursday 27 August 2009 20:43:41 Jonathan Greenberg wrote:
> Quick informal poll: what is everyone's favorite text editor for working
> with R? I'd like to hear from people who are using editors that have
> some level of direct R interface (e.g. Tinn-R, Komodo+SciViews). Thanks!
>
> --j
Re: [R] [Rd] Formulas in gam function of mgcv package
Dear Simon,

thanks again.

Concerning the whole set of 36 variables: I have run a principal components analysis, and I am only using some of the components (I am running tests with the components that cover 95% and then 99% of the variance) :), so I will probably end up with s(x1, ..., x8).

I wonder whether using isotropic smoothers on principal components is a good idea. The variance diminishes from component to component, so in theory the wiggliness of the smoother should also decrease. What do you think? Am I saying something stupid? If that is the case, and I want to include some interactions, do I then have to include the interaction terms manually, like s(x1,x2)? Is that right?

Sorry for the avalanche of questions, but I am trying to understand the principles underlying the working of gam in mgcv. It looks very powerful, particularly for exploring dependencies.

I have run te() instead of s(), but the predictive power seems to be lower than with s() in this particular situation. At the same time, does te() include the interactions? I did not understand your previous point on interaction terms in te() well: is te(x1, ..., xn) built as an expansion from te(x1), ..., te(xn)? Then all the interaction terms should be included...

Finally, is it possible to include both s() and te() terms in the formula?

Machine learning: I am not very well versed in the area. Did you mean regression trees or maximum entropy models?

Best,

On Wednesday 26 August 2009 10:27:08 Simon Wood wrote:
> This will not work...
>
> > 2) y~s(x1, ..., x36)
>
> Estimating a 36 dimensional function reasonably well would require a
> tremendous quantity of data, but in any case the 36 dimensional TPS
> smoothness measure will involve such high order derivatives that it will no
> longer be practically useful: in fact you will not have enough data to
> estimate the unpenalized coefficients of the smoother (and if you did R
> would run out of memory first).
>
> In such a high dimensional situation, I think that GAMs are really only
> useful if you have some prior knowledge of which variables are likely to
> interact (and it's not too many of them). If there's no prior information
> saying roughly what sort of smooth additive structure might be useful then,
> I'm not sure that GAMs are the right way to go, and some sort of machine
> learning approach might be better.
>
> Then again, the real problem with
> y~s(x1, ..., x36)
> is that the data just won't contain enough information to estimate s, if
> all you can say is that s is smooth, but this also means that it's very
> unlikely that you really need to estimate s(x1, ..., x36) in order to
> predict well. In that case, starting from
> y ~ s(x1) + ... + s(x36)
> and building the model up might result in something that does a reasonable
> predictive job.
>
> On the subject of tensor product smoothing vs isotropic smoothing.
> Isotropic smooths are really only reasonable if you think that the smooth
> should display approximately the same amount of wiggliness in all
> directions. If this is not the case then tensor product smoothing is a
> better bet. Centering and scaling alone is not enough to ensure that
> isotropy is reasonable (although in particular cases it may help, of
> course).
>
> best,
> Simon
>
> > I am trying to build a predictive model. Since the the variables are
> > centred and scaled, I think I need an isotropic smooth. I am also
> > interested in having the interactions between the variables included,
> > that is not a purely additive model.
> >
> > It is not clear to me when should I give preference to tensor smooths,
> > possibly because I have not understood well how they work.
> >
> > I am reading Wood(2003) as recommended and I have also read rather
> > extensively Simon N. Wood. Generalized Additive Models: An Introduction,
> > 2006, but still I am stuck. Any additional suggestion or reading
> > recommendation would be greatly appreciated.
> >
> > I have also some difficulties in understanding the values you have chosen
> > for k in the first example (why 60?).
> >
> > Thanks
> >
> > Best,
> >
> > On Monday 24 August 2009 17:33:55 Gavin Simpson wrote:
> > > [Note R-Devel is the wrong list for such questions. R-Help is where
> > > this should have been directed - redirected there now]
> > >
> > > On Mon, 2009-08-24 at 17:02 +0100, Corrado wrote:
> > > > Dear R-experts,
> > > >
> > > > I have a question on the formulas used in the gam function of the
> > > > mgcv package.
> > > >
> > > > I am trying to understand the relationships between:
> > > >
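A sketch related to isotropic smoothing of principal components, on simulated covariates: choosing how many components cover 95% or 99% of the variance, and checking the scale of the scores. Even when the original variables are centred and scaled, the component scores have standard deviations equal to pca$sdev, which decrease from one component to the next, so the scores are not on a common scale. That is in line with Simon's point that centring and scaling alone do not make isotropy reasonable; whether to rescale the scores before an isotropic s() is a modelling choice this sketch does not settle.

```r
set.seed(6)
# 500 observations of 10 correlated covariates (simulated)
X   <- matrix(rnorm(500 * 10), ncol = 10) %*% matrix(runif(100), nrow = 10)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

cum <- cumsum(pca$sdev^2) / sum(pca$sdev^2)   # cumulative proportion of variance
k95 <- which(cum >= 0.95)[1]                  # fewest components reaching 95%
k99 <- which(cum >= 0.99)[1]
c(k95 = k95, k99 = k99)

# Scores are not on a common scale, although X itself was centred and scaled:
# their standard deviations equal pca$sdev, which decrease by construction
round(apply(pca$x[, seq_len(k95), drop = FALSE], 2, sd), 2)
round(pca$sdev[seq_len(k95)], 2)
```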
Re: [R] [Rd] Formulas in gam function of mgcv package
Dear Simon,

thanks for your answer. I am running the model with both s and te smoothing, to compare. A few questions on your email:

1) Isotropic smoothness: my variables are centred and scaled. I assumed an isotropic smoother (that is, a smoother that treats all the variables in the same way) was appropriate. What do you think? Is my understanding of isotropic smoothing wrong?

2) s(x1, ..., xn): it does not contain (1), but I thought that it did improve on (1) by being free to include some interaction, albeit not explicitly. Is my interpretation wrong?

3) te: I am confused! What does it mean that the function space for (4) is built up from the function spaces used in (3)? Does it mean that te(x1, ..., xn) is an expansion on the te(xi), including all the terms te(x1)*te(x2)*...*te(xj)*...*te(xn) of the different orders? For example, in the case of 4 variables, including te(x1)*te(x2), te(x2)*te(x3), te(x1)*te(x2)*te(x3), up to te(x1)*te(x2)*te(x3)*te(x4).

Sorry for being particularly daft.

Regards

On Wednesday 26 August 2009 09:56:13 you wrote:
> > > I am trying to understand the relationships between:
> > >
> > > y~s(x1)+s(x2)+s(x3)+s(x4)
> > >
> > > and
> > >
> > > y~s(x1,x2,x3,x4)
> > >
> > > Does the latter contain the former? what about the smoothers of all
> > > interaction terms?
>
> The first says that you want a model
> E(y) = f_1(x_1) + f_2(x_2) + f_3(x_3) + f_4(x_4)    (1)
> where the f_j are smooth functions. The additive decomposition is quite a
> strong assumption, since it assumes that the effect of x_j is not dependent
> on x_k unless j=k. The second model is just
> E(y) = f(x_1,x_2,x_3,x_4)    (2)
> where f is a smooth function. This looks very general, but actually `s'
> terms assume isotropic smoothness, which is also quite a strong assumption.
>
> Now if I simply state that f and the f_j are `smooth functions', and leave
> it at that, then (2) would of course contain (1), but to actually estimate
> the models I need to state, mathematically, what I mean by `smooth'.
> Once I've done that I've pretty much determined the function spaces in which f
> and the f_j will lie, and in general (2) will no longer strictly contain
> (1). mgcv's `s' terms use a thin plate spline measure of smoothness for
> multivariate smooths, and this means that (1) will not be strictly nested
> within (2), since e.g. a 4D thin plate spline can not generally represent
> exactly what the sum of 4 1D splines can represent.
>
> If you want to achieve exact nesting then using tensor product smooths with
> something like
>
> y~te(x1)+te(x2)+te(x3)+te(x4)    (3)
>
> y~te(x1,x2,x3,x4)    (4)
>
> will do the trick (because the function space for (4) is built up from the
> function spaces used in (3)).
>
> As to where all the 2 and 3 way interactions have gone in (4)... it's just
> like ANOVA - if you put in a 4 way interaction then the lower order
> interactions are not identifiable, unless you choose to add constraints to
> make them so. `mgcv' will allow you add main effects and interactions, and
> will handle the constraints automatically, but if this sort of functional
> ANOVA is a major component of what you want to do, then it is probably
> worth checking out the gss package and Chong Gu's book on smoothing spline
> ANOVA.
>
> best,
> Simon
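In mgcv releases later than the 1.4 series used in this thread, the functional-ANOVA decomposition Simon describes can be written directly with ti() terms. A ti() term excludes the lower-order main effects, so main effects and their interaction can sit in the same model. s(), te() and ti() terms can also be combined in one formula. A sketch on simulated data; method = "REML" and the basis sizes (mgcv defaults) are choices made here, not taken from the thread.

```r
library(mgcv)

set.seed(7)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + 2 * dat$x1 * dat$x2 +
  rnorm(n, sd = 0.2)

# Additive model, in the spirit of (1)
m_add <- gam(y ~ s(x1) + s(x2), data = dat, method = "REML")

# Main effects plus a separate interaction term (functional ANOVA style)
m_anova <- gam(y ~ ti(x1) + ti(x2) + ti(x1, x2), data = dat, method = "REML")

summary(m_anova)$s.table   # one row per term, including the ti(x1,x2) interaction
AIC(m_add, m_anova)
```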
Re: [R] [Rd] Formulas in gam function of mgcv package
Dear Gavin / R-lings,

thanks for your kind answer, and sorry for posting to the dev mailing list.

Concerning the specifics of your answer: I am working with 6 to 36 covariates, and they are all centred and scaled. I represented the problem with two variables to simplify the question. So ideally, the situation is:

1) y ~ s(x1) + ... + s(x36)
vs.
2) y ~ s(x1, ..., x36)

I am trying to build a predictive model. Since the variables are centred and scaled, I think I need an isotropic smooth. I am also interested in including the interactions between the variables, that is, not a purely additive model.

It is not clear to me when I should give preference to tensor product smooths, possibly because I have not understood well how they work.

I am reading Wood (2003), as recommended, and I have also read Simon N. Wood, Generalized Additive Models: An Introduction with R (2006) rather extensively, but I am still stuck. Any additional suggestions or reading recommendations would be greatly appreciated.

I also have some difficulty understanding the values you chose for k in the first example (why 60?).

Thanks

Best,

On Monday 24 August 2009 17:33:55 Gavin Simpson wrote:
> [Note R-Devel is the wrong list for such questions. R-Help is where this
> should have been directed - redirected there now]
>
> On Mon, 2009-08-24 at 17:02 +0100, Corrado wrote:
> > Dear R-experts,
> >
> > I have a question on the formulas used in the gam function of the mgcv
> > package.
> >
> > I am trying to understand the relationships between:
> >
> > y~s(x1)+s(x2)+s(x3)+s(x4)
> >
> > and
> >
> > y~s(x1,x2,x3,x4)
> >
> > Does the latter contain the former? what about the smoothers of all
> > interaction terms?
>
> I'm not 100% certain how this scales to smooths of more than 2
> variables, but Sections 4.10.2 and 5.2.2 of Simon Wood's book GAM: An
> Introduction with R (2006, Chapman Hall/CRC) discuss this for smooths of
> 2 variables.
>
> Strictly y ~ s(x1) + s(x2) is not nested in y ~ s(x1, x2) as the bases
> used to produce the smoothers in the two models may not be the same in
> both models. One option to ensure nestedness is to fit the more
> complicated model as something like this:
>
> ## if simpler model were: y ~ s(x1, k=20) + s(x2, k = 20)
> y ~ s(x1, k=20) + s(x2, k = 20) + s(x1, x2, k = 60)
>                                   ^^^^^^^^^^^^^^^^^
> where the last term (^^^ above) has the same k as used in s(x1, x2)
>
> Note that these are isotropic smooths; are x1 and x2 measured in the
> same units etc.? Tensor product smooths may be more appropriate if not,
> and if we specify the bases when fitting models s(x1) + s(x2) *is*
> strictly nested in te(x1, x2), eg.
>
> y ~ s(x1, bs = "cr", k = 10) + s(x2, bs = "cr", k = 10)
>
> is strictly nested within
>
> y ~ te(x1, x2, k = 10)
> ## is the same as y ~ te(x1, x2, bs = "cr", k = 10)
>
> [Note that bs = "cr" is the default basis in te() smooths, hence we
> don't need to specify it, and k = 10 refers to each individual smooth in
> the te().]
>
> HTH
>
> G
>
> > I have (tried to) read the manual pages of gam, formula.gam,
> > smooth.terms, linear.functional.terms but could not understand properly.
> >
> > Regards
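Writing y ~ s(x1) + ... + s(x36) out by hand is tedious and error-prone; base R's reformulate() can build such formulas from covariate names, which makes it easier to start from the additive model and add joint smooths step by step. A sketch in which the names x1 ... x36 are placeholders and no model is fitted.

```r
vars <- paste0("x", 1:36)

# Additive starting point: y ~ s(x1) + s(x2) + ... + s(x36)
f_add <- reformulate(sprintf("s(%s)", vars), response = "y")

# One isotropic smooth of the first 8 covariates, e.g. retained components
f_iso <- reformulate(sprintf("s(%s)", paste(vars[1:8], collapse = ", ")),
                     response = "y")

length(attr(terms(f_add), "term.labels"))   # 36 separate smooth terms
f_iso
```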
Re: [R] Using very large matrix
Thanks a lot! Unfortunately, the R package I have to use for my research has only been released for 32-bit R on 32-bit MS Windows, and only as closed source. I normally use 64-bit R on 64-bit Linux :). I tried to use bigmemory from CRAN with 32-bit Windows, but I had some serious problems.

Best,

On Thursday 26 February 2009 15:43:11 Jay Emerson wrote:
> Corrado,
>
> Package bigmemory has undergone a major re-engineering and will be
> available soon (available now in Beta version upon request). The version
> currently on CRAN
> is probably of limited use unless you're in Linux.
>
> bigmemory may be useful to you for data management, at the very least,
> where
>
> x <- filebacked.big.matrix(8, 8, init=n, type="double")
>
> would accomplish what you want using filebacking (disk space) to hold
> the object.
> But even this requires 64-bit R (Linux or Mac, or perhaps a Beta
> version of Windows 64-bit
> R that REvolution Computing is working on).
>
> Subsequent operations (e.g. extraction of a small portion for analysis) are
> then easy enough:
>
> y <- x[1,]
>
> would give you the first row of x as an object y in R. Note that x is
> not itself an R matrix,
> and most existing R analytics can't work on x directly (and would max
> out the RAM if they
> tried, anyway).
>
> Feel free to email me for more information (and this invitation
> applies to anyone who is
> interested in this).
>
> Cheers,
>
> Jay
>
> #Dear friends,
> #
> #I have to use a very large matrix. Something of the sort of
> #matrix(8,8,n) where n is something numeric of the sort
> 0.xx
> #
> #I have not found a way of doing it. I keep getting the error
> #
> #Error in matrix(nrow = 8, ncol = 8, 0.2) : too many elements
> specified
> #
> #Any suggestions? I have searched the mailing list, but to no avail.
> # > #Best, > #-- > #Corrado Topi > # > #Global Climate Change & Biodiversity Indicators > #Area 18,Department of Biology > #University of York, York, YO10 5YW, UK > #Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk -- Corrado Topi Global Climate Change & Biodiversity Indicators Area 18,Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Distance between clusters
Dear friends,

I am reformulating the question, as I think I did not formulate it properly.

I have some data on a set of sites, and I can define a dissimilarity between each pair of sites. Using this dissimilarity, I have clustered the sites with the hclust algorithm, method "ward". I then obtained 48 clusters by cutting the tree with cutree, using k=48.

I would now like to estimate the distance between each pair of the 48 resulting clusters. I have read the documentation, but I cannot find a solution. Any clue on how I can do that?

This is a snippet of the code:

distPredTurn <- as.dist(dissimilarityMatrix)
hctr <- hclust(distPredTurn, "ward")
cutree(hctr, k=48)

Regards

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
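One possible answer, sketched under stated assumptions: when only the site-by-site dissimilarity matrix is available, a natural distance between two clusters is the mean dissimilarity over all pairs of sites drawn one from each cluster (the quantity used by average linkage). The helper clusterDistance() and the simulated data are hypothetical; the sketch uses method = "ward.D" because the "ward" option of hclust() was renamed "ward.D" in R 3.1.0, and it cuts into 5 groups instead of 48 only to keep the example small.

```r
# Hypothetical data: 60 sites described by 3 numeric variables
set.seed(42)
sites <- matrix(rnorm(60 * 3), ncol = 3)
dissimilarityMatrix <- as.matrix(dist(sites))

distPredTurn <- as.dist(dissimilarityMatrix)
# method "ward" was renamed "ward.D" in R 3.1.0
hctr <- hclust(distPredTurn, method = "ward.D")
groups <- cutree(hctr, k = 5)   # the post uses k = 48

# Hypothetical helper: mean dissimilarity between all pairs of sites taken
# from two different clusters (average-linkage distance); diagonal stays 0
clusterDistance <- function(D, groups) {
  D <- as.matrix(D)
  labs <- sort(unique(groups))
  out <- matrix(0, length(labs), length(labs),
                dimnames = list(labs, labs))
  for (i in seq_along(labs)) {
    for (j in seq_along(labs)) {
      if (i < j) {
        d <- mean(D[groups == labs[i], groups == labs[j], drop = FALSE])
        out[i, j] <- d
        out[j, i] <- d
      }
    }
  }
  out
}

between <- clusterDistance(dissimilarityMatrix, groups)
round(between, 2)
```

The result can be converted back with as.dist(between) if it needs to be passed to another clustering or ordination function.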
[R] Similarity between clusters generated by hclust + cutree
Dear friends,

I have clustered some objects using the hclust algorithm with method "ward", and then cut the tree into 48 classes with cutree:

distPredTurn <- as.dist(resultMatrix)
hctr <- hclust(distPredTurn, "ward")
cutree(hctr, k=NC)   # NC = 48

I would like to estimate the similarity between each pair of the 48 clusters, for example as (1 - distance or dissimilarity) between their centroids. Any clue on how I can do that?

Regards

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
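If the original object-by-variable data (not just the dissimilarity matrix) are available, cluster centroids can be computed directly. This sketch assumes Euclidean distance on hypothetical simulated data, uses 6 clusters instead of 48 for brevity, and rescales centroid distances to [0, 1] so that 1 - distance behaves as a similarity. With a non-Euclidean dissimilarity, centroids are not generally well defined, and an average between-cluster dissimilarity is the safer choice.

```r
set.seed(7)
# Hypothetical data: 80 objects described by 4 numeric variables
sites <- matrix(runif(80 * 4), ncol = 4)
resultMatrix <- as.matrix(dist(sites))

distPredTurn <- as.dist(resultMatrix)
hctr <- hclust(distPredTurn, method = "ward.D")  # called "ward" before R 3.1.0
NC <- 6                                           # the post uses 48
groups <- cutree(hctr, k = NC)

# Cluster centroids: per-cluster column sums divided by cluster sizes
# (rowsum() and table() both order the groups 1, 2, ..., NC)
centroids <- rowsum(sites, groups) / as.vector(table(groups))

# Euclidean distances between centroids, rescaled to [0, 1]
centroidDist <- as.matrix(dist(centroids))
similarity <- 1 - centroidDist / max(centroidDist)
round(similarity, 2)
```

Dividing by the maximum is only one way to map distances onto [0, 1]; it makes the two most distant centroids have similarity 0, so values are relative to this particular clustering.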
[R] Using very large matrix
Dear friends,

I have to use a very large matrix, something of the sort of matrix(8,8,n), where n is a number of the form 0.xx. I have not found a way of doing it; I keep getting the error

Error in matrix(nrow = 8, ncol = 8, 0.2) : too many elements specified

Any suggestions? I have searched the mailing list, but to no avail.

Best,

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
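The error comes from the size check in matrix(): in R versions of that era (before 3.0.0 introduced long vectors) a matrix could hold at most .Machine$integer.max (2^31 - 1) elements, and each double-precision cell needs 8 bytes. The helper below is hypothetical, and the 50000 x 50000 dimensions are illustrative rather than the ones in the post; it shows how quickly a dense matrix outgrows both limits.

```r
# Hypothetical helper: how big would a dense numeric matrix be?
matrixSize <- function(nrow, ncol) {
  cells <- as.numeric(nrow) * as.numeric(ncol)  # doubles avoid integer overflow
  list(cells = cells,
       gib = cells * 8 / 1024^3,                 # 8 bytes per double
       overInt32Limit = cells > .Machine$integer.max)
}

small <- matrixSize(8, 8)
big   <- matrixSize(50000, 50000)                # illustrative dimensions
str(big)
```

At 50000 x 50000 the matrix needs roughly 18.6 GiB of RAM, far beyond what a 32-bit R session can address, which is why the bigmemory reply above suggests file-backed storage on 64-bit R.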
Re: [R] Principal Component Analysis - Selecting components? + right choice?
Hi,

I have been testing some of the suggested alternative approaches. I accept that the best PC set may not be the best predictor subset, but is it really true that this is generally not the case? If you have to explore data patterns and (potential) relationships between a response variable and a large set of candidate predictors, PCs still seem to be the best candidate for a relatively quick test. I think sometimes you have to make a trade-off against time (for example, computing time): if any pattern emerges from the response vs. the first k PCs, then you investigate further. Am I completely wrong there? What alternative do you have that reduces the computational demands so drastically, for exploratory purposes?

Furthermore, is it really generally not the case that the best PC set, say the top k PCs, contains the best predictor subset in linear regression? Or does that happen only in specific situations (that is, the best PC set is generally a good set of predictors, but in some specific cases it is not)?

Best,

On Thursday 11 December 2008 17:30:51 you wrote:
> Hi,
>
> It is generally not the case that the best PC set, say, the top k PCs
> (where k < p, p being the number of predictors) contain the best predictor
> subset in linear regression. Hadi and Ling (Amer Stat, 1998) show that it
> is even possible to have an extreme situation where the first (p-1) PCs
> contribute nothing towards explaining the variation in the response, yet
> the last PC alone contributes everything. Their theorem is that if the
> true vector of regression coefficients is in the direction of the j-th
> eigenvector (of the correlation matrix), then the j-th PC alone will
> contribute everything to the model fit, while the remaining PCs will
> contribute zilch. They illustrate this phenomenon with a "real" data set
> from a classic text on regression, Draper and Smith.
>
> Ravi.
>
> Ravi Varadhan, Ph.D.
> Assistant Professor, The Center on Aging and Health
> Division of Geriatric Medicine and Gerontology
> Johns Hopkins University
> Ph: (410) 502-2619
> Fax: (410) 614-9625
> Email: rvarad...@jhmi.edu
> Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of S Ellison
> Sent: Thursday, December 11, 2008 9:37 AM
> To: r-help@r-project.org; Corrado
> Subject: Re: [R] Principal Component Analysis - Selecting components? +
> right choice?
>
> If you're intending to create a model using PCs as predictors, select the
> PCs based on whether they contribute significantly to the model fit.
>
> In chemometrics (multivariate stats in chemistry, among other things), if
> we're expecting 3 or 4 PCs to be useful in a principal component
> regression, we'd generally start with at least the first half-dozen or so
> and let the model fit sort them out.
>
> The reason for not preselecting too rigorously early on is that there's no
> guarantee at all that the first couple of PCs are good predictors for what
> you're interested in. They're properties of the predictor set, not of the
> response set.
>
> Mind you, there used to be something of a gap between chemometrics and
> proper statistics; I'm sure chemometricians used to do things with data
> that would turn a statistician pale.
>
> You could also look for a PLS model, which (if I recall correctly) actually
> uses the response data to select the latent variables used for prediction.
>
> S
>
> >>> Corrado 11/12/2008 11:46:37 >>>
>
> Dear R gurus,
>
> I have some climatic data for a region of the world. They are monthly
> averages 1950-2000 of precipitation (12 months), minimum temperature (12
> months), maximum temperature (12 months). I have scaled them to 2 km x 2 km
> cells, and I have around 75,000 cells.
>
> I need to feed them into a statistical model as co-variates, to use them to
> predict a response variable.
>
> The climatic data are obviously correlated: precipitation for January is
> correlated to precipitation for February and so on; even precipitation
> and temperature are heavily correlated. I did some correlation analysis and
> they are all strongly correlated.
>
> I thought of running PCA on them, in order to reduce the number of
> co-variates I feed into the model.
>
> I ran the PCA using prcomp, quite successfully. Now I need to use a
> criterion to select the right number of PCs (that is: is
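Ravi's point about Hadi and Ling can be reproduced with a small simulation. The sketch below is illustrative only: the three strongly correlated predictors are simulated, and the response is constructed so that its coefficient vector lies along the last eigenvector of their correlation matrix. The last PC then explains almost all of the variation in the response, while the first PC, which carries most of the predictor variance, explains almost none.

```r
set.seed(123)
n <- 500
z <- rnorm(n)
# Three hypothetical predictors sharing a common component (correlation ~0.92)
X <- cbind(x1 = z + rnorm(n, sd = 0.3),
           x2 = z + rnorm(n, sd = 0.3),
           x3 = z + rnorm(n, sd = 0.3))

# PCA on the correlation matrix (standardised predictors)
pc <- prcomp(X, scale. = TRUE)
scores <- pc$x

# Response along the direction of the last (smallest-variance) eigenvector
y <- scores[, 3] + rnorm(n, sd = 0.1)

# R-squared of a regression of y on each PC on its own
r2 <- sapply(1:3, function(j) summary(lm(y ~ scores[, j]))$r.squared)
round(r2, 3)
round(pc$sdev^2, 3)   # eigenvalues: PC1 dominates the predictor variance
```

Because PC scores are mutually uncorrelated, each single-PC R-squared here is also that PC's share of the full model fit; selecting PCs by eigenvalue alone would discard the only useful one.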
[R] Principal Component Analysis - Selecting components? + right choice?
Dear R gurus,

I have some climatic data for a region of the world. They are monthly averages (1950-2000) of precipitation (12 months), minimum temperature (12 months) and maximum temperature (12 months). I have scaled them to 2 km x 2 km cells, and I have around 75,000 cells.

I need to feed them into a statistical model as covariates, to use them to predict a response variable.

The climatic data are obviously correlated: precipitation for January is correlated with precipitation for February, and so on; even precipitation and temperature are heavily correlated. I did some correlation analysis, and they are all strongly correlated.

I thought of running PCA on them, in order to reduce the number of covariates I feed into the model. I ran the PCA using prcomp, quite successfully. Now I need a criterion to select the right number of PCs (that is: is it 1, 2, 3, 4?). What criterion would you suggest?

At the moment I am using a threshold-based criterion, but that is highly subjective, even if there are some rules of thumb (Jolliffe, Principal Component Analysis, 2nd edition, Springer Verlag, 2002). Could you suggest something more rigorous?

By the way, do you think I would have been better off using something other than PCA?

Best,

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]
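For the original question, the sketch below computes three of the rules of thumb discussed by Jolliffe (cumulative proportion of variance, Kaiser's eigenvalue-greater-than-one rule for standardised data, and the broken-stick model) from a prcomp() fit. The data are simulated stand-ins for the 36 monthly climate variables (12 each for precipitation, minimum and maximum temperature), driven by two hypothetical latent factors, with fewer cells than the real 75,000; the 90% threshold is an arbitrary choice. These remain heuristics: as the replies in this thread note, PCs chosen this way are not guaranteed to be good predictors of the response.

```r
set.seed(2008)
n <- 2000                      # hypothetical number of grid cells
p <- 36                        # 12 months x 3 climate variables
# Two hypothetical latent factors drive all 36 variables, plus noise
factors  <- matrix(rnorm(n * 2), ncol = 2)
loadings <- matrix(runif(2 * p), nrow = 2)
climate  <- factors %*% loadings + matrix(rnorm(n * p, sd = 0.5), ncol = p)

pc   <- prcomp(climate, scale. = TRUE)
ev   <- pc$sdev^2                        # eigenvalues of the correlation matrix
prop <- ev / sum(ev)                     # proportion of variance per PC

# Rule 1: smallest k whose cumulative proportion reaches 90%
k_cum <- which(cumsum(prop) >= 0.90)[1]

# Rule 2 (Kaiser): keep PCs with eigenvalue > 1 (data are standardised)
k_kaiser <- sum(ev > 1)

# Rule 3 (broken stick): keep the leading PCs whose proportion exceeds the
# expected share of the k-th longest piece of a stick broken at random
# into p pieces, b_k = (1/p) * sum(1/i, i = k..p)
broken  <- rev(cumsum(1 / (p:1))) / p
fails   <- which(prop <= broken)
k_stick <- if (length(fails) > 0) fails[1] - 1 else p

c(cumulative90 = k_cum, kaiser = k_kaiser, brokenStick = k_stick)
```

The three rules often disagree; comparing them is a quick sanity check, and S Ellison's suggestion of starting with a generous set of PCs and letting the model fit decide addresses the part these rules cannot, namely the link to the response.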
[R] *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated on R 2.6.2 to 2.8.0: logging a bug?
lib64/libreadline.so.5.2
2ad38add8000-2ad38add9000 rw-p 2ad38add8000 00:00 0
2ad38add9000-2ad38ae23000 r-xp 08:02 2044157 /lib64/libncurses.so.5.6
2ad38ae23000-2ad38b022000 ---p 0004a000 08:02 2044157 /lib64/libncurses.so.5.6
2ad38b022000-2ad38b027000 rw-p 00049000 08:02 2044157 /lib64/libncurses.so.5.6
2ad38b027000-2ad38b04d000 r-xp 08:02 2044050 /lib64/libpcre.so.0.0.1
2ad38b04d000-2ad38b24c000 ---p 00026000 08:02 2044050 /lib64/libpcre.so.0.0.1
2ad38b24c000-2ad38b24d000 rw-p 00025000 08:02 2044050 /lib64/libpcre.so.0.0.1
2ad38b24d000-2ad38b24e000 rw-p 2ad38b24d000 00:00 0
2ad38b24e000-2ad38b262000 r-xp 08:02 2044118 /lib64/libz.so.1.2.3
2ad38b262000-2ad38b461000 ---p 00014000 08:02 2044118 /lib64/libz.so.1.2.3
2ad38b461000-2ad38b462000 rw-p 00013000 08:02 2044118 /lib64/libz.so.1.2.3
2ad38b462000-2ad38b464000 r-xp 08:02 2044015 /lib64/libdl-2.7.so
2ad38b464000-2ad38b664000 ---p 2000 08:02 2044015 /lib64/libdl-2.7.so
2ad38b664000-2ad38b665000 r--p 2000 08:02 2044015 /lib64/libdl-2.7.so
2ad38b665000-2ad38b666000 rw-p 3000 08:02 2044015 /lib64/libdl-2.7.so
2ad38b666000-2ad38b668000 rw-p 2ad38b666000 00:00 0
2ad38b668000-2ad38b6a7000 r--p 08:02 720800 /usr/share/locale/UTF-8/LC_CTYPE
2ad38b6a7000-2ad38b78b000 r--p 08:02 720801 /usr/share/locale/UTF-8/LC_COLLATE
2ad38b78b000-2ad38b78c000 r--p 08:02 892249 /usr/share/locale/en_GB.UTF-8/LC_TIME
2ad38b78c000-2ad38b78d000 r--p 08:02 892496 /usr/share/locale/en_GB.UTF-8/LC_PAPER
2ad38b78d000-2ad38b78e000 r--p 08:02 892500 /usr/share/locale/en_GB.UTF-8/LC_MA
Aborted
[EMAIL PROTECTED]:~$

OS: Mandriva 2008.1 x86_64
PostgreSQL: 8.3.1 (PostGIS enabled)
R: 2.6.2 (from the repository) through 2.8.0 (repackaged)

Is this my doing, or R's?

Best,

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]
[R] R base 2.7.2 packaged for Mandriva 2008.1 x86_64: anyone interested?
I have packaged R base 2.7.2 for Mandriva 2008.1 x86_64. Who should I send it to, so that it can be made available to everybody? It is my first attempt, and it works well on my computer, but it will need some testing.

Best,

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]
[R] installing source package RGtk2 on mac os x 10.4.11
Dear All,

I'm trying to install the source package RGtk2 (RGtk2_2.12.5-3.tar.gz) on an iMac running OS X 10.4.11:

Model Name: iMac
Model Identifier: iMac6,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2.16 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per processor): 4 MB
Memory: 2 GB
Bus Speed: 667 MHz
Boot ROM Version: IM61.0093.B07
SMC Version: 1.10f3

This is a step needed to use the Rattle package, as detailed in http://datamining.togaware.com/survivor/Installation_Details.html:

"...Mac/OSX: download the source package from ggobi,2.12 and run the command line install:
Mac/OSX: $ R CMD INSTALL RGtk2_2.8.6.tar.gz (30 minutes)
You may not be able to compile RGtk2 via the R GUI on Mac/OSX as the GTK libraries can not be found when gcc is called. Once installed though, R will detect the package--don't try to load it within the GUI as GTK is not a native Mac/OSX application and it will fail. On Mac/OSX be sure to run R from the X11 environment. ..."

I run the installation in a Terminal window with the following command:

corradogiannasca$ R CMD INSTALL RGtk2_2.12.5-3.tar.gz

It prints a long listing, and the installation fails with the following messages:

/usr/bin/libtool: internal link edit command failed
make: *** [RGtk2.so] Error 1
chmod: /Library/Frameworks/R.framework/Versions/2.7/Resources/library/RGtk2/libs/i386/*: No such file or directory
ERROR: compilation failed for package 'RGtk2'
** Removing '/Library/Frameworks/R.framework/Versions/2.7/Resources/library/RGtk2'

Can anyone help me resolve this error?

Thank you very much for your support.

Corrado Giannasca