Re: [R] regression with proportion data
Your response variable is not binomial, it's a proportion. Try the betareg function in the betareg package, which more correctly assumes that your response variable is Beta distributed (but beware that 1 and 0 are not allowed). The syntax is the same as in a glm. HTH Ruben -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Georgiana May Enviado el: lunes, 19 de marzo de 2012 15:06 Para: r-help@r-project.org Asunto: [R] regression with proportion data Hello, I want to determine the regression relationship between a proportion (y) and a continuous variable (x). Reading a number of sources (e.g. The R Book, Quick R,help), I believe I should be able to designate the model as: model-glm(formula=proportion~x, family=binomial(link=logit)) this runs but gives me error messages: Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! If I transform the proportion variable with log, it doesn't like that either (values not: 0y1) I understand that the binomial function concerns successes vs. failures and can use those raw data, but the R Book and other sources seem to suggest that proportion data are usable as well. Not so? Thank you, Georgiana May [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Standard errors GLM
You have a conceptual problem, as pointed out by previous helpers. You don't have a standard error for the first level of your categorical variable because that level's effect is not estimated. It is being used as a reference level against which the other levels of that categorical variable are being estimated (the default in R). This is one way by which statisticians include categorical predictors into the regression framework, originally meant for relations between continuous quantitative variables. You might want to read about regression, factors, and contrasts. This paper about the issue is available online: M.J. Davis, 2010. Contrast coding in multiple regression analysis: strengths, weaknesses and utility of popular coding structures. Journal of Data Science 8:61-73. HTH Ruben -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de D_Tomas Enviado el: martes, 13 de marzo de 2012 14:39 Para: r-help@r-project.org Asunto: [R] Standard errors GLM Dear userRs, when applied the summary function to a glm fit (e.g Poisson) the parameter table provides the categorical variables assuming that the first level estimate (in alphabetical order) is 0. What is the standard error for that variable then? Are the standard errors calculated assuming a normal distribution? Many thanks, -- View this message in context: http://r.789695.n4.nabble.com/Standard-errors-GLM-tp4469086p4469086.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculation of standard error for a function
deltamethod function in package msm may help (but bear in mind the warnings/admonitions/recommendations of other helpers) HTH Rubén -- Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Jun Shen Sent: Fri 3/2/2012 10:47 PM To: R-help Subject: [R] Calculation of standard error for a function Dear list, If I know the standard error for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks. Jun [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constraint on one of parameters.
Read optimx's help. There are 'method', 'upper', 'lower' arguments that'll let you put bounds on pars. HTH Rubén -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de FU-WEN LIANG Enviado el: jueves, 09 de febrero de 2012 23:56 Para: r-help@r-project.org Asunto: [R] Constraint on one of parameters. Dear all, I have a function to optimize for a set of parameters and want to set a constraint on only one parameter. Here is my function. What I want to do is estimate the parameters of a bivariate normal distribution where the correlation has to be between -1 and 1. Would you please advise how to revise it? ex=function(s,prob,theta1,theta,xa,xb,xc,xd,t,delta) { expo1= exp(s[3]*xa+s[4]*xb+s[5]*xc+s[6]*xd) expo2= exp(s[9]*xa+s[10]*xb+s[11]*xc+s[12]*xd) expo3= exp(s[15]*xa+s[16]*xb+s[17]*xc+s[18]*xd) expo4= exp(s[21]*xa+s[22]*xb+s[23]*xc+s[24]*xd) expo5= exp(s[27]*xa+s[28]*xb+s[29]*xc+s[30]*xd) nume1=prob[1]*(s[2]^-s[1]*s[1]*t^(s[1]-1)*expo1)^delta*exp(-s[2]^-s[1]*t^s[1]*expo1)* theta1[1]^xa*(1-theta1[1])^(1-xa)*theta1[2]^xb*(1-theta1[2])^(1-xb)*(1+theta1[11]*(xa-theta1[1])*(xb-theta1[2])/sqrt(theta1[1]*(1-theta1[1]))/sqrt(theta1[2]*(1-theta1[2])))/ (2*pi*theta[2]*theta[4]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[1])^2/theta[2]^2+(xd-theta[3])^2/theta[4]^2-2*theta[21]^2*(xc-theta[1])*(xd-theta[3])/(theta[2]*theta[4])) nume2=prob[2]*(s[8]^-s[7]*s[7]*t^(s[7]-1)*expo2)^delta*exp(-s[8]^-s[7]*t^s[7]*expo2)* theta1[3]^xa*(1-theta1[3])^(1-xa)*theta1[4]^xb*(1-theta1[4])^(1-xb)*(1+theta1[11]*(xa-theta1[3])*(xb-theta1[4])/sqrt(theta1[3]*(1-theta1[3]))/sqrt(theta1[4]*(1-theta1[4])))/ (2*pi*theta[6]*theta[8]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[5])^2/theta[6]^2+(xd-theta[7])^2/theta[8]^2-2*theta[21]^2*(xc-theta[5])*(xd-theta[7])/(theta[6]*theta[8])) nume3=prob[3]*(s[14]^-s[13]*s[13]*t^(s[13]-1)*expo3)^delta*exp(-s[14]^-s[13]*t^s[13]*expo3)* theta1[5]^xa*(1-theta1[5])^(1-xa)*theta1[6]^xb*(1-theta1[6])^(1-xb)*(1+theta1[11]*(xa-theta1[5])*(xb-theta1[6])/sqrt(theta1[5]*(1-theta1[5]))/sqrt(theta1[6]*(1-theta1[6])))/ (2*pi*theta[10]*theta[12]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[9])^2/theta[10]^2+(xd-theta[11])^2/theta[12]^2-2*theta[21]^2*(xc-theta[9])*(xd-theta[11])/(theta[10]*theta[12])) nume4=prob[4]*(s[20]^-s[19]*s[19]*t^(s[19]-1)*expo4)^delta*exp(-s[20]^-s[19]*t^s[19]*expo4)* theta1[7]^xa*(1-theta1[7])^(1-xa)*theta1[8]^xb*(1-theta1[8])^(1-xb)*(1+theta1[11]*(xa-theta1[7])*(xb-theta1[8])/sqrt(theta1[7]*(1-theta1[7]))/sqrt(theta1[8]*(1-theta1[8])))/ (2*pi*theta[14]*theta[16]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[13])^2/theta[14]^2+(xd-theta[15])^2/theta[16]^2-2*theta[21]^2*(xc-theta[13])*(xd-theta[15])/(theta[14]*theta[16])) nume5=prob[5]*(s[26]^-s[25]*s[25]*t^(s[25]-1)*expo5)^delta*exp(-s[26]^-s[25]*t^s[25]*expo5)* theta1[9]^xa*(1-theta1[9])^(1-xa)*theta1[10]^xb*(1-theta1[10])^(1-xb)*(1+theta1[11]*(xa-theta1[9])*(xb-theta1[10])/sqrt(theta1[9]*(1-theta1[9]))/sqrt(theta1[10]*(1-theta1[10])))/ (2*pi*theta[18]*theta[20]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*((xc-theta[17])^2/theta[18]^2+(xd-theta[19])^2/theta[20]^2-2*theta[21]^2*(xc-theta[17])*(xd-theta[19])/(theta[18]*theta[20])) denom=nume1+nume2+nume3+nume4+nume5 Ep1=nume1/denom Ep2=nume2/denom Ep3=nume3/denom Ep4=nume4/denom Ep5=nume5/denom elogld= sum(Ep1*(-log(2*pi*theta[2]*theta[4]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[1])^2/theta[2]^2+(xd-theta[3])^2/theta[4]^2-2*theta[21]^2*(xc-theta[1])*(xd-theta[3])/(theta[2]*theta[4] + sum(Ep2*(-log(2*pi*theta[6]*theta[8]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[5])^2/theta[6]^2+(xd-theta[7])^2/theta[8]^2-2*theta[21]^2*(xc-theta[5])*(xd-theta[7])/(theta[6]*theta[8] + sum(Ep3*(-log(2*pi*theta[10]*theta[12]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[9])^2/theta[10]^2+(xd-theta[11])^2/theta[12]^2-2*theta[21]^2*(xc-theta[9])*(xd-theta[11])/(theta[10]*theta[12] + sum(Ep4*(-log(2*pi*theta[14]*theta[16]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[13])^2/theta[14]^2+(xd-theta[15])^2/theta[16]^2-2*theta[21]^2*(xc-theta[13])*(xd-theta[15])/(theta[14]*theta[16] + sum(Ep5*(-log(2*pi*theta[18]*theta[20]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[17])^2/theta[18]^2+(xd-theta[19])^2/theta[20]^2-2*theta[21]^2*(xc-theta[17])*(xd-theta[19])/(theta[18]*theta[20] return(-elogld) } normal=optimx(c(15.5,0.4,10,1.3,17.5,0.3,15,1.5,13.5,0.5,19.5,1.1,20.7,0.4,30,0.4,25,0.7,24.6,1.6,0.5),ex,s=s,prob=prob,theta1=theta1,xa=xa,xb=xb,xc=xc,xd=xd,t=t,delta=delta) When I run this code, I got this error: In log(2 * pi * theta[6] * theta[8] * sqrt(1 - theta[21]^2)) : NaNs produced Because (1-theta[21]^2)0. Would you please advise? Thank you very much. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
-Mensaje original- De: Bert Gunter [mailto:gunter.ber...@gene.com] Enviado el: jueves, 26 de enero de 2012 21:20 Para: Rubén Roa CC: Ben Bolker; Frank Harrell Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa r...@azti.es wrote: I think we have gone through this before. First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding. Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC. Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures. Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. -- And the mathematical basis for this claim is ?? -- Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonably looking standard errors and low pairwise parameter estimates correlations, while the other wasn´t even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of model inadequacy to the data, that the AIC was not able to detect. --- Ruben: For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability. -- This is a religious, not a scientific statement, and has no place in any scientific discussion. -- Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I built to be more than able predictors, I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on. -- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical )and practical results to the contrary, while widespread, impedes and often thwarts the progress of science, Evidence-based medicine and clinical trials came about for a reason. I would encourage you to reexamine the basis of your scientific practice and the role that magical thinking plays in it. Best, Bert -- Ruben: All right Bert. I often re-examine the basis of my scientific praxis but less often than I did before, I have to confess. I like to think it is because I am converging on the right praxis so there are less issues to re-examine. But this problem of model selection is a tough one. Being a likelihoodist in inference naturally leads you to AIC-based model selection, Jim Lindsey being influent too. Wanting that your models say some something about nature is another strong conditioning factor. Put this two together and it becomes hard to do model-averaging. And it has nothing to do with religion (yuck!). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Frank Harrell Enviado el: viernes, 27 de enero de 2012 14:28 Para: r-help@r-project.org Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? Ruben, I'm not sure you are understanding the ramifications of what Bert said. In addition you are making several assumptions implicitly: -- Ruben: Frank, I guess we are going nowhere now. But thanks anyways. See below if you want. 1. model selection is needed (vs. fitting the full model and using shrinkage) Ruben: Nonlinear mechanistic models that are too complex often just don't converge, they crash. No shrinkage to apply to a failed convergence model. 2. model selection works in the absence of shrinkage Ruben: I think you are assuming that shrinkage is necessary. 3. model selection can find the right model and the features selected would be the same features selected if the data were slightly perturbed or a new sample taken Ruben: I don't make this assumption. New data, new model. 4. AIC tells you something that P-values don't (unless using structured multiple degree of freedom tests) Ruben: It does. 5. parsimony is good Ruben: It is. None of these assumptions is true. Model selection without shrinkage (penalization) seems to offer benefits but this is largely a mirage. Ruben: Have a good weekend! Ruben Rubén Roa wrote -Mensaje original- De: Bert Gunter [mailto:gunter.berton@] Enviado el: jueves, 26 de enero de 2012 21:20 Para: Rubén Roa CC: Ben Bolker; Frank Harrell Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa lt;rroa@gt; wrote: I think we have gone through this before. First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding. Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC. Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures. Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. -- And the mathematical basis for this claim is ?? -- Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonably looking standard errors and low pairwise parameter estimates correlations, while the other wasn´t even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of model inadequacy to the data, that the AIC was not able to detect. --- Ruben: For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability. -- This is a religious, not a scientific statement, and has no place in any scientific discussion. -- Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I built to be more than able predictors, I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on. -- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical )and practical results to the contrary, while widespread, impedes and often thwarts the progress of science, Evidence-based medicine and clinical trials came about for a reason. I would encourage you to reexamine the basis of your scientific practice and the role that magical thinking plays in it. Best, Bert -- Ruben: All right Bert. I often re-examine the basis of my scientific praxis but less often than I did before, I have to confess. I like to think it is because I am converging on the right praxis so
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
A more 'manual' way to do it is this. If you have all your models named properly and well organized, say Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces a list with one component named 'aic') then something like this: x - matrix(0,1081,3) x[,1:2] - t(combn(47,2)) for(i in 1:n){x[,3] - abs(unlist(combn(eval(as.name(paste(Model,i,sep=)))$aic[1,])-unlist(combn(eval(as.name(paste(Model,i,sep=)))$aic,2)[2,]))} will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty but works for me. An AIC difference equal or less than 2 is a tie, anything higher is evidence for the model with the less AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo). HTH Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Milan Bouchet-Valat Sent: Wed 1/25/2012 10:32 AM To: Jhope Cc: r-help@r-project.org Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit : Hi R-listers, I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models. 1) What is the best way to compare various models with unique combinations and different number of variables? See ?step or ?stepAIC (from package MASS) if you want an automated way of doing this. 2) I am trying to develop the most simplest model ideally. Even though adding another variable would lower the AIC, how do I interpret it is worth it to include another variable in the model? How do I know when to stop? This is a general statistical question, not specific to R. As a general rule, if adding a variable lowers the AIC by a significant margin, then it's worth including it. You should only stop when a variable increases the AIC. But this is assuming you consider it a good indicator and you know what you're doing. There's plenty of literature on this subject. Definitions of Variables: HTL - distance to high tide line (continuous) Veg - distance to vegetation Aeventexhumed - Event of exhumation Sector - number measurements along the beach Rayos - major sections of beach (grouped sectors) TotalEggs - nest egg density Example of how all models were created: Model2.glm - glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial) Model7.glm - glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial, data.to.analyze) Model21.glm - glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial) Model37.glm - glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial) To extract the AICs of all these models, it's easier to put them in a list and get their AICs like this: m - list() m$model2 - glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial) m$model3 - glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial, data.to.analyze) sapply(m, extractAIC) Cheers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
I think we have gone through this before. First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding. Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC. Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures. Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability. Rubén -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Ben Bolker Enviado el: miércoles, 25 de enero de 2012 15:41 Para: r-h...@stat.math.ethz.ch Asunto: Re: [R]How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? Rubén Roa rroa at azti.es writes: A more 'manual' way to do it is this. If you have all your models named properly and well organized, say Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces a list with one component named 'aic') then something like this: x - matrix(0,1081,3) x[,1:2] - t(combn(47,2)) for(i in 1:n){x[,3] - abs(unlist(combn(eval(as.name(paste(Model,i,sep=)))$aic[1,])- unlist(combn(eval(as.name(paste(Model,i,sep=)))$aic,2)[2,]))} will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty but works for me. An AIC difference equal or less than 2 is a tie, anything higher is evidence for the model with the less AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo). I wouldn't go quite as far as Frank Harrell did in his answer, but (1) it seems silly to do all pairwise comparisons of models; one of the big advantages of information-theoretic approaches is that you can just list the models in order of AIC ... (2) the dredge() function from the MuMIn package may be useful for automating this sort of thing. There is an also an AICtab function in the emdbook package. (3) If you're really just interested in picking the single model with the best expected predictive capability (and all of the other assumptions of the AIC approach are met -- very large data set, good fit to the data, etc.), then just picking the model with the best AIC will work. It's when you start to make inferences on the individual parameters within that model, without taking account of the model selection process, that you run into trouble. If you are really concerned about good predictions then it may be better to do multi-model averaging *or* use some form of shrinkage estimator. (4) The delta-AIC2 means pretty much the same rule of thumb is fine, but it drives me nuts when people use it as a quasi-significance-testing rule, as in the simpler model is adequate if dAIC2. If you're going to work in the model selection arena, then don't follow hypothesis-testing procedures! A smaller AIC means lower expected K-L distance, period. For the record, Brian Ripley has often expressed the (minority) opinion that AIC is *not* appropriate for comparing non-nested models (e.g. http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html). -Original Message- From: r-help-bounces at r-project.org on behalf of Milan Bouchet-Valat Sent: Wed 1/25/2012 10:32 AM To: Jhope Cc: r-help at r-project.org Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit : Hi R-listers, I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models. 1) What is the best way to compare various models with unique combinations and different number of variables? See ?step or ?stepAIC (from package MASS) if you want an automated way of doing this. 2) I am trying to develop the most simplest model ideally. Even though adding another variable would lower the AIC, No, not necessarily. how do I interpret it is worth it to include another variable in the model? How do I know when to stop? When the AIC stops decreasing. This is a general
Re: [R] how to get inflection point in binomial glm
René, Yes, to fit a re-parameterized logistic model I think you'd have to code the whole enchilada yourself, not relying on glm (but not nls() as nls() deals with least squares minimization whereas here we want to minimize a minus log binomial likelihood). I did that and have the re-parameterized logistic model in a package I wrote for a colleague (this package has the logistic fit fully functional and documented). My code though only considers one continuous predictor. If you want I may email you this package and you figure out how to deal with the categorical predictor. One thing I believe at this point is that you'd have to do the inference on the continuous predictor _conditional_ on certain level(s) of the categorical predictor. Rubén -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de René Mayer Enviado el: jueves, 01 de diciembre de 2011 20:34 Para: David Winsemius CC: r-help Help Asunto: Re: [R] how to get inflection point in binomial glm Thanks David and Rubén! @David: indeed 15 betas I forgot the interaction terms, thanks for correcting! @Rubén: the re-parameterize would be done within nls()? how to do this practically with including the factor predictor? and yes, we can solve within each group for Y=0 getting 0=b0+b1*X |-b0 -b0=b1*X |/b1 -b0/b1=X but I was hoping there might a more general solution for the case of multiple logistic regression. HTH René Zitat von David Winsemius dwinsem...@comcast.net: On Dec 1, 2011, at 8:24 AM, René Mayer wrote: Dear All, I have a binomial response with one continuous predictor (d) and one factor (g) (8 levels dummy-coded). glm(resp~d*g, data, family=binomial) Y=b0+b1*X1+b2*X2 ... b7*X7 Dear Dr Mayer; I think it might be a bit more complex than that. I think you should get 15 betas rather than 8. Have you done it? how can I get the inflection point per group, e.g., P(d)=.5 Wouldn't that just be at d=1/beta in each group? (Thinking, perhaps naively, in the case of X=X1 that (Pr[y==1])/(1-Pr[y==1])) = 1 = exp( beta *d*(X==X1) ) # all other terms = 0 And taking the log of both sides, and then use middle school math to solve. Oh, wait. Muffed my first try on that for sure. Need to add back both the constant intercept and the baseline d coefficient for the non-b0 levels. (Pr[y==1])/(1-Pr[y==1])) = 1 = exp( beta_0 + beta_d_0*d + beta_n + beta_d_n *d*(X==Xn) ) And just (Pr[y==1])/(1-Pr[y==1])) = 1 = exp( beta_0 + beta_d_0*d ) # for the reference level. This felt like an exam question in my categorical analysis course 25 years ago. (Might have gotten partial credit for my first stab, depending on how forgiving the TA was that night.) I would be grateful for any help. Thanks in advance, René -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to get inflection point in binomial glm
Assuming a logistic model, for each group solve for d at Y=0, or solve for d at p=0.5, where d is your continuous predictor, Y is the logit score, and p is the probability of success in the binomial model. After that you can get the standard error of the inflection point by Taylor series (aka delta method). Another idea is to re-parameterize the logistic model to include explicitly the inflection point, thus you'll get its estimate and standard error directly as a result of the fit. For example, disregarding the g factor predictor (or for each group), a logistic model such as P(d) = 1/(1+exp(log(1/19)(d-d50)/(d95-d50)) includes the inflection point directly (d50) and is a re-parameterized version of the usual logistic model P(d) =1/(1+exp(b0+b1*d)) where parameters b0 and b1 have been replaced by d95 and d50, the predictor at 50% probability (inflection point), and the predictor at 95% probability, respectively. HTH Rubén -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de René Mayer Enviado el: jueves, 01 de diciembre de 2011 14:25 Para: r-help@r-project.org Asunto: [R] how to get inflection point in binomial glm Dear All, I have a binomial response with one continuous predictor (d) and one factor (g) (8 levels dummy-coded). glm(resp~d*g, data, family=binomial) Y=b0+b1*X1+b2*X2 ... b7*X7 how can I get the inflection point per group, e.g., P(d)=.5 I would be grateful for any help. Thanks in advance, René __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RV: Reporting a conflict between ADMB and Rtools on Windows systems
De: Rubén Roa Enviado el: jueves, 17 de noviembre de 2011 9:53 Para: 'us...@admb-project.org' Asunto: Reporting a conflict between ADMB and Rtools on Windows systems Hi, I have to work under Windows, it's a company policy. I've just found that there is a conflict between tools used to build R packages (Rtools) and ADMB due to the need to put Rtools compiler's location in the PATH environmental variable to make Rtools work. On a Windows 7 64bit with Rtools installed I installed ADMB-IDE latest version and although I could translate ADMB code to cpp code I could not build the cpp code into an executable via ADMB-IDE's compiler. On another Windows machine, a Windows Vista 32bits with Rtools installed I also installed the latest ADMB-IDE and this time it was not possible to create the .obj file on the way to build the executable when building with ADMB-IDE. On this machine I also have a previous ADMB version (6.0.1) that I used to run from the DOS shell. This ADMB also failed to build the .obj file. Now, going to PATH, the location info to make Rtools is: c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program Files (x86)\MiKTeX 2.9\miktex\bin; If from this list I remove the reference to the compiler c:\Rtools\MinGW\bin then ADMB works again. So beware of this conflict. Suggestion of a solution will be appreciated. Meanwhile, I run ADMB code in one computer and build R packages with Rtools in another computer. Best Ruben -- Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RV: Reporting a conflict between ADMB and Rtools on Windows systems
Thanks Gabor and Jan. The batch files solution seems the way to go. Will implement it! Rubén -Mensaje original- De: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Enviado el: jueves, 17 de noviembre de 2011 13:58 Para: Rubén Roa CC: r-help@r-project.org Asunto: Re: [R] RV: Reporting a conflict between ADMB and Rtools on Windows systems On Thu, Nov 17, 2011 at 3:54 AM, Rubén Roa r...@azti.es wrote: I've just found that there is a conflict between tools used to build R packages (Rtools) and ADMB due to the need to put Rtools compiler's location in the PATH environmental variable to make Rtools work. On a Windows 7 64bit with Rtools installed I installed ADMB-IDE latest version and although I could translate ADMB code to cpp code I could not build the cpp code into an executable via ADMB-IDE's compiler. On another Windows machine, a Windows Vista 32bits with Rtools installed I also installed the latest ADMB-IDE and this time it was not possible to create the .obj file on the way to build the executable when building with ADMB-IDE. On this machine I also have a previous ADMB version (6.0.1) that I used to run from the DOS shell. This ADMB also failed to build the .obj file. Now, going to PATH, the location info to make Rtools is: c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program Files (x86)\MiKTeX 2.9\miktex\bin; If from this list I remove the reference to the compiler c:\Rtools\MinGW\bin then ADMB works again. So beware of this conflict. Suggestion of a solution will be appreciated. Meanwhile, I run ADMB code in one computer and build R packages with Rtools in another computer. The batchfiles Rcmd.bat, Rgui.bat temporarily add R and Rtools to your path by looking them up in the registry and then calling Rcmd.exe or Rgui.exe respectively. When R is finished the path is restored to what it was before. By using those its not necessary to have either on your path.These are self contained batch files with no dependencies so can simply be placed anywhere on the path in order to use them. For those and a few other batch files of interest to Windows users of R see: http://batchfiles.googlecode.com -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Poor performance of Optim
-Original Message- From: r-help-boun...@r-project.org on behalf of yehengxin Sent: Sat 10/1/2011 8:12 AM To: r-help@r-project.org Subject: [R] Poor performance of Optim I used to consider using R and Optim to replace my commercial packages: Gauss and Matlab. But it turns out that Optim does not converge completely. The same data for Gauss and Matlab are converged very well. I see that there are too many packages based on optim and really doubt if they can be trusted! -- Considering that your post is pure whining without any evidence or reproducible example, considering that you speak of 'data' being converged, me think it's your fault, you cann't control optim well enough to get sensible results. There are many ways to use optim eh?, you can pass on the gradients, you can use a variety of methods, you can increase the number of iterations, et cetera, read optim's help, come back with a reproducible example, or quietly stick to your commercial sofware, leaving the whining to yourself. HTH Ruben -- Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3862229.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can I tell about someone's academic cheating
-Original Message- From: r-help-boun...@r-project.org on behalf of YuHong Sent: Sun 10/2/2011 3:27 AM To: r-help@r-project.org Subject: [R] Can I tell about someone's academic cheating Hello, Can I tell about someone¡¦s academic cheating behavior in the mailing list? For I knew this person through this R mailing list. Thanks! Regards, Hong Yu -- You have to provide a reproducible example ... Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cannot allocate vector of size x
Check one of the examples in ?try It has this heading: ## run a simulation, keep only the results that worked. If your system is Windows, you can also try to increase the memory available for one application, in order to avoid the problem. Do a search for 3GB switch HTH Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Michael Haenlein Enviado el: miércoles, 21 de septiembre de 2011 11:54 Para: r-help@r-project.org Asunto: [R] Cannot allocate vector of size x Dear all, I am running a simulation in which I randomly generate a series of vectors to test whether they fulfill a certain condition. In most cases, there is no problem. But from time to time, the (randomly) generated vectors are too large for my system and I get the error message: Cannot allocate vector of size x. The problem is that in those cases my simulation stops and I have to start it again manually. What I would like to do is to simply ignore that the error happened (or probably report that it did) and then continue with another (randomly) generated vector. So my question: Is there a way to avoid that R stops in such a case and just restarts the program from the beginning as if nothing happened? I hope I'm making myself clear here ... Thanks, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cannot allocate vector of size x
Yes, on a recent heavy-duty job -profile likelihood of Tweedie power parameter for a relatively complex glm with hundreds of thousands rows dataframe- I had the cannot allocate vector ... error, then I just closed-saved the main workspace, full of large objects, then I did the profiling on a fresh and almost empty (with strictly necessary objects) workspace, and it run successfully. Then I just loaded the two workspaces to one session, and continued happily to the end of the job. :-) Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Jim Holtman Enviado el: miércoles, 21 de septiembre de 2011 12:11 Para: Michael Haenlein CC: r-help@r-project.org Asunto: Re: [R] Cannot allocate vector of size x how much memory do you have on your system? How large are the vectors you are creating? How many other large vectors do you have in memory? Remove all unused objects and do gc() to reclaim some of the memory. Remember all objects are in memory and you have to understand how large they are and how many you have. Ther is more information you have to provide and some more inspection you have to do. Sent from my iPad On Sep 21, 2011, at 5:53, Michael Haenlein haenl...@escpeurope.eu wrote: Dear all, I am running a simulation in which I randomly generate a series of vectors to test whether they fulfill a certain condition. In most cases, there is no problem. But from time to time, the (randomly) generated vectors are too large for my system and I get the error message: Cannot allocate vector of size x. The problem is that in those cases my simulation stops and I have to start it again manually. What I would like to do is to simply ignore that the error happened (or probably report that it did) and then continue with another (randomly) generated vector. So my question: Is there a way to avoid that R stops in such a case and just restarts the program from the beginning as if nothing happened? I hope I'm making myself clear here ... Thanks, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gradients in optimx
Well, I guess this doesn't make necessary for me to prepare a report with the CatDyn package. However, I am available to test a new optimx R-forge version with my package. Cheers Rubén -- Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Mensaje original- De: John C Nash [mailto:nas...@uottawa.ca] Enviado el: domingo, 04 de septiembre de 2011 18:55 Para: Ravi Varadhan CC: Rubén Roa; 'kathryn.lord2...@gmail.com'; 'r-help@r-project.org' Asunto: Re: Gradients in optimx I've started to work on this again, and can confirm there seems to be some sort of bug in the gradient test at the beginning of the current R-forge version of optimx. It is not something obvious, and looks like a mixup in arguments to functions, which have been an issue since I've been trying to trap NaN and Inf returns. Worse, making the control starttests = FALSE fails because there I inadvertently put the initial function calculation inside the block that does the tests. Sigh. Will try to get something done by end of this week. (This will be R-forge version.) JN On 08/31/2011 09:31 AM, Ravi Varadhan wrote: Hi Reuben, I am puzzled to note that the gradient check in optimx does not work for you. Can you send me a reproducible example so that I can figure this out? John - I think the best solution for now is to issue a warning rather than an error message, when the numerical gradient is not sufficiently close to the user-specified gradient. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu mailto:rvarad...@jhmi.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gradient function in OPTIMX
Hi, In my package CatDyn, which uses optimx, I included the gradients of 20 version of the model involved. I estimate model parameters with numerical gradients, and at the final estimates I calculate the analytical gradients. In the simplest version of the model the analytical gradients computed post hoc are almost identical to the numerical gradients. This shows that the analytical gradients (whose formulas were obtained by the CAS Maxima) are correct, at least for those simple versions of my model. However, if I try to pass the analytical gradients to optimx in a new optimization, I invariably get the error message that you got: Gradient function might be wrong - check it! This happens regardless of the method used (BFGS, spg, Rcgmin). Same as you, when I try to pass the gradients to optim, instead of optimx, the gradients are accepted and computed correctly, but then I cann't use the very nice other features of optimx. I wanted to report this to Ravi and Prof. Nash but I haven't got the time for a full report with several examples and variations. So now that you report it, here I am, seconding you in calling the attention to this apparent problem in optimx. Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Kathie Sent: Mon 8/29/2011 11:10 AM To: r-help@r-project.org Subject: [R] gradient function in OPTIMX Dear R users When I use OPTIM with BFGS, I've got a significant result without an error message. However, when I use OPTIMX with BFGS( or spg), I've got the following an error message. optimx(par=theta0, fn=obj.fy, gr=gr.fy, method=BFGS, control=list(maxit=1)) Error: Gradient function might be wrong - check it! I checked and checked my gradient function line by line. I could not find anything wrong. Is it a bug or something? I prefer OPTIMX, so I'd like to know why. Thanks a lot in advance Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/gradient-function-in-OPTIMX-tp3775791p3775791.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] PBSmapping, where is Ireland?!
Dear Ewan, I faced this problem and solved it by contacting the package authors, John Schnute and Rowan Haigh, rowan.ha...@dfo-mpo.gc.ca. Here is a function that solves the problem by displacing the Greenwich meridian to longitude 348 leaving Ireland to the right. This longitude does not span any land mass within the limits of the map so it does not cause any disappearing land masses. The function loads the GSHHS data in intermediate resolution, so it takes some time, less than 1 min in my standard laptop, to run. Change the xlim and ylim values to get different fractions of Europe. Last time I contacted them (October 2010), package authors were planning to add some comments about this in PBSmapping user's guide. So you may find more info by digging into the user's guide, or else, contact Rowan. HTH Rubén Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain library(PBSmapping) Euromap - function(path=C:/Temp, cutLon=348) { fnam - paste(path,gshhs_f.b,sep=/); p1 - importGSHHS(fnam,xlim=c(-20,360),ylim=c(30,80),level=1,n=0,xoff=0); z - p1$XcutLon; p1$X[z] - p1$X[z]-360; NorthSeaHR - thinPolys(p1, tol=0.1, filter=3) .initPBS() clr - PBSval$PBSclr; xlim - c(-18, 16) ylim - c(32, 64) WEurope - clipPolys(NorthSeaHR, xlim=xlim, ylim=ylim) par(mfrow=c(1,1),omi=c(0,0,0,0)) plotMap(WEurope, xlim=xlim, ylim=ylim, col=clr$land, bg=clr$sea, tck=-0.02,mgp=c(2,.75,0), cex=1.2, plt=c(.08,.98,.08,.98)) } Euromap(cutLon=348) -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Ewan Minter Enviado el: martes, 16 de agosto de 2011 14:57 Para: r-help@r-project.org Asunto: [R] PBSmapping, where is Ireland?! Hi folks, I've been using 'PBSmapping' to make a map of Europe with some labels. I've been using the 'worldLL' PolyData, as my computer is too slow to make my own from the GSHHS files. The only p __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] CatDyn - Estimation of wild populations abundance
Dear ComRades, Package CatDyn 1.0-3 is now available on CRAN. The package allows the estimation of the absolute abundance of a wild population during a capture season using a nonlinear and recursive discrete-time structural model. The response variable is the catch, assumed a random variable produced by either a multiplicative stochastic process (lognormal distribution) or an additive stochastic process (normal distribution). The predictor variables are capture effort, assumed observed exactly, and stock abundance, a latent predictor. The data are high frequency (e.g. daily) observations of catch and capture effort during the season. The structural model comes in five varieties. The simplest model assumes that all individuals that can be captured during the season were available at the start of the season, thus during the season the abundance always decreases (a pure depletion model), due to natural mortality and capture mortality. The other versions include one, two, three or four positive perturbations, such as immigration of recruits, that occur at certain time steps during the season. The models also includes nonlinear (power) relations between response and predictors. The models can be fit with natural mortality fixed at a given value or letting this parameter be estimated as well. Taking into account structural and stochastic options, CatDyn can fit up to 20 model versions to the same data. CatDyn uses the recently released optimx package for maximum flexibility in selecting and using numerical methods implemented in R. CatDyn also has analytical gradients but in the current version these gradients are not yet passed to the optimizer; instead they are computed after convergence using numerical gradients, in order to compare analytical versus numerical gradients at the maximum likelihood estimates using various methods. CatDyn estimates all parameters in the log scale and uses function deltamethod() from package msm to return back-transformed estimates vectors and covariance matrices. CatDyn has been used already to estimate abundance of several fish and invertebrate stocks harvested by industrial and artisanal fishing fleets over the world. A manuscript is under preparation with the first application (a squid stock from the Falkland Islands) and will be submitted to a marine science journal. Dr. Rubén H. Roa-Ureta, AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to find the likelihood of a null model in R
I know how to do what you want. However, the fact that you didn't bother to follow the rules of the list eliminates the initial, admittedly faint, enthusiasm to help you. Maybe someone else will guide you anyways. The rules are extremely simple. Read them at the end of this message. If you do you'd probably find the answer to your question yourself. Dr. Rubén H. Roa-Ureta, AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Partha Pratim PATTNAIK Sent: Mon 7/25/2011 1:16 PM To: r-help@r-project.org Subject: [R] How to find the likelihood of a null model in R Dear All, I am working on a dataset having the dependent variable as ordinal data(discrete data) and multiple independent variables. I need to find the likelihood for the NULL model.i.e the model with only the dependent variable and all other independent variables as zero. Kindly let me know how to find the likelihood for a NULL model in R. Is there any specific function in R that can do this task? Please share if anyone has any information on this. Thanks and Regards Partha Pattnaik -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] please help! what are the different using log-link function and log transformation?
The problem is not that you are new to R. This is a conceptual issue. Let y be the response variable and let {x_i} be a set of predictors. Your first model (identity response and log-link) is saying that y=f(x_1)f(x_2)...f(x_n) + e, e~Normal(0,sigma) i.e. this is an additive observation-error model with constant variance. Your second model (log-response and identity link) is saying that y=f(x_1)f(x_2)...f(x_n)u, u=exp(e), e~Normal(0,sigma) i.e. this a multiplicative observation-error model with variance proportional to the mean. Plot the data versus response and visually examine whether you have heteroscedasticity. If this is true, use your second model, otherwise use the first one. One key to understand these kind of dichotomies is to realize that statistical models are composed of a process part and an observation part. In your models the process part is deterministic and multiplicative but after that, you still have two choices, make the random observation part additive (your first model) or multiplicative (your second model). Needless to say (but I am saying it anyways) these two models will give different results, at the very least because one assumes constant variance (your first model) whereas the other assumes a variance proportional to the mean. In my experience with multiplicative process models, the random observation part shall usually be multiplicative as well because of heteroscedasticity. HTH Rubén - Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of pigpigmeow Sent: Sun 6/19/2011 9:39 AM To: r-help@r-project.org Subject: [R] please help! what are the different using log-link function and log transformation? I'm new R-programming user, I need to use gam function. y-gam(a~s(b),family=gaussian(link=log),data) y-gam(loga~s(b), family =gaussian (link=identity),data) why these two command results are different? I guess these two command results are same, but actally these two command results are different, Why? -- View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] please help! what are the different using log-link function and log transformation?
Funny, I was going to quote your paper with Dichmont in Fisheries Research regarding this, then I forgot to do it, but you yourself came out and posted the explanation. This forum is great! Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of bill.venab...@csiro.au Sent: Sun 6/19/2011 2:36 PM To: gloryk...@hotmail.com; r-help@r-project.org Subject: Re: [R] please help! what are the different using log-link function and log transformation? The two commands you give below are certain to lead to very different results, because they are fitting very different models. The first is a gaussian model for the response with a log link, and constant variance. The second is a gaussian model for a log-transformed response and identity link. On the original scale this model would imply a constant coefficient of variation and hence a variance proportional to the square of the mean, and not constant. Your problem is not particularly an R issue, but a difficulty with understanding generalized linear models (and hence generalized additive models, which are based on them). Bill Venables. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of pigpigmeow [gloryk...@hotmail.com] Sent: 19 June 2011 17:39 To: r-help@r-project.org Subject: [R] please help! what are the different using log-link function and log transformation? I'm new R-programming user, I need to use gam function. y - gam(a ~ s(b), family = gaussian(link=log), data) y - gam(log(a) ~ s(b), family = gaussian (link=identity), data) why [do] these two command [give different] results? I guess these two command results are same, but actally these two command results are different, Why? -- View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] logistic growth model
Write the growth formula in an R script. Define initial par values. Input the size and age data. Plot the size and age data as points. Plot the growth model with the initial par values as a line. Play with the initial par values until you see a good agreement between the model (the line) and the data (the points). Optimise. Re-plot. Plot a residual histogram. Plot a residual scatterplot. Plot a Q-Q residual plot. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Renalda Enviado el: sábado, 04 de junio de 2011 6:17 Para: r-help@r-project.org Asunto: [R] logistic growth model I want to Fit a logistic growth model for y = k *eb0+b1(age)/1 + eb0+b1(age), can some one help on how to get the initial coefficients b0 and b1? I need to estimate in order to do the regression analysis. When I run using b0=0.5 and b1=3.4818, I get the following error 397443.8 : 0.5 3.4818 Error in nls(Height ~ k * exp(b1 + b2 * Age)/(1 + exp(b1 + b2 * Age)), : singular gradient please tell me what is wrong with my initials values, and how to get the initial values __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Building Custom GUIs for R
Check the PBSModelling package. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de vioravis Enviado el: viernes, 20 de mayo de 2011 8:52 Para: r-help@r-project.org Asunto: [R] Building Custom GUIs for R I am looking to build simple GUIs based on the R codes I have. The main objective is to hide the scary R codes from non-programming people and make it easier for them to try out different inputs. For example, 1. The GUI will have means to upload a csv file which will be read by the R code. 2. A button to preprocess data (carried out by a R function behind) 3. A button to build some models and run simulations 4. Space to display visual charts based on the simulations results 5. Option to save the results to a csv file or something similar. Are there any tools currently available that enable us build GUIs??? (MATLAB has a GUI builder that enables the users build custom GUIs). Can we make a exe of such GUI (with the R code) and let people use it without having to install R??? Any help on this would be much appreciated?? Thank you. Ravi -- View this message in context: http://r.789695.n4.nabble.com/Building-Custom-GUIs-for-R-tp353 7794p3537794.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls problem with R
In addition to Andrew's advice, you should get more familiar with your nonlinear model. From what you wrote, as T2 tends to infinity, V2 tends to v0*(1-epi). There you have a baseline on the Y-axis towards which your model tends, and this will give you sensible starting values for v0 and epi. Also, as T0 tends to 0, V2 tends to v0(1-epi(1+exp(cl*t0))). There you have another higher point on the Y-axis, and this one will give you additional sensible starting values for cl and t0. Plot the data and the predicted model with your initial values and sends the model-data combination to the optimizer once you see that the predicted line is close to the observed response. V2 - c(371000, 285000 ,156000, 20600, 4420, 3870, 5500 ) T2 - c(0.3403 ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553, 1.333) #last value inserted for illustration. #nls2 - nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9 ,cl=6.2,t0=8.7)) v0.ini - 10^7 epi.ini - 0.9 cl.ini - 6.2 t0.ini - 8.7 V2.pred.ini - v0.ini*(1-epi.ini+epi.ini*exp(-cl.ini*(T2-t0.ini))) plot(T2,V2) lines(T2,V2.pred.ini) As you can see, with your initial values the line doesn't even show on the plot. No wonder the gradients are singular. So go find better initial values by trial and error and check the results on the plot. Then the optimizer called by nls will finish the job (hopefully). Then you repeat your plot this time with the estimates instead of the initial values. This may get you started in the business of estimating nolinear models. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Andrew Robinson Enviado el: miércoles, 04 de mayo de 2011 9:15 Para: sterlesser CC: r-help@r-project.org Asunto: Re: [R] nls problem with R The fact that T2 and V2 are of different lengths seems like a likely culprit. Other than that, you need to find start points that do not lead to a singular gradient. There are several books that provide advice on obtaining initial parameter estimates for non-linear models. Google Books might help you. Cheers Andrew On Tue, May 03, 2011 at 09:08:03PM -0700, sterlesser wrote: the original data are V2 =c(371000,285000 ,156000, 20600, 4420, 3870, 5500 ) T2=c( 0.3403 ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553) nls2=nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9 ,cl=6.2,t0=8.7)) after execution error occurs as below Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates In addition: Warning messages: 1: In lhs - rhs : longer object length is not a multiple of shorter object length 2: In .swts * attr(rhs, gradient) : longer object length is not a multiple of shorter object length could anyone help me ?thansks -- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3494454.htm l Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and StatisticsTel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple lattice question
-Mensaje original- De: Peter Ehlers [mailto:ehl...@ucalgary.ca] Enviado el: jueves, 31 de marzo de 2011 18:09 Para: Rubén Roa CC: r-help@r-project.org Asunto: Re: [R] Simple lattice question On 2011-03-31 06:58, Rubén Roa wrote: Thanks Peters! Just a few minor glitches now: require(lattice) data- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4 ,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), points=TRUE, lines=TRUE, pch=1:4, lty=1:4, type='b', text=list(lab = as.character(unique(data$z) I have an extra column of symbols on the legend, and, would like to add a title to the legend. Such as 'main' for plots. Any suggestions? for key(), make 'lines' into a list: xyplot(x~y|SP,data=data,groups=z,layout=c(2,3), pch=1:4,lty=1:4,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), title=title here, cex.title=.9, lines.title=3, lines=list(pch=1:4, lty=1:4, type='b'), text=list(lab = as.character(unique(data$z) Peter Ehlers ... that works. Thanks Peter (sorry I misspelled your name b4). The clean code is now: require(lattice) data - data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), lines=list(pch=1:4, lty=1:4, type='b'), title=expression(CO^2), text=list(lab = as.character(unique(data$z) David's code works too (thanks to you too!) and is somewhat shorter xyplot(x~y|SP, data=data,groups=z, layout=c(2,3), par.settings=simpleTheme(pch=1:4,lty=1:4,col='black'), type=b, auto.key = list(x = .6, y = .7, lines=TRUE, corner = c(0, 0))) but the lines and symbols are on different columns, and the line types look as if they were in bold. Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simple lattice question
DeaR ComRades, require(lattice) data - data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b') How do I put a legend for the grouping variable in the empty upper-right panel? TIA Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple lattice question
Thanks Peters! Just a few minor glitches now: require(lattice) data - data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), points=TRUE, lines=TRUE, pch=1:4, lty=1:4, type='b', text=list(lab = as.character(unique(data$z) I have an extra column of symbols on the legend, and, would like to add a title to the legend. Such as 'main' for plots. Any suggestions? Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: Peter Ehlers [mailto:ehl...@ucalgary.ca] Enviado el: jueves, 31 de marzo de 2011 15:41 Para: Rubén Roa CC: r-help@r-project.org Asunto: Re: [R] Simple lattice question On 2011-03-31 03:39, Rubén Roa wrote: DeaR ComRades, require(lattice) data- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='bl ack',type='b') How do I put a legend for the grouping variable in the empty upper-right panel? See the help page for xyplot for an example using the iris data. You just need to add something like auto.key = list(x = .6, y = .7, corner = c(0, 0)) to your lattice call. See the 'key' entry on the ?xyplot page. Peter Ehlers TIA Rubén __ __ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Degrees of freedom for lm in logLik and AIC
However, shouldn't _free parameters_ only be counted for degrees of freedom and for calculation of AIC? The sigma parameter is profiled out in a least-squares linear regression, so it's not free, it's not a dimension of the likelihood. Just wondering ... Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Frank Harrell Enviado el: lunes, 28 de marzo de 2011 15:44 Para: r-help@r-project.org Asunto: Re: [R] Degrees of freedom for lm in logLik and AIC Thank you Peter. I didn't realize that was the convention used. Frank Peter Dalgaard-2 wrote: On Mar 28, 2011, at 05:36 , Frank Harrell wrote: gt; I have a question about the computation of the degrees of freedom in a linear gt; model: gt; gt; x lt;- runif(20); y lt;- runif(20) gt; f lt;- lm(y ~ x) gt; logLik(f) gt; 'log Lik.' -1.968056 (df=3) gt; gt; The 3 is coming from f$rank + 1. Shouldn't it be f$rank? This affects gt; AIC(f). I count three parameters in a simple linear regression: alpha, beta, sigma. From a generic-likelihood point of view, I don't see how you can omit the last one. -pd -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Degrees-of-freedom-for-lm-in-log Lik-and-AIC-tp3410687p3411759.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R helps win competitions
DeaR ComRades, This is a quote from a News article in Science's 11-February issue, about competitions to model data: For Chris Raimondi, a search-engine expert based in Baltimore, Maryland, and winner of the HIV-treatment competition, the Kaggle contest motivated him to hone his skills in a newly learned computer language called R, which he used to encode the winning data model. Raimondi also enjoys the competitive aspect of Kaggle challenges: It was nice to be able to compare yourself with others; ... it became kind of addictive. ... I spent more time on this than I should. If you are interested read the full article here: http://www.sciencemag.org/content/331/6018/698.full Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
-Mensaje original- De: Kjetil Halvorsen [mailto:kjetilbrinchmannhalvor...@gmail.com] Enviado el: jueves, 17 de marzo de 2011 16:19 Para: Rubén Roa CC: Alexx Hardt; r-help@r-project.org Asunto: Re: [R] R² for non-linear model see inline. On Thu, Mar 17, 2011 at 4:58 AM, Rubén Roa r...@azti.es wrote: Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: (d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared). My reading of this is that AIC can be used to compare models with densities relative to the same dominating measure. Kjetil I think this is correct. It is probably not wise to use the AIC to compare distribution models based on the counting measure with distribution models based on the Lebesgue measure! Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: (d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared). Last, if you read Akaike's theorem you will see there is nothing precluding comparing models built on different distributional models. Here it is: the expected (over the sample space and the space of parameter estimates) maximum log-likelihood of some data on a working model overshoots the expected (over the sample space only) maximum log-likelihood of the data under the true model that generated the data by exactly the number of parameters in the working model. A remarkable result. Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of Alexx Hardt Sent: Wed 3/16/2011 7:42 PM To: r-help@r-project.org Subject: Re: [R] R² for non-linear model Am 16.03.2011 19:34, schrieb Anna Gretschel: Am 16.03.2011 19:21, schrieb Alexx Hardt: And to be on-topic: Anna, as far as I know anova's are only useful to compare a submodel (e.g. with one less regressor) to another model. thanks! i don't get it either what they mean by fortune... It's an R-package (and a pdf [1]) with collected quotes from the mailing list. Be careful with the suggestion to use AIC. If you wanted to compare two models using AICs, you need the same distribution (that is, Verteilungsannahme) in both models. To my knowledge, there is no way to compare a gaussian model to an exponential one (except common sense), but my knowledge is very limited. [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf -- alexx@alexx-fett:~$ vi .emacs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to produce glm graph
In addition to David's suggestion, you might want to examine the termplot function, ?termplot HTH -Original Message- From: r-help-boun...@r-project.org on behalf of David Winsemius Sent: Sat 11/20/2010 3:54 PM To: Sonja Klein Cc: r-help@r-project.org Subject: Re: [R] How to produce glm graph On Nov 20, 2010, at 4:27 AM, Sonja Klein wrote: I'm very new to R and modeling but need some help with visualization of glms. [snip] Is there a way to produce a graph in R that has these features? Of course. Modeling would be of little value without such capability. In R, regression functions produce an object with a particular class (glm in this case) and there is generally have predict method for each class. There is also a vector of fitted values within the object that may be accessed with the fitted or fitted values functions. The predict.glm help page has a worked example. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 0.3 is not 0.3, bug in seq() function?
Enrico, The same happens with other numbers/sequences. seq(0.1,0.9,0.1)[7]==0.7 [1] FALSE seq(0.1,1.3,0.1)[12]==1.2 [1] FALSE Rounding seems to fix it, round(seq(0.1,0.5,0.1),1)[3]==0.3 round(seq(0.1,0.9,0.1),1)[7]==0.7 round(seq(0.1,1.3,0.1),1)[12]==1.2 They all return TRUE. Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Enrico R. Crema Enviado el: jueves, 28 de octubre de 2010 12:24 Para: r-help@r-project.org Asunto: [R] 0.3 is not 0.3, bug in seq() function? Dear List, I've been running a numerical simulation and I found this odd error in my code where the which command could not identify which rows of a column of data.frame were corresponding to the value 0.3. There are 7 unique values in this column (0.01,0.05,0.1,0.2,0.3,0.4,0.5), and this does not work only for 0.3. So I looked at the column and manually tried to use the which() command, and the results were all FALSE despite I could see those number. So I recreated my sequence of number and tested: seq(0.1,0.5,0.1)[3]==0.3 which gave me FALSE!!! All the other numbers (0.1,0.2,0.4,0.5) give me TRUE, but 0.3 was not working. So I did: seq(0.1,0.5,0.1)[3]-0.3 which gave me 5.551115e-17. If you run a similar sequence like: seq(0.2,0.6,0.1)[2]==0.3 this will still give me FALSE. No, for my own purpose, I fixed the problem in this way: zerothree=seq(0.1,0.5,0.1)[3] which(data[,1]==zerothree) but I guess this bug is a bit of problem...Apologies if it is the wrong place to post this bug, and apologies also if this was a known issue. My version of R is : platform x86_64-pc-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status major 2 minor 10.1 year 2009 month 12 day14 svn rev50720 language R version.string R version 2.10.1 (2009-12-14) Many Thanks, Enrico __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Set value if else...
x - data.frame(x=1:10) require(gtools) x$y - ifelse(odd(x$x),0,1) HTH R. Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Joel Enviado el: viernes, 15 de octubre de 2010 10:16 Para: r-help@r-project.org Asunto: [R] Set value if else... Hi I want to set a variable to either 1 or 0 depending on an value of a dataframe and then add this as a colum to the dataframe. This could be done with a loop but as we are able to do questions on a complete row or colum without a loop it would be sweet if it could be done. for example: table: Name Age Joel 24 agust 17 maja40 and so on... And what I want to do is a command that gives me VoteRight-1 if table$age18 else set VoteRight to 0 Then use cbind or something to add it to the table. so i get Name Age VoteRight Joel 241 agust 170 maja401 And as I said before im guessing this can be done without a loop... //Joel -- View this message in context: http://r.789695.n4.nabble.com/Set-value-if-else-tp2996667p2996667.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory limit problem
Hi, Probably Windows cann't allocate enough contiguous free space. Try this: Find boot.ini (usually at the root c:\) Without changing anything else, add this line at the end of the script: multi(0)disk(0)rdisk(0)partition(1)\WINDOWS=Microsoft Windows XP Professional 3GB /3GB /noexecute=optin /fastdetect Reboot. When prompted select the 3GB boot-up. Performance will decay but bigger objects could be saved to disk. Hope that this will be enough to get you code working. When finished re-boot and start with the normal boot-up. An example of a boot.ini script that I had to prepare for one big simulation work in an XP machine is at the end of my message. HTH Rubén [boot loader] timeout=30 default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS [operating systems] multi(0)disk(0)rdisk(0)partition(1)\WINDOWS=Microsoft Windows XP Professional /noexecute=optin /fastdetect multi(0)disk(0)rdisk(0)partition(1)\WINDOWS=Microsoft Windows XP Professional 3GB /3GB /noexecute=optin /fastdetect -Original Message- From: r-help-boun...@r-project.org on behalf of Tim Clark Sent: Tue 10/12/2010 5:49 AM To: r help r-help Cc: Tim Clark Subject: [R] Memory limit problem Dear List, I am trying to plot bathymetry contours around the Hawaiian Islands using the package rgdal and PBSmapping. I have run into a memory limit when trying to combine two fairly small objects using cbind(). I have increased the memory to 4GB, but am being told I can't allocate a vector of size 240 Kb. I am running R 2.11.1 on a Dell Optiplex 760 with Windows XP. I have pasted the error message and summaries of the objects below. Thanks for your help. Tim xyz-cbind(hi.to.utm,z=b.depth$z) Error: cannot allocate vector of size 240 Kb memory.limit() [1] 4000 memory.size() [1] 1971.68 summary(hi.to.utm) Object of class SpatialPoints Coordinates: min max x 708745.5 923406.7 y 2046153.1 2327910.9 Is projected: TRUE proj4string : [+proj=utm +zone=4 +datum=NAD83 +ellps=GRS80 +towgs84=0,0,0] Number of points: 15328 str(hi.to.utm) Formal class 'SpatialPoints' [package sp] with 3 slots ..@ coords : num [1:15328, 1:2] 708746 710482 712218 713944 715681 ... .. ..- attr(*, dimnames)=List of 2 .. .. ..$ : NULL .. .. ..$ : chr [1:2] x y ..@ bbox : num [1:2, 1:2] 708746 2046153 923407 2327911 .. ..- attr(*, dimnames)=List of 2 .. .. ..$ : chr [1:2] x y .. .. ..$ : chr [1:2] min max ..@ proj4string:Formal class 'CRS' [package sp] with 1 slots .. .. ..@ projargs: chr +proj=utm +zone=4 +datum=NAD83 +ellps=GRS80 +towgs84=0,0,0 summary(b.depth) x y z Min. :-157.0 Min. :18.50 Min. :-5783 1st Qu.:-156.6 1st Qu.:18.98 1st Qu.:-4565 Median :-156.1 Median :19.80 Median :-3358 Mean :-156.1 Mean :19.73 Mean :-3012 3rd Qu.:-155.5 3rd Qu.:20.41 3rd Qu.:-1601 Max. :-155.0 Max. :21.00 Max. : 0 str(b.depth) 'data.frame': 15328 obs. of 3 variables: $ x: num -157 -157 -157 -157 -157 ... $ y: num 21 21 21 21 21 ... $ z: num -110 -114 -110 -88 -76 -122 -196 -224 -240 -238 ... Tim Clark Marine Ecologist National Park of American Samoa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [Fwd: RE: maximum likelihood problem]
If your problem was as you described it, you'd simply find the 1st derivative of your eq. w.r.t. k, equate to 0, and then solve for k (and check that the solution is a maximum). But I guess what you really want to do is to estimate k from data given your equation and _another model_ for the likelihood of the data. If that is the case then you haven't yet defined the model for the likelihood of the data as a function of your single parameter k. You have to define an R function that contains your process model (apparently your equation) and your likelihood model, which will be made from some probability model for the data. Try this to get help on how to write new functions: ?function After defining this function with the two models then you can maximize the likelihood of the data as a function of the process model parameter k. If there is only one parameter then it is recommended to use optimise(), instead of optim(). It is not necessary to maximize, just minimize the negative of the log(likelihood) once you have defined what this likelihood is. HTH Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of mlar...@rsmas.miami.edu Sent: Sat 10/2/2010 3:11 AM To: r-help@r-project.org Subject: [R] [Fwd: RE: maximum likelihood problem] I forgot to add that I first gave a starting value for K. Nonlinear least squares won't work because my errors are not normally distributed. Any advide on my maximum likelihood function would be greatly appreciated. Original Message Subject: RE: [R] maximum likelihood problem From:Ravi Varadhan rvarad...@jhmi.edu Date:Fri, October 1, 2010 5:10 pm To: mlar...@rsmas.miami.edu r-help@r-project.org -- Do you want to do a nonlinear least-squares estimation (which is MLE if the errors are Gaussian)? If so, you have to define a function that takes the parameter (k) and data matrix (LR, T, LM), as arguments, and returns a scalar, which is the residual sum of squares. Then you can optimize (minimize) that function. Ravi. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of mlar...@rsmas.miami.edu Sent: Friday, October 01, 2010 4:40 PM To: r-help@r-project.org Subject: [R] maximum likelihood problem I am trying to figure out how to run maximum likelihood in R. Here is my situation: I have the following equation: equation-(1/LR-(exp(-k*T)*LM)*(1-exp(-k))) LR, T, and LM are vectors of data. I want to R to change the value of k to maximize the value of equation. My attempts at optim and optimize have been unsuccessful. Are these the recommended functions that I should use to maximize my equation? With optim I wanted the function to be maximized so I had to make the fnscale negative. Here is what I put: L-optim(k,equation,control=(fnscale=-1)) My result: Error: could not find function fn Here is what I put for optimize: L-optimise(equation,k,maximum=TRUE) My result: Error: 'xmin' not less than 'xmax' Any advise would be greatly appreciated. Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] geoR - likfit failure
Questions about geoR should be directed to R-SIG-GEO. Anyway, you should provide more info about your problem. Read the Posting Guide. Have you tried changing the model? Sometimes falling back from Matern to exponential or Gaussian allows successful convergence. HTH Rubén De: r-help-boun...@r-project.org en nombre de Mohammed Ouassou Enviado el: mié 04/08/2010 10:36 Para: R-help@r-project.org Asunto: [R] geoR - likfit failure Hi I'm using geoR package to perform linear spatial interpolation(OK). The function likfit() fails to compute REML. The error meassage is : Error in solve.default(v$varcov, xmat); How I can find out that likfit() is failed to process and retrieving the error message ? Thank you so much for your help. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ed50
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Dipa Hari Enviado el: lunes, 12 de julio de 2010 22:19 Para: r-help@r-project.org Asunto: [R] ed50 I am using semiparametric Model library(mgcv) sm1=gam(y~x1+s(x2),family=binomial, f) How should I find out standard error for ed50 for the above model ED50 =( -sm1$coef[1]-f(x2)) / sm1$coef [2] f(x2) is estimated value for non parametric term. Thanks Two ways, 1) Re-parameterise the model so that ed50 is an explicit parameter in the model, or 2) Taylor series (aka Delta method) using the standard errors of coef[1], coef[2] and their correlation. HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] selection of optim parameters
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Fabian Gehring Enviado el: lunes, 05 de julio de 2010 21:53 Para: r-help@r-project.org Asunto: [R] selection of optim parameters Hi all, I am trying to rebuild the results of a study using a different data set. I'm using about 450 observations. The code I've written seems to work well, but I have some troubles minimizing the negative of the LogLikelyhood function using 5 free parameters. As starting values I am using the result of the paper I am rebuiling. The system.time of the calculation of the function is about 0.65 sec. Since the free parameters should be within some boundaries I am using the following command: optim(fn=calculateLogLikelyhood, c(0.4, 2, 0.4, 8, 0.8), lower=c(0,0,0,0,0), upper=c(1, 5, Inf, Inf, 1), control=list(trace=1, maxit=1000)) Unfortunately the result doesn't seem to be reasonable. 3 of the optimized parameters are on the boundaries. 1) Your parameters seem to vary over several orders of magnitude. The control argument has a parscale parameter that can be used to re-scale all parameters to the same order of magnitude. Alternatively, your could estimate the log of your parameters, say par=c(log(0.4), log(2), log(0.4), log(8), log(0.8) (and equivalent changes in lower and upper), and in your function, instead of the parameter value, use exp(parameter value9. That way the _numerical optimization_ occurs in the log space whereas the _function evaluation_ occurs in the original space. This transformation approach would make your parameter estimates and their hessian matrix (in case you are interested in the hessian) be output in the transformed space, so estimates and their covariance matrix will have to be back-transformed. For estimates just use exp(), whereas for the covariance matrix you might have to use something like Taylor series. 2) Did you use L-BFGS-B in the method argument of optim()? This method admits box-constrained optimization whereas the default (which you seem to be using, Nelder-Mead) in unconstrained, as far as I know. Unfortunately I don't have much experience using optimizatzion methods. That's why I am asking you. Do you have any hints for me what should be taken into account when doing such an optimization. Is there a good way to implement the boundaries into the code (instead of doing it while optimizing)? I've read about parscale in the help-section. Unfortunately I don't really know how to use it. And anyways, could this help? What other points/controls should be taken into account? About using parscale, see Ravi Varadhan's post Re: optim() not finding optimal values the 27th of June. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] optim() not finding optimal values
Derek, As a general strategy, and as an alternative to parscale when using optim, you can just estimate the logarithm of your parameters. So in optim the par argument would contain the logarithm of your parameters, whereas in the model itself you would write exp(par) in the place of the parameter. The purpose of this is to bring all parameters to a similar scale to aid the numerical algorithm in finding the optimum over several dimensions. Due to the functional invariance property of maximum likelihood estimates your transformed pars back to the original scale are also the MLEs of the pars in your model. If you were using ADMB you'd get the standard error of the pars in the original scale simply by declaring them sd_report number class. With optim, you would get the standard error of pars in the original scale post-hoc by using Taylor series (a.k.a. Delta method) which in this case is very simple since the transformation is just the exponential. In relation to your model/data combination, since you have only 17 years of data and just one series of cpue, and this is a rather common case, you may want to give the choice to set B0=K, i.e. equilibrium conditions at the start, in your function, to reduce the dimensionality of your profile likelihood approximation thus helping the optimizer. HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Derek Ogle Enviado el: sábado, 26 de junio de 2010 22:28 Para: R (r-help@R-project.org) Asunto: [R] optim() not finding optimal values I am trying to use optim() to minimize a sum-of-squared deviations function based upon four parameters. The basic function is defined as ... SPsse - function(par,B,CPE,SSE.only=TRUE) { n - length(B) # get number of years of data B0 - par[B0]# isolate B0 parameter K - par[K] # isolate K parameter q - par[q] # isolate q parameter r - par[r] # isolate r parameter predB - numeric(n) predB[1] - B0 for (i in 2:n) predB[i] - predB[i-1]+r*predB[i-1]*(1-predB[i-1]/K)-B[i-1] predCPE - q*predB sse - sum((CPE-predCPE)^2) if (SSE.only) sse else list(sse=sse,predB=predB,predCPE=predCPE) } My call to optim() looks like this # the data d - data.frame(catch= c(9,113300,155860,181128,198584,198395,139040,109969,71896 ,59314,62300,65343,76990,88606,118016,108250,108674), cpe=c(109.1,112.4,110.5,99.1,84.5,95.7,74.1,70.2,63.1,66.4,60. 5,89.9,117.0,93.0,116.6,90.0,105.1)) pars - c(80,100,0.0001,0.17) # put all parameters into one vector names(pars) - c(B0,K,q,r) # name the parameters ( SPoptim - optim(pars,SPsse,B=d$catch,CPE=d$cpe) )# run optim() This produces parameter estimates, however, that are not at the minimum value of the SPsse function. For example, these parameter estimates produce a smaller SPsse, parsbox - c(732506,1160771,0.0001484,0.4049) names(parsbox) - c(B0,K,q,r) ( res2 - SPsse(parsbox,d$catch,d$cpe,SSE.only=FALSE) ) Setting the starting values near the parameters shown in parsbox even resulted in a movement away from (to a larger SSE) those parameter values. ( SPoptim2 - optim(parsbox,SPsse,B=d$catch,CPE=d$cpe) )# run optim() This issue most likely has to do with my lack of understanding of optimization routines but I'm thinking that it may have to do with the optimization method used, tolerance levels in the optim algorithm, or the shape of the surface being minimized. Ultimately I was hoping to provide an alternative method to fisheries biologists who use Excel's solver routine. If anyone can offer any help or insight into my problem here I would be greatly appreciative. Thank you in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with package installation
It was the Panda antivirus installed in the main corporate server of my company. This antivirus program was corrupting the zip files containing the packages for Windows systems. The problem was solved by including CRAN repositories as trustable webpages. Best, Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] Enviado el: lunes, 21 de junio de 2010 16:52 Para: Rubén Roa CC: r-help@r-project.org Asunto: Re: [R] Problem with package installation No idea, do you have the right repository seledcted? Is there nay proxy in your network that may change the download data for some reason? I have not observed that before and I haven't seen similar reports before. Best, Uwe Ligges On 21.06.2010 11:03, Rubén Roa wrote: Dear ComRades, I am having the wrong MD5 checksums error with every package I try to install. It happened with R 2.11.0, then I updated to R 2.11.1, same thing. sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252LC_MONETARY=Spanish_Spain.1252 [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.11.1 I triend changing the mirror, to no avail. Then I tried downloading zip files and installing from local zips but again I got the wrong MD5 checksums error utils:::menuInstallLocal() files R/MASS.rdb, data/Rdata.rdb, help/MASS.rdb have the wrong MD5 checksums In some cases, another error message pointed to a namespace problem, for example: library(Formula) Error in get(Info[i, 1], envir = env) : internal error -3 in R_decompress1 Error: package/namespace load failed for 'Formula' Any hints? __ __ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with package installation
Dear ComRades, I am having the wrong MD5 checksums error with every package I try to install. It happened with R 2.11.0, then I updated to R 2.11.1, same thing. sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.11.1 I triend changing the mirror, to no avail. Then I tried downloading zip files and installing from local zips but again I got the wrong MD5 checksums error utils:::menuInstallLocal() files R/MASS.rdb, data/Rdata.rdb, help/MASS.rdb have the wrong MD5 checksums In some cases, another error message pointed to a namespace problem, for example: library(Formula) Error in get(Info[i, 1], envir = env) : internal error -3 in R_decompress1 Error: package/namespace load failed for 'Formula' Any hints? Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [ADMB Users] an alternative to R for nonlinear stat models
De: Chris Gast [mailto:cmg...@gmail.com] Enviado el: jueves, 17 de junio de 2010 22:32 Para: Rubén Roa CC: r-help@r-project.org; us...@admb-project.org Asunto: Re: [ADMB Users] an alternative to R for nonlinear stat models I spoke with my colleague who did most of the testing, and he has informed me that much of the hessian sensitivity actually came from a separate program (based on Numerical Recipes in C++ code) that did not use optim(), after having stopped using optim() due to speed issues. In my experience with optim, the reltol argument has improved important in this regard. Very small changes in the parameter estimates at the converged solution (influenced by reltol) can lead to different standard error estimates by inverting the hessian, especially for parameter estimates close to zero (as vulnerability coefficients can be in many models with such a feature). It is a limitation of the finite difference method for computing the hessian based on optimal parameter estimates. Chris If the problem originates in estimates being close to zero, a simple transformation (log or exp) might help. However, what I would really like to know is the performance of different methods of the optim function. I guess the reltol parameter and other control parameters would act different depending on the method, whether Nelder-Mead, CG, etc. I am currently experimenting with CG and although it is slow for my model, it has always produced hessian matrices that were invertible and with all diagonals positive after inversion (unlike Nelder-Mead, the default). Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [ADMB Users] an alternative to R for nonlinear stat models
De: users-boun...@admb-project.org [mailto:users-boun...@admb-project.org] En nombre de Chris Gast Enviado el: miércoles, 16 de junio de 2010 21:11 Para: Arni Magnusson CC: r-help@r-project.org; us...@admb-project.org Asunto: Re: [ADMB Users] an alternative to R for nonlinear stat models Hi Arni (and others), My dissertation work involves use (and extension) of models of the same ilk (sometimes exactly the same) as those described by Nancy Gove and John Skalski in their 2002 article. I began with R, and moved to my own home-brewed C/C++ programs for the sake of of speed when fitting models and real and simulated data. In addition, we found that the estimated standard errors (based on the inverse hessian output from optim()) were very sensitive to tolerance criteria--often changing orders of magnitude. Hi, Regarding the last bit, optim() has several methods (Nelder-Mead, simulated annealing, conjugate gradient, etc). It is interesting to me which method produced what result with the standard errors from the inverse Hessian. Can you briefly ellaborate? Thanks Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding year information to a scatter plot
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Ozan Bakis Enviado el: domingo, 02 de mayo de 2010 21:25 Para: r-help@r-project.org Asunto: [R] adding year information to a scatter plot Hi R users, I would like to add year information to each point in a scatter plot. The following example shows what i mean: df=data.frame(year=c(2001,2002,2003),a=c(8,9,7),b=c(100,90,95)) df plot(b~a,data=df) Now, I would like to see which point belongs to 2001, 2002 and 2003 in the above graphic. Thank you very much, ozan Use text, df - data.frame(year=c(2001,2002,2003),a=c(8,9,7),b=c(100,90,95)) df plot(b~a,data=df,ylim=c(.98*min(b),1.02*max(b))) text(x=df$a,y=df$b-.01*max(df$b),lab=format(df$year)) Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non linear estimation
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de JamesHuang Enviado el: jueves, 29 de abril de 2010 3:38 Para: r-help@r-project.org Asunto: Re: [R] non linear estimation any suggestion? actually I just wanna know if there is a package for non linear estimation with restriction, thanks. I am a new for R I do not know if there is any specific package for optimization with restrictions, but you can use optim with method=L-BFGS-B This only lets you set bounds of single parameters, so for restrictions such as a+b19 in Y=a+(b+c*x)*exp(-d*x) you could deduce your restrictions in terms of single parameters (for example, in your original mail you put that a10, a+b19, and b3, so the restriction a+b19 is actually redundant), or else you could think of some re-parameterization that would put a+b (and all other multi-par restrictions) as a single parameter. Wait, is this a homework? Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Converting daily data series to monthly series
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de zow...@ncst.go.ke Enviado el: miércoles, 21 de abril de 2010 5:16 Para: r-help@r-project.org Asunto: [R] Converting daily data series to monthly series Hi Users, I have daily series of data from 1962 - 2000, with the data for February 29th in leap years excluded, leaving 365 daily values for each year. I wish to convert the daily series to monthly series. How can I do this using the zoo package or any other package? Thanks ZABLONE OWITI GRADUATE STUDENT Nanjing University of Information, Science and Technology College of International Education Let df be your dataframe, #In case you have to format your data as date before setting the montth df$date - as.Date(df$date, %d/%m/%Y) #Getting year, month and week from your correctly formatted date df$Year - as.numeric(format(df$date, %Y))#Year df$Month - as.numeric(format(df$date, %m))#Month df$Week - as.numeric(format(df$date, %W)) +1 #Week Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scanning only specific columns into R from a VERY large file
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Josh B Enviado el: sábado, 17 de abril de 2010 0:12 Para: R Help Asunto: [R] Scanning only specific columns into R from a VERY large file Hi, I turn to you, the R Sages, once again for help. You've never let me down! (1) Please make the following toy files: x - read.table(textConnection(var.1 var.2 var.3 var.1000 indv.1 1 5 9 7 indv.21 2 9 3 8), header = TRUE) y - read.table(textConnection(var.3 var.1000), header = TRUE) write.csv(x, file = x.csv) write.csv(y, file = y.csv) (2) Pretend you are starting with the files x.csv and y.csv. They come from another source -- an online database. Pretend that these files are much, much, much larger. Specifically: (a) Pretend that x.csv contains 1000 columns by 210,000 rows. (b) y.csv contains just header titles. Pretend that there are 90 header titles in y.csv in total. These header titles are a subset of the header titles in x.csv. (3) What I want to do is scan (or import, or whatever the appropriate word is) only a subset of the columns from x.csv into an R. Specifically, I only want to scan the columns of data from x.csv into R that are indicated in the file y.csv. I still want to scan in all 21 rows from x.csv, but only for the aforementioned columns listed in y.csv. Can you guys recommend a strategy for me? I think I need to use the scan command, based on the hugeness of x.csv, but I don't know what exactly to do. Specific code that gets the job done would be the most useful. Thank you very much in advance! Josh --- Try with something like do.call(cbind,scan(file=yourfile.csv,what=list(NULL,NULL,,0,NULL,0,NULL,NULL,...,NULL),flush=TRUE)) you have to work out how to set up the list of parameter 'what' to read the headers of 'y'. In the above the only columns read are those indicated by a '0'. HTH Ruben Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Truncated Normal Distribution and Truncated Pareto distribution
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Julia Cains Enviado el: lunes, 19 de abril de 2010 9:22 Para: r-help@r-project.org Asunto: [R] Truncated Normal Distribution and Truncated Pareto distribution Dear R helpers, I have a bimodal dataset dealing with loss amounts. I have divided this dataset into two with the bounds for the first dataset i.e. dataset-A being 5,000$ to 100,000$ and the dataset-B deals with the losses exceeding 100,000$ i.e. dataset-B is left truncated. I need to fit truncated normal disribution to dataset - I having lower bound of 5000 and upper bound of 100,000. While I need to fit truncated Pareto for the lossess exceeding 100,000$. Is there any package in R which will guide me to fit these two distrubitions also giving KS (Kolmogorov Smirnov) test and Anderson Darling test results. Please guide Julia --- See library(MASS) ?fitdistr You can define your customized truncated density as a function in the parameter densfun of fitdistr. See also http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34540.html http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34548.html HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN Only a man of Worth sees Worth in other men [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove from the R mailing list
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de zoe zhang Enviado el: martes, 30 de marzo de 2010 12:18 Para: r-help@r-project.org Asunto: [R] remove from the R mailing list I would like to be removed from the R mailing list. Thanks. --- Hi, Would you like me to remove you? Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Corrado Enviado el: martes, 30 de marzo de 2010 16:52 Para: r-help@r-project.org Asunto: [R] From THE R BOOK - Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! Dear friends, I am testing glm as at page 514/515 of THE R BOOK by M.Crawley, that is on proportion data. I use glm(y~x1+,family=binomial) y is a proportion in (0,1), and x is a real number. I get the error: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! But that is exactly what was suggested in the book, where there is no mention of a similar warning. Where am I going wrong? Here is the output: glm(response.prepared~x,data=,family=binomial) Call: glm(formula = response.prepared ~ x, family = binomial, data = ) Coefficients: (Intercept)x -0.3603 0.4480 Degrees of Freedom: 510554 Total (i.e. Null); 510553 Residual Null Deviance: 24420 Residual Deviance: 23240AIC: 700700 Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! Regards -- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18,Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk --- Probably you are misreading Crawley's Book? A proportion would usually be modeled with the Beta distribution, not the binomial, which is for counts. If you are modeling a proportion try the betareg function in betareg package. HTH Ruben Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] add information above bars of a barplot()
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Martin Batholdy Enviado el: lunes, 22 de marzo de 2010 15:53 Para: r help Asunto: [R] add information above bars of a barplot() hi, I have a barplot with six clusters of four bars each. Now I would like to add the exact value of each bar as a number above the bar. I hoped to get some tips here. I could simply add text at the different positions, but I don't understand how the margins on the x-axis are calculated (how can I get / calculate the x-ticks of a barplot?). Also I would like to code this flexible enough so that it still works when I have more bars in each cluster. thanks for any suggestions! If you are barplotting x barplot(x) text(x=barplot(x),y=x,label=format(x),po=3) should get you closer to what you want. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tm[,-1]
-Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de ManInMoon Enviado el: jueves, 11 de marzo de 2010 12:22 Para: r-help@r-project.org Asunto: [R] tm[,-1] This does what I was hoping it would: aggregate(tm[,-1],by=list(tm[,10]),FUN=mean) but I don't know what tm[,-1] means (well - the -1 bit anyway. Does it somehow means the whole matrix? --- No, it means the matrix 'tm' minus the 1st column, (a - matrix(rnorm(16,4,5),4,4)) a[,-1] Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tinn-R RGui Send problem
ComRades, On a recent Windows XP system that I have to use at work, I installed Tinn-R 2.3.4.4 and R 2.10.1 When I mark-select a piece of code, say summary(x) written in Tinn-R and send it to the RGui using the send button, the RGui console instead of showing me my piece of code, it shows something like source(.trPaths[5], echo=FALSE, max.deparse.length=150) The code is executed fine but if I want to re-call it to the console, say to make a little change, using the Up arrow of the keyboard, I do not receive the code but again the line quoted above. That's not very useful. Checking .TrPaths[5] shows that Tinn-R is writing a temporary file in ..\\Program Data\\tmp\selection.r and that file contains my code, and is being overwritten with every new code that I select and send to the console. Does anyone know what to do to make the console re-call the selected piece of code instead of the call to the temporary file? I guess it is a Tinn-R issue Thanks Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tcltk
I doubt it. Guess why? Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Vasco Cadavez Enviado el: lunes, 08 de marzo de 2010 15:46 Para: r-help@r-project.org Asunto: [R] tcltk Hi, I'm trying to install tcltk in R-2.10.1, however I get error. someone can help? thanks Vasco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random real numbers
From what distribution? If the uniform, runif(100,0,2*pi) If another, install package Runuran, and do this ?Runuran Vignette(Runuran) HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de frederik vanhaelst Enviado el: martes, 02 de marzo de 2010 11:52 Para: r-h...@stat.math.ethz.ch Asunto: [R] Random real numbers Hi, How could i generate random real numbers between 0 en 2*pi? Thanks, Frederik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plotmath: suprscript on scalable delimiter?
ComRades, How do you put a superscript on a scalable delimiter? I want to put 'b' as the power of the expression in the following plot: t - 1:25 K - 0.2 y - ((1-exp(-K*t))/(1-exp(-K*t)*exp(K)))^3 plot(t,y,l,main=K=0.2, b=3) text(15,5,expression(bgroup((,frac(1-e^-Kt,1-e^-Kt*e^K), Plotmath examples in demo(plotmath) do not include this. I've tried a few things to no avail and I did an RSiteSearch(superscript delimiter) with zero results. Thx Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] discriminant analysis
Beatriz Yannicelli wrote: Dear all: Is it possible to conduct a discriminant analysis in R with categorical and continuous variables as predictors? Beatriz Beatriz, Simply doing this in the R console: RSiteSearch(discriminant) yields many promising links. In particular, check documentation of package mda. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] MLE for bimodal distribution
_nico_ wrote: Hello everyone, I'm trying to use mle from package stats4 to fit a bi/multi-modal distribution to some data, but I have some problems with it. Here's what I'm doing (for a bimodal distribution): # Build some fake binormally distributed data, the procedure fails also with real data, so the problem isn't here data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3)) # Just to check it's bimodal plot(density(data)) f = function(m, s, m2, s2, w) { -log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2, sd=s2)) ) } res = mle(f, start=list(m=3, s=0.5, m2=5, s2=0.35, w=0.6)) This gives an error: Error in optim(start, f, method = method, hessian = TRUE, ...) : non-finite finite-difference value [2] In addition: There were 50 or more warnings (use warnings() to see the first 50) And the warnings are about dnorm producing NaNs So, my questions are: 1) What does non-finite finite-difference value mean? My guess would be that an Inf was passed somewhere where a finite number was expected...? I'm not sure how optim works though... 2) Is the log likelihood function I wrote correct? Can I use the same type of function for 3 modes? I think it is correct but it is difficult to fit as others have pointed out. As an alternative, you may discretise your variable so the mixture is fit to counts on a histogram using a multinomial likelihood. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in the NY Times
Zaslavsky, Alan M. wrote: This article is accompanied by nice pictures of Robert and Ross. Data Analysts Captivated by Power of R http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html Thanks for the heads up. The R morale is going through the roof! I've given three courses on R since the second half of 2007 here in Chile (geostatistics, Fisheries Libraries for R, and generalized linear models) and all my three audiences (professionals working in academia, government, and private research institutions) were very much impressed by the power of R. I spent as much time on R itself as on the statistical topics, since students wanted to learn data management and graphics once they started to grasp the basic elements. R creators, Core Team, package creators and maintainers, and experts on the list, thanks so much for such a great work and such an open attitude. You lead by example. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AD Model Builder now freely available
dave fournier wrote: Hi All, Following Mike Praeger's posting on this list, I'm happy to pass on that AD Model Builder is now freely available from the ADMB Foundation. http://admb-foundation.org/ Two areas where AD Model builder would be especially useful to R users are multi-parmater smooth optimization as in larg scale maximum likelihood estimation and the analysis of general nonlinear random effects models. Cheers, Dave Thank you Dave, this is really great news! So now I can download ADMB and ADMB-RE for free? And use Mike's X2R package to connect my ADMB scripts with the power of R. Only one word: awesome! Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting data to a sigmoidal curve
Greg Snow wrote: Sarah, Doing: RSiteSearch('gompertz', restrict='functions') At the command prompt gives several promising results. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. And you can also do: nobs - length(data$salam.size.observed) fn-function(p){ salam.size.model - salam.size.inf*exp(-G1*exp(-G2*data$age.observed)); # Gompertz model squdiff- (salam.size.model-data$salam.size.observed)^2; #vector of squared differences sigmasqu- sum(squdiff)/nobs; # nuisance sigma parameter profiled out minloglik- (nobs/2)*log(sigmasqu); #profile likelihood as approx. to true likelihood } (salam.size.likfit - nlm(fn,p=c(init. val. 4 salam.size.inf, init. val. 4 G1, init. val 4 G2), hessian=TRUE)) where data is a dataframe with observed sizes and ages. Invert Hessian to obtain measures of precision. Note also that if you know the size at birth then you can re-parameterize, put size at birth instead of asymptotic size (salam.size.inf), and reduce model complexity to two parameters (but of course the Gompertz formula changes). If the Gompertz model is not flexible enough, note that it is a particular case of another more flexible, and more complex model, Schnute's, which I have found often provides a better explanation of growth data (as measured by the AIC), especially if you have sizes of very young individuals. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: On Sat, Nov 8, 2008 at 3:59 PM, Rubén Roa-Ureta [EMAIL PROTECTED] wrote: ... The fit is for grouped data. ... As illustrated in my example code, I'm not dealing with data that can be grouped (x is a continuous random variable). Four points: 1) I've showed you an approach that you can follow in order to fit a modified (custom-made) logistic model, it's up to you to follow this approach and modify it accordingly, it was never my intention to give you the precise code for your problem (obviously), 2) If you do follow this approach and you keep your predictor continuous, you have to change the line of the likelihood definition, 3) If it is really true that your predictor is a *random* variable, then you have not a simple glmn (you may have a glmm), and 4) You can always make your continuous predictor categorical, for example, as when you make a histogram of said predictor. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: Hi all, Where f(x) is a logistic function, I have data that follow: g(x) = f(x)*.5 + .5 How would you suggest I modify the standard glm(..., family='binomial') function to fit this? Here's an example of a clearly ill-advised attempt to simply use the standard glm(..., family='binomial') approach: # First generate some data #define the scale and location of the modified logistic to be fit location = 20 scale = 30 # choose some x values x = runif(200,-200,200) # generate some random noise to add to x in order to # simulate real-word measurement and avoid perfect fits x.noise = runif(length(x),-10,10) # define the probability of success for each x given the modified logistic prob.success = plogis(x+x.noise,location,scale)*.5 + .5 # obtain y, the observed success/failure at each x y = rep(NA,length(x)) for(i in 1:length(x)){ y[i] = sample( x = c(1,0) , size = 1 , prob = c(prob.success[i], 1-prob.success[i]) ) } #show the data and the source modified logistic plot(x,y) curve(plogis(x,location,scale)*.5 + .5,add=T) # Now try to fit the data #fit using standard glm and plot the result fit = glm(y~x,family='binomial') curve(plogis(fit$coefficients[1]+x*fit$coefficients[2])*.5+.5,add=T,col='red',lty=2) # It's clear that it's inappropriate to use the standard glm(y~x,family='binomial') # method to fit the modified logistic data, but what is the alternative? A custom-made fit function using nlm? Below, in the second line the logistic model has been re-parameterized to include p[1]=level of the predictor which predicts a 50% probability of success, and p[2]=level of the predictor which predicts a 95% probability of success. You can change the model in this line to your version. In the 4th line (negative log-likelihood) mat is a vector of 0s and 1s representing success and imat is a vector of 1s and 0s representing failure. The fit is for grouped data. fn - function(p){ prop.est - 1/(1+exp(log(1/19)*(l-p[1])/(p[2]-p[1]))); # estimated proportion of success iprop.est - 1-prop.est;# estimated proportion of failure negloglik - -sum(count*(mat*log(prop.est)+imat*log(iprop.est))); #binomial likelihood, prob. model } prop.lik - nlm(fn,p=c(25.8,33.9),hessian=TRUE) HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descretizing xy data
Jon A wrote: Hello, I have a dataset with a continuous independent variable (fish length, range: 30-150 mm) and a binary response (foraging success, 0 or 1). I want to discretize fish length into 5 mm bins and give the proportion of individuals who successfully foraged in each each size bin. I have used the cut function to discretize the length values into my desired bins, but I can't figure out how to manipulate my response data in terms of the levels I've created. Any advice on how to achieve my task? Thanks in advance. You have the option of using catspec. Here is another, more transparent solution, using hist(). lb - 30 ub - 150 bk - 5 x - data.frame(cbind(runif(1000,lb,ub),rbinom(1000,1,0.75))) x$X3 - cut(x$X1,breaks=(ub-lb)/bk,labels=FALSE) y - data.frame(cbind(hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks[-1],hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$counts,0)) for (i in 1:length(y$X1)) { for (j in 1:length(x$X1)) { if(identical(x$X3[j],i)) y$X3[i] - y$X3[i]+x$X2[j] } } sum(x$X2) #check that counting was correct sum(y$X3) y$X4 - ifelse(y$X3 0,y$X3/y$X2,NA) plot(y$X1,y$X4,pch=19) abline(v=hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks) HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descretizing xy data
Rubén Roa-Ureta wrote: Jon A wrote: Hello, I have a dataset with a continuous independent variable (fish length, range: 30-150 mm) and a binary response (foraging success, 0 or 1). I want to discretize fish length into 5 mm bins and give the proportion of individuals who successfully foraged in each each size bin. I have used the cut function to discretize the length values into my desired bins, but I can't figure out how to manipulate my response data in terms of the levels I've created. Any advice on how to achieve my task? Thanks in advance. You have the option of using catspec. Here is another, more transparent solution, using hist(). lb - 30 ub - 150 bk - 5 x - data.frame(cbind(runif(1000,lb,ub),rbinom(1000,1,0.75))) x$X3 - cut(x$X1,breaks=(ub-lb)/bk,labels=FALSE) y - data.frame(cbind(hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks[-1],hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$counts,0)) for (i in 1:length(y$X1)) { for (j in 1:length(x$X1)) { if(identical(x$X3[j],i)) y$X3[i] - y$X3[i]+x$X2[j] } } sum(x$X2) #check that counting was correct sum(y$X3) y$X4 - ifelse(y$X3 0,y$X3/y$X2,NA) plot(y$X1,y$X4,pch=19) abline(v=hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks) BTW, if you add the line below, text(x=y$X1,y=y$X4+.01,format(y$X2),cex=.5) you show the sample size at each proportion. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distributions Comparison
Igor Telezhinsky wrote: Dear all, I have recently found out about the R project. Could anyone tell me if it is possible to make the comparison of two distributions using R packages? The problem is TRICKY because I have the distributions of measurements and each measurement in the given distribution has its own error, which I need to take into account when comparing these distributions. If anyone knows it is possible with R, could you please tell me what package to use. I will really appreciate some hints in code as well. Thank you. Dear Igor, Welcome to the list and to R. Please read the posting guide and study your problem b4 attempting to seek help from us. Your question as it stands is impossibly ambiguous. Try to make a toy example or formulate it better and you may have better luck. Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about glm using R
Jordi Garcia wrote: Good morning, I am using R to try to model the proportion of burned area in Portugal. The dependent variable is the proportion. The family used is binomial and the epsilon would be binary. I am not able to find the package to be used when the proportion (%) has to be used in glm. Could someone help me? I am using normal commands of glm.. for example: glm_5- glm(formula=p~Precipitation, family=binomial(link=logit), data=dados) where p is the proportion of burned area, but this error message apperars: Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos) That is why I think I am not using the proper glm package. Thank you very much in advance. Jordi Jordi, Your statistical model is wrong. The binomial family if four counts data (counts of successes given n trials), not for proportions. To model proportions, your family is the Beta family. I've modeled proportion response variables with function betareg of package betareg. If you want my example applications I can send you code and data off list. Reference: Ferrari and Cribari-Neto. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31:799-815. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] distributions and glm
drbn wrote: Hello, I have seen that some papers do this: 1.) Group data by year (e.g. 35 years) 2.) Estimate the mean of the key variable through the distribution that fits better (some years is a normal distribution , others is a more skewed, gamma distribution, etc.) 3.) With these estimated means of each year do a GLM. I'd like to know if it is possible (to use these means in a GLM) or is a wrong idea. Thanks in advance David David, You can model functions of data, such as means, but you must be careful to carry over most of the uncertainty in the original data into the model. If you don't, for example if you let the model know only the values of the means, then you are actually assuming that these means were observed with absolute certainty instead of being estimated from the data. To carry over the uncertainty in the original data to your modeling you can use a Bayesian approach or you can use a marginal likelihood approach. A marginal likelihood is a true likelihood function not of the data, but of functions of the data, such as of maximum likelihood estimates. If your means per year were estimated using maximum likelihood (for example with fitdistr in package MASS) and you sample size is not too small then you can use a normal marginal likelihood model for the means. Note however that each mean may come from a different distribution so the full likelihood model for your data would be a mixture of normal distributions. You may not be able to use the pre-built glm function so you may face the challenge to write your own code. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to plot the histogram and the curve in the same graph
leo_wa wrote: i want to plot the histogram and the curve in the same graph.if i have a set of data ,i plot the histogram and also want to see what distribution it was.So i want to plot the curve to know what distribution it like. To draw the curve and the distribution you should have an idea about the distribution. You cann't just draw the histogram and expect R to make a curve of the best distribution to fit that histogram. But you can plot a curve of a kernel density. x - rnorm(1000,5,3) library(MASS) (x.normfit - fitdistr(x,normal)) hist(x,prob=TRUE) lines(density(x,na.rm=TRUE),col=red) # kernel density curve(dnorm(x,mean= x.normfit$estimate[1],sd= x.normfit$estimate[2]),col=blue,add=TRUE) #maximum likelihood estimate Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot the chi square distribution and the histogram in the same graph
leo_wa wrote: In the previous post ,i ask how to plot the normal curve and the histogram in the same graph.if i want to know how to plot the chi square distribution to campare the data set ,how can i do that? You should make up your mind, is your random variable X (-inf,+inf) or Sum(X^2) (which obviously cann't take negative values)? They are quite different. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot - central limit theorem
Jörg Groß wrote: Hi, Is there a way to simulate a population with R and pull out m samples, each with n values for calculating m means? I need that kind of data to plot a graphic, demonstrating the central limit theorem and I don't know how to begin. So, perhaps someone can give me some tips and hints how to start and which functions to use. thanks for any help, joerg x - runif(1,0,1) y - runif(1,0,1) z - runif(1,0,1) u - runif(1,0,1) v - runif(1,0,1) w - runif(1,0,1) t - x+y+z+u+v+w hist(t,breaks=100,xlab=sum of i.i.d r.v with finite mean and variance,ylab=Frequency,main=) Change runif for the corresponding function of the distribution of your choice. Not exactly what you wanted but it does verify the C.L.T. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R graph with values incorporated
Prasanth wrote: Dear All: Greetings! By the way, is it possible to have a graph (say line graph) that shows values as well (say y-axis values within the graph)? One could do it in excel. I am just wondering whether it is possible with R! x - rnorm(100,2,3) y - rnorm(100,2,3) plot(x,y,pch=19) text(x=x,y=y+.5,format(x,digits=1),cex=.5) [[alternative HTML version deleted]] -- Read the posting guide. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] draw a function over a histogram
Daniela Garavaglia wrote: Sorry, I have some troubles with the graph device. How can I draw a function over a histogram? Thank's so much. Daniela, What function? Here is one example using density() and using dnorm() x - rnorm(1000,2,2) hist(x,prob=TRUE) lines(density(x,na.rm=TRUE),col=red) library(MASS) y - fitdistr(x,normal) curve(dnorm(x,mean=y$estimate[1],sd=y$estimate[2]),add=TRUE) HTH R. [[alternative HTML version deleted]] --- What about this?? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Geodata object border
imicola wrote: Sorry, this is probably quite an easy question, but I'm new to R and couldn't find the answer anywhere. I'm using geoR and geoRglm, but can't figure out how to get a border in my geodata object. Does this need to be defined when I'm importing my data, or afterwards, and how do I go about doing this? Thanks You can define the border previously and import it to R with read.table or read.csv or you can define it from your geodata object in R. In the first case don't forget to close the polygon by repeating the first vertex at the end of the file. In the second case you can use a convex hull function in R, such as chull(), to define the polygon with your geodata object. Say your geodata object is called my.geo, then my.geo$coords would be the coordinates. Then try this, bor - chull(my.geo$coords) bor - c(bor, bor[1]) # close polygon by appending the first vertex plot(my.geo$coords) lines(my.geo$coords[pol,]) HTH Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] geoR : Passing arguments to optim when using likfit]
Mzabalazo Ngwenya wrote: Hi everyone ! I'm am trying to fit a kriging model to a set of data. When I just run the likfit command I can obtain the results. However when I try to pass additional arguements to the optimization function optim I get errors. That is I want to obtain the hessian matrix so matrix (hessian=TRUE). Heres a little example( 1-D). Can anyone shed some light? Where am I going wrong ? --- obs.points -matrix(c(0.1,0,0.367,0,0.634,0,0.901,0),byrow=TRUE,nrow=4) MM1.fx -MM1(obs.points[,1]) MM1.fx [1] 0.111 0.5789475 1.7272732 9.100 # Creating geoR object MM1.data -as.geodata(cbind(obs.points,MM1.fx)) MM1.data $coords [1,] 0.100 0 [2,] 0.367 0 [3,] 0.634 0 [4,] 0.901 0 $data [1] 0.111 0.5789475 1.7272732 9.100 attr(,class) [1] geodata # Starting values for MLE sta.vals =expand.grid(seq(1,20),seq(1,20)) # Maximum likelihood estimation of covariance parameters HERE IS THE PROBLEM MM1.fit -likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model=gaussian,optim(hessian=TRUE)) MM1.fit -likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model=gaussian,hessian=TRUE) MM1.fit$info.minimisation.function$hessian For the observed Fisher information: solve(MM1.fit$info.minimisatrion.function$hessian) HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] deleting variables
Richard Pearson wrote: ?rm Richard Ralf Goertz wrote: How can I automatically exclude one variable from being saved in the workspace when I quit an R session? The problem is I don't know how to erase a variable once it has been created. [...] Ralf More on the use of rm. If you want to delete many variables in one go, x-runif(10) y-runif(10) z-runif(10) ls() #you check all the objects in your workspace #[1] x y z rm(list=your.list - ls()[2:3]) #you selected to delete all those objects whose indices are between 2 and 3. rm(your.list) #remove the temporary list with variable names ls() [1] x HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Newbie Question/Data Model
[EMAIL PROTECTED] wrote: Given a data set and a set of predictors and a response in the data, we would like to find a model that fits the data set best. Suppose that we do not know what kind of model (linear, polynomial regression,... ) might be good, we are wondering if there is R-package(s) can auctomatically do this. Otherwise, can you direct me, or point out reference(s), basic steps to do this. Thanks. -james The best-fitting model for any data is a model with a lot of parameters, so maybe the best fitting model for any data is a model with an infinite number of parameters. However, any model with more parameters than data will have a negative number of degrees of freedom, and you do not want that. The best-fitting model for any data subject to the constraint that the number of degrees of freedom is non-negative, is the data itself, with zero degrees of freedom. The AIC tells you this too. The AIC for the model formed by the data itsel is 2n, whereas the AIC for any model with negative degrees of freedom is 2n. But I guess you want to make inference from sample to population. If that is indeed the case, then you should consider changing your focus from finding a model that fits the data set best to a model that best summarizes the information contained in your sample about the population the sample comes from. To do that, start by defining the nature of your response variable. What is the nature of the natural process generating this response variable? Is it continuous or discrete? Is it univariate or multivariate? Can it take negative and positive values? Can it take values of zero? After you have clarified the probabilistic model for the response variable, then you can start thinking about the mathematical relation between the response variable and the predictors. Is it linear or nonlinear? Are the predictors categorical or continuous? Read the posting guide, formulate a clear question, and maybe you will be given more specific help. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] finding an unknown distribution
andrea previtali wrote: Hi, I need to analyze the influences of several factors on a variable that is a measure of fecundity, consisting of 73 observations ranging from 0 to 5. The variable is continuous and highly positive skewed, none of the typical transformations was able to normalize the data. Thus, I was thinking in analyzing these data using a generalized linear model where I can specify a distribution other than normal. I'm thinking it may fit a gamma or exponential distribution. But I'm not sure if the data meets the assumptions of those distributions because their definitions are too complex for my understanding! Roughly, the exponential distribution is the model of a random variable describing the time/distance between two independent events that occur at the same constant rate. The gamma distribution is the model of a random variable that can be thought of as the sum of exponential random variables. I don't think fecundity data, the count of reproductive cells, qualifies as a random variable to be modeled by either of these distributions. If the count of reproductive cells is very large, and you are modeling this count as a function of animal size, such as length, you should consider the lognormal distribution, since the count of cells grow multiplicatively (volumetrically) with the increase in length. In that case you can model your response variable using glm with family=gaussian(link=log). Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mixture distribution analisys
Antonio Olinto wrote: Hello, I'm analyzing some fish length-frequency data to access relative age and growth information and I would like to do something different from FiSAT / ELEFAN analysis. I found a package named MIX in http://www.math.mcmaster.ca/peter/mix/mix.html but it's compiled for R 2.0.0 So I have two questions: Is there any problem to run this package in R 2.7.0? - probably yes … And, is there any other package in R that can be used to analyze and to separate mixture distributions? Thanks for any help. Best regards, Antonio Olinto Sao Paulo Fisheries Institute Brazil This problem is not too difficult. Maybe you can write your own code. Say, you enter the number of fish you measured, n, and a two-column dataframe with the mid point of the length class in the first column (call it l) and the observed relative frequency of each length class in the second column (call it obs_prob). Then using the multinomial likelihood for the number of fish in each length class as the random variable (the approach in Peter Macdonald's Mix), minimize log_likelihood=-n*sum(elementwise product(obs_prob,log(pred_prob))) where pred_prob=(p1*0.3989423/s1)*exp(-0.5*square((l-m1)/s1)) +(p2*0.3989423/s2)*exp(-0.5*square((l-m2)/s2)) +(p3*0.3989423/s3)*exp(-0.5*square((l-m3)/s3)) where s1, s2, s3, m1, m2, m3, p1 and p2 (p3=1-p1+p2) are the parameters. Do you know your number of components (cohorts) in the mixture? In the model above it is three. If you don't know it, try several models with different number of components and the AIC may let you know which one is the best working model. Don't use the likelihood ratio test as one of the p parameters is on the boundary of parameter space if the null is true. Also, you should bound the parameters (m1m2m3, 0p11, 0p21, and establish the condition p3=1-p1+p2. I wrote ADMB code to do this if you want it. You can translate it to R. Below you can find some example code to do the graphics. R. cont - c(4,15,32,44,62,69,61,99,114,119,106,108,89,88,95,99,91,84,137,190,224,297,368,484,566,629,446,349,377,405,438,367,215,115,36,10,4,1,0,0,1) tot - sum(cont) rel.freq - rep(cont,each=10)/tot/10 lect - seq(9.1,50,0.1) prop1 - 0.0219271 prop2 - 0.121317 prop3 - 0.0328074 prop4 - 0.315534 prop5 - 0.203071 prop6 - 0.3053439 sigma1 - 1.50760 sigma2 - 2.82352 sigma3 - 1.40698 sigma4 - 3.00063 sigma5 - 1.41955 sigma6 - 1.99940 media1 - 13.4148 media2 - 19.2206 media3 - 24.5748 media4 - 31.9498 media5 - 34.6998 media6 - 39.7016 prot1-(1/10)*(prop1*(1/sqrt(2*pi))/sigma1)*exp(-0.5*((lect-(media1-0.5))/sigma1)^2) prot2-(1/10)*(prop2*(1/sqrt(2*pi))/sigma2)*exp(-0.5*((lect-(media2-0.5))/sigma2)^2) prot3-(1/10)*(prop3*(1/sqrt(2*pi))/sigma3)*exp(-0.5*((lect-(media3-0.5))/sigma3)^2) prot4-(1/10)*(prop4*(1/sqrt(2*pi))/sigma4)*exp(-0.5*((lect-(media4-0.5))/sigma4)^2) prot5-(1/10)*(prop5*(1/sqrt(2*pi))/sigma5)*exp(-0.5*((lect-(media5-0.5))/sigma5)^2) prot6-(1/10)*(prop6*(1/sqrt(2*pi))/sigma6)*exp(-0.5*((lect-(media6-0.5))/sigma6)^2) prot.tot-prot1+prot2+prot3+prot4+prot5+prot6 plot(lect,rel.freq,type=l,lwd=3,xlab=,ylab=,ylim=c(0,0.01)) lines(lect,prot1) lines(lect,prot2) lines(lect,prot3) lines(lect,prot4) lines(lect,prot5) lines(lect,prot6) lines(lect,prot.tot,lwd=2) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] analyzing binomial data with spatially correlated errors
Roger Bivand wrote: Ben Bolker bolker at ufl.edu writes: Jean-Baptiste Ferdy Jean-Baptiste.Ferdy at univ-montp2.fr writes: Dear R users, I want to explain binomial data by a serie of fixed effects. My problem is that my binomial data are spatially correlated. Naively, I thought I could found something similar to gls to analyze such data. After some reading, I decided that lmer is probably to tool I need. The model I want to fit would look like (...) You could *almost* use glmmPQL from the MASS package, which allows you to fit any lme model structure within a GLM 'wrapper', but as far as I know it wraps only lme ( which requires at least one random effect) and not gls. The trick used in: Dormann, C. F., McPherson, J. M., Araujo, M. B., Bivand, R., Bolliger, J., Carl, G., Davies, R. G., Hirzel, A., Jetz, W., Kissling, W. D., Kühn, I., Ohlemüller, R., Peres-Neto, P. R., Reineking, B., Schröder, B., Schurr, F. M. Wilson, R. J. (2007): Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30: 609–628 (see online supplement), is to add a constant term group, and set random=~1|group. The specific use with a binomial family there is for a (0,1) response, rather than a two-column matrix. You could try gee or geoRglm -- neither trivially easy, I think ... The same paper includes a GEE adaptation, but for a specific spatial configuration rather than a general one. Roger Bivand Ben Bolker I suggest you also check out the package geoRglm, where you can model binomial and Poisson spatially correlated data. I used it to model spatially correlated binomial data but without covariates, i.e. without your fixed effects (so my model was a logistic regression with the intercept only) (Reference below). But I understand that you can add covariates and use them to estimate the non-random set of predictors. Here is the geoRglm webpage: http://www.daimi.au.dk/~olefc/geoRglm/ This approach would be like tackling the problem from the point of view of geostatistics, rather than from mixed models. But I believe the likelihood-based geostatistical model is the same as a generalized linear mixed model where the distance is the random effect. In SAS you can do this using the macro glimmix but from the point of view of generalized linear mixed models because there they have implemented a correlation term, so that you can identify typical spatial correlation functions such as Gauss and exponential, particular cases of the Matern family. Rubén Roa-Ureta, R. and E.N. Niklitschek (2007) Biomass estimation from surveys with likelihood-based geostatistics. ICES Journal of Marine Science 64:1723-1734 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] comparing length-weight regressions]
[EMAIL PROTECTED] wrote: I'd like to compare length-weight regressions among years. Any information would be appreciated. a. gray fisheries consultant Your message is rather cryptic for a general statistical audience, whereas I'm sure in a fisheries group everybody would understand what you want. Use a glm with family=Gaussian(link=log) for all data sets together (in original units) with year as a factor, then run the model again ignoring the year factor, and compare the different fits using anova(name of your glm objects) for a likelihood ratio test, or inspect the AICs for a non-frequentist model selection method. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading data into R
BEP wrote: Hello all, I am working with a very large data set into R, and I have no interest in reviving my SAS skills. To do this, I will need to drop unwanted variables given the size of the data file. The most common strategy seems to be subsetting the data after it is read into R. Unfortunately, given the size of the data set, I can't get the file read and then subsquently do the subset procedure. I would be appreciative of help on the following: 1. What are the possibilities of reading in just a small set of variables during the read.table statement (or another 'read' statement)? That is, is it possible specify just the variables that I want to keep? 2. Can I randomly select a set of observations during the 'read' statement? I have searched various R resources for this information, so if I am simply overlooking a key resource on this issue, pointing that out to me would be greatly appreciated. Thanks in advance. Brian Check this for input of specific columns from a large data matrix: mysubsetdata-do.call(cbind,scan(file='location and name of your file',what=list(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,0,NULL,0,NULL,NULL),flush=TRUE)) This will input only columns 10 and 11 into 'mysubsetdata'. With this method you can work out the way to select random columns. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.