Re: [R] Strange error with log-normal models
On 16/04/2013 1:19 PM, Noah Silverman wrote:
> Hi,
>
> I have some data, that when plotted looks very close to a log-normal
> distribution. My goal is to build a regression model to test how this
> variable responds to several independent variables.
>
> To do this, I want to use the fitdistr tool from the MASS package to see
> how well my data fits the actual distribution, and also build a
> generalized linear model using the glm command.
>
> The summary of my data is:
>
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.0000  0.0000  0.0000  0.8617  0.8332 55.5600
>
> So, no missing values, no negative values.
>
> When I try to use the fitdistr command, I get an error that I don't
> understand:
>
> m <- fitdistr(y, densfun="lognormal")
> Error in fitdistr(y, densfun = "lognormal") :
>   need positive values to fit a log-Normal

You have zeros in your data. The lognormal distribution never takes on the value zero. If they are zero because of rounding (e.g. 0.001 would be recorded as zero), and there aren't too many of them, then replacing the zeros with a small positive value (e.g. half the smallest non-zero value) might make sense. But your median is zero, so at least half of your observations are zero. You need to come up with a better model than lognormal.

Duncan Murdoch

> When I try to build a simple model, I also get an error:
>
> l <- glm(y ~ x, family=gaussian(link=log))
> Error in eval(expr, envir, enclos) :
>   cannot find valid starting values: please specify some
>
> Can anyone offer some suggestions?
>
> Thanks!
>
> --
> Noah Silverman, M.S.
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
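A minimal sketch of Duncan's point, using simulated data as a stand-in for the poster's `y`. The real data are not in the thread, so the sample size and log-normal parameters below are assumptions. It reproduces the `fitdistr` error when zeros are present, then fits only the positive values, which runs but describes just that subset.

```r
# Simulated stand-in for the poster's data (hypothetical; not the real data).
library(MASS)  # provides fitdistr()

set.seed(42)
y_pos <- rlnorm(100, meanlog = 0, sdlog = 1)  # strictly positive part
y <- c(rep(0, 100), y_pos)                    # half of the values are exact zeros

# With zeros present, fitdistr() refuses to fit a log-normal;
# capture the error message instead of stopping the script.
err <- tryCatch(
  fitdistr(y, densfun = "lognormal"),
  error = function(e) conditionMessage(e)
)
print(err)

# Share of exact zeros: when this is large, nudging zeros to a small
# positive value is hard to justify.
print(mean(y == 0))

# Fitting only the positive values works, but it models y given y > 0,
# not the full variable.
fit_pos <- fitdistr(y[y > 0], densfun = "lognormal")
print(fit_pos)
```

With this many zeros the positive-only fit is mainly a diagnostic: it says nothing about how often a zero occurs.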
Re: [R] Strange error with log-normal models
On Wed, Apr 17, 2013 at 5:19 AM, Noah Silverman noahsilver...@ucla.edu wrote:
> Hi,
>
> I have some data, that when plotted looks very close to a log-normal
> distribution. My goal is to build a regression model to test how this
> variable responds to several independent variables.
>
> [snip]
>
> When I try to build a simple model, I also get an error:
>
> l <- glm(y ~ x, family=gaussian(link=log))
> Error in eval(expr, envir, enclos) :
>   cannot find valid starting values: please specify some

Duncan has described the problems with the lognormal. I will just point out that this 'simple model' is not lognormal. It is a model with normal errors and log link, i.e.

    y ~ N(mu, sigma^2)
    log(mu) = x \beta

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
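To make Thomas's distinction concrete, here is a rough sketch on simulated data. The data-generating process, the seed and the starting values are all assumptions for illustration. Part (a) is the normal-errors, log-link model from the original post. With zeros in `y` and no starting values, the `gaussian` family cannot initialise, which is where the "cannot find valid starting values" message comes from. Part (b) is an actual log-normal regression, which models log(y) and so is only defined for y > 0.

```r
set.seed(1)
n  <- 200
x  <- runif(n)
mu <- exp(-1 + 2 * x)
# Hypothetical response: normal noise around mu, cut off at zero,
# so that y contains some exact zeros.
y  <- pmax(mu + rnorm(n, sd = 0.5), 0)

# (a) y ~ N(mu, sigma^2), log(mu) = x beta.
# Without starting values this fails when any y <= 0.
err <- tryCatch(
  glm(y ~ x, family = gaussian(link = "log")),
  error = function(e) conditionMessage(e)
)
print(err)

# Supplying starting values for (intercept, slope) lets the fit proceed.
# c(0, 0) is an arbitrary choice for this sketch.
fit_glm <- glm(y ~ x, family = gaussian(link = "log"), start = c(0, 0))
print(coef(fit_glm))

# (b) Log-normal regression: log(y) ~ N(x beta, sigma^2).
# Zeros have to be excluded, since log(0) is -Inf.
fit_lnorm <- lm(log(y) ~ x, subset = y > 0)
print(coef(fit_lnorm))
```

The two fits answer different questions: (a) models the mean of y on the original scale, (b) models the mean of log(y) among the positive observations.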
Re: [R] Strange error with log-normal models
@Duncan,

You make a very good point. Somehow I overlooked that 0 is not positive. I guess that rules out the log-normal model.

My challenge here is finding the right model for this data. Originally it was a nice count of students, relatively easy to model with a zero-inflated Poisson model. The resulting residuals seemed reasonable.

However, I was then instructed to change the count of students to a rate, calculated as students / population (each school has its own population). This is now no longer a count variable, but a proportion between 0 and 1.

This rate (students/population) is no longer Poisson, but is certainly not normal either. So, I'm a bit lost as to the appropriate distribution to represent it.

Any thoughts?

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

On Apr 16, 2013, at 12:44 PM, Thomas Lumley tlum...@uw.edu wrote:
> [snip]
Re: [R] Strange error with log-normal models
Noah,

You might want to look at beta regression, using the betareg package on CRAN.

There is a JSS paper here that you might find helpful:

  http://www.jstatsoft.org/v34/i02/paper

along with the vignettes for the package:

  http://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf
  http://cran.r-project.org/web/packages/betareg/vignettes/betareg-ext.pdf

Regards,

Marc Schwartz

On Apr 16, 2013, at 3:20 PM, Noah Silverman noahsilver...@ucla.edu wrote:
> [snip]
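A hedged sketch of what Marc's suggestion could look like. The data are simulated, and the variable names (`rate`, `x`), the precision value and the coefficients are assumptions, not anything from the thread. Note that `betareg` needs a response strictly inside (0, 1), so the exact zeros in the original data would need separate handling before this could be applied. The model call is guarded so the script still runs where the package is not installed.

```r
set.seed(2)
n   <- 300
x   <- rnorm(n)
mu  <- plogis(-2 + 0.5 * x)   # mean rate, linear on the logit scale
phi <- 50                     # precision parameter (assumed value)

# Beta-distributed rates with mean mu and precision phi.
rate <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
d <- data.frame(rate = rate, x = x)

# Beta regression is only defined on the open interval (0, 1).
print(range(d$rate))

if (requireNamespace("betareg", quietly = TRUE)) {
  # Default logit link for the mean, constant precision.
  fit_beta <- betareg::betareg(rate ~ x, data = d)
  print(summary(fit_beta))
} else {
  message("Package 'betareg' is not installed; skipping the model fit.")
}
```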
Re: [R] Strange error with log-normal models
On Apr 16, 2013, at 22:20 , Noah Silverman wrote:
> @Duncan,
>
> You make a very good point. Somehow I overlooked that 0 is not positive.
> I guess that rules out the log normal model.
>
> My challenge here is finding the right model for this data. Originally it
> was a nice count of students. Relatively easy to model with a zero
> inflated Poisson model. The resulting residuals seemed reasonable.
>
> However, I was then instructed to change the count of students to a rate
> which was calculated as students / population (Each school has its own
> population.) This is now no longer a count variable, but a proportion
> between 0 and 1.
>
> This rate (students/population) is no longer Poisson, but is certainly
> not normal either. So, I'm a bit lost as to the appropriate distribution
> to represent it.
>
> Any thoughts?

Off the cuff: Could it be more natural to model as a ZIP with log(pop) as an offset on the log-lambda scale?

> [snip]

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
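A rough sketch of the offset idea Peter raises, again on simulated data. The column names `students`, `pop` and `x`, the 30% structural-zero share and the coefficients are assumptions made up for illustration. Modelling the count with `offset(log(pop))` makes the linear predictor describe log(rate) while keeping a count distribution. The plain Poisson fit in base R shows the mechanism. The zero-inflated fit uses `zeroinfl` from the pscl package and is guarded in case that package is not installed.

```r
set.seed(3)
n   <- 500
pop <- sample(200:2000, n, replace = TRUE)   # hypothetical school populations
x   <- rnorm(n)

# Expected count = population * rate, with log(rate) = -4 + 0.5 * x.
lambda <- pop * exp(-4 + 0.5 * x)

# 30% of schools are structural zeros; the rest are Poisson counts.
structural_zero <- rbinom(n, size = 1, prob = 0.3)
students <- ifelse(structural_zero == 1, 0L, rpois(n, lambda))
d <- data.frame(students = students, pop = pop, x = x)

# Poisson regression for the rate: the offset fixes the coefficient of
# log(pop) at 1, so the remaining coefficients act on log(students / pop).
fit_pois <- glm(students ~ x + offset(log(pop)), family = poisson, data = d)
print(coef(fit_pois))

if (requireNamespace("pscl", quietly = TRUE)) {
  # ZIP: count part with the same offset, intercept-only zero-inflation part.
  fit_zip <- pscl::zeroinfl(students ~ x + offset(log(pop)) | 1,
                            data = d, dist = "poisson")
  print(coef(fit_zip))
} else {
  message("Package 'pscl' is not installed; skipping the ZIP fit.")
}
```

One reason to prefer this over modelling students / pop directly is that the count model still knows how much information each school carries: a rate of 0 from a school of 200 is treated differently from a rate of 0 from a school of 2000.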
Re: [R] Strange error with log-normal models
peter dalgaard pdalgd at gmail.com writes:
> On Apr 16, 2013, at 22:20 , Noah Silverman wrote:
> > My challenge here is finding the right model for this data. Originally
> > it was a nice count of students. Relatively easy to model with a zero
> > inflated Poisson model. The resulting residuals seemed reasonable.
>
> [snip]
>
> Off the cuff: Could it be more natural to model as a ZIP with log(pop)
> as an offset on the log-lambda scale?

I agree. This was cross-posted to StackOverflow (broken URL:
http://stackoverflow.com/questions/16046726/
regression-for-a-rate-variable-in-r ), where I made that suggestion.

I don't know that cross-posting to r-help lists and StackOverflow is anywhere expressly forbidden (cross-posting *among* r lists is ruled out in the Posting Guide), but I'd prefer people didn't (because of this kind of wasted/duplicated effort).