Re: [R] Strange error with log-normal models

2013-04-16 Thread Duncan Murdoch

On 16/04/2013 1:19 PM, Noah Silverman wrote:

Hi,

I have some data, that when plotted looks very close to a log-normal 
distribution.  My goal is to build a regression model to test how this variable 
responds to several independent variables.

To do this, I want to use the fitdistr tool from the MASS package to see how 
well my data fits the actual distribution, and also build a generalized linear 
model using the glm command.


The summary of my data is:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.      0.      0.  0.8617  0.8332 55.5600

So, no missing values, no negative values.

When I try to use the fitdistr command, I get an error that I don't understand:
m <- fitdistr(y, densfun="lognormal")

Error in fitdistr(y, densfun = "lognormal") :  need positive values to fit a 
log-Normal


You have zeros in your data.  The lognormal distribution never takes on 
the value zero.


If they are zero because of rounding (e.g. 0.001 would be recorded as 
zero), and there aren't too many of them, then replacing the zeros with 
a small positive value (e.g. half the smallest non-zero value) might 
make sense.  But your median is zero, so at least half of your 
observations are zero.


You need to come up with a better model than lognormal.
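
A quick way to see the problem is to count the exact zeros before fitting. The sketch below uses a simulated vector y (an assumption, since the original data were not posted) with 60% exact zeros, so that the median is zero as in the summary above. It shows that fitdistr() rejects the full vector and accepts only the strictly positive part.

```r
library(MASS)  # fitdistr() is provided by MASS

set.seed(1)
# Hypothetical data: 60 exact zeros plus 40 log-normal draws
y <- c(rep(0, 60), rlnorm(40, meanlog = 0, sdlog = 1))

prop_zero <- mean(y == 0)  # share of observations that are exactly zero
med_y <- median(y)         # zero whenever more than half the data are zero

# The full vector is rejected because of the zeros; keep the message
fit_all <- tryCatch(fitdistr(y, densfun = "lognormal"),
                    error = function(e) conditionMessage(e))

# The strictly positive part can be fitted, but the result describes
# only y given y > 0, not the whole variable
fit_pos <- fitdistr(y[y > 0], densfun = "lognormal")
print(fit_all)
print(fit_pos)
```

Fitting only the positive part ignores the process that produces the zeros, which is why a different model is needed when zeros make up most of the data.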

Duncan Murdoch





When I try to build a simple model, I also get an error:

l <- glm(y ~ x, family=gaussian(link=log))

Error in eval(expr, envir, enclos) :  cannot find valid starting values: please 
specify some



Can anyone offer some suggestions?


Thanks!

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Strange error with log-normal models

2013-04-16 Thread Thomas Lumley
On Wed, Apr 17, 2013 at 5:19 AM, Noah Silverman noahsilver...@ucla.edu wrote:

 Hi,

 I have some data, that when plotted looks very close to a log-normal
 distribution.  My goal is to build a regression model to test how this
 variable responds to several independent variables.


 [snip]

When I try to build a simple model, I also get an error:

 l <- glm(y ~ x, family=gaussian(link=log))

 Error in eval(expr, envir, enclos) :  cannot find valid starting values:
 please specify some


Duncan has described the problems with the lognormal.  I will just point
out that this 'simple model' is not lognormal.  It is a model with normal
errors on y itself and a log link for the mean, i.e.

y ~ N(mu, sigma^2)
log(mu) = x \beta
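
The distinction can be sketched in R as follows. The data are simulated and strictly positive (an assumption for illustration only), so both models can be fitted. The log-normal regression puts normal errors on log(y), while the gaussian family with a log link puts normal errors on y and makes log(E[y]) linear in x. As far as I can tell, the "cannot find valid starting values" error arises because glm() takes its default starting means from y, and log(0) is not finite when y contains zeros.

```r
set.seed(42)
n <- 200
x <- runif(n)
# Hypothetical strictly positive response, generated as a true log-normal
y <- exp(1 + 2 * x + rnorm(n, sd = 0.5))

# Log-normal regression: log(y) ~ N(x beta, sigma^2)
fit_lognormal <- lm(log(y) ~ x)

# Normal errors with log link: y ~ N(mu, sigma^2), log(mu) = x beta
fit_loglink <- glm(y ~ x, family = gaussian(link = "log"))

# The coefficients are on the same (log) scale but come from different
# error assumptions, so they need not agree
print(coef(fit_lognormal))
print(coef(fit_loglink))
```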


-thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


Re: [R] Strange error with log-normal models

2013-04-16 Thread Noah Silverman
@Duncan, You make a very good point.  Somehow I overlooked that 0 is not 
positive.  I guess that rules out the log-normal model.

My challenge here is finding the right model for this data.  Originally it was 
a nice count of students, which was relatively easy to model with a 
zero-inflated Poisson model.  The resulting residuals seemed reasonable.

However, I was then instructed to change the count of students to a rate, 
calculated as students / population (each school has its own population).  
This is no longer a count variable, but a proportion between 0 and 1.

This rate (students/population) is no longer Poisson, but is certainly not 
normal either.  So, I'm a bit lost as to the appropriate distribution to 
represent it.

Any thoughts?


--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095



Re: [R] Strange error with log-normal models

2013-04-16 Thread Marc Schwartz
Noah,

You might want to look at beta regression, using the betareg package on CRAN. 
There is a JSS paper here that you might find helpful:

  http://www.jstatsoft.org/v34/i02/paper

along with the vignettes for the package:

  http://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf

  http://cran.r-project.org/web/packages/betareg/vignettes/betareg-ext.pdf
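
A minimal sketch of what a beta regression call looks like is given below. The data frame d, the variable names and the simulated rates are all hypothetical, and the code runs only if the betareg package is installed. Note that the response has to lie strictly between 0 and 1, so exact zeros like the ones in the posted summary would need separate handling first.

```r
if (requireNamespace("betareg", quietly = TRUE)) {
  set.seed(7)
  n <- 300
  x <- rnorm(n)
  mu <- plogis(-1 + 0.8 * x)  # mean of the rate, linear on the logit scale
  phi <- 20                   # precision parameter of the beta distribution
  # Simulated rates strictly inside (0, 1)
  rate <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
  d <- data.frame(rate = rate, x = x)

  # Default logit link for the mean; phi is estimated as a constant
  fit_beta <- betareg::betareg(rate ~ x, data = d)
  print(coef(fit_beta))
}
```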


Regards,

Marc Schwartz



Re: [R] Strange error with log-normal models

2013-04-16 Thread peter dalgaard

On Apr 16, 2013, at 22:20 , Noah Silverman wrote:

 @Duncan, You make a very good point.  Somehow I overlooked that 0 is not 
 positive.  I guess that rules out the log-normal model.
 
 My challenge here is finding the right model for this data.  Originally it 
 was a nice count of students, which was relatively easy to model with a 
 zero-inflated Poisson model.  The resulting residuals seemed reasonable.
 
 However, I was then instructed to change the count of students to a rate, 
 calculated as students / population (each school has its own population).  
 This is no longer a count variable, but a proportion between 0 and 1.
 
 This rate (students/population) is no longer Poisson, but is certainly not 
 normal either.  So, I'm a bit lost as to the appropriate distribution to 
 represent it.
 
 Any thoughts?
 

Off the cuff: Could it be more natural to model as a ZIP with log(pop) as an 
offset on the log-lambda scale? 
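
The offset idea can be sketched with a plain Poisson GLM in base R. The data are simulated and the names students, pop and x are hypothetical. With log(pop) as an offset, the coefficients describe the log of the rate students/pop, while the response stays a count. A zero-inflated version would keep the same offset in the count part of the model. I believe zeroinfl() in the pscl package accepts a formula of this form, but that is an assumption and is not shown here.

```r
set.seed(123)
n <- 500
pop <- round(runif(n, 200, 2000))  # hypothetical school populations
x <- rnorm(n)
# Counts whose expected value is proportional to population:
# E[students] = pop * exp(-5 + 0.5 * x)
students <- rpois(n, lambda = pop * exp(-5 + 0.5 * x))
d <- data.frame(students = students, pop = pop, x = x)

# The offset fixes the coefficient of log(pop) at 1, so the linear
# predictor models log(E[students] / pop)
fit_rate <- glm(students ~ x + offset(log(pop)), family = poisson, data = d)
print(coef(fit_rate))
```

This keeps the discreteness and the zeros of the original counts, which a direct model for the ratio would lose.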

 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Strange error with log-normal models

2013-04-16 Thread Ben Bolker
peter dalgaard pdalgd at gmail.com writes:


 On Apr 16, 2013, at 22:20 , Noah Silverman wrote:
 

  My challenge here is finding the right model for this data.
 Originally it was a nice count of students.  Relatively easy to
 model with a zero inflated Poisson model.  The resulting residuals
 seemed reasonable.  

 [snip]
 
 Off the cuff: Could it be more natural to model as a ZIP with log(pop) as
an offset on the log-lambda scale? 
 
  

  I agree.
  
This was cross-posted to StackOverflow (broken URL:
http://stackoverflow.com/questions/16046726/
   regression-for-a-rate-variable-in-r ), where I made
that suggestion.

  I don't know that cross-posting to r-help lists and StackOverflow
is anywhere expressly forbidden (cross-posting *among* r lists is
ruled out in the Posting Guide), but I'd prefer people didn't
(because of this kind of wasted/duplicated effort).
