Re: [R] Prediction with multiple zeros in the dependent variable

Frank E Harrell Jr Thu, 08 Sep 2005 05:29:25 -0700

John Sorkin wrote:
> I have a batch of data in which each line contains three values:
> calcium score, age, and sex. I would like to predict calcium scores as a
> function of age and sex, i.e. calcium=f(age,sex). Unfortunately the
> calcium scores have a very "ugly" distribution. There are multiple
> zeros, and multiple values between 300 and 600. There are no values
> between zero and 300. Needless to say, the calcium scores are not
> normally distributed; however, the values between 300 and 600 have a
> distribution that is log normal. As you might imagine, the residuals
> from the regression are not normally distributed and thus violate a
> basic assumption of regression analysis. Does anyone have a suggestion
> for a method (or a transformation) that will allow me to predict calcium
> from age and sex without violating the assumptions of the model?
> Thanks,
> John
>  
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC and
> University of Maryland School of Medicine Claude Pepper OAIC


John - first I would try a proportional odds (PO) model, with zero as its
own category and all other values either treated as continuous or
collapsed into 20-tiles. If the PO assumption happens to hold (check
partial residual plots), you have a simple solution.

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
