John Sorkin wrote:
> I have a batch of data in which each line contains three values:
> calcium score, age, and sex. I would like to predict calcium scores as a
> function of age and sex, i.e. calcium=f(age,sex). Unfortunately the
> calcium scores have a very "ugly distribution". There are multiple
> zeros, and multiple values between 300 and 600. There are no values
> between zero and 300. Needless to say, the calcium scores are not
> normally distributed; however, the values between 300 and 600 have a
> distribution that is log normal. As you might imagine, the residuals
> from the regression are not normally distributed and thus violate the
> basic assumption of regression analyses. Does anyone have a suggestion
> for a method (or a transformation) that will allow me to predict calcium
> from age and sex without violating the assumptions of the model?
> Thanks,
> John
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC and
> University of Maryland School of Medicine Claude Pepper OAIC

John - first I would try a proportional odds model, with zero as its own
category, then treating all other values as continuous or collapsing them
into 20-tiles. If the PO assumption happens to hold (look at partial
residual plots) you have a simple solution.

Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
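
For readers following the thread, here is a minimal sketch of the outcome coding Frank describes: zero becomes its own category and the nonzero scores are collapsed into quantile bins (20-tiles by default). It is written in Python purely for illustration and only prepares the ordinal response; the proportional odds fit itself would still be done with an ordinal regression routine, for example `polr` in the MASS package or `lrm` in the rms package in R. The function name `ordinal_codes`, the parameter `n_bins`, and the example scores are hypothetical and do not come from the original posts.

```python
from bisect import bisect_right


def ordinal_codes(scores, n_bins=20):
    """Return ordinal codes: 0 for a zero score, 1..n_bins for nonzero scores.

    Hypothetical helper: nonzero scores are binned at their own empirical
    quantiles, so the zeros do not influence the cut points.
    """
    nonzero = sorted(s for s in scores if s > 0)
    if not nonzero:
        return [0] * len(scores)
    n = len(nonzero)
    # Interior cut points at the k/n_bins empirical quantiles (k = 1..n_bins-1).
    cuts = [nonzero[min(n - 1, (k * n) // n_bins)] for k in range(1, n_bins)]
    # bisect_right counts how many cut points are <= s, giving the bin index.
    return [0 if s == 0 else 1 + bisect_right(cuts, s) for s in scores]


# Made-up scores mimicking the pattern in the question: many zeros,
# nothing between 0 and 300, the rest between 300 and 600.
example_scores = [0, 0, 300, 340, 380, 420, 460, 500, 540, 580, 0]
example_codes = ordinal_codes(example_scores, n_bins=4)
```

The resulting codes would serve as the ordered response, with age and sex as predictors. With few distinct nonzero values some bins may be empty, which is harmless for an ordinal model since only the ordering of the categories matters.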