On Thu, 8 Sep 2005, John Sorkin wrote:

> I have a batch of data in which each line contains three values:
> calcium score, age, and sex. I would like to predict calcium scores as a
> function of age and sex, i.e. calcium = f(age, sex). Unfortunately the
> calcium scores have a very "ugly distribution". There are multiple
> zeros, and multiple values between 300 and 600. There are no values
> between zero and 300. Needless to say, the calcium scores are not
> normally distributed; however, the values between 300 and 600 have a
> distribution that is log normal.
[Coronary artery calcium by EBCT, I presume]

Our approach to modelling calcium scores is to do it in two parts. First, fit something like a logistic regression model where the outcome is zero vs non-zero calcium. Then, for the non-zero scores, use something like a linear regression model for log calcium. You could presumably use such a model for prediction or imputation too, and you can work out means, medians, etc. from the two models.

One particular reason for using this two-part model is that we find different predictors of zero/non-zero and of amount. This makes biological sense: a factor that makes arterial plaques calcify might well have no impact until you have arterial plaques.

Or you could use smooth quantile regression in the quantreg package (the rq() function, or rqss() for smoothed fits).

-thomas
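
For concreteness, the two-part model described above might be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the variable names (calcium, age, sex) and all coefficients are invented, and the back-transformation to means and medians assumes the non-zero scores really are log normal with constant variance.

```r
# Simulated stand-in data; names and coefficients are assumptions for illustration.
set.seed(1)
n   <- 500
age <- runif(n, 40, 80)
sex <- factor(sample(c("F", "M"), n, replace = TRUE))

# Assumed truth for part 1: chance of any calcium rises with age, higher in men.
p_nonzero <- plogis(-6 + 0.09 * age + 0.8 * (sex == "M"))
nonzero   <- rbinom(n, 1, p_nonzero)

# Assumed truth for part 2: log calcium among non-zero scores is normal.
log_ca  <- rnorm(n, mean = 5.5 + 0.01 * age, sd = 0.3)
calcium <- ifelse(nonzero == 1, exp(log_ca), 0)
dat     <- data.frame(calcium, age, sex)

# Part 1: logistic regression for zero vs non-zero calcium.
fit_any <- glm(I(calcium > 0) ~ age + sex, family = binomial, data = dat)

# Part 2: linear regression for log calcium, fitted to non-zero scores only.
fit_amt <- lm(log(calcium) ~ age + sex, data = subset(dat, calcium > 0))

# Combine the two parts for prediction at new covariate values.
newdat <- data.frame(age = c(50, 70),
                     sex = factor(c("F", "M"), levels = levels(dat$sex)))
p_hat  <- predict(fit_any, newdata = newdat, type = "response")
mu_hat <- predict(fit_amt, newdata = newdat)
sigma2 <- summary(fit_amt)$sigma^2

# Mean: P(non-zero) times the lognormal mean exp(mu + sigma^2 / 2).
mean_ca <- p_hat * exp(mu_hat + sigma2 / 2)

# Median of the mixture: zero when at least half the mass sits at zero;
# otherwise the lognormal quantile at level (p - 0.5) / p.
q_level   <- pmax((p_hat - 0.5) / p_hat, 0)
median_ca <- ifelse(p_hat > 0.5,
                    exp(qnorm(q_level, mean = mu_hat, sd = sqrt(sigma2))),
                    0)

print(cbind(newdat, p_hat, mean_ca, median_ca))
```

The two formulas are the same here only to keep the sketch short. Since the predictors of zero/non-zero and of amount can differ, each part can be given its own right-hand side.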