John:

1. As George Box long ago emphasized and proved, normality is **NOT** that important in regression, certainly not for estimation and not even for inference in balanced designs. Independence of the observations is far more important.

2. That said, it sounds like what you have here is a mixture of some sort. Before running off to do fancy modeling, I would work very hard to look for some kind of "lurking variable" or experimental aberration: what was going on in the experiment or study that might have caused all the zeros? Was there an instrument problem? A bad reagent? Improper handling of the samples? It might very well be that you need to throw away part of the data because it's useless, rather than artificially attempt to model it.

3. And having said that, if a comprehensive model IS called for, one rather cynical approach is just to add a grouping variable as a covariate that has a value of 1 for all data in the zero group and 2 for all the nonzero data. Your model is then f(age, sex) = 0 for all data in group 1 and your linear or nonlinear regression fit for group 2. Of course, this merely cloaks the cynicism in respectable dress. It's hard for me to believe that what you see is Mother Nature and not some kind of experimental problem.

A slightly less cynical approach might be to use some sort of changepoint model (in both age and sex) of the form f(age, sex) = g(age, sex) for age >= k1 and sex <= k2, and h(age, sex) otherwise. Well, perhaps **not** less cynical -- the response data are so widely separated that you'll just be using a bunch of extra (nonlinear, incidentally) parameters to essentially reproduce the use of a covariate.

So I guess the point is that unless you already have a previously developed nonlinear model that could explain the behavior you see (perhaps based on some kind of mechanistic reasoning), it's not a good idea to try to develop an artificial empirical model that encompasses all the data. The fact is (a horrible phrase) that no modeling at all is needed for the most important message the data have to convey: rather, focus on the cause of the message instead of statistical artifice. Once you have determined that, you may be able to do something sensible.

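For what it's worth, the grouping-variable idea in point 3 can be sketched in a few lines. This is only an illustration under assumptions of mine, not a recommended analysis: the data are made up and noise-free, the nonzero scores are fit on the log scale because John describes them as log normal, and the helper names (`solve`, `fit_nonzero_group`, `predict`) are hypothetical. Python's standard library is used purely to keep the sketch self-contained; in R the same fit would be an ordinary `lm()` on the nonzero subset.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_nonzero_group(rows):
    """Least-squares fit of log(calcium) on age and sex, nonzero scores only."""
    nonzero = [(age, sex, ca) for age, sex, ca in rows if ca > 0]
    X = [[1.0, age, sex] for age, sex, _ in nonzero]
    y = [math.log(ca) for _, _, ca in nonzero]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve(xtx, xty)


def predict(coef, age, sex, group):
    """Group 1 is the zero group (prediction 0); group 2 uses the regression."""
    if group == 1:
        return 0.0
    b0, b1, b2 = coef
    # Back-transforming with exp() gives the median, not the mean,
    # of a log-normal response.
    return math.exp(b0 + b1 * age + b2 * sex)


# Hypothetical, noise-free data: (age, sex, calcium score).
# The nonzero scores are generated from known coefficients (5.7, 0.006, 0.1)
# so the fit can be checked against them.
ages = [50, 55, 60, 65, 70, 75]
sexes = [0, 1, 0, 1, 0, 1]
rows = [(a, s, math.exp(5.7 + 0.006 * a + 0.1 * s)) for a, s in zip(ages, sexes)]
rows += [(45, 0, 0.0), (52, 1, 0.0), (68, 0, 0.0)]  # the zero group

coef = fit_nonzero_group(rows)
print("intercept, age, sex coefficients:", coef)
print("group 1 prediction:", predict(coef, 60, 1, group=1))
print("group 2 prediction:", predict(coef, 60, 1, group=2))
```

Note that the sketch needs to be told which group an observation belongs to before it can predict anything, which is exactly the cynicism described above: the covariate is derived from the response itself, so nothing about the zeros has been explained.
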
Clear thinking trumps muddy modeling every time.

(Hopefully, this is sufficiently inflammatory that others will vigorously and wisely dispute me.)

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin
> Sent: Wednesday, September 07, 2005 9:06 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Prediction with multiple zeros in the dependent variable
>
> I have a batch of data in which each line contains three values:
> calcium score, age, and sex. I would like to predict calcium scores
> as a function of age and sex, i.e. calcium=f(age,sex). Unfortunately
> the calcium scores have a very "ugly distribution". There are
> multiple zeros, and multiple values between 300 and 600. There are no
> values between zero and 300. Needless to say, the calcium scores are
> not normally distributed; however, the values between 300 and 600
> have a distribution that is log normal. As you might imagine, the
> residuals from the regression are not normally distributed and thus
> violate the basic assumption of regression analyses. Does anyone have
> a suggestion for a method (or a transformation) that will allow me to
> predict calcium from age and sex without violating the assumptions of
> the model?
>
> Thanks,
> John
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC and
> University of Maryland School of Medicine Claude Pepper OAIC
>
> University of Maryland School of Medicine
> Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> 410-605-7119
> -- NOTE NEW EMAIL ADDRESS:
> [EMAIL PROTECTED]
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html