Perhaps this was too big a question, so I'll ask something shorter: I have fit a linear model, and want to use its prediction intervals to calculate the sum of many individual predictions.
1) Some of the lower prediction intervals are negative, which is non-sensical. Should I just set all negative predictions to zero, or is there another way to require non-negative predictions only? 2) I am interested in the sum of many predictions based on the lm. How can I calculate the 95% prediction interval for the sum? Should I calculate a root mean square of the individual errors, or use a bootstrap method, or something else? ps. the data is attached to the end of this email. On Thu, Jun 5, 2008 at 6:25 PM, Levi Waldron <[EMAIL PROTECTED]> wrote: > I am trying to model the observed leaching of wood preservative chemicals > from treated wood during an outdoor experiment where leaching is caused by > rainfall events. For each rainfall event, the amount of rainfall was > recorded as well as the amount of preservative chemical leached. A number > of climatic variables were measured, but the most important is the amount of > rainfall. > > I have tried a simple linear model, with zero intercept because zero > rainfall cannot cause any leaching (leachdata dataframe is attached to this > email). The diagnostics show clearly non-normally distributed residuals > with a simple linear regression, and I am trying to figure out what to do > about it (see attached diagnostics.png). This dataset contains measurements > from 57 rainfall events on three replicate samples, for a total of 171 > measurements. > > Part of the problem is that physically, the leaching values can only be > positive, so for the smaller rainfall amounts the residuals are all > positive. If I allow an intercept then it is significantly positive, > possibly since the researcher wouldn't have collected measurements for very > small rain events, but in terms of the model it doesn't make sense > physically to have a positive intercept, particularly since lab experiments > have shown that a certain amount of rain exposure is required to wet the > wood before leaching begins. > > I can get more normally distributed residuals by log-transforming the > response, or using the optimal box-cox transformation of lambda = 0.21, > which produces nicer-looking residuals but unsatisfactory prediction which > is the main goal of the model (also attached). > > Any advice on how to create a better predictive model? I presume it has > something to do with glm, especially since I have repeated rainfalls on > replicate samples, but any advice on the approach to take would be much > appreciated. The code I used to produce the attached plots is included > below. > > > leach.lm <- lm(leachate~rainmm-1,data=leachdata) > > png("dianostics.png",height=1200,width=700) > par(mfrow=c(3,2)) > plot(leachate~rainmm,data=leachdata,main="Data and fitted line") > abline(leach.lm) > plot(predict(leach.lm)~leachdata$leachate,main="predicted vs. observed > leaching amount",xlim=c(0,12),ylim=c(0,12),xlab="observed > leaching",ylab="predicted leaching") > abline(a=0,b=1) > plot(leach.lm) > dev.off() > > library(MASS) > boxcox(leach.lm,plotit=T,lambda=seq(0,0.4,by=0.01)) > > boxtran <- function(y,lambda,inverse=F){ > if(inverse) > return((lambda*y+1)^(1/lambda)) > else > return((y^lambda-1)/lambda) > } > > png("boxcox-dianostics.png",height=1200,width=700) > par(mfrow=c(3,2)) > logleach.lm <- lm(boxtran(leachate,0.21)~rainmm-1,data=leachdata) > plot(leachate~rainmm,data=leachdata,main="Data and fitted line") > x <- leachdata$rainmm > y <- boxtran(predict(logleach.lm),0.21,T) > xy <- cbind(x,y)[order(x),] > lines(xy) > plot(y~leachdata$leachate,xlim=c(0,12),ylim=c(0,12),main="predicted vs. > observed leaching amount",xlab="observed leaching",ylab="predicted > leaching") > abline(a=0,b=1) > plot(logleach.lm) > dev.off() > `leachdata` <- structure(list(rainmm = c(19.68, 36.168, 18.632, 2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54), leachate = c(0.94, 4.74, 2.84, 3.28, 0.07, 1.56, 0.48, 9.63, 1.2, 2.55, 0.15, 0.67, 0.57, 0.38, 1.81, 0.08, 0.94, 0.79, 0.16, 0.09, 1.2, 0.61, 0.77, 0.02, 1, 0.26, 1.34, 0.81, 0.18, 0.17, 0.005, 0.25, 0.42, 1.45, 0.54, 0.24, 0.41, 0.55, 1.59, 1.09, 3.84, 11.52, 6.21, 3.86, 2.34, 11.02, 2.33, 1.83, 2.4, 0.74, 0.71, 0.55, 0.31, 0.83, 0.29, 0.48, 0.92, 1.33, 4.8, 1.73, 1.87, 0.21, 1.04, 1.08, 6.74, 1.23, 2.5, 0.13, 1.29, 0.75, 0.66, 2.14, 0.17, 0.43, 0.69, 0.47, 0.14, 1.6, 0.56, 1.02, 0.04, 0.75, 0.32, 1.68, 0.58, 0.42, 0.18, 0.1, 0.34, 0.36, 1.54, 0.38, 0.18, 0.26, 0.005, 0.17, 0.18, 0.4, 2.13, 0.87, 0.75, 0.52, 3.21, 0.49, 0.85, 1.24, 0.32, 0.5, 0.37, 0.19, 0.53, 0.3, 0.51, 1.37, 1.25, 3.69, 2.76, 1.82, 0.005, 0.99, 0.87, 6.93, 1.04, 2.26, 0.14, 1.27, 0.62, 0.6, 2.91, 0.19, 0.41, 0.47, 0.38, 0.17, 1.56, 0.41, 0.92, 0.02, 0.51, 0.26, 0.86, 0.47, 0.39, 0.12, 0.08, 0.28, 0.3, 1.16, 0.27, 0.15, 0.22, 0.3, 0.18, 0.16, 0.47, 6, 1.47, 0.67, 0.35, 2.13, 0.51, 0.85, 1.37, 0.23, 0.45, 0.34, 0.17, 0.46, 0.23, 0.43, 1.17)), .Names = c("rainmm", "leachate" ), row.names = c(NA, -171L), class = "data.frame") [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.