Re: [R] choosing an appropriate linear model

Levi Waldron Fri, 06 Jun 2008 09:54:34 -0700

Perhaps this was too big a question, so I'll ask something shorter:

I have fit a linear model, and want to use its prediction intervals to
calculate the sum of many individual predictions.


1) Some of the lower prediction intervals are negative, which is
non-sensical.  Should I just set all negative predictions to zero, or is
there another way to require non-negative predictions only?
2) I am interested in the sum of many predictions based on the lm.  How can
I calculate the 95% prediction interval for the sum?  Should I calculate a
root mean square of the individual errors, or use a bootstrap method, or
something else?

ps. the data is attached to the end of this email.

On Thu, Jun 5, 2008 at 6:25 PM, Levi Waldron <[EMAIL PROTECTED]> wrote:

> I am trying to model the observed leaching of wood preservative chemicals
> from treated wood during an outdoor experiment where leaching is caused by
> rainfall events.  For each rainfall event, the amount of rainfall was
> recorded as well as the amount of preservative chemical leached.  A number
> of climatic variables were measured, but the most important is the amount of
> rainfall.
>
> I have tried a simple linear model, with zero intercept because zero
> rainfall cannot cause any leaching (leachdata dataframe is attached to this
> email).  The diagnostics show clearly non-normally distributed residuals
> with a simple linear regression, and I am trying to figure out what to do
> about it (see attached diagnostics.png).  This dataset contains measurements
> from 57 rainfall events on three replicate samples, for a total of 171
> measurements.
>
> Part of the problem is that physically, the leaching values can only be
> positive, so for the smaller rainfall amounts the residuals are all
> positive.  If I allow an intercept then it is significantly positive,
> possibly since the researcher wouldn't have collected measurements for very
> small rain events, but in terms of the model it doesn't make sense
> physically to have a positive intercept, particularly since lab experiments
> have shown that a certain amount of rain exposure is required to wet the
> wood before leaching begins.
>
> I can get more normally distributed residuals by log-transforming the
> response, or using the optimal box-cox transformation of lambda = 0.21,
> which produces nicer-looking residuals but unsatisfactory prediction which
> is the main goal of the model (also attached).
>
> Any advice on how to create a better predictive model?  I presume it has
> something to do with glm, especially since I have repeated rainfalls on
> replicate samples, but any advice on the approach to take would be much
> appreciated.  The code I used to produce the attached plots is included
> below.
>
>
> leach.lm <- lm(leachate~rainmm-1,data=leachdata)
>
> png("dianostics.png",height=1200,width=700)
> par(mfrow=c(3,2))
> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
> abline(leach.lm)
> plot(predict(leach.lm)~leachdata$leachate,main="predicted vs. observed
> leaching amount",xlim=c(0,12),ylim=c(0,12),xlab="observed
> leaching",ylab="predicted leaching")
> abline(a=0,b=1)
> plot(leach.lm)
> dev.off()
>
> library(MASS)
> boxcox(leach.lm,plotit=T,lambda=seq(0,0.4,by=0.01))
>
> boxtran <- function(y,lambda,inverse=F){
>   if(inverse)
>     return((lambda*y+1)^(1/lambda))
>   else
>     return((y^lambda-1)/lambda)
> }
>
> png("boxcox-dianostics.png",height=1200,width=700)
> par(mfrow=c(3,2))
> logleach.lm <- lm(boxtran(leachate,0.21)~rainmm-1,data=leachdata)
> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
> x <- leachdata$rainmm
> y <- boxtran(predict(logleach.lm),0.21,T)
> xy <- cbind(x,y)[order(x),]
> lines(xy)
> plot(y~leachdata$leachate,xlim=c(0,12),ylim=c(0,12),main="predicted vs.
> observed leaching amount",xlab="observed leaching",ylab="predicted
> leaching")
> abline(a=0,b=1)
> plot(logleach.lm)
> dev.off()
>

`leachdata` <-
structure(list(rainmm = c(19.68, 36.168, 18.632, 2.74, 0.822,
9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74,
18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604,
2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644,
3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2,
43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96,
7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74,
0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042,
2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754,
12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548,
1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918,
54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124,
10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632,
2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288,
9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714,
5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21,
0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274,
1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13,
7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54), leachate = c(0.94,
4.74, 2.84, 3.28, 0.07, 1.56, 0.48, 9.63, 1.2, 2.55, 0.15, 0.67,
0.57, 0.38, 1.81, 0.08, 0.94, 0.79, 0.16, 0.09, 1.2, 0.61, 0.77,
0.02, 1, 0.26, 1.34, 0.81, 0.18, 0.17, 0.005, 0.25, 0.42, 1.45,
0.54, 0.24, 0.41, 0.55, 1.59, 1.09, 3.84, 11.52, 6.21, 3.86,
2.34, 11.02, 2.33, 1.83, 2.4, 0.74, 0.71, 0.55, 0.31, 0.83, 0.29,
0.48, 0.92, 1.33, 4.8, 1.73, 1.87, 0.21, 1.04, 1.08, 6.74, 1.23,
2.5, 0.13, 1.29, 0.75, 0.66, 2.14, 0.17, 0.43, 0.69, 0.47, 0.14,
1.6, 0.56, 1.02, 0.04, 0.75, 0.32, 1.68, 0.58, 0.42, 0.18, 0.1,
0.34, 0.36, 1.54, 0.38, 0.18, 0.26, 0.005, 0.17, 0.18, 0.4, 2.13,
0.87, 0.75, 0.52, 3.21, 0.49, 0.85, 1.24, 0.32, 0.5, 0.37, 0.19,
0.53, 0.3, 0.51, 1.37, 1.25, 3.69, 2.76, 1.82, 0.005, 0.99, 0.87,
6.93, 1.04, 2.26, 0.14, 1.27, 0.62, 0.6, 2.91, 0.19, 0.41, 0.47,
0.38, 0.17, 1.56, 0.41, 0.92, 0.02, 0.51, 0.26, 0.86, 0.47, 0.39,
0.12, 0.08, 0.28, 0.3, 1.16, 0.27, 0.15, 0.22, 0.3, 0.18, 0.16,
0.47, 6, 1.47, 0.67, 0.35, 2.13, 0.51, 0.85, 1.37, 0.23, 0.45,
0.34, 0.17, 0.46, 0.23, 0.43, 1.17)), .Names = c("rainmm", "leachate"
), row.names = c(NA, -171L), class = "data.frame")

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] choosing an appropriate linear model

Reply via email to