Hi Greg,

Thanks for your guidance. 

In this case, the evidence is that the primary subpopulation of the data, 
accounting for some 80% of it, follows the standard statistical model in the 
sense that Rice 
uses the term.  It may by all accounts be normally distributed, and a Q-Q plot shows 
a large portion of the primary subpopulation behaves that way, out to 2 
theoretical quantiles. But, for the measurement ranges of interest, the 
complement of the "normal subpopulation", accounting for some 20% of the total 
two million data points, behaves in other ways, which are, as a matter of fact, 
poorly understood.  That's not likely to change soon.

The choice of a robust regression framework and of "robust" (and possibly 
"quantreg" as Prof Koenker suggested) was simply to automatically fit a line to 
the primary subpopulation, without having to make arbitrary choices as to what 
to keep or what to discard. Also, use of any preexisting package was simply 
pursued as a timesaver, worksaver, and to have some conceptual framework within 
which to proceed other than just throwing least squares at arbitrarily chosen 
subsets.  

It sounds to me like I might use the robust regression to decide what to 
discard and then apply standard linear "lm" to the remainder, minding the 
diagnostics. Should they prove favorable, I'll proceed with the result of "lm".

Thanks for pointing out the limitations of "robust" and its kin for me. 

BTW, if "robust" does not adopt a normal model for the y variable, what's the 
proper interpretation of the standard errors for slope and intercept it yields? 
 A reference?

 - Jan

-----Original Message-----
From: Greg Snow [mailto:greg.s...@imail.org] 
Sent: Wednesday, April 08, 2009 1:20 PM
To: Galkowski, Jan; r-help@r-project.org
Subject: RE: predict "interval" for lmRob?

Your problem is related to the theory underlying linear models (and is an 
example as to why it is important to understand the theory, not just know how 
to plug numbers into a computer).

The lm function is based on theory that assumes the y variable is normally 
distributed with the mean of that normal based on the model and the x values.  
This allows the predict function for lm to create prediction intervals based on 
the normal distribution, the predicted mean of that distribution, the estimated 
standard deviation, and the uncertainty in the predicted mean.  Note that if 
your y variable is not normally distributed, but the sample size is large 
enough for the Central Limit Theorem to hold, then the confidence intervals 
will be approximately correct, but the prediction intervals will probably not 
be.
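
As a small illustration of the two kinds of interval, here is a sketch on 
simulated normal data (the data and the new x values are made up for the 
example). The prediction band is wider because it adds the residual spread to 
the uncertainty in the fitted mean.

```r
# Sketch only: simulated data that satisfy the normal-theory assumptions.
set.seed(2)
x <- runif(100, 0, 10)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

new <- data.frame(x = c(2, 5, 8))   # hypothetical new x values

# Interval for the mean of y at each new x
conf_band <- predict(fit, newdata = new, interval = "confidence")

# Interval for a single future observation of y at each new x
pred_band <- predict(fit, newdata = new, interval = "prediction")

# Both are matrices with columns "fit", "lwr", "upr"
conf_band
pred_band
```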

When you switch to a robust regression approach, the assumption is that the y 
variable is not normal, so a prediction interval based on the normal 
distribution does not make sense.  To get an appropriate prediction interval 
you need some information on what the distribution of the y values is 
(conditional on the model), but most robust techniques are not based on a 
specific distribution, just some properties of the distribution.  Without some 
information (or at least an assumption) on that distribution, the predict 
method cannot create prediction intervals.

I know that this does not answer your question, but hopefully it helps you to 
understand what is happening.  Think about what your actual scientific question 
is; it may be that you can answer the question without prediction intervals.

If you feel that you really need the prediction intervals, then you will need 
to do some additional background research into what distribution you think the 
data come from; then you can proceed from there.  Some options include fitting 
a model based on that distribution, simulating data from the distribution given 
the model estimates and the uncertainty in those estimates, quantile 
regression, mixture of regressions, and others.
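
To make the quantile regression option concrete, here is a rough sketch. It 
assumes the quantreg package is installed and uses simulated heavy-tailed data 
in place of the real measurements. The band it produces is just the pair of 
fitted 2.5% and 97.5% conditional quantile lines; it does not account for the 
uncertainty in estimating those lines.

```r
# Sketch only: simulated heavy-tailed data, not the real measurements.
library(quantreg)  # assumed installed; not part of base R

set.seed(3)
n <- 2000
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rt(n, df = 3)   # heavy-tailed noise, purely illustrative
d <- data.frame(x = x, y = y)

# Fit the 2.5% and 97.5% conditional quantiles as lines in x
qfit <- rq(y ~ x, tau = c(0.025, 0.975), data = d)

new <- data.frame(x = c(2, 5, 8))      # hypothetical new x values
band <- predict(qfit, newdata = new)   # one column per tau
colnames(band) <- c("lwr", "upr")
band

# Rough in-sample check: share of observations inside the fitted band
fitted_band <- predict(qfit, newdata = d)
coverage <- mean(d$y >= fitted_band[, 1] & d$y <= fitted_band[, 2])
coverage
```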

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Galkowski, Jan
> Sent: Wednesday, April 08, 2009 9:32 AM
> To: r-help@r-project.org
> Subject: [R] predict "interval" for lmRob?
> 
> lm's "predict" function offers an "interval" parameter to choose
> between 'confidence' and 'prediction' bands. In the package "robust"
> and for "lmRob", there is also a "predict" but it lacks such a
> parameter, and the documented "type" parameter has only "response"
> offered.  Is there some way of obtaining prediction bands from lmRob?
> Is there an alternative robust (linear) regression package that offers
> such a capability?
> 
> Thanks for any and all help.
> 
>   - Jan Galkowski, Akamai Technologies, Cambridge, MA.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
