Dear Peter, Charles, Gabor, Jim, Mark, Victor, Peter, and Harold,
You have given me plenty of ammunition.  Thank you very much for the useful 
answers. 
Gratefully,
Ravi
________________________________________
From: peter dalgaard <pda...@gmail.com>
Sent: Wednesday, October 21, 2015 8:11 PM
To: Charles C. Berry
Cc: Ravi Varadhan; r-help@r-project.org
Subject: Re: [R] Linear regression with a rounded response variable

> On 21 Oct 2015, at 19:57 , Charles C. Berry <ccbe...@ucsd.edu> wrote:
>
> On Wed, 21 Oct 2015, Ravi Varadhan wrote:
>
>> [snippage]
>
> If half the subjects have a value of 5 seconds and the rest are split between 
> 4 and 6, your assertion that rounding induces an error of 
> dunif(epsilon,-0.5,0.5) is surely wrong (more positive errors in the 6 second 
> group and more negative errors in the 4 second group under any plausible 
> model).
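
[Chuck's point is easy to check numerically. The following sketch is illustrative, not from the thread: with true times centred at 5, the mean rounding error (recorded minus true) is negative in the 4-second group and positive in the 6-second group, so the errors are not uniform on (-0.5, 0.5).]

```r
## Sketch: rounding errors conditional on the rounded value are not uniform
set.seed(1)
z <- rnorm(1e5, 5, 0.7)        # hypothetical true times, centred at 5
y <- round(z)                  # recorded, rounded to whole seconds
err <- y - z                   # rounding error, recorded minus true
tapply(err, y, mean)[c("4", "5", "6")]  # mean error by rounded group
```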

Yes, and I think that the suggestion in another post to look at censored 
regression is more in the right direction.
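
[For completeness: a rounded response can be treated as interval-censored, since an observed y says only that the true value lies in [y - 0.5, y + 0.5]. A minimal sketch using survival::survreg with a Gaussian distribution; the data-generating setup and variable names here are illustrative assumptions, not from the thread:]

```r
## Sketch: interval-censored regression for a rounded response
library(survival)
set.seed(1)
x <- runif(200)
z <- 2 + 3 * x + rnorm(200, sd = 0.5)   # hypothetical true response
y <- round(z)                           # what gets recorded
## each observation is interval-censored in [y - 0.5, y + 0.5]
fit <- survreg(Surv(y - 0.5, y + 0.5, type = "interval2") ~ x,
               dist = "gaussian")
coef(fit)   # intercept and slope, accounting for the rounding
```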

In general, I'd expect the bias caused by rounding the response to be quite 
small, except at very high granularity. I did a few small experiments with the 
simplest possible linear model: estimating a mean based on highly rounded data,

> y <- round(rnorm(1e2,pi,.5))
> mean(y)
[1] 3.12
> table(y)
y
 2  3  4  5
13 63 23  1

Or, using a bigger sample:

> mean(round(rnorm(1e8,pi,.5)))
[1] 3.139843

in which there is a visible bias, but quite a small one:

> pi - 3.139843
[1] 0.001749654

At lower granularity (sd=1 instead of .5), the bias has almost disappeared.

> mean(round(rnorm(1e8,pi,1)))
[1] 3.141577

If the granularity is increased sufficiently, you _will_ see a sizeable bias 
(because almost all observations will be round(pi)==3):

> mean(round(rnorm(1e8,pi,.1)))
[1] 3.00017


A full ML fit (with sigma=0.5 known, matching the generating sd) is pretty 
easily done:

> library(stats4)
> mll <- function(mu)-sum(log(pnorm(y+.5,mu, .5)-pnorm(y-.5, mu, .5)))
> mle(mll,start=list(mu=3))

Call:
mle(minuslogl = mll, start = list(mu = 3))

Coefficients:
      mu
3.122069
> mean(y)
[1] 3.12

As you see, the difference is only 0.002.

A small simulation (1000 repl.) gave (r[1,]==MLE ; r[2,]==mean)

> summary(r[1,]-r[2,])
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.004155  0.000702  0.001495  0.001671  0.002554  0.006860

so the corrections relative to the crude mean stay within one unit in the 2nd 
place. Notice that the corrections are pretty darn close to cancelling out the 
bias.
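
[The simulation code was not shown; it might have looked something like this sketch, under the same setup as above with sigma = 0.5 known. The seed and replication structure are assumptions.]

```r
## Sketch: interval MLE vs. crude mean over 1000 replications
library(stats4)
set.seed(2015)
r <- replicate(1000, {
  y <- round(rnorm(1e2, pi, 0.5))
  mll <- function(mu)
    -sum(log(pnorm(y + 0.5, mu, 0.5) - pnorm(y - 0.5, mu, 0.5)))
  c(coef(mle(mll, start = list(mu = 3))),  # row 1: interval MLE
    mean(y))                               # row 2: crude mean
})
summary(r[1, ] - r[2, ])   # MLE minus crude mean, per replication
```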

-pd

>
>
> HTH,
>
> Chuck
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
