On Tue, 22 Feb 2005, Duncan Murdoch wrote:

On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
<[EMAIL PROTECTED]> wrote :

Is it a bug that quantile() can return a lower sample quantile for a higher
probability?

##### quantile returns decreasing results with increasing probs (data at
the end of the message)
quantile(x2, (0:5)/5)
           0%           20%           40%           60%           80%
-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
         100%
 0.2905596324
##### the 40% quantile has a lower value than the 20% quantile
diff(quantile(x2, (0:5)/5))
          20%           40%           60%           80%          100%
 5.099206e-04 -1.084202e-19  1.726945e-04  1.568908e-04  2.911342e-01


This only happens for type=7:

for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
type=type))<0), "\n")
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 TRUE
8 FALSE
9 FALSE


I know this is at the limits of machine precision, but it still seems wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on my machine (according to the R quantile documentation, the S-PLUS calculations correspond to type=7).

I'd say it's not a bug in that rounding error is something you should expect in a calculation like this, but it does look wrong. And it isn't only restricted to type 7. If you make a vector of copies of that bad value, you'll see it in more cases:

x <- rep(-0.00090419678460984, 602)
for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
+ type=type))<0), "\n")
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 TRUE
6 TRUE
7 TRUE
8 FALSE
9 TRUE

(at least on Windows).  What's happening is that R is doing linear
interpolation between two equal values, and not getting the same value
back, because of rounding.

The offending line appears to be this one:

qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])

The equivalent calculation in the approx function (which doesn't
appear to have this problem) is

qs[i] + (x[hi[i]] - qs[i]) * h

Can anyone think of why this would not be better?  (The same sort of
calculation shows up again later in quantile().)

Infinities, where arithmetic is not distributive.

It is done that way to interpolate correctly between Inf and Inf: the second version gives NaN, and that is something that was a problem with the first version of the update to give all those types. I am pretty sure there is a regression test for this.

If you really want to avoid this, I think you need to pre-test for equality.

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to