Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities
On Tue, 22 Feb 2005 15:14:00 -0700, Tony Plate <[EMAIL PROTECTED]> wrote : >Thanks for the diagnosis. > >The reason I came across this was that I use both S-PLUS and R and I often >use the results of quantile() as the breaks for cut(). In S-PLUS, cut() >stops with an error if breaks has any decreasing values. Thus this example >caused an S-PLUS function to unexpectedly stop with an error. However, >cut() in R behaves differently: it sorts its breaks and thus does not >object to decreasing values in breaks. Another difference is that cut() in >R stops with an error if any breaks are duplicated, which, I guess, means >that in R I should use findInterval() instead of cut() for this >task. Except that findInterval() in R stops with an error if its breaks >are unsorted... > > > findInterval(x2, quantile(x2, (0:5)/5)) >Error in findInterval(x2, quantile(x2, (0:5)/5)) : > 'vec' must be sorted non-decreasingly I guess you'll just have to use sort(quantile(...)). It makes the labels look sort of funny, but is hopefully harmless: > x <- rep(-0.00090419678460984, 602) > sort(quantile(x, 0:5/5)) 0% 40% 60% 80% 100% -0.0009041968 -0.0009041968 -0.0009041968 -0.0009041968 -0.0009041968 20% -0.0009041968 Duncan Murdoch __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities
Thanks for the diagnosis. The reason I came across this was that I use both S-PLUS and R and I often use the results of quantile() as the breaks for cut(). In S-PLUS, cut() stops with an error if breaks has any decreasing values. Thus this example caused an S-PLUS function to unexpectedly stop with an error. However, cut() in R behaves differently: it sorts its breaks and thus does not object to decreasing values in breaks. Another difference is that cut() in R stops with an error if any breaks are duplicated, which, I guess, means that in R I should use findInterval() instead of cut() for this task. Except that findInterval() in R stops with an error if its breaks are unsorted... > findInterval(x2, quantile(x2, (0:5)/5)) Error in findInterval(x2, quantile(x2, (0:5)/5)) : 'vec' must be sorted non-decreasingly > -- Tony Plate At Tuesday 02:36 PM 2/22/2005, Duncan Murdoch wrote: On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate <[EMAIL PROTECTED]> wrote : >Is it a bug that quantile() can return a lower sample quantile for a higher >probability? > > > # quantile returns decreasing results with increasing probs (data at >the end of the message) > > quantile(x2, (0:5)/5) >0% 20% 40% 60% 80% >-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115 > 100% > 0.2905596324 > > # the 40% quantile has a lower value than the 20% quantile > > diff(quantile(x2, (0:5)/5)) > 20% 40% 60% 80% 100% > 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01 > > > >This only happens for type=7: > > > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, >type=type))<0), "\n") >1 FALSE >2 FALSE >3 FALSE >4 FALSE >5 FALSE >6 FALSE >7 TRUE >8 FALSE >9 FALSE > > > >I know this is at the limits of machine precision, but it still seems >wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on >my machine (according to the R quantile documentation, the S-PLUS >calculations correspond to type=7). I'd say it's not a bug in that rounding error is something you should expect in a calculation like this, but it does look wrong. And it isn't only restricted to type 7. If you make a vector of copies of that bad value, you'll see it in more cases: > x <- rep(-0.00090419678460984, 602) > for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5, + type=type))<0), "\n") 1 FALSE 2 FALSE 3 FALSE 4 FALSE 5 TRUE 6 TRUE 7 TRUE 8 FALSE 9 TRUE (at least on Windows). What's happening is that R is doing linear interpolation between two equal values, and not getting the same value back, because of rounding. The offending line appears to be this one: qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]]) The equivalent calculation in the approx function (which doesn't appear to have this problem) is qs[i] + (x[hi[i]] - qs[i]) * h Can anyone think of why this would not be better? (The same sort of calculation shows up again later in quantile().) Duncan Murdoch __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities
On Tue, 22 Feb 2005 21:36:20 +, Duncan Murdoch <[EMAIL PROTECTED]> wrote : >On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate ><[EMAIL PROTECTED]> wrote : > >>Is it a bug that quantile() can return a lower sample quantile for a higher >>probability? >> >> > # quantile returns decreasing results with increasing probs (data at >>the end of the message) >> > quantile(x2, (0:5)/5) >>0% 20% 40% 60% 80% >>-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115 >> 100% >> 0.2905596324 >> > # the 40% quantile has a lower value than the 20% quantile >> > diff(quantile(x2, (0:5)/5)) >> 20% 40% 60% 80% 100% >> 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01 >> > >> >>This only happens for type=7: >> >> > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, >>type=type))<0), "\n") >>1 FALSE >>2 FALSE >>3 FALSE >>4 FALSE >>5 FALSE >>6 FALSE >>7 TRUE >>8 FALSE >>9 FALSE >> > >> >>I know this is at the limits of machine precision, but it still seems >>wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on >>my machine (according to the R quantile documentation, the S-PLUS >>calculations correspond to type=7). > >I'd say it's not a bug in that rounding error is something you should >expect in a calculation like this, but it does look wrong. And it >isn't only restricted to type 7. If you make a vector of copies of >that bad value, you'll see it in more cases: > >> x <- rep(-0.00090419678460984, 602) >> for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5, >+ type=type))<0), "\n") >1 FALSE >2 FALSE >3 FALSE >4 FALSE >5 TRUE >6 TRUE >7 TRUE >8 FALSE >9 TRUE > >(at least on Windows). What's happening is that R is doing linear >interpolation between two equal values, and not getting the same value >back, because of rounding. > >The offending line appears to be this one: > >qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]]) > >The equivalent calculation in the approx function (which doesn't >appear to have this problem) is > >qs[i] + (x[hi[i]] - qs[i]) * h > >Can anyone think of why this would not be better? (The same sort of >calculation shows up again later in quantile().) Just looked at the history of this line, and it appears the code is the way it is to avoid an error if the value of the vector is infinite. For example, we now get the right answer > x <- rep(Inf, 100) > quantile(x, 0:5/5) 0% 20% 40% 60% 80% 100% Inf Inf Inf Inf Inf Inf but with my modification above we wouldn't: > quantile(x, 0:5/5) 0% 20% 40% 60% 80% 100% Inf NaN NaN NaN NaN Inf Duncan __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities
On Tue, 22 Feb 2005, Duncan Murdoch wrote: On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate <[EMAIL PROTECTED]> wrote : Is it a bug that quantile() can return a lower sample quantile for a higher probability? # quantile returns decreasing results with increasing probs (data at the end of the message) quantile(x2, (0:5)/5) 0% 20% 40% 60% 80% -0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115 100% 0.2905596324 # the 40% quantile has a lower value than the 20% quantile diff(quantile(x2, (0:5)/5)) 20% 40% 60% 80% 100% 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01 This only happens for type=7: for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, type=type))<0), "\n") 1 FALSE 2 FALSE 3 FALSE 4 FALSE 5 FALSE 6 FALSE 7 TRUE 8 FALSE 9 FALSE I know this is at the limits of machine precision, but it still seems wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on my machine (according to the R quantile documentation, the S-PLUS calculations correspond to type=7). I'd say it's not a bug in that rounding error is something you should expect in a calculation like this, but it does look wrong. And it isn't only restricted to type 7. If you make a vector of copies of that bad value, you'll see it in more cases: x <- rep(-0.00090419678460984, 602) for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5, + type=type))<0), "\n") 1 FALSE 2 FALSE 3 FALSE 4 FALSE 5 TRUE 6 TRUE 7 TRUE 8 FALSE 9 TRUE (at least on Windows). What's happening is that R is doing linear interpolation between two equal values, and not getting the same value back, because of rounding. The offending line appears to be this one: qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]]) The equivalent calculation in the approx function (which doesn't appear to have this problem) is qs[i] + (x[hi[i]] - qs[i]) * h Can anyone think of why this would not be better? (The same sort of calculation shows up again later in quantile().) Infinities, where arithmetic is not distributive. It is done that way to interpolate correctly between Inf and Inf: the second version gives NaN, and that is something that was a problem with the first version of the update to give all those types. I am pretty sure there is a regression test for this. If you really want to avoid this, I think you need to pre-test for equality. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities
On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate <[EMAIL PROTECTED]> wrote : >Is it a bug that quantile() can return a lower sample quantile for a higher >probability? > > > # quantile returns decreasing results with increasing probs (data at >the end of the message) > > quantile(x2, (0:5)/5) >0% 20% 40% 60% 80% >-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115 > 100% > 0.2905596324 > > # the 40% quantile has a lower value than the 20% quantile > > diff(quantile(x2, (0:5)/5)) > 20% 40% 60% 80% 100% > 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01 > > > >This only happens for type=7: > > > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, >type=type))<0), "\n") >1 FALSE >2 FALSE >3 FALSE >4 FALSE >5 FALSE >6 FALSE >7 TRUE >8 FALSE >9 FALSE > > > >I know this is at the limits of machine precision, but it still seems >wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on >my machine (according to the R quantile documentation, the S-PLUS >calculations correspond to type=7). I'd say it's not a bug in that rounding error is something you should expect in a calculation like this, but it does look wrong. And it isn't only restricted to type 7. If you make a vector of copies of that bad value, you'll see it in more cases: > x <- rep(-0.00090419678460984, 602) > for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5, + type=type))<0), "\n") 1 FALSE 2 FALSE 3 FALSE 4 FALSE 5 TRUE 6 TRUE 7 TRUE 8 FALSE 9 TRUE (at least on Windows). What's happening is that R is doing linear interpolation between two equal values, and not getting the same value back, because of rounding. The offending line appears to be this one: qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]]) The equivalent calculation in the approx function (which doesn't appear to have this problem) is qs[i] + (x[hi[i]] - qs[i]) * h Can anyone think of why this would not be better? (The same sort of calculation shows up again later in quantile().) Duncan Murdoch __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities
Is it a bug that quantile() can return a lower sample quantile for a higher probability? > # quantile returns decreasing results with increasing probs (data at the end of the message) > quantile(x2, (0:5)/5) 0% 20% 40% 60% 80% -0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115 100% 0.2905596324 > # the 40% quantile has a lower value than the 20% quantile > diff(quantile(x2, (0:5)/5)) 20% 40% 60% 80% 100% 5.099206e-04 -1.084202e-19 1.726945e-04 1.568908e-04 2.911342e-01 > This only happens for type=7: > for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, type=type))<0), "\n") 1 FALSE 2 FALSE 3 FALSE 4 FALSE 5 FALSE 6 FALSE 7 TRUE 8 FALSE 9 FALSE > I know this is at the limits of machine precision, but it still seems wrong. Curiously, S-PLUS 6.2 produces exactly the same numerical result on my machine (according to the R quantile documentation, the S-PLUS calculations correspond to type=7). > version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major2 minor0.1 year 2004 month11 day 15 language R > -- Tony Plate here's the data that gives the result above: x2 <- c(-0.00090419678460984, -0.000980064982459659, -0.00090419678460984, -0.000744104385375977, 0.206332797095889, -0.000817139943440755, -0.000899564652215867, -0.000574611482166109, -0.0013728312083653, -0.00090419678460984, -0.0013728312083653, -0.000723100843883696, -0.000630242483956473, -0.000817139943440755, 0.0868369624728248, -0.000817139943440755, -0.000817139943440755, -0.00112312180655343, -0.00112312180655343, -0.000380657968066988, -0.000723100843883696, -0.00090419678460984, -0.00090419678460984, -0.000380657968066988, -0.0010127169745309, -0.000723100843883696, -0.00112312180655343, -0.00112312180655343, -0.00090419678460984, -0.000681496801830473, -0.00090419678460984, -0.000380657968066988, -0.000380657968066988, -0.000817139943440755, -0.000723100843883696, -0.000723100843883696, -0.0013913767678397, -0.0013728312083653, -0.000817139943440755, -0.00112312180655343, -0.00112312180655343, -0.000817139943440755, 0.245683056967599, -0.000817139943440755, -0.00112312180655343, -0.00090419678460984, -0.00112312180655343, 0.123553718839373, -0.0013728312083653, -0.000723100843883696, -0.000899564652215867, 0.105625640778315, -0.00090419678460984, -0.0013913767678397, -0.00090419678460984, -0.000723100843883696, -0.000228291466122582, -0.00090419678460984, -0.000817139943440755, -0.00090419678460984, -0.000817139943440755, -0.000817139943440755, -0.000817139943440755, -0.000817139943440755, 0., -0.000723100843883696, -0.000380657968066988, -0.000723100843883696, -0.000723100843883696, -0.000899564652215867, -0.000764199665614537, -0.000574611482166109, -0.000681496801830473, -0.000817139943440755, -0.000817139943440755, -0.00090419678460984, -0.000723100843883696, 0.0394509065718878, -0.000817139943440755, -0.0013728312083653, -0.000228291466122582, -0.00090419678460984, -0.0013913767678397, -0.000817139943440755, -0.000817139943440755, -0.000817139943440755, -0.00090419678460984, -0.000681496801830473, -0.000817139943440755, -0.0013728312083653, -0.00090419678460984, -0.00112312180655343, -0.00090419678460984, -0.00112312180655343, -0.000723100843883696, -0.0013728312083653, -0.0013728312083653, -0.000574611482166109, -0.00133536543164934, -0.000369889395577567, -0.000723100843883696, -0.000817139943440755, -0.000723100843883696, -0.0013728312083653, -0.000817139943440755, -0.00090419678460984, -0.000723100843883696, -0.000723100843883696, -0.00090419678460984, -0.000723100843883696, -0.000723100843883696, -0.00090419678460984, -0.0010127169745309, -0.00090419678460984, -0.000723100843883696, -0.00090419678460984, -0.000723100843883696, -0.00090419678460984, -0.000817139943440755, -0.000817139943440755, -0.00138617697216216, -0.000574611482166109, -0.000723100843883696, 0.0238135826020014, -0.000723100843883696, -0.000817139943440755, -0.00090419678460984, -0.00112312180655343, -0.000574611482166109, -0.000380657968066988, -0.000723100843883696, -0.000367703891935803, -0.00090419678460984, -0.000574611482166109, -0.00112312180655343, -0.00090419678460984, 0.0681528477441697, -0.000817139943440755, -0.00090419678460984, -0.0010127169745309, -0.00090419678460984, -0.000380657968066988, -0.000392709459577288, -0.0013913767678397, -0.000681496801830473, -0.000492947442190988, -0.00090419678460984, -0.000723100843883696, -0.000723100843883696, -0.000899564652215867, -0.00090419678460984, -0.00090419678460984, -0.00090419678460984, -0.000574611482166109, -0.000817139943440755, -0.000723100843883696, 0.0394509065718878, 0.150393440609887, -0.00090419678460984, -0.000723100843883696, -0.000492947442190988, 0.0514323597862607, -0.000574611482166109, -0.00068149680183047