[Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

2005-02-22 Thread Tony Plate
Is it a bug that quantile() can return a lower sample quantile for a higher 
probability?

 # quantile returns decreasing results with increasing probs (data at 
the end of the message)
 quantile(x2, (0:5)/5)
   0%   20%   40%   60%   80%
-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
 100%
 0.2905596324
 # the 40% quantile has a lower value than the 20% quantile
 diff(quantile(x2, (0:5)/5))
  20%   40%   60%   80%  100%
 5.099206e-04 -1.084202e-19  1.726945e-04  1.568908e-04  2.911342e-01


This only happens for type=7:
 for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, 
type=type))0), \n)
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 TRUE
8 FALSE
9 FALSE


I know this is at the limits of machine precision, but it still seems 
wrong.  Curiously, S-PLUS 6.2 produces exactly the same numerical result on 
my machine (according to the R quantile documentation, the S-PLUS 
calculations correspond to type=7).

 version
 _
platform i386-pc-mingw32
arch i386
os   mingw32
system   i386, mingw32
status
major2
minor0.1
year 2004
month11
day  15
language R

-- Tony Plate
here's the data that gives the result above:
x2 - c(-0.00090419678460984, -0.000980064982459659, -0.00090419678460984, 
-0.000744104385375977,
	0.206332797095889, -0.000817139943440755, -0.000899564652215867,
	-0.000574611482166109, -0.0013728312083653, -0.00090419678460984,
	-0.0013728312083653, -0.000723100843883696, -0.000630242483956473,
	-0.000817139943440755, 0.0868369624728248, -0.000817139943440755,
	-0.000817139943440755, -0.00112312180655343, -0.00112312180655343,
	-0.000380657968066988, -0.000723100843883696, -0.00090419678460984,
	-0.00090419678460984, -0.000380657968066988, -0.0010127169745309,
	-0.000723100843883696, -0.00112312180655343, -0.00112312180655343,
	-0.00090419678460984, -0.000681496801830473, -0.00090419678460984,
	-0.000380657968066988, -0.000380657968066988, -0.000817139943440755,
	-0.000723100843883696, -0.000723100843883696, -0.0013913767678397,
	-0.0013728312083653, -0.000817139943440755, -0.00112312180655343,
	-0.00112312180655343, -0.000817139943440755, 0.245683056967599,
	-0.000817139943440755, -0.00112312180655343, -0.00090419678460984,
	-0.00112312180655343, 0.123553718839373, -0.0013728312083653, 
-0.000723100843883696,
	-0.000899564652215867, 0.105625640778315, -0.00090419678460984, 
-0.0013913767678397,
	-0.00090419678460984, -0.000723100843883696, -0.000228291466122582,
	-0.00090419678460984, -0.000817139943440755, -0.00090419678460984,
	-0.000817139943440755, -0.000817139943440755, -0.000817139943440755,
	-0.000817139943440755, 0., -0.000723100843883696, -0.000380657968066988,
	-0.000723100843883696, -0.000723100843883696, -0.000899564652215867,
	-0.000764199665614537, -0.000574611482166109, -0.000681496801830473,
	-0.000817139943440755, -0.000817139943440755, -0.00090419678460984,
	-0.000723100843883696, 0.0394509065718878, -0.000817139943440755,
	-0.0013728312083653, -0.000228291466122582, -0.00090419678460984,
	-0.0013913767678397, -0.000817139943440755, -0.000817139943440755,
	-0.000817139943440755, -0.00090419678460984, -0.000681496801830473,
	-0.000817139943440755, -0.0013728312083653, -0.00090419678460984,
	-0.00112312180655343, -0.00090419678460984, -0.00112312180655343,
	-0.000723100843883696, -0.0013728312083653, -0.0013728312083653,
	-0.000574611482166109, -0.00133536543164934, -0.000369889395577567,
	-0.000723100843883696, -0.000817139943440755, -0.000723100843883696,
	-0.0013728312083653, -0.000817139943440755, -0.00090419678460984,
	-0.000723100843883696, -0.000723100843883696, -0.00090419678460984,
	-0.000723100843883696, -0.000723100843883696, -0.00090419678460984,
	-0.0010127169745309, -0.00090419678460984, -0.000723100843883696,
	-0.00090419678460984, -0.000723100843883696, -0.00090419678460984,
	-0.000817139943440755, -0.000817139943440755, -0.00138617697216216,
	-0.000574611482166109, -0.000723100843883696, 0.0238135826020014,
	-0.000723100843883696, -0.000817139943440755, -0.00090419678460984,
	-0.00112312180655343, -0.000574611482166109, -0.000380657968066988,
	-0.000723100843883696, -0.000367703891935803, -0.00090419678460984,
	-0.000574611482166109, -0.00112312180655343, -0.00090419678460984,
	0.0681528477441697, -0.000817139943440755, -0.00090419678460984,
	-0.0010127169745309, -0.00090419678460984, -0.000380657968066988,
	-0.000392709459577288, -0.0013913767678397, -0.000681496801830473,
	-0.000492947442190988, -0.00090419678460984, -0.000723100843883696,
	-0.000723100843883696, -0.000899564652215867, -0.00090419678460984,
	-0.00090419678460984, -0.00090419678460984, -0.000574611482166109,
	-0.000817139943440755, -0.000723100843883696, 0.0394509065718878, 
0.150393440609887,
	-0.00090419678460984, -0.000723100843883696, -0.000492947442190988,
	0.0514323597862607, -0.000574611482166109, -0.000681496801830473,
	

Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

2005-02-22 Thread Duncan Murdoch
On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
[EMAIL PROTECTED] wrote :

Is it a bug that quantile() can return a lower sample quantile for a higher 
probability?

  # quantile returns decreasing results with increasing probs (data at 
the end of the message)
  quantile(x2, (0:5)/5)
0%   20%   40%   60%   80%
-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
  100%
  0.2905596324
  # the 40% quantile has a lower value than the 20% quantile
  diff(quantile(x2, (0:5)/5))
   20%   40%   60%   80%  100%
  5.099206e-04 -1.084202e-19  1.726945e-04  1.568908e-04  2.911342e-01
 

This only happens for type=7:

  for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, 
type=type))0), \n)
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 TRUE
8 FALSE
9 FALSE
 

I know this is at the limits of machine precision, but it still seems 
wrong.  Curiously, S-PLUS 6.2 produces exactly the same numerical result on 
my machine (according to the R quantile documentation, the S-PLUS 
calculations correspond to type=7).

I'd say it's not a bug in that rounding error is something you should
expect in a calculation like this, but it does look wrong.  And it
isn't only restricted to type 7.  If you make a vector of copies of
that bad value, you'll see it in more cases:

 x - rep(-0.00090419678460984, 602)
 for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5, 
+ type=type))0), \n)
1 FALSE 
2 FALSE 
3 FALSE 
4 FALSE 
5 TRUE 
6 TRUE 
7 TRUE 
8 FALSE 
9 TRUE 

(at least on Windows).  What's happening is that R is doing linear
interpolation between two equal values, and not getting the same value
back, because of rounding.  

The offending line appears to be this one:

qs[i] - ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])

The equivalent calculation in the approx function (which doesn't
appear to have this problem) is

qs[i] + (x[hi[i]] - qs[i]) * h

Can anyone think of why this would not be better?  (The same sort of
calculation shows up again later in quantile().)

Duncan Murdoch

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

2005-02-22 Thread Prof Brian Ripley
On Tue, 22 Feb 2005, Duncan Murdoch wrote:
On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
[EMAIL PROTECTED] wrote :
Is it a bug that quantile() can return a lower sample quantile for a higher
probability?
# quantile returns decreasing results with increasing probs (data at
the end of the message)
quantile(x2, (0:5)/5)
   0%   20%   40%   60%   80%
-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
 100%
 0.2905596324
# the 40% quantile has a lower value than the 20% quantile
diff(quantile(x2, (0:5)/5))
  20%   40%   60%   80%  100%
 5.099206e-04 -1.084202e-19  1.726945e-04  1.568908e-04  2.911342e-01

This only happens for type=7:
for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
type=type))0), \n)
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 TRUE
8 FALSE
9 FALSE

I know this is at the limits of machine precision, but it still seems
wrong.  Curiously, S-PLUS 6.2 produces exactly the same numerical result on
my machine (according to the R quantile documentation, the S-PLUS
calculations correspond to type=7).
I'd say it's not a bug in that rounding error is something you should
expect in a calculation like this, but it does look wrong.  And it
isn't only restricted to type 7.  If you make a vector of copies of
that bad value, you'll see it in more cases:
x - rep(-0.00090419678460984, 602)
for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
+ type=type))0), \n)
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 TRUE
6 TRUE
7 TRUE
8 FALSE
9 TRUE
(at least on Windows).  What's happening is that R is doing linear
interpolation between two equal values, and not getting the same value
back, because of rounding.
The offending line appears to be this one:
qs[i] - ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
The equivalent calculation in the approx function (which doesn't
appear to have this problem) is
qs[i] + (x[hi[i]] - qs[i]) * h
Can anyone think of why this would not be better?  (The same sort of
calculation shows up again later in quantile().)
Infinities, where arithmetic is not distributive.
It is done that way to interpolate correctly between Inf and Inf: the 
second version gives NaN, and that is something that was a problem with 
the first version of the update to give all those types.  I am pretty sure 
there is a regression test for this.

If you really want to avoid this, I think you need to pre-test for 
equality.

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

2005-02-22 Thread Duncan Murdoch
On Tue, 22 Feb 2005 21:36:20 +, Duncan Murdoch
[EMAIL PROTECTED] wrote :

On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
[EMAIL PROTECTED] wrote :

Is it a bug that quantile() can return a lower sample quantile for a higher 
probability?

  # quantile returns decreasing results with increasing probs (data at 
the end of the message)
  quantile(x2, (0:5)/5)
0%   20%   40%   60%   80%
-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
  100%
  0.2905596324
  # the 40% quantile has a lower value than the 20% quantile
  diff(quantile(x2, (0:5)/5))
   20%   40%   60%   80%  100%
  5.099206e-04 -1.084202e-19  1.726945e-04  1.568908e-04  2.911342e-01
 

This only happens for type=7:

  for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5, 
type=type))0), \n)
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 TRUE
8 FALSE
9 FALSE
 

I know this is at the limits of machine precision, but it still seems 
wrong.  Curiously, S-PLUS 6.2 produces exactly the same numerical result on 
my machine (according to the R quantile documentation, the S-PLUS 
calculations correspond to type=7).

I'd say it's not a bug in that rounding error is something you should
expect in a calculation like this, but it does look wrong.  And it
isn't only restricted to type 7.  If you make a vector of copies of
that bad value, you'll see it in more cases:

 x - rep(-0.00090419678460984, 602)
 for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5, 
+ type=type))0), \n)
1 FALSE 
2 FALSE 
3 FALSE 
4 FALSE 
5 TRUE 
6 TRUE 
7 TRUE 
8 FALSE 
9 TRUE 

(at least on Windows).  What's happening is that R is doing linear
interpolation between two equal values, and not getting the same value
back, because of rounding.  

The offending line appears to be this one:

qs[i] - ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])

The equivalent calculation in the approx function (which doesn't
appear to have this problem) is

qs[i] + (x[hi[i]] - qs[i]) * h

Can anyone think of why this would not be better?  (The same sort of
calculation shows up again later in quantile().)

Just looked at the history of this line, and it appears the code is
the way it is to avoid an error if the value of the vector is
infinite.  For example, we now get the right answer

 x - rep(Inf, 100)
 quantile(x, 0:5/5)
  0%  20%  40%  60%  80% 100% 
 Inf  Inf  Inf  Inf  Inf  Inf 

but with my modification above we wouldn't:

 quantile(x, 0:5/5)
  0%  20%  40%  60%  80% 100% 
 Inf  NaN  NaN  NaN  NaN  Inf 

Duncan

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

2005-02-22 Thread Tony Plate
Thanks for the diagnosis.
The reason I came across this was that I use both S-PLUS and R and I often 
use the results of quantile() as the breaks for cut().  In S-PLUS, cut() 
stops with an error if breaks has any decreasing values.  Thus this example 
caused an S-PLUS function to unexpectedly stop with an error.   However, 
cut() in R behaves differently: it sorts its breaks and thus does not 
object to decreasing values in breaks.  Another difference is that cut() in 
R stops with an error if any breaks are duplicated, which, I guess, means 
that in R I should use findInterval() instead of cut() for this 
task.  Except that findInterval() in R stops with an error if its breaks 
are unsorted...

 findInterval(x2, quantile(x2, (0:5)/5))
Error in findInterval(x2, quantile(x2, (0:5)/5)) :
'vec' must be sorted non-decreasingly

-- Tony Plate
At Tuesday 02:36 PM 2/22/2005, Duncan Murdoch wrote:
On Tue, 22 Feb 2005 13:43:47 -0700, Tony Plate
[EMAIL PROTECTED] wrote :
Is it a bug that quantile() can return a lower sample quantile for a higher
probability?

  # quantile returns decreasing results with increasing probs (data at
the end of the message)
  quantile(x2, (0:5)/5)
0%   20%   40%   60%   80%
-0.0014141174 -0.0009041968 -0.0009041968 -0.0007315023 -0.0005746115
  100%
  0.2905596324
  # the 40% quantile has a lower value than the 20% quantile
  diff(quantile(x2, (0:5)/5))
   20%   40%   60%   80%  100%
  5.099206e-04 -1.084202e-19  1.726945e-04  1.568908e-04  2.911342e-01
 

This only happens for type=7:

  for (type in 1:9) cat(type, any(diff(quantile(x2, (0:5)/5,
type=type))0), \n)
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 TRUE
8 FALSE
9 FALSE
 

I know this is at the limits of machine precision, but it still seems
wrong.  Curiously, S-PLUS 6.2 produces exactly the same numerical result on
my machine (according to the R quantile documentation, the S-PLUS
calculations correspond to type=7).
I'd say it's not a bug in that rounding error is something you should
expect in a calculation like this, but it does look wrong.  And it
isn't only restricted to type 7.  If you make a vector of copies of
that bad value, you'll see it in more cases:
 x - rep(-0.00090419678460984, 602)
 for (type in 1:9) cat(type, any(diff(quantile(x, (0:5)/5,
+ type=type))0), \n)
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 TRUE
6 TRUE
7 TRUE
8 FALSE
9 TRUE
(at least on Windows).  What's happening is that R is doing linear
interpolation between two equal values, and not getting the same value
back, because of rounding.
The offending line appears to be this one:
qs[i] - ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
The equivalent calculation in the approx function (which doesn't
appear to have this problem) is
qs[i] + (x[hi[i]] - qs[i]) * h
Can anyone think of why this would not be better?  (The same sort of
calculation shows up again later in quantile().)
Duncan Murdoch
__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug? quantile() can return decreasing sample quantiles for increasing probabilities

2005-02-22 Thread Duncan Murdoch
On Tue, 22 Feb 2005 15:14:00 -0700, Tony Plate
[EMAIL PROTECTED] wrote :

Thanks for the diagnosis.

The reason I came across this was that I use both S-PLUS and R and I often 
use the results of quantile() as the breaks for cut().  In S-PLUS, cut() 
stops with an error if breaks has any decreasing values.  Thus this example 
caused an S-PLUS function to unexpectedly stop with an error.   However, 
cut() in R behaves differently: it sorts its breaks and thus does not 
object to decreasing values in breaks.  Another difference is that cut() in 
R stops with an error if any breaks are duplicated, which, I guess, means 
that in R I should use findInterval() instead of cut() for this 
task.  Except that findInterval() in R stops with an error if its breaks 
are unsorted...

  findInterval(x2, quantile(x2, (0:5)/5))
Error in findInterval(x2, quantile(x2, (0:5)/5)) :
 'vec' must be sorted non-decreasingly

I guess you'll just have to use sort(quantile(...)).  It makes the
labels look sort of funny, but is hopefully harmless:

 x - rep(-0.00090419678460984, 602)
 sort(quantile(x, 0:5/5))
   0%   40%   60%   80%  100% 
-0.0009041968 -0.0009041968 -0.0009041968 -0.0009041968 -0.0009041968 
  20% 
-0.0009041968 

Duncan Murdoch

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel