Hi Phil,
Many thanks for your feedback. I see several points in your question:
+ the quantile function is defined over R and not only on [0, 1], which is strange
+ quantile(0) should be equal to -inf and quantile(1) to inf for unbounded
distributions
+ the quantile function should be symmetric wrt 1/2 for symmetric distributions
+ the quantile function should be nondecreasing
+ the quantile function should be continuous for continuous distributions
If we adopt a purely mathematical point of view, for an unbounded distribution
such as the normal distribution, we would expect quantile(x)=NaN or an
exception to be thrown for x outside of [0,1], quantile(0)=-inf and
quantile(1)=inf. The point is that the quantile function is used by the
iso-probabilistic transformations, which are used in several meta-modeling
algorithms. If the quantile function were restricted to the range [0,1], for
example by throwing an exception, then an exception would be thrown each time
the meta-model is evaluated outside of the bounding box of the learning sample.
For many applications, a saturation (i.e. a projection of the input onto the
bounding box of the learning sample) is much better suited, hence the choice we
made. This answers the first two points.
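To make the saturation idea concrete, here is a minimal sketch (the helper name
and the two-dimensional example are mine, not OpenTURNS API): inputs outside the
bounding box of the learning sample are projected onto it instead of triggering
an exception.

import openturns as ot

# Stand-in learning sample and its bounding box
sample = ot.Normal(2).getSample(100)
box = ot.Interval(sample.getMin(), sample.getMax())

def saturate(point, box):
    # Project the point onto the bounding box, component by component
    lower, upper = box.getLowerBound(), box.getUpperBound()
    return ot.Point([min(max(point[i], lower[i]), upper[i])
                     for i in range(point.getDimension())])

x = ot.Point([10.0, -10.0])   # far outside the box
print(saturate(x, box))       # projected input, ready for the meta-model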
Concerning the lack of symmetry between the lower tail and the upper tail, it
comes from the fact that floating point numbers are NOT symmetric with respect
to 1/2. There are many more positive floating point numbers in the neighborhood
of 0 than in the neighborhood of 1, due to the denormalized numbers. The
smallest positive floating point number is SpecFunc.MinScalar=2.225e-308, while
the gap between 1 and the largest floating point number less than 1 is of the
order of SpecFunc.ScalarPrecision=2.220e-16. When things are done carefully,
e.g. by using the relation F^{-1}(G(x)) = \bar{F}^{-1}(\bar{G}(x)) for x large,
where F, G are two CDFs and \bar{F}=1-F, \bar{G}=1-G are the corresponding
survival functions, this allows extending the range of the iso-probabilistic
transformations from about [-8.12, 8.12] to [-37.5, 37.5] in the standard
normal space.
-> it was a historical decision; today it looks weird.
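To illustrate (assuming the optional 'tail' argument of computeQuantile works on
the complementary probability, as documented):

import openturns as ot

n = ot.Normal()
# Lower tail: probabilities can be as small as the smallest positive double,
# so the quantile can go down to about -37.5
print(n.computeQuantile(ot.SpecFunc.MinScalar))
# Upper tail through 1-p: the closest double below 1 is only ~2.2e-16 away,
# so the quantile saturates around +8.1
print(n.computeQuantile(1.0 - ot.SpecFunc.ScalarEpsilon))
# Working directly with the complementary probability (tail=True) restores
# the symmetry and reaches about +37.5
print(n.computeQuantile(ot.SpecFunc.MinScalar, True))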
Concerning the last two points, there is definitely a bug in OpenTURNS. For
numerical reasons (mainly to be able to compute integrals using the
Gauss-Kronrod method), all the distributions have a range, which is the
bounding box of their mathematical support (i.e. the closure of the set where
the PDF is positive for absolutely continuous distributions, and the set of
points with positive probability for discrete distributions). The range is
stored as an Interval object: the values of the bounds are used by the
numerical methods, and the flags tell the user whether the distribution is
actually bounded or not. If the distribution is bounded, the value of the bound
is the exact value; otherwise it is an extreme quantile (at epsilon and
1-epsilon with epsilon=1e-14 by default, see
ResourceMap.GetAsScalar("Distribution-DefaultCDFEpsilon")).
At one point, I decided to use the range to provide a meaningful value for
quantile(0), quantile(1) and more generally quantile(q) for q outside of (0,1).
It works perfectly well for bounded distributions, but not for unbounded ones,
since e.g. quantile(1e-15) < quantile(0)! The key point is that the range
stores both exact and approximate data, and only the exact part should be used
in the implementation of the computeQuantile() method.
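For illustration, the exact/approximate mix can be inspected directly from
Python (a short inspection sketch, nothing more):

import openturns as ot

for dist in [ot.Uniform(0.0, 1.0), ot.Normal()]:
    rng = dist.getRange()
    print(dist)
    # Numerical bounds used by the quadrature rules (Gauss-Kronrod, ...)
    print('  bounds:', rng.getLowerBound(), rng.getUpperBound())
    # Flags telling whether each bound is mathematically exact (bounded support)
    # or only an extreme quantile of an unbounded support
    print('  finite?', rng.getFiniteLowerBound(), rng.getFiniteUpperBound())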
I can see two fixes:
1) Use the range information in the computation of quantiles only for finite
bounds
2) Restrict the computeQuantile() method to (epsilon, 1.0-epsilon) and extend
it using the numerical information in range.
I clearly prefer the first solution (sketched below), but I have to check that
nothing goes wrong in the algorithms that rely on computeQuantile() instead of
getRange() to get an idea of the extent of the distributions.
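To fix ideas, a minimal Python-level sketch of option 1 (a hypothetical wrapper,
not the actual C++ change in DistributionImplementation::computeQuantile):

import math
import openturns as ot

def quantile_using_finite_bounds_only(dist, p):
    # Option 1: outside (0,1), use the range bounds only when they are exact
    # (finite flag set), otherwise return -inf/+inf
    rng = dist.getRange()
    if p <= 0.0:
        return rng.getLowerBound()[0] if rng.getFiniteLowerBound()[0] else -math.inf
    if p >= 1.0:
        return rng.getUpperBound()[0] if rng.getFiniteUpperBound()[0] else math.inf
    return dist.computeQuantile(p)[0]

print(quantile_using_finite_bounds_only(ot.Normal(), 0.0))           # -inf
print(quantile_using_finite_bounds_only(ot.Uniform(0.0, 1.0), 0.0))  # 0.0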
Cheers
Régis
________________________________
From: Phil Fernandes <[email protected]>
To: Julien Schueller | Phimeca <[email protected]>; "[email protected]"
<[email protected]>
Sent: Tuesday, September 5, 2017, 7:00 PM
Subject: Re: [ot-users] Nonsensical output from computeQuantile
Hi Julien,
Thank you for your reply. However, shouldn’t the quantiles of continuous
unbounded distributions at p=0 and p=1 be undefined and return NaN (or -/+inf
if we consider the limits)? What is the purpose of real-numbered boundary
values in these cases? If this is a coding convenience, then why does
ot.Normal().computeQuantile() not return quantiles that are symmetric about 0
for ot.SpecFunc.MinScalar and 1-ot.SpecFunc.Precision?
Phil
From: [email protected] [mailto:[email protected]] On Behalf
Of Julien Schueller | Phimeca
Sent: Monday, September 04, 2017 9:37 AM
To: [email protected]
Subject: [External] Re: [ot-users] Nonsensical output from computeQuantile
Maybe we don't want to throw, but I'm not sure.
We never throw, but there are some other cases where the quantile function is
not really continuous at the bounds, because in
DistributionImplementation::computeQuantile(L2063) we filter according to the
values defined in the range attribute:
import openturns as ot
import traceback

factories = ot.DistributionFactory.GetUniVariateFactories()
for factory in factories:
    dist = factory.build()
    try:
        q0 = dist.computeQuantile(0.0)
        qm1 = dist.computeQuantile(-1.0)
        if q0 != qm1:
            print(dist.getName(), '<0', q0, qm1)
        q1 = dist.computeQuantile(1.0)
        q0p = dist.computeQuantile(ot.SpecFunc.MinScalar)
        if q0 != q0p:
            print(dist.getName(), '0+', q0, q0p)
    except:
        print(' exc for', dist.getName())
        traceback.print_exc()
        pass
    try:
        q1 = dist.computeQuantile(1.0)
        q2 = dist.computeQuantile(2.0)
        if q1 != q2:
            print(dist.getName(), '>1', q1, q2)
        q1m = dist.computeQuantile(1.0 - ot.SpecFunc.ScalarEpsilon)
        if q1m != q1:
            print(dist.getName(), '1-', q1m, q1)
    except:
        print(' exc for', dist.getName())
        traceback.print_exc()
        pass
________________________________
From: [email protected] <[email protected]> on behalf of
Julien Schueller | Phimeca <[email protected]>
Sent: Monday, September 4, 2017, 16:40:25
To: [email protected]
Subject: Re: [ot-users] Nonsensical output from computeQuantile
Hello Phil,
- Calling computeQuantile(p) with p outside of (0,1) does not make sense as p
is a probability, and it should throw an exception; maybe we should fix that.
- Otherwise the boundary values at p=0 and p=1 seem perfectly fine (and should
not be -/+inf) regarding the continuity of the function:
In [32]: ot.Normal().computeQuantile(ot.SpecFunc.MinScalar)
Out[32]: class=Point name=Unnamed dimension=1 values=[-37.5016]
In [42]: ot.Normal().computeQuantile(1-ot.SpecFunc.Precision)
Out[42]: class=Point name=Unnamed dimension=1 values=[8.12589]
j
________________________________
From: [email protected] <[email protected]> on behalf of
Phil Fernandes <[email protected]>
Sent: Wednesday, August 23, 2017, 22:37:37
To: [email protected]
Subject: [ot-users] Nonsensical output from computeQuantile
Hello,
I've noticed that the computeQuantile function, at least for continuous
unbounded distributions, yields a nonsensical numerical result for inputs <= 0
and >= 1. For example, with ot.Normal().computeQuantile(x), when x = 0 the
output is -37.5194, and if I enter x < 0 the output is -7.65063. At the moment
I am handling this by subclassing the distribution and overriding the
computeQuantile method with an if-else statement that returns -np.inf or np.inf
if x = 0 or 1, but this is cumbersome. Would it be possible to incorporate this
fix into the source code?
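For reference, the kind of workaround I am using looks roughly like this (class
name mine; the override is of course only seen from the Python side):

import numpy as np
import openturns as ot

class NormalWithInfQuantiles(ot.Normal):
    # Override computeQuantile so that p <= 0 and p >= 1 map to -inf/+inf
    def computeQuantile(self, p, tail=False):
        if p <= 0.0:
            return ot.Point([-np.inf])
        if p >= 1.0:
            return ot.Point([np.inf])
        return super(NormalWithInfQuantiles, self).computeQuantile(p, tail)

print(NormalWithInfQuantiles().computeQuantile(0.0))  # value is -inf
print(NormalWithInfQuantiles().computeQuantile(0.5))  # value is 0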
Thank you,
Phil
_______________________________________________
OpenTURNS users mailing list
[email protected]
http://openturns.org/mailman/listinfo/users