Hi Phil,
Many thanks for your feedback. I see several points in your question:
+ the quantile function is defined over R and not only on [0, 1], which is strange
+ quantile(0) should be equal to -inf and quantile(1) to inf for unbounded
distributions
+ the quantile function should be symmetric wrt 1/2 for symmetric distributions
+ the quantile function should be nondecreasing
+ the quantile function should be continuous for continuous distributions
If we adopt a purely mathematical point of view, for an unbounded distribution
such as the normal distribution, we would expect quantile(x)=NaN or an
exception to be thrown for x outside of [0,1], quantile(0)=-inf and
quantile(1)=inf. The point is that the quantile function is used by the
iso-probabilistic transformations, which are used in several meta-modeling
algorithms. If the quantile function were restricted to the range [0,1], for
example by throwing an exception, then an exception would be thrown each time
the meta-model is evaluated outside of the bounding box of the learning sample.
For many applications, a saturation (i.e. a projection of the input onto the
bounding box of the learning sample) is much better suited, hence the choice we
made. This answers the first two points.
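To make the saturation idea concrete, here is a minimal sketch (the helper name
and the two-dimensional example are mine, not OpenTURNS API): inputs outside the
bounding box of the learning sample are projected onto it instead of triggering
an exception.

import openturns as ot

# Stand-in learning sample and its bounding box
sample = ot.Normal(2).getSample(100)
box = ot.Interval(sample.getMin(), sample.getMax())

def saturate(point, box):
    # Project the point onto the bounding box, component by component
    lower, upper = box.getLowerBound(), box.getUpperBound()
    return ot.Point([min(max(point[i], lower[i]), upper[i])
                     for i in range(point.getDimension())])

x = ot.Point([10.0, -10.0])   # far outside the box
print(saturate(x, box))       # projected input, ready for the meta-model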
Concerning the lack of symmetry between the lower tail and the upper tail, it
comes from the fact that floating point numbers are NOT symmetric with respect
to 1/2. There are many more positive floating point numbers in the neighborhood
of 0 than in the neighborhood of 1, due to the denormalized numbers. The
smallest positive floating point number is SpecFunc.MinScalar=2.225e-308, while
the gap between 1 and the largest floating point number less than 1 is of the
order of SpecFunc.ScalarPrecision=2.220e-16. When things are done carefully,
e.g. by using the relation F^{-1}(G(x)) = \bar{F}^{-1}(\bar{G}(x)) for x large,
where F, G are two CDFs and \bar{F}=1-F, \bar{G}=1-G are the corresponding
survival functions, this allows extending the range of the iso-probabilistic
transformations from about [-8.12, 8.12] to [-37.5, 37.5] in the standard
normal space.
-> it was a historical decision; today it looks weird.
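To illustrate (assuming the optional 'tail' argument of computeQuantile works on
the complementary probability, as documented):

import openturns as ot

n = ot.Normal()
# Lower tail: probabilities can be as small as the smallest positive double,
# so the quantile can go down to about -37.5
print(n.computeQuantile(ot.SpecFunc.MinScalar))
# Upper tail through 1-p: the closest double below 1 is only ~2.2e-16 away,
# so the quantile saturates around +8.1
print(n.computeQuantile(1.0 - ot.SpecFunc.ScalarEpsilon))
# Working directly with the complementary probability (tail=True) restores
# the symmetry and reaches about +37.5
print(n.computeQuantile(ot.SpecFunc.MinScalar, True))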
Concerning the last two points, there is definitely a bug in OpenTURNS. For
numerical reasons (mainly to be able to compute integrals using the
Gauss-Kronrod method), all the distributions have a range, which is the
bounding box of their mathematical support (i.e. the closure of the set where
the PDF is positive for absolutely continuous distributions, and the set of
points with positive probability for discrete distributions). The range is
stored as an Interval object: the values of the bounds are used by the
numerical methods, and the flags tell the user whether the distribution is
actually bounded or not. If the distribution is bounded, the value of the bound
is the exact value; otherwise it is an extreme quantile (at epsilon and
1-epsilon with epsilon=1e-14 by default, see
ResourceMap.GetAsScalar("Distribution-DefaultCDFEpsilon")).
At one point, I decided to use the range to provide a meaningful value for
quantile(0), quantile(1) and more generally quantile(q) for q outside of (0,1).
It works perfectly well for bounded distributions, but not for unbounded ones,
since e.g. quantile(1e-15) < quantile(0)! The key point is that the range
stores both exact and approximate data, and only the exact part should be used
in the implementation of the computeQuantile() method.
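For illustration, the exact/approximate mix can be inspected directly from
Python (a short inspection sketch, nothing more):

import openturns as ot

for dist in [ot.Uniform(0.0, 1.0), ot.Normal()]:
    rng = dist.getRange()
    print(dist)
    # Numerical bounds used by the quadrature rules (Gauss-Kronrod, ...)
    print('  bounds:', rng.getLowerBound(), rng.getUpperBound())
    # Flags telling whether each bound is mathematically exact (bounded support)
    # or only an extreme quantile of an unbounded support
    print('  finite?', rng.getFiniteLowerBound(), rng.getFiniteUpperBound())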
I can see two fixes:
1) Use the range information in the computation of quantiles only for finite
bounds
2) Restrict the computeQuantile() method to (epsilon, 1.0-epsilon) and extend
it using the numerical information in range.
I clearly prefer the first solution (sketched below), but I have to check that
nothing goes wrong in the algorithms that rely on computeQuantile() instead of
getRange() to get an idea of the extent of the distributions.
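To fix ideas, a minimal Python-level sketch of option 1 (a hypothetical wrapper,
not the actual C++ change in DistributionImplementation::computeQuantile):

import math
import openturns as ot

def quantile_using_finite_bounds_only(dist, p):
    # Option 1: outside (0,1), use the range bounds only when they are exact
    # (finite flag set), otherwise return -inf/+inf
    rng = dist.getRange()
    if p <= 0.0:
        return rng.getLowerBound()[0] if rng.getFiniteLowerBound()[0] else -math.inf
    if p >= 1.0:
        return rng.getUpperBound()[0] if rng.getFiniteUpperBound()[0] else math.inf
    return dist.computeQuantile(p)[0]

print(quantile_using_finite_bounds_only(ot.Normal(), 0.0))           # -inf
print(quantile_using_finite_bounds_only(ot.Uniform(0.0, 1.0), 0.0))  # 0.0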
Cheers
Régis
________________________________
From: Phil Fernandes <[email protected]>
To: Julien Schueller | Phimeca <[email protected]>; "[email protected]"
<[email protected]>
Sent: Tuesday, September 5, 2017, 7:00 PM
Subject: Re: [ot-users] Nonsensical output from computeQuantile
Hi Julien,
Thank you for your reply. However, shouldn’t the quantiles of continuous
unbounded distributions at p=0 and p=1 be undefined and return NaN (or -/+inf
if we consider the limits)? What is the purpose of real-numbered boundary
values in these cases? If this is a coding convenience, then why does
ot.Normal().computeQuantile() not return quantiles that are symmetric about 0
for ot.SpecFunc.MinScalar and 1-ot.SpecFunc.Precision?
Phil
From: [email protected] [mailto:[email protected]] On Behalf
Of Julien Schueller | Phimeca
Sent: Monday, September 04, 2017 9:37 AM
To: [email protected]
Subject: [External] Re: [ot-users] Nonsensical output from computeQuantile
Maybe we don't want to throw, but I'm not sure.
We never throw, but there are some other cases where the quantile function is
not really continuous at the bounds, because in
DistributionImplementation::computeQuantile(L2063) we filter according to the
values defined in the range attribute:
import openturns as ot
import traceback

factories = ot.DistributionFactory.GetUniVariateFactories()
for factory in factories:
    dist = factory.build()
    try:
        q0 = dist.computeQuantile(0.0)
        qm1 = dist.computeQuantile(-1.0)
        if q0 != qm1:
            print(dist.getName(), '<0', q0, qm1)
        q1 = dist.computeQuantile(1.0)
        q0p = dist.computeQuantile(ot.SpecFunc.MinScalar)
        if q0 != q0p:
            print(dist.getName(), '0+', q0, q0p)
    except:
        print(' exc for', dist.getName())
        traceback.print_exc()
        pass
    try:
        q1 = dist.computeQuantile(1.0)
        q2 = dist.computeQuantile(2.0)
        if q1 != q2:
            print(dist.getName(), '>1', q1, q2)
        q1m = dist.computeQuantile(1.0 - ot.SpecFunc.ScalarEpsilon)
        if q1m != q1:
            print(dist.getName(), '1-', q1m, q1)
    except:
        print(' exc for', dist.getName())
        traceback.print_exc()
        pass
________________________________
From: [email protected] <[email protected]> on behalf of
Julien Schueller | Phimeca <[email protected]>
Sent: Monday, September 4, 2017, 16:40:25
To: [email protected]
Subject: Re: [ot-users] Nonsensical output from computeQuantile
Hello Phil,
- Calling computeQuantile(p) with p outside of (0,1) does not make sense as p
is a probability, and it should throw an exception; maybe we should fix that.
- Otherwise the boundary values at p=0 and p=1 seem perfectly fine (and should
not be -/+inf) regarding the continuity of the function:
In [32]: ot.Normal().computeQuantile(ot.SpecFunc.MinScalar)
Out[32]: class=Point name=Unnamed dimension=1 values=[-37.5016]
In [42]: ot.Normal().computeQuantile(1-ot.SpecFunc.Precision)
Out[42]: class=Point name=Unnamed dimension=1 values=[8.12589]
j
________________________________
From: [email protected] <[email protected]> on behalf of
Phil Fernandes <[email protected]>
Sent: Wednesday, August 23, 2017, 22:37:37
To: [email protected]
Subject: [ot-users] Nonsensical output from computeQuantile
Hello,
I've noticed that the computeQuantile function, at least for continuous
unbounded distributions, yields a nonsensical numerical result for inputs <= 0
and >= 1. For example, with ot.Normal().computeQuantile(x), when x = 0 the
output is -37.5194, and if I enter x < 0 the output is -7.65063. At the moment
I am handling this by subclassing the distribution and overriding the
computeQuantile method with an if-else statement that returns -np.inf or np.inf
if x = 0 or 1, but this is cumbersome. Would it be possible to incorporate this
fix into the source code?
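For reference, the kind of workaround I am using looks roughly like this (class
name mine; the override is of course only seen from the Python side):

import numpy as np
import openturns as ot

class NormalWithInfQuantiles(ot.Normal):
    # Override computeQuantile so that p <= 0 and p >= 1 map to -inf/+inf
    def computeQuantile(self, p, tail=False):
        if p <= 0.0:
            return ot.Point([-np.inf])
        if p >= 1.0:
            return ot.Point([np.inf])
        return super(NormalWithInfQuantiles, self).computeQuantile(p, tail)

print(NormalWithInfQuantiles().computeQuantile(0.0))  # value is -inf
print(NormalWithInfQuantiles().computeQuantile(0.5))  # value is 0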
Thank you,
Phil
_______________________________________________
OpenTURNS users mailing list
[email protected]
http://openturns.org/mailman/listinfo/users