Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-11-05 Thread Robert Haas
On Thu, Nov 5, 2015 at 10:36 AM, Fabien COELHO wrote:
> After some more thought, ISTM that this is not exactly a CDF because of the
> truncations, so I just named it "f" to be on the safe side.

Was there supposed to be a patch attached here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-11-05 Thread Fabien COELHO



On Thu, Nov 5, 2015 at 10:36 AM, Fabien COELHO  wrote:

After some more thought, ISTM that this is not exactly a CDF because of the
truncations, so I just named it "f" to be on the safe side.


Was there supposed to be a patch attached here?


No, the actual patch is in the "add function to pgbench" thread: the 
documentation is reworked there, and I tried to take Tomas's suggestions 
into account while doing the editing.


--
Fabien.




Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-11-05 Thread Fabien COELHO


I've done some work on the documentation as part of adding functions to 
pgbench expressions. You may have a look at:


http://www.postgresql.org/message-id/alpine.DEB.2.10.1511051256500.29177@sto


[...]
 CDF2(x) = PHI(2.0 * threshold * ...) / (2.0 * PHI(threshold) - 1.0)

and then the probability of "i" is

 P(X=i) = CDF2(i+0.5) - CDF2(i-0.5)


I agree that defining the shifted/scaled CDF and using it afterwards looks 
cleaner.


After some more thought, ISTM that this is not exactly a CDF because of 
the truncations, so I just named it "f" to be on the safe side.


--
Fabien.




Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-10-28 Thread Robert Haas
On Sun, Oct 25, 2015 at 7:12 PM, Tomas Vondra wrote:
>> By default, or when uniform is specified, all values in the range are
>> drawn with equal probability. Specifying gaussian or exponential
>> options modifies this behavior; each requires a mandatory threshold
>> which determines the precise shape of the distribution.
>
> I find the 'threshold' name to be rather unfortunate, as none of the
> probability distribution functions that I know use this term. And even if
> there's one probability function that uses 'threshold' it has very little
> meaning in the others. For example the exponential distribution uses 'rate'
> (lambda). I'd prefer a neutral name (e.g. 'parameter').

+1 for this change.

(I have no particular opinion on your other suggestions.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-10-26 Thread Fabien COELHO



I was not only thinking of mathematical figures, I was also thinking of
graphics; some formats may be a zip archive containing XML, for instance.


But we don't need it here, so why should we care about it too much?


I was just digressing from the main subject:-) Having some graphics in 
the doc would help here and there, though.


I do understand that. I'm trying to explain that "threshold" is in fact 
completely disconnected from min and max, as the transformation scales the 
data to [-1,1] like this


   2.0 * (i - min - mu + 0.5) / (max - min + 1)

and only then the 'threshold' coefficient is applied. And if I read the 
Box-Muller transformation correctly, it generates data with standard Normal 
distribution from [-threshold, threshold] and then transforms them to the 
right mean etc.


Yep, the threshold parameter is designed to be somewhat independent of the 
actual [min, max] range.



But maybe that's what the first sentence is trying to say? I mean this:

   For a Gaussian distribution, the interval is mapped onto a standard
   normal distribution (the classical bell-shaped Gaussian curve)
   truncated at -threshold on the left and +threshold on the right.


Yep, that looks like it.

I'm asking about this because it wasn't immediately clear to me whether I 
need to tweak this for data sets with different scales, but apparently not.


Indeed, this is the idea of how the parameter is used.

After reading the docs again I think that's also clear from the last 
sentence, which relates the threshold to 67% and 95%.


Yep.

Anyway, the references to "standard normal distribution" are a bit sloppy - 
"standard" usually means normal distribution with exactly mu=0 and sigma=1. 
So it's a bit strange to say


   standard normal distribution, with mean mu defined as (max+min)/2.0

because that's not a standard normal distribution at all. I propose to fix 
this by removing the "standard".


Hmmm, probably fine if it is both more precise and shorter!


[...]
 CDF2(x) = PHI(2.0 * threshold * ...) / (2.0 * PHI(threshold) - 1.0)

and then the probability of "i" is

 P(X=i) = CDF2(i+0.5) - CDF2(i-0.5)


I agree that defining the shifted/scaled CDF and using it afterwards looks 
cleaner.


Which is what I meant by simplifying the equation. Not that it'd make it 
easier to imagine the shape, though ...


Sure. This is the part about providing the "precise" information, what is 
the actual probability of drawing i depending on the parameters.


Maybe. Another thing is that "middle quarter" and "middle half" seem a bit 
strange - if you split data into 1/4s there's no middle one (sure, I 
understand what the sentence is meant to say).


Improvements are welcome!


Ok. I think that the fact that it relies on the Box-Muller transform is
relevant, because there are other methods to generate a gaussian
distribution, and I would say that there is no reason to have to go to
the source code to check that. But I would not provide further details.
So I'm fine with the current status.


There are alternative methods for almost every non-trivial piece of code, and 
we generally don't mention that in user docs. Why should we mention it in 
this case? Why would the user care which particular PRNG was used to generate 
the numbers? Maybe there really is a reason for that, I don't know.


If it were security-related: because one has just been announced to be 
broken, and you want to know whether you depend on it.


As a scientist, I like it when fellow scientists who achieved useful 
things have their name cited:-).


--
Fabien.




Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-10-25 Thread Tomas Vondra



On 10/25/2015 10:01 PM, Fabien COELHO wrote:



[...]

So either the information is important and then should be placed in
the docs directly, or it's not and then linking to wikipedia is
pointless because the users are not interested in learning all the
details about each distribution function.


What is important is that these distributions can be used from pgbench.
What is a gaussian or an exponential distribution is *not* important as
such.

For me it is not the point of pg documentation to explain probability
theory, but just to provide *precise* information about what is actually
available, for someone who would be interested, without having to read
the source code. At least that is the idea behind the current
documentation.


OK, fair enough. OTOH many of our users don't have immediate knowledge 
of statistical distributions, so if we could give them additional info 
in a reasonable way, that'd be good.





Firstly, it'd be nice if we could add some figures illustrating the
distributions - much better than explaining the shapes in text. I
don't know if we include figures in the existing docs (probably not),
but generating the figures is rather simple.


There are basically no figures in the documentation. Too bad, but it is
understandable: what should be the format (svg, jpg, png, ...), should
it be generated (gnuplot, others), what is the impact on the
documentation build (html, epub, pdf, ...), how portable should it be,
what about compressed formats vs git diffs?

Once you start asking these questions you understand why there are no
figures:-)


I don't see why diffs would be a problem.


I was not only thinking of mathematical figures, I was also thinking of
graphics; some formats may be a zip archive containing XML, for instance.


But we don't need it here, so why should we care about it too much?




In other words, the general shape of the curve will be exactly the
same no matter the actual min/max (except that for longer intervals
the values will be lower, as there are more possible values).

I don't really see how it's related to this?

   [(max-min)/2 - threshold, (max-min)/2 + threshold]


The gaussian distribution is about reals, but it is used for integers,
so there is a projection on integers from the real values. The function
should compute the probability of drawing a given integer "i" in the
interval, that is given min, max and threshold, what is the probability
of drawing i.


I do understand that. I'm trying to explain that "threshold" is in fact 
completely disconnected from min and max, as the transformation scales 
the data to [-1,1] like this


2.0 * (i - min - mu + 0.5) / (max - min + 1)

and only then the 'threshold' coefficient is applied. And if I read the 
Box-Muller transformation correctly, it generates data with standard 
Normal distribution from [-threshold, threshold] and then transforms 
them to the right mean etc.
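
For readers following along, here is a minimal Python sketch of the scheme 
described above: draw a standard normal value with the Box-Muller transform, 
reject it if it falls outside [-threshold, threshold), then rescale it onto 
the integers min..max. It is an illustration under stated assumptions, not 
pgbench's actual code; the name gaussian_int and its parameters (lo, hi, 
rng) are hypothetical.

```python
import math
import random


def gaussian_int(lo, hi, threshold, rng=random):
    """Draw an integer in [lo, hi] from a normal curve truncated at +/- threshold.

    Hypothetical helper for illustration only; not taken from pgbench.
    """
    while True:
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
        u2 = rng.random()
        # Box-Muller transform: z follows a standard normal distribution.
        z = math.sqrt(-2.0 * math.log(u1)) * math.sin(2.0 * math.pi * u2)
        if -threshold <= z < threshold:
            break  # keep only draws inside the truncation bounds
    # Map [-threshold, threshold) onto [0, 1), independently of lo and hi.
    frac = (z + threshold) / (2.0 * threshold)
    # Scale onto the integer range; the clamp guards against float rounding.
    return min(hi, lo + int((hi - lo + 1) * frac))


if __name__ == "__main__":
    rng = random.Random(0)
    print([gaussian_int(1, 100, 4.0, rng) for _ in range(10)])
```

Because the draw is truncated before it is rescaled, the same threshold 
gives the same shape whatever min and max are. With this kind of rejection 
loop the fraction of accepted draws is 2.0 * PHI(threshold) - 1.0, roughly 
95% at threshold 2.0, which may be what the performance remark about the 
minimum threshold refers to.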


But maybe that's what the first sentence is trying to say? I mean this:

For a Gaussian distribution, the interval is mapped onto a standard
normal distribution (the classical bell-shaped Gaussian curve)
truncated at -threshold on the left and +threshold on the right.

I'm asking about this because it wasn't immediately clear to me whether 
I need to tweak this for data sets with different scales, but apparently 
not. After reading the docs again I think that's also clear from the last 
sentence, which relates the threshold to 67% and 95%.


Anyway, the references to "standard normal distribution" are a bit 
sloppy - "standard" usually means normal distribution with exactly mu=0 
and sigma=1. So it's a bit strange to say


standard normal distribution, with mean mu defined as (max+min)/2.0

because that's not a standard normal distribution at all. I propose to 
fix this by removing the "standard".


[1] as wikipedia notes, Gauss himself used different sigma




Could we simplify the equation a bit? It's needlessly difficult to
realize it's actually just CDF(i+0.5) - CDF(i-0.5). I think it'd be
good to first define the CDF and then just use that.


ISTM that PHI is *the* normal CDF, which is more or less available as
such in various environments (matlab, python, excel...). Well, why not
define the particular CDF and use it. Not sure the text would be that
much lighter, though.


PHI is the CDF of the normal distribution, not the modified
probability distribution here (with threshold and scaled to the
desired interval).


Yep, that is exactly what I was saying, I think.


I think we're talking about slightly different things. Essentially the 
transformation transforms Normal distribution (with PHI as CDF) into 
another statistical distribution (with the thresholds and such), and a 
different CDF, let's say CDF2, which is


  CDF2(x) = PHI(2.0 * threshold * ...) / (2.0 * PHI(threshold) - 1.0)

and then the probability of "i" is

  P(X=i) = CDF2(i+0.5) - CDF2(i-0.5)

Which is what I meant by simplifying the equation. Not that it'd make it 
easier to 

[HACKERS] pgbench gaussian/exponential docs improvements

2015-10-25 Thread Tomas Vondra

Hi,

I've been looking at the checkpoint patches (sorting, flush and FPW 
compensation) and realized we got gaussian/exponential distributions in 
pgbench which is useful for simulating simple non-uniform workloads.


But I think the current docs are a bit too difficult to understand for 
people without deep insight into statistics and shapes of probability 
distributions.


Firstly, it'd be nice if we could add some figures illustrating the 
distributions - much better than explaining the shapes in text. I don't 
know if we include figures in the existing docs (probably not), but 
generating the figures is rather simple.


A few more comments:


By default, or when uniform is specified, all values in the range are
drawn with equal probability. Specifying gaussian or exponential
options modifies this behavior; each requires a mandatory threshold
which determines the precise shape of the distribution.


I find the 'threshold' name to be rather unfortunate, as none of the 
probability distribution functions that I know use this term. And even 
if there's one probability function that uses 'threshold' it has very 
little meaning in the others. For example the exponential distribution 
uses 'rate' (lambda). I'd prefer a neutral name (e.g. 'parameter').



For a Gaussian distribution, the interval is mapped onto a standard
normal distribution (the classical bell-shaped Gaussian curve)
truncated at -threshold on the left and +threshold on the right.


Probably nitpicking, but left/right of what? I assume the normal 
distribution is placed at 0, so it's left/right of zero.



To be precise, if PHI(x) is the cumulative distribution function of
the standard normal distribution, with mean mu defined as (max + min)
/ 2.0, then value i between min and max inclusive is drawn with
probability: (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max -
min + 1)) - PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min +
1))) / (2.0 * PHI(threshold) - 1.0). Intuitively, the larger the
threshold, the more frequently values close to the middle of the
interval are drawn, and the less frequently values close to the min
and max bounds.


Could we simplify the equation a bit? It's needlessly difficult to 
realize it's actually just CDF(i+0.5) - CDF(i-0.5). I think it'd be good 
to first define the CDF and then just use that.
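
To make the suggestion concrete, here is a small Python sketch that defines 
the shifted and scaled function first and then takes the difference at 
i+0.5 and i-0.5. The names phi, f and prob are hypothetical. The sketch 
centers on mu directly, i.e. it uses (x - mu), which coincides with the 
quoted expression when min is 0; that centering is an assumption of this 
sketch, not something taken from the pgbench sources.

```python
import math


def phi(x):
    """CDF of the standard normal distribution (mu = 0, sigma = 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f(x, lo, hi, threshold):
    """Shifted and scaled PHI, normalized by the mass kept after truncation."""
    mu = (lo + hi) / 2.0
    scaled = 2.0 * threshold * (x - mu) / (hi - lo + 1)
    return phi(scaled) / (2.0 * phi(threshold) - 1.0)


def prob(i, lo, hi, threshold):
    """Probability of drawing integer i in [lo, hi]: f(i + 0.5) - f(i - 0.5)."""
    return f(i + 0.5, lo, hi, threshold) - f(i - 0.5, lo, hi, threshold)


if __name__ == "__main__":
    probs = [prob(i, 1, 100, 4.0) for i in range(1, 101)]
    print("sum of probabilities:", sum(probs))
    print("P(50) vs P(1):", probs[49], probs[0])
```

The differences telescope, so summing prob(i) over min..max leaves 
f(max + 0.5) - f(min - 0.5), which is 1 by construction; the values near the 
middle of the interval come out larger than those near the bounds.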



About 67% of values are drawn from the middle 1.0 / threshold and 95%
in the middle 2.0 / threshold; for instance, if threshold is 4.0, 67%
of values are drawn from the middle quarter and 95% from the middle
half of the interval.


This seems broken - too many sentences about the 67% and 95%.


The minimum threshold is 2.0 for performance of the Box-Muller
transform.


Does it make sense to explicitly mention the implementation detail 
(Box-Muller transform) here?



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-10-25 Thread Tomas Vondra



On 10/25/2015 08:11 PM, Fabien COELHO wrote:


Hello Tomas,


I've been looking at the checkpoint patches (sorting, flush and FPW
compensation) and realized we got gaussian/exponential distributions
in pgbench which is useful for simulating simple non-uniform workloads.


Indeed.


But I think the current docs are a bit too difficult to understand for
people without deep insight into statistics and shapes of probability
distributions.


I think the idea is that (1) if you do not know anything about distributions,
probably you do not want expo/gauss, and (2) pg documentation should not
try to be an introductory course in probability theory.

AFAICR I suggested to point to relevant wikipedia pages but this has
been more or less rejected, so it ended up as it is, which is indeed
pretty unconvincing.


I don't think links to wikipedia are all that useful in this context.

Firstly, we have no control over wikipedia pages so we can't point the 
users to particular sections of the page (as we don't know if it gets 
rewritten tomorrow).


So either the information is important and then should be placed in the 
docs directly, or it's not and then linking to wikipedia is pointless 
because the users are not interested in learning all the details about 
each distribution function.



Firstly, it'd be nice if we could add some figures illustrating the
distributions - much better than explaining the shapes in text. I
don't know if we include figures in the existing docs (probably not),
but generating the figures is rather simple.


There are basically no figures in the documentation. Too bad, but it is
understandable: what should be the format (svg, jpg, png, ...), should
it be generated (gnuplot, others), what is the impact on the
documentation build (html, epub, pdf, ...), how portable should it be,
what about compressed formats vs git diffs?

Once you start asking these questions you understand why there are no
figures:-)


I don't see why diffs would be a problem. Include gnuplot source files, 
then build the appropriate format for each output format (eps for pdf, 
png for web, ...).


But yes, it definitely requires some work on the Makefiles.


For a Gaussian distribution, the interval is mapped onto a standard
normal distribution (the classical bell-shaped Gaussian curve)
truncated at -threshold on the left and +threshold on the right.


Probably nitpicking, but left/right of what? I assume the normal
distribution is placed at 0, so it's left/right of zero.


No, it is around the middle of the interval.


You mean [min,max] interval? I believe the transformation

2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)

essentially moves the mean into 0, scales the data to [0,1] and then 
applies the threshold.


In other words, the general shape of the curve will be exactly the same 
no matter the actual min/max (except that for longer intervals the 
values will be lower, as there are more possible values).


I don't really see how it's related to this?

[(max-min)/2 - threshold, (max-min)/2 + threshold]




To be precise, if PHI(x) is the cumulative distribution function of
the standard normal distribution, with mean mu defined as (max + min)
/ 2.0, then value i between min and max inclusive is drawn with
probability: (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max -
min + 1)) - PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min +
1))) / (2.0 * PHI(threshold) - 1.0). Intuitively, the larger the
threshold, the more frequently values close to the middle of the
interval are drawn, and the less frequently values close to the min
and max bounds.


Could we simplify the equation a bit? It's needlessly difficult to
realize it's actually just CDF(i+0.5) - CDF(i-0.5). I think it'd be
good to first define the CDF and then just use that.


ISTM that PHI is *the* normal CDF, which is more or less available as
such in various environments (matlab, python, excel...). Well, why not
define the particular CDF and use it. Not sure the text would be that
much lighter, though.


PHI is the CDF of the normal distribution, not the modified probability 
distribution here (with threshold and scaled to the desired interval).



About 67% of values are drawn from the middle 1.0 / threshold and 95%
in the middle 2.0 / threshold; for instance, if threshold is 4.0, 67%
of values are drawn from the middle quarter and 95% from the middle
half of the interval.


This seems broken - too many sentences about the 67% and 95%.


The point is to provide rules of thumb to describe how the distribution
is shaped. Any better sentence is welcome.


Ah, I misread the sentence initially. I hadn't realized it talks about 
1/threshold in the first part, and that the second part is an example for 
threshold=4.0. So I thought it was a repetition of the first part.
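
The rule of thumb can be checked numerically. The Python sketch below is a 
continuous approximation that ignores the rounding to integers: it computes 
how much of the truncated normal curve lies within the middle "fraction" of 
the interval. The names phi and middle_mass are hypothetical.

```python
import math


def phi(x):
    """CDF of the standard normal distribution (mu = 0, sigma = 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def middle_mass(fraction, threshold):
    """Share of draws landing in the middle `fraction` of the interval.

    The interval maps onto [-threshold, threshold], so its middle
    `fraction` maps onto [-fraction * threshold, fraction * threshold].
    """
    z = fraction * threshold
    return (2.0 * phi(z) - 1.0) / (2.0 * phi(threshold) - 1.0)


if __name__ == "__main__":
    threshold = 4.0
    # Middle 1.0 / threshold (a quarter) and 2.0 / threshold (a half).
    print("middle quarter:", middle_mass(1.0 / threshold, threshold))
    print("middle half:   ", middle_mass(2.0 / threshold, threshold))
```

For threshold 4.0 this comes out at roughly 0.68 for the middle quarter and 
roughly 0.95 for the middle half, in line with the sentence being discussed.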





The minimum threshold is 2.0 for performance of the Box-Muller
transform.


Does it make sense to explicitly mention the implementation detail
(Box-Muller transform) here?


It is 

Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-10-25 Thread Fabien COELHO



[...]

So either the information is important and then should be placed in the 
docs directly, or it's not and then linking to wikipedia is pointless 
because the users are not interested in learning all the details about 
each distribution function.


What is important is that these distributions can be used from pgbench. 
What is a gaussian or an exponential distribution is *not* important as 
such.


For me it is not the point of pg documentation to explain probability 
theory, but just to provide *precise* information about what is actually 
available, for someone who would be interested, without having to read the 
source code. At least that is the idea behind the current documentation.



Firstly, it'd be nice if we could add some figures illustrating the
distributions - much better than explaining the shapes in text. I
don't know if we include figures in the existing docs (probably not),
but generating the figures is rather simple.


There are basically no figures in the documentation. Too bad, but it is
understandable: what should be the format (svg, jpg, png, ...), should
it be generated (gnuplot, others), what is the impact on the
documentation build (html, epub, pdf, ...), how portable should it be,
what about compressed formats vs git diffs?

Once you start asking these questions you understand why there are no
figures:-)


I don't see why diffs would be a problem.


I was not only thinking of mathematical figures, I was also thinking of 
graphics; some formats may be a zip archive containing XML, for instance.



Probably nitpicking, but left/right of what? I assume the normal
distribution is placed at 0, so it's left/right of zero.


No, it is around the middle of the interval.


You mean [min,max] interval?


Yep.


I believe the transformation

   2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)

essentially moves the mean into 0, scales the data to [0,1] and then applies 
the threshold.


Probably:-) I wrote that some time ago, and it is 10 pm for me:-).

In other words, the general shape of the curve will be exactly the same no 
matter the actual min/max (except that for longer intervals the values will 
be lower, as there are more possible values).


I don't really see how it's related to this?

   [(max-min)/2 - threshold, (max-min)/2 + threshold]


The gaussian distribution is about reals, but it is used for integers, so 
there is a projection on integers from the real values. The function 
should compute the probability of drawing a given integer "i" in the 
interval, that is given min, max and threshold, what is the probability of 
drawing i.



Could we simplify the equation a bit? It's needlessly difficult to
realize it's actually just CDF(i+0.5) - CDF(i-0.5). I think it'd be
good to first define the CDF and then just use that.


ISTM that PHI is *the* normal CDF, which is more or less available as
such in various environments (matlab, python, excel...). Well, why not
define the particular CDF and use it. Not sure the text would be that
much lighter, though.


PHI is the CDF of the normal distribution, not the modified probability 
distribution here (with threshold and scaled to the desired interval).


Yep, that is exactly what I was saying, I think.


This seems broken - too many sentences about the 67% and 95%.


The point is to provide rules of thumb to describe how the distribution
is shaped. Any better sentence is welcome.


Ah, I misread the sentence initially. I hadn't realized it talks about 
1/threshold in the first part, and that the second part is an example for 
threshold=4.0. So I thought it was a repetition of the first part.


Maybe it needs spacing, colons and rewording, if it is too hard to parse.


Does it make sense to explicitly mention the implementation detail
(Box-Muller transform) here?


No, my point was exactly the opposite - removing the mention of Box-Muller 
entirely, not adding more details about it.


Ok. I think that the fact that it relies on the Box-Muller transform is 
relevant, because there are other methods to generate a gaussian 
distribution, and I would say that there is no reason to have to go to the 
source code to check that. But I would not provide further details. So I'm 
fine with the current status.


--
Fabien.




Re: [HACKERS] pgbench gaussian/exponential docs improvements

2015-10-25 Thread Fabien COELHO


Hello Tomas,

I've been looking at the checkpoint patches (sorting, flush and FPW 
compensation) and realized we got gaussian/exponential distributions in 
pgbench which is useful for simulating simple non-uniform workloads.


Indeed.

But I think the current docs are a bit too difficult to understand for 
people without deep insight into statistics and shapes of probability 
distributions.


I think the idea is that (1) if you do not know anything about distributions, 
probably you do not want expo/gauss, and (2) pg documentation should not 
try to be an introductory course in probability theory.


AFAICR I suggested to point to relevant wikipedia pages but this has been 
more or less rejected, so it ended up as it is, which is indeed pretty 
unconvincing.


Firstly, it'd be nice if we could add some figures illustrating the 
distributions - much better than explaining the shapes in text. I don't 
know if we include figures in the existing docs (probably not), but 
generating the figures is rather simple.


There are basically no figures in the documentation. Too bad, but it is 
understandable: what should be the format (svg, jpg, png, ...), should it 
be generated (gnuplot, others), what is the impact on the documentation 
build (html, epub, pdf, ...), how portable should it be, what about 
compressed formats vs git diffs?


Once you start asking these questions you understand why there are no 
figures:-)



A few more comments:


By default, or when uniform is specified, all values in the range are
drawn with equal probability. Specifying gaussian or exponential
options modifies this behavior; each requires a mandatory threshold
which determines the precise shape of the distribution.


I find the 'threshold' name to be rather unfortunate, as none of the 
probability distribution functions that I know use this term.


I think that it was proposed for gaussian, not sure why.

And even if there's one probability function that uses 'threshold' it 
has very little meaning in the others. For example the exponential 
distribution uses 'rate' (lambda). I'd prefer a neutral name (e.g. 
'parameter').


Why not. Many places to fix, though (documentation & source code).


For a Gaussian distribution, the interval is mapped onto a standard
normal distribution (the classical bell-shaped Gaussian curve)
truncated at -threshold on the left and +threshold on the right.


Probably nitpicking, but left/right of what? I assume the normal 
distribution is placed at 0, so it's left/right of zero.


No, it is around the middle of the interval.


To be precise, if PHI(x) is the cumulative distribution function of
the standard normal distribution, with mean mu defined as (max + min)
/ 2.0, then value i between min and max inclusive is drawn with
probability: (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max -
min + 1)) - PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min +
1))) / (2.0 * PHI(threshold) - 1.0). Intuitively, the larger the
threshold, the more frequently values close to the middle of the
interval are drawn, and the less frequently values close to the min
and max bounds.


Could we simplify the equation a bit? It's needlessly difficult to realize 
it's actually just CDF(i+0.5) - CDF(i-0.5). I think it'd be good to first 
define the CDF and then just use that.


ISTM that PHI is *the* normal CDF, which is more or less available as such 
in various environments (matlab, python, excel...). Well, why not define 
the particular CDF and use it. Not sure the text would be that much 
lighter, though.



About 67% of values are drawn from the middle 1.0 / threshold and 95%
in the middle 2.0 / threshold; for instance, if threshold is 4.0, 67%
of values are drawn from the middle quarter and 95% from the middle
half of the interval.


This seems broken - too many sentences about the 67% and 95%.


The point is to provide rules of thumb to describe how the distribution is 
shaped. Any better sentence is welcome.



The minimum threshold is 2.0 for performance of the Box-Muller
transform.


Does it make sense to explicitly mention the implementation detail 
(Box-Muller transform) here?


It is too complex, I would avoid it. I would point to the wikipedia page 
if that could be allowed.


https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

--
Fabien.

