Re: [HACKERS] pgbench randomness initialization

Fabien COELHO Thu, 07 Apr 2016 03:27:07 -0700

 (2) runs which really vary from one to the next, so as
     to get an idea of how much they may vary, i.e. how
     stable the performance is.


I don't think this POV makes all that much sense. If you do something
non-comparable, then the results aren't, uh, comparable. Which also
means there's a lower chance to reproduce observed problems.

That also means that you are likely not to hit them if you always do the very same run...

Moreover, the Monte Carlo method requires randomness for its convergence result.
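To make the convergence point concrete, here is a small Python sketch. The latency model, the names `one_run` and `average_of_runs`, and every number in it are illustrative assumptions, not pgbench code. Averaging runs that each get a fresh seed gives an estimate whose error typically shrinks as runs are added, while repeating one seed only reproduces that single run's result, whatever its error happens to be.

```python
import random
import statistics

# Illustrative model only (an assumption, not how pgbench works internally):
# each simulated transaction has an exponentially distributed latency whose
# true mean is TRUE_MEAN milliseconds.
TRUE_MEAN = 1.0


def one_run(seed, n_tx=200):
    """Mean latency of one simulated run whose randomness comes from `seed`."""
    rng = random.Random(seed)
    return statistics.fmean(rng.expovariate(1.0 / TRUE_MEAN) for _ in range(n_tx))


def average_of_runs(seeds):
    """Monte Carlo estimate: average the per-run means, one run per seed."""
    return statistics.fmean(one_run(s) for s in seeds)


if __name__ == "__main__":
    for n_runs in (1, 10, 100):
        varied = average_of_runs(range(n_runs))   # a fresh seed for every run
        fixed = average_of_runs([42] * n_runs)    # the same seed every time
        print(f"{n_runs:3d} runs: "
              f"varied-seed error {abs(varied - TRUE_MEAN):.4f}, "
              f"fixed-seed error {abs(fixed - TRUE_MEAN):.4f}")
```

The fixed-seed error is the same for 1, 10 or 100 runs, because every "run" is identical; only the varied-seed estimate benefits from doing more runs.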

Currently pgbench focusses on (2), which may or may not be fine depending on
what you are doing. From a personal point of view I think that (2) is more
significant to collect performance data, even if the results are more
unstable: that simply reflects reality and its intrinsic variations, so I'm
fine with that as the default.


Uh, and what's the benefit of that variability? pgbench isn't a reality
simulation tool, it's a benchmarking tool. And benchmarks with intrinsic
variability are bad benchmarks.

From a statistical perspective, one run does not mean anything. If you do the exact same run over and over again, then all mathematical results about (slow) convergence towards the average are lost. This is like trying to survey a population by asking the questions to the same person over and over: the result will be biased.
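The statistical argument can be written out in a few lines. This is a minimal formalization under one stated assumption: the results $X_1, \dots, X_n$ of $n$ runs with distinct seeds are independent and identically distributed with mean $\mu$ and variance $\sigma^2$. Repeating the exact same run instead makes all $X_i$ equal to the first one.

```latex
% Independent runs (a fresh seed each time):
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[\bar{X}_n] = \mu, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} \xrightarrow[n\to\infty]{} 0.

% The same run repeated: X_1 = X_2 = \dots = X_n, hence
\bar{X}_n = X_1, \qquad
\operatorname{Var}(\bar{X}_n) = \sigma^2 \quad \text{for every } n .
```

With a fixed seed the error $\bar{X}_n - \mu = X_1 - \mu$ is set once, by the choice of that seed, and never averages away however many runs are done.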

Now when you develop, which is the use case you probably have in mind, you want to compare two pg versions and check for the performance impact, so having the exact same run seems like a proxy to quickly check for that.

However, from a statistical perspective this is just heresy: you may make a change which improves one given run at the expense of all possible others, and you would not know it. Say for instance that there are two different behaviors depending on something; then you will check against only one of them.
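The two-behavior case can be illustrated with a toy simulation. Everything here is a hypothetical model with made-up latencies: a seed-driven coin flip decides whether a run hits a "slow path" (think of some background event happening during the run), and the patch under test speeds up the fast path while slowing down the slow one. The names `run_latency` and `mean_latency` are invented for this sketch.

```python
import random
import statistics

# Hypothetical two-behavior system (made-up numbers, purely illustrative).
LATENCY_MS = {
    # (slow_path, patched) -> latency of the whole run
    (False, False): 5.0,
    (False, True): 4.0,    # the patch helps the fast path...
    (True, False): 10.0,
    (True, True): 12.0,    # ...but hurts the slow path
}


def run_latency(seed, patched):
    """Latency of one simulated run; the seed decides which behavior occurs."""
    slow_path = random.Random(seed).random() < 0.5
    return LATENCY_MS[(slow_path, patched)]


def mean_latency(seeds, patched):
    """Average latency over one run per seed."""
    return statistics.fmean(run_latency(s, patched) for s in seeds)


if __name__ == "__main__":
    # Pick a seed that happens to exercise only the fast path.
    fast_seed = next(s for s in range(1000) if random.Random(s).random() >= 0.5)
    print("fixed seed: before", run_latency(fast_seed, False),
          "after", run_latency(fast_seed, True))
    seeds = range(1000)
    print("many seeds: before", mean_latency(seeds, False),
          "after", mean_latency(seeds, True))
```

Benchmarked with that single fixed seed, the patch looks like a 20% win. Over many seeds the expected latency goes from 7.5 ms to 8.0 ms, i.e. the same patch is a regression on average, which is exactly the situation a single repeated run cannot reveal.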

So I have no mathematical doubt that changing the seed is the right default setting, thus I think that the current behavior is fine. However I'm okay if someone wants to control the randomness for some reason (maybe having "less sure" results, but quickly), so it could be allowed somehow.
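One way to reconcile both positions is sketched below: keep a varying seed as the default, let the user pin it when wanted, and always report the seed actually used so that a run which exposed a problem can be replayed. This is a hypothetical policy written in Python for illustration; `choose_seed` and `make_rng` are invented names and do not describe pgbench's actual options.

```python
import random
import secrets
from typing import Optional, Tuple


def choose_seed(user_seed: Optional[int] = None) -> int:
    """Hypothetical policy: use an explicitly requested seed, else a fresh one."""
    if user_seed is not None:
        return user_seed
    return secrets.randbits(64)  # varying seed remains the default


def make_rng(user_seed: Optional[int] = None) -> Tuple[random.Random, int]:
    seed = choose_seed(user_seed)
    # Always report the seed, so that a run showing an interesting problem
    # can be replayed later by passing the same value back in.
    print(f"random seed: {seed}")
    return random.Random(seed), seed


if __name__ == "__main__":
    rng, seed = make_rng()          # default: a different seed each time
    replay, _ = make_rng(seed)      # pinned: reproduces the same sequence
    print([rng.random() for _ in range(3)] == [replay.random() for _ in range(3)])
```

The design choice is that reproducibility comes from recording the seed rather than from fixing it, so the default runs still sample different behaviors.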


--
Fabien.


--
Sent via pgsql-hackers mailing list
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
