Hello Andres,

I was wondering why it's a good idea for pgbench to do
        srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
to initialize randomness and then
        for (i = 0; i < nthreads; i++)
        {
                thread->random_state[0] = random();
                thread->random_state[1] = random();
                thread->random_state[2] = random();
        }
to initialize the individual thread random state which is then used by

To me it seems better to instead initialize srandom() with a known value
(say, uh, 0). Or even better don't use random() at all, and fill a
global pg_erand48() with a known state; and use pg_erand48() to
initialize the thread states.

Obviously that doesn't make pgbench entirely reproducible, but it seems
a lot better than now. Individual threads would do work in a
reproducible order.
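
To make the idea concrete, here is a minimal sketch, not pgbench's actual code. It assumes a hypothetical ThreadState struct and init_thread_states() helper, and uses POSIX nrand48() as a stand-in for pg_erand48(), which takes the same three-short state. One global generator starts from a known state and hands out the per-thread states:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* expose the POSIX rand48 family */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for pgbench's per-thread state. */
typedef struct
{
    unsigned short random_state[3];
} ThreadState;

/*
 * Derive every thread's state from a single global generator that
 * starts from a known state, so the same seed always produces the
 * same per-thread states.
 */
static void
init_thread_states(ThreadState *threads, int nthreads, unsigned int seed)
{
    unsigned short global_state[3];
    int         i;
    int         j;

    assert(threads != NULL && nthreads >= 0);

    global_state[0] = (unsigned short) (seed & 0xFFFF);
    global_state[1] = (unsigned short) ((seed >> 16) & 0xFFFF);
    global_state[2] = 0x330E;   /* arbitrary fixed constant */

    for (i = 0; i < nthreads; i++)
    {
        for (j = 0; j < 3; j++)
            threads[i].random_state[j] =
                (unsigned short) (nrand48(global_state) & 0xFFFF);
    }
}
```

Calling init_thread_states(threads, nthreads, 0) twice fills the array with identical states, which is the "known value" behaviour described above. Each thread still gets its own distinct state.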

I see very little reason to have the current behaviour, or at the very
least not by default.

I think that it depends on what you want, which may vary:

 (1) "exactly" reproducible runs, but one run may hit a particular
     steady state not representative of what happens in general.

 (2) runs which really vary from one to the next, so as
     to have an idea about how much it may vary, what is the
     performance stability.

Currently pgbench focusses on (2), which may or may not be fine depending on what you are doing. From a personal point of view, I think that (2) is more significant for collecting performance data, even if the results are less stable: that simply reflects reality and its intrinsic variations, so I'm fine with that as the default.

Now for those interested in (1) for some reason, I would suggest relying on a PGBENCH_RANDOM_SEED environment variable or a --random-seed option, which could be used to get an oxymoronic "deterministic randomness" if desired.
I do not think that it should be the default, though.
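
A rough sketch of how such an opt-in seed could be chosen, under assumptions: choose_seed() is a hypothetical helper, not an existing pgbench function, and the fallback uses time() rather than pgbench's INSTR_TIME_GET_MICROSEC(). A valid user-supplied value fixes the seed; anything else keeps the current clock-based behaviour:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <time.h>

/*
 * Return the seed to use.  "value" is the user-supplied text, or NULL
 * if none was given.  *is_fixed is set to 1 when the seed came from
 * the user and to 0 when we fell back to the clock.
 */
static unsigned int
choose_seed(const char *value, int *is_fixed)
{
    assert(is_fixed != NULL);

    /* Accept only plain decimal digits, so "-1" or "abc" is rejected. */
    if (value != NULL && value[0] >= '0' && value[0] <= '9')
    {
        char       *end;
        unsigned long parsed;

        errno = 0;
        parsed = strtoul(value, &end, 10);
        if (errno == 0 && *end == '\0' && parsed <= UINT_MAX)
        {
            *is_fixed = 1;
            return (unsigned int) parsed;
        }
    }

    /* Default: a seed that varies from one run to the next. */
    *is_fixed = 0;
    return (unsigned int) time(NULL);
}
```

A caller would pass getenv("PGBENCH_RANDOM_SEED"), or the argument of a --random-seed option, and report the chosen seed so that a run can be replayed later.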


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)