It means that you can't separate performance differences caused by the OS
from those caused by pgbench's ordering.


I'm not objecting to adding an option for this, but I think Fabien is
right that it shouldn't be the default.

Yep.

Andres, attached is a simple POC with an option and an environment variable (although I should rather have been looking at the current checkpointer/vacuum issue, which I have reproduced :-().

While testing it, I observed a funny pattern, something like:

  pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
  1.0: 600 tps
  2.0: 600 tps
  3.0: 600 tps

First rerun just after:

  pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
  1.0: 1800 tps
  2.0: 600 tps
  3.0: 600 tps

The first rerun hits the same pages as the first run, so the 1800 transactions of the first run (3 seconds at 600 tps) are replayed in one second; after that, new pages have to be loaded and the performance goes down.

Second rerun just after:

  pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
  1.0: 1800 tps
  2.0: 1400 tps
  3.0: 600 tps

The second rerun replays the 3000 transactions of the previous run (1800 + 600 + 600) in about 1.7 seconds (3000 / 1800), then goes back to 600 tps for new pages...

After more iterations, the performance is 1800 tps for all 3 seconds.

This clearly illustrates that it should be used with caution.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 06cd5db..1908896 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -678,6 +678,33 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
        </para>
       </listitem>
      </varlistentry>
+
+     <varlistentry>
+      <term><option>--random-seed=</><replaceable>SEED</></term>
+      <listitem>
+       <para>
+        Set the random generator seed.  This random generator is used to
+        initialize the per-thread random generator states.
+        Expected values for <replaceable>SEED</> are: <literal>time</> (the default;
+        the seed is then based on the current time) or any unsigned integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</> functions) or implicitly (for instance, option
+        <option>--rate</> uses random numbers to schedule transactions).
+        The random generator seed may also be provided through the environment
+        variable <literal>PGBENCH_RANDOM_SEED</>.
+       </para>
+       <para>
+        Setting the seed explicitly allows a <command>pgbench</> run to be
+        reproduced exactly, as far as random numbers are concerned.
+        From a statistical viewpoint this is a bad idea, because it can hide the
+        performance variability or improve performance unduly, e.g. by hitting
+        the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance when
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
     </variablelist>
    </para>
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 076fbd3..d6db19f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -434,6 +434,7 @@ usage(void)
 		   "  -U, --username=USERNAME  connect as specified database user\n"
 		 "  -V, --version            output version information, then exit\n"
 		   "  -?, --help               show this help, then exit\n"
+		   "  --random-seed=SEED       set random seed (\"time\", integer)\n"
 		   "\n"
 		   "Report bugs to <pgsql-b...@postgresql.org>.\n",
 		   progname, progname);
@@ -3258,6 +3259,7 @@ main(int argc, char **argv)
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
 		{"progress-timestamp", no_argument, NULL, 6},
+		{"random-seed", required_argument, NULL, 7},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3292,6 +3294,7 @@ main(int argc, char **argv)
 	PGconn	   *con;
 	PGresult   *res;
 	char	   *env;
+	char	   *seed = NULL;
 
 	char		val[64];
 
@@ -3607,6 +3610,9 @@ main(int argc, char **argv)
 				progress_timestamp = true;
 				benchmarking_option_set = true;
 				break;
+			case 7:
+				seed = pg_strdup(optarg);
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -3845,7 +3851,25 @@ main(int argc, char **argv)
 
 	/* set random seed */
 	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+
+	if (seed == NULL)
+		seed = getenv("PGBENCH_RANDOM_SEED");
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+		srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+	else
+	{
+		unsigned int s;
+		if (sscanf(seed, "%u", &s) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s', expecting an unsigned integer\n",
+					seed);
+			exit(1);
+		}
+		fprintf(stderr, "random seed set to '%s'\n", seed);
+		srandom(s);
+	}
 
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)