Hi Fabien,

13/01/2018 11:16, Fabien COELHO пишет:
>
> Hello Ildar,
>
>>> so that different instances of hash function within one script would
>>> have different seeds. Yes, that is a good idea, I can do that.
>>>
>> Added this feature in attached patch. But on a second thought this could
>> be something that user won't expect. For example, they may want to run
>> pgbench with two scripts:
>> - the first one updates row by key that is a hashed random_zipfian
>> value;
>> - the second one reads row by key generated the same way
>> (that is actually what YCSB workloads A and B do)
>>
>> It feels natural to write something like this:
>> \set rnd random_zipfian(0, 1000000, 0.99)
>> \set key abs(hash(:rnd)) % 1000
>> in both scripts and expect that they both would have the same
>> distribution. But they wouldn't. We could of course describe this
>> implicit behaviour in documentation, but ISTM that shared seed would be
>> more clear.
>
> I think that it depends on the use case, that both can be useful, so
> there should be a way to do either.
>
> With "always different" default seed, distinct distributions are achieved
> with:
>
>    -- DIFF distinct seeds inside and between runs
>    \set i1 abs(hash(:r1)) % 1000
>    \set j1 abs(hash(:r2)) % 1000
>
> and the same distribution can be done with an explicit seed:
>
>    -- DIFF same seed inside and between runs
>    \set i1 abs(hash(:r1), 5432) % 1000
>    \set j1 abs(hash(:r2), 5432) % 1000
>
> The drawback is that the same seed is used between runs in this case,
> which is not desirable. This could be circumvented by adding the
> random seed as an automatic variable and using it, eg:
>
>    -- DIFF same seed inside run, distinct between runs
>    \set i1 abs(hash(:r1), :random_seed + 5432) % 1000
>    \set j1 abs(hash(:r2), :random_seed + 2345) % 1000
>
>
> Now with a shared hash_seed the same distribution is by default:
>
>    -- SHARED same underlying hash_seed inside run, distinct between runs
>    \set i1 abs(hash(:r1)) % 1000
>    \set j1 abs(hash(:r2)) % 1000
>
> However some trick is needed now to get distinct seeds. With
>
>    -- SHARED distinct seed inside run, but same between runs
>    \set i1 abs(hash(:r1, 5432)) % 1000
>    \set j1 abs(hash(:r2, 2345)) % 1000
>
> We are back to the same issue has the previous case because then the
> distribution is the same from one run to the next, which is not
> desirable. I found this workaround trick:
>
>    -- SHARED distinct seeds inside and between runs
>    \set i1 abs(hash(:r1, hash(5432))) % 1000
>    \set j1 abs(hash(:r2, hash(2345))) % 1000
>
> Or with a new :hash_seed or :random_seed automatic variable, we could
> also have:
>
>    -- SHARED distinct seeds inside and between runs
>    \set i1 abs(hash(:r1, :hash_seed + 5432)) % 1000
>    \set j1 abs(hash(:r2, :hash_seed + 2345)) % 1000
>
> It provides controllable distinct seeds between runs but equal one
> between if desired, by reusing the same value to be hashed as a seed.
>
> I also agree with your argument that the user may reasonably expect
> that hash(5432) == hash(5432) inside and between scripts, at least on
> the same run, so would be surprised that it is not the case.
>
> So I've changed my mind, I'm sorry for making you going back and forth
> on the subject. I'm now okay with one shared 64 bit hash seed, with a
> clear documentation about the fact, and an outline of the trick to
> achieve distinct distributions inside a run if desired and why it
> would be desirable to avoid correlations. Also, I think that providing
> the seed as automatic variable (:hash_seed or :hseed or whatever)
> would make some sense as well. Maybe this could be used as a way to
> fix the seed explicitely, eg:
>
>    pgbench -D hash_seed=1234 ...
>
> Would use this value instead of the random generated one. Also, with
> that the default inserted second argument could be simply
> ":hash_seed", which would simplify the executor which would not have
> to do check for an optional second argument.
>
Here is a new version of patch. I've splitted it into two parts. The
first one is almost the same as v4 from [1] with some refactoring. The
second part introduces random_seed variable as you proposed. I didn't do
the executor simplification thing yet because I'm a little concerned
about inventive users, who may want to change random_seed variable in
runtime (which is possible since pgbench doesn't have read only
variables aka constants AFAIK).

[1]
https://www.postgresql.org/message-id/43a8fbbb-32fa-6478-30a9-f64041adf019%40postgrespro.ru

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..c575f19 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1246,6 +1246,27 @@ pgbench <optional> <replaceable>options</replaceable> 
</optional> <replaceable>d
        <entry><literal>5</literal></entry>
       </row>
       <row>
+       <entry><literal><function>hash(<replaceable>a</replaceable> [, 
<replaceable>seed</replaceable> ] )</function></literal></entry>
+       <entry>integer</entry>
+       <entry>alias for <literal>hash_murmur2()</literal></entry>
+       <entry><literal>hash(10, 5432)</literal></entry>
+       <entry><literal>-5817877081768721676</literal></entry>
+      </row>
+      <row>
+       <entry><literal><function>hash_fnv1a(<replaceable>a</replaceable> [, 
<replaceable>seed</replaceable> ] )</function></literal></entry>
+       <entry>integer</entry>
+       <entry><literal>FNV</literal> hash</entry>
+       <entry><literal>hash_fnv1a(10, 5432)</literal></entry>
+       <entry><literal>-7793829335365542153</literal></entry>
+      </row>
+      <row>
+       <entry><literal><function>hash_murmur2(<replaceable>a</replaceable> [, 
<replaceable>seed</replaceable> ] )</function></literal></entry>
+       <entry>integer</entry>
+       <entry><literal>murmur2</literal> hash</entry>
+       <entry><literal>hash_murmur2(10, 5432)</literal></entry>
+       <entry><literal>-5817877081768721676</literal></entry>
+      </row>
+      <row>
        
<entry><literal><function>int(<replaceable>x</replaceable>)</function></literal></entry>
        <entry>integer</entry>
        <entry>cast to int</entry>
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..5e982bb 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE (-1)
+#define PGBENCH_NARGS_CASE             (-2)
+#define PGBENCH_NARGS_HASH             (-3)
+
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
@@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, 
PgBenchExpr *expr)
 /*
  * List of available functions:
  * - fname: function name, "!..." for special internal functions
- * - nargs: number of arguments
- *                     -1 is a special value for least & greatest meaning 
#args >= 1
- *                     -2 is for the "CASE WHEN ..." function, which has #args 
>= 3 and odd
+ * - nargs: number of arguments. Special cases:
+ *                     - PGBENCH_NARGS_VARIABLE is a special value for least & 
greatest
+ *                       meaning #args >= 1;
+ *                     - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." 
function, which
+ *                       has #args >= 3 and odd;
+ *                     - PGBENCH_NARGS_HASH is for hash functions, which have 
one required
+ *                       and one optional argument;
  * - tag: function identifier from PgBenchFunction enum
  */
 static const struct
@@ -259,10 +267,10 @@ static const struct
                "abs", 1, PGBENCH_ABS
        },
        {
-               "least", -1, PGBENCH_LEAST
+               "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST
        },
        {
-               "greatest", -1, PGBENCH_GREATEST
+               "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST
        },
        {
                "debug", 1, PGBENCH_DEBUG
@@ -347,7 +355,25 @@ static const struct
        },
        /* "case when ... then ... else ... end" construction */
        {
-               "!case_end", -2, PGBENCH_CASE
+               "!case_end", PGBENCH_NARGS_CASE, PGBENCH_CASE
+       },
+       {
+               "hash", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_murmur2", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_fnv1a", PGBENCH_NARGS_HASH, PGBENCH_HASH_FNV1A
+       },
+       {
+               "hash", -3, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_murmur2", -3, PGBENCH_HASH_MURMUR2
+       },
+       {
+               "hash_fnv1a", -3, PGBENCH_HASH_FNV1A
        },
        /* keep as last array element */
        {
@@ -423,29 +449,52 @@ elist_length(PgBenchExprList *list)
 static PgBenchExpr *
 make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args)
 {
+       int len = elist_length(args);
+
        PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
 
        Assert(fnumber >= 0);
 
-       if (PGBENCH_FUNCTIONS[fnumber].nargs >= 0 &&
-               PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args))
-               expr_yyerror_more(yyscanner, "unexpected number of arguments",
-                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
-
-       /* check at least one arg for least & greatest */
-       if (PGBENCH_FUNCTIONS[fnumber].nargs == -1 &&
-               elist_length(args) == 0)
-               expr_yyerror_more(yyscanner, "at least one argument expected",
-                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
-       /* special case: case (when ... then ...)+ (else ...)? end */
-       if (PGBENCH_FUNCTIONS[fnumber].nargs == -2)
+       /* validate arguments number including few special cases */
+       switch (PGBENCH_FUNCTIONS[fnumber].nargs)
+       {
+               /* check at least one arg for least & greatest */
+               case PGBENCH_NARGS_VARIABLE:
+                       if (len == 0)
+                               expr_yyerror_more(yyscanner, "at least one 
argument expected",
+                                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
+                       break;
+
+               /* case (when ... then ...)+ (else ...)? end */
+               case PGBENCH_NARGS_CASE:
+                       /* 'else' branch is always present, but could be a 
NULL-constant */
+                       if (len < 3 || len % 2 != 1)
+                               expr_yyerror_more(yyscanner,
+                                                                 "odd and >= 3 
number of arguments expected",
+                                                                 "case control 
structure");
+                       break;
+
+               /* hash functions with optional seed argument */
+               case PGBENCH_NARGS_HASH:
+                       if (len < 1 || len > 2)
+                       expr_yyerror_more(yyscanner, "unexpected number of 
arguments",
+                                                         
PGBENCH_FUNCTIONS[fnumber].fname);
+                       break;
+
+               /* common case: positive arguments number */
+               default:
+                       if (PGBENCH_FUNCTIONS[fnumber].nargs != len)
+                               expr_yyerror_more(yyscanner, "unexpected number 
of arguments",
+                                                                 
PGBENCH_FUNCTIONS[fnumber].fname);
+       }
+       /* special case: hash functions with optional arguments */
+       if (PGBENCH_FUNCTIONS[fnumber].nargs == -3)
        {
                int len = elist_length(args);
 
-               /* 'else' branch is always present, but could be a 
NULL-constant */
-               if (len < 3 || len % 2 != 1)
-                       expr_yyerror_more(yyscanner, "odd and >= 3 number of 
arguments expected",
-                                                         "case control 
structure");
+               if (len < 1 || len > 2)
+                       expr_yyerror_more(yyscanner, "unexpected number of 
arguments",
+                                                         
PGBENCH_FUNCTIONS[fnumber].fname);
        }
 
        expr->etype = ENODE_FUNCTION;
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 31ea6ca..6e0f340 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -61,6 +61,14 @@
 #define ERRCODE_UNDEFINED_TABLE  "42P01"
 
 /*
+ * Hashing constants
+ */
+#define FNV_PRIME 0x100000001b3
+#define FNV_OFFSET_BASIS 0xcbf29ce484222325
+#define MM2_MUL 0xc6a4a7935bd1e995
+#define MM2_ROT 47
+
+/*
  * Multi-platform pthread implementations
  */
 
@@ -184,6 +192,8 @@ char           *dbName;
 char      *logfile_prefix = NULL;
 const char *progname;
 
+int                    hash_seed;
+
 #define WSEP '@'                               /* weight separator */
 
 volatile bool timer_exceeded = false;  /* flag from signal handler */
@@ -439,6 +449,8 @@ static int  num_scripts;            /* number of scripts in 
sql_script[] */
 static int     num_commands = 0;       /* total number of Command structs */
 static int64 total_weight = 0;
 
+static int hash_seed;                  /* default seed used in hash functions 
*/
+
 static int     debug = 0;                      /* debug flag */
 
 /* Builtin test scripts */
@@ -915,6 +927,51 @@ getZipfianRand(TState *thread, int64 min, int64 max, 
double s)
 }
 
 /*
+ * FNV-1a hash function
+ */
+static int64
+getHashFnv1a(int64 val, uint64 seed)
+{
+       int64   result;
+       int             i;
+
+       result = FNV_OFFSET_BASIS ^ seed;
+       for (i = 0; i < 8; ++i)
+       {
+               int32 octet = val & 0xff;
+
+               val = val >> 8;
+               result = result ^ octet;
+               result = result * FNV_PRIME;
+       }
+
+       return result;
+}
+
+/*
+ * Murmur2 hash function
+ */
+static int64
+getHashMurmur2(int64 val, uint64 seed)
+{
+       uint64  result = seed ^ (sizeof(int64) * MM2_MUL);
+       uint64  k = (uint64) val;
+
+       k *= MM2_MUL;
+       k ^= k >> MM2_ROT;
+       k *= MM2_MUL;
+
+       result ^= k;
+       result *= MM2_MUL;
+
+       result ^= result >> MM2_ROT;
+       result *= MM2_MUL;
+       result ^= result >> MM2_ROT;
+
+       return (int64) result;
+}
+
+/*
  * Initialize the given SimpleStats struct to all zeroes
  */
 static void
@@ -2209,6 +2266,34 @@ evalStandardFunc(
                                return true;
                        }
 
+                       /* hashing */
+               case PGBENCH_HASH_FNV1A:
+               case PGBENCH_HASH_MURMUR2:
+                       {
+                               int64   val;
+                               int64   seed;
+                               int64   result;
+
+                               Assert(nargs >= 1);
+
+                               if (!coerceToInt(&vargs[0], &val))
+                                       return false;
+
+                               /* read optional seed value */
+                               if (nargs > 1)
+                               {
+                                       if (!coerceToInt(&vargs[1], &seed))
+                                               return false;
+                               }
+                               else
+                                       seed = hash_seed;
+
+                               result = (func == PGBENCH_HASH_FNV1A) ?
+                                       getHashFnv1a(val, seed) : 
getHashMurmur2(val, seed);
+                               setIntValue(retval, result);
+                               return true;
+                       }
+
                default:
                        /* cannot get here */
                        Assert(0);
@@ -5054,6 +5139,9 @@ main(int argc, char **argv)
        INSTR_TIME_SET_CURRENT(start_time);
        srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
 
+       /* set default seed for hash functions */
+       hash_seed = random();
+
        /* set up thread data structures */
        threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
        nclients_dealt = 0;
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index 0705ccd..6070908 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -97,7 +97,9 @@ typedef enum PgBenchFunction
        PGBENCH_LE,
        PGBENCH_LT,
        PGBENCH_IS,
-       PGBENCH_CASE
+       PGBENCH_CASE,
+       PGBENCH_HASH_FNV1A,
+       PGBENCH_HASH_MURMUR2
 } PgBenchFunction;
 
 typedef struct PgBenchExpr PgBenchExpr;
@@ -136,6 +138,7 @@ struct PgBenchExprList
 };
 
 extern PgBenchExpr *expr_parse_result;
+extern int hash_seed;
 
 extern int     expr_yyparse(yyscan_t yyscanner);
 extern int     expr_yylex(union YYSTYPE *lvalp, yyscan_t yyscanner);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl 
b/src/bin/pgbench/t/001_pgbench_with_server.pl
index a8b2962..eda28ea 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -259,6 +259,10 @@ pgbench(
                qr{command=46.: int 46\b},
                qr{command=47.: boolean true\b},
                qr{command=48.: boolean true\b},
+               qr{command=49.: int -5817877081768721676\b},
+               qr{command=50.: boolean true\b},
+               qr{command=51.: int -7793829335365542153\b},
+               qr{command=52.: int -?\d+\b},
        ],
        'pgbench expressions',
        {   '001_pgbench_expressions' => q{-- integer functions
@@ -327,6 +331,11 @@ pgbench(
 \set n6 debug(:n IS NULL AND NOT :f AND :t)
 -- conditional truth
 \set cs debug(CASE WHEN 1 THEN TRUE END AND CASE WHEN 1.0 THEN TRUE END AND 
CASE WHEN :n THEN NULL ELSE TRUE END)
+-- hash functions
+\set h0 debug(hash(10, 5432))
+\set h1 debug(:h0 = hash_murmur2(10, 5432))
+\set h3 debug(hash_fnv1a(10, 5432))
+\set h4 debug(hash(10))
 -- lazy evaluation
 \set zy 0
 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c575f19..790f075 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -882,6 +882,11 @@ pgbench <optional> <replaceable>options</replaceable> 
</optional> <replaceable>d
        <entry> <literal>client_id</literal> </entry>
        <entry>unique number identifying the client session (starts from 
zero)</entry>
       </row>
+
+      <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random seed shared for all sessions</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -1444,6 +1449,22 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) 
/
    </itemizedlist>
 
   <para>
+    Hash functions accepts an input value and optional seed parameter. In case 
seed isn't provided :random_seed value is used. Hash functions can be used, for 
example, to modify the distribution of random_zipfian or random_exponential 
functions in order to obtain scattered distribution. Thus the following listing 
simulates some possible real world workload typical for social media and 
blogging platforms where few accounts generates excessive load:
+
+<programlisting>
+\set r random_zipfian(0, 100000, 0.99)
+\set k abs(hash(:r)) % 1000
+</programlisting>
+
+    In some cases several distinct distributions are needed which don't 
correlate with each other and this is when implicit seed parameter comes in 
handy:
+
+<programlisting>
+\set k1 abs(hash(:r), :random_seed + 123) % 1000
+\set k2 abs(hash(:r), :random_seed + 321) % 1000
+</programlisting>
+  </para>
+
+  <para>
    As an example, the full definition of the built-in TPC-B-like
    transaction is:
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6e0f340..b6ed829 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -192,8 +192,6 @@ char           *dbName;
 char      *logfile_prefix = NULL;
 const char *progname;
 
-int                    hash_seed;
-
 #define WSEP '@'                               /* weight separator */
 
 volatile bool timer_exceeded = false;  /* flag from signal handler */
@@ -449,8 +447,6 @@ static int  num_scripts;            /* number of scripts in 
sql_script[] */
 static int     num_commands = 0;       /* total number of Command structs */
 static int64 total_weight = 0;
 
-static int hash_seed;                  /* default seed used in hash functions 
*/
-
 static int     debug = 0;                      /* debug flag */
 
 /* Builtin test scripts */
@@ -2286,7 +2282,18 @@ evalStandardFunc(
                                                return false;
                                }
                                else
-                                       seed = hash_seed;
+                               {
+                                       Variable *var;
+
+                                       /* read seed from variable */
+                                       var = lookupVariable(st, "random_seed");
+                                       Assert(var != NULL);
+                                       if (var->value.type == PGBT_NO_VALUE)
+                                               if (!makeVariableValue(var))
+                                                       return false;
+                                       if (!coerceToInt(&var->value, &seed))
+                                               return false;
+                               }
 
                                result = (func == PGBENCH_HASH_FNV1A) ?
                                        getHashFnv1a(val, seed) : 
getHashMurmur2(val, seed);
@@ -5004,6 +5011,10 @@ main(int argc, char **argv)
         */
        main_pid = (int) getpid();
 
+       /* set random seed */
+       INSTR_TIME_SET_CURRENT(start_time);
+       srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+
        if (nclients > 1)
        {
                state = (CState *) pg_realloc(state, sizeof(CState) * nclients);
@@ -5118,6 +5129,16 @@ main(int argc, char **argv)
                }
        }
 
+       /* set default seed for hash functions */
+       if (lookupVariable(&state[0], "random_seed") == NULL)
+       {
+               int64 seed = random();
+
+               for (i = 0; i < nclients; i++)
+                       if (!putVariableInt(&state[i], "startup", 
"random_seed", seed))
+                               exit(1);
+       }
+
        if (!is_no_vacuum)
        {
                fprintf(stderr, "starting vacuum...");
@@ -5135,13 +5156,6 @@ main(int argc, char **argv)
        }
        PQfinish(con);
 
-       /* set random seed */
-       INSTR_TIME_SET_CURRENT(start_time);
-       srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
-       /* set default seed for hash functions */
-       hash_seed = random();
-
        /* set up thread data structures */
        threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
        nclients_dealt = 0;
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index 6070908..6983865 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -138,7 +138,6 @@ struct PgBenchExprList
 };
 
 extern PgBenchExpr *expr_parse_result;
-extern int hash_seed;
 
 extern int     expr_yyparse(yyscan_t yyscanner);
 extern int     expr_yylex(union YYSTYPE *lvalp, yyscan_t yyscanner);

Reply via email to