Hello Tom,
> BTW, did you look at the question of the range of zipfian?

Yep.

> I confirmed here that as used in the test case, it's generating a range way smaller than the other ones: repeating the insertion snippet 1000x produces stats like this: [...]
>
> I have no idea whether that indicates an actual bug, or just poor choice of parameter in the test's call. But the very small number of distinct outputs is disheartening at least.

The Zipf distribution is highly skewed, somewhat close to an exponential. To make the probabilities decrease more slowly, the parameter must be closer to 1, e.g. 1.05 or so. However, as far as the test is concerned, I do not see this as a significant issue. I was rather planning to submit a documentation improvement that gives more precise hints about how the distribution behaves depending on the parameter, and possibly to reduce the parameter used in the test in passing, but I do not see this as very urgent.
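To illustrate the skew discussed above, here is a minimal Python sketch of the ideal finite Zipf law, where the probability of k is proportional to 1/k**s. It is not pgbench's generator, only the target distribution; the range size n = 1000 and the helper names zipf_pmf and top_mass are assumptions chosen for illustration, not values taken from the TAP test.

```python
def zipf_pmf(n, s):
    """Probabilities of the values 1..n under a finite Zipf law with exponent s."""
    weights = [k ** -s for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_mass(n, s, top):
    """Total probability carried by the `top` most likely values."""
    return sum(zipf_pmf(n, s)[:top])


if __name__ == "__main__":
    n = 1000  # hypothetical range size, NOT the one used in the TAP test
    for s in (2.5, 1.05):
        p1 = zipf_pmf(n, s)[0]
        print(f"s={s}: P(1)={p1:.3f}, mass of top 10 values={top_mass(n, s, 10):.3f}")
```

With s = 2.5 nearly all draws land on the first few values, which would explain a very small number of distinct outputs; with s close to 1 the mass is spread much further into the range.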
Attached are a documentation patch and scripts to check the distribution (here for N = 10 and s = 2.5), the kind of thing I used when checking the initial patch:
sh> psql < zipf_init.sql
sh> pgbench -t 500000 -c 2 -M prepared -f zipf_test.sql -P 1
-- close to 29000 tps on my laptop
sh> psql < zipf_end.sql
┌────┬────────┬────────────────────┬────────────────────────┐
│ i  │  cnt   │       ratio        │        expected        │
├────┼────────┼────────────────────┼────────────────────────┤
│  1 │ 756371 │                  • │                      • │
│  2 │ 133431 │ 5.6686302283577280 │ 5.65685424949238019521 │
│  3 │  48661 │ 2.7420521567579787 │     2.7556759606310754 │
│  4 │  23677 │ 2.0552012501583816 │     2.0528009571186693 │
│  5 │  13534 │ 1.7494458401063987 │     1.7469281074217107 │
│  6 │   8773 │ 1.5426877920893651 │     1.5774409656148784 │
│  7 │   5709 │ 1.5366964442108951 │     1.4701680288054869 │
│  8 │   4247 │ 1.3442429950553332 │     1.3963036312159316 │
│  9 │   3147 │ 1.3495392437241818 │     1.3423980299088363 │
│ 10 │   2450 │ 1.2844897959183673 │     1.3013488313450120 │
└────┴────────┴────────────────────┴────────────────────────┘
sh> psql < zipf_clean.sql

Given these results, I do not think that it is useful to change random_zipfian TAP test parameter from 2.5 to something else.
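For readers without the attached scripts, the "ratio" and "expected" columns above can be re-derived with a few lines of Python. This is only a sketch, not the content of zipf_end.sql: the counts are copied from the table, and the helper names expected_ratio and observed_ratios are made up for this example.

```python
def expected_ratio(k, s):
    """Theoretical ratio P(k) / P(k+1) for a Zipf law with exponent s."""
    return ((k + 1) / k) ** s


def observed_ratios(counts):
    """Ratios cnt(k) / cnt(k+1) between consecutive values, starting at k = 1."""
    return [counts[i] / counts[i + 1] for i in range(len(counts) - 1)]


if __name__ == "__main__":
    s = 2.5
    # Counts copied from the pgbench run reported above (N = 10).
    counts = [756371, 133431, 48661, 23677, 13534, 8773, 5709, 4247, 3147, 2450]
    for k, seen in enumerate(observed_ratios(counts), start=1):
        print(f"{k}/{k + 1}: observed {seen:.4f}, expected {expected_ratio(k, s):.4f}")
```

The observed ratios drift further from the expected ones toward the tail simply because the counts there are small, so sampling noise weighs more.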
-- Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 15ee7c0f2b..10285d655b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1613,6 +1613,14 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
       frequently values to the beginning of the interval are drawn.
       The closer to 0 <replaceable>parameter</replaceable> is,
       the flatter (more uniform) the access distribution.
+      The distribution is such that, assuming the range starts from 1,
+      the ratio of probability of drawing <replaceable>k</replaceable> versus
+      drawing <replaceable>k+1</replaceable> is
+      <literal>((<replaceable>k</replaceable>+1)/<replaceable>k</replaceable>)**<replaceable>parameter</replaceable></literal>.
+      For instance <literal>random_zipfian(1, ..., 2.5)</literal> draws
+      value <literal>1</literal> about <literal>(2/1)**2.5 = 5.66</literal> times more frequently
+      than <literal>2</literal>, which itself is drawn <literal>(3/2)**2.5 = 2.76</literal> times more
+      frequently than <literal>3</literal>, and so on.
      </para>
     </listitem>
    </itemizedlist>
zipf_init.sql
Description: application/sql
zipf_test.sql
Description: application/sql
zipf_end.sql
Description: application/sql
zipf_clean.sql
Description: application/sql