Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

Fabien COELHO Tue, 22 Jan 2019 02:17:13 -0800


Hello Tom,

> BTW, did you look at the question of the range of zipfian?

Yep.

> I confirmed here that as used in the test case, it's generating a range way smaller than the other ones: repeating the insertion snippet 1000x produces stats like this: [...]
> I have no idea whether that indicates an actual bug, or just poor
> choice of parameter in the test's call.  But the very small number
> of distinct outputs is disheartening at least.

The Zipf distribution is highly skewed, somewhat like an exponential. To make the probability decrease more slowly, the parameter must be closer to 1, e.g. 1.05 or so. However, as far as the test is concerned, I do not see this as a significant issue. I was rather planning to submit a documentation improvement giving more precise hints about how the distribution behaves depending on the parameter, and possibly to reduce the parameter used in the test in passing, but I do not see this as very urgent.
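
To give a feel for how strongly the parameter controls the skew, here is a minimal Python sketch. It assumes a bounded Zipf law in which the probability of drawing k is proportional to 1/k**s, which is what the ratio stated in the documentation patch implies. The function names and the range size of 1000 are illustrative assumptions, not taken from pgbench or from the test.

```python
def zipf_probabilities(n, s):
    """Return P(k) for k = 1..n, assuming P(k) is proportional to 1 / k**s."""
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_mass(n, s, head=10):
    """Probability that a single draw lands on one of the first `head` values."""
    return sum(zipf_probabilities(n, s)[:head])


# Hypothetical range size, chosen only for illustration.
N = 1000

for s in (2.5, 1.05):
    print(f"s = {s}: P(value <= 10) = {head_mass(N, s):.3f}")
```

With s = 2.5 nearly all draws fall on the first few values, which fits the small number of distinct outputs reported; with s close to 1 the tail carries much more of the probability.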

Attached are a documentation patch and scripts to check the distribution (here for N = 10 and s = 2.5), the kind of thing I used when checking the initial patch:


  sh> psql < zipf_init.sql
  sh> pgbench -t 500000 -c 2 -M prepared -f zipf_test.sql -P 1
  -- close to 29000 tps on my laptop
  sh> psql < zipf_end.sql
 ┌────┬────────┬────────────────────┬────────────────────────┐
 │ i  │  cnt   │       ratio        │        expected        │
 ├────┼────────┼────────────────────┼────────────────────────┤
 │  1 │ 756371 │                  • │                      • │
 │  2 │ 133431 │ 5.6686302283577280 │ 5.65685424949238019521 │
 │  3 │  48661 │ 2.7420521567579787 │     2.7556759606310754 │
 │  4 │  23677 │ 2.0552012501583816 │     2.0528009571186693 │
 │  5 │  13534 │ 1.7494458401063987 │     1.7469281074217107 │
 │  6 │   8773 │ 1.5426877920893651 │     1.5774409656148784 │
 │  7 │   5709 │ 1.5366964442108951 │     1.4701680288054869 │
 │  8 │   4247 │ 1.3442429950553332 │     1.3963036312159316 │
 │  9 │   3147 │ 1.3495392437241818 │     1.3423980299088363 │
 │ 10 │   2450 │ 1.2844897959183673 │     1.3013488313450120 │
 └────┴────────┴────────────────────┴────────────────────────┘
  sh> psql < zipf_clean.sql

Given these results, I do not think that it is useful to change the random_zipfian TAP test parameter from 2.5 to something else.
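
For readers without the attached scripts, the "ratio" and "expected" columns of the table above can be recomputed from the counts alone. The following Python sketch is not the content of the attached SQL files; it only assumes that the expected ratio between the counts for k and k+1 is ((k+1)/k)**s, as stated in the documentation patch. The helper names are illustrative.

```python
# Observed counts copied from the table above (N = 10, s = 2.5).
counts = [756371, 133431, 48661, 23677, 13534, 8773, 5709, 4247, 3147, 2450]
s = 2.5


def expected_ratio(k, s):
    """Expected ratio P(k) / P(k+1), assuming P(k) is proportional to 1 / k**s."""
    return ((k + 1) / k) ** s


def observed_ratios(counts):
    """Ratio of each count to the next one, i.e. cnt(k) / cnt(k+1)."""
    return [counts[i] / counts[i + 1] for i in range(len(counts) - 1)]


for k, obs in enumerate(observed_ratios(counts), start=1):
    exp = expected_ratio(k, s)
    print(f"{k} vs {k + 1}: observed {obs:.4f}, expected {exp:.4f}")
```

The counts sum to 1,000,000, consistent with 2 clients running 500000 transactions each, and every observed ratio stays within a few percent of the expected one.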


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 15ee7c0f2b..10285d655b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1613,6 +1613,14 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
       frequently values to the beginning of the interval are drawn.
       The closer to 0 <replaceable>parameter</replaceable> is,
       the flatter (more uniform) the access distribution.
+      The distribution is such that, assuming the range starts from 1,
+      the ratio of probability of drawing <replaceable>k</replaceable> versus
+      drawing <replaceable>k+1</replaceable> is
+      <literal>((<replaceable>k</replaceable>+1)/<replaceable>k</replaceable>)**<replaceable>parameter</replaceable></literal>.
+      For instance <literal>random_zipfian(1, ..., 2.5)</literal> draws
+      value <literal>1</literal> about <literal>(2/1)**2.5 = 5.66</literal> times more frequently
+      than <literal>2</literal>, which itself is drawn <literal>(3/2)**2.5 = 2.76</literal> times more
+      frequently than <literal>3</literal>, and so on.
      </para>
     </listitem>
    </itemizedlist>

zipf_init.sql
Description: application/sql

zipf_test.sql
Description: application/sql

zipf_end.sql
Description: application/sql

zipf_clean.sql
Description: application/sql
