Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-24 Thread Alvaro Herrera
On 2019-Jan-24, Tom Lane wrote:

> > Also, as 
> > the pseudo-random state is fully controlled, seeded test results are 
> > deterministic so the expected value can be fully checked.
> 
> I found that the "expected value" was different in v11 than HEAD,
> which surprised me.  It looks like the reason is that HEAD sets up
> more/different RandomStates from the same seed than v11 did.  Not
> sure if it's a good thing for this behavior to change across versions.

The rationale behind this was that some internal uses of random numbers
messed up the determinism of user-invoked random functions; 409231919443
commit message says

While at it, use separate random state for thread administratrivia such
as deciding which script to run, how long to delay for throttling, or
whether to log a message when sampling; this not only makes these tasks
independent of each other, but makes the actual thread run
deterministic.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-24 Thread Tom Lane
Fabien COELHO  writes:
>> I had in mind something more like the attached.

> Yep.
> I'm not too happy that it mixes API levels, and about the int/double/int 
> path.
> Attached an updated version which relies on pg_jrand48 instead.

Hm, I'm not sure that's really an improvement, but I pushed it like that
(and the other change along with it).

> Also, as 
> the pseudo-random state is fully controlled, seeded test results are 
> deterministic so the expected value can be fully checked.

I found that the "expected value" was different in v11 than HEAD,
which surprised me.  It looks like the reason is that HEAD sets up
more/different RandomStates from the same seed than v11 did.  Not
sure if it's a good thing for this behavior to change across versions.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO


Hello Tom,


Here is a POC which defines an internal interface for a PRNG, and use it
within pgbench, with several possible implementations which default to
rand48.



I seriously dislike this patch.  pgbench's random support is quite
overengineered already IMO, and this proposes to add a whole batch of
new code and new APIs to fix a very small bug.



My intention is rather to discuss postgres' PRNG, in passing. Full success
on this point:-)


Our immediate problem is to fix a portability failure, which we need to
back-patch into at least one released branch, ergo conservatism is
warranted.


Sure, the patch I sent is definitely not for backpatching, it is for 
discussion.



 I had in mind something more like the attached.


Yep.

I'm not too happy that it mixes API levels, and about the int/double/int 
path.


Attached an updated version which relies on pg_jrand48 instead. Also, as 
the pseudo-random state is fully controlled, seeded test results are 
deterministic so the expected value can be fully checked.


I did a few sanity tests which were all ok.

I think that this version is appropriate for backpatching. I also think 
that it would be appropriate to consider having a better PRNG to replace 
rand48 in a future release.


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b5c75ce1c6..27aac479e3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -185,7 +185,7 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
-/* random seed used when calling srandom() */
+/* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
 /*
@@ -287,6 +287,9 @@ typedef struct RandomState
 	unsigned short xseed[3];
 } RandomState;
 
+/* Various random sequences are initialized from this one. */
+static RandomState base_random_sequence;
+
 /*
  * Connection state machine states.
  */
@@ -598,6 +601,8 @@ static const BuiltinScript builtin_script[] =
 
 
 /* Function prototypes */
+static void initRandomState(RandomState *random_state);
+static int64 getrand(RandomState *random_state, int64 min, int64 max);
 static void setNullValue(PgBenchValue *pv);
 static void setBoolValue(PgBenchValue *pv, bool bval);
 static void setIntValue(PgBenchValue *pv, int64 ival);
@@ -833,16 +838,28 @@ strtodouble(const char *str, bool errorOK, double *dv)
 
 /*
  * Initialize a random state struct.
+ *
+ * We derive the seed from base_random_sequence, which must be set up already.
  */
 static void
 initRandomState(RandomState *random_state)
 {
-	random_state->xseed[0] = random();
-	random_state->xseed[1] = random();
-	random_state->xseed[2] = random();
+	random_state->xseed[0] = (unsigned short)
+		pg_jrand48(base_random_sequence.xseed) & 0x;
+	random_state->xseed[1] = (unsigned short)
+		pg_jrand48(base_random_sequence.xseed) & 0x;
+	random_state->xseed[2] = (unsigned short)
+		pg_jrand48(base_random_sequence.xseed) & 0x;
 }
 
-/* random number generator: uniform distribution from min to max inclusive */
+/*
+ * Random number generator: uniform distribution from min to max inclusive.
+ *
+ * Although the limits are expressed as int64, you can't generate the full
+ * int64 range in one call, because the difference of the limits mustn't
+ * overflow int64.  In practice it's unwise to ask for more than an int32
+ * range, because of the limited precision of pg_erand48().
+ */
 static int64
 getrand(RandomState *random_state, int64 min, int64 max)
 {
@@ -5126,12 +5143,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
-/* call srandom based on some seed. NULL triggers the default behavior. */
+/*
+ * Set up a random seed according to seed parameter (NULL means default),
+ * and initialize base_random_sequence for use in initializing other sequences.
+ */
 static bool
 set_random_seed(const char *seed)
 {
-	/* srandom expects an unsigned int */
-	unsigned int iseed;
+	uint64		iseed;
 
 	if (seed == NULL || strcmp(seed, "time") == 0)
 	{
@@ -5139,7 +5158,7 @@ set_random_seed(const char *seed)
 		instr_time	now;
 
 		INSTR_TIME_SET_CURRENT(now);
-		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+		iseed = (uint64) INSTR_TIME_GET_MICROSEC(now);
 	}
 	else if (strcmp(seed, "rand") == 0)
 	{
@@ -5155,7 +5174,7 @@ set_random_seed(const char *seed)
 		/* parse seed unsigned int value */
 		char		garbage;
 
-		if (sscanf(seed, "%u%c", , ) != 1)
+		if (sscanf(seed, UINT64_FORMAT "%c", , ) != 1)
 		{
 			fprintf(stderr,
 	"unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
@@ -5165,10 +5184,14 @@ set_random_seed(const char *seed)
 	}
 
 	if (seed != NULL)
-		fprintf(stderr, "setting random seed to %u\n", iseed);
-	srandom(iseed);
-	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+		fprintf(stderr, "setting random seed to " UINT64_FORMAT "\n", iseed);
 	random_seed = iseed;
+
+	/* Fill base_random_sequence with 

Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO


It's not demonstrably slower than 2.5 either.  (1.1 is measurably slower; 
probably not by enough for anyone to care, but 1.5 is good enough for me.)


Good if it fails quick enough for you.


Attached a patch with the zipf doc update & the TAP test parameter change.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 15ee7c0f2b..10285d655b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1613,6 +1613,14 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
   frequently values to the beginning of the interval are drawn.
   The closer to 0 parameter is,
   the flatter (more uniform) the access distribution.
+  The distribution is such that, assuming the range starts from 1,
+  the ratio of probability of drawing k versus
+  drawing k+1 is
+  ((k+1)/k)**parameter.
+  For instance random_zipfian(1, ..., 2.5) draws
+  value 1 about (2/1)**2.5 = 5.66 times more frequently
+  than 2, which itself is drawn (3/2)*2.5 = 2.76 times more
+  frequently than 3, and so on.
  
 

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c87748086a..c0cdfbf5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -471,7 +471,7 @@ for my $i (1, 2)
 \set ur random(1000, 1999)
 \set er random_exponential(2000, 2999, 2.0)
 \set gr random_gaussian(3000, 3999, 3.0)
-\set zr random_zipfian(4000, 4999, 2.5)
+\set zr random_zipfian(4000, 4999, 1.5)
 INSERT INTO seeded_random(seed, rand, val) VALUES
   (:random_seed, 'uniform', :ur),
   (:random_seed, 'exponential', :er),


Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO



The first value is taken about 75% of the time for N=1000 and s=2.5, which
means that a non deterministic implementation would succeed about 0.75² ~
56% of the time on that one.


Right, that's about what we've been seeing on OpenBSD.


Also, the drawing procedure is less efficient when the parameter is close
to 1 because it is more likely to loop,


That might be something to fix, but I agree it's a reason not to go
overboard trying to flatten the test case's distribution right now.


Probably you would have to invent a new method to draw a zipfian 
distribution for that, which would be nice.



If you want something more drastic, using 1.5 instead of 2.5 would reduce
the probability of accidentaly passing the test by chance to about 20%, so
it would fail 80% of the time.


I think your math is off;


Argh. Although I confirm my computation, ISTM that with 1.5 the first 
value as 39% chance of getting out so collision on 15% of cases, second 
value 14% so collision on 2%, ... total cumulated probability about 18%.


1.5 works quite well here.  I saw one failure to produce distinct values 
in 20 attempts.


For 3 failure expected, that is possible.

It's not demonstrably slower than 2.5 either.  (1.1 is measurably 
slower; probably not by enough for anyone to care, but 1.5 is good 
enough for me.)


Good if it fails quick enough for you.

--
Fabien.

Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Tom Lane
Fabien COELHO  writes:
>> I'm not following this argument.  The test case is basically useless
>> for its intended purpose with that parameter, because it's highly
>> likely that the failure mode it's supposedly checking for will be
>> masked by the "random" function's tendency to spit out the same
>> value all the time.

> The first value is taken about 75% of the time for N=1000 and s=2.5, which 
> means that a non deterministic implementation would succeed about 0.75² ~ 
> 56% of the time on that one.

Right, that's about what we've been seeing on OpenBSD.

> Also, the drawing procedure is less efficient when the parameter is close 
> to 1 because it is more likely to loop,

That might be something to fix, but I agree it's a reason not to go
overboard trying to flatten the test case's distribution right now.

> If you want something more drastic, using 1.5 instead of 2.5 would reduce 
> the probability of accidentaly passing the test by chance to about 20%, so 
> it would fail 80% of the time.

I think your math is off; 1.5 works quite well here.  I saw one failure
to produce distinct values in 20 attempts.  It's not demonstrably slower
than 2.5 either.  (1.1 is measurably slower; probably not by enough for
anyone to care, but 1.5 is good enough for me.)

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-22 Thread Fabien COELHO


Hello Tom,


BTW, did you look at the question of the range of zipfian?


Yep.

I confirmed here that as used in the test case, it's generating a range way 
smaller than the other ones: repeating the insertion snippet 1000x produces 
stats like this: [...]



I have no idea whether that indicates an actual bug, or just poor
choice of parameter in the test's call.  But the very small number
of distinct outputs is disheartening at least.


Zipf distribution is highly skewed, somehow close to an exponential. To 
reduce the decreasing probability the parameter must be closer to 1, eg 1.05 
or something. However as far as the test is concerned I do not see this as a 
significant issue. I was rather planning to submit a documentation 
improvement to provide more precise hints about how the distribution behaves 
depending on the parameter, and possibly reduce the parameter used in the 
test in passing, but I see this as not very urgent.


Attached a documentation patch and a scripts to check the distribution 
(here for N = 10 & s = 2.5), the kind of thing I used when checking the 
initial patch:


  sh> psql < zipf_init.sql
  sh> pgbench -t 50 -c 2 -M prepared -f zipf_test.sql -P 1
  -- close to 29000 tps on my laptop
  sh> psql < zipf_end.sql
 ┌┬┬┬┐
 │ i  │  cnt   │   ratio│expected│
 ├┼┼┼┤
 │  1 │ 756371 │  • │  • │
 │  2 │ 133431 │ 5.6686302283577280 │ 5.65685424949238019521 │
 │  3 │  48661 │ 2.7420521567579787 │ 2.7556759606310754 │
 │  4 │  23677 │ 2.0552012501583816 │ 2.0528009571186693 │
 │  5 │  13534 │ 1.7494458401063987 │ 1.7469281074217107 │
 │  6 │   8773 │ 1.5426877920893651 │ 1.5774409656148784 │
 │  7 │   5709 │ 1.5366964442108951 │ 1.4701680288054869 │
 │  8 │   4247 │ 1.3442429950553332 │ 1.3963036312159316 │
 │  9 │   3147 │ 1.3495392437241818 │ 1.3423980299088363 │
 │ 10 │   2450 │ 1.2844897959183673 │ 1.3013488313450120 │
 └┴┴┴┘
  sh> psql < zipf_clean.sql

Given these results, I do not think that it is useful to change 
random_zipfian TAP test parameter from 2.5 to something else.


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 15ee7c0f2b..10285d655b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1613,6 +1613,14 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
   frequently values to the beginning of the interval are drawn.
   The closer to 0 parameter is,
   the flatter (more uniform) the access distribution.
+  The distribution is such that, assuming the range starts from 1,
+  the ratio of probability of drawing k versus
+  drawing k+1 is
+  ((k+1)/k)**parameter.
+  For instance random_zipfian(1, ..., 2.5) draws
+  value 1 about (2/1)**2.5 = 5.66 times more frequently
+  than 2, which itself is drawn (3/2)*2.5 = 2.76 times more
+  frequently than 3, and so on.
  
 



zipf_init.sql
Description: application/sql


zipf_test.sql
Description: application/sql


zipf_end.sql
Description: application/sql


zipf_clean.sql
Description: application/sql


Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-20 Thread Fabien COELHO



Hello Tom,


Here is a POC which defines an internal interface for a PRNG, and use it
within pgbench, with several possible implementations which default to
rand48.


I seriously dislike this patch.  pgbench's random support is quite
overengineered already IMO, and this proposes to add a whole batch of
new code and new APIs to fix a very small bug.


My intention is rather to discuss postgres' PRNG, in passing. Full success 
on this point:-)



I must admit that I have a grudge against standard rand48:


I think this is nonsense, particularly the claim that anything in PG
cares about the lowest-order bits of random doubles.  I'm aware that
there are applications where that does matter, but people aren't doing
high-precision weather simulations in pgbench.


Sure. My point is not that it is an actual issue for pgbench, but as the 
same PRNG is used more or less everywhere in postgres, I think that it 
should be a good one rather than a known bad one.


Eg, about double:

  \set i debug(random(1, POWER(2,49)) % 2)

Always return 1 because of the 48 bit precision, i.e. the output is never 
even.


  \set i debug(random(1, POWER(2,48)) % 2)

Return 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ... because it is a LCG.

  \set i debug(random(1, POWER(2,48)) % 4)

Cycles over (3 2 1 0)*

  \set i debug(random(1, power(2, 47)) % 4)

Cycles over (0 0 1 1 2 2 3 3)*, and so on.


BTW, did you look at the question of the range of zipfian?


Yep.

I confirmed here that as used in the test case, it's generating a range 
way smaller than the other ones: repeating the insertion snippet 1000x 
produces stats like this: [...]



I have no idea whether that indicates an actual bug, or just poor
choice of parameter in the test's call.  But the very small number
of distinct outputs is disheartening at least.


Zipf distribution is highly skewed, somehow close to an exponential. To 
reduce the decreasing probability the parameter must be closer to 1, eg 
1.05 or something. However as far as the test is concerned I do not see 
this as a significant issue. I was rather planning to submit a 
documentation improvement to provide more precise hints about how the 
distribution behaves depending on the parameter, and possibly reduce the 
parameter used in the test in passing, but I see this as not very urgent.


--
Fabien.



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-20 Thread Tom Lane
Fabien COELHO  writes:
>>> I'd suggest that maybe we should get rid of the use of both random()
>>> and srandom() in pgbench, and go over to letting set_random_seed()
>>> fill the pg_erand48 state directly.

> Here is a POC which defines an internal interface for a PRNG, and use it 
> within pgbench, with several possible implementations which default to 
> rand48.

I seriously dislike this patch.  pgbench's random support is quite
overengineered already IMO, and this proposes to add a whole batch of
new code and new APIs to fix a very small bug.

> I must admit that I have a grudge against standard rand48:

I think this is nonsense, particularly the claim that anything in PG
cares about the lowest-order bits of random doubles.  I'm aware that
there are applications where that does matter, but people aren't doing
high-precision weather simulations in pgbench.

BTW, did you look at the question of the range of zipfian?  I confirmed
here that as used in the test case, it's generating a range way smaller
than the other ones: repeating the insertion snippet 1000x produces stats
like this:

regression=# select seed,rand,min(val),max(val),count(distinct val) from 
seeded_random group by 1,2 order by 2,1;
seed|rand | min  | max  | count 
+-+--+--+---
 1957482663 | exponential | 2000 | 2993 |   586
 1958556409 | exponential | 2000 | 2995 |   569
 1959867462 | exponential | 2000 | 2997 |   569
 1957482663 | gaussian| 3009 | 3997 |   493
 1958556409 | gaussian| 3027 | 3956 |   501
 1959867462 | gaussian| 3018 | 3960 |   511
 1957482663 | uniform | 1001 | 1999 |   625
 1958556409 | uniform | 1000 | 1999 |   642
 1959867462 | uniform | 1001 | 1999 |   630
 1957482663 | zipfian | 4000 | 4081 |19
 1958556409 | zipfian | 4000 | 4022 |18
 1959867462 | zipfian | 4000 | 4156 |23

I have no idea whether that indicates an actual bug, or just poor
choice of parameter in the test's call.  But the very small number
of distinct outputs is disheartening at least.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-20 Thread Fabien COELHO


Hello Tom,


Maybe on OpenBSD pg should switch srandom to srandom_deterministic?


Dunno.  I'm fairly annoyed by their idea that they're smarter than POSIX.


Hmmm. I'm afraid that is not that hard.


However, for most of our uses of srandom, this behavior isn't awful;
it's only pgbench that has an expectation that the platform random()
can be made to behave deterministically.  And TBH I think that's just
an expectation that's going to bite us.

I'd suggest that maybe we should get rid of the use of both random()
and srandom() in pgbench, and go over to letting set_random_seed()
fill the pg_erand48 state directly.  In the integer-seed case you
could use something equivalent to pg_srand48.  (In the other cases
probably you could do better, certainly the strong-random case could
just fill all 6 bytes directly.)  That would get us to a place where
the behavior of --random-seed=N is not only deterministic but
platform-independent, which seems like an improvement.


That's a point. Althought I'm not found of round48, indeed having something 
platform independent for testing makes definite sense.


I'll look into it.


Here is a POC which defines an internal interface for a PRNG, and use it 
within pgbench, with several possible implementations which default to 
rand48.


I must admit that I have a grudge against standard rand48:

 - it is a known poor PRNG which was designed at a time when LCG where
   basically the only low cost PRNG available. Newer designs were very
   recent when the standard was set.
 - it is a LCG, i.e. its low bits cycle quickly, so should not be used.
 - so the 48 bit state size is relevant for generating 32 bits ints
   and floats.
 - however it eis used to generate more bits...
 - the double function uses all 48 bits, whereas 52 need to be filled...
 - and it is used to generate integers, which means that for large range
   some values are inaccessible.
 - 3 * 16 bits integers state looks silly on 32/64 bit architectures.
 - ...

Given that postgres needs doubles (52 bits mantissa) and possibly 64 bits 
integers, IMO the internal state should be 64 bits as a bare minimum, 
which anyway is also the minimal bite on 64 bit architectures, which is 
what is encoutered in practice.


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b5c75ce1c6..06f1083d5a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -185,7 +185,7 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
-/* random seed used when calling srandom() */
+/* random seed used to initialize the PRNG */
 int64		random_seed = -1;
 
 /*
@@ -279,14 +279,6 @@ typedef struct StatsData
 	SimpleStats lag;
 } StatsData;
 
-/*
- * Struct to keep random state.
- */
-typedef struct RandomState
-{
-	unsigned short xseed[3];
-} RandomState;
-
 /*
  * Connection state machine states.
  */
@@ -383,7 +375,7 @@ typedef struct
 	 * Separate randomness for each client. This is used for random functions
 	 * PGBENCH_RANDOM_* during the execution of the script.
 	 */
-	RandomState cs_func_rs;
+	pg_prng_state cs_func_rs;
 
 	int			use_file;		/* index in sql_script for this client */
 	int			command;		/* command number in script */
@@ -450,9 +442,9 @@ typedef struct
 	 * random state to make all of them independent of each other and therefore
 	 * deterministic at the thread level.
 	 */
-	RandomState ts_choose_rs;	/* random state for selecting a script */
-	RandomState ts_throttle_rs;	/* random state for transaction throttling */
-	RandomState ts_sample_rs;	/* random state for log sampling */
+	pg_prng_state ts_choose_rs;	/* random state for selecting a script */
+	pg_prng_state ts_throttle_rs;	/* random state for transaction throttling */
+	pg_prng_state ts_sample_rs;	/* random state for log sampling */
 
 	int64		throttle_trigger;	/* previous/next throttling (us) */
 	FILE	   *logfile;		/* where to log, or NULL */
@@ -832,30 +824,34 @@ strtodouble(const char *str, bool errorOK, double *dv)
 }
 
 /*
- * Initialize a random state struct.
+ * Initialize a pseudo-random state struct from main PRNG.
+ * This done at the beginning of process allows to have deterministic
+ * pgbench runs.
  */
 static void
-initRandomState(RandomState *random_state)
+pg_prng_state_init(pg_prng_state *rs)
 {
-	random_state->xseed[0] = random();
-	random_state->xseed[1] = random();
-	random_state->xseed[2] = random();
+	pg_prng_setstate_r(rs, pg_prng_u64());
 }
 
 /* random number generator: uniform distribution from min to max inclusive */
 static int64
-getrand(RandomState *random_state, int64 min, int64 max)
+getrand(pg_prng_state *random_state, int64 min, int64 max)
 {
 	/*
 	 * Odd coding is so that min and max have approximately the same chance of
 	 * being selected as do numbers between them.
 	 *
-	 * pg_erand48() is thread-safe and concurrent, which is why we use it
+	 * pg_prng_double_r() is thread-safe and concurrent, which is why we use it
 	 * rather than 

Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-18 Thread Fabien COELHO




Maybe on OpenBSD pg should switch srandom to srandom_deterministic?


Dunno.  I'm fairly annoyed by their idea that they're smarter than POSIX.
However, for most of our uses of srandom, this behavior isn't awful;
it's only pgbench that has an expectation that the platform random()
can be made to behave deterministically.  And TBH I think that's just
an expectation that's going to bite us.

I'd suggest that maybe we should get rid of the use of both random()
and srandom() in pgbench, and go over to letting set_random_seed()
fill the pg_erand48 state directly.  In the integer-seed case you
could use something equivalent to pg_srand48.  (In the other cases
probably you could do better, certainly the strong-random case could
just fill all 6 bytes directly.)  That would get us to a place where
the behavior of --random-seed=N is not only deterministic but
platform-independent, which seems like an improvement.


That's a point. Althought I'm not found of round48, indeed having 
something platform independent for testing makes definite sense.


I'll look into it.

--
Fabien.



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-18 Thread Tom Lane
Fabien COELHO  writes:
>> all is explained here:
>> https://man.openbsd.org/srandom
>> Or at least most is explained there.

> Yep. They try to be more serious than other systems about PRNG, which is 
> not bad in itself.

> Maybe on OpenBSD pg should switch srandom to srandom_deterministic?

Dunno.  I'm fairly annoyed by their idea that they're smarter than POSIX.
However, for most of our uses of srandom, this behavior isn't awful;
it's only pgbench that has an expectation that the platform random()
can be made to behave deterministically.  And TBH I think that's just
an expectation that's going to bite us.

I'd suggest that maybe we should get rid of the use of both random()
and srandom() in pgbench, and go over to letting set_random_seed()
fill the pg_erand48 state directly.  In the integer-seed case you
could use something equivalent to pg_srand48.  (In the other cases
probably you could do better, certainly the strong-random case could
just fill all 6 bytes directly.)  That would get us to a place where
the behavior of --random-seed=N is not only deterministic but
platform-independent, which seems like an improvement.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-18 Thread Fabien COELHO




BTW, if you're wondering why curculio is still failing the pgbench
test,


Hmmm, that is interesting! It shows that at least some TAP tests are 
useful.



all is explained here:

https://man.openbsd.org/srandom

Or at least most is explained there.


Yep. They try to be more serious than other systems about PRNG, which is 
not bad in itself.


While curculio is unsurprisingly failing all four seeded_random tests, 
when I try it locally on an OpenBSD 6.4 installation, only the uniform, 
exponential, and gaussian cases reliably "fail".  zipfian usually 
doesn't.


It looks like the zipfian code almost always produces 4000 regardless of 
the seed value, though occasionally it produces 4001.  Bad parameters 
for that algorithm, perhaps?


Welcome to the zipfian highly skewed distribution! I'll check the 
parameters used in the test, maybe it should use something less extreme.


srandom is only used for initializing the state of various internal rand48 
LCG PRNG for pgbench.


Maybe on OpenBSD pg should switch srandom to srandom_deterministic?

--
Fabien.



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-18 Thread Fabien COELHO




I'd rather keep it by simply adding the "|unknown" alternative. 30 years
of programming have taught me that testing limit & error cases is useful,
although you never know when it will be proven so.


Sorry, I don't buy this line of argument.


Reasonable test design requires making cost/benefit tradeoffs: the cost 
to run the test over and over, and the cost to maintain the test itself 
(e.g. fix portability issues in it) have to be balanced against the 
probability of it finding something useful.  I judge that the chance of 
this particular test finding something is small, and I've had quite 
enough of the maintenance costs.


Just to point up that we're still not clearly done with the maintenance
costs of supposing that we know how every version of getopt_long will
spell this error message, I note that my Linux box seems to have two
variants of it:

$ pgbench -z
pgbench: invalid option -- 'z'
Try "pgbench --help" for more information.
$ pgbench --z
pgbench: unrecognized option '--z'
Try "pgbench --help" for more information.

of which the "invalid" alternative is also not in our list right now.
Who's to say how many more versions of getopt_long are out there,
or what the maintainers thereof might do in the future?


ISTM that the getopt implementers imagination should run out in the end:-) 
invalid, unknown, unrecognized, unexpected, incorrect... Ok English has 
many words:-)



I agree that some tests can be useless, but I do not think that it applies
to this one. This test also checks that under a bad option pgbench stops
with an appropriate 1 exit status.


It's possible that it's worth the trouble to check for exit status 1,
but I entirely fail to see the point of checking exactly what is the
spelling of a message that is issued by code not under our control.

Looking closer at the test case:

   [
   'bad option',
   '-h home -p 5432 -U calvin -d --bad-option',
   [ qr{(unrecognized|illegal) option}, qr{--help.*more information} ]
   ],

ISTM that just removing the first qr{} pattern, and checking only that
we get the text that *is* under our control, is a reasonable compromise
here.


Possibly. I'd be a little happier if it checks for a non-empty error 
message, eg qr{...} or qr{option} (the message should say something about 
the option).


--
Fabien.



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
BTW, if you're wondering why curculio is still failing the pgbench
test, all is explained here:

https://man.openbsd.org/srandom

Or at least most is explained there.  While curculio is unsurprisingly
failing all four seeded_random tests, when I try it locally on an
OpenBSD 6.4 installation, only the uniform, exponential, and gaussian
cases reliably "fail".  zipfian usually doesn't.  It looks like the
zipfian code almost always produces 4000 regardless of the seed value,
though occasionally it produces 4001.  Bad parameters for that
algorithm, perhaps?

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Michael Paquier
On Thu, Jan 17, 2019 at 07:21:08PM -0500, Tom Lane wrote:
> Sorry, I don't buy this line of argument.  Reasonable test design requires
> making cost/benefit tradeoffs: the cost to run the test over and over,
> and the cost to maintain the test itself (e.g. fix portability issues in
> it) have to be balanced against the probability of it finding something
> useful.  I judge that the chance of this particular test finding something
> is small, and I've had quite enough of the maintenance costs.

Yes, I agree with Tom's line of thoughts here.  It seems to me that
just dropping this part of the test is just but fine.
--
Michael


signature.asc
Description: PGP signature


Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
Fabien COELHO  writes:
>> I am, TBH, inclined to fix this by removing that test case rather
>> than teaching it another spelling to accept.  I think it's very
>> hard to make the case that tests like this one are anything but
>> a waste of developer and buildfarm time.  When they are also a
>> portability hazard, it's time to cut our losses.  (I also note
>> that this test has caused us problems before, cf 869aa40a2 and
>> 933851033.)

> I'd rather keep it by simply adding the "|unknown" alternative. 30 years 
> of programming have taught me that testing limit & error cases is useful, 
> although you never know when it will be proven so.

Sorry, I don't buy this line of argument.  Reasonable test design requires
making cost/benefit tradeoffs: the cost to run the test over and over,
and the cost to maintain the test itself (e.g. fix portability issues in
it) have to be balanced against the probability of it finding something
useful.  I judge that the chance of this particular test finding something
is small, and I've had quite enough of the maintenance costs.

Just to point up that we're still not clearly done with the maintenance
costs of supposing that we know how every version of getopt_long will
spell this error message, I note that my Linux box seems to have two
variants of it:

$ pgbench -z 
pgbench: invalid option -- 'z'
Try "pgbench --help" for more information.
$ pgbench --z
pgbench: unrecognized option '--z'
Try "pgbench --help" for more information.

of which the "invalid" alternative is also not in our list right now.
Who's to say how many more versions of getopt_long are out there,
or what the maintainers thereof might do in the future?

> I agree that some tests can be useless, but I do not think that it applies 
> to this one. This test also checks that under a bad option pgbench stops 
> with an appropriate 1 exit status.

It's possible that it's worth the trouble to check for exit status 1,
but I entirely fail to see the point of checking exactly what is the
spelling of a message that is issued by code not under our control.

Looking closer at the test case:

[
'bad option',
'-h home -p 5432 -U calvin -d --bad-option',
[ qr{(unrecognized|illegal) option}, qr{--help.*more information} ]
],

ISTM that just removing the first qr{} pattern, and checking only that
we get the text that *is* under our control, is a reasonable compromise
here.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-18 00:31, Tom Lane wrote:

=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:

And now also the NetBSD machine failed on pgbenchCheck.


Indeed, as expected.


Ok.



should I leave it as it is for now?


Please.  I'll push a fix for the broken test case in a bit --- I
just wanted to confirm that somebody else's machines agreed that
it's broken.


Ok, I will leave it on then.

/Mikael




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:
> And now also the NetBSD machine failed on pgbenchCheck.

Indeed, as expected.

> should I leave it as it is for now?

Please.  I'll push a fix for the broken test case in a bit --- I
just wanted to confirm that somebody else's machines agreed that
it's broken.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:
>> But it looks like in NetBSD the options are called:

Sorry about that, I copied-and-pasted from the openbsd machine I was
looking at without remembering that netbsd is just a shade different.

> but the OpenBSD machine went further and now fails on:
> pgbenchCheck instead.
> Is that the failure you expected to get?

Yup, sure is.  Thanks!

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-18 00:00, Mikael Kjellström wrote:


I just started another run on sidewinder (NetBSD 7), let's see how that 
goes.


but the OpenBSD machine went further and now fails on:

pgbenchCheck instead.

Is that the failure you expected to get?


And now also the NetBSD machine failed on pgbenchCheck.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2019-01-17%2022%3A57%3A14

should I leave it as it is for now?

/Mikael



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström




On 2019-01-17 23:54, Mikael Kjellström wrote:


But it looks like in NetBSD the options are called:

netbsd7-pgbf# sysctl -a | grep semmn
kern.ipc.semmni = 10
kern.ipc.semmns = 60
kern.ipc.semmnu = 30

so I will try and set that in /etc/sysctl.conf and reboot and see what 
happens.


That seems to have done the trick:

netbsd7-pgbf# sysctl -a | grep semmn
kern.ipc.semmni = 100
kern.ipc.semmns = 2000
kern.ipc.semmnu = 30

I just started another run on sidewinder (NetBSD 7), let's see how that 
goes.


but the OpenBSD machine went further and now fails on:

pgbenchCheck instead.

Is that the failure you expected to get?

/Mikael



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 23:37, Mikael Kjellström wrote:


On 2019-01-17 23:23, Tom Lane wrote:


Yeah, you might've been able to get by with OpenBSD/NetBSD's default
semaphore settings before, but they really only let one postmaster
run at a time; and the TAP tests want to start more than one.
For me it seems to work to append this to /etc/sysctl.conf:

kern.seminfo.semmni=100
kern.seminfo.semmns=2000

and either reboot, or install those settings manually with sysctl.


Looks that way.

I've increased the values and rebooted the machines.

Let's hope 5th time is the charm :-)


Nope!

But it looks like in NetBSD the options are called:

netbsd7-pgbf# sysctl -a | grep semmn
kern.ipc.semmni = 10
kern.ipc.semmns = 60
kern.ipc.semmnu = 30

so I will try and set that in /etc/sysctl.conf and reboot and see what 
happens.


/Mikael




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 23:23, Tom Lane wrote:


Yeah, you might've been able to get by with OpenBSD/NetBSD's default
semaphore settings before, but they really only let one postmaster
run at a time; and the TAP tests want to start more than one.
For me it seems to work to append this to /etc/sysctl.conf:

kern.seminfo.semmni=100
kern.seminfo.semmns=2000

and either reboot, or install those settings manually with sysctl.


Looks that way.

I've increased the values and rebooted the machines.

Let's hope 5th time is the charm :-)

/Mikael



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:
>> Let's see if it works better this time.

> Hmmm, nope:

> 2019-01-17 23:09:20.343 CET [9129] FATAL:  could not create semaphores: 
> No space left on device

Yeah, you might've been able to get by with OpenBSD/NetBSD's default
semaphore settings before, but they really only let one postmaster
run at a time; and the TAP tests want to start more than one.
For me it seems to work to append this to /etc/sysctl.conf:

kern.seminfo.semmni=100
kern.seminfo.semmns=2000

and either reboot, or install those settings manually with sysctl.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström

On 2019-01-17 22:47, Mikael Kjellström wrote:



On 2019-01-17 22:42, Tom Lane wrote:


=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:

It says:
configure: error: Additional Perl modules are required to run TAP tests
so how do I find out with Perl modules that are required?


If you look into the configure log it should say just above that,
but I'm betting you just need p5-IPC-Run.


Yes it seems to be IPC::Run that is missing.

I've installed it manually through CPAN.

Let's see if it works better this time.


Hmmm, nope:

== 
pgsql.build/src/bin/pg_ctl/tmp_check/log/003_promote_standby.log 
===
2019-01-17 23:09:20.343 CET [9129] LOG:  listening on Unix socket 
"/tmp/g66P1fpMFK/.s.PGSQL.64980"
2019-01-17 23:09:20.343 CET [9129] FATAL:  could not create semaphores: 
No space left on device
2019-01-17 23:09:20.343 CET [9129] DETAIL:  Failed system call was 
semget(64980002, 17, 03600).
2019-01-17 23:09:20.343 CET [9129] HINT:  This error does *not* mean 
that you have run out of disk space.  It occurs when either the system 
limit for the maximum number of semaphore sets (SEMMNI), or the system 
wide maximum number of semaphores (SEMMNS), would be exceeded.  You need 
to raise the respective kernel parameter.  Alternatively, reduce 
PostgreSQL's consumption of semaphores by reducing its max_connections 
parameter.
	The PostgreSQL documentation contains more information about 
configuring your system for PostgreSQL.

2019-01-17 23:09:20.345 CET [9129] LOG:  database system is shut down

will try and increase SEMMNI and see if that helps.

/Mikael




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström




On 2019-01-17 22:42, Tom Lane wrote:


=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:

It says:
configure: error: Additional Perl modules are required to run TAP tests
so how do I find out with Perl modules that are required?


If you look into the configure log it should say just above that,
but I'm betting you just need p5-IPC-Run.


Yes it seems to be IPC::Run that is missing.

I've installed it manually through CPAN.

Let's see if it works better this time.

/Mikael




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:
> It says:
> configure: error: Additional Perl modules are required to run TAP tests
> so how do I find out with Perl modules that are required?

If you look into the configure log it should say just above that,
but I'm betting you just need p5-IPC-Run.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 22:19, Mikael Kjellström wrote:


On 2019-01-17 22:16, Tom Lane wrote:


For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio)
and NetBSD 7 (sidewinder) animals now.


Oh, thanks!  I'm guessing they'll fail their next runs, but I'll
wait to see confirmation of that before I do anything about the
test bug.


They should run the next time within the hour or hour and a half so I 
guess we will find out soon enough.


Hm, that didn't go so well.

It says:

configure: error: Additional Perl modules are required to run TAP tests

so how do I find out with Perl modules that are required?

/Mikael



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström



On 2019-01-17 22:16, Tom Lane wrote:


For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio)
and NetBSD 7 (sidewinder) animals now.


Oh, thanks!  I'm guessing they'll fail their next runs, but I'll
wait to see confirmation of that before I do anything about the
test bug.


They should run the next time within the hour or hour and a half so I 
guess we will find out soon enough.


/Mikael




Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Tom Lane
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:
> On 2019-01-17 06:04, Tom Lane wrote:
>> Although we've got a few NetBSD and OpenBSD buildfarm critters,
>> none of them are doing --enable-tap-tests.  If they were, we'd
>> have noticed the pgbench regression tests falling over:

> For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) 
> and NetBSD 7 (sidewinder) animals now.

Oh, thanks!  I'm guessing they'll fail their next runs, but I'll
wait to see confirmation of that before I do anything about the
test bug.

regards, tom lane



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Mikael Kjellström

On 2019-01-17 06:04, Tom Lane wrote:

Although we've got a few NetBSD and OpenBSD buildfarm critters,
none of them are doing --enable-tap-tests.  If they were, we'd
have noticed the pgbench regression tests falling over:


For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) 
and NetBSD 7 (sidewinder) animals now.


/Mikael



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-17 Thread Fabien COELHO



Hello Tom,


Although we've got a few NetBSD and OpenBSD buildfarm critters,
none of them are doing --enable-tap-tests.  If they were, we'd
have noticed the pgbench regression tests falling over:

[...]

I am, TBH, inclined to fix this by removing that test case rather
than teaching it another spelling to accept.  I think it's very
hard to make the case that tests like this one are anything but
a waste of developer and buildfarm time.  When they are also a
portability hazard, it's time to cut our losses.  (I also note
that this test has caused us problems before, cf 869aa40a2 and
933851033.)


I'd rather keep it by simply adding the "|unknown" alternative. 30 years 
of programming have taught me that testing limit & error cases is useful, 
although you never know when it will be proven so.


Client application coverage is currently abysmal, especially "psql" 
despite the many script used for testing (39% of lines, 42% of 
functions!), pgbench is under 90%. Generally we really need more tests, 
not less. TAP tests are a good compromise because they are not always 
run, and ISTM sometimes (i.e. you asked for it) is enough.


I agree that some tests can be useless, but I do not think that it applies 
to this one. This test also checks that under a bad option pgbench stops 
with an appropriate 1 exit status. Recently a patch updated the exit 
status of pgbench in many cases to distinguish between different kind 
errors, thus having non-regression in this area was shown to be a good 
idea. Moreover, knowing that the exit status on getopt errors is 
consistent across platform has value, and knowing that there is some 
variability is not uninteresting.


--
Fabien.



PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-16 Thread Tom Lane
Although we've got a few NetBSD and OpenBSD buildfarm critters,
none of them are doing --enable-tap-tests.  If they were, we'd
have noticed the pgbench regression tests falling over:

not ok 3 - pgbench option error: bad option stderr /(?^:(unrecognized|illegal) 
option)/
#   Failed test 'pgbench option error: bad option stderr 
/(?^:(unrecognized|illegal) option)/'
#   at t/002_pgbench_no_server.pl line 190.
#   'pgbench: unknown option -- bad-option
# Try "pgbench --help" for more information.
# '
# doesn't match '(?^:(unrecognized|illegal) option)'

Sure enough, manual testing confirms that on these platforms
that error message is spelled differently:

$ pgbench --bad-option
pgbench: unknown option -- bad-option
Try "pgbench --help" for more information.


I am, TBH, inclined to fix this by removing that test case rather
than teaching it another spelling to accept.  I think it's very
hard to make the case that tests like this one are anything but
a waste of developer and buildfarm time.  When they are also a
portability hazard, it's time to cut our losses.  (I also note
that this test has caused us problems before, cf 869aa40a2 and
933851033.)

regards, tom lane