Re: [HACKERS] pgbench regression test failure

2017-11-11 Thread Steve Singer
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

This causes the pgbench tests to fail (consistently) with

not ok 194 - pgbench late throttling stdout /(?^:processed: [01]/10)/


When I run pgbench manually I get (-t 10 --rate=10 --latency-limit=1 -n -r)

number of transactions actually processed: 10/10
number of transactions skipped: 10 (100.000 %)

Prior to the patch I was getting.

number of transactions actually processed: 0/10
number of transactions skipped: 10 (100.000 %)



@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, 
instr_time total_time,^M
{^M
printf("number of transactions per client: %d\n", nxacts);^M
printf("number of transactions actually processed: " 
INT64_FORMAT "/%d\n",^M
-  total->cnt - total->skipped, nxacts * nclients);^M
+  total->cnt, nxacts * nclients);^M

I think you want ntx instead of total->cnt here. 




The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-24 Thread Fabien COELHO


Hello Tom,


# progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
# progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
# progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 
skipped


(BTW, the "-nan" bits suggest an actual pgbench bug, independently of 
anything else.)


From my point of view, NaN is expected when no test were executed in the 
interval: if there was no transaction, it does not make sense to talk 
about its latency, so NaN is the right answer.


However, the above "6.9 tps, lat 0.000, stddev 0.000, lag 0.000" is 
inconsistent. As "6.9 = 18 / 2.6", it means that progress tps calculation 
should remove skipped transactions...


Attached patch attempts to report more consistent figures in the progress 
and in final report when transactions are skipped.


  sh> cat sleep-100.sql
  \sleep 100 ms
  SELECT 1;

  sh> ./pgbench -P 1 -t 100 -f sleep-100.sql -R 20 -L 1
  [...]
  progress: 1.0 s, 7.0 tps, lat 100.145 ms stddev 0.042, lag 0.000 ms, 16 
skipped
  progress: 2.0 s, 6.0 tps, lat 100.133 ms stddev 0.040, lag 0.021 ms, 7 skipped
  progress: 3.0 s, 9.0 tps, lat 100.115 ms stddev 0.016, lag 0.000 ms, 11 
skipped
  [...]
  number of transactions actually processed: 38/100
  number of transactions skipped: 62 (62.000 %)
  number of transactions above the 1.0 ms latency limit: 38 (38.000 %)
  latency average = 100.142 ms
  tps = 7.091010 (including connections establishing)
  tps = 7.094144 (excluding connections establishing)
  script statistics:
   - number of transactions skipped: 62 (62.000%)

--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e37496c..9ca9734 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2584,7 +2584,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 		doLog(thread, st, agg, skipped, latency, lag);
 
 	/* XXX could use a mutex here, but we choose not to */
-	if (per_script_stats)
+	if (per_script_stats || latency_limit)
 		accumStats(_script[st->use_file].stats, skipped, latency, lag);
 }
 
@@ -3522,11 +3522,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	double		time_include,
 tps_include,
 tps_exclude;
+	int64		ntx = total->cnt - total->skipped;
 
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
-	tps_include = total->cnt / time_include;
-	tps_exclude = total->cnt / (time_include -
-(INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+
+	/* tps is about actually executed transactions */
+	tps_include = ntx / time_include;
+	tps_exclude = ntx /
+		(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
 
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   total->cnt - total->skipped, nxacts * nclients);
+			   total->cnt, nxacts * nclients);
 	}
 	else
 	{
@@ -4660,7 +4663,8 @@ threadRun(void *arg)
 			{
 /* generate and show report */
 StatsData	cur;
-int64		run = now - last_report;
+int64		run = now - last_report,
+			ntx;
 double		tps,
 			total_run,
 			latency,
@@ -4675,7 +4679,7 @@ threadRun(void *arg)
  * XXX: No locking. There is no guarantee that we get an
  * atomic snapshot of the transaction count and latencies, so
  * these figures can well be off by a small amount. The
- * progress is report's purpose is to give a quick overview of
+ * progress report's purpose is to give a quick overview of
  * how the test is going, so that shouldn't matter too much.
  * (If a read from a 64-bit integer is not atomic, you might
  * get a "torn" read and completely bogus latencies though!)
@@ -4689,15 +4693,14 @@ threadRun(void *arg)
 	cur.skipped += thread[i].stats.skipped;
 }
 
+/* we count only actually executed transactions */
+ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
 total_run = (now - thread_start) / 100.0;
-tps = 100.0 * (cur.cnt - last.cnt) / run;
-latency = 0.001 * (cur.latency.sum - last.latency.sum) /
-	(cur.cnt - last.cnt);
-sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2)
-	/ (cur.cnt - last.cnt);
+tps = 100.0 * ntx / run;
+latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
+sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / ntx;
 stdev = 0.001 * sqrt(sqlat - 100.0 * latency * latency);
-lag = 0.001 * (cur.lag.sum - last.lag.sum) /
-	(cur.cnt - last.cnt);
+lag = 0.001 * (cur.lag.sum - last.lag.sum) / ntx;
 
 if (progress_timestamp)
 {
@@ -4714,6 +4717,7 @@ threadRun(void *arg)
 			 (long) tv.tv_sec, (long) (tv.tv_usec / 1000));
 }
 else
+	/* round seconds are expected, nut the thread may 

Re: [HACKERS] pgbench regression test failure

2017-09-23 Thread Tom Lane
Fabien COELHO  writes:
>> [...] After another week of buildfarm runs, we have a few more cases of 
>> 3 rows of output, and none of more than 3 or less than 1.  So I went 
>> ahead and pushed your patch.  I'm still suspicious of these results, but 
>> we might as well try to make the buildfarm green pending investigation 
>> of how this is happening.

> Yep. I keep the issue of pgbench tap test determinism in my todo list, 
> among other things.

> I think that it should be at least clearer under which condition (load ? 
> luck ? bug ?) the result may be 1 or 3 when 2 are expected, which needs 
> some thinking.

skink blew up real good just now:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2017-09-23%2010%3A50%3A01

the critical bit being

#   Failed test 'pgbench progress stderr /(?^:progress: 1\b)/'
#   at 
/home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm 
line 369.
#   'starting vacuum...end.
# progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
# progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
# progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 
skipped
# '
# doesn't match '(?^:progress: 1\b)'

#   Failed test 'transaction count for 001_pgbench_log_1.15981 (5)'
#   at t/001_pgbench_with_server.pl line 438.

#   Failed test 'transaction count for 001_pgbench_log_1.15981.1 (4)'
#   at t/001_pgbench_with_server.pl line 438.
# Looks like you failed 3 tests of 233.

That's exceeded my patience with this test case, so I've removed it
for the moment.  We can put it back as soon as we figure some way
to make it more robust.  (BTW, the "-nan" bits suggest an actual
pgbench bug, independently of anything else.)

Possibly you can duplicate skink's issue by running things under
valgrind.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-22 Thread Fabien COELHO


[...] After another week of buildfarm runs, we have a few more cases of 
3 rows of output, and none of more than 3 or less than 1.  So I went 
ahead and pushed your patch.  I'm still suspicious of these results, but 
we might as well try to make the buildfarm green pending investigation 
of how this is happening.


Yep. I keep the issue of pgbench tap test determinism in my todo list, 
among other things.


I think that it should be at least clearer under which condition (load ? 
luck ? bug ?) the result may be 1 or 3 when 2 are expected, which needs 
some thinking.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-22 Thread Tom Lane
Fabien COELHO  writes:
>> It could be as simple as putting the check-for-done at the bottom of the
>> loop not the top, perhaps.

> I agree that it is best if tests should work in all reasonable conditions, 
> including a somehow overloaded host...

> I'm going to think about it, but I'm not sure of the best approach. In the 
> mean time, ISTM that the issue has not been encountered (yet), so this is 
> not a pressing issue. Maybe under -T > --aggregate-interval pgbench could 
> go on over the limit if the log file has not been written at all, but that 
> would be some kind of kludge for this specific test...

After another week of buildfarm runs, we have a few more cases of 3 rows
of output, and none of more than 3 or less than 1.  So I went ahead and
pushed your patch.  I'm still suspicious of these results, but we might
as well try to make the buildfarm green pending investigation of how
this is happening.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-13 Thread Fabien COELHO



I have a serious, serious dislike for tests that seem to work until
they're run on a heavily loaded machine.



I'm not that sure the error message was because of that.


No, this particular failure (probably) wasn't.  But now that I've realized
that this test case is timing-sensitive, I'm worried about what will
happen when it's run on a sufficiently slow or loaded machine.


I would not necessarily object to doing something in the code that
would guarantee that, though.



Hmmm. Interesting point.


It could be as simple as putting the check-for-done at the bottom of the
loop not the top, perhaps.


I agree that it is best if tests should work in all reasonable conditions, 
including a somehow overloaded host...


I'm going to think about it, but I'm not sure of the best approach. In the 
mean time, ISTM that the issue has not been encountered (yet), so this is 
not a pressing issue. Maybe under -T > --aggregate-interval pgbench could 
go on over the limit if the log file has not been written at all, but that 
would be some kind of kludge for this specific test...


Note that to get test coverage for -T and have to assume that maybe a 
loaded host would not be able to generate just a one line log every second 
during that time is kind of a hard assumption...


Maybe some test could be "warnings", i.e. it could be acceptable to accept 
a failure once in a while in specific conditions, if this is rare enough 
and documented. ISTM that there is such a test for random output.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-12 Thread Tom Lane
Fabien COELHO  writes:
>> I have a serious, serious dislike for tests that seem to work until
>> they're run on a heavily loaded machine.

> I'm not that sure the error message was because of that.

No, this particular failure (probably) wasn't.  But now that I've realized
that this test case is timing-sensitive, I'm worried about what will
happen when it's run on a sufficiently slow or loaded machine.

>> I would not necessarily object to doing something in the code that
>> would guarantee that, though.

> Hmmm. Interesting point.

It could be as simple as putting the check-for-done at the bottom of the
loop not the top, perhaps.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-12 Thread Fabien COELHO



I have a serious, serious dislike for tests that seem to work until
they're run on a heavily loaded machine.


I'm not that sure the error message was because of that. ISTM that it was 
rather finding 3 seconds in two because it started just at the right time, 
or maybe because of slowness induce by load and the order in which the 
different checks are performed.


So unless there is some reason why pgbench is *guaranteed* to run at 
least one transaction per thread, I'd rather the test not assume that.


Well, pgbench is for testing performance... so if the checks allow zero 
performance that's quite annoying as well:-) The tests are designed to 
require very low performance (eg there are a lot of -t 1 when only one 
transaction is enough to check a point), but maybe some test assume a 
minimal requirement, maybe 10 tps with 2 threads...



I would not necessarily object to doing something in the code that
would guarantee that, though.


Hmmm. Interesting point.

There could be a client-side synchronization barrier, eg something like 
"\sync :nclients/nthreads" could be easy enough to implement with pthread, 
and quite error prone to use, but probably that could be okay for 
validation purposes. Or maybe we could expose something at the SQL level, 
eg "SELECT synchro('synchroname', whomanyclientstowait);" which would be 
harder to implement server-side but possibly doable as well.


A simpler option may be to introduce a synchronization barrier at thread 
start, so that all threads start together and that would set the "zero" 
time. Not sure that would solve the potential issue you raise, although 
that would help.


Currently the statistics collection and outputs are performed by thread 0 
in addition to the client it runs, so that pgbench would work even if 
there are no threads, but it also means that under a heavy load some 
things may not be done on the target time but a little bit later, if some 
thread is stuck somewhere. Although the async protocol try to avoid that.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-12 Thread Tom Lane
Fabien COELHO  writes:
> By definition, parallelism induces non determinism. When I put 2 seconds, 
> the intention was that I would get a non empty trace with a "every second" 
> aggregation. I would rather take a longer test rather than allowing an 
> empty file: the point is to check that something is generated, but 
> avoiding a longer test is desirable. So I would suggest to stick to 
> between 1 and 3, and if it fails then maybe add one second...

That's a losing game.  You can't ever guarantee that N seconds is
enough for slow, heavily loaded machines, and cranking up N just
penalizes developers who are testing under normal circumstances.

I have a serious, serious dislike for tests that seem to work until
they're run on a heavily loaded machine.  So unless there is some
reason why pgbench is *guaranteed* to run at least one transaction
per thread, I'd rather the test not assume that.

I would not necessarily object to doing something in the code that
would guarantee that, though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-12 Thread Fabien COELHO



Apparently, one of the threads ran 3 transactions where the test script
expects it to run at most 2.  Is this a pgbench bug, or is the test
being overoptimistic about how exact the "-T 2" cutoff is?



Probably both? It seems that cutting off on time is not a precise science,
so I suggest to accept 1, 2 and 3 lines, see attached.


Before I'd deciphered the test output fully, I was actually guessing that
the problem was the opposite, namely too few lines.


The test was waiting for betwen 1 and 2 lines, so I assumed that the 3
should the number of lines found.

Isn't it possible that some thread is slow enough to start up that it 
doesn't get to run any transactions?  IOW, do we need to allow 0 to 3 
lines?


By definition, parallelism induces non determinism. When I put 2 seconds, 
the intention was that I would get a non empty trace with a "every second" 
aggregation. I would rather take a longer test rather than allowing an 
empty file: the point is to check that something is generated, but 
avoiding a longer test is desirable. So I would suggest to stick to 
between 1 and 3, and if it fails then maybe add one second...


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-12 Thread Tom Lane
Fabien COELHO  writes:
>> Apparently, one of the threads ran 3 transactions where the test script
>> expects it to run at most 2.  Is this a pgbench bug, or is the test
>> being overoptimistic about how exact the "-T 2" cutoff is?

> Probably both? It seems that cutting off on time is not a precise science,
> so I suggest to accept 1, 2 and 3 lines, see attached.

Before I'd deciphered the test output fully, I was actually guessing that
the problem was the opposite, namely too few lines.  Isn't it possible
that some thread is slow enough to start up that it doesn't get to run
any transactions?  IOW, do we need to allow 0 to 3 lines?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-12 Thread Fabien COELHO



francolin just showed a non-reproducing failure in the new pgbench tests:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin=2017-09-12%2014%3A00%3A02



not ok 211 - transaction count for 001_pgbench_log_1.31583 (3)



#   Failed test 'transaction count for 001_pgbench_log_1.31583 (3)'
#   at t/001_pgbench_with_server.pl line 438.

Apparently, one of the threads ran 3 transactions where the test script
expects it to run at most 2.  Is this a pgbench bug, or is the test
being overoptimistic about how exact the "-T 2" cutoff is?


Probably both? It seems that cutting off on time is not a precise science,
so I suggest to accept 1, 2 and 3 lines, see attached.

--
Fabien.diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3609b9b..1fe3433 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -463,7 +463,8 @@ pgbench(
 	'pgbench progress');
 
 # $nthreads threads, 2 seconds, sometimes only one aggregated line is written
-check_pgbench_logs('001_pgbench_log_1', $nthreads, 1, 2,
+# and sometimes 3 lines...
+check_pgbench_logs('001_pgbench_log_1', $nthreads, 1, 3,
 	qr{^\d+ \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
 
 # with sampling rate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgbench regression test failure

2017-09-12 Thread Tom Lane
francolin just showed a non-reproducing failure in the new pgbench tests:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin=2017-09-12%2014%3A00%3A02

The relevant part of the log is

# Running: pgbench -T 2 -P 1 -l --log-prefix=001_pgbench_log_1 
--aggregate-interval=1 -S -b se@2 --rate=20 --latency-limit=1000 -j 2 -c 3 -r
ok 198 - pgbench progress status (got 0 vs expected 0)
ok 199 - pgbench progress stdout /(?^:type: multiple)/
ok 200 - pgbench progress stdout /(?^:clients: 3)/
ok 201 - pgbench progress stdout /(?^:threads: 2)/
ok 202 - pgbench progress stdout /(?^:duration: 2 s)/
ok 203 - pgbench progress stdout /(?^:script 1: .* select only)/
ok 204 - pgbench progress stdout /(?^:script 2: .* select only)/
ok 205 - pgbench progress stdout /(?^:statement latencies in milliseconds)/
ok 206 - pgbench progress stdout /(?^:FROM pgbench_accounts)/
ok 207 - pgbench progress stderr /(?^:vacuum)/
ok 208 - pgbench progress stderr /(?^:progress: 1\b)/
ok 209 - number of log files
ok 210 - file name format
not ok 211 - transaction count for 001_pgbench_log_1.31583 (3)

#   Failed test 'transaction count for 001_pgbench_log_1.31583 (3)'
#   at t/001_pgbench_with_server.pl line 438.
ok 212 - transaction format for 001_pgbench_log_1
ok 213 - transaction count for 001_pgbench_log_1.31583.1 (2)
ok 214 - transaction format for 001_pgbench_log_1
ok 215 - remove log files

Apparently, one of the threads ran 3 transactions where the test script
expects it to run at most 2.  Is this a pgbench bug, or is the test
being overoptimistic about how exact the "-T 2" cutoff is?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers