Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Stephen Frost
Andrew,

* Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
> On 04/12/2017 08:40 AM, Andrew Dunstan wrote:
> > On 04/12/2017 01:27 AM, Craig Ringer wrote:
> >> BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
> >> collect more data from the buildfarm on what takes a while. Sample
> >> output:
> >
> > I'll add that to the commandline the buildfarm uses in the upcoming release.
> 
> See
> 
> 
> One thing I noticed was this
> 
> [10:02:40] t/001_basic.pl . ok  320 ms
> [10:02:59] t/002_pg_dump.pl ... ok19441 ms
> [10:03:47] t/010_dump_connstr.pl .. ok47430 ms
> 
> Why does the last test set take 47s?

It's doing a bunch of dropdb/createdb stuff, and also creates multiple
clusters.  Part of this is because it's generating random names for
databases to make sure they are handled correctly.  I'd certainly love
to see it take less time but I've not really done very much with it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Andrew Dunstan


On 04/12/2017 08:40 AM, Andrew Dunstan wrote:
>
> On 04/12/2017 01:27 AM, Craig Ringer wrote:
>> BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
>> collect more data from the buildfarm on what takes a while. Sample
>> output:
>>
>
> I'll add that to the commandline the buildfarm uses in the upcoming release.
>


See


One thing I noticed was this

[10:02:40] t/001_basic.pl . ok  320 ms
[10:02:59] t/002_pg_dump.pl ... ok19441 ms
[10:03:47] t/010_dump_connstr.pl .. ok47430 ms

Why does the last test set take 47s?

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Tom Lane
Mithun Cy  writes:
> I have tried to optimize the tests maintaining the same coverage we were
> able to get with it.

This patch looks good to me: on my workstation, it reduces the total
runtime of the parallel regression tests from ~20.5 to ~16.5 seconds.
I concur that it doesn't look like it would reduce test coverage
significantly.  It gets rid of a VACUUM FULL and a REINDEX, both of
which should be equivalent to an ordinary index build so far as hash is
concerned, and it trims the volume of data being run through the test.

If no objections, I'll push this soon.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Mithun Cy
On Wed, Apr 12, 2017 at 6:24 PM, Amit Kapila 
wrote:
>
> I have looked into the tests and I think we can do some optimization
> without losing much on code coverage.  First is we are doing both
> Vacuum Full and Vacuum on hash_split_heap in the same test after
> executing few statements, it seems to me that we can avoid doing
> Vacuum Full.  Also, I think we can try by reducing the number of
> inserts in hash_split_heap to see if we can achieve the code coverage
> of split bucket code path.  Finally, I am not sure if Reindex Index
> hash_split_index is required for code coverage, but we might need it
> for the sake of testing that operation.
>
> Mithun, as you are the original author of these tests, can you please
> try some of the above optimizations and any others you can think of
> and see if we can reduce the time for hash index tests?

I have tried to optimize the tests maintaining the same coverage we were
able to get with it.

Original coverage

Lines:  1466 2341  62.6 %
Functions: 79101   78.2 %

After test changes

Lines:   15342341 65.5 %
Functions: 79 101 78.2 %

changes done
Earlier :

CREATE INDEX hash_split_index on hash_split_heap USING HASH (keycol);

+ Time: 8.950 ms

INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 7) a;
+ Time: 3135.098 ms (00:03.135)

VACUUM FULL hash_split_heap;

+ Time: 2748.146 ms (00:02.748)
REINDEX INDEX hash_split_index;

+ Time: 114.290 ms

Insertion and vacuum full were taking most of the time in tests. I have
removed the vacuum full as suggested by you, which is taken care by vacuum
statement at the end, And reduced the number of rows inserted to trigger a
split, by inserting some records before creating hash index. This way we
start with less number of buckets and thus splits will happen much early.

Now :

+ INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 500) a;

+ Time: 2.946 ms

CREATE INDEX hash_split_index on hash_split_heap USING HASH (keycol);

! Time: 8.933 ms

! INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 5000) a;

! Time: 81.674 ms

Also removed REINDEX INDEX as suggested which was not adding for coverage.
With this, I think tests should run significantly faster now.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


hash_index_optimize_testruns.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Amit Kapila
On Tue, Apr 11, 2017 at 9:38 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>>  wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.
>
> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.  We seem to have
> fixed that for manual application of "make check-world", at least
> if you know the right magic incantations to parallelize it; but
> AFAIK the buildfarm script is entirely serial.
>
> BTW, speaking of "value proportionate to runtime", the hash_index
> regression script is now visibly a bottleneck in the core regression
> tests.  Could we take another look at whether that needs to run
> quite so long?
>

I have looked into the tests and I think we can do some optimization
without losing much on code coverage.  First is we are doing both
Vacuum Full and Vacuum on hash_split_heap in the same test after
executing few statements, it seems to me that we can avoid doing
Vacuum Full.  Also, I think we can try by reducing the number of
inserts in hash_split_heap to see if we can achieve the code coverage
of split bucket code path.  Finally, I am not sure if Reindex Index
hash_split_index is required for code coverage, but we might need it
for the sake of testing that operation.

Mithun, as you are the original author of these tests, can you please
try some of the above optimizations and any others you can think of
and see if we can reduce the time for hash index tests?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Andrew Dunstan


On 04/12/2017 01:27 AM, Craig Ringer wrote:
>
> BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
> collect more data from the buildfarm on what takes a while. Sample
> output:
>


I'll add that to the commandline the buildfarm uses in the upcoming release.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Tom Lane
Craig Ringer  writes:
> With
> make PROVE_FLAGS="--timer -j 9" check
> I don't see much difference with/without caching initdb results -
> saves about 4 seconds, from 74 to 70 seconds, but hard to tell with
> the error margins.

> So if we're going to do anything, defaulting to parallel prove seems a
> good start, and I'm not sure caching initdb results is worth it.

Yeah, the results shown in this thread are not very good :-(.  I'm
suspicious that the results are basically I/O-bound in your environment.
There might be some value in the cached-initdb-tree solution for slower,
more CPU-bound machines, but cutting out the regprocin overhead would
probably do just as much good there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-12 Thread Craig Ringer
On 12 April 2017 at 13:27, Craig Ringer  wrote:
> On 12 April 2017 at 08:22, Michael Paquier  wrote:
>> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
>>  wrote:
>>> Well, you can get a lot of information on timings in crake's latest
>>> builds for example, or nightjar's.
>>>
>>> Here's an interesting fact: almost all the time taken up in the scripts
>>> test set seems to be consumed in running initdb over and over.
>>>
>>> Is it worth adding an init(cache => 1) option to PostgresNode, where we
>>> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?
>>
>> This looks like a good idea to me, but I don't like much the idea of
>> an option in the init() routine as that's hard to maintain.
>
> How so?
>
>> It would
>> make sense to me to be able to override the initialization with an
>> environment variable instead
>
> Yeah, that's reasonable.
>
> Patch attached. It supports setting CACHE_INITDB=0 to disable the cache.
>
> With cache: 3m55.063s (user 7.7s, sys 0m11.833s)
>
> Without cache:  4m11.229s (user 0m16.963s, sys 0m12.384s)
>
> so in a serial run it doesn't make a ton of difference.
>
> Adding some timing on initdb runs shows that initdb takes about 1s, a
> cached copy about 0.1s with our Perl based recursive copy code. So
> it's hardly an overwhelming benefit, but might be more beneficial with
> prove -j .

Of course, it'll need locking for prove -j. Added. Patch attached,
applies on top of prior patch.

With

make PROVE_FLAGS="--timer -j 9" check

I don't see much difference with/without caching initdb results -
saves about 4 seconds, from 74 to 70 seconds, but hard to tell with
the error margins.

So if we're going to do anything, defaulting to parallel prove seems a
good start, and I'm not sure caching initdb results is worth it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From db1aadf450a15cbd4041c5f9bf5a0b16f66c2ad5 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 12 Apr 2017 14:03:25 +0800
Subject: [PATCH] Add locking for initdb caching to allow parallel prove

---
 src/test/perl/PostgresNode.pm | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index eef9f66..0eacd41 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -94,6 +94,7 @@ use Socket;
 use Test::More;
 use TestLib ();
 use Scalar::Util qw(blessed);
+use Fcntl qw(:flock);
 
 our @EXPORT = qw(
   get_new_node
@@ -477,15 +478,27 @@ sub _initdb_cached
 	my $pgdata = $self->data_dir;
 
 	my $cachedir = TestLib::tmp_check_dir() . "/initdb_cache";
+	my $lockfile = TestLib::tmp_check_dir() . "/.initdb_cache.lock";
 
 	if (! -d $cachedir)
 	{
 		print(STDERR "initializing initdb cache\n");
-		# initdb into a tempdir, then rename the result so we don't risk
-		# leaving failed initdb results around
-		my $temp_dbdir = TestLib::tempdir("initdb_cache_");
-		$self->_initdb($temp_dbdir, %params);
-		rename($temp_dbdir, $cachedir);
+		open my $lockfh, ">", $lockfile
+			or die "cannot open $lockfile for writing: $1\n";
+		flock($lockfh, LOCK_EX)
+			or die "cannot flock($lockfile) in exclusive mode: $!\n";
+		# If running in parallel with other tests, someone else might've got
+		# the lock first and have swapped an initdb'd directory in already.
+		if (! -d $cachedir)
+		{
+			# initdb into a tempdir, then rename the result so we don't risk
+			# leaving failed initdb results around
+			my $temp_dbdir = TestLib::tempdir("initdb_cache_");
+			$self->_initdb($temp_dbdir, %params);
+			rename($temp_dbdir, $cachedir);
+		}
+		flock($lockfh, LOCK_UN)
+			or die "cannot unlock $lockfile: $!";
 	}
 
 	RecursiveCopy::copypath($cachedir, $pgdata);
-- 
2.5.5


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Craig Ringer
On 12 April 2017 at 08:22, Michael Paquier  wrote:
> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
>  wrote:
>> Well, you can get a lot of information on timings in crake's latest
>> builds for example, or nightjar's.
>>
>> Here's an interesting fact: almost all the time taken up in the scripts
>> test set seems to be consumed in running initdb over and over.
>>
>> Is it worth adding an init(cache => 1) option to PostgresNode, where we
>> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?
>
> This looks like a good idea to me, but I don't like much the idea of
> an option in the init() routine as that's hard to maintain.

How so?

> It would
> make sense to me to be able to override the initialization with an
> environment variable instead

Yeah, that's reasonable.

Patch attached. It supports setting CACHE_INITDB=0 to disable the cache.

With cache: 3m55.063s (user 7.7s, sys 0m11.833s)

Without cache:  4m11.229s (user 0m16.963s, sys 0m12.384s)

so in a serial run it doesn't make a ton of difference.

Adding some timing on initdb runs shows that initdb takes about 1s, a
cached copy about 0.1s with our Perl based recursive copy code. So
it's hardly an overwhelming benefit, but might be more beneficial with
prove -j .


BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
collect more data from the buildfarm on what takes a while. Sample
output:

[13:20:47] t/001_stream_rep.pl .. ok50445 ms (
0.01 usr  0.00 sys +  2.16 cusr  3.57 csys =  5.74 CPU)
[13:21:38] t/002_archiving.pl ... ok 5883 ms (
0.01 usr  0.00 sys +  0.19 cusr  0.45 csys =  0.65 CPU)
[13:21:44] t/003_recovery_targets.pl  ok26197 ms (
0.00 usr  0.00 sys +  0.70 cusr  1.75 csys =  2.45 CPU)
[13:22:10] t/004_timeline_switch.pl . ok10049 ms (
0.01 usr  0.00 sys +  0.24 cusr  0.53 csys =  0.78 CPU)
[13:22:20] t/005_replay_delay.pl  ok10224 ms (
0.00 usr  0.01 sys +  0.18 cusr  0.43 csys =  0.62 CPU)
[13:22:30] t/006_logical_decoding.pl  ok 7570 ms (
0.00 usr  0.00 sys +  0.25 cusr  0.32 csys =  0.57 CPU)
[13:22:38] t/007_sync_rep.pl  ok20693 ms (
0.00 usr  0.00 sys +  0.51 cusr  1.24 csys =  1.75 CPU)
[13:22:58] t/008_fsm_truncation.pl .. ok25399 ms (
0.01 usr  0.00 sys +  0.26 cusr  0.52 csys =  0.79 CPU)
[13:23:24] t/009_twophase.pl  ok58531 ms (
0.01 usr  0.00 sys +  0.49 cusr  0.92 csys =  1.42 CPU)
[13:24:22] t/010_logical_decoding_timelines.pl .. ok 8135 ms (
0.00 usr  0.01 sys +  0.34 cusr  0.76 csys =  1.11 CPU)
[13:24:30] t/011_crash_recovery.pl .. ok 4334 ms (
0.00 usr  0.00 sys +  0.12 cusr  0.16 csys =  0.28 CPU)


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d2b30831803eb3edb2e39b2e49b89bc74a65ee37 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 12 Apr 2017 13:25:19 +0800
Subject: [PATCH] Cache initdb runs in TAP tests

---
 src/test/perl/PostgresNode.pm  | 51 +++---
 src/test/perl/RecursiveCopy.pm |  5 +
 src/test/perl/TestLib.pm   |  5 +
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index cb84f1f..eef9f66 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -391,13 +391,28 @@ sub init
 
 	$params{allows_streaming} = 0 unless defined $params{allows_streaming};
 	$params{has_archiving}= 0 unless defined $params{has_archiving};
+	$params{cache_initdb}	  = 1 unless defined $params{cache_initdb};
 
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
-		@{ $params{extra} });
-	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
+	# Respect CACHE_INITDB=0 in the environment if set
+	$params{cache_initdb} = 0
+		if (defined($ENV{'CACHE_INITDB'}) && !$ENV{'CACHE_INITDB'});
+
+	# We don't cache initdb results for non-default runs
+	# and they're too uncommon to care about anyway.
+	$params{cache_initdb} = 0
+		if defined($params{extra});
+
+	if ($params{cache_initdb})
+	{
+		$self->_initdb_cached(%params);
+	}
+	else
+	{
+		$self->_initdb($pgdata, %params);
+	}
 
 	open my $conf, '>>', "$pgdata/postgresql.conf";
 	print $conf "\n# Added by PostgresNode.pm\n";
@@ -446,6 +461,36 @@ sub init
 	$self->enable_archiving if $params{has_archiving};
 }
 
+# Wrapper to actually run initdb its self
+sub _initdb
+{
+	my ($self, $pgdata, %params) = @_;
+
+	TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
+		@{ $params{extra} });
+	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
+}
+
+sub _initdb_cached
+{
+	my ($self, %params) = @_;
+	my $pgdata = 

Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Andres Freund  writes:
> The time in initdb is largely spent in regprocin and server start/stop,
> so that doesn't surprise me.

Yeah.  We had a plan for removing the regprocin overhead via preprocessing
of the .bki file, but I've not heard anything about that project for
months :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andres Freund
On 2017-04-12 10:39:22 +0800, Craig Ringer wrote:
> On 12 April 2017 at 08:22, Michael Paquier  wrote:
> > On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
> >  wrote:
> >>
> >> We should also have a debug --no-fsync option for initdb, or maybe allow it
> >> to take -o options to pass to child postgres so we can pass fsync=off .
> >
> > That's an idea as well...
> 
> OK, looking at this, initdb already runs postgres -F then does its own
> fsync()s at the end.
> 
> It already has a -N / --no-sync option too. Which is already used by
> PostgresNode.pm . So as much as can be done has already been done
> here.

The time in initdb is largely spent in regprocin and server start/stop,
so that doesn't surprise me.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andrew Dunstan


On 04/11/2017 08:12 PM, Craig Ringer wrote:
>
>
> On 12 Apr. 2017 03:14, "Andrew Dunstan"
>  > wrote:
>
>
>
> On 04/11/2017 12:08 PM, Tom Lane wrote:
> > Robert Haas  > writes:
> >> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
> >>  > wrote:
> >>> This buildfarm run as you can see takes 33m32s, and the Tap
> tests take a
> >>> combined 19m52s of that time.
> >> I don't think it's quite fair to complain about the TAP tests
> taking a
> >> long time to run as a general matter.  Many people here have long
> >> wanted a separate test-suite for long running tests that can be
> run to
> >> really check everything possible, and apparently, TAP tests are it.
> >> What I think would be more useful is to break down where the
> time is
> >> getting spent.  It may be that some of those tests are not adding
> >> value proportionate to their runtime.
>
>
>
> Well, you can get a lot of information on timings in crake's latest
> builds for example, or nightjar's.
>
> Here's an interesting fact: almost all the time taken up in the
> scripts
> test set seems to be consumed in running initdb over and over.
>
>
>
> Is it worth adding an init(cache => 1) option to PostgresNode, where
> we stash an initdb'd db in tmp_check/  and just do a simple fs copy of
> it ? 
>
> Even default to caching and allow an option to disable the cached copy.
>
> We're going to need to initdb a lot in the TAP tests. We often need a
> clean state. Tests also get harder to debug the more you pack into a
> single test run and more difficult to run individually to test some
> specific failure. So IMO the best thing is to try to make that repeat
> initdb as fast as possible.
>
> It reduces our coverage of initdb only incredibly slightly - all that
> repeat runs will do is help find very uncommon intermittent failures.
> And we rerun the buildfarm all the time so it's not like there's a
> shortage of initdb runs anyway.
>
> We should also have a debug --no-fsync option for initdb, or maybe
> allow it to take -o options to pass to child postgres so we can pass
> fsync=off .


Absolutely worth it I should say.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Craig Ringer
On 12 April 2017 at 08:22, Michael Paquier  wrote:
> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
>  wrote:
>>
>> We should also have a debug --no-fsync option for initdb, or maybe allow it
>> to take -o options to pass to child postgres so we can pass fsync=off .
>
> That's an idea as well...

OK, looking at this, initdb already runs postgres -F then does its own
fsync()s at the end.

It already has a -N / --no-sync option too. Which is already used by
PostgresNode.pm . So as much as can be done has already been done
here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Michael Paquier
On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
 wrote:
> Well, you can get a lot of information on timings in crake's latest
> builds for example, or nightjar's.
>
> Here's an interesting fact: almost all the time taken up in the scripts
> test set seems to be consumed in running initdb over and over.
>
> Is it worth adding an init(cache => 1) option to PostgresNode, where we
> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?

This looks like a good idea to me, but I don't like much the idea of
an option in the init() routine as that's hard to maintain. It would
make sense to me to be able to override the initialization with an
environment variable instead, or just make it the default and get the
base image stored in tmp_install/. Base backups are out of scope
though.

> It reduces our coverage of initdb only incredibly slightly - all that repeat
> runs will do is help find very uncommon intermittent failures. And we rerun
> the buildfarm all the time so it's not like there's a shortage of initdb
> runs anyway.

initdb is always run with the same set of options when init() is
called, so this is not something to worry about. And tests inherent to
initdb should happen in src/bin/initdb.

> We should also have a debug --no-fsync option for initdb, or maybe allow it
> to take -o options to pass to child postgres so we can pass fsync=off .

That's an idea as well...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Craig Ringer
On 12 Apr. 2017 03:14, "Andrew Dunstan" 
wrote:



On 04/11/2017 12:08 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>>  wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.



Well, you can get a lot of information on timings in crake's latest
builds for example, or nightjar's.

Here's an interesting fact: almost all the time taken up in the scripts
test set seems to be consumed in running initdb over and over.



Is it worth adding an init(cache => 1) option to PostgresNode, where we
stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?

Even default to caching and allow an option to disable the cached copy.

We're going to need to initdb a lot in the TAP tests. We often need a clean
state. Tests also get harder to debug the more you pack into a single test
run and more difficult to run individually to test some specific failure.
So IMO the best thing is to try to make that repeat initdb as fast as
possible.

It reduces our coverage of initdb only incredibly slightly - all that
repeat runs will do is help find very uncommon intermittent failures. And
we rerun the buildfarm all the time so it's not like there's a shortage of
initdb runs anyway.

We should also have a debug --no-fsync option for initdb, or maybe allow it
to take -o options to pass to child postgres so we can pass fsync=off .


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andrew Dunstan


On 04/11/2017 03:58 PM, Andres Freund wrote:
> On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
>> Andrew Dunstan  writes:
 The other thing that might be useful here is to push on parallelizing
 buildfarm runs.
>>> One reason I haven't supported "make -j nn" everywhere (it is supported
>>> for making core, contrib and test_modules) is that the output tends to
>>> get rather jumbled, and I didn't want to impose that extra burden in
>>> people trying to decipher the results.
>> Agreed, that would be a mess.
> Doesn't make's -Otarget largely solve this?
>


Possibly, but I have just gone to a not inconsiderable amount of trouble
to disaggregate the TAP tests so we get not only the output but all the
logs for each step separately. Say there's a problem in pg_rewind.
Compare this

with having to pick apart this:

So I don't really want the output from these run in parallel to be run
all together, even if they aren't jumbled.

Now it might make sense for other pieces to be run like that. I will
have a look. But the ones that are taking the most time are ones that
seem least suitable.

Afterthought: It might be possible to make it a runtime option.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
>> Agreed, that would be a mess.

> Doesn't make's -Otarget largely solve this?

Given a sufficiently new make (which would include exactly zero of
my own buildfarm critters), that would help.  I think it does nothing
much for "prove -j" though, and certainly not for MSVC.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andres Freund
On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> >> The other thing that might be useful here is to push on parallelizing
> >> buildfarm runs.
> 
> > One reason I haven't supported "make -j nn" everywhere (it is supported
> > for making core, contrib and test_modules) is that the output tends to
> > get rather jumbled, and I didn't want to impose that extra burden in
> > people trying to decipher the results.
> 
> Agreed, that would be a mess.

Doesn't make's -Otarget largely solve this?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Andrew Dunstan  writes:
>> The other thing that might be useful here is to push on parallelizing
>> buildfarm runs.

> One reason I haven't supported "make -j nn" everywhere (it is supported
> for making core, contrib and test_modules) is that the output tends to
> get rather jumbled, and I didn't want to impose that extra burden in
> people trying to decipher the results.

Agreed, that would be a mess.  I was thinking in terms of running steps in
parallel if they have independent output log files, so that that problem
wouldn't arise.  AFAIK, for example, we could run the per-subdirectory
"make check" tests in src/bin/ in parallel without incurring any
legibility issues.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andrew Dunstan


On 04/11/2017 12:08 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>>  wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.



Well, you can get a lot of information on timings in crake's latest
builds for example, or nightjar's.

Here's an interesting fact: almost all the time taken up in the scripts
test set seems to be consumed in running initdb over and over. The
actual tests take almost no time at all. Other test sets might have
similar stats, I am still digging.

+1 for doing some assessments on test value, and also for possibly
creating several test categories, based on both speed and criticality. I
could imaging having some very long running tests, using TAP or
pg_regress,  and optionally enabling them in a buildfarm critter.

I'm not at all opposed to having TAP tests - I have just put in quite a
bit of time extending and improving buildfarm coverage of them. And I'm
probably about to do some more work to enable similar functionality for
MSVC.


> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.  We seem to have
> fixed that for manual application of "make check-world", at least
> if you know the right magic incantations to parallelize it; but
> AFAIK the buildfarm script is entirely serial.


Yeah, it is, and changing that won't be simple. Say 3 steps run in
parallel fail? What do we do? I guess we can just pick the first (by
some definition) and discard the others. Parallelizing individual steps
might be simpler, I'd have to take a look.

Honestly, I'm not that concerned about how long it takes the buildfarm
client per se to run, although I am working on facilities for it where
timing will matter to some people.

And anyone who is concerned about how long it takes on their machine can
run without --enable-tap-tests.

Not sure what the magic incantation is other than "make -j nn"?

One reason I haven't supported "make -j nn" everywhere (it is supported
for making core, contrib and test_modules) is that the output tends to
get rather jumbled, and I didn't want to impose that extra burden in
people trying to decipher the results.


cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Peter Eisentraut
On 4/11/17 12:08, Tom Lane wrote:
> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.

Especially the recovery and subscription tests do a lot of waiting for
one of the nodes to catch up.  You get by that by running several of
these tests in parallel, which is well supported now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>  wrote:
>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>> combined 19m52s of that time.

> I don't think it's quite fair to complain about the TAP tests taking a
> long time to run as a general matter.  Many people here have long
> wanted a separate test-suite for long running tests that can be run to
> really check everything possible, and apparently, TAP tests are it.

> What I think would be more useful is to break down where the time is
> getting spent.  It may be that some of those tests are not adding
> value proportionate to their runtime.

The other thing that might be useful here is to push on parallelizing
buildfarm runs.  Admittedly that will do nothing for the oldest and
slowest buildfarm critters, but for reasonably modern hardware the
serialization of the tests really handicaps you.  We seem to have
fixed that for manual application of "make check-world", at least
if you know the right magic incantations to parallelize it; but
AFAIK the buildfarm script is entirely serial.

BTW, speaking of "value proportionate to runtime", the hash_index
regression script is now visibly a bottleneck in the core regression
tests.  Could we take another look at whether that needs to run
quite so long?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
 wrote:
> I've complained about this before. Below are some timings from buildfarm
> member nightjar as I test out the new client code.
>
> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
> combined 19m52s of that time. That seems quite an inordinate amount of
> time, when checking out the code, configuring, building and running all
> the other checks in two locales in many cases consumes 13m40s. If people
> want to incorporate these into the check-world targets and then
> encourage developers to run check-world, there will be a massive fail if
> you can take a shower and make and drink a cup of coffee in the time it
> takes to run it.

I don't think it's quite fair to complain about the TAP tests taking a
long time to run as a general matter.  Many people here have long
wanted a separate test-suite for long running tests that can be run to
really check everything possible, and apparently, TAP tests are it.

What I think would be more useful is to break down where the time is
getting spent.  It may be that some of those tests are not adding
value proportionate to their runtime.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers