Re: reducing isolation tests runtime

2019-02-13 Thread Tom Lane
Andres Freund  writes:
> Right, but why don't we allow for more tests in a group, and then use a
> default max_connections to limit concurrency? Having larger groups is
> advantageous wrt test runtime - it reduces the number of artificial
> serialization points where the slowest test slows things down.  Obviously
> there's still a few groups that are needed for test interdependency
> management, but that's comparatively rare. We have plenty of groups
> that are just broken up to stay below max_concurrent_tests.

Meh.  That would also greatly increase the scope for hard-to-reproduce
conflicts between concurrent tests.  I'm not especially excited about
going there.

regards, tom lane



Re: reducing isolation tests runtime

2019-02-13 Thread Andres Freund
Hi,

On 2019-02-13 12:41:41 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Do you have an idea why we have both max_concurrent_tests *and*
> > max_connections in pg_regress? ISTM the former isn't really useful given
> > the latter?
> 
> No, the former is a static restriction on what the schedule file is
> allowed to contain, the latter is a dynamic restriction (that typically
> is unlimited anyway).

Right, but why don't we allow for more tests in a group, and then use a
default max_connections to limit concurrency? Having larger groups is
advantageous wrt test runtime - it reduces the number of artificial
serialization points where the slowest test slows things down.  Obviously
there's still a few groups that are needed for test interdependency
management, but that's comparatively rare. We have plenty of groups
that are just broken up to stay below max_concurrent_tests.

Greetings,

Andres Freund



Re: reducing isolation tests runtime

2019-02-13 Thread Tom Lane
Andres Freund  writes:
> Do you have an idea why we have both max_concurrent_tests *and*
> max_connections in pg_regress? ISTM the former isn't really useful given
> the latter?

No, the former is a static restriction on what the schedule file is
allowed to contain, the latter is a dynamic restriction (that typically
is unlimited anyway).

regards, tom lane



Re: reducing isolation tests runtime

2019-02-13 Thread Andres Freund
On 2019-02-13 10:58:50 -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2019-Feb-13, Tom Lane wrote:
> >> Some of the slower buildfarm critters use MAX_CONNECTIONS to limit
> >> the load on their hosts.  As long as the isolation tests honor that,
> >> I don't see a real need for a separate serial schedule.
> 
> > MAX_CONNECTIONS was the only reason I didn't push this through.  Do you
> > (Andres) have any solution to that?
> 
> Doesn't the common pg_regress.c infrastructure handle that?
> We might need to improve isolation_main.c and/or the isolation
> Makefile to make it accessible.

> I suppose that in what I'm thinking about, MAX_CONNECTIONS would be
> interpreted as "max number of concurrent isolation scripts", which
> is not exactly number of connections.  A quick and dirty answer
> would be to have isolation_main.c divide the limit by a factor of 4
> or so.

I guess that could work, although it's certainly not too pretty.
Alternatively we could pre-parse the spec files, but that's a bit
annoying given that isolationtester.c is a separate C file...
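
For illustration, a standalone sketch of that pre-parsing idea -- not
isolationtester code, just a naive counter that assumes one backend
connection per declared session plus isolationtester's control
connection:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/*
 * Count "session" declarations in an isolation spec file.  Deliberately
 * naive: it only looks for lines whose first word is "session".
 */
static int
count_sessions(FILE *spec)
{
	char	line[1024];
	int		sessions = 0;

	while (fgets(line, sizeof(line), spec))
	{
		char   *p = line;

		while (isspace((unsigned char) *p))
			p++;
		if (strncmp(p, "session", 7) == 0 && isspace((unsigned char) p[7]))
			sessions++;
	}
	return sessions;
}

int
main(int argc, char **argv)
{
	for (int i = 1; i < argc; i++)
	{
		FILE   *f = fopen(argv[i], "r");

		if (f == NULL)
		{
			perror(argv[i]);
			continue;
		}
		/* one connection per session, plus the control connection */
		printf("%s: ~%d connections\n", argv[i], count_sessions(f) + 1);
		fclose(f);
	}
	return 0;
}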

Do you have an idea why we have both max_concurrent_tests *and*
max_connections in pg_regress? ISTM the former isn't really useful given
the latter?

Greetings,

Andres Freund



Re: reducing isolation tests runtime

2019-02-13 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-13, Tom Lane wrote:
>> Some of the slower buildfarm critters use MAX_CONNECTIONS to limit
>> the load on their hosts.  As long as the isolation tests honor that,
>> I don't see a real need for a separate serial schedule.

> MAX_CONNECTIONS was the only reason I didn't push this through.  Do you
> (Andres) have any solution to that?

Doesn't the common pg_regress.c infrastructure handle that?
We might need to improve isolation_main.c and/or the isolation
Makefile to make it accessible.

I suppose that in what I'm thinking about, MAX_CONNECTIONS would be
interpreted as "max number of concurrent isolation scripts", which
is not exactly number of connections.  A quick and dirty answer
would be to have isolation_main.c divide the limit by a factor of 4
or so.
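
Just to spell out that arithmetic, a standalone sketch (the function name
and the divisor of 4 are illustrative only, not actual pg_regress or
isolation_main.c code):

#include <stdio.h>
#include <stdlib.h>

/*
 * Reinterpret a MAX_CONNECTIONS limit as "max concurrent isolation
 * scripts" by assuming each spec uses roughly four backend connections.
 */
static int
scripts_for_connection_limit(int max_connections)
{
	if (max_connections <= 0)
		return 0;				/* treat 0 as "no limit" */
	if (max_connections < 4)
		return 1;
	return max_connections / 4;
}

int
main(void)
{
	const char *env = getenv("MAX_CONNECTIONS");
	int			limit = (env != NULL) ? atoi(env) : 0;

	printf("connection limit %d => at most %d concurrent scripts\n",
		   limit, scripts_for_connection_limit(limit));
	return 0;
}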

regards, tom lane



Re: reducing isolation tests runtime

2019-02-13 Thread Alvaro Herrera
On 2019-Feb-13, Tom Lane wrote:

> Andres Freund  writes:
> > I'm working on an updated version of this. Adding the new tests is a bit
> > painful because of conflicting names making it harder than necessary to
> > schedule tests. While it's possible to work out a schedule that doesn't
> > conflict, it's pretty annoying to do and more importantly seems fragile
> > - it's very easy to create schedules that succeed on one machine, and
> > not on another, depending on how slow the individual tests are.
> 
> > I'm more inclined to be a bit more aggressive in renaming tables -
> > there's not much point in having a lot of "foo"s around.  So I'm
> > inclined to rename some of the names that are more likely to
> > conflict. If we agree on doing that, I'd like to do that first, and
> > commit that more aggressively than the schedule itself.
> 
> +1

+1

(Using separate schemas sounds like a useful idea if we accumulate dozens
of tests, so I suggest we do that for future tests, but for the time
being I wouldn't bother.)

> > Do we want to maintain a serial version of the schedule too?
> 
> Some of the slower buildfarm critters use MAX_CONNECTIONS to limit
> the load on their hosts.  As long as the isolation tests honor that,
> I don't see a real need for a separate serial schedule.

MAX_CONNECTIONS was the only reason I didn't push this through.  Do you
(Andres) have any solution to that?

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reducing isolation tests runtime

2019-02-13 Thread Tom Lane
Andres Freund  writes:
> I'm working on an updated version of this. Adding the new tests is a bit
> painful because of conflicting names making it harder than necessary to
> schedule tests. While it's possible to work out a schedule that doesn't
> conflict, it's pretty annoying to do and more importantly seems fragile
> - it's very easy to create schedules that succeed on one machine, and
> not on another, depending on how slow the individual tests are.

> I'm more inclined to be a bit more aggressive in renaming tables -
> there's not much point in having a lot of "foo"s around.  So I'm
> inclined to rename some of the names that are more likely to
> conflict. If we agree on doing that, I'd like to do that first, and
> commit that more aggressively than the schedule itself.

+1

> Do we want to maintain a serial version of the schedule too?

Some of the slower buildfarm critters use MAX_CONNECTIONS to limit
the load on their hosts.  As long as the isolation tests honor that,
I don't see a real need for a separate serial schedule.

(We've talked about retiring the serial schedule for the main regression
tests, and while that trigger hasn't been pulled yet, I think it's
just a matter of time.  So making the isolation tests follow that
precedent seems wrong anyway.)

regards, tom lane



Re: reducing isolation tests runtime

2019-02-12 Thread Andres Freund
Hi,

On 2018-01-25 17:34:15 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera  writes:
> 
> > > I think we could solve this by putting in the same parallel group only
> > > slow tests that mostly sleep, i.e. nothing that would monopolize the CPU for
> > > long enough to cause a problem.  Concretely:
> > > test: timeouts tuplelock-update deadlock-hard deadlock-soft-2
> > 
> > OK, but there'd better be a comment there explaining the concern
> > very precisely, or somebody will break it.
> 
> Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
> can be further reduced, but not by more than a second or two unless you
> get in the business of modifying other tests.  (I only modified
> deadlock-soft-2 because it saves 5 seconds).

I'm working on an updated version of this. Adding the new tests is a bit
painful because of conflicting names making it harder than necessary to
schedule tests. While it's possible to work out a schedule that doesn't
conflict, it's pretty annoying to do and more importantly seems fragile
- it's very easy to create schedules that succeed on one machine, and
not on another, depending on how slow the individual tests are.

I'm more inclined to be a bit more aggressive in renaming tables -
there's not much point in having a lot of "foo"s around.  So I'm
inclined to rename some of the names that are more likely to
conflict. If we agree on doing that, I'd like to do that first, and
commit that more aggressively than the schedule itself.

An alternative approach would be to have isolationtester automatically
create a schema with the specfile's name, and place it in the search
path. But that'd make it impossible to use isolationtester against a
standby - which I think we currently don't do, but which probably would
be a good idea.
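
Roughly, the per-spec setup could look like the libpq fragment below
(purely illustrative -- isolationtester doesn't do this today, the helper
name is made up, and it needs -lpq plus an already-open connection):

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Create a schema named after the spec file and put it first in
 * search_path, so a "foo" table in one spec can't collide with a "foo"
 * in another.  As noted above, this wouldn't work against a read-only
 * standby, since CREATE SCHEMA is a write.
 */
static void
use_spec_schema(PGconn *conn, const char *specname)
{
	char	   *ident = PQescapeIdentifier(conn, specname, strlen(specname));
	char		sql[1024];
	PGresult   *res;

	if (ident == NULL)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return;
	}
	snprintf(sql, sizeof(sql),
			 "CREATE SCHEMA %s; SET search_path = %s, public;",
			 ident, ident);
	res = PQexec(conn, sql);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		fprintf(stderr, "schema setup failed: %s", PQerrorMessage(conn));
	PQclear(res);
	PQfreemem(ident);
}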


With regard to the schedule, I'm inclined to order it so that faster
test groups are earlier on, just to make it more likely to reach the
tests one is debugging faster.  Does that sound sane?

Do we want to maintain a serial version of the schedule too? I'm
wondering if we should just generate the serial schedules for both the
isolation tests and the plain regression tests, either by adding an
option to pg_regress that serializes test groups, or by generating the
serial schedule file with a few lines of perl.
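
For the sake of concreteness, here's what that serializer amounts to,
written as a standalone C filter rather than perl: read a parallel
schedule on stdin, write a serial one on stdout.

#include <stdio.h>
#include <string.h>

int
main(void)
{
	char		line[4096];

	while (fgets(line, sizeof(line), stdin))
	{
		if (strncmp(line, "test:", 5) == 0)
		{
			/* expand "test: a b c" into one "test:" line per test */
			char	   *tok = strtok(line + 5, " \t\n");

			for (; tok != NULL; tok = strtok(NULL, " \t\n"))
				printf("test: %s\n", tok);
		}
		else
			fputs(line, stdout);	/* keep comments and blank lines */
	}
	return 0;
}

Usage would be something like "./serialize < isolation_schedule >
serial_schedule".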

Greetings,

Andres Freund



Re: reducing isolation tests runtime

2018-12-04 Thread Andres Freund
Hi,

On 2018-12-04 15:45:39 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I'd like to see this revived, getting a bit tired waiting longer and
> > longer to see isolationtester complete.  Is it really a problem that we
> > require a certain number of connections? Something on the order of 30-50
> > connections ought not to be a real problem for realistic machines, and
> > if it's a problem for one, they can use a serialized schedule?
> 
> The longstanding convention in the main regression tests is 20 max.
> Is there a reason to be different here?

It's a bit less obvious from the outside how many connections a test
spawns - IOW, it might be easier to maintain the schedule if the cap
isn't as tight.  And I'm doubtful that there's a good reason for the 20
limit these days, so going a bit higher seems reasonable.

Greetings,

Andres Freund



Re: reducing isolation tests runtime

2018-12-04 Thread Tom Lane
Andres Freund  writes:
> I'd like to see this revived, getting a bit tired waiting longer and
> longer to see isolationtester complete.  Is it really a problem that we
> require a certain number of connections? Something on the order of 30-50
> connections ought not to be a real problem for realistic machines, and
> if it's a problem for one, they can use a serialized schedule?

The longstanding convention in the main regression tests is 20 max.
Is there a reason to be different here?

regards, tom lane



Re: reducing isolation tests runtime

2018-12-04 Thread Alvaro Herrera
On 2018-Dec-04, Andres Freund wrote:

> Hi,
> 
> I'd like to see this revived, getting a bit tired waiting longer and
> longer to see isolationtester complete.  Is it really a problem that we
> require a certain number of connections? Something on the order of 30-50
> connections ought not to be a real problem for realistic machines, and
> if it's a problem for one, they can use a serialized schedule?

Hello

Yeah, me too.  Let me see about it.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reducing isolation tests runtime

2018-12-04 Thread Andres Freund
Hi,

On 2018-01-25 18:27:28 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
> > > can be further reduced, but not by more than a second or two unless you
> > > get in the business of modifying other tests.  (I only modified
> > > deadlock-soft-2 because it saves 5 seconds).
> > 
> > Looks reasonable to me, but do we want to set any particular convention
> > about the max number of tests to run in parallel?  If so there should
> > be at least a comment saying what.
> 
> Hmm, I ran this with a limited number of connections and found that it
> fails with fewer than 27; and there's no MAX_CONNECTIONS setting like
> there is for pg_regress.  So I'll put this back on the drawing board
> until I'm back from vacation ...

I'd like to see this revived, getting a bit tired waiting longer and
longer to see isolationtester complete.  Is it really a problem that we
require a certain number of connections? Something on the order of 30-50
connections ought not to be a real problem for realistic machines, and
if it's a problem for one, they can use a serialized schedule?

Greetings,

Andres Freund



Re: reducing isolation tests runtime

2018-01-25 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
> > can be further reduced, but not by more than a second or two unless you
> > get in the business of modifying other tests.  (I only modified
> > deadlock-soft-2 because it saves 5 seconds).
> 
> Looks reasonable to me, but do we want to set any particular convention
> about the max number of tests to run in parallel?  If so there should
> be at least a comment saying what.

Hmm, I ran this with a limited number of connections and found that it
fails with fewer than 27; and there's no MAX_CONNECTIONS setting like
there is for pg_regress.  So I'll put this back on the drawing board
until I'm back from vacation ...

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reducing isolation tests runtime

2018-01-25 Thread Tom Lane
Alvaro Herrera  writes:
> Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
> can be further reduced, but not by more than a second or two unless you
> get in the business of modifying other tests.  (I only modified
> deadlock-soft-2 because it saves 5 seconds).

Looks reasonable to me, but do we want to set any particular convention
about the max number of tests to run in parallel?  If so there should
be at least a comment saying what.

> Admittedly the new isolation_schedule file is a bit ugly.

Meh, seems fine.

We won't know if this really works till it hits the buildfarm,
I suppose.

regards, tom lane



Re: reducing isolation tests runtime

2018-01-25 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:

> > I think we could solve this by putting in the same parallel group only
> > slow tests that mostly sleep, i.e. nothing that would monopolize the CPU for
> > long enough to cause a problem.  Concretely:
> > test: timeouts tuplelock-update deadlock-hard deadlock-soft-2
> 
> OK, but there'd better be a comment there explaining the concern
> very precisely, or somebody will break it.

Here's a concrete proposal.  Runtime is 45.7 seconds on my laptop.  It
can be further reduced, but not by more than a second or two unless you
get in the business of modifying other tests.  (I only modified
deadlock-soft-2 because it saves 5 seconds).

Admittedly the new isolation_schedule file is a bit ugly.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f80a2fd9e3320d033017e17a648ac1ca13396ef0 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 25 Jan 2018 17:11:03 -0300
Subject: [PATCH] Parallelize isolation tests a bit

It is possible to run isolation tests in parallel, which saves quite a
bit of runtime (from 76 to 45 seconds on my machine).

Test 'timeouts' has strict timing requirements, which may fail on animals
that are slow (because they run under valgrind, do cache clobbering or
expensive memory checks, or simply have slow hardware), so put it only in
a group with other tests that mostly sleep, to avoid disruption.

Discussion: https://postgr.es/m/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql
---
 src/test/isolation/expected/deadlock-soft-2.out | 12 ++--
 src/test/isolation/isolation_schedule   | 84 +
 src/test/isolation/specs/deadlock-soft-2.spec   | 22 +++
 3 files changed, 33 insertions(+), 85 deletions(-)

diff --git a/src/test/isolation/expected/deadlock-soft-2.out b/src/test/isolation/expected/deadlock-soft-2.out
index 14b0343ba4..5c33a5cdaa 100644
--- a/src/test/isolation/expected/deadlock-soft-2.out
+++ b/src/test/isolation/expected/deadlock-soft-2.out
@@ -1,12 +1,12 @@
 Parsed test spec with 4 sessions
 
 starting permutation: s1a s2a s2b s3a s4a s1b s1c s2c s3c s4c
-step s1a: LOCK TABLE a1 IN SHARE UPDATE EXCLUSIVE MODE;
-step s2a: LOCK TABLE a2 IN ACCESS SHARE MODE;
-step s2b: LOCK TABLE a1 IN SHARE UPDATE EXCLUSIVE MODE; <waiting ...>
-step s3a: LOCK TABLE a2 IN ACCESS EXCLUSIVE MODE; <waiting ...>
-step s4a: LOCK TABLE a2 IN ACCESS EXCLUSIVE MODE; <waiting ...>
-step s1b: LOCK TABLE a2 IN SHARE UPDATE EXCLUSIVE MODE; <waiting ...>
+step s1a: LOCK TABLE aa1 IN SHARE UPDATE EXCLUSIVE MODE;
+step s2a: LOCK TABLE aa2 IN ACCESS SHARE MODE;
+step s2b: LOCK TABLE aa1 IN SHARE UPDATE EXCLUSIVE MODE; <waiting ...>
+step s3a: LOCK TABLE aa2 IN ACCESS EXCLUSIVE MODE; <waiting ...>
+step s4a: LOCK TABLE aa2 IN ACCESS EXCLUSIVE MODE; <waiting ...>
+step s1b: LOCK TABLE aa2 IN SHARE UPDATE EXCLUSIVE MODE; <waiting ...>
 step s1b: <... completed>
 step s1c: COMMIT;
 step s2b: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..0df00c6a53 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -1,68 +1,16 @@
-test: read-only-anomaly
-test: read-only-anomaly-2
-test: read-only-anomaly-3
-test: read-write-unique
-test: read-write-unique-2
-test: read-write-unique-3
-test: read-write-unique-4
-test: simple-write-skew
-test: receipt-report
-test: temporal-range-integrity
-test: project-manager
-test: classroom-scheduling
-test: total-cash
-test: referential-integrity
-test: ri-trigger
-test: partial-index
-test: two-ids
-test: multiple-row-versions
-test: index-only-scan
-test: deadlock-simple
-test: deadlock-hard
-test: deadlock-soft
-test: deadlock-soft-2
-test: fk-contention
-test: fk-deadlock
-test: fk-deadlock2
-test: eval-plan-qual
-test: lock-update-delete
-test: lock-update-traversal
-test: insert-conflict-do-nothing
-test: insert-conflict-do-nothing-2
-test: insert-conflict-do-update
-test: insert-conflict-do-update-2
-test: insert-conflict-do-update-3
-test: insert-conflict-toast
-test: delete-abort-savept
-test: delete-abort-savept-2
-test: aborted-keyrevoke
-test: multixact-no-deadlock
-test: multixact-no-forget
-test: lock-committed-update
-test: lock-committed-keyupdate
-test: update-locked-tuple
-test: propagate-lock-delete
-test: tuplelock-conflict
-test: tuplelock-update
-test: freeze-the-dead
-test: nowait
-test: nowait-2
-test: nowait-3
-test: nowait-4
-test: nowait-5
-test: skip-locked
-test: skip-locked-2
-test: skip-locked-3
-test: skip-locked-4
-test: drop-index-concurrently-1
-test: multiple-cic
-test: alter-table-1
-test: alter-table-2
-test: alter-table-3
-test: alter-table-4
-test: create-trigger
-test: sequence-ddl
-test: async-notify
-test: vacuum-reltuples
-test: timeouts
-test: vacuum-concurrent-drop
+# In the first group of parallel tests, test "timeout" might be susceptible to
+# additional load placed on the machine, so avoid anything in 

Re: reducing isolation tests runtime

2018-01-25 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> On the subject of test total time, we could parallelize isolation tests.

>> BTW, one small issue there is that the reason the timeouts test is so
>> slow is that we have to use multi-second timeouts to be sure slower
>> buildfarm critters (eg valgrind animals) will get the expected results.
>> So I'm worried that if the machine isn't otherwise idle, we will get
>> random failures.

> I think we could solve this by putting in the same parallel group only
> slow tests that mostly sleep, i.e. nothing that would monopolize the CPU for
> long enough to cause a problem.  Concretely:
> test: timeouts tuplelock-update deadlock-hard deadlock-soft-2

OK, but there'd better be a comment there explaining the concern
very precisely, or somebody will break it.

regards, tom lane



Re: reducing isolation tests runtime

2018-01-25 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > On the subject of test total time, we could parallelize isolation tests.
> > Right now "make check" in src/test/isolation takes 1:16 on my machine.
> > Test "timeouts" takes a full 40s of that, with nothing running in parallel
> > -- the machine is completely idle.
> 
> BTW, one small issue there is that the reason the timeouts test is so
> slow is that we have to use multi-second timeouts to be sure slower
> buildfarm critters (eg valgrind animals) will get the expected results.
> So I'm worried that if the machine isn't otherwise idle, we will get
> random failures.

I think we could solve this by putting in the same parallel group only
slow tests that mostly sleep, i.e. nothing that would monopolize the CPU for
long enough to cause a problem.  Concretely:

test: timeouts tuplelock-update deadlock-hard deadlock-soft-2

all of these tests have lots of sleeps and don't go through a lot of
data.  (Compared to the previous patch, I removed alter-table-1, which
uses a thousand tuples, and multiple-row-versions, which uses 100k; also
removed receipt-report which uses a large number of permutations.)

Timings:
timeouts          40.3s
tuplelock-update  10.5s
deadlock-hard     10.9s
deadlock-soft-2    5.4s

alter-table-1 takes 1.5s, receipt-report 1.2s, and nothing else takes
more than 1s, so I think this is good enough -- we can still have
the whole thing run in ~45 seconds without the hazard you describe.
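
For example, the group could carry a comment along these lines (wording
is only a suggestion):

# These tests sleep for most of their runtime and touch little data, so
# they can share a parallel group with "timeouts" without disturbing its
# timing assumptions.  Don't add CPU- or I/O-heavy tests to this group.
test: timeouts tuplelock-update deadlock-hard deadlock-soft-2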

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reducing isolation tests runtime

2018-01-24 Thread Tom Lane
Alvaro Herrera  writes:
> On the subject of test total time, we could parallelize isolation tests.
> Right now "make check" in src/test/isolation takes 1:16 on my machine.
> Test "timeouts" takes a full 40s of that, with nothing running in parallel
> -- the machine is completely idle.

BTW, one small issue there is that the reason the timeouts test is so
slow is that we have to use multi-second timeouts to be sure slower
buildfarm critters (eg valgrind animals) will get the expected results.
So I'm worried that if the machine isn't otherwise idle, we will get
random failures.

We could parallelize the rest of those tests and leave timeouts in its own
group.  That cuts the payback a lot :-( but might still be worth doing.
Or maybe tweak things so that the buildfarm runs a serial schedule but
manual testing doesn't.  Or we could debate how important the timeout
tests really are ... or think harder about how to make them reproducible.

regards, tom lane



Re: reducing isolation tests runtime

2018-01-24 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 24, 2018 at 6:10 PM, Alvaro Herrera  wrote:
>> On the subject of test total time, we could parallelize isolation tests.

> Oh, cool.  Yes, the time the isolation tests take to run is quite
> annoying.  I didn't realize it would be so easy to run it in parallel.

+1 to both --- I hadn't realized we had enough infrastructure to do this
in parallel, either.

regards, tom lane



Re: reducing isolation tests runtime

2018-01-24 Thread Robert Haas
On Wed, Jan 24, 2018 at 6:10 PM, Alvaro Herrera  wrote:
> On the subject of test total time, we could parallelize isolation tests.
> Right now "make check" in src/test/isolation takes 1:16 on my machine.
> Test "timeouts" takes a full 40s of that, with nothing running in parallel
> -- the machine is completely idle.
>
> Seems like we can get a lot of time back just by changing the schedule
> to use multiple tests per line (in particular, put the other slow tests
> together with timeouts), per the attached; with this new schedule,
> isolation takes 44 seconds on my machine -- a win of 32 seconds.  We can
> win a couple of additional seconds by grouping a few other lines, but
> this is the biggest win.
>
> (This needs to be adjusted because some table names in the specs
> conflict.)

Oh, cool.  Yes, the time the isolation tests take to run is quite
annoying.  I didn't realize it would be so easy to run it in parallel.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company