Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-06-07 Thread Craig Ringer
On 7 June 2017 at 13:39, Michael Paquier  wrote:
> On Thu, Jun 1, 2017 at 10:48 PM, Tom Lane  wrote:
>> Andres Freund  writes:
>>> when using
>>> $ cat ~/.proverc
>>> -j9
>>> some tests fail for me in 9.4 and 9.5.
>>
>> Weren't there fixes specifically intended to make that safe, a while ago?
>
> 60f826c has not been back-patched. While this would fix parallel runs
> with make's --jobs, PROVE_FLAGS="-j X" would still fail.

Ah, that's why I didn't find it.

I think applying Michael's patch makes sense now, and if we decide to
backpatch PostgresNode (and I get the time to do it) we can clobber
that fix quite happily with the full backport. Thanks Michael for the
workaround.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-06-06 Thread Michael Paquier
On Thu, Jun 1, 2017 at 10:48 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> when using
>> $ cat ~/.proverc
>> -j9
>> some tests fail for me in 9.4 and 9.5.
>
> Weren't there fixes specifically intended to make that safe, a while ago?

60f826c has not been back-patched. While this would fix parallel runs
with make's --jobs, PROVE_FLAGS="-j X" would still fail.
-- 
Michael




Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-06-06 Thread Michael Paquier
On Wed, May 31, 2017 at 8:45 PM, Craig Ringer  wrote:
> On 1 June 2017 at 08:15, Andres Freund  wrote:
>> Hi,
>>
>> when using
>> $ cat ~/.proverc
>> -j9
>>
>> some tests fail for me in 9.4 and 9.5.  E.g. the src/bin/scripts tests
>> yield a lot of fun like:
>> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
>> ...
>> # LOG:  received immediate shutdown request
>> # WARNING:  terminating connection because of crash of another server process
>> # DETAIL:  The postmaster has commanded this server process to roll back the 
>> current transaction and exit, because another server process exited 
>> abnormally and possibly corrupted shared memory.
>> # HINT:  In a moment you should be able to reconnect to the database and 
>> repeat your command.
>> ...
>>
>> it appears as if various tests are trampling over each other.

They are. I can easily reproduce the problem on my side with:
PROVE_FLAGS="-j 9" make check
It would be nice to get a minimum of stability for those tests in
back-branches even if PostgresNode.pm is not back-patched.

> The immediate problem appears to be that they all use
> tmp_check/postmaster.log. So anything that examines the logs gets
> confused by seeing some other postgres instance's logs, or a mixture,
> trampling everywhere.

Amen.

> I'll be surprised if there aren't other problems though. Rather than
> trying to fix it all up, this seems like a good argument for
> backporting the updated suite from 9.6 or pg10, with PostgresNode etc.
> I already have a working tree with that done to use src/test/recovery
> in 9.5, but haven't updated src/bin/scripts etc yet.

Yup. Even if PostgresNode.pm is not back-patched, a small trick is to
append the PID of the process running the TAP test to the log file
name as in the patch attached. This gives enough uniqueness for the
tests to pass with a high parallel degree.
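
To illustrate the idea in shell terms (this is not the attached patch, which
changes the Perl test code): a minimal sketch of building a log file name
that includes the PID of the running process. The helper name
unique_log_path and the postmaster_PID.log layout are assumptions made for
this example only. $$ expands to the current PID in shell just as it does
in Perl.

```shell
# Hypothetical helper: build a log file path that is unique per test process.
# "tmp_check" and "postmaster" mirror names from the thread; the function
# name and the exact file name layout are assumptions for illustration.
unique_log_path() {
    dir=$1
    # $$ is the PID of the running shell, so concurrent runs get
    # different file names.
    printf '%s/postmaster_%s.log\n' "$dir" "$$"
}

log_file=$(unique_log_path tmp_check)
echo "this test process would log to: $log_file"
```

Two test processes started by the same prove run have different PIDs, so
they can no longer append to the same file.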

A second error that I have spotted is in the tests of pg_rewind, which
would fail in parallel as the same data folders are used for each
test. Using the same trick with $$ makes the tests more stable.
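
The same trick applied to data folders might look like the sketch below.
The function make_test_datadir, the data_ROLE_PID naming and the role
labels are hypothetical and are not taken from the pg_rewind test code;
the point is only that the PID suffix keeps concurrent tests out of each
other's directories.

```shell
# Hypothetical sketch: derive per-process data directories so that two
# tests started at the same time never share a directory.
make_test_datadir() {
    root=$1   # scratch root, e.g. tmp_check
    role=$2   # illustrative label such as "master" or "standby"
    dir="$root/data_${role}_$$"
    # Create the directory and print its path for the caller.
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

scratch=$(mktemp -d)
master_dir=$(make_test_datadir "$scratch" master)
standby_dir=$(make_test_datadir "$scratch" standby)
echo "$master_dir"
echo "$standby_dir"
rm -rf "$scratch"
```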

A third error is a failure in contrib/test_decoding, and this has been
addressed by Andres in 60f826c.

Attached is a patch for the first two, which makes the tests more
robust. I am myself annoyed by parallel tests failing when working on
patches for back-branches, so having at least a minimal fix would be
nice.
-- 
Michael


tap-stability-95.patch
Description: Binary data



Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-06-01 Thread Tom Lane
Andres Freund  writes:
> when using
> $ cat ~/.proverc
> -j9
> some tests fail for me in 9.4 and 9.5.

Weren't there fixes specifically intended to make that safe, a while ago?

regards, tom lane




Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 08:15, Andres Freund  wrote:
> Hi,
>
> when using
> $ cat ~/.proverc
> -j9
>
> some tests fail for me in 9.4 and 9.5.  E.g. the src/bin/scripts tests
> yield a lot of fun like:
> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
> ...
> # LOG:  received immediate shutdown request
> # WARNING:  terminating connection because of crash of another server process
> # DETAIL:  The postmaster has commanded this server process to roll back the 
> current transaction and exit, because another server process exited 
> abnormally and possibly corrupted shared memory.
> # HINT:  In a moment you should be able to reconnect to the database and 
> repeat your command.
> ...
>
> it appears as if various tests are trampling over each other.

None of those scripts use PostgresNode, which I thought was added in
9.5, but apparently was actually introduced in 9.6. They do all their
own setup/teardown using TestLib.pm routines. TestLib uses a unique
tempdir for each test run, sets it as the unix socket directory, and
disables listening on tcp, so the most obvious conflict is hidden.

The immediate problem appears to be that they all use
tmp_check/postmaster.log. So anything that examines the logs gets
confused by seeing some other postgres instance's logs, or a mixture,
trampling everywhere.
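
A rough way to picture the log mix-up, as a simulation only (no postgres
involved, and write_log is a made-up stand-in for a server instance
writing its log): two concurrent writers append to one shared file, and
whoever reads that file afterwards sees both instances' messages.

```shell
# Illustration only: two simulated "server instances" append to one shared
# log, the way concurrent tests all write to tmp_check/postmaster.log.
scratch=$(mktemp -d)
shared_log="$scratch/postmaster.log"

# Hypothetical stand-in for an instance writing three log lines.
write_log() {
    instance=$1
    target=$2
    for i in 1 2 3; do
        echo "instance $instance: message $i" >> "$target"
    done
}

# Run both writers concurrently, then wait for them to finish.
write_log A "$shared_log" &
write_log B "$shared_log" &
wait

# A test that greps this file for its own server's errors also sees the
# other server's lines.
instances_seen=$(cut -d: -f1 "$shared_log" | sort -u | wc -l | tr -d ' ')
echo "distinct instances in shared log: $instances_seen"
rm -rf "$scratch"
```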

I'll be surprised if there aren't other problems though. Rather than
trying to fix it all up, this seems like a good argument for
backporting the updated suite from 9.6 or pg10, with PostgresNode etc.
I already have a working tree with that done to use src/test/recovery
in 9.5, but haven't updated src/bin/scripts etc yet.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-05-31 Thread Craig Ringer
On 1 June 2017 at 08:15, Andres Freund  wrote:
> Hi,
>
> when using
> $ cat ~/.proverc
> -j9
>
> some tests fail for me in 9.4 and 9.5.  E.g. the src/bin/scripts tests
> yield a lot of fun like:
> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
> ...
> # LOG:  received immediate shutdown request
> # WARNING:  terminating connection because of crash of another server process
> # DETAIL:  The postmaster has commanded this server process to roll back the 
> current transaction and exit, because another server process exited 
> abnormally and possibly corrupted shared memory.
> # HINT:  In a moment you should be able to reconnect to the database and 
> repeat your command.
> ...
>
> it appears as if various tests are trampling over each other.  If needed
> I can provide detailed logs, but it appears to readily reproduce on
> several machines...

I'll take a look at what's changed and why it's happening and get back to you.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] tap tests on older branches fail if concurrency is used

2017-05-31 Thread Andres Freund
Hi,

when using
$ cat ~/.proverc
-j9

some tests fail for me in 9.4 and 9.5.  E.g. the src/bin/scripts tests
yield a lot of fun like:
$ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
...
# LOG:  received immediate shutdown request
# WARNING:  terminating connection because of crash of another server process
# DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited abnormally 
and possibly corrupted shared memory.
# HINT:  In a moment you should be able to reconnect to the database and repeat 
your command.
...

it appears as if various tests are trampling over each other.  If needed
I can provide detailed logs, but it appears to readily reproduce on
several machines...

See Michael, I'll provide the details and a reproducer ;)

Greetings,

Andres Freund

