Re: [HACKERS] Explanation for intermittent buildfarm pg_upgradecheck failures

2015-08-02 Thread Tom Lane
I wrote:
> Further experimentation says that 9.0-9.2 do this in the expected order;
> so somebody broke it during 9.3.

Depressingly enough, the somebody was me.  Fixed now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Explanation for intermittent buildfarm pg_upgradecheck failures

2015-08-02 Thread Tom Lane
I wrote:
> unlink(/tmp/.s.PGSQL.5432)        = 0
> unlink(postmaster.pid)            = 0
> unlink(/tmp/.s.PGSQL.5432.lock)   = 0
> exit_group(0)                     = ?
> +++ exited with 0 +++
>
> I haven't looked to find out why the unlinks happen in this order, but on
> a heavily loaded machine, it's certainly possible that the process would
> lose the CPU after unlink(postmaster.pid), and then a new postmaster
> could get far enough to see the socket lock file still there.  So that
> would account for low-probability failures in the pg_upgradecheck test,
> which is exactly what we've been seeing.

Further experimentation says that 9.0-9.2 do this in the expected order;
so somebody broke it during 9.3.

The lack of a close() on the postmaster socket goes all the way back
though.

regards, tom lane
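
[Illustration, not part of the original message.] The race described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the file names are stand-ins created in a temporary directory, the helper names (shutdown_cleanup, race_window_exists) are hypothetical, and the "expected" order is assumed to be the one that removes postmaster.pid last. It does not reproduce the actual postmaster shutdown code.

```python
import os
import tempfile

# Order seen in the strace output quoted above.
OBSERVED_ORDER = ("socket", "postmaster.pid", "socket.lock")
# Assumed "expected" order: postmaster.pid is removed last.
EXPECTED_ORDER = ("socket", "socket.lock", "postmaster.pid")


def shutdown_cleanup(paths_in_order, observer):
    """Unlink each path in turn, calling observer() after every unlink.

    The observer stands in for a new postmaster that happens to get
    scheduled between two unlink() calls of the old one.
    """
    for path in paths_in_order:
        os.unlink(path)
        observer()


def race_window_exists(order):
    """Return True if postmaster.pid is ever gone while the socket
    file or its lock file still exists."""
    with tempfile.TemporaryDirectory() as tmpdir:
        names = ("socket", "postmaster.pid", "socket.lock")
        files = {name: os.path.join(tmpdir, name) for name in names}
        for path in files.values():
            open(path, "w").close()

        windows = []

        def observer():
            pid_gone = not os.path.exists(files["postmaster.pid"])
            leftovers = any(
                os.path.exists(files[name]) for name in ("socket", "socket.lock")
            )
            windows.append(pid_gone and leftovers)

        shutdown_cleanup([files[name] for name in order], observer)
        return any(windows)


if __name__ == "__main__":
    print("observed order has a window:", race_window_exists(OBSERVED_ORDER))
    print("expected order has a window:", race_window_exists(EXPECTED_ORDER))
```

In this model the window opens only when postmaster.pid is unlinked before the socket lock file, which matches the low-probability failure pattern described in the message: the old process has to lose the CPU at exactly that point.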




Re: [HACKERS] Explanation for intermittent buildfarm pg_upgradecheck failures

2015-08-02 Thread Michael Paquier
On Mon, Aug 3, 2015 at 1:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I haven't looked to find out why the unlinks happen in this order, but on
> a heavily loaded machine, it's certainly possible that the process would
> lose the CPU after unlink(postmaster.pid), and then a new postmaster
> could get far enough to see the socket lock file still there.  So that
> would account for low-probability failures in the pg_upgradecheck test,
> which is exactly what we've been seeing.

Oh... This may explain the various failures seen with the TAP tests on
hamster, and with pg_upgrade on axolotl as well. It is rather easy to get
both machines heavily loaded.
-- 
Michael

