Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-11 Thread Noah Misch
On Wed, Feb 10, 2016 at 10:55:10AM -0500, Tom Lane wrote: > Interestingly, we seem to have managed to greatly reduce the "other" > time (which I presume is basically mdpostckpt unlinking) since 9.2. > The worst case observed in HEAD is about 100s: > > regression=# select ts,write,sync,total-write

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Andrew Dunstan
On 02/10/2016 12:53 PM, Tom Lane wrote: Andrew Dunstan writes: Yeah. It's faintly possible that a kernel upgrade will help. Another data point. I have another RPi2B that is running Debian Wheezy rather than the Fedora remix. I'm running the same test on it we ran yesterday on axolotl. It se

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Tom Lane
Andrew Dunstan writes: >> Yeah. It's faintly possible that a kernel upgrade will help. > Another data point. I have another RPi2B that is running Debian Wheezy > rather than the Fedora remix. I'm running the same test on it we ran > yesterday on axolotl. It seems to be running without having t

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Andrew Dunstan
On 02/09/2016 11:21 PM, Andrew Dunstan wrote: The idea I was toying with is that previous filesystem activity (making the temp install, the server's never-fsync'd writes, etc) has built up a bunch of dirty kernel buffers, and at some point the kernel goes nuts writing all that data. So the

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Andres Freund
On 2016-02-09 22:27:07 -0500, Tom Lane wrote: > The idea I was toying with is that previous filesystem activity (making > the temp install, the server's never-fsync'd writes, etc) has built up a > bunch of dirty kernel buffers, and at some point the kernel goes nuts > writing all that data. So the

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Tom Lane
Noah Misch writes: >>> That's reasonable. If you would like higher-fidelity data, I can run loops >>> of >>> "pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run >>> that for HEAD and 9.2 simultaneously. A day of logs from that should show >>> clearly if HEAD is systemati

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Noah Misch
On Mon, Feb 08, 2016 at 10:55:24PM -0500, Tom Lane wrote: > Noah Misch writes: > > On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote: > >> We've seen variants > >> on this theme on half a dozen machines just in the past week --- and it > >> seems to mostly happen in 9.5 and HEAD, which is f

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Andrew Dunstan
On 02/09/2016 10:27 PM, Tom Lane wrote: Noah Misch writes: On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote: I wonder if it's worth sticking some instrumentation into stats collector shutdown? I wouldn't be surprised if the collector got backlogged during the main phase of testing a

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
Noah Misch writes: > On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote: >> I wonder if it's worth sticking some instrumentation into stats >> collector shutdown? > I wouldn't be surprised if the collector got backlogged during the main phase > of testing and took awhile to chew through its

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
Jim Nasby writes: > On 2/8/16 2:45 PM, Tom Lane wrote: >> I had in mind to just "git revert" the patch when we're done with it. > It's already difficult enough for DBAs to debug some performance issues, > so getting rid of logging is a step backwards. I realize it's unlikely > that you could ru

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Noah Misch
On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote: > Still, it seems clear that the bulk of the shutdown time is indeed the > stats collector taking its time about shutting down, which is doubly > weird because the ecpg tests shouldn't have created very many tables, > so why would there be a

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
Andrew Dunstan writes: > anyway, we got a failure pretty quickly: > pg_ctl: server does not shut down at 2016-02-09 21:10:11.914 EST > ... > LOG: received fast shutdown request at 2016-02-09 21:09:11.824 EST > ... > LOG: checkpointer dead at 2016-02-09 21:09:14.683 EST > LOG: all children d
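For readers following the timeline in the log excerpt quoted above, the gaps between events can be worked out directly from the three timestamps it shows. This is only a small arithmetic sketch: the event labels and the helper name `elapsed_since_request` are made up for illustration, and nothing beyond the quoted timestamps is assumed about the rest of the (truncated) log.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the quoted log excerpt (all EST).
# The dictionary keys are illustrative labels, not log text.
events = {
    "fast shutdown requested": "2016-02-09 21:09:11.824",
    "checkpointer dead": "2016-02-09 21:09:14.683",
    "pg_ctl gave up": "2016-02-09 21:10:11.914",
}


def elapsed_since_request(events):
    """Return seconds elapsed from the shutdown request to each event."""
    start = datetime.strptime(events["fast shutdown requested"], FMT)
    return {
        name: (datetime.strptime(ts, FMT) - start).total_seconds()
        for name, ts in events.items()
    }


offsets = elapsed_since_request(events)
for name, secs in offsets.items():
    print(f"{name}: +{secs:.3f}s")
```

On these numbers the checkpointer was gone roughly 2.9 s after the shutdown request, while pg_ctl gave up about 60 s after it, so nearly all of the wait fell after the checkpointer had exited.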

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Jim Nasby
On 2/8/16 2:45 PM, Tom Lane wrote: Alvaro Herrera writes: Tom Lane wrote: What I'd like to do to investigate this is put in a temporary HEAD-only patch that makes ShutdownXLOG() and its subroutines much chattier about how far they've gotten and what time it is, and also makes pg_ctl print out

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Andrew Dunstan
On 02/09/2016 08:49 PM, Tom Lane wrote: Andrew Dunstan writes: On 02/09/2016 07:49 PM, Tom Lane wrote: However, I'd already noted from some other digging in the buildfarm logs that axolotl's speed seems to vary tremendously. I do not know what else you typically run on that hardware, but pu

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
Andrew Dunstan writes: > On 02/09/2016 07:49 PM, Tom Lane wrote: >> However, I'd already noted from some other digging in the buildfarm >> logs that axolotl's speed seems to vary tremendously. I do not >> know what else you typically run on that hardware, but putting it >> under full load might h

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Andrew Dunstan
On 02/09/2016 07:49 PM, Tom Lane wrote: Andrew Dunstan writes: So it's not running with fsync off or using the ramdisk for stats_temp_directory. Of course, that doesn't explain why we're not seeing it on branches earlier than 9.5, but it could explain why we're only seeing it on the e

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
Andrew Dunstan writes: > So it's not running with fsync off or using the ramdisk for > stats_temp_directory. Of course, that doesn't explain why we're not > seeing it on branches earlier than 9.5, but it could explain why we're > only seeing it on the ecpg tests. BTW, the evidence we a

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
I wrote: > Anyway, I think I should push this additional instrumentation so you > can use it on axolotl. Done. regards, tom lane

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
Andrew Dunstan writes: > Incidentally, as I noted earlier, the ecpg tests don't honour > TEMP_CONFIG, and in axolotl's case this could well make a difference, as > it is set up like this: > ... > So it's not running with fsync off or using the ramdisk for > stats_temp_directory. Oooohh

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Andrew Dunstan
On 02/09/2016 06:46 PM, Andrew Dunstan wrote: On 02/09/2016 05:53 PM, Tom Lane wrote: Andrew, I wonder if I could prevail on you to make axolotl run "make check" on HEAD in src/interfaces/ecpg/ until it fails, so that we can see if the logging I added tells anything useful about this.

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Andrew Dunstan
On 02/09/2016 05:53 PM, Tom Lane wrote: Andrew, I wonder if I could prevail on you to make axolotl run "make check" on HEAD in src/interfaces/ecpg/ until it fails, so that we can see if the logging I added tells anything useful about this. Will do. cheers andre

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
I wrote: > ... However, there is something else happening > on axolotl. Looking at the HEAD and 9.5 branches, there are three very > similar failures in the ECPG step within the past 60 days: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2016-02-08%2014%3A49%3A23 > http://bui

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
I wrote: > I'm not sure whether there's anything to be gained by leaving the tracing > code in there till we see actual buildfarm fails. There might be another > slowdown mechanism somewhere, but I rather doubt it. Thoughts? Hmmm ... I take that back. AFAICT, the failures on Noah's AIX zoo are

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Andrew Dunstan
On 02/09/2016 03:05 PM, Tom Lane wrote: I wrote: In any case, we should proceed with fixing things so that buildfarm owners can specify a higher shutdown timeout for especially slow critters. I looked into doing this as I suggested yesterday, namely modifying the buildfarm scripts, and soon d

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
I wrote: > In any case, we should proceed with fixing things so that buildfarm owners > can specify a higher shutdown timeout for especially slow critters. I looked into doing this as I suggested yesterday, namely modifying the buildfarm scripts, and soon decided that it would be a mess; there are

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Tom Lane
I wrote: > Noah Misch writes: >> On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote: >>> We've seen variants >>> on this theme on half a dozen machines just in the past week --- and it >>> seems to mostly happen in 9.5 and HEAD, which is fishy. >> It has been affecting only the four AIX ani

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-09 Thread Andrew Dunstan
On 02/08/2016 10:55 PM, Tom Lane wrote: Noah Misch writes: On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote: We've seen variants on this theme on half a dozen machines just in the past week --- and it seems to mostly happen in 9.5 and HEAD, which is fishy. It has been affecting only

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Noah Misch writes: > On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote: >> We've seen variants >> on this theme on half a dozen machines just in the past week --- and it >> seems to mostly happen in 9.5 and HEAD, which is fishy. > It has been affecting only the four AIX animals, which do s

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Noah Misch
On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote: > Of late, by far the majority of the random-noise failures we see in the > buildfarm have come from failure to shut down the postmaster in a > reasonable timeframe. > We've seen variants > on this theme on half a dozen machines just in the

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Andrew Dunstan
On 02/08/2016 02:15 PM, Tom Lane wrote: Of late, by far the majority of the random-noise failures we see in the buildfarm have come from failure to shut down the postmaster in a reasonable timeframe. An example is this current failure on hornet: http://buildfarm.postgresql.org/cgi-bin/show_st

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> What I'd like to do to investigate this is put in a temporary HEAD-only >> patch that makes ShutdownXLOG() and its subroutines much chattier about >> how far they've gotten and what time it is, and also makes pg_ctl print >> out the current time if it gi

Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Alvaro Herrera
Tom Lane wrote: > Of late, by far the majority of the random-noise failures we see in the > buildfarm have come from failure to shut down the postmaster in a > reasonable timeframe. I noticed that. > An example is this current failure on hornet: > > http://buildfarm.postgresql.org/cgi-bin/show_s

[HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Of late, by far the majority of the random-noise failures we see in the buildfarm have come from failure to shut down the postmaster in a reasonable timeframe. An example is this current failure on hornet: http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41