Re: [GENERAL] startup process stuck in recovery

2017-10-11 Thread Tom Lane
Simon Riggs writes: > On 11 October 2017 at 08:09, Christophe Pettus wrote: >> While it's certainly true that this was an extreme case, it was a real-life >> production situation. The concern here is that in the actual production >> situation, the

Re: [GENERAL] startup process stuck in recovery

2017-10-11 Thread Simon Riggs
On 11 October 2017 at 08:09, Christophe Pettus wrote: > >> On Oct 10, 2017, at 23:54, Simon Riggs wrote: >> >> The use case described seems incredibly >> unreal and certainly amenable to being rewritten. > > While it's certainly true that this was an

Re: [GENERAL] startup process stuck in recovery

2017-10-11 Thread Christophe Pettus
> On Oct 10, 2017, at 23:54, Simon Riggs wrote: > > The use case described seems incredibly > unreal and certainly amenable to being rewritten. While it's certainly true that this was an extreme case, it was a real-life production situation. The concern here is that in

Re: [GENERAL] startup process stuck in recovery

2017-10-11 Thread Simon Riggs
On 10 October 2017 at 21:23, Tom Lane wrote: > What I see is that, given this particular test case, the backend > process on the master never holds more than a few locks at a time. > Each time we abort a subtransaction, the AE lock it was holding > on the temp table it

Re: [GENERAL] startup process stuck in recovery

2017-10-10 Thread Tom Lane
Christophe Pettus writes: > I was able to reproduce this on 9.5.9 with the following: Hmm ... so I still can't reproduce the specific symptoms Christophe reports. What I see is that, given this particular test case, the backend process on the master never holds more than a

Re: [GENERAL] startup process stuck in recovery

2017-10-10 Thread Christophe Pettus
> On Oct 10, 2017, at 08:05, Tom Lane wrote: > > You're right, I was testing on HEAD, so that patch might've obscured > the problem. But the code looks like it could still be O(N^2) in > some cases. Will look again later. I was able to reproduce this on 9.5.9 with the

Re: [GENERAL] startup process stuck in recovery

2017-10-10 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Hmm, I tried to reproduce this and could not. I experimented with >> various permutations of this: > This problem is probably related to commit 9b013dc238c, which AFAICS is > only in pg10, not 9.5. You're right, I was testing

Re: [GENERAL] startup process stuck in recovery

2017-10-10 Thread Alvaro Herrera
Tom Lane wrote: > Christophe Pettus writes: > > The problem indeed appears to be a very large number of subtransactions, > > each one creating a temp table, inside a single transaction. It's made > > worse by one of those transactions finally getting replayed on the > >

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
> On Oct 9, 2017, at 17:30, Tom Lane wrote: > > What am I missing to reproduce the problem? Not sure. The actual client behavior here is a bit cryptic (not our code, incompletely logs). They might be creating a savepoint before each temp table creation, without a

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
> On Oct 9, 2017, at 18:21, Peter Geoghegan wrote: > What's the hot_standby_feedback setting? How about > max_standby_archive_delay/max_standby_streaming_delay? On, 5m, 5m. -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-general mailing list

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Peter Geoghegan
On Mon, Oct 9, 2017 at 12:08 PM, Christophe Pettus wrote: > Suggestions on further diagnosis? What's the hot_standby_feedback setting? How about max_standby_archive_delay/max_standby_streaming_delay? -- Peter Geoghegan

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Tom Lane
Peter Geoghegan writes: > Just a guess, but do you disable autovacuum on your dev machine? (I know I > do.) Nope. regards, tom lane

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Peter Geoghegan
On Mon, Oct 9, 2017 at 5:30 PM, Tom Lane wrote: > and did not see any untoward behavior, at least not till I got to enough > temp tables to overrun the master's shared lock table, and even then it > cleaned up fine. At no point was the standby process consuming anywhere >

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Tom Lane
Christophe Pettus writes: > The problem indeed appears to be a very large number of subtransactions, each > one creating a temp table, inside a single transaction. It's made worse by > one of those transactions finally getting replayed on the secondary, only to > have
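
[Editor's sketch] The workload shape described in this message (one transaction containing many subtransactions, each creating a temp table) can be generated with a few lines of Python. This is a hypothetical illustration, not the script posted in the thread: the function name, the savepoint and table names, and the choice to RELEASE each savepoint rather than roll it back are all assumptions.

```python
from typing import List


def build_workload(n_subxacts: int) -> List[str]:
    """Build SQL for one transaction with n_subxacts subtransactions.

    Hypothetical workload: every subtransaction is opened with SAVEPOINT
    and creates one temp table. All names here are made up for illustration.
    """
    statements = ["BEGIN;"]
    for i in range(n_subxacts):
        statements.append(f"SAVEPOINT sp_{i};")
        statements.append(f"CREATE TEMP TABLE tmp_{i} (id int);")
        # Assumption: the savepoint is released; the client in the thread
        # may have behaved differently.
        statements.append(f"RELEASE SAVEPOINT sp_{i};")
    statements.append("COMMIT;")
    return statements


if __name__ == "__main__":
    # Print a small script that could be fed to psql on a test primary.
    print("\n".join(build_workload(3)))
```

Scaling n_subxacts up on a disposable test system is one way to explore how the standby behaves; whether it triggers the symptoms reported here depends on the server version and is not guaranteed.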

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Merlin Moncure
On Mon, Oct 9, 2017 at 6:12 PM, Christophe Pettus wrote: > >> On Oct 9, 2017, at 14:29, Tom Lane wrote: >> Hmm. Creating or dropping a temp table does take AccessExclusiveLock, >> just as it does for a non-temp table. In principle we'd not have to >>

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
> On Oct 9, 2017, at 14:29, Tom Lane wrote: > Hmm. Creating or dropping a temp table does take AccessExclusiveLock, > just as it does for a non-temp table. In principle we'd not have to > transmit those locks to standbys, but I doubt that the WAL code has > enough knowledge

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Tom Lane
Christophe Pettus writes: >> On Oct 9, 2017, at 13:26, Tom Lane wrote: >> My bet is that the source server did something that's provoking O(N^2) >> behavior in the standby server's lock management. It's hard to say >> exactly what, but I'm wondering about
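
[Editor's sketch] The O(N^2) suspicion raised here can be illustrated with a toy model. This is not PostgreSQL's lock-management code: it only assumes, for illustration, that held locks sit in one flat list that is scanned in full each time a single subtransaction's locks are released. The function names and the relation-id numbering are hypothetical.

```python
from typing import Dict, List, Tuple


def release_by_scanning(n_subxacts: int) -> int:
    """Release one lock per subtransaction by scanning a flat list.

    Returns how many list entries were examined in total.
    """
    # Each entry is (xid, hypothetical relation id).
    held: List[Tuple[int, int]] = [(xid, 1000 + xid) for xid in range(n_subxacts)]
    examined = 0
    for xid in range(n_subxacts):
        remaining = []
        for entry in held:
            examined += 1
            if entry[0] != xid:
                remaining.append(entry)
        held = remaining
    return examined


def release_by_index(n_subxacts: int) -> int:
    """Same release pattern, but with locks indexed by xid.

    Returns how many lookups were performed in total.
    """
    held: Dict[int, List[int]] = {xid: [1000 + xid] for xid in range(n_subxacts)}
    lookups = 0
    for xid in range(n_subxacts):
        lookups += 1
        held.pop(xid, None)
    return lookups
```

In this model the scanning version examines N(N+1)/2 entries for N subtransactions, while the indexed version performs N lookups, which is the kind of gap that could turn a large transaction into a very long replay.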

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
> On Oct 9, 2017, at 13:26, Tom Lane wrote: > My bet is that the source server did something that's provoking O(N^2) > behavior in the standby server's lock management. It's hard to say > exactly what, but I'm wondering about something like a plpgsql function > taking an

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
> On Oct 9, 2017, at 13:26, Tom Lane wrote: > > Oh, that's really interesting. So it's not *just* releasing locks but > also acquiring them, which says that it is making progress of some sort. It seems to have leveled out now, and is still grinding away. > Can you

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Tom Lane
Christophe Pettus writes: >> On Oct 9, 2017, at 13:01, Tom Lane wrote: >> Is that number changing at all? > Increasing: > AccessExclusiveLock | 8810 Oh, that's really interesting. So it's not *just* releasing locks but also acquiring them, which says

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
> On Oct 9, 2017, at 13:01, Tom Lane wrote: > Hmm. Is it possible that the process is replaying the abort of a > transaction with a lot of subtransactions? That's possible, although we're now talking about an hours-long delay at this point. > Is that number changing at

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Tom Lane
Christophe Pettus writes: > The other observation is that the startup process is holding a *lot* of locks: Hmm. Is it possible that the process is replaying the abort of a transaction with a lot of subtransactions? It seems like maybe you could be getting into an O(N^2)

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
On Oct 9, 2017, at 12:18, Christophe Pettus wrote:
> #0 0x558812f4f1da in ?? ()
> #1 0x558812f4f8cb in StandbyReleaseLockTree ()
> #2 0x558812d718ee in ?? ()
> #3 0x558812d75520 in xact_redo ()
> #4 0x558812d7f713 in StartupXLOG ()
> #5

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Christophe Pettus
> On Oct 9, 2017, at 12:10, Tom Lane wrote:
> Attach to startup process with gdb, and get a stack trace?
#0 0x558812f4f1da in ?? ()
#1 0x558812f4f8cb in StandbyReleaseLockTree ()
#2 0x558812d718ee in ?? ()
#3 0x558812d75520 in xact_redo ()
#4

Re: [GENERAL] startup process stuck in recovery

2017-10-09 Thread Tom Lane
Christophe Pettus writes: > We're dealing with a 9.5.5 database with the symptom that, after a certain > amount of time after restart, the startup process reaches a certain WAL > segment, and stops. The startup process runs at 100% CPU, with no output > from strace. There