Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-06-10 Thread Jeff Frost
On May 26, 2012, at 9:17 AM, Tom Lane wrote: Would you guys please try this in the problem databases: select a.ctid, c.relname from pg_attribute a join pg_class c on a.attrelid=c.oid where c.relnamespace=11 and c.relkind in ('r','i') order by 1 desc; If you see any block numbers above
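The check requested above comes down to reading the block number out of each ctid the query returns and comparing the highest one against the "about 20" figure quoted later in the thread. A minimal sketch of that comparison, assuming the ctid values have been copied out of psql as text in the usual "(block,offset)" form; the sample rows and the helper names are hypothetical, not taken from the thread:

```python
import re

# A ctid prints as "(block,offset)", e.g. "(13,5)".
CTID_RE = re.compile(r"^\((\d+),(\d+)\)$")


def block_number(ctid: str) -> int:
    """Return the block (page) number from a ctid string such as "(13,5)"."""
    match = CTID_RE.match(ctid.strip())
    if match is None:
        raise ValueError(f"not a ctid: {ctid!r}")
    return int(match.group(1))


def highest_block(ctids) -> int:
    """Return the largest block number among the given ctid strings."""
    return max(block_number(c) for c in ctids)


# Hypothetical sample; real values would come from the query's ctid column.
sample = ["(13,7)", "(2,41)", "(0,1)"]

# Rough threshold taken from the thread ("above about 20").
THRESHOLD = 20

top = highest_block(sample)
print(f"highest block: {top}, above threshold: {top > THRESHOLD}")
```

Because the query already sorts by ctid in descending order, the first row returned should carry the highest block number, so in practice the check can be done by eye; the sketch only makes the comparison explicit.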

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-28 Thread Greg Sabino Mullane
On Sun, May 27, 2012 at 05:44:15PM -0700, Jeff Frost wrote: On May 27, 2012, at 12:53 PM, Tom Lane wrote: occurring, they'd take long enough to expose the process to sinval overrun even with not-very-high DDL rates. As it turns out, there are quite a few temporary tables created. For the

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-28 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes: On Sun, May 27, 2012 at 05:44:15PM -0700, Jeff Frost wrote: On May 27, 2012, at 12:53 PM, Tom Lane wrote: occurring, they'd take long enough to expose the process to sinval overrun even with not-very-high DDL rates. As it turns out, there are

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-27 Thread Tom Lane
I've been continuing to poke at this business of relcache-related startup stalls, and have come to some new conclusions. One is that it no longer seems at all likely that the pg_attribute rows for system catalogs aren't at the front of pg_attribute, because the commands that might be used to

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-27 Thread Jeff Frost
On May 27, 2012, at 12:53 PM, Tom Lane wrote: Another thing that can be deduced from those stack traces is that sinval resets were happening. For example, in the third message linked above, the heapscan is being done to load up a relcache entry for relation 2601 (pg_am). This would be

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Greg Sabino Mullane
On Fri, May 25, 2012 at 07:02:42PM -0400, Tom Lane wrote: However, the remaining processes trying to compute new init files would still have to complete the process, so I'd expect there to be a diminishing effect --- the ones that were stalling shouldn't all release exactly together. Unless

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes: On Fri, May 25, 2012 at 07:02:42PM -0400, Tom Lane wrote: pg_attribute just enough smaller to avoid the scenario. Not sure about Greg's case, but he should be able to tell us the size of pg_attribute and his shared_buffers setting ...

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Greg Sabino Mullane
On Sat, May 26, 2012 at 12:17:04PM -0400, Tom Lane wrote: If you see any block numbers above about 20 then maybe the triggering condition is a row relocation after all. Highest was 13. -- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes: On Sat, May 26, 2012 at 12:17:04PM -0400, Tom Lane wrote: If you see any block numbers above about 20 then maybe the triggering condition is a row relocation after all. Highest was 13. Hm ... but wait, you said you'd done a VACUUM FULL on the

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Greg Sabino Mullane
On Sat, May 26, 2012 at 01:25:29PM -0400, Tom Lane wrote: Greg Sabino Mullane g...@endpoint.com writes: On Sat, May 26, 2012 at 12:17:04PM -0400, Tom Lane wrote: If you see any block numbers above about 20 then maybe the triggering condition is a row relocation after all. Highest was

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Greg Sabino Mullane
Yeah, this is proof that what it was doing is the same as what we saw in Jeff's backtrace, ie loading up the system catalog relcache entries the hard way via seqscans on the core catalogs. So the question to be answered is why that's suddenly a big performance bottleneck. It's not a cheap

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes: Yeah, this is proof that what it was doing is the same as what we saw in Jeff's backtrace, ie loading up the system catalog relcache entries the hard way via seqscans on the core catalogs. So the question to be answered is why that's suddenly a

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Jeff Frost
On May 25, 2012, at 4:02 PM, Tom Lane wrote: Greg Sabino Mullane g...@endpoint.com writes: Yeah, this is proof that what it was doing is the same as what we saw in Jeff's backtrace, ie loading up the system catalog relcache entries the hard way via seqscans on the core catalogs. So the

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Tom Lane
Jeff Frost j...@pgexperts.com writes: In our customer's case, the size of pg_attribute was a little less than 1/4 of shared_buffers, so might not be the syncscan? Could you go back and double check that? If the shared_buffers setting were 7GB not 8GB, that would fall right between the
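The 7GB versus 8GB question turns on a threshold at one quarter of shared_buffers, the ratio Jeff quotes. A minimal sketch of that arithmetic follows. The table size is hypothetical, since the excerpt does not give the real one, and the function names are illustrative only:

```python
GB = 1024 ** 3


def quarter_of_shared_buffers(shared_buffers_bytes: int) -> int:
    """One quarter of shared_buffers, the ratio discussed in the message."""
    return shared_buffers_bytes // 4


def exceeds_quarter(table_bytes: int, shared_buffers_bytes: int) -> bool:
    """True if the table is larger than a quarter of shared_buffers."""
    return table_bytes > quarter_of_shared_buffers(shared_buffers_bytes)


# Hypothetical pg_attribute size: "a little less than" 2GB.
table_size = int(1.9 * GB)

for sb_gb in (7, 8):
    limit = quarter_of_shared_buffers(sb_gb * GB)
    over = exceeds_quarter(table_size, sb_gb * GB)
    print(f"shared_buffers={sb_gb}GB: quarter={limit / GB:.2f}GB, table over it: {over}")
```

With these assumed numbers the table sits under the 2GB mark that applies at 8GB but over the 1.75GB mark that applies at 7GB, which is why the exact shared_buffers setting matters to the diagnosis.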

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Jeff Frost
On May 25, 2012, at 7:12 PM, Tom Lane wrote: Jeff Frost j...@pgexperts.com writes: In our customer's case, the size of pg_attribute was a little less than 1/4 of shared_buffers, so might not be the syncscan? Could you go back and double check that? If the shared_buffers setting were 7GB

[HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
Yesterday I had a client that experienced a sudden high load on one of their servers (8.3.5 - yes, I know. Those of you with clients will understand). When I checked, almost all connections were in a startup state, very similar to this thread:

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes: Yesterday I had a client that experienced a sudden high load on one of their servers (8.3.5 - yes, I know. Those of you with clients will understand). When I checked, almost all connections were in a startup state, very similar to this thread:

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
On Thu, May 24, 2012 at 03:54:54PM -0400, Tom Lane wrote: Did you check I/O activity? I looked again at Jeff Frost's report and now think that what he saw was probably a lot of seqscans on bloated system catalogs, cf http://archives.postgresql.org/message-id/28484.1337887...@sss.pgh.pa.us

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
I think there are probably two independent issues here. The missing index entries are clearly bad but it's not clear that they had anything to do with the startup stall. On further log digging, I think you are correct, as those index warnings go back many days before the startup problems

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes: Oh, almost forgot: reading your reply to the old thread reminded me of something I saw in one of the straces right as it woke up and left the startup state to do some work. Here's a summary: 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0