Re: [HACKERS] SR standby hangs

Andrew Dunstan Tue, 26 Apr 2011 13:02:20 -0700


On 02/22/2011 07:55 AM, Robert Haas wrote:

On Sun, Feb 20, 2011 at 12:39 PM, Greg Stark<[email protected]>  wrote:

On Fri, Feb 18, 2011 at 6:59 PM, Andrew Dunstan<[email protected]>  wrote:

The server is running as a warm standby, and the client's application tries
to connect to both the master and the slave, accepting whichever lets it
connect (hence hot standby is not turned on).
...
   #2  0x00000000005de645 in LockBufferForCleanup () at bufmgr.c:2432
   #3  0x0000000000463733 in heap_xlog_clean (lsn=<value optimized out>,

Hm, if hot standby isn't turned on why are we bothering with locks at
all? But that said is there *anything* else happening in the database
aside from recovery? Are there any errors in the database log?

But this still shouldn't block. It's either blocked locking the buffer
or blocked waiting for the buffer to become unpinned. It would be nice
to get a backtrace from a debugging build which wouldn't have some of
the functions inlined. It would be really nice to see the pin count on
the buffer in question -- perhaps it has gotten out of sync or
underflowed?

I realize now that it would also be useful to see the state of the
LWLock on the buffer.

This has happened again. This time we have some debug info available,and can possible get more, if people tell me what will be helpful:


   (gdb) bt
   #0  0x0000003493ed5337 in semop () from /lib64/libc.so.6
   #1  0x00000000005bd1e3 in PGSemaphoreLock (sema=0x2b1a6014ac38,
   interruptOK=1 '\001') at pg_sema.c:420
   #2  0x00000000005de735 in LockBufferForCleanup (buffer=310163) at
   bufmgr.c:2432
   #3  0x000000000048b896 in RestoreBkpBlocks (lsn=<value optimized
   out>, record=0x2b1a6191b010, cleanup=1 '\001') at xlog.c:3540
   #4  0x0000000000463788 in heap_xlog_clean (lsn=<value optimized
   out>, record=0x2b1a6191b010) at heapam.c:4160
   #5  heap2_redo (lsn=<value optimized out>, record=0x2b1a6191b010) at
   heapam.c:4858
   (gdb) f 2
   #2  0x00000000005de735 in LockBufferForCleanup (buffer=310163) at
   bufmgr.c:2432
   2432                ProcWaitForSignal();
   (gdb) p *bufHdr
   $2 = {tag = {rnode = {spcNode = 16393, dbNode = 40475, relNode =
   41880}, forkNum = MAIN_FORKNUM, blockNum = 18913}, flags = 6,
   usage_count = 1, refcount = 1, wait_backend_pid = 9111,
      buf_hdr_lock = 0 '\000', buf_id = 310162, freeNext = -2,
   io_in_progress_lock = 620448, content_lock = 620449}

Strangely, I don't even see ProcWaitForSignal() in the frame list - Ishouldn't have thought it was a candidate to be optimised away.


cheers

andrew


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SR standby hangs

Reply via email to