On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote: > Hot Standby node can freeze when startup process calls LockBufferForCleanup(). > This bug can be reproduced by the following procedure. > > 0. start Hot Standby, with one active node(node A) and one standby node(node > B) > 1. create table X and table Y in node A > 2. insert several rows in table X in node A > 3. delete one row from table X in node A > 4. begin xact 1 in node A, execute following commands, and leave xact 1 open > 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE > 5. wait until WAL's for above actions are applied in node B > 6. begin xact 2 in node B, and execute following commands > 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X; > 6.2 FETCH test_cursor; > 6.3 SELECT * FROM table Y; > 7. execute VACUUM FREEZE table A in node A > 8. commit xact 1 in node A > > ...then in node B occurs following "deadlock" situation, which is not > detected by deadlock check. > * startup process waits for xact 2 to release buffers in table X (in > LockBufferForCleanup()) > * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table > Y
Deadlock bug was prevented by stop-gap measure in December commit. Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Reviews welcome, otherwise commit at end of week. -- Simon Riggs www.2ndQuadrant.com
*** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *************** *** 2399,2405 **** primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' </listitem> <listitem> <para> ! Waiting to acquire buffer cleanup locks (for which there is no time out) </para> </listitem> <listitem> --- 2399,2405 ---- </listitem> <listitem> <para> ! Waiting to acquire buffer cleanup locks </para> </listitem> <listitem> *************** *** 2536,2546 **** primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' Three-way deadlocks are possible between AccessExclusiveLocks arriving from the primary, cleanup WAL records that require buffer cleanup locks and user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks ! are currently resolved by the cancellation of user processes that would ! need to wait on a lock. This is heavy-handed and generates more query ! cancellations than we need to, though does remove the possibility of deadlock. ! This behaviour is expected to improve substantially for the main release ! version of 8.5. </para> <para> --- 2536,2542 ---- Three-way deadlocks are possible between AccessExclusiveLocks arriving from the primary, cleanup WAL records that require buffer cleanup locks and user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks ! are resolved by time-out when we exceed <varname>max_standby_delay</>. </para> <para> *************** *** 2630,2640 **** LOG: database system is ready to accept read only connections <varname>max_standby_delay</> or even set it to zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it may be acceptable to set this ! to a value of many hours (in seconds). It is also possible to set ! <varname>max_standby_delay</> to -1 which means wait forever for queries ! to complete, if there are conflicts; this will be useful when performing ! an archive recovery from a backup. ! </para> <para> Transaction status "hint bits" written on primary are not WAL-logged, --- 2626,2632 ---- <varname>max_standby_delay</> or even set it to zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it may be acceptable to set this ! to a value of many hours (in seconds). <para> Transaction status "hint bits" written on primary are not WAL-logged, *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *************** *** 1825,1838 **** archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows <listitem> <para> When server acts as a standby, this parameter specifies a wait policy ! for queries that conflict with incoming data changes. Valid settings ! are -1, meaning wait forever, or a wait time of 0 or more seconds. ! If a conflict should occur the server will delay up to this ! amount before it begins trying to resolve things less amicably, as described in <xref linkend="hot-standby-conflict">. Typically, this parameter makes sense only during replication, so when ! performing an archive recovery to recover from data loss a ! parameter setting of 0 is recommended. The default is 30 seconds. This parameter can only be set in the <filename>postgresql.conf</> file or on the server command line. </para> --- 1825,1839 ---- <listitem> <para> When server acts as a standby, this parameter specifies a wait policy ! for queries that conflict with data changes being replayed by recovery. ! If a conflict should occur the server will delay up to this number ! of seconds before it begins trying to resolve things less amicably, as described in <xref linkend="hot-standby-conflict">. Typically, this parameter makes sense only during replication, so when ! performing an archive recovery to recover from data loss a very high ! parameter setting is recommended. The default is 30 seconds. ! There is no wait-forever setting because of the potential for deadlock ! which that setting would introduce. This parameter can only be set in the <filename>postgresql.conf</> file or on the server command line. </para> *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 8759,8767 **** StartupProcessMain(void) */ pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ pqsignal(SIGINT, SIG_IGN); /* ignore query cancel */ ! pqsignal(SIGTERM, StartupProcShutdownHandler); /* request shutdown */ ! pqsignal(SIGQUIT, startupproc_quickdie); /* hard crash time */ ! pqsignal(SIGALRM, SIG_IGN); pqsignal(SIGPIPE, SIG_IGN); pqsignal(SIGUSR1, SIG_IGN); pqsignal(SIGUSR2, SIG_IGN); --- 8759,8770 ---- */ pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ pqsignal(SIGINT, SIG_IGN); /* ignore query cancel */ ! pqsignal(SIGTERM, StartupProcShutdownHandler); /* request shutdown */ ! pqsignal(SIGQUIT, startupproc_quickdie); /* hard crash time */ ! if (XLogRequestRecoveryConnections) ! pqsignal(SIGALRM, handle_standby_sig_alarm); /* ignored unless InHotStandby */ ! else ! pqsignal(SIGALRM, SIG_IGN); pqsignal(SIGPIPE, SIG_IGN); pqsignal(SIGUSR1, SIG_IGN); pqsignal(SIGUSR2, SIG_IGN); *** a/src/backend/storage/buffer/bufmgr.c --- b/src/backend/storage/buffer/bufmgr.c *************** *** 44,49 **** --- 44,50 ---- #include "storage/ipc.h" #include "storage/proc.h" #include "storage/smgr.h" + #include "storage/standby.h" #include "utils/rel.h" #include "utils/resowner.h" *************** *** 2417,2430 **** LockBufferForCleanup(Buffer buffer) PinCountWaitBuf = bufHdr; UnlockBufHdr(bufHdr); LockBuffer(buffer, BUFFER_LOCK_UNLOCK); /* Wait to be signaled by UnpinBuffer() */ ! ProcWaitForSignal(); PinCountWaitBuf = NULL; /* Loop back and try again */ } } /* * ConditionalLockBufferForCleanup - as above, but don't wait to get the lock * * We won't loop, but just check once to see if the pin count is OK. If --- 2418,2459 ---- PinCountWaitBuf = bufHdr; UnlockBufHdr(bufHdr); LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + /* Wait to be signaled by UnpinBuffer() */ ! if (InHotStandby) ! { ! /* Share the bufid that Startup process waits on */ ! SetStartupBufferPinWaitBufId(buffer - 1); ! /* Set alarm and then wait to be signaled by UnpinBuffer() */ ! ResolveRecoveryConflictWithBufferPin(); ! SetStartupBufferPinWaitBufId(-1); ! } ! else ! ProcWaitForSignal(); ! PinCountWaitBuf = NULL; /* Loop back and try again */ } } /* + * Check called from RecoveryConflictInterrupt handler when Startup + * process requests cancelation of all pin holders that are blocking it. + */ + bool + HoldingBufferPinThatDelaysRecovery(void) + { + int bufid = GetStartupBufferPinWaitBufId(); + + Assert(bufid >= 0); + + if (PrivateRefCount[bufid] > 0) + return true; + + return false; + } + + /* * ConditionalLockBufferForCleanup - as above, but don't wait to get the lock * * We won't loop, but just check once to see if the pin count is OK. If *** a/src/backend/storage/ipc/procarray.c --- b/src/backend/storage/ipc/procarray.c *************** *** 1620,1634 **** GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0, * latestCompletedXid since doing so would be a performance issue during * normal running, so we check it essentially for free on the standby. * ! * If dbOid is valid we skip backends attached to other databases. Some ! * callers choose to skipExistingConflicts. * * Be careful to *not* pfree the result from this function. We reuse * this array sufficiently often that we use malloc for the result. */ VirtualTransactionId * ! GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid, ! bool skipExistingConflicts) { static VirtualTransactionId *vxids; ProcArrayStruct *arrayP = procArray; --- 1620,1632 ---- * latestCompletedXid since doing so would be a performance issue during * normal running, so we check it essentially for free on the standby. * ! * If dbOid is valid we skip backends attached to other databases. * * Be careful to *not* pfree the result from this function. We reuse * this array sufficiently often that we use malloc for the result. */ VirtualTransactionId * ! GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid) { static VirtualTransactionId *vxids; ProcArrayStruct *arrayP = procArray; *************** *** 1667,1675 **** GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid, if (proc->pid == 0) continue; - if (skipExistingConflicts && proc->recoveryConflictPending) - continue; - if (!OidIsValid(dbOid) || proc->databaseId == dbOid) { --- 1665,1670 ---- *************** *** 1826,1832 **** CountDBBackends(Oid databaseid) * CancelDBBackends --- cancel backends that are using specified database */ void ! CancelDBBackends(Oid databaseid) { ProcArrayStruct *arrayP = procArray; int index; --- 1821,1827 ---- * CancelDBBackends --- cancel backends that are using specified database */ void ! CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending) { ProcArrayStruct *arrayP = procArray; int index; *************** *** 1839,1851 **** CancelDBBackends(Oid databaseid) { volatile PGPROC *proc = arrayP->procs[index]; ! if (proc->databaseId == databaseid) { VirtualTransactionId procvxid; GET_VXID_FROM_PGPROC(procvxid, *proc); ! proc->recoveryConflictPending = true; pid = proc->pid; if (pid != 0) { --- 1834,1846 ---- { volatile PGPROC *proc = arrayP->procs[index]; ! if (databaseid == InvalidOid || proc->databaseId == databaseid) { VirtualTransactionId procvxid; GET_VXID_FROM_PGPROC(procvxid, *proc); ! proc->recoveryConflictPending = conflictPending; pid = proc->pid; if (pid != 0) { *************** *** 1853,1860 **** CancelDBBackends(Oid databaseid) * Kill the pid if it's still here. If not, that's what we wanted * so ignore any errors. */ ! (void) SendProcSignal(pid, PROCSIG_RECOVERY_CONFLICT_DATABASE, ! procvxid.backendId); } } } --- 1848,1854 ---- * Kill the pid if it's still here. If not, that's what we wanted * so ignore any errors. */ ! (void) SendProcSignal(pid, sigmode, procvxid.backendId); } } } *** a/src/backend/storage/ipc/procsignal.c --- b/src/backend/storage/ipc/procsignal.c *************** *** 272,276 **** procsignal_sigusr1_handler(SIGNAL_ARGS) --- 272,279 ---- if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT)) RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT); + if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN)) + RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN); + errno = save_errno; } *** a/src/backend/storage/ipc/standby.c --- b/src/backend/storage/ipc/standby.c *************** *** 126,135 **** WaitExceedsMaxStandbyDelay(void) long delay_secs; int delay_usecs; - /* max_standby_delay = -1 means wait forever, if necessary */ - if (MaxStandbyDelay < 0) - return false; - /* Are we past max_standby_delay? */ TimestampDifference(GetLatestXLogTime(), GetCurrentTimestamp(), &delay_secs, &delay_usecs); --- 126,131 ---- *************** *** 241,248 **** ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid) VirtualTransactionId *backends; backends = GetConflictingVirtualXIDs(latestRemovedXid, ! InvalidOid, ! true); ResolveRecoveryConflictWithVirtualXIDs(backends, PROCSIG_RECOVERY_CONFLICT_SNAPSHOT); --- 237,243 ---- VirtualTransactionId *backends; backends = GetConflictingVirtualXIDs(latestRemovedXid, ! InvalidOid); ResolveRecoveryConflictWithVirtualXIDs(backends, PROCSIG_RECOVERY_CONFLICT_SNAPSHOT); *************** *** 273,280 **** ResolveRecoveryConflictWithTablespace(Oid tsid) * non-transactional. */ temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId, ! InvalidOid, ! false); ResolveRecoveryConflictWithVirtualXIDs(temp_file_users, PROCSIG_RECOVERY_CONFLICT_TABLESPACE); } --- 268,274 ---- * non-transactional. */ temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId, ! InvalidOid); ResolveRecoveryConflictWithVirtualXIDs(temp_file_users, PROCSIG_RECOVERY_CONFLICT_TABLESPACE); } *************** *** 295,301 **** ResolveRecoveryConflictWithDatabase(Oid dbid) */ while (CountDBBackends(dbid) > 0) { ! CancelDBBackends(dbid); /* * Wait awhile for them to die so that we avoid flooding an --- 289,295 ---- */ while (CountDBBackends(dbid) > 0) { ! CancelDBBackends(dbid, PROCSIG_RECOVERY_CONFLICT_TABLESPACE, true); /* * Wait awhile for them to die so that we avoid flooding an *************** *** 331,338 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid) else { backends = GetConflictingVirtualXIDs(InvalidTransactionId, ! InvalidOid, ! true); report_memory_error = true; } --- 325,331 ---- else { backends = GetConflictingVirtualXIDs(InvalidTransactionId, ! InvalidOid); report_memory_error = true; } *************** *** 346,351 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid) --- 339,451 ---- } /* + * ResolveRecoveryConflictWithBufferPin is called from LockBufferForCleanup() + * to resolve conflicts with other backends holding buffer pins. + * + * We either resolve conflicts immediately or set a SIGALRM to wake us at + * the limit of our patience. The sleep in LockBufferForCleanup() is + * performed here, for code clarity. + * + * Resolve conflict by sending a SIGUSR1 reason to all backends to check if + * they hold one of the buffer pins that is blocking Startup process. If so, + * backends will take an appropriate error action, ERROR or FATAL. + * + * A secondary purpose of this is to avoid deadlocks that might occur between + * the Startup process and lock waiters. Deadlocks occur because if queries + * wait on a lock, that must be behind an AccessExclusiveLock, which can only + * be clared if the Startup process replays a transaction completion record. + * If Startup process is waiting then that is a deadlock. If we allowed a + * setting of max_standby_delay that meant "wait forever" we would then need + * special code to protect against deadlock. Such deadlocks are rare, so the + * code would be almost certainly buggy, so we avoid both long waits and + * deadlocks using the same mechanism. + */ + void + ResolveRecoveryConflictWithBufferPin(void) + { + bool sig_alarm_enabled = false; + + Assert(InHotStandby); + + /* + * Signal immediately or set alarm for later. + */ + if (MaxStandbyDelay == 0) + SendRecoveryConflictWithBufferPin(); + else + { + TimestampTz now; + long standby_delay_secs; /* How far Startup process is lagging */ + int standby_delay_usecs; + + now = GetCurrentTimestamp(); + + /* Are we past max_standby_delay? */ + TimestampDifference(GetLatestXLogTime(), now, + &standby_delay_secs, &standby_delay_usecs); + + if (standby_delay_secs >= (long) MaxStandbyDelay) + SendRecoveryConflictWithBufferPin(); + else + { + TimestampTz fin_time; /* Expected wake-up time by timer */ + long timer_delay_secs; /* Amount of time we set timer for */ + int timer_delay_usecs = 0; + + /* + * How much longer we should wait? + */ + timer_delay_secs = MaxStandbyDelay - standby_delay_secs; + if (standby_delay_usecs > 0) + { + timer_delay_secs -= 1; + timer_delay_usecs = 1000000 - standby_delay_usecs; + } + + /* + * It's possible that the difference is less than a microsecond; + * ensure we don't cancel, rather than set, the interrupt. + */ + if (timer_delay_secs == 0 && timer_delay_usecs == 0) + timer_delay_usecs = 1; + + /* + * When is the finish time? We recheck this if we are woken early. + */ + fin_time = TimestampTzPlusMilliseconds(now, + (timer_delay_secs * 1000) + + (timer_delay_usecs / 1000)); + + if (enable_standby_sig_alarm(timer_delay_secs, timer_delay_usecs, fin_time)) + sig_alarm_enabled = true; + else + elog(FATAL, "could not set timer for process wakeup"); + } + } + + /* Wait to be signaled by UnpinBuffer() */ + ProcWaitForSignal(); + + if (sig_alarm_enabled) + { + if (!disable_standby_sig_alarm()) + elog(FATAL, "could not disable timer for process wakeup"); + } + } + + void + SendRecoveryConflictWithBufferPin(void) + { + /* + * We send signal to all backends to ask them if they are holding + * the buffer pin which is delaying the Startup process. We must + * not set the conflict flag yet, since most backends will be innocent. + * Let the SIGUSR1 handling in each backend decide their own fate. + */ + CancelDBBackends(InvalidOid, PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, false); + } + + /* * ----------------------------------------------------- * Locking in Recovery Mode * ----------------------------------------------------- *** a/src/backend/storage/lmgr/lock.c --- b/src/backend/storage/lmgr/lock.c *************** *** 815,839 **** LockAcquireExtended(const LOCKTAG *locktag, } /* - * In Hot Standby we abort the lock wait if Startup process is waiting - * since this would result in a deadlock. The deadlock occurs because - * if we are waiting it must be behind an AccessExclusiveLock, which - * can only clear when a transaction completion record is replayed. - * If Startup process is waiting we never will clear that lock, so to - * wait for it just causes a deadlock. - */ - if (RecoveryInProgress() && !InRecovery && - locktag->locktag_type == LOCKTAG_RELATION) - { - LWLockRelease(partitionLock); - ereport(ERROR, - (errcode(ERRCODE_T_R_DEADLOCK_DETECTED), - errmsg("possible deadlock detected"), - errdetail("process conflicts with recovery - please resubmit query later"), - errdetail_log("process conflicts with recovery"))); - } - - /* * Set bitmask of locks this process already holds on this object. */ MyProc->heldLocks = proclock->holdMask; --- 815,820 ---- *** a/src/backend/storage/lmgr/proc.c --- b/src/backend/storage/lmgr/proc.c *************** *** 73,78 **** NON_EXEC_STATIC PGPROC *AuxiliaryProcs = NULL; --- 73,79 ---- static LOCALLOCK *lockAwaited = NULL; /* Mark these volatile because they can be changed by signal handler */ + static volatile bool standby_timeout_active = false; static volatile bool statement_timeout_active = false; static volatile bool deadlock_timeout_active = false; static volatile DeadLockState deadlock_state = DS_NOT_YET_CHECKED; *************** *** 89,94 **** static void RemoveProcFromArray(int code, Datum arg); --- 90,96 ---- static void ProcKill(int code, Datum arg); static void AuxiliaryProcKill(int code, Datum arg); static bool CheckStatementTimeout(void); + static bool CheckStandbyTimeout(void); /* *************** *** 107,112 **** ProcGlobalShmemSize(void) --- 109,116 ---- size = add_size(size, mul_size(MaxBackends, sizeof(PGPROC))); /* ProcStructLock */ size = add_size(size, sizeof(slock_t)); + /* startupBufferPinWaitBufId */ + size = add_size(size, sizeof(NBuffers)); return size; } *************** *** 487,497 **** PublishStartupProcessInformation(void) --- 491,540 ---- procglobal->startupProc = MyProc; procglobal->startupProcPid = MyProcPid; + procglobal->startupBufferPinWaitBufId = 0; SpinLockRelease(ProcStructLock); } /* + * Used from bufgr to share the value of the buffer that Startup waits on, + * or to reset the value to "not waiting" (-1). This allows processing + * of recovery conflicts for buffer pins. + */ + void + SetStartupBufferPinWaitBufId(int bufid) + { + /* use volatile pointer to prevent code rearrangement */ + volatile PROC_HDR *procglobal = ProcGlobal; + + SpinLockAcquire(ProcStructLock); + + procglobal->startupBufferPinWaitBufId = bufid; + + SpinLockRelease(ProcStructLock); + } + + /* + * Used by backends when they receive a request to check for buffer pin waits. + */ + int + GetStartupBufferPinWaitBufId(void) + { + int bufid; + + /* use volatile pointer to prevent code rearrangement */ + volatile PROC_HDR *procglobal = ProcGlobal; + + SpinLockAcquire(ProcStructLock); + + bufid = procglobal->startupBufferPinWaitBufId; + + SpinLockRelease(ProcStructLock); + + return bufid; + } + + /* * Check whether there are at least N free PGPROC objects. * * Note: this is designed on the assumption that N will generally be small. *************** *** 1542,1548 **** CheckStatementTimeout(void) /* ! * Signal handler for SIGALRM * * Process deadlock check and/or statement timeout check, as needed. * To avoid various edge cases, we must be careful to do nothing --- 1585,1591 ---- /* ! * Signal handler for SIGALRM for normal user backends * * Process deadlock check and/or statement timeout check, as needed. * To avoid various edge cases, we must be careful to do nothing *************** *** 1565,1567 **** handle_sig_alarm(SIGNAL_ARGS) --- 1608,1714 ---- errno = save_errno; } + + /* + * Signal handler for SIGALRM in Startup process + * + * To avoid various edge cases, we must be careful to do nothing + * when there is nothing to be done. We also need to be able to + * reschedule the timer interrupt if called before end of statement. + */ + bool + enable_standby_sig_alarm(long delay_s, int delay_us, TimestampTz fin_time) + { + struct itimerval timeval; + + Assert(delay_s >= 0 && delay_us >= 0); + + statement_fin_time = fin_time; + + standby_timeout_active = true; + + MemSet(&timeval, 0, sizeof(struct itimerval)); + timeval.it_value.tv_sec = delay_s; + timeval.it_value.tv_usec = delay_us; + if (setitimer(ITIMER_REAL, &timeval, NULL)) + return false; + return true; + } + + bool + disable_standby_sig_alarm(void) + { + /* + * Always disable the interrupt if it is active; this avoids being + * interrupted by the signal handler and thereby possibly getting + * confused. + * + * We will re-enable the interrupt if necessary in CheckStandbyTimeout. + */ + if (standby_timeout_active) + { + struct itimerval timeval; + + MemSet(&timeval, 0, sizeof(struct itimerval)); + if (setitimer(ITIMER_REAL, &timeval, NULL)) + { + standby_timeout_active = false; + return false; + } + } + + return true; + } + + /* + * CheckStandbyTimeout() runs unconditionally in the Startup process + * SIGALRM handler. Timers will only be set when InHotStandby. + * We simply ignore any signals unless the timer has been set. + */ + static bool + CheckStandbyTimeout(void) + { + TimestampTz now; + + if (!standby_timeout_active) + return true; /* do nothing if not active */ + + now = GetCurrentTimestamp(); + + if (now >= statement_fin_time) + SendRecoveryConflictWithBufferPin(); + else + { + /* Not time yet, so (re)schedule the interrupt */ + long secs; + int usecs; + struct itimerval timeval; + + TimestampDifference(now, statement_fin_time, + &secs, &usecs); + + /* + * It's possible that the difference is less than a microsecond; + * ensure we don't cancel, rather than set, the interrupt. + */ + if (secs == 0 && usecs == 0) + usecs = 1; + MemSet(&timeval, 0, sizeof(struct itimerval)); + timeval.it_value.tv_sec = secs; + timeval.it_value.tv_usec = usecs; + if (setitimer(ITIMER_REAL, &timeval, NULL)) + return false; + } + + return true; + } + + void + handle_standby_sig_alarm(SIGNAL_ARGS) + { + int save_errno = errno; + + (void) CheckStandbyTimeout(); + + errno = save_errno; + } *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** *** 2718,2723 **** RecoveryConflictInterrupt(ProcSignalReason reason) --- 2718,2735 ---- { switch (reason) { + case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN: + /* + * If we aren't blocking the Startup process there is + * nothing more to do. + */ + if (!HoldingBufferPinThatDelaysRecovery()) + return; + + MyProc->recoveryConflictPending = true; + + /* Intentional drop through to error handling */ + case PROCSIG_RECOVERY_CONFLICT_LOCK: case PROCSIG_RECOVERY_CONFLICT_TABLESPACE: case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT: *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *************** *** 1383,1389 **** static struct config_int ConfigureNamesInt[] = NULL }, &MaxStandbyDelay, ! 30, -1, INT_MAX, NULL, NULL }, { --- 1383,1389 ---- NULL }, &MaxStandbyDelay, ! 30, 0, INT_MAX, NULL, NULL }, { *** a/src/include/storage/bufmgr.h --- b/src/include/storage/bufmgr.h *************** *** 198,203 **** extern void LockBuffer(Buffer buffer, int mode); --- 198,204 ---- extern bool ConditionalLockBuffer(Buffer buffer); extern void LockBufferForCleanup(Buffer buffer); extern bool ConditionalLockBufferForCleanup(Buffer buffer); + extern bool HoldingBufferPinThatDelaysRecovery(void); extern void AbortBufferIO(void); *** a/src/include/storage/proc.h --- b/src/include/storage/proc.h *************** *** 16,22 **** #include "storage/lock.h" #include "storage/pg_sema.h" ! /* * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds --- 16,22 ---- #include "storage/lock.h" #include "storage/pg_sema.h" ! #include "utils/timestamp.h" /* * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds *************** *** 145,150 **** typedef struct PROC_HDR --- 145,152 ---- /* The proc of the Startup process, since not in ProcArray */ PGPROC *startupProc; int startupProcPid; + /* Buffer id of the buffer that Startup process waits for pin on */ + int startupBufferPinWaitBufId; } PROC_HDR; /* *************** *** 177,182 **** extern void InitProcessPhase2(void); --- 179,186 ---- extern void InitAuxiliaryProcess(void); extern void PublishStartupProcessInformation(void); + extern void SetStartupBufferPinWaitBufId(int bufid); + extern int GetStartupBufferPinWaitBufId(void); extern bool HaveNFreeProcs(int n); extern void ProcReleaseLocks(bool isCommit); *************** *** 194,197 **** extern bool enable_sig_alarm(int delayms, bool is_statement_timeout); --- 198,205 ---- extern bool disable_sig_alarm(bool is_statement_timeout); extern void handle_sig_alarm(SIGNAL_ARGS); + extern bool enable_standby_sig_alarm(long delay_s, int delay_us, TimestampTz fin_time); + extern bool disable_standby_sig_alarm(void); + extern void handle_standby_sig_alarm(SIGNAL_ARGS); + #endif /* PROC_H */ *** a/src/include/storage/procarray.h --- b/src/include/storage/procarray.h *************** *** 57,69 **** extern bool IsBackendPid(int pid); extern VirtualTransactionId *GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0, bool allDbs, int excludeVacuum, int *nvxids); ! extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin, ! Oid dbOid, bool skipExistingConflicts); extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode); extern int CountActiveBackends(void); extern int CountDBBackends(Oid databaseid); ! extern void CancelDBBackends(Oid databaseid); extern int CountUserBackends(Oid roleid); extern bool CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared); --- 57,68 ---- extern VirtualTransactionId *GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0, bool allDbs, int excludeVacuum, int *nvxids); ! extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid); extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode); extern int CountActiveBackends(void); extern int CountDBBackends(Oid databaseid); ! extern void CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending); extern int CountUserBackends(Oid roleid); extern bool CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared); *** a/src/include/storage/procsignal.h --- b/src/include/storage/procsignal.h *************** *** 37,42 **** typedef enum --- 37,43 ---- PROCSIG_RECOVERY_CONFLICT_TABLESPACE, PROCSIG_RECOVERY_CONFLICT_LOCK, PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, + PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, NUM_PROCSIGNALS /* Must be last! */ } ProcSignalReason; *** a/src/include/storage/standby.h --- b/src/include/storage/standby.h *************** *** 19,30 **** extern int vacuum_defer_cleanup_age; extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid); extern void ResolveRecoveryConflictWithTablespace(Oid tsid); extern void ResolveRecoveryConflictWithDatabase(Oid dbid); ! extern void InitRecoveryTransactionEnvironment(void); ! extern void ShutdownRecoveryTransactionEnvironment(void); /* * Standby Rmgr (RM_STANDBY_ID) --- 19,33 ---- extern int vacuum_defer_cleanup_age; + extern void InitRecoveryTransactionEnvironment(void); + extern void ShutdownRecoveryTransactionEnvironment(void); + extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid); extern void ResolveRecoveryConflictWithTablespace(Oid tsid); extern void ResolveRecoveryConflictWithDatabase(Oid dbid); ! extern void ResolveRecoveryConflictWithBufferPin(void); ! extern void SendRecoveryConflictWithBufferPin(void); /* * Standby Rmgr (RM_STANDBY_ID)
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers