Re: [HACKERS] An example of bugs for Hot Standby
Deadlock bug was prevented by stop-gap measure in December commit. Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Reviews welcome, otherwise commit at end of week. I think the patch has two problems. * disable_standby_sig_alarm() does not clear standby_timeout_active flag when it succeeds in disabling the alarm. * Assertion check in HoldingBufferPinThatDelaysRecovery() can fail with following scenario. 1. Two transactions, xact A and xact B, are running in a HotStandby server. 2. Xact A holds a pin on buffer X. 3. Startup process calls LockBufferForCleanup() for buffer X, sets ProcGlobal-startupBufferPinWaitBufId = X, sends PROCSIG_RECOVERY_CONFLICT_BUFFERPIN signal to both transactions, and sleeps. 4. Xact A handles the signal, aborts itself, releases the pin on buffer X, and awake startup process. 5. Startup process wakes up and sets ProcGlobal-startupBufferPinWaitBufId = -1. 6. Xact B handles the signal, checks ProcGlobal-startupBufferPinWaitBufId, and fails in the assertion check in HoldingBufferPinThatDelaysRecovery(). regards, -- Hiroyuki YAMADA Kokolink Corporation yam...@kokolink.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Thu, 2010-01-21 at 18:01 +0900, Hiroyuki Yamada wrote: I think the patch has two problems. * disable_standby_sig_alarm() does not clear standby_timeout_active flag when it succeeds in disabling the alarm. Ah, thanks. * Assertion check in HoldingBufferPinThatDelaysRecovery() can fail with following scenario. Agreed. Will remove Assert() because of race conditions. Thanks, -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Thu, 2010-01-21 at 18:01 +0900, Hiroyuki Yamada wrote: Deadlock bug was prevented by stop-gap measure in December commit. Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Reviews welcome, otherwise commit at end of week. I think the patch has two problems. * disable_standby_sig_alarm() does not clear standby_timeout_active flag when it succeeds in disabling the alarm. * Assertion check in HoldingBufferPinThatDelaysRecovery() can fail with following scenario. Updated patch, including changes from Andres and Hiroyuki. -- Simon Riggs www.2ndQuadrant.com *** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *** *** 2399,2405 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' /listitem listitem para ! Waiting to acquire buffer cleanup locks (for which there is no time out) /para /listitem listitem --- 2399,2405 /listitem listitem para ! Waiting to acquire buffer cleanup locks /para /listitem listitem *** *** 2536,2546 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' Three-way deadlocks are possible between AccessExclusiveLocks arriving from the primary, cleanup WAL records that require buffer cleanup locks and user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks ! are currently resolved by the cancellation of user processes that would ! need to wait on a lock. This is heavy-handed and generates more query ! cancellations than we need to, though does remove the possibility of deadlock. ! This behaviour is expected to improve substantially for the main release ! version of 8.5. /para para --- 2536,2542 Three-way deadlocks are possible between AccessExclusiveLocks arriving from the primary, cleanup WAL records that require buffer cleanup locks and user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks ! are resolved by time-out when we exceed varnamemax_standby_delay/. /para para *** *** 2630,2640 LOG: database system is ready to accept read only connections varnamemax_standby_delay/ or even set it to zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it may be acceptable to set this ! to a value of many hours (in seconds). It is also possible to set ! varnamemax_standby_delay/ to -1 which means wait forever for queries ! to complete, if there are conflicts; this will be useful when performing ! an archive recovery from a backup. !/para para Transaction status hint bits written on primary are not WAL-logged, --- 2626,2632 varnamemax_standby_delay/ or even set it to zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it may be acceptable to set this ! to a value of many hours (in seconds). para Transaction status hint bits written on primary are not WAL-logged, *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 1825,1838 archive_command = 'copy %p C:\\server\\archivedir\\%f' # Windows listitem para When server acts as a standby, this parameter specifies a wait policy ! for queries that conflict with incoming data changes. Valid settings ! are -1, meaning wait forever, or a wait time of 0 or more seconds. ! If a conflict should occur the server will delay up to this ! amount before it begins trying to resolve things less amicably, as described in xref linkend=hot-standby-conflict. Typically, this parameter makes sense only during replication, so when ! performing an archive recovery to recover from data loss a ! parameter setting of 0 is recommended. The default is 30 seconds. This parameter can only be set in the filenamepostgresql.conf/ file or on the server command line. /para --- 1825,1839 listitem para When server acts as a standby, this parameter specifies a wait policy ! for queries that conflict with data changes being replayed by recovery. ! If a conflict should occur the server will delay up to this number ! of seconds before it begins trying to
Re: [HACKERS] An example of bugs for Hot Standby
On Wednesday 20 January 2010 17:59:36 Tom Lane wrote: Andres Freund and...@anarazel.de writes: I realize its way too late in the cycle for that, but why dont we start using some library for easy cross platform atomic ops? (1) there probably isn't one that does exactly what we want, works everywhere, and has the right license; (2) what actual gain would we get? We've already done the work. Another thing would be to have a simple rmb() wmb() instruction... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wednesday 20 January 2010 06:30:28 Tom Lane wrote: Andres Freund and...@anarazel.de writes: Is there any supported platform with sizeof(sig_atomic_t) 4 - I would doubt so? Er ... what? I believe there are live platforms with sig_atomic_t = char. If we're assuming more that's a must-fix. So were assuming genereally that a integer cannot be read/written atomatically? Or was that only relating to the actualy signal.h typedef? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wed, 2010-01-20 at 06:14 +0100, Andres Freund wrote: Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Wouldnt it be more foolproof to also loop around sending the FATAL? Not that its likely but... More foolproof and much less accurate. The Startup process doesn't know who is holding the buffer pin that blocks it, so it could not target a FATAL. From HoldingBufferPinThatDelaysRecovery youre calling GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that one is acquiring a spinlock which can also get taken at other places. Its not the most likely scenario, but it would certainly be annoying to debug. Spinlock. It isn't held for long in any situation. What problem do you foresee? Is there any supported platform with sizeof(sig_atomic_t) 4 - I would doubt so? If not the locking in GetStartupBufferPinWaitBufId and SetStartupBufferPinWaitBufId shouldnt be needed? I prefer spinlocking. Same issue issue (and more likely to trigger) exists with CheckStandbyTimeout- SendRecoveryConflictWithBufferPin-CancelDBBackends I don't see an issue. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wednesday 20 January 2010 06:30:28 Tom Lane wrote: Andres Freund and...@anarazel.de writes: Is there any supported platform with sizeof(sig_atomic_t) 4 - I would doubt so? Er ... what? I believe there are live platforms with sig_atomic_t = char. If we're assuming more that's a must-fix. The reason I have asked is that the code is doing things like: /* * Used by backends when they receive a request to check for buffer pin waits. */ int GetStartupBufferPinWaitBufId(void) { int bufid; /* use volatile pointer to prevent code rearrangement */ volatile PROC_HDR *procglobal = ProcGlobal; SpinLockAcquire(ProcStructLock); bufid = procglobal-startupBufferPinWaitBufId; SpinLockRelease(ProcStructLock); return bufid; } or similar things with LWLockAcquire in a signal handler which strikes me as a not that good idea. As at least on x86 reading an integer is atomic the above spinlock is pointless. My cross arch experience is barely existing, so... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wednesday 20 January 2010 10:40:10 Simon Riggs wrote: On Wed, 2010-01-20 at 06:14 +0100, Andres Freund wrote: Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Wouldnt it be more foolproof to also loop around sending the FATAL? Not that its likely but... More foolproof and much less accurate. The Startup process doesn't know who is holding the buffer pin that blocks it, so it could not target a FATAL. From HoldingBufferPinThatDelaysRecovery youre calling GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that one is acquiring a spinlock which can also get taken at other places. Its not the most likely scenario, but it would certainly be annoying to debug. Spinlock. It isn't held for long in any situation. What problem do you foresee? If any backend is signalled while currently holding the ProcStructLock there is a basically unrecoverable deadlock - its not likely but possible. Is there any supported platform with sizeof(sig_atomic_t) 4 - I would doubt so? If not the locking in GetStartupBufferPinWaitBufId and SetStartupBufferPinWaitBufId shouldnt be needed? I prefer spinlocking. Well, its deadlock land taking the same lock inside and outside of a signal handler... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote: LWLockAcquire I'm using spinlocks, not lwlocks. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wednesday 20 January 2010 10:52:24 Simon Riggs wrote: On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote: LWLockAcquire I'm using spinlocks, not lwlocks. CancelDBBackends which is used in SendRecoveryConflictWithBufferPin which in turn used by CheckStandbyTimeout triggered by SIGALRM acquires the lwlock. Now that case is a bit less dangerous because you would have to interrupt yourself to trigger a deadlock there because the code sleeps soon after setting up the handler. If ever two SIGALRM occur consecutive there is a problem. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wed, 2010-01-20 at 11:04 +0100, Andres Freund wrote: On Wednesday 20 January 2010 10:52:24 Simon Riggs wrote: On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote: LWLockAcquire I'm using spinlocks, not lwlocks. CancelDBBackends which is used in SendRecoveryConflictWithBufferPin which in turn used by CheckStandbyTimeout triggered by SIGALRM acquires the lwlock. Those are used in similar ways to deadlock detection. Now that case is a bit less dangerous because you would have to interrupt yourself to trigger a deadlock there because the code sleeps soon after setting up the handler. If ever two SIGALRM occur consecutive there is a problem. I'll protect against subsequent calls. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wednesday 20 January 2010 11:33:05 Simon Riggs wrote: On Wed, 2010-01-20 at 11:04 +0100, Andres Freund wrote: On Wednesday 20 January 2010 10:52:24 Simon Riggs wrote: On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote: LWLockAcquire I'm using spinlocks, not lwlocks. CancelDBBackends which is used in SendRecoveryConflictWithBufferPin which in turn used by CheckStandbyTimeout triggered by SIGALRM acquires the lwlock. Those are used in similar ways to deadlock detection. But only if ImmediateInterruptOK InterruptHoldoffCount == 0 CritSectionCount == 0 - which is not the case with HoldingBufferPinThatDelaysRecovery. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
Andres Freund and...@anarazel.de writes: On Wednesday 20 January 2010 06:30:28 Tom Lane wrote: Er ... what? I believe there are live platforms with sig_atomic_t = char. If we're assuming more that's a must-fix. The reason I have asked is that the code is doing things like: [ grabbing a spinlock to read a single integer ] Yes, I think we probably actually need that. The problem is not so much whether the read is an atomic operation as whether you can rely on getting an up-to-date value. On multiprocessors with weak memory ordering you need some type of sync instruction to be sure you will see a value that was recently written by another processor. Currently, we embed such instructions in the spinlock acquire/release code. There's been some discussion of exposing memory sync independently of lock acquisition; perhaps that would be enough here, but I haven't looked at the surrounding logic enough to say. My complaint at the top was responding to the idea that someone might be supposing the specific type sig_atomic_t was at least as wide as int. That's a different matter altogether. We do assume in some places that we can read or write the specific type TransactionId indivisibly, but we don't try to declare it as sig_atomic_t. or similar things with LWLockAcquire in a signal handler [ grows visibly pale ] *Please* tell me we are not trying to take locks in a signal handler. What happens if it interrupts code that is already holding that lock? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wednesday 20 January 2010 17:30:04 Tom Lane wrote: Andres Freund and...@anarazel.de writes: On Wednesday 20 January 2010 06:30:28 Tom Lane wrote: Er ... what? I believe there are live platforms with sig_atomic_t = char. If we're assuming more that's a must-fix. The reason I have asked is that the code is doing things like: [ grabbing a spinlock to read a single integer ] Yes, I think we probably actually need that. The problem is not so much whether the read is an atomic operation as whether you can rely on getting an up-to-date value. On multiprocessors with weak memory ordering you need some type of sync instruction to be sure you will see a value that was recently written by another processor. Currently, we embed such instructions in the spinlock acquire/release code. There's been some discussion of exposing memory sync independently of lock acquisition; perhaps that would be enough here, but I haven't looked at the surrounding logic enough to say. I think it should be enough. I realize its way too late in the cycle for that, but why dont we start using some library for easy cross platform atomic ops? I think libatomic or such should support the required platforms. My complaint at the top was responding to the idea that someone might be supposing the specific type sig_atomic_t was at least as wide as int. That's a different matter altogether. We do assume in some places that we can read or write the specific type TransactionId indivisibly, but we don't try to declare it as sig_atomic_t. So we already assume that? Fine. (yes, the sig_atomic_t was a sidetrack - I had memorized it wrongly as the biggest value that can be read/written atomically which is *clearly* wrong) or similar things with LWLockAcquire in a signal handler [ grows visibly pale ] *Please* tell me we are not trying to take locks in a signal handler. What happens if it interrupts code that is already holding that lock? Yes the patch does that at two places. Thats what I was complaining about and what triggered my sig_atomic_t question because of the above explained misunderstanding. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
Andres Freund and...@anarazel.de writes: I realize its way too late in the cycle for that, but why dont we start using some library for easy cross platform atomic ops? (1) there probably isn't one that does exactly what we want, works everywhere, and has the right license; (2) what actual gain would we get? We've already done the work. [ grows visibly pale ] *Please* tell me we are not trying to take locks in a signal handler. What happens if it interrupts code that is already holding that lock? Yes the patch does that at two places. That's a must-fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
Hi Tom, Hi Simon, On Wednesday 20 January 2010 17:59:36 Tom Lane wrote: Andres Freund and...@anarazel.de writes: I realize its way too late in the cycle for that, but why dont we start using some library for easy cross platform atomic ops? (1) there probably isn't one that does exactly what we want, works everywhere, and has the right license; (2) what actual gain would we get? We've already done the work. That there might be some other instructions were interested in? Like really atomic increment? [ grows visibly pale ] *Please* tell me we are not trying to take locks in a signal handler. What happens if it interrupts code that is already holding that lock? Yes the patch does that at two places. That's a must-fix. Its code intended to fix a existing problem not already comitted code. But otherwise I definitely agree. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
Andres Freund wrote: On Wednesday 20 January 2010 17:59:36 Tom Lane wrote: Andres Freund and...@anarazel.de writes: I realize its way too late in the cycle for that, but why dont we start using some library for easy cross platform atomic ops? (1) there probably isn't one that does exactly what we want, works everywhere, and has the right license; (2) what actual gain would we get? We've already done the work. That there might be some other instructions were interested in? Like really atomic increment? This reminds me of something I've been pondering for some time: Streaming Replication introduces a few places with a polling pattern like this (in pseudocode): while() { /* Check if variable in shared has advanced beoynd X */ SpinLockAcquire() localvar = sharedvar; SpinLockRelease() if (localvar X) break; /* Not yet. Sleep pg_usleep(100); } For example, startup process polls like that to wait for walreceiver to write flush new WAL to be replayed. And in master, walsender polls like that for new WAL to be generated, so that it can be sent to standby. Hot standby also has a polling loop where it waits for a transaction a transaction to die, though I'm not sure if that can be fit into the same model. That's OK for asynchronous replication, but unacceptable for synchronous mode. It would be nice to have a new synchronization primitive for that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Streaming Replication introduces a few places with a polling pattern like this (in pseudocode): while() { /* Check if variable in shared has advanced beoynd X */ SpinLockAcquire() localvar = sharedvar; SpinLockRelease() if (localvar X) break; /* Not yet. Sleep pg_usleep(100); } I trust there's a CHECK_FOR_INTERRUPTS in there ... It would be nice to have a new synchronization primitive for that. Maybe. The lock, the variable, the comparison operation, and the sleep time all seem rather specific to each application. Not sure that it'd really buy much to try to turn it into a generic subroutine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
On Wed, 2010-01-20 at 20:00 +0200, Heikki Linnakangas wrote: Hot standby also has a polling loop where it waits for a transaction a transaction to die, though I'm not sure if that can be fit into the same model I prefer that in the context of HS because the Startup process is waiting for things to die. Given that their death may not be handled sweetly, I would not wish to rely on that to wake Startup. In the other two cases you mention all processes are working together normally and we aren't expecting the other processes to die. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wed, 2010-01-20 at 17:40 +0100, Andres Freund wrote: or similar things with LWLockAcquire in a signal handler [ grows visibly pale ] *Please* tell me we are not trying to take locks in a signal handler. What happens if it interrupts code that is already holding that lock? Yes the patch does that at two places. I think it would be more sensible to discuss specific code and issues, rather than have general discussions about various horrors. You've already pointed out that I need to prevent multiple sigalrm interrupts using boolean flags; I've already said that I would do that. The use of locks themselves are clearly not a problem, since the existing sigalrm handler takes LWlocks for deadlock detection. The problem is just about being called multiple times. The code in HoldingBufferPinThatDelaysRecovery() also needs protection against being interrupted multiple times, but we should note that a second signal of that type is not going to arrive from anywhere inside the server and requires an explicit user action. The locking isn't strictly necessary since the value is only read when the only process that ever writes that value is sleeping on a semaphore. The single integer value can always be read atomically anyway. So I will remove the locking in XXXStartupBufferPinWaitBufId(), add in the booleans and we're done. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Streaming Replication introduces a few places with a polling pattern like this (in pseudocode): while() { /* Check if variable in shared has advanced beoynd X */ SpinLockAcquire() localvar = sharedvar; SpinLockRelease() if (localvar X) break; /* Not yet. Sleep pg_usleep(100); } I trust there's a CHECK_FOR_INTERRUPTS in there ... It would be nice to have a new synchronization primitive for that. Maybe. The lock, the variable, the comparison operation, and the sleep time all seem rather specific to each application. Not sure that it'd really buy much to try to turn it into a generic subroutine. My point is that we should replace such polling loops with something non-polling, using wait/signal or semaphores or something. That gets quite a bit more complex. You'd probably still have the loop, but instead of pg_usleep() you'd call some new primitive function that waits until the shared variable changes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: My point is that we should replace such polling loops with something non-polling, using wait/signal or semaphores or something. That gets quite a bit more complex. You'd probably still have the loop, but instead of pg_usleep() you'd call some new primitive function that waits until the shared variable changes. Maybe someday --- it's certainly not something we need to mess with for 8.5. As Simon comments, getting it to work nicely in the face of corner cases (like processes dying unexpectedly) could be a lot of work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: My point is that we should replace such polling loops with something non-polling, using wait/signal or semaphores or something. That gets quite a bit more complex. You'd probably still have the loop, but instead of pg_usleep() you'd call some new primitive function that waits until the shared variable changes. Maybe someday --- it's certainly not something we need to mess with for 8.5. As Simon comments, getting it to work nicely in the face of corner cases (like processes dying unexpectedly) could be a lot of work. Agreed, polling is good enough for 8.5. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
On Wed, Jan 20, 2010 at 09:22:49PM +0200, Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: My point is that we should replace such polling loops with something non-polling, using wait/signal or semaphores or something. That gets quite a bit more complex. You'd probably still have the loop, but instead of pg_usleep() you'd call some new primitive function that waits until the shared variable changes. Maybe someday --- it's certainly not something we need to mess with for 8.5. As Simon comments, getting it to work nicely in the face of corner cases (like processes dying unexpectedly) could be a lot of work. Agreed, polling is good enough for 8.5. Is this a TODO yet? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Synchronization primitives (Was: Re: [HACKERS] An example of bugs for Hot Standby)
David Fetter da...@fetter.org writes: On Wed, Jan 20, 2010 at 09:22:49PM +0200, Heikki Linnakangas wrote: My point is that we should replace such polling loops with something non-polling, using wait/signal or semaphores or something. Is this a TODO yet? It hardly seems concrete enough for a TODO item. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote: Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. 0. start Hot Standby, with one active node(node A) and one standby node(node B) 1. create table X and table Y in node A 2. insert several rows in table X in node A 3. delete one row from table X in node A 4. begin xact 1 in node A, execute following commands, and leave xact 1 open 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE 5. wait until WAL's for above actions are applied in node B 6. begin xact 2 in node B, and execute following commands 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X; 6.2 FETCH test_cursor; 6.3 SELECT * FROM table Y; 7. execute VACUUM FREEZE table A in node A 8. commit xact 1 in node A ...then in node B occurs following deadlock situation, which is not detected by deadlock check. * startup process waits for xact 2 to release buffers in table X (in LockBufferForCleanup()) * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table Y Deadlock bug was prevented by stop-gap measure in December commit. Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Reviews welcome, otherwise commit at end of week. -- Simon Riggs www.2ndQuadrant.com *** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *** *** 2399,2405 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' /listitem listitem para ! Waiting to acquire buffer cleanup locks (for which there is no time out) /para /listitem listitem --- 2399,2405 /listitem listitem para ! Waiting to acquire buffer cleanup locks /para /listitem listitem *** *** 2536,2546 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' Three-way deadlocks are possible between AccessExclusiveLocks arriving from the primary, cleanup WAL records that require buffer cleanup locks and user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks ! are currently resolved by the cancellation of user processes that would ! need to wait on a lock. This is heavy-handed and generates more query ! cancellations than we need to, though does remove the possibility of deadlock. ! This behaviour is expected to improve substantially for the main release ! version of 8.5. /para para --- 2536,2542 Three-way deadlocks are possible between AccessExclusiveLocks arriving from the primary, cleanup WAL records that require buffer cleanup locks and user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks ! are resolved by time-out when we exceed varnamemax_standby_delay/. /para para *** *** 2630,2640 LOG: database system is ready to accept read only connections varnamemax_standby_delay/ or even set it to zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it may be acceptable to set this ! to a value of many hours (in seconds). It is also possible to set ! varnamemax_standby_delay/ to -1 which means wait forever for queries ! to complete, if there are conflicts; this will be useful when performing ! an archive recovery from a backup. !/para para Transaction status hint bits written on primary are not WAL-logged, --- 2626,2632 varnamemax_standby_delay/ or even set it to zero, though that is a very aggressive setting. If the standby server is tasked as an additional server for decision support queries then it may be acceptable to set this ! to a value of many hours (in seconds). para Transaction status hint bits written on primary are not WAL-logged, *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 1825,1838 archive_command = 'copy %p C:\\server\\archivedir\\%f' # Windows listitem para When server acts as a standby, this parameter specifies a wait policy ! for queries that conflict with incoming data changes. Valid settings ! are -1, meaning wait forever, or a wait time of 0 or more seconds. ! If a conflict should occur the server will delay up to this ! amount before it begins trying to resolve things less amicably, as described in xref
Re: [HACKERS] An example of bugs for Hot Standby
On Tuesday 19 January 2010 11:46:24 Simon Riggs wrote: On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote: Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. 0. start Hot Standby, with one active node(node A) and one standby node(node B) 1. create table X and table Y in node A 2. insert several rows in table X in node A 3. delete one row from table X in node A 4. begin xact 1 in node A, execute following commands, and leave xact 1 open 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE 5. wait until WAL's for above actions are applied in node B 6. begin xact 2 in node B, and execute following commands 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X; 6.2 FETCH test_cursor; 6.3 SELECT * FROM table Y; 7. execute VACUUM FREEZE table A in node A 8. commit xact 1 in node A ...then in node B occurs following deadlock situation, which is not detected by deadlock check. * startup process waits for xact 2 to release buffers in table X (in LockBufferForCleanup()) * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table Y Deadlock bug was prevented by stop-gap measure in December commit. Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Wouldnt it be more foolproof to also loop around sending the FATAL? Not that its likely but... From HoldingBufferPinThatDelaysRecovery youre calling GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that one is acquiring a spinlock which can also get taken at other places. Its not the most likely scenario, but it would certainly be annoying to debug. Is there any supported platform with sizeof(sig_atomic_t) 4 - I would doubt so? If not the locking in GetStartupBufferPinWaitBufId and SetStartupBufferPinWaitBufId shouldnt be needed? Same issue issue (and more likely to trigger) exists with CheckStandbyTimeout- SendRecoveryConflictWithBufferPin-CancelDBBackends Too tired to look further. Greetings, Andres PS: Sorry for not doing anything on the idle front - I have been burried alive the last days. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
Andres Freund and...@anarazel.de writes: Is there any supported platform with sizeof(sig_atomic_t) 4 - I would doubt so? Er ... what? I believe there are live platforms with sig_atomic_t = char. If we're assuming more that's a must-fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
Following question may be redundant. Just a confirmation. Deadlock example is catstrophic while it's rather a rare event. On the other hand, LockBufferForCleanup() can cause another problem. * One idle pin-holder backend can freeze startup process(). This problem is not catstrophic, but it seems a similar problem which StandbyAcquireAccessExclusiveLock() tries to avoid. ...Is this the problem you call general problem above ? Here is a typical scenario in which startup process freezes until the end of a certain transaction. 1. Consider a table A, which has pages with HOT chain tuples old enough to be vacuumed. 2. Xact 1 in the standby node declares a cursor for table A, fetches the page which contains the HOT chain, and becomes idle for some reason. 3. Xact 2 in the active node reads the table A and calls heap_page_prune() for HOT pruning, which create XLOG_HEAP2_CLEAN record. 4. Startup process tries to redo XLOG_HEAP2_CLEAN record, calls LockBufferForCleanup() and freezes until the Xact 1 ends. Note that with HOT pruning, we do not need VACUUM command, and most tables, which has long history of updation, can be table A. -- Hiroyuki YAMADA Kokolink Corporation yam...@kokolink.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
Hiroyuki Yamada yam...@kokolink.net wrote: 4. Startup process tries to redo XLOG_HEAP2_CLEAN record, calls LockBufferForCleanup() and freezes until the Xact 1 ends. I think they word you're searching for is blocks. Blocking to protect integrity doesn't sound like a bug to me; perhaps an opportunity for enhancement. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
This way we only cancel direct deadlocks. It doesn't solve general problem of buffer waits, but they may be solvable by different mechanism. Following question may be redundant. Just a confirmation. Deadlock example is catstrophic while it's rather a rare event. On the other hand, LockBufferForCleanup() can cause another problem. * One idle pin-holder backend can freeze startup process(). This problem is not catstrophic, but it seems a similar problem which StandbyAcquireAccessExclusiveLock() tries to avoid. ...Is this the problem you call general problem above ? regards, -- Hiroyuki YAMADA Kokolink Corporation yam...@kokolink.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wed, 2009-12-16 at 14:05 +, Simon Riggs wrote: On Wed, 2009-12-16 at 10:33 +, Simon Riggs wrote: On Tue, 2009-12-15 at 20:25 +0900, Hiroyuki Yamada wrote: Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. Interesting. Looks like this can happen, which is a shame cos I just removed the wait checking code after not ever having seen a wait. Thanks for the report. Must-fix item for HS. So this deadlock can happen at two places: 1. When a relation lock waits behind an AccessExclusiveLock and then Startup runs LockBufferForCleanup() 2. When Startup is a pin count waiter and a lock acquire begins to wait on a relation lock So we must put in direct deadlock detection in both places. We can't use the normal deadlock detector because in case (1) the backend might already have exceeded deadlock_timeout. Proposal: Better proposal * It's possible for 3-way deadlocks to occur in Hot Standby mode. * If a user backend sleeps on a lock while it holds a buffer pin that * leaves open the risk of deadlock. The user backend will only sleep * if it waits behind an AccessExclusiveLock held by Startup process. * If the Startup process then tries to access any buffer that is pinned * then it too will sleep and neither process will ever wake. * * We need to make a deadlock check in two places: in the user backend * when we sleep on a lock, and in the Startup process when we sleep * on a buffer pin. We need both checks because the deadlock can occur * from both directions. * * Just before a user backend sleeps on a lock, we accumulate a list of * buffers pinned by the backend. We then grab the an LWlock * and then check each of the buffers to see if the Startup process is * waiting on them. If so, we release the lock and throw deadlock error. * If Startup process is not waiting we then record the pinned buffers * in the BufferDeadlockRisk data structure and release the lock. * When we later get the lock we remove the deadlock risk. * * When the Startup process is about to wait on a buffer pin it checks * the buffer it is about to pin in the BufferDeadlockRisk list. If the * buffer is already held by one or more lock waiters then we send a * conflict cancel to them and wait for them to die before rechecking * the buffer lock. This way we only cancel direct deadlocks. It doesn't solve general problem of buffer waits, but they may be solvable by different mechanism. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Tue, 2009-12-15 at 20:25 +0900, Hiroyuki Yamada wrote: Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. Interesting. Looks like this can happen, which is a shame cos I just removed the wait checking code after not ever having seen a wait. Thanks for the report. Must-fix item for HS. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An example of bugs for Hot Standby
On Wed, 2009-12-16 at 10:33 +, Simon Riggs wrote: On Tue, 2009-12-15 at 20:25 +0900, Hiroyuki Yamada wrote: Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. Interesting. Looks like this can happen, which is a shame cos I just removed the wait checking code after not ever having seen a wait. Thanks for the report. Must-fix item for HS. So this deadlock can happen at two places: 1. When a relation lock waits behind an AccessExclusiveLock and then Startup runs LockBufferForCleanup() 2. When Startup is a pin count waiter and a lock acquire begins to wait on a relation lock So we must put in direct deadlock detection in both places. We can't use the normal deadlock detector because in case (1) the backend might already have exceeded deadlock_timeout. Proposal: Make Startup wait on a well-known semaphore rather than on its proc-sem. This means we can skip the spinlock check in ProcSendSignal(). For (1) if Startup runs LockBufferForCleanup and can't get cleanup lock then it marks itself waiting. It then checks for any lock waiters. If there are 0 lock waiters then it waits for up to max_standby_delay and then aborts all current lock waiters, none of whom would ever wake if we continue waiting. For (2) If a normal backend goes into a lock wait in HS then it will check to see if Startup is waiting, if so, throw ERROR. This can happen immediately because if Startup is already waiting then to wait for the lock would cause deadlock. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] An example of bugs for Hot Standby
Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. 0. start Hot Standby, with one active node(node A) and one standby node(node B) 1. create table X and table Y in node A 2. insert several rows in table X in node A 3. delete one row from table X in node A 4. begin xact 1 in node A, execute following commands, and leave xact 1 open 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE 5. wait until WAL's for above actions are applied in node B 6. begin xact 2 in node B, and execute following commands 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X; 6.2 FETCH test_cursor; 6.3 SELECT * FROM table Y; 7. execute VACUUM FREEZE table A in node A 8. commit xact 1 in node A ...then in node B occurs following deadlock situation, which is not detected by deadlock check. * startup process waits for xact 2 to release buffers in table X (in LockBufferForCleanup()) * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table Y This situation can occur when a) a transaction in the standby node tries to acquire ACCESS SHARE lock while holding some buffers b) startup process calls LockBufferForCleanup() for any of the buffers regards, -- Hiroyuki YAMADA Kokolink Corporation yam...@kokolink.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] An example of bugs for Hot Standby
Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. 0. start Hot Standby, with one active node(node A) and one standby node(node B) 1. create table X and table Y in node A 2. insert several rows in table X in node A 3. delete one row from table X in node A 4. begin xact 1 in node A, execute following commands, and leave xact 1 open 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE 5. wait until WAL's for above actions are applied in node B 6. begin xact 2 in node B, and execute following commands 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X; 6.2 FETCH test_cursor; 6.3 SELECT * FROM table Y; 7. execute VACUUM FREEZE table A in node A 8. commit xact 1 in node A ...then in node B occurs following deadlock situation, which is not detected by deadlock check. * startup process waits for xact 2 to release buffers in table X (in LockBufferForCleanup()) * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table Y This situation can occur when a) a transaction in the standby node tries to acquire ACCESS SHARE lock while holding some buffers b) startup process calls LockBufferForCleanup() for any of the buffers regards, -- Hiroyuki YAMADA Kokolink Corporation yam...@kokolink.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers