Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Simon Riggs
On Mon, 2011-03-07 at 13:15 -0500, Robert Haas wrote: > On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs wrote: > > On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: > >> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao wrote: > >> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs > >> > wrote: > >> >

Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Robert Haas
On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs wrote: > On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: >> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao wrote: >> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs wrote: >> >> The WALSender deliberately does *not* wake waiting users if the standby

Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-06 Thread Simon Riggs
On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: > On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao wrote: > > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs wrote: > >> The WALSender deliberately does *not* wake waiting users if the standby > >> disconnects. Doing so would break the whole reason

How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Fujii Masao
On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao wrote: > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs wrote: >> The WALSender deliberately does *not* wake waiting users if the standby >> disconnects. Doing so would break the whole reason for having sync rep >> in the first place. What we do is allow

Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Robert Haas
On Wed, Mar 2, 2011 at 5:10 PM, Yeb Havinga wrote: > On 2011-03-02 21:26, Kevin Grittner wrote: >> >> I think including "synchronous" is OK as long as it's properly >> qualified.  Off-hand thoughts in no particular order: >> >> semi-synchronous >> conditionally synchronous >> synchronous with auto

Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Dimitri Fontaine
Robert Haas writes: > I don't understand how synchronous replication with > allow_standalone_primary=on gives you ANY extra nines. AFAICS, the > only point of having synchronous replication is that you wait to > acknowledge the commit to the client until the commit record has been > replicated.

Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Simon Riggs
On Thu, 2011-03-03 at 02:14 -0500, Tom Lane wrote: > Fujii Masao writes: > > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > > wrote: > >> To achieve the effect Fujii is looking for, we would have to silently drop > >> the connection. That would correctly leave the client not knowing whethe

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Tom Lane
Fujii Masao writes: > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > wrote: >> To achieve the effect Fujii is looking for, we would have to silently drop >> the connection. That would correctly leave the client not knowing whether >> the transaction committed or not. > Yeah, this seems to

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Thu, 2011-03-03 at 13:35 +0900, Fujii Masao wrote: > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > wrote: > > To achieve the effect Fujii is looking for, we would have to silently drop > > the connection. That would correctly leave the client not knowing whether > > the transaction comm

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas wrote: > To achieve the effect Fujii is looking for, we would have to silently drop > the connection. That would correctly leave the client not knowing whether > the transaction committed or not. Yeah, this seems to make more sense. Regards, -

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 6:33 AM, Robert Haas wrote: > I don't understand how synchronous replication with > allow_standalone_primary=on gives you ANY extra nines. When you start the primary (or when there is one connected standby and it crashes), allow_standalone_primary = on allows the database s

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 5:50 AM, Robert Haas wrote: > I agree.  I assumed that when Simon was talking about removing > allow_standalone_primary, he meant making the code always behave as if > it were turned OFF. I feel the same thing.. Despite his saying, the patch implements sync_replication_time

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:24 -0500, Andrew Dunstan wrote: > > On 03/02/2011 04:13 PM, Simon Riggs wrote: > > On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: > >> On 03/02/2011 03:39 PM, Simon Riggs wrote: > >>> Truly "synchronous" requires two-phase commit, which this never was. So > >>> t

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Yeb Havinga wrote: > On 2011-03-02 21:26, Kevin Grittner wrote: >> >> I think including "synchronous" is OK as long as it's properly >> qualified. Off-hand thoughts in no particular order: >> >> semi-synchronous >> conditionally synchronous >> synchronous with automatic failover to standalone > I

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:16 -0500, Tom Lane wrote: > Simon Riggs writes: > > On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: > >> Fair enough. All I'm saying is that if we end up shipping without that > >> parameter (implying allow_standalone_primary=on), we need to call the > >> fe

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Yeb Havinga
On 2011-03-02 21:26, Kevin Grittner wrote: I think including "synchronous" is OK as long as it's properly qualified. Off-hand thoughts in no particular order: semi-synchronous conditionally synchronous synchronous with automatic failover to standalone It would be good to name the concept equal

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 4:19 PM, Dimitri Fontaine wrote: > Robert Haas writes: >> 1. Everything is humming along. >> 2. The network link between the master and standby drops. >> 3. Then it comes back up again. >> >> After (2) and before (3), what should the behavior the master be?  It >> seems cle

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Andrew Dunstan
On 03/02/2011 04:13 PM, Simon Riggs wrote: On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: On 03/02/2011 03:39 PM, Simon Riggs wrote: Truly "synchronous" requires two-phase commit, which this never was. So the absence or presence of the poorly specified parameter called allow_standal

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Dimitri Fontaine
Robert Haas writes: > 1. Everything is humming along. > 2. The network link between the master and standby drops. > 3. Then it comes back up again. > > After (2) and before (3), what should the behavior the master be? It > seems clear to me that it should WAIT. Otherwise, a crash on the That ju

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Tom Lane
Simon Riggs writes: > On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: >> Fair enough. All I'm saying is that if we end up shipping without that >> parameter (implying allow_standalone_primary=on), we need to call the >> feature something else. The GUCs and code can probably stay as

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: > > On 03/02/2011 03:39 PM, Simon Riggs wrote: > > Truly "synchronous" requires two-phase commit, which this never was. So > > the absence or presence of the poorly specified parameter called > > allow_standalone_primary should have no beari

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Simon Riggs wrote: > On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: > >> I assumed that when Simon was talking about removing >> allow_standalone_primary, he meant making the code always behave >> as if it were turned OFF. > > That is the part that is currently not fully specified, so no

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: > I assumed that when Simon was talking about removing > allow_standalone_primary, he meant making the code always behave as if > it were turned OFF. That is the part that is currently not fully specified, so no that is not currently included

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 3:45 PM, Kevin Grittner wrote: > Simon Riggs wrote: > >> allow_standalone_primary=off means "wait forever". It does nothing >> to reduce data loss since you can't replicate to a server that >> isn't there. > > Unless you're pulling from some persistent source which will the

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Simon Riggs wrote: > allow_standalone_primary=off means "wait forever". It does nothing > to reduce data loss since you can't replicate to a server that > isn't there. Unless you're pulling from some persistent source which will then feel free to discard what you have retrieved. You can't ass

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Andrew Dunstan
On 03/02/2011 03:39 PM, Simon Riggs wrote: Truly "synchronous" requires two-phase commit, which this never was. So the absence or presence of the poorly specified parameter called allow_standalone_primary should have no bearing on what we call this feature. I haven't been following this very

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: > On 02.03.2011 21:48, Simon Riggs wrote: > > On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: > >> On 02.03.2011 12:40, Simon Riggs wrote: > >>> allow_standalone_primary seems to need to be better through than it is > >>> now

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Joshua D. Drake
On Wed, 2011-03-02 at 14:26 -0600, Kevin Grittner wrote: > Heikki Linnakangas wrote: > > > All I'm saying is that if we end up shipping without that > > parameter (implying allow_standalone_primary=on), we need to call > > the feature something else. The GUCs and code can probably stay as > > it

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Heikki Linnakangas wrote: > All I'm saying is that if we end up shipping without that > parameter (implying allow_standalone_primary=on), we need to call > the feature something else. The GUCs and code can probably stay as > it is, but we shouldn't use the term "synchronous replication" in > the

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 21:48, Simon Riggs wrote: On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can thin

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: > On 02.03.2011 12:40, Simon Riggs wrote: > > allow_standalone_primary seems to need to be better through than it is > > now, yet neither of us think its worth having. > > > > If the people that want it can think it through a little bette

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 17:23 +0200, Heikki Linnakangas wrote: > On 02.03.2011 17:07, Robert Haas wrote: > > On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas > > wrote: > >> On 02.03.2011 12:40, Simon Riggs wrote: > >>> > >>> allow_standalone_primary seems to need to be better through than it is

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 12:39 PM, Merlin Moncure wrote: > On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine > wrote: >> Heikki Linnakangas writes: >>> To achieve the effect Fujii is looking for, we would have to silently drop >>> the connection. That would correctly leave the client not knowing w

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Merlin Moncure
On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine wrote: > Heikki Linnakangas writes: >> To achieve the effect Fujii is looking for, we would have to silently drop >> the connection. That would correctly leave the client not knowing whether >> the transaction committed or not. > > +1 > >>> It migh

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Aidan Van Dyk
On Wed, Mar 2, 2011 at 2:30 PM, Fujii Masao wrote: > 1. The primary is running with allow_standalone_primary = on. There >    is only one (synchronous) standby connected. OK. Explicitly configured to allow the master to report as commited stuff which isn't on a/any slave. > 7. New primary does

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Heikki Linnakangas wrote: > The defining property of synchronous replication is that when a > transaction is acknowledged as committed to the client, it has > also been replicated to the standby. You don't achieve that with > allow_standalone_primary=on, plain and simple. That's fine for a > l

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Dimitri Fontaine
Heikki Linnakangas writes: > To achieve the effect Fujii is looking for, we would have to silently drop > the connection. That would correctly leave the client not knowing whether > the transaction committed or not. +1 >> It might be reasonable to COMMIT but also issue a warning message, or >> t

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 17:07, Robert Haas wrote: On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas wrote: On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can thin

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 17:07, Robert Haas wrote: On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao wrote: What I'm thinking is: when the waiting backends are released because of the timeout while the fast shutdown is being done in the master, those backends should not return the success indication to the clie

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas wrote: > On 02.03.2011 12:40, Simon Riggs wrote: >> >> allow_standalone_primary seems to need to be better through than it is >> now, yet neither of us think its worth having. >> >> If the people that want it can think it through a little better t

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao wrote: > What I'm thinking is: when the waiting backends are released because > of the timeout while the fast shutdown is being done in the master, > those backends should not return the success indication to the client. > Of course, in that case, WAL ha

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas
On 02.03.2011 12:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can think it through a little better then it might make this release, but I propose to remove it from this curr

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Wed, Mar 2, 2011 at 7:40 PM, Simon Riggs wrote: > On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > >> No, I've never wished wait-forever option for now. I'd like to make >> the primary work alone when there is no connected standby, for >> high-availability. > > allow_standalone_primary s

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs wrote: > The WALSender deliberately does *not* wake waiting users if the standby > disconnects. Doing so would break the whole reason for having sync rep > in the first place. What we do is allow a potential standby to takeover > the role of sync standby

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 6:22 AM, Simon Riggs wrote: > The WALSender deliberately does *not* wake waiting users if the standby > disconnects. Doing so would break the whole reason for having sync rep > in the first place. What we do is allow a potential standby to takeover > the role of sync standby

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Yeb Havinga
On 2011-03-02 11:40, Simon Riggs wrote: allow_standalone_primary seems to need to be better through than it is now, yet neither of us think its worth having. If the people that want it can think it through a little better then it might make this release, but I propose to remove it from this cur

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs wrote: > > On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: > >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > >> > if the standby crashes while a transaction is w

Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > No, I've never wished wait-forever option for now. I'd like to make > the primary work alone when there is no connected standby, for > high-availability. allow_standalone_primary seems to need to be better through than it is now, yet neithe

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 10:02 -0500, Tom Lane wrote: > Simon Riggs writes: > > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: > >> A spinlock can be used only for very short-term operation like > >> read/write of some shared-variables. The operation on the queue > >> is not short, so should b

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Tom Lane
Robert Haas writes: > On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs wrote: >> LWlocks are just spinlocks plus sem sleeps, so I don't see the need for >> that in the current code. Other views welcome. > An LWLock is a lot safer, in general, than a spinlock. A spinlock > mustn't do anything that co

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Tom Lane
Simon Riggs writes: > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > There's no need to sleep w

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Robert Haas
On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs wrote: > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: >> Thanks for update of the patch! >> >> On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs wrote: >> >> SyncRepRemoveFromQueue seems not to be as short-term as we can >> >> use the spinlock. Inste

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Yeb Havinga
On 2011-02-28 20:32, Simon Riggs wrote: I have reworded it to see if that improves the explanation Code available at git://github.com/simon2ndQuadrant/postgres.git If a standby is removed from the list of servers then it will stop being the synchronous standby, allowing another to take it'

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 5:29 PM, Simon Riggs wrote: >> >> What if fast shutdown is requested while RecordTransactionCommit >> >> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete >> >> until replication has been successfully done (i.e., until at least one >> >> synchronous standby

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 4:56 PM, Simon Riggs wrote: >> > If allow_standalone_primary = on then we sit and wait until we hit >> > client timeout, which occurs even after last standby has gone. >> >> In that case, why do backends need to wait until the timeout occurs? >> We can make those backends re

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 5:21 PM, Simon Riggs wrote: >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > > There's no need to sleep while holding locks and t

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 16:28 +0900, Fujii Masao wrote: > On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs wrote: > >> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for > >> replication as well as COMMIT PREPARED? > > > > PREPARE - Yes > > ROLLBACK - No > > > > Further discussion welcome > > If

Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: > Thanks for update of the patch! > > On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs wrote: > >> SyncRepRemoveFromQueue seems not to be as short-term as we can > >> use the spinlock. Instead, LW lock should be used there. > > You seem to have fo

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs wrote: > > On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: > >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > >> > if the standby crashes while a transaction is w

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs wrote: >> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for >> replication as well as COMMIT PREPARED? > > PREPARE - Yes > ROLLBACK - No > > Further discussion welcome If we don't make ROLLBACK PREPARED wait for replication, we might need to iss

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
Thanks for update of the patch! On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs wrote: >> SyncRepRemoveFromQueue seems not to be as short-term as we can >> use the spinlock. Instead, LW lock should be used there. You seem to have forgotten to fix the above-mentioned issue. A spinlock can be used onl

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs wrote: > On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, >> > if the standby crashes while a transaction is waiting for replication, >> > it waits infinitely. >> >> Will t

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote: > > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > > if the standby crashes while a transaction is waiting for replication, > > it waits infinitely. > > Will think on this. The behaviour seems correct to me: If allo

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 16:55 -0500, Robert Haas wrote: > On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs wrote: > > On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > > > >> > Docs changed: git://github.com/simon2ndQuadrant/postgres.git > >> > >> I'm hoping you're going to post an updated patch o

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Robert Haas
On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs wrote: > On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > >> > Docs changed: git://github.com/simon2ndQuadrant/postgres.git >> >> I'm hoping you're going to post an updated patch once the current rash >> of updates is all done. > > Immediately pr

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > > Docs changed: git://github.com/simon2ndQuadrant/postgres.git > > I'm hoping you're going to post an updated patch once the current rash > of updates is all done. Immediately prior to commit, yes. Everybody else has been nudging me towar

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Robert Haas
On Mon, Feb 28, 2011 at 4:13 PM, Simon Riggs wrote: > On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: > >> I believe the problem is that the definition of IsOnSyncRepQueue is >> bogus, so that the loop in SyncRepWaitOnQueue always takes the first >> branch. > > Sorry, don't see that. Jaime/Y

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: > I believe the problem is that the definition of IsOnSyncRepQueue is > bogus, so that the loop in SyncRepWaitOnQueue always takes the first > branch. Sorry, don't see that. Jaime/Yeb fix applied. > It was a little confusing to me setting thi

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Sat, 2011-02-19 at 22:52 -0500, Robert Haas wrote: > On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs wrote: > > First, we should be clear to explain that you are referring to the fact > > that the request > > synchronous_commit = off > > synchronous_replication = on > > makes no sense in the way

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote: > I needed to add this to make the documentation compile > > --- a/doc/src/sgml/config.sgml > +++ b/doc/src/sgml/config.sgml > @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF; > You should also consider setting > hot_standby_feedback

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-21 at 21:35 +0900, Tatsuo Ishii wrote: > > Well, good news all round. > > > > v17 implements what I believe to be the final set of features for sync > > rep. This one I'm actually fairly happy with. It can be enjoyed best at > > DEBUG3. > > > > The patch is very lite touch on a fe

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Thu, 2011-02-24 at 18:13 -0800, Daniel Farina wrote: > I have also reproduced this. Notably, things seem fine as long as > pgbench is confined to one backend, but as soon as two are used (-c 2) > by the feature I can get segfaults. Sorry that you all experienced this. I wasn't able to get conc

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote: > --- a/src/backend/replication/syncrep.c > +++ b/src/backend/replication/syncrep.c > @@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void) > } > else > { > + bool found = false; > + > while

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Tue, 2011-02-22 at 14:38 +0900, Fujii Masao wrote: > On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao wrote: > > I've read about a tenth of the patch, so I'll submit another comments > > about the rest later. > > Here are another comments: Thanks for your comments Code available at git://github.c

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Thu, 2011-02-24 at 22:08 +0900, Fujii Masao wrote: > On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao wrote: > > I've read about two-tenths of the patch, so I'll submit another comments > > about the rest later. Sorry for the slow reviewing... > > Here are another comments: Thanks for your commen

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote: > 1) no automatic switch to other synchronous standby > - start master server, add synchronous standby 1 > - change allow_standalone_primary to off > - add second synchronous standby > - wait until pg_stat_replication shows both standby's are i

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-21 at 18:06 +0900, Fujii Masao wrote: > Thanks for the patch! Thanks for the review. Code available at git://github.com/simon2ndQuadrant/postgres.git > PREPARE TRANSACTION and ROLLBACK PREPARED should wait for > replication as well as COMMIT PREPARED? PREPARE - Yes ROLLBACK - N

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Yeb Havinga
On 2011-02-25 20:40, Jaime Casanova wrote: On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga wrote: I also did some initial testing on this patch and got the queue related errors with> 1 clients. With the code change from Jaime above I still got a lot of 'not on queue warnings'. I tried to unders

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis wrote: > On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote: >> Oh, yes, this reproduces past shutdowns/startups, and there's quite a >> few txids before I catch up. I'm also comfortable poking around with >> gdb (I have already recompiled with debug

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jeff Davis
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote: > Oh, yes, this reproduces past shutdowns/startups, and there's quite a > few txids before I catch up. I'm also comfortable poking around with > gdb (I have already recompiled with debugging symbols and > optimizations off and was poking aroun

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jaime Casanova
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga wrote: >> > I also did some initial testing on this patch and got the queue related > errors with > 1 clients. With the code change from Jaime above I still got a > lot of 'not on queue warnings'. > > I tried to understand how the queue was supposed to

Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Yeb Havinga
On 2011-02-22 20:43, Jaime Casanova wrote: you can make this happen more easily, i just run "pgbench -n -c10 -j10 test" and qot that warning and sometimes a segmentation fault and sometimes a failed assertion and the problematic code starts at src/backend/replication/syncrep.c:277, here my sugg

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Jaime Casanova
On Mon, Feb 21, 2011 at 4:06 AM, Fujii Masao wrote: > > PREPARE TRANSACTION and ROLLBACK PREPARED should wait for > replication as well as COMMIT PREPARED? > maybe ROLLBACK PREPARED but i'm not sure... i'm pretty sure we don't need to wait for PREPARE TRANSACTION, but i could be wrong > What if

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Daniel Farina
With some more fooling around, I have also managed to get this elog(WARNING) if (proc->lwWaitLink == NULL) elog(WARNING, "could not locate ourselves on wait queue"); -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Daniel Farina
On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova wrote: > On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas wrote: >> >> DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690 >> DEBUG:  released 0 procs up to 0/3014690 >> DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690 >> DEBUG:  released 2

Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Fujii Masao
On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao wrote: > I've read about two-tenths of the patch, so I'll submit another comments > about the rest later. Sorry for the slow reviewing... Here are another comments: + {"synchronous_standby_names", PGC_SIGHUP, WAL_REPLICATION, +

Re: [HACKERS] Sync Rep v17

2011-02-23 Thread Daniel Farina
On Wed, Feb 23, 2011 at 10:39 PM, Daniel Farina wrote: > On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs wrote: >> >> Well, good news all round. >> >> v17 implements what I believe to be the final set of features for sync >> rep. This one I'm actually fairly happy with. It can be enjoyed best at >>

Re: [HACKERS] Sync Rep v17

2011-02-23 Thread Daniel Farina
On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs wrote: > > Well, good news all round. > > v17 implements what I believe to be the final set of features for sync > rep. This one I'm actually fairly happy with. It can be enjoyed best at > DEBUG3. I've been messing with this patch and am wondering if t

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Daniel Farina
On Tue, Feb 22, 2011 at 5:04 AM, Greg Smith wrote: > Daniel Farina wrote: >> >> As it will be somewhat hard to prove the durability guarantees of >> commit without special heroics, unless someone can suggest a >> mechanism. > > Could you introduce a hack creating deterministic server side crashes

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Jaime Casanova
On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas wrote: > > DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690 > DEBUG:  released 0 procs up to 0/3014690 > DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690 > DEBUG:  released 2 procs up to 0/3027BC8 > WARNING:  could not locate ourselves on

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Tom Lane
Marti Raudsepp writes: > On Tue, Feb 22, 2011 at 07:38, Fujii Masao wrote: >> + SpinLockAcquire(&WalSndCtl->ctlmutex); >> + result = WalSndCtl->sync_rep_service_available; >> + SpinLockRelease(&WalSndCtl->ctlmutex); >> volatile pointer needs to be used to prevent code rearrange

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Greg Smith
Daniel Farina wrote: As it will be somewhat hard to prove the durability guarantees of commit without special heroics, unless someone can suggest a mechanism. Could you introduce a hack creating deterministic server side crashes in order to test this out? The simplest thing that comes to mind

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Andres Freund
On Tuesday 22 February 2011 09:59:21 Marti Raudsepp wrote: > On Tue, Feb 22, 2011 at 07:38, Fujii Masao wrote: > > + SpinLockAcquire(&WalSndCtl->ctlmutex); > > + result = WalSndCtl->sync_rep_service_available; > > + SpinLockRelease(&WalSndCtl->ctlmutex); > > > > volatile pointer

Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Marti Raudsepp
On Tue, Feb 22, 2011 at 07:38, Fujii Masao wrote: > +       SpinLockAcquire(&WalSndCtl->ctlmutex); > +       result = WalSndCtl->sync_rep_service_available; > +       SpinLockRelease(&WalSndCtl->ctlmutex); > volatile pointer needs to be used to prevent code rearrangement. I don't think that's ne

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Mon, Feb 21, 2011 at 9:35 PM, Tatsuo Ishii wrote: > +       synchronous_standby_names configuration > parameter > +       > +       > +       > +        Specifies a list of standby names that can become the sole > +        synchronous standby. Other standby servers connect that are also on >

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao wrote: > I've read about a tenth of the patch, so I'll submit another comments > about the rest later. Here are another comments: SyncRepReleaseWaiters should be called when walsender exits. Otherwise, if the standby crashes while a transaction is wai

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Tue, Feb 22, 2011 at 7:55 AM, Daniel Farina wrote: > I'm taking a look at replication timeout with non-blocking which would > be "nice" but not required for this patch, in my understanding. Why do you think so? You think sync_replication_timeout_client is sufficient for sync rep? Regards, --

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Daniel Farina
On Mon, Feb 21, 2011 at 4:35 AM, Tatsuo Ishii wrote: >> Well, good news all round. Hello on this thread, I'm taking a look at replication timeout with non-blocking which would be "nice" but not required for this patch, in my understanding. But before that, we're going to put this patch through s

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Tatsuo Ishii
> Well, good news all round. > > v17 implements what I believe to be the final set of features for sync > rep. This one I'm actually fairly happy with. It can be enjoyed best at > DEBUG3. > > The patch is very lite touch on a few areas of code, plus a chunk of > specific code, all on master-side.

Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Sat, Feb 19, 2011 at 9:06 AM, Simon Riggs wrote: > > Well, good news all round. > > v17 implements what I believe to be the final set of features for sync > rep. This one I'm actually fairly happy with. It can be enjoyed best at > DEBUG3. > > The patch is very lite touch on a few areas of code,

  1   2   >